LLM Knowledge Graph Builder Frontend¶
Objective¶
This document is a guide for developers on how the React application is built and integrated with Neo4j Aura for graph database functionality. The application lets users connect to a Neo4j Aura instance and automatically create a graph from unstructured text. Users can upload documents locally or from cloud buckets, YouTube videos, and Wikipedia pages; configure a graph schema; extract the lexical, entity and knowledge graphs; visualize the extracted graph; and ask questions and see the details that were used to generate the answers.
Architecture Structure¶
- For the Knowledge Graph Builder app:
- React JS – application logic
- Axios – network calls and response handling
- Styled Components – CSS in JS, where we write all CSS ourselves; or Tailwind CSS – third-party CSS classes that speed up development
- Long polling: the simplest way to maintain a steady connection between a client and a server. The server holds a request open for a period when it has no response to send back. The client receives regular updates, such as a changed status or the number of processed chunks, every minute.
- SSE (Server-Sent Events): the best option when the server generates data in a loop and sends multiple events to the client, and when real-time traffic from the server to the client is needed.
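The client side of the polling approach described above can be sketched as follows. This is a minimal sketch, not the application's code: `pollUntilDone`, `FileStatus`, the `fetchStatus` callback and the `wait` helper are hypothetical names, and the terminal status names are an assumption. With true long polling the server holds each request open, so the pause between attempts can be zero.

```typescript
// Hypothetical status shape; the real backend response may differ.
type FileStatus = { status: string; processedChunks: number };

// Asks the server for the latest status until the job reaches a terminal
// state or the attempt budget is used up.
async function pollUntilDone(
  fetchStatus: () => Promise<FileStatus>, // performs one request (assumed)
  wait: (ms: number) => Promise<void>,    // pause between requests (assumed)
  intervalMs: number,
  maxAttempts: number,
  onUpdate: (status: FileStatus) => void
): Promise<FileStatus | null> {
  const terminal = ['Completed', 'Failed', 'Cancelled'];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = await fetchStatus();
    onUpdate(current); // e.g. refresh the status cell in the file table
    if (terminal.indexOf(current.status) !== -1) {
      return current;
    }
    await wait(intervalMs);
  }
  return null; // gave up before the job finished
}
```

In a browser, `wait` could be a small wrapper around `setTimeout`, and `fetchStatus` a call made with axios.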
Project Structure¶
.
├── API
├── Assets
├── Components
│ ├── ChatBot
│ │ ├── Chatbot
│ │ ├── ChatInfoModal
│ │ ├── ChatModesSwitch
│ │ ├── ChatModeToggle
│ │ ├── ChatOnlyComponent
│ │ ├── ChatInfo
│ │ ├── CommonChatActions
│ │ ├── CommunitiesInfo
│ │ ├── EntitiesInfo
│ │ ├── ExpandedChatButtonContainer
│ │ ├── MetricsCheckbox
│ │ ├── MetricsTab
│ │ ├── MultiModeMetrics
│ │ └── SourcesInfo
│ ├── Data Sources
│ │ ├── AWS
│ │ ├── GCS
│ │ ├── Local
│ │ └── Web
│ │ └── WebButton
│ ├── Graph
│ │ ├── CheckboxSelection
│ │ ├── GraphPropertiesPanel
│ │ ├── GraphPropertiesTable
│ │ ├── GraphViewButton
│ │ ├── GraphViewModal
│ │ ├── LegendsChip
│ │ ├── ResizePanel
│ │ └── ResultOverview
│ ├── Layout
│ │ ├── AlertIcon
│ │ ├── DrawerChatbot
│ │ ├── DrawerDropzone
│ │ ├── Header
│ │ ├── PageLayout
│ │ └── SideNav
│ ├── Popups
│ │ ├── ChunkPopUp
│ │ ├── ConnectionModal
│ │ ├── DeletePopup
│ │ ├── GraphEnhancementDialog
│ │ ├── LargeFilePopup
│ │ ├── RetryConfirmation
│ │ └── Settings
│ ├── UI
│ │ ├── Alert
│ │ ├── ButtonWithTooltip
│ │ ├── BreakDownPopOver
│ │ ├── CustomButton
│ │ ├── CustomCheckBox
│ │ ├── CustomMenu
│ │ ├── CustomPopOver
│ │ ├── CustomProgressBar
│ │ ├── DatabaseIcon
│ │ ├── DatabaseStatusIcon
│ │ ├── Dropdown
│ │ ├── ErrorBoundary
│ │ ├── FallBackDialog
│ │ ├── HoverableLink
│ │ ├── IconButtonTooltip
│ │ ├── Legend
│ │ ├── ScienceMolecule
│ │ ├── ShowAll
│ │ └── TipWrapper
│ ├── Websources
│ │ ├── Web
│ │ ├── Wikipedia
│ │ ├── Youtube
│ │ ├── CustomSourceInput
│ │ ├── GenericSourceButton
│ │ └── GenericSourceModal
│ ├── Content
│ ├── FileTable
│ └── QuickStarter
├── HOC
│ ├── CustomModal
│ └── withVisibility
├── Assets
│ ├── images
│ │ └── Application Images
│ ├── chatbotMessages.json
│ └── schema.json
├── Context
│ ├── Alert
│ ├── ThemeWrapper
│ ├── UserCredentials
│ ├── UserMessages
│ └── UserFiles
├── Hooks
│ ├── useSourceInput
│ ├── useSpeech
│ └── useSSE
├── Services
├── Styling
│ └── info
├── Utils
│ ├── constants
│ ├── FileAPI
│ ├── Loader
│ ├── Queue
│ ├── toats
│ └── utils
├── App
├── index
├── main
├── router
├── types
└── README.md
Application Features¶
1. Setup and Installation¶
- Install Node.js v21.1.0 and npm on the development machine
- Install the necessary dependencies, such as axios for making HTTP requests and other packages used to interact with the graph, by running: yarn install
2. Connect to the Neo4j Aura instance¶
Created a connection modal that collects the connection details: protocol, URI, database name, username, and password. The submit button triggers the API /connect, which accepts params such as uri, password, username and database to establish a connection to the Neo4j Aura instance. Authentication and error scenarios are handled by displaying relevant messages. To check whether the backend connection is up and working, we hit the API /health. The user can access both Aura DS and Aura DB instances.
- With a GDS (Aura DS) connection, the icon is the science molecule, and under Graph Enhancement modal > Post processing jobs the user can check and uncheck the communities checkbox
- With an Aura DB connection, the icon is the database icon, and under Graph Enhancement modal > Post processing jobs the communities checkbox is disabled
Aura DS Connection
Aura DB connection
ReadOnly User
User not connected
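The connection rules above can be sketched as two small helpers. This is only an illustration: `buildConnectParams` and `connectionUi` are hypothetical names, joining the protocol and host into the uri is an assumption about how the modal combines its fields, and the read-only user case is left out because its behaviour is not described here.

```typescript
// Field names follow the /connect params listed above.
type ConnectParams = { uri: string; database: string; username: string; password: string };

// Assumption: the modal joins the selected protocol and the host into one URI.
function buildConnectParams(
  protocol: string,
  host: string,
  database: string,
  username: string,
  password: string
): ConnectParams {
  return { uri: `${protocol}://${host}`, database, username, password };
}

type ConnectionUi = {
  icon: 'ScienceMolecule' | 'DatabaseIcon' | 'none';
  communitiesCheckboxEnabled: boolean;
};

// GDS instances get the science molecule icon and an editable communities
// checkbox; Aura DB instances get the database icon and a disabled checkbox.
function connectionUi(connected: boolean, isGds: boolean): ConnectionUi {
  if (!connected) {
    return { icon: 'none', communitiesCheckboxEnabled: false };
  }
  if (isGds) {
    return { icon: 'ScienceMolecule', communitiesCheckboxEnabled: true };
  }
  return { icon: 'DatabaseIcon', communitiesCheckboxEnabled: false };
}
```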
3. File Source Integration¶
Implemented various file source integrations including drag-and-drop, web sources search that includes YouTube video, Wikipedia link, Amazon S3 file access, and Google Cloud Storage (GCS) file access. This allows users to upload PDF files from local storage or directly from the integrated sources.
The APIs are as follows:
- /sources_list: to fetch the list of files in the DB
- /upload: to upload files from Local
- /url/scan: to scan the link or sources of YouTube, Wikipedia, and Web Sources
- /url/scan: to scan the files of S3 and GCS
  - Add the respective bucket URL, access key and secret key to access S3 files
  - Add the respective project ID, bucket name, and folder to access GCS files
  - For GCS, the user is redirected to the authentication page to authenticate their Google account
4. File Source Extraction¶
/extract: to fetch the number of nodes and relationships created
- During extraction, the selected files, or all files in the 'New' state, go into the 'Processing' state and then the 'Completed' state if there are no failures
- A file with status Completed has a Reprocess option, with its own set of reprocessing conditions
- A file with status Failed/Cancelled has a Reprocess option, with its own set of reprocessing conditions
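The status flow above can be read as a small state machine. The sketch below is illustrative only: the event names and the `nextState` helper are hypothetical, and it assumes that reprocessing sends a file straight back to 'Processing'.

```typescript
type FileState = 'New' | 'Processing' | 'Completed' | 'Failed' | 'Cancelled';
type FileEvent = 'extract' | 'success' | 'failure' | 'cancel' | 'reprocess';

// Allowed transitions; events that are missing for a state are ignored.
const transitions: Record<FileState, Partial<Record<FileEvent, FileState>>> = {
  New: { extract: 'Processing' },
  Processing: { success: 'Completed', failure: 'Failed', cancel: 'Cancelled' },
  Completed: { reprocess: 'Processing' },
  Failed: { reprocess: 'Processing' },
  Cancelled: { reprocess: 'Processing' },
};

// Returns the state after applying an event, or the unchanged state when
// the event is not valid for the current state.
function nextState(state: FileState, event: FileEvent): FileState {
  const target = transitions[state][event];
  return target === undefined ? state : target;
}
```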
5. Graph Generation¶
/graph_query:
- Created a component for generating graphs based on the files in the table, to show the extracted nodes and relationships
- When the user clicks on Preview Graph or on the Table View icon, the graph modal offers three options for viewing: Lexical Graph, Entity Graph and Knowledge Graph
- We use Neo4j's graph visualization library to render the extracted nodes and relationships returned by the graph query API /graph_query
- The graph visualization can be customized through view controls (zoom in, zoom out, fit, refresh), node styling, and relationship types
Preview Graph
File Graph
Graph Types
- Document & Chunk
- Entities
- Communities
/get_neighbours: this API retrieves the neighbor nodes of the node with the given element id
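Switching between the graph types listed above amounts to filtering the nodes and then dropping relationships that lost an endpoint. The following is a rough sketch under stated assumptions: the node and relationship shapes, the helper names, and the label names ('Document', 'Chunk', '__Community__') are hypothetical and may not match what /graph_query returns.

```typescript
type GraphNode = { id: string; labels: string[] };
type GraphRel = { id: string; from: string; to: string };
type GraphType = 'DocumentChunk' | 'Entities' | 'Communities';

// Classifies a node by its labels (label names are assumptions).
function nodeGraphType(node: GraphNode): GraphType {
  if (node.labels.indexOf('Document') !== -1 || node.labels.indexOf('Chunk') !== -1) {
    return 'DocumentChunk';
  }
  if (node.labels.indexOf('__Community__') !== -1) {
    return 'Communities';
  }
  return 'Entities'; // everything else is treated as an entity
}

// Keeps only the nodes that belong to one of the selected graph types.
function filterNodes(nodes: GraphNode[], selected: GraphType[]): GraphNode[] {
  return nodes.filter((n) => selected.indexOf(nodeGraphType(n)) !== -1);
}

// Keeps only relationships whose two endpoints are both still visible.
function filterRelationships(rels: GraphRel[], keptNodes: GraphNode[]): GraphRel[] {
  const ids = new Set(keptNodes.map((n) => n.id));
  return rels.filter((r) => ids.has(r.from) && ids.has(r.to));
}
```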
6. Chatbot¶
Created a Chatbot component with state variables to manage user input and chat messages. When the user asks a question and clicks the Ask button, the API /chat_bot is triggered to send the user input to the backend and receive the response. The chat also lets users see more details about a response, play it as text to speech, and copy it.
Chat Drawer View
Chat Modal View
/clear_chat_bot: to clear the chat history which is saved in Neo4j DB
/chunk_entities: to fetch the number of sources, entities and chunks
Sources
Entities
Chunks
/metric: evaluates chatbot responses on the basis of metrics such as faithfulness and answer relevancy. It uses the RAGAS library to calculate these metrics
/additional_metrics: evaluates chatbot responses on the basis of metrics such as context entity recall, semantic score and ROUGE score. It requires additional ground truth to be supplied by the user, and also uses the RAGAS library to calculate these metrics
Chat Modes
- In the Production environment, five modes can be provided to the chat to retrieve answers: Vector, Fulltext, Graph+Vector, Entity search+Vector, and Graph+Vector+Fulltext
- In the Development environment, one more mode, Graph, can be provided to the chat
- If the Aura instance is GDS, one more mode, Global search+Vector+Fulltext, can be provided to the chat
1) In Production Environment
2) In Development Environment
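The mode rules above can be sketched as a single function. This is a minimal sketch: `availableChatModes` and the lowercase mode identifiers are illustrative, and 'graph+vector' is taken from the chat modes named in the Deployment section rather than from the application code.

```typescript
type Env = 'PROD' | 'DEV';

// Modes available in every environment (identifiers are illustrative).
const BASE_MODES: string[] = [
  'vector',
  'fulltext',
  'graph+vector',
  'entity search+vector',
  'graph+vector+fulltext',
];

// Adds the Graph mode in development and the global search mode when the
// connected Aura instance has GDS.
function availableChatModes(env: Env, isGds: boolean): string[] {
  const modes = BASE_MODES.slice();
  if (env === 'DEV') {
    modes.push('graph');
  }
  if (isGds) {
    modes.push('global search+vector+fulltext');
  }
  return modes;
}
```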
7. Graph Enhancement Settings¶
Users can define their own schema for nodes and relationships, or use a schema that already exists
- Entity Extraction Settings:
/schema: to fetch the schema that already exists in the DB
/populate_graph_schema: to generate the schema from user-entered document text
- Additional Instructions:
/delete_unconnected_nodes: to remove the lonely entities
/merge_duplicate_nodes: to merge the duplicate entities
1) to merge the duplicate entities
2) to get duplicate entities
/post_processing: to fine-tune the knowledge graph for improved performance and deeper analysis
1) When GDS instance
2) When Aura DB instance
8. Application Options¶
- LLM Model: the user can select the desired LLM model
- Documentation: the user can navigate to the application overview: https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
- GitHub Issues: the user can navigate to the GitHub issues on the developers' list: https://github.com/neo4j-labs/llm-graph-builder/issues
- Dark/Light Mode: the user can view the application in either dark or light mode
1) Dark
2) Light
- Chat Only Mode: the user can also use the chat-only feature by navigating to https://llm-graph-builder.neo4jlabs.com/chat-only to ask questions about documents that have been completely processed. The user must provide login credentials to connect to the database
9. File Table Options¶
User can explore various features available for files in the table, including sorting, filtering, viewing as a graph, examining nodes and relationships, copying file details, and accessing chunks related to the file
File Status
File Nodes
File Relationships
File Actions
- Graph View
- Copy File Data
- Text Chunks
10. Interface Design¶
Designed a user-friendly interface that guides users through the process of connecting to Neo4j Aura, accessing file sources, uploading PDF files, and generating graphs
- Components: @neo4j-ndl/react
- Icons: @neo4j-ndl/react/icons
- Graph Visualization: @neo4j-nvl/react
- NVL: @neo4j-nvl/core
- CSS: Inline styling, tailwind CSS
11. Deployment¶
Followed best practices for optimizing performance and security of the deployed application
- Local Deployment:
  - Run the application through docker-compose
  - By default, only OpenAI and Diffbot are enabled, since Gemini requires extra GCP configuration
  - In your root folder, create a .env file with your OPENAI and DIFFBOT keys (if you want to use both)
  - By default, the input sources are: local files, YouTube, Wikipedia, AWS S3 and web pages
  - By default, all of the chat modes are available: vector, graph+vector and graph. If no mode is listed in the chat modes variable, all modes are available
  - You can then run Docker Compose to build and start all components
Frontend environment variables:
- VITE_LLM_MODELS=""
- VITE_REACT_APP_SOURCES=""
- VITE_GOOGLE_CLIENT_ID="xxxx" [For Google GCS integration]
- VITE_CHAT_MODES=""
- VITE_CHUNK_SIZE=5242880
- VITE_TIME_PER_PAGE=50
- VITE_LARGE_FILE_SIZE=5242880
- VITE_ENV="PROD" or "DEV"
- VITE_BACKEND_API_URL=
- VITE_BLOOM_URL=
- VITE_BACKEND_PROCESSING_URL=
- VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
- VITE_BATCH_SIZE=2
- Cloud Deployment:
  - To deploy the app, install the gcloud CLI and run the following command in the terminal from the frontend root folder: gcloud run deploy
  - Source location: current directory > Frontend
  - Region: 32 [us-central1]
  - Allow unauthenticated requests: Yes
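The rule from the local deployment notes, that an empty chat modes variable enables every mode, can be sketched as a small parser for comma-separated settings such as VITE_CHAT_MODES or VITE_LLM_MODELS_PROD. `parseListSetting` is a hypothetical helper, not a function from the application.

```typescript
// Splits a comma-separated setting into trimmed items. An empty or missing
// value falls back to the full list of defaults.
function parseListSetting(raw: string | undefined, defaults: string[]): string[] {
  const items = (raw ?? '')
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
  return items.length > 0 ? items : defaults.slice();
}
```

In a Vite application the raw value would typically be read from `import.meta.env`, for example `import.meta.env.VITE_CHAT_MODES`.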
12. API Reference¶
1) Connection Modal¶
POST /connect¶
Neo4j database connection on frontend is done with this API
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
2) Backend Database connection¶
POST /backend_connection_configuation¶
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
3) Upload Files from Local¶
POST /upload¶
The upload endpoint is designed to handle the uploading of large files by breaking them into smaller chunks. This method ensures that large files can be uploaded efficiently without overloading the server
API Parameters :
- file = File to be uploaded
- source_type = Source of the file
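The chunking step described for /upload can be sketched as a function that computes byte ranges for a file. This is a minimal sketch: `chunkRanges` and `ChunkRange` are hypothetical names, and the chunk size in the usage note below is the VITE_CHUNK_SIZE value from the Deployment section.

```typescript
type ChunkRange = { index: number; start: number; end: number };

// Splits a file of fileSize bytes into consecutive ranges of at most
// chunkSize bytes. `end` is exclusive, and `index` starts at 1.
function chunkRanges(fileSize: number, chunkSize: number): ChunkRange[] {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive');
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 1; start < fileSize; start += chunkSize, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}
```

With a chunk size of 5242880 bytes, a 12 MiB file yields three ranges. In the browser, each range could be cut from the file with `file.slice(range.start, range.end)` and sent in its own request.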
4) User Defined Schema¶
POST /schema¶
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
5) Graph schema from Input Text¶
POST /populate_graph_schema¶
The API is used to populate a graph schema based on the provided input text, model, and schema description flag
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- input_text = Text to generate schema from
- model = LLM model to use
- is_schema_description_checked = A flag indicating whether the schema description should be considered
6) Unstructured Sources¶
POST /url/scan¶
Creates a Document node for the other sources: S3 bucket, GCS bucket, Wikipedia, YouTube URL and web pages
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- url = URL to scan
- source_type = Source of the file
7) Extraction of Nodes and Relations from Data¶
POST /extract¶
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- source_types = Source of the file
- language = Language in which Wikipedia content will be extracted
8) Get list of sources¶
GET /sources_list¶
List all sources (Document nodes) present in Neo4j graph database
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
9) Post processing after graph generation¶
POST /post_processing¶
This API is called at the end of processing of the whole document to create k-nearest neighbor relations between similar chunks of the document, based on KNN_MIN_SCORE (0.8 by default), and to drop and re-create a full-text index on DB labels
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- tasks = List of tasks to perform
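The similarity threshold used in post processing can be illustrated with a small sketch. The backend performs this step against the database, so the code below is not its implementation: it assumes cosine similarity over chunk embeddings, uses hypothetical names (`Chunk`, `similarPairs`), and applies only the minimum score, without limiting each chunk to its k nearest neighbors.

```typescript
type Chunk = { id: string; embedding: number[] };

// Cosine similarity of two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns every pair of chunks whose similarity reaches minScore,
// e.g. 0.8 for the default KNN_MIN_SCORE.
function similarPairs(chunks: Chunk[], minScore: number): Array<[string, string, number]> {
  const pairs: Array<[string, string, number]> = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      const score = cosineSimilarity(chunks[i].embedding, chunks[j].embedding);
      if (score >= minScore) {
        pairs.push([chunks[i].id, chunks[j].id, score]);
      }
    }
  }
  return pairs;
}
```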
10) Chat with Data¶
POST /chat_bot¶
This API backs a chatbot system that uses multiple AI models and a Neo4j graph database to answer user queries. It interacts with AI models from OpenAI and Google's Vertex AI and uses embedding models to improve the retrieval of relevant information
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- query = User query
- model = LLM model to use
- mode = Chat mode to use
- session_id = Session ID used to maintain the history of chats during the user's connection
11) Get entities from chunks¶
POST /chunk_entities¶
This API is used to get the entities and relations associated with a particular chunk and chunk metadata
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- chunk_id = ID of the chunk to get entities for
12) Clear chat history¶
POST /clear_chat_bot¶
This API is used to clear the chat history which is saved in Neo4j DB
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- session_id = User session id for QA chat
13) View graph for a file¶
POST /graph_query¶
This API is used to view graph for a particular file
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- document_names = File name for which the user wants to view the graph
14) Get neighbour nodes¶
POST /get_neighbours¶
This API is used to retrieve the neighbor nodes of the node with the given element id
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- elementId = Element id of the node whose neighbours are retrieved
15) SSE event to update processing status¶
GET /update_extract_status¶
The API provides a continuous update on the extraction status of a specified file. It uses Server-Sent Events (SSE) to stream updates to the client
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- source_types = Source of the file
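To make the streaming format concrete, here is a minimal sketch of how Server-Sent Events text is split into events. In the browser the built-in EventSource API does this parsing; `parseSseBuffer` is a hypothetical helper, and the JSON payloads in the usage note are made up for illustration.

```typescript
// Splits raw SSE text into complete events. Events end with a blank line,
// payload lines start with "data:", and lines starting with ":" are comments.
// Text after the last blank line is returned as `rest` so that it can be
// prepended to the next network chunk.
function parseSseBuffer(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.split(/\r?\n\r?\n/);
  const rest = blocks.pop() ?? '';
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.indexOf('data:') === 0) {
        const value = line.slice(5);
        // One leading space after the colon is not part of the payload.
        dataLines.push(value.charAt(0) === ' ' ? value.slice(1) : value);
      }
    }
    if (dataLines.length > 0) {
      events.push(dataLines.join('\n'));
    }
  }
  return { events, rest };
}
```

For example, a buffer holding two status events and a keep-alive comment yields two payload strings, each of which could then be passed to `JSON.parse` to update the file's row in the table.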
16) Delete selected documents¶
POST /delete_document_and_entities¶
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- source_types = Source of the file
- deleteEntities = Boolean value indicating whether deletion of entities is requested
17) Cancel processing job¶
POST /cancelled_job¶
This API is responsible for cancelling an in-progress job
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- source_types = Source of the file
18) Deletion of orphan nodes¶
POST /delete_unconnected_nodes¶
The API is used to delete unconnected entities from the database
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- unconnected_entities_list = List of selected unconnected entities to delete
19) Get the list of orphan nodes¶
POST /get_unconnected_nodes_list¶
The API retrieves a list of nodes in the graph database that are not connected to any other nodes
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
20) Get duplicate nodes¶
POST /get_duplicate_nodes¶
The API is used to fetch duplicate entities from database
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
21) Merge duplicate nodes¶
POST /merge_duplicate_nodes¶
The API is used to merge the duplicate entities that the user selected in the database
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- duplicate_nodes_list = List of selected entities to merge with similar entities
22) Drop and create vector index¶
POST /drop_create_vector_index¶
The API is used to drop and re-create the vector index when the vector index dimensions are different
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- isVectorIndexExist = True or False, depending on whether a vector index exists in the database
23) Reprocessing of sources¶
POST /retry_processing¶
API Parameters :
- uri = Neo4j database URI
- username = Neo4j database username
- password = Neo4j database password
- database = Neo4j database name
- source_types = Source of the file
- retry_condition = One of the three reprocessing conditions, as selected by the user
13. Conclusion¶
In conclusion, this technical document outlines the process of building a React application with Neo4j Aura integration for graph database functionalities
14. Referral Links¶
- Dev env : https://dev-frontend-dcavk67s4a-uc.a.run.app/
- Staging env: https://staging-frontend-dcavk67s4a-uc.a.run.app/
- Prod env: https://prod-frontend-dcavk67s4a-uc.a.run.app/