LLM Knowledge Graph Builder Frontend¶

Objective¶

This document is a guide for developers to the React application that integrates with Neo4j Aura to provide graph database functionality. The application lets users connect to a Neo4j Aura instance and automatically create a graph from unstructured text. Users can upload documents from local storage or cloud buckets, add YouTube videos and Wikipedia pages, configure a graph schema, extract the lexical, entity, and knowledge graphs, visualize the extracted graph, ask questions, and see the details that were used to generate the answers.

Architecture Structure¶

For the Knowledge Graph Builder app:
React JS – application logic
Axios – network calls and response handling
Styled Components – CSS-in-JS, where we write all of the CSS ourselves, or Tailwind CSS – third-party CSS classes to speed up development
Long polling: the simplest way to maintain a steady connection between a client and a server. The server holds the request for a period if it has no response to send back. It is used to update clients regularly with new information, such as the processing status and the number of processed chunks, every minute.
SSE (Server-Sent Events): the best option when the server generates data in a loop and sends multiple events to the client, and when real-time traffic from the server to the client is needed.
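
To make the SSE option more concrete, here is a minimal sketch of how a raw Server-Sent Events text stream breaks down into messages. The names `SseMessage` and `parseSseStream` are illustrative and are not taken from the application; in a browser, the built-in `EventSource` normally performs this parsing.

```typescript
// Hypothetical helper for illustration; not part of the application code.
interface SseMessage {
  event: string; // event name, "message" when the server sends none
  data: string; // payload; multiple data lines are joined with "\n"
}

// Parse a complete SSE text stream. Events are separated by a blank line and
// each field line has the form "field: value".
function parseSseStream(raw: string): SseMessage[] {
  const messages: SseMessage[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let eventName = "message";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one optional space
      if (field === "event") eventName = value;
      if (field === "data") dataLines.push(value);
    }
    // Blocks without any data line (for example keep-alive comments) are skipped.
    if (dataLines.length > 0) {
      messages.push({ event: eventName, data: dataLines.join("\n") });
    }
  }
  return messages;
}
```

With long polling, the client would instead issue a new request each time the previous one returns, so every update costs a full request and response cycle.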

Project Structure¶

.
├── API 
├── Assets
├── Components 
│   ├── ChatBot
│   │   ├── Chatbot
│   │   ├── ChatInfoModal
│   │   ├── ChatModesSwitch
│   │   ├── ChatModeToggle
│   │   ├── ChatOnlyComponent
│   │   ├── ChatInfo
│   │   ├── CommonChatActions
│   │   ├── CommunitiesInfo
│   │   ├── EntitiesInfo
│   │   ├── ExpandedChatButtonContainer
│   │   ├── MetricsCheckbox
│   │   ├── MetricsTab
│   │   ├── MultiModeMetrics
│   │   └── SourcesInfo
│   ├── Data Sources
│   │   ├── AWS
│   │   ├── GCS
│   │   ├── Local
│   │   └── Web
│   │       └── WebButton
│   ├── Graph
│   │   ├── CheckboxSelection
│   │   ├── GraphPropertiesPanel
│   │   ├── GraphPropertiesTable
│   │   ├── GraphViewButton
│   │   ├── GraphViewModal
│   │   ├── LegendsChip
│   │   ├── ResizePanel
│   │   └── ResultOverview
│   ├── Layout
│   │   ├── AlertIcon
│   │   ├── DrawerChatbot
│   │   ├── DrawerDropzone
│   │   ├── Header
│   │   ├── PageLayout
│   │   └── SideNav
│   ├── Popups
│   │   ├── ChunkPopUp
│   │   ├── ConnectionModal
│   │   ├── DeletePopup
│   │   ├── GraphEnhancementDialog
│   │   ├── LargeFilePopup
│   │   ├── RetryConfirmation
│   │   └── Settings
│   ├── UI
│   │   ├── Alert
│   │   ├── ButtonWithTooltip
│   │   ├── BreakDownPopOver
│   │   ├── CustomButton
│   │   ├── CustomCheckBox
│   │   ├── CustomMenu
│   │   ├── CustomPopOver
│   │   ├── CustomProgressBar
│   │   ├── DatabaseIcon
│   │   ├── DatabaseStatusIcon
│   │   ├── Dropdown
│   │   ├── ErrorBoundary
│   │   ├── FallBackDialog
│   │   ├── HoverableLink
│   │   ├── IconButtonTooltip
│   │   ├── Legend
│   │   ├── ScienceMolecule
│   │   ├── ShowAll
│   │   └── TipWrapper
│   ├── Websources
│   │   ├── Web
│   │   ├── Wikipedia
│   │   ├── Youtube
│   │   ├── CustomSourceInput
│   │   ├── GenericSourceButton
│   │   └── GenericSourceModal
│   ├── Content
│   ├── FileTable
│   └── QuickStarter
├── HOC
│   ├── CustomModal
│   └── withVisibility
├── Assets
│   ├── images
│   │   └── Application Images
│   ├── chatbotMessages.json
│   └── schema.json
├── Context
│   ├── Alert
│   ├── ThemeWrapper
│   ├── UserCredentials
│   ├── UserMessages
│   └── UserFiles
├── Hooks
│   ├── useSourceInput
│   ├── useSpeech
│   └── useSSE
├── Services
├── Styling
│   └── info
├── Utils
│   ├── constants
│   ├── FileAPI
│   ├── Loader
│   ├── Queue
│   ├── toats
│   └── utils
├── App
├── index
├── main
├── router
├── types
└── README.md

Application Features¶

1. Setup and Installation¶

Install Node.js v21.1.0 and npm on the development machine
Install the necessary dependencies, such as axios for making HTTP requests and the other libraries used to interact with the graph, by running yarn install

2. Connect to the Neo4j Aura instance¶

The connection modal collects the protocol, URI, database name, username, and password. Its submit button calls the API /connect with the parameters uri, password, username, and database to establish a connection to the Neo4j Aura instance. Authentication failures and other errors are handled by displaying relevant messages. To check whether the backend connection is up and working, the frontend calls the API /health. Users can connect to both Aura DS and Aura DB instances.

If a GDS connection is available, the icon is a science molecule, and under Graph Enhancement modal > Post processing jobs the user can check and uncheck the communities checkbox
If the instance is Aura DB, the icon is a database icon, and under Graph Enhancement modal > Post processing jobs the communities checkbox is disabled
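
The two rules above amount to a single decision on whether GDS is available. A minimal sketch, assuming a boolean flag supplied by the connection response (the names `ConnectionUiState` and `connectionUiState` are hypothetical):

```typescript
// Illustrative names only; the real components may model this differently.
type ConnectionIcon = "scienceMolecule" | "database";

interface ConnectionUiState {
  icon: ConnectionIcon;
  communitiesCheckboxEnabled: boolean;
}

// Derive the header icon and the state of the communities checkbox
// from whether the connected instance offers GDS.
function connectionUiState(isGdsAvailable: boolean): ConnectionUiState {
  return isGdsAvailable
    ? { icon: "scienceMolecule", communitiesCheckboxEnabled: true }
    : { icon: "database", communitiesCheckboxEnabled: false };
}
```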

Aura DS Connection

Aura DB connection

ReadOnly User

User not connected

3. File Source Integration¶

The application integrates several file sources: drag-and-drop of local files, web sources such as YouTube videos and Wikipedia links, Amazon S3 file access, and Google Cloud Storage (GCS) file access. Users can upload PDF files from local storage or directly from the integrated sources.

The APIs are as follows:

/sources_list: to fetch the list of files in the DB

/upload: to upload files from Local

/url/scan: to scan links from YouTube, Wikipedia, and web sources

/url/scan: also used to scan the files of S3 and GCS
Add the respective Bucket URL, access key and secret key to access S3 files

Add the respective Project ID, Bucket name, and folder to access GCS files

The user is redirected to the authentication page to authenticate their Google account

4. File Source Extraction¶

/extract: to fetch the number of nodes and relationships created
During extraction, the selected files, or all files in the 'New' state, move to the 'Processing' state and then to the 'Completed' state if there are no failures
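
The status flow can be sketched as a small transition table. This is an illustration rather than the application's implementation; 'Failed' and 'Cancelled' are the other end states named elsewhere in this document, and the names `extractionTransitions` and `canTransition` are hypothetical.

```typescript
type FileStatus = "New" | "Processing" | "Completed" | "Failed" | "Cancelled";

// Allowed status changes during extraction (assumed from the description above).
const extractionTransitions: Record<FileStatus, FileStatus[]> = {
  New: ["Processing"],
  Processing: ["Completed", "Failed", "Cancelled"],
  Completed: [],
  Failed: [],
  Cancelled: [],
};

// Returns true when a file may move from one status to the other.
function canTransition(from: FileStatus, to: FileStatus): boolean {
  return extractionTransitions[from].includes(to);
}
```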

A file with status Completed can be reprocessed with the following options:

A file with status Failed/Cancelled can be reprocessed with the following options:

5. Graph Generation¶

/graph_query:
A component generates graphs from the files in the table, showing the extracted nodes and relationships
When the user clicks Preview Graph or the Table View icon, the graph modal offers three viewing options: Lexical Graph, Entity Graph, and Knowledge Graph
Neo4j's graph visualization library renders the nodes and relationships returned by the API /graph_query
The graph visualization can be customized with view controls [zoom in, zoom out, fit, refresh], node styling, and relationship types

Preview Graph

File Graph

Graph Types

Document & Chunk

Entities

Communities
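
The graph types listed above can be thought of as a filter over node labels. The sketch below assumes that document and chunk nodes carry the labels `Document` and `Chunk`, that community nodes carry `__Community__`, and that everything else is an entity; these label names and the helper names are assumptions, not confirmed by this document.

```typescript
// Assumed label names; the labels actually used by the backend may differ.
type GraphType = "DocumentChunk" | "Entities" | "Communities";

interface GraphNode {
  id: string;
  labels: string[];
}

// Classify a node into one of the three graph types.
function graphTypeOf(node: GraphNode): GraphType {
  if (node.labels.includes("Document") || node.labels.includes("Chunk")) return "DocumentChunk";
  if (node.labels.includes("__Community__")) return "Communities";
  return "Entities";
}

// Keep only the nodes whose graph type is currently selected in the UI.
function filterNodesByGraphType(nodes: GraphNode[], selected: GraphType[]): GraphNode[] {
  return nodes.filter((node) => selected.includes(graphTypeOf(node)));
}
```

Relationships would then be kept only when both of their end nodes survive the filter.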

/get_neighbours: This API is used to retrieve the neighbor nodes of the given element id of the node

6. Chatbot¶

The Chatbot component keeps state variables for the user input and the chat messages. When the user asks a question and clicks the Ask button, the API /chat_bot is called to send the user input to the backend and receive the response. The chat also lets users see more details about an answer, listen to it with text to speech, and copy the response.

Chat Drawer View

Chat Modal View

/clear_chat_bot: to clear the chat history which is saved in Neo4j DB

/chunk_entities: to fetch the number of sources, entities and chunks

Sources

Entities

Chunks

/metric: evaluates chatbot responses on metrics such as faithfulness and answer relevancy. It uses the RAGAS library to calculate these metrics

/additional_metrics: evaluates chatbot responses on metrics such as context entity recall, semantic score, and ROUGE score. It requires additional ground truth to be supplied by the user. It uses the RAGAS library to calculate these metrics

Chat Modes

There are five modes, Vector, Fulltext, Graph+Vector, Entity search+Vector, and Graph+Vector+Fulltext, that can be used to retrieve answers in the Production environment
One more mode, Graph, is available in the Development environment
One more mode, Global search+Vector+Fulltext, is available if the Aura instance is GDS-enabled
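
The availability rules above can be sketched as a function of the environment and the instance type. The function name, the mode identifier strings, and the `AppEnv` type are assumptions for illustration; the base modes are passed in rather than hard-coded.

```typescript
type AppEnv = "PROD" | "DEV";

// Returns the chat modes to offer, given the modes common to every environment.
// "graph" is added in the Development environment and
// "global search+vector+fulltext" when the Aura instance has GDS.
function availableChatModes(baseModes: string[], env: AppEnv, isGdsInstance: boolean): string[] {
  const modes = [...baseModes];
  if (env === "DEV") modes.push("graph");
  if (isGdsInstance) modes.push("global search+vector+fulltext");
  return modes;
}
```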

1) In Production Environment

2) In Development Environment

7. Graph Enhancement Settings¶

Users can define their own schema for nodes and relationships or use an existing schema

Entity Extraction Settings:

/schema: to fetch the schema that already exists in the DB

/populate_graph_schema: to generate the schema from user-entered document text

Additional Instructions:

/delete_unconnected_nodes: to remove the lonely entities

/merge_duplicate_nodes: to merge the duplicate entities

1) to merge the duplicate entities

2) to get duplicate entities

/post_processing: to fine-tune the knowledge graph for improved performance and deeper analysis

1) When GDS instance

2) When Aura DB instance

8. Application Options¶

LLM Model

User can select desired LLM models

Documentation: The user can navigate to the application overview: https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/

GitHub Issues: The user can navigate to the GitHub issues that are on the developers' bucket list: https://github.com/neo4j-labs/llm-graph-builder/issues

Dark/Light Mode: The user can view the application in either dark or light mode

1) Dark

2) Light

Chat Only Mode

The user can also use the chat-only feature by navigating to https://llm-graph-builder.neo4jlabs.com/chat-only to ask questions about documents that have been completely processed. The user must provide the login credentials to connect to the database

9. File Table Options¶

User can explore various features available for files in the table, including sorting, filtering, viewing as a graph, examining nodes and relationships, copying file details, and accessing chunks related to the file

File Status

File Nodes

File Relationships

File Actions

Graph View

Copy File Data

Text Chunks

10. Interface Design¶

Designed a user-friendly interface that guides users through the process of connecting to Neo4j Aura, accessing file sources, uploading PDF files, and generating graphs

Components: @neo4j-ndl/react
Icons: @neo4j-ndl/react/icons
Graph Visualization: @neo4j-nvl/react
NVL: @neo4j-nvl/core
CSS: Inline styling, tailwind CSS

11. Deployment¶

Followed best practices for optimizing performance and security of the deployed application

Local Deployment:
Running through docker-compose
By default, only OpenAI and Diffbot are enabled, since Gemini requires extra GCP configuration
In your root folder, create a .env file with your OPENAI and DIFFBOT keys (if you want to use both)
By default, the input sources are local files, YouTube, Wikipedia, AWS S3, and web pages
By default, all of the chat modes are available: vector, graph+vector, and graph. If no mode is listed in the chat modes variable, all modes are available
You can then run Docker Compose to build and start all components

VITE_LLM_MODELS=""
VITE_REACT_APP_SOURCES=""
VITE_GOOGLE_CLIENT_ID="xxxx" [For Google GCS integration]
VITE_CHAT_MODES=""
VITE_CHUNK_SIZE=5242880
VITE_TIME_PER_PAGE=50
VITE_LARGE_FILE_SIZE=5242880
VITE_ENV="PROD" [or "DEV"]
VITE_BACKEND_API_URL=
VITE_BLOOM_URL=
VITE_BACKEND_PROCESSING_URL=
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
VITE_BATCH_SIZE=2
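
List-valued settings such as VITE_CHAT_MODES and VITE_REACT_APP_SOURCES fall back to their defaults when left empty. A minimal sketch of that rule, assuming the values are comma-separated as in VITE_LLM_MODELS_PROD (the function name `parseListSetting` is hypothetical):

```typescript
// Parse a comma-separated setting. An empty or missing value means
// "use every default", matching the behaviour described for the chat modes.
function parseListSetting(rawValue: string | undefined, defaults: string[]): string[] {
  const items = (rawValue ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
  return items.length > 0 ? items : [...defaults];
}
```

In a Vite application the raw value would typically come from `import.meta.env`, for example `import.meta.env.VITE_CHAT_MODES`.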

Cloud Deployment:
To deploy the app, install the gcloud CLI and run the following command in the terminal from the frontend root folder
gcloud run deploy
Source location: current directory > Frontend
Region: 32 [us-central1]
Allow unauthenticated requests: Yes

12. API Reference¶

1) Connection Modal¶

POST /connect¶

The frontend connects to the Neo4j database through this API

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
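
As a sketch of how these four parameters might be packaged for the request, the helper below builds a form-encoded body. The encoding, the `ConnectionDetails` type, and the function name are assumptions; the application may send multipart form data instead.

```typescript
// Hypothetical shape for the values collected by the connection modal.
interface ConnectionDetails {
  uri: string;
  username: string;
  password: string;
  database: string;
}

// Build a form-encoded body carrying the four /connect parameters.
function buildConnectBody(details: ConnectionDetails): URLSearchParams {
  const body = new URLSearchParams();
  body.set("uri", details.uri);
  body.set("username", details.username);
  body.set("password", details.password);
  body.set("database", details.database);
  return body;
}
```

The result could then be passed as the `body` of a POST request to /connect, for example with `fetch` or axios.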

2) Backend Database connection¶

POST /backend_connection_configuation¶

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name

3) Upload Files from Local¶

POST /upload¶

The upload endpoint is designed to handle the uploading of large files by breaking them into smaller chunks. This method ensures that large files can be uploaded efficiently without overloading the server

API Parameters :

file= File to be uploaded
source_type= Source of the file
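
The chunking idea can be illustrated with a helper that computes the byte ranges for a file of a given size. This is a sketch under the assumption that chunks are fixed-size slices, using the VITE_CHUNK_SIZE value of 5242880 bytes from the deployment settings; the names `ChunkRange` and `chunkRanges` are hypothetical.

```typescript
interface ChunkRange {
  index: number; // 1-based chunk number
  start: number; // first byte, inclusive
  end: number; // last byte, exclusive
}

// Split a file of fileSize bytes into consecutive ranges of at most chunkSize bytes.
function chunkRanges(fileSize: number, chunkSize: number): ChunkRange[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 1; start < fileSize; start += chunkSize, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}
```

In the browser, each range could be cut from the file with `file.slice(start, end)` and sent as its own /upload request.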

4) User Defined Schema¶

POST /schema¶

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name

5) Graph schema from Input Text¶

POST /populate_graph_schema¶

The API is used to populate a graph schema based on the provided input text, model, and schema description flag

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
input_text= Text to generate schema from
model= LLM model to use
is_schema_description_checked=A flag indicating whether the schema description should be considered.

6) Unstructured Sources¶

POST /url/scan¶

Creates a Document node for the other sources: S3 bucket, GCS bucket, Wikipedia, YouTube URL, and web pages

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
url= URL to scan
source_type= Source of the file

7) Extraction of Nodes and Relations from Data¶

POST /extract¶

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
source_types= Source of the file
language=Language in which wikipedia content will be extracted

8) Get list of sources¶

GET /sources_list¶

List all sources (Document nodes) present in Neo4j graph database

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name

9) Post processing after graph generation¶

POST /post_processing¶

This API is called after the whole document has been processed. It creates k-nearest neighbor relationships between similar chunks of the document, based on KNN_MIN_SCORE (0.8 by default), and drops and re-creates a full-text index on the DB labels

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
tasks= List of tasks to perform
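
To illustrate the role of KNN_MIN_SCORE, the sketch below links chunk pairs whose embedding similarity reaches the threshold. It assumes cosine similarity over chunk embeddings and shows only the thresholding step; the actual job runs in the backend against the database, and all names here are hypothetical.

```typescript
interface ChunkEmbedding {
  id: string;
  vector: number[];
}

interface SimilarPair {
  from: string;
  to: string;
  score: number;
}

// Cosine similarity of two vectors; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Return every pair of chunks whose similarity is at least minScore.
function similarChunkPairs(chunks: ChunkEmbedding[], minScore = 0.8): SimilarPair[] {
  const pairs: SimilarPair[] = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      const score = cosineSimilarity(chunks[i].vector, chunks[j].vector);
      if (score >= minScore) pairs.push({ from: chunks[i].id, to: chunks[j].id, score });
    }
  }
  return pairs;
}
```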

10) Chat with Data¶

POST /chat_bot¶

This API powers a chatbot that uses multiple AI models and a Neo4j graph database to answer user queries. It interacts with AI models from OpenAI and Google's Vertex AI and uses embedding models to improve the retrieval of relevant information

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
query= User query
model= LLM model to use
mode= Chat mode to use
session_id= Session ID used to maintain the history of chats during the user's connection

11) Get entities from chunks¶

POST /chunk_entities¶

This API is used to get the entities and relations associated with a particular chunk and chunk metadata

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
chunk_id= ID of the chunk to get entities for

12) Clear chat history¶

POST /clear_chat_bot¶

This API is used to clear the chat history which is saved in Neo4j DB

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
session_id = User session id for QA chat

13) View graph for a file¶

POST /graph_query¶

This API is used to view graph for a particular file

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
document_names = File name for which user wants to view graph

14) Get neighbour nodes¶

POST /get_neighbours¶

This API is used to retrieve the neighbor nodes of the node with the given element id

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
elementId = Element id of the node whose neighbours are retrieved

15) SSE event to update processing status¶

GET /update_extract_status¶

The API provides a continuous update on the extraction status of a specified file. It uses Server-Sent Events (SSE) to stream updates to the client

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
source_types= Source of the file

16) Delete selected documents¶

POST /delete_document_and_entities¶

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
source_types= Source of the file
deleteEntities= Boolean value to check entities deletion is requested or not

17) Cancel processing job¶

POST /cancelled_job¶

This API is responsible for cancelling an in-process job

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
source_types= Source of the file

18) Deletion of orphan nodes¶

POST /delete_unconnected_nodes¶

The API is used to delete unconnected entities from the database

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
unconnected_entities_list= List of the selected unconnected entities to delete

19) Get the list of orphan nodes¶

POST /get_unconnected_nodes_list¶

The API retrieves a list of nodes in the graph database that are not connected to any other nodes

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name

20) Get duplicate nodes¶

POST /get_duplicate_nodes¶

The API is used to fetch duplicate entities from database

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name

21) Merge duplicate nodes¶

POST /merge_duplicate_nodes¶

The API is used to merge the duplicate entities selected by the user

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
duplicate_nodes_list= List of the selected entities to merge with their similar entities

22) Drop and create vector index¶

POST /drop_create_vector_index¶

The API is used to drop and re-create the vector index when the vector index dimensions are different

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
isVectorIndexExist= True or False based on whether vector index exist in database

23) Reprocessing of sources¶

POST /retry_processing¶

API Parameters :

uri= Neo4j database URI
username= Neo4j database username
password= Neo4j database password
database= Neo4j database name
source_types= Source of the file
retry_condition = One of the three reprocessing conditions, as selected by the user

13. Conclusion¶

In conclusion, this technical document outlines the process of building a React application with Neo4j Aura integration for graph database functionalities

14. Referral Links¶

Dev env : https://dev-frontend-dcavk67s4a-uc.a.run.app/
Staging env: https://staging-frontend-dcavk67s4a-uc.a.run.app/
Prod env: https://prod-frontend-dcavk67s4a-uc.a.run.app/
