Skip to content

LLM Knowledge Graph Builder Frontend

Objective

This document provides a comprehensive guide for developers on how we build a React application integrated with Neo4j Aura for graph database functionalities. The application allows users to connect to a Neo4j Aura instance and we show you how to automatically create a graph from the unstructured text. We allow users to upload documents locally and from cloud buckets, YouTube videos, and Wikipedia pages, configure a graph schema, extract the lexical, entity and knowledge graph, visualize the extracted graph, ask questions and see the details that were used to generate the answers.

Architecture Structure

  • For Knowledge Graph builder App:
  • React JS – Application logic
  • Axios – for network calls and handling responses
  • Styled Components – To handle CSS in JS – Where we write all CSS ourselves, Or Tailwind CSS – 3rd party CSS classes to speed up development
  • LongPooling: Long polling can be conceptualized as the simplest way to maintain a steady connection between a client and a server. It holds the request for a period if it has no response to send it back. It regularly updates clients with new information like updating a status, processed chunks every minute with new data.
  • SSEs are the best options when the server generates the data in a loop and sends multiple events to the clients and if we need real-time traffic from the server to the client.

Project Structure

.
├── API 
├── Assets
├── Components 
│   ├── ChatBot
│   │   ├── Chatbot
│   │   ├── ChatInfoModal
│   │   ├── ChatModesSwitch
│   │   ├── ChatModeToggle
│   │   ├── ChatOnlyComponent
│   │   ├── ChatInfo
│   │   ├── CommonChatActions
│   │   ├── CommunitiesInfo
│   │   ├── EntitiesInfo
│   │   ├── ExpandedChatButtonContainer
│   │   ├── MetricsCheckbox
│   │   ├── MetricsTab
│   │   ├── MultiModeMetrics
│   │   └── SourcesInfo
│   ├── Data Sources
│   │   ├── AWS
│   │   ├── GCS
│   │   ├── Local
│   │   └── Web
│   │       └── WebButton
│   ├── Graph
│   │   ├── CheckboxSelection
│   │   ├── GraphPropertiesPanel
│   │   ├── GraphPropertiesTable
│   │   ├── GraphViewButton
│   │   ├── GraphViewModal
│   │   ├── LegendsChip
│   │   ├── ResizePanel
│   │   └── ResultOverview
│   ├── Layout
│   │   ├── AlertIcon
│   │   ├── DrawerChatbot
│   │   ├── DrawerDropzone
│   │   ├── Header
│   │   ├── PageLayout
│   │   └── SideNav
│   ├── Popups
│   │   ├── ChunkPopUp
│   │   ├── ConnectionModal
│   │   ├── DeletePopup
│   │   ├── GraphEnhancementDialog
│   │   ├── LargeFilePopup
│   │   ├── RetryConfirmation
│   │   └── Settings
│   ├── UI
│   │   ├── Alert
│   │   ├── ButtonWithTooltip
│   │   ├── BreakDownPopOver
│   │   ├── CustomButton
│   │   ├── CustomCheckBox
│   │   ├── CustomMenu
│   │   ├── CustomPopOver
│   │   ├── CustomProgressBar
│   │   ├── DatabaseIcon
│   │   ├── DatabaseStatusIcon
│   │   ├── Dropdown
│   │   ├── ErrorBoundary
│   │   ├── FallBackDialog
│   │   ├── HoverableLink
│   │   ├── IconButtonTooltip
│   │   ├── Legend
│   │   ├── ScienceMolecule
│   │   ├── ShowAll
│   │   └── TipWrapper
│   ├── Websources
│   │   ├── Web
│   │   ├── Wikipedia
│   │   ├── Youtube
│   │   ├── CustomSourceInput
│   │   ├── GenericSourceButton
│   │   └── GenericSourceModal
│   ├── Content
│   ├── FileTable
│   └── QuickStarter
├── HOC
│   ├── CustomModal
│   └── withVisibility
├── Assets
│   ├── images
│   │   └── Application Images
│   ├── chatbotMessages.json
│   └── schema.json
├── Context
│   ├── Alert
│   ├── ThemeWrapper
│   ├── UserCredentials
│   ├── UserMessages
│   └── UserFiles
├── Hooks
│   ├── useSourceInput
│   ├── useSpeech
│   └── useSSE
├── Services
├── Styling
│   └── info
├── Utils
│   ├── constants
│   ├── FileAPI
│   ├── Loader
│   ├── Queue
│   ├── toats
│   └── utils
├── App
├── index
├── main
├── router
├── types
└── README.md

Application Features

1. Setup and Installation

  • Added Node.js with version v21.1.0 and npm on the development machine
  • Install necessary dependencies by running yarn install, such as axios for making HTTP requests and others to interact with the graph

2. Connect to the Neo4j Aura instance

Created a connection modal by adding details including protocol, URI, database name, username, and password. Added a submit button that triggers an API: /connect and accepts params like uri, password, username and database to establish a connection to the Neo4j Aura instance. Handled the authentication and error scenarios appropriately, by displaying relevant messages. To check whether the backend connection is up and working we hit the API: /health. The user can now access both AURA DS and AURA DB instances.

  • If GDS Connection is there icon is scientific molecule > Graph enhancement model > Post processing jobs > gives user the leverage to check and uncheck the communities checkbox
  • If AURA DB > icon is database icon > Graph enhancement model > Post processing jobs > communities checkbox is disabled

No Connection

Aura DS Connection

Connection

Aura DB connection

Connection

ReadOnly User

ReadOnly User

User not connected

User not Connection

3. File Source Integration

Implemented various file source integrations including drag-and-drop, web sources search that includes YouTube video, Wikipedia link, Amazon S3 file access, and Google Cloud Storage (GCS) file access. This allows users to upload PDF files from local storage or directly from the integrated sources.

The APIs are as follows:

  • /source_list: to fetch the list of files in the DB

Connected

  • /upload: to upload files from Local

Local File

  • /url/scan: to scan the link or sources of YouTube, Wikipedia, and Web Sources

Web Sources

  • /url/scan: to scan the files of S3 and GCS

  • Add the respective Bucket URL, access key and secret key to access S3 files

S3 scan

  1. Add the respective Project ID, Bucket name, and folder to access GCS files

GCS scan

  1. User gets a redirect to the authentication page to authenticate their google account

auth login scan

4. File Source Extraction

  • /extract to fetch the number of nodes and relationships created
  • During Extraction the selected files or all files in 'New' state go into 'Processing' state and then 'Completed' state if there are no failures

Generate Graph

  1. A file with status Completed has an option to be Reprocess with the following options:

Completed Ready To Reprocess

  1. A file with status Failed/Cancelled has an option to be Reprocess with the following options:

Failed Ready To Reprocess

5. Graph Generation

  • /graph_query:
  • Created a component for generating graphs based on the files in the table, to extract nodes and relationships
  • When the user clicks on the Preview Graph or on the Table View icon the user can see that the graph model holds three options for viewing: Lexical Graph, Entity Graph and Knowledge Graph
  • We utilized Neo4j's graph library to visualize the extracted nodes and relationships in the form of a graph query API: /graph_query
  • There are options for customizing the graph visualization such as layout algorithms [zoom in, zoom out, fit, refresh], node styling, relationship types

Preview Graph

All Files Graph

File Graph

Single File Graph

Graph Types

  1. Document & Chunk

Knowledge Graph

  1. Entities

Entity Graph

  1. Communities

Community Graph

  • /get_neighbours: This API is used to retrieve the neighbor nodes of the given element id of the node

Neighbourhood Graph

6. Chatbot

Created a Chatbot Component which has state variables to manage user input and chat messages. Once the user asks the question and clicks on the Ask button API: /chatbot is triggered to send user input to the backend and receive the response. The chat also has options for users to see more details about the chat, text to speech and copy the response.

Chat Drawer View

ChatBot Side View

Chat Modal View

ChatBot Modal View

  • /clear_chat_bot: to clear the chat history which is saved in Neo4j DB

Clear Chat History

  • /chunk_entities: to fetch the number of sources, entities and chunks

Sources

Sources

Entities

Entities Info

Chunks

Chunks Info

  • /metric: The API responsible for a evaluating chatbot responses on the basis of different metrics such as faithfulness and answer relevancy. This utilises RAGAS library to calculate these metrics

Metric Eval

  • /additional_metrics: The API responsible for a evaluating chatbot responses on the basis of different metrics such as context entity recall, semantic score, rouge score. This reuqire additional ground truth to be supplied by user. This utilises RAGAS library to calculate these metrics

Additional Metric Eval

Chat Modes

  • There are five modes Vector, Fulltext, Graph+Vector+Fulltext, Entity search+Vector, Graph+Vector+Fulltext that can be provided to the chat to retrieve the answers in Production environment
  • There is one more mode Graph that can be provided to the chat to retrieve the answers in Development environment
  • There is one more mode Global search+Vector+Fulltext that can be provided to the chat to retrieve the answers if aura instance is GDS

1) In Production Environment

Chat Modes Prod

2) In Development Environment

Chat Modes Dev

7. Graph Enhancement Settings

Users can now set their own Schema for nodes and relations or can already be an existing schema

  • Entity Extraction Settings:

Graph Enhancements

  • /schema: to fetch the existing schema that already exists in the db

Predefined Schema

  • /populate_graph_schema: to fetch the schema from user entered document text

User Defined Schema

  • Additional Instructions:

Additional Instructions

  • /delete_unconnected_nodes: to remove the lonely entities

Delete Orphan Nodes

  • /merge_duplicate_nodes: to merge the duplicate entities

1) to merge the duplicate entities

Merge Duplicate Entities

2) to get duplicate entities

Get Duplicate Nodes

  • /post_processing: to fine-tune the knowledge graph for improved performance and deeper analysis

1) When GDS instance

Post Processing DB

2) When Aura DB instance

Post Processing DB

8. Application Options

  • LLM Model

User can select desired LLM models

Dropdown

  • Documentation: User can navigate to the application overview : https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/

LLM Graph Builder Documentation

  • GitHub Issues: User can navigate to the gitHub issues which are in developers bucket list : https://github.com/neo4j-labs/llm-graph-builder/issues

GitHub Issues

  • Dark/Light Mode: User can choose the application view : both in dark and light mode

1) Dark

Dark Mode

2) Light

Light Mode

  • Chat Only Mode

User can also use the chat only feature by navigating to the url at: https://llm-graph-builder.neo4jlabs.com/chat-only to ask questions related to documents which have been completely processed. User is required to pass the login credentials to connect to the database

9. File Table Options

User can explore various features available for files in the table, including sorting, filtering, viewing as a graph, examining nodes and relationships, copying file details, and accessing chunks related to the file

File Status

File Status

File Nodes

File Nodes

File Relationships

File Relationships

File Actions

** Graph View

Graph Actions

** Copy File Data

Copy File Data

** Text Chunks

Text Chunks

10. Interface Design

Designed a user-friendly interface that guides users through the process of connecting to Neo4j Aura, accessing file sources, uploading PDF files, and generating graphs

  • Components: @neo4j-ndl/react
  • Icons: @neo4j-ndl/react/icons
  • Graph Visualization: @neo4j-nvl/react
  • NVL: @neo4j-nvl/core
  • CSS: Inline styling, tailwind CSS

11. Deployment

Followed best practices for optimizing performance and security of the deployed application

  • Local Deployment: ** Running through docker-compose ** By default only OpenAI and Diffbot are enabled since Gemini requires extra GCP configurations ** In your root folder, create a .env file with your OPENAI and DIFFBOT keys (if you want to use both),
    ** By default, the input sources will be: Local files, Youtube, Wikipedia ,AWS S3 and Webpages. As this default config is applied: ** By default,all of the chat modes will be available: vector, graph+vector and graph. If none of the mode is mentioned in the chat modes variable all modes will be available: ** You can then run Docker Compose to build and start all components:

[source,indent=0]

  • VITE_LLM_MODELS=""
  • VITE_REACT_APP_SOURCES=""
  • VITE_GOOGLE_CLIENT_ID="xxxx" [For Google GCS integration]
  • VITE_CHAT_MODES=""
  • VITE_CHUNK_SIZE=5242880
  • VITE_TIME_PER_PAGE=50
  • VITE_LARGE_FILE_SIZE=5242880
  • VITE_ENV="PROD"/ 'DEV'
  • VITE_BACKEND_API_URL=
  • VITE_BLOOM_URL=
  • VITE_BACKEND_PROCESSING_URL=
  • VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
  • VITE_BATCH_SIZE=2

  • Cloud Deployment: ** To deploy the app install the gcloud cli , run the following command in the terminal specifically from frontend root folder. *** gcloud run deploy *** source location current directory > Frontend *** region : 32 [us-central 1] *** Allow unauthenticated request : Yes

12. API Reference

1) Connection Modal


POST /connect

Neo4j database connection on frontend is done with this API

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name

2) Backend Database connection


POST /backend_connection_configuation

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name

3) Upload Files from Local


POST /upload

The upload endpoint is designed to handle the uploading of large files by breaking them into smaller chunks. This method ensures that large files can be uploaded efficiently without overloading the server

API Parameters :

  • file= File to be uploaded
  • source_type= Source of the file

4) User Defined Schema


POST /schema

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name

5) Graph schema from Input Text


POST /populate_graph_schema

The API is used to populate a graph schema based on the provided input text, model, and schema description flag

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • input_text= Text to generate schema from
  • model= LLM model to use
  • is_schema_description_checked=A flag indicating whether the schema description should be considered.

6) Unstructured Sources


POST /url/scan

Create Document node for other sources - s3 bucket, gcs bucket, wikipedia, youtube url and web pages

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • url= URL to scan
  • source_type= Source of the file

7) Extration of Nodes and Relations from Data


POST /extract

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • source_types= Source of the file
  • language=Language in which wikipedia content will be extracted

8) Get list of sources


GET /sources_list

List all sources (Document nodes) present in Neo4j graph database

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name

9) Post processing after graph generation


POST /post_processing :

This API is called at the end of processing of whole document to get create k-nearest neighbor relations between similar chunks of document based on KNN_MIN_SCORE which is 0.8 by default and to drop and create a full text index on db labels

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • tasks= List of tasks to perform

10) Chat with Data


POST /chat_bot

The API responsible for a chatbot system designed to leverage multiple AI models and a Neo4j graph database, providing answers to user queries. It interacts with AI models from OpenAI and Google's Vertex AI and utilizes embedding models to enhance the retrieval of relevant information

Components :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • query= User query
  • model= LLM model to use
  • mode= Chat mode to use
  • session_id= Session ID used to maintain the history of chats during the user's connection

11) Get entities from chunks


POST/chunk_entities

This API is used to get the entities and relations associated with a particular chunk and chunk metadata

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • chunk_id= ID of the chunk to get entities for

12) Clear chat history


POST /clear_chat_bot

This API is used to clear the chat history which is saved in Neo4j DB

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • session_id = User session id for QA chat

13) View graph for a file


POST /graph_query

This API is used to view graph for a particular file

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • document_names = File name for which user wants to view graph

14) Get neighbour nodes


POST /get_neighbours

This API is used to retrive the neighbor nodes of the given element id of the node

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • elementId = Element id of the node to retrive its neighbours

15) SSE event to update processing status


GET /update_extract_status

The API provides a continuous update on the extraction status of a specified file. It uses Server-Sent Events (SSE) to stream updates to the client

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • source_types= Source of the file

16) Delete selected documents


POST /delete_document_and_entities

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • source_types= Source of the file
  • deleteEntities= Boolean value to check entities deletion is requested or not

17) Cancel processing job


POST/cancelled_job

This API is responsible for cancelling an in process job

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • source_types= Source of the file

18) Deletion of orpahn nodes


POST /delete_unconnected_nodes

The API is used to delete unconnected entities from database

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • unconnected_entities_list=selected entities list to delete of unconnected entities

19) Get the list of orphan nodes


POST /get_unconnected_nodes_list

The API retrieves a list of nodes in the graph database that are not connected to any other nodes

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name

20) Get duplicate nodes


POST /get_duplicate_nodes

The API is used to fetch duplicate entities from database

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name

21) Merge duplicate nodes


POST /merge_duplicate_nodes

The API is used to merge duplicate entities from database selected by user

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • duplicate_nodes_list= selected entities list to merge of with similar entities

22) Drop and create vector index


POST /drop_create_vector_index

The API is used to drop and create the vector index when vector index dimesion are different

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • isVectorIndexExist= True or False based on whether vector index exist in database

23) Reprocessing of sources


POST /retry_processing

API Parameters :

  • uri= Neo4j database URI
  • username= Neo4j database username
  • password= Neo4j database password
  • database= Neo4j database name
  • source_types= Source of the file
  • retry_condition = One of the above 3 conditions which is selected for reprocessing.

13. Conclusion

In conclusion, this technical document outlines the process of building a React application with Neo4j Aura integration for graph database functionalities

  • Dev env : https://dev-frontend-dcavk67s4a-uc.a.run.app/
  • Staging env: https://staging-frontend-dcavk67s4a-uc.a.run.app/
  • Prod env: https://prod-frontend-dcavk67s4a-uc.a.run.app/