Paco Nathan

How to create knowledge graphs from structured and unstructured data based on entity resolution, to enhance downstream AI applications

A Talk by Paco Nathan (Principal DevRel Engineer, Senzing)

About this Talk

Graph RAG has recently become a buzzword in the tech industry, given the popularity of using knowledge graphs to "ground" LLMs with domain-specific facts. This approach improves the overall quality of responses in AI applications by reducing "hallucination" in the results. It also allows for faster data updates and helps reduce the need (and costs) for fine-tuning LLMs.

While most Graph RAG examples use LLMs to generate graph elements automatically, we'll step back and examine the workflows needed for generating knowledge graphs from structured and unstructured data sources. In this masterclass, we will employ state-of-the-art open models and open source libraries at each step, and explore how to combine entity resolution and entity linking to produce graphs that emphasize data quality, curation, and feedback from domain experts. These practices also make affordances for audits, evidence-based decision making, and other requirements of mission-critical enterprise applications in highly regulated environments.

Overall, we will discuss a generalized architecture for how to build and update knowledge graphs using a blend of structured and unstructured data sources, and consider the impact of entity resolution on downstream AI apps.

Key Topics

  • Distinguish among the key terms: entity resolution, named entity recognition, relation extraction, and entity linking.
  • Leverage entity resolution to provide a semantic overlay on records from structured data, i.e., generating graph elements while preserving evidence.
  • Use a textgraph algorithm to construct a lexical graph alongside the text chunking and embedding needed for retrieval augmented generation (RAG).
  • Apply graph construction practices that are consistent with the needs of trustworthy AI applications, audits, evidence-based decision making, and so on.
  • Case studies for production use cases which leverage these practices.
  • Why not simply use an LLM to do all of the work?
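
To make the "textgraph" topic above more concrete, here is a minimal sketch, assuming a TextRank-style approach: tokens that co-occur within a small window become linked nodes in a lexical graph, and PageRank scores the nodes. This is an illustration only and not necessarily the algorithm or library used in the class. The function names, the stopword list, and the sample text are all hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "by", "with"}


def build_lexical_graph(text, window=3):
    """Link non-stopword tokens that fall inside the same sliding window of `window` tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    graph = defaultdict(set)
    for i, head in enumerate(tokens):
        for tail in tokens[i + 1 : i + window]:
            if head != tail:
                graph[head].add(tail)
                graph[tail].add(head)
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """Score nodes with power-iteration PageRank over the undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
            + damping * sum(rank[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return rank


if __name__ == "__main__":
    sample = (
        "The sanctions list names a shell company. "
        "The shell company owns a bank account. "
        "The bank account funded the shell company."
    )
    scores = rank_nodes(build_lexical_graph(sample))
    # Print the highest-ranked terms first.
    for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        print(f"{term}\t{score:.3f}")
```

In a real pipeline the tokens would more likely be lemmas or noun phrases from an NLP parser, and the top-ranked nodes would be candidates for entities to add to the knowledge graph.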

Target Audience

  • Data science teams
  • People with general interest in AI, especially for highly-regulated enterprise environments

Goals

  1. Gain hands-on experience creating knowledge graphs from both structured sources and unstructured sources, for use in a Graph RAG application.
  2. Follow practices which emphasize data quality plus affordances for audits, evidence handling, and trustworthy AI applications downstream.
  3. Use entity resolution to create the "backbone" of a knowledge graph from structured data sources.
  4. Compare use of contemporary state-of-the-art open models and open source libraries in Python for extracting graph elements from unstructured data sources.
  5. Use entity resolution results to build a context-sensitive entity linker, blending graph elements from structured and unstructured sources.
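
As a rough illustration of goal 3, the sketch below groups records from structured sources into entities while keeping the evidence for every match. It is a toy stand-in and does not use or represent the API of Senzing or any other entity resolution library: it uses exact matching on a normalized name or a shared registration number, which is far simpler than production entity resolution. The records, field names, and helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical records from two made-up structured sources.
RECORDS = [
    {"id": "sanctions:1", "name": "ACME Trading Ltd.", "reg_no": "GB123"},
    {"id": "ownership:7", "name": "Acme Trading Limited", "reg_no": "GB123"},
    {"id": "ownership:8", "name": "Acme Trading Ltd", "reg_no": None},
    {"id": "sanctions:2", "name": "Borealis Holdings", "reg_no": "CY999"},
]


def normalize_name(name):
    """Lowercase, strip punctuation, and unify one common legal suffix."""
    words = name.lower().replace(".", "").replace(",", "").split()
    return " ".join("ltd" if w == "limited" else w for w in words)


def resolve(records):
    """Return (entities, evidence): groups of record ids, plus the reason for each match."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    evidence = []  # (matched record, new record, feature, shared value)
    seen = {}      # (feature, value) -> first record id carrying that value
    for rec in records:
        features = {"name": normalize_name(rec["name"]), "reg_no": rec["reg_no"]}
        for feature, value in features.items():
            if value is None:
                continue
            key = (feature, value)
            if key in seen:
                evidence.append((seen[key], rec["id"], feature, value))
                parent[find(rec["id"])] = find(seen[key])
            else:
                seen[key] = rec["id"]

    entities = defaultdict(list)
    for rec in records:
        entities[find(rec["id"])].append(rec["id"])
    return list(entities.values()), evidence


if __name__ == "__main__":
    entities, evidence = resolve(RECORDS)
    print(entities)
    for row in evidence:
        print(row)
```

With this sample data, the three Acme records fall into one entity and the Borealis record stays on its own. Each evidence tuple can become an annotated edge in the graph, so that an investigator can later see why two records were merged.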

Session Outline

  • Start with multiple open datasets used in sanctions compliance (e.g., money laundering, ultimate beneficial ownership, and related work) as structured data sources.
  • Use entity resolution to identify entities and relations which have supporting evidence (e.g., for use in investigations).
  • Build a "skeleton" graph from the structured data sources plus the "semantic overlay" of entities and relations.
  • Load unstructured data (e.g., from relevant news articles) and split into text chunks organized in a vector database, based on an embedding model.
  • Parse the text chunks to build a lexical graph, then use a textgraph algorithm to extract its most important elements.
  • Build a context-specific entity linker based on the entity resolution results from above, to blend the unstructured elements into the "skeleton" graph.
  • Show how to use the resulting knowledge graph and vector database together in a Graph RAG application.
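
The entity linking step in the outline above could look roughly like the following sketch, assuming that entity resolution has already produced a set of aliases and some context terms for each resolved entity. When an alias is shared by more than one entity, the linker picks the entity whose context terms overlap most with the words in the text chunk. The entity table, its field names, and the scoring rule are hypothetical simplifications, not the method taught in the class.

```python
import re

# Hypothetical summary of resolved entities: aliases seen in the structured
# records, plus context terms drawn from their attributes.
ENTITIES = {
    "E1": {
        "aliases": {"acme trading ltd", "acme trading"},
        "context": {"london", "shipping", "sanctions"},
    },
    "E2": {
        "aliases": {"acme trading"},
        "context": {"sydney", "mining"},
    },
}


def tokenize(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_mentions(chunk, entities):
    """Map each alias found in the chunk to the entity with the largest context overlap.

    Ties keep the first entity encountered; substring matching is a naive
    stand-in for proper mention detection.
    """
    text = chunk.lower()
    words = tokenize(chunk)
    best = {}  # alias -> (entity_id, score)
    for entity_id, entity in entities.items():
        for alias in entity["aliases"]:
            if alias not in text:
                continue
            score = len(words & entity["context"])
            if alias not in best or score > best[alias][1]:
                best[alias] = (entity_id, score)
    return {alias: entity_id for alias, (entity_id, _) in best.items()}


if __name__ == "__main__":
    chunk = "Regulators in London added Acme Trading to the sanctions list."
    print(link_mentions(chunk, ENTITIES))
```

Each resulting link connects a mention in a text chunk to a node in the "skeleton" graph, which is what lets a Graph RAG application move from a retrieved chunk to the related entities and their evidence.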

Format

This class will start with a lecture describing important terms and practices, then move to hands-on coding examples in Python.

We'll work with a collection of Jupyter notebooks which are available in a GitHub repository.

Each notebook illustrates an important section of code, along with information for debugging, illustrating intermediate results, and performance monitoring.

Then we'll work with a Python program which assembles these pieces into one application, which you can repurpose for your own use cases.

To make the most of the time available (2 hours), we will link to other online tutorials for deep dives into specific areas that are beyond the scope of this class.

Level

Intermediate

Prerequisite Knowledge

Some experience coding in Python and familiarity with popular packages such as Pandas and Jupyter.

11 December 2024, 11:45 AM

GenAI & Graph RAG Stage

11:45 AM - 01:45 PM

About The Speakers

Paco Nathan

Principal DevRel Engineer, Senzing

Stage Host

Paco Nathan is a Principal DevRel Engineer at Senzing.com, where he leads the Knowledge Graph practice area. He is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing.

Location

Convene 133 Houndsditch

133 Houndsditch, London

Neo4j

Neo4j, the Graph Database & Analytics leader, helps organizations find hidden relationships and patterns across billions of data connections deeply, easily, and quickly.

Platinum Sponsor

Ontotext

Connect the dots of your data! Ontotext helps enterprises to lower data management costs by up to 30%, enable data fabric architectures, create digital twins, utilize Graph RAG benefits, and take information delivery from days to minutes!

Gold Sponsor

Semantic Web Company / PoolParty

The vendor of PoolParty Semantic Suite. Graph-based text mining, recommender systems, and data fabric solutions.

Gold Sponsor

yWorks

yWorks specializes in the development of professional software solutions that enable the clear visualization of diagrams and networks.

Gold Sponsor

Oracle

We’re a cloud tech company that provides organisations around the world with computing infrastructure and software to help them innovate, unlock efficiencies and become more effective. We also created the world’s first – and only – autonomous database to help organise and secure our customers’ data.

Gold Sponsor

Ultipa

Ultipa builds next-gen graph XAI & real-time database empowering smart enterprises w/ smooth digital transformations.

Silver Sponsor

Oxford Semantic Technologies

Oxford Semantic Technologies (OST) spun out from the University of Oxford and was acquired by Samsung in 2024. OST provides AI software to extract insights from big data, solving issues like medical diagnostics and financial crime. One founder is a BCS Lovelace Medal winner.

Silver Sponsor

FlureeDB

Web3 data platform built on standards. Fluree powers connected, secure, and agile data ecosystems.

Bronze Sponsor

Senzing

Senzing is the first to deliver real-time, artificial intelligence for entity resolution. Senzing software enables organizations of all sizes to gain highly accurate and valuable insights about who is who and who is related to whom in data.

Bronze Sponsor

Semantic Partners

We partner with you, and your chosen semantic stack, to liberate your data's meaning from isolated silos.

Bronze Sponsor

Epsilla

All-in-one platform to create AI agents powered by your private data and knowledge. Make GenAI prototype to production 10 times faster. We are backed by Y Combinator. Start free today: https://epsilla.com

Bronze Sponsor

Neural Alpha

Since 2016 Neural Alpha have delivered cutting edge, sustainability centric Connected Data solutions for blue-chip corporates, financial institutions, Governments and NGOs. Our bespoke software & data solutions fuse AI, Knowledge Graphs, Taxonomies & other technologies for unprecedented insights.

Silver Sponsor

GraphWise

Graphwise, born from the merger of Ontotext and Semantic Web Company, empowers enterprises to maximize AI ROI with trusted knowledge graph and semantic AI solutions, employing over 200 people globally across North America, Europe, and APAC.

Gold Sponsor

Lettria

Transparent, verifiable AI, Lettria lets your business docs and data deliver trustworthy AI answers.

Bronze Sponsor

Cricket Hill

Cricket Hill: Greek Organic Premium Olive Oil, Cosmo-Local Events and Tours

Partner
