Zornitsa Manolova Yener Karaca

Automating Rule Extraction with Ontology Driven LLMs

A Talk by Zornitsa Manolova and Yener Karaca

About this Talk

Comprehensive and accurate data quality checks are required for many types of data, including legal entity data, to meet regulatory and operational standards. However, extracting information from complex, regularly updated rule documents and managing the checks that apply the extracted rules is typically a labor-intensive, error-prone process that struggles to scale. The challenge is compounded by complex, multi-line conditional checks, and growing complexity and volume make manual management of data quality increasingly difficult.

Accurate and consistent data is the foundation for compliance, risk assessment, and critical decision-making. Automating the data quality process is not only more efficient but also essential for keeping the process scalable, and thus accurate, as complexity and volume increase. This work affects the broader business landscape that relies on trustworthy data. Ensuring that checks are accurate, non-overlapping, and non-contradictory is essential for maintaining high data integrity. Incorporating an ontology-driven LLM workflow enhances the precision and consistency of the generated checks.

This presentation is intended for data scientists, technologists, data governance and compliance officers, chief data officers, and knowledge graph practitioners. Attendees will learn how in-context learning with multiple LLM instances, guided by the ontology, can automate the extraction of complex, multi-line conditional data quality checks from rule texts. They will also learn how to integrate and manage these checks to improve data governance.

The process is conducted in three key stages:

1. Conversion of Rule Documents to Triples: Using in-context learning, LLMs transform rule documents into triples. The process employs Retrieval-Augmented Generation (RAG): multiple similar examples are provided in context in a structured manner to enhance the accuracy and relevance of the generated triples.

2. Creation of Checks from Triples: Multiple LLM instances work together to produce complex, multi-line conditional data quality checks from these triples. Using multiple instances allows the outputs to be monitored and regulated, enhancing the accuracy and consistency of the generated checks.

3. Knowledge Graph Validation: The generated checks are integrated into a knowledge graph, where they are further analysed to ensure they do not conflict or overlap with existing checks, which is increasingly critical as the number and complexity of checks grow.
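As a rough illustration of the first stage, the sketch below retrieves the stored examples most similar to a new rule and lays them out as a structured few-shot prompt. It is not the speakers' implementation: the `Example` type, the word-overlap similarity (a crude stand-in for an embedding search), the prompt layout, and the sample rules are all assumptions made for this example, and the LLM call itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical stored example: a rule sentence plus its reviewed triples."""
    rule_text: str
    triples: tuple[tuple[str, str, str], ...]  # (subject, predicate, object)


def tokens(text: str) -> set[str]:
    """Lowercased word set with edge punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap score in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(rule: str, store: list[Example], k: int = 2) -> list[Example]:
    """Return the k stored examples most similar to the new rule."""
    query = tokens(rule)
    ranked = sorted(
        store,
        key=lambda ex: jaccard(query, tokens(ex.rule_text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(rule: str, examples: list[Example]) -> str:
    """Lay out the retrieved examples in a fixed structure, then the new rule."""
    parts = ["Convert each rule into (subject, predicate, object) triples."]
    for ex in examples:
        parts.append(f"Rule: {ex.rule_text}")
        parts.extend(f"Triple: ({s}, {p}, {o})" for s, p, o in ex.triples)
    parts.append(f"Rule: {rule}")
    parts.append("Triple:")  # the model is expected to continue from here
    return "\n".join(parts)
```

The string returned by `build_prompt` would then be sent to whichever LLM is in use. In practice the similarity function would likely be replaced by an embedding-based search, with the rest of the structure unchanged.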

This approach significantly reduces the time and effort needed to generate and validate complex data quality checks from rule documents, thereby increasing efficiency, improving data quality, and allowing the process to scale. The ontology-driven LLM workflow, together with the validation capabilities of a knowledge graph, ensures that these checks are accurate, consistent, and free from redundancy and contradictions, which is essential for maintaining robust data.
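To make the idea of detecting redundant and contradictory checks concrete, here is a minimal sketch. It assumes a deliberately simplified model that is not taken from the talk: each check has a condition (field/value pairs that must all hold) and a set of values allowed for one target field. The `Check` type, its field names, and the in-memory pairwise comparison are hypothetical; in the approach described above, this analysis is performed within a knowledge graph.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Check:
    """Hypothetical check: when `condition` holds, `field` must be in `allowed`."""
    check_id: str
    condition: frozenset[tuple[str, str]]  # (field, value) pairs, one value per field
    field: str
    allowed: frozenset[str]


def compatible(a: Check, b: Check) -> bool:
    """True if some record could satisfy both conditions at once."""
    values_a = dict(a.condition)
    return all(values_a.get(f, v) == v for f, v in b.condition)


def subsumes(a: Check, b: Check) -> bool:
    """True if `a` applies whenever `b` does and is at least as strict."""
    return a.condition <= b.condition and a.allowed <= b.allowed


def classify(a: Check, b: Check) -> str | None:
    """Label a pair of checks as 'contradiction', 'overlap', or None."""
    if a.field != b.field or not compatible(a, b):
        return None  # the checks never constrain the same value together
    if not (a.allowed & b.allowed):
        return "contradiction"  # no value can pass both checks
    if subsumes(a, b) or subsumes(b, a):
        return "overlap"  # one check makes the other redundant
    return None


def find_issues(checks: list[Check]) -> list[tuple[str, str, str]]:
    """Compare every pair of checks and report the problematic ones."""
    issues = []
    for a, b in combinations(checks, 2):
        label = classify(a, b)
        if label is not None:
            issues.append((a.check_id, b.check_id, label))
    return issues
```

Pairwise comparison grows quadratically with the number of checks, which is one reason a graph query over indexed fields is attractive as the rule base grows.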

12 December 2024, 10:55 AM

Lightning Talk Stage

10:55 AM - 11:15 AM

About The Speakers

Zornitsa Manolova

Head of Data Quality Management & Data Science, Global Legal Entity Identifier Foundation

Zornitsa leads a strong team of data scientists at GLEIF, where she has been instrumental since April 2018 in enhancing the organization's data quality and governance framework through innovative data analytics approaches.

Yener Karaca

Junior Data Scientist, Global Legal Entity Identifier Foundation

Yener is a Junior Data Scientist at GLEIF, specialising in data science with a strong focus on integrating AI and graph technologies to solve complex data governance challenges.

Location

Convene 133 Houndsditch

133 Houndsditch, London
