About this Talk
Comprehensive and accurate data quality checks are required for many types of data, including legal entity data, to meet regulatory and operational standards. However, extracting information from regularly updated, complex rule documents and managing the checks that apply the extracted rules is typically a labor-intensive process that is prone to error and struggles to scale. The challenge is further compounded when dealing with complex, multi-line conditional checks. As complexity and volume grow, manual management of data quality becomes increasingly difficult.
Accurate and consistent data is the foundation for compliance, risk assessment, and critical decision-making. Automating the data quality process is not only more efficient but also critical for keeping the process scalable, and therefore accurate, as complexity and volume increase. This work benefits the broader business landscape that relies on trustworthy data. Ensuring that checks are accurate, non-overlapping, and non-contradictory is essential for maintaining high data integrity. Incorporating an ontology-driven LLM workflow enhances the precision and consistency of the generated checks.
This presentation is intended for data scientists, technologists, data governance and compliance officers, chief data officers, and knowledge graph practitioners. Attendees will gain insight into utilising in-context learning with multiple LLM instances, guided by an ontology, to automate the extraction of complex, multi-line conditional data quality checks from rule texts. They will also learn how to integrate and manage these checks to improve data governance.
The process is conducted in three key stages:
1. Conversion of Rule Documents to Triples: Using in-context learning, LLMs transform rule documents into triples. The process employs Retrieval-Augmented Generation (RAG), in which multiple similar examples are provided in context in a structured manner to enhance the accuracy and relevance of the generated triples (a minimal sketch of this step follows the list).
2. Creation of Checks from Triples: Multiple LLM instances work together to produce complex, multi-line conditional data quality checks from these triples. Using multiple LLM instances allows the outputs to be monitored and regulated, enhancing the accuracy and consistency of the generated checks (see the second sketch below).
3. Knowledge Graph Validation: The generated checks are integrated into a knowledge graph, where they are further analysed to ensure they do not conflict or overlap with existing checks, which becomes increasingly critical as the number and complexity of checks grow (see the final sketch below).
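To make the first stage concrete, the following Python sketch illustrates one way the retrieval-augmented prompting could work. It is a minimal illustration, not the production pipeline: the example store, the word-overlap similarity measure, and the llm_complete stand-in are all hypothetical placeholders for whatever curated examples, retriever, and model client are actually used.

# Sketch of stage 1: retrieval-augmented prompting for rule-to-triple extraction.
# llm_complete is a stand-in for a real model client; the example store and the
# similarity measure are deliberately simple and purely illustrative.

def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call; wire this to your model client."""
    return "(LegalEntity, mustHave, RegisteredAddress)"

# A small store of previously curated rule -> triple examples.
EXAMPLE_STORE = [
    {"rule": "Every legal entity must have a registered address.",
     "triples": "(LegalEntity, mustHave, RegisteredAddress)"},
    {"rule": "An entity identifier must be a 20-character LEI code.",
     "triples": "(EntityIdentifier, hasFormat, LEI20)"},
]

def similarity(a: str, b: str) -> float:
    """Crude word-overlap (Jaccard) score used to pick in-context examples."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def extract_triples(rule_text: str, k: int = 2) -> str:
    # Retrieve the k most similar curated examples (the RAG step).
    examples = sorted(EXAMPLE_STORE,
                      key=lambda e: similarity(rule_text, e["rule"]),
                      reverse=True)[:k]
    shots = "\n\n".join(f"Rule: {e['rule']}\nTriples: {e['triples']}" for e in examples)
    prompt = ("Convert the rule into ontology-aligned (subject, predicate, object) triples.\n\n"
              f"{shots}\n\nRule: {rule_text}\nTriples:")
    return llm_complete(prompt)

print(extract_triples("Each legal entity record must include a registered address."))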
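The second sketch illustrates, under the same caveats, how multiple LLM instances might cooperate in stage 2: one instance drafts a multi-line conditional check from the triples while a second instance reviews it, with the reviewer's feedback fed back to the generator. The generator_llm and reviewer_llm functions and the prompts are hypothetical stand-ins.

# Sketch of stage 2: a generator LLM drafts a multi-line conditional check and a
# reviewer LLM monitors and regulates it; names and prompts are illustrative.

def generator_llm(prompt: str) -> str:
    """Placeholder generator instance."""
    return ("IF entity_type = 'LegalEntity'\n"
            "AND registered_address IS NULL\n"
            "THEN FLAG 'Missing registered address'")

def reviewer_llm(prompt: str) -> str:
    """Placeholder reviewer instance; returns APPROVE or revision feedback."""
    return "APPROVE"

def generate_check(triples: str, max_rounds: int = 3) -> str:
    draft = generator_llm(f"Write a multi-line conditional data quality check for: {triples}")
    for _ in range(max_rounds):
        verdict = reviewer_llm(
            f"Review this check against the triples {triples} for accuracy and consistency:\n{draft}"
        )
        if verdict.strip().upper().startswith("APPROVE"):
            return draft
        # Feed the reviewer's comments back to the generator and try again.
        draft = generator_llm(f"Revise the check using this feedback: {verdict}\n{draft}")
    raise RuntimeError("Check did not converge within the review budget")

print(generate_check("(LegalEntity, mustHave, RegisteredAddress)"))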
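The final sketch shows one possible shape for the stage 3 validation: the generated checks are loaded into a knowledge graph (rdflib is used here as an example library) and a SPARQL query surfaces pairs of checks that constrain the same attribute, which are the candidates for overlap or contradiction review. The dq namespace and the check data are illustrative only.

# Sketch of stage 3: load generated checks into a knowledge graph and query for
# pairs of checks constraining the same attribute, flagging them for review.
from rdflib import Graph, Namespace, Literal

DQ = Namespace("http://example.org/dq#")
g = Graph()

def add_check(check_id: str, attribute: str, condition: str) -> None:
    g.add((DQ[check_id], DQ.constrains, DQ[attribute]))
    g.add((DQ[check_id], DQ.condition, Literal(condition)))

add_check("check1", "registered_address", "IS NOT NULL")
add_check("check2", "registered_address", "MATCHES address_pattern")
add_check("check3", "entity_identifier", "LENGTH = 20")

# Distinct checks constraining the same attribute are potential overlaps.
overlaps = g.query("""
    PREFIX dq: <http://example.org/dq#>
    SELECT ?a ?b ?attr WHERE {
        ?a dq:constrains ?attr .
        ?b dq:constrains ?attr .
        FILTER(STR(?a) < STR(?b))
    }
""")
for row in overlaps:
    print(f"Review pair: {row.a} and {row.b} both constrain {row.attr}")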
This approach significantly reduces the time and effort needed to generate and validate complex data quality checks from rule documents, thereby increasing efficiency, improving data quality, and enabling scalability. The use of an ontology-driven LLM workflow, together with the validation capabilities of a knowledge graph, ensures that these checks are accurate, consistent, and free from redundancy and contradiction, which is essential for maintaining robust data quality.