About this Talk
Data is infinite now. It’s not going to stop; it isn’t even going to slow down. Building and using tomorrow’s data systems will require new techniques for understanding complex, endless data in real time. That’s what we’re going to tackle in this masterclass.
The goal of this masterclass is to give participants a new superpower: to work comfortably with infinite streaming data and turn it into real-time connected insights. Welcome to the exciting world of streaming graphs!
This is a hands-on class where we learn by doing. After a little background to introduce the key new ideas, we’ll dive into a series of hands-on exercises in which participants use the open source “Quine” streaming graph (https://quine.io) on both batch and streaming input data.
What makes a “streaming graph” streaming is not only that data comes into it as a stream, but also that results stream out. We’ll use a powerful capability called “standing queries” to monitor the entire graph for patterns expressed as a standard database query. Think of this as continually querying the entire graph for anything you like, except that it’s far more efficient and you don’t have to know when the right time is to issue your query; it’s just always running. Results from your query stream out as matches are found, or can even call back into the graph to advance to the next step of an algorithm.
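To make the idea of an always-running query concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not Quine's implementation or API: the `StandingQuery` class, the event format, and the edge type names are all hypothetical. The pattern is "a node that has every required edge type", and each incoming event re-checks only the node it touches instead of re-scanning the whole graph.

```python
from collections import defaultdict


class StandingQuery:
    """Toy incremental matcher (hypothetical, not Quine's API).

    Reports a node once it has been seen with every required edge type.
    """

    def __init__(self, required_edge_types, on_match):
        self.required = frozenset(required_edge_types)
        self.on_match = on_match          # callback invoked once per matching node
        self.seen = defaultdict(set)      # node id -> edge types observed so far
        self.matched = set()              # nodes already reported

    def ingest(self, node_id, edge_type):
        """Process one streamed event; only the touched node is re-checked."""
        self.seen[node_id].add(edge_type)
        if node_id not in self.matched and self.required <= self.seen[node_id]:
            self.matched.add(node_id)
            self.on_match(node_id)


results = []
sq = StandingQuery({"LOGGED_IN", "PURCHASED"}, results.append)

events = [
    ("alice", "LOGGED_IN"),
    ("bob", "PURCHASED"),
    ("alice", "PURCHASED"),   # completes the pattern for alice
    ("bob", "LOGGED_IN"),     # completes the pattern for bob
    ("alice", "PURCHASED"),   # duplicate event; no second result is emitted
]
for node_id, edge_type in events:
    sq.ingest(node_id, edge_type)

print(results)
```

Notice that nobody decides when to run the query. A result appears at the moment the last missing piece of the pattern arrives, and the callback could just as easily write new data back into the graph.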
With a solid understanding of what we can do with streaming graphs, the class will conclude with an application of graph neural networks to streaming graphs. To make this practical in a streaming scenario, we’ll make use of Quine’s ability to maintain a fully-versioned graph and query back in time to access historical states—even while new data streams in.
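The idea of querying back in time can be sketched with a small versioned property store. This is a toy under stated assumptions, not how Quine stores history: the `VersionedProperties` class, its method names, and the integer timestamps are hypothetical, and writes are assumed to arrive in time order. Every write is kept along with its timestamp, so a read "as of" an earlier moment still works while new writes continue to arrive.

```python
import bisect
from collections import defaultdict


class VersionedProperties:
    """Toy history-preserving property store (hypothetical, not Quine's API)."""

    def __init__(self):
        # (node, key) -> parallel lists of timestamps and values, in time order
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def set(self, node, key, value, at):
        """Record a new value without overwriting earlier ones."""
        times = self._times[(node, key)]
        if times and at < times[-1]:
            raise ValueError("this sketch assumes writes arrive in time order")
        times.append(at)
        self._values[(node, key)].append(value)

    def get(self, node, key, at):
        """Return the value as of time `at`, or None if not yet set."""
        times = self._times.get((node, key), [])
        i = bisect.bisect_right(times, at)   # number of writes at or before `at`
        return self._values[(node, key)][i - 1] if i else None


graph = VersionedProperties()
graph.set("sensor-1", "status", "ok", at=100)
graph.set("sensor-1", "status", "alert", at=200)

past = graph.get("sensor-1", "status", at=150)      # historical read
graph.set("sensor-1", "status", "ok", at=300)       # new data keeps arriving
same_past = graph.get("sensor-1", "status", at=150)  # history is unchanged

print(past, same_past, graph.get("sensor-1", "status", at=250))
```

Stable historical reads are what make training practical in a streaming setting: a model can be trained against a fixed past state of the graph even though the live graph keeps changing.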
This class is aimed at data engineers, data scientists, product managers, and the managers of these teams. No deep experience is assumed or required. By the end of this masterclass, students will have built several useful applications with streaming graphs, and have the foundational knowledge and skills to apply these tools in their own environments to turn infinite data streams into real-time answers to deep questions.
Key Topics
- Streaming data vs. Batch data
- Streaming graphs
- Event-driven data pipelines
- Creating a graph from streaming data using Cypher
- Monitoring dynamically changing graphs for insightful patterns
- Implementing algorithms on a dynamically changing graph
- Streaming graph-based machine learning and graph embedding
Target Audience
- Data Engineers
- Data Scientists & Machine Learning Engineers
- Data Analysts
- Managers of the above
Goals
Get hands-on experience with graph analytics on live streaming data. Build the understanding needed to apply these techniques to problems in your own work and life.
Session Outline
Introduction to streaming graphs (20 minutes)
- Streaming data vs. Batch data
- Graphs in streaming data
- Introduction to the Quine streaming graph
Hands-on #1: Build a streaming graph from a static data set (30 minutes)
- Get up and running
- Data source and goals
- Creating the data ingest
- Exploring the data
Hands-on #2: Build and analyze a streaming graph from a live streaming data set (30 minutes)
- Streaming sources
- Standing Queries
- Graph Algorithms
Hands-on #3: Graph Neural Networks with streaming graphs (30 minutes)
- Temporal Queries
- Random Walks
- Graph Neural Networks
Conclusion (10 minutes)
- Applications in the real world and how to get there
Format
This class is very hands-on. It will start in a lecture format but will quickly move to hands-on exercises.
Each of the exercises is meant to be run independently on participants’ own laptops (macOS, Windows, or Linux). The exercises will make use of the open source Quine streaming graph software to ingest data, build graphs, perform streaming operations, and produce output from those graphs. Sample data will be provided.
Participants will edit text files and run command-line programs, then see the results of their changes in a web browser served by their own local web server. The text files are in YAML format and contain Cypher queries, which participants will learn to write in order to orchestrate and customize their streaming graph applications. Some REST API calls to the local web server may be useful for deeper understanding or customization.
The hands-on sessions will conclude with an exercise demonstrating streaming graphs for Graph Neural Networks (GNNs), where participants can execute Python code to train their own graph neural network with streaming graph data.
Level
Beginner - Intermediate
Prerequisite Knowledge
Basic familiarity with running programs at the command line and editing text files. Some familiarity with the Cypher graph query language is helpful but not required. Basic Python experience is helpful for the final exercise.