In this episode of the GraphStuff.FM podcast Will & Lju break down the Neo4j Graph Data Platform. We discuss the individual components of the graph platform, how developers and data scientists can get started, and why the value delivered by the platform is greater than just the sum of the individual components.
00:00 - 01:54 Introduction
01:54 - 10:44 History of Neo4j & The Evolution From Graph Database To Graph Platform
10:44 - 16:29 Components of the Neo4j Graph Data Platform, Neo4j Database, & Graph Native
16:29 - 21:34 Neo4j Desktop, Using Neo4j in the Cloud, & On-Prem
21:34 - 27:11 Neo4j Browser & Low-Code Graph Apps
27:11 - 32:40 Neo4j Language Drivers & Building APIs
32:40 - 38:00 Graph Data Science, Analytics, & Visualization
38:00 - 49:41 Neo4j Connectors & Neo4j Labs
Lju Lazarevic (00:00):
Hello everybody. Thank you very much for joining us once again. I’m Lju Lazarevic, and I’m joined by my colleague Will Lyon. And in this episode, we are going to be discussing an overview of the Neo4j graph data platform. So just to give you a little bit of context of what we’re covering, we’re going to give you the backstory of Neo4j. And then we’re going to talk a little bit about some use cases and how the platform has evolved over time and the amazing tools that are available today. So, Will, give us a bit of the background about Neo4j.
William Lyon (00:39):
Neo4j is fundamentally a graph database. I thought it’d be interesting to start off talking about why Neo4j was created and a little bit about how it’s evolved from those early days. So Neo4j was first created as an embedded Java database. And this is where that “4j” in the name comes from. It has since evolved beyond that, so the “4j” Java aspect is no longer really relevant for users of Neo4j but that’s the history. Neo4j was created to address some problems that the founders were having in building a CMS, and they had metadata associated with different assets in the CMS. I think it was specifically some of the rights and metadata around the usage of photos that was the initial use case. And that’s very difficult to represent in a relational database because of all of the different connections and relationships and the richness of the data. So that’s why Neo4j was first created.
William Lyon (01:47):
The founders, I think, quickly realized that there were lots of other interesting use cases beyond just this embedded database in the CMS application. And so Neo4j quickly evolved into something that was more generally useful beyond just this embedded database. So quickly folks were using Neo4j for things like generating personalized recommendations, for handling logistics, routing, dealing with the complex access patterns inherent in things like identity and access management, fraud detection, where you’re interested in the connections between actors and maybe say a payment network, that sort of thing. Things like customer 360, where an enterprise wants to understand the different components and interactions that they’ve had with a customer throughout their organization across many different systems, use cases like monitoring network operations.
William Lyon (02:49):
So if I’m responsible for a data center, there are lots of connections within that data center from the hardware, what rack is the server deployed on, what are the different network interfaces going on there, all the way to the software applications that are deployed on these different servers, the data dependencies there. So really these are, I think, some of the first core use cases where Neo4j really started to shine. Anywhere that the connections in your data are very important, or you have very complex connected data sets, those were some of the first use cases that started to emerge for Neo4j.
Lju Lazarevic (03:32):
Yes, very much that graph-shaped problem. Are you looking at a graph-shaped problem? It’s all about those connections in the data. And I guess the interesting thing is if we look at how Neo4j, the database, has evolved into Neo4j, the platform. As you talked about, many of these use cases have been identified as a good fit for Neo4j. And if we look at the different tools that the audiences of these use cases need, there is this idea that you have graph transactions, so this is all about thinking about your real-time events happening.
Lju Lazarevic (04:16):
For example, somebody requests a new financial product, and then you’re getting that new information into your database, and you’re looking to identify: is there any potential fraud occurring? Then there is this idea of graph analytics: being able to do things such as graph data science or other analytical use cases where you’ve already got this data, and you’re just analyzing patterns or interesting things going on in the database, which you may then use as part of your real-time transaction system later.
Lju Lazarevic (04:54):
And the really interesting thing as well is, as time goes on, we’ve got lots of different types of users using the database. Unsurprisingly we’ve got developers, so developers are working with the database, and they’ll be integrating it with their various frameworks and stacks. We’ve got the administrators, so somebody who’s going to be in charge of the watering and feeding of the database, making sure that it’s working well, that the configuration is correct, and so forth. The data scientist, who is going to be doing their research and investigation on the data, especially when they’re looking at connected data, and being able to understand what’s going on. And you’re going to have analysts, so perhaps the people who are looking at the data.
Lju Lazarevic (05:51):
So again, let’s take the fraud example, somebody who really understands their domain well and using that knowledge of their domain, being able to analyze and inspect the data, and there’s going to be many more. We saw examples as well with the ICIJ when they were looking at things like the Panama Papers and this idea of the citizen scientist going off and looking at the data. And this is very much looking at this idea of not only thinking about Neo4j as this database that is dealing with these heavily connected use cases but this idea that you can also use it as a general-purpose database.
Lju Lazarevic (06:39):
And one thing that I’ve spotted and really love about Neo4j and working with it is, when you are investigating some data, the fact that you have a very flexible data model when you work with it. So yes, you need to think about how your data points are related to each other. But you don’t have to go into your database and specifically dictate, as you would in a relational database, what your tables are going to be, what your keys are, and so forth. There’s this idea of being able to get your data in as quickly as possible and then test that hypothesis straight away.
Lju Lazarevic (07:28):
So effectively from a POC perspective, or just testing an idea, using a graph database removes the need to do a lot of data pre-planning and gives you a lot of flexibility to go away and iterate on an idea. So I think there are some really exciting things around that, and tools that go along to support that. And there is a lot of excitement here when we think about the tools, which I know is something you’ve spent a lot of time considering. So, Will, can you tell us more about that?
William Lyon (08:07):
When we think of the evolution of going from a core database to a platform, you spoke a bit about the audience of users expanding from developers to data scientists to administrators and analysts as the use cases have evolved. I think it’s also important to understand the context of larger trends going on in the developer and data science ecosystem as well. So these are things like realizing that we live now in a cloud-first world. Developer expectations around tooling and ease of use, and what a developer or a data scientist expects to accomplish within a given amount of time, I think those have grown significantly over time, right?
William Lyon (08:57):
As a developer, when I’m using a new tool, I have greater expectations for what I can accomplish in that first hour, that first week of using the tool. Now maybe I expect to be able to build and deploy an application within just a couple of days of first contact with a new technology. And so we want to be able to address those emerging trends and really enable developers and data scientists to be more productive. I think that’s also a big component of this idea of the graph data platform.
William Lyon (09:31):
So, let’s talk about the specifics of the Neo4j graph data platform. There’s a really helpful diagram, and we’ll link this in the show notes. This is on the Neo4j developer page for the graph platform. It’s a diagram that shows, at the core of the platform, the Neo4j graph database, and then some of the tooling, use cases, and audiences around that. And so we’ll use that as a guide for how to think about and talk about the graph data platform as we go on.
William Lyon (10:12):
So I mentioned, at the core of the platform is the Neo4j graph database itself. Some of you may be quite familiar, I’m sure, with Neo4j and some of its features, but I think it’s important to talk about and maybe have some understanding of how graph databases, and Neo4j specifically, are different from other databases that you may have seen, like relational databases, document databases, these kinds of things. So in that context, I think one of the core differences, and really one of the core advantages, of a graph database like Neo4j is the property graph data model. So we’re modeling, storing, and querying our data, thinking about it as a graph. Nodes are the entities, relationships connect nodes, and we store properties on both nodes and relationships. But it’s important to know that relationships are first-class citizens of the data model and of how we store and query them in the database.
William Lyon (11:18):
So speaking of storing and querying, we use a query language called Cypher to work with Neo4j. If you’re familiar with relational databases where you’re using SQL, you can think of Cypher as SQL for graphs, and Cypher is all about pattern matching. We define these sort of ASCII-art patterns using Cypher, and because it’s a declarative query language, it’s then up to the database to figure out how to find and work with that graph pattern. Speaking of leaving things up to the database to figure out, it’s important to understand, I think, the performance optimizations that a graph database like Neo4j makes relative to the performance optimizations that other databases make. Neo4j is optimized for traversing the graph. That’s the core operation, the core graph processing that you do: traversing from one node to another by following relationships.
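As a sketch of what that ASCII-art pattern matching looks like, here is a small Cypher example over a hypothetical social graph (the Person label, name property, and KNOWS relationship type are invented for illustration, not from the episode):

```cypher
// Create two nodes (entities) with properties, connected by a relationship
CREATE (alice:Person {name: 'Alice'})-[:KNOWS {since: 2020}]->(bob:Person {name: 'Bob'})

// Pattern matching: nodes in parentheses, relationships in square brackets
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.name = 'Alice'
RETURN friend.name
```

The MATCH clause only declares the shape of the pattern; it is the database that decides how to find every subgraph matching it.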
William Lyon (12:28):
And oftentimes the pattern or traversal through the graph that you’re describing in your Cypher query is many relationships deep. You may see 10, 12, even an unspecified variable-length pattern to find the shortest path, these sorts of very complex traversals. So that is the core of what a graph database like Neo4j is optimized to do. And there’s this concept called index-free adjacency that’s at the root of this performance optimization. This means that any time I’m traversing from one node to another, I’m not using an index lookup operation to do that traversal. I’m doing the equivalent of chasing a pointer, looking at just an offset, which computers are very good at.
William Lyon (13:20):
The implication of this is that the performance of the traversal is not dependent on the overall size of the data. So if I’m doing a local graph traversal, going from one node, following a relationship, then another relationship and so on, the performance of that is not dependent on the overall size of the graph that I have, and that’s always going to be a performant operation. If you compare that to the similar concept in relational databases, which would be the idea of a join: well, a join is a set operation where I’m using an index to see where two tables overlap in order to join those two tables. That is an index-backed operation, and the performance of that join is going to be dependent on the size of those two tables. So as those tables grow, that performance is going to be impacted. That’s a really important fundamental concept for understanding the optimizations that graph databases like Neo4j make.
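The deep, variable-length traversals described above might look like this in Cypher (labels and relationship types here are hypothetical):

```cypher
// Traverse 1 to 10 KNOWS relationships deep from a starting node
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..10]->(other:Person)
RETURN DISTINCT other.name

// Or let the database find a shortest path of unspecified length
MATCH path = shortestPath(
  (a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Bob'})
)
RETURN length(path)
```

Each hop follows a stored relationship reference rather than an index lookup, which is why the cost depends on the part of the graph touched, not the total graph size.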
William Lyon (14:22):
So taking these concepts together, you’ll often hear the term graph native talked about, and I think this is fundamentally what graph native means. It’s optimizing all the way through the stack, from the data model to the query language, to how that data is stored on disk and how the database processes it; everything is optimized for graphs and graph workloads. So that’s what graph native means when you hear that term. The other thing I think is interesting about Neo4j the database is that it’s very extensible. You have the ability to write user-defined procedures and functions, plugins that define custom logic beyond Cypher. And we’ll see that as kind of the basis for some other interesting components of the graph data platform as we go through it.
Lju Lazarevic (15:16):
So we’ve talked about the database and very briefly how it works, and why it’s different from other databases: it’s this key idea of index-free adjacency and the fact that it’s optimized for traversing relationships, whereas something like a relational database is more optimized towards working along rows. And the next question is, well, how do we get at the database? What do we do to interact with the database? For many of you, if you go away and Google Neo4j, you will be presented with our website, and one button you may see is download Neo4j Desktop.
Lju Lazarevic (16:08):
So what is Neo4j Desktop? It is the mission control for your Neo4j projects. It’s a developer tool that allows you to manage your Neo4j instances in a nice way. So it’s a great way of being able to talk to your remote databases, and it allows you to run these things called graph apps, which is something we will touch on in a bit. It also allows you to run local databases. So if you’re trying out Neo4j, or you’re pulling together a proof of concept, or you’re just learning and having a play, then it also allows you to very easily start up a local instance, and it takes away lots of the configuration thinking that you might otherwise need to do, such as whether you’ve got the right version of Java set up and so forth. So it is a really comfortable one-stop shop for configuring your databases, accessing databases, and running custom applications off the back of your database as well.
Lju Lazarevic (17:18):
And then moving on from Neo4j Desktop, you also have ways you can administer your database. So let’s say you have set up a local database on your machine, you’ve done that through Neo4j Desktop, and you’ve done some work on there. We have a number of tools available to us. We’ve got Neo4j Admin: this is a tool you would run in your console window, and it allows you to do operations on the database such as dumping your database data. So if you want to move your data to another database or some other provisioned instance, you have the option to do that, as well as to load that data.
Lju Lazarevic (17:59):
You have options as well to do maintenance through some of the utilities available there too. So you have that sort of administration. You also have ways of getting different formats of the database: if you’d like to use Docker, you can go and get a Docker container of Neo4j. We have got a Helm chart as well, if you are going the Kubernetes route. You can also get Neo4j in various flavors and other packages, so for example there are various Linux setups, as well as having it available on Windows. And we also have another tool for monitoring, which is Halin. This is one of our Neo4j Labs applications, which we’ll talk about a bit later, but that is another option for you to go and administer and check on your database and its performance.
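As a rough sketch, the operations mentioned here look something like the following on the command line (exact flags and paths vary between Neo4j versions, so treat these as illustrative rather than copy-paste ready):

```shell
# Dump a (stopped) local database to a single archive file
neo4j-admin dump --database=neo4j --to=/backups/neo4j.dump

# Load that dump into another instance
neo4j-admin load --database=neo4j --from=/backups/neo4j.dump --force

# Or run Neo4j as a Docker container instead
docker run --publish 7474:7474 --publish 7687:7687 \
    --env NEO4J_AUTH=neo4j/your-password neo4j:latest
```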
Lju Lazarevic (18:49):
Now, if that sounds like too much of a headache, we do have a cloud option available to you as well. We have Neo4j database as a service, which we call Neo4j Aura, and this is a scalable service that comes with clustering and so forth, so you don’t have to be worrying about any kind of resilience; that is all looked after for you. You do not have to configure it, you do not have to administer it; that is all handled for you, as you would expect from a DBaaS. And we have three different flavors available. We’ve got the professional tier, so this is your usual thing that you would expect as a service. This is something where you come along with your credit card and you can spin up a database and do all of that good stuff.
Lju Lazarevic (19:38):
If you need something that is a bit more dedicated to your enterprise, where you want special SLAs wrapped around your work, or you need something large scale with special provisions and so forth, we do have an enterprise offering as well. And coming very soon, we have got our free tier. This is targeted at developers, so for those of you who’ve got small projects, if you’re looking to learn more about Neo4j, or if you want something that is more closely integrated with the cloud, you have that offering coming soon. And we do have a waitlist available for that, so please do sign up. And again, we will have the link for that in the show notes.
William Lyon (20:21):
Regardless of how you have deployed Neo4j, whether you’re using it in the context of Neo4j Desktop, you’ve deployed it using the Helm chart or on Kubernetes, or you’re using Neo4j Aura, the database as a service, one of the first tools that you’ll use to interact with Neo4j is what’s called Neo4j Browser. Neo4j Browser runs in your web browser, and you can think of it as a query workbench for Neo4j. It allows you to execute Cypher queries against the database and then visualize and interpret the results. There are also various administration commands that I can run in Neo4j Browser. But I think of Neo4j Browser as that core query workbench during development and testing; if I want to write some LOAD CSV scripts for importing data, that’s the core tool that I’m using initially.
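A typical LOAD CSV script of the kind you might run from Neo4j Browser could look like this (the file name and column names are hypothetical):

```cypher
// Import people from a CSV file placed in the database's import directory
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
```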
William Lyon (21:28):
Now, Neo4j Browser is what’s called a graph app. Graph apps are single-page applications that, when run in Neo4j Desktop, have an additional API injected into the application that gives them access to the currently running database in Neo4j Desktop, as well as some other operations. What this means is that anyone can build these graph apps that can do any operation similar to what Neo4j Browser does, which is really interesting when you then see what’s possible from almost a low-code tooling perspective. I guess that’s kind of how I think of graph apps. So let’s look at some examples of the graph apps that are available within Neo4j Desktop, and some of these run independently outside of Desktop as well, similar to how Neo4j Browser can run in Desktop or outside and still connect to a database.
William Lyon (22:32):
If you look at some of these examples, I think you quickly start to see how useful these graph apps can be. One graph app that I think is really powerful is NEuler, or the Graph Algorithms Playground. NEuler gives you a really nice no-code environment to explore, compose, and execute graph data science and graph algorithms on Neo4j. It’s leveraging the Graph Data Science Library, which gives you lots of functionality for running these graph algorithms. It can be overwhelming initially to understand what algorithms are available in the library and what the different configuration options are, and so NEuler gives you this UI-driven environment for composing the different configurations for these algorithms, running them, and visualizing the results, which I think is really powerful.
William Lyon (23:28):
Another example of a graph app is what’s called GraphQL Architect. GraphQL Architect allows you to construct, develop, run, and query a GraphQL API locally, within the context of Neo4j Desktop, and it has some options for exporting that GraphQL API once you’ve built it and tested it. Another graph app is the Charts graph app, which allows you to do exactly what it says, and that is create charts. Oftentimes the answer to our question, even when working with graph data, is a more tabular result, and so charts can be a useful way to interpret and share that.
William Lyon (24:12):
So these are some of the graph apps that have been built internally within Neo4j under the Neo4j Labs program, which we’ll talk about in a bit more detail, but there are also some great community-built graph apps. I mentioned that anyone can build a graph app and run it inside of Neo4j Desktop, and we’ll be sure to also link some of the documentation for building graph apps. But I want to highlight just a couple of the community graph apps that are out there; there are lots of different ones. You can find all of these in the graph app gallery, which is available within Desktop, or just go to install.graphapp.io. We’ll link to that in the show notes.
William Lyon (24:55):
Two community graph apps that I want to highlight: one is GraphXR. GraphXR gives you this really interesting 3D data visualization tooling within Desktop, where you can render and explore your graph in a 3D, almost VR-type environment. And then the other graph app I want to highlight is called Neomap. Neomap allows you to visualize spatial data that you’ve stored in Neo4j. So we can store spatial data and some geometries, like points and lists of points to represent either a line segment or a polygon, and Neomap allows us to build layers. If you’ve used other GIS and map-building tools, you’ll know the concept of layers, where I’m defining what geometries I want to bring in, represent, and explore on the map. This is really useful for working with spatial data. So anyway, that’s the concept of graph apps, and we’ll be sure to link those all in the show notes.
Lju Lazarevic (25:58):
So we’ve reached the stage where we’ve got that database downloaded, we’ve set it up, and we’ve explored some data using either Browser or one of the graph apps. And now we’re starting to think about: how do we start working towards the project that we’re looking to build, what problem are we looking to solve, how do we get more control over what we’re doing? So let’s have a look at how we would connect to the database. You have a number of ways of connecting to a database. So let’s look at those.
Lju Lazarevic (26:32):
We have the Neo4j drivers, and these are language drivers that allow you to communicate with the database. The official drivers that we have, so these are for JavaScript, .NET, Java, Python, and Go, work on this idea of a common abstraction. So all of them have very similar phrasing in how you would go off and connect to the database; they all follow a very similar pattern. If you look at the syntax that you would use for JavaScript versus Java, you’ll notice they have similar naming conventions for the functions and so forth. And these are built on this idea of a session: what you do when you are connecting to the database with the drivers is you open up a session, you do your transactions, and you then close that session off.
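With the official Python driver, for instance, that session pattern looks roughly like this (the connection URI and credentials are placeholders, and this sketch assumes a Neo4j instance is running locally):

```python
from neo4j import GraphDatabase

# Create a driver instance (typically one per application)
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

# Open a session, run a query, and close the session
with driver.session() as session:
    result = session.run(
        "MATCH (p:Person) RETURN p.name AS name LIMIT $n", n=5)
    for record in result:
        print(record["name"])

driver.close()
```

The same shape, a driver, a session, and a run/close cycle, carries over almost verbatim to the JavaScript, Java, .NET, and Go drivers.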
Lju Lazarevic (27:24):
We also have a large number of community drivers. So our wonderful community members have gone away and written drivers for other languages as well. You will find community drivers for things like R, PHP, Ruby, and Julia, and you’ll also find community drivers for some of the officially supported languages as well. So you will find, for example, the official driver for .NET and a community driver for .NET. And most of these drivers run over what we call the Bolt protocol. This has a type system, and what it effectively does is manage how you connect to the database. So for example, if you have got a database cluster that you’re working with, you don’t have to worry about which database instance you connect to; the Bolt protocol is going to look after that.
Lju Lazarevic (28:18):
So if you’re sending a read or a write, it’s going to deal with that and figure out where that query needs to be routed to. That’s something you don’t have to worry about when you are using the drivers. And we also have object graph mappers. So you have object graph mappers for Java and Node, and also Spring Data Neo4j, which is hugely popular and used a lot in enterprise. And we also have another way of interacting with the database, which is GraphQL.
William Lyon (28:54):
If you’re looking at that diagram of the graph data platform we mentioned earlier, at this point, talking about drivers and APIs, we’re clearly up in this upper-left quadrant, where we’re talking about building applications with Neo4j; the audience that we’re interested in, the users at this point, are firmly developers. And so tooling like the Neo4j language drivers, some of the OGMs, Spring Data Neo4j, those projects enable us to build applications, and oftentimes we’re building an API layer, some application layer that we’re going to deploy that sits between the database and the client. And one way of building APIs these days is using GraphQL. And GraphQL I think is very interesting in the context of graph databases because there’s a lot of symbiosis there; it’s really a mutually beneficial relationship.
William Lyon (29:58):
One reason is that GraphQL makes this observation that your application data is a graph, and it allows the consumer of the GraphQL API to query that data as a graph. So you can think of a GraphQL query as almost a traversal through this data graph, specifying exactly the data from that traversal that should be returned. This is of course a naturally great fit with graph databases and the property graph data model. But also, with GraphQL, we often build these very nested queries that are, again, a traversal through the data graph. And so having a graph database back-end like Neo4j that’s actually responsible for executing those traversals on a property graph means that those queries are going to be very efficient and optimized by the database engine.
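To make the “query as a traversal” idea concrete, here is a hypothetical GraphQL query (the type and field names are invented for illustration):

```graphql
{
  person(name: "Alice") {
    name
    knows {          # one traversal hop
      name
      knows {        # a second, nested hop
        name
      }
    }
  }
}
```

A nested query like this can be resolved as a single traversal in the graph database, rather than the many separate lookups a non-graph back-end might need for each level of nesting.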
William Lyon (30:55):
So there is tooling as part of the graph data platform to help you build that GraphQL API layer that sits between the client and the database, taking advantage of this symbiotic relationship between GraphQL and graph databases. We won’t go into all of the details here, but it does things like helping you generate the API and taking care of a lot of the common boilerplate and performance issues you’d run into when building a GraphQL API.
So we’ve talked about, like I said, this upper-left quadrant, where we’re talking about building applications, where we’re developers and we need to build maybe that API layer. But the other half of this diagram of the graph data platform is more of the analytics use case, where the users are more likely to be data scientists or data analysts, or maybe even business users.
William Lyon (32:03):
So let’s talk about some of the tooling in that area of the Neo4j graph platform. I think one of the most interesting pieces of the platform in this area is the Graph Data Science Library. The Graph Data Science Library allows you to run graph algorithms in Neo4j. So these are things like centrality algorithms, such as PageRank, to find the most important nodes in the network, and things like community detection algorithms to find communities or clusters in the graph. And these also form the basis for more advanced techniques that feed into machine learning and artificial intelligence pipelines as well. So I think there are lots of really interesting things going on in the graph data science realm.
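Calling an algorithm from the Graph Data Science Library is done through Cypher procedures. A sketch of a PageRank call might look like this (the projected graph name and node/relationship types are made up, and the exact procedure names vary between GDS versions, so check the documentation for yours):

```cypher
// Project a named in-memory graph from the stored data
CALL gds.graph.create('myGraph', 'Person', 'KNOWS');

// Run PageRank on the projection and stream the results
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10;
```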
Lju Lazarevic (32:56):
Another really exciting element that we have sitting in this quadrant is Neo4j Bloom. Neo4j Bloom is another graph app, which we talked a bit about earlier, and it is a graph visualization tool. It allows you to explore your data using natural language. What do I mean by that? If you have a good understanding of your data model and your domain, so you have a good understanding of the data that you’ve got in the database and how that’s been set up within the use case you’re working with, you don’t really need to know any Cypher to be able to go off and investigate your data. What Bloom will do is go away and investigate the data that you’ve got in your database; it does some sampling, and it’ll pick up things such as labels and relationship types.
Lju Lazarevic (33:51):
And it has lots of flexibility around how it’s interpreted. So you can just go away and type an approximation of the label or property that you’re looking for, and it just allows you to really easily explore data. And this is great if you are an analyst: you understand your data, and you can go off and start to investigate the data and the patterns. It’s a nice visual aid that allows you to do that. And what’s really powerful about Bloom is that as you investigate these patterns, you have this concept of a perspective. What you can do within your Bloom environment is customize it, doing things such as setting icons for your nodes. So if you’ve got person nodes, you can put a picture of a head and shoulders on there. If you’re looking at financial institutions, maybe you can put an icon of a bank. You can color-code things, so you can pick what makes the best sense to describe what’s going on.
Lju Lazarevic (34:54):
What you can do as well is, if there’s a common pattern that you keep putting in as a Bloom phrase when you’re investigating data, you can then turn that over to your database experts, who can convert it into what we call a search phrase. What they do is take that query and parameterize it, and then they can give you this easily accessible search phrase. For example, “find interesting cases” could be your search phrase, and behind it sits a defined query that Bloom will run for you and bring back results. And what you can do with this perspective is share it. So if there is a group of you within a department and you’re all looking at, for example, fraud investigation, you will all have the same perspective to work off, with the same search phrases, and you can share your Bloom phrases for any discoveries that you have found, and so forth.
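Under the hood, a Bloom search phrase is essentially a parameterized Cypher query written by a database expert. A hypothetical “find cases above threshold” phrase might be backed by something like this (labels, properties, and the parameter name are all invented for illustration):

```cypher
// Backing query for the search phrase "find cases above $threshold"
MATCH (c:Case)-[:INVOLVES]->(a:Account)
WHERE c.riskScore > $threshold
RETURN c, a
```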
Lju Lazarevic (35:55):
Now, what makes Bloom extra powerful is that it works really well with graph data science, and you have the option to further tailor your perspective. You can do things such as PageRank, as you spoke about earlier, where you’re looking for influential nodes, and then you can size your nodes based on the number that is generated. So the more influential a node is, the bigger that node could be in your visualization, and so forth. So it lends itself wonderfully to graph data science: being able to do things such as displaying hierarchies, if you’re bringing back hierarchical data as part of some analysis, the fact that you can size nodes, the fact that you can color nodes based on various queries. It works really, really well together.
William Lyon (36:47):
So Neo4j does not exist in a vacuum, right? We also need to think about how Neo4j fits into the larger existing architectures that we have in place. And this is where the Neo4j Connectors can be really helpful and come into play. There are three specific connectors that I think would be interesting to talk about. The first is the Neo4j Connector for Business Intelligence. This is designed to enable Neo4j to be used with business intelligence, or BI, tooling. These are things like Tableau, Looker, those sorts of tools. The way they work is by generating SQL statements. So in Tableau, I may go and configure a bar chart or a line chart or some other sort of chart to be rendered, and Tableau is going to generate some SQL statements to go fetch that data from the database.
William Lyon (37:46):
So the Neo4j connector for business intelligence is actually translating the SQL statements generated by tools like Tableau, translating that into Cypher so the database can then return the data needed to render that visualization. So this is really, really useful if you are using tools like Tableau or other BI tooling already and you want to add data from Neo4j into maybe those dashboards that you’re building.
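To illustrate the translation step just described, here is a hypothetical example (the table, column, and relationship names are invented, and the actual SQL a BI tool emits will differ):

```cypher
// A BI tool might generate SQL along these lines:
//   SELECT c.name, SUM(o.total) AS total
//   FROM Customer c JOIN Orders o ON o.customer_id = c.id
//   GROUP BY c.name;
// The BI connector would translate it into Cypher roughly like this,
// so the graph can serve the same tabular result:
MATCH (c:Customer)-[:PLACED]->(o:Order)
RETURN c.name, sum(o.total) AS total;
```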
Another connector that is really, really useful is the Neo4j Connector for Apache Spark. So Apache Spark is this framework for distributed data processing, and the Neo4j Connector for Apache Spark allows us to read and write data to Neo4j from Spark jobs. This is especially useful where we have very, very large data sets that we’re performing some processing or some ETL process on, and we can do that in a distributed fashion to transform and then load that data into Neo4j. Really, any other way that we would think about interacting with Neo4j and Spark can also be done using the connector.
William Lyon (38:57):
And the third connector that I want to mention is the Neo4j Connector for Apache Kafka. So Kafka is a distributed event streaming platform. With Kafka, we have publishers that are publishing events to these things called topics, and then consumers that are subscribing to topics, consuming events, and taking action as those events come in. Now with the Neo4j Connector for Apache Kafka, we can use Neo4j to subscribe to topics, to be a consumer of events in Kafka, or also to publish, emitting events as changes happen in the database.
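As a sketch of the consumer (sink) side described above, a Cypher template can be attached to a topic so that each incoming message is applied to the graph. The topic name and message fields here are hypothetical, and the exact property keys may differ between versions of the connector, so treat this as illustrative configuration rather than a definitive setup:

```properties
# For each event consumed from the 'transactions' topic, run the Cypher
# template with the message bound to the `event` variable:
streams.sink.enabled=true
streams.sink.topic.cypher.transactions=MERGE (a:Account {id: event.accountId}) MERGE (b:Account {id: event.beneficiaryId}) MERGE (a)-[t:SENT]->(b) SET t.amount = event.amount
```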
William Lyon (39:46):
So what are some interesting use cases here for working with Neo4j and Apache Kafka? Well, I think things can be really powerful and interesting when you combine graph data science and some of those graph algorithms with Kafka. So for things like fraud detection, I’m running the graph algorithms as data is coming into the system, and when I’ve flagged a suspicious transaction as fraudulent, I can then publish an event that goes out into Kafka and alerts the analytics team, who can then dive into more detail. Real-time root cause analysis is also interesting, real-time recommendations, right? So any sort of streaming, real-time process can be something interesting to take a look at using the Neo4j Connector for Apache Kafka.
William Lyon (40:37):
Kafka has this concept of the stream-table duality, which is a way of thinking about streaming data and databases and how those work together. Essentially, the stream-table duality makes the statement that I can transform any stream into a database table, and any table in a database I can also think of as a stream. And what the Neo4j Connector for Apache Kafka does is really extend this stream-table duality into, I guess, a trinity, you could call it, where graphs come into play. It really makes the observation that, well, if I can think of any database table as a stream, I can also project that stream into a graph, right? So that sort of completes the trinity there. So that’s a helpful way to reason about working with streaming data in the context of graphs when we have other systems that are also working with these streams.
Lju Lazarevic (41:41):
We’ve looked at the connectors, we’ve looked at ways you can interact with the data, we’ve looked at ways you can build understanding from the data. And now we’re going to look at one of the places where many of these ideas were sourced from, and this is Neo4j Labs. What Neo4j Labs is, is this space where we go and have a look at experimental integrations. And this is all about thinking about: what are the latest trends coming up, what are the new things arriving in technology, and what is the graph story with them? We look to determine what that graph story is and build out extensions and so forth to test it. That is the main driver behind Neo4j Labs, and we can look at some of the example projects that we have within Neo4j Labs.
Lju Lazarevic (42:35):
So for example, we’ve got APOC, which stands for Awesome Procedures On Cypher. This is probably the most commonly seen project, which has now moved into Labs, and it’s a set of some 400 to 500 procedures and functions. And this is everything from interesting ways of manipulating your data, so you have a lot of helper functions with regards to refactoring your graph, to rapidly bringing data in, to being able to build your own custom Cypher procedures and functions, which is extraordinarily powerful, to things such as working with text, if you want to do some kind of text cleanup, fuzzy matching, and so forth. So it’s a very complete set of functionality in there, extraordinarily powerful.
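As a small taste of the data-loading and text-cleanup helpers mentioned, here is a sketch using two real APOC calls; the URL and JSON fields are hypothetical:

```cypher
// Fetch a JSON document and create or update a node per record.
CALL apoc.load.json('https://example.com/people.json') YIELD value
MERGE (p:Person {id: value.id})
// apoc.text.clean lowercases the string and strips non-alphanumeric
// characters, which is handy before fuzzy matching.
SET p.cleanName = apoc.text.clean(value.name);
```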
Lju Lazarevic (43:31):
Another project as well which is very, very popular is the Neosemantics project. This allows you to convert RDF, the Resource Description Framework, into property graphs and back again. RDF is extraordinarily popular for ontologies. So if you’re looking to validate your data, if you’re looking to serialize it, if you’re moving it around the web, RDF is extraordinarily useful for that, and there will be reasons why you want to keep it in that format. But it is also very useful to put it into a property graph, and this is a really handy tool that allows you to do that and transfer back and forth.
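A minimal sketch of the RDF-to-property-graph direction, assuming the Neosemantics (n10s) plugin is installed; the RDF URL is hypothetical:

```cypher
// Neosemantics requires a uniqueness constraint on resource URIs,
// then a one-time graph configuration, before importing RDF:
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
CALL n10s.graphconfig.init();
// Fetch Turtle-serialized RDF and materialize it as a property graph:
CALL n10s.rdf.import.fetch('https://example.com/ontology.ttl', 'Turtle');
```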
William Lyon (44:16):
The last Labs project that I think would be interesting to talk about is GRANDstack. That’s GraphQL, React, Apollo, and Neo4j Database. I think it’s really helpful, especially for full-stack developers who have a lot of different technologies to reason about as they’re building their application, to have this sort of stack. I wouldn’t call GRANDstack a framework; I would call it a series of technologies that work well together, with some opinionated tooling around them, maybe, as a way to think of it. But I think it’s helpful for full-stack developers to have these sorts of stacks, mostly just to reason about how the pieces fit together, and then also to leverage some of the integrations between technologies that are out there, combining them into a more powerful full-stack starting kit, almost.
William Lyon (45:16):
So GRANDstack is sort of that for building applications, leveraging the Neo4j GraphQL integration. React has some great tooling for using GraphQL as a data source, so I think that makes a lot of sense, and then Apollo is tooling on both the server and the client for building GraphQL APIs and, again, working with GraphQL data in the client. So GRANDstack is sort of the best-in-show combination of these technologies, with an opinionated way of how to get started with them together. The easiest way to get started with GRANDstack, I think, is to use the create-grandstack-app command line tool, which will scaffold out a full-stack application using all of these components together.
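To make the Neo4j GraphQL integration concrete, here is a sketch of the kind of type definitions a GRANDstack app starts from; the type and relationship names are hypothetical, and the directive syntax shown follows the neo4j-graphql-js style of the time:

```graphql
# From definitions like these, the integration auto-generates a GraphQL
# API whose queries are translated into Cypher against the database.
type Movie {
  title: String!
  actors: [Actor] @relation(name: "ACTED_IN", direction: IN)
}

type Actor {
  name: String!
  movies: [Movie] @relation(name: "ACTED_IN", direction: OUT)
}
```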
William Lyon (46:05):
And you can find out more about any of the Labs projects, including several that we didn’t mention, on neo4j.com/labs, which we will, again, link in the show notes. So that’s been kind of a whirlwind overview, I guess, of the Neo4j Graph Data Platform. Hopefully that wasn’t too overwhelming, and you saw some aspects that may be relevant for areas or tooling that you’re interested in. If not, and you just want a very easy way to get started and see how to use Neo4j in maybe a bit more of a guided environment, I think the Neo4j Sandbox is a great way to dip your toes in the water of the larger Neo4j Graph Data Platform.
William Lyon (46:56):
And so we’ll link the Neo4j Sandbox in the show notes as well. Sandbox allows you to spin up a Neo4j instance preloaded with some data, along with sort of guided Cypher queries to start working with and exploring the data. You can choose from different datasets and different use cases, and even choose some personalized data, like your own Twitter data. You can sign in with Twitter and we’ll import some data specifically from your Twitter network and show you how to query and visualize that as a graph. Sandbox also includes things like the Graph Data Science Library and some of those graph apps that we mentioned, Bloom as well. So that, I think, is a great way to get started with the graph platform.
Lju Lazarevic (47:40):
Brilliant. And I think it’s probably worth mentioning quickly as well that if you get stuck at any point during your journey, do check out our community forum. That’s community.neo4j.com. Please let us know if you get stuck on something; post there, as we have an amazing community who come and talk about their experiences and can perhaps guide you as well. But also, if you do something fun, we want to hear your graph story. What projects are you working on, what are you experimenting with, what fun thing have you learned? Please put it up there, we’d love to hear about it.
William Lyon (48:18):
Absolutely. Well, I think that is all we have to say for today about the graph platform and we will see you next time. Cheers.