GraphStuff.FM: The Neo4j Graph Database Developer Podcast

2022 Recap: Highlights From The Neo4j Graph Community

Episode Summary

Developer Advocates Will Lyon and Jason Koo recap major news, product updates, and notable community contributions from 2022. This episode is broken up into three parts: first covering topics of interest for beginners - those new to graphs and graph databases. The second section highlights news for intermediate graph database practitioners. And the third section presents noteworthy items for advanced graphistas.

Episode Notes

Episode Transcription

William Lyon (00:00):

Welcome to GraphStuff.FM, your audio guide to the graph technology landscape with a focus on Neo4j. My name's Will and I'm joined in the studio today by our co-host Jason. Hey, Jason.

 

Jason Koo (00:13):

Hey, Will. Hi everyone, my name is Jason Koo, I am a developer advocate like Will, and welcome back to a whole new year, a new season of GraphStuff. In this episode, we're going to structure it in a particular way since we have a lot of stuff to cover here. So we're going to look back at a lot of the stuff that came out and happened in 2022, and at a high level talk about new Neo4j releases, the NODES and GraphConnect conferences and events, and other community contributions that have occurred. To break up all this content we'll basically break up this podcast into thirds, right? The first third will cover beginner topics, the middle will cover intermediate, and then lastly we'll cover advanced topics. We'll see how that goes.

 

William Lyon (00:52):

Cool, should be fun. Let's get started talking about the earlier stages of your path to discovering graphs and Neo4j, starting with the best way to really get started with graphs in Neo4j, which is the AuraDB free tier. So AuraDB is Neo4j's managed database service so you can spin up Neo4j instances. What I like about Aura is there are three tiers to Aura. There's the free tier, where you don't even need to put in a credit card, you just get up and running; the professional tier, which is what you use when you have an app that you're ready to deploy and you need a Neo4j for your application in production; and then there's the enterprise tier which, of course, is for the giant mission-critical enterprise application that's a bit more customizable and has some additional enterprise features.

 

(01:50):

For that initial getting started with Neo4j, I think Aura is the best way to go about that, and specifically the free tier of Neo4j Aura. One of the things that was new in 2022 for Neo4j Aura is the ability to start with sample data sets. So this was something that you may have seen previously in Neo4j Sandbox, being able to spin up a database and choose from a handful of sample data sets with some guides that walk you through how to work with and explore the data. This was added to Aura this year so there's a handful of those in there. You can spin up things like the movies dataset, the Stack Overflow dataset, things like this.

 

(02:30):

The other thing that was new in Aura land in 2022 was the GA launch of AuraDS, which is a flavor of Aura geared around data science workflows. So AuraDS includes the Graph Data Science library, which we're going to talk about later on today. Graph Data Science is basically a way of doing graph algorithms and graph analytics in Neo4j. And so now you have that tooling available to you in Aura, in the cloud, with your Neo4j cluster configured in a way geared for working with data science. That's a bit different than application development, which is what you see in AuraDB. So in addition to the three tiers in AuraDB, you have two flavors of Aura, AuraDB for application development and AuraDS for data science, so it was really cool to see that come out. Now, one thing that is really valuable, especially when we're working with data science, is being able to visualize the results of our analysis. For a while, Neo4j Bloom has been that tool. What was new in Bloom in 2022?

 

Jason Koo (03:44):

For those who haven't used Bloom, Bloom came out in early 2019. Right at the end of 2021, 2.0 came out, and 2.0 came with quite a number of features. I believe GDS algorithms could be run in Bloom at that time, but it wasn't until I think June of 2022, so last year, that the GDS panel was added so that you could more easily and visually use the Graph Data Science algorithms and see the results. One thing to note about the GDS integration is that when you run an algorithm you don't immediately see its effects, so you have to either click into nodes or relationships to see the scores that were added from the algorithmic outputs, or you turn on the conditional formatting so that you specify how nodes inside the Bloom UI tool change shape or change color based on the algorithmic scores that are handed to it.

 

(04:39):

And so 2.0 came out at the end of '21, and now that we're into '23 I think version 2.6 came out just before the end of the year. 2.5 came in October, 2.6 I think came out just after the New Year's. The biggest thing I think is probably that GDS integration, which allows you to use seven of the Graph Data Science algorithms. Those algorithms are broken up into two broad categories, centrality and community detection algorithms. You can run things like PageRank, finding out which of your nodes are more central to whatever your problem space is or further out on the edges.
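
To give a feel for what a centrality score such as PageRank measures, here is a minimal pure-Python sketch. It is a toy power iteration over a made-up edge list, not the Graph Data Science implementation; the function name, the damping value, and the sample graph are all assumptions for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Toy PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank over every node.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini graph: a, b and c all point at "hub"; "hub" points back at "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
scores = pagerank(edges)
most_central = max(scores, key=scores.get)
print(most_central, scores)
```

Nodes with many incoming links from well-ranked nodes end up with the highest scores, which is the kind of value Bloom's conditional formatting could then map to size or color.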

 

(05:17):

To get the GDS integration working, if you use AuraDS, right out of the gate it's working for you, right, you don't have to do any configuration, you can just go ahead and use it. But if you are running Neo4j in the desktop application, then you have to first turn on the GDS plugin and then you have to go to the neo4j.conf file, do a few updates, and then you can use Bloom's GDS panel to actually work some magic. With all the updates to Bloom, there are a number of videos that have been produced throughout the year, but there are a lot of other beginner-related videos that may be of interest to you. Will, do you have any favorite videos in the beginner category?

 

William Lyon (05:53):

I think my most concise sort of intro to Neo4j video is definitely Fireship's Neo4j in 100 Seconds video. If you're not familiar with Fireship it's a very popular YouTube channel that covers lots of different technologies in a very compelling and very visual way so it's a way to level up your skills learning about new technologies with really short videos. Definitely worth a subscription to that YouTube channel. But they did a Neo4j in 100 Seconds video that touched on what is Neo4j. What does the query language Cypher look like? What does some of the developer tooling look like? Graph algorithms in Neo4j. There's one with visual elements in it but then also that ... It's only 100 seconds so it's a great video to share with folks if you're trying to explain what exactly is Neo4j.

 

(06:41):

For a deeper dive, especially into understanding why Neo4j is so powerful: How does Neo4j achieve things like index-free adjacency? How does Cypher work under the hood, so to speak? For that there's the Under the Hood series that was done by Chris Gioran, who's chief architect at Neo4j. And this is a YouTube series. We'll link all these in the show notes. You can find this one on the Neo4j YouTube channel. It does a really good job of going, as it says, under the hood in a lot of depth to understand how Neo4j really makes these optimizations. How is Neo4j so powerful in giving you that super-performant graph traversal?

 

(07:24):

Another really interesting video that we saw in 2022 was the NODES keynote. So NODES is the Neo4j Online Developer Education Summit, an online conference for leveling up Neo4j skills and learning about other things that people are doing with Neo4j. And the keynote was given by Yale Professor Nicholas Christakis, who has written a lot about graphs. He was a co-author of the book Connected, which talked a lot about how graphs influence and are found in everyday life. And this keynote, while it's not about Neo4j, I think was really compelling in showing how graphs exist in and influence the society around us. He talks about some of the papers and experiments that his lab has done in recent years and how graphs and social networks influence human behavior. That was really compelling. So we'll link that in the show notes, but it's definitely worth checking out as well.

 

(08:28):

In addition to talking about interesting videos that you might want to check out, I want us to highlight at least one question from the Neo4j community forum for each of these sections. If you're not familiar, the Neo4j community forum allows you to post questions or share interesting projects that you're working on, and it's an online forum for folks to respond and get discussions going around that. I took a look at the most viewed and most favorited posts in the community forum in 2022. In the beginner category, I thought this one was really interesting. So the title is How to Get All Connected Nodes and Relationships of a Particular Node. And the thing that I want to highlight in this question, that I think became really apparent, is, of course, the power of Cypher. So if you're not familiar, Cypher is Neo4j's query language. We draw these ASCII art representations of the graph pattern that we're looking for, which is one of the compelling pieces of working with Neo4j technology.

 

(09:33):

But what was really powerful in this discussion was the use of the variable length path operator. If you're familiar with how we construct patterns and paths in Cypher, and specifically relationships with square brackets, there's some syntax you can use. Basically an asterisk and then a lower and upper bound on the number of relationships, the number of hops to follow. Using that variable length path operator allows you to specify very complex patterns and search for connections when you don't know how many relationships to follow. So that was a cool one. We'll link the question and also the Neo4j community site in the show notes.
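
As a rough illustration of that syntax, the sketch below builds a Cypher query string with a variable length relationship pattern. The helper names, the Person label, the name property, and the bound of three hops are hypothetical choices for the example, not taken from the forum question.

```python
def variable_length_pattern(rel_type=None, min_hops=1, max_hops=None):
    """Build a Cypher relationship pattern such as [:KNOWS*1..3]."""
    type_part = f":{rel_type}" if rel_type else ""
    upper = "" if max_hops is None else str(max_hops)
    return f"[{type_part}*{min_hops}..{upper}]"


def connected_query(label, key, max_hops):
    """Cypher that finds every path of 1..max_hops relationships from one node."""
    rel = variable_length_pattern(max_hops=max_hops)
    return (
        f"MATCH path = (start:{label} {{{key}: $value}})-{rel}-(other) "
        "RETURN path"
    )


# Hypothetical label and property; pass {"value": "Alice"} as query parameters
# when running this through a Neo4j driver.
query = connected_query("Person", "name", 3)
print(query)
```

Keeping an upper bound on the number of hops is usually a sensible default, since an unbounded pattern on a densely connected graph can match a very large number of paths.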

 

Jason Koo (10:11):

Cool. For those of you who are very comfortable with or already aware of the benefits of using a graph database, and Neo4j in particular, you may still be in the process of trying to decide how to talk to your team or maybe a boss about the benefits of using a graph database on a current or future project. One thing I thought was exciting, and is worth noting, is that last year Gartner, for the first time, added a graph database vendor, Neo4j, to its Magic Quadrant for Cloud Database Management Systems, which was the first time any graph database vendor was added to, I believe, any Magic Quadrant. I'll have to double-check on that. But for cloud database management systems, that was definitely the first time. We'll put a link to this in the show notes as well. That says a lot when Gartner, which does such a great job of tracking so many different technologies, acknowledges that the graph database space, right, the vendors and the services, is now mature enough to include in their sort of top rankings.

 

William Lyon (11:12):

Cool. So at this point, you've written some Cypher, maybe you've used Bloom to explore and visualize the data set, you've played around with some of the sample data sets available in Neo4j Aura, and you're ready to start leveling up. We're going to call this the intermediate section of the podcast. Okay, where do I go from here? I've worked through some of the guides, some of the sample data sets, how do I level up? I think the next place you turn is GraphAcademy, which is Neo4j's online course platform. There are lots of different courses covering lots of different topics both on application development and data science with Neo4j. In 2022, there were at least eight new courses that were added to GraphAcademy. I saw there was a new one this week. I feel like there's a new course coming out now almost every week so this is a really cool platform to work through, get your hands dirty. It has a really neat integration with Neo4j Sandbox.

 

(12:13):

Some of the courses that were introduced in 2022 were the Building Neo4j Applications series. So we saw one of these for Java, Go, Python, .NET, and Node.js. What I think is interesting about this course is you're building a movie web application called Neoflix, and the idea basically is to build the backend API for this web application. So you start with the UI for it and you need to see how to use the Neo4j drivers to build the backend for this movies web app, which is pretty cool. There are some data science courses, and there's a Neo4j Graph Data Science certification course so you can get certified in Neo4j Graph Data Science through GraphAcademy. Definitely check out GraphAcademy, there are new courses coming. I'm aware of a few in the pipeline that I've seen previews of so lots of exciting stuff there.

 

Jason Koo (13:09):

Cool. Another update that came out on the UI side, besides things like Bloom, is Workspaces. Prior to [inaudible 00:13:17] of last year, the default interface when you signed into AuraDB or AuraDS was that you would have cards, right, that represented each of your DBs. You had the option of jumping into one of three tools, right? You could jump into the Neo4j Browser, where you could enter Cypher commands and interface with the database directly. You could kick off Bloom as one of those tool options. And then the third was turning on the Data Importer, which I'll talk about in a moment here. But after December, the default new interface is a single workspace, so you can still access each of those tools but it puts it all under one unified UI umbrella. The idea is that you only have to log into the Workspace tool once instead of into each individual tool separately. It just makes for a more streamlined workflow.

 

(14:04):

If you did have an older database that you had started prior to December, then the base interface will still be that older classic interface. To update it, you go to your account dropdown, which is basically your profile icon, and when that dropdown appears you'll have the option of switching over to the Workspaces interface or staying with the classic. And I believe you can jump between the two if you so desire, but that is the default now. If you want to see more details regarding Workspaces in AuraDB, John Stegeman did a great NODES Intro to Workspaces video on it so we'll link that in the show notes as well.

 

(14:40):

Okay. The Data Importer, which is one of those three tools that comes with Aura, is a tool that came out last year and it is a no-code interface for importing data into a Neo4j instance. So prior, you would use either the APOC commands or you would use a set of Cypher commands to load CSV data into your Neo4j instance. So it allows you to drop comma-separated files or TSV, tab-separated files, into this web app and then connect it using a graphical interface which is basically an iteration of the Arrows.app, right? So if you haven't used Arrows.app before, it's a publicly available graphical application that allows you to quickly data model a graph database. It allows you to quickly put nodes and relationships down and then to visualize the whole thing. Once you're done you can export that data as a JSON file or export a PDF or an image of it but you can't convert that into a graph database.
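
For comparison with the no-code route, here is a minimal sketch of the kind of Cypher LOAD CSV statement you might otherwise write by hand, assembled in Python. The file URL, the Person label, and the id column are hypothetical, and a real import would usually also need constraints and type conversions.

```python
def load_csv_statement(file_url, label, key_column, tab_separated=False):
    """Build a LOAD CSV statement that MERGEs one node per row (hypothetical schema)."""
    # TSV files need an explicit field terminator; CSV uses the default comma.
    terminator = " FIELDTERMINATOR '\\t'" if tab_separated else ""
    return (
        f"LOAD CSV WITH HEADERS FROM '{file_url}' AS row{terminator}\n"
        f"MERGE (n:{label} {{{key_column}: row.{key_column}}})\n"
        "SET n += row"
    )


statement = load_csv_statement("file:///people.csv", "Person", "id")
print(statement)
```

Note that LOAD CSV reads every field as a string, so numeric columns would still need functions such as toInteger() in a real script.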

 

(15:40):

So what the Data Importer does is build on top of that. You can do this graphical modeling and tie those nodes and relationships and their properties to data from those CSV or TSV documents. Okay. So that's the data import tool in a nutshell. Okay. One thing that is not part of Workspaces, maybe in the future, would be GraphQL. Will, can you tell us more about what came out in the GraphQL space last year?

 

William Lyon (16:04):

So now you've got your data loaded in Neo4j, you're working on developing your application. A great way to expose that data to your application is with a GraphQL API. GraphQL is this API query language that allows us to work with our application data as a graph so, of course, it makes sense to be building your GraphQL APIs with Neo4j. To help with that, the Neo4j GraphQL Library has been around as a GraphQL integration for a while now. But in 2022 there were some really interesting developments in the world of Neo4j GraphQL. One of those, similar to some of the ideas in Workspaces and the Data Importer, which are low-code tools, was the introduction of the Neo4j GraphQL Toolbox. Now, this is still in beta but it's available out there. I'll drop a link to it in the show notes.

 

(17:00):

This is a web application. You point it at a Neo4j database and it will generate a fully functional GraphQL API just by inspecting your database. And this is powered by the Neo4j GraphQL Library, which gives you that functionality for a Node.js GraphQL server application. GraphQL Toolbox then allows you to edit and update your GraphQL schema, doing things like adding custom functionality with Cypher, these sorts of things. But really it's a tool for developing and testing your GraphQL API while writing absolutely zero code. So that's a pretty exciting development.

 

(17:42):

The other big feature added to the Neo4j GraphQL Library this year was support for subscriptions. So subscriptions are GraphQL's pub/sub or real-time event publishing mechanism, and subscriptions allow you to build applications around messaging or real-time collaboration, these sorts of things. There was a great demo of this at GraphConnect. We mentioned GraphConnect in passing, but we'll say that GraphConnect was our in-person conference in June in Austin. Lots of really great talks at GraphConnect. The videos are also available, and we'll drop a link to those in the show notes as well.

 

(18:25):

Around GraphConnect, we also had an online hackathon in the time leading up to GraphConnect that was focused around Code-Golf. You may have heard of Code-Golf before as this idea of being able to write a program using the smallest number of lines or the smallest number of characters. So we thought it would be fun to take that idea of Code-Golf and apply it to Cypher and Neo4j, but rather than just writing the shortest queries, focusing on optimizing queries, so query tuning. The way that we do this is by looking at a metric called DB hits. So if you've ever looked at the query plan for a Cypher query or profiled a Cypher query after it ran, you were trying to optimize and tune the query. The metric that you're trying to optimize typically is called DB hits, which is basically the number of database operations that a given Cypher query does to give you an answer to your question.
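
To make the DB hits metric a little more concrete: prefixing a Cypher query with PROFILE returns a plan in which each operator reports its own DB hits, and the total for the query is the sum over the whole operator tree. The sketch below assumes the plan is available as a nested dictionary with `dbHits` and `children` keys; that shape, and the sample numbers, are assumptions for illustration rather than a documented format.

```python
def total_db_hits(plan):
    """Sum dbHits over a profiled plan and all of its child operators."""
    hits = plan.get("dbHits", 0)
    for child in plan.get("children", []):
        hits += total_db_hits(child)
    return hits


# Hand-written plan in the assumed shape; the operators and counts are invented.
sample_plan = {
    "operatorType": "ProduceResults",
    "dbHits": 0,
    "children": [
        {
            "operatorType": "Filter",
            "dbHits": 120,
            "children": [
                {"operatorType": "NodeByLabelScan", "dbHits": 61, "children": []}
            ],
        }
    ],
}

print(total_db_hits(sample_plan))  # 181
```

Comparing this total before and after rewriting a query is one simple way to check whether a tuning change actually reduced the work the database has to do.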

 

(19:28):

So the Code-Golf Hackathon, this was really fun. We took a Stack Overflow dataset, so users, questions, answers, tags, and modeled that as a graph. By the way, this is a dataset that's also available in Aura as one of those sample datasets I mentioned. But the idea was to take that Stack Overflow dataset and ask questions of it using the fewest number of DB hits and writing the shortest Cypher query. So we gave away, I think over $27,000 was the prize pool for this, across three different tiers. That was super fun. Now, I mentioned GraphConnect and I said some of the videos were available. Jason, you want to tell us a little bit more about some of those GraphConnect videos and your favorites?

 

Jason Koo (20:10):

Yes, definitely. Now fortunately there were quite a few sessions at GraphConnect that related to Code-Golf. The first one I'm going to mention is the Top 10 Cypher Tuning Tips & Tricks presented by Michael Hunger, so that's a good one. If you're just getting started with Cypher, or you've been doing it for a little while and you're wondering why is my query taking so long or why is it returning too much data, this is a great session on just best practices and things to watch out for. PROFILE and EXPLAIN are covered so that you yourself can figure out how to improve your own queries.

 

(20:41):

Another good session that came out of GraphConnect was on how to import JSON using Cypher and APOC. This is really good if you don't want to use, say, the Data Importer and you want to do more script-based importing of data into your Neo4j instance. One of the first things that everyone is going to want to do is import data into their live database, so this is a great recording to go to and take a look at. And that was done by Eric Monk. So we will post a link to that in the show notes as well. The show notes are going to have just a giant heap of links. We might have to add a table of contents to the whole thing later.

 

(21:15):

Okay. Another recording of interest is the Neo4j Driver Best Practices by David Allen. This one is really good for understanding, if you're curious to know, how the driver works under the hood in many ways. But even outside of that, there are certain commands you may run against the driver expecting some result, and you may be wondering why the result is slightly different from what you initially thought. I definitely recommend this recording, especially if you're coming from Python, because I found David's session to be highly insightful on A, how to get a little more performance and B, how to avoid some very common pitfalls that we see with implementing the driver, at least in the Python space, and I assume it's also true on some of the other platforms.

 

(21:59):

Two more videos I'd like to mention. First is Michael Hunger's and Alex [inaudible 00:22:04]'s entire Discovering Aura Free with Fun Datasets series. Pretty much I think every two weeks they get on, they pull some dataset from someplace, right, Kaggle or some public dataset that's available, and pull it into a Neo4j instance. Oh, they do some data modeling, of course, beforehand. And through this whole process, right, Michael and Alex are talking about what the end database is going to show them. But they go through the process of pulling in that dataset, and cleaning it up, and then running a number of different Cypher commands on it. And it's always a very organic experience, right? They don't go ahead of time and clean everything up to make this sort of clean-running session, right, it's basically a recorded Twitch session.

 

(22:44):

And there's always some road bump in either importing the data or cleaning it up or something, and they always work through it. They keep running into these technical speed bumps and it's fun to watch them noodle around it. If you're writing your own Cypher or you're working on projects, you will undoubtedly encounter something too, right, some data that's formatted in some weird way. Seeing how they noodle through it is hugely beneficial.

 

(23:08):

The last video I'll mention in our intermediate category is Exploring Graphs Visually with Jupyter Notebooks which was a NODES presentation by Sebastian Muller who's the CTO at yWorks. So if you're a heavy Jupyter Notebooks user I think you'll find this plugin that they created hugely useful. It's very visually appealing, it seems very simple to use, and I totally recommend that. If you're using Jupyter Notebooks and you want to visualize graphs, that is a great starting place.

 

(23:36):

Moving on to community blogs. One that I would like to highlight is written by Sixing Huang. The blog that Sixing wrote was on a companion plant knowledge graph in Google Sheets and Neo4j. Which at first glance is, wait, what? We're doing a knowledge graph on plant knowledge? Yes, it is exactly that. If you're into plants, or whatever the scientific word for plants is, if you're interested in that space, then yes, it is a great piece regarding that. If you're not, like myself, the part that I found most interesting was his combination of using Google Sheets and Neo4j to glean interesting data. We'll put this link into the show notes as well, definitely check that out.

 

(24:17):

Another reason I really wanted to highlight this article is that Sixing writes quite a few articles, at least on Medium that I've noticed, and he is one of our Neo4j Ninjas. And the Ninja program, if you haven't heard of it before, is basically a cadre of community folks who are very knowledgeable about Neo4j and graph databases. They often write about projects they're working on or give insights into different graph-related technologies. They're often in Discord or on social networks or on Stack Overflow answering lots of questions, they're running meetups, and they are really community champions, right? And so we recognize them by inviting them to our Neo4j Ninja program, which you can apply for, and I think we will add that link in as well.

 

William Lyon (25:09):

For our intermediate Neo4j community question, one of the highest viewed questions had to do with geocoding. It's called error when trying to invoke Cypher procedure apoc.spatial.geocodeOnce. I thought this was an interesting question because it's first of all highlighting APOC, which is the standard library for Cypher. It adds a number of procedures and functions to give enhanced functionality to Cypher. Lots of things around data import, working with different data sources, these sorts of things. And one of the procedures available in APOC gives you the ability to geocode addresses or information. Geocoding is basically taking a description of a physical place like an address or the name of a country, the name of a city, something like that, and converting that into latitude and longitude. Now, geocoding typically uses an external service, and the APOC procedure by default uses OpenStreetMap's Nominatim API. You can also configure it to use the Google Geocoding API and a couple of other options as well. And so this one ended up being a configuration issue.
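
Here is a minimal sketch of how a call to that procedure might be wrapped from Python. The `run_query` callable is a hypothetical stand-in for whatever executes Cypher in your application (for example a thin wrapper around a driver session), and the fake implementation below exists only so the example runs without a database, APOC, or network access.

```python
GEOCODE_QUERY = (
    "CALL apoc.spatial.geocodeOnce($address) "
    "YIELD latitude, longitude, description "
    "RETURN latitude, longitude, description"
)


def geocode(run_query, address):
    """Return (latitude, longitude) for an address, or None if nothing matched.

    run_query is any callable taking (query, parameters) and returning a list
    of dict rows; it is an assumption of this sketch, not a driver API.
    """
    rows = run_query(GEOCODE_QUERY, {"address": address})
    if not rows:
        return None
    return rows[0]["latitude"], rows[0]["longitude"]


# Stand-in for a real database call; the coordinates are made-up sample values.
def fake_run_query(query, parameters):
    return [{"latitude": 30.27, "longitude": -97.74, "description": "Austin, Texas"}]


print(geocode(fake_run_query, "Austin, TX"))
```

Passing the address as a query parameter rather than concatenating it into the Cypher string keeps the query reusable and avoids injection problems.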

 

(26:21):

But I thought this one was really interesting just for being able to highlight some of the really cool and useful functionality that APOC brings to Cypher. So definitely check out APOC if you haven't already and you'll be able to add things like geocoding, which is something that I use quite frequently. I have been building up a knowledge graph of news articles over the last year or so and I use this APOC geocode procedure to geocode all of the geographic regions mentioned in the news, so definitely very useful. So that brings us to the end of our intermediate section, and the next section is really for the graph pros out there.

 

Jason Koo (27:05):

Cool. So if you are a pro, if you're an advanced user, I think you will appreciate a lot of the new features that came out with Neo4j 5. Just to rattle off a couple of them: we've got differential backup that was added, and graph pattern matching, which is something you'll definitely hear more about. What's the best way of saying it?

 

William Lyon (27:23):

I would think of it as a way of moving predicates out of a WHERE clause and moving those predicates into the ASCII art representation of the graph pattern that you're describing.
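
A small sketch of that idea, with the two query shapes built side by side in Python. The Movie label and the released property are hypothetical, and the helper names are made up for the example; the point is only where the predicate sits.

```python
def where_clause_form(label, predicate):
    """Classic form: the predicate lives in a separate WHERE clause."""
    return f"MATCH (n:{label}) WHERE {predicate} RETURN n"


def inline_form(label, predicate):
    """Inline form: the same predicate sits inside the node pattern."""
    return f"MATCH (n:{label} WHERE {predicate}) RETURN n"


classic = where_clause_form("Movie", "n.released > 2000")
inline = inline_form("Movie", "n.released > 2000")
print(classic)
print(inline)
```

Both strings describe the same match; the inline version keeps the condition next to the part of the pattern it constrains, which can read more clearly in longer patterns.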

 

Jason Koo (27:36):

You described it I think perfectly well. But if I recall correctly, one of the big reasons for bringing graph pattern matching in was to bring Neo4j Cypher in line with what will become the future graph query language, right? And so this part requires some explanation. There is an ISO working group that has been working on creating a graph query language standard, similar to the SQL standard but for graphs. It comes in two parts, right? There's the graph query language itself, which as far as I know currently looks a lot like Cypher. And then there's also an SQL graph pattern extension of some sort, so there are these two parallel components that this working group is working on. And so they have a four-year term where they're going from exploring the different query languages to making a final proposal. And so they're halfway through, right? In 2024, if everything goes to schedule, they will have an ISO standard out for everyone to follow. The idea is that all the graph vendors with query languages will make them compatible with the graph query language.

 

(28:43):

Okay. So outside of graph pattern matching we have autonomous clustering. Autonomous clustering replaces the former causal clusters. And basically what it does is it allows you to automatically manage placement, which means where a database goes on a server, and allows for horizontal scaling. The idea is if you have this master database instance and you have different departments who want access to parts of the graph for their department's use, they can spin those up using autonomous clustering. So all your different departments can spin up these different instances without causing the main instance or the main cluster a great deal of headache. Now, to use autonomous clustering, it is a breaking change from 4.4.x to the new version 5. The last thing about autonomous clustering is it's Enterprise only, so just be aware of that.

 

(29:32):

Another Neo4j version 5 feature is differential backup. There are two options for backing up your database. One is a full backup, which as you can imagine is basically more or less duplicating your database, and the other is a differential backup. The differential backup is very similar ... it's a diff, right? So if you use Git and you're familiar with how Git works, right, it's recording the deltas, right, recording the changes from version to version. When you select backup options with version 5, you now have the option of choosing a differential backup, right, a diff backup, versus doing a full backup every single time. The next great feature in version 5 is the Ops Manager. What's new with that, Will?
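
The full-versus-differential idea can be sketched with a toy key-value store. This is only a conceptual illustration under the assumption that a backup is a dictionary snapshot; it says nothing about how Neo4j actually stores or applies its backup artifacts, and all names and data are invented.

```python
def diff(old, new):
    """Record only what changed between two snapshots of a key-value store."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def restore(full_backup, diffs):
    """Rebuild the latest state from one full backup plus a chain of diffs."""
    state = dict(full_backup)
    for d in diffs:
        state.update(d["changed"])
        for key in d["removed"]:
            state.pop(key, None)
    return state


monday = {"alice": 1, "bob": 2}
tuesday = {"alice": 1, "bob": 3, "carol": 4}
wednesday = {"alice": 1, "carol": 4}

full = dict(monday)  # full backup: copy everything once
chain = [diff(monday, tuesday), diff(tuesday, wednesday)]  # then only the deltas
restored = restore(full, chain)
print(restored)
```

The trade-off is the usual one: each differential is small and quick to take, but restoring means replaying the chain on top of the last full backup.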

 

William Lyon (30:13):

So the Ops Manager that was introduced in June is really a new product on its own, I suppose. Once you've built your application and you've deployed your Neo4j cluster, the ongoing task is managing your cluster. You want some central tools and some nice UI to be able to do this that wraps up a lot of the administrative procedures and functionality in Neo4j. For a long time folks may have used Halin from Neo4j Labs to fill this role, but now there's an officially supported Ops Manager tool to administer and manage your Neo4j cluster. And I think there are really four pillars that the Ops Manager, or NOM, the Neo4j Ops Manager as some folks like to call it, is built on.

 

(30:58):

And so those four pillars are, first, monitoring, so these are things like being able to look at logs and metrics, basically figuring out what the state of each instance in my cluster is. Second, administration, being able to tweak security settings, different configuration options, things like that. The third pillar is going to be operations, so really being able to handle tasks like backups, scheduling upgrades, these sorts of things. And the fourth pillar is this idea of integration. We want to be able to integrate with third-party tooling like alerting systems, maybe some central IT management system.

 

(31:35):

We'll link some documentation and a bit more about Ops Manager but this is a really exciting tool to have for those cases of being able to administer and manage your Neo4j cluster. So we talked a little bit in the first section of this episode about AuraDS, which is the flavor of Aura focused around data science that includes the Graph Data Science Library. And I said we were going to talk a lot more about GDS later in the episode. Jason, do you want to tell us some more about GDS?

 

Jason Koo (32:13):

A little bit more information. A number of algorithms in the Graph Data Science package were moved from the beta tier to the production tier, so five of them. Breadth-first search, depth-first search, K-Nearest Neighbors, Delta-Stepping, and the similarity functions were all moved from beta to production tier. So now if you're using those algorithms you can use them more confidently and know that the APIs attached to them aren't going to be changed. There aren't going to be any immediate breaking changes for those, so those are solid. Other news about GDS is the pipeline catalog. So I don't know if this was in a different tier as well before, but anyway, the pipeline catalog is available. And the GDS Python client wasn't made available until last year. So if you're working in a Python application doing some ML work in Python, you can now access Neo4j's GDS package with a Python client.

 

(33:06):

So we do have a Python client already, and the GDS Python client is built on top of that. You can actually use either client to run direct Cypher commands but they'll behave slightly differently. In both the GDS Python client and the normal Python client, you just pass in a string representation of a Cypher command. What you get back though is different, right? In the normal Python client, you'll get back either a list or ... Usually you get a results object that you have to parse through.

 

(33:39):

But in the GDS Python client, you will get a Pandas DataFrame back so that's something to be aware of. So if you are not needing to use a data frame you will have to take the result and convert it to something else. Otherwise, if you are looking for a data frame that is, obviously, the way to go. Obviously, as we mentioned before, AuraDS is an option through our cloud services. And last year we also made AuraDS available as a Google Cloud extension. So if you're running a Google Cloud instance you can basically just spin up a Neo4j instance with a click of a button.
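
The difference in return types described above can be sketched roughly as follows. This is a minimal illustration, not something shown in the episode: the query, the function names, and the connection arguments are placeholders, and it assumes the `neo4j` and `graphdatascience` packages are installed and a database is reachable when the two `run_with_*` functions are actually called.

```python
QUERY = "MATCH (n) RETURN count(n) AS nodes"  # illustrative query only


def run_with_driver(uri, user, password, query=QUERY):
    """Run Cypher with the regular Neo4j Python driver.

    The driver hands back a result object; here it is parsed
    into a list of plain dicts, one per record.
    """
    # Imported lazily so this sketch loads even without the package installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(query)
            return [record.data() for record in result]


def run_with_gds(uri, user, password, query=QUERY):
    """Run the same Cypher string with the GDS Python client.

    run_cypher returns a pandas DataFrame instead of a result object.
    """
    # Imported lazily for the same reason as above.
    from graphdatascience import GraphDataScience

    gds = GraphDataScience(uri, auth=(user, password))
    try:
        return gds.run_cypher(query)
    finally:
        gds.close()


def frame_to_records(frame):
    """Convert a DataFrame back to a list of dicts when a frame is not wanted."""
    return frame.to_dict(orient="records")
```

With placeholder credentials, `run_with_driver(...)` would give a list such as one dict per row, while `frame_to_records(run_with_gds(...))` gets to the same shape by way of the DataFrame.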

 

(34:11):

The last thing I'll mention regarding GDS is an article by Tomas [inaudible 00:34:16]. Now Tomas, if you follow him on Medium, he writes a ton of interesting articles on Medium and specifically about ... Maybe not specifically, but I see a lot of his Python articles. And so he's got a good one on GDS and Python to improve ML models. So if you're in that space, a data scientist or data engineer, I totally recommend checking out that article of his. Tomas writes a lot of great content. He also does a number of videos which you can find if you search on YouTube.

 

(34:42):

But in the advanced video highlights realm I would like to call attention to a video by Dr. Maya Natarajan who also lives in San Diego, she's a great person, and Dr. Jesus Barrasa. They did a really good session during NODES called A Universe of Knowledge Graphs. There's quite a few insights from this, but basically, they go through a lot of graph use cases. One of the great takeaways I got from Maya was that prior to the pandemic she mentions that there was a great focus on big data but the focus has moved away from that, right, especially for knowledge graphs and data. Instead of just getting large amounts of data in very narrow domain spaces, the idea is to actually go to shallow data across a wide range of contexts. That paradigm enriches ML better, more so than having very deep single-domain big data stores. Lots of great use cases in that session. They talk about finding really big impacts with small dependencies and using pattern matching to find hidden value in the data that you do have.

 

William Lyon (35:50):

Another great video that I think we should highlight in the advanced category comes from the Advanced Track at NODES 2022, titled Arcurve Skills and Staffing Recommender. So this was a talk given by Mike Morley and Pete Tunkis at Arcurve in Canada. Arcurve is a geoscience, consulting, and technology company. They do lots of things working with geospatial data and they use Neo4j for a big portion of their projects. And they are always really excited to talk about what they're doing which is really neat. In this talk, they talked about one of the challenges that they have which is identifying consultants and finding the right staffing for projects.

 

(36:39):

So they talk about a system that they built that looks at experience, various projects that people have worked on, using graph algorithms to find people that are a good fit for various projects that come up but also building a capacity model. So not just who would be a good fit but how can we find people that have capacity based on projects that they're working on for building teams in the organization? There's lots of really cool visualization with things like GraphXR for 3D graph visualization, looking at geospatial tooling, using NeoDash as well. Definitely check the recording of this talk out if that at all sounds interesting to you.

 

(37:20):

Moving on to our advanced post in the Neo4j community forum. One of the most viewed posts there that I think fits in the advanced category has to do with the Pregel API for the Neo4j Graph Data Science Library. So "Pregel unsupported class file major version" is the title of this one. So what is the Pregel API? Well, the Pregel API basically is a computation model, and a Java API that allows you to build your own graph algorithms and then run those in Neo4j alongside the other algorithms that are available by default with the Graph Data Science Library. So if there's some specific graph algorithm that you have a use case for that's not supported by GDS you can use the Pregel API to implement that yourself. Fun fact. It's called Pregel after the name of the river in Königsberg from the old bridges of Königsberg problem, that was the origin of graph theory. So that's a fun little tidbit there for you.
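
To give a feel for the Pregel computation model mentioned above, here is a small plain-Python sketch. It is not the GDS Pregel Java API and does not run inside Neo4j; the function name and the example algorithm (labeling each node with the smallest node id in its connected component) are chosen purely for illustration. The idea it shows is the vertex-centric loop: in each superstep a node reads its incoming messages, updates its own value, and sends messages to its neighbors, and the computation stops once no messages are in flight.

```python
from collections import defaultdict


def pregel_min_label(edges, max_supersteps=20):
    """Label every node with the smallest node id in its connected component."""
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initial superstep: each node starts with its own id and sends it out.
    value = {node: node for node in neighbors}
    inbox = defaultdict(list)
    for node in neighbors:
        for nb in neighbors[node]:
            inbox[nb].append(value[node])

    for _ in range(max_supersteps):
        if not inbox:
            break  # no messages left, so every node has halted
        outbox = defaultdict(list)
        for node, messages in inbox.items():
            best = min(messages)
            # Only nodes whose value improved send messages next superstep.
            if best < value[node]:
                value[node] = best
                for nb in neighbors[node]:
                    outbox[nb].append(best)
        inbox = outbox
    return value


if __name__ == "__main__":
    # Two components: {1, 2, 3} and {4, 5}.
    print(pregel_min_label([(1, 2), (2, 3), (4, 5)]))
```

Each node only ever looks at its own value and its own inbox, which is what lets this style of algorithm be parallelized across a large graph.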

 

Jason Koo (38:33):

That was, I think, a great summary, great encapsulation of all the new and exciting things that happened in 2022. Now looking forward to 2023, what do we think is going to happen in the graph space? There's a great interview from Daniel Ng interviewing Jim Webber who's our chief scientist, and he gave a number of predictions for 2023 and broke them down into three broad categories. He was thinking that in general there's going to be more graph database practitioners, the number of folks using graph databases, in general, will expand, and the people who have been using it prior will become more experienced. A greater number of experts will emerge in the next few years. Interestingly, he thought Graph Data Science will grow faster than general graph database use because Graph Data Science makes for better ML prediction. That sort of growth will be spurred by it. And I definitely agree with Jim Webber in this regard.

 

(39:29):

Anytime I mention Graph Data Science or graph with ML models, the amount of engagement and circulation of information just shoots up more than when I talk about general graph database features and whatnot. And the last prediction that Jim made was that there will be this continued movement from on-premises or local database instances to cloud-managed or cloud-available services. Okay. So those were Jim Webber's predictions for 2023. Will, what do you personally think will happen either this year or in the next few years?

 

William Lyon (40:00):

Well, I think that what we're going to see is not just more usage of graph databases and graph algorithms with Graph Data Science like Jim said, but I think the types of users that we're going to see of these technologies is going to expand. And I think that's going to be driven by improvements in these low-code tools that we've talked about already. So we talked about the data science integration in Bloom that allows us to click a couple of buttons, run some graph algorithms, and visualize the results. We talked about GraphQL Toolbox that allows us to build a fully functional GraphQL API without writing any code. We talked about the Data Importer that allows us to import and model data as a graph also with just a few button clicks.

 

(40:46):

I think what we're going to see is moving beyond just developers and data scientists with a background in programming. Terms I've seen used to describe these users are citizen developers and citizen data scientists. But I think what that means really is you no longer need to have the skills of working with programming languages, even writing Cypher queries, to be able to use graph technology to answer the questions that you have of your data, which I think is going to be really powerful and is really going to help democratize access to graph technology in 2023.

 

Jason Koo (41:26):

Totally agree. Along those lines, I'm hoping, I'm predicting that graph databases will start to be used in more and more apps, everyday applications that a developer might spin up and prototype or create some new product with. Whereas I think a lot of the use cases we see now, or at least in the past, were larger companies with huge complex interconnected data problems and they needed to solve those by putting them into a graph database. What I'm hoping is that we'll see graph databases used in more applications that people are just building right out of the gate, right, with these sorts of low-code and no-code tools, right, whether it be something low-code like FlutterFlow or no-code like Bubble. To have graph database integrations with those and see those apps become more capable and more insightful just from changing their ... Or adding a graph database to the mix is ... It's definitely what I hope will happen but I'm pretty sure it will happen once more people see the benefits of using a graph database to tackle complex problems.

 

William Lyon (42:28):

Totally. I'm right with you there. So should we end by talking about some of our picks?

 

Jason Koo (42:34):

Yes. I guess I could go first. I have a couple tools in the Python space that I really like and would like to see growing. The first thing I'll mention is these two community libraries. One is called Cymple and one is called Pypher. So Pypher is the older one. And what it allows is for a Python developer to Pythonically create a Cypher statement and then run that through the Python driver. Cymple does the same thing as well. I guess one of the major differences between Cymple and Pypher is Cymple has auto-completion hooks so that right from your IDE you can get that auto-completion handiness that comes when you're developing very quickly. But Pypher has been around longer so it has more of the Cypher that's already baked in. As far as I know they're both open source so you can look at them, you can contribute code to them. So I like using and monitoring those two tools.

 

(43:27):

Most recently our colleague Adam Cowley, he created a Neo4j VS Code extension that allows you to run Cypher commands against a Neo4j database right from VS Code which is great. When you're developing, oftentimes, especially with a Python driver without one of these extra packages, you would have to write up the string, test it in Neo4j Browser or some other space, make sure that call actually works, and then copy the string representation and put it into your Python. The ability to test the call right from VS Code without having to leave your IDE is a great time saver so I definitely recommend checking out that extension.

 

William Lyon (44:06):

Totally. I love that VS Code extension. As developers, we're in tools like VS Code for most of the day so not needing to step away, context switching out of the tooling that you're familiar with, is super powerful for developer productivity. That's really cool. For me, I just have one pick that I wanted to highlight which is the Flat Graph GitHub Action. If you're not familiar with GitHub Actions they are tools and jobs that run typically when something happens in a GitHub repo. So typically these are used for things like running your tests or deploying your application or running tests when someone opens a new PR, but you can also use them to schedule periodic cron jobs. The Flat Graph GitHub Action allows you to periodically go out and fetch a JSON or CSV file or hit some API endpoint, and then just use Cypher to define how you want to, say, load that data into an Aura database.

 

(45:08):

And so I have a number of repos running that are using the Flat Graph GitHub action to import data from, for example, my news graph that I mentioned earlier that I'm building up. I'm fetching the day's news every day to build that up using Flat Graph or grabbing articles from the Lobsters news aggregator every hour. A simple way to handle some basic ETL with GitHub Action and Neo4j Aura.

 

Jason Koo (45:38):

Nice, cool. A great pick. GitHub Actions, super, super useful. One of those underrated things too. Cool. Before we sign off I did want to invite anyone who's listening to subscribe or check out our weekly newsletter that Yolande handles. Every week she spends a lot of effort picking out the best graph and community-related blogs, videos. And she interviews I think some of our top Ninjas, top community contributors, and gets their insights, and puts it into the newsletter. Definitely a recommended resource.

 

William Lyon (46:11):

Absolutely. Great. This has been a lot of fun. A little look back at 2022 in the world of graphs and Neo4j. It's a little bit of a reboot of the GraphStuff.FM podcast. We're going to try to do these once a month coming up so definitely hit subscribe wherever you get your podcasts. Let us know what you think either on Twitter or the Neo4j Discord. And we will see you next time. Thanks a lot, Jason.

 

Jason Koo (46:43):

Thank you, Will. Thank you, everyone.

 

William Lyon (46:45):

Bye-bye.