GraphStuff.FM: The Neo4j Graph Database Developer Podcast

A Buffet of Cypher and ChatGPT Information

Episode Summary

Feast on various topics from ChatGPT, Graph Databases, and the Cypher Query Language with Neo4j’s Developer Advocates Will Lyon, Alison Cossette, and Jason Koo. Learn about the history of Cypher, its basic syntax, tips for creating more performant queries, and what’s new for Cypher in Neo4j Version 5. Summaries of the most recent livestreams, a Graph Data Science book, and upcoming workshops also covered. Tune in to satisfy your appetite for graph knowledge!

Episode Notes

OpenCypher: http://opencypher.org/
ChatGPT: https://openai.com/blog/chatgpt
Introduction To Cypher: https://neo4j.com/docs/getting-started/current/cypher-intro/
Create Neo4j Database Model with ChatGPT: https://neo4j.com/developer-blog/create-neo4j-database-model-with-chatgpt/
Use ChatGPT to Query Your Neo4j Database: https://towardsdatascience.com/use-chatgpt-to-query-your-neo4j-database-78680a05ec2#d083-cf615c9d7f04
Maximising Efficiency: The Power of ChatGPT and Neo4j for Creating and Importing Sample Datasets: https://neo4j.com/developer-blog/chatgpt-neo4j-import-sample-dataset/
Graph Data Science with Neo4j Book: https://medium.com/@st3llasia/graph-data-science-with-neo4j-book-e7f32cfa41cc
Neo4j (slow) Query Logs: https://neo4j.com/docs/operations-manual/current/monitoring/logging/#query-logging
Neo4j Query Tuning Guide: https://neo4j.com/docs/cypher-manual/current/query-tuning/
Neo4j Live: Wardley Mapping with Neo4j: https://www.youtube.com/watch?v=UKvjYZ2kiNY
Full Stack GraphQL Book Series: https://www.youtube.com/playlist?list=PL9Hl4pk2FsvVg3c74thYEWVsCPPVB1qqn
Going Meta - Ep 13: Creating (and RDF-izing) virtual graphs over external data: https://www.youtube.com/watch?v=FoHAyBhcH4s
Neo4j Live: Neo4j VS Code Extension: https://www.youtube.com/watch?v=kSH4eqNARAw
Alison’s next LinkedIn Livecast: https://www.linkedin.com/posts/alison-cossette-7115857_join-me-and-varun-shenoy-from-stanford-university-activity-7034149009019535362-4mMP?utm_source=share&utm_medium=member_desktop
How Cypher changed in Neo4j v5: https://towardsdatascience.com/how-cypher-changed-in-neo4j-v5-d0f10cbb60bf
Exists Subqueries: https://neo4j.com/docs/cypher-manual/current/syntax/expressions/#existential-subqueries
OSMNX library: https://github.com/gboeing/osmnx
GraphGPT: https://github.com/varunshenoy/GraphGPT
Streamlit: https://streamlit.io/
Arrows: https://arrows.app/
Data-Importer: https://data-importer.graphapp.io/
Jason’s Mock Graph Data Generator: https://github.com/jalakoo/mock-graph-data-generator

Episode Transcription

Jason Koo (00:00):

Welcome everyone to another episode of GraphStuff.FM, your buffet of graph knowledge. We've got a little something for everyone today, and joining me in our virtual studio are Will Lyon and Alison Cossette. Alison is another one of our developer advocates, specializing in data science. Alison, would you like to say a few words about yourself?

Alison Cossette (00:19):

Thanks, Jason. Yes, I am a data science developer advocate here at Neo4j, longtime lover of graphs and new to the Neo4j team and I'm excited to be here with you guys today.

Jason Koo (00:31):

Awesome. This episode, we're going to cover news across January and February. Since our last episode was focused mainly on a recap of 2022, we didn't give January all the love and attention that we could have. So we're going to bring some of that love to this episode. And as everyone is probably, maybe painfully, aware at this point, the last two months have been filled with tons of news regarding ChatGPT. So we will talk a little bit about ChatGPT, including some data science stuff, and to spread things out, we're also going to talk a little bit about Cypher: what it is, how you can get started, and some tuning tips. So to kick off our ChatGPT and Cypher combo here, I've actually asked ChatGPT for a joke regarding buffets and graph databases. All right, I hope you guys are ready for this. Okay, so here it is. Why did the buffet restaurant switch to a graph database? The answer is because they needed to keep track of all the relationships between the dishes and it was too much for a regular table.

Listener Kevin (01:41):

Ooh.

William Lyon (01:44):

I love it.

Alison Cossette (01:45):

That's actually pretty good.

William Lyon (01:46):

Yeah, that-

Alison Cossette (01:48):

Nerd humor at its finest.

Jason Koo (01:49):

Okay, so let me move into Cypher here. For those who are not familiar with what Cypher is, it's a query language for querying graph databases, namely Neo4j, as it was originally developed here back in 2011, but it's actually been open source since 2015. Now Cypher is a declarative query language, and it is very ASCII-art-like. So since we are in an audio podcast, imagine an open and closed paren together, right? That basically looks like a circle. So that represents a node. And then to connect a node to another node, you have dashes. Now a dash can't really contain much information, so brackets are put into the dashes. So you've got a dash, an open bracket, a closed bracket, and a dash. And information about nodes can go inside the parens and information about the relationship goes into the brackets.
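
The notation described here can be sketched in a few lines of Python that assemble a Cypher pattern as text. This is only an illustration: the helper names `node` and `relationship`, the labels `Person` and `Company`, and the relationship type `WORKS_FOR` are assumptions chosen for the example, not something quoted from the episode.

```python
# Minimal sketch: build a Cypher pattern string to show the ASCII-art shape.
# All names (node, relationship, Person, Company, WORKS_FOR) are illustrative.

def node(variable: str, label: str) -> str:
    """Parentheses represent a node, e.g. (p:Person)."""
    return f"({variable}:{label})"


def relationship(rel_type: str) -> str:
    """Dashes around square brackets represent a relationship; > gives direction."""
    return f"-[:{rel_type}]->"


pattern = node("p", "Person") + relationship("WORKS_FOR") + node("c", "Company")
query = f"MATCH {pattern} RETURN p, c"
print(query)
```

Read left to right, the printed pattern looks like two circles joined by an arrow, which is the visual idea behind the syntax.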

Jason Koo (02:47):

So for example, say you want to connect people who work at a company. So you would have people inside the parens, then you have a dash and then brackets, and inside the brackets you would have works for, and then the other parens at the other end is a company. So you can see that we use this ASCII art syntax to describe patterns of data that you can easily submit to a graph database and get information about those nodes and relationships. Okay. A quick history of Cypher. So again, it was invented back in 2011, and we at Neo4j have a real love of the original Matrix series. And so Cypher, as everyone knows, is a character from the Matrix. Now the interesting thing that I just learned from Will was that Cypher was only the engineering code name for the language. It was never intended to be the publicly released name because the inventor of Cypher, I guess, had wanted to choose a name that for sure the company would not use, because Cypher was one of the bad guys in the Matrix. But for reasons I'm not entirely sure of, the name was kept and it went all the way through, and now the query language name has stuck. Oh, was there anything else that I missed from that, Will?

William Lyon (04:18):

That's about it. I guess we'll have to put a spoiler alert on Cypher being a traitor in the Matrix. On a 20-plus-year-old movie, spoiler alert at this point. But yeah, that's the basic gist.

Jason Koo (04:30):

The last thing I'd like to mention about Cypher is that its syntax has become a core component of the upcoming graph query language that is being developed by an ISO working group. So I think that group has another two years before they officially release the spec, but the current renditions of it look very Cypher-esque. So if you learn Cypher and you use Cypher now, you'll have this great on-ramp to the future graph query language, which is meant to be a query language that hopefully all major graph database vendors will be compatible with. Okay. To segue into some information about, again, ChatGPT before Alison takes us off, I wanted to find another joke to combine Cypher and ChatGPT, but none of the jokes it gave me were good. So I did find this great pun that David Allen had posted in an internal Slack quite some time ago. So here's this joke. Okay. A new database query walks into a bar. The server says, "Sorry, cache only." Cache being C-A-C-H-E, not C-A-S-H. So for all you database dads out there, this is one for your books.

William Lyon (05:44):

Database dad jokes. I like it.

Jason Koo (05:48):

Hey Alison, take us away.

Alison Cossette (05:50):

Excellent. So the ChatGPT chat continues. So obviously, as Jason mentioned earlier, everyone in creation is talking about ChatGPT these days. And there's been some really interesting content that's come out recently. Tomaz Bratanic published a blog called Use ChatGPT to Query Your Neo4j Database. I highly encourage you to check it out, not only for informational purposes, but I think there's a level of humor in his approach. But what he does is he goes through, and in many ways it's almost a primer on being able to leverage ChatGPT. So much goes into what is a good prompt and what is going to give you different kinds of output. So you can actually learn a lot about how to refine your prompts, or not over-refine your prompts, from Tomaz.

Alison Cossette (06:42):

Additionally, another blog that came out recently was Create Neo4j Database Model with ChatGPT by AK Conrad. And in it he is trying... He has a son who has Down syndrome and he has a very hard time keeping track of all the different supplements that his son takes and their various effects on his son's body. So what he did was he went through and leveraged ChatGPT to help create a Neo4j database model that would allow him to build out an actual useful tool for him and his family. And again, so much of what's interesting about ChatGPT isn't just the cool fun factor, but how can we actually leverage it as a proper tool? How do we know when it's giving us good responses or not? And so there's a lot to be learned about the relationship between ChatGPT and Neo4j in those couple of blogs, so I highly recommend you check those out. There's also another one about Maximizing Efficiency: ChatGPT Sample Data Sets by David Stevens. And I'm going to actually kick this to Jason because I know Jason, you've got some interesting things going on with that.

Jason Koo (08:01):

Yes, so both David and I have been separately working on creating sample graph data to use for any sort of purpose. And the reason for this is the same reason you would want mock data for testing any other sort of system. If you don't have access to the real data yet, maybe it's forthcoming or maybe it's not publicly available, or you are working on a greenfield application and you've got a data model that you want to use with the app, but the app isn't in production yet, so you don't have the real data, but you want to see what queries you should use and what sort of interesting information you can glean from your particular data model, then having a sample dataset is incredibly useful. So David's approach was using ChatGPT to just create a small sample dataset that he would then import.

Jason Koo (08:52):

The separate mock graph data generator app that I built, which I'll talk a little bit more about later, is an app that will take a data model and allow you to configure data on nodes and relationships, to tell the mock graph data generator how many of a particular node or particular relationship to generate, and also to create mock data for their properties. So you might want a bunch of imaginary first name and last name combos, imaginary emails, whatever property you would want: datetimes, integers, booleans, whatever. This mock data generator that I'm working on would allow you to configure all those options so that you get a different interconnected mock dataset every time you run it. So those are the mock data generation options that you've got now. We'll post some links to both David's article and also to my repo on these mock generation options.

Jason Koo (09:49):

Now, I'm going to move back to talking about Cypher tips and tricks a little bit here because there are some things that you should be aware of when you start off just to minimize the challenges or roadblocks that a lot of folks run into. The first thing that I like to always talk about is don't forget to escape your strings. Now when you're in the query browser directly, it will remind you. So it'll tell you like, "Okay, put quotations around string." But if you go from that and you do a direct copy paste, say, into a Python driver and you forget to escape the quotation marks, you will of course have a failed Cypher call. This is something everyone will fix fairly quickly, it's quite apparent. It's still something that you should be aware of before you run, especially a query on quite a bit of data because that will unfortunately cause a lot of query errors.
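
The quoting issue mentioned here can be shown with plain Python strings. This is a minimal sketch; the label `Person` and the value `Alice` are assumptions made up for the example.

```python
# Minimal sketch of the quoting problem when moving a query into Python.
# The label Person and the name value are illustrative assumptions.

# Single quotes inside a double-quoted Python string need no escaping.
query_mixed_quotes = "MATCH (p:Person) WHERE p.name = 'Alice' RETURN p"

# If the same quote character is used for both, the inner quotes must be escaped.
query_escaped = 'MATCH (p:Person) WHERE p.name = \'Alice\' RETURN p'

# Both spellings produce exactly the same Cypher text.
print(query_mixed_quotes == query_escaped)
```

Passing values as query parameters, rather than pasting them into the query text, sidesteps the problem altogether.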

Jason Koo (10:41):

Other things that you should be aware of: the main clauses to use in Cypher are MATCH, WHERE and RETURN. MATCH is used to define the pattern that you want to look for using Cypher. WHERE, of course, is a filter keyword, so you can filter for particular properties of nodes and relationships. And then RETURN is just specifying what data from your query you want. Do you want all the nodes? Do you want just the names of the nodes, et cetera, et cetera. The final keyword that I think a lot of people aren't immediately aware of when they first use Cypher is the OPTIONAL MATCH clause. So the OPTIONAL MATCH clause is a way of defining a particular pattern that you're interested in, but if the data you're looking for doesn't contain that pattern, it will just return null values. So you can look for a primary pattern with MATCH, and you're like, I'm interested also in something else. You put that into the OPTIONAL MATCH, and if that data is in there, it'll return the data you're looking for, but if not, it'll just return some empty null properties and allow you to go forth with your query. So that's something to be aware of.
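
A rough sketch of these clauses follows. The Cypher text is what would be sent to the database, while the pure-Python part only imitates how OPTIONAL MATCH fills missing matches with nulls. Labels, property names, and the sample data are all assumptions made up for illustration.

```python
# Minimal sketch: Cypher text plus a pure-Python imitation of OPTIONAL MATCH.
# Labels, property names, and sample data are illustrative assumptions.

QUERY = """
MATCH (p:Person)
WHERE p.age > 30
OPTIONAL MATCH (p)-[:WORKS_FOR]->(c:Company)
RETURN p.name AS person, c.name AS company
"""

people = [
    {"name": "Alice", "age": 41},
    {"name": "Bob", "age": 35},
    {"name": "Cy", "age": 22},
]
works_for = {"Alice": "Acme"}  # Bob has no employer recorded

rows = [
    # dict.get returns None when there is no match, playing the role of null
    {"person": p["name"], "company": works_for.get(p["name"])}
    for p in people
    if p["age"] > 30  # the WHERE filter
]
print(rows)
```

Bob still appears in the result with an empty company, which is the behavior that distinguishes OPTIONAL MATCH from a second plain MATCH.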

Jason Koo (11:54):

Three more things, real quickly, that you should be aware of when using Cypher for the first time. The first is to use parameters inside your client libraries, so that you can reuse your Cypher queries and not have to hard code every single call. And then to improve performance, you should definitely use indexes and constraints. Indexes will improve the query time performance. An index basically tells the query which nodes to start with right away, so it doesn't have to do a comprehensive search. It goes, "Oh, okay, you want to key off of, say, a particular ID that is indexed, I'm going to jump right to it," and then does the traversal from there. And then constraints. Constraints will help you avoid duplicates and will just generally improve your data integrity. So those are my basic Cypher tips for anyone who hasn't really started working comprehensively with Cypher. Now, once you become comfortable with Cypher, you'll definitely want to try out the Graph Data Science, or GDS, library. It's got some wonderful algorithms and tools for you to really get the next level of information from your graph database. Alison, what would you recommend for people to look into if they're wanting to get started with GDS?
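
The three tips above might look roughly like this in Python. It is a sketch under assumptions: the session object is expected to offer `run(query, **parameters)` as the official Neo4j Python driver's session does, and the label, property, index, and constraint names are invented for the example. `RecordingSession` is a hypothetical stand-in so the snippet runs without a database.

```python
# Minimal sketch of parameters, an index, and a uniqueness constraint.
# Labels, property names, and index/constraint names are illustrative.

FIND_PERSON = "MATCH (p:Person {id: $id}) RETURN p.name AS name"

# Neo4j 5 style schema statements.
CREATE_INDEX = "CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name)"
CREATE_CONSTRAINT = (
    "CREATE CONSTRAINT person_id IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.id IS UNIQUE"
)


def find_person(session, person_id):
    """Reuse one query text; only the parameter value changes per call."""
    return session.run(FIND_PERSON, id=person_id)


class RecordingSession:
    """Hypothetical stand-in for a real session; it only records calls."""

    def __init__(self):
        self.calls = []

    def run(self, query, **parameters):
        self.calls.append((query, parameters))
        return []


session = RecordingSession()
find_person(session, 1)
find_person(session, 2)
print(session.calls)
```

Because the query text is identical on every call, the database can reuse its cached plan, and the value never has to be quoted or escaped by hand.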

Alison Cossette (13:13):

Thanks, Jason. It's an excellent question. Oftentimes, when folks are interested in getting into GDS, they don't really know where to start. There's a classic text that some of us have used called Hands-On Graph Analytics with Neo4j by Estelle Scifo. And lucky for you and all of us, she has a new book out that is called Graph Data Science with Neo4j. The great thing about this book is it picks up almost where Hands-On Graph Analytics with Neo4j left off and adds a lot of new information. So not only does it cover the basics of Graph Data Science, but it also gives us an understanding of how you even understand a graph dataset. What are some of the basic things that you can learn about the structure of your graph? Some of the things that you'll learn about are centrality and community detection algorithms. Centrality gives you an idea of the importance of different nodes in your graph. Community detection gives you an idea of what some of the communities or subsets within your graph are.

Alison Cossette (14:22):

She'll give you an introduction to graph visualization and then to actually doing machine learning on these graphs. So what is also new and exciting in the new book is that she'll also dive into visualizing your graph data. At Neo4j, we have one of my favorite graph visualization tools, called Neo4j Bloom. And I like to use a soft eye and take a look at the graph itself, depending on how many nodes you have, to just get a feel for almost the topography of the data. So she'll walk you through that. As I said, we've got machine learning models. She'll walk you through building a pipeline for node classification model training, and then also additional custom graph algorithms, APIs for Java and extending your Graph Data Science by writing your own message passing through the algorithm. So it really is soup to nuts.

Alison Cossette (15:16):

So if you've never done Graph Data Science before, this book is for you. If you have been doing it for a while and you're looking for the latest in Neo4j, this book is also for you. The fun thing for me is that coming up soon, I'm going to be hosting a book club on Estelle's new book. So keep an eye on Neo4j. What we'll be doing is going through the book chapter by chapter, and there'll be a lecture, and we'll be able to answer questions as we move through the book club. So one of the things I love about being in DevRel is actually creating connections with people in our community through these types of book club workshops. And so that actually brings me to my next community comment, which is our most recent community question that came in this episode. So Will, can you tell us a little bit about the community question?

William Lyon (16:13):

Absolutely. So this is a relatively new feature that we've added to GraphStuff, which is the ability to submit audio questions. Anyone can submit one, just go to graphstuff.fm and look for the submit a question button. Our question for this episode comes from listener Kevin. So let's cut away and hear what Kevin is asking about.

Listener Kevin (16:40):

I recently deployed my app to Netlify, but the queries run quite slowly and I have no idea why. Because on self-hosted Docker containers, the app runs quite fast. Neo4j is self-hosted, the client React app is also self-hosted on the same machine, and everything follows the Docker structure. And apparently the performance is really good on the self-hosted machines. But when I recently moved to Netlify, the invocation time is quite long for each request, it usually goes to 10 to 15 seconds. I even went to the pro version and it is still the same. Maybe you might want to help clarify how to optimize some of the workflows for Netlify. Is it the Lambda function that needs some work? Maybe you might clarify that, please. Thank you.

William Lyon (17:37):

Great. So that is, I think, a really common issue that folks have. Basically, my app is fast when I run it locally, I go to deploy it, and now it's slow. How do I figure out why it's slow? How do I make it fast again? I think there's maybe a checklist of things to work through here to try to give you an idea of what might be causing some of that slowness that you're seeing after deploying your application. One thing to look at would be checking the database indexes in your production deployment. So I'm assuming that you have some different environments of Neo4j: your deployed production instance versus what you're running locally. And so it's always a good idea to make sure that the database indexes and constraints are set up in a similar manner. With graph databases like Neo4j, we use indexes to find the starting point of a traversal, which is what Jason was talking about earlier.

William Lyon (18:42):

So making sure those indexes are in place is a good starting point. The next thing you can look at is enabling the slow query log, or it's just called the queries log, but I always call it the slow queries log because that's typically what we're using it for. And this is a setting where we can enable a threshold, and any queries that take an amount of time greater than the threshold to execute are then added to the slow queries log. So we can see this as a result of actual production use of our database. We can check that slow queries log, and we'll see the Cypher query and any parameters that were passed as well. So that'll give us some indication of whether there are specific queries that are slow, or whether every query is slow. And actually seeing, okay, is the database the actual bottleneck here? Because in any sort of production application, there are different pieces of the infrastructure that could be introducing those performance issues that you're seeing.

William Lyon (19:42):

Once you've identified, with the help of the slow query log, some of the queries that you're having performance issues with, then we can take a look at query tuning, which we typically start by looking at the execution plan. So if we add EXPLAIN or PROFILE to the beginning of a Cypher query, we can see the execution plan. And if we run PROFILE, we'll see this metric called DB hits, which is basically, for each operation in the query plan, the number of database operations that are going on. So query tuning is basically an exercise in getting the result that you expect with the fewest number of DB hits. And I'll drop a link to the documentation that talks a little bit about query tuning and goes through a couple of examples. So that would be a helpful place to look.
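
As a rough illustration of counting DB hits, the sketch below prefixes a query with PROFILE and sums hits over a nested plan. The plan shape, a dictionary with `dbHits` and `children` keys, is an assumption made for this example, and the sample plan is hand-written rather than real database output. The helper names `profiled` and `total_db_hits` are hypothetical.

```python
# Minimal sketch: prefix a query with PROFILE and add up db hits over a plan.
# The nested-dict plan shape (keys "dbHits" and "children") is an assumption;
# check your driver's documentation for the shape it actually returns.

def profiled(query: str) -> str:
    """Return the same query with PROFILE in front of it."""
    return "PROFILE " + query


def total_db_hits(plan: dict) -> int:
    """Recursively sum the db hits of an operator and all of its children."""
    own_hits = plan.get("dbHits", 0)
    child_hits = sum(total_db_hits(child) for child in plan.get("children", []))
    return own_hits + child_hits


# Hand-written example plan, not real output from a database.
example_plan = {
    "operatorType": "ProduceResults",
    "dbHits": 0,
    "children": [
        {
            "operatorType": "Filter",
            "dbHits": 120,
            "children": [
                {"operatorType": "NodeByLabelScan", "dbHits": 101, "children": []},
            ],
        },
    ],
}

print(profiled("MATCH (p:Person) WHERE p.name = $name RETURN p"))
print(total_db_hits(example_plan))
```

Comparing this total before and after a change, such as adding an index, gives a simple number to track while tuning.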

William Lyon (20:38):

The next thing to look at would be making sure that you're following best practices when using the Neo4j drivers. So these are things like making sure that you're closing the session object that you are creating for some given body of work to do with the database. Make sure you're not recreating the driver instance unnecessarily. This is especially important when you're using AWS Lambda serverless functions. So I think, Kevin, you said you're using Netlify, so I'm assuming you are deploying some AWS Lambda functions there to interact with Neo4j. So we want to make sure that each time we're invoking the Lambda function, we're not creating a new driver instance, because that can take a little bit of time and resources. So we want to be able to hang on to the driver object and only create a session object to do the actual work with the database when that function is invoked. I'll link a blog post and some documentation for these sorts of driver best practices that go into a couple of other things to check out.
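
The driver-reuse advice can be sketched as follows. Everything here is an assumption for illustration: `create_driver` is a hypothetical factory (in real code it might wrap `neo4j.GraphDatabase.driver(uri, auth=(user, password))`), and `FakeDriver` and `FakeSession` are stand-ins so the snippet runs without a database or a serverless platform.

```python
# Minimal sketch of reusing one driver across serverless invocations.
# create_driver is a hypothetical factory supplied by the caller.

_cache = {}  # module-level, so it survives between warm invocations


def get_driver(create_driver):
    """Create the driver once, then hand back the same instance."""
    if "driver" not in _cache:
        _cache["driver"] = create_driver()
    return _cache["driver"]


def handler(event, create_driver):
    driver = get_driver(create_driver)
    # A session is short-lived; the context manager closes it when done.
    with driver.session() as session:
        return session.run("RETURN $value AS value", value=event["value"])


# Stand-ins so the sketch runs without a database.
class FakeSession:
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def run(self, query, **parameters):
        return parameters


class FakeDriver:
    created = 0  # counts how many driver instances were constructed

    def __init__(self):
        FakeDriver.created += 1

    def session(self):
        return FakeSession()


handler({"value": 1}, FakeDriver)
handler({"value": 2}, FakeDriver)
print(FakeDriver.created)
```

The two invocations share a single driver, while each one opens and closes its own session.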

William Lyon (21:44):

But I think in the context of Lambda functions, thinking about that cold start and making sure we're not doing any unnecessary work during the cold start can be an important one. You can also look at things like making sure that your database is deployed in the same region as your Lambda function, these sorts of things. So if you go through that checklist and none of those are helping the performance of your app, I'd say the next thing to do is to submit a post at community.neo4j.com, this is the Neo4j community site. And if you give some details there about the issues you're having, there's a great community of folks that can help get you started on figuring out what's going on. Cool. So thanks so much, Kevin, for submitting that audio question. And just a reminder, any listener, feel free to submit an audio question for us. Just go to graphstuff.fm, look for the submit a question button and you can record one that we will hopefully feature and answer on a future episode.

William Lyon (22:47):

So with that, let's talk about some of the Neo4j livestream events that happened in January and February. And the first one that I want to highlight, that I really enjoyed, was all about Wardley mapping. This was a totally new concept to me. I had not heard of Wardley mapping before, but we had Tom from the community and Alex from Neo4j, who walked us through how we can use Wardley mapping for decision making. And basically, as you are planning some new product or working on some iteration of a product, you're plotting evolution of product development versus value chain of the product. And you very quickly can start to see where risks can appear.

William Lyon (23:40):

It's really, I think, a helpful framework for just sort of challenging assumptions, and visualizing and communicating your strategies to get some discussion going on your team. And Tom showed us some online tools for creating these Wardley maps and then talked about a project that he built called Parsley, which parses this Wardley map syntax for creating these diagrams and loads them into Neo4j, and then showed some of the structured queries that you can make against your Wardley map once you have it loaded into Neo4j. So that was a really cool livestream, I liked that one quite a bit just because it was introducing something that was totally new for me. Another livestream series that we actually wrapped up in February was the Fullstack GraphQL book club. This is a series that Alex and I did on the Fullstack GraphQL Applications book that I wrote and that was published by Manning. And we worked through the exercises basically in each chapter. We took a look at the different things we were touching on in each chapter, worked through the exercises and eventually got our full app up and running. So that series is over, but we have recorded all of the livestream sessions, so you can watch those. I also wrote a blog post this week just taking a look back at some of the things that we learned in each episode.

Jason Koo (25:11):

Another episode that might be of interest was from the Going Meta series with Alex and Dr. Jesus, on creating and RDF-izing virtual graphs over external data. Now this is the 13th episode of the series. This is a new series that covers graphs, semantics, and knowledge. And in prior episodes, they've covered a variety of topics including importing RDF, or Resource Description Framework, data, also known as triples. They've also done comparisons between Cypher and the SPARQL language. SPARQL is another query language, for querying triples, and they've also covered how to use ontologies and graphs and other similar topics. So definitely check it out, there's quite a bit of material in that playlist.

Jason Koo (25:54):

Now, in this particular episode, they cover how to use external data and add it in as virtual graph information. So what you can do is you can query the data that's inside your Neo4j instance, the nodes and the relationships already in there. But you can also add in external data as virtual nodes. So you can combine external resources and your own native data inside your database. So you could test something before importing data or just continue to query that data as a sort of mixed model system. Now, one of the takeaways from this video was that to make this work, or at least the way that it's presented by Dr. Jesus, you had to use the extended APOC library. Now, the APOC library is a library that contains just a whole host of extra features and functions that you can run on your Neo4j instances. So the extended APOC library is kind of a new addition. In version 5.0, APOC has been split up between a core APOC library, which is available in AuraDB and your local instance, and then the extended APOC, which is currently only available on local instances.

Jason Koo (27:10):

So moving on, in this video, Dr. Jesus goes into great detail and shows you how to run these virtual graphs and what calls to use to make this happen. So he uses APOC to virtualize the graph and define how you import these external data points and assign them to what kind of node, so what labels you want to give these virtual nodes, and then shows how to query it with a mixture of Cypher and these APOC calls. So yeah, if you're interested in that sort of thing, totally recommended. This was a very good, very detailed but also mind-opening episode in terms of what you can do outside of just querying what's inside your Neo4j instance. And another video that might be of interest to you is the VS Code extension one, run by Adam. Alison, did you get a chance to check that one out?

Alison Cossette (28:00):

I did. I did. Just to... We'll talk a little bit about what's in it, but there was some really great Q&A at the end of the video. If you get a chance, check it out. But in this video, Adam goes through and talks about the VS Code extension for Neo4j. He gives you some details about, obviously, how to load it. But one of the highlights for me in leveraging the VS Code extension is that it allows you to do a number of things. One, it allows you to connect directly from your IDE to Neo4j. You can manage multiple connections as well through VS Code. And you were talking earlier about making sure that you keep your Cypher queries tidy. We have the auto completion in the Cypher queries as well, so you've got the coding on it. So if you're missing a closing paren or a bracket or a curly brace, it's definitely going to help you out with that as well.

Alison Cossette (29:02):

But the big thing for me, as I said, is this ability to actually write your Cypher queries right from the IDE. I know we mentioned parameters earlier, and you can see the parameters that are currently loaded as well. And so being able to run those in a familiar environment, I just think, makes it so easy to leverage Neo4j in your everyday coding environment. All right, so other upcoming live streams, especially around Graph Data Science. I recently started a series on LinkedIn called Topics in Graph Data Science, where it's a live stream once a week and we cover sometimes technical, sometimes non-technical topics all around Graph Data Science.

Alison Cossette (29:46):

It's been going pretty well. And one of the things that came up is that in the lectures I'm able to give you an understanding of some of the algorithms: what is the math and the methodology under the algorithm. And then additionally, I am now just launching the Topics in Graph Data Science workshop, which is the companion series. So if in Topics in Graph Data Science it's a technical talk that week, there will also be a live workshop where you'll be provided a Jupyter Notebook, and we can help you get set up with your Neo4j instance before that. You can come to the workshop and we actually work through it, get our hands dirty, run the algorithms, and talk through some of the aspects that we talked about in theory. And the first one that we had was about community detection, so that was our first workshop. All of those are recorded and available along with the companion notebook, so we're excited about that. And that's what's coming up in Topics in Graph Data Science.

William Lyon (30:47):

That's awesome. What's the best way for folks to join? You said that's on LinkedIn?

Alison Cossette (30:52):

It is on LinkedIn. You're going to find us in Neo4j, so you can either find me Alison Cossette. You can go to Neo4j on LinkedIn, and those are all LinkedIn live streams.

William Lyon (31:03):

Awesome. That sounds great. To bring us back to our Cypher theme here, I wanted to talk about this blog post, another one from Tomaz, Neo4j 5 Changes in Cypher. So Neo4j 5 was released towards the end of last year, and there were of course some changes and improvements in Cypher. And Tomaz did a great job of highlighting some of those in this blog post, which we of course will link in the show notes. He talks about some syntax changes that are a bit different, which are geared towards getting things more in line with GQL coming up. For example, slightly different syntax in the way that we create indexes, these sorts of things. But really the focus of this post from Tomaz is all about subqueries and the different ways we can use subqueries in Neo4j 5. So subqueries, you think of them as an independent Cypher statement that is a component of a larger Cypher statement.

William Lyon (32:12):

And there's a number of things we can do with subqueries, they're quite powerful. The first example that Tomaz shows is in the context of a LOAD CSV statement. So LOAD CSV is Cypher's functionality for loading CSV or flat files and then being able to define, with Cypher, how you want to create or update data in the database. And when we have very large CSV files, we want to be able to split those update operations across multiple transactions. We don't want to build up in memory the transaction state for just one giant transaction and try to commit that. We can often run out of memory, or it's rather inefficient. And so previously we would use the USING PERIODIC COMMIT syntax in a LOAD CSV statement. USING PERIODIC COMMIT is now deprecated, and instead with Neo4j 5, what we do is we'll say IN TRANSACTIONS OF 10,000, a hundred thousand, however many rows, and then we give a subquery that defines the logic. So basically the logic we want to execute for each row, to be able to batch these transactions for large CSV files.
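
A batched import of this kind might be written roughly as below, with the Cypher held in a Python string. This is a sketch under assumptions: the file name `people.csv`, the label `Person`, the property names, and the helper name `batched_load_query` are all invented for the example.

```python
# Minimal sketch of a batched LOAD CSV statement in Neo4j 5 syntax.
# The file name, label, and property names are illustrative assumptions.

def batched_load_query(rows_per_transaction: int) -> str:
    """Build a LOAD CSV statement that commits every N rows."""
    return f"""
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CALL {{
  WITH row
  MERGE (p:Person {{id: row.id}})
  SET p.name = row.name
}} IN TRANSACTIONS OF {rows_per_transaction} ROWS
"""


print(batched_load_query(10000))
```

A statement like this has to run as an implicit (auto-commit) transaction; in Neo4j Browser that means prefixing it with `:auto`.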

William Lyon (33:34):

Tomaz also talks about using subqueries for conditional logic, with a fun throwback to the FOREACH-CASE-IF hack, if any folks are familiar with that. For a while, that was a common hacky way to get some conditional logic in Cypher. I've fond memories of writing those, but that's no longer needed. Now, with Neo4j 5, we can use existential subqueries to have conditional logic. And you'll see a few other examples. So using size() with a graph pattern is no longer supported; instead we're going to use a COUNT subquery. So definitely check out this blog post from Tomaz if you are running into any warnings in your Cypher statements, talking about things that have been deprecated or changes in Neo4j 5, and you want to see a real example of what's going on there. So we'll definitely link to that in the show notes.
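To make the two subquery forms mentioned above concrete, here is a small sketch that assembles a Neo4j 5 query as a Python string. The `(:Person)-[:ACTED_IN]->(:Movie)` model is an assumption chosen for illustration; the helper names are hypothetical too.

```python
def count_subquery(pattern: str) -> str:
    """COUNT subquery, the Neo4j 5 replacement for size(<graph pattern>)."""
    return f"COUNT {{ {pattern} }}"


def exists_subquery(pattern: str) -> str:
    """Existential subquery: true when the pattern matches at least once."""
    return f"EXISTS {{ {pattern} }}"


# Hypothetical model: (:Person)-[:ACTED_IN]->(:Movie)
acted_in = "(p)-[:ACTED_IN]->(:Movie)"

query = (
    "MATCH (p:Person)\n"
    f"WHERE {exists_subquery(acted_in)}\n"
    f"RETURN p.name AS name, {count_subquery(acted_in)} AS movies"
)
print(query)
```

Where an older query returned `size((p)-[:ACTED_IN]->(:Movie))`, the Neo4j 5 form returns the `COUNT { ... }` expression instead.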

William Lyon (34:38):

Great. So for the next section of the episode, I think it's always fun to talk about our favorite tools of the month. So what are some interesting things that we've seen in the ecosystem around Neo4j that we want to share with others? And I'll go first, and my favorite tool this month is the OSMnx Python package. So OSM for OpenStreetMap and NX for NetworkX, which is another Python package for working with networks or graphs in Python. So OSMnx is a Python package that was actually created at an urban planning lab at the University of Southern California. And it was created really for grabbing data from OpenStreetMap, specifically road networks, and then being able to analyze those road networks for the purposes of urban planning.

William Lyon (35:41):

But of course, OpenStreetMap data is very valuable for things beyond just urban planning. And so I've been working on some geospatial projects where we're using data from OpenStreetMap. And OSMnx gives us a NetworkX representation of the road network, or also GeoPandas, which we can then iterate over and import into Neo4j. I think one of my favorite features about OSMnx is that it will automatically simplify the graph topology of road networks from OpenStreetMap. And this is really useful: if you've ever pulled down raw OpenStreetMap data, there's a lot of additional points and additional tagging that, depending on your use case for working with the OpenStreetMap data, you often want to simplify to be able to work with it for, say, routing. And so that's all built in to OSMnx, which is very useful and came in handy for me quite a bit this month.
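The idea behind topology simplification can be illustrated with a toy, standard-library sketch: nodes that have exactly two neighbors only describe the shape of a road, so chains of them are collapsed into a single edge between junctions or dead ends. This is not OSMnx's implementation (in OSMnx the behavior is controlled by the `simplify` argument of functions such as `graph_from_place`), and it assumes an undirected graph without self-loops or parallel edges.

```python
from collections import defaultdict


def simplify_road_graph(edges):
    """Collapse chains of degree-2 nodes into single edges.

    edges: iterable of (node_a, node_b, length) for an undirected graph.
    Returns a sorted list of (node_a, node_b, total_length) between the
    remaining nodes (dead ends and junctions).
    """
    neighbors = defaultdict(dict)
    for a, b, length in edges:
        neighbors[a][b] = length
        neighbors[b][a] = length

    # Nodes with exactly two neighbors only shape the road; keep the rest.
    keep = {node for node, adjacent in neighbors.items() if len(adjacent) != 2}

    seen = set()
    simplified = []
    for start in keep:
        for first in neighbors[start]:
            previous, current = start, first
            total = neighbors[start][first]
            # Walk along the chain until the next kept node.
            while current not in keep:
                following = next(n for n in neighbors[current] if n != previous)
                total += neighbors[current][following]
                previous, current = current, following
            # Each chain is found from both ends; record it only once.
            chain_id = frozenset({(start, first), (current, previous)})
            if chain_id not in seen:
                seen.add(chain_id)
                a, b = sorted((start, current))
                simplified.append((a, b, total))
    return sorted(simplified)


# Lowercase nodes are shape points; uppercase nodes are dead ends or junctions.
raw_edges = [("A", "b", 1), ("b", "c", 2), ("c", "D", 3), ("D", "E", 4), ("D", "F", 5)]
print(simplify_road_graph(raw_edges))
```

Rings made up entirely of degree-2 nodes are dropped by this toy version, since it only starts walking from kept nodes.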

Alison Cossette (36:43):

I'm going to go next with my favorite tool of the month. My favorite tool of the month comes from the APOC library and specifically APOC refactor. Sometimes you'll be in a situation where you have a graph and something I came upon was I have a relationship, but there are many of those relationships within the graph and the interesting piece of what I need is one of the properties of the relationship. And so within the APOC refactor, you have something called categorize. And in categorize it will take each unique property key in the category node and connect to it. So it's going to allow you to take that property, turn it into a node, and actually create that connection.

Alison Cossette (37:31):

So what that's going to then allow you to do is increase the efficiency of your query. So rather than having a single node with many relationships that are a similar type of relationship, but with a particular property within that, each of those relationships now is the relationship type of the property key as well. So that's my favorite tool. Another thing I came across this month: we had Varun Shenoy on Topics in Graph Data Science. Varun is a graduating student from Stanford University, and he had a fun Saturday afternoon playing with ChatGPT, everyone's favorite tool this month. And what he did was he leveraged it to take text and create a knowledge graph. So that's been fun to play with. You can check out our live stream on that as well. And those are my favorite tools of this month.
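The property-to-node idea can be sketched in plain Cypher. The helper below is a hand-written approximation, not the `apoc.refactor.categorize` procedure itself, which is documented as performing this kind of refactor on node properties in a single call. The `genre` property, `GENRE` relationship type, and `Genre` label are hypothetical.

```python
def categorize_query(source_key, rel_type, category_label, target_key="name"):
    """Build Cypher that turns a node property into linked category nodes.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    return (
        "MATCH (n)\n"
        f"WHERE n.{source_key} IS NOT NULL\n"
        # One category node per distinct property value.
        f"MERGE (c:{category_label} {{{target_key}: n.{source_key}}})\n"
        f"MERGE (n)-[:{rel_type}]->(c)\n"
        # The value now lives on the category node, so drop the property.
        f"REMOVE n.{source_key}"
    )


# Hypothetical example: movies carrying a `genre` property.
query = categorize_query("genre", "GENRE", "Genre")
print(query)
```

After a refactor like this, finding everything in one category is a traversal from a single category node rather than a property filter over every node.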

Jason Koo (38:29):

Nice. And my favorite tool this month was Streamlit, which I used to create the mock Graph data generator application. And if you haven't used Streamlit before, it's a great tool for converting Python scripts into shareable web apps. You don't have to have any prior knowledge of building UIs and you can just basically insert a few API calls. And when you run Streamlit, it will take all those commands and build a UI around your script. The great thing about Streamlit is you can use other Python packages just like you would with any other Python script, except now that becomes accessible to a web app that is very quick to deploy. So I totally recommend it if you haven't tried it yet before, give it a go.

Jason Koo (39:12):

Oh, some other tools that I did use that aren't Python packages, but are great tools for doing graph modeling and importing in general, are the Arrows app and Data Importer. So Arrows is available for free at the URL arrows.app, and this tool is great for quickly creating a graph data model. And once you're done, you can export a JSON file of it, you can save it to a Google Drive, you can print a PDF of that data model. And then on the other side, the Data Importer tool, which builds on top of the Arrows app, allows you to take a bunch of CSV data and do data modeling graphically like Arrows and then import all that data into a Neo4j instance of your choice. And so for my app in particular, I used Streamlit to build an app that's built around both Arrows and Data Importer. So you use Arrows to design your data model. You add in a bit of extra syntax to basically configure mock generation. You export the JSON file, and the JSON file is imported by the Streamlit app. It runs all the mock data generation and spits out all the CSV files and a zip file that Data Importer can read. And then you can basically review everything real quick before you import it into your Neo4j instance.
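The model-to-mock-data step can be pictured with a minimal sketch, using only the standard library. The JSON below is a trimmed, hypothetical stand-in for an Arrows export, and storing a type hint (`"string"`, `"int"`) as each property value is an assumed convention, not the syntax the actual app uses. Relationship CSVs and the zip packaging are left out to keep it short.

```python
import csv
import io
import json
import random

# Trimmed, hypothetical Arrows-style export; real exports carry more fields.
ARROWS_JSON = """
{
  "nodes": [
    {"id": "n0", "labels": ["Person"], "properties": {"name": "string", "age": "int"}},
    {"id": "n1", "labels": ["Company"], "properties": {"name": "string"}}
  ],
  "relationships": [
    {"id": "r0", "type": "WORKS_AT", "fromId": "n0", "toId": "n1", "properties": {}}
  ]
}
"""


def mock_value(kind, rng):
    """Hypothetical generators keyed by the type hint stored in the model."""
    if kind == "int":
        return rng.randint(18, 90)
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))


def mock_node_csvs(model_json, rows_per_label=5, seed=42):
    """Return {label: csv_text} with one CSV per node label in the model."""
    rng = random.Random(seed)
    model = json.loads(model_json)
    files = {}
    for node in model["nodes"]:
        label = node["labels"][0]
        columns = list(node["properties"])
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["id"] + columns)
        for row_id in range(rows_per_label):
            values = [mock_value(node["properties"][c], rng) for c in columns]
            writer.writerow([row_id] + values)
        files[label] = buffer.getvalue()
    return files


files = mock_node_csvs(ARROWS_JSON, rows_per_label=3)
print(files["Person"])
```

Seeding the random generator keeps the mock files reproducible, which makes it easier to review them before an import.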

Jason Koo (40:32):

And the reason I went with Data Importer versus using the Native Python client to move all that data in is the Data Importer team is much bigger, much smarter than I am. And so they're always optimizing that data import process. So rather than duplicate what they're doing with Native Python client, which I might go ahead and attempt at some later date, it struck me as faster to just leverage Data Importer, which is a great tool. So those are my three tools of the month. And if you are ready to get started with Neo4j or take your Cypher skills to the next level, we have quite a few upcoming events in March. The first one being intro to Neo4j, this will be mid-March and it will cover intro to graphs, how to identify if you have a graphy problem and introduction to AuraDB, which is our cloud managed service and some basic Cypher commands.

Jason Koo (41:30):

A week after that, on March 22nd, is an Intermediate Cypher workshop, which will cover intermediate Cypher, using the APOC library, and how to model your graph data. And after that, at the end of March, we've got building a routing app. So you can take all that knowledge that you learned in the earlier workshops and create a web application that leverages OpenStreetMap. And that workshop will be taught by Will, definitely recommended. He will get you from zero to hero on OpenStreetMap and pathfinding. So basically, if I'm right, and Will, tell me if I'm wrong, that routing app is basically going to allow someone to create a simple Google Maps lite. Is that correct?

William Lyon (42:12):

Yeah, that's a good way to think of it. We're going to use that OSMnx package that I mentioned earlier to import a road network for your area. We're going to see how you can import points of interest, so what are all the restaurants, bakeries, whatever in your area, and then also pull in data from OpenAddresses to be able to route between arbitrary addresses or points of interest and view that on a nice map that gives you turn-by-turn directions.
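Routing between two points on a road network comes down to a shortest-path search. The sketch below uses Dijkstra's algorithm over a tiny hand-made graph; it is a generic illustration with hypothetical place names and distances, not the workshop's code, which works against a road network stored in Neo4j.

```python
import heapq


def shortest_route(graph, source, target):
    """Dijkstra's algorithm over {node: [(neighbor, metres), ...]}.

    Returns (total_metres, [nodes along the route]) or None if unreachable.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # Stale queue entry; a shorter way was already found.
        for neighbor, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in best:
        return None
    # Rebuild the route by walking the predecessor links backwards.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical one-way street segments with lengths in metres.
roads = {
    "home": [("corner", 100), ("park", 400)],
    "corner": [("bakery", 200), ("park", 100)],
    "park": [("bakery", 50)],
    "bakery": [],
}
print(shortest_route(roads, "home", "bakery"))
```

Turn-by-turn directions would then be derived from the ordered list of nodes, using street names and geometry stored on the segments.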

Jason Koo (42:44):

Nice. Looking forward to that. All right. The last workshop that I'll mention is happening in early April, which is the Spring Data Neo4j training workshop. And this is for anyone who's working in the Spring Framework, so if you're a Java developer and you want to know how to use Neo4j inside that framework, that workshop is for you. Okay, so we'll drop in a link to all the training series in the show notes. Definitely check that out. There's a lot of good workshops coming up. And before we wrap everything up, before you go, I just want to give a quick shout-out that we do have some open roles here at Neo4j internationally. We've got positions in North America, the UK, and Sweden.

Jason Koo (43:31):

And the two roles that I guess I'd like to give a particular shout-out to are a product management role in London and, also in the London location, a technical curriculum developer role. So this person would work on our team, specifically helping create GraphAcademy courses. So GraphAcademy is our built-in training subsite, which includes a whole host of interactive workshops that you can go through to learn about different aspects of Neo4j, learn how to become more proficient in Cypher, and learn our Graph Data Science library. And we offer a couple of certificate courses through that program. So definitely check out our careers page. We will of course link that in our show notes as well. And that, I believe, concludes our episode.

William Lyon (44:23):

Thanks for listening, everyone. We'll see you next time.

Alison Cossette (44:27):

Thanks all. We'll see you soon.