GraphStuff.FM: The Neo4j Graph Database Developer Podcast

The Perfect Graph For The Perfect Problem

Episode Summary

Welcome to GraphStuff.FM, your audio guide to the interconnected world of graph databases. We'll take you on a journey to unravel the mysteries of complex data and showcase the power of relationships. Today we'll be covering the latest news from the Neo4j ecosystem, providing some tips and resources for speakers interested in improving their game, and helping you find the perfect graph for the perfect problem.

Episode Notes

Episode Transcription

Jason (00:00):

Welcome to GraphStuff.FM, your audio guide into the interconnected world of graph databases. We'll take you on a journey to unravel the mysteries of complex data and showcase the power of relationships. Today we'll be covering the latest news from the Neo4j ecosystem and be providing some tips and resources for speakers interested in improving their game and helping you find the perfect graph for the perfect problem. My name is Jason and I'm joined by our wonderful full stack experts, Will and ABK and our amazing in-house data scientist, Allison. How are we all doing today?


Will (00:34):

Great. Hey, everyone.


Alison (00:34):

[inaudible 00:00:36].



Alison (00:38):

Ready for June.


Jason (00:39):

Ready for June. Speaking of early June, right around the corner is World Bicycle Day. As is usual form, I tried to get GraphGPT to tell me some good jokes. If you guys are ready, I've got something that technically counts as a joke. Why did the bicyclist fall in love with graph databases? Because they knew it was the wheel deal, the wheel deal for efficient navigation.



Jason (01:09):

That was the best one I could come up with after quite a number of prompts.


Will (01:14):

Right. That does have the technical elements to be defined as a joke, but that's about it.



Alison (01:21):

I think it also fits with June and Father's Day, because it would definitely fall under nerdy dad jokes, I would say.


Jason (01:28):

Speaking of bicyclists and navigation, we've got some updates regarding ArcGIS, which is... Will, what is ArcGIS?


Will (01:37):

Yeah, let's talk a little bit about updates in the Neo4j partner ecosystem. ArcGIS is a product from Esri. If you're familiar with Esri, they're basically the largest makers of GIS software. GIS stands for Geographic Information System. We're talking about software systems for working with and making sense of geospatial data. Esri's a fascinating company that's been around, I think, since the '60s, and they were the original pioneers in this area. The GIS software world is very much based on relational databases. You have layers, layers have features and an attribute table that's all stored in some relational database that you can do some spatial operations with, and that's really the core of a lot of GIS technology. Now that leaves out, of course, all the awesome things we can do with graphs when we're working with geospatial data.



And so, Esri has built a product called ArcGIS Knowledge, which incorporates graph algorithms and graph querying functionality built not on top of a relational database, but on top of graph database technology. ArcGIS Knowledge has been out for a while. It's pretty neat. I've seen some demos; you can do this investigation notebook-based approach for common GIS workflows, and suitability analysis and fraud detection are some of the demos that I saw. But the announcement that we're talking about this week is that Esri has now added the ability for you to bring your own Neo4j database to ArcGIS Enterprise. This is outside of the ArcGIS Knowledge product, which basically manages your graph database for you; there you can't really bring your own Neo4j instance, and everything you're working with goes through the ArcGIS Knowledge interface. But now with this additional support for existing Neo4j databases in ArcGIS Enterprise, you can bring your own Neo4j instance and work with that in ArcGIS Enterprise like you would other data sources.



This is really cool to see, this adoption of graph technology in the GIS ecosystem, so we'll link the blog post that goes into a bit more detail about this. I guess this is really interesting if you are already using Esri, and specifically ArcGIS Enterprise, within your organization. Surprisingly, a lot of organizations actually are and may have the Esri license and not even realize it. I've talked to a few county governments, and the university I'm affiliated with has the Esri enterprise license for all of this. It's worth checking around to see if you may have access to this already.


ABK (04:29):

Had a quick question, if you don't mind, Will. Taking your existing Neo4j database and using that with ArcGIS, do you happen to know if there's any prep you have to do on your side? Is it enough to just decorate it with geolocation data, or how do those two things mesh?


Will (04:46):

That's a good question. I haven't actually tried this new feature myself, but my understanding is that there's a convention for how you represent your geospatial features in Neo4j, a convention for how those are stored and how those are represented as features in your layers in ArcGIS Enterprise. For example, Neo4j, the database, has a native point type that can be used to represent geospatial or Cartesian coordinate points. We can store individual points, and we can store arrays of points to, say, represent more complex geometry like a line. If we are storing an array of points as a line, I think ArcGIS Enterprise would interpret that as a line geometry. I'm not sure how polygons and multi-polygons are represented, but my understanding is that, yeah, it's using pretty much the native spatial point types that are supported in the database to represent that geometry as features in ArcGIS.
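As a rough sketch of the convention Will describes (the labels and property names here are just illustrative, not an ArcGIS requirement), storing a single point and a line-like geometry in Cypher might look like:

```cypher
// A single geospatial point, using Neo4j's native point type (WGS 84)
CREATE (c:City {name: 'Malmö', location: point({latitude: 55.605, longitude: 13.0038})});

// A line-like geometry stored as an array of points on one property
CREATE (r:Route {
  name: 'Harbor Trail',
  geometry: [
    point({latitude: 55.605, longitude: 13.0038}),
    point({latitude: 55.609, longitude: 13.0102})
  ]
});
```

How exactly ArcGIS Enterprise maps these onto features is convention-driven, as Will notes, so check Esri's documentation for the specifics.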


ABK (05:47):

That sounds really exciting. I think that's cool stuff.


Will (05:50):

For sure. Yeah. Great. Let's talk about some things elsewhere in the Neo4j partner ecosystem. Ellen, who's the product manager for the Neo4j GraphQL team, is out this week along with someone from the StepZen team talking about using StepZen alongside the Neo4j GraphQL library to build and deploy GraphQL APIs. StepZen, if you're not familiar, is a startup building tooling that makes it easier to build GraphQL backends. They're focused on the backend GraphQL development piece. Similar in some ways to what we've done with the Neo4j GraphQL library, which has the same goal, but of course to help you build GraphQL APIs specifically sourcing data from Neo4j. Whereas StepZen, of course, wants to make it easier for you to build GraphQL APIs from a multitude of different sources, not just, say, from Neo4j.



This tutorial is out on the Neo4j developer blog, basically showing how to get started with the Neo4j GraphQL library to build a GraphQL API, and then extending it with StepZen to pull in movie reviews from the New York Times API, basically extending the schema of the generated GraphQL API from the Neo4j GraphQL library. This, I thought, was really neat; it shows, I guess, the power of GraphQL almost as a data connector. Thanks to having this GraphQL standard, we can build tooling that works together interchangeably, which is pretty neat. Definitely check out this tutorial if you're looking at building GraphQL APIs pulling in data from multiple sources.


Jason (07:28):

I got a question, Will. Is StepZen similar to Apollo or?


Will (07:35):

That's a good question. Apollo was very early to build a lot of open source tooling to round out the GraphQL ecosystem. I think the things they were most successful building were Apollo Client, which is a set of integrations with front-end frameworks for data fetching from an existing GraphQL API, and things like Apollo Server, which is tooling for building the backend piece. Then Apollo eventually built their own cloud enterprise offering for working with more complex federated schemas in a large organization. StepZen, I think, is much more focused on just how to make it easier to build the GraphQL backend piece, looking at incorporating different data sources. Their executive team and their founder come from Apigee, so I think they have a lot of familiarity with some of the challenges and problems that come up when you're building, deploying and managing APIs.



And so, I think that's more of their focus, whereas Apollo is right now focused on how to enable GraphQL almost as a microservice within a large enterprise, and then also doing a lot to support the open source GraphQL tooling that most people use in the GraphQL world. While we're talking about the Neo4j GraphQL library, Andres from the GraphQL team was also out this week with a blog post talking about some of the performance improvements that they've made in the Neo4j GraphQL library. One of the jobs, I guess, of the Neo4j GraphQL library is to take GraphQL requests and translate those into Cypher queries. Of course, you're not always generating perhaps the optimal Cypher query that way, and so it's important to have some tooling, some ways to measure performance, especially as you're building new releases and the way you're generating those queries changes. It's important to understand what the performance characteristics are there.



Andres wrote a great blog post talking a little bit about some of the performance aspects specific to the generated Cypher, things like this, but really showing some of the tooling that's out there for building performance monitoring tooling for Node.js servers in general, so things like flame graphs to pinpoint problem areas and how you dig in and really get into some of those performance improvements. If you've been using more recent versions of the Neo4j GraphQL library, and if you've noticed that it's getting faster and better, this is a deep dive from Andres into how they made that possible, which is pretty neat. Cool. I think we probably have some other updates in the ecosystem to talk about next. Jason, do you want to walk us through some of those?


Jason (10:18):

Yes. Yeah, Neosemantics, which is a plugin for connecting RDF data, or Resource Description Framework data, to Neo4j. RDF is another graph data format. This plugin has had some upgrades, so it now connects with Neo4j 5.7, I believe, and works with RDF4J, I think, 4.2.4, which I believe is the most recent version. Basically, it's upgrading Neosemantics to work with the latest on both ends. The team also added some additional features: semantic similarity metrics, and ontology inference procedures and functions. If you are working with RDF and Neo4j, definitely check out Neosemantics and its recent updates. Also, Greg King, our developer tools product manager, put out a short video on recent workspace updates, and it's got quite a few interesting extra features if you haven't used Data Importer in a while. The most noticeable one in the UI is the green check marks that appear on the Arrows inset UI, showing all the nodes and relationships.



Data Importer, which we'll talk about again a little bit later, allows you to take CSV data, model it as nodes and relationships, and then configure what data from the CSVs is added into that data model for import. Now the UI will show green check marks for all the nodes and relationships that have been mapped correctly. Previously a relationship would've had just dashed lines. With the new UI, it's easier to see what from your data model has been accurately mapped. Another addition was the ability to stop a running import. If it starts to import and you decide there's too much data and you want just a subset of it, you can stop the run and go check the data. Also, a great new feature in the import process is the ability to filter particular data. If you've got a relationship that is taking data from a CSV file, you can add in a filter for targeting just a subset of that CSV data.



For example, say there's a row with some Boolean: if true, then import that data, otherwise ignore it. It gives you a little more power in the import process, rather than taking in all the data and then cleaning up on the database side or cleaning up your CSV; you can do it in the import process. Besides Data Importer, the browser has some updates as well. You can export table data as CSV or JSON. When you make a query, and if your query results can be viewed as a table, which should be pretty much all queries, when you switch to the table view versus the graph view mode, there's a download button that allows you to pull a CSV or JSON of that data so that you can use it elsewhere. Also, there is a new toggle in the lower left for running experimental visualization.
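Outside of Data Importer's UI, the same kind of row filtering can be done directly in Cypher with LOAD CSV. This is just a sketch; the file name and column names are made up for illustration:

```cypher
// Only import rows where the 'active' column is true
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
WITH row WHERE row.active = 'true'   // CSV values arrive as strings
MERGE (p:Person {id: row.id})
SET p.name = row.name;
```

The WHERE on the row does the same job as the importer's filter: rows that fail the predicate never reach the database.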



This is using a different underlying engine for visualizing graphs. It is much more performant than the old system, but again, it's in an experimental phase, so you can toggle between the two and get a feel for it. But I think you'll find that generally it is more performant for displaying larger and more complex graphs than the previous visualizer was. The last thing I'll mention is saved Cypher queries. You were always able to save queries in the left-hand panel, but now you can use a hotkey, right? On Mac it's Control+R, and then F9, I believe, for Windows, for displaying a list of the previous Cypher queries you've run, so that you can quickly go back to a command you had run previously rather than saving it and then pulling it from the left-hand side, or scrolling down to see what calls you had done in the past. Those are things that can help you move faster within the browser.



Shifting back to Data Importer: within Data Importer there's a UI component that is basically an upgraded Arrows, and Arrows is used for data modeling. Alison, we had, I think, a great question regarding data modeling this month.


Alison (14:49):

We absolutely do, in our community. For those of you who haven't joined us yet, come join us; we get lots of questions there, and the folks here in developer relations will answer your questions for you as we move through. There was a great question, and it was a really simple question: what is the better data model, creating more nodes or utilizing more properties? There was a longer discussion about it, but what it really points to, and those of you who have been working in graph probably know this, but for those of you who are new, is that so much of the success and ease of your graph project can really be traced back to the data model itself. If you're new to graph and you're asking, "What is a data model, Alison?" Essentially, the data model is a blueprint.



It's a blueprint of the node labels or the node types, the relationship types, and then the properties of both: the property of a relationship, the property of a label. Which items are nodes, which items are relationships, and trying to figure out where all of your information goes can be a challenge. In graph we have what we call the data model. I can sometimes draw the parallel to an entity relationship diagram if you're coming from a relational database. We will talk a lot about the data model, but sometimes people say, "If I have a SQL database and I want to just transfer it over into graph, is there an easy way to do that?" Some people will say that a node label is akin to a table name. If I have a table of students, then I want a student node. If I have a table of classes, I want a class node. In these cases, the table name is the label and the primary key would then be the identifier of the node.



Then your relationships could be those foreign keys, so connecting a student to a class. Properties can then be different values within the row. But really, it's not that simple; you can't just take your basic tables in SQL and put them into a graph database. Today we are going to dive into why this blueprint is important and some considerations when you're creating your data model. I threw out a few things that I consider important in your data model, and I just thought I'd throw it out to everyone here to talk a little bit about. The first is really efficiency in representing the relationships. The advantage of graph, of course, is to actually track those relationships. I find that the data model, and graph in general, is just a really efficient way to represent that. Any thoughts on representation of relationships?
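Taking Alison's student/class example literally, that naive table-to-graph mapping could be sketched in Cypher like this (names and values invented for illustration):

```cypher
// Table names become labels, primary keys become node identifiers
CREATE (s:Student {studentId: 1, name: 'Ada'})
CREATE (c:Class {classId: 101, title: 'Graph Theory'})
// A foreign key (or join table row) becomes a relationship,
// and row values like a grade can live on that relationship
CREATE (s)-[:ENROLLED_IN {grade: 'A'}]->(c);
```

As Alison says, real models rarely stay this simple, but it shows where each piece of a relational schema tends to land in a graph.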


ABK (17:49):

Yeah, I definitely agree with your perspective on that. I think one of the nice things that we always talk about in using relationships for modeling is the whiteboard friendliness. That's the classic thing that we at Neo4j always say, and even though we've said it for a decade or more, it's really true: the way you tend to naturally think about something, if the four of us are talking about something, in our heads we'd be forming some graph structure to talk about the thing, how this relates to this other thing. It's so natural to do and it's really clean. Your domain maps so cleanly to the graph representation that there's no friction, which makes it fun. In fact, I think it's efficient and it's also gratifying: "Oh, okay, the way I'm thinking about this is actually totally right, and this machine actually understands it the way I'm thinking about it, and all this tooling understands it, and this scales the way you need it to." I think there's something fun about that.


Alison (18:45):

If anybody heard last month's podcast, where we were doing the live data modeling at PyCon: it was funny because we would be talking to people and they'd be talking about their problem, and we'd put together their data model, and there'd be this moment of realization like, "Oh yes, that's exactly how it works. That's exactly what I mean." There was just, to your point, ABK, this natural way of representing something. I love a good data model. Sorry, go ahead, Jason.


Jason (19:15):

Yeah, I was going to say it's super intuitive. When you're asking someone just to sketch out architecturally what's going on with their domain, you have a graph that they are sketching out. When it's visualized like that, especially if you put it into another tool where you can move the nodes and the relationships around, I think it very clearly shows where, if you have some part of your domain or your stack that's overly complicated, you see it in that data model, and I think it naturally pushes people to iterate on it. You know what, maybe we don't need this thing here; we can simplify it by connecting these two systems directly with a relationship. Or maybe they do need to expand it out; it's too much in one thing, and we need to create several nodes or several classes of objects to better break this up. It allows for that really high-level iteration in a really satisfying way.


Alison (20:09):

Agreed. The next point I wanted to bring up was, and I mean, I would say this is my favorite, but they're all my favorite, is the idea of flexible and scalable structure. For me, one of the things, I just remember being in school in one of my first database courses and just sweating over making sure that I had the ERD right and that I had all the tables and all the primary keys and is this going to take into account all of these edge cases and all of these use cases? But what I love about the graph data model is exactly what it says, how flexible it is. Because if you are working on a data model and then you have another, say, data source, you can just connect it and plug it in.



I just think that ability to allow, and as Jason said, this evolution of the data model itself as you iterate, the fact that you're not constrained into something and that refactoring is really much simpler than in a traditional relational database. But I was wondering if maybe, Will, you had some thoughts on this idea of flexibility and scalability of structure?


Will (21:22):

I always describe graph data modeling as a very iterative process. What I mean by that is, the way I think of it, there's a very common four or five step process where you're first identifying, in my domain, what are the entities (those are the nodes), what are the relationships that connect them, what are the attribute values, what are the properties that describe the thing? What are things that describe two nodes together and should be relationship properties? These kinds of things, which are somewhat straightforward. But then you get to this point of, "Okay. Well, can I traverse the graph as I've drawn it now to answer the questions that I have of the data?" Oftentimes the first answer is, "Well, no, I need to add this piece that I didn't think about, or I need to promote this property to a node," so I can properly traverse to answer this question.



And so, I think for me, that's where I see a lot of the flexibility in the graph data model: at that point somewhere between the ideation and the proof of concept stage where you're still at the whiteboard, you're drawing that data model, you're in Arrows, you're choosing data types for your properties, and then you're literally drawing a traversal through the graph to answer the different questions that you have of the data. I think it's also very much unlike a relational data model, where I'm going to do my third normal form or whatever and then I'm done; that's not how it works for graph data modeling. It is iterative, and there is no one right data model for the data. It also depends on the questions that you have to ask of the data, and that, I think, is a huge benefit and feature of the property graph model, for sure.
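"Promoting a property to a node" is a refactoring you can express directly in Cypher. A minimal sketch, assuming a hypothetical Movie node with a genre string property:

```cypher
// Before: genre is a string property on each Movie node.
// After: each distinct genre is its own node we can traverse through.
MATCH (m:Movie)
WHERE m.genre IS NOT NULL
MERGE (g:Genre {name: m.genre})
MERGE (m)-[:IN_GENRE]->(g)
REMOVE m.genre;
```

After this refactor, a question like "what other movies share this movie's genre" becomes a two-hop traversal instead of a property comparison across every node.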


Alison (23:08):

Yeah, it leads me to my next point, which is the idea of having simplified querying and analysis. Jason, I don't know if you wanted to talk a little bit about the querying use case that you were recently working on.


Jason (23:24):

Yeah, that was showing how, in tackling a navigation problem, you could answer it using tabular data stores with a bunch of Python functions; you could do this, but it was extremely complicated. Ultimately, you can take all that hopping and joining and scanning through data, and all the functions for trying to decide how to plot a course, and when it's moved to a graph database, where you're searching for patterns instead of wrangling data with custom logic, you get your answer to this type of question much, much faster. For application development and iterating, it became much, much simpler. You could spend more time on the UI, you could spend more time on making sure the data is good and clean, instead of trying to work out the complex noodling journey of getting really rigid data to conform to a dynamic problem like course plotting and mapping.
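For a navigation problem like the one Jason describes, the pattern-matching version is often a one-liner. A hedged sketch, assuming hypothetical Stop nodes connected by CONNECTED_TO relationships:

```cypher
// Find a shortest route between two stops by matching a path pattern,
// instead of repeatedly self-joining a table of edges in application code
MATCH path = shortestPath(
  (a:Stop {name: 'Central'})-[:CONNECTED_TO*]-(b:Stop {name: 'Harbor'})
)
RETURN [s IN nodes(path) | s.name] AS route;
```

All of the hop-by-hop logic lives in the traversal itself; the application just asks for the pattern.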


Alison (24:33):

I think it brings us back to our overall topic of the day as far as the right graph for the right problem. It's really clear in your case of a navigation problem that graph is very obviously the answer. You also hit on another point, which is this idea of simpler application development. I didn't know if anybody else had thoughts on how having the right data model can really help with simplifying on the developer side what you're actually building.


Jason (25:05):

Yeah, I've got a quick comment about that, and this goes back to Will's mention of the ability to iterate quickly. In app development, when you're creating a new app, especially a new app, iteration is something you're going to do, right? You're going to build initially to do one thing and then discover very quickly, "Oh, we need to handle this use case, we need to handle this data." When you can update your data model that fast, and update a graph database equally fast, it allows the whole rest of the process to move quicker, so you don't have to spend hours and hours trying to make sure your database schema is correct and that the queries will get you all the base data just so that you can bring it in and then run a bunch of native functions to massage the answer out of that data. You can just get your answer; for most of your questions, you can get it right out of the patterns of the data and then move on to the other stuff that you have to do with app development.


Will (26:07):

I think, also going back to one of the first points we were talking about, which I think Andres brought up originally, the graph model is how you intuitively think of your data anyway, how you think of your domain. It's so natural because you go to the whiteboard and you draw out your entities and arrows between them. That's how we think, and that's how we model and store and query our data in the database. Getting rid of that impedance mismatch for the developer, where you are building your application by querying your data the exact same way that you think about it, removes so much friction through the whole stack, from the business analyst who's identifying the requirements for the application all the way down to the developers actually building it.



At the end of the day, you've removed so much friction from just trying to work with these systems and get the data into and out of different formats. We don't have any of that when we're building applications with graphs in Neo4j. Everyone is thinking about your data as connected, relationship-first data, as a graph, anyway, and the friction that removes, especially for the developer, I think, is really apparent as well.


Jason (27:24):

Yeah, it's similar to the benefit of GraphQL compared to REST, right? You can ask the question and get the answer in the format that you requested, versus using REST, where you get this giant glob of data, you've got to pull out the pieces you want, put it all together and then use it the way you want. The difference, the acceleration in development, I feel, is similar.


ABK (27:51):

I think that's absolutely right, and I want to extend those great features about graph application development in one other direction. For all applications, application developers focus on their code, their UI, what they're building. To some extent, you want to just get the data running and get past that, get back to doing the fun stuff of coding. But part of the reality, and I think, Alison, you probably know this better than developers do, is that data lives longer than the application. Your code is not going to live forever. The data probably will, close to forever. One of the beautiful things about the graph model is that it more gracefully accommodates multiple applications, multiple views of the same domain. With a relational database, you tend, to some degree, to over-index for a particular application, tuning the table structures, tuning your schemas, so the application you're building works really well.



But then if you want to do something completely different with it, okay, now what are you going to do? I think GraphQL itself is a great example of this, and this is jumping off from your comments, Jason: with GraphQL, it ostensibly is just a tree view of whatever data source you've got. I've got a starting point and I've got things from that starting point that I want to grab. You can ask for exactly what you want and you get exactly that back, and you can optimize for that, right? You could have a backend database, let's say it's a document database, that is optimized exactly for that tree. But if your tree starts somewhere else, now that optimized tree you had in your document database isn't so optimized anymore. With the graph, there is no fixed top of the tree; any part of the graph can be the top of the tree, and you can very nicely fan out from that, grab what you need, and come back. There are no penalties for it. It's absolutely amazing. What other database can you do that with?


Alison (29:39):

Exactly. That's a great point, and it leads to my final point, which was around this idea of performance optimization. You beautifully explained where that comes into play: this idea that it becomes much more flexible for multiple use cases, for sure.


ABK (29:55):

I think you're right. I think this is one of those things: when we talk about modeling, we tend to be in the schema-based mindset about how to think about the data we have. We want to have it look one particular way. But this is the other aspect of graphs: multiple applications, multiple users of that graph can have different ways of thinking about how that graph is shaped. You can actually embellish the graph without disturbing whoever else is using it. "Oh, actually I do a lot of queries that look like this. I'm going to add relationships that match that pattern." Nobody else needs to know about it, nobody else needs to care, but your application now has been optimized for those hops.



This is a terrible example, but say you had a social network and suddenly you care a lot about third cousins for whatever reason. Okay, great, you could actually go ahead and add a direct jump from a person to all of their third cousins, optimizing for those kinds of queries. People who don't care about third cousins just don't use those paths. Within the same data representation you can manage lots of different application views, and everybody gets exactly what they need.
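ABK's third-cousin shortcut could be materialized roughly like this; the labels and relationship names are invented for illustration, and it's deliberately simplified (it ignores closer relatives who share the same ancestor):

```cypher
// Precompute a direct THIRD_COUSIN relationship for hop-heavy queries.
// Third cousins share a great-great-grandparent, i.e. four CHILD_OF hops up.
MATCH (p:Person)-[:CHILD_OF*4]->(ancestor)<-[:CHILD_OF*4]-(cousin:Person)
WHERE p <> cousin
MERGE (p)-[:THIRD_COUSIN]->(cousin);
```

Applications that care can now traverse one THIRD_COUSIN hop; everyone else simply never matches on that relationship type.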


Alison (31:03):

Awesome. Sometimes people ask, "What's the best way to actually build your models?" And there are lots of different tools that we can use. What are some of your thoughts on how to go about making the model in the first place?


Jason (31:19):

As ABK mentioned before, there's the whiteboard example. If you've got a whiteboard, or a napkin and pen, that's a great place to start. Just start putting down words and lines and connecting things. From there, you could use sticky notes, you could build your own crime board with a cork board and put everything up there. Then of course, when you're ready to move it to a digital space and do it collaboratively, like we do remotely, there are great tools like Miro, formerly known as RealtimeBoard, because you can drop down shapes, connect lines, and add notes. These are all fantastic ways to get started with data modeling, and they're very natural and intuitive tools.



Also, harking back to the earlier conversation about Data Importer: again, Data Importer has a UI for you to do data modeling, with the end goal of connecting that model to CSV data. As I mentioned before, Data Importer uses the Arrows application as its primary interface, and the Arrows app is especially cool. Here I'll hand this off to ABK, because ABK is now helping manage the continued updates to Arrows.


ABK (32:40):

Yeah, the Arrows app. First of all, I think you're absolutely right that it is incredibly cool. It's a fun tool for actually drawing graphs. For those of you who are listening who haven't used it before, you can go check out the UI. Fundamentally, the goal of Arrows is to just let you draw graphs, so you plop down a node and you can drag out relationships to new nodes. It has lots of fun behaviors; for instance, you can grab multiple nodes at once and drag relationships from all of those at once. It's a really smooth way, with a lot of nice behaviors, for quickly putting together small-size graphs. I would say it's good for up to maybe a couple hundred nodes; you could probably get that all into one Arrows diagram. It is the brainchild, actually the passion project, of one of our former colleagues, Alistair Jones, who has moved on to his own fantastic startup, which with a bit of sponsorship money we might even plug right now. Alistair, if you're listening, the phone lines are open; you can call us up and get an ad spot.



Alistair has been working on this for just about as long as he's been at Neo4j. It's gone through a couple of different iterations, and the latest one is super nice. It saves to Google Drive, it has nice import and export options, and, most importantly for this particular conversation, if you've drawn a nice graph in Arrows, you can export all of it as Cypher. If you've got some data that you like that you've created in Arrows, it's a really nice way to make small graphs, export them straight away, and run that in Neo4j, and now you've got live data. Earlier today, I was actually on a call with some people for whom exactly that was their introduction to Neo4j: before they used any of the Neo4j tools, they discovered Arrows, they were using Arrows, and they said, "Oh, okay, we can actually export this data and use it in a database."
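To make that Cypher-export idea concrete, here's a small Python sketch. This is illustrative only: the node-and-relationship dictionaries below are not Arrows' actual file format, and this is not its export code, but it produces the same kind of Cypher CREATE statement you get from a drawn graph.

```python
# Toy graph description (hypothetical shape, not Arrows' real format)
# rendered as a single Cypher CREATE statement.

def graph_to_cypher(nodes, relationships):
    """Render a small node/relationship description as a Cypher CREATE statement."""
    parts = []
    for node in nodes:
        props = ", ".join(f"{k}: {v!r}" for k, v in node["properties"].items())
        parts.append(f"({node['id']}:{node['label']} {{{props}}})")
    for rel in relationships:
        parts.append(f"({rel['from']})-[:{rel['type']}]->({rel['to']})")
    return "CREATE\n  " + ",\n  ".join(parts)

nodes = [
    {"id": "keanu", "label": "Person", "properties": {"name": "Keanu Reeves"}},
    {"id": "matrix", "label": "Movie", "properties": {"title": "The Matrix"}},
]
rels = [{"from": "keanu", "to": "matrix", "type": "ACTED_IN"}]

print(graph_to_cypher(nodes, rels))
```

Running the exported statement in Neo4j creates the drawn nodes and relationships as live data, which is exactly the workflow ABK describes.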



That was pretty awesome to hear, just anecdotally. To come back to your point, Jason, about me managing Arrows now: since Alistair left, Arrows has been managed under Neo4j Labs, and we had some material discussions about what to do with it. There's still some discussion to be had, but it looks like I'm at least going to be spending a lot of my time updating the code, maintaining it, and adding features. If you use Arrows now and have ideas about how it could be improved, please reach out to me. The GitHub repository is the best place to do that. Arrows is going to continue to live and continue to be a big part of the Neo4j ecosystem.


Alison (35:11):

I will say I use Arrows a lot when I'm answering questions in the community, specifically because, as we've all mentioned, it makes that natural cognitive understanding of the data much easier. Getting back to our original community question of the day, which was: is it better to have more nodes or more properties? One of the things that came up is that this person was showing a node with many properties versus a separate node for each property. It leads us to the idea that if you only have a node with lots of properties, you're basically looking at a table, which isn't really going to serve you in a graph database. What you really want is to extract the relationships. When you're determining whether something is a property, a node, or a relationship, as was mentioned earlier, really think about the patterns that you're going to be looking for.



Are you going to have a relatively short traversal to get to your answer, or a long one? Do you need that information to be more readily available? If you're doing a search and part of your WHERE clause is based on a certain type of value, you're not going to want that buried in a property. For an efficient query, you're going to want to have it at the node or relationship level. The short answer to the question is: enough nodes that you don't have to go too deep to get to what you're looking for. That's my short answer. I don't know if anybody else has a different viewpoint.


Will (36:49):

Here's how I think about whether a piece of data should be a property on a node or moved out into its own node: if I move it out into its own node, is there something useful that I can discover by traversing through that node? If there is, then yeah, it absolutely needs to be a node. If not, maybe it can just remain a property. I think a good example of this is movie genres. In the movie data set we have all of these movies, and movies have one or more genres, like action, drama, romance, whatever. We could store those as properties on the node, or we could create a genre node and traverse through it.



The usefulness here is I'd like to discover other action movies by traversing through the action genre. That's a good case for moving that out into its own node. I use that as a check just to think within the context of the domain that I'm working through. Is there some useful traversal that I can imagine? If so, then yeah, that should be a node.
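Will's check can be illustrated with a toy sketch in plain Python (not Neo4j itself; the movie and genre data are made up). Keeping genre as a buried property means scanning every movie, while extracting a shared genre "node" makes finding related movies a short, direct traversal:

```python
# Property-style modeling: genre is buried in each record, so finding other
# action movies means scanning every movie.
movies = {
    "The Matrix": {"genres": ["Action", "Sci-Fi"]},
    "John Wick": {"genres": ["Action"]},
    "Before Sunrise": {"genres": ["Romance"]},
}

def other_action_movies_scan(title):
    return sorted(
        m for m, props in movies.items()
        if m != title and "Action" in props["genres"]
    )

# Node-style modeling: a Genre "node" holds relationships to its movies, so
# the lookup is a short traversal through the shared node instead of a scan.
genre_index = {}
for title, props in movies.items():
    for g in props["genres"]:
        genre_index.setdefault(g, set()).add(title)

def other_action_movies_traverse(title):
    return sorted(genre_index["Action"] - {title})

print(other_action_movies_traverse("The Matrix"))  # ['John Wick']
```

Both approaches return the same answer here; the difference is that the traversal version scales with the size of the genre, not the size of the whole movie catalog.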


Jason (37:52):

Adam, in his intermediate Cypher presentations recently, also suggested using additional labels, which is something I often forget; I just use properties. His example was a Product node with an extra label of Discontinued or Active, because, as he says, Booleans make pretty good additional labels. If you're actively using a type of query that depends on whether a product is active or discontinued, it makes sense to add that as a label so that you can jump right to the appropriate nodes, rather than finding all the product nodes and then filtering on whether each one is active or discontinued. You can get that batch right from the get-go.
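A rough Python analogy for the additional-label trick (the product data is invented for illustration; in Neo4j the label index does this for you): with a label you jump straight to the relevant nodes, instead of scanning all products and filtering on a property.

```python
# Each "node" carries a set of labels, like a Neo4j node with multiple labels.
products = [
    {"name": "Walkman", "labels": {"Product", "Discontinued"}},
    {"name": "Headphones", "labels": {"Product", "Active"}},
    {"name": "MiniDisc", "labels": {"Product", "Discontinued"}},
]

# Label index, analogous to Neo4j's label scan: one lookup fetches exactly
# the matching nodes, with no per-node property filtering.
by_label = {}
for p in products:
    for label in p["labels"]:
        by_label.setdefault(label, []).append(p["name"])

print(sorted(by_label["Discontinued"]))  # ['MiniDisc', 'Walkman']
```

The design trade-off is the same one Jason describes: use an extra label when a query's starting set depends on that Boolean state, and a plain property when it doesn't.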


Alison (38:38):

The other thing that Adam was showing was this idea that you can take that additional label, like Electronics or whatever it is, and make it part of the relationship, like the third-cousin relationship ABK mentioned earlier. I was working with someone yesterday on logistics tracking; they were trying to identify places where there are lots of either drop-offs or pickups in an area, but where those numbers are not balanced. What they were trying to determine was not just the total number in a certain geographic location, but those numbers by shipper. Does it make sense to have a shipper node for each of those orders? Does it make sense to create a new shipper relationship between two locations? It was a matter of thinking through the questions they were going to ask: is that better as a new type of relationship or a new type of node? That way, they could create the appropriate graph to answer the questions they have.



There's lots there, and to your point, I do want to promote that particular course. I watched it myself earlier today. It's the Intermediate Cypher and Data Modeling course; we'll make sure we have the link to it. Now I'm going to talk to you about BioCypher. BioCypher comes out of the idea that researchers, in general, are not in a commercial setting. They often don't have the time or the bandwidth to build their own processes around semantic taxonomies for their knowledge graphs. You really have to get that semantic taxonomy set first in order to create a knowledge graph that is going to be stable as your process continues. Otherwise, the things you then build upon it are going to fail and you're not going to achieve your goal. The intention with BioCypher is to make it easier to create these knowledge graphs out of existing resources.



The way it does this is through three main elements. One is resources: being able to do ETL on all different kinds of data and put it through different adapters, so that it can define what information should be at the node and edge level. Graph modeling, sound familiar? There are lots of different kinds of adapters around the resources. The second element is ontologies: really making sure that the information is mapped well to the data, so the user can select which ontologies they want to use. You can link different ontologies; there's some merge functionality that isn't up yet, but it is in the pipeline. Then lastly, there's the output, which you just mentioned: this combined data is exported either to a property graph, to SQL, or to RDF, for users to then work with their data.



In this case, the schema that you have, that knowledge graph, can then be shared so that others can easily replicate it, because, as we know, research dollars are finite, and the more efficiency we can have in the scientific community, the more that can be done in a smaller space. That really is the mission of BioCypher: to make this biomedical knowledge more available, with a simpler knowledge graph that researchers can actually leverage. That's what the framework is meant to do. It facilitates the creation of knowledge graphs informed by the latest developments in the field of biomedical knowledge representation. As the field is always changing, you have access to up-to-date ontologies. That's what's going on with BioCypher.


Will (42:47):

One event where we see a lot of these bioinformatics graph examples is NODES, the Neo4j Online Developer Education Summit, our annual online conference, which will be coming back in October. This will be a 24-hour event, around the clock, to capture all time zones around the world. We did that last year for the first time. I think that was pretty fun and worked out well for keeping things live and engaging for everyone, no matter where you are. We're going to have three tracks at NODES this year. The first is building intelligent applications, focused on things like building APIs, libraries, frameworks, anything having to do with building apps with graphs and Neo4j. The second track is machine learning and AI, the very popular graph data science focused area, for sure. The third track is around visualization, which really combines the two aspects of building applications and data science: visualization helps us interpret our results and explain things to users. So tools, techniques, and best practices around visualization make up the third track.



The call for proposals is still open; we're still taking ideas for conference presentations. If you're interested in presenting, please go ahead and submit your idea. It will be open until, I think, the end of June, so you have a few more weeks to get those in. I thought it would be fun to have a little live challenge for each of us here, as we're talking about NODES and thinking about what talks we're going to submit. Let's each answer the question of whether or not we have submitted a talk to NODES yet, and if we have, talk about what it is. But then maybe we could also offer: what is your process for putting a talk together? Are there any resources you think are helpful for this sort of thing?



I'll go first. I have not yet submitted my talk to NODES, but I am planning one, and actually it's on my whiteboard over there. That's the way I like to approach putting a talk together: even before I submit the proposal, I like to go to a whiteboard, or just a blank sheet of paper, and sketch out the big takeaways. What's the story I want to tell? What are the missing pieces? How does it all fit together? Of course, it's a graph that I end up drawing. I find that to be a pretty helpful process. A whiteboard is great because you're standing up and moving around; it gets the blood flowing a little more and the creative thinking going, which I think is helpful. But Jason, do you want to tell us if you are submitting a talk, and what your process looks like?


Jason (45:38):

I haven't yet submitted my talk, but I want to submit one about the mock data generator that I've been working on, specifically on how to generate mock data for graphs: what that process is, and, whether you're using my app or building something from the ground up, what to be aware of when you're doing it. Since this is going to be more of an iteration rather than building something new, the app and the underlying insights are already there, so I don't really have to build anything new, though I will be expanding on the mock graph data generator. I'm also hoping to do a talk with Alison. Alison, I'll pass you the hot potato.


Alison (46:18):

Yes, and I have submitted our talk. We have a talk that we're hoping to share about how to leverage graphs for risk management. As I'm sure you've heard many times now, Jason and I are both big Star Wars fans and we like to use it as one of our datasets. In this case, we're going to be looking at leveraging centrality algorithms for risk management: which of the Empire's planets, if taken out, would have the biggest impact on the Empire's supply chain, and which of our planets do we need to reinforce so that we don't have weak points within our system, among other things. What about you, ABK? Are you speaking at NODES, or trying to speak at NODES this year?


ABK (47:04):

What I'm most interested in these days, and it's actually been something that I've been interested in for a long time [inaudible 00:47:11] Neo4j, is data visualization, and graph visualization particularly, going back to creating Neo4j Browser and the early visualization that we had baked into that. If you recall, Browser had nodes with little strokes around them and these balloon arrows if you had multiple relationships. That was just a lark; it was, "Hey, we need to put something together," and we ended up creating that. Fast-forward a little bit to Bloom: when we created Bloom, we actually spent a bit of time with some data visualization consultants, thinking more deeply about what it means to do graph visualization. A lot of the focus tends to be on large scale: what do you do with millions of nodes, how do you actually draw that, the mechanics of it, the challenge.



We were taking a different approach, and this is what I'd like to submit as a talk: can we develop a visual language for graphs that is intuitive and friendly, and that covers most of the use cases we have? Even just in the conversations we've had today, thinking about Arrows particularly, there are two different kinds of graphs we were talking about. There's the data modeling graph, and then there's the data graph, as in, "Oh, I want to draw an example of a graph, and have that be data."



When you just look at them visually, you have no idea that those are two different things. Semantically, we know they're different; once we look at them closely, we know they're different. That's what I'd like to talk about. I'd like to explore this through Arrows and then present what I've learned at NODES: surveying the field, surveying the history of graph visualization. There are amazing books about this as well. I want to condense all of that into a talk saying, "Here's a survey of what graph visualization could be like, and here's where I think things should go." Of course, all of that will be centered around Arrows, because Arrows.


Jason (48:57):

Speaking of books, if anyone is looking for a lot of great tips on giving presentations and speaking, I recently listened to, and I'm already on my second go-around of, an audiobook by Chris Anderson called TED Talks. Chris Anderson is, I believe, the current president and leader of TED. He didn't create TED, but he picked it up. In his book, he covers a ton of great advice that TED gives to its own speakers, and their learnings over the years about what makes a great presentation and what doesn't. If you've got a lot of time and are looking for a long-form piece of content with endless amounts of great advice, I totally recommend that book.



If you don't have that much time and you just want one short snippet of great TED-related advice, there is a great talk by... Oh, was it Julian? Oh yes, Julian Treasure. Great last name. He gave a TED talk called "How to Speak So That People Want To Listen." He just covers some, I guess, basic but very impactful suggestions, mainly around voice, tempo, pauses. If you're looking for a single quick high value piece of TED content, I totally recommend his talk.


Alison (50:24):

My general advice is, first of all, don't be afraid, honestly, because whatever it is that you are excited about, you are not the only one. I have to say, when Jason and I started working on our Star Wars series, I was very pleased. I was like, "Oh great, somebody else is interested." At our last conference, the room was packed; we didn't have room for [inaudible 00:50:47] and it was standing room only. Whatever you're excited about, others are going to be excited about, so don't be afraid. But more importantly, have your talk be about something that you love, something you are excited about. I call it the cocktail party test. If you were at a cocktail party with like-minded people, what would you want to talk to someone about? Like, "Oh, I have to tell you, I did the coolest thing," or, "Oh, have you seen this or that?" If it passes the nerd cocktail party test, then it's probably a good idea for a conference talk.


Will (51:22):

Should we talk about our favorite tools of the month?


Jason (51:28):

Let's do it. Mine was just a small Python package called Faker. I've been working with it quite a bit while developing the mock graph data generator, which makes use of the Faker library extensively. This package is a great little thing that covers quite a few different kinds of fake data. I guess you could call it a tech-appropriate Lorem Ipsum generator. It's fun to use for creating fake subtitles or project names and stuff like that. If you're looking to create or use mock data in your own Python project, take a look at the Faker library. Alison, what was your favorite tool of the month?
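Faker itself gives you providers like fake.name() or fake.company() out of the box. As a standard-library-only sketch of the same idea, here's a tiny seeded generator for mock project names; the word lists are made up for illustration, and seeding makes the mock data reproducible between runs.

```python
import random

# Invented vocabulary for illustration; Faker ships far richer providers.
ADJECTIVES = ["agile", "crimson", "hidden", "quantum", "silent"]
NOUNS = ["falcon", "harbor", "lattice", "orchid", "reactor"]

def mock_project_name(rng):
    """Combine a random adjective and noun into a fake project name."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"

# A fixed seed means the "random" names come out the same every run,
# which is handy when regenerating a mock dataset for a demo graph.
rng = random.Random(42)
names = [mock_project_name(rng) for _ in range(3)]
print(names)
```

The same pattern, scaled up with Faker's built-in providers for names, addresses, companies, and text, is how you can populate a mock graph with plausible-looking property values.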


Alison (52:10):

My favorite tool of the month is technically a tool, but it's almost more of a graph thing. For the data science nerds in the crowd, if you're familiar with Adam Bly: Adam Bly is Canadian and a genius. He started Seed Magazine, has been in scientific research for a long time, and has been the king of all things analytics when it comes to science. A couple of years ago he started a stealth startup, which finally became public knowledge, and it's called System. Just recently, System launched System Pro. System Pro leverages the graph of 35 million studies on PubMed and 120 million studies on OpenAlex and creates an aggregate of the statistical outcomes across all of those studies. For example, I was recently looking up something about autism and some prenatal conditions, specifically C-sections as they relate to autism. If you put in that research question, it will give you a summary of all the statistical outputs from the various research, grouped by type.



Then if you click on any of those, it will take you to the individual white papers. What's great about it is that the synthesis is based only on peer-reviewed scientific studies, it's up to date, and all the citations are available. It generates the synthesis explicitly constrained to sentences generated from the findings and the metadata in the actual research. They have a very high standard for accuracy and performance on the models that are deployed, and it's $199 for the year, not to promote anybody else's product. But since I was talking about BioCypher, and since I have a deep love for scientific research myself, I think it's just such an impactful way for the connections in this research to be leveraged. My shout-out this month goes to Adam Bly and everyone at System for the launch of System Pro.


Will (54:25):

My favorite tool of the month is really a piece of the GraphQL specification, and that is the GraphQL schema directive. As you're writing your GraphQL type definitions, which define the basic schema for your GraphQL API, you have the ability to add annotations to each type or field. These are called schema directives. This is really GraphQL's built-in extension mechanism: the creators of the specification foresaw that there was going to be custom logic that builders of GraphQL servers would want to insert, and that they had better have a way for them to do that. That became the schema directive.



This is so powerful because libraries, GraphQL integrations, and tooling providers are able to leverage this extension mechanism to build really cool features on top of GraphQL. The Neo4j GraphQL Library uses schema directives for things like the Cypher directive, which allows you to use Cypher to define the logic for GraphQL fields, or the auth directive, which adds fine-grained authorization rules: who should have access to which fields, that kind of thing. I talked earlier about StepZen; they have a directive called the Materializer, which helps fetch data from different sources and stitch it together. I was using this a lot on a couple of projects in the last few weeks, specifically a lot of Cypher directives. That's, I think, my favorite schema directive. Just the ability to add Cypher functionality into GraphQL is super powerful. Anyway, that's my favorite tool of the month, the GraphQL schema directive.
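As a rough sketch of what the Cypher directive looks like in a set of type definitions (the field name and the Cypher statement here are made up for illustration, and the exact directive arguments vary between versions of the Neo4j GraphQL Library):

```graphql
type Movie {
  title: String
  # Hypothetical field: instead of the library's autogenerated resolver,
  # this field is resolved by running the embedded Cypher, with `this`
  # bound to the current Movie node.
  similarMovies: [Movie]
    @cypher(
      statement: """
      MATCH (this)-[:IN_GENRE]->(:Genre)<-[:IN_GENRE]-(rec:Movie)
      RETURN rec LIMIT 5
      """
    )
}
```

The schema stays plain GraphQL that any client can consume, while the directive quietly swaps in custom traversal logic on the server side.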


ABK (56:12):

Those are all really good. I'm of course obsessed with one particular graph tool these days and can think of nothing else; I don't think there is anything else that exists, as far as I know. Arrows, of course, is my tool of the month. But Alison, I'll take a little bit of inspiration from your free promotion of System Pro and talk a little bit about what the Arrows creator, the brilliant Alistair Jones, has gone on to build: a new product of his own with the clever name of Nifdi.



If you go out to Nifdi.App, you can see the beginnings of what Alistair is creating, where he's taken all of his thinking around Arrows to another level as a general-purpose diagramming tool, not specifically for graphs, and it looks like it's going to be fantastic. Because we know Alistair from his work on Arrows and everything he's done with Neo4j over the years, I have high expectations for this. I think you should go check it out. Sign up for the early beta, or whatever's available, and support Mr. Jones, because he deserves it. He's a fantastic human being.


Alison (57:24):

Sounds Nifty.


ABK (57:25):

This has been quite a marathon session, I think, right, Jason?


Jason (57:29):

Do you want to let us and our audience know what things are upcoming?


ABK (57:34):

Jason, yeah, I'm happy to talk about some of the upcoming events we've got out in the graph world. In the US there are a few GraphSummits; GraphSummits are one-day concentrated conferences around Neo4j and graphs. In the US, you'll be able to do that in Boston, coming up very soon on June 6th. Just after that, over in Portland. I think that's probably Portland, Oregon, not Portland, Maine; apologies to one of those cities. But Portland on June 8th, and then Palo Alto, which I'm pretty sure is Palo Alto, California, on June 14th. We also have international GraphSummits continuing around the world: Jakarta on June 6th; Paris and Singapore, both on June 8th; Rome and Mumbai, both on the same day, June 13th; and Tel Aviv on June 19th. June is chock-full of GraphSummits; somewhere around the world, you could be near one. If you're not: trains, planes, and automobiles.



We have some great virtual events happening as well. There's going to be The Pros & Cons of Native Versus Non-native Graph Databases, happening on June 7th and 8th, with regionally friendly time zones for APAC, EMEA, and the US. For these events, actually for all the events, you can always look up the details online. There's also a great event about LLMs and knowledge graphs, everybody's favorite topic these days; that's happening on June 13th. Neo4j is a Swedish company, and there's actually going to be a meetup at our offices in Malmö coming up sometime soon, also in June. It'll just be a regular meetup. And the mock data generator is going on tour. Jason, where's the tour starting?


Jason (59:15):

I will be in Japan next month, and one of our ninjas, Koji, has graciously set up a meetup in Tokyo on June 23rd. If you're in Japan, I'll be there giving a presentation on creating mock, deeply interconnected datasets.


ABK (59:34):

Awesome. It's a great opportunity. That's all we've got for events coming up in June. Lots of things happening around the world. Again, if you want to catch up on any of these, you'll find all the details online for anything that's happening near you.


Jason (59:48):

Cool. Thank you, ABK. Thank you, everyone.


ABK (59:50):

Great to sit down with you guys.


Will (59:52):

For sure. We'll see you next month. Bye-Bye.


Jason (59:55):