Explosively expand your understanding of graph databases! Join Alison Cossette, Andreas Kollegger, Will Lyon, and Jason Koo as they take you on an exploratory journey through some of the most innovative and impactful graph database use cases. Also catch up on the latest news, recent video recaps, Neo4j product updates, and get a preview of things to come.
Hello all and welcome to the April episode of GraphStuff FM. This is Alison. I'm here with Jason.
We're here this month talking to you about mind-blowing use cases of graph databases and graph data science. And I actually have some very interesting news for everyone. Gentlemen, I don't know if you've heard this yet, but did you know that we have a new product? Have you heard about this? I'll tell you all about it. In this episode we are introducing our newest graph database, called Graphy McGraph Face. And what we've done is we've taken the popular trend of naming things after boats, and applied it to graph databases. So Graphy McGraph Face is a seaworthy database that you'll need. It's got a slick interface that makes your data analysis feel like a day out on the water.
And you may be asking yourselves, "How is that possible, Alison?" What we've done is we've taken the latest and greatest of the metaverse and Graphy McGraph Face comes with a special feature that will blow your mind. It's called the Graphy Boat'O'Meter, and it allows you to visualize your data as if you were on a boat, so you can see which nodes and edges are riding the waves, which ones are lost at sea, and it's a perfect tool for when you need to make sense of your data, or if you're feeling a little bit seasick. So, there's no need to settle for boring old graph databases when you have Graphy McGraph Face. So, you can try it out today. Let us know what you think. There is one slight disclaimer. Graphy McGraph Face may or may not actually exist, and any resemblance to a real database is purely coincidental. So, this is our April Fools for you. Welcome to the April episode. We hope you enjoy this episode of Mind-Blowing Databases and Use Cases.
Nicely done. You really had me going there for a moment. I was pretty excited to do some nautically themed graph databasing. Disappointed that that's not a real product. But yeah, let's talk about some quick things, news from the community going on. One thing I want to mention is, if you are interested in presenting your graph story, or a project that you're working on that you want to share with the community, there is a form that the Neo4j community team is using to collect this information. So, we'll link that in the show notes. You can basically submit your idea for an upcoming Neo4j community presentation. Also, if you are a Neo4j user, the Neo4j product management team would love to book a 45-minute user survey session with you, so you can help give feedback and help us improve Neo4j. There's also another form for that, which we will link in the show notes.
And the third thing I want to mention, this is a little bit of a teaser. This is not quite ready for submissions and registration yet, but it's for Global Graph Celebration Day, which is April 15th every year, in honor of the birthday of Leonhard Euler, the inventor of graph theory. We try to celebrate graphs with the community every year. And this year, we are going to be launching a hackathon around Euler's birthday and Global Graph Celebration Day. The theme is going to be related to Neo4j Aura, and specifically integrating Aura into your development workflow. So, stay tuned, look for that announcement. Probably the best way to find out about that is to subscribe to the Twin4j weekly newsletter, This Week in Neo4j. So, if you're not subscribed to that, definitely check that out, and you can find out about cool things like what I just mentioned. And so with that, I think we'll go over to ABK to talk about some of the product updates that were released this month.
Sure. Thanks, Will. If you've got some feedback to give for that 45-minute survey, that's awesome, but if you're looking for new things to provide feedback for, here are a couple of options for you. You've probably seen the awesome new Data Importer tool that we have, and that's great if you already have some files prepared and you just want to quickly create a graph out of them. Sometimes, though, the files you've got maybe need a little bit of massaging. And one of the common things is that you might have one file with just all nodes of different types, and one other single file with different types of relationships.
Previously, you would've had to have done some data processing on that, tease apart all the nodes and tease apart all the different relationship types, and then do the importing with the data importer. Now, there's a great new feature called the file filter, which lets you set criteria as you're going through the import file, and you can say, okay, look at this column, and that's going to let you know what node I'm importing or what relationship I'm importing, and map that differently into the data. It's a small little feature with a huge productivity bonus. I think it's definitely pretty cool.
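To picture the kind of pre-processing the file filter now saves you, here's a minimal Python sketch of splitting one mixed nodes file apart by a type column. It's purely illustrative; the column and label names are made up, not taken from the Data Importer itself:

```python
import csv
import io
from collections import defaultdict

def split_by_type(rows, type_column):
    """Group already-parsed CSV rows by the value in `type_column` --
    e.g. a 'label' column that says whether a row is a Person node
    or a Company node."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[type_column]].append(row)
    return dict(groups)

# One mixed file holding two kinds of nodes, distinguished by 'label'
mixed_csv = io.StringIO(
    "label,name\n"
    "Person,Alice\n"
    "Company,Acme\n"
    "Person,Bob\n"
)
groups = split_by_type(csv.DictReader(mixed_csv), "label")
print(sorted(groups))                          # ['Company', 'Person']
print([r["name"] for r in groups["Person"]])   # ['Alice', 'Bob']
```

With the file filter, this splitting step happens inside the importer by pointing it at the type column instead.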
There's another feature worth commenting about, everybody's favorite data visualization tool, Bloom, has a new feature called Slicer. Sounds very edgy, don't you think? Graph database with an edgy feature. Of course, this is fantastic. What Bloom Slicer really does for you, if you think about all the properties you've got in the graph, you can basically grab a property that is numerical or maybe time based or geo, and pull that out into what amounts to a little scrubber. You pull out that dimension of the graph, and then you can filter the graph based on that value. So, you can see ranges of values, and then you can decide what ranges of values you want to see. It's a really nice way to drill down into the graph that you're looking at, and see different parts of it.
Another thing worth mentioning is that our beloved Neo4j drivers and all the different languages that we support, out of the box are incredibly powerful. You've got lots of different options for how you want to run your transactions if you do them in sessions, if you want to keep track of bookmarks. It's very sophisticated stuff. Also possibly a little daunting when you're just getting started. So, there's an experimental API for using the drivers, which basically makes a bunch of good decisions for how to handle bookmarks, how to handle sessions, how to do all that stuff, and collapses all of the API calls into single API calls. So, you could pretty much just say, "Hey, run the Cypher query and give me the results," without worrying about how to do all that, and what to do with failure modes and things like that. Definitely worth checking out some experimental client APIs.
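For a concrete flavor: in the 5.x Python driver this single-call style surfaces as `driver.execute_query(...)`, marked experimental at the time of writing. Below is a toy, driver-free sketch of the pattern such an API wraps — one call that hides session handling and retry-on-transient-failure. All names here are illustrative, not the real driver API:

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable database error."""

def run_query(run_fn, query, retries=3, backoff=0.0):
    """One convenience call: retry the transaction function until it
    succeeds or we run out of attempts. `run_fn` stands in for a
    driver session's managed transaction."""
    for attempt in range(retries):
        try:
            return run_fn(query)
        except TransientError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff)

# Fake "session" that fails once, then succeeds
calls = []
def flaky(query):
    calls.append(query)
    if len(calls) < 2:
        raise TransientError
    return [{"n": 1}]

print(run_query(flaky, "MATCH (n) RETURN n LIMIT 1"))  # [{'n': 1}]
```

The point is the shape: the caller says "run this Cypher, give me results," and the bookkeeping lives in one place.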
And there's a Neo4j 5.6 release, which has the latest of everything in it: the latest APOC, the latest tools, a bunch of other stuff. And of all the features that are in there, I'll actually call out one particular thing. It's been a slow-burn trend throughout Neo4j, that I think is worth appreciating. There's a new feature for being able to show settings through Cypher. Previously you might have had to call, I think it was dbms.listConfig or something. It was a bit of an odd thing you had to actually use for figuring out how this particular database is set up. Now it's actually just a formal command: SHOW SETTINGS, alongside SHOW DATABASES and other things you can see. There's this nice trend of actually recognizing the objects that the database manages. You can show those objects, you can then manipulate those objects. The mental model is getting cleaned up a little bit and becoming more sophisticated, a little less feeling like it's file based, a little bit more feeling like these are logical things you're working with.
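For example, where you once called a procedure, in Neo4j 5 you can now write plain Cypher along these lines (the `CONTAINS 'memory'` filter is just an illustration):

```cypher
// Old style:
// CALL dbms.listConfig() YIELD name, value

// New style in Neo4j 5:
SHOW SETTINGS YIELD name, value
WHERE name CONTAINS 'memory'
RETURN name, value;
```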
It's a subtle shift that's happening. It takes a lot of time to do it all, but I think it's going to be pretty nice when it all comes together. Nice improvements every day, every release, so that's really great. So, those are the product updates I've got for you this time around. Go check those out, and call your local PM and schedule a survey to talk about these new features, and give them some feedback about what you like, and what maybe could use a little more work. That's all for me. Hey Jason, what do you got going on?
Hey, thanks for those updates, ABK. Yeah, so this month, by the time you're listening to this podcast, we would've finished three out of four workshop series. So, the first one was a few weeks ago, Introduction to Neo4j with Alexander Erdl and Michael Hunger. And then just last week was Intermediate Cypher, with Alex again and this time Adam Cowley. And in the Intermediate Cypher workshop they covered data modeling and how to import data into a Neo4j instance. They also covered advanced Cypher functionality and APOC, that Awesome Procedures On Cypher package for extra commands. Yeah, they covered all that stuff last week. And then this week was the building a routing application workshop with our very own Will. And in this workshop, what was covered was geospatial data in Neo4j, including the spatial data types, and then also how to do spatial searches and routing. Was there anything else that was covered, Will?
Yeah, that about covers it. We built a web map, simple web application to search for addresses, points of interest, and view and compute routes between them. One thing that I think is neat that we did in this workshop is, we built functionality for auto complete. So, in a text box when you're searching for an address or point of interest, being able to have the dropdown with a bunch of options for auto complete as you're typing, something that a lot of sites have and use. So, we learned how to build that using data from Neo4j. So searching Neo4j for addresses that match what you've typed so far, and we saw how to use a full text index to be able to power that in Neo4j. That's a neat feature to know how to do, because that is very useful for building web apps in general.
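If you want to try the autocomplete pattern Will describes, the shape of it in Cypher looks roughly like this. The label, property, and index names here are illustrative, not necessarily the exact ones from the workshop:

```cypher
// One-time setup: a full-text index over the address text
CREATE FULLTEXT INDEX addressSearch IF NOT EXISTS
FOR (a:Address) ON EACH [a.full_address];

// On each keystroke: match what the user has typed so far
CALL db.index.fulltext.queryNodes('addressSearch', 'Main St*')
YIELD node, score
RETURN node.full_address AS suggestion, score
ORDER BY score DESC
LIMIT 10;
```

The trailing `*` turns the last term into a prefix match, which is what makes it feel like autocomplete.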
No, totally. That's a great feature to have. Very cool. Awesome. So yeah, everyone listening, we'll post up links so you can catch up with those workshops. The final workshop, which is the fourth one, will be on Spring Data Neo4j, and this one will be taught by Gerrit Meier, who's one of our software engineers. And he will show how to start developing an app using Spring Data Neo4j. And he'll be covering three abstractions that are part of Spring Data Neo4j: the Neo4j client, templates, and repositories. So if you're interested in that, you can see the link that we'll post, and you can just register and take part in that workshop.
So, some other news: the FOSDEM '23 recordings are available, and of interest from FOSDEM is, they had a whole dev room dedicated to graph systems and algorithms. And of those, I'm choosing a top three that I would recommend. So, the first one was the LDBC, which is the Linked Data Benchmark Council. They showed some of their open source tooling for benchmarking graph databases, with datasets ranging from one gigabyte to 30 terabytes. And the way their process works is, they have a bunch of certified auditors.
So, a company can submit their benchmarks, and what these certified auditors will do is basically repeat the benchmark testing using their tool sets and workflows, and certify that these databases can perform as fast as is claimed. So anyways, yeah, that's a good talk to check out. The other two I'd recommend are TEDetective, where TED is capitalized. So, when I first looked at it, I thought it was like, oh, it's a TED Talk as a graph database. It actually stands for Tenders Electronic Daily. That's what the TED part stands for. So, what they're doing here is, it's an open source solution for exploring public procurement data, basically for non-experts. So, in the European Union, a lot of these public procurement programs are public. So, what they've done is, they put this into a system where you can explore and peruse the data.
And the last talk that I'd recommend checking out is the ipysigma talk. So, ipysigma is a Jupyter widget for doing visual network analysis in a Jupyter Notebook. So, that's a cool plugin. If you're using Jupyter notebooks, definitely check that out. Next on things that happened this month, there was Pi Day, which is supposed to be for the mathematical constant pi, but every time Pi Day comes up, I want to do something Raspberry Pi related, because Pi and Pi. So, this year I went and created some instructions just for downloading Neo4j 5 onto a Raspberry Pi, if you want to do a project like that.
And I started working on a future Halloween project that I want powered by Raspberry Pi. So, this was very timely for me to get up and running. But anyways, getting it running on a Pi is very simple, so long as you have Java installed. If you don't, you just install it, then you can download the Neo4j 5 package, and then you're basically good to go.
Now, the only thing is really deciding which package you're going to go with. Are you going to go with Community or Enterprise? And then, if you go with Enterprise, you've got two licenses to consider. You've got the commercial license and the evaluation license. If you're just playing around testing, especially if you're starting off on a Raspberry Pi, you can download the Enterprise edition and just choose the evaluation license to get started. Now, you're probably wondering what the difference is between the Community edition and the Enterprise edition. So, the Enterprise edition has a number of features that the Community edition doesn't. For example, the Enterprise edition has access control, and the ability to do backups and metrics. So, it has these extra features that you may want to consider, in which case you'd want to go with the Enterprise edition.
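On the auto-start point: the usual approach on a Linux box like a Pi is a small systemd unit. A hedged sketch — install paths and the service user will vary with how and where you unpacked Neo4j:

```ini
# /etc/systemd/system/neo4j.service -- illustrative paths
[Unit]
Description=Neo4j graph database
After=network.target

[Service]
Type=simple
User=neo4j
ExecStart=/opt/neo4j/bin/neo4j console
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then something like `sudo systemctl daemon-reload && sudo systemctl enable --now neo4j` should bring it up now and on every boot.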
So anyways, getting Neo4j onto a Raspberry Pi is pretty straightforward. The only time it gets tricky is if you want the Pi to auto-start Neo4j every time it restarts; you just have to configure systemctl to do that. And that's a few steps. I'll post a link to those instructions in the description.

And the last thing I want to talk about in terms of recent news is just a little bit about GPT, since that continues to make a lot of news. Tomaz Bratanic... Oh man, sorry Tomaz, I always butcher your last name. So, he wrote a really good article, "Knowledge Graph-Based Chatbot with GPT-3 and Neo4j". And in this article he basically demonstrates how to teach GPT-3 to construct Cypher queries against your Neo4j instance. And he does open his blog talking about using ChatGPT to construct data to import into a graph database. But GPT, especially version 3, has, how shall we say, a hallucination issue. So, if it doesn't really know the data, it'll just give you some answer. So, this can be a problem depending on what you're working on.
So, he ended up demonstrating using a custom NLP pipeline that he had worked on previously, and then uses that to feed data into a graph database. And then, trains a prompt to create the Cypher queries to extract the data in that instance. And then, he uses a Streamlit chat for the UI, and he put it into a Docker container. So, it's a great article if you're looking to use something like ChatGPT to help you take care of that query layer. So, if you're not yet a Cypher expert, but you would like some assistance... You know what information you want from your graph database, and here's an alternative to get a system that will help you build those queries. So, those are all the updates for me.
So, every week there are at least two, sometimes more, live streams on the Neo4j livestream channel. Usually run by our friend and colleague, Alex, usually interviewing a guest, talking about their project, and sharing some things. And this month my favorite livestream was with Alex and Friedrich Lindenberg, talking about the OpenSanctions project. I met Friedrich a few years ago in Berlin, I think, for the first time when he was working at OCCRP, which is the Organized Crime and Corruption Reporting Project. So, a group of investigative journalists. And at the time he was working on a project called Aleph, which is a tool for sharing and managing data for investigations, combining data sets from documents, like an investigative dashboard. And as part of this project with Aleph, Friedrich developed this Follow the Money data model, which is a schema for data journalism investigations, that deals with how to look at assets, money, people, how they're connected in the context of a data investigation.
And Friedrich's taken this idea from the Follow the Money data model, and applied it to his latest project, which is OpenSanctions. And OpenSanctions is really all about combining data from different sanctions lists. So, sanctions lists name people and companies that have been sanctioned by some government, because they're doing something that the government typically doesn't like, and doesn't want to help promote. But it's not just the people; it's also their relationships to companies, investment holdings, other individuals. And so, the goal of OpenSanctions is to combine all of this data from multiple sources into a single graph.
And this is interesting timing. There's a lot more interest around sanctions since Russia's invasion of Ukraine, which has increased the number of sanctioned individuals significantly. So, data and API access are available currently through this project at opensanctions.org. And Friedrich also talked about a recent collaboration with another project called OpenOwnership. So, OpenOwnership is combining corporate registry information from lots of different countries. So, in many countries the information about who owns what companies is public information. And so, OpenOwnership takes that data and combines it. And so, this new collaboration that Friedrich was talking about is between OpenSanctions, OpenOwnership, and Linkurious, which is a graph data visualization tool. If you've been a longtime Neo4j user, you're probably familiar with Linkurious.
So, Friedrich showed some really interesting examples of what you can do with this data that are really powerful for data investigations. Typically, investigative journalists. So, things like: show me all the people that are sanctioned and own a company in a certain country. It's really difficult today for journalists to answer these kinds of questions, but Friedrich showed how that's possible with all these tools together. So, that was a really neat one. Definitely recommend watching this recording. We will link that in the show notes. And I think Alison is going to tell us about another livestream that happened this month as well.
Yes, our friend Max Anderson, who's here at Neo4j, is starting another livestream series associated with GraphStuff.FM, and it's called the Innovation Vertex. And what Max does is, he goes out into industry and interviews folks who are really taking graph and using it to drive some innovation, and moving a project forward in a really innovative way. The first episode that just came out features Philip Asblum, who's from Digital Tvilling out of Sweden. I'm not sure folks are familiar with the concept of digital twinning, but in digital twinning what you do is, you take some sort of system that exists in the real world, and you create a representation of it so that you can do additional analysis and have a different understanding. So for example, one of the things his company worked on recently was a digital twin for the transportation administration in Sweden, so that there are new ways they can actually test prototypes.
So the idea is that, when you have this digital representation, you can actually do different simulations. So, what they did was, they went through and they were able to develop and test prototype solutions with the aim of streamlining and optimizing traffic information across organizational boundaries. So, really what it allows them to do is, in organizations and complex systems where you have lots of subdivisions and things are more siloed, it allows you to create this representation so that you can test across the entire system. So, that's what Philip has been up to at Digital Tvilling, and I'm super excited about what Max and... I've heard a few of the people that are coming up. So, definitely keep an eye on that, and he'll be bringing you all the latest in innovation that's happening in the graph space.
Now, it's time for our listener audio question. So, just a reminder that anyone can submit an audio question to be featured on a future episode of the podcast, just go to graphstuff.fm, and look for the submit a question button, and you can record your question and send it right to us. This question for this episode comes from a very special member of the graph community. Let's give it a listen.
Hey guys, my name is Emil, and I have a question for the podcast. One of the most magical things about the property graph model, at least to me is just how broadly applicable it is. And out in the real world, we've seen it used in fraud detection in banking, we've seen it used for recommendations in retail and digital twin in manufacturing and the use cases just go on and on. And we've seen it being deployed to chase down Russian oligarchs. We've seen it used to be a part of finding the cure for cancer and the mission to Mars. So my question for the podcast is, out of all these use cases that we see out there, just personally for you, what are the use cases that are the most mind-blowing to you?
Great. So, for those who don't know, that was Emil Eifrem, who is the co-founder and CEO of Neo4j, and I bet Emil has definitely seen a lot of mind-blowing graph database use cases over the years for sure. Yeah, that's a great question. I love it. So for me personally, I started using Neo4j, I know, probably about 10 years ago or so, first at a startup I was working at, and then for the last several years working at Neo4j. On the developer relations team, my job is to help developers build applications with Neo4j. And so as part of that, I get exposed to a lot of really amazing things that people are building. And Emil talked about a few of the more known public projects, but for every one of those there are hundreds of equally super cool and amazing things that we see people working on that we can't talk about.
I'm always drawn to these projects that are representing the physical world as a graph, like Alison was just talking about digital twins. And I think I'm drawn to that, because it reminds us that graphs are everywhere. This is an inherent data structure that describes connections, and we see those in the physical world. This is not just some human construct. I think one specific mind-blowing representation of this came from Will Reynolds, in a NODES presentation. So, NODES is the Neo4j Online Developer Expo and Summit. So, I think this was in 2020, maybe 2021. We'll link the recording in the show notes, but Will Reynolds was talking about the building graph. So, Will is a principal software engineer at Hoare Lea, which is the UK's biggest MEP consultancy. MEP stands for mechanical, electrical and plumbing systems. So, think of the large scale infrastructure inside very large buildings.
And what Will talked about surprised me. What he was saying is essentially that they used lots of different software systems to design and manage these different electrical and HVAC ducts, plumbing systems, these things. And it's really difficult to move data back and forth between these different systems. So, these are the Autodesk AutoCAD layout kind of things and other tools that they're using. So, they end up moving to these intermediate formats, like Excel files, to move data back and forth, and they were wasting a lot of time doing this. And so, Will worked on this project that he called the Building Graph, which is really this cloud distributed single source of truth for all of their building data on a single project that works with all the different systems that their engineers and designers are using. So, this covers electrical systems, plumbing, IoT, sensors, these sorts of things.
They wanted to bring that all into Neo4j in a single graph. And they used the Neo4j GraphQL library, and specifically the GraphQL schema to drive a lot of that, which is great. That makes a lot of sense. This use case was mind-blowing for me though, for two reasons really. One is that I think this did a great job of showing the power of graphs and Neo4j to represent real relationships in the real world, like the ducting in a building. That is a graph. That was super neat, but it also highlighted this ecosystem of technology tooling that exists around Neo4j, that I think makes Neo4j so much more powerful. So, he talks about using the Neo4j GraphQL library to automatically build GraphQL APIs, the Neo4j Kafka connector to pick up changes from these different systems and bring them into Neo4j, as well as using tools like Node-RED and Dynamo, which are these kind of low-code application development tools, visual programming languages.
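The real Building Graph schema isn't public, but to give a flavor of driving a model like this from the Neo4j GraphQL library, hypothetical type definitions might look like the following. Every name here is invented for illustration; the `@relationship` directive is how the library maps GraphQL fields onto Neo4j relationships:

```graphql
type Duct {
  id: ID!
  system: String   # e.g. "HVAC" or "electrical"
  connectedTo: [Duct!]! @relationship(type: "CONNECTED_TO", direction: OUT)
  serves: [Room!]! @relationship(type: "SERVES", direction: OUT)
}

type Room {
  number: String!
  servedBy: [Duct!]! @relationship(type: "SERVES", direction: IN)
}
```

From type definitions like these, the library generates a full GraphQL API backed by the graph, which is the "schema drives everything" idea Will described.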
So, that was really neat to see, just all these things working together. But then, the other mind-blowing aspect comes in when Will demonstrates this augmented reality building view that overlays the graph representation of, let's say the electrical system in the building, overlays that over the view of the interior of the building as you're exploring in 3D. And you can click the graph representation. So, click on say a relationship that represents some piece of the electrical system. And in your augmented reality view, all of those systems are highlighted where they go throughout the building. So, you can see that in real time in this augmented reality view, which is really pulling together this idea of graphs in the physical world, and that really graphs are everywhere. This for me, I think was a really cool mind-blowing use case. Jason, what's your favorite mind-blowing graph database use case?
My favorite use case is probably just knowledge graphs in general. So, using knowledge graphs to find hidden or really interesting data, that would otherwise be hard to find. I haven't been working with Neo4j as long as Will has. I got introduced probably about four years ago, and it was through the ICIJ, through the International Consortium of Investigative Journalists. Their work on the Panama Papers, using a graph database to power their investigation. I like to think of graphs as... Or knowledge graphs in particular, as a giant digital crime board. It can link very disparate points of data and coalesce it in a way that you wouldn't see when you look at information in silos.
So, after the Panama Papers, after I learned about Neo4j, I went on a year after that to work on a project. It ended up not seeing the light of day, but what we did was, we had built this weighted multi-skill search engine. So, the idea was you would take your existing team of developers, and you would get an aggregate score of what the team's abilities were. Are they expert Python developers? How good are they with Kubernetes? And then, show where in this graph the team was missing some skill sets, so that when you went to bring on another developer, you would have this radar view, and you would see: does this person's puzzle-piece-looking set of skills overlap or help fill in the gaps in what the existing team has?
And this was a difficult thing to do if you were to do it just in a relational database. Because what we were doing was, we were allowing people to search not only by the sets of skills that they wanted or what they were looking for, but in a weighted fashion. So, are they beginner, intermediate, advanced? And then, when you're searching between ranges of these skills, how do you aggregate that data? How do you determine which candidates should be surfaced up to the top? Ultimately, we ended up using the harmonic centrality graph data science algorithm, which has a great name, I think. It's a variation of the closeness centrality algorithm, which is used for finding how close a particular node is to other nodes. So, the answer is going to be a node that is closest to the set of parameters that you have put into this.
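To make harmonic centrality concrete: it scores each node by summing the reciprocals of its shortest-path distances to every other node, so an unreachable node simply contributes zero. Here's a small self-contained Python sketch on a toy graph (not the skills data; Neo4j's GDS library provides this as a built-in algorithm):

```python
from collections import deque

def harmonic_centrality(adj):
    """Harmonic centrality for an undirected graph given as an
    adjacency dict. Unreachable pairs contribute 0 (i.e. 1/infinity),
    which is why disconnected components are handled gracefully."""
    scores = {}
    for source in adj:
        # BFS for shortest-path distances from source
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        scores[source] = sum(1 / d for n, d in dist.items() if n != source)
    return scores

# A-B-C form a path; D is a disconnected node
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
print(harmonic_centrality(graph))
# {'A': 1.5, 'B': 2.0, 'C': 1.5, 'D': 0}
```

Note that D just scores 0 instead of breaking the computation, which is exactly the disconnected-graph point discussed next.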
So anyway, harmonic centrality is a version of closeness centrality that is capable of taking into account graph data that is disconnected. So, closeness centrality requires... You can tell me if I'm wrong here, Alison. My understanding is, the closeness centrality algorithm requires that the graph be all connected. It won't take into account disconnected subgraph components.
That is correct.
Awesome. All right. Even just a straight-up Python developer with minimal graph data science skills, now you just read the descriptions and you can find an algo that will work for you. So, if you're interested in something like what we were building, David Meza, who is the head of analytics over at NASA, and has been using Neo4j for quite a while, gave a keynote at the Knowledge Graph Conference last year. And in that keynote he brings up quite a few things, but one of the things he's worked on is using knowledge graphs to find hidden skills within NASA itself. So, his database was way more comprehensive than the one I was working on. He combined job information, which occupations different NASA employees had, their training, which missions they had worked on, and other internal employee data sets. And put it all into this comprehensive knowledge graph.
And after it was available, he was able to give answers to different team members. And these questions that he was able to answer were things like, what is the next career move that an employee should consider, based on their existing skill sets? What is just beyond their current capabilities, what do they have to learn to make it to the next step? And what are those occupations that are available to that person?
He was also able to find related skills between [inaudible 00:31:14] departments. So, a team was going out to a recruitment fair, and they wanted to know basically what skills they should look for when they were looking to fill a particular role on the team, what skills were the highest priority and at what level, what that order was. And another thing he was also able to answer was, what is the diversity in the current workforce? Because they had general demographic info, and so they could split the entire employee data set into different groups, and find out how well they were doing on coverage. So yeah, knowledge graphs. And Alison, I think you have a great mind-blowing example. What was yours?
I do have a mind-blowing example. But before I get to mine, I just have to say, what you brought up is something that's always been super interesting to me, which is how we understand teams as graphs. And there's some really interesting research that was done by E.R. Rowan, who was at MIT, about understanding where the knowledge needs to be within the team, and how to actually balance your team based on that knowledge. So, bringing up one of my favorite researchers, nerd alert. But my personal mind-blowing use of graph is a company called Basecamp Research. And these folks, they're just fascinating to me.
So, they have the world's most comprehensive knowledge graph of nature. Literally nature. So, in their graph, they have hundreds of geological, environmental, and chemical tags; they have environmental context. And where their strength lies is around protein research. And right now, most protein design is limited by the public data sets that are available.
And so, what they've done is, they go out into the far reaches of the world. They have folks that do deep sea expeditions. They go into all different kinds of territories around the world to sample biodiversity. So, 90% of the proteins in their graph are actually entirely new. Their database right now is larger than the largest publicly available database, and is significantly less redundant, and they have increased the number of proteins known to science by 50%. So, when you think of all the proteins that were known to science, to think that they were able to move that far forward in what they're able to understand about planet earth, it's truly mind-blowing to me. If you're interested, I definitely suggest that you look in on Basecamp Research, because they really are in this mind-blowing space of understanding what is actually occurring with life on earth and the proteins on the planet. Totally mind-blowing. So, that's certainly mine. I think, Will, do you have a favorite tool this month?
Yeah. Let's move on to talking about our favorite tools of the month. Some things that have brought us joy to use this month, related to Neo4j or not. Mine happens to be related to Neo4j. And this is specifically the new coordinate layout feature in Neo4j Bloom. So, we've talked about Bloom as this graph data visualization tool that allows us to visualize and explore graph data in Neo4j. Typically, when you're working with graph visualizations, you're working with the default force-directed layout, where clusters form based on the physics simulation that runs. Bloom also supports a hierarchical layout, if you have hierarchical data. But if your nodes have location data associated with them, either a point with longitude and latitude or just XY coordinates, you can choose to lay them out using that location information from the node properties.
So, I had a lot of fun with this in the last few weeks. I was looking at things like airplane routes. But the most interesting use I found for this new coordinate layout feature is working with OpenStreetMap data. When you combine the coordinate layout with the graph data science integration in Bloom, that's when things start to get really interesting, because you can not only lay out all of the, say, intersections and road segments of a city, but then run graph algorithms on the road network to find things like the most important intersections. You can find neighborhoods with community detection algorithms, and style and color them differently in Bloom.
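For anyone who wants to try this, the road-network analysis Will describes might look roughly like the following Cypher sketch using GDS. The graph name, the `:Intersection` label, and the `:CONNECTS` relationship type are all illustrative, not from the episode:

```cypher
// Project the road network into an in-memory GDS graph
// (label and relationship type names are assumptions)
CALL gds.graph.project('roads', 'Intersection', 'CONNECTS');

// "Most important intersections": betweenness centrality,
// streamed and sorted so the top hubs come first
CALL gds.betweenness.stream('roads')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS intersection, score
ORDER BY score DESC LIMIT 10;

// "Neighborhoods": Louvain community detection, written back
// as a node property so Bloom can color nodes by community
CALL gds.louvain.write('roads', {writeProperty: 'communityId'});
```

With `communityId` written back, a Bloom rule-based style can assign each community its own color on top of the coordinate layout.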
So, combining those two features, the coordinate layout feature in Bloom, with the graph data science for visually styling your visualizations using graph algorithms, definitely my favorite tool of the month. I've been tweeting out a lot of these visualizations of different city road networks, just because I thought they were interesting. So, I'll include a few of those in the show notes. Over to Jason. Jason, what's your favorite tool of the month?
Thank you, Will. Speaking of that coordinate system, I'm looking forward to playing around with it. As part of a project for an upcoming sci-fi celebration day, I am moving data from a particular sci-fi series' galaxy map into a Neo4j instance, and being able to have that coordinate system mimic the shape of that galaxy is... I'm really looking forward to seeing how that all plays out.
But yeah, my favorite tool of the month is actually GraphGPT, which was developed by Varun Shenoy, a Stanford student. Just a few days ago, Alison actually interviewed Varun about this app he developed, so Alison, you might have some more insights on this project. GraphGPT is a simple app where you take a prompt, a story or something you would like to see as a graph, put it in, and it takes that narrative text and breaks it down into a graph.
So, say you were looking at the history of the Battle of Thermopylae. Given the summary, it'll go, "Oh, King Leonidas led the Greeks," well, the Spartans specifically, and they fought it. So, it creates this graph that you can visually look at and see the interrelationships between different components. It's very impressive. It's built using [inaudible 00:37:45] for the UI, and it's very simple. Probably the key core component of this app is the prompt smithing he did to get GPT to consistently answer in triples that could be imported into the graph visualization library he used.
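Once the model returns triples, loading them into a graph is a short step. As a minimal Cypher sketch, assuming the triples arrive as a parameter shaped like `[["King Leonidas", "led", "Greeks"], ...]` (the `$triples` parameter, the `:Entity` label, and the `:RELATES_TO` type are illustrative):

```cypher
// $triples is assumed to be a list of [subject, predicate, object] lists
UNWIND $triples AS t
MERGE (s:Entity {name: t[0]})
MERGE (o:Entity {name: t[2]})
MERGE (s)-[r:RELATES_TO {label: t[1]}]->(o)
RETURN s.name AS subject, r.label AS predicate, o.name AS object;
```

Using `MERGE` rather than `CREATE` means repeated mentions of the same entity collapse into a single node, which is what makes the resulting picture readable.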
I've been playing around with it a lot, because what I've basically done is taken his work, and I'm now adding it to a mock graph data generator that I've been working on, on and off, for the last few months. The idea is to take basically what he did with GraphGPT and allow someone to put in their particular use case in narrative form and get back a graph representation of it that is importable into the Arrows app, so that someone could take that graph output, immediately start massaging it and playing around with it, and then use it to create data to import through the data importer. So, that's my favorite tool of the month, this month. Alison, what was your favorite tool?
If you hadn't taken my favorite tool... No, I'm kidding. Can I comment just a little bit more on what Varun has built? One of the use cases of it that I find really interesting for myself is, my youngest son has autism and sometimes when he's reading he gets lost with who all the characters are and how it all fits together. And so, what we've done a couple of times is, he'll be reading something online and he'll be struggling with it, and we'll take the text, we'll drop it into Varun's GraphGPT, and it will create an image of how the things are connected, and that makes sense for him in his brain. And so, what I love about what he did is, there's so many different applications, but anytime that there's a way that we can empower neurodiverse children, I'm always going to celebrate that. I'm going to do a plus one on your tool of the month.
My tool of the month is not nearly as exciting, but it's around memory estimations in GDS, graph data science. I was recently answering a question about one of the node embedding algorithms, called FastRP, and the community user was trying to figure out how they could manage the memory. One of the things available on the GDS procedures, in this case gds.fastRP.stream, is an estimate method. What you can do is run that first, and it will give you an understanding of how much memory will be required to run the actual algorithm. I know we're always trying to manage memory, especially if you're working on, say, a desktop server. So, just know that memory estimations are out there and available. ABK, how about you? Do you have a favorite tool of the month?
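For reference, the estimate call Alison mentions looks roughly like this; the projected graph name 'myGraph' and the configuration values are illustrative:

```cypher
// Estimate memory for FastRP in stream mode before running it
// ('myGraph' and embeddingDimension are example values)
CALL gds.fastRP.stream.estimate('myGraph', {embeddingDimension: 128})
YIELD requiredMemory, nodeCount, relationshipCount
RETURN requiredMemory, nodeCount, relationshipCount;
```

The same `.estimate` pattern works across GDS algorithm procedures, so you can check the memory footprint of a run before committing to it.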
I do. Thank you, Alison. My favorite tool of the month is one that I was reminded about recently, and I wanted to remind everyone about it. I think it is quietly awesome and incredible, and it's called Fabric. Fabric in Neo4j really means, in practical terms, that if you have a cluster deployed with multiple databases running on it, you can take a couple of those databases and create a composite view that lets you see them as one thing. That's the practical description of what it is. But what it really ends up meaning is that this is our first real step towards having graphs themselves as part of modeling. We can have architectures that say: actually, I've got a graph over here that could be plumbing, and this graph over here could be electrical. And you know what? I want to have a building view, a composite view that takes those two things together.
Or you could have a graph that is North America, and a separate graph that is Europe. They live on their own most of the time, but you can also have a composite view that sees both. This is an incredible modeling opportunity that is quietly awesome and underappreciated. It is a practical solution right now for enterprises that have huge amounts of data. You can do this modeling within a single database using labels; we have conventions for doing that today. And I'm sure the building-graph example probably does all that within one graph, because Neo4j can handle that scale, and you can do it with labels pretty darn well. But Fabric formalizes the observation that sometimes we have multiple separate graphs that themselves have connections, and that we can have views that are bigger than the individual graphs themselves.
So, I'm super excited about Fabric, because I'm super excited about where it's heading. It's really just the first step towards having composable graphs in the future. And I think that, for me, is going to be a mind-blowing use case that isn't about a particular domain; the capabilities of graphs are going to get pretty amazing in the future. So, that's my tool of the month that I've been reminded of, and you should all go check out Fabric too.
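To make the North America/Europe example concrete, a query through a Fabric composite database might look like this sketch; the Fabric database name `regions`, the constituent graph names, and the `:Customer` label are all assumptions for illustration:

```cypher
// One query spanning two separate graphs via a Fabric database,
// combining results from each constituent graph with UNION
USE regions.northamerica
MATCH (c:Customer) RETURN c.name AS customer
  UNION
USE regions.europe
MATCH (c:Customer) RETURN c.name AS customer;
```

Each `USE` clause routes its part of the query to one constituent graph, while the client sees a single combined result, which is the "composite view" ABK describes.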
Thank you, ABK. Yeah, that's a great tool that definitely doesn't get enough attention. So, for upcoming events coming down the line: the next training workshop is a training session on Spring Data. Also coming up in mid-April, we've got quite a few things happening. There's Global Graph Celebration Day, which our hackathon is, of course, also connected to. Also in mid-April is the Graph Summit in Munich. And we have the CFP, a Call for Papers, for NODES, happening later this year; it will also be opening up in mid-April, so keep an eye out for that. And then in early May, if you're interested in the Knowledge Graph Conference, the one connected to David Meza, the head of analytics at NASA, the 2023 edition will be held in early May in New York. We'll post a link for that as well.
For closing items: we are still looking for a technical curriculum developer, someone to join our team and help develop future GraphAcademy courses. And if you have any questions for us, we would love for you to submit audio questions through the link in the notes. Will, I have a question here. Every month we talk about our favorite tools. If somebody wants to let us know what their favorite tools are, should they use that same audio link? Is that the best way to let us know?
Sure. Ask us a question. Let us know your favorite tool. Maybe we can splice together some audio of all the favorite tools that have been submitted in the month, but sure. Yeah, that seems like a good forum for that.
Cool. Awesome. Yeah, so if you've got a favorite graph related tool that we didn't mention, send it our way. We'd love to know what you're using. All right. I think that's it. Thank you everyone for listening, and we will catch you next time.