GraphStuff.FM: The Neo4j Graph Database Developer Podcast

NODES 2023: Behind the Scenes, Highlights, and More!

Episode Summary

Neo4j just wrapped up the fabulous NODES event last week, and since we at Neo4j are still running on some adrenaline (and maybe caffeine!), we’d like to recap our thoughts, favorite things, and all the technical goodies from the virtual conference.

Episode Notes

Episode Transcription

Jennifer Reif:
Welcome back graph enthusiasts to another exciting episode of GraphStuff.FM. I am your host, Jennifer Reif, and I'm joined today in the studio by fellow developer advocates, Andreas Kollegger and Jason Koo.

Jason Koo:
Hello.

ABK:
Hello, everyone.

Jennifer Reif:
So Neo4j just wrapped up the fabulous NODES 2023 event last week, and since we at Neo4j are still running on some adrenaline, and maybe just some caffeine, we'd like to recap our thoughts, our favorite things and all the technical goodies from the virtual conference. Let's dive right in. Does anybody want to start?

ABK:
Yeah, so I think the first obvious thing about NODES is just the absolutely epic kind of broadcast of it. It's 20 hours, from multiple time zones, it just kept going and it was like every hour, there's great material, incredible depth of knowledge of all the speakers who were there. The takeaway still was like, "Oh my God." Just the people in the community, all of the speakers, it was just such an amazing, overwhelming to some degree event, and I know that I was trying to get through some of it and I couldn't. I'm sure we should probably, maybe next year, actually have some kind of challenge like who can actually watch the entire thing end to end. I have to confess that I didn't, I had to take breaks, I had to take naps to get through it, but then it's like, okay, now there's things in my head and I'm trying to write down things to go back to on the playlist once the recording's available. Did you guys have the same kind of experience, like, "Oh my gosh"? Awesome, kind of overwhelming, right?

Jennifer Reif:
Well, and I'd be curious too, how many attendees actually watched across time zone segments, like those in the Americas track? Did they also watch the EMEA track or some of the APAC track? And vice versa, how much overlap did you have between tracks? Or did people pretty much stay within their regional time zone segment? I know I bridged a few different gaps. I logged on for the first part of the APAC track late Wednesday night, and then I actually was speaking at the end of the EMEA track, and so I was around for part of that, had my session and then stuck around for some of the last couple. And then I was online through all of the Americas track. So I would be curious how many attendees also had that, if they bridged the segments or not.

But it was a lot of content and for those that maybe didn't pay attention, the end of the APAC track, you had a short break and then you immediately started the next EMEA region segment, and then at the very end of the EMEA segment you had a few minute break and then you immediately started the Americas segment. So it was nonstop, back to back to back.

Jason Koo:
Yeah, it was a lot of great material. Even though we had put in breaks, it still felt like constant, back to back, great sessions, great information and... For a lot of the sessions. So I was the same as Jen, I caught the beginning part of when we opened up in APAC and then switched over to early morning America. And the whole time I felt like, "Oh, you know what? I could sit listening to this speaker for much longer than the time slots." Yes, so much great information. Did anyone have, well, I know we all have favorite sessions and there were a lot of great sessions, but was there one or two sessions that really stuck out to either one of you?

ABK:
Well, of course there was the fantastic Q&A sessions with Emil. We'll plug those, of course. Those were fun. But for the material, and of course we touched on this with Emil and it was the subtext of almost everybody's talk, was, of course the world is all about GenAI and large language models and all that, and there was so much about that in each of the sessions. But then the particular session that I loved, of course, because it's taken up a lot of my internal focus actually I would say, is this whole thing that we had done with Docker, and LangChain, and Ollama that we announced at DockerCon... What is it? It feels like just last week, but it's a couple of weeks ago already, right?

The GenAI Stack that was released and there was that session with Harrison Chase and Oskar Hane from Neo4j talking through, okay, let's review what is this Docker, GenAI Stack thing? How do you actually get into it? And even though I've heard that material before, every time I listen to those guys, I get a little more nugget of information about how they're thinking about it, how it all formed together and why it is the way it is. And I love the takeaway of course, which is everybody's doing so much with GenAI everywhere, we're trying to figure it out. Both technology people, but everyday people. And certainly for programmers, some of the basics of this stuff is easy enough. Okay, we've seen ChatGPT, fine. You've heard this term RAG, okay, intellectually that makes sense, but does that mean you can just pop into a terminal and just open up a new directory and just start creating something?

There's so many things where I know I could do that, I can just initialize it, git init something, maybe start a JavaScript project, maybe a Java project. I know how to do that. Can I just fire up a GenAI project? Not really. I definitely would be doing a lot of Googling for it. And so the GenAI Stack says, okay, let's git clone this thing, docker compose up, and then bang, you've got a running system. And I remember the first time that I did that, because I caught up with a lot of this stuff, I was almost intimidated by the whole large language model part of it. Okay, these things are giant. I feel like, okay, you must need a data center to run these things and you can only tap in through an API.

But then when you spin up this stack, a couple of Docker images spin up and these containers spin up, you're like, "Oh, okay." The model itself, you can pull these things down, different versions of them, you can just have that running on your laptop and get great results and actually get an understanding of how the flow works. And I'm sure I'll keep talking about this. I'll stop myself from talking about it. I'm so excited about, okay, it makes it possible to just dig your hands into it and just start experimenting, which I loved. I loved the stack and I loved having the talk about it because it makes me feel more comfortable like, okay, smart people put this together to help us do stuff, and now I feel like I can do it myself also.
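The local-model flow ABK describes, where a containerized Ollama serves the LLM on your laptop, boils down to a single HTTP call against Ollama's local REST endpoint. A minimal sketch in Python, using only the standard library (the model name is just an example, and the default port assumes a stock Ollama setup):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> request.Request:
    """Build the HTTP POST that a local-LLM completion call boils down to."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
```

With the stack's containers running, `urllib.request.urlopen(...)` on that request would return the model's completion as JSON; frameworks like LangChain wrap exactly this kind of call.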

Jennifer Reif:
It could help provide a nice on-ramp for those who are new to it, something where they don't have to dig into a lot of the weeds and try to figure out how to assemble the pieces. It's already kind of prebuilt, you can start messing around with it and then customize or tweak or adjust or swap things out as you need to after that point.

Jason Koo:
And it's like a kitchen sink demonstrator, because there are five applications inside. So if you want to know how to do any one piece, you just dive into that part of the code and you can take parts of it and cobble it back together or make it your own. Yeah, it's a great single resource to reference, right?

Jennifer Reif:
Yeah.

ABK:
Do you guys feel like from a developer's perspective, it makes it more accessible? Hugging Face is the obvious place for now where a lot of the model stuff is being shared. There's a lot of great information there. As a developer, when I go out to Hugging Face, I'm like, "I don't know what that stuff is." There's so much for me to learn there and there's so many long lists of things, I don't even know, where do I just start? How do I grab stuff? It's such a different experience. It's so far out of my comfort zone from like I can pop over to GitHub and look through some repositories and I can understand what to do with all that stuff. But Hugging Face was like, okay, for machine learning people, totally get it. As a developer, I'm intimidated. Okay, so what were the other sessions? I'm sure GenAI will come back up in our conversation. What else did you guys listen to or watch?

Jason Koo:
I listened to Sudhir's roadmap and product announcement, which is a great summary of everything Neo4j related for the whole year, including the two major product announcements: the parallel runtime and change data capture. So if you haven't been keeping up with all of the product updates and all the features that have been released over the last year, that was a great talk about that. It just summarized everything and gets you back to a level set of, okay, where's Neo4j at right now? What is in version five that makes it outstanding versus the previous version? And what are some extra features that you can look forward to or you might not have thought of? Like, "Oh, Neo4j can do this now." It's like, yes. So I thought it was a great... If you only knew parts of graph or you didn't know Neo4j at all, I recommend watching that session to get right up to like, "Okay, I know now what our capabilities are at this stage."

ABK:
There's an aspect of that that I want to call out. One is, Sudhir isn't just some random dude, he's our chief product officer. So he gave us one of these talks, which is fantastic, to have him involved along with the rest of the community. And the parallel runtime, I guess change data capture also, super impactful, but parallel runtime from the outside sounds like one of those nice, okay, that's cool. You now have a parallel runtime. Shrug. Why do I care? Okay, we know why you care. Because if you're running particular queries that can be parallelized, it's going to be much more efficient, it's going to be awesome and faster in lots of ways. But there's also this dimension where it's simple: okay, you don't change anything in your Cypher queries, they all just magically become faster, which is great. So from the user's perspective, you don't have to do anything to take advantage of it.

And parallel runtime is one of those things... Okay, it's not quite the same as AI, but AI has been chugging along quietly for the last 30 plus years and it's been like, "Yeah, yeah, good luck with that stuff. It seems like a really hard problem. You're never going to get to the end of it." And then bang, it lands in ChatGPT and the world is like, "Oh, apparently that's a thing now. This actually exists." Parallel runtime has been a hard problem to solve well. It's not easy to just do that. That's been at least five-ish years, maybe longer, of engineering that has been building and building towards this thing. So there's been elements of it always with Neo4j, but it's been a long road with a lot of people doing hard work to get to this point where now, "Oh, here's a thing we just did that's going to be invisible." All these engineers spending all this time solving hard problems, for something that for the user ends up being kind of invisible, but just like, "Oh, thanks. Everything's better now."

So shout out to the engineers. Thank you for all the work that's gone into that. We see, we know the work that's gone into that, people who are using it for GenAI, you're going to want to upgrade just to get the parallel runtime. I think it's awesome.

Jennifer Reif:
Yeah. And I did want to make just a quick call out. We were just talking about this before the session started, but our keynote speaker was fabulous. It was awesome to listen to her and even though we don't have recordings of that to distribute, she has a lot of content out there, including I think regular podcasts that she's part of talking about data. So even if you didn't catch the keynote, be sure to check out her stuff. She's a fabulous speaker, really engaging, talks a lot about data and just some of the really cool things that are going on in the world surrounding data and its analysis. So definitely check that out. I want to give a quick shout out to that before I charge on. The other thing that I really enjoyed throughout the day, I hopped back and forth between the tracks, even inside the Americas segment and some of the others as well.

But there were two cybersecurity I guess talks that ran, which I think is relatively new. I haven't seen a whole lot of cybersecurity stuff. We've got a little bit of content here and there popping up, but there were two, and they were both really interesting. One was looking at dark web analysis, so pulling in data from the dark web into Neo4j and doing some really cool things on that for research. I believe it was at a university. Anyway, really interesting session there. Kind of neat to see some of the tools they use as well as things they're trying to do with it and all that. And then the other one was incident response. Again, pouring the data into Neo4j and figuring out how to respond to cybersecurity events. And that was really neat as well, just getting an in-depth look at their use case and what they were doing with it. So those were two, I would say, less common things that popped up and I was really excited to see and really intrigued to hear their stories.

ABK:
That's really fantastic. So just as you say, I don't think I've delved into that aspect of people using Neo4j for those kinds of things before, cybersecurity. And that's part of again, all of the breadth, everyone who's speaking across a broad range, the depth that they got into. I didn't catch those sessions, but it sounds like that's kind of typical. It's like, "Okay, I haven't thought about that before." But other people certainly have, super smart people have, and they're using Neo4j to good effect in cybersecurity. That's awesome.

Jennifer Reif:
Yeah.

Jason Koo:
I don't think I caught any of the cybersecurity sessions, but there was one session I believe by Jim Thornton who used Neo4j to map the website that his company runs. So most of us think of a website as kind of this hierarchical, page-to-page journey, but he was saying, "Actually, it's more like a graph because each page links to so many other pages, but different parts of it link to different parts of other pages. And so when you map it out into this networked web, you get a better sense of exactly what your entire website looks like." And he had gone on and built a tool, I believe, that looked at page rerouting. So when you had a deprecated page and it was forwarded to another. And so he had all these tables and this interconnected web of what old pages had... Where they were funneling, and then using that as part of discovering truly what the user journey was over time and what was driving the most viewership on certain pages. So I thought that was a great way of looking at it all.

Jennifer Reif:
I remember seeing that session on the schedule and thinking, "Oh, that looks so interesting." And that was one I didn't get to listen to. So I guess I'll have to wait for the recordings to come out.

Jason Koo:
Yeah, definitely check that out. And that was definitely one where I was like, "Wait, wait, wait. You could talk for another hour." Because he's going very fast through all the slides and stuff, and it's like, this is highly, highly interesting. And completely different, but there was one session by an instructor out of China and she was talking about how she was using graph data modeling to teach the students, and she had these great images of the students with these giant vinyl sticker nodes and relationships, and they were putting up on the whiteboard, blackboard, so they were physically noodling data modeling before she transitioned them to working on an actual digital system to work out some problems.

ABK:
Oh, cool.

Jason Koo:
So that experience and her learnings were also really interesting to watch.

Jennifer Reif:
That's really cool.

ABK:
Hands-on with the graph, physical nodes, this goes here.

Jason Koo:
Yes.

ABK:
That's awesome.

Jason Koo:
Didn't we use to have fridge magnets or something that...

ABK:
We did. Yeah, one of our developer relations conferences or sessions or I don't remember what it was, but we definitely had a sheet where you could break out of the sheet different arrows and different nodes and then you could assemble your own graph out of it. And each of those things were magnets.

Jennifer Reif:
Oh, that's cool. I didn't know about that. Now I feel like we need to revive that.

Jason Koo:
Yeah, we got to bring that back. I need some of those.

ABK:
Yeah.

Jennifer Reif:
Along that same thread where you're talking about education, there was one, I believe PhD student that... Totally new speaker, and she even said at the very end, she's like, "This is my first time speaking. Thanks for listening." And all that, but she wrote a TypeScript integration for Prisma or is working on it. It's still in progress, and she was just showcasing what she'd done so far. But I thought it was really neat, first of all, that she had developed this basically side project and then secondly, that she was getting out for the first time and speaking about it to such a huge audience.

And what may seem to some be very intimidating, you're getting up in front of possibly thousands of people, you've never spoken before, this is a new project for you, it's not quite 100% completed or finished in her eyes, and to get up there and present and showcase what she had and for people to really encourage her and to thank her for doing this, I think was really neat and just really inspiring and encouraging. That this is why we have events like this, to encourage people and to help them and show them that yes, please do build these things. People need this, and we'd love to see those community projects out there. So I came away from that session, maybe... I'm not a TypeScript person or a Prisma person or anything like that, but walking away from that session going, "Wow, this is really cool." And I'm so excited to see this vibrancy in the community and the inspiration going on. It makes me want to do more.

Jason Koo:
Nice.

ABK:
Totally.

Jason Koo:
It's amazing. Speaking of, I guess community projects, that might be a good segue into our community project of the month. There's a project called AD Miner, which was just put out recently. And so this is in the cybersecurity space, like you had mentioned earlier. And so what AD Miner is... Well, I haven't run it myself, but just looking at the repo and looking at some of the resources on it, it's this great visualization tool for tracking cybersecurity threats. So it stacks on a couple of different things, BloodHound and I think another piece of technology.

But anyways, so not only can you graphically look at some of the threat assessment, but it's also got this card, tabular view of a bunch of charts that shows the status of various systems over time. So you could see if one part of your system was, I don't know, collapsing or becoming riskier than another, or if performance in one area was declining or getting better. So cybersecurity in general is outside of my wheelhouse, but still, if you take a look at the repo, which we'll have a link to, just the screen grabs alone, like, "Oh, this looks very cool." And you can see different pathways of how your systems are interconnected.

Jennifer Reif:
That's cool. Maybe this episode should be renamed to something about cybersecurity.

ABK:
Cybersecurity.

Jennifer Reif:
A lot of that coming out. Yeah, I'll definitely have to check that out.

Jason Koo:
Cool. Yeah, give it a go. I think it was... Let's see, this came out, the latest commit was last month, so it's a brand new project.

ABK:
You naturally distracted me, now I'm just looking through the GitHub repo. I'll get back into the conversation later. I'll be right back. It looks like a nice tool.

Jason Koo:
Yeah, at first I was like, "AD Miner?" I was like, "Is this some sort of Bitcoin thing?" And then afterwards, "Oh, okay. It's in cybersecurity space." Which actually reminds me, I didn't get to catch their session this time, but Chad from Intuit, I believe gave a talk on... So him and Zach, I believe have been working on the security audit tool that they use at Intuit. It's different than AD Miner, but it's basically a configuration based system for tracking all the dependencies across all their microservices, so that whenever there's a known vulnerability that comes up with one of the systems, they're immediately flagged and they can go talk to the implementation team and go like, "Hey, please update that dependency." And there's much more to their stuff, but that's the gist.

Jennifer Reif:
I was going to say, I might've been in that session actually. And now I'm trying to remember. I think I was. Yeah, yeah, yeah. That one was interesting.

Jason Koo:
So many sessions. Cool. Okay. That's community projects. Let's see, product updates. So-

ABK:
Oh, wait, wait. We were going to do the tools of the month as well.

Jason Koo:
Oh, yes. Do we want to do that now? Yeah, let's do that now. Okay. All right. Jen, what's your tool of the month?

Jennifer Reif:
Okay, I'm going with Spring AI. So my session at NODES, we did some analysis, we looked a little bit at Bloom with Neo4j and some other things, and we kind of thought when all this GenAI and GenAI Stack and everything kind of broke loose a little bit ago, like, "Hey, I wonder if we could use Spring AI." Because that was really, really new. I want to say they released it relatively recently. And then I think our Spring Data Neo4j team was working on some things, not too long before NODES, to integrate with that and so on. So we actually pulled in a little bit of Spring AI. It's still very rudimentary. It's in beta phases, so it's not like a solid out of the box support just yet, but we just played around with it and dipped our toes into it.

And some pretty neat things, I definitely want to dive more into the Retrieval Augmented Generation, the RAG sort of things, which is what we did by pulling data from Neo4j and sending it over by prompt stuffing through Spring AI to the LLM, and then getting responses back based on that data. But I want to do a little bit more with the vector side of Neo4j and some other things that are involved there, but yeah, just some really neat things that I'm starting to interact with and get involved with, and it's just really neat to explore. So that's mine for the month. Spring AI.

ABK:
So Jen, Spring AI has been on one of my tabs open. It's like, okay, that's something I got to get back to and take a look at.

Jennifer Reif:
Yeah.

ABK:
So is this the Java version of what... Most of the AI stuff is happening in Python [inaudible 00:23:18], right?

Jennifer Reif:
Right.

ABK:
So this is the Java version of either LangChain or LlamaIndex, is it comparable to that or?

Jennifer Reif:
Yeah, I think so. I think it's supposed to be a drop-in for those Spring users who already have the Spring Boot projects and all that. It's just another dependency you drop in, you get access. They have implementations of OpenAI as well as Azure OpenAI, so both of those are supported, but you get access to that and it's just through their... They've just got a Spring, I guess packaging is the way you might want to frame it. But yeah, it's I think like a drop-in replacement for LangChain or some of the others out there. And all Java based, so you're interacting in a Spring Boot app. It's very familiar for those Spring developers to utilize it that way.

ABK:
Nice.

Jennifer Reif:
And they've got a few vector databases that are already up, Neo4j being one of them that you can use with Spring AI. And there's a couple others, and I think they're supposed to be integrating some others as well, but their documentation is not quite filled in yet, so there's not a whole lot of deep dive or overview segments on all the different topics and stuff. But yeah, they're getting rolling with it, so it's pretty cool.

ABK:
Completely fresh, new, like everything else.

Jennifer Reif:
We're all learning together.

ABK:
Live code commits are happening now, breaking changes on a regular basis. It's kind of fun. Carrying forward, of course, the AI theme, and for me it's still the GenAI Stack and one part of the GenAI Stack that we had. Of course, this is the Docker containers with Ollama for local LLMs if you want to. And then LangChain is where we were just doing a lot of the work for building the app itself, so LangChain being a Python based version of Spring AI... Or no, whichever way you want to say that, whoever came first, okay. They're in that same territory, they're abstracting out the data sources and interfacing to the LLMs. And along with that, what happens is if you've done any of this stuff by hand, you've been working with ChatGPT and this whole idea of prompt engineering and people getting their own expertise, which is super critical.

You have to learn, if you're talking straight to the LLM, sure you can just have a chat with it and build a relationship and become its friend. Sure, you can do that. But if you're getting it to actually do work for you, then it's not always so simple. And so there's these patterns that emerge and you've realized, okay, you've got to set it up like, "Hey, you're an expert on European history." Whatever it might be, you've got to help the LLM along and coach it. And so these interface layers like Spring AI and LlamaIndex and LangChain kind of have best practices. Here's the patterns that emerge from doing that. Here's how you want to do the prompt stuffing as you said it, Jennifer. And so along with that though, okay, so the framework is taking care of some of the... Okay, it'll take care of creating the prompt, it'll stuff in what is necessary for the context, the user input, and construct it all in a way that the LLM knows how to deal with it.
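The prompt construction ABK describes (coach the model with a role, stuff in retrieved context, then append the user question) can be sketched in plain Python. The template wording here is illustrative, not what any particular framework actually emits:

```python
def stuff_prompt(role: str, context_chunks: list[str], question: str) -> str:
    """Assemble a RAG-style prompt: set a role, stuff in retrieved
    context, then append the user's actual question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        f"You are {role}.\n"
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy retrieved chunk; in a real app this comes from a database lookup
prompt = stuff_prompt(
    "an expert on European history",
    ["The Treaty of Westphalia was signed in 1648."],
    "When was the Treaty of Westphalia signed?",
)
```

Frameworks like LangChain and Spring AI do essentially this assembly for you, with templates tuned to each model's expectations.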

Well, if you're just using the APIs, it just calls something with your results. Even if you're doing RAG, at least in LangChain, you just hand it over to the LangChain and LangChain just makes magic happen and then a result pops out. "Okay, what the heck just happened?" Is one of the big challenges. So one of the tools that I like that's still I think in beta, it's not generally available quite yet, is from the LangChain folks. They realized that's an issue. And so they have a tool called LangSmith that they're working on, which is like a debugging tool for doing RAG and other patterns of interacting with the LLM, so that once you've set stuff up, you basically have the ability to trace from the initial like, okay, here's the question. "Hey, I need some embeddings for this, so can you help me with that?"

And then, okay, here's the original user question. We've combined it with a vector lookup and then maybe a graph database query, all of that. There's different things that LangChain has taken care of. And in LangSmith, you can actually watch the flow and you can dig into the details. So it's like... Okay, I was going to say like looking through a stack trace or something, but that's not quite right. But you get to see, when something goes wrong and you're not sure why the answer came out the way it did, you get some visibility into like, okay, here's what was actually sent over to the model. Here's what the model gave you back, and if there's a couple of steps you can follow along there. Because you still need to do some level of debugging, because even though LangChain helps you out with all that stuff, sometimes things aren't quite going the way you want. So you have to understand, well why? And it's all behind this magic curtain right now.
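The per-step visibility ABK describes can be illustrated with a toy trace recorder. Nothing here is LangSmith's actual API; it just shows the idea of logging each stage of a chain so you can inspect what was sent to and returned from the model:

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Toy stand-in for the per-step record a tracing tool keeps."""
    steps: list = field(default_factory=list)

    def record(self, stage: str, payload: str) -> None:
        self.steps.append((stage, payload))

def answer_with_trace(question: str, trace: Trace) -> str:
    """Run a stubbed RAG chain, recording every stage along the way."""
    trace.record("question", question)
    trace.record("vector_lookup", f"top chunks for: {question!r}")
    trace.record("prompt", f"Context + {question}")
    answer = "stubbed model output"  # a real chain would call the LLM here
    trace.record("model_response", answer)
    return answer

trace = Trace()
answer_with_trace("Why did my RAG answer go wrong?", trace)
# trace.steps now holds (stage, payload) pairs for every step of the chain
```

Walking `trace.steps` after a bad answer is the "open the black box" experience: you see the exact prompt and the exact model response instead of just the final output.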

Jennifer Reif:
It's opening that black box a little bit.

ABK:
It is, it is. Totally right. And it's funny, it reminded me a little bit of some earlier days of Spring, sometimes with the annotation-based programming, I'm going to call it. Magic happens here because you've annotated, you sometimes have no idea what the heck just happened, and you have to be really careful about how you're actually orchestrating stuff, even just in code. And so of course it happens with LangChain and large language model interactions as well. So LangSmith from the LangChain folks, my current tool of the month, which I love and which notably has replaced Arrows in my heart.

Jennifer Reif:
For this month.

ABK:
For this month. Arrows might come back next month. I might realize, "Actually, Arrows is the best way to do all of this stuff."

Jennifer Reif:
Open the black box of LLM for you.

Jason Koo:
That's fantastic. And for LangSmith, it's a signup process right now. It's not at the point of general public release yet. So you got to sign up and when you are given a key to the golden garden, then you can start using it.

Cool. Also GenAI related, my favorite tool of the month is Cody AI. So Cody is basically... Similar to GitHub Copilot, their Visual Studio Code interface integration I thought was really good. So I actually switched off my GitHub Copilot temporarily to try out Cody. You could probably run both simultaneously, but I figure one AI system at a time is best. And so it's like Copilot, it'll give you code completion suggestions, but it also gives you inline chat ability. So you could click on a line of code or you could highlight a function and then go, "Hey, Cody, give me a better version of this or check it for code smells." Or what I use Cody a lot for is to generate my unit test for me, because as much as I keep telling myself like, "Okay, I'm going to go back to TDD, I'm going to build my test first and then build my function."

Unfortunately, most developers, 80% of the time, it's like, okay, I'm not even sure what I want this function to do yet, so I'm just going to build it out and then once it's all fleshed out, it's like, okay, this is what I want this function, or these now three functions, to do. Just click on the function or the whole class, the whole file and go like, "Okay, Cody, write me a bunch of unit tests." And the unit tests are pretty good. There's some edge cases that it'll miss but in general, of the different AI tools that I've used, the unit tests that Cody produces are really the closest to, okay, at least you got the name of the test and what the test is doing. Okay, yes, I should totally test for that thing. So I can go in and tweak the actual test, but at least it got all the high level things that I should look for. So if you're not already using some sort of AI assistant while coding, I do recommend Cody or GitHub Copilot, both of them I've found success with.

ABK:
Is Cody independent or is that from Microsoft or somebody or where's that from?

Jason Koo:
It's built by Sourcegraph. And if I remember right, does Sourcegraph make... Sourcegraph has been around a while.

ABK:
Yeah.

Jason Koo:
So yeah, if you look them up, you have to look for Sourcegraph Cody, because if you look for just Cody, at least the last couple of times I've looked, the AI doesn't show up, but if you put Sourcegraph Cody, then it immediately pops up.

ABK:
Might be a lot about people in Montana who ride horses.

Jason Koo:
So one big difference between Copilot and Cody for anyone looking: Copilot you need to pay for, Cody right now is in an open beta, so it's completely free. And so far I've not hit a limit on just random questions I've asked it. So yeah, if you're looking to try out a tool that doesn't cost you anything, Cody is definitely something to take a look at.

ABK:
That sounds really nice.

Jennifer Reif:
Yeah, I might have to check that out too. I like this tools of the month section, we always come up with several things. It's like, "Oh, I haven't heard of that. I need to look at that." Of course, that means my list of things to look at is getting longer and longer and longer.

Jason Koo:
Right. Especially now, there are so many tools popping up. The runner-up on my list this month was a service called Make, which is this sort of API integration, ETL tool. It's built graphically, so you create, like, okay, here's an input source, an API sending to whatever your target is. And the mapping, this is my favorite part, because I've been looking for a tool like this forever: mapping the data from the input to the output integration. It'll hit the endpoint and give you all the key-values from the payload, or even the header, and you can click and drag a property over to your output, let go, and it'll just map it. So when you run it, that value from the input goes to whatever your target output is.

Jennifer Reif:
Oh, that's nice.

ABK:
Yeah. Nice.

Jason Koo:
Right, right. It was like, "Oh."

Jennifer Reif:
I mean it does do some auto mapping for you, but you can customize that a little-

Jason Koo:
There are map functions, string functions, so you can do a good amount of ETL on that data as it's coming through. You can even split data: they have this router thing, so when data's coming in, you can route it to multiple endpoints. Seeing the visual execution of this, I was like, "Oh, this is way better than programming a very brittle integration that only works for as long as neither side has updated their APIs."

ABK:
Right, right.

Jennifer Reif:
Yeah, that's cool. I'm going to have to check that out too. Add it to my list.

Jason Koo:
The list. The list just keeps getting bigger.

Jennifer Reif:
Yeah.

Jason Koo:
Cool. Okay. Okay, we'll switch over to product updates. We talked a bit about this already earlier, when we were talking about Sudhir's session at NODES. So the big one that we didn't mention, actually, was vector search. That came out recently, and in a nutshell, I've been describing it as: you can take information in a graph database and basically pull... it's not quite the right analogy, but you can pull a vector representation of it and then use that in a different system. So you could pull that and, if you wanted to, put it into a vector database and have this sort of two-database stack to power your system. Is that the best way to describe it, ABK? I don't know.

ABK:
It's really one way to think about it. The way I've started to think about these things, keying off the index aspect of it, is that you can think about it similarly to how you think about Lucene indexing, or even spatial indexing for that matter. If you think about the vectors themselves, that's just a new data type. Putting aside all the machine learning stuff, it's just, here's an array of numbers, and you can stick these numbers into an index, and what it creates is this vector space. So what is a vector space? Here's the analogy I've been using to think about that. Putting aside LLMs, all that stuff, and even [inaudible 00:35:16], if you think about colors, we're all pretty familiar with color spaces. We know what RGB is, CMYK, HSL; there are all these different representations of colors, and each of those representations is just a vector.

It's like, oh, okay, here's three numbers: an R number, a G number, and a B number, and that's going to place this color into a space, a vector space. And the beauty of the vector index is it says, okay, great, it's not a contiguous space. We think about colors being contiguous; you can tweak one number and you get a slightly different color, that's fine. But instead, think about all the colors as being swatches from color palettes. Okay, this Benjamin Moore paint-bucket orange is a particular color. Okay, turn that into numbers, put that into the vector database, and then, okay, if you have some other color that... comes from a different manufacturer or whatever it is, but ends up being the same color, you want to know how similar these things are.

So because they've all been turned into just numbers, you can visually try to identify them, of course, but the database now knows. The database can now do a lookup, because these three RGB numbers and these other three RGB numbers are very close in the color space. We've all seen color-space diagrams before, when you use a color picker; that's the same thing as being in the vector space, because now you can compute the differences. So you know that those two vectors are actually very similar, and that's how you find similar vectors. And once you realize you can get similar vectors, whether it's colors or you've taken a document and turned it into a vector, pulling out entities, pulling out meanings, pulling out structure, each of those things ends up turning into numbers.
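The color analogy is easy to try in plain Python. This is just the distance math a vector index runs under the hood, not Neo4j's implementation; cosine similarity is one common similarity measure, and the paint names and RGB values here are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Paint swatches as RGB vectors -- just arrays of numbers.
paint_orange = [232, 119, 34]
other_orange = [230, 115, 40]   # a near-match from another manufacturer
sky_blue     = [70, 130, 220]

print(cosine_similarity(paint_orange, other_orange))  # very close to 1.0
print(cosine_similarity(paint_orange, sky_blue))      # noticeably lower
```

Swap the three-number RGB vectors for thousand-number document embeddings and this is the same comparison a vector index performs, just at scale and with clever data structures so it doesn't compare every pair.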

You get a bunch of arrays of numbers, stick that into the same index. This index doesn't care what you've given it as long as you've given it arrays of numbers. It's good at figuring out which arrays of numbers are similar to the other arrays of numbers. So winding all the way back to why this matters now for language models, it's because when we talk about large language model embeddings, the embedding aspect is exactly this. That's the, "I see the color orange. Give me the numbers that represent that orange." That's the embedding aspect right there. Instead it's for language. So okay, "Here's a PDF, turn that into a bunch of numbers. Great, thank you." That's the embedding, that's the vector number. And then that goes into the vector index. So that's what lets us do all this fun lookup stuff. And the reason that I say it's nice to focus on the indexing aspect of it is that the logical operation is different.

That's the physical operation of what's happening. The implementation detail is, you've got some numbers, you can compare them to some other numbers in different ways and decide how close they are. But logically, what you're doing is semantic similarity. Hopefully, if you've done a good job of representing the document or whatever it is, you'll get things that are semantically similar when you ask for things out of the index. So on the logical level, all you're saying is, "Hey, I've got this question." For instance, in a Q&A kind of context, you embed the question, and that question should end up in the color space, the vector space, near the answers. You're like, okay, just give me the things that are near that, pop those back out, and that's your answer. But the logical operation is just a semantic search. Of course, then-

Jennifer Reif:
One really cool example...

ABK:
Go ahead. Yeah.

Jennifer Reif:
One really cool example, and I actually saw this, that solidified it in my mind, was that you can calculate with, let's say, countries, capitals, et cetera. The example they gave was that you can now start adding and subtracting and calculating based on these vectors. So for instance, if you have the vectors for Paris, France, and Germany, you can say, okay, take Paris, subtract France, and add Germany. What do you get? You get Berlin, because Paris is to France what Berlin is to Germany. Anyway, it was really cool because now you...

One second. There we go, lost sound for a second.

Now we can start doing arithmetic on entities that you would never be able to turn into a mathematical problem before. But now that they're calculated as these strings of floats that you can store in the database, you can run these calculations and say, "Okay, what's Paris minus France plus Germany? What am I left with?" You get Berlin, or something very close to it. So it was just really interesting that you can turn these very non-mathematical things into mathematical problems. And that helps when you ask that question to the LLM; it knows, "Hey, okay, I kind of know this trajectory, because I've already got the pattern of city and country, and when you give me another country, I can look for its capital city." So just really neat, bringing that down to something, I guess, that's a little bit more human-sensible.
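The vector-arithmetic trick can be shown with toy numbers. These three-dimensional "embeddings" are invented for illustration (real embeddings have hundreds of opaque dimensions), but the add-subtract-then-find-the-nearest-vector step is the same:

```python
import math

# Toy 3-d "embeddings": the dimensions loosely mean
# (french-ness, german-ness, capital-city-ness).
vectors = {
    "France":  [1.0, 0.0, 0.0],
    "Germany": [0.0, 1.0, 0.0],
    "Paris":   [1.0, 0.0, 1.0],
    "Berlin":  [0.0, 1.0, 1.0],
}

def nearest(query, exclude=()):
    """Return the stored entity whose vector is closest to the query."""
    candidates = {k: v for k, v in vectors.items() if k not in exclude}
    return min(candidates, key=lambda k: math.dist(query, candidates[k]))

# Paris - France + Germany  ~>  Berlin
query = [p - f + g for p, f, g in
         zip(vectors["Paris"], vectors["France"], vectors["Germany"])]
print(nearest(query, exclude={"Paris", "France", "Germany"}))  # Berlin
```

With learned embeddings the result usually isn't an exact match, which is why the last step is a nearest-neighbor lookup rather than an equality check.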

ABK:
And now that we've got this in Neo4j as an index, what's awesome is you can do all those things, but you still have all the rest of your data. You know how, when we do pattern matching in Neo4j, we love to anchor our patterns? Okay, you know a place you're starting, or maybe a place you want to end, and now you can get those starting and ending places from the vector index. Rather than knowing exactly where you're starting, it's like, "Okay, I'm starting over here, and then I want to get things related to that." That will even help you refine the answers you might've gotten out of the vector index. You might've gotten things that are great candidates for the answer, but now that you've got factual information in the database, the graph, you can say, okay, yeah, I understand why these are calculated as being similar or semantically meaningful, but we know that actually some of them are more important than others.

We know that... I don't know, what's the classic recommendation examples? You're looking for a place to eat and you can just get all the recommendations for, "What's the best place to eat near me?" Okay, there's that, but then, "Where's the best place to eat near me, for me?" I don't care if the rest of the world says McDonald's is the best place to go to, that's a bad answer for me. So do not tell me, dear LLM, do not tell me I should go to McDonald's. But the way to solve that then is by having knowledge about yourself, and that's what you get out of the knowledge graph, because then you get it personalized and customized to the context.

Jennifer Reif:
Yeah.

ABK:
Sorry Jason, that was a huge sidebar. We're still on the first bullet of your product updates.

Jason Koo:
Yeah. No, it was great. It was great. Okay. But yeah, we'll have to speed this up [inaudible 00:42:02] the rest of... So, okay, parallel runtime, which we did talk about a little bit before. Just to put this into context, I think all of us know what parallel and multi-threaded mean, so here's what it means in the context of Neo4j. There are three main elements. There's the, what is the word, query parser: when you put in a query like Cypher, there's a system that parses your command. Then there's the optimizer, which creates the plan deciding how to hit the database. And then the runtime is the actual execution part. So when we talk about parallel runtime, this is one of several runtime options available in Neo4j, the others being, I think, slotted, which is used by Community, and pipelined, which is currently the default for Aura and Enterprise.

Jennifer Reif:
Isn't there a cost runtime too?

Jason Koo:
A cost runtime?

Jennifer Reif:
Yeah. I thought there was a third one, but maybe I'm misremembering that.

Jason Koo:
There was an old interpreted runtime, which I think was the very first one...

Jennifer Reif:
It might've been that.

Jason Koo:
Which I think the slotted runtime is also interpreted, but it's a more efficient one.

Jennifer Reif:
Okay.

ABK:
And I think it does a bit of cost-based analysis. I think you're right. So it could be one aspect of the slotted.

Jennifer Reif:
Okay.
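For intuition, here's the general shape of what a parallel runtime does, sketched with a Python thread pool: the planner splits the work into independent partitions, each partition executes on its own thread, and the partial results are combined. This is a conceptual illustration only, not Neo4j's execution engine (and note that in CPython the GIL means this particular CPU-bound example won't actually run faster; the point is the partition-process-combine shape):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for "scan all nodes and count the ones matching a filter".
nodes = list(range(1_000_000))

def count_matches(partition):
    """Process one partition independently of the others."""
    return sum(1 for n in partition if n % 7 == 0)

def parallel_count(data, workers=4):
    """Split the data into partitions, scan each on its own thread, combine."""
    chunk = len(data) // workers
    partitions = [data[i * chunk:(i + 1) * chunk] for i in range(workers - 1)]
    partitions.append(data[(workers - 1) * chunk:])  # remainder goes to the last worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_matches, partitions))

# Same answer as a serial scan, just assembled from per-thread partial counts.
print(parallel_count(nodes))
```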

Jason Koo:
Okay, so that's parallel runtime. Change data capture, let's touch on this one really quick as well. I think for developers, this is kind of equivalent to an outbound webhook being fired from a system. So with CDC, you can track two types of change. Anytime a node has updates, the entire data payload related to that node can be sent off with the CDC trigger or event. Or you can choose, which I think is cool, a diff, so only the change in that node is sent over, is streamed off. Based on either, you could build an interesting tracking system. Using the diff is obviously going to be more performant, more efficient, but you've got options. So that's CDC in general. I think it's already built into the Kafka connector, so if you're using that, you don't have to enable anything.
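The two CDC payload styles Jason mentions, full state versus diff, can be sketched like this. The event shapes and property names here are invented for illustration and are not Neo4j's actual CDC wire format:

```python
# A node's properties before and after an update.
before = {"name": "Alice", "city": "Malmo", "visits": 3}
after  = {"name": "Alice", "city": "Berlin", "visits": 4}

def full_event(node_id, after_state):
    """'Full' style: ship the node's entire state after the change."""
    return {"id": node_id, "operation": "update", "state": after_state}

def diff_event(node_id, before_state, after_state):
    """'Diff' style: ship only the properties that actually changed."""
    changed = {k: {"from": before_state.get(k), "to": v}
               for k, v in after_state.items()
               if before_state.get(k) != v}
    return {"id": node_id, "operation": "update", "changes": changed}

print(full_event(42, after))
print(diff_event(42, before, after))  # only "city" and "visits" appear
```

The diff payload is smaller on the wire, which is why Jason calls it the more efficient option; the full payload saves the consumer from having to track prior state.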

Okay, some GDS updates, which, since we don't have Allison, I don't know if these are recent or have just been around for a little while without us talking about them. One is finding the longest path, and the other is topological sort. We won't spend too much time on either; those are two updates you may want to take a look at, and we'll put links in if you want to jump into those. Okay, next section. We've got quite a few articles that were released recently. Jen, I don't know if you took a look at these earlier. Are there maybe just three that really jumped out at you that people should look at?

Jennifer Reif:
Well, we did already talk about the GenAI Stack, so there's the article giving the behind-the-scenes: how to build it, how it works a little bit. It's not a super deep-dive walkthrough kind of thing, but it's very technical; it walks you through how the Stack is all assembled and where to find the resources and get ahold of it. So there's that out there. Some of the other stuff I haven't dug into, but there's also a new parallel runtime article sitting out there on Medium. So if you haven't checked that out yet and you want to dig into the details on parallel runtime and such, there's that out there. There's also an article out for a DevTools update: Workspace, multi-DB, editors. There's an update on Needle, which I believe is also a Neo4j thing, correct?

Jason Koo:
Design system, right.

Jennifer Reif:
Right, right. Okay. Yeah, so many terms thrown around. So that's out there; that's Neo4j-specific, if you're interested to see some of the details there. And then there's some stuff on LangChain and Neo4j for enhanced QA, there's a GraphQL development best practices article out there, and some other things that I'm sure are sitting out there as well.

Jason Koo:
Again, harking back to NODES: the NODES playlist is available on our YouTube channel. Right now it has just a few key videos that were streamed live during the event, but as we process the rest of the recordings and the rest of the sessions, they'll all be put into that playlist. So definitely check that out if you want a recap of the conference. And upcoming, we've got quite a lot of events in November.

Jennifer Reif:
Fall is always busy conference season.

Jason Koo:
Go ahead.

Jennifer Reif:
So the conference in London, yeah, that's on November 1st, and it runs on the 2nd as well. The session is The Perfect Couple: Uniting Large Language Models and Knowledge Graphs for Enhanced Knowledge Representation. Then on November 2nd, I'm actually hosting, along with Alex Ertel and a couple of guests; we're talking about Testcontainers with Neo4j, so that'll be a really fun one coming up on November 2nd. And then a hands-on lab in Munich: Google Cloud Generative AI Hands-on Lab with Neo4j, that's in Munich on November 8th. Then also on November 8th, but on the opposite side of the pond, in Texas, USA, there's a conference and a session: Migrate, Modernize, and Transform Your Databases with Google Cloud. Again, November 8th there. On November 9th, there's a GraphTalk Munich on the topic of pharma, so if you're interested in that content, check that out. On November 14th, there are a couple of conferences: Scale by the Bay in California, USA, and I've done a virtual event there.

That's a good conference, but I think it's an in-person conference this year. And then there's Neo4j GraphSummit London, so we go all the way from California back to London on November 14th. Check those out. November 16th, there's a meetup for Graph Database Sydney; I'm not sure what the topic is on that one, I just had a placeholder for the event, so if you're curious and in the area, check that one out. November 17th, we're at DevFest Singapore, and the topic there is GenAI and Graphs with Google Cloud. And then there's also a meetup on November 17th in Delhi, India, or just outside of Delhi: The Data & GenAI Nexus. So check those out if you're in the APAC region. November 21st is GraphSummit Paris, and then November 23rd is GraphTalk Barcelona. So back into Europe for the end of November.

And then on November 27th, closing out the month, we have AWS re:Invent in Las Vegas. That's a massive event, and it'll be in Vegas this year. And then there's FIMA in Europe, out of London. Actually, when I clicked the link on this one, I saw they have a US version as well as a Europe version; this is the Europe one, out of London. So if you're curious about that one, it's November 27th in London. And I think that closes us out for events.

ABK:
Jenn, what's that acronym, FIMA? I don't know that one.

Jennifer Reif:
I had to look that up. It was...

ABK:
Federal... Something, something?

Jennifer Reif:
It is. Well, yeah, data management.

ABK:
I'm thinking of the US FIMA.

Jennifer Reif:
Financial Data Innovation, but that's not FIMA.

Jason Koo:
Yeah, so FIMA is spelled F-I-M-A for those listening, not F-E-M-A, which is the...

ABK:
Not Federal Emergency Management in the US, sorry. Okay. It's not that. Okay.

Jennifer Reif:
Yeah, on their website, they don't even list it out. They say F-I-M-A, FIMA, and then their little tagline is Financial Data Innovation. They don't explain it.

ABK:
[inaudible 00:49:37].

Jennifer Reif:
Data management committee. Financial Information Management. There it is. Had to go into their about section. So yeah, so it'll be a financial data conference there.

ABK:
Cool.

Jennifer Reif:
Cool.

Jason Koo:
Nice. Thank you, Jen.

Jennifer Reif:
So lots of goodies going out in November. You won't stay bored. I know I'll be sitting in front of that NODES playlist, waiting for the videos to trickle out so that I can catch up on everything I missed at the end of October.

ABK:
What do you think, guys? Should we talk another half hour or so about GenAI and vector indexes?

Jennifer Reif:
We probably could.

ABK:
We could just keep going. Maybe more next month.

Jennifer Reif:
Yeah, next month. All right. Well, hopefully we'll talk to you all very soon in another installment of GraphStuff. But until then, enjoy all the graphy content coming your way.

Jason Koo:
Thank you, everyone.

ABK:
Happy Halloween.