GraphStuff.FM: The Neo4j Graph Database Developer Podcast

Pragmatic Knowledge Graphs with Ashleigh Faith

Episode Summary

In this episode, our special guest is Ashleigh Faith, who has been hosting a Knowledge Graph (KG) focused YouTube channel for nearly half a decade. She shares her unconventional journey, starting from academic search with Machine Learning (ML). She also shares hard lessons learned while implementing KGs as a consultant, and offers tips and tricks for designing data models and using KGs with Large Language Models (LLMs). To close out, we'll cover our favorite tools of the month and highlight events we're planning on attending in the near future.

Episode Notes

Speaker Resources:

Ashleigh’s YT Channel: https://www.youtube.com/@AshleighFaith

Tools of the Month:

Abk: Any Airline using Windows 3.1 😆
Ashleigh: Zentity: https://zentity.io/
Jason: Cypher Co-pilot: https://neo4j.com/developer-blog/cypher-co-pilot/
Alison: System: https://www.system.com

Announcements / News:

Articles:
- Turning Your Tabular Data Into a Graph Using Cypher https://neo4j.com/developer-blog/scoobygraph-1/
- Typing the Neo4j Query API https://neo4j.com/developer-blog/neo4j-query-api/
- Empowering Open-Source Cyber Threat Intelligence Analysis With Graph Visualization https://neo4j.com/developer-blog/cyber-threat-intelligence-analysis/
- The Neo4j GenAI Package for Python https://neo4j.com/developer-blog/neo4j-genai-package-python/
- Integrating unstructured.io with Neo4j AuraDB to Build a Document Knowledge Graph https://neo4j.com/developer-blog/document-knowledge-graph-unstructuredio/
- Creating a Graph of Chemical Reactions https://neo4j.com/developer-blog/chemical-reactions-graph/
- Introducing Your New Cypher Co-Pilot https://neo4j.com/developer-blog/cypher-co-pilot/
- Implementing 'From Local to Global' GraphRAG with Neo4j and LangChain: Constructing the Graph https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/
- Graph App Restrictions on Neo4j Desktop 1.6.0 https://neo4j.com/developer-blog/neo4j-desktop-graph-app-restrictions/
- GraphRAG in (Almost) Pure Cypher https://neo4j.com/developer-blog/graphrag-pure-cypher/
Videos:
- NODES 2023 playlist https://youtube.com/playlist?list=PL9Hl4pk2FsvUu4hzyhWed8Avu5nSUXYrb&si=8_0sYVRYz8CqqdIc
Events:
- (Aug 6) Livestream (virtual) Graph-Powered Code Debugging with GenAI https://www.youtube.com/live/o2eQ6GBecgg
- (Aug 7) Workshop (Melbourne, Australia) Neo4j and GCP Generative AI https://go.neo4j.com/LE240808Neo4jandGCPGenerativeAIWorkshop-Melbourne_Registration.html
- (Aug 14) Webinar (virtual) How a Property Graph Database Accelerates GenAI App Development https://go.neo4j.com/WBR-240815--Graph-Data-Model_Registration.html
- (Aug 15) Webinar (virtual) How a Property Graph Database Accelerates GenAI App Development https://go.neo4j.com/WBR-240815-Graph-Data-Model-EMEA_Registration.html
- (Aug 15) Conference (São Paulo, Brazil) AWS Summit https://neo4j.com/event/aws-summit-sao-paulo/
- (Aug 15) Meetup (Sunnyvale, California, USA) Build and Deploy GenAI RAG Applications and AI Agents using Google Cloud https://lu.ma/6cdht2un
- (Aug 20) Meetup (London, UK) Swifty GenAI - Chatting with Taylor Swift Lyrics https://www.meetup.com/graphdb-uk/events/302055475/
- (Aug 21) Conference (Canberra, Australia) Government GraphTalk https://go.neo4j.com/LE240822GovernmentGraphTalkCanberra_Registration.html
- (Aug 22) Workshop (Bengaluru, India) Neo4j and GCP Generative AI https://go.neo4j.com/LE240823Neo4jandGCPGenerativeAIWorkshop-BLR_Registration.html
- (Aug 26) Webinar (virtual) Explainable AI with Knowledge Graphs and RAG - Asia Pacific https://go.neo4j.com/WBR-240827--Neo4j-Graph-Rag---Asia-Pacific_Registration.html
- (Aug 27) Webinar (virtual) Explainable AI with Knowledge Graphs and RAG - Europe https://go.neo4j.com/WBR-240827-Neo4j-Graph-Rag-EMEA_Registration.html
- (Aug 27) Webinar (virtual) Explainable AI with Knowledge Graphs and RAG https://go.neo4j.com/WBR-240827-Neo4j-Graph-Rag_Registration.html
- (Aug 27) Meetup (Washington, DC, USA) LLMs, Vectors, Graph Databases and RAG in the Cloud https://lu.ma/mctijpjm
- (Sept 10) Meetup (San Francisco, CA, USA) AI Tools HackNight at GitHub https://lu.ma/ozt7jtq5
- (Oct 23-24) Conference (Bengaluru, India) Open Source India https://www.opensourceindia.in/
- (Nov 7) Conference (virtual) NODES 2024 https://neo4j.registration.goldcast.io/events/03805ea9-fe3a-4cac-8c15-aa622666531a
- (Dec 11-13) Conference (London, UK) Connected Data London https://www.connected-data.london/

Episode Transcription

Jason Koo: Hello, welcome back everyone to another episode of GraphStuff.FM, a podcast all about graphs and graph-related technologies. I am your host today, Jason Koo, and I'm joined by my fellow advocates, Andreas Kollegger and Alison Cossette. Good morning. Good day to you too. Good to see you, Jason. And I'd like to extend a super warm welcome to our special guest today, who is just a well-beloved, awesome member of the graph community, Ashleigh Faith. Ashleigh, welcome.

Ashleigh Faith: Thank you very much. Happy to be here.

Jason Koo: Nice. Ashleigh, I remember seeing some of your videos when I started at Neo4j a couple of years ago. So you've been doing YouTube videos on knowledge graphs for, I want to say, just a little bit more than four years.

Ashleigh Faith: Yeah. That long. Doesn't feel like it, but yeah, that long.

Jason Koo: Nice. So I'm curious to know, what is your origin story? Like, what is the thing that got you into knowledge graphs?

Ashleigh Faith: Yeah. Well, you know, when I started into knowledge graphs, it was an accident, as I think it was for most people who got into it. I was working in academic search at the time. So at a publisher, they had a lot of content and they had company stuff that they wanted to sell so that people could, you know, search for it.

And I kept coming up against this issue, because I was doing machine learning at the time. Not LLM machine learning; I told you it was a ways back. I was working on auto-classification and, you know, that kind of stuff. And I kept coming up against this problem where the users were using different words than what was on the metadata of the content, and I thought there's got to be a way to tie these things together more.

At the time, knowledge graphs were not a thing that people really talked about, but I remember going to a conference and talking to the head scientist for, I think, human resources, David Meza, who is a long-time advocate of Neo4j. I was talking to him because I was in the same discipline as him, in aerospace, and he's like, "Hey, have you thought about, like, a knowledge graph?"

And I'm like, "A what?" And then he said, "Yeah, that's me." I said, "Can you spell that?" I had no idea. So you all were actually the first ones I knew about. And so that's kind of where I started, where I was really looking at a way to tie together a lot of the natural language that humans were using with a lot of the structured data that we were getting from full text and, you know, the actual metadata on content. And it just spiraled from there, in a good way, right?

Because there's a lot of different applications for knowledge graphs that I came up against. I was in aerospace, so I got into a lot of cyber security and cyber-physical systems. I was not that surprised when I also found knowledge graphs being used there. I got into some supply chain stuff because, again, vehicles generally were the content I was working on, and they also use supply chains, so knowledge graphs showed up there too.

And so I started to get a lot of variety of knowledge graph exposure while I was in that role. And I really found a passion for it, because I just felt like this is one way that you can tie so much information together and find things in a way that I've never seen before. And so many people were either intimidated or didn't know about it.

And I said, you know what, I want to learn more about this. So I went back, got my PhD focused on this and advanced semantics, and started publishing, and eventually started the YouTube channel, because I wanted to help people and teach people more about this thing that I thought was really cool. And it seems like a lot of people like it, because I have over 6,000 people that watch the channel a month now.

So that's insane. I want to not think about it or I won't make a video cause then I'll be too scared.

Jason Koo: Do you ever get a chance to, like, meet some of your fans who watch your videos? I mean, besides us?

Ashleigh Faith: Oh my goodness. Calling them fans is also like, oh my goodness. Like, I don't know. I'm so happy that I help people, let's put it that way. It is the most rewarding thing, and the people that watch my channel are so positive. You know, they come in and make, like, the nicest comments. People are saying, "Oh my goodness, this is the first video that made sense to me over all the textbooks and all these other things that I've seen."

And I'm like, aw, thank you. I'm so happy that that was helpful. And I started the channel in 2020, so I actually didn't get to meet a lot of people until recently. And so I will say I went to the Knowledge Graph Conference this year, 2024.

And it was my first in-person conference where I realized more people might know me than I know them because of the channel. And I thought, oh no, that's kind of surreal. And so I show up and I did the normal conference thing where I'm like, "Oh, hi, my name's Ashleigh." And they're like, "I know." I'm like, "Oh."

Alison Cossette: I will say I was at the Knowledge Graph Conference, and I was going to bring it up when you said, "Oh, I don't know about fans."

Let me tell you: definite fans. And to your point, we were sitting at tables and I remember that moment when, you know, Ashleigh sat down and everybody was talking about how great your videos were. So your impact is not insignificant in the community.

Ashleigh Faith: Well, I'm glad that I'm making a positive difference out there.

That's all I really hope for. And again, I think the people in the knowledge graph space in general are just nice people, for the most part. I mean, there's always one or two where you're like, "Oh my," but for the most part, people are really supportive and nice and lovely. And I will have to say that for most people that I've met or talked to that watch the channel, too.

Andreas Kollegger: That's awesome. I have to ask a question then. As you mentioned, your videos are amazing for people: things suddenly make sense to them as they watch your videos, because you explain things so well. And so I've got, I guess, a two-part question. One, what do you think is behind the scary factor of knowledge graphs?

Because people are intimidated, right? And then, kind of on the flip side of that, what's your go-to "Hey, don't worry, let me explain it to you" kind of anecdote or metaphor? How do you explain it and, you know, make it nice for them?

Ashleigh Faith: Yeah, I think it just goes back to the fact that most people are familiar with rows and columns, and when you start to show these big hairball things, people are like, "Oh, that's a no. That's a no."

I think they get really scared. And I think that's one of my biggest pieces of advice for folks that are trying to get stakeholders involved in knowledge graphs. I think a lot of people like us are like, "Ooh, yeah, information, data, I can see how that's really cool stuff," but that's not everyone.

And if you show that to stakeholders, they think, "Oh, that's going to be expensive. Oh no, how do I use this?" That's especially true with the engineers that I've known in my career, whether I worked with them or was consulting or whatever. I remember this story that encapsulates this so well.

I was talking to an engineer about how we could use this to connect a lot of different data across different dispersed systems and how, you know, we could have one easy way to query across all of it. And we were talking about how to do that, and we were doing a data transformation project at the same time. I remember him looking at me and saying, "Ashleigh, what you're talking about is like building this really cool and amazing balcony off of a shack on the beach that's kind of falling apart. So, like, cool, I get it. But I've got bigger problems right now."

And, you know, I think for this person, because at the time, at least, it wasn't commonly taught, and it wasn't a pattern that was traditional to people that are used to SQL and typical engineering patterns, it was just like, "Okay, that's cool, but I've got bigger problems, and that is too much to learn." As for how I help people get over that: I actually think, especially in this LLM space (while there's lots of other things we need to worry about and be cautious about, don't get me wrong), one of the nicest things about it is that it's helping break down the barriers.

Even with, let's say, GraphRAG (again, there's lots of disclaimers on that), it's like, oh, now I don't need to learn Cypher. I can just use the query language I'm used to. Oh, I don't need to worry as much about how to design this graph of things and how everything is connected together.

I have assistance to help with that. There are so many more low-code or no-code ways to interact and build in the knowledge graph space now that it's not as scary. And I think it's just trying to meet people where they're at. Before, it was like, oh no, if you're on the ontology side, I've got to go find somebody that knows how to do that.

Or, you know, I need to find somebody who understands this graph thing and how that stuff works. And there's a whole new set of tools that I have to figure out. I think now that we are kind of getting away from that a little bit, and we're kind of playing in the same space as everyone else is, it's become a whole lot less scary. So that's kind of the direction I go in now.

I could tell, I could tell you how I used to do it, but LLMs are here to stay, so I'm just gonna keep it the new way.

Alison Cossette: The other thing I've noticed, Ashleigh, and I don't know if you've seen this too: I feel like folks who come from a data science and analytics background have an easier onboarding to graph than those coming from an engineering perspective, simply because, you know, to us, it's just a different way of forming the data, right?

It's another sort of data type, in a way. And as the role of the data scientist has evolved and GraphRAG has evolved, I was just curious what you're seeing as far as the interaction between developers and data scientists and graph, and what that environment is looking like from your perspective.

Ashleigh Faith: Yeah. You know, it's interesting, because I remember getting into my first analytical project. I was talking about search a little bit before and, you know, tying different data and all that stuff together. But on the analytics side, I remember getting into the first project and going into Tableau, and there was no way to visualize a graph.

And I was like, what? That sucks, right? And then I had to go elsewhere, and hey, funny enough, I found Bloom. So, you know, there's that issue. I think that's not as much of an issue now, because knowledge graphs are a little bit more mainstream, a lot more mainstream now, so you don't encounter that issue. But I think what has helped is being able to actually crunch down what we're looking at so that the analysts and the data scientists can talk to the engineers about, like, why are we doing this?

And a really easy way to do it is, again, outside of the LLM space. This is a shameless plug for my NODES conference talk: we're going to talk about how you can use knowledge graphs, of course, to do fact verification, and that's a huge reason people are getting into it now from an LLM perspective.

But outside of that, you know, if you have a query that does multiple hops and it goes past, like, three hops, you can show it clearly. Okay, if I'm doing this query with my tables, here's a table, here's a table, here's a table, and here are all these areas of translation between one table and another when I'm doing my joins.

I could get it wrong. Maybe I don't know the right things. You kind of start to build that narrative, and then you show it in a graph and you get it in one hop, and it's like, "Oh." I think that's how the analyst is helping the rest of their community, not just the engineers, go, "Oh, I get it. I get it a little bit more now."

Right? So I think that's where it's helping translate between the groups. I think sometimes the engineers, especially data architects or enterprise architects, are like, "I have this knowledge," and then the data scientist is like, "And I have this knowledge." And it's like, yes, you do.

And you all should get along, and I feel like graph helps with that. Now that, again, it's getting a little bit more mainstream, it's helping that translation and that transparency. Right? Because I think sometimes what the analysts and the data scientists were doing kind of seemed like magic. We all knew it wasn't magic, but now graphs actually help to make it more explainable: explainable queries, explainable algorithms, explainable AI. And then passing that on to your stakeholders and your engineers and whoever else, it's like, "Oh, I get it."

Okay, great. Even if it's not the way that you do it in production, right? Like, even just using the graph to explain what's happening in a visual way, because a lot of graph is visual, I think is really helpful.
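
The join-versus-hop comparison Ashleigh describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables, rows, relationship names, and both helper functions are hypothetical and do not come from the episode or from any Neo4j API.

```python
# Hypothetical tables: every join has to match the right key column by hand.
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [{"order_id": 10, "customer_id": 1, "product_id": 100}]
products = [{"product_id": 100, "supplier_id": 7}]
suppliers = [{"supplier_id": 7, "name": "Acme"}]


def suppliers_via_joins(customer_name):
    """Answer 'who supplies this customer?' with three nested joins."""
    found = []
    for c in customers:
        if c["name"] != customer_name:
            continue
        for o in orders:
            if o["customer_id"] != c["customer_id"]:  # join 1
                continue
            for p in products:
                if p["product_id"] != o["product_id"]:  # join 2
                    continue
                for s in suppliers:
                    if s["supplier_id"] == p["supplier_id"]:  # join 3
                        found.append(s["name"])
    return found


# The same facts as a graph: node -> list of (relationship, neighbor).
graph = {
    "Ada": [("PLACED", "order-10")],
    "order-10": [("CONTAINS", "product-100")],
    "product-100": [("SUPPLIED_BY", "Acme")],
}


def follow(start, path):
    """Walk a sequence of relationship types outward from a start node."""
    frontier = [start]
    for rel in path:
        frontier = [
            neighbor
            for node in frontier
            for (r, neighbor) in graph.get(node, [])
            if r == rel
        ]
    return frontier


print(suppliers_via_joins("Ada"))
print(follow("Ada", ["PLACED", "CONTAINS", "SUPPLIED_BY"]))
```

Both calls return the same supplier. The difference is that the tabular version spells out three key matches, each a place to pick the wrong column, while the graph version names the path of relationships directly.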

Andreas Kollegger: So I love all that. I love that particular aspect of when you're talking about, like, an engineering mindset versus a data science mindset: it's the same data structure that you can look at as an analytical data structure, but you can also build an application with it, and they can coexist. You don't have to have your own separate worlds.

It can all kind of come together, right?

Ashleigh Faith: Yeah. Yeah. I think that's a big part of it. Well, there's this misnomer. I remember when I did my very first graph project, there was the debate: RDF, you know, triple stores, or property graph.

I equate this to the same argument that was made about R versus Python. It's like, look, you all have your camps, but they can coexist and they are useful for different reasons. And sometimes, you know, they can do things together. Same with this, right? So I think that in the beginning it was, well, graph in general (not picking sides on anything) is just too slow.

It's too slow to do something at the edge. And it's too slow to, you know, get that transactional data going through, and all these other things. Especially, you know, in the cyber security stuff I was working on, there were concerns that it wasn't going to be able to catch things in time.

And I think that now, especially with a lot more compute power and, of course, better knowledge around how to structure your queries (you know, do you have to put the data at the edge, or can it be a pre-processing thing?), there are all these other approaches that have come out that help you build better applications off of it.

And I think that has gotten a lot more adoption too, because it used to be that that was another part of the scary thing. It's like, "Oh, yeah, looking across a whole graph, that's going to take forever." It's like, yeah, or you could have a named graph, or you could have a persistent view. There are lots of other ways to do it that decrease the scariness.

Jason Koo: Ashleigh, so in watching some of your videos, you talk a lot about the quality of the data being brought in, and about putting a lot of attention on that start of the ETL process. Do you have any general tips for folks who are, you know, new to knowledge graphs and wanting to import data into a graph?

Because it can be quite daunting for someone who's coming from a very tabular space to go, like, okay, how do I even get my data into a graph? What would be your top tips for someone getting started in this space?

Ashleigh Faith: Yeah, well, you know, I always say that you do need to start by making sure you know what a UID is.

I know that's not always something that you need to have in a property graph, especially if you're spinning it up for analytics and then flushing it afterwards. But, you know, if you're going to do something where, let's just take the LLM example again, your LLM is coming in and it has entities.

It doesn't know if this is Michael Jordan or Michael B. Jordan. It has no idea, right? And so your graph needs to be able to identify the right entity. So entity resolution is a huge key. And a lot of that resides in, one, doing entity resolution as you're ETLing in, or beforehand. If you have your data already in and you're transforming it into a graph, make sure you do that ETL, you know, the transformation, while you're loading it into a graph format, to make sure those entities are resolved.

You know, I remember one of the earliest examples of me working in graph where I didn't think to ask this question, which was: are these truly, truly unique IDs? Because we were trying to do entity resolution amongst a lot of different databases of information on people, and these were, like, you know, people in the world.

So not individuals. That didn't make any sense. So, there are people that are like celebrities, people out in the world that have a persona that you can see and identify, rather than an individual whose data is very private. You don't want to, you know, mess around with inter-database issues with that.

So, when we were doing this entity resolution to known people in the world, like a celebrity or somebody in the news, like a politician or something, we were bringing all these IDs in and we were just using them as is, as they came in. And what we started to realize was that those IDs were not unique.

And so when we were doing the entity resolution, we were accidentally merging and deduping with the wrong entities. And that screwed up so much of the system. It cost a lot of money to fix. And it was just because, you know, we had assumed. We had asked the question, like, "Are these unique?" and everyone's like, "Yep." I will say, please make sure.

Do a double check. That was a huge issue. But then there's also, you know, the things that you're bringing in from tables. I think some people will say, "Well, you know, I'll just have these entities, and the edges will be my column names or something." But sometimes you don't realize those columns are multifaceted, right?

And so then your data gets a little wonky, because you think it's an address, but really it's the person's name, because maybe it was the mailing address instead of just a regular address. Right? So you really, really need to check what you're putting into the system as you go. I think that's a big part of it.

And also making sure that all of your stakeholders are in agreement on what things mean. Like the biggest debate known to man: what is a customer? How do you define it? How do I think of it? Every single person will tell you a different answer. And so, you know, if you ask nothing else, ask what "customer" means to you, to your stakeholders, and to their data, right?

Because everyone's making their own data. So you need to make sure that, like, okay, maybe you don't need to agree on the same definition, but you need to agree on what data represents the thing that you're trying to do in your graph. So I think those are the biggest ones. I mean, anytime you have to deal with multiple graphs, that's the other one that has started to come up recently, because a lot of people are using graph now.

And now you have to try to figure out: well, do you have a schema? What do you do with this data? How are you running your queries off of this? What do you do in these situations? Even something as simple as: what is an athlete? Is an athlete a role? Is it a type of person? I don't know, but you need to ask if you're looking at other people's data.

And that stuff causes so much debate, because you may have to reconfigure a lot of your ETL based on that, or your queries, if you're going across multiple graphs at that point. So yeah, those are the big things that I would say are small questions that could lead to many, many headaches if not addressed.
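
The "are these IDs truly unique?" double check from this answer can be sketched as a small pre-load step in Python. This is a minimal illustration, not the process used on the project Ashleigh describes: the record layout, the source names, the sample IDs, and the `find_id_collisions` helper are all hypothetical.

```python
from collections import defaultdict


def find_id_collisions(records):
    """Return {id: sorted names} for IDs that point at more than one name.

    A collision is only a flag for human review: two spellings can still be
    the same entity, so this check does not resolve anything by itself.
    """
    names_by_id = defaultdict(set)
    for rec in records:
        # Light normalization so case and stray spaces don't cause false alarms.
        names_by_id[rec["id"]].add(rec["name"].strip().lower())
    return {
        entity_id: sorted(names)
        for entity_id, names in names_by_id.items()
        if len(names) > 1
    }


# Hypothetical records arriving from two source databases.
records = [
    {"source": "db_a", "id": "P-001", "name": "Michael Jordan"},
    {"source": "db_b", "id": "P-001", "name": "Michael B. Jordan"},
    {"source": "db_b", "id": "P-002", "name": "Serena Williams"},
]

collisions = find_id_collisions(records)
print(collisions)
```

Here `P-001` is reused for two different names, so merging on the raw ID would collapse two people into one node. Running a check like this before any merge or dedupe step is one cheap way to do the double check.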

Jason Koo: You mentioned, you know, the data model. When I'm talking with folks, and I'm pretty sure it's the same with the rest of us too, when someone's asking, "How should I data model this?", oftentimes we come back with a question like, well, you know, tell us more about your

Ashleigh Faith: So,

Jason Koo: you know, kind of putting you on the spot.

So, like, is that the same for you? Or do you have, like, this masterful cheat sheet of steps that you go through for helping someone, you know, decide how to create a data model?

Ashleigh Faith: Yeah, no, that's a great question. So, yeah, the thing that I will say, and I see this all the time on my channel too, is about the examples that I use in the more how-to videos. Like, one of my top videos is how you take these different data models, from a taxonomy to a thesaurus to an ontology to a knowledge graph.

Knowledge graph here being, like, instances, or more like a property graph kind of format. And it's one of the most popular videos because I literally go through the exact same data and how you change the data model for each of those. And the example I used in that video is one that I came up with from a lot of the experience that I have, because I'm familiar with it and I know how to model it.

I always have people say, "Well, that's great, but I need it for my use case." I'm like, yes, but that's where you need, like, a consultant, or to sit down and figure it out. You know, what do you need for your use case? And I know so many people get frustrated with that, because a lot of people struggle with how to do this part.

Even with GraphRAG, I mean, if you just let it go the way it is, don't get me wrong, it gives you great data, and I'm super happy it exists and we can use it. It's good clay on the table. But at the end of the day, it doesn't know your data. It doesn't know how you think about your data and how you need to answer the questions that you're asking.

And so therefore you still need to model it and morph it a little bit to make it useful for your needs. I think that's where people struggle the most. So where would I start? I usually ask: what is the most important query that you have at your organization that you need to answer? What is the query that, you know, gives the most confidence in what your business is doing, or makes the most money for you?

Whatever your success metric might be. Because if you get that one query right, and the data helps answer that, and you can show that it's being answered with higher quality, it's faster, it consults better data, whatever metrics you're using to show your success, that's a good POC.

It's going to get you more resources to do what you need to do over the longer term. And I've always found that if you can model that thing, it kind of touches a lot of parts of your organization, or a lot of parts of your data, anyway. The other thing to keep in mind is that sometimes you have to model wrong.

And I know that maybe it's a controversial statement, but here's the problem. I have seen this so many times, where you talk to stakeholders and they're like, "Well, this is how we're doing this. And this is how we think about it." And you look at that and say, "I wouldn't have modeled it that way."

Or, "Oh, that's not correct." And I'm not saying just go with the flow. Sometimes you do have to educate and try to make it more efficient and more accurate and all of those things. But at the end of the day, some companies define customer, for instance, in a really weird and funky way, or there's a very strange reason. Like, when there was a consolidation of businesses, maybe you were acquired, and there's a weird and funky thing you have to do to get around some of the weird data structures that existed from the previous company. You have to be able to accommodate some of that strangeness.

And that makes a lot of people who are doing data modeling uncomfortable. But again, if you try to recreate and fix the world, one, you will likely not get done. And two, the data isn't going to comply, because it's already, you know, got this long history of a different format or a different structure. So then you're going to be fighting with that constantly, or you go back and do a data transformation project, which also costs time, money, and energy.

So, you know, it's not me saying go with the flow and do something incorrect, but you do have to get a little comfortable with, let's say, creative ways of modeling things that maybe aren't textbook.

Alison Cossette: I have a question going back to GraphRAG. So when you look at GraphRAG, I think a lot of people at this point who have delved into it are comfortable with the concept of a graph and chunking. But when you look at the graph part of it, what would you recommend for someone who's newer?

Does it make more sense for them to try and tie it to something structured?

Or would you say, let's actually use, like, an LLM to extract different types of relationships from the text itself? So, which I guess is the long way of asking the question: when you're adding on the graph part to GraphRAG, do you find that it's better to go with creating graph structure from the unstructured text, or bringing in something structured?

Ashleigh Faith: I think it depends on where you are in your maturity, right? So if you already have a graph, I think it's a good way to identify areas where you might want to expand the graph, right? You already have a graph, so you can map things from GraphRAG (the primary example is relations, right, the edges) to relations you already have.

And it's like, great. Okay, supporting evidence that this is a good way of modeling. We already have the data internally to support this. Now we have it from all this other stuff that we're bringing in. Great. But then you get suggestions, I think, from it, where it's like, oh, I did not have that relation.

I can see the value of that. I can see why that would be helpful. So I think it's a good way to get suggestions and expand the graph that you have to be even more specific or more nuanced, you know, in a certain area. But that's if you have something. If you are starting from scratch and you do not have anything at all, I still think that it's useful, especially since, in the grand scheme, the person doing GraphRAG is likely not the person who really understands deeply how your organization works, right?

I mean, sometimes, if you're super lucky, if you're a startup and you have a unicorn somewhere. But chances are you need a subject matter expert in legal or finance or product, whatever it is that you're working on. And so what I have found is helpful is doing more of a rough sketch of how the space that you're, like...

As an example, let's say that you're trying to help your legal department find all of the data that you need to say that you are compliant with some regulation that's out there. Let's just say that as an example. That is something where you would want to sit down and say, okay, what kind of data do we already have?

And, making sure that the SME of that business use case is included: how do we think about this as an organization? How do we want to answer this question? Because if you don't do that part and you just do GraphRAG, and you have the data scientist or the knowledge graph engineer or whoever it might be, they are still working in a vacuum at that point if they're starting to use that and then develop something out of it when they don't understand how the actual business is going to use that information.

Yeah, this is one of the biggest problems, by the way, in graph: all of us are very excited about data. And I have a whole video on "Great, I have a graph. Now what?" It's like, that's a problem. And don't get me wrong, I have been there so many times. It's like, I want to play. I want to play. And I want to see this.

And I want to see if I can model it this way. And I can see what kind of questions, like, yay, exciting. But at the end of the day, you're doing this for a business. You're doing this for a startup or, you know, helping humanity, if you're doing something in environmental studies. At the end of the day, you're building these things for a purpose, and you need to make sure that what you're building is fit for purpose.

And therefore, even if you're using GraphRAG to start everything, I think you still need to sit down and create, not a model model, but how your organization interprets that kind of information to answer the questions they're trying to answer with whatever you're supposed to be building.
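The relation-mapping idea from the start of this answer (compare relations extracted from text against the relations a graph already has, then review whatever is new) can be sketched in a few lines of Python. This is only an illustration: the relation names and triples below are invented, and in practice the extracted triples would come from an LLM or another extraction step that is not shown here.

```python
from collections import Counter

# Hypothetical relationship types already present in an existing graph.
EXISTING_RELATIONS = {"SUPPLIES", "EMPLOYS", "REGULATED_BY"}

# Hypothetical (subject, relation, object) triples extracted from text.
extracted_triples = [
    ("Acme", "SUPPLIES", "Globex"),
    ("Acme", "REGULATED_BY", "GDPR"),
    ("Globex", "AUDITED_BY", "Initech"),
    ("Acme", "AUDITED_BY", "Initech"),
]


def compare_relations(triples, existing):
    """Split extracted relation types into confirmed ones and new suggestions."""
    counts = Counter(relation for _, relation, _ in triples)
    confirmed = {rel: n for rel, n in counts.items() if rel in existing}
    suggested = {rel: n for rel, n in counts.items() if rel not in existing}
    return confirmed, suggested


confirmed, suggested = compare_relations(extracted_triples, EXISTING_RELATIONS)
print("Supporting evidence for existing relations:", confirmed)
print("Candidate relations to review with an SME:", suggested)
```

The second list is only a set of suggestions. Whether a new relation belongs in the model is still a conversation to have with the subject matter expert.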

Alison Cossette: Yeah, it reminds me so much of when the data scientist is told to go build a model that does X and they don't actually know how it fits into the business and knowing what that final application is, is going to change lots of decisions you make across that life cycle. You make a really, really good point.

And I love this idea of, again, tying the application builders not just to the data scientists, but also to the business, and understanding what that really is, and making sure that you've got the right combination of people and the right clarity of a mental model in order to build something that makes, you know,

Ashleigh Faith: Well, and even more important, I think, than that is understanding where the data can't be wrong or where it's very, very risky if the data was wrong, because a lot of the time, again, that data scientist maybe doesn't know that.

And that is so critical to building, you know, a model that can be trustworthy and that you can depend on. Because if you go in and even if you have that mental model, but you don't know that, Oh no, I didn't use the right column for customer. Oh no, that's doing this and this and this. And I didn't know about that.

Especially anything that deals with people and their safety. And that would include data privacy, which I consider part of safety. And, I mean, that would also be like supply chain, the safety of people who have to depend on your product or your services or whatever it is you're doing.

But also, anything that has to do with regulations and law and legal things, you really want to be careful and make sure that you really understand all the ins and outs of what is super sensitive. What are the things that you have to handle differently? Who should have access to what? And then, if you get one of these pieces of data wrong, what are the implications of that?

Because that helps, I think, the data scientist and the knowledge graph engineer understand where they really need to focus on something. And maybe you just need to sit, have a coffee, have a think on that topic in your graph, because it's so mission critical. And if you don't talk to stakeholders, you may have some inclination of that, for sure.

But again, you want to always verify when you're dealing with these kind of things.

Andreas Kollegger: And when you're pulling in stakeholders for this kind of a conversation, is it like a whiteboarding session, do you think? Or is it an interview? Like, hey, just tell me your top 10 concerns? Or, like, what's the approach there?

Ashleigh Faith: Yeah. So normally the first thing I do, I don't do a regular interview, I just sit down and ask, like, okay, what are the implications of this? What does this thing do? The answers, you know, why are these critical for us to answer now? Just the general, like, what is this thing?

And why are we doing it? But then after that, what I try to do is identify where the data sources are. I try to figure out what format they're in and all of that stuff that you have to figure out to understand the level of effort for a project. But then once I actually get into the project, that's where I do what's called a speak-aloud, and I have a video on my channel on this.

I think it's talking to an aircraft pilot about how they go through their pre-flight check. And so I don't know anything about that, but I was trying to create an ontology on this, because I was using RDF for this project. And they don't know that either, right? The expert that I was talking to, they have no idea.

And so as they were talking, I was just using, I think it was arrows.io or something like that, and just kind of showing, okay, here's how I would interpret what you just said. And they would look at it and say, no, this kind of happens this way. And we were actually creating a data model based on time, right?

So we quickly moved out of RDF because back then RDF started and existed. And it was too cumbersome. So we did get into more of a time series model, and it was really rewarding. And I got to understand more about why does this happen before that? And the why behind things, I think, is so critical when you're building out a knowledge graph specifically, because if you put something in the wrong sequence, which is a common error when you ask an LLM to create the graph for you. It doesn't know the sequencing of things all the time, especially if you're using your own data to supply the LLM to create a model off of it. It doesn't get why things come before other things. That kind of stuff comes out in the wash with this.
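One lightweight way to catch the sequencing errors described here is to write down the ordering rules an expert states during the session and check any generated sequence against them. The sketch below assumes the rules are simple "A must happen before B" pairs. The step names are invented for illustration and are not a real pre-flight procedure.

```python
# Hypothetical ordering rules captured from a subject matter expert:
# each pair (a, b) means step a must happen before step b.
MUST_PRECEDE = [
    ("review weather", "file flight plan"),
    ("walk-around inspection", "start engine"),
    ("check fuel", "start engine"),
]


def find_order_violations(sequence, must_precede):
    """Return the (before, after) rules that the given sequence breaks."""
    position = {step: index for index, step in enumerate(sequence)}
    violations = []
    for before, after in must_precede:
        # Only check rules whose two steps both appear in the sequence.
        if before in position and after in position:
            if position[before] > position[after]:
                violations.append((before, after))
    return violations


# A hypothetical sequence proposed by an LLM, with the engine started too early.
llm_sequence = [
    "review weather",
    "start engine",
    "check fuel",
    "walk-around inspection",
    "file flight plan",
]

violations = find_order_violations(llm_sequence, MUST_PRECEDE)
for before, after in violations:
    print(f"'{before}' should come before '{after}'")
```

The rules themselves still have to come from the expert. The check only makes it visible when a generated model contradicts them.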

And honestly, I think we're all data people here. People listening are probably all data people. Don't you hate when someone asks you to do something and you have no idea why? It's so frustrating. And so, therefore, it builds bridges. It gives you a relationship with the people that you're dealing with.

It gives them a sense of trust that, like, okay, they're asking about what I feel about this thing. And they start to learn a little bit more about what you're doing. I did one of these recently, and the person I was talking to was like, good gracious, no wonder they pay people to do this. They thought it was so easy.

They're like, oh, okay, you just model it. It's fine. And then when we got into it, they were giving me suggestions of, oh, well, maybe this way. And then I said, we can't, because if we did that, here's what the logic is, what we're actually telling the machine, the LLM in this case, from the model.

And he's like, oh, I see. Yeah. No wonder people get, like, whole companies. That's why. So it also gives them a sense of understanding. Sharing a story, one of the saddest parts of my early career: I was told by a boss of mine, who was very ignorant about what this stuff is, that any monkey could do it. The exact words they told me.

And I said, wow, that tells me a whole lot. And I did not stay at that place for very long. But that's a problem, right? First of all, I don't think anyone should be saying that, period. But outside of that, back to that, it's kind of a mysterious box, all this stuff.

I will also say, as much as I used to want to show the hairball and make it like, oh, look at all this great stuff: show it in tables. Even if you're just showing, these are your nodes and here are your relations, it's more familiar, and you will get people dialoguing on it a lot more than if you show them the circles.

It's so strange, but I've seen a remarkable difference when I go away from the circles and the lines and just put it in more of a table structure. Even if it's just writing out all the triples, that's all it is. That's fine, but it helps them see, oh, I see, this is how these things can connect.
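As a small illustration of "just writing out all the triples", the sketch below prints a handful of triples as a plain-text table using only the Python standard library. The node and relation names are made up for the example.

```python
# Hypothetical triples: (source node, relation, target node).
triples = [
    ("Pilot", "PERFORMS", "PreflightCheck"),
    ("PreflightCheck", "INCLUDES", "FuelCheck"),
    ("FuelCheck", "PRECEDES", "EngineStart"),
]


def triples_to_table(rows, headers=("Source node", "Relation", "Target node")):
    """Render triples as an aligned plain-text table."""
    all_rows = [headers, *rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in all_rows) for i in range(3)]

    def fmt(row):
        return " | ".join(cell.ljust(width) for cell, width in zip(row, widths))

    lines = [fmt(headers), "-+-".join("-" * width for width in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


table = triples_to_table(triples)
print(table)
```

The same rows paste cleanly into a spreadsheet or a slide, which is often where stakeholders are most comfortable reading them.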

I think that's a big part of it too. So, yeah, it's one of those things where I think a lot of people are still kind of struggling to get stakeholder buy-in, but if you can show them some of these, you know, maybe shortcuts and, you know, don't say this thing, which is sad, but it helps.

And also, just keep in mind, people are not always as excited about the data as you are. How can that possibly be the case?

Andreas Kollegger: That's a hard truth. That hurts.

Ashleigh Faith: Oh, it does, doesn't it? Yeah. And I have to remind myself of that all the time, because, especially when I discover something unique and cool and interesting that the graph enabled, I'm like, oh, yay. And then I show it to someone and they're like, I mean, cool. I mean, I'm glad we found that. Thank you. And it's like, it was really cool. That's why I love doing my channel, though, because I can geek out with people. That's why I like doing it. Yeah. I love it.

Jason Koo: So speaking of your channel again, since you brought it up.

So there's this thing I think you do every summer, where you do a round of honest reviews of different technologies and services. I kind of have two questions from that. Let's start with this: how do you decide which tools or which services to look at?

Ashleigh Faith: Yeah. So I actually do it more often in December. It was kind of a play when I first started my channel. Not a lot of people watch YouTube, or go on LinkedIn for that matter, where I post a lot of things, around December. Of course, they're with their families. But there's this fun thing that companies do.

It's like, oh, a buyer's guide of holiday gifts. And I thought, wouldn't it be cool if I did that, but for graph stuff? And so that's where this came from. It was really just a fun play on that around the holiday season, and now it's like a Christmas in July kind of thing, where I do it in the summer too sometimes.

I try to do themes in the summertime. Like, one year I did all visualization tools. This year I'm doing all things that are really focused on unique aspects of graph. Like, I just did one focused on making sure that the security of your data is really strong, like stuff that is used for digital twins and by whole government organizations.

And that's their primary customer. It's like, oh, okay. I mean, we all focus on that to some degree in the graph space, but that's their primary thing that they work on. So, yeah, I do these. I always like to preface: I've actually had people say, well, these are way too positive to be honest, you must be getting something for doing this.

I get nothing, everyone. Nothing. I do this because I find it fun and fascinating to see new tools. I get nothing from this at all. So what happens is, I see people on LinkedIn or in the communities I talk to, like, you know, Neo4j has the Ninjas, and people start talking about tools and different things that they've encountered.

And I think to myself, huh, I've never seen that one before. And I go in and I look. I try not to influence myself too much by digging in too much, but I go to the main page and I kind of see what they're all about. I kind of ask around, like, hey, have you heard anything about them? And if it's positive, I'll reach out. I'll just find somebody on LinkedIn and I'll just describe it as, you know, this is meant to help my audience see what other tools are out there, because it's kind of a pain in the butt to reach out to salespeople. And sometimes they're not allowed, depending on the organization, to reach out. So all they have to go off of is the company's own marketing materials, which we all know are going to make them look good.

And so that's kind of why I do this too, to give my honest opinion. I don't ever ask anyone to come on if I don't really like what they're doing. That's mostly because I'd like to keep my channel positive. But I don't ever hide anything that people aren't doing. I just highlight what they are doing.

And so, you know, if somebody is asking, well, do they do this? Well, you have to ask them. There's a lot of use cases that I don't know about, so you have to ask those vendors about those things. But that's generally how I pick who's coming up: I start to see what's bubbling up in the industry, in the community. More and more people are watching these, so people have started reaching out and saying, can you do this one? Which is kind of cool. But that's generally how I pick who's coming up.

Jason Koo: Nice. And that's a perfect segue into our next section of the podcast, which is our tools of the month.

In, I don't know, the last couple of months, Ashleigh, has there been one service (you know, don't feel like you're highly endorsing this one thing, no, no, no), one particular tool that really kind of stood out for you?

Ashleigh Faith: Yeah, so one that I've been kind of playing around with, and again, I just started playing around with it, is Zentity. And that, again, because so much of your graph plus LLM journey is going to depend on entity resolution, I've been really starting to focus on that, to see what's out there. I know Paco Nathan, who I think just did something with Neo4j on this topic. He's really passionate about this area.

He's a lovely individual too, by the way. Like, somebody that could be a jerk because he's so brilliant, but he is not. He's just the nicest individual, which I love when I meet people like that, that could be a jerk because they're so smart, but they're not, which is lovely. But yeah, entity resolution is a big deal.

And so Zentity is one that I think either he pointed out or somebody else pointed out that they were playing around with, so now I'm playing around with it and seeing what can be had with that.
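For listeners new to the term, entity resolution means deciding which records refer to the same real-world thing. The toy sketch below is not the API of any particular tool. It assumes made-up customer records and a deliberately simple rule: two records are the same entity if they share a normalized email or phone number, with matches chained together using a small union-find.

```python
from collections import defaultdict

# Hypothetical customer records with inconsistent formatting.
records = [
    {"id": 1, "name": "Jane Doe", "email": "Jane.Doe@example.com", "phone": "555-0100"},
    {"id": 2, "name": "J. Doe", "email": "jane.doe@example.com ", "phone": None},
    {"id": 3, "name": "Jane D.", "email": None, "phone": "(555) 0100"},
    {"id": 4, "name": "John Roe", "email": "john.roe@example.com", "phone": "555-0199"},
]


def normalize_email(value):
    return value.strip().lower() if value else None


def normalize_phone(value):
    digits = "".join(ch for ch in value if ch.isdigit()) if value else ""
    return digits or None


def resolve_entities(rows):
    """Group record ids that share a normalized email or phone number."""
    parent = {row["id"]: row["id"] for row in rows}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (attribute, normalized value) -> first record id with it
    for row in rows:
        keys = [
            ("email", normalize_email(row["email"])),
            ("phone", normalize_phone(row["phone"])),
        ]
        for key in keys:
            if key[1] is None:
                continue
            if key in first_seen:
                union(row["id"], first_seen[key])
            else:
                first_seen[key] = row["id"]

    clusters = defaultdict(list)
    for row in rows:
        clusters[find(row["id"])].append(row["id"])
    return sorted(sorted(ids) for ids in clusters.values())


resolved = resolve_entities(records)
print(resolved)
```

Real entity resolution usually adds fuzzy matching and confidence scores on top of exact matches like these, which is where dedicated tooling comes in.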

Jason Koo: Cool. Excellent. Thank you. I guess we'll go around the table. ABK, did you have a tool of the month?

Andreas Kollegger: So for this month, since this is the first day I've turned on my computer in about three weeks, I'm going to give a shout out to any airline that's still using Windows 3.1. Love you. Love you. I've just traveled around a bunch and I've had no issues. I'm going to assume all those airlines are still using really old software.

So sometimes the old ways are best.

Jason Koo: If it's not broke. Alison, do you have a tool of the month?

Alison Cossette: Mine actually isn't a tool. Mine is a program that uses graph, and we've talked about them before: Adam Bly and the folks at System. They've recently expanded the research areas that they cover to include climate change.

So if you're not familiar with System, they're basically trying to graph the knowledge of the world, starting in certain areas. And one of the things that they do in System Pro is, you can put in a query and it'll leverage the graph to not only do, you know, GraphRAG of getting what you want, but it will actually... it was probably actually the first application of GraphRAG, actually.

Right? Probably. We'll have to have him on and ask him about that. But what it does is it actually summarizes different pieces of the research itself and gives you a summary. Like, maybe you're trying to look up a particular scientific fact, and it'll say, oh, the ranges are from here to here on this particular piece that you're looking for, and it will sort of group and summarize the different aspects of the research around whatever topic you're looking into. But I'm super excited that they've expanded into climate change, because it's such an important topic.

And especially for us being in AI, the carbon footprint of compute is something that I think we all need to be mindful about. And so I just wanted to, you know, my tool of the month in this case is the topic of climate change and how we impact that, and also a way that you can start to research it. And always a big shout out to Adam Bly and the folks at System.

Jason Koo: Nice. My tool of the month is kind of a shameless plug for Neo4j. So this month, we released the Cypher Co-pilot. So if you log into the dashboard, there's this little star icon on the right, and it's great because, before, I would have ChatGPT or Claude on the side and be asking it Cypher-related questions.

Now you can just ask directly within the dashboard, the browser tool. And there's a great writeup, which I'll throw a link to, which goes through all of the strengths and weaknesses of the current version. Right? Because the current version is using OpenAI and then giving some prompts, but it's not a fine-tuned model.

Right. So that has some issues that come with it. And the article goes into lots of examples of, like, this is what the copilot is very good at, and then these are areas where the copilot could definitely use some work. So that was a great article. But yeah, in my own testing of it, certain things it just did really well.

And then one thing I like to do in Cypher is check the schema of whatever graph I'm looking at. So this is CALL db.schema.visualization, right? So I was trying to imagine being someone who didn't know what that call was, and trying to describe, like, how do I get this thing, this data model, the schema? And unfortunately, I couldn't get the copilot, when I was using it, to give me that call, right?

It got very close. It gave me a CALL schema, which is not a function, but it was in the right direction. So, yeah, that's my tool of the month. And so, okay, normally we would go into kind of a long list of events and other articles that have come out in this last month.
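For reference, Neo4j's built-in procedure for inspecting a graph's schema is `db.schema.visualization`, invoked from Cypher as `CALL db.schema.visualization()`. A minimal sketch of running it from Python follows, assuming the official `neo4j` driver package (version 5.8 or later, for `execute_query`) is installed. The function name `fetch_schema` and the connection details are placeholders invented for this example.

```python
# The Cypher call that returns a graph's schema as nodes and relationships.
SCHEMA_QUERY = "CALL db.schema.visualization()"


def fetch_schema(uri, user, password):
    """Return (node labels, relationship types) reported by the schema call.

    fetch_schema is a hypothetical helper; uri, user, and password are
    placeholders for your own database connection details.
    """
    from neo4j import GraphDatabase  # requires: pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _, _ = driver.execute_query(SCHEMA_QUERY)

    labels, rel_types = set(), set()
    for record in records:
        # The procedure yields two columns: "nodes" and "relationships".
        for node in record["nodes"]:
            labels.update(node.labels)
        for rel in record["relationships"]:
            rel_types.add(rel.type)
    return sorted(labels), sorted(rel_types)


print(SCHEMA_QUERY)
```

Calling `fetch_schema("neo4j://localhost:7687", "neo4j", "your-password")` against a running database would return the labels and relationship types as two sorted lists. In Neo4j Browser, pasting the query on its own draws the schema as a small graph.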

We will post those links in the description. But instead of going through each one, I'd just like to go around the table again and have everyone mention what events they're going to be going to soon. So if people want to come out and join us at a conference or at some sort of meetup, they'll know where we're going this month.

Ashleigh, did you want to go first?

Ashleigh Faith: Sure. So I think I have some smaller things that I'm going to be doing leading up to the NODES conference, but I'm at the NODES conference, and I'm going to be talking about trustworthiness, how you can use a graph to do fact verification for your LLMs.

And then I also am going to be at Connected Data London, and I think I will physically be there. So if anybody wants to meet me in person, I'm probably going to be there in person. So, we'll see. Pretty excited about it.

Jason Koo: ABK, I think you should go next since you are based in London.

Andreas Kollegger: And I'm hoping to go to Connected Data London, so maybe I'll see you there, that would be super cool. And before that, my next event this coming month, much smaller than Connected Data London, but I'll be at a meetup in London, honoring the glorious Taylor Swift. We're going to have a Swift-based GenAI meetup.

It's going to be fantastic. And to be honest, it's also an opportunity for me to learn about Taylor Swift. Apologies to the Swifties. I'm not actually a Swiftie, but there's something there, so I've got to learn about it. And this will be the occasion for me to learn about Taylor and maybe for Taylor fans to learn about graphs.

Thanks.

Jason Koo: Alison?

Alison Cossette: The good news for me, and maybe not everybody else, is that I'm actually not going anywhere this month, which is a major event of the last two months. I think I've been gone like six out of eight weeks or something crazy like that. So this month for me is going to be pretty quiet.

Coming up in the fall, we'll be at the open source conference in Bangalore in October. I'll be at the AI conference in September. Obviously NODES is coming up. So lots of things on the forward horizon. So if anybody wants to catch up with me and has been meaning to connect with me via LinkedIn, now is an excellent time.

Jason Koo: Nice. Just like Alison, I'm planning to be in San Diego for most of this month. I also just got back from a lot of traveling, so I'm hoping to reconnect with everyone that's here in Southern California. But also, in September, I'll be joining some events up there. We're going to be doing another hack night with Weaviate, and then in October, I'll be joining Alison for the event that we're doing out there.

Why am I drawing a blank on what we're... It's not PyCon, it's, no, it's...

Alison Cossette: The open source... I think it's the open source data conference. It's the open source conference in Bangalore, October 23rd and 24th.

Jason Koo: Nice. Awesome. Yeah. We'll put a link to that. Cool. And, yeah, NODES is happening in November, virtually.

And so all of us will be there, and we're looking forward to seeing everyone who's listening. Hopefully you will join us for NODES when it comes around. Okay. So that is our episode today. Everything we talked about, we'll put links in the description. And we'll see everyone on the next one.

Andreas Kollegger: Thank you, Ashleigh.

Alison Cossette: Such a great special guest star. I feel like, you know, we've reached a level in our podcast. Now we're like, we actually came with us. We're so lucky for all you do for the community. It was great spending time with you at the Knowledge Graph Conference. And it's just really lovely to get to spend some time with you and see inside the brain.

I love it.

Ashleigh Faith: Oh, goodness. Yeah. I mean, I, I try. That's all I could say is I try. All right. Well, you're very humble, but thank you very much. Thank you.

Jason Koo: Yeah. Big thanks.

Ashleigh Faith: Well, thank you for inviting me. This is fun.

Jason Koo: Yeah. Thank you again.