GraphStuff.FM: The Neo4j Graph Database Developer Podcast

Net Zero Decarbonization with Henry Bruce and Mike Napper from ExpectAI

Episode Summary

We are looking forward to chatting with Henry and Mike today. Henry Bruce (CPO at ExpectAI) is a product leader experienced in creating and scaling enterprise business intelligence platforms with impact. Mike Napper (CTO at ExpectAI) switched to focus on climate change solutions in 2015 after 22 years building and leading financial markets trading and analytics systems for large global banks.

Episode Transcription

Jennifer Reif: Welcome back graph enthusiasts to GraphStuff.FM, a podcast all about graphs and graph related technologies. And since this is the May episode, May the 4th be with you. Star Wars has been near and dear to my heart for years, and I always love a geeky holiday to celebrate. I'm your host, Jennifer Reif, and I'm joined today by fellow advocate, ABK.

Andreas Kollegger: Hello.

Jennifer Reif: And we are looking forward to chatting with Henry and Mike today. So Henry Bruce, who is CPO at ExpectAI, is a product leader experienced in creating and scaling enterprise business intelligence platforms with impact. Before joining ExpectAI, he led the product development team of MIX Market, an institutional and market data service for investors in banking and agriculture globally.

Prior to MIX, he spent over a decade building trading and risk management platforms for multinational investment and commercial bank clients. So quite diverse there, I would say. And then Mike Napper is CTO at ExpectAI, after 22 years building and leading financial markets trading and analytics systems for large global banks like J.P. Morgan.

Michael switched in 2015 to focus on climate change solutions. He has a deep background in data and analytics on the tech side, and in finance and environmental impact broadly. As well as serving as ExpectAI's CTO, Michael is also an operating partner at an early-stage climate solutions venture capital fund, Satgana. All right. So with that, let's dive right in and talk about our topics today.

So could you both tell us a little bit about ExpectAI's mission and vision and kind of the back story there?

Henry Bruce: Thanks, Jennifer. Really great to be here. Looking forward to sharing a bit about what we're doing and the exciting technologies we have that are enabling that.

So at ExpectAI, we're on a mission to reduce 500 megatons of CO2 equivalent, that's carbon dioxide equivalent emissions, by the year 2030. That's super ambitious. And we are doing that by enabling organizations, companies, to take action that is profitable and will also reduce their carbon footprint.

And data is a huge part of that, whether that's public or private data, and how we organize and think about connecting companies with the right information, making decisions based on that, and enabling them to then take that action with providers in the market, and with banks to finance it. That is what we're all about.

Andreas Kollegger: Michael, do you have some thoughts about that as well? Like, any other thoughts to add?

Michael Napper: Yeah, delighted to be here. This is Mike. And data is everywhere. There's just so much of it, and it's increasing at a ferocious rate, both structured and unstructured. And our mission is to make sense of that, to connect it, to use it to generate insights.

And it's something I've been doing for a long time, but previously I was doing it on the financial markets side. There, it is all about trading more profitably, and here it's all about, you know, saving the world, but profitably. And I think that's our mission: how we guide and inform and provide insights to businesses so that they can take action.

Andreas Kollegger: That sounds fascinating to me. I love this mission, but I'll have to admit, even as you spoke through it, it's not immediately obvious to me how those things connect, how you get from the data to, okay, reducing emissions. I can understand that somewhere there are maybe some insights you can realize, like, hey, you're causing a lot of carbon emissions at this part of your business and this part of your process.

So I guess maybe that's the data to insight connection. So I'm curious if that's true, but then I'm also curious about the next part that you mentioned, that you then somehow have these insights that actually can help profitability. Or is it cost savings? What are those two linkages, actually?

Henry Bruce: Yeah, great question. First up, the challenge we see around companies in the UK, where we are at the moment, and around the world, is that there is this big capacity and capability gap. There's huge awareness that still needs to be built. Education needs to be deployed across organizations and companies to enable them to change their decision making, which fundamentally is going to have to consider emissions and carbon as one of the criteria.

And traditionally, you know, finance and operations have been the limit of the criteria. And that's going to change. And that is changing already. The largest companies, the big multinationals, have the brand and the consumer pressure, and they have the resources to dedicate time, people, and money to understanding the dynamic: where those emitting activities are, and what's material to them in the business that they're operating and what's not. Then they can start to think about what's the strategy, how can I plan some action, what's going to be involved, how do I manage my team to change their behaviors, maybe their operations, day to day, week to week, but also, much more significantly, how do I plan my investment?

What financing do I need to change the dynamic and the makeup of the assets that I own as an organization? If you're a wholesaler, you have some vehicles to move things around. If you are a manufacturer, you have machines and tools that build your product, and then you're using again, probably vehicles to get it to your customers.

And all of those things are the artifacts of a world where emissions weren't important. And so, you know, companies are having to change their operating model, and some are ahead, some are behind. And so that capacity and capability, particularly in these midsize companies, the kind of longer tail of companies around the world, is missing.

And our hope, our expectation, is that providing them with the information to make decisions about where that emitting activity is and what actions they can take, and connecting them with providers who can help them take that action, is how we will get the vast majority of companies around the world much closer to actually taking action.

And then this is going to be an incremental process, of course, but that capacity and capability is really what we're focused on. That's where the problem is. And we feel like there's a huge opportunity to build that, and to use technology, particularly data organization and some of the generative AI technologies which, you know, you've been talking about on the podcast recently, to make that accessible and to be able to incorporate that in decisions.

Michael Napper: And just to sort of build on what Henry said there, companies might have a nagging feeling that they could or should be doing something. But they don't know where to start, they don't know who to talk to, they don't know that there are probably 50 things that they could do over the next four or five years that, to your other question, would actually save them money, right?

And that complexity depends on many variables. It's where you are, the electricity tariff you're on, what fuel costs are doing month to month, how many miles you drive, just looking at the movement use cases. But where would you start if you were running a 50 person or a hundred person or a 500 person business?

If you're running a large multinational, you might hire McKinsey or one of the big consultants. Yeah? And, you know, there's a very large number of companies who don't have access to that resourcing. So we're trying to just educate and inform and then, crucially, provide a pathway to action, you know, whether that's introducing them to service providers or working with our financial partners to bring financing, because even though something may save you money... And I went through this myself, personally, when I switched to electric vehicles and when I switched my heating from a gas boiler to a heat pump. That was seven years ago. For both of those things, I built myself a spreadsheet model, because I'm an analytics guy.

That's what I do, right? I build a spreadsheet model. A very simple one, just a five-year total cost of ownership. And what you typically might see, and this is not all solutions, it's just some, is that you need an upfront investment: the green stuff requires a capital expenditure, and then it's cheaper to run.

So my electric cars cost nothing to run, basically. I mean, the fuel cost is 80 percent less than a petrol or a diesel vehicle, and the maintenance is almost zero. There's no, you know, there's no oil to change and so on and so forth, but it's a more expensive vehicle at the outset.
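As a rough illustration of the five-year total cost of ownership comparison Michael describes, here is a minimal Python sketch. The `Vehicle` class, the function name, and every price and running cost are hypothetical placeholders; only the "fuel cost is 80 percent less" relationship comes from the conversation. It ignores financing, discounting, and resale value.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    name: str
    purchase_price: float      # upfront capital expenditure
    annual_fuel_cost: float    # fuel or electricity per year
    annual_maintenance: float  # servicing per year


def total_cost_of_ownership(vehicle: Vehicle, years: int = 5) -> float:
    """Upfront cost plus running costs over the period (no discounting)."""
    running = (vehicle.annual_fuel_cost + vehicle.annual_maintenance) * years
    return vehicle.purchase_price + running


# Hypothetical figures, chosen only to show the shape of the trade-off.
petrol = Vehicle("petrol", purchase_price=25_000,
                 annual_fuel_cost=2_000, annual_maintenance=500)
# Fuel cost 80 percent lower than petrol, as in the example above;
# the maintenance figure is an assumption.
electric = Vehicle("electric", purchase_price=32_000,
                   annual_fuel_cost=petrol.annual_fuel_cost * 0.2,
                   annual_maintenance=50)

for v in (petrol, electric):
    print(f"{v.name}: {total_cost_of_ownership(v):,.0f}")
```

With these made-up inputs the electric vehicle costs more on day one but less over five years. Changing the tariff, mileage, or time horizon can flip the answer, which is the point about context made in the conversation.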

So, you know, that's just one very micro example that I lived in a personal capacity, but if you multiply that by 100 possible things that you could do across 5 million companies, or whatever it is in the UK, and then you look at that globally, those data points are what are going to enable people to understand what they could do.

And it will be very personalized. It will be what makes sense in the context of your business and your bottom line, your financial situation. And that's what we're about.

Henry Bruce: If I can share an anecdote, I was in a taxi yesterday going across London and it was an EV. And I asked the driver, who was an Uber driver, how he found charging his EV.

And we kind of progressed the conversation on to: would he buy an EV again? Would he buy a combustion engine vehicle? And to the context that's required to make that decision. There's currently a waiver for the congestion charge in London for electric vehicles, which is substantial. But the current legislation is set to run out, or not be renewed, or lapse next year.

And so that's a factor for him. His current schedule allows him to go home to charge, so he doesn't have to wait for a charger in the city, which are usually occupied by the black cabs. All of these different criteria, current, future, potential, the context that's required for him to make the decision about the total cost of ownership, exactly the same as Michael's, of running an EV versus a combustion engine vehicle for his business as a taxi driver in London, is extensive.

And this is where there's a huge opportunity to bring data together. And part of why we're here talking to you at Neo is about the ability of graph technology to bring that context to bear on a particular decision or a particular point in time.

Andreas Kollegger: I love the way that you're framing that right there. The points you're both bringing up, and thinking back all the way to, Michael, your spreadsheet: for you, with any given problem, you probably reach for a spreadsheet as your first tool. I don't know, it could be breakfast tomorrow. You're going to create a spreadsheet for it first. Possibly, right?

You know, but not everyone is going to be like that. And even on a business level, businesses have those same kinds of behaviors. And for a topic like this, they might have some notion that they should do something, but they don't know what to do. The impacts are invisible to them. The costs are invisible to them, because they just don't know.

And so is that part of your insight here, that you're making this visible to them so that they can make the right decisions and actually see where the return on investment is, if there's an initial investment to be made, and that in the long term, in five years or whatever the time period is, that's actually going to be beneficial to their business?

Is that the overall idea?

Henry Bruce: I think that's the starting point. Yeah, certainly, to provide them with that visibility. And what's exciting is that there's this spectrum of actions that spans behavioral, kind of operational things that can be put into practice today, tomorrow, next week, you know, next month, and then there are these much more substantial actions on infrastructure and assets that require financial planning and building permits and all these other things.

So there are different levels of context and requirements to put these into practice. So there is a spectrum, and it's exciting to work along some of those nearer term things, but also the more substantial things, which typically are where, you know, there is more financial investment required. And more parties need to come together and say, we're going to change an entire fleet of logistics vehicles from combustion engines to EVs. Or we're going to use hydrogen HGVs, which are barely running on the road yet, but seem to be a viable option for heavier loads. And maybe that's something, you know, that's planned for in the longer term.

Michael Napper: Something I'll just also kind of touch on, and I think this will be music to graph-aligned ears, is that all of this is connected and interlocked and interwoven.

So, for example, we have a very large data set of companies, right? There are supply chain relationships between them. There are regulatory pressures that are increasing. There are consumer pressure levers, you know, that are starting to come into effect, and pressure is starting to be exerted up and down the supply chain.

And people are demanding sustainability data from their suppliers: what are you going to do? What's your action plan for reducing your footprint environmentally, broadly speaking, not just carbon, but water, you know, air pollution, et cetera? There's also a network of financial relationships. So a big lever is asset managers, who invest our pensions, our savings in companies. They're exerting pressure on boards. And commercial banks, who run accounts and loans for thousands and thousands of businesses, are under pressure to enable those businesses to reduce their emissions, because that counts as part of the bank's carbon footprint. And every now and again, this flares up into a headline article in the press. There's a web, and it's intrinsically a graph if you think about it that way, of relationships between companies, between regulations and where they apply, between suppliers and customers, between financiers and their customers.

So, a graph structure feels very natural for that. But, that also relates to the point that all of these things are interconnected. So climate change is a system level problem. So you need to look at all the details, but you also need to approach it from a systems thinking perspective as well.
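To make the "web of relationships" concrete, here is a small Python sketch of how companies, financiers, and regulations might be held as typed edges and traversed. It is a toy in-memory structure, not ExpectAI's data model or a Neo4j API; every company name and relationship type (`SUPPLIES`, `FINANCES`, `INVESTS_IN`, `APPLIES_TO`) is invented for illustration.

```python
from collections import defaultdict

# Each edge is (source, relationship type, target). All names are made up.
edges = [
    ("Acme Wholesale", "SUPPLIES", "BigRetail plc"),
    ("Parts Co", "SUPPLIES", "Acme Wholesale"),
    ("High Street Bank", "FINANCES", "Acme Wholesale"),
    ("Pension Fund A", "INVESTS_IN", "BigRetail plc"),
    ("Emissions Reporting Rule", "APPLIES_TO", "BigRetail plc"),
]

# Index incoming edges so we can ask "who or what points at this company?"
incoming = defaultdict(list)
for source, rel, target in edges:
    incoming[target].append((rel, source))


def upstream_suppliers(company: str) -> set[str]:
    """Follow SUPPLIES edges backwards to collect direct and indirect suppliers."""
    found: set[str] = set()
    stack = [company]
    while stack:
        current = stack.pop()
        for rel, source in incoming[current]:
            if rel == "SUPPLIES" and source not in found:
                found.add(source)
                stack.append(source)
    return found


print(sorted(upstream_suppliers("BigRetail plc")))
```

The same walk over a different relationship type would answer a different question, for example which financiers are exposed to a given company. That is the kind of pressure "up and down the supply chain" described above.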

Jennifer Reif: I really like the approach, too, that you're looking at this from, you know: yes, this is a big matter, and yes, things need to change. But in order for there to be change, it has to feel reachable and feasible to people, both cost wise and action wise, and long term, you know, in the bigger systems changes, as you were just mentioning.

And I think that's one big thing, especially for, you know, individuals and smaller companies. Even some of the larger companies, maybe your really big ones, see this as, well, this is us giving back, you know, to the planet, to the future and so on. It's an investment that's going to reap rewards eventually down the line, even though right now they may not be seeing that.

But for smaller companies and for individuals, that may not always be financially or time wise feasible. I mean, their business may not survive through that. And so to give them data and ways that they can start making incremental changes now that are very feasible, very reachable, very cost effective, I think is really going to help get everybody kind of on board and thinking, okay, I can do this, or this company can make these shifts over time.

This is something that now we can put on the plan because now it feels more reachable. It's not just something that big companies or countries are doing overall. It's something we can all participate in.

Henry Bruce: Yeah, absolutely right, Jennifer. And I think maybe here I'd add that we've talked about actions, and it's worth putting this in the context of how we're going to solve the climate crisis, and, broadly, what these actions are that companies can take to reduce their emissions.

This is insetting: this is the absolute reduction of emitting activities. The other, opposite and, you know, potentially more popularly understood part of the reduction pathways is offsets. Offsets have been in the press a lot, and there's been a lot of controversy about the quality of offsets.

Offsets mean we continue to emit, but we'll support some activities that take carbon dioxide out of the atmosphere and achieve some balance. We're very much focused on insetting, actually reducing emitting activities, because there are not enough trees to plant to offset all of the emitting activities humans currently undertake.

So, to solve the climate crisis, we are going to need to do lots of insetting, and that's going to require alignment with the incentives of companies and economies as they exist today. And so the profitable piece is really important. And thinking back to the example a little bit, and some of what Michael was talking about with the spreadsheet, there is that element of: it's not just about today.

It's about: if you don't take that action, what is the cost of that to you? To the taxi driver, it can seem like it makes sense, you know, based on his experience today, to not buy a replacement EV if he had to tomorrow. But you need to take into account all of those future things that are going to impact the total cost of ownership for him of running his car: the increase in the cost of fuel, the reduction in the price of electricity, a potential extension of the pass for EVs in the central London congestion zone.

All of these things, and how we expect them to change over time, and making that accessible and available in that moment to allow our users across the organization to plan better, is going to be really important.

Andreas Kollegger: I'm now curious, given all this. In some ways, this is a classic graph scenario, where you guys have had the insight, like, oh my gosh, all these things are connected. If we knew all these things, we could share this with companies, we could make a case for them for how they can improve their business and their footprint.

I'm curious on two levels. First, how you guys went from concept to, like, you've got to start somewhere with the data and gathering the insights and building the case. How did that go? And I guess this is maybe a long two parter, but the second one is: in practice, with an engagement with a business in your target audience, I guess I'll say, you know, a business of a certain size, where do you start with them? And is that mirrored in the data that you've collected as well?

Michael Napper: Should I jump in on the data piece?

So my first graph database was Neo4j, which I first met in September, I think it was, or maybe it was October. And I forget the exact name of the book, but I think it was by Dr. Barrera and somebody else. Well, maybe he wrote the foreword, but anyway, it was Graph Databases, written by, you know, people who are from Neo4j, and I was like, wow, okay, this is different.

I, you know, I spent many, many years doing a lot of both relational databases and, time series databases. Financial market data is a very particular beast that comes at fairly ferocious bit rates. And it's quite large, but it's quite uniform in its structure because it's just, you know, market data. But so the graph was a fairly novel concept to me, but I just, you know, was immediately impressed.

As I'm sure many, many people have been. And also it was super easy to get started. I mean, you know, kudos to whoever writes your onboarding documentation. Everything is there that you need to get started. I mean, it was really a very good onboarding experience, and I just grabbed a whole load of public data.

Lots of it, and filled a graph, and it sounds quite straightforward. It wasn't. I'm sure there were a few evenings in November where it didn't feel that straightforward, and there were some late nights. But actually, you know, one of the things I love is that you have a first version of your schema, and I know it's sort of not really a schema, but your data model, your data semantics, I guess you would use as the proper term.

And then you learn and you can iterate on it quite organically. Yeah. It's very different from the traditional: here's your logical data model, here's your physical data model, and, oh dear, no, I need to change this table or whatever.

Jennifer Reif: Yeah.

Michael Napper: So I personally, I was learning the product, as well, at the same time as learning graphs in general, at the same time as discovering the data sources and trying to put it all together.

And I suspect I'm not the first one to tread that path. And it was a very good experience. And then I forget your exact question, but, you know, how did we... We've got, you know, data points on basically 5 million companies in the UK, and we've got about 3 years of historical data on them, and then what we've done is we've layered on our own data that's derived from that, which is our own secret sauce, you know, proprietary models, to generate the insights, yeah?

And all of that fits quite naturally into a graph structure. So, going back to Andreas's question, did I answer it? I think it was a two parter.

Andreas Kollegger: You did. So that was the first part, where you started with the graph. It sounds like you kind of took an incremental approach, if I would say, like, you know, you just put some data in and then take a look at it.

Does it work? Does it answer the questions you want? And then you kind of just kept iterating. Is that fair?

Michael Napper: I think that's exactly fair. And I'd just call out again the quality of the documentation and all the YouTube videos and all the rest of it. You know, it didn't feel like a heavy lift.

Actually getting the data, and understanding it and making sense of it was the bigger piece of the work. And then, as you say, it took me a while to get my head around the difference between CREATE and MERGE. I mean, dumb stuff like that, right? You know, it's like, okay, read the manual properly.

Okay. Ah, okay. Right. Yeah, I get it. You know, you get your nodes and your relationships and it's quite organic. Right? That's how it felt. Somebody else made this point, you know, in a recent pod in this series: it's kind of how your brain sort of thinks.

It's much more intuitive than a relational structure with, you know, second normal form and all the rest of it, which is what I've been doing for a long time. So that was a good change. And I think, what was the second part?
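For readers who hit the same CREATE versus MERGE confusion Michael mentions: in Cypher, `CREATE` always adds a new node, while `MERGE` first looks for a node matching the pattern and only creates one if nothing matches. The Python sketch below models just that difference with a toy in-memory class. `TinyGraph` is a hypothetical stand-in, not the Neo4j driver, and it covers nodes only.

```python
class TinyGraph:
    """A toy in-memory stand-in for a graph store; not the Neo4j driver."""

    def __init__(self):
        self.nodes = []  # each node is a (label, properties) pair

    def create(self, label, **props):
        # Like Cypher: CREATE (:Company {name: 'Acme'})
        # Always adds a new node, even if an identical one already exists.
        node = (label, dict(props))
        self.nodes.append(node)
        return node

    def merge(self, label, **props):
        # Like Cypher: MERGE (:Company {name: 'Acme'})
        # Reuses a node that matches the label and all given properties,
        # and only creates one when no match is found.
        for node in self.nodes:
            node_label, node_props = node
            if node_label == label and all(
                node_props.get(key) == value for key, value in props.items()
            ):
                return node
        return self.create(label, **props)


created = TinyGraph()
created.create("Company", name="Acme")
created.create("Company", name="Acme")  # second, duplicate node

merged = TinyGraph()
merged.merge("Company", name="Acme")
merged.merge("Company", name="Acme")    # finds the existing node

print(len(created.nodes), len(merged.nodes))
```

Running the same load twice with `create` doubles the data, whereas `merge` leaves it unchanged. That is why MERGE is the usual choice when re-importing public datasets.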

Henry Bruce: Can I add something, Michael?

Michael Napper: Yeah, go ahead.

Henry Bruce: The tooling as well. Like, if you asked Mike or me to now go back to traditional relational database tooling, we would balk at the idea. We'd strongly disagree, strongly. The graph visualization of the graph database...

I mean, we use it all the time to establish, you know, what data is where, and those connections and relationships, because that's such an important part of the data set that we have. So Workspace preview and the Bloom presentations of the nodes, you know, that's been so critical. I don't think either of us would anticipate going back to Postgres, you know, a desktop app or whatever.

Andreas Kollegger: It does have that quality about it. Like, once you start to have any kind of, you know, initial successes with it, it really just draws you in. And it becomes almost kind of fun to work with data, if that's possible. You know, after more than 10 years of doing graph stuff now, I still just have a fun time working with graphs.

Henry Bruce: We had our board meeting yesterday, a quarterly board meeting, and I brought up a query in Workspace preview with the Bloom presentation. And it's so accessible for everyone to see, oh, this is how they're related, and you expand, you know, a node and you see the others appear.

And it's this kind of a magical presentation, but it's so easy and accessible to understand for all users, technical and non technical.

Andreas Kollegger: All right, so that almost leads into the second part of my question. Now that you've built up this internal database loaded with historical data about companies in the UK and elsewhere, you know, whatever the range is, and all the different data sources, then, similar to presenting it to the board, is that part of the bridge when you approach businesses and say, hey, here's our view of the world, we'd love to show you where you fit in? Is that the lead-in to the conversation of, you know, here's what your options are for trying to improve your footprint? What does that engagement look like?

Henry Bruce: Yeah, absolutely right. So, by starting from public data, we're giving companies a huge head start. To come back to the capacity and capability piece, a lot of the work at the moment is very emissions focused: go and source all the data from across your organization, regardless of its materiality or relevance to calculating your emissions, and we can give you the perfect number, and congratulations, it's, you know, 2.54 kilotons a year of CO2 equivalent. Great. So you know this accurate number, but what should I do with that?

We are taking the approach of working back from action. Yes, emissions are important, but reducing those, and understanding what is material to the next best recommended action, is where companies want to start: what does the return on investment, carbon versus financial, look like, and how can they work with that to match their business priorities and their willingness to invest their own money or work with banks to ultimately reduce their emissions?

Jennifer Reif: That goal is great, but, you know, you can't flip a switch and make that goal today. So what can you do in the interim to start working towards that?

Henry Bruce: Exactly. The multiple tracks always.

And that helps us identify then what are the data points that are really important to understanding it's this aspect of your business, this operation, this asset that is going to be critical. And you should, if you can do something about it today, great. If you can't, let's start planning to do something about that in three years or five years time, because this is going to be really significant to you reducing your footprint.

And hopefully, saving money along the way. A lot of these assets, as they are replaced or added, are profitable, you know, with the financing piece connected.

Michael Napper: We haven't really dived into the GenAI piece particularly yet, but I think, to the point of how we engage with a company, it's the old adage: meet the customer either where they are or where they want to meet you.

Right. So we've given a lot of thought to this, and we're on a GenAI journey, as well as sitting on top of the bedrock of a graph, a knowledge graph. So AI plays in different ways. It helps us source unstructured data and turn it into something that's tractable in a structured graph environment, you know, whether that's text or PDFs or whatever.

It also allows us to, and this is one of the key kind of differentiators that we, you know, we focus on is we provide a natural language interface, a conversational interface to the user base, right? And that is very powerful. You can just have a chat with the data, right? We also recognize that there are some types of data that render better as a chart or as a table, so we have that integrated.

So we have a multimodal conversational interface that sits on top of the graph, if that makes sense. I don't know if that's easy to describe in audio, but, you know, you can imagine, for anyone who's used ChatGPT or something like that, you ask it a question and it gives you an answer.

In our case, sometimes that answer will just be a textual answer with perhaps some numbers in it. Or you've asked about what regulations are relevant to your company. You've asked, what should you do? Why should you do it? You've asked, you know, why should I bother with any of this? Very existential questions, or very tactical questions.

Like, what is SBTi? It's the Science Based Targets initiative, and it will explain. And we've got a library of curated content from which we will answer your qualitative questions and your quantitative questions. And if it's a very quantitative answer, like here's a roadmap for five years or 10 years worth of carbon action, we'll put a chart in there where it makes sense.

So interleaved in the conversational interface is what we would call graphical elements, traditional UI elements. We don't believe that's the end of the story. We would expect, perhaps on the enterprise client side, to potentially provide access via an API to the graph, you know? You know, if you've got a good data structure, you can deploy it in different ways.

So for the person running a 40 person company, we are your chief sustainability officer in your pocket, right? Basically. Yeah?

Henry Bruce: On a more technical note, Michael, cause you were leading the development of the agent Peter, and Lance was on the podcast a few episodes ago from LangChain, and we were using the GenAI stack initially.

It might be interesting to share a bit of your experience on the development of that agent, which is LangChain based and sits behind the conversational UI.

Michael Napper: Absolutely. Is that something that's of interest to your listeners?

I was going to say viewers.

Andreas Kollegger: For sure.

Michael Napper: Stopped myself. Yeah. So we absolutely have deployed an agent. And it has, you know, a carefully constructed, fairly long now, prompt telling it what it should do, and we have made available to it a number of tools, and we follow the LangChain terminology here, which is probably familiar to at least the two of you, if not everyone in the audience.

So a tool is essentially a piece of code that can do something, right? And the agent is told, well, here are 10 different tools, this is what each one is for, right? Pick the one that is appropriate to what the user has just asked. And if it's a qualitative question about regulations, yeah, we might go and do, you know, a vector search on our carefully curated content, and, you know, the large language model would pick that tool and we go and call it. We go and take the results and package them back up and give them back to the user. And this all happens in, like, typically between one and two seconds. So we're trying to be fairly quick.

And, or it might be a quantitative tool. So what's my carbon reduction plan? And then it would, you know, generate a bunch of numbers. And this is where, you know, LangChain allows us to, you know, iterate pretty quickly. And again, they have excellent onboarding documentation as well. And, full disclosure, I've never written Python before December.

So, you know.
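For readers following along in code, the tool-picking pattern described above can be sketched roughly as follows. This is a minimal illustration, not ExpectAI's implementation: the tool names, the keyword lists, and the `pick_tool` function are all hypothetical, and the keyword overlap is only a stand-in for the choice that the large language model makes from the tool descriptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str            # what the agent is told the tool is for
    keywords: tuple[str, ...]   # used only by the stand-in selector below
    run: Callable[[str], str]


def search_curated_content(question: str) -> str:
    # Placeholder for a vector search over curated content.
    return f"[qualitative] passages relevant to: {question}"


def build_reduction_plan(question: str) -> str:
    # Placeholder for a quantitative calculation.
    return f"[quantitative] reduction plan figures for: {question}"


# Hypothetical tool registry; a real agent would have more entries.
TOOLS = [
    Tool("content_search",
         "Answer qualitative questions, for example about regulations.",
         ("regulation", "regulations", "policy", "law"),
         search_curated_content),
    Tool("reduction_plan",
         "Produce numbers for a carbon reduction plan.",
         ("plan", "reduction", "emissions", "target"),
         build_reduction_plan),
]


def pick_tool(question: str, tools: list[Tool]) -> Tool:
    """Stand-in for the LLM's choice: score each tool by keyword overlap."""
    words = set(question.lower().replace("?", " ").split())
    return max(tools, key=lambda t: len(words & set(t.keywords)))


def answer(question: str) -> str:
    # Pick a tool, call it, and hand the packaged result back to the user.
    tool = pick_tool(question, TOOLS)
    return tool.run(question)
```

Calling `answer("What's my carbon reduction plan?")` routes to the quantitative tool, while a question about regulations routes to the content search.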

Jennifer Reif: I can jump onto that bandwagon as well, because I hadn't really looked at it until probably late last year, early this year as well. So. It's new for me too.

Michael Napper: Absolutely. And the fact that LangChain and Neo seem to have such a relationship is obviously very helpful for us. And I think the stuff is accelerating so fast, it can get a little bit dizzying, just the pace of change.

And we always, I mean, I'm a technologist. I was, you know, I was coding as a kid. I wrote Assembly code where I was, I mean, just for fun when I was a kid, it was a long time ago. And the thing we always have to, you know, focus on is what is the commercial and impact relevance of what we're deploying.

Yeah. So that, the technology is evolving so fast and so powerfully. And we're making a bet, effectively that, you know. There are things that we can't do now that we'd love to do, that there's a pretty good chance in six months or 12 months, you know, OpenAI or LangChain or Neo4j, I mean, the fact that you've got vector databases natively in the graph was like, that's pretty cool.

You know, vector search index. So, these technologies are evolving so fast. So our challenge is just to keep up with the pace of change, but always to harness it to the mission and to things that we're private, you know, we're a for-profit company. So we also have to serve customers. And deliver commercial value to everyone, including ourselves.

Andreas Kollegger: Which sometimes as a developer can be a challenge, right? When you're deep down in the code and first you're just, you know, building things, maybe you're fixing things, and it's easy to just keep building because you're now just almost having fun, right? You're just kind of exploring the space, expanding what's possible, but keeping in the back of your mind, the mission that you're trying to achieve, who the user is and what they're trying to do.

Is it..as a developer, sometimes it's a challenge. How do you guys balance that?

Michael Napper: It's, we like to say to ourselves fairly regularly, this is not a research project. Okay? And if for a moment we forget that, our chief exec, Anand, would remind us. But, yeah, I mean, my core focus for my second career since these last 10 years has been solving the climate crisis. So that's my north star.

So I don't find it difficult to manage that tension. And also, you know, back in my previous career, I ran, you know, very, very large, very expensive, very high-throughput, pretty scary scale algorithmic trading systems.

You know, so you very quickly focus on how do you harness technology to a commercial aim? That's second nature to me. It needs to work, right? It needs to deliver business value, right? When those things go wrong, it gets very ugly very quickly.

Andreas Kollegger: You have internalized both of those missions at once in your own head.

Like you're, yeah, they're both top of mind at all times.

Henry Bruce: I think, Andreas, applying that question to the data side and the kind of knowledge side, it does get harder, talking about creating conversational interfaces using Large Language Models that are, kind of, agent managed, that don't necessarily do what you tell them to do.

And that's by design and also, you know, kind of frustrating when you're typing in all caps, you know, "PLEASE ABSOLUTELY DO THIS THING. ALWAYS. ABSOLUTELY." You know, three exclamation marks.

Jennifer Reif: And then it doesn't.

Henry Bruce: And it doesn't listen. It really is, and the data side as well. How much is enough data? How accurate does it need to be? How complete, how are you putting the number in context?

You're trying to quantify those accuracy and completeness metrics for a company to make a decision based on those, versus a bank. You know, there are different levels of expectation around what the number means and represents. And how you present that calculation, and the explainability and context behind the number, is much more of a gray scale.

And I think the way that I try to think about it is...we get in front of our customers and our users and say, how, what else do you want to know? And, and that's also one of the interesting things about the conversation. You know, conversational UI, as an interface, is that you actually can see, then, from the conversation, what, where are they going next, in a way that isn't...is much harder in a traditional, visual presentation. You don't, you're explicitly saying where the user can and can't go next, and maybe have some support or help or chatbots or something to kind of ask, you know, questions to, but that's not part of the experience.

Whereas, and we'll come to that in kind of the tools of the month, we're really excited by some of the conversational UI as a means actually to understand the user behavior, much more, kind of, intuitively and not give them a sort of a challenge, but not give them, or constrain them to you must do this next.

So if you don't want to do that, what do you do? And that's a great feedback mechanism for us as we've been developing this.

Andreas Kollegger: It's a brilliant observation. Like, that sort of anticipatory design has been a UI goal forever. But always hard to do. And it's a great observation. Like actually, now that we're in the middle of a context of chatting, like you can see the history, you know a little bit about the user. You're better able to think what might be, you know, coming next.

And of course, you know, LLMs are designed to anticipate what's coming next.

Henry Bruce: Yeah.

Andreas Kollegger: In that idea of the feedback. Have you incorporated that into the loop as well? So that do the users actively kind of self curate as things are coming back? It's like, actually, this was not helpful or it was helpful, and you keep a record of that?

Henry Bruce: So we're using LangSmith, which you can talk a bit more about, to collect all of this conversational history and the use of the tools, as Michael was saying. I think this interface development is relatively novel. So we're excited to be on this journey, and the question is what is the right balance of guiding, versus supporting, versus responding to requests for guidance, versus trying to kind of push guidance.

And there's definitely some trial and error and seeing, kind of, what works in what context or, as to whether we are more, kind of dynamic or we want to try and take users on an arc, or there's a kind of a narrative. This broader narrative we talked about from education and awareness through to taking action, and we have this arc, this narrative that we want to take users on through the platform and demonstrate all of our capabilities and get them to taking action.

And when is it appropriate to, to kind of, take them to the next stage? Or when do they want to kind of explore and explain a bit more where they are at the moment? If they're kind of looking at a particular initiative and aren't quite sure that it's ready for them to take that action yet. Maybe they need a bit more confidence or kind of context or what are the considerations?

So we're winding our way through some of those decisions, and I think it is going to be very interesting, to see what works and what doesn't. And, we're all learning at the same time, which is really exciting.

Michael Napper: Two things, just to layer on top of that. So, right from the outset, we, I put in an evaluator, which, for the moment, is assessing helpfulness.

So. You get the agent to answer the user's question, and asynchronously, you then feed that whole conversation chain and the answer to the LLM and ask it to evaluate it. And we persist that in LangSmith. So that just gives a score from zero to one. The helpfulness, you could imagine expanding that. And it does actually work.
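The asynchronous evaluator described here can be sketched as below. It is a sketch under stated assumptions, not the actual setup: `agent_answer` and `score_helpfulness` are hypothetical placeholders for the agent call and the LLM grading call, and the in-memory `EVALUATIONS` list stands in for persisting scores to a trace store such as LangSmith.

```python
import asyncio

EVALUATIONS: list[dict] = []  # stand-in for a trace store such as LangSmith


async def agent_answer(question: str) -> str:
    # Placeholder for the real agent call.
    return f"Here is some guidance on {question.rstrip('?').lower()}."


async def score_helpfulness(conversation: list[str], answer: str) -> float:
    """Stand-in for asking an LLM to grade the answer; returns a score in [0, 1]."""
    question_words = set(conversation[-1].lower().rstrip("?").split())
    answer_words = set(answer.lower().rstrip(".").split())
    if not question_words:
        return 0.0
    return len(question_words & answer_words) / len(question_words)


async def evaluate_and_persist(conversation: list[str], answer: str) -> None:
    score = await score_helpfulness(conversation, answer)
    EVALUATIONS.append(
        {"question": conversation[-1], "answer": answer, "helpfulness": score}
    )


async def handle_turn(conversation: list[str]) -> tuple[str, asyncio.Task]:
    answer = await agent_answer(conversation[-1])
    # Evaluation is scheduled in the background so the user is not kept waiting.
    task = asyncio.create_task(evaluate_and_persist(conversation, answer))
    return answer, task


async def main() -> None:
    answer, task = await handle_turn(["Reducing office energy use?"])
    print(answer)
    await task  # awaited here only so the demo finishes with the score recorded
    print(EVALUATIONS[-1]["helpfulness"])
```

Running `asyncio.run(main())` prints the answer first and the helpfulness score afterwards, which mirrors the order in which the user and the evaluation store would see them.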

It's like, you know, that is, you know. That is a treasure trove of information to help us be more helpful, basically. And the other thing, is that, this is what Henry was talking about, this cognitive journey is the sort of the theory that we're trying to put into practice.

So, we've got the, we asked the LLM to work out, effectively, some suggested next questions that the user might want to ask. And this, again, as Henry said, this is a winding trail, and we're just learning as we go. So this may look very different next month. But you know, one of my fears has always been, you know, how quickly do people type and how patient will they be for typing?

And if they don't know what they could ask, how will they ever learn? Right? We don't publish a user guide. Right? The idea is that the whole thing teaches you how to use it. So, we suggest four, typically, suggested next questions. You just asked about X. So, have you thought about asking three other questions in the same neighborhood?

And, we actually, all they have to do is click on the question, like, you know, and it just runs it. And here's another question, which, if you're ready to take that next step in that journey, this question opens the door to the next room in the house, if you like. Right?
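The "three questions in the same neighborhood plus one that opens the next door" idea can be sketched like this. In the conversation above the suggestions come from the LLM; the fixed `QUESTION_BANK`, the `JOURNEY` stage names, and the function `suggest_next_questions` are hypothetical stand-ins used only to show the shape of the output.

```python
# Hypothetical journey stages, in the order a user would move through them.
JOURNEY = ["measurement", "planning", "action"]

# Hypothetical question bank; in practice an LLM would generate these.
QUESTION_BANK = {
    "measurement": [
        "What are my biggest sources of emissions?",
        "How do I estimate emissions from electricity?",
        "How accurate is my carbon footprint?",
        "What data do I need to collect?",
    ],
    "planning": [
        "What's my carbon reduction plan?",
        "Which initiatives cut the most emissions?",
        "How much would each initiative cost?",
        "What target should I set?",
    ],
    "action": [
        "How do I start my first initiative?",
        "Who can help me implement this?",
        "How do I track progress?",
        "How do I report my results?",
    ],
}


def suggest_next_questions(stage: str, just_asked: str, nearby: int = 3) -> list[str]:
    """Return up to `nearby` questions from the current stage, plus one
    question from the following stage when there is one."""
    suggestions = [q for q in QUESTION_BANK[stage] if q != just_asked][:nearby]
    position = JOURNEY.index(stage)
    if position + 1 < len(JOURNEY):
        next_stage = JOURNEY[position + 1]
        suggestions.append(QUESTION_BANK[next_stage][0])
    return suggestions
```

Each returned string could be rendered as a clickable suggestion that simply submits that question, so the user never has to type it.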

Andreas Kollegger: It is. Yeah. Choose your own adventure at this point, right?

Michael Napper: Correct. But, but equally, you can just ignore all that and just type, right? Yeah. So I wouldn't say we've got all that figured out yet, but that's the approach, and we can only...And we've got this very diverse set of users and something that, you know, we're looking at is, you have a diversity of technical, you know, computer knowledge. Like, what's ChatGPT? never heard of it.

Andreas Kollegger: Right.

Michael Napper: What, right? So people are perhaps used to using a spreadsheet. Or, you know, maybe the limit of their tech knowledge is how to send an email. So there's a spectrum, you know. There's some experts, there's some people who are less-experienced. But equally on the climate axis, you've got a very broad, very diverse set of experience and knowledge level. So we, we always have to try and intuit where the user is on those various axes and meet them where they are and deliver something valuable to them. And that's not trivial. But, it's interesting. And it's very powerful.

Andreas Kollegger: You guys are incorporating so many kind of next-level, advanced AI patterns already. To your point, everyone's talking about GenAI, you know, this year, certainly, I think in all industries, and yet in practice, there's still very few people who are doing things.

Everyone would like to, but nobody's necessarily taking the time to. And you guys are pretty much pushing the edge with, it sounds to me, like all the different elements you're really combining into this. With the agentic approach, like the multiple tools within an agent, and then all of the feedback mechanisms you've got, I think this sounds really, pretty marvelous.

So, kudos to you for what you've done. And learned Python in the meantime, and learned [inaudible] for the first time. Not a bad first job.

Michael Napper: And how many exclamation marks I've gotten? I'm getting very good at writing all caps and exclamation marks in the system prompt. You know, if in doubt, add another capital letter, you know. That's, you know, and the other thing is that there's just so many different language models now. Like, again, you could just get lost in this, and again, it could become a research project.

So we have to keep bringing it back. And there's always trade offs as well, like GPT-4's underlying model is more powerful, and it's great for writing Cypher. But it's slower, and it's more expensive, right? And you, you know, you look at what's going on with Meta and the open source models. I mean, it's just like, how do you even keep up with the news, let alone harness that into something that's production grade?

So it's, yeah, it's never dull.

Andreas Kollegger: I have to say, though. So this is the kind of conversation I love because this is really interesting on basically every level. Like it's interesting as a business proposition, certainly on the social impact, the social good that's coming out of, your mission is amazing. And then obviously we can geek out on the tech stack.

I have a bunch of follow up questions on pretty much everything you've mentioned. Maybe we'll have to invite you guys back, maybe in a month, actually, if you're saying you don't know what a month out would look like. Maybe we'll have to come back around and be like, actually the stuff we said last month, totally wrong.

Useless. We've got a new approach. That'd be a fun follow up. But, we've gotten, I think, you know, close to our time for this part of the segment. It's been marvelous having you, Michael and Henry. Thank you both for all that you've been sharing with us.

Henry Bruce: Thank you for having us.

Michael Napper: Well, thank you for the opportunity.

Andreas Kollegger: Do you guys, we're going to head into the tools of the month. Is that right, Jennifer? Is that up next?

Jennifer Reif: Yeah, that's good with me. So I'll just go ahead and toss mine out there really quickly. With the podcast editing that I've actually been doing on GraphStuff, for a little bit now, I've been using Descript and this was something that the previous person doing the editing was working with as well.

And I've just kind of started really enjoying it. It does really well. You drop a video or audio file in. It will kind of do an automated first-guess transcript for you. And then, as you make edits inside the transcript, it will edit the video for you if you so choose. But you can also make transcript edits, you know, customized, you can remove filler words and things like that.

And it just does a really great job as, as just like a first swath, you know, quickly get me to from point A to point B kind of thing. So really nice features there. They have, I think, a nice free tier as well as some subscription levels too. So, that's just been kind of fun playing around with that and getting to know that tool.

It does a pretty good job. So

Andreas Kollegger: Shout out to Descript. I'm going to stay, you know, onto the GenAI track here. And, we have our own tool. Lots of things going on actually with Neo4j, but the one I want to call out that has just been out for a couple of weeks is the LLM Graph Builder. Our guests today, Michael and Henry, have been hand crafting a graph, which is an incredible undertaking.

But it's intimidating, I think, for people who are just kind of like taking the first steps. This LLM Graph Builder is a nice kind of first step. If you're interested in these kinds of topics, you'd like to build something, but you're not quite sure where to start and how to kind of shape a graph. This LLM Graph Builder lets you take some text files, PDF files, whatever you might have. Even, you can reach out to YouTube videos. It'll allow you to call out to Wikipedia, if you want to, and just put a graph together. And then it'll show you the graph that was assembled.

It'll do some interesting things with like, with the text, breaking the text apart. But also doing what's called named entity recognition to pull out the people, places, and things, and creating a graph around the unstructured data. And then, within the same interface, you get to have a small chat with whatever graph, you know, was a result of that.
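The general shape of that pipeline (split the text into chunks, pull out entities, connect chunks to the entities they mention) can be sketched as below. This is not how the LLM Graph Builder is implemented: the `KNOWN_ENTITIES` lookup is a hypothetical stand-in for real named entity recognition, and the `MENTIONS` relationship name is an assumption chosen for illustration.

```python
import re

# Hypothetical lookup table standing in for a real NER model or LLM.
KNOWN_ENTITIES = {
    "Neo4j": "Organization",
    "ISO": "Organization",
    "Cypher": "Language",
    "GQL": "Language",
}


def split_into_chunks(text: str, max_sentences: int = 2) -> list[str]:
    """Break text into chunks of at most `max_sentences` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def extract_entities(chunk: str) -> set[str]:
    """Return the known entity names that appear as whole words in the chunk."""
    return {
        name for name in KNOWN_ENTITIES
        if re.search(r"\b" + re.escape(name) + r"\b", chunk)
    }


def build_graph(text: str) -> tuple[dict[str, str], list[tuple[str, str, str]]]:
    """Return (nodes, edges): nodes map an id to a label, and edges are
    (chunk_id, "MENTIONS", entity_name) triples."""
    nodes: dict[str, str] = {}
    edges: list[tuple[str, str, str]] = []
    for index, chunk in enumerate(split_into_chunks(text)):
        chunk_id = f"chunk-{index}"
        nodes[chunk_id] = "Chunk"
        for entity in sorted(extract_entities(chunk)):
            nodes[entity] = KNOWN_ENTITIES[entity]
            edges.append((chunk_id, "MENTIONS", entity))
    return nodes, edges
```

The resulting nodes and edges could then be written to a graph database, where questions such as "which chunks mention both Cypher and GQL?" become simple traversals.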

It's kind of general purpose and with most general purpose tools, like, you get a lovely out of the box experience. It's a good place to start to understand what's possible and how to think about what a graph might look like. And if you're just getting started, again, I think this is an awesome place to start.

Just Google for LLM Graph Builder. We'll include the link as well, I guess, in the show notes. It's a great new tool. It's available. Henry and Michael, you guys have got a tool, as well, you want to bring up?

Henry Bruce: Yeah, no, just on that, Andreas, the kind of the stacks of the, these tools that are coming out of the Neo team are super useful, and we've used many of them ourselves already. And they have helped us accelerate our journey so, so much. So, keep up the great work on that. Really second that. That's a fantastic kit.

I mentioned LangSmith earlier. We, you had Lance, talking about it briefly. We've been using it in earnest, and I've talked about ways that we're using it to understand the interaction with our conversational UI and the possibility with these evaluators.

It's just coming out of its, kind of, beta phase. They launched their pricing, I think a few weeks ago, not to promo LangSmith on their behalf. But, we found it really, really, really useful and invaluable in trying to understand kind of what our agent is doing and what value our users are getting or not.

So a big shout out to the, to the LangSmith product from the LangChain team there.

Michael Napper: I had two tools of the month, and that was one of them. I spent a lot of time on LangSmith and, observability is everything with large, complex systems. Yeah? Whether it's for debugging in dev or, you know, testing in QA or what just happened in production.

I mean, it's everything, right? You're blind without it. And it does really help us. The other one, which is, not particularly work related, only indirectly. I guess it is, it is related, and I'll explain why. I've switched, probably really the last six months, to Obsidian for my personal knowledge management.

And I will say, I keep my to do lists in there. And it's a great tool, if anyone doesn't know it. I don't own shares, but you know, I highly recommend it. It's pretty good. And it's a very rich, framework of, you know, community plugins, as well, which means it can do all sorts of things. 

But the reason I made that switch away from, you know, a traditional SaaS to-do list was the key feature, which is what appeals to many people: basically the data set is just a really big set of markdown files, which all live on my Linux laptop and on my phone by replication, and I back them up to GitHub and stuff.

So I am most of the way through the process of transferring nine years worth of climate learnings, hundreds of podcast notes, podcast transcripts, Kindle highlights, all sorts of stuff from, you know, Google Docs or wherever they are. I'm not doing it by hand. There's some scripting going on there, right? So it's a side project. But basically, once all of that data, which is textual and some structured, but mostly textual, guess what I'm going to do with it?

I'm going to write a semantic RAG search on it. Right? I'm just going to spin it.

Andreas Kollegger: Fantastic.

Michael Napper: Yeah, I'll probably spin up an Aura instance, AuraDB instance for myself. Put it in there and then it's, you know. I already use LLM-type tools, and, I forget what it's called...Tavily. Actually, it's now a product. I started playing with it when it was just some GitHub code, to run research. Okay? So, and I sometimes use that with my venture capital hat on as well. So tell me about all the companies that are active in this nascent industry. And it just goes off and searches and then gives you a nice kind of research report.

I want to start doing that. So it's all about having your data in one place. In a place where it's, it's got the relationships and Obsidian is kind of a graph. It has a graph view because it's, it's all about backlinks between this note relates to that note, this one relates to that one. But even just having everything in a bunch of markdown files in a folder structure means you can start running RAG searches on it.

And off you go, right? Suddenly, what was that thing? I'm scratching my head for those on audio. You know, where did I read about that thing about the other thing? And suddenly, it's your second brain becomes that much more powerful. Yeah? So that's currently my, I'm just doing that as a background job, migrating all the data.

And then as you can imagine, spinning up a simple RAG tool on that for my own personal use would not be too hard. Right? And then, you know, some of that content will probably find its way, you know, if licensing and, you know, data ownership restrictions are not in the way, we can find its way into our curated content.

And, you know, we're about guiding people and how to navigate, how to start, and then follow their climate journey. So, you know, it's all of knowledge is dispersed at the moment. Text and structured and unstructured and all the rest of it. So, ways to grapple with that and make it useful and informative are very, very key.
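The retrieval half of a personal RAG tool over a folder of markdown notes, as described above, can be sketched with the standard library alone. This is a simplified illustration: word-count cosine similarity stands in for embedding-based vector search, and the function names and example note contents are hypothetical. The retrieved notes would then be passed to a language model as context.

```python
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def load_notes(folder: Path) -> dict[str, str]:
    """Read every markdown file under `folder`, keyed by relative path."""
    return {
        path.relative_to(folder).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(folder.rglob("*.md"))
    }


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(notes: dict[str, str], query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank notes by similarity to the query; notes with no overlap are dropped."""
    query_vector = Counter(tokenize(query))
    scored = [
        (name, cosine(query_vector, Counter(tokenize(body))))
        for name, body in notes.items()
    ]
    scored = [item for item in scored if item[1] > 0]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

For example, `search(load_notes(Path("vault")), "heat pumps emissions")` would return the best-matching note paths from a hypothetical `vault` folder, ready to be stuffed into a prompt.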

Andreas Kollegger: That is brilliant. Great stuff.

Jennifer Reif: All right. Well, we really appreciate, both you, Henry and Michael, for joining us today and kind of walking through, kind of, the data side of things as well as the business side of things, at ExpectAI. And we hope to be able to chat with you again sometime soon and continue the conversation.

Henry Bruce: We'd love to. Yeah. Thank you so much for having us.

Michael Napper: Absolutely. Love to speak to you.

Andreas Kollegger: Great. So then, Jenn, what's our next segment up then? We've gotten through the tools and now we're into, I guess, articles of the month and videos, right?

Jennifer Reif: Yep. We'll just kind of toss out a couple of highlights and, Henry, Michael, feel free to stay on or drop off if you'd like. But, we'll just kind of drop into Neo4j related things, if there's any highlights we want to focus on.

Henry Bruce: Thanks. Thank you.

Michael Napper: Lovely to meet you both. Thanks for your time. Until next time.

Jennifer Reif: Cheers.

Henry Bruce: Bye.

Jennifer Reif: Okay. So, is there anything, Andreas, you particularly want to highlight this month? I'll be sure to list everything in show notes for folks, so you can get a rundown of everything that's going on.

Andreas Kollegger: I have to say that every time that I look through the articles and I've been trying to keep up, as always happens, like it is such a fire hose of amazing content that picking out the favorites is always a little bit difficult for me.

Jennifer Reif: Yes.

Andreas Kollegger: And of course, I tend to fixate on the GenAI stuff and most of it is GenAI stuff. So that doesn't actually help narrow down the problem.

Jennifer Reif: Right?

Andreas Kollegger: So I'll just say like, I appreciate all the articles that have been coming out and all of the authors of them have been doing amazing work. And we'll just include them all in the show notes.

I think the highlights are: read each of these articles when you've got some time, and add them to your reading list. If you want to learn even more about some of the topics that were touched on today, there are knowledge graph construction articles, which are amazing. Where to start is really hard to figure out.

Some of these articles are good for that. And then what does it actually mean to like do things like advanced RAG, like beyond just, you start with some vector search, you've got text, you do the vector search, that's a good first step. By day two, you realize, okay, that's got some limitations. What do you do next?

You'll eventually find your way down to the path that Henry and Michael were on. We're going to have an agent based approach, which means lots of different strategies for accessing information. Each of these blogs touches on different aspects of that problem. So I don't have a particular order that you should go through them in, but they're all like, you know, worth taking a look at.

And if you're on this journey right now, you'll find really good information here.

Jennifer Reif: Yeah. And I would, kind of segueing on from that too. We've kind of been doing a highlight of our NODES 2023 playlist on our YouTube channel. However, it was just released that the CFP for NODES 2024 is now open and it will close June 15th.

So if you are interested in either 1. attending the event, so you can register now to save your spot or 2. submitting your graph learning journey or story or, business case or, or anything along those lines. The more technical, the better for us. But please do, head out to, NODES 2024.

I'll put the link in the show notes for the site and the local place to find all the details. But definitely submit something, register so that you've saved your spot for this coming November.

Andreas Kollegger: That's a great call out, Jenn. And like, there's, again, like in this theme of like, there's so many things going on, all of them are like high value.

We haven't even touched on the ISO GQL announcement, you know, this historical milestone that, you know, incidentally kind of passed by. First there was SQL some 40 years ago, which became obviously the dominant way to query databases, structured query language, right? And there have been other database standards, query language standards, since then, but they haven't come from ISO, so they haven't been International Standards Organization blessed standards.

But now there's ISO GQL, which is not GraphQL, but Graph Query Language, which is a query language that is now an international, you know, standard that is based, of course, on Cypher, all inspired by Cypher. A huge collaborative effort. You know, we were participating in that, but along with lots of other partner organizations, a lot of amazing people got together and built this language.

And it's now public. It was very quietly released by ISO because they don't have a great PR department, or maybe any PR department. You'll start to see lots of articles about this. Keep an eye out for that. And I think in another week or two, we're actually gonna have a little Q&A chat with a couple people from Neo4j who were involved in the process from the beginning, from proposing the standard and then following through to actually having the standard, you know, published.

It's an amazing journey from...Cypher was just some offhanded idea that Neo4j had some number of years ago has gotten to this point now where it's an international standard. It's an amazing kind of a journey and an accomplishment for Neo4j to pat ourselves on the back a little bit. But it is unbelievable to me that this has happened.

Jennifer Reif: But, by the same token, too, I think that bringing other vendors into this space and other experts that are not just Neo4j experts, but just kind of general technologists and other, you know, graphs and graph query languages that might exist out there. Bringing this all together is, is super helpful for developers because it sets a code of standards.

You know, if Neo4j went off and did something crazy and rebellious, you know, in three years, that would affect and change the direction of a graph query language, whereas setting a standard for this ensures that, you know, no one person, no one vendor could make drastic shifts.

It's, you know, kind of setting a standard for this is how developers should approach this and setting this consistency that's going to help developers long term as well as maintainability and learning and all of that, down the line too. So, really amazing for us, but as well as for developer, technologist, vendor populations everywhere.

Andreas Kollegger: Absolutely. A great new chapter in the story of graphs, right?

Jennifer Reif: Yeah.

Andreas Kollegger: So that's something to look forward to, both in blog posts and videos coming up soon for, for events that are coming up. What's the highlight you've, you've noticed here?

Jennifer Reif: So there's lots of hands-on labs going on, I've noticed. So if you're wanting to get, you know, really hands on, play with the technology to actually do something and accomplish something with it, but in a very incremental and guided way, definitely check some of those out.

They range all over the country, I've noticed. Or all over the world, excuse me. There's plenty of various countries around the globe. Lots of meetups, a few webinars. So if you're, you know, not local or it's difficult to get to different places, there are some virtual options out there too, so be sure to check those out across the spectrum of big data conferences, GenAI meetups and hands-on labs, some various technologies that are all kind of coming together.

So yeah. Check those out. Quite a few events happening throughout the course of May. We're jumping into, you know, kind of pre summer stuff. So it's like, get all the events in before everybody, you know, heads off on vacations. But, yeah, some really great things. So, be sure to check those out and I'll be sure to link everything in the show notes as well.

Andreas Kollegger: And, as you mentioned at the top of the show, Jennifer, of course, coming up in May as well, of all the dates, the most important one, May the 4th be with you.

Jennifer Reif: And also to you. So hopefully we'll have some nerdy geeky things coming out for that day. So be sure to check those out, too, or keep posted and feel free to post your own. If you have some Neo4j graph related things that coincide, with Star Wars and May the 4th. 

Andreas Kollegger: *singing Darth Vader theme*

Jennifer Reif: By all means, post those, and we will be happy to promote those and, repost those and, kind of carry that conversation and have a little fun. So

Andreas Kollegger: Awesome. Good stuff, Jenn.

Hey, great hour spent with you and Michael and Henry. This was a really nice conversation and episode.

Jennifer Reif: Yes. I enjoyed it and look forward to chatting with you again, hopefully next episode.

Andreas Kollegger: Cool.

Jennifer Reif: All right. Bye everyone.