Sepana CEO Daniel J. Keyes
18 April 2023
Summary
On today’s episode, I’m joined by Sepana co-founder and CEO Daniel J. Keyes. Sepana is a decentralized search infrastructure provider. Developers can create Sepana engines, which are able to index data on any decentralized protocol to create search APIs, or feed data into other Sepana engines. The community has built engines enabling users to search Mirror articles on L1, Optimism, and Arweave, Lens posts on Polygon PoS, and Farcaster casts on Farcaster's Hub network. Over time, Sepana aims to decentralize the operation of its backend software.
At the top of the show, Daniel and I discuss GPT and related Large Language Model tooling, and what impact they are having on his company’s work building new software products. We turn to how Sepana works, who it’s useful for, and how it’s being used in the market today. At the end of the conversation, we discuss how Sepana’s searchable Web3 data APIs may provide verifiable data sources to LLMs, in a world of bot content.
It was great talking with Daniel about Sepana. I hope you enjoy the show.
As always, this show is provided for entertainment and education purposes only and does not constitute financial advice or any form of endorsement or suggestion. Crypto is risky and you alone are responsible for doing your research and making your own decisions.
Links
https://docs.sepana.io/sepana-search-api
Transcript
Nicholas: Welcome to Web3 Galaxy Brain. My name is Nicholas. Each week, I sit down with some of the brightest people building Web3 to talk about what they're working on right now. On today's episode, I'm joined by Sepana co-founder and CEO, Daniel J. Keyes. Sepana is a decentralized search infrastructure provider. Developers can create Sepana engines, which are able to index data on any decentralized protocol to create search APIs or feed data into other Sepana engines. The community has built engines enabling users to search Mirror articles on L1, Optimism, and Arweave, Lens posts on Polygon PoS, and Farcaster casts on Farcaster's Hub network. Over time, Sepana aims to decentralize the operation of its backend software too. At the top of the show, Daniel and I discussed GPT and related large language model tooling and what impact they're having on his company's work building new software products. We turned to how Sepana works, who it's useful for, and how it's being used in the market today. At the end of the show, we discussed how Sepana's searchable Web3 data APIs may provide verifiable data sources to LLMs in a world of bot content. It was great talking with Daniel about Sepana. I hope you enjoy the show. As always, this show is provided for entertainment and education purposes only and does not constitute financial advice or any form of endorsement or suggestion. Crypto is risky and you alone are responsible for doing your research and making your own decisions. Hey Daniel, how's it going? How are you today?
Daniel J. Keyes: I'm good. I'm doing good. I'm trying to balance all the hectic nature of the world that we find ourselves in, but I'm good.
Nicholas: Which part of the hectic world is on your mind today? Is it the AI, or is it the end of Western dominance, or microplastics, maybe?
Daniel J. Keyes: I think it's all connected. It sounds like it could be. It's very, very scary.
Nicholas: It feels... I mean, the last hundred years were pretty wild too. I talk to my grandmother sometimes, she's in her nineties, and I think about what it was like in the twenties and thirties. So different, but this certainly does seem like an inflection point. Have you seen Auto-GPT?
Daniel J. Keyes: Is that the self-healing one?
Nicholas: Oh, I don't know. You're going to have to tell me about that one. Auto-GPT is the one where it can run sequential commands and feed the output of the API calls it's making, and criticism of those outputs, into its subsequent calls.
Daniel J. Keyes: I've seen that. It's like seeing your own brain kind of run itself on a screen. It's super scary.
Nicholas: What's the self-healing stuff?
Daniel J. Keyes: Very similar, but basically just printing out... I think it was a GitHub repo called Wolverine, which is a beautiful name. It basically just outputted the code errors and then fed that back into a GPT prompt.
Nicholas: So a code-writing-style GPT.
Daniel J. Keyes: Yeah. Code writing that then produces wrong code, that then fixes itself automatically until you get the output that you're supposed to.
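For illustration, here is a minimal sketch of the Wolverine-style loop Daniel describes: run a script, capture the traceback, and feed it back to a model until the script exits cleanly. The `call_llm` helper is a placeholder for whatever model API you use; it is not part of Wolverine or any specific library, and this is only a sketch of the idea, not that project's code.

```python
import subprocess
import sys
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Placeholder: swap in whatever model API you actually use."""
    raise NotImplementedError

def self_heal(script: str, max_attempts: int = 5) -> bool:
    """Run a script; on failure, ask the model to rewrite it using the traceback."""
    path = Path(script)
    for _ in range(max_attempts):
        result = subprocess.run([sys.executable, script], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # script ran cleanly
        prompt = (
            "This Python script failed.\n\n"
            f"Source:\n{path.read_text()}\n\n"
            f"Traceback:\n{result.stderr}\n\n"
            "Return the full corrected script and nothing else."
        )
        path.write_text(call_llm(prompt))  # overwrite with the model's fix and retry
    return False
```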
Nicholas: Very cool. And it really seems to work. It seems to output something usable.
Daniel J. Keyes: Yeah. And I think even more interesting than that is... have you seen this paper about GPT's ability to correct itself by just prompting itself, asking whether it had succeeded in answering the prompt's requirements correctly? It increases accuracy by 20-plus percent if, after the response, all you do is say, did you cover all the requirements? And then, just by having that introspection, it almost has this emergent property of learning. And it's super interesting because it seems like this is coming from the complexity of the system and from nowhere in particular. But it's also interesting because it changes the competitive landscape, because now much smaller models, or models that are trained only on the outputs of other GPT models, can heal quickly to become just as competitive or just as good. So it's this kind of race to the bottom, so to speak.
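A rough sketch of the self-check Daniel mentions: answer once, ask the model whether its answer covered every requirement, and revise if not. It reuses the same hypothetical `call_llm` placeholder as the sketch above; the accuracy figure he cites comes from the paper, not from this snippet.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError

def answer_with_reflection(task: str) -> str:
    """Draft an answer, ask the model to critique it, and revise once if needed."""
    draft = call_llm(task)
    critique = call_llm(
        f"Task:\n{task}\n\nAnswer:\n{draft}\n\n"
        "Did this answer cover all of the task's requirements? "
        "Reply YES, or list what is missing."
    )
    if critique.strip().upper().startswith("YES"):
        return draft
    # One revision pass driven by the model's own critique.
    return call_llm(
        f"Task:\n{task}\n\nPrevious answer:\n{draft}\n\n"
        f"Missing points:\n{critique}\n\nWrite an improved answer."
    )
```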
Nicholas: Yeah. That's interesting. I was looking at something. Have you seen Vicuna? No? It's like a variation on Alpaca, these tuned LLaMA models. There's a great thing: if you go to vicuna.lmsys.org, they are using GPT-4 to rank the quality of their output as compared to 3.5. They basically run the same prompt through raw LLaMA, Alpaca, Vicuna, Bard, and GPT-3.5, and then have GPT-4 assess the quality, describe the differences, and rank each out of 10 to tune themselves, which I guess a lot of people are doing now. But pretty cool. It makes me wonder if the solution... I mean, will they try to ban this use of the API to exfiltrate the latent knowledge and intelligence in GPT-4? Obviously you could run GPT-4 on the requests and say, well, do these requests look like someone trying to train a competitive model or not, and let's kill their API key or something.
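A sketch of the pairwise judging setup Nicholas describes, where a stronger model compares two candidate answers to the same prompt and scores each out of 10. The prompt wording and the `call_llm` placeholder are made up for illustration; this is not the actual Vicuna evaluation harness.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for the judge model call (e.g. a GPT-4-class model)."""
    raise NotImplementedError

def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask a judge model to describe differences and score two answers out of 10."""
    return call_llm(
        "You are grading two assistants on the same prompt.\n\n"
        f"Prompt:\n{prompt}\n\n"
        f"Assistant A:\n{answer_a}\n\n"
        f"Assistant B:\n{answer_b}\n\n"
        "Describe the differences, then give each assistant a score out of 10."
    )
```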
Daniel J. Keyes: Yeah. I think the competition in this kind of post-GPT world is really interesting to look at, because not only are so many of these models open source, I think one of the most amazing things about ChatGPT was that there's no real secret to what they did. They just did it a little bit better than everyone else who was trying to do it. And so basically all of the methodologies and practices that they used are well-known. And so even when they succeed at doing something, I think it's really just a matter of weeks or months until the next large training run is capable of doing that as well. So I think it's going to be really, really hard to create long-term moats in this world. I think we're just going to end up seeing this foundational power go to the biggest players, and then they'll continue to win and compete on things that they've already been winning at. So Microsoft can embed it into their entire suite and then use the proprietary data that they have. And the same is true for Salesforce and Google and every other big player. But I think it's going to be very hard for smaller companies to compete in this world, because everything becomes a feature that can be built in orders of magnitude less time than before. So it's definitely a new competitive landscape that I think we find ourselves in.
Nicholas: Yeah, I've had this idea for a while that the real metaverse land is, as it always was, just attention. The same way the web was attention: having the domain name, or the app icon that people have on their phone, or the API people are familiar with, or the OS affordances that people are used to, or being the bank identity platform, et cetera. Those incumbents just pick up the new technologies. Or being an influencer: if you already have an audience on YouTube and you're now supercharged with, I don't know, DALL-E 2 or whatever, you don't lose your audience from one day to the next. But it maybe does give some advantage to people who think differently than those organizations. I mean, the innovator's dilemma has always been a case where it's not that the larger organization doesn't have the capacity. Even right now we're looking at it: Bard sucks compared to GPT-4, and even compared to GPT-3.5 it's sort of not as good, and they totally innovator's-dilemma'd themselves into not shipping an LLM years ago, right?
Daniel J. Keyes: Yeah, I think that's definitely true. They definitely fumbled the ball there, but I think what's important is what will happen now that they realize it. And I think that what's perhaps different about this time around is that they already have everything they need in order to build it. So it's just a matter of, can they pivot DeepMind into this kind of work? And I think that because these things will become so foundational, perhaps that head start doesn't matter as much as we think it does, right? They still have billions of users and perhaps, you know, the greatest data set of all time. And so it might matter for the first leg of the race, but this is many laps around. And so I still think they're going to continue to be a big name. And it's not like, you know, we've seen the end of Google by any means.
Nicholas: Totally. I mean, look at IBM or Microsoft or all these other companies that have stuck around for a long time. I think about, I don't know exactly what point he was trying to make, but a graph from Peter Thiel's Zero to One. In the first half of the book, he shows the revenue generation potential of LinkedIn or PayPal or some large company, and that the majority of the revenue they generate comes, you know, 15 years after it's not cool anymore; that's when they're actually making the money. And I think that's probably to some extent still true for Google and things like that. I'm not going to stop using Gmail tomorrow just because GPT exists. By that logic, let's say there is some Auto-GPT that could ostensibly create an equivalent service to Gmail, but without all the cruft of years of development. Is it going to convince me to switch? I'm not sure. It's going to be pretty uphill for a startup to use that GPT power to directly replace an incumbent that I'm more familiar with, but to build some new service that I've never used, maybe it'll just be purely helpful to startup founders.
Daniel J. Keyes: Yeah, I think it's even more sinister than that, because what GPT allows the incumbents to do is become even better. Before, a small change in user behavior could carve you out a part of a business. I'm thinking of Superhuman, for instance: a better UI brought them tens of millions of users. But now the power-up that the incumbents get from GPT is unparalleled. It'll make even the bad experiences so much better.
Nicholas: But you're saying that there's maybe a chance that anything could be replicated by the people who have the cheapest access to the models and the most talent to build derivative tooling, even software development tooling, so they can just outpace any new player.
Daniel J. Keyes: Like, I think at its base, GPT is a reasoning engine, and it's going to get much, much better at reasoning. But even in the versions that we have now, that with a very large corpus of data is already enough to build a 10X, you know, a hundred X improvement to Gmail. And so the players that already have that name brand and that data, I think, will be able to utilize the benefits of GPT better than anyone else.
Nicholas: If their organizational lethargy doesn't get in the way.
Daniel J. Keyes: Yeah. Well, there's still going to be a ton of businesses that go under because of this, because those businesses are now just one query in a big.
Nicholas: I think it's interesting though, because we were being told for years that no one would be able to compete with Google on machine learning because they had all the data, or Facebook because they had all the data, but OpenAI came out of nowhere, essentially. I mean, they had a lot of money, but nowhere near as much money for R&D as those companies did, and certainly not the amount of data that they have. My impression is that they got all the GPT data from essentially scraping the web. Maybe some deals with people, but, you know, not deals with Google, not deals with Facebook.
Daniel J. Keyes: Yeah. I think, you know, they're also like a seven-year overnight success. They've been working at this for a while and they've built up, you know, hundreds of the world's best AI researchers. But yeah, most of the data online is public, right? You have these kinds of internet archives. I think one of the key things OpenAI was able to do... the problem is really a filtering problem, because most of the internet is information that you don't want to bring into your model. You want to be able to deduplicate pages and data and take out spam and just trash websites. And being able to do that at scale is really, really tough, right? You can't even load that much into memory to then analyze it, to then make a decision on it. It has to be other machine learning models that can run on that petabyte scale of data and make those decisions and then eventually bring you a dataset that's trainable. But you know, I think one of the reasons it took so long for these things, you know, we've had the algorithms behind this for quite some time, but in the beginning it was just so expensive to run. And so to spend tens of millions of dollars on a certain parameter change in an experiment, to run something where you have no idea what's going to come out on the other side, is really hard to do and hard to get, I think, the okay for, which is why it's been dominated by just a few players. But now that the cat's out of the bag, I think it becomes really interesting and dangerous, because you just have to think that every large organization and every nation state actor is going to... it's a tiny percentage of the defense budget for countries, and they're just going to throw everything they can at this. So it's a nuclear arms race. Yeah, absolutely.
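Daniel's point about filtering being the hard part can be made concrete with even the crudest version of it: exact-duplicate removal by content hash. Real training pipelines use fuzzier techniques like MinHash at a vastly larger scale, so treat this only as an illustration of the idea.

```python
import hashlib

def dedupe_pages(pages: list[dict]) -> list[dict]:
    """Drop exact-duplicate page bodies before they reach a training set."""
    seen: set[str] = set()
    unique = []
    for page in pages:
        # Normalize whitespace and case so trivially reformatted copies collapse together.
        body = " ".join(page.get("text", "").split()).lower()
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique

pages = [
    {"url": "a.example", "text": "Hello   world"},
    {"url": "b.example", "text": "hello world"},  # duplicate after normalization
]
print(len(dedupe_pages(pages)))  # 1
```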
Nicholas: I want to dive into Sepana, but first I just want to say hey to wijuwiju, founder of the Interface app. How's it going?
wijuwiju: Hello. Thank you.
Nicholas: How are you, wijuwiju? Long time no speak.
wijuwiju: Awesome, thanks. Yeah, likewise, nice to hear you guys. Also very interesting. I actually just wanted to ask, regarding when you mentioned the cost of switching away from Gmail, that you wouldn't switch: this is something we're actually looking into. Even going back to Microsoft Office, we're looking at DigitalOcean and stuff like that, because it would make sense to go all in. You get Skype, you get notes, you get everything, and then everything is synchronized, and the models, well, the AI, can have access to all of that. So it just makes your flow so much better.
Nicholas: So you're thinking about switching to Microsoft?
wijuwiju: Yeah, this is something I never would have thought of, but then, yeah, I mean, why not, if it's the better option? It can bring some efficiency into the operations.
Nicholas: I haven't looked much at what they're doing with Office. I saw that they were doing things, but I assumed it wasn't that great. Is there anything in particular, like the email stuff, that you find looks good?
wijuwiju: Well, I think I saw a preview video or something. It's like a Superhuman plus, but for all kinds of documents: it can access all the sheets, it can access basically all the emails, and during a call it can, I assume, document everything. Everything is basically accessible to it, and then it can reference a lot of it. Maybe Daniel has more on that, because I just watched the preview video of how it all synchronizes, and it just made a lot of sense.
Daniel J. Keyes: Yeah, it's interesting. I think we're seeing it come out of basically every large app ecosystem. And I think actually Apple is going to be the biggest here because of their dominance in hardware. But basically, you can use existing infrastructures to bring data into GPT prompts. And what you get is a much more powerful kind of experience that, you know, obviously you can search and create and iterate with. And so you're seeing it across the whole suite of Microsoft Office, and Google announced it too. But actually, I think one of the really interesting places for innovation in GPT is sharing state between models. So you're still going to want to do this on applications and devices and browsers that are not necessarily in the same ecosystem, that might not do it natively because they share a common back end. And so being able to share state between prompts and between wherever a model is will be really crucial. Yeah. And it's even, I think, a place that maybe Web3 can help pioneer because of this kind of shared infrastructure that sits below so many applications.
Nicholas: I'm also excited by, like, llama.cpp. They basically compressed LLaMA into something that can run on smaller and smaller machines. You can run the full-fledged LLaMA model on 35 gigs of RAM. I think it's made with a MacBook as the target device, basically. So it'll be cool to see more and more of it running on hardware. I'm wondering if you're right, if Apple will quickly follow up with something. It feels like they have some genius for at least the hardware side of this. But have they actually been building software? I mean, it seems like all voice assistant development stopped five years ago, basically. Google teased some things, but I don't know if any of it ever actually came out, whereas Siri is stuck in the past. So I don't know. It seems like this is the kind of thing that takes a while to get to something production-ready. I wonder if Apple will catch up that quickly. But it will be exciting. Yeah.
Daniel J. Keyes: Yeah, there are kind of two competing trends here. One is that it seems like the larger the model is, the better it performs, so the more parameters it has, the harder it is to run on anything but a GPU cluster. But on the other side, this is going to be embedded on devices and at the edge. And so I think there's going to be that trade-off game between which model to use and where to use it. And I think it was maybe a week or two ago, Apple released a performance benchmark of these large models, I think it was Stable Diffusion and a couple of other models, that run on M1 and M2 hardware. So yeah, you've got to think that they're aiming for this. This is the next unlock for computation for them.
Nicholas: Yeah, definitely. Well, this is very interesting, but we are here to talk about Sepana, and I'm sure we'll come back to talking about how GPT, and LLMs in general, have an influence on what Sepana is up to. But for the people who aren't familiar, what is Sepana?
Daniel J. Keyes: Yeah. So the TLDR is that Sepana is a decentralized search infrastructure for blockchains and Web3. And so what that means is that we've built a suite of different tools for searching dApps and blockchains, including search engines for individual dApps and protocols. We built an infrastructure called Cloud Search, which allows every dApp and protocol to integrate a powerful search engine into their app with a few clicks. So we have an API and a developer dashboard where you can go and spin up an engine and define the data that's streamed over to it. And we've actually been pioneering ML ways of querying that data, so using similar types of processes to these large language models to do better search within that engine and then also power cross-dApp search. One of the really interesting things about data in Web3 is that it's for the most part open source. So you can build applications and search experiences that are very hard to do in Web2, ones that cross many different domains. And so you can aggregate data from different protocols and different dApps, from different social networks or blogs on Web3, and roll that up into a much better data access layer for devs and for users. So you can imagine searching across Lens and Ethereum and Mirror all in one go. And so those types of experiences are what's enabled by the platform we build and a lot of the innovations.
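To make the developer flow concrete, here is a hedged sketch of what querying an engine over HTTP might look like. The endpoint URL, header name, and payload fields below are illustrative guesses, not the documented Sepana API; the real request shape is in the Sepana Search API docs linked above.

```python
import requests

# Illustrative values only; the real endpoint, auth header, and fields
# are defined in the Sepana Search API docs, not here.
API_URL = "https://api.sepana.io/v1/search"   # assumed URL for illustration
API_KEY = "YOUR_API_KEY"
ENGINE_ID = "your-engine-id"

def search_engine(query: str, size: int = 10) -> list[dict]:
    """Send a full-text query to a (hypothetical) search engine endpoint."""
    resp = requests.post(
        API_URL,
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={"engine_ids": [ENGINE_ID], "query": query, "size": size},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("hits", [])

for hit in search_engine("decentralized search"):
    print(hit)
```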
Nicholas: Yeah, I want to get into some of the details, but just to give people a little bit of context, what were you up to before you started working on Sepana, and what was your life like before Web3?
Daniel J. Keyes: Yeah, so, well, I mean, the first thing I did was I actually studied philosophy and theology for a number of years. That was my first passion and love. But I'm a technologist by trade. So I worked in a physics lab for the Department of Defense in Israel for a number of years. And then I studied machine learning and computer science, worked on NLP and those types of models, and was in venture capital for a number of years after that, investing through a VC here in Israel. That's where I'm based. And a few years back, I started working on a new type of search engine for the web, for Web2, which was essentially an AI-based search engine. And our thesis was to revolutionize how people do research on the web. And it's really interesting to see GPT do that today. So many of the features and things that we had built very narrowly are now part of this very large capability that GPT has. But through that process, through trying to get access to data in Web2, we stumbled upon this really frustrating bug or feature of Web2, which is that not all the data is accessible to you. So if you think about platforms like LinkedIn or Twitter, you can interact with that on the UI. But on the data side, they shut down their APIs. It's really hard to get root access to that data. And so there were all of these search tools that we wanted to build on top of that data and just couldn't. And so that led us down the rabbit hole of Web3 and open data and blockchains. And we started building and tinkering and spun up a few engines for different protocols. And those took off. And it really showed us the way that you could build a new type of search infrastructure for the web, specifically for the open web. And that's what led us to Sepana and to work on this.
Nicholas: Amazing. So the problem that it's solving, then, is basically giving people access to the data that is available one way or another. The first example of Sepana that I saw was the Mirror search. I think it's... is it ask.mirror?
Daniel J. Keyes: Ask mirror. Yeah, askmirror.xyz.
Nicholas: Askmirror.xyz. So this is search on top of Mirror, which, at least at the time, had no built-in search on their website. So that's fetching data. If I recall correctly, the Mirror stuff is on Arweave, right? So you're indexing material that's available, not just directly on the blockchain. So it's not just an equivalent to something like The Graph or, I don't know, the Etherscan API or Infura API. Instead, it's also going and fetching things beyond the blockchain itself.
Daniel J. Keyes: Yeah, exactly. So I mean, Web3 is this kind of umbrella term. But really, there are different networks and protocols and dApps. Some of those are centralized. Some of those are decentralized. They use all different types of backends and messaging protocols between them. And we didn't think the world needed another block explorer. We thought what was needed was a way to search and aggregate and discover all of that content. And so the first thing we built were these vertical-specific search experiences. The data within each of these applications and protocols is so confusing as it is, right? I mean, even Mirror, they use Arweave, they use Optimism. There's a whole bunch of different things that happen in between there. And so it's already hard enough to find things within vertical-specific places in Web3. But because that data is open source, you can see that data structure. And so you can build breadth and depth search at the same time. Right. You can understand the underlying data structure of each application or each contract or each protocol and build very, very good, very accurate search for that. But then you can also aggregate that across a number of applications and chains. And that's a very unique thing to Web3. In Web2, at least today, the best search for individual platforms is still within those platforms. So no one understands TikTok data as much as TikTok. So the best way to find creators or videos on TikTok is through their search and application. On Web3, that's flipped. You can see all that data and you can build what we call composable engines, which are ways of putting data from one engine into another and using that for better recommendations, better personalization, a lot of these kinds of downstream features.
Nicholas: So does each application, like I know you also do search for Lens, is it one engine for all of Lens? Do you need an engine for each protocol that you're indexing?
Daniel J. Keyes: Yeah, so it could be one engine for Lens. You could also do kind of an engine for parts of Lens, right? So if you only wanted to take music creators on Lens and take that and connect that to something else, you could spin up an engine just for that. But you could also do an engine just for Lens.
Nicholas: Got it. And so Mirror, for instance, is just one engine that's able to index across the three protocols, Mainnet, Optimism, and Arweave? Yeah. Got it. And how does this compare to something like Dune or The Graph? I know The Graph also does, I think, a little bit of IPFS indexing if you want. So it can do some bridging across protocols and is on some chains. I mean, it's a pretty technical audience. So maybe you could explain a little bit how it compares to these other options that are out there.
Daniel J. Keyes: Sure. Yeah. So in general in Web3, almost everyone is kind of looking at the same data. The question is how you organize that data and how you structure it for different applications. So The Graph is a really good kind of database for many different applications. You can think of them like your Postgres. And we're on the other side of that, kind of solving for more search, kind of specific applications. So we would be your Elasticsearch, Algolia, your kind of embedding space or kind of VectorDB for different types of things. So it's the same data, but different types of applications. And then Dune would be the same thing. They're kind of building up their kind of SQL style data retrieval specifically for graphs and tabular things that they can do.
Nicholas: Right.
wijuwiju: Yeah.
Nicholas: Whereas Sepana is more focused on search specifically, or do you just get search naturally with the way that Sepana is indexing? Is search more a convenient way to explain the functionality of Sepana, or is Sepana's product really essentially tied to search specifically?
Daniel J. Keyes: It's a good question. It's a combination of both. So there are infrastructure decisions that we make so that we have the best performance, the fastest search possible, on par with any other infrastructure in Web2 or elsewhere. But then there are also the features that are built around that. So one feature we built specifically for this was Web3 data resolvers. If you have a data object where some of that data is on chain, some of it is off chain, and some of it is in IPFS, we'll, for instance, grab that IPFS data in real time and then bring it back and store it within that same data object. So you can now search through the data that you have: data that's on chain and data that might be in your centralized storage. It's a much more intuitive and useful way to declare the kind of data that you have in your application and then build experiences on top of that. So it's a combination of bottom-up infrastructure and architecture decisions and also the type of features that we build into that. Another example would be a focus on personalization and recommendation algorithms that are specific to those types of experiences. That kind of thing wouldn't come out of the box with Postgres or Mongo.
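A rough sketch of the resolver idea Daniel describes: when an indexed record points at off-chain metadata (say an ipfs:// URI), fetch it through a gateway and fold it into the same document before indexing. The field names, gateway choice, and example record are assumptions for illustration, not Sepana internals.

```python
import requests

IPFS_GATEWAY = "https://ipfs.io/ipfs/"  # any public gateway works

def resolve_ipfs(uri: str) -> dict:
    """Fetch JSON metadata referenced by an ipfs:// URI via an HTTP gateway."""
    cid_path = uri.removeprefix("ipfs://")
    resp = requests.get(IPFS_GATEWAY + cid_path, timeout=30)
    resp.raise_for_status()
    return resp.json()

def enrich_record(record: dict) -> dict:
    """Merge off-chain metadata into the on-chain record before indexing it."""
    uri = record.get("token_uri", "")
    if uri.startswith("ipfs://"):
        record["metadata"] = resolve_ipfs(uri)  # now searchable alongside chain data
    return record

# Example shape only: an on-chain row whose metadata lives on IPFS.
row = {"contract": "0xabc...", "token_id": 1, "token_uri": "ipfs://<cid>/1.json"}
```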
Nicholas: Right, right. I saw someone's also built an engine for Farcaster casts, which is a pretty recent protocol and a smaller one. So I guess Sepana is able to adapt to new protocols that emerge. It's not fixed to EVM or Arweave for these things.
Daniel J. Keyes: Yeah, exactly. You know, sometimes we're running nodes ourselves, gearing up to launch native support for a bunch of chains. But it's also just that you have the ability to define that data and then stream it over. And so wherever that data comes from, we're agnostic to it, because we're essentially building search on top of that.
Nicholas: Got it. I see. Are there any other applications, any other interesting engines or things that we haven't talked about, in terms of using Sepana to establish search with a protocol, that people might not imagine is possible or available elsewhere? Anything you've seen in the wild that people are doing that's cool?
Daniel J. Keyes: Yeah, I think a lot of the really interesting ones revolve around social-first things in Web3. That's Lens and DeSo and Mirror and, you know, Farcaster. And we've seen POAPs and ENS and things that are not exactly DeFi, although our infrastructure works for DeFi; there's just a lot of existing infrastructure for that. So we've seen a lot of building and searching around the things where you want more human-readable interaction with the underlying content. And I think Web3 in general is moving in that direction. I think, you know, DeFi is critical and sound money is critical. But these other experiences that can be built in this type of Internet, where users have ownership over their data, where things are transparent, I think those are going to get built out into more and more user experiences on Web3. Yeah. So that's, I think, where a lot of the tinkering and innovation is happening.
Nicholas: And is it useful... Like, are these social apps using Sepana exclusively for their search function, or would it be applicable for feeds or other kinds of content discovery and display?
Daniel J. Keyes: Yeah, no, you can use it for other things. You can use it for feeds, or in general as a data store, if the type of data that you have is more attuned to this. In general, our infrastructure comes with state-of-the-art search, but then also a very good database for structured data as well. So you can power all of those things too. I think actually a lot of the applications that we use are built on top of these types of infrastructures and we don't really realize it. So, you know, I think, for instance, Coinbase, their NFT wallet, is built entirely on a search infrastructure, if I'm not mistaken. So those types of applications are really easy to spin up.
Nicholas: For someone who's using The Graph, is it really end-user search that makes Sepana relevant? Or where does The Graph start to not provide enough, such that you need something that Sepana is offering?
Daniel J. Keyes: Yeah, so I think if you want to use off-chain data, if you want to not necessarily take data from a contract or a place on chain that you know how to define, but maybe you have other data in your database or another place in Web3 that you want to get access to, we're really good at helping with that. But then also, if you need an experience that's beyond a database, where you want advanced filtering and the ability to sift and search through different fields, with machine learning search on top of that, all of that you can find on
Nicholas: Sepana. Because you can do a kind of fuzzy search with Sepana, right? You don't need to be searching for exact characters, correct? Yeah, yeah, exactly.
Daniel J. Keyes: That's where these infrastructures excel. Imagine you're searching for something on Twitter or Google and you don't want to find the exact thing, but you want to find things around it and things that are related to it. You want to add in other signals to that ranking algorithm. All of those things are hyper-optimized in these types of infrastructures.
Nicholas: So if someone was going to build, say, a new Farcaster or Lens iPhone app or web app, would it make sense for them to use it? And can they get access to the engines? Like, are the engines available once someone creates one, something that anyone could use?
Daniel J. Keyes: Yeah, exactly. So you can build your own engine on it, but then you can also access other engines that are public. You have the ability to create a private or public engine, so you can access public engines and feed other applications' and engines' data into your experience. So if you were building a new type of Web3 music app and you wanted to pull in data from Lens and Farcaster and Mirror and NFTs and things like that, you could do that.
Nicholas: So it's very useful, it sounds like, for aggregation. Like, back in the day there was what, FriendFeed? Aggregating different things, different social networks into a single interface. If the engines exist, it sounds like it would be pretty trivial to then create a new interface that lets you traverse all of them.
Daniel J. Keyes: Exactly.
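As a sketch of the aggregation pattern just described: query several public engines with the same search term and merge the results into one feed. The engine IDs below are made up, and the search helper reuses the illustrative request shape from the earlier snippet, generalized to take an engine id; none of these identifiers are real.

```python
# All engine ids below are invented for illustration.
PUBLIC_ENGINES = {
    "lens-posts": "engine-id-lens",
    "farcaster-casts": "engine-id-farcaster",
    "mirror-articles": "engine-id-mirror",
}

def aggregate_search(query: str, search_fn) -> list[dict]:
    """Fan a query out across public engines and tag each hit with its source.

    search_fn(engine_id, query) is expected to return a list of hit dicts,
    e.g. a generalized version of the earlier search_engine() sketch.
    """
    feed = []
    for source, engine_id in PUBLIC_ENGINES.items():
        for hit in search_fn(engine_id, query):
            hit["source"] = source
            feed.append(hit)
    # Sort however the app needs, e.g. by a shared timestamp field if engines have one.
    return feed
```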
Nicholas: Very cool. You mentioned it briefly, but how do you think LLMs and GPT can interact with something like what you've built? Does it fit in nicely?
Daniel J. Keyes: Yeah, well, I mean, as you spoke about earlier, I think more broadly it's really changing the landscape for everyone. And I think every company is, or if they're not, then they should be, pushing themselves to understand how this affects them today and how second-order effects of this will come into play. So, you know, one thing that I think we'll begin to see is that basically every application, every text field, will be GPT-embedded. So you'll be able to pull data in from whatever database you use that's optimized for your type of experience. And maybe that's Sepana, because you have a lot of text and you want really fast retrieval and things like that. And so you can pull that data in, bring it to a GPT-like model, and then interface with the data in this really powerful chat way. I think in general, search is moving from something that's retrieval-based to something that is interactive and agent-based. So you're not only calling on data, but you're also interacting with that data. So, you know, show me the accounts and the addresses that have done this and that, and then turn that into a table and email it to my team. Those types of things I think are going to basically be ubiquitous in every place that you interact with data. So that's, I think, the first order of change we'll see. The second order of change is that in a world that's flooded with post-GPT data, it's going to be basically impossible to tell the difference between content and data that originated from a human and content that originated from a bot or an agent. And so I think that if you stop and think about all the different functions in society and business that rely on this, on the fact that you know that it took someone a while to do that thing, right? Like you get an email, or someone wants to fight a parking ticket, or you issue an insurance claim. There was work that went into that, and that kind of goes out the window now. So you can basically DDoS anything and just flood the channel, so to speak. So I'd imagine an airline customer support line that now, with Whisper, you can hit with an infinite amount of totally real-sounding calls. So I think Web3 and blockchains and proofs in particular are going to be the anchor against that, right? You're going to have to prove and verify not only who you are, but where that data came from and the process that data took in order to get to that endpoint. And so there's a lot of interesting work that the team is doing around that. In our protocol, we've been working on something that we call proof of ETL, which is essentially a way to create ETL pipelines that do exactly this. From the genesis of that data, to every type of transformation and change that data goes through, to the place that you store that data in a database, all of that can be traced and proven, so that you can store that object in a database and then interact with it with a much, much stronger integrity check around that data and its origin. So I think that's where crypto comes to defend against the age of AI.
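Daniel describes proof of ETL only at a high level, but the core idea, committing to each transformation step so a record's lineage can be checked later, can be sketched with a simple hash chain. This is a toy illustration of the concept, not Sepana's protocol; real designs would use signatures or ZK proofs rather than bare hashes, and the step names and fields below are invented.

```python
import hashlib
import json

def step_commitment(prev_hash: str, step_name: str, payload: dict) -> str:
    """Commit to one ETL step by hashing it together with the previous commitment."""
    blob = json.dumps(
        {"prev": prev_hash, "step": step_name, "payload": payload},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

# Toy pipeline: raw chain event -> normalized record -> stored document.
raw = {"tx": "0x123", "log": "Transfer(...)"}
normalized = {"from": "0xaaa", "to": "0xbbb", "value": "1"}

h0 = step_commitment("genesis", "extract", raw)
h1 = step_commitment(h0, "transform", normalized)
h2 = step_commitment(h1, "load", {"index": "transfers", "doc": normalized})

# Anyone replaying the same steps over the same inputs should land on h2;
# a mismatch means the data or one of its transformations was altered.
print(h2)
```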
Nicholas: That's interesting, because I've been worried that... I mean, obviously, I agree with you. And actually, your example of flooding a call center reminds me a little bit of an article I read recently about Rupert Murdoch in the early 2000s. I think he owned Sky in the UK, and Murdoch, through Sky, essentially gave a warez hacker forum administrator cracked copies of their rival satellite company's satellite keys, whatever the thing is that you purchase in order to get premium access to their satellite network, to distribute online so that the rival would essentially lose all their revenue and Sky could be the dominant satellite television provider in the UK. There are BBC articles about it. And obviously the same thing applies, right? You could flood whatever rival airline with customer service calls, although it seems like for all of recent memory, they've always been over capacity.
Daniel J. Keyes: And yeah, it can only get better. Yeah, exactly.
Nicholas: But it seems to me like the thrust of that is towards... one version of this is, there's no reason anymore to argue on the internet. There's no point in arguing on the internet anymore, because obviously any half-intelligent person will just have a bot manage their side of the argument and go do something more interesting or fun. And the consequence of that, in my imagination, is that KYC becomes the counterpoint to generated text and audio and video, et cetera, which would be the opposite of crypto, you know, more leaning on traditional identification structures. But you think that key pairs and proofs on the basis of private keys might be a solution instead. I think the problem is that the bots can generate those too, you know?
Daniel J. Keyes: Yeah. So I mean, KYC only goes so far, right? There are different levels of this authentication problem. There's: you indeed are a real human. And there's: the output actually came from you. Yeah, exactly. We're all just a shadow on someone's wall. But, you know, you're a person, and then the output of that person is them, right? As opposed to generated. Like, imagine that, you know, with Midjourney today, you can create evidence for basically anything. So, you know, there's an insurance company and here's my car crash. It's the right license plate, my face in it, you know, video and audio from everyone around the crash. And that's coming from a KYC'd person, right? So I think that one place where crypto, and I'm using crypto very broadly here, right, it might not be an on-chain Ethereum transaction, it could be much stronger cryptographic proofs that are ZK-enabled, can help is allowing existing business functions to survive. Being able to say, OK, I know the origin of this data and I know that nothing else or nothing more was added to it. So, you know, I think in the very near future, we're going to see hardware or applications that are very limited in what they can do. Like, I could imagine a social network where you can only post if you've actually hit all the keypad in the right areas, or cameras that pull pixels and add that directly into a proof. So I think those types of things are going to be necessary for the times that we want to interact with humans, which might be way less, you know. Like, robots are awesome. We're still going to need ways of validating that.
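A minimal sketch of the "prove where the data came from" idea using an Ed25519 key pair: the capturing device signs the bytes it produced, and anyone downstream can verify the signature against the device's public key. This only shows the signing primitive; it says nothing about keeping the key inside tamper-resistant hardware, which is where the hard part lives, and it is not tied to any specific product discussed here.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In practice this key would live inside the camera/sensor's secure element.
device_key = Ed25519PrivateKey.generate()
device_pub = device_key.public_key()

captured = b"raw image bytes straight from the sensor"
signature = device_key.sign(captured)

# A verifier who trusts the device's public key can check the data wasn't swapped.
try:
    device_pub.verify(signature, captured)
    print("data matches what the device signed")
except InvalidSignature:
    print("data was modified after capture")
```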
Nicholas: I guess we've seen shades of that with, I mean, simple software things like Snapchat originally, at least, only being able to send photos that originated within the app, which, as long as you don't have a rooted iPhone, means you're sort of depending on Apple's iron fist there. Or, equivalently in hardware, having components that will insist that other components have some kind of manufacturer's signature on them in order for the device to power on. These things aren't spoof-proof, but they're, for regular people, some level of assurance. But it does make me wonder; I don't know a whole lot about this kind of hardware-signed imagery, or sensor data signed by the hardware, and the game theory of proving that the hardware was not tampered with. I guess people like Worldcoin are also dealing with these things.
Daniel J. Keyes: Yeah, yeah. It's not to say that these things are fully tamper-proof. And, you know, obviously we're all subject to the limitations of the hash functions that we use. But it lowers by orders of magnitude the chance that you'll be spammed or DDoSed by these things. Because I think one of the things that's coming is that you used to have to be very sophisticated, very smart, and very bold to do fraud and attacks on these levels. But now a lot of that cost has gone to zero, right? So with GPT you can create, you know, fraudulent financial data with a click of a button. Like, here's this company, here's this Excel, now tell the story. That is very easy to do. And so, I think, even in places where we didn't think we needed trust, we're going to need trust, because it's going to be so easy for the other side to manipulate us without us even knowing, especially for results that are non-deterministic. For models or for things where I know the answer, I can easily check it. That's one thing. But if it's just a stream of data that could be this way or could be that way, I'll have no idea that I'm interacting with the wrong person or the wrong data or the wrong model. So I think it's going to introduce a whole new era of what it means to have trust online. I think it's interesting to think about: in the age of AI, where, you know, GPT eats software, one of the only things that thrives is the need for trust. Because if you imagine this trend line going up and to the right, countering that is the need for validation and trust. And so I do think it's this kind of trillion-dollar business that's coming. It'll be interesting. And I think crypto has a huge part to play there.
Nicholas: Yeah, definitely. At the very least, because what's easier to create as a financial account than a key pair? Yeah. No KYC required. So that's the end of my list of questions. I don't know if there were any topics that we didn't cover that you think would be relevant to discuss, or that you'd like builders in the space to know about Sepana.
Daniel J. Keyes: I think we covered a lot. I mean, it was very interesting. And kudos on kind of putting together this series. It's been interesting to listen to.
Nicholas: If people want to check out Sepana, what's the best thing to do?
Daniel J. Keyes: Yeah, we're just sepana.io, and we're going to have a lot of updates coming in very short order.
Nicholas: Have you super sharpened your building process with GPT?
Daniel J. Keyes: Yeah, definitely. I almost never get to code just because of my day to day. And just today I got to ship two new things that I wanted to build. And it was just easier to do with GPT than, you know, have someone else look into it. And yeah, it's incredible. Like, before it puts us out of a job, I think we'll see productivity rise a great deal.
Nicholas: Exciting times. Well, Daniel, thanks so much for coming through. This is great.
Daniel J. Keyes: Thank you very much.
Nicholas: Take care. All right. Thanks to everybody for coming to listen. Next week... actually, there may be a pause, because I'm going to be in Tokyo for ETH Tokyo for a couple of weeks. So we will see if I'm able to maybe do some off-hour versions, maybe with some locals. But nevertheless, keep tuned to the Twitter to find out when the next episode is. And thank you for coming through. See you next week.