Web3 Galaxy Brain 🌌🧠

Eito Miyamura, ZK Microphone

18 January 2024

Transcript

Nicholas: Welcome to Web3 Galaxy Brain. My name is Nicholas. Each week, I sit down with some of the brightest people building Web3 to talk about what they're working on right now. My guest today is Eito Miyamura, co-creator of ZK Microphone. ZK Microphone is an ETHGlobal Paris hackathon project that prototypes hardware-signed audio recordings, which uniquely link a file of captured audio to the device that recorded it. This application is enabled by hardware security modules and related technologies such as Trusted Execution Environments, or Secure Enclaves. In this conversation, Eito explains how hardware-attested recording devices work and how his team used zero-knowledge provable computation to go further, enabling editors to modify hardware-attested audio files while maintaining a cryptographically provable link to the original recording. It was great getting to know more about Eito, hardware attestation, and the home DAO hacker community in Oxford that brought together the ZK Microphone team. I hope you enjoy the show. As always, this show is provided as entertainment and does not constitute legal, financial, or tax advice, or any form of endorsement or suggestion. Crypto has risks, and you alone are responsible for doing your research and making your own decisions. Hey, Eito, how's it going? Am I pronouncing your name right? It's Eito. Eito.

Eito Miyamura: Thanks for asking.

Nicholas: So today we're going to talk all about ZK Microphone, hardware-attested recording, and things like that. You did a project at ETHGlobal that was very popular, this ZK Microphone project. To start off, what's the problem that hardware-attested recordings are trying to address?

Eito Miyamura: Yeah, absolutely. So the idea of ZK Microphone was to solve the deepfake problem for audio. In particular, the question is: can you have hardware-signed audio? Inside the microphone, you sign the audio that it recorded, so you know that particular audio was signed. But you can also edit the audio, so you can, you know, have your cake and eat it too: for example, you can censor out problematic parts and prove you did so using zero-knowledge proofs. Ultimately, the problem it's solving comes down to this: when you hear a piece of audio on the internet, can you tell whether it's real or AI-generated?

Nicholas: And in this world, maybe we can talk about hardware-attested recording first, and then get into the ZK editability that you've introduced. Does that seem like a logical flow?

Eito Miyamura: Yeah, absolutely. In fact, I would say what we've done is pretty much an iteration on basic hardware attestation, but I'll let you take that first.

Nicholas: Yeah, I mean, I think people are becoming a bit more familiar with this: things like Worldcoin in the crypto world, but also, a few months ago, Leica put out a camera with HSM-based attestation, which maybe you're familiar with. Since then, I've heard Nikon, Canon, and others are doing the same for photography, and I'm sure there are other examples in audio too. The way I understand it, a hardware security module (something people might be familiar with is the Secure Enclave in an iPhone) is a hardware chip that holds a private key, and every time it's accessed, a log is kept of attempts to access it or to use it to sign things. The hardware is designed to be tamper resistant, so if someone tried to exfiltrate the secret material from the HSM, it would be obvious, or the chip would stop functioning. That private key is used to sign data, for example metadata or a hash of an image or audio file, to attest that this physical device, this recording device or camera or whatever it might be, is where the data was captured or generated. And no other source of data could be signed with this HSM, because it's bundled in a tamper-resistant hardware package. Is that the basic premise of a hardware-attested recording?

Eito Miyamura: Man, you've done your research. Yeah, you pretty much did all the explaining that I was going to do.

Nicholas: Well, it was a good show.

Eito Miyamura: Yeah, definitely. Well, we're definitely missing the zero-knowledge part, so we'll get to that in a second, along with the criticism of the existing approaches, whether it be the Kodak cameras or the Canon cameras with Secure Enclave hardware signatures.

Nicholas: Yeah, I think there's a lot to understand in just the basic premise before we even get to the ZK stuff. The first question that comes to my mind when I hear about this is: can we really trust it? Doesn't the factory have a list of all the HSM ID numbers and their corresponding private keys, which could leak? Or is the private key generated in the device such that even the manufacturer doesn't know it until the device is booted up?

Eito Miyamura: Yeah, I don't know the specific instances or implementation details of how those are done. I believe that most of the time the key is just generated internally. But at times, whenever it's paired with software that allows you to actually view and attest the signature, it may be logged in some central log. So in some ways, the private key is untampered at the hardware level, but there is a chance that the key is known to the actual creators of the hardware. Again, these details are implementation-specific, and there are definitely ways to do it so that, for example, you have pseudo-randomly generated secret keys within the secure enclave. It all comes down to the hardware manufacturer's integrity as well as their particular processes.

Nicholas: Right. But there is a sense within the community, especially because HSMs are used in government applications and there's a cryptographic, security, industrial, and even academic perspective on these things, that there are ways to build HSMs where even the manufacturer doesn't know the private key, and it can't be extracted from the device without that being obvious. Of course, it depends on which HSM you're using, etc. But I guess the first premise people are going to be skeptical of is: is it really possible to make a piece of hardware where the manufacturer, or the operating system that accesses that hardware module, doesn't have access to the data itself? Maybe you're not the expert on the details, but at least within the community of people who deal with HSMs, it's believed that this technology is possible. Or are there limits to that? For example, do we think an HSM really isn't going to stop a state actor from getting access to the private key material inside? How do you think about it?

Eito Miyamura: Yeah, so as far as my knowledge goes, it is possible to create a private key that is unknown even to the manufacturer. That's true in theory, but I think you'd have to verify the actual implementation on your device yourself. They could give you a theoretical specification of how they generate the private keys, such that they don't even know what the keys are, but the actual implementation may differ from what they say they do and from the theory. Now, at the level of state actors, this is where some issues arise, and I think this is not well known even within the communities that discuss HSMs. One example is a physical security attack called the fault injection attack, and it's quite remarkable. Basically, attackers monitor the electrical signals used by the device. If they can get access to the physical hardware, they can also measure the electrical signals in the particular secure enclave. And if they have information about the code, say a branch point in the assembly instructions, one exploit is to cut the power supplied to the device at the right moment and cause it to essentially skip some assembly instructions. Some people have created exploits, for example against Intel's SGX secure enclave, that don't necessarily leak the private information but do get the enclave into a state that would not be reachable otherwise. And of course, in theory, that kind of vulnerability could lead to many other ways of stealing the secret key, or at least partially revealing it, after which you could, for example, brute-force the rest. So I think it's an area of debate. It's certainly a very niche field; not many people study this seriously. I'm not going to claim that I know everything about this field. But I think it's definitely ready for mass adoption. At the state level, I don't think it is, because of attacks like fault injection attacks.

Nicholas: So I want to jump back. I don't want to lose the thread, and I'd like you to help me walk through all the steps of the basic version, and then we'll get to the ZK part. It seems to me that, in terms of application, where we're at is something like a journalist, a photojournalist or an interviewer, maybe even something like this podcast, especially if it's politically spicy: I might have a device that attests to my recordings with signed data, essentially a signed recording. Then you, as a listener, as you say in the description of ZK Microphone, would have some certainty that the data wasn't manipulated between recording and publication. But you're still trusting that I'm socially vouching, on Twitter or wherever, publicly, that this is my recording, that I'm the one who put it up, and that it carries the signature from the device I always use. So you know that it's really me. But of course, I could be lying about my name, have a fake identity, or be otherwise compromised. It's not proving that the data is true. It's just proving the association between the named creator, the person whose name is on the file and their public profile, and the physical device that recorded it. Is that the kind of application we're looking at, given the current state of the technology?

Eito Miyamura: Yeah, definitely. I think one of the base axiomatic assumptions made not just with ZK Microphone but with other hardware attestation too is that the hardware is not compromised by the original user. And there is a level of consistency where, you know, the same hardware is used by the same person over and over again, and the hardware is not compromised. Right.

Nicholas: Because for some reason, I find the camera example easier. But, you know, I could manipulate the image sensor on a camera that has an HSM in it, or I could insert another track into this recording while we're recording, but not propagate it to you, and have some other guest saying things in the background that you can't hear but that my hardware device attests to. In the final recording, it might sound like we're all talking together, even though you never heard that other voice. Or, as a simple example, I might take a picture of a picture, say that it came from my camera, and be able to attest to the date and the authenticity of the device that took the image. So the image itself doesn't prove anything on its own; it relies on this social link, almost like a layer zero for crypto, where I'm attesting to it with my social credentials. My credibility as a person, my identity, is what makes the signature from the hardware relevant.

Eito Miyamura: So I would say you can definitely have methods that provide a kind of multi-factor authentication. And when I say MFA, multi-factor authentication, I don't mean it in the traditional sense of recovering your account; rather, you can always have another source of identity, say your crypto wallet, sign the audio on top of the hardware signature, so that you have two independent variables both saying, yes, this is audio or a photo generated by me. That's how you can increase the security. Also, to the point about playing audio into the microphone or taking a photo of a photo, I would always go back to the use case: where is this most important? It's most important for someone with a significant amount of reach on social media, right? Journalists, influencers, big names. And it's fairly reasonable to assume that it's very hard to hijack their entire stack, from the social media layer all the way down to the hardware. Hijacking just one layer is fairly easy. For example, if I hacked into someone's account and posted a bunch of deepfakes right now, in theory no one would be able to tell the difference until, you know, I'm caught doing naughty things. But hardware attestation gives you another dimension of assurance about the authenticity of the media being propagated on social media. So, again, it goes back to the idea of multi-factor authentication: no single factor is perfect, but by combining them, i.e. the hardware attestation, the fact that it's being propagated on their social media, and maybe the fact that it's signed with a crypto wallet, it's highly unlikely that any hacker could compromise all three at once. Each of these decreases the probability of fraud by a large amount, and I think this is where we'll converge for essentially preventing deepfakes.
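The intuition that independent factors multiply can be written as a back-of-the-envelope formula. This assumes the three factors (device key, social media account, crypto wallet) are compromised independently, which is an assumption added here rather than something stated in the conversation, and the 1% figures are purely hypothetical:

```latex
P(\text{all three compromised}) = p_{\text{hw}} \cdot p_{\text{social}} \cdot p_{\text{wallet}},
\qquad \text{e.g. } (10^{-2})^{3} = 10^{-6}
```

If the factors are correlated, for example a single stolen phone holding both the wallet and a logged-in social media session, the real probability can be much higher than this product.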

Nicholas: So essentially we're depending on... One application would be countering something like misinformation imagery about the president. You mentioned in one of the presentations about ZK Microphone that you, as a viewer, might expect an attestation proof that an image originated from the recording device of a photojournalist and was also published by their Twitter account or similar. So, as you say, it's multi-factor: it assures that the origin is at least someone accredited by an institution you trust, or even an independent journalist you trust, and that it hasn't been manipulated along the way. But if those people decide to perpetuate lies by manipulating the sensors or signing data with an inauthentic device, then the hit would be on their reputation, not on the proof, which never claimed that the recording device captured reality per se.

Eito Miyamura: That's correct. And I think if a hardware-attestation-dominated future is the one we're heading into, I would imagine that a lot of media companies will publish a list of public addresses, each known to correspond to a secret key stored in a particular piece of their hardware. They would generally be expected to keep using that hardware rather than adding new devices. And maybe you can weight how much you believe new reporting based on how new the public addresses are.
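Eito's suggestion of weighting trust by how long a device key has been publicly associated with an outlet can be sketched as a tiny registry. This is a hypothetical illustration: the KeyRegistry class, the hex-string key format, and the linear one-year ramp are all assumptions made here, not part of ZK Microphone or any existing standard.

```python
from datetime import date


class KeyRegistry:
    """Hypothetical registry of device public keys published by a media outlet."""

    def __init__(self, full_trust_after_days: int = 365) -> None:
        self.full_trust_after_days = full_trust_after_days
        self._first_seen: dict[str, date] = {}

    def register(self, public_key_hex: str, first_seen: date) -> None:
        # Keep the earliest publication date, so re-registering can't reset a key's age.
        previous = self._first_seen.get(public_key_hex)
        if previous is None or first_seen < previous:
            self._first_seen[public_key_hex] = first_seen

    def trust_weight(self, public_key_hex: str, today: date) -> float:
        """0.0 for unknown keys, rising linearly to 1.0 once a key is old enough."""
        first_seen = self._first_seen.get(public_key_hex)
        if first_seen is None:
            return 0.0
        age_days = (today - first_seen).days
        return max(0.0, min(1.0, age_days / self.full_trust_after_days))


if __name__ == "__main__":
    registry = KeyRegistry()
    registry.register("04a1b2c3", date(2023, 1, 1))  # made-up key
    print(registry.trust_weight("04a1b2c3", date(2024, 1, 1)))  # 1.0
    print(registry.trust_weight("ffff0000", date(2024, 1, 1)))  # 0.0
```

A linear ramp is just the simplest choice; an outlet could equally use a step function or require a minimum number of prior attested publications.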

Nicholas: Okay, so it's just another way to... It especially solves, and I like the way you set it up in the ZK Microphone presentation, the problem of manipulation: even in the most basic case, without the ZK Microphone technology we're about to talk about, it removes the doubt about manipulation between the point at which the data is created and the point at which it's consumed. And just to finish off how hardware attestation works in a basic sense: the sensor captures some data, and a hardware security module signs it, essentially signing a hash of the data and its metadata. Then that signature is sent along with the image or other file so that somebody else can verify that this data was in fact signed by the device with this hardware security module. That's the big picture of HSMs for attestation of recordings, right?
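The sign-then-verify flow described above can be sketched in a few lines. This is a minimal, self-contained illustration using a textbook Schnorr signature over a deliberately tiny group, so it runs with only the standard library. The parameters are insecure, the function names (digest, keygen, sign, verify) are hypothetical, and a real secure element would use a standard scheme such as ECDSA or Ed25519. The digest covers both the audio bytes and the metadata, so changing either one invalidates the signature (with these toy parameters, only with probability about 1 - 1/1019).

```python
import hashlib
import secrets

# TOY, INSECURE parameters chosen only so the example stays small and readable.
# P = 2 * Q + 1 is a safe prime, and G = 4 generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def digest(audio: bytes, metadata: bytes) -> bytes:
    """SHA-256 over length-prefixed audio and metadata, so bytes can't shift between fields."""
    h = hashlib.sha256()
    for part in (audio, metadata):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def _challenge(r: int, pub: int, msg: bytes) -> int:
    data = r.to_bytes(2, "big") + pub.to_bytes(2, "big") + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    """Runs inside the device: the secret x never leaves it; only pub = G^x is exported."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def sign(x: int, pub: int, msg: bytes) -> tuple[int, int]:
    """Schnorr signature (r, s) on msg."""
    k = secrets.randbelow(Q - 1) + 1  # fresh random nonce for every signature
    r = pow(G, k, P)
    e = _challenge(r, pub, msg)
    return r, (k + e * x) % Q


def verify(pub: int, msg: bytes, sig: tuple[int, int]) -> bool:
    """Accept iff G^s == r * pub^e (mod P)."""
    r, s = sig
    if not 0 < r < P:
        return False
    e = _challenge(r, pub, msg)
    return pow(G, s, P) == (r * pow(pub, e, P)) % P


if __name__ == "__main__":
    secret, public = keygen()  # happens once, inside the device
    audio = bytes(range(16))  # stand-in for recorded PCM bytes
    metadata = b'{"device": "mic-01", "recorded_at": "2023-01-01T00:00:00Z"}'  # made-up
    signature = sign(secret, public, digest(audio, metadata))
    print(verify(public, digest(audio, metadata), signature))  # True
```

Only the public key, the file, and the (r, s) pair ever leave the device; anyone holding the public key can rerun verify.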

Eito Miyamura: Yep, that's absolutely correct. And one other way it's being implemented is by a coalition of a lot of large tech companies, called C2PA.

Nicholas: Yes.

Eito Miyamura: It essentially involves building cross-compatible apps between, for example, Sony and Adobe. You might take a photo on a Sony camera, edit it in Adobe Photoshop, and maybe post it on, I don't know, Microsoft Forms (I don't think they've got Twitter or Meta yet). Using C2PA, the standard protocol controlled by these tech companies, you would be able to attest these hardware signatures all along the stack.

Nicholas: But C2PA, as I think you correctly and critically point out in your presentation at ZK Summit, is very centralized.

Eito Miyamura: Correct. It's pretty much held by the major interests of these large players. And you would not know whether these proofs of edits, or even the hardware signatures for that matter, are actually legitimate. But the scarier one is definitely the proof of edits. Since that can be manipulated, you're giving the power to define what is true and what is fake to, for example, Adobe or the C2PA collective. To that extent, it's a fairly dangerous proposition.

Nicholas: So what is the solution that ZK Microphone proposes?

Eito Miyamura: Yeah, I think ZK Microphone is definitely the very early beginning of a solution to this entire deepfake problem, balanced against, you know, who holds the power to say what is true and what is not. What it does is let you create a proof of computation that is also private, which is where the zero-knowledge part comes in, so that you can edit the audio in a way that preserves two things. A, you preserve the original digital signature by the HSM, so performing any kind of computation on the audio doesn't ruin the original hardware signature. And B, it preserves privacy: if one of the edits you make is, say, censoring out very sensitive information, nobody would be able to learn what that sensitive information was, which is the zero-knowledge part of ZK Microphone. Together, these let you build software that controls two layers of the stack: the physical layer of recording and signing the audio, and the layer of editing the audio however you want. The last layer of the stack we're missing is distribution. For example, on Twitter you may want a verification badge on each post which verifies that, yes, it was recorded with hardware attestation, and its edits are covered by a proof. And by the way, you would actually know what operations were carried out without knowing what was in the original audio. This is a very important point. I think completing those steps lets you create a pseudo-open-source protocol for the entire information supply chain that preserves authenticity and minimizes the damage deepfakes do to society.

Nicholas: So let's talk about the physical prototype, and then the steps of recording, editing, and generating these ZK proofs. I know you used a Raspberry Pi as a stand-in for an HSM. Maybe you can describe what the physical prototype is, and then we can talk about the software.

Eito Miyamura: Yeah, so, you know, we hacked this together in a, what was it, 36-hour hackathon. So it was definitely not a complete solution, but it was a Raspberry Pi hooked up to a microphone, and the program on the Raspberry Pi was pretty simple. It stored a secret key, and when it recorded audio, it would sign the audio with that secret key, which never gets transmitted outside the Raspberry Pi. You only transmit the signature and the recorded audio.

Nicholas: And then... And when you say signed, do you mean it signed both the WAV or MP3 file, say, and some metadata about it, like that it was recorded on a certain date by this device?

Eito Miyamura: Yes, that's correct. And you just get one single signature. I forget how many bytes it was, maybe 256 bits; don't quote me on that. Then you send that and the audio to the computer. There you take the audio, the signature, and the metadata. For our particular demo, we wanted to show both the zero-knowledge privacy part and the succinct proof of edit. So we had a demo where you can bleep out a chosen section of the audio: after you select it, it replaces all the sound-wave samples in that time interval with zero, i.e. bleeping it out. Then you run a zero-knowledge proof of computation that you indeed zeroed out those particular samples. So that's what was semantically going on. You generate the zero-knowledge proof of that computation and return the edited audio, the zero-knowledge proof, and the original digital signature, which that process preserves. Finally, when you distribute this on X or whatever social media platform, you attach the zero-knowledge proof and the signature as metadata, and then you can verify them on-chain, or wherever the public key was committed. Verifying that zero-knowledge proof confirms the authenticity: yes, this audio was signed by the secret key stored in the secure enclave of the microphone, and the computational steps were all done correctly.
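The bleep edit itself is a very simple operation on the samples. Here is a minimal sketch, assuming the audio has already been decoded into a list of integer PCM samples at a known sample rate; bleep is a hypothetical name, and a real implementation would operate on decoded WAV frames. It also shows why the edit is not invertible: any two recordings that differ only inside the bleeped interval produce identical output.

```python
def bleep(samples: list[int], sample_rate: int, start_s: float, end_s: float) -> list[int]:
    """Return a copy of samples with the time interval [start_s, end_s) set to zero."""
    if not 0 <= start_s <= end_s:
        raise ValueError("need 0 <= start_s <= end_s")
    start = int(round(start_s * sample_rate))
    end = min(len(samples), int(round(end_s * sample_rate)))
    edited = list(samples)  # never mutate the signed original
    for i in range(start, end):
        edited[i] = 0
    return edited


if __name__ == "__main__":
    rate = 4  # unrealistically low sample rate, to keep the example readable
    original = [5] * 12  # 3 seconds of audio
    print(bleep(original, rate, 1.0, 2.0))
    # [5, 5, 5, 5, 0, 0, 0, 0, 5, 5, 5, 5]
```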

Nicholas: I'm curious about this provable computation for editing the signed original data. Maybe you can describe it for people who haven't used that kind of ZK application yet. Essentially, given the signature and the final edit of the data, you're able to tell that certain editing computations were done to the file without knowing the original material that was removed, and also with the knowledge that no additional manipulations were made. What is it that you're able to tell about the processing done through these provable computations, and what is it that stays hidden?

Eito Miyamura: Yeah, so the part you are able to tell is exactly what you mentioned: that the steps you claim were done were indeed done, and that, to the extent there are no hash collisions, no other operations were carried out on the audio. The part that stays hidden is what was actually in the original audio. Unless the operations were invertible, which in our case they weren't, you would not be able to recreate the original audio from the edited audio, the zero-knowledge proof, and the implicit commitment to what you actually did to the original audio.

Nicholas: Are there limitations there? Do you need ZK-specific audio editing software, for example, or is any kind of operation that can run on a CPU potentially ZK-provable, so we don't need to reinvent the whole stack for editing data?

Eito Miyamura: Yeah, so I think that's a great question. Where I think the entire ZK industry is heading is definitely that any CPU operation, well, okay, any CPU operation with some caveats, which maybe aren't worth getting into, can be proved in zero knowledge. So you can generate a proof of any computation. I think the company that best embodies this view of the future is RISC Zero, which can essentially generate zero-knowledge proofs of arbitrary computations on the RISC-V architecture. RISC-V is a major instruction set architecture (ISA) competitor to ARM, which is used in mobile phones, some NVIDIA chips, Apple's M1 chips, et cetera. RISC-V is a very similar instruction set and can basically do everything your regular CPU can do.

Nicholas: But open source.

Eito Miyamura: Yes, but open source, unlike ARM, which is owned by a public company. So to answer your question: yes, technically. You would be able to generate zero-knowledge proofs of any editing computation that can run on a CPU. That said, I think RISC Zero, and ZK in general, are a little too slow right now for that to be feasible in any easy way. We actually had to innovate quite a bit to get the ZK Microphone editing part working. We did that by operating not in the audio space but in the hash space. When generating proofs for the computation, we basically had to apply a hash to the original audio and then do a zero-knowledge proof on the hash of the computation. We used a very specific hash, so that any operation applied to the original audio also corresponds to an operation on the hash. I don't think I'm explaining this well, but, I think that...

Nicholas: So, two things. Before we continue with this, I just want to say that, in principle, all the common types of software we use in operating systems, et cetera, can be compiled for the RISC-V architecture. In practice, you can't get macOS or Adobe software to run on RISC-V right now, so you can't easily use your favorite audio editing tool. I suppose you're doing things at a lower level, from the command line in some special environment, not with just any old tool. So we're not quite there yet. But at least in principle, you could imagine recompiling all existing software, or any specific piece of software you might use to manipulate data, for an architecture that is provable, because it's open source and there's technology being built to generate ZK proofs of those computations. The part you're talking about that I think is least familiar to me, and maybe to the audience, is the idea of hashing a computation itself: not the data that results from the computation, but generating a hash of the computation itself and then doing proofs over that. I don't know if you can explain that at all.

Eito Miyamura: Yeah. I'd say it's not quite a hash of the computation. It's actually a clever way of generating the zero-knowledge proof you want, which is that you did the editing operation correctly. The way you achieve that is you hash the audio beforehand. Then, let's say you commit to having done operations one, two, three, and four: instead of generating a zero-knowledge proof of doing operations one through four on the original audio, you actually do those operations on the hash. And you do that specifically in such a way that there's a one-to-one correspondence between the hash and the original audio. A good way to think about it is as a compression technique: you compress the original audio into a fixed-size hash space and do the proof of computation only in the hash space. Because you're operating on much smaller data, you reduce the computational load required. And because you assume there are no hash collisions, which is basically a de facto assumption in computer science, you can rest assured that there's a one-to-one correspondence between the hash of the audio and the original audio, as well as the corresponding edited audio.
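To make "compressing the audio into a fixed-size hash space" concrete, here is a minimal sketch of a polynomial hash, which is one common reading of the "polyhash" Eito names later in the conversation; the exact construction the team used may differ. Each sample becomes a coefficient, and the hash is the polynomial evaluated at a point r modulo a prime p (here the Mersenne prime 2**61 - 1, an arbitrary choice for this sketch). The key property is linearity: hashing a sum of signals gives the sum of their hashes. The "no collisions" assumption becomes probabilistic: for r chosen unpredictably and independently of the inputs, two different length-n sequences collide with probability at most (n - 1)/p, so r must not be known in advance to whoever chooses the audio.

```python
import secrets

# Field modulus for this sketch: the Mersenne prime 2**61 - 1.
P = (1 << 61) - 1


def poly_hash(samples: list[int], r: int, p: int = P) -> int:
    """Polynomial hash H(a) = sum_i a_i * r**i (mod p), via Horner's rule.

    However long the recording, the result is a single field element < p.
    """
    acc = 0
    for a in reversed(samples):
        acc = (acc * r + a) % p
    return acc


if __name__ == "__main__":
    r = secrets.randbelow(P - 2) + 2  # evaluation point; should be unpredictable to the editor
    a = [3, -1, 4, 1, -5, 9]
    b = [2, 7, -1, 8, 2, -8]
    summed = [x + y for x, y in zip(a, b)]
    # Linearity: hashing a sum of signals equals summing their hashes.
    print(poly_hash(summed, r) == (poly_hash(a, r) + poly_hash(b, r)) % P)  # True
```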

Nicholas: So with ZK provable computation, the time and computational resources required to generate the proofs are a function of the size of the data, the amount of data at each of the state transitions your computation applies. So you're compressing the data at each of these checkpoints, at the end of each operation, in order to reduce the time it takes to generate the proof.

Eito Miyamura: Yes, that's absolutely correct. And we use the specific hash in such a way that, say, after operation one your original audio turns into state two. If you apply the hash to state two, the result corresponds exactly to applying operation one to the first hash. So there's a really nice correspondence there.

Nicholas: I guess the part that may be confusing, or may draw skepticism, is... I guess it's just the technology of provable computation that we need to probe a little deeper, maybe through other conversations; I've talked with people who specialize in that. But how do I know that the computation that got me from the data corresponding to hash one to the data corresponding to hash two is just, say, zeroing out or clipping the audio, and not introducing some other changes, if all I have is the checkpoints of state transitions, the hashes of the data, rather than the data itself? If I had the full data, I could see that between 30 and 31 seconds there are no waveforms, whereas in the prior version there were. Then I could see the difference. But the whole point of provable computation is that you don't need to know all the source data: you can prove that you did certain operations and only give the results. So I guess there are some more details for us to understand over time.

Eito Miyamura: Yeah, I mean, you can also generate a mathematical proof, in terms of probabilities, that your zero-knowledge proof is legit. The intuition I think is useful to take away is this: the reason we used a specific hash, called a polyhash, is so that you have self-consistency between the hash space and the audio space whenever you apply operations. The same operation applied to the audio yields the corresponding result on the hashes. It's much more intuitive if you can draw it, but imagine operation one transforming audio one into audio two. Then you have hash one, you apply operation one, and it turns into hash two. You also have an additional constraint that keeps it in check: audio one should hash into hash one, and audio two should still hash into hash two. So you have two constraints on hash two, both of which must hold. And operation one, for our particular demo, was just zeroing out the bits.
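The two constraints on "hash two" can be checked directly for the zeroing-out edit. This sketch reuses the same assumed polynomial hash, written out again so the block stands alone. Zeroing samples start through end - 1 changes the hash by exactly the removed terms a_i * r**i, so hash two can be computed both from audio two and from hash one. The check below is not zero-knowledge on its own: it takes the removed samples in plaintext, whereas in a ZK proof they would be a private witness inside the circuit. All names are illustrative, not the team's code.

```python
P = (1 << 61) - 1  # field modulus assumed for this sketch


def poly_hash(samples: list[int], r: int, p: int = P) -> int:
    """H(a) = sum_i a_i * r**i (mod p)."""
    acc = 0
    for a in reversed(samples):
        acc = (acc * r + a) % p
    return acc


def zero_out(samples: list[int], start: int, end: int) -> list[int]:
    """Operation one in audio space: set samples[start:end] to 0."""
    return [0 if start <= i < end else a for i, a in enumerate(samples)]


def zero_out_in_hash_space(h1: int, removed: list[int], start: int, r: int, p: int = P) -> int:
    """Operation one in hash space: subtract the removed samples' terms a_i * r**i."""
    delta = sum(a * pow(r, start + j, p) for j, a in enumerate(removed)) % p
    return (h1 - delta) % p


def check_edit(audio2: list[int], h1: int, removed: list[int], start: int, r: int, p: int = P) -> bool:
    """Both constraints on hash two must hold."""
    end = start + len(removed)
    if end > len(audio2) or any(audio2[start:end]):
        return False  # the edited audio must really be silent on the interval
    h2 = poly_hash(audio2, r, p)  # constraint A: audio two hashes into hash two
    return h2 == zero_out_in_hash_space(h1, removed, start, r, p)  # constraint B: hash two = op(hash one)


if __name__ == "__main__":
    r = 987654321  # fixed here for readability; in practice it should be unpredictable
    audio1 = [4, 8, 15, 16, 23, 42]
    hash1 = poly_hash(audio1, r)
    audio2 = zero_out(audio1, 2, 4)
    print(check_edit(audio2, hash1, audio1[2:4], 2, r))  # True
    tampered = audio2[:-1] + [41]  # an extra, undeclared change
    print(check_edit(tampered, hash1, audio1[2:4], 2, r))  # False
```

Together the two constraints force audio two to equal audio one everywhere outside the interval (up to a hash collision), which is the "doubly constrained" property discussed next.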

Nicholas: Mm-hmm, mm-hmm. So the point being that the hash two is not a completely independent hash. It's connected and verifiably connected to hash one.

Eito Miyamura: Exactly. And it's, like-- it's doubly constrained, which limits the, like-- which basically limits it from being just any arbitrary hash.

Nicholas: Fascinating. I think that's really-- so that's polyhash.

Eito Miyamura: Yeah, that's polyhash. It actually comes from a lot of signal processing theory, where you, like, decompose signals into Fourier waves and stuff like that. Fun stuff from that field.

Nicholas: So we've now got this signature, or is it a signature? What do we get at the end of this provable computation process? We generate these polyhashes as we make a bunch of operations on our original file. We're kind of accumulating a bunch of hashes that are all provably connected to the original source material and to all the prior state transitions that we've executed upon the data. So at the end of the day, what do we have before we publish it?

Eito Miyamura: Yeah, so you have the final polyhash and the final edited audio. I mean, technically, you don't actually need the polyhash, because you can just hash the final edited audio to get the final polyhash. And you have a commitment of the operations applied. So you, as the person editing the audio, would also commit to saying, "Hey, I did these operations, and here is the corresponding zero-knowledge proof." And then you would verify that on a public verifier contract, which stores a previously committed public key. Sorry, actually, you would just have a verifier contract where you verify the proof of computation. Mm-hmm.
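
The statement such a proof attests to can be written out as a plain, non-zero-knowledge check. This Python sketch assumes a toy polynomial hash and zero-out ranges as the only committed operation; `EditClaim` and `relation_holds` are hypothetical names, not part of the original project. A verifier holding the original audio could run this check directly; a ZK proof lets the editor convince a verifier contract that the same relation holds while keeping the original audio private.

```python
from dataclasses import dataclass

# Hypothetical toy parameters; not the ZK Microphone implementation.
P = 2**61 - 1
R = 1_000_003


def poly_hash(samples) -> int:
    """Toy polynomial hash: sum_i a_i * R^i mod P."""
    h, power = 0, 1
    for a in samples:
        h = (h + a * power) % P
        power = (power * R) % P
    return h


def zero_out(samples, start, end):
    """Silence samples in [start, end)."""
    return [0 if start <= i < end else a for i, a in enumerate(samples)]


@dataclass(frozen=True)
class EditClaim:
    original_hash: int                # hash the recording device signed
    ops: tuple[tuple[int, int], ...]  # committed zero-out ranges [start, end)
    final_audio: tuple[int, ...]      # the published, edited audio


def relation_holds(claim: EditClaim, original_audio: list[int]) -> bool:
    """Public inputs: the claim. Private witness: original_audio."""
    if poly_hash(original_audio) != claim.original_hash:
        return False  # the witness is not the signed recording
    audio = list(original_audio)
    for start, end in claim.ops:
        audio = zero_out(audio, start, end)
    return audio == list(claim.final_audio)


original = [12, -7, 30, 5, -2, 18]
claim = EditClaim(poly_hash(original), ((2, 4),), tuple(zero_out(original, 2, 4)))
assert relation_holds(claim, original)

# An editor who slips in an extra, uncommitted change cannot satisfy the relation.
tampered = EditClaim(claim.original_hash, claim.ops, (12, -7, 0, 0, -2, 99))
assert not relation_holds(tampered, original)

# As Eito notes, the final polyhash can be recomputed from the final audio.
final_hash = poly_hash(claim.final_audio)
```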

Nicholas: And I guess you have the original signature from the original data as well, so you know that it came from some hardware device. Exactly.

Eito Miyamura: And you would have that committed on a-- like, yeah, public ledger, like a blockchain, which stores, like, the corresponding public key to the secret key that is stored in the HSM.
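
To illustrate the device-key half of the scheme, here is a toy Schnorr-style signature in Python using only the standard library, with a dictionary standing in for the on-chain mapping from a device to its public key. Everything here is an illustrative assumption: real HSMs and secure enclaves typically rely on standardized schemes such as ECDSA over P-256, and `REGISTRY`, `verify_recording`, the device IDs, and the group parameters are hypothetical. The group used (integers modulo the Mersenne prime 2^127 - 1) is far too weak for real use.

```python
import hashlib
import secrets

# Toy group: integers modulo the Mersenne prime 2**127 - 1. Insecure; illustration only.
P = 2**127 - 1
G = 3
ORDER = P - 1  # by Fermat, G**ORDER == 1 (mod P), so exponents reduce mod ORDER


def _challenge(r: int, pub: int, msg: bytes) -> int:
    data = r.to_bytes(16, "big") + pub.to_bytes(16, "big") + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def keygen() -> tuple[int, int]:
    """The secret key stays 'inside the HSM'; the public key gets published."""
    x = secrets.randbelow(ORDER - 1) + 1
    return x, pow(G, x, P)


def sign(x: int, msg: bytes) -> tuple[int, int]:
    k = secrets.randbelow(ORDER - 1) + 1  # fresh random nonce per signature
    r = pow(G, k, P)
    e = _challenge(r, pow(G, x, P), msg)
    return r, (k + e * x) % ORDER


def verify(pub: int, msg: bytes, sig: tuple[int, int]) -> bool:
    r, s = sig
    if not 0 < r < P:
        return False
    e = _challenge(r, pub, msg)
    return pow(G, s, P) == (r * pow(pub, e, P)) % P


# Hypothetical stand-in for a public ledger mapping device IDs to public keys.
REGISTRY: dict[str, int] = {}


def verify_recording(device_id: str, recording_hash: bytes, sig: tuple[int, int]) -> bool:
    pub = REGISTRY.get(device_id)
    return pub is not None and verify(pub, recording_hash, sig)


secret, public = keygen()
REGISTRY["mic-001"] = public  # hypothetical device ID
recording_hash = hashlib.sha256(b"raw audio bytes").digest()
sig = sign(secret, recording_hash)
assert verify_recording("mic-001", recording_hash, sig)
assert not verify_recording("mic-002", recording_hash, sig)  # unregistered device
```

The design point the sketch mirrors is that verification needs only the published key, so anyone can check a recording's origin without trusting the editor.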

Nicholas: And I suppose this kind of provable edit over an original HSM-attested recording could also be executed by people other than the original creator. So you could mutate a video or audio file from some official recording source, and people could verify that, you know, maybe you just clipped out some portions of it, but the pieces that are there do originate from some verifiable, attested source. It doesn't need to be the same person editing.

Eito Miyamura: Yeah, you can really verify the information supply chain, if you broadly define the information supply chain as going from using the sensor to take a photo or video or audio or whatever, to editing the audio, to publishing the audio. You can create a zero-knowledge proof on any part of the supply chain that involves editing. And then for the publishing end, you know, that just comes from game-theoretic incentives, which sort of keep it in check and keep it real.

Nicholas: Hmm. Fantastic. So are there any other parts to understand about the prototype as you built it at-- it was ETH Global Paris, right?

Eito Miyamura: Yeah, that's correct. I think the only other implementation detail was that it was stored in-- all the data was stored in IPFS, but I think that's basically it for the majority of it.

Nicholas: And does IPFS bring anything particular to it, or is it just a convenient place to store this data? It's accessible broadly.

Eito Miyamura: Yeah, it was just a convenient storage point. Oh, I think something that's maybe worth talking about is the fact that, if you want to bring this into production, since we have the hardware constraint issue that we talked about before, if someone actually wanted to build, you know, an Adobe After Effects equivalent, they would have to re-implement all of the circuits for each of the edit operations from first principles, in such a way that it's also performant. That's basically the huge blocker on being able to use this in a real setting. And also, you know, there's a network effect to particular audio formats and stuff like that. So the path towards this kind of verifiable media supply chain is definitely quite long, but at the rate that zero-knowledge proofs are progressing, I think we may have a chance at a bright future where we can fight deepfake AI at these particular levels.

Nicholas: So, short of re-implementing one of the major operating systems that Adobe or whatever editing software runs on for RISC-V: if you aren't able to run Mac OS or Windows or Linux and an editing software on top of it on RISC-V, then you would need to re-implement the editing functionality in Circom or something like that.

Eito Miyamura: Ah, sorry. RISC-V was a separate point, which was just about using something like RISC-V to do any general CPU computations. The point that I was making now is basically that, since RISC-V, at least right now, can't be depended on to just run Adobe, you're pretty much left with implementing it manually.

Nicholas: One thing that this all made me think about is that there are secure enclaves in all the iPhones and, is it TEEs in-- but essentially HSMs in all the Android phones, more or less. I don't know if they're doing it currently, but it would be, I think, trivial for Apple to hardware-sign every image, for example, that you take as a photo on your iPhone. And then it might not be so complicated for them to use that same hardware signing for these state transitions either. Again, it's unclear; I don't know the details of the stack, or whether it's possible for them to use their secure enclave without having to rewrite all of that software to run inside something that is more compatible with the provable computation stack. But you could imagine the iPhone generating signatures for all these state transitions. Does that seem like a possible path towards something like this, where it comes not from the traditional camera industry or something like that, but instead from these more computationally advanced devices that already have all these HSMs built in? Maybe it's possible sooner than we think.

Eito Miyamura: Yeah, I think absolutely. So actually, the funny story is that after the ZK Microphone hack, we immediately went and asked that question: can we use the Apple Secure Enclave? The answer for external developers was no. Access to it is limited to the point where we wouldn't be able to design anything useful. I think there's basically no way Apple isn't working on this. The question to ask is: are they going to take a more open-source approach, or a more C2PA approach, where they get to dictate what is real and what is fake? Or is it more commercially advantageous for them to keep it open source and keep it legit? I think that's a non-obvious question that we haven't quite been able to answer, but I imagine that they are working on something to fight the deepfake problem.

Nicholas: It is pretty cool, once you have that whole ecosystem, presuming that they do it in a legitimate and ideally open-source way, or at the very least a verifiable way, that each time you purchase a new device, it becomes associated with your Apple identity, and then any data that you generate on it or manipulations that you make to files can be associated with your identity. And, if you choose to reveal it, you can prove that you made those state transitions or generated that data, attesting to its origin and its creator. It would be pretty cool. Yeah, absolutely.

Eito Miyamura: I mean, you know, maybe ask the question: what proportion of all media that exists on the internet was taken by an iPhone? It's probably fairly non-trivial. In fact, maybe it's the largest category, maybe excluding deepfakes, since I anticipate that there are now more deepfake images on the internet than real images taken by real hardware.

Nicholas: I suppose we're just in the acceleration of that exponential towards the shift in the balance of the origin of content towards generative AI. It does make me wonder also about the reverse situation, which is the plausible deniability that, I don't know, what was the, a few years ago, 4chan or someone hacked a bunch of people's iCloud accounts and released private images from celebrities' camera rolls. You could imagine preferring the plausible deniability that, oh, it's actually a deepfake or generative, rather than generated on a device that is known to be associated with me, which maybe you lose with hardware-attested data generation.

Eito Miyamura: Yeah. I have no idea how this is going to play out. I mean, I think the other major thing to keep an eye on is how the 2024 presidential elections will play out. If you just look back to 2016 or 2020 and at how much misinformation and political maneuvering by external countries like Russia was involved, I think there's absolutely no doubt that both China and Russia are ready to launch a mass deepfake, like mass media, attack on the United States. And yeah, it's a pretty scary thought when you really think about the full implications of deepfakes and plausible deniability, and all of these things. I mean, Trump can do something really shitty, for example, and just say, "Yeah, that was a deepfake, not me," and vice versa. So I think we're entering a very scary time. I hope someone, whether it be C2PA or Apple or something, has a great solution to this problem.

Nicholas: Yeah, I mean, it does also make me think that the technology definitely has applications and will be useful for orienting ourselves with regard to media and data. But at the same time, at the end of the day, people can be presented with facts and disagree with them for ideological reasons, or come up with reasons why they don't trust them. Oh, you know, take it: okay, so it's a photo taken on an Apple phone, and Apple is in bed with China, something, something, something, so why should I trust even an attestation? So I don't know if, at the end of the day, our ideological commitments are maybe more powerful than any kind of hardware attestation, at least at the level of public discourse. Maybe state actors, security agencies, and corporations put more trust into this kind of technological solution than individuals on Facebook do.

Eito Miyamura: Yeah, I have zero clue how this is going to play out. I think there's a fairly non-trivial chance that there are some technologies, like maybe new versions of Stable Diffusion or some breakthrough deepfake, I don't know, cyber weapons, that are ready to be deployed come election time. And that is one of the scariest things to me. But yeah.

Nicholas: You did this project with collaborators, Alex Chima, Manoj Mishra, who are both also at HomeDAO, which you're a co-founder of. I wondered if you could share a little bit about HomeDAO, what it is and how it got started.

Eito Miyamura: So just to be clear, I'm not a founder of HomeDAO, just one of the founding members. It all started at Oxford University, where there was a very entrepreneurial group of people, specifically within the Web3 space, and we basically all got together. There was one hacker house that existed already, where basically four guys lived together, and of the four of them, three venture-backed startups came out the previous year. And Josh, who's the main founder, basically said, "Hey, I think this should be scalable, just given how many talented people there are and how much opportunity there is in this space." And then, yeah, he basically started commercializing and expanding the number of hacker houses. To this day, I think we've collectively won around 11 hackathons in a row between just a small number of us. I think there are 20 of us or so, with seven, no, maybe eight or nine, venture-backed companies that came out. Wow.

Nicholas: All in, kind of, crypto space or other subjects? Yeah.

Eito Miyamura: Mostly, I think, all in the crypto space. One of them was backed by Y Combinator. And then there are also a few that are about to come out. Yeah. But I don't think I can talk about them here.

Nicholas: All right. And largely in the cryptography or ZK parts of the world, this kind of math driven technology or a variety of different things, I guess?

Eito Miyamura: I think a variety of different things. You know, there were NFT AMMs. There is an account abstraction wallet. There are hardware acceleration and very deep compiler technologies, stuff like that. So there's a very wide range of different projects coming out of HomeDAO. And, yeah, it was basically an attempt at: okay, can we create a California-esque hacker house environment adjacent to a university that pumps out talent? And so far, I think the answer has been yes, and it's been a success. But, yeah, I think it'll be interesting to see how it evolves over the years.

Nicholas: Very cool. So if people are in Oxford, maybe they should reach out and get to know the HomeDAO people. They're doing stuff in March, so that's coming up really soon. And, yeah, I was curious, as the last question: what's the crypto community like in Oxford? What's the builder feeling? Are people interested? And how does it feel to be a builder there?

Eito Miyamura: Yeah, I think there's something interesting about Oxford. In the UK, unlike at US universities, you tend to be specialized in one subject, very much so. And so you have this effect where your average computer science or very technical person is a lot more technical than your average, let's say, American CS student. So you have a lot of deep expertise, but minus the very entrepreneurial or very dynamic American spirit. Almost because of that, you tend to have very naturally driven people who are the builders in the space. In some ways, because it's not an environment where the default is to found companies, and instead it's to go into consulting or banking or whatever, the people who fall out of that tend to be extremely strong-minded and entrepreneurial. I suppose you can call them naturals, rather than people influenced by their surroundings. Mm-hmm.

Nicholas: They're not aspiring to the cultural paradigm. They're actually breaking it. Yeah, precisely.

Eito Miyamura: And, you know, you need that almost to break out of the much more rigid structures. So, yeah, I think you have this very interesting effect where, like, because the UK is definitely a lot more conservative than the US, the people who do decide to do something very different are very different and very strongly opinionated and have very high convictions.

Nicholas: That's great. And, yeah, there's the education system, where you're specializing much earlier on: in university, you go into a specific program, and there's no period of time where you're choosing a program or anything like that.

Eito Miyamura: Yeah, and that definitely makes you way more hardcore and, in my opinion, way more competent in the particular technological fields. The advantage of the American system is obviously the flexibility, as well as your capability to be a generalist, formed by the education. That's not to say that there aren't any generalists in Oxford. It's just a numbers game, based on how the education system works.

Nicholas: Fascinating. Well, ETH London is coming up, so maybe some listeners will be there and be able to meet you and talk more about ZK Microphone. Eito, thank you so much for coming through and telling us all about hardware-attested recording and editing. It's a fascinating subject.

Eito Miyamura: Thank you so much, Nicholas. It's been really fun talking to you.

Nicholas: Yeah, if people want to check out more or follow your projects, where's a good place for them to go?

Eito Miyamura: I think my Twitter would be the best place to look. These days, I'm not actually working on ZK Microphone, but I have something potentially even more interesting to bring up. And so, when I can talk about them, I think that will be something really cool.


Nicholas: Awesome. Happy New Year and talk to you soon. Thanks everybody.

Eito Miyamura: Talk to you soon.

Nicholas: Bye. Hey, thanks for listening to this episode of Web3 Galaxy Brain. To keep up with everything Web3, follow me on Twitter @Nicholas with four leading N's. You can find links to the topics discussed on today's episode in the show notes. Podcast feed links are available at web3galaxybrain.com. Web3 Galaxy Brain airs live most Friday afternoons at 5:00 PM Eastern Time, 22:00 UTC, on Twitter Spaces. I look forward to seeing you there.
