## Introduction

#### Vitalik Buterin


Blockchains are a powerful technology, as regular readers of the blog already likely agree. They allow a large number of interactions to be codified and carried out in a way that greatly increases reliability, removes business and political risks associated with the process being managed by a central entity, and reduces the need for trust. They create a platform on which applications from different companies and even of different types can run together, allowing for extremely efficient and seamless interaction, and leave an audit trail that anyone can check to make sure that everything is being processed correctly.

However, when I and others talk to companies about building their applications on a blockchain, two primary issues always come up: scalability and privacy. Scalability is a serious problem; current blockchains, processing 3-20 transactions per second, are several orders of magnitude away from the amount of processing power needed to run mainstream payment systems or financial markets, much less decentralized forums or global micropayment platforms for IoT. Fortunately, there are solutions, and we are actively working on implementing a roadmap to making them happen. The other major problem that blockchains have is privacy. As seductive as a blockchain’s other advantages are, neither companies nor individuals are particularly keen on publishing all of their information onto a public database that can be arbitrarily read without any restrictions by one’s own government, foreign governments, family members, coworkers and business competitors.

Unlike with scalability, the solutions for privacy are in some cases easier to implement (though in other cases much, much harder), and many of them are compatible with currently existing blockchains, but they are also much less satisfying. It’s much harder to create a “holy grail” technology which allows users to do absolutely everything that they can do right now on a blockchain, but with privacy; instead, developers will in many cases be forced to contend with partial solutions, heuristics and mechanisms that are designed to bring privacy to specific classes of applications.

### The Holy Grail

First, let us start off with the technologies that are holy grails, in that they actually do offer the promise of converting arbitrary applications into fully privacy-preserving applications, allowing users to benefit from the security of a blockchain, using a decentralized network to process the transactions, but “encrypting” the data in such a way that even though everything is being computed in plain sight, the underlying “meaning” of the information is completely obfuscated.

The most powerful technology that holds promise in this direction is, of course, cryptographically secure obfuscation. In general, obfuscation is a way of turning any program into a “black box” equivalent of the program, in such a way that the program still has the same “internal logic”, and still gives the same outputs for the same inputs, but it’s impossible to determine any other details about how the program works.

*Think of it as “encrypting” the wires inside of the box in such a way that the encryption cancels itself out and ultimately has no effect on the output, but does have the effect of making it absolutely impossible to see what is going on inside.*

Unfortunately, absolutely perfect black-box obfuscation is mathematically known to be impossible; it turns out that there is always at least something that you can extract out of a program by looking at it beyond just the outputs that it gives on a specific set of inputs. However, there is a weaker standard called indistinguishability obfuscation that we can satisfy: essentially, given two *equivalent* programs that have been obfuscated using the algorithm (eg. x = (a + b) * c and x = (a * c) + (b * c)), one cannot determine which of the two outputs came from which original source. To see how this is still powerful enough for our applications, consider the following two programs:

One just returns zero, and the other uses an internally contained private key to cryptographically sign a message, does that same operation another time, subtracts the (obviously identical) results from each other and returns the result, which is guaranteed to be zero. Even though one program just returns zero, and the other **contains and uses a cryptographic private key**, if indistinguishability is satisfied then we know that the two obfuscated programs cannot be distinguished from each other, and so someone in possession of the obfuscated program definitely has no way of extracting the private key; otherwise, that would be a way of distinguishing the two programs. That’s some pretty powerful obfuscation right there, and for about two years we’ve known how to do it!
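As a rough illustration, the two programs described above might look like the following in Python. The embedded key and the `sign` helper are hypothetical; `sign` uses HMAC-SHA256 as a deterministic stand-in for a real signature scheme, which is enough to show why the second program always returns zero.

```python
import hashlib
import hmac

# Hypothetical key baked into the second program; an obfuscated program
# would hide a genuine signing key in the same position.
PRIVKEY = b"hypothetical-embedded-private-key"

def sign(privkey: bytes, message: int) -> int:
    """Deterministic stand-in for a signature (HMAC-SHA256 read as an integer)."""
    digest = hmac.new(privkey, str(message).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def program_one() -> int:
    # The trivial program: always returns zero.
    return 0

def program_two() -> int:
    # Signs the same message twice and subtracts the identical results.
    return sign(PRIVKEY, 0) - sign(PRIVKEY, 0)
```

Both functions compute the same thing for every input (there are none), so they are equivalent programs in the sense that indistinguishability obfuscation requires.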

So, how do we use this on a blockchain? Here’s one simple approach for a digital token. We create an obfuscated smart contract which contains a private key, and accepts instructions encrypted with the corresponding public key. The contract stores account balances in storage in encrypted form; if the contract wants to read the storage it decrypts it internally, and if the contract wants to write to storage it encrypts the desired result before writing it. If someone wants to read the balance of their account, then they encode that request as a transaction, and simulate it on their own machine. The obfuscated smart contract code will check the signature on the transaction to see if that user is entitled to read the balance, and if they are entitled to read the balance it will return the decrypted balance; otherwise the code will return an error, and the user has no way of extracting the information.

However, as with several other technologies of this type, there is one problem: the mechanism for doing this kind of obfuscation is horrendously inefficient. Billion-factor overhead is the norm, and often even that is highly optimistic; a recent paper estimates that “executing [a 2-bit multiplication] circuit on the same CPU would take 1.3 * 10^8 years”. Additionally, if you want to prevent reads and writes to storage from being a data leak vector, you must also set up the contract so that read and write operations always modify large portions of a contract’s entire state, which is another source of overhead. When, on top of that, you have the overhead of hundreds of nodes running the code on a blockchain, one can quickly see how this technology is, unfortunately, not going to change anything any time soon.

### Taking A Step Down

However, there are two branches of technology that can get you almost as far as obfuscation, though with important compromises to the security model. The first is secure multi-party computation. Secure multi-party computation allows for a program (and its state) to be split among N parties in such a way that you need M of them (eg. N = 9, M = 5) to cooperate in order to either complete the computation or reveal any internal data in the program or the state. Thus, if you can trust the majority of the participants to be honest, the scheme is as good as obfuscation. If you can’t, then it’s worthless.

The math behind secure multi-party computation is complex, but much simpler than obfuscation; if you are interested in the technical details, then you can read more here (and also the paper of Enigma, a project that seeks to actually implement the secret sharing DAO concept, here). SMPC is also much more efficient than obfuscation, to the point that you can carry out practical computations with it, but even still the inefficiencies are very large. Addition operations can be processed fairly quickly, but every time an SMPC instance performs some very small fixed number of multiplication operations it needs to perform a “degree reduction” step involving messages being sent from every node to every node in the network. Recent work reduces the communication overhead from quadratic to linear, but even still every multiplication operation brings a certain unavoidable level of network latency.
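To see why addition is cheap and multiplication is not, here is a minimal sketch assuming the parties hold Shamir secret shares (one common basis for SMPC, though not necessarily the one any particular project uses). The modulus and the helper names are illustrative choices. Adding shares pointwise keeps the polynomial degree at M - 1, so any 5 of 9 parties can still recover the sum; multiplying shares pointwise doubles the degree to 2(M - 1) = 8, so all 9 shares are needed, which is what a degree reduction step exists to repair.

```python
import secrets

PRIME = 2**127 - 1  # toy field modulus (a Mersenne prime), chosen for illustration

def share(secret, m, n):
    """Split `secret` into n Shamir shares on a random degree-(m-1) polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange-interpolate the shared polynomial at x = 0."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse of den via Fermat's little theorem.
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

def add_shares(a, b):
    """Each party adds its two shares locally; no communication needed."""
    return [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]

def mul_shares(a, b):
    """Each party multiplies its two shares locally; the degree doubles."""
    return [(x, ya * yb % PRIME) for (x, ya), (_, yb) in zip(a, b)]

a_shares = share(20, 5, 9)
b_shares = share(22, 5, 9)
total = recover(add_shares(a_shares, b_shares)[:5])  # any 5 shares suffice
product = recover(mul_shares(a_shares, b_shares))    # needs all 9 shares
```

Without degree reduction, a second multiplication would push the degree past what nine parties can interpolate at all, which is why the extra communication round cannot simply be skipped.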

The requirement of trust on the participants is also an onerous one; note that, as is the case with many other applications, the participants have the ability to save the data and then collude to uncover it at any future point in history. Additionally, it is impossible to tell that they have done this, and so it is impossible to incentivize the participants to maintain the system’s privacy. For this reason, secure multi-party computation is arguably much more suited to private blockchains, where incentives can come from outside the protocol, than to public chains.

Another kind of technology that has very powerful properties is zero-knowledge proofs, and specifically the recent developments in “succinct arguments of knowledge” (SNARKs). Zero-knowledge proofs allow a user to construct a mathematical proof that a given program, when executed on some (possibly hidden) input known by the user, has a particular (publicly known) output, without revealing any other information. There are many specialized types of zero-knowledge proofs that are fairly easy to implement; for example, you can think of a digital signature as a kind of zero-knowledge proof showing that you know the value of a private key which, when processed using a standard algorithm, can be converted into a particular public key. ZK-SNARKs, on the other hand, allow you to make such a proof for any function.
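The “specialized” end of that spectrum can be sketched in a few lines. The following is a Schnorr-style proof of knowledge of a private key, made non-interactive by hashing; it is offered as an example of the signature-like proofs mentioned above, not as a SNARK. The modulus, base and function names are assumptions picked for readability, and the parameters are far too small for real use.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus, not secure
G = 3            # assumed base element
ORDER = P - 1    # exponents can be reduced mod P - 1 (Fermat's little theorem)

def challenge(*values):
    """Hash the public values into a challenge (Fiat-Shamir style)."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER

def prove(private_key):
    """Prove knowledge of private_key for public_key = G^private_key mod P."""
    public_key = pow(G, private_key, P)
    k = secrets.randbelow(ORDER)          # one-time blinding value
    commitment = pow(G, k, P)
    c = challenge(public_key, commitment)
    response = (k + c * private_key) % ORDER
    return public_key, (commitment, response)

def verify(public_key, proof):
    """Check G^response == commitment * public_key^c without learning the key."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == commitment * pow(public_key, c, P) % P
```

The verifier only ever sees the public key, the commitment and the response; the random `k` masks the private key inside the response.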

First, let us go through some specific examples. One natural use case for the technology is in identity systems. For example, suppose that you want to prove to a system that you are (i) a citizen of a given country, and (ii) over 19 years old. Suppose that your government is technologically progressive, and issues cryptographically signed digital passports, which include a person’s name and date of birth as well as a private and public key. You would construct a function which takes a digital passport and a signature signed by the private key in the passport as input, and outputs 1 if all of the following hold: (i) the date of birth is before 1996, (ii) the passport was signed with the government’s public key, and (iii) the signature is correct; it outputs 0 otherwise. You would then make a zero-knowledge proof showing that you have an input that, when passed through this function, returns 1, and sign the proof with another private key that you want to use for your future interactions with this service. The service would verify the proof, and if the proof is correct it would accept messages signed with your private key as valid.

You could also use the same scheme to verify more complex claims, like “I am a citizen of this country, and my ID number is not in this set of ID numbers that have already been used”, or “I have had favorable reviews from some merchants after purchasing at least $10,000 worth of products from them”, or “I hold assets worth at least $250,000”.

Another category of use cases for the technology is digital token ownership. In order to have a functioning digital token system, you do not strictly need to have visible accounts and balances; in fact, all that you need is a way to solve the “double spending” problem: if you have 100 units of an asset, you should be able to spend those 100 units once, but not twice. With zero-knowledge proofs, we can of course do this; the claim that you would zero-knowledge-prove is something like “I know a secret number behind one of the accounts in this set of accounts that have been created, and it does not match any of the secret numbers that have already been revealed”. Accounts in this scheme become one-time-use: an “account” is created every time assets are sent, and the sender account is completely consumed. If you do not want to completely consume a given account, then you must simply create two accounts, one controlled by the recipient and the other with the remaining “change” controlled by the sender themselves. This is essentially the scheme used by Zcash (see more about how it works here).
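The bookkeeping side of this scheme can be sketched without any zero-knowledge machinery. In the toy ledger below, every name is hypothetical and amounts are left out entirely; most importantly, `spend` is handed the secret directly, which is a loud stand-in for verifying a zero-knowledge proof. A real system would never see the secret, only a proof plus the revealed value.

```python
import hashlib

def h(tag: bytes, secret: bytes) -> str:
    return hashlib.sha256(tag + secret).hexdigest()

class ToyLedger:
    def __init__(self):
        self.commitments = set()  # one entry per account ever created
        self.revealed = set()     # one entry per account already spent

    def create_account(self, secret: bytes):
        # Only a hash of the secret is published.
        self.commitments.add(h(b"commit", secret))

    def spend(self, secret: bytes, new_secrets):
        # STAND-IN: a real scheme verifies a zero-knowledge proof here
        # instead of receiving `secret` in the clear.
        if h(b"commit", secret) not in self.commitments:
            return False          # no such account was ever created
        spent_tag = h(b"spent", secret)
        if spent_tag in self.revealed:
            return False          # double spend
        self.revealed.add(spent_tag)
        for s in new_secrets:     # recipient account plus optional change
            self.create_account(s)
        return True
```

The published `spent_tag` is derived from the secret with a different hash prefix than the commitment, so observers cannot match a spend to the account it consumed just by comparing the two sets.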

For two-party smart contracts (eg. think of something like a financial derivative contract negotiated between two parties), the application of zero-knowledge proofs is fairly easy to understand. When the contract is first negotiated, instead of creating a smart contract containing the actual formula by which the funds will eventually be released (eg. in a binary option, the formula would be “if index I as released by some data source is greater than X, send everything to A, otherwise send everything to B”), create a contract containing the *hash of the formula*. When the contract is to be closed, either party can themselves compute the amount that A and B should receive, and provide the result alongside a zero-knowledge proof that a formula with the correct hash provides that result. The blockchain finds out how much A and B each put in, and how much they get out, but not why they put in or get out that amount.

This model can be generalized to N-party smart contracts, and the Hawk project is seeking to do exactly that.

### Beginning from the Other End: Low-Tech Approaches

The other path to take when trying to increase privacy on the blockchain is to start with very low-tech approaches, using no crypto beyond simple hashing, encryption and public key cryptography. This is the path that Bitcoin started from in 2009; though the level of privacy that it provides in practice is limited and quite difficult to quantify, it still clearly provided some value.

The simplest step that Bitcoin took to somewhat increase privacy is its use of one-time accounts, similar to Zcash, in order to store funds. Just like with Zcash, every transaction must *completely empty* one or more accounts, and *create* one or more new accounts, and it is recommended for users to generate a new private key for every new account that they intend to receive funds into (though it is possible to have multiple accounts with the same private key). The main benefit that this brings is that a user’s funds are not linked to each other by default: if you receive 50 coins from source A and 50 coins from source B, there is no way for other users to tell that those funds belong to the same person. Additionally, if you spend 13 coins to someone else’s account C, and thereby create a fourth account D where you send the remaining 37 coins from one of these accounts as “change”, the other users cannot even tell which of the two outputs of the transaction is the “payment” and which is the “change”.
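The 50/13/37 example above can be written out as a tiny state transition. This is a simplified sketch with a hypothetical `apply_transaction` helper; it treats A and B as the accounts holding the two 50-coin receipts, and ignores signatures and fees.

```python
def apply_transaction(accounts, inputs, outputs):
    """Consume every input account entirely and create the output accounts.

    accounts: dict mapping account name -> balance
    inputs:   list of account names to empty
    outputs:  dict mapping new account name -> amount
    """
    if any(name not in accounts for name in inputs):
        raise ValueError("unknown or already spent input")
    if any(name in accounts for name in outputs):
        raise ValueError("output accounts must be new")
    if sum(accounts[name] for name in inputs) != sum(outputs.values()):
        raise ValueError("inputs and outputs must balance")
    remaining = {k: v for k, v in accounts.items() if k not in inputs}
    remaining.update(outputs)
    return remaining

accounts = {"A": 50, "B": 50}
# Pay 13 coins to C; the other 37 go to a fresh change account D.
accounts = apply_transaction(accounts, ["A"], {"C": 13, "D": 37})
```

Nothing in the resulting state marks C as the payment and D as the change; an observer sees only two new accounts.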

However, there is a problem. If, at any point in the future, you make a transaction consuming from two accounts at the same time, then you irreversibly “link” those accounts, making it obvious to the world that they come from one user. And, what’s more, these linkages are transitive: if, at any point, you link together A and B, and then at any other point link together A and C, and so forth, then you’ve created a large amount of evidence by which statistical analysis can link up your entire set of assets.
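The transitivity is what makes this so damaging, and it is easy to see in code. Below is a minimal sketch of the clustering an outside analyst could run, using a union-find structure over the input lists of observed transactions. The function name and data layout are assumptions, and real chain analysis uses many more heuristics than this one.

```python
def cluster_accounts(transactions):
    """Group accounts that were ever spent together, directly or transitively.

    transactions: list of input lists, one list of account names per transaction.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for inputs in transactions:
        if not inputs:
            continue
        first = find(inputs[0])
        for other in inputs[1:]:
            parent[find(other)] = first    # co-spent inputs share an owner

    clusters = {}
    for account in list(parent):
        clusters.setdefault(find(account), set()).add(account)
    return sorted(sorted(c) for c in clusters.values())

# A and B are spent together once, A and C another time; E is never merged.
print(cluster_accounts([["A", "B"], ["A", "C"], ["E"]]))
```

B and C never appear in the same transaction, yet they end up in the same cluster because both were linked to A.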

Bitcoin developer Mike Hearn came up with a mitigation strategy that reduces the likelihood of this happening, called merge avoidance: essentially, a fancy term for trying really, really hard to minimize the number of times that you link accounts together by spending from them at the same time. This definitely helps, but even still, privacy inside the Bitcoin system has proven to be highly porous and heuristic, with nothing even close to approaching strong guarantees.

A somewhat more advanced technique is called CoinJoin. Essentially, the CoinJoin protocol works as follows:

- N parties come together over some anonymous channel, eg. Tor. They each provide a destination address D[1] ... D[N].
- One of the parties creates a transaction which sends one coin to each destination address.
- The N parties log out and then separately log in to the channel, and each contribute one coin to the account that the funds will be paid out from.
- If N coins are paid into the account, they are distributed to the destination addresses, otherwise they are refunded.

If all participants are honest and provide one coin, then everyone will put one coin in and get one coin out, but **no one will know which input maps to which output**. If at least one participant does not put one coin in, then the process will fail, the coins will get refunded, and all of the participants can try again. An algorithm similar to this was implemented by Amir Taaki and Pablo Martin for Bitcoin, and by Gavin Wood and Vlad Gluhovsky for Ethereum.
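The all-or-nothing payout rule is the core of the protocol and can be simulated in a few lines. The sketch below uses hypothetical names and models only the accounting; the anonymous channel, the separate logins and the actual transaction signing are outside its scope, and the shuffle merely stands in for the fact that the order of destinations carries no information about who paid in.

```python
import random

def coinjoin(destinations, contributions, rng=random):
    """Pay one coin to every destination only if all N parties paid one coin.

    destinations:  list of N destination addresses
    contributions: dict mapping participant -> coins paid into the pool
    """
    n = len(destinations)
    everyone_paid = (len(contributions) == n
                     and all(coins == 1 for coins in contributions.values()))
    if not everyone_paid:
        # Someone did not pay: hand each contribution back and retry later.
        return {"status": "refunded", "payouts": dict(contributions)}
    outputs = list(destinations)
    rng.shuffle(outputs)  # output order reveals nothing about input order
    return {"status": "paid", "payouts": {addr: 1 for addr in outputs}}

print(coinjoin(["D1", "D2", "D3"], {"p1": 1, "p2": 1, "p3": 1})["status"])
print(coinjoin(["D1", "D2", "D3"], {"p1": 1, "p2": 1})["status"])
```

A participant who withholds their coin can stall a round but cannot steal from it, since a failed round refunds exactly what each party contributed.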

So far, we have only discussed token anonymization. What about two-party smart contracts? Here, we use the same mechanism as Hawk, except we substitute the cryptography with simpler cryptoeconomics, namely the “auditable computation” trick. The participants send their funds into a contract which stores the hash of the code. When it comes time to send out funds, either party can submit the result. The other party can either send a transaction to agree on the result, allowing the funds to be sent, or it can publish the actual code to the contract, at which point the code will run and distribute the funds correctly. A security deposit can be used to incentivize the parties to participate honestly. Hence, the system is private by default, and only if there is a dispute does any information get leaked to the outside world.
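A minimal Python model of this trick is shown below. The class, the `settle` function name and the sample formula are all hypothetical, the security deposit is omitted, and `exec` stands in for the chain’s own virtual machine; on a real blockchain the index value would also have to come from some agreed data source rather than a function argument.

```python
import hashlib

# The privately negotiated formula: a binary option on an index (threshold 100).
CODE = (
    "def settle(index, total):\n"
    "    if index > 100:\n"
    "        return {'A': total, 'B': 0}\n"
    "    return {'A': 0, 'B': total}\n"
)
CODE_HASH = hashlib.sha256(CODE.encode()).hexdigest()

class HashedContract:
    def __init__(self, code_hash, total):
        self.code_hash = code_hash  # only the hash is public
        self.total = total
        self.proposed = None
        self.payout = None

    def propose(self, result):
        self.proposed = result

    def agree(self):
        # Happy path: the counterparty accepts and the code stays private.
        self.payout = self.proposed

    def dispute(self, code, index):
        # Dispute path: the code is published, checked and executed.
        if hashlib.sha256(code.encode()).hexdigest() != self.code_hash:
            raise ValueError("published code does not match the stored hash")
        namespace = {}
        exec(code, namespace)
        self.payout = namespace["settle"](index, self.total)
```

In the happy path the chain only ever sees `CODE_HASH`, the deposits and the agreed payout; the formula becomes public only if one side forces a dispute.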

A generalization of this mechanism is called state channels, and also has scalability benefits alongside its improvements in privacy.

### Ring Signatures

A technology which is moderately technically complicated, but extremely promising for both token anonymization and identity applications, is ring signatures. A ring signature is essentially a signature that proves that the signer has a private key corresponding to one of a specific set of public keys, without revealing which one. The two-sentence explanation for how this works mathematically is that a ring signature algorithm includes a mathematical function which can be computed normally with just a public key, but where knowing the private key allows you to add a seed to the input to make the output be whatever specific value you want. The signature itself consists of a list of values, where each value is set to the function applied to the previous value (plus some seed); producing a valid signature requires using knowledge of a private key to “close the loop”, forcing the last value that you compute to equal the first. Given a valid “ring” produced in this way, anyone can verify that it is indeed a “ring”, meaning that each value is equal to the function computed on the previous value plus the given seed, but there is no way to tell at which “link” in the ring a private key was used.
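To make the “close the loop” idea concrete, here is a toy ring signature built from Schnorr-style steps (in the spirit of the well-known Abe-Ohkubo-Suzuki construction). This is an illustrative assumption on my part, not necessarily the construction used by any particular implementation, and the group parameters are far too small to be secure. The per-member values `s[i]` play the role of the seeds, and the signer’s private key is what lets them pick the one seed that makes the chain of hashes return to its starting value.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus, not secure
G = 3            # assumed base element
ORDER = P - 1    # exponents can be reduced mod P - 1 (Fermat's little theorem)

def h(message, value):
    data = f"{message}|{value}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER

def keygen():
    x = secrets.randbelow(ORDER)
    return x, pow(G, x, P)

def ring_sign(message, public_keys, signer_index, private_key):
    n = len(public_keys)
    e, s = [0] * n, [0] * n
    k = secrets.randbelow(ORDER)
    i = (signer_index + 1) % n
    e[i] = h(message, pow(G, k, P))          # start the chain after the signer
    while i != signer_index:
        s[i] = secrets.randbelow(ORDER)      # random seed for a non-signer link
        nxt = (i + 1) % n
        e[nxt] = h(message, pow(G, s[i], P) * pow(public_keys[i], e[i], P) % P)
        i = nxt
    # Close the loop: only the private key lets us solve for this seed.
    s[signer_index] = (k - private_key * e[signer_index]) % ORDER
    return e[0], s

def ring_verify(message, public_keys, signature):
    e0, s = signature
    e = e0
    for y, si in zip(public_keys, s):
        e = h(message, pow(G, si, P) * pow(y, e, P) % P)
    return e == e0                           # the chain must return to its start
```

Verification walks the ring once and treats every link identically, which is why the output gives no hint about which position held the private key.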

There is also an upgraded version of a ring signature called a **linkable ring signature**, which adds an extra property: if you sign twice with the same private key, that fact can be detected, but no other information is revealed. In the case of token anonymization, the application is fairly simple: when a user wants to spend a coin, instead of having them provide a regular signature to prove ownership of their public key directly, we combine public keys together into groups, and ask the user to simply prove membership in the group. Because of the linkability property, a user that has one public key in a group can only spend from that group once; conflicting signatures are rejected.

Ring signatures can also be used for voting applications: instead of using ring signatures to validate spending from a set of coins, we use them to validate votes. They can also be used for identity applications: if you want to prove that you belong to a set of authorized users, without revealing which one, ring signatures are well-suited for just that. Ring signatures are more mathematically involved than simple signatures, but they are quite practical to implement; some sample code for ring signatures on top of Ethereum can be found here.

### Secret Sharing and Encryption

Sometimes, blockchain applications are not trying to mediate the transfer of digital assets, or record identity information, or process smart contracts, and are instead being used for more data-centric applications: timestamping, high-value data storage, proof of existence (or proof of inexistence, as in the case of certificate revocations), etc. A common refrain is the idea of using blockchains to build systems where “users are in control of their own data”.

In these cases, it is once again important to note that blockchains do NOT solve privacy issues, and are an authenticity solution only. Hence, putting medical records in plaintext onto a blockchain is a Very Bad Idea. However, they can be combined with other technologies that do offer privacy in order to create a holistic solution for many industries that does accomplish the desired goals, with blockchains being a vendor-neutral platform where some data can be stored in order to provide authenticity guarantees.

So what are these privacy-preserving technologies? Well, in the case of simple data storage (eg. medical records), we can just use the simplest and oldest one of all: encryption! Documents that are hashed on the blockchain can first be encrypted, so even if the data is stored on something like IPFS only the user with their own private key can see the documents. If a user wants to grant someone else the right to view some specific records in decrypted form, but not all of them, one can use something like a deterministic wallet to derive a different key for each document.
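The per-document key idea can be sketched with nothing more than a keyed hash. The helper below is a simplified stand-in for deterministic-wallet derivation (which in practice follows a more elaborate scheme); the function name, the master key and the document identifiers are hypothetical.

```python
import hashlib
import hmac

def document_key(master_key: bytes, document_id: str) -> bytes:
    """Derive a 32-byte key for one document from the user's master key."""
    return hmac.new(master_key, document_id.encode(), hashlib.sha256).digest()

master = b"hypothetical-user-master-key"
xray_key = document_key(master, "xray-2016-01")
blood_key = document_key(master, "bloodwork-2016-01")
```

Handing `xray_key` to a doctor lets them decrypt that one record; because HMAC is one-way, it does not help them derive `blood_key` or recover the master key, and the user can always regenerate any document key from the master key alone.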

Another useful technology is secret sharing (described in more detail here), allowing a user to encrypt a piece of data in such a way that M of a given N users (eg. M = 5, N = 9) can cooperate to decrypt the data, but no fewer.
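For the M = 5, N = 9 example, a minimal sketch using Shamir’s scheme (the classic way to build M-of-N secret sharing) could look like this. The field modulus and function names are illustrative assumptions; in practice the shared secret would usually be an encryption key for the data rather than the data itself.

```python
import random
import secrets

PRIME = 2**127 - 1  # toy field modulus; the secret must be smaller than this

def split_secret(secret, m, n):
    """Hide `secret` as the constant term of a random degree-(m-1) polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def recover_secret(shares):
    """Interpolate the polynomial at x = 0 from at least m distinct shares."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

key = 0xC0FFEE                            # stand-in for a document key
shares = split_secret(key, m=5, n=9)      # one share per user
subset = random.sample(shares, 5)         # any five users cooperate
recovered = recover_secret(subset)
```

With only four shares the interpolation still returns a number, but it is unrelated to the key: four points leave the degree-4 polynomial’s constant term completely undetermined.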

### The Future of Privacy

There are two major challenges with privacy-preserving protocols in blockchains. One of the challenges is statistical: in order for any privacy-preserving scheme to be computationally practical, the scheme must only alter a small part of the blockchain state with every transaction. However, even if the *contents* of the alteration are private, there will inevitably be some amount of *metadata* that is not. Hence, statistical analyses will always be able to figure out *something*; at the least, they will be able to fish for patterns of *when* transactions take place, and in many cases they will be able to narrow down identities and figure out who interacts with whom.

The second challenge is the developer experience challenge. Turing-complete blockchains work very well for developers because they are very friendly to developers that are completely clueless about the underlying mechanics of decentralization: they create a decentralized “world computer” which looks just like a centralized computer, in effect saying “look, developers, you can code what you were planning to code already, except that this new layer at the bottom will now make everything magically decentralized for you”. Of course, the abstraction is not perfect: high transaction fees, high latency, gas and block reorganizations are something new for programmers to contend with, but the barriers are not *that* large.

With privacy, as we see, there is no such magic bullet. While there are *partial solutions* for specific use cases, and often these partial solutions offer a high degree of flexibility, the abstractions that they present are quite different from what developers are used to. It’s not trivial to go from “10-line python script that has some code for subtracting X coins from the sender’s balance and adding X coins to the recipient’s balance” to “highly anonymized digital token using linkable ring signatures”.

Projects like Hawk are very welcome steps in the right direction: they offer the promise of converting an arbitrary N-party protocol into a zero-knowledge-ified protocol that trusts only the blockchain for authenticity, and one specific party for privacy, essentially combining the best of both worlds of a centralized and a decentralized approach. Can we go further, and create a protocol that trusts zero parties for privacy? This is still an active research direction, and we’ll just have to wait and see how far we can get.