A detailed but not overly-technical description of how blockchain analysis works and the implications of Taproot on Bitcoin user privacy.
Much like mining, privacy is one of the most important yet hardest-to-understand topics in Bitcoin, especially if you don’t work in the sector. Tracking funds through blockchain analysis is a rapidly growing business, but most Bitcoiners likely don’t know what that entails.
With the Taproot activation process now appearing imminent, we have done our homework and discussed the impacts with experts in privacy and blockchain analysis. In this article, we’ll present what we learned about the future possibilities for Bitcoin’s privacy and scalability with Taproot.
If you’ve ever tried using a block explorer to follow some BTC back a few hops before you received it, you probably discovered that it was no easy task. This is for a good reason: BTC is more like a homogeneous digital liquid than a solid like physical bills and coins, meaning that individual currency units can’t be easily traced.
You see, the Bitcoin blockchain is a record of what’s known as Unspent Transaction Outputs (UTXOs). Once a UTXO is used as an input in another transaction, it cannot be used again (as it is no longer “unspent”), but it can still be seen on the blockchain. A Bitcoin wallet’s balance is the cumulative value of all these present UTXOs, but the individual satoshis in the balance are not actually distinguishable from one another.
However, there is an important caveat: the UTXO amounts are still distinguishable when they are used as inputs in transactions. In other words, a wallet with 10 UTXOs worth 1 BTC each is different from a wallet with 1 UTXO worth 10 BTC, even though the total balance is the same.
To illustrate the significance of that, let’s go through a quick example. Suppose you receive 1 BTC from each of 3 different senders, giving you 3 UTXOs worth a total of 3 BTC. Now, suppose you want to spend 2.5 BTC. That transaction would have 3 inputs (the 3 UTXOs) and 2 outputs — one fresh UTXO worth 2.5 BTC, and another worth 0.5 BTC minus transaction fees. This second UTXO for the leftover amount is commonly known as “change” — like the coins you receive back if you give a cashier more than you owe at checkout.
To put it another way, you can’t partially spend a UTXO and leave the rest untouched. Instead, you simply send any leftover amount from the UTXO back to yourself as a smaller UTXO. With this in mind, we can move on to discuss privacy and blockchain analysis.
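To make the example above concrete, here is a minimal Python sketch of the same spend. The `UTXO` class, the `spend` function, the sender labels, and the 10,000-satoshi fee are all our own assumptions for illustration; real wallets use more sophisticated coin selection.

```python
from dataclasses import dataclass

SATS_PER_BTC = 100_000_000


@dataclass(frozen=True)
class UTXO:
    txid: str   # hypothetical label standing in for a real transaction id
    index: int  # position of the output within that transaction
    value: int  # amount in satoshis


def spend(utxos, amount, fee):
    """Consume whole UTXOs until amount + fee is covered.

    Returns (inputs used, payment output value, change output value).
    """
    selected, total = [], 0
    for utxo in utxos:
        if total >= amount + fee:
            break
        selected.append(utxo)
        total += utxo.value
    if total < amount + fee:
        raise ValueError("insufficient funds")
    # A UTXO cannot be partially spent, so the leftover returns as a new output.
    change = total - amount - fee
    return selected, amount, change


# Three 1 BTC UTXOs received from three different senders.
wallet = [
    UTXO("sender_a", 0, 1 * SATS_PER_BTC),
    UTXO("sender_b", 0, 1 * SATS_PER_BTC),
    UTXO("sender_c", 1, 1 * SATS_PER_BTC),
]

# Spend 2.5 BTC with an assumed fee of 10,000 satoshis.
inputs, payment, change = spend(wallet, amount=250_000_000, fee=10_000)
print(len(inputs), payment, change)
```

All three UTXOs are consumed as inputs, and the change output is 0.5 BTC minus the fee, exactly as in the cashier analogy.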
Before diving into privacy, we should preface this with a disclaimer that blockchain analysis is not inherently bad. Many of the metrics measuring user adoption, investor sentiment, and other interesting insights are made possible by analyzing the public ledger. In fact, we recently published an article about how Coin Metrics has analyzed the Bitcoin ledger to estimate miner spending behavior, which we think is pretty cool.
Where things become stickier is when blockchain analysis is used to attribute wallet addresses and transactions to specific real-world identities, in other words, when an individual’s privacy is eroded. But we are not here to discuss the morality of chain analysis done with this purpose. Instead, we are going to proceed with the assumption that preserving privacy is good, as it makes Bitcoin more fungible and censorship-resistant.
Now, let’s establish a couple of simple spectrums for measuring privacy. First is the Identity Spectrum, specifying how confident blockchain analysts can be about the identity of the individual or group that owns a particular wallet. On the far left side is 0% confidence, meaning that no details are known about the wallet owner. On the opposite side is 100% confidence, meaning that the wallet owner has been positively identified through regulatory compliance (e.g. KYC) or other means.
The second spectrum that we care about is how “clean” a given UTXO is. We’ll call this the Tainted Spectrum. On the far left side we have “dirty” or “tainted” coins: UTXOs that are known to be involved in criminal activity. On the right side we have “clean” coins that either have no history (i.e. coinbase transactions containing newly mined coins) or are in a known legally compliant wallet such as that of a regulated exchange.
In both cases, most transactions and wallets lie somewhere in the middle of these spectrums, not on the far ends.
One of the main goals of chain analytics firms when working for governments or regulatory-compliant businesses is to ensure that their clients only interact with addresses on the right side of the Identity Spectrum. Labeling the addresses that are associated with shady activity enables exchanges and other services to ban users who attempt to deposit tainted coins.
The most significant method used by analysts to accomplish these goals is called heuristic clustering. This is a technique whose aim is to find patterns within the data and to arrange the data points such that they are split into separate, distinctly labeled groups. Clustering techniques typically maximize the similarity of data points within a given cluster while making the centroids of each cluster as far apart (as distinguishable) as possible from one another.
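In the context of blockchain analysis, there are numerous heuristics for every transaction that can be used in an attempt to cluster them, but the two most prominent clustering methods are common spending (addresses used together as inputs of one transaction are assumed to share an owner) and one-time change inference (an output address that appears on the blockchain for the first time is assumed to be the sender’s change).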
Every exchange, wallet, or payment service is ultimately just software, and this software tends to produce some unique patterns which clustering algorithms can recognize. As a result, blockchain analysis can narrow down the potential owner of a wallet to smaller subsets of users based on a combination of common spending, one-time change inference, and these traces left behind by the software used.
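As a rough illustration of the common spending heuristic, the sketch below merges addresses into one cluster whenever they appear together as inputs of a transaction. The function name, the simplified data model (each transaction reduced to a list of its input addresses), and the single-letter addresses are hypothetical; production tools combine many more heuristics than this.

```python
def cluster_by_common_inputs(transactions):
    """Group addresses that were spent together in at least one transaction.

    `transactions` is a list of lists; each inner list holds the input
    addresses of one transaction (a simplified, assumed data model).
    """
    parent = {}

    def find(addr):
        # Union-find lookup with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue  # e.g. a coinbase transaction has no input addresses
        first = input_addresses[0]
        find(first)  # register single-input transactions too
        for other in input_addresses[1:]:
            union(first, other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


# A and B are co-spent, then B and C, so all three land in one cluster.
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = cluster_by_common_inputs(txs)
print(clusters)
```

Note how C ends up linked to A even though the two never appear in the same transaction: one shared input is enough to merge whole clusters.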
In the image below from Crystal Blockchain, you can see an example of how UTXOs can be labeled by these means, giving service providers some insights about whether the funds are associated with non-suspicious categories such as exchanges or high-risk categories including ransomware and dark net wallets.
Calling to mind the Identity Spectrum, it’s safe to say that all the activity in the image above besides the Uncategorized inflows and outflows is on the right side of the spectrum. The individuals owning the categorized wallets may or may not be known, but they have already lost a fair deal of privacy simply by virtue of their activities being known.
If a given wallet address is labeled as part of a high-risk cluster such as a scam or a darknet market, service providers who have paid for this chain analysis data will censor those addresses from using their services in order to satisfy local regulations.
Finally, this brings us to the point of the article — Taproot.
You’ve probably heard that Taproot can improve privacy on Bitcoin. This is because Taproot enables many different smart contract transactions (e.g. multi-sigs, closing and opening Lightning Network channels, etc.) to appear on the blockchain as simple transactions. By obfuscating the true nature of the transaction, it makes it possible for those smart contract transactions to hide amongst the “regular” ones and the others disguised as regular ones.
Not only does Taproot make specific purposes of transactions much more difficult to identify, but it also enables them to take less block space than was possible previously. We won’t go deeper into the technical details of Schnorr signatures and MAST (incorporated in the Taproot upgrade) here, but if you aren’t already familiar with them then you can check out this simple explanation.
At last, let’s tie in the concepts from the previous section on blockchain analysis. We know that analysis software is looking to create clusters of wallets that are as similar as possible to each other, and as different as possible from other wallets. These clusters are better known as anonymity sets. Generally speaking, the bigger an anonymity set is (i.e. the more addresses included in it), the more private it is.
Since blockchain analysts have many heuristics they can use to cluster UTXOs and wallet addresses, we have to think about all of them when considering privacy. And this is where a potential downside of Taproot comes into play.
You see, Taproot introduces a new address type, P2TR (Pay-to-Taproot), which is easily distinguishable from existing address types. Therefore, when Taproot is activated, the first users will be putting themselves into a very small anonymity set with the others using P2TR, as most transactions will still be occurring with other, more popular address types.
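To see how visible the address type is, here is a simplified sketch that guesses the type of a mainnet address from its prefix and length alone. The function name and sample strings are illustrative assumptions: the samples are placeholders with the right shape, not real addresses, and the function does not validate checksums.

```python
def address_type(address: str) -> str:
    """Guess a mainnet address type from prefix and length only."""
    lowered = address.lower()  # SegWit addresses may be written in upper case
    if lowered.startswith("bc1p"):
        return "P2TR"  # witness version 1 (Taproot)
    if lowered.startswith("bc1q"):
        # Witness version 0: a 20-byte program gives 42 characters,
        # a 32-byte program gives 62 characters.
        if len(lowered) == 42:
            return "P2WPKH"
        if len(lowered) == 62:
            return "P2WSH"
        return "unknown"
    if address.startswith("1"):
        return "P2PKH"
    if address.startswith("3"):
        return "P2SH"
    return "unknown"


# Placeholder strings with the right prefix and length (NOT real addresses).
samples = {
    "legacy": "1" + "x" * 33,
    "script": "3" + "x" * 33,
    "segwit_key": "bc1q" + "x" * 38,
    "segwit_script": "bc1q" + "x" * 58,
    "taproot": "bc1p" + "x" * 58,
}
types = {name: address_type(addr) for name, addr in samples.items()}
print(types)
```

A few string comparisons are all it takes, which is why early P2TR users stand out so clearly from everyone else.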
Smaller anonymity sets make it easier for chain analysts to trace funds across the blockchain, particularly in the cases where the sender(s) and recipient(s) of a transaction are not using the same address type. A big reason for that is what we mentioned previously: most transactions have change. In the vast majority of cases, the address type receiving change is the same as the one sending it, even if the address itself is different. As a result, it’s relatively straightforward for analysis software to label change UTXOs when other outputs are sent to different address types.
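The change inference described above can be sketched in a few lines. This is a deliberately naive version under our own assumptions: the function name and the data model (inputs as a list of address types, outputs as `(address_type, value)` pairs) are hypothetical, and real analysis software weighs this signal against many others.

```python
def infer_change(input_types, outputs):
    """Return the index of the likely change output, or None if unclear.

    input_types: address types of the inputs, e.g. ["P2TR", "P2TR"]
    outputs: list of (address_type, value_in_satoshis) tuples
    """
    # Only draw a conclusion when all inputs share a single address type.
    if len(set(input_types)) != 1:
        return None
    spend_type = input_types[0]
    matches = [
        index
        for index, (addr_type, _value) in enumerate(outputs)
        if addr_type == spend_type
    ]
    # Exactly one output matching the sender's type is the change candidate.
    return matches[0] if len(matches) == 1 else None


# A P2TR sender pays a P2WPKH recipient: the P2TR output stands out as change.
mixed = infer_change(["P2TR"], [("P2WPKH", 250_000_000), ("P2TR", 49_990_000)])

# Everyone uses the same type: this heuristic has nothing to say.
uniform = infer_change(["P2WPKH"], [("P2WPKH", 250_000_000), ("P2WPKH", 49_990_000)])

print(mixed, uniform)
```

The second case is the one to aim for: when sender and recipient share an address type, this particular heuristic gives the analyst no answer.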
Think of it like playing a game where you try to follow a specific card while somebody moves it around in a pile of other cards on a table. The cards are UTXOs, while the number of cards in the pile is like the anonymity set. Now imagine that all of the aces in the deck are a different color from the rest of the cards. How much easier would it be to follow an ace through the deck than if all the cards were the same color? Even if you lose it completely, you have a 25% chance of guessing correctly (1 in 4 aces), versus a 1.9% chance (1 in 52 cards) if the color of all the cards was identical.
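The arithmetic behind the card analogy is simply one divided by the size of the anonymity set. Here is a small sketch, assuming a standard 52-card deck with 4 aces; the function name is our own.

```python
from fractions import Fraction


def guess_probability(anonymity_set_size):
    """Chance of picking the right item by pure guessing within the set."""
    return Fraction(1, anonymity_set_size)


aces_only = guess_probability(4)    # the card is known to be one of 4 aces
full_deck = guess_probability(52)   # the card could be any of 52 cards

print(f"{float(aces_only):.1%}")  # 25.0%
print(f"{float(full_deck):.1%}")  # 1.9%
```

Shrinking the set from 52 to 4 makes a blind guess 13 times more likely to succeed, which is the effect a small P2TR anonymity set has on early adopters.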
Blockchain analysis software is essentially playing a much more complicated version of that game. Before Taproot, smart contract transactions were like cards with small bends or tears that made them easy to distinguish. Taproot makes the individual cards harder to tell apart, but it also turns all the cards with the P2TR address type into a different color from the rest of the cards in the pile.
The small initial anonymity sets are a known downside of Taproot. It’s a downside that can be reduced in magnitude or even turned into an upside the more Taproot gets adopted (i.e. the larger the P2TR pile of cards grows). Additionally, it’s important to keep in mind that address type is only one of many factors that are used for clustering in chain analysis, and by all accounts Taproot is an improvement or neutral on the other factors.
A couple of open questions are:
It is reasonable to say that Bitcoin is currently not very private. Even before Taproot is introduced, four common address types (P2PKH, P2WPKH, P2SH, P2WSH) are already in use, with some less common sub-types of those that can show up when smart contract UTXOs are spent. Taproot initially makes things worse, but ultimately has the potential to make things better, especially for smart contract users whose UTXOs are already part of relatively small anonymity sets. It makes it cheaper to open and close Lightning channels and to do co-spend transactions, which include coin mixes that are good for privacy.
If Taproot could hypothetically achieve 100% adoption for future transactions, it would make the job of chain analysts much more difficult. Without Taproot, ~100% adoption of any single address type would be impossible, because there is no way to make smart contracts indistinguishable from simpler transactions.
Of course, maintaining backwards compatibility (i.e. no hard forks) means that adoption will not occur that quickly, at least for ordinary transactions. That said, you can be pro-Taproot activation while still acknowledging that there are some downsides.
It’s even a possibility that we never reach the threshold needed for Taproot to be a net positive for privacy. So, how can we avoid that possibility?
A great start would be exchanges using Taproot en masse, which they are incentivized to do, considering that they are likely the most active users of multi-sig transactions, which are made cheaper by Taproot. If exchanges do their part, it’s up to wallet service providers to also implement Taproot for their users, so that multi-sig exchange withdrawals to single-sig users are not distinguishable on-chain.
This all starts with education. Most new adopters of Bitcoin will likely never know or care about address types and other technical aspects of the network. But we are the early adopters, the vocal and powerful minority in the future Bitcoin ecosystem. It is up to us to follow best practices, educate others, and support service providers who help move Bitcoin forward.
If Taproot is activated, it is important for privacy that it also be widely adopted — the alternative is that P2TR addresses end up being another useful heuristic for blockchain analysts to follow the flows of UTXOs.
The importance of privacy is only going to increase from here. As Andreas Antonopoulos recently said, “there is no middle ground between authoritarians and free systems.”
For Bitcoin to be the best tool for financial freedom across the world that it is capable of being, we must make it so ourselves. Luckily, we Bitcoiners can all vote with our satoshis, and that’s the most powerful vote there is.