Data Accuracy in Web3 💻

Lava Vision and How it Works
Kagemni Karimu
Apr 21, 2023

tl;dr

Data accuracy is an overlooked nuisance in Web3. It plagues RPC calls and all blockchain interactions. Incorrect data from blockchains can lead to incorrect token balances, erroneous smart contract execution, compromised reputation systems, failed transactions, and misleading price information on decentralized exchanges. The problem could be exacerbated by the rise of AI…

According to Hackernoon, “For the simple reason that they were giving their users contradictory information, a popular NFT team lost tens of thousands of subscribers.” The story is repeated elsewhere, as Alchemy’s data accuracy article similarly exclaims that frontend bugs caused by unreliable data led to a whopping $2,000,000 loss!

By now, we hope that you know a little bit about RPC. In my previous article on Web3 RPC Developer Woes, we talked about how RPC sometimes returns inaccurate, invalid, and wildly stale data. Now, I want to talk about how and why we ensure that the data we receive actually reflects the blockchain, and the ways in which unreliable data can turn into costly mistakes.

Data Accuracy is a Hidden Nuisance in Web3 🐑 😎

There are problems that everyone has but nobody thinks about and problems that few people have but everyone thinks about. One of the hidden nuisances when using RPC is data verifiability, which refers to the ability to ensure that the data being exchanged is authentic and accurate. It is a problem that few people think about, but everyone has experienced — in one way or another — and which can have profound consequences.

Because these issues do not generate error messages, they are difficult to catch. When they do happen and the source is unclear, you can spend hours debugging, trying to pin down the cause of the unexpected output of your calls. In the meantime, your application can go haywire and behave unpredictably.

Ultimately, this happens because RPC in the blockchain world is carried out over peer-to-peer connections, which makes it difficult to prove the veracity of the data received. In most cases, the data itself is not stored on-chain, but its metadata is. As a result, in the majority of cases only the metadata of a remote call can be verified.

Metadata, in this context, refers to data that provides information about the structure and properties of the actual data being exchanged. Most commonly, this includes a basic “who, what, and when” that excludes the content of the request.

Our fren Bajali points out this is accomplished primarily by recording three factors on-chain: 1) digital signature of the authorizing parties, 2) a hash of high-visibility data exchanged, and 3) timestamp of the transaction. While metadata can provide some level of confidence in the data’s integrity, it alone is often not sufficient to ensure that the exchanged data itself has not been tampered with. The issue of data verifiability remains.
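To make the "hash of the data exchanged" idea concrete, here is a minimal sketch of what such a metadata record lets you check. Everything in it is an assumption for illustration: the `commit` and `matches_commitment` helpers, the field names, and the plain-string `signer`, which stands in for a real digital signature.

```python
import hashlib
import json
import time


def canonical_bytes(payload: dict) -> bytes:
    """Serialize a payload deterministically so equal data always hashes equally."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def commit(payload: dict, signer: str) -> dict:
    """Build a metadata record: who, a hash of what, and when (hypothetical layout)."""
    return {
        "signer": signer,  # placeholder for a real digital signature
        "payload_hash": hashlib.sha256(canonical_bytes(payload)).hexdigest(),
        "timestamp": int(time.time()),
    }


def matches_commitment(payload: dict, record: dict) -> bool:
    """Check a payload received later against the hash stored in the record."""
    return hashlib.sha256(canonical_bytes(payload)).hexdigest() == record["payload_hash"]


original = {"account": "0xabc", "balance": 100}
record = commit(original, signer="provider-1")

print(matches_commitment(original, record))                               # True
print(matches_commitment({"account": "0xabc", "balance": 999}, record))   # False
```

The check only covers whatever was included in the hash. Anything exchanged outside of it stays unverifiable, which is exactly the gap described above.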

The Consequences of Inaccurate Data 💣

So what happens if data is inaccurate, you ask? Inaccurate data can be the cause of a ton of issues:

  • Incorrect token balances: Inaccurate data may lead to users seeing incorrect token balances in their wallets, causing confusion and potential financial loss if they make transactions based on incorrect information.
  • Misleading price information: In decentralized finance (DeFi) platforms, inaccurate price data can result in users making poor trading decisions or executing transactions at unfavorable prices.
  • Erroneous smart contract execution: If inaccurate data is used as input for smart contracts, it can cause unintended consequences, such as the loss of funds, incorrect asset distribution, or the triggering of undesired actions.
  • Failed transactions: Inaccurate data can lead to failed transactions due to issues such as insufficient gas fees or incorrect nonce values, resulting in wasted time, effort, and potentially lost funds for users.
  • Blockchain reorgs: During a blockchain reorganization, multiple competing chains temporarily exist, and data served from a block that is later dropped becomes inaccurate after the fact. This can cause confusion, delayed transactions, and even loss of funds in some cases.
  • Misrepresentation of NFT ownership: Inaccurate data regarding NFT ownership can result in disputes, wrongful sales or transfers, and potential legal issues.
  • Incorrect voting results: In decentralized governance systems, inaccurate data can lead to incorrect voting outcomes, resulting in undesired changes to the protocol or unfair distribution of resources.
  • False alerts or missed security incidents: Inaccurate data can lead to false security alerts or the failure to detect actual security breaches, leaving networks vulnerable to attacks and exploitation.
  • Compromised reputation systems: In decentralized platforms that rely on reputation systems, inaccurate data can lead to users being unfairly penalized or malicious actors gaining unwarranted trust.
  • Cascading failures in interconnected systems: Inaccurate data can propagate through interconnected systems, causing cascading failures, loss of funds, and widespread disruptions in the Web3/crypto ecosystem.
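Several of the issues above, stale balances in particular, come down to a provider answering from an old view of the chain. Below is a rough sketch of one defensive check, assuming you already hold responses from a few providers along with the block height each one answered at. The `Report` type, the provider names, the sample numbers, and the `max_lag` threshold are all hypothetical.

```python
from typing import NamedTuple


class Report(NamedTuple):
    provider: str
    block_height: int
    balance: int


def flag_stale(reports: list[Report], max_lag: int = 2) -> list[str]:
    """Return providers whose block height trails the freshest report by more than max_lag."""
    if not reports:
        return []
    newest = max(r.block_height for r in reports)
    return [r.provider for r in reports if newest - r.block_height > max_lag]


reports = [
    Report("provider-a", 17_100_050, 100),
    Report("provider-b", 17_100_049, 100),
    Report("provider-c", 17_099_900, 250),  # far behind, so its balance is suspect
]

print(flag_stale(reports))  # ['provider-c']
```

This does not prove that the fresher answers are correct. It only surfaces the silent kind of failure that never raises an error message.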

Why Verifying Data Accuracy is Important ❗️

Beyond avoiding the pitfalls listed above, there are second-order, ideological reasons why verifying data can be important:

🔴Trustlessness

One of the core principles of Web3 is the concept of trustlessness, which means that users and participants in the network do not need to rely on a central authority or trust any single entity in order to operate. Data verifiability plays a key role in ensuring trustlessness, as it allows users to independently validate the authenticity and accuracy of the data being exchanged within the network. This empowers users to have confidence in the system without having to rely on a centralized governing entity.

🔴Security

In a decentralized ecosystem like Web3, ensuring the security of data is of utmost importance, as it forms the basis of transactions, smart contracts, and other interactions between users and applications. Beyond this, users will not use an application that appears risky, has known exploits, or leaks critical data. Verifying data helps protect against malicious actors who may attempt to manipulate or tamper with the data to gain undue advantage, compromise the network, or conduct fraudulent activities.

🔴Interoperability

Web3 envisions a highly interconnected and interoperable landscape of decentralized applications, platforms, and protocols. Verifying data is crucial in enabling seamless communication and collaboration between different components in the Web3 ecosystem. Ensuring data verifiability allows developers to build reliable applications and services that can exchange data across various networks and platforms without compromising the integrity and accuracy of the information. This promotes greater innovation, efficiency, and user experience in the Web3 space.

How Data is Verified

Many approaches are used at various layers to ensure accurate data. For anything kept on-chain, data verifiability mechanisms such as cryptographic signatures and consensus algorithms help secure the network and protect users’ assets and information. Elsewhere, ZK proofs and optimistic sampling protocols are used to ensure data is what it’s supposed to be! Let’s look more closely at a few of these:

  • Cryptographic signatures: These are digital signatures that rely on cryptographic algorithms to verify the authenticity and integrity of data. By signing data with a private key, the sender can prove that they are the legitimate source of the information and that the data has not been altered during transmission. This method of verification is widespread: it appears in ECDSA, ERC-725, JSON Web Tokens (JWT), Decentralized Identifiers (DID), and DAO voting. Smart contracts frequently verify these signatures to authenticate senders before executing.
  • On-chain consensus: In blockchain-based systems, on-chain consensus mechanisms (e.g., Proof of Work, Proof of Stake) are used to ensure that all nodes in the network agree on the validity of transactions and data. These consensus algorithms are designed to be resistant to manipulation and to ensure that a single malicious node cannot control the network. Virtually all blockchain projects implement some form of consensus mechanism, and a good deal of them share similar designs (e.g., Cosmos appchains all use Tendermint BFT for the time being).
  • Optimistic sampling: This is a technique used to verify data in distributed systems by randomly sampling a small subset of nodes and cross-validating their responses. If the sampled nodes provide consistent information, the data can be considered trustworthy. This method reduces the time and computational resources needed to verify data across the entire network. Highly distributed networks such as Avalanche and IPFS rely upon this method.
  • Zero-Knowledge Proofs (ZKPs): Zero-knowledge proofs are cryptographic techniques that enable one party to prove to another that they possess certain information without revealing the actual data. This allows for secure and privacy-preserving verification in Web3 systems. zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) and zk-STARKs (Zero-Knowledge Scalable Transparent ARguments of Knowledge) are examples of ZKP implementations. These mechanisms are particularly useful in privacy-focused applications, such as anonymous transactions in cryptocurrencies (e.g., Zcash) or off-chain computations in decentralized systems, where data confidentiality is of paramount importance. By incorporating ZKPs, Web3 systems can achieve data verifiability while preserving privacy and reducing the trust required among participants.
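Of the four approaches above, optimistic sampling is the easiest to sketch in a few lines. The example below is a toy illustration under stated assumptions, not how any particular network implements it. Providers are modelled as plain Python callables, and `sampled_query`, the provider names, and the `"0x64"` responses are made up for the demo.

```python
import random
from typing import Callable, Optional

Provider = Callable[[str], str]  # takes a request, returns a response


def sampled_query(
    providers: dict[str, Provider],
    request: str,
    sample_size: int,
    rng: random.Random,
) -> tuple[Optional[str], dict[str, str]]:
    """Ask a random subset of providers and accept the answer only if they all agree.

    Returns (answer, raw_answers); answer is None when the sample disagrees.
    """
    names = rng.sample(sorted(providers), k=min(sample_size, len(providers)))
    answers = {name: providers[name](request) for name in names}
    distinct = set(answers.values())
    agreed = distinct.pop() if len(distinct) == 1 else None
    return agreed, answers


def honest(request: str) -> str:
    return "0x64"


def faulty(request: str) -> str:
    return "0x00"


providers = {"p1": honest, "p2": honest, "p3": honest, "p4": faulty}

# Sampling all four providers always includes the faulty one, so no answer is accepted.
answer, seen = sampled_query(providers, "eth_getBalance", sample_size=4, rng=random.Random())
print(answer)  # None
print(seen)    # the conflicting responses, kept for inspection
```

The trade-off is visible in `sample_size`. A smaller sample is cheaper, but it has a real chance of missing the one faulty provider, so the guarantee is probabilistic rather than absolute.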

Data in the age of AI

One last point: as we move towards a world increasingly dominated by artificial intelligence (AI), data verifiability will become even more critical. Some have even said AI without verifiable data will be disastrous. Undoubtedly, the performance and reliability of AI systems depend on the quality and authenticity of the data they process. Ensuring data verifiability in AI-driven systems will be crucial to prevent biases, misinformation, and manipulation. Furthermore, a lack of data accuracy can have a profound impact on independent actors in monetized systems! That is to say, on web3 users in cryptonetworks!

Moreover, one must consider how profoundly different a world with powerful AI will be from a data perspective. Basic tasks such as data entry and dummy data generation will be performed at unprecedented speeds. The sheer amount of data in interoperable systems will unavoidably increase. This creates new vectors for attack and new considerations for verifiability, for example if AI becomes clever enough to generate patterned data which appears “correct” to existing systems of verification. On the other hand, one must also consider that AI can be a valid tool for defense and can help create more sophisticated data verification technologies.

All this to say, there is a unique maelstrom of potential unleashed the moment AI meets trustless, permissionless decentralized networks and does its thing! Data verifiability, a mere nuisance now, will become significantly more important as time goes on, and our methods for achieving it in web3 will need to be cheaper, more robust and more performant!

That’s all for now… 💪🏿 In an upcoming article, we will delve deeper into the challenges and potential solutions for data verifiability in the age of AI. We will explore how cutting-edge technologies such as homomorphic encryption, zero-knowledge proofs, and other techniques can contribute to improving data verification. Sleep tight and don’t trust, verify 😉.

About the Author🧑🏿‍💻

KagemniKarimu is currently a Developer Relations Engineer at Lava Network and formerly worked in Developer Relations at Skynet Labs. He’s a self-proclaimed Rubyist, new Rust learner, and friendly Web3 enthusiast who entertains all conversations about tech. Follow him on Twitter or say hi to him on Lava’s Discord where he can be found lurking.

About Lava 🌋

Lava is a decentralized network of top-tier API providers, where developers use a single subscription to access any blockchain. Providers are rewarded for their quality of service, so your users can fetch data and send transactions with maximum speed, data integrity and uptime. Pairings are randomized, meaning your users can make queries or transact in privacy.

We help developers build web3-native apps on any chain, while giving users the best possible experience.