#18 - Deciphering the Trust Puzzle of Interoperability

A deep dive into the design of Arbitrary Message Passing (AMP) Protocols

Jun 12, 2023

Stanford Blockchain Review
Volume 2, Article No. 8

📚 Authors: Shi Khai Wei, Raghav Agarwal – LongHash Ventures
🌟 Technical Prerequisite: Moderate/Advanced

Introduction

The future is multichain. The quest for scalability has led Ethereum towards rollups. The shift towards modular blockchains has reignited attention on app chains. And over the horizon, we hear whispers of application-specific roll-ups, L3s, and sovereign chains. But all this has come at the cost of fragmentation, and current bridges are often limited in functionality and rely on trusted signers for security.

What will the endgame of an interconnected web3 look like? We believe that bridges will eventually evolve into cross-chain messaging or “Arbitrary Message Passing” (AMP) protocols that unlock new use cases by allowing applications to pass arbitrary messages from a source chain to a destination chain. We also expect a “trust mechanism landscape” to emerge, in which builders make different tradeoffs between usability, complexity, and security.

Every AMP solution needs two critical capabilities:

Verification: The ability to verify the validity of the message from the source chain on the destination chain
Liveness: The ability to relay information from source to destination

Unfortunately, 100% trustless verification is not realistic. Users are required to trust code, game theory, humans (or entities), or some combination of these, depending on whether verification is performed on-chain or off-chain.

In this essay, we divide the interoperability landscape vertically by the trust mechanism used and horizontally by the integration architecture.

Trust Mechanism:

Trust Code and Math: For these solutions, an on-chain proof exists and can be verified by anyone. These solutions generally rely on a light client to either validate the consensus of a source chain on a destination chain or verify the validity of a source chain's state transition on a destination chain. Light-client verification can be made much more efficient with zero-knowledge (ZK) proofs, which compress an arbitrarily long computation performed off-chain into a proof that is cheap to verify on-chain.
Trust Game Theory: Here there is an additional trust assumption: the user or application has to trust a third party, or a network of third parties, for the authenticity of transactions. These mechanisms can be made more secure through permissionless networks combined with game-theoretic tools such as economic incentives and optimistic security.
Trust Humans: These solutions rely on honesty from the majority of the validators or independence of entities relaying different information. They require trust in third parties in addition to trusting the consensus of the two interacting chains. The only thing at stake here is the reputation of the participating entities. If enough participating entities agree that a transaction is valid, then it is considered valid.

It is important to note that all solutions, to a certain degree, require trust in code as well as humans. Any solution with faulty code can be exploited by hackers and every solution has some human element in the setup, upgrades, or maintenance of the codebase.

Integration architecture:

Point-to-Point model: A dedicated communication channel needs to be established between every source and every destination.
Hub and Spoke model: A communication channel needs to be established with a central hub that enables connectivity with all other blockchains connected to that hub.

The point-to-point model is relatively difficult to scale, because a dedicated channel is required for every pair of connected blockchains, so the number of channels grows quadratically with the number of chains. Building these channels can be challenging for blockchains with different consensus mechanisms and frameworks. However, pairwise bridges offer more flexibility to customize configurations where needed. A hybrid approach is also possible: for example, the Inter-Blockchain Communication protocol (IBC) with multi-hop routing via a hub removes the need for direct pairwise channels, but introduces additional complexity in security, latency, and cost.
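To make the scaling difference concrete, the short Python sketch below counts the channels each model needs for n chains. It assumes one bidirectional channel per connected pair and does not count the hub itself among the n chains; if each channel requires a light client on both ends, the point-to-point figure doubles.

```python
def point_to_point_channels(n: int) -> int:
    """One dedicated channel for every unordered pair of n chains."""
    return n * (n - 1) // 2


def hub_and_spoke_channels(n: int) -> int:
    """One channel from each of n chains to a single hub (hub not counted in n)."""
    return n


for n in (5, 20, 100):
    print(n, point_to_point_channels(n), hub_and_spoke_channels(n))
```

For 5, 20, and 100 chains this prints 10, 190, and 4950 point-to-point channels versus 5, 20, and 100 hub connections.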

Trust Code and Math

To rely only on code and math for trust, light clients can be used to validate the consensus of a source chain on a destination chain. A light client (or light node) is software that connects to full nodes to interact with a blockchain without storing its full state. A light client on the destination chain typically stores the source chain's block headers in sequence, which is enough to verify its transactions. Off-chain agents such as relayers monitor events on the source chain, generate cryptographic inclusion proofs, and forward them, along with the block headers, to the light client on the destination chain. The light client can verify a transaction because each stored header contains a Merkle root against which an inclusion proof for a transaction or piece of state can be checked. Here is an overview of the key features of this approach:
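The verification step can be illustrated with a minimal Python sketch of a Merkle inclusion proof. It assumes a simple binary SHA-256 tree that duplicates the last node on odd levels; real chains use their own structures (Ethereum, for instance, uses a Merkle Patricia trie), and production trees also domain-separate leaves from inner nodes. The function names are illustrative, not any protocol's API.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool is True if the sibling is on the right."""
    proof = []
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    # What the light client does: rebuild the path and compare with the header's root.
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


txs = [b"tx-a", b"tx-b", b"tx-c"]
header_root = merkle_root(txs)   # stored in a header held by the light client
proof = merkle_proof(txs, 2)     # generated off-chain by the relayer
print(verify_inclusion(b"tx-c", proof, header_root))  # True
print(verify_inclusion(b"tx-x", proof, header_root))  # False
```

The light client never needs the full transaction list: the proof carries one sibling hash per tree level, so its size grows logarithmically with the number of transactions in the block.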

Security

Trust assumptions are introduced when the light client is initialized. A new light client is initialized with a header from a specific height on the counterparty chain, and if the supplied header is incorrect, the light client could be tricked into following fake headers. Once the light client has been initialized, no further trust assumptions are introduced. This initialization is only a weak trust assumption, however, since anyone can verify the initial header. In addition, there is a liveness assumption on the relayer, which is responsible for transmitting the information.
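The sketch below illustrates where the initialization trust assumption sits. It is a deliberately simplified, hypothetical light client: after accepting one trusted header, it only accepts headers that extend the chain by parent hash. A real light client (for example, IBC's Tendermint client) also verifies validator signatures on each update, which this sketch omits; without that check, hash linking alone would not stop a forger.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    height: int
    parent_hash: bytes
    state_root: bytes

    def digest(self) -> bytes:
        data = self.height.to_bytes(8, "big") + self.parent_hash + self.state_root
        return hashlib.sha256(data).digest()


class LightClient:
    def __init__(self, trusted_header: Header):
        # The one trust assumption: this header is accepted without any check.
        # Anyone can compare it with the real chain, but the client cannot.
        self.headers = {trusted_header.height: trusted_header}
        self.latest = trusted_header

    def update(self, header: Header) -> bool:
        # After initialization, a header is accepted only if it extends the latest one.
        # (Omitted: verifying a quorum of validator signatures on the new header.)
        if header.height != self.latest.height + 1:
            return False
        if header.parent_hash != self.latest.digest():
            return False
        self.headers[header.height] = header
        self.latest = header
        return True


genesis = Header(100, b"\x00" * 32, b"root-100")
client = LightClient(genesis)
print(client.update(Header(101, genesis.digest(), b"root-101")))  # True
print(client.update(Header(102, b"\x11" * 32, b"root-x")))        # False: does not link
```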

Implementation

The implementation of the light client depends on whether the cryptographic primitives required for verification are supported. If two chains of the same type are connected, meaning they share the same application framework and consensus algorithm, the light client implementation will be the same on both sides. For example, the Inter-Blockchain Communication (IBC) protocol is used across Cosmos SDK-based chains. If two different types of chains are connected, with different application frameworks or consensus types, the light client implementations will differ. An example is Composable Finance, which is working to connect Cosmos SDK chains via IBC to Substrate, the application framework of the Polkadot ecosystem. This requires a Tendermint light client on the Substrate chain and a BEEFY light client on the Cosmos SDK chain. Recently, they launched the first connection between Polkadot and Kusama via IBC.

Challenges

Resource intensiveness is a significant challenge. Running pairwise light clients on every chain can be expensive, since writes on blockchains are costly. Moreover, running conventional on-chain light clients for chains with large, dynamic validator sets like Ethereum is often impractical.

Extensibility is another challenge. The implementation of the light client varies based on the architecture of the chain, making it difficult to scale and connect different ecosystems.

Code exploitation is a further risk, as errors in the code can lead to vulnerabilities. An example is the BNB Chain exploit in October 2022, which uncovered a critical security vulnerability [1] affecting all IBC-enabled chains.

Zero-knowledge (ZK) proofs offer a way to address the cost and practicality problems of running pairwise light clients on every chain, without reintroducing trust in third parties.

ZK Proofs as a Solution for Third-Party Trust

ZK proofs can be utilized to verify the validity of state-transitions from the source chain on the destination chain. Rather than executing the entire computation on-chain, only the verification of the proof of computation is performed on-chain, while the actual computation takes place off-chain. This approach allows for quicker and more gas-efficient verification compared to re-running the original computation. Some examples include Polymer ZK-IBC by Polymer Labs and Telepathy by Succinct Labs. Polymer is working on multi-hop [2] enabled IBC to enhance connectivity while reducing the number of pairwise connections required.

The key aspects of this mechanism include:

Security

Most deployed zk-SNARKs rely on elliptic-curve cryptography, while zk-STARKs rely on collision-resistant hash functions. zk-SNARKs may require a trusted setup, a ceremony that generates the initial keys used to create and verify proofs. The secrets from this setup must be destroyed, because anyone holding them could forge proofs and have false transactions verified. Once the trusted setup is complete, no further trust assumptions are introduced. Additionally, newer ZK frameworks such as Halo and Halo2 eliminate the need for a trusted setup altogether.

Implementation

Various ZK proving schemes exist, such as SNARKs, STARKs, VPD, and SNARGs, with SNARKs currently the most widely adopted. SNARK proving frameworks such as Groth16, Plonk, Marlin, Halo, and Halo2 offer trade-offs in proof size, proving time, verification time, memory requirements, and the need for a trusted setup. Recursive ZK proofs have also emerged, allowing the proving workload to be distributed across multiple computers instead of a single one. To generate validity proofs, three core primitives must be implemented: verifying the signature scheme used by validators, proving that validator public keys are included in the validator-set commitment stored on-chain, and tracking the validator set, which can change frequently.
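The three primitives listed above can be sketched in plain Python to show the logic a validity-proof circuit would have to encode; in a ZK light client the same checks would be expressed as circuit constraints and proven off-chain. The validator-set encoding, the `verify_sig` callback (a hypothetical stand-in for a real signature scheme), and the Tendermint-style ">2/3 of voting power" threshold are all assumptions for illustration.

```python
import hashlib
from fractions import Fraction
from typing import Callable, Dict

# Hypothetical signature check: (public_key, message, signature) -> bool.
VerifySig = Callable[[str, bytes, bytes], bool]


def commit_validator_set(validators: Dict[str, int]) -> bytes:
    """Hash commitment to a validator set mapping public key -> voting power."""
    h = hashlib.sha256()
    for pubkey in sorted(validators):
        h.update(f"{pubkey}:{validators[pubkey]};".encode())
    return h.digest()


def has_quorum(message: bytes, signatures: Dict[str, bytes], validators: Dict[str, int],
               stored_commitment: bytes, verify_sig: VerifySig,
               threshold: Fraction = Fraction(2, 3)) -> bool:
    # Inclusion: the claimed validator set must match the commitment stored on-chain.
    if commit_validator_set(validators) != stored_commitment:
        return False
    # Signatures: count voting power only for signatures that verify.
    signed_power = sum(power for pk, power in validators.items()
                       if pk in signatures and verify_sig(pk, message, signatures[pk]))
    # Require strictly more than `threshold` of the total voting power.
    return signed_power > threshold * sum(validators.values())


def rotate_validator_set(current_commitment: bytes, current_validators: Dict[str, int],
                         new_validators: Dict[str, int], signatures: Dict[str, bytes],
                         verify_sig: VerifySig) -> bytes:
    # Tracking: a new set is accepted only if a quorum of the current set signs its commitment.
    new_commitment = commit_validator_set(new_validators)
    if not has_quorum(new_commitment, signatures, current_validators,
                      current_commitment, verify_sig):
        raise ValueError("validator set update lacks a quorum of the current set")
    return new_commitment
```

Expressing exactly these checks inside a circuit is what makes ZK light clients hard: the signature check alone may require non-native field arithmetic for the source chain's curve.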

Challenges

Implementing various signature schemes within zk-SNARKs requires out-of-field (non-native) arithmetic and complex elliptic curve operations, which are not straightforward and may need a different implementation for each chain depending on its framework and consensus. Auditing ZK circuits is a challenging and error-prone task. Developers need to learn domain-specific languages such as Circom, Cairo, and Noir, or implement circuits directly, both of which are demanding and can slow adoption. If proving time and effort are extremely high, only specialized teams with specialized hardware may be able to generate proofs, potentially leading to centralization. Longer proof generation also adds latency. Techniques such as Incrementally Verifiable Computation (IVC) can reduce proving time, but much of this work is still at the research stage and awaits implementation. Finally, longer verification time and effort increase on-chain costs.

Trust Game Theory

Interoperability protocols that rely on game theory can be broadly divided into two categories, based on how they incentivize honest behavior from participating entities:

The first category is economic security, where multiple external participants, such as validators, collaborate to reach consensus on the updated state of the source chain. To become a validator, a participant must stake a certain amount of tokens, which can be slashed in the event of malicious activity. In permissionless setups, anyone can accumulate stake and become a validator. Validators who follow the protocol also receive economic incentives in the form of block rewards, which further motivates honesty. However, if the amount that could be stolen exceeds the value at stake, participants may collude to steal funds. Examples of protocols using economic security are Axelar and Celer IM.
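A back-of-the-envelope check of this collusion condition fits in a few lines of Python. It assumes an attacker must control more than a threshold (here 2/3) of total stake to attest to a fraudulent state, and that all of that stake could be slashed; the token amounts and price are hypothetical.

```python
from fractions import Fraction


def attack_cost(total_stake_tokens: int, token_price: Fraction,
                threshold: Fraction = Fraction(2, 3)) -> Fraction:
    """Lower bound on the value of stake an attacker must control to reach the
    attestation threshold (and would, under our assumption, expose to slashing)."""
    return threshold * total_stake_tokens * token_price


def is_economically_secure(total_stake_tokens: int, token_price: Fraction,
                           value_at_risk: int,
                           threshold: Fraction = Fraction(2, 3)) -> bool:
    # Honest behavior is only the rational choice if attacking costs more than it could steal.
    return attack_cost(total_stake_tokens, token_price, threshold) > value_at_risk


# Hypothetical numbers: 300M tokens staked at $0.50 each.
print(attack_cost(300_000_000, Fraction(1, 2)))                          # 100000000
print(is_economically_secure(300_000_000, Fraction(1, 2), 150_000_000))  # False
print(is_economically_secure(300_000_000, Fraction(1, 2), 50_000_000))   # True
```

The comparison is deliberately simplistic: value at risk should include everything the validator set can release across all routes, and a falling token price lowers the attack cost without any change to the protocol.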

The second category is optimistic security, where solutions assume only that a minority of participants, possibly a single one, are honest and follow the protocol rules. A single honest participant can therefore act as the guarantee. For instance, an optimistic solution allows anyone to submit a fraud proof. Despite the economic incentives, it is possible for honest watchers to miss a fraudulent transaction. Optimistic roll-ups also employ this mechanism. Nomad and Chainlink CCIP are examples of protocols using optimistic security. In the case of Nomad, watchers are able to prove fraud, although they are whitelisted at the time of writing. Chainlink CCIP plans to leverage an Anti-Fraud Network of decentralized oracle networks dedicated to monitoring malicious activity, although the implementation of CCIP's Anti-Fraud Network is yet to be seen.

Security

When it comes to security, both mechanisms rely on permissionless participation by validators and watchers for the game theory to be effective. Under economic security, funds are more vulnerable when the amount staked is lower than the amount that could be stolen. Under optimistic security, the honest-minority assumption can be exploited if no one submits a fraud proof, or if permissioned watchers are compromised or removed. Economic security mechanisms are, in contrast, less dependent on liveness for maintaining security.

Implementations

In terms of implementation, one approach involves a middle chain with its own validators. In this setup, a group of external validators monitors the source chain and reaches consensus on the validity of transactions when a call is detected. Once consensus is achieved, they provide attestation on the destination chain. Validators are typically required to stake a certain amount of tokens, which can be slashed if malicious activity is detected. Examples of protocols using this implementation approach include Axelar Network and Celer IM.

Another implementation approach uses off-chain agents to build a solution similar to optimistic roll-ups: during a predefined time window, these agents can submit fraud proofs and revert transactions if necessary. Nomad, for instance, relies on independent off-chain agents to relay the header and cryptographic proof. Chainlink CCIP, on the other hand, plans to leverage its existing oracle network to monitor and attest to cross-chain transactions.
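The fraud-window flow can be sketched as a small Python state machine. This is a generic, hypothetical model, not Nomad's or CCIP's actual design: a message becomes executable only after the window elapses, and a valid fraud proof submitted inside the window blocks execution. Fraud-proof verification itself is abstracted into a boolean.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PendingMessage:
    payload: bytes
    submitted_at: int  # timestamp in seconds
    disputed: bool = False


@dataclass
class OptimisticInbox:
    fraud_window: int  # seconds during which watchers may dispute a message
    pending: Dict[int, PendingMessage] = field(default_factory=dict)

    def submit(self, msg_id: int, payload: bytes, now: int) -> None:
        self.pending[msg_id] = PendingMessage(payload, now)

    def dispute(self, msg_id: int, fraud_proof_valid: bool, now: int) -> bool:
        msg = self.pending[msg_id]
        # Only a valid proof submitted inside the window counts.
        if fraud_proof_valid and now < msg.submitted_at + self.fraud_window:
            msg.disputed = True
        return msg.disputed

    def can_execute(self, msg_id: int, now: int) -> bool:
        msg = self.pending[msg_id]
        return not msg.disputed and now >= msg.submitted_at + self.fraud_window


inbox = OptimisticInbox(fraud_window=1800)  # hypothetical 30-minute window
inbox.submit(1, b"transfer", now=0)
print(inbox.can_execute(1, now=100))   # False: still inside the window
print(inbox.can_execute(1, now=1800))  # True: window elapsed without a dispute
```

The sketch makes the two weaknesses discussed here visible: every message waits the full window, and a fraudulent message is executed whenever no watcher calls `dispute` in time.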

Advantages and Challenges

One key advantage of game-theoretic AMPs is resource optimization since the verification process typically does not occur on-chain, resulting in lower resource requirements. Moreover, these mechanisms exhibit extensibility, as the consensus mechanism remains the same for various types of chains and can be easily extended to heterogeneous blockchains.

There are also several challenges associated with these mechanisms. Trust assumptions can be exploited to steal funds if the majority of validators collude, which necessitates the use of countermeasures like quadratic voting and fraud proofs. Additionally, optimistic security-based solutions introduce complexity in terms of finality and liveness, as users and applications need to wait for the fraud window to ensure transaction validity.

Trust Humans

Solutions which require trust in human entities can also be broadly divided into two categories:

Reputational Security: These solutions rely on a multi-sig in which multiple entities verify and sign transactions. Once a minimum threshold of signatures is reached, the transaction is considered valid. The assumption is that the majority of the entities are honest, so a transaction signed by a majority is valid. The only thing at stake is the reputation of the participating entities. Examples include Multichain (Anycall V6) and Wormhole. Exploits due to smart contract bugs are still possible, as evidenced by the Wormhole hack [3] in early 2022.
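The threshold rule amounts to a simple M-of-N count, sketched below in Python. The `verify_sig` callback is a hypothetical stand-in for a real signature check. Nothing in the code enforces honesty beyond the count itself, which is why the remaining security comes from the signers' reputation.

```python
from typing import Callable, Dict, FrozenSet


def multisig_valid(
    message: bytes,
    signatures: Dict[str, bytes],
    signer_set: FrozenSet[str],
    threshold: int,
    verify_sig: Callable[[str, bytes, bytes], bool],  # hypothetical signature check
) -> bool:
    """M-of-N check: at least `threshold` distinct members of `signer_set` signed `message`."""
    if not 0 < threshold <= len(signer_set):
        raise ValueError("threshold must be between 1 and the number of signers")
    valid_signers = {
        signer for signer, sig in signatures.items()
        if signer in signer_set and verify_sig(signer, message, sig)
    }
    return len(valid_signers) >= threshold
```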

Independence: These solutions split the message passing process into two parts and rely on different independent entities to handle each part. The assumption is that the two entities are independent of each other and do not collude. An example is LayerZero: block headers are streamed on demand by decentralized oracles, and transaction proofs are sent via relayers. If the proof matches the header, the transaction is considered valid. While proof matching relies on code and math, participants are required to trust the entities to remain independent. Applications building on LayerZero can choose their Oracle and Relayer (or host their own Oracle/Relayer), which limits the risk to collusion between the particular oracle and relayer they selected. End users need to trust that LayerZero, a third party, or the application itself runs the oracle and relayer independently and without malicious intent.

In both approaches, the reputation of participating third-party entities disincentivizes malicious behavior. These are usually respected entities within the validator and oracle community, and they risk reputational damage and harm to their other business activities if they act maliciously.

Beyond Trust Assumptions: Other Considerations for AMP Solutions

When considering the security and usability of an AMP solution, we also need to take into account details beyond the basic mechanisms. Because these are moving parts that can change over time, we did not include them in the overall comparison.

Code Integrity

Recent hacks have exploited code bugs, highlighting the need for reliable audits, bug bounties, and diverse client implementations. If all validators (in economic, optimistic, or reputational security) run the same client (verification software), the network depends on a single codebase. Ethereum, for example, relies on multiple execution clients such as Geth, Nethermind, Erigon, Besu, and Akula. Multiple implementations in a variety of languages increase diversity and keep any single client from dominating the network, removing a potential single point of failure. Multiple clients also help with liveness: if a bug or exploit in one implementation takes down a minority of validators, signers, or light clients, the rest can continue.
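The liveness argument can be checked with a short Python sketch: for each client implementation, it asks whether the validators running the other clients would still hold more than a BFT-style 2/3 of voting power if that client failed. The threshold, the client names, and the distributions are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def survives_client_failure(client_power: Dict[str, int],
                            threshold: Fraction = Fraction(2, 3)) -> Dict[str, bool]:
    """For each client, whether the validators on the other clients still hold more
    than `threshold` of voting power if every node on that client goes offline."""
    total = sum(client_power.values())
    return {client: Fraction(total - power, total) > threshold
            for client, power in client_power.items()}


# Hypothetical distributions (percent of voting power per client).
print(survives_client_failure({"client-a": 80, "client-b": 15, "client-c": 5}))
# {'client-a': False, 'client-b': True, 'client-c': True}
print(survives_client_failure({"client-a": 30, "client-b": 30, "client-c": 25, "client-d": 15}))
# {'client-a': True, 'client-b': True, 'client-c': True, 'client-d': True}
```

In the first distribution a single bug in the dominant client halts the network; in the second, any one client can fail without losing the quorum.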

Setup and Upgradability

Users and developers need to know whether validators and watchers can join the network permissionlessly; otherwise, trust is hidden in the selection of permissioned entities. Upgrades to smart contracts can also introduce bugs that lead to exploits, or even change the trust assumptions. Different solutions mitigate these risks in different ways. In the current instantiation, for example, Axelar gateways are upgradable subject to approval from an offline committee (4/8 threshold); in the near future, however, Axelar plans to require all validators to collectively approve any upgrades to the gateways. Wormhole’s core contracts are upgradeable and managed via Wormhole’s on-chain governance system. LayerZero relies on immutable smart contracts and immutable libraries to avoid upgrades. However, it can push a new library: dapps with default settings would automatically get the new version, while dapps that set their version manually would need to switch to the new one themselves.

Maximal Extractable Value (MEV)

Different blockchains are not synchronized by a common clock and have different times to finality. As a result, the order and timing of execution on the destination chain can vary. MEV in a cross-chain world is therefore hard to define clearly. Message delivery also involves a trade-off between liveness and order of execution. An ordered channel guarantees in-order delivery of messages, but the channel closes if one message times out. Another application might prefer a channel where ordering is not guaranteed but a single timed-out message does not affect the delivery of others.
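The ordered/unordered trade-off can be modeled in a few lines of Python, loosely following IBC's ORDERED and UNORDERED channel semantics (a simplification that ignores acknowledgements and packet commitments). Each message is a hypothetical `(sequence, timed_out)` pair listed in send order.

```python
from typing import List, Tuple


def deliver(messages: List[Tuple[int, bool]], ordered: bool) -> Tuple[List[int], bool]:
    """Return (delivered sequence numbers, whether the channel is still open)."""
    delivered = []
    for seq, timed_out in messages:
        if timed_out:
            if ordered:
                # An ordered channel cannot skip a sequence number, so a timeout closes it.
                return delivered, False
            continue  # unordered: drop this message and keep delivering the rest
        delivered.append(seq)
    return delivered, True


msgs = [(1, False), (2, True), (3, False)]
print(deliver(msgs, ordered=True))   # ([1], False)
print(deliver(msgs, ordered=False))  # ([1, 3], True)
```

The ordered channel preserves sequencing guarantees that some applications need, but one slow message stops everything behind it; the unordered channel keeps flowing at the cost of those guarantees.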

Source Chain Finality

Ideally, AMP solutions should wait for the source chain to reach finality before transmitting state information to one or more destination chains. This ensures there is only a negligible probability that a source chain block is reversed or altered. However, to provide the best user experience, many solutions pass messages instantly and make trust assumptions about finality. In this case, if the source chain experiences a state reversion after the message has been passed and tokens have been bridged, it could lead to scenarios such as double spending of bridged funds. AMP solutions can manage this risk in multiple ways, such as applying different finality assumptions to different chains depending on how decentralized each chain is, or trading off speed for security. Bridges built on AMP solutions can also cap the amount of assets that can be bridged before finality is reached on the source chain.
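The speed/security trade-off can be expressed as a per-route policy. The sketch below is a hypothetical Python model, not any bridge's implementation: a transfer relays immediately once a confirmation threshold is met, and before that only while the total unfinalized amount stays under a cap. Tracking the aggregate rather than each transfer prevents one large transfer from being split into many small ones.

```python
from dataclasses import dataclass


@dataclass
class FinalityGate:
    required_confirmations: int  # treat a source block as final after this many confirmations
    pre_finality_cap: int        # max total amount in flight before finality
    unfinalized_total: int = 0

    def try_relay(self, confirmations: int, amount: int) -> bool:
        if confirmations >= self.required_confirmations:
            return True  # considered final: relay without touching the cap
        if self.unfinalized_total + amount > self.pre_finality_cap:
            return False  # would exceed the pre-finality exposure limit
        self.unfinalized_total += amount
        return True

    def on_finalized(self, amount: int) -> None:
        # Called once a previously relayed transfer's source block becomes final.
        self.unfinalized_total = max(0, self.unfinalized_total - amount)


gate = FinalityGate(required_confirmations=64, pre_finality_cap=1_000)  # hypothetical values
print(gate.try_relay(confirmations=2, amount=600))   # True: within the cap
print(gate.try_relay(confirmations=2, amount=600))   # False: 1,200 would exceed the cap
print(gate.try_relay(confirmations=64, amount=600))  # True: already final
```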

Trends and Future Outlook

Customisable and Additive Security

To better serve diverse use cases, AMP solutions are incentivized to offer developers more flexibility. Axelar introduced an approach [4] for upgrading message passing and verification without any changes to application-layer logic. Hyperlane V2 introduced modules that let developers choose among security models such as economic security, optimistic security, dynamic security, and hybrid security. Celer IM offers additional optimistic security on top of economic security. Many solutions wait for a predefined minimum number of block confirmations on the source chain before transmitting a message; LayerZero allows developers to update these parameters. We expect some AMP solutions to keep offering more flexibility, but these design choices warrant discussion. Should apps be able to configure their security, to what extent, and what happens if apps adopt a sub-par design? User awareness of the basic concepts behind security may become increasingly important. Ultimately, we foresee aggregation and abstraction of AMP solutions, perhaps in some form of combined or “additive” security [5].
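One plausible reading of “additive” security, and of applications using several interoperability solutions redundantly, is a k-of-m rule over independent AMP verifiers. The Python sketch below assumes each verifier is a hypothetical callable that returns whether it attests to the message; the scheme described in [5] may differ.

```python
from typing import Callable, Dict

Verifier = Callable[[bytes], bool]  # hypothetical: True if this AMP attests to the message


def additive_accept(message: bytes, verifiers: Dict[str, Verifier], required: int) -> bool:
    """Accept only if at least `required` independent AMP verifiers attest to `message`."""
    if not 0 < required <= len(verifiers):
        raise ValueError("required must be between 1 and the number of verifiers")
    approvals = sum(1 for verify in verifiers.values() if verify(message))
    return approvals >= required


verifiers = {
    "amp-a": lambda m: True,   # hypothetical verifier outcomes
    "amp-b": lambda m: False,
    "amp-c": lambda m: True,
}
print(additive_accept(b"msg", verifiers, required=2))  # True
print(additive_accept(b"msg", verifiers, required=3))  # False
```

Setting `required` to the number of verifiers maximizes safety (every AMP must agree) at the cost of liveness; setting it to 1 maximizes liveness but inherits the weakest verifier's security.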

Maturation of “Trust Code and Math” Mechanisms

In an ideal endgame, all cross-chain messages will be trust-minimized by using zero-knowledge (ZK) proofs to verify messages and states. We are already witnessing this shift with the emergence of projects like Polymer Labs and Succinct Labs. Multichain has also published [6] a zkRouter whitepaper to enable interoperability through ZK proofs. With the recently announced [7] Axelar Virtual Machine, developers can leverage the Interchain Amplifier to permissionlessly set up new connections to the Axelar network. For example, once robust light clients and ZK proofs for Ethereum’s state are developed, a developer can easily integrate them into the Axelar network to replace or enhance an existing connection. Celer Network has announced [8] Brevis, a ZK omnichain data attestation platform that enables dApps and smart contracts to access, compute over, and utilize arbitrary data across multiple blockchains. Using ZK light client circuits, Celer has implemented a user-facing asset zkBridge between Ethereum Goerli and the BNB Chain testnet. LayerZero's documentation mentions the possibility of adding new optimized proof messaging libraries in the future. Newer projects such as Lagrange are exploring the aggregation of multiple proofs from multiple source chains, and Herodotus is making storage proofs feasible through ZK proofs. However, this transition will take time, as the approach is difficult to scale across blockchains that rely on different consensus mechanisms and frameworks.

ZK is a relatively new and complex technology that is challenging to audit, and proof generation and verification are currently not cost-optimal. We believe that, in the long run, to support highly scalable cross-chain applications, many AMP solutions are likely to complement trusted humans and entities with verifiable software because:

The possibility of code exploitation can be minimized through audits and bug bounties. With the passage of time, it will be easier to trust these systems as their history will serve as proof of their security.
The cost of generating ZK proofs will decrease. With more R&D in ZKPs, recursive ZK, proof aggregation, folding schemes and specialized hardware, we expect the time and cost of proof generation and verification to reduce significantly, making it a more cost-effective approach.
Blockchains will become more zk-friendly. In the future, zkEVMs will be able to provide a succinct validity proof of execution and light client-based solutions will be able to easily verify both execution and consensus of a source chain on the destination chain. In Ethereum’s endgame, there are also plans to “zk-SNARK everything” including consensus.

Proof of Humanity, Reputation, and Identity

The security of complex systems like AMP solutions cannot be encapsulated through a single framework and warrants multiple layers of solutions. For example, along with economic incentives, Axelar has implemented quadratic voting [9] to prevent the concentration of voting power among a subset of nodes and promote decentralization. Other proofs of humanity, reputation, and identity can also complement mechanisms for setup and permission.
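The effect of quadratic voting on concentration can be shown with a short worked example in Python. It assumes voting power equals the square root of stake; Axelar's exact formula may differ, and the stakes are hypothetical.

```python
import math
from typing import Dict


def voting_shares(stakes: Dict[str, int], quadratic: bool) -> Dict[str, float]:
    """Share of total voting power per validator, with power = stake (linear)
    or power = sqrt(stake) (quadratic)."""
    power = {v: math.sqrt(s) if quadratic else float(s) for v, s in stakes.items()}
    total = sum(power.values())
    return {v: p / total for v, p in power.items()}


stakes = {"whale": 900, "v1": 100, "v2": 100, "v3": 100}  # hypothetical stakes
print(voting_shares(stakes, quadratic=False)["whale"])  # 0.75
print(voting_shares(stakes, quadratic=True)["whale"])   # 0.5
```

Here a validator with 75% of stake holds only 50% of voting power under quadratic weighting. The sketch also shows why complementary identity mechanisms matter: if the same 900 tokens were split across nine validators of 100 each, their combined share would return to 75%.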

Conclusion

In the Web3 spirit of openness, we will likely see a plural future where multiple approaches co-exist. In fact, applications may choose to use multiple interoperability solutions, either in a redundant way, or let users mix-and-match with disclosure of trade-offs. Point-to-point solutions may be prioritized between “high traffic” routes, whereas hub and spoke models may dominate the long tail of chains. In the end, it is up to us, a community of users, builders, and contributors, to shape the topography of the interconnected web3.

About the Authors

LongHash Ventures specializes in bootstrapping Web3 ecosystems. LongHash venture funds invest in early-stage Web3 protocols, and LongHashX Accelerator partners with ecosystems and protocols to accelerate early-stage founders.

Shi Khai Wei is the general partner and chief operations officer at LongHash Ventures, a Web3-focused venture fund and accelerator. In 2021, Shi Khai was awarded Forbes 30 Under 30 in recognition of his achievements. Shi Khai was previously a management consultant at McKinsey, with a focus on digital transformation and analytics in the financial and telecommunications sectors across Southeast Asia. Shi Khai holds a Bachelor of Science from Imperial College London.

Raghav Agarwal is an investment analyst at LongHash Ventures. Raghav previously worked as a research analyst at D.E. Shaw and as a product analyst at Fidelity Investments. Raghav is a computer science engineer and holds a Master of Business Administration from the Indian Institute of Management, Shillong.

We would also like to thank Bo Du and Peter Kim from Polymer Labs, Galen Moore from Axelar Network, Uma Roy from Succinct Labs, Alex Smirnov from deBridge, Max Glassman and Ryan Zarick from LayerZero for reviewing and providing their valuable feedback.