A Deep Dive into Data Availability: The Promises and Challenges of Scaling Web3

Intro - The Problem with Data Availability

The blockchain ecosystem is constantly evolving to improve scalability and build more application layer primitives. Still, adoption is hindered by the “blockchain trilemma.” That is, the tension which exists between decentralization, security and scalability such that all three pillars have yet to be fully achieved in cohesion without sacrificing one pillar.

Traditionally, blockchains were built on a monolithic architecture, with a singular layer containing transaction execution, consensus mechanisms, data availability, and settlement. However, such rigidity has led to issues. Take for example Bitcoin’s low transaction throughput of ~7 transactions per second (TPS) or Ethereum’s high gas fees. For the former, limitations in scalability have led to unusably slow transaction speeds, and for the latter, prohibitively high transaction costs. These issues have led to developers pushing for a new approach to blockchain design: modular blockchains. With modular blockchains, execution, settlement, and data availability all function in separate layers. This specialization allows for increased throughput and lower transaction costs, leading to more scalable solutions.

*Figure 1: Modular blockchains use independent components to separate the responsibilities of transaction execution, consensus, and data availability on different networks or sidechains.*

Why We Need DA Protocols

Ethereum Layer 2 (L2) solutions play a crucial role in Ethereum's scalability roadmap. L2 solutions are built on top of the Ethereum mainnet and provide a way to offload a significant portion of the transaction load from the main blockchain, thereby improving scalability while retaining the core functionality of the Ethereum network.

L2s achieve this scalability improvement by aggregating multiple transactions and settling them as a single batch on the Ethereum mainnet, reducing the overall number of transactions that need to be processed on the main chain.

The rise in L2 protocols to process transactions off mainnet has helped enhance transaction speed and reduce costs. However, increased L2 solution adoption is, in turn, overloading Ethereum’s consensus mechanism and perpetuating mainnet congestion. Such issues are due to L2 reliance on L1s for bundled transaction processing, verification, and settlement, which adds to the amount of transaction detail data that sits on the mainnet. To combat this, L2s can reduce data storage related gas fees by only posting summarized transaction data – but this poses a trade-off issue between efficiency and full verifiability. For security and decentralization, the goal would be to achieve increased data availability¹ while also keeping costs down and increasing the feasibility of running a full node (to decrease reliance on third-parties like DACs and sync committees).

We can better understand this data availability problem through an analogy – a public library system. Imagine Ethereum’s blockchain as an old-school public library system that serves a major city with a large population like New York. However, this specific library system is unique in that people can not only borrow books, but they can also write and contribute their own books to the library. Ethereum’s own consensus mechanism represents the in-house librarians who work at the library and help customers find books. The librarians will walk through the stacks of books and find books for customers to check out. They will also take new books submitted by authors and go to the correct section and stack of the library to organize the new books accordingly.

The main library (Ethereum’s mainnet) is extremely popular and used by thousands of people. It has become so popular that it has become too crowded with people trying to borrow books and add more books to the collection. Too much congestion has caused the library to be very slow – the librarians cannot keep up with all the people trying to borrow books. Customers have to wait long hours in order to find the books they want. Additionally, it is now prohibitively expensive to borrow and contribute books – the library had to instate higher fees for checking out books, reserving times to visit the library, etc. Overall, the rise in popularity of the library enabled the system of checking out books to be inefficient.

Continuing this metaphor, L2s would be like the library creating new satellite branches to offload congestion in the main library. These satellite branches, built to handle the overflow from the main library, now allow people to read and write new books (transactions) more quickly and efficiently. These branches collect many books together and, once they're full, send a summary or an index of these books back to the main library, reducing the overall burden on the main library's space and resources.

However, as more satellite branches open up, the process of updating the main library's catalog with these summaries starts to overload the system again. The issue here is not just about moving books around but ensuring that everyone has access to these books, whether they're in the main library or any of its satellite branches. This is where the data availability problem becomes prevalent.

The Solution

Moving back to the world of blockchains, dedicated DA layers act as specialized storage and consensus mechanisms for all data types involved in rollup transactions, including transaction details, smart contracts, and off-chain data. DA layers ensure that such data is available and verifiable by all nodes across L1, L2, and L3 networks, thereby improving accessibility, transparency, and immutability. By shifting the DA layers off of L1s, significant congestion is alleviated, enhancing throughput, speed, and response time. This separation enables rollups to manage their data more efficiently.

DA layers record and broadcast transaction data, ensuring that any node can verify the blockchain's history. With the introduction of DA layers, rollups can now offload their data to a separate blockchain, streamlining data availability and increasing security. By making all data available and accessible to nodes, DA layers have the potential to ensure the reliable security of rollups – chances of malicious attacks are significantly reduced. The possibility for separate DA layers enables rollup networks to experiment with DA and other layers, creating a completely modular and flexible ecosystem.

DA layers for rollups will be optimized to introduce next-generation features that are proven to accelerate a network’s scalability while keeping the highly tested security maintained. Additionally, high prices on Ethereum’s network due to the growing popularity has driven up transaction fees. Storing data on mainnet is expensive for rollups, so allowing calldata to be stored on a separate DA layer can reduce the costs significantly. DA layers also guarantee continuous access to data across all rollup chain aspects, including execution, settlement, and consensus layers. This accessibility facilitates the straightforward resolution of any disagreements, fraudulent activities, or other complications throughout the various layers.

Moreover, the DA layer has the capability to utilize an off-chain light client, enabling nodes to verify the presence of data without the need to download complete blocks. Despite operating alongside other components of the modular blockchain architecture such as the sequencer and execution layer, interacting with DA layers is very straightforward. This user-friendliness is attributed to the integration of DA layers with readily accessible RPCs, which allow for the efficient retrieval and delivery of on-chain data across the entire ecosystem, ensuring smooth operation without complications.

How DA Protocols work

In the context of our public library system analogy, think of data availability proofs and sampling as a cutting-edge index system developed by the 3rd party software service from before. This system doesn’t need to physically examine every book or shelf to find the correct book for a customer. Instead, it cleverly checks availability using minimal but effective clues from its comprehensive index. Imagine this software has the ability to quickly highlight if any book or piece of information is missing, as well as if there is a hidden section in the library, without having to manually search through each book. Instead, it is able to randomly select indexes from different parts of the library and compile those together. This random selection is based on sophisticated algorithms that ensure that even a few checks are enough to guarantee with high confidence that no book is hidden or missing.

However, with this constant shorthand indexing of transaction information, there is high potential for the loss, misplacement, or degradation of the underlying data (books). DA layers solve this issue with an erasure coding mechanism. Erasure coding in our library system can be thought of as a specialized method of creating backup copies of books. When a new book is added to any branch (L2 solution), the library uses this method to not just copy but to create backups of additional pieces of information about the book – such as its author, category, year published, etc. These pieces are designated in such a way and connected together such that even if a large portion of the book’s information were lost, you could still piece together the entire book with what is left through these backups.

Applying this, each book is divided into segments, and then extra segments are created containing information that could rebuild the original segments if some were missing. This special copying process (erasure coding) ensures that even if half of the pieces of the book were somehow lost or hidden, the original content could still be fully recovered from the remaining pieces. This method provides a safeguard against misinformation or data loss, ensuring that every book remains accessible to anyone who seeks it.

Our library analogy can help us understand other technical details of how DA layers work. DA proofs are an innovative approach that enable nodes to verify that the sequencer has made new transaction data blocks available without needing to download the entire block. When a sequencer publishes a block of transaction data, that data goes through the erasure coding process described above The redundancies created allow for the recovery of the full original data from just half of the erasure-coded version. Therefore, a sequencer attempting to hide data would need to withhold over 50% of it, making it significantly harder to cheat. Using DA proofs, nodes can sample small portions of the erasure-coded data to check its availability. By downloading random pieces of the data, nodes can effectively verify the data's presence with high confidence. For instance, after sampling different data chunks multiple times, the probability that a node would fail to notice withheld data drops to less than 1%. This method, known as data availability sampling (DAS), is highly efficient. It allows nodes to ensure the entire block's availability by checking only a fraction of it. Addressing the data availability problem in this manner ensures that rollup sequencers remain honest, facilitating scalability.

Now that we have outlined the architecture and basic building blocks of DA layers, we can walk through the different approaches to implementing DA layers. We will first look at Ethereum’s in-house approach to solving data availability, followed by an analysis of the leading third-party DA protocols.

Ethereum’s In-House Solution

Ethereum aims to solve the data availability problem and further its scalability with its own in-house solution – sharding. Sharding involves dividing the blockchain network into smaller units, “shards,” which can process transactions in parallel. The parallelization of sharding enabled through a modular architecture will maximize transaction throughput while also reducing congestion on mainnet. The Ethereum Foundation is implementing their sharding solution through a series of sequential improvements, with Danksharding and Proto-Danksharding being the first intermediary steps.

Ethereum’s latest hard fork upgrade, the Dencun upgrade, successfully went live on mainnet on March 13th, 2024. The Dencun upgrade implements EIP 4844 (Proto-Danksharding), marking the first milestone that significantly advances Ethereum’s scalability roadmap by supercharging L2s. Proto-Danksharding is the core feature of the Dencun upgrade and constitutes the most substantial upgrade to the Ethereum ecosystem since the Shanghai upgrade in which stakers could withdraw funds deposited with the network. The Dencun upgrade is crucial for increasing Ethereum’s transaction processing capacity, enabling the network to handle more than 100,000 transactions per second – a metric that is crucial for the foundation’s ability to support a growing ecosystem of DApps and users.

*Figure 2: Ethereum’s Data Availability Roadmap showing the hierarchy of milestones needed to be accomplished in order to reach a full sharding solution.*

Danksharding is a sharding architecture that relies on “blobs” (large pieces of data) to scale the Ethereum blockchain. Rollup-centric L2 protocols use the extra blob data spaces to decongest the main network and thus reduce transaction charges. The key feature of Danksharding is how it manages transaction fees and data placement through a unified process (merged-fee market). Instead of having many separate groups handling different parts of the data, there's one entity responsible for deciding what data gets included at any given time. To avoid overburdening this system and to ensure fairness, Ethereum has introduced a system (proposer-builder separation) where specialized entities, known as block builders, compete to suggest which transactions and data should be prioritized. The winning suggestion is then chosen by another entity, the proposer, based on which offer is best. This process means that not everyone has to handle the full details of each transaction, making the system more efficient.

For even further efficiency, the system uses DAS. You’ll remember that it allows the network to check that all necessary data is correctly included without having to examine every single piece of data in full detail. This method is crucial not only for Danksharding but also for creating simpler, more streamlined clients that don't need to store as much information. Through DAS, validators can quickly confirm data is correct and fully available by checking just a few parts of it, ensuring the integrity of the transactions without the need for exhaustive checks. If any data is missing, it will be identified quickly and the blob rejected.

Since Danksharding is a very ambitious solution that will take time to fully implement, Ethereum announced Proto-Danksharding as an intermediate checkpoint before implementing the full Danksharding protocol. Proto-Danksharding, outlined in EIP-4844 and featured in the Dencun upgrade, implements the basic logic and verification rules that make up the full Danksharding solution before implementing any actual sharding. In this case, users and validators still have to directly validate the availability of the full data.

Proto-Danksharding’s core innovation introduces the blob-carrying transaction, a new transaction type in which a regular transaction carries an extra piece of data called a blob. While blobs are very large (~125kB), they are much cheaper than an equivalent amount of calldata since they are pruned after approximately two weeks and are available for L2s users to retrieve it. The blobs are fixed in size with a limit on how many blobs can be included per block. However, blob data is not accessible to EVM execution, so validators and users still must download full block contents. The blobs are stored by the consensus layer (beacon chain) instead of the execution layer, which is what enables higher throughput and faster verification. Since validators and clients must download full blob contents, this frees up disk space and allows more users to participate as well.

Proto-Danksharding brings large scalability gains, since the blob data is not competing with the gas usage of existing transactions on Ethereum. EIP-4844 also helps to facilitate an ecosystem-wide move to rollups, a solid trustless scaling solution for Ethereum, and could boost throughput of L2 rollups by 100x. Because L1 transaction fees are significant blockers for acquiring new users and applications, EIP-4844 reduces rollup fees and high gas prices by orders of magnitude by placing an upper limit on the number of blobs per block. This enables Ethereum to remain competitive while maintaining decentralization.

Below is a summary of what was accomplished in the recent Dencun upgrade, and what remains to be done.

The work already done in EIP-4844 includes:

A new transaction type, which is the exact same format that will need to exist in full sharding
All of the execution layer logic required for full sharding
All of the consensus & execution cross-verification logic required for full sharding
Layer separation between BeaconBlock verification and DAS blobs
Most of the BeaconBlock logic required for full sharding
A self-adjusting independent gas price for blobs

The work that remains to be done to get to full sharding includes:

A low-degree extension of the blob KZG commitment proofs in the consensus layer to allow 2D sampling
An actual implementation of DAS
Proposer-builder separation to avoid requiring individual validators to process 32 MB of data in one slot
Proof of custody or some similar in-protocol requirement for each validator to verify a particular part of the sharded data in each block

Proto-Danksharding makes a large number of changes today so that few changes are required in the future to upgrade to full sharding. Even though those changes to achieving full sharding are complex, the complexities are contained to the consensus layer. Now that EIP-4844 is realized, rollup developers and execution layer teams have no more further work to finish the transition to full sharding because all complexity is placed to the consensus layer with blobs being stored on the beacon chain.

Having detailed Ethereum’s in-house plans towards solving the data availability problem, we will now survey the leading third-party DA layer protocols who are building alternative solutions to data availability.

Other DA Players

Celestia

TLDR: Celestia's major edge in the DA landscape is being the first mover in modular data availability. Celestia stands out for pioneering a modular approach to blockchain architecture, emphasizing simplicity in rollup and L2 deployment.

Its major pitch outside of being the first in the space is around accessibility. Its optimistic architecture combined with DAS for light node operation make it more accessible for validators to participate without processing the entire block's data. This approach facilitates a broad range of blockchain applications, from Ethereum L2 solutions to sovereign rollups, by offering an additional settlement layer. Celestia’s "Rollups-as-a-Service" facilitates easy blockchain deployment, similar to deploying a smart contract, potentially leading to a wide variety of rollups, from general-purpose to application-specific ones.

In terms of use cases, Celestia is particularly suited for developers looking for a flexible and straightforward way to build and deploy rollups and L2 solutions without needing to bootstrap validators or a consensus mechanism. Its integration with the Cosmos SDK and the development of Optimint make it a compelling choice for creating sovereign rollups that require fast finality and efficient data availability without the heavy lifting of full block validation.

Launching its mainnet in 2023, Celestia’s modular data availability network focuses on simplifying Rollup and L2 deployment. Serving both data availability and consensus for integrated blockchains, Celestia utilizes DAS to run light nodes, allowing for more accessible validation as only a small portion of each block’s data needs to be downloaded for sampling.

Architecture: First, Celestia’s DAS utilizes 2D Reed-Solomon (RS) coding to split the data into chunks which are then encoded into a matrix and extension. Namespaced merkle trees (NMT) are then used to organize data retrieval through computation of the root of Merkle roots, which are then committed to block headers. These data commitments then allow random sampling of small block data portions and their corresponding Merkle proofs, which, when validated, ascertain with a high probability that the block’s data is fully available. Celestia features optimistic architecture with their fraud-proof scheme, the only prominent DA solution to do so, meaning that data is assumed to be available implicitly and there is a challenge period for fraud proof disputes before accurate block encoding can be confirmed. For state transition disputes, Celestia utilizes interactive verification games, so only the key disputed computation step is re-executed for the fraud proof, decreasing gas fees.

Figure 3: How Celestia’s DAS process utilizes 2D RS Encoding. Original data (k x k) is extended. Then 2k column Merkle roots and 2k row Merkle roots are generated. The Merkle root of the column and row Merkle roots is used as the header of the extended block. The data consists of both the original and the extended data.

Celestia utilizes the Cosmos SDK blockchain framework and a PoS consensus mechanism based on Celestia-core, a fork of CometBFT, and an implementation of Tendermint. With a 100 validator set size, Celestia’s distinct DA and consensus layers mean that the only validator responsibilities are to reach consensus on the order of transactions within a block and whether the necessary relevant block data has been shared. Celestia then utilizes DNS mapping to tag recorded data to its corresponding rollup/L2. While Tendermint can help Celestia achieve fast finality, the required challenge period for DA guarantees does delay time to finality. Despite that, Celestia still boasts a faster time to finality than Avail (the most comparable competitor), with a finality time of 15 seconds compared to Avail’s 20 seconds.

Distinct Features: Celestia’s structure allows for versatile execution methodologies. Further,Ethereum L2s, sovereign rollups, and settlement enrolled rollups can all accessibly plug-in to Celestia. Celestia has developed Optimint, an ABCI client to ease sovereign rollup development by serving as an additional settlement layer specifically. Optimint helps transform Cosmos SDK chains into rollups as Celestia’s consensus layer makes Tendermint’s full BFT consensus redundant.

Celestia Quantum Gravity Bridge, a data availability bridge contract sitting on Ethereum, allows Ethereum L2s to utilize Celestia as an L1 solely for data availability. Such L2s, called Celestiums, utilize Celestia for DA and Ethereum for settlement and dispute resolution. The bridge verifies the Celestia validator signed Merkle root, attesting that the provided data is available and valid. This allows Ethereum L2s to just query the DA bridge contract for state updates instead of relying on calldata posted to Ethereum. Since Celestia prices based on bytes rather than utilizing resource pricing like Ethereum, the data throughput of Celestia is ~6.67 mb/s, which is greater than that of Ethereum (even after the EIP-4844 update which took Ethereum to ~1.33 mb/s).

*Figure 4: Celestia’s Quantum Gravity Bridge allows L2s to utilize both Ethereum and Celestia as L1s*

The Celestia blockchain has a native token, TIA, which is utilized to pay for “blobspace” data storage and as a stake for participation in consensus and validation. TIA can also be utilized by developers as a gas token for rollups. TIA tokens were first widely distributed via their “genesis drop” during the launch of their mainnet beta. Six percent of the total token supply, 60 million TIA, were distributed to over 580,000 users.

Metrics: Celestia can currently process 11GB per day with its initial 2MB blocks (4 blocks per minute). After a case study testing three different transactions, (ERC20 Sends, ERC720 Mints, and Uniswap V3 Swaps), the cost of posting callData to L1s vs. Celestia was calculated. It was found that L2 scalability and cost savings fueled by Celestia were especially profound when performing high-volume on-chain transactions. Each type of transaction was replicated 10M times and in some cases, the total cost was 300–500X lower using Celestia. With DAS, Celestia is shown to reduce DA costs for L2s by up to 95%.

Avail

TLDR: Avail has an edge when it comes to modularity and verification speed. Avail emphasizes its modularity by separating its DA, execution, and verification/dispute resolution layers. This architecture allows for a significant reduction in transaction fees and swift verification times, making it advantageous for rollups and app-specific chains that demand efficient, cost-effective data availability. Avail also stands out with its light client P2P network that ensures robust data availability even during network disruptions. This feature, along with its ability to support a wide range of execution environments, makes Avail a versatile DA solution.

In terms of use cases, Avail is favored in scenarios where transaction cost efficiency and fast verification are paramount. It is designed to serve a wide array of applications, including Validiums, L2 rollups, and sovereign rollups, by providing a robust base layer that supports quick and economical data storage and retrieval. Avail is well-suited for standalone chains, sidechains, and off-chain scaling solutions that benefit from a decoupled execution and validity environment.

Architecture: Avail, originally created as a data attestation bridge to Ethereum for L2s and L3s, has developed into a robust data availability base layer that is data-agnostic in nature. Avail introduces modularity by decoupling its DA layer, execution layer, and its verification/dispute resolution layer. While specifically meant for rollups (Validiums, L2 rollups posting transaction data, and sovereign rollups), Avail also supports app-specific chains. Avail claims that utilizing their DA layer can save up to 90% in transaction fees compared to using Ethereum for DA.

The foundational DA layer orders, stores, and guarantees availability of transactional data. Avail guarantees data availability to Light Clients by combining erasure coding, KZG commitments (also utilized by EIP-4844 for its trust setup), and DAS. When transactions are queued to Avail, data blocks are arranged into 2D matrices and erasure coding extends data columns while binding KZG commitments for each row are added to the block header. Such methodology makes commitment fraud proofs unnecessary. Then KZG polynomial commitments to each block cryptographically prove the data stored is accurate and secure. These proofs are then regenerated by validators to confirm data integrity and, through a super majority, reach consensus on a block. Use of a validity-proof model, instead of a fraud-proof scheme like Celestia, gives Avail a competitive advantage in terms of verification time, which takes less than a minute compared to Celestia’s sub 10 minute benchmark.

Figure 5: Diagram of Avail’s base layer architecture. First the block proposer erasure encodes the data and generates KZG commitments to construct the block. Then the blocks are propagated to validators to double check and reach consensus on the block. (Decode, reconstruct, verify).

Distinct Features: On Avail, light client nodes can be run by anyone to identify whether blocks are valid and contain full data through DAS. For each sampled cell, by checking KZG polynomial openings against block header commitments, data availability verification occurs independently (for each cell) and instantly. Light clients are encouraged by the network to promote decentralization. Avail noticeably features a peer-to-peer (P2P) light client network, which acts as backup to uphold the network during full node outages. Celestia, their main competitor, lacks such network reinforcement as light client operation requires full nodes.

Metrics: Avail launched its testnet in 2023 as a gamified incentive testnet called “Clash of Nodes.” The testnet is capped at 300 validators but its Nominated PoS (NPoS) consensus mechanism will support up to 1000 validators upon mainnet launch. Specifically Avail uses BABE, serving as a block production engine relying on probabilistic finality, and GRANDPA, functioning as a finality gadget, inherited from PolkaDot SDK.

Due to its modularity, Avail’s block space, like Celestia, is expandable with DAS. Avail’s additional client-side verification techniques also increase their ability to expand block size. Avail first launched their testnet with 2MB block size, which, with erasure coding, gets duplicated to 4MB. However, Avail claims to have tested block sizes “up to 128 MB without difficulty” while maintaining their 20-second block time.

EigenLayer

TLDR: EigenLayer has an edge for utilizing the existing Ethereum validator set. EigenDA leverages the security and validator set of Ethereum, presenting an opt-in middleware solution that doesn't necessitate bootstrapping a new validator set. Its use of a slashing-based mechanism and a restaking protocol for ensuring data availability and integrity makes it a secure and reliable option.

EigenDA is ideal for applications requiring high throughput and scalability on the Ethereum network, such as sophisticated DeFi protocols or large-scale dApps. Its ability to utilize restaked ETH for securing the network offers a cost-effective solution for developers seeking enhanced programmability and innovation on Ethereum.

Architecture: EigenLayer launched a data availability layer for Ethereum called EigenDA. It is supposed to be similar to the current Danksharding specs (with DAS, proof of custody, etc). However, it is an opt-in middleware instead of being a part of the core protocol. Instead of employing a consensus layer, it relies on a quorum of DA nodes that hold a security assumption, and there is no collusion amongst nodes. Each DA node is an EigenLayer restaker who stakes ETH to participate in the quorum protocol. EigenDA introduces a slashing-based mechanism to punish the node that breaks its promise on data storage and access service, incentivizing stakers to behave appropriately. As a product on top of EigenLayer, EigenDA has a significant advantage in being able to utilize the numerous Ethereum validators already in place, so users have no need to bootstrap their own validator set.

*Figure 6: How rollups integrate with EigenDA: EigenLayer Docs*

This architecture results in better programmability on the Ethereum base layer and, as a consequence, a higher rate of development. When protocols interact with each other more quickly and easily, the rate of innovation rises. Since the marginal cost of staking is zero (it doesn’t cost anything to stake the same capital twice), the value proposition of Ethereum grows: protocols on Ethereum can be secured with the same capital stock that secures Ethereum. This should result in Ethereum attracting new capital to secure its base layer.

Distinct Features: Due to EigenDA’s architecture, protocols can abstain from tokenization as a security mechanism, which should improve tokenomics and reduce the problem of mercenary capital. EigenDA is highly anticipated as it can leverage restaked ETH instead of its token to secure the network. EigenDA introduces the characteristic of decentralized choice. Validators and users have the freedom to choose the specific DA layer that aligns with their preferences and requirements. This flexibility ensures that the ecosystem can adapt to evolving needs and preferences, without being restricted to a single design choice.

Given the ongoing advancements in data availability research and the changing demands of rollups, having a fixed DA design now could risk adopting suboptimal solutions for years to come. EigenDA's adaptable nature ensures it can readily accommodate emerging technologies and evolving rollup requirements. EigenLayer-based DA layers can operate numerous heterogeneous layers concurrently. Different users, applications, and rollups often have diverse data availability needs. By utilizing EigenLayer, DA layers can vary in security levels (via validators or erasure code rate), bandwidth capacities, latencies, and prices. Moreover, these DA layers benefit from the security provided by Ethereum validators without incurring any capital cost.

Metrics: EigenLayer has a live testnet running with 100 validators running at 0.3 Mb/s each, which results in 15 MB/s total capacity (with a code rate of 1/2). EigenDA enables cheaper and higher bandwidth for data availability than the Ethereum base layer: Ethereum currently processes 80 kilobytes per second, but EigenLayer will allow up to 15 megabytes per second. EigenLayer’s founder Sreeram Kannan highlighted that the expected throughput of EigenDA for the future will be 1 GB/s.

NEAR DA

TLDR: NEAR DA has strength in its cost-effectiveness and its ability to handle high transaction volumes. NEAR’s DA layer leverages the Nightshade sharding mechanism to offer an efficient and scalable solution for Ethereum rollups and developers. Its chunk architecture allows for cost-effective data availability, especially in high-transaction-volume scenarios.

NEAR DA is particularly beneficial for Ethereum rollups that experience high transaction volumes and seek a cost-effective, scalable DA solution. Its architecture is good for supporting a range of applications, from gaming to finance, by providing a robust infrastructure for storing and accessing large data volumes efficiently.

Architecture: NEAR Protocol is building an efficient and robust DA layer for ETH rollups and Ethereum developers. NEAR DA leverages a crucial part of NEAR’s consensus mechanism, Nightshade, which parallelizes the network into multiple shards. Each shard on NEAR produces a chunk (small portion of a block) which is aggregated with other chunks to produce full blocks. This implementation is done at the protocol level, so is inaccessible to users and developers. Receipts processed by a chunk producer have consensus around the receipt. But once the chunk is processed and included in the block, the receipt is no longer needed for consensus and can be removed from the blockchain’s state. This pruning time makes data available in the network for around 60 hours. Once the receipt has been pruned, it is the responsibility of archival nodes to retain the transaction data.

This implies NEAR doesn’t slow down its consensus with more data than required. Additionally, any user of NEAR DA would have enough time to query transaction data. This specific chunk architecture provides great advantages to rollups through its cost-effective data availability, especially in use cases with high transaction volume.

Distinct Features: The three important components of NEAR DA that are open source and can be integrated into any OP Stack, Polygon, or Arbitrum Nitro rollups are as follows:

Blob Store Contract: this contract provides the store for arbitrary DA blobs. In practice, these “blobs” are sequencing data from rollups, but they can be any data. Consensus is provided around the submission of a blob by NEAR validators.
Light Client: A trustless off-chain light client for NEAR with DA-enabled features, such as KZG commitments, RS erasure coding, and storage connectors. The light client provides easy access to transaction and receipt inclusion proofs within a chunk or full block.
RPC Client: The defacto client for submitting data blobs to NEAR. These allow a client to interact with the blob store.

NEAR’s engineering team also announced the move towards stateless validation, the next phase of sharding which will decrease the hardware requirements of chunk validators and move the state into memory. This will allow for more shards and will increase the decentralization in the system by reducing the requirements to become a validator. More shards will boost throughput and reduce the amount of data that has to be stored on a single shard.

Metrics: Since its launch on mainnet at the end of 2020, NEAR achieved 100% uptime with 4 shards and has onboarded 35M accounts. NEAR DA is an incredibly fast and cost-effective data availability option. Depending on the gas fees, NEAR DA is up to 85,000x cheaper than posting blob submissions on Ethereum and 30x cheaper than Celestia. Their goal is to reach a system in which rollups can rely on its own shard by running a lightweight RPC through DA with sharding rather than relying on DAS. DA sharding remains in the research phase, but it would be a huge advantage to the NEAR Protocol design for fast and cost-effective data availability.

0G²

TLDR: 0G (ZeroGravity) is a newer primitive that has an edge for decentralized storage and scalability. 0G focuses on scalability and reliability through a novel design that separates data publishing and storage lanes. This approach addresses scalability bottlenecks and enables large volumes of data transfers, supported by a decentralized storage system designed for partitioning.

With its emphasis on horizontal scalability and multi-layer storage design, 0G is well-suited for on-chain AI applications and decentralized infrastructures requiring hyperscale programmable data availability. Its architecture allows for a wide variety of use cases, including decentralized gaming, collaborative web2 applications, and high-frequency DeFi solutions.

Architecture: 0G is building a new design of the data availability system that aims to be more scalable and reliable – enough to meet the enormous demands for off-chain verification of executed states without jeopardizing scalability and security. 0G builds the DA layer directly on top of a decentralized storage system. This addresses the obstacles of scalability head-on by minimizing the data transfer volume required for broadcast. The crux of their solution lies in their separation of work required for data availability into a “data publishing lane” and a “data storage lane.” The data publishing lane guarantees data availability through consensus of data availability sampling, requiring only tiny data to flow through the consensus protocol to avoid the broadcasting bottleneck. The data storage lane enables large volumes of data transfers and is supported by the storage layer that accomplishes horizontal scalability through designed partitioning.

The 0G DA layer has a separate consensus network that uses DAS to guarantee the data availability property of each data block. Each validator does DAS independently, so once the majority of the validators reach the consensus of successful sampling results, the data is treated as available. When a data block enters the DA layer, it is first erasure coded and organized into multiple consecutive chunks. To keep the order of the data entering the system intact, the merkle root as a commitment of the encoded data block is submitted to the consensus layer. The chunks are then released to different storage nodes in the 0G storage system, where the data can be further replicated to other nodes depending on the storage fee paid by the user.

*Figure 7: The Architecture of the* 0G System, showing how both the 0G Consensus layer and the 0G storage network interact.

The storage layer contains a storage network that connects with the same separate consensus network as the DA layer. Each storage node participates in a mining process by submitting the proof of accessibility for a specific piece of data to the smart contract deployed on the consensus network. Once the proof is validated by the smart contract, the storage node gets rewarded. The partitioning design occurs through rewarding more to the node for storing the specific data that belongs to the same partition within the node. 0G created an incentive-based mechanism that rewards the nodes for their contributions, better encouraging nodes to participate in the maintenance of the network and promoting the network to achieve improved scalability. This general decentralized storage design enables 0G to support a variety of availability data types from various use cases. The storage system has multiple stacks of abstractions and structures that can handle unstructured or mutable data, unlocking reliable data indexing. This allows them to support not only L2 networks, but a variety of decentralized AI infrastructures as well.

Distinct Features: 0G hopes to build a highly performant, modular, and unopinionated stack that will enable hyperscale programmable data availability optimized for AI. Their storage system will be sufficient enough to store LLMs and metadata in order for OPML or ZKML to run large AI models. Through their horizontal scaling and multi-layer storage design, they will fuel the future of on-chain AI by being the fastest and lowest cost data propagation mechanism.

Their programmable DA will allow protocols to save and load DApp state via smart contracts. 0G will monitor state requests, manage data and provide economic security that Eth L1s and L2s can retrieve, which is useful for multiple application level cases. For example, this allows decentralized gaming to store game or player stats, opens possibilities for traditional web2 applications like collaborative editing, or enables high frequency DeFi through being able to store on-chain order books.

0G enables complete DA customization. Users can choose storage length, location and replication, as well as how much of the network to propagate to. They can also choose token mechanics, such as how token payments should work based on their data and how much stake is required for a certain node. 0G allows for bandwidth reservation, guaranteeing consistent costs to rollups, which could also pay for throughput on-demand. Finally, 0G will be very extensible in the future. They plan on using their infrastructure to later build zk prover aggregation and decentralized shared sequencers.

Metrics: 0G plans on launching their public testnet soon. In their private testnet, they were able to achieve 10MB/s per node, which is 8x faster than Celestia and 50x faster than an individual node on EigenDA.

NuBit³

TLDR: NuBit distinguishes itself by being a Bitcoin-Native solution for increased scalability and efficiency. NuBit introduces a Bitcoin-native data availability layer aimed at enhancing network scalability and reducing storage transaction fees. By leveraging zkSNARKs and a unique consensus mechanism, NuBit offers an efficient solution for Bitcoin-based applications.

NuBit is tailored for applications within the Bitcoin ecosystem that require efficient data storage and access, such as L2 solutions, Ordinal inscriptions, and price oracles. Its architecture enables a significant increase in data throughput and block creation speed, making it a viable option for a broad range of Bitcoin-centric applications seeking scalability and efficiency.

Architecture: NuBit, developed by Riema Labs, is a Bitcoin-native DA layer, secured by Bitcoin to help relieve the Bitcoin network. NuBit aims to utilize zkSNARKs to improve network scalability by increasing validator size, DAS, and block dispersion. The goal is to decrease block size, generate instant finality to increase TPS, and implement a PCN-based (Payment Channel Network) payment system to maintain a scalable trustless setup. NuBit plans to lower these storage transaction fees to one-thousandth of the current rates and with implementation, increase available storage capacity to petabytes. Further, NuBit claims to increase data throughput to 100 times that of Bitcoin and they boast a block creation time of ~20 seconds per block compared to Bitcoin’s ~10 minutes per block.

For use within the Bitcoin ecosystem, NuBit helps applications like L2s, Ordinal inscriptions, and price oracles become more efficient while making Bitcoin security inheritance accessible. Most prominently, the main use cases are for inscriptions, price feeds, and rollup data. In a high level-overview, transaction data is stored on NuBit’s DA layer before recording a corresponding DA tag and special identifier on the Bitcoin network.

For block assembly, erasure coding is utilized with 2D RS coding to prevent loss of original data. The transaction data is divided into a grid of k x k chunks then enlarged through 2D RS coding to introduce extra data. Each block chunk’s correctness is then ensured by generated KZG commitments stored in the block header (data attestation). With this commitment, one cannot generate a valid proof for any altered chunk, thus its existence allows for the direct verification of chunks. Then, block dispersion is applied to mitigate communication costs. Coded chunks of block data are broadcasted and shared among different validator groups, who are then responsible for keeping and managing them. Validator nodes operate with a novel NuBFT consensus mechanism which features PoS with Babylon Bitcoin staking and timestamping and a SNARK-based Signature Aggregation to simplify bitfield mergers when identifying validator signatures and decrease the rounds of broadcasting for the voting process.

*Figure 8: Diagram of How NuBit works with applications for DA and Bitcoin integration:* *Medium*

Distinct features: NuBit utilizes DAS to allow participants to run a light node and verify data presence without downloading the entire block. DAS is mainly composed of the sampling protocol, which checks randomly selected chunks against the header’s KZG commitment, and then the decoding protocol, which works with validators to reconstruct full blocks through RS decoding. Reconstructed full blocks are then taken on by full storage nodes. The sampling protocol is run both within the validator set to finalize blocks and by light clients to enhance block reputation. NuBit’s introduction of light node DAS to the Bitcoin network is also a significant step towards enhancing Bitcoin verification decentralization, an extra bonus to this scalability solution.

NuBit does not have its own tokens, instead utilizing Bitcoin as both consensus tokens and network tokens. Thus, a trustless bridge via PCN is utilized to bridge Bitcoin assets to NuBit with the Lightning Network for fee payments and validator reward distribution.

Metrics: NuBit launched their pre-alpha testnet in February 2024 and are still in early stages of development. They recently closed a $3M funding round, however metrics surrounding their DA layer specifically have not been released yet. Their consensus mechanism supports a twenty second block time (Bitcoin is ten minutes) and twenty second time to finality (Bitcoin is one hour). Their goal is to reach instant finality time and implement a 1-of-N trust assumption.

Competitive Analysis of DA Layers:

DA Solution	L1 Compatability	Testnet vs. Mainnet	Native Token	Consensus Mechanism	Finality Time	Distinctive Features
Celestia	L1 Agnostic	Mainnet	Yes: $TIA	Celestia Core with modified Tendermint consensus mechanism; PoS	15 sec	Polygon CDK integration L1 Agnostic: Used by VM-agnostic L2s Utilizes fraud-proof scheme instead of a validity-proof model
Avail	Ethereum	Testnet	No	PolkaDot's GRANDPA & BABE; NPoS	20 sec	Gamified incentivized testnet Peer-to-peer (P2P) light client network for backup to uphold network during full node outages
EigenDA	Ethereum	Mainnet	No	DAC-based	Eth's finality time	Use of DACs No DAS Inherits Ethereum validity set & finality time
NearDA	Ethereum	Mainnet	Yes: $NEAR	Nightshade; PoS	3-4 sec (SFFL)	Separated blob store contract, light client, and RPC client
0G	L1/Storage location agnostic	Pre-Testnet Launch	No	Not yet published	TBD	Separate storage system Specific for AI Dapps
NuBit	Bitcoin	Testnet	No	NuBFT: PBFT-based consensus mechanism	Instant	First-to-market DA solution for Bitcoin ecosystem Only Bitcoin compatible for L1 as payment

‍

The Road Ahead - Ethereum vs. The World (of DA Companies)

If full Danksharding is realized, what will happen to the other DA players?

After analyzing the landscape of data availability solutions, particularly within the context of Ethereum and its competitors, we encounter a fundamental dichotomy over which solution is the most reliable. One side of the coin might argue that using Ethereum as a DA layer gives legitimacy, while not using Ethereum lacks legitimacy. Ethereum repeatedly emphasizes the importance of legitimacy, primarily through Vitalik’s blog post (“The Most Important Scarce Resource is Legitimacy”). The Ethereum Foundation has a strong reputation and communal influence that makes its brand representation effective. As such, Ethereum may have incentives to block DA layer competitors in order to maintain its position as the global layer of trust.

Conversely, there is a counter-argument suggesting that placing all bets on Ethereum might not be the most strategic move – not only does it lead to a lack of diversification, but the main pain point emphasized by the third-party DA protocols is that Ethereum’s full sharding solution will take many years to be fully realized. Because Ethereum’s data availability roadmap goes through a series of intermediary steps in development, and there is no specific timeline for the full realization of Danksharding, there is lots of uncertainty around Ethereum’s ability to service data availability immediately. Ethereum has only accomplished the first step of their scalability roadmap so far (Figure 2) and it only came after numerous delays in the Dencun upgrade timeline.This experience makes it hard not to anticipate more delays with Ethereum’s future sharding upgrades. In contrast, when you see new third-party DA layers building fast and launching innovative technologies to edge past Ethereum’s progress, it’s hard not to be optimistic about the value these new DA layers can provide.

Speaking with founders of these individual DA layers specifically, some view this market as evolving into a “winner take most” model. What L2s and other infrastructure providers really care about are the cheapest transaction costs, highest throughput, and lowest latency. Whoever has the fastest data propagation speed will enable that and be able to see the most prominent “wins” out of this opportunity. DA layers also aim to reach web2 parity in terms of pricing, cost, and speed. Those who can support application layers with high volumes of data, such as decentralized inference, gaming applications, and eventually model training for on-chain AI will see the most success as a protocol. ✦

¹_{The data availability problem is tackled in}_{data availability (DA) layers,}_{sidechains that plug into the modular blockchain architecture as a separate component, as shown in Figure 1. Think of the DA Layer in our library analogy as a 3rd party software service employed by the library to help with their inefficiency issues. The DA layer is a new technological innovation that helps customers find books extremely fast using partial information about a book (an index). This service is not physical and does not hold books, but it keeps detailed records and indexes of every book and summary sent between the main library and its branches. If someone needs to verify a fact or find a book, they no longer have to search through every shelf in every branch. Instead, they can consult the software’s records to quickly input the index information on a book and the service will respond almost immediately with where exactly in the library to find the book they need. This software ensures that all records are transparent and easily accessible, and has verification mechanisms to prevent any single branch from claiming to have a book that doesn't exist or hiding a book that should be available. By having this dedicated software outsourced, the library network can scale up, opening more branches and serving more customers without overwhelming any single part of the system.}

²_{Symbolic Capital is an investor in 0G.}

³_{Symbolic Capital is an investor in NuBit.}‍

^{Legal Disclosure: This document, and the information contained herein, has been provided to you by Hyperedge Technology LP and its affiliates (“Symbolic Capital”) solely for informational purposes. This document may not be reproduced or redistributed in whole or in part, in any format, without the express written approval of Symbolic Capital. Neither the information, nor any opinion contained in this document, constitutes an offer to buy or sell, or a solicitation of an offer to buy or sell, any advisory services, securities, futures, options or other financial instruments or to participate in any advisory services or trading strategy. Nothing contained in this document constitutes investment, legal or tax advice or is an endorsement of any of the digital assets or companies mentioned herein. You should make your own investigations and evaluations of the information herein. Any decisions based on information contained in this document are the sole responsibility of the reader. Certain statements in this document reflect Symbolic Capital’s views, estimates, opinions or predictions (which may be based on proprietary models and assumptions, including, in particular, Symbolic Capital’s views on the current and future market for certain digital assets), and there is no guarantee that these views, estimates, opinions or predictions are currently accurate or that they will be ultimately realized. To the extent these assumptions or models are not correct or circumstances change, the actual performance may vary substantially from, and be less than, the estimates included herein. None of Symbolic Capital nor any of its affiliates, shareholders, partners, members, directors, officers, management, employees or representatives makes any representation or warranty, express or implied, as to the accuracy or completeness of any of the information or any other information (whether communicated in written or oral form) transmitted or made available to you. Each of the aforementioned parties expressly disclaims any and all liability relating to or resulting from the use of this information. Certain information contained herein (including financial information) has been obtained from published and non-published sources. Such information has not been independently verified by Symbolic Capital and, Symbolic Capital, does not assume responsibility for the accuracy of such information. Affiliates of Symbolic Capital may have owned or may own investments in some of the digital assets and protocols discussed in this document. Except where otherwise indicated, the information in this document is based on matters as they exist as of the date of preparation and not as of any future date, and will not be updated or otherwise revised to reflect information that subsequently becomes available, or circumstances existing or changes occurring after the date hereof. This document provides links to other websites that we think might be of interest to you. Please note that when you click on one of these links, you may be moving to a provider’s website that is not associated with Symbolic Capital. These linked sites and their providers are not controlled by us, and we are not responsible for the contents or the proper operation of any linked site. The inclusion of any link does not imply our endorsement or our adoption of the statements therein. We encourage you to read the terms of use and privacy statements of these linked sites as their policies may differ from ours. The foregoing does not constitute a “research report” as defined by FINRA Rule 2241 or a “debt research report” as defined by FINRA Rule 2242 and was not prepared by Symbolic Capital Partners LLC. For all inquiries, please email info@symbolic.capital. © Copyright Hyperedge Capital LP 2024. All rights reserved.}

Intro - The Problem with Data Availability

Why We Need DA Protocols

The Solution

How DA Protocols work

Ethereum’s In-House Solution

Below is a summary of what was accomplished in the recent Dencun upgrade, and what remains to be done.

The work already done in EIP-4844 includes:

A new transaction type, which is the exact same format that will need to exist in full sharding
All of the execution layer logic required for full sharding
All of the consensus & execution cross-verification logic required for full sharding
Layer separation between BeaconBlock verification and DAS blobs
Most of the BeaconBlock logic required for full sharding
A self-adjusting independent gas price for blobs

The work that remains to be done to get to full sharding includes:

A low-degree extension of the blob KZG commitment proofs in the consensus layer to allow 2D sampling
An actual implementation of DAS
Proposer-builder separation to avoid requiring individual validators to process 32 MB of data in one slot
Proof of custody or some similar in-protocol requirement for each validator to verify a particular part of the sharded data in each block

Blob Store Contract: this contract provides the store for arbitrary DA blobs. In practice, these “blobs” are sequencing data from rollups, but they can be any data. Consensus is provided around the submission of a blob by NEAR validators.
Light Client: A trustless off-chain light client for NEAR with DA-enabled features, such as KZG commitments, RS erasure coding, and storage connectors. The light client provides easy access to transaction and receipt inclusion proofs within a chunk or full block.
RPC Client: The defacto client for submitting data blobs to NEAR. These allow a client to interact with the blob store.

0G²

NuBit³

Competitive Analysis of DA Layers:

DA Solution	L1 Compatability	Testnet vs. Mainnet	Native Token	Consensus Mechanism	Finality Time	Distinctive Features
Celestia	L1 Agnostic	Mainnet	Yes: $TIA	Celestia Core with modified Tendermint consensus mechanism; PoS	15 sec	Polygon CDK integration L1 Agnostic: Used by VM-agnostic L2s Utilizes fraud-proof scheme instead of a validity-proof model
Avail	Ethereum	Testnet	No	PolkaDot's GRANDPA & BABE; NPoS	20 sec	Gamified incentivized testnet Peer-to-peer (P2P) light client network for backup to uphold network during full node outages
EigenDA	Ethereum	Mainnet	No	DAC-based	Eth's finality time	Use of DACs No DAS Inherits Ethereum validity set & finality time
NearDA	Ethereum	Mainnet	Yes: $NEAR	Nightshade; PoS	3-4 sec (SFFL)	Separated blob store contract, light client, and RPC client
0G	L1/Storage location agnostic	Pre-Testnet Launch	No	Not yet published	TBD	Separate storage system Specific for AI Dapps
NuBit	Bitcoin	Testnet	No	NuBFT: PBFT-based consensus mechanism	Instant	First-to-market DA solution for Bitcoin ecosystem Only Bitcoin compatible for L1 as payment

‍

The Road Ahead - Ethereum vs. The World (of DA Companies)

If full Danksharding is realized, what will happen to the other DA players?

²_{Symbolic Capital is an investor in 0G.}

A Deep Dive into Data Availability: The Promises and Challenges of Scaling Web3

Intro - The Problem with Data Availability

Why We Need DA Protocols

The Solution

How DA Protocols work

Ethereum’s In-House Solution

Other DA Players

0G2

NuBit3

The Road Ahead - Ethereum vs. The World (of DA Companies)

Intro - The Problem with Data Availability

Why We Need DA Protocols

The Solution

How DA Protocols work

Ethereum’s In-House Solution

Other DA Players

0G2

NuBit3

The Road Ahead - Ethereum vs. The World (of DA Companies)

0G²

NuBit³

0G²

NuBit³