Possible futures of the Ethereum protocol, part 4: The Verge
2024 Oct 23
Special thanks to Justin Drake, Hsiao-wei Wang, Guillaume Ballet,
Ignacio, Josh Rudolf, Lev Soukhanov, Ryan Sean Adams and Uma Roy for
feedback and review.
One of the most powerful things about a blockchain is the fact that
anyone can run a node on their computer and verify that the
chain is correct. Even if 95% of the nodes running the chain
consensus (PoW, PoS...) all immediately agreed to change the rules, and
started producing blocks according to the new rules, everyone running a
fully-verifying node would refuse to accept the chain. The stakers who
are not part of such a cabal would automatically converge on,
and continue building, a chain that continues to follow the old rules,
and fully-verifying users would follow that chain.
This is a key difference between blockchains and centralized systems.
However, for this property to hold, running a fully-verifying node needs
to be actually feasible for a critical mass of people. This applies both
to stakers (since if stakers are not verifying the chain, they are not actually contributing to enforcing the protocol rules), and
to regular users. Today, running a node is possible on
a consumer laptop (including the one being used to write this post), but
doing so is difficult. The Verge is about changing this, and
making fully-verifying the chain so computationally affordable
that every mobile wallet, browser wallet, and even smart watch is doing
it by default.
The Verge, 2023 roadmap.
Originally, the "Verge" referred to the idea of moving Ethereum state
storage to Verkle
trees - a tree structure that allows for much more compact proofs,
enabling stateless validation of Ethereum blocks. A
node could verify an Ethereum block without having any of the Ethereum
state (account balances, contract code, storage...) on its hard drive, at
a cost of spending a few hundred kilobytes of proof data and a few
hundred extra milliseconds verifying a proof. Today, the Verge
represents a much larger vision focused on enabling maximally
resource-efficient verification of the Ethereum chain,
which includes not just stateless validation technology, but also
verifying all Ethereum execution with SNARKs.
In addition to the added long-term focus on SNARK-verifying the whole
chain, another new question has to do with whether or not Verkle
trees are even the best technology in the first place. Verkle
trees are vulnerable to quantum computers, and so if we replace the
current KECCAK Merkle
Patricia tree with Verkle trees, we will later have to replace the
trees again. The natural alternative is to skip straight to using a STARK
of Merkle branches in a binary tree.
Historically, this has been considered non-viable due to overhead and
technical complexity. More recently, however, we have seen Polygon prove 1.7
million Poseidon hashes per second on a laptop with circle
STARKs, and proving times for more "conventional" hashes are also
rapidly improving thanks to techniques like GKR.
As a result, over the past year the Verge has become much more
open-ended, and there are several possibilities.
The Verge: key goals
- Stateless clients: fully-verifying clients, and staking nodes,
should not need more than a few GB of storage
- (Longer term) fully verify the chain (consensus and execution) on a
smart watch. Download some data, verify a SNARK, done.
In this chapter
Stateless verification:
Verkle or STARKs
What problem are we trying
to solve?
Today, an Ethereum client needs to store hundreds
of gigabytes of state data in order to verify blocks, and this
amount continues to increase with each passing year. The raw state data
increases by ~30
GB per year, and individual clients have to store some extra data on
top to be able to update the trie efficiently.
This reduces the number of users who can run fully-verifying Ethereum
nodes: even though hard drives large enough to store all Ethereum state
and even history for many years are readily available,
the computers that people buy by default tend to only have a few hundred
gigabytes of storage. The state size also introduces significant friction
into the process of setting up a node for the first time: the node needs
to download the entire state, which can take hours or days. This has all
kinds of knock-on effects. For example, it makes it significantly harder
for a staker to upgrade their staking setup. Technically, it's possible
to do this with no downtime - start a new client, wait for it to sync,
then shut down the old client and transfer the key - but in practice
it's technically complicated.
What is it and how does it work?
Stateless verification is a technology that allows nodes to verify
blocks without having the entire state. Instead, each block comes with a
witness, which includes (i) the values (eg.
code, balances, storage) at the specific locations in the state
that the block will access, and (ii) a cryptographic
proof that those values are correct.
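To make the flow concrete, here is a minimal sketch (in Python) of what a stateless client would do with a block and its witness. The Witness fields and the helpers verify_proof, execute and compute_post_root are hypothetical stand-ins, not actual client APIs.

```python
# A minimal sketch of stateless block verification. Helper functions and the
# Witness fields are hypothetical placeholders, not real client interfaces.
from dataclasses import dataclass

@dataclass
class Witness:
    accessed_values: dict   # state location -> value (code, balance, storage slot...)
    proof: bytes            # proof that those values are consistent with the pre-state root

def verify_block_stateless(pre_state_root, block, witness,
                           verify_proof, execute, compute_post_root):
    # (i) the witnessed values really belong to the state behind pre_state_root
    if not verify_proof(pre_state_root, witness.accessed_values, witness.proof):
        return False
    # (ii) re-execute the block against only the witnessed values
    post_values, touched = execute(block, witness.accessed_values)
    if not touched <= set(witness.accessed_values):
        return False    # execution touched a location the witness did not cover
    # (iii) the resulting state root must match the one claimed in the block header
    return compute_post_root(pre_state_root, post_values, witness.proof) == block.post_state_root
```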
Actually implementing stateless verification requires changing the
Ethereum state tree structure. This is because the current Merkle
Patricia tree is extremely unfriendly to implementing any
cryptographic proof scheme, especially in the worst case. This is true
for both "raw" Merkle branches, and the possibility of "wrapping" the
Merkle branches in a STARK. The key difficulties stem from two
weaknesses of the MPT:
- It's a hexary tree (ie. each node has 16 children). This means that
on average, a proof in a size-N tree has
32 * (16 - 1) * log16(N) = 120 * log2(N)
bytes, or about 3840 bytes in a 2^32-item tree. With a binary tree, you only
need 32 * (2 - 1) * log2(N) = 32 * log2(N)
bytes, or about 1024 bytes.
- The code is not Merkelized. This means that proving any access of
account code requires providing the entire code, which is a maximum of
24,000 bytes.
We can compute a worst-case scenario as follows:
30,000,000 gas / 2,400 ("cold" account read cost) * (5 * 480 + 24,000) = 330,000,000
bytes
The branch cost is slightly decreased (5 * 480 instead of 8 * 480)
because the top parts of branches are repeated
when there are many of them. But even still, this works out to a
completely unrealistic amount of data to download within one slot. If we
try to wrap it in a STARK, we get two problems: (i) KECCAK is relatively
STARK-unfriendly, and (ii) 330 MB of data means we have to prove 5
million calls to the KECCAK round function, which is way too
much to prove on all but the most powerful consumer hardware, even
if we could make STARK-proving KECCAK much more efficient.
If we just replace the hexary tree with a binary tree, and we
additionally Merkelize code, then the worst case becomes roughly
30,000,000 / 2,400 * 32 * (32 - 14 + 8) = 10,400,000
bytes
(the 14 is a subtraction for redundant bits of ~2^14 branches,
and the 8 is the length of a proof going into a leaf in a chunk). Note
that this requires a change in gas costs, to charge for accessing each
individual chunk of code; EIP-4762 does this.
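For reference, the two worst-case figures above can be reproduced with a few lines of arithmetic, using the constants quoted in the text:

```python
# Reproducing the two worst-case witness sizes quoted above.
GAS_LIMIT = 30_000_000
COLD_ACCOUNT_READ = 2_400                       # gas per "cold" account access
MAX_ACCESSES = GAS_LIMIT // COLD_ACCOUNT_READ   # 12,500

# Hexary MPT, code not Merkelized: ~5 levels of non-repeated branch data at
# 480 bytes per level, plus up to 24,000 bytes of contract code per access.
mpt_worst = MAX_ACCESSES * (5 * 480 + 24_000)
print(mpt_worst)       # 330,000,000 bytes (~330 MB)

# Binary tree with Merkelized code: 32-byte hashes, (32 - 14) non-redundant
# levels per branch plus 8 levels down to a code chunk leaf.
binary_worst = MAX_ACCESSES * 32 * (32 - 14 + 8)
print(binary_worst)    # 10,400,000 bytes (~10.4 MB)
```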
10.4 MB is much better, but it's still too much data for many nodes to
download within one slot. And so we need to introduce some more powerful
technology. For this, there are two leading solutions: Verkle
trees, and STARKed binary hash trees.
Verkle trees
Verkle trees use elliptic curve-based vector commitments to make much
shorter proofs. The key unlock is that the piece of the proof
corresponding to each parent-child relationship is only 32 bytes,
regardless of the width of the tree. The only limit to the
width of the tree is that if the tree gets too wide, proofs become
computationally inefficient. The implementation proposed for Ethereum
has a width of 256.
The size of a single branch in a proof thus becomes
32 * log256(N) = 4 * log2(N)
bytes. The theoretical max
proof size thus becomes roughly
30,000,000 / 2,400 * 32 * (32 - 14 + 8) / 8 = 1,300,000
bytes (the math works out slightly differently in practice because of
uneven distribution of state chunks, but this is fine as a first
approximation).
As an additional caveat, note that in all of the above examples, this
"worst case" is not quite a worst case: an even worse case is
when an attacker intentionally "mines" two addresses to have a long
common prefix in the tree and reads from one of them, which can extend
the worst-case branch length by another ~2x. But even with this
caveat, Verkle trees get us to ~2.6 MB worst-case proofs, which roughly
matches today's worst-case calldata.
We also take advantage of this caveat to do another thing: we make it
very cheap to access "adjacent" storage: either many chunks of code of
the same contract, or adjacent storage slots. EIP-4762 provides a
definition of adjacency, and charges only 200 gas for adjacent access.
With adjacent accesses, the worst-case proof size becomes
30,000,000 / 200 * 32 = 4,800,000
bytes, which is still
roughly within tolerances. If we want this value decreased for safety,
we can increase the adjacent access costs slightly.
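The same back-of-envelope style reproduces the Verkle figures in this section (~1.3 MB, ~2.6 MB with mined prefixes, and 4.8 MB under adjacent-access pricing):

```python
# The same style of estimate for Verkle trees: each parent-child link costs
# 32 bytes, but each level of the width-256 tree covers 8 bits of the key.
MAX_ACCESSES = 30_000_000 // 2_400          # 12,500 cold accesses

verkle_worst = MAX_ACCESSES * 32 * (32 - 14 + 8) // 8
print(verkle_worst)                          # 1,300,000 bytes (~1.3 MB)
print(verkle_worst * 2)                      # ~2.6 MB with "mined" address prefixes

# With EIP-4762's 200-gas adjacent accesses, the bound is accesses * 32 bytes:
print(30_000_000 // 200 * 32)                # 4,800,000 bytes
```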
STARKed binary hash trees
The technology here is very self-explanatory: you do a binary tree,
take the max-10.4-MB proof that you need to prove the values in a block,
and replace the proof with a STARK of the proof. This gets us to the
point where the proof itself consists of only the data being proven,
plus ~100-300 kB fixed overhead from the actual STARK.
The main challenge here is prover time. We can make basically the
same calculations as above, except instead of counting bytes, we count
hashes. A 10.4 MB block means 330,000 hashes. If we add in the
possibility of an attacker "mining" addresses with a long common prefix
in the tree, the real worst case becomes around 660,000 hashes.
So if we can prove ~200,000 hashes per second, we're fine.
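To see what prover speed this implies, we can count the hashes in the worst-case witness and try a few illustrative proving rates (the rates are the ones quoted in this section, not benchmarks of any particular prover):

```python
# Hash counting for the STARKed-binary-tree approach, and how long proving the
# worst case would take at a few illustrative prover speeds.
worst_case_bytes = 10_400_000
hashes = worst_case_bytes // 32          # ~325,000 hashes
hashes_worst = 2 * hashes                # ~650,000 with "mined" address prefixes

for rate in (10_000, 30_000, 200_000, 1_700_000):   # hashes proven per second
    print(f"{rate:>9} h/s -> {hashes_worst / rate:7.1f} s for the worst case")
# ~200k h/s keeps the worst case to a few seconds; today's ~10-30k h/s for
# conservative hash functions does not.
```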
These numbers have already been reached on a consumer laptop with the
Poseidon hash function,
which has been explicitly designed for STARK-friendliness. However,
Poseidon is relatively immature, and so many do not yet trust it for
security. There are thus two realistic paths forward:
- Do lots and lots of security analysis on Poseidon quickly, and get
comfortable enough with it to deploy it at L1
- Use a more "conservative" hash function, such as SHA256 or
BLAKE
Starkware's circle STARK provers at the time of this writing can only
prove ~10-30k hashes per second on a consumer laptop if they are proving
conservative hash functions. However, STARK technology is improving
quickly. Even today, GKR-based techniques show promise in potentially
increasing this to the ~100-200k range.
Use cases of witnesses other than verifying blocks
In addition to verifying blocks, there are three other key use cases
for more efficient stateless validation:
- Mempools: when a transaction gets broadcasted,
nodes in the p2p network need to verify that the transaction is valid
before re-broadcasting it. Today, validation involves verifying the
signature, but also checking that the balance is sufficient and the
nonce is correct. In the future (eg. with native account abstraction,
such as EIP-7701),
this may involve running some EVM code, which does some state accesses.
If nodes are stateless, the transaction will need to come with a proof
proving the state objects.
- Inclusion lists: this is a proposed
feature which allows (potentially small and unsophisticated)
proof-of-stake validators to force the next block to contain a
transaction, regardless of the (potentially large and sophisticated)
block builder's wishes. This would reduce the ability of powerful actors
to manipulate the blockchain by delaying transactions. However, this
requires validators to have a way to verify the validity of transactions
in the inclusion list.
- Light clients: if we want users accessing the chain
through wallets (eg. Metamask, Rainbow, Rabby...) to do so without
trusting centralized actors, they need to run light clients (eg. Helios). The core Helios
module gives the user verified state roots. However, for a fully
trustless experience, the user needs proofs for each individual RPC call
they make (eg. for an eth_call request,
the user would need a proof of all the state accessed during the
call).
One thing that all of these use cases have in common is that they
require a fairly large number of proofs, but each proof is small. For
this reason, STARK proofs do not actually make sense for them; instead,
it's most realistic to just use Merkle branches directly. Another
advantage of Merkle branches is that they are updateable: given a proof
of a state object X, rooted in block B, if you receive a child block B2
with its witness, you can update the proof to make it rooted in block
B2. Verkle proofs are also natively updateable.
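Here is a small self-contained toy (plain Python, SHA256, an 8-leaf binary tree) illustrating both points: verifying a Merkle branch, and refreshing it after a different leaf changes in a child block. For brevity the toy rebuilds the tree to fetch the one changed sibling; a real client would read that sibling from B2's witness instead.

```python
import hashlib

def h(x: bytes, y: bytes) -> bytes:
    return hashlib.sha256(x + y).digest()

def build_layers(leaves):
    # Full binary Merkle tree as a list of layers, leaves first.
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers

def get_branch(layers, index):
    # Sibling hashes from leaf level up to (but not including) the root.
    branch = []
    for layer in layers[:-1]:
        branch.append(layer[index ^ 1])
        index //= 2
    return branch

def verify_branch(leaf, index, branch, root):
    node = leaf
    for sibling in branch:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root

leaves = [i.to_bytes(32, "big") for i in range(8)]
layers = build_layers(leaves)
root = layers[-1][0]
branch_for_3 = get_branch(layers, 3)
assert verify_branch(leaves[3], 3, branch_for_3, root)

leaves[6] = b"\x42" * 32                      # another leaf changes in child block B2
new_layers = build_layers(leaves)             # only to fetch the changed sibling below
new_root = new_layers[-1][0]
# Only the topmost sibling of index 3 (the root of the subtree containing 6) changed:
refreshed = branch_for_3[:-1] + [new_layers[2][1]]
assert verify_branch(leaves[3], 3, refreshed, new_root)
```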
What are some links to existing research?
What is left to do, and what are the tradeoffs?
The main remaining work to do is:
- More analysis on the consequences of EIP-4762
(statelessness gas cost changes)
- More work finalizing and testing the transition procedure, which is
a large part of the complexity of any statelessness EIP
- More security analysis of Poseidon, Ajtai and other "STARK-friendly"
hash functions
- More development of ultra-efficient STARK protocols for
"conservative" (or "traditional") hash functions, eg. based on ideas
from Binius
or GKR.
We also will soon have a decision point of which of three options to
take: (i) Verkle trees, (ii) STARK-friendly
hash functions, and (iii) conservative hash
functions. Their properties can be approximately summarized in
this table:
| | Proof size | Security assumptions | Prover time |
| Verkle trees | Data plus ~100-2,000 kB | Elliptic curve (not quantum-resistant) | < 1s |
| STARK over conservative hash functions (eg. SHA256, BLAKE) | Data plus ~100-300 kB | Conservative hash functions | > 10s |
| STARK over aggressive hash functions (Poseidon, Ajtai) | Data plus ~100-300 kB | Relatively new and less-tested hash functions | 1-2s |
In addition to these "headline numbers", there are a few other
important considerations:
- Today, the Verkle tree code is quite mature. Using
anything but Verkle would realistically delay deployment, likely by one
hard fork. This can be okay, especially if we need the extra time anyway
to work on hash function analysis or prover implementations, and if we
have other important features we want to get included in Ethereum
earlier.
- Updating the state root is faster with hashes than
with Verkle trees. This means that hash-based approaches can lead to
lower sync time for full nodes.
- Verkle trees have interesting witness update
properties - Verkle tree witnesses are updateable. This
property is useful for mempools, inclusion lists, and other use cases,
and it also can potentially help with making implementations more
efficient: if a state object is updated, you can update the witness on
the second-to-last level without even reading the last level.
- Verkle trees are more difficult to SNARK-prove. If
we want to reduce the proof size all the way to a few kilobytes, Verkle
proofs introduce some difficulty. This is because the verification of a
Verkle proof introduces a large number of 256-bit operations, which
requires the proof system to either have a lot of overhead, or itself
have a custom internal construction with a 256-bit part for the Verkle
proof. This is not a problem for statelessness itself, but introduces
more difficulties later.
If we want the Verkle witness updateability properties in a way
that's quantum-safe, and reasonably efficient, one other possible path
is lattice-based Merkle
trees.
If proof systems end up being not efficient enough in the worst case,
one other "rabbit out of a hat" that we could use to compensate for such
an inadequacy is multidimensional
gas: have separate gas limits for (i) calldata, (ii) computation,
(iii) state accesses, and possibly other distinct resources.
Multidimensional gas adds complexity, but in exchange it much more
tightly bounds the ratio between average case and worst case. With
multidimensional gas, the theoretical max number of branches to prove
could plausibly decrease from 30,000,000 / 2400 = 12,500
to
eg. 3000. This would make BLAKE3 (just barely) sufficient even
today, with no further prover improvements.
Multidimensional gas allows the resource limits of a
block to much more closely replicate the resource limits of the
underlying hardware.
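As a toy illustration of the mechanism (the specific limits below are made-up assumptions, not a proposal), multidimensional gas simply means that block building checks several independent budgets instead of one:

```python
# Toy multidimensional gas accounting: each resource has its own limit, so a
# block cannot spend its entire budget on state accesses alone.
LIMITS = {
    "computation": 30_000_000,
    "calldata": 3_000_000,
    "state_accesses": 3_000,     # counted in accesses rather than gas
}

def try_include(block_usage, tx_usage):
    # Include the transaction only if every per-resource limit still holds.
    for resource, limit in LIMITS.items():
        if block_usage.get(resource, 0) + tx_usage.get(resource, 0) > limit:
            return False
    for resource, amount in tx_usage.items():
        block_usage[resource] = block_usage.get(resource, 0) + amount
    return True

usage = {}
print(try_include(usage, {"computation": 1_000_000, "state_accesses": 200}))  # True
print(try_include(usage, {"state_accesses": 2_900}))   # False: state-access budget exceeded
```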
Another "rabbit out of a hat" is this
proposal to delay state root computation until the slot
after a block. This would give us a full 12 seconds to
compute the state root, meaning that only ~60,000 hashes/sec proving
time is sufficient even in the most extreme cases, again putting us in
the range of BLAKE3 being just barely sufficient.
This approach has the downside that it would increase light client
latency by a slot, though there are more clever versions of the
technique that reduce this delay to just the proof generation
latency. For example, the proof could be broadcasted across the network
as soon as any node generates it, instead of waiting for the next
block.
How does it interact with other parts of the roadmap?
Solving statelessness greatly increases the ease of solo staking.
This becomes much more valuable if technologies that reduce the minimum
balance of solo staking, such as Orbit SSF or application-layer
strategies like squad staking,
become available.
Multidimensional gas becomes easier if EOF is also introduced. This
is because a key
complexity of multidimensional gas for execution is handling
sub-calls which don't pass along the parent call's full gas, and EOF
makes this problem trivial by simply making such sub-calls illegal (and
native account abstraction would provide an in-protocol alternative for
the current primary use case of partial-gas sub-calls).
One other important synergy is between stateless validation and
history expiry. Today, clients have to store nearly a
terabyte of history data; this data is several times larger than the
state. Even if clients are stateless, the dream of
nearly-storage-free clients will not be realized unless we can relieve
clients of the responsibility to store history as well. The
first step in this regard is EIP-4444, which also
implies storing historical data in torrents or the Portal network.
Validity proofs of EVM execution
What problem are we trying to solve?
The long-term goal for Ethereum block verification is clear: you
should be able to verify an Ethereum block by (i) downloading
the block, or perhaps even only small parts of the block with
data availability sampling, and (ii) verifying a small
proof that the block is valid. This would be an extremely
low-resource operation, and could be done on a mobile client, inside a
browser wallet, or even (without the data availability part) in another
chain.
Getting to this point requires having SNARK or STARK proofs of (i)
the consensus layer (ie. the proof of stake), and (ii)
the execution layer (ie. the EVM). The former is itself
a challenge, and it should be addressed during the process of making
further ongoing improvements to the consensus layer (eg. for single slot
finality). The latter requires proofs of EVM execution.
What is it and how does it work?
Formally, in the Ethereum specifications, the EVM is defined as a state
transition function: you have some pre-state S, a block B, and you are
computing a post-state S' = STF(S, B). If a user is using a light client,
they do not have S and S', or even B, in their entirety; instead, they have
a pre-state root R, a post-state root R', and a block hash H. The full
statement that needs to be proven is approximately:
- Public inputs: pre-state root R, post-state root R', block hash H
- Private inputs: block body B, the objects in the state accessed by the block Q, the same objects after executing the block Q', the state proof (eg. Merkle branches) P
- Claim 1: P is a valid proof that Q contains some portion of the state represented by R
- Claim 2: if you run STF on Q, (i) the execution only accesses objects inside of Q, (ii) the block is valid, and (iii) the outcome is Q'
- Claim 3: if you recompute the new state root, using information from Q' and P, you get R'
If this exists, you can have a light client that fully verifies the
Ethereum EVM execution. This allows clients to be quite low-resource
already. To get to true fully-verifying Ethereum clients, you
also need to do the same for the consensus side.
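Written out as ordinary code rather than as a circuit, the statement looks roughly like the sketch below; verify_state_proof, block_hash, stf and recompute_root are placeholders for the real primitives, and a SNARK would prove this predicate without revealing the private inputs:

```python
# A sketch of the ZK-EVM statement as a plain Python check over placeholder
# primitives. A real proof system proves this predicate succinctly.
def zk_evm_statement(
    R, R_prime, H,                 # public inputs: pre/post state roots, block hash
    B, Q, Q_prime, P,              # private inputs: block body, accessed objects, results, state proof
    verify_state_proof, block_hash, stf, recompute_root,   # placeholder primitives
):
    # Implicit claim: B really is the block committed to by H
    if block_hash(B) != H:
        return False
    # Claim 1: P proves that Q is part of the state with root R
    if not verify_state_proof(R, Q, P):
        return False
    # Claim 2: running the STF on Q touches only Q, the block is valid, and the
    # resulting objects are Q'. Here stf returns (valid, touched_keys, results).
    valid, touched, result = stf(Q, B)
    if not valid or not touched <= set(Q.keys()) or result != Q_prime:
        return False
    # Claim 3: folding Q' back into the tree along the paths in P yields root R'
    return recompute_root(R, Q_prime, P) == R_prime
```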
Implementations of validity proofs for EVM computation already exist,
and are being used heavily by layer 2s. However, there is still a lot to
do to make EVM validity proofs viable for L1.
What are some links to existing research?
What is left to do, and what are the tradeoffs?
Today, validity proofs for the EVM are inadequate in two dimensions:
security and prover time.
A secure validity proof involves having an assurance that the SNARK
actually verifies the EVM computation, and does not have a bug in it.
The two leading techniques for increasing security are multi-provers and formal
verification. Multi-provers means having multiple
independently-written validity proof implementations, much like there
are multiple clients, and having clients accept a block if it's proven
by a sufficiently large subset of these implementations. Formal
verification involves using tools often used to prove mathematical
theorems, such as Lean4, to prove that a
validity proof only accepts inputs that are correct executions of the
underlying EVM specification (eg. the EVM K
Semantics or the Ethereum execution
layer specifications (EELS) written in python).
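The multi-prover acceptance rule itself is simple; a sketch, with the threshold and the set of provers as illustrative assumptions:

```python
# A toy version of the multi-prover rule: a block is accepted only if proofs
# from enough independently written proof systems check out.
def accept_block(block_hash, proofs, verifiers, threshold=2):
    # proofs: prover name -> proof bytes; verifiers: prover name -> verify function
    valid = sum(
        1 for name, proof in proofs.items()
        if name in verifiers and verifiers[name](block_hash, proof)
    )
    return valid >= threshold
```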
Sufficiently fast prover time means that any Ethereum block can be
proven in less than ~4 seconds. Today, we are still far from this,
though we are much closer than was imagined possible even two years ago.
To get to this goal, we need to advance in three directions:
Parallelization - the fastest EVM prover today
can prove an average Ethereum block in ~15 seconds. It does
this by parallelizing between several hundred GPUs, and then aggregating
their work together at the end. Theoretically, we know exactly how to
make an EVM prover that can prove computation in O(log(N)) time: have
one GPU do each step, and then do an "aggregation tree":
There are challenges in implementing this. To work even in worst-case
situations, where a very large transaction takes up an entire block, the
splitting of the computation cannot be per-tx; it must be per-opcode (of
the EVM or of an underlying VM like RISC-V). A key implementation
challenge that makes this not completely trivial is the need to
make sure that the "memory" of the VM is consistent between different
parts of the proof. However, if we can make this kind of
recursive proof, then we know that at least the prover latency problem
is solved, even without any improvements on any of the other
axes.
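A sketch of the aggregation-tree shape, with prove_chunk and merge_proofs as placeholders for a real recursive proof system (the hard part, keeping the VM's memory consistent across chunk boundaries, is exactly what this sketch does not show):

```python
# Prove each chunk of the execution trace in parallel, then recursively merge
# adjacent proofs until one proof remains: O(log N) depth of sequential work.
from concurrent.futures import ThreadPoolExecutor

def prove_block(trace_chunks, prove_chunk, merge_proofs):
    with ThreadPoolExecutor() as pool:
        layer = list(pool.map(prove_chunk, trace_chunks))   # leaf proofs, in parallel
    while len(layer) > 1:
        pairs = [(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        leftover = [layer[-1]] if len(layer) % 2 else []
        with ThreadPoolExecutor() as pool:
            layer = list(pool.map(lambda p: merge_proofs(*p), pairs)) + leftover
    return layer[0]
```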
Proof system optimization - new proof systems
like Orion, Binius,
GKR,
and many more, are likely to lead to yet another large reduction in
prover time for general-purpose computation.
EVM gas cost and other changes - many things in the
EVM could be optimized to be much more prover-friendly, especially in
the worst case. Being able to prove an average Ethereum
block in 4 seconds is not enough, if an attacker can construct a block
that will clog up provers' time for ten minutes. The EVM changes
required can largely be broken down into two categories:
- Gas cost changes - if an operation takes a long
time to prove, it should have a high gas cost, even if it is relatively
fast to compute. EIP-7667 is one
proposed EIP to handle the worst offenders in this regard: it
significantly increases the gas costs of (conventional) hash functions
exposed as relatively cheap opcodes and precompiles. To compensate for
these gas cost increases, we can reduce the gas cost of EVM opcodes that
are relatively cheap to prove, so as to keep average throughput the
same.
- Data structure replacements - in addition to
replacing the state tree with a more STARK-friendly alternative, we need
to replace the transaction list, receipt tree, and other structures that
are expensive to prove. Etan Kissling's EIPs that move transaction and
receipt structures to SSZ ([1] [2] [3]) are one step in
that direction.
In addition to this, the two "rabbits out of a hat" mentioned in the
previous section (multidimensional gas, and
delayed state root) can also help here. However, it's
worth noting that unlike stateless validation, where using either rabbit
out of a hat means that we have sufficient technology to do what we need
today, even with these techniques full ZK-EVM validation will
take more work - it will just take less work.
One thing that was not mentioned above is prover
hardware: using GPUs, FPGAs and ASICs to generate proofs
faster. Fabric
Cryptography, Cysic and Accseal are three companies pushing
ahead on this. This will be extremely valuable for layer 2s, but it's
unlikely to become a decisive consideration for layer 1, because there
is a strong desire to keep layer 1 highly decentralized, which implies
that proof generation must be within the capabilities of a reasonably
large subset of Ethereum users, and should not be bottlenecked by a
single company's hardware. Layer 2s can make more aggressive
tradeoffs.
More work remains to be done in each of these areas:
- Parallelized proving requires proof systems where different parts of
the proof can "share a memory" (eg. lookup tables). We know techniques
for doing this, but they need to be implemented.
- We need more analysis to figure out the ideal set of gas cost
changes to minimize worst-case prover time.
- We need more work on proving systems.
Likely tradeoffs here include:
- Security vs prover time: it may be possible to cut
down prover time using aggressive choices for hash functions, proof
systems with more complexity or more aggressive security assumptions, or
other design choices.
- Decentralization vs prover time: the community
needs to agree on what is the "spec" for prover hardware that it is
targeting. Is it okay to require provers to be large-scale entities? Do
we want a high-end consumer laptop to be able to prove an Ethereum block
in 4 seconds? Something in between?
- Degree of breaking backwards compatibility:
insufficiencies in other areas can be compensated for by making much
more aggressive gas cost changes, but this is more likely to
disproportionately increase the cost of some applications over others,
and force developers to rewrite and redeploy code in order to remain
economically viable. Similarly, the "rabbits out of a hat" have their
own complexity and downsides.
How does it interact with other parts of the roadmap?
The core tech needed to make EVM validity proofs at layer 1 happen is
heavily shared with two other areas:
- Validity proofs at layer 2 (ie. "ZK rollups")
- The "STARK a binary hash proof" approach to statelessness
A successful implementation of validity proofs at layer 1 allows for
the ultimate in easy solo staking: even the weakest computer (including
phone or smart watch) would be able to stake. This further increases the
value of addressing the other limitations to solo staking (eg. the 32
ETH minimum).
Additionally, EVM validity proofs at L1 can enable
considerable L1 gas limit increases.
Validity proofs of consensus
What problem are we trying to solve?
If we want it to be possible to fully verify an Ethereum block with a
SNARK, then the EVM execution is not the only part we need to prove. We
also need to prove the consensus: the part of the system that
handles deposits, withdrawals, signatures, validator balance updates,
and other elements of the proof-of-stake part of Ethereum.
The consensus is considerably simpler than the EVM, but it has the
challenge that we don't have layer 2 EVM rollups as a reason why most of
the work is going to be done anyway. Hence, any implementation of
proving Ethereum consensus would need to be done "from scratch",
although the proof systems themselves are shared work that can be built
on top of.
What is it and how does it work?
The beacon chain is defined as a state
transition function, just like the EVM. The state transition
function is dominated by three things:
- ECADDs (for verifying BLS signatures)
- Pairings (for verifying BLS signatures)
- SHA256 hashes (for reading and updating the state)
In each block, we need to prove 1-16 BLS12-381
ECADDs per validator (potentially more than one,
because signatures can get included in multiple aggregates). This can be
compensated for by subset precomputation techniques, so altogether we
can say that it's one BLS12-381 ECADD per validator. Today, there are
~30,000 validators signing in each slot. In the future, with single slot
finality, this could change in either direction (see the
exposition here): if we take the "brute force" route, this could
increase to 1 million validators per slot. Meanwhile, with Orbit SSF, it
would stay at 32,768, or even decrease to 8,192.
How BLS aggregation works. Verifying the aggregate
signature only requires an ECADD per participant, instead of an ECMUL.
But 30,000 ECADDs is still a lot to prove.
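To see where the one-ECADD-per-validator count comes from, here is a toy sketch that uses integer addition modulo a large prime as a stand-in for real BLS12-381 G1 point addition; the final verification also needs a couple of pairings, which are far more expensive but few in number:

```python
# Toy stand-in: an additive group of integers mod a large prime instead of a
# real BLS12-381 group, just to show that aggregating N public keys costs
# N - 1 group additions (the "ECADD per validator" being proven).
P = 2**255 - 19          # a large prime, placeholder for the real curve group

def ec_add(a, b):        # stand-in for a BLS12-381 G1 point addition
    return (a + b) % P

def aggregate_pubkeys(pubkeys):
    agg, additions = pubkeys[0], 0
    for pk in pubkeys[1:]:
        agg = ec_add(agg, pk)
        additions += 1
    return agg, additions

pubkeys = list(range(1, 30_001))          # ~30,000 validators signing in a slot
_, additions = aggregate_pubkeys(pubkeys)
print(additions)                          # 29,999 additions must be proven
```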
For pairings, there is a current maximum of 128 attestations per
slot, implying the need to verify 128 pairings. With EIP-7549 and further
changes, this can plausibly decrease to 16 per slot. Pairings are few in
number, but they are extremely expensive: each one takes thousands of
times longer to run (or prove) than an ECADD.
A major challenge with proving BLS12-381 operations is that there is
no convenient curve whose curve order equals the BLS12-381 field size,
which adds considerable overhead to any proving system. The Verkle trees
proposed for Ethereum, on the other hand, were built with the Bandersnatch curve,
which makes BLS12-381 itself the natural curve to use in a SNARK system
to prove a Verkle branch. A fairly naive implementation can prove ~100
G1 additions per second; clever techniques like GKR would almost
certainly be required to make proving fast enough.
For SHA256 hashes, today the worst case is the epoch
transition block, where the entire validator short-balance tree, and a
significant number of validator balances, get updated. The validator
short-balance tree has one byte per validator, so ~1 MB of data gets
re-hashed. This corresponds to 32,768 SHA256 calls. If a thousand
validators' balances fall above or below a threshold that requires the
effective balance in the validator record to be updated, that
corresponds to a thousand Merkle branches, so perhaps another ten
thousand hashes. The shuffling
mechanism requires 90 bits per validator (so, 11 MB of data), but
this can be computed at any time over the course of an epoch. With
single-slot finality, these numbers may again increase or decrease
depending on the details. Shuffling becomes unnecessary, though Orbit
may bring back the need for some degree of it.
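A rough rendering of these counts, with the validator count and tree depth as illustrative assumptions:

```python
# Rough rendering of the epoch-transition costs described above.
VALIDATORS = 2**20                         # ~1 million validator records (assumption)

# Short-balance tree: one byte per validator, hashed in 32-byte chunks, so a
# Merkle tree over VALIDATORS // 32 leaves and roughly that many SHA256 calls.
print(VALIDATORS // 32)                    # 32,768 hashes over ~1 MB of data

# ~1,000 effective-balance updates, each re-hashing a Merkle branch of depth
# ~20; shared branch tops bring this toward the "ten thousand" quoted above.
print(1_000 * 20)                          # ~20,000 hashes naively

# Shuffling input: 90 bits per validator.
print(VALIDATORS * 90 / 8 / 2**20, "MiB")  # ~11 MiB of data per epoch
```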
Another challenge is the need to read all validator
state, including public keys, in order to verify a block.
Reading the public keys alone takes 48 million bytes for 1 million
validators, together with Merkle branches. This requires millions of
hashes per epoch. If we had to prove the proof-of-stake validation
today, a realistic approach would be some form of incrementally
verifiable computation: store a separate data structure inside the proof
system that is optimized for efficient lookups, and prove updates to
this structure.
To summarize, there are a lot of challenges.
Fixing these challenges most efficiently may well require a deep
redesign of the beacon chain, which could happen at the same time as a
switch to single-slot finality. Features of this redesign could
include:
- Hash function change: today, the "full" SHA256 hash
function gets used, so due to padding each call corresponds to two
underlying compression function calls. At the very least, we can get a
2x gain by switching to the SHA256 compression function. If we switch to
Poseidon, we can get a potentially ~100x gain, which could solve all of
our problems (at least for hashes) completely: at 1.7 million hashes (54
MB) per second, even a million validator records can be "read" into a
proof in a few seconds.
- If Orbit, store shuffled validator records
directly: if you choose some number of validators (eg. 8,192 or
32,768) to be the committee for a given slot, put them directly into the
state beside each other, so that the minimum amount of hashing is needed
to read all validator pubkeys into a proof. This would also allow all
balance updates to be done efficiently.
- Signature aggregation: any high-performance
signature aggregation scheme will realistically involve some kind of
recursive-proving, where intermediate proofs of subsets of signatures
will get made by various nodes in the network. This naturally splits the
load of proving across many nodes in the network, making the work of the
"final prover" much smaller.
- Other signature schemes: For a Lamport+Merkle
signature, we need 256 + 32 hashes to verify a signature; multiplying by
32,768 signers gives 9,437,184 hashes. Optimizations to the signature
scheme can improve this further by a small constant factor. If we use
Poseidon, this is within range to prove within a single slot.
Realistically, though, this would be made much faster with recursive
aggregation schemes.
What are some links to existing research?
What is left to do, and what are the tradeoffs?
Realistically, it will take years before we have a validity proof of
the Ethereum consensus. This is roughly the same timeline that we need
to implement single slot finality, Orbit, changes to the signature
algorithm, and potentially security analysis needed to become
sufficiently confident to use "aggressive" hash functions like Poseidon.
Hence, it makes the most sense to work on these other problems, and
while doing that work keep STARK-friendliness in mind.
The main tradeoff may well be in order of operations, between a more
incremental approach to reforming the Ethereum consensus layer and a
more radical "many changes at once" approach. For the EVM, an
incremental approach makes sense, because it minimizes disruption to
backwards compatibility. For the consensus layer, backwards
compatibility concerns are smaller, and there are benefits to a more
"holistic" re-think in various details of how the beacon chain is
constructed, to best optimize for SNARK-friendliness.
How does it interact with other parts of the roadmap?
STARK-friendliness needs to be a primary concern in long-term
redesigns of the Ethereum proof of stake consensus, most notably
single-slot finality, Orbit, changes to the signature scheme, and
signature aggregation.