Will OP+ZK Hybrid Rollup become the ultimate future of Ethereum scalability?

I’m not going to explain the inner workings of ZK and Optimistic Rollups in detail in this article; if I spent time covering the fundamentals of Rollups, it would run far too long. So this article assumes you already have some understanding of these technologies. You don’t need to be an expert, but you should at least know what ZK and Optimistic Rollups are and roughly how they work. Anyway, enjoy the read.

Let’s start with Optimistic Rollup

The system that combines ZK and Optimistic Rollups was initially based on the Optimistic Rollup architecture of Optimism’s Bedrock. Bedrock is designed to be maximally compatible with Ethereum (“EVM-equivalent”) by running an execution client that is almost identical to an ordinary Ethereum client. Bedrock takes advantage of Ethereum’s consensus/execution client separation model, significantly reducing its differences from the EVM (some changes are unavoidable in this process, but they are manageable).

Like all good Rollups, Optimism derives block/transaction data from Ethereum, orders it deterministically in the consensus client, and feeds it to the L2 execution client for execution. This architecture solves the first half of the “ideal Rollup” puzzle and gives us an EVM-equivalent L2.
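
As a rough sketch of that pipeline (with hypothetical types and function names; the real Bedrock rollup node is considerably more involved):

```go
// A minimal sketch of rollup derivation, assuming hypothetical types;
// the real Bedrock rollup node is considerably more involved.
package rollup

// L1Block carries the rollup batches that were posted to Ethereum.
type L1Block struct {
	Number  uint64
	Batches [][]byte // transaction batches posted as L1 data
}

// ExecutionClient is the (nearly unmodified) L2 execution engine.
type ExecutionClient interface {
	// Execute applies an ordered list of transactions and returns
	// the resulting L2 state root.
	Execute(txs [][]byte) ([32]byte, error)
}

// DeriveL2 walks L1 blocks in order and executes each batch. Because
// every node applies the same deterministic rules to the same L1
// data, every node derives exactly the same L2 chain.
func DeriveL2(l1Blocks []L1Block, exec ExecutionClient) ([][32]byte, error) {
	var roots [][32]byte
	for _, b := range l1Blocks {
		for _, batch := range b.Batches {
			root, err := exec.Execute(decodeBatch(batch))
			if err != nil {
				return nil, err
			}
			roots = append(roots, root)
		}
	}
	return roots, nil
}

// decodeBatch is elided here; real batches are compressed and framed
// across multiple L1 transactions.
func decodeBatch(raw []byte) [][]byte { return [][]byte{raw} }
```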

Of course, the problem we still need to solve is telling Ethereum what is happening inside Optimism in a verifiable way. Without this, smart contracts on Ethereum cannot make decisions based on Optimism’s state, which would mean users could deposit assets into Optimism but never withdraw them. While a one-way Rollup is useful in some cases, a two-way Rollup is more useful in most.

We inform Ethereum of the Rollup’s state by providing some form of commitment to that state, along with a proof that the commitment is correct. In other words, we are proving that the “Rollup program” was executed correctly. The only substantive difference between ZK and Optimistic Rollups is the form of this proof. In a ZK Rollup, you must provide an explicit zero-knowledge proof that the program was executed correctly. In an Optimistic Rollup, you assert the commitment without providing explicit evidence; other users can then challenge your assertion and force you into a back-and-forth “game” to determine who is ultimately right.
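
One way to see how small this difference really is: both systems post the same commitment, and only the acceptance rule differs. A minimal sketch, with illustrative names that are not Optimism’s actual contracts or APIs:

```go
// A minimal sketch of the two acceptance rules, with illustrative
// names; neither function reflects Optimism's actual contracts.
package rollup

import "time"

// StateCommitment is a claim about the L2 state at some block,
// e.g. a state root posted to L1.
type StateCommitment struct {
	BlockNumber uint64
	StateRoot   [32]byte
	PostedAt    time.Time
}

// ValidityProof is an opaque zero-knowledge proof blob.
type ValidityProof []byte

// AcceptZK accepts a commitment only if an explicit validity proof
// checks out against it.
func AcceptZK(c StateCommitment, proof ValidityProof,
	verify func(StateCommitment, ValidityProof) bool) bool {
	return verify(c, proof)
}

// AcceptOptimistic accepts a commitment by default, rejecting it only
// if someone successfully challenged it inside the dispute window.
func AcceptOptimistic(c StateCommitment, challenged bool,
	window time.Duration, now time.Time) bool {
	return !challenged && now.Sub(c.PostedAt) >= window
}
```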

I’m not going to go into detail on the challenge game of Optimistic Rollups here. What’s worth noting is that the state of the art is to compile your program (in Optimism’s case, the Geth EVM plus a few parts around the edges) down to a simple machine architecture like MIPS. We do this because we need an on-chain interpreter for the program, and building a MIPS interpreter on-chain is much easier than building an EVM interpreter. The EVM is also a moving target (it changes with every upgrade fork), and it doesn’t fully encompass the program we want to prove (there’s some non-EVM stuff in there too).
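
To get a feel for why a simple machine is so much easier to interpret on-chain, here is a toy single-step MIPS interpreter covering just three instructions (purely illustrative; not Optimism’s actual Cannon implementation, though Cannon follows the same shape):

```go
// A toy one-instruction-at-a-time MIPS interpreter, covering just
// ADDU, ADDIU, and LW.
package mips

// State is the full machine state the interpreter needs: registers,
// program counter, and memory. Committing to this state (e.g. via a
// Merkle root) is what the challenge game bisects over.
type State struct {
	PC   uint32
	Regs [32]uint32
	Mem  map[uint32]uint32
}

// Step executes exactly one instruction. The entire dispute
// ultimately reduces to running one such step on-chain.
func Step(s *State) {
	insn := s.Mem[s.PC]
	op := insn >> 26
	rs := (insn >> 21) & 0x1f
	rt := (insn >> 16) & 0x1f
	rd := (insn >> 11) & 0x1f

	switch op {
	case 0x00: // R-type; handle ADDU (funct 0x21)
		if insn&0x3f == 0x21 {
			s.Regs[rd] = s.Regs[rs] + s.Regs[rt]
		}
	case 0x09: // ADDIU: add sign-extended immediate
		imm := uint32(int32(int16(insn & 0xffff)))
		s.Regs[rt] = s.Regs[rs] + imm
	case 0x23: // LW: load word from memory
		imm := uint32(int32(int16(insn & 0xffff)))
		s.Regs[rt] = s.Mem[s.Regs[rs]+imm]
	}
	s.Regs[0] = 0 // register $zero is hardwired to zero
	s.PC += 4
}
```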

Once you’ve built an on-chain interpreter for your simple machine architecture and created some off-chain tooling, you have a fully functional Optimistic Rollup.

Turning to ZK Rollup

Overall, I firmly believe that Optimistic Rollups will dominate for the next few years. Some people think that ZK Rollups will eventually surpass Optimistic Rollups, but I don’t agree. I think the current relative simplicity and flexibility of Optimistic Rollups means they can gradually evolve into ZK Rollups. If we can find a pattern for making this transition, there’s no need to go to great lengths to build a less flexible, more fragile ZK ecosystem from scratch; we can simply deploy ZK proofs onto the existing Optimistic Rollup ecosystem.

Therefore, my goal is to create an architecture and migration path that allows existing modern OP ecosystems (such as Bedrock) to seamlessly transition to the ZK ecosystem. I believe that this is not only feasible, but also a way to go beyond the current zkEVM approach.

We start with the Bedrock architecture I described earlier in this article. Recall that Bedrock has a challenge game that can verify the validity of some execution of the L2 program (the EVM plus some extra logic, running as a MIPS program). One major drawback of this approach is that we must reserve a window of time during which users can detect and successfully challenge an erroneous proposed result. This adds considerable time to the withdrawal process (7 days on the current Optimism mainnet).

However, our L2 is just a program running on a simple machine (like MIPS), and it is entirely possible to build a ZK circuit for that simple machine. We can then use this circuit to explicitly prove the correct execution of the L2 program. Without modifying the current Bedrock codebase, you could start publishing validity proofs for Optimism. It really is that simple.
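
At the interface level, that might look something like the following (a hypothetical prover API; real zkVM provers are vastly more involved):

```go
// A hypothetical prover API for publishing validity proofs on top of
// an unchanged Bedrock; real zkVM provers are vastly more involved.
package zkmips

// ValidityProof is an opaque succinct proof blob.
type ValidityProof []byte

// Prover proves that executing the fixed MIPS circuit from one state
// root yields another. This is the *same* MIPS program the fault
// proof interprets, so the Bedrock codebase itself doesn't change.
type Prover interface {
	Prove(preStateRoot, postStateRoot [32]byte, trace []byte) (ValidityProof, error)
}

// PublishWithValidityProof proves an L2 state transition and posts
// the resulting commitment plus proof to L1.
func PublishWithValidityProof(p Prover, pre, post [32]byte, trace []byte,
	postToL1 func([32]byte, ValidityProof) error) error {
	proof, err := p.Prove(pre, post, trace)
	if err != nil {
		return err
	}
	return postToL1(post, proof)
}
```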

Why is this method reliable?

Just to clarify: while I say “zkMIPS” in this section, I’m really using it as a stand-in for any general-purpose, simplified zero-knowledge virtual machine (zkVM).

zkMIPS is easier than zkEVM

Building a zkMIPS (or any kind of zkVM) has one significant advantage over building a zkEVM: the target machine architecture is simple and static. The EVM changes frequently: gas costs are adjusted, opcodes are added or removed. MIPS-V, on the other hand, has not changed since 1996. By focusing on zkMIPS, you are working in a fixed problem space; you don’t need to modify, or even re-audit, your circuit every time the EVM is upgraded.

zkMIPS is more flexible than zkEVM

Another key point is that zkMIPS is more flexible than zkEVM. With zkMIPS, you can change the client code at will to add optimizations or improve the user experience, all without a corresponding circuit update. You could even build a core component that turns any blockchain into a ZK Rollup, not just Ethereum.

Your problem becomes proving time

Zero-knowledge proving time scales along two axes: the size of the circuit and the number of constraints. By building a circuit for a simple machine like MIPS (rather than a more complex machine like the EVM), we significantly reduce the circuit’s size and complexity. However, the number of constraints depends on the number of machine instructions executed. Each EVM opcode breaks down into many MIPS opcodes, so the number of constraints, and with it your overall proving time, increases significantly.
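
A back-of-envelope model of this trade-off, with made-up numbers purely for scale:

```latex
% Back-of-envelope proving-time model; numbers are invented for scale.
% Proving time grows with per-step circuit complexity C and the number
% of machine steps N in the execution trace. Moving from zkEVM to
% zkMIPS shrinks C but inflates N by some expansion factor k:
\[
  T_{\mathrm{prove}} \propto C \times N,
  \qquad N_{\mathrm{MIPS}} \approx k \cdot N_{\mathrm{EVM}}
\]
% With a hypothetical k = 100, a block executing 10^6 EVM opcodes
% becomes a trace of roughly 10^8 MIPS steps, so the per-step MIPS
% circuit must be far cheaper for the trade to pay off.
```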

However, reducing proving time is also a problem deeply familiar to the Web2 world. Given that the MIPS machine architecture is unlikely to change any time soon, we can heavily optimize our circuits and provers without worrying about future EVM changes. I am far more confident in hiring a senior hardware engineer to optimize a well-defined problem than in hiring the ten or even a hundred engineers needed to build and audit a constantly shifting zkEVM target. Companies like Netflix, which employ plenty of hardware engineers to optimize transcoding chips, would probably be happy to take on this interesting ZK challenge with a pile of venture capital behind it.

The initial proving time for a circuit like this may well exceed the 7-day Optimistic Rollup withdrawal window, but it will only decrease over time. By introducing ASICs and FPGAs, we can dramatically speed up proving, and with a static target we can build an ever more optimized prover.

Eventually, the proving time for this circuit will drop below Optimism’s current 7-day withdrawal window, and we can begin to consider removing Optimism’s challenge process. Running a prover for 7 days may still be too expensive, so we may want to wait a while longer, but the point stands. You can even run both proof systems simultaneously: start using ZK proofs as soon as possible, and fall back to the Optimistic challenge game if the prover fails for any reason. When ready, the Optimistic proofs can be removed in a way that is completely transparent to applications, and your Optimistic Rollup becomes a ZK Rollup.
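
A sketch of what that dual-proof finalization rule might look like (illustrative only; not Optimism’s actual contracts):

```go
// A minimal sketch of a dual-proof finalization rule; illustrative
// only, not Optimism's actual contracts.
package rollup

import "time"

// Commitment is a claimed post-state root for some L2 block.
type Commitment struct {
	StateRoot [32]byte
	PostedAt  time.Time
}

// FinalizeWithdrawal accepts a withdrawal against a commitment if
// either (a) a ZK validity proof for it has been verified, or (b) it
// has survived the full challenge window unchallenged. Removing path
// (b) later is invisible to applications.
func FinalizeWithdrawal(c Commitment, zkProven, challenged bool,
	window time.Duration, now time.Time) bool {
	if zkProven {
		return true // fast path: validity proof already verified
	}
	return !challenged && now.Sub(c.PostedAt) >= window // optimistic fallback
}
```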

You can focus on other important issues

Running a blockchain is a complex problem that involves far more than writing a lot of backend code. At Optimism, much of our work focuses on improving the user and developer experience through useful client tooling. We also spend a lot of time and effort on “soft” issues: talking with projects, understanding their pain points, designing incentive mechanisms. The more time you spend on chain software, the less time you have for all of this. You can always try to hire more people, but organizations don’t scale linearly, and every new employee adds internal communication overhead.

Since work on the zero-knowledge circuit applies directly to the already-running chain, you can build your core platform and your proving software at the same time. And since the client can be modified without changing the circuit, you can decouple your client and proving teams. An Optimistic Rollup taking this approach could be years ahead of its zero-knowledge competitors in terms of actual on-chain activity.

Conclusion

To be very frank, I don’t see anything obviously wrong with the zkMIPS prover, other than the possibility that it cannot be optimized significantly over time. The only real impact on applications, I think, is that gas costs for certain opcodes may need to be adjusted to reflect their increased proving time. If the prover cannot be optimized to a reasonable level, then I admit I have failed. But if it can, the zkMIPS/zkVM approach may completely replace the zkEVM approach we have today. That may sound like a radical claim, but not long ago single-step Optimistic fault proofs were completely replaced by multi-step proofs.