
The Pantheon of Zero Knowledge Proof Development Frameworks (Updated!)

Benchmarking Circuit Frameworks using SHA-256

We’d like to thank the teams from Polygon Zero, the gnark project at Consensys, Pado Labs, and Delphinus Lab, as well as Dr. Srinath Setty, for their valuable review and feedback on this blog.

Change logs:

The Pantheon of Zero Knowledge Proof

Over the past several months, we’ve dedicated a significant amount of time and effort to developing cutting-edge infrastructure that leverages zk-SNARK succinct proofs. As part of this work, we have tested and used a wide variety of Zero-Knowledge Proof (ZKP) development frameworks. While this journey has been rewarding, we realize that the abundance of available ZKP frameworks often creates a challenge for new developers who are trying to find the best fit for their specific use cases and performance requirements. With this pain point in mind, we believe that a community evaluation platform providing comprehensive benchmark results is needed and will greatly aid in the development of these new applications.

To fulfill this need, we are launching the Pantheon of Zero Knowledge Proof as a public good community initiative. The first step will be to encourage the community to share reproducible benchmarking results from various ZKP frameworks. Our ultimate goal is to collectively and collaboratively create and maintain a universally recognized evaluation testbed that covers low-level circuit development frameworks, high-level zkVMs and compilers, and even hardware acceleration providers. We hope that this initiative will expedite the adoption of ZKPs by facilitating informed decision-making, while also encouraging the evolution and iteration of the ZKP frameworks themselves by providing a set of commonly referenceable benchmarking results. We are committed to investing in this initiative and invite all like-minded community members to join us and contribute to this effort together!

A First Step: Benchmarking Circuit Frameworks using SHA-256 

In this blog post, we take the first step towards building the Pantheon of ZKP by providing a reproducible set of benchmark results using SHA-256 across a range of low-level circuit development frameworks. While we acknowledge that other benchmarking granularities and primitives are possible, we selected SHA-256 due to its applicability to a wide range of ZKP use cases, including blockchain systems, digital signatures, zkDID and more. We should also mention that we use SHA-256 in our own system, so it is quite convenient for us as well! 😂

Our benchmark evaluates the performance of SHA-256 on various zk-SNARK and zk-STARK circuit development frameworks. Through this comparison, we seek to provide developers with insights into the efficiency and practicality of each framework. Our goal is that these findings will enable developers to make informed decisions when selecting the most suitable framework for their projects.

Proving Systems

In recent years, we’ve observed a proliferation of zero-knowledge proving systems. While it is challenging to keep up with all of the exciting advances in the space, we’ve carefully selected the following proving systems based on their maturity and developer adoption. Our aim is to present a representative sample of different frontend/backend combinations.

  1. Circom + snarkjs / rapidsnark: Circom is a popular DSL for writing circuits and generating R1CS constraints, while snarkjs generates Groth16 or Plonk proofs for Circom circuits. Rapidsnark is another prover for Circom that generates Groth16 proofs and is usually much faster than snarkjs because it uses assembly with the ADX instruction extension for field arithmetic and parallelizes proof generation as much as possible.
  2. gnark: gnark is a comprehensive Golang framework from Consensys that supports Groth16, Plonk, and many more advanced features.
  3. Arkworks: Arkworks is a comprehensive Rust framework for zk-SNARKs.
  4. Halo2 (KZG): Halo2 is Zcash’s zk-SNARK implementation with Plonk. It is equipped with the highly-flexible Plonkish arithmetization that supports many useful primitives, such as custom gates and lookup tables. We use a Halo2 fork with KZG support from the Ethereum Foundation and Scroll.
  5. Plonky2: Plonky2 is a SNARK implementation based on techniques from PLONK and FRI from Polygon Zero. Plonky2 uses a small Goldilocks field and supports efficient recursion. In our benchmarking, we targeted 100-bit conjectured security and used the parameters that yielded the best proving time for the benchmark job. Specifically, we used 28 Merkle queries, a blowup factor of 8, and a 16-bit proof-of-work grinding challenge. Moreover, we set num_of_wires = 60 and num_routed_wires = 60.
  6. Starky: Starky is a highly performant STARK framework from Polygon Zero. In our benchmarking, we targeted 100-bit conjectured security and used the parameters that yielded the best proving time. Specifically, we used 90 Merkle queries, a blowup factor of 2, and a 10-bit proof-of-work grinding challenge.
  7. Boojum: Boojum is a high-performance Rust-based arithmetization and constraint library from the zkSync team, used to implement the ZK circuits for zkSync Era and the ZK Stack. It is a STARK-based implementation that uses PLONK-style arithmetization with FRI as the commitment scheme.
  8. Nova: Nova is a recursive SNARK framework based on a folding scheme.
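The Plonky2 and Starky parameters above can be sanity-checked against the stated 100-bit target. The sketch below assumes the common rule of thumb for conjectured FRI security, where each query contributes log2(blowup factor) bits and the proof-of-work grinding bits are added on top. The function name is ours and is not part of either framework's API.

```python
def fri_conjectured_security_bits(num_queries, blowup_factor, pow_bits):
    """Estimate conjectured FRI security in bits (rule-of-thumb assumption).

    Each query is assumed to contribute log2(blowup_factor) bits,
    and the proof-of-work grinding bits are added on top.
    """
    rate_bits = blowup_factor.bit_length() - 1  # exact log2 for powers of two
    if 1 << rate_bits != blowup_factor:
        raise ValueError("blowup factor must be a power of two")
    return num_queries * rate_bits + pow_bits


# Parameters quoted in the list above.
plonky2_bits = fri_conjectured_security_bits(num_queries=28, blowup_factor=8, pow_bits=16)
starky_bits = fri_conjectured_security_bits(num_queries=90, blowup_factor=2, pow_bits=10)
print(plonky2_bits, starky_bits)  # 100 100
```

Under this assumption both configurations land on the same 100-bit target, which illustrates the trade-off: a smaller blowup factor needs many more queries (and hence a larger proof) to reach the same security level.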

The table below summarizes the above frameworks with the relevant configurations used in our benchmarking. This list is by no means exhaustive, and many state-of-the-art frameworks and techniques (e.g., GKR, Hyperplonk) are left for future work. Nova is benchmarked separately in its own section later in this post.

Please be aware that these benchmark results are only for circuit development frameworks. We plan to publish a separate blog benchmarking different zkVMs (e.g., Scroll, Polygon zkEVM, Consensys zkEVM, zkSync, Risc Zero, zkWasm) and IR compiler frameworks (e.g., Noir, zkLLVM) in the future.

| Framework | Arithmetization | Commitment Scheme | Field | Other Configs |
|---|---|---|---|---|
| Circom + snarkjs / rapidsnark | R1CS | Groth16 | BN254 scalar | |
| gnark | R1CS | Groth16 | BN254 scalar | |
| Arkworks | R1CS | Groth16 | BN254 scalar | |
| Halo2 (KZG) | Plonkish | KZG | BN254 scalar | |
| Plonky2 | Plonk | FRI | Goldilocks | blowup factor = 8; proof of work bits = 16; query rounds = 28; num_of_wires = 60; num_routed_wires = 60 |
| Starky | AIR | FRI | Goldilocks | blowup factor = 2; proof of work bits = 10; query rounds = 90 |
| Boojum | Plonk | FRI | Goldilocks | |

Benchmark Methodology

To benchmark these various proving systems, we computed the SHA-256 hash for N bytes of data, where we experimented with N = 64, 128, …, 64K (with one exception being Starky, where the circuit repeats the SHA-256 computation for a fixed 64-byte input but maintains the same total number of message chunks). The benchmark code and SHA-256 circuit implementations can be found in this repository.
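To make the workload concrete, the sketch below enumerates the pre-image sizes and the number of 64-byte SHA-256 message chunks each one implies. It assumes that the sizes double at each step (64, 128, ..., 64K) and that standard SHA-256 padding is counted; the all-zero inputs are placeholders, and whether a given circuit counts the padding chunk is also an assumption.

```python
import hashlib

BLOCK_BYTES = 64  # SHA-256 consumes the message in 64-byte chunks


def preimage_sizes(start=64, stop=64 * 1024):
    """Sizes N = 64, 128, ..., 64K, assuming N doubles at each step."""
    sizes = []
    n = start
    while n <= stop:
        sizes.append(n)
        n *= 2
    return sizes


def sha256_block_count(message_len):
    """Chunks processed after standard padding (0x80 byte + 8-byte length)."""
    return (message_len + 9 + BLOCK_BYTES - 1) // BLOCK_BYTES


for n in preimage_sizes():
    data = bytes(n)  # all-zero placeholder input, not the benchmark's data
    digest = hashlib.sha256(data).hexdigest()
    print(n, sha256_block_count(n), digest[:16])
```

The chunk count is what drives circuit size, which is why the Starky variant can keep a fixed 64-byte input and still match the total number of message chunks.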

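Furthermore, we benchmarked each system on the following performance metrics, each reported in its own section of the results: proof generation time, peak memory usage during proof generation, and average CPU utilization during proof generation.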

Please note that we are making some “hand-waving” assumptions regarding the proof size and proof verification cost, as these aspects can be mitigated by composing with Groth16/KZG before going on-chain. 
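This post does not describe the measurement tooling, so the following is only one possible way to collect wall-clock time, CPU utilization, and peak memory for a prover run on Linux or macOS. The `measure` helper is hypothetical, and the command it runs here is a stand-in for a real prover invocation.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd as a child process and report wall time, CPU %, and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU seconds consumed by the child, summed over all of its threads.
    cpu_seconds = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)

    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    # For RUSAGE_CHILDREN it is the largest child seen so far, so run one
    # measurement per Python process for a clean number.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return {
        "wall_s": wall,
        "cpu_pct": 100.0 * cpu_seconds / wall,  # can exceed 100% with multiple cores
        "peak_rss_mb": after.ru_maxrss / divisor,
    }


# Stand-in workload: hash 1 MiB of zeros in a child Python process.
result = measure([sys.executable, "-c", "import hashlib; hashlib.sha256(bytes(1 << 20)).digest()"])
print(result)
```

A CPU percentage above 100% indicates that several cores were busy on average, which is the same convention used by tools such as `time`.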

The Machines

We conducted our benchmarking on two different machines: a Linux server with 20 cores and a Macbook M1 Pro with 10 cores.

The Linux server was used to simulate the scenario with many CPU cores and abundant memory, while the Macbook M1 Pro, which is commonly used for R&D, has a more powerful CPU with fewer cores.

We enabled multithreading where optional, but we did not utilize GPU acceleration in this benchmark. We plan to include GPU benchmarking as part of our future work.

Benchmark Results

Number of Constraints

Before we move on to the detailed benchmarking results, it’s useful to first understand the complexity of the SHA-256 by looking at the number of constraints in each proving system. It is important to be aware that the constraint numbers in different arithmetization schemes are not directly comparable.

The results below correspond to a pre-image size of 64KB. While results may vary with other pre-image sizes, they can be roughly scaled linearly.

| Proving System | Number of Constraints (64KB SHA-256) |
|---|---|
| Circom | 32M |
| gnark | 45M |
| Arkworks | 43M |
| Halo2 | 4M rows (K=22) |
| Plonky2 | 8M rows (K=23) |
| Boojum | 0.5M rows (K=19), specialized optimization used |
| Starky | 2^16 transition steps |
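As a rough illustration of the linear scaling mentioned above, the sketch below extrapolates the R1CS constraint counts from the 64KB figures to other pre-image sizes and expands the K values into row counts. The helper names are ours, and the estimates are approximations: row-based systems pad to a power of two, so they scale in steps rather than smoothly.

```python
# Constraint counts at a 64KB pre-image, taken from the table above.
CONSTRAINTS_AT_64KB = {
    "Circom": 32_000_000,
    "gnark": 45_000_000,
    "Arkworks": 43_000_000,
}
REFERENCE_BYTES = 64 * 1024


def estimate_constraints(framework, preimage_bytes):
    """Linearly scale the 64KB constraint count to another pre-image size."""
    return CONSTRAINTS_AT_64KB[framework] * preimage_bytes // REFERENCE_BYTES


def rows_for_k(k):
    """Row count of a Plonkish table with 2^K rows."""
    return 1 << k


print(estimate_constraints("Circom", 4 * 1024))  # 2000000
print(rows_for_k(22), rows_for_k(23), rows_for_k(19))  # 4194304 8388608 524288
```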

Proof Generation Time

[Figure 1] illustrates the proof generation time of each framework for SHA-256 over the various pre-image sizes, using the Linux Server. We are able to make the following observations:

We also conducted a proof generation time benchmark on the Macbook M1 Pro, as illustrated in [Figure 2]. However, it is important to note that rapidsnark was not included in this benchmark due to its lack of support for the arm64 architecture. In order to use snarkjs on arm64, we had to generate the witness using WebAssembly, which is slower than the C++ witness generation used on the Linux Server.

There were several additional observations when running the benchmark on Macbook M1 Pro: 

Peak Memory Usage

The peak memory usage during proof generation on the Linux Server and Macbook M1 Pro is shown in [Figure 3] and [Figure 4], respectively. The following observations can be made based on these benchmarking results:

CPU Utilization

We evaluated the degree of parallelization for each proving system by measuring the average CPU utilization during proof generation for SHA-256 over a 4KB pre-image input. The table below shows the average CPU utilization (with the average utilization per core in parentheses) on both the Linux Server (with 20 cores) and the Macbook M1 Pro (with 10 cores).

The key observations are as follows:

| Proving System | CPU Usage (avg per-core usage), Linux Server | CPU Usage (avg per-core usage), MBP M1 |
|---|---|---|
| snarkjs | 557% (27.85%) | 486% (48.6%) |
| rapidsnark | 1542% (77.1%) | N/A |
| gnark | 1624% (81.2%) | 720% (72%) |
| Arkworks | 935% (46.75%) | 504% (50.4%) |
| Halo2 (KZG) | 1227% (61.35%) | 588% (58.8%) |
| Plonky2 | 892% (44.6%) | 429% (42.9%) |
| Starky | 849% (42.45%) | 335% (33.5%) |
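The per-core figures in parentheses are simply the total utilization divided by the number of cores on each machine. A minimal sketch of that arithmetic, using a helper name of our own choosing:

```python
LINUX_CORES = 20
MBP_CORES = 10


def per_core_utilization(total_pct, cores):
    """Average utilization per core, given the summed utilization of all cores."""
    return round(total_pct / cores, 2)


# Totals taken from the table above.
print(per_core_utilization(1624, LINUX_CORES))  # gnark on the Linux Server
print(per_core_utilization(720, MBP_CORES))     # gnark on the Macbook M1 Pro
print(per_core_utilization(557, LINUX_CORES))   # snarkjs on the Linux Server
```

A value close to 100% per core would mean the prover keeps every core busy for the whole run, so this number is a direct proxy for how well proof generation is parallelized.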

Nova: Benchmarking Recursive SHA256 (Updated 08/04/2023)

In previous studies conducted by Ethereum’s Privacy and Scaling Engineering group, Nova’s recursive SHA256 performance has been benchmarked. These benchmarks were performed through Nova-Scotia, a platform that facilitates the translation of Circom circuits into formats compatible with Nova. It should be noted, however, that the performance results may be inextricably linked to the translation layer itself.

In this post, we are excited to introduce the first-ever benchmark of Nova’s recursive SHA256 utilizing the “bare-metal” Nova framework. It’s important to recognize that Nova cannot be directly compared with other frameworks in terms of time and computation, because Nova supports incremental computation. To put it simply, breaking down the entire computation into smaller steps naturally leads to a decrease in memory consumption, even though it may cause an increase in computation time.

In our benchmark, we concentrate on demonstrating Nova’s performance for a fixed SHA256 workload, but with varying step sizes. More crucially, we divide the proof time benchmark into two distinct parts: 1. The folding part; 2. The compression part, which is used to generate the final SNARK. As illustrated in Figure 5:

  1. The total Nova folding time (calculated as per-step time multiplied by the number of steps) decreases as per-step complexity grows, but not significantly. Nova generally presents an attractive trade-off between folding time and memory scalability. As we utilize smaller step sizes, the total folding time, even during serial execution, only increases slightly.
  2. The final compression time required to generate a SNARK increases as the complexity of each step grows.
  3. The best total time is achieved with 10 hashes per step and a total of 24 steps.
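The decomposition used in these observations can be sketched as follows. The total workload of 240 hashes is inferred from the best configuration above (10 hashes per step times 24 steps), and the list of step sizes is simply every divisor of that workload, which may not match the configurations actually benchmarked. The timing values in the usage line are placeholders, not measurements.

```python
TOTAL_HASHES = 240  # inferred from 10 hashes per step x 24 steps


def step_configurations(total_hashes=TOTAL_HASHES):
    """All (hashes_per_step, num_steps) splits of a fixed workload."""
    return [
        (per_step, total_hashes // per_step)
        for per_step in range(1, total_hashes + 1)
        if total_hashes % per_step == 0
    ]


def total_proving_time(per_step_seconds, num_steps, compression_seconds):
    """Serial folding time (per-step time x steps) plus final compression time."""
    return per_step_seconds * num_steps + compression_seconds


print(step_configurations()[:5])  # [(1, 240), (2, 120), (3, 80), (4, 60), (5, 48)]

# Placeholder numbers only: 0.5 s per step, 24 steps, 3.0 s compression.
print(total_proving_time(0.5, 24, 3.0))  # 15.0
```

Larger steps shrink the folding term slightly but grow the compression term, so the best total time sits at an intermediate step size.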

However, there are several critical and nuanced discussion points to consider:

  1. The benchmark is performed in serialized execution. For long-running applications, parallelization has been noted by the creators of Nova and others within the community as a natural avenue to significantly reduce the total folding time as the number of steps increases. As such, Nova could attain an even more favorable time-memory trade-off when parallelization is implemented.
  2. The current final compression step employs Spartan with IPA-PC, which leaves substantial room for improvement in both prover and verifier performance. These improvements could be realized by replacing it with KZG.

In general, we believe that folding-based mechanisms like Nova offer highly promising performance results. We view them as worthy of community collaboration and contribution, especially in relation to the key areas for improvement mentioned above.

Figure 5. Nova Performance with different step sizes

Conclusion and Future Work

This blog post presents a comprehensive comparison of the performance of SHA-256 on various zk-SNARK and zk-STARK development frameworks. Through the benchmark results, we’ve gained insights into the efficiency and practicality of each framework for developers who require succinct proofs for SHA-256 operations. Groth16 frameworks (e.g., rapidsnark, gnark) are found to be faster in generating proofs than Plonk frameworks (e.g., Halo2, Plonky2). The lookup table in Plonkish arithmetization significantly reduces the constraints and proving time for SHA-256 when using a larger pre-image size. Furthermore, gnark and rapidsnark demonstrate an excellent capability to utilize multiple cores for parallelization. Starky, on the other hand, shows a much shorter proof generation time but at the cost of a much larger proof size. In terms of memory efficiency, rapidsnark and Starky outperform other frameworks.

As the first step toward building the Pantheon of ZKP, we acknowledge that these benchmark results are far from the comprehensive testbed we ultimately aim to build. We welcome feedback and criticism, and we invite everyone to contribute to this initiative of making ZKP easier and more accessible for developers to use. We are also willing to provide grants for individual contributors to cover the costs of computational resources for large-scale benchmarking. Together, we can improve the efficiency and practicality of ZKP for the benefit of the wider community.
