Research Explainer · Nation (2025)
Seven quantum SDKs face the same benchmark, and only Qiskit clears the full transpilation suite
Benchpress runs 1,066 identical tests across BQSKit, Braket, Cirq, Qiskit, QTS, Staq and Tket on circuits up to 930 qubits, exposing two orders of magnitude in compile time and a wide gap in functional coverage.
Published February 2025
1,066 tests run against every SDK, on circuits up to 930 qubits and around a million two-qubit gates
1 of 7 SDKs that pass the full 1,032-test transpilation set (Qiskit). Tket fails 87, BQSKit 200, QTS 19
0.18 d vs 20.5 d estimated full-suite runtime for Qiskit versus BQSKit on the same hardware
55× Cirq's speedup over Qiskit on the 100-qubit Hamiltonian-simulation circuit build test
What Benchpress actually does
Quantum software is no longer a single experiment in a Jupyter notebook. It is a stack: build a circuit, manipulate it, then transpile it down to whatever weird coupling map your hardware actually has. Benchpress is an open-source pytest-based suite from IBM Quantum that runs 1,066 tests across all three of those stages, on circuits up to 930 qubits and roughly one million two-qubit gates.
Each test is defined as an abstract 'workout' that defaults to SKIPPED unless an SDK explicitly implements it. That design choice matters: an SDK is not punished for missing functionality; it gets a transparent count of what it can and cannot do. Seven SDKs are evaluated here: Amazon Braket, Cirq, Tket, BQSKit, Staq, Qiskit and the Qiskit Transpiler Service.
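The skip-by-default idea can be sketched in a few lines of plain Python. This is an illustration only, not Benchpress's actual code: the class names `BuildWorkout` and `ToySDKBuild`, the test method names, and the `run_workout` helper are all hypothetical, and the standard library's `unittest.SkipTest` stands in for pytest's skip mechanism.

```python
import unittest
from collections import Counter


class BuildWorkout:
    """Hypothetical abstract workout: every test skips unless overridden."""

    def test_build_circuit(self):
        raise unittest.SkipTest("not implemented by this SDK")

    def test_bind_parameters(self):
        raise unittest.SkipTest("not implemented by this SDK")


class ToySDKBuild(BuildWorkout):
    """Hypothetical SDK that implements only one of the two tests."""

    def test_build_circuit(self):
        # Stand-in for real circuit construction: a list of two-qubit pairs.
        circuit = [(0, q) for q in range(1, 5)]
        assert len(circuit) == 4


def run_workout(workout):
    """Run every test_* method and tally PASSED / SKIPPED / FAILED."""
    tally = Counter()
    for name in sorted(dir(workout)):
        if not name.startswith("test_"):
            continue
        try:
            getattr(workout, name)()
            tally["PASSED"] += 1
        except unittest.SkipTest:
            tally["SKIPPED"] += 1
        except Exception:
            tally["FAILED"] += 1
    return dict(tally)


print(run_workout(BuildWorkout()))
print(run_workout(ToySDKBuild()))
```

The base class alone reports two skips. The toy SDK reports one pass and one skip, which is the kind of per-SDK tally the suite publishes.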
All results were generated on a single AMD 7900 with 128 GB of RAM, Python 3.12, with timeouts capped at one hour per test. Tests that exceed the cap are recorded as FAILED, not silently dropped.
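A minimal sketch of the "over the cap counts as FAILED" rule, assuming a thread-based runner. The source does not describe how Benchpress enforces its one-hour limit, so the `run_with_cap` helper and its mechanism are assumptions. A production harness would more likely run each test in a subprocess it can kill, since Python threads cannot be interrupted.

```python
import concurrent.futures
import time


def run_with_cap(test_fn, cap_seconds):
    """Return 'PASSED' or 'FAILED'; exceeding the cap is recorded as FAILED."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(test_fn)
    try:
        future.result(timeout=cap_seconds)
        return "PASSED"
    except concurrent.futures.TimeoutError:
        # Over the cap: recorded explicitly rather than dropped.
        return "FAILED"
    except Exception:
        # The test itself raised an error.
        return "FAILED"
    finally:
        # Do not block on a still-running worker thread.
        pool.shutdown(wait=False)


# Toy usage with a 0.05-second cap instead of one hour.
print(run_with_cap(lambda: None, 0.05))
print(run_with_cap(lambda: time.sleep(0.3), 0.05))
```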
Per-SDK outcomes across the full 1,066-test Benchpress suite
Recreated from Nation et al. (2025), Fig. 1b. PASSED, SKIPPED, FAILED and XFAIL counts per SDK. Skips are tests an SDK has not implemented; XFAIL means the test is known to fail irrecoverably (e.g. running out of memory). QTS = Qiskit Transpiler Service.
Estimated runtime to complete the full Benchpress suite
Estimated from Nation et al. (2025), Sec. I.B, by scaling each SDK's geometric-mean per-test runtime by Qiskit's measured 0.18-day total. Braket and Cirq are excluded because too few tests pass to support an estimate. Lower is better.
Compiled-circuit quality across all transpilation tests, normalized to Qiskit
Recreated from Nation et al. (2025), Tbl. II, 'all-tests' row. Geometric mean of two-qubit gate count and two-qubit gate depth, divided by the corresponding Qiskit value. A bar at 1.0 matches Qiskit; below 1.0 is better, above 1.0 is worse. Staq numbers are over a 551-test subset that excludes synthesis.
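The normalization in that caption can be sketched as follows. The gate counts are made-up illustrative values, not data from the paper, and `normalized_score` is a hypothetical helper name. The geometric mean is a natural choice when values span orders of magnitude, because one very large circuit cannot dominate the average.

```python
from statistics import geometric_mean

# Hypothetical two-qubit gate counts for the same three tests (not paper data).
qiskit_counts = [100, 400, 1600]
other_counts = [120, 500, 2100]


def normalized_score(sdk_values, reference_values):
    """Geometric mean of the SDK's values divided by the reference's."""
    return geometric_mean(sdk_values) / geometric_mean(reference_values)


score = normalized_score(other_counts, qiskit_counts)

# Equivalent formulation: geometric mean of the per-test ratios.
per_test = geometric_mean(
    [o / q for o, q in zip(other_counts, qiskit_counts)]
)

print(f"normalized score: {score:.3f}")
print(f"via per-test ratios: {per_test:.3f}")
```

When both SDKs are measured on the same tests, the two formulations agree, so it does not matter whether the division happens per test or after averaging. A score above 1.0 means more two-qubit gates than the reference.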
Coverage: most SDKs cannot run most of the suite
The first surprise is how thinly some packages spread. Braket completes 7 tests and skips 1,057. Cirq completes 10 and skips 1,054. Both are perfectly reasonable libraries for what they target, but neither offers a general-purpose transpilation pipeline you can throw arbitrary 100-qubit circuits at.
Staq is a different story: it completes 549 tests, but its OpenQASM-only input forces 515 skips, and it cannot return circuits in a target backend's basis set. Tket and BQSKit are full-featured contenders, but Tket fails 87 tests and BQSKit fails 200, mostly due to timeouts at large qubit counts. Qiskit and the Qiskit Transpiler Service are the only stacks that touch nearly every test, with Qiskit passing 1,044 and the QTS 1,013.
The 22 universally skipped tests are by design: they are device-transpilation cases whose input circuit is wider than the 133-qubit Heron target, so no SDK is expected to pass them.
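The counts quoted in this section can be cross-checked with simple arithmetic. The snippet below uses only figures stated in the text; the `unaccounted` helper is a hypothetical name. Tket and BQSKit are left out because the text gives their failures but not their pass or skip totals.

```python
TOTAL_TESTS = 1066
UNIVERSAL_SKIPS = 22

# (completed, skipped) pairs as quoted in the text above.
quoted = {
    "Braket": (7, 1057),
    "Cirq": (10, 1054),
    "Staq": (549, 515),
}


def unaccounted(completed, skipped, total=TOTAL_TESTS):
    """Tests that are neither completed nor skipped."""
    return total - completed - skipped


for sdk, (completed, skipped) in quoted.items():
    share = completed / TOTAL_TESTS
    rest = unaccounted(completed, skipped)
    print(f"{sdk}: {share:.1%} of the suite completed, "
          f"{rest} neither completed nor skipped")

# Qiskit's quoted 1,044 passes equal the suite minus the universal skips.
print(TOTAL_TESTS - UNIVERSAL_SKIPS)
```

Each of the three SDKs leaves a small remainder that is neither completed nor skipped. Those tests would fall under FAILED or XFAIL, though the text does not say how they split.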
Speed: a hundredfold gap on the same hardware
If you ran the entire suite end to end, Qiskit would finish in about 0.18 days. Staq, the OpenQASM-flavoured fast option, would finish in 0.15 days because it skips synthesis. The Qiskit Transpiler Service would take 1.3 days, Tket 4.73 days, and BQSKit 20.5 days. Same machine, same tests.
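The "hundredfold" in the heading follows directly from those day counts. A quick check, using only the figures quoted above (the `relative_to` helper is a hypothetical name):

```python
# Estimated full-suite runtimes in days, as quoted in the text.
est_days = {
    "Staq": 0.15,
    "Qiskit": 0.18,
    "QTS": 1.3,
    "Tket": 4.73,
    "BQSKit": 20.5,
}


def relative_to(reference, days):
    """Express every runtime as a multiple of the reference SDK's runtime."""
    ref = days[reference]
    return {sdk: d / ref for sdk, d in days.items()}


ratios = relative_to("Qiskit", est_days)
for sdk, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    hours = est_days[sdk] * 24
    print(f"{sdk:7s} {hours:7.1f} h  {ratio:6.1f}x Qiskit")
```

On these estimates BQSKit comes out at roughly 114 times Qiskit's runtime and Tket at roughly 26 times, while Staq is slightly faster than Qiskit.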
The differences are not uniform. Cirq builds the 100-qubit Hamiltonian-simulation set 55× faster than Qiskit. Qiskit binds parameters into a parameterised SU(2) ansatz 13.5× faster than the next closest SDK. BQSKit, which leans heavily on numerical linear algebra during synthesis, exhausts the machine's 128 GB of memory while building a 16-qubit multi-controlled X gate. The lesson is that 'speed' depends entirely on which test you pick.
For the 100-qubit transpilation work that anyone with hardware actually cares about, Qiskit's compilation cost is roughly 1.4× the on-device runtime. BQSKit and Tket take one to two orders of magnitude longer to compile than Qiskit, which on superconducting devices means the compiler dominates the wall-clock cost of running the circuit at all.
Quality: who actually produces a shorter circuit
Speed is half the picture. The other half is whether the compiled output is any good, measured by two-qubit gate count and two-qubit gate depth, both of which directly drive error rates on real hardware.
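To make the two metrics concrete, here is a minimal sketch that computes them from a bare gate list. It assumes the circuit is already decomposed into one- and two-qubit gates and is represented as tuples of qubit indices in program order. That representation and the `two_qubit_metrics` name are illustrative assumptions, not any SDK's API.

```python
def two_qubit_metrics(gates):
    """Return (two-qubit gate count, two-qubit gate depth).

    gates: iterable of qubit-index tuples, one tuple per gate, in order.
    Depth is the longest chain of two-qubit gates that must run in sequence.
    """
    count = 0
    layer = {}  # qubit -> latest two-qubit layer that touched it
    for qubits in gates:
        if len(qubits) != 2:
            continue  # only two-qubit gates count toward either metric
        count += 1
        a, b = qubits
        new_layer = max(layer.get(a, 0), layer.get(b, 0)) + 1
        layer[a] = layer[b] = new_layer
    depth = max(layer.values(), default=0)
    return count, depth


# Three gates in a chain: each waits on the previous one.
chain = [(0, 1), (1, 2), (2, 3)]
# Three gates, but the first two act on disjoint qubits and can run together.
parallel = [(0, 1), (2, 3), (0,), (1, 2)]

print(two_qubit_metrics(chain))     # (3, 3)
print(two_qubit_metrics(parallel))  # (3, 2)
```

Both circuits contain three two-qubit gates, but the second has a shorter depth because two of its gates can run in the same layer. That is why the benchmark reports count and depth separately.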
Across all transpilation tests, BQSKit produces circuits with 26% more two-qubit gates and 27% more depth than Qiskit on geometric mean. Staq, the fast option, ships circuits with 2.8× the gate count and 2.9× the depth. Tket lands within 1% of Qiskit on depth (and beats Qiskit on heavy-hex topologies, where its synthesis pass shines), but at 13× the runtime. The Qiskit Transpiler Service, which augments Qiskit with reinforcement-learning routing, beats Qiskit on both gate count and depth, with depth improving by 12% on average and by up to 22% on heavy-hex.
The pattern is exactly what you would expect of an engineering trade-off: the QTS pays in seconds of input-output overhead and gets a shorter circuit; Staq pays in synthesis quality and gets a faster pipeline; BQSKit pays in everything to chase optimal synthesis and currently times out on many of the larger circuits.
WHY IT MATTERS
Benchpress is the first transparent, reproducible attempt to compare quantum SDKs at scales near current hardware. Its results say two things at once: most quantum compilers cannot yet handle the workloads researchers want to run on 100-plus-qubit devices, and the ones that can differ by two orders of magnitude in compile time at broadly similar circuit quality. The benchmark is authored by Qiskit's maintainers, which is worth knowing, but the suite is open-source and the tests are auditable. If you build quantum software, this is the scoreboard you now have to beat.
Reference
Nation, P. D., Saki, A. A., Brandhofer, S., Bello, L., Garion, S., Treinish, M., & Javadi-Abhari, A. (2025). Benchmarking the performance of quantum computing software. arXiv preprint arXiv:2409.08844v2 [quant-ph]. https://arxiv.org/abs/2409.08844