Python, Javascript and UNIX hacker, open source advocate, IRC addict, general badass and traveler

Modern Statistical Arbitrage

Newton by William Blake

The idea

“If I have seen further it is by standing on the shoulders of Giants.” Sir Isaac Newton.

Isaac Newton was arguably the greatest scientist who ever lived. He effectively discovered gravity. He showed us how to predict the motion of the planets. He had every right to brag about his genius. Yet he chose humility. Why?

His critics paraded their ideas with hubris. Newton offered his with deference to those who came before. And that humility was no pose. It came from something he understood early and deeply: knowledge builds upon itself. Each idea improves on the last, little by little, until the small gains add up to something revolutionary. That is the essence of his most famous “standing on the shoulders of Giants” metaphor.

From a young age, Newton kept a commonplace book, inherited from his stepfather. In it, he copied passages from what he read and added his own notes, turning borrowed knowledge into original ideas. He called it his “Waste Book.” The name was a nod to the usefulness of useless knowledge and the combinatorial nature of creativity, what Einstein would later call “combinatory play.” Creating by connecting was the foundation of Newton’s mind. It was his real superpower.

This week, we will cover two articles. We will build on The Modern Spirit of Statistical Arbitrage, a great piece by SysLS. And we will implement a recent breakthrough paper that rigorously tested more than 190 signals in the US equity market.

Here’s our plan:

  1. First, we will summarize the modern spirit of stat arb.

  2. Next, we will construct a signal and show, in a few lines of code, how it performs across several large baskets.

  3. We will then present the combined performance of the top ~20 signals and walk through them, summarizing the source paper along the way.

  4. Finally, we will lay out a simple way to merge these signals into a portfolio that survives friction and costs.

Let’s get started.




What is statistical arbitrage?

At its core, statistical arbitrage is a class of strategy built from a portfolio of signals. Each signal assigns weights to instruments based on how much they outperform or underperform the rest of their basket, measured against some central point of all the other instruments. This can happen in any space: returns, price, volume, flow, dividends, and so on. The basket is then built so that its net factor exposures tend toward zero, which means most of its returns come from idiosyncratic moves rather than broad market or factor exposure. It’s a broad definition, and that’s the point. It covers every flavor of stat arb.

For more details, check the great post by SysLS.

Let’s see a concrete example. We will start with Factor 46 from the paper we are sourcing the signals from.

Definition. Factor46 is the paper’s Multi-Period Mean Reversion Ratio, computed as:

(MEAN(CLOSE,3) + MEAN(CLOSE,6) + MEAN(CLOSE,12) + MEAN(CLOSE,24)) / (4 * CLOSE)

The Python code is straightforward:

The inputs are clear: both are date-indexed, point-in-time panels covering every symbol that has ever belonged to a given universe (the Norgate “… Current & Past” watchlist, so it’s survivorship-bias free):

  • data: a wide DataFrame whose columns are a two-level ('Feature', 'Symbol') MultiIndex, so data['Close'] slices out a single date-×-symbol price matrix. Features available are Open, High, Low, Close, Amount, Volume, and VWAP. factor46 only touches data['Close'].

  • index: a same-shaped date-×-symbol integer DataFrame that is 1 on days a symbol was an actual index constituent and 0 otherwise. factor46’s final result.where(index == 1) uses it as a membership mask, blanking out the factor value for any stock/day that wasn’t in the universe at that time so it can’t be traded.
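The listing itself did not survive in this copy of the post. As a stand-in, here is a minimal sketch of what factor46 could look like, rebuilt only from the formula and the input description above. It assumes MEAN(CLOSE, n) is a plain trailing n-day moving average (a pandas rolling mean), and the toy data and index panels at the bottom are made up purely for illustration; the author's actual code may differ.

```python
import numpy as np
import pandas as pd


def factor46(data: pd.DataFrame, index: pd.DataFrame) -> pd.DataFrame:
    """Multi-Period Mean Reversion Ratio, masked to index constituents."""
    close = data["Close"]  # date x symbol price matrix
    windows = (3, 6, 12, 24)
    # Sum of the four trailing simple moving averages of the close.
    blended = sum(close.rolling(w).mean() for w in windows)
    result = blended / (4 * close)
    # Keep values only where the symbol was a constituent on that date.
    return result.where(index == 1)


# Toy inputs (hypothetical): one rising stock, one flat stock.
dates = pd.date_range("2024-01-01", periods=30, freq="B")
close = pd.DataFrame({"AAA": np.linspace(10, 20, 30), "BBB": 5.0}, index=dates)
data = pd.concat({"Close": close}, axis=1)
data.columns.names = ["Feature", "Symbol"]
index = pd.DataFrame(1, index=dates, columns=["AAA", "BBB"])
index.loc[dates[-1], "BBB"] = 0  # BBB drops out of the universe on the last day

f46 = factor46(data, index)
print(f46.tail(3))
```

The first 23 rows come out empty because the 24-day average needs 24 observations. The rising stock ends below 1, the flat stock sits at 1, and the masked cell is blank.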

As we can see, Factor 46 takes four trailing moving averages of the closing price (3, 6, 12, and 24 days, each window roughly doubling, spanning a few days out to about a trading month), equal-weights them into a single blended reference price, and divides by today’s close. The result is a pure ratio centered near 1:

  • a value > 1 means the current price sits below its own multi-horizon average (the stock has recently sold off relative to its recent history), and

  • a value < 1 means it sits above it (recently run up).

It is, in essence, a smoothed, scale-free “distance from fair value” measure where “fair value” is the stock’s own recent average price rather than any fundamental anchor.

What it captures and why economically. The effect is short-horizon cross-sectional mean reversion / contrarian price correction. The paper grounds this in behavioral universality: overreaction, herding, and liquidity-seeking are cognitive traits common to all market participants, so when traders push a price away from its recent path (chasing news or dumping inventory), the dislocation tends to correct. Crucially, arbitrageurs are slow to close these gaps because of noise-trader risk and limits to arbitrage, which lets the reversal premium persist long enough to be harvested.

Now, let’s see the code that tests this signal. The most important lines are:

What is happening here?

The line signal = -(factor.subtract(factor.mean(axis=1), axis=0)) measures how far each stock sits from the center of its basket. For every day, it takes the cross-sectional mean of the factor across all stocks and subtracts it from each stock’s value. What’s left is each stock’s displacement from the group. The leading minus sign then flips the bet: stocks far below the average get positive weight, and stocks far above it get negative weight. That is the contrarian core of stat arb, betting that extremes revert toward the center.

The line signal.divide(signal.abs().sum(axis=1), axis=0) just normalizes. It divides every weight by the total gross exposure that day, so the book always sums to the same size and the longs roughly fund the shorts. This keeps the portfolio close to dollar-neutral and comparable across time.

The flip is optional. When flip_signal is true, it multiplies the whole book by -1, reversing the direction of every bet. This is useful because we don’t always know in advance which sign of a factor is the profitable one. The same formula can be tested both ways, longs and shorts swapped, without rewriting anything.

The last line runs the backtest: signal.shift(execution_lag).multiply(returns).sum(axis=1). It takes each day’s weights, shifts them forward by execution_lag days (two by default), and multiplies them by that day’s returns. The shift is the important part. It makes sure we only earn returns on positions we could have actually held, trading on a signal computed execution_lag days earlier rather than peeking at information we wouldn’t have had in time. Summing across all stocks then collapses the basket into a single daily P&L for the whole portfolio.
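Put together, the lines quoted above fit into a function along these lines. This is a sketch under assumptions: the function name backtest_signal, its argument names, and the toy inputs are hypothetical, and only the three core expressions are taken from the text.

```python
import pandas as pd


def backtest_signal(factor, returns, flip_signal=False, execution_lag=2):
    """Daily P&L of a demeaned, gross-normalized cross-sectional signal."""
    # Displacement of each stock from the cross-sectional mean, sign flipped.
    signal = -(factor.subtract(factor.mean(axis=1), axis=0))
    # Scale so absolute weights sum to 1 each day (unit gross exposure).
    signal = signal.divide(signal.abs().sum(axis=1), axis=0)
    if flip_signal:
        signal = -signal  # test the opposite direction of the same factor
    # Lag the weights so returns are only earned after the signal was known.
    return signal.shift(execution_lag).multiply(returns).sum(axis=1)


# Toy inputs (hypothetical): three stocks, three days, identical rows.
dates = pd.date_range("2024-01-01", periods=3, freq="B")
factor = pd.DataFrame([[1.0, 2.0, 3.0]] * 3, index=dates, columns=["AAA", "BBB", "CCC"])
returns = pd.DataFrame([[0.02, 0.0, -0.02]] * 3, index=dates, columns=factor.columns)

pnl = backtest_signal(factor, returns)
pnl_flipped = backtest_signal(factor, returns, flip_signal=True)
print(pnl)
```

With these inputs the daily weights are 0.5, 0 and -0.5. The first two days earn nothing because no lagged weights exist yet, and the third day is the first with a non-zero P&L.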

Let’s see the results:

Figure: Cumulative returns of the Factor 46 signal across six equity universes

Factor 46 is encouraging, but on its own it’s not a finished strategy. The same formula behaves very differently depending on where we point it: a 0.53 Sharpe on the S&P 500, a 1.46 on the S&P ASX 300. The S&P 500 version is the clear laggard, with the weakest return and a brutal 55.7% drawdown that no one would want to sit through. And these numbers are gross of costs, so the real picture is worse. That’s the honest takeaway: a single signal, however clever, is rarely tradeable alone. The edge is real but thin, and one factor gives us no diversification when it goes through a bad stretch. The fix is to stop relying on any one signal. Combine many, each with its own small edge, and the weak spots start to cancel out. That is exactly where we go next.

From a single signal to a portfolio of signals

Let’s see what happens when we combine the strongest 17 signals according to the source paper:

Figure: Cumulative returns of the portfolio of signals across six equity universes

The effect of diversification is immediate. The single Factor 46 signal drew down as much as 55.7% on the S&P 500; the combined portfolio’s worst drawdown across all six universes is just 8.2%, and only 4.3% on the Russell 3000. Every Sharpe ratio now sits above 1.0, from 1.15 to 1.76, where the lone signal barely cleared 0.5 on the S&P 500. The annual returns look smaller, but that’s the point: the single signal earned big numbers by running enormous risk, while the portfolio earns steadier returns with a fraction of the pain. That’s diversification doing its job. Seventeen small, imperfect edges combine into something far smoother than any one of them.
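A back-of-the-envelope way to see why this works: if N signals have the same Sharpe ratio S, the same volatility, and an average pairwise correlation ρ, an equal-weight blend has a Sharpe ratio of S·sqrt(N / (1 + (N − 1)ρ)). The sketch below just evaluates that formula. The post does not say how the seventeen signals are weighted or how correlated they are, so the inputs are purely illustrative.

```python
import math


def combined_sharpe(single_sharpe: float, n_signals: int, avg_correlation: float) -> float:
    """Sharpe ratio of an equal-weight blend of equally good, equally volatile signals."""
    return single_sharpe * math.sqrt(n_signals / (1 + (n_signals - 1) * avg_correlation))


# Illustrative only: 17 signals with a 0.5 Sharpe each, at several correlation levels.
for rho in (0.0, 0.1, 0.3, 1.0):
    print(f"rho={rho:.1f}: combined Sharpe = {combined_sharpe(0.5, 17, rho):.2f}")
```

The gain fades quickly as the signals become more correlated, and disappears entirely at a correlation of 1.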

So what are these seventeen signals? Let’s look at them next.

Inside the portfolio

The list below shows the strongest signals, the most pervasive cross-sectional drivers according to the paper:


miohtama
12 days ago
Helsinki, Finland

Why LSTM is overrated for price prediction and when Gradient Boosting beats it


The Hard Truth About Deep Learning in Quant Trading

For years, LSTM neural networks have been marketed as the holy grail of financial prediction.

Search for:

  • AI stock prediction
  • deep learning trading
  • neural networks for finance
  • predictive trading systems

…and you’ll see endless claims that LSTMs can “learn market patterns” and predict future prices better than traditional models.

The narrative sounds compelling:

Markets are time series.
LSTMs are designed for time series.
Therefore, LSTMs should dominate trading.

But in real quantitative trading environments, things are far less glamorous.

Many professional quant researchers eventually discover something surprising:

Simpler gradient boosting models often outperform LSTMs in real-world financial prediction tasks.

Models like:

  • XGBoost
  • LightGBM
  • CatBoost

frequently beat deep learning systems in:

  • robustness
  • interpretability
  • training efficiency
  • out-of-sample stability
  • tabular financial feature prediction

This article breaks down:

  • why LSTMs became popular
  • why they often fail in live trading
  • where gradient boosting excels
  • when deep learning actually makes sense
  • how institutional quants approach the problem

This is not an anti-deep-learning argument.

It’s a realism argument.

Why LSTMs Became Popular in Finance

LSTM (Long Short-Term Memory) networks were designed to solve a specific problem in machine learning:

Sequential dependency modeling.

Unlike traditional neural networks, LSTMs maintain memory over time.

This makes them theoretically ideal for:

  • speech recognition
  • language modeling
  • sequential forecasting
  • time-series prediction

Naturally, traders thought:

“Markets are sequential data too.”

And thus began the explosion of LSTM-based trading research.

The Core Promise of LSTM Trading Models

The appeal was obvious.

LSTMs can theoretically:

  • capture temporal dependencies
  • model nonlinear dynamics
  • remember historical patterns
  • adapt to changing sequences

This sounded perfect for financial markets.

Especially compared to:

  • linear regression
  • moving averages
  • traditional indicators

The Problem: Financial Markets Are Not Normal Time Series

This is where theory collides with reality.

Financial markets differ dramatically from structured sequential domains like language.

Markets are:

  • noisy
  • adversarial
  • regime-dependent
  • non-stationary
  • reflexive
  • heavily stochastic

This changes everything.

Why LSTMs Struggle With Financial Data

1. Financial Signals Are Weak

In natural language:

"The cat sat on the..."

The next word is highly predictable.

In markets:

Price moved up yesterday...

The next move is barely predictable.

Financial signal-to-noise ratio is extremely low.

This is devastating for deep learning models.

2. Markets Constantly Change Regime

LSTMs assume historical relationships remain somewhat meaningful.

But markets shift between:

  • trending periods
  • mean-reverting phases
  • volatility expansions
  • macroeconomic shocks
  • liquidity crises

Patterns decay rapidly.

This reduces the usefulness of long sequential memory.

3. LSTMs Require Huge Amounts of Data

Deep learning thrives on massive datasets.

Examples:

Even decades of market data are relatively small for deep learning standards.

And most financial datasets are highly autocorrelated and noisy.

4. Overfitting Becomes Extremely Dangerous

LSTMs are highly expressive models.

They can memorize historical noise incredibly well.

This creates:

  • beautiful backtests
  • terrible live performance

Many traders mistake memorization for prediction.

The Backtest Illusion

An LSTM can produce:

  • smooth equity curves
  • high historical Sharpe ratios
  • strong in-sample accuracy

while actually learning:

  • noise
  • random structure
  • data artifacts

Instead of genuine market edge.
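The same illusion appears without any neural network at all. The sketch below is a deliberately simplified stand-in for an overfit model: it generates many strategies whose daily returns are pure noise, keeps the one with the best in-sample Sharpe ratio, and then looks at that same strategy on fresh noise. All parameters (200 strategies, 252 days, 1% daily volatility) are arbitrary assumptions.

```python
import random
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over standard deviation of daily returns, scaled to a year."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * periods_per_year ** 0.5


def best_of_random_strategies(n_strategies=200, n_days=252, seed=0):
    """Pick the best in-sample strategy among pure-noise candidates."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        candidate = (annualized_sharpe(in_sample), annualized_sharpe(out_sample))
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


in_sharpe, out_sharpe = best_of_random_strategies()
print(f"best in-sample Sharpe: {in_sharpe:.2f}")
print(f"same strategy out-of-sample: {out_sharpe:.2f}")
```

None of these strategies has any edge by construction, yet the selected one looks excellent in-sample. A flexible model that is tuned until the backtest looks good is doing a version of the same selection.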

5. Financial Features Are Often Tabular, Not Sequential

This is a massive insight many beginners miss.

Most useful trading information comes from:

  • volatility
  • spreads
  • momentum
  • factor exposures
  • volume anomalies
  • macro features
  • options positioning

These are tabular features.

And tabular data is where gradient boosting dominates.

Why Gradient Boosting Often Wins

Now we reach the important part.

Models like XGBoost and LightGBM excel because they align better with the structure of financial data.

What Is Gradient Boosting?

Gradient boosting combines multiple weak decision trees into a strong predictive system.

Instead of learning sequential memory…

It learns:

  • nonlinear interactions
  • feature relationships
  • probabilistic splits

This works remarkably well in financial prediction.
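To make the idea concrete, here is a minimal from-scratch sketch of gradient boosting with squared-error loss, using one-split trees (“stumps”) as the weak learners. It is a teaching toy, not how XGBoost or LightGBM are implemented, and the two features (a volatility level and a momentum score) and the target are invented for illustration.

```python
def fit_stump(rows, residuals):
    """Find the single feature/threshold split that best fits the residuals."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_boosting(rows, targets, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, row)
                       for p, row in zip(predictions, rows)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(model, rows, targets):
    return sum((predict(model, row) - t) ** 2 for row, t in zip(rows, targets)) / len(rows)


# Toy tabular data (hypothetical): [volatility, momentum] -> target.
rows = [[v, m] for v in (0.1, 0.3, 0.6, 0.9) for m in (-1.0, -0.5, 0.5, 1.0)]
targets = [1.0 if (v > 0.5 and m > 0) else 0.0 for v, m in rows]

mse_base = mse(fit_boosting(rows, targets, n_rounds=0), rows, targets)
mse_boosted = mse(fit_boosting(rows, targets, n_rounds=20), rows, targets)
print(f"MSE with mean only: {mse_base:.4f}, after 20 rounds: {mse_boosted:.4f}")
```

Each round corrects part of what the previous rounds got wrong, which is the whole trick. Real libraries add deeper trees, regularization, and much faster split finding on top of this loop.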

Why XGBoost Became a Quant Favorite

Because it handles financial data characteristics extremely well.

1. Excellent With Tabular Data

Financial datasets are usually structured like:

Gradient boosting thrives in this environment.

LSTMs do not naturally excel here.

2. Better Performance With Smaller Datasets

XGBoost can produce strong results with relatively limited data.

This is critical in finance where:

  • data is expensive
  • clean labels are limited
  • signal is weak

3. Less Overfitting Risk

Compared to deep neural networks:

  • tree ensembles generalize better
  • regularization is stronger
  • training stability is higher

This improves out-of-sample robustness.

4. Faster Training and Iteration

LSTMs require:

  • GPU acceleration
  • hyperparameter tuning
  • sequence preparation
  • long training cycles

XGBoost trains dramatically faster.

This matters enormously in quant research.

5. Better Interpretability

Institutional traders need to understand:

Why is the model making decisions?

Gradient boosting allows:

  • feature importance analysis
  • SHAP values
  • split interpretation

LSTMs are often black boxes.

Real-World Quant Workflow

Many professional firms use pipelines like:

Market Data → Feature Engineering → XGBoost / LightGBM → Probability Forecast → Execution Engine

Not:

Raw Prices → Massive LSTM → Magic Predictions

That distinction matters.

When Gradient Boosting Dominates LSTM

Gradient boosting usually performs better when:

1. Features Matter More Than Sequences

If your edge comes from:

  • volatility structure
  • order flow imbalance
  • factor combinations
  • sentiment signals

boosting models are often superior.

2. Data Size Is Limited

Smaller datasets strongly favor boosting.

3. You Need Fast Research Cycles

Quant firms test thousands of hypotheses.

Training speed matters enormously.

4. Explainability Matters

Especially in institutional environments.

5. Prediction Horizon Is Short

For many short-term signals:

  • recent engineered features matter more than long memory

When LSTM Actually Makes Sense

Now the important nuance:

LSTMs are not useless.

They simply get overused.

LSTMs Work Better When:

1. Sequential Structure Truly Matters

Examples:

  • order book dynamics
  • tick-level flow
  • high-frequency sequences

2. Massive Data Exists

Examples:

  • alternative datasets
  • market microstructure data
  • cross-asset sequences

3. You Model Complex Temporal Relationships

Examples:

  • volatility forecasting
  • intraday regime transitions

4. Combined With Other Architectures

Modern quant systems increasingly use:

  • CNN + LSTM hybrids
  • transformers
  • attention models
  • ensemble systems

Rarely standalone LSTMs.

Why Many “AI Trading Gurus” Mislead Beginners

Because deep learning sounds sophisticated.

Saying:

“I built an LSTM stock predictor”

sounds more impressive than:

“I used XGBoost on engineered volatility features”

But sophistication does not equal predictive power.

The Hidden Secret of Quant Trading

The real edge usually comes from:

  • better features
  • cleaner data
  • superior execution
  • regime awareness
  • risk management

Not from using the most complicated model.

Feature Engineering Beats Model Complexity

This is one of the biggest lessons in financial machine learning.

A strong feature set with XGBoost often outperforms:

  • poorly engineered deep learning systems
  • raw-price LSTMs
  • overfit neural networks

Because financial prediction is primarily a:

Feature engineering problem.

Not an architecture problem.

The Institutional Perspective

Professional quant firms rarely obsess over one model.

Instead they focus on:

  • data infrastructure
  • signal stability
  • execution quality
  • robustness testing
  • probabilistic forecasting

Models are just one component.

The Rise of Hybrid Quant Systems

Modern trading systems increasingly combine:

  • gradient boosting
  • deep learning
  • regime detection
  • ensemble forecasting

This hybrid approach is far more realistic.

Example Hybrid Architecture

Technical Features → Volatility Features → Sentiment Features → XGBoost Layer → Meta-Model → Execution System

This often performs better than pure LSTM systems.

The Biggest Misconception About AI Trading

Many beginners assume:

More complex AI = better trading performance

In reality:

More complexity often increases fragility

Especially in noisy financial environments.

Final Thoughts

LSTMs became popular in trading because they seemed perfectly aligned with financial time series.

But real markets are not clean sequential systems.

They are noisy, adaptive, adversarial environments with weak predictive structure.

And in those environments:

  • simpler models
  • stronger features
  • better validation methods

often outperform deep neural architectures.

Gradient boosting models like XGBoost succeed because they match the true structure of most financial datasets:

  • tabular
  • sparse
  • nonlinear
  • noisy

The lesson is not:

“Deep learning is bad.”

The lesson is:

The best model is the one that matches the actual nature of the data.

And in quantitative trading, that distinction is everything.



Why LSTM is overrated for price prediction and when Gradient Boosting beats it was originally published in InsiderFinance Wire on Medium, where people are continuing the conversation by highlighting and responding to this story.

miohtama
27 days ago
Helsinki, Finland

@adlrocha - Google's ZKP-hidden quantum attack


This week started with a bang. Anthropic accidentally leaked the source code for Claude Code, and within hours someone had kicked off a clean-room rewrite in Python. The internet, understandably, caught fire, and it seemed like the perfect topic to write about this week. As there were still lots of threads open, and people trying to make sense of the code base, I decided to leave it for when the dust settles (that way I could read the code base myself to draw my own conclusions before rushing into writing anything).

Fortunately, amidst the noise of Claude Code’s leak, Google Quantum AI made a release (Google featuring this newsletter again) that didn’t get the attention that I think it deserved. It was the perfect excuse to write again in this newsletter about quantum computing.

I’ve been fascinated by quantum computing since I was first introduced to it (at the time, I even wrote a patent that leveraged quantum information to reach consensus in distributed networks, but I’ll spare you the details for now). Of all the fancy new technologies coming up these days, quantum computing has, to me, one of the hardest timelines to read. Since I started following and studying it closely there’s been an enormous amount of hype, a few winters, a lot of exciting progress, and no immediate use case to show off yet.

I’ve been studying the technology on the side for years, but never worked on it professionally. My only hands-on experience with the technology has been through a few Qiskit hackathons many years ago (I guess the barriers were high). I’ve been meaning to go back and get hands-on time with something like IBM’s publicly available quantum systems just to recalibrate my intuition, but I never find the time or motivation. This paper made me feel more acutely the urgency of recovering this rusty skill.

The TL;DR of what Google dropped this week is a whitepaper claiming to reduce the quantum resources needed to break Bitcoin’s cryptography by roughly 20-fold. Cryptocurrencies and quantum computing… you can imagine how this topic took precedence over Claude Code’s leak.


Shor’s algorithm and the hard problem underneath ECDSA

Before we get to the papers, let’s set the stage so everyone (regardless of your knowledge of the space) is on the same page. This means taking a quick trip into the cryptographic primitives that currently protect every Bitcoin and Ethereum transaction.

When you sign a transaction on Bitcoin or Ethereum, you’re using a cryptographic primitive called ECDSA: the Elliptic Curve Digital Signature Algorithm. The security of ECDSA rests entirely on one hard problem: the Elliptic Curve Discrete Logarithm Problem (ECDLP). Here’s a high-level intuition of what this problem is all about.

An elliptic curve over a finite field forms a specific algebraic structure: a prime-order cyclic group. You’ll see that this really matters when we discuss how it can be attacked by quantum computers. The group is generated by a single distinguished point G (the generator), and every element of the group can be written as k·G for some integer k. Your private key is that integer k. Your public key is Q = k·G, the generator point “multiplied” by your private key, where multiplication means repeatedly applying a specific point-addition rule defined by the curve’s geometry.
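For readers who prefer code to algebra, here is a minimal sketch of that point-addition rule and the resulting k·G multiplication on secp256k1. The constants are the curve’s published domain parameters; the function names are my own, and the code is written for clarity only (it is not constant-time and should not be used to handle real keys).

```python
# secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field F_P.
P = 2**256 - 2**32 - 977
A, B = 0, 7
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order


def is_on_curve(point):
    if point is None:  # None stands for the point at infinity (group identity)
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1, p2):
    """Add two curve points using the chord-and-tangent rule."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its negation is the identity
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow(2 * y1, P - 2, P)  # tangent line
    else:
        slope = (y2 - y1) * pow(x2 - x1, P - 2, P)  # chord through both points
    slope %= P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with double-and-add, one bit of k at a time."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


private_key = 0x1234567890ABCDEF  # toy value, far too small for real use
public_key = scalar_mult(private_key, G)
print(hex(public_key[0]))
```

Going from private_key to public_key takes a few hundred point additions. Going back from public_key to private_key is the ECDLP.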

Given Q and G, recovering k classically (meaning with our current computing systems) requires roughly 2^128 operations on Bitcoin’s curve (secp256k1), even with the best known generic attacks such as Pollard’s rho. That’s a few hundred undecillion operations; at a billion operations per second it would take on the order of 10^22 years, hundreds of billions of times the age of the universe. The problem is hard in one direction only. Computing Q from k is instant. The reverse is infeasible. This asymmetry is what cryptographers call a hard problem, and it is why such problems are so appealing as a foundation for cryptographic primitives.
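The scale is easy to check with a few lines of arithmetic. The only assumption beyond the numbers in the text is an approximate age of the universe of 13.8 billion years.

```python
OPERATIONS = 2**128                      # classical work factor for secp256k1
OPS_PER_SECOND = 10**9                   # a billion operations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9           # approximate, assumed for comparison

years_needed = OPERATIONS / OPS_PER_SECOND / SECONDS_PER_YEAR
universe_ages = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"operations:    {OPERATIONS:.3e}")
print(f"years needed:  {years_needed:.3e}")
print(f"universe ages: {universe_ages:.3e}")
```

Even a machine a million times faster would still need far longer than the universe has existed.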

Remember my post a few months ago about complexity theory and P=NP? This has a lot to do with that. Cryptographic primitives are built on the assumed hardness of certain problems. Technically, ECDLP sits in NP∩co-NP. It’s not known to be NP-hard in the strict complexity-theoretic sense, and most cryptographers believe it isn’t. It isn’t known to be in P either. Another hard problem commonly used for cryptographic primitives is integer factorisation, which underlies RSA, for instance, and sits in exactly the same class: NP∩co-NP, not known to be NP-complete, not known to be efficiently solvable. Both problems are “believed hard” without being provably hard in the complexity-theoretic sense.

Both problems resist classical attacks for the same reason: no efficient algorithm has been found after decades. And here is where Shor’s famous algorithm enters the scene.

Shor’s algorithm, published in 1994, exploits the cyclic structure of the group. Rather than brute-forcing the keyspace, it uses quantum Fourier transforms and period-finding on the group structure to extract k from Q in polynomial time. The precise gate complexity is approximately O(n² log n log log n) in the bit-length n of the key (often cited as O(n²) for shorthand), though the full form matters when you’re counting Toffoli gates against a hardware budget. A Toffoli gate is a controlled-controlled-NOT, used to implement AND operations reversibly. Think of it as the universal gate for reversible classical logic inside a quantum circuit; it will be important when we discuss the contributions of the papers. For a 256-bit key, that’s tractable, if you have a sufficiently large quantum computer.
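Since Toffoli gates come up repeatedly below, here is a tiny sketch of what one does when its inputs are ordinary bits. It only models the classical truth table, not superposition, and the function names are my own.

```python
from itertools import product


def toffoli(control_a: int, control_b: int, target: int):
    """Controlled-controlled-NOT: flip target only if both controls are 1."""
    return control_a, control_b, target ^ (control_a & control_b)


def reversible_and(a: int, b: int) -> int:
    """AND computed reversibly: start the target at 0 and read it afterwards."""
    return toffoli(a, b, 0)[2]


for a, b, t in product((0, 1), repeat=3):
    out = toffoli(a, b, t)
    # Applying the gate twice restores the input, so no information is lost.
    assert toffoli(*out) == (a, b, t)
    print((a, b, t), "->", out)
```

An ordinary AND gate throws away its inputs, which a quantum circuit cannot do. The Toffoli gate keeps both controls and writes the result into a third bit, so the operation can always be undone.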

The question has always been: how large is “sufficiently large”? I think you see where I am going with this. The papers released this week seem to have changed our existing intuitions about how many qubits are needed for Shor’s algorithm to break our existing cryptography.


The two papers released

The two papers that dropped this week have made some experts reevaluate their timelines for the security of blockchain systems that haven’t adopted post-quantum cryptography:

The Google Quantum AI whitepaper, “Securing Elliptic Curve Cryptocurrencies against Quantum Vulnerabilities: Resource Estimates and Mitigations”. Authored by Ryan Babbush and Craig Gidney at Google Quantum AI, alongside Thiago Bergamaschi (UC Berkeley), Justin Drake from the Ethereum Foundation, and Dan Boneh from Stanford. Google also published a blog post on the responsible disclosure methodology.

Let me give you some background about some of the authors so you can frame this contribution in the state of the art. Justin Drake is one of the primary researchers at the Ethereum Foundation responsible for Ethereum’s data-availability roadmap; he was a key architect behind EIP-4844 and the KZG trusted setup ceremony. Dan Boneh is a professor of computer science at Stanford, co-director of the Stanford Security Lab, and co-author of the most widely used applied cryptography textbook in the field. His free online cryptography course has been taken by over half a million people, and some of his papers were key for the development of Filecoin (another one that hits home). Finally, Craig Gidney has been responsible for a lot of the recent progress in the intersection of quantum and AI. He published a paper in May 2025 showing RSA-2048 breakable with under 1 million physical qubits, down from 20 million in his own 2019 estimate. You can imagine the weight that claims from these people can have in their respective fields.

On the other hand, the Oratomic paper, “Shor’s algorithm is possible with as few as 10,000 reconfigurable atomic qubits”, comes from Oratomic, a neutral-atom quantum computing company out of Pasadena, with John Preskill (Caltech) and Dolev Bluvstein as co-authors. Crucially, the Google whitepaper cites the Oratomic circuits as its own input: the two papers are cross-linked and share the same circuit design.

The papers present two circuit variants for attacking secp256k1:

  • Circuit 1: ≤1,200 logical qubits, ≤90 million Toffoli gates

  • Circuit 2: ≤1,450 logical qubits, ≤70 million Toffoli gates

Translated to physical hardware using surface codes on a superconducting architecture (planar degree-4 connectivity, consistent with Google’s Willow-class chips): fewer than 500,000 physical qubits. The previous best estimate, Litinski (2023), put this at roughly 9 million physical qubits. Google just moved that needle by nearly 20-fold.

That reduction didn’t come from a hardware breakthrough; it came from a better circuit. Running Shor’s on ECDLP isn’t just “run the algorithm” (this is something I learnt the hard way the first time I was tinkering with Qiskit and IBM’s quantum computers). The core computation is elliptic curve point multiplication, computing k·G with arithmetic on secp256k1, which Shor’s algorithm needs to evaluate in quantum superposition as part of its period-finding routine. That means implementing modular arithmetic (specifically Montgomery multiplication, a standard technique for efficient modular operations) entirely in reversible quantum gates.
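If you have never seen Montgomery multiplication, the classical version fits in a few lines. The sketch below shows the ordinary, irreversible algorithm on Python integers; it says nothing about how the papers lay it out in quantum gates. Function names are my own, and pow with a negative exponent needs Python 3.8 or later.

```python
def montgomery_setup(modulus: int, bits: int):
    """Return R = 2^bits and n_prime with modulus * n_prime ≡ -1 (mod R)."""
    if modulus % 2 == 0 or modulus >= (1 << bits):
        raise ValueError("modulus must be odd and smaller than 2**bits")
    r = 1 << bits
    n_prime = (-pow(modulus, -1, r)) % r
    return r, n_prime


def redc(t: int, modulus: int, bits: int, n_prime: int) -> int:
    """Montgomery reduction: t * R^-1 mod modulus, using only shifts and masks."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * modulus) >> bits  # exact division by R
    return u - modulus if u >= modulus else u


def montgomery_multiply(a: int, b: int, modulus: int, bits: int = 256) -> int:
    """Multiply a and b modulo modulus via Montgomery form."""
    r, n_prime = montgomery_setup(modulus, bits)
    a_mont = (a * r) % modulus  # convert into Montgomery form
    b_mont = (b * r) % modulus
    product_mont = redc(a_mont * b_mont, modulus, bits, n_prime)
    return redc(product_mont, modulus, bits, n_prime)  # convert back out


FIELD_PRIME = 2**256 - 2**32 - 977  # secp256k1 field prime
a = 0x1234567890ABCDEF << 100
b = FIELD_PRIME - 5
print(montgomery_multiply(a, b, FIELD_PRIME) == (a * b) % FIELD_PRIME)
```

The point of the technique is that the reduction step replaces a division by the modulus with shifts and masks, which are much cheaper to build out of gates. In a long chain of multiplications, values stay in Montgomery form, so the conversions at either end are paid only once.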

Every classical arithmetic operation has to be “uncomputed” after use to avoid accumulating garbage qubits that would corrupt the superposition. The dominant cost is Toffoli gates, and there are hundreds of millions of them in a naively constructed circuit.

Prior work optimised either qubit count or gate count, but not both simultaneously. The relevant figure of merit for real hardware is spacetime volume, i.e. the product of qubits × gates × cycle time, because that’s what determines wall-clock runtime on an actual machine.

Google’s contribution is a circuit that achieves the best spacetime volume ever published for ECDLP-256, through two main improvements. First, they applied improved windowing to Montgomery multiplication: rather than processing one bit of the scalar at a time, they process wider windows, amortising the Toffoli cost across more bits per round, reducing the total gate count substantially.
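Windowing has a well-known classical counterpart, which is the easiest way to build intuition for it. The sketch below is that classical analogue applied to modular exponentiation, not Google’s quantum circuit: it precomputes a small table and consumes several exponent bits per step, and it counts the table multiplications so the saving is visible. The function name and the test values are my own.

```python
def windowed_pow(base: int, exponent: int, modulus: int, window_bits: int = 4):
    """Fixed-window exponentiation; returns (result, table multiplications used)."""
    # Precompute base**0 .. base**(2**window_bits - 1) once.
    table = [1 % modulus]
    for _ in range((1 << window_bits) - 1):
        table.append(table[-1] * base % modulus)

    # Split the exponent into window_bits-wide digits, least significant first.
    digits = []
    e = exponent
    while e:
        digits.append(e & ((1 << window_bits) - 1))
        e >>= window_bits

    result = 1 % modulus
    multiplications = 0
    for digit in reversed(digits):
        for _ in range(window_bits):
            result = result * result % modulus  # shift the accumulated exponent
        if digit:
            result = result * table[digit] % modulus  # one lookup per window
            multiplications += 1
    return result, multiplications


modulus = 2**256 - 2**32 - 977
exponent = 2**255 + 12345678901234567890  # an arbitrary 256-bit exponent
bit_result, bit_mults = windowed_pow(3, exponent, modulus, window_bits=1)
win_result, win_mults = windowed_pow(3, exponent, modulus, window_bits=4)
print(bit_mults, win_mults)
```

With a window of one bit this is plain square-and-multiply, with one multiplication per set bit. With a window of four bits there is at most one multiplication per four bits of the exponent, at the price of a 16-entry table.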

Second, they revised the T-state factory overhead: magic state distillation (the process for producing the high-fidelity ancilla states that Toffoli gates consume) is the dominant physical qubit cost in any surface-code implementation, and prior estimates were conservative. More careful accounting of distillation factory layout and scheduling cut the physical qubit estimate significantly. The combination brought the spacetime volume down far enough to cut the physical qubit requirement nearly 20-fold relative to Litinski 2023, and Litinski 2023 had already improved substantially on everything before it.

But before going any further, I think it is worth stressing the distinction between logical and physical qubits and why it matters. Theoretical qubits are what algorithms assume: perfect, noiseless two-state quantum systems. Logical qubits are error-corrected abstractions built from many physical qubits using a quantum error-correcting code, typically a surface code. (I have to admit that, loving information theory as I do, error-corrected qubits are a field that fascinates me. I actually leveraged some of these error-correction algorithms for my patent.)

Physical qubits are the actual noisy hardware. Today’s devices operate at error rates around 10^-3 per gate, which means you need roughly 1,000 physical qubits to sustain one reliable logical qubit. The overhead varies by architecture and target error rate, but it’s the dominant cost in any near-term hardware plan.

To put the current state in perspective: Google’s Willow chip has 105 physical qubits. IBM’s Condor processor reached 1,121 qubits in late 2023, the largest superconducting qubit count to date, though not all at useful error rates. The gap between today and 500,000 error-corrected qubits is still enormous. But the conceptual threshold has moved, and it’s moved faster than almost anyone expected.
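To keep the numbers straight, here is the arithmetic behind the figures quoted so far. Every input comes from the text above; the only thing added is the division. Note that the 500,000 figure is an upper bound that also covers magic state factories, so the per-logical-qubit ratios are rough ceilings rather than exact code sizes.

```python
PHYSICAL_QUBITS_NEW = 500_000        # "fewer than 500,000 physical qubits"
PHYSICAL_QUBITS_PRIOR = 9_000_000    # Litinski (2023), as quoted above
LOGICAL_QUBITS = {"Circuit 1": 1_200, "Circuit 2": 1_450}
WILLOW_PHYSICAL_QUBITS = 105

reduction = PHYSICAL_QUBITS_PRIOR / PHYSICAL_QUBITS_NEW
per_logical = {name: PHYSICAL_QUBITS_NEW / n for name, n in LOGICAL_QUBITS.items()}
gap_to_willow = PHYSICAL_QUBITS_NEW / WILLOW_PHYSICAL_QUBITS

print(f"reduction vs prior estimate: {reduction:.0f}x")
for name, ratio in per_logical.items():
    print(f"{name}: at most ~{ratio:.0f} physical qubits per logical qubit")
print(f"gap to Willow: ~{gap_to_willow:,.0f}x more physical qubits")
```

The implied overhead comes out at a few hundred physical qubits per logical qubit, below the rough 1,000-to-1 rule of thumb mentioned above, and the target is still thousands of times larger than today’s chips.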

The two papers cover different hardware architectures, and the distinction matters. Superconducting qubits, the technology behind Google Willow and IBM’s quantum systems, encode quantum information in tiny circuits cooled near absolute zero (i.e. close to 0 kelvin), where electrical resistance vanishes and quantum effects dominate. Gate operations run in nanoseconds to microseconds. Neutral-atom architectures, like those used by Oratomic, trap individual atoms using focused laser beams and manipulate their quantum states optically. They achieve extremely long coherence times and flexible qubit connectivity, but gate operations are around 1,000x slower. Ion trap systems (IonQ, Quantinuum) work on similar principles: individual ions levitated in electromagnetic fields and controlled with lasers. IonQ’s Forte system currently achieves around 29 “algorithmic qubits”, roughly the effective logical qubit count after accounting for noise. The Oratomic team reported 6,100 coherent atomic qubits trapped, with fault-tolerant operations demonstrated below the error threshold on around 500 qubits.

The Oratomic result is the more striking one in raw qubit count: the same computation runs with as few as 10,000–26,000 qubits on neutral-atom hardware. The catch: at current clock speeds (around 1ms/cycle), runtime is close to 10 days, not minutes. That limits the attack to at-rest targets, long-dormant wallets that have been sitting on-chain for years, not live transaction interception.

That clock speed difference is one of the genuinely novel framings in these papers. Superconducting hardware runs gate cycles in microseconds; neutral atoms and ion traps are 100–1,000x slower. This determines which kind of attack is feasible. The papers define three categories: on-spend (race Bitcoin’s block clock before the transaction confirms), at-rest (target publicly exposed keys on dormant wallets), and on-setup (recover secrets from one-time cryptographic ceremonies like KZG). Fast-clock architectures enable on-spend. Slow-clock ones are limited to the other two.
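A quick piece of arithmetic makes the clock-speed point concrete. This is only a sketch: it takes the roughly 10-day, 1 ms/cycle neutral-atom figure from above and asks how much faster the same run would have to be to fit inside one Bitcoin block interval. The 10-minute block target is my assumption for the on-spend window, and the real papers use different circuits per architecture, so the cycle count would not carry over one-to-one.

```python
SECONDS_PER_DAY = 86_400
MICROSECONDS_PER_SECOND = 1_000_000


def cycles_in_run(runtime_s, cycle_time_us):
    """Number of clock cycles that fit in a run of the given length."""
    return runtime_s * MICROSECONDS_PER_SECOND // cycle_time_us


def required_speedup(runtime_s, window_s):
    """How many times faster the run must be to finish inside the window."""
    return runtime_s / window_s


neutral_atom_runtime_s = 10 * SECONDS_PER_DAY   # "close to 10 days" from the text
block_window_s = 10 * 60                        # ASSUMPTION: ~10-minute block target

print(cycles_in_run(neutral_atom_runtime_s, cycle_time_us=1_000))   # 1 ms per cycle
print(required_speedup(neutral_atom_runtime_s, block_window_s))
```

Under these assumptions the run would need to be about 1,440x faster to fit in a single block interval, which is in the same ballpark as the clock gap between slow and fast architectures. That is the intuition for why slow-clock hardware is limited to targets with no deadline.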


The ZKP disclosure 😱

Here’s the part that really blew my mind about Google’s whitepaper (and that I think justifies even more having Justin Drake and Dan Boneh around for the paper). Google did not publish the attack circuits. Instead, they published a zero-knowledge proof that the circuits work.

The attack circuit, a sequence of quantum gate operations implementing Shor’s algorithm for secp256k1, was written as an ordinary Rust program using a quantum circuit library that models qubits, gates (Hadamard, CNOT, Toffoli, phase rotation), and multi-qubit arithmetic operations. The program encodes the Montgomery modular multiplication routine at the core of the elliptic curve group arithmetic, the quantum Fourier transform used for period extraction, and the bookkeeping that wires those components into a complete Shor’s instance for ECDLP-256. The circuit itself is a classical description of a quantum computation, a directed graph of gate operations to be executed on hardware. It’s the blueprint, not the machine. (Side note: the circuit in the image is the classic implementation of Shor’s algorithm, for those of you who have never seen one.)
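To make "a classical description of a quantum computation" tangible, here is a toy sketch in Python (not the Rust library the whitepaper used, whose API I don't know). A circuit is just an ordered list of gates, and the arithmetic part built from X, CNOT and Toffoli gates can be checked on ordinary bits, because those gates map basis states to basis states. The one-bit full adder and the gate names are my own illustrative choices; Hadamard and phase gates cannot be simulated this way.

```python
from itertools import product

# A circuit is plain data: an ordered list of (gate name, qubit indices).
# Qubits: 0 = a, 1 = b, 2 = carry-in, 3 = carry-out (starts at 0).
FULL_ADDER = [
    ("TOFFOLI", (0, 1, 3)),  # carry-out ^= a AND b
    ("CNOT", (0, 1)),        # b ^= a, so qubit 1 now holds a XOR b
    ("TOFFOLI", (1, 2, 3)),  # carry-out ^= (a XOR b) AND carry-in
    ("CNOT", (1, 2)),        # carry-in ^= (a XOR b), so qubit 2 holds the sum bit
]


def run_on_basis_state(circuit, bits):
    """Apply X/CNOT/Toffoli gates to a classical bit list (a computational basis state)."""
    state = list(bits)
    for name, qubits in circuit:
        if name == "X":
            (target,) = qubits
            state[target] ^= 1
        elif name == "CNOT":
            control, target = qubits
            state[target] ^= state[control]
        elif name == "TOFFOLI":
            c1, c2, target = qubits
            state[target] ^= state[c1] & state[c2]
        else:
            raise ValueError(f"unsupported gate: {name}")
    return state


# Exhaustively check the adder: sum bit on qubit 2, carry bit on qubit 3.
for a, b, cin in product((0, 1), repeat=3):
    out = run_on_basis_state(FULL_ADDER, [a, b, cin, 0])
    assert out[2] + 2 * out[3] == a + b + cin
```

Scaling this idea up from one bit to 256-bit modular arithmetic is what makes testing a circuit on sampled inputs possible without any quantum hardware.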

That Rust program was then fed into SP1, a zero-knowledge virtual machine built by Succinct Labs which targets the RISC-V architecture. For those unfamiliar with ZK-VMs, SP1 compiles Rust to RISC-V bytecode (using the standard RISC-V target), and then generates a cryptographic proof, specifically a STARK-based proof, that a given RISC-V program was executed correctly on specific inputs and produced a specific output. You get a proof of correct execution without anyone needing to see the program or the inputs.

In this case: Google ran the circuit program against 9,000 randomly sampled secp256k1 input points, verified that the circuit correctly performs the elliptic curve operations it claims to, and had SP1 generate a proof of that execution. The SHA-256 hash of the circuit was committed publicly so anyone can verify they’re talking about the same circuit. The SP1 proof attests: “this hash corresponds to a program that, when run on these inputs, produces these outputs consistently with a correct Shor’s implementation for ECDLP-256.”
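The hash commitment is the simplest piece of this pipeline to illustrate. The sketch below is hypothetical in every detail except the use of SHA-256: the JSON serialisation, the function names and the tiny gate list are mine, and the real artefact commits to the actual circuit, not to anything like this.

```python
import hashlib
import json


def commit(circuit):
    """SHA-256 digest of a canonical JSON serialisation of a gate list."""
    encoded = json.dumps(circuit, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def matches(circuit, published_digest):
    """True if this circuit is the one the published digest commits to."""
    return commit(circuit) == published_digest


# Hypothetical stand-in for the withheld circuit.
secret_circuit = [["TOFFOLI", [0, 1, 3]], ["CNOT", [0, 1]]]
published = commit(secret_circuit)   # only this digest is made public

tampered = [["TOFFOLI", [0, 1, 3]], ["CNOT", [1, 0]]]
print(matches(secret_circuit, published), matches(tampered, published))
```

The digest pins down which circuit the proof is about without revealing it. If the circuit is ever disclosed, anyone can recompute the hash and confirm it is the same one.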

The inner SP1 proof is a STARK. STARKs have no trusted setup, but they’re large, hundreds of kilobytes to megabytes. So SP1 wraps the STARK in an outer Groth16 SNARK. Groth16 takes the STARK proof as a statement to be proved and generates a compact proof of it: roughly 200 bytes, regardless of the complexity of the original computation. The final artefact, code and proof, sits on Zenodo. Anyone can download it and verify Groth16’s 200-byte proof in milliseconds, without ever seeing the attack circuit.

What this means practically: the existence and correctness of the attack is publicly verifiable. The attack tool itself is not.

This is a genuinely new move in responsible disclosure. The standard practice for software vulnerabilities is to notify the vendor, wait a window, then publish. But there’s no vendor to notify here, no patch to deploy in 90 days. So Google found a different answer: prove the result is real, withhold the exploit.

Here’s where it gets funny, or uncomfortable, depending on your perspective. Groth16 is itself an elliptic curve construction. It operates over BN254, a pairing-friendly curve distinct from secp256k1, but it is still fundamentally an elliptic curve scheme. The pairings that make Groth16 work rely on the same class of hard problems, discrete logarithms on elliptic curves, that Shor’s algorithm can break. So Google used a cryptographic primitive that is also eventually threatened by sufficiently powerful quantum computers to prove the existence of the circuit that threatens elliptic curve cryptography. If CRQCs (Cryptographically Relevant Quantum Computers, the term the whitepaper uses for machines capable of running these attacks) ever arrive at scale, Groth16 and the broader ZKP ecosystem go down with the rest.

I don’t know if that’s elegant or just funny. Probably both.

But what is even crazier to me is that this could eventually become the standard model for future research and proprietary algorithms, where companies and researchers can show that “their algorithms do what they claim to be doing” without leaking anything about the underlying implementation. That’s enough for a post of its own. I’ve been saying it for a while: ZKP primitives can have immediate use outside of blockchain networks and web3.


Post-quantum cryptography: what exists, what migration looks like

To understand why certain cryptographic schemes survive a quantum computer and others don’t, we need to understand why Shor’s algorithm works in the first place.

Shor’s algorithm is a period-finding machine. It exploits the fact that ECDLP and integer factorisation both reduce to finding the period of a function defined over a cyclic algebraic group. Quantum Fourier transforms make period-finding tractable on cyclic structures, and that’s the attack. The quantum speedup isn’t general; it’s specific to problems with this periodic structure. If you pick a hard problem that doesn’t have it, Shor’s doesn’t help.
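The period-finding reduction is easiest to see in the factoring case with a toy number. In this sketch the period is found by brute force, which is exactly the step a quantum computer would replace with the quantum Fourier transform. Everything else is classical. The ECDLP version has the same flavour but uses a two-variable periodic function, which I am not showing here.

```python
from math import gcd


def find_order(a, n):
    """Smallest r > 0 with a**r % n == 1, by brute force (the step Shor's speeds up)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_order(a, n):
    """Turn the period of a**x mod n into a factorisation of n, when possible."""
    r = find_order(a, n)
    if r % 2 == 1:
        return None              # odd period: pick another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None              # trivial square root: pick another a
    return tuple(sorted((gcd(half - 1, n), gcd(half + 1, n))))


print(find_order(7, 15), factor_from_order(7, 15))
```

For n = 15 and a = 7 the period is 4, and the two gcds recover the factors 3 and 5. The brute-force loop takes time exponential in the bit length of n, which is the only reason this is not a practical attack on a classical machine.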

That’s exactly what post-quantum cryptography does.

Lattice problems, specifically the Shortest Vector Problem (SVP) and its structured variant, Module Learning With Errors (MLWE), ask you to find the shortest non-zero vector in a high-dimensional lattice, or to distinguish a structured equation system from a random one. Neither problem has a cyclic group structure Shor’s can exploit. The best known quantum algorithm for SVP offers only a polynomial speedup over classical approaches, not the exponential gap that Shor’s gives against ECDLP.

SVP is NP-hard in the worst case, and lattice cryptography has an elegant property: worst-case hardness reduces to average-case hardness, which makes the security proofs unusually strong. The specific structured variants used in practice (MLWE, MSIS) sit slightly off the worst-case problem, so ongoing cryptanalysis remains active, but no quantum attack comes close to breaking them.
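For intuition on what a Learning With Errors instance looks like, here is a toy Regev-style one-bit encryption. It is plain LWE, not the module variant used by ML-KEM, with parameters far too small to be secure, and it uses Python's non-cryptographic `random` module for reproducibility. The modulus 3329 is the one ML-KEM uses; the dimensions and error distribution are my assumptions.

```python
import random

Q, N, M = 3329, 16, 32   # modulus, secret length, number of samples (toy sizes)


def keygen(rng):
    """Public key (A, b = A*s + e mod Q) and secret s; e is small noise."""
    s = [rng.randrange(Q) for _ in range(N)]
    A = [[rng.randrange(Q) for _ in range(N)] for _ in range(M)]
    e = [rng.choice((-1, 0, 1)) for _ in range(M)]
    b = [(sum(a * x for a, x in zip(row, s)) + err) % Q for row, err in zip(A, e)]
    return (A, b), s


def encrypt(public_key, bit, rng):
    """Sum a random subset of samples and hide the bit in the high half of Z_Q."""
    A, b = public_key
    subset = [i for i in range(M) if rng.random() < 0.5]
    u = [sum(A[i][j] for i in subset) % Q for j in range(N)]
    v = (sum(b[i] for i in subset) + bit * (Q // 2)) % Q
    return u, v


def decrypt(secret, ciphertext):
    """Remove <u, s>; what remains is small noise plus bit * Q/2."""
    u, v = ciphertext
    d = (v - sum(x * y for x, y in zip(u, secret))) % Q
    return 1 if Q // 4 < d < 3 * Q // 4 else 0
```

Without the noise vector `e`, recovering `s` from `(A, b)` is just Gaussian elimination. The small errors are what turn it into a lattice problem, and since the accumulated noise here is at most 32 while the decision margin is about Q/4, decryption always succeeds in this toy.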

Hash-based problems rest on collision resistance alone. There is no algebraic structure, no group, no lattice. If SHA-256 or SHAKE-256 resist collision attacks, and there’s no known quantum or classical attack that breaks them, the scheme is secure. Grover’s algorithm gives a quadratic speedup for unstructured search, which halves the effective security level (256-bit security becomes 128-bit), but doubling the output size restores it. That’s a parameter choice, not a structural break.
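To show how a signature can be built from nothing but a hash function, here is a minimal Lamport one-time signature. To be clear, this is not SLH-DSA: it is the much older construction that hash-based schemes descend from, each key pair may sign only a single message, and the function names are mine.

```python
import hashlib
import secrets


def H(data):
    return hashlib.sha256(data).digest()


def keygen():
    """256 pairs of random secrets; the public key is their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(zero), H(one)) for zero, one in sk]
    return sk, pk


def message_bits(message):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, message):
    """Reveal one secret per digest bit. Never reuse sk for a second message."""
    return [sk[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(pk, message, signature):
    bits = message_bits(message)
    return len(signature) == 256 and all(
        H(part) == pk[i][bit] for i, (part, bit) in enumerate(zip(signature, bits))
    )
```

Forging a signature means finding a preimage for a hash in the public key, so there is no algebraic structure for Shor's to exploit. The price is visible even here: a signature is 256 × 32 bytes, i.e. 8 KB.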

Code-based problems, specifically the Syndrome Decoding Problem (SDP), ask you to find the nearest codeword in a random linear error-correcting code given a corrupted version. Berlekamp, McEliece and van Tilborg showed in 1978 that SDP is NP-complete in the worst case. No quantum speedup beyond polynomial is known. The cost has historically been large key sizes (around 1MB for McEliece-based schemes), but newer constructions have reduced this substantially.
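Here is what syndrome decoding looks like on a code small enough to brute-force. I am using the [7,4] Hamming code purely as an illustration: it is highly structured and trivially decodable, which is the opposite of the random codes that make the problem hard. The exhaustive search over error patterns is the part that blows up for large random codes.

```python
from itertools import combinations

# Parity-check matrix of the [7,4] Hamming code: column j is j+1 written in binary.
PARITY_CHECK = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(parity_check, word):
    """Parity-check matrix times word over GF(2); all zeros means a valid codeword."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in parity_check)


def decode_syndrome(parity_check, target, max_weight):
    """Brute-force the lowest-weight error pattern with the given syndrome."""
    n = len(parity_check[0])
    for weight in range(max_weight + 1):
        for support in combinations(range(n), weight):
            error = [1 if i in support else 0 for i in range(n)]
            if syndrome(parity_check, error) == target:
                return error
    return None


codeword = [1, 1, 1, 0, 0, 0, 0]     # valid: its syndrome is (0, 0, 0)
received = [1, 1, 1, 0, 1, 0, 0]     # same word with bit 4 flipped
error = decode_syndrome(PARITY_CHECK, syndrome(PARITY_CHECK, received), max_weight=1)
corrected = [r ^ e for r, e in zip(received, error)]
```

The search space grows combinatorially with the code length and the error weight, and for a random parity-check matrix nobody knows a shortcut. Code-based schemes hide a decodable code behind a matrix that looks random.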

The NIST post-quantum standards selected so far are a portfolio of bets across those three problem families:

  • ML-KEM (FIPS 203), key encapsulation, formerly CRYSTALS-Kyber. Lattice-based (MLWE). FIPS-finalised, production-ready.

  • ML-DSA / Dilithium (FIPS 204), digital signatures. Lattice-based (MLWE/MSIS). Signature size: ~2.5KB. FIPS-finalised, production-ready.

  • SLH-DSA / SPHINCS+ (FIPS 205), stateless hash-based signatures. Signature size: ~8KB. FIPS-finalised. Heavy but the most conservative security assumption available.

  • HQC, selected in March 2025 as the fifth algorithm in NIST’s portfolio and a backup KEM to ML-KEM, full standard expected 2027. Code-based (syndrome decoding). Smaller keys than McEliece.

So why not migrate immediately to these primitives? The main issue is size: post-quantum keys and signatures can be up to 100-fold larger than their ECDSA (and in some cases even RSA) counterparts, which can break a lot of assumptions in some systems, including blockchain networks.
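To put rough numbers on the size problem, the sketch below compares signature schemes against ECDSA on secp256k1. The byte counts are the ones I recall from the FIPS 204 and FIPS 205 parameter tables for the smallest parameter sets, plus a compressed ECDSA public key and a raw 64-byte signature, so treat them as assumptions to double-check rather than authoritative values.

```python
SIGNATURE_SCHEMES = {
    # name: (public key bytes, signature bytes)
    "ECDSA secp256k1": (33, 64),        # compressed key, raw r||s signature
    "ML-DSA-44": (1312, 2420),          # assumed from FIPS 204
    "SLH-DSA-SHA2-128s": (32, 7856),    # assumed from FIPS 205
}


def growth_vs_ecdsa(name):
    """(public key ratio, signature ratio) relative to ECDSA on secp256k1."""
    base_pk, base_sig = SIGNATURE_SCHEMES["ECDSA secp256k1"]
    pk, sig = SIGNATURE_SCHEMES[name]
    return pk / base_pk, sig / base_sig


for scheme in SIGNATURE_SCHEMES:
    pk_ratio, sig_ratio = growth_vs_ecdsa(scheme)
    print(f"{scheme:20s} key x{pk_ratio:6.1f}  signature x{sig_ratio:6.1f}")
```

On a chain where every transaction carries a signature, a 40x to 120x increase per signature translates directly into block space, fees and bandwidth, which is why migration is more than swapping a library.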


Has the timeline really changed?

What about all of these claims, and the statement in Google’s paper about this discovery making them “reevaluate” current quantum supremacy timelines? My immediate answer would be, “who knows?”

Here’s one thing that I think some people may be missing when reading these results: the dramatic reduction in resource counts is real, but the practical problem is not about how many qubits you need on paper. It’s about whether you can build qubits good enough to make those counts mean anything.

The Google whitepaper assumes a physical gate error rate of 10^-3 sustained uniformly across all qubits. That’s the modelling assumption. Where is hardware today?

The state of the art, as of 2024, is two-qubit gate fidelity of ~99.9%, which corresponds to an error rate of exactly 10^-3. Multiple groups have now reported this number, including Google with Willow. So you might conclude the assumption is already met. Scott Aaronson (you probably remember him as my favourite living computer scientist :) ), who has been tracking this more carefully than most, made exactly this point in September 2024:

“Within the past year, multiple groups have reported 99.9% [two-qubit gate fidelity]. I’m now more optimistic than I’ve ever been that, if things continue at the current rate, either there are useful fault-tolerant QCs in the next decade, or else something surprising happens to stop that.”

But he also noted that 99.99%, a full order of magnitude better, is what you really need for sustained fault-tolerant operation where error correction delivers a net gain rather than just breaking even. That threshold hasn’t been reached.

There’s a version of the coverage that reads these papers as evidence the timeline itself has shortened. I don’t think that’s right, and the distinction matters. What these papers changed is the target: the number of qubits and gates required on paper to run the attack. What they didn’t change is the distance to that target, which is determined entirely by hardware, and hardware hasn’t moved much this past month. The Willow chip had the same error rates the day after the whitepaper dropped as it did the day before. A more efficient attack circuit doesn’t build better qubits. It lowers the bar you need to clear, but if you can’t clear the bar yet, lowering it isn’t the same as getting closer.

More critically: those fidelity numbers are measured on the best qubit pairs on a 100-qubit chip under carefully optimised conditions. Nobody has demonstrated 99.9% gate fidelity sustained uniformly across a million physical qubits.

Google’s own Willow error correction paper, the paper that demonstrated below-threshold surface code performance for the first time, achieved that milestone on 101 physical qubits. The target for a cryptographically relevant attack is somewhere between 500,000 and 1 million. The Willow paper itself notes that logical performance is limited by rare correlated error events, roughly once per hour, that fall outside the standard noise model fault-tolerance proofs assume. At million-qubit scale, the frequency and character of those events is unknown.

Then there’s inter-chip communication. Gidney’s estimates assume a planar grid of qubits with nearest-neighbour connectivity. At the million-qubit scale, that means stitching together many chips into a coherent quantum system, something that has not been demonstrated anywhere. Aaronson again: “eventually you’ll need communication of qubits between chips, which has yet to be demonstrated.”

There’s still a sentence near the end of the whitepaper that I think frames the risk correctly:

“It is conceivable that the existence of early CRQCs may first be detected on the blockchain rather than announced.”

That’s the authors acknowledging a tail scenario, Nassim Taleb style: a nation-state or well-funded private effort builds this quietly, and the first public evidence of success is unexplained large wallet drains on-chain. (My good friend Marko Vukolic always said that Bitcoin and Satoshi’s wallet were the biggest quantum computing bounty available, so this claim adds up.)

So the honest position is: the resource count dropped dramatically, and that matters. But the real question for the timeline isn’t how many qubits you need on paper, it’s whether anyone can build a million qubits that are actually good enough.

We’ll have to wait and see… Until next week!

miohtama · 68 days ago · Helsinki, Finland

chibird:

Which one are you? 😆 I am definitely feeling the tea + anxiety one right now.

miohtama · 245 days ago · Helsinki, Finland

Short Walk

And so begins my annual week of pirate chickens, leading up to September 19th’s Talk Like A Pirate Day!

miohtama · 269 days ago · Helsinki, Finland

David Splinter on how much tax billionaires pay

Here is his comment on the paper presented here:

Summary: The U.S. tax system is highly progressive. Effective tax rates increase from 2% for the bottom quintile of income to 45% for the top hundredth of one percent. But rates may be lower among those with the highest wealth. This comment starts with the “top 400” tax rate estimates by wealth in Balkir, Saez, Yagan, and Zucman (2025, BSYZ), and adjusts these to account for Forbes family wealth being spread across multiple tax returns, to avoid double-counting capital income, to include missing taxes, and to apply standard tax and income definitions. This results in “top 400” effective tax rates exceeding overall tax rates by 13 percentage points. Still, the “top 400” tax rate is lower than for the top hundredth of one percent, suggesting a modest decline in effective tax rates at the very top when ranking by wealth. However, this is an unsurprising deviation from progressive rates because the tax system targets income, not wealth. Compared to the annual estimates in BSYZ, longer-run estimates are more appropriate for top wealth groups, which have volatile wealth and concentrate charitable giving into end-of-life bequests. End-of-life giving suggests long-run top 400 effective tax-and-giving rates could exceed 75%.

The full link.

The post David Splinter on how much tax billionaires pay appeared first on Marginal REVOLUTION.

miohtama · 290 days ago · Helsinki, Finland