A Google Research Paper Just Crashed Memory Chip Stocks

March 31, 2026

Google published a research blog post last Tuesday about a new compression algorithm for AI models. Within hours, memory chip stocks were falling: Micron dropped 3%, Western Digital lost 4.7%, and SanDisk fell 5.7%. By Thursday, Samsung was down nearly 5% and SK Hynix 6% in Seoul. Micron has now fallen nearly 30% from its March 18 all-time high.

The algorithm is called TurboQuant, and it targets one of the most expensive bottlenecks in running large language models: the key-value cache. This is the high-speed store that holds the attention keys and values already computed for earlier tokens, so the model doesn't have to recompute them with every new token. The cache grows linearly with context length, consuming GPU memory that could otherwise serve more users or run larger models.
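To get a sense of scale, here is a back-of-envelope sketch in Python. The dimensions (32 layers, 8 KV heads, head size 128, 16-bit values) are illustrative assumptions, not figures from the paper or any particular model:

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2, batch=1):
    """Approximate KV-cache size: keys + values, every layer, every token.

    Defaults are illustrative (roughly a 7B-class model with grouped-query
    attention); bytes_per_value=2 means 16-bit (fp16/bf16) storage.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return batch * seq_len * per_token

# The cache grows linearly with context length:
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:5.1f} GiB")
```

Under these assumptions, a single 128K-token context already occupies 16 GiB at 16-bit precision, before counting the model weights themselves.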

TurboQuant compresses the cache to just 3 bits per value, down from the standard 16, cutting its memory footprint by more than a factor of five (16/3 ≈ 5.3) without, according to Google's benchmarks, any measurable loss in accuracy. The paper, set to be presented at ICLR 2026, also reports up to 8x faster attention processing.
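The blog post's details aren't reproduced here, so the sketch below shows only the generic idea behind low-bit quantization, not TurboQuant itself: store each value as a small integer plus a shared per-block offset and scale. It's a minimal toy; every name in it is made up for illustration:

```python
import numpy as np

def quantize_3bit(x, block=32):
    """Toy uniform 3-bit quantizer -- NOT TurboQuant's method, just the
    generic idea: each block shares one offset and scale, and every value
    is stored as an integer in 0..7 (the 2**3 levels of a 3-bit code)."""
    x = x.reshape(-1, block)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = np.maximum((hi - lo) / 7.0, 1e-12)  # guard against flat blocks
    q = np.clip(np.round((x - lo) / scale), 0, 7).astype(np.uint8)
    return q, lo, scale

def dequantize_3bit(q, lo, scale):
    return q * scale + lo

rng = np.random.default_rng(0)
vals = rng.standard_normal((4096, 32)).astype(np.float32)
q, lo, scale = quantize_3bit(vals)
err = np.abs(dequantize_3bit(q, lo, scale) - vals).mean()
print(f"mean abs reconstruction error: {err:.3f}")
```

Even in this toy, the per-block offset and scale add overhead, so real savings land a little below the raw 16/3 ratio. The hard part, and what a result like TurboQuant is claiming, is holding accuracy at such low bit widths.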

Cloudflare CEO Matthew Prince called it "Google's DeepSeek" — a reference to the Chinese AI lab whose efficiency breakthroughs triggered a massive Nasdaq sell-off in January 2025. The comparison isn't subtle: both are research results that forced investors to recalculate how much hardware the AI industry actually needs.

The counterargument

Not everyone is panicking. Ray Wang, a memory analyst at SemiAnalysis, argues that addressing a bottleneck doesn't reduce demand — it increases capability. "When you address a bottleneck, you help AI hardware to be more capable. And the training model will be more powerful in the future. When the model becomes more powerful, you require better hardware to support it," he told CNBC.

In other words: making models more memory-efficient doesn't mean fewer chips get sold. It means models get bigger, context windows get longer, and inference gets deployed to more users. The Jevons paradox — where efficiency gains increase total consumption — has played out repeatedly in prior computing cycles.

Analysts also noted that memory stocks had run up 200–300% over the past year. Much of the sell-off may simply be profit-taking with TurboQuant as the catalyst, not the cause.

Why it matters

This is a case study in how fast research can move markets. A single Google blog post — not a product launch, not an earnings report — wiped billions off the global memory sector within days. It shows how tightly financial markets are now coupled to AI research output, and how a compression breakthrough can be as market-moving as a new chip announcement.

For developers, TurboQuant is straightforwardly good news: running models locally or at scale is about to get cheaper and faster. For investors, it's a reminder that the AI hardware supercycle has always had an efficiency risk embedded in it.

