Google Unveils Two New AI Chips at Cloud Next, Dropping x86 in Favour of Arm

TPU 8 and a new inference accelerator pair with Arm-based Axion cores as Google intensifies the AI hardware race

By Zotpaper
Read time: 3 min
Sources: 5 outlets
Google announced two new in-house AI accelerators at its annual Cloud Next conference in Las Vegas on Wednesday, 22 April 2026: one built for training large models, the other aimed at reducing the cost of serving them. The company also made a notable architectural shift away from x86 processors in favour of its own Arm-based Axion chips.

Google has unveiled its latest generation of Tensor Processing Units (TPUs) alongside a dedicated inference accelerator, signalling the company's intent to compete aggressively in a rapidly escalating AI hardware market dominated by Nvidia and increasingly challenged by Amazon, Microsoft, and a clutch of well-funded startups.

The eighth-generation TPU — TPU 8 — is designed primarily to handle the computationally intensive task of training large AI models, a workload that demands enormous memory bandwidth and floating-point performance. The second chip targets inference, the process of running a trained model to generate responses, which is where the bulk of day-to-day AI serving costs are incurred.

Perhaps as significant as the chips themselves is the architectural decision accompanying the announcement: Google is pairing both accelerators with its Arm-based Axion processors rather than the x86 CPUs that have long underpinned data centre workloads. Axion, Google's first custom Arm server CPU, was introduced in 2024 and is designed to deliver better performance-per-watt than comparable x86 options.

The move reflects a broader industry trend away from general-purpose x86 architecture toward purpose-built silicon. Amazon Web Services has pursued a similar path with its Graviton (Arm CPU) and Trainium (AI training) chips, while Microsoft has developed its own Maia AI accelerator. The shift puts Intel and AMD — whose server CPUs have long powered cloud data centres — under increasing pressure.

Google's announcement comes as the company faces intensifying competition from Nvidia, whose H100 and Blackwell GPU families remain the default choice for most AI training workloads. By developing its own silicon, Google aims to reduce its dependence on external chip suppliers, lower infrastructure costs, and offer cloud customers a differentiated alternative.

The inference-focused chip is particularly notable given industry economics: as AI model deployment scales to billions of daily queries across products like Gemini, the marginal cost of each inference request becomes a critical business variable. A more efficient inference chip could translate directly into lower operating costs and, potentially, more competitive pricing for Google Cloud customers.
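To make that arithmetic concrete, the short Python sketch below works through a back-of-envelope calculation. Every figure in it is a hypothetical assumption chosen for illustration (Google has not published query volumes or per-query costs); the point is only to show how a small per-query saving compounds at billions of requests per day.

```python
# Back-of-envelope illustration of inference economics at scale.
# All numbers are hypothetical assumptions, not reported figures.

DAILY_QUERIES = 2_000_000_000          # assumed inference requests per day
COST_PER_QUERY_GPU = 0.0004            # assumed cost per query on GPUs ($)
COST_PER_QUERY_CUSTOM = 0.0003         # assumed cost on a custom chip ($)

def annual_serving_cost(daily_queries: int, cost_per_query: float) -> float:
    """Annual serving cost in dollars for a given per-query cost."""
    return daily_queries * cost_per_query * 365

gpu_cost = annual_serving_cost(DAILY_QUERIES, COST_PER_QUERY_GPU)
custom_cost = annual_serving_cost(DAILY_QUERIES, COST_PER_QUERY_CUSTOM)

print(f"GPU serving:    ${gpu_cost:,.0f}/year")
print(f"Custom silicon: ${custom_cost:,.0f}/year")
print(f"Savings:        ${gpu_cost - custom_cost:,.0f}/year")
```

Under these assumed figures, shaving a hundredth of a cent off each query saves roughly $73 million a year, which is why per-query efficiency, not peak throughput, tends to drive inference hardware decisions.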

Google has not yet published detailed performance benchmarks or pricing for either chip, and independent verification of the company's claims is not yet available. The broader market will be watching whether the new silicon can meaningfully challenge Nvidia's entrenched position.


Analysis

Why This Matters

  • AI chip performance and cost directly affect the price and capability of cloud AI services that businesses and consumers rely on daily — more efficient chips could lower barriers to AI adoption.
  • Google's departure from x86 in its AI infrastructure stack increases pressure on Intel and AMD while validating Arm's growing role in data centre computing.
  • If Google's custom silicon proves competitive, it could give Google Cloud a cost and performance edge over AWS and Azure, reshaping the cloud market's competitive dynamics.

Background

Google has been developing custom AI silicon since 2016, when it first revealed its Tensor Processing Units — chips designed specifically for neural network workloads rather than adapted from general-purpose GPU designs. The original TPU was built to handle inference for Google's own search and translation products, with later generations expanding into training tasks.

The TPU lineage has evolved through multiple generations, with each iteration improving raw throughput and memory bandwidth. Alongside TPUs, Google introduced its Axion CPU in 2024, a custom Arm-based server processor designed to replace x86 chips in certain cloud workloads — a direct parallel to Amazon's Graviton programme, which has steadily taken share from Intel's Xeon line within AWS.

The broader context is an AI infrastructure spending boom. Cloud hyperscalers collectively committed hundreds of billions of dollars to AI-related capital expenditure in 2025 and 2026, with custom silicon seen as one of the few levers available to control costs as GPU prices and demand remain elevated.

Key Perspectives

Google Cloud: Framing the new chips as a cost and performance breakthrough, Google positions TPU 8 and its inference counterpart as evidence that customers need not rely solely on third-party silicon. The Axion pairing is presented as a coherent, vertically integrated stack.

Nvidia and incumbent GPU vendors: Nvidia's Blackwell architecture remains the benchmark against which all AI accelerators are measured. The company benefits from a mature software ecosystem (CUDA) that custom chips must either replicate or replace — a significant switching cost for developers.

Critics and industry analysts: Some observers note that Google has a mixed history of commercialising its internal technologies. TPUs have historically been most accessible to Google's own teams, and whether external cloud customers will adopt them at scale — given the inertia of the CUDA ecosystem — remains an open question. Benchmark transparency will be essential to evaluating real-world claims.

What to Watch

  • Independent benchmark results comparing TPU 8 training throughput and inference cost against Nvidia H100/Blackwell equivalents.
  • Adoption rates among Google Cloud enterprise customers, particularly those currently running large Nvidia GPU workloads.
  • Intel and AMD's response, including any announcements regarding next-generation server CPU roadmaps aimed at retaining hyperscaler contracts.

Sources


Zotpaper

Articles published under the Zotpaper byline are synthesized from multiple source publications by our AI editor and reviewed through our editorial process. Each story combines reporting from credible outlets to give readers a balanced, comprehensive view.