BestAIFor.com

Huawei chips refine DeepSeek model in major leap for China’s AI self-reliance

Debby Wang
June 8, 2026 · 13 min read

TL;DR

DeepSeek, the Hangzhou lab that trained a frontier model for a disclosed compute cost of roughly $5.6 million, has reportedly completed fine-tuning runs on Huawei's domestic Ascend 910C chips, extending China's AI self-reliance stack beyond NVIDIA hardware for the first time at meaningful scale. Independently verified benchmarks showing Ascend parity with the H100 on full training workloads do not yet exist. What is not in doubt: China's AI supply chain is operational, not aspirational, and the timeline Western policymakers assumed for slowing it has compressed considerably.

Key Takeaways

  • DeepSeek disclosed its own training economics: DeepSeek-V3 was trained on a cluster of 2,048 NVIDIA H800 GPUs using 2.788 million GPU-hours in total (pre-training, context extension, and post-training), at a stated compute cost of about $5.576 million, according to DeepSeek's December 2024 technical report. That figure covers the final training run only, not prior research or ablation experiments, and Western labs questioned it before attempting to replicate it.
  • US export controls removed the H800 from legal supply in October 2023: The Bureau of Industry and Security's updated rule restricted the A800 and H800 — the China-export-compliant NVIDIA chips that had served as the working workaround — cutting off the compute stack DeepSeek had actually used to train V3.
  • Huawei Cloud deployed DeepSeek on Ascend within weeks of R1's release: By January 30, 2025, Huawei was routing DeepSeek inference traffic through Ascend cluster infrastructure, according to Reuters reporting from that date.
  • Fine-tuning on Ascend 910C has been reported but not independently benchmarked: Chinese research teams have claimed successful reinforcement-learning post-training runs of DeepSeek-R1 on Ascend 910B and 910C hardware. A peer-reviewed comparison against identical NVIDIA baselines has not been published as of this writing.
  • Ascend 910C specs remain partially opaque: Huawei has not released a complete datasheet at the disclosure level of NVIDIA's H100 spec sheet. Third-party estimates suggest the 910C meaningfully outperforms the 910B — which reportedly peaks around 256 TFLOPS at FP16 — but Huawei has not confirmed this figure publicly.
  • DeepSeek's architecture reduces the chip gap: The mixture-of-experts routing in V3 and the reasoning-distillation approach in R1 lower per-token compute demand significantly — making a domestically produced, lower-spec chip a more viable training substrate than raw FLOP comparisons suggest.
  • Multiple Chinese chip makers are in parallel development: Cambricon (Shanghai), Biren Technology, and Moore Threads are all shipping AI accelerators, giving Chinese labs supply-chain redundancy that did not exist two years ago.
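
The cost figure in the first takeaway can be reproduced with simple arithmetic. The sketch below is illustrative only: the GPU-hour and GPU-count figures are the ones quoted above, while the $2 per GPU-hour rental rate is an assumption taken from DeepSeek's technical report and is not stated in this article. The function names are hypothetical.

```python
# Figures quoted in the article, from DeepSeek's V3 technical report.
GPU_HOURS = 2_788_000   # total H800 GPU-hours for the final training run
GPU_COUNT = 2_048       # H800 GPUs in the training cluster

# Assumption: the rental rate the report uses for its estimate (not in the article).
USD_PER_GPU_HOUR = 2.00


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-equivalent compute cost: GPU-hours times an hourly rate."""
    return gpu_hours * usd_per_gpu_hour


def min_wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Lower bound on elapsed days, assuming every GPU is busy the whole time."""
    return gpu_hours / gpu_count / 24


cost = training_cost_usd(GPU_HOURS, USD_PER_GPU_HOUR)
days = min_wall_clock_days(GPU_HOURS, GPU_COUNT)

print(f"Implied compute cost: ${cost / 1e6:.3f} million")
print(f"Minimum wall-clock time: {days:.1f} days")
```

The result is a rental-equivalent estimate, not a purchase price for the cluster, which is why the headline number looks so small next to the capital cost of 2,048 accelerators.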

What Happened — The Actual Story

Hangzhou is not the city Western AI executives typically cite when they track frontier model development. Shenzhen has the hardware culture; Beijing has the policy apparatus; Shanghai has the chip money. Hangzhou has Alibaba and, since 2023, DeepSeek, the quant-funded lab that published a reasoning model in January 2025 that matched OpenAI's o1 on standard benchmarks and sent NVIDIA's stock down sharply in a single session.

The Western conversation that followed focused on training efficiency and what it meant for the AI scaling hypothesis. The hardware question got considerably less attention, which is a gap worth closing now.

Here is the hardware question: DeepSeek trained V3 on H800 GPUs — NVIDIA's China-export-compliant H100 variant, with capped NVLink bandwidth. Those chips are now off the table. The October 2023 BIS rule locked them out, and subsequent amendments have closed remaining loopholes. Whatever DeepSeek builds next, it cannot legally purchase the same compute stack that produced V3.

The pivot to Huawei Ascend has been underway since at least mid-2024. Huawei Cloud began running major Chinese LLMs on Ascend 910B hardware; by the time R1 landed, the company was positioned to route DeepSeek inference through Ascend clusters almost immediately. That inference migration is verified. What is newer — and still being assessed — is that fine-tuning workloads, specifically the reinforcement learning post-training that converts a raw language model into a reasoning engine, have been running on Ascend 910C hardware.

Fine-tuning is not pre-training, and the distinction matters. Pre-training a frontier model from scratch demands millions of GPU-hours on high-bandwidth chips operating in tight synchrony across thousands of nodes. RL post-training is compute-efficient by comparison and more tolerant of hardware heterogeneity and interconnect variation. Completing fine-tuning on Ascend is a genuine milestone. Extrapolating it to claim Ascend can replace NVIDIA for the full pre-training stack is a step the evidence does not yet support.
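
To put rough numbers on that distinction, the sketch below uses the per-phase GPU-hour breakdown from DeepSeek's V3 technical report. These per-phase figures are not quoted in this article, and they describe V3 only: the post-training line covers V3's supervised fine-tuning and RL, not the separate RL run that produced R1. Treat it as an order-of-magnitude illustration, with hypothetical names throughout.

```python
# Per-phase H800 GPU-hours (in thousands) as reported for DeepSeek-V3.
# Assumption: taken from the V3 technical report, not from this article.
PHASES_K_GPU_HOURS = {
    "pre-training": 2664,
    "context extension": 119,
    "post-training": 5,   # V3's SFT + RL only; R1's RL run is not included
}


def phase_shares(phases: dict[str, float]) -> dict[str, float]:
    """Return each phase's fraction of total GPU-hours."""
    total = sum(phases.values())
    return {name: hours / total for name, hours in phases.items()}


shares = phase_shares(PHASES_K_GPU_HOURS)
for name, share in shares.items():
    print(f"{name:>18}: {share:7.2%}")
```

On these figures, post-training is a fraction of one percent of the total, which is why a successful fine-tuning run says little about whether the same hardware can sustain pre-training.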

What this story actually is: China has a domestic chip that can refine a frontier model. What it is not yet: a fully documented, independently verified replacement for NVIDIA hardware across the complete training lifecycle.

The Evidence: Verified vs. Unconfirmed

What Is Verified

Huawei Cloud is running DeepSeek inference on Ascend clusters. This is publicly accessible — Huawei Cloud's AI services catalog confirms it, and the company has made it a marketing centerpiece. The inference latency and throughput figures Huawei has published are in a plausible range, though neutral third-party A/B comparisons against NVIDIA-hosted inference are not available.

DeepSeek's training methodology and cost figures are documented in peer-reviewable technical reports. Independent researchers have broadly confirmed the efficiency numbers hold up. DeepSeek is not a black box; it publishes.

The Ascend 910C exists and is shipping at scale. Alibaba Cloud, Baidu AI Cloud, and Huawei Cloud have all acknowledged deploying it. Quantitative shipment data from Huawei is not public.

What Is Reported But Not Independently Verified

Specific benchmark comparisons between Ascend 910C and NVIDIA H100 on identical AI workloads at the same scale. Huawei's own marketing materials claim competitive performance; this has not been verified to a standard that would satisfy a semiconductor review publication.

Full pre-training runs of a V3-class model on Ascend clusters. Whether a 2,000-chip Ascend cluster can deliver the speed, stability, and interconnect performance needed for a frontier pre-training run at commercial scale is, as of now, an open question with no definitive public evidence either way.

The Chip Comparison (With Appropriate Caveats)

| Chip | Memory | Memory BW | Peak FP16 TFLOPS | Available in China | Status |
|---|---|---|---|---|---|
| NVIDIA H100 SXM | 80 GB HBM3 | 3.35 TB/s | ~989 dense (~1,979 with sparsity) | Restricted (BIS Oct 2022) | Benchmark standard; not legally purchasable |
| NVIDIA H800 | 80 GB HBM3 | 3.35 TB/s | ~989 dense (~1,979 with sparsity) | Restricted (BIS Oct 2023) | DeepSeek V3 training chip; supply cut off |
| Huawei Ascend 910B | 64 GB HBM2 | ~2.0 TB/s | ~256 (est., unconfirmed) | Available | Widely deployed; specs from third-party estimates |
| Huawei Ascend 910C | 64–96 GB HBM | Not published | Not published | Available | Huawei claims H100-competitive; unverified |

The blanks in the Ascend 910C row are real blanks. Any analyst quoting confirmed FLOP counts for the 910C is working from unofficial sources and should say so.
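
One way to keep those blanks honest in your own analysis is to represent unpublished figures as missing values, so that comparisons fail visibly instead of silently using a guess. The following is a minimal sketch using the memory-bandwidth column from the table above; the class and function names are hypothetical, and the 910B figure remains a third-party estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chip:
    name: str
    memory_bw_tbs: Optional[float]  # TB/s; None means the vendor has not published it
    vendor_confirmed: bool          # False for third-party estimates


CHIPS = {
    "h100": Chip("NVIDIA H100 SXM", 3.35, True),
    "910b": Chip("Huawei Ascend 910B", 2.0, False),   # estimate, per the table
    "910c": Chip("Huawei Ascend 910C", None, False),  # not published
}


def bandwidth_ratio(chip: Chip, baseline: Chip) -> Optional[float]:
    """Memory-bandwidth ratio chip/baseline, or None if either figure is missing."""
    if chip.memory_bw_tbs is None or baseline.memory_bw_tbs is None:
        return None
    return chip.memory_bw_tbs / baseline.memory_bw_tbs


for key in ("910b", "910c"):
    ratio = bandwidth_ratio(CHIPS[key], CHIPS["h100"])
    label = "unknown (no published figure)" if ratio is None else f"{ratio:.2f}x"
    print(f"{CHIPS[key].name} vs H100 memory bandwidth: {label}")
```

Returning None rather than a placeholder number forces every downstream calculation to handle the gap explicitly.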

What This Changes for Western Founders and Professionals

The practical implications differ sharply depending on your relationship to Chinese AI.

If you are building on DeepSeek's open-weight models, the hardware story has no immediate bearing on your work. You are running inference, not managing a training cluster. The weights are available; what matters to you is whether the lab continues releasing fine-tuned variants, whether it maintains its open-access publication policy, and whether model quality continues improving. None of those questions are resolved by knowing which chip the next version was fine-tuned on. Monitor the lab, not the chip.

If you are advising enterprises on AI supply-chain risk, the Ascend development changes a specific assumption. Many supply-chain audits have operated on the premise that US export controls create a hard ceiling on Chinese AI capabilities by limiting compute access. That premise needs revision. The reason is not that Ascend matches the H100 today; it is that the gap appears to be narrowing, and the Chinese AI ecosystem has shown enough efficiency innovation that raw FLOP parity is not required to produce competitive models. Revise your threat model accordingly.

If you advise on China AI strategy or regulatory risk, the Ascend ramp is not a quiet back-channel development. It is explicitly state-backed, integrated into cloud procurement mandates, and framed publicly as AI infrastructure sovereignty. Huawei's chairman has said, in terms that were anything but subtle, that the US export restrictions did not slow the company; they accelerated domestic development. That dynamic is documented in public statements and procurement records, and it is relevant context for any client briefing on the long-term efficacy of technology controls.

If you are an investor or founder evaluating Chinese AI partnerships, consider what the Ascend shift means for durability. A Chinese AI product running inference on Huawei Cloud's Ascend clusters, with models fine-tuned on domestic hardware, has a supply chain that US export controls cannot easily disrupt. That is a materially different risk profile from a Chinese product dependent on NVIDIA capacity routed through third-country cloud providers — a workaround that has historically been patchable by regulators.

When NOT to Overcorrect

Don't conclude Ascend matches H100 for full-scale pre-training. The fine-tuning milestone is real. Extrapolating it to claim China can train the next frontier-scale model entirely without NVIDIA hardware is not supported by current evidence. Pre-training at scale requires cluster interconnect performance, memory bandwidth, and software stack maturity that the fine-tuning story does not demonstrate.

Don't interpret DeepSeek's efficiency as proof that chips don't matter. The argument runs like this: DeepSeek proved you can do more with less; therefore hardware is no longer a bottleneck; therefore export controls are irrelevant. That logic doesn't hold. DeepSeek's architectural innovations reduce the cost differential between Ascend and H100; they don't eliminate it. At the frontier of model scale, raw compute still determines what is achievable.

Don't assume Ascend production is unconstrained. Huawei's chip manufacturing depends on SMIC's advanced-node processes, which are leading-edge by Chinese standards but behind TSMC's most current nodes. Yield rates, HBM memory supply, and advanced packaging constraints limit production volume in ways that are not publicly quantified. Huawei shipping Ascend at scale today does not mean it can ship at arbitrarily larger scale next year.

Don't project a linear catch-up trajectory. Chip development is not linear. The 910B-to-910C jump may or may not predict an equivalent step to the next generation on schedule. Hardware roadmaps slip in Shenzhen and Santa Clara alike.

Where This Is Heading

Software-hardware co-design is closing the loop. DeepSeek's architectural choices (MoE routing, latent attention, efficient RL post-training) were made partly with hardware constraints in mind. Expect future Chinese frontier models to be increasingly designed around domestic chip capabilities rather than around NVIDIA's feature set. This is more than adaptation to a constraint; it is a design philosophy that historically produces durable competitive advantages, because the resulting model runs better on the available hardware than a model designed for a different chip and then ported.
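
A back-of-the-envelope calculation shows why MoE routing matters so much for a lower-spec chip. The sketch below assumes the parameter counts from DeepSeek's V3 technical report (671 billion total, 37 billion activated per token), which are not quoted in this article, and the common rule of thumb of about 6 training FLOPs per parameter per token, which ignores attention and routing overhead. It is an approximation, not a measurement.

```python
# Assumption: parameter counts from DeepSeek's V3 technical report.
TOTAL_PARAMS = 671e9    # all experts combined
ACTIVE_PARAMS = 37e9    # parameters activated per token by MoE routing

# Assumption: rule-of-thumb training cost of ~6 FLOPs per parameter per token.
FLOPS_PER_PARAM_PER_TOKEN = 6


def train_flops_per_token(params: float) -> float:
    """Approximate training FLOPs needed to process one token."""
    return FLOPS_PER_PARAM_PER_TOKEN * params


moe = train_flops_per_token(ACTIVE_PARAMS)
dense = train_flops_per_token(TOTAL_PARAMS)
reduction = dense / moe

print(f"MoE (active params only): {moe:.2e} FLOPs/token")
print(f"Dense model of same size: {dense:.2e} FLOPs/token")
print(f"Per-token compute reduction: {reduction:.1f}x")
```

The reduction applies to arithmetic only. All 671 billion parameters still have to sit in accelerator memory and be reachable over the interconnect, which is why memory capacity, bandwidth, and cluster networking remain the harder gap for a domestic chip to close.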

Inference is already domestic; fine-tuning has followed; pre-training is the remaining frontier. The timeline for full pre-training on domestic chips at frontier scale is genuinely unknown. Estimates from people with better access than public reporting provides range from 18 months to several years. The honest answer is that the public data does not resolve this question, and anyone who tells you otherwise with confidence is extrapolating from incomplete information.

Chinese cloud providers will integrate DeepSeek more tightly over the next 12 months. Alibaba Cloud, Huawei Cloud, Baidu, and ByteDance's Volcengine are all building DeepSeek-compatible services on Ascend infrastructure. As this matures, the delivery layer for Chinese AI will look less like an NVIDIA cluster with Chinese software on top and more like a vertically integrated domestic stack. That convergence is the real self-reliance story — not any individual chip benchmark.

Watch the software toolchain, not just the chips. CUDA is one of NVIDIA's most durable competitive advantages — not the silicon, but the decade of software infrastructure built around it. Huawei's CANN (Compute Architecture for Neural Networks) is the domestic equivalent. It is improving, but it remains less mature and less extensively tested than CUDA across the model development workflow. CANN's progress is a more reliable indicator of long-term Ascend viability than any chip spec claim.

Geopolitical escalation is a variable, not a constant. Additional export controls targeting SMIC's process nodes, advanced packaging, or memory supply chains could extend timelines. Stable restrictions give the Ascend program a clear runway. Either way, the underlying trajectory — domestic hardware viable for at least some frontier AI workloads — is unlikely to reverse.

FAQ

Is DeepSeek now fully independent from NVIDIA hardware? No. DeepSeek's V3 and R1 models were trained on H800 GPUs, as documented in their own technical reports. The current Ascend milestone is at the fine-tuning and inference level. Full pre-training of a frontier-scale model on domestic hardware without NVIDIA has not been publicly documented with verifiable detail.

How does the Ascend 910C actually compare to the H100? The comparison is incompletely documented. Huawei claims competitive performance. Third-party benchmarks on identical workloads at equivalent scale, reviewed to a standard comparable to serious hardware publications, do not currently exist. Third-party estimates put the 910C meaningfully ahead of the 910B, which would narrow the gap to the H100; by how much and under what conditions remains unverified.

Should Western companies building on DeepSeek's open-weight models worry about the hardware shift? The hardware story has no bearing on model quality for your inference use case. The questions that actually matter are: Does the lab maintain its open-access publication policy? Are future fine-tuned variants still released publicly? Does the model's open license remain permissive? Those are separate from, and more important than, what chip the model was refined on.

Are the US export controls working? Partially, temporarily, and with significant second-order effects. They slowed access to frontier compute. They did not prevent DeepSeek from publishing V3 and R1. They appear to have materially accelerated Chinese domestic chip investment, as the Ascend ramp demonstrates. Whether the net effect is positive for US national security interests is a policy question that hardware analysts and security researchers actively disagree on.

What should a Western founder actually do with this information? Probably not change your current stack. If you are building on DeepSeek's open-weight models, those models are competitive and open-licensed — continue using them. Add a clear provenance note to your supply-chain risk documentation. Monitor whether the lab's open-access policy holds. That is the proportionate response.

Is some of this Huawei marketing dressed as technical progress? Yes, some of it is. The performance claims Huawei has made about the 910C are ahead of what independent evidence supports. But the Huawei Cloud inference story is verified. The fine-tuning reports are credible and sourced from multiple teams. The state-backed infrastructure build-out is documented in public procurement. Dismissing the entire Ascend narrative as marketing misses a genuine shift in China's AI infrastructure.

What about Cambricon, Biren, and Moore Threads — are they relevant here? Real companies, real products, real revenue — mostly in inference and edge deployment. None are currently competitive with Huawei Ascend for frontier model training workloads. Their significance is structural: China's domestic AI chip program no longer has a single point of failure. For the near-term frontier training story, Huawei Ascend is the one to watch.

Debby Wang is BestAIFor's China AI Correspondent, covering the tools, startups, and policy shifts coming out of China's AI ecosystem. Based in Shenzhen, she writes for Western founders and professionals who want to understand what's actually happening, without the hype or the panic. Her focus areas include physical AI, robotics, medical applications, AI hardware, and the social and legal impact of automation.