DeepSeek R2 AI Model Rumors Begin to Swirl Online; Reported to Feature 97% Lower Costs Compared to GPT-4 & Fully Trained on Huawei's Ascend Chips



1.2T param, 78B active, hybrid MoE

That's enormous, very much not local, heh.
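For a rough sense of scale, here's a back-of-the-envelope sketch of the weight memory alone. It assumes the rumored 1.2T total / 78B active figures are accurate, uses decimal units (1 TB = 1e12 bytes), and ignores KV cache, activations, and runtime overhead. The function name and the set of precisions are my own picks for illustration, not anything from the article.

```python
# Weight-only memory estimate for the rumored parameter counts.
# Assumptions (not confirmed by any official source):
#   - 1.2T total parameters, 78B active per token (from the rumor)
#   - decimal terabytes, no KV cache / activation / runtime overhead

TOTAL_PARAMS = 1.2e12
ACTIVE_PARAMS = 78e9


def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_param / 8


if __name__ == "__main__":
    for bits in (16, 8, 4):
        terabytes = weight_bytes(TOTAL_PARAMS, bits) / 1e12
        print(f"{bits:>2}-bit weights: {terabytes:.1f} TB")
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Even at 4 bits per weight that's about 0.6 TB just for the weights, although only about 6.5% of the parameters would be active for any given token.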

Here's the actual article translation (which seems right compared to other translations):

::: spoiler Translation
DeepSeek R2: Unit Cost Drops 97.3%, Imminent Release + Core Specifications

Author: Chasing Trends Observer
Veteran Crypto Investor Watching from Afar
2025-04-25 12:06:16 Sichuan

Three Core Technological Breakthroughs of DeepSeek R2:

Architectural Innovation
Adopts proprietary Hybrid MoE 3.0 architecture, achieving 1.2 trillion dynamically activated parameters (actual computational consumption: 78 billion parameters).
Validated by Alibaba Cloud tests:

97.3% reduction in per-token cost compared to GPT-4 Turbo for long-text inference tasks
(Data source: IDC Computing Power Economic Model)

Data Engineering
Constructed 5.2PB high-quality corpus covering finance, law, patents, and vertical domains.
Multi-stage semantic distillation boosts instruction compliance accuracy to 89.7%
(Benchmark: C-Eval 2.0 test set)

Hardware Optimization
Proprietary distributed training framework achieves:

Application Layer Advancements - Three Multimodal Breakthroughs:

Industrial Inspection
Adaptive feature fusion algorithm reduces false detection rate to 7.2E-6 in photovoltaic EL defect detection
(Field data from LONGi Green Energy production lines)

Medical Diagnostics
Knowledge graph-enhanced chest X-ray multi-disease recognition:

98.1% accuracy vs. 96.3% average of senior radiologist panels
(Blind test results from Peking Union Medical College Hospital)

Key Highlight:
8-bit quantization compression achieves:

83% model size reduction
<2% accuracy loss
(Enables edge device deployment - Technical White Paper Chapter 4.2)
:::

Others translate it as 'sub-8-bit' quantization, which is interesting too.
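A quick sanity check on why 'sub-8-bit' might be the more plausible reading: plain 8-bit quantization only shrinks weights by 50% from a 16-bit baseline, or 75% from a 32-bit one, so an 83% reduction implies fewer than 8 bits per weight on average. This is a minimal sketch, assuming the 83% figure refers to weight storage only and that the baseline is FP16 or FP32 (the article states neither). Both function names are mine.

```python
# What average bit width would an 83% size reduction imply?
# Assumptions (the article does not specify these):
#   - the reduction applies to weight storage only
#   - the uncompressed baseline is either 16-bit or 32-bit floats


def reduction_from(baseline_bits: float, target_bits: float) -> float:
    """Fractional size reduction when going from baseline_bits to target_bits."""
    return 1.0 - target_bits / baseline_bits


def implied_bits(baseline_bits: float, size_reduction: float) -> float:
    """Average bits per weight left after shrinking by size_reduction (0..1)."""
    return baseline_bits * (1.0 - size_reduction)


if __name__ == "__main__":
    for baseline in (16, 32):
        plain_8bit = reduction_from(baseline, 8)
        bits_left = implied_bits(baseline, 0.83)
        print(f"FP{baseline} -> 8-bit: {plain_8bit:.0%} smaller")
        print(f"83% smaller than FP{baseline}: ~{bits_left:.2f} bits/weight")
```

That works out to roughly 2.7 bits per weight from an FP16 baseline or about 5.4 from FP32. Either way it's below 8, unless the figure also counts some other kind of compression on top of quantization.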
