Mixture of Grouped Experts (MoGE) partitions experts into groups and constrains each token to activate an equal number of experts from every group.
Because each expert group is hosted on its own device, this grouping yields an even computational load across devices and boosts inference throughput; a routing sketch follows below.
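The following is a minimal sketch of grouped top-k routing in the spirit of MoGE, written in PyTorch. The function name, tensor shapes, and the 64-expert/8-group configuration are illustrative assumptions, not Pangu Pro MoE's actual implementation; the point is only that every token selects the same number of experts from each group, so per-group (and hence per-device) load is balanced by construction.

```python
# Illustrative grouped top-k routing; names and shapes are assumptions, not Pangu's code.
import torch
import torch.nn.functional as F

def grouped_topk_routing(router_logits: torch.Tensor,
                         num_groups: int,
                         k_per_group: int):
    """Select an equal number of experts from each group for every token.

    router_logits: (num_tokens, num_experts), num_experts divisible by num_groups.
    Returns global expert indices (num_tokens, num_groups * k_per_group)
    and their normalized routing weights.
    """
    num_tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_groups

    # View logits as (tokens, groups, experts_in_group) and take top-k inside
    # each group, so every token activates exactly k_per_group experts per group.
    grouped = router_logits.view(num_tokens, num_groups, experts_per_group)
    topk_vals, topk_idx = grouped.topk(k_per_group, dim=-1)

    # Convert within-group indices back to global expert ids.
    group_offset = (torch.arange(num_groups, device=router_logits.device)
                    * experts_per_group).view(1, num_groups, 1)
    expert_ids = (topk_idx + group_offset).flatten(1)   # (tokens, groups * k)

    # Normalize the selected scores into routing weights over activated experts.
    weights = F.softmax(topk_vals.flatten(1), dim=-1)    # (tokens, groups * k)
    return expert_ids, weights

# Example: 64 experts in 8 groups, one expert per group -> 8 experts per token.
logits = torch.randn(4, 64)
ids, w = grouped_topk_routing(logits, num_groups=8, k_per_group=1)
print(ids.shape, w.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

If the groups are mapped one-to-one onto devices, each device processes the same number of activated experts per token, which is the load-balancing property the grouping design is meant to guarantee.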
Pangu Pro MoE has 72 billion parameters, with 16 billion activated per token, and is optimized for Ascend NPUs.
In experiments, Pangu Pro MoE achieves 1148 tokens/s per card, rising to 1528 tokens/s with speculative acceleration.
Pangu Pro MoE outperforms comparable 32B and 72B dense models and delivers a strong cost-to-performance ratio on Ascend hardware.