Mixture of Grouped Experts (MoGE) partitions experts into groups and constrains each token to activate an equal number of experts from every group, balancing the expert workload.
Because each group can be hosted on a different device, this grouped routing keeps computation evenly distributed across devices and boosts inference throughput.
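The routing constraint is easy to picture in code. The sketch below is a minimal, illustrative implementation of grouped top-k routing; the group count, experts-per-group selection, and softmax gating shown here are assumptions for illustration, not Pangu Pro MoE's published configuration.

```python
# Minimal sketch of grouped top-k expert routing (illustrative only).
import torch


def grouped_topk_routing(router_logits: torch.Tensor,
                         num_groups: int,
                         k_per_group: int):
    """Select an equal number of experts from each group for every token.

    router_logits: (num_tokens, num_experts); experts are assumed to be laid
    out contiguously so that group g owns experts [g*E/G, (g+1)*E/G).
    Returns global expert indices and normalized gating weights, both of
    shape (num_tokens, num_groups * k_per_group).
    """
    num_tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_groups

    # Make the group dimension explicit: (tokens, groups, experts per group).
    grouped = router_logits.view(num_tokens, num_groups, experts_per_group)

    # Top-k within each group guarantees every group contributes the same
    # number of active experts per token, which is what balances load across
    # the devices hosting each group.
    scores, local_idx = grouped.topk(k_per_group, dim=-1)

    # Convert group-local indices back to global expert ids.
    offsets = torch.arange(num_groups, device=router_logits.device) * experts_per_group
    expert_idx = (local_idx + offsets.view(1, num_groups, 1)).flatten(1)

    # Normalize the selected scores into gating weights.
    weights = torch.softmax(scores.flatten(1), dim=-1)
    return expert_idx, weights


# Example with assumed sizes: 64 experts in 8 groups, 1 expert per group,
# so every token activates exactly one expert in each group.
logits = torch.randn(4, 64)
expert_idx, weights = grouped_topk_routing(logits, num_groups=8, k_per_group=1)
```

In contrast to unconstrained top-k routing, where popular experts (and the devices holding them) can be oversubscribed, the per-group quota makes the number of activated experts per device identical by construction.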
Pangu Pro MoE is a 72-billion-parameter MoGE model that activates 16 billion parameters per token and is optimized for Ascend NPUs.
Experiments demonstrate MoGE improves expert load balancing and efficiency during training and inference on Ascend NPUs.
Inference achieves 1148 tokens/s per card, rising to 1528 tokens/s with speculative acceleration, outperforming 32B and 72B dense models.
Pangu Pro MoE delivers a strong cost-to-performance ratio on Ascend 300I Duo and outperforms prominent 32B open-source models.