Mixture-of-Experts (MoE) has become a popular technique for scaling large language models (LLMs) without a proportional increase in computational cost. Instead of using the entire model's capacity for every input, an MoE layer activates only a small subset of expert subnetworks for each token, chosen by a learned gating (router) network, so total parameter count can grow while the compute per token stays roughly constant.
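
To make the idea concrete, here is a minimal sketch of a top-k routed MoE layer in PyTorch. It is illustrative only: the class names, hyperparameters, and the simple per-expert dispatch loop are assumptions for clarity, not the implementation of any particular model, and common training details such as load-balancing losses and capacity limits are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward expert network (hypothetical sizes)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ff(x)


class TopKMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs,
    weighted by the router's softmax scores over the selected experts."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            [Expert(d_model, d_hidden) for _ in range(num_experts)]
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Router scores and top-k expert selection per token
        logits = self.router(tokens)                       # (num_tokens, num_experts)
        topk_scores, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)           # renormalize over chosen experts

        out = torch.zeros_like(tokens)
        # Dispatch: each expert processes only the tokens routed to it
        for e, expert in enumerate(self.experts):
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # no tokens chose this expert in this batch
            out[token_ids] += (
                weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
            )

        return out.reshape(batch, seq_len, d_model)


# Usage sketch: only k experts run per token, regardless of num_experts.
layer = TopKMoE(d_model=512, d_hidden=2048, num_experts=8, k=2)
y = layer(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```

The key point the sketch shows is the decoupling: adding more experts grows the parameter count, but each token still passes through only `k` of them, so the per-token FLOPs are governed by `k` rather than by the total number of experts.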