Kimi K2 is a large language model series developed by the Moonshot AI team.
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
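The gap between 1T total and 32B activated parameters comes from MoE routing: a router selects a small subset of experts per token, so only those experts' weights participate in the forward pass. Below is a minimal sketch of top-k routing in PyTorch; the class name, dimensions, and expert design are illustrative, not Kimi K2's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative sizes, not Kimi K2's)."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                         # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # only k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoELayer()
print(layer(torch.randn(4, 512)).shape)           # torch.Size([4, 512])
```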
Key Features
Large-Scale Training: Pre-trained a 1T-parameter MoE model on 15.5T tokens with zero training instability.
MuonClip Optimizer: Applies the Muon optimizer at an unprecedented scale, with novel optimization techniques developed to resolve instabilities during scale-up (see the sketch after this list).
Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
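Muon itself is a published optimizer: on 2D weight matrices it replaces the elementwise Adam update with the momentum matrix approximately orthogonalized by a few Newton-Schulz iterations. The sketch below follows the public Muon reference implementation (coefficients included), not Moonshot's internal code; MuonClip's additional qk-clip safeguard, which rescales query/key projections to cap attention logits, is omitted here:

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via Newton-Schulz iteration.

    Coefficients follow the public Muon reference implementation;
    this is a sketch of the idea, not Moonshot's MuonClip code.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)              # bound the spectral norm by ~1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                        # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon update on a 2D weight matrix."""
    momentum_buf.mul_(beta).add_(grad)    # heavy-ball momentum
    weight.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```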
Model Variants
Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
Kimi-K2-Instruct: The post-trained model, best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model that answers directly, without long thinking (see the usage sketch below).
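A minimal chat call against Kimi-K2-Instruct, assuming an OpenAI-compatible serving endpoint; the base_url, api_key, and exact model name are placeholders for whatever your deployment (e.g. vLLM, SGLang, or a hosted API) exposes:

```python
from openai import OpenAI

# base_url, api_key, and model name are placeholders; point them at
# your own Kimi-K2-Instruct deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Kimi-K2-Instruct",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Briefly explain mixture-of-experts routing."},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```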
Model Summary

| Architecture | Mixture-of-Experts (MoE) |
|:---|:---|
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 128K tokens |
| Attention Mechanism | MLA (Multi-head Latent Attention) |
| Activation Function | SwiGLU |
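As a rough sanity check, the 1T/32B split can be reconstructed from this table: each token activates 8 routed experts plus 1 shared expert out of 384, so only a small fraction of the expert weights run per token. A back-of-the-envelope sketch follows; the formulas are deliberate simplifications (all 61 layers treated as MoE, attention/MLA and embedding parameters ignored, which is where most of the remaining activated parameters live):

```python
# Back-of-the-envelope accounting from the table above. Deliberately
# simplified: treats all 61 layers as MoE and ignores attention and
# embedding weights, so the numbers are approximate.
d_model   = 7168          # attention hidden dimension
d_expert  = 2048          # MoE hidden dimension per expert
n_layers  = 61
n_experts = 384
k_routed  = 8             # selected experts per token
n_shared  = 1

# A SwiGLU expert has gate, up, and down projections.
expert_params = 3 * d_model * d_expert             # ~44M per expert

total_expert     = n_layers * n_experts * expert_params
activated_expert = n_layers * (k_routed + n_shared) * expert_params

print(f"expert params, total:     {total_expert / 1e12:.2f}T")    # ~1.03T
print(f"expert params, activated: {activated_expert / 1e9:.1f}B")  # ~24.2B
```

Expert weights alone land near the reported 1T total; attention, embeddings, and the dense first layer account for the gap between ~24B and the reported 32B activated parameters.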