NEW KIMI AI | Coding with a Multimodal Long-Context Brain

Kimi K2 is a large language model series developed by the Moonshot AI team.

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.


Key Features

Large-Scale Training: Pre-trained a 1T parameter MoE model on 15.5T tokens with zero training instability.

MuonClip Optimizer: We apply the Muon optimizer at an unprecedented scale and develop novel optimization techniques to resolve instabilities while scaling up.

Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving; a minimal tool-use sketch follows this list.
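
To make the agentic design concrete, here is a minimal tool-use sketch against an OpenAI-compatible chat-completions endpoint. The base URL, model id, and the get_weather function are illustrative placeholders, not confirmed details of Moonshot AI's API.

```python
# Minimal tool-use sketch, assuming an OpenAI-compatible endpoint.
# The base_url, model id, and get_weather() are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stubbed local tool; a real agent would call an actual service here.
    return json.dumps({"city": city, "temp_c": 21})

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
resp = client.chat.completions.create(
    model="kimi-k2-instruct",  # placeholder model id
    messages=messages,
    tools=tools,
)
choice = resp.choices[0]
if choice.finish_reason == "tool_calls":
    messages.append(choice.message)
    for call in choice.message.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
    final = client.chat.completions.create(
        model="kimi-k2-instruct", messages=messages, tools=tools,
    )
    print(final.choices[0].message.content)
```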


Model Variants

Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions; see the loading sketch after this list.

Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.
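
For researchers starting from the base model, loading it with Hugging Face transformers might look like the following. The checkpoint id is assumed from the variant name above, and serving a 1T-parameter MoE realistically requires a large multi-GPU or multi-node setup; this is a sketch of the workflow, not a tested recipe.

```python
# Sketch: loading Kimi-K2-Base with Hugging Face transformers.
# Assumption: "moonshotai/Kimi-K2-Base" is the checkpoint id (inferred from
# the variant name above); trust_remote_code loads any custom model code.
# Note: at 1T total parameters this needs substantial multi-GPU hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-K2-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard layers across available devices
    trust_remote_code=True,
)

inputs = tokenizer("Mixture-of-experts models scale by", return_tensors="pt")
inputs = inputs.to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```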


https://www.kimi.com

https://aimode.co/app/kimi

Model Summary

Architecture: Mixture-of-Experts (MoE)
Total Parameters: 1T
Activated Parameters: 32B
Number of Layers (dense layer included): 61
Attention Hidden Dimension: 7168
MoE Hidden Dimension (per Expert): 2048
Number of Attention Heads: 64
Number of Experts: 384
Selected Experts per Token: 8
Number of Shared Experts: 1
Vocabulary Size: 160K
Context Length: 128K
Attention Mechanism: MLA
Activation Function: SwiGLU
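
The routing these numbers imply: each token is scored against all 384 experts, the top 8 are softmax-gated and summed, and the single shared expert is always applied, which is why only about 32B of the 1T parameters are active per token. The sketch below uses scaled-down dimensions and random stand-in weights purely to illustrate the mechanism.

```python
# Toy sketch of the routing the table implies: score all experts, keep the
# top-k with softmax gates, and always apply the single shared expert.
# Real dimensions (hidden 7168, 384 experts, top-8, expert dim 2048) are
# scaled down here so the script runs instantly; all weights are random
# stand-ins, not the model's parameters.
import numpy as np

HIDDEN, N_EXPERTS, TOP_K, EXPERT_DIM = 64, 16, 4, 32  # scaled-down stand-ins
rng = np.random.default_rng(0)

def silu(x):
    return x / (1.0 + np.exp(-x))

class Expert:
    """SwiGLU feed-forward block, matching the activation listed above."""
    def __init__(self, rng):
        scale = 1.0 / np.sqrt(HIDDEN)
        self.w_gate = rng.standard_normal((HIDDEN, EXPERT_DIM)) * scale
        self.w_up = rng.standard_normal((HIDDEN, EXPERT_DIM)) * scale
        self.w_down = rng.standard_normal((EXPERT_DIM, HIDDEN)) * scale

    def __call__(self, x):
        return (silu(x @ self.w_gate) * (x @ self.w_up)) @ self.w_down

experts = [Expert(rng) for _ in range(N_EXPERTS)]
shared = Expert(rng)                 # the one shared expert, always active
router_w = rng.standard_normal((HIDDEN, N_EXPERTS)) / np.sqrt(HIDDEN)

token = rng.standard_normal(HIDDEN)
logits = token @ router_w            # one routing score per expert
top = np.argsort(logits)[-TOP_K:]    # indices of the selected experts
gates = np.exp(logits[top] - logits[top].max())
gates /= gates.sum()                 # softmax over the selected experts only

out = shared(token) + sum(g * experts[i](token) for g, i in zip(gates, top))
print(out.shape)                     # (HIDDEN,), same shape as the input token
```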

