Three major capabilities land in Kimi K2-0905 that fundamentally change how coding agents operate: a 256k context window, improved tool calling, and enhanced front-end development. The model is live in Cline via the Groq (serving at ~349 TPS), Moonshot, OpenRouter, Fireworks, and Cline providers.
What changed from July
The July checkpoint put Kimi K2 on the map with strong tool calling and consistent diff generation (a diff edit failure rate of ~5%, which rivals Sonnet-4 at 4% and bests Gemini 2.5 Pro at 10%). K2-0905 builds on that foundation with a focus on the capabilities that matter most for agent workflows.
Context window that actually scales
The jump from 131k to 262k tokens lets you hold larger codebases, conversation histories, and test suites in memory without the typical degradation at context boundaries.
The model's attention mechanism was specifically tuned for long-context scenarios. Token allocation is smarter, coherence is maintained across the full window, and you can finally stop engineering around context limits.
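For a back-of-the-envelope sense of what that window holds, here's a rough sketch that budgets files against the 262k limit. The ~4-characters-per-token heuristic, the output headroom, and the file paths are all assumptions, not tokenizer-accurate numbers:

```python
# Rough check: will these files fit in K2-0905's 262,144-token window?
from pathlib import Path

CONTEXT_WINDOW = 262_144      # 256k window, in tokens
RESERVED_FOR_OUTPUT = 16_000  # hypothetical headroom left for the response


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English and code."""
    return len(text) // 4


def fits_in_context(paths: list[str]) -> bool:
    """Sum estimated tokens across files and compare against the budget."""
    total = sum(estimate_tokens(Path(p).read_text(errors="ignore")) for p in paths)
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
    print(f"~{total:,} tokens used of a ~{budget:,}-token budget")
    return total <= budget


if __name__ == "__main__":
    fits_in_context(["src/app.py", "tests/test_app.py"])  # hypothetical paths
```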
- Speed: Groq delivers responses fast enough that model latency stops being a workflow bottleneck. Its ~349 TPS serving capacity handles production workloads without throttling. Expect some warmup time on first requests (2-3 seconds), but subsequent requests in the same session are significantly faster.
- Context efficiency: The 256k window maintains coherence without the typical degradation you see in other long-context models. Long conversations stay focused, and the model doesn't lose track of earlier context when processing later tokens.
- Tool reliability: Expect consistent structured outputs, with a first-try success rate of ~95% or better on well-formed tool schemas (see the sketch after this list). The model rarely produces malformed JSON or unexpected parameter variations.
- Frontend improvements: Moonshot notes that K2-0905 is better at frontend coding than its predecessor. We recommend using K2-0905 in Act mode, where it can execute on a plan devised by a reasoning model.
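To see what a well-formed tool schema looks like in practice, here's a minimal sketch against an OpenAI-compatible endpoint. The base URL and model id are assumptions; check your provider's docs for the exact identifier:

```python
# Minimal tool-calling sketch for K2-0905 over an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # any OpenAI-compatible provider works
    api_key="YOUR_API_KEY",
)

# A single well-formed tool schema: explicit types, descriptions, and required fields.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Workspace-relative file path",
                },
            },
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",  # assumed OpenRouter id; varies by provider
    messages=[{"role": "user", "content": "Open README.md and summarize it."}],
    tools=tools,
)

# On a successful first try, this is structured JSON, not free text.
print(response.choices[0].message.tool_calls)
```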
Setup in Cline
Kimi-K2-0905 is available via the Cline, Groq, Fireworks, Moonshot, Vercel AI Gateway, and OpenRouter providers. It’s still $1/$3 in/out per 1M tokens through most providers, though as an open-source model, pricing can vary.
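For cost planning, here's a quick back-of-the-envelope calculation at that $1/$3 rate. Treat the constants as the headline figures above, not a quote from any specific provider:

```python
# Estimate session cost at $1 in / $3 out per 1M tokens.
INPUT_USD_PER_M = 1.00
OUTPUT_USD_PER_M = 3.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_USD_PER_M + (output_tokens / 1e6) * OUTPUT_USD_PER_M


# Example: a long agent session with 200k tokens in and 20k tokens out
print(f"${estimate_cost(200_000, 20_000):.2f}")  # -> $0.26
```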