Open source Xiaomi MiMo-V2.5 and V2.5-Pro are among the most efficient (and affordable) at agentic 'claw' tasks



Xiaomi, the Chinese firm best known for its smartphones and electric vehicles, has lately been shipping some incredibly affordable and high-powered open source AI large language models.

The trend continued today with the release of Xiaomi MiMo-V2.5 and Xiaomi MiMo-V2.5-Pro, both available under the permissive, enterprise-friendly MIT License, making them suitable for use in production in commercial applications. Enterprises and individual/independent developers can now download either of the models (and more Xiaomi open source options) directly from Hugging Face, modify them as needed, and run them locally or on virtual private clouds as they see fit.

The most notable attribute of these models, beyond the open source licensing, is that, according to Xiaomi's published benchmarks, they are among the most efficient available for agentic "claw" tasks. These are tasks powering systems such as OpenClaw, NanoClaw and Hermes Agent, in which users communicate with agents directly over third-party messaging apps and dispatch them to complete tasks on their behalf: making and publishing marketing content, running accounts, organizing email and scheduling, and so on.

As Xiaomi's ClawEval benchmark chart shows, both MiMo-V2.5 and, in particular, the Pro version appear near the top left of the chart, indicating high performance on the benchmarked claw tasks while using the fewest tokens. That efficiency saves the human user money, especially as more and more services move to usage-based billing. Microsoft's GitHub Copilot, for example, now charges the human behind the agents for each token used, rather than imposing rate limits like Anthropic or providing an "all-you-can-eat" buffet-style subscription like OpenAI.

In fact, the Pro model leads the open-source field with a 63.8% success rate, consuming only ~70K tokens per trajectory.

This is roughly 40–60% fewer tokens than those required by Anthropic Claude Opus 4.6, Google Gemini 3.1 Pro, and OpenAI GPT-5.4 to achieve comparable results.
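Taken at face value, those percentages imply the competitors spend roughly 117K to 175K tokens on comparable trajectories. A quick back-of-envelope check (assuming the savings are stated relative to the competitors' totals):

```python
# If ~70K tokens is 40-60% fewer than competing models use,
# the competitors' implied usage is 70K / (1 - savings).
mimo_tokens = 70_000

for savings in (0.40, 0.60):
    competitor_tokens = mimo_tokens / (1 - savings)
    print(f"{savings:.0%} savings -> competitor uses ~{competitor_tokens:,.0f} tokens")
```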

By combining a massive 310B-parameter architecture with a highly efficient "active" footprint and a native 1-million-token context window, Xiaomi MiMo is challenging the dominance of closed-source frontier models from Google and OpenAI, especially when it comes to the latest and greatest craze in enterprise AI deployments — agentic tasks and "claws" similar to OpenClaw.

A two-pronged pincer

Xiaomi has released two distinct versions of the model to serve different ends of the development spectrum: MiMo-V2.5 (the "Omni" multimodal specialist) and MiMo-V2.5-Pro (the "Agent" specialist).

While the base model provides native multimodality, the MiMo-V2.5-Pro is specifically engineered for "long-horizon coherence" and complex software engineering.

On the GDPVal-AA (Elo) benchmark, the Pro model achieved a score of 1581, surpassing competitors like Kimi K2.6 and GLM 5.1.

Xiaomi researchers further released data on several high-complexity tasks performed autonomously by V2.5-Pro:

  • SysY Compiler in Rust: The model implemented a complete compiler from scratch—including lexer, parser, and RISC-V assembly backend—in 4.3 hours. Spanning 672 tool calls, the model achieved a perfect 233/233 score on hidden test suites, a task that typically takes a computer science major several weeks.

  • Full-Featured Video Editor: Over 11.5 hours and 1,868 tool calls, the model produced an 8,192-line desktop application featuring multi-track timelines and an export pipeline.

  • Analog EDA Optimization: In a graduate-level engineering task, the model optimized a Flipped-Voltage-Follower (FVF-LDO) regulator in the TSMC 180nm process. By iterating through an ngspice simulation loop, the model improved metrics like line regulation by 22x over its initial attempt.

These experiments highlight a "harness awareness" in V2.5-Pro, where the model actively manages its own memory and shapes its context to sustain coherence over thousands of sequential tool calls.

Over the API, Xiaomi is pricing the models at competitive rates for both domestic (Chinese) and international markets (like the U.S.). For overseas developers, the high-performance MiMo-V2.5-Pro is priced at $1.00 per million input tokens (for a cache miss) and $3.00 for output within context windows up to 256K.

For ultra-long context tasks between 256K and 1M tokens, the cost doubles to $2.00 for input and $6.00 for output, though the architecture’s caching capabilities offer significant relief, reducing input costs to as little as $0.20 to $0.40 per million tokens upon a cache hit.
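To see what these rates imply per agent run, here is a rough cost sketch at the listed Pro rates for contexts up to 256K. The 80% cache-hit ratio, the $0.30 hit rate, and the 60K/10K input/output split are illustrative assumptions, not Xiaomi figures:

```python
# Sketch of MiMo-V2.5-Pro API cost per agent trajectory, using the
# listed rates (USD per 1M tokens, <=256K context): $1.00 input on a
# cache miss, ~$0.20-$0.40 input on a cache hit, $3.00 output.
# The cache-hit ratio below is a hypothetical workload assumption.

def run_cost(input_tokens, output_tokens, cache_hit_ratio=0.8,
             miss_rate=1.00, hit_rate=0.30, output_rate=3.00):
    """Cost in USD for one trajectory; rates are per million tokens."""
    hit = input_tokens * cache_hit_ratio * hit_rate / 1e6
    miss = input_tokens * (1 - cache_hit_ratio) * miss_rate / 1e6
    out = output_tokens * output_rate / 1e6
    return hit + miss + out

# A ~70K-token trajectory, split hypothetically as 60K input / 10K output:
print(f"${run_cost(60_000, 10_000):.4f}")
```

Even without caching, a full 70K-token trajectory at these rates stays well under a dime, which is the crux of the efficiency argument.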

Domestically, these rates are mirrored in yuan, with the Pro model starting at ¥7.00 per million input tokens for standard context and reaching ¥14.00 for the extended 1M range. Meanwhile, the base model starts at just $0.40 USD for overseas input per million tokens and $2.00 per million output, putting it among the more affordable third of leading LLMs globally (see our chart below):

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Total Cost | Source |
|---|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 | $0.70 | xAI |
| MiniMax M2.7 | $0.30 | $1.20 | $1.50 | MiniMax |
| MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo |
| Gemini 3 Flash | $0.50 | $3.00 | $3.50 | Google |
| Kimi-K2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo |
| MiMo-V2.5-Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi MiMo |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| GLM-5-Turbo | $1.20 | $4.00 | $5.20 | Z.ai |
| DeepSeek V4 Pro | $1.74 | $3.48 | $5.22 | DeepSeek |
| GLM-5.1 | $1.40 | $4.40 | $5.80 | Z.ai |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Claude Opus 4.7 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 | OpenAI |

To lower the barrier for agentic development further, Xiaomi has made cache writing free of charge for a limited time across all models, alongside a total fee waiver for the entire MiMo-V2.5-TTS suite, which includes its specialized voice cloning and design features.

This pricing logic is clearly designed to accelerate the transition from simple chat applications to persistent, long-horizon agents that can operate at a fraction of the cost of legacy frontier models.

Xiaomi has also introduced an overhauled version of its subscription offerings, called the "Token Plan," now available in four levels:

  • Lite "Starter Pack": 720 million credits for $63.36 USD per year

  • Standard: 2.4 billion credits for $168.96 per year

  • Pro: 8.4 billion credits for $528.00 per year, designed for enterprise use cases

  • Max: 19.2 billion credits for $1,056.00 per year, aimed at high-intensity coding enthusiasts

Beyond credit allotments, all plans include preferential API rates, a 20% discount for off-peak calls, and "Day-0" support for popular coding scaffolds like Cursor, Zed, and Claude Code.
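One way to compare the tiers is credits per dollar, which rises with plan size. The figures below are computed directly from the listed annual prices and allotments:

```python
# Effective credits-per-dollar across the four Token Plan tiers
# (annual prices and credit allotments as listed above).
plans = {
    "Lite":     (720e6,    63.36),
    "Standard": (2.4e9,   168.96),
    "Pro":      (8.4e9,   528.00),
    "Max":      (19.2e9, 1056.00),
}

for name, (credits, price_usd) in plans.items():
    per_dollar = credits / price_usd / 1e6
    print(f"{name:>8}: {per_dollar:,.1f}M credits per dollar")
```

The volume discount is modest but real: roughly 11.4M credits per dollar on Lite versus about 18.2M on Max.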

However, accessing the Xiaomi models from China, whether through the API or the Token Plan, may present barriers or additional compliance and regulatory risks for U.S.-based enterprise customers. For U.S. enterprises wary of relying on Chinese tech but wanting to take advantage of the low-cost, open source models, the best bet is likely to download the model weights and run them domestically, on their own virtual private clouds or local servers.

MoE architecture but divergent training regimens for V2.5 and V2.5-Pro

At the heart of MiMo-V2.5 is a Sparse Mixture-of-Experts (MoE) architecture. While the model boasts a total of 310 billion parameters, only 15 billion are "active" during any given inference cycle.

Meanwhile, V2.5-Pro is a 1.02-trillion-parameter Mixture-of-Experts model with 42 billion active parameters.

In either case, the design functions much like a specialized research hospital: while the facility has hundreds of doctors (parameters), only the specific specialists required for a particular case (query) are called into the room.

This massive increase in parameter volume for the Pro version provides the "neural capacity" required for the deep, multi-step reasoning found in complex software engineering and long-horizon tasks, as though even more specialists are available in an even larger hospital.
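The hospital analogy maps onto a standard top-k routing mechanism: a small router scores every expert per token, and only the highest-scoring few actually execute. The sketch below is a generic, illustrative MoE forward pass in NumPy with toy dimensions; it is not Xiaomi's actual architecture:

```python
import numpy as np

# Toy sketch of sparse MoE routing: only the top-k experts' parameters
# are "active" per token, mirroring how MiMo-V2.5 activates ~15B of its
# 310B total parameters. All dimensions here are illustrative.
rng = np.random.default_rng(0)

n_experts, d_model, top_k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    logits = x @ router                      # score every expert
    chosen = np.argsort(logits)[-top_k:]     # keep only the top-k
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only the selected experts run; the rest stay idle (sparse compute).
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape, f"active fraction: {top_k / n_experts:.0%}")
```

In this toy setup only 2 of 8 experts fire per token, just as the production models keep most of their parameter count dormant on any given inference step.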

According to Xiaomi's blog post, the regular V2.5 follows a rigorous five-stage evolution:

  1. Text Pre-training: Building a massive language backbone on 48 trillion tokens.

  2. Projector Warmup: Aligning in-house audio and visual encoders with the language core.

  3. Multimodal Pre-training: Scaling across high-quality cross-modal data.

  4. Agentic Post-training: Progressively extending the context window from 32K to 1M tokens.

  5. RL and MOPD: Utilizing Reinforcement Learning and Multimodal Preference Optimization (MOPD) to sharpen real-world reasoning and perception.

The backbone utilizes a hybrid sliding-window attention architecture, inherited from MiMo-V2-Flash, which optimizes how the model "remembers" long-range information. This technical foundation enables MiMo-V2.5 to see, hear, and reason natively, rather than relying on external "plug-in" tools for visual or auditory processing.

Conversely, the training of MiMo-V2.5-Pro prioritizes "action space" over sensory perception. Instead of sensory alignment, the Pro model’s training focus shifts toward scaling post-training compute.

This process is designed to instill "harness awareness," where the model is specifically trained to manage its own memory and context within autonomous agent scaffolds like Claude Code or OpenCode.

While the base V2.5 model is trained to reason across modalities, the Pro version is trained to sustain coherence across more than a thousand sequential tool calls.

The standard V2.5 model balances local and global attention to maintain multimodal perception. The Pro model, however, utilizes an increased hybrid attention ratio—evolving from the 5:1 ratio of previous generations to a more aggressive 7:1 ratio.

This allows the Pro model to "skim" the vast majority of its context while applying high-density attention to the specific 15% of data most relevant to its current objective, a critical feature for debugging large repositories or optimizing graduate-level circuits.
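A generic way to picture the 7:1 hybrid layout is as seven sliding-window layers followed by one full-attention layer. The sketch below builds both causal mask types and compares their density; the window size and sequence length are illustrative assumptions, not MiMo's published configuration:

```python
import numpy as np

# Sketch of a hybrid attention layout: in a 7:1 ratio, seven layers use
# a local sliding window and every eighth layer uses full (global)
# causal attention. Window size and sequence length are illustrative.
def attention_mask(seq_len, window, is_global):
    """Boolean causal mask: True where query i may attend to key j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    if is_global:
        return causal                    # full causal attention
    return causal & (i - j < window)     # attend only within a local window

seq_len, window, ratio = 1024, 128, 7
for layer in range(ratio + 1):
    is_global = (layer + 1) % (ratio + 1) == 0   # last layer in the cycle
    mask = attention_mask(seq_len, window, is_global)
    kind = "global" if is_global else "local "
    print(f"layer {layer}: {kind} density={mask.mean():.1%}")
```

The local layers attend to only a small fraction of the key positions, which is where the compute savings come from, while the periodic global layer preserves long-range recall.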

Finally, while both models undergo Reinforcement Learning (RL) and Multimodal Preference Optimization (MOPD), the objectives of these stages differ.

For MiMo-V2.5, the RL stage is used to sharpen perception and multimodal reasoning. For MiMo-V2.5-Pro, RL is focused on instruction following within agentic scenarios, ensuring the model adheres to subtle requirements embedded deep within ultra-long contexts and recovers gracefully from errors during autonomous execution.

This results in the Pro model's "self-correcting" discipline, as seen in its ability to diagnose and fix regressions during the 4.3-hour SysY compiler build.

Full MIT License is perfect for enterprise use cases

In a move that distinguishes it from many "open" models that include restrictive "Acceptable Use" policies, Xiaomi has released MiMo-V2.5 under the MIT License. The MIT License is the gold standard of permissive software licensing. For developers and enterprises, this means:

  • No Authorization Required: Companies can deploy the model commercially without seeking explicit permission from Xiaomi.

  • Continued Training: Developers are free to fine-tune the model on proprietary data and even release those derivative weights.

  • Unrestricted Commercial Use: There are no revenue caps or user-base limits that often plague "community" licenses.

By choosing MIT over a custom "open weights" license, Xiaomi is positioning MiMo as the foundational infrastructure for the next generation of AI agents, effectively inviting the global developer community to treat the model as a public utility.

Xiaomi's background: from smartphones and EVs to Chinese open source AI darling

Xiaomi’s pivot toward frontier AI agents is the logical culmination of a decade spent building one of the world's most dense hardware-software flywheels.

Founded in 2010 as a smartphone disruptor, the Beijing-based company has executed a high-stakes transition into a vertically integrated powerhouse defined by its "Human x Car x Home" strategy. This ecosystem now encompasses over 823 million connectable smart devices unified under the HyperOS architecture.

The company’s 2024 entry into the automotive sector with the SU7 and the subsequent high-performance YU7 SUV served as a proof of concept for this integration, positioning Xiaomi as a direct competitor to global luxury marques.

By investing 200 billion yuan ($29B USD) into foundational R&D for chips and operating systems, Xiaomi has moved beyond consumer electronics assembly; it has become an architect of the "action space," using its massive hardware footprint as the primary testing ground for the agentic intelligence found in the MiMo-V2.5 series.

Ecosystem support

The release has been met with immediate "Day-0" support from the broader AI ecosystem. The MiMo team announced that SGLang and vLLM—two of the most popular high-throughput inference engines—supported the V2.5 series at launch.

This was made possible through hardware partnerships with AWS, AMD, T-HEAD, and Enflame, ensuring the model can run efficiently on everything from cloud-based H100s to domestic Chinese accelerators.

Fuli Luo, the project lead at Xiaomi MiMo and a former key member of the DeepSeek team, underscored the philosophy behind the release on X (formerly Twitter):

"A model's value isn't measured by rankings alone — it's measured by the problems it solves. Let's build with MiMo now!"

To kickstart this building phase, Luo announced a 100-trillion free token grant for builders and creators. This massive incentive is designed to lower the barrier to entry for developers who want to experiment with the 1M context window without immediate financial risk.

The economic realignment: open source vs. metered proprietary

The launch arrives at a critical juncture for AI economics. The shift toward usage-based billing marks the definitive end of the "all-you-can-eat" buffet era for AI services, a trend underscored by GitHub's announcement today that Copilot, its AI coding assistant, will transition all plans to metered, token-based credits.

As seat-based predictability gives way to consumption-driven costs, premium agentic workflows—which can consume millions of tokens in a single reasoning session—are becoming increasingly difficult for enterprises to budget.

User sentiment has turned predictably cynical, with developers lamenting that they will "get less, but pay the same price" as subscriptions convert into finite allotments. This pricing evolution significantly enhances the strategic appeal of the MiMo series. By releasing under a permissive MIT License, Xiaomi allows organizations to bypass the escalating "SaaS tax" and reclaim financial predictability through private deployment.

Crucially, Xiaomi has eliminated the "context tax" for its API. The 1-million-token context window is now billed at the standard rate—1 token = 1 credit for V2.5 and 2 credits for the Pro version—with no additional multiplier. This stands in stark contrast to the industry-wide move toward session-based caps, positioning MiMo as a refuge for cost-sensitive, high-volume development.
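Under that flat scheme, converting tokens to credits is simple multiplication with no context-length surcharge. A minimal sketch of the stated 1x/2x multipliers (the function name is illustrative):

```python
# Credit consumption under the "no context tax" billing described above:
# 1 token = 1 credit on V2.5 and 2 credits on V2.5-Pro, regardless of
# whether the call uses 4K or the full 1M tokens of context.
def credits_used(tokens, model="v2.5"):
    multiplier = {"v2.5": 1, "v2.5-pro": 2}[model]
    return tokens * multiplier

# A 500K-token agent session:
print(credits_used(500_000, "v2.5"))      # 500,000 credits
print(credits_used(500_000, "v2.5-pro"))  # 1,000,000 credits
```

Against the Token Plan allotments, even the Lite tier's 720 million credits covers hundreds of sessions at this scale.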

Analysis for enterprises

The launch of MiMo-V2.5 is more than just a weight drop; it is a declaration of independence for the open-source community.

By matching Claude Sonnet 4.6 in multimodal agentic work and Gemini 3 Pro in video understanding, Xiaomi has proven that the gap between "closed-door" labs and open research is effectively closed.

With the MIT license as a catalyst and a 100T token grant as fuel, the coming months will likely see a surge in specialized, agentic applications built on the MiMo backbone.

Confirming the project's ambitious trajectory, the team noted they are already training the next generation, focusing on "deeper reasoning" and "richer real-world grounding". For now, MiMo-V2.5 stands as a testament to the power of sparse architectures and permissive licensing in the race toward functional AGI.


