Alibaba's Qwen3.7-Max Ranks First Among Domestic Models: Are Large AI Models Starting to Work Independently?

Did you catch the Alibaba Cloud Summit today, May 20, 2026? Alibaba unveiled a major breakthrough: its brand-new flagship model, Qwen3.7-Max.

Frankly, we used to judge large models mainly by how well they could chat, but the trend has shifted. Today's top-tier models can actually get practical work done.

Solid Blind Test Data

Let's go straight to the latest third-party blind-test results. Qwen3.7-Max outperformed domestic rivals including Kimi-K2.6 and DeepSeek-v4-pro to take the top spot among Chinese large models, and it also kept pace with leading international models such as GPT and Claude.

Here are the key figures:

Coding: It scored 69.7 on Terminal Bench, higher than Claude-Opus 4.6.

Advanced reasoning: It outperformed all domestic models on demanding benchmarks such as GPQA Diamond.

Instruction following: It scored 79.1 on IFBench, suggesting an improved ability to understand and carry out complex instructions.

The Most Impressive Real-World Case

Benchmark scores alone may not convince everyone, but a real-world case demonstrated on site impressed many in the audience.

Alibaba tasked Qwen3.7-Max with working on the Zhenwu M890, a brand-new chip the model had never encountered before. With no human intervention, it ran continuously for 35 hours, completing more than 400 evaluations and over 1,000 tool calls.

In the end, the code the model optimized on its own ran 10 times faster than the original official version. It is as if the AI had upgraded the hardware's drivers by itself, which sounds rather futuristic.

Fierce Competition Among Domestic Large Models

Besides Alibaba, other Chinese tech companies have also been making rapid progress recently:

Tencent Hunyuan: Its token usage has grown tenfold, pointing to rising adoption in office work.

Baidu ERNIE: Its pre-training cost has been cut to 6% of the industry average, a sign of remarkably strong cost control.

StepStars: It has been pre-installed on more than 42 million mobile phones, reaching a growing number of everyday users.

My Take

I strongly agree with Zhou Jingren, an Alibaba executive, who said: "We used to pursue models that speak well; now we demand models that get things done."

Indeed, AI is evolving from a conversational tool into a digital employee that solves practical problems. For ordinary users, this means AI should increasingly take over tedious workflows and handle them efficiently.

Although China still lags slightly behind overseas players in task-executing AI agents, it has clearly embarked on this path. That will inevitably make the agent market more competitive. It also benefits ordinary users, who will have more agents built on large models to choose from to suit their own needs.