DeepSeek’s Next Move: From “Strongest Base Model” to “Most Productive Tool”

Where DeepSeek Stands Today

If you had to sum up DeepSeek’s position in early summer 2026 in one sentence, it would be this: they have proven they can build the best open-weight models; now they need to prove they can turn that strength into real products and revenue.

The V4 series, fully released on April 24, speaks for itself. The top-end V4-Pro runs 1.6T total parameters with 49B activated; the efficient V4-Flash uses 284B total with just 13B activated. Both come standard with 1M context windows, use CSA/HCA hybrid attention to cut KV cache to 10–27% of V3.2 levels, and are fully open-sourced under the MIT license. On SWE-bench Verified, V4 hits 80.6%, surpassing Claude Sonnet 4.5 in agentic coding and closing in on Opus 4.6’s non-thinking mode. Then, on May 30, DeepSeek unveiled Thinking with Visual Primitives—a multimodal architecture using V4-Flash as the language backbone, a self-developed ViT encoder, and support for arbitrary image resolutions.
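To make the sparsity figures concrete, here is a small worked calculation using only the parameter counts and the 10–27% KV cache range quoted above. The `MoEModel` class, the `kv_cache_range_gb` helper, and the 100 GB baseline are illustrative assumptions, not anything DeepSeek publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters activated per token, in billions

    def active_fraction(self) -> float:
        """Share of the model's parameters that are active for each token."""
        return self.active_params_b / self.total_params_b


# Figures as quoted in the article (1.6T = 1600B).
V4_PRO = MoEModel("V4-Pro", total_params_b=1600.0, active_params_b=49.0)
V4_FLASH = MoEModel("V4-Flash", total_params_b=284.0, active_params_b=13.0)


def kv_cache_range_gb(baseline_gb: float, low: float = 0.10, high: float = 0.27):
    """Scale a hypothetical V3.2-level KV cache size by the quoted 10-27% range."""
    return baseline_gb * low, baseline_gb * high


for model in (V4_PRO, V4_FLASH):
    print(f"{model.name}: {model.active_fraction():.1%} of parameters active per token")

# The 100 GB baseline is a made-up number used only to show the scaling.
low_gb, high_gb = kv_cache_range_gb(100.0)
print(f"KV cache for a 100 GB baseline: roughly {low_gb:.0f} to {high_gb:.0f} GB")
```

Both models activate only a few percent of their weights per token (about 3% for V4-Pro and under 5% for V4-Flash), which is where the inference cost advantage comes from.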

At the model level, DeepSeek is already near the ceiling. The real question isn’t whether the next model will be stronger—it’s how that strength turns into usable products and actual income.

Hiring Signals: What DeepSeek Is Actually Building

DeepSeek’s job postings over the past two months tell a clearer story than any press release.

The Harness Team: The Most Direct Signal

In mid-to-late May, DeepSeek quietly posted two new roles: Agent Harness Product Manager and Agent Harness Software Engineer. The descriptions were explicit: “Transform DeepSeek’s frontier model capabilities into leading Agent products,” and “Build desktop Agent products from the ground up.”

Senior researcher Deli Chen put it even more plainly on X: “Join DeepSeek to build the Code Harness from scratch—DeepSeek’s answer to Claude Code.”

The hires themselves are revealing. Cui Tianyi, formerly at Jane Street, was brought in to lead the new AI Harness team—not another paper-writing researcher, but someone who has built real systems and understands what breaks when model outputs hit execution environments. Meanwhile, Xu Mingyu, co-author of DeltaFormer and former ByteDance Seed member, joined the model architecture group—confirming that foundational model work is still very much alive.

Inside DeepSeek, the equation is simple: Model + Harness = Agent. The model is the engine; the Harness is the transmission and steering wheel. Until now, DeepSeek focused almost entirely on the engine. That balance is shifting.

Infrastructure: Moving Beyond Renting GPUs

On June 9, DeepSeek posted an IDC Design & Planning Engineer role in Hangzhou. This isn’t routine ops. The job covers end-to-end data center planning: campus layout, power systems, cooling, networking. Combined with aggressive hiring for senior infrastructure roles in Ulanqab since April, the message is clear: DeepSeek is moving from renting compute to owning compute, with plans for GW-scale self-built clusters.

Two tracks are converging: build the hardware foundation, and build the software layer to run on it. This is no longer a lab; it’s a full-stack technology company taking shape.

Three Directions: What Comes Next?
Direction One: Agent / Code Harness (Claude Code Competitor) — The Most Likely First Launch

Of the three paths, this one has the strongest signals, the clearest team, and the most urgent market window.

Why now? Because Claude Code currently defines the state of the art in AI coding tools—and it isn’t available to developers in China. That gap is DeepSeek’s opening. At the same time, the domestic market for coding agents is exploding. CICC estimates the global AI coding market could reach $23 billion by 2030, with long-term potential near $700 billion. The top three players already control ~70% of the market—first-mover advantage matters.

From the job descriptions, DeepSeek’s Code Harness won’t be a thin CLI wrapper around an API. It’s being designed as a desktop-native agent system, with tool calling, sandbox execution, and file system access—and crucially, with real task feedback flowing back into training loops for RLHF/RLVR. The product will reshape the model, not just consume it.
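What such a harness does can be sketched in a few dozen lines. The example below is a toy illustration of the general pattern (tool calling, a sandboxed working directory, and a recorded trajectory that a verifiable check can score), not DeepSeek’s design. Every name in it (`Harness`, `Step`, `scripted_policy`, `verify`) is hypothetical, and a fixed script stands in for the model.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Step:
    tool: str
    arg: str
    result: str


@dataclass
class Harness:
    workdir: Path
    trajectory: list = field(default_factory=list)

    def _resolve(self, rel: str) -> Path:
        # Keep all file access inside the sandbox directory.
        path = (self.workdir / rel).resolve()
        if not path.is_relative_to(self.workdir.resolve()):
            raise PermissionError(f"{rel} escapes the sandbox")
        return path

    def write_file(self, arg: str) -> str:
        # Argument format (an assumption): first line is the path, the rest is content.
        name, _, content = arg.partition("\n")
        self._resolve(name).write_text(content)
        return f"wrote {name}"

    def read_file(self, arg: str) -> str:
        return self._resolve(arg).read_text()

    def run(self, policy, max_steps: int = 8) -> list:
        tools = {"write_file": self.write_file, "read_file": self.read_file}
        for _ in range(max_steps):
            action = policy(self.trajectory)  # the "model" picks the next tool call
            if action is None:
                break
            tool, arg = action
            try:
                result = tools[tool](arg)
            except Exception as exc:  # failures are fed back to the model as results
                result = f"error: {exc}"
            self.trajectory.append(Step(tool, arg, result))
        return self.trajectory


def scripted_policy(history):
    """Stand-in for a model: replays a fixed list of tool calls."""
    script = [
        ("write_file", "hello.txt\nhi from the agent"),
        ("read_file", "hello.txt"),
        ("read_file", "../secret.txt"),  # should be blocked by the sandbox
    ]
    return script[len(history)] if len(history) < len(script) else None


def verify(workdir: Path) -> float:
    """Verifiable reward: 1.0 if the task's expected file content exists."""
    target = workdir / "hello.txt"
    ok = target.exists() and target.read_text() == "hi from the agent"
    return 1.0 if ok else 0.0


with tempfile.TemporaryDirectory() as tmp:
    harness = Harness(Path(tmp))
    steps = harness.run(scripted_policy)
    score = verify(Path(tmp))

for step in steps:
    print(step.tool, "->", step.result)
print("reward:", score)
```

The recorded `trajectory` plus the `verify` score is the kind of signal that could flow back into an RLVR-style training loop; a production harness would add process isolation, timeouts, and a real model call in place of the script.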

My prediction: Within the next 3–6 months, DeepSeek will launch a developer-focused DeepSeek Code / Agent CLI, likely powered by V4-Flash on the backend and the Harness team’s execution layer. Expect the usual DeepSeek playbook: open-source core, aggressively low API pricing, and performance that competes with closed models. Its edge won’t be UI polish—it’ll be that the model is theirs, the cost is under control, and it already scores 80%+ on SWE-bench.

This is classic vertical integration: a model company building the product layer to outcompete pure application startups.

Direction Two: The Next Foundation Model (V4.1 → V5)

This isn’t speculation—the roadmap is already visible.

While V4 made waves in agentic coding and efficiency, it remains text-only. The expected V4.1 update (targeted for June 2026) is slated to add native multimodal input (image + audio), deep MCP support, and enterprise tooling (fine-tuning, private deployment, permissions). The late-May preview of Thinking with Visual Primitives already demonstrated a viable path for visual understanding.

Beyond that, the Engram module research—co-authored by Liang Wenfeng—points toward a next-gen architecture that reduces attention-layer computation exponentially by storing frequently used information hierarchically. That’s not a V4.x tweak. That’s a V5-level leap.

Prediction: The next major model release will combine native multimodal MoE, deeper agent-native interfaces, and even more extreme inference cost compression. But it will trail the Harness launch—training cycles are a hard constraint, while a Harness product can ship first and iterate on top of V4-Flash.

Direction Three: Multimodal

Multimodal isn’t a separate track—it’s the enabling layer running through everything else.

DeepSeek has been restrained here. Unlike labs that rush out text-to-image demos, DeepSeek seems to be following a different sequence: perfect the language model first, then attach vision. The May 30 technical preview confirms that vision is coming online. V4.1’s expected image and audio inputs aren’t standalone features—they exist so an agent can see screenshots, read schematics, and process recordings.

Expect multimodal to show up first in Code Harness (interpreting UI errors, design mockups), then in enterprise workflows (document parsing, meeting transcription), and only later in consumer-facing generation features—which are unlikely to be a priority.

Putting It All Together: Not Three Choices, But One Timeline

Here’s how the pieces fit:

| Direction | Certainty | Speed | Business Urgency | Likely Form |
| --- | --- | --- | --- | --- |
| ① Code Harness / Agent Product | ★★★★★ | Fastest (team in place, 3–6 month horizon) | Highest (developer mindshare window closing) | CLI / desktop agent, open-source + low-cost API |
| ② Foundation Model (V4.1 → V5) | ★★★★☆ | Slower (training cycles) | High (but model leadership already established) | V4.1: multimodal + MCP; V5: architectural leap |
| ③ Multimodal | ★★★★☆ | Medium (tech ready, productization follows ① & ②) | Medium (enabler, not standalone revenue driver) | Vision-first for agents, expanding to broader use cases |

The first thing DeepSeek ships that makes people say “wow” will almost certainly be an agent product—not a model weight update. Another strong benchmark number is expected. A locally available, Chinese-developed alternative to Claude Code is not.

Meanwhile, V4.1 and V5 will keep moving in parallel—less as competing priorities, more as supply lines for the Harness. Better models make better agents. Better agents generate better data. That loop is where the real race is.

Closing Thought

DeepSeek has always played a clear game: don’t chase the flashiest demo—build the cheapest, strongest model. But with major funding secured, data center plans underway, and the industry shifting from “whose model scores higher” to “whose agent actually works,” the engine alone isn’t enough anymore. You need the chassis, the wheels, and a way to get it into people’s hands.

The hiring over the past two months says it plainly: the Harness team is forming, the infrastructure team is forming, and the model team is still pushing forward. DeepSeek’s next answer won’t be a higher benchmark score. It’ll be something developers open every day to get real work done.

They’ve won the first half with models. The second half will be decided by whether they can build China’s Claude Code—and make it better.
