Which writes better code, GPT-5 or Claude Sonnet 4.6?

For agent coding (Cursor, Cline, Aider, SWE-bench), Claude Sonnet 4.6 is stronger; for single-point reasoning, algorithm problems, and HumanEval-style benchmarks, GPT-5 is stronger. In 2026 most working engineers lean Claude because its tool calling is steadier and it does not edit unrelated code.

How much do the two differ in price?

Claude Sonnet 4.6 is about $3 input / $15 output per million tokens. GPT-5 is about $2.50 / $10. GPT-5 is 30-40% cheaper overall, and Claude Haiku 4.5 is much cheaper than both ($0.80 / $4).

Which is better for Cursor?

Claude Sonnet 4.6 is Cursor's default recommendation, steadier on multi-file edits, tool calling, and not breaking the existing code style. GPT-5 is the best alternative for explaining code, refactoring approaches, and solving tricky problems.

Which has the longer context?

GPT-5 is 400K, Claude Sonnet 4.6 is 200K standard (1M beta). Both recall well within 100K, and beyond 500K Claude is slightly steadier. For repos over 500,000 tokens, Gemini 2.5 Pro is still the recommendation.

深度对比 · 2026-05-10 · by @zayuerweb-dev

GPT-5 vs Claude Sonnet 4.6: Which to Pick for Coding

For serious coding there are really two AI choices: OpenAI's GPT-5 (including Pro) and Anthropic's Claude Sonnet 4.6. I wired both into Cursor and ran them for a week, cross-checked against SWE-bench numbers and official prices. This piece tells you which to use for which job. None of the "they're both great" hand-waving.

30-second verdict

Daily agent coding (Cursor / Cline / Aider): Claude Sonnet 4.6. Steady tool calling, doesn't go off-script, first on SWE-bench.
System design, hard algorithms, fuzzy requirements: GPT-5 / GPT-5 Pro. Leads on asking the right questions and reasoning depth.
Tight budget or high volume: GPT-5 (30-40% cheaper), or go further with Claude Haiku 4.5 / DeepSeek R1.
Loading a whole repo to ask about it: neither is optimal, pick Gemini 2.5 Pro (2 million context).
If you can't decide: Claude as the default, switch to GPT-5 Pro for hard problems.

Core specs compared

Dimension	Claude Sonnet 4.6	GPT-5
Input price (per 1M tokens)	$3.00	$2.50
Output price	$15.00	$10.00
Context window	200K (1M beta)	400K
SWE-bench Verified	~70%	~65%
HumanEval	~94%	~96%
LiveCodeBench	~72%	~78%
Tool-calling reliability	Strongest	Very good
Cursor default	Yes	Alternative

Official price, per 1M tokens, May 2026

Sources: Anthropic / OpenAI official pricing pages, SWE-bench Verified, LiveCodeBench, and public Vellum evaluations, current as of May 2026.

Real scenario 1: fixing a cross-file bug in Cursor

Same task: "the error message doesn't show when login fails, please find and fix it." It spans three files: a front-end component, an API route, and error-handling middleware.

How Claude Sonnet 4.6 did: read the 3 relevant files, traced it to the middleware swallowing the error, produced a patch, passed type checks, and changed only the necessary lines. Done in one pass.

How GPT-5 did: read the same files, found the same root cause, but also "optimized" two unrelated early-return styles in the middleware. The code was correct, but the diff was 3x larger than Claude's. You have to manually strip out the unrelated changes.

Verdict: Claude is more restrained in agent mode. That's why Cursor, Cline, and Aider set it as the default. The bigger the codebase and the more people reviewing PRs, the more this matters.

Real scenario 2: algorithm / system design

"Design a URL shortener that handles 1 million QPS, covering consistency, capacity estimates, and a degradation plan."

GPT-5 Pro: asked clarifying questions first: "Read-heavy or write-heavy? What's the budget? Do you need custom-suffix analytics?" Then gave three designs and noted the trade-offs of each.

Claude Sonnet 4.6: went straight to a complete, good-quality design, but with a weaker instinct to ask questions.

Verdict: for open-ended questions, system design, and interview problems, GPT-5 Pro is clearly steadier. The real reasoning edge is "knowing to ask," not how prettily it writes up the answer.

Real scenario 3: cost-effectiveness

The same 100-file monorepo run through one AI code review. Estimated totals: 800K tokens in, 200K out.

Claude Sonnet 4.6: $3 × 0.8 + $15 × 0.2 = $5.40 per run.
GPT-5: $2.50 × 0.8 + $10 × 0.2 = $4.00 per run.
Claude with prompt caching: ~$1.50.
GPT-5 with batch: ~$2.00.

GPT-5's sticker price is cheaper, but Claude's prompt-caching discount is more aggressive. For a product that reuses the same system prompt repeatedly, Claude's real cost can come out below GPT-5. Test it on your own data.

When GPT-5 is the better fit

Agents that need to ask questions. For example, having the model read a spec before writing code, GPT-5 Pro's question quality is clearly higher.
Competitive programming, LeetCode. GPT-5 edges ahead on HumanEval and LiveCodeBench.
Image understanding alongside. GPT-5's multimodal ability is slightly stronger.
Already in the OpenAI ecosystem. If you use the Assistants API, file search, or Code Interpreter, migration costs are high.
Budget-sensitive. Token prices are 30-40% cheaper.

When Claude is the must-pick

Using Cursor / Windsurf / Cline / Aider. Every major agent tool is tuned most deeply for Claude.
Multi-file refactors. No stray edits, keeps the code style consistent.
Structured output. Tables, comparisons, and normalized documents are steadier with Claude.
Long agent loops (10+ tool calls). Claude's drift rate is clearly lower.
Teams picky about code style. Claude follows existing conventions by default; GPT-5 occasionally takes liberties.

How to use both

The most common engineer setup in 2026:

Cursor defaulting to Claude Sonnet 4.6 for daily edits, completions, and refactors.
Switch to GPT-5 Pro for hard problems (built into Cursor), letting it clarify first, then propose a design.
Switch batch, low-value tasks (lint, doc generation, commit messages) to DeepSeek R1 or Claude Haiku 4.5 to save money.
Switch whole-repo Q&A to Gemini 2.5 Pro, fitting the whole project into 2M context.

Don't bet on a single model. The frontier swaps leaders every 3-6 months. Switch when you can, and don't dig yourself a hole.

Call both with one API key

If you're building your own tool or agent, OpenRouter gives you one OpenAI-compatible endpoint that routes to GPT-5, Claude, DeepSeek, and Gemini, which makes switching and A/B testing easy. Note: OpenRouter has no public referral program; the link below is a plain recommendation.

Go to OpenRouter →

The one-line summary

Claude Sonnet 4.6 as the default, switch to GPT-5 Pro for hard problems, switch batch tasks to DeepSeek R1. Use Gemini for whole-repo Q&A. Don't try to bet on one model. Switching is the best practice for 2026.

→ Compare all models side by side on Check.AI