27 April 2026

Vibe Coding: What the Research Says, and a Tool Comparison


Andrej Karpathy coined the term in February 2025. Collins Dictionary named it word of the year. But what do peer-reviewed studies say about vibe coding’s effectiveness? A data-backed comparison of Claude Code, Codex, Cursor, Windsurf, and Lovable — strengths, weaknesses, and when to use each.

In February 2025, Andrej Karpathy — co-founder of OpenAI and former Tesla AI director — posted on X what would become one of the most-quoted developer statements of the year: “There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”

By March, Merriam-Webster had listed the term as “slang & trending”. By December, Collins Dictionary had named it word of the year for 2025. And throughout that year, a wave of peer-reviewed research tried to answer the question the industry had been asking since February: does it actually work?

The answer is more nuanced than most headlines suggest.

A definition worth clarifying

Vibe coding isn’t the same as no-code. It’s not copying ChatGPT outputs either. Researchers at ICSE 2026 define it as an iterative cycle: formulate a goal in natural language → prompt → review code → test → refine. The human remains the product owner and architect. AI is the implementation partner.

This shifts what expertise is required — but doesn’t eliminate it. You don’t need to know how to write a React hook with optimistic UI updates. You need to know that you need one and why.
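To make that concrete: an optimistic UI update means showing a change before the server confirms it, and rolling back if the call fails. A framework-free TypeScript sketch of the idea (the `Todo` type and `save` callback are invented for illustration, not taken from any particular library):

```typescript
type Todo = { id: number; done: boolean };

// Apply the change locally first, then reconcile with the server result.
async function toggleDone(
  todos: Todo[],
  id: number,
  save: (t: Todo) => Promise<boolean>, // resolves false if the server rejects
): Promise<Todo[]> {
  const optimistic = todos.map(t =>
    t.id === id ? { ...t, done: !t.done } : t
  );
  const changed = optimistic.find(t => t.id === id)!;
  const ok = await save(changed);  // the network call happens after the local update
  return ok ? optimistic : todos;  // roll back to the original state on failure
}
```

Knowing that this rollback path has to exist, and what happens to the user if it doesn't, is the kind of expertise vibe coding still demands.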

What the research says — productivity

Several significant studies were published in 2025. Here’s what the data shows.

Key studies — 2025

101 practitioner sources, 518 documented first-hand accounts of vibe coding. Top motivations: faster prototyping, accessibility for non-developers, idea exploration. Top challenges: security vulnerabilities, technical debt, lack of understanding of generated code.

Multiple real-world apps built using vibe coding. Result: 60–80% faster prototyping, increased creativity, but human oversight required for security, quality, and maintainability.

Task completion time: 2h41min → 1h11min (55% faster). Success rate: 70% → 78%. 87% of developers reported maintaining flow on complex tasks. 31% faster feature development cycles at team level.

AI suggestion acceptance rate: 33%, line-of-code acceptance: 20%. Developer satisfaction score: 72/100. Key takeaway: AI accelerates — it doesn’t replace the thinking process.

New studies — 2026

16 experienced open-source developers, 246 tasks in mature projects (average 5 years of repo experience). Surprising result: AI increased completion time by 19%. Before tasks, developers predicted 24% speedup — misreading subjective feeling for reality. Context: applies to complex, existing codebases — not greenfield projects.

Economic analysis by researchers at CEU Budapest and Kiel Institute. Vibe coding boosts productivity by making open-source easier to use — but simultaneously eliminates user engagement (bug reports, docs, maintainer support) that sustains the OSS ecosystem. Conclusion: under widespread vibe coding, existing OSS business models are financially unsustainable.

Preregistered cross-sectional study, N=100 students. Result: writing ability and CS knowledge are the strongest predictors of vibe coding effectiveness — stronger than general cognitive ability. Vibe coding does not eliminate the knowledge barrier — those with solid CS fundamentals and clear thinking extract far more value from it.

Data from task management, IDE, static analysis and CI/CD systems over 2 years. Over 75% of developers use AI coding assistants — but organisational productivity gains are just ~10%. Developers feel faster; company-level software delivery metrics don’t confirm it.

Task completion time with and without AI (minutes, GitHub Research 2024, n=95)

Controlled conditions, 95 professional developers. 55% time reduction with Copilot, over 67% with advanced agents.

What the research says — code quality

This is where the data gets less comfortable for vibe coding enthusiasts.

GitClear analysed 211 million changed lines of code from 2020–2024 and identified what they call AI-induced tech debt. The results are sobering:

Code quality degradation — GitClear 2025 metrics (value “1” = 2020 baseline of human-written code)

Source: GitClear AI Copilot Code Quality Research 2025, 211M lines of code. CodeRabbit (470 PRs, December 2025): AI co-authored code has 1.7x more “major issues”.

An independent CodeRabbit analysis from December 2025 (470 pull requests) confirmed: AI-assisted code contains 1.7x more serious defects — primarily logic errors, flawed control flow, and incorrect dependencies.

The vibe coding paradox: 55% faster, but 2.74x more security vulnerabilities. Both numbers are true simultaneously.

Code refactoring dropped from 25% of changed lines in 2021 to under 10% in 2024. Duplication grew 8x. Copy-pasted code exceeded moved code for the first time in two decades.

Security case — Lovable (May 2025)

170 out of 1,645 apps built with Lovable had a vulnerability allowing access to user personal data without authentication. The apps were live in production. None displayed any security warning.
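The report doesn't publish the offending code, but the vulnerability class (personal data readable with no session check at all) typically reduces to something like this hypothetical TypeScript sketch, with names and data invented for illustration:

```typescript
type User = { id: string; email: string };
const db = new Map<string, User>([["u1", { id: "u1", email: "a@example.com" }]]);

// Vulnerable: trusts the client-supplied id and never checks who is asking.
function getProfileVulnerable(requestedId: string): User | undefined {
  return db.get(requestedId);
}

// Fixed: the caller must present a session, and may only read their own row.
function getProfile(
  sessionUserId: string | null,
  requestedId: string,
): User | undefined {
  if (sessionUserId === null || sessionUserId !== requestedId) return undefined;
  return db.get(requestedId);
}
```

Generated backends tend to produce the first version because it passes every happy-path test; only an explicit review step catches the missing authorization check.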

Tool comparison

The ecosystem grew fast. Two categories: IDE assistants (Claude Code, Cursor, Windsurf, Codex) and app builders (Lovable, Bolt, v0). They differ fundamentally in target audience and use case.

Claude Code ($20–200/mo)
Best for: senior devs, complex projects.
Strengths: SWE-bench leader (79.6% Sonnet 4.6, 87.6% Opus 4.7); best context retention across 40+ files simultaneously; precise cross-file refactoring; terminal-first, no unnecessary GUI.
Weaknesses: terminal only, no GUI; slower on simple one-off tasks; higher cost under heavy usage.

Cursor ($20/mo)
Best for: developers, teams.
Strengths: full IDE (VS Code fork), 1M+ users; up to 8 parallel agents with auto-judge; .cursorrules for project context; largest ecosystem (360k paying customers, $29.3B valuation).
Weaknesses: loses context on very large refactors; lock-in to its own IDE, with no JetBrains/Vim support.

Windsurf ($20/mo)
Best for: devs, multi-IDE users.
Strengths: Cascade (persistent agentic context, self-recovery); plugins for 40+ IDEs (JetBrains, Vim, Xcode, NeoVim); #1 in the LogRocket AI Dev Tool Rankings (Feb 2026); acquired by Cognition for $250M.
Weaknesses: smaller ecosystem than Cursor; strategic uncertainty post-acquisition by Cognition (makers of Devin).

Codex (OpenAI) (included in a ChatGPT plan)
Best for: devs, enterprise GPT users.
Strengths: open-source CLI; built-in web search enabled by default; MCP server support; SWE-bench ~85% (GPT-5.3-Codex); optimised for low latency; image input support (screenshots, wireframes).
Weaknesses: newer tool, smaller community; interface less polished than Cursor/Claude Code; less control over the execution environment.

Lovable ($25–50/mo)
Best for: non-devs, MVPs.
Strengths: fastest start, from description to working app in minutes; great UI/UX output; zero technical knowledge required; perfect for validating an idea fast.
Weaknesses: documented security vulnerabilities (170/1,645 apps); struggles with changing requirements; not suitable for complex systems.

Bolt.new ($20/mo)
Best for: non-devs, prototypes.
Strengths: StackBlitz in the browser, zero installation; fast start; good for demos and showcases.
Weaknesses: loses coherence on requirement changes; similar limitations to Lovable, not production-ready without an audit.

v0 (Vercel) ($20/mo)
Best for: designers, frontend devs.
Strengths: best for UI components (React/Next.js/shadcn); tight Vercel integration; precise styling output; great for designers with minimal JS knowledge.
Weaknesses: narrow scope, frontend/UI only; doesn’t replace a full coding assistant.
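A note on one item above: Cursor's .cursorrules is a plain-text file at the repository root whose contents Cursor injects as standing project context, so conventions survive between prompts. A hypothetical example (contents invented for illustration):

```
# .cursorrules (hypothetical example)
- Use TypeScript strict mode; never introduce `any`.
- Every API handler must verify the session before touching the database.
- Prefer small, tested functions; keep components free of business logic.
```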

SWE-bench: how real coding ability is measured

SWE-bench Verified is a benchmark built from real GitHub bugs — not synthetic tasks. The model receives a repository and an issue, and must independently write a patch that passes the tests. It’s the most credible measure of an AI agent’s real-world coding capability.
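Conceptually, the scoring loop can be sketched as follows. This is a simplified TypeScript sketch, not the real harness (which is in Python and additionally verifies that previously passing tests keep passing):

```typescript
type Instance = {
  repo: string;
  issue: string;
  // Stand-in for running the instance's held-out test suite against a patch.
  goldTestsPass: (patch: string) => boolean;
};

// An agent sees only the repository and the issue text, and must emit a patch.
function evaluate(
  instances: Instance[],
  agent: (repo: string, issue: string) => string,
): number {
  let resolved = 0;
  for (const inst of instances) {
    const patch = agent(inst.repo, inst.issue);
    if (inst.goldTestsPass(patch)) resolved++; // resolved = hidden tests now pass
  }
  return (100 * resolved) / instances.length;  // score = % of issues resolved
}
```

The scores in the chart below are this percentage over the benchmark's curated set of real issues.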

SWE-bench Verified — top model scores (April 2026, % of issues resolved)

From 48.5% (GPT-4 Turbo, November 2023) to 87.6% (Claude Opus 4.7, April 2026) in under 2.5 years. The rate of improvement is as striking as the score itself.

Which tool to choose — decision map

🎯 Large codebase, deep refactoring, 40+ files at once?
→ Claude Code. Best context retention, SWE-bench leader.

Want a full AI-native IDE, the largest ecosystem, parallel agents?
→ Cursor. Market leader, 1M+ users, VS Code fork.

🔧 JetBrains, Vim, Xcode — don’t want to switch editors?
→ Windsurf. The only tool with plugins for 40+ IDEs.

🌐 OpenAI/GPT-5 ecosystem, need live web search and MCP?
→ Codex CLI. Open-source, built-in web search, good for tasks requiring current data.

🚀 Non-developer, want to test an app idea in an hour?
→ Lovable or Bolt. No technical knowledge required. Get a security audit before going live.

The best 2026 strategy: Claude Code or Codex for complex agentic tasks; Cursor or Windsurf for everyday IDE coding; Lovable / Bolt / v0 for fast prototyping without technical expertise. Most experienced teams use multiple tools simultaneously — depending on the task.

Vibe coding — what it means

Karpathy, who coined the term, later admitted publicly that he hand-coded his next project — because it required precision that vibe coding couldn’t deliver. That’s a good metaphor for the whole phenomenon.

Vibe coding doesn’t replace programming. It dramatically lowers the barrier to entry — and that’s it. The data shows 55% speed gains alongside 2.74x more security vulnerabilities. Both numbers are true simultaneously. The question is no longer “whether to use AI for coding” — that’s settled.

The question is: when does a human step in and take responsibility for what AI wrote? For a prototype — maybe never. For a system processing real user data — always, before the code goes live.


Sources: Karpathy (X, Feb 2025) · arXiv:2510.00328 / arXiv:2510.12399 (ICSE 2026) · IJSAT 2025 · GitHub/Microsoft Research 2024 (n=95) · GitClear 2025 (211M lines) · CodeRabbit (470 PRs, Dec 2025) · ZoomInfo Enterprise Study (Jan 2025) · METR RCT, arXiv:2507.09089 · arXiv:2601.15494 “Vibe Coding Kills OSS” · arXiv:2603.14133 · Faros AI (22K devs) · SWE-bench leaderboard (Apr 2026) · Collins Dictionary Word of the Year 2025
