In February 2025, Andrej Karpathy — co-founder of OpenAI and former Tesla AI director — posted on X what would become one of the most-quoted developer statements of the year: "There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."
By March, Merriam-Webster had listed the term as "slang & trending". By December, Collins Dictionary had named it word of the year for 2025. And throughout that year, a wave of peer-reviewed research tried to answer the question the industry had been asking since February: does it actually work?
The answer is more nuanced than most headlines suggest.
A definition worth clarifying
Vibe coding isn’t the same as no-code. It’s not copying ChatGPT outputs either. Researchers at ICSE 2026 define it as an iterative cycle: formulate a goal in natural language → prompt → review code → test → refine. The human remains the product owner and architect. AI is the implementation partner.
This shifts what expertise is required — but doesn’t eliminate it. You don’t need to know how to write a React hook with optimistic UI updates. You need to know that you need one and why.
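The cycle described above can be sketched as a small program. This is an illustrative stub, not any vendor's actual API: `generate_code` stands in for a real LLM call, and the sample goal, function names, and tests are all invented for the example.

```python
# A minimal sketch of the vibe-coding loop: goal -> prompt -> review
# code -> test -> refine. generate_code is a stub for an LLM call.

def generate_code(goal: str, feedback: str) -> str:
    # Stub: a real version would send the goal plus any test feedback
    # from the previous iteration to an LLM and return its code.
    return "def add(a, b):\n    return a + b\n"

def review_and_test(code: str) -> tuple[bool, str]:
    # The review/test step: execute the generated code against tests
    # the human (product owner) controls.
    namespace = {}
    try:
        exec(code, namespace)
        assert namespace["add"](2, 3) == 5
        return True, "all tests passed"
    except Exception as exc:
        return False, f"test failure: {exc}"

def vibe_loop(goal: str, max_iters: int = 5) -> str:
    feedback = ""
    for _ in range(max_iters):
        code = generate_code(goal, feedback)   # formulate goal -> prompt
        ok, feedback = review_and_test(code)   # review code -> test
        if ok:
            return code                        # goal met, stop refining
    raise RuntimeError(f"gave up: {feedback}") # refinement budget spent

result = vibe_loop("write an add(a, b) function")
```

Note where the expertise sits: the human writes the goal and the tests; the loop only decides when to stop.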
What the research says — productivity
Several significant studies were published in 2025, with more following in early 2026. The headline number comes from controlled conditions with 95 professional developers: a 55% time reduction with Copilot, and over 67% with advanced agents.
What the research says — code quality
This is where the data gets less comfortable for vibe coding enthusiasts.
GitClear analysed 211 million changed lines of code from 2020–2024 and identified what they call AI-induced tech debt. The results, from the GitClear AI Copilot Code Quality Research 2025 report, are sobering: refactoring dropped from 25% of changed lines in 2021 to under 10% in 2024, duplication grew 8x, and copy-pasted code exceeded moved code for the first time in two decades.
An independent CodeRabbit analysis from December 2025 (470 pull requests) confirmed the pattern: AI-assisted code contains 1.7x more "major issues" — primarily logic errors, flawed control flow, and incorrect dependencies.
The vibe coding paradox: 55% faster, but 2.74x more security vulnerabilities. Both numbers are true simultaneously.
Security case — Lovable (May 2025)
170 out of 1,645 apps built with Lovable had a vulnerability that allowed access to users' personal data without authentication. The apps were live in production. None displayed any security warning.
Tool comparison
The ecosystem grew fast, splitting into two categories: IDE assistants (Claude Code, Cursor, Windsurf, Codex) and app builders (Lovable, Bolt, v0). They differ fundamentally in target audience and use case.
| Tool | Best for | Strengths | Weaknesses | $/mo |
|---|---|---|---|---|
| Claude Code | Senior devs, complex projects | SWE-bench leader (79.6% Sonnet 4.6, 87.6% Opus 4.7). Best context retention across 40+ files simultaneously. Precise cross-file refactoring. Terminal-first, no unnecessary GUI. | Terminal only — no GUI. Slower on simple one-off tasks. Higher cost under heavy usage. | $20–200 |
| Cursor | Developers, teams | Full IDE (VS Code fork), 1M+ users. Up to 8 parallel agents with auto-judge. .cursorrules for project context. Largest ecosystem (360k paying customers, $29.3B valuation). | Loses context on very large refactors. Lock-in to its own IDE — no JetBrains/Vim support. | $20 |
| Windsurf | Devs, multi-IDE users | Cascade (persistent agentic context, self-recovery). Plugins for 40+ IDEs (JetBrains, Vim, Xcode, Neovim). #1 LogRocket AI Dev Tool Rankings (Feb 2026). Acquired by Cognition for $250M. | Smaller ecosystem than Cursor. Strategic uncertainty post-acquisition by Cognition (makers of Devin). | $20 |
| Codex (OpenAI) | Devs, enterprise GPT | Open-source CLI. Built-in web search enabled by default. MCP server support. SWE-bench ~85% (GPT-5.3-Codex). Low-latency optimised. Image input support (screenshots, wireframes). | Newer tool, smaller community. Interface less polished than Cursor/Claude Code. Less control over the execution environment. | in ChatGPT plan |
| Lovable | Non-devs, MVPs | Fastest start — from description to working app in minutes. Great UI/UX output. Zero technical knowledge required. Perfect for validating an idea fast. | Documented security vulnerabilities (170/1,645 apps). Struggles with changing requirements. Not suitable for complex systems. | $25–50 |
| Bolt.new | Non-devs, prototypes | StackBlitz in the browser — zero installation. Fast start. Good for demos and showcases. | Loses coherence on requirement changes. Similar limitations to Lovable — not production-ready without audit. | $20 |
| v0 (Vercel) | Designers, frontend devs | Best for UI components (React/Next.js/shadcn). Perfect Vercel integration. Precise styling output. Great for designers with minimal JS knowledge. | Narrow scope — frontend/UI only. Doesn’t replace a full coding assistant. | $20 |
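The `.cursorrules` file mentioned in the Cursor row is a plain-text file of natural-language instructions at the repository root, which Cursor feeds to the agent as standing project context. A hypothetical example — every project detail below is invented for illustration:

```text
# .cursorrules — standing instructions for the AI agent (illustrative)
You are working on a TypeScript monorepo using pnpm workspaces.
- Prefer functional React components with hooks; no class components.
- Every new module needs unit tests before the task is considered done.
- Never edit files under generated/ — they are build artifacts.
- Use the shared logger in packages/core/logger.ts, not console.log.
```

The point of the file is repeatability: context the developer would otherwise re-type in every prompt travels with the repo instead.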
SWE-bench: how real coding ability is measured
SWE-bench Verified is a benchmark built from real GitHub bugs — not synthetic tasks. The model receives a repository and an issue, and must independently write a patch that passes the tests. It’s the most credible measure of an AI agent’s real-world coding capability.
From 48.5% (GPT-4 Turbo, November 2023) to 87.6% (Claude Opus 4.7, April 2026) in under 2.5 years. The rate of improvement is as striking as the score itself.
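The protocol is simple enough to sketch end to end. Below is a toy, self-contained version of the idea — given a "repo" with a failing test, a candidate patch resolves the issue only if the test suite passes after it is applied. The module, the bug, and the patch are all invented for illustration; real SWE-bench uses full GitHub repositories and unified diffs.

```python
# Toy illustration of SWE-bench-style evaluation: the only signal is
# whether the repo's tests pass after the candidate patch is applied.
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

BUGGY = "def is_even(n):\n    return n % 2 == 1  # bug: inverted\n"
PATCH = "def is_even(n):\n    return n % 2 == 0\n"  # the agent's fix
TEST = textwrap.dedent("""
    from mymodule import is_even
    assert is_even(4) and not is_even(3)
    print("PASS")
""")

def evaluate(candidate_source: str) -> bool:
    # "Apply the patch" by writing the candidate source into a fresh
    # copy of the repo, then run the issue's test in a subprocess.
    with tempfile.TemporaryDirectory() as repo:
        (Path(repo) / "mymodule.py").write_text(candidate_source)
        (Path(repo) / "test_issue.py").write_text(TEST)
        result = subprocess.run(
            [sys.executable, "test_issue.py"],
            cwd=repo, capture_output=True, text=True,
        )
        return result.returncode == 0

resolved_before = evaluate(BUGGY)  # the issue reproduces: tests fail
resolved_after = evaluate(PATCH)   # the patch resolves it: tests pass
```

Because pass/fail on real regression tests is the only score, the benchmark is hard to game with plausible-looking but wrong code — which is what makes it a credible measure.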
Which tool to choose — decision map
Senior devs, complex projects → Claude Code. Best context retention, SWE-bench leader.
Everyday IDE coding in VS Code → Cursor. Market leader, 1M+ users, VS Code fork.
JetBrains, Vim, or another IDE → Windsurf. The only tool here with plugins for 40+ IDEs.
Open-source CLI, tasks requiring current data → Codex CLI. Built-in web search.
Non-devs building an MVP → Lovable or Bolt. No technical knowledge required, but get a security audit before going live.
The best 2026 strategy: Claude Code or Codex for complex agentic tasks; Cursor or Windsurf for everyday IDE coding; Lovable / Bolt / v0 for fast prototyping without technical expertise. Most experienced teams use multiple tools simultaneously — depending on the task.
Vibe coding — what it means
Karpathy, who coined the term, later admitted publicly that he hand-coded his next project — because it required precision that vibe coding couldn’t deliver. That’s a good metaphor for the whole phenomenon.
Vibe coding doesn't replace programming. It dramatically lowers the barrier to entry, and that's it. The data shows 55% speed gains alongside 2.74x more security vulnerabilities. The question is no longer "whether to use AI for coding"; that's settled.
The question is: when does a human step in and take responsibility for what AI wrote? For a prototype — maybe never. For a system processing real user data — always, before the code goes live.
Sources: Karpathy (X, Feb 2025) · arXiv:2510.00328 / arXiv:2510.12399 (ICSE 2026) · IJSAT 2025 · GitHub/Microsoft Research 2024 (n=95) · GitClear 2025 (211M lines) · CodeRabbit (470 PRs, Dec 2025) · ZoomInfo Enterprise Study (Jan 2025) · METR RCT, arXiv:2507.09089 · arXiv:2601.15494 "Vibe Coding Kills OSS" · arXiv:2603.14133 · Faros AI (22K devs) · SWE-bench (Apr 2026) · Collins Dictionary Word of the Year 2025