April 2024. I’ve just pushed the first working version of naswoim.org to production. The app runs. Users can log in. Data persists in Supabase. On paper, it’s a success.
Under the hood: two competing design systems, seventeen duplicate utility functions scattered across twelve files, and a Supabase RLS policy that silently fails for one specific edge case I won’t discover for three more weeks.
Vibe coding works. But it fails in ways that are invisible — until they aren’t.
Quick context: what I built
Over roughly twelve months, I built three applications using vibe coding — with almost no help from external developers:
- naswoim.org — a platform for property investors: checklists, budgets, documents, expert marketplace, land maps, AI assistant. Web + Android + iOS + admin portal.
- industrverse.com — B2B SaaS for industrial VR training: 7 user roles, 9 dashboards per role, real-time communication, VR session gateway, full backend API.
- marcinpaszkiewicz.com — this site. Astro SSR + WordPress headless. Simpler, but instructive.
I used Claude Code as my primary tool, with occasional help from Cursor. I wrote somewhere between 40,000 and 60,000 lines of production code this way. I shipped everything. It all works.
Here’s what I got wrong.
Mistake #1: I let AI pick the architecture
When I started naswoim.org, I described the project to Claude and asked what stack it recommended. It gave me a solid answer: React 19, Vite, Supabase, Tailwind CSS 4. All excellent choices.
Then I asked about UI components. It suggested MUI 7. I said yes — it was fast, it had everything I needed.
The problem: I was already using Tailwind CSS 4. Now I had two design systems:
Diagram — design system conflict
existing screens (Tailwind CSS 4)
▸ spacing: rem scale
▸ breakpoints: sm/md/lg/xl
▸ tokens: CSS vars
new screen (MUI 7)
▸ spacing: 8px grid
▸ breakpoints: xs/sm/md/lg
▸ tokens: theme object
The lesson: AI doesn’t know your 18-month vision. It optimizes for "working right now." Architecture decisions — especially around design systems, data models, and module boundaries — must come from you. AI implements. You decide what to implement.
What I’d do differently: write a one-page architecture decision record before the first prompt. Not a full spec — just: what’s the single source of styling truth? What’s the state management philosophy? How are we splitting modules? Give AI constraints, not blank permission.
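A sketch of what that one-page record could look like — the specific choices below are illustrative examples for a stack like this one, not recommendations:

```markdown
# Architecture decisions — example skeleton

## Styling
- Single source of truth: Tailwind CSS 4. No second component library
  that ships its own theme system (no MUI, no Chakra).
- Design tokens live in CSS variables; components consume them via Tailwind.

## State management
- Server state: TanStack Query. Client state: Zustand.
- No ad-hoc Context stores for app state.

## Module boundaries
- Features are vertical slices: features/<name>/{ui, api, model}.
- Cross-feature imports only through a feature's public index file.
```

Even a skeleton this thin gives the AI something to check its suggestions against — and gives you something to say no with.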
Mistake #2: AI never says no — and that’s dangerous
By the time I was three months into industrverse, the backend had 13 NestJS modules, 7 user roles, and 9 separate dashboards. Each role had its own data access logic, its own notification system, its own workflow.
None of it was in the original spec.
industrverse — MVP plan vs. reality
▸ backend modules: planned 5 → actual 13
▸ user roles: planned 3 → actual 7
▸ dashboards: planned 3 → actual 9
Here’s what happens: you have an idea at 11pm. You describe it to Claude. It builds it in twenty minutes. It works. You ship it. Three weeks later, you realize that adding this feature broke the mental model for the next feature. But AI doesn’t tell you this — it just builds what you ask.
Every developer on a team has a colleague who says "wait — are we sure we need this?" AI doesn’t say wait. AI says yes.
The lesson: You must be the PM for your AI. Not just the visionary — the person who says no. The question isn’t "can AI build this?" (it can). The question is "should this exist at all?"
I now have a rule before any new feature: write one sentence about what problem this solves for a specific user. If I can’t write that sentence, I don’t prompt it.
Mistake #3: Debugging code you didn’t write is slower than it looks
In naswoim.org, I had a bug in the Supabase Row Level Security policies. Users in one role could occasionally see documents they shouldn’t — but only when a specific sequence of operations had happened first.
It took me three days to find it.
Not because the bug was complex. Because the code was AI-generated and I hadn’t read it carefully enough when it was written. The RLS policy looked right. It was syntactically correct. It passed my basic tests. The edge case was subtle — a combination of two different policy conditions that interacted in a non-obvious way.
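One common way two individually-correct RLS conditions interact badly: Postgres combines multiple permissive policies with OR, so a row is visible if any one policy passes — adding a second policy can silently widen access granted by the first. A minimal model of that mechanism in TypeScript; the roles, fields, and policy logic here are hypothetical illustrations, not the actual naswoim.org policies:

```typescript
// Model of how Postgres combines multiple PERMISSIVE RLS policies:
// a row is visible if ANY policy passes (logical OR).
// All names and conditions below are hypothetical.

type User = { id: string; role: "investor" | "expert"; orgId: string };
type Doc = { ownerId: string; orgId: string; sharedWithExperts: boolean };

// Policy 1: owners can see their own documents.
const ownerPolicy = (u: User, d: Doc) => d.ownerId === u.id;

// Policy 2 (added later): experts can see documents shared with experts.
// Looks correct in isolation — but it forgot to scope by organization.
const expertPolicy = (u: User, d: Doc) =>
  u.role === "expert" && d.sharedWithExperts;

// Postgres OR-combines permissive policies, so adding expertPolicy
// silently widened access for every expert, across organizations.
const visible = (u: User, d: Doc) =>
  ownerPolicy(u, d) || expertPolicy(u, d);

const outsideExpert: User = { id: "u2", role: "expert", orgId: "org-B" };
const doc: Doc = { ownerId: "u1", orgId: "org-A", sharedWithExperts: true };

console.log(visible(outsideExpert, doc)); // true — leaks across organizations
```

Each policy passes review on its own; only the OR-combination leaks. In SQL terms, the fix is to carry the same scoping condition into every permissive policy, or to express must-always-hold conditions as restrictive policies.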
When you write code yourself, you build a mental model of it as you write. When AI writes it, you review it — which is faster, but shallower. The model in your head is less complete. And shallow mental models make debugging slow.
The lesson: Never merge AI code you can’t explain line by line. For anything touching auth, data access, or business-critical logic: read it like a code reviewer, not like someone checking a shopping list.
Mistake #4: Context ends — and AI "forgets" everything
In a long Claude Code session, the AI sees everything you’ve built together. It knows your naming conventions, your patterns, your preferences. It’s coherent.
In the next session, it starts fresh.
Diagram — context memory across sessions. Competing patterns that accumulated in one codebase:
▸ TanStack Query
▸ Zustand atoms
▸ local useState
▸ API calls inline
▸ Context API
▸ Axios interceptors
In naswoim.org, I started a new session after a two-day break and asked Claude to build a new feature component. It generated something that worked — but used completely different patterns from everything else in the codebase. Different state management approach. Different error handling style. Different naming.
By month four, the codebase had three distinct "eras" — each reflecting the conventions of whoever I’d been talking to at that time.
The lesson: A CLAUDE.md file is not optional. Set it up on day one. It should contain: naming conventions, patterns to follow, patterns to avoid, which libraries to use for which problems. This is the persistent memory that bridges sessions.
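A sketch of what such a file could contain, mirroring the categories above — the concrete conventions listed are illustrative, not a template to copy:

```markdown
# CLAUDE.md — example skeleton

## Naming conventions
- Components: PascalCase, one per file. Hooks: useX. Files: kebab-case.

## Patterns to follow
- Server state: TanStack Query. Client state: Zustand.
- Errors: typed result objects from services; no throwing across layers.

## Patterns to avoid
- No new Context providers for app state.
- No inline fetch/axios calls in components — go through the api layer.

## Libraries (which tool for which problem)
- Styling: Tailwind CSS 4 only. Data fetching: TanStack Query. No MUI.
```

The point is not completeness — it is that every new session starts from the same conventions instead of inventing a new "era."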
Mistake #5: Security is invisible until it isn’t
AI generates working code. It doesn’t reliably generate secure code.
In industrverse, I had an API endpoint that was supposed to be accessible only to users with the "trainer" role. The endpoint worked correctly. It returned the right data. It handled errors gracefully.
It also didn’t verify the JWT role claim on one specific HTTP method. A user with any authenticated token could call it.
I found this in a manual security review — not because Claude flagged it, not because my tests caught it. Because I sat down and read through every auth-related endpoint one afternoon.
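This failure mode — a role check wired up handler by handler, where one HTTP method can be missed — is easier to avoid when authorization is declared once per resource and enforced for every method. A framework-agnostic TypeScript sketch; the path, role names, and claim shape are illustrative assumptions:

```typescript
// Hypothetical sketch: declare the required role once per resource,
// so the check applies to every HTTP method instead of being repeated
// (and potentially forgotten) in each handler.

type Method = "GET" | "POST" | "PUT" | "DELETE";
type Claims = { sub: string; role?: string }; // decoded, signature-verified JWT payload

const requiredRole: Record<string, string> = {
  "/training-sessions": "trainer", // applies to ALL methods on the resource
};

function authorize(path: string, _method: Method, claims: Claims): boolean {
  const needed = requiredRole[path];
  if (!needed) return true;       // no role requirement on this path
  return claims.role === needed;  // missing or wrong role claim → deny
}

// Any authenticated token without the "trainer" role is rejected,
// regardless of which method it uses:
console.log(authorize("/training-sessions", "DELETE", { sub: "u1" })); // false
console.log(authorize("/training-sessions", "GET", { sub: "u2", role: "trainer" })); // true
```

With the per-handler approach, forgetting one method produces exactly the bug described above; with a per-resource table, a missing entry fails loudly for every method at once.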
Security review — checklist after every auth feature
▸ Is every endpoint verifying the JWT/session?
▸ Is the user role checked server-side?
▸ Does RLS work for all role combinations?
▸ Is input validated before writing to the database?
▸ Are sensitive fields filtered out of the response?
▸ Are the edge cases (missing role, expired token) handled?
The lesson: After every feature that touches authentication, authorization, or user data — do a manual security review. Not a vibe. A checklist.
What I’d do differently: 6 rules
If I started today, with everything I know now:
What I’m not saying
I’m not saying vibe coding is flawed or that AI tools are unreliable. All three projects I built work. They have real users. They solve real problems. The productivity gain is genuine — I built in one year what would have taken a team of three eighteen months.
What I’m saying is that the failure modes are specific, and they’re not obvious at the start.
The biggest risk in vibe coding isn’t that AI writes bad code. It’s that AI writes code that looks fine — until something goes wrong. And by then, you’re looking at a codebase you half-understand, with a bug you didn’t write, and a mental model that has gaps in exactly the wrong places.
The speed is real. Build the habits that make it safe.