VIBE CODING
"Just vibe with it 🎵"
Andrej Karpathy coined "vibe coding" — where you let the AI do everything and just vibe with the result. Let's talk about what that actually means.
What IS vibe coding?
Human is NOT in the loop
Never read the generated code
Just prompt harder 💪
The key defining feature: you don't read the code. You describe what you want, accept what you get, and ship it. Some people have shipped entire apps this way.
🔥
This is fine.
The app is on fire but the tests pass
— Every vibe coder, eventually
Tests green. CI green. Production on fire. The AI wrote tests that test the wrong thing. Classic.
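To make "tests that test the wrong thing" concrete, here is a minimal sketch. The function `apply_discount` and both tests are hypothetical, invented for this illustration. The first test can never fail, so it stays green while the bug ships.

```python
def apply_discount(price: float, percent: float) -> float:
    """Buggy on purpose: adds the discount instead of subtracting it."""
    return price + price * percent / 100


def weak_test() -> bool:
    # Green but meaningless: it compares the function with itself,
    # so it passes no matter what the function computes.
    return apply_discount(100, 10) == apply_discount(100, 10)


def real_test() -> bool:
    # Checks against a value taken from the requirement:
    # 10% off 100 must be 90.
    return apply_discount(100, 10) == 90


print("weak test passes:", weak_test())
print("real test passes:", real_test())
```

The fix is not more tests but tests whose expected values come from the requirement, not from the code under test.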
🚨 The Problems
🤖 Hallucinations — The code looks right but calls an API that doesn't exist
🧩 Maintainability — 6 months later, nobody (including the AI) understands it
🔓 Security — SQL injection on line 42, shipped to prod
⚖️ IP Risks — Did that code come from a GPL repo?
🧠 Cognitive Decay — You stop understanding your own codebase
👥 Team Load — Junior devs reviewing AI slop they can't evaluate
Each of these is a real, documented failure mode. I'll elaborate on a couple during the demo section.
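The security bullet is the easiest one to show in code. Below is a minimal sketch using Python's built-in `sqlite3` module. The table, the data and the two function names are assumptions made up for the example. The unsafe version builds the query with string formatting, the safe one uses a parameterised query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])


def find_user_unsafe(name: str) -> list:
    # Vulnerable: user input is spliced straight into the SQL string.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list:
    # Parameterised query: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
print("unsafe:", find_user_unsafe(payload))  # matches every row
print("safe:  ", find_user_safe(payload))    # matches nothing
```

Both versions look plausible at a glance, which is exactly why this slips through when nobody reads the generated code.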
Engineers are so dramatic! 🎭
🙄
...or are they?
Let's look at the data.
Sure, we can be dramatic. But these concerns have shown up in post-mortems at real companies. Let's be precise about the risks rather than hand-wavy in either direction.
Would you fly on a fully automatic plane? ✈️
Let this land for a second. Modern commercial planes can fly, navigate, and land almost entirely on autopilot. Yet every flight still has two trained pilots in that cockpit. Why?
🤖 Autopilot can do almost everything — yet every flight still has two pilots
🧠 Skills & judgment matter — when things go wrong, humans take over
📋 Checklists & procedures — not because pilots are dumb, but because stakes are high
⌨️ Same goes for coding with AI — automation is a tool, not a replacement for understanding
The analogy is tight: autopilot = AI coding agent. The plane can fly itself — but you need a trained human who understands the system, can read the instruments, and knows when and how to intervene. "Vibe flying" is not a thing. Why should vibe coding be?
THE 5 PHASES
of human-AI collaboration in coding
Let's map out the spectrum from "AI suggests a word" to "AI ships the whole product." Each phase has a different human-in-the-loop model and different risk profile.
PHASE 1
Automatic Code Completion
Copilot, Tabnine, Supermaven — inline suggestions as you type
⚡ Fast & low-risk — you review every token before it lands
⚠️ Can introduce subtle bugs — sustained attention required
This is where most developers started. The human is still typing, still in control — the AI just autocompletes. Lowest friction, lowest risk.
PHASE 2
Chat → Copy → Paste
ChatGPT, Claude.ai — generate in a chat, copy, paste into your editor
✅ You're still in full control of what gets committed
⚠️ Context switching cost — the chat has no awareness of your project
Many developers still live here. It's deliberate — you're choosing what to paste. But the lack of project context means the AI's suggestions can be subtly wrong.
PHASE 3
IDE-Integrated Agent
Cursor, Copilot Chat, Claude Code — agent reads your files, writes code inline
✅ Project-aware — can refactor across files, understands your codebase
⚠️ Trust boundary blurs — review discipline becomes critical
This is where things get interesting. The agent can now see your whole project. That's powerful — and that's exactly why you need tighter review habits here.
PHASE 4
Multi-Agent Orchestration
Parallel agents, tool use, sub-agents — full autonomy pipelines
✅ Handles entire workflows end-to-end, parallelizes complex tasks
⚠️ Hard to audit — failure modes are non-obvious and can cascade
This is cutting-edge territory. You might have one agent writing code, another reviewing it, another running tests. The coordination overhead is real and the audit trail can get murky.
PHASE 5
Spec-Driven Development
Still a myth 🦄
"Write a spec → AI ships production code → Deploy → ??? → Profit"
Reality check: We're not there yet. Maybe never for complex systems.
Everyone's pitching this. Nobody's shipping it reliably at scale. For simple CRUD apps? Maybe. For systems with complex domain rules, security requirements, and performance constraints? We need humans in the loop.
PHASE 5 — IN THE WILD
Case Study: A C Compiler, Built from Specs
🦀 106,000 lines of Rust — 351 files — compiles the Linux kernel
🤖 Generated with Claude Opus via iterative spec-driven prompting
🧩 Key insight: 32 phases × ~11 files — modular decomposition was everything
💸 Cost: $20,000+ in API calls — and the type checker doesn't work
🔗 shape-of-code.com — Investigating an LLM-generated C compiler
Real experiment, February 2026. Someone used Claude Opus to write a C compiler capable of compiling the Linux kernel — 106K lines of Rust. Cost: over $20,000 in API calls. The trick was breaking it into 32 phases of ~11 files each and regenerating each phase ~13 times. It compiles real programs in 17 seconds. But: no type checking. This is Phase 5 in the wild — impressive, expensive, and not production-grade.
MY ADVICE
(completely unrequested)
Nobody asked but here we are. A few things I genuinely believe will help you and your team.
New tools, new manners
🍝 Stop Sloppypasta
New capabilities → new professional norms. The industry hasn't caught up yet. You can lead.
Stop Sloppypasta is a movement around thoughtful AI usage in code. The name is funny but the point is serious: don't just accept and paste without thought.
📄 Shen & Tamkin — "How AI Impacts Skill Formation" · arxiv.org/abs/2601.20245 · AI use impairs conceptual understanding without delivering efficiency gains
Randomized experiment: developers learning a new programming library. AI use impaired conceptual understanding, code reading, and debugging — without significant efficiency gains. The HOW matters: 3 out of 6 interaction patterns preserved learning. Efficiency ≠ competence.
Your coding skills still matter
🏋️ Exercise them — build pet projects just for the joy of it
📺 youtube.com/@TsodingDaily — raw, educational, zero-AI coding streams (C, Rust, Go, and more)
Tsoding (Alexey Kutepov) live-codes real projects from scratch — compilers, game engines, tools — with no AI assistance. Pure fundamentals, highly entertaining, deeply instructive.
📺 youtube.com/@TheCodingTrain — creative coding with p5.js, Daniel Shiffman (1.76M subscribers)
Daniel Shiffman teaches creative coding — algorithms, simulations, generative art — with infectious enthusiasm. Great for rediscovering the joy of programming from first principles.
🤖 Use AI for exploration — not as a substitute for understanding
Prediction: in the coming years, many will try to do engineering without engineering knowledge. That gap will be visible. Your fundamentals are your moat.
The ability to read, write, and reason about code is more valuable now than many think — precisely because it's becoming rarer. It's a superpower when combined with AI tools.
WHAT'S NEXT?
Going deeper: AI features, local inference & model training
Now that we've covered the coding-agent layer, let's look one level deeper — how do you actually integrate AI into your own apps, run it locally, and eventually train your own models?
LEVEL 1
Adding AI features to your app
☁️ Hosted APIs — Anthropic, OpenAI, Gemini — drop-in inference, no infra
🔀 HuggingFace Inference API — proxy to hundreds of open-weight models, same REST interface
🧩 Patterns: chat completion · embeddings · structured output · tool use
⚠️ You are now responsible for guarding against prompt injection, data leakage and cost runaway
The entry point for most developers. Pick a hosted provider, grab an SDK, start with a chat completion. The hard part isn't calling the API — it's building the guardrails around it.
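As an example of such a guardrail, here is a minimal sketch of a cost-runaway guard. Everything in it is an assumption for illustration: `TokenBudget`, `guarded_call` and `fake_model` are hypothetical names, the 4-characters-per-token estimate is a rough heuristic, and a real setup would also count output tokens using the usage figures the provider returns.

```python
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a call would push usage past the configured cap."""


class TokenBudget:
    """Tracks estimated token usage against a hard cap."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used} + {tokens} > {self.max_tokens}")
        self.used += tokens


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def guarded_call(prompt: str, budget: TokenBudget,
                 call_model: Callable[[str], str]) -> str:
    # Refuse before the request is sent, not after the bill arrives.
    budget.charge(estimate_tokens(prompt))
    return call_model(prompt)


def fake_model(prompt: str) -> str:
    # Stand-in for a real SDK call (assumption).
    return "ok"


budget = TokenBudget(max_tokens=1000)
prompt = "x" * 2000  # about 500 estimated tokens

print(guarded_call(prompt, budget, fake_model))
print(guarded_call(prompt, budget, fake_model))
try:
    guarded_call(prompt, budget, fake_model)
except BudgetExceeded as err:
    print("blocked:", err)
```

The same wrapper is a natural place to hang input length limits or logging later on.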
LEVEL 2
Local inference
🦙 Ollama
Run Llama, Mistral, Gemma locally — one command
⚡ llama.cpp
CPU inference, quantised GGUF models, minimal deps
🤗 HF Transformers
Python, full control, GPU/MPS acceleration
🔒 Why bother? — data stays on-device, zero cost per token, works offline
⚠️ Hardware matters: a 70B model needs ~40 GB VRAM even at 4-bit quantisation (about 140 GB at 16-bit). Start with 7B–8B quantised.
Local inference is now surprisingly accessible. Ollama is the easiest on-ramp — one brew install, one command to pull a model. HuggingFace Transformers gives you full Python control for more advanced use cases.
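The hardware numbers are back-of-the-envelope arithmetic: parameter count times bits per weight. A small sketch, counting the weights only. The KV cache and runtime overhead come on top and depend on context length and the runtime, so treat the results as lower bounds.

```python
def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for size in (8, 70):
    for bits in (16, 4):
        print(f"{size}B at {bits}-bit: ~{weights_gb(size, bits):.0f} GB for weights")
```

A 70B model at 4-bit comes out at about 35 GB for the weights, which is where a rule of thumb of roughly 40 GB comes from once overhead is added. An 8B model at 4-bit needs about 4 GB and fits on most recent laptops.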
LEVEL 3
RAG — Retrieval-Augmented Generation
📄 Your docs
→
✂️ Chunk & embed
→
🗄️ Vector store
→
🔍 Retrieve
→
🤖 LLM
🧠 Gives the model your private knowledge without fine-tuning
🛠️ Stack: LangChain / LlamaIndex · ChromaDB / pgvector / Qdrant
→ Currently experimenting with this 🧪
RAG is the practical answer to "how do I give the model knowledge of my codebase / docs / database?" without the cost and complexity of fine-tuning. Chunk your data, embed it, store vectors, retrieve at query time.
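The pipeline can be sketched end to end in plain Python. This is a toy under stated assumptions: the "embedding" is a bag-of-words count instead of a real embedding model, the "vector store" is a list, the sample documents are made up, and the final LLM call is left out. Only the shape of chunk, embed, store, retrieve is meant to carry over.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 12) -> list:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up documents standing in for "your docs".
docs = [
    "Refunds are processed within 14 days of the return request.",
    "The API rate limit is 100 requests per minute per key.",
    "Support is available on weekdays from 9 to 17 CET.",
]

# "Vector store": a list of (chunk text, embedding) pairs.
store = [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(question: str, k: int = 1) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str) -> str:
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What is the API rate limit?"))
```

Swapping the toy parts for an embedding model and one of the stores listed above changes the quality of retrieval, not the structure.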
LEVEL 4
Training an open-weight model
🎯 Fine-tuning: LoRA / QLoRA, adapting a base model to your task with minimal GPU
📊 Datasets: HuggingFace Datasets, thousands of ready-to-use training sets
🚂 Training: HF Transformers + PEFT · Unsloth for fast LoRA
🤗 HuggingFace — one-stop shop: models · datasets · papers · inference API · model hub
→ Currently experimenting with local training + inference 🧪
You don't need to train from scratch. LoRA lets you fine-tune a 7B model on a consumer GPU in hours. HuggingFace is the center of gravity for the open-weight ecosystem: models, datasets, papers (via arXiv integration), and inference all in one place.
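Why LoRA is so cheap comes down to parameter counting. Instead of updating a full weight matrix W of shape d_out × d_in, LoRA freezes W and trains two small matrices B (d_out × r) and A (r × d_in) whose product is added to W. A quick sketch of the arithmetic, where the hidden size 4096 and rank 8 are illustrative assumptions and not taken from any specific model:

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted matrix: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


d = 4096   # hypothetical hidden size
rank = 8   # hypothetical LoRA rank

full = d * d
lora = lora_trainable_params(d, d, rank)

print(f"full matrix:  {full:,} parameters")
print(f"LoRA update:  {lora:,} parameters")
print(f"trainable share: {100 * lora / full:.2f}%")
```

Under these assumptions the adapter trains well under 1% of the parameters of each adapted matrix, which is what makes consumer hardware viable. QLoRA additionally keeps the frozen base weights in 4-bit to cut memory further.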
🤗 HuggingFace as your AI platform
🏗️ Model Hub: 500,000+ open-weight models, filterable by task, size, license
📦 Datasets: curated training data for every domain, ready to fine-tune on
📄 Papers: daily ML papers with linked code & models, to stay on the frontier
⚡ Inference Providers: proxy to hosted open models, same API, swap models freely
huggingface.co — the GitHub of AI models
Think of HuggingFace as the npm registry + GitHub + arXiv for AI. The inference provider proxy is particularly useful: one API key, access to Llama, Mistral, Falcon and more, with no individual provider accounts needed.