AUTOMATIC CODING

From autocomplete to AI agents — where are we actually?

FIRST: WHAT IS AN LLM?

🔗 growingswe.com/blog/microgpt — interactive walkthrough of Karpathy's 200-line GPT

Neural Network at its core

🧠 📊
  • Trained on massive text corpora
  • Predicts the next token — one word piece at a time
  • No memory between calls
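
A minimal sketch of that next-token loop, assuming a toy bigram lookup table in place of a real neural network. The corpus and the names `train_bigrams` and `generate` are made up for illustration; a real LLM swaps the table for a trained network over subword tokens, but the outer loop has the same shape.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which token follows which (a stand-in for 'training')."""
    tokens = text.split()
    table = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        table[current][following] += 1
    return table

def generate(table, prompt, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = table.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this token in the corpus
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

corpus = "the cat sat on the mat and the cat sat on the sofa"
table = train_bigrams(corpus)
print(generate(table, "the", max_new_tokens=3))
```

Note that `generate` keeps no state between calls: everything it "knows" about the conversation is whatever you pass in as `prompt`.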

The Memory Problem

What fits inside a "context window"

  • System Prompt
  • Conversation History
  • Your Message
  • Response

Run out of context = amnesia 🫥

Every call starts fresh 🔄
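
One way to picture the amnesia, as a sketch under loud assumptions: tokens are counted as whitespace-separated words (real tokenizers differ), the budget is tiny, and `build_context` is a hypothetical helper, not any vendor's API. It also ignores the room the response itself needs.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per word."""
    return len(text.split())

def build_context(system_prompt, history, user_message, budget):
    """Keep the system prompt and the new message; drop the oldest turns that do not fit."""
    used = count_tokens(system_prompt) + count_tokens(user_message)
    kept = []
    # Walk history from newest to oldest so recent turns survive.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # everything older than this turn is forgotten
        kept.append(turn)
        used += cost
    return [system_prompt, *reversed(kept), user_message]

history = ["my name is Ada", "nice to meet you Ada", "I like Rust", "Rust is great"]
context = build_context("be helpful", history, "what is my name", budget=14)
print(context)
```

With this budget the two oldest turns fall out of the window, so the model is asked for a name it is never shown.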

What LLMs don't have

  • ❌ Understanding of consequences
  • ❌ Real-time knowledge
  • ❌ A "world model"
  • ❌ Persistent memory (without scaffolding)
  • ✅ Incredible pattern matching & synthesis

VIBE CODING

"Just vibe with it 🎵"

What IS vibe coding?

Human is NOT in the loop
  • Never read the generated code
  • Just prompt harder 💪
🔥

This is fine.

The app is on fire but the tests pass

— Every vibe coder, eventually

🚨 The Problems

🤖 Hallucinations — The code looks right but calls an API that doesn't exist
🧩 Maintainability — 6 months later, nobody (including the AI) understands it
🔓 Security — SQL injection on line 42, shipped to prod
⚖️ IP Risks — Did that code come from a GPL repo?
🧠 Cognitive Decay — You stop understanding your own codebase
👥 Team Load — Junior devs reviewing AI slop they can't evaluate
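
To make the security bullet concrete, here is a small sketch using Python's built-in `sqlite3` module. The table, the data, and the function names are invented for the example; the point is only the difference between splicing input into SQL text and passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

def find_user_unsafe(name):
    # BAD: user input becomes part of the SQL statement itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # GOOD: the value is passed separately as a bound parameter.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

payload = "nobody' OR '1'='1"
print(find_user_unsafe(payload))  # the injected OR clause matches every row
print(find_user_safe(payload))    # no user has that literal name
```

Both versions look plausible in a diff, which is why this class of bug survives a skim.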

Engineers are so dramatic! 🎭

🙄

...or are they?

Let's look at the data.

Would you fly on a fully automatic plane? ✈️

🤖 Autopilot can do almost everything — yet every flight still has two pilots
🧠 Skills & judgment matter — when things go wrong, humans take over
📋 Checklists & procedures — not because pilots are dumb, but because stakes are high
⌨️ Same goes for coding with AI — automation is a tool, not a replacement for understanding

THE 5 PHASES

of human-AI collaboration in coding

PHASE 1

Automatic Code Completion

Copilot, Tabnine, Supermaven — inline suggestions as you type

⚡ Fast & low-risk — you review every token before it lands
⚠️ Can introduce subtle bugs — sustained attention required

PHASE 2

Chat → Copy → Paste

ChatGPT, Claude.ai — generate in a chat, copy, paste into your editor

✅ You're still in full control of what gets committed
⚠️ Context switching cost — the chat has no awareness of your project

PHASE 3

IDE-Integrated Agent

Cursor, Copilot Chat, Claude Code — agent reads your files, writes code inline

✅ Project-aware — can refactor across files, understands your codebase
⚠️ Trust boundary blurs — review discipline becomes critical

PHASE 4

Multi-Agent Orchestration

Parallel agents, tool use, sub-agents — full autonomy pipelines

✅ Handles entire workflows end-to-end, parallelizes complex tasks
⚠️ Hard to audit — failure modes are non-obvious and can cascade

PHASE 5

Spec-Driven Development

Still a myth 🦄

"Write a spec → AI ships production code → Deploy → ??? → Profit"

Reality check: We're not there yet. Maybe never for complex systems.

PHASE 5 — IN THE WILD

Case Study: A C Compiler, Built from Specs

🦀 106,000 lines of Rust — 351 files — compiles the Linux kernel
🤖 Generated with Claude Opus via iterative spec-driven prompting
🧩 Key insight: 32 phases × ~11 files — modular decomposition was everything
💸 Cost: $20,000+ in API calls — and the type checker doesn't work

🔗 shape-of-code.com — Investigating an LLM-generated C compiler

🎬 Demo: Claude Code

  • 🔧 Skills — OpticOdds fixtures integration
  • 🪝 Hooks — Preventing credentials from leaking into context
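
A sketch of the kind of check a credential-guarding hook could run before text reaches the model. This is not the Claude Code hook interface; `redact_secrets` and both patterns are illustrative assumptions, and a real setup would lean on a maintained secret scanner.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive list).
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")  # shape of an AWS access key ID
ASSIGNMENT = re.compile(r"((?:api[_-]?key|token|password)\s*[=:]\s*)\S+", re.IGNORECASE)

def redact_secrets(text):
    """Return (redacted_text, number_of_redactions)."""
    text, n_keys = AWS_KEY_ID.subn("[REDACTED]", text)
    # Keep the variable name, hide only the value after '=' or ':'.
    text, n_assign = ASSIGNMENT.subn(lambda m: m.group(1) + "[REDACTED]", text)
    return text, n_keys + n_assign

sample = "export API_KEY=sk-123 and AKIAABCDEFGHIJKLMNOP in logs"
clean, hits = redact_secrets(sample)
print(clean, hits)
```

A hook built around a check like this could either pass the redacted text along or refuse the operation whenever `hits` is non-zero.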

AI SLOP

When volume beats quality

🗑️ What is AI Slop?

📈 Code creation FAR outpaces code review
📉 Consistency erodes — style, patterns, naming — all drift
🧠 Team expertise declines — nobody owns anything anymore

Spot the difference

👎
Vibe coding this into prod
No review. No context. No accountability.
👍
Vibe coding this into a pet project
Learn fast. Break things. Nobody gets paged at 3am.

Ready to ship without review? 🤔

NO.
  • Even senior engineers miss things under time pressure
  • AI confidently writes wrong code — the confidence is the bug
  • The diff is your last line of defense 🛡️

A WISE STRATEGY

Learn from Open Source Maintainers

🔍 Add friction to PR review: slow down intake instead of speeding up merges
🔗 Always link to the AI conversation in the PR description
📋 No more breadcrumbs — full context required, or no merge
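
These rules can be partly automated. A rough sketch, assuming a hypothetical `review_blockers` gate with an arbitrary word threshold; it only checks that some link is present, not that the link really points at the AI conversation.

```python
import re

URL = re.compile(r"https?://\S+")
MIN_WORDS = 20  # arbitrary threshold, chosen for illustration

def review_blockers(pr_description, ai_assisted):
    """Return the reasons a PR should not enter review yet (empty list means OK)."""
    blockers = []
    if len(pr_description.split()) < MIN_WORDS:
        blockers.append("description too short to give reviewers context")
    if ai_assisted and not URL.search(pr_description):
        blockers.append("AI-assisted PR must link to the AI conversation")
    return blockers

print(review_blockers("fix stuff", ai_assisted=True))
```

The friction lands on the author at intake, which is the cheap place to put it.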

The Review Contract

You wrote it with AI?
Then YOU are responsible for it.
Not the AI. YOU.
🫵

MY ADVICE

(completely unsolicited)

New tools, new manners

🍝 Stop Sloppypasta
stopsloppypasta.ai — think before you paste AI-generated code

New capabilities → new professional norms. The industry hasn't caught up yet. You can lead.

📄 Shen & Tamkin, "How AI Impacts Skill Formation" · arxiv.org/abs/2601.20245 · In their study, AI use impaired conceptual understanding without delivering significant efficiency gains on average

Your coding skills still matter

🏋️ Exercise them — build pet projects just for the joy of it
📺 youtube.com/@TsodingDaily — raw, educational, zero-AI coding streams (C, Rust, Go, and more)
📺 youtube.com/@TheCodingTrain — creative coding with p5.js, Daniel Shiffman (1.76M subscribers)
🤖 Use AI for exploration — not as a substitute for understanding

Prediction: in the coming years, many will try to do engineering without engineering knowledge. That gap will be visible. Your fundamentals are your moat.

WHAT'S NEXT?

Going deeper: AI features, local inference & model training

LEVEL 1

Adding AI features to your app

☁️ Hosted APIs — Anthropic, OpenAI, Gemini — drop-in inference, no infra
🔀 HuggingFace Inference API — proxy to hundreds of open-weight models, same REST interface
🧩 Patterns: chat completion · embeddings · structured output · tool use
⚠️ You are now responsible for prompt injection, data leakage and cost runaway
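
For the cost-runaway part, one simple guard is a hard spending cap checked before every call. A sketch only: `CostGuard` is a hypothetical helper and the per-token price is made up, so plug in your provider's real pricing.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the cap."""

class CostGuard:
    """Tracks estimated spend and refuses calls once a hard cap would be crossed."""

    def __init__(self, max_usd, usd_per_1k_tokens):
        self.max_usd = max_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.spent_usd = 0.0

    def charge(self, tokens):
        cost = tokens / 1000 * self.usd_per_1k_tokens
        if self.spent_usd + cost > self.max_usd:
            raise BudgetExceeded(f"call would exceed the ${self.max_usd:.2f} cap")
        self.spent_usd += cost
        return cost

guard = CostGuard(max_usd=1.00, usd_per_1k_tokens=0.01)  # made-up price
guard.charge(50_000)
guard.charge(40_000)
try:
    guard.charge(20_000)  # would take total spend past the cap
except BudgetExceeded as err:
    print("blocked:", err)
```

Call `charge` with your token estimate before sending the request, so a runaway loop fails fast instead of showing up on the invoice.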

LEVEL 2

Local inference

🦙 Ollama
Run Llama, Mistral, Gemma locally — one command
llama.cpp
CPU-first inference (GPU offload optional), quantised GGUF models, minimal deps
🤗 HF Transformers
Python, full control, GPU/MPS acceleration
🔒 Why bother? — data stays on-device, zero cost per token, works offline
⚠️ Hardware matters: a 70B model needs ~40 GB VRAM even at 4-bit quantisation (the weights alone are ~140 GB at FP16). Start with 7B–8B quantised.
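
Where a figure like ~40 GB comes from, as back-of-the-envelope arithmetic: parameters times bits per weight, plus some headroom. The 15% overhead for KV cache and activations is an assumed round number, not a measured one, and `estimate_vram_gb` is just an illustrative helper.

```python
def estimate_vram_gb(params_billions, bits_per_weight, overhead=1.15):
    """Weights-only memory times a rough overhead factor (assumed, not measured)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for name, params, bits in [("70B @ 4-bit", 70, 4), ("70B @ 16-bit", 70, 16), ("8B @ 4-bit", 8, 4)]:
    print(f"{name}: ~{estimate_vram_gb(params, bits):.0f} GB")
```

Long contexts grow the KV cache well beyond this estimate, so treat the result as a floor.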

LEVEL 3

RAG — Retrieval-Augmented Generation

📄 Your docs
✂️ Chunk & embed
🗄️ Vector store
🔍 Retrieve
🤖 LLM
🧠 Gives the model your private knowledge without fine-tuning
🛠️ Stack: LangChain / LlamaIndex · ChromaDB / pgvector / Qdrant
→ Currently experimenting with this 🧪
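
The pipeline above, shrunk to a toy you can run without any of the listed libraries. Big assumptions: the "embedding" is a bag-of-words count instead of a trained model, the "vector store" is a Python list, and the documents are invented. The LLM call is left out; the sketch stops at the prompt it would receive.

```python
import math
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: lowercase word counts. Real systems use a trained embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(store, query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(store, question):
    context = "\n".join(retrieve(store, question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within 14 days of the request.",
    "Our office is closed on public holidays.",
]
store = [(c, embed(c)) for d in docs for c in chunk(d)]  # the 'vector store'
print(build_prompt(store, "how long do refunds take"))
```

Swapping `embed` for a real embedding model and `store` for ChromaDB, pgvector, or Qdrant changes the quality, not the shape.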

LEVEL 4

Training an open-weight model

🎯 Fine-tuning
LoRA / QLoRA — adapt a base model on your task with minimal GPU
📊 Datasets
HuggingFace Datasets — thousands of ready-to-use training sets
🚂 Training
HF Transformers + PEFT · Unsloth for fast LoRA
🤗 HuggingFace: one-stop shop for models · datasets · papers · inference API
→ Currently experimenting with local training + inference 🧪
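
Why LoRA gets away with minimal GPU, in arithmetic: instead of updating a full weight matrix, it trains two thin matrices whose product has the same shape. A sketch with an assumed hidden size of 4096 and rank 8; `lora_trainable_params` is an illustrative helper, not part of PEFT or Unsloth.

```python
def lora_trainable_params(d_out, d_in, rank):
    """LoRA learns B (d_out x rank) and A (rank x d_in); the base weight W stays frozen."""
    return rank * (d_out + d_in)

d = 4096  # hypothetical hidden size
full = d * d
lora = lora_trainable_params(d, d, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

For this one matrix, the adapter trains well under 1% of the weights that full fine-tuning would touch.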

🤗 HuggingFace as your AI platform

🏗️ Model Hub
500,000+ open-weight models, filterable by task, size, license
📦 Datasets
Curated training data for every domain — ready to fine-tune on
📄 Papers
Daily ML papers with linked code & models — stay on the frontier
Inference Providers
Proxy to hosted open models — same API, swap models freely

huggingface.co — the GitHub of AI models

EXPERIMENT.

REVIEW.

UNDERSTAND.

In that order. 😉