Why & When to Train — l33tpwn / chakra / foundry

What signals say it's time to train?

Three "yes" signals you've outgrown prompt engineering — and three "wait" signals that mean training is premature.

Train when…

Evals are plateauing

You've iterated prompts, system messages, few-shot exemplars, RAG, agent loops — the eval line has flattened. The next gain is in weights, not strings.

You need lower latency or higher throughput

A 3B or 7B fine-tuned model on your own GPU returns in 100–300ms what a frontier API delivers in 2–4s — and it scales horizontally without per-token billing.

Unit economics aren't scaling

Per-call API cost × volume crosses your margin. A fine-tuned adapter on owned compute flips the curve from "linear with usage" to "fixed monthly + tiny marginal."

Hold off when…

You're still developing core functionality

If your product loop doesn't yet work end-to-end with a frontier API, training won't fix that. Get the loop working first; then optimize.

You haven't collected data

Fine-tuning needs traces. If your product hasn't shipped or you haven't been logging interactions, train an eval suite first, then come back.

What you need — and what you don't

Most teams are intimidated by training because they think it requires infrastructure they don't have. The reverse is closer to true: you probably already have the hard parts.

What you need (you probably have these)

🤖 Agent harness

A loop that drives the model through a real task and captures every step. In this stack: chakra drives PDCA missions and writes traces to data/traces/pdca/.

📊 Eval suite

A way to score "did this run go well?" against a golden set. In this stack: chakra scores + foundry's experiment graph track every adapter against a fixed eval corpus.

👥 AI engineers

People who can read traces and reason about model behavior. In this stack: the cohort SOP trains 250 of them per day, with each producing their own adapter.

📂 Data

Traces from real product use, curated. In this stack: foundry data + the Data Review tab let you keep/drop/flag records before training.

What you don't need

A team of infrastructure experts

Modal handles the GPU, vLLM handles serving, foundry's Modal backend handles the upload/spawn/poll loop. foundry train fire web --watch is the entire interface.

A cluster reservation

A LoRA training run on Modal A10G costs ~$0.30–$3 and takes 30–90 minutes. Pay-per-minute, no cluster, no reservation.

In building your product, you may have already done the hard part.

The four "haves" → the three CLIs that ship them

Each industry-standard requirement maps cleanly onto a tool in this stack. Nothing custom; just the right composition.

l33tpwn
target labs · cohorts · walkthroughs

chakra
agent harness · trace capture · eval scores

foundry
data curation · Modal training · UI lineage

↓

🤖 Agent harness
→ chakra run

📊 Eval suite
→ chakra scores

👥 AI engineers
→ cohort SOP × 250

📂 Data
→ foundry data

↓

Output

A domain-specific adapter, evaluated on golden traces, deployed via vLLM, lineage tracked.

❌ Infrastructure experts
— Modal handles it

❌ Cluster reservation
— pay-per-minute training

Eight commands. End-to-end.

The whole loop, from "I have a target" to "I have a fine-tuned adapter scoring on golden traces."

$ l33tpwn login                                       # Cognito session

$ l33tpwn student provision me@cohort.l33tpwn.com    # 2-4 min
$ l33tpwn student vnc-url me@cohort.l33tpwn.com      # ← VNC link from CLI
$ l33tpwn walkthrough view albania                   # learn the kill chain

$ chakra run albania --invocations 30                # produces the trace
$ foundry data review v17-cohort                    # human-curate
$ foundry train fire web --watch                     # ~$1 on Modal A10G
$ chakra scores                                       # eval — did it learn?

Detailed timing in the 10-phase course SOP.

Treating models as raw material, not product.

The model is the transistor. The fine-tuned, domain-specific system is the product.

A pen-test mission engine, a security-domain corpus, a fine-tuned LoRA on top of Qwen2.5-7B that posts logins and reports findings — that's your product. The base model is a component, not the offering.

Ready to run a cohort?

The 10-phase Course Delivery SOP scales to 250 students per cohort, $370 typical spend, $893 hard ceiling. Each student fires their own training run, ships their own adapter.

Read the SOP → Skip to student quick-start Open Foundry UI

The model is the transistor. Your fine-tuned system is the product.