Foundry
Foundry manages the full lifecycle of LoRA adapters — from raw data to deployed model. Point it at training data, tell it how to evaluate, and Foundry handles the rest: quality scanning, training on cloud GPUs, evaluation, and deployment tracking. The current release ships the data and training layers; autonomous eval loops and self-improvement are on the roadmap.
CLI: a 7-command interface for your terminal workflow. Zero code changes required: point it at an existing adapter directory and it reads everything automatically.
Web UI: this interface. Lineage graph, quality dashboard, fine-tuning run tracker, data pipeline, and deployment registry, all in one place.
REST API: every UI action has a corresponding API endpoint. Automate from CI/CD, scripts, or your own tooling. GET /api/v1/ for the full spec.
Quick Start
Get from nothing to a tracked training run in 5 steps.
Prerequisites
- Python 3.11+ — check with python3 --version
- Either a trained adapter directory (containing adapter_config.json) or a JSONL training file to ingest
- For cloud training: a Kaggle account + API token or a Modal account (see Compute Backends)
Running foundry init in an empty directory is fine — you can register data later with foundry ingest.
Install and start the server
pip install -e foundry/
python -m uvicorn foundry.api:app --host 127.0.0.1 --port 12001
The UI is now available at http://127.0.0.1:12001. Default login: admin / admin
⚠ Change the default password immediately if this machine is accessible to others.
Initialize a project in your training directory
cd /path/to/my-lora-project
foundry init
Foundry scans the directory, finds JSONL datasets and adapter directories, and registers them. Creates a .foundry/ folder (auto-added to .gitignore). Running in an empty directory is fine — register data later with foundry ingest ./data/train.jsonl.
Track a training run
foundry track train \
--adapter ./outputs/recon-lora \
--data ./data/recon_train.jsonl \
--backend kaggle-t4x2 \
--notes "First run, lr=2e-4"
Foundry reads the adapter directory, extracts LoRA config, loss history, and status — no training code changes needed.
Example output
Scanner: unsloth
Status: complete
Base model: Qwen/Qwen3-8B
LoRA r: 32
Steps: 300/300
Train loss: 0.3881
History: 300 data points
Data: linked (data/recon_train.jsonl)
✓ Recorded experiment: exp-a1b2c3d4
View status and lineage
foundry status
Project: Recon Adapter
Root: /Users/you/recon-project
Datasets (1):
recon_train.jsonl 4,821 examples
Experiments (1):
recon-lora complete LoRA r=32 loss: 0.388
Or open the Graph tab in the UI to see the lineage graph: dataset → training run → adapter → deployment.
Workflow — Dataset Versioning
A versioned dataset is an immutable snapshot of your processed training data, identified by an integer (v1, v2, v3…). Every training run references a specific version so you always know exactly what data produced which adapter.
The 6-step pipeline that creates v1
Register a source
Tell Foundry where your raw trajectories live. Sources can be local directories (containing JSONL or mission trajectory files).
UI: Sources tab → "Add Source"
curl -X POST http://localhost:12001/api/v1/data-studio/sources \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "VAPT Trajectories", "source_type": "local",
"config": {"path": "/path/to/vapt/data/trajectories"}}'
Discover + Sync
Discover does a dry-run scan to count available files. Sync pulls them into Foundry's internal store.
UI: Sources tab → Discover → Sync
SOURCE_ID="src-abc123"
# Dry-run preview
curl -X POST http://localhost:12001/api/v1/data-studio/sources/$SOURCE_ID/discover \
-H "Authorization: Bearer $TOKEN"
# Pull files in
curl -X POST http://localhost:12001/api/v1/data-studio/sources/$SOURCE_ID/sync \
-H "Authorization: Bearer $TOKEN" -d '{}'
Run the pipeline
Converts raw trajectories into merged, adapter-split JSONL. The pipeline also runs a quality scan automatically.
UI: Pipeline tab → "Run Pipeline"
curl -X POST http://localhost:12001/api/v1/data-studio/pipeline/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"source_ids": ["src-abc123"]}'
# Poll until status is "complete"
curl http://localhost:12001/api/v1/data-studio/pipeline/status \
-H "Authorization: Bearer $TOKEN"
Review quality flags
Check the Quality tab. Resolve high-severity flags before committing — empty responses and malformed messages will degrade training.
UI: Quality tab → Run Scan → review queue
curl -X POST http://localhost:12001/api/v1/data-studio/quality/scan \
-H "Authorization: Bearer $TOKEN"
curl "http://localhost:12001/api/v1/data-studio/quality/flags?severity=high" \
-H "Authorization: Bearer $TOKEN"
Commit → creates an immutable snapshot
Commit seals the current state of your clean data as v1 (or v2, v3… if versions already exist). The version ID never changes — it's a permanent record of this exact data state.
UI: Deploy tab → "Commit Version"
curl -X POST http://localhost:12001/api/v1/data-studio/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"notes": "Initial dataset — 4,821 VAPT trajectories"}'
# → {"id": "uuid", "version": 1, "trajectory_count": 4821, ...}
Push to Kaggle → dataset is ready for training
Formats v1 through each adapter's model chat template and uploads as a versioned Kaggle dataset. A training run then references this dataset_ref to consume it.
UI: Deploy tab → "Push to Kaggle"
curl -X POST http://localhost:12001/api/v1/data-studio/push-to-kaggle \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"user_id": "your-user-uuid",
"model_profile_id": "qwen3-8b",
"dataset_ref": "nt1412/tactician-training-data",
"adapters": ["recon", "web", "privesc", "lateral", "windows"]
}'
# → {"action": "created", "dataset_ref": "nt1412/tactician-training-data",
# "data_version": 1, "adapters_pushed": ["recon", "web", ...]}
The Kaggle dataset now contains one JSONL per adapter, formatted for the selected model. Submit a training run pointing at this dataset_ref to start fine-tuning.
Workflow — Updating a Dataset
You never modify a committed version — you create a new one. This preserves the lineage between training runs and their exact data.
Option A — Add new trajectories from a source
Sync new missions from an existing source (or register a new one), re-run the pipeline, scan quality, then commit again. The version auto-increments: v1 → v2.
# Re-sync with new data
curl -X POST http://localhost:12001/api/v1/data-studio/sources/$SOURCE_ID/sync \
-H "Authorization: Bearer $TOKEN" -d '{}'
# Re-run pipeline
curl -X POST http://localhost:12001/api/v1/data-studio/pipeline/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"source_ids": ["src-abc123"]}'
# Commit → creates v2
curl -X POST http://localhost:12001/api/v1/data-studio/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"notes": "Added 800 new privesc trajectories"}'
Option B — Fix specific examples via the staging area
Edit or delete individual examples before committing. Staged changes are held in a session until you commit or discard them.
# Edit example at index 42
curl -X PUT http://localhost:12001/api/v1/data-studio/staging/42 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "system", "content": "..."}, ...]}'
# Remove a flagged example
curl -X DELETE http://localhost:12001/api/v1/data-studio/staging/7 \
-H "Authorization: Bearer $TOKEN"
# Preview what the commit would look like
curl "http://localhost:12001/api/v1/data-studio/commit/preview" \
-H "Authorization: Bearer $TOKEN"
# Commit → creates next version
curl -X POST http://localhost:12001/api/v1/data-studio/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"notes": "Removed 87 empty-response examples"}'
After committing v2, push to Kaggle again — this creates a new version of the Kaggle dataset (not a new dataset). Training runs can reference either version by setting "data_version": 1 or "data_version": 2 in the push payload.
Workflow — Training an Adapter from a Versioned Dataset
Once a version is pushed to Kaggle, submit a training run that references it. The run trains on the backend, then downloads and converts the resulting artifacts; you can watch every step in the Fine-tuning Runs tab.
Submit the run
curl -X POST http://localhost:12001/api/v1/runs \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"backend": "kaggle",
"config": {
"adapters": ["recon"],
"base_model": "Qwen/Qwen3-8B",
"epochs": 3,
"dataset_ref": "nt1412/tactician-training-data"
}
}'
Watch progress
UI: Fine-tuning Runs tab → Expand the row → Pipeline stepper + loss curve. Or poll the API:
curl http://localhost:12001/api/v1/runs/$RUN_ID/progress \
-H "Authorization: Bearer $TOKEN"
# → {"status": "running", "progress": {"loss_history": [...]}}
When status is ready_for_eval — run a local eval
curl -X POST http://localhost:12001/api/v1/runs/$RUN_ID/eval \
-H "Authorization: Bearer $TOKEN"
Record the result as an experiment
Link it to the dataset version for full lineage:
curl -X POST http://localhost:12001/api/v1/projects/recon/experiments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"adapter_path": "outputs/recon-lora-v2",
"data_version": "v2-uuid",
"status": "complete",
"train_loss": 0.38,
"compute_backend": "kaggle-t4x2",
"notes": "Trained on data v2, 3 epochs"
}'
The Graph tab now shows the full chain: data v2 → experiment → eval → deployment.
CLI — Installation
The foundry CLI is installed as a Python package. It requires Python 3.11+.
# From the repo root
pip install -e foundry/
# Verify
foundry --version
All CLI commands require a Foundry project — a directory containing a .foundry/ folder created by foundry init. Commands walk up the directory tree to find the project root, so you can run them from any subdirectory.
foundry init
Initialize a Foundry project. Scans the directory for JSONL datasets and LoRA adapter directories and registers them.
foundry init [PATH] [OPTIONS]
Arguments:
PATH Directory to initialize (default: current directory)
Options:
--name TEXT Project name (default: directory name)
--yes, -y Accept all defaults without interactive prompts
Examples
# Initialize current directory interactively
foundry init
# Initialize a specific directory, no prompts
foundry init /path/to/project --name "Recon Adapter" --yes
What it creates
.foundry/
foundry.db # SQLite database (all tracking data)
foundry.yaml # Project config (name, id)
versions/ # Cleaned dataset versions (created by foundry version)
foundry track
Track training artifacts. Currently supports train — more sub-commands coming for evals and deployments.
foundry track train
foundry track train [OPTIONS]
Options:
--adapter PATH Path to adapter output directory [required]
--data PATH Path to training JSONL (creates lineage link)
--config PATH Path to training config JSON or TOML
--backend TEXT Compute backend label (e.g. "kaggle-t4x2", "local-mlx")
--notes TEXT Free-text notes
--tag TEXT Tag (repeatable: --tag v1 --tag experiment)
--id TEXT Override auto-generated experiment ID
How it works
Foundry scans the adapter directory using a chain of scanners (Unsloth, HuggingFace PEFT, generic). It extracts the following without any changes to your training code:
- Base model name (from adapter_config.json or trainer_state.json)
- LoRA rank, alpha, and target modules
- Training steps completed vs total
- Train and validation loss (final + full history for the loss curve)
- Learning rate, batch size, gradient accumulation
- Training duration
Examples
# Basic tracking — just the adapter
foundry track train --adapter ./outputs/recon-lora
# Full tracking with data lineage and compute backend label
foundry track train \
--adapter ./outputs/recon-lora-v2 \
--data ./data/recon_train_v2.jsonl \
--backend kaggle-t4x2 \
--tag v2 \
--notes "Increased LoRA rank to 32, reduced lr to 1e-4"
# Track a Kaggle download (post-hoc)
foundry track train \
--adapter ~/Downloads/recon-lora-kaggle \
--backend kaggle-p100 \
--notes "Downloaded from nt1412/tactician-lora-training-recon"
foundry ingest
Bring external data or adapter artifacts into the project registry. Automatically detects whether the path is a dataset (JSONL) or an adapter directory.
foundry ingest PATH [OPTIONS]
Arguments:
PATH File or directory to ingest
Options:
--as CHOICE Asset type: auto | dataset | adapter | bundle
(default: auto — detects from file extension / directory contents)
--format TEXT Override format detection
--source TEXT Where this came from (URL, person, service name)
--version TEXT Version label (e.g. "v2", "2024-03-01")
--base-model TEXT Base model name (for adapters missing adapter_config.json)
--notes TEXT Free-text notes
--yes, -y Accept duplicate checksum without prompting
--no-quality Skip quality scan (faster for very large files)
Examples
# Ingest a JSONL dataset (auto-detected, quality scan included)
foundry ingest ./data/privesc_train.jsonl \
--source "generated by tactician-collect" \
--version v1
# Ingest a pre-trained adapter downloaded from HuggingFace
foundry ingest ~/Downloads/qwen3-recon-lora \
--as adapter \
--base-model "Qwen/Qwen3-8B" \
--source "huggingface.co/nt1412/qwen3-recon-lora"
# Skip quality scan for a large file
foundry ingest ./data/big_dataset.jsonl --no-quality
Duplicate detection: Foundry computes a SHA-256 checksum on ingest. If you try to ingest the same file twice, it will warn you and ask before creating a duplicate entry.
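As a rough illustration of checksum-based duplicate detection, the sketch below hashes a file in chunks with SHA-256 and compares the digest against a set of known checksums. The helper names and the idea of keeping checksums in a set are assumptions made for the example; how Foundry stores and compares checksums internally is not described here.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large JSONL files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path: Path, known_checksums: set[str]) -> bool:
    """True if a file with identical bytes was already registered (hypothetical store)."""
    return file_sha256(path) in known_checksums
```

Because the checksum covers the raw bytes, two files with the same examples in a different order or with different whitespace would not be reported as duplicates by this sketch.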
foundry status
Show a summary of everything registered in the current project: datasets, training experiments, evals, and deployments. Also surfaces common issues (e.g. incomplete runs with a higher LoRA rank than completed ones).
foundry status
Example output
Project: Recon Adapter
Root: /Users/you/recon-project
Datasets (2):
recon_train.jsonl 4,821 examples
recon_train_v2.jsonl 4,651 examples v2
Experiments (3):
recon-lora complete LoRA r=16 loss: 0.412
recon-lora-v2 complete LoRA r=32 loss: 0.388
recon-lora-v3 incomplete (step 180/500)
foundry quality
Run a quality scan on a registered dataset and show the results. Checks for empty responses, malformed message structure, duplicate examples, oversized inputs, and other issues.
foundry quality DATASET_ID [OPTIONS]
Arguments:
DATASET_ID Dataset ID or name
Options:
--output CHOICE Output format: text (default) | json
Examples
# Text output (human-readable)
foundry quality recon_train.jsonl
# JSON output (for scripts/CI)
foundry quality recon_train.jsonl --output json | jq '.flag_groups'
Example output
Quality scan: recon_train.jsonl (4,821 examples)
4,651 clean (96.5%)
Issues:
x 87 empty_response [high]
! 52 oversized_input [medium]
- 31 duplicate_conversation [low]
foundry inspect
Browse individual examples in a registered dataset. Useful for spot-checking quality or reviewing flagged examples before cleaning.
foundry inspect DATASET_ID [OPTIONS]
Arguments:
DATASET_ID Dataset ID or name
Options:
--limit INT Max examples to show (default: 20)
--flagged Show only flagged examples (high/medium severity)
--output CHOICE Output format: text (default) | json
Examples
# Browse first 20 examples
foundry inspect recon_train.jsonl
# Inspect only flagged examples
foundry inspect recon_train.jsonl --flagged --limit 50
# Export flagged examples as JSON for further analysis
foundry inspect recon_train.jsonl --flagged --output json > flagged.json
foundry data
Inspect and curate multi-turn training corpora (PDCA tool-call traces). Operates on directories like data/training/pdca-toolcalls-{tag}-mt-clean/{web,recon}/train.jsonl. Records are scored as high-progression when they cover ≥4 distinct tools or include a report_finding call — the same definition the corpus builder uses.
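The high-progression rule can be written as a small predicate. This is a sketch of the stated definition only; it assumes the tool names used by a record have already been extracted into a list, and the function name is hypothetical.

```python
HP_MIN_DISTINCT_TOOLS = 4  # threshold from the definition above


def is_high_progression(tool_names: list[str]) -> bool:
    """Apply the rule: at least 4 distinct tools, or any report_finding call.

    `tool_names` is an assumed input: one entry per tool call in the record.
    """
    distinct = set(tool_names)
    return len(distinct) >= HP_MIN_DISTINCT_TOOLS or "report_finding" in distinct
```

Repeated calls to the same tool count once, so a record that calls one tool ten times is not high-progression unless it also reports a finding.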
Subcommands
foundry data list List discovered corpora + bucket counts + hp%
foundry data show TAG [OPTIONS] Per-record stats for one corpus/bucket
foundry data review TAG [OPTIONS] Interactive keep/drop/flag review
foundry data curate TAG [OPTIONS] Write a curated copy honoring the manifest
foundry data show
foundry data show TAG [OPTIONS]
Arguments:
TAG Corpus tag (e.g. v16) — full dir name also accepted
Options:
--bucket CHOICE web | recon (default: web)
--limit INT Max records to show (default: 20)
--hp-only Only high-progression records
--output CHOICE text (default) | json
foundry data review
foundry data review TAG [OPTIONS]
Options:
--bucket CHOICE web | recon (default: web)
--start INT Start at this record index (default: 0)
--limit INT Max records this session (default: 10)
Keys during review: [k]eep [d]rop [f]lag [s]kip [q]uit
Verdicts persist to .foundry/curation/{tag}-{bucket}.json so you can split a long
review across sessions; resume with --start <next idx>.
foundry data curate
foundry data curate TAG [OPTIONS]
Options:
--bucket CHOICE web | recon | both (default: both)
--apply Actually write the curated corpus (default is dry-run)
--dest-suffix Sibling-dir suffix; default "-curated" → pdca-toolcalls-{tag}-mt-curated
Examples
# See what corpora exist
foundry data list
# tag web web_hp recon recon_hp total hp%
# v15 47 10 29 15 76 33%
# v16 57 20 29 15 86 41%
# Inspect only the high-progression records in v16/web
foundry data show v16 --bucket web --hp-only
# Review the recon bucket of v16, 10 at a time, resuming from record 30
foundry data review v16 --bucket recon --start 30 --limit 10
# Dry-run a curation report (uses prior keep/drop/flag manifest)
foundry data curate v16
# Write the curated corpus
foundry data curate v16 --apply
# → data/training/pdca-toolcalls-v16-mt-curated/{web,recon}/train.jsonl
How verdicts are applied: records marked drop are excluded from the curated copy. flag keeps the record but tags it for future attention. keep is the default for any record without a verdict — curation is opt-out, not opt-in, so partial review still produces a usable corpus.
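The opt-out semantics can be sketched in a few lines. The manifest shape used here (a mapping from record index to verdict string) is an assumption for illustration; the actual layout of the .foundry/curation/ files is not specified in this guide.

```python
def apply_verdicts(records: list[dict], verdicts: dict[int, str]) -> tuple[list[dict], list[int]]:
    """Return (curated records, original indexes of flagged records).

    `verdicts` maps record index -> "keep" | "drop" | "flag" (assumed shape).
    Records without a verdict default to "keep".
    """
    kept: list[dict] = []
    flagged: list[int] = []
    for index, record in enumerate(records):
        verdict = verdicts.get(index, "keep")
        if verdict == "drop":
            continue  # excluded from the curated copy
        if verdict == "flag":
            flagged.append(index)  # kept, but remembered for later attention
        kept.append(record)
    return kept, flagged
```

With an empty manifest the function returns every record unchanged, which matches the point that a partial review still yields a usable corpus.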
foundry train
Fire Modal training runs and auto-track them in the local CLI DB. Each adapter you train gets its own running Experiment row with the Modal run_id; later foundry sync pushes them to the server so the UI shows them alongside everything else.
Subcommands
foundry train fire ADAPTERS... [OPTIONS] Spawn a Modal training run (1+ adapters)
foundry train status RUN_ID Modal state + per-adapter row summary
foundry train logs RUN_ID [--tail|--events-only] Stream Modal logs
foundry train download RUN_ID [--dest DIR] Pull trained adapters off the Modal volume
foundry train list [--limit N] Recent training runs from the CLI DB
foundry train cancel RUN_ID [--yes] Stop a running Modal call
foundry train fire
foundry train fire ADAPTERS... [OPTIONS]
Arguments:
ADAPTERS One or more adapter names (e.g. web, recon, web-pdca-v17)
Options:
--base-model TEXT Default: Qwen/Qwen2.5-7B-Instruct
--epochs INTEGER Default: 3
--batch-size INTEGER Default: 2
--learning-rate FLOAT Default: 1e-4
--max-seq-length INTEGER Default: 2048
--lora-rank INTEGER Default: 32
--training-data-dir PATH Override default; expects {adapter}/train.jsonl inside
--watch Block and poll Modal, parsing JSON events from logs
into loss_history; updates the experiment row on each step
--poll-seconds INTEGER Default: 15 (with --watch)
Examples
# Fire v17 training on the curated v16 corpus, return immediately
foundry train fire web recon \
--training-data-dir data/training/pdca-toolcalls-v16-mt-curated \
--epochs 3 --lora-rank 32
# Fire and watch — terminal is your live training dashboard
foundry train fire web --watch
# A few minutes later, from a different terminal
foundry train list
foundry train status callid-xyz:run-abc
foundry train logs callid-xyz:run-abc --events-only
# When the run completes, pull the artifacts
foundry train download callid-xyz:run-abc \
--dest packages/tactician/adapters/
# Then push everything to the UI server
foundry sync --url http://localhost:12001
How auto-track works: on fire, each adapter gets a deterministic experiment id of the form modal-{short_run_id}-{adapter} with status running. With --watch, the CLI polls Modal logs, parses the structured JSON events the training emits ({"event":"step","step":N,"loss":x}), and re-upserts the row so train_loss, steps_done/total_steps, and loss_history all stay current. On terminal state we set status to complete / error / cancelled and record duration_seconds.
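The log-parsing half of auto-track might look roughly like the sketch below. It builds the experiment id in the stated format and collects (step, loss) pairs from lines shaped like the JSON event shown above. Function names are hypothetical, and how the short run id is derived from the Modal run_id is not covered here, so it is taken as an input.

```python
import json


def experiment_id(short_run_id: str, adapter: str) -> str:
    """Build the deterministic id: modal-{short_run_id}-{adapter}."""
    return f"modal-{short_run_id}-{adapter}"


def parse_step_events(log_lines: list[str]) -> list[tuple[int, float]]:
    """Extract (step, loss) pairs from mixed log output, ignoring everything else."""
    history: list[tuple[int, float]] = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text log line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if isinstance(event, dict) and event.get("event") == "step" \
                and "step" in event and "loss" in event:
            history.append((int(event["step"]), float(event["loss"])))
    return history
```

The last entry of the returned list would supply the current step and train loss, and the full list would feed the loss curve.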
foundry version
Create a cleaned version of a dataset by removing all examples flagged by specified quality rules. The new file is saved to .foundry/versions/ and registered as a child dataset (preserving lineage).
foundry version DATASET_ID [OPTIONS]
Arguments:
DATASET_ID Dataset ID or name
Options:
--exclude-rule TEXT Comma-separated rule names to strip (e.g. "empty_response,malformed_messages")
If omitted, removes ALL flagged examples.
--label TEXT Version label override (default: auto v1, v2, …)
--notes TEXT Notes for this version
Examples
# Remove all flagged examples, auto-label as v1
foundry version recon_train.jsonl
# Remove only high-severity issues, keep medium/low
foundry version recon_train.jsonl \
--exclude-rule "empty_response,malformed_messages" \
--label "v2-partial-clean" \
--notes "kept oversized inputs for now"
Lineage: The new dataset is registered with a parent_id pointing to the original. The Graph tab shows this as a lineage edge: dataset → cleaned dataset → training run.
REST API — Authentication
The API uses JWT bearer tokens. Log in to get a token, then include it in subsequent requests.
# Login
curl -X POST http://localhost:12001/api/v1/login \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "admin"}'
# → {"access_token": "eyJ...", "token_type": "bearer"}
# Use the token
curl http://localhost:12001/api/v1/runs \
-H "Authorization: Bearer eyJ..."
The full OpenAPI spec (with request/response schemas for every endpoint) is available at http://localhost:12001/docs.
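For scripts that prefer Python over curl, the same login flow can be sketched with the standard library alone. The helper names are hypothetical; the endpoint path, payload fields, and access_token response field are taken from the curl example above, and the base URL assumes the default local server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:12001"  # assumes the default local server


def build_login_request(username: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare the POST /api/v1/login request without sending it."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authorized_request(path: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a GET request carrying the bearer token."""
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def login(username: str, password: str, base_url: str = BASE_URL) -> str:
    """Send the login request and return the access token (requires a running server)."""
    with urllib.request.urlopen(build_login_request(username, password, base_url)) as response:
        return json.load(response)["access_token"]
```

Against a running server, token = login("admin", "admin") followed by urllib.request.urlopen(build_authorized_request("/api/v1/runs", token)) would mirror the two curl calls.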
REST API — Sources
Sources are external data connections (directories, S3 paths, etc.) that feed the data pipeline.
GET /api/v1/data-studio/sources List all sources
POST /api/v1/data-studio/sources Register a new source
POST /api/v1/data-studio/sources/{id}/discover Scan a source for files
POST /api/v1/data-studio/sources/{id}/sync Pull latest files from source
DELETE /api/v1/data-studio/sources/{id} Remove a source
GET /api/v1/data-studio/sources/{filename} Download a source file
Register a source
curl -X POST http://localhost:12001/api/v1/data-studio/sources \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "VAPT Trajectories",
"type": "local",
"path": "/path/to/vapt/data/trajectories"
}'
REST API — Pipeline
The pipeline converts raw source files into training-ready JSONL and pushes to Kaggle.
POST /api/v1/data-studio/pipeline/run Start a pipeline run
GET /api/v1/data-studio/pipeline/runs List pipeline runs
GET /api/v1/data-studio/pipeline/runs/{id} Get run status
GET /api/v1/data-studio/pipeline/status Current pipeline state
Trigger a pipeline run
curl -X POST http://localhost:12001/api/v1/data-studio/pipeline/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"source_ids": ["src-abc123", "src-def456"]}'
REST API — Quality
GET /api/v1/data-studio/quality/dashboard Aggregated stats + rule breakdown
GET /api/v1/data-studio/quality/flags List all flagged examples
PUT /api/v1/data-studio/quality/flags/{id} Update a flag (mark reviewed, etc.)
POST /api/v1/data-studio/quality/scan Run a fresh quality scan
Trigger a scan and get the dashboard
# Kick off a scan
curl -X POST http://localhost:12001/api/v1/data-studio/quality/scan \
-H "Authorization: Bearer $TOKEN"
# Poll dashboard
curl http://localhost:12001/api/v1/data-studio/quality/dashboard \
-H "Authorization: Bearer $TOKEN"
# → {"total_trajectories": 1240, "pct_flagged": 3.8, "per_rule": {...}}
REST API — Dashboard
Returns platform-wide health and storage stats — used by the main dashboard view.
GET /api/v1/dashboard
curl http://localhost:12001/api/v1/dashboard \
-H "Authorization: Bearer $TOKEN"
# → {
# "disk_total_gb": 460, "disk_used_gb": 120, "disk_free_gb": 340,
# "training_data_gb": 2.4,
# "adapters": ["recon", "web", "privesc", "lateral", "windows"],
# "total_experiments": 12, "total_evals": 4
# }
REST API — Projects
Projects correspond to adapter types (recon, web, privesc, lateral, windows). Each project has its own experiments, evals, deployments, and dataset versions.
GET /api/v1/projects List all projects with stats
GET /api/v1/projects/{adapter} Full project snapshot (all sub-resources)
GET /api/v1/projects/{adapter}/graph Per-adapter lineage graph
GET /api/v1/graph Cross-adapter global lineage graph
List projects
curl http://localhost:12001/api/v1/projects \
-H "Authorization: Bearer $TOKEN"
# → [{"adapter": "recon", "exists": true, "stats": {...}}, ...]
Dataset sub-resources
GET /api/v1/projects/{adapter}/datasets/versions List dataset versions
POST /api/v1/projects/{adapter}/datasets/versions Register a new version
GET /api/v1/projects/{adapter}/datasets/stats Row counts + size breakdown
GET /api/v1/projects/{adapter}/datasets/quality Quality flag summary
GET /api/v1/projects/{adapter}/datasets/examples Browse individual examples
GET /api/v1/projects/{adapter}/datasets/distribution Intent/category distribution
REST API — Experiments
Experiments represent training runs tracked by the CLI (foundry track train) or imported from external sources.
GET /api/v1/projects/{adapter}/experiments List experiments
POST /api/v1/projects/{adapter}/experiments Create a new experiment record
GET /api/v1/projects/{adapter}/experiments/{exp_id} Get experiment details
POST /api/v1/projects/{adapter}/experiments/import Import an external experiment
Create an experiment record
curl -X POST http://localhost:12001/api/v1/projects/recon/experiments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"adapter_path": "outputs/recon-lora-v2",
"status": "complete",
"base_model": "Qwen/Qwen3-8B",
"lora_rank": 32,
"train_loss": 0.388,
"compute_backend": "kaggle-t4x2",
"notes": "Second run, reduced lr"
}'
REST API — Evals
GET /api/v1/projects/{adapter}/evals List evals for an adapter
POST /api/v1/projects/{adapter}/evals Record an eval result
Record an eval
curl -X POST http://localhost:12001/api/v1/projects/recon/evals \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_id": "exp-abc123",
"total_count": 50,
"passing_count": 44,
"avg_score": 88.0,
"verdict": "pass",
"notes": "Tested against DVWA recon benchmark"
}'
REST API — Deployments
Deployments register a running inference server for an adapter. Foundry tracks the PID and reports whether the process is alive.
GET /api/v1/projects/{adapter}/deployments List deployments (with live status)
POST /api/v1/projects/{adapter}/deployments Register a deployment
DELETE /api/v1/projects/{adapter}/deployments/{dep_id} Remove + SIGTERM the process
Register a deployment
curl -X POST http://localhost:12001/api/v1/projects/recon/deployments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"host": "localhost",
"port": 8080,
"serving_engine": "llama.cpp",
"status": "active",
"pid": 12345
}'
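One common way to report whether a registered PID is alive is to send signal 0, which checks for the process without affecting it. The sketch below shows that technique on POSIX systems; it is an assumption about how such a check could work, not a description of Foundry's internals, and it must not be used on Windows, where os.kill behaves differently.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """POSIX-only liveness probe using signal 0 (hypothetical helper)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A PID can be reused by the operating system after a process exits, so a check like this only says that some process with that PID exists.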
REST API — Training Runs
Training runs are fine-tuning jobs dispatched to compute backends (Kaggle, Modal, local). These are distinct from experiments — a run submits the job; an experiment records the resulting artifacts.
GET /api/v1/runs List all runs
POST /api/v1/runs Create and submit a new run
GET /api/v1/runs/{id} Get run details + status
GET /api/v1/runs/{id}/progress Live progress + loss curve data
GET /api/v1/runs/{id}/logs Raw log output
POST /api/v1/runs/{id}/cancel Cancel a running job
POST /api/v1/runs/{id}/eval Trigger local evaluation on downloaded adapter
POST /api/v1/runs/{id}/retry-download Re-download artifacts after a failed download step
POST /api/v1/runs/{id}/retry-convert Re-run conversion after a failed convert step
Submit a Kaggle training run
curl -X POST http://localhost:12001/api/v1/runs \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"backend": "kaggle",
"config": {
"adapters": ["recon", "web"],
"base_model": "Qwen/Qwen3-8B",
"epochs": 3,
"dataset_ref": "nt1412/tactician-training-data"
}
}'
Poll until complete
RUN_ID="7ec5f1e5-..."
while true; do
STATUS=$(curl -s http://localhost:12001/api/v1/runs/$RUN_ID \
-H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo "Status: $STATUS"
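# Also stop at ready_for_eval: the run waits there until an eval is triggered
[[ "$STATUS" == "complete" || "$STATUS" == "error" || "$STATUS" == "ready_for_eval" ]] && break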
sleep 30
done
Run status values
| Status | Meaning |
|---|---|
| queued | Submitted to backend, waiting for GPU |
| running | Training in progress |
| downloading | Pulling artifacts from backend |
| converting | Converting adapter weights to local format |
| ready_for_eval | Adapter downloaded and ready to test locally |
| evaluating | Local eval running |
| complete | All steps done |
| error | Failed — check error_message field |
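The status table suggests a simple polling helper. The sketch below is a hypothetical client-side loop, not part of Foundry: it treats complete and error as terminal, and also stops at ready_for_eval on the assumption that the run waits there until a local eval is triggered. The status source is injected as a callable so the loop can be exercised without a server.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"complete", "error"}
WAITING_STATUSES = {"ready_for_eval"}  # assumed to need a manual eval trigger


def wait_for_run(
    fetch_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a status that needs attention, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES or status in WAITING_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"run still in progress after {max_polls} polls")
```

In practice fetch_status would call GET /api/v1/runs/{id} and return the status field; the max_polls cap keeps a stuck run from blocking a CI job indefinitely.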
REST API — Data Studio
GET /api/v1/data-studio/versions List dataset versions (all adapters)
GET /api/v1/data-studio/preview Preview merged training data
POST /api/v1/data-studio/convert Convert trajectory files to JSONL
GET /api/v1/data-studio/convert-diff Show diff from last conversion
GET /api/v1/data-studio/staging List staged edits (current session)
GET /api/v1/data-studio/staging/session Get current staging session info
PUT /api/v1/data-studio/staging/{index} Edit a staged example
DELETE /api/v1/data-studio/staging/{index} Remove a staged example
DELETE /api/v1/data-studio/staging/session/{id} Clear a staging session
POST /api/v1/data-studio/commit Commit staging area → new version
GET /api/v1/data-studio/commit/preview Preview what a commit would produce
POST /api/v1/data-studio/push-to-kaggle Format + push dataset to Kaggle
GET /api/v1/data-studio/model-profiles List model formatting profiles
POST /api/v1/data-studio/model-profiles Create a custom model profile
GET /api/v1/data-studio/model-profiles/{id} Get a profile
PUT /api/v1/data-studio/model-profiles/{id} Update a profile
DELETE /api/v1/data-studio/model-profiles/{id} Delete a profile
UI Guide — Graph
The Graph tab shows the full lineage of a project as a tree: datasets flow into training experiments, which produce adapters, which are deployed.
- Node colours indicate type: blue = dataset, indigo = experiment, green = eval, orange = deployment.
- Clicking a node shows its metadata panel on the right.
- Orphaned dataset nodes (no training run yet) appear at the root.
UI Guide — Fine-tuning Runs
The Fine-tuning Runs tab lists all training jobs submitted through the UI or API. Runs are scoped to the currently selected adapter (shown in the sidebar).
- New Run — opens a form to configure and submit a new Kaggle or Modal training job.
- Expand — shows the pipeline stepper, live loss curve, and error details for a run.
- ADAPTERS column — shows which adapters were trained. The current adapter is highlighted in blue. Multi-adapter runs (all 5 trained in one job) are expected to appear under every adapter.
- My Runs / All Runs toggle — filter to your own runs or see everyone's.
- Run Local Eval — triggers POST /runs/{id}/eval to evaluate the downloaded adapter locally.

UI Guide — Quality
The Quality tab scans all trajectories in the database and reports data quality issues. This tab is global (applies to all adapters' training data combined).
- Run Scan — triggers a fresh quality scan across all trajectories. The % Flagged stat uses the total trajectory count from the last scan as denominator, so it's accurate across re-scans.
- Per-rule breakdown — each quality rule is listed with its count and severity (critical/high = red, medium = yellow, low = gray).
- Review queue — flagged examples sorted by severity. Use the queue to mark examples as reviewed so you can track remediation progress.
Quality rules
| Rule | Severity | What it checks |
|---|---|---|
| empty_response | high | Assistant turn has empty or whitespace-only content |
| malformed_messages | high | Missing required fields (role, content) or wrong message order |
| oversized_input | medium | Total token count exceeds model context limit |
| duplicate_conversation | low | Exact duplicate of another example in the dataset |
| truncated_tool_call | medium | Tool call JSON is cut off mid-stream |
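Three of the rules in the table can be approximated in a few lines. The sketch below is illustrative only and is not Foundry's scanner: the function name scan_example and the in-memory seen set are hypothetical, and the token-based and tool-call rules are left out because they depend on the tokenizer and the tool-call parser.

```python
import json


def scan_example(example, seen):
    """Return the names of the rules (from the table above) that this example trips.

    `seen` is a set of canonical message dumps used for duplicate detection.
    """
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["malformed_messages"]

    flags = []
    # malformed_messages: every message needs both a role and a content field
    if any(
        not isinstance(m, dict) or "role" not in m or "content" not in m
        for m in messages
    ):
        flags.append("malformed_messages")

    # empty_response: an assistant turn whose content is empty or whitespace-only
    if any(
        isinstance(m, dict)
        and m.get("role") == "assistant"
        and isinstance(m.get("content"), str)
        and not m["content"].strip()
        for m in messages
    ):
        flags.append("empty_response")

    # duplicate_conversation: exact duplicate, compared via a canonical JSON dump
    key = json.dumps(messages, sort_keys=True)
    if key in seen:
        flags.append("duplicate_conversation")
    seen.add(key)
    return flags
```

Calling scan_example on every record with one shared seen set yields per-rule counts comparable to the per-rule breakdown, under the assumptions stated above.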
UI Guide — Data Review
The Data Review tab is for record-level human curation of multi-turn PDCA training corpora (the JSONL files under data/training/pdca-toolcalls-{tag}-mt-clean/). Use it to keep, drop, or flag individual records before retraining — verdicts persist to the same .foundry/curation/ manifests that foundry data review on the CLI writes, so you can switch between CLI and UI freely.
Layout
- Top bar — pick a corpus (e.g. v15, v16), toggle bucket (web/recon), filter to high-progression records only. Live counts of keep / drop / flag verdicts are shown on the right.
- Left pane — list of records with index, mission_id, decision count, distinct tools, and current verdict pill. The trending-up icon flags high-progression records (≥4 distinct tools); the check-circle icon flags records that emit report_finding. Click a record to open it.
- Right pane — record detail with verdict buttons (Keep / Drop / Flag / Clear), an optional note, and a collapsible message thread (system / user / assistant tool calls / tool results). Use prev / next to walk the corpus without leaving the keyboard.
- Curate buttons (top right) — Curate (dry-run) reports how many records would be kept, dropped, and flagged; Apply Curation writes a sibling corpus directory pdca-toolcalls-{tag}-mt-curated/ with the dropped records excluded. The original corpus is never modified.
Verdict semantics: a record without a verdict is treated as keep. Curation is opt-out — partial review still produces a usable corpus. Flag records are kept in the curated copy but the flag persists for future attention.
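The opt-out semantics can be sketched as a small filter. This is a sketch under assumptions, not Foundry's curation code: the function name apply_curation is hypothetical, and the verdicts are modelled as a plain dict keyed by record index, which may not match the actual layout of the .foundry/curation/ manifests.

```python
def apply_curation(records, verdicts):
    """Return (curated_records, summary) for one corpus.

    records:  list of records in corpus order
    verdicts: dict mapping record index -> 'keep' | 'drop' | 'flag' | ''
              (a missing or empty entry counts as keep)
    """
    curated = []
    summary = {"keep": 0, "drop": 0, "flag": 0}
    for idx, record in enumerate(records):
        verdict = verdicts.get(idx) or "keep"  # no verdict means keep
        summary[verdict] += 1
        if verdict != "drop":  # flagged records stay in the curated copy
            curated.append(record)
    return curated, summary
```

The summary dict corresponds to what a dry run reports; the curated list is what would be written to the sibling corpus directory, leaving the input list untouched.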
REST API
GET /api/v1/data/corpora List corpora + bucket counts
GET /api/v1/data/corpora/{tag}/records?bucket=web Paginated records + verdicts
&offset=0&limit=50&hp_only=false
GET /api/v1/data/corpora/{tag}/records/{idx}?bucket=web Full record (with messages)
GET /api/v1/data/corpora/{tag}/curation?bucket=web Get curation manifest
PUT /api/v1/data/corpora/{tag}/curation/{idx} Set verdict (body: {bucket, verdict, note})
verdict ∈ ('keep','drop','flag','')
empty string clears the entry
POST /api/v1/data/corpora/{tag}/curate Dry-run or apply (body: {bucket, apply})
bucket ∈ ('web','recon','both')
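As an illustration of the verdict endpoint, the sketch below builds (but does not send) a PUT request with the standard library. The base URL is assumed to be the local server from Quick Start, the helper name build_verdict_request and the example values are hypothetical, and authentication is left out because the request format for it is not described here.

```python
import json
import urllib.request

# Assumption: the server from Quick Start, listening on the default local port.
BASE_URL = "http://127.0.0.1:12001"

ALLOWED_VERDICTS = ("keep", "drop", "flag", "")  # empty string clears the entry


def build_verdict_request(tag, idx, bucket, verdict, note=""):
    """Build a PUT request that sets the verdict for one record."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unsupported verdict: {verdict!r}")
    body = json.dumps({"bucket": bucket, "verdict": verdict, "note": note})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/data/corpora/{tag}/curation/{idx}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Example values only; pass the result to urllib.request.urlopen to send it.
req = build_verdict_request("v16", 42, "web", "drop", note="example note")
```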
UI Guide — Pipeline
The Pipeline tab converts raw trajectory files from registered sources into training-ready JSONL and optionally pushes them to Kaggle as a versioned dataset. It is global across all adapters.
Sources tab
Register source directories. Click Discover to scan for JSONL/trajectory files, then Sync to ingest them.
Pipeline tab
Select sources, click Run Pipeline. Foundry converts trajectories to chat-format JSONL using the selected model profile.
Quality tab
Run a scan to find and review issues in the converted data.
Deploy tab → Push to Kaggle
Select a model profile, pick adapters, and push. This creates a versioned Kaggle dataset ready to consume in a training notebook.
Data — JSONL Format
Training data must be in chat-format JSONL: one JSON object per line, each with a messages array. Foundry's adapters use a [TOOL] prefix notation for tool calls in assistant turns — the model learns to emit this format natively.
Tool call format
{"messages": [
{"role": "system", "content": "You are a web application security specialist..."},
{"role": "user", "content": "Continue penetration testing Sneaky. Previous progress: 4 phase(s) completed."},
{"role": "assistant", "content": "nmap completed (open ports: 22/tcp, 80/tcp). Proceeding with SSH.\n[TOOL] ssh (remote-shell) --i ~/keys/sneaky_thrasivoulos\n[TOOL] ssh (remote-shell) --i ~/keys/sneaky_thrasivoulos\n[TOOL] scp (file-transfer) --i ~/keys/sneaky_thrasivoulos"},
{"role": "user", "content": "[ssh] thrasivoulos@Sneaky:~$ uname -a\nLinux Sneaky 4.4.0-75-generic #96~14.04.1-Ubuntu SMP..."},
{"role": "assistant", "content": "ssh completed. Proceeding with privilege escalation enumeration.\n[TOOL] suid (general) --perm -2000 --ls 2>/dev/null"}
]}
Tool call syntax
[TOOL] <tool_name> (<category>) <flags>
Examples:
[TOOL] nmap (port-scan) -sV -sC -p- 10.0.0.1
[TOOL] sqlmap (sqli) --url http://target/page.php?id=1 --dbs
[TOOL] ssh (remote-shell) --i ~/keys/id_rsa user@10.0.0.1
[TOOL] hydra (brute-force) -l admin -P /usr/share/wordlists/rockyou.txt ssh://10.0.0.1
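The syntax above is regular enough to parse with a single expression. A minimal sketch, assuming one call per line and a category without nested parentheses; the pattern and the function name parse_tool_calls are illustrative and not part of Foundry:

```python
import re

# [TOOL] <tool_name> (<category>) <flags>
TOOL_CALL = re.compile(
    r"^\[TOOL\]\s+(?P<tool>\S+)\s+\((?P<category>[^)]+)\)(?:\s+(?P<flags>.*))?$"
)


def parse_tool_calls(content):
    """Extract every [TOOL] line from an assistant turn's content."""
    calls = []
    for line in content.splitlines():
        match = TOOL_CALL.match(line.strip())
        if match:
            calls.append(
                {
                    "tool": match["tool"],
                    "category": match["category"],
                    "flags": match["flags"] or "",
                }
            )
    return calls
```

Lines that do not start with [TOOL], such as the free-text summary that precedes the calls, are skipped.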
Tool result format
Tool results come back as user turns prefixed with the tool name in brackets:
{"role": "user", "content": "[nmap] PORT STATE SERVICE\n22/tcp open ssh\n80/tcp open http\n443/tcp open https"}
Rules
- Each line must be valid JSON — no trailing commas, no comments.
- The messages array must start with system and end with assistant.
- An assistant turn may contain multiple [TOOL] calls — the model learns to emit several tool invocations per step.
- Tool results are user role turns, not a separate tool role.
- Foundry's quality scanner flags empty assistant turns, malformed structure, and oversized conversations automatically.
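The structural rules can be checked before ingesting a file. The following is a sketch, not the quality scanner itself: the function name check_line and the wording of the returned messages are hypothetical.

```python
import json


def check_line(line):
    """Return a list of rule violations for one JSONL line (empty list means OK)."""
    try:
        obj = json.loads(line)  # rejects trailing commas and comments
    except json.JSONDecodeError:
        return ["invalid JSON"]

    messages = obj.get("messages") if isinstance(obj, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing messages array"]

    roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
    problems = []
    if roles[0] != "system":
        problems.append("first turn is not system")
    if roles[-1] != "assistant":
        problems.append("last turn is not assistant")
    if "tool" in roles:
        problems.append("tool role used; tool results must be user turns")
    return problems
```

Running check_line over each line of a training file and reporting the line numbers with a non-empty result gives a quick pre-flight check.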
Model chat template: When pushing to Kaggle, Foundry runs each example through the model's chat template (via tokenizer.apply_chat_template) before uploading. The [TOOL] notation lives in the raw content field — the model template wraps it in whatever format the base model expects (Qwen3, Llama, Gemma, etc.).
CLI — Upcoming Commands
These commands are on the roadmap. The UI already supports the equivalent actions — CLI wrappers are planned for the next release.
foundry deploy
Coming in 0.3
Register a running inference server (llama.cpp, vLLM, Ollama) and link it to an experiment. Equivalent to the Deployments tab in the UI.
# Planned interface
foundry deploy \
--adapter ./outputs/recon-lora \
--engine llama.cpp \
--host localhost \
--port 8080
foundry graph
Coming in 0.3
Print the project's lineage graph to the terminal. Equivalent to the Graph tab in the UI.
# Planned interface
foundry graph
foundry graph --adapter recon # Per-adapter lineage only
foundry graph --output json # Machine-readable
Compute Backends
Foundry supports three compute backends for fine-tuning. Configure credentials in the Settings tab.
Modal + Unsloth (A10G)
Primary backend
Modal runs the full Unsloth training stack on an A10G GPU (24GB VRAM) with a 6-hour timeout. Credentials are stored locally — nothing to configure in the UI beyond a one-time modal token set.
1. One-time setup
pip install modal
modal token set --token-id <id> --token-secret <secret>
# Token stored at ~/.modal/credentials.toml — Foundry reads it automatically
2. What happens when you submit a Modal run
- Foundry uploads train.jsonl for each adapter into a Modal Volume named foundry-training-data
- A Modal function (train_unsloth) is spawned on an A10G GPU
- Unsloth loads the base model (4-bit quantized), applies LoRA patches, trains with SFTTrainer
- Every 10 steps, a JSON event is written to the Volume at runs/{run_id}/training.log
- Foundry polls the log and streams it to the Runs tab loss curve in real-time
- Adapter weights land at runs/{run_id}/output/{adapter}/ in the Volume
- On completion, Foundry downloads the weights to your local machine
3. Training config
# Defaults (overridable via config.extra in the run payload)
GPU: A10G (24GB VRAM)
base_model: set per run (e.g. "Qwen/Qwen3-8B")
lora_rank: 16
lora_alpha: 32
epochs: 3
batch_size: 2
gradient_accum: 4 (effective batch = 8)
learning_rate: 2e-4
max_seq_length: 2048
lr_scheduler: cosine with 10% warmup
target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
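The defaults imply a simple step budget. The arithmetic below is a rough sketch: the function name training_schedule is hypothetical, the 800-example dataset is an example value, and the exact rounding of steps per epoch and warmup steps depends on the trainer version.

```python
import math


def training_schedule(num_examples, batch_size=2, gradient_accum=4,
                      epochs=3, warmup_ratio=0.10):
    """Estimate optimizer steps from the default training config."""
    effective_batch = batch_size * gradient_accum      # 2 * 4 = 8
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)   # 10% warmup before cosine decay
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Example: a hypothetical adapter with 800 training examples
schedule = training_schedule(800)
```

With these numbers the run takes 100 steps per epoch and 300 steps in total, 30 of them warmup.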
4. Log event format
# Events streamed to foundry-training-data volume
{"event": "adapter_start", "adapter": "recon", "total_steps": 300}
{"event": "step", "adapter": "recon", "step": 10, "total_steps": 300, "loss": 1.2341}
{"event": "step", "adapter": "recon", "step": 20, "total_steps": 300, "loss": 0.9817}
{"event": "adapter_done", "adapter": "recon", "train_loss": 0.3881}
{"event": "adapter_error", "adapter": "web", "error": "train.jsonl not found"}
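Events in this shape are easy to turn into a loss curve. A minimal sketch, assuming the log holds one JSON object per line; the function name loss_curve is hypothetical and this is not the code Foundry uses to render the Runs tab.

```python
import json


def loss_curve(log_text):
    """Group step events by adapter and collect adapter errors.

    Returns (curves, errors), where curves maps adapter -> [(step, loss), ...]
    and errors maps adapter -> error message.
    """
    curves, errors = {}, {}
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        event = json.loads(line)
        adapter = event.get("adapter")
        if event["event"] == "step":
            curves.setdefault(adapter, []).append((event["step"], event["loss"]))
        elif event["event"] == "adapter_error":
            errors[adapter] = event["error"]
    return curves, errors
```

The adapter_start and adapter_done events are ignored here; they would supply total_steps for a progress bar and the final train_loss.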
Kaggle Kernels (T4 / P100)
Free tier available
Foundry submits a training notebook to Kaggle, polls for completion, and downloads artifacts. Requires a Kaggle API token in Settings → My Credentials. Note: Kaggle's free GPU pool can allocate P100s — request a T4 or disable accelerator-type checking if you hit errors.
# In Settings → Kaggle API Token, paste your token JSON:
{"username": "your_username", "key": "your_api_key"}
# Or set environment variable: KAGGLE_API_TOKEN=<token>
Local (MLX on Apple Silicon)
Run training directly on your machine with MLX. Track the resulting adapter directory with foundry track train after the run completes — no code changes needed.
# Train with MLX LoRA
# mlx_lm expects --data to be a directory containing train.jsonl and valid.jsonl
python -m mlx_lm.lora \
--model Qwen/Qwen3-8B \
--train \
--data ./data/recon \
--adapter-path ./outputs/recon-lora \
--iters 1000
# Track the result
foundry track train \
--adapter ./outputs/recon-lora \
--data ./data/recon/train.jsonl \
--backend local-mlx-m4 \
--notes "MLX run, 1000 iters"