Mutome records kept candidates, discarded candidates, crashes, counterexamples, and no-go routes.
Automated research systems
Research loops you can audit.
Mutome is a local-first workbench for eval-driven research loops: propose candidates, run validators and benchmarks, keep what improves, and archive what fails.
Randomness in discovery. Determinism in verification.

Model confidence is not proof.
The engine separates discovery from promotion. A model can suggest a protocol, patch, proof strategy, or experiment; Mutome only moves it forward when the run leaves replayable evidence behind.
Every claim has a file trail.
Stores the baseline, candidate score, incumbent score, and metric direction so each decision can be replayed.
Tracks the running best, discarded attempts, kept improvements, and noisy experiments.
Packages commands, reports, hashes, tool disclosure, and a reproducibility package.
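To see why those four values are enough to replay a keep/discard decision, here is a minimal sketch. The record shape and the names `DecisionRecord`, `replay_decision`, and `lift_over_baseline` are hypothetical illustrations, not Mutome's actual file format or API. The rule that a candidate must strictly beat the incumbent, so that ties are discarded, is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    # Hypothetical field names; Mutome's real on-disk format may differ.
    baseline: float
    candidate_score: float
    incumbent_score: float
    metric_direction: str  # "maximize" or "minimize"


def _sign(direction: str) -> int:
    # +1 when higher scores are better, -1 when lower scores are better.
    if direction == "maximize":
        return 1
    if direction == "minimize":
        return -1
    raise ValueError(f"unknown metric direction: {direction!r}")


def replay_decision(rec: DecisionRecord) -> str:
    """Recompute keep/discard from the stored scores alone."""
    s = _sign(rec.metric_direction)
    # Assumption: only a strict improvement over the incumbent is kept.
    return "keep" if s * (rec.candidate_score - rec.incumbent_score) > 0 else "discard"


def lift_over_baseline(rec: DecisionRecord) -> float:
    """Positive when the candidate beats the baseline in the metric's direction."""
    return _sign(rec.metric_direction) * (rec.candidate_score - rec.baseline)
```

For example, `replay_decision(DecisionRecord(0.50, 0.62, 0.58, "maximize"))` yields `"keep"`. The same scores under `"minimize"` yield `"discard"`. This is why the metric direction has to be stored next to the scores.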
Propose. Verify. Archive.

Propose
Generate candidate lab protocols, code patches, proof routes, and experiment variants through structured exploration.

Verify
Evaluate candidates with deterministic validators, frozen benchmarks, replay commands, and predeclared metrics.

Archive
Keep reports, stdout, artifacts, hashes, failures, counterexamples, and decisions as reusable research memory.
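The loop above can be sketched in a few lines. In this sketch, randomness is confined to a seeded proposer, verification is a deterministic scoring function, and every attempt is archived whether or not it is kept. `run_loop` and its parameters are hypothetical names for illustration only. The toy objective in the usage note is likewise an assumption, standing in for a real validator or frozen benchmark.

```python
from __future__ import annotations

import random
from typing import Callable


def run_loop(
    propose: Callable[[random.Random, float], float],
    verify: Callable[[float], float],
    incumbent: float,
    steps: int,
    seed: int = 0,
) -> tuple[float, list[dict]]:
    """Propose, verify, archive; keep a candidate only if its score improves."""
    rng = random.Random(seed)  # randomness lives only in discovery
    best_score = verify(incumbent)  # deterministic scoring of the starting point
    archive: list[dict] = []
    for step in range(steps):
        candidate = propose(rng, incumbent)
        score = verify(candidate)  # same input always yields the same score
        kept = score > best_score
        # Failures are archived too, so discarded routes remain inspectable.
        archive.append({"step": step, "candidate": candidate, "score": score, "kept": kept})
        if kept:
            incumbent, best_score = candidate, score
    return incumbent, archive
```

As a usage example, `run_loop(lambda rng, x: x + rng.uniform(-1.0, 1.0), lambda x: -(x - 3.0) ** 2, incumbent=0.0, steps=200, seed=7)` climbs toward 3. Rerunning it with the same seed reproduces the same archive exactly, which is the property that makes a run replayable.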
No promotion without a gate.
heuristic: candidate idea, untrusted until measured
model_checked: tests, validators, or adapters reproduce the claim
solver_certified: a solver certifies the relevant subclaim
proof_checked: machine-checkable proof artifacts exist
reviewed: human review, attribution, and reproduction are complete
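One way to picture the gate ladder is as an ordered enum plus a promotion check. The sketch below assumes the gates are strictly cumulative, meaning a claim reaches a level only after passing every gate beneath it. The page does not state this, so treat it as an assumption. `Gate`, `highest_gate`, and `can_promote` are hypothetical names, not part of Mutome.

```python
from __future__ import annotations

from enum import IntEnum


class Gate(IntEnum):
    # Order follows the ladder as listed; strict cumulativeness is an assumption.
    heuristic = 0
    model_checked = 1
    solver_certified = 2
    proof_checked = 3
    reviewed = 4


def highest_gate(passed: set[Gate]) -> Gate:
    """Return the highest gate reached without skipping any lower gate."""
    level = Gate.heuristic  # every candidate starts untrusted
    for gate in list(Gate)[1:]:
        if gate not in passed:
            break  # a missing gate blocks everything above it
        level = gate
    return level


def can_promote(passed: set[Gate], target: Gate) -> bool:
    """No promotion without a gate: the target level must actually be reached."""
    return highest_gate(passed) >= target
```

Under this assumption, a claim with model checks and proof artifacts but no solver certificate stops at `model_checked`. In other words, `highest_gate({Gate.model_checked, Gate.proof_checked})` returns `Gate.model_checked`.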
Engine now. Desktop next.
Protocol search, protocol autotune, command autotune, benchmark adapters, scorecards, and campaign packaging run locally from the engine.
Coming to the desktop app: launch runs, watch score lift, inspect mutations, connect local providers, and decide whether to keep, harden, or discard a candidate.