Automated research systems

Research loops you can audit.

Mutome is a local-first workbench for eval-driven research loops: propose candidates, run validators and benchmarks, keep what improves, and archive what fails.

Randomness in discovery. Determinism in verification.

Request access
Evidence trail

Model confidence is not proof.

The engine separates discovery from promotion. A model can suggest a protocol, patch, proof strategy, or experiment; Mutome only moves it forward when the run leaves replayable evidence behind.
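One way to picture "replayable evidence" is a record of the exact command that was run plus a digest of what it printed. The sketch below is illustrative only and is not Mutome's API: `Evidence`, `run_and_record`, `replays`, and `may_promote` are hypothetical names, and it assumes a claim counts as replayable when rerunning the command yields byte-identical stdout.

```python
import hashlib
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record: the command run and a digest of its stdout."""
    command: Tuple[str, ...]
    stdout_sha256: str


def run_and_record(command: List[str]) -> Evidence:
    # Run the command once and keep a SHA-256 digest of its stdout.
    result = subprocess.run(command, capture_output=True, check=True)
    return Evidence(tuple(command), hashlib.sha256(result.stdout).hexdigest())


def replays(evidence: Evidence) -> bool:
    # Rerun the recorded command and compare the new digest with the stored one.
    result = subprocess.run(list(evidence.command), capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest() == evidence.stdout_sha256


def may_promote(evidence: Optional[Evidence]) -> bool:
    # Assumed rule: no evidence, or evidence that fails to replay, means no promotion.
    return evidence is not None and replays(evidence)


if __name__ == "__main__":
    ev = run_and_record([sys.executable, "-c", "print(6 * 7)"])
    print(may_promote(ev))
```

A model's stated confidence never enters `may_promote`; only the rerun does.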

Propose. Verify. Archive.

01 Propose

Generate candidate lab protocols, code patches, proof routes, and experiment variants through structured exploration.

02 Verify

Evaluate candidates with deterministic validators, frozen benchmarks, replay commands, and predeclared metrics.

03 Archive

Keep reports, stdout, artifacts, hashes, failures, counterexamples, and decisions as reusable research memory.
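The three steps can be sketched as a single loop. This is a toy under stated assumptions, not the engine's implementation: the candidate is one float, the validator is a made-up objective (closeness to 3.0), and `propose`, `validate`, and `run_campaign` are hypothetical names. It shows seeded randomness in the proposer, a deterministic score, and a ledger that keeps rejected candidates alongside kept ones.

```python
import hashlib
import json
import random
from typing import Dict, List, Tuple


def propose(rng: random.Random, parent: float) -> float:
    # Discovery: a seeded random perturbation of the current best candidate.
    return parent + rng.uniform(-1.0, 1.0)


def validate(candidate: float) -> float:
    # Verification: a deterministic score, higher is better.
    # Toy objective assumed for illustration: closeness to 3.0.
    return -abs(candidate - 3.0)


def run_campaign(seed: int, steps: int) -> Tuple[float, List[Dict]]:
    rng = random.Random(seed)
    best = 0.0
    best_score = validate(best)
    ledger: List[Dict] = []
    for step in range(steps):
        candidate = propose(rng, best)
        score = validate(candidate)
        kept = score > best_score
        record = {"step": step, "candidate": candidate, "score": score, "kept": kept}
        # Hash the record contents so later edits to the ledger are detectable.
        payload = json.dumps(record, sort_keys=True).encode()
        record["sha256"] = hashlib.sha256(payload).hexdigest()
        ledger.append(record)  # failures are archived too
        if kept:
            best, best_score = candidate, score
    return best, ledger


if __name__ == "__main__":
    best, ledger = run_campaign(seed=7, steps=50)
    print(best, len(ledger))
```

Because the proposer draws from a seeded generator and the validator is a pure function, running the campaign again with the same seed reproduces the same ledger.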

No promotion without a gate.

00 heuristic

candidate idea, untrusted until measured

01 model_checked

tests, validators, or adapters reproduce the claim

02 solver_certified

a solver certifies the relevant subclaim

03 proof_checked

machine-checkable proof artifacts exist

04 reviewed

human review, attribution, and reproduction are complete
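The five levels above can be modeled as an ordered enum. The sketch below assumes the levels are cumulative, meaning a candidate reaches a level only if every gate below it has also passed; the page does not state this, so treat it as one possible reading. `Trust` and `level_for` are hypothetical names.

```python
from enum import IntEnum
from typing import Set


class Trust(IntEnum):
    HEURISTIC = 0
    MODEL_CHECKED = 1
    SOLVER_CERTIFIED = 2
    PROOF_CHECKED = 3
    REVIEWED = 4


def level_for(passed_gates: Set[Trust]) -> Trust:
    # Every candidate starts as an untrusted heuristic.
    level = Trust.HEURISTIC
    # Walk the ladder in order and stop at the first gate that has not passed.
    # Assumption: levels are cumulative, so a skipped gate blocks everything above it.
    for gate in Trust:
        if gate is Trust.HEURISTIC:
            continue
        if gate not in passed_gates:
            break
        level = gate
    return level


if __name__ == "__main__":
    print(level_for({Trust.MODEL_CHECKED}).name)
```

Under this reading, a solver certificate without passing tests leaves the candidate at `HEURISTIC`.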

Engine now. Desktop next.

The current engine runs protocol search, protocol autotune, and command autotune. It can evolve lab protocols against frozen validators, run git-backed patch/eval/metric loops, and package campaigns with artifacts, hashes, reports, and candidate ledgers.

The desktop release turns those outputs into a native research cockpit: launch runs, watch score lift, inspect the current mutation, connect local providers, and decide whether to keep, harden, or discard a candidate.

Research automation, with the evidence still attached.
