Automated research systems

Research loops you can audit.

Mutome is a local-first workbench for eval-driven research loops: propose candidates, run validators and benchmarks, keep what improves, and archive what fails.

Randomness in discovery. Determinism in verification.

candidate → eval → gate → archive
Request access
Campaign ledger: evidence trail

Model confidence is not proof.

The engine separates discovery from promotion. A model can suggest a protocol, patch, proof strategy, or experiment; Mutome moves it forward only when the run leaves replayable evidence behind.
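One way to read "replayable evidence" is as a hash check: re-run the recorded command and confirm it reproduces the recorded output. The sketch below illustrates that idea under this assumption only; `run_and_hash` and `is_replayable` are hypothetical names, not Mutome's API, and a real replay would likely also pin seeds, inputs, and tool versions.

```python
import hashlib
import subprocess
import sys


def run_and_hash(command: list[str]) -> str:
    """Run a command and return the SHA-256 hex digest of its stdout."""
    # check=True makes a crashing command raise instead of hashing partial output.
    result = subprocess.run(command, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_replayable(command: list[str], recorded_hash: str) -> bool:
    """Evidence counts only if re-running the command reproduces the recorded hash."""
    return run_and_hash(command) == recorded_hash


# Hypothetical usage: record the hash once, replay it before any promotion.
cmd = [sys.executable, "-c", "print(sum(range(10)))"]
recorded = run_and_hash(cmd)
print(is_replayable(cmd, recorded))  # True
```

A model's confidence never enters this check; only the re-executed output does.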

Every claim has a file trail.

01 candidate_archive.jsonl

kept candidates, discarded candidates, crashes, counterexamples, and no-go routes

02 results.tsv

baseline, candidate score, incumbent score, and metric direction for replay

03 progress.svg

running best, discarded attempts, kept improvements, and noisy experiments

04 artifact_manifest.json

commands, reports, hashes, tool disclosure, and reproducibility package
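A minimal sketch of how a run might append to three of these files, assuming JSON-lines records in the archive, a tab-separated results table, and SHA-256 hashes in the manifest. The helper names, record fields, column names, and the example command are hypothetical; `progress.svg` is left out because it can be rendered from the rows in `results.tsv`.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def append_candidate(ledger: Path, record: dict) -> None:
    """Append one outcome (kept, discarded, crash, ...) as a single JSON line."""
    with open(ledger / "candidate_archive.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def append_result(ledger: Path, baseline: float, candidate: float,
                  incumbent: float, direction: str) -> None:
    """Append one scored comparison to results.tsv, writing a header on first use."""
    path = ledger / "results.tsv"
    is_new = not path.exists()
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        if is_new:
            writer.writerow(["baseline", "candidate_score", "incumbent_score", "direction"])
        writer.writerow([baseline, candidate, incumbent, direction])


def write_manifest(ledger: Path, commands: list[str], artifacts: list[Path]) -> dict:
    """Record the commands that were run and a SHA-256 hash of each artifact."""
    manifest = {
        "commands": commands,
        "hashes": {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in artifacts},
    }
    (ledger / "artifact_manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


# Example campaign in a throwaway directory; scores and command are hypothetical.
ledger = Path(tempfile.mkdtemp())
append_candidate(ledger, {"id": "c1", "status": "kept", "score": 0.81})
append_candidate(ledger, {"id": "c2", "status": "crash", "score": None})
append_result(ledger, 0.75, 0.81, 0.78, "maximize")
manifest = write_manifest(
    ledger,
    ["python eval.py --seed 0"],  # hypothetical replay command
    [ledger / "candidate_archive.jsonl", ledger / "results.tsv"],
)
```

Append-only files keep crashes and discarded candidates in the same trail as kept ones, so nothing disappears when a run goes badly.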

Propose. Verify. Archive.

01 Propose

Generate candidate lab protocols, code patches, proof routes, and experiment variants through structured exploration.

02 Verify

Evaluate candidates with deterministic validators, frozen benchmarks, replay commands, and predeclared metrics.

03 Archive

Keep reports, stdout, artifacts, hashes, failures, counterexamples, and decisions as reusable research memory.
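The propose, verify, archive loop can be sketched as a seeded hill climb: randomness lives only in the proposal step, the validator is a fixed deterministic function, a candidate is kept only if it beats the incumbent in the declared metric direction, and every attempt is archived. The objective, the Gaussian perturbation, and the class and function names are hypothetical stand-ins for illustration, not Mutome's actual search strategy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Campaign:
    direction: str        # "maximize" or "minimize", declared before the run
    incumbent: float      # best verified score so far
    archive: list = field(default_factory=list)       # every attempt, kept or not
    running_best: list = field(default_factory=list)  # incumbent after each step

    def improves(self, score: float) -> bool:
        if self.direction == "maximize":
            return score > self.incumbent
        return score < self.incumbent


def propose(rng: random.Random, current: float) -> float:
    """Hypothetical discovery step: randomly perturb one numeric parameter."""
    return current + rng.gauss(0.0, 0.5)


def verify(x: float) -> float:
    """Hypothetical deterministic validator: the same input always gets the same score."""
    return -(x - 3.0) ** 2


def run(campaign: Campaign, start: float, steps: int, seed: int) -> float:
    rng = random.Random(seed)  # seeded so the discovery step can be replayed too
    best_x = start
    for step in range(steps):
        x = propose(rng, best_x)
        score = verify(x)
        kept = campaign.improves(score)
        if kept:
            best_x, campaign.incumbent = x, score
        campaign.archive.append({"step": step, "x": x, "score": score, "kept": kept})
        campaign.running_best.append(campaign.incumbent)
    return best_x


campaign = Campaign(direction="maximize", incumbent=verify(0.0))
best = run(campaign, start=0.0, steps=200, seed=7)
```

Re-running with the same seed reproduces the same archive, which makes a discarded attempt as inspectable as a kept one; `running_best` is the series a progress plot would draw.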

No promotion without a gate.

00 heuristic

candidate idea, untrusted until measured

01 model_checked

tests, validators, or adapters reproduce the claim

02 solver_certified

a solver certifies the relevant subclaim

03 proof_checked

machine-checkable proof artifacts exist

04 reviewed

human review, attribution, and reproduction are complete
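Assuming the five levels form a strict ladder, as the 00 to 04 numbering suggests, a gate can be modeled as an ordered enum plus a rule that a candidate climbs one level only when that level's evidence exists. The artifact names in `REQUIRED` are hypothetical placeholders for whatever each level actually demands.

```python
from enum import IntEnum


class Evidence(IntEnum):
    HEURISTIC = 0
    MODEL_CHECKED = 1
    SOLVER_CERTIFIED = 2
    PROOF_CHECKED = 3
    REVIEWED = 4


# Hypothetical artifacts each level requires before a candidate may enter it.
REQUIRED = {
    Evidence.MODEL_CHECKED: {"validator_report"},
    Evidence.SOLVER_CERTIFIED: {"solver_certificate"},
    Evidence.PROOF_CHECKED: {"proof_artifact"},
    Evidence.REVIEWED: {"review_signoff", "reproduction_log"},
}


def promote(level: Evidence, artifacts: set[str]) -> Evidence:
    """Climb one level at a time, stopping at the first gate whose evidence is missing."""
    while level < Evidence.REVIEWED:
        nxt = Evidence(level + 1)
        if not REQUIRED[nxt] <= artifacts:
            break
        level = nxt
    return level


def passes_gate(level: Evidence, required: Evidence) -> bool:
    """True if a candidate's evidence level meets or exceeds the required level."""
    return level >= required
```

Climbing one level at a time means a proof artifact alone cannot lift an unmeasured heuristic past `model_checked`; whether Mutome enforces that ordering is an assumption of this sketch.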

Engine now. Desktop next.

Engine: Autoresearch runs today

Protocol search, protocol autotune, command autotune, benchmark adapters, scorecards, and campaign packaging run locally from the engine.

Desktop: Mutome Desktop is the cockpit

Launch runs, watch score lift, inspect mutations, connect local providers, and decide whether to keep, harden, or discard a candidate.

Research automation, with the evidence still attached.
