Automated research systems

Research loops you can audit.

Mutome is a local-first workbench for eval-driven research loops: propose candidates, run validators and benchmarks, keep what improves, and archive what fails.

Randomness in discovery. Determinism in verification.

candidate, eval, gate, archive

campaign ledger: Evidence trail

Premise

Model confidence is not proof.

The engine separates discovery from promotion. A model can suggest a protocol, patch, proof strategy, or experiment; Mutome only moves it forward when the run leaves replayable evidence behind.

Run contract

Every claim has a file trail.

01 candidate_archive.jsonl

kept candidates, discarded candidates, crashes, counterexamples, and no-go routes

02 results.tsv

baseline, candidate score, incumbent score, and metric direction for replay

03 progress.svg

running best, discarded attempts, kept improvements, and noisy experiments

04 artifact_manifest.json

commands, reports, hashes, tool disclosure, and reproducibility package
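One way a file trail like this can be made checkable is to record a hash for every file a run produces and verify those hashes on replay. The sketch below is illustrative only: the manifest fields (`command`, `files`, `path`, `sha256`), the helpers `build_manifest` and `verify_manifest`, and the example command are assumptions, not Mutome's actual schema or CLI.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(run_dir: Path, command: str) -> dict:
    """Hypothetical manifest: the replay command plus one hash per file."""
    files = sorted(p for p in run_dir.rglob("*") if p.is_file())
    return {
        "command": command,
        "files": [
            {"path": p.relative_to(run_dir).as_posix(), "sha256": sha256_of(p)}
            for p in files
        ],
    }


def verify_manifest(run_dir: Path, manifest: dict) -> list[str]:
    """Return the paths that are missing or whose hash no longer matches."""
    return [
        entry["path"]
        for entry in manifest["files"]
        if not (run_dir / entry["path"]).is_file()
        or sha256_of(run_dir / entry["path"]) != entry["sha256"]
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        # Stand-in for a real run output; the column names are made up.
        (run_dir / "results.tsv").write_text("baseline\tcandidate\n0.70\t0.74\n")
        manifest = build_manifest(run_dir, command="python eval.py --seed 0")
        print(json.dumps(manifest, indent=2))
        print("mismatches:", verify_manifest(run_dir, manifest))
```

If any archived file is edited or removed after the run, `verify_manifest` reports its path, so a replay can refuse to trust the claim attached to it.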

Loop

Propose. Verify. Archive.

01

Propose

Generate candidate lab protocols, code patches, proof routes, and experiment variants through structured exploration.

02

Verify

Evaluate candidates with deterministic validators, frozen benchmarks, replay commands, and predeclared metrics.

03

Archive

Keep reports, stdout, artifacts, hashes, failures, counterexamples, and decisions as reusable research memory.
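The three steps above can be sketched as a small loop: a seeded random proposer, a deterministic scorer, and an archive that records every attempt, kept or not. This is a toy under stated assumptions, not the engine's implementation: the `Record` fields, the one-dimensional candidate, the quadratic metric, and the `"maximize"`/`"minimize"` direction strings are all hypothetical.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass
class Record:
    candidate: float
    score: float
    incumbent_score: float
    decision: str  # "keep" or "discard"


def propose(rng: random.Random, incumbent: float) -> float:
    """Discovery: a random perturbation of the current best candidate."""
    return incumbent + rng.uniform(-1.0, 1.0)


def evaluate(candidate: float) -> float:
    """Verification: a deterministic toy metric, best at candidate == 3."""
    return -((candidate - 3.0) ** 2)


def improves(score: float, incumbent_score: float, direction: str) -> bool:
    """Compare scores under a predeclared metric direction."""
    if direction == "maximize":
        return score > incumbent_score
    if direction == "minimize":
        return score < incumbent_score
    raise ValueError(f"unknown metric direction: {direction}")


def run_loop(seed: int, steps: int, direction: str = "maximize"):
    rng = random.Random(seed)  # seeded, so the whole run can be replayed
    incumbent = 0.0
    incumbent_score = evaluate(incumbent)
    archive = []
    for _ in range(steps):
        candidate = propose(rng, incumbent)
        score = evaluate(candidate)
        keep = improves(score, incumbent_score, direction)
        # Archive every attempt, including the ones that are discarded.
        decision = "keep" if keep else "discard"
        archive.append(Record(candidate, score, incumbent_score, decision))
        if keep:
            incumbent, incumbent_score = candidate, score
    return incumbent, archive


if __name__ == "__main__":
    best, archive = run_loop(seed=0, steps=50)
    for record in archive[:3]:
        print(json.dumps(asdict(record)))  # one JSON object per line
    print("best candidate:", best)
```

Because the only source of randomness is the seeded generator, rerunning with the same seed reproduces the same archive, which is the property that lets a discarded or kept decision be audited later.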

Evidence

No promotion without a gate.

00

heuristic

candidate idea, untrusted until measured

01

model_checked

tests, validators, or adapters reproduce the claim

02

solver_certified

a solver certifies the relevant subclaim

03

proof_checked

machine-checkable proof artifacts exist

04

reviewed

human review, attribution, and reproduction are complete
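The five levels above read naturally as an ordered scale, which can be modeled as an integer enum with a gate check in front of every promotion. The sketch below assumes the levels are strictly ordered and advanced one step at a time; that rule, and the `meets` and `promote` helpers, are illustrative assumptions rather than documented engine behavior.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Evidence levels, numbered as in the list above."""
    HEURISTIC = 0
    MODEL_CHECKED = 1
    SOLVER_CERTIFIED = 2
    PROOF_CHECKED = 3
    REVIEWED = 4


def meets(current: Evidence, required: Evidence) -> bool:
    """True if a claim's evidence level is at least the required level."""
    return current >= required


def promote(current: Evidence, gate_passed: bool) -> Evidence:
    """Advance one level only when the gate passed (assumed rule)."""
    if not gate_passed or current is Evidence.REVIEWED:
        return current
    return Evidence(current + 1)


if __name__ == "__main__":
    level = Evidence.HEURISTIC
    level = promote(level, gate_passed=False)  # no gate, no promotion
    print(level.name)
    level = promote(level, gate_passed=True)
    print(level.name, meets(level, Evidence.MODEL_CHECKED))
```

Keeping the level as data on the claim, rather than as prose in a report, makes it straightforward to filter for claims that have reached a given level before they are cited or released.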

Release path

Engine now. Desktop next.

Engine: Autoresearch runs today

Protocol search, protocol autotune, command autotune, benchmark adapters, scorecards, and campaign packaging run locally from the engine.

Desktop: Mutome Desktop is the cockpit

Launch runs, watch score lift, inspect mutations, connect local providers, and decide whether to keep, harden, or discard a candidate.

Research automation, with the evidence still attached.