MidGARD
Envisioned Solution

MidGARD is a Rust command-line tool suite that transforms TLA+ formal specifications into actionable regression test plans. As its name indicates (Model-initiated di-Graph Analysis for Regression Deployment), the process is model-initiated: the TLA+ specification drives the entire pipeline, from graph generation through test sequence output. MidGARD thus bridges the gap between formal verification and practical test engineering by automating the creation of scientifically derived test sequences from mathematically verified models.

The tool addresses a core challenge at SanDisk: the directed graphs generated from TLA+ specifications contain an enormous (effectively infinite) number of possible test paths, far too many for any human engineer to enumerate or evaluate. MidGARD replaces intuition-driven test planning with algorithmic analysis, producing a finite, high-confidence set of test sequences that cover the state space comprehensively.

System Architecture

The system is organized as a four-stage pipeline, where each stage is a distinct module that can also be invoked independently. The stages are designed to flow sequentially from specification to deployable test plan:

1. Model Checking: Accepts a TLA+ specification and invokes the TLC model checker as a subprocess. If the specification passes verification, TLC generates a Graphviz DOT file encoding the full state-transition digraph. For specifications with large state spaces, VIEW and ALIAS abstractions can be applied to reduce complexity before DOT generation.

2. Graph Preprocessing: Parses the raw DOT file into an in-memory directed graph structure (likely using the petgraph crate). Cleans the graph by removing self-referencing edges (artifacts of TLA+ stuttering invariance), optionally collapsing equivalent states, and stripping unreachable nodes. The result is a lean, analysis-ready digraph.

3. Analysis & Generation: The core algorithmic engine. Applies directed-graph analysis techniques, including strongly connected component decomposition (Tarjan's algorithm), path enumeration, loop detection, and coverage algorithms, to generate test sequences. Supports full-coverage mode (all nodes and edges) and n-test mode (maximally distinct routes). All generation is seeded for reproducibility.

4. Ordering & Output: Reorders the generated test sequences by a user-chosen criterion (e.g., random, coverage-weighted, edge-diversity) so that any prefix of the list is a useful subset. Outputs a structured, machine-readable list of test sequences for deployment into SanDisk's regression workflows.
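The cleanup steps in the preprocessing stage can be sketched with a plain adjacency list. This is an illustrative standard-library sketch, not MidGARD's implementation (which would likely build on petgraph); the `preprocess` function and node numbering are invented for the example.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A minimal digraph as an adjacency list: node id -> successor ids.
type Digraph = HashMap<u32, Vec<u32>>;

/// Drop self-loops (TLA+ stuttering artifacts), then strip nodes
/// unreachable from `root` using a breadth-first search.
fn preprocess(mut g: Digraph, root: u32) -> Digraph {
    // Remove self-referencing edges.
    for (node, succs) in g.iter_mut() {
        succs.retain(|s| s != node);
    }
    // BFS from the initial state to find reachable nodes.
    let mut seen: HashSet<u32> = HashSet::new();
    let mut queue = VecDeque::from([root]);
    while let Some(n) = queue.pop_front() {
        if seen.insert(n) {
            if let Some(succs) = g.get(&n) {
                queue.extend(succs.iter().copied());
            }
        }
    }
    // Keep only reachable nodes, and only edges between them.
    g.retain(|n, _| seen.contains(n));
    for succs in g.values_mut() {
        succs.retain(|s| seen.contains(s));
    }
    g
}

fn main() {
    let g: Digraph = HashMap::from([
        (0, vec![0, 1]), // self-loop on node 0
        (1, vec![2]),
        (2, vec![]),
        (9, vec![1]),    // unreachable from node 0
    ]);
    let clean = preprocess(g, 0);
    assert_eq!(clean.len(), 3);     // node 9 stripped
    assert_eq!(clean[&0], vec![1]); // self-loop removed
    println!("{clean:?}");
}
```

Equivalent-state collapsing is omitted here; it depends on a state-equivalence definition that the VIEW/ALIAS abstractions in stage 1 would inform.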

Key Design Decisions

Seed-Based Reproducibility

Every run of the test generation engine accepts a seed value that deterministically controls all algorithmic decisions (path selection, tie-breaking, randomization). A given seed always produces the same output for the same graph. Users can also supply a “seed set”—a collection of seeds chosen to produce maximally distinct test suites—enabling teams to partition their regression efforts across complementary test plans.
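The seed contract can be illustrated with a tiny deterministic PRNG. SplitMix64 is used here only as a stand-in; the generator MidGARD ultimately adopts, and the `pick` helper for tie-breaking, are assumptions for the sketch.

```rust
/// SplitMix64: a small deterministic PRNG. Illustrative stand-in for
/// whatever seeded generator MidGARD ultimately adopts.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Deterministic tie-break: pick one of `n` candidate successors.
    fn pick(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

fn main() {
    // Two engines started with the same seed make identical decisions.
    let mut a = SplitMix64(42);
    let mut b = SplitMix64(42);
    let pa: Vec<usize> = (0..5).map(|_| a.pick(3)).collect();
    let pb: Vec<usize> = (0..5).map(|_| b.pick(3)).collect();
    assert_eq!(pa, pb);
    println!("{pa:?}");
}
```

Because every randomized decision flows through one seeded generator, re-running with the same seed on the same graph reproduces the test suite exactly, which is the property the "seed set" idea builds on.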

Distinctness Heuristics

Defining what makes two test sequences “distinct” is a central open question. Our initial approach treats distinctness as a function of edge-set overlap: two routes are more distinct when they share fewer edges. We will also explore node-coverage diversity, loop-participation metrics, and other graph-theoretic distance measures as the project matures.
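One way to formalize the edge-overlap idea is Jaccard distance over the edge sets of two routes. This is a candidate metric for the sketch below, not a settled design; the `edge_distinctness` name is invented here.

```rust
use std::collections::HashSet;

type Edge = (u32, u32);

/// Jaccard distance between the edge sets of two routes:
/// 0.0 for identical edge sets, 1.0 for fully disjoint ones.
fn edge_distinctness(a: &[Edge], b: &[Edge]) -> f64 {
    let sa: HashSet<Edge> = a.iter().copied().collect();
    let sb: HashSet<Edge> = b.iter().copied().collect();
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    if union == 0.0 { 0.0 } else { 1.0 - inter / union }
}

fn main() {
    // Two routes sharing only their first edge.
    let r1 = [(0, 1), (1, 2), (2, 3)];
    let r2 = [(0, 1), (1, 4), (4, 3)];
    let d = edge_distinctness(&r1, &r2);
    // 1 shared edge out of 5 distinct edges -> distance 0.8.
    assert!((d - 0.8).abs() < 1e-9);
    println!("distinctness = {d}");
}
```

A composite metric could weight this against node-coverage diversity or loop participation; those alternatives are among the open questions below.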

Modular, Extensible Architecture

Each pipeline stage is a standalone Rust module with a well-defined interface. This allows SanDisk to swap in alternative algorithms (e.g., a different coverage strategy), add new preprocessing steps, or integrate additional output formats without modifying the rest of the pipeline.
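A stage interface of this kind might look like the following in Rust. The trait and type names are purely illustrative; MidGARD's actual module boundaries are not yet fixed.

```rust
/// Hypothetical pipeline-stage interface: each stage maps a typed
/// input to a typed output, so stages can be swapped independently.
trait Stage {
    type In;
    type Out;
    fn run(&self, input: Self::In) -> Self::Out;
}

/// Example stage: drop self-referencing edges from an edge list.
struct StripSelfLoops;

impl Stage for StripSelfLoops {
    type In = Vec<(u32, u32)>;
    type Out = Vec<(u32, u32)>;
    fn run(&self, input: Self::In) -> Self::Out {
        input.into_iter().filter(|(a, b)| a != b).collect()
    }
}

fn main() {
    let edges = vec![(0, 0), (0, 1), (1, 1), (1, 2)];
    let out = StripSelfLoops.run(edges);
    assert_eq!(out, vec![(0, 1), (1, 2)]);
    println!("{out:?}");
}
```

Under this scheme, an alternative coverage strategy or output format is just another `Stage` implementation; callers compose stages without touching the rest of the pipeline.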

Open Questions & Future Refinement

As MidGARD is still in early development (Spring 2026), several design questions remain open and will be refined through client collaboration and experimentation:

What precisely makes two test sequences "distinct"? Edge overlap, node coverage, loop participation, or a composite metric?

Should the test set be static (fixed once generated) or should there be a dynamic/randomized element that changes across runs?

What additional metadata or constraints might accompany the digraph (e.g., edge weights, priority nodes, forbidden paths)?

How should the system handle graphs that are too large for exhaustive analysis? What approximation strategies are acceptable?

What output formats best integrate with SanDisk’s existing regression deployment infrastructure?

These questions will be addressed iteratively through the Spring requirements phase, with the goal of having firm answers before full implementation begins in Fall 2026.