How It Works

Quell has four stages: Read → Check → Verify → Write. Each stage is independent, reversible, and safe by default.

Stage 1: Read specs

Quell reads specifications that already exist in your codebase. Each spec reader returns a list of Requirement objects.

Docstring reader

Parses Google-style and plain docstrings:

Raises: blocks → MUST_RAISE requirements
Returns: blocks → MUST_RETURN requirements
Boundary phrases ("must be greater than 0", "cannot exceed") → BOUNDARY requirements
Enumeration phrases ("one of USD, EUR, GBP") → ENUM_VALID requirements
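As a rough illustration of the first bullet, the sketch below turns the entries of a Google-style Raises: section into MUST_RAISE requirements. The Requirement dataclass and the read_raises function are hypothetical stand-ins, not Quell's actual classes, and the regex only handles the common `ExceptionName: description` layout.

```python
import inspect
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    # Hypothetical shape; Quell's real Requirement class may carry more fields.
    kind: str        # e.g. "MUST_RAISE"
    target: str      # e.g. the exception name
    detail: str = ""


# Matches indented "ValueError: if amount <= 0" style entries.
_RAISES_ENTRY = re.compile(r"^\s+([A-Za-z_][\w.]*)\s*:\s*(.*)$")


def read_raises(docstring: str) -> list[Requirement]:
    """Turn entries of a Google-style Raises: section into MUST_RAISE requirements."""
    requirements: list[Requirement] = []
    in_raises = False
    # cleandoc strips the common indentation, so section headers start at column 0.
    for line in inspect.cleandoc(docstring).splitlines():
        if line.strip() == "Raises:":
            in_raises = True
            continue
        if not in_raises:
            continue
        match = _RAISES_ENTRY.match(line)
        if match:
            requirements.append(
                Requirement("MUST_RAISE", match.group(1), match.group(2).strip())
            )
        elif line.strip() and not line.startswith((" ", "\t")):
            # A new unindented section (e.g. "Returns:") ends the Raises block.
            in_raises = False
    return requirements
```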

Type reader

Reads Pydantic models and type annotations:

Field(gt=0), Field(ge=18), Field(min_length=1) → BOUNDARY requirements
Literal["USD", "EUR", "GBP"] fields → ENUM_VALID requirements
Function arguments with Literal type → ENUM_VALID requirements
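A minimal sketch of the last bullet, using only the standard library's typing helpers (get_type_hints, get_origin, get_args). It covers plain function arguments only; reading Pydantic Field(...) constraints is left out to avoid a third-party dependency. Requirement and read_literal_args are hypothetical names.

```python
from dataclasses import dataclass
from typing import Literal, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Requirement:
    # Hypothetical shape, not Quell's real class.
    kind: str
    target: str
    values: tuple = ()


def read_literal_args(func) -> list[Requirement]:
    """Emit one ENUM_VALID requirement per argument annotated with Literal[...]."""
    try:
        hints = get_type_hints(func)
    except Exception:
        return []  # mirrors the readers' "return [] on any error" rule
    requirements = []
    for name, hint in hints.items():
        if name == "return":
            continue  # only arguments are checked here
        if get_origin(hint) is Literal:
            requirements.append(Requirement("ENUM_VALID", name, get_args(hint)))
    return requirements


def charge(amount: float, currency: Literal["USD", "EUR", "GBP"]) -> str:
    # Example function under test.
    return f"{amount} {currency}"
```

Annotations such as Optional[Literal[...]] are not unwrapped in this sketch, so they would produce no requirement.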

Bug reader

Converts natural-language bug descriptions into BUG_REPRO requirements using an LLM prompt. Used by the `quell reproduce` command.

Every spec reader returns [] on any error — they never raise exceptions.
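One way to enforce the "return [] on any error" rule is a small decorator around each reader. This is an assumption about how such a guarantee could be implemented, not Quell's code; never_raises and the logger name are hypothetical.

```python
import functools
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("quell.readers")  # hypothetical logger name


def never_raises(reader: Callable[..., list[T]]) -> Callable[..., list[T]]:
    """Wrap a spec reader so that any exception becomes an empty result."""

    @functools.wraps(reader)
    def wrapper(*args, **kwargs) -> list[T]:
        try:
            return reader(*args, **kwargs)
        except Exception:
            # Record the failure for debugging, but never propagate it.
            log.debug("spec reader %s failed; returning []", reader.__name__, exc_info=True)
            return []

    return wrapper


@never_raises
def read_broken_spec(path: str) -> list[str]:
    # Example reader that always fails.
    raise OSError(f"cannot read {path}")
```

Catching Exception rather than BaseException keeps Ctrl-C (KeyboardInterrupt) working while still swallowing parser and I/O errors.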

Stage 2: Check coverage

An AST-based coverage checker scans your test files and marks each Requirement as covered or uncovered. No test execution required.

Coverage heuristics:

Requirement kind	Covered if test file contains...
`MUST_RAISE`	`pytest.raises(ExceptionType)`
`BOUNDARY`	assertion with boundary constants (0, -1, 1)
`ENUM_VALID`	assertion referencing the enum values
`MUST_RETURN`	assertion on the return value
`BUG_REPRO`	never covered (always generates a test)

When in doubt, the checker marks a requirement as uncovered. Duplicate tests are cheaper than missed gaps.
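The MUST_RAISE row of the table can be approximated with the standard ast module: parse the test file without running it and collect the exception names passed to pytest.raises(...). The helper names are hypothetical, and the sketch deliberately errs toward "uncovered".

```python
import ast


def raised_in_tests(test_source: str) -> set[str]:
    """Collect exception names used as pytest.raises(ExceptionType) in a test file."""
    try:
        tree = ast.parse(test_source)
    except SyntaxError:
        return set()  # unparsable file: treat everything as uncovered
    names: set[str] = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "raises"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "pytest"
            and node.args
        ):
            first = node.args[0]
            if isinstance(first, ast.Name):          # pytest.raises(ValueError)
                names.add(first.id)
            elif isinstance(first, ast.Attribute):   # pytest.raises(errors.PaymentError)
                names.add(first.attr)
    return names


def is_must_raise_covered(exception_name: str, test_source: str) -> bool:
    # Anything the walker does not recognize counts as uncovered.
    return exception_name in raised_in_tests(test_source)
```

Forms it does not recognize, such as `from pytest import raises` or a tuple of exception types, fall through as uncovered, consistent with the rule that a duplicate test is cheaper than a missed gap.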

Stage 3: Verify — The Moat

This is the most important stage. A test is only accepted if it satisfies both conditions:

PASS on original code — run pytest <temp_test_file>. If this fails, the generated test is already broken.
FAIL on violated code — inject a violation into the source, run pytest again. If the test still passes, it doesn't actually prove the requirement.

Both conditions are required. A test that meets both is marked verified and proceeds to the writer.
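The two acceptance conditions reduce to a check on the pytest exit codes of the two runs. In this sketch the original run must exit 0 (all tests passed) and the violated run must exit 1 (tests ran and at least one failed); any other code, such as 2 for a collection error, is treated as not proving the requirement. That exit-code policy is an assumption, and the outcome labels simply reuse the keys from the .quell/report.json example.

```python
PYTEST_OK = 0            # pytest: all collected tests passed
PYTEST_TESTS_FAILED = 1  # pytest: tests ran and at least one failed


def classify(original_exit: int, violated_exit: int) -> str:
    """Apply the two verification conditions to a pair of pytest exit codes."""
    if original_exit != PYTEST_OK:
        return "fails_on_correct"        # condition 1 broken: the test itself is wrong
    if violated_exit != PYTEST_TESTS_FAILED:
        return "doesnt_catch_violation"  # condition 2 broken: the test proves nothing
    return "VERIFIED"
```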

Violation injection

Quell injects minimal violations to trigger requirement failures:

Kind	Violation
`MUST_RAISE`	Comment out the `raise` statement
`BOUNDARY`	Weaken the threshold to `-9999`
`MUST_RETURN`	Replace the return with `return None`
`BUG_REPRO`	No injection — the bug already exists
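A sketch of the MUST_RAISE and MUST_RETURN rows using ast.NodeTransformer (ast.unparse needs Python 3.9+). One deliberate deviation: instead of literally commenting out the raise, it substitutes pass, because commenting out the only statement of an if block would leave a syntax error. InjectViolation and inject are hypothetical names. ast.unparse drops comments, which is acceptable for a throwaway violated copy that is restored afterwards.

```python
import ast


class InjectViolation(ast.NodeTransformer):
    """Apply one kind of minimal violation to every matching statement."""

    def __init__(self, kind: str) -> None:
        self.kind = kind

    def visit_Raise(self, node: ast.Raise) -> ast.AST:
        if self.kind == "MUST_RAISE":
            # Same effect as commenting the raise out, but the enclosing
            # block stays syntactically valid.
            return ast.copy_location(ast.Pass(), node)
        return node

    def visit_Return(self, node: ast.Return) -> ast.AST:
        if self.kind == "MUST_RETURN":
            return ast.copy_location(ast.Return(value=ast.Constant(value=None)), node)
        return node


def inject(source: str, kind: str) -> str:
    """Return a violated copy of source; the original string is untouched."""
    tree = InjectViolation(kind).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

The BOUNDARY row (weakening a threshold to -9999) would need a transformer over comparison constants and is not shown here.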

Isolation

Verification always runs in a subprocess (subprocess.run), never in-process. This ensures:

Violations load fresh (no module caching)
Failures are isolated (a crashing test doesn't kill Quell)
Timeouts are enforced cleanly
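A minimal sketch of subprocess isolation with a hard timeout. subprocess.run(..., timeout=...) kills the child and raises TimeoutExpired when the limit is hit; the sketch maps that case to None. run_isolated and pytest_cmd are hypothetical helpers.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_isolated(cmd: Sequence[str], timeout: float) -> Optional[int]:
    """Run cmd in a child process; return its exit code, or None on timeout."""
    try:
        completed = subprocess.run(
            list(cmd),
            capture_output=True,  # keep child output away from Quell's own stdout
            text=True,
            timeout=timeout,      # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


def pytest_cmd(test_file: str) -> list[str]:
    # Use the current interpreter so the child sees the same installed packages
    # and imports the (possibly violated) source fresh from disk.
    return [sys.executable, "-m", "pytest", "-q", test_file]
```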

File safety

try:
    backup_source()
    inject_violation()
    run_pytest()
finally:
    restore_source()    # ALWAYS runs
    cleanup_temp()      # ALWAYS runs

The source file is always restored, even if verification crashes or times out.
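The pseudocode above can be made concrete as a context manager. This is a sketch under assumptions (a single source file, a temp-file backup), with restored_after as a hypothetical name. One small design choice differs from the pseudocode: the backup is taken before entering try, so the finally block can never attempt to restore from a backup that was never written.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def restored_after(path: Path) -> Iterator[Path]:
    """Back up path, yield it for mutation, and restore it no matter what happens."""
    fd, backup_name = tempfile.mkstemp(suffix=".bak")
    os.close(fd)
    backup = Path(backup_name)
    try:
        shutil.copy2(path, backup)  # snapshot the original before any change
    except BaseException:
        backup.unlink()             # don't leak the temp file if the copy fails
        raise
    try:
        yield path                  # caller injects the violation and runs pytest
    finally:
        shutil.copy2(backup, path)  # ALWAYS runs: put the original bytes back
        backup.unlink()             # ALWAYS runs: remove the temp backup
```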

Stage 4: Write

Verified tests are injected into the target test file using libcst — a lossless concrete syntax tree parser.

Unlike regex-based injection, the libcst-based writer:

Preserves your existing comments, blank lines, and indentation
Validates that the resulting source parses before writing
Backs up the file before making any change
Restores the backup on failure

The final write sequence:

1. Parse existing test file with libcst
2. Parse new test function with libcst
3. Append the new function node
4. Validate the combined source
5. Write to disk

If step 4 or 5 fails, the backup is restored and Quell exits cleanly.

Diagnostic report

After every --fix run, Quell writes .quell/report.json — a privacy-safe diagnostic file that records where the rule engine succeeded, where it failed, and which argument types it couldn't stub.

{
  "quell_version": "0.4.4",
  "total_requirements": 79,
  "written": 41,
  "fails_on_correct": 15,
  "doesnt_catch_violation": 0,
  "skipped": 5,
  "unknown_type_frequency": {},
  "failure_reason_frequency": {
    "test_logic_incorrect": 15
  },
  "_note": "This report contains no source code or full paths. Safe to share with the Quell maintainer to improve the rule engine."
}

What IS recorded: function names, constraint kinds, verification outcomes, unknown type annotations, aggregate counts.

What is NOT recorded: source code, function bodies, full file paths, or any data that could identify proprietary business logic.

Share this file with the Quell maintainer to improve rule engine coverage — each unknown_type_frequency entry tells us exactly which type stubs to add next.
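Because the report is aggregate-only, building it amounts to counting. In this sketch each per-requirement record is a hypothetical dict with outcome, failure_reason, and unknown_types keys; no path or source field is ever read, so none can leak into the output. build_report is a hypothetical name.

```python
from collections import Counter


def build_report(outcomes: list[dict], version: str) -> dict:
    """Aggregate per-requirement outcome records into report counts.

    Each record is assumed (hypothetically) to look like
    {"outcome": "written", "failure_reason": None, "unknown_types": []}.
    """
    counts = Counter(r["outcome"] for r in outcomes)
    reasons = Counter(r.get("failure_reason") for r in outcomes if r.get("failure_reason"))
    unknown = Counter(t for r in outcomes for t in r.get("unknown_types", []))
    return {
        "quell_version": version,
        "total_requirements": len(outcomes),
        "written": counts["written"],  # Counter returns 0 for missing keys
        "fails_on_correct": counts["fails_on_correct"],
        "doesnt_catch_violation": counts["doesnt_catch_violation"],
        "skipped": counts["skipped"],
        "unknown_type_frequency": dict(unknown),
        "failure_reason_frequency": dict(reasons),
    }
```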

Audit log

Every action Quell takes is appended to .quell/audit.jsonl as a structured JSON record, one record per line (pretty-printed here for readability):

{
  "timestamp": "2026-05-08T14:32:01Z",
  "requirement_id": "req_abc123",
  "action": "test_written",
  "file_path": "tests/test_payments.py",
  "test_function_name": "test_process_payment_must_raise_valueerror",
  "verification_status": "VERIFIED"
}
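Appending to a JSON Lines file is a single-line write per action. This sketch (append_audit is a hypothetical name) stamps each record with a UTC timestamp in the same format as the example and opens the file in append mode, so earlier records are never rewritten.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit(log_path: Path, **fields: str) -> dict:
    """Append one audit record as a single JSON line and return it."""
    record = {
        # Same shape as the example: 2026-05-08T14:32:01Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        **fields,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: existing lines are left exactly as they were.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```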

Design invariants

These invariants are enforced throughout the codebase and must never be broken:

verifier.py — always restores source files in a finally block
writer.py — always backs up before writing, always restores on failure
writer.py — always validates CST parses correctly before writing to disk
No code is sent to any server unless an LLM provider is configured
The LLM is called only for complex or unstructured specs, never for ones the rule engine handles
Verification runs in a subprocess — never in-process
Every spec reader returns [] on any error — never raises