How It Works
Quell has four stages: Read → Check → Verify → Write. Each stage is independent, reversible, and safe by default.
Stage 1: Read specs
Quell reads specifications that already exist in your codebase. Each spec reader returns a list of Requirement objects.
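As a rough illustration of what a reader might hand back, here is a minimal sketch of a Requirement record. The five kinds are the ones named on this page; the field names (`function_name`, `detail`, `covered`) are hypothetical and not taken from Quell's source.

```python
from dataclasses import dataclass
from enum import Enum


class RequirementKind(Enum):
    # The five kinds mentioned in this document.
    MUST_RAISE = "MUST_RAISE"
    MUST_RETURN = "MUST_RETURN"
    BOUNDARY = "BOUNDARY"
    ENUM_VALID = "ENUM_VALID"
    BUG_REPRO = "BUG_REPRO"


@dataclass(frozen=True)
class Requirement:
    kind: RequirementKind
    function_name: str     # hypothetical field: the function the spec belongs to
    detail: str            # hypothetical field: e.g. the exception name or bound
    covered: bool = False  # hypothetical field: filled in by the coverage stage


# A reader would return a plain list of these records.
reqs = [Requirement(RequirementKind.MUST_RAISE, "process_payment", "ValueError")]
```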
Docstring reader
Parses Google-style and plain docstrings:
- Raises: blocks → MUST_RAISE requirements
- Returns: blocks → MUST_RETURN requirements
- Boundary phrases ("must be greater than 0", "cannot exceed") → BOUNDARY requirements
- Enumeration phrases ("one of USD, EUR, GBP") → ENUM_VALID requirements
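The Raises: rule can be sketched with the standard library alone. This is an illustrative parser, not Quell's own: the function name `read_raises` and the tuple output are assumptions, and wrapped description lines inside a Raises: block are not handled.

```python
import inspect
import re


def read_raises(func):
    """Return ("MUST_RAISE", exception_name) pairs from a Google-style Raises: block."""
    doc = inspect.getdoc(func) or ""  # getdoc normalises the docstring indentation
    found = []
    in_raises = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Raises:":
            in_raises = True
            continue
        if in_raises:
            # A blank line or an unindented line ends the Raises: block.
            if not stripped or not line.startswith(" "):
                in_raises = False
                continue
            match = re.match(r"(\w+)\s*:", stripped)
            if match:
                found.append(("MUST_RAISE", match.group(1)))
    return found


def withdraw(amount):
    """Withdraw money.

    Args:
        amount: must be greater than 0.

    Raises:
        ValueError: if amount is not positive.
        OverflowError: if amount exceeds the balance.
    """


requirements = read_raises(withdraw)
```

Here `requirements` holds one entry per exception listed under Raises:, in docstring order.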
Type reader
Reads Pydantic models and type annotations:
- Field(gt=0), Field(ge=18), Field(min_length=1) → BOUNDARY requirements
- Literal["USD", "EUR", "GBP"] fields → ENUM_VALID requirements
- Function arguments with Literal type → ENUM_VALID requirements
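The function-argument case can be sketched with `typing` introspection. The helper name `read_literal_args` and its tuple output are hypothetical; Pydantic fields are left out so the example needs no third-party package.

```python
from typing import Literal, get_args, get_origin, get_type_hints


def read_literal_args(func):
    """Return ("ENUM_VALID", arg_name, allowed_values) for each Literal-typed argument."""
    found = []
    for name, hint in get_type_hints(func).items():
        if name == "return":
            continue  # only arguments are of interest here
        if get_origin(hint) is Literal:
            found.append(("ENUM_VALID", name, get_args(hint)))
    return found


def convert(amount: float, currency: Literal["USD", "EUR", "GBP"]) -> float:
    return amount


enum_requirements = read_literal_args(convert)
```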
Bug reader
Converts natural language bug descriptions into BUG_REPRO requirements using an LLM prompt. Used by quell reproduce.
Every spec reader returns [] on any error — they never raise exceptions.
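One way to get this guarantee is to wrap each reader in a decorator. The sketch below assumes that approach; the name `never_raises` is hypothetical and Quell may enforce the rule differently.

```python
import functools


def never_raises(reader):
    """Wrap a spec reader so that any exception becomes an empty result."""
    @functools.wraps(reader)
    def wrapper(*args, **kwargs):
        try:
            return reader(*args, **kwargs)
        except Exception:
            return []  # a failing reader contributes no requirements
    return wrapper


@never_raises
def broken_reader(path):
    # Stand-in for a reader that hits an unreadable file.
    raise OSError("cannot read " + path)


@never_raises
def working_reader(path):
    return ["requirement from " + path]
```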
Stage 2: Check coverage
An AST-based coverage checker scans your test files and marks each Requirement as covered or uncovered. No test execution required.
Coverage heuristics:
| Requirement kind | Covered if test file contains... |
|---|---|
| MUST_RAISE | pytest.raises(ExceptionType) |
| BOUNDARY | assertion with boundary constants (0, -1, 1) |
| ENUM_VALID | assertion referencing the enum values |
| MUST_RETURN | assertion on the return value |
| BUG_REPRO | never covered (always generates a test) |
When in doubt, the checker marks a requirement as uncovered. Duplicate tests are cheaper than missed gaps.
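The MUST_RAISE heuristic can be sketched with Python's `ast` module. This is a simplified stand-in for Quell's checker: the function name `covers_must_raise` is an assumption, and it only recognises the literal form `pytest.raises(Name)`.

```python
import ast


def covers_must_raise(test_source, exception_name):
    """Return True if the test source calls pytest.raises(<exception_name>)."""
    try:
        tree = ast.parse(test_source)
    except SyntaxError:
        return False  # when in doubt, report the requirement as uncovered
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_pytest_raises = (
            isinstance(func, ast.Attribute)
            and func.attr == "raises"
            and isinstance(func.value, ast.Name)
            and func.value.id == "pytest"
        )
        if is_pytest_raises and node.args:
            first = node.args[0]
            if isinstance(first, ast.Name) and first.id == exception_name:
                return True
    return False


TEST_SOURCE = '''
import pytest

def test_rejects_negative():
    with pytest.raises(ValueError):
        withdraw(-1)
'''

covered = covers_must_raise(TEST_SOURCE, "ValueError")
```

Nothing is imported or executed from the test file; the source is only parsed, which is why no test run is needed at this stage.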
Stage 3: Verify — The Moat
This is the most important stage. A test is only accepted if it satisfies both conditions:
- PASS on original code: run pytest <temp_test_file>. If this fails, the generated test is already broken.
- FAIL on violated code: inject a violation into the source, then run pytest again. If the test still passes, it doesn't actually prove the requirement.
Both conditions are required. A test that meets both is verified and proceeds to the writer.
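The acceptance rule reduces to a small decision function. In this sketch the outcome labels are borrowed from the report counters and the audit-log status shown elsewhere on this page; treating them as the verifier's return values is an assumption.

```python
def classify(passes_on_original, passes_on_violated):
    """Map the two pytest runs to a verification outcome."""
    if not passes_on_original:
        return "fails_on_correct"        # the generated test is already broken
    if passes_on_violated:
        return "doesnt_catch_violation"  # the test does not prove the requirement
    return "VERIFIED"


outcome = classify(passes_on_original=True, passes_on_violated=False)
```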
Violation injection
Quell injects minimal violations to trigger requirement failures:
| Kind | Violation |
|---|---|
| MUST_RAISE | Comment out the raise statement |
| BOUNDARY | Weaken the threshold to -9999 |
| MUST_RETURN | Replace the return with return None |
| BUG_REPRO | No injection — the bug already exists |
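As an illustration of the MUST_RETURN row, the sketch below rewrites every return statement to `return None` using the standard `ast` module (Python 3.9+ for `ast.unparse`). This is an assumption about how such an injection could work, not Quell's implementation; note that `ast.unparse` drops comments and original formatting, so a real tool would want a format-preserving approach.

```python
import ast


class ReturnNone(ast.NodeTransformer):
    """Replace the value of every return statement with None."""

    def visit_Return(self, node):
        replacement = ast.Return(value=ast.Constant(value=None))
        return ast.copy_location(replacement, node)


def inject_return_none(source):
    tree = ReturnNone().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


ORIGINAL = "def total(a, b):\n    return a + b\n"
violated = inject_return_none(ORIGINAL)

# Execute both versions to show the behavioural difference.
original_ns, violated_ns = {}, {}
exec(ORIGINAL, original_ns)
exec(violated, violated_ns)
```

A test asserting `total(1, 2) == 3` passes against the original and fails against the violated version, which is exactly the pair of outcomes the verifier looks for.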
Isolation
Verification always runs in a subprocess (subprocess.run), never in-process. This ensures:
- Violations load fresh (no module caching)
- Failures are isolated (a crashing test doesn't kill Quell)
- Timeouts are enforced cleanly
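A minimal sketch of the isolation pattern using `subprocess.run` with a timeout. The helper name `run_isolated` and the 30-second default are assumptions; the example commands use the Python interpreter rather than pytest so that they run anywhere.

```python
import subprocess
import sys


def run_isolated(args, timeout_seconds=30):
    """Run a command in a child process; return True only on exit code 0."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return False  # the child is killed and the run counts as a failure
    return result.returncode == 0


ok = run_isolated([sys.executable, "-c", "pass"])
crashed = run_isolated([sys.executable, "-c", "raise SystemExit(1)"])
```

For a real verification run the argument list would be something like `[sys.executable, "-m", "pytest", temp_test_file]`. Because each run starts a fresh interpreter, the violated module is re-imported from disk instead of being served from an import cache.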
File safety
try:
    backup_source()
    inject_violation()
    run_pytest()
finally:
    restore_source()  # ALWAYS runs
    cleanup_temp()    # ALWAYS runs
The source file is always restored, even if verification crashes or times out.
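A runnable sketch of the same backup-and-restore pattern, assuming a file-copy backup in a temporary directory. The helper `with_violation` and its parameters are hypothetical; `check` stands in for the pytest run.

```python
import shutil
import tempfile
from pathlib import Path


def with_violation(source_path, violated_text, check):
    """Swap in violated source, run check(), and always restore the original."""
    source_path = Path(source_path)
    backup_dir = Path(tempfile.mkdtemp())
    backup_path = backup_dir / source_path.name
    shutil.copy2(source_path, backup_path)  # back up before touching anything
    try:
        source_path.write_text(violated_text)
        return check()
    finally:
        shutil.copy2(backup_path, source_path)         # restore the original
        shutil.rmtree(backup_dir, ignore_errors=True)  # clean up the backup


# Demonstration: the check crashes, yet the file comes back unchanged.
demo_file = Path(tempfile.mkdtemp()) / "mod.py"
demo_file.write_text("LIMIT = 0\n")


def crashing_check():
    raise RuntimeError("verification crashed")


try:
    with_violation(demo_file, "LIMIT = -9999\n", crashing_check)
except RuntimeError:
    pass

restored_text = demo_file.read_text()
```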
Stage 4: Write
Verified tests are injected into the target test file using libcst — a lossless concrete syntax tree parser.
Unlike regex-based injection, the libcst-based writer:
- Preserves your existing comments, blank lines, and indentation
- Validates the resulting source parses correctly before writing
- Backs up the file before making any change
- Restores on failure
The final write sequence:
1. Parse existing test file with libcst
2. Parse new test function with libcst
3. Append the new function node
4. Validate the combined source
5. Write to disk
If step 4 or 5 fails, the backup is restored and Quell exits cleanly.
Diagnostic report
After every --fix run, Quell writes .quell/report.json — a privacy-safe diagnostic file that records where the rule engine succeeded, where it failed, and which argument types it couldn't stub.
{
  "quell_version": "0.4.4",
  "total_requirements": 79,
  "written": 41,
  "fails_on_correct": 15,
  "doesnt_catch_violation": 0,
  "skipped": 5,
  "unknown_type_frequency": {},
  "failure_reason_frequency": {
    "test_logic_incorrect": 15
  },
  "_note": "This report contains no source code or full paths. Safe to share with the Quell maintainer to improve the rule engine."
}
What IS recorded: function names, constraint kinds, verification outcome, unknown type annotations, aggregate counts.
What is NOT recorded: source code, function bodies, full file paths, or any data that could identify proprietary business logic.
Share this file with the Quell maintainer to improve rule engine coverage — each unknown_type_frequency entry tells us exactly which type stubs to add next.
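A short sketch of reading the report programmatically, using the field names from the sample above. The `summarise` helper and its output keys are hypothetical; in practice the text would come from `.quell/report.json` rather than an inline string.

```python
import json

REPORT_TEXT = """
{
  "total_requirements": 79,
  "written": 41,
  "fails_on_correct": 15,
  "doesnt_catch_violation": 0,
  "skipped": 5,
  "unknown_type_frequency": {},
  "failure_reason_frequency": {"test_logic_incorrect": 15}
}
"""


def summarise(report):
    """Compute the share of requirements written and the most frequent unknown types."""
    total = report["total_requirements"]
    written_pct = round(100 * report["written"] / total, 1) if total else 0.0
    unknown = sorted(
        report["unknown_type_frequency"].items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return {"written_pct": written_pct, "top_unknown_types": unknown[:3]}


summary = summarise(json.loads(REPORT_TEXT))
```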
Audit log
Every action Quell takes is appended to .quell/audit.jsonl as a structured JSON record:
{
  "timestamp": "2026-05-08T14:32:01Z",
  "requirement_id": "req_abc123",
  "action": "test_written",
  "file_path": "tests/test_payments.py",
  "test_function_name": "test_process_payment_must_raise_valueerror",
  "verification_status": "VERIFIED"
}
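Because the log is JSON Lines, each record is one JSON object per line and the file can be appended to without rewriting earlier entries. The helpers below are a hypothetical sketch of that pattern, reusing the field names from the sample record.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit(log_path, record):
    """Append one timestamped JSON record as a single line."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = {"timestamp": stamp, **record}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def read_audit(log_path):
    """Load every record from the log, skipping blank lines."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


log_file = Path(tempfile.mkdtemp()) / ".quell" / "audit.jsonl"
append_audit(log_file, {"requirement_id": "req_abc123", "action": "test_written",
                        "verification_status": "VERIFIED"})
records = read_audit(log_file)
```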
Design invariants
These invariants are enforced throughout the codebase and must never be broken:
- verifier.py: always restores source files in a finally block
- writer.py: always backs up before writing, always restores on failure
- writer.py: always validates CST parses correctly before writing to disk
- No code is sent to any server unless an LLM provider is configured
- LLM is only called for complex/unstructured specs — never for ones the rule engine handles
- Verification runs in a subprocess — never in-process
- Every spec reader returns [] on any error and never raises