Generate Verified Python Tests With Zero API Keys — How Quell's Rule Engine Works
The standard pitch for AI-powered test generation goes like this: give us your code, get back tests. What doesn't make it into the pitch: your code is being sent to a third-party API, the tests come back unverified, and the model has no idea what your requirements actually are — it's guessing based on what the code does, not what it's supposed to do.
Quell has a different default path. The rule engine handles test generation deterministically — no LLM, no API call, no network traffic. It started as a "what if we just pattern-matched the docstrings?" experiment and turned out to cover about 75% of real requirements.
What makes something rule-eligible
Requirements fall into patterns. When a docstring says "raises ValueError if amount is negative," there's exactly one meaningful test:
with pytest.raises(ValueError):
    function(amount=-1)
When a Pydantic field says Field(gt=0, le=10_000), there are exactly two boundary tests. When a type annotation says Literal["active", "inactive"], there's exactly one enum test. These don't require intelligence — they require pattern recognition. A rule handles them instantly, consistently, and without sending your code anywhere.
The five rule types
MUST_RAISE — triggered by Raises: entries in docstrings.
# Raises:
# ValueError: If amount is <= 0.
Generated test:
def test_process_payment_must_raise_value_error():
    with pytest.raises(ValueError):
        process_payment(amount=0, currency="USD")
Violation injection: comments out the raise statement. If the test still passes after that, it's discarded.
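For example, given a guard like this (the function body is a hypothetical sketch), the mutant is the same function with the guard disabled:

def process_payment(amount: float, currency: str) -> dict:
    if amount <= 0:
        raise ValueError("amount must be positive")
    ...

# After violation injection, the guard is commented out:
def process_payment(amount: float, currency: str) -> dict:
    # if amount <= 0:
    #     raise ValueError("amount must be positive")
    ...

A generated test that still passes against the second version never exercised the guard, so it's thrown out.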
BOUNDARY — triggered by Field(gt=N), Field(lt=N), Field(min_length=N), Field(max_length=N) in Pydantic models, or from Args: sections with explicit bounds.
class Item(BaseModel):
    price: float = Field(gt=0, lt=1000)
Generated tests:
def test_item_boundary_price_lower():
    with pytest.raises(ValidationError):
        Item(price=0)

def test_item_boundary_price_upper():
    with pytest.raises(ValidationError):
        Item(price=1000)
Violation injection: weakens the Field constraint (e.g., Field(gt=-1) instead of Field(gt=0)).
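Concretely, the mutant for the Item example above looks like this:

class Item(BaseModel):
    price: float = Field(gt=-1, lt=1000)  # gt weakened from 0 to -1

Against this mutant, Item(price=0) validates, so test_item_boundary_price_lower fails; that failure is what confirms the test actually pins the constraint.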
ENUM_VALID — triggered by Literal[...] type annotations.
def set_status(status: Literal["active", "inactive", "suspended"]) -> None: ...
Generated test:
def test_set_status_enum_valid_status():
    with pytest.raises((ValueError, TypeError)):
        set_status(status="__invalid__")
MUST_RETURN — triggered by Returns: sections in docstrings.
# Returns:
# dict with keys: transaction_id, status.
Generated test:
def test_process_payment_must_return():
    result = process_payment(amount=10.0, currency="USD")
    assert isinstance(result, dict)
    assert "transaction_id" in result
    assert "status" in result
NOT_NULL / TYPE_CHECK — triggered by PySpark StructField definitions with nullable=False.
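A sketch of the trigger and a plausible generated check, with the schema and load_users as hypothetical stand-ins:

from pyspark.sql.types import StructField, StructType, StringType

USER_SCHEMA = StructType([
    StructField("user_id", StringType(), nullable=False),  # NOT_NULL trigger
    StructField("nickname", StringType(), nullable=True),
])

def test_load_users_not_null_user_id():
    df = load_users()  # hypothetical function returning a DataFrame
    assert df.schema["user_id"].nullable is False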
Building valid function calls
The part that's actually tricky isn't the test template — it's constructing a valid call to the function. If the signature is:
def process_payment(
    amount: float,
    currency: str,
    payment_method: PaymentMethod,
    idempotency_key: Optional[str] = None,
) -> dict:
The rule engine needs real arguments, not None for everything. Quell's sig_inspector.py reads the signature via AST and maps each parameter to a stub value based on its type annotation:
| Type | Stub |
|---|---|
| str | "test" |
| int | 1 |
| float | 1.0 |
| bool | True |
| Optional[X] | None |
| List[X] | [] |
| datetime.datetime | datetime.datetime(2024, 1, 1) |
Name-based inference fills in more specific values: a parameter named email gets "test@example.com", url gets "https://example.com", and path gets Path("test"). Custom class arguments get instantiated with stub values for their own parameters.
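A minimal sketch of how that lookup might be layered (the real sig_inspector.py works on AST nodes; string-form annotations are used here to keep the idea visible):

import datetime
from pathlib import Path

TYPE_STUBS = {
    "str": "test",
    "int": 1,
    "float": 1.0,
    "bool": True,
    "datetime.datetime": datetime.datetime(2024, 1, 1),
}

NAME_STUBS = {
    "email": "test@example.com",
    "url": "https://example.com",
    "path": Path("test"),
}

def stub_value(name: str, annotation: str):
    # Name-based inference is more specific, so it wins over type defaults.
    if name in NAME_STUBS:
        return NAME_STUBS[name]
    if annotation.startswith("Optional["):
        return None
    if annotation.startswith("List["):
        return []
    return TYPE_STUBS.get(annotation)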
The mapping doesn't cover every edge case; deeply nested custom types occasionally need the LLM fallback. But it handles the vast majority of real-world signatures without any configuration.
Running without an API key
pip install quelltest
quell check src/ --no-llm
quell check src/ --fix --no-llm
--no-llm is the default behavior; Quell will never call an LLM unless you explicitly configure a provider. Without one, the rule engine handles what it can, and requirements it can't handle are flagged but not sent anywhere. After a successful --fix run, Quell prints:
Your code never left your machine.
When you actually need the LLM
The rule engine can't handle requirements that depend on external state or domain-specific context. Something like:
# Raises:
# PaymentGatewayError: If the payment gateway is unreachable.
Triggering a gateway failure requires mocking httpx, setting up a fake server, or knowing which specific error condition to simulate. That's context the rule engine doesn't have. For these cases, Quell falls back to an LLM — but only if you've configured one, and only after the rule engine gives up.
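For illustration, here is the kind of test the LLM path might produce, assuming process_payment calls httpx.post under the hood (the patch target and the error mapping are both assumptions):

from unittest.mock import patch

import httpx
import pytest

def test_process_payment_gateway_unreachable():
    # Assumes process_payment wraps transport failures in PaymentGatewayError.
    with patch("httpx.post", side_effect=httpx.ConnectError("unreachable")):
        with pytest.raises(PaymentGatewayError):
            process_payment(amount=10.0, currency="USD")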
In practice, the rule engine handles about 75% of requirements from real codebases. The LLM adds another ~20%, leaving a small fraction that needs manual tests (usually the ones involving complex external state).
Comparing the two approaches
| | Rule engine | LLM |
|---|---|---|
| Speed | Under 10ms per test | 1-5s (API latency) |
| Cost | Free | API costs |
| Privacy | Zero code sent | Code sent to provider |
| Determinism | Same input, same output | Non-deterministic |
| Coverage | ~75% of requirements | ~95% |
| Config required | No | Yes (API key) |
For teams that want fast, private, zero-cost test generation, the rule engine covers most of what you need. The LLM exists for when it doesn't.
Quickstart guide — no API key required. View the source.