Python docstrings are among the most underused sources of test specifications in a codebase. A well-written docstring already contains every testable requirement for a function: Raises: clauses, Returns: contracts, argument constraints. The only missing step is turning those requirements into actual tests.
This guide walks through exactly how to do that automatically with Quelltest, from installation to CI integration.
Why docstrings are a test specification
Consider this function:
def transfer_funds(
    amount: float,
    from_account: str,
    to_account: str,
    currency: str,
) -> TransferResult:
    """
    Transfer funds between accounts.

    Args:
        amount: Transfer amount. Must be positive.
        from_account: Source account ID.
        to_account: Destination account ID.
        currency: ISO 4217 code.

    Returns:
        TransferResult with a unique transfer_id and timestamp.

    Raises:
        ValueError: If amount is <= 0.
        ValueError: If from_account equals to_account.
        AccountNotFoundError: If either account does not exist.
        InsufficientFundsError: If from_account balance is below amount.
    """
The Raises: section alone contains four testable requirements: four behaviors your code must exhibit that are worth verifying. Without Quelltest, you'd write all four tests by hand. With Quelltest, they're generated and verified automatically.
Installation
pip install quelltest
Requires Python 3.11+.
Step 1: Check what requirements Quelltest finds
Before generating tests, see what Quelltest extracts from your docstrings:
quell check src/ --no-llm
For the transfer_funds function above, you'd see:
[docstring] MUST_RAISE transfer_funds ValueError: amount <= 0 ✗ uncovered
[docstring] MUST_RAISE transfer_funds ValueError: from == to ✗ uncovered
[docstring] MUST_RAISE transfer_funds AccountNotFoundError: no account ✗ uncovered
[docstring] MUST_RAISE transfer_funds InsufficientFundsError: low bal ✗ uncovered
Score: 0% (0/4 covered)
4 gap(s) found.
If you already have tests that cover some of these, those show as ✓ covered and are skipped.
Step 2: Generate and verify tests
Add --fix to generate tests for every uncovered requirement:
quell check src/ --fix --no-llm
Quelltest generates a test for each gap, then runs two-phase verification:
- Phase 1: The test must PASS on your current code
- Phase 2: The raise ValueError line is temporarily commented out — the test must FAIL
Only tests that pass both phases are written. Output:
transfer_funds MUST_RAISE ValueError (amount) ✓ verified
transfer_funds MUST_RAISE ValueError (from==to) ✓ verified
transfer_funds MUST_RAISE AccountNotFoundError ✓ verified
transfer_funds MUST_RAISE InsufficientFundsError ✓ verified
4 tests written → tests/test_transfer.py
Score: 100% (4/4 covered)
Your code never left your machine.
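Conceptually, the two-phase check works like a targeted mutation test: a test only counts if it passes against the real code and fails once the guarded raise is gone. A simplified, self-contained sketch of the idea (not Quelltest's actual internals):

```python
def original(amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    return amount

def mutated(amount):
    # Same function with the guard "commented out".
    return amount

def candidate_test(fn):
    """The generated test: expects ValueError for amount <= 0."""
    try:
        fn(0)
    except ValueError:
        return True   # test passed
    return False      # test failed

# Phase 1: the test must PASS against the real code.
phase1 = candidate_test(original)
# Phase 2: the test must FAIL once the raise is removed,
# proving it actually exercises that specific requirement.
phase2 = not candidate_test(mutated)
verified = phase1 and phase2
print(verified)  # True -> the test is kept
```

A test that passed phase 1 trivially (say, by never calling the function) would survive the mutation and be rejected in phase 2.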
What the generated tests look like
# tests/test_transfer.py — auto-generated by Quelltest 0.6.9
import pytest

from src.accounts import transfer_funds

def test_transfer_funds_raises_valueerror_amount():
    """MUST_RAISE ValueError — If amount is <= 0. [quelltest]"""
    with pytest.raises(ValueError):
        transfer_funds(
            amount=0,
            from_account="acc_001",
            to_account="acc_002",
            currency="USD",
        )

def test_transfer_funds_raises_valueerror_same_account():
    """MUST_RAISE ValueError — If from_account equals to_account. [quelltest]"""
    with pytest.raises(ValueError):
        transfer_funds(
            amount=100.0,
            from_account="acc_001",
            to_account="acc_001",
            currency="USD",
        )
These are plain pytest functions — no special runner, no Quelltest dependency at test time. They run with pytest like any other test.
Writing docstrings that Quelltest reads well
Quelltest supports Google-style, NumPy-style, and Sphinx-style docstrings. Here's what the Raises: section looks like in each:
Google style (recommended):
"""
Raises:
ValueError: If amount is <= 0.
AccountNotFoundError: If the account does not exist.
"""
NumPy style:
"""
Raises
------
ValueError
If amount is <= 0.
AccountNotFoundError
If the account does not exist.
"""
Sphinx style:
"""
:raises ValueError: If amount is <= 0.
:raises AccountNotFoundError: If the account does not exist.
"""
All three produce the same MUST_RAISE requirements.
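For intuition, extracting MUST_RAISE entries from a Google-style Raises: block takes little more than a regex. A rough sketch (this is not Quelltest's parser, and the NumPy and Sphinx styles would need their own patterns):

```python
import re

DOC = """
Transfer funds between accounts.

Raises:
    ValueError: If amount is <= 0.
    AccountNotFoundError: If either account does not exist.
"""

def extract_raises(docstring):
    # Grab the indented lines under "Raises:" and split each into
    # an (exception type, condition) pair.
    match = re.search(r"Raises:\n((?:[ \t]+\S.*\n?)+)", docstring)
    if not match:
        return []
    entries = []
    for line in match.group(1).splitlines():
        m = re.match(r"\s*(\w+):\s*(.+)", line)
        if m:
            entries.append((m.group(1), m.group(2)))
    return entries

print(extract_raises(DOC))
# [('ValueError', 'If amount is <= 0.'),
#  ('AccountNotFoundError', 'If either account does not exist.')]
```

Each (type, condition) pair becomes one MUST_RAISE requirement; the condition text is what later guides test input selection.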
Returns: sections
"""
Returns:
TransferResult with a unique transfer_id field.
"""
This produces a MUST_RETURN requirement. Quelltest generates a test that calls the function and asserts the return value is not None and is an instance of TransferResult.
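A generated MUST_RETURN test might look roughly like this. The shape is an assumption based on the description above, and the stub transfer_funds and TransferResult are included only to make the snippet self-contained:

```python
from dataclasses import dataclass

@dataclass
class TransferResult:
    transfer_id: str

def transfer_funds(amount, from_account, to_account, currency):
    # Stub standing in for the real implementation.
    return TransferResult(transfer_id="tr_123")

def test_transfer_funds_returns_transferresult():
    """MUST_RETURN TransferResult — with a unique transfer_id field. [quelltest]"""
    result = transfer_funds(
        amount=100.0,
        from_account="acc_001",
        to_account="acc_002",
        currency="USD",
    )
    assert result is not None
    assert isinstance(result, TransferResult)
    assert result.transfer_id
```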
What Quelltest extracts vs what it skips
| Docstring section | Extracted? | Constraint kind |
|---|---|---|
| Raises: SomeError: if X | ✅ | MUST_RAISE |
| Returns: SomeType with field X | ✅ | MUST_RETURN |
| Args: x: Must be positive | Partial | BOUNDARY (if numeric bound) |
| Note: / Example: / See also: | ❌ | Not applicable |
| Custom sections | ❌ | Not applicable |
Args: sections are parsed for numeric constraints ("must be positive", "must not be negative") but these produce lower-confidence requirements than explicit Raises: blocks. For best results, document constraints in Raises: where possible.
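A BOUNDARY requirement derived from "amount: ... Must be positive." might yield a test of roughly this shape, probing both sides of the documented bound. This is illustrative only; the stub implementation and exact test shape are assumptions:

```python
def transfer_funds(amount, from_account, to_account, currency):
    # Stub that enforces the documented bound.
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"transfer_id": "tr_1"}

def test_transfer_funds_amount_boundary():
    """BOUNDARY amount > 0 — from Args: 'Must be positive.' [quelltest]"""
    # The boundary value itself (0) must be rejected...
    try:
        transfer_funds(amount=0, from_account="a", to_account="b", currency="USD")
        raised = False
    except ValueError:
        raised = True
    assert raised
    # ...while a value just inside the bound must be accepted.
    assert transfer_funds(amount=0.01, from_account="a", to_account="b", currency="USD")
```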
Handling functions with complex argument construction
If your function requires objects that are hard to construct (database sessions, external clients), Quelltest's rule engine generates tests with stub arguments. These may fail phase 1 verification because the stub isn't a real Session.
In this case, the requirement appears in report.json as skipped (stub-failed). You can write the test manually using your existing test fixtures, or configure an LLM fallback:
# pyproject.toml
[tool.quell]
llm_provider = "anthropic"
llm_model = "claude-sonnet-4-5"
quell check src/ --fix # LLM handles the complex stub cases
Checking your current requirement coverage
quell score --badge
Output:
Requirement coverage: 73% (22/30)
Uncovered:
transfer_funds MUST_RAISE ValueError: amount <= 0
validate_user MUST_RAISE AuthError: not logged in
...
Badge: 
Paste the badge into your README.
CI integration
# .github/workflows/quelltest.yml
name: Requirement coverage

on: [push, pull_request]

jobs:
  quelltest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install quelltest
      - run: quell check src/ --ci --threshold 0.80
This fails the pipeline if requirement coverage drops below 80%. Adjust the threshold to match your team's standard.
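The gate itself is simple arithmetic: covered requirements divided by total requirements, compared against the threshold, mapped to a process exit code. A sketch of what such a CI gate computes (not Quelltest's code):

```python
def gate(covered: int, total: int, threshold: float) -> int:
    """Return a process exit code: 0 if coverage meets the threshold."""
    coverage = covered / total if total else 1.0
    print(f"Requirement coverage: {coverage:.0%} ({covered}/{total})")
    # Nonzero exit codes fail the CI step.
    return 0 if coverage >= threshold else 1

# 22 of 30 requirements covered with an 80% threshold -> pipeline fails.
exit_code = gate(22, 30, 0.80)
print(exit_code)  # 1
```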
To post the gap report as a PR comment:
      - run: quell pr ${{ github.event.pull_request.number }} --comment
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
The full workflow
# 1. See what requirements exist and what's covered
quell check src/ --no-llm
# 2. Generate tests for all gaps
quell check src/ --fix --no-llm
# 3. Run the generated tests
pytest tests/
# 4. Check the score
quell score --badge
# 5. Add the CI gate
quell install --pr
Everything except quell install --pr (which writes the workflow file) is read-only or writes only to tests/. Your source code is never modified.
Summary
Docstrings already contain requirements. Quelltest turns them into verified pytest tests automatically — no LLM needed, no code sent anywhere, no false positives. The two-phase verification engine ensures every generated test actually proves the requirement it claims to test.
pip install quelltest
quell check src/ --fix --no-llm