Shashank Bindal

Auto-generate pytest Tests from Python Docstrings: A Complete Guide

A step-by-step guide to automatically generating verified pytest tests from Python docstrings using Quelltest — covering Raises:, Returns:, and Args: sections with real examples.

Python docstrings are one of the most underused sources of test specifications in a codebase. A well-written docstring already contains every testable requirement for a function — Raises:, Returns:, argument constraints. The only missing step is turning those requirements into actual tests.

This guide walks through exactly how to do that automatically with Quelltest, from installation to CI integration.

Why docstrings are a test specification

Consider this function:

def transfer_funds(
    amount: float,
    from_account: str,
    to_account: str,
    currency: str,
) -> TransferResult:
    """
    Transfer funds between accounts.

    Args:
        amount: Transfer amount. Must be positive.
        from_account: Source account ID.
        to_account: Destination account ID.
        currency: ISO 4217 code.

    Returns:
        TransferResult with a unique transfer_id and timestamp.

    Raises:
        ValueError: If amount is <= 0.
        ValueError: If from_account equals to_account.
        AccountNotFoundError: If either account does not exist.
        InsufficientFundsError: If from_account balance is below amount.
    """

This docstring contains four testable Raises: requirements — four behaviours your code must exhibit that are worth verifying. Without Quelltest, you'd write all four tests by hand. With Quelltest, they're generated and verified automatically.
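To make the example concrete, here is one hypothetical implementation that satisfies all four documented requirements. The in-memory balance store and the TransferResult fields beyond transfer_id and timestamp are illustrative assumptions, not part of the original docstring:

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


class AccountNotFoundError(Exception):
    pass


class InsufficientFundsError(Exception):
    pass


@dataclass
class TransferResult:
    transfer_id: str
    timestamp: datetime


# Illustrative in-memory store; a real system would use a database.
BALANCES = {"acc_001": 500.0, "acc_002": 120.0}


def transfer_funds(
    amount: float, from_account: str, to_account: str, currency: str
) -> TransferResult:
    # Each guard below corresponds to one Raises: line in the docstring.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if from_account == to_account:
        raise ValueError("from_account and to_account must differ")
    for acct in (from_account, to_account):
        if acct not in BALANCES:
            raise AccountNotFoundError(acct)
    if BALANCES[from_account] < amount:
        raise InsufficientFundsError(from_account)
    BALANCES[from_account] -= amount
    BALANCES[to_account] += amount
    # currency validation is omitted in this sketch.
    return TransferResult(
        transfer_id=str(uuid.uuid4()), timestamp=datetime.now(timezone.utc)
    )
```

Each guard clause maps one-to-one onto a docstring requirement, which is exactly the structure the generated tests will probe.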

Installation

pip install quelltest

Requires Python 3.11+.

Step 1: Check what requirements Quelltest finds

Before generating tests, see what Quelltest extracts from your docstrings:

quell check src/ --no-llm

For the transfer_funds function above, you'd see:

[docstring]  MUST_RAISE  transfer_funds  ValueError: amount <= 0          ✗ uncovered
[docstring]  MUST_RAISE  transfer_funds  ValueError: from == to           ✗ uncovered
[docstring]  MUST_RAISE  transfer_funds  AccountNotFoundError: no account ✗ uncovered
[docstring]  MUST_RAISE  transfer_funds  InsufficientFundsError: low bal  ✗ uncovered

Score: 0% (0/4 covered)
4 gap(s) found.

If you already have tests that cover some of these, those show as ✓ covered and are skipped.

Step 2: Generate and verify tests

Add --fix to generate tests for every uncovered requirement:

quell check src/ --fix --no-llm

Quelltest generates a test for each gap, then runs two-phase verification:

  • Phase 1: The test must PASS on your current code
  • Phase 2: The raise line for the documented exception (e.g. raise ValueError) is temporarily commented out — the test must now FAIL

Only tests that pass both phases are written. Output:

  transfer_funds  MUST_RAISE  ValueError (amount)         ✓ verified
  transfer_funds  MUST_RAISE  ValueError (from==to)        ✓ verified
  transfer_funds  MUST_RAISE  AccountNotFoundError         ✓ verified
  transfer_funds  MUST_RAISE  InsufficientFundsError       ✓ verified

4 tests written → tests/test_transfer.py
Score: 100% (4/4 covered)
Your code never left your machine.
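The two-phase idea can be sketched in a few lines. This is a conceptual simplification, not Quelltest's internals — run_tests, mutate, and restore are hypothetical callables standing in for the real machinery:

```python
def two_phase_verify(run_tests, mutate, restore) -> bool:
    """Accept a generated test only if it passes on the real code
    (phase 1) and fails once the guarded behaviour is removed (phase 2)."""
    # Phase 1: the test must pass against the unmodified source.
    if not run_tests():
        return False
    mutate()  # e.g. comment out the documented `raise` line
    try:
        # Phase 2: with the behaviour gone, the test must now fail,
        # proving it actually exercises the requirement.
        phase2_failed = not run_tests()
    finally:
        restore()  # always put the source back
    return phase2_failed
```

Phase 2 is what filters out vacuous tests: a test that passes whether or not the raise exists proves nothing, and this check rejects it.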

What the generated tests look like

# tests/test_transfer.py — auto-generated by Quelltest 0.6.9

import pytest
from src.accounts import transfer_funds


def test_transfer_funds_raises_valueerror_amount():
    """MUST_RAISE ValueError — If amount is <= 0. [quelltest]"""
    with pytest.raises(ValueError):
        transfer_funds(
            amount=0,
            from_account="acc_001",
            to_account="acc_002",
            currency="USD",
        )


def test_transfer_funds_raises_valueerror_same_account():
    """MUST_RAISE ValueError — If from_account equals to_account. [quelltest]"""
    with pytest.raises(ValueError):
        transfer_funds(
            amount=100.0,
            from_account="acc_001",
            to_account="acc_001",
            currency="USD",
        )

These are plain pytest functions — no special runner, no Quelltest dependency at test time. They run with pytest like any other test.

Writing docstrings that Quelltest reads well

Quelltest supports Google-style, NumPy-style, and Sphinx-style docstrings. Here's what the Raises: section looks like in each:

Google style (recommended):

"""
Raises:
    ValueError: If amount is <= 0.
    AccountNotFoundError: If the account does not exist.
"""

NumPy style:

"""
Raises
------
ValueError
    If amount is <= 0.
AccountNotFoundError
    If the account does not exist.
"""

Sphinx style:

"""
:raises ValueError: If amount is <= 0.
:raises AccountNotFoundError: If the account does not exist.
"""

All three produce the same MUST_RAISE requirements.
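To see the shape of this extraction, here is a deliberately simplified parser for the Google-style case. It is an illustration, not Quelltest's actual parser, which handles indentation, multi-line descriptions, and the other two styles:

```python
import re


def extract_raises(docstring: str) -> list[tuple[str, str]]:
    """Pull (exception, condition) pairs from a Google-style Raises: block.
    Simplified: assumes one entry per line and ends the block at the
    next section header or blank line."""
    pairs = []
    in_raises = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Raises:":
            in_raises = True
            continue
        if in_raises:
            m = re.match(r"(\w+):\s*(.+)", stripped)
            if m:
                pairs.append((m.group(1), m.group(2)))
            elif stripped.endswith(":") or not stripped:
                in_raises = False  # next section or blank line ends the block
    return pairs
```

Each extracted pair corresponds to one MUST_RAISE requirement in the gap report.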

Returns: sections

"""
Returns:
    TransferResult with a unique transfer_id field.
"""

This produces a MUST_RETURN requirement. Quelltest generates a test that calls the function and asserts the return value is not None and is an instance of TransferResult.
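Based on that description, a generated MUST_RETURN test would look roughly like the sketch below. The stub function stands in for your real code so the example is self-contained; the real generated test imports your actual function:

```python
from dataclasses import dataclass


@dataclass
class TransferResult:
    transfer_id: str


def transfer_funds(amount, from_account, to_account, currency):
    # Hypothetical stand-in so the example runs on its own.
    return TransferResult(transfer_id="tr_123")


def test_transfer_funds_returns_transfer_result():
    """MUST_RETURN TransferResult — with a unique transfer_id field. [quelltest]"""
    result = transfer_funds(
        amount=100.0,
        from_account="acc_001",
        to_account="acc_002",
        currency="USD",
    )
    assert result is not None
    assert isinstance(result, TransferResult)
```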

What Quelltest extracts vs what it skips

| Docstring section | Extracted? | Constraint kind |
| --- | --- | --- |
| Raises: SomeError: if X | ✓ | MUST_RAISE |
| Returns: SomeType with field X | ✓ | MUST_RETURN |
| Args: x: Must be positive | Partial | BOUNDARY (if numeric bound) |
| Note: / Example: / See also: | ✗ | Not applicable |
| Custom sections | ✗ | Not applicable |

Args: sections are parsed for numeric constraints ("must be positive", "must not be negative") but these produce lower-confidence requirements than explicit Raises: blocks. For best results, document constraints in Raises: where possible.
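A sketch of how such phrasings might map to boundary probes — the phrase list and probe values are illustrative assumptions, not Quelltest's actual rule set:

```python
def boundary_from_arg_doc(arg: str, description: str):
    """Map common numeric phrasings in an Args: entry to a probe value
    just outside the allowed range (illustrative only)."""
    text = description.lower()
    if "must be positive" in text:
        return (arg, 0)  # 0 violates "positive", so probe there
    if "must not be negative" in text:
        return (arg, -1)  # -1 violates "not negative"
    return None  # no recognisable numeric constraint
```

Free-text phrases like these are inherently ambiguous, which is why they yield lower-confidence requirements than an explicit Raises: entry.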

Handling functions with complex argument construction

If your function requires objects that are hard to construct (database sessions, external clients), Quelltest's rule engine generates tests with stub arguments. These may fail phase 1 verification because the stub isn't a real Session.

In this case, the requirement appears in report.json as skipped (stub-failed). You can write the test manually using your existing test fixtures, or configure an LLM fallback:

# pyproject.toml
[tool.quell]
llm_provider = "anthropic"
llm_model = "claude-sonnet-4-5"

Then re-run with the LLM available:

quell check src/ --fix   # LLM handles the complex stub cases

Checking your current requirement coverage

quell score --badge

Output:

Requirement coverage: 73% (22/30)

Uncovered:
  transfer_funds   MUST_RAISE  ValueError: amount <= 0
  validate_user    MUST_RAISE  AuthError: not logged in
  ...

Badge: ![Quell Score](https://img.shields.io/badge/quell-73%25-yellow)

Paste the badge into your README.

CI integration

# .github/workflows/quelltest.yml
name: Requirement coverage

on: [push, pull_request]

jobs:
  quelltest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install quelltest
      - run: quell check src/ --ci --threshold 0.80

This fails the pipeline if requirement coverage drops below 80%. Adjust the threshold to match your team's standard.

To post the gap report as a PR comment:

      - run: quell pr ${{ github.event.pull_request.number }} --comment
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The full workflow

# 1. See what requirements exist and what's covered
quell check src/ --no-llm

# 2. Generate tests for all gaps
quell check src/ --fix --no-llm

# 3. Run the generated tests
pytest tests/

# 4. Check the score
quell score --badge

# 5. Add the CI gate
quell install --pr

Everything except quell install --pr (which writes the workflow file) is read-only or writes only to tests/. Your source code is never modified.

Summary

Docstrings already contain requirements. Quelltest turns them into verified pytest tests automatically — no LLM needed, no code sent anywhere, no false positives. The two-phase verification engine ensures every generated test actually proves the requirement it claims to test.

pip install quelltest
quell check src/ --fix --no-llm

Try Quelltest

Install Quelltest and run it on your codebase — no API key, no configuration.