Skip to content

TDD with Mutation Testing

Integrate mutation testing into your Test-Driven Development workflow for stronger tests.

Goal

Extend the classic Red-Green-Refactor cycle with mutation testing to ensure tests are not just passing, but actually catching bugs.

Prerequisites

  • Understanding of TDD fundamentals
  • pytest-gremlins installed
  • A project with tests

The Extended TDD Cycle

Traditional TDD:

Text Only
RED → GREEN → REFACTOR

TDD with Mutation Testing:

Text Only
RED → GREEN → REFACTOR → MUTATE

The MUTATE phase uses pytest-gremlins to verify your tests would catch bugs in the code you just wrote.

Steps

  1. Understand the extended cycle
  2. Configure for fast feedback
  3. Practice the workflow
  4. Integrate into your routine

Configuration

Fast Feedback Configuration

Create pyproject.toml optimized for TDD:

TOML
[project]
name = "myproject"
version = "1.0.0"

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "pytest-gremlins>=1.0.0",
    "pytest-watch>=4.2.0",  # For continuous testing
]

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-ra --strict-markers -x --tb=short"
# -x: stop on first failure (fast feedback)
# --tb=short: concise tracebacks

[tool.pytest-gremlins]
paths = ["src"]

# TDD-friendly operator selection
operators = [
    "comparison",    # Boundary conditions
    "arithmetic",    # Math operations
    "boolean",       # Logic conditions
    "return",        # Return values
]

Shell Aliases

Add to your .bashrc or .zshrc:

Bash
# TDD workflow aliases
alias t='pytest -x --tb=short'                    # Quick test run
alias tw='pytest-watch -- -x --tb=short'          # Watch mode
alias tm='pytest --gremlins --gremlin-cache -x'   # Mutate
alias tdd='pytest -x && pytest --gremlins --gremlin-cache -x'  # Full cycle

The Workflow

Phase 1: RED - Write a Failing Test

Write a test for the behavior you want:

Python
# tests/test_calculator.py
"""Tests for calculator module."""


class TestCalculatorAdd:
    """Tests for add function."""

    def test_add_positive_numbers(self):
        """Adding positive numbers returns their sum."""
        from myproject.calculator import add

        result = add(2, 3)

        assert result == 5

Run the test - it should fail:

Bash
t  # or: pytest -x --tb=short
Text Only
FAILED tests/test_calculator.py::TestCalculatorAdd::test_add_positive_numbers
E   ModuleNotFoundError: No module named 'myproject.calculator'

Phase 2: GREEN - Make It Pass

Write the minimum code to pass:

Python
# src/myproject/calculator.py
"""Calculator module."""


def add(a, b):
    """Add two numbers."""
    return a + b

Run the test - it should pass:

Bash
t  # or: pytest -x --tb=short
Text Only
PASSED tests/test_calculator.py::TestCalculatorAdd::test_add_positive_numbers

Phase 3: REFACTOR - Improve the Code

If needed, refactor while keeping tests green:

Python
# src/myproject/calculator.py
"""Calculator module."""


def add(a: int | float, b: int | float) -> int | float:
    """Add two numbers.

    Args:
        a: First number.
        b: Second number.

    Returns:
        Sum of a and b.
    """
    return a + b

Verify tests still pass:

Bash
t

Phase 4: MUTATE - Verify Test Strength

Now run mutation testing to see if your test would catch bugs:

Bash
tm  # or: pytest --gremlins --gremlin-cache -x

If gremlins survive, your test has gaps:

Text Only
================== pytest-gremlins mutation report ==================

Zapped: 0 gremlins (0%)
Survived: 2 gremlins (100%)

Surviving gremlins:
  src/myproject/calculator.py:12    + → -   (arithmetic not verified)
  src/myproject/calculator.py:12    + → *   (arithmetic not verified)

This tells us: if someone changed + to - or *, our test wouldn't catch it!

Back to RED - Strengthen Tests

Add tests that would catch these mutations:

Python
# tests/test_calculator.py
"""Tests for calculator module."""


class TestCalculatorAdd:
    """Tests for add function."""

    def test_add_positive_numbers(self):
        """Adding positive numbers returns their sum."""
        from myproject.calculator import add

        result = add(2, 3)

        assert result == 5

    def test_add_is_not_subtraction(self):
        """Addition is different from subtraction."""
        from myproject.calculator import add

        # If add(5, 3) returned 2, we'd know it's subtracting
        result = add(5, 3)

        assert result == 8  # Not 2 (5-3) or 15 (5*3)

    def test_add_zero_returns_other(self):
        """Adding zero returns the other number."""
        from myproject.calculator import add

        assert add(5, 0) == 5
        assert add(0, 5) == 5

Run mutation testing again:

Bash
tm
Text Only
================== pytest-gremlins mutation report ==================

Zapped: 2 gremlins (100%)
Survived: 0 gremlins (0%)

All gremlins zapped. Your tests are strong.

Complete Example: Boundary Conditions

Let's work through a more complex example with boundary conditions.

RED - Write the Test

Python
# tests/test_validator.py
"""Tests for age validator."""


class TestIsAdult:
    """Tests for is_adult function."""

    def test_eighteen_is_adult(self):
        """Age 18 is considered adult."""
        from myproject.validator import is_adult

        assert is_adult(18) is True

GREEN - Make It Pass

Python
# src/myproject/validator.py
"""Age validation module."""


def is_adult(age: int) -> bool:
    """Check if age qualifies as adult.

    Args:
        age: Age in years.

    Returns:
        True if 18 or older, False otherwise.
    """
    return age >= 18

MUTATE - Find Gaps

Bash
tm
Text Only
Surviving gremlins:
  src/myproject/validator.py:14    >= → >   (boundary not tested)

The >= to > mutation survives. If someone changed age >= 18 to age > 18, our test would still pass because we only test with age 18.

RED Again - Test the Boundary

Python
# tests/test_validator.py
"""Tests for age validator."""


class TestIsAdult:
    """Tests for is_adult function."""

    def test_eighteen_is_adult(self):
        """Age 18 is considered adult."""
        from myproject.validator import is_adult

        assert is_adult(18) is True

    def test_seventeen_is_not_adult(self):
        """Age 17 is not adult."""
        from myproject.validator import is_adult

        assert is_adult(17) is False

    def test_nineteen_is_adult(self):
        """Age 19 is adult."""
        from myproject.validator import is_adult

        assert is_adult(19) is True

MUTATE - Verify

Bash
tm
Text Only
Zapped: 2 gremlins (100%)

The boundary is now properly tested.

Quick Feedback Loop

Using pytest-watch

Install pytest-watch for continuous testing:

Bash
pip install pytest-watch

Run in watch mode:

Bash
pytest-watch -- -x --tb=short

Now every time you save a file, tests run automatically.

Periodic Mutation Checks

While pytest-watch handles the RED-GREEN-REFACTOR cycle, periodically run mutation testing:

Bash
# In another terminal, or after completing a feature
tm

Or use a keyboard shortcut in your IDE to run the full TDD cycle:

Bash
tdd  # alias for: pytest -x && pytest --gremlins --gremlin-cache -x

IDE Integration

VS Code

Create .vscode/tasks.json:

JSON
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "TDD: Run Tests",
      "type": "shell",
      "command": "pytest -x --tb=short",
      "group": "test",
      "problemMatcher": []
    },
    {
      "label": "TDD: Mutate",
      "type": "shell",
      "command": "pytest --gremlins --gremlin-cache -x",
      "group": "test",
      "problemMatcher": []
    },
    {
      "label": "TDD: Full Cycle",
      "type": "shell",
      "command": "pytest -x && pytest --gremlins --gremlin-cache -x",
      "group": "test",
      "problemMatcher": []
    }
  ]
}

Use Cmd+Shift+B (Mac) or Ctrl+Shift+B (Windows/Linux) to run tasks.

PyCharm

Create run configurations:

  1. TDD: Tests - pytest -x --tb=short
  2. TDD: Mutate - pytest --gremlins --gremlin-cache -x
  3. TDD: Full - Compound configuration running both

Verification

  1. Practice the cycle on a new feature:
  2. Write failing test
  3. Make it pass
  4. Refactor
  5. Run mutation testing
  6. Strengthen tests if needed

  7. Check that mutation scores stay high:

Bash
pytest --gremlins --gremlin-report=console
  1. Over time, mutation scores should improve or stay stable

Troubleshooting

Mutation testing is too slow for TDD

Use caching and operator subsets:

Bash
# Fast check during development
pytest --gremlins --gremlin-cache --gremlin-operators=comparison -x

# Full check before committing
pytest --gremlins

Too many surviving gremlins to address

Focus on one at a time:

Bash
# See detailed report
pytest --gremlins --gremlin-report=html

# Address the most critical (e.g., boundary conditions) first

Prioritize:

  1. Boundary condition mutations (>= to >)
  2. Return value mutations (returning wrong value)
  3. Boolean mutations (logic errors)
  4. Arithmetic mutations (calculation errors)

Some gremlins are false positives

Not all surviving gremlins indicate test gaps. Some mutations are equivalent (produce the same behavior). Use pragmatic judgment:

Python
# This gremlin might survive: x = x + 0  →  x = x - 0
# Both produce the same result - not a real test gap

Mark intentional exclusions with the pardon pragma:

Python
def calculate_discount(price):
    return price * 0  # gremlin: pardon[equivalent] always zero by design

Valid reason codes are equivalent, untestable, and out_of_scope.

Best Practices

  1. Run mutation testing after each feature, not each commit
  2. TDD cycle: seconds
  3. Mutation check: tens of seconds to minutes

  4. Start with high-value mutations

  5. Comparison operators catch boundary bugs
  6. Boolean operators catch logic errors

  7. Don't chase 100% mutation score

  8. 85-95% is excellent
  9. Some equivalent mutations are unavoidable

  10. Use mutation testing to learn

  11. Surviving gremlins teach you about edge cases
  12. Over time, you'll write stronger tests naturally

  13. Integrate with code review

  14. Share mutation reports in PRs
  15. Discuss surviving gremlins with teammates