TDD with Mutation Testing¶
Integrate mutation testing into your Test-Driven Development workflow for stronger tests.
Goal¶
Extend the classic Red-Green-Refactor cycle with mutation testing to ensure tests are not just passing, but actually catching bugs.
Prerequisites¶
- Understanding of TDD fundamentals
- pytest-gremlins installed
- A project with tests
The Extended TDD Cycle¶
Traditional TDD:
TDD with Mutation Testing:
The MUTATE phase uses pytest-gremlins to verify your tests would catch bugs in the code you just wrote.
Steps¶
- Understand the extended cycle
- Configure for fast feedback
- Practice the workflow
- Integrate into your routine
Configuration¶
Fast Feedback Configuration¶
Create pyproject.toml optimized for TDD:
[project]
name = "myproject"
version = "1.0.0"
[project.optional-dependencies]
dev = [
"pytest>=7.0.0",
"pytest-gremlins>=1.0.0",
"pytest-watch>=4.2.0", # For continuous testing
]
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-ra --strict-markers -x --tb=short"
# -x: stop on first failure (fast feedback)
# --tb=short: concise tracebacks
[tool.pytest-gremlins]
paths = ["src"]
# TDD-friendly operator selection
operators = [
"comparison", # Boundary conditions
"arithmetic", # Math operations
"boolean", # Logic conditions
"return", # Return values
]
Shell Aliases¶
Add to your .bashrc or .zshrc:
# TDD workflow aliases
alias t='pytest -x --tb=short' # Quick test run
alias tw='pytest-watch -- -x --tb=short' # Watch mode
alias tm='pytest --gremlins --gremlin-cache -x' # Mutate
alias tdd='pytest -x && pytest --gremlins --gremlin-cache -x' # Full cycle
The Workflow¶
Phase 1: RED - Write a Failing Test¶
Write a test for the behavior you want:
# tests/test_calculator.py
"""Tests for calculator module."""
class TestCalculatorAdd:
"""Tests for add function."""
def test_add_positive_numbers(self):
"""Adding positive numbers returns their sum."""
from myproject.calculator import add
result = add(2, 3)
assert result == 5
Run the test - it should fail:
FAILED tests/test_calculator.py::TestCalculatorAdd::test_add_positive_numbers
E ModuleNotFoundError: No module named 'myproject.calculator'
Phase 2: GREEN - Make It Pass¶
Write the minimum code to pass:
# src/myproject/calculator.py
"""Calculator module."""
def add(a, b):
"""Add two numbers."""
return a + b
Run the test - it should pass:
Phase 3: REFACTOR - Improve the Code¶
If needed, refactor while keeping tests green:
# src/myproject/calculator.py
"""Calculator module."""
def add(a: int | float, b: int | float) -> int | float:
"""Add two numbers.
Args:
a: First number.
b: Second number.
Returns:
Sum of a and b.
"""
return a + b
Verify tests still pass:
Phase 4: MUTATE - Verify Test Strength¶
Now run mutation testing to see if your test would catch bugs:
If gremlins survive, your test has gaps:
================== pytest-gremlins mutation report ==================
Zapped: 0 gremlins (0%)
Survived: 2 gremlins (100%)
Surviving gremlins:
src/myproject/calculator.py:12 + → - (arithmetic not verified)
src/myproject/calculator.py:12 + → * (arithmetic not verified)
This tells us: if someone changed + to - or *, our test wouldn't catch it!
Back to RED - Strengthen Tests¶
Add tests that would catch these mutations:
# tests/test_calculator.py
"""Tests for calculator module."""
class TestCalculatorAdd:
"""Tests for add function."""
def test_add_positive_numbers(self):
"""Adding positive numbers returns their sum."""
from myproject.calculator import add
result = add(2, 3)
assert result == 5
def test_add_is_not_subtraction(self):
"""Addition is different from subtraction."""
from myproject.calculator import add
# If add(5, 3) returned 2, we'd know it's subtracting
result = add(5, 3)
assert result == 8 # Not 2 (5-3) or 15 (5*3)
def test_add_zero_returns_other(self):
"""Adding zero returns the other number."""
from myproject.calculator import add
assert add(5, 0) == 5
assert add(0, 5) == 5
Run mutation testing again:
================== pytest-gremlins mutation report ==================
Zapped: 2 gremlins (100%)
Survived: 0 gremlins (0%)
All gremlins zapped. Your tests are strong.
Complete Example: Boundary Conditions¶
Let's work through a more complex example with boundary conditions.
RED - Write the Test¶
# tests/test_validator.py
"""Tests for age validator."""
class TestIsAdult:
"""Tests for is_adult function."""
def test_eighteen_is_adult(self):
"""Age 18 is considered adult."""
from myproject.validator import is_adult
assert is_adult(18) is True
GREEN - Make It Pass¶
# src/myproject/validator.py
"""Age validation module."""
def is_adult(age: int) -> bool:
"""Check if age qualifies as adult.
Args:
age: Age in years.
Returns:
True if 18 or older, False otherwise.
"""
return age >= 18
MUTATE - Find Gaps¶
The >= to > mutation survives. If someone changed age >= 18 to age > 18, our test would
still pass because we only test with age 18.
RED Again - Test the Boundary¶
# tests/test_validator.py
"""Tests for age validator."""
class TestIsAdult:
"""Tests for is_adult function."""
def test_eighteen_is_adult(self):
"""Age 18 is considered adult."""
from myproject.validator import is_adult
assert is_adult(18) is True
def test_seventeen_is_not_adult(self):
"""Age 17 is not adult."""
from myproject.validator import is_adult
assert is_adult(17) is False
def test_nineteen_is_adult(self):
"""Age 19 is adult."""
from myproject.validator import is_adult
assert is_adult(19) is True
MUTATE - Verify¶
The boundary is now properly tested.
Quick Feedback Loop¶
Using pytest-watch¶
Install pytest-watch for continuous testing:
Run in watch mode:
Now every time you save a file, tests run automatically.
Periodic Mutation Checks¶
While pytest-watch handles the RED-GREEN-REFACTOR cycle, periodically run mutation testing:
Or use a keyboard shortcut in your IDE to run the full TDD cycle:
IDE Integration¶
VS Code¶
Create .vscode/tasks.json:
{
"version": "2.0.0",
"tasks": [
{
"label": "TDD: Run Tests",
"type": "shell",
"command": "pytest -x --tb=short",
"group": "test",
"problemMatcher": []
},
{
"label": "TDD: Mutate",
"type": "shell",
"command": "pytest --gremlins --gremlin-cache -x",
"group": "test",
"problemMatcher": []
},
{
"label": "TDD: Full Cycle",
"type": "shell",
"command": "pytest -x && pytest --gremlins --gremlin-cache -x",
"group": "test",
"problemMatcher": []
}
]
}
Use Cmd+Shift+B (Mac) or Ctrl+Shift+B (Windows/Linux) to run tasks.
PyCharm¶
Create run configurations:
- TDD: Tests -
pytest -x --tb=short - TDD: Mutate -
pytest --gremlins --gremlin-cache -x - TDD: Full - Compound configuration running both
Verification¶
- Practice the cycle on a new feature:
- Write failing test
- Make it pass
- Refactor
- Run mutation testing
-
Strengthen tests if needed
-
Check that mutation scores stay high:
- Over time, mutation scores should improve or stay stable
Troubleshooting¶
Mutation testing is too slow for TDD¶
Use caching and operator subsets:
# Fast check during development
pytest --gremlins --gremlin-cache --gremlin-operators=comparison -x
# Full check before committing
pytest --gremlins
Too many surviving gremlins to address¶
Focus on one at a time:
# See detailed report
pytest --gremlins --gremlin-report=html
# Address the most critical (e.g., boundary conditions) first
Prioritize:
- Boundary condition mutations (
>=to>) - Return value mutations (returning wrong value)
- Boolean mutations (logic errors)
- Arithmetic mutations (calculation errors)
Some gremlins are false positives¶
Not all surviving gremlins indicate test gaps. Some mutations are equivalent (produce the same behavior). Use pragmatic judgment:
# This gremlin might survive: x = x + 0 → x = x - 0
# Both produce the same result - not a real test gap
Mark intentional exclusions with the pardon pragma:
def calculate_discount(price):
return price * 0 # gremlin: pardon[equivalent] always zero by design
Valid reason codes are equivalent, untestable, and out_of_scope.
Best Practices¶
- Run mutation testing after each feature, not each commit
- TDD cycle: seconds
-
Mutation check: tens of seconds to minutes
-
Start with high-value mutations
- Comparison operators catch boundary bugs
-
Boolean operators catch logic errors
-
Don't chase 100% mutation score
- 85-95% is excellent
-
Some equivalent mutations are unavoidable
-
Use mutation testing to learn
- Surviving gremlins teach you about edge cases
-
Over time, you'll write stronger tests naturally
-
Integrate with code review
- Share mutation reports in PRs
- Discuss surviving gremlins with teammates