Cache Module¶
The cache module implements incremental analysis, the third pillar of pytest-gremlins' speed strategy. Results are cached keyed by content hashes, allowing unchanged code/tests to be skipped on subsequent runs.
Overview¶
Traditional mutation testing runs all gremlins every time:
Run 1: 1000 gremlins tested (5 minutes)
Run 2: 1000 gremlins tested (5 minutes) # No changes
Run 3: 1000 gremlins tested (5 minutes) # 1 file changed
Incremental analysis skips unchanged code:
Run 1: 1000 gremlins tested (5 minutes)
Run 2: 0 gremlins tested (0 seconds) # Cache hit
Run 3: 10 gremlins tested (30 seconds) # Only changed file
Module Exports¶
from pytest_gremlins.cache import (
CachedGremlinResult, # TypedDict for cached result shape
ContentHasher, # SHA-256 content hashing
IncrementalCache, # Cache coordinator
ResultStore, # SQLite-backed result cache
)
ContentHasher¶
Produces deterministic SHA-256 hashes for files and strings.
ContentHasher
¶
Produces content hashes for files and strings.
Uses SHA-256 to produce deterministic hashes that uniquely identify content. These hashes form the cache keys for incremental analysis.
Example
hasher = ContentHasher() result = hasher.hash_string('def foo(): return 42') len(result) == 64 # SHA-256 produces 64 hex characters True
hash_string
¶
Hash a string and return its hex digest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
content
|
str
|
The string content to hash. |
required |
Returns:
| Type | Description |
|---|---|
str
|
A 64-character hexadecimal string (SHA-256 digest). |
Source code in src/pytest_gremlins/cache/hasher.py
hash_file
¶
Hash a file's content and return its hex digest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
Path
|
Path to the file to hash. |
required |
Returns:
| Type | Description |
|---|---|
str
|
A 64-character hexadecimal string (SHA-256 digest). |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If the file does not exist. |
Source code in src/pytest_gremlins/cache/hasher.py
hash_files
¶
Hash multiple files and return a mapping of path to hash.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
paths
|
list[Path]
|
List of file paths to hash. |
required |
Returns:
| Type | Description |
|---|---|
dict[str, str]
|
Dictionary mapping string path to hex digest. |
Source code in src/pytest_gremlins/cache/hasher.py
| Python | |
|---|---|
hash_combined
¶
Combine multiple hashes into a single hash.
Useful for creating composite cache keys from multiple source files or a combination of source and test file hashes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
hashes
|
list[str]
|
List of hex digest strings to combine. |
required |
Returns:
| Type | Description |
|---|---|
str
|
A single 64-character hexadecimal string. |
Source code in src/pytest_gremlins/cache/hasher.py
ContentHasher Methods¶
| Method | Returns | Description |
|---|---|---|
hash_string(content) |
str |
Hash a string (64 hex chars) |
hash_file(path) |
str |
Hash a file's content |
hash_files(paths) |
dict[str, str] |
Hash multiple files |
hash_combined(hashes) |
str |
Combine multiple hashes |
Usage Example¶
from pathlib import Path
from pytest_gremlins.cache import ContentHasher
hasher = ContentHasher()
# Hash a string
code = 'def foo(): return 42'
hash1 = hasher.hash_string(code)
print(hash1) # 64-character hex string
# Hash a file
file_hash = hasher.hash_file(Path('src/module.py'))
# Hash multiple files
hashes = hasher.hash_files([
Path('src/module.py'),
Path('tests/test_module.py'),
])
# {'src/module.py': 'abc...', 'tests/test_module.py': 'def...'}
# Combine hashes for composite keys
combined = hasher.hash_combined([hash1, file_hash])
Hash Properties¶
- Deterministic: Same content always produces same hash
- Collision-resistant: Different content produces different hashes
- Fast: SHA-256 is hardware-accelerated on modern CPUs
- Fixed-size: Always 64 hexadecimal characters
ResultStore¶
SQLite-backed cache for gremlin test results.
ResultStore
¶
SQLite-backed cache for gremlin test results.
Stores results as JSON blobs indexed by content-based cache keys. Keys are typically composed of source file hash + test file hash + gremlin definition.
Example
from pathlib import Path store = ResultStore(Path('.gremlins_cache/results.db')) store.put('abc123', {'status': 'zapped', 'killing_test': 'test_foo'}) store.get('abc123') {'status': 'zapped', 'killing_test': 'test_foo'} store.close()
If the database file is corrupted, it will be deleted and a fresh database will be created. A warning will be logged in this case.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
db_path
|
Path
|
Path to the SQLite database file. Parent directories will be created if they don't exist. |
required |
Source code in src/pytest_gremlins/cache/store.py
get
¶
Retrieve a cached result by key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cache_key
|
str
|
The content-based cache key. |
required |
Returns:
| Type | Description |
|---|---|
CachedGremlinResult | None
|
The cached result dictionary, or None if not found. |
Source code in src/pytest_gremlins/cache/store.py
put
¶
Store a result in the cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cache_key
|
str
|
The content-based cache key. |
required |
result
|
CachedGremlinResult
|
The result dictionary to cache. |
required |
Source code in src/pytest_gremlins/cache/store.py
put_deferred
¶
Store a result without committing immediately.
Results are batched and committed on flush() or close(). This is faster for bulk inserts as it reduces commit overhead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cache_key
|
str
|
The content-based cache key. |
required |
result
|
CachedGremlinResult
|
The result dictionary to cache. |
required |
Source code in src/pytest_gremlins/cache/store.py
flush
¶
Commit all pending deferred writes.
This commits any results added via put_deferred() in a single transaction, which is much faster than individual commits.
Source code in src/pytest_gremlins/cache/store.py
delete
¶
Remove a result from the cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cache_key
|
str
|
The cache key to remove. |
required |
Source code in src/pytest_gremlins/cache/store.py
delete_by_prefix
¶
Remove all results with keys matching a prefix.
Useful for invalidating all gremlins in a specific file when that file's content changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prefix
|
str
|
The key prefix to match. All keys starting with this prefix will be deleted. |
required |
Source code in src/pytest_gremlins/cache/store.py
clear
¶
has
¶
Check if a key exists in the cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cache_key
|
str
|
The cache key to check. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
True if the key exists, False otherwise. |
Source code in src/pytest_gremlins/cache/store.py
| Python | |
|---|---|
keys
¶
Get all cache keys.
Returns:
| Type | Description |
|---|---|
list[str]
|
List of all cache keys currently stored. |
count
¶
Get the number of cached entries.
Returns:
| Type | Description |
|---|---|
int
|
Total count of cached results. |
Source code in src/pytest_gremlins/cache/store.py
close
¶
Close the database connection.
Flushes any pending deferred writes before closing.
ResultStore Methods¶
| Method | Returns | Description |
|---|---|---|
get(cache_key) |
dict \| None |
Retrieve cached result |
put(cache_key, result) |
None |
Store result (immediate commit) |
put_deferred(cache_key, result) |
None |
Store result (batch commit) |
flush() |
None |
Commit all deferred writes |
has(cache_key) |
bool |
Check if key exists |
delete(cache_key) |
None |
Remove single entry |
delete_by_prefix(prefix) |
None |
Remove entries by prefix |
clear() |
None |
Remove all entries |
keys() |
list[str] |
Get all cache keys |
count() |
int |
Get entry count |
close() |
None |
Close database connection |
Usage Example¶
from pathlib import Path
from pytest_gremlins.cache import ResultStore
# Create store (creates parent directories if needed)
store = ResultStore(Path('.gremlins_cache/results.db'))
# Store a result (immediate commit)
store.put('g001:abc123:def456', {
'status': 'zapped',
'killing_test': 'test_boundary',
'execution_time_ms': 150.5,
})
# Retrieve a result
result = store.get('g001:abc123:def456')
if result:
print(f"Status: {result['status']}")
# Batch operations (faster for bulk inserts)
for i in range(100):
store.put_deferred(f'key_{i}', {'value': i})
store.flush() # Single commit for all 100
# Clean up
store.close()
Context Manager¶
from pathlib import Path
from pytest_gremlins.cache import ResultStore
with ResultStore(Path('.cache/results.db')) as store:
store.put('key', {'data': 'value'})
# Automatically closes on exit
Database Schema¶
Error Recovery¶
If the database is corrupted, ResultStore automatically:
- Detects the corruption on open
- Logs a warning
- Deletes the corrupted file
- Creates a fresh database
IncrementalCache¶
Coordinates content hashing and result storage for smart cache invalidation.
IncrementalCache
¶
Coordinator for incremental analysis caching.
Combines content hashing with result storage to implement the incremental analysis invalidation rules:
- Source file modified: cache miss (re-run gremlins in that file)
- Test file modified: cache miss (re-run gremlins covered by those tests)
- New test added: cache miss (re-run gremlins the new test covers)
- Test deleted: cache miss (re-run gremlins that test was zapping)
- Nothing changed: cache hit (return cached results instantly)
The cache key is composed of: - gremlin_id: Unique identifier for the mutation - source_hash: SHA-256 hash of the source file content - test_hashes: Combined hash of all test files covering this gremlin
Example
from pathlib import Path cache = IncrementalCache(Path('.gremlins_cache')) cache.cache_result('g001', 'src_hash', {'test_foo': 'hash'}, {'status': 'zapped'}) cache.get_cached_result('g001', 'src_hash', {'test_foo': 'hash'}) {'status': 'zapped'} cache.close()
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cache_dir
|
Path
|
Directory to store cache files. |
required |
Source code in src/pytest_gremlins/cache/incremental.py
| Python | |
|---|---|
get_cached_result
¶
Retrieve a cached result if available.
Returns None (cache miss) if: - No cached result exists for this gremlin - Source file content has changed - Any relevant test file content has changed - Tests have been added or removed
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gremlin_id
|
str
|
Unique identifier for the gremlin. |
required |
source_hash
|
str
|
Current SHA-256 hash of the source file. |
required |
test_hashes
|
dict[str, str]
|
Current mapping of test name to content hash. |
required |
Returns:
| Type | Description |
|---|---|
CachedGremlinResult | None
|
Cached result dictionary, or None if cache miss. |
Source code in src/pytest_gremlins/cache/incremental.py
cache_result
¶
Cache a gremlin test result.
The result is stored with a key that incorporates the gremlin ID and content hashes. Any change to source or test files will produce a different key, causing a cache miss.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gremlin_id
|
str
|
Unique identifier for the gremlin. |
required |
source_hash
|
str
|
SHA-256 hash of the source file. |
required |
test_hashes
|
dict[str, str]
|
Mapping of test name to content hash. |
required |
result
|
CachedGremlinResult
|
The result dictionary to cache. |
required |
Source code in src/pytest_gremlins/cache/incremental.py
cache_result_deferred
¶
Cache a gremlin test result without committing immediately.
Results are batched and committed on flush() or close(). This is faster for bulk inserts during mutation testing runs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gremlin_id
|
str
|
Unique identifier for the gremlin. |
required |
source_hash
|
str
|
SHA-256 hash of the source file. |
required |
test_hashes
|
dict[str, str]
|
Mapping of test name to content hash. |
required |
result
|
CachedGremlinResult
|
The result dictionary to cache. |
required |
Source code in src/pytest_gremlins/cache/incremental.py
flush
¶
invalidate_file
¶
Invalidate all cached results for gremlins in a file.
Removes all cache entries where the gremlin_id starts with the given prefix. Useful when a source file changes and all its gremlins need re-testing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
file_prefix
|
str
|
Prefix to match in gremlin IDs. |
required |
Source code in src/pytest_gremlins/cache/incremental.py
clear
¶
get_stats
¶
Get cache statistics.
Returns:
| Type | Description |
|---|---|
dict[str, int]
|
Dictionary with hits, misses, and total_entries counts. |
Source code in src/pytest_gremlins/cache/incremental.py
IncrementalCache Methods¶
| Method | Returns | Description |
|---|---|---|
get_cached_result(gremlin_id, source_hash, test_hashes) |
dict \| None |
Get cached result |
cache_result(gremlin_id, source_hash, test_hashes, result) |
None |
Cache result (immediate) |
cache_result_deferred(...) |
None |
Cache result (batch) |
flush() |
None |
Commit deferred writes |
invalidate_file(file_prefix) |
None |
Invalidate file's gremlins |
clear() |
None |
Clear entire cache |
get_stats() |
dict |
Get hit/miss statistics |
close() |
None |
Close and release resources |
Cache Key Structure¶
Cache keys incorporate three components:
| Component | Changes When |
|---|---|
gremlin_id |
Never (uniquely identifies mutation) |
source_hash |
Source file content changes |
combined_test_hash |
Any covering test file changes |
Invalidation Rules¶
| Change | Cache Effect |
|---|---|
| Source file modified | Miss for all gremlins in that file |
| Test file modified | Miss for gremlins covered by that test |
| New test added | Miss for gremlins the new test covers |
| Test deleted | Miss for gremlins that test was zapping |
| Nothing changed | Hit (return cached result) |
Usage Example¶
from pathlib import Path
from pytest_gremlins.cache import IncrementalCache
cache = IncrementalCache(Path('.gremlins_cache'))
# Define content hashes
source_hash = 'abc123...' # Hash of source file
test_hashes = {
'test_login': 'def456...', # Hash of test file
'test_logout': 'ghi789...',
}
# Try to get cached result
result = cache.get_cached_result(
gremlin_id='g001',
source_hash=source_hash,
test_hashes=test_hashes,
)
if result is not None:
print(f"Cache hit: {result['status']}")
else:
# Run the test
actual_result = run_mutation_test('g001')
# Cache the result
cache.cache_result(
gremlin_id='g001',
source_hash=source_hash,
test_hashes=test_hashes,
result={
'status': actual_result.status.value,
'killing_test': actual_result.killing_test,
'execution_time_ms': actual_result.execution_time_ms,
},
)
# Check statistics
stats = cache.get_stats()
print(f"Hits: {stats['hits']}, Misses: {stats['misses']}")
cache.close()
Batch Caching¶
For better performance during mutation testing runs:
cache = IncrementalCache(Path('.gremlins_cache'))
# Use deferred writes (batched commits)
for gremlin_id, result in results:
cache.cache_result_deferred(
gremlin_id=gremlin_id,
source_hash=source_hashes[gremlin_id],
test_hashes=test_hashes[gremlin_id],
result=result,
)
# Single commit for all results
cache.flush()
cache.close()
Context Manager¶
from pathlib import Path
from pytest_gremlins.cache import IncrementalCache
with IncrementalCache(Path('.gremlins_cache')) as cache:
result = cache.get_cached_result('g001', 'abc', {'test': 'def'})
# Automatically closes
CLI Integration¶
Enable caching via command line:
# Enable incremental caching
pytest --gremlins --gremlin-cache
# Clear cache and start fresh
pytest --gremlins --gremlin-cache --gremlin-clear-cache
Cache Location¶
By default, cache is stored in .gremlins_cache/ in the project root:
Performance Impact¶
Example Scenario¶
Initial run (cold cache):
1000 gremlins x 50ms average = 50 seconds
Second run (no changes):
1000 cache hits x 0.1ms lookup = 0.1 seconds
After changing one 50-gremlin file:
50 cache misses x 50ms = 2.5 seconds
950 cache hits x 0.1ms = 0.1 seconds
Total: 2.6 seconds (vs 50 seconds without cache)
Best Practices¶
- Enable caching in development - Faster feedback during TDD
- Disable in CI - Start fresh for authoritative results
- Clear cache after refactoring - Major changes may confuse the cache
- Monitor hit rates - Low hit rates may indicate frequent test changes