Property-Based Testing: Generative Testing for System Invariants

Property-Based Testing (PBT) revolutionizes how we think about test cases. Instead of manually crafting individual examples, PBT defines properties that should always hold true and automatically generates hundreds of test cases to verify those properties. This approach discovers edge cases that developers rarely anticipate.

What is Property-Based Testing?

Property-Based Testing focuses on specifying invariants—rules that must always be true regardless of inputs—rather than checking specific input-output pairs. A property-based testing framework generates random inputs, runs the code under test, and verifies that the properties hold.

Core Concepts

Property: An assertion about the system that should hold for all valid inputs.

Generator: A function that produces random test data matching specific constraints.

Shrinking: When a failing test is found, the framework automatically simplifies the input to find the minimal failing case.

Invariant: A condition that remains true throughout program execution or across transformations.

Properties vs. Example-Based Tests

Traditional Example-Based Testing

def test_reverse_list():
    assert reverse([1, 2, 3]) == [3, 2, 1]
    assert reverse([]) == []
    assert reverse([42]) == [42]

Limitations: Only tests three specific cases; may miss edge cases like large lists, duplicates, or unusual values.

Property-Based Approach

from hypothesis import given
import hypothesis.strategies as st

@given(st.lists(st.integers()))
def test_reverse_property(lst):
    # Property: Reversing twice returns original
    assert reverse(reverse(lst)) == lst

    # Property: Length is preserved
    assert len(reverse(lst)) == len(lst)

    # Property: First element becomes last
    if lst:
        assert reverse(lst)[0] == lst[-1]

Advantages: Tests hundreds of random lists automatically; discovers unexpected edge cases.

Property-Based Testing Frameworks

Hypothesis (Python)

Hypothesis is the most mature PBT framework for Python, offering sophisticated strategies and excellent shrinking.

from hypothesis import given, strategies as st, assume

@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    # Property: Addition is commutative
    assert a + b == b + a

@given(st.lists(st.integers(), min_size=1))
def test_max_is_in_list(lst):
    # Property: Max value must be in the list
    assert max(lst) in lst

@given(st.text())
def test_encode_decode(s):
    # Property: Encoding then decoding returns original
    encoded = s.encode('utf-8')
    decoded = encoded.decode('utf-8')
    assert s == decoded

QuickCheck (Haskell)

The original property-based testing framework that inspired all others.

-- Property: Reverse is its own inverse
prop_reverseInverse :: [Int] -> Bool
prop_reverseInverse xs = reverse (reverse xs) == xs

-- Property: Sorted list has all original elements
prop_sortPreservesElements :: [Int] -> Bool
prop_sortPreservesElements xs =
    sort xs `sameElements` xs
  where
    sameElements a b = sort a == sort b

-- Property: Appending then taking length sums lengths
prop_appendLength :: [Int] -> [Int] -> Bool
prop_appendLength xs ys =
    length (xs ++ ys) == length xs + length ys

fast-check (JavaScript/TypeScript)

Property-based testing for JavaScript ecosystem with TypeScript support.

import fc from 'fast-check';

// Property: Array map preserves length
fc.property(
  fc.array(fc.integer()),
  fc.func(fc.integer()),
  (arr, f) => arr.map(f).length === arr.length
);

// Property: JSON serialization round-trip
fc.property(
  fc.anything(),
  (value) => {
    const serialized = JSON.stringify(value);
    const deserialized = JSON.parse(serialized);
    return fc.stringify(value) === fc.stringify(deserialized);
  }
);

JSVerify (JavaScript)

Earlier JavaScript PBT library, simpler but less feature-rich than fast-check.

const jsc = require('jsverify');

// Property: String split then join returns original
const splitJoinProperty = jsc.forall(
  jsc.string,
  jsc.string,
  (str, sep) => {
    if (sep === '') return true; // Skip empty separator
    return str.split(sep).join(sep) === str;
  }
);

jsc.assert(splitJoinProperty);

Generators and Strategies

Generators produce random test data constrained to specific types and ranges.

Built-in Generators

from hypothesis import strategies as st

# Basic types
st.integers()                    # Any integer
st.integers(min_value=0, max_value=100)  # Range 0-100
st.floats()                      # Any float
st.text()                        # Unicode strings
st.booleans()                    # True/False

# Collections
st.lists(st.integers())          # Lists of integers
st.sets(st.text(), min_size=1)   # Non-empty sets of strings
st.dictionaries(st.text(), st.integers())  # String->Int dicts
st.tuples(st.integers(), st.text())  # Fixed-size tuples

# Optionals and choices
st.none()                        # Always None
st.one_of(st.integers(), st.text())  # Either int or string

Custom Generators

from hypothesis import strategies as st
from hypothesis.strategies import composite

# Generate valid email addresses
@composite
def email_strategy(draw):
    username = draw(st.text(
        alphabet=st.characters(whitelist_categories=('Ll', 'Nd')),
        min_size=1,
        max_size=20
    ))
    domain = draw(st.text(
        alphabet=st.characters(whitelist_categories=('Ll',)),
        min_size=1,
        max_size=15
    ))
    tld = draw(st.sampled_from(['com', 'org', 'net', 'edu']))
    return f"{username}@{domain}.{tld}"

@given(email_strategy())
def test_email_validation(email):
    assert '@' in email
    assert '.' in email.split('@')[1]

Generator Composition

# Generate shopping cart with realistic constraints
@composite
def shopping_cart_strategy(draw):
    num_items = draw(st.integers(min_value=0, max_value=50))
    items = draw(st.lists(
        st.tuples(
            st.text(min_size=1),      # Product name
            st.integers(min_value=1, max_value=10),  # Quantity
            st.floats(min_value=0.01, max_value=1000.00)  # Price
        ),
        min_size=num_items,
        max_size=num_items
    ))
    return {'items': items}

Shrinking: Minimal Failing Cases

When a property fails, shrinking automatically reduces the input to the smallest case that still fails.

Example: Shrinking in Action

@given(st.lists(st.integers()))
def test_no_duplicates(lst):
    # This property is false - lists CAN have duplicates
    assert len(lst) == len(set(lst))

Initial failure: [0, -1, 3, 0, -5, 2] After shrinking: [0, 0]

The framework automatically reduces the failing case from a 6-element list to the minimal 2-element duplicate.

Shrinking Strategies

Framework	Shrinking Approach	Quality
Hypothesis	Integrated reduction algorithms	Excellent
QuickCheck	Type-based shrinking	Excellent
fast-check	Custom shrinking per generator	Very Good
JSVerify	Basic shrinking	Good

Common Property Patterns

Inverse Functions

Functions that undo each other should round-trip perfectly.

@given(st.text())
def test_base64_roundtrip(s):
    import base64
    encoded = base64.b64encode(s.encode('utf-8'))
    decoded = base64.b64decode(encoded).decode('utf-8')
    assert s == decoded

Idempotence

Applying an operation multiple times has the same effect as applying it once.

@given(st.lists(st.integers()))
def test_sort_idempotent(lst):
    # Sorting twice equals sorting once
    assert sorted(sorted(lst)) == sorted(lst)

@given(st.sets(st.integers()))
def test_set_idempotent(s):
    # Converting to set twice equals once
    assert set(set(s)) == set(s)

Invariants

Certain properties remain true across transformations.

@given(st.lists(st.integers()))
def test_filter_preserves_order(lst):
    filtered = [x for x in lst if x > 0]
    # Order of filtered elements matches original
    original_positives = [x for x in lst if x > 0]
    assert filtered == original_positives

@given(st.dictionaries(st.text(), st.integers()))
def test_dict_keys_values_match(d):
    # Keys and values maintain correspondence
    assert len(d.keys()) == len(d.values())
    for key in d.keys():
        assert key in d

Oracle Comparison

Compare implementation against a simpler (but slower) reference.

def quicksort(lst):
    # Fast but complex implementation
    if len(lst) <= 1:
        return lst
    pivot = lst[0]
    left = [x for x in lst[1:] if x < pivot]
    right = [x for x in lst[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

@given(st.lists(st.integers()))
def test_quicksort_matches_builtin(lst):
    # Compare against Python's built-in sort
    assert quicksort(lst) == sorted(lst)

Metamorphic Relations

Relate outputs for different inputs without knowing exact expected output.

@given(st.lists(st.integers()), st.integers())
def test_search_after_insert(lst, value):
    # If we insert a value, searching for it must succeed
    lst_with_value = lst + [value]
    assert value in lst_with_value

Stateful Property Testing

Test stateful systems by generating sequences of operations.

from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
import hypothesis.strategies as st

class BankAccountMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.balance = 0

    @rule(amount=st.integers(min_value=1, max_value=1000))
    def deposit(self, amount):
        self.balance += amount

    @rule(amount=st.integers(min_value=1, max_value=1000))
    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount

    @invariant()
    def balance_never_negative(self):
        assert self.balance >= 0

# Run stateful tests
TestBankAccount = BankAccountMachine.TestCase

Hypothesis Stateful Testing Example

class QueueMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.queue = []

    @rule(value=st.integers())
    def enqueue(self, value):
        self.queue.append(value)

    @rule()
    def dequeue(self):
        if self.queue:
            return self.queue.pop(0)

    @invariant()
    def queue_fifo_order(self):
        # Elements maintain FIFO order
        assert self.queue == self.queue  # Simplified invariant

Assumptions and Preconditions

Use assume() to filter generated inputs to valid scenarios.

from hypothesis import given, assume
import hypothesis.strategies as st

@given(st.integers(), st.integers())
def test_division(a, b):
    assume(b != 0)  # Skip cases where b is zero
    result = a / b
    assert result * b == a  # Within floating-point precision

Warning: Overuse of assume() can make tests inefficient by discarding many generated inputs.

Benefits of Property-Based Testing

Discovers Unexpected Edge Cases

PBT finds bugs developers don’t anticipate.

Case Study: Hypothesis discovered a Unicode handling bug in a production JSON parser that had 95% line coverage from example-based tests.

Serves as Executable Specification

Properties document system behavior more comprehensively than examples.

Reduces Test Maintenance

Properties remain valid as implementation details change.

ROI Example: 50% reduction in test updates during refactoring compared to example-based tests.

Complements Example-Based Tests

Use PBT for complex logic; use examples for specific regression tests and readability.

Challenges and Limitations

Writing Good Properties

Identifying meaningful properties requires practice and deep understanding.

Mitigation: Start with simple properties (round-trip, idempotence); add complex invariants gradually.

Performance

Generating and running hundreds of test cases takes longer than example-based tests.

Mitigation: Configure example counts; use CI for comprehensive runs, fewer examples locally.

from hypothesis import settings

@settings(max_examples=1000)  # Default is 100
@given(st.lists(st.integers()))
def test_with_more_examples(lst):
    assert len(lst) >= 0

Non-Deterministic Systems

Systems with external dependencies or randomness are harder to test with properties.

Mitigation: Use stateful testing with mocked dependencies; test at integration boundaries.

Best Practices

Start with Simple Properties

Begin with universally applicable properties.

# Simple: Type preservation
@given(st.lists(st.integers()))
def test_map_preserves_length(lst):
    assert len(list(map(lambda x: x * 2, lst))) == len(lst)

Combine Multiple Properties

Test several properties in one test for comprehensive coverage.

@given(st.lists(st.integers()))
def test_sorting_properties(lst):
    sorted_lst = sorted(lst)

    # Property 1: Length preserved
    assert len(sorted_lst) == len(lst)

    # Property 2: All elements present
    assert set(sorted_lst) == set(lst)

    # Property 3: Ordered correctly
    for i in range(len(sorted_lst) - 1):
        assert sorted_lst[i] <= sorted_lst[i + 1]

Use Example Decorators for Regressions

Pin specific failing cases as examples while keeping property tests.

from hypothesis import given, example
import hypothesis.strategies as st

@given(st.lists(st.integers()))
@example([])  # Ensure empty list is always tested
@example([1, 2, 3])  # Pin specific regression case
def test_reverse(lst):
    assert reverse(reverse(lst)) == lst

Configure Timeouts Appropriately

from hypothesis import settings
import hypothesis.strategies as st

@settings(deadline=500)  # 500ms per test case
@given(st.lists(st.integers(), max_size=10000))
def test_large_lists(lst):
    process(lst)

Real-World Applications

JSON Parsing Library

JSON library uses PBT to verify round-trip serialization for all data types.

Results: Discovered 3 edge cases in Unicode handling and floating-point precision.

Sorting Algorithm Validation

Comparison-based sort tested against built-in sort as oracle.

Results: 100% confidence in correctness across 10,000 generated inputs per test run.

API Request Validation

REST API validator tested with generated payloads matching schema constraints.

Results: Found 5 edge cases in nested object validation that manual tests missed.

Conclusion

Property-Based Testing shifts testing focus from individual examples to universal truths about system behavior. While requiring a different mindset than traditional example-based testing, PBT provides superior edge case discovery and serves as living documentation of system invariants.

Success with PBT comes from starting simple, identifying meaningful properties incrementally, and combining property-based tests with carefully chosen example-based tests for regression coverage and readability.