Why StateTracker?

StateTracker was developed to support the design of complex DNA sequence libraries (see PoolParty), but it solves a general problem that arises whenever you need to enumerate a combinatorial space.

This page explains the core problem StateTracker addresses and why existing approaches fall short.

The Problem: Random Access to Combinatorial Spaces

Consider a common scenario: you’re designing an experiment with multiple conditions. Say you have 3 treatments and 4 replicates, giving you 12 experimental samples:

Sample

Treatment

Replicate

0

0

0

1

1

0

2

2

0

3

0

1

11

2

3

With nested loops, enumerating this space is trivial:

[1]:
# Nested loops: easy to enumerate
for replicate in range(4):
    for treatment in range(3):
        sample = replicate * 3 + treatment
        print(f"Sample {sample}: treatment={treatment}, replicate={replicate}")
Sample 0: treatment=0, replicate=0
Sample 1: treatment=1, replicate=0
Sample 2: treatment=2, replicate=0
Sample 3: treatment=0, replicate=1
Sample 4: treatment=1, replicate=1
Sample 5: treatment=2, replicate=1
Sample 6: treatment=0, replicate=2
Sample 7: treatment=1, replicate=2
Sample 8: treatment=2, replicate=2
Sample 9: treatment=0, replicate=3
Sample 10: treatment=1, replicate=3
Sample 11: treatment=2, replicate=3

But what if you need to:

  • Random access: Given sample #7, what are its treatment and replicate?

  • Shuffle: Randomize the order of samples while still tracking which treatment/replicate each corresponds to?

  • Sample: Select a random subset of 5 samples?

  • Split: Divide into training (80%) and test (20%) sets?

Nested loops can’t help here. You need a way to go from a single index to the component indices.

The Naive Solution: Manual Index Math

You can compute component indices using divmod:

[2]:
# Manual index math: compute treatment and replicate from sample number
def get_indices(sample, num_treatments=3):
    replicate, treatment = divmod(sample, num_treatments)
    return treatment, replicate


# Random access to sample #7
treatment, replicate = get_indices(7)
print(f"Sample 7: treatment={treatment}, replicate={replicate}")
Sample 7: treatment=1, replicate=2

This works for simple products, but the approach has serious limitations:

1. It doesn’t compose. What if you have a more complex structure?

[3]:
# Complex scenario: 2 control samples + (3 treatments × 4 replicates)
# This is a "stack" of a simple state and a product
# Total: 2 + 12 = 14 samples


def get_complex_indices(sample):
    """Manual index math for: stack(control[2], product(treatment[3], replicate[4]))"""
    if sample < 2:
        return {"type": "control", "control": sample, "treatment": None, "replicate": None}
    else:
        adjusted = sample - 2
        replicate, treatment = divmod(adjusted, 3)
        return {
            "type": "treatment",
            "control": None,
            "treatment": treatment,
            "replicate": replicate,
        }


# This is already getting complicated...
for i in [0, 1, 2, 7, 13]:
    print(f"Sample {i}: {get_complex_indices(i)}")
Sample 0: {'type': 'control', 'control': 0, 'treatment': None, 'replicate': None}
Sample 1: {'type': 'control', 'control': 1, 'treatment': None, 'replicate': None}
Sample 2: {'type': 'treatment', 'control': None, 'treatment': 0, 'replicate': 0}
Sample 7: {'type': 'treatment', 'control': None, 'treatment': 2, 'replicate': 1}
Sample 13: {'type': 'treatment', 'control': None, 'treatment': 2, 'replicate': 3}

2. Every operation requires new math. Want to shuffle? You need to track a permutation and apply it before computing indices. Want to sample? You need to track which original indices were sampled. Want to split? More bookkeeping.

3. It’s error-prone. Off-by-one errors, wrong divisors, forgetting to handle edge cases—manual index math is a minefield.

StateTracker’s Solution: Composable States

StateTracker solves this with a simple but powerful idea: build a state DAG that mirrors your combinatorial structure, then let state propagate automatically.

Here’s the same complex scenario with StateTracker:

[4]:
from statetracker import Manager, State, product, stack

with Manager():
    # Define the structure declaratively
    control = State(num_values=2, name="control")
    treatment = State(num_values=3, name="treatment")
    replicate = State(num_values=4, name="replicate")

    # Compose: stack control with (treatment × replicate)
    treatment_arm = product([treatment, replicate])
    samples = stack([control, treatment_arm])

    # Now iterate—parent states update automatically!
    for state in samples:
        print(
            f"Sample {state}: control={control.value}, treatment={treatment.value}, replicate={replicate.value}"
        )
Sample 0: control=0, treatment=None, replicate=None
Sample 1: control=1, treatment=None, replicate=None
Sample 2: control=None, treatment=0, replicate=0
Sample 3: control=None, treatment=1, replicate=0
Sample 4: control=None, treatment=2, replicate=0
Sample 5: control=None, treatment=0, replicate=1
Sample 6: control=None, treatment=1, replicate=1
Sample 7: control=None, treatment=2, replicate=1
Sample 8: control=None, treatment=0, replicate=2
Sample 9: control=None, treatment=1, replicate=2
Sample 10: control=None, treatment=2, replicate=2
Sample 11: control=None, treatment=0, replicate=3
Sample 12: control=None, treatment=1, replicate=3
Sample 13: control=None, treatment=2, replicate=3

The key insight: set one state, and all parent states propagate automatically. This gives you single-index random access to any point in the combinatorial space:

[5]:
with Manager():
    control = State(num_values=2, name="control")
    treatment = State(num_values=3, name="treatment")
    replicate = State(num_values=4, name="replicate")

    treatment_arm = product([treatment, replicate])
    samples = stack([control, treatment_arm])

    # Random access: what are the indices for sample #7?
    samples.value = 7
    print(
        f"Sample 7: control={control.value}, treatment={treatment.value}, replicate={replicate.value}"
    )
Sample 7: control=None, treatment=2, replicate=1

And because StateTracker handles the index math internally, operations like shuffle, sample, and split become trivial:

[6]:
from statetracker import sample, shuffle, split

with Manager():
    control = State(num_values=2, name="control")
    treatment = State(num_values=3, name="treatment")
    replicate = State(num_values=4, name="replicate")

    treatment_arm = product([treatment, replicate])
    samples = stack([control, treatment_arm], name="samples")

    # Shuffle: randomize sample order
    shuffled = shuffle(samples, seed=42)

    # Split: 80% train, 20% test
    train, test = split(shuffled, [0.8, 0.2])

    print(f"Total samples: {samples.num_values}")
    print(f"Train samples: {train.num_values}")
    print(f"Test samples: {test.num_values}")
    print()

    # Iterate through test set—parent states still propagate correctly!
    # Use include_inactive=False to show only active states (filters out None)
    print("Test set:")
    for state in test:
        print("  ", end="")
        test.print_states(include_inactive=False)
Total samples: 14
Train samples: 11
Test samples: 3

Test set:
  samples=0, control=0
  samples=1, control=1
  samples=10, treatment=2, replicate=2

When to Use StateTracker

StateTracker is useful whenever you need single-index access to a combinatorial space. Common scenarios include:

Experimental Design

  • Randomizing treatment/control order while tracking which condition each sample belongs to

  • Splitting experiments into batches while maintaining structured indices

Combinatorial Libraries

  • Generating DNA sequence variants with structured indices (the original motivation—see PoolParty)

  • Enumerating parameter combinations for hyperparameter search

Machine Learning

  • Creating train/validation/test splits on structured datasets

  • Stratified sampling from combinatorial data

General Enumeration

  • Any domain where you build complex iteration patterns from simpler ones

  • When you need to shuffle, sample, or slice a combinatorial space without reimplementing index math

Summary

If you’ve ever written nested loops and wished you could shuffle the iteration order, or needed random access to a point in a Cartesian product, StateTracker is for you.

The library lets you:

  1. Define your combinatorial structure declaratively

  2. Compose states using algebraic operations (product, stack, slice, etc.)

  3. Set one state and have all parent states propagate automatically

  4. Freely shuffle, sample, split, and slice without reimplementing index math

Continue to the Quick Start to learn the basics, or dive into Core Concepts for a deeper understanding.