Why StateTracker?
StateTracker was developed to support the design of complex DNA sequence libraries (see PoolParty), but it solves a general problem that arises whenever you need to enumerate a combinatorial space.
This page explains the core problem StateTracker addresses and why existing approaches fall short.
The Problem: Random Access to Combinatorial Spaces
Consider a common scenario: you’re designing an experiment with multiple conditions. Say you have 3 treatments and 4 replicates, giving you 12 experimental samples:
| Sample | Treatment | Replicate |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 0 | 1 |
| … | … | … |
| 11 | 2 | 3 |
With nested loops, enumerating this space is trivial:
# Nested loops: easy to enumerate
for replicate in range(4):
    for treatment in range(3):
        sample = replicate * 3 + treatment
        print(f"Sample {sample}: treatment={treatment}, replicate={replicate}")
Sample 0: treatment=0, replicate=0
Sample 1: treatment=1, replicate=0
Sample 2: treatment=2, replicate=0
Sample 3: treatment=0, replicate=1
Sample 4: treatment=1, replicate=1
Sample 5: treatment=2, replicate=1
Sample 6: treatment=0, replicate=2
Sample 7: treatment=1, replicate=2
Sample 8: treatment=2, replicate=2
Sample 9: treatment=0, replicate=3
Sample 10: treatment=1, replicate=3
Sample 11: treatment=2, replicate=3
But what if you need to:
- Random access: Given sample #7, what are its treatment and replicate?
- Shuffle: Randomize the order of samples while still tracking which treatment/replicate each corresponds to?
- Sample: Select a random subset of 5 samples?
- Split: Divide into training (80%) and test (20%) sets?
Nested loops can’t help here. You need a way to go from a single index to the component indices.
The Naive Solution: Manual Index Math
You can compute component indices using divmod:
# Manual index math: compute treatment and replicate from sample number
def get_indices(sample, num_treatments=3):
    replicate, treatment = divmod(sample, num_treatments)
    return treatment, replicate
# Random access to sample #7
treatment, replicate = get_indices(7)
print(f"Sample 7: treatment={treatment}, replicate={replicate}")
Sample 7: treatment=1, replicate=2
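The same divmod step can be repeated for products of more than two factors. The sketch below is plain Python, not part of StateTracker; the helper name `decode_index` and its `sizes` parameter are made up for illustration, and it assumes the first factor varies fastest, as in the table above.

```python
import math


def decode_index(index, sizes):
    """Split a flat index into one component per factor.

    Hypothetical helper: the first entry of `sizes` varies fastest.
    """
    total = math.prod(sizes)
    if not 0 <= index < total:
        raise IndexError(f"index {index} out of range for {total} combinations")
    components = []
    for size in sizes:
        # Peel off one factor at a time: the remainder is this factor's index,
        # the quotient is what is left for the remaining factors.
        index, component = divmod(index, size)
        components.append(component)
    return tuple(components)


# 3 treatments x 4 replicates, same layout as the nested loops above
treatment, replicate = decode_index(7, sizes=[3, 4])
print(f"Sample 7: treatment={treatment}, replicate={replicate}")
```

For the 3 x 4 case this agrees with `get_indices`; the order of `sizes` has to match the loop nesting exactly, which is one more thing to keep straight by hand.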
This works for simple products, but the approach has serious limitations:
1. It doesn’t compose. What if you have a more complex structure?
# Complex scenario: 2 control samples + (3 treatments x 4 replicates)
# This is a "stack" of a simple state and a product
# Total: 2 + 12 = 14 samples
def get_complex_indices(sample):
    """Manual index math for: stack(control[2], product(treatment[3], replicate[4]))"""
    if sample < 2:
        return {"type": "control", "control": sample, "treatment": None, "replicate": None}
    else:
        adjusted = sample - 2
        replicate, treatment = divmod(adjusted, 3)
        return {
            "type": "treatment",
            "control": None,
            "treatment": treatment,
            "replicate": replicate,
        }
# This is already getting complicated...
for i in [0, 1, 2, 7, 13]:
    print(f"Sample {i}: {get_complex_indices(i)}")
Sample 0: {'type': 'control', 'control': 0, 'treatment': None, 'replicate': None}
Sample 1: {'type': 'control', 'control': 1, 'treatment': None, 'replicate': None}
Sample 2: {'type': 'treatment', 'control': None, 'treatment': 0, 'replicate': 0}
Sample 7: {'type': 'treatment', 'control': None, 'treatment': 2, 'replicate': 1}
Sample 13: {'type': 'treatment', 'control': None, 'treatment': 2, 'replicate': 3}
2. Every operation requires new math. Want to shuffle? You need to track a permutation and apply it before computing indices. Want to sample? You need to track which original indices were sampled. Want to split? More bookkeeping.
3. It’s error-prone. Off-by-one errors, wrong divisors, forgetting to handle edge cases — manual index math is a minefield.
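To make the second point concrete, here is roughly the bookkeeping a manual shuffle plus an 80/20 split requires for the simple 3 x 4 product. This is a sketch using only the standard library; the variable names (`order`, `cut`) and the seed are arbitrary choices, not anything prescribed by StateTracker.

```python
import random

NUM_TREATMENTS, NUM_REPLICATES = 3, 4
total = NUM_TREATMENTS * NUM_REPLICATES


def get_indices(sample, num_treatments=NUM_TREATMENTS):
    replicate, treatment = divmod(sample, num_treatments)
    return treatment, replicate


# Shuffle: keep an explicit permutation of the original sample numbers.
rng = random.Random(42)
order = list(range(total))
rng.shuffle(order)

# Split: slice the permutation. Positions in `train` and `test` are no longer
# sample numbers, so every lookup has to go through `order` first.
cut = int(0.8 * total)
train, test = order[:cut], order[cut:]

for position, sample in enumerate(test):
    treatment, replicate = get_indices(sample)
    print(f"test[{position}] -> sample {sample}: treatment={treatment}, replicate={replicate}")
```

Each extra operation adds another layer of indirection between the position you iterate over and the sample it refers to, and the layers have to be applied in the right order.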
StateTracker’s Solution: Composable States
StateTracker solves this with a simple but powerful idea: build a directed acyclic graph (DAG) of states that mirrors your combinatorial structure, then let values propagate through it automatically.
Here’s the same complex scenario with StateTracker:
from statetracker import Manager, State, product, stack
with Manager():
    # Define the structure declaratively
    control = State(num_values=2, name="control")
    treatment = State(num_values=3, name="treatment")
    replicate = State(num_values=4, name="replicate")
    # Compose: stack control with (treatment x replicate)
    treatment_arm = product([treatment, replicate])
    samples = stack([control, treatment_arm])
    # Now iterate -- parent states update automatically!
    for value in samples:
        print(
            f"Sample {value}: control={control.value}, "
            f"treatment={treatment.value}, replicate={replicate.value}"
        )
Sample 0: control=0, treatment=None, replicate=None
Sample 1: control=1, treatment=None, replicate=None
Sample 2: control=None, treatment=0, replicate=0
Sample 3: control=None, treatment=1, replicate=0
Sample 4: control=None, treatment=2, replicate=0
Sample 5: control=None, treatment=0, replicate=1
Sample 6: control=None, treatment=1, replicate=1
Sample 7: control=None, treatment=2, replicate=1
Sample 8: control=None, treatment=0, replicate=2
Sample 9: control=None, treatment=1, replicate=2
Sample 10: control=None, treatment=2, replicate=2
Sample 11: control=None, treatment=0, replicate=3
Sample 12: control=None, treatment=1, replicate=3
Sample 13: control=None, treatment=2, replicate=3
The key insight: set the value of one composed state, and every state it was built from (its parents) updates automatically. This gives you single-index random access to any point in the combinatorial space:
with Manager():
    control = State(num_values=2, name="control")
    treatment = State(num_values=3, name="treatment")
    replicate = State(num_values=4, name="replicate")
    treatment_arm = product([treatment, replicate])
    samples = stack([control, treatment_arm])
    # Random access: what are the indices for sample #7?
    samples.value = 7
    print(
        f"Sample 7: control={control.value}, "
        f"treatment={treatment.value}, replicate={replicate.value}"
    )
Sample 7: control=None, treatment=2, replicate=1
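For intuition about what such propagation involves, the toy sketch below decodes a flat index by walking a nested description of the same structure. It is not StateTracker's implementation or API: the tuple-based node format and the `size` and `decode` helpers are invented here purely for illustration.

```python
import math

# Hypothetical node format:
#   ("leaf", name, num_values) | ("product", [children]) | ("stack", [children])
samples = (
    "stack",
    [
        ("leaf", "control", 2),
        ("product", [("leaf", "treatment", 3), ("leaf", "replicate", 4)]),
    ],
)


def size(node):
    """Number of values a node can take."""
    if node[0] == "leaf":
        return node[2]
    child_sizes = [size(child) for child in node[1]]
    return math.prod(child_sizes) if node[0] == "product" else sum(child_sizes)


def decode(node, index):
    """Return {leaf_name: value} for the leaves that are active at `index`."""
    kind = node[0]
    if kind == "leaf":
        return {node[1]: index}
    if kind == "product":
        # First child varies fastest, matching the output above.
        result = {}
        for child in node[1]:
            index, child_index = divmod(index, size(child))
            result.update(decode(child, child_index))
        return result
    if kind == "stack":
        # Children are laid end to end; find the one that contains `index`.
        for child in node[1]:
            if index < size(child):
                return decode(child, index)
            index -= size(child)
        raise IndexError("index out of range")
    raise ValueError(f"unknown node kind: {kind!r}")


active = decode(samples, 7)
print(
    f"Sample 7: control={active.get('control')}, "
    f"treatment={active.get('treatment')}, replicate={active.get('replicate')}"
)
```

Leaves that are inactive at a given index are simply absent from the returned dictionary, which plays the role of the `None` values in the output above.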
And because StateTracker handles the index math internally, operations like shuffle, sample, and split become trivial:
from statetracker import sample, shuffle, split
with Manager():
    control = State(num_values=2, name="control")
    treatment = State(num_values=3, name="treatment")
    replicate = State(num_values=4, name="replicate")
    treatment_arm = product([treatment, replicate])
    samples = stack([control, treatment_arm], name="samples")
    # Shuffle: randomize sample order
    shuffled = shuffle(samples, seed=42)
    # Split: 80% train, 20% test
    train, test = split(shuffled, [0.8, 0.2])
    print(f"Total samples: {samples.num_values}")
    print(f"Train samples: {train.num_values}")
    print(f"Test samples: {test.num_values}")
    print()
    # Iterate through test set -- parent states still propagate correctly!
    print("Test set:")
    for value in test:
        print(" ", end="")
        test.print_states(include_inactive=False)
Total samples: 14
Train samples: 11
Test samples: 3
Test set:
samples=0, control=0
samples=1, control=1
samples=10, treatment=2, replicate=2
When to Use StateTracker
StateTracker is useful whenever you need single-index access to a combinatorial space. Common scenarios include:
- Experimental Design: Randomizing treatment/control order while tracking which condition each sample belongs to. Splitting experiments into batches while maintaining structured indices.
- Combinatorial Libraries: Generating DNA sequence variants with structured indices (the original motivation; see PoolParty). Enumerating parameter combinations for hyperparameter search.
- Machine Learning: Creating train/validation/test splits on structured datasets. Stratified sampling from combinatorial data.
- General Enumeration: Any domain where you build complex iteration patterns from simpler ones. When you need to shuffle, sample, or slice a combinatorial space without reimplementing index math.
Summary
If you’ve ever written nested loops and wished you could shuffle the iteration order, or needed random access to a point in a Cartesian product, StateTracker is for you.
The library lets you:
- Define your combinatorial structure declaratively
- Compose states using algebraic operations (product, stack, slice, etc.)
- Set one value and have all parent states propagate automatically
- Freely shuffle, sample, split, and slice without reimplementing index math
Continue to the Quick Start Guide to learn the basics, or dive into Core Concepts for a deeper understanding.