{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Why StateTracker?\n", "\n", "StateTracker was developed to support the design of complex DNA sequence libraries (see [PoolParty](https://github.com/jkinney/poolparty)), but it solves a general problem that arises whenever you need to enumerate a combinatorial space.\n", "\n", "This page explains the core problem StateTracker addresses and why existing approaches fall short." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Problem: Random Access to Combinatorial Spaces\n", "\n", "Consider a common scenario: you're designing an experiment with multiple conditions. Say you have 3 treatments and 4 replicates, giving you 12 experimental samples:\n", "\n", "| Sample | Treatment | Replicate |\n", "|--------|-----------|-----------|\n", "| 0 | 0 | 0 |\n", "| 1 | 1 | 0 |\n", "| 2 | 2 | 0 |\n", "| 3 | 0 | 1 |\n", "| ... | ... | ... |\n", "| 11 | 2 | 3 |\n", "\n", "With nested loops, enumerating this space is trivial:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample 0: treatment=0, replicate=0\n", "Sample 1: treatment=1, replicate=0\n", "Sample 2: treatment=2, replicate=0\n", "Sample 3: treatment=0, replicate=1\n", "Sample 4: treatment=1, replicate=1\n", "Sample 5: treatment=2, replicate=1\n", "Sample 6: treatment=0, replicate=2\n", "Sample 7: treatment=1, replicate=2\n", "Sample 8: treatment=2, replicate=2\n", "Sample 9: treatment=0, replicate=3\n", "Sample 10: treatment=1, replicate=3\n", "Sample 11: treatment=2, replicate=3\n" ] } ], "source": [ "# Nested loops: easy to enumerate\n", "for replicate in range(4):\n", " for treatment in range(3):\n", " sample = replicate * 3 + treatment\n", " print(f\"Sample {sample}: treatment={treatment}, replicate={replicate}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But what if you need to:\n", "\n", "- **Random access**: Given sample 
#7, what are its treatment and replicate?\n", "- **Shuffle**: Randomize the order of samples while still tracking which treatment/replicate each corresponds to?\n", "- **Sample**: Select a random subset of 5 samples?\n", "- **Split**: Divide into training (80%) and test (20%) sets?\n", "\n", "Nested loops can't help here. You need a way to go from a **single index** to the **component indices**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Naive Solution: Manual Index Math\n", "\n", "You can compute component indices using `divmod`:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample 7: treatment=1, replicate=2\n" ] } ], "source": [ "# Manual index math: compute treatment and replicate from sample number\n", "def get_indices(sample, num_treatments=3):\n", " replicate, treatment = divmod(sample, num_treatments)\n", " return treatment, replicate\n", "\n", "\n", "# Random access to sample #7\n", "treatment, replicate = get_indices(7)\n", "print(f\"Sample 7: treatment={treatment}, replicate={replicate}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works for simple products, but the approach has serious limitations:\n", "\n", "**1. It doesn't compose.** What if you have a more complex structure?" 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample 0: {'type': 'control', 'control': 0, 'treatment': None, 'replicate': None}\n", "Sample 1: {'type': 'control', 'control': 1, 'treatment': None, 'replicate': None}\n", "Sample 2: {'type': 'treatment', 'control': None, 'treatment': 0, 'replicate': 0}\n", "Sample 7: {'type': 'treatment', 'control': None, 'treatment': 2, 'replicate': 1}\n", "Sample 13: {'type': 'treatment', 'control': None, 'treatment': 2, 'replicate': 3}\n" ] } ], "source": [ "# Complex scenario: 2 control samples + (3 treatments × 4 replicates)\n", "# This is a \"stack\" of a simple state and a product\n", "# Total: 2 + 12 = 14 samples\n", "\n", "\n", "def get_complex_indices(sample):\n", " \"\"\"Manual index math for: stack(control[2], product(treatment[3], replicate[4]))\"\"\"\n", " if sample < 2:\n", " return {\"type\": \"control\", \"control\": sample, \"treatment\": None, \"replicate\": None}\n", " else:\n", " adjusted = sample - 2\n", " replicate, treatment = divmod(adjusted, 3)\n", " return {\n", " \"type\": \"treatment\",\n", " \"control\": None,\n", " \"treatment\": treatment,\n", " \"replicate\": replicate,\n", " }\n", "\n", "\n", "# This is already getting complicated...\n", "for i in [0, 1, 2, 7, 13]:\n", " print(f\"Sample {i}: {get_complex_indices(i)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2. Every operation requires new math.** Want to shuffle? You need to track a permutation and apply it before computing indices. Want to sample? You need to track which original indices were sampled. Want to split? More bookkeeping.\n", "\n", "**3. It's error-prone.** Off-by-one errors, wrong divisors, forgetting to handle edge cases—manual index math is a minefield." 
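, "\n",
"\n",
"To make point 2 concrete, here is a minimal sketch of the bookkeeping that a hand-rolled shuffle needs for the 2 controls + (3 treatments × 4 replicates) design above. It uses only the standard library; `decode` and `shuffled_order` are illustrative names, not part of StateTracker.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"NUM_CONTROLS, NUM_TREATMENTS, NUM_REPLICATES = 2, 3, 4\n",
"TOTAL = NUM_CONTROLS + NUM_TREATMENTS * NUM_REPLICATES  # 14 samples\n",
"\n",
"\n",
"def decode(sample):\n",
"    # Map a flat sample index to (control, treatment, replicate).\n",
"    if sample < NUM_CONTROLS:\n",
"        return sample, None, None\n",
"    replicate, treatment = divmod(sample - NUM_CONTROLS, NUM_TREATMENTS)\n",
"    return None, treatment, replicate\n",
"\n",
"\n",
"# Shuffling means carrying a permutation alongside the index math.\n",
"rng = random.Random(42)\n",
"shuffled_order = list(range(TOTAL))\n",
"rng.shuffle(shuffled_order)\n",
"\n",
"# Position i in the shuffled sequence refers to original sample shuffled_order[i].\n",
"for position in range(3):\n",
"    original = shuffled_order[position]\n",
"    print(position, original, decode(original))\n",
"```\n",
"\n",
"Each further operation, such as a split or a subsample, would add another list like `shuffled_order` to carry around by hand."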
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## StateTracker's Solution: Composable States\n", "\n", "StateTracker solves this with a simple but powerful idea: **build a state DAG that mirrors your combinatorial structure, then let state propagate automatically**.\n", "\n", "Here's the same complex scenario with StateTracker:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample 0: control=0, treatment=None, replicate=None\n", "Sample 1: control=1, treatment=None, replicate=None\n", "Sample 2: control=None, treatment=0, replicate=0\n", "Sample 3: control=None, treatment=1, replicate=0\n", "Sample 4: control=None, treatment=2, replicate=0\n", "Sample 5: control=None, treatment=0, replicate=1\n", "Sample 6: control=None, treatment=1, replicate=1\n", "Sample 7: control=None, treatment=2, replicate=1\n", "Sample 8: control=None, treatment=0, replicate=2\n", "Sample 9: control=None, treatment=1, replicate=2\n", "Sample 10: control=None, treatment=2, replicate=2\n", "Sample 11: control=None, treatment=0, replicate=3\n", "Sample 12: control=None, treatment=1, replicate=3\n", "Sample 13: control=None, treatment=2, replicate=3\n" ] } ], "source": [ "from statetracker import Manager, State, product, stack\n", "\n", "with Manager():\n", " # Define the structure declaratively\n", " control = State(num_values=2, name=\"control\")\n", " treatment = State(num_values=3, name=\"treatment\")\n", " replicate = State(num_values=4, name=\"replicate\")\n", "\n", " # Compose: stack control with (treatment × replicate)\n", " treatment_arm = product([treatment, replicate])\n", " samples = stack([control, treatment_arm])\n", "\n", " # Now iterate—parent states update automatically!\n", " for state in samples:\n", " print(\n", " f\"Sample {state}: control={control.value}, treatment={treatment.value}, replicate={replicate.value}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "The key insight: **set one state, and all parent states propagate automatically**. This gives you single-index random access to any point in the combinatorial space:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample 7: control=None, treatment=2, replicate=1\n" ] } ], "source": [ "with Manager():\n", " control = State(num_values=2, name=\"control\")\n", " treatment = State(num_values=3, name=\"treatment\")\n", " replicate = State(num_values=4, name=\"replicate\")\n", "\n", " treatment_arm = product([treatment, replicate])\n", " samples = stack([control, treatment_arm])\n", "\n", " # Random access: what are the indices for sample #7?\n", " samples.value = 7\n", " print(\n", " f\"Sample 7: control={control.value}, treatment={treatment.value}, replicate={replicate.value}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And because StateTracker handles the index math internally, operations like shuffle, sample, and split become trivial:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total samples: 14\n", "Train samples: 11\n", "Test samples: 3\n", "\n", "Test set:\n", " samples=0, control=0\n", " samples=1, control=1\n", " samples=10, treatment=2, replicate=2\n" ] } ], "source": [ "from statetracker import sample, shuffle, split\n", "\n", "with Manager():\n", " control = State(num_values=2, name=\"control\")\n", " treatment = State(num_values=3, name=\"treatment\")\n", " replicate = State(num_values=4, name=\"replicate\")\n", "\n", " treatment_arm = product([treatment, replicate])\n", " samples = stack([control, treatment_arm], name=\"samples\")\n", "\n", " # Shuffle: randomize sample order\n", " shuffled = shuffle(samples, seed=42)\n", "\n", " # Split: 80% train, 20% test\n", " train, test = split(shuffled, [0.8, 0.2])\n", "\n", " print(f\"Total samples: 
{samples.num_values}\")\n", " print(f\"Train samples: {train.num_values}\")\n", " print(f\"Test samples: {test.num_values}\")\n", " print()\n", "\n", " # Iterate through test set—parent states still propagate correctly!\n", " # Use include_inactive=False to show only active states (filters out None)\n", " print(\"Test set:\")\n", " for state in test:\n", " print(\" \", end=\"\")\n", " test.print_states(include_inactive=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## When to Use StateTracker\n", "\n", "StateTracker is useful whenever you need **single-index access to a combinatorial space**. Common scenarios include:\n", "\n", "**Experimental Design**\n", "- Randomizing treatment/control order while tracking which condition each sample belongs to\n", "- Splitting experiments into batches while maintaining structured indices\n", "\n", "**Combinatorial Libraries**\n", "- Generating DNA sequence variants with structured indices (the original motivation—see [PoolParty](https://github.com/jkinney/poolparty))\n", "- Enumerating parameter combinations for hyperparameter search\n", "\n", "**Machine Learning**\n", "- Creating train/validation/test splits on structured datasets\n", "- Stratified sampling from combinatorial data\n", "\n", "**General Enumeration**\n", "- Any domain where you build complex iteration patterns from simpler ones\n", "- When you need to shuffle, sample, or slice a combinatorial space without reimplementing index math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "**If you've ever written nested loops and wished you could shuffle the iteration order, or needed random access to a point in a Cartesian product, StateTracker is for you.**\n", "\n", "The library lets you:\n", "1. Define your combinatorial structure declaratively\n", "2. Compose states using algebraic operations (product, stack, slice, etc.)\n", "3. Set one state and have all parent states propagate automatically\n", "4. 
Freely shuffle, sample, split, and slice without reimplementing index math\n", "\n", "Continue to the [Quick Start](quickstart.ipynb) to learn the basics, or dive into [Core Concepts](concepts.ipynb) for a deeper understanding." ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 2 }