{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Alternative Approaches\n", "\n", "This page compares StateTracker to other Python methods for working with combinatorial structures. Understanding these alternatives will help you appreciate what makes StateTracker unique.\n", "\n", "**The key insight**: StateTracker's power comes from building a *computation DAG* (directed acyclic graph) of states, where setting the state of a derived state automatically propagates to all parent states. No other approach provides this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Problem: Complex Combinatorial Structures\n", "\n", "Real-world combinatorial problems often involve more than simple Cartesian products. Consider designing an experiment with:\n", "- 2 control samples\n", "- Plus a treatment arm with 3 treatments x 4 replicates\n", "\n", "This is a **stack** (disjoint union) of a simple state and a **product**. The total is 2 + 12 = 14 samples, but they have different structure depending on which \"arm\" is active.\n", "\n", "When you iterate through these 14 samples, you need to know:\n", "1. Which arm is active (control or treatment)?\n", "2. If treatment, what are the treatment and replicate indices?\n", "3. If you shuffle, slice, or sample, how do you track all this?\n", "\n", "Let's see how hard this is without StateTracker:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Manual Approach" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:23.552280Z", "iopub.status.busy": "2026-01-17T23:04:23.552046Z", "iopub.status.idle": "2026-01-17T23:04:23.565502Z", "shell.execute_reply": "2026-01-17T23:04:23.564743Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Manual enumeration of 14 samples:\n", " Sample 0: {'control': 0, 'treatment': None, 'replicate': None}\n", " Sample 1: {'control': 1, 'treatment': None, 'replicate': None}\n", " Sample 2: {'control': None, 'treatment': 0, 'replicate': 0}\n", " Sample 3: {'control': None, 'treatment': 1, 'replicate': 0}\n", " Sample 4: {'control': None, 'treatment': 2, 'replicate': 0}\n", " Sample 5: {'control': None, 'treatment': 0, 'replicate': 1}\n", " Sample 6: {'control': None, 'treatment': 1, 'replicate': 1}\n", " Sample 7: {'control': None, 'treatment': 2, 'replicate': 1}\n", " Sample 8: {'control': None, 'treatment': 0, 'replicate': 2}\n", " Sample 9: {'control': None, 'treatment': 1, 'replicate': 2}\n", " Sample 10: {'control': None, 'treatment': 2, 'replicate': 2}\n", " Sample 11: {'control': None, 'treatment': 0, 'replicate': 3}\n", " Sample 12: {'control': None, 'treatment': 1, 'replicate': 3}\n", " Sample 13: {'control': None, 'treatment': 2, 'replicate': 3}\n" ] } ], "source": [ "# Manual approach: 2 controls + (3 treatments x 4 replicates)\n", "def get_indices_manual(sample_idx):\n", " \"\"\"Manually compute which arm is active and the component indices.\"\"\"\n", " num_controls = 2\n", " num_treatments = 3\n", " num_replicates = 4\n", "\n", " if sample_idx < num_controls:\n", " # Control arm\n", " return {\"control\": sample_idx, \"treatment\": None, \"replicate\": None}\n", " else:\n", " # Treatment arm: need to decompose into treatment x replicate\n", " adjusted = sample_idx - num_controls\n", " treatment = adjusted % num_treatments\n", " replicate = adjusted // num_treatments\n", " return {\"control\": None, \"treatment\": treatment, \"replicate\": replicate}\n", "\n", "\n", "print(\"Manual enumeration of 14 samples:\")\n", "for i in range(14):\n", " print(f\" Sample {i}: {get_indices_manual(i)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works, but now imagine you want to **shuffle** these samples. You need to:\n", "1. Generate a random permutation\n", "2. Apply it before looking up indices\n", "3. Rewrite all your index math to account for the permutation\n", "\n", "And what if you want to **slice** (take samples 5-10)? Or **split** into train/test? Each operation requires rewriting the index logic. This quickly becomes unmanageable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## StateTracker's Solution: The State DAG\n", "\n", "StateTracker solves this by building a **directed acyclic graph (DAG)** of states. You declare the structure once, and StateTracker handles all the index math automatically, including for shuffles, slices, samples, and splits." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:23.612468Z", "iopub.status.busy": "2026-01-17T23:04:23.612300Z", "iopub.status.idle": "2026-01-17T23:04:24.450289Z", "shell.execute_reply": "2026-01-17T23:04:24.449747Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "StateCounter enumeration of 14 samples:\n", " Sample 0: control=0, treatment=None, replicate=None\n", " Sample 1: control=1, treatment=None, replicate=None\n", " Sample 2: control=None, treatment=0, replicate=0\n", " Sample 3: control=None, treatment=1, replicate=0\n", " Sample 4: control=None, treatment=2, replicate=0\n", " Sample 5: control=None, treatment=0, replicate=1\n", " Sample 6: control=None, treatment=1, replicate=1\n", " Sample 7: control=None, treatment=2, replicate=1\n", " Sample 8: control=None, treatment=0, replicate=2\n", " Sample 9: control=None, treatment=1, replicate=2\n", " Sample 10: control=None, treatment=2, replicate=2\n", " Sample 11: control=None, treatment=0, replicate=3\n", " Sample 12: control=None, treatment=1, replicate=3\n", " Sample 13: control=None, treatment=2, replicate=3\n" ] } ], "source": [ "from statetracker import Manager, State, product, stack\n", "\n", "with Manager():\n", " # Declare the structure\n", " control = State(num_values=2, name=\"control\")\n", " treatment = State(num_values=3, name=\"treatment\")\n", " replicate = State(num_values=4, name=\"replicate\")\n", "\n", " # Build: controls + (treatments x replicates)\n", " treatment_arm = product([treatment, replicate])\n", " samples = stack([control, treatment_arm])\n", "\n", " print(\"StateTracker enumeration of 14 samples:\")\n", " for state in samples:\n", " print(\n", " f\" Sample {state}: control={control.value}, treatment={treatment.value}, replicate={replicate.value}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The magic is **automatic state propagation**: when you set the state of the derived state, all parent states (control, treatment, replicate) update automatically. The inactive states show `None`.\n", "\n", "Now adding shuffle, slice, or split is trivial:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:24.451907Z", "iopub.status.busy": "2026-01-17T23:04:24.451760Z", "iopub.status.idle": "2026-01-17T23:04:24.455115Z", "shell.execute_reply": "2026-01-17T23:04:24.454604Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total: 14, Train: 11, Test: 3\n", "\n", "Test set (states still propagate correctly):\n", " control=0, treatment=None, replicate=None\n", " control=1, treatment=None, replicate=None\n", " control=None, treatment=2, replicate=2\n" ] } ], "source": [ "from statetracker import Manager, State, product, shuffle, split, stack\n", "\n", "with Manager():\n", " control = State(num_values=2, name=\"control\")\n", " treatment = State(num_values=3, name=\"treatment\")\n", " replicate = State(num_values=4, name=\"replicate\")\n", "\n", " treatment_arm = product([treatment, replicate])\n", " samples = stack([control, treatment_arm])\n", "\n", " # Shuffle - no manual permutation tracking needed!\n", " shuffled = shuffle(samples, seed=42)\n", "\n", " # Split into train/test - no manual index math!\n", " train, test = split(shuffled, [0.8, 0.2])\n", "\n", " print(f\"Total: {samples.num_values}, Train: {train.num_values}, Test: {test.num_values}\")\n", " print(\"\\nTest set (states still propagate correctly):\")\n", " for state in test:\n", " print(\n", " f\" control={control.value}, treatment={treatment.value}, replicate={replicate.value}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can visualize the state DAG to understand the structure:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:24.456644Z", "iopub.status.busy": "2026-01-17T23:04:24.456519Z", "iopub.status.idle": "2026-01-17T23:04:24.470336Z", "shell.execute_reply": "2026-01-17T23:04:24.469791Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test (counter, io=0, n=3)\n", "└── [op=Slice]\n", " └── shuffled (counter, io=0, n=14)\n", " └── [op=Shuffle]\n", " └── samples (counter, io=0, n=14)\n", " └── [op=Stack]\n", " ├── control (counter, io=0, n=2)\n", " └── treatment_arm (counter, io=0, n=12)\n", " └── [op=Product]\n", " ├── treatment (counter, io=0, n=3)\n", " └── replicate (counter, io=0, n=4)\n" ] } ], "source": [ "from statetracker import Manager, State, product, shuffle, split, stack\n", "\n", "with Manager():\n", " control = State(num_values=2, name=\"control\")\n", " treatment = State(num_values=3, name=\"treatment\")\n", " replicate = State(num_values=4, name=\"replicate\")\n", "\n", " treatment_arm = product([treatment, replicate], name=\"treatment_arm\")\n", " samples = stack([control, treatment_arm], name=\"samples\")\n", " shuffled = shuffle(samples, seed=42, name=\"shuffled\")\n", " train, test = split(shuffled, [0.8, 0.2], names=[\"train\", \"test\"])\n", "\n", " test.print_dag()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternative Approaches in Python\n", "\n", "Let's examine what other Python tools offer and why they fall short for problems involving both products and stacks with additional operations.\n", "\n", "### itertools\n", "\n", "Python's `itertools` module provides `product` for Cartesian products and `chain` for concatenation:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:24.471602Z", "iopub.status.busy": "2026-01-17T23:04:24.471512Z", "iopub.status.idle": "2026-01-17T23:04:24.474502Z", "shell.execute_reply": "2026-01-17T23:04:24.474037Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "itertools.product (2 x 3):\n", " 0: (0, 0)\n", " 1: (0, 1)\n", " 2: (0, 2)\n", " 3: (1, 0)\n", " 4: (1, 1)\n", " 5: (1, 2)\n", "\n", "itertools.chain (2 controls + 3 treatments):\n", " 0: ('control', 0)\n", " 1: ('control', 1)\n", " 2: ('treatment', 0)\n", " 3: ('treatment', 1)\n", " 4: ('treatment', 2)\n" ] } ], "source": [ "from itertools import chain, product\n", "\n", "# itertools.product for Cartesian products\n", "print(\"itertools.product (2 x 3):\")\n", "for i, (t, r) in enumerate(product(range(2), range(3))):\n", " print(f\" {i}: ({t}, {r})\")\n", "\n", "# itertools.chain for concatenation\n", "print(\"\\nitertools.chain (2 controls + 3 treatments):\")\n", "controls = [(\"control\", i) for i in range(2)]\n", "treatments = [(\"treatment\", i) for i in range(3)]\n", "for i, item in enumerate(chain(controls, treatments)):\n", " print(f\" {i}: {item}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Limitations of itertools:**\n", "\n", "1. **No state propagation**: You get tuples, not connected objects. There is no way to ask \"given index 4, what are the component values?\"\n", "\n", "2. **No composition**: You cannot easily combine `product` and `chain` into a single enumerable structure with unified indexing.\n", "\n", "3. **No tracking for chain**: With `chain`, you lose information about which source contributed each element. You have to encode it manually (as we did with the tuples).\n", "\n", "4. **Operations do not compose**: To shuffle an `itertools.product`, you must materialize it into a list first. Then you lose the ability to know which original indices correspond to each shuffled position." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NumPy's unravel_index / ravel_multi_index\n", "\n", "NumPy provides functions to convert between flat indices and multi-dimensional indices for arrays:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:24.475944Z", "iopub.status.busy": "2026-01-17T23:04:24.475841Z", "iopub.status.idle": "2026-01-17T23:04:24.478466Z", "shell.execute_reply": "2026-01-17T23:04:24.477960Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Flat index 4 -> treatment=1, replicate=1\n", "treatment=1, replicate=1 -> flat index 4\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Shape: 2 treatments x 3 replicates\n", "shape = (2, 3)\n", "\n", "# Convert flat index 4 to multi-dimensional indices\n", "# Note: numpy uses row-major (C) order by default\n", "indices = np.unravel_index(4, shape)\n", "print(f\"Flat index 4 -> treatment={indices[0]}, replicate={indices[1]}\")\n", "\n", "# Convert back to flat index\n", "flat = np.ravel_multi_index(indices, shape)\n", "print(f\"treatment={indices[0]}, replicate={indices[1]} -> flat index {flat}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Limitations of NumPy's index functions:**\n", "\n", "1. **Only rectangular products**: NumPy assumes a regular multi-dimensional array shape. It cannot represent structures like \"2 controls + (3 x 4 treatments)\" which are not rectangular.\n", "\n", "2. **No stacks (disjoint unions)**: There is no NumPy equivalent for StateTracker's `stack` operation.\n", "\n", "3. **No state propagation**: You get index tuples, not connected state objects that track state.\n", "\n", "4. **No composable operations**: Slicing, shuffling, or sampling requires manual reimplementation of all the index math." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manual Index Math (divmod)\n", "\n", "You can always compute indices manually using `divmod`, as shown in the [Motivation](motivation.ipynb) page:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2026-01-17T23:04:24.479980Z", "iopub.status.busy": "2026-01-17T23:04:24.479867Z", "iopub.status.idle": "2026-01-17T23:04:24.483101Z", "shell.execute_reply": "2026-01-17T23:04:24.482382Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Manual product enumeration:\n", " 0: treatment=0, replicate=0\n", " 1: treatment=1, replicate=0\n", " 2: treatment=0, replicate=1\n", " 3: treatment=1, replicate=1\n", " 4: treatment=0, replicate=2\n", " 5: treatment=1, replicate=2\n" ] } ], "source": [ "def get_product_indices(flat_idx, sizes):\n", " \"\"\"Convert flat index to component indices for a Cartesian product.\"\"\"\n", " indices = []\n", " for size in sizes:\n", " flat_idx, idx = divmod(flat_idx, size)\n", " indices.append(idx)\n", " return tuple(indices)\n", "\n", "\n", "# Example: 2 treatments x 3 replicates\n", "sizes = [2, 3]\n", "print(\"Manual product enumeration:\")\n", "for flat in range(6):\n", " t, r = get_product_indices(flat, sizes)\n", " print(f\" {flat}: treatment={t}, replicate={r}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Limitations of manual index math:**\n", "\n", "1. **Does not compose**: Every new operation (stack, slice, shuffle, sample) requires writing new index logic from scratch.\n", "\n", "2. **Error-prone**: Off-by-one errors, wrong divisors, forgetting edge cases. Manual index math is a minefield.\n", "\n", "3. **No state tracking**: You get tuples, not connected objects. You cannot ask \"what is the current treatment value?\" without recomputing.\n", "\n", "4. **Tedious for complex structures**: As shown in the first example, combining products and stacks manually is already complicated. Adding shuffle or slice makes it worse." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What Makes StateTracker Different\n", "\n", "The key difference is that StateTracker builds a **computation DAG** where:\n", "\n", "1. **Leaf states** represent basic dimensions (control, treatment, replicate, etc.)\n", "2. **Operations** (product, stack, slice, shuffle, etc.) create derived states\n", "3. **State propagation** flows automatically from any derived state to all its ancestors\n", "\n", "This means you declare your structure once, and StateTracker handles all the index math for every operation you compose on top." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison Summary\n", "\n", "| Feature | itertools | NumPy | Manual divmod | StateTracker |\n", "|---------|-----------|-------|---------------|--------------|\n", "| **Cartesian products** | Yes | Yes | Yes | Yes |\n", "| **Disjoint unions (stack)** | chain (no tracking) | No | Manual | **Yes, with tracking** |\n", "| **State propagation** | No | No | No | **Automatic** |\n", "| **Composable operations** | No | No | No | **Yes** |\n", "| **Shuffle/slice/sample** | Must materialize | Manual | Manual | **Built-in** |\n", "| **Conflict detection** | N/A | N/A | Manual | **Automatic** |\n", "| **Synchronization** | N/A | N/A | Manual | **Built-in** |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## When to Use Each Approach\n", "\n", "### Use itertools when:\n", "- You just need to iterate through all combinations once\n", "- You do not need to track which component indices correspond to each element\n", "- Your structure is simple (just products or just chains, not both)\n", "\n", "### Use NumPy when:\n", "- You are working with actual array data, not just indices\n", "- Your structure is a regular rectangular grid\n", "- You need fast vectorized operations on the data\n", "\n", "### Use manual divmod when:\n", "- You have a one-off simple product\n", "- You are comfortable with the math and do not need composition\n", "\n", "### Use StateTracker when:\n", "- You have complex structures combining products AND stacks\n", "- You need automatic tracking of which components are active\n", "- You want to compose operations (shuffle, slice, sample, split) without reimplementing index math\n", "- You need synchronization between states\n", "- You want automatic conflict detection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "StateTracker's unique value is **automatic child-to-parent state propagation through a computation DAG**. This is fundamentally different from:\n", "\n", "- **itertools**: Provides iteration but no state tracking or composition\n", "- **NumPy**: Provides index math for rectangular arrays but cannot handle stacks or complex compositions\n", "- **Manual divmod**: Works for simple cases but does not compose and is error-prone\n", "\n", "When you need to enumerate combinatorial structures that combine products, stacks, slices, shuffles, samples, or splits, while always knowing which component values correspond to each state, StateTracker is the right tool.\n", "\n", "For more details on StateTracker's capabilities, see:\n", "- [Quick Start](quickstart.ipynb) - Get started with basic usage\n", "- [Core Concepts](concepts.ipynb) - Understand the state DAG and state propagation\n", "- [Operations](operations.ipynb) - Complete reference for all operations" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 2 }