
Statistics by Simulation: See It Before the Math



🧭 What & Why

Simulation-based statistics uses computation (many fast, repeatable trials) to approximate sampling distributions and test statistics. You get an immediate picture of uncertainty—before any algebra—so decisions rest on what the data could plausibly have looked like under repeat sampling. This “visual first, formal next” approach is now standard in modern intro curricula and interactive texts. OpenIntro

Key benefits

  • Intuition: Watch sampling variability emerge; link pictures to p-values and confidence intervals. Inferential Thinking

  • Fewer fragile assumptions: Randomization (permutation) tests and bootstraps work even when classical formulas are awkward. PennState: Statistics Online Courses

  • Reproducibility: Code = procedure; re-run with new data in seconds.

  • Bridges to theory: Simulations of the Central Limit Theorem (CLT) make “why normal?” tangible. Stats Interactives


✅ Quick Start (Do This Today)

Goal: Compare two groups (A vs B) on a mean—get a p-value by randomization test and a 95% CI of the difference by bootstrap.

Data shape (example):

  • group: “A” or “B”

  • value: numeric outcome

Python (Permutation + Bootstrap)

import numpy as np
rng = np.random.default_rng(42)
# Example data
A = np.array([5.2, 6.1, 5.9, 6.3, 5.7])
B = np.array([4.8, 5.0, 5.3, 4.9, 5.2])

obs_diff = A.mean() - B.mean()

# Permutation test (null: labels exchangeable)
all_vals = np.concatenate([A, B])
nA = len(A)
iters = 10000
perm_diffs = np.empty(iters)
for i in range(iters):
    rng.shuffle(all_vals)
    perm_diffs[i] = all_vals[:nA].mean() - all_vals[nA:].mean()

p_value = (np.sum(np.abs(perm_diffs) >= abs(obs_diff)) + 1) / (iters + 1)

# Bootstrap CI for difference in means
boot_diffs = np.empty(iters)
for i in range(iters):
    bootA = rng.choice(A, size=len(A), replace=True)
    bootB = rng.choice(B, size=len(B), replace=True)
    boot_diffs[i] = bootA.mean() - bootB.mean()

ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"Observed diff: {obs_diff:.3f}")
print(f"Permutation p-value (two-sided): {p_value:.4f}")
print(f"Bootstrap 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")

Read it:

  • If p < 0.05, the observed difference is rare under “no true difference.”

  • If the 95% CI excludes 0, it agrees with the test.
This mirrors textbook definitions of bootstrap and randomization tests. Inferential Thinking

R (tidyverse-style with infer or base + boot) is equally straightforward; the logic is identical: shuffle labels for the test; resample with replacement for the CI. stats.oarc.ucla.edu


🛠️ Techniques & Frameworks

1) Bootstrap (resampling with replacement)

Use for: Standard errors and confidence intervals when formulas are messy.
Idea: Treat your sample as a stand-in for the population; draw thousands of bootstrap resamples; compute the statistic each time; use the empirical distribution to get SEs and CIs. Inferential Thinking
Why it works (intuition): If the sample is representative, the resample-to-resample variation mimics sample-to-sample variation from the population. data8.org
Learn more: UCLA OARC’s concise primers and examples. stats.oarc.ucla.edu
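As a quick sketch of this recipe for a statistic with a messy SE formula, here is a bootstrap standard error and percentile CI for a median; the exponential sample is made-up stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=40)  # stand-in for your data

# Bootstrap the median: resample with replacement, recompute each time
boot_medians = np.array([
    np.median(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(10_000)
])

se = boot_medians.std(ddof=1)                               # bootstrap SE
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])  # percentile CI
print(f"Median: {np.median(sample):.3f}  SE: {se:.3f}  "
      f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

The same skeleton works for any statistic: swap `np.median` for a trimmed mean, a correlation, or a ratio, and nothing else changes.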

2) Randomization / Permutation Tests (label-shuffling)

Use for: Hypothesis tests with minimal distributional assumptions.
Idea: Build the null distribution by relabeling treatments many times, recomputing a test statistic (e.g., difference in means). The p-value is the proportion of permuted stats at least as extreme as observed. PennState: Statistics Online Courses
Pedigree & modern review: From Fisher’s early ideas to current formal results. Oxford Academic
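The label-shuffling loop from the Quick Start can also be vectorized; a sketch using NumPy's `Generator.permuted`, which shuffles each row independently:

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([5.2, 6.1, 5.9, 6.3, 5.7])
B = np.array([4.8, 5.0, 5.3, 4.9, 5.2])
all_vals = np.concatenate([A, B])
nA, iters = len(A), 10_000

obs_diff = A.mean() - B.mean()

# Shuffle every row independently, then split each row into pseudo-"A"/"B"
perm = rng.permuted(np.tile(all_vals, (iters, 1)), axis=1)
perm_diffs = perm[:, :nA].mean(axis=1) - perm[:, nA:].mean(axis=1)

p_value = (np.sum(np.abs(perm_diffs) >= abs(obs_diff)) + 1) / (iters + 1)
print(f"p-value (two-sided): {p_value:.4f}")
```

Vectorizing trades memory (an `iters × n` array) for speed, which matters once the loop body gets expensive.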

3) Monte Carlo “What-Ifs”

Use for: Propagating uncertainty in models or processes (e.g., risk, queues, reliability).
Idea: Specify input distributions; simulate thousands of scenarios; summarize outputs to see likely ranges and tail risks. Authoritative handbooks and tools emphasize defining inputs, iterations, and sensitivity checks. itl.nist.gov
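A minimal sketch of that workflow, with illustrative (made-up) input distributions for monthly demand and unit margin:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000  # scenarios

# Hypothetical inputs (illustrative only): skewed demand, noisy unit margin
demand = rng.lognormal(mean=np.log(1000), sigma=0.3, size=n)
unit_margin = rng.normal(loc=5.0, scale=1.5, size=n)
profit = demand * unit_margin

p50, p90 = np.percentile(profit, [50, 90])
loss_prob = (profit < 0).mean()  # tail risk: chance of losing money
print(f"P50: {p50:,.0f}  P90: {p90:,.0f}  P(loss): {loss_prob:.2%}")
```

For a sensitivity check, rerun with one input held at its median and see how much the output interval narrows.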

4) Central Limit Theorem (via simulation)

Use for: Explaining why sample means often look normal and why many classical intervals/tests work.
Idea: Draw many samples of size n from a skewed or odd-shaped population; plot the distribution of sample means; as n grows, the shape tends to normal with SD ≈ σ/√n. Interactive CLT demonstrators make this vivid. Stats Interactives
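The experiment above takes only a few lines; the exponential population and n = 30 are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
pop_scale = 2.0       # exponential population: skewed, with mu = sigma = scale
n, reps = 30, 10_000

# Means of many samples of size n from a skewed population
means = rng.exponential(scale=pop_scale, size=(reps, n)).mean(axis=1)

# CLT prediction: distribution of means is centered at mu with SD = sigma/sqrt(n)
print(f"Mean of means: {means.mean():.3f} (theory: {pop_scale:.3f})")
print(f"SD of means:   {means.std():.3f} (theory: {pop_scale / np.sqrt(n):.3f})")
```

Histogram `means` and overlay a normal curve to get the classic picture.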

Choosing the method (at a glance)

| Goal | Method | Core step | Notes |
| --- | --- | --- | --- |
| 95% CI for any statistic | Bootstrap | Resample with replacement; take percentiles | Needs a representative sample |
| p-value with minimal assumptions | Randomization | Shuffle labels; recompute the statistic | Mirrors experimental assignment |
| Explore model/output uncertainty | Monte Carlo | Sample from input distributions | Do sensitivity analysis |
| Explain “why normal?” | CLT simulation | Simulate many sample means; watch the shape | Links pictures to formulas |

🗺️ 7-Day Habit Plan (from zero to confident)

Day 1 – See variability.
Install Python or R. Simulate 10,000 coin flips; compare empirical to theoretical 0.5. (Checkpoint: your histogram stabilizes as trials grow.)
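One possible sketch of this checkpoint (seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
flips = rng.integers(0, 2, size=10_000)  # 10,000 fair coin flips (1 = heads)

# Running proportion of heads stabilizes near the theoretical 0.5
running = flips.cumsum() / np.arange(1, len(flips) + 1)
print(f"After 100 flips:    {running[99]:.3f}")
print(f"After 10,000 flips: {running[-1]:.3f}")
```

Plot `running` against trial number to watch the early wobble settle down.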

Day 2 – Bootstrap a mean.
Take any dataset; bootstrap the mean difference (A–B); compute a 95% CI. (Checkpoint: You can explain why CI width shrinks as n grows.) Inferential Thinking

Day 3 – Randomization test.
Recreate the Quick Start example on your own data; report a two-sided p-value with 10,000 label shuffles. (Checkpoint: You can sketch the null distribution and mark the observed stat.) PennState: Statistics Online Courses

Day 4 – CLT demo.
Pick a skewed population (e.g., exponential); show that means of size n=30 look ~normal; compare SDs to σ/√n. (Checkpoint: Overlay a normal curve.) data8.org

Day 5 – Monte Carlo planning.
Model a simple process (say, monthly demand). Specify inputs; run 20k simulations; report median, 90% interval, and worst-case. (Checkpoint: A sensitivity plot—what input drives risk most?) itl.nist.gov

Day 6 – Refactor into a template.
Wrap your code into functions; save as a notebook/script with parameters.

Day 7 – Communicate.
Write a 1-page “visual first” report: one plot of the distribution, one sentence on the decision, one note on assumptions.


👥 Audience Variations

  • Students & Self-learners: Start with Seeing Theory visuals; then reproduce those pictures with your own code. Seeing Theory

  • Professionals (Product, Ops, Finance): Use Monte Carlo to stress-test plans; report percentiles (P50, P90) and decision thresholds. itl.nist.gov

  • Teachers: Pair CLT demos with a quick in-class permutation test so students connect randomization (assignment) with random sampling (CLT). Stats Interactives


⚠️ Mistakes & Myths to Avoid

  • Myth: “Simulation is a shortcut that dodges theory.”
    Reality: It implements theoretical definitions (sampling distributions, null distributions) computationally. Inferential Thinking

  • Mistake: Too few iterations.
    Use ≥5,000 for smooth pictures; 10,000+ if tails matter.
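The cost of too few iterations is quantifiable: the Monte Carlo standard error of an estimated proportion (such as a p-value) is roughly sqrt(p(1-p)/N), so halving the error takes four times the iterations:

```python
import numpy as np

# Simulation error of an estimated p-value near 0.05, for various N
p = 0.05
for iters in (1_000, 5_000, 10_000, 100_000):
    se = np.sqrt(p * (1 - p) / iters)  # binomial standard error
    print(f"N={iters:>7}: p-hat = {p} +/- {se:.4f}")
```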

  • Mistake: Bootstrapping toxic data.
    Garbage in, garbage out—bootstrap assumes your sample reflects the population. data8.org

  • Mistake: Ignoring design.
    Randomization tests mirror the random assignment mechanism; don’t shuffle what wasn’t randomized. home.uchicago.edu


💬 Real-Life Examples & Scripts

  • Explaining to a teammate:
    “I’ll shuffle the labels 10,000 times to see what differences happen by chance when there’s truly no effect. Our observed difference sits in the top 1.2% of that null distribution, so p≈0.012.”

  • Reporting a bootstrap CI:
    “Across 10,000 resamples, the median difference was 0.62 with a 95% interval [0.15, 1.09]. Because 0 isn’t inside, we have evidence of a positive effect.”

  • CLT classroom line:
    “No matter how weird the population is, the mean of many samples tends to look normal—watch the animation as n increases.” Statistics LibreTexts


🧰 Tools, Apps & Resources

  • Python: numpy, pandas, matplotlib, scipy, statsmodels (permutation, bootstrap utilities).

  • R: infer (tidy verbs for simulation-based inference), boot (general bootstrap), ggplot2. stats.oarc.ucla.edu

  • Interactive learning: Seeing Theory (visuals), Data 8 textbook/notes (code-driven inference), CLT interactives. Seeing Theory; Inferential Thinking

  • Reference handbooks: NIST/SEMATECH e-Handbook for Monte Carlo and uncertainty analysis. itl.nist.gov


📌 Key Takeaways

  • Simulation implements the definitions of inference (sampling and null distributions) computationally, so you can see uncertainty before doing any algebra.

  • Bootstrap (resample with replacement) gives standard errors and confidence intervals; randomization (shuffle labels) gives p-values with minimal assumptions.

  • Monte Carlo propagates input uncertainty through a model; CLT simulations show why sample means tend toward normal.

  • Use 10,000+ iterations when tails matter, check that your sample is representative, and only shuffle what was actually randomized.

❓ FAQs

1) How many iterations are “enough”?
For smooth central estimates, 5,000–10,000 iterations are typical; increase for tail probabilities or very small p-values. (Computers make this cheap.) itl.nist.gov

2) Is simulation “cheating” compared with formulas?
No. It approximates the same targets (sampling and null distributions) when formulas are complex or assumptions are shaky—standard in modern courses. OpenIntro

3) Can I bootstrap everything?
You need a representative sample and an i.i.d.-like setup; for time series or clustered data, use block or stratified bootstraps. See primers before applying. stats.oarc.ucla.edu
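For intuition, here is a minimal circular block bootstrap sketch; the AR(1) series and block length of 10 are illustrative assumptions, and real work should use a vetted implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy autocorrelated AR(1) series -- illustrative only
eps = rng.normal(size=200)
series = np.empty(200)
series[0] = eps[0]
for t in range(1, 200):
    series[t] = 0.6 * series[t - 1] + eps[t]

def circular_block_bootstrap(x, block_len, rng):
    """Resample contiguous blocks (wrapping around) so local dependence survives."""
    n = len(x)
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)) % n  # wrap past the end
    return x[idx].ravel()[:n]

boot_means = np.array([
    circular_block_bootstrap(series, block_len=10, rng=rng).mean()
    for _ in range(5_000)
])
print(f"Block-bootstrap SE of the mean: {boot_means.std(ddof=1):.3f}")
```

An i.i.d. bootstrap on the same series would understate the SE, because it destroys the autocorrelation that inflates the mean's variability.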

4) What’s the difference between bootstrap and randomization tests?
Bootstrap resamples with replacement from your sample to mimic repeated sampling; randomization reshuffles labels to mimic the assignment mechanism under the null. Inferential Thinking

5) When should I prefer a t-test over permutation?
With clean normal-ish residuals and equal variances, t-tests are efficient. Permutation tests are robust when those assumptions are doubtful. PennState: Statistics Online Courses

6) Can simulation help with teaching/learning?
Yes—interactive CLT and inference demos improve conceptual grasp and engagement. PMC

7) How do I present results to non-statisticians?
Show the simulated distribution, mark the observed statistic or CI, and add one sentence on the decision rule.


📚 References

  1. OpenIntro Introductory Statistics with Randomization and Simulation (ISRS)—simulation-first intro text. OpenIntro

  2. UC Berkeley Computational and Inferential Thinking (Data 8)—bootstrap & simulation-driven inference. Inferential Thinking

  3. NIST/SEMATECH e-Handbook of Statistical Methods—Monte Carlo and uncertainty analysis. itl.nist.gov

  4. UCLA OARC Introduction to Bootstrapping—concise tutorial and examples. stats.oarc.ucla.edu

  5. Penn State STAT 200 Randomization Procedures—course notes on building null distributions. PennState: Statistics Online Courses

  6. Brown University Seeing Theory—interactive visualization of core concepts. Seeing Theory

  7. Columbia University Stats Interactives: CLT—interactive Central Limit Theorem demo. Stats Interactives

  8. Stanford/UChicago (Romano, Shaikh et al.) Randomization Inference: Theory and Applications (2025 preprint). home.uchicago.edu

  9. Oxford Academic “To permute or not to permute”—overview of permutation tests and properties. Oxford Academic

  10. OpenIntro Introduction to Modern Statistics (IMS)—simulation + traditional methods together. OpenIntro