
Statistics by Simulation: See It Before the Math



🧭 What & Why

Simulation-based statistics uses computation (many fast, repeatable trials) to approximate sampling distributions and test statistics. You get an immediate picture of uncertainty—before any algebra—so decisions rest on what the data could plausibly have looked like under repeat sampling. This “visual first, formal next” approach is now standard in modern intro curricula and interactive texts. OpenIntro

Key benefits

  • Intuition: Watch sampling variability emerge; link pictures to p-values and confidence intervals. Inferential Thinking

  • Fewer fragile assumptions: Randomization (permutation) tests and bootstraps work even when classical formulas are awkward. PennState: Statistics Online Courses

  • Reproducibility: Code = procedure; re-run with new data in seconds.

  • Bridges to theory: Simulations of the Central Limit Theorem (CLT) make “why normal?” tangible. Stats Interactives


✅ Quick Start (Do This Today)

Goal: Compare two groups (A vs B) on a mean—get a p-value by randomization test and a 95% CI of the difference by bootstrap.

Data shape (example):

  • group: “A” or “B”

  • value: numeric outcome

Python (Permutation + Bootstrap)

import numpy as np
rng = np.random.default_rng(42)
# Example data
A = np.array([5.2, 6.1, 5.9, 6.3, 5.7])
B = np.array([4.8, 5.0, 5.3, 4.9, 5.2])

obs_diff = A.mean() - B.mean()

# Permutation test (null: labels exchangeable)
all_vals = np.concatenate([A, B])
nA = len(A)
iters = 10000
perm_diffs = np.empty(iters)
for i in range(iters):
    rng.shuffle(all_vals)
    perm_diffs[i] = all_vals[:nA].mean() - all_vals[nA:].mean()

p_value = (np.sum(np.abs(perm_diffs) >= abs(obs_diff)) + 1) / (iters + 1)

# Bootstrap CI for difference in means
boot_diffs = np.empty(iters)
for i in range(iters):
    bootA = rng.choice(A, size=len(A), replace=True)
    bootB = rng.choice(B, size=len(B), replace=True)
    boot_diffs[i] = bootA.mean() - bootB.mean()

ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"Observed diff: {obs_diff:.3f}")
print(f"Permutation p-value (two-sided): {p_value:.4f}")
print(f"Bootstrap 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")

Read it:

  • If p < 0.05, the observed difference is rare under “no true difference.”

  • If the 95% CI excludes 0, it agrees with the test.
This mirrors textbook definitions of bootstrap and randomization tests. Inferential Thinking

R (tidyverse-style with infer or base + boot) is equally straightforward; the logic is identical: shuffle labels for the test; resample with replacement for the CI. stats.oarc.ucla.edu


🛠️ Techniques & Frameworks

1) Bootstrap (resampling with replacement)

Use for: Standard errors and confidence intervals when formulas are messy.
Idea: Treat your sample as a stand-in for the population; draw thousands of bootstrap resamples; compute the statistic each time; use the empirical distribution to get SEs and CIs. Inferential Thinking
Why it works (intuition): If the sample is representative, the resample-to-resample variation mimics sample-to-sample variation from the population. data8.org
Learn more: UCLA OARC’s concise primers and examples. stats.oarc.ucla.edu
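As a quick sketch of this recipe for a statistic with a messy SE formula, here is a bootstrap standard error and percentile CI for a median; the exponential sample is made-up stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=40)  # stand-in for your data

# Bootstrap the median: resample with replacement, recompute each time
boot_medians = np.array([
    np.median(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(10_000)
])

se = boot_medians.std(ddof=1)                               # bootstrap SE
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])  # percentile CI
print(f"Median: {np.median(sample):.3f}  SE: {se:.3f}  "
      f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

The same skeleton works for any statistic: swap `np.median` for a trimmed mean, a correlation, or a ratio, and nothing else changes.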

2) Randomization / Permutation Tests (label-shuffling)

Use for: Hypothesis tests with minimal distributional assumptions.
Idea: Build the null distribution by relabeling treatments many times, recomputing a test statistic (e.g., difference in means). The p-value is the proportion of permuted stats at least as extreme as observed. PennState: Statistics Online Courses
Pedigree & modern review: From Fisher’s early ideas to current formal results. Oxford Academic
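The label-shuffling loop from the Quick Start can also be vectorized; a sketch using NumPy's `Generator.permuted`, which shuffles each row independently:

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([5.2, 6.1, 5.9, 6.3, 5.7])
B = np.array([4.8, 5.0, 5.3, 4.9, 5.2])
all_vals = np.concatenate([A, B])
nA, iters = len(A), 10_000

obs_diff = A.mean() - B.mean()

# Shuffle every row independently, then split each row into pseudo-"A"/"B"
perm = rng.permuted(np.tile(all_vals, (iters, 1)), axis=1)
perm_diffs = perm[:, :nA].mean(axis=1) - perm[:, nA:].mean(axis=1)

p_value = (np.sum(np.abs(perm_diffs) >= abs(obs_diff)) + 1) / (iters + 1)
print(f"p-value (two-sided): {p_value:.4f}")
```

Vectorizing trades memory (an `iters × n` array) for speed, which matters once the loop body gets expensive.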

3) Monte Carlo “What-Ifs”

Use for: Propagating uncertainty in models or processes (e.g., risk, queues, reliability).
Idea: Specify input distributions; simulate thousands of scenarios; summarize outputs to see likely ranges and tail risks. Authoritative handbooks and tools emphasize defining inputs, iterations, and sensitivity checks. itl.nist.gov
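A minimal sketch of that workflow, with illustrative (made-up) input distributions for monthly demand and unit margin:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000  # scenarios

# Hypothetical inputs (illustrative only): skewed demand, noisy unit margin
demand = rng.lognormal(mean=np.log(1000), sigma=0.3, size=n)
unit_margin = rng.normal(loc=5.0, scale=1.5, size=n)
profit = demand * unit_margin

p50, p90 = np.percentile(profit, [50, 90])
loss_prob = (profit < 0).mean()  # tail risk: chance of losing money
print(f"P50: {p50:,.0f}  P90: {p90:,.0f}  P(loss): {loss_prob:.2%}")
```

For a sensitivity check, rerun with one input held at its median and see how much the output interval narrows.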

4) Central Limit Theorem (via simulation)

Use for: Explaining why sample means often look normal and why many classical intervals/tests work.
Idea: Draw many samples of size n from a skewed or odd-shaped population; plot the distribution of sample means; as n grows, the shape tends to normal with SD ≈ σ/√n. Interactive CLT demonstrators make this vivid. Stats Interactives
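The experiment above takes only a few lines; the exponential population and n = 30 are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
pop_scale = 2.0       # exponential population: skewed, with mu = sigma = scale
n, reps = 30, 10_000

# Means of many samples of size n from a skewed population
means = rng.exponential(scale=pop_scale, size=(reps, n)).mean(axis=1)

# CLT prediction: distribution of means is centered at mu with SD = sigma/sqrt(n)
print(f"Mean of means: {means.mean():.3f} (theory: {pop_scale:.3f})")
print(f"SD of means:   {means.std():.3f} (theory: {pop_scale / np.sqrt(n):.3f})")
```

Histogram `means` and overlay a normal curve to get the classic picture.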

Choosing the method (at a glance)

| Goal | Method | Core step | Notes |
| --- | --- | --- | --- |
| 95% CI for any statistic | Bootstrap | Resample with replacement; take percentiles | Needs a representative sample |
| p-value with minimal assumptions | Randomization | Shuffle labels; recompute the statistic | Mirrors experimental assignment |
| Explore model/output uncertainty | Monte Carlo | Sample from input distributions | Do sensitivity analysis |
| Explain “why normal?” | CLT simulation | Simulate many sample means; watch the shape | Links pictures to formulas |

🗺️ 7-Day Habit Plan (from zero to confident)

Day 1 – See variability.
Install Python or R. Simulate 10,000 coin flips; compare empirical to theoretical 0.5. (Checkpoint: your histogram stabilizes as trials grow.)
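One possible sketch of this checkpoint (seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
flips = rng.integers(0, 2, size=10_000)  # 10,000 fair coin flips (1 = heads)

# Running proportion of heads stabilizes near the theoretical 0.5
running = flips.cumsum() / np.arange(1, len(flips) + 1)
print(f"After 100 flips:    {running[99]:.3f}")
print(f"After 10,000 flips: {running[-1]:.3f}")
```

Plot `running` against trial number to watch the early wobble settle down.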

Day 2 – Bootstrap a mean.
Take any dataset; bootstrap the mean difference (A–B); compute a 95% CI. (Checkpoint: You can explain why CI width shrinks as n grows.) Inferential Thinking

Day 3 – Randomization test.
Recreate the Quick Start example on your own data; report a two-sided p-value with 10,000 label shuffles. (Checkpoint: You can sketch the null distribution and mark the observed stat.) PennState: Statistics Online Courses

Day 4 – CLT demo.
Pick a skewed population (e.g., exponential); show that means of size n=30 look ~normal; compare SDs to σ/√n. (Checkpoint: Overlay a normal curve.) data8.org

Day 5 – Monte Carlo planning.
Model a simple process (say, monthly demand). Specify inputs; run 20k simulations; report median, 90% interval, and worst-case. (Checkpoint: A sensitivity plot—what input drives risk most?) itl.nist.gov

Day 6 – Refactor into a template.
Wrap your code into functions; save as a notebook/script with parameters.

Day 7 – Communicate.
Write a 1-page “visual first” report: one plot of the distribution, one sentence on the decision, one note on assumptions.


👥 Audience Variations

  • Students & Self-learners: Start with Seeing Theory visuals; then reproduce those pictures with your own code. Seeing Theory

  • Professionals (Product, Ops, Finance): Use Monte Carlo to stress-test plans; report percentiles (P50, P90) and decision thresholds. itl.nist.gov

  • Teachers: Pair CLT demos with a quick in-class permutation test so students connect randomization (assignment) with random sampling (CLT). Stats Interactives


⚠️ Mistakes & Myths to Avoid

  • Myth: “Simulation is a shortcut that dodges theory.”
    Reality: It implements theoretical definitions (sampling distributions, null distributions) computationally. Inferential Thinking

  • Mistake: Too few iterations.
    Use ≥5,000 for smooth pictures; 10,000+ if tails matter.
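The cost of too few iterations is quantifiable: the Monte Carlo standard error of an estimated proportion (such as a p-value) is roughly sqrt(p(1-p)/N), so halving the error takes four times the iterations:

```python
import numpy as np

# Simulation error of an estimated p-value near 0.05, for various N
p = 0.05
for iters in (1_000, 5_000, 10_000, 100_000):
    se = np.sqrt(p * (1 - p) / iters)  # binomial standard error
    print(f"N={iters:>7}: p-hat = {p} +/- {se:.4f}")
```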

  • Mistake: Bootstrapping toxic data.
    Garbage in, garbage out—bootstrap assumes your sample reflects the population. data8.org

  • Mistake: Ignoring design.
    Randomization tests mirror the random assignment mechanism; don’t shuffle what wasn’t randomized. home.uchicago.edu


💬 Real-Life Examples & Scripts

  • Explaining to a teammate:
    “I’ll shuffle the labels 10,000 times to see what differences happen by chance when there’s truly no effect. Our observed difference sits in the top 1.2% of that null distribution, so p≈0.012.”

  • Reporting a bootstrap CI:
    “Across 10,000 resamples, the median difference was 0.62 with a 95% interval [0.15, 1.09]. Because 0 isn’t inside, we have evidence of a positive effect.”

  • CLT classroom line:
    “No matter how weird the population is, the mean of many samples tends to look normal—watch the animation as n increases.” Statistics LibreTexts


🧰 Tools, Apps & Resources

  • Python: numpy, pandas, matplotlib, scipy, statsmodels (permutation, bootstrap utilities).

  • R: infer (tidy verbs for simulation-based inference), boot (general bootstrap), ggplot2. stats.oarc.ucla.edu

  • Interactive learning: Seeing Theory (visuals), Data 8 textbook/notes (code-driven inference), CLT interactives. Seeing Theory; Inferential Thinking

  • Reference handbooks: NIST/SEMATECH e-Handbook for Monte Carlo and uncertainty analysis. itl.nist.gov


📌 Key Takeaways

  • Simulation implements the definitions of inference (sampling and null distributions) computationally, so you can see uncertainty before doing any algebra.

  • Bootstrap (resample with replacement) gives standard errors and confidence intervals; randomization (shuffle labels) gives p-values with minimal assumptions.

  • Monte Carlo propagates input uncertainty through a model; CLT simulations show why sample means tend toward normal.

  • Use 10,000+ iterations when tails matter, check that your sample is representative, and only shuffle what was actually randomized.

❓ FAQs

1) How many iterations are “enough”?
For smooth central estimates, 5,000–10,000 iterations are typical; increase for tail probabilities or very small p-values. (Computers make this cheap.) itl.nist.gov

2) Is simulation “cheating” compared with formulas?
No. It approximates the same targets (sampling and null distributions) when formulas are complex or assumptions are shaky—standard in modern courses. OpenIntro

3) Can I bootstrap everything?
You need a representative sample and an i.i.d.-like setup; for time series or clustered data, use block or stratified bootstraps. See primers before applying. stats.oarc.ucla.edu
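For intuition, here is a minimal circular block bootstrap sketch; the AR(1) series and block length of 10 are illustrative assumptions, and real work should use a vetted implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy autocorrelated AR(1) series -- illustrative only
eps = rng.normal(size=200)
series = np.empty(200)
series[0] = eps[0]
for t in range(1, 200):
    series[t] = 0.6 * series[t - 1] + eps[t]

def circular_block_bootstrap(x, block_len, rng):
    """Resample contiguous blocks (wrapping around) so local dependence survives."""
    n = len(x)
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)) % n  # wrap past the end
    return x[idx].ravel()[:n]

boot_means = np.array([
    circular_block_bootstrap(series, block_len=10, rng=rng).mean()
    for _ in range(5_000)
])
print(f"Block-bootstrap SE of the mean: {boot_means.std(ddof=1):.3f}")
```

An i.i.d. bootstrap on the same series would understate the SE, because it destroys the autocorrelation that inflates the mean's variability.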

4) What’s the difference between bootstrap and randomization tests?
Bootstrap resamples with replacement from your sample to mimic repeated sampling; randomization reshuffles labels to mimic the assignment mechanism under the null. Inferential Thinking

5) When should I prefer a t-test over permutation?
With clean normal-ish residuals and equal variances, t-tests are efficient. Permutation tests are robust when those assumptions are doubtful. PennState: Statistics Online Courses

6) Can simulation help with teaching/learning?
Yes—interactive CLT and inference demos improve conceptual grasp and engagement. PMC

7) How do I present results to non-statisticians?
Show the simulated distribution, mark the observed statistic or CI, and add one sentence on the decision rule.


📚 References

  1. OpenIntro Introductory Statistics with Randomization and Simulation (ISRS)—simulation-first intro text. OpenIntro

  2. UC Berkeley Computational and Inferential Thinking (Data 8)—bootstrap & simulation-driven inference. Inferential Thinking

  3. NIST/SEMATECH e-Handbook of Statistical Methods—Monte Carlo and uncertainty analysis. itl.nist.gov

  4. UCLA OARC Introduction to Bootstrapping—concise tutorial and examples. stats.oarc.ucla.edu

  5. Penn State STAT 200 Randomization Procedures—course notes on building null distributions. PennState: Statistics Online Courses

  6. Brown University Seeing Theory—interactive visualization of core concepts. Seeing Theory

  7. Columbia University Stats Interactives: CLT—interactive Central Limit Theorem demo. Stats Interactives

  8. Stanford/UChicago (Romano, Shaikh et al.) Randomization Inference: Theory and Applications (2025 preprint). home.uchicago.edu

  9. Oxford Academic “To permute or not to permute”—overview of permutation tests and properties. Oxford Academic

  10. OpenIntro Introduction to Modern Statistics (IMS)—simulation + traditional methods together. OpenIntro