Statistics is the language of uncertainty — the tool that lets us draw conclusions from incomplete information. Whether you're reading a news poll, interpreting a clinical trial result, or analysing your own data, understanding these core concepts will make you a far more critical reader.
Descriptive Statistics: Summarising Data
Before you can analyse data, you need to describe it. The key measures are central tendency (where is the middle?) and spread (how variable is the data?).
Mean, Median, and Mode
The arithmetic mean is the sum divided by the count. It's the most familiar average but is highly sensitive to outliers.
The median is the middle value when data is sorted. It's more robust — a single extreme value doesn't move it much.
The mode is the most frequent value. Useful for categorical data; less useful for continuous measurements.
| Dataset | Mean | Median | Mode |
|---------|------|--------|------|
| 2, 4, 4, 6, 8 | 4.8 | 4 | 4 |
| 2, 4, 4, 6, 100 | 23.2 | 4 | 4 |
Notice how one extreme value (100) changes the mean dramatically but leaves the median untouched. This is why house price statistics use the median — a handful of multi-million-pound mansions would make average prices misleading.
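These figures are easy to verify with Python's standard `statistics` module:

```python
from statistics import mean, median, mode

data = [2, 4, 4, 6, 8]
skewed = [2, 4, 4, 6, 100]   # the same data with one extreme value

print(mean(data), median(data), mode(data))        # 4.8 4 4
print(mean(skewed), median(skewed), mode(skewed))  # 23.2 4 4
```

The outlier drags the mean from 4.8 to 23.2, while the median and mode don't move at all.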
Standard Deviation and Variance
Variance measures the average squared deviation from the mean:
σ² = Σ(xi - x̄)² / n
Standard deviation is the square root of variance — it's in the same units as the original data, which makes it interpretable:
σ = √[Σ(xi - x̄)² / n]
The 68-95-99.7 rule for normally distributed data:
- 68% of values fall within 1 standard deviation of the mean
- 95% within 2 standard deviations
- 99.7% within 3 standard deviations
Note: Use n in the denominator for the population standard deviation; use n−1 for a sample estimate (this is called Bessel's correction and corrects for the slight underestimate that occurs with samples).
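Python's `statistics` module implements both denominators, with `pstdev` dividing by n and `stdev` by n − 1 (the dataset below is illustrative):

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5, sum of squared deviations = 32

print(pstdev(data))  # population sd: sqrt(32 / 8) = 2.0
print(stdev(data))   # sample sd: sqrt(32 / 7), about 2.14 (Bessel's correction)
```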
The Normal Distribution
The normal (Gaussian) distribution is the bell-shaped curve that appears everywhere in nature and statistics. It's fully described by two parameters: mean (μ) and standard deviation (σ).
The z-score converts any value to "how many standard deviations from the mean":
z = (x - μ) / σ
A z-score of 1.96 corresponds to the 97.5th percentile: the value above which only 2.5% of the distribution lies. It appears constantly in statistics because it sets the bounds of the standard 95% confidence interval.
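You can check these facts with `statistics.NormalDist` (available in Python 3.8+); the 130/100/15 numbers below are purely illustrative:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

print(std_normal.cdf(1.96))       # about 0.975: 97.5% of the distribution lies below z = 1.96
print(std_normal.inv_cdf(0.975))  # about 1.96: the inverse lookup

# z-score for a value of 130 on a scale with mean 100 and sd 15 (made-up numbers)
z = (130 - 100) / 15
print(z)  # 2.0
```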
The Central Limit Theorem is why the normal distribution matters so much: regardless of the shape of the original population, the distribution of sample means approaches normality as sample size increases. This is why so many statistical tests assume normality even when the raw data isn't normally distributed.
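The theorem is easy to see in a quick simulation. The sketch below (the seed, sample size, and choice of an exponential population are all arbitrary) averages samples drawn from a strongly skewed distribution:

```python
import random
from statistics import mean, pstdev

random.seed(0)

# Population: exponential with mean 1 (heavily right-skewed, nothing like a bell curve)
sample_means = [
    mean(random.expovariate(1.0) for _ in range(50))  # mean of one sample of n = 50
    for _ in range(2000)                              # repeated 2000 times
]

print(mean(sample_means))    # near 1.0: centred on the population mean
print(pstdev(sample_means))  # near 0.14: close to sigma / sqrt(n) = 1 / sqrt(50)
```

Plot a histogram of `sample_means` and you get a bell curve, even though individual draws are anything but normal.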
Confidence Intervals
A 95% confidence interval doesn't mean "there's a 95% probability the true value is in this range." It means: "if we repeated this sampling process many times, 95% of the intervals we computed would contain the true value."
For a proportion p from a sample of size n:
CI = p ± z × √(p(1-p)/n)
For 95% confidence, z = 1.96. For 99%, z = 2.576.
Margin of error is just the ± part: z × √(p(1-p)/n). When a poll reports "±3 percentage points," this is the margin of error.
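Plugging in some illustrative poll numbers (52% support from n = 1,000 respondents):

```python
from math import sqrt

p, n = 0.52, 1000
z = 1.96                          # for 95% confidence
margin = z * sqrt(p * (1 - p) / n)

print(f"{p:.0%} ± {margin:.1%}")  # 52% ± 3.1%
```

This is why national polls so often quote a margin of about ±3 points: it is roughly what a sample of 1,000 gives you.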
Hypothesis Testing
Every hypothesis test follows the same structure:
- H₀ (null hypothesis): The default — usually "no effect," "no difference," "no relationship"
- H₁ (alternative hypothesis): What you're trying to show evidence for
- Test statistic: A number computed from the data that measures how far the data departs from what H₀ predicts
- p-value: The probability of observing a result at least this extreme if H₀ were true
The p-value Explained
A p-value of 0.03 means: "If there were truly no effect, we'd see data this extreme by chance only 3% of the time." At the conventional threshold of α = 0.05, this is usually considered sufficient evidence to reject H₀.
What p < 0.05 does NOT mean:
- It does not mean there's a 95% chance the effect is real
- It does not mean the effect is practically important
- It does not mean H₀ is false
Type I and Type II Errors:
| | H₀ is true | H₀ is false |
|--|-----------|-------------|
| Reject H₀ | Type I error (false positive) | Correct |
| Fail to reject H₀ | Correct | Type II error (false negative) |
- α (significance level) = Type I error rate, usually 0.05
- β = Type II error rate; Power = 1 − β, usually targeted at 0.80
The t-Test
The t-test compares means between groups. The two-sample t-statistic is:
t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)
A large |t| means the groups are far apart relative to within-group variability. Compare to a critical value (or compute the p-value) with the appropriate degrees of freedom.
When to use it: Comparing two means from independent groups, when data is approximately normal or n > 30.
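Here is a minimal sketch of the statistic itself on made-up measurements. This is the unpooled (Welch-style) form shown above; a full test would also compute degrees of freedom and a p-value:

```python
from math import sqrt
from statistics import mean, variance

group1 = [5.1, 4.9, 5.6, 4.8, 5.3, 5.0]   # illustrative measurements
group2 = [4.2, 4.6, 4.1, 4.5, 4.0, 4.4]

t = (mean(group1) - mean(group2)) / sqrt(
    variance(group1) / len(group1) + variance(group2) / len(group2)
)
print(t)   # about 5.3: the groups are far apart relative to within-group variability
```

In practice you would hand this job to a library such as `scipy.stats.ttest_ind`, which also returns the p-value.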
Correlation
Pearson's r measures the strength of linear relationship between two variables:
- r = +1: Perfect positive linear relationship
- r = 0: No linear relationship
- r = −1: Perfect negative linear relationship
r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
R² (r squared) tells you the proportion of variance in Y explained by X. If r = 0.7, then R² = 0.49 — X explains 49% of the variability in Y.
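The formula can be computed directly from its definition (the data points below are invented):

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
r = num / den

print(r)       # about 0.77: a fairly strong positive linear relationship
print(r ** 2)  # about 0.60: X explains roughly 60% of the variance in Y
```

Python 3.10+ also provides `statistics.correlation`, which gives the same result without the manual arithmetic.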
Spearman's ρ (rho) does the same thing but uses ranks rather than raw values, making it robust to outliers and appropriate for ordinal data.
Remember: Correlation ≠ causation. Ice cream sales and drowning rates are strongly correlated (both peak in summer), but ice cream doesn't cause drowning.
Effect Size
Statistical significance tells you whether an effect is likely real; effect size tells you how big it is. Cohen's d for comparing two means:
d = (μ₁ - μ₂) / σ_pooled
| Cohen's d | Interpretation |
|-----------|----------------|
| 0.2 | Small |
| 0.5 | Medium |
| 0.8 | Large |
A highly significant p-value with d = 0.1 means you've detected a real but trivially small effect — possibly because your sample was enormous. Always report effect sizes alongside p-values.
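A sketch of the pooled-standard-deviation version on invented scores:

```python
from math import sqrt
from statistics import mean, variance

treated = [23, 25, 28, 22, 26, 27]   # illustrative scores
control = [20, 21, 24, 19, 23, 22]

n1, n2 = len(treated), len(control)
pooled_sd = sqrt(((n1 - 1) * variance(treated) + (n2 - 1) * variance(control))
                 / (n1 + n2 - 2))
d = (mean(treated) - mean(control)) / pooled_sd

print(d)   # about 1.74: a large effect by Cohen's benchmarks
```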
Chi-Square Test
The chi-square (χ²) test asks: "Do the observed counts differ from what we'd expect by chance?"
χ² = Σ (Observed - Expected)² / Expected
Use it when your data is categorical — for example, testing whether a die is fair, or whether treatment outcome is independent of treatment group.
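For the fair-die example, with invented counts from 60 rolls:

```python
observed = [8, 12, 9, 11, 14, 6]           # counts for faces 1 to 6 (illustrative)
expected = sum(observed) / len(observed)   # 10 per face if the die is fair

chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)   # 4.2
```

With 6 − 1 = 5 degrees of freedom, the critical value at α = 0.05 is about 11.07, so these counts give no evidence that the die is unfair.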
Choosing the Right Test
| Situation | Test |
|-----------|------|
| Compare one mean to a known value | One-sample t-test |
| Compare two independent means | Two-sample t-test |
| Compare two paired means | Paired t-test |
| Compare 3+ means | ANOVA |
| Compare 3+ means (non-normal) | Kruskal-Wallis |
| Association between two continuous variables | Pearson/Spearman correlation |
| Compare categorical proportions | Chi-square |
| Two groups, non-normal distribution | Mann-Whitney U |
Common Mistakes
Peeking: Running your test repeatedly and stopping when p < 0.05 inflates Type I error dramatically. Plan your sample size before collecting data.
Multiple comparisons: Running 20 independent tests at α = 0.05 will produce one false positive on average. Use Bonferroni correction or control the false discovery rate.
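The arithmetic behind that claim:

```python
alpha, k = 0.05, 20

familywise = 1 - (1 - alpha) ** k   # P(at least one false positive across k tests)
print(familywise)                   # about 0.64

print(alpha / k)                    # Bonferroni-corrected per-test threshold: 0.0025
```

A 64% chance of at least one spurious "significant" result is why uncorrected multiple testing is so dangerous.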
Ignoring assumptions: Most tests assume random sampling, independence of observations, and (for t-tests) approximate normality. Violating these undermines the results.
Use our Z-Score Calculator, Sample Size Calculator, t-Test Calculator, and Correlation Calculator to work through your own data.