Lecture 22
March 16, 2026
Each of these is a hypothesis about causation or influence.

One-Tailed Test:
Two-Tailed Test:
| Decision About Null Hypothesis | Null Hypothesis Is True | Null Hypothesis Is False |
| --- | --- | --- |
| Don't reject | True negative (probability \(1-\alpha\)) | Type II error (probability \(\beta\)) |
| Reject | Type I error (probability \(\alpha\)) | True positive (probability \(1-\beta\)) |
The standard null hypothesis significance framework is based on balancing the chance of making Type I (false positive) and Type II (false negative) errors.
Idea: Set a significance level \(\alpha\) which is an “acceptable” probability of making a Type I error.
Aside: The probability \(1-\beta\) of correctly rejecting \(H_0\) is the power.
Common practice: If the p-value is sufficiently small (below \(\alpha\)), reject the null hypothesis with \(1-\alpha\) confidence, or declare that the alternative hypothesis is statistically significant at the \(1-\alpha\) level.
This can be misleading: the p-value is not the probability that the null hypothesis is true.
\[ \underbrace{p(S \geq \hat{S} \mid \mathcal{H}_0)}_{\text{p-value}} \neq \underbrace{p(\mathcal{H}_0 \mid S \geq \hat{S})}_{\substack{\text{probability of} \\ \text{null}}}\]
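As a quick check on the framework itself: if we repeatedly test a true null hypothesis at level \(\alpha\), we should reject about \(\alpha\) of the time. A minimal simulation sketch (illustrative; not part of the lecture data, assuming numpy and scipy are available):

```python
# Sketch: under H0, the Type I error rate at significance level alpha
# should be close to alpha itself.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_trials, n = 2000, 30

# Simulate many datasets drawn from the null (mean zero) and test H0: mean = 0.
rejections = 0
for _ in range(n_trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    rejections += p < alpha

type_i_rate = rejections / n_trials  # fraction of (false) rejections
```

The empirical rejection rate lands near 0.05, which is exactly what "controlling the Type I error at \(\alpha\)" promises.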
After a test like whether \(\beta = 0\) or not, it is common to hear people say something like “\(\beta\) is [statistically] significant.”
This is true in a technical sense, but, conversely, non-rejection of the null hypothesis does not mean that \(\beta\) is trivial or can be ignored.
Why?
For example, \(\hat{\beta}\) can be quite large, but with a large standard error, so the null hypothesis would still be retained.
You actually need to look at the sampling distribution/confidence intervals to know this!
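To make this concrete, here is a hedged numerical illustration (the numbers are hypothetical, not from the lecture data): a large slope estimate with a large standard error fails to reject the null, yet its confidence interval includes effects even larger than the estimate.

```python
# Hypothetical example: a large estimate with a large standard error
# is "not significant," but the confidence interval shows the effect
# could still be substantial.
from scipy import stats

beta_hat, se, n = 2.0, 1.5, 30           # large effect, noisy estimate
t_stat = beta_hat / se                   # ~1.33, below the critical value
t_crit = stats.t.ppf(0.975, df=n - 2)    # ~2.05 for a two-sided 5% test

# 95% confidence interval for the slope:
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
# The CI spans zero (retain H0) but also includes values well above beta_hat.
```

Looking only at the reject/don't-reject decision would hide the fact that effects more than twice as large as \(\hat{\beta}\) are consistent with the data.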
Statistical significance mixes together a number of different concepts:
Statistical significance also does not mean anything about whether the alternative hypothesis is:

Fit a regression
\[ \begin{gather*} y_i = \beta_0 + \beta_1 t_i + \varepsilon_i, \\ \varepsilon_i \sim \mathcal{N}(0, \sigma^2). \end{gather*} \]
SF Tide Gauge Data: \((\hat{\beta}_0, \hat{\beta}_1) \approx (1.26, 4 \times 10^{-4})\)
\(\mathcal{H}_0: \beta_1 = 0\)
Use the \(t\)-statistic:
\[\hat{t} = \frac{\hat{\beta}_1}{\text{se}(\hat{\beta}_1)}, \quad \hat{t} \sim t_{n-2}.\]
SF Data: \(\hat{t} = 2.31\), \(p\text{-value} \approx 0.01\).
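The slope test can be sketched end to end on synthetic data (not the actual SF tide gauge series; the trend and noise levels below are hypothetical, assuming numpy and scipy):

```python
# Sketch of the slope t-test: fit y = b0 + b1*t by least squares,
# then test H0: b1 = 0 using t ~ t_{n-2}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
t = np.arange(n, dtype=float)
y = 1.26 + 4e-4 * t + rng.normal(scale=0.01, size=n)  # hypothetical trend + noise

X = np.column_stack([np.ones(n), t])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - 2)                        # residual variance
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of the slope

t_stat = beta_hat[1] / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-sided p-value
```

With a trend this strong relative to the noise, the test rejects \(\mathcal{H}_0: \beta_1 = 0\) decisively.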
A non-parametric alternative is the Mann-Kendall Test:
Assume data is independent and no periodic signals, but no specific distributional assumption. Test statistic \(S\):
\[S = \sum_{i=1}^{n-1} \sum_{j=i+1}^n \text{sgn}\left(y_j - y_i\right).\]
SF Tide Gauge Data: \(S=921\).
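The statistic \(S\) is just a count of concordant minus discordant pairs, which takes a few lines to compute. A minimal sketch (assuming numpy; the function name is ours):

```python
# Minimal Mann-Kendall S: sum sgn(y_j - y_i) over all pairs i < j.
import numpy as np

def mann_kendall_s(y):
    y = np.asarray(y, dtype=float)
    n = len(y)
    s = 0
    for i in range(n - 1):
        s += np.sign(y[i + 1:] - y[i]).sum()  # sgn(y_j - y_i) for all j > i
    return int(s)
```

A strictly increasing series makes every pair concordant, so \(S = n(n-1)/2\); a strictly decreasing one gives \(-n(n-1)/2\).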
\(\mathcal{H}_0\): no monotonic trend (so \(\mathbb{E}[S] = 0\)).
For large \(n\), the null sampling distribution of \(S\) is approximately \(\mathcal{N}(0, \text{Var}(S))\), where \[\text{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^g t_i(t_i - 1)(2t_i + 5)}{18},\]
\(g\) is the number of tied groups and \(t_i\) is the size of tie group \(i\).
SF Tide Gauge Data: \(\text{Var}(S) = 224875\).
Is there a trend in the SF tide gauge data?
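Plugging in the values reported above (\(S = 921\), \(\text{Var}(S) = 224875\)), the standardized test statistic and two-sided p-value follow directly. A sketch using the common continuity correction for \(S > 0\) (conventions vary across references):

```python
# Standardize S under the null and compute a two-sided normal p-value,
# using the S and Var(S) values reported for the SF tide gauge data.
import math

S, var_S = 921, 224875
z = (S - 1) / math.sqrt(var_S)               # continuity correction for S > 0
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p = 2 * (1 - Phi(|z|))
# z is about 1.94, p about 0.052: borderline at the 5% level
```

So the Mann-Kendall test is right at the edge of 5% significance here, while the parametric slope test above rejected more comfortably; the two tests encode different assumptions and need not agree.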

Source: McElreath (2020, fig. 1.2)
If you conduct multiple statistical tests, you must account for all of these in the p-value computation and assessment of significance.

For example: an appropriately applied test at a 5% significance level has a 5% Type I error rate. But across 100 independent tests, the probability of at least one Type I error is 99.4%!
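The arithmetic behind that 99.4% figure, plus the standard Bonferroni fix (dividing \(\alpha\) by the number of tests), in a few lines:

```python
# Family-wise error rate across m independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^m.
alpha, m = 0.05, 100
fwer = 1 - (1 - alpha) ** m                 # ~0.994 for 100 tests

# Bonferroni correction: test each hypothesis at alpha / m instead.
fwer_bonferroni = 1 - (1 - alpha / m) ** m  # back below alpha (~0.049)
```

The correction buys family-wise control at the cost of power per individual test, which is one reason multiple-testing adjustments are themselves a design decision.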

Elton John Results Section Meme
Source: Richard McElreath
Examining and testing the implications of competing models is important, including “null” models!

Use computing for simulation:
Wednesday: Introduction to Simulation and Random Sampling
Friday: Sampling and Monte Carlo
Next Week: The Bootstrap
HW4 assigned, due 3/27.