Generalized Linear Models


Lecture 09

February 10, 2026

Review

Logistic Regression

  • Used to model probabilities for classification.
  • Two-class (e.g., success or failure):

\[\begin{aligned} y_i &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \sum_j \beta_j x^i_j \end{aligned}\]

where \(\text{logit}(p) = \log\left(p / (1 - p)\right)\); its inverse, \(\text{logit}^{-1}(\eta) = e^\eta / (1 + e^\eta)\), maps the linear predictor back to a probability.

Multi-Class Logistic Regression

Treat one class as a reference. Each other class \(c\) has its own intercept \(\beta_0^c\) and coefficients \(\beta^c\).

\[p(Y = c | X = \mathbf{x}) = \frac{\exp\left(\beta_0^c + \mathbf{x} \cdot \beta^c \right)}{\sum_{c'} \exp\left(\beta_0^{c'} + \mathbf{x} \cdot \beta^{c'}\right) }\]
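A minimal sketch of this computation in Julia (the function name and argument layout are illustrative, not from the lecture code): with the reference class's intercept and coefficients fixed to zero, the class probabilities are a softmax over the per-class linear predictors.

# β0: vector of per-class intercepts; B: coefficient matrix with one column per class
# (the reference class's entry of β0 and column of B are fixed to zero)
function multiclass_probs(x, β0, B)
    η = β0 .+ B' * x    # linear predictor for each class
    p = exp.(η)
    return p ./ sum(p)  # softmax: normalize so the probabilities sum to 1
end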

Interpretation of Coefficients

\(\exp(\beta^c_j)\) gives the relative change in the odds of class \(c\) (versus the reference class) from a unit change in \(x_j\).

Logistic Regression Example

Forecasting Precipitation

Code
using CSV, DataFrames, Plots

# read in data; each year is a row and each column is a day
dat = CSV.read("data/weather/snoqualmie_falls.txt", DataFrame; delim=" ", ignorerepeated=true, silencewarnings=true, skipto=2)
years = 1948:1983
# the last "day" of each non-leap year is NA, so need to skip over these
days_per_year = repeat([366, 365, 365, 365], outer=Int(ceil(length(years) / 4)))
snoq = zeros(sum(days_per_year))
for i = 1:length(years)
    # need to use values() to get the vector of elements in the row; this is a quirk of DataFrames.jl
    snoq[1+sum(days_per_year[1:i-1]):sum(days_per_year[1:i])] .= values(dat[i, 1:days_per_year[i]])
end
# want to arrange dataframe with today's precipitation (predictor) and tomorrow's (prediction)
snoq_dat = DataFrame(today=snoq[1:end-1], tomorrow=snoq[2:end])
p1 = histogram(snoq_dat.today, xlabel="Precipitation (1/100 inch)", ylabel="Count", label=false)
p2 = scatter(snoq_dat.today, snoq_dat.tomorrow, xlabel="Precip Today (1/100 inch)", ylabel="Precip Tomorrow (1/100 inch)", markersize=2, alpha=0.2, label=false)
plot(p1, p2, layout=(1, 2), size=(1200, 500))

Figure 1: Snoqualmie Falls, WA precipitation dataset.

Forecasting Model

Let’s try to forecast when there will be non-zero precipitation (\(Y=1\)) vs. zero precipitation (\(Y=0\)) from the previous day’s precipitation.

\[\text{logit}(p_i) = \text{logit}(p(Y_i = 1)) = \beta_0 + \beta_1 x_i\]

Maximum Likelihood Estimation

using Distributions, Optim

invlogit(x) = exp(x) / (1 + exp(x)) # define inverse logit
function logreg_loglik(params, y, x)
    b0, b1 = params
    p = invlogit.(b0 .+ b1 * x) # logistic regression: probability from linear predictor
    ll = sum(logpdf.(Bernoulli.(p), y)) # Bernoulli log-pdf gives the log-likelihood
    return ll
end

lb = [-1.0, -1.0]
ub = [1.0, 1.0]
p0 = [0.0, 0.0]
optim_out = Optim.optimize(p -> -logreg_loglik(p, snoq_dat.tomorrow .> 0, snoq_dat.today), lb, ub, p0)
mle = round.(optim_out.minimizer; digits=2)
2-element Vector{Float64}:
 -0.44
  0.05

Visualizing Results

Code
x_pred = 0:0.1:maximum(snoq_dat.today)
p_pred = invlogit.(mle[1] .+ mle[2] * x_pred)
plot(x_pred, p_pred, linewidth=3, label="Modeled Probability", xlabel="Precip Today (1/100 inch)", ylabel="Positive Precip Tomorrow?", size=(1000, 400))
scatter!(snoq_dat.today, snoq_dat.tomorrow .> 0, alpha=0.2, label="Data")

Figure 2: Modeled probability of precipitation tomorrow as a function of precipitation today.

Calibration

Does the predicted CDF \(F(y)\) align with the “true” distribution of observations \(y\)? \[\mathbb{P}(y \leq F^{-1}(\tau)) = \tau \qquad \forall \tau \in [0, 1]\]

In this case: does the model produce the “right” probabilities of events?

Example: Calibration

We have a total of 6,919 days with precipitation. To see the expected number from the model, sum up the probabilities:

p_fit = invlogit.(mle[1] .+ mle[2] * snoq_dat.today)
round(sum(p_fit); digits=0)
7007.0

We’re off by about 1.3%. Is that good?

Example: Calibration

How well calibrated is the prediction for a no-precipitation day followed by a precipitation day?

The actual fraction:

emp_pnp = mean(snoq_dat[snoq_dat.today .== 0, :tomorrow] .> 0)
round(emp_pnp; digits=3)
0.287

The modeled fraction:

mod_pnp = mean(p_fit[snoq_dat.today .== 0])
round(mod_pnp; digits=3)
0.392

Potential Model Improvements

What might we try to improve this forecast?

Generalized Linear Models

Linear vs. Logistic Regression

Linear Regression: Model the conditional expectation function \[\mu(x) = \mathbb{E}[Y | X=\mathbf{x}] \approx \beta_0 + \sum_i \beta_i x_i.\]

Logistic Regression:

\[\mu(x) = \mathbb{E}[Y | X=\mathbf{x}] = \mathbb{P}[Y = 1 | X = \mathbf{x}].\]

We transform the conditional expectation to be linear using a link function \(g\) (here \(g(x) = \text{logit}(x)\)):

\[g(\mu(x)) = \eta(x) \approx \beta_0 + \sum_i \beta_i x_i.\]

Choice of Distribution

Many possible distributions depending on the problem at hand. Common ones:

  • Binomial: integer count from 0 to some upper limit \(n\): \[y_i \sim \text{Binomial}(n, p_i)\]
  • Poisson: counts with no fixed upper limit, but a low base probability of “success”: \[y_i \sim \text{Poisson}(\lambda_i)\]
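For reference (a general fact about GLMs, not stated on this slide), the canonical link for each of these distributions:

\[g(\mu) = \text{logit}(\mu) \;\; \text{(Binomial)}, \qquad g(\mu) = \log(\mu) \;\; \text{(Poisson)}.\]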

Poisson Regression Example

Modeling Counts

Code
fish = CSV.File("data/ecology/Fish.csv") |> DataFrame
fish[1:3, :]
3×6 DataFrame
 Row │ fish_caught  livebait  camper  persons  child  hours
     │ Int64        Int64     Int64   Int64    Int64  Float64
─────┼───────────────────────────────────────────────────────
   1 │           0         0       0        1      0   21.124
   2 │           0         1       1        1      0    5.732
   3 │           0         1       0        1      0    1.323

What Distribution?

Count data can be modeled using a Poisson or negative binomial distribution.

Figure 4: Poisson
Figure 5: Negative Binomial

What Distribution?

Let’s start with a Poisson.

One property of \(y \sim \text{Poisson}(\lambda)\): \(\mathbb{E}(y) = \text{Var}(y) = \lambda\).

  • Mean: 3.3
  • Variance: 135.4
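These values can be checked directly (a quick sketch, assuming the fish DataFrame loaded above):

using Statistics

# sample mean and variance of the catch counts
mean(fish.fish_caught), var(fish.fish_caught) # ≈ (3.3, 135.4)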

What Does This Mean?

  • The data are overdispersed relative to a Poisson distribution.
  • But a Poisson GLM may still give rise to overdispersed data, since every observation has a different \(\lambda_i\) (and hence a different mean and variance).

Figure 6: Fishing Data

Be Cautious…

  • Over-relying on plots like histograms of the raw data can lead to mis-specifying the GLM.
  • Think of linear regression: the question is not whether the data are Gaussian, but whether the residuals are.

Brain on Regression Meme

Source: Richard McElreath

Model Specification

We might hypothesize that the number of fish caught is influenced by the number of children (noisy, distracting, impatient) and whether the party brought a camper.

\[\begin{align*} y_i &\sim \text{Poisson}(\lambda_i) \\ f(\lambda_i) &= \beta_0 + \beta_1 \text{child}_i + \beta_2 \text{camper}_i \end{align*} \]

\(\lambda_i\): positive “rate”

\(f(\cdot)\): link function mapping positive reals (rate scale) to all reals (linear model scale); here \(f = \log\)

Fitting the Model

function fish_model(params, child, camper, fish_caught)
    β₀, β₁, β₂ = params
    λ = exp.(β₀ .+ β₁ * child + β₂ * camper) # log link: exp maps the linear predictor to a positive rate
    return sum(logpdf.(Poisson.(λ), fish_caught)) # Poisson log-likelihood
end

lb = [-100.0, -100.0, -100.0]
ub = [100.0, 100.0, 100.0]
init = [0.0, 0.0, 0.0]

optim_out = optimize(θ -> -fish_model(θ, fish.child, fish.camper, fish.fish_caught), lb, ub, init)
θ_mle = optim_out.minimizer
@show round.(θ_mle; digits=2);
@show round.(exp.(θ_mle); digits=2);
round.(θ_mle; digits = 2) = [0.91, -1.23, 1.05]
round.(exp.(θ_mle); digits = 2) = [2.48, 0.29, 2.87]

Interpretation of Coefficients

Since \(\log(\lambda_i) = \sum_j \beta_j x^i_j\):

\(\exp(\beta_j)\) says how much a unit increase in \(x_j\) multiplies the rate (hence expected value).
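To see why, compare the rates at \(x_j\) and \(x_j + 1\) with the other predictors held fixed:

\[\frac{\lambda(x_j + 1)}{\lambda(x_j)} = \frac{\exp\left(\beta_j (x_j + 1) + \sum_{k \neq j} \beta_k x_k\right)}{\exp\left(\beta_j x_j + \sum_{k \neq j} \beta_k x_k\right)} = \exp(\beta_j).\]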

So \(\beta_1 \approx -1.2\) says that each extra child multiplies the expected number of fish caught by \(\exp(-1.23) \approx 0.3\), i.e., a reduction of about 70%.

Predictive Checks To Evaluate Model

  • Linear regression: model checking was straightforward; the residuals should be (approximately) Gaussian.
  • GLMs: less clear how to check distributional assumptions.

Strategy: check the predictive distribution of relevant summary statistic(s) by simulation; does it capture the empirical statistic?

Predictive Checks

In other words, for some statistic \(T\):

  • Estimate \(\hat{T}\) from the data;
  • Simulate synthetic datasets using your model and calculate \(\tilde{T}_n\) for each one;
  • Examine the distribution of the \(\tilde{T}_n\) to see where \(\hat{T}\) falls.

Teaser: This is closely related to the idea of a \(p\)-value…

Checking Proportion of Zero Fish

Figure 7: Predictive distribution for fitted fish model.
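A minimal sketch of this check (assuming θ_mle and the fish DataFrame from above; the number of simulations is illustrative, not the exact lecture code):

using Distributions, Statistics

# rates implied by the fitted coefficients
λ_hat = exp.(θ_mle[1] .+ θ_mle[2] * fish.child + θ_mle[3] * fish.camper)

# simulate 1,000 synthetic datasets; record the fraction of zero-fish trips in each
zero_frac = [mean(rand.(Poisson.(λ_hat)) .== 0) for _ in 1:1_000]

# compare the empirical fraction of zeros to the simulated distribution
mean(fish.fish_caught .== 0), quantile(zero_frac, [0.05, 0.95])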

What Was The Problem?

  • The model produces far fewer zero-fish counts than the data contain.
  • This is typical of environmental count data, which often have far more zeros than a Poisson regression would predict.
  • Alternative: use a zero-inflated model (sketched below).
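As an illustration (a sketch of the zero-inflated Poisson log-likelihood, not a model fit in lecture): with probability \(p_0\) an observation is a “structural” zero, and otherwise it is drawn from the Poisson.

using Distributions

# zero-inflated Poisson log-likelihood (illustrative sketch)
function zip_loglik(y, λ, p₀)
    ll = 0.0
    for (yᵢ, λᵢ) in zip(y, λ)
        if yᵢ == 0
            # a zero can come from the zero-inflation component or the Poisson
            ll += log(p₀ + (1 - p₀) * pdf(Poisson(λᵢ), 0))
        else
            ll += log(1 - p₀) + logpdf(Poisson(λᵢ), yᵢ)
        end
    end
    return ll
end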

Key Points

GLMs

  • Generalized linear models: replace the Gaussian with a distribution suited to the data, with parameters that vary with the predictors.
  • Require a choice of distribution, a link function, and a linear model.
  • Lots of different choices of distributions — think about structure of the data and generative properties of the distributions.
  • Simulation is valuable for checking model fit.

Upcoming Schedule


Friday: Quiz 1 for first 25 minutes.

Monday: No class!

Next Week: Modeling extreme values.