Modelling Count Data in R: A Multilevel Framework
A quick practical guide to Poisson, zero-inflated and negative binomial mixed models
I put this guide together as a practical reference for modelling count data in a multilevel framework in R. Many of the datasets I work with, especially in human mobility and migration research, involve outcomes that are counts, strongly right-skewed, often overdispersed, and in some cases zero-inflated. That combination makes model choice important.
The aim of the guide is not to be exhaustive. It is meant to be a quick starting point for people who want to estimate count models in R and need a clear path through the main options and the functions that support them.
What the guide covers
The guide walks through four commonly used model variants:
- Poisson
- Zero-inflated Poisson
- Negative binomial
- Zero-inflated negative binomial
The examples are framed in a hierarchical modelling setting, using random intercepts to account for grouped observations. The implementation focuses mainly on glmmTMB(), while also pointing to related functions in lme4, such as glmer() and glmer.nb().
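As a rough illustration of the random-intercept setup described above, the sketch below fits the same Poisson model with glmmTMB() and glmer(), then a negative binomial version with glmer.nb(). It assumes the Owls data shipped with the glmmTMB package. The single FoodTreatment covariate and the object names are my own illustrative choices, not necessarily the specification used in the notebook.

```r
library(glmmTMB)
library(lme4)

# Owls: sibling negotiation counts, with repeated observations per nest
data(Owls, package = "glmmTMB")

# Poisson model with a random intercept for each nest (glmmTMB)
m_pois_tmb <- glmmTMB(SiblingNegotiation ~ FoodTreatment + (1 | Nest),
                      family = poisson, data = Owls)

# The same model in lme4; the formula syntax is identical
m_pois_lme4 <- glmer(SiblingNegotiation ~ FoodTreatment + (1 | Nest),
                     family = poisson, data = Owls)

# Negative binomial counterpart in lme4 (may be slow or emit warnings)
m_nb_lme4 <- glmer.nb(SiblingNegotiation ~ FoodTreatment + (1 | Nest),
                      data = Owls)

# glmmTMB keeps fixed effects per model component; $cond is the count part
fixef(m_pois_tmb)$cond
fixef(m_pois_lme4)
```

The two Poisson fits should give very similar fixed-effect estimates, which is a handy sanity check when switching between packages.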
Why glmmTMB()?
The guide gives particular attention to glmmTMB() because it offers a useful combination of speed, flexibility and a syntax that feels familiar if you already use lme4. That makes it especially helpful when moving beyond standard Poisson specifications into zero-inflated and negative binomial models.
In the notebook, I use the owl chick negotiation dataset (Owls) included with the glmmTMB package to illustrate how these models are specified and interpreted. The examples show how to:
- define a mixed-effects count model
- include offsets
- estimate zero-inflation explicitly
- compare negative binomial parameterisations
- think about overdispersion in practice
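The steps in this list can be sketched roughly as follows. This is a minimal example on the Owls data, not the notebook's exact code: the covariates, the intercept-only zero-inflation formula and the object names are assumptions made for illustration. In glmmTMB, nbinom1 gives a variance that grows linearly with the mean and nbinom2 a variance that grows quadratically with it.

```r
library(glmmTMB)

data(Owls, package = "glmmTMB")

# Baseline: Poisson with a nest-level random intercept and an offset,
# so the model describes negotiation per chick rather than per brood
m_pois <- glmmTMB(SiblingNegotiation ~ FoodTreatment + ArrivalTime +
                    offset(log(BroodSize)) + (1 | Nest),
                  family = poisson, data = Owls)

# Zero-inflated Poisson: ziformula = ~ 1 adds one constant
# probability of a structural zero
m_zip <- update(m_pois, ziformula = ~ 1)

# Negative binomial, two parameterisations
m_nb1 <- update(m_pois, family = nbinom1)  # variance linear in the mean
m_nb2 <- update(m_pois, family = nbinom2)  # variance quadratic in the mean

# Zero-inflated negative binomial
m_zinb2 <- update(m_zip, family = nbinom2)

# Side-by-side comparison; lower AIC indicates a better trade-off
aic_table <- AIC(m_pois, m_zip, m_nb1, m_nb2, m_zinb2)
aic_table
```

AIC is only one possible yardstick here. Residual diagnostics and subject knowledge about where the zeros come from should also inform the choice.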
Why this matters
Count data show up everywhere in applied research, but the default choice, a plain Poisson model, assumes that the variance equals the mean and that observations are independent, which real-world data rarely satisfy. Overdispersion, excess zeros and repeated observations within groups can all lead to misleading inference if they are not handled carefully.
This is why I wanted a short, practical guide that could help people move quickly from a basic Poisson model to more appropriate alternatives when the data demand it.
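One rough way to see whether a basic Poisson model is adequate is to check for overdispersion and excess zeros directly. The sketch below does this on the Owls data from the glmmTMB package. The model formula and object names are illustrative assumptions, and both checks are informal heuristics rather than formal tests.

```r
library(glmmTMB)

data(Owls, package = "glmmTMB")

# Poisson mixed model used as the reference fit
m_pois <- glmmTMB(SiblingNegotiation ~ FoodTreatment + ArrivalTime +
                    offset(log(BroodSize)) + (1 | Nest),
                  family = poisson, data = Owls)

# Overdispersion: sum of squared Pearson residuals over residual
# degrees of freedom; values well above 1 suggest overdispersion
pearson_resid <- residuals(m_pois, type = "pearson")
dispersion_ratio <- sum(pearson_resid^2) / df.residual(m_pois)

# Excess zeros: observed share of zeros versus the share the
# fitted Poisson model expects on average
observed_zero_share <- mean(Owls$SiblingNegotiation == 0)
expected_zero_share <- mean(dpois(0, lambda = fitted(m_pois)))

dispersion_ratio
c(observed = observed_zero_share, expected = expected_zero_share)
```

A dispersion ratio far above 1, or many more observed zeros than expected, points towards the negative binomial or zero-inflated variants.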
Read the full guide
You can read the full notebook here:
Modelling Count Data in R: A Multilevel Framework
At the time, I also shared the guide in a short thread:
Been modelling count data in a multilevel framework using #rstats & decided to put together this quick guide to sort my thoughts & get a list of common model specs, packages & approaches
Link to the code if you find it useful: https://t.co/xRhd0Lo9hu pic.twitter.com/xcD7qC3h7s
— Francisco (@Fcorowe) January 22, 2021
Suggested citation
Francisco Rowe (2021-01-11). Modelling Count Data in R: A Multilevel Framework. Francisco Rowe. https://franciscorowe.com/post/2021-02-08-count_data_modelling/
BibTeX
@online{rowe202120210208countdatamodelling,
author = {Francisco Rowe},
title = {Modelling Count Data in R: A Multilevel Framework},
year = {2021},
date = {2021-01-11},
url = {https://franciscorowe.com/post/2021-02-08-count_data_modelling/}
}