Fitting Logistic Regression with Aggregate Data in R
Example Using Simulated Migration Data
I often get asked: can I use a logistic regression on aggregate data?
We generally associate logistic regression with individual-level (micro) data in which an outcome is coded as a binary category, such as moving (1) or not moving (0). However, logistic regression can also be fitted to aggregate data. In human mobility and migration research, for example, we are often interested in the proportion, or probability, of people moving from an origin to a destination, and we can use a logistic regression to identify the factors related to this probability. These models were popularised by Daniel McFadden's work on discrete choice modelling, for which he was awarded the Nobel Prize in Economic Sciences in 2000. McFadden developed a very elegant formulation within the framework of random utility maximisation and later applied it to the study of travel choices.
Recently I wrote a computational notebook in R to illustrate three ways of estimating a logistic regression model, using individual-level data and aggregate data, that all lead to the same model estimates (see the tweet and the link to the notebook below).
Opening short 🧵
— Francisco (@Fcorowe) September 29, 2021
I was recently asked if estimating a logistic regression based on aggregate data was possible. This is something I regularly get asked.
This time I thought I better document my answer. The result👇🏼
code ➡️ @github repo https://t.co/7OXFlSvTI1 #RStats pic.twitter.com/qd6kh22nCB
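To make the idea concrete, here is a minimal sketch of how the same logistic regression can be fitted in three ways with base R's `glm()`. It is not the code from the notebook: the simulated data, the variable names (`age_group`, `tenure`, `moved`, `count`, `stayed`, `prop`) and the coefficient values are all assumptions chosen for illustration. The three fits are (1) individual-level data with a 0/1 outcome, (2) aggregate data with a two-column response of movers and stayers, and (3) aggregate data with the proportion of movers as the response and the group size as weights.

```r
set.seed(123)

# Simulated individual-level data (hypothetical variables and effects):
# moved = 1 if the person migrated, 0 otherwise.
n <- 5000
ind <- data.frame(
  age_group = factor(sample(c("18-29", "30-49", "50+"), n, replace = TRUE)),
  tenure    = factor(sample(c("owner", "renter"), n, replace = TRUE))
)
eta <- -1.5 -
  0.6 * (ind$age_group == "30-49") -
  1.2 * (ind$age_group == "50+") +
  0.8 * (ind$tenure == "renter")
ind$moved <- rbinom(n, size = 1, prob = plogis(eta))

# 1) Individual-level data: binary 0/1 response
fit_ind <- glm(moved ~ age_group + tenure, family = binomial, data = ind)

# Aggregate to one row per covariate pattern:
# number of movers (moved) and group size (count)
ind$count <- 1L
agg <- aggregate(cbind(moved, count) ~ age_group + tenure, data = ind, FUN = sum)
agg$stayed <- agg$count - agg$moved   # number of non-movers
agg$prop   <- agg$moved / agg$count   # proportion of movers

# 2) Aggregate data: two-column response (successes, failures)
fit_counts <- glm(cbind(moved, stayed) ~ age_group + tenure,
                  family = binomial, data = agg)

# 3) Aggregate data: proportion as response, group size as weights
fit_prop <- glm(prop ~ age_group + tenure,
                family = binomial, weights = count, data = agg)

# Compare the estimated coefficients side by side
round(cbind(individual = coef(fit_ind),
            counts     = coef(fit_counts),
            proportion = coef(fit_prop)), 4)
```

The three columns of coefficients should agree up to numerical tolerance, as should the standard errors. The residual deviance and degrees of freedom will differ between the individual-level and aggregate fits, because the saturated model they are compared against is not the same. Note that aggregation of this kind only works when the covariates are categorical (or shared by everyone in a group), so that individuals can be collapsed into covariate patterns.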