During non-pandemic times, commuters in many major metropolitan areas spend a lot of time sitting in traffic. In 2019, the last pre-pandemic year, commuters in the San Francisco Bay Area spent an average of 47 hours waiting in traffic: almost two whole days of each commuter’s life sacrificed to gridlock.
State and regional road planners try to engineer traffic flows to avoid this, but they frequently run into informational roadblocks.
“Statistics capturing human mobility are expensive to obtain and deteriorate quickly as existing mobility patterns change and new ones emerge. Planners have been relying on U.S. Census LODES data, which explicitly captures only commuting trips, and seems unsatisfying because only some 16.6% of all vehicle trips are work-related,” report several researchers with San Jose State University’s Mineta Transportation Institute in a fact sheet accompanying their new study, which seeks to improve transportation data by adding more of it.
The study looked to another, unexpected source of data to try to fill in those gaps: the social network Twitter.
Twitter is probably most famous for users’ hot takes and digital outrage mobs. But professors at San Jose State, Hunter College, and the University of Salzburg, as well as a few student researchers, became interested in it for an entirely different reason: Many Tweets also have geolocation tags, and those dots can be connected.
If enough people in a region tweet at various points throughout the day, researchers can track those locations and get a better idea of surface transportation movement, also known as road trips.
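To make the idea of connecting the dots concrete, here is a minimal sketch in Python. It is not the Mineta researchers' method. The function names, the input format (user, Unix timestamp, latitude, longitude), and the distance and time thresholds are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def infer_trips(tweets, min_km=1.0, max_hours=4.0):
    """Pair each user's consecutive geotagged tweets into candidate trips.

    `tweets` is an iterable of (user, unix_seconds, lat, lon) tuples.
    A pair counts as a trip only if the two tweets are at least `min_km`
    apart and no more than `max_hours` apart (both thresholds are
    illustrative assumptions, not values from the study).
    """
    by_user = defaultdict(list)
    for user, ts, lat, lon in tweets:
        by_user[user].append((ts, lat, lon))

    trips = []
    for user, points in by_user.items():
        points.sort()  # chronological order per user
        for (t1, la1, lo1), (t2, la2, lo2) in zip(points, points[1:]):
            hours_apart = (t2 - t1) / 3600
            if hours_apart <= max_hours and haversine_km(la1, lo1, la2, lo2) >= min_km:
                trips.append((user, (la1, lo1), (la2, lo2)))
    return trips


# Hypothetical sample: one user tweets in San Francisco, then Oakland an hour later.
sample = [
    ("a", 0, 37.7749, -122.4194),
    ("a", 3600, 37.8044, -122.2712),
]
print(infer_trips(sample))
```

Counting such origin-destination pairs across many users would give a rough picture of where and when people move, which is the kind of signal the study set out to extract.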
What Mineta researchers found in the Twitter geolocation data was a treasure trove of information, at least initially.
“Approximately 33 million georeferenced Bay Area tweets were harvested for the study period from 2010 until early 2020,” the researchers report.
From those tweets, they found that the social network’s data captured far more movement on roads than the U.S. Census LODES data. In fact, this new dataset was “suitable to capture the over 80% of non-commuting trips that keep our roads busy.”
However, Mineta researchers ran into some discouraging limitations with the Twitter data, and they also found good news about the transportation data that planners had been relying on.
They found that Twitter data “is not suitable for characterizing real-time or short-term commuting patterns,” for one. They also found that a change Twitter made to how it tracks tweets in 2015 “resulted in a dramatic reduction in available [data].”
The good news was that Mineta researchers also found that “LODES data are an excellent substitute for overall transportation demand” and that the social network data and the government census data are “complementary.” That means they can help give planners a fuller picture of what’s happening on the streets and that the points where they diverge can act as red flags to investigate further.
This research also points to future research horizons that others are bound to explore. Other social networks track their users’ locations, which could serve as a huge source of information. Many other sites and devices could yield additional data as well.
One potential barrier to this research is privacy concerns. Under pressure from users, many of whom would rather not have their movements minutely tracked, Twitter changed how, and how often, it records location data.
Even optional geotagging proved so unpopular that the social network phased it out in 2019, though it still allowed geotagging in limited camera functions.
“Most people don’t tag their precise location in Tweets, so we’re removing this ability to simplify your Tweeting experience,” the Twitter Support account explained in a tweet that would likely have been geotagged to Twitter headquarters.

