Data Science in Health Care

Freethinking data scientists are venturing far beyond the doctor’s office, integrating data from mobile devices and sensors, weather and GIS, nature and history.


  • Google Flu Trends was detecting flu outbreaks ahead of CDC reports as early as 2009.
  • Adam Sadilek’s nEmesis project used comparable machine learning methods to identify outbreaks of food poisoning.
  • In Rachelle Chong’s 2013 article, Big Predictions for Big Data Impact on Public Health, Dr. Jennifer Olsen notes that big-data sources have reduced the time needed to detect a pandemic from 167 days (1996) to 23 days (2009).
  • In an interview in late 2013, Dr. Munmun de Choudhury described how Microsoft Research is analyzing social activity to pinpoint signals of depression. Read more about the privacy debate and de Choudhury’s predictions for the future in: Data Story: How Microsoft Research is Using Social Data to Understand Depression.
  • In his 2013 report from an Esri Health GIS Conference, Andy Oram saw Esri mapping solutions being used to, among other things:
    • Pinpoint incidence clusters: Data scientists found that babies in Louisiana are more likely to be born with low birth weights in locations with particular demographics (e.g., housing projects).
    • Improve quality of care: A VA Hospital is using RFID tags and GIS data to monitor equipment failures and track accident occurrences.
    • Flag unusual patterns: “In Louisiana, for instance, plotting the instances of certain diseases produced a pattern over a particular waterway that they deduced to be contaminated.”
  • Factoring a person’s location into their health has important implications. In genetic databases, 35 to 50 percent of disease causes are listed as “unknown,” which really means “environmental.” That’s a lot of overlooked information that could sharpen diagnoses. Esri wants to change this. The company is tapping into sources such as the EPA’s Toxic Release Inventory to try to pin down environmental factors. With the My Place History app, users can now link their personal place history to public EPA data.
  • McKinsey report, The Big Data Revolution in U.S. Health Care
  • Last, but not least, comes the crossover story of the Juniper Pollen Project. Funded by NASA, the project is a collaborative effort between the USA National Phenology Network and several universities in Arizona, New Mexico and Texas. The aim is to improve predictions of juniper pollen release and issue allergy and asthma warnings to people prone to severe allergic reactions. As of 2013, the project draws on three main sources of data to issue predictions:
    • NASA satellite data: Each year, scientists use NASA’s high-quality imagery to monitor the juniper canopy and watch for the moment of pollen release.
    • Dust storm models: To predict the pollen’s dispersal, researchers then employ models adapted from the University of Arizona’s work on tracking dust storms. These models use real-time weather data to give a clearer picture of the direction and speed of “pollen storms.”
    • Field verification: Local on-the-ground volunteers observe the development of pollen cones to record the timing of pollen release. Their data, combined with pollen samples from six strategic locations, help fine-tune the models. Check out Norene Griffin’s summary of the project and interviews with researchers at: NASA Meets Public Health on the Juniper Pollen Project.