U-M senior’s COVID-19 data model reaches CDC

August 25, 2020
Contact: Jessica Jimenez jjojimnz@umich.edu,
Morgan Sherburne morganls@umich.edu

For Sabrina Corsetti, the pandemic presents an interesting problem: a data problem, that is. Her effort to model the pandemic’s spread using a machine learning algorithm is now among the models aggregated for the CDC’s weekly projections.

Sabrina Corsetti, 2020 Goldwater Scholar, developed a data model that leverages machine learning to predict COVID-19 cases and deaths for the United States up to 40 days in advance.

Corsetti, a senior majoring in physics and mathematics, had her previous research halted when the University of Michigan suspended in-person classes and labs back in March. Thomas Schwarz, one of Corsetti’s research professors, happened to be modeling the pandemic’s data and included her in the project.

Under Corsetti’s direction, the small analysis has developed into a full, data-driven research project. Corsetti says that, while news coverage first drew her interest to the pandemic, it was the number of “unknowns” surrounding the data that inspired her to dive deeper.

“At the beginning, we didn’t know the scope or end goal, but we realized that the simple epidemiological models weren’t carrying us like we needed,” she said. “But then I came across a paper about applying machine learning to epidemiology, and I worked off of that to build better predictions based on the data alone and without any external assumptions.”

Corsetti’s new model uses ridge regression, a machine learning technique that fits a regularized best-fit curve to past data, to project future COVID-19 cases and deaths for the United States. The model can currently project up to 40 days ahead, using a method that centers a spectrum of predictions around a single optimal projection.
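For readers curious what a ridge regression forecast can look like in practice, here is a minimal sketch in Python. It is not the U-M team's code: the lagged-window setup, the function names (`make_lagged`, `ridge_fit`, `forecast`), the parameter values and the synthetic input data are all assumptions made for illustration.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags values preceding y."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weights w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def forecast(series, n_lags=7, horizon=40, alpha=1.0):
    """Project `horizon` steps ahead by feeding each prediction back in."""
    X, y = make_lagged(series, n_lags)
    w, b = ridge_fit(X, y, alpha)
    history = list(np.asarray(series, dtype=float))
    for _ in range(horizon):
        window = np.array(history[-n_lags:])
        history.append(float(window @ w + b))
    return np.array(history[len(series):])


if __name__ == "__main__":
    # Synthetic, made-up daily counts growing linearly; not real COVID-19 data.
    days = np.arange(60)
    fake_cases = 100.0 + 5.0 * days
    projection = forecast(fake_cases, n_lags=7, horizon=40, alpha=1e-3)
    print(projection[:5])
```

The penalty term `alpha` is what separates ridge regression from ordinary least squares: it shrinks the fitted weights, which keeps the fit stable when the inputs are strongly correlated, as consecutive daily counts are.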

The COVID-19 Forecast Hub aggregates data models from across the nation into a weekly prediction for the CDC.

The model’s greatest value comes from its contribution to the COVID-19 Forecast Hub, an initiative run by researchers at the University of Massachusetts. The hub combines data models from more than 30 international research groups to produce a stronger forecast than any single model offers.
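The article does not say how the hub combines the submitted models. As a rough illustration of the general idea only, the sketch below takes the element-wise median across several forecasts. The median rule, the function name `ensemble_median`, the model names and the numbers are all hypothetical.

```python
import numpy as np


def ensemble_median(forecasts):
    """Combine per-model forecasts (dict of name -> sequence) via the element-wise median."""
    stacked = np.vstack([np.asarray(f, dtype=float) for f in forecasts.values()])
    return np.median(stacked, axis=0)


# Hypothetical model names and made-up weekly projections, for illustration only.
forecasts = {
    "model_a": [1000, 1100, 1250, 1400],
    "model_b": [950, 1050, 1150, 1300],
    "model_c": [1200, 1500, 1900, 2400],
}
combined = ensemble_median(forecasts)
print(combined)
```

A median is less sensitive than a mean to a single model that runs far above or below the rest, which is one reason a combined forecast can be steadier than any one of its members.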

Through its contribution to the hub, Corsetti’s model has been included in the CDC’s public prediction database. Most recently, General Motors has shown interest in the model for potential supply chain studies. The model’s ability to project more than five weeks ahead is what allows organizations and companies like GM to plan mitigation efforts in advance of a potential outbreak.

Corsetti and Schwarz developed the U-M model that joins the ensemble collected by the COVID-19 Forecast Hub.

“The model can confirm the trajectory and give you a bit of an edge,” Corsetti said. “If the area’s curve is twisting slightly, you can track when cases are escalating or mitigation efforts are taking effect. You can get definite early predictions.”

The next step for Corsetti and Schwarz will be developing a user-friendly website to report local COVID-19 data for students and staff at the university. They believe their efforts are important to the U-M community.

Schwarz says that making the data accessible will enable students to understand the pandemic’s trajectory and help inform their decisions.

“When you look at the data, it’s pretty obvious when there’s going to be an increase or decrease in cases, and you can see the hot spots,” he said. “Once you look at the data on a daily basis, you’ll see which places are having trouble.”

Such a website or mobile app would also allow students to report locations of potential cases anonymously, which benefits those who fear being associated with a reported outbreak.

The biggest challenge for Corsetti and Schwarz will be in gaining public awareness and support for their efforts. They have already begun contacting local and campus newspapers.

The CDC’s weekly predictions are a product of an ensemble collected by the COVID-19 Forecast Hub.

“The challenge will be in communicating to the public. Physicists just don’t have a lot of venues for communication when it comes to large groups of people,” Schwarz said, jokingly.

Corsetti hopes to continue applying her love of research and data analysis to create social impact after her senior year. In addition to creating the COVID-19 data model, she has spent this summer working for the National Renewable Energy Laboratory, using data science to detect and prevent common types of attacks on the U.S. electrical grid. She intends to work full-time in renewable energy research and development.

“I really enjoy computer science and want to use what I’ve learned with my previous research to pursue more opportunities in applied research,” she said. “I look forward to continuing this covid project and working with renewable energy.”

Corsetti is one of four 2020 Goldwater Scholars, a group of highly qualified STEM students with plans to pursue a Ph.D. and research in those fields. Ella McCauley, a high school senior from South Lyon, Michigan, has also contributed to this project, leading the analysis of testing data. Schwarz is an experimental particle physicist and associate professor. His current research focuses on discovering new physics in high-energy collisions at the Large Hadron Collider at CERN.