A new data processing approach created by scientists at the University of Michigan Life Sciences Institute in Ann Arbor offers a simpler, faster path through the data generated by cryo-electron microscopy (cryo-EM) instruments, allowing scientists to more quickly determine the 3-D shape of cellular proteins and other molecules that have been flash-frozen in a thin layer of ice.
The new method removes a barrier to wider adoption of the technique. Advanced microscopes beam high-energy electrons through the ice while capturing thousands of videos. The videos are then averaged to create a 3-D structure of the molecule.
Researchers can use these structures to answer questions about how the molecules function in cells and how they might contribute to health and disease. For example, researchers recently used cryo-EM to reveal how a protein spike on the virus that causes COVID-19 enables it to gain entry into host cells.
While recent advances in cryo-EM technology have rapidly opened the field to new users and increased the rate at which data can be collected, researchers still face a hurdle in accessing the full potential of the technique: the complex data processing landscape required to turn the microscope’s data into a 3-D structure ready for analysis.
Before researchers can begin analyzing the 3-D structure they want to study, they have to complete a series of preprocessing steps and make subjective decisions along the way. Currently, a person must supervise these steps. Because researchers use cryo-EM to analyze a huge variety of molecule types, it’s nearly impossible to create a general set of guidelines that all researchers could follow for the steps.
“If we can create an automated pipeline for those preprocessing steps, the whole process could be much more user-friendly, especially for newcomers to the field,” says Yilai Li, a Willis Life Sciences fellow at the Life Sciences Institute.
Using machine learning, Li and his colleagues have developed such a pipeline. The new program connects deep-learning and image-analysis tools with preexisting data preprocessing algorithms to narrow data sets down to the information researchers need to begin their analysis.
“This pipeline takes the knowledge that experienced users have gained and puts it into a program that improves accessibility for users from a range of backgrounds,” says Michael Cianfrocco, assistant professor in the Life Sciences Institute as well as an assistant professor of biological chemistry at the U-M Medical School. His lab hosted the work. “It really streamlines the process stage so that researchers can jump in and focus on what’s important: the scientific questions they want to ask and answer.”
The program was published April 14 as part of a study in the journal Structure. The work was supported by the National Science Foundation and the National Institutes of Health.