Berger began her research career in algorithms before transitioning to computational molecular biology during her postdoctoral work at MIT. Her postdoctoral advisor Daniel Kleitman, a professor of applied mathematics, encouraged her to apply computational methods to protein folding after hearing Stanford biophysicist Michael Levitt speak on the subject. This pivot marked the beginning of a research program that would span more than a dozen subfields over the following three decades.
===Protein structure prediction===
Early in her career, Berger developed methods to predict which proteins would form coiled coil structures, helical bundles in which two or more amino acid chains twist around each other. Drawing on code-breaking techniques that analyze the frequency of character pairs and triplets, she identified sequence-level patterns that distinguish coiled coil proteins from others. The resulting paper and its follow-ups have been cited approximately 2,000 times and enabled downstream work such as predicting how influenza viruses bind to cell membranes. Working with Jonathan King in the MIT biology department and mathematician Peter Shor, she also studied viral capsids, the protein shells that protect viruses and facilitate cell entry. She predicted that capsid formation follows local assembly rules, with each protein subunit acting as a lock-and-key trigger for the next.
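The pair-frequency idea behind the coiled coil work can be illustrated with a minimal sketch. It is not Berger's published method: it assumes a plain log-odds score over adjacent residue pairs, and the function names and toy sequences are hypothetical, chosen only to show how pair statistics can separate two classes of sequence.

```python
import math
from collections import Counter

def pair_counts(seqs, gap=1):
    """Count residue pairs that sit `gap` positions apart."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - gap):
            counts[(s[i], s[i + gap])] += 1
    return counts

def log_odds_table(positive, background, gap=1, pseudo=1.0):
    """Log-odds of each pair in the positive set versus the background set."""
    pos = pair_counts(positive, gap)
    bg = pair_counts(background, gap)
    alphabet = sorted(set("".join(positive + background)))
    n_pairs = len(alphabet) ** 2
    # Pseudocounts keep unseen pairs from producing log(0).
    pos_total = sum(pos.values()) + pseudo * n_pairs
    bg_total = sum(bg.values()) + pseudo * n_pairs
    table = {}
    for a in alphabet:
        for b in alphabet:
            p = (pos[(a, b)] + pseudo) / pos_total
            q = (bg[(a, b)] + pseudo) / bg_total
            table[(a, b)] = math.log(p / q)
    return table

def score(seq, table, gap=1):
    """Mean log-odds over all pairs in `seq`; unknown pairs count as 0."""
    pairs = [(seq[i], seq[i + gap]) for i in range(len(seq) - gap)]
    if not pairs:
        return 0.0
    return sum(table.get(p, 0.0) for p in pairs) / len(pairs)

# Hypothetical toy data, not real training sets.
coil_like = ["LEELKKKLEELKKK", "LKELEEKLKELEEK"]
other = ["GPGSGPGSGPGS", "PGSTPGSTPGST"]
table = log_odds_table(coil_like, other)

coil_score = score("LEELKKKLEEL", table)
other_score = score("GPGSPGST", table)
```

A positive mean score indicates pair statistics closer to the positive set, a negative one closer to the background. A practical predictor would also need to account for the periodicity of the helix, which this sketch ignores.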
===Comparative and functional genomics===
In the late 1990s, Berger and her students Serafim Batzoglou and Lior Pachter, working with Broad Institute researcher Eric Lander, developed an algorithm to align the genomes of two different species. In 2000 they published the first paper on comparative genomics, demonstrating that coding regions of the human and mouse genomes are on average 80% identical, a finding that launched the subfield. Berger and Batzoglou also developed the prototype for the Arachne sequence assembler, a software tool later used extensively in assembling the first human genome. She subsequently extended this comparative work to fruit flies (Drosophila) and 18 species of yeast with student Manolis Kellis. With student Rohit Singh, she developed IsoRank, software that aligns genome sequences across species using a ranking algorithm analogous to PageRank, integrating heterogeneous data types such as protein-protein interactions and sequence alignments to identify genes with common ancestry and function across species.
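A PageRank-style ranking that combines interaction data with sequence similarity can be sketched as follows. This is a simplified illustration under stated assumptions, not the published software: the function name `pairwise_rank`, the mixing weight `alpha`, and the two toy networks are hypothetical. Each candidate pairing of a protein from one species with a protein from another receives a score that blends a sequence-similarity prior with the scores of neighbouring pairs in the two interaction networks.

```python
def pairwise_rank(adj1, adj2, seq_sim, alpha=0.6, iters=50):
    """Score every cross-species node pair by power iteration.

    adj1, adj2: undirected interaction networks as {node: [neighbours]}.
    seq_sim:    {(node1, node2): similarity}, missing pairs count as 0.
    alpha:      weight on network evidence versus the sequence prior.
    """
    nodes1, nodes2 = list(adj1), list(adj2)
    total = sum(seq_sim.get((i, j), 0.0) for i in nodes1 for j in nodes2)
    prior = {(i, j): seq_sim.get((i, j), 0.0) / total
             for i in nodes1 for j in nodes2}
    rank = dict(prior)
    for _ in range(iters):
        new = {}
        for i in nodes1:
            for j in nodes2:
                # Each neighbouring pair (u, v) spreads its score evenly
                # over all pairs formed from neighbours of u and of v.
                spread = sum(rank[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                             for u in adj1[i] for v in adj2[j])
                new[(i, j)] = alpha * spread + (1 - alpha) * prior[(i, j)]
        norm = sum(new.values())
        rank = {pair: value / norm for pair, value in new.items()}
    return rank

# Hypothetical toy networks: two three-protein chains.
adj1 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
adj2 = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
# Sequence evidence exists only for the chain ends.
seq_sim = {("a", "x"): 1.0, ("c", "z"): 1.0}

rank = pairwise_rank(adj1, adj2, seq_sim)
best = {i: max(adj2, key=lambda j: rank[(i, j)]) for i in adj1}
```

In this toy case the middle proteins have no sequence evidence at all, yet they are paired with each other because their neighbours match, which is the kind of integration of data types the paragraph describes.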
===Compressive genomics===
Together with students Po-Ru Loh and Michael Baym, Berger invented compressive genomics, a set of methods for compressing DNA sequence, drug molecule, metagenomic, and protein sequence data in such a way that existing analysis software can operate directly on the compressed representation without decompressing it first. The approach exploits the fact that genomic sequences cluster in dense groups of near-identical sequences, allowing each cluster to be summarized by a representative. Her team demonstrated that this technique could accelerate decades-old sequence-comparison algorithms by two orders of magnitude while retaining more than 99% accuracy.
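The cluster-and-representative idea can be shown with a small sketch. It assumes equal-length sequences compared by Hamming distance and a greedy clustering step, neither of which is taken from the source; the function names and sequences are hypothetical. A query is first compared against cluster representatives only, and a cluster is opened for a fine-grained search only if the triangle inequality says it could contain a match.

```python
def hamming(a, b):
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_clusters(seqs, radius):
    """Greedily assign each sequence to the first representative within `radius`."""
    clusters = []  # list of (representative, members)
    for s in seqs:
        for rep, members in clusters:
            if hamming(rep, s) <= radius:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters

def compressive_search(query, clusters, radius, threshold):
    """Return sequences within `threshold` of `query` and the comparison count."""
    hits, compared = [], 0
    for rep, members in clusters:
        compared += 1
        # A member within `threshold` of the query implies its representative
        # is within `threshold + radius`, so other clusters can be skipped.
        if hamming(query, rep) <= threshold + radius:
            for m in members:
                compared += 1
                if hamming(query, m) <= threshold:
                    hits.append(m)
    return hits, compared

# Hypothetical toy database with three dense groups.
seqs = ["ACGTACGT", "ACGTACGA", "ACGTACCT",
        "TTTTGGGG", "TTTTGGGC", "TTTTGGCG",
        "CCCCAAAA", "CCCCAAAT", "CCCCAATA"]
clusters = build_clusters(seqs, radius=1)
hits, compared = compressive_search("ACGTACGT", clusters, radius=1, threshold=1)
brute_force = [s for s in seqs if hamming("ACGTACGT", s) <= 1]
```

With Hamming distance the coarse filter never discards a true match, so the result equals the brute-force answer while touching fewer sequences. The savings grow as clusters become larger and more numerous.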
===Genomic privacy===
Berger developed a framework for sharing sensitive biological data, such as human genomes, across institutions while preserving privacy, adapting a cryptographic technique called multi-party computation that had not previously been applied at biological scale. Working with students Brian Hie and Hyunghoon Cho, she applied this framework to pharmacological collaboration, allowing competing companies to securely pool data on drug-target interactions without revealing proprietary information to one another. The resulting shared model allows individual companies to query whether a drug targets a protein of interest, enabling more efficient drug repurposing than any single institution could achieve alone.
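One standard building block of multi-party computation is additive secret sharing, sketched below. The source does not say which protocol Berger's framework uses, so this is an assumption made purely for illustration; the modulus, function names, and example values are hypothetical. Each party splits its private number into random-looking shares, and only the sum of all inputs can be recovered.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (an illustrative choice)

def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; any proper subset reveals nothing about the value."""
    return sum(shares) % PRIME

def secure_sum(private_values):
    """Simulate parties computing a total without exposing individual inputs."""
    n = len(private_values)
    # Row i holds the shares that party i sends out, one per party.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that sum.
    partial = [sum(all_shares[i][j] for i in range(n)) % PRIME
               for j in range(n)]
    return reconstruct(partial)

# Hypothetical counts held privately by three companies.
total = secure_sum([12, 30, 7])
```

Here all parties are simulated in one process for brevity. In a real deployment each party would run on its own machine and exchange shares over a network, and richer computations such as model training require additional protocols beyond addition.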
===Viral language models===
More recently, Berger developed a method she calls "Mad Libs for viruses," which repurposes language models trained on protein sequences to predict how readily viral strains will evade the immune system. Unlike earlier methods, it does not require sequence alignments between new strains and known ones. The model's final layer evaluates structural viability, analogous to grammatical correctness, while the penultimate layer assesses semantic divergence from previously seen strains, flagging variants likely to escape existing antibody recognition. Berger applied this approach in collaboration with the Coalition for Epidemic Preparedness Innovations (CEPI) to assess the deltacron SARS-CoV-2 variant and with the Centers for Disease Control and Prevention to predict future variants with immune escape capacity.

==Awards and honours==