I studied computer science and math at UConn and earned my PhD in systems biology at Harvard. My research has focused on statistical modeling of a variety of biological data, especially mutational effects. I'm currently a postdoc in Walter Fontana's lab at Harvard Medical School, where I'm building tools, including our software package PyKappa, to formally express, simulate, and iteratively improve our models of complex biomolecular systems.
Genetics
We analyze data on protein mutational effects with respect to the amino acids exchanged and site surface accessibility. The resulting simple model quantifies physicochemical trends and explains the variance in effects on par with sequence-based machine learning approaches.
medRxiv
Clinicians often assess lab results by checking whether they're extreme with respect to a known-healthy population. We analyze the statistical implications of this interpretation and highlight considerations for more precise, personalized references.
PLOS One
Directed evolution, i.e. repeatedly mutagenizing and selecting better-performing variants, is a powerful tool to engineer proteins. By making statistical models of this process, we describe how selection stringency (how harshly one selects at each step) affects success.
Immunity
Antibodies that broadly neutralize group 1 influenza A subtypes rarely neutralize group 2 subtypes. This study shows that a vaccine enables cross-group protection in humanized mice via a single amino acid change in a class of precursor antibodies.
Blood Neoplasia
TP53 mutations drive the progression of cancers, including myeloid neoplasms, but their distinct clinical roles remain unclear. This study compares different TP53 mutations across myeloid neoplasm subtypes in patient data.
Drug Discovery Today
The AUC and AUPR metrics are commonly used to evaluate models that predict drug side effects from the drugs' molecular features, but the metrics' baselines depend on the statistical properties of the ground truth. We analyze this dependence and how much models actually benefit from molecular fingerprints.
Bioinformatics
Large cohorts with both genotype and gene expression data shed light on how genetic variation shapes complex traits and diseases. We develop haplotype-based models that account for interactions between genetic markers, improving genotype-to-expression prediction for a large subset of genes.
Forecasting
Thunderstorms can quickly cause many power outages. Predicting these outages is challenging when models summarize weather over an entire storm. Instead, we develop a framework for learning outage dynamics directly from hourly weather forecasts.