Berk Alpay

I studied computer science and math and am now doing a PhD in Systems, Synthetic, and Quantitative Biology at Harvard in Michael Desai's lab. I work on a range of problems; my current interests are protein fitness landscapes, clinical data analysis, and Bayesian statistics.



Effects of selection stringency on the outcomes of directed evolution [link]

Berk A. Alpay and Michael M. Desai

Directed evolution, in which one repeatedly mutagenizes a protein and selects better-performing variants, is a powerful tool for engineering proteins. By analyzing and simulating statistical models of this process, we describe how selection stringency (how harshly one selects at each step) affects its success.


Eliciting a single amino acid change by vaccination generates antibody protection against group 1 and group 2 influenza A viruses [link]

Rashmi Ray*, Faez A.N. Mohamed*, Daniel P. Maurer*, Jiachen Huang*, Berk A. Alpay, ..., Aaron G. Schmidt, Facundo D. Batista, Daniel Lingwood

Influenza viruses mutate and evade antibodies specialized against prior subtypes. Even antibodies that broadly neutralize the subtypes in one group of influenza A viruses rarely neutralize the other group. This study shows that, in humanized mice, vaccination can elicit a single amino acid change in a class of precursor antibodies, and that this change confers protection against both groups.

Blood Neoplasia

Comparison of TP53 mutations in myelodysplasia and acute leukemia suggests divergent roles in initiation and progression [link]

Ashwini Jambhekar, Emily E. Ackerman, Berk A. Alpay, Galit Lahav, Scott B. Lovitch

Mutations in the TP53 gene are crucial to the progression of many cancers, including myeloid neoplasms. However, the clinical importance of the specific type of TP53 mutation in these cancers is not well understood. This study compares TP53 mutations across subtypes of myeloid neoplasms using data from hundreds of patients.

Drug Discov. Today

Evaluating molecular fingerprint-based models of drug side effects against a statistical control [link]

Berk A. Alpay, Mark Gosink, Derek Aguiar

The AUC (area under the ROC curve) and AUPR (area under the precision-recall curve) are commonly used to evaluate models that predict drug side effects from molecular features. However, the baseline AUC and AUPR depend on the statistical properties of the ground truth. We analyze this dependence and ask: to what degree do models actually benefit from molecular fingerprints?


Combinatorial and statistical prediction of gene expression from haplotype sequence [link]

Berk A. Alpay*, Pinar Demetci*, Sorin Istrail, Derek Aguiar

Many studies have genotyped individuals and measured their gene expression levels. Such data make it possible to investigate how genotype influences gene expression, which is useful for understanding complex traits and diseases. We model gene expression while accounting for interactions among genetic markers and, by doing so, more accurately predict the expression of a large subset of genes.


Dynamic modeling of power outages caused by thunderstorms [link]

Berk A. Alpay, David Wanik, Peter Watson, Diego Cerrai, Guannan Liang, Emmanouil Anagnostou

Thunderstorms can cause many power outages in a short period. These outages are difficult to predict with models that summarize the weather over the entire course of a storm. Instead, we develop a framework in which models learn the dynamics of thunderstorm-caused outages directly from hourly weather forecasts.