Publications

Statistical relationships across epigenomes using large-scale hierarchical clustering

Anastasiia Kim, Nicholas Lubbers, Christina R. Steadman, and Karissa Y. Sanbonmatsu
Bioinformatics Advances, 2025
ArXiv    

In this study, we use machine learning to explore over 3,000 epigenomes and provide a comprehensive characterization of the relationships among epigenetic modifications, their modifiers, and specific immune cell types across all chromosomes. We find that, beyond their traditionally recognized role in regulating the expression of genes involved in cellular processes, epigenetic modifiers also function in a feedforward manner to regulate their own expression. We also explain the rationale for analyzing baseline healthy data as a preparatory step for future infectious disease studies.

Aerial imagery dataset of lost oil wells

Anastasiia Kim, Teeratorn Kadeethum, Christine Downs, Hari S. Viswanathan, and Daniel O'Malley
Scientific Data, 2024
View    

This work addresses the critical issue of orphaned wells: inactive oil and gas wells that are often unreported and contribute significantly to climate change, groundwater contamination, and toxic emissions. With potentially millions of these wells in the U.S. alone, locating them is essential for effective environmental remediation. Our paper presents a comprehensive dataset of 120,948 high-resolution aerial images of documented orphaned wells, accompanied by segmentation masks and metadata.

Latent Dirichlet Allocation modeling of environmental microbiomes

Anastasiia Kim, Sanna Sevanto, Eric R. Moore, and Nicholas Lubbers
PLOS Computational Biology, 2023
View    

In this paper, we use Latent Dirichlet Allocation (LDA) to identify corn soil microbiome communities associated with different experimental conditions, such as watering treatment and soil source type, at each taxonomic level. Unlike traditional microbial analysis methods, which target individual taxa, LDA provides an effective way to quickly find significant correlations between groups of multiple taxa and the plant traits responsible for plant performance under water stress. LDA identified microbiome compositions that may act synergistically toward some ecological function in plant-microbiome interactions.

Probabilities of unranked and ranked anomaly zones under birth-death models

Anastasiia Kim, Noah Rosenberg, and James Degnan
Molecular Biology and Evolution, 2020
View     ArXiv    

In this paper, we study how the parameters of a species tree simulated under a constant-rate birth-death process affect the probability that the species tree lies in the anomaly zone. We derive a lower bound on the probability that a species tree with $n$ leaves lies in an unranked anomaly zone for large speciation rate $\lambda$, and we show that this lower bound approaches 1 as $n \rightarrow \infty$ and $\lambda \rightarrow \infty$.