Her work concerns the transcriptional and translational control of protein expression levels in the central dogma, as well as statistical methods for RNA-seq data at the bulk and single-cell levels. Her 2015 Science study, a reanalysis of a 2011 Nature article, suggested that transcription, rather than translation, is the dominant factor regulating protein abundance, particularly the differences in protein expression levels across genes.

Her research group developed a suite of
single-cell data simulators, including scDesign; scDesign2, which captures gene-gene correlations; scDesign3, for single-cell and spatial multi-omics data; and scReadSim, which simulates sequencing reads for single-cell RNA-seq and ATAC-seq data. Her group also developed scImpute, a tool for imputing missing gene expression values. Her contributions extend to general statistical and computational methodology, including Clipper, a p-value-free false discovery rate (FDR) control method; ITCA, a criterion for guiding the combination of ambiguous class labels in multiclass classification; and Neyman-Pearson classification, a framework that prioritizes control of misclassification errors for critical classes.

Her recent work emphasizes the importance of statistical rigor in genomic data analysis. In one study, she and her co-authors warned against applying popular RNA-seq differential expression (DE) methods without checking their underlying assumptions. For example, in population-scale human RNA-seq samples, where the negative binomial assumption for each gene does not hold, methods relying on this assumption can produce excessive false discoveries, whereas non-parametric tests such as the Wilcoxon rank-sum test give more reliable results. She also developed scDEED, a statistical method that uses permutation to evaluate and optimize embeddings produced by t-SNE and UMAP. scDEED detects dubious embeddings that fail to preserve mid-range distances and uses this assessment to refine t-SNE and UMAP hyperparameters.

== References ==