Yemin Lan – Ph.D.


Research interest/work responsibility 
My role in the Epigenetics Program is to provide bioinformatics services. These include the analysis of big data, next-generation sequencing data in particular, as well as consultation and education on computational topics of interest.

Contributions to Science

Development of bioinformatics tools: as a PhD candidate in Dr. Rosen’s lab, my early research interest was developing computational algorithms for bioinformatics analysis. One of the intriguing challenges in bioinformatics, as in other big data analysis, is how to extract meaningful information from a large amount of data. I integrated feature selection algorithms with supervised classification to identify the signatures (e.g. gene sets) that are most meaningful in comparative genomics studies. This algorithm has been packaged as the FIZZY software. I also collaborated with the Ribosomal Database Project research group and established a set of standards for their taxonomic classification tool.

  • Y. Lan, Q. Wang, J. R. Cole, and G. L. Rosen, “Using the RDP classifier to predict taxonomic novelty and reduce the search space for finding novel organisms,” PLOS ONE, 2012, 7(3), e32491.
  • J.-L. Bouchot, W. Trimble, G. Ditzler, Y. Lan, S. Essinger, and G. L. Rosen, “Advances in machine learning for processing and comparison of metagenomic data,” Computational Systems Biology: From Molecular Mechanisms to Disease, 2013, Ed: A. Kriete and E. Roland, Academic Press, ISBN: 978-0-1240-5938-2.
  • G. Ditzler, Y. Lan, J.-L. Bouchot, and G. L. Rosen, “Feature selection for metagenomic data analysis,” Encyclopedia of Metagenomics, 2014, Ed: K. E. Nelson, Springer, ISBN: 978-1-4614-6418-1.
  • G. Ditzler, J. C. Morrison, Y. Lan, and G. Rosen, “Fizzy: feature subset selection for metagenomics,” BMC Bioinformatics, 2015, 16(358).

Construction of a genomic database: for decades the scientific community has used single marker genes to identify microbial organisms, accepting that the most popular marker gene (the 16S rRNA gene) cannot differentiate microbes well at the species level. While alternative marker genes were used occasionally, it was difficult to compare candidates and decide on the best marker gene for a given scenario. Using high-performance computing to compare all existing prokaryotic whole genomes, I developed POGO-DB, a database of pairwise comparisons of genomes and universal orthologous genes. It features a web interface and API access, allowing users to compare and select optimal marker genes that better resolve the phylogenetic relationships between closely related microbes.

  • Y. Lan, J. C. Morrison, R. Hershberg, and G. L. Rosen, “POGO-DB—a database of pairwise-comparisons of genomes and conserved orthologous genes,” Nucleic Acids Research, 2014, 42(D1), D625-D632.
  • Y. Lan, G. L. Rosen, and R. Hershberg, “Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains,” Microbiome, 2016, 4(1), 1.

Next-generation sequencing data analysis applied to various topics: during the development of tools and databases, I gained access to a variety of next-generation sequencing datasets. Analysis of these diverse datasets uncovered microbiome alterations arising as a consequence of symbiotic evolution, during human aging, and following environmental changes.

  • Hu, P. Lukasik, Y. Lan, C. S. Moreau, G. L. Rosen, and J. A. Russell, “Variation of symbiotic gut communities across diets and colonies of the ant Cephalotes varians,” in Entomological Society of America Annual Meeting, 2012.
  • Y. Lan, A. Kriete, and G. L. Rosen, “Selecting age-related functional characteristics in the human gut microbiome,” Microbiome, 2013, 1(1), 1-12.
  • Clark, Y. Lan, G. L. Rosen, and C. B. Blackwood, “Relating microbial physiological performance to genome content,” in 98th ESA Annual Convention, 2013.
  • Y. Lan, B. Stenuit, G. L. Rosen, J. B. Hughes, L. Alvarez-Cohen, and C. M. Sales, “Effects of historical TNT contamination and mechanical tillage on soil microbial consortia,” 2016, in preparation.

Contact Information
The Perelman School of Medicine at the University of Pennsylvania
Department of Cell and Developmental Biology
9-155 Smilow Center for Translational Research
3400 Civic Center Blvd
Philadelphia, PA 19104-6059
Office: 215-573-7215

If you would like to work with Yemin, please contact her to set up a meeting:

If it turns out that she can be of help to you, you will then be required to fill out a request for approval: