Shuo Zhang, Ph.D.

University of Pennsylvania
The Perelman School of Medicine
9-155 Smilow Center for Translational Research
3400 Civic Center Blvd
Philadelphia, PA 19104

Research interest/work responsibility
My main role at the Penn Epigenetics Institute is to provide bioinformatic services to the Institute’s core members. I am interested in analyzing next-generation sequencing data to gain biological insights. In addition, I am interested in training and applying machine learning models to tackle biological questions.


  • Whole genome bisulfite sequencing, DNA methylation ChIP array
  • Bulk/single-cell RNA-seq, small RNA-seq (piRNA)
  • ChIP-seq, CUT&RUN, ATAC-seq
  • HiC


I have ten years of hands-on experience analyzing a variety of next-generation sequencing data. During my PhD research in genomics, I developed a novel computational pipeline to annotate transposable elements (DNA parasites accounting for ~45% of the human genome) from terabyte-scale whole-genome re-sequencing data of D. melanogaster strains. During my first postdoctoral training, I developed computational pipelines that used HiC data to identify three-dimensional chromatin changes, including splits and mergers of topologically associating domains (TADs) and changes in chromatin stripes. During my postdoctoral training under the mentorship of Dr. Elizabeth A Heller, I expanded my expertise towards computational analysis of neuronal epigenetic regulation. This training involved analysis and mining of CUT&RUN, ChIP-seq and RNA-seq data. I have also expanded my expertise into machine learning models, such as PLIER, to interrogate cocaine-regulated gene expression in preclinical models of addiction.

Contribution to Science

Development of bioinformatics tools: as a PhD student, I developed a tool to identify insertions of transposable elements from whole-genome re-sequencing data. I automated the method with a combination of bash and Perl scripts in a Linux high-performance computing environment. This pipeline has been used to study the effect of genetic variation caused by transposable element insertions. As a postdoc, I developed tools to identify three-dimensional chromatin changes. Finally, I adapted a machine learning model to analyze mouse gene expression data.

  • Zhang S, Pointer B, Kelleher E. 2020. Rapid evolution of piRNA-mediated silencing of an invading transposable element was driven by abundant de novo mutations. Genome Res. 30: 566-575
  • Zhang, S. & Kelleher, E.S., 2017. Targeted identification of TE insertions in a Drosophila genome through hemi-specific PCR. Mobile DNA. 8:10.
  • Wang L, Zhang S, Hadjipanteli S, Saiz L, Silva E, Nguyen L, and Kelleher E. 2023. TE invasion fuels molecular adaptation in laboratory populations of Drosophila melanogaster. Evolution, qpad017
  • Gupta K*, Wang G*, Zhang S*, Gao X, Zheng R, Zhang Y, Meng Q, Zhang L, Cao Q, Chen K. 2022. StripeDiff: Model-based Algorithm for Differential Analysis of Chromatin Stripe. Science Advances, 8(49), p.eabk2246.
  • Wang G, Meng Q, Xia B, Zhang S, Lv J, Zhao D, Li Y, Wang X, Zhang L, Cooke JP, et al. 2020. TADsplimer reveals splits and mergers of topologically associating domains for epigenetic regulation of transcription. Genome Biol. 21
  • Zhang S, Heil BJ, Mao W, Chikina M, Greene CS, Heller EA. MousiPLIER: A Mouse Pathway-Level Information Extractor Model. bioRxiv [Preprint]. Aug 15:2023.07.31.551386. doi: 10.1101/2023.07.31.551386.

Next-generation sequencing data analysis applied to various projects: I have established collaborations to provide bioinformatic expertise to wet-bench scientists to: 1) profile global histone modifications in neuronal subtypes; 2) define the role of EZH1 mutations in H3K27me3 distribution and neuronal differentiation; 3) identify differential gene expression downstream of TRPV2; 4) examine the role of ∆Fosb in RNA splicing by integrating ∆Fosb ChIP-seq and RNA-seq data from mouse brain; 5) identify topologically associating domain dynamics during cardiomyocyte maturation.

  • Yeh S, Estill M, Lardner CK, Browne CJ, Toribio A, Futamura R, Beach K, McManus CA, Xu S, Zhang S, et al. 2023. Cell-type-specific whole-genome landscape of ∆FOSB binding in nucleus accumbens after chronic cocaine exposure. Biological Psychiatry.
  • Gracia-Diaz, Carolina, Yijing Zhou, Qian Yang, Reza Maroofian, Paula Espana-Bonilla, Chul-Hwan Lee, Shuo Zhang et al. “Gain and loss of function variants in EZH1 disrupt neurogenesis and cause dominant and recessive neurodevelopmental disorders.” Nature Communications 14, no. 1 (2023): 4109.
  • Carpenter MD*, Fischer DK*, Zhang S*, Bond AM, Czarnecki KS, Woolf MT, Song H, Heller EA. 2022. Cell-Type Specific Profiling of Histone Post-Translational Modifications in the Adult Mouse Striatum. Nature Communications, 13(1), p.7720.
  • Zhou, Pingzhu, Nathan J. VanDusen, Yanchun Zhang, Yangpo Cao, Isha Sethi, Rong Hu, Shuo Zhang et al. “Dynamic changes in P300 enhancers and enhancer-promoter contacts control mouse cardiomyocyte maturation.” Developmental Cell 58, no. 10 (2023): 898-914.


University of Houston, Ph.D. 2019