Tutorial - Ernst - ChromHMM

This document provides an overview of ChromHMM, a tool for analyzing chromatin states and epigenomic data. It discusses: 1) How ChromHMM uses a hidden Markov model to learn combinatorial patterns of histone modifications and define chromatin states across the genome. 2) Examples of large-scale ChromHMM analyses, including defining states across 127 cell/tissue types from ENCODE and Roadmap Epigenomics. 3) Accessing pre-defined chromatin state segmentations through the UCSC Genome Browser and Epigenome Gateway browser. 4) A brief tutorial on running ChromHMM on one's own data to learn chromatin states and interpret the output.

ChromHMM Tutorial

Jason Ernst

Assistant Professor

University of California, Los Angeles

Talk Outline

•   Chromatin states analysis and ChromHMM

•   Accessing chromatin state annotations for ENCODE2 and Roadmap Epigenomics

•   Running the ChromHMM software
Chromatin Marks for Genome Annotation

100+ histone modifications

Specificity in:

•   Histone protein
•   Amino acid residue
•   Chemical modification (e.g. methylation, acetylation)
•   Number of occurrences of the modification

Examples
H3K4me1 – Enhancers
H3K4me3 – Promoters
H3K27me3 – Repressive
H3K9me3 – Repressive
H3K36me3 – Transcribed

Image source: http://nihroadmap.nih.gov/epigenomics/

Histone Modifications can be Mapped Genome-wide with ChIP-seq


From ‘chromatin marks’ to ‘chromatin states’

• Learn de novo significant combinatorial and spatial patterns of chromatin marks

• Reveal functional elements, even without looking at sequence

• Use for genome annotation

[Figure: classes of chromatin states: promoter states, transcribed states, active intergenic, repressed]

Ernst and Kellis, Nat Biotech 2010


Our approach: Multivariate Hidden Markov Model (ChromHMM)

[Figure: model schematic along the DNA. Unobserved: enhancer, gene starts, transcribed gene region. Observed: binarized chromatin marks (H3K4me1, H3K4me3, H3K27ac, H3K36me3) in each 200 base pair interval, called based on a Poisson distribution. Most likely hidden state sequence: 1 2 3 4 6 6 6 6 6 5 5 5. A side panel lists the high-probability chromatin marks in each of the six states, with probabilities between 0.7 and 0.9.]

•   Emission distribution is a product of independent Bernoulli random variables
•   All probabilities are learned from the data
•   Binarization leads to explicit modeling of mark combinations and interpretable parameters

Ernst and Kellis, Nat Biotech 2010; Ernst and Kellis, Nature Methods 2012
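
The emission model named on this slide can be illustrated with a minimal Python sketch. It assumes one probability per mark for a given state; the three marks and their probabilities below are hypothetical values chosen for illustration, not parameters from a learned model.

```python
from math import prod


def emission_probability(observed, mark_probs):
    """P(observed marks | state) as a product of independent Bernoulli terms.

    observed   -- list of 0/1 calls, one per mark, for one 200 bp interval
    mark_probs -- list of P(mark present | state), in the same mark order
    """
    return prod(p if x else 1.0 - p for x, p in zip(observed, mark_probs))


# Hypothetical state over the marks (H3K4me1, H3K4me3, H3K36me3).
state_probs = [0.8, 0.1, 0.05]

# Interval where only H3K4me1 is present: 0.8 * (1 - 0.1) * (1 - 0.05)
print(emission_probability([1, 0, 0], state_probs))
```

Because each mark contributes its own factor, a state's parameters can be read directly as "which marks are usually present", which is the interpretability point made above.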

ENCODE: Study nine marks in nine human cell lines

9 marks x 9 human cell types = 81 Chromatin Mark Tracks

9 marks:
H3K4me1
H3K4me2
H3K4me3
H3K27ac
H3K9ac
H3K27me3
H4K20me1
H3K36me3
CTCF
+WCE
+RNA

9 human cell types:
HUVEC      Umbilical vein endothelial
NHEK       Keratinocytes
GM12878    Lymphoblastoid
K562       Myelogenous leukemia
HepG2      Liver carcinoma
NHLF       Normal human lung fibroblast
HMEC       Mammary epithelial cell
HSMM       Skeletal muscle myoblasts
H1         Embryonic

Brad Bernstein ENCODE Group
[Figure: chromatin state tracks for HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1]

• Learned jointly across cell types (virtual concatenation)
• State definitions are common
• State locations are dynamic

Ernst et al, Nature 2011
Chromatin state dynamics across nine ENCODE cell types

•  Single annotation track for each cell type

•  Summarize cell-type activity at a glance


•  Can study 9-cell activity pattern across
Ernst et al, Nature 2011
Chromatin States Defined Across 127 Cell/Tissue Types

16 epigenomes from ENCODE 2

Roadmap Epigenomics Consortium et al, Nature 2015


Chromatin States Defined on Imputed Data
Marks

ChromImpute method
Ernst and Kellis, Nature Biotech 2015

ChromHMM Models across Many Roadmap/ENCODE Cell and Tissue Types

127 Cell/Tissue Types (5 marks):
H3K4me1, H3K4me3, H3K27me3, H3K9me3, H3K36me3

98 Cell/Tissue Types (6 marks):
H3K4me1, H3K4me3, H3K27me3, H3K9me3, H3K36me3, H3K27ac

127 Cell/Tissue Types (12 marks):
H3K4me1, H3K4me3, H3K27me3, H3K9me3, H3K36me3, H3K27ac,
H3K9ac, H4K20me1, H3K79me2, H3K4me2, H2A.Z, DNase
Roadmap Epigenomics Integrative Analysis Portal

http://compbio.mit.edu/roadmap
Accessing Roadmap ChromHMM through
the UCSC Genome Browser Track Hubs
http://genome.ucsc.edu

Note: Different than track hub

Roadmap Epigenomics Data Complete Collection at Wash U VizHub
Human Epigenome Browser at

Washington University
http://epigenomegateway.wustl.edu/
ChromHMM Website
http://compbio.mit.edu/ChromHMM

Software download
Software manual
Try to Run ChromHMM on Sample
Data  on Your Computer
(Java  needs to already be installed)
1.  Download
http://compbio.mit.edu/ChromHMM/ChromHMM.zip
2.  Unzip ChromHMM.zip
3.  Open a command line
4.  Change into the ChromHMM directory
5.  Enter the command:
java -mx1600M -jar ChromHMM.jar LearnModel -p 0 SAMPLEDATA_HG18 OUTPUTSAMPLE 10 hg18

Input to ChromHMM

•   ChromHMM models are learned from binarized data using its LearnModel command
•   Binarized data is typically obtained starting from aligned reads.
–   Apply BinarizeBed if reads are in BED format
–   Apply BinarizeBam if reads are in BAM format
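
To make the binarization step concrete, here is a minimal Python sketch of a Poisson-based present/absent call on per-bin read counts. It assumes the background rate is the mean count per bin and uses a tail-probability cutoff of 1e-4. Both are assumptions for this sketch, and it ignores control data and read shifting, so it is not a reimplementation of BinarizeBed or BinarizeBam.

```python
from math import exp


def poisson_threshold(mean, p_cutoff=1e-4):
    """Smallest count t with P(X >= t) < p_cutoff for X ~ Poisson(mean)."""
    pmf = exp(-mean)  # P(X = 0)
    tail = 1.0        # P(X >= 0)
    t = 0
    while tail >= p_cutoff:
        tail -= pmf      # tail is now P(X >= t + 1)
        t += 1
        pmf *= mean / t  # pmf is now P(X = t)
    return t


def binarize(counts, p_cutoff=1e-4):
    """Call a mark present (1) in every bin whose count reaches the threshold."""
    mean = sum(counts) / len(counts)
    t = poisson_threshold(mean, p_cutoff)
    return [1 if c >= t else 0 for c in counts]


# Hypothetical read counts for one mark in ten consecutive 200 bp bins.
counts = [0, 1, 0, 2, 1, 0, 12, 1, 0, 3]
print(binarize(counts))
```

With these toy counts only the bin holding 12 reads exceeds what the background rate would explain, so it is the only bin called present.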
BinarizeBed

[Slides annotating, in order, each part of an example BinarizeBed command]

•   ‘-mx1600M’: Java option that specifies the memory available to Java
•   BinarizeBed: the ChromHMM command
•   File with the chromosome lengths for the assembly
•   Directory of BED files
•   Cell-mark-file table (columns: cell, mark, cell-mark file). Control data is optional and can also be treated as a mark.
•   Output directory
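
The pieces of the BinarizeBed call listed above can be assembled as in the following Python sketch. It writes a tab-delimited cell-mark-file table and builds the argument list in the order the slides annotate it. All file and directory names are hypothetical, and the optional fourth column for a control file is an assumption to check against the software manual.

```python
import csv
import io


def cell_mark_file_table(rows):
    """Render (cell, mark, bed_file[, control_file]) rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def binarize_bed_command(chrom_lengths, bed_dir, table_path, out_dir, memory="1600M"):
    """Argument list: chromosome lengths, BED directory, table, output directory."""
    return ["java", f"-mx{memory}", "-jar", "ChromHMM.jar", "BinarizeBed",
            chrom_lengths, bed_dir, table_path, out_dir]


# Hypothetical BED file names for two marks in one cell type.
rows = [
    ("K562", "H3K4me3", "K562_H3K4me3.bed", "K562_control.bed"),
    ("K562", "H3K36me3", "K562_H3K36me3.bed", "K562_control.bed"),
]
print(cell_mark_file_table(rows), end="")
print(" ".join(binarize_bed_command(
    "hg18.txt", "BED_DIR", "cellmarkfiletable.txt", "BINARIZED_DIR")))
```

The printed command can be pasted into a shell from the ChromHMM directory, or the list can be passed to `subprocess.run` once the paths point at real files.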
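LearnModel

[Slides annotating, in order, each part of the LearnModel command]

•   ‘-p 0’: use as many processors as available
•   ‘-p N’: use up to N processors (default N=1)
•   Directory with the binarized input (SAMPLEDATA_HG18 in the sample command)
•   Directory where the output goes (OUTPUTSAMPLE in the sample command)
•   Number of states (10 in the sample command)
•   Genome assembly (hg18 in the sample command)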
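
As a small illustration of how these arguments fit together, the Python sketch below rebuilds the sample LearnModel command from its parts. The helper name and its keyword defaults are hypothetical; only the argument order and values come from the sample command in this tutorial.

```python
import shlex


def learn_model_command(input_dir, output_dir, num_states, assembly,
                        processors=0, memory="1600M"):
    """Argument list for LearnModel in the order annotated on the slides."""
    return ["java", f"-mx{memory}", "-jar", "ChromHMM.jar", "LearnModel",
            "-p", str(processors),
            input_dir, output_dir, str(num_states), assembly]


cmd = learn_model_command("SAMPLEDATA_HG18", "OUTPUTSAMPLE", 10, "hg18")
print(shlex.join(cmd))
```

To actually launch the run from Python, the list can be passed to `subprocess.run(cmd, check=True)` from inside the ChromHMM directory, with Java installed.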
ChromHMM Report

Emission Parameters
Transition Parameters
Model Parameter File
Segmentation File
Browser Files

Can load into the UCSC Genome Browser or IGV

https://www.broadinstitute.org/igv/
Enrichments
Positional Plots
Enrichments for Additional Cell Types
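
One simple way to start interpreting a segmentation is to ask how much of the genome each state covers. The Python sketch below assumes the segmentation is a BED-like file with four whitespace-separated columns (chromosome, start, end, state label). That layout and the labels `E1` and `E2` are assumptions for this example, so check them against the files your run produces.

```python
from collections import defaultdict


def state_coverage(lines):
    """Fraction of segmented bases assigned to each state label."""
    bases = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, state = line.split()[:4]
        bases[state] += int(end) - int(start)
    total = sum(bases.values())
    return {state: n / total for state, n in bases.items()}


# Hypothetical three-segment segmentation of a 2 kb region.
example = """chr1\t0\t400\tE2
chr1\t400\t1000\tE1
chr1\t1000\t2000\tE2
"""
print(state_coverage(example.splitlines()))
```

For a real file, pass an open file handle instead of `example.splitlines()`.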
Chromatin states to interpret disease variants

•   Enhancers from different cell types enriched in different traits
•   Specific chromatin states enriched in GWAS catalog

Ernst and Kellis, Nature Biotech 2010; Ernst et al, Nature 2011

•   Imputation-based chromatin states used in dissection of the FTO locus
•   Interpreting epigenetic disease-associated variation in Alzheimer’s disease

Claussnitzer et al, NEJM 2015; De Jager et al, Nature Neuroscience 2014

•   Many other examples in the literature

Collaborators and Acknowledgements


•  Manolis Kellis

ENCODE consortium
–  Brad Bernstein production group

Roadmap Epigenomics consortium

Funding
•  NHGRI, NIH, NSF, HHMI, Sloan Foundation
Additional Commands
•   CompareModels – compares the emission parameters of a selected model to a set of models in terms of correlation
•   MakeBrowserFiles – (re)generates browser files from segmentation files and allows specifying the coloring
•   OverlapEnrichment – (re)computes enrichments of a segmentation for a set of annotations
•   NeighborhoodEnrichment – (re)computes enrichments of a segmentation around a set of anchor positions
•   Reorder – reorders the states of the model
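
To give a sense of what an overlap enrichment measures, the Python sketch below computes a fold enrichment for one state against one annotation on a single toy chromosome. It uses a common definition: the fraction of the state's bases that fall in the annotation, divided by the fraction of the genome covered by the annotation. Treat it as an illustration under that assumption, since the OverlapEnrichment command may differ in details; the intervals are hypothetical.

```python
def overlap_bases(a, b):
    """Bases shared by two sorted, non-overlapping lists of (start, end) intervals."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fold_enrichment(state, annotation, genome_size):
    """(overlap / state bases) / (annotation bases / genome size)."""
    both = overlap_bases(state, annotation)
    state_bases = sum(end - start for start, end in state)
    annot_bases = sum(end - start for start, end in annotation)
    return (both / state_bases) / (annot_bases / genome_size)


# Hypothetical intervals on a 10 kb toy genome.
state = [(0, 200), (1000, 1400)]
annotation = [(100, 300), (1200, 1300)]
print(fold_enrichment(state, annotation, 10_000))
```

A value above 1 means the state overlaps the annotation more than expected from the annotation's share of the genome.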
