HiCExplorer¶
Set of programs to process, normalize, analyze and visualize Hi-C data¶
HiCExplorer addresses the common tasks of Hi-C data analysis from processing to visualization.

Availability¶
HiCExplorer is available as a command line suite of tools on this GitHub repository.
A Galaxy HiCExplorer version is directly available to users at http://hicexplorer.usegalaxy.eu. Training material is available at the Galaxy Training Network, while a Galaxy Tour is available here for users not familiar with this platform. Galaxy HiCExplorer is also available as a Docker image at the Docker Galaxy HiCExplorer GitHub repository. Finally, this Galaxy version is available on the Galaxy Tool Shed and on the corresponding GitHub repository.
The following is the list of tools available in HiCExplorer¶
tool | description |
---|---|
findRestSite | Identifies the genomic locations of restriction sites |
hicBuildMatrix | Creates a Hi-C matrix using the aligned BAM files of the Hi-C sequencing reads |
hicQC | Plots QC measures from the output of hicBuildMatrix |
hicCorrectMatrix | Uses iterative correction to remove biases from a Hi-C matrix |
hicDetectLoops | Identifies enriched Hi-C contacts |
hicCorrelate | Computes and visualises the correlation of Hi-C matrices |
hicFindTADs | Identifies Topologically Associating Domains (TADs) |
hicPCA | Computes the eigenvectors for A/B compartments |
hicTransform | Computes an obs_exp matrix like Lieberman-Aiden (2009), a Pearson correlation matrix and/or a covariance matrix. These matrices can be used for plotting. |
hicMergeMatrixBins | Merges consecutive bins on a Hi-C matrix to reduce resolution |
hicMergeTADbins | Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix. |
hicPlotDistVsCounts | Plots the decay in interaction frequency with distance |
hicPlotMatrix | Plots a Hi-C matrix as a heatmap |
hicPlotTADs | Plots TADs as a track that can be combined with other tracks (genes, signal, interactions) |
hicPlotViewpoint | A plot with the interactions around a reference point or region. |
hicAggregateContacts | A tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions. |
hicSumMatrices | Adds Hi-C matrices of the same size |
hicPlotDistVsCounts | Plots distance vs. Hi-C counts of corrected data |
hicInfo | Shows information about a Hi-C matrix file (no. of bins, bin length, sum, max, min, etc.) |
hicCompareMatrices | Computes difference or ratio between two matrices |
hicAverageRegions | Computes the average of multiple given regions, usually TAD regions |
hicPlotAverageRegions | Visualization of hicAverageRegions |
hicNormalize | Normalizes the given matrices to 0-1 range or the smallest read coverage |
hicConvertFormat | Converts between different Hi-C interaction matrices |
hicAdjustMatrix | Keeps, removes or masks regions in a Hi-C matrix |
Getting Help¶
For all kinds of questions, to suggest changes or enhancements, and to report bugs, please create an issue on our GitHub repository.
In the past we suggested posting on Biostars with the tag hicexplorer or on the deepTools mailing list. We still check these resources from time to time, but the preferred way to communicate is via GitHub issues.
Contents:¶
Installation¶
Requirements¶
Python 3.6
numpy >= 1.15
scipy >= 1.1
matplotlib >= 3.0
pysam >= 0.14
intervaltree >= 2.1
biopython >= 1.72
pytables >= 3.4
pyBigWig >= 0.3
future >= 0.17
six >= 1.11
jinja2 >= 2.10
pandas >= 0.23
unidecode >= 1.0
hicmatrix = 9
pygenometracks >= 2.1
psutil >= 5.4.8
hic2cool >= 0.5
cooler >= 0.8.3
krbalancing >= 0.0.3 (Needs the library eigen; openmp is recommended for linux users. No openmp support on macOS.)
fit_nbinom >= 1.0
Warning: Python 2.7 support is discontinued. Moreover, support for pip is discontinued too. Warning: We strongly recommend using the conda package manager and will no longer give support for issues arising with pip.
Command line installation using conda¶
The fastest way to obtain Python 3.6 together with numpy and scipy is via the Anaconda Scientific Python Distribution. Just download the version that’s suitable for your operating system and follow the directions for its installation. All of the requirements for HiCExplorer can be installed in Anaconda with:
$ conda install hicexplorer -c bioconda -c conda-forge
We strongly recommend using conda to install HiCExplorer.
Command line installation using pip¶
Installation via pip is discontinued as of version 3.0. We want to provide a ‘one-click’ installation, but with version 3.0 we added the C++ library eigen as a dependency, and pip does not support non-Python packages.
For older versions you can still use pip. Install HiCExplorer using the following command:
$ pip install hicexplorer
All Python requirements should be installed automatically.
If you need to specify a specific path for the installation of the tools, make use of pip install’s numerous options:
$ pip install --install-option="--prefix=/MyPath/Tools/hicexplorer" git+https://github.com/deeptools/HiCExplorer.git
Warning: You may have to install additional packages via your system package manager to successfully install HiCExplorer via pip. Warning: We strongly recommend using the conda package manager and will no longer give support for issues arising with pip.
Command line installation without pip¶
We strongly recommend using pip rather than these more complicated steps.
1. Install the requirements listed above in the “Requirements” section. This is done automatically by pip.
2. Download the source code
$ git clone https://github.com/deeptools/HiCExplorer.git
or if you want a particular release, choose one from https://github.com/deeptools/HiCExplorer/releases:
$ wget https://github.com/deeptools/HiCExplorer/archive/1.5.12.tar.gz
$ tar -xzvf 1.5.12.tar.gz
3. Install the source code (if you don’t have root permission, you can set a specific folder using the --prefix option)
$ python setup.py install --prefix /User/Tools/hicexplorer
Galaxy installation¶
HiCExplorer can be easily integrated into a local Galaxy. The wrappers are provided at the Galaxy Tool Shed.
Installation with Docker¶
The HiCExplorer Galaxy instance is also available as a docker container, for those wishing to use the Galaxy framework but who also prefer a virtualized solution. This container is quite simple to install:
$ sudo docker pull quay.io/bgruening/galaxy-hicexplorer
To start and otherwise modify this container, please see the instructions on the docker-galaxy-stable github repository. Note that you must use bgruening/galaxy-hicexplorer in place of bgruening/galaxy-stable in the examples, as the HiCExplorer Galaxy container is built on top of the galaxy-stable container.
Tip
For support, or feature requests contact: deeptools@googlegroups.com
HiCExplorer tools¶
tool | type | input files | main output file(s) | application |
---|---|---|---|---|
findRestSite | preprocessing | 1 genome FASTA file | bed file with restriction site coordinates | Identifies the genomic locations of restriction sites |
hicBuildMatrix | preprocessing | 2 BAM/SAM files | hicMatrix object | Creates a Hi-C matrix using the aligned BAM files of the Hi-C sequencing reads |
hicCorrectMatrix | preprocessing | hicMatrix object | normalized hicMatrix object | Uses iterative correction or Knight-Ruiz to remove biases from a Hi-C matrix |
hicMergeMatrixBins | preprocessing | hicMatrix object | hicMatrix object | Merges consecutive bins on a Hi-C matrix to reduce resolution |
hicSumMatrices | preprocessing | 2 or more hicMatrix objects | hicMatrix object | Adds Hi-C matrices of the same size |
hicNormalize | preprocessing | multiple Hi-C matrices | multiple Hi-C matrices | Normalizes data to 0 to 1 range or to the smallest total read count |
hicCorrelate | analysis | 2 or more hicMatrix objects | a heatmap/scatterplot | Computes and visualises the correlation of Hi-C matrices |
hicFindTADs | analysis | hicMatrix object | bedGraph file (TAD score), a boundaries.bed file, a domains.bed file (TADs) | Identifies Topologically Associating Domains (TADs) |
hicPlotMatrix | visualization | hicMatrix object | a heatmap of Hi-C contacts | Plots a Hi-C matrix as a heatmap |
hicPlotTADs | visualization | hicMatrix object, a config file | Hi-C contacts on a given region, along with other provided signal (bigWig) or regions (bed) file | Plots TADs as a track that can be combined with other tracks (genes, signal, interactions) |
hicPlotDistVsCounts | visualization | hicMatrix object | log log plot of Hi-C contacts per distance | Quality control |
hicConvertFormat | data integration | one/multiple Hi-C file formats | Hi-C matrices/outputs in several formats | Converts a matrix to different formats |
hicAdjustMatrix | data integration | one Hi-C file format | Hi-C matrix | Removes, masks or keeps specified regions of a matrix |
hicInfo | information | one or more hicMatrix objects | Screen info | Prints information about matrices, like size, maximum, minimum, bin size, etc. |
hicPCA | analysis | one Hi-C matrix | bedgraph or bigwig file(s) for each eigenvector | Computes the eigenvectors for A/B compartments |
hicTransform | analysis | one Hi-C matrix | Hi-C matrix | Computes an obs_exp matrix like Lieberman-Aiden (2009), a Pearson correlation matrix and/or a covariance matrix. These matrices can be used for plotting. |
hicPlotViewpoint | visualization | one Hi-C matrix | A viewpoint plot | A plot with the interactions around a reference point or region. |
hicQC | information | log files from hicBuildMatrix | A quality control report | Quality control of the created contact matrix. |
hicCompareMatrices | analysis | two Hi-C matrices | one Hi-C matrix | Applies diff, ratio or log2ratio on matrices to compare them. |
hicAverageRegions | analysis | multiple Hi-C matrices | one npz object | Averages the given locations. Visualization with hicPlotAverageRegions |
hicDetectLoops | analysis | one Hi-C matrix | bedgraph file with loop locations | Detects enriched regions. Visualization with hicPlotMatrix and the --loops parameter. |
hicPlotAverageRegions | visualization | one npz file | one image | Visualization of hicAverageRegions. |
hicMergeTADbins | preprocessing | one Hi-C matrix, one BED file | one Hi-C matrix | Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix. |
General principles¶
A typical HiCExplorer command could look like this:
$ hicPlotMatrix -m myHiCmatrix.h5 \
-o myHiCmatrix.pdf \
--clearMaskedBins \
--region chrX:10,000,000-15,000,000 \
--vMin -4 --vMax 4
You can always see all available command-line options via --help:
$ hicPlotMatrix --help
Output format of plots should be indicated by the file ending, e.g. MyPlot.pdf will return a pdf file, MyPlot.png a png file.
Most of the tools that produce plots can also output the underlying data. This can be useful in cases where you don’t like the HiCExplorer visualization, as you can then use the data matrices produced by HiCExplorer with your favorite plotting tool, such as R.
The vast majority of command line options are also available in Galaxy (in a few cases with minor changes to their naming).
Example usage¶
Hi-C analysis of mouse ESCs using HiCExplorer¶
The following example shows how we can use HiCExplorer to analyze a published dataset. Here we use a Hi-C dataset on mouse ESCs from Marks et al. 2015.
Protocol¶
The collection of the cells for Hi-C and the Hi-C sample preparation were performed as previously described by Lieberman-Aiden et al., with the slight modification that DpnII was used as the restriction enzyme during the initial digestion. Paired-end libraries were prepared according to Lieberman-Aiden et al. and sequenced on the NextSeq 500 platform using 2 × 75 bp sequencing.
Prepare for analysis¶
Download Raw fastq files¶
The fastq files can be downloaded from the EBI archive (or NCBI archive). We will store the files in the directory original_data.
mkdir original_data
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/007/SRR1956527/SRR1956527_1.fastq.gz -O original_data/SRR1956527_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/007/SRR1956527/SRR1956527_2.fastq.gz -O original_data/SRR1956527_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/008/SRR1956528/SRR1956528_1.fastq.gz -O original_data/SRR1956528_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/008/SRR1956528/SRR1956528_2.fastq.gz -O original_data/SRR1956528_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/009/SRR1956529/SRR1956529_1.fastq.gz -O original_data/SRR1956529_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/009/SRR1956529/SRR1956529_2.fastq.gz -O original_data/SRR1956529_2.fastq.gz
Create an index¶
We start by creating an index of the GRCm38/mm10 genome for our alignment software. As a source we use the mm10 genome from UCSC.
mkdir genome_mm10
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz -O genome_mm10/chromFa.tar.gz
tar -xvzf genome_mm10/chromFa.tar.gz -C genome_mm10
cat genome_mm10/*.fa > genome_mm10/mm10.fa
We have the mm10 genome stored in one fasta file and can build the index. We tried it successfully with hisat2, bowtie2 and bwa. Run the mapping with one of them and do not mix them!
mkdir hisat2
hisat2-build -p 8 genome_mm10/mm10.fa hisat2/mm10_index
You can find more information in the hisat2 documentation.
Mapping the RAW files¶
Mates have to be mapped individually to avoid mapper specific heuristics designed for standard paired-end libraries.
Keep the following in mind for the different mappers:
For either bowtie2 or hisat2, use the --reorder parameter, which tells bowtie2 or hisat2 to output the SAM records in exactly the same order as the reads in the .fastq files.
Use local mapping, in contrast to end-to-end. A fraction of Hi-C reads are chimeric and will not map end-to-end; local mapping is therefore important to increase the number of mapped reads.
Tune the aligner parameters to penalize deletions and insertions. This is important to avoid aligned reads with gaps if they happen to be chimeric.
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956527_1.fastq.gz --reorder | samtools view -Shb - > SRR1956527_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956527_2.fastq.gz --reorder | samtools view -Shb - > SRR1956527_2.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956528_1.fastq.gz --reorder | samtools view -Shb - > SRR1956528_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956528_2.fastq.gz --reorder | samtools view -Shb - > SRR1956528_2.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956529_1.fastq.gz --reorder | samtools view -Shb - > SRR1956529_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956529_2.fastq.gz --reorder | samtools view -Shb - > SRR1956529_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956527_1.fastq.gz --reorder | samtools view -Shb - > SRR1956527_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956527_2.fastq.gz --reorder | samtools view -Shb - > SRR1956527_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956528_1.fastq.gz --reorder | samtools view -Shb - > SRR1956528_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956528_2.fastq.gz --reorder | samtools view -Shb - > SRR1956528_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956529_1.fastq.gz --reorder | samtools view -Shb - > SRR1956529_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956529_2.fastq.gz --reorder | samtools view -Shb - > SRR1956529_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956527_1.fastq.gz | samtools view -Shb - > SRR1956527_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956527_2.fastq.gz | samtools view -Shb - > SRR1956527_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956528_1.fastq.gz | samtools view -Shb - > SRR1956528_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956528_2.fastq.gz | samtools view -Shb - > SRR1956528_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956529_1.fastq.gz | samtools view -Shb - > SRR1956529_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956529_2.fastq.gz | samtools view -Shb - > SRR1956529_2.bam
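The commands above repeat one pattern per aligner and per mate. As a minimal sketch, the bwa variant can be generated in a loop so that every mate is mapped on its own. The helper name `mapping_commands` and its default paths are assumptions made for this illustration and are not part of HiCExplorer; the command string itself is copied from the bwa lines above.

```python
SAMPLES = ["SRR1956527", "SRR1956528", "SRR1956529"]


def mapping_commands(samples, index="bwa/mm10_index",
                     fastq_dir="original_data", threads=8):
    """Return one bwa mem command line per mate (hypothetical helper)."""
    commands = []
    for sample in samples:
        for mate in (1, 2):  # mates are mapped individually, never as a pair
            fastq = f"{fastq_dir}/{sample}_{mate}.fastq.gz"
            bam = f"{sample}_{mate}.bam"
            commands.append(
                f"bwa mem -A 1 -B 4 -E 50 -L 0 -t {threads} {index} {fastq} "
                f"| samtools view -Shb - > {bam}"
            )
    return commands


if __name__ == "__main__":
    # Print the commands so they can be inspected or piped into a shell.
    for command in mapping_commands(SAMPLES):
        print(command)
```

The script only prints the command lines; running them still requires bwa, samtools and the index built beforehand.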
Build, visualize and correct Hi-C matrix¶
Create a Hi-C matrix using the aligned files¶
In the following we will create three Hi-C matrices and merge them into one.
hicBuildMatrix builds the matrix of read counts over the bins in the genome, considering the sites around the given restriction site. We need to provide:
the input BAM/SAM files: --samFiles SRR1956527_1.bam SRR1956527_2.bam
the bin size: --binSize 10000
the restriction sequence: --restrictionSequence GATC
the name of the output BAM file, which contains the accepted alignments: --outBam SRR1956527_ref.bam
the name of the output matrix file: --outFileName hicMatrix/SRR1956527_10kb.h5
the folder for the quality report: --QCfolder hicMatrix/SRR1956527_10kb_QC
the number of threads to use (the minimum value is 3): --threads 8
the buffer size, i.e. each thread buffers inputBufferSize lines of each input BAM/SAM file: --inputBufferSize 400000
To build the Hi-C matrices:
mkdir hicMatrix
hicBuildMatrix --samFiles SRR1956527_1.bam SRR1956527_2.bam --binSize 10000 --restrictionSequence GATC --outBam SRR1956527_ref.bam --outFileName hicMatrix/SRR1956527_10kb.h5 --QCfolder hicMatrix/SRR1956527_10kb_QC --threads 8 --inputBufferSize 400000
hicBuildMatrix --samFiles SRR1956528_1.bam SRR1956528_2.bam --binSize 10000 --restrictionSequence GATC --outBam SRR1956528_ref.bam --outFileName hicMatrix/SRR1956528_10kb.h5 --QCfolder hicMatrix/SRR1956528_10kb_QC --threads 8 --inputBufferSize 400000
hicBuildMatrix --samFiles SRR1956529_1.bam SRR1956529_2.bam --binSize 10000 --restrictionSequence GATC --outBam SRR1956529_ref.bam --outFileName hicMatrix/SRR1956529_10kb.h5 --QCfolder hicMatrix/SRR1956529_10kb_QC --threads 8 --inputBufferSize 400000
The output bam files show that we have around 34M, 54M and 58M selected reads for SRR1956527, SRR1956528 and SRR1956529, respectively. Normally about 25% of the total reads are selected. The output matrices have counts for the genomic regions. The extension of the output matrix files is .h5.
A quality report is created in e.g. hicMatrix/SRR1956527_10kb_QC; have a look at the report hicQC.html.

A segment of Hi-C quality report.¶
To increase the depth of reads we merge the counts from these three replicates.
hicSumMatrices --matrices hicMatrix/SRR1956527_10kb.h5 hicMatrix/SRR1956528_10kb.h5 \
hicMatrix/SRR1956529_10kb.h5 --outFileName hicMatrix/replicateMerged_10kb.h5
Plot Hi-C matrix¶
A 10 kb bin matrix is quite large to plot, so it is better to reduce the resolution first (to learn the size of a Hi-C matrix, use the tool hicInfo). Plotting a 1 kb or 10 kb matrix usually runs out of memory, and the time to plot is very long (minutes instead of seconds). For this we use the tool hicMergeMatrixBins.
hicMergeMatrixBins merges the bins into larger bins of a given number (specified by --numBins). We will merge 100 bins in the original (uncorrected) matrix and then correct it. The new bin size is going to be 10,000 bp * 100 = 1,000,000 bp = 1 Mb.
hicMergeMatrixBins \
--matrix hicMatrix/replicateMerged_10kb.h5 --numBins 100 \
--outFileName hicMatrix/replicateMerged.100bins.h5
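To make the effect of merging bins concrete, here is a minimal sketch on a toy dense matrix. It only illustrates the idea of summing the counts of neighbouring bins; it is an assumption-laden illustration (the function name `merge_bins` is hypothetical, and chromosome boundaries and sparse storage are ignored), not the code used by hicMergeMatrixBins.

```python
def merge_bins(matrix, num_bins):
    """Sum every num_bins x num_bins block of a square count matrix."""
    size = len(matrix)
    merged_size = (size + num_bins - 1) // num_bins  # last block may be smaller
    merged = [[0] * merged_size for _ in range(merged_size)]
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            merged[i // num_bins][j // num_bins] += count
    return merged


# Bin size arithmetic from the text: 10 kb bins, 100 bins merged into one.
bin_size = 10_000
num_bins = 100
new_bin_size = bin_size * num_bins  # 1,000,000 bp = 1 Mb

# Toy symmetric 4 x 4 matrix, merged two bins at a time.
toy = [
    [4, 2, 1, 0],
    [2, 5, 3, 1],
    [1, 3, 6, 2],
    [0, 1, 2, 7],
]
merged = merge_bins(toy, 2)
print(new_bin_size, merged)
```

The total number of counts is preserved by merging; only the resolution changes.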
hicPlotMatrix can plot the merged matrix. We use the following options:
the matrix to plot: --matrix hicMatrix/replicateMerged.100bins.h5
logarithmic values for plotting: --log1p
the resolution of the plot: --dpi 300
masked bins should not be plotted: --clearMaskedBins
the order of the chromosomes in the plot: --chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY
the color map: --colorMap jet
the title of the plot: --title "Hi-C matrix for mESC"
the plot image itself: --outFileName plots/plot_1Mb_matrix.png
mkdir plots
hicPlotMatrix \
--matrix hicMatrix/replicateMerged.100bins.h5 \
--log1p \
--dpi 300 \
--clearMaskedBins \
--chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--colorMap jet \
--title "Hi-C matrix for mESC" \
--outFileName plots/plot_1Mb_matrix.png

The Hi-C interaction matrix with a resolution of 1 Mb.¶
hicCorrectMatrix corrects the matrix counts in an iterative manner. For correcting the matrix, it’s important to remove the unassembled scaffolds (e.g. NT_) and keep only chromosomes, as scaffolds create problems with matrix correction. Therefore we use the chromosome names (1-19, X, Y) here. Important: Use ‘chr1 chr2 chr3 etc.’ if your genome index uses chromosome names with the ‘chr’ prefix.
Matrix correction works in two steps: first a histogram containing the sum of contacts per bin (row sum) is produced. This plot needs to be inspected to decide on the best threshold for removing bins with a low number of reads. The second step removes the low-scoring bins and does the correction.
In the following we will use a matrix with a bin size of 20 kb: 10kb * 2 = 20 kb
hicMergeMatrixBins \
--matrix hicMatrix/replicateMerged_10kb.h5 --numBins 2 \
--outFileName hicMatrix/replicateMerged.matrix_20kb.h5
(1-19, X, Y) variant:
hicCorrectMatrix diagnostic_plot \
--chromosomes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 --plotName hicMatrix/diagnostic_plot.png
(chr1-chr19, chrX, chrY) variant:
hicCorrectMatrix diagnostic_plot \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 --plotName hicMatrix/diagnostic_plot.png

Diagnostic plot for the Hi-C matrix at a resolution of 20 kb¶
The program prints a threshold suggestion that is usually accurate, but it is better to check it against the histogram plot. The threshold is visualized in the plot as a black vertical line. See Example usage for an example and for more info.
- The threshold parameter needs two values:
low z-score
high z-score
“The absolute value of z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.” (Source). For more information see wikipedia.
The z-score definition.¶
In our case the distribution describes the counts per bin of a genomic distance. Removing all bins with a z-score below (or above) a threshold X means removing all bins whose counts lie more than |X| standard deviations below (or above) the mean of their distribution.
Looking at the above distribution, we can select the values -2 (lower end) and 3 (upper end) as removal thresholds. They are passed to hicCorrectMatrix with the --filterThreshold option.
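The filtering idea can be sketched with the plain z-score definition quoted above, z = (x - mean) / standard deviation. This is an illustrative sketch only: the function name `filter_bins_by_zscore` and the toy row sums are made up for the example, and hicCorrectMatrix may compute its score differently internally.

```python
from statistics import mean, pstdev


def filter_bins_by_zscore(row_sums, low, high):
    """Return the indices of bins whose z-score lies within [low, high]."""
    mu = mean(row_sums)
    sigma = pstdev(row_sums)
    if sigma == 0:  # all bins identical: nothing to filter
        return list(range(len(row_sums)))
    kept = []
    for index, total in enumerate(row_sums):
        z = (total - mu) / sigma  # distance from the mean in standard deviations
        if low <= z <= high:
            kept.append(index)
    return kept


# Toy data: 18 ordinary bins, one empty bin and one very high outlier.
row_sums = [100] * 18 + [0, 300]
kept = filter_bins_by_zscore(row_sums, low=-2, high=3)
print(kept)
```

With the thresholds -2 and 3 used in the text, the empty bin and the outlier bin are dropped while the ordinary bins are kept.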
(1-19, X, Y) variant:
hicCorrectMatrix correct \
--chromosomes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 \
--filterThreshold -2 3 --perchr --outFileName hicMatrix/replicateMerged.Corrected_20kb.h5
(chr1-chr19, chrX, chrY) variant:
hicCorrectMatrix correct \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 \
--filterThreshold -2 3 --perchr --outFileName hicMatrix/replicateMerged.Corrected_20kb.h5
It can happen that the correction stops with:
`ERROR:iterative correction:*Error* matrix correction produced extremely large values.
This is often caused by bins of low counts. Use a more stringent filtering of bins.`
This can be solved by using more stringent z-score values for the filter threshold or by looking at the plotted matrix. In our case we see that chromosome Y has more or less 0 counts in its bins. This chromosome can be excluded from the correction by leaving it out of the set of chromosomes to be corrected (parameter --chromosomes).
We can now plot one of the chromosomes (e.g. chromosome X) with the corrected matrix.
- New parameter:
The region to plot: --region chrX:10000000-20000000 or --region chrX
(1-19, X, Y) variant:
hicPlotMatrix \
--log1p --dpi 300 \
--matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--region X --title "Corrected Hi-C matrix for mESC : chrX" \
--outFileName plots/replicateMerged_Corrected-20kb_plot-chrX.png
(chr1-chr19, chrX, chrY) variant:
hicPlotMatrix \
--log1p --dpi 300 \
--matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--region chrX --title "Corrected Hi-C matrix for mESC : chrX" \
--outFileName plots/replicateMerged_Corrected-20kb_plot-chrX.png

The Hi-C interaction matrix for chromosome X.¶
Plot TADs¶
“The partitioning of chromosomes into topologically associating domains (TADs) is an emerging concept that is reshaping our understanding of gene regulation in the context of physical organization of the genome” [Ramirez et al. 2017].
Find TADs¶
TAD calling works in two steps: First HiCExplorer computes a TAD-separation score based on a z-score matrix for all bins. Then those bins having a local minimum of the TAD-separation score are evaluated with respect to the surrounding bins to assign a p-value. Finally, a cutoff is applied to select the bins more likely to be TAD boundaries.
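The candidate selection in the second step can be pictured as a search for local minima in a score track. The sketch below shows only that part, on made-up scores; the function name `local_minima` is hypothetical, and the computation of the TAD-separation score, the p-values and the cutoff used by hicFindTADs are not reproduced here.

```python
def local_minima(scores):
    """Return indices whose score is strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]
    ]


# Toy TAD-separation score per bin (illustrative values only).
tad_separation_score = [0.4, 0.1, 0.5, 0.6, 0.2, 0.3, 0.7, 0.05, 0.6]
candidates = local_minima(tad_separation_score)
print(candidates)
```

Each candidate bin would then still have to pass the statistical test and the cutoff described above before being reported as a boundary.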
hicFindTADs tries to identify sensible parameters, but those can be changed to identify a more stringent set of boundaries.
mkdir TADs
hicFindTADs --matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--minDepth 60000 --maxDepth 120000 --numberOfProcessors 8 --step 20000 \
--outPrefix TADs/marks_et-al_TADs_20kb-Bins --minBoundaryDistance 80000 \
--correctForMultipleTesting fdr --threshold 0.05
As output we get separate files for the boundaries, domains and scores. In the plot below we will use only the TAD-score file.
Build Tracks File¶
We can plot the TADs for a given chromosomal region. For this we need to create a track file containing the instructions to build the plot. The hicPlotTADs documentation contains the instructions to build the track file.
In the following plot we will use the track file listed below. Please store it as track.ini.
[hic]
file = hicMatrix/replicateMerged.Corrected_20kb.h5
title = HiC mESC chrX:99974316-101359967
colormap = RdYlBu_r
depth = 2000000
height = 7
transform = log1p
x labels = yes
type = interaction
file_type = hic_matrix
[tads]
file = TADs/marks_et-al_TADs_20kb-Bins_domains.bed
file_type = domains
border color = black
color = none
height = 5
line width = 1.5
overlay previous = share-y
show data range = no
[x-axis]
fontsize=16
where=top
[tad score]
file = TADs/marks_et-al_TADs_20kb-Bins_score.bm
title = "TAD separation score"
height = 4
file_type = bedgraph_matrix
[spacer]
[gene track]
file = mm10_genes_sorted.bed
height = 10
title = "mm10 genes"
labels = off
As the gene track we used mm10 genes, sorted with sortBed from bedtools.
Plot¶
We plot the result with:
(1-19, X, Y) variant:
hicPlotTADs --tracks track.ini --region X:98000000-105000000 \
--dpi 300 --outFileName plots/marks_et-al_TADs.png \
--title "Marks et. al. TADs on X"
(chr1-chr19, chrX, chrY) variant:
hicPlotTADs --tracks track.ini --region chrX:98000000-105000000 \
--dpi 300 --outFileName plots/marks_et-al_TADs.png \
--title "Marks et. al. TADs on X"
The result is:

TADplot¶
How we use HiCExplorer¶
To generate a Hi-C contact matrix, the following basic steps are necessary:
Map the Hi-C reads to the reference genome
Filter the aligned reads to create a contact matrix
Filter matrix bins with low or zero read coverage
Remove biases from the Hi-C contact matrices
After a corrected Hi-C matrix is created, other tools can be used to visualize it, call TADs or compare it with other matrices.
Reads mapping¶
Mates have to be mapped individually to avoid mapper specific heuristics designed for standard paired-end libraries.
We have used HiCExplorer successfully with bwa, bowtie2 and hisat2. However, it is important to:
for either bowtie2 or hisat2, use the --reorder parameter, which tells bowtie2 or hisat2 to output the SAM records in exactly the same order as the reads in the .fastq files.
use local mapping, in contrast to end-to-end. A fraction of Hi-C reads are chimeric and will not map end-to-end; local mapping is therefore important to increase the number of mapped reads.
Tune the aligner parameters to penalize deletions and insertions. This is important to avoid aligned reads with gaps if they happen to be chimeric.
# map the reads, each mate individually using
# for example bwa
#
# bwa mem mapping options:
# -A INT score for a sequence match, which scales options -TdBOELU unless overridden [1]
# -B INT penalty for a mismatch [4]
# -O INT[,INT] gap open penalties for deletions and insertions [6,6]
# -E INT[,INT] gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1] # this is set very high to avoid gaps
# at restriction sites. Setting the gap extension penalty high, produces better results as
# the sequences left and right of a restriction site are mapped independently.
# -L INT[,INT] penalty for 5'- and 3'-end clipping [5,5] # this is set to no penalty.
$ bwa mem -A1 -B4 -E50 -L0 index_path \
mate_R1.fastq.gz 2>>mate_R1.log | samtools view -Shb - > mate_R1.bam
$ bwa mem -A1 -B4 -E50 -L0 index_path \
mate_R2.fastq.gz 2>>mate_R2.log | samtools view -Shb - > mate_R2.bam
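The gap cost formula quoted in the comments above, '{-O} + {-E}*k', shows why -E50 discourages gapped alignments. A small worked example, assuming the default gap open penalty of 6 listed above (the function name `gap_cost` is only for illustration):

```python
def gap_cost(length, gap_open=6, gap_extension=1):
    """Cost of a gap of the given length: gap_open + gap_extension * length."""
    return gap_open + gap_extension * length


default_cost = gap_cost(1)                  # bwa mem defaults: -O 6 -E 1
tuned_cost = gap_cost(1, gap_extension=50)  # with -E50 as used above
print(default_cost, tuned_cost)
```

With a match score of 1 (-A1), even a one-base gap then costs more than most of a 75 bp read can earn, so clipping the read at the ligation junction is favoured over opening a gap.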
Creation of a Hi-C matrix¶
Once the reads have been mapped, the Hi-C matrix can be built. For this, the minimal extra information required is the binSize used for the matrix. It is best to enter a low number like 10,000 because lower resolution matrices (larger bins) can easily be constructed using hicMergeMatrixBins. Matrices at restriction fragment resolution can be created by providing a file containing the restriction sites; this file can be created with the tool findRestSite, which is part of HiCExplorer.
# build matrix from independently mated read pairs
# the restriction sequence GATC is recognized by the DpnII restriction enzyme
$ hicBuildMatrix --samFiles mate_R1.bam mate_R2.bam \
--binSize 10000 \
--restrictionSequence GATC \
--threads 4 \
--inputBufferSize 100000 \
--outBam hic.bam \
-o hic_matrix.h5 \
--QCfolder ./hicQC
hicBuildMatrix creates two files: a bam file containing only the valid Hi-C read pairs and a matrix containing the Hi-C contacts at the given resolution. The bam file is useful to check the quality of the Hi-C library in a genome browser. A good Hi-C library should contain piles of reads near the restriction fragment sites. In the QCfolder an HTML file is saved with plots containing useful information for the quality control of the Hi-C sample, like the number of valid pairs, duplicated pairs, self-ligations etc. Usually only 25%-40% of the reads are valid and used to build the Hi-C matrix, mostly because reads in repetitive regions need to be discarded.
An important quality control measurement to check is the inter-chromosomal fraction of reads, as this is an indirect measure of random Hi-C contacts. Good Hi-C libraries have fewer than 10% inter-chromosomal contacts. The hicQC module can be used to compare the QC measures from different samples.
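The inter-chromosomal fraction is a simple ratio. The sketch below computes it from a list of chromosome pairs, one pair per valid read pair; the function name `inter_chromosomal_fraction` and the input layout are assumptions made for this illustration, and in practice the value is read from the hicBuildMatrix QC report.

```python
def inter_chromosomal_fraction(pairs):
    """Fraction of read pairs whose two mates map to different chromosomes."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no read pairs given")
    inter = sum(1 for chrom1, chrom2 in pairs if chrom1 != chrom2)
    return inter / len(pairs)


# Toy example: 19 intra-chromosomal pairs and 1 inter-chromosomal pair.
pairs = [("chr1", "chr1")] * 19 + [("chr1", "chr2")]
fraction = inter_chromosomal_fraction(pairs)
print(f"{fraction:.0%} inter-chromosomal contacts")
```

In this toy case the fraction is below the 10% guideline mentioned above.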
Correction of Hi-C matrix¶
The Hi-C matrix has to be corrected to remove GC, open chromatin biases and, most importantly, to normalize the number of restriction sites per bin. Because a fraction of bins from repetitive regions contain few contacts it is necessary to filter those regions first. Also, in mammalian genomes some regions enriched by reads should be discarded. To aid in the filtering of regions hicCorrectMatrix generates a diagnostic plot as follows:
$ hicCorrectMatrix diagnostic_plot -m hic_matrix.h5 -o diagnostic_plot.png
The plot should look like this:

Histogram of the number of counts per bin.¶
For the upper threshold it is only important to remove very high outliers, and thus a value of 5 could be used. For the lower threshold it is recommended to use a value between -2 and -1. What is not desired is to try to correct low-count bins, which could simply result in an amplification of noise. The upper threshold is less of a concern because those bins will be scaled down.
Once the thresholds have been decided, the matrix can be corrected:
# correct Hi-C matrix
$ hicCorrectMatrix correct -m hic_matrix.h5 --filterThreshold -1.5 5 -o hic_corrected.h5
Visualization of results¶
There are two ways to see the resulting matrix: one uses hicPlotMatrix and the other uses hicPlotTADs. The first one allows visualization over large regions, while the second one is preferred to see specific parts together with other information, for example genes or bigwig tracks.
Because of the large differences in counts found in the matrix, it is better to plot the counts using the --log1p option.
$ hicPlotMatrix -m hic_corrected.h5 -o hic_plot.png --region 1:20000000-80000000 --log1p

Corrected Hi-C counts in log scale.¶
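The reason for plotting log(1 + count) rather than the raw counts can be seen with a few toy values (the numbers are illustrative only): the transformation compresses several orders of magnitude into a narrow range and is still defined for empty bins.

```python
import math

# Toy contact counts spanning several orders of magnitude.
counts = [0, 1, 10, 1000, 100000]
log_counts = [math.log1p(c) for c in counts]  # log(1 + c), defined for c == 0

for c, lc in zip(counts, log_counts):
    print(f"{c:>6} -> {lc:.2f}")
```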
Quality control of Hi-C data and biological replicates comparison¶
HiCExplorer integrates multiple tools that allow the evaluation of the quality of Hi-C libraries and matrices.
hicQC on the log files produced by hicBuildMatrix and control of the pdf file produced.
Proportion of useful reads is important to assess the efficiency of the Hi-C protocol, which depends on the proportion of dangling ends detected… Proportion of inter-chromosomal, short-range and long-range contacts are important for….
hicPlotDistVsCounts to compare the distribution of corrected Hi-C counts in relation to the genomic distance between multiple samples. If some differences are observed between biological replicates, these can be investigated more precisely by computing log2ratio matrices.
hicCompareMatrices log2ratio of matrices of biological replicates to identify where the potential changes are located.
hicPlotPCA bins correlation of two biological replicates.
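The log2 ratio comparison mentioned above can be sketched as follows. This is a simplified illustration, not the code of hicCompareMatrices: scaling the second matrix to the total of the first and adding a pseudocount of 1 are assumptions made here to keep the example well defined.

```python
import math


def log2_ratio(matrix_a, matrix_b, pseudocount=1.0):
    """Element-wise log2((a + p) / (b' + p)), where b' is b scaled to the total of a."""
    total_a = sum(map(sum, matrix_a))
    total_b = sum(map(sum, matrix_b))
    scale = total_a / total_b
    return [
        [
            math.log2((a + pseudocount) / (b * scale + pseudocount))
            for a, b in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(matrix_a, matrix_b)
    ]


# Toy matrices (hypothetical counts).
replicate_1 = [[7, 3], [3, 7]]
replicate_2 = [[14, 6], [6, 14]]   # same pattern, twice the sequencing depth
other_sample = [[3, 7], [7, 3]]    # same depth, different contact pattern

same = log2_ratio(replicate_1, replicate_2)
different = log2_ratio(replicate_1, other_sample)
print(same)
print(different)
```

Replicates that differ only in sequencing depth give ratios of zero everywhere, whereas a change in the contact pattern shows up as positive and negative values at the affected positions.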
TAD calling¶
To call TADs, a corrected matrix is needed. Restriction fragment resolution matrices provide the best results. TAD calling works in two steps: first, HiCExplorer computes a TAD-separation score based on a z-score matrix for all bins. Then, those bins having a local minimum of the TAD-separation score are evaluated with respect to the surrounding bins to assign a p-value. Finally, a cutoff is applied to select the bins more likely to be TAD boundaries.
$ hicFindTADs -m hic_corrected.h5 --outPrefix hic_corrected --numberOfProcessors 16
This code will produce several files: 1. The TAD-separation score file, 2. the z-score matrix, 3. a bed file with the boundary location, 4. a bed file with the domains, 5. a bedgraph file with the TAD-score that can be visualized in a genome browser.
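The local-minimum step of the procedure can be illustrated with a short sketch. It only finds candidate boundary bins in a toy TAD-separation score; the p-value computation and the cutoff are not reproduced, and the window size and score values are hypothetical.

```python
def local_minima(scores, window=2):
    """Indices whose score is strictly the smallest within +/- `window` bins."""
    minima = []
    for i, value in enumerate(scores):
        lo = max(0, i - window)
        hi = min(len(scores), i + window + 1)
        neighbours = scores[lo:i] + scores[i + 1:hi]
        if neighbours and all(value < other for other in neighbours):
            minima.append(i)
    return minima


# Toy TAD-separation score per bin (hypothetical values).
tad_score = [0.5, 0.3, -0.4, 0.2, 0.6, 0.4, -0.1, -0.6, 0.1, 0.5]
candidates = local_minima(tad_score)
print(candidates)
```

Here bins 2 and 7 are the candidate boundaries; bin 6 has a low score as well but is not a local minimum because its neighbour, bin 7, is lower.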
The TAD-separation score and the matrix can be visualized using hicPlotTADs.

Example output from hicPlotTADs from http://chorogenome.ie-freiburg.mpg.de/¶
A / B compartment analysis¶
To compute the A / B compartments, the matrix needs to be transformed to an observed/expected matrix in the way Lieberman-Aiden describes it. In the next step, a Pearson correlation matrix is computed and, based on it, a covariance matrix. Finally, the eigenvectors of the covariance matrix are computed. All these steps are performed with the command:
$ hicPCA -m hic_corrected.h5 --outFileName pca1.bw pca2.bw --format bigwig
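The first two steps of this procedure can be sketched on a toy matrix. The example assumes that the expected value at a given genomic distance is the mean of the corresponding diagonal; the covariance and eigenvector steps are left to hicPCA. Function names and counts are illustrative and not taken from HiCExplorer.

```python
def observed_over_expected(matrix):
    """Divide each entry by the mean of its diagonal (same genomic distance)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [
            matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def pearson_matrix(matrix):
    """Correlate every row of the matrix with every other row."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


# Toy symmetric matrix (hypothetical counts): bins 0-1 and bins 2-3 form two groups.
contacts = [
    [100, 60, 10, 5],
    [60, 100, 20, 10],
    [10, 20, 100, 60],
    [5, 10, 60, 100],
]
obs_exp = observed_over_expected(contacts)
corr = pearson_matrix(obs_exp)
for row in corr:
    print([round(v, 2) for v in row])
```

In the resulting correlation matrix, bins within the same group are positively correlated and bins from different groups are negatively correlated, which is the plaid pattern that the eigenvector decomposition then summarizes.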
If the intermediate matrices of this process should be used for plotting run:
$ hicTransform -m hic_corrected.h5 --outFileName all.h5 --method all
This creates all intermediate matrices: obs_exp_all.h5, pearson_all.h5 and covariance_all.h5.
The A / B compartments can be plotted with hicPlotMatrix.
$ hicPlotMatrix -m pearson_all.h5 --outFileName pca1.png --perChr --bigwig pca1.bw
News and Developments¶
Release 3.0¶
3 April 2019
Python 3 only. Python 2.X is no longer supported
Additional Hi-C interaction matrix correction algorithm ‘Knight-Ruiz’ as a C++ module for a faster runtime and less memory usage.
Enriched regions detection tool: ‘hicDetectLoops’ based on strict candidate selection, ‘hicFindEnrichedContacts’ was deleted
Metadata for cooler files is supported: hicBuildMatrix and hicInfo are using it
New options for hicPlotMatrix: --loops to visualize computed loops from hicDetectLoops and --bigwigAdditionalVerticalAxis to display a bigwig track on the vertical axis too.
Release 2.2.3¶
22 March 2019
This bug fix release patches an issue with cooler files, hicBuildMatrix and the usage of a restriction sequence file instead of fixed bin size.
Release 2.2.2¶
27 February 2019
This bug fix release removes references to hicExport that were mistakenly left in 2.2. Thanks @BioGeek for this contribution.
Release 2.2.1¶
7 February 2019
Muting log output of matplotlib and cooler
Set version number of hicmatrix to 7
Optional parameter for hicInfo to write the result to a file instead of printing it to the terminal
Release 2.2¶
18 January 2019
This release contains:
replaced hicExport by hicConvertFormat and hicAdjustMatrix
extended functionality for hicConvertFormat
read support for homer, hicpro, cool, h5
write support for h5, homer, cool
convert hic to cool
creation of mcool matrices
hicAdjustMatrix
remove, keep or mask specified regions from a file, or chromosomes
hicNormalize
normalize matrices to 0 - 1 range or to the read coverage of the lowest given
hicBuildMatrix
support for build mcool
restructuring the central class HiCMatrix to object oriented model and moved to its own library: deeptools/HiCMatrix.
Extended read / write support for file formats
better (faster, less memory) support for cool format
removal of old, unused code
restrict support to h5 and cool matrices, except hicConvertFormat
hicFindTADs: Option to run computation per specified chromosomes
hicPlotTADs: removed code and calls pyGenomeTracks
hicAverageRegions: Sum up in a given range around defined reference points. Useful to detect changes in TAD structures between different samples.
hicPlotAverageRegions: Plots such an average region
hicTransform: Restructuring of the source code, removal of option ‘all’ because it was generating confusion. Adding options ‘exp_obs’, ‘exp_obs_norm’ and ‘exp_obs_lieberman’. These three options use different expectation matrix computations.
hicPCA
Adding --norm option to compute the expected matrix in the way HOMER is doing it. Useful for Drosophila genomes
Adding option to write out the intermediate matrices ‘obs_exp’ and ‘pearson’ which are necessary in the computation of the PCA
hicPlotMatrix
Add option to clip bigwig values
Add option to scale bigwig values
Removed hicLog2Ratio, functionality is covered by hicCompareMatrices
Extending test cases to cover more source code and be hopefully more stable.
Many small bugfixes
Publication¶
13 June 2018
We are proud to announce our latest publication:
Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. “Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization”, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504
Release 2.1.4¶
25 May 2018
cooler file format correction factors are applied as they should be
parameter ‘--region’ of hicBuildMatrix works with Python 3
Release 2.1.3¶
7 May 2018
The third bugfix release of version 2.1 corrects an error in hicPlotViewpoint. It adds a feature requested in issue #169 which should have been included in release 2.1 but was accidentally not.
From 2.1 release note: hicPlotViewpoint: Adds a feature to plot multiple matrices in one image
Release 2.1.2¶
26 April 2018
The second bug fix release of 2.1 includes:
documentation improvements
fixing broken Readthedocs documentation
Small bug fix concerning hicPlotMatrix and cooler: --chromosomeOrder is now possible with more than one chromosome
Small fixes concerning updated dependencies: version numbers are specified more precisely, and the delta values in test cases are less strict.
Release 2.1.1¶
27 March 2018
This release fixes a problem related to python3 in which chromosome names were of bytes type
Release 2.1¶
5 March 2018
The 2.1 version of HiCExplorer comes with new features and bugfixes.
Adding the new feature hicAggregateContacts: A tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions.
Many improvements to the documentation and the help text. Thanks to Gina Renschler and Gautier Richard from the MPI-IE Freiburg, Germany.
hicPlotMatrix
supports only bigwig files for an additional data track.
the argument --pca was renamed to --bigwig
Smoothing the bigwig values to neighboring bins if no data is present there
Fixes to a bug concerning a crash of tight_layout
Adding the possibility to flip the sign of the values of the bigwig track
Adding the possibility to scale the values of the bigwig track
hicPlotViewpoint: Adds a feature to plot multiple matrices in one image
cooler file format
supports mcool files
applies correction factors if present
optionally reads bin[‘weight’]
fixes
a crash in hicPlotTADs if horizontal lines were used
checks if all characters of a title are ASCII. If not they are converted to the closest looking one.
Updated and pinned the version numbers of the dependencies
Release 2.0¶
December 21, 2017
This release makes HiCExplorer ready for the future:
Python 3 support
Cooler file format support
A/B compartment analysis
Improved visualizations
bug fixes for the --perChr option in hicPlotMatrix
eigenvector track with --pca for hicPlotMatrix
visualization of interactions around a reference point or region with hicPlotViewpoint
Higher test coverage
re-licensing from GPLv2 to GPLv3
Release 1.8.1¶
November 27, 2017
Bug fix release:
a fix concerning the handling of chimeric alignments in hicBuildMatrix. Thanks to Aleksander Jankowski @ajank
handling of dangling ends was too strict
improved help message in hicBuildMatrix
Release 1.8¶
October 25, 2017
This release is adding new features and fixes many bugs:
hicBuildMatrix: Added multicore support, new parameters --threads and --inputBufferSize
hicFindTADs:
One call instead of two: hicFindTADs TAD_score and hicFindTADs find_TADs merged to hicFindTADs.
New multiple-testing correction method supported: False discovery rate. Call it with --correctForMultipleTesting fdr and --threshold 0.05.
Update of the tutorial: mES-HiC analysis.
Additional test cases and docstrings to improve the software quality
Fixed a bug occurring with bigwig files with frequent NaN values which resulted in only NaN averages
hicPlotTADs: Support for plotting points
Moved galaxy wrappers to https://github.com/galaxyproject/tools-iuc
Fixed multiple bugs with saving matrices
hicCorrelate: Changes direction of dendrograms to left
Release 1.7.2¶
April 3, 2017
Added option to plot bigwig files as a line in hicPlotTADs
Updated documentation
Improved hicPlotMatrix --region output
Added compressed matrices. In our tests the compressed matrices are significantly smaller.
Release 1.7¶
March 28, 2017
This release adds a quality control module to check the results from hicBuildMatrix. By default, hicBuildMatrix now generates an HTML page containing the plots from the QC measures. The results from several runs of hicBuildMatrix can be combined in one page using the new tool hicQC.
Also, this release added a module called hicCompareMatrices that takes two Hi-C matrices and computes the difference, the ratio or the log2 ratio. The resulting matrix can be plotted with hicPlotMatrix to visualize the changes.
Preprint introducing HiCExplorer is now online¶
March 8, 2017
Our #bioRxiv preprint on DNA sequences behind Fly genome architecture is online!
Read the article here : http://biorxiv.org/content/early/2017/03/08/115063
In this article, we introduce HiCExplorer : Our easy to use tool for Hi-C data analysis, also available in Galaxy.
We also introduce HiCBrowser : A standalone software to visualize Hi-C along with other genomic datasets.
Based on HiCExplorer and HiCBrowser, we built a useful resource for anyone to browse and download the chromosome conformation datasets in Human, Mouse and Flies. It’s called the chorogenome navigator.
Along with these resources, we present an analysis of DNA sequences behind 3D genome of Flies. Using high-resolution Hi-C analysis, we find a set of DNA motifs that characterize TAD boundaries in Flies and show the importance of these motifs in genome organization.
We hope that these resources and analysis would be useful for the community and welcome any feedback.
HiCExplorer wins best poster prize at VizBi2016¶
March 20, 2016
We are excited to announce that HiCExplorer has won the NVIDIA Award for Best Scientific Poster in VizBi2016, the international conference on visualization of biological data.
This was our poster :

Citation¶
Please cite HiCExplorer as follows:
Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening, Kin Chung Lam, Bianca Habermann, Asifa Akhtar, Thomas Manke. “High-resolution TADs reveal DNA sequences underlying genome organization in flies”. Nature Communications, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w
Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504

This tool suite is developed by the Bioinformatics Unit at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg and by the Bioinformatics Lab of the Albert-Ludwigs-University Freiburg, Germany.