HiCExplorer¶
Set of programs to process, normalize, analyze and visualize Hi-C and cHi-C data¶
HiCExplorer addresses the common tasks of Hi-C data analysis from processing to visualization.

Availability¶
HiCExplorer is available as a command line suite of tools on this GitHub repository.
A Galaxy HiCExplorer version is directly available to users at http://hicexplorer.usegalaxy.eu. Training material is available at the Galaxy Training Network, while a Galaxy Tour is available here for users not familiar with this platform. Galaxy HiCExplorer is also available as a Docker image at the Docker Galaxy HiCExplorer GitHub repository. Finally, this Galaxy version is available on the Galaxy Tool Shed and on the corresponding GitHub repository.
The following is the list of tools available in HiCExplorer¶
tool |
description |
---|---|
Identifies the genomic locations of restriction sites |
|
Creates a Hi-C matrix using the aligned BAM files of the Hi-C sequencing reads |
|
Estimates the quality of Hi-C dataset |
|
Plots QC measures from the output of hicBuildMatrix |
|
Uses iterative correction to remove biases from a Hi-C matrix |
|
Identifies enriched Hi-C contacts |
|
Computes and visualizes the correlation of Hi-C matrices |
|
Identifies Topologically Associating Domains (TADs) |
|
Merges TADs of different resolutions |
|
Computes differential TADs between two Hi-C samples |
|
Computes for A / B compartments the eigenvectors |
|
Computes a obs_exp matrix like Lieberman-Aiden (2009), a pearson correlation matrix and or a covariance matrix. These matrices can be used for plotting. |
|
Merges consecutive bins on a Hi-C matrix to reduce resolution |
|
Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix. |
|
Plot the decay in interaction frequency with distance |
|
Plots a Hi-C matrix as a heatmap |
|
Plots TADs as a track that can be combined with other tracks (genes, signal, interactions) |
|
A plot with the interactions around a reference point or region. |
|
A tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions. |
|
Adds Hi-C matrices of the same size |
|
Plots distance vs. Hi-C counts of corrected data |
|
Shows information about a Hi-C matrix file (no. of bins, bin length, sum, max, min, etc) |
|
Computes difference or ratio between two matrices |
|
Computes the average of multiple given regions, usually TAD regions |
|
visualization of hicAverageRegions |
|
Normalizes the given matrices to 0-1 range or the smallest read coverage |
|
Converts between different Hi-C interaction matrices |
|
Keeps, removes or masks regions in a Hi-C matrix |
|
Compare the loops with known peak protein locations |
|
Merges loops of different resolutions |
|
Compute the global compartmentalization signal |
|
Quality control for cHi-C data |
|
Background model computation for cHi-C analysis |
|
Computation of all viewpoints based on background model for cHi-C analysis |
|
Detection of significant interactions per viewpoint based on background model |
|
Compiling of target regions for two samples as input for differential analysis |
|
Differential analysis of interactions of two samples |
|
Plotting of viewpoint with background model and highlighting of significant and differential regions |
|
Computing short vs long range contacts and plotting the results |
|
Search for optimal hicDectectLoops parameters |
|
Search for optimal Juicer HiCCUPS parameters |
Getting Help¶
For all kind of questions, suggesting changes/enhancements and to report bugs, please create an issue on our GitHub repository
In the past we offered to post on Biostars with Tag hicexplorer : Biostars or on the deepTools mailing list. We still check these resources from time to time but the preferred way to communicate are GitHub issues.
Contents:¶
Installation¶
Requirements¶
python >= 3.6
numpy >= 1.19.*
scipy >= 1.5.*
matplotlib-base >= 3.1.*
ipykernel >= 5.3.0
pysam >= 0.16.*
intervaltree >= 3.1.*
biopython < 1.77
pytables >= 3.6.*
pandas >= 1.1.*
pybigwig >= 0.3.*
jinja2 >= 2.11
unidecode >= 1.1.*
hicmatrix >= 15
hic2cool >= 0.8.3
psutil >= 5.7.*
pygenometracks >= 3.5
fit_nbinom >= 1.1
cooler >= 0.8.10
krbalancing >= 0.0.5 (Needs the library eigen; openmp is recommended for linux users. No openmp support on macOS.)
pybedtools >= 0.8.*
future >= 0.18
tqdm >= 4.50
hyperopt >= 0.2.4
python-graphviz >= 0.14
Warning: Python 2.7 support is discontinued. Moreover, the support for pip is discontinued too. Warning: We strongly recommend to use the conda package manager and will no longer give support on all issues raising with pip.
Command line installation using conda
¶
The fastet way to obtain Python 3.6 or 3.7 together with numpy and scipy is via the Anaconda Scientific Python Distribution. Just download the version that’s suitable for your operating system and follow the directions for its installation. All of the requirements for HiCExplorer can be installed in Anaconda with:
$ conda install hicexplorer -c bioconda -c conda-forge
We strongly recommended to use conda to install HiCExplorer.
Command line installation using pip
¶
The installation via pip is discontinued with version 3.0. The reason for this is that we want to provide a ‘one-click’ installation. However, with version 3.0 we added the C++ library eigen as dependency and pip does not support non-Python packages.
For older versions you can still use pip: Install HiCExplorer using the following command:
$ pip install hicexplorer
All python requirements should be automatically installed.
If you need to specify a specific path for the installation of the tools, make use of pip install’s numerous options:
$ pip install --install-option="--prefix=/MyPath/Tools/hicexplorer" git+https://github.com/deeptools/HiCExplorer.git
Warning: It can be that you have to install additional packages via your system package manager to successfully install HiCExplorer via pip. Warning: We strongly recommend to use the conda package manager and will no longer give support on all issues raising with pip.
Command line installation without pip
¶
You are highly recommended to use pip rather than these more complicated steps.
Install the requirements listed above in the “requirements” section. This is done automatically by pip.
2. Download source code
$ git clone https://github.com/deeptools/HiCExplorer.git
or if you want a particular release, choose one from https://github.com/deeptools/HiCExplorer/releases:
$ wget https://github.com/deeptools/HiCExplorer/archive/1.5.12.tar.gz
$ tar -xzvf
3. To install the source code (if you don’t have root permission, you can set
a specific folder using the --prefix
option)
$ python setup.py install --prefix /User/Tools/hicexplorer
Galaxy installation¶
HiCExplorer can be easily integrated into a local Galaxy, the wrappers are provided at the Galaxy tool shed.
Installation with Docker¶
The HiCExplorer Galaxy instance is also available as a docker container, for those wishing to use the Galaxy framework but who also prefer a virtualized solution. This container is quite simple to install:
$ sudo docker pull quay.io/bgruening/galaxy-hicexplorer
To start and otherwise modify this container, please see the instructions on the docker-galaxy-stable github repository. Note that you must use bgruening/galaxy-hicexplorer in place of bgruening/galaxy-stable in the examples, as the HiCExplorer Galaxy container is built on top of the galaxy-stable container.
Tip
For support, or feature requests contact: deeptools@googlegroups.com
HiCExplorer tools¶
Tools for Hi-C data pre-processing¶
hicFindRestSites¶
Identifies the genomic locations of restriction sites.
usage: hicFindRestSite --fasta mm10.fa --searchPattern AAGCTT -o rest_site_positions.bed
Required arguments¶
- --fasta, -f
Path to fasta file for the organism genome.
- --searchPattern, -p
Search pattern. For example, for HindIII this pattern is “AAGCTT”. Both, forward and reverse strand are searched for a match. The pattern is a regexp and can contain regexp specif syntax (see https://docs.python.org/2/library/re.html). For example the patternCG..GC will find all occurrence of CG followed by any two bases and then GC.
- --outFile, -o
Name for the resulting bed file.
Optional arguments¶
- --version
show program’s version number and exit
Further usage¶
In case multiple restriction enzymes are used in one experiment, hicFindRestSite
can be used to find restriction sites individually per enzyme. Afterwards, all output bed files should be combined. However, it should be noted that the QC report will not be correct for this specific usage.
hicBuildMatrix¶
Using an alignment from a program that supports local alignment (eg. Bowtie2) where both PE reads are mapped using the –local option, this program reads such file and creates a matrix of interactions.
usage: hicBuildMatrix --samFiles two sam files two sam files --outFileName
FILENAME --QCfolder FOLDER --restrictionCutFile BED file
[BED file ...] --restrictionSequence RESTRICTIONSEQUENCE
[RESTRICTIONSEQUENCE ...] --danglingSequence
DANGLINGSEQUENCE [DANGLINGSEQUENCE ...]
[--outBam bam file] [--binSize BINSIZE [BINSIZE ...]]
[--minDistance MINDISTANCE] [--maxDistance MAXDISTANCE]
[--maxLibraryInsertSize MAXLIBRARYINSERTSIZE]
[--genomeAssembly GENOMEASSEMBLY]
[--region CHR:START-END]
[--removeSelfLigation REMOVESELFLIGATION]
[--keepSelfCircles]
[--minMappingQuality MINMAPPINGQUALITY]
[--threads THREADS] [--inputBufferSize INPUTBUFFERSIZE]
[--doTestRun] [--doTestRunLines DOTESTRUNLINES]
[--skipDuplicationCheck] [--chromosomeSizes txt file]
[--help] [--version]
Required arguments¶
- --samFiles, -s
The two PE alignment sam files to process
- --outFileName, -o
Output file name for the Hi-C matrix.
- --QCfolder
Path of folder to save the quality control data for the matrix. The log files produced this way can be loaded into hicQC in order to compare the quality of multiple Hi-C libraries.
- --restrictionCutFile, -rs
BED file(s) with all restriction cut sites (output of “hicFindRestSite” command). Should only contain the restriction sites of the same genome which has been used to generate the input sam files. Using regions of a different genome version can generate false results! To use more than one restriction enzyme, generate a restrictionCutFile for each enzyne and list them space seperated.
- --restrictionSequence, -seq
Sequence of the restriction site, if multiple are used, please list them space seperated. If a dangling sequence is listed at the same time, please preserve the same order.
- --danglingSequence
Sequence left by the restriction enzyme after cutting, if multiple are used, please list them space seperated and preserve the order. Each restriction enzyme recognizes a different DNA sequence and, after cutting, they leave behind a specific “sticky” end or dangling end sequence. For example, for HindIII the restriction site is AAGCTT and the dangling end is AGCT. For DpnII, the restriction site and dangling end sequence are the same: GATC. This information is easily found on the description of the restriction enzyme. The dangling sequence is used to classify and report reads whose 5’ end starts with such sequence as dangling-end reads. A significant portion of dangling-end reads in a sample are indicative of a problem with the re-ligation step of the protocol.
Optional arguments¶
- --outBam, -b
Output bam file to process. Optional parameter. A bam file containing all valid Hi-C reads can be created using this option. This bam file could be useful to inspect the distribution of valid Hi-C reads pairs or for other downstream analyses, but is not used by any HiCExplorer tool. Computation will be significantly longer if this option is set.
- --binSize, -bs
Size in bp for the bins. The bin size depends on the depth of sequencing. Use a larger bin size for libraries sequenced with lower depth. If not given, matrices of restriction site resolution will be built. Optionally for mcool file format: Define multiple resolutions which are all a multiple of the first value. Example: –binSize 10000 20000 50000 will create a mcool file formate containing the three defined resolutions.
- --minDistance
Minimum distance between restriction sites. Restriction sites that are closer than this distance are merged into one. This option only applies if –restrictionCutFile is given (Default: 300).
Default: 300
- --maxDistance
This parameter is now obsolete. Use –maxLibraryInsertSize instead
- --maxLibraryInsertSize
The maximum library insert size defines different cut offs based on the maximum expected library size. This is not the average fragment size but the higher end of the the fragment size distribution (obtained using for example a Fragment Analyzer or a Bioanalyzer) which usually is between 800 to 1500 bp. If this value is not known use the default value. The insert value is used to decide if two mates belong to the same fragment (by checking if they are within this max insert size) and to decide if a mate is too far away from the nearest restriction site (Default: 1000).
Default: 1000
- --genomeAssembly, -ga
The genome the reads were mapped to. Used for metadata of cool file.
- --region, -r
Region of the genome to limit the operation to. The format is chr:start-end. It is also possible to just specify a chromosome, for example –region chr10
- --removeSelfLigation
If set, inward facing reads less than 1000 bp apart and having a restrictionsite in between are removed. Although this reads do not contribute to any distant contact, they are useful to account for bias in the data (for the moment is always True).
Default: True
- --keepSelfCircles
If set, outward facing reads without any restriction fragment (self circles) are kept. They will be counted and shown in the QC plots.
Default: False
- --minMappingQuality
minimum mapping quality for reads to be accepted. Because the restriction enzyme site could be located on top of the read, this may reduce the reported quality of the read. Thus, this parameter may be adjusted if too many low quality (but otherwise perfectly valid Hi-C reads) are found. A good strategy is to make a test run (using the –doTestRun), then checking the results to see if too many low quality reads are present and then using the bam file generated to check if those low quality reads are caused by the read not being mapped entirely (Default: 15).
Default: 15
- --threads
Number of threads. Using the python multiprocessing module. One master process which is used to read the input file into the buffer and one process which is merging the output bam files of the processes into one output bam file. All other threads do the actual computation. Minimum value for the ‘–thread’ parameter is 2. The usage of 8 threads is optimal if you have an HDD. A higher number of threads is only useful if you have a fast SSD. Have in mind that the performance of hicBuildMatrix is influenced by the number of threads, the speed of your hard drive and the inputBufferSize. To clarify: the performance with a higher thread number is not negative influenced but not positive too. With a slow HDD and a high number of threads many threads will do nothing most of the time (Default: 4).
Default: 4
- --inputBufferSize
Size of the input buffer of each thread. 400,000 read pairs per input file per thread is the default value. Reduce this value to decrease memory usage.
Default: 400000
- --doTestRun
A test run is useful to test the quality of a Hi-C experiment quickly. It works by testing only 1,000,000 reads. This option is useful to get an idea of quality control values like inter-chromosomal interactions, duplication rates etc.
Default: False
- --doTestRunLines
Number of lines to consider for the qc test run (Default: 1000000).
Default: 1000000
- --skipDuplicationCheck
Identification of duplicated read pairs is memory consuming. Thus, in case of memory errors this check can be skipped. However, consider running a –doTestRun first to get an estimation of the duplicated reads.
Default: False
- --chromosomeSizes, -cs
File with the chromosome sizes for your genome. A tab-delimited two column layout “chr_name size” is expectedUsually the sizes can be determined from the SAM/BAM input files, however, for cHi-C or scHi-C it can be that at the start or end no data is present. Please consider that this option causes that only reads are considered which are on the listed chromosomes.Use this option to guarantee fixed sizes. An example file is available via UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/dm3/bigZips/dm3.chrom.sizes
- --version
show program’s version number and exit
Please note that the file type extension for the output matrix (--outFileName
) must be given! This can be .h5, .cool or the specializations of cool
.mcool, if the path is given! For .scool files please create one .cool file per cell and merge it together with scHiCExplorer’s scHicMergeToSCool.
Building multicooler matrices¶
hicBuildMatrix
supports building multicooler matrices which are for example needed for visualization with HiGlass.
To do so, use as out file format either .cool or .mcool and define the desired resolutions as –binSize.
hicBuildMatrix
builds the interaction matrix for the highest resolution and merges the bins for the lower resolutions.
The lower resolutions need to be an integer multiplicative of the highest resolution.
$ hicBuildMatrix -s forward.bam reverse.bam -o multi_resolution.cool
--binSize 10000 20000 50000 100000 --QCfolder QC
Introducing with version 3.5 we support multiple restriction and dangling end sequences, and multiple restriction cut site files. Hi-C protocols that use multiple restriction cut enzymes benefit from this and get now an improved QC report. Version 3.5 adds also the support for a chromosome size file which can help to get interaction matrices with a predefined size. Capture Hi-C or single-cell Hi-C data, where it is not guaranteed that reads from all areas of the chromosome are present benefit from this latest improvement.
Missing scaffolds or contigs in the Hi-C matrix¶
Restriction enzymes cut the DNA at their specific restriction cut site sequences. It can occur for scaffolds or contigs (less likely it happens for chromosomes) that it does not contain any of these cut sites. In this case, the reads from this scaffold or contig are considered as invalid and are part of the ‘same restriction fragment’-statistics in the QC report.
hicSumMatrices¶
Adds Hi-C matrices of the same size. Format has to be hdf5 (.h5) or npz. In order to minimize the the loss of information, it is recommended to to sum uncorrected matrices (before hicCorrectMatrix).
usage: hicSumMatrices --matrices .h5 or cooler file format
[.h5 or cooler file format ...] --outFileName
OUTFILENAME [-h] [--version]
Required arguments¶
- --matrices, -m
Space-delimited names of the matrices to add. The matrices must have the same shape/size. You can verify their size by using hicInfo.
- --outFileName, -o
File name to save the resulting matrix. The output is from the same file type as the input. Please add the file ending suffix (either .h5 or .cool), if it is not given, there will be no output.
Optional arguments¶
- --version
show program’s version number and exit
hicMergeMatrixBins¶
Background¶
Depending on the downstream analyses to perform on a Hi-C matrix generated with HiCExplorer, one might need different bin resolutions. For example using hicPlotMatrix to display chromatin interactions of a whole chromosome will not produce any meaningful visualization if it is performed on a matrix at restriction sites resolution. hicMergeMatrixBins address this issue by merging a given number of adjacent bins (determined by --numBins
). To minimize the loss of information (matrix bins with few counts), it is recommended to perform hicMergeMatrixBins on uncorrected matrices (before matrix correction). The matrix correction can be performed after bin merging for downstream analyses using hicCorrectMatrix.
Description¶
Merges bins from a Hi-C matrix. For example, using a matrix containing 5kb bins, a matrix of 50kb bins can be derived using –numBins 10. From one type of downstream analysis to another, different bin sizes are used. For example to call TADs, unmerged matrices are recommended while to display Hi-C matrices, bins of approximately 2000bp usually yield the best representations with hicPlotMatrix for small regions, and even larger bins (50kb) are recommended for whole chromosome representations or for hicPlotDistVsCounts.
usage: hicMergeMatrixBins --matrix matrix.h5 --outFileName OUTFILENAME
--numBins int [--runningWindow] [--help] [--version]
- --matrix, -m
Matrix to reduce in h5 format.
- --outFileName, -o
File name to save the resulting matrix. The output is also a .h5 file. But don’t add the suffix.
- --numBins, -nb
Number of bins to merge.
- --runningWindow
Set to merge for using a running window of length –numBins.
Default: False
- --version
show program’s version number and exit
Usage example¶
Below, we display the example of plotting a Hi-C matrix at the scale of the whole X-chromosome and at the scale of a 1Mb region of the X chromosome. To do this, we will perform two different bin merging using hicMergeMatrixBins on an uncorrected matrix built at restriction site resolution using hicBuildMatrix. To do this, we run the two following commands:
$ hicMergeMatrixBins -m myMatrix.h5 -o myMatrix_merged_nb3.h5 -nb 3
$ hicMergeMatrixBins -m myMatrix.h5 -o myMatrix_merged_nb50.h5 -nb 50
Starting from a matrix myMatrix.h5
with bins of a median length of 529bp, the first command produces a matrix myMatrix_merged_nb3.h5
with bins of a median length of 1661bp while the second command produces a matrix myMatrix_merged_nb50.h5
with bins of a median length of 29798bp. This can be checked with hicInfo.
After correction of these three matrices using hicCorrectMatrix, we can now plot the corrected matrices myMatrix_corrected.h5
, myMatrix_merged_nb3_corrected.h5
and myMatrix_merged_nb50_corrected.h5
at the scale of the whole X-chromosome and at the X:2000000-3000000 region to see the effect of bin merging on the interactions visualization.
$ hicPlotMatrix -m myMatrix_corrected.h5 \
-o myMatrix_corrected_Xchr.png \
--chromosomeOrder X \
-t Restriction_sites_resolution --log1p \
--clearMaskedBins
$ hicPlotMatrix -m myMatrix_merged_nb3_corrected.h5 \
-o myMatrix_merged_nb3_corrected_Xchr.png \
--chromosomeOrder X \
-t Bins_merged_by_3 --log1p \
--clearMaskedBins
$ hicPlotMatrix -m myMatrix_merged_nb50_corrected.h5 \
-o myMatrix_merged_nb50_corrected_Xchr.png \
--chromosomeOrder X \
-t Bins_merged_by_50 --log1p \
--clearMaskedBins
When observed altogether, the plots produced by these three commands show that merging of bins by 50 is the most adequate way to plot interactions for a whole chromosome in Drosophila melanogaster when starting from a matrix with bins of a median length of 529bp.

$ hicPlotMatrix -m myMatrix_corrected.h5 \
-o myMatrix_corrected_Xregion.png \
--region X:2000000-3000000 \
-t Restriction_sites_resolution --log1p \
--clearMaskedBins
$ hicPlotMatrix -m myMatrix_merged_nb3_corrected.h5 \
-o myMatrix_merged_nb3_corrected_Xregion.png \
--region X:2000000-3000000 \
-t Bins_merged_by_3 --log1p \
--clearMaskedBins
$ hicPlotMatrix -m myMatrix_merged_nb50_corrected.h5 \
-o myMatrix_merged_nb50_corrected_Xregion.png \
--region X:2000000-3000000 \
-t Bins_merged_by_50 --log1p \
--clearMaskedBins
When observed altogether, the plots produced by these three commands show that merging of bins by 3 is the most adequate way to plot interactions for a region of 1Mb in Drosophila melanogaster when starting from a matrix with bins of a median length of 529bp.

hicCorrectMatrix¶
This function provides 2 balancing methods which can be applied on a raw matrix.
I. KR: It balances a matrix using a fast balancing algorithm introduced by Knight and Ruiz (2012).
II. ICE: Iterative correction of a Hi-C matrix (see Imakaev et al. 2012 Nature Methods for details). For this method to work correctly, bins with zero reads assigned to them should be removed as they cannot be corrected. Also, bins with low number of reads should be removed, otherwise, during the correction step, the counts associated with those bins will be amplified (usually, zero and low coverage bins tend to contain repetitive regions). Bins with extremely high number of reads can also be removed from the correction as they may represent copy number variations.
To aid in the identification of bins with low and high read coverage, the histogram of the number of reads can be plotted together with the Median Absolute Deviation (MAD).
It is recommended to run hicCorrectMatrix as follows:
$ hicCorrectMatrix diagnostic_plot –matrix hic_matrix.h5 -o plot_file.png
Then, after revising the plot and deciding on the threshold values:
$ hicCorrectMatrix correct –correctionMethod ICE –matrix hic_matrix.h5 –filterThreshold <lower threshold> <upper threshold> -o corrected_matrix
For a more in-depth review of how to determine the threshold values, please visit: http://hicexplorer.readthedocs.io/en/latest/content/example_usage.html#correction-of-hi-c-matrix
We recommend to compute first the normalization (with hicNormalize) and correct the data (with hicCorrectMatrix) in a second step.
usage: hicCorrectMatrix [-h] [--version] ...
Named Arguments¶
- --version
show program’s version number and exit
Options¶
Possible choices: diagnostic_plot, correct
To get detailed help on each of the options:
$ hicCorrectMatrix diagnostic_plot -h
$ hicCorrectMatrix correct -h
Sub-commands:¶
- Plots a histogram of the coverage per bin together with the
modified z-score based on the median absolute deviation method
(see Boris Iglewicz and David Hoaglin 1993, Volume 16: How to Detect and Handle Outliers The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor).
hicCorrectMatrix diagnostic_plot --matrix hic_matrix.h5 -o file.png
- --matrix, -m
Name of the Hi-C matrix to correct in .h5 format.
- --plotName, -o
File name to save the diagnostic plot.
- --chromosomes
List of chromosomes to be included in the iterative correction. The order of the given chromosomes will be then kept for the resulting corrected matrix.
- --xMax
Max value for the x-axis in counts per bin.
- --perchr
Compute histogram per chromosome. For samples from cells with uneven number of chromosomes and/or translocations it is advisable to check the histograms per chromosome to find the most conservative filterThreshold.
Default: False
- --verbose
Print processing status.
Default: False
Run Knight-Ruiz matrix balancing algorithm (KR) or the iterative matrix correction (ICE) .
hicCorrectMatrix correct --matrix hic_matrix.h5 --filterThreshold -1.2 5 (Only if ICE)-out corrected_matrix.h5
- --matrix, -m
Name of the Hi-C matrix to correct in .h5 format.
- --outFileName, -o
File name to save the resulting matrix. The output is a .h5 file.
- --correctionMethod
Possible choices: KR, ICE
Method to be used for matrix correction. It can be set to KR or ICE (Default: “KR”).
Default: “KR”
- --filterThreshold, -t
Removes bins of low or large coverage. Usually these bins do not contain valid Hi-C data or represent regions that accumulate reads and thus must be discarded. Use hicCorrectMatrix diagnostic_plot to identify the modified z-value thresholds. A lower and upper threshold are required separated by space, e.g. –filterThreshold -1.5 5. Applied only for ICE!
- --iterNum, -n
Number of iterations to compute.only for ICE! (Default: 500).
Default: 500
- --inflationCutoff
Value corresponding to the maximum number of times a bin can be scaled up during the iterative correction. For example, an inflation cutoff of 3 will filter out all bins that were expanded 3 times or more during the iterative correctionself.Only for ICE!
- --transCutoff, -transcut
Clip high counts in the top -transcut trans regions (i.e. between chromosomes). A usual value is 0.05. Only for ICE!
- --sequencedCountCutoff
Each bin receives a value indicating the fraction that is covered by reads. A cutoff of 0.5 will discard all those bins that have less than half of the bin covered. Only for ICE!
- --chromosomes
List of chromosomes to be included in the iterative correction. The order of the given chromosomes will be then kept for the resulting corrected matrix
- --skipDiagonal, -s
If set, diagonal counts are not included. Only for ICE!
Default: False
- --perchr
Normalize each chromosome separately. This is useful for samples from cells with uneven number of chromosomes and/or translocations.
Default: False
- --verbose
Print processing status.
Default: False
- --version
show program’s version number and exit
With HiCExplorer version 3.0 we offer an additional Hi-C interaction matrix correction algorithm: Knight-Ruiz.
$ hicCorrectMatrix correct --matrix matrix.cool --correctionMethod KR --chromosomes chrUextra chr3LHet --outFileName corrected_KR.cool
The iterative correction can be used via:
$ hicCorrectMatrix correct --matrix matrix.cool --correctionMethod ICE --chromosomes chrUextra chr3LHet --iterNum 500 --outFileName corrected_ICE.cool --filterThreshold -1.5 5.0
HiCExplorer version 3.1 changes the way data is transferred from Python to C++ for the KR correction algorithm. With these changes the following runtime and peak memory usage on Rao 2014 GM12878 primary + replicate data is possible:
KR on 25kb: 165 GB, 1:08 h
ICE on 25kb: 224 GB, 3:10 h
KR on 10kb: 228 GB, 1:42 h
ICE on 10kb: 323 GB, 4:51 h
KR on 1kb: 454 GB, 16:50 h
ICE on 1kb: >600 GB, > 2.5 d (we interrupted the computation and strongly recommend to use KR on this resolution)
For HiCExplorer versions <= 3.0 KR performs as follows:
KR on 25kb: 159 GB, 57:11 min
KR on 10kb: >980 GB, – (out of memory on 1TB node, we do not have access to a node with more memory on our cluster)
hicNormalize¶
Normalizes given matrices either to the smallest given read number of all matrices or to 0 - 1 range. However, it does NOT compute the contact probability.
We recommend to compute first the normalization (with hicNormalize) and correct the data (with hicCorrectMatrix) in a second step.
usage: hicNormalize --matrices MATRICES [MATRICES ...] --normalize
{norm_range,smallest,multiplicative} --outFileName
FILENAME [FILENAME ...]
[--multiplicativeValue MULTIPLICATIVEVALUE]
[--setToZeroThreshold SETTOZEROTHRESHOLD] [--help]
[--version]
Required arguments¶
- --matrices, -m
The matrix (or multiple matrices) to get information about. HiCExplorer supports the following file formats: h5 (native HiCExplorer format) and cool.
- --normalize, -n
Possible choices: norm_range, smallest, multiplicative
Normalize to a) 0 to 1 range, b) all matrices to the lowest read count of the given matrices (Default: “smallest”).
Default: “smallest”
- --outFileName, -o
Output file name for the Hi-C matrix.
Optional arguments¶
- --multiplicativeValue, -mv
Value to multiply if –normalize is set to multiplicative. (Default: 1).
Default: 1
- --setToZeroThreshold, -sz
A threshold to set all values after normalization to 0 if smaller this threshold. Default value is 0 i.e. there is no effect.It is recommended to set it for the normalize mode “smallest” to 1.0. This parameter will influence the sparsity of the matrix by removing many values close to 0 in smallest normalization mode.
Default: 0.0
- --version
show program’s version number and exit
Background¶
To be able to compare different Hi-C interaction matrices the matrices need to be normalized to a equal level of read coverage or value ranges. hicNormalize accomplish this by offering two modes: 0-1 range normalization or a read count normalization.
Usage example¶
norm_range mode of hicNormalize normalizes all reads to the 0 to 1 range i.e. the maximum value of the interaction matrix becomes 1 and the minimum value 0.
$ hicNormalize -m matrix.cool --normalize norm_range -o matrix_0_1_range.cool
All matrices are normalized in the way the total read count of each matrix is equal to the read count of the matrix with the smallest read count of all input matrices.
matrix.cool with a read count of 10000
matrix2.cool with a read count of 12010
matrix3.cool with a read count of 11000
In this example each entry in matrix2.cool and matrix3.cool are normalized with a factor of 12010 / 10000 respective with 11000 / 10000.
$ hicNormalize -m matrix.cool matrix2.cool matrix3.cool --normalize smallest
-o matrix_normalized.cool matrix2_normalized.cool matrix3_normalized.cool
Tools for Hi-C QC¶
hicQuickQC¶
Background¶
This tool considers the first 1,000,000 reads (or user defined number) of the mapped bam files to get a quality estimate of the Hi-C data.
Description¶
The tool hicQuickQC considers the first n lines of two bam/sam files to get a first estimate of the quality of the data. It is highly recommended to set the restriction enzyme and dangling end parameter to get a good quality report.
usage: hicQuickQC --samFiles two sam files two sam files --QCfolder FOLDER
--restrictionCutFile BED file [BED file ...]
--restrictionSequence RESTRICTIONSEQUENCE
[RESTRICTIONSEQUENCE ...] --danglingSequence
DANGLINGSEQUENCE [DANGLINGSEQUENCE ...] [--lines LINES]
[--help] [--version]
- --samFiles, -s
The two PE alignment sam files to process.
- --QCfolder
Path of folder to save the quality control data of the matrix. The log files produced this way can be loaded into hicQC in order to compare the quality of multiple Hi-C libraries.
- --restrictionCutFile, -rs
BED file(s) with all restriction cut places (output of “findRestSite” command). Should contain only mappable restriction sites. If given, the bins are set to match the restriction fragments (i.e. the region between one restriction site and the next). Alternatively, a fixed binSize can be defined instead. However, either binSize or restrictionCutFile must be defined. To use more than one restriction enzyme, generate for each one a restrictionCutFile and list them space seperated.
- --restrictionSequence, -seq
Sequence of the restriction site, if multiple are used, please list them space seperated. If a dangling sequence is listed at the same time, please preserve the same order.
- --danglingSequence
Sequence left by the restriction enzyme after cutting, if multiple are used, please list them space seperated and preserve the order. Each restriction enzyme recognizes a different DNA sequence and, after cutting, they leave behind a specific “sticky” end or dangling end sequence. For example, for HindIII the restriction site is AAGCTT and the dangling end is AGCT. For DpnII, the restriction site and dangling end sequence are the same: GATC. This information is easily found on the description of the restriction enzyme. The dangling sequence is used to classify and report reads whose 5’ end starts with such sequence as dangling-end reads. A significant portion of dangling-end reads in a sample are indicative of a problem with the re-ligation step of the protocol.
- --lines
Number of lines to consider for the QC test run (Default: 1000000).
Default: 1000000
- --version
show program’s version number and exit
For more information see encode .
hicQC¶
Background¶
This tool can process quality control log files produced by hicBuildMatrix for multiple samples at once to generate summary tables and plots of quality control (QC) measures for all these samples. Additionally, an HTML output is generated where all summary tables and plots are displayed.
Description¶
Tabulates and plots QC measures from hicBuildMatrix log files within an HTML output
usage: hicQC --logfiles matrix1_QCfolder/QC.log matrix2_QCfolder/QC.log --labels "sample 1" "sample 2" --outputFolder QC_all_samples)
- --logfiles, -l
Path to the log files to be processed
- --labels
Label to assign to each log file. Each label should be separated by a space. Quote labels that contain spaces: E.g. –labels label1 “labels 2”
- --outputFolder, -o
Several files with be saved under this folder: A table containing the results and a html file with several images.
- --dpi
Image resolution. By default high resolution png images with a 200 dpi are created.
Default: 200
- --version
show program’s version number and exit
Usage example¶
hicBuildMatrix generates a QC.log file for each processed Hi-C sample in a folder specified in the --QCfolder
argument. The quality control measures stored for each of these samples can be merged in summary tables and plots using hicQC. An example usage is:
$ hicQC --logfiles ./sample_1/QC.log ./sample_2/QC.log /sample_3/QC.log \
--labels "Sample 1" "Sample 2" "Sample 3" \
-o QC_plots
This command will generate several plots that are described in details below.

On the plot above, we can see how many reads were sequenced per sample (Sequenced reads), how many reads were mappable, unique and of high quality and how many reads passed all quality controls and are thus useful for further analysis (Hi-C contacts). All quality controls used for read filtering are explained below.

The figure above contains the fraction of reads with respect to the total number of reads that did not map, that have a low quality score or that didn’t map uniquely to the genome. In our example we can see that Sample 3 has the highest fraction of Hi-C contacts. We explain the differences between the three samples on the plot below.

This figure contains the fraction of read pairs (with respect to mappable and unique reads) that were discarded when building the Hi-C matrix. You can find the description of each category below:
Dangling ends: reads that start with the restriction site and constitute reads that were digested but not ligated. Sample 1 in our example has a high fraction of dangling ends (and thus a low proportion of Hi-C contacts). Reasons for this can be inefficient ligation or insufficient removal of danging ends during samples preparation.
Duplicated pairs: reads that have the same sequence due to PCR amplification. For example, Sample 2 was amplified too much and thus has a very high fraction of duplicated pairs.
Same fragment: read mates facing inward, separated by up to 800bp that do not have a restriction enzyme site in between. These read pairs are not valid Hi-C pairs and are thus discarded from further analyses.
Self circle: read pairs within 25kb with ‘outward’ read orientation.
Self ligation: read pairs with a restriction site in between that are within 800bp.

The figure above contains the fraction of read pairs (with respect to mappable reads) that compose inter chromosomal, intra short range (< 20kb) or intra long range (>= 20kb) contacts. Inter chromosomal reads of a wild-type sample are expected to be low. Trans-chromosomal contacts can be primarily considered as random ligation events. These would be expected to contribute to technical noise that may obscure some of the finer features in the Hi-C datasets (Nagano et al. 2015, Comparison of Hi-C results using in-solution versus in-nucleus ligation, doi: https://doi.org/10.1186/s13059-015-0753-7). As such, a high fraction of inter chromosomal reads is an indicator of low sample quality, but it can also be associated to cell cycle changes (Nagano et al. 2018, Cell-cycle dynamics of chromosomal organization at single-cell resolution, doi: https://doi.org/10.1038/nature23001).
Short range and long range contacts proportions can be associated to how the fixation is performed during Hi-C sample preparation. These two proportions also directly impact the Hi-C corrected counts versus genomic distance plots generated by hicPlotDistVsCounts.

The last figure shows the fractions of the read pair types: inward, outward, left or right read pairs (with respect to mappable reads). Deviations from an equal distribution indicates problems during sample preparation.
hicCorrelate¶
Background¶
hicCorrelate
is a dedicated Quality Control tool that allows the correlation of multiple Hi-C matrices at once with either a heatmap or scatter plots output.
Description¶
Computes pairwise correlations between Hi-C matrices data. The correlation is computed taking the values from each pair of matrices and discarding values that are zero in both matrices.Parameters that strongly affect correlations are bin size of the Hi-C matrices and the considered range. The smaller the bin size of the matrices, the finer differences you score. The –range parameter should be selected at a meaningful genomic scale according to, for example, the mean size of the TADs in the organism you work with.
usage: hicCorrelate --matrices MATRICES [MATRICES ...] [--zMin ZMIN]
[--zMax ZMAX] [--colorMap] [--plotFileFormat FILETYPE]
[--plotNumbers] [--method {pearson,spearman}] [--log1p]
[--labels sample1 sample2 [sample1 sample2 ...]]
[--range RANGE] --outFileNameHeatmap OUTFILENAMEHEATMAP
--outFileNameScatter OUTFILENAMESCATTER
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]] [--help]
[--version]
- --matrices, -m
Matrices to correlate (usually .h5 but other formats are allowed). hicCorrelate is better used on un-corrected matrices in order to exclude any changes introduced by the correction.
Options for generating the correlation heatmap
- --zMin, -min
Minimum value for the heatmap intensities. If not specified the value is set automatically.
- --zMax, -max
Maximum value for the heatmap intensities.If not specified the value is set automatically.
- --colorMap
Color map to use for the heatmap. Available values can be seen here: http://matplotlib.org/examples/color/colormaps_reference.html (Default: “jet”).
Default: “jet”
- --plotFileFormat
Possible choices: png, pdf, svg, eps, emf
Image format type. If given, this option overrides the image format based on the plotFile ending. The available options are: png, emf, eps, pdf and svg.
- --plotNumbers
If set, then the correlation number is plotted on top of the heatmap.
Default: False
- --method
Possible choices: pearson, spearman
Correlation method to use (Default: “pearson”).
Default: “pearson”
- --log1p
If set, then the log1p of the matrix values is used. This parameter has no effect for Spearman correlations but changes the output of Pearson correlation and, for the scatter plot, if set, the visualization of the values is easier.
Default: False
- --labels, -l
User defined labels instead of default labels from file names. Multiple labels have to be separated by space, e.g. –labels sample1 sample2 sample3
- --range
In bp with the format low_range:high_range, for example 1000000:2000000. If –range is given only counts within this range are considered. The range should be adjusted to the size of interacting domains in the genome you are working with.
- --outFileNameHeatmap, -oh
File name to save the resulting heatmap plot.
- --outFileNameScatter, -os
File name to save the resulting scatter plot.
- --chromosomes
List of chromosomes to be included in the correlation.
- --version
show program’s version number and exit
Usage example¶
Below, you can find a correlation example of uncorrected Hi-C matrices obtained from Drosophila melanogaster embryos, either wild-type or having one gene knocked-down by RNAi.
$ hicCorrelate -m Dmel_wt_1.h5 Dmel_wt_2.h5 Dmel_kd_1.h5 Dmel_kd_2.h5 \
--method=pearson --log1p \
--labels Dmel_wt_1 Dmel_wt_2 Dmel_kd_1 Dmel_kd_2 \
--range 5000:200000 \
--outFileNameHeatmap Dmel_heatmap --outFileNameScatter Dmel_scatterplot \
--plotFileFormat png

This example is showing a heatmap that was calculated using the Pearson correlation of un-corrected Hi-C matrices with a bin size of 6000 bp. The dendrogram indicates which samples are most similar to each other. You can see that the wild-type samples are separated from the knock-down samples. The second option we offer is calculating the Spearman correlation.
Additionally, pairwise scatter plots comparing interactions between each sample can be plotted.

hicPlotDistVsCounts¶
Background¶
This tool allows a quick comparison between multiple Hi-C matrices of the Hi-C counts enrichment at different genomic ranges / distances up to whole chromosomes. Biological replicates should display the exact same distribution while samples coming from different cell-lines, treated versus untreated samples or mutant versus wild-type samples can display a different distribution at long and/or close ranges.
The results of this tool usually reflect the proportion of long-range and short-range contacts calculated in each sample by hicQC. Local TAD or contact enrichments will not impact the results computed by this tool, hicPCA is better suited for that purpose.
Description¶
This program creates distance vs. Hi-C counts plots. It can use several matrix files to compare them at once. If the –perchr option is given, each chromosome is plotted independently. When plotting multiple matrices, denser matrices are scaled down to match the sum of the smallest matrix.
usage: hicPlotDistVsCounts --matrices MATRICES [MATRICES ...] --plotFile file
name [--labels LABELS [LABELS ...]]
[--skipDiagonal] [--maxdepth INT bp] [--perchr]
[--chromosomeExclude CHROMOSOMEEXCLUDE [CHROMOSOMEEXCLUDE ...]]
[--outFileData OUTFILEDATA]
[--plotsize PLOTSIZE PLOTSIZE] [--help] [--version]
- --matrices, -m
Hi-C normalized (corrected) matrices. Each path should be separated by a space.
- --plotFile, -o
File name to save the file. The given file ending will be used to determine the image format. The available options are: .png, .emf, .eps, .pdf and .svg.
- --labels
Label to assign to each matrix file. Each label should be separated by a space. Quote labels that contain spaces: E.g. –labels label1 “labels 2”. If no labels are given then the file name is used.
- --skipDiagonal, -s
If set, diagonal counts are not included.
Default: False
- --maxdepth
Maximum distance from diagonal to use. In other words, distances up to maxDepth are computed. Default is 3 million bp.
Default: 3000000
- --perchr
If given, computes and display distance versus Hi-C counts plots for each chromosome stored in the matrices passed to –matrices.
Default: False
- --chromosomeExclude
Exclude the given list of chromosomes. This is useful for example to exclude the Y chromosome. The names of the chromosomes should be separated by space.
- --outFileData
If given, the data underlying the plots is saved on this file.
- --plotsize
Width and height of the plot (in inches). Default is 6*number of cols, 4 * number of rows. The maximum number of rows is 4. Example: –plotsize 6 5
- --version
show program’s version number and exit
Usage example¶
hicPlotDistVsCounts should be used on corrected matrices with large bins (e.g. at least 50kb bins), otherwise the curves will be very spiky and unstable at longer ranges because of the sparseness of the contacts. The tool hicMergeMatrixBins can be used to merge matrix bins and the tool hicCorrectMatrix can be used for matrix correction before using hicPlotDistVsCounts.
hicPlotDistVsCounts -m \
condition1_sample1_50_bins_merged.h5 \
condition1_sampel2_50_bins_merged.h5 \
condition2_sample1_50_bins_merged.h5 \
condition2_sample2_50_bins_merged.h5 \
-o counts_vs_dist_50_bins_merged.png \
--labels 'Cond 1 Sample 1' 'Cond 1 Sample 2' 'Cond 2 Sample 1' 'Cond 2 Sample 2' \
--maxdepth 20000000 \
--plotsize 5 4.2

Here, we see that the samples of the first condition are not so well correlated, but they follow the same tendencies and are distinct from the two samples of the second condition. The later are well correlated and display enriched long-range contacts compared to the samples of the first condition.
hicInfo¶
Prints information about a matrix or matrices including matrix size, number of elements, sum of elements, etc. An example usage is: $ hicInfo -m matrix1.h5 matrix2.h5 matrix3.h5
usage: hicInfo --matrices MATRICES [MATRICES ...] [--outFileName OUTFILENAME]
[--no_metadata] [--help] [--version]
Required arguments¶
- --matrices, -m
The matrix (or multiple matrices) to get information about. HiCExplorer supports the following file formats: h5 (native HiCExplorer format) and cool.
Optional arguments¶
- --outFileName, -o
File name to save information of the matrix instead of writing it to the bash.
- --no_metadata, -nm
Do not use meta data from cooler file to display information. This method is slower and was the default until version 2.2 of HiCExplorer. H5 files always use this parameter.
Default: True
- --version
show program’s version number and exit
An example output looks like this:
File: hic_matrix.h5
WARNING: bin size is not homogeneous. Median 434
Size: 339,684
Sum: 243,779,407
Bin length: 434
Chromosomes: 2L, 2R, 3L, 3R, 4, X
Non-zero elements: 229,122,848
Minimum: 0
Maximum: 2116
NaN bins: 0
With HiCExplorer version 3 we support metadata in cooler files. The default behavior is to use the metadata:
$ hicInfo -m matrix.cool
To use the old method (and default for h5) please add the parameter:
$ hicInfo -m matrix.cool --no_metadata
Tools for Hi-C data analysis¶
hicCompareMatrices¶
Background¶
This tool is useful to compare two matrices in .h5 format by applying operations like difference, ratio or log2ratio after normalization. This can be used to determine the effect of a mutation compared to wild-type samples on contact enrichment, or to see TAD structure modifications near differentially expressed genes between two conditions when followed by hicPlotMatrix. It can also be used to compare two biological replicates.
Description¶
Takes two matrices as input, normalizes them and applies the given operation. To normalize the matrices each element is divided by the sum of the matrix.
usage: hicCompareMatrices --matrices matrix.h5 matrix.h5 --outFileName
OUTFILENAME [--operation {diff,ratio,log2ratio}]
[--help] [--version]
- --matrices, -m
Name of the matrices in .h5 format to use, separated by a space.
- --outFileName, -o
File name to save the resulting matrix. The output is also a .h5 file.
- --operation
Possible choices: diff, ratio, log2ratio
Operation to apply to the matrices (Default: “log2ratio”).
Default: “log2ratio”
- --version
show program’s version number and exit
Usage example¶
hicCompareMatrices
is usually perfomed on corrected matrices (hicCorrectMatrix) with bins merged (hicMergeMatrixBins) depending on the downstream analyses to perform. Here is an example of a log2ratio comparison between M1BP Knockdown and GST cells in Drosophila melanogaster on corrected matrices with 50 bins merged (about 30kb bins).
hicCompareMatrices -m \
M1BP_KD_merge_m50_corrected.h5 \
GST_merge_rf_m50_corrected.h5 \
--operation log2ratio -o m1bp_over_gst_log2_m50.h5
This code outputs a matrix containing the normalized log2ratio values of M1BP_KD_merge_m50_corrected.h5 over GST_merge_rf_m50_corrected.h5. We can then display this matrix using hicPlotMatrix.
hicPlotMatrix -m \
m1bp_over_gst_log2_m50.h5 \
--clearMaskedBins \
--region chr2L:12,000,000-19,000,000 \
--vMin -4 --vMax 4 \
-o m1bp_over_gst_log2_m50_matrix_plot.png

In this plot we see that the cells with a M1BP Knockdown display a negative log2ratio compared to the wild-type. Depletion of M1BP thus show a dramatic effect on the distribution of Hi-C contacts in which short range contacts decrease (Ramirez et al. 2017, High-resolution TADs reveal DNA sequences underlying genome organization in flies, https://doi.org/10.1038/s41467-017-02525-w).
Below you can find an example of a log2ratio plot between Hi-C matrices of two biological replicates, no differences are observable which means that the replicates are well correlated.

hicPCA¶
Computes PCA eigenvectors for a Hi-C matrix.
$ hicPCA –matrix hic_matrix.h5 -o pca1.bedgraph pca2.bedgraph
usage: hicPCA --matrix MATRIX --outputFileName OUTPUTFILENAME
[OUTPUTFILENAME ...]
[--numberOfEigenvectors NUMBEROFEIGENVECTORS]
[--format {bedgraph,bigwig}]
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
[--method {dist_norm,lieberman}] [--ligation_factor]
[--extraTrack EXTRATRACK] [--histonMarkType HISTONMARKTYPE]
[--pearsonMatrix PEARSONMATRIX] [--obsexpMatrix OBSEXPMATRIX]
[--ignoreMaskedBins] [--help] [--version]
Required arguments¶
- --matrix, -m
HiCExplorer matrix in h5 format.
- --outputFileName, -o
File names for the result of the pca.Number of output files must match the number of computed eigenvectors.
Optional arguments¶
- --numberOfEigenvectors, -noe
The number of eigenvectors that the PCA should compute (Default: 2).
Default: 2
- --format, -f
Possible choices: bedgraph, bigwig
Output format. Either bedgraph or bigwig (Default: “bigwig”).
Default: “bigwig”
- --chromosomes
List of chromosomes to be included in the correlation.
- --method
Possible choices: dist_norm, lieberman
possible methods which can be used to build the obs-exp matrix are dist_norm and lieberman (Default: “dist_norm”).
Default: “dist_norm”
- --ligation_factor
Setting this flag multiplies a scaling factor to each entry of the expected matrix to take care of the proximity ligation as has been explained in Homer software. This flag is only affective with dist_norm method and will be ignored if lieberman method is chosen.
Default: False
- --extraTrack
Either a gene track or a histon mark coverage file (preferably a broad mark) is needed to decide if the values of the eigenvector need a sign flip or not.
- --histonMarkType
Set it to active or inactive. This is only necessary if a histon mark coverage file is given as an extraTrack (Default: “active”).
Default: “active”
- --pearsonMatrix, -pm
Internally the input matrix is converted per chromosome to obs_exp matrix and consecutively to a Pearson matrix. Set this parameter to write the pearson matrix to a file.
- --obsexpMatrix, -oem
Internally the input matrix is converted per chromosome to obs_exp matrix and consecutively to a Pearson matrix. Set this parameter to write the observe/expected matrix to a file.
- --ignoreMaskedBins
Mask bins are usually set to 0. This option removes the masked bins before the PCA is computed. Attention: this will lead to empty PCA regions.
Default: False
- --version
show program’s version number and exit
hicTransform¶
Converts the (interaction) matrix to different types of obs/exp, pearson or covariance matrix.
usage: hicTransform --matrix MATRIX --outFileName OUTFILENAME
[--method {obs_exp,obs_exp_lieberman,obs_exp_non_zero,pearson,covariance}]
[--ligation_factor]
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
[--perChromosome] [--help] [--version]
Required arguments¶
- --matrix, -m
input file. The computation is done per chromosome.
- --outFileName, -o
File name to save the exported matrix.
Optional arguments¶
- --method, -me
Possible choices: obs_exp, obs_exp_lieberman, obs_exp_non_zero, pearson, covariance
Transformation methods to use for input matrix. Transformation is computed per chromosome.obs_exp computes the expected matrix as the sum per genomic distance j divided by maximal possible contacts: sum(diagonal(j) / number of elements in diagonal(j) obs_exp_lieberman computes the expected matrix as the sum per genomic distance j divided by the : sum(diagonal(j) / (length of chromosome - j))obs_exp_non_zero computes the expected matrix as the sum per genomic distance j divided by sum of non-zero contacts: sum(diagonal(j) / number of non-zero elements in diagonal(j)Optionally, ``–ligation_factor` can be used for this method as has been used by HOMER software. If –ligation_factor, then exp_i,j = exp_i,j * sum(row(i)) * sum(row(j)) / sum(matrix)pearson computes the Pearson correlation of the input matrix: Pearson_i,j = C_i,j / sqrt(C_i,i * C_j,j) and C is the covariance matrixcovariance computes the Covariance of the input matrix: Cov_i,j = E[M_i, M_j] - my_i * my_j where M is the input matrix and my the mean (Default: “obs_exp”).
Default: “obs_exp”
- --ligation_factor
Setting this flag, multiplies a scaling factor to each entry of the expected matrix to take care of the proximity ligation as has been explained in Homer software. This flag is only affective with obs_exp_non_zero method and will be ignored if any other obs/exp method is chosen.
Default: False
- --chromosomes
List of chromosomes to be included in the computation.
- --perChromosome, -pc
Each chromosome is processed individually, inter-chromosomal interactions are ignored. Option not valid for obs_exp_lieberman.
Default: False
- --version
show program’s version number and exit
Background¶
hicTransform transforms a given input matrix into a new matrix using one of the following methods:
obs_exp
obs_exp_lieberman
obs_exp_non_zero
pearson
covariance
All expected values are computed per genomic distances.
Usage¶
$ hicTransform -m matrix.cool --method obs_exp -o obs_exp.cool
For all images data from Rao 2014 was used.
Observed / Expected¶
All values, including non-zero values, are used to compute the expected values per genomic distance.

Observed / Expected lieberman¶
The expected matrix is computed in the way as Lieberman-Aiden used it in the 2009 publication. It is quite similar to the obs/exp matrix computation.

Observed / Expected non zero¶
Only non-zero values are used to compute the expected values per genomic distance, i.e. only non-zero values are taken into account for the denominator.

By adding the –ligation_factor flag, the expected matrix can be re-scaled in the same way as has been done by Homer software when computing observed/expected matrices with the option ‘-norm’.

Pearson correlation matrix¶
C is the covariance matrix


The first image shows the Pearson correlation on the original interaction matrix, the second one shows the Person correlation matrix on an observed/expected matrix. A consecutive computation like this is used in the A/B compartment computation.
Covariance matrix¶
where M is the input matrix and \(\mu\) the mean.
hicAverageRegions¶
Sums Hi-C contacts around given reference points and computes their average. This tool is useful to detect differences at certain reference points as for example TAD boundaries between samples.
WARNING: This tool can only be used with fixed bin size Hi-C matrices. No guarantees how and if it works on restriction site interaction matrices.
usage: hicAverageRegions --matrix MATRIX --regions REGIONS
(--range RANGE RANGE | --rangeInBins RANGEINBINS RANGEINBINS)
--outFileName OUTFILENAME [--help]
[--coordinatesToBinMapping {start,center,end}]
[--version]
Named Arguments¶
- --range, -ra
Range of region up- and downstream of each region to include in genomic units.
- --rangeInBins, -rib
Range of region up- and downstream of each region to include in bin units.
Required arguments¶
- --matrix, -m
The matrix to use for the average of TAD regions.
- --regions, -r
BED file which stores a list of regions that are summed and averaged
- --outFileName, -o
File name to save the average regions TADs matrix.
Optional arguments¶
- --coordinatesToBinMapping, -cb
Possible choices: start, center, end
If the region contains start and end coordinates, define if the start, center (start + (end-start) / 2) or end bin should be used as start for range.This parameter is only important to set if the given start and end coordinates are not in the same bin (Default: “start”).
Default: “start”
- --version
show program’s version number and exit
hicAverageRegions takes as input a BED files with genomic positions, these are the reference points to sum up and average the regions up- and downstream of all these positions, good reference points are e.g. the borders of TADs. This can help to determine changes in the chromatin organization and TAD structure changes.
In the following example the 10kb resolution interaction matrix of Rao 2014 is used.
The first step computes the TADs for chromosome 1.
$ hicFindTADs -m GSE63525_GM12878_insitu_primary_10kb_KR.cool --outPrefix TADs
--correctForMultipleTesting fdr --minDepth 30000 --maxDepth 100000
--step 10000 -p 20 --chromosomes 1
Next, we use the domains.bed file of hicFindTADs to use the borders of TADs as reference points. As a range up- and downstream of each reference point 100kb are chosen.
$ hicAverageRegions -m GSE63525_GM12878_insitu_primary_10kb_KR.cool
-r TADs_domains.bed --range 100000 100000 --outFileName primary_chr1
In a last step, the computed average region is plotted.
$ hicPlotAverageRegions -m primary_chr1.npz -o primary_chr1.png

hicDetectLoops¶
hicDetectLoops can detect enriched interaction regions (peaks / loops) based on a strict candidate selection, negative binomial distributions and Wilcoxon rank-sum tests.
The algorithm was mainly develop on GM12878 cells from Rao 2014 on 10kb and 5kb fixed bin size resolution.
Example usage¶
$ hicDetectLoops -m matrix.cool -o loops.bedgraph --maxLoopDistance 2000000 --windowSize 10 --peakWidth 6 --pValuePreselection 0.05 --pValue 0.05
The candidate selection is based on the restriction of the maximum genomic distance, here 2MB. This distance is given by Rao 2014. For each genomic distance
a continuous negative binomial distribution is computed and only interaction pairs with a threshold less than --pValuePreselection
are accepted.
In a second step, each candidate is considered compared to its neighborhood. This neighborhood is defined by the --windowSize
parameter in the x and y dimension.
Per neighborhood only one candidate is considered, therefore only the candidate with the highest peak values is accepted. As a last step,
the neighborhood is split into a peak and background region (parameter --peakWidth
). The peakWidth can never be larger than the windowSize. However, we recommend
for 10kb matrices a windowSize of 10 and a peakWidth of 6.
With version 3.5 a major revision of this tool was published. The biggest changes are:
The introduction of an observed/expected matrix as one of the first steps. Based on it, the other calculations compute without a distance dependent factor and the results are more accurate.
The testing of peak vs background region with the donut layout as proposed by HiCCUPS with Wilcoxon rank-sum test. Anderson-Darling test is removed.
Improving the handling of the parallelization and rewrote of the merging of candidates in one neighborhood. Results in faster execution time and less memory demand.
Loading only the interactions within the range of maxLoopDistance. This is possible with HiCMatrix version 13 and results in faster load time and a reduced memory peak. This improvement is only for cool matrices, h5 matrices do not profit from these changes.
hicDetectLoops has many parameters and it can be quite difficult to search for the best parameter setting for your data. With version 3.5 we introduce therefore two new tools hicHyperoptDetectLoops and hicHyperoptDetectLoopsHiCCUPS. The first one searches for the optimal parameter setting for your data based on HiCExplorer’s hicDetectLoops. However, if you want to compare the results to Juicer HiCCUPS, the second tool provides a parameter search for it. Please note that HiCCUPS and any dependency of it are not provided by HiCExplorer and must be installed by the user on its own.
The output file (´´-o loops.bedgraph``) contains the x and y position of each loop and its corresponding p-value of the statistical test.
1 120000000 122500000 1 145000000 147500000 0.001
The results can visualized using hicPlotMatrix:
$ hicPlotMatrix -m matrix.cool -o plot.png --log1p --region 1:18000000-22000000 --loops loops.bedgraph

Computes enriched regions (peaks) or long range contacts on the given contact matrix.
usage: hicDetectLoops --matrix MATRIX --outFileName OUTFILENAME
[--peakWidth PEAKWIDTH] [--windowSize WINDOWSIZE]
[--pValuePreselection PVALUEPRESELECTION]
[--peakInteractionsThreshold PEAKINTERACTIONSTHRESHOLD]
[--obsExpThreshold OBSEXPTHRESHOLD] [--pValue PVALUE]
[--maxLoopDistance MAXLOOPDISTANCE]
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
[--threads THREADS]
[--threadsPerChromosome THREADSPERCHROMOSOME]
[--expected {mean,mean_nonzero,mean_nonzero_ligation}]
[--help] [--version]
- --matrix, -m
The matrix to compute the loop detection on.
- --outFileName, -o
Outfile name to store the detected loops. The file will in bedgraph format.
- --peakWidth, -pw
The width of the peak region in bins. The square around the peak will include (2 * peakWidth)^2 bins (Default: 2).
Default: 2
- --windowSize, -w
The window size for the neighborhood region the peak is located in. All values from this region (exclude the values from the peak region) are tested against the peak region for significant difference. The square will have the size of (2 * windowSize)^2 bins (Default: 5).
Default: 5
- --pValuePreselection, -pp
Only candidates with p-values less the given threshold will be considered as candidates. For each genomic distance a negative binomial distribution is fitted and for each pixel a p-value given by the cumulative density function is given. This does NOT influence the p-value for the neighborhood testing. Can a single value or a threshold file created by hicCreateThresholdFile (Default: 0.1).
Default: 0.1
- --peakInteractionsThreshold, -pit
The minimum number of interactions a detected peaks needs to have to be considered (Default: 10).
Default: 10
- --obsExpThreshold, -oet
The minimum number of obs/exp interactions a detected peaks needs to have to be considered (Default: 1.5).
Default: 1.5
- --pValue, -p
Rejection level for Anderson-Darling or Wilcoxon-rank sum test for H0. H0 is peak region and background have the same distribution (Default: 0.025).
Default: 0.025
- --maxLoopDistance
Maximum genomic distance of a loop, usually loops are within a distance of ~2MB (Default: 2000000).
Default: 2000000
- --chromosomes
Chromosomes to include in the analysis. If not set, all chromosomes are included.
- --threads, -t
Number of threads to use, the parallelization is implemented per chromosome (Default: 4).
Default: 4
- --threadsPerChromosome, -tpc
Number of threads to use per parallel thread processing a chromosome. E.g. –threads = 4 and –threadsPerChromosome = 4 makes 4 * 4 = 16 threads in total (Default: 4).
Default: 4
- --expected, -exp
Possible choices: mean, mean_nonzero, mean_nonzero_ligation
Method to compute the expected value per distance: Either the mean (mean), the mean of non-zero values (mean_nonzero) or the mean of non-zero values with ligation factor correction (mean_nonzero_ligation) (Default: “mean”).
Default: “mean”
- --version
show program’s version number and exit
hicValidateLocations¶
hicValidateLoops is a tool to compare the detect loops from hicDetectLoops (or from any other software as long as the data format is followed, see below) with known peak protein locations to validate if the computed loops do have the expected anchor points. For example, loops in mammals are usually bound by CTCF or Cohesin, therefore it is important to know if the detect loops have protein peaks at their X and Y position.

Loops in Hi-C, graphic from Bonev & Cavalli, Nature Reviews Genetics 2016¶
Data format¶
The data format of hicDetectLoops output is:
chr_x start_x end_x chr_y start_y end_y p-value
As --protein
the input of narrowPeak or broadPeak files are accepted. However, as long as the --protein
input file contains chromosome, start and end in the first three columns, it should work.
This script overlaps the loop locations with protein locations to determine the accuracy of the loop detection. Loops need to have format as follows:
chr start end chr start end
The protein peaks need to be in narrowPeaks or broadPeak format.
A protein match is successfull if at the bin of the x and y location a protein peak is overlapped. A bin is assumed to have a protein if one or more protein peaks falling within the bin region. The value of the protein is not considered, only match or non-match.
usage: hicValidateLocations --data DATA --protein PROTEIN [--method {loops}]
--resolution RESOLUTION
[--outFileName OUTFILENAME] [--addChrPrefixLoops]
[--addChrPrefixProtein] [--help] [--version]
- --data, -d
The loop file from hicDetectLoops. To use files from other sources, please follow ‘chr start end chr start end’ format.
- --protein, -p
The protein peak file. Can be narrowPeak or broadPeak
- --method, -m
Possible choices: loops
The method used (for the moment only loop is possible) (Default: “loops”).
Default: “loops”
- --resolution, -r
The used resolution of the Hi-C interaction matrix.
- --outFileName, -o
The prefix name of the output files. Two file are written: output_matched_locations and output_statistics.First file contains all loop locations with protein location matches, second file contains statistics about this matching.
- --addChrPrefixLoops, -cl
Adding a ‘chr’-prefix to chromosome name of the loops.
Default: False
- --addChrPrefixProtein, -cp
Adding a ‘chr’-prefix to chromosome name of the protein.
Default: False
- --version
show program’s version number and exit
hicMergeLoops¶
This script merges the locations of loops detected at several resolutions.
Loops in the inputFiles need to have the following format:
chr start end chr start end
Loops are merged if the x and y position of a loop overlap with the x and y position of another loop; all loops are considered as an overlap within +/- the bin size of the lowest resolution. I.e. for a loop with coordinates x and y, the overlap with all other loops is checked for (x - lowest resolution) and (y + lowest resolution). If two or more locations are to be merged, the loop at the lowest resolution is taken as the merged loop.
Example usage:
$ hicMergeLoops -i gm12878_10kb.bedgraph gm12878_5kb.bedgraph gm12878_25kb.bedgraph -o merged_result.bedgraph -r 25000
Please recall: We work with binned data i.e. the lowest resolution is therefore the one where we merge the most bases into one bin. In the above example the lowest resultion is 25kb, the highest resolution is 5kb.
usage: hicMergeLoops --inputFiles INPUTFILES [INPUTFILES ...] --outFileName
OUTFILENAME --lowestResolution LOWESTRESOLUTION [--help]
[--version]
Required arguments¶
- --inputFiles, -i
The loop files from hicDetectLoops. To use files from other sources, please follow ‘chr start end chr start end’ format and remove any header.
- --outFileName, -o
The name of the merged loop file.
- --lowestResolution, -r
The lowest resolution of all loop files, i.e. 5kb, 10kb and 25kb, please use 25000.
Optional arguments¶
- --version
show program’s version number and exit
hicHyperoptDetectLoops¶
usage: hicHyperoptDetectLoops --matrix MATRIX --proteinFile PROTEINFILE
--maximumNumberOfLoops MAXIMUMNUMBEROFLOOPS
[--outputFileName OUTPUTFILENAME]
[--resolution RESOLUTION] [--threads THREADS]
[--runs RUNS] [--help] [--version]
Required arguments¶
- --matrix, -m
The matrix to compute the loops on.
- --proteinFile, -p
The protein file to validate the detected loops
- --maximumNumberOfLoops, -ml
The maximum number of loops that should be used for optimization computation.
- --outputFileName, -o
File names for the result of the optimization (Default: “hyperopt_result.txt”).
Default: “hyperopt_result.txt”
Optional arguments¶
- --resolution, -re
Resolution of matrix (Default: 10000).
Default: 10000
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --runs, -r
Number of runs of hyperopt (Default: 100).
Default: 100
- --version
show program’s version number and exit
hicHyperoptDetectLoopsHiCCUPS¶
usage: hicHyperoptDetectLoopsHiCCUPS --matrix MATRIX --proteinFile PROTEINFILE
--maximumNumberOfLoops
MAXIMUMNUMBEROFLOOPS --juicerPath
JUICERPATH
[--outputFileName OUTPUTFILENAME]
[--resolution RESOLUTION] [--runs RUNS]
[--threads THREADS] --normalization
NORMALIZATION [--cpu] [--restricted]
[--help] [--version]
Required arguments¶
- --matrix, -m
The matrix to compute the loops on.
- --proteinFile, -p
The protein file to validate the detected loops
- --maximumNumberOfLoops, -ml
The maximum number of loops that should be used for optimization computation.
- --juicerPath, -j
path to juicer.jar
- --outputFileName, -o
File names for the result of the optimization (Default: “hyperoptHiCCUPS_result.txt”).
Default: “hyperoptHiCCUPS_result.txt”
Optional arguments¶
- --resolution, -r
Resolution of matrix (Default: 10000).
Default: 10000
- --runs
Number of runs of hyperopt.
Default: 100
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --normalization, -k
Normalization table name
- --cpu
use the CPU version
Default: False
- --restricted
If the GPU version is used, search only within 8 MB.
Default: False
- --version
show program’s version number and exit
hicCompartmentalization¶
Rearrange the average interaction frequencies using the first PC values to represent the global compartmentalization signal. To our knowledge this has been first introduced and implemented by Wibke Schwarzer et al. 2017 (Nature. 2017 Nov 2; 551(7678): 51–56)
$ hicCompartmentalization –obsexp_matrices obsExpMatrix.h5 –pca pc1.bedgraph -o global_signal.png
usage: hicCompartmentalization --obsexp_matrices OBSEXP_MATRICES
[OBSEXP_MATRICES ...] --pca PCA
--outputFileName OUTPUTFILENAME
[--quantile QUANTILE] [--outliers OUTLIERS]
[--outputMatrix OUTPUTMATRIX]
[--offset OFFSET [OFFSET ...]] [-h] [--version]
Required arguments¶
- --obsexp_matrices, -m
HiCExplorer matrices in h5/cool format.
- --pca
a PCA vector as a bedgraph file with no header. In case of several matrices with different conditions, ie. controltreatment, the PCA of control can be used. Note that only one PCA can be provided.
- --outputFileName, -o
Plot to represent the polarization of A/B compartments.
Optional arguments¶
- --quantile, -q
number of quantiles. (Default: 30).
Default: 30
- --outliers
precentage of outlier to remove. (Default: 0).
Default: 0
- --outputMatrix
output .npz file includes all the generated matrices
- --offset
set nan for the distances mentioned as offset from main diagonal, only positive values are accepted! Example: if –offset 0, then values of main diagonal will set to nan and if –offset 0 1 then on top of the main diagonal, +1 and -1 diagonal values are also set to nan.
- --version
show program’s version number and exit
PCA to compute the global compartmentalization signal¶
To our knowledge this method has been first introduced by Wibke Schwarzer et al. 2017 (Nature. 2017 Nov 2; 551(7678): 51–56). In this method, a global (genome-wide) strength for compartmentalization is computed as (AA + BB) / (AB + BA) after rearranging the bins of obs/exp based on their corresponding pc1 values. For this purpose, first pc1 values are reordered incrementally, then the same order of bins is used to rearrange the bins in obs/exp matrix.
$ _hicCompartmentalization --obsexp_matrices obsExpMatrix.h5 --pca pc1.bedgraph
-o global_signal.png
hicPlotSVL¶
Description¶
hicPlotSVL computes the ratio between short range and long range contacts per chromosome independently. Per sample one box plot is created and, if more than one sample is given, the computed ratios are assumed to be one distribution and a Wilcoxon rank-sum test under H0 ‘distributions are equal’ is computed. All used data is written to a third raw data file.
The distance to distinct short and long range contacts can be set via the –distance parameter; for short range the sum for all contacts smaller or equal this distance are computed, for long range contacts all contacts greater this distance.
Usage example¶
$ hicPlotSVL -m hmec_10kb.cool nhek_10kb.cool hmec_10kb.cool --distance 2000000 --threads 4 --plotFileName plot.png --outFileName pvalues.txt --outFileNameData rawData.txt
This results in three files:
The raw data containing the sums for short range, long range and ratio per sample and chromosome.
# Created with HiCExplorer's hicPlotSVL 3.3
# Short range vs long range contacts per chromosome: raw data
# Short range contacts: <= 2000000
# hmec_10kb.cool nhek_10kb.cool hmec_10kb.cool
# Chromosome Ratio Sum <= 2000000 Sum > 2000000 Ratio Sum <= 2000000 Sum > 2000000 Ratio Sum <= 2000000 Sum > 2000000
1 3.0399769346724543 33476834 11012200 2.79740105237572 44262902 15822866 3.0399769346724543 33476834 11012200
2 2.7532203542810625 31723954 11522490 2.5007877714355438 47468438 18981394 2.7532203542810625 31723954 11522490
3 2.922650759458664 26251027 8981924 2.6235211241878442 39640848 15109788 2.922650759458664 26251027 8981924
4 2.7235598858451637 22474680 8251950 2.5572455199457864 37486882 14659086 2.7235598858451637 22474680 8251950
5 2.9585962905193712 22716268 7678056 2.752922527526723 35445722 12875670 2.9585962905193712 22716268 7678056
6 3.168274165465025 22872690 7219290 2.8602111006131703 33990211 11883812 3.168274165465025 22872690 7219290
7 3.1093346580597516 19603416 6304698 2.8021236966788887 29712823 10603680 3.1093346580597516 19603416 6304698
8 3.135391026076832 18355087 5854162 2.7964394470859024 28660624 10248970 3.135391026076832 18355087 5854162
9 4.1147978383348125 15395763 3741560 3.819940066283481 21994046 5757694 4.1147978383348125 15395763 3741560
10 3.448063050802953 17964043 5209894 3.1116673856502253 26270171 8442474 3.448063050802953 17964043 5209894
11 3.5924666993070407 18651850 5191934 3.1364875011923035 26240350 8366158 3.5924666993070407 18651850 5191934
12 3.6817551043464416 18640866 5063038 3.306662109403207 26101554 7893626 3.6817551043464416 18640866 5063038
13 3.476204237522881 11018462 3169682 3.0976674036654805 18922281 6108558 3.476204237522881 11018462 3169682
14 3.70550850832778 11164875 3013048 3.6226817463785164 17245704 4760480 3.70550850832778 11164875 3013048
15 4.607631079612186 11165313 2423222 4.567998349104569 15273742 3343640 4.607631079612186 11165313 2423222
16 4.397874357146307 10745775 2443402 3.890983210350018 14666462 3769346 4.397874357146307 10745775 2443402
17 5.809374740402161 12168235 2094586 5.3360710927739285 14154110 2652534 5.809374740402161 12168235 2094586
18 3.7647349280938895 9339833 2480874 3.485487446356812 15019063 4309028 3.7647349280938895 9339833 2480874
19 6.492239632778196 8466283 1304062 5.774337450385819 9368978 1622520 6.492239632778196 8466283 1304062
20 5.542933774973686 8962935 1617002 4.977679877778358 12009479 2412666 5.542933774973686 8962935 1617002
21 6.665622315255486 3910374 586648 6.1843701763589225 6554715 1059884 6.665622315255486 3910374 586648
22 8.063663557923096 4992327 619114 7.433759425439728 5932928 798106 8.063663557923096 4992327 619114
X 2.208752982178897 14424173 6530460 2.3130534357407995 27628734 11944702 2.208752982178897 14424173 6530460
Y 4.165021803993573 36294 8714 3.8063291139240505 45105 11850 4.165021803993573 36294 8714
MT
The p-values between the samples:
# Created with HiCExplorer's hicPlotSVL 3.3
# Short range vs long range contacts per chromosome, p-values of each distribution against each other distribution with Wilcoxon rank-sum
# Short range contacts: <= 2000000
hmec_10kb.cool nhek_10kb.cool 0.28362036331636575
hmec_10kb.cool hmec_10kb.cool 1.0
nhek_10kb.cool hmec_10kb.cool 0.28362036331636575
The box plot:

Tools for TADs processing¶
hicFindTADs¶
Description¶
Uses a measure called TAD-separation score to identify the degree of separation between the left and right regions at each Hi-C matrix bin. This is done for a running window of different sizes. Then, TADs are called as those positions having a local TAD-separation score minimum. The TAD-separation score is measured using the z-score of the Hi-C matrix and is defined as the mean zscore of all the matrix contacts between the left and right regions (diamond).
To find the TADs, the program needs to compute first the TAD scores at different window sizes. Then, the results of that computation are used to call the TADs. This is convenient to test different filtering criteria quickly as the demanding step is the computation of TAD-separation scores.
A simple example usage is:
$ hicFindTads -m hic_matrix.h5 –outPrefix TADs –correctForMultipleTesting fdr
The bedgraph file produced by this tool can be used to plot the so-called insulation score along the genome or at specific regions. This score is much more reliable across samples than the number of TADs or the TADs width that can vary depending on the sequencing depth because of the lack of information at certain bins, and depending on the parameters used with this tool.
usage: hicFindTADs --matrix MATRIX --outPrefix OUTPREFIX
--correctForMultipleTesting {fdr,bonferroni,None}
[--minDepth INT bp] [--maxDepth INT bp] [--step INT bp]
[--TAD_sep_score_prefix TAD_SEP_SCORE_PREFIX]
[--thresholdComparisons THRESHOLDCOMPARISONS]
[--delta DELTA] [--minBoundaryDistance MINBOUNDARYDISTANCE]
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
[--numberOfProcessors NUMBEROFPROCESSORS] [--help]
[--version]
- --matrix, -m
Corrected Hi-C matrix to use for the computations.
- --outPrefix
File prefix to save the resulting files: 1. <prefix>_tad_separation.bm The format of the output file is chrom start end TAD-sep1 TAD-sep2 TAD-sep3 .. etc. We call this format a bedgraph matrix and can be plotted using hicPlotTADs. Each of the TAD-separation scores in the file corresponds to a different window length starting from –minDepth to –maxDepth. 2. <prefix>_zscore_matrix.h5, the z-score matrix used for the computation of the TAD-separation score. 3. < prefix > _boundaries.bed, which contains the positions of boundaries. The genomic coordinates in this file correspond to the resolution used. Thus, for Hi-C bins of 10.000bp the boundary position is 10.000bp long. For restriction fragment matrices the boundary position varies depending on the fragment length at the boundary. 4. <prefix>_domains.bed contains the TADs positions. This is a non-overlapping set of genomic positions. 5. <prefix>_boundaries.gff Similar to the boundaries bed file but with extra information (p-value, delta). 6. <prefix>_score.bedgraph file contains the TAD-separation score measured at each Hi-C bin coordinate. Is useful to visualize in a genome browser. The delta and p-value settings are saved as part of the name.
- --correctForMultipleTesting
Possible choices: fdr, bonferroni, None
Select the bonferroni or false discovery rate for a multiple comparison. Bonferroni controls the family-wise error rate (FWER) and needs a p-value. The false discovery rate (FDR) controls the likelyhood of type I errors and needs a q-value. As a third option it is possible to not use a multiple comparison method at all (Default: “fdr”).
Default: “fdr”
- --minDepth
Minimum window length (in bp) to be considered to the left and to the right of each Hi-C bin. This number should be at least 3 times as large as the bin size of the Hi-C matrix.
- --maxDepth
Maximum window length to be considered to the left and to the right of the cut point in bp. This number should around 6-10 times as large as the bin size of the Hi-C matrix.
- --step
Step size when moving from –minDepth to –maxDepth. Note, the step size grows exponentially as minDeph + (step * int(x)**1.5) for x in [0, 1, …] until it reaches maxDepth. For example, selecting step=10,000, minDepth=20,000 and maxDepth=150,000 will compute TAD-scores for window sizes: 20,000, 30,000, 40,000, 70,000 and 100,000
- --TAD_sep_score_prefix
Sometimes it is useful to change some of the parameters without recomputing the z-score matrix and the TAD-separation score. For this case, the prefix containing the TAD separation score and the z-score matrix can be given. If this option is given, new boundaries will be computed but the values of –minDepth, –maxDepth and –step will not be used.
- --thresholdComparisons
P-value threshold for the Bonferroni correction / q-value for FDR. The probability of a local minima to be a boundary is estimated by comparing the distribution (Wilcoxon ranksum) of the z-scores between the left and right regions (diamond) at the local minimum with the matrix z-scores for a diamond at –minDepth to the left and a diamond –minDepth to the right. If –correctForMultipleTesting is ‘None’ the threshold is applied on the raw p-values without any multiple testing correction. Set it to ‘1’ if no threshold should be used (Default: 0.01).
Default: 0.01
- --delta
Minimum threshold of the difference between the TAD-separation score of a putative boundary and the mean of the TAD-sep. score of surrounding bins. The delta value reduces spurious boundaries that are shallow, which usually occur at the center of large TADs when the TAD-sep. score is flat. Higher delta threshold values produce more conservative boundary estimations (Default: 0.01).
Default: 0.01
- --minBoundaryDistance
Minimum distance between boundaries (in bp). This parameter can be used to reduce spurious boundaries caused by noise.
- --chromosomes
Chromosomes and order in which the chromosomes should be plotted. This option overrides –region and –region2.
- --numberOfProcessors, -p
Number of processors to use (Default: 1).
Default: 1
- --version
show program’s version number and exit
Usage example¶
It is recommended to test multiple parameters of TAD calling with hicFindTADs before making conclusions about the number of TADs in a given sample or before comparing TAD calling between multiple conditions. In order to compare several TAD calling parameters at once, it is recommended to use hicPlotTADs.
Below you can find a typical command-line to use hicFindTADs:
$ hicFindTADs -m myHiCmatrix.h5 \
--outPrefix myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr \
--minDepth 3000 \
--maxDepth 31500 \
--step 1500 \
--thresholdComparisons 0.05 \
--delta 0.01 \
--correctForMultipleTesting fdr \
-p 64
This command will output the following files:
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_boundaries.bed
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_boundaries.gff
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_domains.bed
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_score.bedgraph
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_score.npz
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_tad_score.bm
myHiCmatrix_min3000_max31500_step1500_thres0.05_delta0.01_fdr_zscore_matrix.h5
TAD boundaries locations are stored in the boundaries
files, domains.bed
file contains the TAD locations, score
files contain TAD separation score, or the so-called TAD insulation score, in various formats. As a side note, the tad_score.bm
file is a bedgraph matrix that can be used to display TAD separation score curves in hicPlotTADs for example.
The zscore_matrix.h5
file contains a z-score matrix that is useful to quickly test the –thresholdComparisons, –delta and –correctForMultipleTesting parameters by using the –TAD_sep_score_prefix option pointing to this zscore_matrix.h5
file. For example to quickly test a –thresholdComparisons of 0.01 instead of 0.05 we can run the following command:
$ hicFindTADs -m myHiCmatrix.h5 \
--outPrefix myHiCmatrix_min10000_max40000_step1500_thres0.01_delta0.01_fdr \
--TAD_sep_score_prefix myHiCmatrix_min10000_max40000_step1500_thres0.05_delta0.01_fdr
--thresholdComparisons 0.01 \
--delta 0.01 \
--correctForMultipleTesting fdr \
-p 64
As you can see above, –minDepth, –maxDepth and –step are ignored because these parameters are used to calculate the z-score matrix which is here provided to –TAD_sep_score_prefix. Since z-score matrix computation is the most demanding step of hicFindTADs in terms of memory and computation, the above command will thus run significantly faster than the previous one.
Multiple combinations of parameters can be tested that way with only one z-score matrix computation. To compare several TAD calling outputs, we use hicPlotTADs with the following command using, for example, the following tracks.ini file:
command line:
$ hicPlotTADs --tracks tracks.ini --region chrX:6800000-8500000 -o TAD_calling_comparison.png
tracks.ini:
[x-axis]
fontsize=10
[hic]
file = myHiCmatrix.h5
title = Threshold 0.05
colormap = Spectral_r
depth = 400000
min_value = 1
max_value = 80
transform = log1p
file_type = hic_matrix
show_masked_bins = false
[tads]
file = myHiCmatrix_min10000_max40000_step1500_thres0.05_delta0.01_fdr_domains.bed
file_type = domains
border_color = black
color = none
overlay_previous = share-y
[hic]
file = myHiCmatrix.h5
title = Threshold 0.01
colormap = Spectral_r
depth = 400000
min_value = 1
max_value = 80
transform = log1p
file_type = hic_matrix
show_masked_bins = false
[tads]
file = myHiCmatrix_min10000_max40000_step1500_thres0.01_delta0.01_fdr_domains.bed
file_type = domains
border_color = black
color = none
overlay_previous = share-y
[spacer]
height = 0.1
[hic]
file = myHiCmatrix.h5
title = Threshold 0.005
colormap = Spectral_r
depth = 400000
min_value = 1
max_value = 80
transform = log1p
file_type = hic_matrix
show_masked_bins = false
[tads]
file = myHiCmatrix_min10000_max40000_step1500_thres0.005_delta0.01_fdr_domains.bed
file_type = domains
border_color = black
color = none
overlay_previous = share-y
[spacer]
height = 0.1
[hic]
file = myHiCmatrix.h5
title = Threshold 0.001
colormap = Spectral_r
depth = 400000
min_value = 1
max_value = 80
transform = log1p
file_type = hic_matrix
show_masked_bins = false
[tads]
file = myHiCmatrix_min10000_max40000_step1500_thres0.001_delta0.01_fdr_domains.bed
file_type = domains
border_color = black
color = none
overlay_previous = share-y
[spacer]
height = 0.1
[bigwig]
file = /data/processing/ChIP-Seq_Embryos/H3K36me3_14c.bigwig
title = H3K36me3
color = darkred
min_value = 0
max_value = auto
height = 2
file_type = bigwig
[spacer]
height = 0.1
[genes]
file = /data/group/bedfiles/dm6/genes_sorted.bed
title = genes
color = black
height = 18
labels = true
file_type = bed
This will result in the following plot where we see that the fourth set of hicFindTADs parameters with a threshold of 0.001 gives the best results in terms of TAD calling compared to the corrected Hi-C counts distribution and compared to the enrichment of H3K36me3, which is known to be enriched at TAD boundaries in Drosophila melanogaster.

Notes¶
In the _domains.bed output file, the 5th column contains the TAD-separation score at the boundary located at the start of each domain.
The process to identify boundaries is as follows:
call all local minima in the average TAD-score. Each local minimum should be separated by at least min_boundary_distance. If this value is not given, it is set to the average bin size * 4
for each local minimum detected, compute its p-value and then compute a q-value.
for each local minimum detected, compute the ‘delta’ which is the difference between the mean TAD-score of the 10 bins before and the 10 bins after the minimum (excluding the min point)
keep only those minima that fulfill the following criteria: the p-value (or q-value depending on the user selection) should be below the given threshold and the delta should be above the user defined threshold
everything between 2 consecutive boundaries is a TAD
For computation of the p-values, the distribution of the z-scores at the ‘diamond’ above the local minimum is compared with the distribution of z-scores that are min_depth downstream using the Wilcoxon rank-sum test. Similarly, the distribution of z-scores is computed with the z-scores min_depth upstream of the local minimum. The smallest of the two p-values is assigned to the local minimum.
If min_depth is not given, this is computed as bin size * 30 (if the bins are smaller than 1000), as bin size * 10 if the bins are between 1000 and 20.000 and as bin size * 5 if the bin size is bigger than 20.000.
If max_depth is not given, this is computed as bin size * 60 (if the bins are smaller than 1000), as bin size * 40 if the bins are between 1000 and 20.000 and as bin size * 10 if the bin size is bigger than 20.000.
hicMergeDomains¶
hicMergeDomains takes as input multiple TAD domain files from hicFindTads. It merges TADs from different resolutions to one TAD domains file, considers protein peaks from known TAD binding sites and computes a dependency graph of the TADs.
Two TADs are considered as one if they don’t overlap at x bins given by –value; TAD borders need to match the protein peaks given by –proteinFile; a relation between two TADs is given by their overlap of area in percent, parameter –percent. The protein peaks are only considered if in one bin at least –minPeak.
An example usage is:
$ hicMergeDomains –domainFiles 10kbtad_domains.bed 50kbtad_domains.bed –proteinFile ctcf_sorted.bed –outputMergedList two_files_ctcf –outputRelationList two_files_relation_ctcf –outputTreePlotPrefix two_files_plot_ctcf –outputTreePlotFormat pdf
usage: hicMergeDomains --domainFiles DOMAINFILES [DOMAINFILES ...]
[--proteinFile PROTEINFILE]
[--minimumNumberOfPeaks MINIMUMNUMBEROFPEAKS]
[--value VALUE] [--percent PERCENT]
[--outputMergedList OUTPUTMERGEDLIST]
[--outputRelationList OUTPUTRELATIONLIST]
[--outputTreePlotPrefix OUTPUTTREEPLOTPREFIX]
[--outputTreePlotFormat OUTPUTTREEPLOTFORMAT] [--help]
[--version]
Required arguments¶
- --domainFiles, -d
The domain files of the different resolutions is required
Optional arguments¶
- --proteinFile, -p
In order to be able to better assess the relationship between TADs, the associated protein file (e.g. CTCF for mammals) can be included. The protein file is required in broadpeak format
- --minimumNumberOfPeaks, -m
Optional parameter to adjust the number of protein peaks when adapting the resolution to the domain files. At least minimumNumberOfPeaks of unique peaks must be in a bin to considered. Otherwise the bin is treated like it has no peaks (Default: 1).
Default: 1
- --value, -v
Determine a value by how much the boundaries of two TADs must at least differ to consider them as two separate TADs (Default: 5000).
Default: 5000
- --percent, -pe
For the relationship determination, a percentage is required from which area coverage the TADs are related to each other.For example, a relationship should be entered from 5 percent area coverage -p 0.05 (Default: 0.5).
Default: 0.5
- --outputMergedList, -om
File name for the merged domains list (Default: “mergedDomains.bed”).
Default: “mergedDomains.bed”
- --outputRelationList, -or
File name for the relationship list of the TADs (Default: “relationList.txt”).
Default: “relationList.txt”
- --outputTreePlotPrefix, -ot
File name prefix for the relationship tree of the TADs (Default: “relationship_tree_”).
Default: “relationship_tree_”
- --outputTreePlotFormat, -of
File format of the relationship tree. Supported formats are listed on: https://www.graphviz.org/doc/info/output.html (Default: “pdf”).
Default: “pdf”
- --version
show program’s version number and exit
This tool is the result of the bachelor thesis (in German) from Sarah Domogalla at the Bioinformatics lab of the Albert-Ludwigs-University Freiburg. The thesis was written in winter semester 2019/2020.
The thesis covers the issue of differently detected TADs of the same data but on different resolutions.

As it can be seen here, the detected TADs on the four resolutions are quite different, and have maybe a hierarchy. Compare for example the region around 53 Mb in the different resolutions, it is unclear if these are multiple small TADs as suggest by the 10kb resolution, mid-size one as seen by 50 kb or one big, as detected on 100 kb resolution. Consider the next image for an abstract representation of this problem.

The abstract representation shows multiple issues: Are ID_85,ID_86 and ID_87 related in any hierarchy? Which role plays ID_75? hicMergeDomains tries to solve this by providing the parameters –value and –percentage to control how the relationship is determined and if TADs are duplicated and therefore deleted.

This abstract representation of the hierarchical TADs is created for each chromosome and provides insights for the organization of the chromatin structure in the three dimensional space.

The last image shows a cleaned TAD detection based on all four inputs.
hicDifferentialTAD¶
Computes differential TADs by comparing the precomputed TAD regions of the target matrix with the same regions of the control matrix. Please notice that the matrices need to have the same read coverage, this can be achieved with hicNormalize and the ‘smallest’-mode. H0 is the assumption that two regions are identical, the rejected files contain therefore the as differential considered regions.
usage: hicDifferentialTAD [--targetMatrix TARGETMATRIX]
[--controlMatrix CONTROLMATRIX]
[--tadDomains TADDOMAINS]
[--outFileNamePrefix OUTFILENAMEPREFIX]
[--pValue PVALUE]
[--mode {intra-TAD,left-inter-TAD,right-inter-TAD,all}]
[--modeReject {all,one}] [--threads THREADS]
[--help] [--version]
Required arguments¶
- --targetMatrix, -tm
The matrix which was used to compute the TADs
- --controlMatrix, -cm
The control matrix to test the TADs for a differential interaction pattern.
- --tadDomains, -td
The TADs domain file computed by hicFindTADs.
- --outFileNamePrefix, -o
Outfile name prefix to store the accepted / rejected H0 TADs.
Optional arguments¶
- --pValue, -p
H0 is considered as ‘two regions are identical.’ i.e. all regions with a test result of <= p-value are rejected and considered as differential (Default: 0.05).
Default: 0.05
- --mode, -m
Possible choices: intra-TAD, left-inter-TAD, right-inter-TAD, all
Consider only intra-TAD interactions, or additional left inter-TAD, right inter-TAD or all (Default: “all”).
Default: “all”
- --modeReject, -mr
Possible choices: all, one
All test of a mode must be rejected (all) or reject region (and accept it is differential) as soon as at least one region is having a p-value <= –pValue (Default: “one”).
Default: “one”
- --threads, -t
Number of threads to use, the parallelization is implemented per chromosome (Default: 4).
Default: 4
- --version
show program’s version number and exit
hicDifferentialTAD computes with a treatment Hi-C file, a control Hi-C file and a precomputed TAD domains file if the detected TADs are differential between the treatment and the control sample. The TADs need to be precomputed on the treatment file with _hicFindTADs.
hicDifferentialTAD extract per TAD three regions: the intra-TAD, the left and right inter-TAD region. In the following image, the upper visualization shows a region with the detected TADs which are indicated by the black lines. The bottom shows as an example which regions are used for the differential test: the intra-TAD region is highlighted in red, the left inter-TAD in sandy-color and the right inter-TAD in blue. Between two samples a Wilcoxon rank-sum test is applied for a TAD under H0: ‘The regions are equal’. For all three regions of a TAD the rank-sum test is independently applied. The user has the choice with the two parameters ‘mode’ and ‘modeReject’ to define if a) all three regions should be considered (‘all’), or only a subset e.g. the intra-TAD or intra-TAD and left inter-TAD should be considered; and b) if all regions need to have lower p-value than the user given to reject H0 or if it is enough that at least one region is rejecting H0 to consider the region as differential.

Example usage¶
$ hicDifferentialTAD -tm GSM2644945_Untreated-R1.100000_chr1.cool -cm GSM2644947_Auxin2days-R1.100000_chr1.cool -td untreated_R1_domains.bed -o differential -p 0.01 -t 4 -mr all
In this example data from Nora et al. “Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization”, Cell 169.5 (2017): 930-944 is used [GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE98671]

The output are two BED-similar files: the ‘_accepted.diff_tad’ and ‘_rejected.diff_tad’ file. The difference to a real BED file is a) the usage of a header starting with ‘#’, the first six columns are BED6 standard, however, there are three additional columns with the p-values of each intra-TAD, left-inter-TAD and right-inter-TAD test. The score value and name is copied from the domains.bed file which is a output of hicFindTADs.
# Created with HiCExplorer's hicDifferentialTAD version 3.5-dev
# H0 'regions are equal' H0 is rejected for all p-value smaller or equal the user given p-value threshold; i.e. regions in this file are considered as differential.
# Rejected regions with Wilcoxon rank-sum test to p-value: 0.01 with used mode: all and modeReject: all
# Chromosome start end name score strand p-value left-inter-TAD p-value right-inter-TAD p-value intra-TAD
chr1 4400000 6200000 ID_0.01_1 -0.5630275 . nan 0.42772134044376386 0.0001942482956610518
chr1 6200000 7300000 ID_0.01_2 -0.235798 . 0.3479058102348157 0.651801360704674 0.011174626122333891
chr1 7300000 9500000 ID_0.01_3 -0.44334 . 0.7619708385959597 0.9408966423531526 4.668547072331386e-06
chr1 10100000 11600000 ID_0.01_5 -0.6844015 . 0.10355136240139871 0.2022260523997077 0.006197912102328514
chr1 11600000 12900000 ID_0.01_6 -0.607587 . 0.39564035674322084 0.04934747494654432 5.069275897797787e-07
chr1 12900000 13700000 ID_0.01_7 -0.6303795 . 0.030328896271663634 0.048097680638217614 0.0023999817017426378
chr1 13700000 14900000 ID_0.01_8 -1.2063985 . 0.07605786424686575 0.5322636852613494 1.8146100883843892e-06
chr1 14900000 17100000 ID_0.01_9 -0.3406565 . 0.4982863585572014 0.692745144471043 0.002562121461829293
chr1 17100000 18100000 ID_0.01_10 -0.230354 . 0.8237750360973404 0.011911117420279341 0.00020602838341698515
chr1 18100000 19100000 ID_0.01_11 -0.5135365 . 0.007089417650089524 0.07553560212368073 0.03561337274424189
hicMergeTADbins¶
Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix per TAD.
The output matrix contains the total counts per TAD and the total contacts with all other TADs.
usage: hicMergeTADbins [-h] --matrix MATRIX --domains DOMAINS --outFile
OUTFILE [--version]
Named Arguments¶
- --matrix, -m
Path to Hi-C matrix to use.
- --domains
Path to a bed file containing the domains.
- --outFile, -o
Name for the resulting matrix file.
- --version
show program’s version number and exit
Tools for Hi-C and TADs visualization¶
hicPlotMatrix¶
Creates a heatmap of a Hi-C matrix.
usage: hicPlotMatrix --matrix MATRIX --outFileName OUTFILENAME [--title TITLE]
[--scoreName SCORENAME] [--perChromosome]
[--clearMaskedBins]
[--chromosomeOrder CHROMOSOMEORDER [CHROMOSOMEORDER ...]]
[--region REGION] [--region2 REGION2] [--log1p] [--log]
[--colorMap COLORMAP] [--vMin VMIN] [--vMax VMAX]
[--dpi DPI] [--bigwig BIGWIG [BIGWIG ...]]
[--bigwigAdditionalVerticalAxis]
[--vMinBigwig VMINBIGWIG] [--vMaxBigwig VMAXBIGWIG]
[--flipBigwigSign]
[--scaleFactorBigwig SCALEFACTORBIGWIG]
[--fontsize FONTSIZE] [--rotationX ROTATIONX]
[--rotationY ROTATIONY]
[--increaseFigureWidth INCREASEFIGUREWIDTH]
[--increaseFigureHeight INCREASEFIGUREHEIGHT]
[--loops LOOPS] [--help] [--version]
Required arguments¶
- --matrix, -m
Path of the Hi-C matrix to plot.
- --outFileName, -out
File name to save the image.
Optional arguments¶
- --title, -t
Plot title.
- --scoreName, -s
Score name label for the heatmap legend.
- --perChromosome
Instead of plotting the whole matrix, each chromosome is plotted next to the other. This parameter is not compatible with –region.
Default: False
- --clearMaskedBins
If set, masked bins are removed from the matrix and the nearest bins are extended to cover the empty space instead of plotting black lines.
Default: False
- --chromosomeOrder
Chromosomes and order in which the chromosomes should be plotted. This option overrides –region and –region2.
- --region
Plot only this region. The format is chr:start-end The plotted region contains the main diagonal and is symmetric unless –region2 is given.
- --region2
If given, then only the region defined by –region and –region2 is given. The format is the same as –region1.
- --log1p
Plot the log1p of the matrix values.
Default: False
- --log
Plot the MINUS log of the matrix values.
Default: False
- --colorMap
Color map to use for the heatmap. Available values can be seen here: http://matplotlib.org/examples/color/colormaps_reference.html (Default: “RdYlBu_r”).
Default: “RdYlBu_r”
- --vMin
Minimum score value.
- --vMax
Maximum score value.
- --dpi
Resolution for the image in case theoutput is a raster graphics image (e.g png, jpg) (Default: 72).
Default: 72
- --bigwig
Bigwig file to plot below the matrix. This can for example be used to visualize A/B compartments or ChIP-seq data.
- --bigwigAdditionalVerticalAxis
Add an additional axis to determine the values of a bigwig file in 2D better.
Default: False
- --vMinBigwig
Minimum score value for bigwig.
- --vMaxBigwig
Maximum score value for bigwig
- --flipBigwigSign
The sign of the bigwig values are flipped. Useful if hicPCA gives inverted values.
Default: False
- --scaleFactorBigwig
Scale the values of a bigwig file by the given factor (Default: 1.0).
Default: 1.0
- --fontsize
Fontsize in the plot for x and y axis (Default: 10).
Default: 10
- --rotationX
Rotation in degrees for the labels of x axis (Default: 0).
Default: 0
- --rotationY
Rotation in degrees for the labels of y axis (Default: 0).
Default: 0
- --increaseFigureWidth
Plotting additional bigwig tracks can cause compression in x or y direction of the heatmap. Set this factor to a value unequal to 0 to correct this (Default: 0.5).
Default: 0.5
- --increaseFigureHeight
Plotting additional bigwig tracks can cause compression in x or y direction of the heatmap. Set this factor to a value unequal to 0 to correct this (Default: 0.5).
Default: 0.5
- --loops
Bedgraph file to plot detected long range contacts from hicDetectLoops.
- --version
show program’s version number and exit
Details¶
hicplotMatrix takes a Hi-C matrix and plots the interactions of all or some chromosomes.
Examples¶
Here’s an example of Hi-C data from wild-type D. melanogaster embryos.

This plot shows all contacts of a Hi-C matrix. Its bins were merged into 25 kb bins using hicMergeMatrixBins. Alternatively, chromosomes can be plotted separately:

$ hicPlotMatrix -m Dmel.h5 -o hicPlotMatrix.png \
-t 'D.melanogaster (--perChromosome)' --log1p \
--clearMaskedBins --chromosomeOrder 2L 2R 3L 3R X --perChromosome
hicPlotTADs¶
Description¶
For parameter options please see the pyGenomeTracks documentation.
Usage example¶
The hicPlotTADs output is similar to a genome browser screenshot that besides the usual genes and score data (like bigwig or bedgraph files) also contains Hi-C data. The plot is composed of tracks that need to be specified in a configuration file. Once the track file is ready, hicPlotTADs can be used as follows:
$ hicPlotTADs --tracks tracks.ini --region chrX:99,974,316-101,359,967 \
-t 'Marks et. al. TADs on X' -o tads.pdf

Configuration file template¶
The following is a template for the configuration file which is based on .ini configuration files. Each track is defined by a section header (for example [hic track]), followed by parameters specific to the section as color, title, etc. For details please see the documentation of pyGenomeTracks.
$ hicPlotTADs --tracks hic_track.ini -o hic_track.png --region chrX:2500000-3500000
[x-axis]
where = top
[hic matrix]
file = hic_data.h5
title = Hi-C data
# depth is the maximum distance plotted in bp. In Hi-C tracks
# the height of the track is calculated based on the depth such
# that the matrix does not look deformed
depth = 300000
transform = log1p
file_type = hic_matrix
[tads]
file = domains.bed
file_type = domains
border_color = black
color = none
# the tads are overlay over the hic-matrix
# the share-y options sets the y-axis to be shared
# between the Hi-C matrix and the TADs.
overlay_previous = share-y
[spacer]
[bigwig file test]
file = bigwig.bw
# height of the track in cm (optional value)
height = 4
title = ChIP-seq
min_value = 0
max_value = 30

hicPlotViewpoint¶
Plots the number of interactions around a given reference point in a region.
usage: hicPlotViewpoint --matrix MATRIX [MATRIX ...] --region REGION
--outFileName OUTFILENAME --referencePoint
REFERENCEPOINT [--chromosome CHROMOSOME]
[--interactionOutFileName INTERACTIONOUTFILENAME]
[--dpi DPI] [--version] [--help]
Required arguments¶
- --matrix, -m
Hi-C matrix to plot.
- --region
The format is chr:start-end.
- --outFileName, -o
File name of the image to save.
- --referencePoint, -rp
Reference point. Needs to be in the format: ‘chr:100’ for a single reference point or ‘chr:100-200’ for a reference region.
Optional arguments¶
- --chromosome, -C
Optional parameter: Only show results for this chromosome.
- --interactionOutFileName, -i
Optional parameter: If set, a bedgraph file with all interaction will be created.
- --dpi
Optional parameter: Resolution for the image in case theouput is a raster graphics image (e.g png, jpg) (Default: 300).
Default: 300
- --version
show program’s version number and exit
hicAggregateContacts¶
Background¶
hicAggregateContacts
is a tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions. Positions of interest can for example be binding sites of a specific protein that were determined by ChIP-seq or genetic elements as transcription start sites of active genes.
Description¶
Takes a list of positions in the Hi-C matrix and makes a pooled image.
usage: hicAggregateContacts --matrix MATRIX --outFileName OUTFILENAME --BED
BED --mode {inter-chr,intra-chr,all}
[--range RANGE] [--row_wise] [--BED2 BED2]
[--numberOfBins NUMBEROFBINS]
[--transform {total-counts,z-score,obs/exp,none}]
[--operationType {sum,mean,median}] [--perChr]
[--largeRegionsOperation {first,last,center}]
[--help] [--version]
[--outFilePrefixMatrix OUTFILEPREFIXMATRIX]
[--outFileContactPairs OUTFILECONTACTPAIRS]
[--outFileObsExp OUTFILEOBSEXP]
[--diagnosticHeatmapFile DIAGNOSTICHEATMAPFILE]
[--kmeans KMEANS] [--hclust HCLUST]
[--spectral SPECTRAL]
[--howToCluster {full,center,diagonal}]
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
[--colorMap COLORMAP] [--plotType {2d,3d}]
[--vMin VMIN] [--vMax VMAX] [--dpi DPI]
- --matrix, -m
Path of the Hi-C matrix to plot.
- --outFileName, -out
File name to save the image.
- --BED
Interactions between regions in this BED file are plotted.
- --mode
Possible choices: inter-chr, intra-chr, all
- --range
Range of contacts that will be considered for plotting the aggregate contacts in bp with the format low_range:high_range for example 1000000:20000000. The range should start at contacts larger than TAD size to reduce background interactions. This will be ignored if inter-chromosomal contacts are of interest.
- --row_wise
If given,the insteractions between each row of the BED file and its corresponding row of the BED2 file are computed. If intra-chromosomal contacts are computed, the rows with different chromosomes are ignored. If inter-chromosomal, the rows with same chromosomes are ignored. It keeps all the rows if all.
Default: False
- --BED2
Optional second BED file. Interactions between regions in first and second BED file are plotted.
- --numberOfBins
Number of bins to include in the submatrix. The bed regions will be centered between half number of bins and the other half number of bins (Default: 51).
Default: 51
- --transform
Possible choices: total-counts, z-score, obs/exp, none
Type of transformation for the matrix. The options are “none”, “total-counts”, “z-score” and “obs/exp”. If total counts are selected, the sub-matrix values are divided by the total counts for normalization. If z-score or obs/exp are selected, the Hi-C matrix is converted into a z-score or observed / expected matrix (Default: “none”).
Default: “none”
- --operationType
Possible choices: sum, mean, median
Type of the operation to be applied to summerize the submatrices into a single matrix. Options are sum, mean and median. (Default: “median”)
Default: “median”
- --perChr
if set, it generates a plot per chromosome. It is only affected if intra-chromosomal contacts are of interest.
Default: False
- --largeRegionsOperation
Possible choices: first, last, center
If a given coordinate in the bed file is larger than a bin of the input matrix, by default only the first bin is taken into account. However there are more posibilities to handel such a case. Users can ask for the last bin or for center of the region. As an example if a region falls into bins [4,5,6] and –numberOfBins = 2 then if first, bins [3,4,5] are kept. If last: [5,6,7] and if center: [4,5,6].
Default: “first”
- --version
show program’s version number and exit
- --dpi
Optional parameter: Resolution for the image in case theoutput is a raster graphics image (e.g png, jpg) (Default: 300).
Default: 300
- --outFilePrefixMatrix
If this option is set, the values underlying the output matrix will be saved to tab-delimited tables (one per chromosome) using the indicated prefix, for example TSS_to_TSS_chrX.tab. If clustering is performed, the values are saved including the cluster_id in the file TSS_to_TSS_chrX_cluster_1.tab.
- --outFileContactPairs
Output file prefix. If this option is set, the position of the contact positions are saved as (chrom1, start1, end1, chrom2, start2, end2) where chrom_n, start_n, end_n correspond to the pair of positions used to compute the submatrix. The data is saved per chromosome and per cluster separately (one file each). The format is {prefix}_{chrom}_{cluster_id}.tab. If no clusters were computed, then only one file per chromosome is produced.
- --outFileObsExp
writes the obs/exp matrix to a file, if –transform=obs/exp.
- --diagnosticHeatmapFile
If given, a heatmap (per chromosome) is saved. Each row in the heatmap contains thediagonal of each of the submatrices centered on the bed file. This file is useful to get an idea of the values that are used for the aggregate matrix and to determine the fraction of sub-matrices that are aggregated that may have an enrichment at the center.
- --kmeans
Number of clusters to compute. When this option is set, the submatrices are split into clusters (per chromosome) using the k-means algorithm.
- --hclust
Number of clusters to compute (per chromosome). When this option is set, the matrix is split into clusters using the hierarchical clustering algorithm, using “ward linkage”. –hclust could be very slow if you have >1000 submatrices per chromosome. In those cases, you might prefer –kmeans
- --spectral
Number of clusters to compute (per chromosome). When this option is set, the matrix is split into clusters using the spectral clustering algorithm.
- --howToCluster
Possible choices: full, center, diagonal
Options are “full”, “center” and “diagonal”. The full clustering takes all values of each submatrix for clustering. “center”, takes only a square of length 3x3 from each submatrix and uses only this values for clustering. With the “diagonal” option the clustering is only carried out based on the submatrix diagonal (representing values at the same distance to each other) (Default: “full”).
Default: “full”
- --chromosomes, -C
List of chromosomes to plot.
- --colorMap
Color map to use for the heatmap. Available values can be seen here: http://matplotlib.org/examples/color/colormaps_reference.html (Default: “RdYlBu_r”).
Default: “RdYlBu_r”
- --plotType
Possible choices: 2d, 3d
Plot type (Default: “2d”).
Default: “2d”
- --vMin
Minimum value of the plotted score.
- --vMax
Maximum value of the plotted score.
Usage example¶
Below, you can find an example of an aggregate Hi-C matrix obtained from Drosophila melanogaster Hi-C data. The interactions are plotted at binding sites of a protein that were determined by ChIP-seq. We plot sub-matrices of 30 bins (1.5 kb bin size, 45 kb in total). The regions specified in the BED file will be centered between half number of bins and the other half number of bins.The considered range is 300-1000 kb. The range should be adjusted and only contain contacts larger than TAD size to reduce background interactions.
$ hicAggregateContacts --matrix Dmel.h5 --BED ChIP-seq-peaks.bed \
--outFileName Dmel_aggregate_Contacts --vMin 0.8 --vMax 2.2 \
--range 300000:1000000 --numberOfBins 30 --chromosomes X \
--operationType mean --transform obs/exp --mode intra-chr

This example was calculated using mean interactions of an observed vs expected transformed Hi-C matrix. Additional options for the matrix transformation are total-counts or z-score. Aggregate contacts can be plotted in 2D or 3D.
hicPlotAverageRegions¶
hicPlotAverage regions plots the data computed by hicAverageRegions. It shows the summed up and averaged regions around all given reference points. This tool is useful to plot differences at certain reference points as for example TAD boundaries between samples.
usage: hicPlotAverageRegions --matrix MATRIX --outputFile OUTPUTFILE [--log1p]
[--log] [--colorMap COLORMAP] [--vMin VMIN]
[--vMax VMAX] [--dpi DPI] [--help] [--version]
Required arguments¶
- --matrix, -m
The averaged regions file computed by hicAverageRegions (npz file).
- --outputFile, -o
The averaged regions plot.
Optional arguments¶
- --log1p
Plot log1p of the matrix values.
Default: False
- --log
Plot log of the matrix values.
Default: False
- --colorMap
Color map to use for the heatmap. Available values can be seen here: http://matplotlib.org/examples/color/colormaps_reference.html (Default: “hot_r”).
Default: “hot_r”
- --vMin
Minimum score value.
- --vMax
Maximum score value.
- --dpi
Resolution of image ifouput is a raster graphics image (e.g png, jpg) (Default: 300).
Default: 300
- --version
show program’s version number and exit
See hicAverageRegions for an example usage.
Hi-C contact matrix handling¶
hicConvertFormat¶
Conversion of Hi-C matrices of different file formats. We support the conversion of hic to cool format via hic2cool, and homer, HicPro, h5 and cool format to h5, cool, homer or ginteractions format. Moreover, hicConvertFormat accepts multiple input files from one format with different resolutions and creates a mcool file. Each original file is stored under the path e.g. ::/resolutions/10000. A batch computation is possible, the number of input files and output files needs to match, all input files need to be of the same format type and all output files too. For input and output of cooler files special options are available, for all other formats they will be ignored.HiCPro file format needs an additional bed file as input.
usage: hicConvertFormat --matrices MATRICES [MATRICES ...] --outFileName
OUTFILENAME [OUTFILENAME ...] --inputFormat
{h5,cool,hic,homer,hicpro} --outputFormat
{cool,h5,homer,ginteractions,mcool}
[--correction_name CORRECTION_NAME]
[--correction_division] [--store_applied_correction]
[--chromosome CHROMOSOME] [--enforce_integer]
[--load_raw_values]
[--resolutions RESOLUTIONS [RESOLUTIONS ...]] [--help]
[--version]
[--bedFileHicpro BEDFILEHICPRO [BEDFILEHICPRO ...]]
Required arguments¶
- --matrices, -m
input file(s). Could be one or many files.
- --outFileName, -o
File name to save the exported matrix.
- --inputFormat
Possible choices: h5, cool, hic, homer, hicpro
File format of the input matrix file. The following options are available: h5 (native HiCExplorer format based on hdf5 storage format), cool, hic, homer, hicpro.
- --outputFormat
Possible choices: cool, h5, homer, ginteractions, mcool
Output format. The following options are available: h5 (native HiCExplorer format based on hdf5 storage format). cool, ginteractions, homer and mcool (Default: “cool”).
Default: “cool”
Optional arguments¶
- --correction_name
Name of the column which stores the correction factors. The information about the column names can be figured out with the tool hicInfo. Option only for cool input files (Default: “weight”).
Default: “weight”
- --correction_division
If set, division is applied for correction. Default is a multiplication. Option only for cool input files.
Default: False
- --store_applied_correction
Store the applied correction and do not set correction factors. Option only for cool input files.
Default: False
- --chromosome
Load only one chromosome. Option only for cool input files.
- --enforce_integer
Enforce datatype of counts to integer. Option only for cool input files.
Default: False
- --load_raw_values
Load only ‘count’ data and do not apply a correction. Option only for cool input files.
Default: False
- --resolutions, -r
List of resolutions that should be added.
- --version
show program’s version number and exit
- --bedFileHicpro, -bf
Bed file(s) of hicpro file format.
Background¶
To reproduce analyses and to compare and use different Hi-C analysis software, conversion between interaction matrix file formats is crucial. However, most Hi-C softwares are only supporting their own data format which makes the exchange difficult. HiCExplorer supports a range of interaction matrices, both for import and for export.
- Import:
hic
cool
h5
homer
HicPro
- Export:
cool
mcool
h5
homer
ginteractions
With HiCExplorer version 2.2 hicConvertFormat and hicAdjustMatrix replace hicExport from HiCExplorer 2.1 and older versions.
Usage example¶
HiCExplorer uses the library hic2cool to convert .hic interaction matrix files to the cool format. Usually .hic files have the three correction factors KR, VC or VC_SQRT; however these cannot be applied natively by HiCExplorer tools because HiCExplorer expects the correction values to be stored in the column weight. To work with corrected data the correction factors need to applied separately, see section cool to cool.
The following example will convert a hic file which contains the resolution of 1000 to a cool file with 10kb resolution. The desired resolution needs to be existing in the hic file. If no resolution parameter is defined, a mcool file with all available resolutions is created.
$ hicConvertFormat -m matrix.hic --inputFormat hic --outputFormat cool -o matrix.cool --resolutions 10000
It is only possible to convert from hic to cool format, no other formats are supported.
The cool file format is developed and maintained by the Mirny lab and allows to access interaction matrices in an easy to use data format.
The cool data format allows to use the following options:
correction_name: In case correction factors are not stored in ‘weight’ the correct column name can be defined using this parameter and the resulting matrix will store the values in ‘weight’.
correction_division: Correction factors can be applied by a multiplication or a division. The default behaviour is to use the multiplication, in case the correction factors are inverted, set this parameter.
store_applied_correction: Set this parameter if correction factors should be applied on the data and should be written back to colum ‘counts’ in the corrected form and not as raw. Default: not set.
chromosomes: Define a list of chromosomes which should be included in the output matrix. All chromosomes which are not defined are not part of the new matrix. This parameter can speed up the processing especially if only one chromosome is used.
enforce_integer: Raw interaction data is stored as integers, after the correction is applied the data is a float. Set a this parameter to enforce integer values in the new matrix.
load_raw_values: Set this parameter if the interaction data should not be loaded with the correction factors.
$ hicConvertFormat -m matrix.cool --inputFormat cool --outputFormat cool -o matrix.cool --correction_name KR
Homer is a software for ‘motif discovery and next generation sequencing analysis’ and supports Hi-C. HiCExplorer is able to read and write the interaction matrix from Homer. Homer stores the interaction matrix in a simple text file as a dense matrix. To write large matrices in Homer format needs a lot of space and can take a few ours to days.
$ hicConvertFormat -m matrix.homer --inputFormat homer --outputFormat cool -o matrix.cool
HiC-Pro file format needs an additional bed file as input:
$ hicConvertFormat -m matrix.hicpro --bedFileHicpro hicpro.bed --inputFormat hicpro --outputFormat cool -o matrix.cool
With HiCExplorer it is possible to create a multiple cool (mcool) file. These mcool files can be used e.g. with HiGlass.
To create an mcool file, use as input either one matrix in one of the supported read formats and define the desired resolutions or define multiple input matrices. In the second case, the matrices should all have different resolutions.
The resolutions need to be a multiple of the input matrix i.e. matrix with 10kb, 20kb and 30kb are possible but not 35kb.
$ hicConvertFormat -m matrix.cool --inputFormat cool --outputFormat mcool
-o multi_matrix.mcool --resolutions 20000 40000 70000 120000 500000
$ hicConvertFormat -m matrix10kb.cool matrix20kb.cool matrix30kb.cool
--inputFormat cool --outputFormat mcool -o multi_matrix.mcool
The mcool matrix contains the individual matrices as follows:
multi_matrix.mcool::/resolutions/10000
multi_matrix.mcool::/resolutions/40000
multi_matrix.mcool::/resolutions/70000
multi_matrix.mcool::/resolutions/120000
multi_matrix.mcool::/resolutions/500000
hicAdjustMatrix¶
This tool adjusts hic matrices by keeping, removing or masking a given list of regions or chromosmes.
usage: hicAdjustMatrix --matrix MATRIX --outFileName OUTFILENAME
[--chromosomes CHROMOSOMES [CHROMOSOMES ...] |
--regions REGIONS | --maskBadRegions MASKBADREGIONS]
[--action {keep,remove,mask}] [--help] [--version]
Named Arguments¶
- --chromosomes, -c
List of chromosomes to keep/remove.
- --regions, -r
BED file which stores a list of regions to keep/remove.
- --maskBadRegions, -mbr
Bad regions are identified and masked.
Required arguments¶
- --matrix, -m
The Hi-C matrix to adjust. HiCExplorer supports the following file formats: h5 (native HiCExplorer format) and cool.
- --outFileName, -o
File name to save the adjusted matrix.
Optional arguments¶
- --action, -a
Possible choices: keep, remove, mask
Keep, remove or mask the list of specified chromosomes/regions. keep/remove: These options keep/remove bins of matrix by deleting them. This may cause issue plotting the matrix if several parts of a single chromosome are going to be deleted. In that case, one may consider using the mask option (Default: “keep”).
Default: “keep”
- --version
show program’s version number and exit
hicAdjustMatrix can mask, remove or keep defined regions from a BED file or given chromosomes. This can be useful to create a smaller Hi-C matrix of a single chromosome for e.g. testing purposes or to remove repetitive chromosome ends before calculating A/B compartments using hicPCA.
Example usages¶
$ hicAdjustMatrix -m matrix.cool --action keep --chromosomes chr1 -o matrix_chr1.cool
$ hicAdjustMatrix -m matrix.cool --action mask --regions mask_regions.bed -o matrix_masked.cool
mask_regions.bed
chr1 10 30
chr1 50 300
Capture Hi-C analysis¶
chicQualityControl¶
Computes the sparsity of each viewpoint to determine the quality. A viewpoint is considered to be of bad quality if it is too sparse i.e. if there are too many locations with no interactions recorded.
This script creates three output files: a plot with the sparsity distribution per matrix, a plot with the sparsity distribution as histograms and a filtered reference points file.
An example usage is:
$ chicQualityControl -m matrix1.h5 matrix2.h5 -rp referencePointsFile.bed –range 20000 40000 –sparsity 0.01 -o referencePointFile_QC_passed.bed
usage: chicQualityControl --matrices MATRICES [MATRICES ...] --referencePoints
REFERENCEPOINTS --sparsity SPARSITY
[--outFileName OUTFILENAME]
[--outFileNameHistogram OUTFILENAMEHISTOGRAM]
[--outFileNameSparsity OUTFILENAMESPARSITY]
[--threads THREADS] [--fixateRange FIXATERANGE]
[--dpi DPI] [--help] [--version]
Required arguments¶
- --matrices, -m
The input matrices to apply the QC on.
- --referencePoints, -rp
Bed file contains all reference points which are checked for a sufficient number of interactions.
- --sparsity, -s
Viewpoints with a sparsity less than the value given are considered of bad quality. If multiple matrices are given, the viewpoint is removed as soon as it is of bad quality in at least one matrix.
Optional arguments¶
- --outFileName, -o
The output file name of the passed reference points. Used as prefix for the plots as well (Default: “new_referencepoints.bed”).
Default: “new_referencepoints.bed”
- --outFileNameHistogram, -oh
The output file for the histogram plot (Default: “histogram.png”).
Default: “histogram.png”
- --outFileNameSparsity, -os
The output file for the sparsity distribution plot (Default: “sparsity.png”).
Default: “sparsity.png”
- --threads, -t
Number of threads (Default: 4).
Default: 4
- --fixateRange, -fs
Fixate score of background model starting at distance x. E.g. all values greater than 500kb are set to the value of the 500kb bin (Default: 500000).
Default: 500000
- --dpi
Optional parameter: Resolution for the image if theoutput is a raster graphics image (e.g png, jpg) (Default: 300).
Default: 300
- --version
show program’s version number and exit
chicViewpointBackgroundModel¶
chicViewpointBackgroundModel computes a background model for all given samples with all reference points. For all relative distances to a reference point a negative binomial distribution is fitted. In addition, for each relative distance to a reference point the average value for this location is computed. Both background models are used, the first one for p-value and significance computation, the second one to filter out interactions with a smaller x-fold over the mean.
The background distributions are fixed at –fixateRange, i.e. all distances lower or higher than this value use the fixed background distribution.
An example usage is:
$ chicViewpointBackgroundModel –matrices matrix1.cool matrix2.cool matrix3.cool –referencePoints referencePointsFile.bed –range 20000 40000 –outFileName background_model.bed
usage: chicViewpointBackgroundModel --matrices MATRICES [MATRICES ...]
--referencePoints REFERENCEPOINTS
[--averageContactBin AVERAGECONTACTBIN]
[--truncateZeros]
[--outFileName OUTFILENAME]
[--threads THREADS]
[--fixateRange FIXATERANGE] [--help]
[--version]
Required arguments¶
- --matrices, -m
The input matrices (samples) to build the background model on.
- --referencePoints, -rp
Bed file contains all reference points which should be used to build the background model.
Optional arguments¶
- --averageContactBin
Average the contacts of n bins via a sliding window approach (Default: 5).
Default: 5
- --truncateZeros, -tz
Truncates the zeros before the distributions are fitted. Use it in case you observe an over dispersion.
Default: False
- --outFileName, -o
The name of the background model file (Default: “background_model.txt”).
Default: “background_model.txt”
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --fixateRange, -fs
Fixate score of backgroundmodel starting at distance x. E.g. all values greater 500kb are set to the value of the 500kb bin (Default: 500000).
Default: 500000
- --version
show program’s version number and exit
chicViewpoint¶
Computes per input matrix all viewpoints which are defined in the reference points file. All files are stored in the folder defined by –outputFolder, and the files are named by the name of the reference point, the sample name and the location of the reference point:
gene_matrix_name_chr_start_end.txt
If multiple reference points are used and the processing downstream should be automated via batch processing mode, please activate –writeFileNamesToFile. All the file names will be written to this file; in the case of multiple samples two consecutive lines are considered as treatment vs. control for the differential analysis.
An example usage is:
$ chicViewpoint –matrices matrix1.cool matrix2.cool matrix3.cool –referencePoints referencePointsFile.txt –range 20000 40000 –outputFolder interactionFilesFolder
An example usage for batch mode is:
$ chicViewpoint –matrices matrix1.cool matrix2.cool matrix3.cool –referencePoints referencePointsFile.txt –range 20000 40000 –outputFolder interactionFilesFolder –writeFileNamesToFile interactionFile.txt
usage: chicViewpoint --matrices MATRICES [MATRICES ...] --range RANGE RANGE
--referencePoints REFERENCEPOINTS --backgroundModelFile
BACKGROUNDMODELFILE [--threads THREADS]
[--averageContactBin AVERAGECONTACTBIN]
[--decimalPlaces DECIMALPLACES]
[--writeFileNamesToFile WRITEFILENAMESTOFILE]
[--fixateRange FIXATERANGE] [--allViewpointsList]
[--outputFolder OUTPUTFOLDER] [--help] [--version]
Required arguments¶
- --matrices, -m
Path to the Hi-C matrices which store the captured Hi-C data per sample.
- --range
Defines the region upstream and downstream of a reference point which should be considered in the analysis. Please remember to use the same fixate range setting as for the background model computation and that distances of the range larger than the fixate range use the background model of those.Format is –region upstream downstream
- --referencePoints, -rp
Reference point file. Needs to be in the format: ‘chr 100’ for a single reference point or ‘chr 100 200’ for a reference region and with a single reference point per line
- --backgroundModelFile, -bmf
path to the background file computed by chicViewpointBackgroundModel
Optional arguments¶
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --averageContactBin
Average the contacts of n bins via a sliding window approach to smooth the values and be less sensitive for outliers (Default: 5).
Default: 5
- --decimalPlaces
Decimal places for all output floating numbers in the viewpoint files (Default: 12).
Default: 12
- --writeFileNamesToFile, -w
Set this parameter to have a file with all file names of the viewpoint files (useful only in batch processing mode).
- --fixateRange, -fs
Fixate range of background model starting at distance x. E.g. all values greater 500kb are set to the value of the 500kb bin (Default: 500000).
Default: 500000
- --allViewpointsList, -avl
Writes a file where all viewpoints all samples are sorted by the viewpoints.
Default: False
- --outputFolder, -o
This folder contains all created viewpoint files (Default: “interactionFiles”).
Default: “interactionFiles”
- --version
show program’s version number and exit
chicSignificantInteractions¶
Per viewpoint the significant interactions are detected based on the background model. For each viewpoint file, an output file is created with all recorded significant interactions and a target file. The target file is especially useful in the batch mode context; for two consecutive listed control and treatment viewpoints it merges the significant interactions which can then be used to test for a differential interaction scheme.
chicSignificantInteractions supports two modes to detect significant interactions, either by an x-fold over the average background or a loose p-value. In both cases neighboring significant peaks are merged together and an additional p-value is computed based on the sum of interactions for this neighborhood. Only interactions with a higher p-value (as specified by the threshold –pValue) are accepted as a significant interaction.
An example usage is for single mode is:
$ chicSignificantInteractions –interactionFile interactionFilesFolder/Sox17_FL-E13-5_chr1_1000_2000.bed –referencePoints referencePointsFile.bed –range 20000 40000 –backgroundModelFile background_model.bed –loosePValue 0.5 –pValue 0.01
An example usage is for batch mode is:
$ chicViewpointBackgroundModel –matrices matrix1.cool matrix2.cool matrix3.cool –referencePoints referencePointsFile.bed –range 20000 40000 –outFileName background_model.bed
The main difference between single mode and batch mode is that in single mode the parameter –interactionFile is interpreted as a list of viewpoint files created with chicViewpoint, whereas in batch mode only one file is allowed, containing the file names of viewpoint files (one per line). This file is created by chicViewpoint and the parameter –writeFileNamesToFile. In batch mode, please remember to specify the folder (via –interactionFileFolder) where chicViewpoint wrote the files.
usage: chicSignificantInteractions --interactionFile INTERACTIONFILE
[INTERACTIONFILE ...] --pValue PVALUE
[--xFoldBackground XFOLDBACKGROUND | --loosePValue LOOSEPVALUE]
--backgroundModelFile BACKGROUNDMODELFILE
--range RANGE RANGE
[--outFileNameSuffix OUTFILENAMESUFFIX]
[--interactionFileFolder INTERACTIONFILEFOLDER]
[--targetFolder TARGETFOLDER]
[--outputFolder OUTPUTFOLDER]
[--writeFileNamesToFile WRITEFILENAMESTOFILE]
[--targetFileList TARGETFILELIST]
[--batchMode] [--threads THREADS]
[--truncateZeroPvalues]
[--fixateRange FIXATERANGE]
[--peakInteractionsThreshold PEAKINTERACTIONSTHRESHOLD]
[--resolution RESOLUTION]
[--computeSampleNumber COMPUTESAMPLENUMBER]
[--multipleTesting {fdr,bonferroni,None}]
[--thresholdMultipleTesting THRESHOLDMULTIPLETESTING]
[--help] [--version]
Named Arguments¶
- --xFoldBackground, -xf
Filter x-fold over background. Used to merge neighboring bins with a broader peak but less significant interactions to a single peak with high significance. Used only for pValue option.
- --loosePValue, -lp
loose p-value threshold to filter target regions in a first round. Used to merge neighboring bins with a broader peak but less significant interactions to a single peak with high significance. Used only for pValue option.
Required arguments¶
- --interactionFile, -if
path to the interaction files which should be used for aggregation of the statistics.
- --pValue, -p
p-value threshold to filter target regions for inclusion in differential analysis.
- --backgroundModelFile, -bmf
path to the background file which is necessary to compute the rbz-score
- --range
Defines the region upstream and downstream of a reference point which should be included. Format is –region upstream downstream, e.g. –region 500000 500000 plots 500kb up- and 500kb downstream. This value should not exceed the range used in the other chic-tools.
Optional arguments¶
- --outFileNameSuffix, -suffix
File name suffix to save the results; prefix is the input file name (Default: “_significant_interactions.txt”).
Default: “_significant_interactions.txt”
- --interactionFileFolder, -iff
Folder where the interaction files are stored. Applies only for batch mode (Default: “.”).
Default: “.”
- --targetFolder, -tf
Folder where the target files are stored (Default: “targetFolder”).
Default: “targetFolder”
- --outputFolder, -o
Output folder of the significant interaction files (Default: “significantFiles”).
Default: “significantFiles”
- --writeFileNamesToFile, -w
(Default: “significantFilesBatch.txt”).
Default: “significantFilesBatch.txt”
- --targetFileList, -tl
The file to store the target file names (Default: “targetList.txt”).
Default: “targetList.txt”
- --batchMode, -bm
Turn on batch mode. The given file for –interactionFile and or –targetFile contain a list of the to be processed files.
Default: False
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --truncateZeroPvalues, -tzpv
Sets all p-values which are equal to zero to one. This has the effect that the associated positions are not part of the significance decision.
Default: False
- --fixateRange, -fs
Fixate range of backgroundmodel starting at distance x. E.g. all values greater than 500kb are set to the value of the 500kb bin (Default: 500000).
Default: 500000
- --peakInteractionsThreshold, -pit
The minimum number of interactions a detected peak needs to have to be considered (Default: 5).
Default: 5
- --resolution, -r
Resolution of the bin in genomic units. Values are set as number of bases, e.g. 1000 for a 1kb, 5000 for a 5kb or 10000 for a 10kb resolution.This value is used to merge neighboring bins (Default: 1000).
Default: 1000
- --computeSampleNumber, -csn
Number of samples to compute together. Applies only in batch mode (Default: 2).
Default: 2
- --multipleTesting, -mt
Possible choices: fdr, bonferroni, None
Multiple testing correction per relative distance with Bonferroni or FDR (Default: “None”).
Default: “None”
- --thresholdMultipleTesting, -tmt
Threshold for Bonferroni / FDR. Either a float value for all or a file with one threshold per relative distance.
- --version
show program’s version number and exit
chicAggregateStatistic¶
chicAggregateStatistic is a preprocessing tool for chicDifferentialTest. It takes two consecutive viewpoint files and one target file and creates one file containing all locations which should be tested for differential interactions. Either one target file for two consecutive viewpoint files or one target file for all viewpoints is accepted.
An example usage is:
$ chicAggregateStatistic –interactionFile viewpoint1.txt viewpoint2.txt –targetFile targets.txt –outFileNameSuffix aggregated.txt
which will create a single output file: viewpoint1_viewpoint2_aggregated.txt
A second mode is the batch processing mode. For this you need a file containing the names of the viewpoint files (generated by chicViewpoint via –writeFileNamesToFile), a folder which contains the files, a target list file containing the name of all target files and a folder which contains the target files (created by chicSignificantInteractions):
$ chicAggregateStatistic –interactionFile viewpoint_names.txt –targetFile target_names.txt –interactionFileFolder viewpointFilesFolder –targetFileFolder targetFolder –batchMode –threads 20 –outFileNameSuffix aggregated.bed
If the –targetFileFolder flag is not set in batch mode, it is assumed the –targetFile should be used for all viewpoints.
usage: chicAggregateStatistic --interactionFile INTERACTIONFILE
[INTERACTIONFILE ...]
[--targetFile TARGETFILE [TARGETFILE ...]]
[--outFileNameSuffix OUTFILENAMESUFFIX]
[--interactionFileFolder INTERACTIONFILEFOLDER]
[--targetFileFolder TARGETFILEFOLDER]
[--outputFolder OUTPUTFOLDER]
[--writeFileNamesToFile WRITEFILENAMESTOFILE]
[--batchMode] [--threads THREADS] [--help]
[--version]
Required arguments¶
- --interactionFile, -if
path to the interaction files which should be used for aggregation of the statistics.
- --targetFile, -tf
path to the target files which contains the target regions to prepare data for differential analysis.
Optional arguments¶
- --outFileNameSuffix, -suffix
File name suffix to save the result (Default: “_aggregate_target.txt”).
Default: “_aggregate_target.txt”
- --interactionFileFolder, -iff
Folder where the interaction files are stored. Applies only for batch mode (Default: “.”).
Default: “.”
- --targetFileFolder, -tff
Folder where the target files are stored. Applies only for batch mode.
- --outputFolder, -o
Output folder containing the files (Default: “aggregatedFiles”).
Default: “aggregatedFiles”
- --writeFileNamesToFile, -w
(Default: “aggregatedFilesBatch.txt”).
Default: “aggregatedFilesBatch.txt”
- --batchMode, -bm
turns on batch mode. The files provided by –interactionFile and/or –targetFile contain a list of the files to be processed.
Default: False
- --threads, -t
Number of threads (uses the python multiprocessing module)ist (Default: 4).
Default: 4
- --version
show program’s version number and exit
chicDifferentialTest¶
chicDifferentialTest tests if two locations under consideration of the reference point have a different interaction count. For this either Fisher’s test or the chi2 contingency test can be used. The files that are accepted for this test can be created with chicAggregateStatistic. H0 assumes the interactions are not different. Therefore the differential interaction counts are all where H0 was rejected.
An example usage is:
$ chicDifferentialTest –interactionFile viewpoint1_aggregated.txt viewpoint2_aggregated.txt –alpha 0.05 –statisticTest fisher –outputFolder differentialResults
and this will create three files: viewpoint1_viewpoint2_aggregated_H0_accepted.txt, viewpoint1_viewpoint2_aggregated_H0_rejected.txt, viewpoint1_viewpoint2_aggregated_results.txt
The first file contains all locations where H0 was accepted, the second file all locations where H0 was rejected and the third one all locations with the test result.
A second mode is the batch processing mode. For this you need a file containing the names of the aggregated files (generated by chicAggregateStatistic via –writeFileNamesToFile and the batch mode):
$ chicDifferentialTest –statisticTest fisher –alpha 0.05 –interactionFile aggregatedFilesBatch.txt –interactionFileFolder aggregatedFilesFolder –batchMode –threads 20 –outputFolder differentialResults
This will create, as in the non-batch mode, three files per aggregated file and writes the file name to the file given by –rejectedFileNamesToFile. This last file can be used to plot the differential interactions per viewpoint in batch mode, using chicPlotViewpoint.
usage: chicDifferentialTest --interactionFile INTERACTIONFILE
[INTERACTIONFILE ...] --alpha ALPHA
[--interactionFileFolder INTERACTIONFILEFOLDER]
[--outputFolder OUTPUTFOLDER]
[--statisticTest {fisher,chi2}] [--batchMode]
[--threads THREADS]
[--rejectedFileNamesToFile REJECTEDFILENAMESTOFILE]
[--help] [--version]
Required arguments¶
- --interactionFile, -if
path to the interaction files which should be used for the differential test.
- --alpha, -a
define a significance level (alpha) for accepting samples
Optional arguments¶
- --interactionFileFolder, -iff
Folder where the interaction files are stored. Applies only for batch mode (Default: “.”).
Default: “.”
- --outputFolder, -o
Output folder of the files (Default: “differentialResults”).
Default: “differentialResults”
- --statisticTest
Possible choices: fisher, chi2
Type of test used: fisher’s exact test or chi2 contingency (Default: “fisher”).
Default: “fisher”
- --batchMode, -bm
turn on batch mode. The given file for –interactionFile and or –targetFile contain a list of the to be processed files.
Default: False
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --rejectedFileNamesToFile, -r
Writes the names of the rejected H0 (therefore containing the differential interactions) to file. Can be used for batch processing mode of chicPlotViewpoint. (Default: “rejected_H0.txt”).
Default: “rejected_H0.txt”
- --version
show program’s version number and exit
chicPlotViewpoint¶
chicPlotViewpoint plots one or many viewpoints with the average background model and the computed p-value per sample. In addition, it can highlight differential interactions of two samples and/or significant regions.
An example usage is:
$ chicPlotViewpoint –interactionFile viewpoint1.txt viewpoint2.txt –range 500000 500000 –backgroundModelFile background_model.txt –pValue –outFileName viewpoint1_2.png –dpi 300
In batch mode the list of file names and the folders containing the files need to be given:
$ chicPlotViewpoint –interactionFile viewpoint_names.txt -interactionFileFolder viewpointFilesFolder –differentialTestResult rejected_H0.txt –differentialTestResultsFolder differentialFolder –range 500000 500000 –backgroundModelFile background_model.txt –pValue –outputFolder plotsFOlder –dpi 300 –threads 20
usage: chicPlotViewpoint --interactionFile INTERACTIONFILE
[INTERACTIONFILE ...] --range RANGE RANGE
[--backgroundModelFile BACKGROUNDMODELFILE]
[--interactionFileFolder INTERACTIONFILEFOLDER]
[--differentialTestResult DIFFERENTIALTESTRESULT [DIFFERENTIALTESTRESULT ...]]
[--significantInteractionFileFolder SIGNIFICANTINTERACTIONFILEFOLDER]
[--differentialTestResultsFolder DIFFERENTIALTESTRESULTSFOLDER]
[--significantInteractions SIGNIFICANTINTERACTIONS [SIGNIFICANTINTERACTIONS ...]]
[--plotSignificantInteractions]
[--outputFolder OUTPUTFOLDER]
[--outputFormat OUTPUTFORMAT] [--dpi DPI]
[--binResolution BINRESOLUTION]
[--colorMapPvalue COLORMAPPVALUE]
[--maxPValue MAXPVALUE] [--minPValue MINPVALUE]
[--pValue]
[--pValueSignificanceLevels PVALUESIGNIFICANCELEVELS [PVALUESIGNIFICANCELEVELS ...]]
[--xFold XFOLD] [--truncateZeroPvalues]
[--outFileName OUTFILENAME] [--batchMode]
[--plotSampleNumber PLOTSAMPLENUMBER]
[--colorList COLORLIST [COLORLIST ...]]
[--threads THREADS] [--help] [--version]
Required arguments¶
- --interactionFile, -if
path to the interaction files which should be used for plotting
- --range
Defines the region upstream and downstream of a reference point which should be included. Format is –region upstream downstream, e.g.: –region 500000 500000 plots 500kb up- and 500kb downstream. This value should not exceed the range used in the other chic-tools.
Optional arguments¶
- --backgroundModelFile, -bmf
path to the background file which should be used for plotting
- --interactionFileFolder, -iff
Folder where the interaction files are stored. Applies only for batch mode (Default: “.”).
Default: “.”
- --differentialTestResult, -dif
Path to the H0 rejected files to highlight the regions in the plot.
- --significantInteractionFileFolder, -siff
Folder where the files with detected significant interactions are stored. Applies only for batch mode (Default: “.”).
Default: “.”
- --differentialTestResultsFolder, -diff
Folder where the H0 rejected files are stored. Applies only for batch mode (Default: “.”).
Default: “.”
- --significantInteractions, -si
Path to the files with detected significant interactions to highlight the regions in the plot.
- --plotSignificantInteractions, -psi
Highlights the significant interactions in the plot itself. If not set, only the p-values are updated
Default: False
- --outputFolder, -of
Output folder of the files (Default: “.”).
Default: “.”
- --outputFormat, -format
Output format of the plot (Default: “png”).
Default: “png”
- --dpi
Optional parameter: Resolution for the image, ifoutput is a raster graphics image (e.g png, jpg) (Default: 300).
Default: 300
- --binResolution, -r
Resolution of the bin in genomic units. Values are set as number of bases, e.g. 1000 for a 1kb, 5000 for a 5kb or 10000 for a 10kb resolution (Default: 1000).
Default: 1000
- --colorMapPvalue
Color map to use for the p-value. Available values can be seen here: http://matplotlib.org/examples/color/colormaps_reference.html (Default: “RdYlBu”).
Default: “RdYlBu”
- --maxPValue, -map
Maximal value for p-value. Values above this threshold are set to this value (Default: 0.1).
Default: 0.1
- --minPValue, -mp
Minimal value for p-value. Values below this threshold are set to this value (Default: 0.0).
Default: 0.0
- --pValue, -p
Plot p-values as a colorbar
Default: False
- --pValueSignificanceLevels, -psl
Highlight the p-values by the defined significance levels.
- --xFold, -xf
Plot x-fold region for the mean background.
- --truncateZeroPvalues, -tzpv
Sets all p-values which are equal to zero to one.
Default: False
- --outFileName, -o
File name to save the image. Not used in batch mode.
- --batchMode, -bm
The given file for –interactionFile and or –targetFile contain a list of the to be processed files.
Default: False
- --plotSampleNumber, -psn
Number of samples per plot. Applies only in batch mode (Default: 2).
Default: 2
- --colorList, -cl
Colorlist for the viewpoint lines (Default g b c m y k).
Default: [‘g’, ‘b’, ‘c’, ‘m’, ‘y’, ‘k’]
- --threads, -t
Number of threads (uses the python multiprocessing module) (Default: 4).
Default: 4
- --version
show program’s version number and exit
For single-cell Hi-C data analysis please use scHiCExplorer .
tool |
type |
input files |
main output file(s) |
application |
---|---|---|---|---|
preprocessing |
1 genome FASTA file |
bed file with restriction site coordinates |
Identifies the genomic locations of restriction sites |
|
preprocessing |
2 BAM/SAM files |
hicMatrix object |
Creates a Hi-C matrix using the aligned BAM files of the Hi-C sequencing reads |
|
preprocessing |
hicMatrix object |
normalized hicMatrix object |
Uses iterative correction or Knight-Ruiz to remove biases from a Hi-C matrix |
|
preprocessing |
hicMatrix object |
hicMatrix object |
Merges consecutive bins on a Hi-C matrix to reduce resolution |
|
preprocessing |
2 or more hicMatrix objects |
hicMatrix object |
Adds Hi-C matrices of the same size |
|
preprocessing |
multiple Hi-C matrices |
multiple Hi-C matrices |
Normalize data to 0 to 1 range or to smallest total read count |
|
analysis |
2 or more hicMatrix objects |
a heatmap/scatter plot |
Computes and visualizes the correlation of Hi-C matrices |
|
analysis |
hicMatrix object |
bedGraph file (TAD score), a boundaries.bed file, a domains.bed file (TADs) |
Identifies Topologically Associating Domains (TADs) |
|
analysis |
multiple TAD domain files |
tad domain file with merged tad locations multiple files with plotted TAD relations |
Merges detect TADs locations of different resolutions; hierarchical relation between TADs as multiple plots |
|
analysis |
two Hi-C matrices one TAD domain file |
two diff_tad files: accepted H0 and rejected H0. Similar to BED6 |
Identifies differential Topologically Associating Domains (TADs) between two Hi-C matrices |
|
visualization |
hicMatrix object |
a heatmap of Hi-C contacts |
Plots a Hi-C matrix as a heatmap |
|
visualization |
hicMatrix object, a config file |
Hi-C contacts on a given region, along with other provided signal (bigWig) or regions (bed) file |
Plots TADs as a track that can be combined with other tracks (genes, signal, interactions) |
|
visualization |
hicMatrix object |
log log plot of Hi-C contacts per distance |
Quality control |
|
data integration |
one/multiple Hi-C file formats |
Hi-C matrices/outputs in several formats |
Convert matrix to different formats |
|
data integration |
one Hi-C file formats |
Hi-C matrix |
Removes, masks or keeps specified regions of a matrix |
|
information |
one or more hicMatrix objects |
Screen info |
Prints information about matrices, like size, maximum, minimum, bin size, etc. |
|
analysis |
one Hi-C matrix |
bedgraph or bigwig file(s) for each eigenvector |
Computes for A / B compartments the eigenvectors |
|
analysis |
one Hi-C matrix |
Hi-C matrix |
Computes a obs_exp matrix like Lieberman-Aiden (2009), a pearson correlation matrix and or a covariance matrix. These matrices can be used for plotting. |
|
visualization |
one Hi-C matrix |
A viewpoint plot |
A plot with the interactions around a reference point or region. |
|
information |
log files from hicBuildMatrix |
A quality control report |
Quality control of the created contact matrix. |
|
information |
2 BAM/SAM files |
An estimated quality control report |
Estimated quality report of the Hi-C data. |
|
analysis |
two Hi-C matrices |
one Hi-C matrix |
Applies diff, ratio or log2ratio on matrices to compare them. |
|
analysis |
multiple Hi-C matrices |
one npz object |
Averages the given locations. Visualization with hicPlotAverageRegions |
|
analysis |
one Hi-C matrices |
bedgraph file with loop locations |
Detects enriched regions. Visualization with hicPlotmatrix and –loop parameter. |
|
analysis |
one loop, one protein peak file |
bedgraph file with matched loop locations, one file with loop / protein statistics |
Matches loop locations with protein peak positions |
|
analysis |
multiple loop files |
bedgraph file with merged loop locations |
Merges detect loop locations of different resolutions |
|
analysis |
one Hi-C matrix, protein peaks |
best parameter setting |
Search for best parameter setting for hicDetectLoops |
|
analysis |
one Hi-C matrix, protein peaks |
best parameter setting |
Search for best parameter setting for Juicer’s HiCCUPS |
|
visualization |
one Hi-C interaction matrix one PCA bedgraph file |
one image polarization plot |
The global compartmentalization signal. |
|
visualization |
one npz file |
one image |
Visualization of hicAverageRegions. |
|
analysis |
one / multiple Hi-C matrices |
one image, p-values file, raw data file |
Computes short/long range contacts; a box plot, a p-value and raw data file |
|
preprocessing |
one Hi-C matrix, one BED file |
one Hi-C matrix |
Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix. |
|
preprocessing |
Hi-C matrices reference point BED file |
two plots accepted reference point BED file |
Checks for sparsity of viewpoints and removes them if too sparse. |
|
preprocessing |
Hi-C matrices reference point BED file |
background model file |
Creates a background model for all given samples and reference points. |
|
preprocessing |
Hi-C matrices background model file |
viewpoint file(s) |
Creates per sample per viewpoint one viewpoint file. |
|
preprocessing analysis |
viewpoint file(s) background model file |
significant interaction file(s) target file(s) |
Detects significant interactions per viewpoint based on the background and neighborhood merging via x-fold and loose p-values. |
|
preprocessing |
viewpoint files(s) target file (s) |
aggregated file(s) for differential test |
Aggregates for one viewpoint of two samples via a target file the locations to test for differential interactions. |
|
analysis |
aggregated file(s) of two samples |
H0_accepted-, H0_rejected-files |
Tests with chi2 or fisher for differential interactions of two samples. |
|
visualization |
viewpoint file(s) differential expression file(s) significant interactions file(s) |
one image per viewpoint |
Visualization of a viewpoint. |
General principles¶
A typical HiCExplorer command could look like this:
$ hicPlotMatrix -m myHiCmatrix.h5 \
-o myHiCmatrix.pdf \
--clearMaskedBins \
--region chrX:10,000,000-15,000,000 \
--vMin -4 --vMax 4 \
You can always see all available command-line options via –help:
$ hicPlotMatrix --help
Output format of plots should be indicated by the file ending, e.g.
MyPlot.pdf
will return a pdf file,MyPlot.png
a png-file.Most of the tools that produce plots can also output the underlying data - this can be useful in cases where you don’t like the HiCExplorer visualization, as you can then use the data matrices produced by deepTools with your favorite plotting tool, such as R.
The vast majority of command line options are also available in Galaxy (in a few cases with minor changes to their naming).
Example usage¶
Hi-C analysis of mouse ESCs using HiCExplorer¶
The following example shows how we can use HiCExplorer to analyze a published dataset. Here we are using a Hi-C dataset from Marks et. al. 2015, on mouse ESCs.
Protocol¶
The collection of the cells for Hi-C and the Hi-C sample preparation procedure was performed as previously described Lieberman-Aiden et al., with the slight modification that DpnII was used as restriction enzyme during initial digestion. Paired-end libraries were prepared according to Lieberman-Aiden et al. and sequenced on the NextSeq 500 platform using 2 × 75 bp sequencing.
Prepare for analysis¶
Download Raw fastq files¶
The fastq files can be downloaded from the EBI archive (or NCBI archive). We will store the files in the directory original_data.
mkdir original_data
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/007/SRR1956527/SRR1956527_1.fastq.gz -O original_data/SRR1956527_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/007/SRR1956527/SRR1956527_2.fastq.gz -O original_data/SRR1956527_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/008/SRR1956528/SRR1956528_1.fastq.gz -O original_data/SRR1956528_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/008/SRR1956528/SRR1956528_2.fastq.gz -O original_data/SRR1956528_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/009/SRR1956529/SRR1956529_1.fastq.gz -O original_data/SRR1956529_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/009/SRR1956529/SRR1956529_2.fastq.gz -O original_data/SRR1956529_2.fastq.gz
Create an index¶
We start with creating an index for our alignment software for the GRCm38/mm10 genome. As a source we use the mm10 genome from UCSC
mkdir genome_mm10
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz -O genome_mm10/chromFa.tar.gz
tar -xvzf genome_mm10/chromFa.tar.gz
cat genome_mm10/*.fa > genome_mm10/mm10.fa
We have the mm10 genome stored in one fasta file and can build the index. We tried it successfully with hisat2, bowtie2 and bwa. Run the mapping with one of them and do not mix them!
hisat2-build -p 8 genome_mm10/mm10.fa hisat2/mm10_index
You can find more information about hisat
Mapping the RAW files¶
Mates have to be mapped individually to avoid mapper specific heuristics designed for standard paired-end libraries.
It is important to have in mind for the different mappers:
for either bowtie2 or hisat2 use the –reorder parameter which tells bowtie2 or hisat2 to output the sam files in the exact same order as in the .fastq files.
use local mapping, in contrast to end-to-end. A fraction of Hi-C reads are chimeric and will not map end-to-end thus, local mapping is important to increase the number of mapped reads.
Tune the aligner parameters to penalize deletions and insertions. This is important to avoid aligned reads with gaps if they happen to be chimeric.
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956527_1.fastq.gz --reorder | samtools view -Shb - > SRR1956527_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956527_2.fastq.gz --reorder | samtools view -Shb - > SRR1956527_2.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956528_1.fastq.gz --reorder | samtools view -Shb - > SRR1956528_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956528_2.fastq.gz --reorder | samtools view -Shb - > SRR1956528_2.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956529_1.fastq.gz --reorder | samtools view -Shb - > SRR1956529_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U ../original_data/SRR1956529_2.fastq.gz --reorder | samtools view -Shb - > SRR1956529_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956527_1.fastq.gz --reorder | samtools view -Shb - > SRR1956527_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956527_2.fastq.gz --reorder | samtools view -Shb - > SRR1956527_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956528_1.fastq.gz --reorder | samtools view -Shb - > SRR1956528_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956528_2.fastq.gz --reorder | samtools view -Shb - > SRR1956528_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956529_1.fastq.gz --reorder | samtools view -Shb - > SRR1956529_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U ../original_data/SRR1956529_2.fastq.gz --reorder | samtools view -Shb - > SRR1956529_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956527_1.fastq.gz | samtools view -Shb - > SRR1956527_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956527_2.fastq.gz | samtools view -Shb - > SRR1956527_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956528_1.fastq.gz | samtools view -Shb - > SRR1956528_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956528_2.fastq.gz | samtools view -Shb - > SRR1956528_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956529_1.fastq.gz | samtools view -Shb - > SRR1956529_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956529_2.fastq.gz | samtools view -Shb - > SRR1956529_2.bam
Build, visualize and correct Hi-C matrix¶
Create a Hi-C matrix using the aligned files¶
In the following we will create three Hi-C matrices and merge them to one.
hicBuildMatrix builds the matrix of read counts over the bins in the genome, considering the sites around the given restriction site. We need to provide:
the input BAM/SAM files: –samFiles SRR1956527_1.sam SRR1956527_2.sam
binsize: –binSize 1000
restriction sequence: –restrictionSequence GATC
dangling end sequence: –danglingSequence GATC
the restriction sequence cut site file: create it with hicFindRestSite, the restriction sequence ‘GATC’ and the above used mouse genome fasta file
the name of output bam file which contains the accepted alignments: –outBam SRR1956527_ref.bam
name of output matrix file: –outFileName hicMatrix/SRR1956527_10kb.h5
the folder for the quality report: –QCfolder hicMatrix/SRR1956527_QC
the number of to be used threads. Minimum value is 3: –threads 8
the buffer size for each thread buffering inputBufferSize lines of each input BAM/SAM file: –inputBufferSize 400000
To build the Hi-C matrices:
mkdir hicMatrix
hicBuildMatrix --samFiles SRR1956527_1.bam SRR1956527_2.bam --binSize 10000 --restrictionSequence GATC --danglingSequence GATC --restrictionCutFile cut_sites.bed --outBam SRR1956527_ref.bam --outFileName hicMatrix/SRR1956527_10kb.h5 --QCfolder hicMatrix/SRR1956527_10kb_QC --threads 8 --inputBufferSize 400000
hicBuildMatrix --samFiles SRR1956528_1.bam SRR1956528_2.bam --binSize 10000 --restrictionSequence GATC --danglingSequence GATC --restrictionCutFile cut_sites.bed --outBam SRR1956528_ref.bam --outFileName hicMatrix/SRR1956528_10kb.h5 --QCfolder hicMatrix/SRR1956528_10kb_QC --threads 8 --inputBufferSize 400000
hicBuildMatrix --samFiles SRR1956529_1.bam SRR1956529_2.bam --binSize 10000 --restrictionSequence GATC --danglingSequence GATC --restrictionCutFile cut_sites.bed --outBam SRR1956529_ref.bam --outFileName hicMatrix/SRR1956529_10kb.h5 --QCfolder hicMatrix/SRR1956529_10kb_QC --threads 8 --inputBufferSize 400000
The output bam files show that we have around 34M, 54M and 58M selected reads for SRR1956527, SRR1956528 & SRR1956529, respectively. Normally 25% of the total reads are selected. The output matrices have counts for the genomic regions. The extension of output matrix files is .h5.
A quality report is created in e.g. hicMatrix/SRR1956527_10kb_QC, have a look at the report hicQC.html.

A segment of Hi-C quality report.¶
To increase the depth of reads we merge the counts from these three replicates.
hicSumMatrices --matrices hicMatrix/SRR1956527_10kb.h5 hicMatrix/SRR1956528_10kb.h5 \
hicMatrix/SRR1956529_10kb.h5 --outFileName hicMatrix/replicateMerged_10kb.h5
Plot Hi-C matrix¶
A 10kb bin matrix is quite large to plot and is better to reduce the resolution (to know the size of a Hi-C matrix use the tool hicInfo), i.e. we usually run out of memory for a 1 kb or a 10 kb matrix and second, the time to plot is very long (minutes instead of seconds). For this we use the tool hicMergeMatrixBins.
hicMergeMatrixBins merges the bins into larger bins of given number (specified by –numBins). We will merge 1000 bins in the original (uncorrected) matrix and then correct it. The new bin size is going to be 10.000 bp * 100 = 1.000.000 bp = 1 Mb
hicMergeMatrixBins \
--matrix hicMatrix/replicateMerged_10kb.h5 --numBins 100 \
--outFileName hicMatrix/replicateMerged.100bins.h5
hicPlotMatrix can plot the merged matrix. We use the following options:
the matrix to plot: –matrix hicMatrix/replicateMerged.100bins.h5
logarithmic values for plotting: –log1p
the resolution of the plot: –dpi 300
masked bins should not be plotted: –clearMaskedBins
the order of the chromosomes in the plot: –chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY
the color map: –colorMap jet
the title of the plot: –title “Hi-C matrix for mESC”
the plot image itself: –outFileName plots/plot_1Mb_matrix.png
mkdir plots
hicPlotMatrix \
--matrix hicMatrix/replicateMerged.100bins.h5 \
--log1p \
--dpi 300 \
--clearMaskedBins \
--chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--colorMap jet \
--title "Hi-C matrix for mESC" \
--outFileName plots/plot_1Mb_matrix.png

The Hi-C interaction matrix with a resolution of 1 MB.¶
hicCorrectMatrix corrects the matrix counts in an iterative manner. For correcting the matrix, it’s important to remove the unassembled scaffolds (e.g. NT_) and keep only chromosomes, as scaffolds create problems with matrix correction. Therefore we use the chromosome names (1-19, X, Y) here. Important: Use ‘chr1 chr2 chr3 etc.’ if your genome index uses chromosome names with the ‘chr’ prefix.
Matrix correction works in two steps: first a histogram containing the sum of contact per bin (row sum) is produced. This plot needs to be inspected to decide the best threshold for removing bins with lower number of reads. The second steps removes the low scoring bins and does the correction.
In the following we will use a matrix with a bin size of 20 kb: 10kb * 2 = 20 kb
hicMergeMatrixBins \
--matrix hicMatrix/replicateMerged_10kb.h5 --numBins 2 \
--outFileName hicMatrix/replicateMerged.matrix_20kb.h5
(1-19, X, Y) variant:
hicCorrectMatrix diagnostic_plot \
--chromosomes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 --plotName hicMatrix/diagnostic_plot.png
(chr1-ch19, chrX, chrY) variant:
hicCorrectMatrix diagnostic_plot \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 --plotName hicMatrix/diagnostic_plot.png

Diagnostic plot for the Hi-C matrix at a resolution of 20 kb¶
The output of the program prints a threshold suggestion that is usually accurate but is better to revise the histogram plot. The threshold is visualized in the plot as a black vertical line. See Example usage for an example and for more info.
- The threshold parameter needs two values:
low z-score
high z-score
“The absolute value of z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.” (Source). For more information see wikipedia.
The z-score definition.¶
In our case the distribution describes the counts per bin of a genomic distance. To remove all bins with a z-score threshold less / more than X means to remove all bins which have less / more counts than X of mean of their specific distribution in units of the standard deviation.
Looking at the above distribution, we can select the value of -2 (lower end) and 3 (upper end) to remove. This is given by the –filterThreshold option in hicCorrectMatrix.
(1-19, X, Y) variant:
hicCorrectMatrix correct \
--chromosomes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 \
--filterThreshold -2 3 --perchr --outFileName hicMatrix/replicateMerged.Corrected_20kb.h5
(chr1-ch19, chrX, chrY) variant:
hicCorrectMatrix correct \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 \
--filterThreshold -2 3 --perchr --outFileName hicMatrix/replicateMerged.Corrected_20kb.h5
It can happen that the correction stops with:
`ERROR:iterative correction:*Error* matrix correction produced extremely large values.
This is often caused by bins of low counts. Use a more stringent filtering of bins.`
This can be solved by a more stringent z-score values for the filter threshold or by a look at the plotted matrix. In our case we see that chromosome Y is having more or less 0 counts in its bins. This chromosome can be excluded from the correction by not defining it for the set of chromosomes that should be corrected, parameter –chromosomes.
In the case of multiple samples / replicates that need to be normalized to the same read coverage we recommend to compute first the normalization (with hicNormalize) and correct the data (with hicCorrectMatrix) in a second step.
We can now plot the one of the chromosomes (e.g. chromosome X) , with the corrected matrix.
- New parameter:
The region to plot: –region chrX:10000000-2000000 or –region chrX
(1-19, X, Y) variant:
hicPlotMatrix \
--log1p --dpi 300 \
-matrix hicMatrix/replicateMerged.Corrected_20kb.npz \
--region X --title "Corrected Hi-C matrix for mESC : chrX" \
--outFileName plots/replicateMerged_Corrected-20kb_plot-chrX.png
(chr1-ch19, chrX, chrY) variant:
hicPlotMatrix \
--log1p --dpi 300 \
--matrix hicMatrix/replicateMerged.Corrected_20kb.npz \
--region chrX --title "Corrected Hi-C matrix for mESC : chrX" \
--outFileName plots/replicateMerged_Corrected-20kb_plot-chrX.png

The Hi-C interaction matrix for chromosome X.¶
Plot TADs¶
“The partitioning of chromosomes into topologically associating domains (TADs) is an emerging concept that is reshaping our understanding of gene regulation in the context of physical organization of the genome” [Ramirez et al. 2017].
Find TADs¶
TAD calling works in two steps: First HiCExplorer computes a TAD-separation score based on a z-score matrix for all bins. Then those bins having a local minimum of the TAD-separation score are evaluated with respect to the surrounding bins to decide assign a p-value. Then a cutoff is applied to select the bins more likely to be TAD boundaries.
hicFindTADs tries to identify sensible parameters but those can be change to identify more stringent set of boundaries.
mkdir TADs
hicFindTADs --matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--minDepth 60000 --maxDepth 120000 --numberOfProcessors 8 --step 20000 \
--outPrefix TADs/marks_et-al_TADs_20kb-Bins --minBoundaryDistance 80000 \
--correctForMultipleTesting fdr --threshold 0.05
As an output we get the boundaries, domains and scores separated files. We will use in the plot below only the TAD-score file.
Build Tracks File¶
We can plot the TADs for a given chromosomal region. For this we need to create a track file containing the instructions to build the plot. The hicPlotTADs documentation contains the instructions to build the track file.
In following plot we will use the listed track file. Please store it as track.ini.
[hic]
file = hicMatrix/replicateMerged.Corrected_20kb.h5
title = HiC mESC chrX:99974316-101359967
colormap = RdYlBu_r
depth = 2000000
height = 7
transform = log1p
file_type = hic_matrix
[tads]
file = TADs/marks_et-al_TADs_20kb-Bins_domains.bed
file_type = domains
border_color = black
color = none
line_width = 1.5
overlay_previous = share-y
show_data_range = no
[x-axis]
fontsize = 16
where = top
[tad score]
file = TADs/marks_et-al_TADs_20kb-Bins_score.bm
title = TAD separation score
height = 4
file_type = bedgraph_matrix
[spacer]
[gene track]
file = mm10_genes_sorted.bed
height = 10
title = mm10 genes
labels = false
We used as a gene track mm10 genes and sorted with sortBed from bedtools.
Plot¶
We plot the result with:
(1-19, X, Y) variant:
hicPlotTADs --tracks track.ini --region X:98000000-105000000 \
--dpi 300 --outFileName plots/marks_et-al_TADs.png \
--title "Marks et. al. TADs on X"
(chr1-ch19, chrX, chrY) variant:
hicPlotTADs --tracks track.ini --region chrX:98000000-105000000 \
--dpi 300 --outFileName plots/marks_et-al_TADs.png \
--title "Marks et. al. TADs on X"
The result is:

TADplot¶
Importing and Exporting HiCExplorer data¶
Exporting HiCExplorer output to Bioconductor¶
It’s possible to export Hi-C Matrices produced by HiCExplorer to bioconductor in R, which allows us to use existing bioconductor infrastructure for differential Hi-C analysis. The tool hicConvertFormat allows us to write Hi-C matrices in a format that can easily be imported in bioconductor as GInteractions object. Below is an example.
## Assuming HiCExplorer is installed in ~/programs
hicConvertFormat --matrix ~/programs/HiCExplorer/test/test_data/Li_et_al_2015.h5 \ --inputFormat h5
-o GInteration_example --outputFormat GInteractions
The output file is in tsv format. It looks like this :
V1 |
V2 |
V3 |
V4 |
V5 |
V6 |
V7 |
---|---|---|---|---|---|---|
X |
19537 |
20701 |
X |
19537 |
20701 |
1054.47483 |
X |
19537 |
20701 |
X |
20701 |
22321 |
375.86990 |
X |
19537 |
20701 |
X |
22321 |
24083 |
222.53900 |
X |
19537 |
20701 |
X |
24083 |
25983 |
114.26340 |
X |
19537 |
20701 |
X |
25983 |
27619 |
95.87463 |
This file can now be loaded into R as a GInteractions object, as shown below :
## INSIDE R
library(GenomicRanges)
library(InteractionSet)
hic <- read.delim("GInteraction_example.tsv", header = FALSE)
# Converting data.frame to GInteraction
convertToGI <- function(df){
row.regions <- GRanges(df$V1, IRanges(df$V2,df$V3))# interaction start
col.regions <- GRanges(df$V4, IRanges(df$V5,df$V6))# interaction end
gi <- GInteractions(row.regions, col.regions)
gi$norm.freq <- df$V7 # Interaction frequencies
return(gi)
}
}
hic.gi <- convertToGI(hic)
Multiple files can be loaded, and converted to an InteractionSet object. If you have prepared matrices using binning, the intervals in the matrices must be the same. Therefore it’s easy to merge these matrices together in an InteractionSet object. In case some bins don’t match, we can merge the GInteraction objects based on matching bins, as follows.
# assuming hic.gi is a list of two GInteration objects hic.gi1 and hic.gi2
# hic.gi <- list(hic.gi1, hic.gi2)
# Get common regions between the two objects
combined <- unique(c(hic.gi$hic.gi1, hic.gi$hic.gi2))
# replace original regions with the common regions
replaceRegions(hic.gi$hic.gi1) <- regions(combined)
replaceRegions(hic.gi$hic.gi2) <- regions(combined)
# Get the matching indexes between the two objects
matched <- lapply(hic.gi, function(x) {
match(x, combined)
})
# Create a count matrix (for interaction frequencies)
counts <- matrix(0, ncol = 2, nrow=length(combined)) # counts for unmatched bins set to zero
# fill in the counts for matched bins
counts[matched$hic.gi1,1] <- hic.gi$hic.gi1$norm.freq
counts[matched$hic.gi2,2] <- hic.gi$hic.gi2$norm.freq
# Finally, create the InteractionSet object
iset <- InteractionSet(counts, combined)
InteractionSet objects can be used for packages like diffHic, for differential Hi-C analysis.
For more information on working with GInteraction and InteractionSet objects in bioconductor check out this vignette.
Captured Hi-C data analysis¶
How we use HiCExplorer to analyse cHi-C data¶
This How-to is based on the published dataset from Andrey et al. 2017. For the tutorial, we use the samples FL-E13.5 and MB-E-10.5.
Download the raw data¶
Please download the raw data via the following links or via NCBI GSE84795 .
Dataset |
forward |
reverse |
---|---|---|
CC-FL-E135-Wt-Mm-Rep1 |
||
CC-FL-E135-Wt-Mm-Rep2 |
||
CC-MB-E105-Wt-Mm-Rep1 |
||
CC-MB-E105-Wt-Mm-Rep2 |
Mapping¶
Map the files with a mapper of your choice against the mm9 reference genome; as an example, the mapping with bowtie2 is shown.
bowtie2 -x mm9_index --threads 8 -U SRR3950565_1.fastq.gz --reorder | samtools view -Shb - > SRR3950565_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950565_2.fastq.gz --reorder | samtools view -Shb - > SRR3950565_2.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950566_1.fastq.gz --reorder | samtools view -Shb - > SRR3950566_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950566_2.fastq.gz --reorder | samtools view -Shb - > SRR3950566_2.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950559_1.fastq.gz --reorder | samtools view -Shb - > SRR3950559_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950559_2.fastq.gz --reorder | samtools view -Shb - > SRR3950559_2.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950560_1.fastq.gz --reorder | samtools view -Shb - > SRR3950560_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950560_2.fastq.gz --reorder | samtools view -Shb - > SRR3950560_2.bam
Create cHi-C matrices¶
To create a cHi-C matrix we use HiCExplorer’s hicBuildMatrix for each replicate separately and merge the replicates into a single matrix later. Like Andrey et al., we use a resolution of 1kb and use the restriction enzyme DpnII.
hicBuildMatrix --samFiles SRR3950565_1.bam SRR3950565_2.bam --binSize 1000 --restrictionSequence GATC --outFileName SRR3950565.cool --QCfolder SRR3950565_QC --threads 6
hicBuildMatrix --samFiles SRR3950566_1.bam SRR3950566_2.bam --binSize 1000 --restrictionSequence GATC --outFileName SRR3950566.cool --QCfolder SRR3950566_QC --threads 6
hicBuildMatrix --samFiles SRR3950559_1.bam SRR3950559_2.bam --binSize 1000 --restrictionSequence GATC --outFileName SRR3950559.cool --QCfolder SRR3950559_QC --threads 6
hicBuildMatrix --samFiles SRR3950560_1.bam SRR3950560_2.bam --binSize 1000 --restrictionSequence GATC --outFileName SRR3950560.cool --QCfolder SRR3950560_QC --threads 6
hicSumMatrix --matrices SRR3950565.cool SRR3950566.cool --outFileName FL-E13-5.cool
hicSumMatrix --matrices SRR3950559.cool SRR3950560.cool --outFileName MB-E10-5.cool
Terminology: Reference point vs viewpoint¶
A reference point is one single genomic position i.e. chr1 500 510 is a reference point. A viewpoint is in contrast the region defined by the reference point and the up and downstream range, i.e. range 100 and reference point chr1 50 70 leads to the viewpoint chr1 400 610.
Creation of reference point file¶
Andrey et al. state that they used a total of 460 reference points, but that 24 were removed due to low sequence coverage or non-correspondence to a promoter region, leading to 446 in total.
To reproduce this, we need all reference points which are published in Supplementary Table S2 and S8.
It is simplest to create the reference point file in the following format using Excel and store it as a tab separated file:
chr1 4487435 4487435 Sox17
Otherwise, just download the prepared file. We will do the quality control on our own and compare with the results of Andrey et al.
Quality control¶
As a first step we compute the quality of each viewpoint by considering the sparsity. As soon as one viewpoint in one sample is less than the user-defined threshold (–sparsity), the reference point is no longer considered.
chicQualityControl -m FL-E13-5.cool MB-E10-5.cool -rp reference_points.bed --sparsity 0.025 --threads 20
The quality control creates five files: two plots showing the sparsity structure of the samples and three files containing the accepted reference points, the rejected ones and one file with all viewpoints and their sparsity level per sample.
In our example the plots look like the following:


The first plot shows the sparsity per sample for each viewpoint, while the second one shows the sparsity distribution as a histogram. It can be seen quite clearly that only a minority of the samples are really sparse and therefore need to be removed. The red line indicates the chosen sparsity level.
The reference point Tdap2b at chr1 19198995, which has a sparsity of 0.018 in FL-E13-5 and 0.016 in MB-E10-5, is considered to be of bad quality. To confirm this result we plot the viewpoint:

The plot shows there are effectively no interactions except with the reference point itself and confirm the point should be removed from the data.
The result of the quality control rejected 71 reference points as too sparse, but surprisingly the viewpoints rejected by Andrey et al. are accepted. An explanation for this could be that we only consider two samples and not all samples used by Andrey, and therefore we missed the bad quality of some viewpoints.
Please consider that this bad viewpoint was selected arbitrary out of the sample data and is only an example.
Download the data: Filtered reference points, Quality control raw data and rejected reference points.
Background model¶
The background model computes all viewpoints given by the reference points for both samples in a range defined by the parameter fixateRange. We recommend setting it to 500kb because real interactions above the range are rarely observed and very low interaction numbers such as 1 are already considered to be significant. With this setting, only the interactions in a range 500kb up- and downstream of the reference point are considered for each viewpoint. Based on this data, two background models are computed; the first one computes the average per relative distance to the reference point, and secondly, a negative binomial distribution per relative distance to the reference point is fitted. This first model is used for filtering in the significant interaction evaluation by an x-fold factor and for plotting. The negative binomial model is more important; it is used to compute a p-value per relative distance in each sample, which is used to make the decision if an interaction is considered as significant.
chicViewpointBackgroundModel -m FL-E13-5.cool MB-E10-5.cool --fixateRange 500000 -t 20 -rp reference_points.bed -o background_model.txt
The background model looks like this:
Relative position size nbinom prob nbinom max value mean value
-500000 75.895607451213 0.998528939430 2.333333333333 0.000101543771
-499000 90.348171762247 0.998725799952 2.750000000000 0.000104681360
-498000 78.512621775755 0.998514111424 2.800000000000 0.000106107536
-497000 75.706478185610 0.998327784087 3.800000000000 0.000116147819
You can download the background model.
Viewpoint computation¶
In this step the viewpoints for each reference point listed in a reference_points.bed-file is extracted from the interaction matrix, using the quality controlled file created by chicQualityControl. The up- and downstream range can be given via –range upstream downstream. Please use the same value for –fixateRange as in the background model computation. For each relative distance the x-fold over the average value of this relative distance is computed and each location is assigned a p-value based on the background negative binomial distribution for this relative distance. For each viewpoint one viewpoint file is created and stored in the folder given by the parameter –outputFolder.
chicViewpoint -m FL-E13-5.cool MB-E10-5.cool --averageContactBin 5 --range 1000000 1000000 -rp referencePoints.bed -bmf background_model.txt --writeFileNamesToFile interactionFiles.txt --outputFolder interactionFilesFolder --fixateRange 500000 --threads 20
The name of each viewpoint file starts with its sample name (given by the name of the matrix), the exact location and the gene / promoter name. For example, the viewpoint chr1 4487435 4487435 Sox17 from MB-E10-5.cool matrix will be called MB-E10-5_chr1_4487435_4487435_Sox17.txt and looks like the following:
# Interaction file, created with HiCExplorer's chicViewpoint version 3.2
# MB-E10-5.cool chr1_4487435_4487435 3.49 Mbp 5.49 Mbp Sox17 Sum of interactions in fixate range: 978.0
# Chromosome Start End Gene Sum of interactions Relative position Relative Interactions p-value x-fold Raw
#
chr1 3487000 3488000 Sox17 978.0 -1000000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3488000 3489000 Sox17 978.0 -999000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3489000 3490000 Sox17 978.0 -998000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3490000 3491000 Sox17 978.0 -997000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3491000 3492000 Sox17 978.0 -996000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3492000 3493000 Sox17 978.0 -995000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3493000 3494000 Sox17 978.0 -994000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3494000 3495000 Sox17 978.0 -993000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3495000 3496000 Sox17 978.0 -992000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
chr1 3496000 3497000 Sox17 978.0 -991000 0.000000000000 0.894286365313 0.000000000000 0.000000000000
Each file contains a header with information about the HiCExplorer version used, the sample, the viewpoint and the content of the different columns.
If the parameter –writeFileNamesToFile is set, the viewpoint file names are written to a file which can be used for batch processing in the later steps. Each sample is combined with every other sample for each viewpoint to enable the batch processing for the differential analysis. Example: matrices FL-E13-5.cool and MB-E10-5.cool with the three reference points:
FL-E13-5_chr1_4487435_4487435_Sox17.txt
MB-E10-5_chr1_4487435_4487435_Sox17.txt
FL-E13-5_chr1_14300280_14300280_Eya1.txt
MB-E10-5_chr1_14300280_14300280_Eya1.txt
FL-E13-5_chr1_19093103_19093103_Tfap2d.txt
MB-E10-5_chr1_19093103_19093103_Tfap2d.txt
Significant interactions detection¶
To detect significant interactions and to prepare a target file for each viewpoint which will be used for the differential analysis, the script chicSignificantInteractions is used. It offers two modes: either the user can specify an x-fold value or a loose p-value. The first one considers all interactions with a minimum x-fold over the average background for its relative distribution as a candidate or secondly, all interactions with a loose p-value or less are considered. These are pre-selection steps to be able to detect wider peaks in the same way as sharp ones. All detected candidates are merged to one peak if they are direct neighbors, and the sum of all interaction values of this neighborhood is used to compute a new p-value. The p-value is computed based on the relative distance negative binomial distribution of the interaction with the original highest interaction value. All peaks considered are accepted as significant interactions if their p-value is as small as the threshold –pvalue.
To exclude interactions with an interaction value smaller than desired the parameter –peakInteractionsThreshold can be set.
In this example we use the reference point Mstn at location chr1 53118507, a loose p-value of 0.1 and p-value of 0.01:
chicSignificantInteractions --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt -bmf background_model.txt --range 1000000 1000000 --pValue 0.01 --loosePValue 0.1
This creates two files:
FL-E13-5_chr1_53118507_53118507_Mstn_target.txt
FL-E13-5_chr1_53118507_53118507_Mstn__significant_interactions.txt
These files are stored in the folders given by the parameters –targetFolder and –outputFolder.
The significant interaction files looks like the following:
# FL-E13-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1517.0
#Chromosome Start End Gene Sum of interactions Relative position Relative interactions p-value x-fold Raw target
chr1 53318000 53321000 Mstn 1517.0 200000 0.00395517468600000040 0.00000145009991170397 8.37043994897500098773 6.00000000000000000000
chr1 53329000 53334000 Mstn 1517.0 212000 0.01081081081000000166 0.00000000000000188738 22.37661518795599846499 16.39999999999999857891
chr1 53348000 53350000 Mstn 1517.0 231000 0.00329597890600000004 0.00001463968364323609 7.37204640642899988734 5.00000000000000000000
chr1 53351000 53359000 Mstn 1517.0 239000 0.01437046802899999941 0.00000000000000099920 34.20972383882499912033 21.80000000000000071054
chr1 53393000 53401000 Mstn 1517.0 278000 0.01793012524599999977 0.00000000000000044409 48.20518935066399990319 27.19999999999999928946
chr1 53408000 53420000 Mstn 1517.0 294000 0.02307185234000000418 0.00000000000001743050 68.05162660125500906361 35.00000000000000000000
The target file looks like:
# Significant interactions result file of HiCExplorer's chicSignificantInteractions version 3.2-dev
# targetFolder/FL-E13-5_chr1_53118507_53118507_Mstn_target.txt
# Mode: loose p-value with 0.1
# Used p-value: 0.01
#
chr1 53318000 53321000
chr1 53329000 53334000
chr1 53348000 53359000
chr1 53393000 53401000
chr1 53408000 53420000
The batch mode supports the computation of many viewpoints at once and is able to create one target list for the same viewpoint and two (or n) samples. To do the batch computation the parameter –batchMode needs to be added and the folder of the viewpoint files needs to be defined. In addition, the list of viewpoints created by chicViewpoint with –writeFileNamesToFile needs to be used as input. One target file is created for n consecutive lines and can be defined via the –computeSampleNumber parameter. However, for the differential test where the target file is needed, only two samples and one target file is supported.
chicSignificantInteractions --interactionFile interactionFiles.txt --interactionFileFolder interactionFilesFolder/ -bmf background_model.txt --range 1000000 1000000 --pValue 0.01 --loosePValue 0.1 --batchMode
The output is:
A folder containing all target files, name defined by –targetFolder, default value is targetFolder
A folder with all significant interaction files, name defined by –outputFolder, default value is significantFiles
A list containing the file names of all target files, name defined by –targetFileList, default value is targetList.txt
A list containing the file names of all significant interaction files, name defined by –writeFileNamesToFile, default value is significantFilesBatch.txt
Aggregate data for differential test¶
The process to aggregate data is only necessary if the differential test is used. Either two files and one target list are used to generate the files for the differential test or the batch mode can be used. chicAggregateStatistic takes the created viewpoint files from chicViewpoint as input and either the target files per two samples created by chicSignificantInteractions or one target file which applies for all viewpoints.
chicAggregateStatistic --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt interactionFilesFolder/MB-E10-5_chr1_53118507_53118507_Mstn.txt --targetFile targetFolder/FL-E13-5_MB-E10-5_chr1_53118507_53118507_Mstn_target.txt
It selects the original data based on the target locations and returns one file per sample which is used for the differential test.
In the batch mode the interaction file is the file containing the viewpoint file names, the folder needs to be defined by –interactionFileFolder, the same applies to the target file and folder. Two viewpoint files are match with one target file created by chicSignificantInteractions or one target file for all viewpoints. Please notice the output files are written to the folder name defined by –outputFolder, the default is aggregatedFiles and it is recommended to write the file names for further batch processing with hicDifferentialTest to –writeFileNamesToFile. All output files get the suffix defined by –outFileNameSuffix, default value is _aggregate_target.txt.
chicAggregateStatistic --interactionFile interactionFiles.txt --interactionFileFolder interactionFilesFolder --targetFile targetList.txt --targetFileFolder targetFolder --batchMode
Differential test¶
The differential test tests the interaction value of the reference point and the interaction value of the target of two samples for a differential expression. To achieve this, either Fisher’s test or the chi-squared test can be used. H0 is defined as ‘both locations are equal’, meaning the differential expressed targets can be found in the H0 rejected file.
This can be computed per sample:
chicDifferentialTest --interactionFile aggregatedFiles/FL-E13-5_chr1_53118507_53118507_Mstn__aggregate_target.txt aggregatedFiles/MB-E10-5_chr1_53118507_53118507_Mstn__aggregate_target.txt --alpha 0.05 --statisticTest fisher
Or via batch mode:
chicDifferentialTest --interactionFile aggregatedFilesBatch.txt --interactionFileFolder aggregatedFiles --alpha 0.05 --statisticTest fisher --batchMode --threads 20
In both cases it is important to set the desired alpha value and the output is written to –outputFolder (default differentialResults). For each sample three files are created:
H0 rejected targets
H0 accepted targets
one file containing both
In the batch mode, the file –rejectedFileNamesToFile is also written and contains the file names of the rejected files. This can be used for the batch processing mode of chicPlotViewpoint.
# Differential analysis result file of HiCExplorer's chicDifferentialTest version 3.2-dev
# This file contains the p-values computed by fisher test
# To test the smoothed (float) values they were rounded up to the next integer
#
# Alpha level 0.05
# Degrees of freedom 1
#
# FL-E13-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1517.0
# MB-E10-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1670.0
#Chromosome Start End Gene Relative distance sum of interactions 1 target_1 raw sum of interactions 2 target_2 raw p-value
chr1 53089000 53091000 Mstn -28000 1517.00000 5.00000 1670.00000 10.40000 0.21800
chr1 53131000 53133000 Mstn 14000 1517.00000 18.20000 1670.00000 23.60000 0.75900
chr1 53156000 53158000 Mstn 39000 1517.00000 3.00000 1670.00000 10.80000 0.06117
chr1 53251000 53254000 Mstn 135000 1517.00000 4.00000 1670.00000 9.60000 0.18614
chr1 53287000 53291000 Mstn 172000 1517.00000 7.20000 1670.00000 15.00000 0.29506
chr1 53305000 53309000 Mstn 190000 1517.00000 6.20000 1670.00000 12.40000 0.36952
chr1 53318000 53321000 Mstn 202000 1517.00000 6.00000 1670.00000 3.20000 0.53309
chr1 53326000 53334000 Mstn 215000 1517.00000 25.80000 1670.00000 22.60000 0.47374
chr1 53346000 53359000 Mstn 240000 1517.00000 31.60000 1670.00000 22.20000 0.13464
chr1 53408000 53421000 Mstn 302000 1517.00000 36.40000 1670.00000 28.20000 0.21290
# Differential analysis result file of HiCExplorer's chicDifferentialTest version 3.2-dev
# This file contains the p-values computed by fisher test
# To test the smoothed (float) values they were rounded up to the next integer
#
# Alpha level 0.05
# Degrees of freedom 1
#
# FL-E13-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1517.0
# MB-E10-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1670.0
#Chromosome Start End Gene Relative distance sum of interactions 1 target_1 raw sum of interactions 2 target_2 raw p-value
chr1 53393000 53401000 Mstn 282000 1517.00000 27.20000 1670.00000 6.40000 0.00012
# Differential analysis result file of HiCExplorer's chicDifferentialTest version 3.2-dev
# This file contains the p-values computed by fisher test
# To test the smoothed (float) values they were rounded up to the next integer
#
# Alpha level 0.05
# Degrees of freedom 1
#
# FL-E13-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1517.0
# MB-E10-5.cool chr1_53118507_53118507 52.12 Mbp 54.12 Mbp Mstn Sum of interactions in fixate range: 1670.0
#Chromosome Start End Gene Relative distance sum of interactions 1 target_1 raw sum of interactions 2 target_2 raw p-value
chr1 53089000 53091000 Mstn -28000 1517.00000 5.00000 1670.00000 10.40000 0.21800
chr1 53131000 53133000 Mstn 14000 1517.00000 18.20000 1670.00000 23.60000 0.75900
chr1 53156000 53158000 Mstn 39000 1517.00000 3.00000 1670.00000 10.80000 0.06117
chr1 53251000 53254000 Mstn 135000 1517.00000 4.00000 1670.00000 9.60000 0.18614
chr1 53287000 53291000 Mstn 172000 1517.00000 7.20000 1670.00000 15.00000 0.29506
chr1 53305000 53309000 Mstn 190000 1517.00000 6.20000 1670.00000 12.40000 0.36952
chr1 53318000 53321000 Mstn 202000 1517.00000 6.00000 1670.00000 3.20000 0.53309
chr1 53326000 53334000 Mstn 215000 1517.00000 25.80000 1670.00000 22.60000 0.47374
chr1 53346000 53359000 Mstn 240000 1517.00000 31.60000 1670.00000 22.20000 0.13464
chr1 53393000 53401000 Mstn 282000 1517.00000 27.20000 1670.00000 6.40000 0.00012
chr1 53408000 53421000 Mstn 302000 1517.00000 36.40000 1670.00000 28.20000 0.21290
Plotting of Viewpoints¶
chicPlotViewpoint can plot n viewpoints in one plot, add the mean background, show the p-value per relative distance per sample as an additional heatmap bar and highlight significant interactions or differential expressed regions.
One viewpoint:
chicPlotViewpoint --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt --range 200000 200000 -o single_plot.png

Two viewpoints, background, differential expression and p-values:
chicPlotViewpoint --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt interactionFilesFolder/MB-E10-5_chr1_53118507_53118507_Mstn.txt --range 300000 300000 --pValue --differentialTestResult differentialResults/FL-E13-5_MB-E10-5_chr1_53118507_53118507_Mstn__H0_rejected.txt --backgroundModelFile background_model.txt -o differential_background_pvalue.png

Two viewpoints, background, significant interactions and p-values:
chicPlotViewpoint --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt interactionFilesFolder/MB-E10-5_chr1_53118507_53118507_Mstn.txt --range 300000 300000 --pValue --significantInteractions significantFiles/FL-E13-5_chr1_53118507_53118507_Mstn__significant_interactions.txt significantFiles/MB-E10-5_chr1_53118507_53118507_Mstn__significant_interactions.txt --backgroundModelFile background_model.txt -o significant_background_pvalue.png

The batch mode is able to plot files under the same parameter setting for multiple viewpoints. These viewpoints are given by the file created by chicViewpoint with –writeFileNamesToFile parameter. The number of consecutive lines which should be plotted to one image can be defined using –plotSampleNumber. If the differentially expressed regions should be highlighted, setting this parameter to 2 is recommended.
For all modes the principle of a file containing the file names and a folder containing them applies for the plotting too, and using all cores available is highly recommended.
chicPlotViewpoint --interactionFile interactionFiles.txt --interactionFileFolder interactionFilesFolder/ --range 300000 300000 --pValue --significantInteractions significantFilesBatch.txt --significantInteractionFileFolder significantFiles --backgroundModelFile background_model.txt --outputFolder plots --threads 20 --batchMode
How we use HiCExplorer¶
To generate a Hi-C contact matrix is necessary to perform the following basic steps
Map the Hi-C reads to the reference genome
Filter the aligned reads to create a contact matrix
Filter matrix bins with low or zero read coverage
Remove biases from the Hi-C contact matrices
After a corrected Hi-C matrix is created other tools can be used to visualize it, call TADS or compare it with other matrices.
Reads mapping¶
Mates have to be mapped individually to avoid mapper specific heuristics designed for standard paired-end libraries.
We have used the HiCExplorer successfully with bwa, bowtie2 and hisat2. However, it is important to:
for either bowtie2`or `hisat2 use the –reorder parameter which tells bowtie2 or hisat2 to output the sam files in the exact same order as in the .fastq files.
use local mapping, in contrast to end-to-end. A fraction of Hi-C reads are chimeric and will not map end-to-end thus, local mapping is important to increase the number of mapped reads.
Tune the aligner parameters to penalize deletions and insertions. This is important to avoid aligned reads with gaps if they happen to be chimeric.
# map the reads, each mate individually using
# for example bwa
#
# bwa mem mapping options:
# -A INT score for a sequence match, which scales options -TdBOELU unless overridden [1]
# -B INT penalty for a mismatch [4]
# -O INT[,INT] gap open penalties for deletions and insertions [6,6]
# -E INT[,INT] gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1] # this is set very high to avoid gaps
# at restriction sites. Setting the gap extension penalty high, produces better results as
# the sequences left and right of a restriction site are mapped independently.
# -L INT[,INT] penalty for 5'- and 3'-end clipping [5,5] # this is set to no penalty.
$ bwa mem -A1 -B4 -E50 -L0 index_path \
mate_R1.fastq.gz 2>>mate_R1.log | samtools view -Shb - > mate_R1.bam
$ bwa mem -A1 -B4 -E50 -L0 index_path \
mate_R2.fastq.gz 2>>mate_R2.log | samtools view -Shb - > mate_R2.bam
Creation of a Hi-C matrix¶
Once the reads have been mapped the Hi-C matrix can be built. For this, the minimal extra information required is the binSize used for the matrix. Is it best to enter a low number like 10.000 because lower resolution matrices (larger bins) can be easily constructed using hicMergeMatrixBins. Matrices at restriction fragment resolution can be created by providing a file containing the restriction sites, this file can be created with the tool hicFindRestSites
hicFindRestSites that is part of HiCExplorer.
# build matrix from independently mated read pairs
# the restriction sequence GATC is recognized by the DpnII restriction enzyme
$ hicBuildMatrix --samFiles mate_R1.bam mate_R2.bam \
--binSize 10000 \
--restrictionSequence GATC \
--danglingSequence GATC \
--restrictionCutFile cut_sites.bed \
--threads 4 \
--inputBufferSize 100000 \
--outBam hic.bam \
-o hic_matrix.h5 \
--QCfolder ./hicQC
hicBuildMatrix creates two files, a bam file containing only the valid Hi-C read pairs and a matrix containing the Hi-C contacts at the given resolution. The bam file is useful to check the quality of the Hi-C library on the genome browser. A good Hi-C library should contain piles of reads near the restriction fragment sites. In the QCfolder a html file is saved with plots containing useful information for the quality control of the Hi-C sample like the number of valid pairs, duplicated pairs, self-ligations etc. Usually, only 25%-40% of the reads are valid and used to build the Hi-C matrix mostly because of the reads that are on repetitive regions that need to be discarded.
An important quality control measurement to check is the inter chromosomal fraction of reads as this is an indirect measure of random Hi-C contacts. Good Hi-C libraries have lower than 10% inter chromosomal contacts. The hicQC module can be used to compare the QC measures from different samples.
Correction of Hi-C matrix¶
The Hi-C matrix has to be corrected to remove GC, open chromatin biases and, most importantly, to normalize the number of restriction sites per bin. Because a fraction of bins from repetitive regions contain few contacts it is necessary to filter those regions first. Also, in mammalian genomes some regions enriched by reads should be discarded. To aid in the filtering of regions hicCorrectMatrix generates a diagnostic plot as follows:
$ hicCorrectMatrix diagnostic_plot -m hic_matrix.h5 -o hic_corrected.png
The plot should look like this:

Histogram of the number of counts per bin.¶
For the upper threshold is only important to remove very high outliers and thus a value of 5 could be used. For the lower threshold it is recommended to use a value between -2 and -1. What it not desired is to try to correct low count bins which could result simply in an amplification of noise. For the upper threshold is not so concerning because those bins will be scaled down.
Once the thresholds have been decided, the matrix can be corrected
# correct Hi-C matrix
$ hicCorrectMatrix correct -m hic_matrix.h5 --filterThreshold -1.5 5 -o hic_corrected.h5
In the case of multiple samples / replicates that need to be normalized to the same read coverage we recommend to compute first the normalization (with hicNormalize) and correct the data (with hicCorrectMatrix) in a second step.
Visualization of results¶
There are two ways to see the resulting matrix, one using hicPlotMatrix and the other is using hicPlotTADs. The first one allows the visualization over large regions while the second one is preferred to see specific parts together with other information, for example genes or bigwig tracks.
Because of the large differences in counts found int he matrix, it is better to plot the counts using the –log1p option.
$ hicPlotMatrix -m hic_corrected.h5 -o hic_plot.png --region 1:20000000-80000000 --log1p

Corrected Hi-C counts in log scale.¶
Quality control of Hi-C data and biological replicates comparison¶
HiCExplorer integrates multiple tools that allow the evaluation of the quality of Hi-C libraries and matrices.
hicQC on the log files produced by hicBuildMatrix and control of the pdf file produced.
Proportion of useful reads is important to assess the efficiency of the HiC protocol, which is dependant of proportion of dangling ends detected… Proportion of inter chromosomal, short range and long range contacts are important for….
hicPlotDistVsCounts to compare the distribution of corrected Hi-C counts in relation with the genomic
distance between multiple samples. If some differences are observed between biological replicates, these can be investigated more precisely by computing log2ratio matrices.
hicCompareMatrices log2ratio of matrices of biological replicates to identify where the potential changes are located.
hicPlotPCA bins correlation of two biological replicates.
TAD calling¶
To call TADs a corrected matrix is needed. Restriction fragment resolution matrices provide the best results. TAD calling works in two steps: First HiCExplorer computes a TAD-separation score based on a z-score matrix for all bins. Then those bins having a local minimum of the TAD-separation score are evaluated with respect to the surrounding bins to decide assign a p-value. Then a cutoff is applied to select the bins more likely to be TAD boundaries.
$ hicFindTADs -m hic_corrected.h5 --outPrefix hic_corrected --numberOfProcessors 16
This code will produce several files: 1. The TAD-separation score file, 2. the z-score matrix, 3. a bed file with the boundary location, 4. a bed file with the domains, 5. a bedgraph file with the TAD-score that can be visualized in a genome browser.
The TAD-separation score and the matrix can be visualized using hicPlotTADs.

Example output from hicPlotTADs from http://chorogenome.ie-freiburg.mpg.de/¶
A / B compartment analysis¶
To compute the A / B compartments the matrix needs to be transformed to an observed/expected matrix in the way Lieberman-Aiden describes it. In a next step a pearson correlation matrix and based on it a covariance matrix is computed. Finally the eigenvectors based on the covariance matrix are computed. All these steps are computed with the command:
$ hicPCA -m hic_corrected.h5 --outFileName pca1.bw pca2.bw --format bigwig
If the intermediate matrices of this process should be used for plotting run:
$ hicTransform -m hic_corrected.h5 --outFileName all.h5 --method all
This creates all intermediate matrices: obs_exp_all.h5, pearson_all.h5 and covariance_all.h5.
The A / B compartments can be plotted with hicPlotMatrix.
$ hicPlotMatrix -m pearson_all.h5 --outFileName pca1.png --perChr --bigwig pca1.bw
News and Developments¶
Release 3.6¶
10 November 2020
- hicAggregateContacts, thanks @LeilyR:
hicAggragateContact has been updated to be able to handle inter chromosomal contacts as well as inter chromosomal contacts
Added scikit-learn to dependencies
hicBuildMatrix: Fixing another bug concerning scaffolds without any restriction enzyme cut site
Updated dependencies
Adding default values to the documentation. Thanks @lldelisle
hicTransform: Fixing a bug in case one of the intermediate matrices is empty
- Official Python 3.8 support:
Manually setting ‘fork’ as the start method for multiprocessing because the default on macOS was set to ‘spawn’
Release 3.5.3¶
14 October 2020
- Bug fix release:
Reads from scaffolds without any restriction enzyme cut site are considered as ‘same fragment’. An appearance of such a read will not lead to a crash anymore
Minor documentation improvements
Release 3.5.2¶
06 October 2020
- Bug fix release:
enforcing version 15 of HiCMatrix. Version 14 had a bug concerning the application of the correction factors of cool files. See issue #595
Fixing a bug in hicDetectLoops in single-core mode. Thanks @alecw
Fixing a bug in hicDifferentialTAD concerning multiple chromosomes in the bed file. See issue #587
Updating dependencies to newest versions, except biopython. Forcing here <1.77 because their API change is breaking our source code. See issue #609
Fixing #596
Changing internal structure of the docs. Navigation should be better now.
Release 3.5.1¶
11 August 2020
patch hicCorrelate
hicBuildMatrix: Better help text
patch for hicPlotMatrix if matrix does not start at 0 and –region is used
bug fix for remove option in hicAdjustMatrix
Release 3.5¶
10 July 2020
Major update for hicDetectLoops: Results are now closer to HiCCUPS, it is faster and needs less memory.
hicHyperoptDetectLoops: New tool to compute the best parameter setting if the number of expected loops is given and a protein boundary file (e.g. CTCF on mammals) is provided
hicHyperoptDetectLoopsHiCCUPS: New tool to compute the best parameter setting for Juicers HiCCUPS. HiCCUPS and all its dependencies are not part of HiCExplorer and must be provided by the user. Number of expected loops and a protein boundary file (e.g. CTCF on mammals) must be provided.
hicMergeDomains: New tool to compute a merge of TADs computed on different resolutions. Moreover it provides a cleaning of the boundaries with the help of a protein peak file, and the hierarchical dependencies of TADs can be plotted. This tool is the result of the Bachelor thesis from Sarah Domogalla (@SarahMaria27). Thanks for your contribution!
hicDifferentialTAD: New tool to compute differential TADs between two samples
Bug fix for hicFindTADs: The format of the intermediate z-score matrices are now depending on the format of the input Hi-C matrix
Bug fix for chic*-modules: Fixate range is now correct applied.
chicSignificantInteractions, hicDetectLoops: Option to use a per relative position p-value with a p-value threshold file
Adding hicCreateThresholdFile: A script to generate a per position p-value threshold for hicDetectLoops and chicSignificantInteractions
- Bug fix for hicPlotMatrix:
multiple bigwig tracks in the vertical setting are now supported
correct plot of bigwig if the given matrix does not start at the beginning of the chromosome
new parameters ‘increaseFigureWidth’ and ‘increaseFigureHeight’ to add more space to the plot if multiple bigwigs are plotted. Adjust this parameter to correct the plot of the heatmap which may be not quadratic anymore.
restriction of the loop regions to the user given range. This effects especially SVGs that will now contain less objects as before.
- New feature for hicBuildMatrix:
multiple restriction cut sequences, dangling ends and restriction cut sites files are now supported
restriction cut sequences, dangling ends and restriction cut sites files are now mandatory parameters. This is now enforced to guarantee a correct detection of self ligations and self circles
hicPrepareQCreport: New support for multiple dangling ends
hicQuickQC: restriction cut sequences, dangling ends and restriction cut sites files are now mandatory parameters
hicFindRestSite: gz support for fasta file
Add fallback modes to multiple scripts if the parallelization fails.
hicAggregate: interactions between two bed files by comparing every row in the first bed file with its corresponding row in the second file. (issue #390)
- hicAdjustMatrix: fix #341, 454
fixed –action remove: it actually remove the part from the matrix previously was masking it
the case where the end of the chromosome need to be removed.
New Azure testing for the general test cases, the trivial ones run on travis
Publication¶
02 July 2020
Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization Joachim Wolff, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, Björn A Grüning Nucleic Acids Research, Volume 48, Issue W1, 02 July 2020, Pages W177–W184
Release 3.4.3¶
23 March 2020
Fixing the wrong p-value computation in for chicViewpoint. New method is more accurate for floating points.
Fixing a bug in chicViewpointBackgroundModel and chicQualityControl if an non-existing reference point was used.
Improving all chic* modules with a capturing of errors in sub-processes. It is now guaranteed that the computation will terminate. Either successful or by error.
Add option ‘truncateZero’ to chicViewpointBackgroundModel: This removes all zero values for the distributions before fitting to fight over dispersion.
Add option ‘decimalPlaces’ to chicViewpoint to adjust the decimal places in the output for all floating values. Helpful for really small p-values
Add option ‘truncateZeroPvalues’ to chicSignificantInteractions to set all p-values which are 0 to 1 and are therefore ignored.
Add option ‘truncateZeroPvalues’ to chicPlotViewpoint to set all p-values which are 0 to 1 and do not disturb the presentation of real p-values
Release 3.4.2¶
7 March 2020
This release fixes the wrong name scheme which was used in the chicModules. The most .bed files are now .txt files.
hicDetectLoops got an inner chromosome parallelization to decrease the compute time.
hicPlotMatrix got three new parameters: rotationX, rotationY and fontSize to adjust the position and font size of the labels. We hope this can lead in certain cases to a a better readability
hicPlotMatrix: fixed a bug that occurred if the list of chromosomes was given and the last chromosome appeared as an additional label.
Improving and updating the documentation.
Preprint¶
**6 March 2020*
The preprint of the loop detection algorithm is online via biorXiv: https://www.biorxiv.org/content/10.1101/2020.03.05.979096v1
Release 3.4.1¶
3 February 2020
This release fixes a bug in chicViewpoint that caused a crash if the data to be averaged is less than the window size.
Release 3.4¶
28 January 2020
Fixing a bug in hicAdjustMatrix: keep option had a bug concerning the cutting before the end of a chromosome or the start position was not from the beginning of the chromosome
hicCompartmentPolarization was renamed to hicCompartmentalization and got some bug fixes
Extending the option on how the observed vs. Expected matrix is computed and adding the parameter –ligation_factor to achieve a rescale behaviour of the values as it is implemented in Homer. The same changes are applied to hicTransform
Improved the documentation
Adding an option in hicAverageRegions to select start, end, center or start_end as start index for up/downstream range.
hicBuildMatrix: Removed default value of binSize to enable mutually exclusive group error if not one of them is set. Behaviour so far was that the binSize was taken.
hicPlotSVL: adding xlegend to plot of SVL ratios to indicate the data points per boxplots are the chromosome ratios
hicQuickQC: Removed binSize option of hicQuickQC because it does not matter for QC calculation and adding a sentence to recommend the usage of restriction enzyme and dangling end sequence. Fixing bug issue #464
hicNormalize: Adding option in hicNormalize to remove values after the normalization if values are smaller than a given threshold
Capture Hi-C modules: Change background model distribution assumption from negative binomial to continuous negative binomial by using Gamma functions as a replacement for the binomial coefficient. Source: https://stats.stackexchange.com/questions/310676/continuous-generalization-of-the-negative-binomial-distribution/311927
hicInfo: Implementing feature request #456. The length of chromosomes is now show in the information too
Release 3.3.1¶
15 November 2019
Fixing a bug in the labeling of chicPlotViewpoints if the value range is counted in MB
Add an option to chicViewpoint to pre-compute a x-fold of p-value over the maximum value of the relative distance
Release 3.3¶
8 October 2019
- Fixing many bugs:
A bug in hicDetectLoops if a sub-matrix was very small
A bug in hicPlotMatrix if the region defined by –region was only a chromosome and loops should be plotted too
A bug in hicPlotMatrix if a loop region should be plotted and chromosomeOrder argument was used too
A bug in hicAggregateContacts (issue #405) if chromosomes were present in the matrix but not in the bed file
A bug in hicAdjustMatrix concerning a bed file and consecutive regions, see issue #343
A bug in hicAdjustMatrix if a chromosome is present in the matrix but not in the bed file, see issue #397
A bug in hicCompartmentsPolarization concerning the arguments ‘quantile’ and ‘outliers’ were interpreted as strings but should be integers
A bug in hicAdjustMatrix concerning the ‘keep’ option and how matrices are reordered internally. Thanks @LeilyR
- Added features as requested:
hicPCA ignores now masked bins, see issue #342
- chicPlotViewpoint:
Better legend handling on x-axis
Peaks are now display with their fill width
Add option –pValueSignificantLevels to categorize the p-values in x levels (e.g. 0.001 0.05 0.1)
- chicViewpoint:
adding sorting via viewpoints and not by samples option (–allViewpointsList)
Adding an option to hicNormalize to normalize via multiplication and a use defined value (see issues #385, #424)
Rearrange hicAdjustMatrix to have a better accessibility to its functions from outside of main
Improving the documentation and fixing grammar / spelling mistakes. Thanks @simonbray
New script: hicPlotSVL to investigate short range vs long range ratios.
Release 3.2¶
** 22 August 2019**
Adding the new captured Hi-C module. Viewpoint analysis based on a background model, significant interaction detection and differential analysis are provided.
Adding documentation for captured Hi-C module and a tutorial on how to use it.
Adding a module to be able to detect quite fast the quality of a Hi-C data set: hicQuickQC.
Adding a tool to merge loops of different resolutions.
Improving validation of locations: Presorting is no longer necessary; adding feature to add ‘chr’ prefix to loop or protein chromosome name
- Change loop detection slightly to improve results and fixed bugs:
preselection p-value was ignored and only p-value was used
adding additional test to the peak region test to decrease false discoveries
exchanging pThreshold / ln(distance) to remove too low values by a share of the maximum value of the distance. New parameter ‘maximumInteractionPercentageThreshold’
Removal of the folder ‘scripts’ and its content. These were outdated scripts and will maybe part of regular Hi-C tools in the future.
Release 3.1¶
9 July 2019
KR correction improvements: It is now able to process larger data sets like GM12878 primary+replicate on 10kb resolution.
Adding script for validation of loop locations with protein peak locations
Adding script hicCompartmentsPolarization: Rearrange the average interaction frequencies using the first PC values to represent the global compartmentalization signal
Release 3.0.2¶
28 June 2019
Pinning dependencies to:
hicmatrix version 9: API changes in version 10
krbalancing version 0.0.4: API changes in version 0.0.5
matplotlib version 3.0: Version 3.1 raises ‘Not implemented error’ for unknown reasons.
Set fit_nbinom to version 1.1: Version 1.0 Had deprecated function call of scipy > 1.2.
Small documentation fixes and improvements.
Release 3.0¶
3 April 2019
Python 3 only. Python 2.X is no longer supported
Additional Hi-C interaction matrix correction algorithm ‘Knight-Ruiz’ as a C++ module for a faster runtime and less memory usage.
Enriched regions detection tool: ‘hicDetectLoops’ based on strict candidate selection, ‘hicFindEnrichedContacts’ was deleted
Metadata for cooler files is supported: hicBuildMatrix and hicInfo are using it
New options for hicPlotMatrix: –loops to visualize computed loops from hicDetectLoops and –bigwigAdditionalVerticalAxis to display a bigwig track on the vertical axis too.
Release 2.2.3¶
22 March 2019
This bug fix release patches an issue with cooler files, hicBuildMatrix and the usage of a restriction sequence file instead of fixed bin size.
Release 2.2.2¶
27 February 2019
This bug fix release removes reference to hicExport that were forgotten to delete in 2.2. Thanks @BioGeek for this contribution.
Release 2.2.1¶
7 February 2019
Muting log output of matplotlib and cooler
Set version number of hicmatrix to 7
Optional parameter for hicInfo to write the result to a file instead to the bash
Release 2.2¶
18 January 2019
This release contains:
replaced hicExport by hicConvertFormat and hicAdjustMatrix
extended functionality for hicConvertFormat
read support for homer, hicpro, cool, h5
write support for h5, homer, cool
convert hic to cool
creation of mcool matrices
hicAdjustMatrix
remove, keep or mask specified regions from a file, or chromosomes
hicNormalize
normalize matrices to 0 - 1 range or to the read coverage of the lowest given
hicBuildMatrix
support for build mcool
restructuring the central class HiCMatrix to object oriented model and moved to its own library: deeptools/HiCMatrix.
Extended read / write support for file formats
better (faster, less memory) support for cool format
remove of old, unused code
restrict support to h5 and cool matrices, except hicConvertFormat
hicFindTADs: Option to run computation per specified chromosomes
hicPlotTADs: removed code and calls pyGenomeTracks
hicAverageRegions: Sum up in a given range around defined reference points. Useful to detect changes in TAD structures between different samples.
hicPlotAverageRegions: Plots such a average region
hicTransform: Restructuring the source code, remove of option ‘all’ because it was generating confusion. Adding option ‘exp_obs’, exp_obs_norm and exp_obs_lieberman. These three different options use different expectation matrix computations.
hicPCA
Adding –norm option to compute the expected matrix in the way HOMER is doing it. Useful for drosophila genomes
Adding option to write out the intermediate matrices ‘obs_exp’ and ‘pearson’ which are necessary in the computation of the PCA
hicPlotMatrix
Add option to clip bigwig values
Add option to scale bigwig values
Removed hicLog2Ration, functionality is covered by hicCompareMatrices
Extending test cases to cover more source code and be hopefully more stable.
Many small bugfixes
Publication¶
13 June 2018
We are proud to announce our latest publication:
Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. “Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization”, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504
Release 2.1.4¶
25 May 2018
cooler file format correction factors are applied as they should be
parameter ‘–region’ of hicBuildMatrix works with Python 3
Release 2.1.3¶
7 May 2018
The third bugfix release of version 2.1 corrects an error in hicPlotViewpoint. It adds a feature requested in issue #169 which should have been included in release 2.1 but was accidentally not.
From 2.1 release note: hicPlotViewpoint: Adds a feature to plot multiple matrices in one image
Release 2.1.2¶
26 April 2018
The second bug fix release of 2.1 includes:
documentation improvements
fixing broken Readthedocs documentation
Small bug fix concerning hicPlotMatrix and cooler: –chromosomeOrder is now possible with more than one chromosome
Small fixes concerning updated dependencies: Fixing version number a bit more specific and not that strict in test cases delta values.
Release 2.1.1¶
27 March 2018
This release fixes a problem related to python3 in which chromosome names were of bytes type
Release 2.1¶
5 March 2018
The 2.1 version of HiCExplorer comes with new features and bugfixes.
Adding the new feature hicAggregateContacts: A tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions.
Many improvements to the documentation and the help text. Thanks to Gina Renschler and Gautier Richard from the MPI-IE Freiburg, Germany.
hicPlotMatrix
supports only bigwig files for an additional data track.
the argument –pca was renamed to –bigwig
Smoothing the bigwig values to neighboring bins if no data is present there
Fixes to a bug concerning a crash of tight_layout
Adding the possibility to flip the sign of the values of the bigwig track
Adding the possibility to scale the values of the bigwig track
hicPlotViewpoint: Adds a feature to plot multiple matrices in one image
cooler file format
supports mcool files
applies correction factors if present
optionally reads bin[‘weight’]
fixes
a crash in hicPlotTads if horizontal lines were used
checks if all characters of a title are ASCII. If not they are converted to the closest looking one.
Updated and fixate version number of the dependencies
Release 2.0¶
December 21, 2017
This release makes HiCExplorer ready for the future:
Python 3 support
Cooler file format support
A/B comparment analysis
Improved visualizations
bug fixes for
--perChr
option in hicPlotMatrixeigenvector track with
--pca
for hicPlotMatrixvisualization of interactions around a reference point or region with hicPlotViewpoint
Higher test coverage
re-licensing from GPLv2 to GPLv3
Release 1.8.1¶
November 27, 2017
Bug fix release:
a fix concerning the handling chimeric alignments in hicBuildMatrix. Thanks to Aleksander Jankowski @ajank
handling of dangling ends was too strict
improved help message in hicBuildMatrix
Release 1.8¶
October 25, 2017
This release is adding new features and fixes many bugs:
hicBuildMatrix: Added multicore support, new parameters –threads and –inputBufferSize
hicFindTADs:
One call instead of two: hicFindTADs TAD_score and hicFindTADs find_TADs merged to hicFindTADs.
New multiple correction method supported: False discovery rate. Call it with –correctForMultipleTesting fdr and –threshold 0.05.
Update of the tutorial: mES-HiC analysis.
Additional test cases and docstrings to improve the software quality
Fixed a bug occurring with bigwig files with frequent NaN values which resulted in only NaN averages
hicPlotTADs: Support for plotting points
Moved galaxy wrappers to https://github.com/galaxyproject/tools-iuc
Fixed multiple bugs with saving matrices
hicCorrelate: Changes direction of dendograms to left
Release 1.7.2¶
April 3, 2017
Added option to plot bigwig files as a line hicPlotTADs
Updated documentation
Improved hicPlotMatrix –region output
Added compressed matrices. In our tests the compressed matrices are significantly smaller.
March 28, 2017
Release 1.7¶
March 28, 2017
This release adds a quality control module to check the results from hicBuildMatrix. By default, now hicBuildMatrix generates a HTML page containing the plots from the QC measures. The results from several runs of hicBuildMatrix can be combined in one page using the new tool hicQC.
Also, this release added a module called hicCompareMatrices that takes two Hi-C matrices and computes the difference, the ratio or the log2 ratio. The resulting matrix can be plotted with hicPlotMatrix to visualize the changes.
Preprint introducing HiCExplorer is now online¶
March 8, 2017
Our #biorXiv preprint on DNA sequences behind Fly genome architecture is online!
Read the article here : http://biorxiv.org/content/early/2017/03/08/115063
In this article, we introduce HiCExplorer : Our easy to use tool for Hi-C data analysis, also available in Galaxy.
We also introduce HiCBrowser : A standalone software to visualize Hi-C along with other genomic datasets.
Based on HiCExplorer and HiCBrowser, we built a useful resource for anyone to browse and download the chromosome conformation datasets in Human, Mouse and Flies. It’s called the chorogenome navigator
Along with these resources, we present an analysis of DNA sequences behind 3D genome of Flies. Using high-resolution Hi-C analysis, we find a set of DNA motifs that characterize TAD boundaries in Flies and show the importance of these motifs in genome organization.
We hope that these resources and analysis would be useful for the community and welcome any feedback.
HiCExplorer wins best poster prize at VizBi2016¶
March 20, 2016
We are excited to announce that HiCExplorer has won the NVIDIA Award for Best Scientific Poster in VizBi2016, the international conference on visualization of biological data.
This was our poster :

Citation¶
Please cite HiCExplorer as follows:
Joachim Wolff, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, Björn A Grüning. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization, Nucleic Acids Research, Nucleic Acids Research, Volume 48, Issue W1, 02 July 2020, Pages W177–W184, https://doi.org/10.1093/nar/gkaa220
Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504
Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening,Kin Chung Lam, Bianca Habermann, Asifa Akhtar, Thomas Manke. “High-resolution TADs reveal DNA sequences underlying genome organization in flies”. Nature Communications, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w
Citation¶
Please cite HiCExplorer as follows:
Joachim Wolff, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, Björn A Grüning. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization, Nucleic Acids Research, Nucleic Acids Research, Volume 48, Issue W1, 02 July 2020, Pages W177–W184, https://doi.org/10.1093/nar/gkaa220
Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504
Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening,Kin Chung Lam, Bianca Habermann, Asifa Akhtar, Thomas Manke. “High-resolution TADs reveal DNA sequences underlying genome organization in flies”. Nature Communications, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w

This tool suite is developed by the Bioinformatics Unit at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg and by the Bioinformatics Lab of the Albert-Ludwigs-University Freiburg, Germany.