HiCExplorer

Set of programs to process, normalize, analyze and visualize Hi-C and cHi-C data

HiCExplorer addresses the common tasks of Hi-C data analysis from processing to visualization.


Availability

HiCExplorer is available as a command line suite of tools on this GitHub repository.

A Galaxy HiCExplorer version is directly available to users at http://hicexplorer.usegalaxy.eu. Training material is available at the Galaxy Training Network, while a Galaxy Tour is available here for users not familiar with this platform. Galaxy HiCExplorer is also available as a Docker image at the Docker Galaxy HiCExplorer GitHub repository. Finally, this Galaxy version is available on the Galaxy Tool Shed and on the corresponding GitHub repository.

The following is the list of tools available in HiCExplorer

| tool | description |
| --- | --- |
| hicFindRestSites | Identifies the genomic locations of restriction sites |
| hicBuildMatrix | Creates a Hi-C matrix using the aligned BAM files of the Hi-C sequencing reads |
| hicQuickQC | Estimates the quality of a Hi-C dataset |
| hicQC | Plots QC measures from the output of hicBuildMatrix |
| hicCorrectMatrix | Uses iterative correction to remove biases from a Hi-C matrix |
| hicDetectLoops | Identifies enriched Hi-C contacts |
| hicCorrelate | Computes and visualizes the correlation of Hi-C matrices |
| hicFindTADs | Identifies Topologically Associating Domains (TADs) |
| hicPCA | Computes the eigenvectors for A / B compartments |
| hicTransform | Computes an obs_exp matrix like Lieberman-Aiden (2009), a Pearson correlation matrix and/or a covariance matrix. These matrices can be used for plotting. |
| hicMergeMatrixBins | Merges consecutive bins on a Hi-C matrix to reduce resolution |
| hicMergeTADbins | Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix. |
| hicPlotDistVsCounts | Plots the decay in interaction frequency with distance |
| hicPlotMatrix | Plots a Hi-C matrix as a heatmap |
| hicPlotTADs | Plots TADs as a track that can be combined with other tracks (genes, signal, interactions) |
| hicPlotViewpoint | A plot with the interactions around a reference point or region. |
| hicAggregateContacts | A tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions. |
| hicSumMatrices | Adds Hi-C matrices of the same size |
| hicPlotDistVsCounts | Plots distance vs. Hi-C counts of corrected data |
| hicInfo | Shows information about a Hi-C matrix file (no. of bins, bin length, sum, max, min, etc.) |
| hicCompareMatrices | Computes difference or ratio between two matrices |
| hicAverageRegions | Computes the average of multiple given regions, usually TAD regions |
| hicPlotAverageRegions | Visualization of hicAverageRegions |
| hicNormalize | Normalizes the given matrices to 0-1 range or the smallest read coverage |
| hicConvertFormat | Converts between different Hi-C interaction matrices |
| hicAdjustMatrix | Keeps, removes or masks regions in a Hi-C matrix |
| hicValidateLocations | Compares the loops with known protein peak locations |
| hicMergeLoops | Merges loops of different resolutions |
| hicCompartmentalization | Computes the global compartmentalization signal |
| chicQualityControl | Quality control for cHi-C data |
| chicViewpointBackgroundModel | Background model computation for cHi-C analysis |
| chicViewpoint | Computation of all viewpoints based on background model for cHi-C analysis |
| chicSignificantInteractions | Detection of significant interactions per viewpoint based on background model |
| chicAggregateStatistic | Compiling of target regions for two samples as input for differential analysis |
| chicDifferentialTest | Differential analysis of interactions of two samples |
| chicPlotViewpoint | Plotting of viewpoint with background model and highlighting of significant and differential regions |
| hicPlotSVL | Computing short vs long range contacts and plotting the results |

Getting Help

  • For all kinds of questions, suggestions for changes or enhancements, and bug reports, please create an issue on our GitHub repository.

  • In the past we offered to post on Biostars with the tag hicexplorer or on the deepTools mailing list. We still check these resources from time to time, but the preferred way to communicate is via GitHub issues.

Contents:

Installation

Requirements

  • Python 3.6

  • numpy >= 1.17

  • scipy >= 1.3

  • matplotlib == 3.1

  • pysam >= 0.15

  • intervaltree >= 3.0

  • biopython >= 1.74

  • pytables >= 3.5

  • pyBigWig >= 0.3

  • future >= 0.17

  • jinja2 >= 2.10

  • pandas >= 0.25

  • unidecode >= 1.1

  • hicmatrix = 11

  • pygenometracks >= 3.0

  • psutil >= 5.6

  • hic2cool >= 0.7

  • cooler >= 0.8.5

  • krbalancing >= 0.0.5 (Needs the library eigen; openmp is recommended for linux users. No openmp support on macOS.)

  • fit_nbinom >= 1.1

  • pybedtools >= 0.8

Warning: Python 2.7 support is discontinued. Moreover, the support for pip is discontinued too.

Warning: We strongly recommend using the conda package manager and will no longer give support for issues arising with pip.

Command line installation using conda

The fastest way to obtain Python 3.6 together with numpy and scipy is via the Anaconda Scientific Python Distribution. Just download the version that’s suitable for your operating system and follow the directions for its installation. All of the requirements for HiCExplorer can be installed in Anaconda with:

$ conda install hicexplorer -c bioconda -c conda-forge

We strongly recommend using conda to install HiCExplorer.

Command line installation using pip

The installation via pip is discontinued with version 3.0. The reason for this is that we want to provide a ‘one-click’ installation. However, with version 3.0 we added the C++ library eigen as a dependency, and pip does not support non-Python packages.

For older versions you can still use pip: Install HiCExplorer using the following command:

$ pip install hicexplorer

All python requirements should be automatically installed.

If you need to specify a specific path for the installation of the tools, make use of pip install’s numerous options:

$ pip install --install-option="--prefix=/MyPath/Tools/hicexplorer" git+https://github.com/deeptools/HiCExplorer.git

Warning: You may have to install additional packages via your system package manager to successfully install HiCExplorer via pip.

Warning: We strongly recommend using the conda package manager and will no longer give support for issues arising with pip.

Command line installation without pip

These steps are more complicated than a package-manager installation; we highly recommend using a package manager instead whenever possible.

1. Install the requirements listed above in the “Requirements” section. This is done automatically by pip.

2. Download source code

$ git clone https://github.com/deeptools/HiCExplorer.git

or if you want a particular release, choose one from https://github.com/deeptools/HiCExplorer/releases:

$ wget https://github.com/deeptools/HiCExplorer/archive/1.5.12.tar.gz
$ tar -xzvf 1.5.12.tar.gz

3. To install the source code (if you don’t have root permission, you can set a specific folder using the --prefix option)

$ python setup.py install --prefix /User/Tools/hicexplorer

Galaxy installation

HiCExplorer can be easily integrated into a local Galaxy; the wrappers are provided at the Galaxy Tool Shed.

Installation with Docker

The HiCExplorer Galaxy instance is also available as a docker container, for those wishing to use the Galaxy framework but who also prefer a virtualized solution. This container is quite simple to install:

$ sudo docker pull quay.io/bgruening/galaxy-hicexplorer

To start and otherwise modify this container, please see the instructions on the docker-galaxy-stable github repository. Note that you must use bgruening/galaxy-hicexplorer in place of bgruening/galaxy-stable in the examples, as the HiCExplorer Galaxy container is built on top of the galaxy-stable container.

Tip

For support, or feature requests contact: deeptools@googlegroups.com

HiCExplorer tools

| tool | type | input files | main output file(s) | application |
| --- | --- | --- | --- | --- |
| hicFindRestSites | preprocessing | 1 genome FASTA file | bed file with restriction site coordinates | Identifies the genomic locations of restriction sites |
| hicBuildMatrix | preprocessing | 2 BAM/SAM files | hicMatrix object | Creates a Hi-C matrix using the aligned BAM files of the Hi-C sequencing reads |
| hicCorrectMatrix | preprocessing | hicMatrix object | normalized hicMatrix object | Uses iterative correction or Knight-Ruiz to remove biases from a Hi-C matrix |
| hicMergeMatrixBins | preprocessing | hicMatrix object | hicMatrix object | Merges consecutive bins on a Hi-C matrix to reduce resolution |
| hicSumMatrices | preprocessing | 2 or more hicMatrix objects | hicMatrix object | Adds Hi-C matrices of the same size |
| hicNormalize | preprocessing | multiple Hi-C matrices | multiple Hi-C matrices | Normalizes data to 0 to 1 range or to smallest total read count |
| hicCorrelate | analysis | 2 or more hicMatrix objects | a heatmap/scatter plot | Computes and visualizes the correlation of Hi-C matrices |
| hicFindTADs | analysis | hicMatrix object | bedGraph file (TAD score), a boundaries.bed file, a domains.bed file (TADs) | Identifies Topologically Associating Domains (TADs) |
| hicPlotMatrix | visualization | hicMatrix object | a heatmap of Hi-C contacts | Plots a Hi-C matrix as a heatmap |
| hicPlotTADs | visualization | hicMatrix object, a config file | Hi-C contacts on a given region, along with other provided signal (bigWig) or regions (bed) file | Plots TADs as a track that can be combined with other tracks (genes, signal, interactions) |
| hicPlotDistVsCounts | visualization | hicMatrix object | log log plot of Hi-C contacts per distance | Quality control |
| hicConvertFormat | data integration | one/multiple Hi-C file formats | Hi-C matrices/outputs in several formats | Converts a matrix to different formats |
| hicAdjustMatrix | data integration | one Hi-C file format | Hi-C matrix | Removes, masks or keeps specified regions of a matrix |
| hicInfo | information | one or more hicMatrix objects | Screen info | Prints information about matrices, like size, maximum, minimum, bin size, etc. |
| hicPCA | analysis | one Hi-C matrix | bedgraph or bigwig file(s) for each eigenvector | Computes the eigenvectors for A / B compartments |
| hicTransform | analysis | one Hi-C matrix | Hi-C matrix | Computes an obs_exp matrix like Lieberman-Aiden (2009), a Pearson correlation matrix and/or a covariance matrix. These matrices can be used for plotting. |
| hicPlotViewpoint | visualization | one Hi-C matrix | A viewpoint plot | A plot with the interactions around a reference point or region. |
| hicQC | information | log files from hicBuildMatrix | A quality control report | Quality control of the created contact matrix. |
| hicQuickQC | information | 2 BAM/SAM files | An estimated quality control report | Estimated quality report of the Hi-C data. |
| hicCompareMatrices | analysis | two Hi-C matrices | one Hi-C matrix | Applies diff, ratio or log2ratio on matrices to compare them. |
| hicAverageRegions | analysis | multiple Hi-C matrices | one npz object | Averages the given locations. Visualization with hicPlotAverageRegions |
| hicDetectLoops | analysis | one Hi-C matrix | bedgraph file with loop locations | Detects enriched regions. Visualization with hicPlotMatrix and the --loop parameter. |
| hicValidateLocations | analysis | one loop file, one protein peak file | bedgraph file with matched loop locations, one file with loop / protein statistics | Matches loop locations with protein peak positions |
| hicMergeLoops | analysis | multiple loop files | bedgraph file with merged loop locations | Merges detected loop locations of different resolutions |
| hicCompartmentalization | visualization | one Hi-C interaction matrix, one PCA bedgraph file | one image (polarization plot) | The global compartmentalization signal. |
| hicPlotAverageRegions | visualization | one npz file | one image | Visualization of hicAverageRegions. |
| hicPlotSVL | analysis | one / multiple Hi-C matrices | one image, p-values file, raw data file | Computes short/long range contacts; a box plot, a p-value and raw data file |
| hicMergeTADbins | preprocessing | one Hi-C matrix, one BED file | one Hi-C matrix | Uses a BED file of domains or TAD boundaries to merge the bin counts of a Hi-C matrix. |
| chicQualityControl | preprocessing | Hi-C matrices, reference point BED file | two plots, accepted reference point BED file | Checks for sparsity of viewpoints and removes them if too sparse. |
| chicViewpointBackgroundModel | preprocessing | Hi-C matrices, reference point BED file | background model file | Creates a background model for all given samples and reference points. |
| chicViewpoint | preprocessing | Hi-C matrices, background model file | viewpoint file(s) | Creates one viewpoint file per sample and viewpoint. |
| chicSignificantInteractions | preprocessing, analysis | viewpoint file(s), background model file | significant interaction file(s), target file(s) | Detects significant interactions per viewpoint based on the background and neighborhood merging via x-fold and loose p-values. |
| chicAggregateStatistic | preprocessing | viewpoint file(s), target file(s) | aggregated file(s) for differential test | Aggregates for one viewpoint of two samples via a target file the locations to test for differential interactions. |
| chicDifferentialTest | analysis | aggregated file(s) of two samples | H0_accepted-, H0_rejected-files | Tests with chi2 or fisher for differential interactions of two samples. |
| chicPlotViewpoint | visualization | viewpoint file(s), differential expression file(s), significant interactions file(s) | one image per viewpoint | Visualization of a viewpoint. |

General principles

A typical HiCExplorer command could look like this:

$ hicPlotMatrix -m myHiCmatrix.h5 \
-o myHiCmatrix.pdf \
--clearMaskedBins \
--region chrX:10,000,000-15,000,000 \
--vMin -4 --vMax 4

You can always see all available command-line options via --help:

$ hicPlotMatrix --help
  • Output format of plots should be indicated by the file ending, e.g. MyPlot.pdf will return a pdf file, MyPlot.png a png-file.

  • Most of the tools that produce plots can also output the underlying data. This is useful if you don’t like the HiCExplorer visualization, as you can then use the data matrices produced by HiCExplorer with your favorite plotting tool, such as R.

  • The vast majority of command line options are also available in Galaxy (in a few cases with minor changes to their naming).

Example usage

Hi-C analysis of mouse ESCs using HiCExplorer

The following example shows how we can use HiCExplorer to analyze a published dataset. Here we are using a Hi-C dataset from Marks et al. 2015, on mouse ESCs.

Protocol

The collection of the cells for Hi-C and the Hi-C sample preparation procedure were performed as previously described by Lieberman-Aiden et al., with the slight modification that DpnII was used as the restriction enzyme during initial digestion. Paired-end libraries were prepared according to Lieberman-Aiden et al. and sequenced on the NextSeq 500 platform using 2 × 75 bp sequencing.

Prepare for analysis
Download Raw fastq files

The fastq files can be downloaded from the EBI archive (or NCBI archive). We will store the files in the directory original_data.

mkdir original_data

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/007/SRR1956527/SRR1956527_1.fastq.gz -O original_data/SRR1956527_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/007/SRR1956527/SRR1956527_2.fastq.gz -O original_data/SRR1956527_2.fastq.gz

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/008/SRR1956528/SRR1956528_1.fastq.gz -O original_data/SRR1956528_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/008/SRR1956528/SRR1956528_2.fastq.gz -O original_data/SRR1956528_2.fastq.gz

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/009/SRR1956529/SRR1956529_1.fastq.gz -O original_data/SRR1956529_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR195/009/SRR1956529/SRR1956529_2.fastq.gz -O original_data/SRR1956529_2.fastq.gz
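
The six download commands follow one pattern, so they can also be generated in a loop. The sketch below is a dry run that only prints the commands. The helper name fastq_url is hypothetical, and the directory layout is inferred solely from the three runs listed above; it may not hold for other accessions.

```shell
# Hypothetical helper: build the EBI FTP URL for one mate of one run.
# The directory layout (<first six characters>/00<last digit>/<run>/) is
# an assumption inferred only from the three runs used in this example.
fastq_url() {
    fu_prefix=$(printf '%s' "$1" | cut -c1-6)
    fu_last=$(printf '%s' "$1" | sed 's/.*\(.\)$/\1/')
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${fu_prefix}/00${fu_last}/${1}/${1}_${2}.fastq.gz"
}

# Dry run: print the wget commands instead of executing them.
for run in SRR1956527 SRR1956528 SRR1956529; do
    for mate in 1 2; do
        echo "wget $(fastq_url "$run" "$mate") -O original_data/${run}_${mate}.fastq.gz"
    done
done
```

Piping the printed lines to sh would run the downloads once the output has been checked.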
Create an index

We start by creating an index for our alignment software for the GRCm38/mm10 genome. As a source we use the mm10 genome from UCSC:

mkdir genome_mm10
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz -O genome_mm10/chromFa.tar.gz
tar -xvzf genome_mm10/chromFa.tar.gz -C genome_mm10
cat genome_mm10/*.fa > genome_mm10/mm10.fa

We have the mm10 genome stored in one fasta file and can build the index. We tried it successfully with hisat2, bowtie2 and bwa. Run the mapping with one of them and do not mix them!

hisat2
hisat2-build -p 8 genome_mm10/mm10.fa hisat2/mm10_index

You can find more information about hisat

bowtie2
bowtie2-build genome_mm10/mm10.fa bowtie2/mm10_index --threads 8

You can find more information about bowtie

bwa
bwa index -p bwa/mm10_index genome_mm10/mm10.fa

You can find more information about bwa

Mapping the RAW files

Mates have to be mapped individually to avoid mapper specific heuristics designed for standard paired-end libraries.

Keep the following in mind for the different mappers:

  • for either bowtie2 or hisat2, use the --reorder parameter, which tells bowtie2 or hisat2 to output the SAM records in exactly the same order as the reads in the .fastq files.

  • use local mapping rather than end-to-end mapping. A fraction of Hi-C reads are chimeric and will not map end-to-end; local mapping therefore increases the number of mapped reads.

  • tune the aligner parameters to penalize deletions and insertions. This is important to avoid aligned reads with gaps if they happen to be chimeric.

hisat2
hisat2 -x hisat2/mm10_index --threads 8 -U original_data/SRR1956527_1.fastq.gz --reorder | samtools view -Shb - > SRR1956527_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U original_data/SRR1956527_2.fastq.gz --reorder | samtools view -Shb - > SRR1956527_2.bam
hisat2 -x hisat2/mm10_index --threads 8 -U original_data/SRR1956528_1.fastq.gz --reorder | samtools view -Shb - > SRR1956528_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U original_data/SRR1956528_2.fastq.gz --reorder | samtools view -Shb - > SRR1956528_2.bam
hisat2 -x hisat2/mm10_index --threads 8 -U original_data/SRR1956529_1.fastq.gz --reorder | samtools view -Shb - > SRR1956529_1.bam
hisat2 -x hisat2/mm10_index --threads 8 -U original_data/SRR1956529_2.fastq.gz --reorder | samtools view -Shb - > SRR1956529_2.bam

bowtie2
bowtie2 -x bowtie2/mm10_index --threads 8 -U original_data/SRR1956527_1.fastq.gz --reorder | samtools view -Shb - > SRR1956527_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U original_data/SRR1956527_2.fastq.gz --reorder | samtools view -Shb - > SRR1956527_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U original_data/SRR1956528_1.fastq.gz --reorder | samtools view -Shb - > SRR1956528_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U original_data/SRR1956528_2.fastq.gz --reorder | samtools view -Shb - > SRR1956528_2.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U original_data/SRR1956529_1.fastq.gz --reorder | samtools view -Shb - > SRR1956529_1.bam
bowtie2 -x bowtie2/mm10_index --threads 8 -U original_data/SRR1956529_2.fastq.gz --reorder | samtools view -Shb - > SRR1956529_2.bam

bwa
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956527_1.fastq.gz | samtools view -Shb - > SRR1956527_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956527_2.fastq.gz | samtools view -Shb - > SRR1956527_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956528_1.fastq.gz | samtools view -Shb - > SRR1956528_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956528_2.fastq.gz | samtools view -Shb - > SRR1956528_2.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956529_1.fastq.gz | samtools view -Shb - > SRR1956529_1.bam
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/SRR1956529_2.fastq.gz | samtools view -Shb - > SRR1956529_2.bam
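
Because every mate is mapped on its own, the commands differ only in the run accession and the mate number. As a minimal sketch, the loop below prints the bwa commands for all six files without running them. The helper name map_cmd is hypothetical; the flags and paths are copied from the bwa commands above.

```shell
# Hypothetical helper: print the bwa mem command for one mate of one run.
# Each mate gets its own command, so the mapper never sees read pairs.
map_cmd() {
    echo "bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 bwa/mm10_index original_data/${1}_${2}.fastq.gz | samtools view -Shb - > ${1}_${2}.bam"
}

# Dry run: one command per run and mate.
for run in SRR1956527 SRR1956528 SRR1956529; do
    for mate in 1 2; do
        map_cmd "$run" "$mate"
    done
done
```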
Build, visualize and correct Hi-C matrix
Create a Hi-C matrix using the aligned files

In the following we will create three Hi-C matrices and merge them to one.

Build Hi-C matrix

hicBuildMatrix builds the matrix of read counts over the bins in the genome, considering the sites around the given restriction site. We need to provide:

  • the input BAM/SAM files: --samFiles SRR1956527_1.bam SRR1956527_2.bam

  • the bin size: --binSize 10000

  • the restriction sequence: --restrictionSequence GATC

  • the name of the output BAM file, which contains the accepted alignments: --outBam SRR1956527_ref.bam

  • the name of the output matrix file: --outFileName hicMatrix/SRR1956527_10kb.h5

  • the folder for the quality report: --QCfolder hicMatrix/SRR1956527_10kb_QC

  • the number of threads to use (the minimum value is 3): --threads 8

  • the buffer size; each thread buffers inputBufferSize lines of each input BAM/SAM file: --inputBufferSize 400000

To build the Hi-C matrices:

mkdir hicMatrix
hicBuildMatrix --samFiles SRR1956527_1.bam SRR1956527_2.bam --binSize 10000 --restrictionSequence GATC --outBam SRR1956527_ref.bam --outFileName hicMatrix/SRR1956527_10kb.h5 --QCfolder hicMatrix/SRR1956527_10kb_QC --threads 8 --inputBufferSize 400000
hicBuildMatrix --samFiles SRR1956528_1.bam SRR1956528_2.bam --binSize 10000 --restrictionSequence GATC --outBam SRR1956528_ref.bam --outFileName hicMatrix/SRR1956528_10kb.h5 --QCfolder hicMatrix/SRR1956528_10kb_QC --threads 8 --inputBufferSize 400000
hicBuildMatrix --samFiles SRR1956529_1.bam SRR1956529_2.bam --binSize 10000 --restrictionSequence GATC --outBam SRR1956529_ref.bam --outFileName hicMatrix/SRR1956529_10kb.h5 --QCfolder hicMatrix/SRR1956529_10kb_QC --threads 8 --inputBufferSize 400000

The output bam files show that we have around 34M, 54M and 58M selected reads for SRR1956527, SRR1956528 & SRR1956529, respectively. Normally 25% of the total reads are selected. The output matrices have counts for the genomic regions. The extension of output matrix files is .h5.

A quality report is created in, e.g., hicMatrix/SRR1956527_10kb_QC; have a look at the report hicQC.html.

[Figure: a segment of the Hi-C quality report, showing the results for 'pairs used & filtered'.]

Merge (sum) matrices from replicates

To increase the depth of reads we merge the counts from these three replicates.

hicSumMatrices --matrices hicMatrix/SRR1956527_10kb.h5 hicMatrix/SRR1956528_10kb.h5 \
        hicMatrix/SRR1956529_10kb.h5 --outFileName hicMatrix/replicateMerged_10kb.h5
Plot Hi-C matrix

A 10 kb bin matrix is quite large to plot, so it is better to reduce the resolution first (to find out the size of a Hi-C matrix, use the tool hicInfo). With a 1 kb or a 10 kb matrix we usually run out of memory, and plotting takes minutes instead of seconds. To reduce the resolution we use the tool hicMergeMatrixBins.

Merge matrix bins for plotting

hicMergeMatrixBins merges a given number of consecutive bins (specified by --numBins) into larger bins. We will merge 100 bins in the original (uncorrected) matrix and then correct it. The new bin size is going to be 10,000 bp * 100 = 1,000,000 bp = 1 Mb.

hicMergeMatrixBins \
--matrix hicMatrix/replicateMerged_10kb.h5 --numBins 100 \
--outFileName hicMatrix/replicateMerged.100bins.h5
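
The value passed to --numBins is simply the ratio between the target bin size and the bin size of the input matrix. The following sketch spells out that arithmetic; both helper names are hypothetical and are not part of HiCExplorer.

```shell
# Hypothetical helpers for the bin-size arithmetic behind --numBins.

# Bin size (bp) after merging <num_bins> consecutive bins of <base_size> bp.
merged_bin_size() {
    echo $(( $1 * $2 ))
}

# Number of bins to merge to go from <base_size> to <target_size> bp.
# Prints nothing and returns 1 if the target is not a multiple of the base.
num_bins_for() {
    if [ $(( $2 % $1 )) -ne 0 ]; then
        return 1
    fi
    echo $(( $2 / $1 ))
}

merged_bin_size 10000 100    # 10 kb bins, 100 merged: 1000000 bp = 1 Mb
num_bins_for 10000 20000     # 10 kb to 20 kb: merge 2 bins
```

A target that is not a whole multiple of the input bin size cannot be reached by merging consecutive bins, which is why the second helper refuses such values.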
Plot the corrected Hi-C matrix

hicPlotMatrix can plot the merged matrix. We use the following options:

  • the matrix to plot: --matrix hicMatrix/replicateMerged.100bins.h5

  • logarithmic values for plotting: --log1p

  • the resolution of the plot: --dpi 300

  • masked bins should not be plotted: --clearMaskedBins

  • the order of the chromosomes in the plot: --chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY

  • the color map: --colorMap jet

  • the title of the plot: --title "Hi-C matrix for mESC"

  • the plot image itself: --outFileName plots/plot_1Mb_matrix.png

mkdir plots
hicPlotMatrix \
--matrix hicMatrix/replicateMerged.100bins.h5 \
--log1p \
--dpi 300 \
--clearMaskedBins \
--chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--colorMap jet \
--title "Hi-C matrix for mESC" \
--outFileName plots/plot_1Mb_matrix.png

[Figure: the Hi-C interaction matrix at a resolution of 1 Mb.]

Correct Hi-C Matrix

hicCorrectMatrix corrects the matrix counts in an iterative manner. For correcting the matrix, it’s important to remove the unassembled scaffolds (e.g. NT_) and keep only chromosomes, as scaffolds create problems with matrix correction. Therefore we use the chromosome names (1-19, X, Y) here. Important: Use ‘chr1 chr2 chr3 etc.’ if your genome index uses chromosome names with the ‘chr’ prefix.

Matrix correction works in two steps: first a histogram containing the sum of contacts per bin (row sum) is produced. This plot needs to be inspected to decide on the best threshold for removing bins with a low number of reads. The second step removes the low-scoring bins and does the correction.

In the following we will use a matrix with a bin size of 20 kb: 10kb * 2 = 20 kb

hicMergeMatrixBins \
--matrix hicMatrix/replicateMerged_10kb.h5 --numBins 2 \
--outFileName hicMatrix/replicateMerged.matrix_20kb.h5

(1-19, X, Y) variant:

hicCorrectMatrix diagnostic_plot \
--chromosomes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 --plotName hicMatrix/diagnostic_plot.png

(chr1-chr19, chrX, chrY) variant:

hicCorrectMatrix diagnostic_plot \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 --plotName hicMatrix/diagnostic_plot.png

[Figure: diagnostic plot for the Hi-C matrix at a resolution of 20 kb.]

The program prints a threshold suggestion that is usually accurate, but it is better to check it against the histogram plot. The threshold is visualized in the plot as a black vertical line. See Example usage for an example and for more info.

The threshold parameter needs two values:
  • low z-score

  • high z-score

“The absolute value of z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.” (Source). For more information see wikipedia.

The z-score definition:

$$ z = \frac{x - \mu}{\sigma} $$

where x is the raw score, \mu is the population mean and \sigma is the standard deviation.
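
To make the definition concrete, the sketch below computes z-scores for a handful of made-up row sums with awk. It illustrates only the textbook formula quoted above, using the population mean and standard deviation. It is not a description of how hicCorrectMatrix computes its scores internally, and the function name zscores is hypothetical.

```shell
# Print "value z-score" for every number read from stdin, using the
# population mean (mu) and standard deviation (sigma): z = (x - mu) / sigma.
zscores() {
    awk '
        { for (i = 1; i <= NF; i++) { x[++n] = $i; sum += $i } }
        END {
            if (n == 0) exit
            mu = sum / n
            for (i = 1; i <= n; i++) ss += (x[i] - mu) * (x[i] - mu)
            sigma = sqrt(ss / n)
            for (i = 1; i <= n; i++)
                printf "%s %.2f\n", x[i], (sigma > 0 ? (x[i] - mu) / sigma : 0)
        }'
}

# Toy row sums (made-up numbers, not taken from the dataset of this example).
echo "2 4 4 4 5 5 7 9" | zscores
```

For these toy values the mean is 5 and the standard deviation is 2, so the value 9 gets a z-score of 2.00 and the value 2 a z-score of -1.50.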

In our case the distribution describes the counts per bin of a genomic distance. Removing all bins with a z-score below / above a threshold X means removing all bins whose counts lie more than X standard deviations below / above the mean of their specific distribution.

Looking at the above distribution, we can select the values -2 (lower end) and 3 (upper end) as cutoffs for removal. These are given by the --filterThreshold option in hicCorrectMatrix.

(1-19, X, Y) variant:

hicCorrectMatrix correct \
--chromosomes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 \
--filterThreshold -2 3 --perchr --outFileName hicMatrix/replicateMerged.Corrected_20kb.h5

(chr1-chr19, chrX, chrY) variant:

hicCorrectMatrix correct \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
--matrix hicMatrix/replicateMerged.matrix_20kb.h5 \
--filterThreshold -2 3 --perchr --outFileName hicMatrix/replicateMerged.Corrected_20kb.h5

It can happen that the correction stops with:

`ERROR:iterative correction:*Error* matrix correction produced extremely large values.
This is often caused by bins of low counts. Use a more stringent filtering of bins.`

This can be solved by using more stringent z-score values for the filter threshold, or by looking at the plotted matrix. In our case we see that chromosome Y has more or less 0 counts in its bins. This chromosome can be excluded from the correction by leaving it out of the set of chromosomes that should be corrected (parameter --chromosomes).
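
Typing the chromosome list by hand is error-prone, especially when one chromosome has to be left out. The sketch below builds the list for either naming variant; the helper name chrom_list is hypothetical and assumes the mouse chromosomes 1-19, X and Y used in this example.

```shell
# Hypothetical helper: print the chromosome names 1-19, X and Y on one line.
#   $1: prefix to prepend ("chr" or "")
#   $2: optional bare name to leave out (e.g. Y)
chrom_list() {
    cl_out=""
    for cl_c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y; do
        if [ "$cl_c" != "$2" ]; then
            cl_out="$cl_out $1$cl_c"
        fi
    done
    # Strip the leading space before printing.
    echo "${cl_out# }"
}

chrom_list chr Y    # chr1 ... chr19 chrX, without chrY
chrom_list ""       # 1 ... 19 X Y
```

The output can be passed on unquoted, for example as --chromosomes $(chrom_list chr Y), so that each name becomes a separate argument.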

In the case of multiple samples / replicates that need to be normalized to the same read coverage we recommend to compute first the normalization (with hicNormalize) and correct the data (with hicCorrectMatrix) in a second step.

Plot corrected matrix

We can now plot one of the chromosomes (e.g. chromosome X) with the corrected matrix.

New parameter:
  • The region to plot: --region chrX:10000000-20000000 or --region chrX

(1-19, X, Y) variant:

hicPlotMatrix \
--log1p --dpi 300 \
--matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--region X --title "Corrected Hi-C matrix for mESC : chrX" \
--outFileName plots/replicateMerged_Corrected-20kb_plot-chrX.png

(chr1-chr19, chrX, chrY) variant:

hicPlotMatrix \
--log1p --dpi 300 \
--matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--region chrX --title "Corrected Hi-C matrix for mESC : chrX" \
--outFileName plots/replicateMerged_Corrected-20kb_plot-chrX.png

[Figure: the Hi-C interaction matrix for chromosome X.]

Plot TADs

“The partitioning of chromosomes into topologically associating domains (TADs) is an emerging concept that is reshaping our understanding of gene regulation in the context of physical organization of the genome” [Ramirez et al. 2017].

Find TADs

TAD calling works in two steps: first HiCExplorer computes a TAD-separation score based on a z-score matrix for all bins. Then the bins with a local minimum of the TAD-separation score are evaluated with respect to the surrounding bins to assign a p-value. Finally, a cutoff is applied to select the bins most likely to be TAD boundaries.

hicFindTADs tries to identify sensible parameters, but these can be changed to identify a more stringent set of boundaries.

mkdir TADs
hicFindTADs --matrix hicMatrix/replicateMerged.Corrected_20kb.h5 \
--minDepth 60000 --maxDepth 120000 --numberOfProcessors 8 --step 20000 \
--outPrefix TADs/marks_et-al_TADs_20kb-Bins  --minBoundaryDistance 80000 \
--correctForMultipleTesting fdr --threshold 0.05

As output we get separate files for the boundaries, the domains and the scores. In the plot below we use the domains file and the TAD-score file.

Build Tracks File

We can plot the TADs for a given chromosomal region. For this we need to create a track file containing the instructions to build the plot. The hicPlotTADs documentation contains the instructions to build the track file.

In the following plot we will use the track file listed below. Please store it as track.ini.

[hic]
file = hicMatrix/replicateMerged.Corrected_20kb.h5
title = HiC mESC chrX:99974316-101359967
colormap = RdYlBu_r
depth = 2000000
height = 7
transform = log1p
file_type = hic_matrix

[tads]
file = TADs/marks_et-al_TADs_20kb-Bins_domains.bed
file_type = domains
border_color = black
color = none
line_width = 1.5
overlay_previous = share-y
show_data_range = no

[x-axis]
fontsize = 16
where = top

[tad score]
file = TADs/marks_et-al_TADs_20kb-Bins_score.bm
title = TAD separation score
height = 4
file_type = bedgraph_matrix

[spacer]

[gene track]
file = mm10_genes_sorted.bed
height = 10
title = mm10 genes
labels = false

As a gene track we used the mm10 genes, sorted with sortBed from bedtools.

Plot

We plot the result with:

(1-19, X, Y) variant:

hicPlotTADs --tracks track.ini --region X:98000000-105000000 \
--dpi 300 --outFileName plots/marks_et-al_TADs.png \
--title "Marks et. al. TADs on X"

(chr1-chr19, chrX, chrY) variant:

hicPlotTADs --tracks track.ini --region chrX:98000000-105000000 \
--dpi 300 --outFileName plots/marks_et-al_TADs.png \
--title "Marks et. al. TADs on X"

The result is:

[Figure: TADplot]

Importing and Exporting HiCExplorer data

Exporting HiCExplorer output to Bioconductor

It’s possible to export Hi-C matrices produced by HiCExplorer to Bioconductor in R, which allows us to use the existing Bioconductor infrastructure for differential Hi-C analysis. The tool hicConvertFormat allows us to write Hi-C matrices in a format that can easily be imported into Bioconductor as a GInteractions object. Below is an example.

## Assuming HiCExplorer is installed in ~/programs
hicConvertFormat --matrix ~/programs/HiCExplorer/test/test_data/Li_et_al_2015.h5 --inputFormat h5 \
-o GInteraction_example --outputFormat GInteractions

The output file is in tsv format. It looks like this:

V1  V2     V3     V4  V5     V6     V7
X   19537  20701  X   19537  20701  1054.47483
X   19537  20701  X   20701  22321  375.86990
X   19537  20701  X   22321  24083  222.53900
X   19537  20701  X   24083  25983  114.26340
X   19537  20701  X   25983  27619  95.87463

This file can now be loaded into R as a GInteractions object, as shown below:

## INSIDE R
library(GenomicRanges)
library(InteractionSet)

hic <- read.delim("GInteraction_example.tsv", header = FALSE)

# Convert the data.frame to a GInteractions object
convertToGI <- function(df){
    row.regions <- GRanges(df$V1, IRanges(df$V2, df$V3)) # interaction start
    col.regions <- GRanges(df$V4, IRanges(df$V5, df$V6)) # interaction end
    gi <- GInteractions(row.regions, col.regions)
    gi$norm.freq <- df$V7 # interaction frequencies
    return(gi)
}
hic.gi <- convertToGI(hic)

Multiple files can be loaded and converted to an InteractionSet object. If you have prepared the matrices using the same binning, the intervals in the matrices are the same, so it is easy to merge these matrices into one InteractionSet object. In case some bins don’t match, we can merge the GInteractions objects based on matching bins, as follows.

# assuming hic.gi is a named list of two GInteractions objects hic.gi1 and hic.gi2
# hic.gi <- list(hic.gi1 = hic.gi1, hic.gi2 = hic.gi2)

# Get common regions between the two objects
combined <- unique(c(hic.gi$hic.gi1, hic.gi$hic.gi2))

# replace original regions with the common regions
replaceRegions(hic.gi$hic.gi1) <- regions(combined)
replaceRegions(hic.gi$hic.gi2) <- regions(combined)

# Get the matching indexes between the two objects
matched <- lapply(hic.gi, function(x) {
            match(x, combined)
            })

# Create a count matrix (for interaction frequencies)
counts <- matrix(0, ncol = 2, nrow=length(combined)) # counts for unmatched bins set to zero

# fill in the counts for matched bins
counts[matched$hic.gi1,1] <- hic.gi$hic.gi1$norm.freq
counts[matched$hic.gi2,2] <- hic.gi$hic.gi2$norm.freq

# Finally, create the InteractionSet object
iset <- InteractionSet(counts, combined)

InteractionSet objects can be used with packages like diffHic for differential Hi-C analysis.

  • For more information on working with GInteractions and InteractionSet objects in Bioconductor, check out this vignette.

Captured Hi-C data analysis

How we use HiCExplorer to analyse cHi-C data

This How-to is based on the published dataset from Andrey et al. 2017. For the tutorial, we use the samples FL-E13.5 and MB-E10.5.

Download the raw data

Please download the raw data via the following links or via NCBI GSE84795.

Dataset                  forward        reverse
CC-FL-E135-Wt-Mm-Rep1    SRR3950565_1   SRR3950565_2
CC-FL-E135-Wt-Mm-Rep2    SRR3950566_1   SRR3950566_2
CC-MB-E105-Wt-Mm-Rep1    SRR3950559_1   SRR3950559_2
CC-MB-E105-Wt-Mm-Rep2    SRR3950560_1   SRR3950560_2

Mapping

Map the files with a mapper of your choice against the mm9 reference genome; as an example, the mapping with bowtie2 is shown.

bowtie2 -x mm9_index --threads 8 -U SRR3950565_1.fastq.gz --reorder | samtools view -Shb - > SRR3950565_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950565_2.fastq.gz --reorder | samtools view -Shb - > SRR3950565_2.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950566_1.fastq.gz --reorder | samtools view -Shb - > SRR3950566_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950566_2.fastq.gz --reorder | samtools view -Shb - > SRR3950566_2.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950559_1.fastq.gz --reorder | samtools view -Shb - > SRR3950559_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950559_2.fastq.gz --reorder | samtools view -Shb - > SRR3950559_2.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950560_1.fastq.gz --reorder | samtools view -Shb - > SRR3950560_1.bam
bowtie2 -x mm9_index --threads 8 -U SRR3950560_2.fastq.gz --reorder | samtools view -Shb - > SRR3950560_2.bam

Create cHi-C matrices

To create a cHi-C matrix we use HiCExplorer’s hicBuildMatrix for each replicate separately and merge the replicates into a single matrix later. Like Andrey et al., we use a resolution of 1kb and use the restriction enzyme DpnII.

hicBuildMatrix --samFiles SRR3950565_1.bam SRR3950565_2.bam  --binSize 1000 --restrictionSequence GATC --outFileName SRR3950565.cool --QCfolder SRR3950565_QC --threads 6
hicBuildMatrix --samFiles SRR3950566_1.bam SRR3950566_2.bam  --binSize 1000 --restrictionSequence GATC --outFileName SRR3950566.cool --QCfolder SRR3950566_QC --threads 6
hicBuildMatrix --samFiles SRR3950559_1.bam SRR3950559_2.bam  --binSize 1000 --restrictionSequence GATC --outFileName SRR3950559.cool --QCfolder SRR3950559_QC --threads 6
hicBuildMatrix --samFiles SRR3950560_1.bam SRR3950560_2.bam  --binSize 1000 --restrictionSequence GATC --outFileName SRR3950560.cool --QCfolder SRR3950560_QC --threads 6
hicSumMatrix --matrices SRR3950565.cool SRR3950566.cool --outFileName FL-E13-5.cool
hicSumMatrix --matrices SRR3950559.cool SRR3950560.cool --outFileName MB-E10-5.cool

Terminology: Reference point vs viewpoint

A reference point is one single genomic position, e.g. chr1 500 510 is a reference point. A viewpoint, in contrast, is the region defined by the reference point and the up- and downstream range, e.g. a range of 100 and the reference point chr1 500 510 lead to the viewpoint chr1 400 610.
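The relation between a reference point and its viewpoint can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name viewpoint_region is hypothetical and not part of HiCExplorer, and clamping at position 0 is an assumption of this sketch.

```python
def viewpoint_region(chrom, start, end, upstream, downstream):
    """Return the viewpoint (chrom, start, end) around a reference point."""
    # Assumption of this sketch: clamp at 0 so the region never
    # extends past the beginning of the chromosome.
    return chrom, max(0, start - upstream), end + downstream


# Reference point chr1 500 510 with a range of 100 in both directions
print(viewpoint_region("chr1", 500, 510, 100, 100))
```

With different values for upstream and downstream the viewpoint becomes asymmetric around the reference point.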

Creation of reference point file

Andrey et al. state that they used a total of 460 reference points, but that 24 were removed due to low sequence coverage or non-correspondence to a promoter region, leading to 446 in total.

To reproduce this, we need all reference points, which are published in Supplementary Tables S2 and S8.

It is simplest to create the reference point file in the following format using Excel and store it as a tab separated file:

chr1        4487435 4487435 Sox17

Otherwise, just download the prepared file. We will do the quality control on our own and compare with the results of Andrey et al.
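As an alternative to a spreadsheet, a file in this four-column, tab-separated layout could also be written with a short Python script. The sketch below is a minimal example; the helper name reference_points_to_tsv is hypothetical, and the two entries are reference points that appear in this tutorial.

```python
import csv
import io


def reference_points_to_tsv(points):
    """Format (chrom, start, end, name) tuples as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for chrom, start, end, name in points:
        writer.writerow([chrom, start, end, name])
    return buffer.getvalue()


points = [
    ("chr1", 4487435, 4487435, "Sox17"),
    ("chr1", 14300280, 14300280, "Eya1"),
]
# Print the text; writing it to reference_points.bed works the same way
print(reference_points_to_tsv(points), end="")
```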

Quality control

As a first step we compute the quality of each viewpoint by considering its sparsity. As soon as the sparsity of a viewpoint in one sample is below the user-defined threshold (--sparsity), the reference point is no longer considered.

chicQualityControl -m FL-E13-5.cool MB-E10-5.cool -rp reference_points.bed --sparsity 0.025 --threads 20

The quality control creates five files: two plots showing the sparsity structure of the samples and three files containing the accepted reference points, the rejected ones and one file with all viewpoints and their sparsity level per sample.
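The filtering rule can be sketched as follows. This is not the implementation of chicQualityControl; it assumes that the sparsity of a viewpoint is the fraction of its bins with at least one interaction, and the function names sparsity and accepted are hypothetical.

```python
def sparsity(counts):
    """Fraction of bins in a viewpoint that carry at least one interaction.

    Assumption of this sketch: this is how the sparsity value is defined.
    """
    return sum(1 for c in counts if c > 0) / len(counts)


def accepted(sparsity_per_sample, threshold):
    """Keep a reference point only if no sample falls below the threshold."""
    return all(s >= threshold for s in sparsity_per_sample)


# Sparsity values reported for Tfap2b in this tutorial (FL-E13-5, MB-E10-5),
# checked against --sparsity 0.025
print(accepted([0.018, 0.016], 0.025))
```

A single sample below the threshold is enough to reject the reference point, which is why the rule uses all() over the samples.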

In our example the plots look like the following:

_images/sparsity.png _images/histogram.png

The first plot shows the sparsity per sample for each viewpoint, while the second one shows the sparsity distribution as a histogram. It can be seen quite clearly that only a minority of the viewpoints are really sparse and therefore need to be removed. The red line indicates the chosen sparsity level.

The reference point Tfap2b at chr1 19198995, which has a sparsity of 0.018 in FL-E13-5 and 0.016 in MB-E10-5, is considered to be of bad quality. To confirm this result we plot the viewpoint:

_images/Tfap2b_FL-E13-5_MB-E10-5_chr1_19198995_19198995.png

The plot shows there are effectively no interactions except with the reference point itself, which confirms that the point should be removed from the data.

The quality control rejected 71 reference points as too sparse, but surprisingly the viewpoints rejected by Andrey et al. are accepted. An explanation for this could be that we only consider two samples and not all samples used by Andrey et al., and therefore we missed the bad quality of some viewpoints.

Please consider that this bad viewpoint was selected arbitrarily from the sample data and is only an example.

Download the data: Filtered reference points, Quality control raw data and rejected reference points.

Background model

The background model computes all viewpoints given by the reference points for both samples in a range defined by the parameter --fixateRange. We recommend setting it to 500kb because real interactions beyond this range are rarely observed, and even very low interaction counts such as 1 would already be considered significant there. With this setting, only the interactions within 500kb up- and downstream of the reference point are considered for each viewpoint. Based on this data, two background models are computed: the first is the average per relative distance to the reference point; the second is a negative binomial distribution fitted per relative distance to the reference point. The first model is used for filtering by an x-fold factor in the significant interaction evaluation and for plotting. The negative binomial model is more important: it is used to compute a p-value per relative distance in each sample, which decides whether an interaction is considered significant.

chicViewpointBackgroundModel -m FL-E13-5.cool MB-E10-5.cool --fixateRange 500000 -t 20 -rp reference_points.bed -o background_model.txt

The background model looks like this:

Relative position   size nbinom     prob nbinom     max value       mean value
-500000             75.895607451213 0.998528939430  2.333333333333  0.000101543771
-499000             90.348171762247 0.998725799952  2.750000000000  0.000104681360
-498000             78.512621775755 0.998514111424  2.800000000000  0.000106107536
-497000             75.706478185610 0.998327784087  3.800000000000  0.000116147819

You can download the background model.
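To get a feeling for the fitted parameters, the rows of the background model can be read and the negative binomial distribution evaluated directly. The sketch below uses only the Python standard library; parse_background, nbinom_pmf and nbinom_cdf are hypothetical helper names. How exactly the tool turns the fitted distribution into a p-value is not restated here; a tail probability derived from the CDF is the usual approach, but that is an assumption.

```python
import math


def nbinom_pmf(k, size, prob):
    """P(X = k) of a negative binomial with real-valued size parameter."""
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(prob) + k * math.log1p(-prob))
    return math.exp(log_pmf)


def nbinom_cdf(k, size, prob):
    """P(X <= k), summed term by term."""
    return sum(nbinom_pmf(i, size, prob) for i in range(k + 1))


def parse_background(text):
    """Map relative position -> (size, prob) from background model rows."""
    model = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five fields and start with a signed integer;
        # the header line and empty lines are skipped.
        if len(fields) != 5 or not fields[0].lstrip("-").isdigit():
            continue
        model[int(fields[0])] = (float(fields[1]), float(fields[2]))
    return model


rows = """Relative position   size nbinom     prob nbinom     max value       mean value
-500000             75.895607451213 0.998528939430  2.333333333333  0.000101543771
-499000             90.348171762247 0.998725799952  2.750000000000  0.000104681360"""

model = parse_background(rows)
size, prob = model[-500000]
# Probability of observing no interaction at all at this relative distance
print(nbinom_pmf(0, size, prob))
```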

Viewpoint computation

In this step the viewpoint for each reference point listed in a reference_points.bed file is extracted from the interaction matrix, using the quality-controlled file created by chicQualityControl. The up- and downstream range can be given via --range upstream downstream. Please use the same value for --fixateRange as in the background model computation. For each relative distance, the x-fold over the average value of this relative distance is computed, and each location is assigned a p-value based on the background negative binomial distribution for this relative distance. For each viewpoint, one viewpoint file is created and stored in the folder given by the parameter --outputFolder.

chicViewpoint -m FL-E13-5.cool MB-E10-5.cool --averageContactBin 5 --range 1000000 1000000 -rp referencePoints.bed -bmf background_model.txt --writeFileNamesToFile interactionFiles.txt --outputFolder interactionFilesFolder --fixateRange 500000 --threads 20

The name of each viewpoint file starts with its sample name (given by the name of the matrix), the exact location and the gene / promoter name. For example, the viewpoint chr1 4487435 4487435 Sox17 from MB-E10-5.cool matrix will be called MB-E10-5_chr1_4487435_4487435_Sox17.txt and looks like the following:

# Interaction file, created with HiCExplorer's chicViewpoint version 3.2
# MB-E10-5.cool chr1_4487435_4487435    3.49  Mbp       5.49  Mbp       Sox17   Sum of interactions in fixate range: 978.0
# Chromosome    Start   End     Gene    Sum of interactions     Relative position       Relative Interactions   p-value x-fold  Raw
#
chr1    3487000 3488000 Sox17   978.0   -1000000        0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3488000 3489000 Sox17   978.0   -999000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3489000 3490000 Sox17   978.0   -998000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3490000 3491000 Sox17   978.0   -997000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3491000 3492000 Sox17   978.0   -996000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3492000 3493000 Sox17   978.0   -995000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3493000 3494000 Sox17   978.0   -994000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3494000 3495000 Sox17   978.0   -993000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3495000 3496000 Sox17   978.0   -992000 0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3496000 3497000 Sox17   978.0   -991000 0.000000000000  0.894286365313  0.000000000000  0.000000000000

Each file contains a header with information about the HiCExplorer version used, the sample, the viewpoint and the content of the different columns.
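For downstream scripting, such a viewpoint file can be read with a few lines of Python. The following is a minimal sketch that assumes whitespace-separated columns in the order given by the header and gene names without spaces; ViewpointRow and parse_viewpoint are hypothetical names, not part of HiCExplorer.

```python
from typing import NamedTuple


class ViewpointRow(NamedTuple):
    chrom: str
    start: int
    end: int
    gene: str
    sum_interactions: float
    relative_position: int
    relative_interactions: float
    p_value: float
    x_fold: float
    raw: float


def parse_viewpoint(text):
    """Parse the data rows of a viewpoint file, skipping '#' header lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        rows.append(ViewpointRow(f[0], int(f[1]), int(f[2]), f[3], float(f[4]),
                                 int(f[5]), float(f[6]), float(f[7]),
                                 float(f[8]), float(f[9])))
    return rows


example = """# Interaction file, created with HiCExplorer's chicViewpoint version 3.2
#
chr1    3487000 3488000 Sox17   978.0   -1000000        0.000000000000  0.894286365313  0.000000000000  0.000000000000
chr1    3488000 3489000 Sox17   978.0   -999000 0.000000000000  0.894286365313  0.000000000000  0.000000000000"""

for row in parse_viewpoint(example):
    print(row.chrom, row.start, row.end, row.relative_position, row.p_value)
```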

If the parameter --writeFileNamesToFile is set, the viewpoint file names are written to a file which can be used for batch processing in the later steps. Each sample is combined with every other sample for each viewpoint to enable the batch processing for the differential analysis. Example: matrices FL-E13-5.cool and MB-E10-5.cool with the three reference points:

FL-E13-5_chr1_4487435_4487435_Sox17.txt
MB-E10-5_chr1_4487435_4487435_Sox17.txt
FL-E13-5_chr1_14300280_14300280_Eya1.txt
MB-E10-5_chr1_14300280_14300280_Eya1.txt
FL-E13-5_chr1_19093103_19093103_Tfap2d.txt
MB-E10-5_chr1_19093103_19093103_Tfap2d.txt
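The naming scheme and the ordering of this list can be reproduced with a small Python sketch. The function viewpoint_file_names is hypothetical; it only mirrors the pattern visible in the example above (sample name, location, gene name) for the two-sample case and is not taken from the HiCExplorer source.

```python
import os


def viewpoint_file_names(matrices, reference_points):
    """List viewpoint file names: for each reference point, one per sample."""
    names = []
    for chrom, start, end, gene in reference_points:
        for matrix in matrices:
            # The sample name is the matrix file name without its extension
            sample = os.path.splitext(os.path.basename(matrix))[0]
            names.append(f"{sample}_{chrom}_{start}_{end}_{gene}.txt")
    return names


matrices = ["FL-E13-5.cool", "MB-E10-5.cool"]
reference_points = [
    ("chr1", 4487435, 4487435, "Sox17"),
    ("chr1", 14300280, 14300280, "Eya1"),
    ("chr1", 19093103, 19093103, "Tfap2d"),
]
print("\n".join(viewpoint_file_names(matrices, reference_points)))
```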
Significant interactions detection

To detect significant interactions and to prepare a target file for each viewpoint, which will be used for the differential analysis, the script chicSignificantInteractions is used. It offers two modes: the user can specify either an x-fold value or a loose p-value. In the first mode, all interactions with at least the given x-fold over the average background for their relative distance are considered candidates; in the second, all interactions with the loose p-value or less are considered. These are pre-selection steps that allow wider peaks to be detected in the same way as sharp ones. Candidates that are direct neighbors are merged into one peak, and the sum of all interaction values of this neighborhood is used to compute a new p-value. This p-value is computed based on the negative binomial distribution at the relative distance of the interaction with the highest original interaction value. A peak is accepted as a significant interaction if its p-value is no larger than the threshold --pValue.

To exclude interactions with an interaction value smaller than desired, the parameter --peakInteractionsThreshold can be set.
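The merging of neighboring candidates can be illustrated with a short Python sketch. It is a simplified model of the step described above, not the code of chicSignificantInteractions: the function merge_neighbors and the toy values are hypothetical, and "direct neighbors" is taken to mean that one bin ends where the next begins.

```python
def merge_neighbors(candidates):
    """Merge directly adjacent candidate bins into peaks.

    candidates: list of (start, end, value) tuples sorted by start.
    Returns (start, end, summed_value, start_of_highest_bin) per peak.
    """
    peaks = []
    for start, end, value in candidates:
        if peaks and peaks[-1]["end"] == start:
            peak = peaks[-1]
            peak["end"] = end
            peak["sum"] += value
            # Remember the bin with the highest original value; its relative
            # distance selects the distribution for the new p-value.
            if value > peak["max_value"]:
                peak["max_value"], peak["max_start"] = value, start
        else:
            peaks.append({"start": start, "end": end, "sum": value,
                          "max_value": value, "max_start": start})
    return [(p["start"], p["end"], p["sum"], p["max_start"]) for p in peaks]


# Toy candidates: two adjacent 1 kb bins and one isolated bin
candidates = [(1000, 2000, 3.0), (2000, 3000, 5.0), (5000, 6000, 2.0)]
print(merge_neighbors(candidates))
```

The summed value of each peak would then be tested against the threshold given by the p-value parameter.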

In this example we use the reference point Mstn at location chr1 53118507, a loose p-value of 0.1 and p-value of 0.01:

chicSignificantInteractions --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt -bmf background_model.txt --range 1000000 1000000 --pValue 0.01 --loosePValue 0.1

This creates two files:

FL-E13-5_chr1_53118507_53118507_Mstn_target.txt
FL-E13-5_chr1_53118507_53118507_Mstn__significant_interactions.txt

These files are stored in the folders given by the parameters --targetFolder and --outputFolder.

The significant interactions file looks like the following:

# FL-E13-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1517.0
#Chromosome Start   End     Gene    Sum of interactions     Relative position       Relative interactions   p-value x-fold  Raw target
chr1        53318000        53321000        Mstn    1517.0  200000  0.00395517468600000040  0.00000145009991170397  8.37043994897500098773  6.00000000000000000000
chr1        53329000        53334000        Mstn    1517.0  212000  0.01081081081000000166  0.00000000000000188738  22.37661518795599846499 16.39999999999999857891
chr1        53348000        53350000        Mstn    1517.0  231000  0.00329597890600000004  0.00001463968364323609  7.37204640642899988734  5.00000000000000000000
chr1        53351000        53359000        Mstn    1517.0  239000  0.01437046802899999941  0.00000000000000099920  34.20972383882499912033 21.80000000000000071054
chr1        53393000        53401000        Mstn    1517.0  278000  0.01793012524599999977  0.00000000000000044409  48.20518935066399990319 27.19999999999999928946
chr1        53408000        53420000        Mstn    1517.0  294000  0.02307185234000000418  0.00000000000001743050  68.05162660125500906361 35.00000000000000000000

The target file looks like:

# Significant interactions result file of HiCExplorer's chicSignificantInteractions version 3.2-dev
# targetFolder/FL-E13-5_chr1_53118507_53118507_Mstn_target.txt
# Mode: loose p-value with 0.1
# Used p-value: 0.01
#
chr1        53318000        53321000
chr1        53329000        53334000
chr1        53348000        53359000
chr1        53393000        53401000
chr1        53408000        53420000

Batch mode

The batch mode supports the computation of many viewpoints at once and is able to create one target list for the same viewpoint and two (or n) samples. To do the batch computation, the parameter --batchMode needs to be added and the folder of the viewpoint files needs to be defined. In addition, the list of viewpoints created by chicViewpoint with --writeFileNamesToFile needs to be used as input. One target file is created for n consecutive lines; n can be defined via the --computeSampleNumber parameter. However, for the differential test, where the target file is needed, only two samples and one target file are supported.

chicSignificantInteractions --interactionFile interactionFiles.txt --interactionFileFolder interactionFilesFolder/  -bmf background_model.txt --range 1000000 1000000 --pValue 0.01 --loosePValue 0.1 --batchMode

The output is:

  • A folder containing all target files, name defined by --targetFolder, default value is targetFolder

  • A folder with all significant interaction files, name defined by --outputFolder, default value is significantFiles

  • A list containing the file names of all target files, name defined by --targetFileList, default value is targetList.txt

  • A list containing the file names of all significant interaction files, name defined by --writeFileNamesToFile, default value is significantFilesBatch.txt

Aggregate data for differential test

The process to aggregate data is only necessary if the differential test is used. Either two files and one target list are used to generate the files for the differential test or the batch mode can be used. chicAggregateStatistic takes the created viewpoint files from chicViewpoint as input and either the target files per two samples created by chicSignificantInteractions or one target file which applies for all viewpoints.

chicAggregateStatistic --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt interactionFilesFolder/MB-E10-5_chr1_53118507_53118507_Mstn.txt --targetFile targetFolder/FL-E13-5_MB-E10-5_chr1_53118507_53118507_Mstn_target.txt

It selects the original data based on the target locations and returns one file per sample which is used for the differential test.
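The selection step can be pictured with the following Python sketch. It assumes that the raw values of all viewpoint bins lying inside a target region are summed; this matches the idea described above but is not confirmed to be what chicAggregateStatistic does internally. The function aggregate and the raw values are hypothetical; the target region is taken from the target file shown earlier.

```python
def aggregate(bins, targets):
    """Sum the raw value of all bins that lie inside each target region.

    bins:    list of (chrom, start, end, raw) from one viewpoint file
    targets: list of (chrom, start, end) from the target file
    """
    result = []
    for t_chrom, t_start, t_end in targets:
        total = sum(raw for chrom, start, end, raw in bins
                    if chrom == t_chrom and start >= t_start and end <= t_end)
        result.append((t_chrom, t_start, t_end, total))
    return result


# Hypothetical raw values for 1 kb bins around one target region
bins = [
    ("chr1", 53317000, 53318000, 1.0),
    ("chr1", 53318000, 53319000, 2.0),
    ("chr1", 53319000, 53320000, 3.0),
    ("chr1", 53320000, 53321000, 1.0),
    ("chr1", 53321000, 53322000, 4.0),
]
targets = [("chr1", 53318000, 53321000)]
print(aggregate(bins, targets))
```

Running the same targets against the bins of the second sample gives the paired values that the differential test compares.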

Batch mode

In the batch mode the interaction file is the file containing the viewpoint file names, and the folder needs to be defined by --interactionFileFolder; the same applies to the target file and folder. Two viewpoint files are matched with one target file created by chicSignificantInteractions, or with one target file for all viewpoints. Please note that the output files are written to the folder defined by --outputFolder (default: aggregatedFiles), and it is recommended to write the file names to --writeFileNamesToFile for further batch processing with chicDifferentialTest. All output files get the suffix defined by --outFileNameSuffix (default: _aggregate_target.txt).

chicAggregateStatistic --interactionFile interactionFiles.txt --interactionFileFolder interactionFilesFolder --targetFile targetList.txt --targetFileFolder targetFolder --batchMode

Differential test

The differential test tests the interaction value of the reference point and the interaction value of the target of two samples for a differential expression. To achieve this, either Fisher’s test or the chi-squared test can be used. H0 is defined as ‘both locations are equal’, meaning the differentially expressed targets can be found in the H0 rejected file.
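The idea of testing a 2x2 table can be illustrated with a self-contained two-sided Fisher's exact test in Python. This is a sketch only: the function fisher_exact_two_sided is written here for illustration, and the layout of the table (interaction sum and rounded-up target value per sample) is an assumption, not taken from the chicDifferentialTest source. The numbers are from the Mstn result files in this tutorial, whose header notes that smoothed values are rounded up to the next integer.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Target chr1 53393000 53401000: raw values 27.2 and 6.4, rounded up,
# against the interaction sums 1517 and 1670 (table layout is an assumption)
p = fisher_exact_two_sided(1517, math.ceil(27.2), 1670, math.ceil(6.4))
print(p < 0.05)
```

A p-value below the chosen alpha means H0 is rejected for this target.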

This can be computed per sample:

chicDifferentialTest --interactionFile aggregatedFiles/FL-E13-5_chr1_53118507_53118507_Mstn__aggregate_target.txt aggregatedFiles/MB-E10-5_chr1_53118507_53118507_Mstn__aggregate_target.txt --alpha 0.05 --statisticTest fisher

Or via batch mode:

chicDifferentialTest --interactionFile aggregatedFilesBatch.txt --interactionFileFolder aggregatedFiles --alpha 0.05 --statisticTest fisher --batchMode --threads 20

In both cases it is important to set the desired alpha value, and the output is written to --outputFolder (default differentialResults). For each sample three files are created:

  • H0 rejected targets

  • H0 accepted targets

  • one file containing both

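In the batch mode, the file --rejectedFileNamesToFile is also written and contains the file names of the rejected files. This can be used for the batch processing mode of chicPlotViewpoint.

For the Mstn example, the three files look like the following, in the order H0 accepted targets, H0 rejected targets, and both combined: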

# Differential analysis result file of HiCExplorer's chicDifferentialTest version 3.2-dev
# This file contains the p-values computed by fisher test
# To test the smoothed (float) values they were rounded up to the next integer
#
# Alpha level 0.05
# Degrees of freedom 1
#
# FL-E13-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1517.0
# MB-E10-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1670.0
#Chromosome Start   End     Gene    Relative distance       sum of interactions 1   target_1 raw    sum of interactions 2   target_2 raw    p-value
chr1        53089000        53091000        Mstn    -28000  1517.00000      5.00000 1670.00000      10.40000                0.21800
chr1        53131000        53133000        Mstn    14000   1517.00000      18.20000        1670.00000      23.60000                0.75900
chr1        53156000        53158000        Mstn    39000   1517.00000      3.00000 1670.00000      10.80000                0.06117
chr1        53251000        53254000        Mstn    135000  1517.00000      4.00000 1670.00000      9.60000         0.18614
chr1        53287000        53291000        Mstn    172000  1517.00000      7.20000 1670.00000      15.00000                0.29506
chr1        53305000        53309000        Mstn    190000  1517.00000      6.20000 1670.00000      12.40000                0.36952
chr1        53318000        53321000        Mstn    202000  1517.00000      6.00000 1670.00000      3.20000         0.53309
chr1        53326000        53334000        Mstn    215000  1517.00000      25.80000        1670.00000      22.60000                0.47374
chr1        53346000        53359000        Mstn    240000  1517.00000      31.60000        1670.00000      22.20000                0.13464
chr1        53408000        53421000        Mstn    302000  1517.00000      36.40000        1670.00000      28.20000                0.21290

# Differential analysis result file of HiCExplorer's chicDifferentialTest version 3.2-dev
# This file contains the p-values computed by fisher test
# To test the smoothed (float) values they were rounded up to the next integer
#
# Alpha level 0.05
# Degrees of freedom 1
#
# FL-E13-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1517.0
# MB-E10-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1670.0
#Chromosome Start   End     Gene    Relative distance       sum of interactions 1   target_1 raw    sum of interactions 2   target_2 raw    p-value
chr1        53393000        53401000        Mstn    282000  1517.00000      27.20000        1670.00000      6.40000         0.00012

# Differential analysis result file of HiCExplorer's chicDifferentialTest version 3.2-dev
# This file contains the p-values computed by fisher test
# To test the smoothed (float) values they were rounded up to the next integer
#
# Alpha level 0.05
# Degrees of freedom 1
#
# FL-E13-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1517.0
# MB-E10-5.cool     chr1_53118507_53118507  52.12  Mbp      54.12  Mbp      Mstn    Sum of interactions in fixate range: 1670.0
#Chromosome Start   End     Gene    Relative distance       sum of interactions 1   target_1 raw    sum of interactions 2   target_2 raw    p-value
chr1        53089000        53091000        Mstn    -28000  1517.00000      5.00000 1670.00000      10.40000                0.21800
chr1        53131000        53133000        Mstn    14000   1517.00000      18.20000        1670.00000      23.60000                0.75900
chr1        53156000        53158000        Mstn    39000   1517.00000      3.00000 1670.00000      10.80000                0.06117
chr1        53251000        53254000        Mstn    135000  1517.00000      4.00000 1670.00000      9.60000         0.18614
chr1        53287000        53291000        Mstn    172000  1517.00000      7.20000 1670.00000      15.00000                0.29506
chr1        53305000        53309000        Mstn    190000  1517.00000      6.20000 1670.00000      12.40000                0.36952
chr1        53318000        53321000        Mstn    202000  1517.00000      6.00000 1670.00000      3.20000         0.53309
chr1        53326000        53334000        Mstn    215000  1517.00000      25.80000        1670.00000      22.60000                0.47374
chr1        53346000        53359000        Mstn    240000  1517.00000      31.60000        1670.00000      22.20000                0.13464
chr1        53393000        53401000        Mstn    282000  1517.00000      27.20000        1670.00000      6.40000         0.00012
chr1        53408000        53421000        Mstn    302000  1517.00000      36.40000        1670.00000      28.20000                0.21290

Plotting of Viewpoints

chicPlotViewpoint can plot n viewpoints in one plot, add the mean background, show the p-value per relative distance per sample as an additional heatmap bar and highlight significant interactions or differentially expressed regions.

One viewpoint:

chicPlotViewpoint --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt --range 200000 200000 -o single_plot.png
_images/single_plot.png

Two viewpoints, background, differential expression and p-values:

chicPlotViewpoint --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt interactionFilesFolder/MB-E10-5_chr1_53118507_53118507_Mstn.txt --range 300000 300000 --pValue --differentialTestResult differentialResults/FL-E13-5_MB-E10-5_chr1_53118507_53118507_Mstn__H0_rejected.txt --backgroundModelFile background_model.txt -o differential_background_pvalue.png
_images/differential_background_pvalue.png

Two viewpoints, background, significant interactions and p-values:

chicPlotViewpoint --interactionFile interactionFilesFolder/FL-E13-5_chr1_53118507_53118507_Mstn.txt interactionFilesFolder/MB-E10-5_chr1_53118507_53118507_Mstn.txt --range 300000 300000 --pValue --significantInteractions significantFiles/FL-E13-5_chr1_53118507_53118507_Mstn__significant_interactions.txt significantFiles/MB-E10-5_chr1_53118507_53118507_Mstn__significant_interactions.txt --backgroundModelFile background_model.txt -o significant_background_pvalue.png
_images/significant_background_pvalue.png

Batch mode

The batch mode is able to plot files under the same parameter setting for multiple viewpoints. These viewpoints are given by the file created by chicViewpoint with the --writeFileNamesToFile parameter. The number of consecutive lines which should be plotted to one image can be defined using --plotSampleNumber. If the differentially expressed regions should be highlighted, setting this parameter to 2 is recommended.
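How consecutive lines of the file list are grouped can be sketched in Python. The helper group_consecutive is hypothetical and only illustrates the grouping idea; it is not the tool's code.

```python
def group_consecutive(lines, n):
    """Split a list of file names into groups of n consecutive entries."""
    return [lines[i:i + n] for i in range(0, len(lines), n)]


file_names = [
    "FL-E13-5_chr1_4487435_4487435_Sox17.txt",
    "MB-E10-5_chr1_4487435_4487435_Sox17.txt",
    "FL-E13-5_chr1_14300280_14300280_Eya1.txt",
    "MB-E10-5_chr1_14300280_14300280_Eya1.txt",
]
# With n = 2, each group holds both samples of one viewpoint
for group in group_consecutive(file_names, 2):
    print(group)
```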

For all modes the principle of a file containing the file names and a folder containing them applies for the plotting too, and using all cores available is highly recommended.

chicPlotViewpoint --interactionFile interactionFiles.txt --interactionFileFolder interactionFilesFolder/ --range 300000 300000 --pValue --significantInteractions significantFilesBatch.txt --significantInteractionFileFolder significantFiles --backgroundModelFile background_model.txt --outputFolder plots --threads 20 --batchMode

How we use HiCExplorer

To generate a Hi-C contact matrix it is necessary to perform the following basic steps:

  1. Map the Hi-C reads to the reference genome

  2. Filter the aligned reads to create a contact matrix

  3. Filter matrix bins with low or zero read coverage

  4. Remove biases from the Hi-C contact matrices

After a corrected Hi-C matrix is created, other tools can be used to visualize it, call TADs or compare it with other matrices.

Reads mapping

Mates have to be mapped individually to avoid mapper specific heuristics designed for standard paired-end libraries.

We have used the HiCExplorer successfully with bwa, bowtie2 and hisat2. However, it is important to:

  • for either bowtie2 or hisat2, use the --reorder parameter, which tells bowtie2 or hisat2 to output the SAM files in exactly the same order as in the .fastq files.

  • use local mapping, in contrast to end-to-end. A fraction of Hi-C reads are chimeric and will not map end-to-end; thus, local mapping is important to increase the number of mapped reads.

  • tune the aligner parameters to penalize deletions and insertions. This is important to avoid aligned reads with gaps if they happen to be chimeric.

# map the reads, each mate individually using
# for example bwa
#
# bwa mem mapping options:
#       -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
#       -B INT        penalty for a mismatch [4]
#       -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
#       -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1] # this is set very high to avoid gaps
#                                  at restriction sites. Setting the gap extension penalty high, produces better results as
#                                  the sequences left and right of a restriction site are mapped independently.
#       -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5] # this is set to no penalty.

$ bwa mem -A1 -B4  -E50 -L0  index_path \
    mate_R1.fastq.gz 2>>mate_R1.log | samtools view -Shb - > mate_R1.bam

$ bwa mem -A1 -B4  -E50 -L0  index_path \
    mate_R2.fastq.gz 2>>mate_R2.log | samtools view -Shb - > mate_R2.bam

Creation of a Hi-C matrix

Once the reads have been mapped, the Hi-C matrix can be built. For this, the minimal extra information required is the binSize used for the matrix. It is best to enter a small bin size such as 10000 (10 kb), because lower resolution matrices (larger bins) can easily be constructed using hicMergeMatrixBins. Matrices at restriction fragment resolution can be created by providing a file containing the restriction sites; this file can be created with the tool findRestSite, which is part of HiCExplorer.

# build matrix from independently mapped read pairs
# the restriction sequence GATC is recognized by the DpnII restriction enzyme

$ hicBuildMatrix --samFiles mate_R1.bam mate_R2.bam \
                 --binSize 10000 \
                 --restrictionSequence GATC \
                 --threads 4 \
                 --inputBufferSize 100000 \
                 --outBam hic.bam \
                 -o hic_matrix.h5 \
                 --QCfolder ./hicQC

hicBuildMatrix creates two files: a BAM file containing only the valid Hi-C read pairs and a matrix containing the Hi-C contacts at the given resolution. The BAM file is useful to check the quality of the Hi-C library in a genome browser. A good Hi-C library should contain piles of reads near the restriction fragment sites. In the QCfolder an HTML file is saved with plots containing useful information for the quality control of the Hi-C sample, such as the number of valid pairs, duplicated pairs and self-ligations. Usually, only 25%-40% of the reads are valid and used to build the Hi-C matrix, mostly because reads in repetitive regions need to be discarded.

An important quality control measurement to check is the inter-chromosomal fraction of reads, as this is an indirect measure of random Hi-C contacts. Good Hi-C libraries have fewer than 10% inter-chromosomal contacts. The hicQC module can be used to compare the QC measures from different samples.
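
As an illustration of the inter-chromosomal fraction, the following minimal Python sketch counts the read pairs whose mates map to different chromosomes. The function name and the toy input are hypothetical; hicBuildMatrix reports this value itself in its QC output, so this is only meant to make the definition concrete.

```python
def inter_chromosomal_fraction(pairs):
    """Return the fraction of read pairs whose mates map to different chromosomes.

    `pairs` is an iterable of (chromosome_of_mate_1, chromosome_of_mate_2) tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no read pairs given")
    inter = sum(1 for chrom_a, chrom_b in pairs if chrom_a != chrom_b)
    return inter / len(pairs)


# Toy example: 10 valid pairs, one of them inter-chromosomal.
toy_pairs = [("chr1", "chr1")] * 5 + [("chr2", "chr2")] * 4 + [("chr1", "chr2")]
print(inter_chromosomal_fraction(toy_pairs))  # 0.1
```

A value well above 0.1 for a real library would, following the rule of thumb above, hint at a high share of random contacts.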

Correction of Hi-C matrix

The Hi-C matrix has to be corrected to remove GC and open chromatin biases and, most importantly, to normalize the number of restriction sites per bin. Because a fraction of bins from repetitive regions contain few contacts, it is necessary to filter those regions first. Also, in mammalian genomes some regions enriched in reads should be discarded. To aid in the filtering of regions, hicCorrectMatrix generates a diagnostic plot as follows:

$ hicCorrectMatrix diagnostic_plot -m hic_matrix.h5 -o hic_corrected.png

The plot should look like this:

_images/diagnostic_plot.png

Histogram of the number of counts per bin.

For the upper threshold, it is only important to remove very high outliers, so a value of 5 could be used. For the lower threshold, a value between -2 and -1 is recommended. What is not desired is to try to correct low-count bins, which could simply result in an amplification of noise. The upper threshold is less of a concern because those bins will be scaled down.

Once the thresholds have been decided, the matrix can be corrected:

# correct Hi-C matrix
$ hicCorrectMatrix correct -m hic_matrix.h5 --filterThreshold -1.5 5 -o hic_corrected.h5
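
To make the effect of a lower and an upper threshold concrete, here is a minimal Python sketch. It assumes that the thresholds are applied to modified z-scores (based on the median and the median absolute deviation) of the total counts per bin; this is an assumption made for illustration, the function names are hypothetical, and the code is not taken from hicCorrectMatrix.

```python
import statistics


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (value - median) / MAD."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    return [0.6745 * (v - median) / mad for v in values]


def bins_to_keep(counts_per_bin, lower=-1.5, upper=5.0):
    """Return the indices of bins whose score lies within [lower, upper]."""
    scores = modified_z_scores(counts_per_bin)
    return [i for i, z in enumerate(scores) if lower <= z <= upper]


# Toy example: bin 5 has almost no contacts, bin 6 is an extreme outlier.
toy_counts = [100, 110, 90, 105, 95, 2, 5000]
print(bins_to_keep(toy_counts))  # [0, 1, 2, 3, 4]
```

In this toy example the low-count bin is dropped by the lower threshold and the extreme bin by the upper threshold, which mirrors the reasoning given for choosing the two values.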

If multiple samples or replicates need to be normalized to the same read coverage, we recommend first computing the normalization (with hicNormalize) and then correcting the data (with hicCorrectMatrix) in a second step.
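
The idea of bringing several samples to the same read coverage can be sketched in a few lines of Python. The sketch assumes that every matrix is scaled so that its total count equals that of the sample with the smallest total; the function name is hypothetical and the code is not the hicNormalize implementation.

```python
def scale_to_smallest(matrices):
    """Scale each matrix so that its total count equals the smallest total."""
    totals = [sum(sum(row) for row in matrix) for matrix in matrices]
    if min(totals) <= 0:
        raise ValueError("every matrix needs a positive total count")
    target = min(totals)
    return [
        [[value * target / total for value in row] for row in matrix]
        for matrix, total in zip(matrices, totals)
    ]


# Toy example: the second replicate was sequenced twice as deeply.
replicate_1 = [[4, 2], [2, 4]]
replicate_2 = [[8, 4], [4, 8]]
scaled = scale_to_smallest([replicate_1, replicate_2])
print(scaled[1])  # [[4.0, 2.0], [2.0, 4.0]]
```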

Visualization of results

There are two ways to view the resulting matrix: one uses hicPlotMatrix and the other uses hicPlotTADs. The first allows visualization over large regions, while the second is preferred for viewing specific parts together with other information, for example genes or bigwig tracks.

Because of the large differences in counts found in the matrix, it is better to plot the counts using the --log1p option.

$ hicPlotMatrix -m hic_corrected.h5 -o hic_plot.png --region 1:20000000-80000000 --log1p
_images/corrected_matrix_example.png

Corrected Hi-C counts in log scale.
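
The following short Python sketch shows why a log(1 + x) transformation helps when counts span several orders of magnitude. The toy matrix and the function name are made up for illustration; the plotting itself is done by hicPlotMatrix.

```python
import math


def log1p_matrix(matrix):
    """Apply log(1 + x) to every entry; zero counts stay at zero."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Toy matrix: diagonal counts are several orders of magnitude larger
# than the off-diagonal counts.
toy_counts = [
    [10000, 120, 0],
    [120, 8000, 15],
    [0, 15, 9000],
]
for row in log1p_matrix(toy_counts):
    print([round(value, 2) for value in row])
```

After the transformation all values fall into a narrow range, so a single color scale can show both the diagonal and the weaker long-range contacts, and empty bins do not cause an undefined logarithm.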

Quality control of Hi-C data and biological replicates comparison

HiCExplorer integrates multiple tools that allow the evaluation of the quality of Hi-C libraries and matrices.

  • hicQC on the log files produced by hicBuildMatrix and control of the pdf file produced.

The proportion of useful reads is important for assessing the efficiency of the Hi-C protocol, which depends on the proportion of dangling ends detected… The proportions of inter-chromosomal, short-range and long-range contacts are important for….

  • hicPlotDistVsCounts to compare the distribution of corrected Hi-C counts in relation to the genomic distance between multiple samples. If some differences are observed between biological replicates, these can be investigated more precisely by computing log2ratio matrices.

  • hicCompareMatrices log2ratio of matrices of biological replicates to identify where the potential changes are located.

  • hicPlotPCA bins correlation of two biological replicates.
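
A log2 ratio between two replicates, as mentioned in the list above, can be sketched in Python as follows. The pseudocount that avoids division by zero is an assumption made for this example, and the function name is hypothetical; the code is not taken from hicCompareMatrices.

```python
import math


def log2_ratio(matrix_a, matrix_b, pseudocount=1.0):
    """Entry-wise log2((a + pseudocount) / (b + pseudocount))."""
    return [
        [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(matrix_a, matrix_b)
    ]


# Toy example: the replicates agree off the diagonal and differ on it.
replicate_1 = [[3, 1], [1, 7]]
replicate_2 = [[1, 1], [1, 1]]
print(log2_ratio(replicate_1, replicate_2))  # [[1.0, 0.0], [0.0, 2.0]]
```

Values near zero indicate agreement between the replicates, while positive or negative values point to the bins where the counts differ.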

TAD calling

To call TADs, a corrected matrix is needed. Restriction fragment resolution matrices provide the best results. TAD calling works in two steps: first, HiCExplorer computes a TAD-separation score based on a z-score matrix for all bins. Then the bins having a local minimum of the TAD-separation score are evaluated with respect to the surrounding bins to assign a p-value, and a cutoff is applied to select the bins more likely to be TAD boundaries.

$ hicFindTADs -m hic_corrected.h5 --outPrefix hic_corrected --numberOfProcessors 16

This command produces several files: 1. the TAD-separation score file, 2. the z-score matrix, 3. a BED file with the boundary locations, 4. a BED file with the domains, 5. a bedgraph file with the TAD-score that can be visualized in a genome browser.

The TAD-separation score and the matrix can be visualized using hicPlotTADs.

_images/chorogenome_example.jpg

Example output from hicPlotTADs from http://chorogenome.ie-freiburg.mpg.de/
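
The search for candidate boundaries at local minima of the TAD-separation score can be illustrated with a small Python sketch. It only covers the local-minimum search; the p-value computation of hicFindTADs is not reproduced. The function name and the `delta` parameter (a minimum depth of the minimum relative to its neighbors) are hypothetical and used for illustration only.

```python
def local_minima(scores, delta=0.0):
    """Return indices of bins whose score is lower than both neighbors.

    A bin is reported only if it lies at least `delta` below the lower
    of its two neighbors.
    """
    minima = []
    for i in range(1, len(scores) - 1):
        left, middle, right = scores[i - 1], scores[i], scores[i + 1]
        if middle < left and middle < right and min(left, right) - middle >= delta:
            minima.append(i)
    return minima


# Toy TAD-separation score with two deep minima and one shallow minimum.
toy_scores = [0.5, 0.2, 0.6, 0.55, 0.58, 0.1, 0.4]
print(local_minima(toy_scores))             # [1, 3, 5]
print(local_minima(toy_scores, delta=0.1))  # [1, 5]
```

The shallow minimum at index 3 disappears once a minimum depth is required, which is the kind of candidate that a significance cutoff is meant to remove.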

A / B compartment analysis

To compute the A / B compartments, the matrix needs to be transformed to an observed/expected matrix in the way Lieberman-Aiden describes it. In the next step, a Pearson correlation matrix is computed and, based on it, a covariance matrix. Finally, the eigenvectors of the covariance matrix are computed. All these steps are performed with the command:

$ hicPCA -m hic_corrected.h5 --outFileName pca1.bw pca2.bw --format bigwig

To use the intermediate matrices of this process for plotting, run:

$ hicTransform -m hic_corrected.h5 --outFileName all.h5 --method all

This creates all intermediate matrices: obs_exp_all.h5, pearson_all.h5 and covariance_all.h5.

The A / B compartments can be plotted with hicPlotMatrix.

$ hicPlotMatrix -m pearson_all.h5 --outFileName pca1.png --perChr --bigwig pca1.bw
_images/eigenvector1_lieberman.png
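
The chain of steps described in this section (observed/expected, Pearson correlation, covariance, eigenvector) can be sketched in plain Python. This is a simplified illustration under several assumptions: the expected value is taken as the mean of each diagonal, the matrix is small, dense and symmetric, and the first eigenvector is obtained by power iteration. All function names are hypothetical and the code is not the hicPCA implementation.

```python
import math


def obs_exp(matrix):
    """Divide each entry by the mean count at the same genomic distance."""
    n = len(matrix)
    expected = []
    for distance in range(n):
        diagonal = [matrix[i][i + distance] for i in range(n - distance)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def pearson(x, y):
    """Pearson correlation of two equally long vectors (0.0 if one is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def pearson_matrix(matrix):
    return [[pearson(row_a, row_b) for row_b in matrix] for row_a in matrix]


def covariance_matrix(matrix):
    """Sample covariance between the rows of the matrix (needs >= 2 columns)."""
    n = len(matrix[0])
    centered = [[v - sum(row) / n for v in row] for row in matrix]
    return [
        [sum(a * b for a, b in zip(row_a, row_b)) / (n - 1) for row_b in centered]
        for row_a in centered
    ]


def first_eigenvector(matrix, iterations=200):
    """Dominant eigenvector by power iteration (sign is arbitrary)."""
    n = len(matrix)
    vector = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break
        vector = [x / norm for x in product]
    return vector


# Toy matrix with two blocks of bins that interact mostly among themselves.
toy = [
    [10, 8, 2, 1],
    [8, 10, 3, 2],
    [2, 3, 10, 8],
    [1, 2, 8, 10],
]
eigenvector = first_eigenvector(covariance_matrix(pearson_matrix(obs_exp(toy))))
print([round(value, 3) for value in eigenvector])
```

In this kind of analysis the sign of each entry of the first eigenvector is what assigns a bin to one of the two compartments; which sign corresponds to A or B has to be decided with additional information, because the sign of an eigenvector is arbitrary.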

News and Developments

Release 3.4.2

7 March 2020

  • This release fixes the wrong naming scheme that was used in the chicModules. Most .bed files are now .txt files.

  • hicDetectLoops now has an intra-chromosomal parallelization to decrease the compute time.

  • hicPlotMatrix has three new parameters: rotationX, rotationY and fontSize to adjust the position and font size of the labels. We hope this improves readability in certain cases.

  • hicPlotMatrix: fixed a bug that occurred if the list of chromosomes was given and the last chromosome appeared as an additional label.

  • Improving and updating the documentation.

Preprint

6 March 2020

The preprint of the loop detection algorithm is online via bioRxiv: https://www.biorxiv.org/content/10.1101/2020.03.05.979096v1

Release 3.4.1

3 February 2020

  • This release fixes a bug in chicViewpoint that caused a crash if the data to be averaged is less than the window size.

Release 3.4

28 January 2020

  • Fixing a bug in the keep option of hicAdjustMatrix that occurred when a region was cut before the end of a chromosome or when the start position was not at the beginning of the chromosome

  • hicCompartmentPolarization was renamed to hicCompartmentalization and got some bug fixes

  • Extending the options for how the observed vs. expected matrix is computed and adding the parameter --ligation_factor to rescale the values as implemented in Homer. The same changes are applied to hicTransform

  • Improved the documentation

  • Adding an option in hicAverageRegions to select start, end, center or start_end as start index for up/downstream range.

  • hicBuildMatrix: Removed default value of binSize to enable mutually exclusive group error if not one of them is set. Behaviour so far was that the binSize was taken.

  • hicPlotSVL: adding xlegend to plot of SVL ratios to indicate the data points per boxplots are the chromosome ratios

  • hicQuickQC: Removed binSize option of hicQuickQC because it does not matter for QC calculation and adding a sentence to recommend the usage of restriction enzyme and dangling end sequence. Fixing bug issue #464

  • hicNormalize: Adding option in hicNormalize to remove values after the normalization if values are smaller than a given threshold

  • Capture Hi-C modules: Change background model distribution assumption from negative binomial to continuous negative binomial by using Gamma functions as a replacement for the binomial coefficient. Source: https://stats.stackexchange.com/questions/310676/continuous-generalization-of-the-negative-binomial-distribution/311927

  • hicInfo: Implementing feature request #456. The length of chromosomes is now shown in the information too
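
The replacement of the binomial coefficient by Gamma functions, mentioned in the Capture Hi-C item of this release, can be sketched as follows. The parametrization (r as the number of successes, p as the success probability) and the function name are assumptions made for this example; the code is not taken from the Capture Hi-C modules.

```python
import math


def continuous_nbinom_pmf(k, r, p):
    """Negative binomial density with Gamma functions instead of a binomial coefficient.

    The coefficient C(k + r - 1, k) is written as
    Gamma(k + r) / (Gamma(k + 1) * Gamma(r)), which is also defined for
    non-integer k. Requires k >= 0, r > 0 and 0 < p < 1.
    """
    log_coefficient = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coefficient + r * math.log(p) + k * math.log1p(-p))


# For integer k the value agrees with the classic formula:
# C(4, 2) * 0.5**3 * 0.5**2 = 6 / 32 = 0.1875
print(round(continuous_nbinom_pmf(2, 3, 0.5), 6))  # 0.1875
# Non-integer values of k are accepted as well.
print(continuous_nbinom_pmf(2.5, 3, 0.5) > 0)  # True
```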

Release 3.3.1

15 November 2019

  • Fixing a bug in the labeling of chicPlotViewpoints if the value range is counted in MB

  • Add an option to chicViewpoint to pre-compute an x-fold of p-value over the maximum value of the relative distance

Release 3.3

8 October 2019

  • Fixing many bugs:
    • A bug in hicDetectLoops if a sub-matrix was very small

    • A bug in hicPlotMatrix if the region defined by --region was only a chromosome and loops should be plotted too

    • A bug in hicPlotMatrix if a loop region should be plotted and chromosomeOrder argument was used too

    • A bug in hicAggregateContacts (issue #405) if chromosomes were present in the matrix but not in the bed file

    • A bug in hicAdjustMatrix concerning a bed file and consecutive regions, see issue #343

    • A bug in hicAdjustMatrix if a chromosome is present in the matrix but not in the bed file, see issue #397

    • A bug in hicCompartmentsPolarization in which the arguments ‘quantile’ and ‘outliers’ were interpreted as strings but should be integers

    • A bug in hicAdjustMatrix concerning the ‘keep’ option and how matrices are reordered internally. Thanks @LeilyR

  • Added features as requested:
    • hicPCA now ignores masked bins, see issue #342

    • chicPlotViewpoint:
      • Better legend handling on x-axis

      • Peaks are now displayed with their full width

      • Add option --pValueSignificantLevels to categorize the p-values in x levels (e.g. 0.001 0.05 0.1)

    • chicViewpoint:
      • adding sorting via viewpoints and not by samples option (--allViewpointsList)

    • Adding an option to hicNormalize to normalize via multiplication and a user-defined value (see issues #385, #424)

  • Rearrange hicAdjustMatrix to have a better accessibility to its functions from outside of main

  • Improving the documentation and fixing grammar / spelling mistakes. Thanks @simonbray

  • New script: hicPlotSVL to investigate short range vs long range ratios.

Release 3.2

22 August 2019

  • Adding the new captured Hi-C module. Viewpoint analysis based on a background model, significant interaction detection and differential analysis are provided.

  • Adding documentation for captured Hi-C module and a tutorial on how to use it.

  • Adding a module to quickly assess the quality of a Hi-C data set: hicQuickQC.

  • Adding a tool to merge loops of different resolutions.

  • Improving validation of locations: Presorting is no longer necessary; adding feature to add ‘chr’ prefix to loop or protein chromosome name

  • Change loop detection slightly to improve results and fixed bugs:
    • preselection p-value was ignored and only p-value was used

    • adding additional test to the peak region test to decrease false discoveries

    • exchanging pThreshold / ln(distance) to remove too low values by a share of the maximum value of the distance. New parameter ‘maximumInteractionPercentageThreshold’

  • Removal of the folder ‘scripts’ and its content. These were outdated scripts and may become part of the regular Hi-C tools in the future.

Release 3.1

9 July 2019

  • KR correction improvements: It is now able to process larger data sets like GM12878 primary+replicate on 10kb resolution.

  • Adding script for validation of loop locations with protein peak locations

  • Adding script hicCompartmentsPolarization: Rearrange the average interaction frequencies using the first PC values to represent the global compartmentalization signal

Release 3.0.2

28 June 2019

  • Pinning dependencies to:

    • hicmatrix version 9: API changes in version 10

    • krbalancing version 0.0.4: API changes in version 0.0.5

    • matplotlib version 3.0: Version 3.1 raises ‘Not implemented error’ for unknown reasons.

  • Set fit_nbinom to version 1.1: Version 1.0 used a function call that is deprecated in scipy > 1.2.

  • Small documentation fixes and improvements.

Release 3.0.1

5 April 2019

  • Fixes KR balancing correction factors

  • Deactivates log.debug

Release 3.0

3 April 2019

  • Python 3 only. Python 2.X is no longer supported

  • Additional Hi-C interaction matrix correction algorithm ‘Knight-Ruiz’ as a C++ module for a faster runtime and less memory usage.

  • Enriched regions detection tool: ‘hicDetectLoops’ based on strict candidate selection, ‘hicFindEnrichedContacts’ was deleted

  • Metadata for cooler files is supported: hicBuildMatrix and hicInfo are using it

  • New options for hicPlotMatrix: --loops to visualize computed loops from hicDetectLoops and --bigwigAdditionalVerticalAxis to display a bigwig track on the vertical axis too.

Release 2.2.3

22 March 2019

  • This bug fix release patches an issue with cooler files, hicBuildMatrix and the usage of a restriction sequence file instead of fixed bin size.

Release 2.2.2

27 February 2019

  • This bug fix release removes references to hicExport that were mistakenly left in 2.2. Thanks @BioGeek for this contribution.

Release 2.2.1

7 February 2019

  • Muting log output of matplotlib and cooler

  • Set version number of hicmatrix to 7

  • Optional parameter for hicInfo to write the result to a file instead of to the terminal

Release 2.2

18 January 2019

This release contains:

  • replaced hicExport by hicConvertFormat and hicAdjustMatrix

  • extended functionality for hicConvertFormat

    • read support for homer, hicpro, cool, h5

    • write support for h5, homer, cool

    • convert hic to cool

    • creation of mcool matrices

  • hicAdjustMatrix

    • remove, keep or mask specified regions from a file, or chromosomes

  • hicNormalize

    • normalize matrices to 0 - 1 range or to the read coverage of the lowest given

  • hicBuildMatrix

    • support for building mcool files

  • restructuring the central class HiCMatrix to object oriented model and moved to its own library: deeptools/HiCMatrix.

    • Extended read / write support for file formats

    • better (faster, less memory) support for cool format

    • removal of old, unused code

    • restrict support to h5 and cool matrices, except hicConvertFormat

  • hicFindTADs: Option to run computation per specified chromosomes

  • hicPlotTADs: removed code and calls pyGenomeTracks

  • hicAverageRegions: Sum up in a given range around defined reference points. Useful to detect changes in TAD structures between different samples.

  • hicPlotAverageRegions: Plots such an average region

  • hicTransform: Restructuring the source code, removal of option ‘all’ because it was generating confusion. Adding options ‘exp_obs’, ‘exp_obs_norm’ and ‘exp_obs_lieberman’. These three options use different expectation matrix computations.

  • hicPCA

    • Adding --norm option to compute the expected matrix in the way HOMER is doing it. Useful for Drosophila genomes

    • Adding option to write out the intermediate matrices ‘obs_exp’ and ‘pearson’ which are necessary in the computation of the PCA

  • hicPlotMatrix

    • Add option to clip bigwig values

    • Add option to scale bigwig values

  • Removed hicLog2Ration, functionality is covered by hicCompareMatrices

  • Extending test cases to cover more source code and be hopefully more stable.

  • Many small bugfixes

Publication

13 June 2018

We are proud to announce our latest publication:

Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. “Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization”, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504

Release 2.1.4

25 May 2018

  • cooler file format correction factors are applied as they should be

  • parameter ‘--region’ of hicBuildMatrix works with Python 3

Release 2.1.3

7 May 2018

The third bugfix release of version 2.1 corrects an error in hicPlotViewpoint. It adds a feature requested in issue #169 which should have been included in release 2.1 but was accidentally omitted.

From 2.1 release note: hicPlotViewpoint: Adds a feature to plot multiple matrices in one image

Release 2.1.2

26 April 2018

The second bug fix release of 2.1 includes:

  • documentation improvements

  • fixing broken Readthedocs documentation

  • Small bug fix concerning hicPlotMatrix and cooler: --chromosomeOrder is now possible with more than one chromosome

  • Small fixes concerning updated dependencies: Pinning version numbers more specifically and relaxing the delta values in test cases.

Release 2.1.1

27 March 2018

This release fixes a problem related to python3 in which chromosome names were of bytes type

Release 2.1

5 March 2018

The 2.1 version of HiCExplorer comes with new features and bugfixes.

  • Adding the new feature hicAggregateContacts: A tool that allows plotting of aggregated Hi-C sub-matrices of a specified list of positions.

  • Many improvements to the documentation and the help text. Thanks to Gina Renschler and Gautier Richard from the MPI-IE Freiburg, Germany.

  • hicPlotMatrix

    • supports only bigwig files for an additional data track.

    • the argument --pca was renamed to --bigwig

    • Smoothing the bigwig values to neighboring bins if no data is present there

    • Fixes to a bug concerning a crash of tight_layout

    • Adding the possibility to flip the sign of the values of the bigwig track

    • Adding the possibility to scale the values of the bigwig track

  • hicPlotViewpoint: Adds a feature to plot multiple matrices in one image

  • cooler file format

    • supports mcool files

    • applies correction factors if present

    • optionally reads bin[‘weight’]

  • fixes

    • a crash in hicPlotTADs if horizontal lines were used

    • checks if all characters of a title are ASCII. If not they are converted to the closest looking one.

  • Updated and pinned the version numbers of the dependencies

Release 2.0

December 21, 2017

This release makes HiCExplorer ready for the future:

  • Python 3 support

  • Cooler file format support

  • A/B compartment analysis

  • Improved visualizations

  • bug fixes for --perChr option in hicPlotMatrix

  • eigenvector track with --pca for hicPlotMatrix

  • visualization of interactions around a reference point or region with hicPlotViewpoint

  • Higher test coverage

  • re-licensing from GPLv2 to GPLv3

Release 1.8.1

November 27, 2017

Bug fix release:

  • a fix concerning the handling of chimeric alignments in hicBuildMatrix. Thanks to Aleksander Jankowski @ajank

  • handling of dangling ends was too strict

  • improved help message in hicBuildMatrix

Release 1.8

October 25, 2017

This release is adding new features and fixes many bugs:

  • hicBuildMatrix: Added multicore support, new parameters --threads and --inputBufferSize

  • hicFindTADs:

    • One call instead of two: hicFindTADs TAD_score and hicFindTADs find_TADs merged to hicFindTADs.

    • New multiple correction method supported: False discovery rate. Call it with --correctForMultipleTesting fdr and --threshold 0.05.

  • Update of the tutorial: mES-HiC analysis.

  • Additional test cases and docstrings to improve the software quality

  • Fixed a bug occurring with bigwig files with frequent NaN values which resulted in only NaN averages

  • hicPlotTADs: Support for plotting points

  • Moved galaxy wrappers to https://github.com/galaxyproject/tools-iuc

  • Fixed multiple bugs with saving matrices

  • hicCorrelate: Changes direction of dendrograms to left

Release 1.7.2

April 3, 2017

  • Added option to plot bigwig files as a line in hicPlotTADs

  • Updated documentation

  • Improved hicPlotMatrix --region output

  • Added compressed matrices. In our tests the compressed matrices are significantly smaller.

Release 1.7

March 28, 2017

This release adds a quality control module to check the results from hicBuildMatrix. By default, hicBuildMatrix now generates an HTML page containing the plots from the QC measures. The results from several runs of hicBuildMatrix can be combined in one page using the new tool hicQC.

Also, this release added a module called hicCompareMatrices that takes two Hi-C matrices and computes the difference, the ratio or the log2 ratio. The resulting matrix can be plotted with hicPlotMatrix to visualize the changes.

Preprint introducing HiCExplorer is now online

March 8, 2017

Our bioRxiv preprint on DNA sequences behind Fly genome architecture is online!

Read the article here : http://biorxiv.org/content/early/2017/03/08/115063

In this article, we introduce HiCExplorer : Our easy to use tool for Hi-C data analysis, also available in Galaxy.

We also introduce HiCBrowser : A standalone software to visualize Hi-C along with other genomic datasets.

Based on HiCExplorer and HiCBrowser, we built a useful resource for anyone to browse and download the chromosome conformation datasets in Human, Mouse and Flies. It’s called the chorogenome navigator

Along with these resources, we present an analysis of DNA sequences behind 3D genome of Flies. Using high-resolution Hi-C analysis, we find a set of DNA motifs that characterize TAD boundaries in Flies and show the importance of these motifs in genome organization.

We hope that these resources and analysis would be useful for the community and welcome any feedback.

HiCExplorer wins best poster prize at VizBi2016

March 20, 2016

We are excited to announce that HiCExplorer has won the NVIDIA Award for Best Scientific Poster in VizBi2016, the international conference on visualization of biological data.

Read more here

This was our poster :

HiCExplorer

Citation

Please cite HiCExplorer as follows:

Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening, Kin Chung Lam, Bianca Habermann, Asifa Akhtar, Thomas Manke. “High-resolution TADs reveal DNA sequences underlying genome organization in flies”. Nature Communications, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w

Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: https://doi.org/10.1093/nar/gky504

_images/logo_mpi-ie.jpg

This tool suite is developed by the Bioinformatics Unit at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg and by the Bioinformatics Lab of the Albert-Ludwigs-University Freiburg, Germany.