hicCorrectMatrix

Iterative correction for a Hi-C matrix (see Imakaev et al. 2012, Nature Methods, for details). For the method to work correctly, bins with too low or too high coverage need to be filtered out. It is recommended to first generate the diagnostic plot to determine the modified z-score cutoffs.

It is recommended to run hicCorrectMatrix as follows:

$ hicCorrectMatrix diagnostic_plot --matrix hic_matrix -o plot_file.png

Then, after reviewing the plot and deciding on the threshold values:

$ hicCorrectMatrix correct --matrix hic_matrix --filterThreshold <lower threshold> <upper threshold> -o corrected_matrix

usage: hicCorrectMatrix [-h] [--version] [--verbose]  ...

Named Arguments

--version show program’s version number and exit
--verbose

Print processing status

Default: False

commands

Possible choices: correct, diagnostic_plot

To get detailed help on each of the options:

$ hicCorrectMatrix diagnostic_plot -h

$ hicCorrectMatrix correct -h

Sub-commands:

correct

Run the iterative correction.

hicCorrectMatrix correct --matrix hic_matrix.h5 --filterThreshold -1.2 5 -o corrected_matrix.h5
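To make the idea behind the iterative correction concrete, here is a minimal pure-Python sketch of the balancing scheme described by Imakaev et al. It is an illustration only: the function name `iterative_correction`, the `tol` parameter, and the toy matrix are assumptions made for this example, and the page does not confirm that hicCorrectMatrix is implemented this way internally. The `max_iter` default mirrors the documented default of 500 iterations.

```python
def iterative_correction(matrix, max_iter=500, tol=1e-5):
    """Balance a symmetric contact matrix so that all non-empty rows
    end up with (nearly) the same sum. Returns (corrected, bias)."""
    n = len(matrix)
    corrected = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n  # accumulated per-bin scaling factors

    for _ in range(max_iter):
        row_sums = [sum(row) for row in corrected]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:
            break  # nothing to balance
        mean_sum = sum(nonzero) / len(nonzero)

        # Relative coverage of each bin; empty bins keep a factor of 1.
        factors = [s / mean_sum if s > 0 else 1.0 for s in row_sums]

        # Divide every contact by the factors of both bins involved.
        for i in range(n):
            for j in range(n):
                corrected[i][j] /= factors[i] * factors[j]

        bias = [b * f for b, f in zip(bias, factors)]

        # Stop once no bin needed more than a tiny adjustment.
        if max(abs(f - 1.0) for f in factors) < tol:
            break

    return corrected, bias


# Hypothetical 3-bin symmetric matrix, for illustration only.
raw = [
    [10.0, 4.0, 2.0],
    [4.0, 8.0, 6.0],
    [2.0, 6.0, 20.0],
]
corrected, bias = iterative_correction(raw)
row_sums = [sum(row) for row in corrected]
print(row_sums)
```

After convergence the row sums are approximately equal, and each original count can be recovered as `corrected[i][j] * bias[i] * bias[j]`. Bins with extreme coverage would receive extreme bias factors, which is why such bins are filtered before the correction is run.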

Named Arguments

--matrix, -m Hi-C matrix.
--iterNum, -n

Number of iterations

Default: 500

--outFileName, -o
 File name to save the resulting matrix. The output is a .h5 file.
--filterThreshold, -t
 Bins of low or high coverage need to be removed. Usually they do not contain valid Hi-C data or represent regions that accumulate reads. Use hicCorrectMatrix diagnostic_plot to identify the modified z-score thresholds. A lower and an upper threshold are required, separated by a space, e.g. --filterThreshold -1.5 5
--inflationCutoff
 Value corresponding to the maximum number of times a bin can be scaled up during the iterative correction. For example, an inflationCutoff of 3 will filter out all bins that were expanded 3 times or more during the iterative correction.
--transCutoff, -transcut
 Clip high counts in the top -transcut trans regions (i.e. between chromosomes). A usual value is 0.05.
--sequencedCountCutoff
 Each bin receives a value indicating the fraction that is covered by reads. A cutoff of 0.5 will discard all those bins that have less than half of the bin covered.
--chromosomes List of chromosomes to be included in the iterative correction. The order of the given chromosomes will then be kept in the resulting corrected matrix.
--skipDiagonal, -s

If set, diagonal counts are not included

Default: False

--perchr

Normalize each chromosome separately

Default: False

--verbose

Print processing status

Default: False

--version show program’s version number and exit
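As a rough illustration of what the two values passed to the filter threshold option mean, the sketch below keeps only the bins whose modified z-score lies between a lower and an upper threshold. The helper name `bins_to_keep`, the example scores, and the choice to treat the thresholds as inclusive are assumptions made for this example; the exact boundary handling inside hicCorrectMatrix is not stated on this page.

```python
def bins_to_keep(z_scores, lower, upper):
    """Return the indices of bins whose modified z-score lies within
    [lower, upper]. Inclusive bounds are an assumption of this sketch."""
    return [i for i, z in enumerate(z_scores) if lower <= z <= upper]


# Hypothetical modified z-scores for six bins, for illustration only.
z_scores = [-3.1, -0.4, 0.0, 0.7, 2.2, 6.5]

# Same threshold values as in the option's own example: -1.5 and 5.
kept = bins_to_keep(z_scores, lower=-1.5, upper=5)
removed = [i for i in range(len(z_scores)) if i not in kept]
print(kept, removed)
```

In this toy input the first bin (very low coverage) and the last bin (very high coverage) fall outside the thresholds and would be excluded before the correction.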

diagnostic_plot

Plots a histogram of the coverage per bin together with the modified z-score based on the median absolute deviation method (see Boris Iglewicz and David Hoaglin 1993, Volume 16: How to Detect and Handle Outliers, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor).

hicCorrectMatrix diagnostic_plot --matrix hic_matrix.h5 -o file.png
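The modified z-score shown in the diagnostic plot can be sketched in a few lines of Python. This follows the usual Iglewicz and Hoaglin definition, 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation. The function name and the coverage values are hypothetical, and whether hicCorrectMatrix transforms the coverage before scoring is not stated on this page, so treat this as an illustration of the statistic rather than of the tool's exact computation.

```python
from statistics import median


def modified_z_scores(values):
    """Return the modified z-score of each value, using the median
    and the median absolute deviation (MAD) instead of mean and SD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    # 0.6745 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical per-bin coverage (total contacts per bin), illustration only.
coverage = [100, 110, 90, 105, 95, 5, 900]
scores = modified_z_scores(coverage)
for cov, score in zip(coverage, scores):
    print(cov, round(score, 2))
```

Because the median and the MAD are barely affected by a few extreme bins, the bins with coverage 5 and 900 get scores far from zero while the typical bins stay close to it, which is what makes the score useful for choosing thresholds.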

Named Arguments

--matrix, -m Hi-C matrix.
--plotName, -o File name to save the diagnostic plot.
--chromosomes List of chromosomes to be included in the iterative correction. The order of the given chromosomes will then be kept in the resulting corrected matrix.
--xMax Max value for the x-axis in counts per bin
--perchr

Compute histogram per chromosome. For samples from cells with uneven number of chromosomes and/or translocations it is advisable to check the histograms per chromosome to find the most conservative filterThreshold.

Default: False