hicCorrectMatrix

Iterative correction for a Hi-C matrix (see Imakaev et al. 2012, Nature Methods, for details). For the method to work correctly, bins with too low or too high coverage need to be filtered out. For this, it is recommended to first run some diagnostic plots to determine the modified z-score cutoff.
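
The core idea of iterative correction, as described by Imakaev et al. (2012), is to repeatedly divide every contact by the relative coverage of its two bins until all non-empty bins have the same total. The pure-Python sketch below illustrates this on a small dense matrix. It is not the hicCorrectMatrix implementation (which works on sparse matrices); the function name, the tolerance-based stopping rule and the example matrix are assumptions for illustration. `max_iter` plays the role of `--iterNum` and `skip_diagonal` that of `--skipDiagonal`.

```python
def iterative_correction(matrix, max_iter=500, tolerance=1e-5, skip_diagonal=False):
    """Balance a symmetric contact matrix so that all non-empty bins get the same total.

    Returns (corrected, bias) with corrected[i][j] == matrix[i][j] / (bias[i] * bias[j])
    (for off-diagonal entries only when skip_diagonal is set).
    """
    n = len(matrix)
    # Work on a float copy so the input matrix is left unchanged.
    w = [[float(v) for v in row] for row in matrix]
    if skip_diagonal:
        # Hypothetical analogue of --skipDiagonal: ignore self-contacts.
        for i in range(n):
            w[i][i] = 0.0
    bias = [1.0] * n
    for _ in range(max_iter):  # max_iter plays the role of --iterNum
        row_sums = [sum(row) for row in w]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins keep a factor of 1.
        factors = [s / mean if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                w[i][j] /= factors[i] * factors[j]
            bias[i] *= factors[i]
        # Stop once every bin's coverage is (nearly) equal to the mean.
        if max(abs(f - 1.0) for f in factors) < tolerance:
            break
    return w, bias


raw = [[4, 3, 1],
       [3, 6, 2],
       [1, 2, 5]]
corrected, bias = iterative_correction(raw)
print([round(sum(row), 4) for row in corrected])  # row sums are now (nearly) equal
```

Keeping the cumulative `bias` vector makes it possible to map corrected values back to raw counts, which is also what makes the inflation check on bins possible.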

It is recommended to run hicCorrectMatrix as follows:

$ hicCorrectMatrix diagnostic_plot --matrix hic_matrix -o plot_file.png

Then, after inspecting the plot and deciding on the threshold values:

$ hicCorrectMatrix correct --matrix hic_matrix --filterThreshold <lower threshold> <upper threshold> -o corrected_matrix

usage: hicCorrectMatrix [-h] [--version] [--verbose]  ...
optional arguments
--version show program’s version number and exit
--verbose=False
 Print processing status
commands

To get detailed help on each of the sub-commands:

$ hicCorrectMatrix diagnostic_plot -h
$ hicCorrectMatrix correct -h

Possible choices: correct, diagnostic_plot

Sub-commands:
correct

Run the iterative correction.

usage: hicCorrectMatrix correct --matrix hic_matrix.h5 --filterThreshold -1.2 5 -o corrected_matrix.h5
optional arguments
--matrix, -m Hi-C matrix.
--iterNum=500, -n=500
 Number of iterations.
--outFileName, -o
 File name to save the resulting matrix. The output is a .h5 file.
--filterThreshold, -t
 Bins of low coverage or large coverage need to be removed. Usually they do not contain valid Hi-C data or represent regions that accumulate reads. Use hicCorrectMatrix diagnostic_plot to identify the modified z-score thresholds. A lower and an upper threshold, separated by a space, are required, e.g. --filterThreshold -1.5 5
--inflationCutoff
 Value corresponding to the maximum number of times a bin can be scaled up during the iterative correction. For example, an inflation cutoff of 3 will filter out all bins that were expanded 3 times or more during the iterative correction.
--transCutoff, -transcut
 Clip high counts in the top -transcut trans regions (i.e. contacts between chromosomes). A typical value is 0.05.
--sequencedCountCutoff
 Each bin receives a value indicating the fraction of the bin that is covered by reads. A cutoff of 0.5 will discard all bins that have less than half of their length covered.
--chromosomes List of chromosomes to be included in the iterative correction. The order of the given chromosomes will then be kept for the resulting corrected matrix.
--skipDiagonal=False, -s=False
 If set, diagonal counts are not included
--perchr=False Normalize each chromosome separately
--verbose=False
 Print processing status
--version show program’s version number and exit
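
The two bin filters described above can be sketched together: bins whose modified z-score of coverage falls outside the `--filterThreshold` window are dropped, and so are bins whose covered fraction is below `--sequencedCountCutoff`. This is a minimal sketch under stated assumptions: the helper names are hypothetical, coverage is scored on raw counts per bin, and hicCorrectMatrix may transform coverage or order these steps differently.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score (Iglewicz and Hoaglin): 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def bins_to_keep(coverage, covered_fraction, lower, upper, sequenced_cutoff=None):
    """Hypothetical helper: indices of bins that pass both filters."""
    keep = []
    for i, score in enumerate(modified_z_scores(coverage)):
        if not lower <= score <= upper:
            continue  # outside the lower/upper z-score window (--filterThreshold)
        if sequenced_cutoff is not None and covered_fraction[i] < sequenced_cutoff:
            continue  # too little of the bin is covered by reads (--sequencedCountCutoff)
        keep.append(i)
    return keep


coverage = [10, 12, 11, 13, 100, 1]           # counts per bin (made-up example)
covered = [1.0, 0.9, 0.3, 1.0, 1.0, 1.0]      # fraction of each bin covered by reads
print(bins_to_keep(coverage, covered, lower=-1.5, upper=5, sequenced_cutoff=0.5))  # [0, 1, 3]
```

The median and MAD are used instead of the mean and standard deviation so that a few extreme bins (such as bin 4 here) do not shift the scores of all other bins.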
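
For `--inflationCutoff`, one way to express "expanded N times or more" is to compare each bin's row sum after correction with its row sum before correction. This is an assumption made for illustration; the exact definition used by hicCorrectMatrix is not stated here, and the function name is hypothetical.

```python
def inflated_bins(raw_row_sums, corrected_row_sums, inflation_cutoff):
    """Hypothetical helper: bins whose coverage grew by inflation_cutoff or more.

    Assumes the expansion of a bin is measured as corrected row sum / raw row sum.
    Bins with no raw coverage are skipped, since their ratio is undefined.
    """
    flagged = []
    for i, (before, after) in enumerate(zip(raw_row_sums, corrected_row_sums)):
        if before > 0 and after / before >= inflation_cutoff:
            flagged.append(i)
    return flagged


raw_sums = [100, 10, 50, 0]
corrected_sums = [60, 35, 55, 0]
print(inflated_bins(raw_sums, corrected_sums, 3))  # [1]: bin 1 grew 3.5 times
```

Bins that need to be scaled up this strongly are typically bins with very few raw reads, so their corrected values mostly amplify noise.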
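
The `--transCutoff` option clips the highest trans counts, i.e. contacts whose two bins lie on different chromosomes. The sketch below clips the top `top_fraction` of trans values down to the highest remaining trans value and never touches cis contacts. It assumes a fraction; the option text above does not say whether `-transcut` is a fraction or a percentage, and both the clipping rule and the function name are illustrative assumptions.

```python
def clip_trans(matrix, chrom_of_bin, top_fraction):
    """Return a copy of matrix with the highest trans counts clipped.

    Trans entries are those whose two bins lie on different chromosomes.
    The top `top_fraction` of trans values are lowered to the highest
    remaining trans value; cis entries are never modified.
    """
    n = len(matrix)
    clipped = [row[:] for row in matrix]
    trans = sorted(matrix[i][j] for i in range(n) for j in range(n)
                   if chrom_of_bin[i] != chrom_of_bin[j])
    n_clip = min(int(len(trans) * top_fraction), len(trans) - 1)
    if n_clip <= 0:
        return clipped
    ceiling = trans[len(trans) - n_clip - 1]
    for i in range(n):
        for j in range(n):
            if chrom_of_bin[i] != chrom_of_bin[j] and clipped[i][j] > ceiling:
                clipped[i][j] = ceiling
    return clipped


contacts = [[5, 4, 1, 2],
            [4, 6, 3, 9],
            [1, 3, 7, 2],
            [2, 9, 2, 8]]
chroms = ["chr1", "chr1", "chr2", "chr2"]
print(clip_trans(contacts, chroms, 0.25))  # the two 9s between chr1 and chr2 become 3
```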
diagnostic_plot

Plots a histogram of the coverage per bin together with the modified z-score based on the median absolute deviation method (see Boris Iglewicz and David Hoaglin 1993, Volume 16: How to Detect and Handle Outliers, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor).
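
The modified z-score of Iglewicz and Hoaglin is M_i = 0.6745 * (x_i - median) / MAD, where MAD is the median of the absolute deviations from the median. The sketch below computes it for per-bin coverage values and also inverts it, which can help translate a z-score threshold read off the plot back into counts per bin. The function names and example values are hypothetical, and the sketch does not claim to reproduce how hicCorrectMatrix transforms coverage before scoring.

```python
from statistics import median

MAD_CONSTANT = 0.6745  # scaling constant from Iglewicz and Hoaglin


def mad_stats(values):
    """Return (median, MAD) of the coverage values."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return med, mad


def modified_z(value, med, mad):
    """Modified z-score of a single coverage value."""
    return MAD_CONSTANT * (value - med) / mad


def z_to_coverage(z, med, mad):
    """Invert the modified z-score: the coverage that has score z."""
    return med + z * mad / MAD_CONSTANT


coverage = [10, 12, 11, 13, 100, 1]  # counts per bin (made-up example)
med, mad = mad_stats(coverage)
# Coverage values corresponding to the thresholds -1.5 and 5
print(z_to_coverage(-1.5, med, mad), z_to_coverage(5, med, mad))
```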

usage: hicCorrectMatrix diagnostic_plot --matrix hic_matrix.h5 -o file.png
optional arguments
--matrix, -m Hi-C matrix.
--plotName, -o File name to save the diagnostic plot.
--chromosomes List of chromosomes to be included in the iterative correction. The order of the given chromosomes will then be kept for the resulting corrected matrix.
--xMax Maximum value for the x-axis, in counts per bin.
--perchr=False Compute a histogram per chromosome. For samples from cells with an uneven number of chromosomes and/or translocations, it is advisable to check the histograms per chromosome to find the most conservative `filterThreshold`.