hicDetectLoops
hicDetectLoops can detect enriched interaction regions (peaks / loops) based on a strict candidate selection, negative binomial distributions and Wilcoxon rank-sum tests.
The algorithm was mainly developed on GM12878 cells from Rao 2014 at 10 kb and 5 kb fixed bin-size resolution.
Example usage
$ hicDetectLoops -m matrix.cool -o loops.bedgraph --maxLoopDistance 2000000 --windowSize 10 --peakWidth 6 --pValuePreselection 0.05 --pValue 0.05
The candidate selection first restricts the maximum genomic distance, here 2 Mb, a value taken from Rao 2014. For each genomic distance, a continuous negative binomial distribution is fitted, and only interaction pairs with a p-value less than --pValuePreselection are accepted.
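The per-distance preselection can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the hicDetectLoops implementation: it fits a discrete negative binomial per diagonal with the method of moments (the tool uses a continuous variant whose fitting procedure may differ), works on a small dense list-of-lists matrix of raw counts, and takes each pixel's p-value as P(X >= count). The helper names fit_negative_binomial, nb_survival and preselect_candidates are hypothetical. The distance limit is expressed in bins, so --maxLoopDistance 2000000 on a 10 kb matrix corresponds to 200 bins.

```python
import math


def fit_negative_binomial(values):
    """Method-of-moments fit of NB(r, p); returns None if not overdispersed."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if mean == 0 or var <= mean:
        return None  # a negative binomial needs variance > mean
    p = mean / var                  # since mean / var = p for NB(r, p)
    r = mean * p / (1.0 - p)        # since mean = r * (1 - p) / p
    return r, p


def nb_survival(x, r, p):
    """P(X >= x) for a negative binomial with parameters r and p."""
    cdf = 0.0
    for k in range(int(x)):
        log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                   + r * math.log(p) + k * math.log(1.0 - p))
        cdf += math.exp(log_pmf)
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding errors


def preselect_candidates(matrix, max_distance_bins, p_threshold):
    """Return (i, j) pixels whose per-distance NB p-value is below p_threshold.

    Assumption: matrix is a dense, symmetric list of lists of raw counts.
    """
    n = len(matrix)
    candidates = []
    for d in range(1, min(max_distance_bins, n - 1) + 1):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        fit = fit_negative_binomial(diagonal)
        if fit is None:
            continue  # this distance cannot be modeled, skip it
        r, p = fit
        for i, count in enumerate(diagonal):
            if nb_survival(count, r, p) < p_threshold:
                candidates.append((i, i + d))
    return candidates
```

Because every diagonal gets its own fit, a count that is unremarkable close to the main diagonal can still be selected at a larger distance, where contacts are rarer.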
In a second step, each candidate is compared to its neighborhood, which is defined by the --windowSize parameter in the x and y dimension. Only one candidate is kept per neighborhood: the one with the highest peak value. As a last step, the neighborhood is split into a peak region and a background region (parameter --peakWidth). The peakWidth must never be larger than the windowSize; for 10 kb matrices we recommend a windowSize of 10 and a peakWidth of 6.
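The two neighborhood steps can be sketched in plain Python as follows. This is a simplified illustration, not the tool's code: candidates are (x, y) bin indices into a small dense matrix, reduce_to_neighborhood_maxima greedily keeps the strongest candidate and drops any weaker one closer than windowSize bins in both x and y, and split_peak_background uses half-open squares so that the window and peak regions hold (2 * windowSize)^2 and (2 * peakWidth)^2 bins, matching the sizes stated for --windowSize and --peakWidth. The background here is simply the window minus the peak square; the donut layout proposed by HiCCUPS may shape the background differently, which this sketch does not attempt to reproduce. Both function names are hypothetical.

```python
def reduce_to_neighborhood_maxima(candidates, matrix, window_size):
    """Keep only the strongest candidate within each neighborhood."""
    # Visit candidates from the highest to the lowest matrix value.
    ordered = sorted(candidates, key=lambda c: matrix[c[0]][c[1]], reverse=True)
    accepted = []
    for x, y in ordered:
        # Two pixels share a neighborhood if they are closer than window_size
        # bins in both x and y (Chebyshev distance < window_size).
        if all(max(abs(x - ax), abs(y - ay)) >= window_size for ax, ay in accepted):
            accepted.append((x, y))
    return accepted


def split_peak_background(matrix, x, y, peak_width, window_size):
    """Return (peak_values, background_values) around pixel (x, y).

    Half-open squares [x - w, x + w) x [y - w, y + w) are used, clipped at
    the matrix border.
    """
    if peak_width > window_size:
        raise ValueError("peakWidth must not be larger than windowSize")
    n = len(matrix)
    peak, background = [], []
    for i in range(max(0, x - window_size), min(n, x + window_size)):
        for j in range(max(0, y - window_size), min(n, y + window_size)):
            in_peak = (x - peak_width <= i < x + peak_width
                       and y - peak_width <= j < y + peak_width)
            (peak if in_peak else background).append(matrix[i][j])
    return peak, background
```

With the recommended windowSize of 10 and peakWidth of 6, a candidate away from the matrix border gets 144 peak bins and 400 - 144 = 256 background bins.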
With version 3.5, a major revision of this tool was published. The biggest changes are:
- The introduction of an observed/expected matrix as one of the first steps. All subsequent calculations are based on it and therefore need no distance-dependent factor, which makes the results more accurate.
- Testing the peak region against the background region with the donut layout proposed by HiCCUPS and a Wilcoxon rank-sum test. The Anderson-Darling test was removed.
- Improved handling of the parallelization and a rewrite of the merging of candidates within one neighborhood, resulting in faster execution and lower memory demand.
- Loading only the interactions within the range of --maxLoopDistance. This is possible with HiCMatrix version 13 and results in faster loading and a lower peak memory usage. This improvement applies only to cool matrices; h5 matrices do not benefit from it.
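The observed/expected transformation can be sketched in plain Python for the mean and mean_nonzero choices of --expected. This is a minimal illustration on a small dense symmetric matrix, not the HiCExplorer implementation; mean_nonzero_ligation is not sketched because its ligation-factor correction is not described here. The function names are hypothetical, and pixels whose expected value is 0 are set to 0.0 as an assumption.

```python
def expected_per_distance(matrix, method="mean"):
    """Expected contact value for every genomic distance d (in bins)."""
    if method not in ("mean", "mean_nonzero"):
        raise ValueError("only 'mean' and 'mean_nonzero' are sketched here")
    n = len(matrix)
    expected = {}
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        if method == "mean_nonzero":
            diagonal = [v for v in diagonal if v != 0]  # ignore empty pixels
        expected[d] = sum(diagonal) / len(diagonal) if diagonal else 0.0
    return expected


def observed_over_expected(matrix, method="mean"):
    """Divide every pixel by the expected value of its distance."""
    expected = expected_per_distance(matrix, method)
    n = len(matrix)
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)]
            for i in range(n)]
```

After this step, a value of 1.0 means "as many contacts as expected at this distance", which is why the later steps can work without a distance-dependent factor and why a fixed cut-off such as --obsExpThreshold 1.5 is meaningful at every distance.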
hicDetectLoops has many parameters, and it can be quite difficult to find the best parameter setting for your data. Therefore, version 3.5 introduces two new tools, hicHyperoptDetectLoops and hicHyperoptDetectLoopsHiCCUPS. The first one searches for the optimal hicDetectLoops parameter setting for your data. If you want to compare the results to Juicer HiCCUPS, the second tool provides a parameter search for it. Please note that HiCCUPS and its dependencies are not provided by HiCExplorer and must be installed by the user.
The output file (-o loops.bedgraph) contains the x and y positions (chromosome, start, end) of each loop and the corresponding p-value of the statistical test, for example:
1 120000000 122500000 1 145000000 147500000 0.001
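For downstream analysis, the loop file can be read with a few lines of plain Python. This sketch assumes seven whitespace-separated columns per line exactly as in the example line above (x chromosome, start, end, y chromosome, start, end, p-value); note that this is not the four-column layout of a standard bedGraph file. The Loop record and the read_loops and significant helpers are hypothetical, and skipping lines that start with "#" is a defensive assumption.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    chrom_x: str
    start_x: int
    end_x: int
    chrom_y: str
    start_y: int
    end_y: int
    p_value: float


def read_loops(lines):
    """Parse hicDetectLoops output lines into Loop records."""
    loops = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        cx, sx, ex, cy, sy, ey, pv = fields
        loops.append(Loop(cx, int(sx), int(ex), cy, int(sy), int(ey), float(pv)))
    return loops


def significant(loops, threshold):
    """Keep only loops whose p-value is below threshold."""
    return [loop for loop in loops if loop.p_value < threshold]
```

Usage: `with open("loops.bedgraph") as fh: loops = read_loops(fh)`, then, for example, `significant(loops, 0.01)`.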
The results can be visualized with hicPlotMatrix:
$ hicPlotMatrix -m matrix.cool -o plot.png --log1p --region 1:18000000-22000000 --loops loops.bedgraph
Computes enriched regions (peaks) or long range contacts on the given contact matrix.
usage: hicDetectLoops --matrix MATRIX --outFileName OUTFILENAME
[--peakWidth PEAKWIDTH] [--windowSize WINDOWSIZE]
[--pValuePreselection PVALUEPRESELECTION]
[--peakInteractionsThreshold PEAKINTERACTIONSTHRESHOLD]
[--obsExpThreshold OBSEXPTHRESHOLD] [--pValue PVALUE]
[--maxLoopDistance MAXLOOPDISTANCE]
[--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
[--threads THREADS]
[--threadsPerChromosome THREADSPERCHROMOSOME]
[--expected {mean,mean_nonzero,mean_nonzero_ligation}]
[--help] [--version]
Required arguments
- --matrix, -m
The matrix to compute the loop detection on.
- --outFileName, -o
Outfile name to store the detected loops. The file will be in bedgraph format.
Optional arguments
- --peakWidth, -pw
The width of the peak region in bins. The square around the peak will include (2 * peakWidth)^2 bins (Default: 2).
Default: 2
- --windowSize, -w
The window size for the neighborhood region the peak is located in. All values from this region (excluding the values from the peak region) are tested against the peak region for a significant difference. The square will have the size of (2 * windowSize)^2 bins (Default: 5).
Default: 5
- --pValuePreselection, -pp
Only candidates with p-values less than the given threshold will be considered as candidates. For each genomic distance, a negative binomial distribution is fitted, and for each pixel a p-value is derived from the cumulative distribution function. This does NOT influence the p-value for the neighborhood testing. Can be a single value or a threshold file created by hicCreateThresholdFile (Default: 0.1).
Default: 0.1
- --peakInteractionsThreshold, -pit
The minimum number of interactions a detected peak needs to have to be considered (Default: 10).
Default: 10
- --obsExpThreshold, -oet
The minimum obs/exp value a detected peak needs to have to be considered (Default: 1.5).
Default: 1.5
- --pValue, -p
Rejection level of the Anderson-Darling or Wilcoxon rank-sum test for H0. H0 is that the peak region and the background have the same distribution (Default: 0.025).
Default: 0.025
- --maxLoopDistance
Maximum genomic distance of a loop; loops are usually within a distance of ~2 Mb (Default: 2000000).
Default: 2000000
- --chromosomes
Chromosomes to include in the analysis. If not set, all chromosomes are included.
- --threads, -t
Number of threads to use, the parallelization is implemented per chromosome (Default: 4).
Default: 4
- --threadsPerChromosome, -tpc
Number of threads to use per parallel thread processing a chromosome. E.g. --threads 4 and --threadsPerChromosome 4 result in 4 * 4 = 16 threads in total (Default: 4).
Default: 4
- --expected, -exp
Possible choices: mean, mean_nonzero, mean_nonzero_ligation
Method to compute the expected value per distance: Either the mean (mean), the mean of non-zero values (mean_nonzero) or the mean of non-zero values with ligation factor correction (mean_nonzero_ligation) (Default: “mean”).
Default: “mean”
- --version
show program’s version number and exit
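The peak-versus-background decision controlled by --pValue can be sketched with a stdlib-only Wilcoxon rank-sum test. This is an illustration under assumptions, not the tool's code: it uses the two-sided normal approximation with a tie correction, and accept_loop additionally requires the peak mean to exceed the background mean so that depleted regions are not reported as loops, which is an assumption of this sketch. In practice a library routine such as scipy.stats.ranksums would normally be used instead; the names rank_sum_test and accept_loop are hypothetical.

```python
import math


def rank_sum_test(peak, background):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    if not peak or not background:
        raise ValueError("both samples must be non-empty")
    combined = sorted([(v, 0) for v in peak] + [(v, 1) for v in background])
    n1, n2 = len(peak), len(background)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        # Extend j over all values tied with combined[i].
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_peak = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_peak - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))


def accept_loop(peak, background, p_value=0.025):
    """Accept a candidate if its peak is enriched and H0 is rejected."""
    enriched = sum(peak) / len(peak) > sum(background) / len(background)
    return enriched and rank_sum_test(peak, background) < p_value
```

The default of 0.025 mirrors the --pValue default; the peak and background lists would come from the region split around each remaining candidate.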