hicFindTADs

Uses a measure called the TAD-separation score to quantify the degree of separation between the left and right regions at each Hi-C matrix bin. The score is computed for running windows of different sizes. TAD boundaries are then called at positions where the TAD-separation score has a local minimum. The TAD-separation score is computed from the z-score of the Hi-C matrix and is defined as the mean z-score of all the matrix contacts between the left and right regions (diamond).
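The diamond mean described above can be sketched in a few lines of Python. This is only an illustration, not the program's implementation: it assumes the matrix has already been converted to z-scores, and the function name `tad_separation_score`, the window length `w` in bins, and the convention that the cut lies immediately before bin `i` are all assumptions made for this example.

```python
from statistics import mean


def tad_separation_score(z, i, w):
    """Mean z-score of contacts between the w bins left of the cut and
    the w bins right of it (the 'diamond').

    z is a square matrix (list of lists) of z-scores; the cut is assumed
    to lie immediately before bin i. Windows are clipped at the matrix edges.
    """
    left = range(max(0, i - w), i)
    right = range(i, min(len(z), i + w))
    values = [z[r][c] for r in left for c in right]
    return mean(values) if values else float("nan")


# Toy z-score matrix: two domains of three bins each, with enriched
# contacts (1.0) inside a domain and depleted contacts (-1.0) between them.
n = 6
z = [[1.0 if (r < 3) == (c < 3) else -1.0 for c in range(n)] for r in range(n)]

scores = {i: tad_separation_score(z, i, 2) for i in range(1, n)}
for i, score in scores.items():
    print(i, score)
```

In this toy matrix the score is lowest at the cut between the two domains (bin 3), which is the kind of local minimum that is later called as a boundary.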

To find the TADs, the program first computes the TAD-separation scores at different window sizes. The results of that computation are then used to call the TADs. Because computing the TAD-separation scores is the demanding step, this split makes it quick to test different filtering criteria.

A simple usage example is:

$ hicFindTADs TAD_score -m hic_matrix.h5 -o TAD_score.txt

$ hicFindTADs find_TADs -f TAD_score.txt --outPrefix TADs
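Because the TAD-separation scores only need to be computed once, the same score file can be reused to try several filtering settings. The following Python sketch builds one `find_TADs` command per combination of `--pvalue` and `--delta`, using only options listed on this page. The helper name `find_tads_commands` and the output prefix naming scheme are hypothetical, and the commands are printed rather than executed.

```python
from itertools import product


def find_tads_commands(score_file, pvalues, deltas):
    """Build one hicFindTADs find_TADs command per (pvalue, delta) pair."""
    commands = []
    for pvalue, delta in product(pvalues, deltas):
        # Hypothetical naming scheme so that the runs do not overwrite each other.
        prefix = f"TADs_p{pvalue}_d{delta}"
        commands.append([
            "hicFindTADs", "find_TADs",
            "--tadScoreFile", score_file,
            "--pvalue", str(pvalue),
            "--delta", str(delta),
            "--outPrefix", prefix,
        ])
    return commands


for cmd in find_tads_commands("TAD_score.txt", [0.01, 0.05], [0.01, 0.05]):
    print(" ".join(cmd))
    # To actually run a command: subprocess.run(cmd, check=True)
```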

For detailed help:

hicFindTADs TAD_score -h

or

hicFindTADs find_TADs -h

usage: hicFindTADs [-h] [--version]  ...
optional arguments
--version show program’s version number and exit
commands

Undocumented

Possible choices: TAD_score, find_TADs

Sub-commands:
TAD_score

Undocumented

usage: hicFindTADs TAD_score [-h] --matrix MATRIX --outFileName OUTFILENAME
                             [--minDepth INT bp] [--maxDepth INT bp]
                             [--step INT bp]
                             [--numberOfProcessors NUMBEROFPROCESSORS]
optional arguments
--matrix, -m Corrected Hi-C matrix to use for the computations
--outFileName, -o
 File name to store the computed TAD-separation scores. The format of the output file is chrom start end TAD-sep1 TAD-sep2 TAD-sep3 .. etc. We call this format a bedgraph matrix; it can be plotted using `hicPlotTADs`. Each of the TAD-separation scores in the file corresponds to a different window length, from --minDepth to --maxDepth.
--minDepth Minimum window length (in bp) to be considered to the left and to the right of each Hi-C bin. This number should be at least 3 times as large as the bin size of the Hi-C matrix.
--maxDepth Maximum window length (in bp) to be considered to the left and to the right of the cut point. This number should be around 6-10 times as large as the bin size of the Hi-C matrix.
--step Step size when moving from --minDepth to --maxDepth. Note, the step size grows exponentially as `minDepth + step * int(x**1.5) for x in [0, 1, ...]` until it reaches `maxDepth`. For example, selecting step=10,000, minDepth=20,000 and maxDepth=150,000 will compute TAD-scores for window sizes: 20,000, 30,000, 40,000, 70,000 and 100,000.
--numberOfProcessors=1, -p=1
 Number of processors to use
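The window sizes implied by --minDepth, --maxDepth and --step can be explored with a short Python sketch. It reads the growth rule as `minDepth + step * int(x**1.5)`, an assumption chosen because it reproduces the window sizes listed in the --step example. The function name `window_sizes` and the stopping rule (stop at the first size larger than maxDepth) are also assumptions.

```python
from math import isqrt


def window_sizes(min_depth, max_depth, step):
    """List window lengths (bp) as min_depth + step * int(x**1.5), x = 0, 1, ...

    Stops at the first size that exceeds max_depth (assumed stopping rule).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    sizes = []
    x = 0
    while True:
        # isqrt(x**3) equals int(x**1.5) but avoids floating-point rounding.
        size = min_depth + step * isqrt(x ** 3)
        if size > max_depth:
            break
        sizes.append(size)
        x += 1
    return sizes


print(window_sizes(20000, 150000, 10000))
```

With step=10,000, minDepth=20,000 and maxDepth=150,000 the first five values are 20,000, 30,000, 40,000, 70,000 and 100,000, matching the --step example. Under the stopping rule assumed here the sketch also yields 130,000, which that example does not list, so the program's actual stopping rule may differ.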
find_TADs

Undocumented

usage: hicFindTADs find_TADs [-h] --tadScoreFile TADSCOREFILE
                             [--minBoundaryDistance MINBOUNDARYDISTANCE]
                             [--pvalue PVALUE] [--delta DELTA] --outPrefix
                             OUTPREFIX
optional arguments
--tadScoreFile, -f
 File containing the TAD scores (generated by running hicFindTADs TAD_score)
--minBoundaryDistance
 Minimum distance between boundaries (in bp). This parameter can be used to reduce spurious boundaries caused by noise.
--pvalue=0.01 P-value threshold. The probability of a local minimum being a boundary is estimated by comparing the distribution (Wilcoxon rank-sum test) of the z-scores between the left and right regions (diamond) at the local minimum with the matrix z-scores for a diamond at --minDepth to the left and a diamond at --minDepth to the right. The reported p-value is the Bonferroni correction of all p-values. Default is 0.01.
--delta=0.01 Minimum threshold of the difference between the TAD-separation score of a putative boundary and the mean of the TAD-sep. score of surrounding bins. The delta value reduces spurious boundaries that are shallow, which usually occur at the center of large TADs when the TAD-sep. score is flat. Higher delta threshold values produce more conservative boundary estimations. By default a value of 0.01 is used.
--outPrefix File prefix to save the resulting files:
 1. <prefix>_boundaries.bed, which contains the positions of boundaries. The genomic coordinates in this file correspond to the resolution used. Thus, for Hi-C bins of 10,000 bp the boundary position is 10,000 bp long. For restriction fragment matrices the boundary position varies depending on the fragment length at the boundary.
 2. <prefix>_domains.bed, which contains the TAD positions. This is a non-overlapping set of genomic positions.
 3. <prefix>_boundaries.gff, which is similar to the boundaries bed file but with extra information (p-value, delta).
 4. <prefix>_score.bedgraph, which contains the TAD-separation score measured at each Hi-C bin coordinate. It is useful for visualization in a genome browser. The delta and p-value settings are saved as part of the name.
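The effect of the --delta threshold on candidate boundaries can be illustrated with a small Python sketch. It is not the program's implementation. The strict local-minimum test, the function names, and the number of surrounding bins (`flank`) are assumptions made for the example, and the p-value test and --minBoundaryDistance filtering are left out.

```python
from statistics import mean


def local_minima(scores):
    """Indices whose score is strictly lower than both neighbours.

    Flat minima (ties with a neighbour) are ignored in this sketch.
    """
    return [i for i in range(1, len(scores) - 1)
            if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]


def filter_by_delta(scores, candidates, delta=0.01, flank=2):
    """Keep candidates whose score lies at least delta below the mean
    score of up to `flank` bins on each side (flank is an assumed setting)."""
    kept = []
    for i in candidates:
        neighbours = scores[max(0, i - flank):i] + scores[i + 1:i + 1 + flank]
        if mean(neighbours) - scores[i] >= delta:
            kept.append(i)
    return kept


# Toy TAD-separation scores: two deep minima (bins 2 and 8) and one
# shallow dip (bin 5) on an otherwise flat stretch.
scores = [0.5, 0.1, -0.6, 0.0, 0.4, 0.395, 0.4, 0.5, -0.2, 0.3]

candidates = local_minima(scores)
print(candidates)
print(filter_by_delta(scores, candidates, delta=0.01))
```

In this toy example the shallow dip at bin 5 is a local minimum but is removed by the delta filter, while the two deep minima are kept.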