hicValidateLocations

hicValidateLocations is a tool to compare the loops detected by hicDetectLoops (or by any other software, as long as the data format described below is followed) with known protein peak locations, to validate whether the computed loops have the expected anchor points. For example, loops in mammals are usually bound by CTCF or cohesin, so it is important to know whether the detected loops have protein peaks at their X and Y positions.

Loops in Hi-C, graphic from Bonev & Cavalli, Nature Reviews Genetics 2016

Data format

The data format of hicDetectLoops output is:

chr_x start_x end_x chr_y start_y end_y p-value

The --protein option accepts narrowPeak or broadPeak files. However, any file that contains chromosome, start and end in its first three columns should work.
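The two input formats described above can be sketched as a pair of small parsers. This is only an illustration: the names `Loop`, `Peak`, `parse_loop_line` and `parse_peak_line` are hypothetical and not part of the tool, and the sketch assumes whitespace-separated columns and treats the p-value column as optional.

```python
from typing import NamedTuple, Optional


class Loop(NamedTuple):
    """One loop record: two anchors (x and y) and an optional p-value."""
    chrom_x: str
    start_x: int
    end_x: int
    chrom_y: str
    start_y: int
    end_y: int
    p_value: Optional[float]


class Peak(NamedTuple):
    """One protein peak: only the first three columns are used."""
    chrom: str
    start: int
    end: int


def parse_loop_line(line: str) -> Loop:
    # Assumption: columns are separated by whitespace (tabs or spaces).
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    # The seventh column (p-value) may be absent in files from other tools.
    p_value = float(fields[6]) if len(fields) > 6 else None
    return Loop(fields[0], int(fields[1]), int(fields[2]),
                fields[3], int(fields[4]), int(fields[5]), p_value)


def parse_peak_line(line: str) -> Peak:
    # Extra narrowPeak/broadPeak columns after the third are ignored.
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    return Peak(fields[0], int(fields[1]), int(fields[2]))
```

Under these assumptions, a line with only six columns parses to a `Loop` whose `p_value` is `None`, and a full narrowPeak line parses to a `Peak` holding just its chromosome, start and end.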

This script overlaps the loop locations with protein locations to determine the accuracy of the loop detection. Loops need to have the following format:

chr start end chr start end

The protein peaks need to be in narrowPeak or broadPeak format.

A protein match is successful if a protein peak overlaps the bin at both the x and the y location. A bin is assumed to have a protein if one or more protein peaks fall within the bin region. The value of the protein peak is not considered, only match or non-match.
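The matching rule above can be illustrated with a minimal sketch. It is not the tool's implementation: the function names are hypothetical, intervals are assumed to be half-open (BED-style), and a simple linear scan over all peaks stands in for whatever lookup the tool actually uses.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def anchor_has_peak(chrom, start, end, peaks):
    """True if at least one peak on the same chromosome overlaps the anchor bin."""
    return any(
        p_chrom == chrom and overlaps(start, end, p_start, p_end)
        for p_chrom, p_start, p_end in peaks
    )


def loop_is_matched(loop, peaks):
    """A loop counts as matched only if BOTH anchors contain a peak."""
    chrom_x, start_x, end_x, chrom_y, start_y, end_y = loop[:6]
    return (anchor_has_peak(chrom_x, start_x, end_x, peaks)
            and anchor_has_peak(chrom_y, start_y, end_y, peaks))


# Made-up example data: two peaks and two loops at 10 kb resolution.
peaks = [("chr1", 1005000, 1005400), ("chr1", 2003000, 2003500)]
loops = [
    ("chr1", 1000000, 1010000, "chr1", 2000000, 2010000),  # peaks at both anchors
    ("chr1", 1000000, 1010000, "chr1", 3000000, 3010000),  # peak at x anchor only
]
matched = [loop for loop in loops if loop_is_matched(loop, peaks)]
```

Here only the first loop ends up in `matched`, because the second loop has no peak at its y anchor. Peak scores are never consulted, in line with the match or non-match rule.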

usage: hicValidateLocations --data DATA --protein PROTEIN [--method {loops}]
                            --resolution RESOLUTION
                            [--outFileName OUTFILENAME] [--addChrPrefixLoops]
                            [--addChrPrefixProtein] [--help] [--version]
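As an illustration of the usage line above, the following sketch assembles a command line from Python. The helper `build_command` and the file names `loops.bedgraph`, `ctcf.narrowPeak` and `validation` are hypothetical; only the flags come from the usage text.

```python
import shlex


def build_command(data, protein, resolution, out_prefix=None,
                  add_chr_prefix_loops=False, add_chr_prefix_protein=False):
    """Return the argument list for a hicValidateLocations call."""
    cmd = [
        "hicValidateLocations",
        "--data", data,
        "--protein", protein,
        "--method", "loops",          # the only documented choice
        "--resolution", str(resolution),
    ]
    if out_prefix is not None:
        cmd += ["--outFileName", out_prefix]
    if add_chr_prefix_loops:
        cmd.append("--addChrPrefixLoops")
    if add_chr_prefix_protein:
        cmd.append("--addChrPrefixProtein")
    return cmd


# Hypothetical inputs: a loop file, a CTCF peak file, 10 kb resolution.
cmd = build_command("loops.bedgraph", "ctcf.narrowPeak", 10000,
                    out_prefix="validation", add_chr_prefix_protein=True)
print(shlex.join(cmd))
# To actually run the tool (requires HiCExplorer to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.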

Required arguments

--data, -d

The loop file from hicDetectLoops. To use files from other sources, please follow the ‘chr start end chr start end’ format.

--protein, -p

The protein peak file. Can be in narrowPeak or broadPeak format.

--method, -m

Possible choices: loops

The loop file

Default: “loops”

--resolution, -r

The resolution of the Hi-C interaction matrix that was used.

Optional arguments

--outFileName, -o

The prefix name of the output files. Two files are written: output_matched_locations and output_statistics. The first file contains all loop locations with protein location matches; the second file contains statistics about this matching.

--addChrPrefixLoops, -cl

Add a ‘chr’ prefix to the chromosome names of the loops.

Default: False

--addChrPrefixProtein, -cp

Add a ‘chr’ prefix to the chromosome names of the protein peaks.

Default: False

--version

show program’s version number and exit
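The purpose of the --addChrPrefixLoops and --addChrPrefixProtein options can be sketched as follows: loops and peaks can only overlap if both files name chromosomes the same way. The helper `add_chr_prefix` is hypothetical, and the sketch prepends the prefix unconditionally because this page does not describe how names that already start with ‘chr’ are handled.

```python
def add_chr_prefix(chrom):
    """Prepend 'chr' to a chromosome name (assumption: applied unconditionally)."""
    return "chr" + chrom


# Made-up example: the loop file uses '1', '2', 'X', the peak file uses 'chr1', ...
loop_chroms = ["1", "2", "X"]
peak_chroms = {"chr1", "chr2", "chrX"}

# Without the prefix, no loop chromosome is found among the peak chromosomes.
before = [chrom in peak_chroms for chrom in loop_chroms]

# With the prefix added to the loop names, every chromosome matches.
after = [add_chr_prefix(chrom) in peak_chroms for chrom in loop_chroms]
```

In this example the situation corresponds to passing --addChrPrefixLoops; the reverse case, where only the peak file lacks the prefix, corresponds to --addChrPrefixProtein.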