Given an alignment from Bowtie2 in which both paired-end (PE) reads were mapped using the --local option, this program reads the alignment and creates a matrix of interactions.
usage: hicBuildMatrix [-h] --samFiles two sam files two sam files --outBam bam file (--binSize BINSIZE | --restrictionCutFile BED file) [--minDistance MINDISTANCE] [--maxDistance MAXDISTANCE] [--restrictionSequence RESTRICTIONSEQUENCE] --outFileName FILENAME --QCfolder FOLDER [--region CHR:START-END] [--removeSelfCircles] [--minMappingQuality MINMAPPINGQUALITY] [--doTestRun] [--skipDuplicationCheck] [--version]
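The usage line above can be turned into a concrete call. The following is a minimal sketch, not part of the tool: the helper name `build_command` and all file and folder names are hypothetical placeholders, and only flags listed in the usage line are used. It assumes, as the parenthesised group in the usage line suggests, that exactly one of --binSize or --restrictionCutFile is passed.

```python
import shlex


def build_command(sam_files, out_bam, out_matrix, qc_folder,
                  bin_size=None, restriction_cut_file=None):
    """Assemble a hicBuildMatrix argument list (hypothetical helper)."""
    if len(sam_files) != 2:
        raise ValueError("--samFiles expects exactly two SAM files")
    # Assumption taken from the usage line: --binSize and
    # --restrictionCutFile are alternatives, so exactly one is given.
    if (bin_size is None) == (restriction_cut_file is None):
        raise ValueError("give exactly one of bin_size or restriction_cut_file")

    cmd = ["hicBuildMatrix", "--samFiles", *sam_files, "--outBam", out_bam]
    if bin_size is not None:
        cmd += ["--binSize", str(bin_size)]
    else:
        cmd += ["--restrictionCutFile", restriction_cut_file]
    cmd += ["--outFileName", out_matrix, "--QCfolder", qc_folder]
    return cmd


cmd = build_command(
    ["mate_R1.sam", "mate_R2.sam"],  # placeholder file names
    out_bam="hic.bam",               # placeholder
    out_matrix="hic_matrix",         # placeholder
    qc_folder="hic_qc",              # placeholder
    bin_size=10000,
)
# Print the command as a shell-quoted string; nothing is executed here.
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where hicBuildMatrix is installed.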
- optional arguments
--samFiles, -s The two sam files to process
--outBam, -b Bam file to process
--binSize=10000, -bs=10000 Size in bp for the bins.
--restrictionCutFile, -rs BED file with all restriction cut places. Should contain only mappable restriction sites. If given, the bins are set to match the restriction fragments (i.e. the region between one restriction site and the next).
--minDistance=300 Minimum distance between restriction sites. Restriction sites that are closer than this distance are merged into one. This option only applies if --restrictionCutFile is given.
--maxDistance=800 Maximum distance in bp from a restriction site to a read for the read to be considered valid. This option only applies if --restrictionCutFile is given.
--restrictionSequence, -seq Sequence of the restriction site. This is used to discard reads that end or start with this sequence and that are considered un-ligated fragments or "dangling ends". If not given, these statistics will not be available.
--outFileName, -o Output file name for the matrix
--QCfolder Path of the folder in which to save the quality control data for the matrix
--region, -r Region of the genome to which the operation is limited. The format is chr:start-end. Specifying only a chromosome is also valid, for example --region chr10
--removeSelfCircles=False If set, outward-facing reads at a distance of less than 25 kb are removed.
--minMappingQuality=15 Minimum mapping quality for reads to be accepted. Because the restriction enzyme site could be located on top of the read, the reported quality of the read may be reduced. Thus, this parameter may be adjusted if too many low-quality (but otherwise perfectly valid) Hi-C reads are found. A good strategy is to make a test run (using --doTestRun), check the results to see whether too many low-quality reads are present, and then use the generated bam file to check whether those low-quality reads are caused by the read not being mapped entirely.
--doTestRun=False A test run is useful to test the quality of a Hi-C experiment quickly. It works by testing only 1,000,000 reads. This option is useful to get an idea of quality control values such as inter-chromosomal interactions, duplication rates, etc.
--skipDuplicationCheck=False Identification of duplicated read pairs is memory consuming. Thus, in case of memory errors this check can be skipped. However, consider running a `--doTestRun` first to get an estimate of the duplicated reads.
--version show program's version number and exit
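The two accepted forms of the --region value (chr:start-end, or a chromosome name alone) can be illustrated with a short parser. This is only a sketch for checking a value before it is passed on the command line; the function name `parse_region` is hypothetical and this is not the parser the program itself uses. It assumes chromosome names contain no colon.

```python
def parse_region(region):
    """Split 'chr:start-end' or a bare chromosome name (hypothetical helper).

    Returns (chromosome, start, end); start and end are None when only a
    chromosome is given.
    """
    chrom, colon, span = region.partition(":")
    if not chrom:
        raise ValueError(f"missing chromosome name in {region!r}")
    if not colon:
        # Bare chromosome, e.g. 'chr10'
        return chrom, None, None

    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    start, end = int(start_text), int(end_text)  # int() raises ValueError on bad input
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval in {region!r}")
    return chrom, start, end


print(parse_region("chr10"))
print(parse_region("chr10:1000000-2000000"))
```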