hicExport

Conversion of Hi-C matrices between different file formats.

usage: hicExport --inFile INFILE [INFILE ...] --outFileName OUTFILENAME
                 [--inputFormat {dekker,ren,lieberman,h5,npz,GInteractions,cool,hicexplorer}]
                 [--outputFormat {dekker,ren,lieberman,h5,npz,GInteractions,cool,hicexplorer}]
                 [--chrNameList CHRNAMELIST [CHRNAMELIST ...]]
                 [--chromosomeOrder CHROMOSOMEORDER [CHROMOSOMEORDER ...]]
                 [--bplimit INT bp] [--clearMaskedBins] [--help] [--version]

Required arguments

--inFile, -in Input file(s). One or many files may be given. Multiple input files are allowed for the hicexplorer or lieberman format; if several files are given, they are combined.
--outFileName, -o
 File name to save the exported matrix. For the “lieberman” output format, this should be the path of a folder in which the per-chromosome information is stored.
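
As an illustration, a conversion from the native h5 format to cool could be assembled from the options in the usage line. The sketch below only builds and prints the command; the helper name `build_hicexport_command` and the file names `matrix.h5` and `matrix.cool` are hypothetical, and nothing is executed.

```python
import shlex


def build_hicexport_command(in_files, out_file, input_format, output_format):
    """Assemble a hicExport argument list from the documented options."""
    return [
        "hicExport",
        "--inFile", *in_files,          # one or many input files
        "--outFileName", out_file,      # a folder path for "lieberman" output
        "--inputFormat", input_format,
        "--outputFormat", output_format,
    ]


# Hypothetical file names, used for illustration only.
cmd = build_hicexport_command(["matrix.h5"], "matrix.cool", "h5", "cool")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run the conversion, the list could be passed to `subprocess.run(cmd, check=True)` on a system where hicExport is installed.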

Optional arguments

--inputFormat

Possible choices: dekker, ren, lieberman, h5, npz, GInteractions, cool, hicexplorer

File format of the matrix file. The following options are available: hicexplorer or h5 (native HiCExplorer format based on the hdf5 storage format), npz (format used by earlier versions of HiCExplorer), dekker (matrix format used in Job Dekker publications), lieberman (format used by Erez Lieberman Aiden) and cool. These last formats may change in the future.

Default: “hicexplorer”

--outputFormat

Possible choices: dekker, ren, lieberman, h5, npz, GInteractions, cool, hicexplorer

Output format. The possibilities are “hicexplorer” or “h5” (native HiCExplorer format), “dekker”, “ren”, “lieberman”, “npz” (former hicexplorer format), “GInteractions” and “cool”. The dekker format outputs the whole matrix, where the first column and first row are the bin widths and labels. The “ren” format is a list of tuples of the form chrom, bin_start, bin_end, values. The lieberman format writes separate files for each chromosome, with three columns: contact start, contact end, and raw observed score. This corresponds to the RawObserved files from the Lieberman group. The hicexplorer format stores the data using the hdf5 format. Optionally, the numpy npz format can be used for small datasets (< 4 GB). The GInteractions format has the form Bin1, Bin2, Interaction, where Bin1 and Bin2 are intervals (chr, start, end), separated by tabs.

Default: “dekker”
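
The GInteractions layout described above can be sketched as follows. This is a minimal illustration, not the tool's own writer: it assumes that each interval is expanded into three tab-separated fields, giving seven columns per record, which should be checked against an actual exported file. The function names are hypothetical.

```python
def ginteractions_line(bin1, bin2, interaction):
    """Format one contact as a tab-separated record.

    bin1 and bin2 are (chrom, start, end) tuples; interaction is the value.
    Assumption: intervals are flattened into three columns each.
    """
    fields = [*bin1, *bin2, interaction]
    return "\t".join(str(field) for field in fields)


def parse_ginteractions_line(line):
    """Inverse of ginteractions_line under the same seven-column assumption."""
    chrom1, start1, end1, chrom2, start2, end2, value = line.rstrip("\n").split("\t")
    return (
        (chrom1, int(start1), int(end1)),
        (chrom2, int(start2), int(end2)),
        float(value),
    )


record = ginteractions_line(("chr1", 0, 10000), ("chr1", 10000, 20000), 12.0)
print(record)
print(parse_ginteractions_line(record))
```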

--chrNameList List of chromosome names (only if the input format is lieberman), e.g. 1 2.
--chromosomeOrder
 Chromosomes and the order in which they should be saved. If not all chromosomes are given, the missing chromosomes are left out. For example, --chromosomeOrder chrX will export a matrix containing only chromosome X.
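
The effect of a chromosome order on a matrix can be pictured with a small sketch. It is only an illustration of the described behaviour on a plain list-of-lists matrix, with hypothetical function names; it says nothing about how hicExport implements the option internally.

```python
def ordered_bin_indices(bins, chromosome_order):
    """Return the indices of the bins to keep, grouped in the requested order.

    bins is a list of (chrom, start, end) tuples, one per matrix row/column.
    Chromosomes absent from chromosome_order are left out.
    """
    kept = []
    for chrom in chromosome_order:
        kept.extend(i for i, (c, _start, _end) in enumerate(bins) if c == chrom)
    return kept


def submatrix(matrix, indices):
    """Select the given rows and columns, in the given order."""
    return [[matrix[i][j] for j in indices] for i in indices]


bins = [("chr1", 0, 10), ("chr1", 10, 20), ("chrX", 0, 10)]
matrix = [
    [1, 2, 3],
    [2, 5, 6],
    [3, 6, 9],
]

only_x = ordered_bin_indices(bins, ["chrX"])
print(submatrix(matrix, only_x))       # matrix restricted to chrX

x_first = ordered_bin_indices(bins, ["chrX", "chr1"])
print(submatrix(matrix, x_first))      # chrX bins moved in front of chr1
```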
--bplimit, -b When merging many matrices: limit (in base pairs) beyond which the matrix is truncated, i.e. TADs bigger than this size will not be shown. For matrices with very high resolution, truncating the matrix beyond a limit helps save memory during processing without much loss of data. A bplimit of 2 x the size of the biggest expected TAD can be used.
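
The rule of thumb for bplimit can be made concrete with a short sketch. It assumes that truncation means discarding contacts whose two bins lie further apart than the limit, and that the comparison is "greater than"; both are assumptions made for illustration, and the function names are hypothetical.

```python
def suggested_bplimit(biggest_expected_tad_bp):
    """Rule of thumb from the text: twice the size of the biggest expected TAD."""
    return 2 * biggest_expected_tad_bp


def truncate_by_distance(matrix, bin_size, bplimit):
    """Zero out contacts between bins that are more than bplimit apart.

    Assumption: distance is measured as |i - j| * bin_size for a matrix with
    fixed-size bins on a single chromosome.
    """
    n = len(matrix)
    return [
        [matrix[i][j] if abs(i - j) * bin_size <= bplimit else 0 for j in range(n)]
        for i in range(n)
    ]


limit = suggested_bplimit(1_000_000)   # biggest expected TAD of 1 Mb
print(limit)

toy = [[1] * 4 for _ in range(4)]      # 4 x 4 toy matrix, 10 kb bins
print(truncate_by_distance(toy, bin_size=10_000, bplimit=10_000))
```

In the toy example only the diagonal and its immediate neighbours survive, which is why a sparse representation of the truncated matrix needs less memory.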
--clearMaskedBins

If set, masked bins are removed from the matrix. Masked bins are those that do not have any values, mainly because they are repetitive regions of the genome.

Default: False
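
What removing masked bins means for the matrix shape can be sketched as follows. The example treats a bin as masked when its whole row is zero; this is an assumption made for illustration, since how hicExport identifies masked bins is not stated here. The function name is hypothetical.

```python
def clear_masked_bins(matrix):
    """Drop the rows and columns of bins that carry no values.

    Assumption: a bin counts as masked when every entry in its row is zero
    (for a symmetric matrix the column is then zero as well).
    Returns the reduced matrix and the indices of the bins that were kept.
    """
    keep = [i for i, row in enumerate(matrix) if any(value != 0 for value in row)]
    reduced = [[matrix[i][j] for j in keep] for i in keep]
    return reduced, keep


matrix = [
    [1, 0, 2],
    [0, 0, 0],   # bin without any values, e.g. a repetitive region
    [2, 0, 3],
]
reduced, kept = clear_masked_bins(matrix)
print(reduced)
print(kept)
```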

--version Show program’s version number and exit.