Converts between different matrix file formats

usage: hicExport [-h] --inFile INFILE [INFILE ...] [--inputFormat INPUTFORMAT]
                 [--chrNameList CHRNAMELIST [CHRNAMELIST ...]] --outFileName OUTFILENAME
                 [--chromosomeOrder CHROMOSOMEORDER [CHROMOSOMEORDER ...]]
                 [--bplimit INT bp]
                 [--outputFormat {dekker,ren,lieberman,h5,npz,GInteractions,cool}]
                 [--clearMaskedBins] [--version]

Named Arguments

--inFile, -in Input file(s). One or more files may be given. Multiple input files are allowed for the hicexplorer or lieberman formats. When several input files are given, they are combined.

--inputFormat File format of the input matrix file. The following options are available: hicexplorer (native HiCExplorer format), npz (format used by earlier versions of HiCExplorer), dekker (matrix format used in Job Dekker publications), lieberman (format used by Erez Lieberman Aiden) and cool. This last format may change in the future.

Default: “hicexplorer”

--chrNameList List of chromosome names (only if the input format is lieberman), e.g. 1 2.
--outFileName, -o
 File name to save the exported matrix. For the “lieberman” output format, this should be the path of a folder in which the per-chromosome information is stored.
--chromosomeOrder
 Chromosomes to save and the order in which they should be saved. If not all chromosomes are given, the missing chromosomes are left out. For example, --chromosomeOrder chrX will export a matrix containing only chromosome X.
--bplimit, -b When merging many matrices: maximum distance (in base pairs) after which the matrix will be truncated, i.e. TADs larger than this size will not be shown. For matrices with very high resolution, truncating the matrix after a limit saves memory during processing without much loss of data. A reasonable bplimit is 2 x the size of the largest expected TAD.
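
As a rough illustration of the rule of thumb above, the following Python sketch derives a bplimit from an assumed largest TAD size and drops contacts whose bins lie farther apart than that limit. The function names, the contact tuples, and the example sizes are hypothetical, and this is not a description of how hicExport implements truncation internally.

```python
def suggest_bplimit(largest_tad_bp):
    """Rule of thumb from the text: twice the largest expected TAD size."""
    return 2 * largest_tad_bp


def truncate_contacts(contacts, bplimit):
    """Keep only contacts whose two bin starts are at most bplimit apart.

    contacts: iterable of (start1_bp, start2_bp, count) on one chromosome
    (a hypothetical in-memory representation, assumed for this sketch).
    """
    return [(s1, s2, c) for s1, s2, c in contacts if abs(s2 - s1) <= bplimit]


# Hypothetical example: largest expected TAD of 20 kb.
contacts = [(0, 10_000, 12.0), (0, 40_000, 3.0), (10_000, 90_000, 1.0)]
limit = suggest_bplimit(20_000)
kept = truncate_contacts(contacts, limit)
print(limit, kept)
```

In this example the contact spanning 80 kb is discarded, while the two contacts within the limit are kept.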

--outputFormat
 Possible choices: dekker, ren, lieberman, h5, npz, GInteractions, cool

 Output format. The possibilities are “dekker”, “ren”, “lieberman”, “h5”, “npz” (former hicexplorer format), “GInteractions” and “cool”. The dekker format outputs the whole matrix, where the first column and first row are the bin widths and labels. The “ren” format is a list of tuples of the form chrom, bin_start, bin_end, values. The lieberman format writes a separate file for each chromosome, with three columns: contact start, contact end, and raw observed score. This corresponds to the RawObserved files from the Lieberman group. The hicexplorer format (h5) stores the data using the HDF5 format. Optionally, the numpy npz format can be used for small datasets (< 4GB). The GInteractions format has the form Bin1, Bin2, Interaction, where Bin1 and Bin2 are intervals (chr, start, end), separated by tabs.

Default: “dekker”
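
To make the two text layouts described above more concrete, here is a minimal Python sketch that renders a tiny matrix as “GInteractions”-style and “ren”-style lines. It only follows the column descriptions given above. The helper names, the toy bins, the choice to emit only non-zero upper-triangle entries, and the exact placement of the row values in the “ren” lines are assumptions made for illustration, not the actual hicExport implementation.

```python
def ginteractions_lines(bins, matrix):
    """Yield tab-separated lines: chr1, start1, end1, chr2, start2, end2, value.

    Assumption: only non-zero entries of the upper triangle are written.
    """
    for i, bin1 in enumerate(bins):
        for j in range(i, len(bins)):
            value = matrix[i][j]
            if value:
                fields = [*bin1, *bins[j], value]
                yield "\t".join(str(f) for f in fields)


def ren_lines(bins, matrix):
    """Yield tab-separated lines: chrom, bin_start, bin_end, then the row values."""
    for (chrom, start, end), row in zip(bins, matrix):
        yield "\t".join(str(f) for f in (chrom, start, end, *row))


# Hypothetical toy data: two 1 kb bins on chr1 and a symmetric 2 x 2 matrix.
bins = [("chr1", 0, 1000), ("chr1", 1000, 2000)]
matrix = [[5, 2], [2, 0]]

gi = list(ginteractions_lines(bins, matrix))
ren = list(ren_lines(bins, matrix))
print("\n".join(gi))
print("\n".join(ren))
```

Files actually produced by hicExport may differ in details such as headers or number formatting, so they should be inspected directly before writing a parser.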


--clearMaskedBins If set, masked bins are removed from the matrix. Masked bins are those that do not have any values, mainly because they are repetitive regions of the genome.

Default: False

--version Show program’s version number and exit.