Help - HOmo sapiens COmprehensive MOdel COllection

HOmo sapiens COmprehensive MOdel COllection (HOCOMOCO) contains transcription factor (TF) binding motifs represented as classic Position Weight Matrices (PWMs, also known as Position-Specific Scoring Matrices, PSSMs).

The conversion from Position Count Matrices (PCMs) to PWMs in HOCOMOCO follows the scheme used by MACRO-APE; see pages 20–21 of the MACRO-APE manual. A uniform background was used both for this conversion and for estimating the downloadable threshold-to-P-value tables.
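As an illustration of what such a conversion looks like, here is a minimal Python sketch of a commonly used log-odds formula with a pseudocount. The function name `pcm_to_pwm`, the column order (A, C, G, T), and the default pseudocount of 1.0 are assumptions made for this example; the exact formula and pseudocount used for HOCOMOCO are defined in the MACRO-APE manual and may differ.

```python
import math

def pcm_to_pwm(pcm, background=(0.25, 0.25, 0.25, 0.25), pseudocount=1.0):
    """Convert a count matrix (rows = positions, columns = A, C, G, T)
    into a log-odds weight matrix.

    Assumed formula (a common choice, not confirmed for HOCOMOCO):
        weight = ln((count + pseudocount * q) / ((total + pseudocount) * q))
    where q is the background probability of the nucleotide.
    """
    pwm = []
    for counts in pcm:
        total = sum(counts)  # total count at this position
        row = [
            math.log((count + pseudocount * q) / ((total + pseudocount) * q))
            for count, q in zip(counts, background)
        ]
        pwm.append(row)
    return pwm

# Toy two-position count matrix (hypothetical data).
pcm = [[10, 0, 0, 0], [2, 3, 3, 2]]
pwm = pcm_to_pwm(pcm)
```

With a uniform background, nucleotides observed more often than a quarter of the time receive positive weights, and rarer ones receive negative weights.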

[Motif discovery]

HOCOMOCO motifs were constructed with ChIPMunk by systematic motif discovery from thousands of ChIP-Seq and HT-SELEX datasets. Please refer to the HOCOMOCO v12 paper for more details on the motif discovery procedure.

[Motif finding; Sequence scanning]

HOCOMOCO provides PWMs accompanied by precomputed score thresholds. The thresholds and P-values for HOCOMOCO v12 motifs are estimated against uniform background probabilities. To interactively visualize predicted TF binding sites (TFBS) in a small set of sequences, we provide MoLoTool. For large-scale analysis, we suggest using command-line tools such as our SPRY-SARUS or FIMO from the MEME Suite.
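To make the idea of threshold-based scanning concrete, the following Python sketch slides a PWM along a sequence, scores each window on both strands, and reports windows whose score reaches the threshold. It is not how SPRY-SARUS or FIMO are implemented; the function names, the toy matrix, and the threshold value are hypothetical. In practice the threshold would be taken from the precomputed tables for the chosen P-value.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order of the PWM

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def score_window(pwm, window):
    # Sum the weights of the observed nucleotide at every motif position.
    return sum(row[INDEX[base]] for row, base in zip(pwm, window))

def scan(pwm, seq, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in INDEX for base in window):
            continue  # skip windows containing N or other ambiguous symbols
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = score_window(pwm, site)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits

# Toy 3-bp matrix that favors the word ACG (hypothetical weights).
toy_pwm = [
    [1.0, -1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, -1.0],
]
hits = scan(toy_pwm, "TTACGTT", threshold=3.0)
```

Here `hits` contains one match on the forward strand (ACG at position 2, 0-based) and one on the reverse strand (CGT at position 3, whose reverse complement is ACG).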

[Motif benchmarking; Performance metrics]

To assemble the HOCOMOCO v12 motif collection, we used multiple benchmarking protocols that evaluate motif performance in recognizing TFBS in genomic regions (in vivo data, ChIP-Seq), in artificial oligonucleotides (in vitro data, HT-SELEX), and in predicting regulatory single-nucleotide variants and polymorphisms (rSNPs). Please refer to the HOCOMOCO v12 paper for more details on the benchmarking protocols and the resulting performance metrics.

[Quality ratings]

Each model in the collection has a quality rating from A to D, where A denotes the highest confidence. A-quality motifs and subtypes were found in both HT-SELEX and ChIP-Seq data; B-quality motifs were found in at least two different experiments of the same type; C-quality motifs were found in a single experiment but passed expert curation. In the core collection, D quality marks subtypes that include only motifs inherited from HOCOMOCO v11; v12 has only a few such cases. In the sub-collections, D quality denotes all motifs not tested in the respective benchmark (ChIP-Seq for v12-invivo, HT-SELEX for v12-invitro, rSNP for v12-rsnp).
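The core-collection rules above can be restated as a small decision function. This is only a reading aid written in Python, not the procedure used to build HOCOMOCO: the function name and its inputs (counts of supporting experiments per data type) are hypothetical, and the real assignment also involved benchmarking and expert curation, which are not modeled here.

```python
def core_quality(n_chipseq, n_htselex):
    """Map counts of supporting experiments to a core-collection rating.

    Assumes every counted experiment yielded the motif and that
    single-experiment motifs already passed expert curation.
    """
    if n_chipseq > 0 and n_htselex > 0:
        return "A"  # found in both ChIP-Seq and HT-SELEX
    if max(n_chipseq, n_htselex) >= 2:
        return "B"  # at least two experiments of the same type
    if n_chipseq + n_htselex == 1:
        return "C"  # a single experiment
    return "D"      # no new support, e.g. inherited from v11 only

ratings = [core_quality(3, 1), core_quality(2, 0), core_quality(0, 1), core_quality(0, 0)]
```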

[Motif subtypes]

Since v11, the binding motifs of a particular TF are ranked starting from 0 (the primary model), followed by 1, 2, ... (the alternative motifs). Rank-0 motifs are the most 'general' variants, with the best performance across the data available in the benchmark (see the HOCOMOCO v12 paper for details).

[Experimental data types]

HOCOMOCO v12 used two data types for motif discovery: ChIP-Seq and HT-SELEX. The latter came in two variants: traditional HT-SELEX and methyl-HT-SELEX with mCpGs. Additionally, for benchmarking, we used information on differential transcription factor binding to single-nucleotide variants, obtained from SNP-SELEX and identified from ChIP-Seq (allele-specific binding; see ADASTRA).