Q: I am a bit lost. Which (sub-)collection should I use in my analysis?
A: In most scenarios, you can safely use the main CORE collection. However, for better results, you may want to consult the graphical scheme.
Q: But there are so many additional motif annotations, such as subtype or quality. Should I pre-filter motifs based on those?
A: In the CORE collection, all motifs are considered quite reliable. C-quality motifs are supported by only a single experiment but were nonetheless manually curated and benchmarked. The CORE collection also contains a handful of D-quality motifs representing a few rare subtypes; these were not rediscovered during the update from v11 to v12 but were kept for consistency.
In the sub-collections, D quality denotes non-benchmarked motifs; e.g., in the ‘invivo’ sub-collection, the D-quality motifs were not tested on ChIP-Seq data.
Don't hesitate to consult the scheme.
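No filtering is required for the CORE collection. If you work with a sub-collection and prefer to set aside the non-benchmarked D-quality motifs, a minimal sketch could look like the one below. It assumes the motif ID layout explained in this FAQ (sub-collection as the second dot-separated field, quality as the last one); the function `drop_unbenchmarked` and the `TFX`/`TFY` IDs are hypothetical and serve only as an illustration.

```python
def drop_unbenchmarked(motif_ids):
    """Drop D-quality motifs from sub-collections, keeping everything in H12CORE.

    Hypothetical helper; assumes well-formed IDs such as 'AHR.H12CORE.0.P.B'.
    """
    kept = []
    for motif_id in motif_ids:
        # Split from the right: TF, sub-collection, subtype, experiments, quality.
        parts = motif_id.rsplit(".", 4)
        collection, quality = parts[1], parts[4]
        # Outside CORE, D quality marks motifs that were not benchmarked.
        if collection != "H12CORE" and quality == "D":
            continue
        kept.append(motif_id)
    return kept


if __name__ == "__main__":
    ids = [
        "AHR.H12CORE.0.P.B",    # example ID used in this FAQ
        "TFX.H12INVIVO.1.P.D",  # hypothetical: non-benchmarked, dropped
        "TFY.H12CORE.2.S.D",    # hypothetical: rare CORE subtype, kept
    ]
    print(drop_unbenchmarked(ids))
    # ['AHR.H12CORE.0.P.B', 'TFY.H12CORE.2.S.D']
```

D-quality motifs inside H12CORE are deliberately kept, since the FAQ describes them as rare subtypes retained for consistency rather than as non-benchmarked motifs.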
Q: Can you explain the structure of motif IDs?
A: Let's use AHR.H12CORE.0.P.B as an example.
Here AHR denotes the UniProt ID (entry name) prefix, which is most often identical between human and mouse orthologs, e.g. AHR_HUMAN and AHR_MOUSE.
H12CORE denotes the sub-collection; in the downloadable motif sets it can also be H12RSNP, H12INVIVO, or H12INVITRO.
0 is the subtype number; subtype 0 denotes the most common motifs, which score best across all benchmarking datasets.
P is the type of experiment that yielded the motifs assigned to this subtype during expert curation. It can be P (ChIP-Seq), S (HT-SELEX), M (Methyl-HT-SELEX), or any combination of the three for motifs found in several types of experiments.
B is the motif quality on the ABCD scale, see below.
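To make the ID structure concrete, here is a minimal parsing sketch. It assumes that every ID has exactly five dot-separated fields in the order described above and that a combined experiment code is simply a concatenation of the letters P, S, and M (the letter order in combined codes is not specified here, so any order is accepted). `MotifId` and `parse_motif_id` are hypothetical names, not part of any official HOCOMOCO tool.

```python
from dataclasses import dataclass

# Sub-collection tags mentioned in this FAQ.
KNOWN_COLLECTIONS = {"H12CORE", "H12RSNP", "H12INVIVO", "H12INVITRO"}
# Experiment-type letters: P = ChIP-Seq, S = HT-SELEX, M = Methyl-HT-SELEX.
EXPERIMENT_TYPES = {"P": "ChIP-Seq", "S": "HT-SELEX", "M": "Methyl-HT-SELEX"}
QUALITIES = {"A", "B", "C", "D"}


@dataclass(frozen=True)
class MotifId:
    """Hypothetical container for the five fields of a v12 motif ID."""
    tf: str             # UniProt ID prefix, e.g. "AHR"
    collection: str     # sub-collection tag, e.g. "H12CORE"
    subtype: int        # 0 = most common, best-scoring subtype
    experiments: tuple  # full names of the experiment types behind the subtype
    quality: str        # motif quality on the A-D scale


def parse_motif_id(motif_id: str) -> MotifId:
    """Split an ID like 'AHR.H12CORE.0.P.B' into its fields (a sketch only)."""
    # Split from the right so only the last four dots act as separators.
    parts = motif_id.rsplit(".", 4)
    if len(parts) != 5 or not parts[0]:
        raise ValueError(f"expected five dot-separated fields: {motif_id!r}")
    tf, collection, subtype, exp_code, quality = parts
    if collection not in KNOWN_COLLECTIONS:
        raise ValueError(f"unknown sub-collection: {collection!r}")
    if not subtype.isdecimal():
        raise ValueError(f"subtype must be a non-negative integer: {subtype!r}")
    # Assumption: combined codes are plain concatenations of P, S, M.
    if not exp_code or any(letter not in EXPERIMENT_TYPES for letter in exp_code):
        raise ValueError(f"unknown experiment code: {exp_code!r}")
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of A-D: {quality!r}")
    experiments = tuple(EXPERIMENT_TYPES[letter] for letter in exp_code)
    return MotifId(tf, collection, int(subtype), experiments, quality)


if __name__ == "__main__":
    m = parse_motif_id("AHR.H12CORE.0.P.B")
    print(m.tf, m.collection, m.subtype, m.experiments, m.quality)
    # AHR H12CORE 0 ('ChIP-Seq',) B
```

Splitting from the right keeps the parser robust even if a TF prefix ever contained a dot, and rejecting unknown codes early makes typos in hand-typed IDs easy to spot.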
Q: How do I select HOCOMOCO motifs for mouse or other species?
A: Please consult the scheme.
Q: In HOCOMOCO v11, there were separate mouse and human collections; why is there only one joint collection in v12?
A: Even in HOCOMOCO v11, we relied on cross-validation between human and mouse datasets. As human and mouse TFs share highly similar and often identical DNA-binding domains, we have taken the next step and selected the most reliable motifs that perform well across the whole range of available data for both species. Based on benchmarking results, we consider v12 motifs to be generally more informative and more reliable than those in previous releases. Please check the HOCOMOCO v12 paper for more details.
Q: Where is the dinucleotide motif collection of HOCOMOCO v11?
A: In this release, we have focused our efforts on expanding the fraction of TFs covered by reliable motifs, rigorous benchmarking, and comprehensive annotation of motif subtypes. Dinucleotide motifs will continue to be available in HOCOMOCO v11.