• Complete release data including the original motifs are available at Zenodo.
  • Harmonized list of human transcription factors and respective mouse orthologs based on the TFClass classification: tf_masterlist.tsv.
  • MoLoTool - web interface for motif finding.
  • SPRY-SARUS tool for motif finding (Java): jar, readme
  • MACRO-APE tool for motif comparison, P-value and threshold estimation: jar, manual, website
  • PERFECTOS-APE tool for functional annotation of sequence variants overlapping TFBS: jar, manual, website
Ivan V. Kulakovskiy; Ilya E. Vorontsov; Ivan S. Yevshin; Ruslan N. Sharipov; Alla D. Fedorova; Eugene I. Rumynskiy; Yulia A. Medvedeva; Arturo Magana-Mora; Vladimir B. Bajic; Dmitry A. Papatsenko; Fedor A. Kolpakov; Vsevolod J. Makeev
Nucl. Acids Res., Database issue, gkx1106 (11 November 2017)
doi: 10.1093/nar/gkx1106
License: HOCOMOCO motif collection is distributed under WTFPL. If you prefer more standard licenses, feel free to treat WTFPL as CC-BY.

Many practical motif applications require a motif set with reduced redundancy, i.e., one in which similar motifs belonging to related transcription factors are grouped together and a single matrix represents each group. To this end, we have created the non-redundant set of HOCOMOCO v12 motifs, a derivative of the HOCOMOCO v12 CORE collection.

We estimated pairwise motif similarities with MacroAPE (see doi:10.1186/1748-7188-8-23) at a motif P-value cutoff of 0.0005 and the default matrix discretization of 1. For pairs whose similarity estimate at the default discretization exceeded 0.01, the discretization was raised to 10 to obtain better precision.

Using the pairwise motif similarity matrix, we performed hierarchical clustering with scikit-learn's agglomerative clustering ('average' linkage). The number of clusters was chosen to maximize the silhouette score, which yielded 523 clusters at a silhouette score of 0.16.
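The clustering step can be sketched roughly as follows. This is an illustrative sketch, not the pipeline actually used for HOCOMOCO: the function name `cluster_by_silhouette`, the conversion of similarity to distance as `1 - similarity`, and the toy 6x6 matrix are all assumptions made for the example. Recent scikit-learn releases take `metric="precomputed"`, while older ones use `affinity="precomputed"`, so the sketch tries both.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score


def cluster_by_silhouette(similarity, candidate_k):
    """Return (best_k, best_score, labels) over the candidate cluster counts."""
    # Assumption: similarities lie in [0, 1] and distance = 1 - similarity.
    distance = 1.0 - np.asarray(similarity, dtype=float)
    np.fill_diagonal(distance, 0.0)  # a motif is at distance 0 from itself

    best = None
    for k in candidate_k:
        try:
            model = AgglomerativeClustering(
                n_clusters=k, metric="precomputed", linkage="average"
            )
        except TypeError:
            # Older scikit-learn releases name this parameter `affinity`.
            model = AgglomerativeClustering(
                n_clusters=k, affinity="precomputed", linkage="average"
            )
        labels = model.fit_predict(distance)
        score = silhouette_score(distance, labels, metric="precomputed")
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


# Toy similarity matrix (made up): two groups of three mutually similar motifs.
similarity = np.full((6, 6), 0.1)
similarity[:3, :3] = 0.9
similarity[3:, 3:] = 0.9
np.fill_diagonal(similarity, 1.0)

# Silhouette needs between 2 and n_samples - 1 clusters, hence range(2, 6).
best_k, best_score, labels = cluster_by_silhouette(similarity, range(2, 6))
print(best_k, round(best_score, 3), labels)
```

For a real collection, the candidate range would span far more values, and scanning it requires one clustering run and one silhouette evaluation per candidate.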

For each cluster, the representative is the single motif with the highest average similarity to the other motifs in that cluster. The annotation lists the motifs that constitute each cluster and the respective TFs (UniProt IDs).
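Choosing the representative can be illustrated with a minimal sketch in plain Python. The function name `pick_representative`, the nested-dictionary layout of the similarity table, and the motif names and values below are hypothetical and serve only to show the selection rule. Returning a singleton cluster's only member as its representative is likewise an assumption.

```python
def pick_representative(cluster_members, similarity):
    """Return the member with the highest mean similarity to the other members."""
    if len(cluster_members) == 1:
        # Assumption: a singleton cluster is represented by its only motif.
        return cluster_members[0]

    def mean_similarity(motif):
        others = [m for m in cluster_members if m != motif]
        return sum(similarity[motif][other] for other in others) / len(others)

    # On ties, max() keeps the first member encountered.
    return max(cluster_members, key=mean_similarity)


# Hypothetical pairwise similarities for one three-motif cluster.
similarity = {
    "motif_A": {"motif_B": 0.8, "motif_C": 0.6},
    "motif_B": {"motif_A": 0.8, "motif_C": 0.7},
    "motif_C": {"motif_A": 0.6, "motif_B": 0.7},
}

representative = pick_representative(["motif_A", "motif_B", "motif_C"], similarity)
print(representative)
```

Here the mean similarities are 0.70, 0.75 and 0.65 for motif_A, motif_B and motif_C respectively, so motif_B is selected.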

HOCOMOCO v12 subcollections

Number of motifs 1443
