Many practical motif applications require a set of motifs with reduced redundancy, i.e. one in which similar motifs belonging to related transcription factors are grouped together and a single matrix represents each group. To this end, we have created the non-redundant set of HOCOMOCO v12 motifs, a derivative of the HOCOMOCO v12 CORE collection.
We estimated pairwise motif similarities with MacroAPE (see opera.autosome.org/macroape and doi:10.1186/1748-7188-8-23) at a motif P-value cutoff of 0.0005 and the default matrix discretization of 1. For pairs whose similarity estimate at the default discretization exceeded 0.01, the discretization was raised to 10 to obtain better precision.
Using the pairwise motif similarity matrix, we performed hierarchical clustering with sklearn agglomerative clustering ('average' linkage). The number of clusters was chosen to maximize the silhouette score, which yielded 523 clusters at a silhouette score of 0.16.
For each cluster, a single representative motif was selected as the one with the highest average similarity to the other motifs in the cluster. The annotation lists the motifs that constitute each cluster and the respective TFs (UniProt IDs).
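
A minimal sketch of the representative selection, under stated assumptions: self-similarity is excluded from the average, a single-motif cluster is represented by its only member, and ties go to the first motif encountered. The function name `pick_representatives`, the motif names, and the similarity values are hypothetical.

```python
import numpy as np


def pick_representatives(similarity, labels, motif_ids):
    """Map each cluster label to its representative motif and member list."""
    similarity = np.asarray(similarity, dtype=float)
    labels = np.asarray(labels)
    representatives = {}
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        if len(members) == 1:
            best = members[0]
        else:
            sub = similarity[np.ix_(members, members)]
            # Mean similarity to the other members (self-similarity excluded).
            mean_to_others = (sub.sum(axis=1) - np.diag(sub)) / (len(members) - 1)
            best = members[np.argmax(mean_to_others)]
        representatives[int(cluster)] = {
            "representative": motif_ids[best],
            "members": [motif_ids[i] for i in members],
        }
    return representatives


# Toy data with made-up values: three similar motifs and one unrelated motif.
motif_ids = ["motif_A", "motif_B", "motif_C", "motif_D"]
similarity = np.array([
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.9, 0.1],
    [0.6, 0.9, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])
labels = [0, 0, 0, 1]

reps = pick_representatives(similarity, labels, motif_ids)
print(reps)
```

In the toy data, motif_B has the highest mean similarity to the other two members of its cluster (0.85, versus 0.70 for motif_A and 0.75 for motif_C), so it is selected to represent that cluster.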