TY - GEN
T1 - As strong as the weakest link
T2 - 13th Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2013
AU - Bogdanov, Petko
AU - Baumer, Ben
AU - Basu, Prithwish
AU - Bar-Noy, Amotz
AU - Singh, Ambuj K.
PY - 2013
Y1 - 2013
N2 - Mining for cliques in networks provides an essential tool for the discovery of strong associations among entities. Applications vary, from extracting core subgroups in team performance data arising in sports, entertainment, research and business; to the discovery of functional complexes in high-throughput gene interaction data. A challenge in all of these scenarios is the large size of real-world networks and the computational complexity associated with clique enumeration. Furthermore, when mining for multiple cliques within the same network, the results need to be diversified in order to extract meaningful information that is both comprehensive and representative of the whole dataset. We formalize the problem of weighted diverse clique mining (mDKC) in large networks, incorporating both individual clique strength (measured by its weakest link) and diversity of the cliques in the result set. We show that the problem is NP-hard due to the diversity requirement. However, our formulation is sub-modular, and hence can be approximated within a constant factor from the optimal. We propose algorithms for mDKC that exploit the edge weight distribution in the input network and produce performance gains of more than 3 orders of magnitude compared to an exhaustive solution. One of our algorithms, Diverse Cliques (DiCliQ), guarantees a constant factor approximation while the other, Bottom Up Diverse Cliques (BUDiC), scales to large and dense networks without compromising the solution quality. We evaluate both algorithms on 5 real-world networks of different genres and demonstrate their utility for discovery of gene complexes and effective collaboration subgroups in sports and entertainment.
AB - Mining for cliques in networks provides an essential tool for the discovery of strong associations among entities. Applications vary, from extracting core subgroups in team performance data arising in sports, entertainment, research and business; to the discovery of functional complexes in high-throughput gene interaction data. A challenge in all of these scenarios is the large size of real-world networks and the computational complexity associated with clique enumeration. Furthermore, when mining for multiple cliques within the same network, the results need to be diversified in order to extract meaningful information that is both comprehensive and representative of the whole dataset. We formalize the problem of weighted diverse clique mining (mDKC) in large networks, incorporating both individual clique strength (measured by its weakest link) and diversity of the cliques in the result set. We show that the problem is NP-hard due to the diversity requirement. However, our formulation is sub-modular, and hence can be approximated within a constant factor from the optimal. We propose algorithms for mDKC that exploit the edge weight distribution in the input network and produce performance gains of more than 3 orders of magnitude compared to an exhaustive solution. One of our algorithms, Diverse Cliques (DiCliQ), guarantees a constant factor approximation while the other, Bottom Up Diverse Cliques (BUDiC), scales to large and dense networks without compromising the solution quality. We evaluate both algorithms on 5 real-world networks of different genres and demonstrate their utility for discovery of gene complexes and effective collaboration subgroups in sports and entertainment.
UR - https://www.scopus.com/pages/publications/84886517876
U2 - 10.1007/978-3-642-40988-2_34
DO - 10.1007/978-3-642-40988-2_34
M3 - Conference contribution
SN - 9783642409875
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 525
EP - 540
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
PB - Springer Verlag
Y2 - 23 September 2013 through 27 September 2013
ER -