derrickburns / generalized-kmeans-clustering

This project generalizes the Spark MLlib K-Means clusterer to support clustering of dense or sparse, low- or high-dimensional data using distance functions defined by Bregman divergences (e.g. squared Euclidean distance or Kullback-Leibler divergence). Several variants of standard K-Means are easily implemented atop this package, including bisecting K-Means and Anytime K-Means.
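
To make the idea of a pluggable Bregman divergence concrete, here is a minimal, Spark-free sketch in plain Python. It is not this project's API: the function names (`squared_euclidean`, `generalized_kl`, `lloyd_step`) are hypothetical and chosen only for illustration. It relies on the known property that, for any Bregman divergence, the center minimizing the total divergence from a cluster's points is their arithmetic mean, so only the assignment step depends on the chosen divergence.

```python
import math


def squared_euclidean(x, y):
    """Bregman divergence generated by F(v) = sum(v_i ** 2)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def generalized_kl(x, y):
    """Bregman divergence generated by F(v) = sum(v_i * log(v_i)).

    Assumes every coordinate of x and y is strictly positive.
    """
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def lloyd_step(points, centers, divergence):
    """One assign-then-update iteration of K-Means under `divergence`.

    Each point is assigned to the center c minimizing divergence(point, c);
    each center is then replaced by the mean of its assigned points.
    A center that receives no points is left unchanged.
    """
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: divergence(p, centers[i]))
        clusters[nearest].append(p)
    return [mean(c) if c else old for c, old in zip(clusters, centers)]
```

Swapping the divergence changes only how points are assigned to centers, for example `lloyd_step(points, centers, generalized_kl)` instead of `lloyd_step(points, centers, squared_euclidean)`; the update step stays the same.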
