Numerous programs and packages can validate a given clustering solution; however, clustering algorithms perform differently depending on which validation measure is used to judge them. When more than one performance measure is used to evaluate multiple clustering partitions, the optimal result is often difficult to determine by visual inspection alone. Researchers from the University of Louisville have introduced optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or different numbers of clusters) and identify a “best” option for a given dataset. The package uses weighted rank aggregation to combine the scores from the various performance measures objectively, removing the guesswork that often follows a visual inspection of cluster results. Because optCluster includes biological validation measures as well as clustering algorithms developed specifically for RNA sequencing (count) data, it is a useful tool for clustering genomic data.
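To make the rank-aggregation step concrete, here is a minimal Python sketch of weighted rank aggregation under simplifying assumptions. It is not optCluster's implementation: the candidate names, scores, and per-measure weights are hypothetical, distances are plain Spearman footrule distances multiplied by a per-measure weight, and an exhaustive search over all orderings stands in for the stochastic optimizers (cross-entropy or genetic algorithm) that the RankAggreg package uses. Each validation measure first ranks the candidate partitions, and the aggregate ranking is the ordering that is closest, in total weighted distance, to all of those rankings.

```python
from itertools import permutations


def rank_positions(scores, higher_is_better):
    """Map each candidate to its 1-based rank under one validation measure."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {cand: i + 1 for i, cand in enumerate(ordered)}


def footrule(order, positions):
    """Spearman footrule distance between a proposed ordering and one measure's ranks."""
    return sum(abs((i + 1) - positions[cand]) for i, cand in enumerate(order))


def aggregate(measures, weights):
    """Brute-force weighted rank aggregation (feasible only for a handful of candidates).

    measures: list of (scores_dict, higher_is_better) pairs, one per validation measure.
    weights:  one non-negative weight per measure (hypothetical importance values).
    """
    rankings = [rank_positions(scores, hib) for scores, hib in measures]
    candidates = list(rankings[0])
    best = min(
        permutations(candidates),
        key=lambda order: sum(w * footrule(order, r) for w, r in zip(weights, rankings)),
    )
    return list(best)


# Hypothetical scores for three candidate partitions (algorithm + number of clusters).
measures = [
    ({"hierarchical-3": 12.1, "kmeans-3": 20.4, "pam-4": 25.0}, False),  # connectivity: lower is better
    ({"hierarchical-3": 0.09, "kmeans-3": 0.12, "pam-4": 0.05}, True),   # Dunn index: higher is better
    ({"hierarchical-3": 0.48, "kmeans-3": 0.52, "pam-4": 0.40}, True),   # silhouette width: higher is better
]

print(aggregate(measures, [1.0, 1.0, 1.0]))  # ['kmeans-3', 'hierarchical-3', 'pam-4']
print(aggregate(measures, [5.0, 1.0, 1.0]))  # ['hierarchical-3', 'kmeans-3', 'pam-4']
```

With equal weights, k-means wins because two of the three measures rank it first; giving connectivity five times the weight flips the top choice to hierarchical clustering. This illustrates why the weighting matters: the aggregate answer reflects both the individual rankings and how much each measure is trusted.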
Two flowcharts comparing procedures for determining the optimal clustering algorithm and the optimal number of clusters. The chart on the left shows the procedure using both the clValid and RankAggreg packages; the chart on the right shows the procedure using only the optCluster package.
Availability: optCluster is freely available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/optCluster/index.html