Effectiveness of Content-Based Image Clustering Algorithms






Addis Ababa University


Retrieving a set of similar image documents requires clustering the images by their shared features. Clustered images are used by Content-Based Image Retrieval (CBIR) and querying systems, which require effective query matching in large image databases. Content-based image clustering provides a more efficient way to manage and retrieve large numbers of image documents, and it lets users browse only a particular subset of related image documents efficiently. This study focuses on validating two commonly used image clustering algorithms: hierarchical and k-means. The validation is based on a set of selected MPEG-7 image feature descriptors. The similarity input to these clustering algorithms considers both quantitative and predicate-based similarity measures. We computed two similarity measures: a total color-based similarity matrix, as a weighted sum of the MPEG-7 color descriptors, and a total similarity matrix, as a weighted sum of color, texture and shape features. The proposed metric measures the effectiveness of clustering subsets of COREL color photo images with respect to their semantic meaning. Shannon’s information theory is used to measure cluster cohesiveness. Clusters are said to be well separated when each distinct cluster is associated with a specific image semantic, and separation improves as the semantic association of images to a cluster becomes more predictable. The same Shannon entropy measure used for cluster separation also captures intra-cluster cohesiveness. When the same total color similarity matrix is input to all clustering algorithms, the best quality clusters are formed by the hierarchical method that uses average linkage. Experimental results show that the quality of clusters formed by k-means is not better than that of any of the three hierarchical methods. The hierarchical method with average linkage produced clusters of three times better quality than k-means. Even when weighted texture and shape similarity measures were used in addition to total color, the average-linkage HACM remained the best method compared with k-means in forming clusters that are both semantically consistent and cohesive. A further finding is that adding texture and shape features degrades cluster quality for all hierarchical methods.
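
The two computations described in the abstract, a weighted-sum similarity matrix and an entropy-based cluster quality score, can be illustrated with a minimal Python sketch. This is not the study's implementation: the function names, the normalization by the sum of the weights, and the size-weighted averaging of per-cluster entropies are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math
from collections import Counter


def weighted_similarity(matrices, weights):
    """Combine per-descriptor similarity matrices into one matrix.

    Assumption: the weighted sum is normalized by the total weight so the
    result stays on the same scale as the inputs.
    """
    if len(matrices) != len(weights):
        raise ValueError("one weight is needed per similarity matrix")
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the semantic labels inside one cluster."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_cluster_entropy(assignments, semantics):
    """Average per-cluster entropy, weighted by cluster size (an assumption).

    Lower values mean each cluster maps more predictably to one semantic
    category.
    """
    clusters = {}
    for cluster_id, label in zip(assignments, semantics):
        clusters.setdefault(cluster_id, []).append(label)
    n = len(semantics)
    return sum(len(members) / n * cluster_entropy(members)
               for members in clusters.values())
```

Under these assumptions, a clustering in which every cluster contains images of a single semantic category scores zero, and the score rises as categories mix within clusters, which matches the abstract's notion that predictable semantic association indicates better separation.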


