Effectiveness of Content-Based Image Clustering Algorithms
Date
2007-03
Publisher
Addis Ababa University
Abstract
Retrieving a set of similar image documents requires clustering the images based on their
shared features. Clustered images are used by Content-Based Image Retrieval (CBIR) and
querying systems that require effective query matching in large image databases.
Content-based image clustering provides a more efficient way to manage and retrieve large
numbers of image documents, and it lets users browse only a particular subset of related
image documents efficiently.
This study focuses on validating two commonly used image clustering algorithms, namely
hierarchical and k-means. The validation is based on a set of selected MPEG-7 image feature
descriptors. The similarity measure input to these clustering algorithms considers both
quantitative and predicate-based similarity measures. We computed two similarity measures:
a total color-based similarity matrix as a weighted sum of the MPEG-7 color descriptors, and
a total similarity matrix as a weighted sum of color, texture and shape features.
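The weighted-sum combination described above can be illustrated with a minimal sketch. The abstract does not give the weights, the descriptor names, or the matrix values, so the function name `total_similarity`, the toy matrices and the 3:1 weighting are all hypothetical and serve only to show the arithmetic.

```python
def total_similarity(matrices, weights):
    """Combine per-feature similarity matrices into one weighted sum.

    `matrices` is a list of square matrices (lists of lists) of equal size,
    one per feature descriptor; `weights` holds one non-negative weight per
    matrix. Weights are normalised to sum to 1 (an assumption of this
    sketch), so the result stays in the same range as the inputs.
    """
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need one weight per similarity matrix")
    total_weight = float(sum(weights))
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, weights):
        share = weight / total_weight
        for i in range(n):
            for j in range(n):
                combined[i][j] += share * matrix[i][j]
    return combined


# Hypothetical two-image example: illustrative values, not data from the study.
color = [[1.0, 0.8], [0.8, 1.0]]
texture = [[1.0, 0.2], [0.2, 1.0]]
total = total_similarity([color, texture], [3, 1])
```

The resulting matrix could then be passed to a hierarchical or k-means implementation that accepts a precomputed similarity or distance matrix.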
The proposed metric measures the effectiveness of clustering subsets of COREL color
photo images with respect to their semantic meaning. Shannon's information theory is
used to measure cluster cohesiveness. The clusters formed are said to be well separated
when each distinct cluster is associated with a specific image semantic. Separation among
clusters improves as the semantic association of images to a cluster becomes more
predictable. Intra-cluster cohesiveness is also captured by the Shannon entropy measure
used to assess cluster separation.
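One common way to apply Shannon entropy to clusters with known semantic labels is sketched below. The abstract does not state the exact formula used in the study, so this size-weighted entropy is an assumption, and the function names and labels are hypothetical. Lower values mean that the semantic class of an image is more predictable from its cluster.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the semantic labels inside one cluster."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted average entropy over all clusters (assumed aggregation)."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Hypothetical labels: a pure cluster and an evenly mixed cluster.
pure = ["sunset", "sunset", "sunset", "sunset"]
mixed = ["sunset", "horse", "sunset", "horse"]
score = weighted_entropy([pure, mixed])
```

A cluster whose images all share one semantic label has entropy 0, while an even two-way mix has entropy 1 bit, so a clustering algorithm with a lower weighted score groups images more consistently with their semantics.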
The best-quality clusters are formed by the hierarchical method with average linkage
when the same total color similarity matrix is input to all clustering algorithms.
Experimental results show that the quality of clusters formed by k-means clustering is no
better than that of any of the three hierarchical methods. The average-linkage hierarchical
method produced clusters three times better in quality than k-means. Even when
weighted texture and shape similarity measures were used in addition to total color, the
average-linkage HACM remained the best method, outperforming k-means in forming clusters
that are both semantically consistent and cohesive. A further finding is that adding
texture and shape features degrades cluster quality for all hierarchical methods.
Keywords
Effectiveness of Content-Based