Hybrid Image Annotation in Folksonomies Using Tags/Words Co-Occurrences

Date

2018-10-02

Publisher

Addis Ababa University

Abstract

With the development of Web 2.0 and the launch of websites such as Flickr, sharing images and collaboratively annotating them with keywords, called tags (tagging in folksonomies), has become very popular. Although tagging simplifies resource browsing and retrieval, it suffers from several issues, among them redundancy and ambiguity. In addition, important tags are missing altogether when a user uploads an image without tags. This thesis proposes a hybrid image annotation technique that combines user-assisted (semi-automatic) and automatic image annotation strategies. The study focuses mainly on two problems: (1) resolving tag word-sense ambiguity (tag-word disambiguation) within a typical semi-automatic tagging procedure, and (2) automatically recommending tags for a new image that is uploaded without tags, using the tags of previously uploaded similar images and the results of tag (or word) co-occurrence analysis. Both rely on effective word-to-context relatedness metrics. Among the most effective relatedness metrics are those defined on a feature-vector representation of the words. The study compares different word-to-context relatedness metrics in terms of their effectiveness in finding tag (or word) relatedness. Based on the results of this comparison, we propose a metric derived from a Maximum Likelihood estimator of the Jensen-Shannon Divergence among feature-count histograms, and we show that this metric performs better, in terms of output quality, than both the Jensen-Shannon and the Symmetrized Kullback-Leibler divergences between histograms. The relative gain in quality in the tasks of unsupervised cue-word discovery and tag co-occurrence analysis has been studied using a synthetic language corpus. In tag relatedness analysis using co-occurrence information, a word is assigned to a specific context chosen among the different contexts to which it is related.
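As an illustration of the two baseline divergences named above, the following minimal sketch computes the Jensen-Shannon divergence and a symmetrized Kullback-Leibler divergence between two feature-count histograms. It does not reproduce the metric proposed in the thesis, whose definition is not given in this abstract. The function names, the base-2 logarithm, and the additive smoothing used for the symmetrized KL divergence are assumptions made for the example.

```python
import math


def normalize(counts):
    """Turn a histogram of non-negative counts into relative frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; a term with p_i > 0 and
    q_i == 0 makes the divergence infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def jensen_shannon(counts_a, counts_b):
    """Jensen-Shannon divergence between two count histograms (in bits)."""
    p = normalize(counts_a)
    q = normalize(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def symmetrized_kl(counts_a, counts_b, smoothing=1.0):
    """Symmetrized KL divergence, KL(p || q) + KL(q || p).

    Additive smoothing (an assumption of this sketch) keeps the result
    finite when a feature has a zero count in one histogram.
    """
    p = normalize([c + smoothing for c in counts_a])
    q = normalize([c + smoothing for c in counts_b])
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical co-occurrence counts of two tags over four context words.
tag_a = [12, 3, 0, 5]
tag_b = [10, 4, 1, 5]
print(jensen_shannon(tag_a, tag_b))
print(symmetrized_kl(tag_a, tag_b))
```

With a base-2 logarithm the Jensen-Shannon divergence is bounded between 0 (identical histograms) and 1 (disjoint support), which makes it convenient as a dissimilarity score; the symmetrized KL divergence has no upper bound.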
Relatedness to a context is often defined based on the co-occurrence of the target word with other words (context words) in the sentences of a specific corpus. Context words play the role of features for the target. The overall disambiguation or tag co-occurrence analysis process can be thought of as a classification process. A problem with this approach is that a large number of possible context words can degrade classification performance, both in computational effort and in the quality of the outcome. Feature selection can improve the process in both regards by reducing the overall feature space to a manageable size with high information content. For disambiguation and tag co-occurrence analysis, this work proposes a novel feature selection approach based on the Shapley Value (SV), a metric from Coalitional Game Theory that measures the importance of a component within a coalition. By including in the feature set only the words with the highest Shapley Value, improvements in both tag quality (correctness of tags) and performance are obtained. The exponential complexity of the exact SV computation is avoided by an approximate computation based on sampling. The study demonstrates the effectiveness of this method and of the sampling approach using three corpora: a synthetic language corpus, a corpus prepared from a database of previously annotated Flickr images, and a real-world linguistic corpus from an English Wikipedia document dump. Using standard evaluation metrics, we show the extent to which each procedure in our approach contributes to the overall performance improvement.
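The sampling idea mentioned above can be sketched as follows: instead of enumerating all coalitions, the Shapley Value of each feature is estimated by averaging its marginal contribution over randomly sampled orderings of the features. This is a generic permutation-sampling sketch, not necessarily the estimator used in the thesis; the function names, the toy value function, and the weights are hypothetical and chosen only for illustration.

```python
import random


def shapley_by_sampling(players, value, num_permutations=2000, seed=0):
    """Estimate Shapley Values by sampling random orderings of the players.

    `value` maps a frozenset of players to a number (the coalition's worth).
    Each sampled ordering adds one marginal contribution per player.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value(frozenset(coalition))
            totals[p] += current - previous  # marginal contribution of p
            previous = current
    return {p: t / num_permutations for p, t in totals.items()}


def select_top_features(scores, k):
    """Keep the k features with the highest estimated Shapley Value."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy game: the worth of a set of context words is the sum of
# illustrative per-word weights (an assumption, not data from the thesis).
WEIGHTS = {"beach": 3.0, "sea": 2.0, "the": 0.0}


def toy_value(coalition):
    return sum(WEIGHTS[w] for w in coalition)


scores = shapley_by_sampling(list(WEIGHTS), toy_value, num_permutations=200)
print(select_top_features(scores, 2))  # ['beach', 'sea']
```

Each sampled ordering costs one evaluation of the value function per feature, so the total cost grows linearly with the number of sampled orderings rather than exponentially with the number of features.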

Keywords

Tagging, Disambiguation, Semantic Relatedness, Dissimilarity Metrics, Jensen-Shannon Divergence, Flickr, Folksonomy, Feature Selection, Shapley Value, Dimensional Reduction
