Hybrid Image Annotation in Folksonomies Using Tags/Words Co-Occurrences
Date
2018-10-02
Publisher
Addis Ababa University
Abstract
With the development of Web 2.0 and the launch of websites like Flickr, sharing
and collaboratively annotating images with keywords, called tags (tagging in
folksonomies), has become very popular. Although tagging simplifies resource
browsing and retrieval, it suffers from several issues, among them redundancy
and ambiguity. In addition, tags, which are a very important element, are
missing altogether when a user uploads an image without tagging it. This thesis
proposes a hybrid image annotation technique that combines user-assisted
(semi-automatic) and automatic image annotation strategies. The study focuses
mainly on two problems: (1) resolving tag word-sense ambiguity (tag-word
disambiguation) within a typical semi-automatic tagging procedure, and
(2) automatically recommending tags for a new image that is uploaded without
tags, using the tags of previously uploaded similar images and the results of
tag (or word) co-occurrence analysis.
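As an illustration of problem (2), the following is a minimal sketch of co-occurrence-based tag recommendation, not the procedure used in the thesis. The function names, the toy image list, and the scoring rule (summing raw pair counts) are all assumptions made for this example; the seed tags stand in for tags taken from previously uploaded similar images.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_images):
    """Count how often each unordered pair of tags appears on the same image."""
    counts = Counter()
    for tags in tagged_images:
        # Sorting makes every pair appear in one canonical (a, b) order.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def recommend_tags(seed_tags, counts, top_k=3):
    """Rank candidate tags by their total co-occurrence with the seed tags."""
    seeds = set(seed_tags)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seeds and b not in seeds:
            scores[b] += n
        elif b in seeds and a not in seeds:
            scores[a] += n
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]]


# Hypothetical, previously annotated images (toy data).
images = [
    ["beach", "sea", "sunset"],
    ["beach", "sea", "sand"],
    ["sea", "boat"],
    ["city", "night"],
]
counts = cooccurrence_counts(images)
suggested = recommend_tags(["beach"], counts, top_k=2)
```

Raw counts favor frequent tags; a normalized relatedness score would be a natural replacement for the simple sum used here.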
Both tasks rely on effective word-to-context relatedness metrics. Among the
most effective relatedness metrics are those defined on a feature-vector
representation of the words. The study compares different word-to-context
relatedness metrics in terms of their effectiveness at finding tag (or word)
relatedness. Based on the results of the comparison, we propose a metric
derived from a Maximum Likelihood estimator of the Jensen-Shannon divergence
among feature-count histograms, and we show that this metric performs better,
in terms of output quality, than both the Jensen-Shannon and the symmetrized
Kullback-Leibler divergences between histograms. The relative gain in quality
in the tasks of unsupervised cue-word discovery and tag co-occurrence analysis
has been studied using a synthetic language corpus.
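For reference, the two baseline divergences named above can be sketched as follows for a pair of non-empty feature-count histograms. This is a plain plug-in computation on normalized counts and is not the estimator-derived metric proposed in the thesis, whose exact form is not given in this abstract. The function names and the smoothing constant `eps` are assumptions of this sketch.

```python
import math


def normalize(counts):
    """Turn a non-empty histogram of counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(counts_a, counts_b):
    """Jensen-Shannon divergence between two count histograms, in [0, 1] bits."""
    p, q = normalize(counts_a), normalize(counts_b)
    # The mixture m is positive wherever p or q is, so kl() never divides by 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def symmetrized_kl(counts_a, counts_b, eps=1e-9):
    """Symmetrized KL divergence, KL(p || q) + KL(q || p).

    Additive smoothing with eps is an assumption of this sketch; without it
    the divergence is infinite whenever one histogram has an empty bin that
    the other does not.
    """
    p = normalize([c + eps for c in counts_a])
    q = normalize([c + eps for c in counts_b])
    return kl(p, q) + kl(q, p)
```

The Jensen-Shannon divergence stays bounded even for histograms with disjoint support, whereas the symmetrized Kullback-Leibler divergence depends heavily on how empty bins are smoothed.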
In tag relatedness analysis using co-occurrence information, a word is assigned
to a specific context chosen from among the contexts to which it is related.
Relatedness to a context is often defined by the co-occurrence of the target
word with other words (context words) in the sentences of a specific corpus.
Context words play the role of features for the target. The overall
disambiguation or tag co-occurrence analysis process can thus be thought of as
a classification process. A problem with this approach is that a large number
of possible context words can degrade classification, both in terms of
computational effort and in terms of the quality of the outcome. Feature
selection can improve the process in both regards by reducing the overall
feature space to a manageable size with high information content. For
disambiguation and tag co-occurrence analysis, this work proposes a novel
approach to feature selection based on the Shapley Value (SV), a metric from
coalitional game theory that measures the importance of a component within a
coalition. By including in the feature set only the words with the highest
Shapley Value, improvements in tag quality (correctness of tags) and in
performance are obtained. The exponential complexity of exact SV computation is
avoided by an approximate computation based on sampling. The study demonstrates
the effectiveness of this method and of the sampling approach using three
corpora: a synthetic language corpus, a corpus prepared from a Flickr image
database (previously annotated images), and a real-world linguistic corpus from
the English Wikipedia document dump.
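The sampling idea can be sketched with the common permutation-sampling approximation of the Shapley Value. The abstract does not specify which sampling scheme or coalition value function the thesis uses, so both are assumptions here: `shapley_sampling`, `top_features`, and the toy additive `toy_value` function are hypothetical names introduced for illustration only.

```python
import random


def shapley_sampling(features, value, n_samples=200, seed=0):
    """Approximate Shapley Values by averaging marginal contributions
    over randomly sampled orderings of the features.

    `value` maps a frozenset of features to a number (the coalition's worth).
    """
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = list(features)
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for f in order:
            coalition.add(f)
            cur = value(frozenset(coalition))
            totals[f] += cur - prev  # marginal contribution of f
            prev = cur
    return {f: t / n_samples for f, t in totals.items()}


def top_features(shapley, k):
    """Keep the k features with the highest estimated Shapley Value."""
    ranked = sorted(shapley.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, _ in ranked[:k]]


# Toy additive game (an assumption): each context word has a fixed worth,
# so its exact Shapley Value equals that worth.
weights = {"sea": 3, "sand": 2, "the": 0, "boat": 1}


def toy_value(coalition):
    return sum(weights[f] for f in coalition)


estimates = shapley_sampling(list(weights), toy_value, n_samples=50)
selected = top_features(estimates, k=2)
```

Each sampled ordering costs one pass over the features, so the total cost is linear in the number of samples rather than exponential in the number of features. In a realistic setting the value function would be something like classification quality on a held-out set, and the estimates would only approximate the exact values.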
Using standard evaluation metrics, we show the extent to which each procedure
in our approach contributes to the overall performance improvement.
Keywords
Tagging, Disambiguation, Semantic Relatedness, Dissimilarity Metrics, Jensen-Shannon Divergence, Flickr, Folksonomy, Feature Selection, Shapley Value, Dimensional Reduction