Measuring Similarity Between News Items Using Link Analysis and Semantic Approach
Date
2012-08
Publisher
Addis Ababa University
Abstract
In recent years, the ways people acquire information have changed completely. Reading hardcopy
materials such as books, journals, and newspapers has declined sharply, and most people now go
online to find recent, up-to-date information. As a result, news feed technologies such as RSS
and Atom were created to give news users frequently updated information. However, the number of
news items downloaded to the aggregator becomes unmanageable as the number of providers grows.
This is even more annoying when some of the news items are similar to items that have already
been read.
One possible solution to this challenge is to measure similarity among news items. Measuring
similarity between news items is a prerequisite for a number of application areas, including
grouping, clustering, merging, and revision/version control. Since news feeds are XML files, they
have several sub-elements such as title, description/summary, link, and guid. Previously,
item/entry sub-elements such as title and description/summary have been used as input in measuring
similarity. In this work, we propose to use the link sub-element information to improve and
supplement the similarity computation between two items. As a news page contains links to a set of
related news pages, our new similarity approach uses these links in measuring similarity. We
developed new similarity measures that consider the link sub-element and related news links
together with their anchor text.
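
The abstract does not give the actual measures, so the following is only an illustrative sketch of
the general idea: a text score from the title and description sub-elements is blended with a link
score from the related-news links of each item. The Jaccard overlap, the weight `alpha`, the sample
feed, and the `RELATED` table are all assumptions made for illustration, and anchor text is left
out for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Made-up two-item RSS feed used only for this illustration.
FEED = """<rss version="2.0"><channel>
<item><title>Flood hits city centre</title>
<description>Heavy rain caused a flood in the city centre.</description>
<link>http://example.com/a</link></item>
<item><title>City centre flooded after rain</title>
<description>Rain flooded streets in the centre of the city.</description>
<link>http://example.com/b</link></item>
</channel></rss>"""

# Hypothetical related-news links found on each item's page (made-up data).
RELATED = {
    "http://example.com/a": {"http://example.com/x", "http://example.com/y"},
    "http://example.com/b": {"http://example.com/y", "http://example.com/z"},
}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokens(text):
    """Lowercased alphanumeric word tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def parse_items(xml_text):
    """Extract the title, description, and link sub-elements of each item."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def combined_similarity(x, y, related, alpha=0.5):
    """Blend text similarity with link similarity (alpha is an assumed weight)."""
    text_sim = jaccard(
        tokens(x["title"] + " " + x["description"]),
        tokens(y["title"] + " " + y["description"]),
    )
    link_sim = jaccard(related.get(x["link"], set()), related.get(y["link"], set()))
    return alpha * text_sim + (1 - alpha) * link_sim


items = parse_items(FEED)
score = combined_similarity(items[0], items[1], RELATED)
print(round(score, 3))
```

In this sketch, two items that share few words can still score well if their pages point to the
same related stories, which is the kind of supplementary evidence the link sub-element is meant to
provide.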
To validate our approach, we developed a prototype implementing the link-based news feed
similarity measure. Experimental results show that the link-based news feed similarity measure,
when combined with title and description similarity, is more helpful than computing similarity
with the title and description sub-elements alone, and more helpful than using SimRank and
co-citation.
Keywords: similarity measure, link analysis, news feed, semantic similarity