Measuring Similarity Between News Items Using Link Analysis and Semantic Approach


Date

2012-08

Publisher

Addis Ababa University

Abstract

In recent years, the way people acquire information has changed completely. Reading hardcopy materials such as books, journals, and newspapers has declined sharply, and most people now go online to find recent, up-to-date information. As a result, news feed technologies such as RSS and Atom were created to let news users receive frequently updated information. However, the number of news items downloaded to the aggregator becomes unmanageable as the number of providers grows. This is even more annoying when some of the news items are similar to items that have already been read. One possible solution to this challenge is to measure similarity among news items. Measuring similarity between news items is a prerequisite for a number of application areas, including grouping, clustering, merging, and revision/version control. Since news feeds are XML files, they have several sub-elements such as title, description/summary, link, guid, etc. Previously, item/entry sub-elements such as title and description/summary have been used as input in measuring similarity. In this work, we propose to use the link sub-element information to improve and supplement the similarity computation between two items. As a news page contains links to a set of related news pages, our new similarity approach uses these links in measuring similarity. We developed new similarity measures that consider the link sub-element and related news links together with their anchor text. In order to validate our approach, we developed a prototype implementing the link-based news feed similarity measure. Experimental results show that the link-based news feed similarity is more helpful in measuring similarity when it is combined with the similarity computed from the title and description sub-elements, and when compared to using SimRank and co-citation.

Keywords: similarity measure, link analysis, news feed, semantic similarity
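
The idea of supplementing title/description similarity with link information can be illustrated with a minimal sketch. It is not the measure developed in the thesis: the item layout (a `related` mapping from URL to anchor text), the weight `alpha`, and the use of plain Jaccard token overlap in place of a semantic similarity are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_similarity(x, y, alpha=0.5):
    """Blend text similarity with link-based similarity.

    alpha is a hypothetical weight, not a value taken from the thesis.
    """
    # Text evidence: token overlap of title plus description.
    text_sim = jaccard(tokens(x["title"] + " " + x["description"]),
                       tokens(y["title"] + " " + y["description"]))
    # Link evidence: shared related-news URLs and shared anchor-text words.
    url_sim = jaccard(set(x["related"]), set(y["related"]))
    anchor_sim = jaccard(tokens(" ".join(x["related"].values())),
                         tokens(" ".join(y["related"].values())))
    link_sim = (url_sim + anchor_sim) / 2
    return alpha * text_sim + (1 - alpha) * link_sim


# Made-up items; the URLs and anchor texts are placeholders.
item_a = {
    "title": "Flood hits city",
    "description": "Heavy rain floods the city centre",
    "related": {"http://example.com/rain": "heavy rain warning",
                "http://example.com/relief": "relief effort begins"},
}
item_b = {
    "title": "City flooded after rain",
    "description": "Rain causes flooding in the city",
    "related": {"http://example.com/rain": "rain warning issued",
                "http://example.com/traffic": "traffic disrupted"},
}
item_c = {"title": "Election results", "description": "Votes counted",
          "related": {}}

score_ab = item_similarity(item_a, item_b)
score_ac = item_similarity(item_a, item_c)
```

Setting `alpha=1.0` reduces the score to the title/description baseline, which makes it easy to compare against the link-supplemented score on the same pair of items.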

Keywords

Similarity Measure, Link Analysis, News Feed, Semantic Similarity
