Application of Web Usage Mining for Extracting Employee Internet Access Pattern by URL Category: The Case of Commercial Bank of Ethiopia

Date

2015-10-03

Publisher

Addis Ababa University

Abstract

The Internet offers e-mail, e-commerce and research tools that enhance the productivity of organizations and their employees. However, in Ethiopia internet bandwidth allocation is limited and costly. As a result, efficient utilization of internet resources is a major concern. To this end, this study aims to extract employees' internet access patterns and internet usage statistics using web usage mining and analytical tools, in order to indicate how bandwidth can be utilized efficiently. The research follows the Web Usage Mining process model suggested by Sharma [1], which consists of data collection, data preprocessing, pattern discovery and pattern analysis. IronPort web appliance log data was used for pattern discovery. Moreover, the URL Profiler and McAfee TrustedSource online tools were used to classify and categorize URLs. Next, the log files were preprocessed by applying tasks such as data cleaning, data integration, feature selection, separation of the data into different time groups, and transaction identification. Finally, a total of 225,448 transactions were used for statistical analysis, and, after parsing child URL paths and counting the frequency of duplicates, 11,523 unique URL parent classes were used for the association rule pattern discovery experiment. After preprocessing was completed, experiments were conducted on the datasets using the Weka software, applying association rule mining to discover interesting patterns in the different time groups. MS Excel was also employed to produce statistical reports, including the most frequent off-time URL categories, the most frequent work-time URL categories, the times at which denied service requests were registered, and the times at which successful responses were returned. The findings indicate that the Entertainment and Social URL categories are the most frequently accessed during off-duty hours, whereas the Internet Services and Business URL categories are the most frequently accessed during duty hours.
In addition, denied employee service requests during duty hours, inefficiency of the caching algorithm, and employees' violations of the internet access policy are among the findings. The major challenge that needs further investigation is categorizing URLs based on their context, which is recommended as a future research direction.
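
The time-group separation and association rule steps described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation, which used Weka and MS Excel. The record layout (timestamp, user, URL category), the duty-hour boundaries of 08:00 to 17:00, and the function names are assumptions made for illustration only, and the rule search is simplified to pairs of categories rather than a full Apriori run.

```python
from collections import Counter
from datetime import datetime, time
from itertools import combinations

# Assumed duty hours; the abstract does not state the actual boundaries.
WORK_START = time(8, 0)
WORK_END = time(17, 0)


def time_group(ts):
    """Label a timestamp as 'work' or 'off' (weekends are ignored here)."""
    return "work" if WORK_START <= ts.time() < WORK_END else "off"


def category_counts(records):
    """Count URL category hits per time group.

    records: iterable of (timestamp, user, category) tuples, a hypothetical
    layout for already cleaned and categorized log entries.
    """
    counts = {"work": Counter(), "off": Counter()}
    for ts, _user, category in records:
        counts[time_group(ts)][category] += 1
    return counts


def pair_rules(transactions, min_support, min_confidence):
    """Find rules lhs -> rhs between pairs of categories.

    support    = share of transactions containing both categories
    confidence = share of transactions with lhs that also contain rhs
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

Calling `category_counts` on the preprocessed log entries gives one frequency table per time group, comparable to the work-time and off-time category reports mentioned above, while `pair_rules` would be run separately on the transactions of each time group. The support and confidence thresholds are left as parameters because the abstract does not report the values used.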

Keywords

Web Usage Mining, Pattern Discovery, Association Rule, Log File, URL Profiler, McAfee TrustedSource
