AAU Institutional Repository

Design of Hidden Web Crawler Using Word2vec Model


dc.contributor.advisor Getahun, Fekade (PhD)
dc.contributor.author Kebede, Engdawerk
dc.date.accessioned 2021-03-31T07:18:04Z
dc.date.available 2021-03-31T07:18:04Z
dc.date.issued 2020-10-09
dc.identifier.uri http://etd.aau.edu.et/handle/123456789/25831
dc.description.abstract The World Wide Web (WWW) is a huge repository of hyperlinked documents containing useful information. From the user's point of view, the WWW can be broadly classified into two types: the surface web and the hidden web. The surface web consists of static hyperlinked web pages that can be crawled and indexed by a general search engine. The hidden web (also called the deep web), on the other hand, refers to dynamic web pages that can be accessed through specific query interfaces. A web crawler is a program that specializes in downloading web content. Conventional web crawlers can easily search and analyze the surface web, with its interlinked HTML pages, but they are limited in fetching data from the deep web because of the query interface: to access the deep web, a user must request information from a particular database through that interface. Traditional crawlers retrieve content from web pages that are linked by hyperlinks and ignore the information hidden behind form pages, which cannot be reached by following the hyperlink structure alone. They therefore miss a large amount of data hidden behind search forms. In this work, we propose a hidden web crawler that uses a word2vec model. The proposed crawling approach uses a word2vec model built from e-commerce product review text to extract relevant keywords from e-commerce hidden web pages. Automatically extracting keywords that take the semantic relatedness between words into account, and using them to fill the fields of a hidden web form, leads to more accurate and relevant results. The results of the proposed approach were analyzed and found to meet our expectations. en_US
dc.language.iso en en_US
dc.publisher Addis Ababa University en_US
dc.subject Hidden Web en_US
dc.subject Surface Web en_US
dc.subject Hidden Web Crawler en_US
dc.subject Word2vec Model en_US
dc.title Design of Hidden Web Crawler Using Word2vec Model en_US
dc.type Thesis en_US
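
The keyword-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the three-dimensional vectors, the vocabulary, and the function names `cosine` and `related_keywords` are all hypothetical, chosen only to show how semantic relatedness between word vectors could rank candidate terms for a search-form field. A real system would load vectors from a word2vec model built from product review text.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# They stand in for vectors that a word2vec model would provide.
TOY_VECTORS = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.8, 0.2, 0.1],
    "battery": [0.6, 0.5, 0.1],
    "shirt": [0.0, 0.1, 0.9],
    "cotton": [0.1, 0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_keywords(seed, vectors, top_n=2):
    """Return the top_n words most similar to `seed`, most similar first.

    Returns an empty list when the seed word has no vector.
    """
    if seed not in vectors:
        return []
    seed_vec = vectors[seed]
    scored = [
        (cosine(seed_vec, vec), word)
        for word, vec in vectors.items()
        if word != seed
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    # Candidate terms a crawler could submit to a product search form.
    print(related_keywords("laptop", TOY_VECTORS))
```

With these toy vectors, the seed word "laptop" ranks "notebook" and "battery" ahead of the clothing terms, which is the kind of semantically related query term a form-filling crawler would try next.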

