Graduate Projects


Project ID: 464
Author: Chaitanya Chowdary Maddukuri
Project Title: User-Guided Information Extraction Based on Webpage Layout
Semester: 2 2015
Committee Chair: Dr. Longzhuang Li
Committee Member 1: Dr. Dulal Kar
Project Description: Web mining, search, and accessibility have become part of our daily activities. These activities raise two recurring problems: eliminating noisy information and extracting informative content. Extraction is typically performed either with automatic techniques or with hand-crafted rules. Automatic techniques center on various rule-based extraction methods, but implementing them increases the time complexity of the extraction process. Extraction with hand-crafted rules, which relies on string manipulation functions, is generally effective, but writing these rules is difficult and cumbersome for users. In this paper, we present an approach consisting of two steps that invoke each other. First, it retrieves information from the source code of a web page and stores it in blocks. Then, by applying a rule-based extraction algorithm such as the density classifier technique, it forms rules for that particular web page and uses the resulting rules to retrieve information from it. This saves the end user considerable time and lets the user navigate the web page easily and view it according to his or her own interests, which is the main goal of this project.
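The two-step process described above can be illustrated with a small sketch. It assumes that "blocks" are the text runs between block-level HTML tags, that the density classifier keeps blocks with enough words and a low link density (the share of a block's characters that sit inside `<a>` tags), and that a "rule" is simply the tag path of a kept block, reused on other pages with the same layout. The names `parse_blocks`, `classify`, `learn_rules`, `apply_rules`, the tag sets, and the thresholds are hypothetical and not taken from the project report; the user-guided part of the approach is not modeled.

```python
from html.parser import HTMLParser

# Assumed choices for this sketch, not taken from the project report.
BLOCK_TAGS = {"body", "div", "p", "li", "td", "h1", "h2", "h3", "section", "article"}
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}
SKIP_TAGS = {"script", "style"}


class BlockParser(HTMLParser):
    """Step 1: split page source into text blocks, each tagged with its tag path."""

    def __init__(self):
        super().__init__()
        self.open_tags = []    # currently open tags, outermost first
        self.blocks = []       # dicts with keys: path, text, link_chars
        self.pieces = []       # text pieces of the block being built
        self.link_chars = 0    # characters of that text found inside <a>
        self.link_depth = 0
        self.skip_depth = 0

    def flush_block(self):
        text = " ".join("".join(self.pieces).split())
        if text:
            self.blocks.append({"path": "/".join(self.open_tags), "text": text,
                                "link_chars": min(self.link_chars, len(text))})
        self.pieces, self.link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in BLOCK_TAGS:
            self.flush_block()  # a new block starts: close the previous one
        self.open_tags.append(tag)
        if tag == "a":
            self.link_depth += 1
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return              # ignore stray closing tags
        if tag in BLOCK_TAGS:
            self.flush_block()
        while self.open_tags:   # pop, also closing any unclosed inner tags
            top = self.open_tags.pop()
            if top == "a":
                self.link_depth -= 1
            elif top in SKIP_TAGS:
                self.skip_depth -= 1
            if top == tag:
                break

    def handle_data(self, data):
        if self.skip_depth:
            return              # drop script/style content
        self.pieces.append(data)
        if self.link_depth:
            self.link_chars += len(" ".join(data.split()))


def parse_blocks(html):
    parser = BlockParser()
    parser.feed(html)
    parser.close()
    parser.flush_block()        # emit any trailing text
    return parser.blocks


def classify(blocks, min_words=5, max_link_density=0.33):
    """Density classifier (assumed form): keep wordy blocks with few link characters."""
    return [b for b in blocks
            if len(b["text"].split()) >= min_words
            and b["link_chars"] / len(b["text"]) <= max_link_density]


def learn_rules(html):
    """Step 2: a rule is the tag path of a block the classifier kept."""
    return sorted({b["path"] for b in classify(parse_blocks(html))})


def apply_rules(html, rules):
    """Reuse the rules on a page with the same layout, skipping classification."""
    wanted = set(rules)
    return [b["text"] for b in parse_blocks(html) if b["path"] in wanted]
```

On a page with a link-heavy navigation bar and footer, `learn_rules` keeps only the path of the article paragraph, and `apply_rules` then pulls the text at that path from a second page built on the same template. Tag paths are a deliberately simple rule form; they break when the layout changes, so a practical system would likely need richer rules or user confirmation.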
Project URL: 464.pdf