Project ID: 212
Author: Jermy D. Zapata
Project Title: A Learning Aggregator - A Bayesian Classification System For Online Syndication
Semester: Fall 2003
Committee Chair: Dr. Mario Garcia
Committee Member 1: Dr. David Thomas
Committee Member 2: Dr. Dulal C. Kar
Project Description: Over the past decade, machine learning research has attempted to address several challenges posed by the Internet and its dynamic nature. Problems concerning classification are particularly interesting. One way to study this area is to examine syndicated web sites that publish an RSS file for their site. This file, also known as a “feed,” is essentially a listing of pages on the site, with each entry containing a headline, URL, and description, as well as other information. The web site updates this information on a regular basis, and users who have a program that can read the file can download it. Until now, these programs, called “aggregators,” have only displayed the headlines and allowed the user to browse to a particular story of interest. This project will attempt to incorporate a machine learning algorithm (the Naive Bayes classifier) to classify news stories based on their RSS information. Classification using RSS information will be compared with classification using the full text of the article.
Project URL:   212.pdf
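
The classification idea named in the project description can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: it assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing, trained on the concatenated headline and description of each feed item. The class name `NaiveBayesClassifier`, the `tokenize` helper, the sample items, and the category labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_counts = Counter()            # training items per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # all words seen in training

    def train(self, text, label):
        self.class_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_items = sum(self.class_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total_items)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical feed items: (headline, description, label).
items = [
    ("Local team wins championship game",
     "The team scored late to win the final match.", "sports"),
    ("Star striker signs new contract",
     "The player agreed to a three year deal with the club.", "sports"),
    ("New processor doubles laptop speed",
     "The chip maker announced a faster processor for laptops.", "tech"),
    ("Software update fixes security bug",
     "The vendor released a patch for the browser software.", "tech"),
]

classifier = NaiveBayesClassifier()
for headline, description, label in items:
    # "RSS information" here means headline plus description only.
    classifier.train(headline + " " + description, label)

print(classifier.classify("Club wins match after late goal"))
print(classifier.classify("Browser patch released for laptop security"))
```

To mirror the comparison the description proposes, the same classifier could be trained a second time on the full article text in place of the headline and description, with the two models evaluated on the same held-out stories.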