Graduate Projects - Details

Computer Science Program

Project ID: 494
Author: Tejaswi Reddy Kolli
Project Title: A Prototype for Comparing Classification algorithms by mining Frequent Patterns
Semester: 1 2015
Committee Chair: Dr. Longzhuang Li
Committee Member 1: Dr. David Thomas
Committee Member 2: -
Project Description: Large corporations and multinational companies offer a vast number of products, actions, and services, and they publish descriptions of them as text. Customers likewise write their reviews of these products and services as text. Such text often carries structured information embedded in unstructured prose. Data extraction algorithms can speed up the recovery of that structure, but they are unreliable on metadata that contains no structured data. Pre-processing makes it possible to extract structured information from large volumes of data. We propose a comprehensive approach that combines the textual content with a data mining algorithm: the Apriori algorithm, which generates frequent itemsets. For testing, we collect Amazon user reviews of different types and identify the frequent words in each review set that contribute to the review rating. We then generate an ARFF (Attribute-Relation File Format) file and use the Weka tool to compare classification algorithms. The three major classification algorithms compared in this paper are ZeroR, NaiveBayes, and J48. Our experimental evaluation compares these algorithms in terms of accuracy, precision, recall, and F-measure.
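The pipeline described above (mine frequent word sets with Apriori, then write an ARFF file for Weka) might be sketched as follows. This is a minimal illustration and not the project's implementation: the toy reviews, the whitespace tokenization, the support threshold of 2, the binary word attributes, and the helper names `apriori` and `to_arff` are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


def to_arff(relation, words, rows):
    """Build ARFF text with one {0,1} attribute per word plus a rating class."""
    labels = sorted({label for _, label in rows})
    lines = [f"@relation {relation}", ""]
    for word in words:
        lines.append(f"@attribute {word} {{0,1}}")
    lines.append("@attribute rating {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for tokens, label in rows:
        flags = ["1" if word in tokens else "0" for word in words]
        lines.append(",".join(flags + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical toy reviews (text, rating label); not data from the project.
reviews = [
    ("great battery great screen", "positive"),
    ("great battery life", "positive"),
    ("poor battery poor screen", "negative"),
    ("great screen", "positive"),
]
rows = [(set(text.split()), label) for text, label in reviews]

frequent = apriori([tokens for tokens, _ in rows], min_support=2)
# Use the frequent single words as the classifier's attributes.
words = sorted(next(iter(s)) for s in frequent if len(s) == 1)
arff = to_arff("reviews", words, rows)
print(arff)
```

The resulting text can be saved with a `.arff` extension and opened in Weka, where classifiers such as ZeroR, NaiveBayes, and J48 can be run on it and compared by accuracy, precision, recall, and F-measure.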
Project URL: 494.pdf
© Texas A&M University-Corpus Christi • 6300 Ocean Drive, Corpus Christi, Texas 78412 • 361-825-5700