|Project Description: ||With the growing use of internet sources such as websites and forums, sentiment analysis has become a challenging research area over the past decade. Many feature-based sentiment analysis approaches exist, in which a feature can be the word itself, its part of speech, or a polarity tag. However, the lexical resources they rely on (lists of positive and negative words) do not produce accurate opinion extraction results. Another important problem is handling complex opinion structures (e.g., I do ‘not’ think it is a ‘bad’ product), where treating words in isolation is not sufficient.
The proposed solution addresses these limitations and improves the accuracy of opinion extraction. The system accepts two types of input: text files and URLs from a website (Amazon is the site considered). The work focuses on two main stages: preparing data and building processing components. Preparing data involves building a list of positive words, a list of negative words, and lists of words that can invert, increase, or decrease an opinion, and then enhancing these lists using SentiWordNet. The second stage, building processing components, takes a URL as input and identifies the product and its comments. An open-source tool from Stanford is used for stemming and part-of-speech (POS) tagging. In addition, opinion tags and special tags generated through Transformation-Based Learning (TBL) are used to invert, increase, or decrease the opinion. The final opinion is determined by aggregating the opinion weights at the word, sentence, and document levels.
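The following is a minimal sketch of how inverting, increasing, and decreasing words might feed into word-, sentence-, and document-level aggregation. It is not the project's implementation: the tiny lexicons, the numeric weights, and the function names (`word_weights`, `sentence_score`, `document_score`) are all hypothetical, and a simple rule (a modifier applies to the next opinion word in the same sentence) stands in for the TBL-generated tags, POS tagging, and SentiWordNet-enhanced lists described above.

```python
import re

# Hypothetical toy lexicons; the real lists are larger and enriched with SentiWordNet.
POSITIVE = {"good": 1.0, "great": 1.0, "excellent": 1.0}
NEGATIVE = {"bad": -1.0, "poor": -1.0, "terrible": -1.0}
INVERTERS = {"not", "never", "no"}                # flip the polarity
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}    # increase the opinion (assumed factors)
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.5}  # decrease the opinion (assumed factors)


def word_weights(sentence):
    """Word level: return one signed weight per opinion word in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    weights = []
    sign, scale = 1.0, 1.0  # pending modifiers, applied to the next opinion word
    for tok in tokens:
        if tok in INVERTERS:
            sign = -sign
        elif tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]
        elif tok in DIMINISHERS:
            scale *= DIMINISHERS[tok]
        else:
            base = POSITIVE.get(tok, NEGATIVE.get(tok))
            if base is not None:
                weights.append(sign * scale * base)
                sign, scale = 1.0, 1.0  # modifiers are consumed by this opinion word
    return weights


def sentence_score(sentence):
    """Sentence level: sum the word-level weights."""
    return sum(word_weights(sentence))


def document_score(text):
    """Document level: average the sentence scores (0.0 for empty text)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(sentence_score(s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    print(sentence_score("I do not think it is a bad product"))
    print(document_score("Very good screen. Not a bad battery."))
```

In this sketch, "not" in the example sentence flips the weight of "bad" even though several words separate them, which is the kind of structure that scoring each word on its own would miss. Summing within a sentence and averaging across sentences are illustrative choices only; the description does not specify the aggregation formula.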