
#analyticsx

Text Analysis of American Airline Reviews using SAS® Enterprise Miner™
Rajesh Tolety and Saurabh Kumar Choudhary
MS in Business Analytics, Oklahoma State University

INTRODUCTION
• According to a TripAdvisor survey report, about 43% of airline passengers rely on online reviews of different airlines before booking a ticket. The nature and tone of these reviews are therefore important metrics for airlines to track and manage their performance and service.
• Using the text parsing and text filter nodes, we can quickly get a glimpse of all the terms present in the reviews and the relationships between them. This helps us get acquainted with the terminology used.
• The text cluster, text topic and text profile nodes are used to group terms from a similar context. For example, all the terms related to the quality of customer service are grouped in one cluster, whereas the terms describing concerns about baggage claim are grouped in another.
• The text rule builder node generates a set of rules that indicate the presence or absence of a word or group of words. These rules are used to predict a target variable, i.e. whether a review is positive or negative.
• American Airlines can run this analysis on a regular basis to assess its performance and identify scope for improvement. It can also classify new incoming reviews as positive or negative.

Text Filtering
• The text filter node enables us to clean the text and helps prepare the data for further analysis.
• Using the spell check option we can correct misspelt words, as shown in Fig 2. The misspelt term ‘passanger’ is corrected to ‘passenger’, ‘comunication’ to ‘communication’, and so on.
• The import synonym option in the text filter node can be used to group synonyms together, either by adding a table or by manually selecting the terms and marking them as synonyms, as shown in Fig 3.
Fig 2. Spell Check
Fig 3. Synonym Grouping
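The spell-check and synonym-grouping steps described for the text filter node can be illustrated outside SAS with a minimal Python sketch. It is not how the node is implemented. The two spelling corrections are the examples given in the text, while the synonym pairs and the `normalize` helper are hypothetical and invented for illustration.

```python
import re

# Spelling corrections: the two examples mentioned in the text.
SPELLING = {"passanger": "passenger", "comunication": "communication"}

# Synonym table: hypothetical pairs, invented for this illustration.
SYNONYMS = {"luggage": "baggage", "bags": "baggage"}


def normalize(review):
    """Lowercase and tokenize a review, fix known misspellings,
    then map each synonym to a single parent term."""
    tokens = re.findall(r"[a-z0-9]+", review.lower())
    corrected = [SPELLING.get(token, token) for token in tokens]
    return [SYNONYMS.get(token, token) for token in corrected]


print(normalize("Poor comunication, and one passanger lost her luggage."))
# ['poor', 'communication', 'and', 'one', 'passenger', 'lost', 'her', 'baggage']
```

Correcting spelling before applying synonyms means a misspelt synonym would still be grouped under its parent term, provided the misspelling is in the correction table.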

PROCESS FLOW
• The file import node is used to import the data. The file consists of two columns: one contains the text reviews and the other indicates whether the feedback is positive or negative.
• Using the data partition node, we divide the data into training and validation sets. The validation statistics can then be used to evaluate the results from the text rule builder node.
• Then we attach the text parsing node to the file import node. This node cleans up and modifies the unstructured data. Using this node we can also drop certain types of attributes and entities, as well as specific parts of speech.
Fig 1 shows the various steps and the nodes used in SAS® Enterprise Miner™.
Fig 1. Process Flow

Text Clustering
• Nine clusters are generated, each with 20 descriptive terms that describe the cluster. The clusters correspond to different contexts: for example, one contains the terms related to seating comfort, another contains reviews regarding flight delays, and so on.
• From the cluster frequency by RMS and the distance between clusters graphs, we can say that the clusters are well separated from each other and that the frequency is well distributed.
Fig 4. Cluster Distribution
Fig 5. Clusters Generated
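Returning to the data partition step of the process flow, the idea of splitting the labelled reviews into training and validation sets can be sketched in Python as below. This is a simple random split and does not try to reproduce the node's own partitioning options. The 70/30 fraction, the seed, the `partition` helper and the sample rows are all assumptions, since the poster does not state the split that was used.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and split them into
    (training, validation) lists. The 0.7 default is an assumption."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical two-column data: (review text, label).
rows = [("review %d" % i, "positive" if i % 2 else "negative") for i in range(10)]
train, valid = partition(rows)
print(len(train), len(valid))  # 7 3
```

Fixing the seed keeps the split reproducible, so validation statistics from different runs are computed on the same held-out reviews.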



Concept Links:
The concept link shows how frequently a term occurs with other terms in the documents. The term being analyzed is at the center, and the width of a link indicates the strength of the association: the wider the link, the stronger the association, i.e. the more often the two terms were present in the same document. The term ‘md80’ is strongly associated with the term ‘old’, whereas the term ‘business class’ is strongly associated with many terms such as ‘seat’, ‘lounge’ and ‘upgrade’.
Fig 6. Concept Link diagrams for ‘md80’ and ‘business class’

RESULTS
Text Rule Builder:
The text rule builder node is used to generate a set of rules using subsets of terms to predict a target variable. Here the target variable is binary, i.e. whether the feedback is positive or negative.
The text rule builder in this case generated a set of 20 rules. Based on the presence or absence of a word or group of words, a review can be classified as either positive or negative. The results can be interpreted as follows:
• Rule 1 specifies that with the presence of the term ‘hour’ and the absence of terms like ‘excellent’, ‘friendly’ and ‘comfortable’, we can say with a precision of 99.51% that the review is negative.
• Similarly, rule 17 specifies that with the presence of terms like ‘on time’ and ‘airline’ and the absence of terms like ‘miss’ and ‘rude’, we can say with a precision of 87.13% that the review is positive.
• Rule 19 states that the presence of the word ‘md80’ alone indicates, with a precision of 86.67%, that the review is positive. This result contrasts with the concept link, according to which the term ‘md80’ is strongly associated with the term ‘old’. Going through a few of the reviews, we find that although the ‘md80’ is an old aircraft, passengers don’t hesitate to fly in it. They find the attendants very friendly and the seating comfortable.
• In a similar way, considering the results from the text rule builder and observing the concept links, detailed analysis can be done on every individual entity.
• The training and validation misclassification rates for the model are 15.16% and 19.04% respectively.

CONCLUSION
• Using the text rule builder node in SAS Enterprise Miner, we can classify the reviews as positive or negative. This type of analysis can be extremely useful to customers who want value for their money and to those who like to choose a flight based on certain criteria.
• Concept links can be used to analyze the occurrence of a term with other terms and the strength of the association between the terms.
• We can use the model from the text rule builder together with the score node to classify new reviews.
• Airlines can run this analysis at regular intervals to learn what customers think about their service.
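The co-occurrence idea behind concept links (how often a term appears in the same document as other terms) can be sketched in Python as follows. This is a deliberate simplification that counts shared documents only and is not the strength statistic computed by SAS. The `cooccurrence` helper and the example reviews are invented for illustration and are not taken from the study's data.

```python
from collections import Counter


def cooccurrence(documents, target):
    """For every other term, count the documents in which it
    appears together with `target`."""
    counts = Counter()
    for doc in documents:
        terms = set(doc.lower().split())
        if target in terms:
            counts.update(terms - {target})
    return counts


# Invented example reviews.
reviews = [
    "the md80 is old but comfortable",
    "old md80 friendly crew",
    "new plane friendly crew",
]
links = cooccurrence(reviews, "md80")
print(links.most_common(1))  # [('old', 2)]
```

In a concept link diagram, the term with the highest count would be drawn with the widest link to the central term.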

REFERENCES
• Chakraborty, Goutam, Murali Pagolu, and Satish Garla. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS®.
• SAS Institute Inc. 2014. Getting Started with SAS® Text Miner 13.2. Cary, NC: SAS Institute Inc.
Acknowledgement: We wish to express our sincere thanks to Dr. Goutam Chakraborty for his valuable guidance.

Fig 7. Text Rule Builder rules
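A rule of the kind listed in Fig 7 (some terms present, others absent, predicting a label) and its precision can be illustrated with a short Python sketch. It uses the standard per-rule definition of precision, which may differ from how the text rule builder node reports it. The helper names and the labelled reviews are hypothetical, and only the rule's terms are taken from Rule 1 in the text.

```python
def rule_matches(terms, present, absent):
    """True if all `present` terms occur and none of the `absent` terms occur."""
    return all(t in terms for t in present) and not any(t in terms for t in absent)


def rule_precision(reviews, present, absent, predicted_label):
    """Share of rule-matching reviews whose true label equals the
    predicted label. Returns None when the rule matches nothing."""
    matched = [
        label
        for text, label in reviews
        if rule_matches(set(text.lower().split()), present, absent)
    ]
    if not matched:
        return None
    return sum(label == predicted_label for label in matched) / len(matched)


# Invented labelled reviews, for illustration only.
reviews = [
    ("delayed one hour and rude staff", "negative"),
    ("waited an hour at the gate", "negative"),
    ("an hour late but friendly crew", "positive"),
    ("one hour flight with no meal", "positive"),
    ("excellent service", "positive"),
]

# Terms of Rule 1 from the text: 'hour' present; 'excellent', 'friendly', 'comfortable' absent.
precision = rule_precision(
    reviews, {"hour"}, {"excellent", "friendly", "comfortable"}, "negative"
)
print(round(precision, 4))  # 0.6667
```

Here the rule fires on three reviews, two of which are negative, so its precision on this toy data is 2/3.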
