
Running head: CLASSIFICATION METHOD 1

CLASSIFICATION METHOD

Hal Hagood

u05a1

Describes the use of patterned detection and scoring on a text data set for a given business
question or problem statement and analyzes the influence the patterned detection scoring has on
understanding the business problem.

SAS (2017) states that this example shows how to create topics using the Text Topic

node. It assumes that you have completed Using the Text Filter Node and builds on the process flow

diagram created there. Select the Text Mining tab on the toolbar, and drag a Text Topic node into the

diagram workspace. Connect the Text Filter node to the Text Topic node. In the diagram workspace,

right-click the Text Topic node and select Run. Click Yes in the Confirmation dialog box that appears.

Click Results in the Run Status dialog box when the node finishes running. Select the Topics table to view

the topics created with a default run of the Text Topic node (Using SAS Global Forum 2016 Data).

Select the Number of Single-term Topics property, type 10, and press Enter on your keyboard.

Click the ellipsis button for the User Topics property. In the User Topics dialog box, click the Add button twice to

create two rows. This example uses the terms company and president. Enter each term with a weight

of 0.5, and specify the topic company and president for

both.

Right-click the Text Topic node and select Run. Select Yes in the Confirmation dialog box, and

then Results in the Run Status dialog box when the node finishes running. Select the Topics table. Notice

that 10 new single-term topics have been created along with the topic that you specified in the User

Topics dialog box.
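
The two rows entered in the User Topics dialog box can be pictured as a small table of topic, term, and weight. The sketch below is only an illustration in Python, not SAS code or a SAS API: the class name UserTopicRow and the function terms_by_topic are hypothetical, and the check that a weight lies between 0 and 1 follows the description of weights quoted later in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserTopicRow:
    """One row of a user topic table (hypothetical structure, not a SAS object)."""
    topic: str
    term: str
    weight: float

    def __post_init__(self):
        # The quoted SAS text describes weight as a relative value between 0 and 1.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError(f"weight must be between 0 and 1, got {self.weight}")


def terms_by_topic(rows):
    """Group rows into {topic: {term: weight}}."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row.topic, {})[row.term] = row.weight
    return grouped


# The two rows from this example: both terms share one topic, each weighted 0.5.
rows = [
    UserTopicRow("company and president", "company", 0.5),
    UserTopicRow("company and president", "president", 0.5),
]

user_topics = terms_by_topic(rows)
print(user_topics)
```

Grouping by topic makes it clear that the two rows define a single user topic with two equally weighted terms.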



Applies what was learned from a text model to a given business question or problem

Select the Number of Documents by Topics window to see the multi-term, single-term, and

user-created topics by the number of documents that they contain. (Note the entry outlined in blue: “company

and president”, Topic ID 1, with 28 documents.)



Presents the results of a text mining effort for a given business question or problem statement to
stakeholders in a succinct and relevant manner and analyzes the influence the text model has on
understanding the business problem

In the Interactive Topic Viewer, you can change the topic name, the term and document cutoff

values, and the topic weight. This ability to adjust topics interactively is a key element of the analysis.

Summarizes the findings of a text mining effort using data visualization for a given business

question or problem statement

Select the topic value “company and president” in the Topics table (above) and rename the topic

to company (below). Select the topic weight for the term “company” in the Terms table, and change it to

0.25. Click Recalculate.



After recalculation, the topic weight for the term “company” is 0.25. The term has the role Noun,

appears in 28 documents, and has a frequency of 73.



“The Text Topic node provides a facility to interactively adjust the topics. Click the ellipsis button

next to Topic Viewer in the Properties panel. Display 6.16 shows the Interactive Topic Viewer. This

window is divided into three sections: Topics, Terms, and Documents. The contents of the Terms and

Documents sections update based on the topic selected in the Topics section. To select a specific topic,

right-click on the last column of that particular topic, and select Select current Topic.

In the Topics section, you can modify the values in only the Topic, Term Cutoff, and Document

Cutoff columns. For example, consider the topic "+macro,+macro variable,+file,+variable,cornersas." You

can rename this topic "Macros and Macro variables" by just replacing the text in the Topic field. A Term

Cutoff score of 0.04 is calculated for this topic. From the Terms section, all terms with an absolute topic

weight value greater than or equal to 0.04 are assigned to this topic.

You can add more terms to this topic by adjusting the cutoff values. For example, to add the term

"interface" that has a term topic weight of 0.039, you can change the term cutoff value for this topic in the

Topics section to 0.039. By decreasing the term cutoff value, you are effectively increasing the number of

eligible terms for the topic. This also impacts the weights of the documents. This change results in all

terms with a term topic weight of 0.039 being added to this topic, which is not wanted. However, you can

directly adjust the term topic weight value for a specific term in the Terms table. In Display 6.18, the topic

weight value for the term "interface" is changed to 0.04.

Another great functionality of the Text Topic node is the ability to define custom topics of interest.

You can add custom topics directly using the interface or you can import them as a SAS data set. Click

the ellipsis button next to the User Topics property from the node's Properties panel. As shown in Display

6.19, you can see the topic Macros and Macro variables in this table because this topic has been

modified by the user. Scroll to the bottom of the window to add a custom topic named Calcium and its

associated terms as shown in Display 6.19. Enter a value for the weight for each term. Weight is a relative

value between 0 and 1 given to each Role and Term pair that indicates the importance of the term to the

topic. A value of 1 indicates high importance, and a value of 0 indicates low importance.

Run the Text Topic node with the user-defined topics. View the Interactive Topic Viewer after the

completion of the run. Display 6.20 shows the results from the Interactive Topic Viewer after the node is

run with user-defined topics. The user-defined topics are at the top of the Topics table. There are 176

documents that contain the topic Calcium. You can enhance this topic by adding more terms and

adjusting the cutoffs.

The output data set of the Text Topic node contains new variables that represent topics created

by the node. There are two variables created for each topic: one is the document cutoff score and the

other is a binary variable indicating topic assignment. The binary variable has a value of 1 if the document

topic weight is greater than or equal to the document cutoff score. It has a value of 0 otherwise. In our

example, 27 topic binary variables are created. These include two user-defined topics, 10 single-term

topics, and 15 multi-term topics. Display 6.21 shows the new topic binary variables in the output data set

and the existing text variables. In the presence of a target variable, the topic binary variables lend

themselves as valuable input variables for performing structured data mining analysis” (SAS, 2017).
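
The two cutoff rules described in the quoted passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Text Topic node's implementation: the function names are hypothetical, and every weight below is invented for illustration except the 0.039 weight for "interface" and the 0.04 term cutoff, which come from the quoted example.

```python
def terms_in_topic(term_weights, term_cutoff):
    """Return terms whose absolute topic weight is at or above the term cutoff."""
    return sorted(
        term for term, weight in term_weights.items()
        if abs(weight) >= term_cutoff
    )


def topic_flag(document_weight, document_cutoff):
    """Binary topic variable: 1 if the document weight meets the cutoff, else 0."""
    return 1 if document_weight >= document_cutoff else 0


# Hypothetical term weights for one topic; only "interface" (0.039) is from the source.
term_weights = {"macro": 0.31, "variable": 0.12, "interface": 0.039, "file": -0.05}

# With the calculated cutoff of 0.04, "interface" is excluded.
print(terms_in_topic(term_weights, 0.04))

# Lowering the cutoff to 0.039 admits "interface" and any other term at that weight.
print(terms_in_topic(term_weights, 0.039))

# Hypothetical document weights scored against a hypothetical document cutoff of 0.1.
print(topic_flag(0.2, 0.1), topic_flag(0.05, 0.1))
```

The sketch also shows why lowering the cutoff is a blunt tool: it admits every term at or above the new threshold, whereas editing a single term's weight changes only that term.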

References

SAS. (2017). Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS.

Chapter 6 - Clustering and Topic Extraction. Retrieved August 5, 2017, from

http://viewer.books24x7.com/assetviewer.aspx?bookid=59026&chunkid=342485391&resumebookmarkid=17830027-4f7a-e711-a9c3-00505686029c

SAS. (2017). Using the Text Topic Node. Retrieved August 5, 2017, from

http://support.sas.com/documentation/cdl/en/tmgs/65668/HTML/default/viewer.htm#n0zyyz2tbbrc6cn1l58ys8dic7nu.htm
