TABLE OF CONTENTS
INTRODUCTION
EXERCISE INSTRUCTIONS
Acquire Data
Scatter Plot and Box Plot
Inter-Quartile Range (IQR) test
Nearest Neighbour Algorithm
FURTHER READING
openSAP Exercise Week 3 Unit 2
INTRODUCTION
These exercises are designed to introduce you to some of the methods we can use to detect
anomalies with the SAP BusinessObjects Predictive Analytics expert tool.
There are 5 columns of data and 150 rows. The columns represent the variables defined above, and the
rows represent the values for each of these variables for each individual store.
This exercise shows how to use different techniques to detect anomalies in the data.
EXERCISE INSTRUCTIONS
Acquire Data
For this exercise, the data set we will be using is the openSAP_STORES_US.csv text file. Select Text as
the data source and then press Next.
Navigate to the folder where you have downloaded the data sets that accompany this training and select
openSAP_STORES_US.csv:
Press Open. The selected data will be read by SAP Predictive Analytics:
Press Create.
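For readers who want to inspect the same file outside the tool, the acquisition step can be approximated with a minimal Python sketch. This is not part of the SAP workflow. The function name read_rows, the comma delimiter, and the header names in the demo are assumptions; only the file name openSAP_STORES_US.csv and the variable names STORE, TURNOVER, SIZE, MARGIN and STAFF appear in this exercise, and the demo row is made up.

```python
import csv
import io


def read_rows(handle, delimiter=","):
    """Read delimited text with a header line into a list of dicts (one per row)."""
    rows = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        row = {}
        for name, text in raw.items():
            try:
                row[name] = float(text)  # numeric columns become floats
            except (TypeError, ValueError):
                row[name] = text         # text such as store names is kept as-is
        rows.append(row)
    return rows


# Hypothetical one-row sample standing in for the real file (values are invented).
sample = io.StringIO("STORE,TURNOVER,SIZE,MARGIN,STAFF\nExample Store,1.0,2.0,3.0,4.0\n")
demo_rows = read_rows(sample)
print(demo_rows[0]["STORE"], demo_rows[0]["SIZE"])
```

To read the real file, pass an open file object instead, for example `with open("openSAP_STORES_US.csv", newline="") as f: rows = read_rows(f)`. If the file matches the description in the introduction, this should yield 150 rows with 5 keys each.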
Often the first step in detecting outliers is to use data visualization. For example, you could use a scatter plot
and a box plot to see if there are any unusual data values.
The scatter plot of TURNOVER and SIZE by STORE highlights the fact that the stores at Fort Worth, Little
Rock and Santa Clarita have slightly unusual turnover and store size values:
The scatter plot of MARGIN and STAFF by STORE highlights Moreno Valley, as it has high numbers of staff,
but also indicates there are possibly different groups within the data, and one group has unusually low
margin even with low numbers of staff (bottom left in the plot below):
There are outliers indicated for SIZE. These are represented by the circles in the visualization.
Remember that:
IQR = Q3 - Q1
In this visualization, outliers are indicated if the value is either 1.5 × IQR or more above the third quartile or
1.5 × IQR or more below the first quartile.
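The rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name iqr_outliers is hypothetical, and the quartiles are computed with the standard library's "inclusive" method, which may differ slightly from the quartile definition SAP Predictive Analytics uses.

```python
import statistics


def iqr_outliers(values, coefficient=1.5):
    """Return the values lying coefficient * IQR or more beyond Q1 or Q3."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - coefficient * iqr
    upper_fence = q3 + coefficient * iqr
    # "or more" in the rule means the fences themselves count as outliers.
    return [v for v in values if v <= lower_fence or v >= upper_fence]


# Made-up numbers: only the last value is far from the rest.
print(iqr_outliers([10, 12, 13, 14, 15, 16, 18, 40]))
```

Because different tools interpolate quartiles differently, values very close to a fence may be flagged by one tool and not by another.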
If you hover over the SIZE box in the visualization, it will give you the statistics:
Also, any value less than 2.05 (2.80 - 0.75 = 2.05) is classed as an outlier. You can see that there is one value
less than 2.05.
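The arithmetic behind the lower fence can be written out step by step. The worked example below assumes that 2.80 is the first quartile of SIZE and that 0.75 is 1.5 × IQR, as the subtraction in the text suggests. The values for Q3 and the upper fence are derived from those two numbers, not read from the statistics display.

```latex
\begin{align*}
1.5 \times \mathrm{IQR} &= 0.75 \;\Rightarrow\; \mathrm{IQR} = 0.50 \\
Q_3 &= Q_1 + \mathrm{IQR} = 2.80 + 0.50 = 3.30 \\
\text{lower fence} &= Q_1 - 1.5 \times \mathrm{IQR} = 2.80 - 0.75 = 2.05 \\
\text{upper fence} &= Q_3 + 1.5 \times \mathrm{IQR} = 3.30 + 0.75 = 4.05
\end{align*}
```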
Go to the algorithms on the right side and choose the Inter Quartile Range algorithm that is listed in the
Outliers group of algorithms:
Double-click on the Inter Quartile Range algorithm and it will be added to the analysis editor, connected to
the data component:
From the contextual menu of the Inter Quartile Range algorithm component choose Configure Settings.
Notice that the Fence Coefficient is set at 1.5. Remember that 1.5 × IQR from the quartile is referred to as the
"inner fence", while 3 × IQR from the quartile is referred to as the "outer fence". In this test we are looking for
any values that are either 1.5 × IQR or more above the third quartile or 1.5 × IQR or more below the first quartile.
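To make the effect of the fence coefficient concrete, here is a small Python sketch that compares a value against both the inner fence (coefficient 1.5) and the outer fence (coefficient 3). The function names and the three labels are hypothetical, and the quartiles in the demo are made-up numbers chosen so the fences are easy to check by hand.

```python
def fences(q1, q3, coefficient):
    """Return (lower, upper) fences at coefficient * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - coefficient * iqr, q3 + coefficient * iqr


def classify(value, q1, q3):
    """Label a value relative to the inner (1.5) and outer (3) fences."""
    inner_low, inner_high = fences(q1, q3, 1.5)
    outer_low, outer_high = fences(q1, q3, 3.0)
    if value <= outer_low or value >= outer_high:
        return "beyond outer fence"
    if value <= inner_low or value >= inner_high:
        return "beyond inner fence"
    return "inside fences"


# Made-up quartiles: Q1 = 10, Q3 = 20, so IQR = 10,
# inner fences are -5 and 35, outer fences are -20 and 50.
for v in (15, 36, 60):
    print(v, classify(v, 10, 20))
```

Raising the coefficient widens the fences, so fewer values are flagged; the exercise keeps the setting at 1.5.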
Press Done.
To view the results of the analysis press the green arrow button:
If the analysis runs successfully you will get the following message and you can transfer to the Results view:
Press OK.
The results view shows an extra column in the data grid which has a 1 if the data is an outlier or a 0 if it is
not:
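The extra 0/1 column can be imitated with a short Python sketch. The function name add_outlier_flag, the column name OUTLIER, and the sample rows are hypothetical; the name the tool gives its flag column is not stated in this exercise. The lower fence of 2.05 comes from the box plot discussion, while the upper fence of 4.05 is an assumed value derived from it.

```python
def add_outlier_flag(rows, column, lower_fence, upper_fence, flag_name="OUTLIER"):
    """Return copies of the rows with an extra 1/0 column marking outliers."""
    flagged = []
    for row in rows:
        value = row[column]
        is_outlier = value <= lower_fence or value >= upper_fence
        # Copy the row so the input data is left unchanged.
        flagged.append({**row, flag_name: 1 if is_outlier else 0})
    return flagged


# Hypothetical rows; store names and sizes are invented for illustration.
rows = [
    {"STORE": "Store A", "SIZE": 3.0},
    {"STORE": "Store B", "SIZE": 2.0},
    {"STORE": "Store C", "SIZE": 4.4},
]
for row in add_outlier_flag(rows, "SIZE", lower_fence=2.05, upper_fence=4.05):
    print(row)
```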
If you scroll down, you will see that there are outliers indicated for Fort Worth, Fresno, Riverside and Tucson:
This is the statistical summary created when you select the Summary radio button at the bottom right:
It shows the values of the first quartile (Q1) and third quartile (Q3) for the SIZE variable. It also shows the
lower and upper fence values when the fence coefficient is set to 1.5. There are 4 outliers detected.
The Nearest Neighbour method is another way to detect outliers. It is based on the concept of a local
density, where locality is given by nearest neighbours.
Return to the Designer view and select New Analysis by clicking the +.
Double-click on the Nearest Neighbour Outlier algorithm and it will be added to the analysis editor, connected
to the data component:
From the contextual menu of the Nearest Neighbour Outlier algorithm component choose Configure Settings.
Select the following parameters and press Done:
Run the analysis by clicking on the green arrow button on the toolbar and switch to results view:
The Data Grid has an extra column that indicates if the value is an outlier (a 1 flag) or not (a 0 flag).
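The idea of using nearest neighbours to judge local density can be illustrated with a generic Python sketch. This is not the Nearest Neighbour Outlier algorithm as implemented in SAP Predictive Analytics, whose parameters are not reproduced in this text. The sketch scores each point by its average distance to its k nearest neighbours (a simple stand-in for low local density) and flags the highest-scoring points. The function names, k, n_outliers and the sample points are all assumptions.

```python
import math


def knn_scores(points, k):
    """Average Euclidean distance from each point to its k nearest neighbours."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


def flag_top_outliers(points, k, n_outliers):
    """Return a 1/0 flag per point; the n_outliers largest scores get a 1."""
    scores = knn_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n_outliers])
    return [1 if i in top else 0 for i in range(len(points))]


# Made-up 2-D points: four close together and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(flag_top_outliers(points, k=2, n_outliers=1))
```

A point in a sparse region has distant neighbours and therefore a large score. In practice the variables should be put on comparable scales first, because a variable with a large range would otherwise dominate the distance.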
FURTHER READING
Detailed instructions and a wide variety of other functionality for analyzing data can be found in the user
guide pa31_expert_user_en.pdf.
Coding Samples
Any software coding or code lines/strings (Code) provided in this documentation are only examples and are
not intended for use in a production system environment. The Code is only intended to better explain and
visualize the syntax and phrasing rules for certain SAP coding. SAP does not warrant the correctness or
completeness of the Code provided herein and SAP shall not be liable for errors or damages caused by use of
the Code, except where such damages were caused by SAP with intent or with gross negligence.
www.sap.com