
openSAP ds1

Exercise Week 3 Unit 2


Detect Anomalies
openSAP

TABLE OF CONTENTS
INTRODUCTION
EXERCISE INSTRUCTIONS
Acquire Data
Scatter Plot and Box Plot
Inter-Quartile Range (IQR) test
Nearest Neighbour Algorithm
FURTHER READING

openSAP Exercise Week 3 Unit 2

INTRODUCTION
These exercises are designed to introduce you to some of the methods you can use to detect
anomalies using the SAP BusinessObjects Predictive Analytics expert tool.

The data to be used is openSAP_STORES_US.csv.

This data set is a short list of US-based retail stores.

The data set contains the following variables:


STORE (US city where the store is located)
TURNOVER (annual sales for the previous 12-month period for each store, in $ millions)
SIZE (retail floor space of each store, in thousands of sq. ft.)
STAFF (number of staff members, in tens)
MARGIN (total gross margin per store, in $ hundred thousands)

There are 5 columns of data and 150 rows. The columns represent the variables defined above, and the
rows represent the values for each of these variables for each individual store.
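If you want to inspect the same file outside the tool, the layout described above can be read with a few lines of Python. This is only a sketch: it assumes the file is comma-separated and has a header row with exactly the five column names listed above, which the exercise does not confirm. The function name load_stores and the two sample rows are made up for illustration and are not values from the data set.

```python
import csv
import io

# Measure columns named in the exercise; STORE is the only text column.
NUMERIC_COLUMNS = ("TURNOVER", "SIZE", "STAFF", "MARGIN")

def load_stores(fileobj):
    """Read store rows from an open CSV file, converting the measures to float."""
    rows = []
    for record in csv.DictReader(fileobj):
        row = {"STORE": record["STORE"]}
        for name in NUMERIC_COLUMNS:
            row[name] = float(record[name])
        rows.append(row)
    return rows

# Made-up two-row sample standing in for the real file
# (these values are NOT taken from openSAP_STORES_US.csv).
sample = io.StringIO(
    "STORE,TURNOVER,SIZE,STAFF,MARGIN\n"
    "Example City A,1.0,2.0,3.0,4.0\n"
    "Example City B,5.0,6.0,7.0,8.0\n"
)
stores = load_stores(sample)
print(len(stores), stores[0]["SIZE"])
```

With the real file you would pass an open file handle instead of the sample, for example open("openSAP_STORES_US.csv", newline=""). If the assumptions hold, the result should contain 150 rows.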

This exercise shows how to use different techniques to detect anomalies in the data.


EXERCISE INSTRUCTIONS
Acquire Data

Open SAP BusinessObjects Predictive Analytics and select Expert Analytics:

Open Expert Analytics:


Select Acquire Data:

For this exercise, the data set is the text file openSAP_STORES_US.csv, so select Text as the data
source and then press Next.


Navigate to the folder where you have downloaded the data sets that accompany this training and select
openSAP_STORES_US.csv:

Press Open. The selected data will be read by SAP Predictive Analytics:

Press Create.


Scatter Plot and Box Plot


The data set will be created and you will enter the Visualize Room:

Often the first step in detecting outliers is to use data visualization. For example, you could use a scatter plot
and a box plot to see if there are any unusual data values.

The scatter plot of TURNOVER and SIZE by STORE highlights the fact that the stores at Fort Worth, Little
Rock and Santa Clarita have slightly unusual turnover and store size values:


The scatter plot of MARGIN and STAFF by STORE highlights Moreno Valley, as it has high numbers of staff,
but also indicates there are possibly different groups within the data, and one group has unusually low
margin even with low numbers of staff (bottom left in the plot below):

The box plots can be created for the four measures:

There are outliers indicated for SIZE. These are represented by the circles in the visualization.

Remember that:
IQR = Q3 - Q1
In this visualization, outliers are indicated if the value is either 1.5 x IQR or more above the third quartile or
1.5 x IQR or more below the first quartile.
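The rule above can be sketched in Python as follows. This is an illustration, not the tool's implementation: the function names are made up, and the quartile method (statistics.quantiles with method="inclusive") is an assumption, so the quartiles it returns may differ slightly from those shown in the visualization. The example values are invented and are not from the data set.

```python
import statistics

def iqr_fences(values, coefficient=1.5):
    """Return (q1, q3, lower_fence, upper_fence) for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    # method="inclusive" is an assumption; other tools may interpolate differently.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - coefficient * iqr, q3 + coefficient * iqr

def flag_outliers(values, coefficient=1.5):
    """Return 1 for each value on or beyond a fence, 0 otherwise."""
    _q1, _q3, lower, upper = iqr_fences(values, coefficient)
    return [1 if (v <= lower or v >= upper) else 0 for v in values]

# Invented example values: nine ordinary values and one extreme value.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_fences(example))
print(flag_outliers(example))
```

The comparison uses "on or beyond" the fence to match the "1.5 x IQR or more" wording of the rule.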


If you hover over the SIZE box in the visualization, it will give you the statistics:

For SIZE, IQR = Q3 - Q1 = 3.30 - 2.80 = 0.50

So, 1.5 x IQR = 1.5 x 0.50 = 0.75

Any value greater than 3.30 + 0.75 = 4.05 is classed as an outlier in this visualization. You can see that
there are three values greater than 4.05.

Also, any value less than 2.80 - 0.75 = 2.05 is classed as an outlier. You can see that there is one value
less than 2.05.
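If you want to check this arithmetic yourself, a few lines of Python reproduce it from the two quartiles quoted above. The variable names are illustrative only. The results are rounded to two decimals to avoid floating-point noise.

```python
# SIZE quartiles as quoted in the exercise.
Q1, Q3 = 2.80, 3.30
COEFFICIENT = 1.5

iqr = round(Q3 - Q1, 2)             # inter-quartile range
step = round(COEFFICIENT * iqr, 2)  # distance from each quartile to its fence
upper_fence = round(Q3 + step, 2)
lower_fence = round(Q1 - step, 2)

print(iqr, step, lower_fence, upper_fence)  # prints: 0.5 0.75 2.05 4.05
```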

Inter-Quartile Range (IQR) test


You can identify outliers using the Inter-Quartile range (IQR) test in the Predict room.

Navigate to the Predict room:

The data component is shown in the icon on the left side.


Go to the algorithms on the right side and choose the Inter Quartile Range algorithm that is listed in the
Outliers group of algorithms:

Double click on the Inter Quartile Range algorithm and it will be added to the analysis editor, connected to
the data component:


From the contextual menu of the Inter Quartile Range algorithm component choose Configure Settings.

Select the following parameters:

Notice that the Fence Coefficient is set at 1.5. Remember that 1.5 x IQR from the quartile is referred to as
the "inner fence", while 3 x IQR from the quartile is referred to as the "outer fence". In this test we are
looking for any values that are either 1.5 x IQR or more above the third quartile or 1.5 x IQR or more below
the first quartile.
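To see how the two fence coefficients relate, the sketch below classifies a single value against both the inner fence (coefficient 1.5) and the outer fence (coefficient 3). The function name and the three return labels are made up for illustration. The algorithm component itself only reports a 1 or 0 flag for the one coefficient you configure.

```python
def classify_by_fences(value, q1, q3):
    """Classify a value against the inner (1.5 x IQR) and outer (3 x IQR) fences."""
    iqr = q3 - q1
    # Check the outer fence first, because anything beyond it is also beyond the inner fence.
    if value <= q1 - 3.0 * iqr or value >= q3 + 3.0 * iqr:
        return "beyond outer fence"
    if value <= q1 - 1.5 * iqr or value >= q3 + 1.5 * iqr:
        return "beyond inner fence"
    return "within fences"

# SIZE quartiles from the exercise; the three test values are hypothetical.
for size in (3.0, 4.5, 5.0):
    print(size, classify_by_fences(size, q1=2.80, q3=3.30))
```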

Press Done.


To view the results of the analysis press the green arrow button:

If the analysis runs successfully you will get the following message and you can transfer to the Results view:

Press OK.

The results view shows an extra column in the data grid which has a 1 if the data is an outlier or a 0 if it is
not:


If you scroll down, you will see that there are outliers indicated for Fort Worth, Fresno, Riverside and Tucson:

This is the statistical summary created when you select the Summary radio button at the bottom right:

It shows the values of the first quartile (Q1) and third quartile (Q3) for the SIZE variable. It also shows the
lower and upper fence values when the fence coefficient is set to 1.5. There are 4 outliers detected.

Nearest Neighbour Algorithm

The Nearest Neighbour method is another way to detect outliers. It is based on the concept of a local
density, where locality is given by nearest neighbours.
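To give a feel for the idea, here is a minimal sketch of one common nearest-neighbour approach. It scores each point by its mean distance to its k nearest neighbours (a large mean distance suggests a low local density) and flags the highest-scoring points. This is an assumption about the general technique, not a description of what the Nearest Neighbour Outlier component does internally. The function name, the parameters k and n_outliers, and the example points are all hypothetical.

```python
import math

def knn_outlier_flags(points, k=3, n_outliers=1):
    """Flag the n_outliers points with the largest mean distance to their k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    # Rank point indices by score, most isolated first.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [1 if i in flagged else 0 for i in range(len(points))]

# Invented 2-D points: four close together and one far away.
example = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(knn_outlier_flags(example, k=2, n_outliers=1))
```

In practice the measures would usually be put on a comparable scale before computing distances, since variables such as TURNOVER and STAFF are recorded in different units.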

Return to the Designer view and select New Analysis by clicking the +.


Select Nearest Neighbour Outlier in the Outliers group in Algorithms:

Double click on the Nearest Neighbour Outlier algorithm and it will be added to the analysis editor, connected
to the data component:

From the contextual menu of the Nearest Neighbour Outlier algorithm component choose Configure Settings.
Select the following parameters and press Done:


Run the analysis by clicking on the green arrow button on the toolbar and switch to results view:

The Data Grid has an extra column that indicates if the value is an outlier (a 1 flag) or not (a 0 flag).

Go to the Summary radio button on the bottom right.

The Summary chart shows the following information:

It shows that 4 outliers have been detected.

This completes the introductory exercise to Week 3 Unit 2 Detecting Anomalies.


FURTHER READING
Detailed instructions and a wide variety of other functionality for analyzing data can be found in the user
guide pa31_expert_user_en.pdf.


Coding Samples
Any software coding or code lines/strings (Code) provided in this documentation are only examples and are
not intended for use in a production system environment. The Code is only intended to better explain and
visualize the syntax and phrasing rules for certain SAP coding. SAP does not warrant the correctness or
completeness of the Code provided herein and SAP shall not be liable for errors or damages caused by use of
the Code, except where such damages were caused by SAP with intent or with gross negligence.

www.sap.com

© 2016 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form
or for any purpose without the express permission of SAP SE or an SAP
affiliate company.
SAP and other SAP products and services mentioned herein as well as their
respective logos are trademarks or registered trademarks of SAP SE (or an
SAP affiliate company) in Germany and other countries. Please see
http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for
additional trademark information and notices. Some software products
marketed by SAP SE and its distributors contain proprietary software
components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for
informational purposes only, without representation or warranty of any kind,
and SAP SE or its affiliated companies shall not be liable for errors or
omissions with respect to the materials. The only warranties for SAP SE or
SAP affiliate company products and services are those that are set forth in
the express warranty statements accompanying such products and services,
if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue
any course of business outlined in this document or any related presentation,
or to develop or release any functionality mentioned therein. This document,
or any related presentation, and SAP SE's or its affiliated companies'
strategy and possible future developments, products, and/or platform
directions and functionality are all subject to change and may be changed by
SAP SE or its affiliated companies at any time for any reason without notice.
The information in this document is not a commitment, promise, or legal
obligation to deliver any material, code, or functionality. All forward-looking
statements are subject to various risks and uncertainties that could cause
actual results to differ materially from expectations. Readers are cautioned
not to place undue reliance on these forward-looking statements, which
speak only as of their dates, and they should not be relied upon in making
purchasing decisions.
