Treasury Analytics 2
Contents
1 Introduction to Data Analytics and Big Data .................................................................................................... 4
2 Forms of data ................................................................................................................................................. 6
2.1 Structured ................................................................................................................................................. 6
2.2 Unstructured ............................................................................................................................................. 6
2.3 Semi-structured ......................................................................................................................................... 6
3 Categories of data analytics ........................................................................................................................... 7
3.1 Descriptive analytics .................................................................................................................................. 7
3.2 Predictive analytics .................................................................................................................................... 7
3.3 Prescriptive analytics ................................................................................................................................. 8
4 Leveraging artificial intelligence (AI) and machine learning (ML) for data analytics .......................................... 9
4.1 Artificial intelligence .................................................................................................................................. 9
4.2 Machine learning ....................................................................................................................................... 9
5 Application of data analytics in finance ......................................................................................................... 12
5.1 Risk management .................................................................................................................................... 12
5.2 Fraud detection ....................................................................................................................................... 12
5.3 Consumer analytics .................................................................................................................................. 12
5.4 Algorithmic trading .................................................................................................................................. 12
6 Design overview for implementing analytics .................................................................................................. 13
7 The six stages to successfully perform data analysis ..................................................................................... 14
8 Fundamentals of data visualization ............................................................................................................... 20
9 Storytelling with data ................................................................................................................................... 20
10 Case study – journey to analytics and visualization
1 Introduction to Data Analytics and Big Data
Data, in the context of analytics, is any information that can be used to make interpretations. It is information
that can be used as a basis of reasoning, discussion or calculation. Data can be numbers, images, characters
or text that can be collected or examined.
The act of closely examining this data to draw conclusions or inferences is called data analytics. It is the
process of collecting, cleaning, transforming and modelling data to derive meaningful information. It refers to
the techniques of analysing raw data, building models on that data and interpreting the output of the data to
create valuable insights. However, data analytics is a broad term and has evolved gradually as a concept over
the years.
For decades, data stored in Excel sheets was cleaned, wrangled and analysed manually, with no application of analytics. Data was locked away and primarily managed by the IT team. If a business user wanted access to the business intelligence system or a database, a request was sent to the IT team, which would extract the relevant data and share it with the user. Although data was fully governed, the information contained in the reports was outdated and unstructured, and thus hardly useful for insightful interpretation and analysis. More recently, the trend has been towards automation and real-time applied analytics. Business users now have access to the company’s datasets, which allows the application of real-time data.
FIGURE 1.1: The Evolution Towards Digitization
FIGURE 1.2: The 3 V’s of Big Data
Volume - This refers to the size or the amount of data that is generated.
Velocity - The term 'velocity' refers to the speed at which data is generated. It deals with the speed at which
data flows in from sources such as business processes, application logs, networks, social media sites, sensors
and mobile devices. The flow of such data is massive and continuous.
Variety - Variety refers to the forms of data, such as structured and unstructured. This variety poses issues for storage, mining and analysis, and is therefore a key defining property of big data.
2 Forms of data
There are three forms of data, namely structured, unstructured and semi-structured. They are explained in
detail as follows.
2.1 Structured
Any data with a fixed format is called 'structured' data. Storing and processing this data is easy because the structure is standardised. Such data has fixed parameters, such as length, file type, format and field type. It can usually be stored in databases and queried using languages such as Structured Query Language (SQL).
The following Table 1.1 (illustrative only) presents an example of structured data.
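The idea of querying structured data with SQL can be sketched using Python's built-in sqlite3 module. The 'trades' table, its columns and the figures below are invented for illustration:

```python
import sqlite3

# Build a small, hypothetical 'trades' table: every row follows the same
# fixed schema, which is what makes the data 'structured'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_id INTEGER, currency TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, "INR", 5000.0), (2, "USD", 1200.5), (3, "INR", 750.25)],
)

# Because the format is fixed, SQL can filter and aggregate the rows directly.
total_inr = conn.execute(
    "SELECT SUM(amount) FROM trades WHERE currency = 'INR'"
).fetchone()[0]
print(total_inr)  # 5750.25
```

Unstructured data, by contrast, offers no fixed schema for such a query to rely on.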
2.2 Unstructured
Any data with an unknown form or structure is classified as unstructured data. Unstructured data poses multiple processing challenges because it is not in a particular format and no logic can be applied to it directly. A typical example is a data source containing a combination of simple text files, images and videos, such as a Google search result. This data is in its raw form.
2.3 Semi-structured
Semi-structured data is essentially structured data that contains unstructured elements. A common example is email. The layout of the template is structured: for example, the ‘From’ and ‘To’ participants, subject, date, and so on. However, the content of the email varies and is unstructured.
A few examples of structured versus unstructured data can be viewed below:
3 Categories of data analytics
Data analytics can be broadly classified into three categories, namely descriptive, predictive and prescriptive. As Figure 1.3 suggests, these categories differ in level of difficulty and help us answer different questions. They are explained in detail in the following sections.
FIGURE 1.3: Categories of Data Analytics
3.3 Prescriptive analytics
This category of analytics applies mathematics and computational science to suggest the best course of action based on the output of descriptive and predictive analytics. While descriptive analytics explains what has happened and predictive analytics estimates what is likely to happen, prescriptive analytics goes a step further and recommends what should be done about it. This is the most advanced form of analytics and enables smart decision-making.
Example: Suppose the CEO of an airline company wants to maximise the company’s profits. Prescriptive analytics can facilitate this through an algorithm that automatically adjusts ticket prices based on factors such as weather, fuel prices and customer demand. It can increase ticket prices when too many customers are looking at flight prices, and conversely, reduce prices when the turnout is low. This can be a fully automated system: the CEO is not required to monitor ticket sales and market conditions, and a computer program can do all of this, and more, at a faster pace.
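A minimal sketch of such a pricing rule is shown below. The thresholds, multipliers and parameter names are all hypothetical; a production system would learn them from demand models rather than hard-code them:

```python
# A sketch of a prescriptive pricing rule for the airline example. The
# thresholds, multipliers and parameter names are hypothetical; a real system
# would derive them from data rather than hard-code them.
def adjust_fare(base_fare: float, viewers_last_hour: int, fuel_price: float) -> float:
    fare = base_fare
    if viewers_last_hour > 500:    # many customers looking at this flight
        fare *= 1.10
    elif viewers_last_hour < 50:   # low turnout
        fare *= 0.90
    if fuel_price > 100.0:         # pass on unusually high fuel costs
        fare *= 1.05
    return round(fare, 2)

print(adjust_fare(200.0, 800, 95.0))  # 220.0 (high demand raises the fare)
print(adjust_fare(200.0, 20, 95.0))   # 180.0 (low demand lowers it)
```

The prescriptive element is that the output is an action (a new fare), not merely a forecast.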
4 Leveraging artificial intelligence (AI) and machine learning (ML) for
data analytics
A slew of emerging technologies is shaping our lives, arguably at a very fast pace. Enabled by computing power
and access to large data sets, machines are already doing a better job than humans in several areas. We see it
everywhere around us, from self-driving cars to recommendations in our shopping carts. One of the key
technologies that has enabled this process is AI.
Let us learn using a detailed case study. Illustratively, an autonomous car approaches a stop sign and must quickly decide how to stop. Various functions must be modelled within the car’s system to ensure that it stops at the sign. Some of the components the car would consider are as follows:
► Recognising and identifying the stop sign
► Interpreting location through GPS
► Evaluating surrounding objects
► Controlling and calibrating the speed of a braking mechanism
A network of sensors feeds data to an on-board AI system that in turn controls multiple mechanical systems. Each of these components plays a critical role in the successful operation of the whole, and each can also represent a single point of failure in the system’s reliability and performance. Trusting the car to fulfil its purpose therefore requires that we collectively trust every component of the system in its individual design and performance. Trust, in other words, is achieved, sustained or lost at the system level.
Let us see, through an example, what a machine learns and how it learns from data.
Suppose Paul loves listening to new songs. He either likes a song or dislikes it. Paul decides this on the basis of
the song’s tempo, genre, intensity and gender of voice. For simplicity, let us consider only two parameters,
namely tempo and intensity.
Assume that the tempo of the song is mapped on the X axis (ranging from relaxed to fast) and intensity is
mapped on the Y axis (ranging from light to soaring). Let us assume that Paul likes songs with a fast pace and
soaring intensity and dislikes songs with a relaxed pace and light intensity.
FIGURE 1.4: Illustrative for Machine Learning 1
Let us assume that Paul listens to a new song – Song A. This song has a fast pace and a soaring intensity. This
new song, therefore, would lie on the graph in the blue section. By looking at the data and Paul’s past choices,
the machine was able to guess correctly that Paul likes Song A based on the selected parameters.
FIGURE 1.5: Illustrative for Machine Learning 2
Now, let us consider another song, Song B. This song has a medium tempo and medium intensity, and therefore lies somewhere between the blue and orange sections. This confuses the system: it cannot readily determine whether Paul will like the song.
FIGURE 1.6: Illustrative for Machine Learning 3
The role of ML is best demonstrated here. If we draw a circle around Song B, we can see that there are three votes for Paul liking the song and only two votes against. Going by Paul’s previous choices, the machine will therefore categorise Song B as a song Paul likes. This is a basic ML algorithm, known as the k-nearest neighbours algorithm, and it is one of many algorithms that can be used for ML.
The machine in the example learned from data, built a predictive model around it and generated an outcome without human intervention. It seems easy when there are two parameters, but machines can predict outcomes even when there are many parameters and vast amounts of incoming data. In short, the more the data, the better the model and the higher the accuracy.
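Paul's example can be sketched in a few lines of Python. The (tempo, intensity) coordinates below are invented for illustration; a majority vote among the k nearest past songs is the whole algorithm:

```python
from collections import Counter

# Paul's past verdicts as (tempo, intensity) points on a 0-10 scale.
# The coordinates are invented for illustration.
history = [
    ((8, 9), "like"), ((9, 8), "like"), ((7, 9), "like"),
    ((2, 1), "dislike"), ((1, 3), "dislike"), ((3, 2), "dislike"),
]

def knn_predict(point, data, k):
    # Rank past songs by squared Euclidean distance to the new song,
    # then take a majority vote among the k nearest ones.
    dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Song A sits squarely among the liked songs.
print(knn_predict((9, 9), history, k=3))  # like
# Song B sits between the two groups; the neighbours' vote decides.
print(knn_predict((5, 5), history, k=5))  # like (3 votes to 2)
```

With this toy data, Song B's five nearest neighbours split three to two in favour of "like", mirroring the vote described above.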
The basics of ML comprise learning from the environment and applying that learning to decision-making. Several categories of ML algorithms make it possible to do this effectively. These are as follows:
1. Supervised learning: In this model, the machine is trained using data that is ‘labelled’. Labelled data is
information that is already tagged with a correct answer. For example, let us consider that you have a
₹5 coin which is gold in colour and weighs 1 gm. In this example, the colour and weight of the coin are
labels. So, every time a new coin is added to the data set that weighs 1 gm and is gold in colour, the
machine would tag it as a ₹5 coin because it has been trained to do so. This is called supervised learning.
The k-nearest neighbours example in the previous section is also a simple supervised ML model.
2. Unsupervised learning: In this model, the algorithm is such that it is not trained for an output. Instead,
you need to allow the model to work on its own to discover information. It mainly deals with unlabelled
data.
3. Reinforcement learning: It is a type of dynamic programming that trains algorithms using a system of
reward and punishment. Here, the system learns by interacting with its environment. The system
receives rewards for performing correctly and penalties for performing incorrectly. The system learns
without intervention from a human by maximising its rewards and minimising its penalties.
The system here can be a self-driving car or a chess-playing program. When the system interacts with its
environment, it receives a reward state depending on how it performs such as driving to a destination safely or
winning a game. Conversely, the system receives a penalty for performing incorrectly such as going off the
road or being checkmated. Over time, the agent makes decisions to maximise its rewards and minimise its
penalties by using dynamic programming. The advantage of this approach is that it allows the system to learn
without a programmer spelling out how it should perform the task.
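A minimal sketch of reinforcement learning is Q-learning on a toy "road" of five states: reaching the right end (the destination) earns a reward, sliding off the left end earns a penalty, and nobody tells the system which way to drive. All the numbers below are illustrative:

```python
import random

random.seed(0)

# Toy "road": states 0..4. Reaching state 4 (the destination) earns a reward;
# sliding back to state 0 (off the road) earns a penalty. Actions move one
# step left (-1) or right (+1). All numbers are illustrative.
n_states, actions = 5, (-1, 1)
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(500):
    s = 2  # start each episode in the middle of the road
    while 0 < s < n_states - 1:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = s + a
        reward = 1.0 if s2 == n_states - 1 else (-1.0 if s2 == 0 else 0.0)
        best_next = 0.0 if s2 in (0, n_states - 1) else max(Q[(s2, act)] for act in actions)
        # Core update: nudge Q towards the reward plus the discounted future value.
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s2

# The learned policy: in every interior state, the best action is "move right".
policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(1, n_states - 1)}
print(policy)  # {1: 1, 2: 1, 3: 1}
```

No one spelled out "drive right"; the policy emerges purely from rewards and penalties, which is the point of the reinforcement approach.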
6 Design overview for implementing analytics
Good design principles are critical when creating an environment to support data analytics. It must include
components that support storage, analytics, reporting and applications. The environment must include
considerations for software tools, hardware, infrastructure, software management and well-defined
application programming interfaces (APIs). In theory, a general analytical set-up is divided into two key
components, namely front-end and back-end.
FIGURE 1.7: Illustrative High-Level Technical Architecture for Analytics
1. Front-end - In computing, ‘front-end’ refers to the part of a tool that is visible to the end user. It may also be called the ‘client side’. It is usually the visual or input layer of the tool, for example, a web page where the user uploads the data to be analysed. HTML, CSS and JavaScript are some of the languages used for front-end development.
2. Back-end - The back-end is the analytical layer, or server side, of the application, where the data is stored, manipulated and analysed. Java, Python and Hadoop are examples of tools usually used for back-end development. In addition, database management is considered a core part of back-end development. All data uploaded by the user in the front-end is stored electronically in a secure space (usually known as servers).
Figure 1.7 illustrates the elements of the technical set-up for analytics. The base of this structure is raw/system data, which is first aggregated. All data is stored in a database, from which the relevant data is extracted to perform analytics. The final output of this analytical computation is presented through a dashboard visible to the end user. In this set-up, the data aggregation and logic/computation layers form part of the back-end engine, and the visualisation layer forms part of the front-end engine.
7 The six stages to successfully perform data analysis
The process of data analytics involves six key steps to facilitate successful results. These include everything
from data source identification to visually representing the final output. The following Figure 1.8 explains the
six stages.
FIGURE 1.8: The 6 Stages to Successfully Perform Data Analytics
Rahul decides to compare LTI’s stock performance against Infotech and Tata Consultancy Services (TCS). He chooses to compare the companies’ data for a period of one year.
2. Data collection: The data collection stage involves gathering information on the variables identified in the previous stage. The accuracy and integrity of the collected data are crucial, as they ensure that the outcome of the analysis is valid.
Data is collected from various sources, ranging from organisational databases to web pages, and consequently may not be structured appropriately for the analysis and may contain irrelevant information. Hence, the collected data must be processed and cleaned before it is analysed.
To analyse the data, Rahul requires it to be in a standard format. To maintain accuracy, he extracts the data from a single source wherever possible. At this stage, he chooses to collect his data from the Bombay Stock Exchange. In addition, he acquires LTI’s stock performance data from the company’s finance department.
3. Data processing: At this stage, the collected data is processed and organised for analysis. Based on the analytical tool to be used, the data must be appropriately structured. For example, the data may have to be placed into rows and columns in a spreadsheet table or imported into a statistical tool.
Rahul decides to use Excel for organising the data and Power BI for visualisation. In both cases, the data must first be prepared in Excel.
Rahul extracts his raw data in the form of Notepad files. The values are separated by commas and can easily be transformed into an Excel table.
As a first step in processing his data, he uploads the raw, unformatted data from Notepad into Excel. He then uses Excel’s Text-to-Column function to place the data in separate columns. Text-to-Column is performed by selecting the Data tab and then clicking the Text-to-Column function.
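The same comma-splitting that Text-to-Column performs can be sketched in Python with the standard csv module. The column names and figures below are invented for illustration:

```python
import csv
import io

# The raw Notepad export is comma-separated text; the csv module performs
# the same split that Excel's Text-to-Column function does. The column
# names and figures below are invented for illustration.
raw = io.StringIO(
    "Date,Open Price,Close Price,No. of Shares\n"
    "01-Apr-2019,1510.00,1525.35,48210\n"
    "02-Apr-2019,1526.00,1498.10,51770\n"
)

rows = list(csv.reader(raw))
header, data = rows[0], rows[1:]
print(header)      # ['Date', 'Open Price', 'Close Price', 'No. of Shares']
print(data[0][2])  # 1525.35 (the first day's closing price, as a string)
```

Each comma-delimited value lands in its own column, exactly as the Excel function would place it.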
FIGURE 1.11: Data Processing
4. Data cleaning: Data cleaning is the process of detecting and correcting problems such as incomplete data, duplicates, errors and missing values. Moreover, not all columns are relevant to Rahul’s analysis. He needs the month, closing price, number of shares, volume and total turnover of each company. He therefore removes all other columns by right-clicking on the selected column and choosing the Delete option. As displayed in the following Figure 1.12, the selected column ‘No of Trades’ is deleted.
In addition, the current data set does not have a unique identifier that accurately indicates which company the data pertains to. Rahul therefore adds a ‘Name of Company’ column, which completes the data set.
FIGURE 1.13: Adding Columns
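Rahul's clean-up steps (dropping an irrelevant column and adding a company identifier) can be sketched in plain Python. The column names loosely mirror the example, but the figures are invented:

```python
import csv
import io

# A stdlib sketch of the clean-up: keep only the relevant columns and add a
# 'Name of Company' identifier. Column names and figures are illustrative.
raw = io.StringIO(
    "Month,Close Price,No. of Shares,No of Trades,Total Turnover\n"
    "Apr-2019,1525.35,48210,3914,73545000\n"
)
keep = ["Month", "Close Price", "No. of Shares", "Total Turnover"]

cleaned = []
for row in csv.DictReader(raw):
    slim = {col: row[col] for col in keep}  # drops 'No of Trades'
    slim["Name of Company"] = "LTI"         # unique identifier for the data set
    cleaned.append(slim)

print(list(cleaned[0]))
# ['Month', 'Close Price', 'No. of Shares', 'Total Turnover', 'Name of Company']
```

The same two operations that take several clicks in Excel become a one-line filter and a one-line assignment here.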
5. Data modelling: This is the most important step in the whole process and involves the use of several
techniques to understand, interpret and derive conclusions based on the requirements. Statistical data
models such as correlation and regression analysis can be used to identify the relations among the data
variables. The analysis may require additional data cleaning or data collection, and hence, these
activities are iterative in nature.
Rahul decides to calculate correlation of the close prices of stocks to compare their relationship with
each other.
Note: Correlation is a measure of the strength of linear relationship between two variables. It is unit
free and ranges between −1 and +1. The closer it is to −1, the stronger the negative linear relationship.
The closer it is to +1, the stronger the positive linear relationship. The closer it is to 0, the weaker the
linear relationship.
FIGURE 1.14: Data Modelling
In Figure 1.14, we use the inbuilt CORREL(array1, array2) function to calculate the correlation coefficients between LTI and Infotech and between LTI and TCS. The two columns of closing prices of the two companies being compared are the arguments to this function.
Here, we can see that the closing prices of LTI and TCS yield a correlation coefficient of −0.214 and are thus negatively, or inversely, related to each other, whereas the closing prices of LTI and Infotech yield a correlation coefficient of 0.225 and are thus positively related to each other.
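Excel's CORREL can be reproduced in a few lines of Python, which makes the formula behind the coefficient explicit. The price series below are invented for illustration; Rahul's actual series came from the BSE extract:

```python
from math import sqrt

# Pearson correlation coefficient: covariance of the two series divided by
# the product of their standard deviations. Equivalent to Excel's CORREL.
def correl(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two invented closing-price series that mostly move in opposite directions.
series_a = [100, 102, 101, 105, 107]
series_b = [50, 49, 51, 48, 47]
print(round(correl(series_a, series_b), 3))      # -0.922 (inverse relationship)
print(round(correl([1, 2, 3], [2, 4, 6]), 3))    # 1.0 (perfect positive relationship)
```

The result is unit-free and always lies between −1 and +1, as the note above describes.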
6. Data visualisation: The results of the data analysis must be reported in a format understandable by
the senior management to support their decision-making and further action. Their feedback may be
used for additional analysis. Data analysts can choose data visualisation techniques such as tables and
charts to facilitate clear and efficient communication of the message to the users.
Therefore, for visualisation, Rahul uploads the Excel files into Power BI. He uses a line chart and a clustered column chart to visualise his data.
FIGURE 1.15: Line Chart Visualization
In most periods, when the price of LTI increases, the price of TCS decreases, and vice versa. This is consistent with the negative correlation coefficient between the two stock prices; because the tendency is weak, the value is closer to zero than to −1.
Similarly, in most periods, when LTI’s stock price decreases, Infotech’s stock price also decreases, and vice versa. This is consistent with the positive correlation coefficient between the two stock prices, which is likewise closer to zero than to +1.
The visualisation in Figure 1.15 can also be changed to a bar chart, as in Figure 1.16. However, the readability of the bar chart is lower, and it is difficult to ascertain any pattern as clearly as in the line graph. A line graph is better at demonstrating changes and trends over time. The choice of visualisation type is therefore important for drawing appropriate conclusions from the data.
Figure 1.16: Clustered Column Chart Visualization
Through the six-step process, meaningful patterns, trends and relationships, which are not easy to determine
using raw data, can be discovered.
8. Variety is an important factor that keeps viewers engaged and interested in your data. Find ways to visualise your data using different and interesting design elements to avoid repetition.
9. A unified theme ensures that every part of your design is consistent and follows a standard. This should happen naturally if you have taken care of the aforementioned design principles.
Fact check:
Which graph would you use for the following situation?
1. Comparative trends over a period (e.g., stock prices of Company A vs Company B over 1 year) – line chart
or bar graph?
Ans: Line chart, because it shows how the two prices move over time and lets you see the periods in which Company A was priced higher than Company B, and vice versa.
2. Sales by region (e.g., sales of a motor company in India) – pie chart or map?
Ans: Map chart (with bubbles), because this enables you to identify your geographical presence, and the bubble size shows where sales were higher.
Our offices

Ahmedabad
2nd floor, Shivalik Ishaan
Near C.N. Vidhyalaya
Ambawadi
Ahmedabad - 380 015
Tel: + 91 79 6608 3800

Bengaluru
6th, 12th & 13th floor
“UB City”, Canberra Block
No.24 Vittal Mallya Road
Bengaluru - 560 001
Tel: + 91 80 4027 5000
+ 91 80 6727 5000
+ 91 80 2224 0696

Ground Floor, ‘A’ wing
Divyasree Chambers
# 11, O’Shaughnessy Road
Langford Gardens
Bengaluru - 560 025
Tel: + 91 80 6727 5000

Chandigarh
1st Floor, SCO: 166-167
Sector 9-C, Madhya Marg
Chandigarh - 160 009
Tel: + 91 172 331 7800

Chennai
Tidel Park, 6th & 7th Floor
A Block, No.4, Rajiv Gandhi Salai
Taramani, Chennai - 600 113
Tel: + 91 44 6654 8100

Delhi NCR
Golf View Corporate Tower B
Sector 42, Sector Road
Gurgaon - 122 002
Tel: + 91 124 443 4000

3rd & 6th Floor, Worldmark-1
IGI Airport Hospitality District
Aerocity, New Delhi - 110 037
Tel: + 91 11 4731 8000

Hyderabad
Oval Office, 18, iLabs Centre
Hitech City, Madhapur
Hyderabad - 500 081
Tel: + 91 40 6736 2000

Jamshedpur
1st Floor, Shantiniketan Building
Holding No. 1, SB Shop Area
Bistupur, Jamshedpur – 831 001
Tel: + 91 657 663 1000

Kochi
9th Floor, ABAD Nucleus
NH-49, Maradu PO
Kochi - 682 304
Tel: + 91 484 304 4000

Kolkata
22 Camac Street
3rd Floor, Block ‘C’
Kolkata - 700 016
Tel: + 91 33 6615 3400

Mumbai
14th Floor, The Ruby
29 Senapati Bapat Marg
Dadar (W), Mumbai - 400 028
Tel: + 91 22 6192 0000

5th Floor, Block B-2
Nirlon Knowledge Park
Off. Western Express Highway
Goregaon (E)
Mumbai - 400 063
Tel: + 91 22 6192 0000

Pune
C-401, 4th floor
Panchshil Tech Park
Yerwada
(Near Don Bosco School)
Pune - 411 006
Tel: + 91 20 4912 6000
About EY
EY is a global leader in assurance, tax,
transaction and advisory services. The insights
and quality services we deliver help build trust
and confidence in the capital markets and in
economies the world over. We develop
outstanding leaders who team to deliver on our
promises to all of our stakeholders. In so doing,
we play a critical role in building a better working
world for our people, for our clients and for
our communities.
EY refers to the global organization, and may
refer to one or more, of the member firms of
Ernst & Young Global Limited, each of which is a
separate legal entity. Ernst & Young Global
Limited, a UK company limited by guarantee,
does not provide services to clients. For more
information about our organization, please visit
ey.com.
Ernst & Young LLP is one of the Indian client serving member
firms of EYGM Limited. For more information about our
organization, please visit www.ey.com/in.
Ernst & Young LLP is a Limited Liability Partnership, registered
under the Limited Liability Partnership Act, 2008 in India, having
its registered office at 22 Camac Street, 3rd Floor, Block C,
Kolkata - 700016
© 2019 Ernst & Young LLP. Published in India.
All Rights Reserved.
EYINXXXXXXX
ED None