
Principle Component Analysis of

NBA Playoffs
A project report submitted to GITAM Institute of Management, GITAM
University in partial fulfillment for the award of degree of

BACHELOR OF BUSINESS ADMINISTRATION


(BUSINESS ANALYTICS)

Submitted by
Y. Bhanu Prakash,
Regd.No: 1214415127

Under the guidance of


Dr. D. Vijaya Geeta,
Associate Professor

GITAM INSTITUTE OF MANAGEMENT


GITAM UNIVERSITY
VISAKHAPATNAM
2015-2018
Declaration By Student
I, Y. Bhanu Prakash, Regd.No: 1214415127, hereby declare that the project titled “Principle
Component Analysis of NBA Playoffs”, submitted to GITAM Institute of Management,
GITAM University, is an original work done by me and is not being submitted to any other
University for the award of any degree or diploma.

Y. Bhanu Prakash
Regd.No:1214415127
Certificate By Guide
This is to certify that the project titled “Principle Component Analysis of NBA Playoffs”
is a project work undertaken by Y. Bhanu Prakash, Regd.No: 1214415127, under my
guidance.

Place: Visakhapatnam Dr. D. Vijaya Geeta


Date: Associate Professor
ACKNOWLEDGEMENT
It is a genuine pleasure to express my deep sense of thanks and gratitude to my principal,
Prof. P. Sheela, GITAM Institute of Management, GITAM University, Visakhapatnam,
Andhra Pradesh, for her continuous support and guidance throughout my project. Her
dedication, keen interest and, above all, her overwhelming willingness to help her students
have been mainly responsible for the completion of my work. Her timely advice and
meticulous scrutiny have helped me to a very great extent to
accomplish my task.
I also take this moment to thank my guide, Dr. D. Vijaya Geeta, Associate Professor,
GITAM Institute of Management, GITAM University, Visakhapatnam, Andhra Pradesh.
Her prompt inspiration, timely and kind suggestions, enthusiasm and dynamism
have enabled me to complete my project. I perceive this opportunity as a big milestone
in my career development. I will strive to use the skills and knowledge gained in the best
possible way, and I will continue to work on improving them in order to attain my desired
career objectives. I hope to continue cooperating with all of you in the future.

Y. Bhanu Prakash
CONTENTS

CHAPTER 1:
INTRODUCTION
o DEFINING: ANALYTICS
o DEFINING: SPORTS ANALYTICS
o PREDICTIVE ANALYTICS

CHAPTER 2:
METHODOLOGY
o RESEARCH PROBLEM
o NEED OF STUDY
o OBJECTIVES
o SCOPE OF THE STUDY
o LIMITATIONS
o RESEARCH DESIGN

CHAPTER 3:
ANALYSIS

CHAPTER 4:
IMPLICATIONS AND CONCLUSION
o IMPLICATIONS
o CONCLUSION

CHAPTER 5:
BIBLIOGRAPHY AND ANNEXURE
o BIBLIOGRAPHY
o ANNEXURE
LIST OF TABLES
Table No. Title
1 Number of times teams reached playoffs based on wins

LIST OF FIGURES
Chart No. Title
1 Correlation between Wins and Points Difference Attributes
CHAPTER – I
INTRODUCTION

Defining: Analytics
Analytics is generally defined as finding patterns in data and using those
patterns to answer questions. We can also say,

Analytics is the discovery and communication of meaningful patterns in data.

The important thing to note at this point is that analytics is a process. In fact, it
is an interdisciplinary process which usually brings together mathematics,
statistics, computer science, predictive methods, data visualization, and other
fields of study.

It is also important to note that analytics relies on the presence of data. This
ultimately differentiates the term from “analysis” and unfortunately creates
confusion when trying to decide if what you are doing is analytics or “data
analysis”. For our intents and purposes, the two are essentially the same
process. However, the field has been dubbed “sports analytics” and not “sports
data analysis”, so we will accept the name and move on.

Before continuing to the sports side of things, it should be noted that the term
“analytics” may also be used to describe the results of this process. For
example, “the analytics of our last project suggest that…” is a perfectly valid
sentence. However, in my experience it is best to replace
“analytics” with “analytical findings” or “results” here and reserve the term
“analytics” for the process through which these results are obtained.

Defining: Sports Analytics

Sports analytics is essentially the analytics process, as described above, applied
to sports.

It is the process of using sports-related data (anything from player statistics to
game day weather) to find meaningful patterns (strong correlations, hidden
trends, etc.) and communicate those patterns (using graphs, charts, essays, etc.)
to help make decisions.

In his book, Benjamin Alamar presents a helpful graphic to illustrate the overall
sports analytics process. In his framework, sports analytics consists of four
elements: data management, analytic models, information systems, and the
decision maker.

Alamar’s definition of each element:

o Data management: This includes any and all processes associated with
acquiring, verifying and storing data. The data management element is
ultimately about facilitating the modelling and information-extraction
elements. As mentioned earlier, you can’t have analytics without data.
o Analytic models: This element is essentially the process of applying
statistical tools to data. The use of models to “forecast” player or team
performance is often the most popular goal, but it is by no means a necessity.
The models may or may not offer insight into the future. It is most accurate
to say that they are concerned with using mathematics and statistics to
describe the data.
o Information systems: Unlike the previous two elements, the information
systems are slightly more abstract. The purpose of these systems is to extract
and present the data and/or model results as effectively and efficiently as
possible. A scouting report is a good basic example.
o Decision makers: The end goal of analytics is to extract relevant and
insightful information from the data and present it to the decision makers. In
modern sports these tend to be the coaching staff or management; however,
players themselves may also benefit from the whole process.

What is the history of the field?

Although many professionals believe that modern model-heavy sports analytics
is at a point of exciting growth, the field of sports analytics is by no means new.
Technically speaking, any time anyone has ever used data to make a decision
related to sport, they were conducting analytics. However, the general
consensus is that sports analytics began sometime in the 19th century with
baseball. The data (basic statistics such as hits and pitches) was collected with
good old pencil and paper. It was then used to create scouting reports which a
coach or manager would use to make decisions about their team.

Referring back to Alamar’s graphic of the entire process, this type of analytics
would lack a modelling element but still follow a logical flow toward the
decision maker. The decisions to be made in 19th-century baseball were perhaps
fewer and less detailed, but not necessarily easier.

What is the current state of the field?

We now have two things which we didn’t have in the 19th-century days of baseball
analytics. The first of these is more sports nerds. Sports have grown in
popularity and fans have become much more demanding of information. More
often than not sports arguments include statistics, even if they are about whether
or not “number of rings” is a statistic. Everyone and their parents have a fantasy
team and compulsively refresh Twitter in hopes of finding out how long Derrick
Rose (a basketball player) will be out for this season.

The second thing we now have, which in some ways overlaps with the first, is
more data. The recent advances in technology have affected just about every
aspect of life, and sports is no different. The following things have all
contributed to the recent growth in the field:

o The improvements in computing power and digital memory
o The increased quantification of our world (aka the ultimate buzzword: “big
data”)
o The advances made in solving complex engineering problems like vision and
inference

Modern sports analytics uses database management systems and things like
SQL where pen and paper were once the norm. Analytical models from
machine learning and data mining are now used to help sort through the data
and find patterns. Models are now updated in real time and together with
innovative visualization techniques are the new breed of information system.

With more data, and more people interested in sports analytics, organizations
are doing their best to gain every possible advantage in every aspect of sports
from training routines to player recruitment and valuation.

What does the future hold?

The field is growing.

More and more sports organizations are hiring analytics “teams” and
“departments”, usually composed of professionals with STEM (science,
technology, engineering, mathematics) degrees. The media appears to be
following suit by recruiting data science professionals to find and visualize the
unique trends that their viewers want to see. There is no reason to believe that
these new opportunities will stop popping up or disappear altogether.

The field has also not gone unnoticed in academia. If conferences like the MIT
Sloan Sports Analytics Conference, journals like the Journal of Quantitative
Analysis in Sports, and new courses offered by top universities are any
indication, institutions have noticed the growth in sports
data and are interested in conducting research in the field.

Sports analytics is sometimes discounted as just an invention of weird metrics.
But it is much more than that. From engineering solutions in data,
like SportVU, to innovative information systems, like shot charts, the future of
the field is in ultimately working to advance each step of the whole process.

Predictive Analytics

Predictive analytics is the use of data, statistical algorithms and machine
learning techniques to identify the likelihood of future outcomes based on
historical data. The goal is to go beyond knowing what has happened to
providing a best assessment of what will happen in the future.

History:

Though predictive analytics has been around for decades, it's a technology
whose time has come. More and more organizations are turning to predictive
analytics to increase their bottom line and competitive advantage. Why now?

o Growing volumes and types of data, and more interest in using data to produce
valuable insights.
o Faster, cheaper computers.
o Easier-to-use software.
o Tougher economic conditions and a need for competitive differentiation.

Importance:

Organizations are turning to predictive analytics to help solve
difficult problems and uncover new opportunities. Common uses include:

Detecting fraud. Combining multiple analytics methods can improve pattern
detection and prevent criminal behaviour. As cybersecurity becomes a growing
concern, high-performance behavioural analytics examines all actions on
a network in real time to spot abnormalities that may indicate fraud, zero-day
vulnerabilities and advanced persistent threats.

Optimizing marketing campaigns. Predictive analytics are used to determine
customer responses or purchases, as well as promote cross-sell opportunities.
Predictive models help businesses attract, retain and grow their most profitable
customers.

Improving operations. Many companies use predictive models to forecast
inventory and manage resources. Airlines use predictive analytics to set
ticket prices. Hotels try to predict the number of guests for any given night to
maximize occupancy and increase revenue. Predictive analytics enables
organizations to function more efficiently.

Reducing risk. Credit scores are used to assess a buyer’s likelihood of default
for purchases and are a well-known example of predictive analytics. A credit
score is a number generated by a predictive model that incorporates all data
relevant to a person’s creditworthiness. Other risk-related uses include
insurance claims and collections.

Who's using it?

Any industry can use predictive analytics to reduce risks, optimize operations
and increase revenue. Here are a few examples.

Banking & Financial Services: The financial industry, with huge amounts of
data and money at stake, has long embraced predictive analytics to detect and
reduce fraud, measure credit risk, maximize cross-sell/up-sell opportunities and
retain valuable customers. Commonwealth Bank uses analytics to predict the
likelihood of fraud activity for any given transaction before it is authorized –
within 40 milliseconds of the transaction initiation.

Retail: Since the now infamous study that showed men who buy diapers often
buy beer at the same time, retailers everywhere are using predictive analytics to
determine which products to stock, the effectiveness of promotional events and
which offers are most appropriate for consumers. Staples analyses consumer
behaviour to provide a complete picture of their customers, and realized a 137
percent ROI.

Oil, Gas & Utilities: Whether it is predicting equipment failures and future
resource needs, mitigating safety and reliability risks, or improving overall
performance, the energy industry has embraced predictive analytics with vigour.
Salt River Project is the second-largest public power utility in the US and one of
Arizona's largest water suppliers. Analyses of machine sensor data predict
when power-generating turbines need maintenance.

Governments & the Public Sector: Governments have been key players in the
advancement of computer technologies. The US Census Bureau has been
analysing data to understand population trends for decades. Governments now
use predictive analytics like many other industries – to improve service and
performance; detect and prevent fraud; and better understand consumer
behaviour. They also use predictive analytics to enhance cybersecurity.

Health Insurance: In addition to detecting claims fraud, the health insurance
industry is taking steps to identify patients most at risk of chronic disease and
find what interventions are best. Express Scripts, a large pharmacy benefits
company, uses analytics to identify those not adhering to prescribed treatments,
resulting in a savings of $1,500 to $9,000 per patient.

Manufacturing: For manufacturers it's very important to identify factors
leading to reduced quality and production failures, as well as to optimize parts,
service resources and distribution. Lenovo is just one manufacturer that has
used predictive analytics to better understand warranty claims – an initiative
that led to a 10 to 15 percent reduction in warranty costs.

How It Works

Predictive models use known results to develop (or train) a
model that can be used to predict values for different or new data. Modelling
provides results in the form of predictions that represent a probability of the
target variable (for example, revenue) based on estimated significance from a
set of input variables.

There are two types of predictive models. Classification models predict class
membership. For instance, you try to classify whether someone is likely to
leave, whether he will respond to a solicitation, whether he’s a good or bad
credit risk, etc. Usually, the model results are in the form of 0 or 1, with 1 being
the event you are targeting. Regression models predict a number – for example,
how much revenue a customer will generate over the next year or the number of
months before a component will fail on a machine.

Three of the most widely used predictive modelling techniques are decision
trees, regression and neural networks.

Decision trees are classification models that partition data into subsets based on
categories of input variables. This helps you understand someone's path of
decisions. A decision tree looks like a tree with each branch representing a
choice between a number of alternatives, and each leaf representing a
classification or decision. This model looks at the data and tries to find the one
variable that splits the data into logical groups that are the most different.

Decision trees are popular because they are easy to understand and interpret.
They also handle missing values well and are useful for preliminary variable
selection. So, if you have a lot of missing values or want a quick and easily
interpretable answer, you can start with a tree.

Regression (linear and logistic) is one of the most popular methods in statistics.
Regression analysis estimates relationships among variables. Intended for
continuous data that can be assumed to follow a normal distribution, it finds key
patterns in large data sets and is often used to determine how much specific
factors, such as the price, influence the movement of an asset. With simple linear
regression, one independent variable is used to explain and/or predict the
outcome of a dependent variable. Multiple regression uses two or more
independent variables to predict the outcome. With logistic regression,
unknown values of a discrete variable are predicted based on known values of
other variables. The response variable is categorical, meaning it can assume
only a limited number of values. With binary logistic regression, a response
variable has only two values such as 0 or 1. In multinomial logistic regression, a
response variable can have several levels, such as low, medium and high, or 1, 2
and 3.
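
The model forms described above can be written compactly. The following is a standard textbook formulation rather than anything taken from this report; the symbols (β for the coefficients, ε for the error term, p for the probability that the response equals 1) are notation assumed here for illustration.

```latex
\begin{align*}
\text{Simple linear regression:} \quad & y = \beta_0 + \beta_1 x + \varepsilon \\
\text{Multiple regression:} \quad & y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon \\
\text{Binary logistic regression:} \quad & \log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
  \qquad p = P(y = 1 \mid x_1, \dots, x_k)
\end{align*}
```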

Neural networks are sophisticated techniques capable of modelling extremely
complex relationships. They’re popular because they’re powerful and flexible.
The power comes in their ability to handle nonlinear relationships in data,
which is increasingly common as we collect more data. They are often used to
confirm findings from simple techniques like regression and decision trees.
They work well when no mathematical formula is known that relates inputs to
outputs, prediction is more important than explanation or there is a lot of
training data. Artificial neural networks were originally developed by
researchers who were trying to mimic the neurophysiology of the human brain.

Other Popular Techniques:

1. Bayesian analysis
2. Ensemble models
3. Gradient boosting
4. Incremental response
5. K-nearest neighbor
6. Memory-based reasoning
7. Partial least squares
8. Principal component analysis
9. Support vector machine
10. Time series data mining

CHAPTER – II
METHODOLOGY

Research Problem: Analysing the stats of teams that reached the playoffs in
previous seasons and predicting the attributes that a team should focus on to
qualify for the playoffs in the next season.

Need of the Study: Many sports franchises are investing huge amounts in
buying the most effective teams. With the huge availability of data generated
from each match, the dream of the managers can be brought to life by analysing
this data.

Objectives: The main objective is to build a predictive model to identify how
significant each attribute is in helping the team reach the playoffs.

Scope of Study: Selecting the best attributes that a team should possess to
make sure that it reaches the playoffs.

Limitations:
o As with all statistical data, there might be errors in the recording of data
that may skew the analysis.
o Since it is difficult to get present data, we are using old data. Using old
data has the advantage that we can easily verify the
correctness of the prediction.

Research Design: The train dataset contains 835 observations of 20
attributes. It is the data of National Basketball Association (NBA) teams in all
seasons from 1980 to 2011, except the teams which played fewer than 82 games.

The test dataset contains 28 observations of 20 attributes. It is the data of
National Basketball Association (NBA) teams in the season 2013.

The dataset has 20 different attributes and the description of each is given
below:
1. SeasonEnd – The year in which the particular season ended.
2. Team – Name of the team.
3. Playoffs – Whether the team reached the playoffs, the knock-out stage of the
tournament (1 if it did, 0 if it did not).
4. W – Number of wins of a team in a season.
5. PTS – Points scored by a team in a season.
6. oppPTS – Points allowed by a team in a season.
7. FG – Field goals made by a team in a season. In basketball, the term field
goal refers to a basket scored on any
shot or tap other than a free throw, worth two or three points depending
on the distance of the attempt from the basket.
8. FGA – Field goals attempted by a team in a season.
9. 2P – Two-point field goals made by a team in a season. (R prefixes column
names that begin with a digit with X, so this attribute appears as X2P in the
summary statistics; the same applies to 2PA, 3P and 3PA.)
10. 2PA – Two-point field goals attempted by a team in a season.
11. 3P – Three-point field goals made by a team in a season.
12. 3PA – Three-point field goals attempted by a team in a season.
13. FT – Free throws made by a team in a season. In basketball, free throws or
foul shots are unopposed attempts to
score points from a restricted area on the court and are generally awarded
after a foul on the shooter by the opposing team. Each successful free
throw is worth one point.
14. FTA – Free throws attempted by a team in a season.
15. ORB – Offensive rebounds by a team in a season. A rebound, colloquially
referred to as a board, is a statistic
awarded to a player who retrieves the ball after a missed field goal or free
throw. An offensive rebound is one collected at the team's own offensive
end; rebounds are also given to a player who tips in a missed shot on
his team's offensive end.
16. DRB – Defensive rebounds by a team in a season, that is, rebounds
collected after a missed field goal or free throw by the opposing team.

17. AST – An assist is attributed to a player who passes the ball to a teammate
in a way that leads to a score by field goal, meaning that he or she was
"assisting" in the basket. There is some judgment involved in deciding
whether a passer should be credited with an assist.
18. STL – A steal occurs when a defensive player legally causes a turnover
by his positive, aggressive action(s). This can be done by deflecting and
controlling, or by catching the opponent's pass or dribble of an offensive
player.
19. BLK – A block or blocked shot occurs when a defensive player legally
deflects a field goal attempt from an offensive player. The defender is not
allowed to make contact with the offensive player's hand (unless the
defender is also in contact with the ball) or a foul is called.
20. TOV – A turnover occurs when a team loses possession of the ball to the
opposing team before a player takes a shot at his team's basket.

Summary Statistics of Attributes:


SeasonEnd Team Playoffs
Min. :1980 Atlanta Hawks : 31 Min. :0.0000
1st Qu.:1989 Boston Celtics : 31 1st Qu.:0.0000
Median :1996 Chicago Bulls : 31 Median :1.0000
Mean :1996 Cleveland Cavaliers: 31 Mean :0.5749
3rd Qu.:2005 Denver Nuggets : 31 3rd Qu.:1.0000
Max. :2011 Detroit Pistons : 31 Max. :1.0000
(Other) :649

W PTS oppPTS FG
Min. :11.0 Min. : 6901 Min. : 6909 Min. :2565
1st Qu.:31.0 1st Qu.: 7934 1st Qu.: 7934 1st Qu.:2974
Median :42.0 Median : 8312 Median : 8365 Median :3150
Mean :41.0 Mean : 8370 Mean : 8370 Mean :3200
3rd Qu.:50.5 3rd Qu.: 8784 3rd Qu.: 8768 3rd Qu.:3434
Max. :72.0 Max. :10371 Max. :10723 Max. :3980

FGA X2P X2PA X3P
Min. :5972 Min. :1981 Min. :4153 Min. : 10.0
1st Qu.:6564 1st Qu.:2510 1st Qu.:5269 1st Qu.:131.5
Median :6831 Median :2718 Median :5706 Median :329.0
Mean :6873 Mean :2881 Mean :5956 Mean :319.0
3rd Qu.:7157 3rd Qu.:3296 3rd Qu.:6754 3rd Qu.:481.5
Max. :8868 Max. :3954 Max. :7873 Max. :841.0

X3PA FT FTA ORB
Min. : 75.0 Min. :1189 Min. :1475 Min. : 639.0
1st Qu.: 413.0 1st Qu.:1502 1st Qu.:2008 1st Qu.: 953.5
Median : 942.0 Median :1628 Median :2176 Median :1055.0
Mean : 916.9 Mean :1650 Mean :2190 Mean :1061.6
3rd Qu.:1347.5 3rd Qu.:1781 3rd Qu.:2352 3rd Qu.:1167.0
Max. :2284.0 Max. :2388 Max. :3051 Max. :1520.0

DRB AST STL BLK TOV
Min. :2044 Min. :1423 Min. : 455.0 Min. :204.0 Min. : 931
1st Qu.:2346 1st Qu.:1735 1st Qu.: 599.0 1st Qu.:359.0 1st Qu.:1192
Median :2433 Median :1899 Median : 658.0 Median :410.0 Median :1289
Mean :2427 Mean :1912 Mean : 668.4 Mean :419.8 Mean :1303
3rd Qu.:2516 3rd Qu.:2078 3rd Qu.: 729.0 3rd Qu.:469.5 3rd Qu.:1396
Max. :2753 Max. :2575 Max. :1053.0 Max. :716.0 Max. :1873

Predictive Analytics Concepts Used:

o Correlation: Correlation is a statistical measure that indicates the extent to
which two or more variables fluctuate together. A positive correlation
indicates the extent to which those variables increase or decrease in
parallel; a negative correlation indicates the extent to which
one variable increases as the other decreases.
o Linear Regression: Simple Linear Regression is the method for finding
the "line of best fit" between the dependent variable, y, and the
independent variable, x.
o Multiple Regression: The general purpose of multiple regression is to
learn about the relationship between several independent variables and a
dependent variable.
o Sum of Squares due to Error: SSE is the sum of the squared differences
between each observed value and the value predicted by the model. It
measures the variation that the model leaves unexplained. If every
prediction matched its observation exactly, the SSE would be equal to 0.
o Root Mean Square Error: Root-mean-square error (RMSE) is a
frequently used measure of the differences between values predicted by a
model or an estimator and the values actually observed.
o Total Sum of Squares: SST is the sum of the squared deviations of the
dependent variable about its mean.
o R2 Value: R-squared is a statistical measure of how close the data are to
the fitted regression line. It is also known as the coefficient of
determination.
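
The measures listed above can also be written as formulas. This is a sketch in standard notation, which is an assumption of this note rather than the report's own: y_i is an observed value, ŷ_i the value predicted by the model, ȳ the mean of the dependent variable and n the number of observations. SSE is written in its regression form, matching the way it is computed from the model residuals in the analysis chapter.

```latex
\begin{align*}
r_{xy} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \\
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \\
\mathrm{RMSE} &= \sqrt{\frac{\mathrm{SSE}}{n}}
\end{align*}
```

When these measures are evaluated on the test dataset, the analysis chapter takes ȳ in SST to be the mean of the training points, mean(NBA$PTS).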
Steps involved in developing the predictive model:
o Adding the points difference attribute, which is the difference between
total points scored and total points conceded.
o Finding the number of wins required to guarantee a playoff berth.
o Creating a regression model to predict the point difference required to
achieve the wins mentioned above.
o Developing a regression model for the Points attribute on the other
attributes.
o Applying the developed predictive model to the test dataset to predict the
points.
o Calculating the residuals, R2 and Root Mean Square Error (RMSE) values
for the test dataset.
Tools Used:
o R version 3.3.3

CHAPTER – III
ANALYSIS

Structure of train dataset:

The train dataset contains 835 observations of 20 attributes. It is the data of
National Basketball Association (NBA) teams in all seasons from 1980 to 2011,
except the teams which played fewer than 82 games.

Inserting Points Difference Column:

We will now insert a column to show the points difference of every team in
each season. This is done because the number of wins depends on how the points
scored compare with the opponent points. The inclusion of the points difference
attribute is shown in the figure below.
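
As a minimal sketch of this step, the points difference can be computed on the five 1980 rows listed in the Annexure. The data frame name `NBA_sample` and the column name `PTSdiff` are chosen here for illustration; the figure from the original report is not reproduced, so the exact code the report used is an assumption.

```r
# Five rows of the 1980 training sample listed in the Annexure (not the full data).
NBA_sample <- data.frame(
  Team   = c("Atlanta Hawks", "Boston Celtics", "Chicago Bulls",
             "Cleveland Cavaliers", "Denver Nuggets"),
  W      = c(50, 61, 30, 37, 30),
  PTS    = c(8573, 9303, 8813, 9360, 8878),
  oppPTS = c(8334, 8664, 9035, 9332, 9240)
)

# Points difference = points scored minus points conceded.
NBA_sample$PTSdiff <- NBA_sample$PTS - NBA_sample$oppPTS

print(NBA_sample[, c("Team", "W", "PTSdiff")])
```

In this sample the teams with the larger positive differences (Boston and Atlanta) are also the ones with the most wins.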

Correlation between Wins and Points Difference Attributes:

Figure 1:

The above graph shows the correlation between number of wins and points
difference of all teams in all seasons. Since the graph shows a positive slope, we
can say that the number of wins is positively correlated with the points
difference.

The statistical representation of the regression between wins and points
difference is shown in the figure below.

Since the R-squared value is close to 1, the model explains most of the variation
in wins, so it can be used to estimate the points difference a team needs in
order to reach the number of wins required to qualify for the playoffs.

Then the regression equation is,

Wins, W = 41 + 0.0326 × PTSdiff
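
A regression of this form can be fitted in R with `lm`. The sketch below uses only the five 1980 rows from the Annexure, so its coefficients will not match the full-data equation exactly; the object names `sample_df` and `WinsReg_sample` are hypothetical and are not taken from the report.

```r
# Five rows of the 1980 training sample from the Annexure (not the full data).
sample_df <- data.frame(
  W      = c(50, 61, 30, 37, 30),
  PTS    = c(8573, 9303, 8813, 9360, 8878),
  oppPTS = c(8334, 8664, 9035, 9332, 9240)
)
sample_df$PTSdiff <- sample_df$PTS - sample_df$oppPTS

# Simple linear regression of wins on points difference.
WinsReg_sample <- lm(W ~ PTSdiff, data = sample_df)
sample_coef <- coef(WinsReg_sample)
print(sample_coef)

# Wins implied by the report's full-data equation, for comparison.
report_wins <- 41 + 0.0326 * sample_df$PTSdiff
print(round(report_wins, 1))
```

Even on this small sample the fitted slope is positive and of the same order as the 0.0326 obtained from all 835 observations.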

Table 1: Number of times teams reached playoffs based on wins

Number of Wins    Times teams reached playoffs    Times teams did not reach playoffs
Below 35          2                               273
35-46             163                             99
Above 46          277                             1

The above table shows the number of times teams reached the playoffs, grouped
by their wins in a season.

Assume the required number of wins is 47, since teams that won 47 or more games
reached the playoffs in all but one case. Substituting W = 47 into the above
regression equation gives a points difference of approximately 184.
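
The arithmetic behind this figure can be checked by solving the regression equation for the points difference. The variable names below are illustrative; the intercept, slope and target are the values stated in the text.

```r
# Coefficients of the wins regression as reported in the text.
intercept <- 41
slope     <- 0.0326

# Wins assumed to be enough for a playoff berth.
target_wins <- 47

# Solve W = intercept + slope * PTSdiff for PTSdiff.
required_diff <- (target_wins - intercept) / slope
print(round(required_diff))
```

Rounded to the nearest point, this gives the 184 quoted above.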

Regression on Points conceded (oppPTS) and different attributes:

The above figure shows the significance of each attribute for the Opponent Points
attribute. The model is considered after the elimination of attributes which had
no significance for the Opponent Points attribute.

Regression on Points (PTS) and different attributes:

The above figure shows the significance of each attribute for the Points attribute.
The model is considered after the elimination of attributes which had no
significance for the Points attribute. Since the R-squared value is close to 1, we
consider the model to be accurate.

Sum of Squares due to Error:

SSE = sum(PointsReg$residuals^2)

Residuals are the differences between the observed values and the values
predicted by the model.

The value of SSE of the model is 28394314

Root Mean Square Error:

RMSE = sqrt(SSE/nrow(NBA))

The value of RMSE of the model is 184.4049

Average points of a team in each season:

mean(NBA$PTS) = 8370.24

The RMSE is only about 2% of the mean points (184.4049 / 8370.24 ≈ 0.022).
So, we can proceed with the model for further analysis.
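
The training RMSE and its size relative to the mean can be re-derived from the figures reported above. This is only a consistency check on the stated SSE, row count and mean; the variable names are illustrative.

```r
# Values reported in the text for the training data.
SSE_train <- 28394314   # sum of squared residuals of the points model
n_train   <- 835        # observations in the train dataset
mean_PTS  <- 8370.24    # average points of a team in a season

# RMSE = sqrt(SSE / n), then expressed as a share of the mean.
RMSE_train <- sqrt(SSE_train / n_train)
relative_error <- RMSE_train / mean_PTS

print(round(RMSE_train, 4))
print(round(100 * relative_error, 1))
```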

Model Implementation on Test dataset:

The above figure shows the structure of the test dataset. The test dataset
contains 28 observations of 20 attributes. It is the data of National Basketball
Association (NBA) teams in the season 2013.

Based on the model generated above, the analysis on the test dataset is done by
calculating points predictions.

Sum of Squares due to Error:

SSE = sum((PointsPredictions - NBA_test$PTS)^2)

The value of SSE of the model is 1079739

Total Sum of Squares:

SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)

The value of SST is 5765192

R2 Value:

R2 = 1 - SSE/SST = 0.8127142

Root Mean Square Error:

RMSE = sqrt(SSE/nrow(NBA_test))

The value of RMSE of the model is 196.3723
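
The test-set measures can be wrapped in one small helper and the reported values re-derived from the stated SSE and SST. The function name `evaluate_predictions` is hypothetical. Because the report's 28 predicted values are not listed, the helper is only exercised on a degenerate case (predictions equal to the observed points of the five 2013 sample rows in the Annexure), which is a sanity check and not a result.

```r
# Hypothetical helper: out-of-sample SSE, SST, R2 and RMSE.
# baseline_mean is the training mean, as in SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2).
evaluate_predictions <- function(pred, actual, baseline_mean) {
  SSE <- sum((pred - actual)^2)
  SST <- sum((baseline_mean - actual)^2)
  list(SSE = SSE, SST = SST,
       R2 = 1 - SSE / SST,
       RMSE = sqrt(SSE / length(actual)))
}

# Sanity check: perfect predictions give SSE = 0, R2 = 1 and RMSE = 0.
actual_PTS <- c(8032, 7944, 7661, 7641, 7913)  # 2013 sample rows, Annexure
perfect <- evaluate_predictions(actual_PTS, actual_PTS, 8370.24)

# Re-derive the reported test values from the stated aggregates.
SSE_test <- 1079739
SST_test <- 5765192
n_test   <- 28
R2_test   <- 1 - SSE_test / SST_test
RMSE_test <- sqrt(SSE_test / n_test)
print(c(R2 = R2_test, RMSE = RMSE_test))
```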

Comparison:

            Train Set    Test Set
R2 value    0.8983       0.8127
RMSE        184.4049     196.3723

The values of R2 and RMSE are similar for the train and test datasets, and the
R2 value is close to 1 in both. So, we can infer that our predictive
model is accurate.

CHAPTER – IV
IMPLICATIONS AND CONCLUSION

Implications: Analytics in sports helps in:

o Retaining and engaging fans.
o Taking decisions instantly.
o Spotting the top performers.
o Improving players' performance.
o Building better teams.
o Creating sustainable franchises.

Conclusion: At present every sport is becoming commercial and each one has
its own commercial leagues. Franchises are spending huge amounts on building
the best teams by recruiting the top players in order to attain more profits. So
managers should not select players by using old traditional methods like
scouting. Since each and every aspect of the game is available in the form of data,
managers should use it for selecting the players. Different analytical models
help managers to value the players more accurately and minimize the risks.

For example, the famous book Moneyball, which was even made into a movie,
describes how a baseball team, the Oakland Athletics, surprised everyone when they
consecutively qualified for the playoffs on a very low budget by recruiting
players who were not so popular. They could do that because they analysed
which underrated attributes most help a team to qualify. They
spent under 30 million on recruiting players and won 90 games, whereas teams
like the New York Yankees spent 90 million and won 100 games and the Red Sox spent
80 million and won 90 games. In this way, Moneyball has changed the scope of
analytics in sports.

CHAPTER – V
BIBLIOGRAPHY AND ANNEXURE

Bibliography:
1. sportsanalytics.sa.utoronto.ca
2. sas.com
3. courses.edx.org

Annexure:
Train dataset sample:

SeasonEnd 1980 1980 1980 1980 1980
Team Atlanta Hawks | Boston Celtics | Chicago Bulls | Cleveland Cavaliers | Denver Nuggets
Playoffs 1 1 0 0 0
W 50 61 30 37 30
PTS 8573 9303 8813 9360 8878
oppPTS 8334 8664 9035 9332 9240
FG 3261 3617 3362 3811 3462
FGA 7027 7387 6943 8041 7470
2P 3248 3455 3292 3775 3379
2PA 6952 6965 6668 7854 7215
3P 13 162 70 36 83
3PA 75 422 275 187 255
FT 2038 1907 2019 1702 1871
FTA 2645 2449 2592 2205 2539
ORB 1369 1227 1115 1307 1311
DRB 2406 2457 2465 2381 2524
AST 1913 2198 2152 2108 2079
STL 782 809 704 764 746
BLK 539 308 392 342 404
TOV 1495 1539 1684 1370 1533

Test dataset sample:

SeasonEnd 2013 2013 2013 2013 2013
Team Atlanta Hawks | Brooklyn Nets | Charlotte Bobcats | Chicago Bulls | Cleveland Cavaliers
Playoffs 1 1 0 1 0
W 44 49 21 45 24
PTS 8032 7944 7661 7641 7913
oppPTS 7999 7798 8418 7615 8297
FG 3084 2942 2823 2926 2993
FGA 6644 6544 6649 6698 6901
2P 2378 2314 2354 2480 2446
2PA 4743 4784 5250 5433 5320
3P 706 628 469 446 547
3PA 1901 1760 1399 1265 1581
FT 1158 1432 1546 1343 1380
FTA 1619 1958 2060 1738 1826
ORB 758 1047 917 1026 1004
DRB 2593 2460 2389 2514 2359
AST 2007 1668 1587 1886 1694
STL 664 599 591 588 647
BLK 369 391 479 417 334
TOV 1219 1206 1153 1171 1149

