
Beyond Moneyball:
The rapidly evolving world of sports analytics,
Part I

Professional baseball and basketball teams are leading the pack of sports organizations that are embracing analytics.

Editor's note: This article appears in the September/October 2011 issue of Analytics Magazine.

By Benjamin Alamar and Vijay Mehrotra


Over the past few years, the world of sports has experienced an explosion in the use of analytics. In this three-
part series, we reflect on the current state of sports analytics and consider what the future of sports analytics may
look like.
We define sports analytics as the management of structured historical data, the application of predictive analytic
models that utilize that data, and the use of information systems to inform decision makers and enable them to
help their organizations gain a competitive advantage on the field of play. Our definition is both expansive
(in the sense that it includes not only statistical models but also the broader information value chain that
surrounds these models) and restrictive (because it excludes traditional analytics applications such as demand
forecasting, revenue management and financial modeling, all of which are certainly relevant in the business of
professional sports). Our framework for sports analytics is presented in Figure 1.


Figure 1: A framework of sports analytics.

Data management includes any and all processes associated with acquiring, verifying and storing data in an
efficient manner. In a sports organization, data can come from a variety of sources and may be presented in many
different forms.
As shown in Figure 1, the data management function will feed both the predictive analytics function and the
information systems that support decision-makers. Given this crucial role, good data management is essential,
and therefore missing, incomplete and/or inaccessible data inherently reduces the value of any other investments
in analytics.
In many organizations, data is stored in isolated silos, so getting data is often not a smooth process.
Different groups within an organization, such as scouting or training, may have extensive data on players that other
groups either do not have access to or do not even know exists.
For example, the personnel group at one NFL team had been collecting extensive performance data on various
groups of both opposing players and their own players. The coaching staff had no idea that the data existed, and
when they did discover it, they had difficulty accessing it. The data resided in spreadsheets on the computers of
the personnel group instead of being integrated into a common data archive. This is a common situation within
professional sports organizations.
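
To make the idea of a common data archive concrete, here is a minimal Python sketch that merges records from two departmental silos into one record per player. The function name `build_common_archive`, the shared `player_id` key and every field and value below are hypothetical illustrations, not data or systems from any actual team.

```python
from collections import defaultdict


def build_common_archive(*sources):
    """Merge per-department rows into one record per player.

    Each source is a (department_name, rows) pair. Each row is a dict
    that must contain a 'player_id' key (an assumed shared identifier).
    """
    archive = defaultdict(dict)
    for department, rows in sources:
        for row in rows:
            player_id = row["player_id"]
            for field, value in row.items():
                if field != "player_id":
                    # Prefix each field with its department so that
                    # same-named columns from different silos do not collide.
                    archive[player_id][f"{department}.{field}"] = value
    return dict(archive)


# Made-up rows standing in for two departmental spreadsheets.
personnel_rows = [
    {"player_id": 17, "snaps": 412, "grade": 7.5},
    {"player_id": 23, "snaps": 388, "grade": 6.0},
]
training_rows = [
    {"player_id": 17, "sprint_40yd_sec": 4.6},
]

archive = build_common_archive(
    ("personnel", personnel_rows),
    ("training", training_rows),
)
print(archive[17])
```

Keying everything on a single player identifier is the design choice that matters here: once each group's data carries that identifier, a coach can look up a player in one place instead of asking which department holds which spreadsheet.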

Predictive analysis, the next piece of the framework, is the process of applying statistical tools to data to gain
insight into what is likely to happen in the future. In sports, this can involve the projection of the pro careers of
amateur players, identifying how the strengths and weaknesses of an opponent will play out against your own
team's strengths and weaknesses, or assessing whether a free agent would fill a need on a team at an appropriate
cost. Depending on the importance of the problem, the time until an answer is needed and the data available,
these analyses can range from simple comparisons to extremely complicated and cutting-edge statistical analysis.
The results of these analyses may feed directly into an intelligent information system that provides decision-
makers with standardized results. Alternatively, such results may be reported directly to a decision-maker for
special projects that may be outside of any standard systems.
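
At the simple end of that range, a projection can be as basic as a straight-line fit. The Python sketch below fits ordinary least squares to a handful of synthetic (amateur, pro) pairs and projects a new prospect. The numbers, the single-predictor model and the function names are assumptions made purely for illustration; this is not a model used by any team.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(slope, intercept, x):
    """Predicted pro value for a prospect with amateur value x."""
    return intercept + slope * x


# Synthetic history: amateur scoring average vs. later pro scoring average.
amateur = [10.0, 14.0, 18.0, 22.0]
pro = [6.0, 8.0, 10.0, 12.0]

slope, intercept = fit_line(amateur, pro)
projection = project(slope, intercept, 16.0)
print(f"slope={slope}, intercept={intercept}, projection={projection}")
```

A real projection would use many more players and predictors and would report its uncertainty, but the structure is the same: fit on historical data, then apply the fitted model to a player whose pro career has not happened yet.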

Information systems, the next component in the framework, are increasingly common in the world of sports.
When designed and implemented correctly, such information systems typically allow for visualization and
interactive analysis of relevant information from multiple sources in one place, organized in a meaningful way to
provide insights for decision makers. For example, a cutting-edge sports information system might combine
unstructured information from scouting reports, summary reports from multiple data sources and results from
predictive models. Such a system not only provides a data-driven decision support platform and integrates data
from multiple sources, but (as we will discuss in Part 2 of this series) also has the potential to fundamentally alter
and enhance the way a decision-maker does his or her job.

Decision-makers are the ultimate customers for all components in the sports analytics framework. However, the
modern professional sports organization typically has many different decision-makers, including the general
manager, coaches, scouts, trainers, salary cap managers and other personnel executives. Decision-makers in
different functional areas may utilize different data and models to tackle different types of questions. Conversely,
as mentioned above, one key problem today is that decision-makers in one functional area (such as scouts) rarely
have easy access to information generated by personnel in other areas (such as assistant coaches or salary cap
managers).
To summarize, our definition and framework for sports analytics encompass several different and related
aspects associated with turning raw data into information that is valued by and has an impact on decision-
makers in the world of sports.

An Explosion of Interest in Sports Analytics


Though sports analytics is still a nascent, unstructured field (as we will discuss in more detail in Part 3), interest and
activity in it have been exploding in recent years.
While studies applying mathematical models to professional sports data can be traced back more than 50 years
[1], it is important to remember what the world looked like as recently as 2005 when the first issue of the
Journal of Quantitative Analysis in Sports (www.bepress.com/jqas/) was published. At the time this journal was
launched, only two or three NBA teams thought about using advanced statistics in connection with players and
strategy. Michael Lewis' seminal book [2], Moneyball: The Art of Winning an Unfair Game, about the use of data
and models by the Oakland A's, had recently been published, and no one had yet thought seriously about the application
of motion capture technology in the context of professional sports. Just six short years later, more than half of
NBA teams now utilize the tools of analytics on the team side of their operation, most MLB teams now
consider analytics a normal part of baseball operations, and companies such as STATS LLC are installing cameras
in NBA arenas and NFL stadiums to capture more and more data.
On a broader scale, the annual Sloan Sports Analytics Conference serves as a vivid symbol of the growth of sports
analytics. The first Sloan conference took place in 2006 in a few classrooms on the MIT campus with fewer than
300 attendees. The 2011 conference was held at the Boston Convention Center and attracted more than 2,000
attendees.

An Explosion of Data
Data within a sports organization used to consist of individual box scores, player and team summary statistics,
text-based scouting reports and raw game films. However, the data available to decision-makers has grown
exponentially over the last 15 years.
Several factors have contributed to this explosion in data. Innovations in sports science, ranging from training
routines to nutritional regimens, coupled with improved reporting from medical staffs and trainers have all come
with their own data sets that are gathered and tracked somewhere within an organization. With improved
communications via the Internet, the frequency and amount of information captured, stored and distributed by
scouts and coaches at all levels has grown significantly. Thanks to increased computing power and reduced
storage costs, historical data about the games themselves is now packaged into many different formats, with
companies such as STATS LLC, StatDNA and Sports Data Hub emerging to provide organizations with high-quality
historical data presented with unique summaries and indexes.
Finally, the advent of motion capture technology has expanded the data collected from each game. This
technology tracks everything that moves on a field every 100th of a second. The impact of this is staggering, for it
transforms the amount of information captured for a single game from a few hundred rows of data to well over
one million. Major League Baseball, the NBA and pro soccer teams have implemented this type of technology.
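
A rough back-of-the-envelope calculation shows how quickly the row count climbs at that sampling rate. The Python sketch below takes the 100-samples-per-second figure from the text and a 48-minute NBA regulation game. The count of 11 tracked objects (10 players on the court plus the ball) and the one-row-per-object-per-sample layout are assumptions for illustration only, and stoppages and overtime are ignored.

```python
SAMPLES_PER_SECOND = 100  # "every 100th of a second", per the text
GAME_MINUTES = 48         # NBA regulation length
TRACKED_OBJECTS = 11      # assumption: 10 players on court + the ball


def tracking_rows(game_minutes, samples_per_second, tracked_objects):
    """Rows generated if each tracked object yields one row per sample."""
    return game_minutes * 60 * samples_per_second * tracked_objects


rows = tracking_rows(GAME_MINUTES, SAMPLES_PER_SECOND, TRACKED_OBJECTS)
print(f"{rows:,}")  # prints 3,168,000
```

Under these assumptions a single game yields about 3.2 million rows, which is consistent with the "well over one million" figure above.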
The result of all of this is clear: The world of sports generates far, far more data today than could have been
imagined just a few short years ago. Dean Oliver, director of Production Analytics at ESPN, has spoken of finding
data that can win championships.
However, as the computer scientist Clifford Stoll has said, "Data is not information, information is not knowledge,
knowledge is not understanding, understanding is not wisdom." Too much time is still spent by analysts using
their skills to try and answer questions that are not meaningful to decision-makers in pro sports. For example,
very little is interesting about the next new statistic that ranks all NBA players. General managers do not lose
sleep trying to figure out who the best player in the game is, as that information is neither accurate nor
actionable. Conversely, Mark Cuban, owner and general manager of the Dallas Mavericks, has often cited studies
that either predict or examine the effects of injuries as delivering useful and actionable information to his
coaching staff and team.
In other words, despite the remarkable growth in the amount and variety of data available for examination and
analysis, the world of sports analytics still faces the same ubiquitous challenge: How to get meaningful
information into the hands and minds of the people who are in a position to make effective use of it.
In Part 2, we examine some of the predictive models that are being used to create actionable information in the
world of sports today and the information systems that effectively deliver valuable information to decision-makers.

Benjamin Alamar (quantsports@gmail.com) is the founding editor of the Journal of Quantitative Analysis in Sports,
a professor of sports management at Menlo College and the director of Basketball Analytics and Research for the
Oklahoma City Thunder of the NBA. He is co-author of the annual Football Outsiders Almanac and a regular
contributor to the Wall Street Journal.
Vijay Mehrotra (vmehrotra@usfca.edu) is an associate professor, Department of Finance and Quantitative Analytics,
School of Business and Professional Studies, University of San Francisco. He is also an experienced analytics
consultant and entrepreneur, an angel investor in several successful analytics companies and a San Francisco
Giants season-ticket holder.

References
1. Lindsey, G. R., "Statistical Data Useful for the Operation of a Baseball Team," Operations Research, Vol. 7, No.
2, March-April 1959, pp. 197-207.
2. www.amazon.com/Moneyball-Art-Winning-Unfair-Game/dp/0393057658.
