Syllabus: Data Analytics Spring 2017

Professors: Peter Fox and Greg N. Hughes

Topics: Predictive Analytics, Big Data, Data Science, Analytics, Data Visualization

Course Numbers: 38740, 38741

Description:

Data and information analytics extends analysis (descriptive and predictive models used to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is concerned not so much with individual analyses or analysis steps as with an entire methodology.

The world at large is confronted with increasingly large and complex sets of structured and unstructured information from sensors, instruments, and computer simulations; data is also "hidden" in websites, application servers, social networks, and mobile devices. At the national level, assimilating information across disparate domains (e.g., intelligence, economics, science) has the potential to improve capabilities for decision makers. In commerce and industry, analytics-driven enterprises are becoming mainstream. Yet there is a shortfall in the education and key skills needed to meet this growing demand. Traditional enterprises are moving toward analytics-driven approaches for core business functions. In government and corporations, cybersecurity problems are prevalent. Investment in advanced analytics capabilities could be leveraged more broadly today, and could potentially exceed any prior government investment in computing.

Emphasis is now placed on disruptive data and information sources on the Web and Internet: using Web Science and informatics to explore social networks, platform competition, the "long tail," and the economic or resource impacts of the search for new findings. Key topics include advanced statistical computing theory, multivariate analysis, and the application of computer science methods such as data mining, machine learning, and change detection to uncover unexpected patterns in data.

Course goals:

Introduce students to relevant methods for recognizing and applying quantitative algorithms and techniques, and for interpreting their results.

Develop students' strategic thinking skills, combined with a solid technical foundation in data- and model-driven decision-making.

Develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.

Students will examine real-world examples using modern cyberinfrastructure to place statistical and data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.

By the end of the course, students will be able to effectively communicate analytic findings to non-specialists.

Class Listing: ITWS 4600/ITWS 6600

Instructors: Professor Peter Fox and Professor Greg Hughes

TA: Dave Ward - wardd4 at rpi dot edu

Meeting times: Monday and Thursday (MR), 2:00-3:50 pm

Class Location: Lally Hall 102

Office Hours: Monday 1-2 pm, Winslow 2120, or by appointment in Lally 207A

Phone: x4862

TA Office Hours: by appointment

Syllabus/Calendar
Refer to the Reading/Assignment/Reference list for each week (see below).

Reference material (available through the RPI library; RCS login required):

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (online) (RECOMMENDED)

Big Data Analytics: Turning Big Data into Big Money (online)

Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph (online)

Big Data Analytics with R and Hadoop (online)

R for Everyone: Advanced Analytics and Graphics (online)

Group 1 - Intro/Setup

Week 1 (Jan. 19): Introduction to Course, Case Studies, and Preview of Course Material (Week 1 Thursday slides); Introduction/refresher on basic statistics (Week 1 Thursday slides); Assignment 1

Week 2 (Jan. 23/26): Starting with Data and Information Resources; Role of Hypothesis, Synthesis, and Model Choices (Week 2 video recording: TBD); R/RStudio bootcamp (Week 2 Thursday slides)

Week 3 (Jan. 30/Feb. 2): Data filtering, hypothesis exploration, visual analysis, model consideration and assessment (lab; Week 3 Thursday slides)

(Lab) Assignment 2

Group 2 - Patterns, relations, descriptive analytics

Week 4 (Feb. 6/9): Preliminary Analysis, Interpretation, Detailed Analysis, Introduction to Analytic Methods, Types of Data Mining for Analytics (Week 4 Monday slides); lab (Week 4 Thursday slides)

Assignment 3

Week 5 (Feb. 13/16): Weighted kNN, Clustering, Early Decision Trees, and Bayesian Inference (Week 5 Monday slides); exercises for linear regression, kNN, and k-means (lab; Week 5 Thursday slides)

Assignment 4

Assignment 5

Week 6 (Feb. 21/23): More Clustering and Bayesian Inference (Week 6 Tuesday slides); lab for kNN and k-means on a dirty dataset (Week 6 Thursday slides)

Week 7 (Feb. 27/Mar. 2): Interpreting Regression, Classification, Clustering, and Bayesian Inference (Week 7 Monday slides); lab for Regression, Classification, Clustering, and Bayesian Inference (Week 6 Thursday slides)

Assignment 6

Group 3 - Predictive Analytics

Week 8 (Mar. 6/9): Decision trees, cross-validation (Week 8 Monday lab slides); lab for decision trees and cross-validation (Week 8 Thursday slides)

Mar. 13/17 - no classes - Spring Break

Week 9 (Mar. 20/23): Dimension reduction and scaling, Support Vector Machines (Week 9 Monday slides); lab for DR, MDS, and SVM (Week 9 Thursday lab)

Week 10 (Mar. 27/30): Factor Analysis (Week 10 Monday slides); SVM, Dimension Reduction, MDS, and Factor Analysis lab (Week 10 Thursday slides)

Assignment 7

Week 11 (Apr. 3/6): Interpreting PCA, MDS, DR, and FA; Boosting, Bootstrapping, Bagging (Week 11 Monday slides); Boosting, Bootstrapping, Bagging lab (Week 11 Thursday slides)

Group 4 - Evaluating and validating, prescriptive analytics

Week 12 (Apr. 10/13): Cross-validation, Revisiting Regression: local methods (Week 12 Monday slides); lab on cross-validation and local regression methods, plus continued project and assignment work (Week 12 Thursday slides)

Week 13 (Apr. 17/20): Local Regression (continued), Mixed Models, Optimizing, Iterating, Fisher Linear Discriminant (Week 13 Monday slides); open lab and continued project and assignment work; Assignment 7 due (no slides)

Week 14 (Apr. 24/27): Final Project Presentations

Reading/Assignment/Reference List (see above)


Class 1 Reading Assignment:

Sports analytics: Moneyball (http://www.imdb.com/title/tt1210166/)

Nate Silver (http://en.wikipedia.org/wiki/Nate_Silver)

http://www.slideshare.net/lsakoda/case-studies-utilizing-real-time-data-...

http://www.marketquotient.com/case-studies.html

http://www.ibm.com/analytics/us/en/case-studies/

Class 2 Reading Assignment (prior to Thursday class):

http://www.r-tutor.com/r-introduction/data-frame

http://www.r-tutor.com/r-introduction/

Class 3 Reading Assignment (prior to Monday class):

http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)

http://en.wikipedia.org/wiki/Regression_analysis

http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

http://varianceexplained.org/r/kmeans-free-lunch/

http://en.wikipedia.org/wiki/K-means_clustering

Classes 4-6 Reading Assignment: none


Class 7 Reading Assignment:

http://stat-www.berkeley.edu/users/breiman/RandomForests/ Random Forests

Class 8 Reading Assignment: none

Class 9 Reading Assignment:

http://escience.rpi.edu/data/DA/v15i09.pdf Karatzoglou et al. 2006

http://escience.rpi.edu/data/DA/svmbasic_notes.pdf Vert SVM basic

http://www.stjuderesearch.org/site/data/ALL1/ ALL dataset

http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html RSVM

http://data-informed.com/focus-predictive-analytics/

Classes 10-13 Reading Assignment: none

Course Learning Objectives:

Students will demonstrate knowledge of relevant analytic methods, and will recognize and apply quantitative algorithms and techniques and interpret the results.

Students will demonstrate strategic thinking skills, combined with a solid technical foundation in data- and model-driven decision-making.

Students will develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.

Students will examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.

Students must effectively communicate analytic findings to non-specialists.

[graduate level]

Students must develop and demonstrate a working knowledge of decision making under uncertainty, and must be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.