Big Data Insight

WEEK 3 BIG DATA AND DATA ANALYTICS

Outline

o Finding Pattern / Insight from Data
o Data Transformation
o Data Visualization
o Looking for Correlation
o The Future (According to Data)
o Data Collection Workflow

Pattern

A pattern is a discernible regularity in the world or in a manmade design. As such, the elements of a pattern repeat in a predictable manner. (Wikipedia)

Why pattern recognition is important:

 1. The ability to recognize and create patterns helps us make predictions based on our observations.
 2. Patterns allow us to see relationships and develop generalizations.
 3. Knowing patterns allows someone to identify them when they first appear.
 4. Patterns provide a sense of order in what might otherwise appear chaotic.
 5. Patterns allow someone to make educated guesses (hypotheses, in science).
 6. Understanding patterns aids in developing mental skills.
 7. Patterns can provide a clear understanding of mathematical relationships.
 8. Understanding patterns provides a clear basis for problem-solving skills (e.g., algebra).
 9. Knowledge of patterns transfers to many scientific fields, where it proves very helpful.
 10. Patterns provide clear insight into the natural world.

You don’t have to start with an outcome in mind when logging personal data. If you know how to ask the right questions of raw data, you may find patterns you didn’t expect. The beauty of having lots of different types of data about yourself is that you may uncover unexpected correlations. One of my favorite examples of an unexpected correlation in personal data is Jewel Loree’s musical cycles. She got her mood report from last.fm, which visualizes patterns in the music she listens to, and she was interested in the recurring peaks of sad/low-energy music.
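A cycle like the one described above can also be searched for automatically. The sketch below is a minimal illustration, assuming a hypothetical daily series giving the share of sad/low-energy tracks played each day (synthetic numbers, not Jewel Loree's data and not anything from last.fm). The lag with the highest autocorrelation suggests the length of the repeating cycle.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (1 <= lag < len)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def dominant_cycle(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical data: share of sad/low-energy tracks per day over 8 weeks,
# with a peak near the end of every week (synthetic, for illustration only).
week = [0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.5]
daily_share = week * 8

if __name__ == "__main__":
    print("Estimated cycle length (days):", dominant_cycle(daily_share, 10))
```

On real listening data the peak would be noisier, so a plot of the autocorrelation against the lag is usually checked by eye as well.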

FINDING PATTERN / INSIGHT

Making a Bayesian Model to Infer Uber Rider Destinations

Finding patterns has never been easy; it is often hard.

It often needs help from machines / software.
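To make the Bayesian idea concrete, here is a toy application of Bayes' rule, assuming a hypothetical rider whose prior destination probabilities and pickup-time likelihoods are made-up numbers. It is not Uber's actual model; it only shows the core update, in which the posterior is proportional to the likelihood times the prior.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(dest | evidence) is proportional to P(evidence | dest) * P(dest)."""
    unnormalized = {dest: prior[dest] * likelihood[dest][evidence] for dest in prior}
    total = sum(unnormalized.values())  # P(evidence), the normalizing constant
    return {dest: p / total for dest, p in unnormalized.items()}


# Hypothetical rider: prior share of trips to each destination ...
prior = {"home": 0.5, "work": 0.3, "restaurant": 0.2}
# ... and P(pickup time | destination), also made-up numbers.
likelihood = {
    "home": {"morning": 0.4, "evening": 0.6},
    "work": {"morning": 0.9, "evening": 0.1},
    "restaurant": {"morning": 0.2, "evening": 0.8},
}

if __name__ == "__main__":
    for dest, p in posterior(prior, likelihood, "evening").items():
        print(f"{dest}: {p:.2f}")
```

The same evidence-driven update changes the answer: an evening pickup makes "home" most likely, while a morning pickup shifts the probability toward "work".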

DATA TRANSFORMATION

Definition: Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.
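As a small illustration of converting between formats, the sketch below turns CSV text (a hypothetical export from a source system) into JSON (a hypothetical format required by a destination system) using only Python's standard library. Note that `csv.DictReader` keeps every value as a string; a real pipeline would usually also convert types.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Hypothetical export from a source system.
source_csv = """customer_id,name,city
1,Ana,Jakarta
2,Budi,Bandung
"""

if __name__ == "__main__":
    print(csv_to_json(source_csv))
```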

Objective: Simplify the solution of a problem, for example by representing a social network as a graph and its adjacency matrix.
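A minimal sketch of the social-network example, assuming a hypothetical list of friendship pairs: the edge list is transformed into an adjacency matrix, after which a question such as "who has the most friends?" reduces to simple row sums.

```python
def adjacency_matrix(edges):
    """Turn undirected (a, b) edges into a sorted node list and a 0/1 matrix."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        # Friendship is mutual, so mark both directions.
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return nodes, matrix


# Hypothetical friendships in a small social network.
friendships = [("Ana", "Budi"), ("Budi", "Citra"), ("Ana", "Citra"), ("Citra", "Dewi")]

if __name__ == "__main__":
    nodes, matrix = adjacency_matrix(friendships)
    for name, row in zip(nodes, matrix):
        print(f"{name:>6} {row}  degree={sum(row)}")
```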

DATA VISUALIZATION

Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication. It involves the creation and study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information". (Wikipedia)

A primary goal of data visualization is to communicate information clearly and efficiently via statistical graphics, plots and information graphics.

Data visualization is both an art and a science. It is viewed as a branch of descriptive statistics by some, but also as a grounded theory development tool by others.

DATA VISUALIZATION MATTERS

Identify areas that need attention or improvement

Clarify which factors influence other factors, for example in customer behavior

Predict future changes (for example, sales volumes)

A good data visualization is part of your storytelling.

People love stories, especially ones with pictures.

Data-driven stories.

DATA VISUALIZATION

The Beauty of Data Visualization - https://www.youtube.com/watch?v=5Zg-C8AAIGg

Looking for Correlation

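You have two sampled variables (X and Y). Does the value of one variable depend on the other, or do the two vary independently?

The correlation coefficient measures how strongly the two variables move together (linearly); a significance test on it estimates the probability that a correlation this strong would appear by chance if the variables were actually unrelated.

Correlations can be strong or weak.

Strong correlations are extremely useful for pointing to likely root causes and to the most important variables, although correlation alone does not prove causation.

Weak correlations leave a lot of ambiguity.

If two variables are perfectly correlated (r = +1 or r = -1), one can be written as a linear function of the other; a weaker correlation means one variable only partly predicts the other.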
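Both ideas can be sketched in plain Python: `pearson` computes the correlation coefficient r, and `permutation_p_value` estimates how often shuffled (deliberately unrelated) data produces an |r| at least as large as the observed one. The permutation test is one common way to get that probability, chosen here because it needs no statistical tables; the sample data are hypothetical, and `pearson` fails with a division by zero if either sample is constant.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Share of random shuffles of ys whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 71, 78, 83]

if __name__ == "__main__":
    print("r =", round(pearson(hours, scores), 3))
    print("p ~", permutation_p_value(hours, scores))
```

A small p means a correlation this strong is unlikely to be a coincidence; it still says nothing about which variable causes the other.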


Example: Correlation Matrix

The Future (According to Data)

The forecasts are shown as a blue line, with the 80% prediction intervals as a gray shaded area and the 95% prediction intervals as a light gray shaded area.
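Prediction intervals like those in the chart can be computed for the simplest forecasting method, the naive forecast (every future value equals the last observation). This sketch assumes the one-step changes are roughly normal with constant variance, so the h-step standard error grows as sigma * sqrt(h), which is why the shaded bands widen further into the future. The series is hypothetical, and the model behind the chart may differ.

```python
import math
from statistics import NormalDist, stdev


def naive_forecast(series, horizon, levels=(80, 95)):
    """Naive forecasts with normal-theory prediction intervals."""
    # One-step residuals of the naive method: y_t - y_{t-1}.
    residuals = [b - a for a, b in zip(series, series[1:])]
    sigma = stdev(residuals)  # sample std. dev.; needs at least 3 observations
    last = series[-1]
    rows = []
    for h in range(1, horizon + 1):
        se = sigma * math.sqrt(h)  # h-step standard error grows with sqrt(h)
        row = {"h": h, "point": last}
        for level in levels:
            z = NormalDist().inv_cdf(0.5 + level / 200)  # 80% -> ~1.28, 95% -> ~1.96
            row[level] = (last - z * se, last + z * se)
        rows.append(row)
    return rows


# Hypothetical monthly sales.
sales = [100, 104, 101, 108, 112, 110, 115]

if __name__ == "__main__":
    for row in naive_forecast(sales, 3):
        lo80, hi80 = row[80]
        lo95, hi95 = row[95]
        print(f"h={row['h']}: {row['point']}  "
              f"80%=[{lo80:.1f}, {hi80:.1f}]  95%=[{lo95:.1f}, {hi95:.1f}]")
```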


Predict Network Growth

NoSQL for Big Data

A NoSQL (originally referring to "non SQL" or "non relational") [1] database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.

NoSQL databases hold and help manage the vast reservoirs of structured and unstructured data that make it possible to mine Big Data for insight.

Classification of some NoSQL database types:

For simplicity, we focus on graph databases.

Neo4j (Graph Database)
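Graph databases such as Neo4j store nodes and relationships directly, so questions like "friends of my friends whom I don't know yet" become short traversals. The sketch below imitates that kind of query on an in-memory Python dictionary; it does not use Neo4j or its API, and the people and relationships are hypothetical.

```python
def build_graph(edges):
    """Undirected adjacency sets from (a, b) relationship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def friends_of_friends(graph, person):
    """People two hops away who are not already direct friends of `person`."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


# Hypothetical FRIEND relationships.
graph = build_graph([("Ana", "Budi"), ("Budi", "Citra"), ("Citra", "Dewi"), ("Ana", "Eka")])

if __name__ == "__main__":
    print(sorted(friends_of_friends(graph, "Ana")))
```

In a relational database the same question would need a self-join on a friendship table for every hop, which is one reason graph databases are attractive for social-network data.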

Assignment (in Class)

o Find a case study of a Big Data implementation / application, for business or other domains

o State the objective, problems, and solution idea (Week 1)

o State and explain the methodology used (Week 2)

o State the model, measurement, and accuracy (Week 3)

Lab Activities

Introduction to R

Module / Package Installation

Import / Export Data

Next week's activities: