

A Case Study Approach

Sharjeel Imtiaz | PhD Data Science – last stage | University of East London, UK
Big Data Definition

• Any piece of information can be considered data.

• Data comes in various forms and sizes, from small data to very big data.

• Data that can reside in RAM (memory) is considered small data. Small data is less than 10s of GBs.

• Data that can reside on a hard disk is considered medium data. Medium data is in the range of 10s to 1000s of GBs.

• Data that cannot reside on a hard disk or in a single system is considered big data. Its size is more than 1000s of GBs.
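
The three definitions on this slide can be sketched as a small classifier that asks where a dataset can reside. This is only an illustration: the function name `classify_data_size` and the example machine (16 GB of RAM, a 2000 GB hard disk) are assumptions, not figures from the slides.

```python
def classify_data_size(size_gb: float, ram_gb: float, disk_gb: float) -> str:
    """Classify a dataset by where it can reside on a single machine."""
    if size_gb <= ram_gb:
        return "small"   # fits in RAM
    if size_gb <= disk_gb:
        return "medium"  # fits on the hard disk, but not in RAM
    return "big"         # does not fit on a single system


# Hypothetical machine: 16 GB of RAM and a 2000 GB hard disk.
for size in (4, 500, 50_000):
    print(size, "GB ->", classify_data_size(size, ram_gb=16, disk_gb=2000))
```

Passing the machine's capacities as parameters keeps the sketch tied to the "where can it reside" definition rather than to fixed GB thresholds.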
How to process Big Data

Processing and managing such huge volumes of data is where the different Big Data technologies come into the picture. It is a new data challenge that requires leveraging existing systems differently, because some time ago the types and volumes of data were not what they are today.
Big data reveals Shakespeare co-authored 17 of his plays (an English literature domain: NLP)

• "We counted how often particular words and phrases appeared in texts by Shakespeare and other authors of his day. These patterns were pretty unmistakable," researcher Gabriel Egan of De Montfort University in Leicester told news agency dpa.

• He said that it remains unclear exactly how the authors worked together. It could have been that Marlowe wrote the texts and Shakespeare later edited them (maybe).
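
The idea of counting how often particular words appear in texts by different authors can be sketched as follows. This is a toy illustration, not the researchers' method: the marker-word list, the distance measure, and the three short strings standing in for play texts are all assumptions.

```python
import re
from collections import Counter

# Hypothetical marker words; real studies use much larger feature sets.
FUNCTION_WORDS = ["the", "and", "of", "to", "in"]


def word_profile(text):
    """Relative frequency of each marker word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def profile_distance(a, b):
    """Sum of absolute differences between two profiles (smaller = more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy strings standing in for full play texts.
known_author_a = "the king and the queen of the land"
known_author_b = "to go in to town to speak to him"
disputed = "the lord and the lady of the court"

d_a = profile_distance(word_profile(disputed), word_profile(known_author_a))
d_b = profile_distance(word_profile(disputed), word_profile(known_author_b))
print("closer to A" if d_a < d_b else "closer to B")
```

The disputed text is attributed to whichever known profile lies closer; with real plays the same comparison would run over far longer texts and many more features.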
Sentiment Analysis with Topic Modeling –

• A particular domain such as hotels has different aspects: service, cleanliness, room, location, and 7 more.

• A product on Amazon has many product features: price, and others

• A stock market product has many features and aspects based on

• Disaster alert data from Twitter also has aspects.

Sentiment analysis of aspects in reviews is both a challenge and a trend.

• The ideal architecture connects devices to data

• IoT-based security devices that monitor all types of network security attacks

• IoT devices that monitor network traffic and avoid congestion through automatic monitoring and avoidance mechanisms

• Disaster monitoring and alert mechanisms

for big data analytics)
Engineering Good Topics (Good for Oman)

The solar panel produces current when photons from the sun hit the silicon atoms.
Or, in layman's terms: sunlight hits the panel and current is produced.

Example: for a 100-watt panel, Voc = 20 V or 21 V and Isc = 5 A, thus P = 100 watts (approx.). How can it be managed efficiently?
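
The arithmetic behind the panel example can be checked with a few lines. The function name `upper_bound_power` is an assumption for illustration. Note that the open-circuit voltage and the short-circuit current never occur at the same operating point, so their product is an upper bound; the power a real panel delivers is somewhat lower.

```python
def upper_bound_power(voc_volts: float, isc_amps: float) -> float:
    """Product of open-circuit voltage and short-circuit current, in watts."""
    return voc_volts * isc_amps


# The two Voc values quoted for the panel, with Isc = 5 A.
for voc in (20.0, 21.0):
    print(f"Voc = {voc} V, Isc = 5 A -> {upper_bound_power(voc, 5.0):.0f} W")
```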
Why Cloud

• The cloud processes big data volumes on a framework of parallel nodes and provides machine learning libraries.

• It is not possible to process big data with a machine learning model such as a neural network on a single machine within a few seconds or a minute.

More features more

IoT to Big Data to Analytics

• Integrating IoT with big data is not easy; it requires domain knowledge

• Amazon product data and categories

• Tweet data

• Social media data

Tools: SPSS is not for big data; WEKA and RMiner are not so good either.
R + Python + RStudio + cloud-based ML
What is your network architecture?
IoT devices for network security, health, or tourism, combined with big data
IOT Recommendation system for Oman Tourism
A Case study with process of Data Science and

Step 1&2&3: Data Collection + Preparation + Storage

[Process diagram: Data Collection, Data Preparation, Storage & Structure on Hadoop (HDFS), Data Exploration, Sentiment Analysis, Model Analysis with Hadoop]
TRC-Funded Project: TripAdvisor Website Scraping Tool

• TripAdvisor is the best-known customer review website for hotels and restaurants in Spain, although Yelp.com is more popular on a global scale.

• TripAdvisor users can write reviews and post scores from 1 (“terrible”) to 5 (“excellent”) following a number of criteria.

• Scraping tool: WebHarvy
WebHarvy

• It scrapes the reviews, but the data then requires cleaning and preparation.

• The data is stored on local disk and on Google Cloud
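
The cleaning that scraped reviews need can be sketched as below. This is a minimal illustration, not the project's pipeline: the function name `clean_reviews`, the `(rating, text)` row shape, and the sample rows are assumptions, since the slides do not show the scraper's export format.

```python
import html
import re


def clean_reviews(rows):
    """Clean scraped (rating, text) rows: strip markup, drop empties and duplicates."""
    seen = set()
    cleaned = []
    for rating, text in rows:
        text = html.unescape(text)                # e.g. "&amp;" -> "&"
        text = re.sub(r"<[^>]+>", " ", text)      # remove leftover HTML tags
        text = re.sub(r"\s+", " ", text).strip()  # collapse white space
        if not text or not 1 <= rating <= 5:      # scores run from 1 to 5
            continue
        if text.lower() in seen:                  # skip duplicate reviews
            continue
        seen.add(text.lower())
        cleaned.append((rating, text))
    return cleaned


# Made-up rows in the shape a scraper export might have.
raw = [
    (5, "Great <b>location</b> &amp; friendly staff"),
    (5, "Great location &amp;  friendly staff"),
    (0, "rating missing"),
    (3, "   "),
]
print(clean_reviews(raw))
```

Only the first row survives here: the second is a duplicate once markup and spacing are normalized, the third has a score outside 1 to 5, and the fourth is empty.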

Step 4: Data Exploration Analysis
Explore the Most Important Terms

It is topic modeling:

• Which terms occur most frequently in the reviews?

• Frequent terms: room, good, downtown, stay, service, clean, pool, and more

• We finalized the following: room, location, value, service

• Remember that we did stemming, removed punctuation and white space, and applied many more preprocessing steps before that
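
The preprocessing and term counting described on this slide can be sketched with the standard library. This is a minimal sketch under assumptions: the stop-word list, the `crude_stem` suffix stripper (a rough stand-in for a real stemmer), and the three sample reviews are all made up for illustration.

```python
import string
from collections import Counter

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "was", "and", "is", "in", "were", "very"}


def crude_stem(word):
    """Very rough stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(reviews, n=3):
    """Lowercase, strip punctuation and white space, stem, then count terms."""
    strip_punctuation = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for review in reviews:
        text = review.lower().translate(strip_punctuation)
        for word in text.split():  # split() also discards extra white space
            if word not in STOP_WORDS:
                counts[crude_stem(word)] += 1
    return counts.most_common(n)


# Made-up reviews for illustration only.
reviews = [
    "The rooms were clean, and the pool was great!",
    "Clean room. Very good service.",
    "Room service was good; stayed downtown.",
]
print(top_terms(reviews))
```

Stemming is what lets "rooms" and "room" count as the same term, which is why it has to happen before the frequencies are compared.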
Step 5: Sentiment Analysis
IS THIS ENOUGH? The answer is NO

• Take a list of positive and negative words

Positive     Negative
Fantastic    Rubbish
Excellent    Sucked
Friendly     Awful
Positive     Terrible
Enjoyed      Bogus

Example review: "I had a fantastic time on holiday at your resort. The service was excellent and friendly. My family all really enjoyed themselves. The pool was closed, which kind of sucked though."

Overall sentiment: 4 - 1 = 3
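
The word-list scoring on this slide can be sketched in a few lines. The word lists and the example review come from the slide; the function name `sentiment_score` and the tokenization with a regular expression are assumptions for illustration.

```python
import re

POSITIVE = {"fantastic", "excellent", "friendly", "positive", "enjoyed"}
NEGATIVE = {"rubbish", "sucked", "awful", "terrible", "bogus"}


def sentiment_score(text: str) -> int:
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


review = (
    "I had a fantastic time on holiday at your resort. "
    "The service was excellent and friendly. "
    "My family all really enjoyed themselves. "
    "The pool was closed, which kind of sucked though."
)
print(sentiment_score(review))  # 4 positive words - 1 negative word = 3
```

A score like this ignores negation and context ("not excellent" still counts as positive), which is one reason a purely dictionary-based approach falls short.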
Sentiment Analysis with Neural Network
Indeed, it is a better choice than the dictionary-based approach.
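
To show the difference from a fixed word list, here is a sketch of the smallest possible neural network: a single sigmoid neuron (equivalent to logistic regression) that learns word weights from labeled reviews. It is not the model used in the project; the training sentences, vocabulary handling, and hyperparameters are all assumptions, and a realistic model would have hidden layers and far more data.

```python
import math

# Tiny made-up training set: (review text, label), 1 = positive, 0 = negative.
TRAIN = [
    ("excellent friendly service", 1),
    ("fantastic room enjoyed stay", 1),
    ("awful rubbish service", 0),
    ("terrible room sucked", 0),
]
VOCAB = sorted({word for text, _ in TRAIN for word in text.split()})


def featurize(text):
    """Bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1.0 if word in words else 0.0 for word in VOCAB]


def predict(weights, bias, features):
    """Sigmoid output of a single neuron: probability the review is positive."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(epochs=200, learning_rate=0.5):
    """Fit weights with per-example gradient descent on the log loss."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for text, label in TRAIN:
            features = featurize(text)
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train()
for text in ("friendly and excellent", "awful and terrible"):
    print(text, "->", round(predict(weights, bias, featurize(text)), 2))
```

Unlike the dictionary approach, the weights are learned: words such as "service" and "room" that appear in both positive and negative reviews end up with little influence.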
Dashboard – Sentiment Analysis of Dubai vs UK

• It is sentiment analysis

• It gives a positive or negative score

• We evaluate the scores of the aspects

• Location, service, value, room

• We conclude that UK luxury customers focus more on location and service, whereas customers in Dubai focus on value and room. This is a vital fact for the Gulf to reconsider.
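
Scoring sentiment per aspect, rather than per review, can be sketched as follows. The four aspects come from the slide; the word lists, the sentence-level matching rule, and the sample reviews are assumptions for illustration and are not project data or results.

```python
import re
from collections import defaultdict

ASPECTS = ("location", "service", "value", "room")
POSITIVE = {"excellent", "friendly", "fantastic", "good", "great"}
NEGATIVE = {"awful", "terrible", "rubbish", "poor", "bad"}


def aspect_scores(reviews):
    """Sum (positive - negative) word counts over sentences naming each aspect."""
    scores = defaultdict(int)
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = re.findall(r"[a-z]+", sentence)
            polarity = (sum(w in POSITIVE for w in words)
                        - sum(w in NEGATIVE for w in words))
            for aspect in ASPECTS:
                if aspect in words:
                    scores[aspect] += polarity
    return dict(scores)


# Made-up reviews for illustration only; they are not project data.
reviews = [
    "The location was excellent. The room was poor.",
    "Friendly service and great value. Terrible room!",
]
print(aspect_scores(reviews))
```

Running the same function on two groups of reviews (for example, one per destination) would give per-aspect scores that can be compared side by side on a dashboard.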
Aspect term analysis: accuracy 95.81%

• It is a k-means analysis

• We created clusters (groups) for the aspects location, service, and value

• We concluded that our top terms are good candidates for sentiment analysis; otherwise, we would go back and find other terms
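
For readers unfamiliar with k-means, the basic loop can be sketched in plain Python. This is a generic illustration of the algorithm, not the project's analysis, and it does not reproduce the reported accuracy: the 2-D points (imagined as feature vectors for terms) and the starting centroids are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on 2-D points with given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Terms that land in the same cluster are treated as belonging to the same aspect group; if the groups do not separate cleanly, that is the signal to go back and pick other terms.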

Hadoop processing ML Model (Big Data)

• Configure Hadoop version 2.8.3

• Start the Hadoop services (HDFS)

• Run the k-means analysis in MapReduce to process the data efficiently
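
How k-means fits the MapReduce model can be sketched by writing one iteration as a map step and a reduce step. The code below runs locally in plain Python and does not use any Hadoop API; the function names and sample points are assumptions. On a cluster, the mapper and reducer would run as separate tasks (for example via Hadoop Streaming), with the framework doing the grouping by key.

```python
from collections import defaultdict


def map_point(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1))."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances)), (point, 1)


def reduce_cluster(values):
    """Reduce: average all points emitted for one centroid index."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points, centroids):
    """One k-means pass written as map, shuffle (group by key), and reduce."""
    grouped = defaultdict(list)
    for point in points:                    # map phase
        key, value = map_point(point, centroids)
        grouped[key].append(value)          # shuffle: group values by key
    return {key: reduce_cluster(vals) for key, vals in grouped.items()}


# Hypothetical 2-D points and starting centroids.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
new_centroids = kmeans_iteration(points, centroids=[(0, 0), (10, 10)])
print(new_centroids)
```

Each mapper only needs the current centroids and its own share of the points, which is what makes the assignment step easy to spread across nodes; the pass is repeated until the centroids stop moving.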
Other ML Model and IOT Integration

• We plan to integrate with IoT.

• Next, we will explore the room's light, sound, lock, and music player services, because our analysis shows that people in the Gulf prefer room ambiance.

• IoT devices in summer, together with the dashboard, will reflect the usefulness and effectiveness of big data in tourism, with more than 50 recommendations in our final report.


• We have 2 months left to explore more

• 50-plus recommendations

• Visualizing reports

• Finally, more data on the cloud

• Convince the Ministry of Tourism of the effectiveness of our project in the domain of IoT + big data

• Google Cloud will show the future plan

• And the large-scale storage benefits for the Ministry's hotel data from around the world

Project timeline: months of academic year 2017-2018 (Sept 2017 to June 2018), in three phases: September to December, January to April, May to June

Tasks: Literature review; Data Collection & Scraping; Data Preparation; Data Insight; Data Exploration; Data Model; Software Building; Reports
Thank you! Question start what I will do.. Not what you
Works Cited

• https://data.gov.uk/dataset/road-accidents-safety-data

• https://www.thebalance.com/what-is-crowdsourcing-marketing-and-how-is-it-

• http://www.shu.edu/technology/

• http://archive.ics.uci.edu/ml/datasets.html?sort=nameUp&view=list

• http://www.dw.com/en/big-data-reveals-shakespeare-co-authored-17-of-his-