
BIG + IOT FOR IDEAS

A Case Study Approach


Sharjeel Imtiaz | PhD Data Science – last stage | University of East London, UK
Big Data Definition

• Any piece of information can be considered data.

• Data comes in various forms and sizes, from small data to very big data.

• Data that fits in RAM (memory) is considered small data. Small data is less than tens of GB.

• Data that fits on a hard disk is considered medium data. Medium data ranges from tens to thousands of GB.

• Data that cannot fit on a hard disk or in a single system is considered Big Data. Its size is more than thousands of GB.
Trends?
How to process Big Data

To process and manage such a huge volume of data, different Big Data technologies come into the picture. This is a new data challenge that requires leveraging existing systems differently, because some time ago the type and volume of data were not what they are today.
Big data reveals Shakespeare co-authored 17 of his
plays (An English Literature Domain - NLP)

"We counted how often particular words and phrases appeared in texts by Shakespeare and other authors of his day. These patterns were pretty unmistakable," researcher Gabriel Egan of De Montfort University in Leicester told news agency dpa.

• He said that it remains unclear exactly how the authors worked together. It could have been that Marlowe wrote the texts and Shakespeare later edited them.

Marlowe wrote the texts and Shakespeare later edited them. (Maybe)
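
The word-counting idea in the quote can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' method: the vocabulary, the three text snippets, and the helper names `word_profile` and `profile_distance` are all hypothetical, and a real study would use whole plays and far larger word lists.

```python
import re
from collections import Counter

def word_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: counts[w] / total for w in vocabulary}

def profile_distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(p[w] - q[w]) for w in p)

# Hypothetical toy inputs, invented for illustration only.
vocabulary = ["the", "and", "of", "to"]
author_a = "the king and the queen of the realm went to the sea"
author_b = "to be and to do and to see and to go"
disputed = "the lord of the isle and the men of the north"

disputed_profile = word_profile(disputed, vocabulary)
d_a = profile_distance(disputed_profile, word_profile(author_a, vocabulary))
d_b = profile_distance(disputed_profile, word_profile(author_b, vocabulary))

# The disputed text is attributed to whichever author's profile is closer.
closest = "author_a" if d_a < d_b else "author_b"
print(closest, round(d_a, 3), round(d_b, 3))
```

The smaller distance marks the author whose word habits the disputed text resembles more. Common function words are a usual choice for the vocabulary because they depend less on the topic of the text.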
Sentiment Analysis with Topic Modeling –
(A COMPUTER SCIENCE DOMAIN)

• Each domain has its own aspects. Hotels, for example, have aspects such as service, cleanliness, room, location, and 7 more.

• A product on Amazon has many product features, such as price.

• A stock market product has many features and aspects, based on Twitter.

• Disaster alert data from Twitter also has aspects.

Sentiment analysis of aspects in reviews is a challenge and a trend.
IOT WITH BIG DATA INFRASTRUCTURE

• The ideal architecture connects devices to data analytics.

• Security: IoT-based devices that monitor all types of network security attacks.

• IoT devices that monitor network traffic and avoid congestion with automatic monitoring and an avoidance mechanism.

• Disaster monitoring and alert mechanism (for big data analytics).
Engineering Good Topics (Good for Oman)

The solar panel produces current when photons from the sun hit the silicon atoms.

Or, in layman's terms: SUNLIGHT FALLS ON THE SOLAR PANEL AND CURRENT is produced.

Example: for a 100 W panel, Voc = 20 V or 21 V and Isc = 5 A, thus P = 100 W (approx.). How can it be managed efficiently?
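
The arithmetic in the example can be checked with a short sketch. The product Voc × Isc is only an upper bound, because a panel cannot deliver its open-circuit voltage and its short-circuit current at the same time. The usable maximum is lower by the fill factor. The function name and the fill factor of 0.75 below are assumptions for illustration, not values from the slides.

```python
def panel_power_estimate(v_oc, i_sc, fill_factor=1.0):
    """Estimate panel power in watts from open-circuit voltage and short-circuit current.

    fill_factor=1.0 gives the Voc * Isc upper bound used on the slide.
    """
    return v_oc * i_sc * fill_factor

# Values from the slide: Voc = 20 V, Isc = 5 A.
upper_bound = panel_power_estimate(20.0, 5.0)

# Assumed fill factor of 0.75, purely illustrative.
with_fill_factor = panel_power_estimate(20.0, 5.0, fill_factor=0.75)

print(upper_bound, with_fill_factor)
```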
Why Cloud

• The cloud processes big data volumes on parallel nodes and provides a framework and a Machine Learning library.

• It is not possible to process data with a Machine Learning model such as a neural network on a single machine within a few seconds or a minute.

More features, more processing.
IOT TO BIG Data to Analytics

• Integrating IoT with big data is not easy; it requires domain knowledge.

• Amazon product data and category

• Tweets data

• Social media data

Tools: SPSS is not for big data, and WEKA and RMiner are not so good.
R + Python + RStudio + cloud-based ML
What is your network architecture?
Architecture
IoT devices for network security, health, or tourism with Big Data
BIG DATA WITH IOT IS THE TREND
OUR IOT RECOMMENDATION SYSTEM FOR
OMAN TOURISM (IOT + BIG DATA)
IOT Recommendation System for Oman Tourism
A case study in the process of data science and computation

Step 1&2&3: Data Collection + Preparation + Storage

[Pipeline diagram: Data Collection, Data Preparation, Storage & Structure on Hadoop (HDFS), Data Exploration, Sentiment Analysis, Model Analysis with Hadoop]
TRC-Funded Project: TripAdvisor Website Scraping Tool

• TripAdvisor is the best-known customer review website for hotels and restaurants in Spain, although Yelp.com is more popular on a global scale.
• TripAdvisor users can write reviews and post scores from 1 (“terrible”) to 5 (“excellent”) following a number of criteria.
• WebHarvy
WebHarvy

• It scrapes the reviews, but the data then requires cleaning and preparation.

• The data is stored on local disk and on Google Cloud.
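
The cleaning step mentioned above might look like the following minimal sketch. It assumes each scraped record is a dictionary with hypothetical `text` and `score` fields. The actual WebHarvy export format is not described in the slides.

```python
import html
import re

def clean_reviews(rows):
    """Drop malformed or duplicate rows and normalise the review text."""
    seen = set()
    cleaned = []
    for row in rows:
        text = html.unescape(row.get("text") or "")   # decode entities such as &amp;
        text = re.sub(r"<[^>]+>", " ", text)           # strip leftover HTML tags
        text = re.sub(r"\s+", " ", text).strip()       # collapse white space
        try:
            score = int(row.get("score"))
        except (TypeError, ValueError):
            continue                                   # score missing or not a number
        if not text or not 1 <= score <= 5:
            continue                                   # empty review or score off the 1-5 scale
        key = text.lower()
        if key in seen:
            continue                                   # duplicate review
        seen.add(key)
        cleaned.append({"text": text, "score": score})
    return cleaned

# Hypothetical scraped rows, invented for illustration.
raw = [
    {"text": "Great pool &amp; friendly   staff<br>", "score": "5"},
    {"text": "great pool & friendly staff", "score": "5"},
    {"text": "", "score": "3"},
    {"text": "Room was dirty", "score": "9"},
    {"text": "Clean room, good location", "score": "4"},
]
reviews = clean_reviews(raw)
print(reviews)
```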


• Step 4: Data Exploration Analysis
Explore the Most Important Terms

It is topic modeling:

• Which terms occur most frequently in the reviews?

• We finalized the following words/terms:

• Room, good, downtown, stay, service, clean, pool, and more

• Remember that we applied stemming, punctuation removal, white space removal, and many more preprocessing steps before that.

Room, location, value, service
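
The preprocessing and term counting described above can be sketched as follows. The stop-word list, the sample reviews, and the crude `naive_stem` helper are hypothetical; `naive_stem` is only a stand-in for a real stemmer such as Porter's, which the project may or may not have used.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word list.
STOPWORDS = {"the", "a", "and", "was", "is", "in", "to", "of", "we", "very"}

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def top_terms(reviews, n=2):
    """Return the n most frequent stemmed terms across all reviews."""
    counts = Counter()
    for review in reviews:
        text = re.sub(r"[^a-z\s]", " ", review.lower())  # drop punctuation and digits
        for token in text.split():                        # split() also trims white space
            if token not in STOPWORDS:
                counts[naive_stem(token)] += 1
    return counts.most_common(n)

# Hypothetical sample reviews.
sample = [
    "The room was clean, and the service was good.",
    "Good rooms!  Stayed downtown; the pool is good.",
    "Clean room, good service, very good stay.",
]
terms = top_terms(sample)
print(terms)
```

Stemming is what lets "rooms" and "room" count as the same term, so the order of the steps matters: clean first, then stem, then count.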
• Step 5: Sentiment analysis
IS THIS ENOUGH? The answer is NO

• Take a list of positive and negative words

Positive     Negative
Good         Bad
Great        Worse
Fantastic    Rubbish
Excellent    Sucked
Friendly     Awful
Awesome      Terrible
Enjoyed      Bogus

Example review: "I had a fantastic time on holiday at your resort. The service was excellent and friendly. My family all really enjoyed themselves. The pool was closed, which kind of sucked though."

4 - 1 = 3
Overall sentiment: Positive
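
The dictionary-based scoring above can be reproduced with a minimal sketch. The two word sets and the review are taken from the slide. The function name `lexicon_score` and the "Neutral" label for a zero score are assumptions.

```python
import re

POSITIVE = {"good", "great", "fantastic", "excellent", "friendly", "awesome", "enjoyed"}
NEGATIVE = {"bad", "worse", "rubbish", "sucked", "awful", "terrible", "bogus"}

def lexicon_score(text):
    """Count positive and negative words and return (pos, neg, pos - neg)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg, pos - neg

review = (
    "I had a fantastic time on holiday at your resort. "
    "The service was excellent and friendly. "
    "My family all really enjoyed themselves. "
    "The pool was closed, which kind of sucked though."
)
pos, neg, score = lexicon_score(review)
label = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
print(pos, neg, score, label)
```

A counter like this ignores context: it would score "not good" as positive, which is one reason a plain word list is not enough on its own.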
Sentiment Analysis with Neural Network
Indeed, it is a better choice than the dictionary-based approach!
Dashboard – Sentiment Analysis of Dubai vs UK
hotels

• This is sentiment analysis,

• which gives a positive or negative score.

• We evaluate the score of each aspect:

• Location, service, value, room

• We conclude that UK luxury customers focus more on location and service, whereas Dubai customers focus on value and room. This is a vital fact for the Gulf to reconsider.
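
One simple way to score aspects is to credit each sentence's sentiment to every aspect word it mentions. The sketch below assumes that approach. The small lexicon and the sample reviews are hypothetical, and the slides do not say how the dashboard scores were actually computed.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon; a real one would be much larger.
POSITIVE = {"good", "great", "excellent", "friendly", "clean"}
NEGATIVE = {"bad", "awful", "terrible", "dirty", "noisy"}
ASPECTS = ("location", "service", "value", "room")

def aspect_scores(reviews):
    """Sum sentence-level sentiment for every aspect named in that sentence."""
    totals = defaultdict(int)
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            tokens = re.findall(r"[a-z]+", sentence)
            score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
            for aspect in ASPECTS:
                if aspect in tokens:
                    totals[aspect] += score
    return dict(totals)

# Hypothetical sample reviews.
reviews = [
    "The location was great. The room was dirty and noisy.",
    "Friendly service! Good value.",
]
scores = aspect_scores(reviews)
print(scores)
```

Computing such totals separately for each market (for example, one list of UK reviews and one of Dubai reviews) would give the per-aspect comparison shown on a dashboard.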
Aspect term analysis: accuracy 95.81%

• This is K-Means analysis.

• We created clusters or groups for the aspects location, service, value.

• We concluded that our top terms are good candidates for sentiment analysis; otherwise, go back and find other terms.
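
For readers new to K-Means, here is a minimal pure-Python sketch of the algorithm. The 2-D points are hypothetical stand-ins for term feature vectors, and the deterministic start (the first k points as centroids) is a simplification. The project's actual features and settings are not given in the slides.

```python
def kmeans(points, k, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then re-average."""
    centroids = [points[i] for i in range(k)]  # simplification: first k points as start
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
print(centroids)
```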


Hadoop processing ML Model (Big Data)

• Configure Hadoop version 2.8.3.

• Then start the Hadoop services (HDFS).

• We run k-means in MapReduce to process the data efficiently.
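
One k-means iteration maps naturally onto MapReduce: the mapper emits a (cluster id, point) pair for each record, and the reducer averages the points for each cluster id. The sketch below simulates both phases in plain Python. It is not Hadoop code, and the function names and sample data are hypothetical.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )

def map_phase(points, centroids):
    # Mapper: emit (cluster_id, point) for every input record.
    for point in points:
        yield nearest(point, centroids), point

def reduce_phase(pairs):
    # Shuffle: group the emitted pairs by cluster id.
    groups = defaultdict(list)
    for cluster_id, point in pairs:
        groups[cluster_id].append(point)
    # Reducer: the new centroid is the mean of the points in each group.
    return {
        cid: tuple(sum(dim) / len(pts) for dim in zip(*pts))
        for cid, pts in groups.items()
    }

# Hypothetical data and starting centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = reduce_phase(map_phase(points, centroids))
print(new_centroids)
```

On a cluster, each mapper would see only its own block of the data, which is why the job scales: only the small set of centroids has to be shared between iterations.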
Other ML Model and IOT Integration

• We plan to integrate with IoT.

• Next, we will explore the light, sound, lock, and music player services of the room, because people in the Gulf prefer room ambiance, as our analysis showed.

• IoT devices in summer, together with a dashboard, will reflect the usefulness and effectiveness of big data in tourism, with more than 50 recommendations in our final report.


Plan

• We have 2 months left to explore more

• 50-plus recommendations

• Visualizing reports

• Finally more data on cloud to

• Convince the Ministry of Tourism of the effectiveness of our project in the domain of IOT + BIG.

• Google Cloud will show the future plan

• And large-scale storage benefits for Ministry hotels data around the world.

[Gantt chart: months of academic year 2017-2018, grouped as September to December, January to April, and May to June (columns Sept 2017 through June 2018). Activities: Literature review, Data Collection & Scraping, Data Preparation, Data Insight, Data Exploration, Data Model, Software Building, Reports.]
Thank you! Questions: start with what I will do, not what you did.
Works Cited

• https://data.gov.uk/dataset/road-accidents-safety-data

• https://www.thebalance.com/what-is-crowdsourcing-marketing-and-how-is-it-used-2295467

• http://www.shu.edu/technology/

• http://archive.ics.uci.edu/ml/datasets.html?sort=nameUp&view=list

• http://www.dw.com/en/big-data-reveals-shakespeare-co-authored-17-of-his-plays/a-36145979
