
UNIT 5

APPLICATIONS

1. APPLICATIONS OF DATA SCIENCE
2. TECHNOLOGIES FOR VISUALIZATION
3. BOKEH (PYTHON) RECENT TRENDS IN VARIOUS DATA COLLECTION AND
ANALYSIS TECHNIQUES
4. VARIOUS VISUALIZATION TECHNIQUES
5. APPLICATION DEVELOPMENT METHODS USED IN DATA SCIENCE
1. APPLICATIONS OF DATA SCIENCE

Refer to the Unit 1 notes.

2. TECHNOLOGIES FOR VISUALIZATION

Data Visualization Technology

Visual analysis tools compress and store data in memory, providing sub-second response times for
any action taken against the data (such as filtering, drilling, calculating, sorting, and ranking).
Visually, analysts point and click to interact with charts, apply filters, and change views. For
instance, analysts can use their mouse to “lasso” data points in a certain section of a scatter plot to
create a new group and automatically filter other charts on the page. (See Figure 3.)

Compared to OLAP tools, visual analysis tools don’t require an IT person to design a dimensional
data model. The tools use a “load-and-go” approach in which analysts load raw data from multiple
sources and simply link tables along common keys to get a unified view of the data set. As a result,
most visual analysis tools can be deployed in a few hours or a few days or weeks, depending on
the number of data sources and their complexity and cleanliness.
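
The "load-and-go" idea of linking tables along a common key can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular visual analysis tool works internally; the function name link_tables, the sample tables and the customer_id key are all assumptions made for the example.

```python
from collections import defaultdict


def link_tables(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    # Index the right-hand table by key so each lookup is cheap.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        # Rows without a match in the right-hand table are dropped.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical raw data loaded from two different sources.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 250},
    {"order_id": 2, "customer_id": "C2", "amount": 90},
    {"order_id": 3, "customer_id": "C9", "amount": 40},
]
customers = [
    {"customer_id": "C1", "region": "North"},
    {"customer_id": "C2", "region": "South"},
]

unified = link_tables(orders, customers, "customer_id")
for row in unified:
    print(row)
```

The unified rows carry columns from both sources, which is the kind of single view an analyst would then filter and chart.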

Analysts or developers often use visual discovery tools to create and publish interactive,
departmental dashboards for casual users. They often create the dashboards on desktop machines
and then publish them to a departmental server for general consumption. When doing so, the
developers generally strip out some analytical functionality and options that might overwhelm
casual users.

Two Environments. It should be clear that visual reporting and visual analysis tools serve two
different audiences and purposes. Visual reporting tools are designed to visualize
performance against predefined metrics for executives and managers. Visual analysis tools, by
contrast, empower business analysts to explore trends and anomalies in data sets they create,
and to publish views for others to consume.

Visualization Technology

Both types of visualization solutions leverage emerging technology to enhance the visual
experience of BI users. Here are key technologies driving the adoption of visualization in corporate
environments.

• 64-bit systems and multi-core servers. Charting engines consume a lot of CPU cycles,
especially if the charts are interactive. Rendering charts, especially in server-based
environments, takes a lot of horsepower. Today’s 64-bit platforms and multi-core
processors speed visual processing to give users more dynamic and interactive visual
environments in which to view data.
• RAM and compression. Many visualization tools work with in-memory data to ensure
speed-of-thought interactivity. With prices for RAM dropping, it’s easier for power users
to analyze large data sets (up to 50 million records) held in memory. New compression
techniques increase the amount of data that can be held in memory, but be cautious of
decompression performance penalties.
• Java applets/ActiveX controls. These mini-applications run inside a Web browser and
execute within a virtual machine or sandbox. Actions execute as fast as compiled code,
making them an easy way to recreate full-featured applications on the Web. However, they
raise security concerns, so many IT administrators prevent users from downloading such
controls through corporate firewalls, which limits their pervasiveness.
• DHTML and AJAX. A lighter-weight approach is to embed a scripting language, such as
JavaScript, inside HTML pages to execute functions in the browser. Dynamic HTML
(DHTML) uses scripting to animate a downloaded HTML page. For example, DHTML is
often used to animate drop-down boxes, radio buttons, mouseovers, and tickers, as well as
to capture user inputs via forms. AJAX (asynchronous JavaScript and XML) takes this one
step further and retrieves new content from the server in the background without interfering
with the display and behavior of the page. Basically, AJAX lets a dashboard load new data
without reloading the entire page. It can also be used to pre-fetch
data, such as the next page of results.
• Flash. Another popular approach is to use multimedia development platforms, such as
Adobe Flash, Java applets, Microsoft Silverlight, and Scalable Vector Graphics
(SVG), which add animation and movies to Web pages. Compared to JavaScript, these
platforms provide stunning graphics and animation for displaying quantitative information,
which makes the user interfaces very appealing to business users. They load both
visualizations and data simultaneously in a single file rather than dishing up dozens or
hundreds of pages. Although this makes the initial load slower than a comparable DHTML
or AJAX application, performance thereafter is exceptionally fast, since the data required
to display all components on a page resides locally.
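
The trade-off noted under "RAM and compression" can be made concrete with a small sketch. It uses Python's standard zlib module purely as a stand-in, since the text does not say which compression techniques the visualization tools use, and the repetitive dummy column is an assumption chosen so that it compresses well.

```python
import time
import zlib

# Dummy column of repetitive values, standing in for a real data set.
rows = ["north", "south", "east", "west"] * 25_000
raw = "\n".join(rows).encode("utf-8")

# Compress the column so that more of it fits in memory.
packed = zlib.compress(raw, 6)

# Reading the data back requires decompression, which costs CPU time.
start = time.perf_counter()
restored = zlib.decompress(packed)
elapsed = time.perf_counter() - start

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")
print(f"decompression took {elapsed:.6f} s")
```

The compressed copy is much smaller, but every query that touches the column pays the decompression time again unless the tool caches the restored data.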

Vendor Advancements. BI vendors have been scrambling to meet increasing demand for
visualization. For instance, Oracle’s release of Oracle Business Intelligence Enterprise Edition
(OBIEE) 11g in mid-2010 addressed visualization weaknesses in earlier releases, Oracle officials
said. Vendors such as MicroStrategy, ADVIZOR Solutions, and Tableau Software have recently
emphasized new in-memory capacity for greater scalability. SAS (with its JMP visualization
software) and DSPanel are among vendors incorporating the open-source R statistical
programming language to mix visualization and data mining.

Corda and Dundas, which both provide charting components and dashboard tools, have expanded
their tool sets to give developers greater flexibility. Microsoft is aiming to elevate Excel’s profile
for BI visualization with the 2010 release of PowerPivot, an add-on that helps Excel accommodate
large-scale data and extends its visualization capabilities, Microsoft officials said. Similarly,
PowerPivot can leverage new visualization capabilities available through SharePoint 2010
integration with Visio, they said.

Many of these innovations are aimed at untethering business users from a reliance on IT so they
can analyze data in a visual environment. “It’s an evolutionary thing,” said Doug Cogswell,
president and CEO of ADVIZOR Solutions. “We’re used to using BI to view reports or KPIs, and
now people want to move beyond reporting to visual analysis.”
3. BOKEH (PYTHON) RECENT TRENDS IN VARIOUS DATA COLLECTION AND
ANALYSIS TECHNIQUES

What is Bokeh?

Bokeh is a Python library for interactive visualization that targets web browsers for presentation.
This is the core difference between Bokeh and other visualization libraries. The figure below
shows the process flow by which Bokeh presents data in a web browser.

[Figure: Bokeh process flow. Source: Continuum Analytics]

Bokeh has multiple language bindings (Python, R, Lua and Julia). These bindings
produce a JSON file, which serves as input for BokehJS (a JavaScript library), which in turn
renders the data in modern web browsers.
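
The hand-off from Python to BokehJS can be observed directly. The following is a minimal sketch, assuming a reasonably recent Bokeh release in which bokeh.embed.json_item is available; the plot contents and the element id "myplot" are hypothetical.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# A tiny plot built on the Python side.
p = figure(title="JSON hand-off example")
p.line([1, 2, 3], [4, 6, 5])

# json_item() returns a plain dict describing the plot; "myplot" is a
# hypothetical id of the HTML element that BokehJS would render into.
item = json_item(p, "myplot")
payload = json.dumps(item)

print(sorted(item.keys()))
print(len(payload), "characters of JSON")
```

On a web page, BokehJS would receive this payload and draw the plot into the element with the matching id, so no JavaScript has to be written by hand.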

Bokeh can produce elegant, interactive visualizations in the style of D3.js, with high-performance
interactivity over very large or streaming datasets. Bokeh can help anyone who would like to
quickly and easily create interactive plots, dashboards, and data applications.

What does Bokeh offer to a data scientist like me?

I started my data science journey as a BI professional and then worked my way through predictive
modeling, data science and machine learning. I have primarily relied on tools like QlikView and
Tableau for data visualization and on SAS and Python for predictive analytics and data science. I had
near-zero experience of using JavaScript.

So, for all my data products or ideas, I had to either outsource the work or pitch my ideas
through wire-frames, neither of which is ideal for building quick prototypes. Now, with Bokeh,
I can continue to work in the Python ecosystem and still create these prototypes quickly.

Benefits of Bokeh:

• Bokeh allows you to build complex statistical plots quickly and through simple
commands.
• Bokeh can output to various media, such as HTML files, notebooks and a server.
• Bokeh visualizations can be embedded in Flask and Django apps.
• Bokeh can transform visualizations written in other libraries, such as matplotlib, seaborn
and ggplot.
• Bokeh offers flexibility in applying interactions, layouts and different styling options to
visualizations.

Challenges with Bokeh:

• Like any upcoming open source library, Bokeh is undergoing a lot of development,
so the code you write today may not be entirely reusable in the future.
• It has relatively fewer visualization options than D3.js. Hence, it is unlikely
to challenge D3.js for its crown in the near future.

Given the benefits and the challenges, Bokeh is currently ideal for rapidly developing prototypes. However,
if you want to create something for a production environment, D3.js might still be your best bet.

To install Bokeh, follow the official installation instructions (typically pip install bokeh or
conda install bokeh).

Visualization with Bokeh


Bokeh offers powerful and flexible features that combine simplicity with highly advanced
customization. It provides multiple visualization interfaces to the user, as shown below:

• Charts: a high-level interface that is used to build complex statistical plots quickly
and simply.
• Plotting: an intermediate-level interface that is centered around composing visual glyphs.
• Models: a low-level interface that provides the maximum flexibility to application
developers.

In this article, we will look at the first two interfaces, Charts and Plotting, only. We will discuss Models
and other advanced features of this library in the next post.

Charts

As mentioned above, Charts is a high-level interface used to present information in standard
visualization forms. These forms include box plot, bar chart, area plot, heat map, donut chart and
many others. You can generate these plots just by passing data frames, numpy arrays and
dictionaries.

Let’s look at the common methodology to create a chart:

1. Import the library and functions/methods
2. Prepare the data
3. Set the output mode (Notebook, Web Browser or Server)
4. Create the chart with styling options (if required)
5. Visualize the chart

To understand these steps better, let me demonstrate them using the example below.

Charts Example-1: Create a bar chart and visualize it on web browser using Bokeh

We will follow above listed steps to create a chart:

# Import the library
from bokeh.charts import Bar, output_file, show  # use output_notebook to visualize it in a notebook

# Prepare data (dummy data)
data = {"y": [1, 2, 3, 4, 5]}

# Write the output to bars.html
output_file("bars.html", title="bar chart example")  # call output_notebook() instead for a notebook

# Create a new bar chart with a title and axis labels
p = Bar(data, title="Bar Chart Example", xlabel='x', ylabel='values', width=400, height=400)

# Show the results
show(p)

In the rendered chart, you can see the tools at the top (zoom, resize, reset, wheel zoom); these
tools allow you to interact with the chart. You can also look at the multiple chart options (legend,
xlabel, ylabel, xgrid, width, height and many others) and various chart examples in the Bokeh documentation.
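
The bokeh.charts interface used in this example was later deprecated and split out of the main Bokeh package, so the import may fail on a recent installation. As a hedged sketch, and not the original author's code, roughly the same bar chart can be built with the intermediate-level bokeh.plotting interface. The function name make_bar_chart and the category labels "0", "1", ... are assumptions made for illustration.

```python
from bokeh.plotting import figure, output_file, show


def make_bar_chart(values):
    """Build a bar chart of `values` with the bokeh.plotting interface."""
    # Hypothetical category labels "0", "1", ... stand in for the x axis.
    categories = [str(i) for i in range(len(values))]
    p = figure(
        x_range=categories,
        title="Bar Chart Example",
        x_axis_label="x",
        y_axis_label="values",
    )
    # One vertical bar per category; `top` is the bar height.
    p.vbar(x=categories, top=values, width=0.8)
    return p


chart = make_bar_chart([1, 2, 3, 4, 5])
# To view it in a browser, uncomment the next two lines:
# output_file("bars.html", title="bar chart example")
# show(chart)
```

The five steps listed earlier still apply: import, prepare data, set the output mode, create the chart and show it. Only the chart-building call changes.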

Chart Example-2: Compare the distribution of sepal length and petal length of IRIS data set
using Box plot on notebook

To create this visualization, I’ll first import the iris data set using the sklearn library. Then I’ll follow
the steps discussed above to visualize the chart in an IPython notebook.

# IRIS data set
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data)

# Column names follow the order of iris.feature_names
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Import the library
from bokeh.charts import BoxPlot, output_notebook, show

data = df[['petal_length', 'sepal_length']]

# Output to the notebook
output_notebook()

# Create a box plot of the two columns
p = BoxPlot(data, width=400, height=400)

# Show the results
show(p)
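
To read the resulting box plot, it helps to know which numbers each box summarizes. The sketch below computes them with the standard library only. It assumes Tukey's 1.5 × IQR rule for whiskers and outliers and the "inclusive" quartile method; Bokeh's own BoxPlot may use slightly different conventions. The function name box_plot_summary and the dummy values are illustrative.

```python
import statistics


def box_plot_summary(values):
    """Return the five numbers a typical box plot draws, plus outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey's rule (an assumed convention): points beyond 1.5 * IQR are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Dummy lengths in cm (not the real iris measurements).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Comparing the medians and box heights of the two columns is what lets the plot show how the sepal length and petal length distributions differ.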
5. APPLICATION DEVELOPMENT METHODS USED IN DATA SCIENCE

Applications / Uses of Data Science


Using data science, companies have become intelligent enough to push and sell products
according to customers’ purchasing power and interests. Here’s how they are ruling our hearts and minds:

Internet Search
When we speak of search, we think ‘Google’. Right? But there are many other search engines,
such as Yahoo, Bing, Ask, AOL and DuckDuckGo. All these search engines (including Google)
use data science algorithms to deliver the best results for a search query in a
fraction of a second. Google alone processes more than 20 petabytes of
data every day. Had there been no data science, Google wouldn’t have been the ‘Google’ we
know today.

Digital Advertisements (Targeted Advertising and Re-targeting)
If you thought search was the biggest application of data science and machine
learning, here is a challenger: the entire digital marketing spectrum. From the display
banners on various websites to the digital billboards at airports, almost all of them are
decided using data science algorithms.

This is the reason why digital ads achieve a much higher click-through rate (CTR) than traditional
advertisements. They can be targeted based on a user’s past behaviour. This is also why
I see ads for analytics training while my friend sees ads for apparel in the same place at
the same time.

Recommender Systems
Who can forget the suggestions about similar products on Amazon? They not only help you
find relevant products from the billions available, but also add a lot to the
user experience.

A lot of companies have eagerly used this kind of engine to promote their products and
suggestions in accordance with the user’s interests and the relevance of information. Internet giants
like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDb and many more use such systems
to improve user experience. The recommendations are made based on a user’s previous search
results.

Image Recognition
You upload a photo with friends on Facebook and you start getting suggestions to tag
them. This automatic tag suggestion feature uses a face recognition algorithm. Similarly,
while using WhatsApp Web, you scan a QR code in your web browser using your mobile phone.
In addition, Google gives you the option to search for images by uploading them. It uses
image recognition to provide related search results.

Speech Recognition
Some of the best examples of speech recognition products are Google Voice, Siri and Cortana.
With speech recognition, even if you aren’t in a position to type a message, your
life doesn’t stop. Simply speak the message and it will be converted to text. At
times, however, you will notice that speech recognition doesn’t perform accurately.

Gaming
EA Sports, Zynga, Sony, Nintendo and Activision-Blizzard have taken the gaming experience to the
next level using data science. Games are now designed using machine learning algorithms
that improve or upgrade themselves as the player moves up to a higher level. In motion
gaming, too, your opponent (the computer) analyzes your previous moves and shapes up its game
accordingly.

Price Comparison Websites
At a basic level, these websites are driven by lots and lots of data, which is fetched
using APIs and RSS feeds. If you have ever used these websites, you will know the
convenience of comparing the price of a product from multiple vendors in one place.
PriceGrabber, PriceRunner, Junglee, Shopzilla and DealTime are some examples of price
comparison websites. Nowadays, price comparison websites can be found in almost every
domain, such as technology, hospitality, automobiles, durables and apparel.

Airline Route Planning
The airline industry across the world is known to bear heavy losses. Except for a few airline service
providers, companies are struggling to maintain their occupancy ratio and operating profits.
The sharp rise in air fuel prices and the need to offer heavy discounts to customers have
made the situation worse. It wasn’t long before airline companies started using data
science to identify strategic areas for improvement. Now, using data science, airline
companies can:

1. Predict flight delays
2. Decide which class of airplanes to buy
3. Decide whether to fly directly to the destination or make a halt in between (for example, a
flight from New Delhi to New York can take a direct route or, alternatively,
halt in another country)
4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data
science to bring changes to their way of working.

Fraud and Risk Detection
One of the first applications of data science originated in the finance discipline. Companies
were fed up with bad debts and losses every year. However, they had a lot of data that used
to be collected during the initial paperwork while sanctioning loans. They decided to bring in
data science practices to rescue themselves from losses. Over the years, banking
companies learned to divide and conquer data via customer profiling, past expenditures and
other essential variables to analyze the probabilities of risk and default. Moreover, it also
helped them to push their banking products based on customers’ purchasing power.

Delivery Logistics
Who says data science has limited applications? Logistics companies like DHL, FedEx, UPS and
Kuehne+Nagel have used data science to improve their operational efficiency. Using data
science, these companies have discovered the best routes to ship, the best time to
deliver and the best mode of transport to choose, among other things, leading to cost efficiency.
Furthermore, the data that these companies generate from installed GPS devices
gives them a lot of possibilities to explore using data science.

Miscellaneous
Apart from the applications mentioned above, data science is also used in marketing,
finance, human resources, health care, government policy and every other industry
where data gets generated. Using data science, the marketing departments of companies
decide which products are best for up-selling and cross-selling, based on customers’ behavioral
data. In addition, questions such as a customer’s likely wallet share, which customer is
likely to churn and which customer should be pitched a high-value product
can be answered with data science. In finance (credit risk, fraud) and human
resources (which employees are most likely to leave, employee performance, deciding
employee bonuses), many other tasks are likewise accomplished using data science.

Coming Up In Future
Not much has been revealed about these applications beyond prototypes, and I do not know
when they will be available to the general public. Hence, I’ve kept these amazing
applications of data science in the ‘Coming Up’ section. We need to wait and watch how successful Google
becomes with its self-driving cars project. Robots, as we know, have been around for
a while but aren’t being used as a commodity yet due to associated security issues. Let’s see
what the future holds for us!
