
Power BI for Data Science

Integration and exploration capabilities

JAVIER GUILLEN

CHARLOTTE BI GROUP


CHARLOTTE, NC - 2018
Power BI for Data Science exploration
Different mindset from traditional BI – going beyond slicing and dicing data

◦ In traditional BI, Power BI is a front end charting tool for pre-computed data

◦ In data science, Power BI can provide new insights on existing data, producing new knowledge that
becomes a source for reporting. To accomplish this, we use Big Data repositories and predictive model
integrations
Agenda
Provide overview of six key scenarios for Power BI integration with data science tools
Session is not a deep dive on math or coding concepts
Will focus mostly on data exploration efforts (in contrast to operationalization)
About me
Director, Data Engineering @ Syntelli Solutions
Adjunct Faculty – City University of New York, Data & Analytics program
SME Board Advisor – Central Piedmont Community College
Co-founder – Charlotte BI Group (Official Power BI Group)
Co-organizer – SQL Saturday
Out of the box Data Science Capabilities
in Power BI
✓Q&A and Q&A Explorer
✓Explain Increase / Decrease
✓Explore when distribution is different
✓New DAX functions:
✓ NORM.DIST, NORM.INV, NORM.S.DIST, NORM.S.INV, T.DIST, T.DIST.2T, T.DIST.RT, T.INV, T.INV.2T,
Correlation Coefficient (Quick measure)

✓Custom Visuals (Advanced Analytics)


✓Forecasting (Line Chart)
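As a rough illustration of what the statistical functions listed above compute, the following Python sketch reproduces the textbook definitions with the standard library. The helper names (norm_dist, norm_inv, pearson_correlation) are hypothetical stand-ins chosen to mirror the DAX names, and the correlation quick measure is assumed here to be the Pearson coefficient; this is not a description of how Power BI implements them.

```python
from math import sqrt
from statistics import NormalDist, mean


def norm_dist(x, mu, sigma, cumulative=True):
    """Normal CDF (cumulative=True) or PDF, analogous to DAX NORM.DIST."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def norm_inv(probability, mu, sigma):
    """Inverse of the normal CDF, analogous to DAX NORM.INV."""
    return NormalDist(mu, sigma).inv_cdf(probability)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

With mu = 0 and sigma = 1 these reduce to the standard-normal variants (NORM.S.DIST and NORM.S.INV).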
Scenario 1: R-based data explorations
What is R?
R is a language and environment for statistical computing and graphics. It is one of the most
popular languages for wrangling data and developing predictive models.

How can I use it in Power BI?


R can be used in three ways in Power BI: to load data, to implement data transformations, and
to visualize.

Does it work with the Power BI Service?


Yes, the Power BI service also comes with a wide variety of pre-installed R packages.
Loading data from R
To work with local R packages you can use your favorite R IDE.

To use a library in Power BI, simply call the library(<package>) function


For packages installed in the cloud service see:
https://docs.microsoft.com/en-us/power-bi/service-r-packages-support
Data wrangling with R
R Scripts can be used in the Query Editor:

library(dplyr)
# Average each sepal and petal measurement per species
iris_mean <- summarize(
  group_by(iris, Species),
  slength = mean(Sepal.Length), swidth = mean(Sepal.Width),
  plength = mean(Petal.Length), pwidth = mean(Petal.Width)
)
Visualizing in R
Make sure to add fields to the R visual:
Power BI automatically builds a data frame (named dataset) from them
K-means clustering, for example, can be easily integrated into your interactive analysis

[Figure panels: Original Species Definition | Predicted Species]
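To make the clustering step concrete, here is a minimal K-means sketch (Lloyd's algorithm) in plain Python. It is only an illustration under stated assumptions: the sample points, the fixed starting centroids, and the function name kmeans are hypothetical, and inside an R visual one would more likely call R's built-in kmeans function on the data frame Power BI supplies.

```python
from math import dist


def kmeans(points, centroids, max_iterations=100):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    move each centroid to the mean of its points, repeat until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the points assigned to each centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (petal length, petal width) measurements forming two groups.
points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
labels, centroids = kmeans(points, centroids=[(1.0, 0.0), (5.0, 2.0)])
```

The returned labels play the role of the "predicted species" column, which can then be compared against the original species field in the report.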


R visuals - limitations
Data used by an R visual is limited to 150,000 rows
Execution time cannot exceed 5 minutes (the script will time out)
An R visual cannot be the source of a cross-filtering interaction

See updates here:


https://docs.microsoft.com/en-us/power-bi/desktop-r-visuals#known-limitations
Scenario 2:
Using Hadoop for Clickstream Analysis
What is Hadoop?
• Hadoop is an open-source software framework for storing data and running applications on clusters of
commodity hardware.
What is Hive?
• Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data
query and analysis. It provides a framework for creating schemas on existing data.
What are Zeppelin notebooks?
• Apache Zeppelin is a multi-purpose, web-based authoring tool that brings data ingestion, data
exploration, visualization, sharing and collaboration features to Hadoop and Spark
How can we use Hadoop/Hive in Power BI for exploratory reporting?
• We can use cache-optimized LLAP for interactive reporting on data with minimal processing
Sankey visual for tracing web behavior in Power BI over a Hive table
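A Sankey visual typically needs rows of source, target, and weight. As a rough sketch of how clickstream events could be shaped into that form, the Python below counts page-to-page transitions per session. The event data, the field layout, and the function name count_transitions are hypothetical; in the scenario above the equivalent aggregation would live in a Hive table.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical clickstream events: (session_id, timestamp, page).
events = [
    ("s1", 1, "home"), ("s1", 2, "search"), ("s1", 3, "product"),
    ("s2", 1, "home"), ("s2", 2, "product"),
    ("s3", 1, "home"), ("s3", 2, "search"), ("s3", 3, "checkout"),
]


def count_transitions(events):
    """Return a Counter mapping (source_page, target_page) to visit counts."""
    transitions = Counter()
    ordered = sorted(events, key=itemgetter(0, 1))  # by session, then time
    for _, session_events in groupby(ordered, key=itemgetter(0)):
        pages = [page for _, _, page in session_events]
        # Pair each page with the next one visited in the same session.
        transitions.update(zip(pages, pages[1:]))
    return transitions


# One row per link in the Sankey diagram.
sankey_rows = [
    {"source": src, "target": dst, "weight": n}
    for (src, dst), n in count_transitions(events).items()
]
```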
Scenario 3: Add interactivity to experiments
on Apache Spark
What is Apache Spark?
Apache Spark is a unified analytics engine for big data processing, with built-in modules for
streaming, SQL, machine learning and graph processing.

How can Power BI benefit Spark based analysis?


Power BI can connect to Spark clusters and enhance experiments by providing drilling and
interactive analysis to dimensional data that has been augmented with predictive output
In a Jupyter notebook we can, at any point, call saveAsTable on a DataFrame's write interface to
expose it to Power BI.
You can also select only a few columns of interest for exploration:

predictionsDf[['name','prediction']].write.saveAsTable('predictions')
Scenario 4:
Interactive machine learning-based reporting
What is Azure Machine Learning?
Microsoft Azure Machine Learning Studio is a collaborative, drag-and-drop tool you can use to
build, test, and deploy predictive analytics solutions on your data.

How can we use it in Power BI?


Because Azure Machine Learning models can be exposed as web services, we can use R or Python
(combined with reporting slicers in Power BI) to call predictive models interactively.
For example, when exploring price elasticity we can use Power BI with AML to create a “what-if” tool
and interactively display predicted revenue at specified price points:
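A what-if call of this kind might look like the following Python sketch, which builds one scoring request per candidate price. Everything specific here is an assumption: the endpoint URL and API key are placeholders, the column names (product_id, price) are hypothetical, and the JSON layout follows the request-response format used by Machine Learning Studio web services, which should be checked against the API help page of the deployed service.

```python
import json
import urllib.request

# Placeholders: replace with the values shown for your own web service.
SCORING_URL = "https://example.invalid/score"  # hypothetical endpoint
API_KEY = "<your-api-key>"                     # hypothetical key


def build_scoring_request(product_id, price):
    """Build (but do not send) a request that scores one price point."""
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["product_id", "price"],  # assumed schema
                "Values": [[str(product_id), str(price)]],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        SCORING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


def predict_revenue(product_id, price):
    """Send the request and return the parsed JSON response."""
    request = build_scoring_request(product_id, price)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# One request per price point chosen with a report slicer (not sent here).
requests_to_send = [build_scoring_request("P-100", p) for p in (9.99, 12.99, 14.99)]
```

In a report, the price points would come from slicer selections passed into the script, and the parsed responses would be returned as a table for charting.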
Scenario 5: Power BI in IoT Scenarios
What is Internet of Things (IoT)?
• The Internet of Things (IoT) is the network of physical devices, vehicles, home appliances, and
other items embedded with electronics, software, sensors, actuators, and connectivity which
enables these things to connect and exchange data, creating opportunities for analytics. For
example: predictive maintenance.

How do we use Power BI for IoT?


• Power BI can implement streaming scenarios by either consuming real-time datasets, or by
leveraging streaming technologies like Azure Stream Analytics.
Three types of real-time datasets can be enabled in Power BI directly: push datasets, streaming datasets, and PubNub streaming datasets.
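For a dataset that accepts pushed rows, data is typically sent as a JSON array in an HTTP POST to the dataset's push URL. The sketch below assumes a hypothetical sensor schema (device_id, temperature, timestamp) and uses a placeholder URL; the real push URL is the one the Power BI service shows for the dataset.

```python
import json
import urllib.request
from datetime import datetime, timezone

PUSH_URL = "https://example.invalid/push-url-from-power-bi"  # placeholder


def build_rows_request(readings, push_url=PUSH_URL):
    """Build (but do not send) a POST carrying sensor readings as JSON rows."""
    rows = [
        {
            "device_id": device_id,
            "temperature": temperature,
            "timestamp": timestamp.isoformat(),
        }
        for device_id, temperature, timestamp in readings
    ]
    return urllib.request.Request(
        push_url,
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reading_time = datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc)
request = build_rows_request([("pump-01", 71.5, reading_time)])
# urllib.request.urlopen(request) would send the rows to Power BI.
```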
Power BI can be an endpoint for a Stream Analytics-based architecture:
Scenario 6:
Leveraging Cognitive Automation
What is cognitive automation?
• Cognitive automation refers to AI techniques that use robotic approaches to emulate
humans in specific business processes.

What can we do with it in Power BI?


• Azure provides pre-trained models able to handle a variety of AI tasks. For example, speech
analysis and image recognition can be leveraged by Power BI for on-the-fly analysis of call
center or factory data.
In a call center environment, Voice of Customer (VOC) projects can deliver automated speech-to-text
and key phrase analytics. This can provide near real-time identification of issues that can inform
operational and marketing efforts.
A final note - Power BI Templates

End-to-end solutions that utilize various Azure engineering and data science components to
create “plug & play” experiences.
https://tinyurl.com/y7yz56br
Thank you!
Javier Guillen
Javier.guillen@syntelli.com
Twitter: @javiguillen
