SAP HANA Vora Installation Admin Guide en

PUBLIC
SAP HANA Vora 1.2

Document Version: 1.1 2016-05-09
SAP HANA Vora Installation and Administration

Guide
Content
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1
SAP HANA Vora and Apache Hadoop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2
Related Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1
SAP HANA Vora Components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2
SAP HANA Vora Packages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3
Installation Prerequisites. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9
Hadoop Distributions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Cluster Provisioning Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Operating Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Supported Platforms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Cluster Sizing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11
Required Components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
DLog Server Requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4
Installation and Bootstrapping Guidelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5
Collect Hadoop Cluster Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6
Installing SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Install SAP HANA Vora Using Ambari. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Install SAP HANA Vora Using Cloudera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Installing SAP HANA Vora for MapR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.7
Validate the SAP HANA Vora Installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8
Install the SAP HANA Vora Zeppelin Interpreter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .35
2.9
Connect SAP HANA Spark Controller to SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.10
Connect SAP Lumira to SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.11
Updating SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Update SAP HANA Vora Using Ambari. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Update SAP HANA Vora Using Cloudera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Update SAP HANA Vora for MapR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.12
SAP HANA Vora Default Ports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Administration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1
Configure Proxy Settings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2
Enable Spark Auto-registration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3
Start and Stop SAP HANA Vora Services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51
3.4
Best Practices: Administration and Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
PUBLIC
2016 SAP SE or an SAP affiliate company. All rights reserved.
SAP HANA Vora Installation and Administration Guide

Content
HDFS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Choosing a Cluster Manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Example Cluster Configuration Including a Client Machine (Jump Box). . . . . . . . . . . . . . . . . . . . 55
4
Security. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1
Technical System Landscape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Content
PUBLIC
Introduction
SAP HANA Vora provides an in-memory processing engine that is integrated into the Hadoop ecosystem and
Spark execution framework. Able to scale to thousands of nodes, it is designed for use in large distributed
clusters and for handling big data.
Fast Query Execution

The SAP HANA Vora processing engine holds data in memory and boosts the execution performance of Spark.
Supporting just-in-time code compilation, it translates incoming SQL queries into machine-level code on the
fly using a LLVM compiler, enabling them to be executed quickly and efficiently.
Data Analytics
SAP HANA Vora makes available OLAP-style capabilities for data on Hadoop, in particular, a hierarchy
implementation that allows hierarchical data structures to be defined and complex computations performed
on different levels of data. Extensions to Spark SQL also include enhancements to the data source API to
enable Spark SQL queries or parts of the queries to be pushed down to the SAP HANA Vora processing engine.
SAP HANA Integration

Data processing between the SAP HANA and Hadoop environments allows data in SAP HANA to be combined
with big data stored in Hadoop systems and processed in Spark or SAP HANA applications.
1.1
SAP HANA Vora and Apache Hadoop
The SAP HANA Vora solution is built on the Hadoop ecosystem, an open-source project providing a collection
of components that support distributed processing of large data sets across a cluster of machines. Hadoop
PUBLIC

Introduction
allows both structured as well as complex, unstructured data to be stored, accessed, and analyzed across the
cluster.
The main components used in this environment are shown in the figure below:
Component
Description
Ambari
An open operational framework for provisioning,

Apache Ambari
managing and monitoring Apache Hadoop clusters.
Cloudera
Cloudera Manager - Cloudera's automated cluster

management tool.
Cloudera
HDFS
The Hadoop Distributed File System.
HDFS Users Guide
Zookeeper
A centralized service for maintaining configuration

information and naming, and for providing distrib
uted synchronization and group services.
Apache ZooKeeper
Yarn
Hadoops resource manager and job scheduler.
Apache Hadoop YARN
HBase
The Hadoop database.
Apache HBase
Pig
A high-level data-flow language and execution

framework for parallel computation.
Apache Pig
Spark SQL
A module for structured and semi-structured data

processing.
Spark SQL and DataFrame Guide
Apache Hive
A data warehouse infrastructure supporting data

summarization, query, and analysis.
Apache Hive
MLib
A machine learning tool that runs on Spark.
Machine Learning Library (MLlib) Guide

Introduction
More Information
PUBLIC
1.2
Related Information
In addition to this document, please refer to the following resources:

Resource
Details
Release note for SAP HANA Vora
http://service.sap.com/sap/support/notes/2284507
SAP Software Download Center
https://support.sap.com/swdc
SAP HANA Vora on SAP Help Portal
http://help.sap.com/hana_vora
SAP HANA Vora on SCN (SAP Community Net
http://scn.sap.com/blogs/vora/
work)
SAP HANA Vora troubleshooting information
http://scn.sap.com/blogs/vora/2015/12/09/sap-hana-vora--trouble
shooting
SAP HANA Vora support component
HAN-VO: SAP HANA Vora
PUBLIC

Introduction
Installation
To install SAP HANA Vora, first familiarize yourself with the components it contains and the installation
package you require. Review the installation prerequisites to ensure a properly configured cluster and then
download and install the SAP HANA Vora package.
Complete the individual tasks in the following order:
Task
See
Understand what components make up the SAP HANA Vora SAP HANA Vora Components [page 8]
system
Find out which package is required to install SAP HANA
Vora and where it is available
SAP HANA Vora Packages [page 9]
Ensure your Hadoop cluster is correctly set up and meets

the installation requirements for SAP HANA Vora
Installation Prerequisites [page 9]
Check the overview to see how and where SAP HANA Vora
components should be deployed
Installation and Bootstrapping Guidelines [page 13]
Collect and document essential information about your Ha

doop cluster
Collect Hadoop Cluster Information [page 16]
Download and install the package containing the SAP HANA

Vora engine and SAP HANA Vora Spark extension library
Installing SAP HANA Vora [page 16]
Validate the installation
Validate the SAP HANA Vora Installation [page 33]
Optionally enable the Zeppelin interpreter if you want to use

Zeppelin (an interactive data analytics tool)
Install the SAP HANA Vora Zeppelin Interpreter [page 35]
Note that Zeppelin is still in the incubation phase.

Set up the Spark controller if you want to query tables ac
cessible through Spark from SAP HANA
Connect SAP HANA Spark Controller to SAP HANA Vora

[page 38]
Connect SAP Lumira if you want to visualize SAP HANA

Vora data in SAP Lumira
Connect SAP Lumira to SAP HANA Vora [page 40]
Update your SAP HANA Vora installation with the latest ver
sions of the installation packages
Updating SAP HANA Vora [page 44]
Related Information
SAP HANA Vora Default Ports [page 47]
SAP HANA Vora Troubleshooting Information (SCN)

Installation
PUBLIC
2.1
SAP HANA Vora Components
The SAP HANA Vora system consists of two main components, the SAP HANA Vora engine, which needs to be
installed on all compute nodes in the cluster, and the SAP HANA Vora Spark extension library, which provides
access to the SAP HANA Vora engine and its functional features.
SAP HANA Vora Engine

The SAP HANA Vora SQL engine is a service that you add to your existing Hadoop installation. SAP HANA Vora
instances hold data in memory and boost the performance of out-of-the box Spark. To increase execution
performance on the node level, you add an SAP HANA Vora instance to each compute node so that it contains
the following:
A Spark worker
An SAP HANA Vora engine
The integration of the SAP HANA Vora engine with Spark is shown in the overview below:
SAP HANA Vora Spark Extension Library

The SAP HANA Vora extension library allows SAP HANA Vora to be accessed through Spark. It also makes
available additional functionality, such as a hierarchy implementation, which allows you to build hierarchies
and run hierarchical queries.
PUBLIC

Installation
Related Information
SAP HANA Vora Packages [page 9]
2.2
SAP HANA Vora Packages
To install the SAP HANA Vora system, you require a package containing the SAP HANA Vora engine and SAP
HANA Vora Spark extension library.
Separate installation packages are provided specifically for each of the cluster provisioning tools. This allows
the Ambari and Cloudera cluster provisioning tools to be used to install the SAP HANA Vora components on
the cluster. For MapR this is currently a manual installation.
The installation packages are as follows:
SAP HANA Vora for Ambari: VORA_AM<version>.TGZ
SAP HANA Vora for Cloudera: VORA_CL<version>.TGZ
SAP HANA Vora for MapR: VORA_MR<VERSION>.TGZ
The SAP HANA Vora Spark extension library contained in the packages consists of a JAR file (spark-sapdatasources-<VERSION>-assembly.jar) with all necessary dependencies and a number of shell scripts
for using the SAP HANA Vora extension through Spark.
The packages can be downloaded from the SAP Software Download Center: https://support.sap.com/swdc
2.3
Installation Prerequisites
A Hadoop cluster is a prerequisite for installing SAP HANA Vora. Review the installation requirements to
ensure that the cluster you use is correctly set up.
Installation Prerequisite Checklist
Hadoop Distributions [page 10]

Cluster Provisioning Tools [page 10]
Operating Systems [page 10]
Supported Platforms [page 11]
Cluster Sizing [page 11]

Installation
PUBLIC
Required Components [page 12]

DLog Server Requirements [page 12]
Validation [page 13]
2.3.1 Hadoop Distributions

SAP HANA Vora can only be used with selected Hadoop distributions:
Hortonworks Data Platform (HDP)
Cloudera Enterprise (CDH)
MapR
2.3.2 Cluster Provisioning Tools

The cluster must be managed by one of the following cluster provisioning tools:
Apache Ambari 2.2
Cloudera Manager 5.5 or 5.6
MapR Control System (MCS) 5.1
2.3.3 Operating Systems

The following operating systems are supported:
SUSE Linux Enterprise Server (SLES) 11 SP3 (see compatibility pack details below)
Red Hat Enterprise Linux (RHEL) 6.7 (see compatibility pack details below) and 7.2
Compatibility packs are required as follows:
Operating System
Compatibility Pack
SLES 11 SP3
You need to install the RPM packages libgcc_s1 and libstdc++6.

Ensure that the versions are not earlier than the following (earlier versions cause problems
during runtime due to improper exception handling):
libgcc_s1-4.7.2_20130108-0.17.2
libstdc++6-4.7.2_20130108-0.17.2
Install the RPM packages as follows, if they are not already installed by default:
# zypper install libgcc_s1 libstdc++6
10
PUBLIC

Installation
Operating System
Compatibility Pack
RHEL 6.7
To run SAP HANA Vora on RHEL 6.7, an additional runtime environment for GCC 4.7 is re
quired, which you can add by installing the RPM package compat-sap-c++ (see also SAP
Note 2001528
).
To be able to access the library, you need a subscription for "Red Hat Enterprise Linux Server
for SAP HANA". This allows you to subscribe your server to the "RHEL Server SAP HANA"
channel on the Red Hat Customer Portal or your local Satellite server. After you have subscri
bed your server to the channel, the output of yum repolist should contain the following:
rhel-x86_64-server-sap-hana-6 RHEL Server SAP HANA (v. 6 for

64-bit x86_64)
You can then install the GCC 4.7 libstdc++ library with the following command:
# yum install compat-sap-c++
For an up-to-date list of supported operating systems, see SAP Note 2284507
2.3.4 Supported Platforms

The following combinations of operating system, cluster provisioning tool, and Hadoop distribution are
supported:
Operating System
Cluster Provisioning Tool
Hadoop Distribution
Hadoop
SLES 11 SP3
Ambari 2.2
HDP 2.3
Hadoop 2.7.1
SLES 11 SP3
Cloudera 5.5/5.6
CDH 5.5/5.6
Hadoop 2.6.0
RHEL 7.2
Ambari 2.2
HDP 2.3
Hadoop 2.7.1
RHEL 6.7
Ambari 2.2
HDP 2.3
Hadoop 2.7.1
RHEL 6.7
Cloudera 5.5/5.6
CDH 5.5/5.6
Hadoop 2.6.0
RHEL 7.2
MapR Control System 5.1
MapR 5.1
Hadoop 2.7.0
RHEL 6.7
MapR Control System 5.1
MapR 5.1
Hadoop 2.7.0
2.3.5 Cluster Sizing

To enable efficient cluster computation using the SAP HANA Vora extension, the cluster nodes should have at
least the following:
4 cores
8 GB of RAM
20 GB of free disk space for HDFS data

Installation
PUBLIC
11
2.3.6 Required Components

The following components are required on the cluster:
Component
More Information
HDFS 2.6.x or 2.7.1
https://hadoop.apache.org/docs/stable/
ZooKeeper 3.4.6
http://zookeeper.apache.org/releases.html
Spark 1.5.2
https://spark.apache.org/releases/spark-release-1-5-2.html
Yarn cluster manager 2.7.1
https://spark.apache.org/docs/latest/running-on-yarn.html
Zeppelin v0.5.6
Optional allows you to use the Zeppelin integration. Note that Zeppelin is still in
the incubation phase: https://zeppelin.incubator.apache.org/
2.3.7 DLog Server Requirements

The SAP HANA Vora Distributed Log (DLog) component requires the RPM package libaio to be installed on
the target machine and the file descriptor limits to be set appropriately.
Procedure
1. Install the libaio package as follows:
Platform
Command
RHEL
sudo yum install libaio
SUSE
sudo zypper install libaio1
2. Increase the system file descriptor limit if necessary:

a. Check the current limit:
cat /proc/sys/fs/file-max
You are generally advised to set the limit to 65536 per 1 GB of RAM.
b. If necessary, increase the limit by adding or modifying the following line in the /etc/sysctl.conf
file:
fs.file-max=<limit>
c. Run the following to load the new setting:
sysctl --load=/etc/sysctl.conf
3. Set the default ulimit value:
12
PUBLIC

Installation
a. Add or modify the following line in the /etc/security/limits.conf file:

* - nofile 1000000
Caution
Do not set the limit to a value larger than 1048576 or you may be unable to log in to your system
(notably on RHEL 7.1).
b. Log out or reboot so that the ulimit change takes effect.
2.3.8 Validation
To ensure that the components have been correctly installed, run a sample Spark application on the cluster,
such as SparkPi, which calculates the approximate value of Pi.
In the Spark shell, execute the following:
Sample Code
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client -num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 2
--queue default $SPARK_HOME/lib/spark-examples*.jar 10 2>/dev/null
You should see something like this:
Pi is roughly 3.140292
For more information, see Spark Examples
2.4
Installation and Bootstrapping Guidelines
You need to choose appropriate nodes when you deploy the SAP HANA Vora components on the cluster. An
overview of the different node types and how and where SAP HANA Vora components should be deployed is
given below.
Node Types
For the purposes of setting up a cluster, four different types of cluster nodes are distinguished:
Node Type
Description
Management node
Contains the cluster provisioning tool, for example, Ambari or Cloudera.

Installation
PUBLIC
13
Node Type
Description
Master nodes
Contain central cluster components, such as the NameNode or ZooKeeper serv

ers.
Worker nodes
These are the compute nodes of the cluster. They contain components such as
DataNodes or NodeManagers.
Jump boxes
Contain only client components, such as the HDFS client, and serve as an entry
point for users to start compute jobs using Spark.
Installation and Deployment

The SAP HANA Vora installation package contains the components shown in the table below, which you need
to install and deploy on the cluster in the following way:
Component
Description
Installation
SAP HANA Vora Base
All libraries and binaries re

quired by SAP HANA Vora
Install on all hosts. This distributes the binaries to all ma

chines in the cluster.
SAP HANA Vora Catalog
Distributed metadata store

for SAP HANA Vora
Install on a single node
Hashicorp's Discovery Serv

ice, manages service regis
tration
Install on each node in the cluster in either server or client

mode (mutually exclusive):
SAP HANA Vora Discovery

Service
On the same node as Distributed Log (recommended)
Server mode
Install on at least three nodes
Maximum number recommended is seven
Select a bootstrapping host
Client mode
Install on all remaining nodes
SAP HANA Vora Distributed Distributed log manager pro Install on at least one node
Log
viding persistence for the
Master nodes
SAP HANA Vora catalog
Five nodes recommended
SAP HANA Vora Thrift

server
No upper limit
Gateway compatible with the Install on a single node, typically the jump box (recom
Hive JDBC Driver
mended)
A node where the Discovery Service, Distributed Log,

and Catalog are not deployed
SAP HANA Vora Tools
Web-based user interface

with a SQL editor and OLAP
modeler
Install on the same node as the SAP HANA Vora Thrift

server, typically the jump box
SAP HANA Vora V2Server
SAP HANA Vora engine
Install on all worker nodes (all nodes where a DataNode is

deployed)
14
PUBLIC

Installation
Bootstrapping
Bootstrapping ensures that the SAP HANA Vora components are installed and started in a way that enables
them to operate together correctly. The sequence of actions is as follows:
1. The cluster has already been set up and core components, such as HDFS, Hadoop cluster manager, Yarn,
and ZooKeeper are up and running. SAP HANA Vora Base has been installed and deployed on all hosts, but
no SAP HANA Vora services are running yet.
Note
The task of starting services is handled by the Hadoop cluster manager. All actions you take must be
done through the cluster provisioning tool (Ambari or Cloudera). Interference with this process will
mean that the cluster manager cannot keep track of the components that have been started.
Components are started if their dependencies are already up and running, otherwise they will wait or
stop execution.
2. Start the Discovery Service.
The Discovery Service is responsible for handling the bootstrapping process and needs to be installed on
all nodes of the cluster in either server or client mode. You need to have at least three server deployments.
This ensures high availability if a server dies (a server is not automatically restarted if this happens). All
remaining hosts should have client deployments.
One of the server deployments needs to be selected as the bootstrapping host. A bootstrapping host is
needed until the Discovery Service is up and running.
Since the Discovery Service is deployed on each node, all other SAP HANA Vora components on the node
can access it through the localhost:8500. However, for test purposes and custom installations, it is
recommended that all SAP HANA Vora components have a parameter specifying the Discovery Service
deployment address and port. On a production system, this parameter is set to localhost:8500.
3. Start the Distributed Log.
The Distributed Log must be installed on at least one node. However, for the sake of redundancy, it is
recommended that you have five deployments of the Distributed Log (if sufficient resouces are available).
This ensures high availability if a server dies (a server is not automatically restarted if this happens). There
is no upper restriction on the number of nodes.
4. Start the SAP HANA Vora Catalog.
You must have exactly one deployment of the catalog. It is recommended that it is deployed on one of the
nodes used by the Distributed Log.
5. Start SAP HANA Vora.
SAP HANA Vora must be deployed on all DataNode nodes.
6. Start the SAP HANA Vora Thriftserver.
The Thriftserver must be deployed on a single node, typically the jump box.
7. Start the SAP HANA Vora Tools.
The Tools must be deployed on the same node as the SAP HANA Vora Thriftserver.

Installation
PUBLIC
15
2.5
Collect Hadoop Cluster Information
Before proceeding with the installation, collect and document the following information about your Hadoop
cluster. You will need to have this information at hand during the installation.
Procedure
Make a note of the following information:
User and password for Ambari/Cloudera
Operating system user and password
HDFS user and password
Installation directories of Ambari/Cloudera, ZooKeeper, and so on
2.6
Installing SAP HANA Vora
The SAP HANA Vora engine and extension library are contained in installation packages provided specifically
for each of the cluster provisioning tools.
The installation packages are as follows:
Ambari: VORA_AM<version>.TGZ
Cloudera: VORA_CL<version>.TGZ
MapR: VORA_MR<VERSION>.TGZ
The packages contain the following components, which you need to install and deploy on the cluster:
Component
Description
Vora Base
All libraries and binaries required by SAP HANA Vora
Vora Discovery
Hashicorp's Discovery Service, manages service registration
Vora Distributed Log
A distributed log manager providing persistence for the SAP HANA Vora catalog
Vora Catalog
A distributed metadata store for SAP HANA Vora
Vora V2Server
SAP HANA Vora engine
Vora Thriftserver
A gateway compatible with the Hive JDBC Driver
Vora Tools
A web-based user interface with a SQL editor and OLAP modeler
Note
If your Hadoop cluster requires an HTTP(S) proxy to access content through the HTTP(S) protocol, make
sure that the proxy is configured before starting SAP HANA Vora. For more information, see Configure
Proxy Settings [page 49].
16
PUBLIC

Installation
Procedure
Install SAP HANA Vora Using Ambari [page 17]
Install SAP HANA Vora Using Cloudera [page 22]
Installing SAP HANA Vora for MapR [page 29]
2.6.1 Install SAP HANA Vora Using Ambari

Use the Ambari provisioning tool to install SAP HANA Vora on your cluster.
Procedure
1. Log on to the Ambari cluster management node.
2. Download VORA_AM<version>.TGZ from the SAP Software Download Center (https://support.sap.com/
swdc
) to the management node.
3. Go to /var/lib/ambari-server/resources/stacks/HDP/2.3/services.
4. Copy VORA_AM<version>.TGZ to that directory and extract it.
5. Restart the Ambari server with the following command:
$ ambari-server restart
Depending on your cluster configuration, you may need to be the root user or a user with administrator
rights to do so.
6. Wait until the Ambari Administration Interface is up and running.
Ambari is now able to provision the SAP HANA Vora components on the Hadoop cluster.
7. On the Ambari dashboard, choose
Actions
Add Service .
8. On the Choose Services screen, add the services as follows:

a. Install SAP HANA Vora Base [page 18]
b. Install SAP HANA Vora Discovery [page 18]
c. Install SAP HANA Vora Distributed Log [page 19]
d. Install SAP HANA Vora Catalog [page 20]
e. Install SAP HANA Vora V2Server [page 20]
f. Install SAP HANA Vora Thriftserver [page 21]
g. Install SAP HANA Vora Tools [page 22]

Installation
PUBLIC
17
2.6.1.1
Install SAP HANA Vora Base
Procedure
1. On the Choose Services screen, select the Vora Base option and click Next.
2. On the Assign Slaves and Clients screen, add the Vora Base component to all hosts and click Next.
3. Customize the service.
No configuration is needed.
4. Deploy the service and complete the installation.
The libraries and binaries provided by SAP HANA Vora Base have been distributed to all machines in the
cluster.
Note
SAP HANA Vora Base does not run as a service (you cannot start it).
2.6.1.2
Install SAP HANA Vora Discovery
Procedure
1. On the Choose Services screen, select the Vora Discovery option and click Next.
2. On the Assign Masters screen, add the servers on which the Discovery service should run.
You need to deploy it on at least three masters (that is, in server mode).
Click Next.
3. On the Assign Slaves and Clients screen, add the service to all remaining hosts.
The Discovery service needs to be installed on all nodes in the cluster, but must not be deployed in both
server and client mode (mutually exclusive) on the same node.
Click Next.
18
PUBLIC

Installation
In the Advanced vora-discovery-config section, enter the following required information:

Parameter
Description
vora_discovery_bootstrap_host
The server address of the bootstrap host. The bootstrap host can be any one
of the discovery masters you selected earlier. Note that you need to enter the
fully qualified domain name (FQDN). For example: mydiscserver1.mydo
main.org
The bootstrap host is responsible for bootstrapping the service if no Discov
ery service host is up and running. Once the initial servers have been added,
you can disable the bootstrap mode by removing the bootstrap host from this
field and restarting the server as a regular server.
vora_discovery_servers
The servers you selected earlier as discovery masters, separated by commas.

Note that you need to enter the fully qualified domain names (FQDNs). For ex
ample: mydiscserver1.mydomain.org,mydiscserver2.mydomain.org,mydisc
server3.mydomain.org,mydiscserver4.mydomain.org
Correct the default log settings and data directory if necessary:

Parameter
Default Value
vora_discovery_log_dir
/var/log/vora-discovery
vora_discovery_log_level
WARNING
vora_discovery_data_dir
/var/local/vora-discovery
2.6.1.3
Install SAP HANA Vora Distributed Log
Procedure
1. On the Choose Services screen, select the Vora Distributed Log option and click Next.
2. On the Assign Masters screen, add the servers on which the Distributed Log should run. It must be
installed on at least one server, however, the recommended number of servers is five (if sufficient
resouces are available).
Click Next.
In the Advanced vora-dlog-config section, correct the default log settings and other default values if
necessary:
Parameter
Default Value
vora_dlog_log_dir
/var/log/vora-dlog

Installation
PUBLIC
19
Parameter
Default Value
vora_dlog_log_level
WARNING
vora_dlog_store_dir
/var/local/vora-dlog
vora_dlog_port
49152 (the first port in the open range), or otherwise the

next consecutive port number that is free
2.6.1.4
Install SAP HANA Vora Catalog
Procedure
1. On the Choose Services screen, select the Vora Catalog option and click Next.
2. On the Assign Masters screen, add the server on which the Catalog service should run. It must be installed
on a single server. It is recommended that it is deployed on one of the servers used by the Distributed Log.
Click Next.
In the Advanced vora-catalog-config section, correct the default log settings if necessary:
Parameter
Default Value
vora_catalog_log_dir
/var/log/vora-catalog
vora_catalog_log_level
WARNING
vora_catalog_dlog_replication_factor
3
If you installed the SAP HANA Vora Distributed Log with N
servers, you need to specify a number M <= N as the rep
lication factor.
2.6.1.5
Install SAP HANA Vora V2Server
Procedure
1. On the Choose Services screen, select the Vora V2Server option and click Next.
20
PUBLIC

Installation
2. On the Assign Slaves and Clients screen, add the service to the appropriate hosts.
We recommend that you add it to all data nodes, that is, each node that acts as a Spark worker node.
Click Next.
In the Advanced vora-v2server-config section, modify the SAP HANA Vora V2Server configuration, if
needed. This includes, in particular, the file system location of the SAP HANA Vora engine logs:
Parameter
Default Value
vora_v2server_log_dir
/var/log/vora-v2server
vora_v2server_log_level
WARNING
Results
You can confirm that the SAP HANA Vora engine has been successfully deployed on the cluster nodes by
verifying that the v2server process is running on them.
2.6.1.6
Install SAP HANA Vora Thriftserver
Procedure
1. On the Choose Services screen, select the Vora Thriftserver option and click Next.
2. On the Assign Masters screen, add the server on which the Thriftserver should run. This is typically the
jump box.
Click Next.
In the Advanced vora-thriftserver-config section, enter the following required information:
Parameter
Description
vora_thriftserver_java_home
Location of Java installation that is used for SAP HANA Vora Thriftserver
vora_thriftserver_spark_home
Location of Spark installation that is used for SAP HANA Vora Thriftserver
Correct the default log settings and other default values if necessary:
Parameter
Default Value
vora_thriftserver_log_dir
/var/log/vora-thriftserver

Installation
PUBLIC
21
Parameter
Default Value
vora_thriftserver_log_level
WARNING
vora_thriftserver_metastore_dir
/tmp/vora-thriftserver
Related Information
Enable Spark Auto-registration [page 50]
2.6.1.7
Install SAP HANA Vora Tools
Procedure
1. On the Choose Services screen, select the Vora Tools option and click Next.
2. On the Assign Masters screen, add the server on which the Tools should run. This needs to be the same as
that of the Thriftserver and is typically the jump box.
Click Next.
In the Advanced vora-tools-config section, correct the default log settings if necessary:
Parameter
Default Value
vora_tools_log_dir
/var/log/vora-tools
vora_tools_log_level
WARNING
2.6.2 Install SAP HANA Vora Using Cloudera

Use the Cloudera provisioning tool to install SAP HANA Vora on your cluster.
Procedure
1. Log on to the Cloudera cluster management node.
22
PUBLIC

Installation
2. Download VORA_CL<version>.TGZ from the SAP Software Download Center (https://support.sap.com/

swdc
) to a temporary directory on the management node.
3. Extract the package.

4. Copy all files contained in the csd directory to /opt/cloudera/csd, the default local descriptor
repository path.
5. Copy all files contained in the parcel-repo directory to /opt/cloudera/parcel-repo, the default
local parcel repository path.
6. Remove the temporary directory.
7. Restart the Cloudera server, for example as follows:
$ service cloudera-scm-server restart
rights to do so.
8. Wait until Cloudera Manager is up and running.
Cloudera is now able to provision the SAP HANA Vora components on the Hadoop cluster.
9. In the Cloudera Manager, choose Hosts and then the Parcels tab.
10. In the parcel list, locate SAPHanaVora and choose the Distribute button.
Wait until the parcel has been distributed. The parcel's status is shown as distributed.
11. Choose the Activate button.
12. Choose OK to confirm.
The parcel's status is shown as distributed and activated.
13. Go to the Home screen.
14. Open the drop-down menu next to your cluster name and choose Add a Service.
A list of service types is displayed.
15. On the Add a Service screen, add the services as follows:
a. Install SAP HANA Vora Base [page 23]
b. Install SAP HANA Vora Discovery [page 24]
c. Install SAP HANA Vora Distributed Log [page 25]
d. Install SAP HANA Vora Catalog [page 26]
e. Install SAP HANA Vora V2Server [page 26]
f. Install SAP HANA Vora Thriftserver [page 27]
g. Install SAP HANA Vora Tools [page 28]
2.6.2.1
Install SAP HANA Vora Base
Procedure
1. On the Add a Service screen, select the Vora Base option and choose Continue.
2. On the role assignment page, click the box below Gateway.
The Hosts Selected dialog box appears.

Installation
PUBLIC
23
3. Add the SAP HANA Vora Base component to all hosts and choose OK.
4. Choose Continue.
5. When the component has been successfully installed, choose Continue and then Finish.
The libraries and binaries provided by SAP HANA Vora Base have been distributed to all machines in the
cluster.
Note
SAP HANA Vora Base does not run as a service (you cannot start it).
2.6.2.2
Install SAP HANA Vora Discovery
Procedure
1. On the Add a Service screen, select the Vora Discovery option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora Discovery Server.
b. Add the servers on which the Discovery service should run. You need to deploy it on at least three
hosts (that is, in server mode).
c. Choose OK.
d. Click the box below Vora Discovery Client.
e. Add the service to all remaining hosts. The Discovery service needs to be installed on all nodes in the
cluster, but must not be deployed in both server and client mode (mutually exclusive) on the same
node.
f. Choose OK and then Continue.
3. On the review changes page, enter the following required information:
Parameter
Description
vora_discovery_bootstrap_host
The server address of the bootstrap host. The bootstrap host can be any one
of the discovery servers you selected earlier. Note that you need to enter the
fully qualified domain name (FQDN). For example: mydiscserver1.mydo
main.org
The bootstrap host is responsible for bootstrapping the service if no Discov
ery service host is up and running. Once the initial servers have been added,
you can disable the bootstrap mode by removing the bootstrap host from this
field and restarting the server as a regular server.
24
PUBLIC

Installation
Parameter
Description
vora_discovery_servers
The servers you selected earlier as discovery servers, separated by commas.

Note that you need to enter the fully qualified domain names (FQDNs). For ex
ample: mydiscserver1.mydomain.org,mydiscserver2.mydomain.org,mydisc
server3.mydomain.org,mydiscserver4.mydomain.org

Parameter
Default Value
vora_discovery_log_dir
/var/log/vora-discovery
vora_discovery_log_level
WARNING
vora_discovery_data_dir
/var/local/vora-discovery
4. Choose Continue.
5. When the SAP HANA Vora Discovery service has been successfully started, choose Continue and then
Finish.
2.6.2.3
Install SAP HANA Vora Distributed Log
Procedure
1. On the Add a Service screen, select the Vora Distributed Log option and choose Continue.
a. Click the box below Vora Distributed Log Server.
b. Add the servers on which the Distributed Log service should run. It must be installed on at least one
server, however, the recommended number of servers is five (if sufficient resouces are available).
c. Choose OK and then Continue.
3. On the review changes page, correct the default log settings and other default values if necessary:
Parameter
Default Value
vora_dlog_log_dir
/var/log/vora-dlog
vora_dlog_log_level
WARNING
vora_dlog_store_dir
/var/local/vora-dlog
vora_dlog_port
49152 (the first port in the open range), or otherwise the

next consecutive port number that is free
4. Choose Continue.
5. When the SAP HANA Vora Distributed Log service has been successfully started, choose Continue and
then Finish.

Installation
PUBLIC
25
2.6.2.4
Install SAP HANA Vora Catalog
Procedure
1. On the Add a Service screen, select the Vora Catalog option and choose Continue.
a. Click the box below Vora Catalog Server.
b. Add the server on which the Catalog service should run. It must be installed on a single server. It is
recommended that it is deployed on one of the servers used by the Distributed Log.
3. On the review changes page, correct the default log settings if necessary:
Parameter
Default Value
vora_catalog_log_dir
/var/log/vora-catalog
vora_catalog_log_level
WARNING
vora_catalog_dlog_replication_factor
3
If you installed the SAP HANA Vora Distributed Log with N
servers, you need to specify a number M <= N as the rep
lication factor.
4. Choose Continue.
5. When the SAP HANA Vora Catalog service has been successfully started, choose Continue and then
Finish.
2.6.2.5
Install SAP HANA Vora V2Server
Procedure
1. On the Add a Service screen, select the Vora V2Server option and choose Continue.
a. Click the box below Vora V2Server Worker.
b. Select the appropriate hosts from the list. We recommend that you add the SAP HANA Vora V2Server
service to each node that acts as a Spark worker node.
26
PUBLIC

Installation
3. On the review changes page, correct the default data directory and log settings if necessary:
Parameter
Default Value
vora_v2server_log_dir
/var/log/vora-v2server
vora_v2server_log_level
WARNING
4. Choose Continue.
5. When the SAP HANA Vora V2Server service has been successfully started, choose Continue and then
Finish.
2.6.2.6
Install SAP HANA Vora Thriftserver
Prerequisites
To run the Thriftserver on Cloudera, you need to install Spark 1.5.2 on your jump box and set the
vora_thriftserver_spark_home parameter (see below) to this location. The Spark installation provided
by Cloudera does not include the necessary Spark Thriftserver packages.
Procedure
1. On the Add a Service screen, select the Vora Thriftserver option and choose Continue.
a. Click the box below Vora Thriftserver Master.
b. Add the server on which the Thriftserver should run. This is typically the jump box.
3. On the review changes page, enter the following required information:
Parameter
Description
vora_thriftserver_spark_home
Location of Spark installation that is used for SAP HANA Vora Thriftserver
vora_thriftserver_java_home
Location of Java installation that is used for SAP HANA Vora Thriftserver

Parameter
Default Value
vora_thriftserver_log_dir
/var/log/vora-thriftserver
vora_thriftserver_log_level
WARNING

Installation
PUBLIC
27
Parameter
Default Value
vora_thriftserver_metastore_dir
/tmp/vora-thriftserver
4. Choose Continue.
5. When the SAP HANA Vora Thriftserver service has been successfully started, choose Continue and then
Finish.
Related Information
2.6.2.7
Install SAP HANA Vora Tools
Procedure
1. On the Add a Service screen, select the Vora Tools option and choose Continue.
a. Click the box below Vora Tools Master.
b. Add the server on which the Tools should run. This needs to be the same as that of the Thriftserver
and is typically the jump box.
3. On the review changes page, correct the default log settings if necessary:
Parameter
Default Value
vora_tools_log_dir
/var/log/vora-tools
vora_tools_log_level
WARNING
4. Choose Continue.
5. When the SAP HANA Vora Tools service has been successfully started, choose Continue and then Finish.
28
PUBLIC

Installation
2.6.3 Installing SAP HANA Vora for MapR

Install the SAP HANA Vora package for MapR on your cluster. This is currently a manual installation process.
Prerequisites
The MapR cluster is already set up.
For convenience, the MapR File System (MapR-FS) can be accessed through NFS on every node.
The mechanism for the MapR central configuration has been established.
SAP HANA Vora RPM Packages

The files contained in the SAP HANA Vora package are RPM packages that can be installed with package
management tools like yum (for the Red Hat Linux distribution) or zypper (for the SUSE Linux distribution).
The following table describes the RPM packages required to install SAP HANA Vora:
Package Name
Description
mapr-vora-base-<version>.<arch>.rpm
SAP HANA Vora base package: This package contains all SAP HANA
Vora executables and basic configuration files.
It needs to be installed on each node of the cluster.
Prerequisite: "mapr-core" package
mapr-vora-discovery-<version>.<arch>.rpm
Configuration files for the SAP HANA Vora Discovery Service.

It needs to be installed on each node on which the SAP HANA Vora serv
ices are deployed. It is recommended to deploy this service on the MapR
ZooKeeper and CLDB nodes.
Prerequisite: "mapr-vora-base"
mapr-vora-dlog-<version>.<arch>.rpm
Configuration files for the SAP HANA Vora Distributed Log Service.
This service needs to be deployed on at least one node, however, the
recommended number is five (if sufficient resouces are available).
Prerequisites: "mapr-vora-discovery" and the "libaio" library
mapr-vora-catalog-<version>.<arch>.rpm
SAP HANA Vora Catalog: The infrastructure for metadata, such as table
definitions.
This service needs to be deployed on a single node. It is recommended
that it is deployed on one of the servers used by the Distributed Log.
Prerequisite: "mapr-vora-dlog"
mapr-vora-v2server-<version>.<arch>.rpm
SAP HANA Vora SQL engine

Prerequisite: "mapr-vora-discovery"

Installation
PUBLIC
29
Package Name
Description
mapr-vora-thriftserver-<version>.<arch>.rpm
Configuration files for the Spark Thriftserver (including SAP HANA Vora
extensions)
Prerequisite: package "mapr-spark"
mapr-vora-tools-<version>.<arch>.rpm
Configuration files for the modeling tools

Prerequisite: "mapr-vora-thriftserver"
Note
The MapR installer cannot yet be used to deploy the HANA Vora components across the cluster. However,
the manual installation steps required can be easily automated, using password-less SSH access as
described in the MapR installation guide.
Procedure
1. Prepare for Installation [page 30]
2. Install the SAP HANA Vora Packages [page 31]
3. Configure SAP HANA Vora [page 32]
4. Start SAP HANA Vora [page 33]
2.6.3.1
Prepare for Installation
Procedure
1. Create a group "vora" and user "vora" on all nodes of the cluster.
When adding a user to the cluster nodes, make sure that the user ID (UID) is always the same. The same
applies to the group ID (GID). For example:
groupadd vora --gid 4999
useradd vora --uid 4999 -g vora
2. Download the file VORA_MR<VERSION>.TGZ from the SAP Software Download Center (https://
support.sap.com/swdc
) to the cluster host.
3. Extract the package to a local directory.
Note
Since you need to be able to access the installation files from all nodes of the cluster, you might want to
move the files to a shared storage.
30
PUBLIC

Installation
Tip
Using MapR-FS NFS, you could move the files to /mapr/<cluster name>/user/mapr/vorainstall. This is equivalent to maprfs://user/mapr/vora-install.
2.6.3.2
Install the SAP HANA Vora Packages
Install the SAP HANA Vora packages on the appropriate nodes of the cluster.
Context
It is recommended that you distribute the services across the cluster as follows:
On all nodes: Deploy the packages mapr-vora-base and mapr-vora-discovery. Include the
Zookeeper and CLDB nodes.
On some nodes (minimum one, recommended five): Deploy the package mapr-vora-dlog.
On a single node: Deploy the package mapr-vora-catalog. It is recommended that it is deployed on one
of the servers used by the Distributed Log.
On most nodes: Deploy the package mapr-vora-v2server (SAP HANA Vora SQL engine).
On jump nodes: Deploy the packages mapr-vora-thriftserver and mapr-vora-tools.
Perform the steps outlined below on all nodes of the cluster.
Procedure
1. Log on to a cluster node with an administrative user, for example, the MapR user.
2. Navigate to the installation directory. For example:
cd /mapr/<cluster name>/user/mapr/vora-install
3. Install the packages as follows:
Red Hat
sudo yum install <package_file_name>
SUSE
sudo zypper install <package_file_name>

Installation
PUBLIC
31
2.6.3.3
Configure SAP HANA Vora
After the installation of the packages, you can adjust the SAP HANA Vora configuration to suit your own
requirements.
Context
The SAP HANA Vora configuration is contained in two configuration files.
Default settings
The file /opt/mapr/conf/conf.d/vora_default_settings.sh lists all configuration parameters for
the SAP HANA Vora services. It is realized as a shell script and uses environment variables for storage
purposes. The shell script is structured into functions, one for each service. All configuration parameters
have a description of the parameter, an allowed value range, and a default value.
Start settings
When a service is started, it is often necessary to consider the actual environment of a node or cluster to
derive or overwrite the default settings. The file /opt/mapr/conf/conf.d/vora_start_settings.sh
takes the default settings and changes improper values.
If possible, limit the adjustments you need to make to the configuration to the default settings file.
Procedure
1. Copy the file /opt/mapr/conf/conf.d/vora_default_settings.sh to a different local directory. For
example:
cp /opt/mapr/conf/conf.d/vora_default_settings.sh /tmp/
vora_default_settings.sh
2. Edit the temporary configuration file with a text editor.
3. Upload the temporary configuration file to the central configuration:
hadoop fs mkdir p /var/mapr/configuration/conf/conf.d
hadoop fs put /tmp/vora_default_settings.sh /var/mapr/configuration/conf/
conf.d
After some time, the central configuration is replicated to all cluster nodes.
The same procedure can be applied to the start settings file, if required.
32
PUBLIC

Installation
2.6.3.4
Start SAP HANA Vora
Integrate the new services into the MapR cluster and launch them.
Procedure
1. Execute the following on all cluster nodes:
sudo /opt/mapr/server/configure.sh -R
2. Log on to the MapR Control System and verify the service status on the various cluster nodes.
2.7
Validate the SAP HANA Vora Installation
To check that the SAP HANA Vora engine and extension library have been correctly installed and that you can
use the SAP HANA Vora features in Spark, create a table and load data into it from a file stored in HDFS.
Prerequisites
You have already successfully deployed the SAP HANA Vora components on the cluster and the instances
are running.
You have already installed Spark.
Context
The location of the SAP HANA Vora spark extension depends on your installation:
Ambari, for example: /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/
package/lib/vora-spark
Cloudera, for example: /opt/cloudera/parcels/SAPHanaVora-1.2.35.97/lib/vora-spark
It contains the following folders:
lib/: Contains the spark-sap-datasources-<VERSION>-assembly.jar file with all necessary
dependencies (excluding Spark).
bin/: Contains scripts for ease of use.
META-INF/: Contains the pom.properties and pom.xml files.

Installation
PUBLIC
33
Procedure
1. Create a file in HDFS. Note that in this example the test file, test.csv, is stored in a directory set up for
the user "vora" (user/vora):
Sample Code
echo "1,2,Hello" > test.csv
hadoop fs -put test.csv
hadoop fs -cat /user/vora/test.csv
1,2,Hello
2. Open a Spark shell, for example, by using the shell script:
/<vora-spark-extension-path>/vora-spark/bin/start-spark-shell.sh
3. Enter the following statements in the Spark shell to create a table and check that it has been successfully
created:
scala> import org.apache.spark.sql.SapSQLContext
scala> val vc = new SapSQLContext(sc)
scala> val testsql = """
CREATE TABLE table001 (a1 double, a2 int, a3 string)
USING com.sap.spark.vora
OPTIONS (
tablename "table001",
paths "/user/vora/test.csv"
)"""
scala> vc.sql(testsql)
scala> vc.sql("show tables").show
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| table001|
false|
+---------+-----------+
scala> vc.sql("SELECT * FROM table001").show
+---+--+-----+
| a1|a2|
a3|
+---+--+-----+
|1.0| 2|Hello|
+---+--+-----+
scala > <Ctrl-C to quit>
Results
You have now successfully validated the SAP HANA Vora extension and can use it as follows:
The JAR file in the lib folder (spark-sap-datasources-VERSION-assembly.jar) can be provided to
Spark using the --jars option.
For example, assuming the spark-shell command is on the user's path:
$ spark-shell --jars /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vorabase/package/lib/vora-spark/lib/spark-sap-datasources-VERSION-assembly.jar
34
PUBLIC

Installation
Alternatively, the shell scripts in the bin folder can be used to run a Spark shell with the SAP HANA Vora
extension library. To do so, the SPARK_HOME environment variable needs to point to the Spark folder on
the jump box.
You can then start the Spark shell in Yarn client mode as follows:
$ ./start-spark-shell.sh --master yarn-client
2.8
Install the SAP HANA Vora Zeppelin Interpreter
Zeppelin is a graphical user interface that allows you, as a data scientist, to interact easily with a cluster. The
SAP HANA Vora Spark extension provides an interpreter for the Zeppelin user interface.
Prerequisites
You require Zeppelin 0.5.6 built against Spark 1.5.2, Hadoop 2.6, and Yarn, installed on one of the cluster
nodes (most likely the jump box):
You can build a compatible Zeppelin version as follows (you need Maven 3.1 or higher):
$
$
$
$
git clone https://github.com/apache/incubator-zeppelin.git

cd incubator-zeppelin
git checkout branch-0.5.6
mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn -Pbuild-distr
To build Zeppelin for MapR 5.x distributions, you need to enable the "mapr50" build profile. The maven build
call for MapR 5.x distributions looks as follows:
$ mvn clean package -DskipTests -Pspark-1.5 -Pmapr50 -Pyarn -Pbuild-distr
After the build process has completed, you should have a tar.gz package in the following directory:
./zeppelin-distribution/target
Context
The SAP HANA Vora extension library has its own SQL context class. A modified Zeppelin interpreter is
therefore required to allow Zeppelin to run in the modified context. To enable the interpreter, you need to
register it with Zeppelin.
Restriction
Zeppelin is still in the incubation stage. The steps below are provided for guidance only.

Installation
PUBLIC
35
Procedure
1. Copy spark-sap-datasources-<VERSION>-assembly.jar to <ZEPPELIN_HOME>/interpreter/
spark:
$ cp ~/vora-spark/lib/spark-sap-datasources-<VERSION>-assembly.jar \
<ZEPPELIN_HOME>/interpreter/spark/spark-sap-datasources-assembly.jar
Note
The location of the spark-sap-datasources-<VERSION>-assembly.jar file depends on your
installation:
Ambari, for example: /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vorabase/package/lib/vora-spark/lib/
Cloudera, for example: /opt/cloudera/parcels/SAPHanaVora-1.2.35.97/lib/voraspark/lib
<ZEPPELIN_HOME> refers to the directory to which the Zeppelin binaries have been extracted.
2. Combine the Zeppelin Spark interpreter JAR with the spark-sap-datasources-assembly JAR,
replacing the versions as appropriate:
$
$
$
$
$
$
$
$
cd `<ZEPPELIN_HOME>/interpreter/spark`
mkdir tmp
(cd tmp; jar -xf ../spark-sap-datasources-<VERSION>-assembly.jar)
(cd tmp; jar -xf ../zeppelin-spark-<VERSION>-incubating.jar)
jar -cvf zeppelin-spark-sap-combined.jar -C tmp .
// remove the old jars
rm spark-sap-datasources-<VERSION>-assembly.jar
rm zeppelin-spark-<VERSION>-incubating.jar
3. Add the following variables to the <ZEPPELIN_HOME>/conf/zeppelin-env.sh file:

HDP/CDH:
export MASTER=yarn-client
export ZEPPELIN_PORT=9099
MapR 5.x:
export MASTER=yarn-client
export ZEPPELIN_PORT=9099
export HADOOP_CONF_DIR="/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop"
export HADOOP_HOME="/opt/mapr/hadoop/hadoop-2.7.0/"
export ZEPPELIN_JAVA_OPTS="-Djava.security.auth.login.config=/opt/mapr/
conf/mapr.login.conf -Dzookeeper.sasl.client=false"
Example
1. cp $ZEPPELIN_HOME/conf/zeppelin-env.sh.template $ZEPPELIN_HOME/conf/
zeppelin-env.sh
2. chmod 0755 $ZEPPELIN_HOME/conf/zeppelin-env.sh
3. vi $ZEPPELIN_HOME/conf/zeppelin-env.sh
4. Insert the variables shown above and save your changes.
36
PUBLIC

Installation
Note
Zeppelin also requires the environment variables SPARK_HOME and HADOOP_CONF_DIR to be set. If
these are not already set, you can add them to the zeppelin-env.sh file as well.
4. Add the interpreter class org.apache.spark.sql.SapSqlInterpreter to the
zeppelin.interpreters property in the <ZEPPELIN_HOME>/conf/zeppelin-sites.xml file:
...
<property>
<name>zeppelin.interpreters</name>
<value>INTERPRETER_1,...,INTERPRETER_N,org.apache.spark.sql.SapSqlInterpreter<
/value>
<description>Comma separated interpreter configurations.
First interpreter becomes the default</description>
</property>
...
Note
Make sure that the SAP interpreter class "org.apache.spark.sql.SapSqlInterpreter" occurs after the
Spark interpreter class "org.apache.zeppelin.spark.SparkInterpreter" in the resulting list of
interpreters.
5. For HDP with Ambari only: Update the YARN configuration as follows:
a. Check the installed HDP version (<HDP_VERSION>), for example, from the following directory
name: /usr/hdp/<HDP_VERSION>
b. On the Ambari administration interface, select the YARN service and choose the
Advanced
Configs
tab. Scroll down to the Custom yarn-site section and choose Add Property.
c. Add a property with the key hdp.version and value <HDP_VERSION>.

6. Start the Zeppelin server:
$ <ZEPPELIN_HOME>/bin/zeppelin-daemon.sh start
7. In a web browser, open Zeppelin: http://DNS_NAME_OF_JUMPBOX_NODE:9099
8. Open a notebook and click the "configuration" icon.
You should see an additional interpreter prefix called %vora in the interpreter list.
9. Test that the Zeppelin interpreter has been successfully installed.
Create a new notebook and add the following two scripts:
%vora CREATE TABLE table01 (a1 double, a2 int, a3 string)
USING com.sap.spark.vora
OPTIONS (
tablename "table01"
)
%vora SHOW TABLES
The execution of the first snippet might take some time (1-3 minutes), since a Spark application needs to
be started on the server. Once the application is running, subsequent calls will be much faster (depending
on the actual query).

Installation
PUBLIC
37
Example output:
Note
The log files are available as follows:
<ZEPPELIN_HOME>/logs/zeppelin-*-.log: Contains the Web-UI related output.
<ZEPPELIN_HOME>/logs/zeppelin-interpreter-*-.log: Contains the output you would see
in a Spark shell.
2.9
Connect SAP HANA Spark Controller to SAP HANA

Vora
Configure the Spark controller to use SAP HANA Vora. This allows you to connect from SAP HANA to SAP
HANA Vora and query SAP HANA Vora tables.
Prerequisites
The Spark controller has been installed and configured. For more information, see Set up SAP HANA
Spark Controller in the SAP HANA Administration Guide.
When installing the Spark controller as described in Set up SAP HANA Spark Controller, the following
steps are not necessary:
Install Spark Assemby Files and Dependent Libraries
The three datanucleus artifacts listed in this section are not needed when you run the Spark
controller with SAP HANA Vora:
datanucleus-rdbms
datanucleus-api-jdo
38
PUBLIC

Installation
datanucleus-core
Do not download and copy these artifacts to HDFS.
Configure Hive Metastore
You do not need to copy the hive-site.xml when you run the Spark controller with SAP HANA Vora.
If you do copy the datanucleus* artifacts and hive-site.xml, you might encounter issues unless you
have a valid Hive installation that is appropriately configured and your Hive metastore is running properly.
Context
Restriction
MapR does not yet support the SAP HANA Spark controller. For more information, see SAP Note 2284507.
Procedure
1. Make the SAP HANA Vora data sources package available to the Spark controller.
Copy spark-sap-datasources-<VERSION>-assembly.jar to the folder /usr/sap/spark/
controller/lib/.
Make sure that you copy the same version that you are using to create tables. Compatibility between
different packages is not always guaranteed.
2. Configure the Spark controller.
In the Spark controller configuration file /usr/sap/spark/controller/conf/hanaes-site.xml,
change the value of the property sap.hana.hadoop.datastore from 'hive' to 'vora'. It should look like
this:
<property>
<name>sap.hana.hadoop.datastore</name>
<value>vora</value>
<final>true</final>
</property>
3. Restart the Spark controller.
For the configuration changes to take effect, restart the Spark controller, for example, using the following
commands:
$ cd /usr/sap/spark/controller/bin
$ ./hanaes stop
$ ./hanaes start
4. Verify the configuration changes.
To verify whether the configuration changes were successful, check the Spark controller log
file: /var/log/hanaes/hana_controller.log

Installation
PUBLIC
39
After initialization, the file should contain the following lines at the end:
(DATE
(DATE
(DATE
(DATE
and
and
and
and
TIME)
TIME)
TIME)
TIME)
INFO
INFO
INFO
INFO
Server: Starting Spark Controller

CommandRouter: Connecting to Vora Engine
CommandRouter: Initialized Router
CommandRouter: Server started
If these lines are missing, double-check whether the spark-sap-datasources-<VERSION>assembly.jar is present and the configuration settings are correct.
Results
After successful configuration, you can see the tables stored in SAP HANA Vora in SAP HANA Studio, and you
can add virtual tables and submit queries, as described in the SAP HANA Spark Controller documentation.
Related Information
SAP HANA Spark Controller
SAP Note 2284507
2.10 Connect SAP Lumira to SAP HANA Vora

Connect SAP Lumira to SAP HANA Vora to visualize data from SAP HANA Vora, Spark, and SAP HANA, in SAP
Lumira.
Prerequisites
You need SAP Lumira version 1.29 or higher.
Context
To use SAP Lumira with SAP HANA Vora, you need to install the relevant drivers in SAP Lumira to be able to
connect from SAP Lumira using JDBC. You can then create a connection to SAP HANA Vora using the SAP
HANA Vora Thrift server.
40
PUBLIC

Installation
Procedure
1. Install the JDBC driver. You need to use the Spark drivers.
a. Open SAP Lumira and choose
Preferences
SQL Drivers .
b. Select Generic JDBC datasource JDBC Drivers and choose Install Drivers.
c. Select all *.jar files under C:\Program Files\SAP Lumira\Desktop\utilities\SparkJDBC,

choose Open and then Done.
d. To apply the driver changes, restart SAP Lumira.
2. Start the SAP HANA Vora Thriftserver from the Ambari or Cloudera cluster provisioning tool.
3. Create a connection to SAP HANA Vora.
a. In SAP Lumira choose File New .
The Add new dataset dialog box appears.
b. Select Query with SQL and choose Next.

Installation
PUBLIC
41
c. Select Generic JDBC datasource JDBC Drivers and choose Next. Note that the green tick indicates
that the drivers are installed.
d. Enter the required credentials and connection URLs as follows:

Field
Value
User name/password
lumira/lumira
JDBC URL
jdbc:spark://<host>:<port>/
default;CatalogSchemaSwitch=0;UseNativeQuery=1
JDBC Class
host: Host name of the Thrift server
port: The default value is 49155
com.simba.spark.jdbc4.Driver
e. Choose Connect.
You should now see the CATALOG_VIEW, where you can select tables and enter SQL queries.
4. Use Beeline, a JDBC client, to register tables created in SAP HANA Vora in the Thrift server.
42
PUBLIC

Installation
a. Open the Beeline command line client:

./beeline
b. Execute the following statement to connect to the Thrift server, replacing the host name and port as
needed:
!connect jdbc:hive2://<hostname of thrift server>:<port, default: 49155>
c. When prompted for a user name and password, enter lumira in both cases.
d. Register the tables by running the following command:
REGISTER ALL TABLES USING com.sap.spark.vora;
Note
Table definitions are stored in the SAP HANA Vora catalog. This allows you to register or re-register
tables when you start or restart the Thrift server. The tables are persisted as long as the Thrift
server is connected.
5. View the data in SAP Lumira.
a. In SAP Lumira, refresh the CATALOG_VIEW (see step 3 above) by choosing Previous and then Next.
b. Drill down in the CATALOG_VIEW into Spark to see the tables available on the Thrift server.
c. In the Query field, enter a select statement and choose Preview. Note that you need to use the same
format for select statements as in the Beeline command line client.
A preview of the selected data is displayed.
d. Use the standard SAP Lumira functionality to create a report and visualize the data.
Related Information
SAP Lumira

Installation
PUBLIC
43
2.11
Updating SAP HANA Vora
Update your SAP HANA Vora installation by downloading and installing the latest versions of the installation
packages.
Remember
If Zeppelin has been configured to support the SAP HANA Vora Spark extension library, you will also need to
update the library in the <ZEPPELIN_HOME>/interpreter/spark directory.
Restriction
Note that when upgrading from SAP HANA Vora 1.1 to SAP HANA Vora 1.2, the ZooKeeper catalog is
replaced by the SAP HANA Vora catalog. A migration tool is not available for automatically transferring the
ZooKeeper catalog contents to the SAP HANA Vora catalog.
Related Information
Update SAP HANA Vora Using Ambari [page 44]
Update SAP HANA Vora Using Cloudera [page 46]
Update SAP HANA Vora for MapR [page 47]
Install the SAP HANA Vora Zeppelin Interpreter [page 35]
2.11.1 Update SAP HANA Vora Using Ambari

Use the Ambari provisioning tool to install the latest version of SAP HANA Vora on your cluster.
Procedure
1. Stop the SAP HANA Vora services.
a. In the Services panel on the dashboard, select a SAP HANA Vora service.
b. In the Service Actions dropdown menu on the Services page, choose Stop.
c. Repeat for all other SAP HANA Vora services.
2. Remove the services.
Run the following command from any machine where curl is available, for example, the management node
of the cluster, replacing the placeholders with appropriate values:
curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -X DELETE -H 'X-Requested-By:admin' \
44
PUBLIC

Installation
http://<YOUR_MGMT_NODE_FQDN>:8080/api/v1/clusters/\
<YOUR_CLUSTER_NAME>/services/<SERVICE_NAME>
Replace SERVICE_NAME as follows:
Service
service_name
SAP HANA Vora
VORA
Vora Base
HANA_VORA_BASE
Vora Catalog
HANA_VORA_CATALOG
Vora Discovery
HANA_VORA_DISCOVERY
Vora Distributed Log
HANA_VORA_DLOG
Vora Thriftserver
HANA_VORA_THRIFTSERVER
Vora Tools
HANA_VORA_TOOLS
Vora V2Server
HANA_VORA_V2SERVER
Note
If a service is shown as stopped on the Ambari UI, but Ambari responds that it isn't when you try and
remove it, you can use the following commands to stop it:
To stop a component, run the following command for every component of the SAP HANA Vora service:
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":
{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}'
http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/hosts/
$COMPONENT_MACHINE/host_components/$COMPONENT_NAME
To stop a service, run the following command once for the SAP HANA Vora service:
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":
{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'
http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/services/
$SERVICENAME
3. Download the latest version of VORA_AM<version>.TGZ.

a. Log on to the Ambari cluster management node.
b. Remove the VORA or vora-<component> folders from the directory /var/lib/ambari-server/
resources/stacks/HDP/<HDP_version>/services/.
c. Download the latest version of VORA_AM<version>.TGZ from the SAP Software Download Center at
https://support.sap.com/swdc
to the management node.
d. Go to /var/lib/ambari-server/resources/stacks/HDP/<HDP_version>/services.
e. Copy VORA_AM<version>.TGZ to that directory and extract it.
4. Restart the Ambari server with the following command:
$ ambari-server restart
rights to do so.

Installation
PUBLIC
45
Wait until the Ambari Administration Interface is up and running.

5. Add the SAP HANA Vora engine to the cluster as a service using Ambari. To do this, complete steps 7 8
of the installation procedure. See Install SAP HANA Vora Using Ambari.
Related Information
Install SAP HANA Vora Using Ambari [page 17]
2.11.2 Update SAP HANA Vora Using Cloudera

Use the Cloudera provisioning tool to install the latest version of SAP HANA Vora on your cluster.
Procedure
1. Stop the SAP HANA Vora services.
a. On the Cloudera Manager Home page, click to the right of each SAP HANA Vora service and choose
Stop in the dropdown menu.
b. Choose Stop to confirm.
When you see a Finished status, the service has stopped.
2. Delete the SAP HANA Vora services.
a. On the Home page, click to the right of each SAP HANA Vora service and choose Delete in the
dropdown menu.
b. Choose Delete to confirm.
3. Delete the parcels.
a. Choose Hosts and then the Parcels tab.
b. Choose the Deactivate button next to SAPHanaVora and confirm.
c. In the dropdown menu next to SAPHANAVora, choose Remove From Hosts and confirm.
d. In the dropdown menu next to SAP HANA Vora, choose Delete and confirm.
e. Delete the SAP HANA Vora files in the directory /opt/cloudera/csd and /opt/cloudera/
parcel-repo/ from the management node.
4. Install the new version of the SAP HANA Vora engine according to the installation procedure. See Install
SAP HANA Vora Using Cloudera.
Related Information
Install SAP HANA Vora Using Cloudera [page 22]
46
PUBLIC

Installation
2.11.3 Update SAP HANA Vora for MapR

To update SAP HANA Vora for MapR, you need to perform an uninstall followed by a new installation.
Prerequisites
In order to avoid data loss:
Use the same hosts as before for the Distributed Log service
Do not change the persistency of the Distributed Log service
Procedure
1. Stop the HANA Vora Services completely, either using the MapR Control System or with the MapRCLI
command line tool.
2. Back up the configuration file:
cd /opt/mapr/conf/conf.d
cp vora_default_settings.sh vora_default_settings.sh.bak
3. On all cluster nodes, remove the "mapr-vora-base" package. This will also remove all dependent SAP
HANA Vora packages:
yum remove mapr-vora-base
4. Re-install SAP HANA Vora as described in Installing SAP HANA Vora for MapR.
Adjust the configuration file vora_default_settings.sh based on your previous settings.
Related Information
Installing SAP HANA Vora for MapR [page 29]
2.12 SAP HANA Vora Default Ports

By default, SAP HANA Vora is configured to use the port numbers given below.
Component
Port Number
Zeppelin
9099

Installation
PUBLIC
47
Component
Port Number
Thrift server
49155
SAP HANA Vora Tools
9225
Ambari
8080
Cloudera Manager
7180
48
PUBLIC

Installation
Administration
There are some standard administration tasks you need to perform and best practices for the ongoing
operation of your SAP HANA Vora services and Hadoop cluster.
See the following topics:
Topic
Description
Configure Proxy Settings [page 49]
If your cluster runs behind a proxy, set up your proxy settings
Automatically load data sources on startup
Start and Stop SAP HANA Vora Services [page

51]
Start, stop, and restart the SAP HANA Vora services on your cluster
Best Practices: Administration and Operations

[page 54]
Achieve higher performance on your cluster by observing some basic

best practices
Related Information
SAP HANA Vora Troubleshooting Information (SCN)
3.1
Configure Proxy Settings
If your cluster runs behind a proxy, you need to set up your proxy settings correctly so that the SAP HANA
Vora engine and Spark are able to access external services, such as Amazon S3.
Procedure
1. Make sure that the following environment variables have been configured with the appropriate URLs in
the /etc/environment file:
http_proxy
HTTP_PROXY
https_proxy
HTTPS_PROXY
FTP_PROXY
ftp_proxy
no_proxy

Administration
PUBLIC
49
You can add variables to the /etc/environment file as follows:
Sample Code
export
export
export
export
http_proxy=http://proxy.example.com:8080
HTTP_PROXY=http://proxy.example.com:8080
https_proxy=https://proxy.example.com:8080
HTTPS_PROXY=https://proxy.example.com:8080
If any of the variables are not set up properly, make the necessary corrections and then restart the SAP
HANA Vora service using the cluster provisioning tool (for example, Ambari or Cloudera Manager).
2. Make sure that the following variables are passed to the JVM running the Spark driver:
http.proxyHost
http.proxyPort
https.proxyHost
https.proxyPort
You can do this by setting the extraJavaOptions property in the spark-defaults.conf file.
If you are running Spark in YARN client mode, you can set the property as follows:
spark.yarn.am.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> Dhttps.proxyPort=<HTTPS_PORT>
If you are running Spark in YARN cluster mode, you can set the property as follows:
spark.driver.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> Dhttps.proxyPort=< HTTPS_PORT>
3.2
Enable Spark Auto-registration
The spark.sap.autoregister option is a Spark configuration parameter that specifies which data sources
should be automatically loaded on startup. This allows all tables that were previously loaded and saved in the
SAP HANA Vora catalog to be re-registered in the Spark context automatically.
Prerequisites
To use Spark auto-registration, the Discovery Service must be up and running.
Context
When you run the Thriftserver, for example, all tables will be automatically registered at startup if Spark autoregistration is enabled.
50
PUBLIC

Administration
To enable Spark auto-registration, you can set the Spark auto-registration option in the Spark defaults
configuration file or when executing spark-submit.
Procedure
Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) in
the spark-defaults.conf file:
Sample Code
spark.sap.autoregister com.sap.spark.vora
spark.vora.discovery <discovery_service_url>
Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) when
executing spark-submit:
Sample Code
spark-submit --conf spark.sap.autoregister=com.sap.spark.vora
--conf spark.vora.discovery=<discovery_service_url>
3.3
Start and Stop SAP HANA Vora Services
Use the cluster provisioning tool to start, stop, and restart the SAP HANA Vora services on your cluster.
Context
The task of managing the SAP HANA Vora services is handled by the Hadoop cluster manager. All actions you
take must be done through the cluster provisioning tool (Ambari or Cloudera), since the cluster manager will
otherwise not be able to keep track of the components that have been started.
To ensure that the SAP HANA Vora components are started in a way that enables them to operate together
correctly, it is important that you follow the bootstrapping guidelines.
Bear in mind that when you stop or restart the SAP HANA Vora engine instances, the data is removed
completely from the in-memory database. If SAP HANA Vora is needed to provide acceleration for a specific
query again, the fraction of data a certain instance was responsible for has to be reloaded from disk.
Note that Ambari is used in the procedure below. The procedure is similar for Cloudera.

Administration
PUBLIC
51
Procedure
1. On the Ambari dashboard, select a SAP HANA Vora service in the Services panel.
The Services summary tab shows how many instances of the selected SAP HANA Vora service are
running, for example:
2. On the Services page, you have the following options:

To start, stop, or restart all instances of the selected service, choose the appropriate option in the
Service Actions dropdown menu:
52
Option
Description
Start
Starts the SAP HANA Vora service on all hosts
Stop
Stops the SAP HANA Vora service on all hosts
Restart All
Stops and then starts the SAP HANA Vora service on all hosts
PUBLIC

Administration
Option
Description
Restart Vora <service>
Performs a rolling restart of the SAP HANA Vora service across all hosts.
You can specify the following:
Turn On Maintenance Mode
The number of instances to be started at a time
How long to wait between batches
The number of allowed restart failures
To only restart instances with stale configuration
To activate maintenance mode
Suppresses alerts generated by the SAP HANA Vora service
To start, stop, or restart the instances by host:

1. Click the SAP HANA Vora <service> link.
A list of hosts running the selected SAP HANA Vora service is displayed.
2. Click the relevant host link.
The component list and host details are displayed.
3. In the component list, locate the SAP HANA Vora service and choose the appropriate option from
the dropdown menu:
Next Steps
After restarting the SAP HANA Vora services, the tables no longer exist in the SAP HANA Vora in-memory
database. However, the associated metadata has been retained. To make the SAP HANA Vora engine
instances reload the data, you can use the markAllHostsAsFailed() function in the ClusterUtils object
as follows:
1. Start the Spark shell.
2. Run the following function, where discoveryAddress is the address of the Consul Discovery service. If
no argument is passed, the method will try to connect to the local Consul Discovery agent:
com.sap.spark.vora.client.ClusterUtils.markAllHostsAsFailed(discoveryAddress:
Option[String] = None): Unit
As a result, Spark will assume that the SAP HANA Vora engine instances are empty and reload the data
according to the metadata information.

Administration
PUBLIC
53
Related Information
3.4
Best Practices: Administration and Operations
By observing some basic best practices, you can achieve higher performance on your Hadoop cluster.
A Hadoop cluster typically involves a very large number of relatively similar computers. In general, a good way
to install a cluster is by distinguishing between four types of machines:
1. Cluster provisioning system with Ambari or Cloudera installed
2. Master cluster nodes that contain systems such as HDFS NameNodes and central cluster management
tools (such as the Yarn resource manager and ZooKeeper servers)
3. Worker nodes that do the actual computing and contain HDFS data
4. Jump boxes that contain only client components. These machines allow users to start their jobs.
Note that if you have a very specific setup where you have, for example, divided compute nodes and HDFS
data nodes, this might not be the best choice.
Related Information
HDFS [page 54]
Choosing a Cluster Manager [page 55]
Example Cluster Configuration Including a Client Machine (Jump Box) [page 55]
3.4.1 HDFS
By default HDFS stores three replicas of each data block on different machines. Besides the necessary fault
tolerance, this also increases data locality.
Be aware of the following, since this might affect the performance of the cluster when it is used in combination
with SAP HANA Vora:
If the data that is used for SQL processing is not evenly distributed this might lead to longer loading times
for tables. This might be the case if you delete a large amount of data (it will be unbalanced) or if you also
use HDFS for data that is not used for processing with SAP HANA Vora.
Using a lot of small files (that is, smaller than the block size of HDFS) will waste a lot of space.
Remember
It is important to keep the data that you use in SAP HANA Vora/Spark as evenly distributed as possible on
HDFS to increase speed. There are a number of HDFS tools available to re-balance the data.
54
PUBLIC

Administration
3.4.2 Choosing a Cluster Manager

The cluster manager is responsible for distributing tasks throughout the compute nodes of the cluster. Each
node that assumes computation tasks is managed by a cluster manager.
In order to run, an application requests resources from the cluster manager. If this is successful, the cluster
manager transfers the actual application to the nodes in question and starts it.
The cluster manager therefore serves as an abstraction layer for the application, allowing it to be developed
independently of the cluster setup. This means that Spark, as well as all its extensions for SAP HANA Vora, can
be installed on a single node and will then be automatically transferred to the compute nodes. The problem
with this, however, is that Spark itself also includes a cluster manager, called Spark standalone mode.
Logically, however, it is an independent system that is not related to the computational capabilities of Spark.
The system provided by SAP HANA Vora is completely independent of the cluster manager. If you are
deploying a test and development environment with a small number of nodes, we recommend that you choose
Sparks standalone cluster manager. For information about how to install it, see the Spark manual.
Your Hadoop distribution usually comes with a built-in cluster manager. In most cases, this is Yarn. Yarn
distinguishes between Node Managers, which are responsible for a compute node, and the Resource Manager,
which keeps track of the overall workload of the cluster and distributes tasks to the Node Managers.
Note
If your cluster manager has central components, such as the Resource Manager, you should put them on
separate machines that do not compute jobs.
Related Information
Spark Standalone Mode
3.4.3 Example Cluster Configuration Including a Client

Machine (Jump Box)
This example shows how a small Hadoop system consisting of 60 nodes in total can be configured.
Each node is quite small and contains 32 GB of RAM. Yarn is used as the cluster manager. The nodes are
configured as follows:
1 Ambari server
2 master nodes (Resource Manager, NameNodes, and ZooKeeper server)
56 worker/compute nodes
1 jump box containing client components
All components are provisioned by Ambari with the standard settings. Particularly noteworthy is the way the
jump box is configured to enable a user to easily deploy applications and use the platform.

Administration
PUBLIC
55
Each user is assigned a separate Linux user, including a home directory containing Spark binaries as well as a
shaded JAR of all the components and dependencies provided by SAP. Each user then has the following
directory structure:
/home/user/spark: Symlink to the current Spark installation
/home/user/sapjars: Shaded JARs
Each user also has a home directory on HDFS
For convenience, the environment variables are configured as follows in the .profile file:
# Include spark home
export SPARK_HOME="$HOME/spark"
# Hadoop conf dir
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export YARN_CONF_DIR="/etc/hadoop/conf"
export JAVA_HOME="/usr/jdk64/jdk1.7.0_67/"
export PATH="$PATH:$SPARK_HOME/bin"
To use the SAP HANA Vora Spark integration component, certain system-specific variables need to be
configured in Spark. See the developer manual for more details. For convenience, these are configured in the
spark-defaults.conf file so that all system-specific variables are located in one place:
spark.driver.extraJavaOptions
-XX:MaxPermSize=256m
# Uncomment the following line and enter your Amazon S3 secret access key, if
# you have one
# spark.vora.s3secretaccesskeyid
<S3 secret access key>
Based on this configuration, users can easily start a shell or deploy an application with the following
commands:
spark-shell --num-executors 3 --driver-memory 4g --executor-memory 2g
--master yarn-client --jars ~/sapjars/shaded.jar
spark-submit --class com.sap.spark.vora.example.ExampleQueryHDFS
--master yarn-client --jars sapjars/shaded.jar SparkVoraTrialProject-0.0.1.jar
56
PUBLIC

Administration
Security
When using a distributed system, you need to be sure that your data and processes support your business
needs without allowing unauthorized access to critical information. User errors, negligence, or attempted
manipulation of your system should not result in loss of information or processing time.
These demands on security apply likewise to SAP HANA Vora.
Security Guides
SAP HANA Vora functions as an execution engine within a Spark/Hadoop landscape. Therefore, the following
security guides outline all applicable security considerations:
Guide
Noteworthy Sections
Ambari Security Guide
Configuring Ambari and Hadoop for Kerberos
Cloudera Security Guide 5.5
/Cloudera Security Guide
Enabling Kerberos Authentication Using the Wizard
5.6
MapR Security Guide:
http://maprdocs.mapr.com/51/index.html#Security
Guide/SecurityOverview.html
Enabling and Disabling Security Features on Your Clus

ter:
Guide/c-enabling_and_disabling_security_fea
tures_on_your_cluster.html
Generating a maprticket from a Kerberos Ticket:
Guide/GeneratingMapRTicket.html
Spark Security
Full document
Related Information
Technical System Landscape [page 58]

Security
PUBLIC
57
4.1
Technical System Landscape
SAP HANA Vora integrates into the Hadoop ecosystem, as shown below.
When installed on nodes in an Ambari/Cloudera cluster, SAP HANA Vora becomes an available service that
can be added through the Ambari/Cloudera administration interface provided by the management node, in
parallel with existing services.
58
PUBLIC

Security
Important Disclaimers and Legal Information
Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system
environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and
completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP
intentionally or by SAP's gross negligence.
Accessibility
The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be
a binding guideline on how to ensure accessibility of software products. SAP in particular disclaims any liability in relation to this document. This disclaimer, however,
does not apply in cases of wilful misconduct or gross negligence of SAP. Furthermore, this document does not result in any direct or indirect contractual obligations of
SAP.
Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as
"sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun
does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.
Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does
not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any
damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for
transparency (see: http://help.sap.com/disclaimer).

Important Disclaimers and Legal Information
PUBLIC
59
go.sap.com/registration/
contact.html

No part of this publication may be reproduced or transmitted in any
form or for any purpose without the express permission of SAP SE
or an SAP affiliate company. The information contained herein may
be changed without prior notice.
Some software products marketed by SAP SE and its distributors
contain proprietary software components of other software
vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company
for informational purposes only, without representation or warranty
of any kind, and SAP or its affiliated companies shall not be liable for
errors or omissions with respect to the materials. The only
warranties for SAP or SAP affiliate company products and services
are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein
should be construed as constituting an additional warranty.
SAP and other SAP products and services mentioned herein as well
as their respective logos are trademarks or registered trademarks
of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the
trademarks of their respective companies.
Please see http://www.sap.com/corporate-en/legal/copyright/
index.epx for additional trademark information and notices.

SAP HANA Vora Installation Admin Guide en

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

SAP HANA Vora Installation Admin Guide en

Загружено:

Авторское право:

Доступные форматы

PUBLIC

SAP HANA Vora 1.2

SAP HANA Vora Installation and Administration

SAP HANA Vora and Apache Hadoop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

SAP HANA Vora Components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

SAP HANA Vora Packages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Installation and Bootstrapping Guidelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Collect Hadoop Cluster Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Installing SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Validate the SAP HANA Vora Installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Install the SAP HANA Vora Zeppelin Interpreter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .35

Connect SAP HANA Spark Controller to SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Connect SAP Lumira to SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Updating SAP HANA Vora. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

SAP HANA Vora Default Ports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Configure Proxy Settings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

Enable Spark Auto-registration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Start and Stop SAP HANA Vora Services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51

Best Practices: Administration and Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

SAP HANA Vora Installation and Administration Guide

Technical System Landscape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

SAP HANA Vora Installation and Administration Guide

Fast Query Execution

SAP HANA Integration

SAP HANA Vora and Apache Hadoop

SAP HANA Vora Installation and Administration Guide

An open operational framework for provisioning,

Cloudera Manager - Cloudera's automated cluster

The Hadoop Distributed File System.

HDFS Users Guide

A centralized service for maintaining configuration

Hadoops resource manager and job scheduler.

Apache Hadoop YARN

The Hadoop database.

A high-level data-flow language and execution

A module for structured and semi-structured data

Spark SQL and DataFrame Guide

A data warehouse infrastructure supporting data

A machine learning tool that runs on Spark.

Machine Learning Library (MLlib) Guide

SAP HANA Vora Installation and Administration Guide

In addition to this document, please refer to the following resources:

Release note for SAP HANA Vora

SAP Software Download Center

SAP HANA Vora on SAP Help Portal

SAP HANA Vora on SCN (SAP Community Net

SAP HANA Vora support component

HAN-VO: SAP HANA Vora

SAP HANA Vora Installation and Administration Guide

SAP HANA Vora Packages [page 9]

Ensure your Hadoop cluster is correctly set up and meets

Installation Prerequisites [page 9]

Installation and Bootstrapping Guidelines [page 13]

Collect and document essential information about your Ha

Collect Hadoop Cluster Information [page 16]

Download and install the package containing the SAP HANA

Installing SAP HANA Vora [page 16]

Validate the installation

Validate the SAP HANA Vora Installation [page 33]

Optionally enable the Zeppelin interpreter if you want to use

Install the SAP HANA Vora Zeppelin Interpreter [page 35]

Note that Zeppelin is still in the incubation phase.

Connect SAP HANA Spark Controller to SAP HANA Vora