Contents

1 Introduction .......................................... 4
    1.1
    1.2 Related Information ............................. 6
2 Installation .......................................... 7
    2.1 SAP HANA Vora Components ........................ 8
    2.2 SAP HANA Vora Packages .......................... 9
    2.3 Installation Prerequisites ...................... 9
        Hadoop Distributions ........................... 10
        Cluster Provisioning Tools ..................... 10
        Operating Systems .............................. 10
        Supported Platforms ............................ 11
        Cluster Sizing ................................. 11
        Required Components ............................ 12
        DLog Server Requirements ....................... 12
        Validation ..................................... 13
    2.4 Installation and Bootstrapping Guidelines ...... 13
    2.5
    2.6
    2.7
    2.8
    2.9
    2.10
    2.11
    2.12
3 Administration ....................................... 49
    3.1
    3.2
    3.3
    3.4
        HDFS ........................................... 54
        Choosing a Cluster Manager ..................... 55
        Example Cluster Configuration Including a Client Machine (Jump Box) ... 55
4 Security ............................................. 57
    4.1

PUBLIC
© 2016 SAP SE or an SAP affiliate company. All rights reserved.
Introduction
SAP HANA Vora provides an in-memory processing engine that is integrated into the Hadoop ecosystem and
Spark execution framework. Able to scale to thousands of nodes, it is designed for use in large distributed
clusters and for handling big data.
Data Analytics
SAP HANA Vora makes OLAP-style capabilities available for data on Hadoop, in particular a hierarchy
implementation that allows hierarchical data structures to be defined and complex computations to be
performed on different levels of the data. Extensions to Spark SQL also include enhancements to the data
source API that enable Spark SQL queries, or parts of them, to be pushed down to the SAP HANA Vora
processing engine.
1.1
The SAP HANA Vora solution is built on the Hadoop ecosystem, an open-source project providing a collection
of components that support distributed processing of large data sets across a cluster of machines. Hadoop
allows both structured and complex, unstructured data to be stored, accessed, and analyzed across the
cluster.
The main components used in this environment are shown in the figure below:
Component     Description
Ambari
Cloudera
HDFS
ZooKeeper     Apache ZooKeeper
YARN
HBase         Apache HBase
Pig           Apache Pig
Spark SQL
Hive          Apache Hive
MLlib
More Information
1.2 Related Information
Details

http://service.sap.com/sap/support/notes/2284507
https://support.sap.com/swdc
http://help.sap.com/hana_vora
http://scn.sap.com/blogs/vora/
SAP HANA Vora troubleshooting information: http://scn.sap.com/blogs/vora/2015/12/09/sap-hana-vora--troubleshooting
Installation
To install SAP HANA Vora, first familiarize yourself with the components it contains and the installation
package you require. Review the installation prerequisites to ensure a properly configured cluster and then
download and install the SAP HANA Vora package.
Complete the individual tasks in the following order:
Task                                                            See
Understand what components make up the SAP HANA Vora system     SAP HANA Vora Components [page 8]
Find out which package is required to install SAP HANA Vora
and where it is available
Check the overview to see how and where SAP HANA Vora
components should be deployed
Update your SAP HANA Vora installation with the latest
versions of the installation packages
Related Information
SAP HANA Vora Default Ports [page 47]
SAP HANA Vora Troubleshooting Information (SCN)
2.1 SAP HANA Vora Components
The SAP HANA Vora system consists of two main components: the SAP HANA Vora engine, which needs to be
installed on all compute nodes in the cluster, and the SAP HANA Vora Spark extension library, which provides
access to the SAP HANA Vora engine and its functional features.
Related Information
SAP HANA Vora Packages [page 9]
Installation and Bootstrapping Guidelines [page 13]
2.2 SAP HANA Vora Packages
To install the SAP HANA Vora system, you require a package containing the SAP HANA Vora engine and SAP
HANA Vora Spark extension library.
Separate installation packages are provided specifically for each of the cluster provisioning tools. This allows
the Ambari and Cloudera cluster provisioning tools to be used to install the SAP HANA Vora components on
the cluster. For MapR this is currently a manual installation.
The installation packages are as follows:
SAP HANA Vora for Ambari: VORA_AM<version>.TGZ
SAP HANA Vora for Cloudera: VORA_CL<version>.TGZ
SAP HANA Vora for MapR: VORA_MR<VERSION>.TGZ
The SAP HANA Vora Spark extension library contained in the packages consists of a JAR file
(spark-sap-datasources-<VERSION>-assembly.jar) with all necessary dependencies and a number of shell
scripts for using the SAP HANA Vora extension through Spark.
The packages can be downloaded from the SAP Software Download Center: https://support.sap.com/swdc
2.3 Installation Prerequisites
A Hadoop cluster is a prerequisite for installing SAP HANA Vora. Review the installation requirements to
ensure that the cluster you use is correctly set up.
Operating System    Compatibility Pack
SLES 11 SP3         libgcc_s1-4.7.2_20130108-0.17.2
                    libstdc++6-4.7.2_20130108-0.17.2
Install the RPM packages as follows, if they are not already installed by default:
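The install step for these packages can be sketched as a dry run; the helper function and the echo wrapper below are illustrative (package names are the versions listed above), and the echo would be removed to actually install:

```shell
# Print the zypper install command for each SLES 11 SP3 compatibility
# package listed above (dry run; remove "echo" to actually install).
print_install_cmds() {
    for pkg in libgcc_s1-4.7.2_20130108-0.17.2 libstdc++6-4.7.2_20130108-0.17.2; do
        echo "sudo zypper install $pkg"
    done
}
print_install_cmds
```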
Operating System    Compatibility Pack
RHEL 6.7            To run SAP HANA Vora on RHEL 6.7, an additional runtime environment for GCC 4.7 is
                    required, which you can add by installing the RPM package compat-sap-c++ (see also
                    SAP Note 2001528).
                    To be able to access the library, you need a subscription for "Red Hat Enterprise
                    Linux Server for SAP HANA". This allows you to subscribe your server to the "RHEL
                    Server SAP HANA" channel on the Red Hat Customer Portal or your local Satellite
                    server. After you have subscribed your server to the channel, the output of
                    yum repolist should contain the following:
For an up-to-date list of supported operating systems, see SAP Note 2284507.
Operating System    Hadoop Distribution               Hadoop
SLES 11 SP3         Ambari 2.2 / HDP 2.3              Hadoop 2.7.1
SLES 11 SP3         Cloudera 5.5/5.6 / CDH 5.5/5.6    Hadoop 2.6.0
RHEL 7.2            Ambari 2.2 / HDP 2.3              Hadoop 2.7.1
RHEL 6.7            Ambari 2.2 / HDP 2.3              Hadoop 2.7.1
RHEL 6.7            Cloudera 5.5/5.6 / CDH 5.5/5.6    Hadoop 2.6.0
RHEL 7.2            MapR 5.1                          Hadoop 2.7.0
RHEL 6.7            MapR 5.1                          Hadoop 2.7.0
More Information

https://hadoop.apache.org/docs/stable/
ZooKeeper 3.4.6: http://zookeeper.apache.org/releases.html
Spark 1.5.2: https://spark.apache.org/releases/spark-release-1-5-2.html
             https://spark.apache.org/docs/latest/running-on-yarn.html
Zeppelin v0.5.6: Optional; allows you to use the Zeppelin integration. Note that Zeppelin is still in
the incubation phase: https://zeppelin.incubator.apache.org/
Procedure
1. Install the libaio package as follows:

   Platform    Command
   RHEL        sudo yum install libaio
   SUSE        sudo zypper install libaio
Caution
Do not set the limit to a value larger than 1048576 or you may be unable to log in to your system
(notably on RHEL 7.1).
b. Log out or reboot so that the ulimit change takes effect.
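The caution above can be enforced when preparing the limits entry. A minimal sketch, assuming a hypothetical vora user and an example target of 65536 open files (the actual user name and value depend on your setup):

```shell
# Generate /etc/security/limits.conf entries for the open-file limit,
# refusing values above the safe maximum from the caution above.
MAX_SAFE_NOFILE=1048576   # larger values can prevent logins, notably on RHEL 7.1
TARGET_NOFILE=65536       # example target; adjust for your workload

if [ "$TARGET_NOFILE" -gt "$MAX_SAFE_NOFILE" ]; then
    echo "error: nofile limit must not exceed $MAX_SAFE_NOFILE" >&2
    exit 1
fi

# The two lines that would be appended to /etc/security/limits.conf:
printf 'vora soft nofile %s\nvora hard nofile %s\n' "$TARGET_NOFILE" "$TARGET_NOFILE"
```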
2.3.8 Validation
To ensure that the components have been correctly installed, run a sample Spark application on the cluster,
such as SparkPi, which calculates the approximate value of Pi.
In the Spark shell, execute the following:
Sample Code
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client \
  --num-executors 2 --driver-memory 512m --executor-memory 512m \
  --executor-cores 2 --queue default $SPARK_HOME/lib/spark-examples*.jar 10 2>/dev/null
You should see something like this:
Pi is roughly 3.140292
For more information, see Spark Examples
2.4 Installation and Bootstrapping Guidelines
You need to choose appropriate nodes when you deploy the SAP HANA Vora components on the cluster. An
overview of the different node types and how and where SAP HANA Vora components should be deployed is
given below.
Node Types

For the purposes of setting up a cluster, four different types of cluster nodes are distinguished:

Node Type          Description
Management node
Master nodes
Worker nodes       These are the compute nodes of the cluster. They contain components such as
                   DataNodes or NodeManagers.
Jump boxes         Contain only client components, such as the HDFS client, and serve as an entry
                   point for users to start compute jobs using Spark.
Component                      Description                                Installation
SAP HANA Vora Discovery                                                   Server mode / Client mode
SAP HANA Vora Distributed Log  Distributed log manager providing          Install on at least one node
                               persistence for the SAP HANA Vora          (master nodes); five nodes
                               catalog                                    recommended; no upper limit
SAP HANA Vora Thriftserver     Gateway compatible with the Hive JDBC      Install on a single node, typically
                               Driver                                     the jump box (recommended)
Bootstrapping
Bootstrapping ensures that the SAP HANA Vora components are installed and started in a way that enables
them to operate together correctly. The sequence of actions is as follows:
1. The cluster has already been set up, and core components such as HDFS, the Hadoop cluster manager,
YARN, and ZooKeeper are up and running. SAP HANA Vora Base has been installed and deployed on all hosts,
but no SAP HANA Vora services are running yet.
Note
The task of starting services is handled by the Hadoop cluster manager. All actions you take must be
done through the cluster provisioning tool (Ambari or Cloudera). Interference with this process will
mean that the cluster manager cannot keep track of the components that have been started.
Components are started if their dependencies are already up and running, otherwise they will wait or
stop execution.
2. Start the Discovery Service.
The Discovery Service is responsible for handling the bootstrapping process and needs to be installed on
all nodes of the cluster in either server or client mode. You need to have at least three server deployments.
This ensures high availability if a server dies (a server is not automatically restarted if this happens). All
remaining hosts should have client deployments.
One of the server deployments needs to be selected as the bootstrapping host. A bootstrapping host is
needed until the Discovery Service is up and running.
Since the Discovery Service is deployed on each node, all other SAP HANA Vora components on the node
can access it through localhost:8500. However, for test purposes and custom installations, it is
recommended that all SAP HANA Vora components have a parameter specifying the Discovery Service
deployment address and port. On a production system, this parameter is set to localhost:8500.
3. Start the Distributed Log.
The Distributed Log must be installed on at least one node. However, for the sake of redundancy, it is
recommended that you have five deployments of the Distributed Log (if sufficient resources are available).
This ensures high availability if a server dies (a server is not automatically restarted if this happens). There
is no upper restriction on the number of nodes.
4. Start the SAP HANA Vora Catalog.
You must have exactly one deployment of the catalog. It is recommended that it is deployed on one of the
nodes used by the Distributed Log.
5. Start SAP HANA Vora.
SAP HANA Vora must be deployed on all DataNode nodes.
6. Start the SAP HANA Vora Thriftserver.
The Thriftserver must be deployed on a single node, typically the jump box.
7. Start the SAP HANA Vora Tools.
The Tools must be deployed on the same node as the SAP HANA Vora Thriftserver.
2.5
Before proceeding with the installation, collect and document the following information about your Hadoop
cluster. You will need to have this information at hand during the installation.
Procedure
Make a note of the following information:
User and password for Ambari/Cloudera
Operating system user and password
HDFS user and password
Installation directories of Ambari/Cloudera, ZooKeeper, and so on
2.6
The SAP HANA Vora engine and extension library are contained in installation packages provided specifically
for each of the cluster provisioning tools.
The installation packages are as follows:
Ambari: VORA_AM<version>.TGZ
Cloudera: VORA_CL<version>.TGZ
MapR: VORA_MR<VERSION>.TGZ
The packages contain the following components, which you need to install and deploy on the cluster:
Component
Description
Vora Base
Vora Discovery
A distributed log manager providing persistence for the SAP HANA Vora catalog
Vora Catalog
Vora V2Server
Vora Thriftserver
Vora Tools
Note
If your Hadoop cluster requires an HTTP(S) proxy to access content through the HTTP(S) protocol, make
sure that the proxy is configured before starting SAP HANA Vora. For more information, see Configure
Proxy Settings [page 49].
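One common way to provide such a proxy configuration is through the standard proxy environment variables for the user running the services; the host, port, and exclusion list below are placeholders, not values from this guide:

```shell
# Hypothetical proxy settings for the user running the SAP HANA Vora
# services; replace proxy.example.com:8080 with your actual proxy.
export http_proxy="http://proxy.example.com:8080"
export https_proxy="http://proxy.example.com:8080"
export no_proxy="localhost,127.0.0.1"
```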
Procedure
Install SAP HANA Vora Using Ambari [page 17]
Install SAP HANA Vora Using Cloudera [page 22]
Installing SAP HANA Vora for MapR [page 29]
2.6.1 Install SAP HANA Vora Using Ambari
Procedure
1. Log on to the Ambari cluster management node.
2. Download VORA_AM<version>.TGZ from the SAP Software Download Center (https://support.sap.com/swdc).
3. Go to /var/lib/ambari-server/resources/stacks/HDP/2.3/services.
4. Copy VORA_AM<version>.TGZ to that directory and extract it.
5. Restart the Ambari server with the following command:
$ ambari-server restart
Depending on your cluster configuration, you may need to be the root user or a user with administrator
rights to do so.
6. Wait until the Ambari Administration Interface is up and running.
Ambari is now able to provision the SAP HANA Vora components on the Hadoop cluster.
7. On the Ambari dashboard, choose Actions > Add Service.
2.6.1.1
Procedure
1. On the Choose Services screen, select the Vora Base option and click Next.
2. On the Assign Slaves and Clients screen, add the Vora Base component to all hosts and click Next.
3. Customize the service.
No configuration is needed.
4. Deploy the service and complete the installation.
The libraries and binaries provided by SAP HANA Vora Base have been distributed to all machines in the
cluster.
Note
SAP HANA Vora Base does not run as a service (you cannot start it).
2.6.1.2
Procedure
1. On the Choose Services screen, select the Vora Discovery option and click Next.
2. On the Assign Masters screen, add the servers on which the Discovery service should run.
You need to deploy it on at least three masters (that is, in server mode).
Click Next.
3. On the Assign Slaves and Clients screen, add the service to all remaining hosts.
The Discovery service needs to be installed on all nodes in the cluster, but must not be deployed in both
server and client mode (mutually exclusive) on the same node.
Click Next.
4. Customize the service.
Parameter                      Description
vora_discovery_bootstrap_host  The server address of the bootstrap host. The bootstrap host can be any one of
                               the discovery masters you selected earlier. Note that you need to enter the fully
                               qualified domain name (FQDN). For example: mydiscserver1.mydomain.org
                               The bootstrap host is responsible for bootstrapping the service if no Discovery
                               service host is up and running. Once the initial servers have been added, you can
                               disable the bootstrap mode by removing the bootstrap host from this field and
                               restarting the server as a regular server.
vora_discovery_servers

Parameter                      Default Value
vora_discovery_log_dir         /var/log/vora-discovery
vora_discovery_log_level       WARNING
vora_discovery_data_dir        /var/local/vora-discovery
2.6.1.3
Procedure
1. On the Choose Services screen, select the Vora Distributed Log option and click Next.
2. On the Assign Masters screen, add the servers on which the Distributed Log should run. It must be
installed on at least one server; however, the recommended number of servers is five (if sufficient
resources are available).
Click Next.
3. Customize the service.
In the Advanced vora-dlog-config section, correct the default log settings and other default values if
necessary:
Parameter            Default Value
vora_dlog_log_dir    /var/log/vora-dlog
vora_dlog_log_level  WARNING
vora_dlog_store_dir  /var/local/vora-dlog
vora_dlog_port
2.6.1.4
Procedure
1. On the Choose Services screen, select the Vora Catalog option and click Next.
2. On the Assign Masters screen, add the server on which the Catalog service should run. It must be installed
on a single server. It is recommended that it is deployed on one of the servers used by the Distributed Log.
Click Next.
3. Customize the service.
In the Advanced vora-catalog-config section, correct the default log settings if necessary:
Parameter                              Default Value
vora_catalog_log_dir                   /var/log/vora-catalog
vora_catalog_log_level                 WARNING
vora_catalog_dlog_replication_factor   3

If you installed the SAP HANA Vora Distributed Log with N servers, you need to specify a number M <= N
as the replication factor.
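The M <= N constraint can be checked mechanically before applying the configuration; a small sketch with example values (the variable names are illustrative):

```shell
# Validate the catalog replication factor (M) against the number of
# Distributed Log servers (N). Values below are examples only.
DLOG_SERVERS=5          # N: Distributed Log deployments
REPLICATION_FACTOR=3    # M: vora_catalog_dlog_replication_factor

if [ "$REPLICATION_FACTOR" -le "$DLOG_SERVERS" ]; then
    echo "replication factor $REPLICATION_FACTOR is valid for $DLOG_SERVERS servers"
else
    echo "invalid: replication factor must be <= $DLOG_SERVERS" >&2
    exit 1
fi
```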
2.6.1.5
Procedure
1. On the Choose Services screen, select the Vora V2Server option and click Next.
2. On the Assign Slaves and Clients screen, add the service to the appropriate hosts.
We recommend that you add it to all data nodes, that is, each node that acts as a Spark worker node.
Click Next.
3. Customize the service.
In the Advanced vora-v2server-config section, modify the SAP HANA Vora V2Server configuration, if
needed. This includes, in particular, the file system location of the SAP HANA Vora engine logs:
Parameter                  Default Value
vora_v2server_log_dir      /var/log/vora-v2server
vora_v2server_log_level    WARNING
Results
You can confirm that the SAP HANA Vora engine has been successfully deployed on the cluster nodes by
verifying that the v2server process is running on them.
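That verification can be scripted; a small sketch, assuming pgrep is available on the node and the process name is v2server as stated above:

```shell
# Report whether a named process is running on this node; used here to
# check for the SAP HANA Vora engine process (v2server).
is_running() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1 is running"
    else
        echo "$1 is NOT running"
    fi
}
is_running v2server
```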
2.6.1.6
Procedure
1. On the Choose Services screen, select the Vora Thriftserver option and click Next.
2. On the Assign Masters screen, add the server on which the Thriftserver should run. This is typically the
jump box.
Click Next.
3. Customize the service.
In the Advanced vora-thriftserver-config section, enter the following required information:
Parameter                        Description
vora_thriftserver_java_home      Location of the Java installation that is used for the SAP HANA Vora Thriftserver
vora_thriftserver_spark_home     Location of the Spark installation that is used for the SAP HANA Vora Thriftserver

Correct the default log settings and other default values if necessary:

Parameter                        Default Value
vora_thriftserver_log_dir        /var/log/vora-thriftserver
vora_thriftserver_log_level      WARNING
vora_thriftserver_metastore_dir  /tmp/vora-thriftserver
Related Information
Enable Spark Auto-registration [page 50]
2.6.1.7
Procedure
1. On the Choose Services screen, select the Vora Tools option and click Next.
2. On the Assign Masters screen, add the server on which the Tools should run. This needs to be the same as
that of the Thriftserver and is typically the jump box.
Click Next.
3. Customize the service.
In the Advanced vora-tools-config section, correct the default log settings if necessary:
Parameter              Default Value
vora_tools_log_dir     /var/log/vora-tools
vora_tools_log_level   WARNING
2.6.2 Install SAP HANA Vora Using Cloudera
Procedure
1. Log on to the Cloudera cluster management node.
2.6.2.1
Procedure
1. On the Add a Service screen, select the Vora Base option and choose Continue.
2. On the role assignment page, click the box below Gateway.
The Hosts Selected dialog box appears.
3. Add the SAP HANA Vora Base component to all hosts and choose OK.
4. Choose Continue.
5. When the component has been successfully installed, choose Continue and then Finish.
The libraries and binaries provided by SAP HANA Vora Base have been distributed to all machines in the
cluster.
Note
SAP HANA Vora Base does not run as a service (you cannot start it).
2.6.2.2
Procedure
1. On the Add a Service screen, select the Vora Discovery option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora Discovery Server.
The Hosts Selected dialog box appears.
b. Add the servers on which the Discovery service should run. You need to deploy it on at least three
hosts (that is, in server mode).
c. Choose OK.
d. Click the box below Vora Discovery Client.
The Hosts Selected dialog box appears.
e. Add the service to all remaining hosts. The Discovery service needs to be installed on all nodes in the
cluster, but must not be deployed in both server and client mode (mutually exclusive) on the same
node.
f. Choose OK and then Continue.
3. On the review changes page, enter the following required information:
Parameter                      Description
vora_discovery_bootstrap_host  The server address of the bootstrap host. The bootstrap host can be any one of
                               the discovery servers you selected earlier. Note that you need to enter the fully
                               qualified domain name (FQDN). For example: mydiscserver1.mydomain.org
                               The bootstrap host is responsible for bootstrapping the service if no Discovery
                               service host is up and running. Once the initial servers have been added, you can
                               disable the bootstrap mode by removing the bootstrap host from this field and
                               restarting the server as a regular server.
vora_discovery_servers

Parameter                      Default Value
vora_discovery_log_dir         /var/log/vora-discovery
vora_discovery_log_level       WARNING
vora_discovery_data_dir        /var/local/vora-discovery
4. Choose Continue.
5. When the SAP HANA Vora Discovery service has been successfully started, choose Continue and then
Finish.
2.6.2.3
Procedure
1. On the Add a Service screen, select the Vora Distributed Log option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora Distributed Log Server.
The Hosts Selected dialog box appears.
b. Add the servers on which the Distributed Log service should run. It must be installed on at least one
server; however, the recommended number of servers is five (if sufficient resources are available).
c. Choose OK and then Continue.
3. On the review changes page, correct the default log settings and other default values if necessary:
Parameter            Default Value
vora_dlog_log_dir    /var/log/vora-dlog
vora_dlog_log_level  WARNING
vora_dlog_store_dir  /var/local/vora-dlog
vora_dlog_port
4. Choose Continue.
5. When the SAP HANA Vora Distributed Log service has been successfully started, choose Continue and
then Finish.
2.6.2.4
Procedure
1. On the Add a Service screen, select the Vora Catalog option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora Catalog Server.
The Hosts Selected dialog box appears.
b. Add the server on which the Catalog service should run. It must be installed on a single server. It is
recommended that it is deployed on one of the servers used by the Distributed Log.
c. Choose OK and then Continue.
3. On the review changes page, correct the default log settings if necessary:
Parameter                              Default Value
vora_catalog_log_dir                   /var/log/vora-catalog
vora_catalog_log_level                 WARNING
vora_catalog_dlog_replication_factor   3

If you installed the SAP HANA Vora Distributed Log with N servers, you need to specify a number M <= N
as the replication factor.
4. Choose Continue.
5. When the SAP HANA Vora Catalog service has been successfully started, choose Continue and then
Finish.
2.6.2.5
Procedure
1. On the Add a Service screen, select the Vora V2Server option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora V2Server Worker.
The Hosts Selected dialog box appears.
b. Select the appropriate hosts from the list. We recommend that you add the SAP HANA Vora V2Server
service to each node that acts as a Spark worker node.
c. Choose OK and then Continue.
3. On the review changes page, correct the default data directory and log settings if necessary:
Parameter                  Default Value
vora_v2server_log_dir      /var/log/vora-v2server
vora_v2server_log_level    WARNING
4. Choose Continue.
5. When the SAP HANA Vora V2Server service has been successfully started, choose Continue and then
Finish.
2.6.2.6
Prerequisites
To run the Thriftserver on Cloudera, you need to install Spark 1.5.2 on your jump box and set the
vora_thriftserver_spark_home parameter (see below) to this location. The Spark installation provided
by Cloudera does not include the necessary Spark Thriftserver packages.
Procedure
1. On the Add a Service screen, select the Vora Thriftserver option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora Thriftserver Master.
The Hosts Selected dialog box appears.
b. Add the server on which the Thriftserver should run. This is typically the jump box.
c. Choose OK and then Continue.
3. On the review changes page, enter the following required information:
Parameter                        Description
vora_thriftserver_spark_home     Location of the Spark installation that is used for the SAP HANA Vora Thriftserver
vora_thriftserver_java_home      Location of the Java installation that is used for the SAP HANA Vora Thriftserver

Parameter                        Default Value
vora_thriftserver_log_dir        /var/log/vora-thriftserver
vora_thriftserver_log_level      WARNING
vora_thriftserver_metastore_dir  /tmp/vora-thriftserver
4. Choose Continue.
5. When the SAP HANA Vora Thriftserver service has been successfully started, choose Continue and then
Finish.
Related Information
Enable Spark Auto-registration [page 50]
2.6.2.7
Procedure
1. On the Add a Service screen, select the Vora Tools option and choose Continue.
2. On the role assignment page:
a. Click the box below Vora Tools Master.
The Hosts Selected dialog box appears.
b. Add the server on which the Tools should run. This needs to be the same as that of the Thriftserver
and is typically the jump box.
c. Choose OK and then Continue.
3. On the review changes page, correct the default log settings if necessary:
Parameter              Default Value
vora_tools_log_dir     /var/log/vora-tools
vora_tools_log_level   WARNING
4. Choose Continue.
5. When the SAP HANA Vora Tools service has been successfully started, choose Continue and then Finish.
2.6.3 Installing SAP HANA Vora for MapR
Prerequisites
The MapR cluster is already set up.
For convenience, the MapR File System (MapR-FS) can be accessed through NFS on every node.
The mechanism for the MapR central configuration has been established.
Package Name                                 Description
mapr-vora-base-<version>.<arch>.rpm          SAP HANA Vora base package: This package contains all SAP HANA
                                             Vora executables and basic configuration files. It needs to be
                                             installed on each node of the cluster.
                                             Prerequisite: "mapr-core" package
mapr-vora-discovery-<version>.<arch>.rpm
mapr-vora-dlog-<version>.<arch>.rpm          Configuration files for the SAP HANA Vora Distributed Log
                                             Service. This service needs to be deployed on at least one node,
                                             however, the recommended number is five (if sufficient resources
                                             are available).
                                             Prerequisites: "mapr-vora-discovery" and the "libaio" library
mapr-vora-catalog-<version>.<arch>.rpm       SAP HANA Vora Catalog: The infrastructure for metadata, such as
                                             table definitions. This service needs to be deployed on a single
                                             node. It is recommended that it is deployed on one of the
                                             servers used by the Distributed Log.
                                             Prerequisite: "mapr-vora-dlog"
mapr-vora-v2server-<version>.<arch>.rpm
mapr-vora-thriftserver-<version>.<arch>.rpm  Configuration files for the Spark Thriftserver (including SAP
                                             HANA Vora extensions)
                                             Prerequisite: "mapr-spark" package
mapr-vora-tools-<version>.<arch>.rpm
Note
The MapR installer cannot yet be used to deploy the SAP HANA Vora components across the cluster. However,
the manual installation steps required can easily be automated using password-less SSH access, as
described in the MapR installation guide.
Procedure
1. Prepare for Installation [page 30]
2. Install the SAP HANA Vora Packages [page 31]
3. Configure SAP HANA Vora [page 32]
4. Start SAP HANA Vora [page 33]
2.6.3.1 Prepare for Installation
Procedure
1. Create a group "vora" and user "vora" on all nodes of the cluster.
When adding a user to the cluster nodes, make sure that the user ID (UID) is always the same. The same
applies to the group ID (GID). For example:
groupadd vora --gid 4999
useradd vora --uid 4999 -g vora
2. Download the file VORA_MR<VERSION>.TGZ from the SAP Software Download Center (https://support.sap.com/swdc).
Note
Since you need to be able to access the installation files from all nodes of the cluster, you might want to
move the files to shared storage.
Tip
Using MapR-FS NFS, you could move the files to /mapr/<cluster name>/user/mapr/vora-install. This is
equivalent to maprfs://user/mapr/vora-install.
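The equivalence in the tip can be expressed as a small helper; the function below is hypothetical and assumes the /mapr/<cluster name>/ mount convention:

```shell
# Translate an NFS-mounted MapR-FS path into the equivalent maprfs:// URI
# by stripping the /mapr/<cluster name>/ prefix.
nfs_to_maprfs() {
    printf '%s\n' "$1" | sed -E 's#^/mapr/[^/]+/#maprfs://#'
}

nfs_to_maprfs /mapr/my.cluster.com/user/mapr/vora-install
# prints: maprfs://user/mapr/vora-install
```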
2.6.3.2 Install the SAP HANA Vora Packages
Install the SAP HANA Vora packages on the appropriate nodes of the cluster.
Context
It is recommended that you distribute the services across the cluster as follows:
On all nodes: Deploy the packages mapr-vora-base and mapr-vora-discovery. Include the
Zookeeper and CLDB nodes.
On some nodes (minimum one, recommended five): Deploy the package mapr-vora-dlog.
On a single node: Deploy the package mapr-vora-catalog. It is recommended that it is deployed on one
of the servers used by the Distributed Log.
On most nodes: Deploy the package mapr-vora-v2server (SAP HANA Vora SQL engine).
On jump nodes: Deploy the packages mapr-vora-thriftserver and mapr-vora-tools.
Perform the steps outlined below on all nodes of the cluster.
Procedure
1. Log on to a cluster node with an administrative user, for example, the MapR user.
2. Navigate to the installation directory. For example:
cd /mapr/<cluster name>/user/mapr/vora-install
3. Install the packages as follows:
Red Hat
sudo yum install <package_file_name>
SUSE
sudo zypper install <package_file_name>
2.6.3.3 Configure SAP HANA Vora
After the installation of the packages, you can adjust the SAP HANA Vora configuration to suit your own
requirements.
Context
The SAP HANA Vora configuration is contained in two configuration files.
Default settings
The file /opt/mapr/conf/conf.d/vora_default_settings.sh lists all configuration parameters for
the SAP HANA Vora services. It is implemented as a shell script that stores the settings in environment
variables and is structured into functions, one for each service. Every configuration parameter
has a description, an allowed value range, and a default value.
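The layout described above can be pictured as follows. This is a hypothetical excerpt: the function name, parameter name, and default value are invented for illustration and are not the shipped settings.

```shell
# Illustrative excerpt in the style of vora_default_settings.sh:
# one function per service, one exported variable per parameter.
configure_vora_catalog()
{
  # Description: TCP port the catalog server listens on (invented example)
  # Allowed values: 1024-65535
  # Default: 2204
  export VORA_CATALOG_PORT="${VORA_CATALOG_PORT:-2204}"
}
```

The `${VAR:-default}` pattern means a value already present in the environment wins over the default, which is how the start settings can later override individual parameters.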
Start settings
When a service is started, it is often necessary to consider the actual environment of a node or cluster to
derive or overwrite the default settings. The file /opt/mapr/conf/conf.d/vora_start_settings.sh
takes the default settings and changes improper values.
If possible, limit the adjustments you need to make to the configuration to the default settings file.
Procedure
1. Copy the file /opt/mapr/conf/conf.d/vora_default_settings.sh to a different local directory. For
example:
cp /opt/mapr/conf/conf.d/vora_default_settings.sh /tmp/vora_default_settings.sh
2. Edit the temporary configuration file with a text editor.
3. Upload the temporary configuration file to the central configuration:
hadoop fs -mkdir -p /var/mapr/configuration/conf/conf.d
hadoop fs -put /tmp/vora_default_settings.sh /var/mapr/configuration/conf/conf.d
After some time, the central configuration is replicated to all cluster nodes.
The same procedure can be applied to the start settings file, if required.
2.6.3.4 Start SAP HANA Vora
Integrate the new services into the MapR cluster and launch them.
Procedure
1. Execute the following on all cluster nodes:
sudo /opt/mapr/server/configure.sh -R
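If password-less SSH access has been set up (see the preparation step), this command can be scripted from one machine. The node names below are placeholders; the leading `echo` prints the command instead of running it, so drop it to execute for real:

```shell
# Sketch: run configure.sh -R on every cluster node over SSH.
# Replace the placeholder node list with your actual host names.
NODES="node1 node2 node3"
for node in $NODES; do
  echo ssh "$node" sudo /opt/mapr/server/configure.sh -R
done
```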
2. Log on to the MapR Control System and verify the service status on the various cluster nodes.
2.7 Validate the Installation
To check that the SAP HANA Vora engine and extension library have been correctly installed and that you can
use the SAP HANA Vora features in Spark, create a table and load data into it from a file stored in HDFS.
Prerequisites
You have already successfully deployed the SAP HANA Vora components on the cluster and the instances
are running.
You have already installed Spark.
Context
The location of the SAP HANA Vora spark extension depends on your installation:
Ambari, for example: /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/
package/lib/vora-spark
Cloudera, for example: /opt/cloudera/parcels/SAPHanaVora-1.2.35.97/lib/vora-spark
It contains the following folders:
lib/: Contains the spark-sap-datasources-<VERSION>-assembly.jar file with all necessary
dependencies (excluding Spark).
bin/: Contains scripts for ease of use.
META-INF/: Contains the pom.properties and pom.xml files.
Procedure
1. Create a file in HDFS. Note that in this example the test file, test.csv, is stored in a directory set up for
the user "vora" (/user/vora):
Sample Code
echo "1,2,Hello" > test.csv
hadoop fs -put test.csv
hadoop fs -cat /user/vora/test.csv
1,2,Hello
2. Open a Spark shell, for example, by using the shell script:
/<vora-spark-extension-path>/vora-spark/bin/start-spark-shell.sh
3. Enter the following statements in the Spark shell to create a table and check that it has been successfully
created:
scala> import org.apache.spark.sql.SapSQLContext
scala> val vc = new SapSQLContext(sc)
scala> val testsql = """
CREATE TABLE table001 (a1 double, a2 int, a3 string)
USING com.sap.spark.vora
OPTIONS (
tablename "table001",
paths "/user/vora/test.csv"
)"""
scala> vc.sql(testsql)
scala> vc.sql("show tables").show
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| table001|      false|
+---------+-----------+
scala> vc.sql("SELECT * FROM table001").show
+---+--+-----+
| a1|a2|   a3|
+---+--+-----+
|1.0| 2|Hello|
+---+--+-----+
scala > <Ctrl-C to quit>
Results
You have now successfully validated the SAP HANA Vora extension and can use it as follows:
The JAR file in the lib folder (spark-sap-datasources-VERSION-assembly.jar) can be provided to
Spark using the --jars option.
For example, assuming the spark-shell command is on the user's path:
$ spark-shell --jars /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/package/lib/vora-spark/lib/spark-sap-datasources-VERSION-assembly.jar
Alternatively, the shell scripts in the bin folder can be used to run a Spark shell with the SAP HANA Vora
extension library. To do so, the SPARK_HOME environment variable needs to point to the Spark folder on
the jump box.
You can then start the Spark shell in Yarn client mode as follows:
$ ./start-spark-shell.sh --master yarn-client
2.8 Install the SAP HANA Vora Zeppelin Interpreter
Zeppelin is a graphical user interface that allows you, as a data scientist, to interact easily with a cluster. The
SAP HANA Vora Spark extension provides an interpreter for the Zeppelin user interface.
Prerequisites
You require Zeppelin 0.5.6 built against Spark 1.5.2, Hadoop 2.6, and Yarn, installed on one of the cluster
nodes (most likely the jump box):
You can build a compatible Zeppelin version as follows (you need Maven 3.1 or higher):
$ git clone https://github.com/apache/incubator-zeppelin.git
$ cd incubator-zeppelin
$ git checkout v0.5.6
$ mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn -Pbuild-distr
To build Zeppelin for MapR 5.x distributions, you need to enable the "mapr50" build profile. The Maven build
call for MapR 5.x distributions then looks as follows:
$ mvn clean package -DskipTests -Pspark-1.5 -Pmapr50 -Pyarn -Pbuild-distr
After the build process has completed, you should have a tar.gz package in the following directory:
./zeppelin-distribution/target
Context
The SAP HANA Vora extension library has its own SQL context class. A modified Zeppelin interpreter is
therefore required to allow Zeppelin to run in the modified context. To enable the interpreter, you need to
register it with Zeppelin.
Restriction
Zeppelin is still in the incubation stage. The steps below are provided for guidance only.
Procedure
1. Copy spark-sap-datasources-<VERSION>-assembly.jar to <ZEPPELIN_HOME>/interpreter/
spark:
$ cp ~/vora-spark/lib/spark-sap-datasources-<VERSION>-assembly.jar \
<ZEPPELIN_HOME>/interpreter/spark/spark-sap-datasources-assembly.jar
Note
The location of the spark-sap-datasources-<VERSION>-assembly.jar file depends on your
installation:
Ambari, for example: /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/package/lib/vora-spark/lib/
Cloudera, for example: /opt/cloudera/parcels/SAPHanaVora-1.2.35.97/lib/vora-spark/lib
<ZEPPELIN_HOME> refers to the directory to which the Zeppelin binaries have been extracted.
2. Combine the Zeppelin Spark interpreter JAR with the spark-sap-datasources-assembly JAR,
replacing the versions as appropriate:
$ cd <ZEPPELIN_HOME>/interpreter/spark
$ mkdir tmp
$ (cd tmp; jar -xf ../spark-sap-datasources-<VERSION>-assembly.jar)
$ (cd tmp; jar -xf ../zeppelin-spark-<VERSION>-incubating.jar)
$ jar -cvf zeppelin-spark-sap-combined.jar -C tmp .
$ # remove the old JARs
$ rm spark-sap-datasources-<VERSION>-assembly.jar
$ rm zeppelin-spark-<VERSION>-incubating.jar
3. Configure the Zeppelin environment variables in the <ZEPPELIN_HOME>/conf/zeppelin-env.sh file.
Example
1. cp $ZEPPELIN_HOME/conf/zeppelin-env.sh.template $ZEPPELIN_HOME/conf/zeppelin-env.sh
2. chmod 0755 $ZEPPELIN_HOME/conf/zeppelin-env.sh
3. vi $ZEPPELIN_HOME/conf/zeppelin-env.sh
4. Insert the required variables and save your changes.
Note
Zeppelin also requires the environment variables SPARK_HOME and HADOOP_CONF_DIR to be set. If
these are not already set, you can add them to the zeppelin-env.sh file as well.
4. Add the interpreter class org.apache.spark.sql.SapSqlInterpreter to the
zeppelin.interpreters property in the <ZEPPELIN_HOME>/conf/zeppelin-site.xml file:
...
<property>
  <name>zeppelin.interpreters</name>
  <value>INTERPRETER_1,...,INTERPRETER_N,org.apache.spark.sql.SapSqlInterpreter</value>
  <description>Comma separated interpreter configurations.
  First interpreter becomes the default</description>
</property>
...
Note
Make sure that the SAP interpreter class "org.apache.spark.sql.SapSqlInterpreter" occurs after the
Spark interpreter class "org.apache.zeppelin.spark.SparkInterpreter" in the resulting list of
interpreters.
5. For HDP with Ambari only: Update the YARN configuration as follows:
a. Check the installed HDP version (<HDP_VERSION>), for example, from the following directory
name: /usr/hdp/<HDP_VERSION>
b. On the Ambari administration interface, select the YARN service and choose the Advanced Configs tab. Scroll down to the Custom yarn-site section and choose Add Property.
Note
The log files are available as follows:
<ZEPPELIN_HOME>/logs/zeppelin-*-.log: Contains the web UI related output.
<ZEPPELIN_HOME>/logs/zeppelin-interpreter-*-.log: Contains the output you would see in a Spark shell.
2.9 Configure the Spark Controller to Use SAP HANA Vora
Configure the Spark controller to use SAP HANA Vora. This allows you to connect from SAP HANA to SAP
HANA Vora and query SAP HANA Vora tables.
Prerequisites
The Spark controller has been installed and configured. For more information, see Set up SAP HANA
Spark Controller in the SAP HANA Administration Guide.
When installing the Spark controller as described in Set up SAP HANA Spark Controller, the following
steps are not necessary:
Install Spark Assembly Files and Dependent Libraries
The three datanucleus artifacts listed in this section are not needed when you run the Spark
controller with SAP HANA Vora:
- datanucleus-rdbms
- datanucleus-api-jdo
- datanucleus-core
Do not download and copy these artifacts to HDFS.
Configure Hive Metastore
You do not need to copy the hive-site.xml when you run the Spark controller with SAP HANA Vora.
If you do copy the datanucleus* artifacts and hive-site.xml, you might encounter issues unless you
have a valid Hive installation that is appropriately configured and your Hive metastore is running properly.
Context
Restriction
MapR does not yet support the SAP HANA Spark controller. For more information, see SAP Note 2284507.
Procedure
1. Make the SAP HANA Vora data sources package available to the Spark controller.
Copy spark-sap-datasources-<VERSION>-assembly.jar to the folder /usr/sap/spark/
controller/lib/.
Make sure that you copy the same version that you are using to create tables. Compatibility between
different packages is not always guaranteed.
2. Configure the Spark controller.
In the Spark controller configuration file /usr/sap/spark/controller/conf/hanaes-site.xml,
change the value of the property sap.hana.hadoop.datastore from 'hive' to 'vora'. It should look like
this:
<property>
  <name>sap.hana.hadoop.datastore</name>
  <value>vora</value>
  <final>true</final>
</property>
3. Restart the Spark controller.
For the configuration changes to take effect, restart the Spark controller, for example, using the following
commands:
$ cd /usr/sap/spark/controller/bin
$ ./hanaes stop
$ ./hanaes start
4. Verify the configuration changes.
To verify whether the configuration changes were successful, check the Spark controller log
file: /var/log/hanaes/hana_controller.log
After initialization, the file should contain the following lines at the end:
(DATE and TIME) INFO ...
(DATE and TIME) INFO ...
(DATE and TIME) INFO ...
(DATE and TIME) INFO ...
If these lines are missing, double-check whether the spark-sap-datasources-<VERSION>-assembly.jar is present and the configuration settings are correct.
Results
After successful configuration, you can see the tables stored in SAP HANA Vora in SAP HANA Studio, and you
can add virtual tables and submit queries, as described in the SAP HANA Spark Controller documentation.
Related Information
SAP HANA Spark Controller
SAP Note 2284507
2.10 Use SAP HANA Vora with SAP Lumira
Prerequisites
You need SAP Lumira version 1.29 or higher.
Context
To use SAP Lumira with SAP HANA Vora, you need to install the relevant drivers in SAP Lumira to be able to
connect from SAP Lumira using JDBC. You can then create a connection to SAP HANA Vora using the SAP
HANA Vora Thrift server.
Procedure
1. Install the JDBC driver. You need to use the Spark drivers.
a. Open SAP Lumira and choose Preferences > SQL Drivers.
b. Select Generic JDBC datasource JDBC Drivers and choose Install Drivers.
c. Select Generic JDBC datasource JDBC Drivers and choose Next. Note that the green tick indicates
that the drivers are installed.
d. Enter the connection details:

                     Value
User name/password   lumira/lumira
JDBC URL             jdbc:spark://<host>:<port>/default;CatalogSchemaSwitch=0;UseNativeQuery=1
JDBC Class           com.simba.spark.jdbc4.Driver
e. Choose Connect.
You should now see the CATALOG_VIEW, where you can select tables and enter SQL queries.
4. Use Beeline, a JDBC client, to register tables created in SAP HANA Vora in the Thrift server.
Note
Table definitions are stored in the SAP HANA Vora catalog. This allows you to register or re-register
tables when you start or restart the Thrift server. The tables are persisted as long as the Thrift
server is connected.
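A hedged sketch of such a Beeline call is shown below. The host, port, and credentials are placeholders for your landscape, and the leading `echo` only prints the command; drop it to actually connect. The REGISTER ALL TABLES statement shown here is the Vora data source registration statement; verify the exact syntax against your SAP HANA Vora version.

```shell
# Re-register the Vora tables in the Thrift server via Beeline (sketch).
THRIFT_URL="jdbc:hive2://<thriftserver-host>:<port>/"
REGISTER_SQL="REGISTER ALL TABLES USING com.sap.spark.vora"
echo beeline -u "$THRIFT_URL" -n "<user>" -p "<password>" -e "$REGISTER_SQL"
```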
5. View the data in SAP Lumira.
a. In SAP Lumira, refresh the CATALOG_VIEW (see step 3 above) by choosing Previous and then Next.
b. Drill down in the CATALOG_VIEW into Spark to see the tables available on the Thrift server.
c. In the Query field, enter a select statement and choose Preview. Note that you need to use the same
format for select statements as in the Beeline command line client.
A preview of the selected data is displayed.
d. Use the standard SAP Lumira functionality to create a report and visualize the data.
Related Information
SAP Lumira
2.11 Update SAP HANA Vora
Update your SAP HANA Vora installation by downloading and installing the latest versions of the installation
packages.
Remember
If Zeppelin has been configured to support the SAP HANA Vora Spark extension library, you will also need to
update the library in the <ZEPPELIN_HOME>/interpreter/spark directory.
Restriction
Note that when upgrading from SAP HANA Vora 1.1 to SAP HANA Vora 1.2, the ZooKeeper catalog is
replaced by the SAP HANA Vora catalog. A migration tool is not available for automatically transferring the
ZooKeeper catalog contents to the SAP HANA Vora catalog.
Related Information
Update SAP HANA Vora Using Ambari [page 44]
Update SAP HANA Vora Using Cloudera [page 46]
Update SAP HANA Vora for MapR [page 47]
Install the SAP HANA Vora Zeppelin Interpreter [page 35]
2.11.1 Update SAP HANA Vora Using Ambari

Procedure
1. Stop the SAP HANA Vora services.
a. In the Services panel on the dashboard, select a SAP HANA Vora service.
b. In the Service Actions dropdown menu on the Services page, choose Stop.
c. Repeat for all other SAP HANA Vora services.
2. Remove the services.
Run the following command from any machine where curl is available, for example, the management node
of the cluster, replacing the placeholders with appropriate values:
curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -X DELETE -H 'X-Requested-By:admin' \
  http://<YOUR_MGMT_NODE_FQDN>:8080/api/v1/clusters/<YOUR_CLUSTER_NAME>/services/<SERVICE_NAME>
Replace SERVICE_NAME as follows:

Service             service_name
                    VORA
Vora Base           HANA_VORA_BASE
Vora Catalog        HANA_VORA_CATALOG
Vora Discovery      HANA_VORA_DISCOVERY
Vora Dlog           HANA_VORA_DLOG
Vora Thriftserver   HANA_VORA_THRIFTSERVER
Vora Tools          HANA_VORA_TOOLS
Vora V2Server       HANA_VORA_V2SERVER
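One way to script the removal of all services is shown below. This is our own sketch: verify the service names against the table above and replace the host, cluster name, and credentials before use. The `echo` only prints each URL; swap it for the commented curl call to execute.

```shell
# Delete every SAP HANA Vora service via the Ambari REST API (sketch).
AMBARI_HOST="<YOUR_MGMT_NODE_FQDN>:8080"
CLUSTER="<YOUR_CLUSTER_NAME>"
for svc in VORA HANA_VORA_BASE HANA_VORA_CATALOG HANA_VORA_DISCOVERY \
           HANA_VORA_DLOG HANA_VORA_THRIFTSERVER HANA_VORA_TOOLS HANA_VORA_V2SERVER; do
  url="http://$AMBARI_HOST/api/v1/clusters/$CLUSTER/services/$svc"
  echo "$url"
  # curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -X DELETE -H 'X-Requested-By:admin' "$url"
done
```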
Note
If a service is shown as stopped in the Ambari UI, but Ambari responds that it is still running when you
try to remove it, you can use the following commands to stop it:
To stop a component, run the following command for every component of the SAP HANA Vora service:
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$COMPONENT_MACHINE/host_components/$COMPONENT_NAME
To stop a service, run the following command once for the SAP HANA Vora service:
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/services/$SERVICENAME
d. Go to /var/lib/ambari-server/resources/stacks/HDP/<HDP_version>/services.
e. Copy VORA_AM<version>.TGZ to that directory and extract it.
4. Restart the Ambari server with the following command:
$ ambari-server restart
Depending on your cluster configuration, you may need to be the root user or a user with administrator
rights to do so.
Related Information
Install SAP HANA Vora Using Ambari [page 17]
2.11.2 Update SAP HANA Vora Using Cloudera

Procedure
1. Stop the SAP HANA Vora services.
a. On the Cloudera Manager Home page, click to the right of each SAP HANA Vora service and choose
Stop in the dropdown menu.
b. Choose Stop to confirm.
When you see a Finished status, the service has stopped.
2. Delete the SAP HANA Vora services.
a. On the Home page, click to the right of each SAP HANA Vora service and choose Delete in the
dropdown menu.
b. Choose Delete to confirm.
3. Delete the parcels.
a. Choose Hosts and then the Parcels tab.
b. Choose the Deactivate button next to SAPHanaVora and confirm.
c. In the dropdown menu next to SAPHanaVora, choose Remove From Hosts and confirm.
d. In the dropdown menu next to SAPHanaVora, choose Delete and confirm.
e. Delete the SAP HANA Vora files in the directories /opt/cloudera/csd and /opt/cloudera/parcel-repo/ on the management node.
4. Install the new version of the SAP HANA Vora engine according to the installation procedure. See Install
SAP HANA Vora Using Cloudera.
Related Information
Install SAP HANA Vora Using Cloudera [page 22]
2.11.3 Update SAP HANA Vora for MapR

Prerequisites
In order to avoid data loss:
Use the same hosts as before for the Distributed Log service
Do not change the persistency of the Distributed Log service
Procedure
1. Stop the SAP HANA Vora services completely, either using the MapR Control System or the maprcli
command line tool.
2. Back up the configuration file:
cd /opt/mapr/conf/conf.d
cp vora_default_settings.sh vora_default_settings.sh.bak
3. On all cluster nodes, remove the "mapr-vora-base" package. This will also remove all dependent SAP
HANA Vora packages:
yum remove mapr-vora-base
4. Re-install SAP HANA Vora as described in Installing SAP HANA Vora for MapR.
Adjust the configuration file vora_default_settings.sh based on your previous settings.
Related Information
Installing SAP HANA Vora for MapR [page 29]
2.12 Port Numbers
Component          Port Number
Zeppelin           9099
Thrift server      49155
                   9225
Ambari             8080
Cloudera Manager   7180
Administration
There are some standard administration tasks you need to perform and best practices for the ongoing
operation of your SAP HANA Vora services and Hadoop cluster.
See the following topics, including how to start, stop, and restart the SAP HANA Vora services on your cluster.
Related Information
SAP HANA Vora Troubleshooting Information (SCN)
3.1 Set Up Proxy Settings
If your cluster runs behind a proxy, you need to set up your proxy settings correctly so that the SAP HANA
Vora engine and Spark are able to access external services, such as Amazon S3.
Procedure
1. Make sure that the following environment variables have been configured with the appropriate URLs in
the /etc/environment file:
http_proxy
HTTP_PROXY
https_proxy
HTTPS_PROXY
FTP_PROXY
ftp_proxy
no_proxy
Sample Code

export http_proxy=http://proxy.example.com:8080
export HTTP_PROXY=http://proxy.example.com:8080
export https_proxy=https://proxy.example.com:8080
export HTTPS_PROXY=https://proxy.example.com:8080
If any of the variables are not set up properly, make the necessary corrections and then restart the SAP
HANA Vora service using the cluster provisioning tool (for example, Ambari or Cloudera Manager).
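A common source of trouble is a mismatch between the lower- and upper-case variants of a proxy variable, since different tools read different ones. The helper below is a small sketch of ours for spotting such mismatches before restarting anything:

```shell
# Report proxy variables whose lower- and upper-case variants disagree.
check_proxy_pair()
{
  lower_val="$(printenv "$1" || true)"
  upper_val="$(printenv "$2" || true)"
  if [ "$lower_val" != "$upper_val" ]; then
    echo "MISMATCH: $1='$lower_val' vs $2='$upper_val'"
  fi
}
check_proxy_pair http_proxy HTTP_PROXY
check_proxy_pair https_proxy HTTPS_PROXY
check_proxy_pair ftp_proxy FTP_PROXY
```

No output means the pairs agree (or are both unset); any MISMATCH line points to a variable to correct in /etc/environment.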
2. Make sure that the following variables are passed to the JVM running the Spark driver:
http.proxyHost
http.proxyPort
https.proxyHost
https.proxyPort
You can do this by setting the extraJavaOptions property in the spark-defaults.conf file.
If you are running Spark in YARN client mode, you can set the property as follows:
spark.yarn.am.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> -Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> -Dhttps.proxyPort=<HTTPS_PORT>
If you are running Spark in YARN cluster mode, you can set the property as follows:
spark.driver.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> -Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> -Dhttps.proxyPort=<HTTPS_PORT>
3.2 Spark Auto-Registration
The spark.sap.autoregister option is a Spark configuration parameter that specifies which data sources
should be automatically loaded on startup. This allows all tables that were previously loaded and saved in the
SAP HANA Vora catalog to be re-registered in the Spark context automatically.
Prerequisites
To use Spark auto-registration, the Discovery Service must be up and running.
Context
When you run the Thrift server, for example, all tables will be automatically registered at startup if Spark auto-registration is enabled.
To enable Spark auto-registration, you can set the Spark auto-registration option in the Spark defaults
configuration file or when executing spark-submit.
Procedure
Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) in
the spark-defaults.conf file:
Sample Code
spark.sap.autoregister com.sap.spark.vora
spark.vora.discovery <discovery_service_url>
Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) when
executing spark-submit:
Sample Code
spark-submit --conf spark.sap.autoregister=com.sap.spark.vora
--conf spark.vora.discovery=<discovery_service_url>
3.3 Start, Stop, and Restart the SAP HANA Vora Services
Use the cluster provisioning tool to start, stop, and restart the SAP HANA Vora services on your cluster.
Context
The task of managing the SAP HANA Vora services is handled by the Hadoop cluster manager. All actions
must be performed through the cluster provisioning tool (Ambari or Cloudera Manager), since the cluster
manager will otherwise not be able to keep track of the components that have been started.
To ensure that the SAP HANA Vora components are started in a way that enables them to operate together
correctly, it is important that you follow the bootstrapping guidelines.
Bear in mind that when you stop or restart the SAP HANA Vora engine instances, the data is removed
completely from the in-memory database. If SAP HANA Vora is needed to provide acceleration for a specific
query again, the fraction of data a certain instance was responsible for has to be reloaded from disk.
Note that Ambari is used in the procedure below. The procedure is similar for Cloudera.
Procedure
1. On the Ambari dashboard, select a SAP HANA Vora service in the Services panel.
The Services summary tab shows how many instances of the selected SAP HANA Vora service are
running, for example:
2. In the Service Actions dropdown menu, choose an action:

Option            Description
Start             Starts the SAP HANA Vora service on all hosts
Stop              Stops the SAP HANA Vora service on all hosts
Restart All       Stops and then starts the SAP HANA Vora service on all hosts
Rolling Restart   Performs a rolling restart of the SAP HANA Vora service across all hosts. You can specify additional parameters for the rolling restart.
Next Steps
After restarting the SAP HANA Vora services, the tables no longer exist in the SAP HANA Vora in-memory
database. However, the associated metadata has been retained. To make the SAP HANA Vora engine
instances reload the data, you can use the markAllHostsAsFailed() function in the ClusterUtils object
as follows:
1. Start the Spark shell.
2. Run the following function, where discoveryAddress is the address of the Consul Discovery service. If
no argument is passed, the method will try to connect to the local Consul Discovery agent:
com.sap.spark.vora.client.ClusterUtils.markAllHostsAsFailed(discoveryAddress:
Option[String] = None): Unit
As a result, Spark will assume that the SAP HANA Vora engine instances are empty and reload the data
according to the metadata information.
Related Information
Installation and Bootstrapping Guidelines [page 13]
3.4 Best Practices
By observing some basic best practices, you can achieve higher performance on your Hadoop cluster.
A Hadoop cluster typically involves a very large number of relatively similar computers. In general, a good way
to install a cluster is by distinguishing between four types of machines:
1. Cluster provisioning system with Ambari or Cloudera installed
2. Master cluster nodes that contain systems such as HDFS NameNodes and central cluster management
tools (such as the Yarn resource manager and ZooKeeper servers)
3. Worker nodes that do the actual computing and contain HDFS data
4. Jump boxes that contain only client components. These machines allow users to start their jobs.
Note that if you have a very specific setup where you have, for example, divided compute nodes and HDFS
data nodes, this might not be the best choice.
Related Information
HDFS [page 54]
Choosing a Cluster Manager [page 55]
Example Cluster Configuration Including a Client Machine (Jump Box) [page 55]
3.4.1 HDFS
By default HDFS stores three replicas of each data block on different machines. Besides the necessary fault
tolerance, this also increases data locality.
Be aware of the following, since this might affect the performance of the cluster when it is used in combination
with SAP HANA Vora:
- If the data that is used for SQL processing is not evenly distributed, this might lead to longer loading times for tables. This can be the case if you delete a large amount of data (the remaining data will be unbalanced) or if you also use HDFS for data that is not processed with SAP HANA Vora.
- Using a lot of small files (that is, files smaller than the HDFS block size) will waste a lot of space.
Remember
It is important to keep the data that you use in SAP HANA Vora/Spark as evenly distributed as possible on
HDFS to increase speed. There are a number of HDFS tools available to re-balance the data.
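To see why many small files hurt, compare the number of block objects the NameNode must track. The following back-of-envelope calculation is our own illustration, assuming the common 128 MiB block size:

```shell
# Block objects for 1000 small files versus the same data in one large file.
BLOCK_MB=128
SMALL_FILES=1000           # 1000 files of 1 MiB each
SMALL_FILE_MB=1
TOTAL_MB=$(( SMALL_FILES * SMALL_FILE_MB ))
small_blocks=$SMALL_FILES                                 # at least one block per file
large_blocks=$(( (TOTAL_MB + BLOCK_MB - 1) / BLOCK_MB ))  # ceil(1000 / 128)
echo "small files: $small_blocks blocks, single file: $large_blocks blocks"
```

The same gigabyte of data costs 1000 block objects as small files but only 8 as a single file, so consolidating small files reduces both NameNode load and per-file scheduling overhead.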
3.4.2 Choosing a Cluster Manager
Note
If your cluster manager has central components, such as the Resource Manager, you should put them on
separate machines that do not compute jobs.
Related Information
Spark Standalone Mode
3.4.3 Example Cluster Configuration Including a Client Machine (Jump Box)
Each user is assigned a separate Linux user, including a home directory containing Spark binaries as well as a
shaded JAR of all the components and dependencies provided by SAP. Each user then has the following
directory structure:
- /home/user/spark: Symlink to the current Spark installation
- /home/user/sapjars: Shaded JARs
Each user also has a home directory on HDFS.
For convenience, the environment variables are configured as follows in the .profile file:
# Include spark home
export SPARK_HOME="$HOME/spark"
# Hadoop conf dir
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export YARN_CONF_DIR="/etc/hadoop/conf"
export JAVA_HOME="/usr/jdk64/jdk1.7.0_67/"
export PATH="$PATH:$SPARK_HOME/bin"
To use the SAP HANA Vora Spark integration component, certain system-specific variables need to be
configured in Spark. See the developer manual for more details. For convenience, these are configured in the
spark-defaults.conf file so that all system-specific variables are located in one place:
spark.driver.extraJavaOptions -XX:MaxPermSize=256m
# Uncomment the following line and enter your Amazon S3 secret access key, if
# you have one
# spark.vora.s3secretaccesskeyid <S3 secret access key>
Based on this configuration, users can easily start a shell or deploy an application with the following
commands:
spark-shell --num-executors 3 --driver-memory 4g --executor-memory 2g
--master yarn-client --jars ~/sapjars/shaded.jar
spark-submit --class com.sap.spark.vora.example.ExampleQueryHDFS
--master yarn-client --jars sapjars/shaded.jar SparkVoraTrialProject-0.0.1.jar
Security
When using a distributed system, you need to be sure that your data and processes support your business
needs without allowing unauthorized access to critical information. User errors, negligence, or attempted
manipulation of your system should not result in loss of information or processing time.
These demands on security apply likewise to SAP HANA Vora.
Security Guides
SAP HANA Vora functions as an execution engine within a Spark/Hadoop landscape. Therefore, the following
security guides outline all applicable security considerations:
Guide                                                                                             Noteworthy Sections
MapR Security Guide: http://maprdocs.mapr.com/51/index.html#SecurityGuide/SecurityOverview.html   5.6
Spark Security                                                                                    Full document
Related Information
Technical System Landscape [page 58]
4.1 Technical System Landscape
SAP HANA Vora integrates into the Hadoop ecosystem.
When installed on nodes in an Ambari/Cloudera cluster, SAP HANA Vora becomes an available service that
can be added through the Ambari/Cloudera administration interface provided by the management node, in
parallel with existing services.
Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system
environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and
completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP
intentionally or by SAP's gross negligence.
Accessibility
The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be
a binding guideline on how to ensure accessibility of software products. SAP in particular disclaims any liability in relation to this document. This disclaimer, however,
does not apply in cases of wilful misconduct or gross negligence of SAP. Furthermore, this document does not result in any direct or indirect contractual obligations of
SAP.
Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as
"sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun
does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.
Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does
not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any
damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for
transparency (see: http://help.sap.com/disclaimer).