
Description:

Spark is distributed with the Metrics Java library, which can greatly improve your ability to diagnose
issues with Spark jobs. This document describes how to configure Metrics to report to a Graphite
backend and view the results with Grafana.

Spark MetricsSystem
A MetricsSystem instance lives on every driver and executor and optionally exposes metrics to a variety
of Sinks while applications are running.

In this way, MetricsSystem offers a way to monitor Spark applications using a variety of third-party
tools.

Graphite
In particular, MetricsSystem includes bindings to ship metrics to Graphite, a popular open-source tool
for collecting and serving time series data.

Sending Metrics: Spark → Graphite


Spark’s MetricsSystem is configured via a metrics.properties file; Spark ships with a template that
provides examples of configuring a variety of Sources and Sinks. Here is an example similar to the one
we use. Set up a metrics.properties file for yourself, accessible from the machine you’ll be starting your
Spark job from.
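A convenient starting point is the template that ships with Spark itself; assuming SPARK_HOME points
at your Spark installation, you can copy it and then edit the copy:

$ cp $SPARK_HOME/conf/metrics.properties.template metrics.properties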

Next, pass the following flags to spark-submit:

--files=/path/to/metrics.properties \
--conf spark.metrics.conf=metrics.properties

The --files flag will cause /path/to/metrics.properties to be sent to every executor, and
spark.metrics.conf=metrics.properties will tell all executors to load that file when initializing their
respective MetricsSystems.
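As a rough sketch (the paths and application JAR here are placeholders, not the real job used later in
this document), the two flags slot into a spark-submit invocation like this, ahead of the application JAR:

spark-submit \
  --master yarn \
  --files=/path/to/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  /path/to/your-application.jar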

Grafana
Having thus configured Spark (and installed Graphite), we surveyed the many Graphite-visualization
tools that exist and began building custom Spark-monitoring dashboards using Grafana. Grafana is “an
open source, feature rich metrics dashboard and graph editor for Graphite, InfluxDB & OpenTSDB,” and
includes some powerful features for scripting the creation of dynamic dashboards, allowing us to
experiment with many ways of visualizing the performance of our Spark applications in real-time.
Architecture:

Installation
There are several pieces that need to be installed and made to talk to each other:

1. Install Graphite.
2. Configure Spark to send metrics to your Graphite instance.
3. Install Grafana.
4. Create a data source in Grafana pointing to the Graphite host and port.
5. Build the dashboard with the required queries.

Each of these steps is briefly discussed below.

Install Graphite
We will use the containerized version of Graphite. Follow the installation procedure below for the RHEL
family of Linux.

A) Install docker:

$ sudo yum install -y yum-utils device-mapper-persistent-data lvm2

$ sudo -E yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum-config-manager --disable download.docker.com_linux_centos_docker-ce.rpm
$ sudo yum update -y
$ sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo yum-config-manager --enable rhui-REGION-rhel-server-extras
$ sudo yum install -y docker-ce
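Before moving on, a quick sanity check that the Docker CLI is installed (the exact version string will
depend on the release you pulled in):

$ sudo docker --version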

B) Start Docker.

$sudo systemctl start docker


$sudo systemctl enable docker
$sudo systemctl status docker

C) Run the docker image:

$ sudo docker run -d --name graphite --restart=always \
    -p 80:80 -p 2003-2004:2003-2004 -p 2023-2024:2023-2024 \
    -p 8125:8125/udp -p 8126:8126 \
    graphiteapp/graphite-statsd

This starts a Docker container named graphite.

Note that you can freely remap a container port to any host port if the corresponding port is already
occupied on the host. It is also not mandatory to map all ports; map only the ones you need (see the
table and the example below).
Mapped Ports

Host   Container   Service
80     80          nginx
2003   2003        carbon receiver - plaintext
2004   2004        carbon receiver - pickle
2023   2023        carbon aggregator - plaintext
2024   2024        carbon aggregator - pickle
8080   8080        Graphite internal gunicorn port (without nginx proxying)
8125   8125        statsd
8126   8126        statsd admin
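For example, a minimal run that exposes only the web UI and the plaintext carbon receiver, remapping
nginx to host port 8080 in case port 80 is already taken, could look like the following sketch (the
container name and host ports are just illustrative choices):

$ sudo docker run -d --name graphite --restart=always \
    -p 8080:80 -p 2003:2003 \
    graphiteapp/graphite-statsd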

By default, statsd listens on the UDP port 8125. If you want it to listen on the TCP port 8125 instead, you
can set the environment variable STATSD_INTERFACE to tcp when running the container.
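As a sketch based on that note, the same run command with statsd switched to TCP would add the
environment variable and drop the /udp suffix on the 8125 mapping:

$ sudo docker run -d --name graphite --restart=always \
    -p 80:80 -p 2003-2004:2003-2004 -p 2023-2024:2023-2024 \
    -p 8125:8125 -p 8126:8126 \
    -e STATSD_INTERFACE=tcp \
    graphiteapp/graphite-statsd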

For more configuration options, refer to the following link:


https://github.com/graphite-project/docker-graphite-statsd

Once the Graphite image is running, you can access the web UI by pointing your browser at the
host name of the node where you installed Docker.
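To confirm that Graphite is also accepting data, you can push a single test metric to the plaintext
carbon receiver on port 2003; the metric name below is arbitrary, and <docker-host> stands for the
host name of your Docker node:

$ echo "test.deploy.verification 1 $(date +%s)" | nc <docker-host> 2003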
Configure Spark to Send Metrics to Graphite.

Create a metrics.properties file with the following content, setting the Graphite host and port
accordingly. This file must be passed to the spark-submit command.

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=10.253.16.69
*.sink.graphite.port=2003
*.sink.graphite.period=5
*.sink.graphite.unit=seconds
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Example:
spark2-submit --files=/home/vishal/metrics.properties \
  --conf spark.metrics.conf=./metrics.properties \
  --master yarn --executor-memory 3G --num-executors 1 \
  /home/vishal/Spark-Kafla/SparkWikiPedia/DataAnalysis/target/DataAnalysis-2.0.0-jar-with-dependencies.jar
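Once the job is running, one way to check that metrics are actually arriving is Graphite's render API.
The exact metric path depends on your application ID (and on spark.metrics.namespace if you set it),
so the wildcard target below is only an illustration:

$ curl "http://<graphite-host>/render?target=*.driver.jvm.heap.used&format=json"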
Install and Configure Grafana

The Grafana docs are pretty good, but a little lacking in the "quick start" department. The
basic steps you need to follow are:

$ mkdir grafana_installation_file

$ chmod 777 grafana_installation_file/

$ cd grafana_installation_file/

$ sudo wget https://dl.grafana.com/oss/release/grafana-6.0.2-1.x86_64.rpm

$ sudo yum localinstall grafana-6.0.2-1.x86_64.rpm

$ sudo vi /etc/grafana/grafana.ini

$ sudo service grafana-server start

$ systemctl status grafana-server

This will start the grafana-server process as the grafana user, which was created
during the package installation. The default HTTP port is 3000, and the default user and
group is grafana.

The default login and password are admin/admin.

To configure the Grafana server to start at boot time (RHEL uses systemd):

$ sudo systemctl enable grafana-server

Point your browser to host:3000 to load Grafana.
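If you only have shell access to the box, a quick check that Grafana is answering on its default port
(run on the Grafana host itself) is:

$ curl -I http://localhost:3000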

Once in the Grafana UI, log in as the Grafana admin user (admin/admin).
Now that we are logged in to Grafana, we will add our new Graphite data source before we can build a
dashboard for our new Spark metrics. Select Data Sources on the left-hand side and select “Add new”
at the top of the screen. Here I am adding my “Graphite-Spark” source with Type Graphite. I did not add
any authentication to Graphite, so the HTTP Auth section is left empty. Access is set to direct. After you
click Save, the option “Test Connection” will be available.
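When building dashboard panels, the query targets follow the metric paths that the GraphiteSink
produces. Assuming the default naming, where the application ID is the first path segment, a panel for
driver heap usage could use a target like the sketch below; aliasByNode is a standard Graphite
function that relabels each series by the chosen path segment:

aliasByNode(*.driver.jvm.heap.used, 0)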
Example JVM graph:
