
Lessons Learned Benchmarking NoSQL on the AWS Cloud (Aerospike and Redis)

In this post I'll summarize what I learned from running benchmark tests on virtual machines on the AWS Cloud with the Aerospike team, and from independently validating their test results. I'll also discuss benchmarking techniques and results for this particular set of test databases. In the process of validating benchmarks, I learned many broadly applicable AWS-specific EC2 benchmarking practices, which I will include.
I tested two NoSQL databases: Aerospike and Redis. Both databases are known for speed and are often used for caching or as fast key-value stores via in-memory implementation. Aerospike is built to be extremely fast by leveraging SSDs for persistence, and to be very easy to scale. By contrast, Redis is built primarily as a fast in-memory store.
Aerospike is multithreaded and Redis is single-threaded. For the benchmark tests, I compared both as simple key-value stores. To compare fairly, I needed to scale out Redis so that it uses multiple cores on each AWS EC2 instance. The way to do this is to launch several Redis servers and shard the data among them.

Benchmark Results TL;DR: at scale, Aerospike wins


As I compared both databases at scale, I found a key differentiator to be the manageability of sharding and scaling for each type of database solution.

About Redis Scaling:

You must manage sharding yourself, by coming up with a sharding algorithm that evenly balances data between the shards.
Some of the Java clients (such as Jedis) do this for you, but you must check
to make sure that the data is properly balanced.
If you wish to increase the number of Redis shards in order to increase throughput or data volume, you will have to refactor the sharding algorithm and rebalance the data. This usually results in downtime.
If you need replication and failover, you will need to synchronize the data yourself, at the application layer.
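A minimal sketch of such a sharding algorithm, assuming an md5-based hash and four shards on sequential ports (the keys, shard count, and ports are illustrative only):

```shell
#!/bin/sh
# Toy sketch of client-side sharding: hash each key and map it to one of
# NUM_SHARDS Redis servers listening on sequential ports.
NUM_SHARDS=4
BASE_PORT=6379

shard_for_key() {
  # Interpret the first 8 hex digits of the key's md5 as an integer, mod shard count
  h=$(printf '%s' "$1" | md5sum | cut -c1-8)
  echo $(( 0x$h % NUM_SHARDS ))
}

for key in user:1001 user:1002 session:abc; do
  port=$(( BASE_PORT + $(shard_for_key "$key") ))
  echo "$key -> Redis shard on port $port"
done
```

Note that the mapping depends on NUM_SHARDS: changing the shard count remaps almost every key, which is why growing a sharded Redis deployment forces the rebalance (and downtime) described above.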

About Aerospike Scaling:

Aerospike handles the equivalent of sharding automatically.


There is no downtime. You can add new capacity (data volume and throughput)
dynamically. Simply add additional nodes and Aerospike will automatically rebalance
the data and traffic.
You set the replication factor to 2 or more and configure Aerospike to operate with
synchronous replication for immediate consistency.
If a server goes down, Aerospike handles fail-over transparently; the Aerospike client makes sure that your
application can read or write replicated data automatically.
You can run purely in RAM, or take advantage of Flash/SSDs for storage. Using Flash/SSDs results in a slight
penalty on latency (~0.1ms longer on average), but allows you to scale with significantly smaller clusters and at
a fraction of the cost.

Benchmark Testing on AWS TL;DR: the devil is in the details

Although AWS is convenient and inexpensive to use for testing, cloud platforms like AWS typically demonstrate greater variability of results. Network throughput, disk speeds, etc. are more variable, and this may produce different throughput results when the tests are conducted in a different availability zone, at a different time of day, or even within the same run of the test. Using AWS boundary containers, such as an AWS VPC and an AWS Placement Group, reduces this variability significantly.
That being said, I found that reproducing vendor benchmarks on any public cloud requires quite a bit of attention to detail. The environment is obviously different than on-premises. Also, beyond basic setup, performance-tuning techniques vary from those I've used on-premises, and also from cloud to cloud. In addition to covering the steps to do this, I've also included a long list of technical links at the end of this blog post.

Part 1: Getting Set Up to Test on AWS: the Basics


Step 1 Create an AWS IAM user account. I performed all of my tests as an authorized AWS IAM (non-root) user. It is of course a best practice on any cloud to run as a least-privileged user rather than as root. On AWS, IAM provides permission templates, which make creating users and assigning permissions quick and easy, so there is really no excuse to perform benchmark testing as a root user.
Step 2 Select your EC2 AMI. For the first, most basic type of test, you'll need to select, start, and configure 3 AWS EC2 instances. There are a number of considerations here. In this post, the term node means a single EC2 instance and shard means a single Redis process acting as a part of a larger database service.
To get started, I used three of the same Amazon Linux AMIs. Each instance should be capable of having HVM enabled for maximum network throughput. HVM provides enhanced networking; it uses single-root I/O virtualization (SR-IOV) and results in higher network performance (packets per second), lower latency, and lower jitter.
I used Amazon Linux AMI version 2014.09, as shown below:

Step 3 Select your AWS EC2 machine types. I chose the AWS R3 series of instances, since these are designed to be optimized for memory-intensive applications. Specifically, I used R3.8xlarge, which has 32 CPUs and 244 GB of RAM, for the servers. On this instance type, HVM should be enabled by default as long as you spin up your instances in an AWS VPC.

AWS Component   Type         CPUs   RAM (GB)   SSD (GB)   Network      Use
EC2 instance    R3.8xlarge   32     244        2 x 320    10 Gigabit   Redis server
EC2 instance    R3.8xlarge   32     244        2 x 320    10 Gigabit   Aerospike server
EC2 instance    R3.2xlarge   8      61         1 x 160    High         Database client
Step 4 Create an AWS Placement Group. As you prepare to spin up each EC2 instance, be sure to use AWS containers to simulate the "in the same rack" proximity that you'd have if you were performing tests on-premises. To simulate this exactly and to minimize network latency on the AWS Cloud, I was careful to place the first set of EC2 instances in the same VPC, availability zone, and placement group. About AWS Placement Groups, from the AWS documentation: "A placement group is a logical grouping of instances within a single Availability Zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network."
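If you script your setup rather than using the console, the placement group can be created with the AWS CLI; the group name, AMI ID, key pair, and subnet below are placeholders:

```shell
# Create a cluster placement group, then launch an instance into it.
aws ec2 create-placement-group \
    --group-name nosql-bench \
    --strategy cluster

aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type r3.8xlarge \
    --key-name my-key \
    --subnet-id subnet-xxxxxxxx \
    --placement GroupName=nosql-bench
```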
Step 5 Start up your 3 EC2 instances. Be sure to place them in the same VPC, availability zone, and placement group. Take note of both their external and internal IP addresses.
Step 6 Connect to each of your instances. When you connect, you may also want to verify HVM on each one; to do so, check that the ixgbevf driver has been properly installed.
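One way to run that check (a sketch; interface names and driver versions vary by instance):

```shell
# Report the ixgbevf (SR-IOV) module version if the driver is installed
modinfo ixgbevf | grep -i '^version'

# Alternatively, confirm which driver backs a given interface (eth0 here)
ethtool -i eth0 | grep '^driver'
```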

Step 7 Add more AWS ENIs. Even with Enhanced Networking, the network throughput is not enough to drive Aerospike and Redis to their capacity. To increase the network throughput, I added more network interfaces (ENIs) to each server. By using 4 ENIs on each r3.8xlarge EC2 instance, I reached network throughputs high enough that the database engines load the CPU cores to a significant degree (around 40%-60%).
Although you will add these ENIs (and associate them with the EC2 instances for the servers and client), you will also need to perform additional configuration steps to get maximum throughput. These steps are described in the AWS Networking Performance Tuning section of this post.
Also, when connecting from the client to the server for testing, I used the internal IP addresses to take advantage of the containment that I had so carefully set up. Shown below is a simple diagram of this process.

Part 2: Installing the Databases and Testing the Benchmark Tools


This first test is purposefully simple and isn't really designed to test either database at capacity; rather, it's a kind of "Hello World" or smoke test designed to test your testing environment. Benchmark 1 tests with a single node for each database server and keeps all data in memory only, i.e. no data is persisted to disk. To proceed, perform the following steps:



Step 1 Install Redis (2.8.17) on one EC2 R3.8xlarge instance


Step 2 Install Aerospike (3.3.21 Community) on another EC2 R3.8xlarge instance
Step 3 Install Client Tools Aerospike Java client (3.0.30) and Redis Jedis client (2.6.0) on an EC2 R3.2xlarge instance
Step 4 Install the Benchmark Tools on the client. I used 3 tools: the Aerospike benchmarking tool, the Redis version of the Aerospike benchmarking tool, and the native Redis benchmarking tool. Benchmark test results for Redis were roughly the same using either the Aerospike benchmark tool for Redis or the native Redis tool.

Step 5 Test-run the benchmarks. At this point you are only trying to verify that you've installed the server(s), client, and benchmarking tools correctly. You will not get the highest benchmark results until AFTER you perform the additional AWS performance tweaks listed in the next section. You can see the type of activity (read or write) in the first column and then, going across, the latency for operations in ms by percentage of operations. I've highlighted the total transactions per second in the red circle in the first sample output below.

Shown below is sample output from the Aerospike benchmark tool:

Shown below is sample output from the native Redis benchmark tool:

Part 3a: Run Tests -> Benchmark Test 1: Single node, no persistence
For this first benchmark, I tested the performance of both Aerospike and Redis as a completely RAM-based store.
To get a more realistic result than just running the benchmark in a plain-vanilla configuration, you will want to compensate for architectural differences between the products. Aerospike is multithreaded and will use all available cores (which in our case is 32 per server instance), while Redis is single-threaded. To compare fairly, I launched multiple instances of Redis and sharded the data manually. Shown below is a visualization of this process.
The first diagram shows this process for Aerospike:

All clients must run with all the shards configured; otherwise, the partitioning of keys will break down. Because of this, all the benchmark clients should send traffic to all the Redis servers.
The diagram below shows this process for Redis:

Here is the process to add Redis shards:

- Run redis-*/utils/install-server.sh and use sequentially increasing port numbers (up from 6379).
- Change the conf file for every server (e.g. /etc/redis/6380.conf) and comment out the save lines to disable persistence:
  for i in {6387..6394}; do sed -i 's/^save/#save/' $i.conf; done
- Restart the Redis instances.
- Make sure you FLUSHALL in redis-cli for all Redis instances:
  for i in {6379..6386}; do redis-cli -p $i FLUSHALL; done
- After keys have been inserted, confirm that they are equally distributed among the shards using the INFO command in redis-cli. The last line in the output shows the number of records:
  for i in {6379..6386}; do redis-cli -p $i INFO | grep keys=; done

The next set of considerations is around mitigating the network bottleneck that you will encounter when testing these high-performance databases with the default number of ENIs (network interfaces). Here is where you will want to further tune those additional ENIs that we created when we set up the instances, by configuring IRQ and process affinity manually. The next section details this process.
AWS Networking Performance Tuning

IRQ affinity: Both Aerospike and Redis are extremely fast in their in-process operations of getting and setting data items. To this end, it is beneficial to dedicate CPU cores to handling just the network IRQs. Each network interface has 2 IRQs, which can be found with:
for i in {0..3}; do echo eth$i; grep eth$i-TxRx /proc/interrupts | awk '{printf "%s\n", $1}'; done
Now, we can assign one CPU core to processing interrupts on each IRQ by changing the smp_affinity values of the IRQs.
echo 1 > /proc/irq/259/smp_affinity

echo 2 > /proc/irq/260/smp_affinity
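The hex values written above are CPU bitmasks: core N corresponds to the mask 1 << N. A small helper makes this explicit (the IRQ numbers 259 and 260 are examples from my instances and will differ on yours):

```shell
# smp_affinity takes a hexadecimal CPU bitmask; core N maps to mask (1 << N)
mask_for_core() {
  printf '%x\n' $(( 1 << $1 ))
}

mask_for_core 0    # core 0 -> mask 1
mask_for_core 1    # core 1 -> mask 2
mask_for_core 5    # core 5 -> mask 20

# In practice, redirect the mask into the IRQ's smp_affinity file (as root), e.g.:
# mask_for_core 1 > /proc/irq/260/smp_affinity
```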


Process affinity: The kernel may schedule the Aerospike and Redis processes on any of the CPU cores, and indeed they may move between cores in the course of a single run. It has been found empirically that pinning the processes to a set of CPU cores usually results in better performance. Specifically, keeping the CPU cores that handle the network IRQs isolated from the database processes is beneficial. This is accomplished using the taskset command. To make Aerospike use CPU cores 8 to 31:
# taskset -ap FFFFFF00 <PID of asd>
In the case of the many sharded Redis processes, this takes many taskset invocations, one for each Redis server. As Redis is single-threaded, assigning one Redis server to each CPU core is a good idea.
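For the sharded Redis case, the per-shard pinning can be scripted. The PID lookup below uses pgrep and assumes each shard was started with its port on the command line; the core and port ranges are illustrative:

```shell
# Pin 12 Redis shards (ports 6379-6390) to cores 8-19, one core per shard,
# keeping the lower cores free for network IRQ handling.
core=8
for port in $(seq 6379 6390); do
  pid=$(pgrep -f "redis-server.*:$port" | head -n 1)
  if [ -n "$pid" ]; then
    taskset -pc "$core" "$pid"
  fi
  core=$((core + 1))
done
```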

To configure your environment, perform the following steps:

Step 1 Install multiple instances of Redis on your Redis server instance
Step 2 Verify that you have the correct number of ENIs associated with your server(s)
Step 3 Spin up additional client instances; for this test, I used 4 instances
Step 4 Install the client tools and benchmark tool on each client instance
Step 5 Assign CPUs to IRQs as per the directions in the previous section (smp_affinity)
Step 6 Pin processes to CPUs as per the directions in the previous section (taskset)
Step 7 Set the default RAM for the test Aerospike namespace to 10 GB
Step 8 Disable snapshotting on each Redis instance by commenting out the save parameters in the config file
Step 9 Run the benchmark tool (using parameters) and compare the tuned benchmarks
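Steps 7 and 8 amount to small config edits. For Aerospike, the in-memory namespace stanza might look like the sketch below; it follows Aerospike's documented config format, but treat the namespace name and sizes as assumptions rather than my exact file:

```
# Sketch of an aerospike.conf namespace for the in-memory test (Step 7)
namespace test {
    replication-factor 1
    memory-size 10G
    storage-engine memory
}
```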

Benchmark Tool Parameters


Multiple hosts must be given in the -h option of the benchmark tool to test against sharded Redis servers. The ports are assumed to increase serially from the number specified in the -p option. Benchmark options used in the current tests were:


- 10 million keys (-k 10000000)
- 100-byte string values for each key (-o S:100)
- Three different read/write loads:
  - 50%/50% read/write (-w RU,50)
  - 80%/20% read/write (-w RU,80)
  - 100% read (-w RU,100)
- Before every test run, data was populated using the insert workload option (-w I)
- For each client instance, 90 threads gave the maximum throughput in the in-memory database case (-z 90). This was reduced to a lower number for the persistence tests (Benchmark Test 2) to avoid write errors due to disk choking.
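Putting those options together, a single client run looked roughly like the following; the host IP is a placeholder, and run_benchmarks is the wrapper script shipped in the benchmark tool's repository:

```shell
# Load 10M keys of 100-byte strings first (insert workload)...
./run_benchmarks -h 10.0.0.10 -p 6379 -k 10000000 -o S:100 -w I -z 90

# ...then drive the 50/50 read/update workload with 90 client threads
./run_benchmarks -h 10.0.0.10 -p 6379 -k 10000000 -o S:100 -w RU,50 -z 90
```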

Benchmark Test 1 Results


Aerospike is as fast as Redis, with close to 1 MTPS for 100% read workloads on a single R3.8xlarge node on AWS with no persistence.
The default bottleneck in both cases is the network throughput of the instances. Adding ENIs helps to increase the TPS for both Aerospike and Redis. With proper network IRQ affinity and process affinity set, both reach close to 1 MTPS in the 100% read workload. The chart below shows the Benchmark Test 1 results.

Part 3b: Run Tests -> Benchmark Test 2: Single node, with persistence
In this scenario, persistent storage was introduced. All of the data was still in memory but was also persisted on EBS SSD (gp2) storage.
For Aerospike, a new namespace was configured for this case, using the data-in-memory config parameter. To avoid the bottleneck caused by writing to a single file, Aerospike was configured to write to 12 different data file locations (to create the same environment as the 12 files written by the 12 Redis shards). This configuration specifies that the storage files will only be read when restarting the instance.
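A sketch of that storage configuration, based on Aerospike's documented file and data-in-memory directives (the paths and sizes are assumptions):

```
# Sketch of the persistence namespace for Benchmark Test 2:
# data held in memory, persisted across multiple data files on EBS gp2
namespace test-persist {
    replication-factor 1
    memory-size 10G
    storage-engine device {
        file /opt/aerospike/data/bench-01.dat
        file /opt/aerospike/data/bench-02.dat
        # one file directive per data file, 12 in total
        filesize 16G
        data-in-memory true
    }
}
```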
The append-only file (AOF) persistence option was used to test Redis. When the AOF file reaches a certain size, Redis compacts it by rewriting the data from memory (a background AOF rewrite). While this takes place, there are periods when Redis throughput drops. To avoid these outlier numbers, I set the auto-aof-rewrite-min-size parameter to a large value so that rewrites were not triggered while the benchmark was running. These changes favorably overstate Redis performance.

Benchmark Test 2 Results


As shown in the chart above, Aerospike is slightly faster than Redis for 100/0 and 80/20 read/write workloads against a single node backed by EBS SSD (gp2) storage for persistence.
I ran the test against 12 Redis shards on a single machine with 4 ENIs. In this scenario, the disk writes were the bottleneck. The number of client threads was reduced for both Aerospike and Redis to keep write errors at zero.
It is important to note that Aerospike handles rewrites of the data using a block interface rather than appending to a file; it uses a background job to rewrite the data. The throughput numbers presented above are a good representation of the overall performance. However, when using a persistence file, Redis must occasionally rewrite the data from RAM to disk in an AOF rewrite. During these times, peak throughput is reduced. The throughput results above do not take AOF rewrites into account.
The effects of AOF rewrites should not be underestimated. In the above charts, I configured Redis not to do this, since it is difficult to measure the steady-state performance of the database during that time. However, it is important to understand its effects, since this may impact your production system. The chart below shows how Redis performs during one example of an AOF rewrite. Notice that both the read and write performance vary during the rewrite.

References

Selecting AWS instance types: http://aws.amazon.com/ec2/instance-types/
Enabling enhanced networking on a Linux EC2 instance in an AWS VPC: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html
AWS Placement Groups: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html
Understanding SR-IOV: http://www.ovirt.org/Feature/SR-IOV
Limits of AWS VPC (use of unicast mesh vs. multicast): http://aws.amazon.com/vpc/faqs/
Tuning Aerospike on AWS: http://www.aerospike.com/docs/deploy_guides/aws/tune/
Aerospike (and Redis) benchmarking tool: https://github.com/chetanvaity/aerospike-client-java-1 (download from branch redis-bm for the Redis version of the tool)
Installing Redis: http://redis.io/download
Sharding Redis for benchmarking: http://redis.io/topics/benchmarks
About Redis AOF persistence: http://redis.io/topics/persistence
Article on Redis persistence mechanisms: http://oldblog.antirez.com/post/redis-persistence-demystified.html
How Jedis manages Redis sharding: https://github.com/xetorthio/jedis/wiki/AdvancedUsage#shardedjedis
Changing SMP affinity for IRQs: https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt
About the Linux taskset command: http://linux.die.net/man/1/taskset
Aerospike client benchmark tool: https://github.com/chetanvaity/aerospike-client-java-1
Redis client benchmark tool (authored by Aerospike): https://github.com/chetanvaity/aerospike-client-java-1 (branch: redis-bm)
