CONTENTS

Contents
List of Figures
Author List
Revision History
1 Introduction
2.1 Functional Overview
2.2 Architectural Overview
2.3 Return on Investment
3.1 Reference Benchmarks
3.3 Monitoring Tools
3.3.1 Iometer
3.3.2 BigFixPerf
4.5 BigFix Relay and Associated Infrastructure Capacity Management Considerations
5.1.1 Virtualization
7.2 Security Hardening
8 Summary Cookbook
References
LIST OF FIGURES

Figure 1: Revision History
Figure 2: BigFix Architecture
Figure 3: BigFix Server Elements
Figure 4: Business Value Analyst for IBM BigFix and MobileFirst
Figure 5: BigFix Performance Benchmark Environment Sample
Figure 6: Little's Law
Figure 7: Monitoring Tools
Figure 8: Iometer User Interface Sample
Figure 9: Iometer Workload Sample
Figure 10: Iometer Results Sample
Figure 11: Disk Queue Length by IO Subsystem Type
Figure 12: BigFixPerf Syntax
Figure 13: BigFixPerf Windows Example
Figure 14: BigFixPerf Windows Example Output
Figure 15: BigFixPerf UNIX Example
Figure 16: BigFixPerf UNIX Example Output
Figure 17: BigFix Management Server Capacity Planning Requirements
Figure 18: BigFix Enabled Function Additional Capacity Planning Requirements
Figure 19: Console Workstation Capacity Planning Requirements
Figure 20: Terminal Server Capacity Planning Requirements
Figure 21: BigFix Relay Infrastructure
Figure 22: Top Level Relay Capacity Planning Requirements
Figure 23: Sample Capacity Planning Profile
Figure 24: BigFix Capacity Planning & Performance Management
Figure 25: Modifying the Linux IO Scheduler
Figure 26: Linux IO Scheduler Throughput
Figure 27: Linux IO Scheduler Latency
Figure 28: BigFix Schema Characteristics
Figure 29: DB2 Configuration Recommendations
Figure 30: FillDB Database Boost Levels
AUTHOR LIST
This paper is the team effort of a number of security and performance specialists comprising the
IBM BigFix performance team. Additional recognition goes out to the entire IBM BigFix
development team.
Mark Leitch
(primary contact for this paper)
IBM Toronto Laboratory
Mariella Corbacio
Nicola Milanese
Pietro Marella
IBM Rome Laboratory
REVISION HISTORY

Date                 Version   Revised By   Comments
July 15th, 2015      Draft     MDL
November 1st, 2015   9.2.x.0   MDL
                     9.2.x.1   MDL
                     9.2.x.2   MDL
Introduction
Capacity planning involves the specification of the various components of an installation to
meet customer requirements, often with growth or timeline considerations. IBM BigFix
offers endpoint lifecycle and security management for large scale, distributed deployments
of servers, desktops, laptops, and mobile devices across physical and virtual
environments.
This document will provide an overview of capacity planning for the IBM BigFix Version 9.2
solution. In addition, it will offer management best practices to achieve a well-performing
installation that demonstrates service stability. This will include the deployment of the
BigFix management servers into cloud, or virtual, environments. Capacity planning for
virtual environments typically involves the specification of sufficient physical resources to
provide the illusion of unconstrained resources in an environment that may be
characterized by highly variable demand.
In this document we will provide an IBM BigFix overview, including functionality,
architecture, and performance. We will then offer the capacity planning recommendations,
including considerations for hardware configuration, software configuration, and cloud best
practices. A summary cookbook is provided to manage installation and configuration for
specific instances of BigFix.
Note: This document is considered a work in progress. Capacity planning
recommendations will be refined and updated as new BigFix releases are available. While
the paper in general is considered suitable for all BigFix Version 9.2 releases, it is best
oriented towards IBM BigFix Version 9.2.6. In addition, a number of references are
provided in the References section. These papers are highly recommended for readers
who want detailed knowledge of BigFix server configuration, architecture, and capacity
planning.
Note: Some artifacts are distributed with this paper (see View > Navigation Panels >
Attachments in the document viewer). The distributions are in zip format. However,
Adobe protects against files with a zip suffix. As a result, the file suffix is set to zap per
distribution. To use these artifacts, simply rename the distribution to zip and process as
usual.
10
2.1 Functional Overview
The IBM BigFix portfolio provides a comprehensive security solution encompassing a
number of operational areas. These areas include the following.
- Patch management.
- Power management.
- Core protection.
- Server automation.
2.2 Architectural Overview
The following diagram provides a basic view of the BigFix architecture.
- Server.
  The base BigFix Enterprise Server. It includes the following elements. The
  diagram below shows the anti-collocation options for these elements (meaning, the
  ability to deploy on nodes distinct from the BigFix root server). The pros and cons
  of anti-collocation are described later in this document.
  o The WebUI.
    A Node.js instance with an associated database intended to support the new Web
    based user interface.
- Fixlet Server.
  It is used as the object repository for all client content (fixlets, tasks, baselines, and
  analyses). In addition, dashboards, wizards, and WebUI applications are delivered
  via the Fixlet servers. Fixlets are utilized by the agent to determine relevance,
  compliance, and remediation activities.
- Console.
  A management console (user interface) for BigFix operators. The console is a
  Windows only application.
- Relays.
  A distributed, hierarchical infrastructure to manage the deployment of BigFix
  agents across diverse network topologies.
Recommendations will be provided in the BigFix Capacity Planning section for optimal
performance management of these components.
2.3 Return on Investment
Return on Investment (ROI) is a key concern for any deployed solution. In the security
space, the notion of return can involve many dimensions, given the potentially
catastrophic impact security exploits may have on an enterprise. To facilitate the
understanding of ROI for IBM BigFix, a business value analyst is available (URL).
The analyzer is based on the establishment of a company profile comprising the following
elements.
- The company profile, including legacy systems and current incident and problem
  resolution rates.
- Hardware and software investment profiles, including endpoint audits and device
  decomposition.
Based on these responses, the benefits, investments, and overall Return on Investment are
provided through a number of multi-year views.
3.1 Reference Benchmarks
There are a number of reference benchmarks managed for the BigFix solution. These
benchmarks ensure the offering is field ready and able to manage future scalability
requirements. The set of benchmarks includes, but is not limited to, the following.
- FillDB benchmark.
  A set of workloads to manage and optimize the BigFix FillDB operation.
- Mailbox benchmark.
  A set of workloads to manage and evaluate the BigFix endpoint mailbox
  functionality.
The most basic performance methodology for any benchmark is to establish a baseline,
and then iterate on the baseline as you drive improvements through code, infrastructure,
and tuning. Once the solution offering is delivered, the baseline is then moved to the new
improved state.
and scale of the workload, but also by time. Sample characteristics of the Unified
Benchmark include the following.
- Duration.
  The benchmark is persistent, meaning it is continuously running (as a customer
  workload would be continuously running) in order to manage the long term stability
  characteristics of BigFix (e.g. performance stability, system resource stability, etc.).
- Data population.
  Database population is used to simulate large scale customer installations.
  Sample population parameters include, but are not limited to, the following.
  o Patches = 28,000.
  o Roles = 50.
- Client simulation.
  Real and simulated clients may be used to represent large installations comprising
  hundreds of thousands of endpoints.
- Workload saturation.
  Workload levels should not be constant. Workload oscillation, meaning workload
  peaks and valleys are in evidence, is expected in customer environments. It can
  be useful to drive beyond solution saturation levels for brief periods to demonstrate
  product stability and preservation of service under high load.
- Think times.
  Think times are the pause between user operations, meant to simulate the
  behavior of a human user.
- Bandwidth throttling.
  In order to simulate low speed or high latency lines, bandwidth throttling is
  employed for some customer workloads. A sample throttle for a moderate speed
3.2
System utilization including the standard CPU, IO, network and memory views.
The final category, request concurrency, has many interesting dimensions. Two areas we
will focus on are Little's Law, and how to evaluate the number of concurrent users for a
solution.
L = λW

Where:
- L = the number of concurrent requests in the system.
- λ = the request arrival rate.
- W = the average time a request spends in the system.
Figure 6: Little's Law
This elegant equation makes it clear that if you want to improve concurrency, you may
either increase the arrival rate or reduce the average time a request spends in the system.
Increasing the arrival rate will eventually hit a solution limit (whether software or
infrastructure). At that point, the focus is typically on optimizing the software and/or
infrastructure to reduce the average request time.
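A quick numeric check of Little's Law can make the relationship concrete. The figures below are purely illustrative and are not BigFix measurements:

```python
# Little's Law: L = lambda * W
# Illustrative numbers only; not BigFix measurements.
arrival_rate = 50.0        # requests per second (lambda)
avg_time_in_system = 0.2   # seconds per request (W)

# The number of requests in flight at any instant.
concurrent_requests = arrival_rate * avg_time_in_system
print(concurrent_requests)  # -> 10.0
```

Halving the average request time (for example, through IO tuning) halves the in-flight population at the same arrival rate, which is why the focus shifts to reducing W once the arrival rate hits a solution limit.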
P = the total user population for an instance of BigFix.
C = the concurrent user population for an instance of BigFix. Concurrent users are
considered to be the set of users within the overall population P that are actively
managing the environment at a point in time (e.g. administrator operations in the
user interface, endpoint operations, etc.).
In general, P is a much larger value than C (i.e. P >> C). For example, it is not unrealistic
that a total population of 200 users may have a concurrent user population of 20 users (i.e.
10%).
3.3 Monitoring Tools
Monitoring tools may include system monitoring approaches as well as associated
infrastructure benchmarks. The following table describes a number of approaches that are
used for BigFix.
Tool: BigFixPerf
Description: A BigFix custom data collection tool that wraps the nmon and perfmon utilities.
Documentation: See the detail section below.
Recommended invocation: BigFixPerf --monitor <monitor output> --interval <interval> --iterations <iterations>
Note: For simplicity, BigFixPerf may be used as a wrapper for nmon and perfmon.

Tool: nmon
Description: nmon is a comprehensive system monitoring tool for the UNIX platform. It is highly useful for understanding system behavior.
Documentation: URL
Sample invocation: nmon -T -s <samplerate> -c <iterations> -F <output file>

Tool: perfmon

Tool: db2support

Tool: DBMS Snapshots
Description: DBMS snapshot monitoring can offer insight into SQL workload, and in particular expensive SQL statements.
Documentation: URL

Tool: WAIT

Tool: esxtop

Tool: iometer
Description: I/O subsystem measurement and characterization tool for single and clustered systems.
Documentation: URL. Additional information is also provided below.
Recommended invocation: dynamo /m <client host name or ip>

Tool: iperf
Description: TCP and UDP measurement and characterization tool that reports bandwidth, delay, jitter, and datagram loss.
Documentation: URL
Recommended server invocation: iperf -s
Recommended client invocation #1: iperf -c <server host name or ip>
Recommended client invocation #2: iperf -c <server host name or ip> -R

Tool: UnixBench

Tool: Perfkit Benchmarker
3.3.1 Iometer
Out of all system resources, IO is typically the most difficult to manage. High performance
IO subsystems are relatively expensive, and prone to failure if redundancy is not managed.
In addition, many solutions that perform well in a physical environment may plummet in a
virtual environment (to, say, 100 IOPS). We will describe a sample Iometer workload
relevant for BigFix. We will then show sample results and targets.
Workload                          Block Size   Read %   Random %
Workload 1: Stock                 4KB          25%      0%
Workload 2: Open Action Profile   8KB          10%      20%
Workload 3: REST API Profile      8KB          90%      20%
- SSD.
- A Storage Area Network (SAN) RAID 5 configuration configured for file system
  access (LUN A).
The graph shows the latency of each IO subsystem, for each reference workload. The
supported IOPS are shown as part of the X axis per column.
The general recommendation for a healthy IO subsystem is the ability to manage in the
range of 5,000 IOPS to 10,000 IOPS with a latency of 1ms or less. An additional diagram
of Windows perfmon level data (average disk IO queue length) is also provided to reinforce
the Iometer results. Essentially, the system and disk queue impact of the various storage
subsystems translate easily to the Iometer results.
3.3.2 BigFixPerf
The BigFixPerf utility is a custom monitoring tool intended to standardize data collection on
Windows and UNIX.
Syntax:
BigFixPerf --monitor <monitor results identifier> [--interval <sample interval>] [--iterations
<sample iterations>] [--verbose]
Options:
--monitor <monitor results identifier>
The monitor results object. On Windows, this is the name of the defined
monitoring element, and may be viewed under the Windows perfmon utility. On
UNIX, this is the name of the monitoring output file.
--interval <sample interval>
The monitor sample interval. The default value is 5 seconds.
--iterations <sample iterations>
The number of samples to collect. The default value is 720 sample iterations.
With a 5 second sample interval, this will provide a one hour monitoring capture.
--verbose
Enables verbose mode for the BigFixPerf utility itself. This may be useful for
debugging purposes.
Figure 12: BigFixPerf Syntax
Windows Example
The following command will create a performance counter named BigFixPerf and will
collect samples every 60 seconds, for a 24 hour interval.
BigFixPerf --monitor BigFixPerf --interval 60 --iterations 1440
Figure 13: BigFixPerf Windows Example
The monitor results are typically written to the $systemdir directory, and may be viewed in
the Windows perfmon utility. The following image shows the aggregate counter view. The
view may be tailored so only selected counters are visible.
UNIX Example
The following command will create a performance monitor output file named
BigFixPerf.nmon and will collect samples every 60 seconds, for a 24 hour interval.
BigFixPerf --monitor BigFixPerf.nmon --interval 60 --iterations 1440
Figure 15: BigFixPerf UNIX Example
The monitor results are written to the designated output file. The results must then be
formatted via the nmon provided utility. The following image shows the sample formatted
- Capacity planning for the BigFix management server(s) (aka the "managed from"
  infrastructure). This includes the BigFix console.
- Capacity planning for the managed endpoints, including the BigFix relay
  infrastructure (aka the "managed to" infrastructure).
The capacity planning recommendations will be given in terms of number of CPUs. Given
there is a broad range of capability here, we will provide a general definition of a CPU.
4.1
- In terms of pure clock speed, per the IBM SoftLayer definition, we will generally
  consider a CPU core to be a relatively current generation 2.0+ GHz/core
  implementation. See the References section for more information on SoftLayer
  server specifications.
- A hyper threaded core does not have the throughput capability of a pure core,
  and can be considered to have on the order of 30% of the capability of a pure core.
Our sizing approach is based on pure, physical cores at 2.0+ GHz/core. For virtual or
hyper threaded cores, the above efficiency ratios should be part of the sizing
methodology. An example will be provided at the end of this section.
4.2
4.3
Note the storage requirements for IBM BigFix are typically quite low. However,
suitable storage is recommended to accommodate the growing database size and
associated management overhead (e.g. a working set of database backups).
Network requirements are for a 1 Gbps network or better for the management
server infrastructure.
Deployment Size   CPU   Memory (GB)   Storage (GB)
< 1,000                               50
10,000                                50
50,000            12                  100
100,000           16                  100
150,000           16    32            150
200,000           20    64            200
250,000           24    64            250
Function                   Platform   Additional CPU   Additional RAM (GB)
Message Level Encryption   All        +2               +4
BigFix WebUI               All        +1               +4
Message Level Encryption (MLE) provides data encryption support for clients, and
is particularly valuable for insecure networks or when secure communication in
general is required. It is worth noting MLE does not affect actions taken from the
console or fixlets that are already protected by digital signatures. More information
on MLE is provided in the References section.
The BigFix WebUI offers a new scalable and highly responsive management
interface for BigFix. As part of the implementation, an Extract-Transform-Load
(ETL) infrastructure must be enabled. This function, once enabled, has the above
server requirements.
The BigFix WebUI in general, once enabled, will drive additional system utilization
as a function of the number of concurrent administrators (as described earlier, this
is the population of administrators active at any one time, and is typically much
smaller than the total number of administrators).
In terms of the scalability characteristics of the WebUI, the initial release is targeted at the
activities of non-master operators (versus the broader administrative role of master
operators). A realistic upper bound for the initial release of the WebUI would be
management by 30 concurrent users on Windows, and 60 concurrent users on Linux, over
an estate of 60k devices. The concurrent users would typically be non-master operators,
managing a subset of the estate. For example, some non-master operators may only be
managing a handful of devices, while others may be managing on the order of 20k devices.
It is possible to manage at a larger scale based on user operations, infrastructure
capability, etc. However, the stated bounds should be considered a good rule of thumb
for the scale of the solution.
4.4
Deployment Size   CPU   Memory (GB)   Storage (GB)
< 10,000                              0.25
< 100,000               2
> 100,000               4
The following table shows capacity planning requirements for a terminal or Citrix server
based implementation. The expectation is data center level network speeds are available
for the server, and each server may be managing on the order of 10-20 concurrent users
(remote users, meaning they may not reside in the data center). In the event a greater
number of concurrent users are in effect, the general rule of thumb is to add on the order of
1 CPU and 2-6 GB of RAM for every additional concurrent user (RAM is dependent on the
deployment size). As always, requirements are workload dependent so monitoring of the
system under load is always recommended. As a result, ranges are given where
appropriate with the expectation that monitoring may be used to fine tune in the customer
environment.
Deployment Size   CPU      Memory (GB)   Storage (GB)
< 10,000                                 20
< 100,000         4 - 8    16 - 32       80
> 100,000         8 - 16   16 - 32       80 - 160
The question often arises how many console operators may be supported by BigFix. A
primary selling feature of BigFix is the ability of a small number of operators to manage a
large estate. However, in the event that fine grained management is required, a base of
300 operators may be managed with careful attention to the console infrastructure.
Proceeding beyond this value would require understanding of the infrastructure and
associated workload impact.
4.5
In terms of strict capacity planning the relay requirements are fairly straightforward. The
relays are deployed in a hierarchy with top level relays serving other relays, and the leaf
node relays (often referred to as site level relays) serving the endpoints. The following
ratios apply for relay deployment.
- Relays managing relays: 1:250 (meaning each relay node can manage on the
  order of 250 other relays).
- Relays managing BigFix agent children: 1:1000 (meaning each relay node can
  manage on the order of 1000 child agents or endpoints).
For the relays serving other relays (also known as top level relays), the following capacity
planning recommendations apply. The relays managing the endpoints offer low utilization,
and may possibly be collocated with server nodes already distributed in the enterprise.
Deployment Size (Child Endpoints Served)   CPU   Memory (GB)
10,000
20,000
50,000
100,000
150,000
200,000
250,000
A more difficult decision to reach is actual placement of the relays, and in particular the
hierarchy of nodes. Decision points include network bandwidth, network latency, firewalls,
server infrastructure, etc. Network topology maps typically exist for most enterprises.
However, these maps rarely contain accurate metrics for network performance between
nodes. Even when metrics are provided, they are often out of date or represent ideal
conditions. In addition, network shaping often applies, meaning network characteristics
may be dynamic based upon load.
In order to facilitate understanding of the network and placement considerations, basic
network ping tests may be performed. Sample ping commands follow.
Using these commands a map may be built showing latency and packet loss. Additional
tests to demonstrate the number of hops via trace route (e.g. tracert) or an equivalent are
recommended. Secure copy (e.g. scp) tests for sample payloads are also helpful. In
essence, once the network characteristics are defined, placement decisions typically
become very straightforward.
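A small parser can reduce the ping output to the latency and packet-loss map described above. This sketch assumes the summary format printed by Linux iputils ping; adjust the patterns for other platforms:

```python
import re

def parse_ping_summary(output):
    """Extract (packet loss %, average round-trip time in ms) from the
    summary lines printed by Linux iputils ping."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    # The rtt summary line reads: rtt min/avg/max/mdev = a/b/c/d ms
    rtt = re.search(r"= [\d.]+/([\d.]+)/", output)
    return (float(loss.group(1)) if loss else None,
            float(rtt.group(1)) if rtt else None)

# Hypothetical ping output for one relay candidate host.
sample = (
    "4 packets transmitted, 4 received, 0% packet loss, time 3004ms\n"
    "rtt min/avg/max/mdev = 11.051/12.322/14.027/1.106 ms\n"
)
print(parse_ping_summary(sample))  # -> (0.0, 12.322)
```

Running this across candidate relay hosts yields the latency/loss map that drives placement decisions.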
4.6
recommended that ample container space is available for any upgrade starting below
patch level 9.2.5, and moving to level 9.2.5 and beyond.
4.7
Network topology: Hub topology with one central data center and four hubs, with
1Gbps data center speed, and 100Mbps WAN speed.

BigFix Component              Number of Servers                    CPU (per Server)   RAM (GB) (per Server)
Base server (includes DBMS)
WebUI ETL                                                          +1                 +4
WebUI Server                                                       +3                 +2
Terminal Server               (collocated on existing servers)     n/a                n/a
Total                         3 (not counting collocated relays)   16                 23
Performance Management
Capacity planning and performance management go hand in hand. There is no standard
workload for BigFix. Every enterprise has different requirements, infrastructure, and
customization. This section will build upon the base capacity planning recommendations,
and offer a set of defined decision points for building an optimal BigFix deployment based
on the enterprise needs.
The following diagram provides a form of decision tree for a BigFix deployment.
- Virtualization adds additional levels that must be managed for performance. For
  example, hypervisor management and IO tuning are critical in virtual deployments.
  In physical deployments, this management cost is typically significantly reduced
  (IO) or eliminated (the hypervisor).
- Windows adds additional Total Cost of Ownership (TCO) concerns for licensing.
  On Linux, by contrast, the DB2 entitlement is provided as part of the BigFix product
  itself, so no additional database licensing is needed. This can simplify deployments,
  especially Proof of Concept situations that often desire minimal licensing and
  infrastructure requirements.
- A remote database typically adds overhead (e.g. request latency) versus a local
  database.
5.1
5.1.1 Virtualization
In today's modern enterprise, virtualization is seen as a powerful way to address the
management of cost and scale. In general terms, performance management of physical
servers tends to be simpler. Resources are isolated, there is no hypervisor involved, and
the operating system view of performance is a direct indicator of system and application
performance.
In order to simplify performance management and keep latency characteristics to a
minimum, the first recommendation is always to deploy on physical hardware. However, it
is possible that a virtual deployment is still desired (whether due to enterprise standards,
high team skill levels in virtual system performance, etc.). In order to manage BigFix in a
virtual environment, precautions must be taken to ensure performance. We will describe
some of the key management aspects. We will then reinforce the fact that monitoring and
understanding is critical in a virtual world.
- CPU ready.
  This is the percentage of time the VM is ready to be run, but is waiting due to
  scheduler constraints.
- CPU wait.
  The amount of time the CPU spends in wait state.
Virtual IO Management
Out of all system resources, IO is typically the most difficult to manage. High performance
IO subsystems are relatively expensive, and prone to failure if redundancy is not managed.
In addition, many solutions that perform well in a physical environment (say in the range of
5,000 to 10,000 IOPS) may plummet in a virtual environment (say 100 IOPS). As a result,
in any virtual environment it is critical to benchmark and monitor the IO subsystem; more
information on how to achieve this is provided in the benchmarking section below. In
addition, we will next describe specific guidelines for IO management for Linux virtual
deployments.
The following graphs show the throughput and latency results, based on an Iometer
benchmark across a variety of storage subsystems. In terms of throughput (where higher is
better) and latency (where lower is better), the deadline scheduler is preferable. Note
while the differences may appear small, under load and as concurrency and contention go
up, the gaps increase significantly.
With Red Hat Enterprise Linux 7, the default scheduler has been set to deadline.
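As a sketch of how the scheduler is inspected and switched (the device name sdb is a placeholder; verify the sysfs path and persistence mechanism against your distribution's documentation):

```shell
# Show the available schedulers for the device; the active one is in brackets.
cat /sys/block/sdb/queue/scheduler

# Switch the device to the deadline scheduler. This takes effect immediately
# but does not persist across reboots; use a kernel boot parameter such as
# elevator=deadline, or a udev rule, to make the change permanent.
echo deadline > /sys/block/sdb/queue/scheduler
```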
We will describe two special areas of operating system management: Linux swappiness
and the Linux ulimit.
Linux Swappiness
The Linux swappiness kernel parameter (vm.swappiness) is a value in the interval [0,100]
that defines the relative weight of swapping out runtime memory, versus dropping pages
from the system page cache. The default value is typically 60. The recommendations for
setting this value are as follows.
For a database management server collocated with the BigFix application server,
the swappiness should be set to ten (10).
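A sketch of applying the swappiness recommendation (standard sysctl usage; confirm the persistence location, as some distributions prefer a file under /etc/sysctl.d/):

```shell
# Apply the recommended value immediately (does not survive a reboot).
sysctl -w vm.swappiness=10

# Persist the setting across reboots, then reload.
echo "vm.swappiness = 10" >> /etc/sysctl.conf
sysctl -p
```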
Further details on managing DB2 performance are provided in the References section.
The versions supported for a specific BigFix release are documented in the system
requirements matrix (URL). The general recommendation is to use the most current
database release supported for your BigFix version, as database performance, resilience,
and function tend to only improve with each new release.
It should be noted that Microsoft SQL Server Express is also included in the list of
reference databases. Both DB2 and Microsoft SQL Server offer express versions.
These are license free, limited utility offerings typically intended for low demand or proof of
concept situations.
In the case of DB2 Express, there is no compelling reason to use it. It is simply a
constrained version of DB2, and the license for the full version is provided for BigFix Linux
deployments.
In the case of Microsoft SQL Server Express, the support matrix clearly indicates it may be
used for evaluation purposes, and the customer will provide the full Microsoft SQL Server
license. What does this mean in the context of a BigFix deployment? Essentially,
Microsoft SQL Server Express may be used for a BigFix deployment with the following
constraints.
The user must be aware of the Microsoft SQL Server Express constraints. The
constraints for a specific version are documented in the Microsoft Knowledge Base
(e.g. URL). In general, the DBMS is constrained to a single CPU socket and up to
four cores, utilizing up to 1GB of RAM and 10GB of database storage. Once the
Microsoft SQL Server Express limits are reached, the configuration is no longer
supported by IBM BigFix.
In terms of the scale limits for IBM BigFix with Microsoft SQL Server Express,
scale on the order of 100 devices with one or two operators is expected. Even at
this level of proof of concept scale, system monitoring is critical to ensure system
health. It may be possible to exceed 100 devices with careful monitoring, but
100 is a good rule of thumb. In addition, some BigFix functions, such as the
IBM BigFix WebUI, should be enabled with care. Further detail on monitoring and
the WebUI is provided in the following points.
The user must perform adequate system monitoring to ensure the database
system limits are not impacting the health of BigFix. For example, once the 10GB
storage limit is reached, the database will no longer be viable. When the CPU and
memory limits are reached, system response times and throughput will degrade.
As a result, system monitoring is critical. To monitor the storage limit,
Windows File Explorer may be used. To monitor the CPU and memory limits,
Windows Performance Monitor or Task Manager may be used to observe the
SQL Server process. Advanced users may also monitor via Microsoft SQL Server
Management Studio.
In the event the defined limits are reached, Microsoft supports an in-place
upgrade to a licensed edition. A maintenance outage may be taken, the DBMS
upgraded and licensed, and service resumed.
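The limit monitoring described above can be scripted. The following Python sketch is illustrative only (the file path helper and the 80% warning threshold are assumptions, not product settings); it checks a SQL Server Express data file size against the documented 10GB cap.

```python
import os

# Documented per-database data file cap for SQL Server Express (see the
# Microsoft Knowledge Base for the limits of a specific version).
EXPRESS_DB_LIMIT_BYTES = 10 * 1024**3  # 10 GB

def express_headroom(db_file_size_bytes, warn_ratio=0.8):
    """Return (used_ratio, warning) for an Express data file size.

    warning becomes True once usage crosses warn_ratio of the cap,
    signalling that the in-place upgrade to a licensed edition should
    be planned before the database stops accepting writes.
    """
    used = db_file_size_bytes / EXPRESS_DB_LIMIT_BYTES
    return used, used >= warn_ratio

def check_mdf(path, warn_ratio=0.8):
    """Stat a data file on disk; the .mdf path passed in is a
    deployment-specific, hypothetical example."""
    return express_headroom(os.path.getsize(path), warn_ratio)
```

A scheduled task could run a check like this against the BigFix data file and alert operators well before the hard limit is reached.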
When the BigFix application server and DBMS server are collocated, database
traffic remains on the local host. For a remote database, the network transport
chain must be invoked. As a result, even well-configured data centers that have
low-latency 10Gbps networks may demonstrate a slowdown with a remote database
versus a local database.
Database    Schema    Number of Tables
BESREPOR    DBO       26
BFENT       DBO       114

DB2 database configuration settings:
STMT_CONC = LITERALS
LOCKTIMEOUT = 60
AUTO_MAINT = ON
AUTO_TBL_MAINT = ON
AUTO_RUNSTATS = ON
AUTO_STMT_STATS = ON
AUTO_REORG = ON
CUR_COMMIT = ON
5.2
interest in any deployment. Please see the References section for a more comprehensive
listing.
Windows:
Add the DatabaseBoostLevel DWORD value to the registry key
HKLM\Software\Wow6432Node\BigFix\Enterprise Server\FillDB.
Linux:
Add the following lines to the /var/opt/BESServer/besserver.config file:
[Software\BigFix\Enterprise Server\FillDB]
DatabaseBoostLevel = <DATABASE_CONFIGURATION_LEVEL>
The default database boost level is zero (0). While this is sufficient for Linux, the
recommended database boost level for Windows based deployments is three (3). The
following graph shows a sample scenario for database boost level impact on Windows
(higher numbers are better).
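For the Linux variant, the stanza can be generated programmatically. The following Python helper is a hypothetical sketch (not part of BigFix) that builds the besserver.config lines shown above for a given boost level; the validation range is illustrative.

```python
def filldb_boost_stanza(level):
    """Build the FillDB stanza for besserver.config.

    Level 0 is the default; 3 is the recommendation given in the text
    for Windows-based deployments. The non-negative check here is an
    illustrative safeguard, not a documented product constraint.
    """
    if not isinstance(level, int) or level < 0:
        raise ValueError("DatabaseBoostLevel must be a non-negative integer")
    return ("[Software\\BigFix\\Enterprise Server\\FillDB]\n"
            "DatabaseBoostLevel = {0}\n".format(level))
```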
Computer removal.
A computer removal utility (URL) is available to remove obsolete computers and
thereby reduce the overhead for database operations.
Audit cleanup.
An audit cleanup utility (URL) is available to prune entries based on a variety of
criteria, including time. The audit cleanup should be done in accordance with the
enterprise audit policies. For example, a database archive may be generated to
store the audit content for that point in time, with the subsequent cleanup serving
to reduce the overhead for database operations.
_WebUI_ETL_DelaySeconds
This parameter defines the ETL run interval; counting starts at the completion
of the last ETL interval. The default value is 600 seconds, which equates to
10 minutes. Adjusting this value is a tradeoff between WebUI data freshness and
database resource impact.
As is typical of most databases, statistics are collected by the database manager and used
in the establishment of query plans. In the case of the WebUI SQLite instance, there are
two parameters that will impact statistics collection.
_WebUIAppEnv_ETL_STATISTICS_THRESHOLD
This parameter defines the row threshold for statistics collection. Once this
threshold is exceeded, statistics are refreshed.
_WebUIAppEnv_ETL_STATISTICS_THRESHOLD_TIME
This parameter defines the time interval for refreshing statistics. By default
it is set to 3 a.m. (i.e. 03:00). An array of values may be provided
(e.g. "03:00,11:00,16:00").
In the event that query slowdowns are experienced during the daily usage of the WebUI
(as evidenced by user interface operations taking more time to load), more frequent
statistics collection may be specified via the above parameters.
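The time-slot format above can be illustrated with a small parser. This is illustrative Python, not the WebUI's own logic; the comma-separated format follows the example given in the text.

```python
from datetime import time

def parse_refresh_times(value):
    """Parse a value such as "03:00" or "03:00,11:00,16:00" into
    datetime.time objects."""
    slots = []
    for item in value.split(","):
        hh, mm = item.strip().split(":")
        slots.append(time(int(hh), int(mm)))
    return slots

def refresh_due(value, now):
    """True when `now` (a datetime.time) lands on a configured slot."""
    return any(now.hour == t.hour and now.minute == t.minute
               for t in parse_refresh_times(value))
```

With three configured slots, statistics would be refreshed three times per day rather than once, at the cost of additional database work during each refresh.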
In addition to statistics management, concurrency in the WebUI (Node.js) runtime is also
important. The following configuration setting should be set to the expected number of
peak concurrent users.
5.3
_WebUIAppEnv_UV_THREADPOOL_SIZE
This parameter determines the thread pool size utilized by the Node.js runtime for
the WebUI.
In the event it is desired to set up a benchmark and monitoring reference for a BigFix
installation, the authors of this paper may be contacted for consultation purposes.
[Table: database maintenance tasks and the corresponding DB2 utilities: backup
and restore (backup, restore), statistics (runstats), reorganization (reorg),
and maintenance automation.]
6.1
Online backups may be utilized as well. The following figure provides commands
that comprise a sample weekly schedule. With the given schedule, the best case
scenario is a restore requiring one image (a Monday failure using the Sunday
night backup). The worst case scenario would require four images (Sunday +
Wednesday + Thursday + Friday). An alternate approach would be to utilize a full
incremental backup each night, making the worst case scenario two images. The
tradeoffs between the backup approaches are the time to take the backup, the
amount of disk space consumed, and the restore dependencies. A best practice can
be to start with nightly full online backups, and introduce incremental backups
if backup time becomes an issue.
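The image-count arithmetic above can be double-checked with a small model of DB2 restore chains. The schedule below is one plausible reading of the sample (Sunday full, Wednesday cumulative incremental, daily deltas afterward); the helper follows DB2 semantics, where a restore needs the last full image, the last cumulative incremental after it, and every delta after that.

```python
def restore_chain(backups, failure_day):
    """Return the backup images needed to restore after a failure.

    backups: list of (day_index, kind) with kind in
    {"full", "incremental", "delta"}; "incremental" is cumulative.
    failure_day: day index of the failure; only earlier backups count.
    """
    done = [b for b in backups if b[0] < failure_day]
    # Start from the most recent full backup.
    last_full = max(i for i, (_, k) in enumerate(done) if k == "full")
    chain = [done[last_full]]
    # Add the most recent cumulative incremental taken after the full.
    incrementals = [i for i, (_, k) in enumerate(done)
                    if k == "incremental" and i > last_full]
    start = incrementals[-1] if incrementals else last_full
    if incrementals:
        chain.append(done[start])
    # Add every delta taken after that point.
    chain.extend(b for i, b in enumerate(done) if i > start and b[1] == "delta")
    return chain

# Hypothetical week: Sun(0) full, Wed(3) incremental, Thu(4)/Fri(5) deltas.
WEEK = [(0, "full"), (3, "incremental"), (4, "delta"), (5, "delta")]
```

With this schedule, restore_chain(WEEK, 1) yields a single image (the Sunday full), while restore_chain(WEEK, 6) yields the four-image worst case described above.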
[Figure 34: sample weekly online backup schedule; each night runs a
"backup db <dbname> online ... use tsm" command, with a full backup on Sunday
and incremental / incremental delta backups on the remaining nights.]
Note that to enable incremental backups, the database configuration must be
updated to track page modifications, and a full backup must be taken to
establish a baseline.
update db cfg for BFENT using TRACKMOD YES
Figure 35: Database Incremental Backup Enablement
To restore the online backups, either a manual or automatic approach may be used. For
the manual approach, you must start with the target image, and then revert to the oldest
relevant backup and move forward to finish with the target image. A far simpler approach
is to use the automatic option and let DB2 manage the images. A sample of each
approach is provided below, showing the restore based on the Thursday backup.
restore db <dbname> incremental use tsm taken at <Sunday full timestamp>
restore db <dbname> incremental use tsm taken at <Wednesday incremental
timestamp>
restore db <dbname> incremental use tsm taken at <Thursday incremental delta
timestamp>
Figure 36: Database Online Backup Manual Restore
restore db <dbname> incremental auto use tsm taken at <Thursday incremental delta
timestamp>
Figure 37: Database Online Backup Automatic Restore
In order to support online backups, archive logging must be enabled. The next subsection
provides information on archive logging, including the capability to restore to a specific
point in time using a combination of database backups and archive logs.
Alternatively, in order to enable log archiving to TSM, the following command
may be used [4].
update db cfg for <dbname> using logarchmeth1 TSM
Figure 39: Database Log Archiving to TSM
Note that a logarchmeth2 configuration parameter also exists. If both of the log archive
method parameters are set, each log file is archived twice (once per log archive method
configuration setting). This will result in two copies of archived log files in two distinct
locations (a useful feature based on the resiliency and availability of each archive location).
Once the online backups and log archive(s) are in effect, the recovery of the database may
be performed via a database restore followed by a roll forward through the logs. Several
restore options have been previously described. Once the restore has been completed,
roll forward recovery must be performed. The following are sample roll forward operations.
rollforward db <dbname> to end of logs
Figure 40: Database Roll Forward Recovery: Sample A
rollforward db <dbname> to 2012-02-23-14.21.56 and stop
Figure 41: Database Roll Forward Recovery: Sample B
It is worth noting the second example recovers to a specific point in time. For a
comprehensive description of the DB2 log archiving options, the DB2 information center
should be consulted (URL). A service window (i.e. stop the application) is typically
required to enable log archiving.
A superior approach is to let DB2 automatically prune the backup history and delete your
old backup images and log files. A sample configuration is provided below.
update db cfg for BFENT using AUTO_DEL_REC_OBJ ON
update db cfg for BFENT using NUM_DB_BACKUPS 21
update db cfg for BFENT using REC_HIS_RETENTN 180
Figure 43: Database Backup Automatic Cleanup Configuration
It is also generally recommended to have the backup storage independent from the
database itself. This provides a level of isolation in the event volume issues
arise (e.g. it ensures that a backup operation will not fill the volume hosting
the tablespace containers, which could possibly lead to application failures).

[4] The log archive methods (logarchmeth1, logarchmeth2) have the ability to
associate configuration options with them (logarchopt1, logarchopt2) for further
customization.
6.2
One issue with the reorgchk command is that it does not enable full control over
statistics capture options. For this reason, it may be beneficial to perform
statistics updates at a table-by-table level. However, this can be a daunting
task for a database with hundreds of tables. The following SQL statement may be
used to generate administration commands on a table-by-table basis.
select 'runstats on table ' || STRIP(tabschema) || '.' || tabname || ' with
distribution and detailed indexes all;' from SYSCAT.TABLES where tabschema in
('DBO');
Figure 45: Database Statistics Collection Table Iterator
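The same iterator idea can also live outside the database. The following Python sketch is illustrative (the table name used in the test is a placeholder; any DB2 client could supply the real list from SYSCAT.TABLES) and generates the equivalent runstats commands.

```python
def runstats_commands(tables):
    """Generate per-table runstats commands, mirroring the SYSCAT.TABLES
    query above. tables is a list of (schema, name) pairs."""
    template = ("runstats on table {schema}.{name} "
                "with distribution and detailed indexes all;")
    return [template.format(schema=schema.strip(), name=name)
            for schema, name in tables]
```

The generated commands can then be reviewed and executed in a maintenance window, rather than run blindly against every table.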
6.3 Database Reorganization
Over time, the space associated with database tables and indexes may become
fragmented. Reorganizing the table and indexes may reclaim space and lead to more
efficient space utilization and query performance. In order to achieve this, the table
reorganization command may be used. Note, as discussed in the previous performance
management section, automatic database reorganization may be enabled to reduce the
requirement for manual maintenance.
The following commands are examples of running a reorg on a specific table and
its associated indexes (the table name is a placeholder).

reorg table DBO.<tabname> allow no access
reorg indexes all for table DBO.<tabname> allow no access

Figure 46: Database Table and Index Reorganization

Note the reorgchk command previously demonstrated provides a per-table indicator
of which tables require a reorg. Using the reorgchk results, per-table
reorganization may be achieved for optimal database space management and usage.
It is important to note there are many options and philosophies for database
reorganization. Every enterprise must establish its own policies based on usage,
space considerations, performance, and so on. The above example is an offline
reorg. However, it is also possible to do an online reorg via the allow read
access or allow write access options. The notruncate option may also be
specified (indicating the table will not be truncated in order to free space).
The notruncate option permits more relaxed locking and greater concurrency
(which may be desirable if the space usage is small or will soon be reclaimed).
If full online access during a reorg is required, the allow write access and
notruncate options are both recommended.
Note it is also possible to use our table iteration approach to do massive reorgs across
hundreds of tables as shown in the following figure. The DB2 provided snapshot routines
and views (e.g. SNAPDB, SNAP_GET_TAB_REORG) may be used to monitor the status
of reorg operations.
select 'reorg table ' || STRIP(tabschema) || '.' || tabname || ' allow no
access;' from SYSCAT.TABLES where tabschema in ('DBO');
select 'reorg indexes all for table ' || STRIP(tabschema) || '.' || tabname || '
allow no access;' from SYSCAT.TABLES where tabschema in ('DBO');
Figure 47: Database Reorganization Table Iterator
6.4
[Table: sample weekly maintenance schedule, with statistics collection and
reorgs on Saturday/Sunday, and archiving on Saturday.]
Security Considerations
This paper is primarily concerned with capacity planning and performance
management for BigFix. However, BigFix is at its core a security offering, so it
is fitting to describe security management and hardening approaches for BigFix
deployments.
7.1 Security Management
The following table provides a summary of BigFix security management. Specific security
areas are expanded upon as appropriate.
[Table: BigFix security areas and their dispositions, including threat
modeling.]

Rational AppScan:
- Provides visibility into the security and regulatory compliance risks web
  applications present to your organization.
- Scans websites for both embedded malware and links to malicious or undesirable
  websites.
- Helps ensure your website is not infecting visitors or directing them to
  unwanted or dangerous websites.
- Delivers more than 40 security compliance reports, including PCI Data Security
  Standard (PCI DSS), Payment Application Data Security Standard (PA-DSS), ISO
  27001 and ISO 27002, HIPAA, GLBA, and Basel II.
Further information on the Rational AppScan offering is available in the References section.
Rational AppScan Source:
- Identifies security vulnerabilities and defects in the source code during the
  early stages of the application lifecycle, when they are the least expensive
  to remediate.
- Delivers fast scans of more than one million lines of code per hour, allowing
  you to scan even the most complex enterprise applications.

Further information on the Rational AppScan Source offering is available in the
References section.
It is worth noting these are simply report options. For example, for the PCI DSS
report, neither Rational AppScan nor IBM is an approved scanning vendor. While
the reports are considered to have value in terms of classifications and
exposures, they are not considered to be at the certification level.
7.2 Security Hardening
Security hardening has multiple dimensions, particularly given the scope of BigFix. We will
provide the following, very basic, hardening approaches.
ports that are local to the host) between components is expected. For intra node
communication, the local host is not listed as an incoming host.
The following attributes are managed.
Port. The specific port that is open.
Protocol. The specific network protocol in effect, where applicable.
Program instance. The program holding the port. This may be a specific
executable or a general class designation (e.g. Operating System).
The reference tables describe the BigFix runtime requirements. The install and
upgrade requirements are not included.
The ports described are for the BigFix content. Additional operating system
services may be active.
DNS and directory services are specific to an enterprise deployment and may
require additional customization.
Information is not provided for the BigFix Distributed Server Architecture (DSA)
configuration at the time of this writing.
Port        Protocol  Program        User      Comments
50000       TCP       db2sysc        db2inst1
8080        TCP       BESWebReports  root
52311       TCP       BESRootServer  root
52315       TCP       BESRootServer  root
80          TCP       node           root
443         TCP       node           root
5000-5009   TCP       node           root
The first variable is the set of servers to be managed. This is a hash of the node alias (a
symbolic value), and the fully qualified host name. This structure should be changed per
BigFix installation, for the nodes the utility is to be run against. A sample follows.
%hosts = ('BF1' => 'blade13.romelab.it.ibm.com');
Figure 52: Port Utility Hosts Configuration
The next structure shows the set of active ports required for BigFix. These are the defined
listening ports, broken down by host and organized by component. Samples are shown for
the BigFix Enterprise Server.
%ports_active = (
    'BF1' => [
        # DB2
        50000,
        # BF
        8080, 52311, 52315,
        # Node
        80, 443, 5000, 5001, 5002, 5003, 5004,
        5005, 5006, 5007, 5008, 5009
    ],
);
Figure 53: Port Utility Active Port Configuration
The next structures serve a common purpose: they indicate the ports or the programs
associated with ports that may be ignored. The intent is to remove any noise from the port
monitoring view. This is particularly valuable in monitor mode.
%ports_ignore = ('BF1' => []);
@programs_ignore = ('cupsd', 'dnsmasq', 'master', 'repo_srv.', 'rpcbind',
'rpc.statd', 'sshd');
Figure 54: Port Utility Ports and Programs to Ignore
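The comparison the utility performs can be sketched in a few lines. The following Python is a simplified, illustrative rendering of monitor mode (the Perl utility remains the reference; the data shapes here are assumptions).

```python
def unexpected_ports(observed, expected, ports_ignore=(), programs_ignore=()):
    """Return observed listeners that are neither expected nor ignorable.

    observed: list of (port, program) pairs, e.g. parsed from netstat/ss
    output on the target host.
    expected: the configured set of active ports for the host.
    """
    flagged = []
    for port, program in observed:
        if port in expected or port in ports_ignore or program in programs_ignore:
            continue  # known-good port or deliberately ignored noise
        flagged.append((port, program))
    return flagged
```

Anything returned represents either an undocumented service or a misconfiguration worth investigating.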
Summary Cookbook
The following tables provide a cookbook for the solution implementation. The cookbook
approach implies a set of steps the reader may check off as completed to provide a
stepwise implementation of the BigFix solution. The recommendations will be provided in
three basic steps:
1. Base installation recommendations.
2. Post installation recommendations.
3. High scale recommendations.
All recommendations are provided in tabular format. The preferred order of implementing
the recommendations is in order from the first row of the table through to the last.
8.1
[Table: base installation recommendations B1 through B3, with Description and
Status columns.]
8.2
[Table: post installation recommendations P1 through P5, with Description and
Status columns.]
8.3
[Table: high scale recommendations S1 through S3, with Description and Status
columns.]
REFERENCES
IBM BigFix References
IBM BigFix Knowledge Center
IBM BigFix Version 9.2 Knowledge Center
IBM BigFix Resource Center
IBM BigFix developerWorks Resource Center
IBM BigFix 9.2.0 System Requirements
http://www-01.ibm.com/support/docview.wss?rs=1015&uid=swg21684809
IBM BigFix Message Level Encryption
IBM BigFix Performance Configurations
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Performance%20Configurations
IBM BigFix Server Disk Performance
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Server%20Disk%20Performance
BigFix Network Management and Bandwidth Throttling
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Bandwidth%20Throttling
BigFix Client Usage Profiler
http://www-01.ibm.com/support/docview.wss?uid=swg21506248
BigFix Utilities
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Utilities
Virtualization References
IBM SoftLayer Cloud Server Specifications
Performance Best Practices for VMware vSphere 5.0
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
Performance Best Practices for VMware vSphere 5.1
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf
Best practices for virtual machine snapshots in the VMware environment
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1025279
VMware: Troubleshooting ESX/ESXi virtual machine performance issues (VMware Knowledge Base)
VMware: Troubleshooting virtual machine performance issues (VMware Knowledge Base)
VMware: Performance Blog
http://blogs.vmware.com/vsphere/performance
Linux on System x: Tuning KVM for Performance
Kernel Virtual Machine (KVM): Tuning KVM for performance
http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaattuning_pdf.pdf
PowerVM Virtualization Performance Advisor (IBM developerWorks)
IBM PowerVM Best Practices
http://www.redbooks.ibm.com/redbooks/pdfs/sg248062.pdf
Benchmark References
Report on Cloud Computing to the OSG Steering Committee, SPEC Open Systems Group
https://www.spec.org/osgcloud/docs/osgcloudwgreport20120410.pdf
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in
the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in
this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark
information" at http://www.ibm.com/legal/copytrade.shtml.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Other company, product, or service names may be trademarks or service marks of others.