Data Center Operations & Problem Management Policy

The following sample outlines a set of policies and procedures for Data Center Operations &
Problem Management.

Prepared By: ______________


Approved By: ______________
Revision Date: ______________
Effective Date: ______________

PURPOSE:
The objective of this document is to provide policy and procedure guidance for conducting major
activities in the Company X data centers.

SCOPE:
The major items included are Roles & Responsibilities, Help Desk Support, User Access
Management, System Monitoring, Problem Management, and Environmental Controls.

DEFINITIONS:
• 1st Line Support - Handles help-desk level activities and provides front-line support to
internal customers.
• 2nd Line Support - Manages production systems and issues involving the installation,
maintenance, and support of the various web properties.
• App X - An online IT Helpdesk application.
• Director of Technical Operations (DTO) - Responsible for the day-to-day management and
direction of the entire technical operations staff, including 1st and 2nd Line Support.

POLICY
These policies and procedures, along with department roles and responsibilities, are reviewed by
the Managers of 1st and 2nd Line Support on an annual basis and signed off by the DTO. All
metrics used to guide and monitor the activities of the Technical Operations department,
including tasks to be performed, time frames for completion, and segregation of duties
guidelines, are also reviewed annually.

ROLES & RESPONSIBILITIES


The IT technical support organization includes 1st and 2nd Line Support Managers, System
Administrators and Network Engineers who report to the DTO. The DTO reports to the CIO. The
general roles and responsibilities assigned to these positions are as follows:

1st Line Support


Technical Support's 1st Line Support organization includes System Administrators and a first-line
manager position, with the following responsibilities:

• Set up new computers and phones;
• Support for hardware, software, and telecommunications;
• Deployment of new equipment;
• Process requests for new access and for changes to existing network and application access;
• Help desk support for all problems and requests;
• System monitoring.

2nd Line Support


Technical Operations' 2nd Line Support organization supports all UNIX systems, file servers,
application servers and database servers for Development, QA, Staging and Production, with the
following responsibilities:

Source: www.knowledgeleader.com
• 2nd Line Support for UNIX systems and network administration duties;
• Troubleshooting and problem resolution;
• Support for new development efforts;
• Support of critical applications and system utilization;
• Participation in infrastructure architecture, enhancement and scalability projects.

Director of Technical Operations (DTO)


The DTO reports directly to the CIO and has the following general responsibilities:

• Provides direction and management to 1st and 2nd line management;
• Monitors all performance reporting;
• Conducts review of all policies and procedures;
• Acts as sign-off on all change management requests;
• Provides escalation path to CIO and development staff for problem management.

PROCEDURES:

Help Desk Support System

App X is utilized to generate electronic ticketing for IT requests, including adding access for new
users, modifications to existing users and terminations, problems, emergencies and change
management requests. In addition to providing a vehicle for users to initiate IT requests, App X
facilitates the assignment, follow-up, management, closure and escalation of requests, including
the provision of reporting and audit trails.

When a ticket is acted upon (including when it is closed), an e-mail is sent to all parties
identified on the ticket. This includes the requestor and the IT parties working on the ticket.
If a departmental manager is identified (or any other individual is cc'd as a concerned party),
that individual is copied on all actions on the ticket.

Users open requests directly in App X, including details of their requirements. App X incidents are
automatically numbered. Tech Ops 1st Line Support technicians have access to App X and
monitor the requests throughout the day. The 1st Line Support technician who opens the ticket
may either accept the assignment or forward it to another 1st Line Support technician.
Company X also has an IT help phone line. The individual answering this line instructs the
caller to log the issue in App X and, if necessary, helps the caller do so.

Response categories are defined in order of severity, from P1 (most severe) to P5 (least
severe). While the initial requestor may designate the priority level, the IT responder may
change this priority to more appropriately reflect business conditions.

P1 events are considered emergency events and are responded to within four hours. Emergency
events would include situations that could potentially create a service delivery interruption (e.g.,
bring down a production database, domain, mail server, etc.) or present a need for an immediate
response, as in the case of terminating user access for someone leaving the facility with little or
no warning.

P2 requests are high priority and are responded to within eight hours. These are non-emergency
requests that, in the judgment of the requestor, the IT support technician, or both, need a
response within an eight-hour time period.

P3 requests are medium-priority requests that require a response within 24 hours. P4 requests
are low-priority requests that receive a response within a week. P5 requests are those requests
that fall into the category of optional. They may or may not be resolved by IT and are left
'queued' in this category, with the requestor notified of this status, until they can be
resolved. In many cases, P5 requests concern user upgrades to Company X-provided technology that
are not part of the typical user technology set-up.
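
The response windows above can be sketched as a simple lookup. This is only an illustration, not part of App X: the names `RESPONSE_WINDOWS` and `response_due` are hypothetical, and the sketch assumes the windows are counted in calendar hours, which the policy does not specify.

```python
from datetime import datetime, timedelta
from typing import Optional

# Response windows as stated in the policy text.
# P5 requests are optional and queued, so they carry no deadline.
RESPONSE_WINDOWS = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=8),
    "P3": timedelta(hours=24),
    "P4": timedelta(weeks=1),
    "P5": None,
}


def response_due(priority: str, opened_at: datetime) -> Optional[datetime]:
    """Return the latest acceptable response time, or None for queued P5 requests.

    Assumption: windows run in calendar time, not business hours.
    """
    if priority not in RESPONSE_WINDOWS:
        raise ValueError(f"unknown priority: {priority}")
    window = RESPONSE_WINDOWS[priority]
    return None if window is None else opened_at + window
```

Because the IT responder may change a ticket's priority, a deadline computed this way would need to be recalculated whenever the priority changes.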

App X tickets are backed up and retained as part of the permanent filing function of the
application.

User Access Management

New Employee
IT creates user accounts and assigns passwords to new employees and contractors only upon
receipt of a New Hire request from Human Resources (HR) and only per the specific instructions
provided.

• HR opens a New Hire request in App X for employees and contractors, which provides the
required verification that the request is for a valid new employee or contractor, copying the
hiring manager on the request.
• The hiring manager specifies the type of access that each employee or contractor should have,
including network and application access, email, intranet portal, and telephone.
• The 1st Line Support technician who accepts assignment of the App X request sets up the
accounts and a one-time password and communicates the password verbally to the new employee or
contractor.

Modifications
Modifications to user accounts for employees or contractors are made only upon receipt of a
request from the individual's manager and only per the specific instructions provided:

• The responsible manager opens a request in App X and specifies the change that is required.
• If the change is due to a job change, the new manager annotates what user account access
should be assigned to support the new position and what access should be deleted from the
past position.
• The IT technician who accepts assignment of the App X request makes the required changes and
closes the request.

Terminations
IT deletes user accounts for employees and contractors only upon receipt of a request from
Human Resources (HR) and only per the specific instructions provided.

• Human Resources opens an App X request that includes the employee's or contractor's
termination date.
• The IT technician accepting assignment of the App X request disables the employee's or
contractor's account on the termination date, unless otherwise directed by Human
Resources.
• When necessary and at the direction of HR, IT processes the request as an emergency
event, documenting the activity in App X after the event.
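
The request-source rules in the New Employee, Modifications and Terminations procedures above can be summarized as a lookup table. The following is a minimal sketch only; the identifiers (`AUTHORIZED_REQUESTER`, `is_authorized`) and the role labels `"HR"` and `"manager"` are hypothetical and do not describe how App X itself enforces these rules.

```python
# Which role must open each type of access request, per the policy text.
# Role labels and request-type keys are illustrative assumptions.
AUTHORIZED_REQUESTER = {
    "new_hire": "HR",           # New Hire requests come from Human Resources
    "modification": "manager",  # changes come from the individual's manager
    "termination": "HR",        # terminations come from Human Resources
}


def is_authorized(request_type: str, requester_role: str) -> bool:
    """Return True only if the request comes from the role the policy names.

    Unknown request types are rejected rather than allowed by default.
    """
    return AUTHORIZED_REQUESTER.get(request_type) == requester_role
```

Rejecting unknown request types by default mirrors the policy's wording that IT acts "only upon receipt" of a properly sourced request.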

External (Network) System Monitoring/Intrusion Detection

A documented wide-area network (WAN) diagram exists and illustrates network controls,
including firewalls, routers, switches, and servers.

Company X utilizes a firewall on the Production and Corporate environments that hides the
structure of the network, filters out unauthorized access, provides an audit trail of connectivity and
generates alarms. All firewall locations are noted on the network diagram.

An Intrusion Detection System (IDS) is utilized with probes on the Corporate and Production
sides of the network. At the production data center, probes are located in the DMZ VLAN and the
network in front of the load balancer. In the development data center, probes are located in
the Corporate DMZ and the internal network.

A Distributed Denial of Service (DDoS) attack mitigation system is utilized in the production
network. It is composed of a Detector and a Guard attached to the Edge router. In the event that
an attack is identified by the Detector, it signals the Guard to redirect incoming traffic through the
Guard, allowing only valid traffic through.

Internal (Host) System Monitoring/Event Log Monitoring

Company X provides Event Log Monitoring (ELM) on its servers to detect and respond to
suspicious and anomalous activity. Servers covered by ELM include those housing financial
information, mail servers, secure web servers, and database servers.

Suspicious and anomalous information, as defined by Company X, is reviewed for any necessary
response and logged in App X.

System Availability & Capacity Monitoring


Company X production systems are monitored daily, and the results are reported to management on
a monthly basis. Downtime is classified as 1) severely degraded services, 2) unscheduled
outages, or 3) scheduled maintenance. In cases of degraded services or unscheduled outages, an
alert is sent and the issue is researched and remediated.

System capacity is automatically monitored. If capacity thresholds are crossed, an alert is
sent to Technical Operations and the issue is researched and remediated.
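
As an illustration of threshold-based capacity alerting, the sketch below flags metrics that have crossed their limits. The metric names and threshold values are hypothetical, since the policy does not state them, and the sketch assumes a threshold counts as "crossed" when the reading reaches or exceeds it.

```python
from typing import Dict, List

# Hypothetical thresholds in percent utilization; the policy does not give actual values.
THRESHOLDS: Dict[str, float] = {"cpu": 85.0, "disk": 90.0, "memory": 90.0}


def crossed_thresholds(
    readings: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """Return the sorted names of metrics whose reading is at or above its threshold.

    Metrics without a configured threshold are ignored.
    """
    return sorted(
        name
        for name, value in readings.items()
        if name in thresholds and value >= thresholds[name]
    )
```

A non-empty result would correspond to the alert sent to Technical Operations; how that alert is delivered is outside the scope of this sketch.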

Third Party Services & Access

Third Party Services


Acquisition of major Vendor services is performed only after an internal risk assessment and an
RFP. Upon assessment of the need to procure major IT services from a third-party, IT
Management will create a business case and review the needs and associated risk with each
potential vendor. If multiple vendors are identified, IT Management will send an RFP and
subsequently evaluate responses for feasibility.

Vendor SLAs are established and reviewed at the time of contract renewal. Technical Operations
reviews the SLAs and contracts for major vendors. This entails reviewing pricing, past
performance (if an existing vendor), and compatibility with other Company X systems.

Third-Party Access
Currently, Company X provides network access to third parties only in limited instances. Access
is granted at the minimum level required and only for a defined duration, and it is revoked
immediately at the end of the agreed period.
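
The time-limited nature of third-party access can be sketched as a grant with an explicit end time. This is an illustration under assumptions: the `AccessGrant` type, its fields, and `is_active` are hypothetical names, and the sketch treats the end of the agreed period as exclusive so that access lapses at that instant.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessGrant:
    """A hypothetical record of third-party access for a defined duration."""
    third_party: str
    starts_at: datetime
    ends_at: datetime


def is_active(grant: AccessGrant, now: datetime) -> bool:
    """Return True only while `now` falls inside the agreed access period.

    The end time is exclusive: access is no longer valid at `ends_at`.
    """
    return grant.starts_at <= now < grant.ends_at
```

Checking the grant on every use, rather than relying on a manual clean-up, is one way to make revocation at the end of the period immediate.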

Vulnerability Assessments

Vulnerability assessments are performed for Company X by an outside provider, which conducts
(at minimum) an annual vulnerability assessment. Any necessary remediation is logged in
App X and kept as part of a permanent archive.

Problem Management

Any critical, non-standard operational event, such as a system failure that poses or could pose
a significant service delivery interruption, is coded as a P1 (highest severity) incident in
App X for prioritizing, monitoring, escalation, resolution, and archiving. These events are
detected through network monitoring, event-log monitoring by in-house IT staff, or observation
by any IT employee or user:

• The IT department staff monitors the server and network environment and App X daily for
non-standard events. This occurs through several processes:
    • Internal event-log monitoring;
    • External monitoring;
    • User reporting through the Helpdesk (App X) function.
• Emergency events are logged in App X once identified, either by regular IT monitoring or
through user requests.
• After being identified and logged in App X, all emergency priority events are escalated to the
DTO and a notification is sent via email. Distribution for this email includes the CIO and
Executive management team.
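
The escalation step above can be sketched as follows. This is an illustrative model only: the `Incident` type and `escalate` function are hypothetical names, and no actual e-mail is sent; the function simply records who the policy says should be involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """A hypothetical, simplified view of an App X incident."""
    ticket_id: int
    summary: str
    priority: str = "P1"
    escalated_to: str = ""
    notified: List[str] = field(default_factory=list)


def escalate(incident: Incident) -> Incident:
    """Record escalation for emergency (P1) incidents; leave others unchanged.

    Per the policy, P1 events are escalated to the DTO, and the notification
    e-mail is distributed to the CIO and the Executive management team.
    """
    if incident.priority != "P1":
        return incident
    incident.escalated_to = "DTO"
    incident.notified = ["CIO", "Executive management team"]
    return incident
```

For example, `escalate(Incident(101, "Production database down"))` marks the incident as escalated to the DTO, while a P3 incident passes through untouched.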

Physical Access & Environmental Controls

Physical Access
Visitors to Company X are required to check in with the receptionist and record the name of the
person they are visiting, as well as the date and their times in and out.

Visitors to the Company X Corporate office are required to check in with the receptionist. The
employee they are visiting is informed of their arrival and escorts the visitor while on the
premises.

Only authorized personnel are provided unescorted access to the data center. Access control is
provided by key card. Access is also restricted by a security guard, who permits entry only
after reviewing identification and checking it against Company X's access list.

Environmental Controls
The Corporate data center contains adequate environmental controls to maintain the systems
and data, including fire suppression, uninterruptible power supply (UPS) back-up with a
diesel generator back-up, and air conditioning. The data center contains the following:

• Dedicated air conditioning units;
• Temperature control devices;
• Uninterruptible power supply (UPS) with diesel back-up;
• Water sprinkler system (high-temperature wet-head).

The Production data center, which is collocated at a Tier-1 facility, also maintains environmental
controls to support systems and data, including fire suppression, uninterruptible power supply
(UPS) back-up and air conditioning. The data center contains the following:

• Air conditioning units;
• Temperature control devices;
• Uninterruptible power supply (UPS);
• Fire suppression (dry pipes).

All mission-critical servers in the data centers are rack-mounted and secured against seismic
events and falling hazards. The equipment racks in the data centers are seismically secured to
both the floor and the overhead.
