Data Center Automation
Data Center Automation (DCA) consists of the processes and
procedures implemented in a data center environment to automate
day-to-day activities. These are activities normally performed by
system administrators, operators, programmers, and users.
The fundamental purpose of utilizing computers is automation.
However, computers rarely simplify or expedite the initial gathering of
data; in fact, the opposite is usually true. A computing environment
provides the power to easily manipulate the data once it has been
gathered, and to automate processes based on that data.
When analyzing processes for automation in a data center environment,
the following question must be asked first: will automating this process
save money or generate revenue? If not, the process is not a candidate for
automation. Implementing technology solutions in a business scenario
only for the sake of the technology itself is a waste of time, energy, and
resources. If no monetary benefit is realized, there is no benefit
to the business. In a non-business scenario, the requirement of
"monetary benefit" may be overridden by research or scientific
curiosity, but the question should still be asked.
It is the duty and responsibility of all persons within an
organization who utilize computing resources to identify and document
the procedures and processes that are candidates for automation. As
previously stated, in a business scenario a candidate for automation is
any activity that is repeated more than once and whose automation will
generate or increase monetary benefits. Priority for assigning
resources to automate an activity is determined by how often it is
performed and by the expected benefits.
System deployment, configuration, and implementation
Numerous software application packages exist to automate system
deployment, configuration, and implementation. These packages are
typically large and work across heterogeneous platforms and
environments.
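Such packages are vendor-specific, but the pattern of driving them
programmatically can be sketched. The following Python example is
illustrative only: the deploy-tool command and the inventory.csv layout
are hypothetical stand-ins for whatever deployment package is actually
in use.

    import csv
    import subprocess

    def provision_host(hostname, os_image):
        """Hypothetical wrapper around a vendor deployment tool.

        Here we simply shell out to a placeholder command; a real
        deployment package would expose its own CLI or API.
        """
        subprocess.run(
            ["deploy-tool", "install", "--host", hostname, "--image", os_image],
            check=True,
        )

    def deploy_from_inventory(path="inventory.csv"):
        # Each row lists a host to build and the OS image to apply.
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                provision_host(row["hostname"], row["os_image"])

    if __name__ == "__main__":
        deploy_from_inventory()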
High Availability deployment, configuration, and implementation
High availability (HA) is typically an automated process for minimizing
business function downtime associated with planned or unplanned outages.
HA configurations typically utilize replicated hardware platforms to
eliminate single points of failure. The business function fail-over
normally occurs between two or more physical frames within the same data
center using a single shared storage system.
- Elimination of single points of failure (SPOFs) is a necessary part of HA.
- The goal is to minimize downtime for business functions, not systems.
- This is NOT non-stop computing; downtime will be experienced during fail-over.
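As a rough illustration of the monitoring half of such a process, the
sketch below polls a service on the primary frame and triggers fail-over
after repeated failed checks. The host names, port, and trigger_failover
mechanism are assumptions; a real HA product supplies its own
equivalents.

    import socket
    import time

    PRIMARY = ("app-frame-a.example.com", 8080)   # hypothetical primary node
    STANDBY = "app-frame-b.example.com"           # hypothetical standby node

    def service_alive(host, port, timeout=3):
        """Return True if a TCP connection to the service succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def trigger_failover(standby):
        # Placeholder: a real HA product would restart the business
        # function on the standby frame against the shared storage.
        print(f"failing over business function to {standby}")

    # Poll the primary; fail over after three consecutive missed checks.
    misses = 0
    while misses < 3:
        time.sleep(10)
        misses = 0 if service_alive(*PRIMARY) else misses + 1
    trigger_failover(STANDBY)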
Disaster Recovery deployment, configuration, and implementation
Disaster Recovery (DR) is the implementation of a project plan which
describes the tasks necessary to recover critical business functions
after a disaster. The recovery occurs between geographically separated
data centers using one or more methods of storage replication between
the data centers. The initial implementation of a DR plan is normally a
manual process that requires management declaration of a disaster;
however, subsequent DR processes may be automated.
In the context of data center automation, the generation of a
disaster recovery plan should be an automated, or mostly automated,
process. Unfortunately, this is typically not the case. In most
instances, the DR plan is written and manually maintained by system
administrators, application administrators, and other technical
personnel.
Business Continuity compliance, configuration, and implementation
Business continuity consists of the activities performed on a daily
basis to ensure the business operates normally today, tomorrow, and
beyond. The business continuity plan may be thought of as a
methodology, or as an enterprise-wide mentality of conducting day-to-day
business.
Network resource allocation and deallocation
For the purpose of data center automation, the allocation of network
resources, such as IP addresses, must be programmable. This requires
that available network addresses be stored in dynamic locations or
databases accessible to other automated processes, such as system
deployment and configuration. It also requires that node names, host
names, and aliases be automatically generated on an as-needed basis,
and that name resolution services be automatically updated with this
information.
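A minimal sketch of this idea, assuming a SQLite table
addresses(ip, hostname) as the address pool and BIND's nsupdate utility
for the name-service update, might look like this:

    import sqlite3

    def allocate_address(db_path, hostname):
        """Claim the next free address and record the hostname.

        Assumes a table addresses(ip TEXT, hostname TEXT), where a
        NULL hostname marks a free address.
        """
        con = sqlite3.connect(db_path)
        try:
            row = con.execute(
                "SELECT ip FROM addresses WHERE hostname IS NULL LIMIT 1"
            ).fetchone()
            if row is None:
                raise RuntimeError("address pool exhausted")
            ip = row[0]
            con.execute("UPDATE addresses SET hostname = ? WHERE ip = ?",
                        (hostname, ip))
            con.commit()
            return ip
        finally:
            con.close()

    ip = allocate_address("ipam.db", "newnode01")
    # Emit an nsupdate script so name resolution follows automatically.
    print(f"update add newnode01.example.com. 3600 A {ip}\nsend")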
Storage resource allocation and deallocation
Automatically deploying new systems as part of data center automation
requires that storage for operating systems and applications be
available for allocation. Automating these storage allocations and
deallocations requires a Storage Area Network (SAN) with a programmable
interface such as scripts or APIs.
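As a sketch only, the following shows what such a programmable
allocation might look like; the REST endpoint, payload fields, and
response format are invented placeholders for a real SAN manager's API.

    import json
    import urllib.request

    def allocate_lun(size_gb, host_wwpn):
        """Request a new LUN from a SAN manager.

        The URL, payload fields, and response format are hypothetical;
        real SAN platforms expose comparable REST or scripting APIs.
        """
        payload = json.dumps({"size_gb": size_gb, "map_to": host_wwpn}).encode()
        req = urllib.request.Request(
            "https://san-mgr.example.com/api/luns",   # hypothetical endpoint
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["lun_id"]

    lun = allocate_lun(100, "10:00:00:05:1e:0a:0b:0c")
    print("allocated LUN", lun)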
Dynamic CPU and Memory allocation and deallocation
Most modern systems provide capabilities to dynamically allocate and
deallocate hardware resources such as CPU and memory. Automating these
changes requires a programmable interface, such as scripts or APIs, to
the hardware management system. Furthermore, in modern data center
environments, change control must be employed before modifications are
implemented; therefore, change requests must be automatically submitted
and approvals transmitted back to the hardware management system. Once
approvals are received, the automated hardware change can proceed.
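A minimal sketch of this approval gate follows; the ticketing functions
and the hardware interface are placeholders for the real
change-management and hardware management systems.

    import time

    def submit_change_request(description):
        """Placeholder: file a change request and return its ticket ID.

        A real implementation would call the change-management
        system's API; see the Change Management section below.
        """
        print("submitted:", description)
        return "CHG0001"

    def change_approved(ticket):
        # Placeholder approval check; normally an API call or callback.
        return True

    def apply_hardware_change(node, cpus, memory_gb):
        # Placeholder for the hardware management system's interface,
        # e.g. a dynamic partitioning command on the hosting frame.
        print(f"{node}: now {cpus} CPUs, {memory_gb} GB memory")

    ticket = submit_change_request("add 2 CPUs / 8 GB to node db01")
    while not change_approved(ticket):       # wait for change control
        time.sleep(60)
    apply_hardware_change("db01", cpus=2, memory_gb=8)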
Process Scheduling
Assuming heterogeneous environments, data center automation requires
a cross-platform process scheduling system that can start processes on a
node, detect when a process is complete, and make decisions regarding
the next step (and process) in a sequence of data processing procedures.
These platforms may include mainframe, Unix, MS-Windows, and a wide
variety of others.
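A bare-bones sketch of such sequencing, assuming ssh connectivity to
each node; the node names and commands are purely illustrative.

    import subprocess

    def run_on_node(node, command):
        """Run a command on a remote node via ssh; return its exit code."""
        return subprocess.run(["ssh", node, command]).returncode

    # A simple sequence: extract on one node, then either load on the
    # database node or run the recovery step, depending on the outcome.
    if run_on_node("mvs-gw01", "submit nightly-extract") == 0:
        run_on_node("unix-db01", "load-warehouse /data/extract")
    else:
        run_on_node("unix-db01", "recover-extract")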
Performance Monitoring
Performance monitoring and notification is a fundamental piece of
data center automation. Many problem-avoidance procedures can be
automatically implemented based on performance information. This
information is also critical to determining service level agreement
compliance. Notifications of performance issues that cannot be
automatically resolved can be forwarded to the appropriate personnel for
further review.
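As an illustration, the sketch below checks one metric (filesystem
usage), attempts an automated avoidance action, and escalates only if
the problem persists; the threshold and the placeholder actions are
assumptions.

    import shutil

    def usage_pct(path="/var"):
        total, used, _free = shutil.disk_usage(path)
        return 100 * used / total

    def clean_old_logs():
        # Placeholder problem-avoidance action, e.g. compressing or
        # expiring aged log files.
        print("cleaning old logs")

    def notify_on_call(message):
        # Placeholder: page or e-mail the appropriate personnel.
        print("NOTIFY:", message)

    THRESHOLD = 90  # percent, a hypothetical local policy
    if usage_pct() > THRESHOLD:
        clean_old_logs()                      # try automated avoidance first
        if usage_pct() > THRESHOLD:           # escalate only if it failed
            notify_on_call("/var still above threshold after cleanup")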
Error detection and problem resolution
Monitoring of system and application error logs provides a
foundation for automated problem resolution.
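A minimal sketch of this idea maps known log signatures to resolution
actions; the patterns, actions, and log path are illustrative
assumptions.

    import re

    # Map known log signatures to automated resolution actions.
    RULES = [
        (re.compile(r"out of memory", re.I), "restart-app"),
        (re.compile(r"filesystem full", re.I), "expand-filesystem"),
        (re.compile(r"link down", re.I), "escalate-network"),
    ]

    def scan_log(path="/var/log/messages"):
        with open(path, errors="replace") as log:
            for line in log:
                for pattern, action in RULES:
                    if pattern.search(line):
                        # A real system would invoke the action here;
                        # unmatched errors fall through to human review.
                        print(f"{action}: {line.strip()}")

    scan_log()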
Security Management
User authentication and authorization is one of the most
time-consuming activities in any large computing environment. Many tools
exist to automate these processes in heterogeneous environments;
however, it is still a very difficult task. Once a solution is
implemented by an organization, an enterprise-wide policy must be
adopted that requires all subsequent systems and applications to utilize
and integrate into the selected solution. Non-compliance must require
executive management approval.
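As one illustration, account provisioning against a central LDAP
directory can be scripted with the ldap3 Python library; the server
address, bind credentials, and directory layout below are assumptions.

    from ldap3 import Server, Connection

    # Server address, bind credentials, and directory layout are
    # assumptions for illustration only.
    server = Server("ldaps://dir.example.com")
    conn = Connection(server, user="cn=admin,dc=example,dc=com",
                      password="secret", auto_bind=True)

    # Provision one account in the central directory; every new system
    # and application then authenticates against this single source.
    conn.add(
        "uid=jdoe,ou=people,dc=example,dc=com",
        object_class=["inetOrgPerson"],
        attributes={"cn": "Jane Doe", "sn": "Doe", "uid": "jdoe"},
    )
    print("add result:", conn.result["description"])
    conn.unbind()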
Document Management
In a data center automation environment, technical personnel should
not be spending their time writing system documentation and procedures,
primarily because such documentation will always be obsolete or
incomplete. Instead, system documentation and procedures should be
automatically generated on a periodic basis to ensure they are current
and complete.
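A minimal sketch of such generation, collecting facts from the running
system with the Python standard library; the output file name is an
assumption.

    import platform
    import shutil
    from datetime import datetime, timezone

    def system_fact_sheet():
        """Collect current facts from the running system so the
        document can never drift out of date."""
        total, used, _free = shutil.disk_usage("/")
        return "\n".join([
            f"Generated: {datetime.now(timezone.utc).isoformat()}",
            f"Host: {platform.node()}",
            f"OS: {platform.system()} {platform.release()}",
            f"Root filesystem: {used // 2**30} of {total // 2**30} GB used",
        ])

    # Run from the scheduler on a periodic basis and publish the output.
    with open("system-facts.txt", "w") as doc:
        doc.write(system_fact_sheet() + "\n")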
Change Management
Any large data center environment will require change requests to be
submitted and approved before any planned work is performed. To support
data center automation principles, the change request system must accept
input programmatically, through scripts, APIs, email, or some other
mechanism that can be driven by the systems themselves rather than by a
human.
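The sketch below uses the email mechanism mentioned above; the gateway
addresses and subject convention are assumptions about a hypothetical
ticket system's mail intake.

    import smtplib
    from email.message import EmailMessage

    def submit_change_by_mail(summary, body):
        """Submit a change request through the ticket system's mail
        gateway. Addresses and subject convention are assumptions."""
        msg = EmailMessage()
        msg["From"] = "automation@example.com"
        msg["To"] = "change-requests@example.com"
        msg["Subject"] = f"CHANGE REQUEST: {summary}"
        msg.set_content(body)
        with smtplib.SMTP("mail.example.com") as smtp:
            smtp.send_message(msg)

    submit_change_by_mail(
        "add 2 CPUs to db01",
        "Requested by capacity automation; window per db01 SLA.",
    )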
Audit Management
In support of data center automation concepts, audit compliance must
be achievable through automated mechanisms, at least from the technical
support personnel's perspective. Toolsets should be implemented with the
ability to respond to audit requests via a "self-service" interface
which grants access to audit information only to authorized auditors.
This relieves the technical support staff from having to gather, format,
and provide this information to numerous different auditors, several
times a year.
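A minimal sketch of such a self-service interface, using only the
Python standard library; the token scheme and the placeholder audit
record are assumptions.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    AUDITOR_TOKENS = {"token-auditor-1"}   # hypothetical issued credentials

    class AuditHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Only authorized auditors may read; nobody may write.
            token = self.headers.get("X-Audit-Token", "")
            if token not in AUDITOR_TOKENS:
                self.send_response(403)
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            # Placeholder payload; a real service would query the
            # audit repository for the requested records.
            self.wfile.write(b"placeholder audit record\n")

    HTTPServer(("", 8443), AuditHandler).serve_forever()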
Service Level Agreements
The Service Level Agreement (SLA) is an agreement between a business
function owner and the service provider that designates the amount of
time, on an annualized basis, the business function will be available.
Conversely, the SLA also designates the amount of time, on an annualized
basis, for which the business function will NOT be available. SLAs
are utilized in data center automation to schedule maintenance, updates,
upgrades, etc., and every system in the data center must have an
associated SLA in order to participate in the automated scheduling of
services. This includes test, development, and sandbox systems.
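The arithmetic behind such scheduling can be sketched simply: convert
the annualized availability figure into a downtime budget and check a
proposed maintenance window against what remains. The 99.5% figure and
hours consumed below are hypothetical.

    HOURS_PER_YEAR = 365 * 24   # 8760; ignores leap years for simplicity

    def downtime_budget_hours(availability_pct):
        """Annualized downtime allowed by the SLA;
        e.g. 99.5% availability leaves 43.8 hours per year."""
        return HOURS_PER_YEAR * (100 - availability_pct) / 100

    def window_fits(availability_pct, used_hours, window_hours):
        # The scheduler only books maintenance that stays within
        # the remaining annual budget.
        return used_hours + window_hours <= downtime_budget_hours(availability_pct)

    # Hypothetical system: 99.5% SLA, 20 h already consumed, 4 h window.
    print(downtime_budget_hours(99.5))      # 43.8
    print(window_fits(99.5, 20, 4))         # True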