Data Center Automation
Data Center Automation (DCA) consists of the processes and
procedures implemented in a data center environment for the purpose of
automating day-to-day activities. These activities are those normally
performed by system administrators, operators, programmers, and users.
The fundamental purpose of using computers is automation; however,
computers rarely simplify or expedite the initial gathering of data. In
fact, the opposite is usually true. A computing environment provides the
power to easily manipulate data once it has been gathered, and to
automate processes based on that data.
When analyzing processes for automation in a data center environment,
the following question must be asked first: will automating this process
save or generate revenue? If not, then it is not a candidate for
automation. Implementing technology solutions in a business scenario
only for the sake of technology itself is a waste of time, energy, and
resources. If no monetary benefit is realized, then it is of no benefit
to the business. In a non-business scenario, the requirement of
"monetary benefit" may be overridden by research or scientific
curiosity, but it is still a question that should be asked.
It is the duty and responsibility of all persons within an
organization who utilize computing resources to identify and document
those procedures and processes that are candidates for automation. As
previously stated, in a business scenario, a candidate for automation is
any activity that is repeated more than once and that will generate or
increase monetary benefit when automated. Priority for assigning
resources to automate an activity is determined by how often it is
performed and the expected benefit.
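As a rough illustration of this prioritization, the following sketch
ranks candidate activities by annual frequency multiplied by expected
per-occurrence benefit; all activity names and dollar figures are
invented for the example.

```python
# Rank hypothetical automation candidates by annual frequency times
# expected per-occurrence benefit; names and figures are illustrative only.
candidates = [
    # (activity, times performed per year, expected benefit per occurrence, $)
    ("password reset",        5200, 5.0),
    ("OS patch deployment",     12, 400.0),
    ("DR plan regeneration",     4, 1500.0),
]

# Sort by total expected annual benefit, highest first.
for name, frequency, benefit in sorted(
        candidates, key=lambda c: c[1] * c[2], reverse=True):
    print(f"{name:25s} expected annual benefit: ${frequency * benefit:,.2f}")
```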
System deployment, configuration, and implementation
Numerous software application packages exist to automatically
provide system deployment, configuration, and implementation. These
packages are typically large and work across heterogeneous platforms and
environments.
High Availability deployment, configuration, and implementation
High availability is typically an automated process for minimizing
business function downtime associated with planned or unplanned outages.
It typically utilizes replicated hardware platforms to eliminate single
points of failure. The business function fail-over normally occurs
between two or more physical frames within the same data center using a
single shared storage system; a minimal monitoring sketch follows the
list below.
- Elimination of single points of failure (SPOFs) is a necessary part of HA.
- The goal is to minimize downtime for business functions, not systems.
- This is NOT non-stop computing; downtime will be experienced during fail-over.
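The sketch below illustrates the basic heartbeat-and-takeover loop at
the heart of such fail-over; the node name, liveness probe, and takeover
script are hypothetical placeholders, and a production HA product
additionally handles quorum, split-brain, and shared-storage arbitration.

```python
import subprocess
import time

PRIMARY = "frame1-lpar1"                        # hypothetical primary node
TAKEOVER_SCRIPT = "/usr/local/bin/takeover.sh"  # hypothetical fail-over script

def node_alive(host: str) -> bool:
    """Crude liveness probe: one ping with a short timeout."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        capture_output=True).returncode == 0

missed = 0
while True:
    missed = 0 if node_alive(PRIMARY) else missed + 1
    if missed >= 3:  # require consecutive failures before acting
        # Acquire shared storage and start the business function locally.
        subprocess.run([TAKEOVER_SCRIPT], check=True)
        break
    time.sleep(10)
```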
Disaster Recovery deployment, configuration, and implementation
Disaster Recovery (DR) is the implementation of a project plan which
describes the tasks necessary to recover critical business functions
after a disaster. The recovery occurs between geographically separated
data centers using one or more methods of storage replication between
the data centers. The initial implementation of a DR plan is normally a
manual process that requires management declaration of a disaster;
however, subsequent DR processes may be automated.
In the context of data center automation, the generation of a
disaster recovery plan should be an automated, or mostly automated,
process. Unfortunately, this is typically not the case. In most
instances, the DR plan is written and manually maintained by system
administrators, application administrators, and other technical
personnel.
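As a sketch of what automated DR plan generation could look like, the
following renders a recovery document from live configuration data; the
inventory file path and its fields are hypothetical stand-ins for a
continuously maintained configuration management database.

```python
import json
import socket
from datetime import date

# Hypothetical inventory file; in practice this data would come from a
# configuration management database (CMDB) that is kept current automatically.
with open("/var/cmdb/business_functions.json") as f:
    functions = json.load(f)

with open("dr_plan.txt", "w") as plan:
    plan.write(f"Disaster Recovery Plan - generated {date.today()} "
               f"on {socket.gethostname()}\n\n")
    for bf in functions:
        plan.write(f"Business function : {bf['name']}\n")
        plan.write(f"Recovery site     : {bf['dr_site']}\n")
        plan.write(f"Storage replica   : {bf['replica_volume']}\n")
        plan.write(f"Recovery steps    : {bf['recovery_script']}\n\n")
```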
Business Continuity compliance, configuration, and implementation
Business continuity consists of the activities performed on a daily
basis to ensure the business operates normally today, tomorrow, and
beyond. The business continuity plan may be thought of as a
methodology, or as an enterprise-wide mentality of conducting day-to-day
business.
Network resource allocation and deallocation
For the purpose of data center automation, allocation of network
resources, such as IP addresses, must be programmable. This requires
that available network addresses be stored in dynamic locations or
databases that are accessible by other automated processes, such as
system deployment and configuration. It also requires that node names,
host names, and aliases be automatically generated on an as-needed
basis, and that name resolution services be automatically updated with
this information.
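A minimal sketch of programmable address allocation follows, assuming a
hypothetical SQLite pool database and naming convention; a real
implementation would also push the new name into dynamic DNS.

```python
import sqlite3

# Hypothetical pool database: one row per address, with a 'host' column
# that is NULL while the address is free.
db = sqlite3.connect("/var/ipam/pool.db")

def allocate(prefix: str) -> tuple[str, str]:
    """Atomically claim a free address and generate a host name for it."""
    with db:  # implicit transaction: commit on success, rollback on error
        row = db.execute(
            "SELECT ip FROM addresses WHERE host IS NULL LIMIT 1").fetchone()
        if row is None:
            raise RuntimeError("address pool exhausted")
        ip = row[0]
        # Hypothetical naming convention: prefix plus the last octet.
        host = f"{prefix}{ip.rsplit('.', 1)[1]}"
        db.execute("UPDATE addresses SET host = ? WHERE ip = ?", (host, ip))
    # A real implementation would now push 'host -> ip' into dynamic DNS
    # (e.g. by driving nsupdate) so name resolution stays current.
    return host, ip

print(allocate("node"))
```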
Storage resource allocation and deallocation
Automatically deploying new systems as part of data center
automation requires that storage for operating systems and applications
be available for allocation. Automating these storage allocations and
deallocations requires a Storage Area Network (SAN) with a programmable
interface such as scripts or APIs.
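The following sketch shows what a programmable storage interface might
look like, assuming an entirely hypothetical REST endpoint on the SAN
management station; real arrays expose vendor-specific CLIs or APIs for
the same operations.

```python
import json
import urllib.request

# Entirely hypothetical SAN management endpoint and payload.
SAN_API = "https://san-mgmt.example.com/api/v1/luns"

def allocate_lun(size_gb: int, host: str) -> str:
    """Create a LUN of the requested size and map it to the given host."""
    payload = json.dumps({"size_gb": size_gb, "map_to_host": host}).encode()
    req = urllib.request.Request(
        SAN_API, data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["lun_id"]

# Usage: carve 50 GB for a newly deployed node's operating system image.
print(allocate_lun(50, "node42"))
```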
Dynamic CPU and Memory allocation and deallocation
Most of today's modern systems provide capabilities to dynamically
allocate and deallocate hardware resources such as CPU and memory.
Automating these changes requires a programmable interface, such as
scripts or APIs, to the hardware management system. Furthermore, in
modern data center environments, change control must be employed before
modifications are implemented; therefore, change requests must be
automatically submitted and approvals transmitted back to the hardware
management system. Once approvals are received, the automated hardware
change can proceed.
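A sketch of this approval-then-execute flow appears below. The
change-management functions are hypothetical stubs; the chhwres command
reflects the IBM HMC command-line interface for dynamic LPAR resource
changes, though the exact flags should be verified against the HMC
release in use.

```python
import subprocess
import time

def submit_change(description: str) -> str:
    """Hypothetical change-management hook; a real one would open a ticket."""
    print(f"change request submitted: {description}")
    return "CHG0001"

def approved(ticket: str) -> bool:
    """Hypothetical poll of the change-management system; always approves here."""
    return True

ticket = submit_change("Add 0.5 processing units to lpar42 on frame7")
while not approved(ticket):          # block until change control signs off
    time.sleep(60)

# Drive the hardware management system over ssh; chhwres is the IBM HMC
# CLI for dynamic LPAR changes (verify exact flags on your HMC release).
subprocess.run(
    ["ssh", "hscroot@hmc1",
     "chhwres -r proc -m frame7 -o a -p lpar42 --procunits 0.5"],
    check=True)
```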
Process Scheduling
Assuming heterogeneous environments, data center automation requires
a cross-platform process scheduling system that can start processes on a
node, detect when the process is complete, and make decisions regarding
the next step (and process) in a sequence of data processing procedures.
These platforms may include mainframe, Unix, MS-Windows, and a wide
variety of others.
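As a minimal sketch of such a scheduler, the following walks a chain of
jobs across hypothetical nodes over ssh, using each process's exit
status to decide the next step; a real scheduler adds retries,
calendars, and alerting.

```python
import subprocess

# Hypothetical job chain: each step names a host, a command, and the
# step to run next on success.
jobs = {
    "extract":   {"host": "mainframe-gw", "cmd": "run_extract", "next": "transform"},
    "transform": {"host": "unix01",       "cmd": "etl.sh",      "next": "load"},
    "load":      {"host": "winbatch01",   "cmd": "load.bat",    "next": None},
}

step = "extract"
while step is not None:
    job = jobs[step]
    # Start the process on the remote node and wait for completion;
    # the exit status drives the decision about the next step.
    result = subprocess.run(["ssh", job["host"], job["cmd"]])
    if result.returncode != 0:
        print(f"step {step} failed with status {result.returncode}; halting chain")
        break
    step = job["next"]
```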
Performance Monitoring
Performance monitoring and notification is a fundamental piece of
data center automation. Many problem-avoidance procedures can be
automatically implemented based on performance information. This
information is also critical to determining service level agreement
compliance. Notifications of performance issues that cannot be
automatically resolved can be forwarded to the appropriate personnel for
further review.
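The following sketch shows this monitor-remediate-notify pattern, using
a hypothetical load threshold and mail host; the automated remediation
step is left as a placeholder.

```python
import os
import smtplib
from email.message import EmailMessage

LOAD_THRESHOLD = 8.0           # hypothetical SLA-derived threshold

load1, _, _ = os.getloadavg()  # 1-minute load average (Unix only)
if load1 > LOAD_THRESHOLD:
    # First, attempt a hypothetical automated remediation
    # (e.g. shed batch workload); assume it reports success or failure.
    remediated = False
    if not remediated:
        # Escalate to a human only when automation cannot resolve it.
        msg = EmailMessage()
        msg["Subject"] = f"load {load1:.1f} exceeds {LOAD_THRESHOLD} on this node"
        msg["From"] = "monitor@example.com"
        msg["To"] = "oncall@example.com"
        msg.set_content("Automated remediation failed; manual review required.")
        with smtplib.SMTP("mailhost.example.com") as s:
            s.send_message(msg)
```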
Error detection and problem resolution
Monitoring of system and application error logs provides a
foundation for automated problem resolution.
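A minimal sketch of this idea follows, with hypothetical log path,
error signatures, and resolution scripts.

```python
import re
import subprocess

# Hypothetical mapping from error signatures to automated fixes.
handlers = {
    re.compile(r"filesystem .* full"): ["/usr/local/bin/expand_fs.sh"],
    re.compile(r"paging space low"):   ["/usr/local/bin/add_paging.sh"],
}

# Follow the system log as it grows (the path varies by platform).
tail = subprocess.Popen(["tail", "-F", "/var/log/messages"],
                        stdout=subprocess.PIPE, text=True)
for line in tail.stdout:
    for pattern, action in handlers.items():
        if pattern.search(line):
            # Known error: run the associated resolution procedure.
            subprocess.run(action)
```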
Security Management
User authentication and authorization is one of the most time-consuming
activities in any large computing environment. Many tools
exist to automate these processes in heterogeneous environments;
however, it is still a very difficult task. Once a solution is
implemented by an organization, an enterprise-wide policy must be
adopted that requires all subsequent systems and applications to utilize
and integrate into the selected solution. Non-compliance must require
executive management approval.
Document Management
In a data center automation environment, technical personnel should
not be spending their time writing system documentation and procedures.
The primary reason is that such documentation will always be obsolete
or incomplete. Instead, system documentation and procedures should be
automatically generated on a periodic basis to ensure they are current
and complete.
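A sketch of such periodic generation follows, using a few common Unix
commands as examples; the command set would differ per platform.

```python
import subprocess
from datetime import date

# Commands whose output belongs in the system document; common Unix
# examples that would be tailored to each platform in practice.
sections = {
    "Hostname":    ["hostname"],
    "OS level":    ["uname", "-a"],
    "Filesystems": ["df", "-h"],
}

with open(f"sysdoc-{date.today()}.txt", "w") as doc:
    doc.write(f"System documentation generated {date.today()}\n")
    for title, cmd in sections.items():
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        doc.write(f"\n== {title} ==\n{out}")
```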
Change Management
Any large data center environment will require change requests to be
submitted and approved before any planned work is performed. To support
data center automation principles, the change request system must accept
input programmatically through scripts, APIs, email, or some other
mechanism that can be generated from the systems themselves, rather than
by a human.
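As a sketch, assuming a hypothetical REST-style change system and field
names, a machine-generated request might be opened like this:

```python
import json
import urllib.request

# Hypothetical change-management endpoint; real systems (and their
# field names) vary, but the point is that a machine opens the request.
CHANGE_API = "https://change.example.com/api/requests"

request = {
    "summary":   "Extend /data filesystem on node42 by 10 GB",
    "category":  "storage",
    "planned":   "2024-01-15T02:00:00Z",
    "submitter": "automation",   # generated by the system, not a human
}
req = urllib.request.Request(
    CHANGE_API, data=json.dumps(request).encode(),
    headers={"Content-Type": "application/json"}, method="POST")
with urllib.request.urlopen(req) as resp:
    print("ticket:", json.load(resp)["id"])
```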
Audit Management
In support of data center automation concepts, audit compliance must
be achievable through automated mechanisms, at least from the technical
support personnel's perspective. Toolsets should be implemented with the
ability to respond to audit requests via a "self-service" interface
which grants access to audit information only to authorized auditors.
This relieves the technical support staff from having to gather, format,
and provide this information to numerous different auditors, several
times a year.
Service Level Agreements
The Service Level Agreement (SLA) is an agreement between a business
function owner and the service provider which designates the amount of
time, on an annualized basis, the business function will be available.
Conversely, the SLA also designates the amount of time, on an annualized
basis, for which the business function will NOT be available. SLAs
are utilized in data center automation to schedule maintenance, updates,
upgrades, etc., and every system in the data center must have an
associated SLA in order for it to participate in the automated
scheduling of services. This includes test, development, and sandbox
systems.
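The annualized availability figure translates directly into the
maintenance window the SLA leaves open, as the short worked example
below shows (the SLA tiers are hypothetical).

```python
HOURS_PER_YEAR = 365 * 24                  # 8760; ignore leap years here

for availability in (0.999, 0.995, 0.99):  # hypothetical SLA tiers
    downtime = HOURS_PER_YEAR * (1 - availability)
    print(f"{availability:.1%} available -> "
          f"{downtime:.1f} hours/year for maintenance, updates, upgrades")
```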
Resources
vLPAR® (Virtual Logical Partition Automated Robot)
The vLPAR®
appliance fully automates the deployment and configuration of entire
multi-system, multi-data center computing environments including
business continuity, disaster recovery, high availability, and
virtualization.
Mt Xia has partnered with TriParadigm to deliver automated solutions
for deploying business continuity (BC), disaster recovery (DR), high
availability (HA), and virtualization (VIO) as a consumer
commodity. This packaged solution, called vLPAR®, is sold by
TriParadigm as an appliance that plugs directly into your datacenter and
can reduce your administration costs by 75%. vLPAR® can also increase
your profits by fully implementing IBM Power 5/6/7 architectures and
over-subscribing available hardware resources to multiple customers,
thus maximizing your return on investment (ROI).
The business continuity methodology developed by Mt Xia and
TriParadigm, and implemented by vLPAR®, is characterized by a "build
once, run anywhere" mentality. This phrase is meant to illustrate the
concept of "containerized" business functions, where all resources
related to supporting a business function are grouped together in such a
way as to be able to run on any compatible system, in any datacenter,
anywhere in the world.
Each component and piece of a containerized business function is
constructed using enterprise-wide unique identifiers so that naming and
numbering conflicts are eliminated, thus enabling the "build once,
run anywhere" mentality. In fact, this concept enables multiple
containers to run simultaneously on the same system, which is a common
practice when implementing a disaster recovery plan. In disaster
recovery operations, production systems are typically consolidated onto
a single system in the disaster recovery environment.
With vLPAR®, a datacenter operator has the ability to offer its
customers on-demand disaster recovery / high availability clusters, or
standalone systems, in real time, all generated by the customer from a
simple web-based interface.
For customers to create their own BC/DR/HA/VIO solutions, they only
need to complete one simple web form. Their request is automatically
processed and the system(s) are created in a matter of minutes; they are
notified upon completion and provided with a login name and password to
a fully functional cluster of systems (DR/HA).
vLPAR® is a true on-demand computing solution delivered as a
consumer commodity, providing BC/DR/HA/VIO and standardized SLAs.
vLPAR® Press Release
vLPAR® provides and fully implements all of the
following features:
- Consumer-oriented web-based interface to automatically create in real time:
- Disaster Recovery / High Availability multi-node clusters
- High Availability multi-node clusters
- Standalone systems
- Real-time creation/configuration of all systems
- Eliminates wasted resources sitting idle waiting for customers
- Real-time automated creation/configuration of high availability resources
- Real-time automated creation/configuration of disaster recovery resources
- Real-time automated creation/configuration of business continuity structures
- CPU Micropartitioning
- Dynamic resizing of CPU resources based on system load
- Dynamic resizing of Memory resources based on system load
- Redundant access to storage
- Redundant access to networking
- "Build once, run anywhere" business continuity design structure
- Disaster recovery failover to any location
- High availability failover to another hardware platform
- Virtualized I/O resources with redundant connections
- Standardized Service Level Agreements (SLA)
- Standardized system structure based on service level agreement
- State-of-the-Art policies, guidelines, standards, and procedures
- Automatically generated disaster recovery documentation
- Automatically generated high availability cluster documentation
- Automatically generated manual fail-over documentation
Additional packages integrated into vLPAR:
- Orwell Disaster Recovery System (Mt Xia, et al)
- AMMS - Automated Microcode Management System (Mt Xia)
- OpenCMDB - Web based Configuration Management Database (Open Source)
- Automated IP address allocation and management (Mt Xia)
- Automated Documentation generator (Mt Xia)
- Automated Web based documentation management (Mt Xia)
- kshAuth - Web based authorization and authentication system (Mt Xia)
- VIOS - Virtual I/O Server (IBM)
- HMC - Hardware Management Console (IBM)
- NIM - Network Installation Manager (IBM)
- HACMP - High Availability Cluster Multi-Processing (IBM)
- PLM - Partition Load Manager (IBM)
- WLM - Workload Manager (IBM)
- DDNS - Dynamic DNS - Dynamic Domain Name Service (Open Source)