Asynchronous I/O

This document addresses how asynchronous I/O is used. This information applies to AIX Version 4.x.

Prerequisites

AIO overview

Characteristics of asynchronous I/O

Functions of asynchronous I/O

Order and priority of asynchronous I/O calls

Subroutines affected by asynchronous I/O

64-bit enhancements


Prerequisites

To make use of asynchronous I/O the following fileset must be installed:

bos.rte.aio

To determine if this fileset is installed, use:

lslpp -l bos.rte.aio

You must also make the aio0 device "Available" using SMIT:

smit chgaio

Then set the "STATE to be configured at system restart" field to "available".



AIO overview

Synchronous I/O occurs while you wait. Application processing cannot continue until the I/O operation is complete.

In contrast, asynchronous I/O operations run in the background and do not block user applications. This improves performance because I/O operations and application processing can run simultaneously.

Using asynchronous I/O will usually improve your I/O throughput, especially when you are storing data in raw logical volumes (as opposed to Journaled file systems). The actual performance, however, depends on how many server processes are running that will handle the I/O requests.

Many applications, such as databases and file servers, take advantage of the ability to overlap processing and I/O. These asynchronous I/O operations use various kinds of devices and files. Also, multiple asynchronous I/O operations may run at the same time on one or more devices or files.

Each asynchronous I/O request has a corresponding control block in the application's address space. When an asynchronous I/O request is made, a handle is established in the control block. This handle is used to retrieve the status and the return values of the request.

Applications use the aio_read and aio_write subroutines to perform the I/O. Control returns to the application from the subroutine as soon as the request has been queued. The application can then continue processing while the disk operation is being performed.

A kernel process (KPROC) called a server is in charge of each request from the time it is taken off the queue until it completes. The number of servers limits the number of disk I/O operations that can be in progress in the system simultaneously.

The default values are minservers=1 and maxservers=10. In systems that seldom run applications that use asynchronous I/O, this is usually adequate. For environments with many disk drives and key applications that use asynchronous I/O, the default is far too low. The result of a deficiency of servers is that disk I/O seems much slower than it should be. Not only do requests spend inordinate lengths of time in the queue, but the low ratio of servers to disk drives means that the seek-optimization algorithms have too few requests to work with for each drive.

How do I know if I need to use AIO?

Using the vmstat command with an interval and count value, you can determine if the CPU is idle waiting for disk I/O. The wa column details the percentage of time the CPU was idle with pending local disk I/O.

If there is at least one outstanding I/O to a local disk when the wait process is running, the time is classified as waiting for I/O. Unless asynchronous I/O is being used by the process, an I/O request to disk causes the calling process to block (or sleep) until the request has been completed. Once a process's I/O request completes, it is placed on the run queue.

A wa value consistently over 25 percent may indicate that the disk subsystem is not balanced properly, or it may be the result of a disk-intensive workload.

NOTE: AIO will not relieve an overly busy disk drive. Using the iostat command with an interval and count value, you can determine if any disks are overly busy. Monitor the %tm_act column for each disk drive on the system. On some systems, a %tm_act of 35.0 or higher for one disk can cause noticeably slower performance. The relief for this case could be to move data from more busy to less busy disks, but simply having AIO will not relieve an overly busy disk problem.

Important for SMP

For SMP systems, the us, sy, id and wa columns are only averages over all processors. But keep in mind that the I/O wait statistic per processor is not really a processor-specific statistic; it is a global statistic. An I/O wait is distinguished from idle time only by the state of a pending I/O. If there is any pending disk I/O, and the processor is not busy, then it is an I/O wait time. Disk I/O is not tracked by processors, so when there is any I/O wait, all processors get charged (assuming they are all equally idle).

How many AIO servers am I currently using?

The following command will tell you how many AIO servers are currently running (you must run this command as the "root" user):

pstat -a | grep aios | wc -l

If the disk drives that are being accessed asynchronously are using the AIX Journaled File System (JFS), all I/O will be routed through the aios KPROCs.

If the disk drives that are being accessed asynchronously are using a form of RAW logical volume management, then the disk I/O is not routed through the aios KPROCs. In that case the number of servers running is not relevant.

However, if you want to confirm that an application that uses RAW logical volumes is taking advantage of AIO, and you are at AIX 4.3.2 or higher or AIX 4.3.x with APAR IX79690 installed, you can disable the "Fastpath" option via SMIT. When this option has been disabled, even RAW I/O will be forced through the aios KPROCs. At that point, the pstat command listed in the preceding discussion will work. You would not want to run the system with this option disabled for any length of time; this is simply a way to confirm that the application is working with AIO and RAW logical volumes.

At AIX levels before AIX 4.3, the "Fastpath" is enabled by default and cannot be disabled.

How many AIO servers do I need?

Here are some suggested rules of thumb for determining the value to which the MAXIMUM number of servers should be set:

1. The first rule of thumb suggests that you limit the MAXIMUM number of servers to a number equal to ten times the number of disks that are to be used concurrently. The MINIMUM number of servers should be set to half of this maximum number.

2. Another rule of thumb is to set the MAXIMUM number of servers to 80 and leave the MINIMUM number of servers set to the default of 1 and reboot. Monitor the number of additional servers started throughout the course of normal workload. After a 24-hour period of normal activity, set the MAXIMUM number of servers to:

(The number of currently running aios + 10),

and set the MINIMUM number of servers to:

(The number of currently running aios - 10).

In some environments you may see more than 80 aios KPROCs running. If so, consider this rule of thumb:

3. A third suggestion is to take statistics using vmstat -s before any high I/O activity begins, and again at the end. Check the field iodone. From this you can determine how many physical I/Os are being handled in a given wall clock period. Then increase the MAXIMUM number of servers and see if you can get more iodones in the same time period.


Characteristics of asynchronous I/O

You can change attributes relating to asynchronous I/O using the chdev command or SMIT. Likewise, you can use SMIT to configure and remove (unconfigure) asynchronous I/O. (Alternatively, you can use the mkdev and rmdev commands to configure and remove asynchronous I/O.) To start SMIT at the main menu for asynchronous I/O, enter smit aio.

MINIMUM number of servers

indicates the minimum number of kernel processes dedicated to asynchronous I/O processing. Since each kernel process uses memory, this number should not be large when the amount of asynchronous I/O expected is small.

MAXIMUM number of servers

indicates the maximum number of kernel processes dedicated to asynchronous I/O processing. There can never be more than this many asynchronous I/O requests in progress at one time, so this number limits the possible I/O concurrency.

Maximum number of REQUESTS

indicates the maximum number of asynchronous I/O requests that can be outstanding at one time. This includes requests that are in progress as well as those that are waiting to be started. The maximum number of asynchronous I/O requests cannot be less than the value of AIO_MAX, as defined in the /usr/include/sys/limits.h file, but it can be greater. It would be appropriate for a system with a high volume of asynchronous I/O to have a maximum number of asynchronous I/O requests larger than AIO_MAX.

Server PRIORITY

indicates the priority level of kernel processes dedicated to asynchronous I/O. The lower the priority number is, the more favored the process is in scheduling. Concurrency is enhanced by making this number slightly less than the value of PUSER, the priority of a normal user process. It cannot be made lower than the value of PRI_SCHED.

Since the default priority is (40+nice), these daemons will be slightly favored with this value of (39+nice). If you want to favor them more, make changes slowly. A very low priority can interfere with the system processes that require low priority.

WARNING: Raising the server PRIORITY (decreasing this numeric value) is not recommended since system hangs or crashes could occur if the priority of the AIO servers is favored too much. There is little to be gained by making big priority changes.

PUSER and PRI_SCHED are defined in the /usr/include/sys/pri.h file.

STATE to be configured at system restart

indicates the state to which asynchronous I/O is to be configured during system initialization. The possible values are defined, which indicates that asynchronous I/O will be left in the defined state and not available for use, and available, which indicates that asynchronous I/O will be configured and available for use.

STATE of FastPath

You will only see this option if you are at AIX 4.3.2 or greater or any level of AIX 4.3.x with APAR IX79690 installed. Disabling this option forces ALL I/O activity through the aios KPROCs, even I/O activity involving RAW logical volumes. At AIX levels before AIX 4.3 the "Fastpath" is enabled by default and cannot be disabled.


Functions of asynchronous I/O

Large file-enabled asynchronous I/O (AIX Version 4.2.1 or later)

The fundamental data structure associated with all asynchronous I/O operations is struct aiocb. Within this structure is the aio_offset field, which is used to specify the offset for an I/O operation.

The default asynchronous I/O interfaces are limited to an offset of 2G minus 1 due to the signed 32-bit definition of aio_offset. To overcome this limitation, a new aio control block with a signed 64-bit offset field and a new set of asynchronous I/O interfaces have been defined beginning with AIX Version 4.2.1.

The large offset-enabled asynchronous I/O interfaces are available under the _LARGE_FILES compilation environment and under the _LARGE_FILE_API programming environment. For further information, see "Writing Programs Which Access Large Files" in AIX Version 4.3 "General Programming Concepts: Writing and Debugging Programs".

In the _LARGE_FILE_API environment, the 64-bit API interfaces are visible. This environment requires recoding of applications to the new 64-bit API name. For further information on using the _LARGE_FILE_API environment, see "Using the 64-Bit File System Subroutines" in AIX Version 4.3 "General Programming Concepts: Writing and Debugging Programs".

Nonblocking I/O

After issuing an I/O request, the application can proceed without being blocked while the I/O operation is in progress. The I/O operation occurs while the application is running. Specifically, when the application issues an I/O request, the request is queued. The application can then resume running before the I/O operation is initiated.

To manage asynchronous I/O, each asynchronous I/O request has a corresponding control block in the application's address space. This control block contains the control and status information for the request. It can be used again when the I/O operation is complete.

Notification of I/O completion

After issuing an asynchronous I/O request, the user application can determine when and how the I/O operation is completed. This information is provided in three ways:

The application can poll the status of the I/O operation.

The system can asynchronously notify the application when the I/O operation is done.

The application can block until the I/O operation is complete.

1. Polling the Status of the I/O Operation

The application can periodically poll the status of the I/O operation. The status of each I/O operation is provided in the application's address space in the control block associated with each request. Portable applications can retrieve the status by using the aio_error subroutine.

2. Asynchronously Notifying the Application When I/O Operation Completes

Asynchronous notification of I/O completion is done by signals. Specifically, an application may request that a SIGIO signal be delivered when the I/O operation is complete. To do this, the application sets a flag in the control block at the time it issues the I/O request. If several requests have been issued, the application can poll the status of the requests to determine which have actually completed.

3. Blocking the Application until the I/O Operation Is Complete

The third way to determine whether an I/O operation is complete is to let the calling process become blocked and wait until at least one of the I/O requests it is waiting for is complete. This is similar to synchronous-style I/O. It is useful for applications that, after performing some processing, need to wait for I/O completion before proceeding.

Cancellation of I/O requests

I/O requests can be canceled if they are cancellable. Cancellation is not guaranteed and may succeed or not depending upon the state of the individual request. If a request is in the queue and the I/O operations have not yet started, the request is cancellable. Typically, a request is no longer cancellable when the actual I/O operation has begun.


Order and priority of asynchronous I/O calls

An application may issue several asynchronous I/O requests on the same file or device. However, since the I/O operations are performed asynchronously, the order in which they are handled may not be the order in which the I/O calls were made. The application must enforce ordering of its own I/O requests if ordering is required.

Priority among the I/O requests is not currently implemented. The aio_reqprio field in the control block is currently ignored.

For files that support seek operations, seeking is allowed as part of the asynchronous read or write operations. The whence and offset fields are provided in the control block of the request to set the seek parameters. The seek pointer is updated when the asynchronous read or write call returns.


Subroutines affected by asynchronous I/O

The following existing subroutines are affected by asynchronous I/O:

The close subroutine

The exit subroutine

The exec subroutine

The fork subroutine

If the application closes a file, or calls the _exit or exec subroutines while it has some outstanding I/O requests, the requests are canceled. If they cannot be canceled, the application is blocked until the requests have completed. When a process calls the fork subroutine, its asynchronous I/O is not inherited by the child process.

One fundamental limitation in asynchronous I/O is page hiding. When an unbuffered (raw) asynchronous I/O is issued, the page that contains the user buffer is hidden during the actual I/O operation. This ensures cache consistency. However, if the application accesses memory locations that fall within the same page as the user buffer, it may block as the result of a page fault. To alleviate this, allocate page-aligned buffers and do not touch the buffers until the I/O requests using them have completed.


64-bit enhancements

Asynchronous I/O (AIO) has been enhanced to support 64-bit enabled applications. On 64-bit platforms, both 32-bit and 64-bit AIO can occur simultaneously.

The struct aiocb, the fundamental data structure associated with all asynchronous I/O operations, has changed. The aio_return element of this struct is now defined as ssize_t; previously, it was defined as an int. AIO supports large files by default. An application compiled in 64-bit mode can do AIO to a large file without any additional #defines or special opening of those files.