User and Group Names - UID/GID Numbers

On an AIX or Unix system, files are not stored by filename; they are stored by "inode number". Each file has an inode and is identified, within the file system where it resides, by an inode number (sometimes called an i-number). Inodes hold important information about files, such as user and group ownership, access permissions, and file type. Each file on a Unix system is associated with exactly one user (owner) and one group.

Similar to the filename/inode relationship, the file owner and group membership are designated and stored by the UID/GID numbers. Each user on a Unix system is assigned a UID number and each group a GID number. The inode records these numbers, not the names, to assign ownership and group membership to files.
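To see the numeric ownership that the inode records, a normal long listing can be compared with a numeric one. The following is a minimal sketch only: the scratch file is hypothetical, "mktemp" is assumed to be available, and the commands use portable shell constructs that also run under ksh93.

```shell
# Create a scratch file so the example does not depend on any existing path.
tmpfile=$(mktemp)

# "ls -l" translates the stored numbers into names using the user and
# group databases; "ls -n" prints the raw UID and GID numbers instead.
ls -l "$tmpfile"
ls -n "$tmpfile"

# Field 3 of the numeric listing is the owner's UID number.
file_uid=$(ls -n "$tmpfile" | awk '{ print $3 }')
printf 'owner UID stored in inode: %s\n' "$file_uid"
printf 'UID of current process:    %s\n' "$(id -u)"

rm -f "$tmpfile"
```

The two numbers printed at the end should agree, because a newly created file is owned by the UID of the process that created it.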

In Business Continuity Planning, it is important to recognize that UID/GID numbers should be uniquely assigned to users and groups on an enterprise wide basis.

If these UID/GID numbers are not enterprise wide unique, there will be security issues as well as conflicts during high availability or disaster recovery failovers. Security issues will also arise when a backup is restored onto a machine where the UID/GID numbers do not match those stored in the backup.

The following are recommended policies, standards, guidelines, and procedures, as related to "User and Group Names - UID/GID Numbers":

Policies: User and Group Names - UID/GID Numbers

Each user provided access to any system shall have an enterprise wide unique username assigned to them. This username will be implemented on each system to which this user requires access.

Each user provided access to any system shall have an enterprise wide unique UID number assigned to them. This UID number will be implemented on all systems requiring a username/UID association.

Each group name implemented on a Unix system shall have an enterprise wide unique GID number assigned to it. This GID number will be implemented on every system where this group name is implemented.

Guidelines: User and Group Names - UID/GID Numbers

It is important to choose a format for user names that provides enterprise wide flexibility for use with as many platforms as possible. A user name format that appears to be supported on the widest range of platforms (AIX, Linux, AS/400, MVS, VSE, MS Windows, Unix) is a seven character username, the first three characters being lowercase letters, the last four characters being digits from 0 - 9.
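The username format described above can be checked mechanically before a name is issued. The function below is a sketch under that assumption; the name "is_std_username" is hypothetical and not part of any standard tool.

```shell
# is_std_username NAME
# Succeeds (exit status 0) only when NAME is exactly three lowercase
# letters followed by four digits, for example "abc1234".
# The body runs in a subshell so the locale change stays private.
is_std_username() (
  LC_ALL=C   # force ASCII ranges so [a-z] cannot match uppercase letters
  case "$1" in
    [a-z][a-z][a-z][0-9][0-9][0-9][0-9]) exit 0 ;;
    *) exit 1 ;;
  esac
)

for name in abc1234 ab12345 ABC1234 abcd123; do
  if is_std_username "$name"; then
    printf '%s: conforms\n' "$name"
  else
    printf '%s: does not conform\n' "$name"
  fi
done
```

Only the first of the four sample names conforms to the seven character format.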

In order to avoid the maintenance and support issues of keeping a database of UID/GID values and their associated user and group names, a reproducible algorithm should be used to calculate the UID/GID values. This algorithm should be executable on any Unix system and provide the same UID/GID number for a given user or group name. Since the UID/GID numbers are reproducible on any Unix platform, the only values that must be maintained are the enterprise wide unique user and group names.

Standards: User Names and UID Numbers

In order to facilitate normal maintenance, disaster recovery and business continuity, it is recommended that each enterprise wide unique user and group name also be assigned an enterprise wide unique UID or GID number.

This will cause files whose owners/groups are not defined in the /etc/passwd or /etc/group files to appear in a long listing with the enterprise wide unique UID and GID numbers for the owner and group. This condition may indicate which users need to be added to the system, or which files need to be removed from the system. Regardless of the fix, file security will remain intact and uncompromised.
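Files in this condition can be located with the standard "find" tests "-nouser" and "-nogroup". The wrapper below is only a sketch; the function name "list_unowned" and the directory in the usage comment are hypothetical.

```shell
# list_unowned DIR
# Print a long listing of every file under DIR whose UID or GID number
# has no entry in the local user or group database.
list_unowned() {
  find "$1" \( -nouser -o -nogroup \) -exec ls -ld {} +
}

# Example (can take a long time on large file systems):
# list_unowned /home
```

Each file reported shows a number rather than a name in the owner or group column, which identifies the user or group that is missing from the system.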

Instead of keeping a database of username/UID and group name/GID associations, it is preferred to use an algorithm to generate the UID/GID values based on the username/group name. This algorithm must be reproducible across all systems that require a UID/GID number.

A typical algorithm for performing a UID/GID calculation is to use the "sum" command with the "-r" option to generate the Berkeley (BSD) checksum value. The "-r" option is supported by AIX and appears to be consistently implemented across a wide variety of other Unix platforms as well. An example of using this technique (all commands shown in this series of documents are expressed in Korn shell syntax):

$ print "abc1234" | sum -r
29247  1

A limitation of this technique is that the checksum is a 16-bit value, so it can never exceed 65,535; for the specified username format of three lowercase letters followed by four digits it will only calculate numbers between about 600 and 65,000. The number of users and groups, enterprise wide, is therefore limited to fewer than 65,536, and because different names can produce the same checksum, the possibility of duplicates does exist with this algorithm. These limitations may be acceptable for some organizations; however, most will want to expand this concept of UID/GID calculation.
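To make the 16-bit limit concrete, the BSD checksum can be reproduced with ordinary shell arithmetic. The function below is an illustrative sketch, not a replacement for "sum -r"; the name "bsd_sum" is hypothetical, and it appends a newline to the name because that is what the "print" command sends to "sum".

```shell
# bsd_sum STRING
# Print the 16-bit BSD checksum of STRING followed by a newline.
# The body runs in a subshell so its variables stay private.
bsd_sum() (
  sum=0
  # od prints each input byte as an unsigned decimal number.
  for byte in $(printf '%s\n' "$1" | od -A n -v -t u1); do
    # Rotate the 16-bit running value right by one bit ...
    sum=$(( (sum >> 1) + ((sum & 1) << 15) ))
    # ... then add the byte and keep only the low 16 bits.
    sum=$(( (sum + byte) & 65535 ))
  done
  printf '%s\n' "$sum"
)

bsd_sum abc1234    # prints 29247, matching the "sum -r" output above
```

Because every step is masked to 16 bits, no input can produce a value above 65,535, which is the source of both the size limit and the possibility of duplicates.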

The recommended UID/GID calculation algorithm uses a base 26 numbering system to represent the alphabetic characters of the username (the first three or four characters), and then converts the calculated base 26 number to a base 10 (decimal) value. An example script implementing this technique is shown below.

-- Cut Here --  
#!/usr/bin/ksh93
################################################################
function usagemsg_mkuid {
  print "
Program: mkuid

This function generates a UID number for a username,
that username must consist of 3 letters followed by 4
digits,  or 4 letters followed by 3 digits.  The
username is assumed  to be the first argument to the
function.  The username must  also be exactly 7
characters long.

Usage: ${1##*/} username
  Where:
    username = XXX#### - 3 letters followed by 4 digits
        Generates UID between 1,000,000 and 176,759,999
             or
    username = XXXX### - 4 letters followed by 3 digits
        Generates UID between 176,760,000 and 633,735,999

Author: Dana French (dfrench@mtxia.com) Copyright 2004
\"AutoContent\" enabled
"
}
################################################################
#### 
#### Description:
#### 
#### This function generates a UID number for a username, that
#### username must consist of 3 letters followed by 4 digits, 
#### or 4 letters followed by 3 digits.  The username is assumed 
#### to be the first argument to the function.
#### 
#### Assumptions:
#### 
#### This function assumes the username is constructed with
#### either 3 or 4 contiguous alphabetic characters followed
#### by 4 or 3 contiguous digits.  The total number of
#### characters in the user name must be exactly 7.
#### 
#### Dependencies:
#### 
#### The "mkuid" function is dependent upon the external 
#### function "mkascii" to produce an associative array
#### of ascii decimal values for the lower case letters.
#### 
#### Products:
#### 
#### Upon successful completion this function prints a single
#### decimal value to standard output.
#### 
#### For usernames consisting of 3 lower case letters
#### followed by 4 digits, the resulting UID values are
#### between 1,000,000 and 176,759,999.
#### 
#### For usernames consisting of 4 lower case letters
#### followed by 3 digits, the resulting UID values are
#### between 176,760,000 and 633,735,999.
#### 
#### Configured Usage:
#### 
#### The "mkuid" function can be called from the command
#### line, script, or another function.
#### 
#### Details:
#### 
################################################################
function mkuid
{
  typeset -l LET
  typeset NUMLET=0
  
  [[ "${1}" == "-?" ]] && usagemsg_mkuid "${0}" && return 1
  
#### 
#### If the total number of characters in the username
#### command line argument is not equal to 7, display an
#### error message, followed by the usage message, and return
#### from the function with an unsuccessful return code.
#### 

  if (( ${#1} != 7 ))
  then
      print -u 2 "ERROR: Number of characters in username ${1} is not equal to 7\n"
      usagemsg_mkuid "${0}"
      return -1
  fi
  
#### 
#### Extract only the alphabetic characters from the
#### username command line argument and count the number
#### of characters extracted.
#### 

  NUMLET="${1//([!a-zA-Z])/}"
  NUMLET="${#NUMLET}"

#### 
#### If the number of alphabetic characters in the username
#### command line argument is not equal to 3 or 4, display an
#### error message, followed by the usage message, and return
#### from the function with an unsuccessful return code.
#### 

  if (( NUMLET != 3 )) && (( NUMLET != 4 ))
  then
      print -u 2 "ERROR: Number of letters in username ${1} is not equal to 3 or 4\n"
      usagemsg_mkuid "${0}"
      return -1
  fi
  
#### Determine the base 10 order of magnitude for the
#### numeric portion of the user name and store this value.

  (( ORDMAG = 10 ** ( 7 - NUMLET ) ))

#### Initialize the decimal value of the lower limit for the
#### UID number to one million.  The lowest value allowed for
#### a calculated UID number will be this value.

  typeset LOWLIM=1000000

#### Initialize the decimal value of the UID number using a
#### base 26 calculation.  The lower limit is added to this
#### value to ensure the calculated UID number is never less
#### than the lower limit.

  (( LOWUID = ( 26 ** ( NUMLET - 1 ) * LOWLIM / 100 ) + LOWLIM ))
  (( NUMLET == 3 )) && (( LOWUID = LOWLIM ))
  
#### Initialize several variables containing values used
#### while iterating through each character of the username.

  typeset BASE=26
  typeset NVALUE=""
  typeset LVALUE=0
  typeset BASEORDMAG=0
  typeset SUBTOT=0
  typeset TOT=0
  typeset MULT=0
  
####
#### Define an associative array to contain an ascii table of
#### values, the array index is the alphabetic character, the
#### value is the decimal number associated with the
#### character.  Call the external function "mkascii" to
#### create this array.
####

  typeset -A LETTERS
  mkascii LETTERS
  
#   print " ${LETTERS[@]}"
  
#### 
#### Divide the username into individual characters and store
#### each character in an array.  Iterate thru this array one
#### element at a time to calculate the base 26 value of each
#### alphabetic character and the decimal value of each
#### number. 
#### 

  eval EACHLET="( ${1//(?)/ \1} )"
  for LET in "${EACHLET[@]}"
  do
  
####
#### If the current iteration character is a lower case
#### letter, subtract the decimal value of 97 from the ascii
#### value for this character ( 97 is the ascii value of the
#### lowercase letter "a" ).  This will cause the value of
#### "a" to be zero, "b" = 1, "c" = 2, etc.  Assign a
#### variable to contain this decimal value for the letter.
####

    if [[ "_${LET}" = _[a-z] ]]
    then
      LVALUE="$(( ${LETTERS[${LET}]} - 97 ))"

####
#### If the current iteration character is not a lower case
#### letter, then it is a number from 0 - 9.  Set the letter
#### value to zero (indicating the iteration character is not
#### a letter) and append the number to the end of the
#### numeric value.
####

    else
      LVALUE="0"
      NVALUE="${NVALUE}${LET}"
    fi
  
####
#### Calculate a base 26 multiplier for the current iteration
#### of the loop.  Each loop iteration should increase this
#### multiplier by a base 26 order of magnitude.
####

    (( MULT = BASE ** BASEORDMAG ))

#### 
#### Multiply the decimal letter value by the base 26
#### multiplier and add the product to a running total for
#### each iteration of the loop.
#### 

    (( SUBTOT = SUBTOT + ( LVALUE * MULT ) ))

#### 
#### Add a value of 1 to the loop counter, which is used to
#### determine the base 26 order of magnitude.
#### 

    (( BASEORDMAG = BASEORDMAG + 1 ))

  done
  
#### 
#### Multiply the value calculated for the alphabetic
#### characters by the base 10 order of magnitude for the
#### numeric portion of the user name.  Add the numeric
#### portion of the user name and the lower limit value to
#### arrive at a final value for the calculated UID number.
#### Print this value to standard output and return from this
#### function with a successful return code.
#### 

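#### The "10#" prefix forces the digits to be read as a decimal
#### number even when they begin with a zero.
  (( TOT = ( ( SUBTOT * ORDMAG ) + 10#${NVALUE} ) + LOWUID ))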
  print ${TOT}
  return 0
}
################################################################
function usagemsg_mkascii {
  print "
Program: mkascii

This function accepts an associative array variable name
as the first command line argument, and builds an ASCII
table of characters in that array.  The index of the
associative array is the ASCII character.  The value
contained in each array element is the decimal, hex, or
octal number associated with the ASCII character array
index.

Usage: ${1##*/} ArrayName
  Where:
    ArrayName = Name of a predefined associative array

Author: Dana French (dfrench@mtxia.com) Copyright 2004
\"AutoContent\" enabled
"
}
################################################################
#### 
#### Description:
#### 
#### The "mkascii" function returns an ASCII table of
#### characters in an associative array. The index of the
#### associative array is the ASCII character.  The value
#### contained in each array element is the decimal, hex, or
#### octal number associated with the ASCII character array
#### index.
#### 
#### The "mkascii" function will accept one or two command
#### line arguments.  The first is the name of a predefined
#### associative array, the optional second command line
#### argument can be specified to designate the type of value
#### (decimal, octal, hexadecimal) to return in the
#### associative array.  Decimal values are returned by
#### default.  If the value of the second command line
#### argument is "8", octal values are returned; if the value
#### of the second command line argument is "16", hexadecimal
#### values are returned.
#### 
#### Assumptions:
#### 
#### It is assumed the first command line argument is the
#### name of a predefined associative array.  If the second
#### command line argument is not specified, it is assumed
#### the user is requesting decimal values be returned in the
#### array.
#### 
#### Dependencies:
#### 
#### The only recognized values for the second command line
#### argument are "8" (octal) and "16" (hexadecimal).
#### Any other value leaves the default in effect, and
#### decimal values are returned.
#### 
#### Products:
#### 
#### The product of the "mkascii" function is the completion
#### of a predefined associative array indexed by the ASCII
#### character whose value is the decimal, octal, or
#### hexadecimal number that represents the ASCII character.
#### 
#### Configured Usage:
#### 
#### The "mkascii" function can be called from the command
#### line, script, or another function.
#### 
#### Details:
#### 
################################################################
function mkascii
{

#### 
#### If the first command line argument is a literal "-?",
#### display the usage message and exit the function with an
#### unsuccessful return code.
#### 

  [[ "${1}" == "-?" ]] && usagemsg_mkascii "${0}" && return 1

#### 
#### Establish a value variable to contain a decimal value associated
#### with each ASCII character.  The variable containing the
#### decimal value is unevaluated because the evaluation will
#### take place later.
#### 

  typeset VAL='${CNT}'

#### 
#### If the second command line argument is "8", change the
#### value variable to contain a statement that will be
#### evaluated later to octal values.
#### 

  [[ "_${2}" != "_" && "_${2}" =  "_8" ]] && VAL='$( printf "%o" 0x${i}${j} )'

#### 
#### If the second command line argument is "16", change the
#### value variable to contain a statement that will be
#### evaluated later to hexadecimal values.
#### 

  [[ "_${2}" != "_" && "_${2}" = "_16" ]] && VAL='${i}${j}'

#### 
#### Using the first command line argument as the name of a predefined
#### associative array, create a name reference to that array.
#### 

  nameref ASCII_TABLE="${1}"

#### 
#### Initialize a loop counter incremented by one each time
#### through the loop.  This counter is used to represent the
#### decimal value of the ascii character.
#### 

  typeset CNT=0

#### 
#### Loop through the double digit hexadecimal values for all
#### ASCII characters.
#### 

  for i in 0 1 2 3 4 5 6 7 8 9 A B C D E F
  do
    for j in 0 1 2 3 4 5 6 7 8 9 A B C D E F
    do

#### 
#### Evaluate the ASCII character from the double digit
#### hexadecimal value for the current iteration of the
#### loops.  Also evaluate the decimal, octal or hexadecimal
#### value associated with the ASCII character.  Assign
#### the resulting value to the associative array using the
#### "nameref" variable.
#### 

      eval ASCII_TABLE[\\$(print -- $( printf "\\\0%o" 0x${i}${j} ) )]=\"${VAL}\" 2>/dev/null

#### 
#### Increment the loop counter by one (used to represent
#### decimal values of each ASCII character).
#### 

      (( CNT = CNT + 1 ))

#### 
#### Return to the beginning of the loop to evaluate the next
#### character in the ASCII sequence.
#### 

    done
  done
}
################################################################
#### 
#### Call the "mkuid" function with all command line arguments

mkuid "${@}"

-- Cut Here --

In the "usage message" areas of the above script, the phrase "AutoContent enabled" appears. This refers to a technique of commenting scripts in such a way that documentation can be automatically generated from the script. This has the added benefit that whenever updates to the script are made, the documentation is automatically updated also. Notice that the documentation comments in the script begin with 4 hash marks (#) followed by a space; this pattern is used to designate text used as documentation. This automated documentation technique, referred to here as "AutoContent", will be discussed in a later document.
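As an illustration of why the four hash mark prefix is convenient, documentation lines marked this way can be pulled out of a script with a single "awk" command. This is only a guess at the general idea and is not the "AutoContent" tool itself; the function name "extract_doc" is hypothetical.

```shell
# extract_doc FILE
# Print the text of every line that begins with "#### " (or is exactly
# "####"), with that prefix removed. Separator lines made of a long run
# of hash marks are skipped because a fifth "#" follows the first four.
extract_doc() {
  awk '/^####( |$)/ { sub(/^#### ?/, ""); print }' "$1"
}

# Example, assuming the script is installed in the recommended location:
# extract_doc /usr/local/sh/mkuid
```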

The "mkuid" script shown above is composed of shell functions so that, if desired, the script can be separated into individual components and called from a shell function library. Otherwise, it can be executed as-is from the command line.
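The arithmetic performed by "mkuid" can be condensed into a short worked example. The sketch below is an assumption-laden restatement, not the script itself: the function name "calc_uid" is hypothetical, it uses portable shell constructs instead of ksh93 features, and unlike "mkuid" it rejects uppercase letters rather than converting them. As in the script's loop, the first letter is treated as the least significant base 26 digit.

```shell
# calc_uid USERNAME
# Print the UID for a name of 3 letters + 4 digits or 4 letters + 3 digits.
# Fails with status 1 for any other format.
calc_uid() (
  LC_ALL=C
  name=$1
  case "$name" in
    [a-z][a-z][a-z][0-9][0-9][0-9][0-9])
      numlet=3 ordmag=10000 lowuid=1000000 ;;
    [a-z][a-z][a-z][a-z][0-9][0-9][0-9])
      # 26^3 * 10000 + 1000000: starts just above the 3-letter range
      numlet=4 ordmag=1000 lowuid=176760000 ;;
    *)
      echo "unsupported username format: $name" >&2; exit 1 ;;
  esac

  subtot=0 mult=1 i=0
  rest=$name
  while [ "$i" -lt "$numlet" ]; do
    letter=${rest%"${rest#?}"}   # first character of $rest
    rest=${rest#?}               # remove it from $rest
    # A leading quote makes printf return the character code; 97 is "a".
    value=$(( $(printf '%d' "'$letter") - 97 ))
    subtot=$(( subtot + value * mult ))
    mult=$(( mult * 26 ))
    i=$(( i + 1 ))
  done

  # $rest now holds only the digits. Strip leading zeros so the shell
  # does not interpret the number as octal.
  while [ "${#rest}" -gt 1 ] && [ "${rest#0}" != "$rest" ]; do
    rest=${rest#0}
  done

  printf '%s\n' $(( subtot * ordmag + rest + lowuid ))
)

calc_uid abc1234   # (0*1 + 1*26 + 2*676) * 10000 + 1234 + 1000000 = 14781234
```

For "abc1234" the letters contribute a = 0, b = 1 x 26, and c = 2 x 676, for a subtotal of 1378. Multiplying by 10,000 and adding the digits 1234 and the lower limit of 1,000,000 gives 14,781,234.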

Standards: Group Names and GID Numbers

Since the group names will likely not conform to the same standard selected for the usernames, a different mechanism is required to provide the group name/GID number associations. Software vendors, suppliers, and internal operations will specify group names that must be implemented as-is. For instance the Informix database will require the group name "informix" be implemented on each system running the database.

The recommended UID calculation method, which implements a base 26 conversion of the username to a decimal value, cannot be used in this instance, because it will generate decimal values greater than those supported by AIX and/or HACMP. Therefore an alternate method is needed for GID calculations. The previously mentioned Berkeley "sum -r" method will likely suffice for this purpose. Its limit of 65,536 possible values does not cause a problem for group names, because it is extremely unlikely that this limit would ever be reached.

So the recommended algorithm to use for defining the group name/GID number associations is the Berkeley "sum -r" method:

$ print "informix" | sum -r
21883  1
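Because a 16-bit checksum can assign the same number to two different names, it may be worth checking the full list of group names for duplicates before any GID is put into service. The following is a sketch only: the function name "find_dup_ids" and the group names "staff" and "dba" are hypothetical examples.

```shell
# find_dup_ids
# Read "name id" pairs on standard input and report every id that is
# shared by more than one name, in the form "id: name1 name2 ...".
find_dup_ids() {
  awk '
    {
      if ($2 in names) names[$2] = names[$2] " " $1
      else             names[$2] = $1
      count[$2]++
    }
    END {
      for (id in count)
        if (count[id] > 1) print id ": " names[id]
    }
  '
}

# Build the "name id" list with "sum -r" and check it. Adding 0 in awk
# strips any leading zeros that a sum implementation may print.
for group in informix staff dba; do
  gid=$(printf '%s\n' "$group" | sum -r | awk '{ print $1 + 0 }')
  printf '%s %s\n' "$group" "$gid"
done | find_dup_ids
```

No output means no two names in the list share a GID number. Any line that is printed identifies names that would need a manually assigned value.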

Whatever unique UID/GID calculation algorithm is selected or written, it is important that it be implemented enterprise wide to eliminate security issues and failover conflicts during high availability or disaster recovery events. This technique also provides added security during normal maintenance activities by ensuring files are not accidentally assigned to a user or group that does not own the file.

Procedures: User and Group Names - UID/GID Numbers

As part of the standard build for each AIX system, copy the "mkuid" script to each system. The recommended location is:
/usr/local/sh/mkuid

When implementing a new user on an AIX system thru the "SMIT" interface, the UID number should first be calculated. An example is shown for a username "abc1234":
/usr/local/sh/mkuid abc1234

The resulting number should be used as the "User ID" value when adding the user "abc1234" on an AIX system thru the "SMIT" interface.

When implementing a new user thru a script, the resulting number from the "mkuid" calculation should be captured into a variable and used as the UID number.
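A scripted version of this step might look like the sketch below. The wrapper name "plan_mkuser" and the "MKUID" variable are hypothetical; the wrapper only prints the AIX "mkuser" command it would run, so that the result can be reviewed first.

```shell
# Location of the mkuid script; override MKUID to test with another path.
MKUID=${MKUID:-/usr/local/sh/mkuid}

# plan_mkuser USERNAME
# Capture the calculated UID in a variable and print the corresponding
# "mkuser" command (a dry run; nothing is created).
plan_mkuser() {
  uid_num=$("$MKUID" "$1") || return 1
  printf 'mkuser id=%s %s\n' "$uid_num" "$1"
}

# Example:
# plan_mkuser abc1234
```

Once the printed command has been verified, it can be executed with the appropriate authority on the AIX system.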

When implementing a new group on an AIX system thru the "SMIT" interface, the GID number should first be calculated using the Berkeley "sum -r" method. An example is shown for a group name "informix":
$ print "informix" | sum -r | awk '{ print $1 }'
21883

The value "21883" should be used as the Group ID number when adding the group "informix" on an AIX system thru the "SMIT" interface.

When implementing a new group thru a script, the resulting number from the "sum -r" calculation should be captured into a variable and used as the GID number.
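The same capture can be sketched for groups. The function name "calc_gid" is hypothetical, and the final line only prints the AIX "mkgroup" command rather than running it.

```shell
# calc_gid GROUPNAME
# Print the "sum -r" checksum of GROUPNAME. Adding 0 in awk strips the
# leading zeros that some sum implementations print for small values.
calc_gid() {
  printf '%s\n' "$1" | sum -r | awk '{ print $1 + 0 }'
}

gid_num=$(calc_gid informix)
printf 'mkgroup id=%s informix\n' "$gid_num"   # dry run: prints the command only
```

Stripping leading zeros is a precaution; a value such as "00600" could otherwise be misread by tools that treat a leading zero as an octal prefix.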

Conclusion:

The result of standardizing user names, group names, UID numbers, and GID numbers includes the following:

Individual login names for each user
Centralized user/group management
Increased security
Improved auditing capabilities
Avoidance of user name conflicts
Avoidance of group name conflicts
Avoidance of UID conflicts
Avoidance of GID conflicts
Avoidance of security holes during and after disaster recovery

Conflict avoidance is important during a disaster recovery effort because conflicts tend to consume large amounts of time. The last place you want to redesign and implement new user/group standards is at your disaster recovery site while attempting to recover your production systems. For most organizations, the disaster recovery provisions are not an exact duplicate of the production systems. In a disaster recovery implementation, multiple systems may be consolidated onto a single platform. This consolidation will expose conflicts such as user/group names and ID numbers, and will require these conflicts be resolved before the production systems can be recovered.

For those organizations where the disaster recovery provisions are an exact duplicate of their production systems, standardization of user/group names and ID numbers using this technique is also highly desirable, because it simplifies user/group support, maintenance, and security. Again, file security on AIX is enforced by UID/GID numbers, not user/group names; therefore the production and disaster recovery systems must be synchronized on this aspect.

The best way to avoid conflicts during disaster recovery is to implement and enforce policies, guidelines, standards, and procedures to eliminate conflicts as part of your enterprise wide business continuity planning.

The next document in this series will discuss naming structures for machines, hosts, adapters, resource groups, and aliases for use in a business continuity environment.