User and Group Names - UID/GID Numbers

On an AIX or Unix system, files are not stored by filename, they are stored by "inode number". Each file has an inode and is identified by an inode number, sometimes called an i-number in the file system where it resides. Inodes provide important information on files such as user and group ownership, access permissions and file type. Each file on a Unix system is associated with exactly one user (owner) and one group.

Similar to the filename/inode relationship, the file owner and group membership is designated and stored by the UID/GID numbers. Each user on a Unix system is assigned a UID number and each group a GID number. These numbers are used by the inode to assign ownership and group membership to files.

In Business Continuity Planning, it is important to recognize that UID/GID numbers should be uniquely assigned to users and groups on an enterprise wide basis.

If these UID/GID numbers are not enterprise wide unique, there will be security issues as well as conflicts during high availability or disaster recovery failovers. Security issues will also arise when a backup is restored onto a machine where the UID/GID numbers do not match those stored in the backup.

The following are recommended policies, standards, guidelines, and procedures, as related to "User and Group Names - UID/GID Numbers":


Policies: User and Group Names - UID/GID Numbers


Guidelines: User and Group Names - UID/GID Numbers


Standards: User Names and UID Numbers


A typical algorithm for performing a UID/GID calculation is to use the "sum" command with the "-r" option to generate the Berkeley cksum value. The "-r" option is supported by AIX and appears to be consistently implemented across a wide variety of other Unix platforms as well. An example of using this technique (all commands shown in this series of documents are expressed in Korn shell syntax):

$ print "abc1234" | sum -r
29247  1

A limitation of this technique is that it will only calculate numbers between about 600 and 65,000 for the specified username format of three lowercase letters followed by 4 digits. So the number of users and groups, enterprise wide, is limited to less than 65,000, and the possibility of duplicates does exist with this algorithm. These limitations may be acceptable for some organizations; however most will want to expand this concept of UID/GID calculation.


The recommended UID/GID calculation algorithm uses a base 26 numbering system to represent the first three characters of the username, and then converts the calculated base 26 number to a base 10 (decimal) value. An example script implementing this technique is shown below.

-- Cut Here --  
#!/usr/bin/ksh93
################################################################
function usagemsg_mkuid {
  print "
Program: mkuid

This function generates a UID number for a username,
that username must consist of 3 letters followed by 4
digits,  or 4 letters followed by 3 digits.  The
username is assumed  to be the first argument to the
function.  The username must  also be exactly 7
characters long.

Usage: ${1##*/} username
  Where:
    username = XXX#### - 3 letters followed by 4 digits
        Generates UID between 1,000,000 and 176,759,999
             or
    username = XXXX### - 4 letters followed by 3 digits
        Generates UID between 176,760,000 and 633,735,000

Author: Dana French (dfrench@mtxia.com) Copyright 2004
\"AutoContent\" enabled
"
}
################################################################
#### 
#### Description:
#### 
#### This function generates a UID number for a username, that
#### username must consist of 3 letters followed by 4 digits, 
#### or 4 letters followed by 3 digits.  The username is assumed 
#### to be the first argument to the function.
#### 
#### Assumptions:
#### 
#### This function assumes the username is constructed with
#### either 3 or 4 contiguous alphabetic characters followed
#### by 4 or 3 contiguous digits.  The total number of
#### characters in the user name must be exactly 7.
#### 
#### Dependencies:
#### 
#### The "mkuid" function is dependent upon the external 
#### function "mkascii" to produce an associative array
#### of ascii decimal values for the lower case letters.
#### 
#### Products:
#### 
#### Upon successful completion this function prints a single
#### decimal value to standard output.
#### 
#### For usernames consisting of 3 lower case letters
#### followed by 4 digits, the resulting UID values are
#### between 1,000,000 and 176,759,999.
#### 
#### For usernames consisting of 4 lower case letters
#### followed by 3 digits, the resulting UID values are
#### between 176,760,000 and 633,735,000.
#### 
#### Configured Usage:
#### 
#### The "mkuid" function can be called from the command
#### line, script, or another function.
#### 
#### Details:
#### 
################################################################
function mkuid
{
  typeset -l LET
  typeset NUMLET=0
  
  [[ "${1}" == "-?" ]] && usagemsg_mkuid "${0}" && return 1
  
#### 
#### If the total number of characters in the username
#### command line argument is not equal to 7, display an
#### error message, followed by the usage message, and return
#### from the function with an unsuccessful return code.
#### 

  if (( ${#1} != 7 ))
  then
      print -u 2 "ERROR: Number of characters in username ${1} is not equal to 7\n"
      usagemsg_mkuid "${0}"
      return -1
  fi
  
#### 
#### Extract only the alphabetic characters from the
#### username command line argument and count the number
#### of characters extracted.
#### 

  NUMLET="${1//([!a-zA-Z])/}"
  NUMLET="${#NUMLET}"

#### 
#### If the number of alphabetic characters in the username
#### command line argument is not equal to 3 or 4, display an
#### error message, followed by the usage message, and return
#### from the function with an unsuccessful return code.
#### 

  if (( NUMLET != 3 )) && (( NUMLET != 4 ))
  then
      print -u 2 "ERROR: Number of letters in username ${1} is not equal to 3 or 4\n"
      usagemsg_mkuid "${0}"
      return -1
  fi
  
#### Determine the base 10 order of magnitude for the
#### numeric portion of the user name and store this value.

  (( ORDMAG = 10 ** ( 7 - NUMLET ) ))

#### Initialize the decimal value of the lower limit for the
#### UID number to one million.  The lowest value allowed for
#### a calculated UID number will be this value.

  typeset LOWLIM=1000000

#### Initialize the decimal value of the UID number using a
#### base 26 calculation.  The lower limit is added to this
#### value to insure the calculated UID number is greater
#### than the lower limit.

  (( LOWUID = ( 26 ** ( NUMLET - 1 ) * LOWLIM / 100 ) + LOWLIM ))
  (( NUMLET == 3 )) && (( LOWUID = LOWLIM ))
  
#### Initialize several variables containing values used
#### while iterating through each character of the username.

  typeset BASE=26
  typeset NVALUE=""
  typeset LVALUE=0
  typeset BASEORDMAG=0
  typeset SUBTOT=0
  typeset TOT=0
  typeset MULT=0
  
####
#### Define an associative array to contain an ascii table of
#### values, the array index is the alphabetic character, the
#### value is the decimal number associated with the
#### character.  Call the external function "mkascii" to
#### create this array.
####

  typeset -A LETTERS
  mkascii LETTERS
  
#   print " ${LETTERS[@]}"
  
#### 
#### Divide the username into individual characters and store
#### each character in an array.  Iterate thru this array one
#### element at a time to calculate the base 26 value of each
#### alphabetic character and the decimal value of each
#### number. 
#### 

  eval EACHLET="( ${1//(?)/ \1} )"
  for LET in "${EACHLET[@]}"
  do
  
####
#### If the current interation character is a lower case
#### letter, subtract the decimal value of 97 from the ascii
#### value for this character ( 97 is the ascii value of the
#### lowercase letter "a" ).  This will cause the value of
#### "a" to be zero, "b" = 1, "c" = 2, etc.  Assign a
#### variable to contain this decimal value for the letter.
####

    if [[ "_${LET}" = _[a-z] ]]
    then
      LVALUE="$(( ${LETTERS[${LET}]} - 97 ))"

####
#### If the current iteration character is not a lower case
#### letter, then it is a number from 0 - 9.  Set the letter
#### value to zero (indicating the iteration character is not
#### a letter) and append the number to the end of the
#### numeric value.
####

    else
      LVALUE="0"
      NVALUE="${NVALUE}${LET}"
    fi
  
####
#### Calculate a base 26 multiplier for the current iteration
#### of the loop.  Each loop iteration should increase this
#### multiplier by a base 26 order of magnitude.
####

    (( MULT = BASE ** BASEORDMAG ))

#### 
#### Multiply the decimal letter value by the base 26
#### multiplier and add the product to a running total for
#### each iteration of the loop.
#### 

    (( SUBTOT = SUBTOT + ( LVALUE * MULT ) ))

#### 
#### Add a value of 1 to the loop counter, which is used to
#### determine the base 26 order of magnitude.
#### 

    (( BASEORDMAG = BASEORDMAG + 1 ))

  done
  
#### 
#### Multiply the value calculated for the alphabetic
#### characters by the base 10 order of magnitude for the
#### numeric portion of the user name.  Add the numeric
#### portion of the user name and the lower limit value to
#### arrive at a final value for the calculated UID number.
#### Print this value to standard output and return from this
#### function with a successful return code.
#### 

  (( TOT = ( ( SUBTOT * ORDMAG ) + NVALUE ) + LOWUID ))
  print ${TOT}
  return 0
}
################################################################
function usagemsg_mkascii {
  print "
Program: mkascii

This function accepts an associative array variable name
as the first command line argument, and builds an ASCII
table of characters in that array.  The index of the
associative array is the ASCII character.  The value
contained in each array element is the decimal, hex, or
octal number associated with the ASCII character array
index.

Usage: ${1##*/} ArrayName
  Where:
    ArrayName = Name of a predefined associative array

Author: Dana French (dfrench@mtxia.com) Copyright 2004
\"AutoContent\" enabled
"
}
################################################################
#### 
#### Description:
#### 
#### The "mkascii" function returns an ASCII table of
#### characters in an associative array. The index of the
#### associative array is the ASCII character.  The value
#### contained in each array element is the decimal, hex, or
#### octal number associated with the ASCII character array
#### index.
#### 
#### The "mkascii" function will accept one or two command
#### line arguments.  The first is the name of a predefined
#### associative array, the optional second command line
#### argument can be specified to designate the type of value
#### (decimal, octal, hexidecimal) to return in the
#### associative array.  Decimal values are returned 
#### default.  If the value of the second command line
#### argument is "8", octal values are returned, if the value
#### of the second command line argument is "16", hexidecimal
#### values are returned.
#### 
#### Assumptions:
#### 
#### It is assumed the first command line argument is the
#### name of a predefined associative array.  If the second
#### command line argument is not specified, it is assumed
#### the user is requesting decimal values be returned in the
#### array.
#### 
#### Dependencies:
#### 
#### The only valid values for the second command line
#### argument are "8" (octal)  and "16" (hexidecimal).
#### Any other values for the second command line argument
#### will result in an error.
#### 
#### Products:
#### 
#### The product of the "mkascii" function is the completion
#### of a predefined associative array indexed by the ASCII
#### character whose value is the decimal, octal, or
#### hexidecimal number that represents the ASCII character.
#### 
#### Configured Usage:
#### 
#### The "mkascii" function can be called from the command
#### line, script, or another function.
#### 
#### Details:
#### 
################################################################
function mkascii
{

#### 
#### If the first command line argument is a literal "-?",
#### display the usage message and exit the function with an
#### unsuccessful return code.
#### 

  [[ "${1}" == "-?" ]] && usagemsg_mkascii "${0}" && return 1

#### 
#### Establish a value variable to contain a decimal value associated
#### with each ASCII character.  The variable containing the
#### decimal value is unevaluated because the evaluation will
#### take place later.
#### 

  typeset VAL='${CNT}'

#### 
#### If the second command line argument is "8", change the
#### value variable to contain a statement that will be
#### evaluated later to octal values.
#### 

  [[ "_${2}" != "_" && "_${2}" =  "_8" ]] && VAL='$( printf "%o" 0x${i}${j} )'

#### 
#### If the second command line argument is "16", change the
#### value variable to contain a statement that will be
#### evaluated later to hexidecimal values.
#### 

  [[ "_${2}" != "_" && "_${2}" = "_16" ]] && VAL='${i}${j}'

#### 
#### Using the first command line argument as the name of a predefined
#### associative array, create a name reference to that array.
#### 

  nameref ASCII_TABLE="${1}"

#### 
#### Initialize a loop counter incremented by one each time
#### through the loop.  This counter is used to represent the
#### decimal value of the ascii character.
#### 

  typeset CNT=0

#### 
#### Loop through the double digit hexidecimal values for all
#### ASCII characters.
#### 

  for i in 0 1 2 3 4 5 6 7 8 9 A B C D E F
  do
    for j in 0 1 2 3 4 5 6 7 8 9 A B C D E F
    do

#### 
#### Evaluate the ASCII character from the double digit
#### hexidecimal value for the current iteration of the
#### loops.  Also evaluate the decimal, octal or hexidecimal
#### value associated with the ASCII character.  Assign
#### the resulting value to the associative array using the
#### "nameref" variable.
#### 

      eval ASCII_TABLE[\\$(print -- $( printf "\\\0%o" 0x${i}${j} ) )]=\"${VAL}\" 2>/dev/null

#### 
#### Increment the loop counter by one (used to represent
#### decimal values of each ASCII character
#### 

      (( CNT = CNT + 1 ))

#### 
#### Return to the beginning of the loop to evaluate the next
#### character in the ASCII sequence.
#### 

    done
  done
}
################################################################
#### 
#### Call the "mkuid" function with all command line arguments

mkuid "${@}"

-- Cut Here --

In the "usage message" areas of the above script, the phrase "AutoContent Enabled" appears. This refers to a technique of commenting scripts in such a way that documentation can be automatically generated from the script. This has the added benefit that whenever updates to the script are made, the documentation is automatically updated also. Notice that all comments in the script begin with 4 hash marks (#) followed by a space, this pattern is used to designate text used as documentation. This automated documentation technique, referred to here as "AutoContent", will be discussed in a later document.

The "mkuid" script shown above is composed of shell functions so that, if desired, the script can be separated into individual components and called from a shell function library. Otherwise, it can be executed as-is from the command line.


Standards: Group Names and GID Numbers

So the recommended algorithm to use for defining the group name/GID number associations is the Berkeley "sum -r" method:

$ print "informix" | sum -r
21883  1

Whatever unique UID/GID calculation algorithm is selected, chosen, or written, it is important that it be implemented enterprise wide to eliminate security issues and failover conflicts during high availability or disaster recovery events. This technique also provides added security during normal maintenance activities by insuring files are not accidentally assigned to a user or group that does not own the file.


Procedures: User and Group Names - UID/GID Numbers


Conclusion:

The result of standardizing user names, group names, UID numbers, and GID numbers includes the following:

Conflict avoidance is important during a disaster recovery effort because conflicts tend to consume large amounts of time. The last place you want to redesign and implement new user/group standards are when you are at your disaster recovery site attempting to recover your production systems. For most organizations, their disaster recovery provisions are not an exact duplicate of their production systems. In a disaster recovery implementation, multiple systems may be consolidated onto a single platform. This consolidation will expose conflicts such as user/group names and ID numbers, and will require these conflicts be resolved before the production systems can be recovered.

For those organizations where the disaster recovery provisions are an exact duplicate of their production systems, standardization of user/group names and ID numbers using this technique, is also highly desirable for the purpose of simplifying user/group support, maintenance and security. Again, file security on AIX is enforced by UID/GID numbers, not user/group names; therefore the production and disaster recovery systems must be synchronized on this aspect.

The best way to avoid conflicts during disaster recovery is to implement and enforce policies, guidelines, standards, and procedures to eliminate conflicts as part of your enterprise wide business continuity planning.

The next document in this series will discuss naming structures for machines, hosts, adapters, resource groups, and aliases for use in a business continuity environment.