NetworkAuthentication/Client

Client

Revision 13 as of 2006-11-01 13:54:27

Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/network-authentication
Created: Date(2006-10-25T18:00:00CDT) by JerryHaltom

["NetworkAuthentication"]

Introduction

The most prominent step in successfully providing directory services integration on Ubuntu is that of the client. A server implementation without a client does not accomplish much. A client without our own server implementation can get us traction in markets already covered by a directory server, notably the majority of the world on Microsoft Active Directory. This is a market we should desire.

This document outlines the proposed design of Ubuntu's directory services integration from the point of view of a client to the directory service: either a desktop machine or a server. It steps slightly into the realm of the servers when discussing various properties of the client which are directly driven by the choice of server configuration.

After outlining the design, an implementation plan must be created. This plan must take into consideration the scope of the work and available resources.

Rationale

As Ubuntu moves into the enterprise - either as a server or as a workstation - integration with that enterprise's existing systems will not only become desired, but in some circumstances required. Various US security requirements for certain work types such as banking, credit unions or other financial institutions mandate security requirements which force networks to use some form of secure integrated authentication.

A typical form of this is that all systems including workstations and servers need to authenticate to a centralized source in order to retrieve authentication information from a centralized source. This source is then free to apply logging and access restrictions to prevent devices or people from accessing network resources. Not only must LAN and WAN authentication be guaranteed to be encrypted, it must flow from a centralized authority.

Along with legal and organizational requirements, proper directory integration offers many compelling benefits. Security of authentication information in transit can reduce the number of attacks leveled against services. Single sign-on can reduce the management burden of users' passwords. Key-based authentication can reduce the number of times the user needs to be prompted and the number of places on the user's system their various passwords are stored. All of these reduce the necessary effort to access their services and result in a reduced attack footprint and fewer help desk calls.

Improper directory service integration can, however, result in a negative benefit. Authentication failures can reduce large number of users' ability to get their work done or even access their systems. Simple network down-time which previously simply reduced a user's ability to access network resources can now impact a user's ability to even access his local machine. These types of possibilities are unacceptable and a large amount of thought must go into securing each potential fault point in the various systems involved.

Scope

Before we can start planning our implementation we must first define what the scope of our implementation will be. Our initial goal will be to focus on integration with the widest deployed directory service currently in use today: Microsoft Active Directory. Since this directory service is "based" on standard LDAP and Kerberos components, careful selection of components leading to proper integration with it will be a major stepping stone on the path to proper integration with our own directory services.

The scope of implementation for the client support is to allow Ubuntu systems to be configured against a specific named Active Directory domain. User interaction should be kept to a minimum. Preferably only two questions would be asked: 'What is the domain name?' and 'What credentials do I need to join it?' The answer to these questions should be preseedable as part of an automated Ubuntu deployment. After configuring the system the user should be able to log into the system using credentials which are hosted by the domain.

Client support takes precedence over an Ubuntu directory server. Client support can instantly give us a user base in existing directory installations. Implementation of client support will give us exposure to these environments and a better understanding of how existing vendors have implemented their directory services. This understanding is critical for successful implementation. An install base also grants us a path of least resistance in establishing an income stream from institutions interesting in deploying Ubuntu boxes alongside their existing services.

Dealing with the problem of roaming user data or home directories is left to other specifications.

Cross-realm authentication should be supported. The user should be able to log into any domain which the Active Directory forest allows him to. Most large (multi-location) enterprises actually use multiple domains joined together into a "forest". Not providing access to these resources (such as: an email server in one domain connecting and sending email to a server in another domain; a user from one domain accessing an internal web site at another location; or users connecting to centralized corporate file shares) would be detrimental to the use of Ubuntu in large distributed enterprises. The log-in interface needs to take this into account. The user needs to be able to select which realm he should authenticate against from a drop-down list of available realms.

Operations common in enterprises need to be considered. User names are sometimes named based on family names, and sometimes these change. Participating client machines need to handle this situation transparently. If a user is renamed he should be able to log in as his new user name and his settings should be preserved. This presents some unique challenges for most Unix/Linux environments. An example is the crontab files which are named based on the user name. Another example is local group memberships which are stored in the local /etc/group file based on user name.

As mentioned in the introduction disconnected authentication needs to be perfected. An Ubuntu laptop needs to operate disconnected. It needs to allow the user to log-in even though he has no contact with the LDAP or Kerberos servers. Again, shared files are not addressed by this specification. Obviously though the user is able to log-in he will not be able to access network resources until he connects. Upon reconnection the user should have a method of taking whatever action is necessary to refresh his network credentials.

Implementation

Significant value is offered by reusing existing software available in both Ubuntu and Debian, or elsewhere, to form our implementation. Current rudimentary implementations exist for most of the important parts which we can build our effort upon. The two systems inside Linux which we will be working with are NSS and PAM.

NSS

I am revising this section and will have it done shortly. Short of the matter is I am changing focus to a pure Winbind mode for the short-term AD. For long term, Winbind should be renamed, have backends added, and become a core component of a next generation NSS API. [JerryHaltom]

NSS (Name Service Switch) is provided by the base libc libraries and used to provide Posix defined elements to applications (passwd, group, shadow, host.) To introduce the concept of remote users to our systems, extensions are added to NSS which retrieve the required user information from remote sources. Currently there exists libnss-ldap which is contains a basic implementation of support to transform an LDAP query into a Posix 'passwd' and 'group' lists.

Typical use of NSS is very fine grained. Applications query large lists of all available passwd records and manipulate them to fill in drop down lists and other UI. Applications query NSS many times in fairly inefficient ways, often times retrieving the same record multiple times. NSCD (the Name Service Caching Daemon) seeks to eliminate a portion of this. NSCD runs as root and answers NSS queries on behalf of applications. It has rudimentary caching functionality with simple positive and negative expiry times. It does not handle disconnected operation properly. It is also fairly buggy when operating alongside libnss-ldap.

Applications are typically very sensitive to NSS response times. They often block the UI while retrieving a user list. This works properly as long as obtaining a user list is simply reading from a local file, but it becomes a problem when the user information must be retrieved from the network. Not only is the network an order of magnitude slower under optimal conditions, conditions are not always optimal. Because of this the interaction between libnss-ldap and NSCD needs to be tuned to provide near real time listing of users. Typically this means a cache must be actively maintained in NSCD at all times and that responses should be returned directly from this cache. This is not currently the case in either of these pieces of software. Work is needed to improve the situation.

Though the short comings in existing and future software which uses NSS will force us to make optimizations in NSS to satisfy it, it should not preclude us from fixing those applications so that their actions are more efficient.

Disconnected operation needs to be provided. To this end the NSS cache must never completely expire and be removed from the system. An up to date list of ALL users and groups must be maintain and persisted between reboots. When a laptop is shut down and disconnected from the corporate network NSS queries should continue working as expected. This cache should be guaranteed made and up-to-date at the point of joining a realm.

The NSCD daemon will run as root. When issuing LDAP queries it will use the host/$(hostname)@REALM principal to authenticate with the remote LDAP server. This requires that the host principal key is created during a realm join and that it is maintained up to date.

PAM

PAM is the method by which the user is prompted for initial authentication information (user name and password) and the path that that information takes to either successfully authenticate the user or deny them access. An existing implementation of libpam-krb5 exists in various forms. This allows a remote Kerberos KDC to be contacted to validate a user's password. It also facilitates the retrieval of an initial Kerberos TGT upon log-in.

The same conditions that affect NSS can affect a user's ability to authenticate through PAM. PAM however is much more lenient. Systems should wait until the server is confirmed inaccessible before resulting to reading from a cache. A package exists in Ubuntu named 'libpam-ccreds'. Properly used this module can solve this problem nearly completely. An example PAM stack follows:

auth [default=die success=done authinfo_unavail=reset] pam_unix.so debug
auth [default=die success=1 service_err=reset auth_err=die] pam_krb5.so use_first_pass debug forwardable
auth [default=die success=done] pam_ccreds.so action=validate use_first_pass
auth [default=done] pam_ccreds.so action=store use_first_pass

When authenticating a user the first step is to use pam_unix. This attempts to read the user from the local password file. If the user is found and authentication is successful processing of the stack exits. If the user is unavailable (authinfo_unavail) (does not exist in /etc/passwd) then processing moves to the next module. In all other cases (invalid password) processing ends and the authentication attempt has failed. If pam_unix was unable to locate the user and proceeds to the next module, pam_krb5 attempts to validate the user against the remote KDC. If this succeeds processing jumps to the last module, pam_ccreds, which stores a SHA1 hash of the password in a local database. If pam_krb5 has a service_error (is unable to reach the KDC), pam_ccreds validates the password against the cache. If pam_krb5 returns auth_err (KDC was reachable, but password as incorrect), processing ends: the user entered the wrong password.

This PAM configuration needs to be extensively tested and validated.

Tasks

Both of these pieces of software need to be tuned to support more automated run-time server selection. In a large enterprise LDAP and KDC servers come and go as networks expand and contract. Some go offline and then come back. A proper implementation will use SRV records to query for available servers for a given domain. It will cache these records and consult the servers in some sort of priority (preferably based on metrics to determine least cost). When a server is unable to fulfill a request the software will attempt to find another. Periodically the list of SRV records will be retrieved and servers which were previously unreachable will be tried again. This process must be very well tuned. All workstations in an enterprise suddenly having a coding error and DDoSing a single server is unacceptable.

Kerberos and LDAP systems do not use a simple short text string to uniquely identifier their users. Kerberos uses a principal name containing the a realm portion: user@REALM.DOM. LDAP usually identifies it's objects based on their path within a given directory: cn=User Name,ou=Users,dc=realm,dc=com. Both of these are mutable and cannot be relied upon for long term storage of attributes linked to the user entity. Principals can be renamed and LDAP objects can be renamed or moved. Long term storage requires an immutable identifier be used which can be determined from the remote object and linked back up to the remote object once it has moved. In Unix we use the UID for this purpose.

The UID in Linux is currently a 32-bit identifier which is simply desired to never change for a given user. File system permissions are stored based on this UID and as such have no problem tracking user moves or renames. LDAP provides us a place to store this UID attached to the user's other information such as principal name.

Various systems in Linux however are not keyed based on the UID, but instead based on the user name. An example is group membership in /etc/group. Another example is crontab files in /var/lib/crontab. Both of these make relationships to the remote entities name instead of it's UID. Two solutions exist: a) mandate object names never change and b) store references based on an immutable key. The first solution is not likely to go over well in Active Directory. The second solution will require that group memberships in /etc/group be stored based on UID and crontab files be changed to be based on UID as well. Other systems will need to be audited.

Though a directory service is free to assign whatever UIDs it feels are appropriate for a given user, our system needs to take some of this into account. We should allocate a range of usable UIDs which we consider "local users" and another range which we consider "remote users". This will help prevent conflicts.

Cross-Realm Authentication

Cross-realm authentication allows a client in one domain to access a resource in another. Two pieces are required for this. First, the NSS libraries must be able to distinguish between users in different realms. They may be named the same. This is unavoidable. Posix does not allow this. Second, unique UIDs must be used for all users across all known realms. The second issue is currently left our of this specification as partitioning and allocation of unique UID numbers across an entire enterprise is very much dependent on the resources of the directory server in question. The first issue means that we must mangle Posix user names and combine in a realm identifier. Posix user names then take the form of username@REALM. This keeps parity with the Kerberos principal name.

Cross-realm authentication poses some unique problems for our NSS situation. The nature of cross-realm authentication means that a single passwd listing might require communication with a dozen different directory servers across an enterprise. This is not efficient. Active Directory provides something known as the Global Catalog on the server side to remedy this situation. The Global catalog is simply a complete listing of all objects needed from all trusted realms. Thus is might be possible for our LDAP libraries to have the option of talking to a specific set of secondary LDAP servers (Global Catalog servers) in the case of certain user look up operations.

Other options might be that the workstations do contact the remote servers directly but do so with heavy caching restrictions. Attention needs to be paid to detail. Typing 'ls -l' from a shared network drive will result in a separate NSS query being issued for every unique UID that appears in the listing. This operation cannot be slow. The operation does not have to translate into dozens of separate LDAP queries.

Many current Linux applications query and enumerate the entire NSS passwd table for some operations. Nautilus does so to display it's permission owner selection drop-down boxes on files. As we move into large enterprises with many hundreds of thousands of distributed users, this situation can not continue. Not only is such a query very large and will take a very long time to retrieve the sheer number of users will make the interface unbearable. Different interfaces need to be introduced which make querying and sorting large user bases more reliable. Though you should be able to set a permission on a file relating to a user in another realm, it is not the primary use case. Typically you want only a list of users in the current realm or you want to filter based on the user's name or portion of their name. This requires some extended knowledge which is not present in the standard passwd table. Posix has no concept of which realm a give user exists in, nor no concept on how to return an enumeration of users for a specific realm. For this interface to work either Nautilus will have to circumvent NSS and issue LDAP queries directly or NSS will have to grow such functions.

This section may be removed. Altering NSS, though it would solve the problem for now, might be too intrusive a change into too critical a system to be gain any traction in any other libc implementations. Creation of a full fledged alternative to NSS may be the appropiate answer. NSS compatibility would of course be maintained. [JerryHaltom]

One idea is the introduction of a new NSS table simply named 'realm'. Records would look like:

0:
1:COMPANY.DOM
2:US.COMPANY.DOM
3:EU.COMPANY.DOM

A realm has a unique ID and a unique name. NSS service modules such as libnss-ldap would be able to provide support for this table by querying the domain for trust relationships. Functions would be added to enumerate users or groups from a given realm. The libnss-ldap module could implement support for these queries by directly contacting the appropriate LDAP server and issuing a single query. Whether the addition of a new NSS table in libc is something which can be done, or should be done, is up in the air.

Each system would have file based equivalent (/etc/realm) and a default realm assigned (/etc/realmname).

Additionally this new NSS table would drive out interfaces where a realm listing is required, such as with GDM (Gnome Display Manager). We would be able to provide the user a drop-down list to select where their user account is located.

Kerberos Hosts

Each machine participating in a Kerberos network should have a host principal. This is essentially a principal named host/$(hostname --fqdn)@REALM. The host uses it to access services on the network on it's own. NSS for instance should connect to LDAP using the host principal in order to retrieve user listings. This prevents unauthorized devices from knowing available user names.

This Kerberos principal should be created while joining the host to the domain. Our domain join procedure needs to take that into account. The principal should be created with a random password. The machine itself should periodically change it's own password.

Kerberos Users

After a user has logged into his session he will have acquired a Kerberos TGT via the libpam-krb5 module. Applications the user opens will use this TGT to request service tickets for services they require access to. After a certain period of time the Kerberos TGT will need to be renewed. This is a process which should happen automatically. To accomplish this a per-user daemon should run within his session which keeps track of the user's TGT and periodically requests it be renewed. This process should require no user interaction.

Eventually the certificate can actually expire. When this happens the authentication stage has to happen again: the user has to enter his password. The user should not be forced to log off and back on to resume access. When the user has an expired TGT a notification should pop up explaining to the user that he needs to re-authenticate to the network. A password dialog which acquires a new TGT needs to be displayed. An alternative is to ask the user to lock the screen. Unlocking it will result in the user reacquiring a TGT. Windows does this.

When a user changes his password the PAM stack should allow it. In the case of a MIT or Heimdal Kerberos server, a kadmin protocol is provided. In the case of Active Directory, Winbind must be used. The PAM configuration should take all of this into account.

Interface

Manual configuration of these varied Linux subsystems would take substantial time on the part of the administrator. To that end a simple interface should be created which integrates the various components outlined above in an easy to use form where the least number of required questions are asked. For an Active Directory domain this can be reduced to asking for the domain name and the authentication information required to connect to the domain.

This process should be easily preseedable for automated Ubuntu deployments. When deploying Active Directory boxes it is desired that they join themselves to the domain when they first boot up. The seed would need to be able to contain the domain name and authentication information.

["NetworkAuthentication/Client/Interface"]

Action

A number of tasks are required to make this implementation real.

Clean-up, audit and test combinations of libpam-krb5, libnss-ldap and libpam-ccreds. Fix any open issues preventing them from working together smoothly.
NSS components must keep blocking to an absolute minimum. NSS queries must be answered in near real time. If a LDAP server is slow to respond, we give up easily. KDC responses are more tolerant.
Address caching of NSS data when delivered over the network. NSCD will need to be audited and corrected to work optimally.
Address intelligent fall-back when communicating with LDAP and Kerberos servers.
Per-host Kerberos maintenance. Winbind will be required for some portions of AD integration. Some custom daemon will need to be created to maintain host keys for non-AD situations.
Work on interface for configuring authentication. Both command line and Gtk based.
Per-user Kerberos maintenance. The user session needs a Gtk-based daemon to renew Kerberos tickets periodically. In the case of an expiration, a procedure should be provided for the user to re-authenticate. Windows does this by asking the user to lock and unlock their sessions (effectively asking for their password.) This seems sane to me.

Comments

Please place comments under here:

I'd like to see the per-user Kerberos maintainance be done using integration into gnome-keyring and let the user be able to maintain Kerberos tickets alongside GPG and SSH keys. [JelmerVernooij]

Jelmer, I would like to see this at some point too. Right now though, gnome-keyring is not in any shape for it. gnome-keyring really is nothing more than an encrypted password store. It has no facilities for inserting external things into it. Seahorse however does. Maybe you meant Seahorse? If so, yes. Seahorse may be a good basis for our ticket management interface. [JerryHaltom]

Ubuntu Wiki