UbuntuServerLogs

Summary

server-logs addresses a gap: nearly all data in /var/log relates either to a specific service or to general error conditions. server-logs keeps routine data about general system parameters, and will be installed by default. State data is collected regularly about networking, filesystems, some vital files, statistics exposed in /proc, anything typed at the command line (using cloba), and all programs run. The overhead is designed to be minimal, and no new tools are required. Where logging to a remote server is enabled (normal on significant networks), server-logs makes it easy to reconstruct events forensically when a server crashes or is destroyed beyond repair.
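The spec does not say how logs reach the remote server. In practice that is usually handled by the syslog daemon's own forwarding configuration. As an illustration only, the sketch below uses the Python standard library's logging.handlers.SysLogHandler to send log lines to a remote syslog server over UDP. The function name make_remote_logger, the "server-logs:" tag and the default of the conventional syslog UDP port 514 are assumptions made for this example. They are not part of server-logs.

```python
import logging
import logging.handlers


def make_remote_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that sends each record to a remote syslog server over UDP.

    Hypothetical helper: host and port are assumptions; point them at the
    central syslog server used on your network.
    """
    logger = logging.getLogger(f"server-logs.{host}:{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also print records via the root logger
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.handlers.SysLogHandler(
            address=(host, port),  # a (host, port) tuple means UDP by default
            facility=logging.handlers.SysLogHandler.LOG_USER,
        )
        # Tag every line so it is easy to grep for on the central server.
        handler.setFormatter(logging.Formatter("server-logs: %(message)s"))
        logger.addHandler(handler)
    return logger
```

For example, make_remote_logger("loghost.example.com").info("root: rm -rf /tmp/x *") would leave a copy of that line on the (hypothetical) host loghost.example.com even if the local disk is later destroyed.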

This spec is about logging. Monitoring is a different spec.

This spec does not address rate-limiting or possible resource exhaustion. See another spec for that.

Release Note

server-logs uses a combination of existing packages (acct, inotify-tools, cloba, X, Y) and a little bash scripting called from cron. Together, these record what Ubuntu Server has been doing while functioning normally, so that there is a history to consult when things go wrong. In addition to the log files kept by the acct, cloba and X packages, a new /var/log/system-log/ directory is populated with output from standard system commands such as ip, vmstat, iostat and cat /proc/sys/XXX. In /var/log/apt, the output of dpkg --get-selections is recorded from cron. The apt system is also configured to log the commands it is invoked with, i.e. the packages explicitly requested but not any automatically installed dependencies. These logs can also be sent to a remote syslog server, so server-logs also provides a new way of debugging after a system disaster.
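The cron side of this can be sketched as follows. The spec calls for a little bash scripting. This sketch uses Python instead to keep all examples in one language. The command list, the file names and the function snapshot are hypothetical. Each run appends a timestamped copy of every command's output to a file of the same name under a log directory such as /var/log/system-log.

```python
import datetime
import os
import subprocess
from typing import List, Mapping, Sequence

# Hypothetical mapping of log file name -> command. The spec names ip, vmstat,
# iostat and /proc as sources but does not fix the exact commands or file names.
COMMANDS: Mapping[str, Sequence[str]] = {
    "iplink": ["ip", "-s", "link", "show"],
    "vmstat": ["vmstat"],
    "iostat": ["iostat"],
}


def snapshot(log_dir: str, commands: Mapping[str, Sequence[str]] = COMMANDS) -> List[str]:
    """Append one timestamped snapshot of each command's output to log_dir/<name>.

    Returns the names written. A missing or failing command is recorded in
    its own file instead of aborting the whole run.
    """
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    written = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(list(argv), capture_output=True, text=True, timeout=30)
            if result.returncode == 0:
                body = result.stdout
            else:
                body = f"exit {result.returncode}: {result.stderr}"
        except (OSError, subprocess.TimeoutExpired) as exc:
            body = f"error: {exc}\n"
        with open(os.path.join(log_dir, name), "a", encoding="utf-8") as fh:
            fh.write(f"=== {stamp} ===\n{body}")
            if not body.endswith("\n"):
                fh.write("\n")
        written.append(name)
    return written
```

If snapshot("/var/log/system-log") were called every ten minutes from cron, it would build up the kind of history the use cases below rely on. Writing a failure into the command's own file, rather than stopping the run, means that one missing tool (iostat, for instance, comes from the sysstat package) does not lose the other snapshots.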

Rationale

Administrators need a logging scheme that lets them look into the history of problems they never anticipated. They will also be pleased to have a tool for tracking down the biggest single cause of problems on a network: the administrators themselves.

Use Cases

  • Misaki is an administrator with an intermittent network problem on her server. She notices that ip -s link show (or ifconfig) reports errors. She greps through /var/log/system-log/network and sees that these errors have accumulated steadily since a particular day last week. She knows a new switch/network card was installed that week, so she now has a clue where the problem might lie.

  • Yunna is an administrator whose SQL server failed to respond several times today, although it is working fine now. She has checked DNS, but no other system has had a DNS problem. She greps through /var/log/system-log/files and sees a number of changes to resolv.conf that coincide with the timeouts. She checks with her colleague, who confirms that he broke resolv.conf a couple of times before fixing it again.

  • Takara is an administrator who receives an alert about repeated failed login attempts on a server. She verifies the report using /var/log/auth.log. The source IP address is localhost, so she checks /var/log/cloba/root.log. Sure enough, a colleague has been doing some testing after an upgrade without telling anyone, running it from his own unprivileged account.

  • Tsukiko is an administrator whose server has crashed unrecoverably. A superficial look indicates that the root filesystem is damaged. She greps the central syslog server for the logs that were also sent to /var/log/cloba/root.log. She sees that someone had done su - root and, some time later, had typed rm -rf /tmp/x *, entirely by accident. She has the uid and timestamp, so she knows which of her assistants on the sysadmin team to have a little chat with.

  • Chouko is an Ubuntu server team member who is reading an Apport crash report. Apport has included the last two hours of /var/log/system-log/top and /var/log/system-log/netstats, plus the last day of /var/log/apt/history.log. Together they show that a recently installed package is consuming excessive resources.

  • Sayuri is an Ubuntu QA team member who is trying to determine why two similar servers are behaving differently. She compares the output of dpkg --get-selections on each machine. Then she steps backward in time, comparing the snapshots in /var/log/apt/get-selections until she finds some significant differences.
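Sayuri's comparison can be sketched as a diff of two dpkg --get-selections snapshots. The Python below is a hypothetical helper, not part of server-logs. It assumes each snapshot is the plain text that dpkg --get-selections prints, with one package name and selection state (such as install or deinstall) per line. It reports every package whose state differs and uses the made-up state "absent" for packages that appear on only one side.

```python
from typing import Dict, Tuple


def parse_selections(text: str) -> Dict[str, str]:
    """Parse `dpkg --get-selections` output into {package: state}."""
    selections = {}
    for line in text.splitlines():
        fields = line.split()  # name and state are separated by tabs
        if len(fields) == 2:  # skip blank or malformed lines
            package, state = fields
            selections[package] = state
    return selections


def diff_selections(old: str, new: str) -> Dict[str, Tuple[str, str]]:
    """Return {package: (old_state, new_state)} for every package that differs.

    A package missing from one snapshot is reported with the hypothetical
    state "absent".
    """
    a, b = parse_selections(old), parse_selections(new)
    changes = {}
    for package in sorted(a.keys() | b.keys()):
        before, after = a.get(package, "absent"), b.get(package, "absent")
        if before != after:
            changes[package] = (before, after)
    return changes
```

The same function works for both of Sayuri's steps: comparing two machines today, and comparing one machine's successive snapshots to find when the difference appeared.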

Assumptions

Todo

Design

There are three parts to the package:

  • dependencies on packages that already have all the functionality required in their area: acct, cloba, X, Y, Z

  • dependencies on packages that provide programs invoked by a shell script in cron: inotify-tools, X, Y, Z

  • shell scripts run from cron that use only base Ubuntu features, from /proc to ip, without needing any further support
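As an example of the third kind of script, /proc/net/dev already exposes per-interface error counters, which is the history Misaki needs. The Python sketch below is a hypothetical helper, and its function names are illustrative. It parses two saved copies of /proc/net/dev and reports the interfaces whose receive or transmit error counts rose between them.

```python
from typing import Dict, Tuple


def read_error_counters(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {interface: (rx_errs, tx_errs)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Receive block:  bytes packets errs drop fifo frame compressed multicast
        # Transmit block: bytes packets errs drop fifo colls carrier compressed
        counters[iface.strip()] = (int(fields[2]), int(fields[10]))
    return counters


def error_growth(earlier: str, later: str) -> Dict[str, Tuple[int, int]]:
    """Return {interface: (rx_delta, tx_delta)} where either error count rose."""
    before, after = read_error_counters(earlier), read_error_counters(later)
    grown = {}
    for iface, (rx, tx) in after.items():
        rx0, tx0 = before.get(iface, (0, 0))
        if rx > rx0 or tx > tx0:
            grown[iface] = (rx - rx0, tx - tx0)
    return grown
```

The counters reset when the machine reboots. A negative delta in a real comparison therefore points to a reset, not to fewer errors.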

Implementation

Todo

UI Changes

None.

Code Changes

None.

Migration

None.

Test/Demo Plan

Install the server-logs package. Then:

  • run a program that has never before been run on this machine. Does it appear in the output of acct (for example, lastcomm)?

  • run a command as root. Does the command appear in /var/log/cloba/root.log?

  • run a command as user foo. Does the command appear in /var/log/cloba/foo.log?

  • wait ten minutes. Are the files iostat, top and iplink in /var/log/system-log accumulating data?
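The last check can be automated. The Python below is a hypothetical helper, not part of server-logs. It records the size of each expected file and, after the wait, reports any file that is missing or has not grown.

```python
import os
from typing import Dict, Iterable, List


def file_sizes(log_dir: str, names: Iterable[str]) -> Dict[str, int]:
    """Return the current size in bytes of each named file (-1 if it is missing)."""
    sizes = {}
    for name in names:
        path = os.path.join(log_dir, name)
        sizes[name] = os.path.getsize(path) if os.path.isfile(path) else -1
    return sizes


def not_growing(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return, sorted, the names of files that are missing or have not grown."""
    return sorted(
        name for name in after
        if after[name] < 0 or after[name] <= before.get(name, -1)
    )
```

For example, take file_sizes("/var/log/system-log", ["iostat", "top", "iplink"]) before and after a ten-minute time.sleep(600). If collection is working, not_growing(before, after) should return an empty list.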

Outstanding Issues

  • Should cloba, the command-line logging script, really be included here? It is very simple but also very powerful, and a possible security risk.

References



UbuntuServerLogs (last edited 2008-11-04 10:55:29 by jgr)