CheckboxAnonymizingInformation

Summary

Checkbox currently submits private information, such as serial numbers and MAC addresses, as part of the gathering phase. This is unnecessary because the server only needs to tell values apart, not to know the values themselves. For example, a serial number is only needed to recognize that the same instance of a component is being submitted, so a hash of the serial number is sufficient.

Release Note

Checkbox anonymizes private information before submitting it over the network.

Rationale

The reason for anonymizing information is to protect the privacy of users contributing details about their systems. However, the uniqueness of this information must be preserved so that instances of components and systems can still be identified.

Use Cases

  • After submitting test results for a network controller, which has a unique MAC address, the user should be able to see all related results for that instance of the controller.
  • An instance of a system model should be identifiable by using some of the unique attributes of its components. This makes it possible to identify how many instances of a particular system model have been tested.

Assumptions

  • The hash function used to anonymize private information is deterministic: a given value always hashes to the same result.
  • The hashing should be one way so that the original private information cannot be determined by means other than brute force.
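Both assumptions can be checked with a few lines of Python. This is only a sketch: the helper name anonymize and the sample MAC addresses are made up for illustration, and MD5 is used only because this spec mentions md5sum as a possible representation.

```python
import hashlib


def anonymize(value):
    """Return a hex digest that stands in for a private value."""
    # MD5 is an assumption taken from the md5sum example in this spec.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


mac = "00:11:22:33:44:55"    # made-up MAC address
other = "00:11:22:33:44:56"  # a second made-up MAC address

# Deterministic: the same input always yields the same digest.
assert anonymize(mac) == anonymize(mac)

# These two different inputs yield different digests, so the two
# controllers remain distinguishable after anonymization.
assert anonymize(mac) != anonymize(other)

# The digest is a fixed-length hex string that does not contain the original.
assert len(anonymize(mac)) == 32
assert mac not in anonymize(mac)
```

Note that values drawn from a small space, such as IP addresses, are the ones most exposed to the brute force mentioned above.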

Design

The design must take into consideration that information is gathered from several locations by the registries. For example, information might be retrieved from lshal, lshw, /proc, etc.

The design should strive to be agnostic of the particular information being retrieved by the registries. It should express, in a generic way, the keys whose values are private and must be anonymized. For example, the following keys should be identified in a single location as opposed to being hard coded in each registry:

  • serial
  • hostname
  • ip
  • mac

Then, because the registry is recursive, a single traversal should be able to transform the corresponding values into an anonymized representation.
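The idea of one central key list plus one recursive pass can be sketched as follows. This is not Checkbox code: the registry is modelled here as nested dicts and lists, keys are matched by exact name, and PRIVATE_KEYS, anonymize and anonymize_tree are hypothetical names. Values are replaced with a plain hex digest for simplicity.

```python
import hashlib

# Hypothetical central list of private keys, defined once rather than
# hard coded in each registry.
PRIVATE_KEYS = frozenset(["serial", "hostname", "ip", "mac"])


def anonymize(value):
    """Return an MD5 hex digest standing in for a private value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def anonymize_tree(node):
    """Return a copy of a nested dict/list with private values hashed."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key in PRIVATE_KEYS and isinstance(value, str):
                # Private leaf value: replace it with its digest.
                result[key] = anonymize(value)
            else:
                # Anything else: keep descending into the tree.
                result[key] = anonymize_tree(value)
        return result
    if isinstance(node, list):
        return [anonymize_tree(item) for item in node]
    # Non-private leaf values are returned unchanged.
    return node


# Made-up registry contents for illustration.
registry = {
    "hw": {"network": [{"product": "Example NIC", "mac": "00:11:22:33:44:55"}]},
    "system": {"hostname": "example-host", "serial": "ABC123"},
}
anonymized = anonymize_tree(registry)

assert anonymized["hw"]["network"][0]["product"] == "Example NIC"
assert anonymized["hw"]["network"][0]["mac"] == anonymize("00:11:22:33:44:55")
assert anonymized["system"]["serial"] != "ABC123"
```

Because the traversal only looks at key names, a registry that starts reporting a new private field needs a change to the central list only.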

The anonymized values should be represented as an overloaded str object, that is, a subclass of str, so that the original value can still be retrieved from within Checkbox while the hashed value, such as an md5sum representation, is also available.

Implementation

  • Add an anonymizer plugin which recurses through the registry tree after the gathering phase.
  • The recursion routine should replace private values with an overloaded str object.
  • The str object should return an anonymized value by default and provide an additional attribute to retrieve the original value.
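The str object described in the last two points could look roughly like the sketch below. The class name AnonymizedStr and the attribute name original are assumptions made for illustration, as is the choice of MD5, which follows the md5sum example in the Design section.

```python
import hashlib


class AnonymizedStr(str):
    """A str whose value is a hash; the original is kept in an attribute."""

    def __new__(cls, original):
        # The string value itself is the digest, so the anonymized form
        # is what any code treating this object as a str will see.
        digest = hashlib.md5(original.encode("utf-8")).hexdigest()
        instance = super().__new__(cls, digest)
        # Hypothetical attribute giving Checkbox access to the raw value.
        instance.original = original
        return instance


serial = AnonymizedStr("ABC123")  # made-up serial number

# By default the object behaves like the hashed value.
assert serial == hashlib.md5(b"ABC123").hexdigest()
assert isinstance(serial, str)

# The original value is still reachable from within the process.
assert serial.original == "ABC123"
```

Since the hash is the value of the string, code that formats or writes the object as text emits only the hash, and the original value is reachable only through the extra attribute.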

Migration

  • The server should be updated so that existing private values are converted in the same way as on the client.

Test/Demo Plan

Unfortunately, validating this requires manual intervention: users need to check that the information being gathered does not contain private information.

Unresolved issues

QATeam/Specs/CheckboxAnonymizingInformation (last edited 2008-12-22 17:34:57 by modemcable178)