LicenseReviewProcessImprovementSpec

Summary

The license statement for a package is not always accurate, because individual files inside the package may carry conflicting licenses. The goal is to improve the process for documenting the license(s) actually in effect for a package before it is accepted into main.

Rationale

Package license statements do not always accurately reflect all of the licenses that actually need to be complied with for that package. See [1].

When a new package is nominated for main, the individual files should be scanned for licenses as part of the license check. Using current open source tools to analyze the contents, the file level can be surveyed efficiently and the licenses recorded accurately in the package before it is accepted into main.

Use Cases

  • A package is declared as released under the BSD license, but some files inside it are licensed under GPLv2. If those files are built into the binary, the GPLv2 source distribution requirement must be identified so that it can be complied with. Knowing that GPLv2-licensed files are present signals that closer analysis may be required to comply with the license terms.

Scope

This specification covers the promotion of a new package into main. Once a new package has been adopted, ongoing scans may be required to prevent the inadvertent inclusion of incompatibly licensed files.

Design

When a new package is nominated to be promoted into main:

  • The person nominating it should run FOSSology, ninka, or one of the commercial tools on the package, and verify that the file-level results match the declared license of the package.
  • If any file does not match the declared license of the package:
    • It should be documented as a detected license, in the information associated with the package.
  • If any detected license is incompatible with the declared license:
    • Before the package is accepted into main, the person nominating it should work with the package maintainer to resolve the licensing conflict.
    • A good summary of compatible and incompatible licenses can be found at [2].
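The Design steps above can be sketched as a simple per-file comparison. This is a minimal sketch under stated assumptions: it takes a hypothetical mapping of file path to detected license short name (the real export formats of FOSSology or ninka are not modelled), and the `NEEDS_RESOLUTION` table is an illustrative placeholder, not a legal compatibility judgement.

```python
"""Sketch: flag files whose detected license differs from the declared one.

Assumption: a scanner (e.g. FOSSology or ninka) has already produced a
mapping of file path -> detected license short name. The dict used below
is hypothetical input, not any tool's real output format.
"""

# Hypothetical, illustrative-only set of (declared, detected) pairs that must
# be resolved with the maintainer before promotion. NOT legal advice; a real
# table would come from a reviewed compatibility summary.
NEEDS_RESOLUTION = {
    ("BSD-3-Clause", "GPL-2.0"),
}


def review(declared, detected_by_file):
    """Return (detected_licenses, files_to_resolve) for one package.

    detected_licenses: file-level licenses that differ from `declared`;
        these get documented with the package.
    files_to_resolve: files whose license pairs with `declared` in
        NEEDS_RESOLUTION; these block promotion until resolved.
    """
    detected = set()
    to_resolve = []
    for path, lic in sorted(detected_by_file.items()):
        if lic == declared:
            continue  # file agrees with the package-level declaration
        detected.add(lic)
        if (declared, lic) in NEEDS_RESOLUTION:
            to_resolve.append(path)
    return sorted(detected), to_resolve


if __name__ == "__main__":
    scan = {  # hypothetical scanner output
        "src/main.c": "BSD-3-Clause",
        "src/util.c": "GPL-2.0",
        "doc/manual.txt": "GFDL-1.2",
    }
    print(review("BSD-3-Clause", scan))
```

With the sample input, GFDL-1.2 and GPL-2.0 are reported as detected licenses to document, and `src/util.c` as the file to resolve with the maintainer, mirroring the BSD/GPLv2 use case.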

Summary

This specification is for a process change to be applied before accepting new packages into main, using file-level scanning tools to check the license analysis.

Rationale

There is increasing awareness in the commercial supply chain of the need to ensure that the licenses under which code is released can be complied with [3]. There is a legacy of projects where the intent of the package maintainer does not match what the licenses in the contents say [1]. This step is a low-cost way to keep the problem from getting worse.

Scope and Use Cases

Scope at this time is limited to packages being added to main. We may later want to extend file-level scrutiny to updates of packages already in main, but the first step is to make accurate license documentation a condition of adding new packages.

Use Cases

Implementation Plan

  • installation of common tools that can provide file level scanning
  • update process requirements [4] and process [5] to make the license check explicit
  • file-level exceptions to the package license get documented in one of the emerging standard formats (DEP-5 or SPDX)
  • main archive approvers make the license check requirement explicit.

Implementation

Outstanding Issues

  • process documentation and existing tools confuse copyright and licensing in places.
  • is there an approved list of licenses for packages that can go into main?
  • FOSSology server setup and access, or other tools?
  • mechanism to list all the licenses present in a package (DEP-5 [8], SPDX [9], ??)
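For the last issue, a DEP-5 style debian/copyright can already be mined for the licenses it declares. A minimal sketch, assuming the paragraph layout DEP-5 was converging on (it was still a draft at the time); `parse_paragraphs` and `declared_licenses` are hypothetical helpers, not part of any existing tool.

```python
"""Sketch: list every license short name declared in a DEP-5 style
debian/copyright file.

Assumption: blank-line separated paragraphs of "Field: value" lines, with
continuation lines starting with whitespace. Only the first line of each
License field (the short name or expression) is collected; full license
texts are skipped. This is a simplified reader, not a validating parser.
"""


def parse_paragraphs(text):
    """Split the file into a list of {field: first-line value} dicts."""
    paragraphs, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line closes the current paragraph
                paragraphs.append(current)
                current = {}
        elif line[0] in " \t":
            continue  # continuation line (e.g. license text); ignored here
        elif ":" in line:
            field, _, value = line.partition(":")
            current[field.strip().lower()] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def declared_licenses(text):
    """Return the sorted, de-duplicated License short names in the file."""
    return sorted({p["license"] for p in parse_paragraphs(text)
                   if p.get("license")})
```

For example, `declared_licenses(open("debian/copyright").read())` would give the list of license names to record against the package; a copyright file that is not in DEP-5 form simply yields an empty list.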

BoF agenda and discussion

Taken from whiteboard on blueprint (pre-UDS)

  • Scott K - I'm somewhat confused about what's being proposed here? It is already a requirement that files in a package be licensed in a compatible way and that these licenses be documented in debian/copyright in order to be in the archive at all.
  • Kate - The package license is not always accurate, licenses at the file level can cause conflicts. This session is to discuss improvements to the review process, using open source tools to help clarify when there are ambiguities (and get resolution), before the package is accepted into main.
  • Laney - I don't understand why the `main' distinction is being made here. Isn't this relevant for the whole archive? It seems to be talking about standard source NEW review.
  • Kate - Main is what Ubuntu ships, so it was just a logical place to start. The issue certainly applies to the wider archive, but the initial thinking was to keep the scope of the process change at a level reasonable to try out.
  • Emmet - Actually, "Ubuntu" ships the entire archive: main, restricted, universe, and multiverse. Depending on flavour, these may be enabled by default, or you may be prompted with a message like "Would you like your media experience to work?" during install to enable (and install software from) these components. The archive admins are expected to use the existing automated tools to verify debian/copyright for *every* new package entering the archive (although, in practice, many inherited from Debian are not carefully checked: presuming the check to have been done by the Debian ftpmasters). It may be more interesting to look at the tools as something to potentially run against the entire archive to ensure that we do comply with the license guarantees we provide to users (for every component).
  • kyleN - The issue I think this blueprint raises is along these lines: what is our level of confidence that the debian/copyright file is indeed accurate for *all* files in the source package, for current Ubuntu source packages? Is it good enough? And should we undertake to create a higher level of confidence in debian/copyright accuracy by using new or improved tools and processes? If so, where to start: new packages in main? All packages in main? Some other scope? I would add that it may be useful to consider requiring new packages in Ubuntu to follow DEP-5 (even though it is not final yet), and if so, where to first require that (main?).

Notes from UDS

http://repo.fossology.org/?mod=nomoslicense&upload=86&show=detail&item=47437324

It's common in the FOSS world for licenses and files to be mismatched, and for top-level AUTHORS or debian/copyright files not to be fully accurate.

Proposal to check that debian/copyright is accurate during the main inclusion process. Should we also do this before any package even reaches the archive? If it's an automated tool, it's no harder to run on universe than on main (though there are false positives). Upstream isn't necessarily involved in the MIR process to help with any issues anyway.

Do we know how good the archive is right now? Not at all; we're not even sure how bad it is.

Tools available now include FOSSology, ninka (?) and license-check.

FOSSology is already set up for Debian... somewhere.

Sometimes the license stated in the source package is actually wrong and debian/copyright is right, because we've checked with upstream. Actually patching the source is a maintenance headache. DEP-5 needs a way to specify this, to avoid false positives. SPDX has a way around this.
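One way to think about the needed mechanism is as a list of reviewed overrides that the checker consults before reporting a mismatch. This is a hypothetical model, not DEP-5 or SPDX syntax: `reconcile` and its override records are invented for illustration.

```python
"""Sketch: suppress scanner false positives using reviewed overrides.

Hypothetical model: a scanner reports one license per file, and the
maintainer records files whose license was confirmed with upstream (so
debian/copyright is right even though the file header says otherwise).
"""


def reconcile(scanned, documented, overrides):
    """Return {path: (scanned, documented)} for unexplained mismatches.

    scanned:    {path: license reported by the scanner}
    documented: {path: license stated in debian/copyright}
    overrides:  {path: (scanned_license, confirmed_license)} recorded
                after checking with upstream
    """
    problems = {}
    for path, found in scanned.items():
        stated = documented.get(path)
        if found == stated:
            continue  # scanner and debian/copyright agree
        if overrides.get(path) == (found, stated):
            continue  # known header error, already confirmed upstream
        problems[path] = (found, stated)
    return problems
```

Recording both the scanned and the confirmed license means that if the file header changes again later, the mismatch is reported afresh instead of being silently suppressed.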

Merging SPDX and DEP-5? DEP-5 needs to be human-readable, but SPDX is moving to XML. Fedora and the Linux Foundation seem to be moving towards SPDX. As long as both are semantically compatible and machine-readable, we're fine.

SPDX isn't finished yet; a release candidate is expected before the end of the year.

Kyle has written a tool called get-licenses that iterates through copyright files and produces a spreadsheet of the licenses it found. It parses DEP-5 files, calls license-check, and will ping a FOSSology server. It looks at installed packages rather than at source packages, which is still useful for flavors and others that have publication requirements of this kind. Kyle plans to publish it as a Launchpad project.
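A rough illustration of the kind of survey described above, over installed packages. This is not Kyle's get-licenses; it is a hypothetical sketch that only reads `License:` lines from /usr/share/doc/<package>/copyright (where Debian policy installs each package's copyright file) and writes CSV, rather than calling license-check or a FOSSology server.

```python
"""Sketch: a get-licenses-style survey of installed packages.

Hypothetical illustration only (not the real get-licenses). It reads
<doc_root>/<package>/copyright, picks up DEP-5 style "License:" lines at
the start of a line, and emits one CSV row per (package, license).
"""
import csv
import sys
from pathlib import Path


def survey(doc_root="/usr/share/doc"):
    """Yield (package, license) pairs, one per distinct License: value."""
    for copyright_file in sorted(Path(doc_root).glob("*/copyright")):
        package = copyright_file.parent.name
        try:
            text = copyright_file.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file or dangling symlink; skip it
        seen = set()
        for line in text.splitlines():
            if line.startswith("License:"):
                lic = line[len("License:"):].strip()
                if lic and lic not in seen:
                    seen.add(lic)
                    yield package, lic


def write_csv(rows, out=sys.stdout):
    """Write the survey as a two-column spreadsheet-friendly CSV."""
    writer = csv.writer(out)
    writer.writerow(["package", "license"])
    writer.writerows(rows)
```

`write_csv(survey())` prints the table to standard output. Packages whose copyright file is not in DEP-5 form produce no rows, so they would need separate handling.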

Action Plan:

  • Get a list of license problems in packages
  • File bugs with each package in Ubuntu and/or Debian
  • Coordinate with Lucas to create a UDD (Ultimate Debian Database) report about it
  • When there is a conflict and debian/copyright is actually right, file an upstream bug
  • Have each flavor team start tackling their flavor bugs

Action items for next six months:

  • Focus on actually getting data about license problems (item 1 above)
    • Canonical Legal (Amanda) says she will get IS server resources
    • Set up FOSSology on a server, and document on the wiki both how to set one up for others and how users use it
      • The view for this should organize package problems by flavor set
    • DPL says he'll talk to Lucas
    • Get ninka packaged (Kate)
    • Kyle to publish get-licenses
    • Need a license tool to be runnable by developers, not just server-side
      • Add documentation to wiki about how to check source using such a tool (blocked until we have a tool to recommend)
      • Add a recommendation for running such a license tool to the NEW and MIR review process documentation
  • Need to figure out then how DEP-5 and SPDX will live together
  • Finish DEP-5 (2 to 3 more major revisions are expected -- globbing change for sure, probably hashes like SPDX)
    • This is arguably a blocker for us to start filing massive bugs about adopting DEP-5
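Because the DEP-5 globbing rules were still expected to change, here is a sketch of how a checker might map a file to its Files: paragraph. It assumes the rules DEP-5 later settled on in copyright-format 1.0: `*` and `?` are the only wildcards, both also match `/`, and when several paragraphs match, the last one wins. `glob_to_regex` and `license_for` are hypothetical helper names.

```python
"""Sketch: decide which Files: paragraph of a DEP-5 file governs a path.

Assumed rules (copyright-format 1.0): "*" matches any run of characters,
slashes included; "?" matches exactly one character; the last matching
paragraph wins. Backslash escapes are not handled in this simplified version.
"""
import re


def glob_to_regex(pattern):
    """Translate one Files: pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def license_for(path, stanzas):
    """stanzas: ordered list of (whitespace-separated Files patterns, license).

    Returns the license of the last matching paragraph, or None if no
    paragraph covers the path.
    """
    result = None
    for patterns, lic in stanzas:
        if any(glob_to_regex(p).fullmatch(path) for p in patterns.split()):
            result = lic  # later matches override earlier ones
    return result


STANZAS = [  # hypothetical example paragraphs
    ("*", "BSD-3-clause"),
    ("src/util.c src/compat/*", "GPL-2+"),
]
```

Python's `fnmatch` was deliberately not used because it also treats `[...]` as a character class, which DEP-5 patterns do not.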

References



LicenseReviewProcessImprovementSpec (last edited 2010-10-26 13:56:51 by host194)