Recommendations

Differences between revisions 2 and 3
Revision 2 as of 2011-10-29 02:16:42
Size: 8658
Editor: 71
Comment:
Revision 3 as of 2011-12-13 22:24:05
Size: 11656
Editor: mpt
Comment: + spec for display of recommendations
This is a living specification for recommendations in Ubuntu Software Center.

Recommendations are a way to suggest software that someone might be interested in. They involve three components:

  1. (1) Ubuntu Software Center itself

  2. (2) the recommendation service, an Internet server that receives lists of installed software, and stores and publishes recommendations.

  3. (3) the Ubuntu Single Sign-On service.

Opting in and out

recommendations-home.png

(1) By default, the “Recommended For You” box on USC’s home screen should contain only a “Turn On Recommendations” button, and the text: “To make recommendations, Ubuntu Software Center will send to Canonical a list of software currently installed.” The header should contain a “Hide” button that collapses “Recommended For You” to a bar, with the “Hide” button replaced by a “Show” button.

If you click “Turn On Recommendations”, the button and caption should fade out. If there is no Internet connection, they should be replaced by the faded-in text “Recommendations will appear when next online.” Otherwise (or when the computer is next online), they should be replaced by the faded-in, left-aligned text “Submitting inventory…” and a right-aligned progress bar, which fills its first 50% while USC submits the list of software currently installed to (2) the recommendation service, together with a UUID and (only if you're signed in) your SSO ID to link with your ratings.

Once the submission is finished, the progress text should change to “Receiving recommendations…”, while the remainder of the progress bar fills. Finally, the label and progress bar should fade out and be replaced by faded-in initial recommendations.

While submitting inventory, receiving, or showing recommendations, the “Recommended For You” section header should have the standard “More” button (disabled if there are no recommendations yet), and next to it a subtle “Turn Off” button. If you click “Turn Off”, any current and future submission should be cancelled, and the section should collapse to its header-only state. Clicking “Show” from this state should show the initial opt-in display.
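
The opt-in flow described above can be read as a small state machine with a single progress bar shared by two phases. The following is only a minimal sketch of that reading, not USC's implementation; the state names, event names, and function names are all assumptions made for illustration.

```python
from enum import Enum


class OptInState(Enum):
    # All state names are hypothetical labels for the displays described above.
    OPTED_OUT = "opted_out"          # shows the "Turn On Recommendations" button
    WAITING_FOR_NETWORK = "waiting"  # "Recommendations will appear when next online."
    SUBMITTING = "submitting"        # "Submitting inventory…", bar fills 0% to 50%
    RECEIVING = "receiving"          # "Receiving recommendations…", bar fills 50% to 100%
    SHOWING = "showing"              # initial recommendations are displayed


def overall_progress(state, phase_fraction):
    """Map progress within the current phase (0.0 to 1.0) onto the single bar."""
    phase_fraction = min(max(phase_fraction, 0.0), 1.0)
    if state is OptInState.SUBMITTING:
        return 0.5 * phase_fraction
    if state is OptInState.RECEIVING:
        return 0.5 + 0.5 * phase_fraction
    if state is OptInState.SHOWING:
        return 1.0
    return 0.0


def next_state(state, event, online=True):
    """Return the state that follows `event`; unknown events leave it unchanged."""
    if event == "turn_off":
        # "Turn Off" cancels any submission from any state.
        return OptInState.OPTED_OUT
    if state is OptInState.OPTED_OUT and event == "turn_on":
        return OptInState.SUBMITTING if online else OptInState.WAITING_FOR_NETWORK
    if state is OptInState.WAITING_FOR_NETWORK and event == "came_online":
        return OptInState.SUBMITTING
    if state is OptInState.SUBMITTING and event == "submitted":
        return OptInState.RECEIVING
    if state is OptInState.RECEIVING and event == "received":
        return OptInState.SHOWING
    return state
```

For example, `overall_progress(OptInState.RECEIVING, 0.5)` gives 0.75, which matches the "remainder of the progress bar" wording. The fades and the collapsed header-only display are deliberately left out of the sketch.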

Generating recommendations

(1) Except when opting in, all inventory submission and recommendation should happen in the background, with no progress display.

USC should send inventory updates, and request updated recommendations, when:

  • you install or remove anything
  • you rate anything
    • We may do this on the server side: a new review being entered could trigger recalculation of the recommendations there. Whether that is possible depends on whether the recommendation service runs on the same server as rnrserver.

  • at least a week has passed (in case anything hot has been released since then)
    • We can use the HTTP ETag to ask for changes more often and leave it to the server to set the cache policy. This gives us more flexibility in the future.

  • the cache is missing or unparseable.
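
The triggers listed above could be checked with a single predicate. This is a minimal sketch under stated assumptions: the function name, its parameters, and the way USC would track "changed since the last request" are all hypothetical; only the one-week interval and the four conditions come from the list.

```python
from datetime import datetime, timedelta

# One week, as stated in the list above.
MAX_AGE = timedelta(weeks=1)


def needs_refresh(last_request, now, inventory_changed, rated_something, cache_ok):
    """Decide whether to send an inventory update and request new recommendations.

    last_request: datetime of the previous request, or None if there was none.
    inventory_changed: something was installed or removed since then.
    rated_something: something was rated since then.
    cache_ok: the local cache exists and could be parsed.
    """
    if not cache_ok:
        return True
    if inventory_changed or rated_something:
        return True
    if last_request is None:
        return True
    return now - last_request >= MAX_AGE
```

If the server-side ETag approach mentioned in the list were adopted, the weekly check could become a cheaper, more frequent conditional request, and `MAX_AGE` would effectively be set by the server.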

(2) The server should use a recommender algorithm to identify the ~50 packages you don't have installed that you're most likely to rate as excellent. http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext

It seems like we should consider making it easier for the user to express "I like/dislike this app" without having to write a full review, as we would benefit from the data.

  • Having multiple rating mechanisms would be too confusing, but we should have an “I’m not interested” for things already recommended. —mpt

This apparently mixes "generation of recommendations" and "storing the app list" into a single task. I think it's easier to discuss them as two separate tasks, especially if we consider reusing popcon for parts of it.

Given that the recommendations also depend on what other users do, we should probably regenerate them periodically even if your system does not change. How this needs to be cached will depend on the complexity of the job. This is something that we need to discuss with ISD and the people implementing this on the server side.

Learning association rules[1] using installed package sets (or "good review" sets) as carts is relatively easy and will probably produce considerably more effective recommendations. [1] http://en.wikipedia.org/wiki/Association_rule_learning
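
To make the association-rule suggestion concrete, here is a minimal sketch that treats each participant's installed package set as a cart and learns only pairwise rules of the form "has A, so probably wants B". It is an illustration, not the algorithm the server is specified to use; the function names and the `min_support` and `min_confidence` defaults are assumptions.

```python
from collections import Counter
from itertools import permutations


def pair_rules(carts, min_support=2, min_confidence=0.5):
    """Return {(a, b): confidence} for rules "has a => also has b".

    confidence(a => b) = (carts containing both a and b) / (carts containing a)
    """
    item_counts = Counter()
    pair_counts = Counter()
    for cart in carts:
        items = set(cart)
        item_counts.update(items)
        # Every ordered pair (a, b) with a != b that occurs in this cart.
        pair_counts.update(permutations(sorted(items), 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        confidence = together / item_counts[a]
        if together >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def recommend(installed, rules):
    """Rank packages that are not installed by their best rule confidence."""
    scores = {}
    for (a, b), confidence in rules.items():
        if a in installed and b not in installed:
            scores[b] = max(scores.get(b, 0.0), confidence)
    # Highest confidence first; ties broken by package name for stable output.
    return sorted(scores, key=lambda pkg: (-scores[pkg], pkg))
```

A side effect of this approach is that each recommendation carries the package that triggered it, which is the kind of "single strongest factor" the display section asks for. Counting all pairs is quadratic in cart size, so a real data set would likely need a proper frequent-itemset method instead.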

What information would be used to generate this UUID? This would determine if your UUID changes from time to time, from one install to the next, or from one of your devices to another. If we can tell what *user* a package-set belongs to (even if we keep many different ones for each user based on a UUID) it would enable us to link that data to social data if we ever find some. I see the advantages of not requiring a user to log in, though.

If we do link data to an SSO account, it would include the hostname or some other device identifier. If we keep all versions provided, keyed by device ID/UUID, we may be able to do some very interesting recommendations at some point, like "apps you're likely to install in the future based on your currently installed apps", which would look at the usual trend of app installation per device.
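
The question of what the UUID is generated from largely decides how stable it is. As a rough sketch of the two options discussed here, using only Python's standard `uuid` module: the namespace name, the function names, and the idea of feeding in a machine identifier and local user name are assumptions for illustration, not decisions made by this specification.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID would serve the same purpose.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "recommender.example")


def random_install_id():
    """Random identifier.

    It has to be stored locally, so it changes on reinstall and differs
    between devices and between local user accounts.
    """
    return str(uuid.uuid4())


def derived_id(machine_id, username=""):
    """Deterministic identifier derived from a machine id and optional user name.

    The same inputs always give the same value, so it would survive a cache
    wipe, and adding the user name separates people sharing one computer.
    """
    return str(uuid.uuid5(NAMESPACE, f"{machine_id}/{username}"))
```

The trade-off is worth noting: a derived identifier is only as anonymous as its inputs are hard to guess, whereas a random one reveals nothing but cannot be reconstructed after a reinstall.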

Storage

The server stores the list of each participant's installed packages, and a cache of the recommendations generated for them.

Serving

When sent a request containing the UUID, the server returns a JSON list of packages representing the recommendations for that UUID.

There will be a REST API call that involves the UUID and that will return the recommendations in some format that s-c can understand.
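
Since the API is not yet defined, the following is only a sketch of what the client side of such a call might look like, combining the UUID-based request with the ETag idea mentioned under "Generating recommendations". The base URL and URL layout are invented, and the response is assumed to be a flat JSON list of package names; the specification only says "a JSON list of packages".

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL scheme has not been specified.
BASE_URL = "https://recommender.example/api/1.0/recommendations"


def build_request(uuid_str, etag=None):
    """Build a GET request for one UUID, conditional if a cached ETag is known."""
    request = urllib.request.Request(f"{BASE_URL}/{uuid_str}/")
    request.add_header("Accept", "application/json")
    if etag:
        # Lets the server answer "304 Not Modified" when nothing has changed.
        request.add_header("If-None-Match", etag)
    return request


def parse_recommendations(body):
    """Parse a response body, assumed to be a JSON list of package names."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of packages")
    return [str(name) for name in data]
```

Sending the request with `urllib.request.urlopen` is left out on purpose. Note that `urlopen` reports a 304 response by raising `urllib.error.HTTPError`, so a caller would treat that case as "keep using the cache".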

Displaying

On the home screen

(1) In the “Recommended For You” section of the home screen, items should be shown using a standard software tile view.

The “More” button in the section header should navigate to a separate “Recommendations” screen that lists either the top 60 recommendations that are not installed, or those with a confidence value of X or higher, whichever is less.
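
One way to read the "whichever is less" rule is to build both candidate lists and show the shorter one. The sketch below assumes recommendations arrive as (package, confidence) pairs; the function name and that data shape are assumptions, and the threshold X is left as a parameter because the specification does not fix its value.

```python
def recommendations_for_more_screen(recommendations, installed, threshold, limit=60):
    """Pick the items for the "Recommendations" screen.

    recommendations: iterable of (package, confidence) pairs.
    installed: set of package names already installed.
    threshold: the confidence value X (not fixed by the specification).
    """
    candidates = sorted(
        ((pkg, conf) for pkg, conf in recommendations if pkg not in installed),
        key=lambda item: item[1],
        reverse=True,
    )
    top = candidates[:limit]
    confident = [item for item in candidates if item[1] >= threshold]
    # Both lists are prefixes of the same ranking, so show the shorter one.
    return top if len(top) <= len(confident) else confident
```

For instance, with three uninstalled candidates of which two meet the threshold, the screen would list those two; with a hundred candidates above the threshold, it would list the top 60.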

On the screen for a recommended item, below the description should be an extra box that describes (2) the single strongest factor in the recommendation, and gives you the option to either remove that recommendation or nullify that factor. For example:

Recommended for you because you installed “Filezilla” and others.

I’m not interested in BleachBit

Ignore Filezilla for recommendations

Selecting either of these should result in the explanation box fading out.

If you have ignored an item for recommendations, a similar box should appear:

You have chosen to ignore Filezilla when getting recommendations.

Stop ignoring Filezilla

When this happens, the "my-installed-apps" list needs to be updated on the server and the recalculation of the recommendations needs to be triggered. Depending on how long this takes, we may need to poll the server. But this needs discussion with ISD, as it will depend on the implementation.

On software item screens

On the software item screen for an item you have installed, below the description and “Add-ons” section (if any) should be a “People Also Installed” section, with a tile view of three recommendations based solely on that item.

Caching

  • USC caches the list of recommendations.

Fallback

  • If USC can't contact the server, it displays the cached recommendations. If no cache is available, it either tells you to connect to the Internet or to try again later, depending.
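
The caching and fallback behavior could be sketched as follows. This is an assumption-laden illustration: the cache is taken to be a JSON file holding a list, `fetch` stands in for whatever call contacts the server, and the message text is a placeholder, since the specification does not give the exact wording.

```python
import json


def load_cache(path):
    """Return the cached list, or None if the file is missing or unparseable."""
    try:
        with open(path, encoding="utf-8") as cache_file:
            data = json.load(cache_file)
    except (OSError, ValueError):
        # OSError: missing or unreadable file; ValueError: invalid JSON.
        return None
    return data if isinstance(data, list) else None


def recommendations_or_fallback(fetch, cache_path):
    """Return ("fresh" | "cached" | "message", payload).

    fetch: hypothetical callable that returns a fresh list of recommendations
    and raises OSError (urllib.error.URLError is a subclass) when the server
    cannot be reached.
    """
    try:
        fresh = fetch()
    except OSError:
        cached = load_cache(cache_path)
        if cached is None:
            # Placeholder wording, not taken from the specification.
            return ("message", "Connect to the Internet or try again later.")
        return ("cached", cached)
    with open(cache_path, "w", encoding="utf-8") as cache_file:
        json.dump(fresh, cache_file)
    return ("fresh", fresh)
```

Treating a corrupt cache the same as a missing one lines up with the "cache is missing or unparseable" refresh trigger in the "Generating recommendations" section.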

Unresolved issues

  • How do we cater for people whose computer is used by multiple people? Should we add the local username to the UUID to ensure it's unique (when some or all of the users don't have an SSO account)?
    • It's an interesting question what people would expect here. If I have a dedicated game machine and a productivity machine, then we should have two different sets of recommendations. If, on the other hand, I have a laptop and a desktop that I use for the same things, the recommendations should be the same. Hopefully the system can work it out from the context.
  • Does the algorithm take software ratings into account as well as whether it is installed? Is it less effective if someone has never rated software themselves (i.e. users without an SSO account)?
  • What if someone reinstalls Ubuntu?
    • We should probably do a periodic "ping" with the UUID (even if the system does not install or remove software, a ping tells the server that it's still in use) so that we can remove no-longer-valid UUIDs over time.
  • One interesting point though is bootstrapping the dataset, that is, what recommendations to serve until we have a reasonable amount of data on the server. In the case of recommendations based on reviews we already have a decent amount of reviews up there to start review-based recommendations. For recommendations based on installed packages otoh, we'd need to start receiving data for a while before we can start making useful recommendations.

Data we can use

The current data about other people we have is:

  • what all other people have installed (new recommender service/popcon)
  • what all other people are using (zeitgeist/new recommender service/popcon)
  • what specific apps other people like or dislike (rnr)

The data we have about the user's system is:

  • what apps the user has installed
  • what apps the user is using (popcon/zeitgeist)
  • what mimetypes the user is working with (zeitgeist)
  • *maybe* the SSO ID of the user
  • *maybe* what apps the user likes (based on his/her reviews)
  • the user's contacts

There is a certain overlap with popcon, so we should consider reusing parts of it, and parts of the raw data we have, in this new system.

Our review-based data will be relatively small because people have to write a full review in order to "rate" an app. Having a lower threshold here, in the form of just "like/dislike this app" or "1-5 stars" (without a review), would generate more data that we could use for the purpose of good recommendations.

Contacts are an interesting idea. Some challenges:

  • Diversity of the contacts. I have in my contacts my family, my friends, my co-workers, and more people I know but don't interact much with. Their interests and computer habits are very diverse; I really wonder if that will give me anything better than recommendations based on the whole s-c user population. We could use "friends" or "favorite contacts" instead (which is also not quite right, but probably closer).
  • Privacy. We need to be careful with this feature: if a user has only very few contacts, this could be used to gather data about their installed apps. We either need to make this opt-in or be very careful about leaking information. The nature of the data is not that sensitive, so we may well be fine, but we need to take it into consideration.
  • Technical. The server will have to know the user's contacts (Ubuntu One, or uploading when the feature is activated), and the server will have to match Ubuntu SSO IDs to the app list of the given user. This will exclude users without an Ubuntu SSO account.

SoftwareCenter/Recommendations (last edited 2012-07-09 09:57:05 by mpt)