This is a living specification for recommendations in Ubuntu Software Center.
Recommendations are a way to suggest software that someone might be interested in. They involve three components:
Ubuntu Software Center itself
the recommendation service, an Internet server that receives lists of installed software and generates, stores, and publishes recommendations.
the Ubuntu Single Sign-On service.
Opting in and out
By default, the “Recommended For You” box on USC’s home screen should contain only a “Turn On Recommendations” button and the disclosure text: “To make recommendations, Ubuntu Software Center will occasionally send to Canonical an anonymous list of software currently installed.” The header should contain a “Hide” button that collapses “Recommended For You” to a bar, with the “Hide” button replaced by a “Show” button.
If you click “Turn On Recommendations” (or choose “View” > “Turn On Recommendations…” and confirm the same disclosure text in an alert), the button and caption should fade out. If there is no Internet connection, they should be replaced by the faded-in text “Recommendations will appear when next online.” Otherwise (or when the computer is next online), they should be replaced by the faded-in left-aligned text “Submitting inventory…” and a right-aligned progress bar, which fills its first 50% while USC submits the list of software currently installed to the recommendation service, together with a UUID and (only if you’re already signed in) your SSO ID to link with your ratings.
Once the submission is finished, the progress text should change to “Receiving recommendations…”, while the remainder of the progress bar fills. Finally, the label and progress bar should fade out and be replaced by faded-in initial recommendations, as the box enlarges to accommodate them.
While submitting inventory, receiving, or showing recommendations, the “Recommended For You” section header should have the standard “More” button (disabled if there are no recommendations yet), and next to it a subtle “Turn Off” button. If you click “Turn Off” (or choose “Turn Off Recommendations” in the “View” menu), any current and future submission should be cancelled, and the section should collapse to its header-only state. Clicking “Show” from this state should show the initial opt-in display.
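The inventory submission described above could carry a body along the following lines. This is only a sketch: the field names `uuid`, `installed`, and `sso_id`, and the function name `build_inventory_payload`, are assumptions made for illustration, since this specification does not define the wire format.

```python
import json
import uuid


def build_inventory_payload(installed_packages, client_uuid, sso_id=None):
    """Return a JSON body for an inventory submission (hypothetical format)."""
    payload = {
        "uuid": client_uuid,
        # Sorted and de-duplicated so identical inventories serialise identically.
        "installed": sorted(set(installed_packages)),
    }
    # The SSO ID is sent only if the user is already signed in.
    if sso_id is not None:
        payload["sso_id"] = sso_id
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    print(build_inventory_payload(["gimp", "filezilla"], str(uuid.uuid4())))
```

Leaving `sso_id` out entirely when the user is not signed in keeps the anonymous case free of any account identifier.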
Generating recommendations
Except when opting in, all inventory submission and recommendation should happen in the background, with no progress display.
USC should send inventory updates, and request updated recommendations, when:
- you install or remove anything
- you rate anything
This is something we may do on the server side: a new review being entered could trigger recalculation of the recommendations there. Of course, it depends on whether this runs on the same server as rnrserver.
- at least a week has passed (in case anything hot has been released since then)
We can use the HTTP ETag to ask for changes more often and leave it to the server to set the cache policy. This gives us more flexibility in the future.
- the cache is missing or unparseable.
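The triggers listed above can be sketched as a single predicate. The function name `needs_refresh` and its parameters are hypothetical, and the one-week threshold is taken directly from the list; a real client might instead let the server decide, as the ETag comment suggests.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(weeks=1)  # "at least a week has passed"


def needs_refresh(cache_ok, last_refresh, now, inventory_changed=False, rated=False):
    """Return True if any of the listed triggers applies."""
    if not cache_ok or last_refresh is None:
        # The cache is missing or unparseable, or we have never refreshed.
        return True
    if inventory_changed or rated:
        # Something was installed, removed, or rated.
        return True
    return now - last_refresh >= MAX_AGE


if __name__ == "__main__":
    now = datetime.now()
    print(needs_refresh(True, now - timedelta(days=8), now))
```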
The server should use a recommender algorithm to identify the ~50 packages you don't have installed that you're most likely to rate as excellent. http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext
It seems we should consider making it easier for the user to express "I like/dislike this app" without having to write a full review, as we would benefit from the data.
Having multiple rating mechanisms would be too confusing, but we should have an “I’m not interested” for things already recommended. —mpt
This apparently mixes "generation of recommends" and "storing the app list" into a single task. I think it's easier to discuss them as two separate tasks, especially if we consider reusing popcon for parts of it.
Given that the recommendations also depend on other users' data, we should probably regenerate them periodically even if your system does not change. How this needs to be cached will depend on the complexity of the job. This is something that we need to discuss with ISD and the people implementing this on the server side.
Learning association rules[1] using installed package sets (or "good review" sets) as carts is relatively easy and will probably produce considerably more effective recommendations. http://en.wikipedia.org/wiki/Association_rule_learning
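To make the association-rule suggestion concrete, here is a minimal sketch that treats each participant's installed package set as a cart and derives pairwise rules of the form "has A, so probably wants B". The function name and the two thresholds are assumptions for illustration; a production recommender would more likely use a full algorithm such as Apriori over larger item sets.

```python
from collections import Counter
from itertools import permutations


def pairwise_rules(carts, min_support=0.5, min_confidence=0.6):
    """Return {(a, b): confidence} for rules "has a => has b"."""
    n = len(carts)
    if n == 0:
        return {}
    item_count = Counter()
    pair_count = Counter()
    for cart in carts:
        items = set(cart)
        item_count.update(items)
        # Count every ordered pair (a, b) that occurs together in this cart.
        pair_count.update(permutations(sorted(items), 2))
    rules = {}
    for (a, b), both in pair_count.items():
        support = both / n                # share of carts containing a and b
        confidence = both / item_count[a]  # share of a's carts that also have b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


if __name__ == "__main__":
    carts = [{"gimp", "inkscape"}, {"gimp", "inkscape", "vlc"}, {"gimp"}, {"vlc"}]
    for rule, confidence in sorted(pairwise_rules(carts).items()):
        print(rule, round(confidence, 2))
```

Rules whose left-hand side is a single installed item would also be a natural input for a per-item section such as “People Also Installed”.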
What information would be used to generate this UUID? This would determine if your UUID changes from time to time, from one install to the next, or from one of your devices to another. If we can tell what *user* a package-set belongs to (even if we keep many different ones for each user based on a UUID) it would enable us to link that data to social data if we ever find some. I see the advantages of not requiring a user to log in, though.
If we do link data to an SSO account, it would include the hostname or some other device identifier. If we keep all versions provided by device ID/UUID, we may be able to do some very interesting recommendations at some point, like "apps you're likely to install in the future based on your current installed apps", which looks at the usual trend of app installation per device.
Storage
The server stores the list of each participant's installed packages, and a cache of the recommendations generated for them.
Serving
When sent a request containing the UUID, the server returns a JSON list of packages representing the recommendations for that UUID.
There will be a REST API call that involves the UUID and that will return the recommendations in some format that s-c can understand.
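A client-side sketch of handling that response, combined with the ETag idea raised under “Generating recommendations”, might look like this. The function names are hypothetical and the response shape (a bare JSON list of package names) is an assumption; `If-None-Match` and the 304 status are standard HTTP, but nothing here is confirmed as the service's actual API.

```python
import json


def conditional_headers(cached_etag):
    """Headers for a conditional GET; empty if nothing has been cached yet."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def parse_recommendations(status, body, cached):
    """Return the package list for an HTTP response.

    304 means the cached list is still current; 200 is assumed to carry
    a JSON list of package names.
    """
    if status == 304:
        return cached
    if status != 200:
        raise ValueError("unexpected status %d" % status)
    packages = json.loads(body)
    if not isinstance(packages, list):
        raise ValueError("expected a JSON list")
    return packages


if __name__ == "__main__":
    print(conditional_headers('"v1"'))
    print(parse_recommendations(200, '["gimp", "vlc"]', []))
```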
Displaying
The full set of recommendations should be, out of all software not currently installed (bug 1009967), either the strongest 60 recommendations or all those with a confidence value of X or higher — whichever is fewer.
On the home screen
In the “Recommended For You” section of the home screen, up to 12 strongest recommendations from the full set should be shown using a standard software tile view, sorted strongest first.
The “More” button in the section header should navigate to a separate “Recommendations” screen listing the full set of recommendations, again sorted strongest first.
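The selection rules above can be sketched as follows. The input format (pairs of package name and confidence) and the function names are assumptions; `min_confidence` stands in for the value X, which this specification leaves open.

```python
def full_recommendation_set(scored, installed, min_confidence, limit=60):
    """Return package names, strongest first.

    Keeps only packages that are not installed and whose confidence is at
    least min_confidence, then truncates to `limit`. That is the smaller of
    "the strongest 60" and "all those with confidence X or higher".
    """
    candidates = [
        (package, confidence)
        for package, confidence in scored
        if package not in installed and confidence >= min_confidence
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [package for package, _ in candidates[:limit]]


def home_screen_tiles(full_set, count=12):
    """The home screen shows up to 12 of the strongest recommendations."""
    return full_set[:count]


if __name__ == "__main__":
    scored = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.95)]
    print(full_recommendation_set(scored, {"d"}, 0.5))
```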
On category screens
In a category screen (but not a subcategory screen), if you have opted in to recommendations and there are any recommendations for you in that category and the category uses a software tile view, there should be a “Recommended For You in {Name of Category}” box listing up to 12 of those recommendations.
In a category that has subcategories, the recommendations box should follow the list of subcategories. In a category that does not have subcategories, the recommendations box should come first, followed by an “All Items” box.
On software item screens
On the software item screen for an item you have installed, below the description and “Add-ons” section (if any) should be a “People Also Installed” section, with a tile view of three recommendations based solely on that item.
Customizing
You may customize recommendations implicitly, by installing a recommended item then keeping the software installed; or explicitly, by expressing disagreement interactively.
On the screen for a recommended item, below the description should be an extra box that describes the single strongest factor in the recommendation, and gives you the option to either remove that recommendation or nullify that factor. For example:
Recommended for you because you installed “Filezilla” and others.
Selecting either of these should result in the explanation box fading out.
If you have ignored an item for recommendations, a similar box should appear:
You have chosen to ignore Filezilla when getting recommendations.
Finally, below the “Turn Off Recommendations” menu item should be a “Reset Recommendations…” item, which is active whenever recommendations are turned on. Activating it should display a “Reset Recommendations” dialog. If you have not made any customizations, it should have primary text “Recommendations are based on your installed software and any ratings.”, secondary text “You haven’t made any exceptions to these recommendations.”, and an “OK” button.
If you have made customizations, the dialog should instead have primary text “Recommendations are based on your installed software and any ratings, with these exceptions:”, followed by a listbox containing checkbox items for each “Don’t recommend” or “Don’t use for recommendations” exception you have set. The “Reset All” and “Cancel” buttons should always be sensitive, but “Reset” should be sensitive only when at least one exception is checked.
When this happens, the "my-installed-apps" list needs to be updated on the server and recalculation of the recommendations needs to be triggered. Depending on how long this takes, we may need to poll the server. This needs discussion with ISD, as it will depend on the implementation.
Caching
- USC caches the list of recommendations.
Fallback
- If USC can't contact the server, it displays the cached recommendations. If no cache is available, it tells you either to connect to the Internet or to try again later, depending on whether the computer is offline or the server is unavailable.
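The fallback behaviour can be sketched as below. The function name, the use of `OSError` to signal a failed request, and the message wording are all assumptions for illustration. So is the rule for choosing between the two messages: here the "connect to the Internet" message is shown when the computer is offline, and "try again later" otherwise.

```python
def recommendations_to_display(fetch, cache, online):
    """Return (packages, message); message is None when there is a list to show.

    fetch  -- callable returning a list of packages, raising OSError on failure
    cache  -- previously cached list of packages, or None if no cache exists
    online -- whether the computer currently has an Internet connection
    """
    try:
        return fetch(), None
    except OSError:
        if cache is not None:
            # Server unreachable: fall back to the cached recommendations.
            return cache, None
        if not online:
            return [], "Connect to the Internet to get recommendations."
        return [], "Recommendations are unavailable. Try again later."


if __name__ == "__main__":
    def failing_fetch():
        raise OSError("server unreachable")

    print(recommendations_to_display(failing_fetch, ["vlc"], online=True))
```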
Unresolved issues
- How do we cater for computers that are used by multiple people? Should we add the local username to the UUID to ensure it's unique (when some or all of the users don't have an SSO account)?
- It's an interesting question what people would expect here. If I have a dedicated game machine and a productivity machine, then we should have two different sets of recommendations. If, on the other hand, I have a laptop and a desktop that I use for the same things, the recommendations should be the same. Hopefully the system can work it out from the context.
- Does the algorithm take software ratings into account as well as whether it is installed? Is it less effective if someone has never rated software themselves (i.e. users without an SSO account)?
- What if someone reinstalls Ubuntu?
- So we should probably do a periodic "ping" with the UUID (even if the system does not install or remove software, a ping to tell the server that it's still in use), so that we can remove no-longer-valid UUIDs over time.
- One interesting point though is bootstrapping the dataset, that is, what recommendations to serve until we have a reasonable amount of data on the server. In the case of recommendations based on reviews, we already have a decent number of reviews up there to start review-based recommendations. For recommendations based on installed packages, on the other hand, we'd need to receive data for a while before we can start making useful recommendations.
Data we can use
The current data about other people we have is:
- what all other people have installed (new recommender service/popcon)
- what all other people are using (zeitgeist/new recommender service/popcon)
- what specific apps other people like or dislike (rnr)
The data we have about the user's system is:
- what apps the user has installed
- what apps the user is using (popcon/zeitgeist)
- what mimetypes the user is working with (zeitgeist)
- *maybe* the SSO ID of the user
- *maybe* what apps the user likes (based on his/her reviews)
- the user's contacts
There is a certain overlap with popcon, so we should consider reusing parts of it and parts of the raw data we have into this new system.
Our review-based data will be relatively small because people have to write a full review in order to "rate" an app. Having a lower threshold here in the form of just "like/dislike this app" or "1-5 stars" (without a review) would generate more data that we could use for the purpose of good recommendations.
Contacts are an interesting idea. Some challenges:
- Diversity of the contacts. I have in my contacts my family, my friends and my co-workers, and more people I know but don't interact much with. Their interests and computer habits are very diverse; I really wonder if that will give me anything better than recommendations based on the whole s-c user population. We could use "friends" or "favorite contacts" instead (which is also not quite right but probably closer).
- Privacy. We need to be careful with this feature: if a user has only very few contacts, this could be used to gather data about their installed apps. We either need to make this opt-in or be very careful about leaking information. The nature of the data is not that sensitive so we may well be fine, but we need to take it into consideration.
- Technical: the server will have to know the user's contacts (ubuntuone or uploading when the feature is activated) and the server will have to match Ubuntu SSO IDs to the applist of the given user. This will exclude users without an Ubuntu SSO account.