
Survey of Recommendation

Systems and Algorithms

Term Paper for


Spring 2000


Yuan Qu

Xiaoyun Yang

Tianping Huang

May 5, 2000

Table of Contents

I. Introduction

II. Recommendation Systems

III. Algorithms on Collaborative Filtering

IV. Discussion

V. References


I. Introduction

In our daily life, we make most of our choices by relying on recommendations from other people, whether by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys. In this information age, tons of news items are published through the Internet every day. This leads to a clear demand for automated methods that locate and retrieve information with respect to users' individual interests. The growing number of people accessing the Internet also provides new possibilities to organize and recommend information.

Recommendation systems can assist and augment this natural social process. These systems can recommend items a user may want based on what he or she has wanted in the past. The main purpose of recommendation systems is to provide tools for people to leverage the information hunting and gathering activities of other people or groups of people. Recommendation systems have been an important application area and the focus of considerable recent academic and commercial interest.

Recommendation systems are basically divided into two categories. One is called content-based filtering; the other is collaborative filtering (or social filtering). In a content-based filtering system, each user is assumed to operate independently. As a result, document representations in content-based filtering systems can exploit only information that can be derived from document contents. In a collaborative filtering system, the representation of a document is based on evaluations of that document made by its prior readers. Communities of shared interest can be automatically identified by exchanging this sort of information. In practice, a collaborative filtering system provides a basis for selecting information items regardless of whether their content can be represented in a way that is useful for selection. In this paper, the focus will be on collaborative filtering.

Collaborative filtering was presented by the developers of the first recommendation system, Tapestry, in 1992 [Goldberg, et al. 1992]. Several years later, the concept of collaborative filtering had already been applied in dozens of publicly available systems, several proprietary systems, and even some commercially available systems. In 1996, dozens of researchers from academia and industry gathered at UC-Berkeley to share their ideas and experiences with these emerging filtering methods [Collaborative Filtering workshop, 1996]. They presented the vision and definition of collaborative filtering and described some applications of the technique. Since then, more and more published articles have demonstrated applications of collaborative filtering methods.

In this paper, a survey is made of the recommendation systems available on the Internet. Then the characteristics of each recommendation system are described. Last, the algorithms of some well-known recommendation systems are introduced in detail.

II. Recommendation Systems

There are many recommendation systems on the Web. According to the purposes of their application, recommendation systems can be classified into three categories [Resnick, 1997], shown in Figure 1.


                     recommendation systems

  movies or music       news or articles       web pages
  ---------------       ----------------       ---------
  EachMovie             Tapestry               Phoaks
  Morse                 GroupLens              GAB
  Firefly               Lotus Notes            Fab
  ...                   ...                    ...

          Figure 1. The recommendation systems' categories

The systems in the first category are used for recommending movies, music, videos, or other services. In this category, the database is relatively stable; like a population database, it may not change for years. Typical systems include EachMovie, Firefly, and Morse. The second category is used for news or articles in a newsgroup. The users in a newsgroup generally have similar goals or interests. The database is also relatively stable; it may be updated over weeks or a shorter time. Representatives of these systems are Tapestry, GroupLens, and Lotus Notes. The last category is for web page recommendation. The information in this category is dynamic; that means a new page can be added to or deleted from the system at any time. At the same time, the users may have different tastes. Phoaks, GAB, and Fab are the most useful systems of this kind.

A brief introduction to each recommendation system is given as follows:



Do-I-Care

When a user revisits a favorite Web page, the Do-I-Care system [Turnbull, 1998; Collaborative Filtering workshop, 1996] alerts the user when this Web page has changed. The system uses a model-based algorithm built on Bayesian classifier technology. After some users have trained the model many times, other users can get good predictions.

According to the report from Mark Ackerman (U. of California-Irvine) [Collaborative Filtering workshop, 1996], the accuracy of Do-I-Care can reach 70-90%. It is said that the accuracy of the system reaches 100% in tracking airline fare sales.



Fab

In a collaborative filtering system, when a new item or new user enters the system, the system has no clue how to calculate the similarity between users, and it has no way to consider the new item unless some users have rated or recommended it. This problem is called the cold-start problem. Content-based filtering has no such problem. To eliminate this problem, the Fab recommendation system [Turnbull, 1998] combines both collaborative and content-based filtering.

The Fab system is a web-based recommendation service that incorporates both collaborative and content-based filtering methods. Users' profiles are constructed as a collection of keywords contained in the documents that each user rates highly.


Documents are presented for rating when either the content of the document matches previous documents that were rated highly, or neighboring users rate a document highly. Every time a favorable or unfavorable rating is received, the user's profile is updated to reflect the new rating.

Collection agents are sent out over the web to look for documents with specific content, each agent using a different set of keywords. After retrieving the documents, the agents pass them to a central server, where a selection agent matched to each user's profile scours through the documents looking for interesting material. Relevant documents are then presented to the user for rating. This rating dynamically affects the selection agent's behavior and changes the user's profile. The rating also affects the collection agent that retrieved the document. Unpopular collection agents are removed and replaced with more successful ones over time.

The Fab system combines the best features of both content-based and collaborative filtering methods and also manages to keep the system dynamically updated to the current users' tastes. One potential shortcoming is Fab's reliance on explicit user ratings.


Firefly

This system [Turnbull, 1997 and 1998] provides recommendations based on similarities between users. At the beginning, the system was used for music and movie recommendation; it has since been extended to other media, such as newsgroups, books, and web pages.

The system uses users' profiles as input and uses the constrained Pearson algorithm to make predictions. The basic idea of the algorithm is: a) the system maintains a user profile, which includes "like or dislike" ratings of specific items; b) the system compares the similarities of users and decides to which group of users the active user belongs; and c) it gives a recommendation according to similar users' profiles.
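As a concrete illustration, the constrained Pearson correlation (as described in the Ringo/Firefly literature) measures deviations from the midpoint of the rating scale rather than from each user's mean, so only ratings on the same side of the midpoint count as agreement. A minimal sketch in Python, with made-up song ratings on a hypothetical 1-7 scale:

```python
# Sketch of the constrained Pearson correlation; the data and the
# 1-7 scale (midpoint 4) are illustrative assumptions.
from math import sqrt

def constrained_pearson(ratings_a, ratings_b, midpoint=4.0):
    """Similarity between two users over their commonly rated items.

    ratings_a, ratings_b: dicts mapping item -> rating.
    """
    common = set(ratings_a) & set(ratings_b)
    num = sum((ratings_a[j] - midpoint) * (ratings_b[j] - midpoint) for j in common)
    den_a = sum((ratings_a[j] - midpoint) ** 2 for j in common)
    den_b = sum((ratings_b[j] - midpoint) ** 2 for j in common)
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / sqrt(den_a * den_b)

alice = {"song1": 7, "song2": 1, "song3": 6}
bob   = {"song1": 6, "song2": 2, "song4": 5}
w = constrained_pearson(alice, bob)  # agree on both common songs -> w = 1.0
```

Because both users sit on the same side of the midpoint for every common song, the sketch reports perfect agreement.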


GAB

GAB [Wittenburg, et al., 1998] stands for Group Asynchronous Browsing. The idea of the GAB system is that the system collects and merges users' bookmark and hotlist files and then serves these files back to users. That means the system has the ability to reach a user's bookmarks and extract information, which raises privacy concerns. To overcome the privacy problem, the system provides a mechanism that lets a user mark his/her bookmarks as "private" or "public".

The system uses a multi-tree data structure for the bookmarks. To avoid getting lost in hyperspace and to increase the connectivity in the merged subject tree database, the system defines sibling and cousin relations. A sibling relation between items A and B means that A and B belong to the same specific subject, while a cousin relation means that A and B belong to the same broad subject but not the same specific subject. The system has also been applied to monitoring changes in the content of web pages.



Grassroots

The Grassroots system [Turnbull, 1998] is described as "A System Providing A Uniform Framework for Communicating, Structuring, Sharing Information, and Organizing People".

This system provides a special Web page interface to access all of the information it works with. In practice, Grassroots also lets participants continue using other mechanisms, and takes as much advantage of them as possible. The main engine in the Grassroots system is a Web server and proxy server setup that can be used with any Web browser.


GroupLens

Resnick [Resnick, et al. 1994] presented the GroupLens system, which is built on a simple premise: "the heuristic that people who agreed in the past will probably agree again". This system uses the same Pearson algorithm to provide recommendations. At an early stage, the system used explicit votes (a 1 to 5 scale, where 1 stands for dislike and 5 for like). The updated version also uses implicit methods to get feedback from the user, such as monitoring reading time. The most important characteristics of the system are its openness and scalability.

Openness means that the system provides other researchers access to create clients that work with the system's servers, or even to change those servers if there are improvements. When the number of users increases, the system can still provide accurate predictions, but the database for the system and the calculation time will become very large.


Letizia & Let’s Browse

Let's Browse and its predecessor, Letizia, [Lieberman, 1996; Pryor, 1998] are web agents that assist a user during his/her browsing experience. By monitoring a user's behavior, such as browsing time on a web page, the Letizia system learns the user's interests and provides recommendations. Let's Browse, an improvement on Letizia, provides recommendations by using a group's profiles instead of a single profile. If multiple users are reading the same page at the same time, Let's Browse can determine which users are in the monitored area and use their profiles to recommend sites for the entire group.

Lotus Notes

Lotus Notes [Turnbull, 1998] is a system that is used as a foundation for collaborative filtering techniques. The system serves a workgroup: all Notes users should have similar goals or information interests because they are working in the same group.

Lotus provides a feature that lets people annotate documents. After annotation, a user can send or distribute these links or comments to others. To protect users' privacy, the system uses an agent to represent each individual. These agents extract significant phrases from the documents that the user reads, and then exchange the learning results with each other.




Mosaic

The Mosaic system [Turnbull, 1997] was the first Web tool that facilitated collaboration. Like the Pointers recommendation system, Mosaic users can publish and distribute bookmarks and add comments to web pages. This simple feature enabled users to actively share information with others.


PHOAKS

Terveen [Terveen, et al., 1997] introduced the PHOAKS (People Helping One Another Know Stuff) system, which recommends URLs that are likely to interest users. The system automatically recognizes web resource references in newsgroup messages, attempts to classify them, and introduces them to other users. That means the system scans and checks the group's messages and extracts the most important URLs in these messages. After sorting these links, the system recommends the URLs to users. The system uses implicit feedback and also considers role specialization.


Pointers

This system [Maltz, 1995] is implemented inside the Lotus Notes environment. As we know, if one person is an expert in some area, then other users in the group would like to see his/her recommendations. So the system provides a mechanism that lets the "information mediators" in a workgroup easily distribute references to, and commentary on, documents they find. This mechanism is realized by using a "pointer". A pointer consists of a URL link, contextual information, and optional comments by the sender. The system is very easy to use, but it is not anonymous.


Siteseer

Siteseer [Turnbull, 1997] is a collaborative system that uses web browser bookmarks to find neighbors and recommend sites. Users with significant overlap in their bookmark listings are determined to be close to one another, allowing previously unvisited sites to be recommended between them.


Tapestry

This is the first collaborative recommendation system [Goldberg, et al. 1992]. It uses free annotations or explicit "like it" or "hate it" annotations. The system is used for newsgroups, so it is not easy for the group to explore new areas.


Yahoo!

Turnbull [Turnbull, 1998] considered Yahoo! a recommendation system that realizes collaborative filtering manually. They have one expert update the Yahoo! index as quickly as possible; that means every site is examined by a person when it is added. The system also allows web users to submit pages. Because of its openness, the form of the Yahoo! index has become very popular and has become a classification standard.


WebWatcher

The WebWatcher system [Joachims, 1996] is like a tour guide in a museum. It provides interactive communication between the server and users, and provides recommendations. A user who enters the system can ask a question by typing in his/her interests, and the system will then recommend related web sites. This is not the same as a keyword-based search engine: WebWatcher uses the user's profile and other users' previous tours, calculates the similarities between users, and predicts the user's interests. The system also uses the user's experience for reinforcement learning.

III. Algorithms on Collaborative Filtering

Today, recommendation systems are used in many fields; virtually all topics that could be of potential interest to users are covered by special-purpose recommendation systems: Web pages, news stories, emails, movies, music videos, books, CDs, restaurants, and many more. These recommendation systems predict a user's interests and preferences based on all users' profiles, using information retrieval techniques. The underlying techniques used in today's recommendation systems fall into two distinct categories: content-based filtering and collaborative filtering methods. Content-based filtering uses actual content features of items, while collaborative filtering predicts a new user's preferences using other users' ratings, assuming that like-minded people tend to make similar choices. Here, we concentrate on the algorithms used in collaborative filtering.

Collaborative filtering or recommender systems predict additional topics or products a new user might like, based on a user preference database. There have been many collaborative filtering algorithms. Breese et al. [1998] classified these algorithms into two categories: memory-based algorithms and model-based algorithms. Based on their classification, we collect and classify the collaborative filtering algorithms available so far.

Memory-based Algorithms

The reason that these algorithms are called memory-based is that they operate over the entire user database to make predictions. Basically, these algorithms all try to find the similarity or correlation between the new, active user and the other users in the database. All users' preferences can be represented by their votes (explicit or implicit) on products (which can be anything related to the users' interests). The new user has an average vote over the products he/she has rated. The predicted votes of the new user on other products can then be calculated by adding a weighted sum of other users' votes, where the weights are determined by the similarity between the new user and the other users: the more similar they are, the more they contribute to the sum, and so the larger the weights. Let I_i denote the set of items user i has voted on, and v_{i,j} user i's vote on product j. The user's average vote is:

\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}

The predicted vote of the new (active) user a on product j is:

p_{a,j} = \bar{v}_a + k \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)

where k is a normalizing factor, n is the number of users in the database, and w(a,i) is the weight with which user i contributes to the active user's prediction.

The weights are calculated by comparing votes on the set of common products that the active user and each other user in the database have rated. Here we collect three major methods for defining the weights.

Mean Squared Differences:

This method defines the weight as the inverse of the mean squared difference between the two users' votes on their commonly rated items:

w(a,i) = \left[ \frac{1}{|I_a \cap I_i|} \sum_{j \in I_a \cap I_i} (v_{a,j} - v_{i,j})^2 \right]^{-1}

Pearson Correlation:

w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}

where the sums run over the products j that both users have voted on.

Vector Similarity:

This method defines the weight based on the angle between the active user's vote vector and the other user's vote vector:

w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}
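The prediction formula above can be sketched in a few lines of Python. This is an illustrative toy implementation with made-up votes, using the Pearson correlation as w(a, i) and normalizing k by the sum of absolute weights (one common choice):

```python
# Toy memory-based prediction: mean vote, Pearson weights over
# co-rated items, and a weighted sum of deviations. Data are made up.
from math import sqrt

def mean_vote(votes):
    return sum(votes.values()) / len(votes)

def pearson(a, i):
    common = set(a) & set(i)
    va, vi = mean_vote(a), mean_vote(i)
    num = sum((a[j] - va) * (i[j] - vi) for j in common)
    den = sqrt(sum((a[j] - va) ** 2 for j in common) *
               sum((i[j] - vi) ** 2 for j in common))
    return num / den if den else 0.0

def predict(active, others, item):
    va = mean_vote(active)
    weighted = [(pearson(active, u), u) for u in others if item in u]
    total = sum(abs(w) for w, _ in weighted)
    if total == 0:
        return va
    k = 1.0 / total  # normalizing factor k
    return va + k * sum(w * (u[item] - mean_vote(u)) for w, u in weighted)

active = {"book1": 4, "book2": 2}
others = [{"book1": 5, "book2": 1, "book6": 5},
          {"book1": 2, "book2": 4, "book6": 1}]
p = predict(active, others, "book6")  # pulled above the active user's mean
```

The first stored user agrees with the active user (positive weight) and the second disagrees (negative weight), so the prediction lands above the active user's mean vote of 3.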

Improvement on Memory-based Algorithms

In order to improve the performance of standard memory-based algorithms,

several modifications are proposed.

Default Voting:

        book1  book2  book3  book4  book5  book6
user 1    5             1
user 2    3                    1             5
user 3           3             5      4
user 4    4      2                           ?


Usually, we are dealing with very sparse databases; there are many products that users did not vote on (explicitly or implicitly). When using memory-based algorithms, we use only the entries in the intersection of the rated sets. In the example above, to calculate the weight user 1 contributes to user 4's prediction, we can only use the ratings for book1. To deal with this problem, default votes are introduced: in most cases, a neutral or somewhat negative preference is assigned to the unobserved products, so that the union of the voted sets can be used in the weight calculation instead of the intersection. However, this method may not necessarily improve the performance of memory-based algorithms, since an unobserved product does not necessarily mean that the user likes it less.

Inverse User Frequency:

The idea of inverse user frequency is that universally liked products are not as useful as less common products in capturing the similarity between users. So the weight is modified by introducing a factor f_j, defined as:

f_j = \log \frac{n}{n_j}

where n is the total number of users and n_j is the number of users who have voted for product j. The corresponding frequency-weighted correlation is:

w(a,i) = \frac{\left(\sum_j f_j\right)\left(\sum_j f_j v_{a,j} v_{i,j}\right) - \left(\sum_j f_j v_{a,j}\right)\left(\sum_j f_j v_{i,j}\right)}{\sqrt{UV}}

where

U = \left(\sum_j f_j\right)\left(\sum_j f_j v_{a,j}^2\right) - \left(\sum_j f_j v_{a,j}\right)^2

V = \left(\sum_j f_j\right)\left(\sum_j f_j v_{i,j}^2\right) - \left(\sum_j f_j v_{i,j}\right)^2
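The factor f_j can be illustrated directly. With hypothetical counts, a product everyone rated contributes nothing to the similarity, while a rarely rated product gets a large factor:

```python
# Inverse user frequency f_j = log(n / n_j); counts are made up.
from math import log

def inverse_user_frequency(n_users, votes_per_item):
    """votes_per_item: item -> number of users n_j who voted on it."""
    return {j: log(n_users / nj) for j, nj in votes_per_item.items()}

f = inverse_user_frequency(100, {"hit_movie": 100, "niche_movie": 5})
# f["hit_movie"] is log(1) = 0: a universally rated item carries
# no information about similarity; the niche movie gets log(20).
```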
j j j

Case Amplification:

Case amplification emphasizes the contribution of the most similar users to the prediction by amplifying weights that are close to 1. The new weights are calculated as:

w'_{a,i} = \begin{cases} w_{a,i}^{\rho} & \text{if } w_{a,i} \ge 0 \\ -\left(-w_{a,i}\right)^{\rho} & \text{if } w_{a,i} < 0 \end{cases}

where \rho > 1 is the amplification exponent.
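A sketch of case amplification; ρ = 2.5 is the value commonly cited in the literature, but any ρ > 1 has the same qualitative effect:

```python
# Case amplification: raise weights to a power rho > 1 while keeping
# their sign, so near-1 weights stay large and small weights fade.

def amplify(w, rho=2.5):
    return w ** rho if w >= 0 else -((-w) ** rho)

strong, weak = amplify(0.9), amplify(0.1)
# 0.9 shrinks only a little, while 0.1 becomes nearly negligible.
```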

Voting by Category:

In some collaborative filtering applications, the dimensions of the users' voting matrix can become unmanageable, preventing practical calculations over the whole matrix. There may be very few common votes on the same products if the default voting method mentioned before is not used; however, providing default votes may not improve performance. Gokiso-cho [1998] proposed a voting-by-category algorithm. Basically, they assume the existence of a small number of generated clusters or pre-existing categories to which products can be assigned. The voting matrix is then transformed into a much lower dimension by converting users' votes on products into votes on categories. In the same example below, the original 4-by-6 matrix becomes 4-by-3, and users have more common votes.

           category1     category2     category3
        book1  book2  book3  book4  book5  book6
user 1    5             1
user 2    3                    1             5
user 3           3             5      4
user 4    4      2                           ?

The new vote of user i on category c is the average of his/her votes on the products in that category:

\bar{v}_{i,c} = \frac{1}{|c \cap I_i|} \sum_{j \in c \cap I_i} v_{i,j}

Now the entry of the new matrix is the average over the votes of the products per each

category for a given user.
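The transformation from product votes to category votes can be sketched as follows, with a hypothetical assignment of the six books to the three categories:

```python
# Voting-by-category sketch: a user's votes on the products in each
# category are averaged into one vote on that category, shrinking
# the matrix's width. The category assignment is illustrative.

def votes_by_category(user_votes, categories):
    """user_votes: item -> vote; categories: category -> list of items."""
    out = {}
    for c, items in categories.items():
        rated = [user_votes[j] for j in items if j in user_votes]
        if rated:
            out[c] = sum(rated) / len(rated)
    return out

categories = {"cat1": ["book1", "book2"],
              "cat2": ["book3", "book4"],
              "cat3": ["book5", "book6"]}
vc = votes_by_category({"book1": 5, "book3": 1}, categories)  # {"cat1": 5.0, "cat2": 1.0}
```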

The categories could be pre-defined or unknown. To deal with unknown categories, the EM algorithm can be used.

This method could be used with all the other algorithms (including the model-based algorithms). We put it here because the original author uses it along with the correlation weights.

Model-based Algorithms


Model-based algorithms first generate a descriptive model by compiling the users' preferences; recommendations are then predicted by appealing to the model. From a probabilistic perspective, collaborative filtering can be viewed as calculating the expected value of a vote given the user's profile of previous votes:

p_{a,j} = E(v_{a,j}) = \sum_{i=0}^{m} \Pr(v_{a,j} = i \mid v_{a,k}, k \in I_a) \cdot i

where the vote values range from 0 to m.

Cluster Models:

Based on the idea that there are certain groups or types of users capturing a common set of preferences and tastes, Breese et al. proposed a cluster method in which like-minded users are classified into the same group. Given a user's class membership, the user's votes are assumed to be independent, so the joint probability of class and votes can be calculated by the "naïve" Bayes formulation:

\Pr(C = c, v_1, \ldots, v_n) = \Pr(C = c) \prod_{i=1}^{n} \Pr(v_i \mid C = c)

Once we know the probability of observing an individual of a class with a set of votes, the expectation of the future vote can be easily calculated. Since the classes and the number of classes are unknown, the EM algorithm is used to find the model structure with maximum likelihood.
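The naïve Bayes factorization above can be sketched directly; the class prior and the per-vote likelihoods here are hand-set stand-ins for parameters that EM would normally estimate:

```python
# Naive-Bayes cluster-model sketch:
# Pr(C=c, v1..vn) = Pr(C=c) * prod_i Pr(v_i | C=c).
# All parameters below are made up for illustration.

def joint_prob(class_prior, vote_likelihoods, votes):
    p = class_prior
    for i, v in enumerate(votes):
        p *= vote_likelihoods[i][v]
    return p

# Two vote positions, votes are "like"/"dislike".
prior_c = 0.4
likelihoods_c = [{"like": 0.9, "dislike": 0.1},
                 {"like": 0.2, "dislike": 0.8}]
pj = joint_prob(prior_c, likelihoods_c, ["like", "dislike"])  # 0.4 * 0.9 * 0.8
```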


Ungar [Ungar, et al., 1998] proposed new clustering methods. Unlike the standard cluster models, they assume that both people and products come from classes: people may be, e.g., intellectual or fun, and movies belong to classes as well. Here is an example from their paper (the placement of the "y" entries is reconstructed to be consistent with the probability table below):

        Batman  Rambo  Andre  Hiver  Whispers  Star Wars
Lyle      y                                        y
Ellen                    y      y                  y
Jason                           y       y
Fred                     y                         y
Dean      y       y                                y

In this movie database example, people can be classified as intellectual or fun, and movies can belong to three categories: action, foreign, and classic. A "y" in the table means the person likes the associated movie. For each person/movie pair, the probability that there is a "y" in the table is:

               action   foreign   classic
intellectual    0/6      5/9       2/3
fun             3/4      0/6       2/2

Based on the observations above, they establish a model containing three sets of parameters: P_k (the probability that a random person is in class k), P_l (the probability that a random movie is in class l), and P_{kl} (the probability that a person in class k is linked to a movie in class l).

Here, the class assignments are unknown. They tried repeated clustering and Gibbs sampling methods. In the repeated clustering method, people are first clustered based on movies and movies based on people; on the second and later passes, people are clustered based on movie clusters and movies based on people clusters. For the clustering, they use k-means instead of the EM algorithm, due to the constraint that a person is always in the same class and a movie is always in the same class. They claimed that the Gibbs sampling method outperforms repeated clustering.

Bayesian Network Models:

An alternative model formulation for probabilistic collaborative filtering is a Bayesian belief network with a node corresponding to each product in the database. Missing data can be represented by a "no vote" value. After applying an algorithm to train the belief network, each item in the resulting network has a set of parent items that are the best predictors of its votes. A decision tree can be used to represent the conditional probability table.

Neural Network Models:

Similar to the Bayesian network models, collaborative filtering can be seen as a classification task. Based on a set of ratings from users for products, we can induce a model for each user that allows us to classify unseen products into two or more classes. Missing data can be indicated by a "no vote" state. Here is an example given in Billsus and Pazzani's paper [Billsus, D. and Pazzani, M., 1998]:

       I1   I2   I3   I4   I5
U1      4         3
U2           1         2
U3      3    4    2         4
U4      4    2    1         ?


where U_i is the ith user and I_i is the ith item. Users rate the items from 1 to 4, with 4 the highest rating. Since in the end they only recommend items the active user would like, they transform the rating matrix by replacing each rating > 2 with 1 and every other rating with 0. To represent the "no vote" value, they further split every user row into two rows (like and dislike).

              E1     E2       E3
U1 like        1      0        1
U1 dislike     0      0        0
U2 like        0      0        0
U2 dislike     0      1        0
U3 like        1      1        0
U3 dislike     0      0        1
Class        like  dislike  dislike

Here U4’s ratings for I1, I2, I3 are class labels. After converting a data set of user ratings

for items into this format, we can apply virtually any supervised learning algorithm.
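The like/dislike transformation can be sketched as follows; the helper below reproduces the U2 rows of the table above from hypothetical 1-4 ratings:

```python
# Like/dislike split: each user row becomes a "like" row (rating > 2)
# and a "dislike" row (rating 1 or 2), with 0 standing in for "no vote".

def split_user(ratings, items):
    """ratings: item -> 1..4 rating; returns (like_row, dislike_row)."""
    like = [1 if ratings.get(j, 0) > 2 else 0 for j in items]
    dislike = [1 if 0 < ratings.get(j, 0) <= 2 else 0 for j in items]
    return like, dislike

items = ["I1", "I2", "I3"]
u2_like, u2_dislike = split_user({"I2": 1}, items)
# u2_like == [0, 0, 0], u2_dislike == [0, 1, 0], matching the table.
```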

Other Algorithms

A hybrid memory- and model-based approach:

Pennock and Horvitz [Pennock, David M. and Horvitz, Eric, 1999] proposed a CF method called personality diagnosis (PD), which can be seen as a hybrid between memory- and model-based approaches. All data is maintained throughout the process, new data can be added incrementally, and predictions have a meaningful probabilistic semantics.

In this algorithm, each user's preferences are interpreted as a manifestation of an underlying "personality type". Based on the fact that users' votes are affected by environmental factors, such as previous users' votes or the current user's mood, they assume that all users report their ratings with Gaussian noise. If we define a user's personality type as a vector V_i^{true} of "true" ratings, then user i's actual rating is drawn from an independent normal distribution:

\Pr(v_{i,j} = x \mid v_{i,j}^{true} = y) = k \cdot e^{-(x - y)^2 / 2\sigma^2}

where \sigma is a free parameter and k a normalizing constant.

They further assume that the distribution of personality types in the database is representative of the distribution of types in the target population of users, so each stored user is equally likely a priori:

\Pr(V_a^{true} = V_i) = \frac{1}{n}

where n is the total number of users in the database. Then the probability that the active user has the same personality type as any other user can be calculated by applying Bayes' rule:

\Pr(V_a^{true} = V_i \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m) \propto \Pr(v_{a,1} = x_1 \mid v_{a,1}^{true} = v_{i,1}) \cdots \Pr(v_{a,m} = x_m \mid v_{a,m}^{true} = v_{i,m}) \cdot \Pr(V_a^{true} = V_i)

Then the active user's vote on an unseen product j would be:

\Pr(v_{a,j} = x_j \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m) = \sum_{i=1}^{n} \Pr(v_{a,j} = x_j \mid V_a^{true} = V_i) \cdot \Pr(V_a^{true} = V_i \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m)
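A simplified sketch of the personality diagnosis idea, with made-up ratings and σ = 1: each stored user is a candidate "true" personality with uniform prior 1/n (which cancels in the normalization), and this sketch reports the posterior-weighted average vote rather than the full vote distribution:

```python
# Personality-diagnosis sketch; ratings and sigma are illustrative.
from math import exp

def likelihood(observed, candidate, sigma=1.0):
    """Pr(active's observed votes | true type = candidate), up to k."""
    p = 1.0
    for item, x in observed.items():
        if item in candidate:
            p *= exp(-((x - candidate[item]) ** 2) / (2 * sigma ** 2))
    return p

def pd_predict(observed, users, item):
    # Uniform prior 1/n cancels when normalizing by z.
    scored = [(likelihood(observed, u), u) for u in users if item in u]
    z = sum(s for s, _ in scored)
    return sum(s * u[item] for s, u in scored) / z

users = [{"m1": 5, "m2": 1, "m3": 5},
         {"m1": 1, "m2": 5, "m3": 1}]
p = pd_predict({"m1": 5, "m2": 1}, users, "m3")  # close to 5: matches user 1
```

The active user's observed votes match the first stored personality exactly, so that candidate dominates the posterior and the predicted vote is close to 5.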


We have now seen both memory-based and model-based collaborative filtering methods. Both have their advantages and drawbacks. Memory-based methods are simple and easy to implement, but they may be time- and space-consuming. At least two problems are hard for memory-based methods to handle:

1) Missing data: To find the similarity between users, the difference (distance) between users has to be computed. If there are missing data, either only the products that all users voted on are used, or a default vote is given to the missing data. The first case has problems with sparse databases. In the second case, giving average or somewhat negative votes to the missing data may shadow the real similarity between users.

2) Memory-based methods cannot handle the situation where two users are very similar but have not rated the same set of products. For example:

        product1  product2  product3  product4  product5  product6
user1      1         0         1         1                   1
user2                0         1         1         1         1
user3      1                                                 ?

User1 and user2 are very similar in this example; however, when we use memory-based methods to predict user3's preference on product6, only user1's votes can be used for the prediction.


For model-based methods, clustering methods can somewhat handle missing data by clustering products into fewer categories; the new votes for categories are averaged over the available votes for the products in each category. But clustering methods may over-generalize and hurt performance. Bayesian network or neural network models can handle the missing data and problem (2) above reasonably well, but for large databases containing many users we end up with thousands of features while the amount of training data is very limited, and those models become impractical.

Recently, a promising algorithm has been proposed. The idea is that users rate products based on the latent features of the products. All products in the database share a set of common features, and users rate products highly because they rate those features highly. So by factoring people's ratings into features using linear algebra, we can predict how users will react to documents they have not seen before, based on their preferences for these features. Singular Value Decomposition (SVD) allows us to break data sets down into these components and analyze the principal components of the data. We will see below how SVD can be used to capture the hidden features and help reduce the dimension of the database.

Singular Value Decomposition:

The user rating vectors can be represented by an m × n matrix A, with m users and n products:

A = [a_{i,j}]

where a_{i,j} is the rating of user i for product j. Through singular value decomposition, A can be factored into USV^T, where U and V are orthogonal matrices and S is zero except for its diagonal entries, which are the singular values of A. U represents the response of each user to certain features, V represents the amount of each feature present in each product, and S relates to the importance of each feature in the overall determination of the rating. Here is an example given by Pryor [Pryor, H. Michael, 1998] in his report. Suppose the rating matrix A is:

5 4 2 6
3 7 5 2

6 4 1 4

The SVD of A would be:

0.6000 − 0.4124 − 0.6855 

U =
0.5811 0.8136 0.0192 

0.5498 − 0.4099 0.7278 

14 .4890 0.0000 0.0000 0.0000 

S =
 0.0000 4.9324 0.0000 0.0000 

 0.0000 0.0000 1.6550 
0.0000 

0.5551 − 0.4218 0.6023 − 0.3889 

0.5982 0.4878 0.1835 0.6088 
V = 
0.3213 0.5744 − 0.3306 − 0.6764 
 
0.4805 − 0.5041 − 0.7031 0.1437 


We can see that the feature described by "14.4890" in S is the most important feature. So the dimension of S can be reduced by selecting only the most important features, in this case only the one represented by "14.4890". Then a new rating matrix can be generated by converting the original rating matrix into the feature space.

The new rating matrix M is:

M = U'S'

where S' = [14.4890] and U' is the corresponding (first) column of U. After we get the new rating matrix M in the feature space, we can apply memory-based or model-based methods to this new rating matrix. It has been shown that exploiting latent structure in matrices of user ratings can lead to improved predictive performance.
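The decomposition above can be reproduced with NumPy, whose `numpy.linalg.svd` returns the singular values in decreasing order:

```python
# SVD of the 3x4 example matrix, then a rank-1 reduction keeping
# only the dominant feature.
import numpy as np

A = np.array([[5., 4., 2., 6.],
              [3., 7., 5., 2.],
              [6., 4., 1., 4.]])
U, s, Vt = np.linalg.svd(A)  # s is sorted in decreasing order

# Keep only the dominant feature: M = U' * S' with S' = [s[0]].
M = U[:, :1] * s[0]
```

The largest singular value comes out near 14.489, matching the example, and M is a 3-by-1 matrix giving each user's coordinate along the dominant feature.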

In current recommender systems, both Content-Based Filtering (CBF) methods and Collaborative Filtering (CF) methods are used. CBF filters information by matching information content with the user's interests, and it is able to filter information that has not been evaluated by other people. So CBF and CF are combined in recommender systems: CBF can deal with products not yet rated, while CF recommends new products based on previous users' votes.

IV. Discussion


As we introduced above, future recommendation systems should have the following features:

1) Solve the "cold-start" problem.

General collaborative recommendation systems suffer from this problem: the system has no clue how to recommend a new item to users or how to provide accurate predictions for a new user. Since content-based filtering is based on the features of the item, it has no such cold-start problem. The Fab system has integrated content-based filtering and collaborative filtering. Based on this integration, Michelle Keim Condliff et al. [1998] propose a Bayesian methodology for recommendation systems. This proposal uses Bayesian theory to give good predictions by fully incorporating all of the available data, such as user ratings, user features, and item features. Claypool [Mark Claypool, et al., 1999] also provides an approach to the cold-start problem; that system is based on a weighted average of the content-based filtering prediction and the collaborative filtering prediction.

2) Easy for users to participate or vote

Generally speaking, people like to receive recommendations but do not like to provide them. Since the system depends on users' votes to calculate the similarities between users, it is very important to get enough data from the users, so the system should provide a very easy interface for a user to vote or provide an annotation. Although explicit annotations or votes drive the calculation, implicit feedback from users is even more helpful for reducing the sparsity of the matrices used in the similarity calculation. Implicit methods include monitoring a user's behavior and monitoring the user's browsing time on a page: the longer a person stays, the more interest the person shows. The system can also use compensation methods; for example, a user who wants further recommendations must first vote on what he has read.
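One simple way to exploit browsing time is to map dwell time onto the same rating scale that explicit votes use, so implicit observations can fill cells of the sparse rating matrix. The function below is a hypothetical sketch; the cap of 300 seconds and the 1-to-5 scale are assumptions, not from any cited system.

```python
def implied_rating(seconds_on_page, max_seconds=300, scale=5):
    """Map dwell time to a 1..scale rating; time is capped at max_seconds."""
    # Longer dwell time implies more interest; anything at or beyond
    # the cap counts as maximal interest.
    frac = min(seconds_on_page, max_seconds) / max_seconds
    return 1 + round(frac * (scale - 1))

print(implied_rating(15))   # 1: a quick glance, low implied interest
print(implied_rating(300))  # 5: a long read, high implied interest
```

Such implied ratings are noisier than explicit votes, but they arrive for free with every page view, which is exactly what a sparse similarity matrix needs.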

3) Privacy

Privacy becomes an issue when a system collects information about its users, so important social issues arise on an individual scale as well. In collaborative filtering, users share their document annotations. On one side, people do not like to release their private identities; on the other side, people like to see who made the annotations. For example, if an annotation is provided by an expert in the area, people in the group would be more inclined to read that information. The system should provide a mechanism that allows a user to adopt a pseudonym, and it should also provide different levels of privacy protection.

4) Algorithm

A good algorithm should have the following features:

1. handling missing data

2. handling sparse data

3. cost-efficiency

V. Reference

Ariyoshi, Yusuke, 1999. Improvement of Combination Information Filtering Method Based on Reliabilities.

Billsus, D. and Pazzani, M., 1998. Learning Collaborative Filters. Proceedings of ICML'98, 46-53. Morgan Kaufmann.

Breese, J., Heckerman, D. and Kadie, C., 1998. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI.

Claypool, Mark; Gokhale, Anuja; Miranda, Tim et al., 1999. Combining Content-Based and Collaborative Filters in an Online Newspaper.

Collaborative Filtering Workshop, 1996, Berkeley, CA. Webpage:

Condliff, Michelle Keim; Lewis, David D.; Madigan, David and Posse, Christian, 1998. Bayesian Mixed-Effects Models for Recommender Systems.

Goldberg, D.; Nichols, D.; Oki, B. M. and Terry, D., 1992. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM 35, 12.

Joachims, Thorsten; Freitag, Dayne and Mitchell, Tom, 1996. WebWatcher: A Tour Guide for the World Wide Web.

Lieberman, H., 1996. "Letizia: An Agent That Assists Web Browsing." MIT Media Lab.

Maltz, David and Ehrlich, Kate, 1995. Pointing the Way: Active Collaborative Filtering.

Oard, Douglas W. and Marchionini, Gary, 1996. A Conceptual Framework for Text

Pennock, David M. and Horvitz, Eric, 1999. Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach.

Pryor, H. Michael, 1998. The Effects of Singular Value Decomposition on Collaborative Filtering. Computer Science Technical Report, Dartmouth College. PCS-TR98-

Resnick, Paul and Varian, Hal R., 1997. Recommender Systems. Communications of the ACM, March 1997, Vol. 40, No. 3.

Resnick, Paul; Iacovou, Neophytos et al., 1994. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, Chapel Hill, NC, pages

Shardanand, Upendra and Maes, Pattie, 1995. Social Information Filtering: Algorithms for Automating "Word of Mouth".

Terveen, Loren G.; Hill, William C. et al., 1998. Building Task-Specific Interfaces to High Volume Conversational Data.

Turnbull, Don, 1998. Augmenting Information Seeking on the World Wide Web Using Collaborative Filtering Techniques.

Turnbull, Don, 1997. KMDI Final Summary: Collaborative Filtering.

Ungar, Lyle H. and Foster, Dean P., 1998. A Formal Statistical Approach to Collaborative Filtering. AAAI Workshop on Recommendation Systems.

Wittenburg, Kent; Das, Duco; Hill, Will and Stead, Larry, 1998. Group Asynchronous Browsing on the World Wide Web.