INTRODUCTION
A recommendation system can be described as a system that runs in a grouped or non-grouped
environment, takes a customer's online history and usage behaviour as input, and produces a
likely result for that customer, giving predictions that come close to real-world preferences.
Recommender systems generally require a large dataset and a fast computing framework that can
analyse that dataset within seconds.
In simpler terms, recommendation systems are data-intensive software that run complex pattern
matching over a set of predefined parameters. A recommender system models a customer's interest
with the goal of suggesting things to buy or look at. Such systems have also become essential
applications in the media streaming business, where recommendations drawn from large data sets
direct customers toward the content that best matches their interests. Many approaches to
recommendation have been proposed to date, for example content-based, collaborative,
knowledge-based and demographic filtering.
In the proposed Video Recommendation System, videos are suggested and shown using a
content-based filtering technique, which can work even with a small amount of data.
Building a recommender system from scratch raises several problems. Many current recommender
systems are based on user information, so what should we do if the website does not yet have
enough users?
Next, we must solve the representation of a movie, that is, how a system can understand a
movie. This is the precondition for comparing the similarity of two movies. Movie features
such as genre, actor and director are one way to categorize movies. Each feature plays a
different role in recommendation, so each should carry a different weight. This raises the
question of how those weights should be chosen.
1.2.1 Objective
The goal of our recommendation system is to provide personalized recommendations that help
users find videos relevant to their interests and obtain them consistently. To improve
user satisfaction, it is imperative that these recommendations are provided in real time
and reflect the user's recent activity on the site.
1.2.2 Vision
1.2.3 Mission
Existing system:
The existing engines make use of conventional algorithms for recommendations. In a content-based
recommendation engine, the system generates recommendations based on the features
associated with products and on the user's information.
In a context-based recommendation engine, the system requires additional data about the context
of item consumption, such as time, mood and behavioural aspects. These data may be used to
improve the recommendation compared to what could be achieved without this additional source
of information.
Disadvantages:
The major problem with the existing systems is that they need a large amount of data to work
reasonably well, which can be a challenge for small businesses and startups.
The data used for training must be precise and filtered. Any mistake in the data
can make the whole system inaccurate.
Proposed System:
The proposed Video Recommender System will use a content-based filtering technique with the
cosine similarity algorithm. This methodology depends on defining a set of parameters that
describe a particular video or clip. Taking a video as an example, the potential
parameters could be channel, genre, year released and so on. The larger the parameter set, the
better and simpler it is to match patterns against the customer's interests and online
footprint. Each parameter is then assigned a weight, which sets its relative priority. All
these parameters are then used to build the customer's profile. In this way the system learns
the customer's interests and choice patterns by observing their online behaviour.
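The weighting and profile-building steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the weight values, the helper names (`to_vector`, `build_profile`, `cosine`) and the sample videos are all assumptions made for the example.

```python
import math

# Hypothetical feature weights: a higher weight gives the parameter more
# influence on the similarity score. The values are illustrative only.
WEIGHTS = {"channel": 1.0, "genre": 2.0, "year": 0.5}


def to_vector(video):
    """Map a video's parameters to a sparse weighted vector (a dict)."""
    return {f"{name}={video[name]}": weight for name, weight in WEIGHTS.items()}


def build_profile(watched):
    """Sum the vectors of the watched videos into one customer profile."""
    profile = {}
    for video in watched:
        for key, value in to_vector(video).items():
            profile[key] = profile.get(key, 0.0) + value
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical viewing history and candidate videos.
watched = [
    {"channel": "A", "genre": "Comedy", "year": 2019},
    {"channel": "B", "genre": "Comedy", "year": 2020},
]
candidates = {
    "clip_1": {"channel": "A", "genre": "Comedy", "year": 2021},
    "clip_2": {"channel": "C", "genre": "Horror", "year": 2019},
}

profile = build_profile(watched)
scores = {name: cosine(profile, to_vector(v)) for name, v in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because genre carries the largest weight in this sketch, the comedy clip scores higher than the horror clip for a customer whose history is all comedy.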
Advantages:
The major advantage of the content-based filtering algorithm is that it does not require a
huge dataset. Using this kind of system, a video streaming service can increase content
consumption per customer. The content-based filtering algorithm is also flexible in nature.
Feasibility study
Recommender Systems (RSs) are systems capable of predicting the preferences of users over sets
of items (given the historical user-preference data). RSs can be found almost everywhere in the
digital space (e.g. Amazon, Google, Netflix), shaping the choices we make, the products we buy,
the books we read, or movies we watch. However, there are almost no RSs in the academic world,
where we expect they can have great potential. The feasibility study analyses the
video recommender system based on human interest. It is carried out during system analysis
to ensure that the proposed system is not a burden to its users. Feasibility analysis
requires some understanding of the major requirements of the system. The considerations are:
Economical Feasibility
Technical Feasibility
Market Feasibility
Economical Feasibility:
It is important for our system to reach a large number of users, since profit is calculated
from the number of users using the system. If the return exceeds the amount
invested, we can say that the project is economically feasible.
Technical Feasibility:
Any project developed must not place a high demand on the available technical resources,
because high demands on technical resources in turn lead to high demands being placed on
the user.
Market Feasibility:
A marketing plan maps out specific ideas, strategies and campaigns based on feasibility study
investigations, and is intended to be implemented. Think of a market feasibility study as a
logistical study and of a marketing plan as a specific, planned course of action.
Chapter 3
System Environment
About Python
Python is a programming language, which means it's a language both people and computers can
understand. Python was developed by a Dutch software engineer named Guido van Rossum, who
created the language to solve some problems he saw in computer languages of the time.
Python features a dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional and procedural, and
has a large and comprehensive standard library. Python interpreters are available for many
operating systems. CPython, the reference implementation of Python, is open source software and
has a community-based development model, as do nearly all of its variant implementations.
CPython is managed by the non-profit Python Software Foundation.
One significant advantage of learning Python is that it’s a general-purpose language that can be
applied in a large variety of projects. Below are just some of the most common fields where
Python has found its use:
Data science
Web development
Computer graphics
Python's ecosystem has grown over the years and is increasingly capable of statistical
analysis.
It offers a good compromise between scale and sophistication (in terms of data processing).
Python emphasizes productivity and readability.
Python is used by programmers who want to delve into data analysis or apply statistical
techniques (and by developers who turn to data science).
There are plenty of Python scientific packages for data visualization, machine learning, natural
language processing, complex data analysis and more.
All of these factors make Python a great tool for scientific computing and a solid alternative to
commercial packages such as MATLAB. The most popular libraries and tools for data science are:
NumPy:
The fundamental package for scientific computing with Python, adding support for large, multi-
dimensional arrays and matrices, along with a large library of high-level mathematical functions
to operate on these arrays.
NumPy stands for 'Numerical Python'. It is the core library for scientific computing in
Python: it contains a powerful n-dimensional array object and provides tools for
integrating C, C++ and similar code. It is also useful for linear algebra and random number
generation. A NumPy array can also be used as an efficient multi-dimensional container for
generic data.
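A short sketch of what a NumPy array looks like in practice may help here. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2 x 3 array: two rows, three columns, all elements of one type.
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)        # dimensions of the array
print(a.dtype)        # element type shared by every entry
print(a * 2)          # arithmetic is applied element by element
print(a.sum(axis=0))  # column sums
print(a @ a.T)        # matrix product, a basic linear-algebra operation
```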
Pandas:
Pandas is a library for data manipulation and analysis. The library provides data structures and
operations for manipulating numerical tables and time series.
pandas is a Python package providing fast, flexible, and expressive data structures designed to
make working with “relational” or “labeled” data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real world data analysis in Python.
Additionally, it has the broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language. It is already well on its way toward this
goal.
We’ll start with a quick, non-comprehensive overview of the fundamental data structures in
pandas to get you started. The fundamental behavior about data types, indexing, and axis
labeling / alignment apply across all of the objects. To get started, import NumPy and load pandas
into your namespace:
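The import step, followed by the two fundamental pandas structures, might look like the sketch below. The index labels and column names are hypothetical and only serve the example.

```python
import numpy as np
import pandas as pd

# Series: a one-dimensional labeled array.
s = pd.Series([4.5, 3.0, np.nan], index=["video_a", "video_b", "video_c"])

# DataFrame: a two-dimensional labeled table; the columns are hypothetical.
df = pd.DataFrame(
    {
        "title": ["Clip 1", "Clip 2", "Clip 3"],
        "genre": ["Comedy", "Horror", "Comedy"],
    }
)

print(s.dropna())                   # missing values are handled explicitly
print(df[df["genre"] == "Comedy"])  # rows selected by a column condition
```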
Matplotlib:
Matplotlib is a python 2D plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments across platforms. Matplotlib allows you to
generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, and more.
Keywords are the reserved words of Python. All keywords except True, False and None are
lowercase, and they must be written exactly as they are. The full list for a given
interpreter is available in the standard library as keyword.kwlist.
An identifier is the name given to entities such as classes, functions and variables in
Python. It helps differentiate one entity from another.
An identifier cannot start with a digit: 1variable is invalid, but variable1 is perfectly fine.
Keywords cannot be used as identifiers.
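>>> global = 1
File "<interactive input>", line 1
global = 1
^
SyntaxError: invalid syntax
Special symbols such as !, @, #, $ and % cannot be used in an identifier.
>>> a@ = 0
File "<interactive input>", line 1
a@ = 0
^
SyntaxError: invalid syntax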
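The same rules can be checked programmatically instead of by triggering a SyntaxError. The sketch below uses the standard-library `keyword` module and `str.isidentifier()`; the helper name `is_valid_identifier` is an assumption introduced for this example.

```python
import keyword

print(keyword.kwlist)  # every reserved word of the running interpreter


def is_valid_identifier(name):
    """True if name is syntactically an identifier and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["variable1", "1variable", "global", "a@"]:
    print(name, is_valid_identifier(name))
```

Only `variable1` passes: `1variable` starts with a digit, `global` is a keyword, and `a@` contains a special symbol.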
Python
Python's large standard library, commonly cited as one of its greatest strengths, provides tools
suited to many tasks. For Internet-facing applications, many standard formats and protocols such
as MIME and HTTP are supported. It includes modules for creating graphical user interfaces,
connecting to relational databases, generating pseudorandom numbers, arithmetic with
arbitrary-precision decimals, manipulating regular expressions, and unit testing.
Some parts of the standard library are covered by specifications (for example, the Web Server
Gateway Interface (WSGI) implementation wsgiref follows PEP 333), but most modules are not.
They are specified by their code, internal documentation, and test suites (if supplied). However,
because most of the standard library is cross-platform Python code, only a few modules need
altering or rewriting for variant implementations.
Anaconda
Machine Learning
Machine learning is learning based on experience. It is like a person who learns to
play chess by observing others play. In the same way, computers can be programmed by
providing them with information on which they are trained, so that they acquire the ability
to identify elements or their characteristics with high probability.
First of all, you need to know that there are various stages of machine learning:
data collection
data sorting
data analysis
algorithm development
checking the generated algorithm
To look for patterns, various algorithms are used, which are divided into two groups:
Unsupervised learning
Supervised learning
With unsupervised learning, the machine receives only a set of input data. It is then
up to the machine to determine the relationships between the entered data and any other
hypothetical data. Unlike supervised learning, where the machine is provided with some
verification data for learning, unsupervised learning implies that the computer itself finds
patterns and relationships between different data sets. Unsupervised learning can be further
divided into clustering and association.
Supervised learning implies the computer's ability to recognize elements based on provided
samples. The computer studies the samples and develops the ability to recognize new data
based on them. For example, you can train your computer to filter spam messages based on
previously received information.
Chapter 4
SYSTEM ANALYSIS & DESIGN
Requirement Specification
Software Requirements:
Hardware Requirements:
Processor : i3 (minimum), i5
RAM : 4GB(minimum)
K-Nearest Neighbor (K-NN) is a simple algorithm that stores all the available cases and classifies
the new data or case based on a similarity measure.
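As a rough illustration of the idea, the sketch below stores a handful of labeled cases and classifies a new point by majority vote among its k nearest neighbours. Euclidean distance is used as the similarity measure here, which is an assumption; the sample cases and the function name `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, cases, k=3):
    """Label query by majority vote among its k nearest stored cases."""
    # Sort stored cases by Euclidean distance to the query point.
    nearest = sorted(cases, key=lambda case: math.dist(query, case[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: (feature vector, label).
cases = [
    ((1.0, 1.0), "comedy"),
    ((1.2, 0.8), "comedy"),
    ((0.9, 1.1), "comedy"),
    ((5.0, 5.0), "horror"),
    ((5.2, 4.9), "horror"),
]

print(knn_classify((1.1, 1.0), cases))
print(knn_classify((4.8, 5.1), cases))
```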
K-NN classification
Advantages of K-NN :
Cosine similarity is a metric used to measure how similar documents are irrespective of their
size. Mathematically, it measures the cosine of the angle between two vectors projected in a
multi-dimensional space. Cosine similarity is advantageous because even if two similar
documents are far apart by Euclidean distance (due to the size of the documents), they
may still be oriented close together. The smaller the angle, the higher the cosine similarity.
Values range between -1 and 1, where -1 is perfectly dissimilar and 1 is perfectly similar.
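The contrast between Euclidean distance and cosine similarity can be seen in a small sketch. The term-count vectors below are made up for illustration, and the function assumes neither vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


short_doc = [1, 2, 0]    # hypothetical term counts
long_doc = [10, 20, 0]   # same proportions, ten times longer
other_doc = [0, 1, 5]    # mostly different terms

print(math.dist(short_doc, long_doc))          # large Euclidean distance
print(cosine_similarity(short_doc, long_doc))  # close to 1: same direction
print(cosine_similarity(short_doc, other_doc)) # much lower
```

The short and long documents are far apart in Euclidean terms but point in the same direction, so their cosine similarity is essentially 1.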
The library contains both procedures and functions to calculate similarity between sets of data.
The function is best used when calculating the similarity between small numbers of sets. The
procedures parallelize the computation and are therefore more appropriate for computing
similarities on bigger datasets.
Derivation:
The cosine is usually in [−1, 1], but document vectors (see Vector Space Model) are usually
non-negative, so the angle between two documents can never be greater than 90
degrees, and for document vectors
cos(d1, d2) ∈ [0, 1]
the minimum cosine is 0 (maximum angle: the documents are orthogonal)
the maximum cosine is 1 (minimum angle: the documents are the same)
Cosine Normalization
If documents have unit length, then cosine similarity is the same as the dot product.
We can "unit-normalize" document vectors and then compute the dot product on them to get the
cosine. This "unit-length normalization" is often called "cosine normalization" in IR:
d′ = d / ∥d∥
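A quick numerical check of this equivalence, under the assumption that neither vector is the zero vector, could look like the following. The vectors `d1` and `d2` and the helper names are illustrative.

```python
import math


def unit_normalize(d):
    """Return d / ||d|| (assumes d is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


d1 = [3.0, 4.0, 0.0]
d2 = [1.0, 0.0, 2.0]

# Cosine computed directly from the definition.
cosine = dot(d1, d2) / (math.sqrt(dot(d1, d1)) * math.sqrt(dot(d2, d2)))
# Dot product of the unit-normalized vectors.
normalized_dot = dot(unit_normalize(d1), unit_normalize(d2))

print(cosine, normalized_dot)
```

Normalizing once up front means every later comparison needs only a dot product, which is why the trick is common when many documents are compared.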
Cosine Distance
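For documents, cos(d1, d2) ∈ [0, 1], so a distance can be defined as 1 − cos(d1, d2):
identical documents are at distance 0 and orthogonal documents are at distance 1.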
We can use the cosine similarity algorithm to work out the similarity between two things. We
might then use the computed similarity as part of a recommendation query, for example to get
movie recommendations based on the preferences of users who have given similar ratings to other
movies that you've seen.
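One way such a recommendation query might be put together is sketched below. The ratings data, the user names and the functions `similarity` and `recommend` are all hypothetical. The similarity is computed only over movies both users have rated, which is one of several possible design choices.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "you": {"Movie A": 5, "Movie B": 1, "Movie C": 4},
    "user1": {"Movie A": 5, "Movie B": 1, "Movie C": 5, "Movie D": 5},
    "user2": {"Movie A": 1, "Movie B": 5, "Movie C": 1, "Movie E": 5},
}


def similarity(u, v):
    """Cosine similarity over the movies both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)


def recommend(target, ratings):
    """Suggest unseen movies from the most similar user, best rated first."""
    seen = ratings[target]
    others = {name: r for name, r in ratings.items() if name != target}
    nearest = max(others, key=lambda name: similarity(seen, others[name]))
    unseen = {m: r for m, r in others[nearest].items() if m not in seen}
    return sorted(unseen, key=unseen.get, reverse=True)


print(recommend("you", ratings))
```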
GENRE      Comedy  Horror  Thriller
Comedy     1       0       0
Comedy     1       0       0
Horror     0       1       0
Thriller   0       0       1
Horror     0       1       0
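The genre encoding shown in the table can be reproduced with a few lines of plain Python. This is only a sketch of the one-hot idea; the project itself may use a library vectorizer instead, and the function name `one_hot` is an assumption.

```python
genres = ["Comedy", "Comedy", "Horror", "Thriller", "Horror"]

# Column order follows first appearance, which matches the table above.
categories = list(dict.fromkeys(genres))


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of value."""
    return [1 if value == category else 0 for category in categories]


encoded = [one_hot(genre, categories) for genre in genres]

print(categories)
for genre, row in zip(genres, encoded):
    print(genre, row)
```

Each row is a vector that can be passed directly to a cosine similarity function: two videos of the same genre have similarity 1, and two videos of different genres have similarity 0.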
Chapter 5
Implementation
Vectorization Output:
Output for Vectorization process
Output of numeric values based on genre:
Output for displaying graph on genre:
Recommender systems are a powerful new technology for extracting additional value for a
business from its user databases. These systems help users find items they want to buy from a
business. Recommender systems benefit users by enabling them to find items they like.
Conversely, they help the business by generating more sales. Recommender systems are rapidly
becoming a crucial tool in E-commerce on the Web. Recommender systems are being stressed by
the huge volume of user data in existing corporate databases, and will be stressed even more by
the increasing volume of user data available on the Web. New technologies are needed that can
dramatically improve the scalability of recommender systems. Recommender systems open new
opportunities for retrieving personalized information on the Internet. They also help to
alleviate the problem of information overload, a very common phenomenon in information
retrieval systems, and give users access to products and services that are not readily
available to them on the system.
Chapter 7
Future Scope
Cosine similarity does not work well when there are not enough ratings for a movie or
when a user's rating for some movie is exceptionally high or low. As an improvement on this
project, other methods such as adjusted cosine similarity can be used to compute similarity.
Adjusted cosine similarity, like cosine similarity, is measured by normalizing the
user vectors Ux and Uy and computing the cosine of the angle between them. However, unlike
cosine similarity, when computing the dot product of the two user vectors, adjusted cosine
similarity uses the deviation between each of the user's item ratings, denoted Ru, and their
average item rating, denoted R̄u, in place of the user's raw item rating. In the near future,
the system will be deployed on an Apache server and published on the internet. Datasets will
be updated continuously, and the system will make online rating predictions for users whose
habits change day by day. As a result, it can closely follow current user tastes. Web services
in particular suffer from producing recommendations of millions of items to millions