
Recommendation Engine for Wikipedia Article

Seminar and Progress Report


Submitted in the partial fulfillment of the requirements
For the award of the degree of

Bachelor of Technology
in
Information Technology

Submitted by:
Deepanshu Chambok
(06916401514)

Under the guidance of:


Mrs. Anju Saha
(Associate Professor, USICT)

University School of Information and Communication Technology


GGS Indraprastha University, New Delhi
(2014-2018)
DECLARATION

I hereby certify that the work which is being submitted in this Major Project report entitled
Recommendation Engine for Wikipedia Article in partial fulfillment of the degree of
Bachelor of Technology in Information Technology submitted at the University School
of Information and Communication Technology, Guru Gobind Singh Indraprastha
University, Sector 16-C, Dwarka, New Delhi is an authentic report of my own work
carried out under the supervision of Mrs. Anju Saha, USICT, GGSIPU, Delhi.

Date: Student Name: Deepanshu Chambok


B.Tech (IT) – 8th Semester
Enrol. No.: 06916401514
CERTIFICATE

This is to certify that the Major Project report entitled Recommendation Engine for
Wikipedia Article submitted by Mr. Deepanshu Chambok, Enrol. No. 06916401514 is an
authentic work carried out by him under my guidance. The matter embodied in this Project
Report has not been submitted earlier for the award of any degree or diploma to the best of
my knowledge and belief.

Date: Mrs. Anju Saha


Associate Professor
USICT, GGSIPU
ACKNOWLEDGEMENT

I am sincerely thankful to all who provided me with invaluable assistance during this
project and helped me develop this project report. I take this opportunity to express
my gratitude to my mentor for her guidance and for helping me build a strong conceptual
understanding of the project, which made the project extremely interesting to me and
gave me the impetus to work hard. With her experience and knowledge, she guided me ably
towards the successful completion of the project.

I also express my sincere thanks to the University School of Information and Communication
Technology (USICT), Guru Gobind Singh Indraprastha University, for providing me with this
opportunity and allowing me to work on this project. I would also like to thank my
project coordinators and the faculty of my institution.

Deepanshu Chambok
(06916401514)
TABLE OF CONTENTS

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
1. TECHNOLOGY USED
1.1. PYTHON LANGUAGE
1.2. PYCHARM IDE
1.3. FLASK
1.4. JAVASCRIPT
1.5. HTML
2. WORKING WITH PYTHON
2.1. PACKAGES INSTALLATION
2.1.1. BEAUTIFUL SOUP PACKAGE
2.1.2. REQUEST MODULE
2.1.3. DATETIME MODULE
2.1.4. CSV MODULE
2.1.5. FLASK MODULE
3. REFERENCES
ABSTRACT

When someone searches for a topic on Wikipedia, they get only the article on that topic
as a result. If they want more on the topic, such as the latest news, tweets or other
posts about it, they have to search for these separately.

In this project, I am building a Recommendation Engine that displays the Wikipedia
article for the topic entered in the search box. Alongside the article, it displays
trending Twitter feeds and other popular and trending links related to the searched
topic. This is done by crawling the links and tweets into the system and then displaying
them in the same window. This benefits users by saving time: a single search shows the
relevant tweets and other links next to the article.
1. TECHNOLOGY USED

1.1. PYTHON LANGUAGE

Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant whitespace.
It provides constructs that enable clear programming on both small and large scales.

Python features a dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative, functional and
procedural, and has a large and comprehensive standard library.

Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open source software and has a community-based
development model, as do nearly all of its variant implementations. CPython is managed
by the non-profit Python Software Foundation.

Uses
Large organizations that use Python include Wikipedia, Google, Yahoo!, CERN, NASA,
Facebook, Amazon, Instagram and Spotify, as well as some smaller entities like
ILM and ITA. The social news networking site Reddit is written largely in Python.

Python can serve as a scripting language for web applications, e.g., via mod_wsgi for the
Apache web server. With Web Server Gateway Interface, a standard API has evolved to
facilitate these applications. Web frameworks like Django, Pylons, Pyramid, TurboGears,
web2py, Tornado, Flask, Bottle and Zope support developers in the design and
maintenance of complex applications. Pyjs and IronPython can be used to develop the
client-side of Ajax-based applications. SQLAlchemy can be used as a data mapper to a
relational database. Twisted is a framework to program communications between
computers, and is used (for example) by Dropbox.
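
To make the Web Server Gateway Interface mentioned above more concrete, the following is a minimal sketch of a WSGI application using only the standard library. The names `application` and `call_app` are illustrative and not part of the project; `call_app` simply invokes the application once without starting a real server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI application: return a plain-text greeting."""
    body = b"Hello from a WSGI application"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]  # WSGI expects an iterable of byte strings


def call_app(app, path="/"):
    """Invoke a WSGI app once without a server; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Frameworks such as Flask expose the same kind of callable, which is why any WSGI-compatible server can host them.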
1.2. PYCHARM IDE

PyCharm is an Integrated Development Environment (IDE) used in computer
programming, specifically for the Python language. It is developed by the Czech company
JetBrains. It provides code analysis, a graphical debugger, an integrated unit tester,
integration with version control systems (VCSes), and supports web development with
Django.

PyCharm is cross-platform, with Windows, macOS and Linux versions. The Community
Edition is released under the Apache License; there is also a Professional Edition
with extra features, released under a proprietary license.

Features of PyCharm:
• Coding assistance and analysis, with code completion, syntax and error
  highlighting, linter integration, and quick fixes
• Project and code navigation: specialized project views, file structure views and
  quick jumping between files, classes, methods and usages
• Python refactoring: including rename, extract method, introduce variable, introduce
  constant, pull up, push down and others
• Support for web frameworks: Django, web2py and Flask
• Integrated Python debugger
• Integrated unit testing, with line-by-line code coverage
• Google App Engine Python development
• Version control integration: unified user interface for Mercurial, Git, Subversion,
  Perforce and CVS with changelists and merge

Uses
With PyCharm you can develop applications in Python. In addition, in the Professional
edition, one can develop Django, Flask and Pyramid applications. Also, it fully supports
HTML (including HTML5), CSS, JavaScript, and XML: these languages are bundled in
the IDE via plugins and are switched on for you by default. Support for other
languages and frameworks can also be added via plugins.
1.3. FLASK

Flask is a micro web framework written in Python and based on the Werkzeug toolkit and
Jinja2 template engine. It is BSD licensed.
Applications that use the Flask framework include Pinterest,[3] LinkedIn,[4] and the
community web page for Flask itself.[5]

Flask is called a micro framework because it does not require particular tools or libraries.
It has no database abstraction layer, form validation, or any other components where pre-
existing third-party libraries provide common functions. However, Flask supports
extensions that can add application features as if they were implemented in Flask itself.
Extensions exist for object-relational mappers, form validation, upload handling, various
open authentication technologies and several common framework related tools. Extensions
are updated far more regularly than the core Flask program.

Features:
• Contains development server and debugger
• Integrated support for unit testing
• RESTful request dispatching
• Uses Jinja2 templating
• Support for secure cookies (client side sessions)
• 100% WSGI 1.0 compliant
• Unicode-based
• Extensive documentation
• Google App Engine compatibility
• Extensions available to enhance features desired
1.4. JAVASCRIPT

JavaScript (JS) is a high-level, interpreted programming language. It is also
characterized as dynamic, weakly typed, prototype-based and multi-paradigm.

Alongside HTML and CSS, JavaScript is one of the three core technologies of the World
Wide Web. JavaScript enables interactive web pages and thus is an essential part of web
applications. The vast majority of websites use it, and all major web browsers have a
dedicated JavaScript engine to execute it.
As a multi-paradigm language, JavaScript supports event-driven, functional, and
imperative (including object-oriented and prototype-based) programming styles. It has an
API for working with text, arrays, dates, regular expressions, and basic manipulation of the
DOM, but the language itself does not include any I/O, such as networking, storage, or
graphics facilities, relying for these upon the host environment in which it is embedded.

Initially only implemented client-side in web browsers, JavaScript engines are now
embedded in many other types of host software, including server-side in web servers and
databases, and in non-web programs such as word processors and PDF software, and in
runtime environments that make JavaScript available for writing mobile and desktop
applications, including desktop widgets.

Uses
The most common use of JavaScript is to add client-side behavior to HTML pages, also
known as Dynamic HTML (DHTML). Scripts are embedded in or included from HTML
pages and interact with the Document Object Model (DOM) of the page. Some simple
examples of this usage are:

• Loading new page content or submitting data to the server via Ajax without
  reloading the page (for example, a social network might allow the user to post
  status updates without leaving the page).
• Animation of page elements, fading them in and out, resizing them, moving them,
  etc.
• Interactive content, for example games, and playing audio and video.
• Validating input values of a Web form to make sure that they are acceptable before
  being submitted to the server.
• Transmitting information about the user's reading habits and browsing activities to
  various websites. Web pages frequently do this for Web analytics, ad tracking,
  personalization or other purposes.

1.5. HTML
Hypertext Markup Language (HTML) is the standard markup language for creating web
pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a
triad of cornerstone technologies for the World Wide Web.[4]

Web browsers receive HTML documents from a web server or from local storage and
render the documents into multimedia web pages. HTML describes the structure of a web
page semantically and originally included cues for the appearance of the document.

HTML elements are the building blocks of HTML pages. With HTML constructs, images
and other objects such as interactive forms may be embedded into the rendered page.
HTML provides a means to create structured documents by denoting structural semantics
for text such as headings, paragraphs, lists, links, quotes and other items. HTML elements
are delineated by tags, written using angle brackets. Tags such as <img /> and <input />
directly introduce content into the page. Other tags such as <p> surround and provide
information about document text and may include other tags as sub-elements. Browsers do
not display the HTML tags, but use them to interpret the content of the page.

HTML can embed programs written in a scripting language such as JavaScript, which
affects the behavior and content of web pages. Inclusion of CSS defines the look and
layout of content.
Uses
HTML is used to create web pages along with CSS and JavaScript. Site authors use HTML
to format text as titles and headings, to arrange graphics on a webpage, to link to different
pages within a website, and to link to different websites.

It is a set of codes that a website author inserts into a plain text file to format the content.
The author inserts HTML tags, or commands, before and after words or phrases to indicate
their format and location on the page. HTML tags are also used to add tables, lists, images,
music, and other elements to a webpage.
2. WORKING WITH PYTHON

2.1. PACKAGES INSTALLATION

2.1.1. BeautifulSoup Package


Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works
with your favorite parser to provide idiomatic ways of navigating, searching, and
modifying the parse tree. It commonly saves programmers hours or days of work.
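
As an illustration of navigating and searching a parse tree, here is a minimal sketch that extracts a heading and the links from a small HTML snippet. The snippet and the function name `extract_title_and_links` are made up for this example and are not taken from the project; the parser used is `html.parser`, which ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1 id="firstHeading">Python (programming language)</h1>
  <p>See also <a href="/wiki/Flask">Flask</a> and
     <a href="/wiki/Django">Django</a>.</p>
</body></html>
"""


def extract_title_and_links(html):
    """Return the page heading and a list of (text, href) pairs."""
    soup = BeautifulSoup(html, "html.parser")  # build the parse tree
    heading = soup.find("h1")                   # first <h1>, or None
    title = heading.get_text(strip=True) if heading else ""
    links = [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # only anchors with an href
    ]
    return title, links
```

The same two calls, `find` and `find_all`, cover most of what a simple crawler needs when collecting links from a page.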

2.1.2. Request Module


Requests is an Apache2 Licensed HTTP library, written in Python. It is designed to be
used by humans to interact with HTTP services. This means you don't have to manually add
query strings to URLs, or form-encode your POST data.
Requests will allow you to send HTTP/1.1 requests using Python. With it, you can add
content like headers, form data, multipart files, and parameters via simple Python data
structures such as dictionaries. It also allows you to access the response data in the
same way.
To work with the Requests library in Python, you must import the appropriate module.
You can do this simply by adding the following code at the beginning of your script:

import requests
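
The point about not adding query strings by hand can be sketched as follows. The endpoint URL, the `search` parameter name and both function names are assumptions chosen for illustration, not details taken from the project. `build_search_url` only prepares a request and does not touch the network; `fetch_page` shows how a page would typically be downloaded.

```python
import requests


def build_search_url(topic):
    """Build (but do not send) a GET request and return its final URL."""
    # Endpoint and parameter name are illustrative assumptions.
    request = requests.Request(
        "GET",
        "https://en.wikipedia.org/w/index.php",
        params={"search": topic},
    )
    prepared = request.prepare()  # Requests encodes the query string here
    return prepared.url


def fetch_page(url):
    """Download a page and return its text; raises on HTTP error codes."""
    response = requests.get(url, timeout=10)  # always set a timeout
    response.raise_for_status()
    return response.text
```

Passing the parameters as a dictionary lets the library handle the encoding of spaces and special characters in the search topic.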

2.1.3. DateTime Module


The datetime module supplies classes for manipulating dates and times in both simple and
complex ways. While date and time arithmetic is supported, the focus of the
implementation is on efficient attribute extraction for output formatting and manipulation.
To work with the datetime library in Python, the modules are imported at the beginning
of the code:

import time
import datetime
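
A short sketch of both uses described above, date arithmetic and output formatting, is given below. The function names and the 24-hour threshold are illustrative assumptions, for example for deciding whether a crawled item still counts as recent.

```python
import datetime


def format_crawl_time(moment):
    """Format a datetime for display next to a crawled link."""
    return moment.strftime("%Y-%m-%d %H:%M")


def is_recent(crawled_at, now, max_age_hours=24):
    """Return True if the item was crawled within the last max_age_hours."""
    # Subtracting two datetimes yields a timedelta, which can be compared.
    return now - crawled_at <= datetime.timedelta(hours=max_age_hours)
```

In a running application, `now` would usually come from `datetime.datetime.now()`; it is passed in explicitly here so the functions are easy to check.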
2.1.4. CSV Module
To import or export spreadsheet and database data in Python, you can use the csv
module, which reads and writes the CSV (Comma Separated Values) format.
The csv module provides the following functions:

• csv.reader
• csv.writer
• csv.register_dialect
• csv.unregister_dialect
• csv.get_dialect
• csv.list_dialects
• csv.field_size_limit

To work with the csv library in Python, the module is imported at the beginning of the
code:

import csv
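
The two most commonly used functions, `csv.writer` and `csv.reader`, can be sketched as below. The column names and the helper names `write_links` and `read_links` are assumptions for illustration; the functions accept any text stream, so the same code works with a file opened using `newline=""` or with an in-memory buffer.

```python
import csv
import io


def write_links(rows, stream):
    """Write (title, url) rows to a text stream with a header line."""
    writer = csv.writer(stream)
    writer.writerow(["title", "url"])
    writer.writerows(rows)


def read_links(stream):
    """Read the rows back as a list of (title, url) tuples."""
    reader = csv.reader(stream)
    next(reader, None)  # skip the header line
    return [tuple(row) for row in reader]


# Usage with an in-memory buffer instead of a file on disk.
example_rows = [("Python, the language", "https://www.python.org/")]
buffer = io.StringIO(newline="")
write_links(example_rows, buffer)
buffer.seek(0)
restored = read_links(buffer)
```

Fields containing commas, such as the title in the usage example, are quoted by the writer and restored by the reader.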

2.1.5. FLASK
Flask is a micro web framework written in Python. Flask is called a micro framework
because it does not require particular tools or libraries. It has no database abstraction layer,
form validation, or any other components where pre-existing third-party libraries provide
common functions. However, Flask supports extensions that can add application features
as if they were implemented in Flask itself. Extensions exist for object-relational mappers,
form validation, upload handling, various open authentication technologies.

Features:

• Contains development server and debugger
• Integrated support for unit testing
• RESTful request dispatching
• Support for secure cookies (client side sessions)
• Unicode-based
• Google App Engine compatibility

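Example Code of Flask:

from flask import Flask

# Create the application object.
app = Flask(__name__)

# Map the root URL to the hello() view function.
@app.route("/")
def hello():
    return "Hello World!"

# Start the development server only when the file is run directly.
if __name__ == "__main__":
    app.run()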
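
Building on the example above, the following sketch shows how a search box could pass a topic to a Flask view through the query string. The route `/search`, the parameter name `q` and the response text are hypothetical and are not taken from the project's code. The topic is escaped before being returned because Flask sends plain strings to the browser as HTML.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)


@app.route("/search")
def search():
    """Read the topic from the query string, e.g. /search?q=Python."""
    topic = request.args.get("q", "").strip()
    if not topic:
        # A (body, status) tuple sets the HTTP status code.
        return "Please provide a topic, e.g. /search?q=Python", 400
    return f"Results for: {escape(topic)}"


if __name__ == "__main__":
    app.run()
```

The view can be exercised without a browser through Flask's built-in test client, for example `app.test_client().get("/search?q=Python")`.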
BIBLIOGRAPHY

[1] Zed Shaw: Learn Python the Hard Way
https://learnpythonthehardway.org/book/ex51.html
[2] FLASK Tutorial
https://www.tutorialspoint.com/flask/index.htm
[3] HTML, CSS, JAVASCRIPT Tutorial
https://www.w3schools.com/js/default.asp
[4] Search Engine Python
http://www.zackgrossbart.com/hackito/search-engine-python/
[5] Python Libraries
https://docs.python.org/3/library/
[6] Flask A Micro Framework
http://flask.pocoo.org/
[7] Python Web Framework
https://wiki.python.org/moin/WebFrameworks
[8] Zip In Python
https://www.geeksforgeeks.org/zip-in-python/
[9] Python Sets
https://www.w3resource.com/python/python-sets.php
