Bachelor of Technology
in
Information Technology
Submitted by:
Deepanshu Chambok
(06916401514)
DECLARATION
I hereby certify that the work being submitted in this Major Project report, entitled
"Recommendation Engine for Wikipedia Article", in partial fulfillment of the requirements for the
degree of Bachelor of Technology in Information Technology at the University School
of Information and Communication Technology, Guru Gobind Singh Indraprastha
University, Sector 16-C, Dwarka, New Delhi, is an authentic record of my own work
carried out under the supervision of Mrs. Anju Saha, USICT, GGSIPU, Delhi.
CERTIFICATE
This is to certify that the Major Project report entitled "Recommendation Engine for
Wikipedia Article", submitted by Mr. Deepanshu Chambok (Enrol. No. 06916401514), is an
authentic work carried out by him under my guidance. To the best of my knowledge and
belief, the matter embodied in this Project Report has not been submitted earlier for the
award of any degree or diploma.
ACKNOWLEDGEMENT
I am sincerely thankful to everyone who provided invaluable assistance during this
project and helped me develop this project report. I take this opportunity to express
my gratitude to my mentor for her guidance and for helping me build a strong
conceptual understanding of the project, which made the work extremely interesting
to me and gave me the impetus to work hard. With her experience and knowledge,
she guided me ably towards the successful completion of the project.
Deepanshu Chambok
(06916401514)
TABLE OF CONTENTS
DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
1. TECHNOLOGY USED
1.1. PYTHON LANGUAGE
1.2. PYCHARM IDE
1.3. FLASK
1.4. JAVASCRIPT
1.5. HTML
2. WORKING WITH PYTHON
2.1. PACKAGES INSTALLATION
2.1.1. BEAUTIFUL SOUP PACKAGE
2.1.2. REQUEST MODULE
2.1.3. DATETIME MODULE
2.1.4. CSV MODULE
2.1.5. FLASK MODULE
3. REFERENCES
ABSTRACT
When someone searches for a topic on Wikipedia, they get only the article on that topic
as a result. If they want more on that topic, such as the latest news, tweets or other
posts about it, they have to search for these separately.
In this project, I am building a Recommendation Engine that displays the Wikipedia
article on the topic entered in the search box. Alongside the article, it displays
trending Twitter feeds and other popular links related to the topic that was searched
on Wikipedia. This is done by crawling the relevant links and tweets and then
displaying them in the same window. This benefits users by saving time: a single
click of the search button shows the article together with related tweets and links.
1. TECHNOLOGY USED
1.1. PYTHON LANGUAGE
Python features a dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative, functional and
procedural, and has a large and comprehensive standard library.
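As a brief illustration of these features (a sketch written for this report, not code from the project; all names are hypothetical), the snippet below rebinds one variable to values of different types and computes the same sum in imperative, functional and object-oriented styles.

```python
from functools import reduce

# Dynamic typing: the same name can be rebound to values of different types;
# types are checked at run time rather than declared in advance.
value = 42
value = "forty-two"


# Imperative/procedural style: an explicit loop with a mutable accumulator.
def total_imperative(numbers):
    total = 0
    for n in numbers:
        total += n
    return total


# Functional style: the same computation expressed with reduce and a lambda.
def total_functional(numbers):
    return reduce(lambda acc, n: acc + n, numbers, 0)


# Object-oriented style: state and behaviour bundled together in a class.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self  # returning self allows chained calls


print(total_imperative([1, 2, 3]), total_functional([1, 2, 3]))  # 6 6
print(Accumulator().add(1).add(2).total)                          # 3
```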
Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open source software and has a community-based
development model, as do nearly all of its variant implementations. CPython is managed
by the non-profit Python Software Foundation.
Uses
Large organizations that use Python include Wikipedia, Google, Yahoo!, CERN, NASA,
Facebook, Amazon, Instagram, Spotify and some smaller entities like
ILM and ITA. The social news networking site Reddit is written entirely in Python.
Python can serve as a scripting language for web applications, e.g., via mod_wsgi for the
Apache web server. With Web Server Gateway Interface, a standard API has evolved to
facilitate these applications. Web frameworks like Django, Pylons, Pyramid, TurboGears,
web2py, Tornado, Flask, Bottle and Zope support developers in the design and
maintenance of complex applications. Pyjs and IronPython can be used to develop the
client-side of Ajax-based applications. SQLAlchemy can be used as data mapper to a
relational database. Twisted is a framework to program communications between
computers, and is used (for example) by Dropbox.
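To make the WSGI idea concrete, here is a minimal sketch of a WSGI application served by the standard library's reference server (wsgiref). It is not part of the project; the function name application and port 8000 are illustrative choices. Any WSGI-compliant server, such as mod_wsgi, could host the same callable.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A minimal WSGI application: the server calls it with the request
    environment (a dict) and a start_response callback."""
    path = environ.get("PATH_INFO", "/")
    body = f"You requested {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The response body is an iterable of byte strings.
    return [body]


if __name__ == "__main__":
    # Serve on localhost:8000 using the reference server from the standard library.
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Because the interface is just a callable, frameworks such as Flask build on the same convention: a Flask app object is itself a WSGI application.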
1.2. PYCHARM IDE
PyCharm is cross-platform, with Windows, macOS and Linux versions. The Community
Edition is released under the Apache License, and a Professional Edition with extra
features is released under a proprietary license.
Features of PyCharm:
Coding assistance and analysis, with code completion, syntax and error
highlighting, linter integration, and quick fixes
Project and code navigation: specialized project views, file structure views and
quick jumping between files, classes, methods and usages
Python refactoring: including rename, extract method, introduce variable, introduce
constant, pull up, push down and others
Support for web frameworks: Django, web2py and Flask
Integrated Python debugger
Integrated unit testing, with line-by-line code coverage
Google App Engine Python development
Version control integration: unified user interface for Mercurial, Git, Subversion,
Perforce and CVS with changelists and merge.
Uses
With PyCharm you can develop applications in Python. In addition, the Professional
Edition supports developing Django, Flask and Pyramid applications. It also fully supports
HTML (including HTML5), CSS, JavaScript and XML: these languages are bundled in
the IDE as plugins that are enabled by default. Support for other languages and
frameworks can also be added via plugins.
1.3. FLASK
Flask is a micro web framework written in Python and based on the Werkzeug toolkit and
Jinja2 template engine. It is BSD licensed.
Applications that use the Flask framework include Pinterest,[3] LinkedIn,[4] and the
community web page for Flask itself.[5]
Flask is called a micro framework because it does not require particular tools or libraries.
It has no database abstraction layer, form validation, or any other components where pre-
existing third-party libraries provide common functions. However, Flask supports
extensions that can add application features as if they were implemented in Flask itself.
Extensions exist for object-relational mappers, form validation, upload handling, various
open authentication technologies and several common framework-related tools. Extensions
are updated far more regularly than the core Flask program.
Features:
Contains development server and debugger
Integrated support for unit testing
RESTful request dispatching
Uses Jinja2 templating
Support for secure cookies (client side sessions)
100% WSGI 1.0 compliant
Unicode-based
Extensive documentation
Google App Engine compatibility
Extensions available to enhance features desired
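As a minimal sketch of two of these features, RESTful request dispatching and unit testing with Flask's built-in test client, the application below handles GET and POST on the same URL. The /topics endpoint, the function names and the in-memory list are hypothetical and not part of the project.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store; a real application would use a database.
topics = []


# RESTful dispatching: the same URL is handled by different views per HTTP method.
@app.route("/topics", methods=["GET"])
def list_topics():
    return jsonify(topics)


@app.route("/topics", methods=["POST"])
def add_topic():
    name = request.form.get("name", "").strip()
    if not name:
        return jsonify(error="name is required"), 400
    topics.append(name)
    return jsonify(topics), 201


if __name__ == "__main__":
    # Built-in development server with the interactive debugger enabled.
    app.run(debug=True)
```

Calling app.test_client() returns a client that sends requests to the application without starting a server, so a unit test can check, for example, that posting a form with name=Python returns status 201.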
1.4. JAVASCRIPT
Alongside HTML and CSS, JavaScript is one of the three core technologies of the World
Wide Web. JavaScript enables interactive web pages and thus is an essential part of web
applications. The vast majority of websites use it, and all major web browsers have a
dedicated JavaScript engine to execute it.
As a multi-paradigm language, JavaScript supports event-driven, functional, and
imperative (including object-oriented and prototype-based) programming styles. It has an
API for working with text, arrays, dates, regular expressions, and basic manipulation of the
DOM, but the language itself does not include any I/O, such as networking, storage, or
graphics facilities, relying for these upon the host environment in which it is embedded.
Initially only implemented client-side in web browsers, JavaScript engines are now
embedded in many other types of host software, including server-side in web servers and
databases, and in non-web programs such as word processors and PDF software, and in
runtime environments that make JavaScript available for writing mobile and desktop
applications, including desktop widgets.
Uses
The most common use of JavaScript is to add client-side behavior to HTML pages, also
known as Dynamic HTML (DHTML). Scripts are embedded in or included from HTML
pages and interact with the Document Object Model (DOM) of the page. Some simple
examples of this usage are:
Loading new page content or submitting data to the server via Ajax without
reloading the page (for example, a social network might allow the user to post
status updates without leaving the page).
Animation of page elements, fading them in and out, resizing them, moving them,
etc.
Interactive content, for example games, and playing audio and video.
Validating input values of a Web form to make sure that they are acceptable before
being submitted to the server.
Transmitting information about the user's reading habits and browsing activities to
various websites. Web pages frequently do this for Web analytics, ad tracking,
personalization or other purposes.
1.5. HTML
Hypertext Markup Language (HTML) is the standard markup language for creating web
pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a
triad of cornerstone technologies for the World Wide Web.[4]
Web browsers receive HTML documents from a web server or from local storage and
render the documents into multimedia web pages. HTML describes the structure of a web
page semantically and originally included cues for the appearance of the document.
HTML elements are the building blocks of HTML pages. With HTML constructs, images
and other objects such as interactive forms may be embedded into the rendered page.
HTML provides a means to create structured documents by denoting structural semantics
for text such as headings, paragraphs, lists, links, quotes and other items. HTML elements
are delineated by tags, written using angle brackets. Tags such as <img /> and <input />
directly introduce content into the page. Other tags such as <p> surround and provide
information about document text and may include other tags as sub-elements. Browsers do
not display the HTML tags, but use them to interpret the content of the page.
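Programs can interpret tags in the same way a browser does. The sketch below uses the standard library's html.parser to collect link targets and heading text, the kind of extraction a link crawler needs. The project's own contents list Beautiful Soup for this task; the standard-library parser is used here only to keep the example dependency-free, and the class name LinkCollector is hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags and the text inside heading tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.headings = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        # Text between tags; kept only while inside a heading.
        if self._in_heading:
            self.headings[-1] += data


page = '<h1>Flask</h1><p>See <a href="/wiki/Python">Python</a> <img src="logo.png" /></p>'
parser = LinkCollector()
parser.feed(page)
parser.close()
print(parser.links)     # ['/wiki/Python']
print(parser.headings)  # ['Flask']
```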
HTML can embed programs written in a scripting language such as JavaScript, which
affects the behavior and content of web pages. Inclusion of CSS defines the look and
layout of content.
Uses
HTML is used to create web pages along with CSS and JavaScript. Site authors use HTML
to format text as titles and headings, to arrange graphics on a webpage, to link to different
pages within a website, and to link to different websites.
HTML is a set of codes that a website author inserts into a plain text file to format the content.
The author inserts HTML tags, or commands, before and after words or phrases to indicate
their format and location on the page. HTML tags are also used to add tables, lists, images,
music, and other elements to a webpage.
2. WORKING WITH PYTHON
2.1.2. REQUEST MODULE
import requests
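A minimal sketch of fetching a Wikipedia article with the requests module. It assumes English Wikipedia article URLs of the form https://en.wikipedia.org/wiki/Title with spaces replaced by underscores; the helper names and the User-Agent string are hypothetical, not the project's code.

```python
from urllib.parse import quote

import requests

WIKI_BASE = "https://en.wikipedia.org/wiki/"
# Hypothetical User-Agent; Wikimedia asks API and crawler clients to identify themselves.
HEADERS = {"User-Agent": "WikiRecommenderSketch/0.1 (student project)"}


def article_url(title):
    """Build an English Wikipedia URL: spaces become underscores and
    other unsafe characters are percent-encoded."""
    return WIKI_BASE + quote(title.strip().replace(" ", "_"))


def fetch_article_html(title, timeout=10):
    """Download an article page and return its HTML text."""
    response = requests.get(article_url(title), headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

Setting a timeout keeps a slow server from blocking the search page indefinitely.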
2.1.3. DATETIME MODULE
import datetime
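A small sketch of how the datetime module could be used to keep only recent crawled items, assuming timestamps are stored as ISO 8601 strings that include a UTC offset. The function names and the 24-hour window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def crawl_timestamp():
    """Current time in UTC as an ISO 8601 string, for stamping crawled items."""
    return datetime.now(timezone.utc).isoformat()


def is_recent(timestamp_iso, now, max_age=timedelta(hours=24)):
    """True if an ISO 8601 timestamp (with UTC offset) is no older than max_age.
    Timestamps in the future relative to `now` are rejected."""
    posted = datetime.fromisoformat(timestamp_iso)
    return timedelta(0) <= now - posted <= max_age


now = datetime(2018, 5, 1, 12, 0, tzinfo=timezone.utc)
print(is_recent("2018-05-01T09:30:00+00:00", now))  # True
print(is_recent("2018-04-29T12:00:00+00:00", now))  # False
```

Working with timezone-aware values throughout avoids errors from comparing naive and aware datetimes.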
2.1.4. CSV MODULE
To import or export spreadsheet and database data in Comma Separated Values (CSV)
format from Python, you can use the built-in csv module.
The csv module provides the following functions:
csv.reader
csv.writer
csv.register_dialect
csv.unregister_dialect
csv.get_dialect
csv.list_dialects
csv.field_size_limit
To work with the csv module in Python, import it at the beginning of the code:
import csv
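A minimal sketch of saving crawled links with csv.writer and reading them back with csv.reader, using csv.register_dialect for a custom dialect. The dialect name "crawl" and the (title, url) row layout are assumptions made for this example; io.StringIO stands in for a file.

```python
import csv
import io

# Hypothetical dialect for the crawler's output: comma-separated, every field quoted.
csv.register_dialect("crawl", delimiter=",", quoting=csv.QUOTE_ALL)


def write_links(rows, stream):
    """Write a header line and then (title, url) rows using csv.writer."""
    writer = csv.writer(stream, dialect="crawl")
    writer.writerow(["title", "url"])
    writer.writerows(rows)


def read_links(stream):
    """Read the rows back with csv.reader, skipping the header line."""
    reader = csv.reader(stream, dialect="crawl")
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]


buffer = io.StringIO()
write_links([("Flask", "https://en.wikipedia.org/wiki/Flask"),
             ("Hello, World", "https://example.org/a")], buffer)
buffer.seek(0)
print(read_links(buffer))
```

Fields containing commas, such as "Hello, World", survive the round trip because they are quoted. With a real file, open it with newline="" as the csv documentation recommends, e.g. open("links.csv", "w", newline="").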
2.1.5. FLASK
Flask is a micro web framework written in Python. Flask is called a micro framework
because it does not require particular tools or libraries. It has no database abstraction layer,
form validation, or any other components where pre-existing third-party libraries provide
common functions. However, Flask supports extensions that can add application features
as if they were implemented in Flask itself. Extensions exist for object-relational mappers,
form validation, upload handling, various open authentication technologies.
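A minimal Flask application:
from flask import Flask

app = Flask(__name__)  # the WSGI application object

@app.route("/")  # map the site root URL to hello()
def hello():
    return "Hello World!"

if __name__ == "__main__":  # start the development server when run directly
    app.run()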
BIBLIOGRAPHY