
A NEWS READER WEB APP (Machine Learning)
SUBMITTED BY: RAHUL BANSAL AND TUSHAR BAHETI
OBJECTIVE OF PROJECT:

• To build a web application that generates a video from the content of a news article.
• The app provides a pictorial representation of the article, which can be more informative than a block of plain text.
• Presenting the news as a video also helps users understand it more easily.
DESCRIPTION:

• The app will contain the following features:

• Scraping the news article from the URL provided by the user.
• Summarizing the news article using NLP.
• Downloading images from Google based on the summary.
• Adding subtitles to the images.
• Adding audio that reads the subtitles aloud.
• Generating a video from the images and audio.
• Making a web page to take input from the user.
TOOLS AND LIBRARIES:

• Tools/Platform
   1. Python, NLP
   2. Jupyter Notebook (Anaconda distribution)
   3. Libraries: BeautifulSoup, Pillow, OpenCV, gTTS
• Frontend
   1. HTML, CSS, JavaScript
• Backend
   1. Flask
Software Requirements:

• Python version >= 3.5
• The libraries listed above must be installed.
• Internet connectivity
Project Description:

• The app starts with the user providing the URL of a news article.
• Currently, only Hindustan Times article URLs are supported.
• Beautiful Soup is used to scrape the article content from the given URL.
• Text summarization is performed using natural language processing (NLP).
• Images are then downloaded from Google Images based on the summary text.
• Subtitles are added to the downloaded images.
• Audio is generated from the summarized text.
• The images and audio are combined to generate the video, which is the final product.
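The input step above can be sketched as a small Flask page. This is a minimal sketch, assuming Flask as listed under Backend; the route, the form field name url, the inline template and the generate_video placeholder are hypothetical. Only the restriction to Hindustan Times URLs comes from the description above.

```python
from urllib.parse import urlparse

from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline template for illustration only; a real app would use templates/index.html.
FORM_HTML = """
<form method="post">
  <input type="url" name="url" placeholder="Hindustan Times article URL" required>
  <button type="submit">Generate video</button>
</form>
<p>{{ message }}</p>
"""


def is_supported_url(url: str) -> bool:
    """Accept only http(s) URLs on hindustantimes.com or its subdomains."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    return parsed.scheme in ("http", "https") and (
        host == "hindustantimes.com" or host.endswith(".hindustantimes.com")
    )


def generate_video(url: str) -> str:
    """Hypothetical placeholder for scrape -> summarize -> images -> audio -> video."""
    return "output.mp4"


@app.route("/", methods=["GET", "POST"])
def index():
    message = ""
    if request.method == "POST":
        url = request.form.get("url", "").strip()
        if not is_supported_url(url):
            message = "Only Hindustan Times article URLs are supported."
        else:
            message = f"Video generated: {generate_video(url)}"
    return render_template_string(FORM_HTML, message=message)
```

Saved as app.py, this can be started with Flask's development server (flask --app app run, Flask 2.2 or later). Validating the domain before any scraping keeps unsupported sites from reaching the parser.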
Newspaper Headline:

https://www.hindustantimes.com/india-news/pm-modi-saudi-prince-mohammed-bin-salman-hold-bilateral-talks-statement-shortly/story-xK1ug5C0xlwJF4v3JrQPhP.html
Current Progress:

• We have chosen the Hindustan Times website for news article scraping using BeautifulSoup in Python.
• We have also implemented the web scraping component to extract the text from the news article.
• We have also implemented text summarization using natural language processing.
• Using the summarized text, we search for images on Google with the google_image library.
• We then use these images to make a video.
First Module:

• The first module of our project scrapes the news article content from the given URL.
• For this we use BeautifulSoup to scrape the text from the news article.
What is BeautifulSoup and how to use it:

• Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup such as unclosed tags (hence its name, after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
• Beautiful Soup 4 was available for both Python 2.7 and Python 3; the last release supporting Python 2 was 4.9.3, so current releases require Python 3.
• To use BeautifulSoup, first install the package with pip (pip install beautifulsoup4) or conda (conda install beautifulsoup4).
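The scraping step could look roughly like the sketch below. It assumes that the headline is the first h1 element and that the article body sits in p tags; the real Hindustan Times markup may need a more specific selector. The function names fetch_html and extract_article are hypothetical, and the page is downloaded with the standard library's urllib rather than any particular HTTP client.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download the raw HTML of a page (needs internet access)."""
    # Some news sites reject urllib's default user agent, so send a browser-like one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_article(html: str) -> dict:
    """Return the headline and paragraph text of an article page.

    Assumption: headline = first <h1>, body = all non-empty <p> elements.
    """
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    title = h1.get_text(strip=True) if h1 else ""
    paragraphs = [p.get_text().strip() for p in soup.find_all("p")]
    body = "\n".join(p for p in paragraphs if p)
    return {"title": title, "body": body}
```

Usage: article = extract_article(fetch_html(url)). Keeping the download separate from the parsing means the parser can be tested on saved HTML without a network connection.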
Second Module

• The second module of our project is based on natural language processing (NLP).
• NLP is used to summarize the text.
• The summarized text is used to fetch relevant images from the web.
• For each summary sentence we fetch images from the web, and these images are merged to form a video.
What is Text Summarization?

• Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).
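The report does not state which summarization algorithm is used. As one illustration, the sketch below shows a simple frequency-based extractive method: each sentence is scored by the normalized frequencies of its non-stop words and the top sentences are kept in their original order. The function name and the small stop-word list are assumptions for this example.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop-word list; NLTK or spaCy ship much fuller ones.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for", "with",
    "is", "are", "was", "were", "be", "it", "that", "this", "as", "at", "by", "from",
}


def summarize(text: str, num_sentences: int = 3) -> List[str]:
    """Frequency-based extractive summary: keep top-scoring sentences in original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    if not words:
        return sentences[:num_sentences]
    counts = Counter(words)
    top = max(counts.values())
    # Normalize so the most frequent content word has weight 1.0.
    weights = {w: c / top for w, c in counts.items()}

    def score(sentence: str) -> float:
        return sum(weights.get(t, 0.0) for t in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore the article's sentence order
    return [sentences[i] for i in keep]
```

Each returned sentence can then serve both as a subtitle and as an image search query. Summing word weights favors longer sentences; dividing by sentence length is a common variation.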
Third Module

• The text summarized in the second module is the input to this module.
• We use the google_image library to search for images on Google based on the summary.
• From these images we make the video that is shown to the user as output.
• From the summarized text we generate audio using the gTTS library.
PILLOW

• Python Imaging Library (abbreviated as PIL) is a free library for the Python programming language that adds support for opening, manipulating, and saving many different image file formats. It is available for Windows, macOS and Linux. The last release of the original PIL, version 1.1.7, came out in September 2009 and supported Python 1.5.2–2.7. Pillow is the actively maintained fork of PIL; it adds Python 3 support and is the version installed today with pip install Pillow.
OPEN CV

• OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez (which was later acquired by Intel). The library is cross-platform and free to use under an open-source license (BSD up to version 4.4, Apache 2.0 from version 4.5.0).

• OpenCV's dnn module can also load models trained with deep learning frameworks such as TensorFlow, Torch/PyTorch and Caffe.
GTTS

• There are several APIs available to convert text to speech in Python. One of them is gTTS (Google Text-to-Speech), a Python library and command-line tool that interfaces with Google Translate's text-to-speech API. gTTS is easy to use: it converts the entered text into audio, which can be saved as an MP3 file.
• gTTS supports several languages, including English, Hindi, Tamil, French, German and many more. The speech can be delivered at either of two speeds, normal or slow. However, gTTS does not provide a way to choose a different voice for the generated audio.
