This application collects articles from different sites and shows them in our panel. In this case, our front-end panel is a mobile application; we chose mobile for client interaction because it is the platform people use most.
In the mobile industry there are two main competitors: Android and iOS. Android is open source and maintained by Google; iOS is not open source (developers cannot modify its code or SDK). Android and iOS are built on different technologies, and it would be difficult for us to learn and maintain both. So we consulted some software engineers with experience in this field and found a technology called hybrid applications.
Nowadays hybrid applications are very popular: Facebook, AliExpress, daraz.pk and many other applications use this concept. Developers write the code once and compile it for both Android and iOS, so this approach lets us write the code only once. Hybrid applications are built on Cordova, the core component that compiles them and packages them as native applications. Native applications are those developed with the SDK provided by the company that develops and maintains the mobile operating system.
Mobile applications mostly depend on a REST API, which makes them dynamic. For our system we chose Laravel, a web framework written in the PHP programming language. PHP is a dynamically typed scripting language, which means we do not have to worry about compiling the code or declaring the data types of functions and variables.
Most of the time, PHP web applications are hosted on Linux servers. The standard PHP development stack is called the LAMP stack: Linux, Apache, MySQL and PHP. All parts of this stack are open source; we will explain each of them in detail.
Now that we have introduced the technologies we will use for this system, let us explain the system in simple words.
We want to crawl data from top news sites and show it under one umbrella. Consider a user who reads news daily from different websites, such as the BBC and ARY News websites. What is a news item? It has a headline and details. So our idea is to crawl some renowned sites, put the results in our database, and write a REST API for our mobile application.
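The shape of what that REST API would hand to the mobile application can be sketched in plain PHP; the field names (headline, description, source_url) are our own illustrative choice, not a fixed schema:

```php
<?php
// Sketch: serialize stored news items into the JSON payload
// the mobile application would consume.
function newsToJson(array $items): string {
    return json_encode(['data' => $items], JSON_UNESCAPED_SLASHES);
}

$items = [
    ['headline'    => 'Example headline',
     'description' => 'A short summary of the story',
     'source_url'  => 'https://www.bbc.com/news/example'],
];
echo newsToJson($items);
```

In a real Laravel application this serialization would live in a controller or API resource, but the payload idea is the same.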
How will this system work? Basically, we write schedulers that run daily in the morning, crawl the sites, take the latest news and put it in our database.
Crawling and storing the full content of other sites would create problems for us, so we decided to store only the headline and a short description of each news item, and to provide a link to the source site for reading the full story, because most people are interested mainly in headlines.
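In Laravel, such a daily morning crawl can be registered in the console kernel's schedule. This is only a sketch of framework configuration; `news:crawl` is a hypothetical Artisan command we would implement ourselves:

```php
// app/Console/Kernel.php (sketch; assumes a custom news:crawl command)
protected function schedule(Schedule $schedule)
{
    // Run the crawler every morning and store the latest headlines.
    $schedule->command('news:crawl')->dailyAt('06:00');
}
```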
A Brief History of Web Crawlers
Web crawlers visit web applications, collect data, and discover new websites from the pages they visit. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to gathering statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web, and the complexity added to web applications, have made crawling a very challenging task. Throughout the history of web crawling, many researchers and industrial groups have addressed different problems and challenges that web crawlers face. Different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl is a challenging question. In addition, capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the various techniques and algorithms used from the early days of crawling up to recent times. We introduce criteria to evaluate the relative performance of web crawlers; based on these criteria, we plot the evolution of web crawlers and compare their performance.
How it Works
A crawler is a type of robot that lives on the WWW (World Wide Web). It is known by many names, such as web spider or automatic indexer, but its function remains the same.
A crawler is built and employed by a search engine to update its web data or to index the web content of other websites. It copies all the pages so that they can later be processed by the search engine, which allows users to find web pages faster. A web crawler also validates links and HTML code, and sometimes extracts other information from the website.
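The core fetch-and-discover step a crawler performs can be sketched in PHP with the built-in DOMDocument class. Real crawlers add politeness delays, robots.txt checks and a URL frontier, which this sketch omits; the sample page is made up:

```php
<?php
// Extract all link targets from an HTML page, as a crawler would
// when discovering new pages to visit.
function extractLinks(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // suppress warnings from messy real-world HTML
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

$page  = '<html><body><a href="/news">News</a><a href="/sport">Sport</a></body></html>';
$links = extractLinks($page); // ['/news', '/sport']
```

Each discovered link would then be queued, fetched, and parsed in turn.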
Robots.txt
By creating a robots.txt file in the root folder of your website on the server, you define rules, such as allow or disallow, that web crawlers must follow. You can write default rules that apply to all bots, or target a specific bot by its User-agent string.
For example, this blocks all crawlers from the entire site:
User-agent: *
Disallow: /
while an empty Disallow line allows all crawlers to access everything:
User-agent: *
Disallow:
[Figure: search-engine crawler architecture — Scheduler, Crawler, URL Parser, Indexing, Index, Search.]
Web Scraping
Web scraping is the collection of information from webpages, extracted using a program and instructions.
A web page contains information in a special language called HTML. When we open a site in our browser, the browser receives HTML, which it understands and renders as the page we see.
The main task in web scraping is to gather the data relevant to our research question out of the large amount of textual data.
Inside the HTML we are usually interested in structured data, since we want to study data that can be used with quantitative methods. Structured elements can be numbers or recurring names, for example countries or addresses.
There are three steps:
First, we extract the unstructured data, which is basically the HTML.
Second, we find out the recurring patterns behind the data we are looking for.
Third, we apply these patterns to the unstructured text to return the structured data.
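The three steps above can be sketched in PHP. Step 1 is simulated here with a string rather than a real download, and the recurring pattern ("every h2 is a headline") is an assumption about the target page:

```php
<?php
// Step 1: the unstructured data -- raw HTML (in practice fetched with cURL).
$html = '<html><body>
    <h2>First headline</h2><p>Details of the first story</p>
    <h2>Second headline</h2><p>Details of the second story</p>
</body></html>';

// Steps 2 and 3: the recurring pattern is that each <h2> holds a headline;
// applying the pattern returns the structured values.
function extractHeadlines(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $out = [];
    foreach ($doc->getElementsByTagName('h2') as $h2) {
        $out[] = trim($h2->textContent);
    }
    return $out;
}

$headlines = extractHeadlines($html); // ['First headline', 'Second headline']
```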
[Figure: web scraping pipeline — web pages from the Internet pass through a web scraping service, producing data stored as CSV files or in an application's relational database (RDB).]
[Figure: a browser rendering of a simple HTML page — a big heading, a paragraph, and a link to Yahoo.]
HTML Element Syntax
An HTML element is a main part of an HTML page, written document, or DOM; it gives meaning to the page.
For example, the title element of an HTML page represents the title of the page.
Most HTML elements are written with an opening tag and a closing tag, with information or data between the opening and the closing tag.
Elements can also contain attributes that define their other properties.
For example, to put an image on a webpage we use the HTML element <img>.
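To make this concrete, here is a complete element with an opening tag, an attribute, content, and a closing tag, followed by an empty element (the URLs and file names are just examples):

```html
<!-- opening tag with an attribute, content, then the closing tag -->
<a href="https://www.bbc.com">BBC News</a>

<!-- some elements, like <img>, are empty and have no closing tag -->
<img src="photo.jpg" alt="A photo">
```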
DOM
The DOM (Document Object Model) is a programming API for HTML (HyperText Markup Language) and XML (Extensible Markup Language) documents.
The DOM defines the logical structure of a document and the way a document is accessed and manipulated.
Many kinds of data can be stored in various ways, and much of it would traditionally be seen as data rather than documents.
Nonetheless, XML presents this data as documents, and the DOM can also be used to manage that data.
With the DOM, programmers can create and build documents, navigate their structure, and add, modify, or delete elements and their content.
Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the DOM, with some exceptions; in particular, the DOM interfaces for the internal and external subsets have not yet been specified.
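Creating, navigating, and modifying a document through the DOM can be shown with PHP's DOMDocument class; the element names and text here are just an example:

```php
<?php
// Build a small document, then navigate and modify it through the DOM.
$doc = new DOMDocument();
@$doc->loadHTML('<html><head><title>Old Title</title></head><body></body></html>');

// Access an existing element and change its content.
$title = $doc->getElementsByTagName('title')->item(0);
$title->nodeValue = 'New Title';

// Create a new element and add it to the tree.
$p = $doc->createElement('p', 'Added through the DOM');
$doc->getElementsByTagName('body')->item(0)->appendChild($p);

echo $doc->getElementsByTagName('title')->item(0)->textContent; // prints "New Title"
```

The same navigate-and-extract pattern is what our scraper uses when it pulls headlines out of a downloaded page.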
[Figure: the DOM tree of an HTML document — the Document node, the root element <html>, child elements <head> and <body>, a <title> element, an attribute, and text nodes such as "Over Title", "The Link" and "The Header".]
These are the stats of a player; if we want to save a player's stats in our file or database, we download the HTML of the page, analyze its structure, then parse the DOM, extract the related data and save it in our database.
So the core part is analyzing the structure of the page: if we analyze it well, it becomes very easy for us to save the data in the database.
cURL
Introduction
cURL is a command-line tool for loading or downloading web page content; basically, it retrieves the HTML of a URL.
cURL uses libcurl, so it supports a variety of protocols, currently including HTTP, HTTPS, FTP, FTPS, SFTP, TFTP, SCP, SMTP and others.
cURL supports HTTPS and performs SSL certificate verification when a secure protocol is specified.
When cURL connects to a remote server over HTTPS, it obtains the remote server's certificate and then checks it against its CA certificate store to validate the remote server and make sure it is the one it claims to be.
Some cURL packages are bundled with a CA certificate store file.
On Windows, cURL searches for this CA certificate file in the following locations, in order:
1. The directory where the cURL program is located.
2. The current working directory.
3. The Windows system directory.
4. The Windows directory.
5. The directories listed in the PATH environment variable.
Examples
The most basic use of cURL is to simply type curl followed by a URL at the command line (cmd):
curl www.goodbyeworld.com
How does all of this work in the PHP programming language? cURL is used through PHP's cURL extension:

<?php
// Initialize a cURL session for the target URL and
// return the response as a string instead of printing it directly
$curl = curl_init('https://www.goodbyeworld.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// Send request
$result = curl_exec($curl);
curl_close($curl);

echo $result;
?>