Вы находитесь на странице: 1из 40

Student Name & CSU

ID
Project Topic Title Analysing the website and measuring the web trends using web crawler based on page tagging for
increasing the productivity of businesses

1|Page
Evaluation & analysis Table week 6

Name- Evaluation of models based on their Customer behaviour prediction:

Purpose- To evaluate the best one solution which is able to detect the consumer behaviour based on the web measuring trend using
the web crawler with aim of increasing the business productivity.

Analytics
Application
Accuracy
base

Customer classification

e-shopping satisfaction

Engagement of URL
Visualization based

Purchase intention
Model Tools
Reference Technique
No. used

Review based
Personal sites

Engagement
Social media

Statistical
Real-time

Web/Data
Social
High

Low

Simulatio
Suchacka&Wo Machine n model
1.     
tzka, 2017 learning

Liu, et. al., Virtual


2. ARIMA   
2018 reality

Service-
3. Li et al.,2016 Polar Hub   oriented  
architectur

2|Page
e
Social
Social media media
Stieglitz et analytics
4.   data and  
al.,2018
4Vs of the
big Data
Google
exploits
DNS weighted
García-Dorado, footprints keywords,
5.    
et. al., 2018 Big Data,
DNS
Cache
KEY
value
store,
6. Deka, 2018 NoSQL       Hadoop,H       
ypercat,
metacrawl
er
collaborati
ve
Hwangbo, et. filtering
7. K-RecSys     
al.,2018 algorithm

8.   WordNet,  
Ireland & Liu, Online product  
Part-of-

3|Page
2018 review Speech
Tagger,
Pling
Stemmer,

Intelligent soft Java API,


Rekik, et. al., computing Search
9. technique  engine 
2018

Balbi, et. Web 2.0 Social


10.     
al.,2018 media

User interest- web graph


Saleheen based web visualizati
11. graph (UIWG).     on 
&Lai,2018

Web 2.0 social Social


12. Alalwan, 2018 media  media ads 
 

Big data fingerprint


Kobusinska, et. fingerprint ing
13.    
al., 2018 analytics
tool
grey
14. Liu, 2018 fuzzy semantics   
situations
and text
and fuzzy

4|Page
mining (GFuzzy)

Resource
dependenc
e theory
hybrid content (RDT)
15. Wu &Lin,2018    
analytics Innovation
diffusion
theory
(IDT)
Social
media and
Duarte, et. al., e-WOM
16.   social  
2018 service

Psychoph
ysiology,
sophisticat
Ciechanowski, ed bots
17. chat bot       
et. al., 2018 and social
robots

Fatehkia, et. Facebook ads Digital


18.     trace of 
al., 2018
data,

5|Page
regression
model

Neural
network
Serrano Neurocomputin and
19. &Gelenbe, g    Intelligent
2018 Internet
Search
Assistant
e-mail tracking Machine
Haupt, et. al., technology
20.   learning  
2018
algorithm

Neuromarketin Neuromar
Eye-tracking keting,
g, website 2.0,
21. technology  website   
Area of internet
2.0, Area
of interest

Home
Gamification-
Area
based
AlSkaif, et. al., Network,
22. residential   
2018 smart
customers
meters,
mobile
and web

6|Page
applicatio
ns, data
analytics

Customer churn Churn


Amin, et. al., prediction
23. predication(CP    
2018
P)

Single source
panel data Latent-
Nakano & Class
24. based    
Kondo, 2018 Cluster
segmentation Analysis

Social
media
typology of
intelligenc
social media
25. Lee, 2018   e, social  
analytics
media
analytics

Result: The proposed table concludes the best two solutions that is one is based on the NoSQL based application in crawling and the
another one based on chat bot for the human interaction

7|Page
Name- Evaluation table based on performance based criteria

Purpose- to evaluate the performance of various models having precise customer behaviour predication, considering
systemavailability, usability, reliability etc. as parameter for analysis the best sought out techniques solution.

Availability Reliability Usability Tracking Reporting

Personalization
Model

documentation
Authorization

Completeness
Reference

Page viewing
E-commerce
Accessibility

Consistency
no.

auditability

Definition/

credibility

Meta data
timeliness

Accuracy

Integrity

visiting
E-mail
17 Ciechanowski, et.
     
al., 2018

11 Saleheen
   
&Lai,2018

12 Alalwan, 2018   

8 Ireland & Liu,


     
2018

6 Deka, 2018        
10 Balbi, et. al.,2018   

18 Fatehkia, et. al.,


   
2018

8|Page
16 Duarte, et. al.,
  
2018

13 Kobusinska, et.
   
al., 2018

5 García-Dorado, et.
  
al., 2018

4 Stieglitz et
      
al.,2018

1 Suchacka&Wotzk
  
a, 2017

7 Hwangbo, et.
  
al.,2018

20 Haupt, et. al.,


      
2018

14 Liu, 2018  

Result-From the above best screen out solution further analysis is made and thus it conclude the method of social media analytics for
the scrape the behavioural data and second and again NoSQL based solution is also found to be better.

9|Page
Name- Business enhancement based evaluation table

Purpose- From the above concludes analysis further analysis is being evaluated by considering the concept of enhancement of
business productivity in respect with the desired area.

Rate of Maintain
Volume of data
Web quality interest backend
retrieval
(ROI) data
Model
Reference Dynami High Low More Less
no Custom
Real Static c Retailer
er
time changes predicat based
based
ion

18 Fatehkia, et. al.,


   
2018

20 Haupt, et. al.,


  
2018

4 Stieglitz et
  
al.,2018

6 Deka, 2018     

8 Ireland & Liu,


  
2018

1 Suchacka&Wotzk
 
a, 2017

7 Hwangbo, et. al.,  

10 | P a g e
2018

20 Haupt, et. al.,


   
2018

Result- From the above analysis it is concluded that the NoSQL based application for crawling is best model solution for the proposed
research due to its characteristics of maintain backend data management and real time enhancement of productivity.

11 | P a g e
Justification

To critically analyse the proposed justification, an evaluation table is being prepared with respect to the web analytics measurement in
the web trend by using the web crawler. The proposed scheme is justifies by undertaking the three evolution criteria. In the first
evolution table based on the predication of the customer behaviour which is the main goal of the research, proposed table concludes
the best two solutions that is one is based on the NoSQL based application in crawling and the another one based on chat bot for the
human interaction. In the second evolution is being made based on the performance criteria which gives the method of social media
analytics for the scrape the behavioural data and second and again NoSQL based solution is also found to be better. At the end of the
evolution analysis is made based on the enhancement of business productivity, which emphases that NoSQL based application is quite
suitable.

Thus, from the above justification, NoSQL based application in crawling will be considers appropriate for the proposed scheme.

12 | P a g e
State of Art- Current best solution
From the above-presented analysis tables, this work concludes the best solution for the presented paper solution best on the NoSQL
based application for the crawling web trend (Deka, 2018). The presented work is applicable in storing the crawler database and
avoids the duplication of the URL. In the above mention hash tables, Map reduced based framework for the NoSQL based application
is quite appropriate for measuring the web trend. The above-presented table evaluates that NoSQL based application for the web
crawling is superior in terms of enhancing the performance, most appropriate for scrapping the consumer behavior and reliable in
terms of improving the quality of the web trend measuring analytics in the manner of incrementing the productivity of the e-commerce
business (Deka, 2018)

13 | P a g e
1. Draw the state of art diagram. Use Blue dotted border for good feature in this work, and the red dotted border for the
limitations of it; the text in the diagram in 'font 8 or 9', and the line spacing inside each box in the diagram should be 'Single',
and the spacing before and after text should be '0, 0' .

Then, write a paragraph to describe the state of art diagram. You need to refer to the figure in your writing.

Map reduced based


NoSQL web crawler
application Illustrated the purpose of the
development of the model
Clarifies the purpose
Making the URL list which will not be reliable
of development
from the URLs

Registering of state of Mapping It evaluates text data


slaves to establish text
index
Collection of the Collection It will avoids the
data from the web duplication of the URL
page which will
Reduce
It rates the process till the
achievement of result

Request the URL


Download URL
Figure1- the above-presented work represent the best current solution for the crawling. Blue dots highlight the feature of the
work and red dots represent the limitation of the work.

14 | P a g e
Analysis and description
The above figure1 represents the processing mechanism of Map Reduced framework of the
NoSQL based database for the crawling application. The process mechanism begins with the
process of determination of the goal and purpose then it will collect the various URL for
crawling and map it in a sense that it will able to avoid the duplication of any URL. The
mapping process also illustrates in registering the state of slaves to crawl the web (Deka,
2018). Process begins with determine what data of the consumer should be scraping and
determine the search engine spider, for their business campaign. After the determination of the
goal of the, user will specify the URL. The mapping process also illustrates in registering the
state of slaves to crawl the web (Deka, 2018). The mapping process also uses the Beevolve Web
Crawler API for monitoring the social media. After the complete registration of the state of the
slaves, now it will follow up the RESTful JSON API in order to obtain the crawling data from
the NoSQL database. Crawling application will start generating the crawling data by merging the
URL list retrieved from the crawler frontier. The main feature of this work is that it will avoid
the duplication of any URL. Reducing stage will enhance the quality of the web by scrapping
the best web pages first. It will also integrate the reducing phase by holding the virtual hosting in
the IP address (Deka, 2018).

15 | P a g e
2. Describe the Stages/Parts of the system.

Determination of the goal of E-commerce business


At this phase, e-commerce business should plan a vision of their business motive of the target
customer, determine what data of the consumer should be scraping and determine the search
engine spider, for their business campaign (Deka, 2018). In the above framework, a commercial
search engine spider is generated which is liable to scrape the new resources in the net in order to
enhance the business productivity.

Specifying the URL for crawling


After the determination of the goal of the, user will specify the URL which has to be crawled via
the Graphical User Interface of the crawler. In that phase, the entire URL is being collected and
mold in the crawler frontier and specifies the URLs list.

Mapping process
After specifying the URL list generated from the GUI crawler, spider search engine will start
generating the URL and making the list of the URL from the retrieved crawler frontier and will
allocate all the links presented in the URL list campaign. The mapping process in the crawling is
based on the human-machine interaction in the new grey area of the It. The mapping process also
illustrates in registering the state of slaves to crawl the web (Deka, 2018). The mapping
process also uses the Beevolve Web Crawler API for monitoring the social media. After the
complete registration of the state of the slaves, now it will follow up the RESTful JSON API in
order to obtain the crawling data from the NoSQL database. These processes will also use the
80legs crawler for cost-effective distribution of the web crawler in the web measuring.

Collection of the crawling data


After the successful implementation of the mapping process, the crawling application will start
generating the crawling data by merging the URL list retrieved from the crawler frontier. The
main feature of this work is that it will avoid the duplication of any URL. Again, it will start
adjusting the Repositories of the URL and the documents. The query, which is to be inserted in
the crawling, is likely to be HTML and RDF and finally stored the extracted links retrieved from
the crawler frontier.

16 | P a g e
Reducing stage
After storing the cumulative crawling for spider search engine, it will start reducing mechanism,
which will request the URL from the various web pages and will reduce it by downloading the
URL in search engine. Further, it will go on in order for addressing and rates the process until the
achievement of the result. It will make use of the semantic based web data retrieval, which
will be liable for enhancing the data storage of the spider search engine also increase the
processing power of the web crawler. Reducing stage will enhance the quality of the web by
scrapping the best web pages first. It will also integrate the reducing phase by holding the virtual
hosting in the IP address (Deka, 2018).

Thus from the above classifies stages, it is concluded that web crawler application should be
actually fit with the map reduced based programming model. Below there is mechanism,
illuminated in figure 2 & 3, for the map reduced programming model and illustrates the process
criteria of the crawler frontier.

17 | P a g e
Downloading

Extracting links Crawler Frontier

Storing extracted links

Figure-2 crawl the web pages

Determination of the goal of E-commerce business

Specifying the URL for crawling

Mapping process

Collection of the crawling data

Reducing stage

Figure 3- map reduced based programming model

18 | P a g e
3. Based on the purpose of the stage you are work on it, you need to provide the steps that
used to reach the goal. You also need to mention the important of the state of art
features in your project. You also need to mention the limitations if they are available in
this stage. You also need to mention why this part has got this limitations (JUSTIFY).

Mapping process
The mapping process in the crawling is based on the human-machine interaction in the new
grey area. The mapping process also illustrates in registering the state of slaves to crawl the web
(Deka, 2018). The mapping process also uses the Beevolve Web Crawler API for monitoring the
social media. After the complete registration of the state of the slaves, now it will follow up the
RESTful JSON API in order to obtain the crawling data from the NoSQL database. These
processes will also use the 80legs crawler for cost-effective distribution of the web crawler in the
web measuring.

Reducing stage
It will go on in order for addressing and rates the process until the achievement of the result. It
will make use of the semantic based web data retrieval, which will be liable for enhancing the
data storage of the spider search engine also increase the processing power of the web crawler.
Reducing stage will enhance the quality of the web by scrapping the best web pages first. It will
also integrate the reducing phase by holding the virtual hosting in the IP address (Deka, 2018).

However, in the core process between the mapping and reducing stage there is a fear of
malicious attack whiles the collection of URL from the URL list.

19 | P a g e
4. Describe the system/model and the output of it, and clarify if the output is accepted in
your project domain or not.
Then you should provide the limitation of it, and where and why this limitation occurs
(Analyse and JUSTIFY).

Map reduced based framework for the NoSQL based application is quite appropriate for
measuring the web trend. It is able to scrape the consumer behaviour accuracy increased up to
30-40%. NoSQL based database will result in better utilisation of crawler frontier in order to
maintain the spider search engine. .However, it seemed that crawler facing the core issue in
providing the security concern in each platform. NoSQL database is not a specified programming
interferes but, the adoption of Hadoop Talent Bin and Key value store for the database backend
solution and big data volumeis illustrated which is able to connect them with the DNS server.
Honeypot computing is will be adopted for detecting the unauthorized malicious attacks.

20 | P a g e
5. You also need to draw the logical flow diagram of it.

Algorithm- Predication based UDT and LTD, training process set and distance factor

Input- matrix with m × n elements for defining the training set

Output- predicting result based on UDT and LTD predication in the social media analytics.

Step1: BEGIN

Step2: Get the resource prediction on the basis of the input parameter

Step3: data pre-processing stage

Step 4: enabling the distance factor based matrix

Step5: selection of sample based on the

W= I, where i= (100,200,300……n)

Step6: defining the issue based on,


𝑛𝑥 𝑛(𝑛−1)𝑥 2
Accuracy- TP+TN(1 + 𝑥)𝑛 = 1 + + +⋯
1! 2!

Step7: defining the training process,

T=w+1..(N-w-1)

Step8: validation phase of L and U

L=1..W and U=N-w….N

21 | P a g e
Start

Input thematrix with m × n elements


for defining the training set

Data pre-processing stage

Distance factor enabling Assuming A


as the matrix

TP+TN(1 +
𝑥)𝑛 = 1 + Defining accuracy
𝑛𝑥
+
1!
𝑛(𝑛−1)𝑥 2 𝑣1
+⋯
2! ⬚
A൦ ൪= vij×Rm*n

Defining the training process,
… 𝑣1, 𝑛
T=w+1... (N-w-1)

Figure4-a flowchart of churn prediction in the social media analytics

22 | P a g e
Appendix

For the equation

T= training set

W= web crawl

N= no of the training set

𝑛𝑥 𝑛(𝑛−1)𝑥 2
TP+TN(1 + 𝑥)𝑛 = 1 + + + ⋯ = accuracy determination based on training set and
1! 2!
benchmark framework

For abbreviation

HTTP= Hypertext transfer protocol

URL= Uniform Resource Locator

IP = Internet protocol

GUI = Graphical User Interface

LTD and UTD= Lower transfer distance and Upper transfer distance

23 | P a g e
Week-8

1. Give introductory about the idea of the proposed solution that comes from the features of
the first and second/third best solutions.

After evaluating various technologies for crawling based application for scrapping the consumer
behaviour, this research work had highlighted the pros and cons. of the various methods based on
the issue of efficiency, accuracy, reliability, traceability and statistical approaches.

In this research, NoSQL presented by Deka. (2018) found to be the most effective application for
crawling web trends. It is having the feature of avoiding copies or duplication of URLs, which
helps to increase web trends. A NoSQL technique enabled businesses is used in measuring
analytics and helps to increase the productivity of e-commerce businesses (Deka. 2018).
Though state of art solution has many advantages, the security concern is the most focused
limitation that occurred on each platform. Here it has also seen that machine-human interaction
is required in a way to get the ratings on the pages. Hence, to overcome the limitations which
has occurred in the state of art solution, Hadoop talent bin and honeypot computing is
proposed, this method is far different and has a significant work contribution for the above
limitation. Proposed technology is having features of defending malware attacks seemed at the
time of acquisition of the URL from the URL list from the crawler frontier.

24 | P a g e
2. Draw the proposed solution diagram. Take the copy of the state of art diagram, and only
change the text of the placed that you enhanced to propose the new solution. Use Green
dotted border for new feature in this work, and remove the red dotted border; the text in
the diagram in 'font 8 or 9', and the line spacing inside each box in the diagram should be
'Single', and the spacing before and after text should be '0, 0’. Then, write a paragraph to
describe the state of the art diagram. You need to refer to the figure in your writing.

25 | P a g e
26 | P a g e
Figure (3) - Introduction of proposed feature for limitation in state of art
3. Describe the proposed system diagram; also refer to the figure number in your describing.
What are the components of the proposed system, and refer to the diagram? Show the
NEW features and workflow of the proposed system compared with the state of art.

In the figure (3), features of proposed and state of art solutions are given. At the initial stage,
mapping is done through the three stages that are based on criteria of URL list preparation,
state of slave registration and text data evaluation. Further, data collection procedure takes
place. In which all the data is collected. The proposed system consists of three major stages (Fig.
3) mapping, de-duplication from the NoSQL and reducing.

The research is focused on the elimination of chances of external malicious attacks. To


overcome the limitation, honeypot method has proposed which mentioned in the figure.
Honeypot method enables system to prevent malicious attacks occurred at the time of
transmission of URL to the registered slave for the Site promotion.

Proposed method routed the attacks to the replicas of system, which has prepared virtually to
prevent the malicious attacks. In the process, if the attacks found malicious it redirected to the
virtual system. On the other hand, if attack found normal than it gone for the de-duplication
process. Here, state of art solution eliminates the chances of duplication of URLs. Reducing
stage will later on starting the initialising for the promotion of URL at the top of the search
engine for the e-commerce business website.

27 | P a g e
4. What are the new proposed components that you Modified and Enhanced. What is the
purpose of each modified and enhanced, and the impact of each on your results (how
each solved the limitations in state of art).

In the 4thstage, the proposed method Honeypot has proposed. It is found that the honeypot
method is effective in eliminating the chances of malicious attacks, which is generated from the
attacker end and finish at the system. Honeypot method is proposed in the way to reduce the
chances of external malware attacks that reduce the overall efficiency of the system. Honeypot
method eliminates the malicious attacks by routing them to the virtual system instead of actual
system.

In the Honeypot method stage, external attacks are focused and the proposed method route the
attacks to the virtual system rather than the actual system. Current best solution found
ineffective in a way to prevent external malicious attacks. However, Honeypot might be an
appropriate choice for overcoming the limitation. Hadoop method provides feature of detecting
and control the crawling of the various web pages. It has found that there are high chances of
crawling on web pages (Amin et al. 2018).

28 | P a g e
5. AREA OF IMPROVEMENT: Show how many components you modified, and how
you modified, what is the importance of each modified equation and what has solved.

For justification of this research, proposed solution is consisting of an equation.


Honeypot method eliminates the malicious attacks by routing them to the virtual system
instead of actual system. The proposed equation is based on attributes of de-duplication
of URL and the attributes of an eliminating of malicious attack. Both are considered
below-

URL de-duplication equation-


𝑝𝑐 = 𝑝𝑟 {𝑑𝑒𝑙𝑎𝑦 ≥ 𝑡𝑛𝑜𝑤 − 𝑇𝑐 }

1 (𝑥 − 𝜇𝑐 )2
= ∫ 𝑒𝑥𝑝 (−
𝜎𝑐 √2𝜋 𝑡𝑛𝑜𝑤 −𝑡𝑐 2𝜎𝑐2

Eliminating of malicious attack-

𝒅(𝒑, 𝒒) = √(𝒑𝟏 − 𝒒𝟏 )𝟐 + (𝒑𝟐 − 𝒒𝟐 )𝟐 + (𝒑𝟑 − 𝒒𝟑 )𝟐

(𝒑𝟏 − 𝒒𝟏 )𝟐 + (𝒑𝟐 − 𝒒𝟐 )𝟐 + (𝒑𝟑 − 𝒒𝟑 )𝟐


𝒅(𝒑, 𝒒) =
𝟐

The above-mentioned equation will focus on security and de-duplication of URLs

However, In the 4thstage, the proposed method Honeypot has proposed. It is found that the
honeypot method is effective in eliminating the chances of malicious attacks, which is generated
from the attacker end and finish at the system. In the stage, external attacks are focused and the
proposed method route the attacks to the virtual system rather than the actual system. Current
best solution found ineffective in a way to prevent external malicious attacks. Current best
solution found ineffective in a way to prevent external malicious attacks. However, Honeypot
might be an appropriate choice for overcoming the limitation. Hadoop method provides the
feature of detecting and control the crawling of the various web pages. It has found that there are
high chances of crawling on web pages (Amin et al. 2018).

29 | P a g e
6. Show the CONTRIBUTION of your proposed system, and the IMPORTANT (WHY) of it.
Also, compare your proposed solution with the state of art to show the contribution.

Hadoop method –

Hadoop method provides feature of detecting and control the crawling of the various web pages.
It has found that there are high chances of crawling on web pages (Amin et al. 2018). It also
enabled users to explore complex data and assisting in semi-structure data analysis. This method
used in two phases those are data distribution and process of isolation. In the state of art solution,
limitation such as duplication of URLs was covered and eliminated. This increase the efficiency
of websites as URLs duplication slightly reduced.

In the state of art solution NoSQL based algorithm proposed to manage web crawling for its
optimization. Further, NoSQL based crawling application also helps to reduce the number of
URL duplication, which causes the issue of web site inefficiency.

Limitations of URLs duplication and security issues have recognised the issues associated with
the proposals. Methods have been evaluated to overcome the limitation of malicious attacks and
honeypot method found as the most effective method to increase the security of systems.

Honeypot method has the advantage of redirecting malicious attacks to the virtual systems
(Avery & Wallrabenstein. 2018). Although, systems are found insecure from the external attacks
in the previous methods Honeypot method enables users to manage their systems by developing
replicas of actual system.

Due to high accuracy of proposed Honeypot system, future systems could be safe from the
external attacks with the low cost of resources.

30 | P a g e
7. Provide a comparison table between your proposed and state of art solutions. This table
should be built based on what you described in point 6.

Basis of differentiation Proposed solution State of Art solution

Name of technique Honeypot NoSQL based application

Level of accuracy Level of accuracy is more It has the high accuracy in


superior because of initialising of scrapping consumer behaviour
virtual system, the accuracy level with a range of up to 7-9
it seemed to be alignment in
range of 8 out of 10

Feature of work Create a virtual system for Map- reducing based


eliminating any malicious attack framework for avoiding any
duplication of URLs.
Limitation It can unfilter and harm the e- Security and malicious attack
commerce website once engaged
in attacking.
Contribution-1 Overcoming the limitations of NoSQL based crawling
URLs duplication and security application also helps to reduce
issues have recognised the issues the number of URL duplication,
associated with the proposals. which causes the issue of website
inefficiency.
Contribution 2 Creating a de-duplication and Adopting Hadoop talent bin and
eliminating malicious attack key-value store attributes.
equation.

31 | P a g e
8. You need to draw the logical flow diagram of it.

Algorithm- Invertible Bloom lookup table

Input- p and q as number of instruction and branched respectively

Output- (𝑡𝑎𝑘𝑖𝑛𝑔 𝑜𝑢𝑡𝑝𝑢𝑡 𝑜𝑓 𝑝 𝑎𝑛𝑑 𝑞 𝑎𝑠 𝑖𝑛𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛) or 𝑛𝑢𝑙𝑙

Step1: BEGIN

Step 2: 𝑤ℎ𝑖𝑙𝑒(0 < 𝑝 𝑎𝑛𝑑 𝑞 <= 1)𝑑𝑜

Step 3: 𝑖𝑓

𝑑(𝑝, 𝑞) = √(𝑝1 − 𝑞1 )2 + (𝑝2 − 𝑞2 )2 + (𝑝3 − 𝑞3 )2

Step 4: checking process for malicious attack

Step 5: 𝑖𝑓 𝑎𝑡𝑡𝑎𝑐𝑘 𝑖𝑠 𝑚𝑎𝑙𝑖𝑐𝑖𝑜𝑢𝑠 = 𝑐ℎ𝑒𝑐𝑘

Step 7: system redirection phase.

Step 8: END

32 | P a g e
Figure (4) – Generic flow diagram of proposed system

33 | P a g e
3. Expected Results and Discussion
I. Positive outcomes and implication

The proposed model has positive outcomes after the implication of a web crawler in an e-
commerce business in order to scrape consumer for increasing business productivity. The
system has includes various tools and techniques which emerged as enhance tool to improve the
crawling efficiency for scraping in Big web data with more accurately with respect to the
tentative customers (Deka. 2018). Web crawling assists business for instead data gathering
instead of manual efforts, aids in order to compare business with the competitors business by
scrapping of competitors website. A web crawling is a good measure for a business side for
Neuromarketing approaches by adopting an updated database of tentative audience (Wu et al.
2018). Proposed scheme will indulge NoSQL based application that is including Hadoop talent
bin, key-value store, map-reduce framework and Apache Cassandra for scraping the wen data in
the proposed spider search engine (Haupt et al. 2018). Another tool that the proposed model will
be including is Honeypot model which confronts the security of URL database and fascinate
rapid attack over the jamming and malicious attack. Honeypot method has the advantage of
redirecting malicious attacks to the virtual systems. Although, systems are found insecure from
the external attacks in the previous methods Honeypot method enables users to manage their
systems by developing replicas of actual system (Deka. 2018).

34 | P a g e
II. Issue/challenges with implementation
The proposed solution is dealing with many issues after implementation of various tools
and technique in one solution. The proposed scheme is including honeypot model which
is able to redirect the attack to a Virtual device, it performs its task based on developing
replicas of actual system (Amin et al. 2018). However besides eliminating malicious
attack and ensuring security, Honeypot model have limitation based on network
forensics. Some expert concludes that the implication of honeypot is found to be an
unethical issue. Honeypot model also creating the disadvantage of enhancing Hijacking
activity or they are building hacker better. Since hackers are trying many activities to
train them and know more about the concept of honeypot model, thus for a firm honeypot
model is neither be an appropriate choice for security (Ireland & Liu. 2018). Virtual
system may compress the actual data. URL in message is broken when honeypot
establishes virtual device. (Wu et al. 2018).

35 | P a g e
Conclusion
Reiterate the purpose of the research. Summarise results/findings. Acknowledge limitations of
the research focusing on methodology, the model and implementation

The research has improved the scrapping accuracy of crawler by up to 8-10&% through
implanting proposed model of NoSQL application with Honeypot model. The study has focused
on the implementation of NoSQL based crawling application in order to provide database storage
to the crawler frontier in order to avoid duplication of any URL.

A web crawler acts as the spider in the era of web. A web crawler search web content based on
its crawling. It acts as the automated scripted program that is browsing the web in a systematic
order. A web crawling is a good measure for a business side for Neuromarketing approaches by
adopting an updated database of tentative audience.

The main of this research paper is to develop an adoption of web crawler application in the
modern web analytics tools in for website measuring in order to increase business productivity.
The implemented scheme is including NoSQL based crawling application. Map-reduce based
framework evaluated as the appropriate method that avoids any duplication of URL in the
crawler frontier.

Further, this research work found security and malicious attack problem in the current best
solution. To overcome these issues a proposed solution is being introduced by implanting
Honeypot model that of redirecting malicious attacks to the virtual systems. However further it
is also being concluded that honeypot model too engaged with many technical and ethical issues
but they can be overcome future by introducing various tools and techniques such as VMware,
improving detection configuration.

The proposed model provides the accuracy in scrapping consumer behaviour is almost 8-10%
and crawler efficiency is also being increased by 10%. Proposed model is more reliable than the
current best solution in order to increase the level of security by 15%.

36 | P a g e
Future Work
Suggest areas of research and the future direction. What needs to be done as a result of your findings
focusing on the weaknesses identified

A web crawler acts as the spider in the era of web. A web crawler search web content based on
its crawling. It acts as the automated scripted program that is browsing the web in a systematic
order. Future of web crawler is bright as it assists in the scrapping of web pages for fetching and
extraction.
In this research paper, the capability of web crawler in web analytics is being considered in the
eye for increasing business productivity. The implementation of crawler in e-commerce website
seems some weakness such as complex big data, customer engagement for online shopping, web
rating, scraping consumer behavior and classifying tentative customer, complexity in
Neuromarketing and security and privacy of the consumer.
Web crawling is a going on a project for e-commerce business sight of view. Social analytics,
web analytics, NoSQL application, cyber-infrastructure, eye tracking technology, and
Neuromarketing evaluated as the best methods that can be adopted in the future for increasing
crawling effectiveness in future

37 | P a g e
References

Alalwan, A. A. (2018). Investigating the impact of social media advertising features on customer
purchase intention. International Journal of Information Management, 42, 65-77.

AlSkaif, T., Lampropoulos, I., van den Broek, M., & van Sark, W. (2018). Gamification-based
framework for engagement of residential customers in energy applications. Energy
Research & Social Science, 44, 187-195.

Amin, A., Al-Obeidat, F., Shah, B., Adnan, A., Loo, J., & Anwar, S. (2018). Customer churns
prediction in telecommunication industry using data certainty. Journal of Business
Research.

Avery, J., &Wallrabenstein, J. R. (2018).Formally modeling deceptive patches using a game-


based approach. Computers & Security, 75, 182-190.

Balbi, S., Misuraca, M., &Scepi, G. (2018).Combining different evaluation systems on social
media for measuring user satisfaction. Information Processing & Management, 54(4),
674-685.

Ciechanowski, L., Przegalinska, A., Magnuski, M., &Gloor, P. (2018). In the shades of the
uncanny valley: An experimental study of human–chatbot interaction. Future Generation
Computer Systems.

Deka, G. C. (2018). NoSQL Web Crawler Application.In Advances in Computers (Vol. 109, pp.
77-100).Elsevier.

Duarte, P., e Silva, S. C., & Ferreira, M. B. (2018). How convenient is it? Delivering online
shopping convenience to enhance customer satisfaction and encourage e-WOM. Journal
of Retailing and Consumer Services, 44, 161-169.

Fatehkia, M., Kashyap, R., & Weber, I. (2018).Using Facebook ad data to track the global digital
gender gap. World Development, 107, 189-209.

García-Dorado, J. L., Ramos, J., Rodríguez, M., & Aracil, J. (2018). DNS weighted footprints
for web browsing analytics. Journal of Network and Computer Applications, 111, 35-48.
38 | P a g e
Haupt, J., Bender, B., Fabian, B., &Lessmann, S. (2018). Robust identification of email tracking:
A machine learning approach. European Journal of Operational Research.

Hwangbo, H., Kim, Y. S., & Cha, K. J. (2018).Recommendation system development for fashion
retail e-commerce. Electronic Commerce Research and Applications, 28, 94-101.

Ireland, R., & Liu, A. (2018). Application of data analytics for product design: Sentiment
analysis of online product reviews. CIRP Journal of Manufacturing Science and
Technology.

Kobusińska, A., Pawluczuk, K., &Brzeziński, J. (2018).Big Data fingerprinting information


analytics for sustainability. Future Generation Computer Systems.

Lee, I. (2018). Social media analytics for enterprises: Typology, methods, and
processes. Business Horizons, 61(2), 199-210.

Li, W., Wang, S., & Bhatia, V. (2016).PolarHub: A large-scale web crawling engine for OGC
service discovery in cyberinfrastructure. Computers, Environment and Urban
Systems, 59, 195-207.

Liu, J. W. (2018). Using big data database to construct new GFuzzy text mining and decision
algorithm for targeting and classifying customers. Computers & Industrial Engineering.

Liu, Y. Y., Tseng, F. M., & Tseng, Y. H. (2018).Big Data analytics for forecasting tourism
destination arrivals with the applied Vector Autoregression model. Technological
Forecasting and Social Change, 130, 123-134.

Muñoz-Leiva, F., Hernández-Méndez, J., & Gómez-Carmona, D. (2018).Measuring advertising


effectiveness in Travel 2.0 websites through eye-tracking technology. Physiology
&behavior.

Nakano, S., & Kondo, F. N. (2018).Customer segmentation with purchase channels and media
touchpoints using single source panel data. Journal of Retailing and consumer
services, 41, 142-152.

39 | P a g e
Rekik, R., Kallel, I., Casillas, J., &Alimi, A. M. (2018).Assessing websites quality: A systematic
literature review by text and association rules mining.International Journal of
Information Management, 38(1), 201-216.

Saleheen, S., & Lai, W. (2018).UIWGViz: An architecture of user interest-based web graph
visualization. Journal of Visual Languages & Computing, 44, 39-57

Serrano, W., &Gelenbe, E. (2018). The Random Neural Network in a neurocomputing


application for Web search. Neurocomputing, 280, 123-134.

Stieglitz, S., Mirbabaie, M., Ross, B., & Neuberger, C. (2018).Social media analytics–
Challenges in topic discovery, data collection, and data preparation. International
Journal of Information Management, 39, 156-168.

Suchacka, G., &Wotzka, D. (2017).Modeling a non-stationary bots’ arrival process at an e-


commerce Web site. Journal of computational science, 22, 198-208.

Wu, P. J., & Lin, K. C. (2018).Unstructured big data analytics for retrieving e-commerce
logistics knowledge. Telematics and Informatics, 35(1), 237-244.

40 | P a g e

Вам также может понравиться