A BACHELOR'S THESIS
Submitted in partial fulfillment of the requirements for the degree
Of
BACHELOR OF TECHNOLOGY
In
INFORMATION TECHNOLOGY
Submitted by
SACHIN JAIN
IIT2008064
CANDIDATE'S DECLARATION
I do hereby declare that the work presented in this thesis, titled DEVELOPING GUI
BASED PLATFORM FOR PLUGIN INTEGRATION ON WEB PAGES, submitted in
partial fulfillment of the requirements of the degree of Bachelor of Technology in
Information Technology at the Indian Institute of Information Technology, Allahabad, is an
authentic record of my original work carried out under the guidance of DR. MANISH
KUMAR (IIIT-A) and MR. VINEET SHARMA (ADOBE SYSTEMS).
Place : ALLAHABAD
SACHIN JAIN
Date : 26-07-2012
IIT2008064
I do hereby recommend that the thesis work prepared under my supervision by SACHIN
JAIN be accepted in partial fulfillment of the requirements of the degree of BACHELOR
OF TECHNOLOGY IN INFORMATION TECHNOLOGY for evaluation.
_________________________
SIGNATURE OF THE THESIS SUPERVISOR
______________________________
COUNTERSIGNED BY THE DIVISIONAL HEAD
Place: ALLAHABAD
Date: 26-07-2012
ACKNOWLEDGEMENTS
It is my privilege to express my sincerest regards to our project coordinator, DR. MANISH
KUMAR, MR. VINEET SHARMA, MR. SURYADEEP AGARWAL and MR. RAJIV
MANGLA for their valuable inputs, able guidance, encouragement, whole-hearted
cooperation and constructive criticism throughout the duration of our project. They have been
very helpful throughout the project proceedings.
I would also like to thank my colleagues at the company, without whom the project would not
have been successful. Their constant support and help have been extremely valuable to the
completion of the project.
Place : NOIDA
SACHINJAIN
Date : 26-07-2012
(IIT2008064)
ABSTRACT
The proposed work is part of a project that aims at the development of a GUI-based
platform where plugin providers can publish their plugins and website administrators can
simply use this platform to integrate plugins into their websites without knowing anything
about the plugin, or even about the code of the webpage into which the plugin needs to
be integrated. The project aims at reducing the manual effort of including plugins on
every page of a website. The interface of the platform is kept simple enough that even a
non-technical user is able to place plugins on his website. In this document we implement
several algorithms to optimize each aspect of the application, reduce human effort and
provide a much better user experience.
Table Of Contents
Candidate's Declaration
Certificate
Acknowledgements
Abstract
Contents
Chapter 1: Introduction
1.1 Background
1.2 Literature Survey
1.3 Process of adding plug-ins
1.4 Formulation of the present problem
Chapter 2: Frameworks used
2.1 Hardware
2.2 Framework and Software used
2.3 Database used
(Theoretical Developments)
Chapter 3: Loading a webpage in I-Frame
3.1 Cross Domain Resource Sharing
3.2 Various types of XSS attacks
3.3 JSONP
3.4 CORS Filter
LIST OF FIGURES
Chapter 1
Fig. 1.1: Use of Social Plugins (Connect) on Web Pages
Fig. 1.2: Use of Like and +1 Plugins in E-Commerce Product Pages
Fig. 1.3: Steps of Integrating a Plugin on a Webpage
Fig. 1.4: Step 1 - Configuring Plug-Ins
Fig. 1.5: Step 2 - Fetch Code of Customized Plug-Ins
Chapter 2
Fig. 2.1: Example of a Maven Dependency in the Spring Framework
Fig. 2.2: Model and View Controller
Chapter 3
Fig. 3.1: Example of JSONP
Fig. 3.2: Use of CORS Filter to Solve the Cross-Site Resource Sharing Problem
Fig. 3.3: Activity Diagram to Load a Webpage in an I-Frame
Outline of work
Fig. 1: Flowchart of Fetching Plug-Ins Using Similar Pages
Snapshots
Fig. 2: Snapshots of Social Plug-Ins
Fig. 3: Snapshot of a Webpage without Plug-Ins
Fig. 4: Snapshot of a Webpage after Placing the Add-to-Wish-List Plug-In
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
A plug-in is a set of software components that adds specific abilities to a larger software
application. If supported, plug-ins enable customizing the functionality of an application. For
example, plug-ins are commonly used in web browsers to play video, scan for viruses, and
display new file types. Well-known plug-in examples include Adobe Flash Player,
QuickTime, and Oxy-tube.
It can be seen that almost every website today uses plug-ins in some way, whether social
plug-ins which connect the website or webpage to social media, or some other kind of
plug-in which extends the website's functionality in some way. But no social platform gives an
easy way to put these plug-ins into their system. E-commerce is one of the most popular business
streams today. To date, hundreds of e-commerce websites have come into existence.
Electronic commerce refers to the buying and selling of products or services over the Internet.
These days, the amount of trade conducted electronically has grown extraordinarily with
widespread Internet usage. There are more than a hundred virtual stores running on the web
every day. Online shopping has many advantages, such as convenience: online stores are
usually available 24 hours, which is not the case with physical stores. Virtual stores also
deliver items to the home, which can save a lot of time for the user. Another advantage of
virtual stores is reviews of a particular product from other customers, so before purchasing a
product the user can refer to reviews from customers who previously purchased the same
product. This helps the customer purchase an item which fully meets his needs. Comparing
prices across virtual stores is also very easy compared to comparing prices of a product in
different physical stores. Besides these many advantages, virtual stores also suffer from
disadvantages: fraud and security are the main concerns. SSL encryption has generally solved
the problem of credit card numbers being intercepted in transit between consumer and
merchant. Phishing is also one of the techniques used to cheat customers. In spite of these
problems, online shopping is quite successful at the present time.
Social commerce [1] is a subset of electronic commerce that involves social media: online
media that supports social interaction and user contributions to assist in the online buying and
selling of products and services. The term social commerce was introduced by Yahoo! in
November 2005 to describe a set of online collaborative shopping tools such as shared pick
lists, user ratings and other user-generated content-sharing of online product information. The
concept of social commerce was developed by David Beisel to include collaborative
e-commerce tools that enable shoppers to get advice from trusted individuals, find goods and
services, and then purchase them. Today, the area of social commerce has expanded to
include the range of social media tools and content used in the context of e-commerce,
especially in the fashion industry. Examples of social commerce include customer ratings and
reviews, user recommendations and referrals, social shopping tools (sharing the act of online
shopping), forums and communities, and social advertising.
Use of social plug-ins in e-commerce websites is the next big thing. The process has
already started: many websites, especially US-based e-commerce websites, are
using social plug-ins to move to e-commerce 2.0, also known as social commerce.
Maybe one day someone will crack the magical code and create a true social commerce
platform. Still, even then, the platform will need to be built on a firm foundation of social
networks and commerce. Until then, it makes sense that plug-ins will be the key to building
a successful social commerce business.
"Amazon is e-commerce 1.0, with more than 10 years' worth of products on a site. What we
are moving toward now is e-commerce 2.0, which is more about discovering and
browsing"[3], said Jason Goldberg, CEO. Social media is not only about Facebook pages
and Twitter accounts. It is about building the community, facilitating conversations, listening
and responding.
"26% chose to sign up to a website using social media. Visitors who log in with their social
media profiles are five times more likely to make purchases than those who create accounts
on their site"[3], said Randall Weidberg, CEO of GiantNerd.com.
Facebook also provides various plug-ins to be placed on sites which can increase user
engagement, but information from these plug-ins can't be processed to improve
personalization and re-targeting of users.
The "Connect with Facebook"[6] plug-in asks users to connect with Facebook and provides an
extra discount for doing so. Basically, it tempts users with discounts, uses the
customer as a potential seller, and advertises the company name with the help of social media.
5. There is no way for the administrator to customize plug-in settings such as color and
size according to the needs of the website. He has to resort to trial and error, which is
quite time-consuming.
same page structure. We will then implement the concept of "one done, all done": the
merchant needs to place a plug-in on only one product page, and since all other
product pages apply plug-ins in the same way, our algorithm will handle thousands of
similar pages on his website, saving days of manual effort.
CHAPTER 2
FRAMEWORKS USED
2.1 HARDWARE
The platform developed needs to be hosted on a web server, so the only hardware required is
a system with a web server running on it.
Web Server
A web server[8] can refer to either the hardware (the computer) or the software (the
computer application) that helps to deliver web content that can be accessed through the
Internet. The primary function of a web server is to deliver web pages on request to
clients using the Hypertext Transfer Protocol (HTTP). This means delivery of HTML
documents and any additional content that may be included by a document, such as images,
style sheets and scripts.
A user agent, commonly a web browser or web crawler, initiates communication by making a
request for a specific resource using HTTP and the server responds with the content of that
resource or an error message if unable to do so. The resource is typically a real file on the
server's secondary memory, but this is not necessarily the case and depends on how the web
server is implemented.
While the primary function is to serve content, a full implementation of HTTP also includes
ways of receiving content from clients. This feature is used for submitting web forms,
including uploading of files.
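The request/response cycle just described can be sketched as a single dispatch function, independent of any real server framework (the resource table and names below are illustrative only, not part of the platform):

```javascript
// Web-server sketch: map a parsed HTTP request to a response,
// serving a resource when it exists and an error message otherwise.
const resources = {
  "/index.html": "<html><body>Hello</body></html>",
};

function handleRequest(method, path) {
  if (method !== "GET") {
    // A fuller implementation would also accept POST for form submissions.
    return { status: 405, body: "Method Not Allowed" };
  }
  if (Object.prototype.hasOwnProperty.call(resources, path)) {
    return { status: 200, body: resources[path] };
  }
  // The requested resource does not exist: respond with an error.
  return { status: 404, body: "Not Found" };
}
```

A real server would additionally parse the raw HTTP request, read files from secondary storage, and stream the response back over the socket.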
2.2 FRAMEWORK AND SOFTWARE USED

Spring Framework
The Spring Framework[9] provides a comprehensive programming and configuration model
for modern Java-based enterprise applications, on any kind of deployment platform. A key
element of Spring is infrastructural support at the application level: Spring focuses on the
"plumbing" of enterprise applications so that teams can focus on application-level business
logic, without unnecessary ties to specific deployment environments.
Spring includes:
Advanced support for aspect-oriented programming with proxy-based and AspectJ-based variants.
Powerful abstractions for working with common Java EE specifications such as JDBC
First-class support for common open source frameworks such as Hibernate and
Quartz
A flexible web framework for building RESTful MVC applications and service
endpoints
Rich testing facilities for unit tests as well as for integration tests
Spring is modular in design, allowing for incremental adoption of individual parts such as the
core container or the JDBC support. While all Spring services are a perfect fit for the Spring
core container, many services can also be used in a programmatic fashion outside of the
container.
Supported deployment platforms range from standalone applications to Tomcat and Java EE
servers such as WebSphere. Spring is also a first-class citizen on major cloud platforms with
Java support, e.g. on Heroku, Google App Engine, Amazon Elastic Beanstalk and VMware's
Cloud Foundry.
The Spring Framework serves as the foundation for the wider family of Spring open source
projects, including:
Spring Security
Spring Integration
Spring Batch
Spring Data
Spring Mobile
Spring Social
Spring Android
2.3 DATABASE USED

H2 Database
It is possible to create both in-memory tables and disk-based tables. Tables can be
persistent or temporary. Index types are hash table and tree for in-memory tables, and b-tree
for disk-based tables. All data manipulation operations are transactional. Table-level locking
and multi-version concurrency control are implemented. The two-phase commit protocol is
supported as well, but no standard API for distributed transactions is implemented. The
security features of the database are: role-based access rights, encryption of passwords
using SHA-256, and encryption of data using AES or the Tiny Encryption Algorithm (XTEA). The
cryptographic features are available as functions inside the database as well. SSL/TLS
connections are supported in client-server mode, as well as when using the console
application.
Two full-text search implementations are included: a native implementation and one using
Lucene. A simple form of high availability is implemented: when used in client-server
mode, the database engine supports hot failover (this is commonly known as clustering).
However, the clustering mode must be enabled manually after a failure.[5] The database
supports protection against SQL injection by enforcing the use of parameterized statements.
In H2, this feature is called 'disabling literals'.
JavaScript
JavaScript[11] (sometimes abbreviated JS) is a prototype-based scripting language that is
dynamic, weakly typed and has first-class functions. It is a multi-paradigm language,
supporting object-oriented,[5] imperative, and functional[1][6] programming styles.
JavaScript's use in applications outside web pages, for example in PDF documents,
site-specific browsers, and desktop widgets, is also significant. Newer and faster JavaScript VMs
and frameworks built upon them (notably Node.js) have also increased the popularity of
JavaScript for server-side web applications.
JavaScript uses syntax influenced by that of C. JavaScript copies many names and naming
conventions from Java, but the two languages are otherwise unrelated and have very different
semantics. The key design principles within JavaScript are taken from the Self and Scheme
programming languages.
The most common use of JavaScript is to write functions that are embedded in or included
from HTML pages and that interact with the Document Object Model (DOM) of the page.
Some simple examples of this usage are:
1) Loading new page content or submitting data to the server via Ajax without reloading the
page (for example, a social network might allow the user to post status updates without
leaving the page).
2) Animation of page elements: fading them in and out, resizing them, moving them, etc.
3) Interactive content, for example games, and playing audio and video.
4) Validating input values of a web form to make sure that they are acceptable before being
submitted to the server.
5) Transmitting information about the user's reading habits and browsing activities to various
websites. Web pages frequently do this for web analytics, ad tracking, personalization or
other purposes.
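As a sketch of the form-validation use just listed, a minimal client-side validator might look like the following (the field names and rules are illustrative, not taken from the project):

```javascript
// Minimal client-side validation sketch: returns a list of error
// messages for a signup-style form (all names are illustrative).
function validateForm(fields) {
  const errors = [];
  // A very rough e-mail shape check: something@something.something
  if (!fields.email || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(fields.email)) {
    errors.push("Please enter a valid e-mail address.");
  }
  // The age must be numeric and at least 13.
  if (!fields.age || Number.isNaN(Number(fields.age)) || Number(fields.age) < 13) {
    errors.push("Age must be a number of at least 13.");
  }
  return errors; // an empty array means the form may be submitted
}
```

In a real page this function would run on the form's submit event, and the form would only be sent to the server when the returned list is empty.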
Because JavaScript code can run locally in a user's browser (rather than on a remote server),
the browser can respond to user actions quickly, making an application more responsive.
Furthermore, JavaScript code can detect user actions which HTML alone cannot, such as
individual keystrokes. Applications such as Gmail take advantage of this: much of the
user-interface logic is written in JavaScript, and JavaScript dispatches requests for information
(such as the content of an e-mail message) to the server. The wider trend of Ajax
programming similarly exploits this strength.
A JavaScript engine (also known as JavaScript interpreter or JavaScript implementation) is an
interpreter that interprets JavaScript source code and executes the script accordingly. The first
JavaScript engine was created by Brendan Eich at Netscape Communications Corporation,
for the Netscape Navigator web browser. The engine, code-named SpiderMonkey, is
implemented in C. It has since been updated (in JavaScript 1.5) to conform to ECMA-262
Edition 3. The Rhino engine, created primarily by Norris Boyd (formerly of Netscape; now at
Google) is a JavaScript implementation in Java. Rhino, like SpiderMonkey, is ECMA-262
Edition 3 compliant.
A web browser is by far the most common host environment for JavaScript. Web browsers
typically use the public API to create "host objects" responsible for reflecting the Document
Object Model (DOM) into JavaScript. The web server is another common application of the
engine. A JavaScript webserver would expose host objects representing an HTTP request and
response objects, which a JavaScript program could then manipulate to dynamically generate
web pages.
Because JavaScript is the only language that the most popular browsers share support for, it
has become a target language for many frameworks in other languages, even though
JavaScript was never intended to be such a language. Despite the performance limitations
inherent to its dynamic nature, the increasing speed of JavaScript engines has made the
language a surprisingly feasible compilation target.
Cross-site vulnerabilities
A common JavaScript-related security problem is cross-site scripting, or XSS, a violation of
the same-origin policy. XSS vulnerabilities occur when an attacker is able to cause a target
web site, such as an online banking website, to include a malicious script in the webpage
presented to a victim. The script in this example can then access the banking application with
the privileges of the victim, potentially disclosing secret information or transferring money
without the victim's authorization. A solution to XSS vulnerabilities is to use HTML escaping
whenever displaying untrusted data.
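The escaping just mentioned can be done with a small helper that rewrites the HTML-significant characters; this is a generic sketch, not code from the project:

```javascript
// HTML-escaping sketch: replaces the five characters that can break
// out of an HTML text or attribute context, so untrusted data is
// rendered as inert text instead of being interpreted as markup.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, "&amp;")   // must come first, or entities get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Applying this to every piece of untrusted data before inserting it into a page neutralizes the injected `<script>` payloads that XSS relies on.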
Some browsers include partial protection against reflected XSS attacks, in which the attacker
provides a URL including malicious script. However, even users of those browsers are
vulnerable to other XSS attacks, such as those where the malicious code is stored in a
database. Only correct design of web applications on the server side can fully prevent
XSS. XSS vulnerabilities can also occur because of implementation mistakes by browser
authors.
Another cross-site vulnerability is cross-site request forgery or CSRF. In CSRF, code on an
attacker's site tricks the victim's browser into taking actions the user didn't intend at a target
site (like transferring money at a bank). It works because, if the target site relies only on
cookies to authenticate requests, then requests initiated by code on the attacker's site will
carry the same legitimate login credentials as requests initiated by the user. In general, the
solution to CSRF is to require an authentication value in a hidden form field, and not only in
the cookies, to authenticate any request that might have lasting effects. Checking the HTTP
Referer header can also help.
"JavaScript hijacking" is a type of CSRF attack in which a <script> tag on an attacker's site
exploits a page on the victim's site that returns private information such as JSON or
JavaScript. Possible solutions include:
1) requiring an authentication token in the POST and GET parameters for any response
that returns private information
2) using POST and never GET for requests that return private information
Browser and plug-in coding errors
JavaScript provides an interface to a wide range of browser capabilities, some of which may
have flaws such as buffer overflows. These flaws can allow attackers to write scripts which
would run any code they wish on the user's system. These flaws have affected major browsers
including Firefox, Internet Explorer, and Safari.
Plug-ins, such as video players, Adobe Flash, and the wide range of ActiveX controls enabled
by default in Microsoft Internet Explorer, may also have flaws exploitable via JavaScript, and
such flaws have been exploited in the past.
In Windows Vista, Microsoft has attempted to contain the risks of bugs such as buffer
overflows by running the Internet Explorer process with limited privileges. Google Chrome
similarly limits page renderers in its own "sandbox".
jQuery
jQuery[12] is a fast and concise JavaScript Library that simplifies HTML document
traversing, event handling, animating, and Ajax interactions for rapid web development.
jQuery is designed to change the way that you write JavaScript.
1) Lightweight Footprint
2) CSS3 Compliant
3) Cross-browser
jQuery is free, open source software, dual-licensed under the MIT License or the GNU
General Public License, Version 2. jQuery's syntax is designed to make it easier to navigate
a document, select DOM elements, create animations, handle events, and develop Ajax
applications. jQuery also provides capabilities for developers to create plug-ins on top of the
JavaScript library. This enables developers to create abstractions for low-level interaction and
animation, advanced effects and high-level, theme-able widgets. The modular approach to the
jQuery library allows the creation of powerful dynamic web pages and web applications.
Features:
jQuery includes the following features:
1) DOM element selections using the cross-browser open source selector engine Sizzle,
fallbacks for older ones - For example the inArray() and each() functions.
10) Cross-browser support
Interaction with jQuery falls into two categories: 1) via the $ function, whose methods,
often called commands, are chainable as they all return jQuery objects; 2) via $.-prefixed
functions. These are utility functions which do not work on the jQuery object itself.
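The chainability of jQuery commands comes from each command returning the jQuery object itself. A toy reimplementation of the pattern (illustrative only, not jQuery source) makes the mechanism clear:

```javascript
// Toy sketch of jQuery-style chaining: each "command" records its
// effect and returns `this`, so calls can be strung together the way
// $(selector).hide().addClass("x") can in real jQuery.
function makeWrapper(elements) {
  return {
    elements,
    applied: [],  // stands in for real DOM side effects
    hide() { this.applied.push("hide"); return this; },
    addClass(name) { this.applied.push("addClass:" + name); return this; },
  };
}
```

Because every command hands back the same wrapper object, arbitrarily long chains read as a single fluent expression.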
jsoup

jsoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, with which you can:
1) scrape and parse HTML from a URL, file, or string
2) find and extract data, using DOM traversal or CSS selectors
3) manipulate the HTML elements, attributes, and text
4) clean user-submitted content against a safe white-list, to prevent XSS attacks
5) output tidy HTML
jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and
validating, to invalid tag-soup; jsoup will create a sensible parse tree.
Example: fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from
the "In the news" section into a list of Elements:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
Ext JS

Ext JS[14] is a pure JavaScript application framework for building interactive web
applications using techniques such as Ajax, DHTML and DOM scripting.

GUI controls: Ext JS includes a set of GUI-based form controls (or "widgets") for use within
web applications:
1) text field and textarea input controls
2) date fields with a pop-up date-picker
3) numeric fields
4) list box and combo boxes
5) radio and checkbox controls
6) html editor control
7) grid control (with both read-only and edit modes, sortable data, lockable and drag
12) region panels to allow a form to be divided into multiple sub-sections
13) sliders
14) vector graphics charts
Many of these controls are able to communicate with a web server using Ajax.
JSON
JSON [15] or JavaScript Object Notation is a lightweight text-based open standard designed
for human-readable data interchange. It is derived from the JavaScript scripting language for
representing simple data structures and associative arrays, called objects. Despite its
relationship to JavaScript, it is language-independent, with parsers available for many
languages.
The JSON format was originally specified by Douglas Crockford. The official Internet media
type for JSON is application/json. The JSON filename extension is .json.
The JSON format is often used for serializing and transmitting structured data over a network
connection. It is used primarily to transmit data between a server and web application,
serving as an alternative to XML.
Example:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}
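Serializing and parsing such a structure is built into JavaScript itself; the following round-trip sketch shows the typical server/web-application exchange in miniature (the object is a shortened version of the example above):

```javascript
// Round-trip sketch: an object is serialized for transmission over a
// network connection and parsed back on the other side without loss
// of structure.
const person = { firstName: "John", lastName: "Smith", age: 25 };
const wire = JSON.stringify(person);   // text to send over the network
const received = JSON.parse(wire);     // reconstructed object on arrival
```

This symmetry between serialization and parsing is a large part of why JSON displaced XML as the default interchange format for Ajax applications.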
Schema of JSON
There are several ways to verify the structure and data types inside a JSON object, much like
an XML schema; however, unlike XML schemas, JSON schemas are not widely used.
Additionally, a JSON schema has to be written manually; unlike XML, there are currently no
tools available to generate a JSON schema from JSON data.
XML
XML has been used to describe structured data and to serialize objects. Various XML-based
protocols exist to represent the same kind of data structures as JSON for the same kind of
data interchange purposes. When data is encoded in XML, the result is typically larger than
an equivalent encoding in JSON, mainly because of XML's closing tags. Yet, if the data is
compressed using an algorithm like gzip there is little difference because compression is
good at saving space when a pattern is repeated.
In XML there are alternative ways to encode the same information, because some values can be
represented both as child nodes and as attributes. This can make automated data exchange
complicated unless the XML format used is strictly specified, as programs need to deal with
many different variations of the data structure. Both of the following XML examples carry
the same information as the JSON example above, in different ways.
<person>
    <firstName>John</firstName>
    <lastName>Smith</lastName>
    <age>25</age>
    <address>
        <streetAddress>21 2nd Street</streetAddress>
        <city>New York</city>
        <state>NY</state>
        <postalCode>10021</postalCode>
    </address>
    <phoneNumbers>
        <phoneNumber type="home">212 555-1234</phoneNumber>
        <phoneNumber type="fax">646 555-4567</phoneNumber>
    </phoneNumbers>
</person>

<person firstName="John" lastName="Smith" age="25">
    <address streetAddress="21 2nd Street" city="New York" state="NY" postalCode="10021"/>
    <phoneNumbers>
        <phoneNumber type="home" number="212 555-1234"/>
        <phoneNumber type="fax" number="646 555-4567"/>
    </phoneNumbers>
</person>
The XML encoding may therefore be shorter than the equivalent JSON encoding. A wide
range of XML processing technologies exist, from the Document Object
Model to XPath and XSLT. XML can also be styled for immediate display
using CSS. XHTML is a form of XML, so that elements can be passed in this form ready for
direct insertion into webpages using client-side scripting.
Which is better: XML or JSON?
1) The XML format is more advanced than shown by the example, though. You can for
example add attributes to each element, and you can use namespaces to partition
elements. There are also standards for defining the format of an XML file, the
XPATH language to query XML data, and XSLT for transforming XML into
presentation data.
2) The XML format has been around for some time, so there is a lot of software
developed for it. The JSON format is quite new, so there is a lot less support for it.
3) While XML was developed as an independent data format, JSON was developed
specifically for use with JavaScript and AJAX, so the format is exactly the same as a
JavaScript literal object.
4) JSON parsing is generally faster than XML parsing.
5) JSON is a more compact format, meaning it weighs far less on the wire than the more
verbose XML.
6) Formatted JSON is generally easier to read than formatted XML.
7) JSON specifies how to represent complex data types; in XML there is no single best way
to represent a data structure.
Example: the JSON object { "foo": { "bar": "baz" } } could be represented in XML as
<foo bar="baz"/>, or <foo><bar>baz</bar></foo>, or
<object name="foo"><property name="bar">baz</property></object>.
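Point 7 above can be made concrete: because XML admits several spellings of the same object, any JSON-to-XML converter must pick one convention. A minimal sketch using the element-per-property convention (a simplification that ignores attributes, arrays and character escaping):

```javascript
// Sketch: convert a nested object of strings and numbers into
// element-per-property XML -- one of several equally valid conventions.
function toXml(name, value) {
  if (typeof value === "object" && value !== null) {
    // Recurse: each property becomes a child element.
    const inner = Object.keys(value)
      .map((k) => toXml(k, value[k]))
      .join("");
    return `<${name}>${inner}</${name}>`;
  }
  // Leaf value: wrap it in a single element.
  return `<${name}>${value}</${name}>`;
}
```

A different, equally legitimate converter could emit the same leaf values as attributes instead, which is exactly why automated JSON/XML interchange needs the target format to be strictly specified.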
CHAPTER 3
LOADING A THIRD-PARTY WEBPAGE IN IFRAME
3.1 CROSS DOMAIN RESOURCE SHARING
Iframes are often used to load third party content, ads and widgets. The main reason to use
the iframe technique is that the iframe content can load in parallel with the main page: it
doesn't block the main page. Loading content in an iframe does, however, have two downsides:
1) Iframes block onload of the main page
2) The main page and iframe share the same connection pool
These downsides would not matter if we did not want to put our own JavaScript into the
webpage loaded in our iframe. So there was a need to develop a methodology by which we
can open a third-party webpage in our iframe with our JavaScript loaded into that webpage.
Another problem arises because we need to solve cross-site scripting issues, as we are
rendering a page in our iframe whose source address is different from ours. Basically,
cross-site scripting uses known vulnerabilities in web-based applications, their servers, or the
plug-in systems they rely on. Exploiting one of these, attackers fold malicious content into
the content being delivered from the compromised site. When the resulting combined content
arrives at the client-side web browser, it has all been delivered from the trusted source, and
thus operates under the permissions granted to that system. By finding ways of injecting
malicious scripts into web pages, an attacker can gain elevated access privileges to sensitive
page content, session cookies, and a variety of other information maintained by the browser
on behalf of the user. Cross-site scripting attacks are therefore a special case of code
injection.
Exploit cases using XSS
Attackers intending to exploit cross-site scripting[16] vulnerabilities must approach each
class of vulnerability differently. For each class, a specific attack vector is described here.
The names below are technical terms, taken from the cast of characters commonly used in
computer security.
3.3 JSON-P
JSON is a lightweight data-interchange format. It was formally specified by Douglas
Crockford, and has since been received almost universally as a simple and powerful
representation of data for transmission between two entities, regardless of what computer
language those entities run in natively.
One such mechanism which can request content cross-domain is the <script> tag. JSON-P
(JSON with padding)[17] is a way to leverage this property of <script> tags to request data
in the JSON format across domains. JSON-P works by making a <script> element (either in
HTML markup or inserted into the DOM via JavaScript) which requests a remote data
service location. The response (the loaded "JavaScript" content) is the name of a function
pre-defined on the requesting web page, with the JSON data being requested passed to it as
the parameter. When the script executes, the function is called and passed the JSON data,
allowing the requesting page to receive and process it.
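The mechanism can be simulated without a network: the "response" is ordinary JavaScript text that calls a function already defined on the page. All names and data below are illustrative:

```javascript
// JSON-P sketch: the page defines a callback, then the <script> tag's
// response arrives as JavaScript text that calls it with the payload.
let receivedData = null;

function handleStockData(data) {   // pre-defined on the requesting page
  receivedData = data;
}

// What the remote service would send back as the <script> response:
const scriptResponse = 'handleStockData({"symbol": "ADBE", "price": 33.05});';

// The browser executes the response; eval stands in for that here.
eval(scriptResponse);
```

In a real page, `scriptResponse` would be the body fetched by a `<script src="https://service.example/data?callback=handleStockData">` element, which is exactly why the browser's same-origin policy does not block it.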
(See Fig. 3.1 for an example of JSON-P.)
The problem
Thus far, JSON-P has essentially been just a loose definition by convention, when in reality
the browser accepts any arbitrary JavaScript in the response. This means that authors who rely
on JSON-P for cross-domain Ajax are in fact opening themselves up to potentially just as
much mayhem as the same-origin policy was meant to avoid in the first place. For instance,
a malicious web service could return a function call for the JSON-P portion, but slip in
another set of JavaScript logic that hacks the page into sending back the user's private data,
etc.
JSON-P is, for that reason, seen by many as an unsafe and hacky approach to cross-domain
Ajax, and for good reason. Authors must be diligent to only make such calls to remote web
services that they either control or implicitly trust, so as not to subject their users to harm.
functionName({JSON});
obj.functionName({JSON});
obj["function-name"]({JSON});
The intention is that only a single expression (a function reference, or an object-property function reference) can be used as the function ("padding") part of the JSON-P response, and it must be immediately followed by a single enclosing ( ) pair, inside of which must be a strictly valid and parseable JSON object. The function call may optionally be followed by a single semicolon. No other content, other than whitespace or valid JavaScript comments, may appear in the JSON-P response, and whitespace and comments must be ignored by the browser's JavaScript parser (as would normally be the case).
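The rules above could be approximated by a validation routine such as the following sketch; the regular expression is a simplification (it does not handle JavaScript comments) and the function name is illustrative:

```javascript
// Simplified check of the strict JSON-P rules: one function reference
// (possibly a property access), one "(...)" pair enclosing strictly valid
// JSON, an optional trailing semicolon, and nothing else but whitespace.
function isStrictJsonP(response) {
  const m = response.trim().match(
    /^([A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*|\["[^"]+"\])*)\((.*)\);?$/s
  );
  if (!m) return false;
  try {
    JSON.parse(m[2]); // the padding must enclose strictly valid JSON
    return true;
  } catch {
    return false;
  }
}

isStrictJsonP('handleData({"a": 1});');          // true
isStrictJsonP('obj["function-name"]({"a": 1})'); // true
isStrictJsonP('evil(); steal(document.cookie)'); // false
```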
The most critical piece of this proposal is that browser vendors must begin to enforce
this rule for script tags that are receiving JSON-P content, and throw errors (or at least
stop processing) on any non-conforming JSON-P content.
In order for the browser to be able to know when it should apply such content-filtering, the response must be identified with a dedicated JSON-P MIME type (such as application/json-p), so that the stricter parsing rules are applied only to content explicitly declared as JSON-P. This stricter JSON-P would still be useful for the common case of consuming data from web services which do not yet support CORS (or for which the author does not want to use CORS for whatever reason).
It's also recognized that this stricter definition may cause some "JSON-P" transmissions,
which rely on the looser interpretation of just arbitrary JavaScript content, to fail. But this
could easily be avoided by having the author (and the server) avoid referring to that content
with the strictly defined JSON-P MIME-types as described above, which would then prevent
the browser from selectively turning on such filtering.
CORS Filter is the first universal solution for fitting Cross-Origin Resource Sharing (CORS)
support to Java web applications. CORS is a recent W3C effort to introduce a standard
mechanism for enabling cross-domain requests in web browsers and participating servers.
Security
Bear in mind that CORS is not about providing server-side security. The controls that it imposes are primarily there to protect the browser, and more specifically the legitimate JavaScript apps that run in it and any confidential user data (such as cookies), from certain cross-site exploits. Remember, after all, that the Origin request header is supplied by the browser and the server has no direct means to verify it.
FIG. 3.2 : Use of CORS Filter to solve cross site resource sharing problem
The CORS Filter, as the name implies, implements the standard javax.servlet.Filter interface. It intercepts incoming HTTP requests and, if they are identified as cross-origin, applies the proper CORS policy and headers before passing them on to the actual targets (servlets, JSPs, static XML/HTML documents).
This transparent nature of the CORS Filter makes it very easy to retrofit existing Java web services with CORS capability. Just put the CORS JAR file on your CLASSPATH and enable it with a few lines of XML in your web.xml file. The CORS Filter implementation is also extremely efficient: it takes less than 25 KB of bytecode.
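A typical deployment-descriptor entry might look like the following sketch; the filter class name follows the CORS Filter documentation and may differ between versions, and the optional init-params (allowed origins, methods, etc.) are omitted:

```xml
<!-- Illustrative web.xml entry enabling the CORS Filter for all URLs. -->
<filter>
    <filter-name>CORS</filter-name>
    <filter-class>com.thetransactioncompany.CORSFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>CORS</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```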
implemented by us. We can augment the proxy server code to process the document (or webpage) and insert the JavaScript link in the page.
We also need to process the document, either at the proxy side or after receiving it in the iframe. I have processed the document at the proxy level itself, because this reduces the size of the document shipped over the network and also reduces the client-side work. Processing the document means removing hrefs, stopping all click events, and updating the links and sources of images, JavaScript files and style sheets from relative to absolute paths, so that when the page makes a request for a resource from inside the iframe, that request can be completed.
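The path-rewriting step can be illustrated in plain JavaScript using the standard URL constructor (the actual proxy does this server-side with Jsoup; the function name and URLs below are illustrative):

```javascript
// Resolve an attribute value (src, href, etc.) against the page's base URL,
// so that requests made from inside the iframe reach the original server.
function absolutize(relativeUrl, baseUrl) {
  return new URL(relativeUrl, baseUrl).href;
}

// Root-relative paths resolve against the origin:
absolutize("/img/logo.png", "http://shop.example.com/products/42");
// → "http://shop.example.com/img/logo.png"

// Plain relative paths resolve against the page's directory:
absolutize("style.css", "http://shop.example.com/products/42");
// → "http://shop.example.com/products/style.css"
```

The proxy applies this to every src and href attribute before shipping the page to the client.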
CHAPTER 4
ALGORITHM FOR LOCATING AN HTML DOM OBJECT
ON WEBPAGE
Locating an element on web page
HTML provides an attribute named id which is used to identify a particular element on the web page. A problem arises when we want to locate an element that has no id attribute specified, so we need to develop an algorithm to find an element when the id attribute is not specified.
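Although the algorithm operates on the HTML DOM, the core idea can be sketched on a plain tree: record the chain of child indices from the root down to the target (an XPath-like path), then replay that chain to locate the element again. The tree shape and function names below are illustrative, not the thesis implementation:

```javascript
// Encode the target element as the list of child indices from the root.
function encodePath(root, target) {
  if (root === target) return [];
  for (let i = 0; i < (root.children || []).length; i++) {
    const sub = encodePath(root.children[i], target);
    if (sub !== null) return [i, ...sub];
  }
  return null; // target not under this root
}

// Replay the recorded indices to reach the element again.
function locate(root, path) {
  return path.reduce((node, i) => node.children[i], root);
}

// A plain object tree stands in for the HTML DOM:
const leaf = { tag: "span", children: [] };
const doc = {
  tag: "body",
  children: [
    { tag: "div", children: [] },
    { tag: "div", children: [leaf] },
  ],
};

const path = encodePath(doc, leaf); // [1, 0]
locate(doc, path) === leaf;         // true: the same element is recovered
```

As with XPath, the recorded path breaks if elements are inserted, removed or shuffled above the target, which is the limitation noted later in this thesis.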
CHAPTER 5
ALGORITHM TO IDENTIFY WHETHER TWO
DOCUMENTS ARE STRUCTURALLY IDENTICAL
Algorithm for identifying whether two web pages are structurally identical
There are two approaches by which we can identify whether two documents are structurally identical. These are:
1) URL structure
2) DOM structure
Generally, all the e-commerce websites having pages with a similar layout share a common URL structure.
For example, analyze the URL structure of some e-commerce websites:
http://www.flipkart.com/blackberry-pocket-8520-9780black/p/itmczbu4fzyxwrxz?pid=ACCCYH7XM6H6AAJ4&ref=8016029b-c67a-4621-a5eaf379b65cdbf7
But there is no information in these URLs that can be generalized. Although this kind of comparison would be quite fast, such an algorithm is not scalable and is not applicable to all e-commerce websites.
Moving on to the second approach, i.e. utilizing the DOM structure, this seems more appropriate, as there is much more information that can be utilized, such as the number of elements, the location of specific elements, etc. This approach seems to work for all websites.
Algorithm for Similar Page Matching using DOM Structure
1) Fetch all elements of doc1 and doc2.
2) Traverse both lists; when a tag name matches, compare the elements by id.
3) If the ids match, recursively call the isSimilar function on these two elements.
4) After removing all elements with the same id from both documents, compare the remaining elements' class names as well as their parents' tag names and class names.
5) If all of these match, we suspect that the two elements are similar, and we recursively call the isSimilar method on them.
6) After removing these elements, we are left with elements which are additional in one of the documents.
7) However, some elements may still be common to both documents but carry no identifier such as a class or id.
8) For these we make use of a location parameter: all elements which have the same tag name, the same location and the same parent are checked recursively to see if they are similar.
9) The criterion for similarity is that, after removal of the similar elements, the number of remaining elements falls to 10% or less of the initial element count.
10) If so, we say that the two documents are structurally similar.
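The matching criterion in steps 9 and 10 can be sketched as follows. This is a deliberate simplification: it flattens both trees, cancels out elements that agree on tag name and (when present) id or class, and declares the pages similar when the leftovers are under 10% of the total element count. The full algorithm described above additionally recurses into matched elements and uses position; names and the tree shape here are illustrative:

```javascript
// Collect every element of a tree into a flat list.
function flatten(node, out = []) {
  out.push(node);
  (node.children || []).forEach((c) => flatten(c, out));
  return out;
}

// Compare two documents by cancelling out matching elements and
// measuring how many are left over.
function isSimilar(doc1, doc2) {
  const a = flatten(doc1);
  const b = flatten(doc2);
  const key = (n) => `${n.tag}|${n.id || ""}|${n.className || ""}`;

  // Count occurrences of each (tag, id, class) signature in doc1...
  const counts = new Map();
  a.forEach((n) => counts.set(key(n), (counts.get(key(n)) || 0) + 1));

  // ...then cancel them against doc2, counting whatever fails to match.
  let unmatched = 0;
  b.forEach((n) => {
    const c = counts.get(key(n)) || 0;
    if (c > 0) counts.set(key(n), c - 1);
    else unmatched++;
  });
  for (const c of counts.values()) unmatched += c; // leftovers from doc1

  // Similar when the leftovers are at most 10% of all elements seen.
  return unmatched / (a.length + b.length) <= 0.1;
}
```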
OUTLINE OF WORK
The various steps in the entire problem can be summarized and divided into the following
broad headings:
1) Developing the common social-connect plug-ins (like, login, wishlist).
2) Developing a GUI-based platform where any merchant can open his webpage which
includes
3) Developing an intermediate proxy server.
4) Using the Jsoup HTML Java parser to process the document.
5) Developing a JavaScript, inserted by the proxy, which highlights the selected element on the webpage.
6) Developing a JavaScript which shows all the available plug-ins and sends the information corresponding to the plug-in selected and placed by the website administrator on his website to the H2 database.
7) Developing a JavaScript which the merchant includes in his website after placing the plug-in, and which queries the database for the plug-ins corresponding to the URL.
8) Developing an algorithm to find whether two HTML pages are structurally identical or not. This algorithm is used to plug-inize all the pages similar to the page on which the merchant has placed the plug-ins.
Database Schema
1. Table FBUserActivity (userId, action, objectName, objectUrl, activityTime, client );
2. Table SimilarPages(pageUrl, baseUrl);
3. Table WidgetStore(pageUrl, widgetName, widgetId, JSONdata);
4. Table pageSketchStore(pageUrl, pageSketch);
The following are some of the observations made with respect to the proposed approach:
1) Current approach: Integrating each plug-in into a webpage is itself a very time-consuming job. Imagine what life becomes when you have to repeat these steps thousands of times. The current approach is therefore not well suited, as it is very time-consuming and requires a lot of manual work.
2) Live demo of how the page looks after integrating a plug-in: This feature of our algorithm allows the merchant to get live feedback from his website about how he should customize his plug-in. He does not need to set the properties again and again and check the interface separately. All these steps have been integrated into one, and the interface for integrating plug-ins becomes a lot easier.
3) The "one done, all done" algorithm saves a lot of time and manual effort: This significantly decreases manual effort and saves a lot of time for the merchants. Now a merchant can integrate social plug-ins into his website and have it up and ready within minutes, whereas earlier the same process required hours or days.
We also developed an algorithm which takes an HTML DOM object as input and encodes the element such that, using the encoded information, we can reach the element again. This algorithm works even if the HTML DOM object does not contain an id attribute. It is inspired by the concept of XPath, but it also has certain limitations. Since it is based on the concept of XPath, the algorithm will not work if the document changes, or even if the document objects are shuffled; any modification to the document which disturbs the stored path of the element will make the algorithm fail. But since e-commerce product pages are not changed frequently, the algorithm works well for such cases.
SNAPSHOTS
CONCLUSION
Integrating plug-ins into an e-commerce site has never been easy. Imagine what life becomes when you have to insert some piece of code thousands of times and also need to keep track of where to place the plug-ins, i.e. where the respective code will lie. The current approach is not well suited, as it is very time-consuming and requires a lot of manual work. Still, many e-commerce websites have to follow this process, as there is no other alternative in the market. Here we step in with our new, enriched and advanced plug-in integration algorithm. Our algorithm has two main features. First, the merchant can visualize the page and customize the plug-in settings accordingly. This feature allows the merchant to get live feedback from his website about how he should customize his plug-in. He does not need to set the properties again and again and check the interface separately. All these steps have been integrated into one, and the interface for integrating plug-ins becomes a lot easier. Second, we have reduced the manual effort by a significant amount. The "one done, all done" algorithm significantly decreases manual effort and saves a lot of time for the merchants. Now a merchant can integrate social plug-ins into his website and have it up and ready within minutes, whereas earlier the same process required hours or days.
Integration of Search
Currently, a merchant pays a lot of money to search engines and provides a list of keywords and corresponding URLs. A user can arrive from a search engine through two kinds of search results: organic and inorganic (paid). These paid search results are not fully accurate, so there is a high chance that some users instantly leave the site when they do not see the expected products. If a widget addressing this comes onto the market, integrating such widgets will be very easy using our platform, with no need to dive into the code. Work on this is currently in progress and will be seen in the near future.
Improve performance of similar pages algorithm
We have developed an algorithm which utilizes the DOM structure to identify whether two documents are structurally identical. This algorithm is highly useful when we integrate plug-ins into a website which has a lot of pages with the same structure. For example, take an e-commerce website: all the product pages have the same structure, and all the category pages have the same structure. So we only need to integrate plug-ins into two pages, i.e. one product page and one category page; the rest of the pages are handled by the algorithm, which integrates plug-ins into them. Currently, the algorithm is not time-efficient: it uses multiple recursive calls, and its performance in terms of time and space is not good. It can be optimized by handling various cases.
REFERENCES
1. http://en.wikipedia.org/wiki/Social_commerce as cited on Feb 5, 2012