
PROJECT REPORT

ON

CONTEXTUAL ADVERTISING SYSTEM

Submitted by
ABIN AUGUSTINE [EPAJECS001]
ANU K. [EPAJECS012]
BIBINDAS T.P. [EPAJECS021]
GEORGE SUBIN BENNY [EPAJECS026]
NISHEEL S. [EPAJECS044]

PROJECT GUIDE: P.EZUDHEEN

DEPARTMENT OF COMPUTER SCIENCE


GOVERNMENT ENGINEERING COLLEGE, SREEKRISHNAPURAM, PALAKKAD 679 513
2012-2013

CERTIFICATE

Government Engineering College Sreekrishnapuram, Palakkad

This is to certify that the main project entitled CONTEXTUAL ADVERTISING SYSTEM, submitted by Abin Augustine, Anu K., Bibindas T.P., George Subin Benny and Nisheel S. to the Department of Computer Science and Engineering, Government Engineering College, Sreekrishnapuram-679513, Palakkad, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering under Calicut University, is a bona fide record of work carried out by them.

P.Ezhudheen Project Guide

Dr.P C Reghuraj Project Co-ordinator

Dr.P C Reghuraj Head of Department

Place: Sreekrishnapuram
Date: March 25, 2013


ACKNOWLEDGEMENT

First and foremost, we wish to express our wholehearted indebtedness to God Almighty for his gracious, constant care and the blessings showered on us for the successful completion of the project. We would like to extend our sincere gratitude to the Head of the Department of Computer Science and Engineering, Dr. P.C Reghuraj, for his help during the completion of the project. We would like to express our deep gratitude to our project guide, Mr. P. Ezhudheen, for guiding and helping us throughout this project. We express our sincere thanks to our Project Coordinator, Dr. P C Reghuraj, for his coordination and support. We would also like to thank the lecturers and all other members of the department for their valuable guidance. We are also grateful to our friends, who helped us with their timely suggestions and support during this endeavour. Last but not least, we thank our parents for their support and encouragement.


ABSTRACT

A contextual advertising system is an advertisement system which publishes advertisements that correlate with the content of the web page. Our advertising system allows two kinds of users, namely providers and publishers, to register on our site. The providers upload their advertisements, and the publishers give the URL on which the advertisement is to be displayed. Using the URL, the content of the web page is identified and stored in the database. Because web page contents change over time, the content analysis is repeated periodically.

The identification of web page content is done using Python's Natural Language Toolkit (NLTK) and the uClassify web service. Advertisements are displayed on Google gadgets which are placed on the publisher's website. The Google gadgets periodically communicate with our web server to identify the current content of the web page, and the appropriate advertisements are displayed.

CONTENTS

1 INTRODUCTION
2 LITERATURE SURVEY
3 REQUIREMENT SPECIFICATIONS
3.1 Minimum Hardware Requirements
3.2 Minimum Software Specifications
4 DESIGN
4.1 Content Identification
4.1.1 Classifier
4.1.2 Feature Extraction
4.1.3 Training
4.1.4 Accessing uClassify
4.1.5 Uploading Context File via FTP
4.2 Google Gadgets
4.2.1 What is Gadget?
4.2.2 Google Sites
4.2.3 JavaScript
4.2.4 Working with Remote Content
4.3 Website
4.3.1 Providers and Publishers
4.3.2 Website Security
5 IMPLEMENTATION
5.1 The SmartAd Web Server
5.1.1 Python Unit
5.1.2 Databases
5.1.3 Registering With SmartAd Server
5.1.4 Uploading ADs by provider
5.1.5 Cron Job
5.2 Communicating with uClassify
5.2.1 API
5.2.2 Processing Response
5.3 Displaying Advertisements
5.3.1 Working of google gadget
6 FUTURE SCOPE AND APPLICATIONS
7 CONCLUSION
8 REFERENCES

LIST OF FIGURES

1 Classifier
2 The Google Gadget
3 Google Gadget on publisher site
4 Sample XML code for Google Gadget
5 The makeRequest() call
6 The Response to makeRequest()
7 The SmartAd System
8 Home page of SmartAd Server
9 Provider Registration
10 Publisher Registration
11 Login Page
12 Uploading ADs
13 Gadget Displaying an AD

1 INTRODUCTION

A contextual advertising system scans the text of a website for keywords and returns advertisements to the webpage based on those keywords. The advertisements may be displayed on the webpage or as pop-up ads. For example, if the user is viewing a website pertaining to sports and that website uses contextual advertising, the user may see advertisements for sports-related companies, such as memorabilia dealers or ticket sellers. Contextual advertising is also used by search engines to display advertisements on their search results pages based on the keywords in the user's query.

Contextual advertising is a form of targeted advertising in which the content of an ad is in direct correlation to the content of the web page the user is viewing. For example, if you are visiting a website about travelling in Europe and an ad pops up offering a special price on a flight to Italy, that's contextual advertising.

Contextual advertising is also called In-Text advertising or In-Context technology. Web advertising (online advertising), a form of advertising that uses the World Wide Web to attract customers, has become one of the world's most important marketing channels. Contextual advertising refers to the assignment of relevant ads to a generic web page.

2 LITERATURE SURVEY

The contextual advertising system we have developed consists mainly of three distinct tasks: identifying the content of web pages, developing a website which acts as a mediator between advertisement providers and publishers, and finally displaying the advertisements using Google Gadgets. For content identification, the general concepts of classification, feature extraction and training were studied by referring to the text Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper and the website www.pythoncourse.edu.

For studying the implementation of Python threads we referred to Core Python Programming by Wesley J. Chun. For using the uClassify API, the documentation of requests and responses was provided on their own website, www.uClassify.com.

The essentials of creating a website were studied from the text How to do everything with PHP and MySQL by Vikram Vaswani as well as from the website www.phpacademy.org.

Detailed documentation on how a Google Gadget accesses remote content was taken from www.developers.google.com/gadgets/.

The basics of Google Gadgets were learned by taking lessons from the following websites:

www.ibm.com/developerweb/tutorials/wa-google1/index.html
www.wso2.org/articles/2011/11/writing-google-gadgets-tutorialpart-01

3 REQUIREMENT SPECIFICATIONS

3.1 Minimum Hardware Requirements

Processor: 400 MHz Pentium II or better
Hard disk: 10 GB
Memory: 256 MB

3.2 Minimum Software Specifications

Programming Language: Python 2.7 or higher and NLTK
Operating System: Linux Ubuntu 12.04 or higher
Packages required in Ubuntu: PyUclassify
Web Services: uClassify service
Web Hosting: the web host service must have the following
1. Apache, PHP, MySQL, Python
2. FTP Service
3. Cron Job Utility
4. Hosting space of 1500 MB

4 DESIGN

The content of the websites is identified using the uClassify web service API and Python's Natural Language Toolkit. The site is designed with PHP, HTML and MySQL. The advertisements are displayed on the websites using Google gadgets.

4.1 Content Identification

Content identification is done using Python NLTK and the uClassify web service. Natural language processing (NLP) is a field of computer science, artificial intelligence and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input.

Modern NLP algorithms are based on machine learning, especially statistical machine learning. The paradigm of machine learning is different from that of most prior attempts at language processing. Prior implementations of language processing tasks typically involved the direct hand coding of large sets of rules. The machine-learning paradigm calls instead for using general learning algorithms (often, although not always, grounded in statistical inference) to automatically learn such rules through the analysis of large corpora of typical real-world examples. A corpus (plural, corpora) is a set of documents (or sometimes, individual sentences) that have been hand-annotated with the correct values to be learned. Many different classes of machine learning algorithms have been applied to NLP tasks. These algorithms take as input a large set of features that are generated from the input data. Some of the earliest-used algorithms, such as decision trees, produced systems of hard if-then rules similar to the systems of hand-written rules that were then common. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to each input feature. Such models have the advantage that they can express the relative certainty of many different possible answers rather than only one, producing more reliable results when such a model is included as a component of a larger system.

Figure 1: Classifier

4.1.1 Classifier

Classification is the task of choosing the correct class label for a given input. In basic classification tasks, each input is considered in isolation from all other inputs, and the set of labels is defined in advance. The basic classification task has a number of interesting variants. For example, in multi-label classification, each instance may be assigned multiple labels; in open-class classification, the set of labels is not defined in advance; and in sequence classification, a list of inputs is jointly classified.

A classifier is called supervised if it is built from training corpora containing the correct label for each input. Choosing the right features is an important stage in classification. Selecting relevant features and deciding how to encode them for a learning method can have an enormous impact on the learning method's ability to extract a good model. Much of the interesting work in building a classifier is deciding what features might be relevant and how we can represent them. Although it's often possible to get decent performance by using a fairly simple and obvious set of features, there are usually significant gains to be had by using carefully constructed features based on a thorough understanding of the task at hand.

We use the uClassify web service for classifying web pages. uClassify is a free web service where we can easily create our own text classifiers. We can also directly use classifiers that are shared by the community. uClassify uses a naive Bayes classifier for classification. To choose a label for an input value, the naive Bayes classifier begins by calculating the prior probability of each label, which is determined by checking the frequency of each label in the training set. The contribution of each feature is then combined with this prior probability to arrive at a likelihood estimate for each label. The label whose likelihood estimate is the highest is then assigned to the input value.
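The naive Bayes procedure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not uClassify's actual implementation: the toy training data, the function names train and classify, the add-one smoothing and the use of Python 3 are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(words, model):
    """Return the label with the highest log-likelihood estimate."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Prior: relative frequency of the label in the training set.
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing (an assumption) avoids log(0) for unseen words.
            numerator = word_counts[label][word] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set: (features, label) pairs.
training = [
    (["football", "match", "goal"], "Sports"),
    (["cricket", "team", "match"], "Sports"),
    (["oven", "fridge", "kitchen"], "Home Appliances"),
]
model = train(training)
print(classify(["match", "goal"], model))
```

Working with log probabilities instead of multiplying raw probabilities is a common choice because it avoids numerical underflow when a page yields many features.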

4.1.2 Feature Extraction

Text feature extraction is the process of transforming what is essentially a list of words into a feature set that is usable by a classifier. Classification works by learning from labeled feature sets, or training data, to later classify an unlabeled feature set. A feature set is basically a key-value mapping of feature names to feature values.

The first step in creating a classifier is deciding what features of the input are relevant. In a web page, the main content lies between HTML tags such as the title, bold and paragraph tags. During extraction the HTML tags are removed and we extract chunks that mainly consist of nouns.
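The tag-removal part of this step can be sketched with Python's standard library alone. The noun-chunk extraction, which the project performs with NLTK, is deliberately left out so that the example runs without extra downloads. The class name ContentTextExtractor, the exact tag set and the sample page are assumptions made for illustration, and the code is written for Python 3.

```python
from html.parser import HTMLParser


class ContentTextExtractor(HTMLParser):
    """Collect text found inside <title>, <b> and <p> elements."""

    CONTENT_TAGS = {"title", "b", "p"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many content tags are currently open
        self.chunks = []    # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTENT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.CONTENT_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_text(html):
    """Return the tag-free text of the content-bearing elements."""
    parser = ContentTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = ("<html><head><title>Cricket news</title>"
          "<script>var x = 1;</script></head>"
          "<body><p>India won the <b>final</b> match.</p></body></html>")
print(extract_text(sample))
```

The resulting plain text is what a tokenizer, part-of-speech tagger and noun-phrase chunker would then receive. Pages with unclosed paragraph tags would need a more forgiving depth counter than this sketch uses.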

4.1.3 Training

During training, a feature extractor is used to convert each input value to a feature set. A custom classifier for web page content analysis is created in our uClassify account, and it is trained using the features extracted by Python.

4.1.4 Accessing uClassify

uClassify is a free web service where we can create our own text classifiers. uClassify provides a Python wrapper which can be used to send requests and receive responses from Python. uClassify provides two types of keys: a read key and a write key. The write key is used for creating and training the classifiers. The read key is used for sending features and receiving responses.

4.1.5 Uploading Context File via FTP

After classifying the web pages, details such as the URL, class label and AD name are written to a file called the context file. This file is read by all the Google Gadgets on the publishers' web sites, so we must upload it to a place where a Google Gadget can read it. The context file is uploaded to the server via FTP (File Transfer Protocol), which is a standard protocol for uploading and downloading files over the internet. Python uploads the context file to our server.
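A minimal sketch of this upload step with Python's standard ftplib module is given here. The one-field-per-line layout of the context file is an assumption based on the line-by-line lookup described in Section 5.3.1; the file name context.txt, the function names and the idea of passing host and credentials as arguments are placeholders, not the project's real values. The code is written for Python 3.

```python
import io
from ftplib import FTP


def build_context_text(entries):
    """Format (url, class_label, ad_name) tuples, one field per line."""
    lines = []
    for url, label, ad_name in entries:
        lines.extend([url, label, ad_name])
    return "\n".join(lines) + "\n"


def upload_context_file(ftp, text, remote_name="context.txt"):
    """Store the context text on the server through an open FTP session."""
    ftp.storbinary("STOR " + remote_name, io.BytesIO(text.encode("utf-8")))


def upload_to_server(host, user, password, entries):
    """Open an FTP session and upload the context file (not called here)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        upload_context_file(ftp, build_context_text(entries))


# Hypothetical entry: page URL, class label, advertisement file name.
print(build_context_text([("http://example.com/page", "Sports", "ad1.jpg")]))
```

Keeping the formatting and the transfer in separate functions lets the file layout be checked without a live FTP server.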

4.2 Google Gadgets

4.2.1 What is Gadget?

Gadgets are small applications that you can add to almost any Web page as a means to offer dynamic and rich content. Google has an abundance of gadgets to choose from. However, the most intriguing aspect of gadgets is that you can write them for your own use and then publish them on Google, where other developers can integrate your work into their Web projects.

Figure 2: The Google Gadget

Google offers two types of gadgets: desktop and universal. As you'd expect, desktop gadgets are solely for use on the Google desktop, which is an application you run on your computer that enables you to search your documents (such as e-mail and word processing documents) and allows for custom content such as Rich Site Summary (RSS) feeds, weather, and cartoons. You can run universal Google gadgets on the Google desktop, the Google home page, and most Web pages.

Gadgets are XML, HTML and JavaScript applications that can be embedded in web pages and other apps. Gadgets are developed using the Google Gadget Editor (GGE). XML is a general-purpose markup language. It describes structured data in a way that both humans and computers can read and write. XML is the language you use to write gadget specifications. A gadget is simply an XML file, placed somewhere on the internet where Google can find it.

It uses recognised XML elements to define the following:

Gadget characteristics, such as the author's name (your name), the gadget title and description, preferred sizing, etc.
A screenshot and/or thumbnail image that containers can display to show users what your gadget looks like.
Required features that containers must provide for your gadget.
User preferences, where your gadget can allow its users to customise certain aspects of the gadget display.
The content section, where you define the content that your gadget will display. This is where you add the HTML and JavaScript functions that produce your gadget's output.

In our advertisement system, Google Gadgets are used to display advertisements on the publishers' web sites.

Figure 3: Google Gadget on publisher site

A sample Google Gadget XML code

Figure 4: Sample XML code for Google Gadget

Line 1: This line specifies the contents of the file as XML.
Line 2: You use the <Module> tag to indicate that this XML file contains a gadget.
Line 3: As the name implies, the <ModulePrefs> tag allows you to specify various developer-related preferences.
Line 4: Using the tag <Content type="html">, you're electing to use HTML as the content type.
Lines 5-22: With the content type of this gadget specified as HTML, you use <![CDATA[ to indicate the start of the HTML code (or other scripting languages allowed inside HTML, such as JavaScript).
Lines 23 and 24: The tags on these two lines indicate the end of their respective sections.

Gadget Content Types

There are mainly two supported content types that you can use when writing a gadget: html and url.

html: The html content type offers you the most flexibility and is the recommended type unless you have specific features that you can't implement using HTML. With this content type, you can include almost anything that can be rendered inside a Web browser.

url: With a url content type, the gadget content lives on a remote Web page referenced by a URL in the gadget XML file. The only content in the XML file is the reference to the URL. This approach is only recommended if you need specific features that you cannot accomplish with HTML. For example, if you have an alternate scripting language that you prefer (that is, something other than JavaScript), this content type may be a good choice.

If your gadget needs to change any content in the parent page (that is, your personalized home page), you'll need to use the html-inline content type. With this type, the gadget isn't rendered inside an iframe, as it is with the html content type. Rather, the gadget code is embedded inside the HTML of the parent page.

Gadget sections

ModulePrefs: If you'd like to include information about your gadget, such as your name, a Web site, your location, or other attributes, you do so within the ModulePrefs section.

Content: This section is where you specify the type (html, html-inline, url) and follow that with the appropriate content. For both html and html-inline, the content is included in the gadget XML file. For url, you reference the file that contains your gadget logic.

UserPrefs: The UserPrefs section allows users to customize a gadget to their own liking. For instance, they might have the option to choose their time zone, language, color preferences, and so on.

4.2.2 Google Sites

The goal of Google Sites is for anyone to be able to create a team-oriented site where multiple people can collaborate and share files. It has the following features:

Custom Domain Name Mapping - Owners of both personal Google accounts and Google Apps for Business accounts are allowed to map their Google Site to a custom domain name.

Multi-Tier Permissions and Accessibility - There are three levels of permissions within Google Sites: Owner, Editor and Viewer. Owners have full permissions to modify the design and content of the entire Google Site, whereas editors cannot change the design of the site. Viewers can only view the site and are not permitted to make any changes to text or otherwise.

Extensions of Google Sites:

1. Gadgets: These are XML modules that can be embedded in a Site and that can contain custom CSS and JavaScript. Gadgets achieve two purposes:

Separation or Abstraction: The custom code can be abstracted to a distinct file.
Reuse: The same gadget can be reused by multiple sites as it is published publicly.

2. HTML Box: These allow embedding custom HTML, CSS and JavaScript, but with the following limitations (according to the Google Sites documentation):

iFrame is not supported.
One HTML Box cannot interact with or refer to code outside it, including other HTML Boxes.
Script cannot create other script, image or link tags.

4.2.3 JavaScript

JavaScript (JS) is an interpreted computer programming language. It was originally implemented as part of web browsers so that client-side scripts could interact with the user, control the browser, communicate asynchronously, and alter the document content that was displayed.

JavaScript is a scripting language used to add dynamic behaviour to the gadgets. JavaScript is an almost entirely object-based scripting language.

4.2.4 Working with Remote Content

The Google Gadgets on the publishers' web sites need to read the context file placed on the server in order to display the correct AD on the web page, so they need to get the content of a remote file. The Google gadget API provides a method called makeRequest to do so.

makeRequest()

The gadgets API provides the makeRequest(url, callback, opt_params) function for retrieving and operating on remote web content. It takes the following arguments:

String url: The URL where the content is located
Function callback: The function to call with the data from the URL once it is fetched
Map.<gadgets.io.RequestParameters, Object> opt_params: Additional parameters to pass to the request

The opt_params argument lets you specify the following:

The content type of the request (TEXT, XML, and JSON)
The method type of the request (POST or GET)
Any headers you want to include in the request
The authorization type (NONE, SIGNED, and OAUTH)

Regardless of the type of data they are fetching, calls to makeRequest() share the same characteristics:

Their first parameter is a URL that is used to fetch the remote content.
Their second parameter is a callback function that you use to process the returned data.
They are asynchronous, meaning that all processing must happen within the callback function.
They have no return values because they return immediately, and their associated callback functions get called whenever the response returns.

A callback is a function that is passed as a parameter (in the form of a function reference) to another function. Callbacks give third-party developers a hook into a running framework to do some processing.

The makeRequest() call is used to fetch the content of the context file. After getting the content of that file, a URL is constructed dynamically. Using this URL, the Google gadget displays the advertisement.

A sample code for makeRequest()

The following code snippet fetches remote content as text. It fetches the HTML text of the google.com web page and displays the first 400 characters:

Figure 5: The makeRequest() call

The callback receives the following object, which can be processed further.

Figure 6: The Response to makeRequest()

4.3 Website

The website lets users register their accounts. Advertisement providers can upload their advertisements. Advertisement publishers can register the URL of the page on which they need to display the advertisements.

The site was first developed on localhost using the XAMPP server package, which provides a PHP environment and a MySQL database along with an Apache server. After the website was completed on localhost, it was set up on www.000webhost.com, a web hosting service. The scripts that need to run periodically are scheduled using cron jobs.

4.3.1 Providers and Publishers

Providers can upload their advertisements to the server through the account they have created and specify the category to which the AD belongs, such as Sports, Home Appliances etc.

Publishers can register with the SmartAd web server by giving the URL on which the AD is to be shown. They also need to create an account on the server. The Google gadget on the publisher's web site will then be able to display the relevant AD.

4.3.2 Website Security

Because of the many known web exploits and vulnerabilities, our website takes some precautions to prevent attacks such as SQL injection and XSS.

SQL injection vulnerabilities arise when user input contains control characters which alter the SQL queries that are to be executed. This can be prevented by escaping these control characters using the mysql_real_escape_string function.

XSS stands for cross-site scripting, which allows attackers to inject malicious client-side scripts and pass the resulting URL to other users in order to obtain valuable information. XSS happens when the input forms do not strip script tags. This can be avoided using the strip_tags function.
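The two defences can be illustrated in Python as well. Note that this sketch swaps in different techniques from the PHP functions named above: it uses bound query parameters (with the standard sqlite3 module) instead of manual escaping, and it escapes output with html.escape instead of stripping tags. The table name publisher, its columns and the function names are hypothetical.

```python
import html
import sqlite3


def find_user(conn, user_name):
    """Look up a user with a bound parameter instead of string concatenation."""
    cursor = conn.execute(
        "SELECT id, name FROM publisher WHERE user_name = ?", (user_name,)
    )
    return cursor.fetchall()


def render_greeting(name):
    """Escape user-supplied text before placing it in an HTML page."""
    return "<p>Welcome, " + html.escape(name) + "</p>"


# In-memory database with a hypothetical publisher table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT, user_name TEXT)"
)
conn.execute("INSERT INTO publisher (name, user_name) VALUES (?, ?)", ("Anu", "anu"))

# A classic injection string is treated as plain data and matches nothing.
print(find_user(conn, "' OR '1'='1"))
print(render_greeting("<script>alert(1)</script>"))
```

With bound parameters the database driver never interprets the input as SQL, so no escaping function has to be remembered at each call site.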

5 IMPLEMENTATION

The site is implemented with registration for new users.

Users can be:
Advertisement Providers
Advertisement Publishers

Registered providers can upload their advertisements to the server along with their class. Registered publishers can upload the URL of their website to the database. A periodic PHP script writes the list of registered URLs and the file names of the advertisements to a file. This file is read by Python, which fetches the HTML source of each URL concurrently using threads. It then extracts the features and passes them to the uClassify web service, which in turn sends back the category of the web page as its response. After identifying the category of every URL in the list, Python creates a new file in which each URL is written along with its category and a randomly chosen advertisement file from that category. This file is then uploaded to the SmartAd web server via FTP. The Google gadgets on the publishers' websites access this uploaded file to identify the content and display the advertisements according to the content.

5.1 The SmartAd Web Server

The SmartAd web server consists of the following units:

Python Unit: Performs the feature extraction function
Database for users: Stores details of providers and publishers
Context file: Contains the information needed by the Google gadget
ADs: Advertisements uploaded by providers

5.1.1 Python Unit

The Python unit is a core part of the SmartAd server. It performs the feature extraction task. It first fetches the raw content of a web page, preprocesses it and then performs feature extraction.

The preprocessing step eliminates extraneous content from the web page, such as images, tags, audio, video, flash etc.

After the preprocessing step we have pure text which can be processed using Python NLTK. The text processing tasks involved are tokenization, part-of-speech tagging and noun phrase chunking.

Advertising keywords are most probably nouns, so we mainly extract the phrases that involve nouns. After these features are extracted they are sent to the uClassify web service, which classifies the web page. Since there may be hundreds of publishers' web sites, doing this task sequentially on all web pages would take a long time. To improve performance we therefore perform this task using threads. This multi-threaded environment can significantly reduce the time taken.
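A minimal sketch of fetching many publisher pages concurrently is shown here. It uses ThreadPoolExecutor from the standard library, a thread-based pool that may differ from the threading code actually used in the project. The function names, the pool size of 8 and the 10-second timeout are assumptions, and the code targets Python 3.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download the raw HTML of one page."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=fetch_html, max_workers=8):
    """Fetch every URL concurrently; map each URL to its HTML or None on failure."""
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return None   # one unreachable site should not stop the others

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, pages))
```

Calling fetch_all(list_of_publisher_urls) returns a dictionary from URL to page source. Threads help here because most of the time is spent waiting on the network rather than computing, so the downloads overlap.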


5.1.2 Databases

In order to keep details of users we need a database. The database contains two relations, one for the advertisement providers and the other for publishers.

The provider relation keeps the following details:
Provider id
name
user name
password
category


Figure 7: The SmartAd System

The publisher relation keeps the following details:
Publisher id
name
user name
password
url of publisher web site
date of registration

5.1.3 Registering With SmartAd Server

In order to access the services of the SmartAd web server, users first need to register with it.

Publishers and providers access the home page of our web site via the URL

http://www.smartadproject2013.hostei.com/home.html

Figure 8: Home page of SmartAd Server

Users can log in if they are already registered, or register if they are new.

Figure 9: Provider Registration

Figure 10: Publisher Registration


5.1.4 Uploading ADs by provider

After the registration process, the provider can upload his ADs to the server by choosing an AD and specifying the category of the advertisement. The advertisements are placed in the AD directory, inside the directory of the matching category.

The provider needs to log in with the user name and password.

Figure 11: Login Page

After logging in, the provider can upload his ADs.

Figure 12: Uploading ADs

5.1.5 Cron Job

Cron jobs allow you to automate certain commands or scripts on your site. You can set a script to run at a specific time every day, week, etc. For example, you could set a cron job to delete temporary files every week so that your disk space is not used up by those files.

We use a cron job to run a PHP script which writes all the publisher URLs registered with us, and the names of all the advertisements grouped by category, to a file. This file is read by Python for web page content identification and for creating the context file, which is a file containing the URL, category and AD name.

5.2 Communicating with uClassify

In order to classify a web page, Python needs to communicate with the uClassify web service. It therefore requires an API (Application Program Interface) for the communication.

5.2.1 API

The uClassify web service provides two types of APIs:

URL API
XML API

URL API

Classifying a text with the URL API is easy: just use a ClassifyText operation together with the appropriate arguments. Please note that all arguments should be URL encoded. Its limitation is that only one text can be sent per request. It needs the following arguments:

readKey (string): a read API key that has access to the specified classifier. The key is free after signing up. Remember to URL encode it!
text (string): the text you want to classify. Remember to URL encode it!
removeHtml (boolean, optional): specifies whether you want the API to remove HTML tags before classification. Possible values are: 1, 0, true, false
output (string, optional): specifies the response format (default is xml), can be xml or json
version (string, optional): specifies the API version (default is 1.00), can be 1.00 or 1.01
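The URL-encoding requirement can be met with Python's standard urllib. The sketch below only builds the encoded query string from the arguments listed above. The function name build_classify_query and the sample key and text are hypothetical, and the service's base URL is not shown because the report does not give it. The code targets Python 3.

```python
from urllib.parse import urlencode


def build_classify_query(read_key, text, remove_html=True,
                         output="json", version="1.01"):
    """URL-encode the ClassifyText arguments into a query string."""
    params = [
        ("readKey", read_key),
        ("text", text),
        ("removeHtml", "1" if remove_html else "0"),
        ("output", output),
        ("version", version),
    ]
    return urlencode(params)


# Placeholder key and text; spaces and '&' in the text are encoded.
query = build_classify_query("MY_READ_KEY", "cheap flights & hotels")
print(query)
```

Encoding matters because characters such as & or = inside the text would otherwise be read as argument separators.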

XML API

The API is designed to handle multiple calls in each request. This means that you are able to batch multiple texts to one or many classifiers in the same call. This is extremely powerful: if you want to classify 300 blog posts, for example, you can send each in a textBase64 element, and in each classify call specify a text-to-classifier mapping. This is done by indexing texts from classify calls. All responses return a status element. The status element has a boolean attribute called success, and if this is true the request went through without any trouble. If it's false, something went wrong, and in this case the inner text of the status element will contain an error message.

With the XML API, multiple features can be sent in a single request, and the classification results of all features are received in a single response.

5.2.2 Processing Response

Python receives the response from uClassify in the form of a list, which can be processed further. The number of elements in this list is equal to the number of feature sets that we have sent to the uClassify web service. Each element in the list contains three items: first the feature itself, second the classifier accuracy and third the probability that the web page belongs to each class. We need to find the largest probability value and assign the corresponding class name as the class of the web page. After identifying the class, Python writes the URL, class name and AD name into the context file.
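Picking the class with the largest probability can be sketched as follows. The exact shape of each list element, a tuple of (text, score, list of (class name, probability) pairs), is an assumption based on the description above, and the sample values are invented for illustration.

```python
def best_class(result):
    """Return the class name with the largest probability for one result."""
    _text, _score, class_probabilities = result
    name, _probability = max(class_probabilities, key=lambda pair: pair[1])
    return name


def classify_pages(urls, results):
    """Pair each URL with the winning class of its result."""
    return [(url, best_class(result)) for url, result in zip(urls, results)]


# Hypothetical response for two pages.
results = [
    ("football match goal", 0.8, [("Sports", 0.91), ("Home Appliances", 0.09)]),
    ("oven fridge", 0.7, [("Sports", 0.12), ("Home Appliances", 0.88)]),
]
urls = ["http://example.com/a", "http://example.com/b"]
print(classify_pages(urls, results))
```

The resulting (URL, class) pairs are what the context file is built from once an advertisement file from the matching category has been chosen.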


5.3 Displaying Advertisements

The Google gadget displays advertisements on the publishers' web sites. There is only one Google gadget for all the publishers. Publishers need to add the Google gadget to their site; this is a one-time task, and there is no need to add the gadget again if the content of their web page changes. Our system automatically identifies the content of the web page and displays the most appropriate advertisement on it.

Our system periodically checks the content of the web pages and records the class of each web page in the context file, which is shared and used by all the Google gadgets on the publishers' web sites. Different ADs will be displayed on different web pages based on their content.

The URL of our Google gadget is

http://hosting.gmodules.com/ig/gadgets/file/103090780035728431429/smartad.xml

This file is hosted in a place where Google can render it. Publishers need to follow the steps given below to add the gadget to their site:

1. Go to their website
2. Go to editing mode
3. From the Insert menu, select more gadgets
4. Select add gadget by URL and insert the URL http://hosting.gmodules.com/ig/gadgets/file/103090780035728431429/smartad.xml
5. Adjust the size to 300x250 and add a scrollbar if needed
6. Click Ok and you have the gadget to display the advertisements!

5.3.1 Working of google gadget

The context file contains many tuples. The number of tuples in the context file is equal to the number of publishers registered with our server. Each tuple contains three entries:

url of web page
class name (category name)
advertisement file name

Assigning a URL to the window.location object in the Google gadget displays the advertisement by redirecting the browser to that URL.

Internally, the Google Gadget displays an AD through the following steps:

1. The Google gadget communicates with the context file via the makeRequest() method and reads the content of the context file.
2. It identifies the URL of the web page in which it resides.
3. It starts a loop and compares the URL of the web page with the URLs in the context file by scanning the text line by line.
4. If a match is found, the variable category is assigned the next entry in the context file.
5. The variable filename is assigned the second next entry in the context file.
6. After getting these two values it forms a URL, and window.location is assigned the URL
"http://www.smartadproject2013.hostei.com/ADs/"+category+"/"+filename


The above steps are executed by all the gadgets on the publishers' web sites to form a URL dynamically. The static part of the URL is enclosed in double quotes, and the variable parts are concatenated to it with the + operator to form the complete URL.
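The lookup that the gadget performs can be sketched in Python for clarity. The gadget itself is written in JavaScript, so this is an illustration of the logic only. The function name find_ad_url and the sample context text are hypothetical; the base URL is the one given above.

```python
AD_BASE_URL = "http://www.smartadproject2013.hostei.com/ADs/"


def find_ad_url(context_text, page_url):
    """Scan the context file line by line and build the AD URL for page_url."""
    lines = [line.strip() for line in context_text.splitlines()]
    for index, line in enumerate(lines):
        if line == page_url and index + 2 < len(lines):
            category = lines[index + 1]   # next entry
            filename = lines[index + 2]   # second next entry
            return AD_BASE_URL + category + "/" + filename
    return None   # page is not registered in the context file


# Hypothetical context file with two tuples of three lines each.
context = ("http://example.com/a\nSports\nbat.jpg\n"
           "http://example.com/b\nTravel\nflight.png\n")
print(find_ad_url(context, "http://example.com/b"))
```

Returning None for an unregistered page gives the caller a clear signal to show nothing or a default advertisement instead of redirecting to a broken URL.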

The figure below shows a Google Gadget that displays an advertisement on a publisher's web site based on its content.

Figure 13: Gadget Displaying an AD

6 FUTURE SCOPE AND APPLICATIONS

Business Scope
The system can be implemented at business level and can function as an online advertising system over the internet. More and more publishers and providers can register and use the services of the system.

Future enhancements
Future enhancements for the contextual advertising system include:
Implementing an attractive and dynamic pricing policy scheme for the customers
Extending the site to support publishers with sites other than Google Sites
Porting the content identification module from the local system to the web hosting service
Implementing security measures against other forms of web-based attacks

7 CONCLUSION

Contextual advertising has made a major impact on the earnings of many websites. Because the ADs are more targeted, they are more likely to be clicked, thus generating more revenue for the owner of the website and the server of the advertisement. Contextual advertising replaces the media planning component: instead of humans choosing placement options, computers facilitate the placement across thousands of web sites.

8 REFERENCES

[1] Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper.
[2] How to do everything with PHP and MySQL by Vikram Vaswani.
[3] Core Python Programming by Wesley J. Chun.
[4] www.pythoncourse.edu.
[5] www.developers.google.com/gadgets/.
[6] www.seoish.com/how-to-make-google-gadgets/.
[7] www.phpacademy.org.
[8] www.ibm.com/developerweb/tutorials/wa-google1/index.html.
[9] www.wso2.org/articles/2011/11/writing-google-gadgets-tutorialpart-01.