Visakhapatnam
Certificate
record
Bachelor of Technology
In
Guide Observer
Candidate’s Declaration
original work pursued under the guidance of Prof. David Wayne Clay and Prof.
We have not submitted the matter embodied in this project for the award of any other
degree.
S.S.V. Kaushik
P. Santosh Varma
Bh.S. Ramaraju
Place: Visakhapatnam
ACKNOWLEDGMENT
gratitude to our mini project guides, Prof. David Wayne Clay and Prof.
enthusiastic and involved interest from their side. This fueled our
enthusiasm even further and encouraged us to boldly step into what was a
ABSTRACT
Introduction
With the Internet growing day by day in both the number of users and the amount of content, the traditional stand-alone approach is nearing its end. There is a need for an approach that integrates the World Wide Web with stand-alone systems. Web Wise Files are one such approach.
Motivation
The idea of Web Wise Files comes from the very basis of a distributed operating system, in which changes made to files on one terminal should be reflected on every terminal that is part of that system. That is the whole
Problem Statement
himself/herself acquainted with the changes in the Stock Market, the Weather of that
day, the Score of an ongoing Cricket match, the latest technological advances and so
on. In addition to these he/she may listen to the latest audio or videos or he may wish
In order to do all this, the user needs to spend a lot of time surfing different sections of the World Wide Web for the corresponding information. An approach that does all these things quickly, in a more systematic and customizable way, and relieves users of the strain, would be a boon to all users. We
Approach
contents accordingly with that of the World Wide Web. Yet they exist physically on the very terminal the user works on. Web wise files are user-defined files comprising data (from the WWW) of the user’s choice in the format
TERMINOLOGY USED
The actual document displaying all the content blocks in user-defined format.
Web Wise Document Definition (WWDD)
The System which actually contains all the web wise documents and their
definitions.
INTRODUCTION TO WWDS
document with embedded web content. A web wise document will consist of a layout definition section and a set of content block definitions. The layout definition section indicates the placement of content blocks in the view document. These definitions
Content Block
Content block definitions indicate the method of locating the content and any details needed to access it. These definitions will be expressed using XML. Each
Content block types include the following:
1. Local text
4. Blog post
5. Web service
6. Twitter post
7. RSS Feed
FEATURES of a WWDS
user
− Retrieving data from a social networking site, e.g. Twitter
1) Create/Edit a WWDD.
WWD.
The system in a web wise document system, on the other hand, should be able to do the following:
MODULES INVOLVED
− Creator
− Viewer
Creator
− The Creator will be responsible for generating the XML file for the WWD
− The Creator provides a visual interface for the user to customize layout
− The Creator will express the layout and content definitions in an XML file.
− In addition, the Creator may provide options for the customization of the
Viewer
− The Viewer will be responsible for retrieving and showing the content to the user in the appropriate layout using the XML file (the one generated by the Creator).
− The Viewer retrieves data dynamically when the user opens the WWD.
Creator.
Content Block parameters may include one or more of the following to uniquely
1. Page ID or Section ID
document.
4. The relative position of the content section with respect to another fixed
remote text.
The user can define Custom Layout by defining the absolute positions and
REQUIREMENTS / SPECIFICATIONS
The requirements and specifications for this application include both software (SRS)
Operating Systems
Hardware Requirements
1. RAM: 256 MB
System Requirements
Functional Requirements
− These are the requirements given to the system during the Requirements Phase of
− The user should be able to choose from a variety of content sources available on the Internet.
− The user should be able to view all the content blocks simultaneously.
− The user should be able to modify the content blocks and their definitions at will.
− The system should minimize the overall time spent by the user in surfing and
− The user should be able to view the document in the desired layout.
− The system should dynamically obtain the contents of the document from the corresponding sources.
− The user should be able to use the GUI with minimum guidance. That is, the
− The GUI must provide an easy way to create, edit and delete contents of the document.
DESIGN
Though the application can be developed in either Java or .NET, we preferred .NET
Though .NET offers several options such as ASP and VB, we chose VB
As our project involves both desktop and web functionality, VB serves better.
Challenges Faced
During the initial stages of the design of the project, there were many challenges
sites of BBC and CNN and found it very difficult to find the required
for example, a web site where only one heading block appears and it
is contained in a specific table cell. When the user selects web content
source page?
there might be several blocks in the source page with the same
"Heading".
So,
That is, what comes into the content block and what doesn't?
HTML tag name and id or even the absolute position of the content block rectangle in
1. How many users will know about the actual coding details of the content
block?
2. And there can be many ways to uniquely identify the content block
3. Id, name and absolute position are only one way. So, trying to provide
But without knowing these details we cannot fetch the required content block in a
UML Diagrams
UML Diagrams are basic modeling diagrams used to determine the architecture of
We are concerned with the three major UML diagrams. They are
1. Use Case
2. Class
3. Sequence
The above scenario shows the interaction of the system with two actors:
Scenario1
i) User
The user interacts with the system to select any of the content types mentioned.
For each content type selected by the user, the corresponding changes are written to the XML file.
The following scenario shows another type of interaction of the user with the system. Here the user can perform a variety of functions to create, edit or view a WWDT. The system uses the WWDT to retrieve the required content blocks dynamically from the Internet.
Create a WWDT
Edit/Modify an existing WWDT
The above class diagram shows the interaction of the 9 classes involved in WWDS. The classes are categorized into two packages based on their functionality:
Windows Application1
ii) Form2 – This class contains functions for defining/modifying the title
iii) Form3 – This class contains functions that define, store/edit the
iv) Form4 - This class contains functions that define, store/edit the
properties of Web Service. That is, the information required
to
vi) Form6 - This class contains functions that define, store/edit the
vii) Form7 - This class contains functions that define, store/edit the
viii) Form8 - This class contains functions that define, store/edit the
Windows Application2
viewer/retriever.
layout.
Sequence Diagram
The above sequence diagram shows a typical sequence of steps followed by a user to create or edit a WWDT and later view it using the system.
The steps 3 to 14 are actually asynchronous in the sense that each of them can occur
For instance, a user may want to access 3 web services but only a single Twitter post. In such a case, the other steps won't be necessary, and even these 4 content blocks can be defined by the user in any order. That is, the order is not important for retrieving information. However, the order does matter if the user cares about how the content blocks are finally displayed, because the viewer/retriever displays the content blocks in the same order in which the user defined them.
Platform:
We used Microsoft's VB Professional Edition 2008 with the .NET 3.5 Framework
Platform as it is the latest and is well suited for web applications and also has more
For a WWD,
2. The user fills the details into the form to create, edit and delete the
So, input is taken and is stored in an XML file.
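The step of storing the user's input as XML can be sketched as follows. This is a minimal illustration written in Python rather than the project's VB.NET. The function names `build_wwd` and `to_xml_string` and the dictionary keys are hypothetical. The tag names (`wwd`, `layout`, `content`, `params`, `title`, `type`, `BlogURL`) are taken from the XML examples in this report.

```python
import xml.etree.ElementTree as ET


def build_wwd(layout, blocks):
    """Build a <wwd> element from a layout name and a list of block dicts.

    Each block dict is assumed (hypothetically) to carry the keys
    'title', 'type' and 'params' (a dict of parameter name -> value).
    """
    root = ET.Element("wwd")
    for block in blocks:
        content = ET.SubElement(root, "content")
        params = ET.SubElement(content, "params")
        for name, value in block["params"].items():
            ET.SubElement(params, name).text = value
        ET.SubElement(content, "title").text = block["title"]
        ET.SubElement(content, "type").text = block["type"]
    ET.SubElement(root, "layout").text = layout
    return root


def to_xml_string(root):
    """Serialize the element tree to a plain XML string."""
    return ET.tostring(root, encoding="unicode")


example = build_wwd(
    "Column",
    [{"title": "My Blog", "type": "Blog Post",
      "params": {"BlogURL": "http://www.gizmodo.com/"}}],
)
xml_text = to_xml_string(example)
```

Writing `xml_text` to disk would give a file that a viewer could read back later. The order of `content` elements follows the order in which the blocks are passed in.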
Steps taken to retrieve the output for the given specific input:
1. We used the XML file to retrieve data from the source (web, local text, etc.).
2. We used a SOAP-like technology to access the web services using built-in .NET libraries.
3. To retrieve data from the web, we used HTML and XML parsing, together with handling of nested HTML, to solve the problem of parsing HTML code.
Through this we also found a few useful ways to locate content embedded in nested HTML.
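One way to picture the retrieval steps is a lookup from content type to a retrieval routine, applied to the blocks in the order they were defined. The sketch below is in Python, not the project's VB.NET, and is only an illustration. The handler functions are placeholders that return a description instead of fetching anything, and the parameter name `Path` is an assumption (`BlogURL` appears in this report's XML example).

```python
def fetch_local_text(params):
    # Placeholder: a real handler would read the file named in params.
    return "local text from " + params.get("Path", "")


def fetch_blog_post(params):
    # Placeholder: a real handler would download the blog's RSS feed.
    return "blog post from " + params.get("BlogURL", "")


# Maps a content type name to the routine that retrieves it.
HANDLERS = {
    "Local text": fetch_local_text,
    "Blog Post": fetch_blog_post,
}


def retrieve_all(blocks):
    """Retrieve every block; blocks is a list of (title, type, params).

    Results keep the order of the input list, which matches the order in
    which the user defined the blocks.
    """
    results = []
    for title, block_type, params in blocks:
        handler = HANDLERS.get(block_type)
        if handler is None:
            content = "unsupported type: " + block_type
        else:
            content = handler(params)
        results.append((title, content))
    return results
```

Adding a new content type then only needs a new handler and one more entry in `HANDLERS`.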
XML is the language used to represent the WWDT. It typically represents the
following information:
The <wwd> tag represents the root of the document. It essentially comprises:
i) a <layout> tag
<layout> tag:
The <layout> tag represents the layout of the given document. It encloses the name of
For example,
<wwd>
….
<layout>Column</layout>
</wwd>
indicates that the layout of the document is a Column Layout. That is, all the content blocks are shown column by column. Similarly, Row Wise is used to represent the Row Layout.
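Reading the layout back with a DOM parser might look like the sketch below. It uses Python's `xml.dom.minidom` as a stand-in for the .NET DOM parser used in the project. The function names are hypothetical, and the mapping of "Column" and "Row Wise" to grid cells is an assumption about how the viewer arranges blocks, not something this report specifies.

```python
from xml.dom import minidom


def read_layout(xml_text):
    """Return the text inside the first <layout> tag, or None if absent."""
    dom = minidom.parseString(xml_text)
    nodes = dom.getElementsByTagName("layout")
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data.strip()


def grid_positions(layout, block_count):
    """Assign a (row, column) cell to each block, in definition order.

    Assumption: "Column" puts each block in its own column of a single
    row, and "Row Wise" puts each block in its own row of one column.
    """
    if layout == "Column":
        return [(0, i) for i in range(block_count)]
    if layout == "Row Wise":
        return [(i, 0) for i in range(block_count)]
    raise ValueError("unknown layout: %r" % (layout,))
```

For a document with `<layout>Column</layout>` and three content blocks, `grid_positions(read_layout(xml_text), 3)` yields one cell per block in a single row.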
<content> tag:
ii) <type> tag that contains the type of the content to be retrieved.
iii) <params> tag that contains information regarding parameters specific to the
content block.
For example,
<wwd>
<content>
<params>
<BlogURL>http://www.gizmodo.com/ </BlogURL>
</params>
<title>My Blog</title>
<type>Blog Post</type>
</content>
</wwd>
indicates that one of the content blocks to be retrieved is a Blog Post named "My Blog".
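The content example above can be read into simple records as sketched here. The code is Python rather than the project's VB.NET, and the names `ContentBlock` and `parse_content_blocks` are hypothetical. Only the tag names come from the example itself.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ContentBlock:
    title: str
    block_type: str
    params: dict = field(default_factory=dict)


def parse_content_blocks(xml_text):
    """Return one ContentBlock per <content> element, in document order."""
    root = ET.fromstring(xml_text)
    blocks = []
    for node in root.findall("content"):
        params = {}
        params_node = node.find("params")
        if params_node is not None:
            for child in params_node:
                # Strip stray whitespace such as the space before </BlogURL>.
                params[child.tag] = (child.text or "").strip()
        blocks.append(ContentBlock(
            title=(node.findtext("title") or "").strip(),
            block_type=(node.findtext("type") or "").strip(),
            params=params,
        ))
    return blocks


SAMPLE = """<wwd>
<content>
<params>
<BlogURL>http://www.gizmodo.com/ </BlogURL>
</params>
<title>My Blog</title>
<type>Blog Post</type>
</content>
</wwd>"""

blocks = parse_content_blocks(SAMPLE)
```

Here `blocks` holds a single record whose title is "My Blog", whose type is "Blog Post", and whose parameters contain the blog URL.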
There are 6 types of content included in the project. Their properties are:
In order to retrieve a portion of the text from a file, we require the following:
− Number of lines to be displayed.
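A local text block limited to a given number of lines could be read as in the following sketch. It is written in Python rather than the project's VB.NET. The function name `read_first_lines` is hypothetical, and the file path parameter is an assumption, since the list above only preserves the line count. The demo at the end works on a throwaway file.

```python
import os
import tempfile
from itertools import islice


def read_first_lines(path, line_count, encoding="utf-8"):
    """Return up to line_count lines from the start of the file.

    Trailing newline characters are removed; a file shorter than
    line_count simply yields all of its lines.
    """
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, line_count)]


# Demo on a temporary file that is deleted afterwards.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "notes.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("first\nsecond\nthird\n")
    preview = read_first_lines(demo_path, 2)
    everything = read_first_lines(demo_path, 10)
```

Using `islice` stops reading as soon as enough lines are collected, so a large file is not loaded in full.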
solve we have chosen three predefined categories. For each category we chose two websites exhibiting good web design standards. In each website we have pre-selected the portions to be retrieved. The following are the categories
Categories:
− Headlines
− Weather
− Sports
Websites:
We used the class and ids of the HTML tags to uniquely identify well
3. Blog post:
In order to retrieve a recent Blog Post article, we need the following information
Using this information, we retrieved the RSS feeds corresponding to the various articles in the particular blog. The first RSS feed obtained corresponds to the most recent article.
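Picking the first entry of a feed can be sketched as below, in Python rather than the project's VB.NET. The sketch assumes an RSS 2.0 document in which items are listed newest first, which is common but not guaranteed. The function name `latest_item` and the sample feed are made up for illustration, and downloading the feed is left out.

```python
import xml.etree.ElementTree as ET


def latest_item(rss_text):
    """Return (title, link) of the first <item> in the feed, or None."""
    root = ET.fromstring(rss_text)
    item = root.find("./channel/item")
    if item is None:
        return None
    return (item.findtext("title", default=""),
            item.findtext("link", default=""))


# Hypothetical feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>Newest post</title><link>http://example.com/2</link></item>
<item><title>Older post</title><link>http://example.com/1</link></item>
</channel></rss>"""

newest = latest_item(SAMPLE_FEED)
```

If a feed does not list items newest first, the items would need to be sorted by their publication date instead.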
Twitter post:
Even here we used the Twitter ID to obtain the RSS Feeds corresponding to the status
Web Services:
A large number of services are available on the Internet today. We have chosen a few important and frequently used web services for demonstration and provided easily
another)
location)
− Send SMS World (sends free SMS to any cell phone in India)
RSS Feeds:
The typical information required to get data from an RSS Feed includes:
The URL of the RSS Feed corresponds to the URL of the WSDL (Web Service
output expected for each function. Using the inbuilt functionality of recognizing
functions provided by a web service, given its WSDL, we have implemented GUIs to
required information from an HTML page. Though a few HTML parsers such as MSHTML are available, they provide only a partial solution. This is because HTML syntax is not strictly enforced, and hence page authors use a variety of non-standard methods while designing a web page. Often these methods involve tags that are highly unstructured and syntactically incomplete. Part of this non-standard nature of websites can be attributed to modern browsers, which accept and parse a number of syntax errors without any complaint. Thus, HTML parsing is a non-
Hence, our HTML parsing is done using our own parsing routines with the help of the MSHTML parser. But this type of page-specific parsing is very limited in its approach and becomes highly susceptible to errors the moment the web site's designers decide to change the standards used in the page.
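The idea of page-specific extraction by element id can be illustrated with the sketch below. It uses Python's standard `html.parser` instead of MSHTML, so it is a stand-in and not the project's routine. The class and function names are hypothetical, and the id `box315` is borrowed from the sample code in this report. The depth counter assumes that non-void tags inside the target element are properly closed, so it shares the fragility described above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def text_by_id(html_text, target_id):
    """Return the whitespace-normalized text of the element, or ''."""
    parser = TextById(target_id)
    parser.feed(html_text)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

As with the approach in the report, a change to the page's ids or nesting makes the lookup return nothing or the wrong block.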
document made it easy to write an XML parser. There are a number of XML parsers available on the Internet. One could also write one's own XML parser, given enough time. Typically there are two types of XML parsers:
i) a SAX Parser
We chose to use a DOM parser because of the ease and efficiency with which
We have used the Microsoft XML DOM parser provided by the .NET package as it
The interface is simple and easy to use. A layperson can understand the usage of
Solution Explorer in Visual Studio 2008
Selection of windows forms application in Visual Studio 2008
Sample Code
For instance, a small part of the code used for retrieving content provided as input to
System.Object, ByVal e As
System.Windows.Forms.WebBrowserDocumentCompletedEventArgs)
TextBox1.Text = e.Url.ToString()
Received Files\Modified\WindowsApplication1\WindowsApplication1\bin\Debug\sample.xml")
xdoc.SelectSingleNode("wwd/content/type[@webpage='" + e.Url.ToString() + "']")
disp(id).Stop()
RemoveHandler disp(id).DocumentCompleted, AddressOf Me.WebBrowser1_DocumentComplete
ele = doc.GetElementById("tickerHolder").NextSibling.NextSibling
ele = doc.GetElementById("advert_8").Parent.NextSibling.NextSibling
ele = doc.GetElementById("box315")
ele1 = doc.GetElementById("divtopstlatest")
ele = doc.GetElementById("a")
ele = doc.GetElementById("portlet_878")
doc.GetElementById("tickerHolder").NextSibling.NextSibling for http://news.bbc.co.uk/sport/
doc.GetElementById("advert_8").Parent.NextSibling.NextSibling for http://www.skysports.com/
Dim ele As HtmlElement = doc.GetElementById("box315") for http://www.ndtv.com/news/index.php
doc.GetElementById("divtopstlatest") for http://www.ndtv.com/news/index.php
http://www.espnstar.com/
End If
TextBox1.Text += ele.InnerHtml
".html")
+"</body></html>")
Else
End If
filew.Close()
disp(id).ScriptErrorsSuppressed = True
disp(id).Navigate("file:///C:/Documents%20and%20Settings/SantosH/My%20Documents/Google%20Talk%20Received%20Files/Modified/WindowsApplicatio
disp(id).Show()
End Sub
Screen shots
Basic Editor window form without any input provided
Editor window form to select the layout using drop down menu
Selecting column layout and clicking “New” button
Giving a title and specifying the type of content
On clicking “Properties” button in the content block form
Clicking OK in the properties form
On selecting a content block, “edit” and “delete” buttons will be highlighted
Clicking NEW for new content block and repeating the same procedure
Properties window form for web page content
Selecting any one of the predefined web sites for the category
After creation of the 2 content blocks
Clicking NEW for new content block on web services
Different Types of web services
Same Procedure repeated for Twitter Posts
Same Procedure repeated for RSS Feeds
Entering the URL for RSS Feeds
Editing a content block
Deleting a content block
The system parses the inputs and stores them in the XML file
Output Forms
Output for local text, headlines and RSS feeds
Output for weather (web service), Twitter and blog posts
Testing
Instead of the traditional late Testing, Testing is performed from the initial stages of
In the initial stages of coding, unit testing is performed. That is, each form designed is tested for robustness, consistency and scalability. Each bug is corrected
identify the new bugs that creep in when the units are integrated. These bugs are identified and rectified.
Majority of the paths in the Control Flow Graph are followed to identify bugs and
modifications in the code of the individual units involved, and integration testing is performed again to identify new bugs that may have crept in due to the modifications. This
A good amount of System Testing is also done to identify most frequent bugs
and corresponding corrections are made. The task of testing is eased to some extent
in case of features like web services which use their own exception handling
Boundary testing is also performed to identify the correctness of the code and
Future Scope
In future there can be many extensions to this application. Some of them include:
like
phenomenon.
regularly.
sections
structures.
Layouts.
Results
With this approach, the user can access a Web Wise File like any other file on his disk, except for the extra delay it takes to update itself. This approach would be faster and easier than manually surfing the Internet.
Conclusions
This concept of Web Wise Files will have a profound effect on the
cloud computing and other areas where it can be extended to devices other than
restoring the stand-alone feel people used to have in the earlier days when there was no Internet. The results mentioned above are applicable to users of almost all domains.