
International Journal of Computer Trends and Technology (IJCTT) - volume4Issue4 April 2013

Hypermedia Structure : Document Composition and Migration Path for Rich Set of Presentation
R. N. Jugele* and Dr. V. N. Chavan
* Department of Computer Science, Science College, Congress Nagar, Nagpur, Maharashtra
Head, Department of Computer Science, S. K. Porwal College, Kamptee, Dist: Nagpur, Maharashtra

Abstract - Original paper documents are commonly employed for archiving. Different hypertext structures are encountered in such documents, and different methods for analyzing document structure are presented. This structure is used to present the content of the document to the user. The hypermedia research community finds it necessary to establish a reference architecture for hypermedia systems in order to make progress on defining a protocol that enables third party applications to access link services. There is a need to extend the scope of these requirements. This paper presents an overall architecture for the integration of existing hypermedia systems in a distributed, collaborative model and provides a clear evolution path towards achieving this goal.

Keywords - Hypermedia, link, document, logical, geometric, protocol, object, virtual, runtime.

I. INTRODUCTION
A working group is establishing a protocol for hypermedia systems; the aim of this protocol is to enable applications to access hypermedia link service functionality in a consistent and standard manner. It has been observed that it is difficult to make progress on defining a Hypertext Protocol without first establishing a reference architecture for hypermedia systems. The Dexter Model[8] attempts to provide a standard hypermedia terminology coupled with a formal model of the common abstractions found within contemporary hypermedia systems. It presents a three-layer conceptual data model without suggesting an architecture for realizing the model. The Flag Taxonomy[16] describes the functionality and interaction of hypermedia systems in a manner that aids classification. To establish an inclusive reference architecture for hypermedia systems, the following areas need to be addressed:
Agreement upon a specification for location specifiers (LocSpecs)[6]: Reich[11] and Rutledge[15] propose solutions for addressing this issue of open location specifications.
A reference architecture for hypermedia systems: Grønbæk[7] proposes a synthesis architecture based around the conceptual layers of the Dexter Model and introduces three protocols for integrating with external entities.
A vision of a globally distributed and collaborative model with a clear evolution path toward this goal: the present model illustrates how hypermedia systems can be integrated in a manner which provides a powerful, distributed and collaborative architecture.

II. PAPER AS STRUCTURED DOCUMENT

Fig. 3. Paper document to Structured Hyperdocument

The document model defined forms the basis for algorithms that convert paper documents into structured hyperdocuments. These algorithms require several processing phases, addressing various aspects of the document structure and content[14]. Processing steps are distinguished based on the different representation levels, as shown in Fig. 3. The method described can easily be tailored for use in other applications.
A. Paper to image objects
The scanned pages are segmented using the Isodata thresholding technique[12], and the resulting binary images are analysed. To speed up processing, the original image can be reduced to other resolutions. These are all mapped to the common document reference

ISSN: 2231-2803 http://www.ijcttjournal.org


coordinate system; each resulting item is called an image object and carries its geometric features. For each image object a set of geometric features is defined, i.e. width, height and aspect ratio.
B. Image objects to basic geometric objects
A segment is a group of image objects that is classified into one of a set of geometric object classes. A decision tree method is used for this classification[18]. The class labels are {text, figure, horizontal line, vertical line, noise}. The features of a segment are the minimum, maximum, average or modal values of the features of the image objects in the group. The values $x_i^{min}$, $x_i^{max}$, $y_i^{min}$, $y_i^{max}$ define the bounding box of segment $i$. Three kinds of characteristics of segments are distinguished:
o Features of the individual segments
o Relations between pairs of segments
o Characteristics based on the whole set of segments
The individual characteristics used are the width, height and position of a segment on the page. This method of selection and action forms the basis for further document analysis to derive the basic geometric objects. The image objects are usually the smallest basic items in the image which can be given an interpretation in document terms, such as characters and parts of figures. They do not correspond to the basic components required in the geometric structure, which are single paragraphs and complete figures.
C. Geometric objects to geometric structure
For multi-column documents, the geometric structure is mostly concerned with the column structure. For two-column documents, segments are classified into {centered, left column, right column}. The column of a segment $s$ is computed by considering whether it is intersected by the middle line. If not, the column is obvious; otherwise the following rule is used:

$$
\mathrm{column}(s) =
\begin{cases}
\text{left\_column} & c(s) < -\varepsilon \\
\text{centered} & -\varepsilon \le c(s) \le \varepsilon \\
\text{right\_column} & c(s) > \varepsilon
\end{cases}
$$

where $\varepsilon$ is the parameter for deciding when an element is considered to be centered and $c$ is the centrality. This method is not well suited to centered segments in the document, as it depends on the alignment of the bounding boxes in the vertical direction.
D. Geometric to logical objects
The basic objects have a geometric label. There are one or two headers at the top of the page, page numbers are at the bottom of the page, and title pages have both a title and a footer, above and below the text body. The classification strategy is shown in the following table.

Predicate                                        New type
top most(text)                                   header
vertical overlap(text,header)                    header
bottom most(text)                                page number
in margin(figure,text)                           Caption
segment centered(text) ^ above middle(text)      Title
segment centered(text) ^ below middle(text)      Footer
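The column rule of Section C and part of the predicate table of Section D can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the `Segment` type, the definition of the centrality c(s) as a width-normalised offset from the middle line, the default threshold `eps = 0.05`, and the convention that y grows downward are all hypothetical choices. The caption row of the table is left out because it needs figure segments as well.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    kind: str     # geometric class label, e.g. "text" or "figure"
    x_min: float
    x_max: float
    y_min: float  # assumption: y grows downward from the top of the page
    y_max: float


def centrality(seg: Segment, page_width: float) -> float:
    # Assumed definition of c(s): signed, width-normalised offset of the
    # segment centre from the middle line (negative = left, positive = right).
    return ((seg.x_min + seg.x_max) / 2 - page_width / 2) / page_width


def column(seg: Segment, page_width: float, eps: float = 0.05) -> str:
    middle = page_width / 2
    # Segments not intersected by the middle line: the column is obvious.
    if seg.x_max < middle:
        return "left_column"
    if seg.x_min > middle:
        return "right_column"
    # Otherwise apply the three-way rule on the centrality c(s).
    c = centrality(seg, page_width)
    if c < -eps:
        return "left_column"
    if c > eps:
        return "right_column"
    return "centered"


def vertical_overlap(a: Segment, b: Segment) -> bool:
    # True when the vertical extents of the two bounding boxes intersect.
    return a.y_min < b.y_max and b.y_min < a.y_max


def logical_label(seg: Segment, text_segments: List[Segment],
                  page_width: float, page_height: float,
                  eps: float = 0.05) -> Optional[str]:
    """Apply the table rows in order; return None when no predicate holds."""
    top = min(text_segments, key=lambda s: s.y_min)
    bottom = max(text_segments, key=lambda s: s.y_max)
    if seg is top or vertical_overlap(seg, top):
        return "header"
    if seg is bottom:
        return "page number"
    if column(seg, page_width, eps) == "centered":
        above_middle = (seg.y_min + seg.y_max) / 2 < page_height / 2
        return "Title" if above_middle else "Footer"
    return None
```

The rules are evaluated in table order, so a centered page number at the bottom of the page is labelled as a page number rather than as a footer.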

Segment centered(text) is decided in the same way as deciding whether a column is centered. The algorithm is suited for the title page, the purely textual pages and the combined text/figure pages present.
E. Basic objects to content
To extract the content of figures in a hypertext context, the focus is on labels in the figure. Four kinds of labels are distinguished:
o plain alphanumeric labels: facsimile of their corresponding ASCII string
o alphanumeric template labels: text strings derived from a template where the variable part is a plain alphanumeric label and the fixed part is some visual shape
o icon labels: non-alphanumeric labels distinguished by their shape alone
o legend labels: icon labels with an associated textual definition
The content of figures is analyzed at full resolution to avoid losing important details. The resulting segments are sent to the figure analysis package. The raw text is then tokenized for use in further analysis.
F. Layout and content to logical structure
Logical segments have their own meaning and no direct relation to other objects. They can be found by starting with the layout information[18] and then applying rules that capture knowledge of layout conventions. The following regular expressions are used, where * means zero or more occurrences and + means at least one occurrence.
- chapter: <start-of-line><numeral><,> <word>+<end-of-line>
- section: <start-of-line><numeral><,><numeral> <word>+<end-of-line>
Text labels in a figure have the geometric classification text, but they can also have a logical classification indicating their meaning. As in the


logical labeling of basic objects, this is domain specific and requires knowledge about the content of the figures. Three logical classes can be distinguished:
o title: a figure can have a label of class title
o note: labels of class note provide contextual information about the figure
o name: labels of class name each name a part of an object in the figure
G. Logical structure to hypertext
The logical structure provides the hierarchical structure of the hyperdocument and the linear structures required for the reading order and for accessing the figures in the document[13]. Computing a standard index structure based on the labels in the figure is also trivial. An index of important keywords in the text can be found automatically based on the statistics of occurrence in the text[3]. The cross-group structure between the set of figures and the text can be found when there is some explicit way of referring to figures; these references can be found by searching for the patterns:
- "Note"<:> "Reference Figure" <numeral>
- "Note"<:> "Reference Figures" <<numeral>,>+ "and" <numeral>
Other common ways of referring to figures are "see figure <numeral>", "as shown in figure <numeral>", "(fig. <numeral> illustrates", "(fig. <numeral>)", etc. The values of the numerals are used to derive the links of the cross-group structure that relates the text to the set of figures. To find the cross-group structure for a specific figure and its scope in the text, the tokenized text of each label in the figure is searched for in the corresponding text. The labels have the following characteristics:
o The labels in the figure consist of multiple words
o The text in both the label and the associated text
o The text of the labels does not necessarily appear in the same order and with the exact words in the text
Finally, identifying whether a superscript is part of a textual part of the document or of a formula, as well as footnotes, also has to be incorporated in the classification of logical basic objects.
As no semantic linking is considered, there is no remaining cross-reference structure.
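The heading patterns of Section F and the figure-reference patterns of Section G translate fairly directly into ordinary regular expressions. The following Python sketch is illustrative only and rests on several assumptions: numerals are taken to be decimal digits, the separator written as <,> in the patterns is accepted as either a period or a comma, and the names `CHAPTER`, `SECTION`, `FIGURE_REF`, `headings` and `figure_references` are hypothetical.

```python
import re

NUMERAL = r"\d+"   # assumption: numerals are decimal digit strings
SEP = r"[.,]"      # assumption: the <,> separator may be "." or ","
WORDS = r"\w+(?:[ \t]+\w+)*"  # <word>+ on a single line

# chapter: <start-of-line><numeral><,> <word>+<end-of-line>
CHAPTER = re.compile(
    rf"^(?P<num>{NUMERAL}){SEP}[ \t]+(?P<title>{WORDS})$", re.MULTILINE)
# section: <start-of-line><numeral><,><numeral> <word>+<end-of-line>
SECTION = re.compile(
    rf"^(?P<num>{NUMERAL}{SEP}{NUMERAL})[ \t]+(?P<title>{WORDS})$", re.MULTILINE)

# Covers "Figure 3", "Figures 1, 2 and 5", "fig. 3" and "fig.4".
FIGURE_REF = re.compile(
    r"\bfig(?:ure|\.)?s?\s*(\d+(?:\s*(?:,|and)\s*\d+)*)", re.IGNORECASE)


def headings(text):
    """Return (kind, number, title) tuples for chapter and section lines."""
    found = [("chapter", m["num"], m["title"]) for m in CHAPTER.finditer(text)]
    found += [("section", m["num"], m["title"]) for m in SECTION.finditer(text)]
    return found


def figure_references(text):
    """Return the figure numbers referred to in the text, in reading order."""
    refs = []
    for m in FIGURE_REF.finditer(text):
        refs.extend(int(n) for n in re.findall(r"\d+", m.group(1)))
    return refs
```

Each numeral returned by `figure_references` would correspond to one link of the cross-group structure, from the position of the reference in the text to the figure with that number.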

III. COMPOSITIONS WITH VARIOUS ENTRY POINTS
The models MOAP, I-HTSPN and Madeus allow a composition as an end point of a relationship, but not a component inside a composition. Different entry points in a composition are desirable because they allow different presentations of the nodes that are recursively contained in the composition. NCM is an example of a model that allows such a facility, since a link can go into nested compositions as specified by the node list of the end point of the link. In Fig. 2 the presentation of composition C2 can be started through links l1 or l3, coming from other parts of the document. When C2 starts through link l1, nodes V1 (video), A1 (background audio) and A2 (voice node) must start at the same time. If C2 starts through link l3, nodes V1 and A2 must start at the same time, without the background audio. The presentation therefore depends on the external context, that is, on the navigation that led to the presentation of the composite node.

Fig. 2. Hypermedia document

IV. A PROTOCOL ALONE IS INSUFFICIENT
Most system designers have developed their own proprietary protocols for communicating with a link server, and adopting a new standard protocol would involve a major reimplementation of the system. Davis[2] suggested that the differences between system protocols could be resolved if each system produced a protocol shim which would reside between the application and the link server, as shown in figure 3. Anderson[1] offers a critique of the Hypertext Protocol and makes pragmatic recommendations for improving its syntax and semantics.

Fig. 3: Hypermedia Protocol architecture

The aim of the Hypermedia Protocol initiative is to enrich the user's environment by integrating third party applications with existing link services. It should not reduce the effectiveness of link services by


rendering the functionality of these associated tools inaccessible to the end user. To overcome this problem there is general agreement that some form of runtime on the user's machine is necessary; the resulting model is shown in figure 4. It uses a Java virtual machine[4] and develops a framework that allows additional tools and functionality to be dynamically downloaded to the user's machine. The protocol shim functionality will be incorporated within the runtime component.
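The role of a protocol shim can be illustrated with a small adapter that sits between an application speaking its own proprietary messages and a link server with a common interface. The Python sketch below is a toy under stated assumptions: the `LinkServer` interface, the `follow_link` operation, the `"FOLLOW <locspec>"` wire format and every class name are hypothetical and are not taken from the Hypermedia Protocol proposals discussed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Link:
    source: str       # hypothetical location specifier of the anchor
    destination: str  # hypothetical location specifier of the target


class LinkServer(Protocol):
    """Assumed common interface that every link server exposes."""

    def follow_link(self, loc_spec: str) -> List[Link]: ...


class InMemoryLinkServer:
    """Toy link server that indexes links by their source location."""

    def __init__(self, links: List[Link]) -> None:
        self._by_source: Dict[str, List[Link]] = {}
        for link in links:
            self._by_source.setdefault(link.source, []).append(link)

    def follow_link(self, loc_spec: str) -> List[Link]:
        return list(self._by_source.get(loc_spec, []))


class ProtocolShim:
    """Translates one application's proprietary messages into server calls."""

    def __init__(self, server: LinkServer) -> None:
        self._server = server

    def handle(self, message: str) -> str:
        # Assumed proprietary wire format: "<VERB> <argument>".
        verb, _, argument = message.partition(" ")
        if verb == "FOLLOW":
            links = self._server.follow_link(argument)
            return "LINKS " + ",".join(link.destination for link in links)
        return "ERROR unknown-verb"
```

In this arrangement only the shim knows the application's message format, so the application and the link server can each stay unchanged while a different shim is written for each proprietary protocol.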

Fig. 4: Introduced runtime component

A further requirement identified is multimedia document/object management that is open enough to allow developers to utilize third party products, as shown in figure 5. This is useful in providing direction for the Hypermedia Protocol initiative and can enhance both current and future developments in the field of hypermedia.

Fig. 5 Reference architecture

V. RUNTIME INVENTION
The runtime acts as a mediator between the viewers and the link server. The following are various approaches to providing a runtime component which offers a rich set of presentation, authoring, navigation and hypermedia link service tools.
Implementing new Runtime: Implementing the runtime and client-side hypermedia tools from scratch signifies a complete re-invention and involves an unreasonable amount of effort. It is platform dependent, but only one implementation per platform is needed. It provides a consistent user interface across the platforms.

Virtual Machine: This approach allows a minimal runtime component written in a byte-code interpreted language and is extremely versatile. The user can incorporate any custom written tools with the runtime to supplement those provided by the link server. It offers great flexibility and a zero-administration client, but each link server must assume that the runtime component has no local hypermedia tools of its own and should therefore offer to provide them. It demands a complete re-invention for each different link server and, as each link server supplies its own client-side hypermedia tools, the problem of interface inconsistency may occur. An additional penalty is also incurred each time a new tool is dynamically downloaded prior to usage.
Reusing Existing Hypermedia Systems as Runtimes: This strategy promotes the wholesale re-use of existing and familiar client-side hypermedia tools, and is sufficiently open to integrate and combine the previous approaches. It allows the developer and the user complete freedom over their choice of runtime, which could be their favorite hypermedia system. This approach is designed to accommodate the differences between systems and allow them to co-exist, and it allows a hypermedia system with its own set of proprietary viewers to utilize a third party remote link service. A full definition of the essential components and protocols is required to achieve this. Allowing a hypermedia system to act as a runtime component within the model means the hypermedia system can augment locally provided link services with those of a remote link service. If a runtime is represented by a hypermedia system with a link service, then there is no reason why the runtime cannot also act as a link service. If a link service is represented by a hypermedia system, then there is no reason why the link service cannot also act as a runtime. This blurs the distinction between the two entities, as a client runtime can masquerade as a link server and a link server can masquerade as a client runtime.
Due to this dual role, greater scope for configuration is possible.

VI. HYPERMEDIA REFERENCE ARCHITECTURE
This section describes the protocols required to connect the components and then presents a reference architecture for hypermedia, so that the role of the individual components within the architecture can be discussed. The protocols allow the developers of each component to choose which aspects of the reference architecture they wish to adopt, and each protocol defines a pattern of interaction. The following are the related protocols:


Viewer Protocol: It has an identical purpose to that of the Hypermedia Protocol, in that it will enable third party applications to communicate with the runtime component. The following issues need to be addressed:
o Ratification of the way in which viewers can determine the hypermedia and collaboration services available
o The adoption of a sufficiently open and versatile specification of location specifiers
Hypermedia Protocol: It provides an interface for communicating with a link server. The following issues need to be addressed:
o Ratification of the way in which link servers advertise the services they offer
o The adoption of a sufficiently versatile specification of location specifiers
o Provision of locking for hypermedia objects
Collaboration Service Protocol: The systems DHM[5], HyperDisco[19], SP3[9] and Sepia[17] provide support for collaboration among users. By incorporating an additional component, many of the common services necessary to support collaborative working practices can be provided. The following issues need to be addressed:
o Support for tight and loose modes of collaboration
o Interaction with the Document Management System to provide object locking
o Event notification subscription/unsubscription and delivery
o Interaction with the Link Service and the Document Management System to support versioning
Document Management Service Protocol: The Open Document Management API (ODMA)[10] defines a common interface to commercial document management systems and promotes interoperability. This standard also addresses issues like heterogeneity and unique, portable document identifiers. The ODMA standard makes no mention of support for the streaming of multimedia objects, whereas DHM[5], HyperDisco[19] and SP3[9] provide proprietary solutions for document management and version control.
The following issues need to be addressed:
o Globally unique and portable document naming scheme
o Add, remove and modify documents
o Document retrieval
o Support for versioning
o Document locking

VII. CONCLUSION
For the layout and logical analysis, one page of each class is used for optimizing the parameters that were not fixed beforehand, such as sh, ov, sw and oh, which are used in the grouping of segments and in defining columns. For optimizing the parameters used in the detection of text labels in the figure, the selected figure page is used. The figure contains both parentheses and dashes in the textual labels. The first structure is the hierarchical structure. For the cross-group structure between the set of figures and the text, all references to figures are found correctly. In the logical classification of the content of figures, identifying the titles and notes, no errors are made by the system. The model provides a reference architecture for the integration of differing hypermedia systems in a powerful, distributed and collaborative framework. Different alternative strategies for achieving this end are described. This allows users to continue to enjoy the rich functionality of the existing and familiar client-side hypermedia tools available within their chosen hypermedia system. Without prior agreement upon the clear roles of the architectural components, a unilateral attempt at defining any of the four protocols identified by the authors would be non-productive, and as such these remain undefined. If a reference architecture can help guide the way towards the global integration of hypermedia systems, then the research community can look forward to exploring emerging technologies and their potential for easing the nontrivial task of distributed information management.

REFERENCES
[1] Anderson, K. M., A Critique of the Open Hypermedia Protocol. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp1-4, April 1997. http://www.daimi.aau.dk/~kock/OHSHT97/Papers/anderson.html
[2] Davis, H. C., Lewis, A. J. and Rizk, A., OHP: A Draft Proposal for an Open Hypermedia Protocol. In Proceedings of the 2nd Workshop on Open Hypermedia Systems, Technical Report UCI-ICS-96-10. http://www.daimi.aau.dk/~kock/OHSHT96/Documents/ohp.html
[3] Salton, G., Another look at automatic text-retrieval systems. Communications of the ACM, 29(7), pp648-656, 1986.
[4] Gosling, J. and McGilton, H., The Java Language Environment: A White Paper, 1995. http://java.sun.com/whitePaper/java-whitepaper1.html
[5] Grønbæk, K. and Trigg, R. H., Design Issues for a Dexter-Based Hypermedia System. In Proceedings of the ACM Hypertext '92 Conference, Milano, Italy, pp191-200, November 1992.
[6] Grønbæk, K. and Trigg, R. H., Toward a Dexter-based Model for Open Hypermedia: Unifying Embedded References and Link Objects. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp149-160, March 1996.
[7] Grønbæk, K. and Wiil, U. K., Towards a Reference Architecture for Open Hypermedia. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp31-38, April 1997. http://www.daimi.aau.dk/~kock/OHSHT97/Papers/gronbak.html
[8] Halasz, F. G. and Schwartz, M., The Dexter Hypertext Reference Model. In Communications of the ACM, 37(2), pp30-39, February 1994.
[9] Leggett, J. J. and Schnase, J. L., Dexter With Open Eyes. In Communications of the ACM, 37(2), pp77-86, February 1994.
[10] ODMA, Association of Information and Image Management (AIIM). http://www.aiim.org/odma
[11] Reich, S., How OHP's LocSpecs Could Benefit From ISO/IEC 10744. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp54-59, April 1997. http://www.daimi.aau.dk/~kock/OHSHT97/Papers/reich.ps
[12] Duda, R. O. and Hart, P. E., Pattern Classification and Scene Analysis. Wiley, 1973.
[13] Jugele, R. N. and Chavan, V. N., ODA: Processing Model Design for Linking Document. International Journal of Engineering and Computer Science, ISSN: 2319-7242, Vol. 2, Issue 3, March 2013, pp806-810.
[14] Jugele, R. N. and Chavan, V. N., ODA: A Study of Document Design. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), ISSN: 2278-6856, Vol. 2, Issue 1, Jan-Feb 2013, pp194-198.
[15] Rutledge, L. and Hardman, L., Applying the HyTime Model to the Open Hypermedia Protocol. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp63-65, April 1997. http://www.daimi.aau.dk/~kock/OHSHT97/Papers/rutledge.html
[16] Østerbye, K. and Wiil, U. K., The Flag Taxonomy of Open Hypermedia Systems. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp129-139, March 1996.
[17] Streitz, N., Haake, J., Hannemann, J., Lemke, A., Schuler, W., Schütt, H. and Thüring, M., SEPIA: A Cooperative Hypermedia Authoring Environment. In Hypertext: Concepts, Systems and Applications, Proceedings of the Hypertext '90 Conference, INRIA, France, pp11-22, November 1990.
[18] Tsujimoto, S. and Asada, H., Major components of a complete text reading system. Proceedings of the IEEE, 80(7), pp1133-1149, 1992.
[19] Wiil, U. K. and Leggett, J. J., The HyperDisco Approach to Open Hypermedia Systems. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp140-148, March 1996.

Books :
01. Principles of Multimedia, by Ranjan Parekh, Tata McGraw Hill Companies.
02. Hypertext and Hypermedia, by J. Nielsen, Academic Press.
