Student Name:
M. S. A. SHAHNAWAZ CHOWDHURY & S. M. ABU SALEH SHAWON
ROLL-0507014 ROLL-0507016
This submission is the result of our own work. Primary and secondary sources of
information and any contributions to the work by third parties, other than our tutors,
have been fully and properly attributed. Should this statement prove to be untrue, we
recognise the right and duty of the University to take appropriate action in keeping with
the regulations regarding candidates’ use of unfair means during assessment.
Contents
1. Acknowledgements
2. Introduction
3. Analysis
3.1 Corpus
3.1.1 Corpus requirement
3.1.2 Corpus creation
3.1.3 Type of corpus
3.1.4 Why corpus is needed
3.2 Corpus Analysis Tools
3.2.1 Analysis Tools
3.2.1.1 Design a parser
3.2.1.2 Save corpus into database
3.2.1.3 Search & Show any data
3.2.1.4 Representation of corpus as TREE
3.3 System requirements
4. Snapshots
5. Future plan
6. Conclusion
7. References
Chapter-1
Acknowledgements
With due homage and honor, we wish to express our gratitude to
Almighty Allah.
We express our special thanks to RUSHDI SHAMS Sir (Lecturer,
Department of Computer Science & Engineering), who gave us the idea for this
project.
We express our indebtedness, with reverential acknowledgement, to our
honorable supervisor RUSHDI SHAMS Sir (Lecturer, Department of Computer
Science & Engineering) for his friendly and excellent guidance.
We also express our gratitude to all our teachers, senior students and our
cordial batch mates for their invaluable support.
Chapter-2
Introduction
Corpus-based approaches to dialogue have become an increasingly
important part of dialogue agent design, providing a scope of the real issues that
need to be dealt with in order to engage in natural dialogue with humans, as well as
providing the basic data for statistical methods for language processing.
This paper describes the development of a multi-modal corpus based
on language interaction.
Our software is developed to analyze a corpus and represent it multimodally.
Here we use an XML file as the corpus and represent its contents as a tree.
Chapter-3
Analysis
3.1 Corpus:
A corpus is a large body of machine-readable texts. In linguistics and
lexicography, a corpus is a body of texts, utterances, or other specimens considered
more or less representative of a language, and usually stored as an electronic
database.
The main purpose of a corpus is to verify a hypothesis about language.
Corpora are ideal for functionally based analyses of language, but they have other
uses as well. Computer corpora may now store many millions of running words,
whose features can be analyzed by means of tagging and the use of concordance
programs.
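To illustrate what a concordance program does, here is a minimal keyword-in-context sketch in Java. It is not part of our software; the class name ConcordanceSketch, the method concordance and the window parameter are hypothetical names chosen for this example, and it assumes the corpus text is already available as a plain string.

```java
import java.util.ArrayList;
import java.util.List;

public class ConcordanceSketch {

    /**
     * Returns one keyword-in-context line per occurrence of the keyword,
     * with up to `window` words of context on each side.
     */
    static List<String> concordance(String text, String keyword, int window) {
        String[] words = text.trim().split("\\s+");
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < words.length; i++) {
            // Strip attached punctuation before comparing, ignoring case.
            String bare = words[i].replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (!bare.equalsIgnoreCase(keyword)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(words.length, i + window + 1);
            StringBuilder line = new StringBuilder();
            for (int j = from; j < to; j++) {
                if (j > from) {
                    line.append(' ');
                }
                // Mark the matched keyword with square brackets.
                line.append(j == i ? "[" + words[j] + "]" : words[j]);
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "A corpus is a body of texts. The corpus is stored electronically.";
        for (String line : concordance(text, "corpus", 2)) {
            System.out.println(line);
        }
    }
}
```

Each output line shows one occurrence of the keyword in brackets together with its neighbouring words, which is the basic view a concordance program gives of a corpus.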
3.1.1 Corpus requirement:
Corpus creation
    Import of existing data
    Support of state-of-the-art linguistic software
Corpus analysis
    Linguistically relevant queries
    Generation of sub-corpora
Corpus extension
    Simple corpus extension
    Revision mechanisms
Corpus dissemination
    Within a working group and extensibility
3.1.2 Corpus creation:
3.1.3 Type of Corpus:
XML
RDF
TOPIC MAP
CONCEPT MAP
3.2 Corpus Analysis Tools
3.2.1 Analysis Tools:
1. Design a parser.
2. Save corpus into database.
3. Search & Show any data.
4. Representation of corpus as a TREE.
3.2.1.1 Design a parser:
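A minimal sketch of the parsing and tree-representation steps, assuming the corpus is a small XML document read with the DOM parser that ships with the JDK. The class name CorpusTreeSketch, the methods parse and render, and the sample XML are hypothetical and only illustrate the idea; the sketch prints an indented text tree rather than a graphical one and ignores attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CorpusTreeSketch {

    /** Parses an XML corpus held in a string and returns its root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse corpus XML", e);
        }
    }

    /** Renders the element tree as indented text, one node per line. */
    static String render(Element root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(depth, out);
            out.append(node.getNodeName()).append('\n');
            // Recurse into child nodes one level deeper.
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                render(children.item(i), depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Show non-blank text content in quotes; skip whitespace-only text.
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                indent(depth, out);
                out.append('"').append(text).append("\"\n");
            }
        }
    }

    private static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        String xml = "<corpus><text><sentence>Hello world</sentence></text></corpus>";
        System.out.print(render(parse(xml)));
    }
}
```

DOM loads the whole document into memory, which keeps the tree walk simple for small corpora; for very large corpus files a streaming parser would be the more usual choice.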
3.2.1.2 Save corpus into database:
To save the corpus into the database, we first connect
Java to MySQL. We then use SQL queries to save the corpus into the database. After
the corpus is saved, a confirmation message is shown in the
window.
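The steps above can be sketched with plain JDBC as follows. This is only an illustration under stated assumptions: the table corpus_node, its columns path and content, the database name corpus_db and the login details are all hypothetical, and the confirmation is printed to the console instead of being shown in a window. Running it against a real database requires the MySQL JDBC driver (Connector/J) on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class CorpusStoreSketch {

    // Hypothetical table:
    //   CREATE TABLE corpus_node (path VARCHAR(255), content TEXT)
    static final String INSERT_SQL =
            "INSERT INTO corpus_node (path, content) VALUES (?, ?)";

    /** One corpus entry: the element path and its text content. */
    static final class Row {
        final String path;
        final String content;

        Row(String path, String content) {
            this.path = path;
            this.content = content;
        }
    }

    /** Inserts all rows in one batch and returns how many inserts were executed. */
    static int save(Connection conn, List<Row> rows) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            for (Row row : rows) {
                // Parameters are bound rather than concatenated into the SQL text.
                stmt.setString(1, row.path);
                stmt.setString(2, row.content);
                stmt.addBatch();
            }
            return stmt.executeBatch().length;
        }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<Row>();
        rows.add(new Row("corpus/text/sentence", "Hello world"));

        // Hypothetical connection details; adjust to the local MySQL setup.
        String url = "jdbc:mysql://localhost:3306/corpus_db";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            int saved = save(conn, rows);
            System.out.println("Corpus saved: " + saved + " row(s).");
        } catch (SQLException e) {
            System.out.println("Could not save corpus: " + e.getMessage());
        }
    }
}
```

Using a PreparedStatement keeps corpus text that contains quotes or other special characters from breaking the SQL query.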
System Requirements
Operating Systems:
Software Requirements:
NetBeans IDE 6.0
MySQL 5.0
JDK 1.6
Snapshots
Future Plan
Our future plan is to make the “Text to Knowledge Mapping Prototype”
software friendlier, easier and more comfortable for the user. We want its
representation to be not only theoretical but also attractive.
Conclusion
The interface of the software is user friendly. This software runs on the
Windows XP operating system, as most computer users are comfortable with
this operating system. Our main target was to develop software that can really
help people make decisions or understand the subject of the corpus. Had we been
provided with sufficient technical help and enough time, we could have developed this
software more effectively. We look forward to improving our software to
make it truly platform independent.