Chapter 1
INTRODUCTION
and the researcher Natallia Kokash stated in her study that the most important
computer science deals with these tasks. The complexity of tasks in general is examined
by studying the most relevant computational resources, such as execution time and space.
(Kokash, 2002)
Kokash added that classifying the problems that are solvable with a given
limited amount of time and space into well-defined classes is a very intricate task,
but it can greatly help to save time and money spent on the heuristic algorithms
connection with the quality issue, the goal of the heuristic algorithm is to find as
good a solution as possible for all instances of the problem. There are general
Kokash also said that computers are nowadays used to solve incredibly
complex problems. But in order to deal with a problem, the researcher should
develop an algorithm. Sometimes the human brain is not able to accomplish this
task. Moreover, exact algorithms might need centuries to cope with formidable
challenges. In such cases, heuristic algorithms that find approximate solutions but
In her study, heuristics, their areas of application, and their basic underlying
ideas are surveyed. The researcher also described in more
that since the beginning of civilization, human beings have focused on written
exchange there is a clear need for improved techniques to organize large quantities
of the community.
In this study, they surveyed recent research efforts that focused on the
document collection is indexed prior to any user query. A query is issued and a set
of documents that are deemed relevant to the query are ranked based on their
Numerous techniques exist to identify how these documents are ranked, and
that is the key focus of this study (effectiveness). Other techniques also exist to
Televisions, smartphones, and computers are just some products of our modern
overtaking of technology is fast approaching in our world. With the help of our
intelligent researchers and inventors, we could make the future possible, which is
branch that deals with engineering and applied sciences. Technology is now
Yet the beauty of technology has been misused by some of the
normal norms instead of using it in a more appropriate manner. This technology that
the premier institutions in the region which embraces the changes in technology.
The institute is already ISO 9001:2008 certified and is nationally accredited by the
Technology, being one of the colleges of SJIT, is embracing the thrust of the
institute as well. CBIT offers seven (7) courses, namely Bachelor of
Information Technology (CBIT) was established in 2013; since this department
grading system, etc.). However, like many colleges and institutions in the region,
CBIT still has many issues to tackle. One such issue that the researchers of
this study would like to address is the lack of
Lastly, the main problem that the researchers attempt to solve with this
study is to determine the most reliable and usable heuristic algorithm. A
heuristic algorithm is a technique designed for solving a problem more quickly when
classic methods are too slow, or for finding an approximate solution when classic
methods fail to find any exact solution. This is achieved by trading optimality,
The researchers of this study would like to address the issues and concerns
2. Among the heuristic algorithms that will be used, which algorithm performs
3. Among the heuristic algorithms that will be used, which has the most
optimal performance?
4. Among the heuristic algorithms that will be used, which is the most reliable?
The researchers of this study wish to address the abovementioned issues by:
Faculty. The system serves as a tool for our beloved educators who would
like to impart their knowledge of what they acquired in their previous experience
particular e-book which would help them to have sources for their topics that they
Students. The system serves as a tool for our competent learners who want
to search for a particular e-book, which could serve as a source for their
projects, reports, and other educational purposes. This system aims to develop
Library. The system caters to some of the library's book-searching features
which is located at their desired books. The difference of our system is to easily
researchers who want to study the algorithms that are related to our study
able to search for a specific e-book or PDF file in the static file directory on a Personal
process of accessing information from memory or other storage devices. It is now
one of the common elements of file searching, and this feature is already
integrated into many platforms, such as the Microsoft and Apple operating systems.
The system provides an algorithms panel in which each panel has a
dashboard that displays a graphical presentation of the evaluation in terms of
researchers for which, out of the algorithms developed, the researchers are aiming
for the best heuristic algorithm that deals with file searching in terms of speed,
(SJIT) will have a folder for the storage of e-books pertaining to the department's
specific courses.
algorithm to visualize which algorithm excels the most in terms of retrieval time or
speed, accuracy, and optimality. It will be developed in Visual Studio 2010 with
Chapter 2
This chapter presents the related literature and studies gathered through a thorough
and in-depth search by the researchers. It also presents the synthesis
of the theoretical and conceptual framework needed to fully understand the
research to be done and, lastly, the definition of terms for better comprehension
of the study.
This study considered some articles taken from the Internet as reliable sources.
Related Literature
Natallia Kokash (2002) studied the most important thing among the
examined by studying the most relevant computational resources, such as execution time
connection with the quality issue, the goal of the heuristic algorithm is to find as
manifold problems.
challenges. In such cases, heuristic algorithms that find approximate solutions but
have acceptable time and space complexity play an indispensable role. In her study,
which is all about heuristics, their areas of application and the basic underlying
and how an assortment of utilities are integrated into the query processing scheme
to improve these rankings. Methods for building and compressing text indexes,
notion that the more often terms are found in both the document and
language, the reality that the same concept can often be described
retrieval strategy. Most utilities add or remove terms from the initial
In this study, they surveyed recent research efforts that focused on the electronic
information in response to users' queries. That is, they discussed algorithms and
decide on which branch to follow. The term heuristic is used for algorithms that
find solutions among all possible ones but do not guarantee that the best will
algorithms. Occasionally these algorithms can be accurate, that is, they actually find
the best solution, but the algorithm is still called heuristic until this best solution is
solve a problem in a faster and more efficient fashion than traditional methods by
are most often employed when approximate solutions are sufficient and exact
Chapter 3
METHODOLOGY
The chapter presents the e-book file retrieval method of the algorithm flow
Algorithm Design
Three (3) algorithms for e-book file retrieval were designed and conceptualized
by the researchers, who conducted a test that would be applied to the system to
an effective method that can be expressed within a finite amount of space and
time and in a well-defined formal language for calculating a function. Starting from
an initial state and initial input (perhaps empty), the instructions describe
ending state. The transition from one state to the next is not
formalization of what would become the modern algorithm began with attempts to
Church's lambda calculus of 1936, Emil Post's "Formulation 1" of 1936, and Alan
finding all (or some) solutions to some computational problems, notably constraint
Algorithm 1. The method gets all the drives in the computer, or all
available drives that can be searched by the algorithm, and then gets the top-level
directories of each drive. It searches through the subfolders of the
uppermost directory and continues through to the last folder of the
search also the PDF files. If the algorithm finds a PDF file, it automatically
converts the file to a string and determines whether the user's keyword occurs
in that string.
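The traversal described for Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: the names `search_folder` and `read_text` are hypothetical, and because the text does not say how a PDF is converted to a string, the extractor is passed in as a parameter.

```python
import os
from typing import Callable, List


def search_folder(folder: str, keyword: str,
                  read_text: Callable[[str], str]) -> List[str]:
    """Check the PDFs in `folder`, then descend into each subfolder in turn."""
    matches: List[str] = []
    try:
        entries = sorted(os.scandir(folder), key=lambda e: e.name)
    except OSError:
        return matches  # unreadable folder: skip it

    subfolders = []
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            subfolders.append(entry.path)
        elif entry.name.lower().endswith(".pdf"):
            # Convert the file to a string (assumed extractor), then look
            # for the user's keyword, ignoring case.
            text = read_text(entry.path)
            if keyword.lower() in text.lower():
                matches.append(entry.path)

    # Go one level deeper only after the current folder has been checked.
    for sub in subfolders:
        matches.extend(search_folder(sub, keyword, read_text))
    return matches
```

To start from a drive, the function would be called once per drive root; on real PDF files, `read_text` would need to wrap a PDF text-extraction library.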
Figure 1 shows that Algorithm 1 works on a set of folders whose
levels of sub-folders indicate the algorithm flow. At the parent level of the
folder (D://), Algorithm 1 goes into the first folder (F1) and locates the
selected file to determine whether a PDF is available inside the parent folder.
If the algorithm finds that a PDF is available inside the parent folder, its
sub-folders are opened and become the level 1 folders, which are
F2 and F7. These folders are represented as the children of the parent folder F1.
Meanwhile, in the level 1 folder (F2), Algorithm 1 also goes in to
search for an available PDF, and if it finds one in the level 1 folder F2,
it automatically proceeds to the next-level
(level 2) folder. The process of getting the PDF files in folders (F3) and (F4)
remains the same until the last level of the folder (level 4). When the algorithm
reaches the last level of the folder (level 4, F4) and finds that no more
PDFs are available inside, it automatically goes into its sub-folder (F5), to
When the algorithm reaches the level 4 folder (F5) and finds that
no more PDFs are available inside the folder, it goes back to the
third level (F3). Since this folder has already been checked for PDFs,
the algorithm goes on to the next sub-folder (F6). The process of
Algorithm 2
The method is to collect all the selected directories inside the
drives and search all the PDF files in each particular directory, until the algorithm
has completely finished getting the PDF files. A directory is a file
system cataloging structure which contains references to other computer files, and
cabinet.
Algorithm 2
Figure 2 shows the flow of Algorithm 2. First,
Algorithm 2 goes into the parent folder (D://) to determine what is inside the
drive. After the parent folder, Algorithm 2 proceeds to the next stage, which is to
collect all the directories in the drive and search for all PDF files in the collected directories.
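The two stages described for Algorithm 2 (collect the directories first, then search the collected directories) might look like the following sketch. The function names `collect_directories` and `find_pdfs` are hypothetical, and the use of `os.walk` for the collection stage is an assumption; the text does not specify how directories are gathered.

```python
import os
from typing import List


def collect_directories(root: str) -> List[str]:
    """Stage 1: gather the root and every directory beneath it."""
    return [dirpath for dirpath, _dirnames, _filenames in os.walk(root)]


def find_pdfs(directories: List[str]) -> List[str]:
    """Stage 2: list the PDF files directly inside each collected directory."""
    pdfs: List[str] = []
    for directory in directories:
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # directory vanished or is unreadable: skip it
        for name in names:
            path = os.path.join(directory, name)
            if name.lower().endswith(".pdf") and os.path.isfile(path):
                pdfs.append(path)
    return pdfs
```

Separating the stages means the directory list can be inspected or filtered before any file is examined, at the cost of holding the whole list in memory.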
Algorithm 3
The method is to collect all the directories inside the drives and
search all the PDF files in the collected directories. In Algorithm 3, there are two kinds
Algorithm Flowchart
The first run in Figure 3 shows that Algorithm 3 works on a set of
folders whose levels of sub-folders indicate the algorithm flow. At the parent
level of the folder (D://), this algorithm follows the same process as Algorithm 1:
it goes into the folder (F1) to determine whether a PDF is available
inside. Unlike Algorithm 1, however, once Algorithm 3 finds that
a PDF is available inside the folder (F1), it does not proceed to the
next level of the folder; instead, the PDF collected by the algorithm is stored in its
level folder, which is the text file. This process is repeated until the
Algorithm Flowchart
Figure 5 shows that the algorithm first enters the drives and
collects the PDF files inside them. The PDF files are then stored in a notepad
or text file. When the algorithm has finished storing the files in the notepad, the
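The idea of storing the collected PDF files in a text file can be sketched as below. This is only an illustration: the names `build_index` and `load_index`, the one-path-per-line file layout, and the assumption that a later run reads the stored paths back instead of walking the drive again are not taken from the source.

```python
import os
from typing import List


def build_index(root: str, index_file: str) -> int:
    """First run: walk `root` and write each PDF path to `index_file`."""
    count = 0
    with open(index_file, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.lower().endswith(".pdf"):
                    out.write(os.path.join(dirpath, name) + "\n")
                    count += 1
    return count


def load_index(index_file: str) -> List[str]:
    """Later run: read the stored paths, skipping files that no longer exist."""
    with open(index_file, encoding="utf-8") as src:
        paths = [line.rstrip("\n") for line in src]
    return [p for p in paths if p and os.path.isfile(p)]
```

A stored list of paths makes later searches direct, but it goes stale when files are added, so the index would need to be rebuilt from time to time.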
Research Instrument
Internet:
It is the global system of interconnected computer networks that use
the World Wide Web (WWW), electronic mail, telephony, and peer-to-
the United States federal government in the 1960s to build robust, fault-tolerant
academic and military networks in the 1980s. The funding of the National Science
Foundation Network as a new backbone in the 1980s, as well as private funding for
marks the beginning of the transition to the modern Internet, and generated a
and mobile computers were connected to the network. Although the Internet was
widely used by academia since the 1980s, the commercialization incorporated its
Internet use grew rapidly in the West from the mid-1990s and from the late
1990s in the developing world. In the 20 years since 1995, Internet use has grown
100 times, measured for the period of one year, to over one third of the world
population.
television, paper mail and newspapers are being reshaped or redefined by the
Internet, giving birth to new services such as email, Internet telephony, Internet
book, and other print publishing are adapting to website technology, or are
The entertainment industry was initially the fastest growing segment on the
Internet. The Internet has enabled and accelerated new forms of personal
networking. Online shopping has grown exponentially both for major retailers
and small businesses and entrepreneurs, as it enables firms to extend their "bricks
and mortar" presence to serve a larger market or even sell goods and services
System Design
Table 1 shows the file data analysis: for each word found on a
given page of a selected PDF, it gives the average percentage of word occurrence
on that page. Execution time is reported in seconds (s) and
milliseconds (ms). The data were generated by the system from the PDF files that
were found, by processing the data gathered inside each PDF file.
Final Calculation
Table 2 shows the final calculation: the total average occurrence of the
word in the PDFs detected by the algorithm, as well as the page count of each PDF.
Lastly, it shows the total execution time of the process of searching the PDF files located on
the drives.
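The quantities reported in the two tables can be illustrated with a short sketch. The source does not give the formula for the occurrence percentage, so the definition used here (occurrences of the keyword divided by the number of words on the page, times 100) is an assumption, as are the function names. Punctuation handling is omitted for brevity.

```python
import time
from typing import List, Tuple


def page_percentage(page_text: str, keyword: str) -> float:
    """Share of the words on one page that equal `keyword`, as a percentage."""
    words = page_text.lower().split()
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


def summarize(pages: List[str], keyword: str) -> Tuple[float, int, float]:
    """Return (average percentage over pages, page count, elapsed milliseconds)."""
    start = time.perf_counter()
    percentages = [page_percentage(page, keyword) for page in pages]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    average = sum(percentages) / len(percentages) if percentages else 0.0
    return average, len(pages), elapsed_ms
```

Dividing the elapsed milliseconds by 1000 gives the time in seconds.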
Speed Chart
Figure 6 shows file retrieval speed on x and y axes: the
x-axis stands for the time equivalent, and the y-axis is the total number of words found
on a given page of the PDF. In addition, each algorithm has a corresponding color for
Accuracy Chart
determine whether the PDF file is the more reliable kind of file that the users need.
In addition, the chart shows each algorithm's color code, presented for the viewer to
String Searching
Filter out all unnecessary strings, such as double spaces and new lines.
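A minimal sketch of this filtering step is shown below. The function name `filter_text` is hypothetical, and treating every run of whitespace (spaces, tabs, line breaks) as a single space is an assumption about what counts as unnecessary.

```python
import re


def filter_text(text: str) -> str:
    """Collapse runs of spaces, tabs, and line breaks into single spaces."""
    return re.sub(r"\s+", " ", text).strip()
```

Normalizing the text this way lets a keyword that spans a line break in the PDF still be matched as one phrase.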
Chapter 4
This chapter presents the results and discussions of the data gathered.
raised in chapter 1.
important class of string algorithms that try to find a place where one or
several strings (also called patterns) are found within a larger string or text.
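As a hedged illustration of string searching, the sketch below reports every position at which each pattern occurs in a text. It uses Python's built-in `str.find` rather than any particular algorithm from the study, and the function names are hypothetical.

```python
from typing import Dict, List


def find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every match of `pattern`, overlaps included."""
    positions: List[int] = []
    if not pattern:
        return positions
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def find_patterns(text: str, patterns: List[str]) -> Dict[str, List[int]]:
    """Search one text for several patterns."""
    return {pattern: find_all(text, pattern) for pattern in patterns}
```

An empty list for a pattern means it does not occur in the text.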
File retrieval is defined as the matching of some stated user query against a
set of free-text records. These records could be any type of mainly unstructured
text, such as newspaper articles, real estate records or paragraphs in a manual. User
few words.
In the first test of the algorithms, the one that performed best in terms of file
retrieval and file searching was Algorithm 3, because the path of searching the
Problem 2: Among the heuristic algorithms that will be used, which algorithm
In terms of speed, the algorithm that performed best was Algorithm 3,
because Algorithm 3 has a direct path of stored directories that are going to help
Meanwhile, in terms of accuracy, the algorithm that performed best was
Algorithm 3, because the basis of its accuracy is the words found in its folder.
Problem 3: Among the heuristic algorithms that will be used, which has the
In terms of optimality, the algorithm that had the most optimal performance
result was Algorithm 3, because the resources used by Algorithm 3 were just
Problem 4: Among the heuristic algorithms that will be used, which is the most
reliable?
own path, which can be useful for transmitting a file by searching directly for the
requested PDF in the directories.