An Affix Removal Stemmer for Afaraf Text

3rd Phase Implementation Presentation

Prepared by: Wubie Abiye

March/2018
Consecutive Implementation of the Proposed Algorithm
• Stop word removal
• Tokenization
• Normalization
• Stemming
Implementation Sections
• Section A (completed in the 2nd phase implementation)
  ◦ Collecting and arranging the rules used to develop the algorithm
  ◦ Java library for PDF file extraction
  ◦ Writing code for the collected rules and experimenting with a collection of Afaraf words
  ◦ Collecting and preparing the stop words and punctuation marks that will be removed from the files

• Section B
  ◦ Remove stop words and punctuation (tokenize the text), and normalize
  ◦ Create the GUI
  ◦ Evaluate the final result
Proposed algorithm
1. Let x = the total number of words in the input text
// Preprocessing
Remove stop words
Tokenize words
Normalize words
// Stemming
2. For each of the x words, repeat steps 3 - 5
3. Check the word against the prefix rules
If a match is found, apply the rule // prefix matching
Else go to step 4
4. Check the word against the suffix rules
If a match is found, apply the rule // suffix matching
Else go to step 5
5. Display the stem of the word
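
The stemming loop (steps 2 - 5) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the presented implementation: the function names, the longest-match-first ordering, and the `min_stem_len` guard are all hypothetical, and the rule lists are passed in as parameters because the slides do not list the collected rules.

```python
from typing import Iterable, List, Sequence


def strip_affixes(word: str,
                  prefixes: Sequence[str],
                  suffixes: Sequence[str],
                  min_stem_len: int = 3) -> str:
    """Steps 3 and 4: remove at most one prefix, then at most one suffix."""
    # Step 3: prefix matching. Longer prefixes are tried first (assumption).
    for prefix in sorted(prefixes, key=len, reverse=True):
        if prefix and word.startswith(prefix) \
                and len(word) - len(prefix) >= min_stem_len:
            word = word[len(prefix):]
            break
    # Step 4: suffix matching, run whether or not a prefix was removed.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) \
                and len(word) - len(suffix) >= min_stem_len:
            word = word[:-len(suffix)]
            break
    return word


def stem_all(words: Iterable[str],
             prefixes: Sequence[str],
             suffixes: Sequence[str]) -> List[str]:
    """Step 2: repeat the affix checks for every preprocessed word."""
    return [strip_affixes(w, prefixes, suffixes) for w in words]


# Step 5: display the stems. The two suffixes are taken from the
# "Gexeyyo" / "Gexele" example; the prefix list is left empty.
print(stem_all(["gexeyyo", "gexele", "baaxo"], [], ["eyyo", "ele"]))
```

A word that matches no rule is returned unchanged, which corresponds to falling through to step 5.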
Collected stop words
Collected stop words (cont.)

Note: A total of 197 stop words were collected.


Tokenize and Normalize

2. Tokenization: the following punctuation marks, symbols, and digits are removed from the text:
. , ? / | \ @ * = ^ & ( ) + _ ; : " ' ! # $ % [ ] { } < > - 1 2 3 4 5 6 7 8 9 0

3. Normalization: convert every uppercase letter in the file to lowercase, for example Xaagu to xaagu, Baaxo to baaxo, and Dagge to dagge.
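
A minimal Python sketch of these two steps is shown below. The names `REMOVED_CHARS`, `tokenize`, and `normalize` are hypothetical. Replacing each listed character with a space before splitting on whitespace is an assumption, as is treating curly quotes the same as straight ones.

```python
from typing import List

# Characters listed on the slide, plus curly quote variants (assumption).
REMOVED_CHARS = ".,?/|\\@*=^&()+_;:\"'!#$%[]{}<>-1234567890" + "“”‘’"

# Map every listed character to a space so adjacent words stay separated.
_TABLE = str.maketrans({ch: " " for ch in REMOVED_CHARS})


def tokenize(text: str) -> List[str]:
    """Drop the listed punctuation, symbols, and digits, then split."""
    return text.translate(_TABLE).split()


def normalize(tokens: List[str]) -> List[str]:
    """Convert every token to lowercase."""
    return [token.lower() for token in tokens]


print(normalize(tokenize("Xaagu, Baaxo! Dagge 2018.")))
# ['xaagu', 'baaxo', 'dagge']
```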
The input file contains stop words, punctuation, uppercase letters, and unstemmed words.
GUI
GUI with example
Performance measure
Accuracy = [(Total words – Total errors) / Total words] × 100

• I ran the experiment on an Afaraf text file containing 1500 words.


• After stop word removal, 1350 words remained, so 150 stop words were removed.
Counting the results gives the following accuracy:
• 1280 words were stemmed correctly; 59 words were stemmed incorrectly due to over-stemming and 11 due to under-stemming.

• Accuracy = [(1350 – 70) / 1350] × 100 = (1280 / 1350) × 100 = 94.81%
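
The arithmetic above can be checked with a few lines of Python. The function name `stemming_accuracy` is hypothetical; the numbers are the ones reported on this slide.

```python
def stemming_accuracy(total_words: int,
                      over_stemmed: int,
                      under_stemmed: int) -> float:
    """Accuracy = [(total words - total errors) / total words] * 100."""
    total_errors = over_stemmed + under_stemmed
    return (total_words - total_errors) / total_words * 100


remaining_words = 1500 - 150  # words left after stop word removal
accuracy = stemming_accuracy(remaining_words, over_stemmed=59, under_stemmed=11)
print(f"{accuracy:.2f}%")  # 94.81%
```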


Example of the stemming process
Future tense:
• Gexeyyo (I will go)
• Gexele (she/he will go)
• Gexetto (you will go)
• Gexelon (they will go)
Past tense:
• Gexeh (he went)
• Gexxeh (she went)
• Gexeenih (they went)
Present continuous tense:
• Gexah (he is going)
• Gexxah (you are/she is going)
• Gexaanah (they are going)

→ Stem form: Gex (go)
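
As an illustration only, the ten forms above can be reduced to the same stem with a small suffix table. The suffix list is inferred solely from this example and is not the collected rule set. Collapsing a doubled final consonant (gexx to gex) is likewise an assumption made to cover Gexxeh and Gexxah, and the names `EXAMPLE_SUFFIXES` and `stem_example` are hypothetical.

```python
# Suffixes read off the example words above (hypothetical, not the full rules).
EXAMPLE_SUFFIXES = ["eyyo", "ele", "etto", "elon",   # future tense
                    "eh", "eenih",                   # past tense
                    "ah", "aanah"]                   # present continuous

VOWELS = "aeiou"


def stem_example(word: str) -> str:
    """Lowercase the word, strip the longest matching suffix, undouble."""
    stem = word.lower()
    for suffix in sorted(EXAMPLE_SUFFIXES, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) > len(suffix):
            stem = stem[:-len(suffix)]
            break
    # Assumption: a doubled final consonant left behind belongs to the suffix.
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
        stem = stem[:-1]
    return stem


forms = ["Gexeyyo", "Gexele", "Gexetto", "Gexelon",
         "Gexeh", "Gexxeh", "Gexeenih",
         "Gexah", "Gexxah", "Gexaanah"]
print({stem_example(form) for form in forms})  # {'gex'}
```

Trying longer suffixes first keeps Gexaanah from being cut at the shorter "ah".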


Working paper status
Survey paper status
