Assignment 1
CS 4050
P.R.A.I Gunarathna.
070147T
Department of Computer Science and Engineering.
CFA Elizabethan Language Version: 1.0
Report Date: 07/11/2010
Akila Gunarathna
Revision History
Date Version Description Author
07/11/10 1 Analysis using Shakespeare's Sonnets 1- 60 Akila Gunarathna
Table of Contents
1. Introduction
1.1 References
2. Problem Statement
3. Methodology
3.1 Method of Data Collection
3.2 Method of calculation
3.3 Use of computer programming
3.4 Assumptions
4. Results
4.1 Percentages of character frequency
4.2 Character Frequency Histogram
5. Analysis
6. Conclusion
7. Appendices
7.1 Appendix 1 Counter Class (only the logic is represented here)
7.2 Appendix 2
7.3 Appendix 3
Report.
1. Introduction
Elizabethan language is the form of English used in the latter half of the 16th
century. It differs significantly from the modern English we use today in
vocabulary, spelling and grammar. I have included a few of these differences in Appendix 3.
Character frequency analysis is the study of how often each letter occurs in a language. It is
mainly useful in cryptanalysis. This report analyses the character frequencies of
Elizabethan language and compares them with the character frequencies of
modern English.
1.1 References
2. Problem Statement
This report investigates whether the character frequencies of the two stages of the
English language differ, and explains how to separate a sample of Modern English
from a sample of Elizabethan language. A text might be written in Elizabethan
language and then encrypted, so that an average person would not understand even
the original text. If the character frequency analysis of that text is similar to that
of Elizabethan language, the analyst can conclude that it was written in
Elizabethan language.
3. Methodology
3.1 Method of Data Collection
The data was taken directly from Shakespeare's sonnet collection (Sonnets 1-60).
Three sonnets were combined into a single sample, and each sample contained between
300 and 350 words on average. 20 samples were used for the analysis.
3.4 Assumptions
The following assumptions were made.
1. Only the letters of the alphabet were counted; other characters such as question marks
and commas were ignored.
2. Sonnets written by Shakespeare represent the Elizabethan language accurately.
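The first assumption can be illustrated with a minimal sketch. It is not the Counter class from Appendix 1; the class name LetterPercentages and its two methods are hypothetical names chosen for this illustration. It counts only the letters A to Z, ignoring case and skipping punctuation, and then converts the counts to percentages of all counted letters. The sample string is the first line of Sonnet 18, used only as a small demonstration input.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not the author's Counter class.
class LetterPercentages {

    /** Counts A-Z case-insensitively; every other character is skipped. */
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            }
        }
        return counts;
    }

    /** Converts raw counts to percentages of all counted letters. */
    static double[] toPercentages(int[] counts) {
        long total = 0;
        for (int n : counts) {
            total += n;
        }
        double[] pct = new double[counts.length];
        if (total == 0) {
            return pct; // no letters at all: every percentage stays 0
        }
        for (int i = 0; i < counts.length; i++) {
            pct[i] = 100.0 * counts[i] / total;
        }
        return pct;
    }

    public static void main(String[] args) {
        // Tiny demonstration input; a real sample would be three whole sonnets.
        String sample = "Shall I compare thee to a summer's day?";
        double[] pct = toPercentages(countLetters(sample));
        for (int i = 0; i < pct.length; i++) {
            System.out.printf(Locale.ROOT, "%c %.2f%%%n", (char) ('A' + i), pct[i]);
        }
    }
}
```

Dividing by the number of counted letters, rather than by the total text length, keeps the percentages comparable between samples with different amounts of punctuation and whitespace.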
4. Results
4.1 Percentages of character frequency
Character Frequency Analysis for Elizabethan Language
Table 1
[Histogram not recovered: percentage frequency (vertical axis, 0.00 to 8.00) for each character A to Z (horizontal axis)]
Figure 1
5. Analysis
Comparison between Elizabethan and Modern English.
Character | Elizabethan % | Modern English % | Difference (Elizabethan minus Modern)
A 6.66% 8.17% -1.51%
B 1.71% 1.49% 0.22%
C 1.84% 2.78% -0.94%
D 3.71% 4.25% -0.55%
E 12.83% 12.70% 0.13%
F 2.26% 2.23% 0.03%
G 1.87% 2.02% -0.15%
H 7.23% 6.09% 1.14%
I 6.24% 6.97% -0.72%
J 0.09% 0.15% -0.06%
K 0.66% 0.77% -0.11%
L 4.21% 4.03% 0.19%
M 2.82% 2.41% 0.42%
N 6.04% 6.75% -0.71%
O 7.81% 7.51% 0.30%
P 1.35% 1.93% -0.58%
Q 0.08% 0.10% -0.01%
R 5.71% 5.99% -0.28%
S 6.78% 6.33% 0.45%
T 10.00% 9.06% 0.94%
U 3.36% 2.76% 0.60%
V 1.26% 0.98% 0.28%
W 2.61% 2.36% 0.25%
X 0.08% 0.15% -0.07%
Y 2.77% 1.97% 0.80%
Z 0.03% 0.07% -0.04%
Table 2
A close look at the above results shows slight differences between the two stages
of the language. The most significant difference is for the letter A: Modern English
has a frequency of 8.17%, while Elizabethan language has 6.66%. So, given a sample,
we can decide the stage of the language by calculating the percentage frequency of
the letter A (provided the other frequencies are in line with the percentage table).
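One way to turn the letter A observation into a decision is sketched below. The report does not state a decision rule, so the nearest-reference rule used here is an assumption, and the class name StageByLetterA and its methods are hypothetical. The two reference values are the letter A percentages from Table 2.

```java
// Hypothetical sketch; the nearest-reference rule is an assumption, not the report's method.
class StageByLetterA {
    static final double ELIZABETHAN_A = 6.66; // letter A, Elizabethan column of Table 2
    static final double MODERN_A = 8.17;      // letter A, Modern English column of Table 2

    /** Percentage of the letter A among the letters A-Z in the text. */
    static double percentA(String text) {
        int letters = 0;
        int countA = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                letters++;
                if (c == 'A') {
                    countA++;
                }
            }
        }
        return letters == 0 ? 0.0 : 100.0 * countA / letters;
    }

    /** Assumed rule: pick whichever reference percentage is closer. */
    static String closerStage(double percentA) {
        double toElizabethan = Math.abs(percentA - ELIZABETHAN_A);
        double toModern = Math.abs(percentA - MODERN_A);
        return toElizabethan <= toModern ? "Elizabethan" : "Modern";
    }

    public static void main(String[] args) {
        System.out.println(closerStage(6.9)); // closer to 6.66
        System.out.println(closerStage(8.0)); // closer to 8.17
    }
}
```

A single letter is a weak signal for a short sample, which is why the report adds the condition that the other frequencies should also match the table.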
The second most significant difference occurs for the letter H, followed by C and T. These
four letter frequencies can be considered the main parameters for distinguishing between
the two stages. The other character frequencies do not differ significantly. (Here I
required a minimum difference of 0.9 percentage points to treat a character's frequency
as distinguishing.)
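The threshold test can be checked mechanically. The sketch below copies the two percentage columns of Table 2 into arrays and lists every letter whose absolute difference is at least the threshold. The class name SignificantLetters and the method lettersAbove are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch that applies the 0.9 threshold to the Table 2 values.
class SignificantLetters {
    // Percentages copied from Table 2, in order A..Z.
    static final double[] ELIZABETHAN = {
        6.66, 1.71, 1.84, 3.71, 12.83, 2.26, 1.87, 7.23, 6.24, 0.09, 0.66, 4.21, 2.82,
        6.04, 7.81, 1.35, 0.08, 5.71, 6.78, 10.00, 3.36, 1.26, 2.61, 0.08, 2.77, 0.03
    };
    static final double[] MODERN = {
        8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
        6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07
    };

    /** Letters whose absolute difference in percentage points is at least the threshold. */
    static List<Character> lettersAbove(double threshold) {
        List<Character> result = new ArrayList<>();
        for (int i = 0; i < ELIZABETHAN.length; i++) {
            double diff = Math.abs(ELIZABETHAN[i] - MODERN[i]);
            if (diff >= threshold) {
                result.add((char) ('A' + i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lettersAbove(0.9)); // prints [A, C, H, T]
    }
}
```

The letters come out in alphabetical order; ranked by the size of the difference they are A (1.51), H (1.14), then C and T (0.94 each). The next largest difference is Y at 0.80, which falls below the threshold.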
6. Conclusion
The character frequencies of Elizabethan language and modern English do not differ
significantly for most characters. However, there is a significant difference in the
frequencies of the letters A, H, C and T.
7. Appendices
7.1 Appendix 1 Counter Class (only the logic is represented here)
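package cfa;

/**
 * Counts how often each letter A-Z occurs in a text.
 *
 * @author akila
 */
public class Counter {
    // NOTE: the original method signature and counting loop are missing from
    // this copy; count(String) and the loop below are a reconstruction.
    public static int[] count(String text) {
        final int asciiA = 'A';
        int[] counters = new int[26];
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                counters[c - asciiA]++; // letters only; punctuation is skipped
            }
        }
        /* Debug output of the per-letter counts:
        for (int a = 0; a < counters.length; a++) {
            System.out.print((char)(asciiA + a));
            System.out.println(" : " + counters[a]);
        }*/
        return counters;
    }
}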
7.2 Appendix 2
Screenshots of the program for character counting.
Figure 2
7.3 Appendix 3
Differences between modern English and Elizabethan language.
Figure 3