
Spell check solution

Contents
Problem Description
A. Three Approaches to this problem (First Step)
    1. Duplet generation and linear search with comparison (Theory, Practice, Efficiency)
    2. Duplet generation and binary search with comparison (Theory, Practice, Efficiency)
    3. Making a GET/POST request (Theory, Practice, Efficiency)
    4. Other possible approaches
        i. Hamming distance
        ii. Microsoft Common Speller Application Programming Interface (CSAPI)
B. How to make my problem use any desired method at the wave of the wand making my problem scalable and flexible
    i. Method 1: Basic Refactoring
    ii. Method 2: Using Factory Pattern
    iii. Method 3: Using Spring Framework
C. Final Step- Wrong Spelling Generator
    Theory
    Practice
    Output
    Main program with final step piped to first step


Karan Bhandari

Problem Description
Spell check Solution

A. Three Approaches to this problem (First Step)


1. Duplet generation and linear search with comparison
Theory
Before we plunge into the problem, let us define a duplet with respect to this problem. It is the name I use for a pair of adjacent characters. Any word can be exploded into a collection of duplets. For example:

Word, duplets
Marvel, {ma,ar,rv,ve,el}
Achieve, {ac,ch,hi,ie,ev,ve}

With duplets we can perform fuzzy string matching. The user input is divided into duplets, and the string we compare it against (from the dictionary) is also divided into duplets. For brevity we call the user input the 'left hand side' (LHS) and the dictionary string the 'right hand side' (RHS). To perform the spell check I set the strictness factor to 55%: if roughly 55 percent of the duplets of LHS match the duplets of RHS, we have an approximate, or fuzzy, equality. (Precisely, the code computes 2 × matching duplets / (LHS duplets + RHS duplets).) For example, if the user inputs 'marvel', LHS is {ma,ar,rv,ve,el}; the dictionary contains marvel, so RHS is {ma,ar,rv,ve,el}. That is a 100% match, which is marvellous: we are well above the 55% strictness factor.

Now if the user inputs marvol, LHS is {ma,ar,rv,vo,ol} and RHS for the dictionary entry marvel is {ma,ar,rv,ve,el}. 3 out of 5 duplets match, which gives 2 × 3 / (5 + 5) = 60%, still above the 55% strictness factor.

Here we can surmise that if marvol does not exist in the dictionary, then marvel is the closest match. One may object that some words are similar, like call and ball, and may reach similar strictness factors against conflicting or non-conflicting words. We crown the one with the maximum strictness factor as the new emperor.

I have copied a large list of English words (e.g. from /usr/share/dict/words on a Unix system) into a file called dictionaryFile.txt, placed in the directory the program is run from (the code opens it by a relative path). In the main function the program warns you if your input contains uppercase characters; it detects them with a regular expression.

Practice
Code
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class SpellCheckDupletFashion {

    // Minimum strictness factor (55%) for a dictionary word to count as a suggestion
    static double STRICT = 0.55;

    // Duplet generator
    // Marvel, {ma,ar,rv,ve,el}
    // Achieve, {ac,ch,hi,ie,ev,ve}
    static public List<char[]> duplet(String input) {
        ArrayList<char[]> duplet = new ArrayList<char[]>();
        for (int i = 0; i < input.length() - 1; i++) {
            char[] charArr = new char[2];
            charArr[0] = input.charAt(i);
            charArr[1] = input.charAt(i + 1);
            duplet.add(charArr);
        }
        return duplet;
    }

    // Function that detects approximate (fuzzy) equality:
    // 2 * (matching duplets) / (size of duplet1 + size of duplet2),
    // NaN when both words are shorter than two characters
    static public double strictnessFactor(List<char[]> duplet1, List<char[]> duplet2) {
        // Copy of duplet2; a matched duplet is removed so it cannot be matched twice
        List<char[]> slave = new ArrayList<char[]>(duplet2);
        int flag = 0;
        for (int i = duplet1.size(); --i >= 0;) {
            char[] duplet = duplet1.get(i);
            for (int j = slave.size(); --j >= 0;) {
                char[] toMatch = slave.get(j);
                if (duplet[0] == toMatch[0] && duplet[1] == toMatch[1]) {
                    slave.remove(j);
                    flag += 2; // one match counts once on each side
                    break;
                }
            }
        }
        return (double) flag / (duplet1.size() + duplet2.size());
    }

    // Java version of Read or Console.ReadLine or Scanf
    public static String getString() throws IOException {
        InputStreamReader isr = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(isr);
        return br.readLine();
    }

    public static String suggestMeTheRighteousOne(String userInput) {
        List<char[]> userDuplets = duplet(userInput);
        double maxStrictnessFactor = STRICT;
        // Empty string (not null) when nothing qualifies, so main() can call isEmpty() on it
        String latestStrictWord = "";
        // try-with-resources closes the dictionary file even on the early return below
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("dictionaryFile.txt")))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                List<char[]> dictionaryDuplets = duplet(strLine);
                double currStrictFactor = strictnessFactor(dictionaryDuplets, userDuplets);
                if (currStrictFactor == 1) {
                    return "Bravo, you have cracked the spelling bee contest, the word exists in the dictionary";
                } else if (currStrictFactor >= maxStrictnessFactor) {
                    // Best candidate so far; on a tie the later dictionary word wins
                    latestStrictWord = strLine;
                    maxStrictnessFactor = currStrictFactor;
                }
            }
        } catch (IOException e) {
            System.out.println("Error reading dictionary file - " + e.toString());
        }
        return latestStrictWord;
    }

    public static void main(String[] args) {
        String userInput = null;
        System.out.println("Enter word");
        try {
            userInput = getString();
        } catch (IOException e) {
            System.out.println("Error due to string insertion:" + e.toString());
        }
        if (userInput == null) {
            return; // nothing was read (I/O error or end of input)
        }
        String suggested = suggestMeTheRighteousOne(userInput);
        if (suggested == null || suggested.isEmpty())
            suggested = "NO SUGGESTION";
        System.out.println("Antidote:" + suggested);
        // Regular expression: any uppercase letter anywhere in the input
        if (userInput.matches(".*[A-Z].*"))
            System.out.println("Beware- your string contains uppercase");
    }
}

Output

Efficiency
Suppose duplet generation for the user input takes m time units, duplet generation for each dictionary word takes p time units on average, and one strictness-factor check takes h time units. With a dictionary of n words, and neglecting the time for file operations, the whole run takes approximately k(m + p*n + h*n) time units for some constant k. That is O(n), since we read the dictionary file once. The next approach improves on this slightly.
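The nested loop in strictnessFactor() makes h grow with the product of the two words' duplet counts. The following is a sketch, not the author's code, that computes the same 2 × matches / (total duplets) score with a HashMap of duplet counts, so each check is linear in the word lengths. DupletCounter and its methods are hypothetical names, and unlike the original it returns 0.0 rather than NaN when neither word has a duplet.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: same score as strictnessFactor(), matched through duplet counts
public class DupletCounter {

    // Counts every two-character duplet, e.g. "marvel" -> ma, ar, rv, ve, el
    static Map<String, Integer> dupletCounts(String word) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < word.length() - 1; i++) {
            counts.merge(word.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    // 2 * (matching duplets) / (duplets of a + duplets of b)
    static double strictnessFactor(String a, String b) {
        Map<String, Integer> countsA = dupletCounts(a);
        Map<String, Integer> countsB = dupletCounts(b);
        int sizeA = Math.max(a.length() - 1, 0);
        int sizeB = Math.max(b.length() - 1, 0);
        if (sizeA + sizeB == 0) {
            return 0.0; // both words are too short to have any duplet
        }
        int matches = 0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            // a duplet can match at most as many times as it occurs in both words
            matches += Math.min(e.getValue(), countsB.getOrDefault(e.getKey(), 0));
        }
        return 2.0 * matches / (sizeA + sizeB);
    }

    public static void main(String[] args) {
        System.out.println(strictnessFactor("marvol", "marvel")); // 0.6
        System.out.println(strictnessFactor("marvel", "marvel")); // 1.0
    }
}
```

For the marvol example it prints 0.6, the same 60% as the nested-loop version; taking the minimum count per duplet gives the same number of matches as removing each matched duplet from the copy.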


2. Duplet generation and binary search with comparison


Theory
The code above assumes that the dictionary is unsorted. If the dictionary were sorted, I could use a modified binary search over the dictionary words. There is a limitation: since we expect the user to type wrong characters sometimes, a search for the whole word may never land on an exact hit. So we binary-search on the first two characters instead, using the first two characters of the user's word as the key. Here we assume that the first two characters of the user's input are correct. Once the modified binary search reaches the part of the dictionary whose words share those first two characters, we test the strictness factor linearly over just those words. So we search for assume, or assuume (wrong spelling), amongst the following: assess, assessable, assessment, assume, assumed, assuming, assumption, ...

We ought to copy the file contents into main memory before we plunge into the search, for example into ArrayList<String> dictionaryWords = new ArrayList<String>(); an ArrayList gives constant-time get(middle), whereas on a LinkedList every get(middle) walks the list and defeats the binary search.

Practice

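// dictionaryWords must be sorted in String.compareTo order; key needs at least two characters
String prefix = key.substring(0, 2);
int first = 0;
int last = dictionaryWords.size() - 1;
while (first <= last) {
    int middle = (first + last) / 2;
    String word = dictionaryWords.get(middle);
    int cmp = prefix.compareTo(word.substring(0, Math.min(2, word.length())));
    if (cmp < 0) {
        last = middle - 1;
    } else if (cmp > 0) {
        first = middle + 1;
    } else {
        // Widen [lo, hi] to every word that starts with the same two characters
        int lo = middle, hi = middle;
        while (lo > 0 && dictionaryWords.get(lo - 1).startsWith(prefix)) lo--;
        while (hi < dictionaryWords.size() - 1 && dictionaryWords.get(hi + 1).startsWith(prefix)) hi++;
        // Here we do the linear strictness check between dictionaryWords.get(lo)
        // and dictionaryWords.get(hi), as done in the previous section
        break;
    }
}

Efficiency
For an exact-match lookup, binary search needs anywhere from O(1) comparisons (a hit on the first middle element) to O(log n). Here the search is a hybrid of binary and linear: locating the block of words with the right first two characters costs O(log n), and the strictness check then runs only over the k words in that block, so each lookup costs about O(log n + k) instead of O(n). Loading the dictionary into memory is a one-time O(n) cost, and when very many words share the same first two characters the gain is modest.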
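A runnable sketch of the whole hybrid lookup, including the linear strictness scan that the snippet above leaves as a comment. PrefixSearchSketch, strictness() and suggest() are hypothetical names; the list is assumed to be sorted in String.compareTo order, the key is assumed to have at least two characters, and strictness() re-implements the section 1 score on Strings instead of char[] duplets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: binary search on the first two characters, then a linear scan
public class PrefixSearchSketch {

    static final double STRICT = 0.55;

    // 2 * matching duplets / total duplets, as in strictnessFactor()
    static double strictness(String a, String b) {
        List<String> unmatched = new ArrayList<>();
        for (int i = 0; i < b.length() - 1; i++) unmatched.add(b.substring(i, i + 2));
        int total = Math.max(a.length() - 1, 0) + unmatched.size();
        int matches = 0;
        for (int i = 0; i < a.length() - 1; i++) {
            // remove(Object) drops one occurrence, so each duplet matches at most once
            if (unmatched.remove(a.substring(i, i + 2))) matches++;
        }
        return total == 0 ? 0.0 : 2.0 * matches / total;
    }

    static String prefixOf(String word) {
        return word.substring(0, Math.min(2, word.length()));
    }

    // Best word sharing the key's first two characters, or null if none scores >= STRICT
    static String suggest(List<String> sortedWords, String key) {
        String prefix = key.substring(0, 2);
        int first = 0, last = sortedWords.size() - 1;
        while (first <= last) {
            int middle = (first + last) >>> 1;
            int cmp = prefix.compareTo(prefixOf(sortedWords.get(middle)));
            if (cmp < 0) {
                last = middle - 1;
            } else if (cmp > 0) {
                first = middle + 1;
            } else {
                // widen to the whole block of words that start with the same two characters
                int lo = middle, hi = middle;
                while (lo > 0 && sortedWords.get(lo - 1).startsWith(prefix)) lo--;
                while (hi < sortedWords.size() - 1 && sortedWords.get(hi + 1).startsWith(prefix)) hi++;
                String best = null;
                double bestScore = STRICT;
                for (int i = lo; i <= hi; i++) {
                    double score = strictness(key, sortedWords.get(i));
                    if (score >= bestScore) {
                        best = sortedWords.get(i);
                        bestScore = score;
                    }
                }
                return best;
            }
        }
        return null; // no dictionary word shares the first two characters
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "assess", "assessable", "assessment",
                "assume", "assumed", "assuming", "assumption", "banana");
        System.out.println(suggest(words, "assuume")); // assume
    }
}
```

With the sample list, suggest(words, "assuume") returns assume with a score of 10/11 (about 0.91); a key whose first two letters match no word returns null.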


3. Making a GET/POST request


Theory
Open your web browser and type http://dictionary.reference.com/browse/people. You will see the pronunciation, meaning, examples, synonyms, noun and verb forms, etc. Now type http://dictionary.reference.com/browse/peopleee and you will see that it suggests "Did you mean: people". We can use the Unix tool curl (also available for Windows) to obtain the HTML version of the page. Typing

curl "http://dictionary.reference.com/browse/peoplee" | grep -o "Did you mean.*<\/a><\/span><span class=\"baud\">"

at the terminal is sufficient: it prints the suggested spelling.

Practice
Examples of spell check with the curl tool:

bhandari@linux-qty1:~> curl "http://dictionary.reference.com/browse/peoplee" | grep -o "Did you mean.*<\/a><\/span><span class=\"baud\">"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   455    0   455    0     0    928      0 --:--:-- --:--:-- --:--:--   936Did you mean</span><span class="bmat"><a class="" href="http://dictionary.reference.com/browse/people" onmousedown="return pk(this,{lk:'rtxtk5',en:'scpmean',io:'0',b:'dym',tp:'mid',m:'scpmean'})">people</a></span><span class="baud">
100 53564    0 53564    0     0   101k      0 --:--:-- --:--:-- --:--:--  102k
----------------------------------------------------------------
bhandari@linux-qty1:~> curl "http://dictionary.reference.com/browse/marvelousy" | grep -o "Did you mean.*<\/a><\/span><span class=\"baud\">"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   452    0   452    0     0    795      0 --:--:-- --:--:-- --:--:--   801Did you mean</span><span class="bmat"><a class="" href="http://dictionary.reference.com/browse/marvellous" onmousedown="return pk(this,{lk:'rtxtk5',en:'scpmean',io:'0',b:'dym',tp:'mid',m:'scpmean'})">marvellous</a></span><span class="baud">
100 65917    0 65917    0     0   107k      0 --:--:-- --:--:-- --:--:--  108k
------------------------------------
bhandari@linux-qty1:~> curl "http://dictionary.reference.com/browse/marvelous" | grep -o "Did you mean.*<\/a><\/span><span class=\"baud\">"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 79203    0 79203    0     0   125k      0 --:--:-- --:--:-- --:--:--  125k

For a correctly spelled word such as marvelous, grep finds no "Did you mean" block, so only curl's progress meter is printed.

How to execute Linux commands from Java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;

// To run any Linux command, call Linux.exeCmd(command) (for example with the curl
// command above) and take the substring holding the suggested "Did you mean" part.
public class Linux {
    public static ArrayList<String> exeCmd(String cmd) {
        ArrayList<String> arr = new ArrayList<String>();
        try {
            // Run the command through /bin/sh so quotes and pipes (| grep) work;
            // Runtime.exec(String) would split on spaces and not interpret them.
            // stderr is merged into stdout so neither pipe can fill up and block the process.
            Process p = new ProcessBuilder("/bin/sh", "-c", cmd).redirectErrorStream(true).start();
            BufferedReader stdInput = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String s;
            while ((s = stdInput.readLine()) != null)
                arr.add(s);
            p.waitFor();
        } catch (IOException e) {
            // return whatever was collected so far
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return arr;
    }
}

Or, a GET request in Java
Create an object of type HttpURLConnection and read its InputStream. Here endpoint is the URL string of a dictionary website that accepts GET requests:

HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
conn.setRequestMethod("GET");
conn.connect();

Efficiency
This method involves no disk operations: we do not access the hard drive, so we avoid disk costs such as seek time, rotational delay and DMA transfers. Instead we pay network delays: propagation delay, queueing delay, transmission delay and processing delay. We could improve network efficiency by adding an institutional (proxy) cache within the LAN, so that repeated lookups of the same word are served locally.
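A sketch of the GET approach in Java, combining the HttpURLConnection call with a regular expression that pulls the suggested word out of the same markup that grep matched in the curl transcripts. DidYouMeanClient, get() and extractSuggestion() are hypothetical names, and the HTML is assumed to look as it does in the transcripts; dictionary.reference.com may have moved or changed its markup since, so the live call in main() is illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client: fetch a page with GET and extract the "Did you mean" word
public class DidYouMeanClient {

    // Matches: Did you mean</span> ... >people</a></span><span class="baud">
    private static final Pattern SUGGESTION = Pattern.compile(
            "Did you mean.*?>([^<>]+)</a></span><span class=\"baud\">");

    // Returns the suggested word, or null if the page has no "Did you mean" block
    static String extractSuggestion(String html) {
        Matcher m = SUGGESTION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Plain HttpURLConnection GET; returns the response body as a string
    static String get(String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.connect();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String word = args.length > 0 ? args[0] : "peoplee";
        String html = get("http://dictionary.reference.com/browse/" + word);
        System.out.println("Did you mean: " + extractSuggestion(html));
    }
}
```

Keeping the parsing in its own method means it can be checked against a saved page, such as the transcript lines, without touching the network.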

4. Other possible approaches


i. Hamming distance
I could use the Hamming distance to count the substitutions needed between the user's word and a dictionary word, and suggest the word with the least Hamming distance. For example, the Hamming distance between marvel and marvol is 1, and between marvel and marikl it is 2. Hamming distance is defined only for strings of equal length, so on its own it cannot catch a missing or an extra character.
ii. Microsoft Common Speller Application Programming Interface (CSAPI)
We can tap into the Microsoft spell-check API used by Office (accessible from VB), expose its methods as web services, and have Java make SOAP calls to them.
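A minimal sketch of the Hamming-distance suggester described above. HammingSuggester is a hypothetical name; because Hamming distance needs equal lengths, suggest() only considers dictionary words exactly as long as the user's word.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: suggest the equal-length word with the fewest substitutions
public class HammingSuggester {

    // Number of positions at which two equal-length words differ
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance needs equal-length strings");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) distance++;
        }
        return distance;
    }

    // Closest equal-length word, or null if no word has the same length
    static String suggest(String userWord, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : dictionary) {
            if (word.length() != userWord.length()) continue;
            int d = hamming(userWord, word);
            if (d < bestDistance) {
                best = word;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(hamming("marvel", "marvol")); // 1
        System.out.println(hamming("marvel", "marikl")); // 2
        System.out.println(suggest("marvol", Arrays.asList("marble", "marvel", "people"))); // marvel
    }
}
```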

B. How to make my problem use any desired method at the wave of the wand making my problem scalable and flexible
i. Method 1: Basic Refactoring
   1. Extract suggestMeTheRighteousOne() into an interface.
   2. Use StringBuilder instead of String concatenation.
   3. Document the parameters and functions with comments that support Javadoc.
ii. Method 2: Using Factory Pattern
Suppose type 1 reads the dictionary from a file, type 2 from FTP, type 3 from a database, type 4 via an API call and type 5 via an HTTP request. We could use a switch-case or if-else block to select between them, or even have several derived classes implement the spell check and call it polymorphically.
iii. Method 3: Using Spring Framework
Here I try to achieve inversion of control and inject dependencies by using the Spring framework.


Suppose my manager asks me to read the dictionary from a file. I implement a read method that uses a file, an InputStreamReader and a BufferedReader, and deploy it to the client application. Now if my manager asks me to switch from a file to FTP or a datastore, I am stumped: with a user base the size of Twitch.tv's millions of users, I would need to change millions of clients, which is a bad idea. One remedy is to declare an interface and pass it to the read method, so that read calls the right service. But a partial dependency remains, because the client still chooses which implementation to pass. So instead I create an XML file, declare all the beans, constructor arguments, values and properties in it, and access them through Spring's ApplicationContext. The XML can then be changed on the server, while the client just calls interfaceInstance.read() and is oblivious of the method used to access the dictionary.
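A plain-Java sketch (no Spring) of the three refactoring ideas: an interface for the dictionary source (Method 1), a factory keyed on the source type (Method 2), and a client that receives the interface through its constructor, which is the wiring Spring's ApplicationContext would perform from the XML file (Method 3). All class and method names are hypothetical, and only the file source (type 1) is implemented; the other type codes throw.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; plain Java standing in for what Spring would wire up
public class RefactoringSketch {

    // Method 1: every dictionary source hides behind one interface
    interface DictionarySource {
        List<String> read();
    }

    // Reads one word per line from a file such as dictionaryFile.txt
    static class FileDictionary implements DictionarySource {
        private final String path;
        FileDictionary(String path) { this.path = path; }
        public List<String> read() {
            try {
                return Files.readAllLines(Paths.get(path));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // A fixed word list, handy for tests
    static class InMemoryDictionary implements DictionarySource {
        private final List<String> words;
        InMemoryDictionary(String... words) { this.words = Arrays.asList(words); }
        public List<String> read() { return new ArrayList<>(words); }
    }

    // Method 2: a factory that maps the type code to an implementation
    static DictionarySource create(int type) {
        switch (type) {
            case 1:
                return new FileDictionary("dictionaryFile.txt");
            // types 2 (FTP), 3 (database), 4 (API call), 5 (HTTP request) would go here
            default:
                throw new IllegalArgumentException("Unsupported dictionary source type: " + type);
        }
    }

    // Method 3 (the idea): the client is handed a DictionarySource and never
    // learns where the words come from; Spring would inject it from the XML file
    static class SpellChecker {
        private final DictionarySource source;
        SpellChecker(DictionarySource source) { this.source = source; }
        boolean isKnown(String word) { return source.read().contains(word); }
    }

    public static void main(String[] args) {
        // In the application this would be new SpellChecker(create(1))
        SpellChecker checker = new SpellChecker(new InMemoryDictionary("marvel", "people"));
        System.out.println(checker.isKnown("marvel")); // true
        System.out.println(checker.isKnown("marvol")); // false
    }
}
```

Because SpellChecker depends only on the interface, switching from a file to FTP or a database changes the object passed to its constructor, not the client code.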

C. Final Step- Wrong Spelling Generator


Theory
The wrong spelling generator can generate wrong spellings of the following types:
   1. Jagged cases - Upper and Lower
   2. Missing Characters
   3. Mismatched vowels

Practice
import java.util.ArrayList;

public class WrongSpellingGenerator {
    String wordUnderScrutiny;

    public WrongSpellingGenerator(String userInput) {
        wordUnderScrutiny = userInput;
    }

    // Missing characters: drop the character at position i-1, for every position
    ArrayList<String> missingCharacterCulprits() {
        ArrayList<String> returnList = new ArrayList<String>();
        for (int i = 1; i < wordUnderScrutiny.length() + 1; i++)
            returnList.add(wordUnderScrutiny.substring(0, i - 1)
                    + wordUnderScrutiny.substring(i, wordUnderScrutiny.length()));
        return returnList;
    }

    ArrayList<String> jaggedCases() {
        ArrayList<String> returnList = new ArrayList<String>();
        // Uppercase every occurrence of the character at position i
        // (String.replace changes all occurrences, hence cOnstitutiOnally)
        for (int i = 0; i < wordUnderScrutiny.length(); i++)
            returnList.add(wordUnderScrutiny.replace(String.valueOf(wordUnderScrutiny.charAt(i)),
                    String.valueOf(wordUnderScrutiny.charAt(i)).toUpperCase()));
        // Drop the character at position i-1 and uppercase everything after it
        for (int i = 1; i < wordUnderScrutiny.length() + 1; i++)
            returnList.add(wordUnderScrutiny.substring(0, i - 1)
                    + wordUnderScrutiny.substring(i, wordUnderScrutiny.length()).toUpperCase());
        // Uppercase the first i-1 characters, drop the character at i-1, keep the rest
        for (int i = 2; i < wordUnderScrutiny.length() + 1; i++)
            returnList.add(wordUnderScrutiny.substring(0, i - 1).toUpperCase()
                    + wordUnderScrutiny.substring(i, wordUnderScrutiny.length()));
        return returnList;
    }

    // Mismatched vowels: for each vowel in the word, replace all of its occurrences
    // with each vowel in turn (including itself, so the original word also appears)
    ArrayList<String> vowelsConvulator() {
        ArrayList<String> returnList = new ArrayList<String>();
        char[] vowels = {'a', 'e', 'i', 'o', 'u'};
        for (char ch : vowels) {
            if (wordUnderScrutiny.contains(String.valueOf(ch))) {
                for (char cha : vowels) {
                    returnList.add(wordUnderScrutiny.replace(ch, cha));
                }
            }
        }
        return returnList;
    }
}
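String.replace(char, char) in vowelsConvulator() swaps every occurrence of a vowel at once, which is why the output below contains words like constatutaonally. If single-typo misspellings are wanted instead, a variant can change one vowel position at a time. This is a sketch; SingleVowelSwap and singleVowelSwaps() are hypothetical names, and unlike the original it leaves the unchanged word out of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of vowelsConvulator(): one vowel position changed per word
public class SingleVowelSwap {

    private static final String VOWELS = "aeiou";

    static List<String> singleVowelSwaps(String word) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            char original = word.charAt(i);
            if (VOWELS.indexOf(original) < 0) continue; // only vowel positions are changed
            for (char replacement : VOWELS.toCharArray()) {
                if (replacement == original) continue; // skip the unchanged word
                result.add(word.substring(0, i) + replacement + word.substring(i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(singleVowelSwaps("cat")); // [cet, cit, cot, cut]
    }
}
```

A word with v vowel positions yields exactly 4v misspellings, each one substitution away from the original.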

Output

Enter word
constitutionally
Antidote:Bravo, you have cracked the spelling bee contest, the word exists in the dictionary
-----------Wrong Spelling Generator----------------
Missing char instances:[onstitutionally, cnstitutionally, costitutionally, contitutionally, consitutionally, consttutionally, constiutionally, constittionally, constituionally, constitutonally, constitutinally, constitutioally, constitutionlly, constitutionaly, constitutionaly, constitutionall]
Jagged cases instances[Constitutionally, cOnstitutiOnally, coNstitutioNally, conStitutionally, consTiTuTionally, constItutIonally, consTiTuTionally, constitUtionally, consTiTuTionally, constItutIonally, cOnstitutiOnally, coNstitutioNally, constitutionAlly, constitutionaLLy, constitutionaLLy, constitutionallY, ONSTITUTIONALLY, cNSTITUTIONALLY, coSTITUTIONALLY, conTITUTIONALLY, consITUTIONALLY, constTUTIONALLY, constiUTIONALLY, constitTIONALLY, constituIONALLY, constitutONALLY, constitutiNALLY, constitutioALLY, constitutionLLY, constitutionaLY, constitutionalY, constitutionall, Cnstitutionally, COstitutionally, CONtitutionally, CONSitutionally, CONSTtutionally, CONSTIutionally, CONSTITtionally, CONSTITUionally, CONSTITUTonally, CONSTITUTInally, CONSTITUTIOally, CONSTITUTIONlly, CONSTITUTIONAly, CONSTITUTIONALy, CONSTITUTIONALL]
Vowels convulators[constitutionally, constitutionelly, constitutionilly, constitutionolly, constitutionully, constatutaonally, constetuteonally, constitutionally, constotutoonally, constututuonally, canstitutianally, censtitutienally, cinstitutiinally, constitutionally, cunstitutiunally, constitationally, constitetionally, constititionally, constitotionally, constitutionally]

Main program with final step piped to first step


import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class SpellCheckDupletFashion {

    // Minimum strictness factor (55%) for a dictionary word to count as a suggestion
    static double STRICT = 0.55;

    // Duplet generator
    // Marvel, {ma,ar,rv,ve,el}
    // Achieve, {ac,ch,hi,ie,ev,ve}
    static public List<char[]> duplet(String input) {
        ArrayList<char[]> duplet = new ArrayList<char[]>();
        for (int i = 0; i < input.length() - 1; i++) {
            char[] charArr = new char[2];
            charArr[0] = input.charAt(i);
            charArr[1] = input.charAt(i + 1);
            duplet.add(charArr);
        }
        return duplet;
    }

    // Function that detects approximate (fuzzy) equality:
    // 2 * (matching duplets) / (size of duplet1 + size of duplet2),
    // NaN when both words are shorter than two characters
    static public double strictnessFactor(List<char[]> duplet1, List<char[]> duplet2) {
        // Copy of duplet2; a matched duplet is removed so it cannot be matched twice
        List<char[]> slave = new ArrayList<char[]>(duplet2);
        int flag = 0;
        for (int i = duplet1.size(); --i >= 0;) {
            char[] duplet = duplet1.get(i);
            for (int j = slave.size(); --j >= 0;) {
                char[] toMatch = slave.get(j);
                if (duplet[0] == toMatch[0] && duplet[1] == toMatch[1]) {
                    slave.remove(j);
                    flag += 2; // one match counts once on each side
                    break;
                }
            }
        }
        return (double) flag / (duplet1.size() + duplet2.size());
    }

    // Java version of Read or Console.ReadLine or Scanf
    public static String getString() throws IOException {
        InputStreamReader isr = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(isr);
        return br.readLine();
    }

    public static String suggestMeTheRighteousOne(String userInput) {
        List<char[]> userDuplets = duplet(userInput);
        double maxStrictnessFactor = STRICT;
        String latestStrictWord = "";
        // try-with-resources closes the dictionary file even on the early return below
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("dictionaryFile.txt")))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                List<char[]> dictionaryDuplets = duplet(strLine);
                double currStrictFactor = strictnessFactor(dictionaryDuplets, userDuplets);
                if (currStrictFactor == 1) {
                    return "Bravo, you have cracked the spelling bee contest, the word exists in the dictionary";
                } else if (currStrictFactor >= maxStrictnessFactor) {
                    // Best candidate so far; on a tie the later dictionary word wins
                    latestStrictWord = strLine;
                    maxStrictnessFactor = currStrictFactor;
                }
            }
        } catch (IOException e) {
            System.out.println("Error reading dictionary file - " + e.toString());
        }
        return latestStrictWord;
    }

    public static void main(String[] args) {
        String userInput = null;
        System.out.println("Enter word");
        try {
            userInput = getString();
        } catch (IOException e) {
            System.out.println("Error due to string insertion:" + e.toString());
        }
        if (userInput == null) {
            return; // nothing was read (I/O error or end of input)
        }
        String suggested = suggestMeTheRighteousOne(userInput);
        if (suggested == null || suggested.isEmpty())
            suggested = "NO SUGGESTION";
        System.out.println("Antidote:" + suggested);
        // Regular expression: any uppercase letter anywhere in the input
        if (userInput.matches(".*[A-Z].*"))
            System.out.println("Beware- your string contains uppercase");
        System.out.println("-----------Wrong Spelling Generator-----------------");
        // Final step: generate wrong spellings of the user's word
        WrongSpellingGenerator wrong = new WrongSpellingGenerator(userInput);
        System.out.println("Missing char instances:" + wrong.missingCharacterCulprits());
        System.out.println("Jagged cases instances" + wrong.jaggedCases());
        System.out.println("Vowels convulators" + wrong.vowelsConvulator());
    }
}
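The main program prints the generated wrong spellings but does not feed them back into the suggester. A sketch of that round trip: every single-character deletion of a word (the same words missingCharacterCulprits() produces) is passed to a duplet-based suggester, and the corrections are counted. RoundTripCheck and its methods are hypothetical, and the three-word in-memory list stands in for dictionaryFile.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical round trip: wrong spellings from the generator go back into the suggester
public class RoundTripCheck {

    static final double STRICT = 0.55;

    // Same score as strictnessFactor(): 2 * matching duplets / (duplets of a + duplets of b)
    static double strictness(String a, String b) {
        List<String> unmatched = new ArrayList<>();
        for (int i = 0; i < b.length() - 1; i++) unmatched.add(b.substring(i, i + 2));
        int total = Math.max(a.length() - 1, 0) + unmatched.size();
        int matches = 0;
        for (int i = 0; i < a.length() - 1; i++) {
            // remove(Object) drops one occurrence, so a duplet is matched at most once
            if (unmatched.remove(a.substring(i, i + 2))) matches++;
        }
        return total == 0 ? 0.0 : 2.0 * matches / total;
    }

    // Best-scoring word at or above STRICT, or "" if none qualifies
    static String suggest(String input, List<String> dictionary) {
        String best = "";
        double bestScore = STRICT;
        for (String word : dictionary) {
            double score = strictness(word, input);
            if (score >= bestScore) {
                best = word;
                bestScore = score;
            }
        }
        return best;
    }

    // Same words as missingCharacterCulprits(): drop one character at a time
    static List<String> deletions(String word) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            result.add(word.substring(0, i) + word.substring(i + 1));
        }
        return result;
    }

    // How many of the word's deletions are corrected back to the word itself
    static int recovered(String word, List<String> dictionary) {
        int hits = 0;
        for (String misspelt : deletions(word)) {
            if (suggest(misspelt, dictionary).equals(word)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("marvel", "people", "constitutionally");
        System.out.println(recovered("marvel", dictionary) + " of 6 deletions corrected");
    }
}
```

With that list, all six deletions of marvel (arvel, mrvel, mavel, marel, marvl, marve) score at least 6/9 against marvel and come back corrected, so main() prints "6 of 6 deletions corrected". Against the full dictionary the count may be lower, because more words compete for each misspelling.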
