String Tokens

String Handling
Supplementary Lecture
In C
C has no actual String type
Convention is to use null-terminated arrays of chars
char str[] = "abc"; is stored as 'a', 'b', 'c', '\0'
The string.h library provides functions to manipulate them, such as strdup(), strcmp(), strtok()
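The null-terminator convention can be checked directly. The following is a minimal sketch, not part of the original lecture; the helper names abc_storage_size, my_length and same_text are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes occupied by the array initialised from "abc". */
size_t abc_storage_size(void) {
    char str[] = "abc";        /* the compiler appends the '\0' terminator */
    return sizeof str;         /* counts 'a', 'b', 'c' and '\0' */
}

/* Hypothetical helper: walk to the terminator, as strlen() does. */
size_t my_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {     /* the terminator itself is not counted */
        n++;
    }
    return n;
}

/* Hypothetical helper: strcmp() returns 0 when the two strings are equal. */
int same_text(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Under these assumptions the array needs one more byte than the visible length, and a strcmp() result of 0 is what signals a match.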
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( int argc, char *argv[] ){                /* this is C code */
    char literal[] = "the cat sat on the mat";     // string literal
    char *str = strdup( literal );                 // malloc me a duplicate
    if( str == NULL ) return 1;                    // strdup can fail
    printf( "original string starts as: '%s'\n", str );

    char separators[] = " \t\n";
    char *token = strtok( str, separators );       // first call slurps up str
    int count = 0;
    while( token != NULL ){                        // NULL means no tokens are left
        printf( "token %d = '%s'\n", count++, token );
        if( strcmp( "cat", token ) == 0 ){         // counter-intuitive: 0 means equal
            printf( "(found the cat token)\n" );
        }
        token = strtok( NULL, separators );        // subsequent calls keep tokenizing
    }
    printf( "original string is now: '%s'\n", str );
    free( str );                                   // release the strdup'd copy
    return 0;
}
In C
% gcc mystring.c
% a.out
original string starts as: 'the cat sat on the mat'
token 0 = 'the'
token 1 = 'cat'
(found the cat token)
token 2 = 'sat'
token 3 = 'on'
token 4 = 'the'
token 5 = 'mat'
original string is now: 'the'
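Note that strtok() destroys the original string as it tokenizes: it overwrites the separator after each token with '\0', which is why str prints as just 'the' at the end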
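One way to avoid the damage noted above is to locate tokens without writing to the string at all. The following is a minimal sketch using the standard strspn() and strcspn() functions; the helper names next_token and count_tokens are hypothetical, not part of string.h or of the original lecture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next token at or after *cursor without
   modifying the string. Returns a pointer to the token's first char and
   stores its length in *len; returns NULL when no tokens remain.
   The token is NOT null-terminated, so callers must use *len. */
const char *next_token(const char **cursor, const char *separators, size_t *len) {
    assert(cursor != NULL && *cursor != NULL);
    assert(separators != NULL && len != NULL);

    /* skip any leading separators */
    const char *start = *cursor + strspn(*cursor, separators);
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, separators);   /* chars up to the next separator */
    *cursor = start + *len;              /* resume just after this token */
    return start;
}

/* Hypothetical helper: count the tokens in input, leaving it untouched. */
size_t count_tokens(const char *input, const char *separators) {
    const char *cursor = input;
    size_t len = 0;
    size_t count = 0;
    while (next_token(&cursor, separators, &len) != NULL) {
        count++;
    }
    return count;
}
```

Because the tokens are not null-terminated, a caller would print one with a length-limited format such as printf("%.*s\n", (int)len, tok). The trade-off is that callers carry a length around, but the input can be a const string and is still intact afterwards.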
In D
import std.stdio;
import std.string;

void main( char args[][] ){                        // D Code
    char[] str = "the wretched cat is still on the mat";
    writefln("original string at start is '%s'", str);
    foreach(i, s; tokenise(str," \t\n") ){          // i is the array position, s the token
        writefln(s);
        if( s == "cat" )
            writefln("(found the cat at position %d)", i);
    }
    writefln("original string at end is '%s'", str);   // uncorrupted string
}
// D Code for tokenise()
// returns an array of tokens where each one is an array of chars
char[][] tokenise(char[] input, char[] separators){
    char[][] results = null;
    int start = -1;                                // -1 means we are not inside a token
    foreach( int i, char ch; input ){              // grabs position as well as the char
        if( separators.find(ch) == -1 ){           // ch is not a separator
            if( start == -1 )
                start = i;                         // a new token starts here
        }
        else{
            if( start != -1 ){
                results ~= input[start..i];        // append the token (a slice) to the results
                start = -1;
            }
        }
    }
    if( start != -1 ){                             // input ended in the middle of a token
        results ~= input[start..$];                // $ is end
    }
    return results;
}
% gdc mystring.d
% a.out
original string at start is 'the wretched cat is still on the mat'
the
wretched
cat
(found the cat at position 2)
is
still
on
the
mat
original string at end is 'the wretched cat is still on the mat'
Original string is no longer destroyed
Note the neat extraction of the array position from the foreach
Unix compilers such as gcc and gdc usually produce an executable file called a.out unless you explicitly tell them otherwise
In Java
public class MyString{                              // Java Code
    public static void main( String args[] ){
        String str = "Who cares about the stupid cat anyway?";   // literal
        for( String token : str.split("\\s+") ){    // split takes a regular expression: \s+
            System.out.printf("%s\n", token );
            if( token.equals( "cat" ) )
                System.out.printf("(found the cat)\n");
        }
    }
}
% javac MyString.java
% java MyString
Who
cares
about
the
stupid
cat
(found the cat)
anyway?
split() is a member function of String objects in Java
split() takes a regular expression argument
\s is the regex for any whitespace character
+ in a regex means 1 or more times
Backslash is also the escape character in Java string literals, so it must be doubled: we pass "\\s+" to split() so that the regex engine sees \s+
split() returns an array of Strings, so the for (:) expression works
Regexes give us a general pattern matching language
Summary
C uses null-terminated char arrays for strings and relies on library function support
C++ has a string object (std::string) which is not directly compatible with C char arrays
D builds on the C char arrays but has a lot of built-in support
Java has a built-in String object and method support
Tokenising a string is a fairly simple operation
Regular expressions provide a general pattern matching syntax
