
Strings and Regular Expressions in PHP

String Syntax
Single quotes: 'a string'
  No variable interpolation; \\ and \' are the only escape codes
Double quotes: "a $better string\n"
  Variables work, standard escape codes work
Here-doc syntax: $foo = <<<END ... END;
  Great for large multi-line blocks of text or HTML
  Variables are interpolated
  Gotchas: a newline must follow <<<END, and END; must be the entire line, with no whitespace
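A minimal sketch of the three string syntaxes side by side. The variable names and sample text are made up for illustration, and the here-doc follows the strict placement rules from the slide.

```php
<?php
$better = "double-quoted";

// Single quotes: no interpolation, and \n stays as two literal characters.
$single = 'a $better string\n';

// Double quotes: the variable is expanded and \n becomes a newline.
$double = "a $better string\n";

// Here-doc: a newline directly after <<<END, and the closing END;
// alone at the start of its own line.
$foo = <<<END
<p>A $better block
of HTML</p>
END;

echo $single, "\n", $double, $foo, "\n";
```

The newline just before the closing END; is not part of the resulting string.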


January 18, 2005

UPHPU - Mac Newbold

String Operators
Array-like character access:
  $str = "MyBigString"; => $str[2] == "B" (offsets start at 0)
Concatenation: the dot operator
  "This lets you join strings into " . "bigger ones"
  Note: Avoiding embedded newlines in strings that wrap onto multiple lines is a good idea
Concatenating assignment: .=
  $str = "My name is "; $str .= "Mac.\n";
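The operators on this slide can be tried out roughly like this. It is a sketch that uses square brackets for character access, which work in both old and current PHP versions; the variable names other than $str are illustrative.

```php
<?php
$str = "MyBigString";
$third = $str[2];            // offsets start at 0, so this is "B"

// The dot operator lets a long string wrap onto a second source line
// without embedding a newline in the string itself.
$joined = "This lets you join strings into "
        . "bigger ones";

$name = "My name is ";
$name .= "Mac.\n";           // shorthand for $name = $name . "Mac.\n"

echo $third, "\n", $joined, "\n", $name;
```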



Variables in Strings
"Simple string with a $var in it\n"
"You can use $an_array[$var] too\n"
"Sometimes you need ${curl}ies to mark where the {$var}iable ends"
"Curlies help on {$big['fancy'][$stuff]} too"
"Where it's confusing to embed " . $big['ugly'][$var] . "iables, break it up as needed with concatenation."
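A small sketch of these interpolation forms, using the {$var} brace style throughout. The array contents and variable values are invented for the example.

```php
<?php
$var = "key";
$an_array = ["key" => "value"];
$big = ["fancy" => ["key" => "nested"]];

$simple  = "Simple string with a $var in it";
$indexed = "You can use $an_array[$var] too";

// Without the braces PHP would look for a variable named $vars.
$braced = "Two {$var}s";

// Braces also allow quoted keys and several levels of indexing.
$nested = "Curlies help on {$big['fancy'][$var]} too";

// Concatenation is often the most readable option.
$concat = "Break it up: " . $big['fancy'][$var] . " value";

echo $simple, "\n", $indexed, "\n", $braced, "\n", $nested, "\n", $concat, "\n";
```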

Must-Have String Functions


www.php.net/strings
echo/print: (print $foo)==1, echo "can", $take, "more than one", "argument";
Echo shortcut: <b><?=$foo?></b>
trim, ltrim, rtrim/chop: remove whitespace
explode, implode/join
  $arr = explode(" ", "List of words"); $str = implode(",", $arr);
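As a quick illustration of the functions above, a sketch with made-up sample strings:

```php
<?php
// trim strips whitespace from both ends; ltrim and rtrim do one side each.
$clean = trim("  padded  ");

// explode splits on a delimiter, implode glues the pieces back together.
$arr = explode(" ", "List of words");
$str = implode(",", $arr);

// print behaves like a function that always returns 1.
$ret = print "";

// echo accepts several comma-separated arguments.
echo "can", " take", " more than one", " argument\n";
```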



Obligatory C-like Functions


All your old favorites are in there:
  printf, sprintf, sscanf, fprintf
  strcmp, strlen, strpos, strtok
They all do just what you expect, though many of them have easier alternatives
Gotcha: Some of them (like strpos and friends) return boolean false on failure, because 0 is a valid result. Always compare with === false.
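The strpos gotcha is easiest to see with a match at offset 0. A minimal sketch, with an illustrative haystack:

```php
<?php
$haystack = "foobar";

$pos     = strpos($haystack, "foo");  // int 0: found at the very start
$missing = strpos($haystack, "xyz");  // bool false: not found at all

// Loose comparison treats 0 like false, so a real match looks like a miss.
$loose  = ($pos == false);

// Strict comparison distinguishes int 0 from bool false.
$strict = ($pos === false);

var_dump($pos, $missing, $loose, $strict);
```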

Basic String Manipulation


str_replace("bar", "baz", "foobar"); str_repeat("1234567890", 8);
Any of this can be done with regular expressions as well
  And in more complex cases, can only be done with regular expressions
  But regular expressions are slower (more later)
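The two calls from this slide, written out with their quotes as a runnable sketch:

```php
<?php
// Argument order is search, replacement, subject.
$replaced = str_replace("bar", "baz", "foobar");

// Repeats the ten-character string eight times.
$repeated = str_repeat("1234567890", 8);

echo $replaced, "\n", $repeated, "\n";
```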

Formatting functions
strtolower, strtoupper
ucfirst, ucwords: uppercase first char, or first char of each word
wordwrap: wrap text to a given width
str_pad("tooshort", 15, " ");
vprintf, vfprintf, vsprintf: formatted output
number_format: add thousands grouping
money_format: format as currency
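A sketch of several of these formatting functions in use. The sample inputs are invented, and money_format is left out because it is not available in current PHP versions.

```php
<?php
$lower = strtolower("Hello World");
$upper = strtoupper("Hello World");
$first = ucfirst("hello world");   // only the first character
$words = ucwords("hello world");   // first character of each word

// Break the text so no line is longer than 10 characters.
$wrapped = wordwrap("The quick brown fox", 10, "\n");

// Pad on the right with spaces up to a total length of 15.
$padded = str_pad("tooshort", 15, " ");

// Two decimals, with "," as the thousands separator by default.
$number = number_format(1234567.891, 2);

// vsprintf takes its arguments as an array instead of a list.
$line = vsprintf("%s has %d items", ["cart", 3]);

echo $wrapped, "\n", $padded, "|\n", $number, "\n", $line, "\n";
```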

Special-Purpose Functions
One of PHP's strengths is the way it caters to the common things people need
Many string functions are specifically for use with things like dates/times, URLs, HTML, and SQL databases
Advice: When you need them, use them. Rolling your own doesn't usually work out the way you plan it.
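The slide does not name specific functions, so the three below (htmlspecialchars, urlencode, gmdate) are picked here only as examples of the special-purpose kind it describes:

```php
<?php
// HTML: escape characters that would otherwise be read as markup.
$html = htmlspecialchars('<b>Tom & Jerry</b>');

// URLs: encode a value so it is safe inside a query string.
$query = urlencode("a b&c");

// Dates: format a timestamp (built in UTC so the result is stable).
$date = gmdate("Y-m-d", gmmktime(0, 0, 0, 1, 18, 2005));

echo $html, "\n", $query, "\n", $date, "\n";
```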

Now for the fun stuff


Regular Expressions
  PCRE
  POSIX
Performance/Speed considerations
Grab bag of cool string functions


Regular Expressions
Extremely powerful tool for pattern matching: the same technique compilers and interpreters use to read your programs
Two flavors in PHP:
  PCRE: Perl-Compatible Regular Expressions
  POSIX Extended
PCRE advantages: used across multiple languages, more features, faster, and binary-safe

Basics of REs
They match patterns: the magic is in the pattern you tell them to match
They have to be precise, including and excluding exactly what you want
People get scared of them because the details can be tricky
But they're one of the best tools you have for doing some pretty fancy string stuff

RE Patterns

Start with strings and grouping: abc(def)
Add alternative branches: abc(def|123)
Wildcard: . matches any char but \n
Quantifiers/Repeating:
  * = 0 or more, + = 1 or more, ? = 0 or 1
  {n} = n times, {n,m} = n to m times
(abc)+(def|123)*(.{2})*
  At least one abc, maybe some triplets, then an even number of characters
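To try the pattern with preg_match, the sketch below wraps it in ^ and $ anchors. The anchors are an addition for this example, so that the whole subject has to fit the pattern; the test strings are made up.

```php
<?php
$pattern = '/^(abc)+(def|123)*(.{2})*$/';

// Two "abc", the triplets "def" and "123", then two trailing characters.
$a = preg_match($pattern, "abcabcdef123xy");

// One "abc" followed by a single character: an odd remainder cannot match.
$b = preg_match($pattern, "abcx");

// Does not start with "abc" at all.
$c = preg_match($pattern, "xabc");

// {n,m}: two or three "a" characters and nothing else.
$d = preg_match('/^a{2,3}$/', "aaa");
$e = preg_match('/^a{2,3}$/', "aaaa");

// The wildcard does not match a newline by default.
$f = preg_match('/^a.c$/', "a\nc");

var_dump($a, $b, $c, $d, $e, $f);
```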

Character Classes and Types


[] makes character classes
List of characters and ranges: [a-zA-Z0-9]
  If you want to use -, put it at the beginning
  Escape any special chars with \ as usual
  If first char is ^, class is negated
\d = [0-9], \D = [^0-9]
\s = whitespace, \S = non-whitespace
\w = [a-zA-Z0-9_], \W = [^a-zA-Z0-9_]
\b = word boundary, a zero-width assertion
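A short sketch of character classes and the shorthand types in use. The sample inputs, such as the phone number, are invented for illustration.

```php
<?php
// A literal "-" placed first in the class, followed by two ranges.
$slug    = preg_match('/^[-a-z0-9]+$/', "my-page-2");
$notSlug = preg_match('/^[-a-z0-9]+$/', "My Page");

// \D matches anything that is not a digit, so this keeps digits only.
$digits = preg_replace('/\D/', '', "(801) 555-1234");

// A negated class: remove everything except lowercase letters.
$letters = preg_replace('/[^a-z]/', '', "a1b2c3");

// \b marks word boundaries without consuming any characters.
$count = preg_match_all('/\b\w+\b/', "Hello, big world!", $m);

print_r($m[0]);
```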

Anchors
What if you want to force it to match only at the beginning of the string? Or to match the entire string?
Use an anchor!
  ^ as the first char anchors the beginning
  $ as the last char anchors the end
  (Varies slightly in multi-line mode)
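The effect of the anchors, including the multi-line variation, can be sketched like this with illustrative subjects:

```php
<?php
$start    = preg_match('/^foo/', "foobar");   // "foo" is at the beginning
$notStart = preg_match('/^bar/', "foobar");   // "bar" is not
$end      = preg_match('/bar$/', "foobar");   // "bar" is at the end
$whole    = preg_match('/^foo$/', "foobar");  // the entire string must be "foo"

// In multi-line mode ^ and $ also match at line breaks inside the subject.
$lines  = "foo\nbar\nbaz";
$multi  = preg_match('/^bar$/m', $lines);
$single = preg_match('/^bar$/', $lines);

var_dump($start, $notStart, $end, $whole, $multi, $single);
```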

Greediness and Modifiers


Regular Expressions are Greedy
  They'll keep eating characters as long as they can keep matching.
  Consider: <.*> vs. <[^>]*> when matching against <b>Hi</b>
PCRE has modifiers: /<pattern>/<mods>
  /i = case insensitive
  /U = un-greedy
  /m = multi-line
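The greedy comparison from this slide can be checked directly; the following is a sketch, with the /i case added as an extra illustration:

```php
<?php
$subject = "<b>Hi</b>";

// Greedy: .* runs to the last ">" in the subject.
preg_match('/<.*>/', $subject, $greedy);

// The negated class cannot cross a ">", so the match stops at the first tag.
preg_match('/<[^>]*>/', $subject, $class);

// /U flips the quantifiers to un-greedy, giving the same short match.
preg_match('/<.*>/U', $subject, $ungreedy);

// /i ignores case.
$ci = preg_match('/hi/i', $subject);

echo $greedy[0], "\n", $class[0], "\n", $ungreedy[0], "\n";
```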


Back References
Most commonly used in replace operations, but can be used in match patterns as well
Parentheses not only group, but capture too
Use \ followed by the number of the capture
ab(.)\1(.)\2 will match abccdd or abxxyy, but not abcccd or abdcdc
Can get tricky to count which backref goes where with nested parentheses
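A sketch that runs the back-reference pattern against the four strings named above, plus one replace operation. The word-swapping example is an illustrative addition.

```php
<?php
// Single quotes keep \1 and \2 intact for the regex engine.
$pattern = '/ab(.)\1(.)\2/';

$results = [];
foreach (["abccdd", "abxxyy", "abcccd", "abdcdc"] as $s) {
    $results[$s] = preg_match($pattern, $s);
}

// In a replacement string, captures are referenced as $1, $2, and so on.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', "hello world");

print_r($results);
echo $swapped, "\n";
```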

Modifiers for Parentheses


PCRE only: makes some things possible that otherwise couldn't be done
Non-capturing grouping: (?: )
  Can simplify back-reference counting
Look-ahead assertions:
  They don't advance the matching position
  Positive: (?= ), or Negative: (?! )
Very powerful, but not always easy to understand. Trial and error can be your friend!
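A small sketch of non-capturing groups and look-ahead assertions, with made-up subjects, for experimenting in the trial-and-error spirit:

```php
<?php
// (?:abc) groups without capturing, so the digit is capture number 1.
preg_match('/(?:abc)+(\d)/', "abcabc7", $m);

// Positive look-ahead: "bar" must follow, but is not part of the match.
preg_match('/foo(?=bar)/', "foobar", $ahead);

// Negative look-ahead: match "foo" only when "bar" does not follow.
$blocked = preg_match('/foo(?!bar)/', "foobar");
$allowed = preg_match('/foo(?!bar)/', "foobaz");

print_r($m);
echo $ahead[0], "\n";
```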

PCRE Specifics
www.php.net/pcre
preg_match, preg_match_all, preg_replace, preg_split, preg_grep (filter an array)
Perl REs have a delimiter, usually /, but can be anything:
  preg_match("/foo/", $bar);
  preg_match("%/usr/local/bin/%", $path);
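The PCRE functions listed on this slide, in a brief sketch; the values of $bar and $path and the other sample inputs are invented:

```php
<?php
$bar  = "some food";
$path = "/usr/local/bin/php";

$found = preg_match('/foo/', $bar);

// With % as the delimiter, the slashes in the path need no escaping.
$inBin = preg_match('%/usr/local/bin/%', $path);

// preg_match_all collects every match, not just the first.
$n = preg_match_all('/\d+/', "10 apples, 20 pears", $nums);

// preg_split splits on a pattern instead of a fixed string.
$parts = preg_split('/[\s,]+/', "a, b  c,d");

// preg_grep keeps the array entries that match (original keys are kept).
$numeric = preg_grep('/^\d+$/', ["1", "a", "22"]);

print_r($nums[0]);
print_r($parts);
print_r($numeric);
```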


POSIX Specifics
www.php.net/regex
ereg, ereg_replace, split, eregi, spliti, etc.
[Only] advantage over PCRE: it doesn't require the PCRE library to be installed, so it's always there in any PHP installation
Other regex engines support this specification, though the Perl style seems to be more popular.

Almost there
Intro to Strings in PHP
  (Feel free to tell me how fast or slow to go)
Functions relating to HTML, SQL, etc.
Regular Expressions
  PCRE
  POSIX
Performance/Speed considerations
Grab bag of cool string functions



Performance/Speed
Rule of thumb: use the simplest function that will get the job done right
  strpos instead of substr
  str_replace instead of preg_replace
  And so forth
The PHP manual online usually includes notes about speed differences
PCRE is faster than POSIX Regex
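A rough way to compare two equivalent calls yourself. The timeIt helper is hypothetical, written for this sketch, and the timings it reports depend on machine and PHP version, so no particular numbers are implied.

```php
<?php
// Hypothetical helper: run a callable many times and return elapsed seconds.
function timeIt(callable $fn, int $rounds = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$text = "foobar";

// Both calls produce the same result for a fixed search string.
$simple = str_replace("bar", "baz", $text);
$regex  = preg_replace('/bar/', 'baz', $text);

$tSimple = timeIt(function () use ($text) { str_replace("bar", "baz", $text); });
$tRegex  = timeIt(function () use ($text) { preg_replace('/bar/', 'baz', $text); });

printf("str_replace: %.4fs, preg_replace: %.4fs\n", $tSimple, $tRegex);
```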



Grab Bag
md5, md5_file: calculate md5 hashes
  Great for passwords in databases, etc.
levenshtein, similar_text: calculate the similarity of two strings
metaphone, soundex: calculate keys for comparing how two strings sound when spoken out loud
str_rot13: encryption algorithm
  Protected by the DMCA
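A few of the grab-bag functions in action, as a sketch with commonly used sample inputs:

```php
<?php
// md5 returns a 32-character hexadecimal digest.
$hash = md5("hello");

// Number of single-character edits needed to turn one string into the other.
$distance = levenshtein("kitten", "sitting");

// similar_text returns the count of matching characters; the optional
// third argument receives a similarity percentage.
$common = similar_text("World", "Word", $percent);

// Names that sound alike produce the same soundex key.
$sameSound = soundex("Robert") === soundex("Rupert");

// rot13 shifts letters by 13 places, so applying it twice restores the text.
$rot = str_rot13("Hello");

echo $hash, "\n", $distance, "\n", $common, "\n", $rot, "\n";
```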



Grab Bag 2
str_shuffle: words are much more fun once they've been randomized
count_chars, str_word_count: statistics about your strings
strrev: if it doesn't make sense forward, try it backwards
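A closing sketch of these functions with illustrative inputs. The shuffled output differs from run to run, so only its length is predictable.

```php
<?php
$word = "randomized";

// Same characters in a random order.
$shuffled = str_shuffle($word);

$reversed  = strrev("Hello");
$wordCount = str_word_count("The quick brown fox");

// Mode 1: byte value => number of occurrences, only for bytes that occur.
$counts = count_chars("abca", 1);

echo $shuffled, "\n", $reversed, "\n", $wordCount, "\n";
print_r($counts);
```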
