Using the syntax for single characters and character strings, regular expressions can be created that match whole character strings or substrings of character strings. The syntax for find and replace offers some additional elements that support finding and replacing for substrings in character strings. The special characters that are valid in regular expressions are summarized in: Special characters in regular expressions.
______________________________________________________________
SAP AG 1
The regular expression 'AB' is a concatenation of two expressions for single characters.

Operators for single characters

These operators are made up of the special characters ., [, ], ^ and -, whereby the last two only act as special characters at specific positions within [ ]. The special characters can be made into literal characters using the prefix \.

Placeholders for single characters

The special character . is a placeholder for any single character. The operator \C has the same effect as the special character .. A regular expression . or \C matches exactly one single character.

Examples

The following table shows some results of the test program, whereby the value transferred at ignore_case can be any value.

Pattern   Text   Match
.         A      X
\C        a      X
.         AB     -
..        AB     X
The regular expression '..' is a concatenation of two expressions for single characters.

Self-defined sets for single characters

The special characters [ ] can be set around any number of literal characters or names for character classes (see below), and thus define a set of literal characters. A regular expression [...] matches exactly one single character that is listed as a literal character within the brackets, or which is contained in a specified character class. At least one literal character or one name for a character class (see below) must be contained within the brackets. A character [ or ], which is positioned directly after the opening bracket, is interpreted as a literal character. Some of the special characters that start with a backslash, such as \A or \Q, lose their special function within sets, and are interpreted as the simple literal character A or Q.

Examples

The following table shows some results of the test program.

Pattern    Text   Match
[ABC]      B      X
[ABC]      ABC    -
[AB][CD]   AD     X
[\d]       9      X
The regular expression [AB][CD] is a concatenation of two expressions for single characters.

Negation of a self-defined set for single characters

If the character ^ is the first character in a self-defined set for single characters and is listed directly after [, it acts as a special character and negates the rest of the set of literal characters or character classes. A regular expression [^...] matches exactly one single character that is not listed within the brackets as a literal character, or is not contained in a specified character class. A character ^ that is not listed directly after [ acts as a literal character.

Examples

The following table shows some results of the test program.

Pattern    Text   Match
[^ABC]     B      -
[^ABC]     Y      X
[^ABC]     ABC    -
[^A][^B]   BA     X
[A^B]      ^      X
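The set and negated-set rows above can be reproduced with a short script. This is only a sketch in Python: its re module is a different engine from the one behind FIND, REPLACE and CL_ABAP_REGEX, and it is used here as a stand-in because it shares the bracket syntax. The use of re.fullmatch is an assumption that mirrors how the test program appears to compare the pattern with the whole text.

```python
import re

# (pattern, text) pairs taken from the tables for sets and negated sets
cases = [
    ("[ABC]", "B"),
    ("[ABC]", "ABC"),
    ("[AB][CD]", "AD"),
    ("[^ABC]", "B"),
    ("[^ABC]", "Y"),
    ("[^A][^B]", "BA"),
    ("[A^B]", "^"),
]

# fullmatch succeeds only if the pattern covers the entire text
results = {(p, t): re.fullmatch(p, t) is not None for p, t in cases}

for (pattern, text), matched in results.items():
    print(f"{pattern:10} {text:4} {'X' if matched else '-'}")
```

A set always stands for exactly one character, which is why [ABC] does not match the three-character text ABC.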
The regular expression [^A][^B] is a concatenation of two expressions for single characters.

Ranges in a self-defined set for single characters

If the character - is between two literal characters, it acts as a special character and defines a range between the literal characters. The range is the set of characters that is enclosed by the literal characters in the code page of the current operating system. A regular expression [...-...] matches exactly one single character that is within the defined range. A character -, which is not between two literal characters, acts as a literal character. A literal character cannot be part of two ranges; for example, 'a-z-Z' is not a regular expression.

Examples

The following table shows some results of the test program.

Pattern         Text   Match
[A-Za-z0-9]     B      X
[A-Za-z0-9]     5      X
[A-Za-z0-9]     #      -
[A-Za-z0-9]     -      -
[A-Za-z0-9-]    -      X
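The difference between a range hyphen and a literal hyphen can be tried out as follows. This is a sketch in Python, used as a stand-in for the ABAP regex engine; Python orders ranges by Unicode code point rather than by the code page of the operating system, and the helper name matches is purely illustrative.

```python
import re

# Three ranges, no literal hyphen in the set
alnum = re.compile(r"[A-Za-z0-9]")
# Same ranges plus a closing hyphen, which is not between two literals
alnum_or_hyphen = re.compile(r"[A-Za-z0-9-]")


def matches(regex, text):
    """Return True if the compiled pattern covers the whole text."""
    return regex.fullmatch(text) is not None


for text in ["B", "5", "#", "-"]:
    print(text, matches(alnum, text), matches(alnum_or_hyphen, text))
```

Only the second set accepts the text -, because the hyphen there is an ordinary member of the set.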
In the last expression, the closing - does not act as a special character.

Character classes

Within sets for single characters defined with [ ], predefined platform-independent and language-independent character classes can be specified:

[:alnum:]     Set of all alphanumeric characters
[:alpha:]     Set of all upper and lower case letters including language-specific special characters (umlauts, accents, diphthongs)
[:blank:]     Blank characters and horizontal tabs
[:cntrl:]     Set of all control characters
[:digit:]     Set of all digits 0 to 9
[:graph:]     Set of all graphic special characters
[:lower:]     Set of all lower case letters including language-dependent special characters (umlauts, accents, diphthongs)
[:print:]     Set of all displayable characters
[:punct:]     Set of all punctuation characters
[:space:]     Set of all blank characters, tabs, and carriage feeds
[:unicode:]   Set of all characters with a character representation larger than 255 (only in Unicode systems)
[:upper:]     Set of all upper case letters including language-dependent special characters (umlauts, accents, diphthongs)
[:word:]      Set of all alphanumeric characters including underscore _
[:xdigit:]    Set of all hexadecimal digits (0-9, A-F and a-f)

Note
Character classes only act within [ ] as specified. A regular expression [:digit:] does not define the set of all digits, but instead defines a character set consisting of :, d, g, i and t. To specify the set of all digits, the regular expression [[:digit:]] is used.

Examples

The following table shows some results from the test program, if the value ' ' is passed to ignore_case.

Pattern       Text   Match
[[:alnum:]]   a      X
[[:alnum:]]   ;      -
[[:alpha:]]   1      -
Abbreviations for character sets

For frequently used character sets, specific operators are available as abbreviations:

Character set   Abb.   Meaning
[[:digit:]]     \d     Placeholder for a digit
[^[:digit:]]    \D     Placeholder for a non-digit
[[:lower:]]     \l     Placeholder for a lower-case letter
[^[:lower:]]    \L     Placeholder for a character that is not a lower-case letter
[[:space:]]     \s     Placeholder for a blank character
[^[:space:]]    \S     Placeholder for characters other than blank characters
[[:upper:]]     \u     Placeholder for an upper-case letter
[^[:upper:]]    \U     Placeholder for a character that is not an upper-case letter
[[:word:]]      \w     Placeholder for an alphanumeric character including underscore _
[^[:word:]]     \W     Placeholder for a non-alphanumeric character except for underscore _

Note
If upper/lower case is not taken into account in the ABAP statements FIND and REPLACE or when generating an object of the class CL_ABAP_REGEX, then \l and \u are equivalent to [[:alpha:]], and \L and \U are equivalent to [^[:alpha:]]. The special characters \w, \u, \l, \d, \s can also be listed within sets [...]. Use of the special characters \W, \U, \L, \D, \S within sets is not permitted and triggers the exception CX_SY_INVALID_REGEX.

Examples

The following table shows some results of the test program, if the value ' ' is transferred to ignore_case.

Pattern   Text      Match
\d        4         X
\D        ;         X
\l        u         X
\l        U         -
\L        s         -
\s        (blank)   X
\S        #         X
\u        U         X
\U        ,         X
\w        A         X
\w        8         X
\W        :         X
\W        _         -

Equivalence classes

The operators [..] and [==] are reserved for later language enhancements and trigger the exception CX_SY_INVALID_REGEX if used in sets.
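The relationship between an abbreviation and its character set can be checked mechanically. The following is a sketch in Python, used as a stand-in for the ABAP engine: Python offers \d, \D, \s, \S, \w and \W but not \l, \L, \u or \U, and it has no [[:digit:]] notation, so the explicit ASCII sets below are an approximation chosen for this illustration. The names pairs, samples and same_result are hypothetical.

```python
import re

# Abbreviation -> explicit set (ASCII approximation of the named classes)
pairs = {
    r"\d": "[0-9]",
    r"\D": "[^0-9]",
    r"\w": "[A-Za-z0-9_]",
    r"\W": "[^A-Za-z0-9_]",
}

samples = ["4", ";", "A", "8", ":", "_"]


def same_result(short, explicit, text):
    """Compare the abbreviation and the explicit set on one single character."""
    by_short = re.fullmatch(short, text, re.ASCII) is not None
    by_set = re.fullmatch(explicit, text) is not None
    return by_short == by_set


# True if every abbreviation agrees with its explicit set on every sample
agree = all(same_result(s, e, t) for s, e in pairs.items() for t in samples)
print(agree)
```

The underscore is the case worth remembering: it counts as a word character, so \w matches it and \W does not.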
H[aeu]llo is the concatenation of five regular expressions for single characters.

Operators for character strings

These operators are made up of the special characters {, }, *, +, ?, |, (, ) and \. The special characters can be made into literal characters using the prefix \ or by enclosing them in \Q ... \E.

Concatenation operators

The operators {n}, {n,m}, *, + and ? (whereby n and m are natural numbers, including zero) can be written directly after a regular expression r, and thus generate concatenations rrr... of the regular expression:

The regular expression r{n} is equivalent to an n-fold concatenation of r. The regular expression r{0} matches an empty character string, and therefore also the offset before the first character of a character string, the spaces between the characters in character strings, and the offset after the last character in a character string.

The regular expression r{n,m} is equivalent to at least n and a maximum of m concatenations of r. The value of n must be smaller than or equal to the value of m. The expression r{n,} is equivalent to at least n concatenations of r.

The regular expression r? is equivalent to r{0,1}, which means the expression r or the empty character string.

The regular expression r* is equivalent to r{0,}, i.e. a concatenation of r of any length, including the empty character string. When using subgroups (see below), and in a text search, r* matches the longest possible substring (greedy behavior).

The regular expression r+ is equivalent to r{1,}, i.e. a concatenation of any length of r excluding the empty character string. When using subgroups, and in a text search, r+ matches the longest possible substring (greedy behavior).
The regular expressions r{n,m}?, r*? and r+? are reserved for later language enhancements (non-greedy behavior) and currently trigger the exception CX_SY_INVALID_REGEX.

Example

The following table shows some results from the test program.

Pattern               Text    Match
Hel{2}o               Hello   X
H.{4}                 Hello   X
.{0,4}                Hello   -
.{4,}                 Hello   X
.+H.+e.+l.+l.+o.+     Hello   -
x*Hx*ex*lx*lx*ox*     Hello   X
l+                    ll      X

Subgroups
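The rows of the table above, plus the greedy behavior of + in a text search, can be replayed with a small script. This is a sketch in Python as a stand-in engine; note that Python accepts the non-greedy forms r*? and r+?, which this documentation describes as reserved, so they are deliberately not used here. re.fullmatch is assumed to correspond to the whole-text comparison of the test program.

```python
import re

# (pattern, text) pairs from the table of concatenation operators
cases = [
    ("Hel{2}o", "Hello"),
    ("H.{4}", "Hello"),
    (".{0,4}", "Hello"),
    (".{4,}", "Hello"),
    (".+H.+e.+l.+l.+o.+", "Hello"),
    ("x*Hx*ex*lx*lx*ox*", "Hello"),
    ("l+", "ll"),
]

# Whole-text comparison, keyed by pattern (each pattern occurs once)
full = {p: re.fullmatch(p, t) is not None for p, t in cases}

# Text search: l+ takes the longest possible run of l (greedy behavior)
greedy = re.search("l+", "Hello").group()

for pattern, matched in full.items():
    print(f"{pattern:20} {'X' if matched else '-'}")
print(greedy)
```

The pattern .{0,4} fails because Hello has five characters, and .+H.+e.+l.+l.+o.+ fails because every .+ needs at least one character of its own.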
The operators ( ... ) and (?: ... ) group concatenations of regular expressions together into one entity and thus influence the range of effectiveness of other operators such as * or |, which act on this entity. In this case, the regular expressions (r) and (?:r) match the regular expression r.

Examples

The following table shows some results of the test program.

Pattern      Text      Match
Tral+a       Tralala   -
Tr(al)+a     Tralala   X
Tr(?:al)+a   Tralala   X
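The effect of grouping on the + operator can be seen by running the three patterns against the same text. This is a sketch in Python, used as a stand-in for the ABAP engine; both kinds of parentheses are written the same way there, and re.fullmatch is assumed to correspond to the whole-text comparison of the test program.

```python
import re

text = "Tralala"
patterns = ["Tral+a", "Tr(al)+a", "Tr(?:al)+a"]

# Without a group, + repeats only the single character l;
# with a group, + repeats the whole unit al.
outcome = {p: re.fullmatch(p, text) is not None for p in patterns}

for pattern, matched in outcome.items():
    print(f"{pattern:12} {'X' if matched else '-'}")
```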
In the first expression, the concatenation with the operator + acts on the literal character l; in the second and third expressions, it acts on the subgroup al.

Subgroups with registration

The operator ( ... ) acts in the same way as (?: ... ) in the formation of subgroups. In addition, when the regular expression is compared with a character string, the substrings that match the subgroups ( ... ) of the expression are stored sequentially in registers. In this process, an operator \1, \2, \3, ... is assigned to each subgroup, which can be listed within the expression after its subgroup, and thus acts as a placeholder for the character string stored in the corresponding register. In text replacements, the special characters $1, $2, $3, ... can be used to access the last assignment to the register. The number of subgroups and registers is only limited by the capacity of the platform.

Note

The addition SUBMATCHES of the statements FIND and REPLACE and the eponymous column of the results table filled using the addition RESULTS can be used to access the content of all registers for a found location. The class CL_ABAP_MATCHER contains the method GET_SUBMATCH for this purpose.
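Registers and their placeholders can be sketched as follows. The example is in Python, used as a stand-in for the ABAP engine: inside the pattern the placeholder is \1 in both, but in a replacement text Python writes \1 where this documentation writes $1. The helper opening_quote and the sample texts are hypothetical and only serve the illustration.

```python
import re

# Register 1 stores the opening quote; \1 demands the same character at the end
quoted = re.compile(r"""(["']).+\1""")


def opening_quote(text):
    """Return the content of register 1 if the whole text matches, else None."""
    match = quoted.fullmatch(text)
    return match.group(1) if match else None


print(opening_quote('"Hello"'))   # register 1 holds the double quote
print(opening_quote("'Hello'"))   # register 1 holds the single quote
print(opening_quote("'Hello\""))  # quotes differ, so there is no match

# In a replacement, the registers can be rearranged (Python spells $1 as \1)
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")
print(swapped)
```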
Examples

The following table shows some results of the test program.

Pattern       Text      Match
(["']).+\1    "Hello"   X
(["']).+\1    'Hello'   X
(["']).+\1    'Hello"   -

The concatenation (["']).+\1 matches all text strings of which the first character is " or ' and the last character is the same as the first. A call matcher->get_submatch( index = 1 ) returns the values ", ' or ' for the three cases.

Reserved enhancements

The character string (? ... ) is generally reserved for later language enhancements and, with the exception of the already supported operators (?:...), (?=...) and (?!...), triggers the exception CX_SY_INVALID_REGEX.

Alternatives

The operator | can be written between two regular expressions r and s, and thus generates a single regular expression r|s, which matches either r or s.

Note

Concatenations and other operators bind more tightly than |, which means that r|st and r|s+ are equivalent to r|(?:st) and r|(?:s+) respectively, and not to (?:r|s)t or (?:r|s)+.

Examples

The following table shows some results of the test program.

Pattern        Text    Match
H(e|a|u)llo    Hello   X
H(e|a|u)llo    Hollo   -
He|a|ullo      Hallo   -
He|a|ullo      ullo    X

Literal characters

The operators \Q ... \E form a character string of literal characters from all enclosed characters. Special characters have no effect in this character string.

The following table shows some results of the test program.

Pattern       Text             Match
.+\w\d        Special: \w\d    -
.+\\w\\d      Special: \w\d    X
.+\Q\w\d\E    Special: \w\d    X
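The binding strength of | and the effect of literal strings can be tried out with a few lines. This is a sketch in Python, used as a stand-in for the ABAP engine. Python has no \Q ... \E operator, so re.escape is used here as the closest equivalent; that substitution, and the helper name whole, are assumptions of this example.

```python
import re


def whole(pattern, text):
    """Return True if the pattern covers the whole text."""
    return re.fullmatch(pattern, text) is not None


grouped = r"H(e|a|u)llo"    # alternatives limited to one character
ungrouped = r"He|a|ullo"    # read as: He, or a, or ullo

print(whole(grouped, "Hello"), whole(grouped, "Hollo"))
print(whole(ungrouped, "Hallo"), whole(ungrouped, "ullo"))

# re.escape turns every special character into a literal, similar to \Q...\E
literal = ".+" + re.escape(r"\w\d")
sample = r"Special: \w\d"
print(whole(r".+\w\d", sample), whole(literal, sample))
```

Without the parentheses, the H belongs only to the first alternative, which is why Hallo is rejected while the bare string ullo is accepted.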
APPEND 'Smile' TO text_tab.
APPEND ' Smile' TO text_tab.
APPEND ' Smile' TO text_tab.
APPEND ' Smile' TO text_tab.
APPEND ' Smile' TO text_tab.
APPEND ' Smile' TO text_tab.
FIND ALL OCCURRENCES OF REGEX '^(?:Smile)|(?:Smile)$' IN TABLE text_tab RESULTS result_tab.

Start and end of a word

The operator \< matches at the start of a word and the operator \> matches at the end of a word. The operator \b matches at both the beginning and the end of a word. A word is defined as an uninterrupted sequence of alphanumeric characters.

Example

The following search finds the three words One, two and 3. Instead of the expression \<[[:alnum:]]+\>, \b[[:alnum:]]+\b can also be used.

DATA text TYPE string.
DATA result_tab TYPE match_result_tab.
text = `One, two, 3!`.
FIND ALL OCCURRENCES OF REGEX '\<[[:alnum:]]+\>' IN text RESULTS result_tab.

Preview conditions

The operator (?=...) defines a regular expression s as a subsequent condition for a previous regular expression r. The regular expression r(?=s) has the same effect in a search as the regular expression r, if the regular expression s matches the substring that immediately follows the substring found with r. The operator (?!...) acts in the same way as (?=...), with the difference that r(?!s) matches the substring for r if s does not match the subsequent substring.

Note

The substring found by the preview s is not a part of the match found by r(?=s).

Example

The following search finds the substring la at offset 7.

DATA text TYPE string.
DATA result_tab TYPE match_result_tab.
text = `Shalalala!`.
FIND ALL OCCURRENCES OF REGEX '(?:la)(?=!)' IN text RESULTS result_tab.
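Both searches above can be replayed outside an ABAP system. The following is a sketch in Python, used as a stand-in engine: Python has neither \< and \> nor the [[:alnum:]] notation, so \b and the ASCII set [A-Za-z0-9] are substituted here as an approximation. The variable names are hypothetical.

```python
import re

# Word search: \b marks both the start and the end of a word
words = [m.group() for m in re.finditer(r"\b[A-Za-z0-9]+\b", "One, two, 3!")]
print(words)

text = "Shalalala!"

# Preview condition: la only counts if ! follows; the ! is not part of the match
hit = re.search(r"(?:la)(?=!)", text)
print(hit.start(), hit.group())

# Negated preview condition: la only counts if ! does not follow
others = [m.start() for m in re.finditer(r"la(?!!)", text)]
print(others)
```

The negated condition finds the two occurrences of la at offsets 3 and 5, which are exactly the ones that the positive condition skips.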
Example

After replacement, text has the content again and again.

DATA: text TYPE string.
text = `again and`.
REPLACE REGEX 'and' IN text WITH '$0 $`'.

Addressing the text after the found location

The operator $' can be used in the replacement text as a placeholder for the whole text after the current found location.

Example

After replacement, text has the content and again and.

DATA: text TYPE string.
text = `again and`.
REPLACE REGEX `again ` IN text WITH `$' $0`.
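How the placeholders for the surrounding text expand can be worked through step by step. The following is a sketch in Python, whose replacement syntax has no $` or $' placeholders; the helper replace_first is hypothetical and emulates them, together with $0, for the first found location only. It illustrates the substitution rule and is not the implementation of REPLACE.

```python
import re


def replace_first(pattern, text, template):
    """Replace the first match, expanding $0, $` and $' in the template."""
    match = re.search(pattern, text)
    if match is None:
        return text
    before = text[:match.start()]  # text in front of the found location
    after = text[match.end():]     # text behind the found location
    expansions = {"$0": match.group(0), "$`": before, "$'": after}
    # Expand all placeholders in one pass so inserted text is not scanned again
    replacement = re.sub(r"\$[0`']", lambda p: expansions[p.group(0)], template)
    return before + replacement + after


# "and" is found at offset 6; the text in front of it is "again "
print(repr(replace_first("and", "again and", "$0 $`")))
# "again " is found at offset 0; the text behind it is "and"
print(repr(replace_first("again ", "again and", "$' $0")))
```

The text outside the found location is kept as it is, which is why the part after the match still appears at the end of the second result.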
Note Regular expressions with simplified syntax can only be used within the class CL_ABAP_REGEX. If the value 'X' is transferred for the input parameter simple_regex, the regular expression is viewed according to the simplified syntax. By default, the syntax is used according to the extended POSIX standard. If the simplified syntax is to be used in the statements FIND or REPLACE, an object must be transferred.
Example

Regular expression   Simplified expression   Matches
(.)                  \(.\)                   a
\(.\)                (.)                     (a)
Example

The following program finds all characters that are directly followed by the same character.

DATA: regex TYPE REF TO cl_abap_regex,
      res   TYPE match_result_tab,
      text  TYPE string.
CREATE OBJECT regex
  EXPORTING
    pattern      = '\(.\)\1'
    simple_regex = 'X'.
FIND ALL OCCURRENCES OF REGEX regex IN text RESULTS res.
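The same search can be tried on a concrete text. This is a sketch in Python, used as a stand-in engine: Python has no simplified syntax, so the pattern is written in regular syntax as (.)\1, which corresponds to \(.\)\1 in the simplified syntax. The sample text is an assumption, since the program above leaves text empty.

```python
import re

text = "bookkeeper"  # hypothetical sample text

# (.) registers one character, \1 requires the same character to follow
doubles = [(m.start(), m.group()) for m in re.finditer(r"(.)\1", text)]

for offset, pair in doubles:
    print(offset, pair)
```

Each entry pairs the offset of a found location with the doubled character, similar to the offset and length information stored in a results table.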
Special character    Meaning
.                    Placeholder for any single character
\C                   Placeholder for any single character
\d                   Placeholder for any single digit
\D                   Placeholder for any character other than a digit
\l                   Placeholder for any lower-case letter
\L                   Placeholder for any character other than a lower-case letter
\s                   Placeholder for a blank character
\S                   Placeholder for any character other than a blank
\u                   Placeholder for any upper-case letter
\U                   Placeholder for any character other than an upper-case letter
\w                   Placeholder for any alphanumeric character including _
\W                   Placeholder for any non-alphanumeric character
[ ]                  Definition of a value set for single characters
[^ ]                 Negation of a value set for single characters
[ - ]                Definition of a range in a value set for single characters
[[:alnum:]]          Description of all alphanumeric characters in a value set
[[:alpha:]]          Description of all letters in a value set
[[:blank:]]          Description for blank characters and horizontal tabulators in a value set
[[:cntrl:]]          Description of all control characters in a value set
[[:digit:]]          Description of all digits in a value set
[[:graph:]]          Description of all graphic special characters in a value set
[[:lower:]]          Description of all lower-case letters in a value set
[[:print:]]          Description of all displayable characters in a value set
[[:punct:]]          Description of all punctuation characters in a value set
[[:space:]]          Description of all blank characters, tabulators, and carriage feeds in a value set
[[:unicode:]]        Description of all Unicode characters with a code larger than 255 in a value set
[[:upper:]]          Description of all upper-case letters in a value set
[[:word:]]           Description of all alphanumeric characters in a value set, including _
[[:xdigit:]]         Description of all hexadecimal digits in a value set
\a \f \n \r \t \v    Diverse platform-specific control characters
[..]                 Reserved for later enhancements
[==]                 Reserved for later enhancements

Special characters for character string patterns

Special character    Meaning
{n}                  Concatenation of n single characters
{n,m}                Concatenation of at least n and a maximum of m single characters
{n,m}?               Reserved for later enhancements
?                    One or no single characters
*                    Concatenation of any number of single characters including 'no characters'
*?                   Reserved for later enhancements
+                    Concatenation of any number of single characters excluding 'no characters'
+?                   Reserved for later enhancements
|                    Linking of two alternative expressions
( )                  Definition of subgroups with registration
(?: )                Definition of subgroups without registration
\1, \2, \3 ...       Placeholder for the register of subgroups
\Q ... \E            Definition of a string of literal characters
(? ... )             Reserved for later enhancements

Special characters for search strings

Special character    Meaning
^                    Anchor character for the start of a line
\A                   Anchor character for the start of a character string
$                    Anchor character for the end of a line
\Z                   Anchor character for the end of a character string
\<                   Start of a word
\>                   End of a word
\b                   Start or end of a word
\B                   Space between characters within a word
(?= )                Preview condition
(?! )                Negated preview condition

Special characters for replacement texts

Special character    Meaning
$0, $&               Placeholder for the whole found location
$1, $2, $3...        Placeholder for the register of subgroups
$`                   Placeholder for the text before the found location
$'                   Placeholder for the text after the found location
Regular expressions

Copyright

Copyright 2011 SAP AG. All rights reserved. No part of this documentation may be transferred or reproduced, for any purpose or in any form, without the written permission of SAP AG. The information contained in this documentation may be changed or extended without prior notice. SAP is a registered trademark of SAP AG. All other products mentioned in this documentation are registered or unregistered trademarks of their respective companies.

Regular expressions
Syntax of regular expressions
Single character patterns
Character string patterns
Search strings
Replacement texts
Simplified regular expressions
Special characters in regular expressions
Testing regular expressions