Академический Документы
Профессиональный Документы
Культура Документы
Contents
i. Introduction
ii. Regular Expressions and Patterns
iii. Categories of Pattern Matching Characters
iv. String and Regular Expression methods
v. Sample Usage
Introduction
Validating user input is the bane of every software developer’s existence. When you are
developing cross-browser web applications (IE4+ and NS4+) this task becomes even less
enjoyable due to the lack of useful intrinsic validation functions in JavaScript. Fortunately,
JavaScript 1.2+ has incorporated regular expressions. In this article I will present a brief
tutorial on the basics of regular expressions and then give some examples of how they can
be used to simplify data validation.
Regular Expressions and Patterns
Regular expressions are very powerful tools for performing pattern matches. PERL
programmers and UNIX shell programmers have enjoyed the benefits of regular expressions
for years. Once you master the pattern language, most validation tasks become trivial. You
can perform complex tasks that once required lengthy procedures with just a few lines of
code using regular expressions.
So how are regular expressions implemented in JavaScript? There are two ways:
1) Using literal syntax.
2) When you need to dynamically construct the regular expression, via the RegExp()
constructor.
The literal syntax looks something like:
var RegularExpression = /pattern/
while the RegExp() constructor method looks like
var RegularExpression = new RegExp("pattern");
The RegExp() method allows you to dynamically construct the search pattern as a string,
and is useful when the pattern is not known ahead of time.
To use regular expressions to validate a string you need to define a pattern string that
represents the search criteria, then use a relevant string method to denote the action (ie:
search, replace etc). Patterns are defined using string literal characters and metacharacters.
For example, the following regular expression determines whether a string contains a valid
5-digit US postal code (for sake or simplicity, other possibilities are not considered):
<script language="JavaScript1.2">
function checkpostal(){
var re5digit=/^\d{5}$/ //regular expression defining a 5 digit number
if (document.myform.myinput.value.search(re5digit)==-1) //if match failed
alert("Please enter a valid 5 digit number inside form")
}
</script>
<form name="myform">
<input type="text" name="myinput" size=15>
<input type="button" onClick="checkpostal()" value="check">
</form>
Lets deconstruct the regular expression used, which checks that a string contains a valid 5-
digit number, and ONLY a 5-digit number:
var re5digit=/^\d{5}$/
• ^ indicates the beginning of the string. Using a ^ metacharacter requires that the
match start at the beginning.
• \d indicates a digit character and the {5} following it means that there must be 5
consecutive digit characters.
• $ indicates the end of the string. Using a $ metacharacter requires that the match
end at the end of the string.
Translated to English, this pattern states: "Starting at the beginning of the string there must
be nothing other than 5 digits. There must also be nothing following those 5 digits."
Now that you've got a taste of what regular expressions is all about, lets formally look at its
syntax, so you can create complex expressions that validate virtually anything you want.
Categories of Pattern Matching Characters
Pattern-matching characters can be grouped into various categories, which will be explained
in detail later. By understanding these characters, you understand the language needed to
create a regular expression pattern. The categories are:
• Position matching- You wish to match a substring that occurs at a specific location
within the larger string. For example, a substring that occurs at the very beginning
or end of string.
• Special literal character matching- All alphabetic and numeric characters by
default match themselves literally in regular expressions. However, if you wish to
match say a newline in Regular Expressions, a special syntax is needed, specifically,
a backslash (\) followed by a designated character. For example, to match a newline,
the syntax "\n" is used, while "\r" matches a carriage return.
• Character classes matching- Individual characters can be combined into character
classes to form more complex matches, by placing them in designated containers
such as a square bracket. For example, /[abc]/ matches "a", "b", or "c", while /[a-
zA-Z0-9]/ matches all alphanumeric characters.
• Repetition matching- You wish to match character(s) that occurs in certain
repetition. For example, to match "555", the easy way is to use /5{3}/
• Alternation and grouping matching- You wish to group characters to be
considered as a single entity or add an "OR" logic to your pattern matching.
• Back reference matching- You wish to refer back to a subexpression in the same
regular expression to perform matches where one match is based on the result of an
earlier match.
The following are categorized tables explaining the above:
Position Matching
Symbol Description Example
^ Only matches the beginning of a string. /^The/ matches "The" in
"The night" by not "In The
Night"
Literals
Symbol Description
\xxx Matches the ASCII character expressed by the octal number xxx.
\xdd Matches the ASCII character expressed by the hex number dd.
[xyz] Match any one character enclosed in the /[AN]BC/ matches "ABC"
character set. You may use a hyphen to and "NBC" but not "BBC"
denote range. For example. /[a-z]/ matches since the leading “B” is not
any letter in the alphabet, /[0-9]/ any single in the set.
digit.
[^xyz] Match any one character not enclosed in the /[^AN]BC/ matches "BBC"
character set. The caret indicates that none of but not "ABC" or "NBC".
the characters
Repetition
Symbol Description Example
{x} Match exactly x occurrences of a regular /\d{5}/ matches 5 digits.
expression.
Backreferences
Symbol Description Example
Pattern Switches
In addition to the pattern-matching characters, you can use switches to make the match
global or case- insensitive or both: Switches are added to the very end of a regular
expression.
i Ignore the case of /The/i matches "the" and "The" and "tHe"
characters.
g Global search for all /ain/g matches both "ain"s in "No pain no
occurrences of a pattern gain", instead of just the first.
gi Global search, ignore case. /it/gi matches all "it"s in "It is our IT
department"