Definition
Regular expressions provide an efficient means of string pattern matching; they are widely used on UNIX systems, and occasionally on personal computers as well. They provide a very powerful, but also rather obtuse, set of tools for finding particular words or combinations of characters in strings.
On first reading, this all seems particularly complicated and of little use over and above the standard string matching provided in the Edit Filters dialog (Word matching, for example). In actual fact, in these cases NewsWatcher converts your string matching criteria into a regular expression when applying filters to articles.
However, you can use some of the simpler matching criteria with ease (some examples are
suggested below), and gradually build up the complexity of the regular expressions that you
use.
One point to note is that regular expressions are not wildcards. The regular expression 'c*t' does not mean 'match "cat", "cot"', etc. Rather, it means 'match zero or more 'c' characters followed by a t', so it would match 't', 'ct', 'cccct', etc.
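To make the distinction concrete, here is a minimal sketch using Python's re module (Python is an assumption here; the passage itself names no particular tool):

```python
import re

# 'c*t' means "zero or more 'c' characters followed by 't'",
# not the wildcard reading "anything between c and t".
pattern = re.compile(r"c*t")

for s in ["t", "ct", "cccct", "cat", "cot"]:
    # fullmatch succeeds only if the whole string fits the pattern
    print(s, bool(pattern.fullmatch(s)))
```

Running this prints True for 't', 'ct', and 'cccct', and False for 'cat' and 'cot', which is exactly the behavior described above.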
A regular expression can define complex patterns of character sequences. For example, the regular expression given below looks for the literal f or ht, followed by the literals tp, the literal s (which may or may not be present), and the closing ( : ) literal:
(f|ht)tps?:
The parentheses here are metacharacters used to group a number of pattern elements into a single element; the ( | ) symbol provides the functionality of OR, allowing either alternative in the group to match. The ( ? ) is also used here as a metacharacter, indicating that the s literal is optional. Hence the above regular expression can successfully find the http:, https:, ftp:, and ftps: strings.
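As a quick check of the pattern above, a short Python sketch (Python is my choice purely for illustration) confirms which URL schemes it accepts:

```python
import re

# (f|ht)tps?: accepts exactly the schemes ftp:, ftps:, http:, https:
scheme = re.compile(r"(f|ht)tps?:")

for url in ["http://example.com", "https://example.com",
            "ftp://example.com", "ftps://example.com",
            "gopher://example.com"]:
    # match() anchors the pattern at the start of the string
    print(url, bool(scheme.match(url)))
```

The first four URLs match; gopher://example.com does not.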
Some Chronology
Regular Expressions were introduced by S. C. Kleene to describe the McCulloch and Pitts 1943 finite automata model of neurons ("Representation of Events in Nerve Nets", pp. 3-40 in Claude Shannon/John McCarthy, Automata Studies, 1956).
The first application of Regular Expressions to editor search/replace (in the QED editor) was by Ken Thompson, who published a Regular Expression-to-NFA algorithm in 1968 ("Regular Expression Search Algorithm", CACM 11:6, pp. 419-422).
Ken Thompson went on to re-implement this in the Unix ed editor, which Bill Joy turned into the vi editor. Thompson adapted the ed code for grep and sed. (Some years after its creation, Emacs eventually borrowed the idea of Regular Expressions, but not the code, directly from these Unix editors -- RMS, private communication.)
Steve Johnson (prior to, and building towards, his Unix yacc tool) and Mike Lesk (in the Unix lex tool) produced some of the earliest applications of Regular Expressions to compiler lexical analyzers via automated DFA-building tools.
Awk is a scripting language/command-line tool derived directly from this Unix culture of Regular Expressions; it is no coincidence that the language most famous for Regular Expressions today, Perl, was developed in a Unix environment, inspired by awk and other Unix Regular Expression tools.
Regular Expressions were thus widespread in Unix tools of all sorts from the beginning, years to decades before this technology was widespread elsewhere (although obviously there were exceptions), and Regular Expressions have always been an extremely important (albeit under-acknowledged) part of Unix culture.
By centralizing match logic in Oracle Database, you avoid intensive string processing of SQL result sets by middle-tier applications. For example, life science customers often rely on Perl to do pattern analysis on bioinformatics data stored in huge databases of DNA and protein sequences. Previously, finding a match for a protein sequence such as [AG].{4}GK[ST] was handled in the middle tier. The SQL regular expression functions move the processing logic closer to the data, thereby providing a more efficient solution.
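For illustration, the middle-tier style of matching described above might look like the following Python sketch; the sequence is a made-up toy string, not real bioinformatics data (server-side, the same pattern would instead be passed to Oracle's SQL regular expression functions):

```python
import re

# The pattern from the text: an A or G, any four characters,
# then G, K, and an S or T.
motif = re.compile(r"[AG].{4}GK[ST]")

sequence = "MKTAYLLMGKSLLN"  # hypothetical toy protein sequence
match = motif.search(sequence)
print(match.group() if match else "no match")  # prints AYLLMGKS
```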
Prior to Oracle Database 10g, developers often coded data validation logic on the
client, requiring the same validation logic to be duplicated for multiple clients. Using
server-side regular expressions to enforce constraints solves this problem.
The built-in SQL and PL/SQL regular expression functions and conditions make
string manipulations more powerful and less cumbersome than in previous releases of
Oracle Database.
Applications
1. Regular Expressions in Web Search Engines
One use of regular expressions that used to be very common was in web search engines.
Archie, one of the first search engines, used regular expressions exclusively to search through
a database of filenames on public FTP servers[1]. Once the World Wide Web started to take
form, the first search engines for it also used regular expressions to search through their
indexes. Regular expressions were chosen for these early search engines because of both their power and their ease of implementation. It is a fairly trivial task to convert search strings into regular
expressions that accept only strings that have some relevance to the query. In the case of a
search engine, the strings input to the regular expression would be either whole web pages or
a pre-computed index of a web page that holds only the most important information from that
web page. A query such as regular expression could be translated into the following regular expression:
Σ* (regular expression ) Σ* ∪ Σ* (expression regular ) Σ*
Σ, then, of course, would be the set of all characters in the character encoding used with this search engine. The results returned to the user would be the set of web pages that were
accepted by this regular expression. Many other features commonly seen in search engines
are also easy to convert into regular expressions. One example of this is adding quotes around
a query to search for the whole string. The query "regular expression" could be converted into
the following regular expression: Σ* (regular expression ) Σ*
Most of the other common features can also be easily converted into regular expressions.
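The translation sketched above can be expressed in Python; query_to_regex is a hypothetical helper (not from the original text) that handles only two-word queries, with .* standing in for Σ*:

```python
import re

def query_to_regex(query: str) -> re.Pattern:
    # Accept the two query words adjacently, in either order,
    # anywhere in the page text; ".*" plays the role of Sigma*.
    a, b = (re.escape(w) for w in query.split())
    return re.compile(rf".*{a} {b} .*|.*{b} {a} .*", re.DOTALL)

pattern = query_to_regex("regular expression")
print(bool(pattern.fullmatch("notes on regular expression search ")))  # True
print(bool(pattern.fullmatch("an unrelated page ")))                   # False
```

A page is "returned" exactly when the whole page text is accepted by the compiled expression, mirroring the set-of-accepted-pages description above.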
Regular expressions are no longer used in the large web search engines because, with the growth of the web, they became impossibly slow. They are, however, still used in many smaller search tools, such as the find/replace feature of a text editor or command-line tools such as grep.
A regular expression is sufficient to abstractly describe a test case or a class of test cases, but sets of expressions require their own criteria. The use of regular language theory makes coverage analysis and test set generation easy. Most of the expenditure on software stems from maintaining it rather than from its actual development, and much of that expenditure goes to testing. Regression testing is an important part of the software development cycle: it is the process used to determine whether a modified program still meets its specifications or whether new errors have been introduced. Research is being done to make regression testing more efficient and, specifically, more economical. Regular language theory does not play a huge part in regression testing, but at the integration level a relation can be established to finite automata and regular languages and their properties.