
SED Regular expression

Contents
To execute sed from file
Sed regular expression

Using the file u3, do the following using sed, displaying the result on the screen.
1. Output only the lines that contain cow
Answer:
sed -n '/cow/p' u3
2. Delete any line that contains cow
Answer:
sed '/cow/d' u3
3. Change the first instance of * on each line to !
Answer:
sed 's/\*/!/' u3
4. Change all occurrences of * on each line to !
Answer:
sed 's/\*/!/g' u3

5. Output only the lines that contain either cow or calf
Answer:
sed -n -e '/cow/p' -e '/calf/p' u3
(a line that contains both words is printed twice)
6. Output the file after changing cow to COW on lines 10-20
Answer:
sed '10,20s/cow/COW/g' u3
7. Output the entire file except lines 1-20
Answer:
sed '1,20d' u3
8. Delete any lines containing the string "news"
Answer:
sed '/news/d' u3

Examples on a sample file named file, which contains:

line 1 (one)
line 2 (two)
line 3 (three)
Command:
sed -e '1,2s/line/LINE/' file
Output:
LINE 1 (one)
LINE 2 (two)
line 3 (three)

9. Command:
sed -e '1,2d' file
Output:
line 3 (three)
10. Command:
sed -e '3d' file
Output:
line 1 (one)
line 2 (two)

11. Write a script to insert


12. Write a script to change

Sed from a file

If your sed script is getting long, you can put it into a file, like so:
# This file is named "sample.sed"
# comments can only appear in a block at the beginning
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
Then call sed with the "-f" flag:
sed -f sample.sed filename

Or, you can make an executable sed script:
#!/usr/bin/sed -f
# This file is named "sample2.sed"
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
Then give it execute permissions:
chmod u+x sample2.sed
and then call it like so:
./sample2.sed filename
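
The -f workflow above can be tried end to end with a throwaway directory. This is a minimal sketch: the temporary directory, the input text and the variable names are made up for the illustration, and only the three substitutions come from the text.

```shell
#!/bin/sh
# Sketch: keep the sed commands in a script file and run them with -f.
# workdir and input.txt are hypothetical names used only for this demo.
workdir=$(mktemp -d)

# The sed script: one comment line at the top, then one command per line.
cat > "$workdir/sample.sed" <<'EOF'
# American to British spelling
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
EOF

printf 'The color of the theater\nA flavor of color\n' > "$workdir/input.txt"

# Run every command in sample.sed against each input line.
result=$(sed -f "$workdir/sample.sed" "$workdir/input.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

Keeping the commands in a file also avoids shell quoting problems, since nothing in sample.sed passes through the shell.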

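Note that in sed's basic regular expressions several operators get their special meaning only when escaped with a backslash:
curly braces \{ \},
round brackets \( \),
plus \+ (GNU sed),
question mark \? (GNU sed).
The star * is the exception: it is special without a backslash, and \* matches a literal star.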

Regular expression in sed

Special characters     Usage
^                      Matches the beginning of the line
$                      Matches the end of the line
.                      Matches any single character
*                      Matches zero or more occurrences of the preceding character
\+                     Matches one or more occurrences of the preceding character
\?                     Matches zero or one instance of the preceding character
[ ]                    Matches any character enclosed in [ ]
[^ ]                   Matches any character not enclosed in [ ]
(character)\{m,n\}     Matches m to n repetitions of (character)
(character)\{m,\}      Matches m or more repetitions of (character)
(character)\{,n\}      Matches n or fewer (possibly 0) repetitions of (character)
(character)\{n\}       Matches exactly n repetitions of (character)
\(expression\)         Group operator. Also saves the match in numbered variables, used as backreferences \1 \2 .. \9
\n                     Backreference: matches the nth group
&                      In the replacement part of s, stands for the whole matched text

Regular Expressions (character classes)

The following character classes are shorthand for matching sets of characters. They are used inside a bracket expression, for example [[:digit:]].
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and tab characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Printable and visible (non-space) characters
[:lower:] Lowercase characters
[:print:] Printable characters (includes white space)
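
As a small illustration of the classes listed above, the sketch below uses [:digit:] and [:blank:] inside bracket expressions. The sample line and variable names are invented for the example.

```shell
#!/bin/sh
# Sketch: character classes must sit inside a bracket expression,
# so [:digit:] is written as [[:digit:]].
line='Order 66 shipped   on day 7'

# Replace every run of digits with the letter N.
digits_masked=$(printf '%s\n' "$line" | sed 's/[[:digit:]][[:digit:]]*/N/g')

# Squeeze every run of spaces or tabs into a single space.
squeezed=$(printf '%s\n' "$line" | sed 's/[[:blank:]][[:blank:]]*/ /g')

printf '%s\n%s\n' "$digits_masked" "$squeezed"
```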

The '^' character means the beginning of the line.
Example:
sed 's/^Thu /Thursday /' filename
will turn "Thu " into "Thursday ", but only at the beginning of the line.
Example:
sed -e '/^#/d' filename
deletes every line that begins with #.

Examples:
/[Uu]nix/!d          deletes lines that do not contain the word unix or Unix
6d                   deletes line 6
/^$/d                deletes all blank lines
1,10d                deletes lines 1 through 10
1,/^$/d              deletes from line 1 through the first blank line
/^$/,$d              deletes from the first blank line through the last line of the file
/^$/,10d             deletes from the first blank line through line 10
/^Co*t/,/[0-9]$/d    deletes from the first line that begins with Ct, Cot, Coot, Cooot, etc. through the first following line that ends with a digit
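
The address forms in these delete examples can be checked on a tiny input. The sketch below assumes a made-up six-line sample (two header lines, a blank line, then a body) and shows three of the forms.

```shell
#!/bin/sh
# Sketch: line-number and regex addresses with the d command.
# The sample text is hypothetical.
sample=$(printf 'From: a\nTo: b\n\nbody 1\n\nbody 2')

# Delete from line 1 through the first blank line (drops the header).
after_header=$(printf '%s\n' "$sample" | sed '1,/^$/d')

# Delete every blank line.
no_blank=$(printf '%s\n' "$sample" | sed '/^$/d')

# Delete from the first blank line through the last line ($).
from_blank=$(printf '%s\n' "$sample" | sed '/^$/,$d')

printf '%s\n---\n%s\n---\n%s\n' "$after_header" "$no_blank" "$from_blank"
```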

`[a-zA-Z0-9]'
This matches any letter or digit.
`[^a-zA-Z]'
This matches any character that is not a letter.

Repetition using *
* means 0 or more of the previous single character pattern.
[abc]*       matches "aaaaa" or "acbca"
Hi Dave.*    matches "Hi Dave" or "Hi Daveisgoofy"
0*10         matches "010" or "0000010" or "10"

Repetition using +
+ means 1 or more of the previous single character pattern. In sed, write it as \+ (GNU sed) or use extended regular expressions with sed -E.
[abc]+       matches "aaaaa" or "acbca"
Hi Dave.+    matches "Hi Dave." and any longer string that starts with "Hi Dave", but not "Hi Dave" alone
0+10         matches "010" or "0000010", does not match "10"
a\+b\+       matches one or more `a's followed by one or more `b's: `ab' is the shortest possible match, but other examples are `aaaab' or `abbbbb'

? Repetition Operator
? means 0 or 1 of the previous single character pattern. In sed, write it as \? (GNU sed) or use sed -E.
x[abc]?x     matches "xax" or "xx"
A[0-9]?B     matches "A1B" or "AB", does not match "a1b" or "A123B"

`.\{9\}A$'
This matches an A that is the last character on a line, with at least nine preceding characters.
`^.\{15\}A'
This matches an A that is the 16th character on a line.

sed G myfile.txt > newfile.txt
The G command here double-spaces the file myfile.txt and writes the result to newfile.txt.

sed = myfile.txt | sed 'N;s/\n/. /'
This prints each line of myfile.txt preceded by its line number, a period and a space. As in the first example, the output could be redirected to another file using > and the file name.

sed 's/test/example/g' myfile.txt > newfile.txt
Reads the file myfile.txt, replaces every occurrence of "test" with "example", and writes the result to newfile.txt.

sed -n '$=' myfile.txt
This command counts the number of lines in myfile.txt and outputs the result.
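
To see what these one-liners produce, here is a minimal sketch that feeds them short made-up inputs on standard input instead of myfile.txt.

```shell
#!/bin/sh
# Sketch: double spacing, line numbering and line counting on sample input.

# G appends a newline to each line, which double-spaces the text.
doubled=$(printf 'a\nb\n' | sed G)

# "sed =" prints each line number on its own line; the second sed joins
# the number and the text line that follows it with ". ".
numbered=$(printf 'alpha\nbeta\n' | sed = | sed 'N;s/\n/. /')

# $= prints the number of the last line, which is the line count.
count=$(printf 'alpha\nbeta\ngamma\n' | sed -n '$=')

printf '%s\n%s\n%s\n' "$doubled" "$numbered" "$count"
```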

Regular Expressions (cont)
/^M.*/       Line begins with capital M, 0 or more chars follow
/..*/        At least 1 character long (/.\+/ in GNU sed, or /.+/ with sed -E, means the same thing)
/^$/         The empty line
ab|cd        Either ab or cd (written ab\|cd in GNU sed without -E)
a(b*|c*)d    Matches any string beginning with the letter a, followed by either zero or more of the letter b, or zero or more of the letter c, followed by the letter d
[[:space:][:alnum:]]    Matches any character that is either a white space character or alphanumeric

Note:
Sed always tries to find the longest matching pattern in the
input. How would you match a tag in an HTML document?
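
One common answer to the question above is to exclude the closing bracket from the repeated part, so that the longest match still stops at the end of a single tag. The sketch below contrasts the two patterns on a made-up HTML fragment; [TAG] is only a visible marker.

```shell
#!/bin/sh
# Sketch: longest-match behaviour and a way to match one tag at a time.
html='<b>bold</b> text'

# <.*> runs from the first < to the LAST > on the line.
greedy=$(printf '%s\n' "$html" | sed 's/<.*>/[TAG]/')

# <[^>]*> cannot cross a >, so each tag is matched separately.
per_tag=$(printf '%s\n' "$html" | sed 's/<[^>]*>/[TAG]/g')

printf '%s\n%s\n' "$greedy" "$per_tag"
```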

Grouping with parens

If you put a subpattern inside parens, you can apply +, * and ? to the entire subpattern. In sed's basic regular expressions the parens are written \( and \).
a(bc)*d matches "ad" and "abcbcd", does not match "abcxd" or "bcbcd"

9. Append three exclamation points to the end of each line in u3 that contains student
10. Repeat the previous command, but only output the lines that you change.
11. If you wanted to actually change the original file for questions #3, 4, 6, 7, and 9, how would you do it?
Answers:
9. sed '/student/s/$/!!!/' u3
10. sed -n '/student/s/$/!!!/p' u3
11. Save the output of the sed command in a temporary file and then use the mv command to rename it to the original. Never redirect output to the same file you are using for input within the same command or pipeline! Example (#9):
sed '/student/s/$/!!!/' u3 > xxx # <-- the shell overwrites xxx BEFORE it starts sed
mv xxx u3
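
The temporary-file approach can be sketched as a short script. The directory, the sample contents of u3 and the name u3.tmp are assumptions made for the demonstration; chaining with && is an added precaution so that the original is replaced only when sed succeeds.

```shell
#!/bin/sh
# Sketch: change a file "in place" by writing to a temporary file first.
workdir=$(mktemp -d)
printf 'good student\nteacher\n' > "$workdir/u3"   # hypothetical contents

# Write the edited text to u3.tmp, then rename it over the original.
sed '/student/s/$/!!!/' "$workdir/u3" > "$workdir/u3.tmp" &&
    mv "$workdir/u3.tmp" "$workdir/u3"

updated=$(cat "$workdir/u3")
printf '%s\n' "$updated"

rm -r "$workdir"
```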
6. Change all occurrences of cow to cows and cows using the parenthesis operators and \1 substitution
Answer:
sed 's/\(cow\)/\1s and \1s/g' u3

Using the file u3, do the following using sed, displaying the result on the screen.
1. Output only the lines that contain MCIS
Answer:
sed -n '/MCIS/p' u3
2. Delete any line that contains mcis
Answer:
sed '/mcis/d' u3
3. Change the first instance of * on each line to !
Answer:
sed 's/\*/!/' u3
4. Change all occurrences of * on each line to !
Answer:
sed 's/\*/!/g' u3

5. Output only the lines that contain either MCIS or VLSI
Answer:
sed -n -e '/MCIS/p' -e '/VLSI/p' u3
6. Output the file after changing mcis to MCIS on lines 10-20
Answer:
sed '10,20s/mcis/MCIS/g' u3
7. Output the entire file except lines 1-20
Answer:
sed '1,20d' u3

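11. Write a sed script that will take two words and a file name as input from the user.
Let the inputs be word1, word2, and filename. Write a script to insert word2 at every place word1 is present in the file.
Answer:
#!/bin/sh
# Prompt for the two strings and the file name (printf is used instead of
# the non-portable echo -n)
printf 'Enter the string to which the new string is to be appended: '
read -r string1
printf 'Enter the string which is used to append: '
read -r string2
printf 'Enter the filename: '
read -r filename
# Assumed completion (the source stops after reading the inputs):
# append string2 after every occurrence of string1. Both strings are used
# as-is, so they must not contain / or other characters special to sed.
sed "s/$string1/&$string2/g" "$filename"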

Print only lines of 65 characters or longer
sed -n '/^.\{65\}/p'
Print only lines of less than 65 characters
sed -n '/^.\{65\}/!p' # method 1, corresponds to above

Print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2

Print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
Print all of file EXCEPT section between 2 regular expressions
sed '/Iowa/,/Montana/d'

The q or quit command

There is one more simple command that can restrict the changes to a set of lines. It is the "q" command: quit. The third way to duplicate the head command is:
sed '10 q'
which prints the tenth line and then quits.
This command is most useful when you wish to abort the editing after some condition is reached.
The "q" command is the one command that does not take a range of addresses.
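
A short sketch of q with a line-number address and with a regex address follows. The inputs are invented; note that the line that triggers q is still printed before sed exits.

```shell
#!/bin/sh
# Sketch: q stops processing as soon as its single address matches.

# Quit after the third line, like "head -3".
first_three=$(printf '%s\n' 1 2 3 4 5 | sed '3q')

# Quit at the first line containing STOP; later lines are never read.
until_match=$(printf 'a\nb\nSTOP\nc\n' | sed '/STOP/q')

printf '%s\n---\n%s\n' "$first_three" "$until_match"
```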

Relationships between d, p, and !

As you may have noticed, there are often several ways to solve the same problem with sed. This is because print and delete are opposite functions, and it appears that "!p" is similar to "d," while "!d" is similar to "p." I wanted to test this, so I created a 20 line file, and tried every different combination. The following table, which shows the results, demonstrates the difference:

Relations between d, p, and !
Sed      Range   Command   Results
--------------------------------------------------------
sed -n   1,10    p         Print first 10 lines
sed -n   11,$    !p        Print first 10 lines
sed      1,10    !d        Print first 10 lines
sed      11,$    d         Print first 10 lines
--------------------------------------------------------
sed -n   1,10    !p        Print last 10 lines
sed -n   11,$    p         Print last 10 lines
sed      1,10    d         Print last 10 lines
sed      11,$    !d        Print last 10 lines
--------------------------------------------------------
sed -n   1,10    d         Nothing printed
sed -n   1,10    !d        Nothing printed
sed -n   11,$    d         Nothing printed
sed -n   11,$    !d        Nothing printed
--------------------------------------------------------
sed      1,10    p         Print first 10 lines twice, then next 10 lines once
sed      11,$    !p        Print first 10 lines twice, then last 10 lines once
--------------------------------------------------------
sed      1,10    !p        Print first 10 lines once, then last 10 lines twice
sed      11,$    p         Print first 10 lines once, then last 10 lines twice
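
The experiment can be repeated with a generated 20-line input. The sketch below checks only the "first 10 lines" group and one of the "printed twice" rows; the variable names are made up for the illustration.

```shell
#!/bin/sh
# Sketch: four equivalent ways to print the first 10 of 20 lines.
input=$(i=1; while [ "$i" -le 20 ]; do echo "$i"; i=$((i + 1)); done)

first10=$(printf '%s\n' "$input" | sed -n '1,10p')
via_not_p=$(printf '%s\n' "$input" | sed -n '11,$!p')
via_not_d=$(printf '%s\n' "$input" | sed '1,10!d')
via_d=$(printf '%s\n' "$input" | sed '11,$d')

# Without -n, p prints lines 1-10 a second time: 30 output lines in total.
twice_count=$(printf '%s\n' "$input" | sed '1,10p' | sed -n '$=')

printf '%s\n' "$first10"
printf 'lines printed by 1,10p without -n: %s\n' "$twice_count"
```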

Obviously the command
sed '1,10 q'
cannot quit 10 times. Instead
sed '1 q'
or
sed '10 q'
is correct.

1. Delete lines that contain "O" at the beginning of the line.
Answer:
sed '/^O/d' list.txt

2. Translate capital C, R, O into small c, r, o
Answer:
sed 'y/CRO/cro/' list.txt

3. Delete empty lines
Answer:
sed '/^$/d' list.txt

4. Remove lines containing anything other than letters, numbers, or spaces
Answer:
sed '/[^0-9a-zA-Z ]/d' list.txt

Specifying a Range of Characters with [...]

If you want to match specific characters, you can use the square brackets to identify the exact characters you are searching for.
The pattern that will match any line of text that contains exactly one digit is
^[0123456789]$
This is verbose.
You can use the hyphen between two characters to specify a range:
^[0-9]$

You can intermix explicit characters with character ranges.
This pattern will match a single character that is a letter, number, or underscore:
[A-Za-z0-9_]

If you wanted to search for a word where
1. the word started with a capital letter "T",
2. the word was the first word on a line,
3. the second letter was a lower case letter,
4. and the third letter was a vowel,
the regular expression would be "^T[a-z][aeiou] " (the trailing space ends the word after three letters).

Delete all lines NOT beginning with a, e, E or I
Answer:
sed '/^[^aeEI]/d' list.txt

You can easily search for all characters except those in square brackets by putting a "^" as the first character after the "[". To match all characters except vowels use "[^aeiou]".

Another example of repetition with *: /a*bc[e-g]*[0-9]*/
Matches:
aaaaabcfgh19919234
bc
abcefg123456789
abc45
Aabcggg87310

d*avid
Will match avid, david, ddavid, dddavid and any other word with repeated ds followed by avid

Compress all consecutive sequences of zeroes into a single zero.
Answer:
s/00*/0/g

`a\?b'
Matches `b' or `ab'.

Match any character with .

The character "." is one of those special metacharacters. By itself it will match any character, except the end-of-line character.
The pattern that will match a line with a single character is ^.$
Any character (except a metacharacter!) matches itself.
"F."   Matches an 'F' followed by any character.
"a.b"  Matches 'a' followed by any one character followed by 'b'.

If you really want to match '.', you can use "\."
a\.b matches "a.b" but not "axb"

Matching a specified number of the pattern using the curly brackets \{ \}
Using \{n\}, we match exactly that number of the previous expression.
If we want to match 'aaaa' then we could use: a\{4\}
This would match exactly four a's.
If we want to match the pattern 1999 in our file bazaar.txt, then we would do: sed -n '/19\{3\}/p' bazaar.txt
This should print all lines containing the pattern 1999 in the bazaar.txt file.

The following expression would match a minimum of four a's but a maximum of 10 a's in a particular pattern: a\{4,10\}
Let's say we wanted to match any character a minimum of 3 times, but a maximum of 7 times, then we could use a regular expression like: .\{3,7\}
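
The interval forms can be tried on a few made-up lines. This sketch uses -n with p so that only matching lines are printed, and adds ^ and $ in the second command to show the difference between "contains" and "is exactly".

```shell
#!/bin/sh
# Sketch: interval expressions \{m,n\} and \{n\} in basic regular expressions.

# Lines containing a run of at least four a's ("aaa" is too short).
runs=$(printf 'aaa\naaaa\nbaaaaab\n' | sed -n '/a\{4,10\}/p')

# 19\{3\} is a 1 followed by exactly three 9s; anchoring with ^ and $
# keeps only the line that is exactly 1999.
exact=$(printf '199\n1999\n19999\n' | sed -n '/^19\{3\}$/p')

printf '%s\n---\n%s\n' "$runs" "$exact"
```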

`\{I\}'
As `*', but matches exactly I sequences (I is a decimal
integer;
for portability, keep it between 0 and 255 inclusive).
`\{I,J\}'
Matches between I and J, inclusive, sequences.
`\{I,\}'
Matches more than or equal to I sequences.

`.\{9\}A$'
This matches nine characters followed by an `A' at the end of the line.
`^.\{15\}A'
This matches the start of a string that contains 16 characters, the last of which is an `A'.

`\(REGEXP\)'
Groups the inner REGEXP as a whole, this is used to:
* Apply postfix operators, like `\(abcd\)*': this will search for zero or more whole sequences of `abcd', while `abcd*' would search for `abc' followed by zero or more occurrences of `d'. Note that support for `\(abcd\)*' is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
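
The difference between `\(abcd\)*' and `abcd*' shows up clearly when the matched text is wrapped in brackets with &. The sketch below uses made-up inputs, assumes a sed that supports a star after a group (see the portability note above), and adds one backreference example.

```shell
#!/bin/sh
# Sketch: grouping changes what the star applies to.

# The star repeats the whole group abcd.
grouped=$(printf 'abcdabcdX\n' | sed 's/\(abcd\)*/[&]/')

# Without the group, the star repeats only the final d.
ungrouped=$(printf 'abcdddX\n' | sed 's/abcd*/[&]/')

# Groups are also remembered: \1 and \2 swap the two words.
swapped=$(printf 'hello world\n' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

printf '%s\n%s\n%s\n' "$grouped" "$ungrouped" "$swapped"
```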

`REGEXP1\|REGEXP2'
Matches either REGEXP1 or REGEXP2.
Use parentheses to use complex alternative regular
expressions.
The matching process tries each alternative in turn,
from left to right, and the first one that succeeds is
used.

`N'
Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands.
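
A minimal sketch of N: each pair of input lines is pulled into the pattern space and the embedded newline is replaced by a space. The input deliberately has an even number of lines, since the handling of a final unpaired line differs between sed implementations.

```shell
#!/bin/sh
# Sketch: N appends the next line, so the pattern space holds "a\nb";
# the substitution then turns the embedded newline into a space.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
printf '%s\n' "$joined"
```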

File spacing:
Double space a file
sed G filename
Insert a blank line below every line which matches "regex"
sed '/regex/G'

Count lines (emulates "wc -l")
sed -n '$='
