
awk - Read a file and split the contents

awk is one of the most powerful utilities in the Unix world. When it comes to text parsing, sed and awk can do some remarkable things. In this first article on awk, we will see the basic usage of awk.

The syntax of awk is:

awk 'pattern{action}' file

where the pattern is the condition on which the action is executed for every line that matches it. If the pattern is absent, the action is executed for every line of the file. If the action is absent, the default action of printing the line is performed. Let us see some examples.

Assume a file, say file1, with the following content:

$ cat file1
Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration

This file has 2 fields in it. The first field is the name of a person and the second field is their domain of expertise; the first line is the header record.

1. To print only the names present in the file:

$ awk '{print $1}' file1
Name
Deepak
Neha
Vijay
Guru

The above awk command does not have any pattern or condition. Hence, the action is executed on every line of the file. The action statement reads "print $1". While reading a file, awk splits each line into the fields $1, $2, $3 and so on, so the first column is accessible using $1, the second using $2, etc. Hence the above command prints all the names, which happen to be the first column in the file.

2. Similarly, to print the second column of the file:

$ awk '{print $2}' file1
Domain
Banking
Telecom
Finance
Migration

3. In the first example, the list of names got printed along with the header record. How to omit the header record and get only the names printed?

$ awk 'NR!=1{print $1}' file1
Deepak
Neha
Vijay
Guru

The above awk command uses a special variable NR. NR denotes the line number, ranging from 1 to the actual line count. The condition 'NR!=1' indicates not to execute the action part for the first line of the file, and hence the header record gets skipped.

4. How do we print the entire file contents?

$ awk '{print $0}' file1
Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration

$0 stands for the entire line. Hence, when we do "print $0", the whole line gets printed.

5. How do we get the entire file content printed in another way?

$ awk '1' file1
Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration

The above awk command has only the pattern or condition part, no action part. The '1' in the pattern indicates "true", which means true for every line. As said above, when no action statement is given the default is to print the line, and hence the entire file contents get printed.

Let us now consider a file with a delimiter. The delimiter used here is a comma. A comma-separated file is called a CSV file. Assume the file contents to be:

$ cat file1
Name,Domain,Expertise
Deepak,Banking,MQ Series
Neha,Telecom,Power Builder
Vijay,Finance,CRM Expert
Guru,Migration,Unix

This file contains 3 fields. The new field is the expertise of the respective person.

6. Let us try to print the first column of this CSV file using the same method as mentioned in Point 1.

$ awk '{print $1}' file1
Name,Domain,Expertise
Deepak,Banking,MQ
Neha,Telecom,Power
Vijay,Finance,CRM
Guru,Migration,Unix

The output looks weird, doesn't it? We expected only the first column to get printed, but it printed a little more, and not a consistent amount either. If you look carefully, it printed every line up to the first space. awk, by default, uses white space as the delimiter, which could be a single space, a tab or a series of spaces. Hence the file was split into fields on spaces. Since our requirement now involves a comma-separated file, we need to specify the delimiter.

$ awk -F"," '{print $1}' file1
Name
Deepak
Neha
Vijay
Guru

awk has a command line option "-F" with which we can specify the delimiter. Once the delimiter is specified, awk splits each line on the basis of that delimiter, and hence we got the names by printing the first column $1.

7. awk has a special variable called "FS" which stands for field separator. In place of the command line option "-F", we can also use "FS".
$ awk '{print $1,$3}' FS="," file1
Name Expertise
Deepak MQ Series
Neha Power Builder
Vijay CRM Expert
Guru Unix

8. Similarly, to print the second column:

$ awk -F, '{print $2}' file1
Domain
Banking
Telecom
Finance
Migration

9. To print the first and third columns, i.e., the name and the expertise:

$ awk -F"," '{print $1, $3}' file1
Name Expertise
Deepak MQ Series
Neha Power Builder
Vijay CRM Expert
Guru Unix

10. The output shown above is not easily readable since the third column has more than one word. It would be better if the displayed fields were separated by a delimiter. Let us use a comma to separate the output fields. Also, let us discard the header record.

$ awk -F"," 'NR!=1{print $1,$3}' OFS="," file1
Deepak,MQ Series
Neha,Power Builder
Vijay,CRM Expert
Guru,Unix

OFS is another awk special variable. Just as FS is used to separate the input fields, OFS (output field separator) is used to separate the output fields.

awk - Passing arguments or shell variables to awk

In one of our earlier articles, we saw how to read a file in awk. At times, we may need to pass some arguments to the awk program, or to access a shell variable or an environment variable inside awk. Let us see in this article how to pass and access arguments in awk.

Let us take a sample file with contents, and a variable "x":

$ cat file1
24
12
34
45

$ echo $x
3

Now, say we want to add the shell variable x to every value.

1. awk provides a "-v" option to pass arguments. Using this, we can pass the shell variable to it.

$ awk -v val=$x '{print $0+val}' file1
27
15
37
48

As seen above, the shell variable $x is assigned to the awk variable "val". This variable "val" can be accessed directly in awk.

2. awk provides another way of passing arguments without using -v. Just before specifying the file name, provide the shell variable assignments to awk variables as shown below:

$ awk '{print $0,val}' OFS=, val=$x file1
24,3
12,3
34,3
45,3

3. How to access environment variables in awk? Unlike shell variables, environment variables can be accessed in awk without being passed as above. awk has a special variable ENVIRON which does the needful.

$ echo $x
3
$ export x
$ awk '{print $0,ENVIRON["x"]}' OFS=, file1
24,3
12,3
34,3
45,3

Quoting file content: Sometimes we might have a requirement wherein we have to quote the file contents. Assume you have a file which contains a list of database tables, and for your requirement you need to quote the file contents:

$ cat file
CUSTOMER
BILL
ACCOUNT

4. Pass a variable to awk which contains the single quote. Print the quote, the line, and the quote.

$ awk -v q="'" '{print q $0 q}' file
'CUSTOMER'
'BILL'
'ACCOUNT'

5. Similarly, to double quote the contents, pass the variable within single quotes:

$ awk '{print q $0 q}' q='"' file
"CUSTOMER"
"BILL"
"ACCOUNT"
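The three ways of getting a shell value into awk described in this article (the -v option, an assignment placed before the file name, and ENVIRON) can be compared side by side. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the temporary file stands in for file1, and the shell variable names that hold the results are only illustrative.

```shell
#!/bin/sh
# Sketch: three ways to hand a shell value to awk.
# The temp file stands in for the article's file1; result variable names are made up.
tmp=$(mktemp)
printf '24\n12\n34\n45\n' > "$tmp"

x=3

# 1. -v assigns the awk variable before any input is read.
with_v=$(awk -v val="$x" '{print $0 + val}' "$tmp")

# 2. An assignment placed before the file name is applied when awk
#    reaches that argument.
with_operand=$(awk '{print $0, val}' OFS=, val="$x" "$tmp")

# 3. Exported variables are visible through the ENVIRON array.
export x
with_environ=$(awk '{print $0, ENVIRON["x"]}' OFS=, "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$with_v" "$with_operand" "$with_environ"
rm -f "$tmp"
```

One practical difference between the first two forms: a value passed with -v is already set inside a BEGIN block, whereas an assignment given next to the file name takes effect only when awk gets to that argument, which is after BEGIN has run.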

awk - Match a pattern in a file in Linux


In one of our earlier articles in the awk series, we saw the basic usage of awk or gawk. In this one, we will mainly see how to search for a pattern in a file using awk, either in the entire line or in a specific column. Let us consider a CSV file with the following contents. The data in the file is a kind of expense report. Let us see how to use awk to filter data from the file.



$ cat file
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600

1. To print only the records containing Rent:

$ awk '$0 ~ /Rent/{print}' file
Rent,900

~ is the symbol used for pattern matching. The / / symbols are used to specify the pattern. The above line indicates: if the line ($0) contains (~) the pattern Rent, print the line. The 'print' statement by default prints the entire line. This is effectively a simulation of the grep command using awk.

2. awk, while doing pattern matching, by default matches against the entire line, and hence $0 can be left off as shown below:

$ awk '/Rent/{print}' file
Rent,900

3. Since awk prints the line by default on a true condition, the print statement can also be left off.

$ awk '/Rent/' file
Rent,900

In this example, whenever the line contains Rent, the condition becomes true and the line gets printed.

4. In the above examples, the pattern matching is done on the entire line; however, the pattern we are looking for is only in the first column. This might lead to incorrect results if the file contains the word Rent in other places. To match a pattern only in the first column ($1):

$ awk -F, '$1 ~ /Rent/' file
Rent,900

The -F option in awk is used to specify the delimiter. It is needed here since we are going to work on specific columns, which can be retrieved only when the delimiter is known.

5. The above pattern match will also match if the first column contains "Rents". To match exactly the word "Rent" in the first column:

$ awk -F, '$1=="Rent"' file
Rent,900

6. To print only the 2nd column for all "Medicine" records:

$ awk -F, '$1 == "Medicine"{print $2}' file
200
600

7. To match for patterns "Rent" or "Medicine" in the file:

$ awk '/Rent|Medicine/' file
Medicine,200
Rent,900
Medicine,600
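To see the difference between matching anywhere in the line, matching inside one field, and comparing a field for equality, here is a small sketch. It assumes a POSIX shell with mktemp; the three-column sample rows (including "Rents" and the free-text note column) are made up for the illustration and are not the article's file.

```shell
#!/bin/sh
# Sketch: whole-line regex match vs. field regex match vs. exact comparison.
# The sample data below is hypothetical.
tmp=$(mktemp)
printf 'Medicine,200,pharmacy\nRents,150,storage\nRent,900,flat\nGrocery,500,Rent day shopping\n' > "$tmp"

# Regex against the whole line: any line containing "Rent" anywhere.
line_match=$(awk '/Rent/' "$tmp")

# Regex against the first field only: still a substring match.
field_match=$(awk -F, '$1 ~ /Rent/' "$tmp")

# String equality: only rows whose first field is exactly "Rent".
exact_match=$(awk -F, '$1 == "Rent"' "$tmp")

printf 'line:\n%s\nfield:\n%s\nexact:\n%s\n' "$line_match" "$field_match" "$exact_match"
rm -f "$tmp"
```

The whole-line match picks up the Grocery row because of its note column, the field match still accepts "Rents", and only the equality test narrows the result to the single Rent record.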

8. Similarly, to match for the above pattern only in the first column:

$ awk -F, '$1 ~ /Rent|Medicine/' file
Medicine,200
Rent,900
Medicine,600

9. What if the first column contains the word "Medicines"? The above example will match it as well. In order to match exactly Rent or Medicine:

$ awk -F, '$1 ~ /^Rent$|^Medicine$/' file
Medicine,200
Rent,900
Medicine,600

The ^ symbol indicates the beginning of the string being matched and $ indicates its end; since the match is done against $1, that string is the first column. ^Rent$ matches exactly the word Rent in the first column, and the same goes for the word Medicine.

10. To print the lines which do not contain the pattern Medicine:

$ awk '!/Medicine/' file
Grocery,500
Rent,900
Grocery,800

The ! is used to negate the pattern search.

11. To negate the pattern on the first column alone:

$ awk -F, '$1 !~ /Medicine/' file
Grocery,500
Rent,900
Grocery,800

12. To print all records whose amount is greater than 500:

$ awk -F, '$2>500' file
Rent,900
Grocery,800
Medicine,600

13. To print the Medicine record only if it is the 1st record:

$ awk 'NR==1 && /Medicine/' file
Medicine,200

This is how the logical AND (&&) condition is used in awk. The record is retrieved only if it is the first record (NR==1) and it is a Medicine record.

14. To print all those Medicine records whose amount is greater than 500:

$ awk -F, '/Medicine/ && $2>500' file
Medicine,600

15. To print all the Medicine records and also those records whose amount is greater than 600:

$ awk -F, '/Medicine/ || $2>600' file
Medicine,200
Rent,900
Grocery,800
Medicine,600

This is how the logical OR (||) condition is used in awk.

awk - Join or merge lines on finding a pattern

In one of our earlier articles, we discussed joining all lines in a file and also joining every 2 lines in a file. In this article, we will see how we can join lines based on a pattern, i.e., joining lines on encountering a pattern, using awk or gawk.

Let us assume a file with the following contents. There is a line with START in between. We have to join all the lines following the pattern START.

$ cat file
START
Unix
Linux
START
Solaris
Aix
SCO

1. Join the lines following the pattern START without any delimiter.

$ awk '/START/{if (NR!=1)print "";next}{printf $0}END{print "";}' file
UnixLinux
SolarisAixSCO

Basically, what we are trying to do is: accumulate the lines following a START and print them on encountering the next START. /START/ searches for lines containing the pattern START. The commands within the first {} run only on lines containing the START pattern: a newline is printed, ending the previous group, if the line is not the first line (NR!=1). Without this condition, a blank line would appear at the very beginning of the output, since the file begins with START. The next command prevents the remaining part of the program from being executed for the START lines. The second set of braces {} runs only for the lines not containing START. This part simply prints the line without a terminating newline character (printf). As a result, all the lines after a START appear on the same line. The END block prints a final newline, without which the prompt would appear at the end of the last line of output. Note that printf $0 treats the line itself as a format string, so for input that may contain a % character, printf "%s", $0 is the safer form.

2. Join the lines following the pattern START with space as delimiter.

$ awk '/START/{if (NR!=1)print "";next}{printf "%s ",$0}END{print "";}' file
Unix Linux
Solaris Aix SCO

This is the same as the earlier one except that it uses the format specifier %s in order to accommodate an additional space, which is the delimiter in this case.

3. Join the lines following the pattern START with comma as delimiter.

$ awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}' file
Unix,Linux
Solaris,Aix,SCO

Here, we form a complete line, store it in a variable x, and print x whenever a new pattern starts. The command x=(!x)?$0:x","$0 uses the ternary operator, as in C or Perl. It means: if x is empty, assign the current line ($0) to x; else append a comma and the current line to x. As a result, x contains the lines following the START pattern joined with commas. In the END block, x is printed, since for the last group there is no further START line to trigger the printing.

4. Join the lines following the pattern START with comma as delimiter, including the pattern matching line.

$ awk '/START/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' file
START,Unix,Linux
START,Solaris,Aix,SCO

The difference here is the missing next statement. Because next is not there, the commands in the second set of curly braces apply to the START line as well, and hence it also gets concatenated.

5. Join the lines following the pattern START with comma as delimiter, and also print the pattern matching line. However, the pattern line should not be joined.

$ awk '/START/{if (x)print x;print;x="";next}{x=(!x)?$0:x","$0;}END{ print x;}' file
START
Unix,Linux
START
Solaris,Aix,SCO

In this, instead of making START part of the variable x, the START line is printed. As a result, the START line comes out separately, and the remaining lines get joined.

awk - 10 examples to group data in a CSV or text file

awk is very powerful when it comes to file formatting. In this article, we will discuss some useful grouping features of awk. awk can group data based on a column or field, or on a set of columns. It uses associative arrays for grouping. If you are new to awk, this article will be easier to understand if you first go over the article on how to parse a simple CSV file using awk.

Let us take a sample CSV file with the below contents. The file is a kind of expense report containing items and their prices. As seen, some expense items have multiple entries.

$ cat file
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600

1. To find the total of all numbers in the second column, i.e., the sum of all the prices:

$ awk -F"," '{x+=$2}END{print x}' file
3000

The delimiter (-F) used is a comma since it is a comma-separated file. x+=$2 stands for x=x+$2. As each line is parsed, the second column ($2), which is the price, is added to the variable x. At the end, the variable x contains the sum. This example is the same as the one discussed in the awk example of finding the sum of all numbers in a file.

If your input is a text file that differs only in not having the comma, all you need is one change: remove -F"," from the above command. This is because the default delimiter in awk is whitespace.

2. To find the total sum of a particular group entry alone, i.e., in this case, of "Item1":

$ awk -F, '$1=="Item1"{x+=$2;}END{print x}' file
800

This gives us the total sum of all the items pertaining to "Item1". In the earlier example, no condition was specified since we wanted awk to work on every line or record. In this case, we want awk to work only on the records whose first column ($1) is equal to Item1.

3. If the data to be worked upon is present in a shell variable:

$ VAR="Item1"
$ awk -F, -v inp=$VAR '$1==inp{x+=$2;}END{print x}' file
800

-v is used to pass the shell variable to awk, and the rest is the same as the last one.

4. To find unique values of the first column:

$ awk -F, '{a[$1];}END{for (i in a)print i;}' file
Item1
Item2
Item3

Arrays in awk are associative, which is a very powerful feature. Associative arrays have an index and a corresponding value. Example: a["Jan"]=30 means that in the array a, "Jan" is an index with value 30. In our case here, we use only the index, without values. So the command a[$1] works like this: when the first record is processed, an index "Item1" is stored in the array named a. During the second record, a new index "Item2" is added, during the third "Item3", and so on. During the 4th record, since the "Item2" index is already there, no new index is added, and the same continues.

Once the file is processed completely, control goes to the END block, where we print all the index items. The for loop in awk comes in 2 variants: the first is the C-language kind of for loop, the second is the one used for associative arrays.

for (i in a): this means for every index in the array a. The variable "i" holds the index value. In place of "i", any variable name can be used. Since there are 3 elements in the array, the loop runs 3 times, each time holding the value of an index in "i". By printing "i", we get the index values printed. To understand the for loop better, look at this:

for (i in a)
{
  print i;
}

Note: The order of the output in the above command may vary from system to system. Associative arrays do not store the indexes in sequence and hence the order of the output need not be the same as the order in which they were entered.

5. To find the sum of individual group records, i.e., to sum all records pertaining to Item1 alone, Item2 alone, and so on:

$ awk -F, '{a[$1]+=$2;}END{for(i in a)print i", "a[i];}' file
Item1, 800
Item2, 1300
Item3, 900

a[$1]+=$2 can be written as a[$1]=a[$1]+$2. It works like this: when the first record is processed, a["Item1"] is assigned 200 (a["Item1"]=200). During the second "Item1" record, a["Item1"] becomes 800 (200+600), and so on. In this way, every index in the array is stored with its associated value, which is the sum of the group. In the END block, we print both the index (i) and the value (a[i]), which is nothing but the sum.

6. To find the sum of all entries in the second column and add it as the last record:

$ awk -F"," '{x+=$2;print}END{print "Total,"x}' file
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
Total,3000

This is the same as the first example except that, along with adding the value every time, every record is also printed, and at the end the "Total" record is printed as well.

7. To print the maximum or the biggest record of every group:

$ awk -F, '{if (a[$1] < $2)a[$1]=$2;}END{for(i in a){print i,a[i];}}' OFS=, file
Item1,600
Item2,800
Item3,900

Before storing the value ($2) in the array, the current second column value is compared with the existing value, and it is stored only if the value in the current record is bigger. Finally, the array contains only the maximum value for every group. Finding the smallest element of each group takes slightly more than changing the "less than" (<) symbol to "greater than" (>): an array element that has not been set yet compares as 0, so a[$1] > $2 would never be true for positive prices. The first value of each group has to be stored unconditionally, for example with the condition !($1 in a) || a[$1] > $2.

The syntax for if in awk is similar to the C language syntax:

if (condition)
{
  <code for true condition>
}else{
  <code for false condition>
}
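The maximum and minimum per group can be sketched together as below. This is an illustration under a few assumptions rather than the article's own command: it needs a POSIX shell with mktemp, the array names hi and lo are invented here, and the output is piped through sort only to make the order predictable.

```shell
#!/bin/sh
# Sketch: per-group maximum and minimum using associative arrays.
# Array names (hi, lo) are hypothetical; sort fixes the otherwise
# unspecified order of "for (i in array)".
tmp=$(mktemp)
printf 'Item1,200\nItem2,500\nItem3,900\nItem2,800\nItem1,600\n' > "$tmp"

group_max=$(awk -F, '
    { v = $2 + 0 }                               # force a numeric value
    !($1 in hi) || v > hi[$1] { hi[$1] = v }     # new group, or larger value
    END { for (i in hi) print i, hi[i] }
' OFS=, "$tmp" | sort)

group_min=$(awk -F, '
    { v = $2 + 0 }
    # The "in" test stores the first value of a group unconditionally,
    # so an unset element is never compared as 0.
    !($1 in lo) || v < lo[$1] { lo[$1] = v }
    END { for (i in lo) print i, lo[i] }
' OFS=, "$tmp" | sort)

printf 'max:\n%s\nmin:\n%s\n' "$group_max" "$group_min"
rm -f "$tmp"
```

Testing membership with "in" does not create the array element, which is why it is used here instead of looking at the value of the element.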

8. To find the count of entries against every group:

$ awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' file
Item1 2
Item2 2
Item3 1

a[$1]++ can be written as a[$1]=a[$1]+1. When the first "Item1" record is parsed, a["Item1"]=1, and every time an "Item1" record is encountered this count is incremented; the same follows for the other entries as well. This code simply increments the count by 1 for the respective index on encountering a record. Finally, on printing the array, we get the item entries and their respective counts.

9. To print only the first record of every group:

$ awk -F, '!a[$1]++' file
Item1,200
Item2,500
Item3,900

This one is a little tricky. In this awk command, there is only a condition, no action statement. As a result, if the condition is true, the current record gets printed by default. !a[$1]++ : when the first record of a group is encountered, a[$1] evaluates to 0 since ++ is postfix, and not (!) of 0 is 1, which is true, and hence the first record gets printed. When the second record of "Item1" is parsed, a[$1] is 1 (it becomes 2 after the command since the increment is postfix). Not (!) of 1 is 0, which is false, and the record does not get printed. In this way, only the first record of every group gets printed. Simply by removing the '!' operator, the above command prints all records other than the first record of each group.

10. To join or concatenate the values of all group items. Join the values of the second column with a colon separator:

$ awk -F, '{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS=, file
Item1,200:600
Item2,500:800
Item3,900

This if condition is pretty simple: if there is some value in a[$1], then append the current value using a colon delimiter; else just assign it to a[$1], since this is the first value. To make the above if block clear, let me put it this way: "if (a[$1])" means "if a[$1] has some value".

if(a[$1])
  a[$1]=a[$1]":"$2;
else
  a[$1]=$2

The same can be achieved using the awk ternary operator as well, which is the same as in the C language.

$ awk -F, '{a[$1]=a[$1]?a[$1]":"$2:$2;}END{for (i in a)print i, a[i];}' OFS=, file
Item1,200:600
Item2,500:800
Item3,900

Ternary operator is a short form of the if-else condition. An example of the ternary operator is: x=x>10?"Yes":"No", which means: if x is greater than 10, assign "Yes" to x, else assign "No". In the same way, a[$1]=a[$1]?a[$1]":"$2:$2 means: if a[$1] has some value, assign a[$1]":"$2 to a[$1], else simply assign $2 to a[$1].

Concatenate variables in awk: One more thing to notice is the way string concatenation is done in awk. To concatenate 2 variables in awk, use a space in between. Examples:

z=x y #to concatenate x and y
z=x":"y #to concatenate x and y with a colon separator.

awk - 10 examples to split a file into multiple files

In this article of the awk series, we will see the different scenarios in which we need to split a file into multiple files using awk. A file can be split into multiple files based on a condition, based on a pattern, or because the file is big and hence needs to be split into smaller files.

Sample File1: Let us consider a sample file with the following contents:

$ cat file1
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600

1. Split the file into 3 different files, one for each item, i.e., all records pertaining to Item1 into one file, records of Item2 into another, etc.

$ awk -F, '{print > $1}' file1

The files generated by the above command are as below:

$ cat Item1
Item1,200
Item1,600
$ cat Item3
Item3,900
$ cat Item2
Item2,500
Item2,800

This looks so simple, right? print prints the entire line, and the line is written to a file whose name is $1, the first field. This means the first record gets written to a file named 'Item1', the second record to 'Item2', the third to 'Item3', the 4th to 'Item2', and so on.

2. Split the file, giving the new file names an extension of .txt.

$ awk -F, '{print > $1".txt"}' file1

The only change here from the above is concatenating the string ".txt" to the $1 which is the first field. As a result, we get the extension to the file names. The files created are below:

$ ls *.txt
Item1.txt Item2.txt Item3.txt
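The split-by-first-field idea can be tried in isolation with the sketch below. It assumes a POSIX shell with mktemp -d and works inside a scratch directory; the variable name out is invented here. Building the file name in a variable first is a deliberate choice: it sidesteps the question of how a given awk parses an unparenthesized concatenation after the > redirection.

```shell
#!/bin/sh
# Sketch: write each record to a file named after its first field.
# Runs in a scratch directory so that no existing files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'Item1,200\nItem2,500\nItem3,900\nItem2,800\nItem1,600\n' > file1

awk -F, '
{
    out = $1 ".txt"      # build the target name once ("out" is a made-up name)
    print > out          # ">" truncates on first use, then keeps appending
}
' file1

item1=$(cat Item1.txt)
item2=$(cat Item2.txt)
item3=$(cat Item3.txt)
printf 'Item1.txt:\n%s\nItem2.txt:\n%s\nItem3.txt:\n%s\n' "$item1" "$item2" "$item3"
cd / && rm -rf "$dir"
```

With a large number of distinct keys, some awk implementations can run out of open files; closing each file with close(out) after writing and appending with >> is one common way around that.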

3. Split the file so that the individual files contain only the value (the second field), i.e., only the 2nd field in the new files without the 1st field:

$ awk -F, '{print $2 > $1".txt"}' file1

The print command prints the entire record. Since we want only the second field to go to the output files, we do: print $2.

$ cat Item1.txt
200
600

4. Split the file so that all the items whose value is greater than 500 are in the file "500G.txt", and the rest in the file "500L.txt".

$ awk -F, '{if($2<=500)print > "500L.txt";else print > "500G.txt"}' file1

The output files created will be as below:

$ cat 500L.txt
Item1,200
Item2,500
$ cat 500G.txt
Item3,900
Item2,800
Item1,600

Check the second field ($2). If it is less than or equal to 500, the record goes to "500L.txt", else to "500G.txt". Another way to achieve the same thing is to use the ternary operator in awk:

$ awk -F, '{x=($2<=500)?"500L.txt":"500G.txt"; print > x}' file1

The condition for greater or lesser than 500 is checked and the appropriate file name is assigned to the variable x. The record is then written to the file named in x.

Sample File2: Let us consider another file with a different set of contents. This file has a pattern 'START' at frequent intervals.

$ cat file2
START
Unix
Linux
START
Solaris
Aix
SCO

5. Split the file into multiple files at every occurrence of the pattern START.

$ awk '/START/{x="F"++i;}{print > x;}' file2

This command contains 2 sets of curly braces. Control goes to the first set of braces only on encountering a line containing the pattern START. The second set is executed for every line, since it has no condition and is hence always true. On encountering the pattern START, a new file name is created and stored. When the first START comes, x contains "F1"; control then goes to the next set of braces and the record is written to F1, and the subsequent records go to the file "F1" till the next START comes. On encountering the next START, x contains "F2" and the subsequent lines go to "F2" till the next START, and so on.

$ cat F1

START
Unix
Linux
$ cat F2
START
Solaris
Aix
SCO

6. Split the file into multiple files at every occurrence of the pattern START. But the line containing the pattern should not be in the new files.

$ awk '/START/{x="F"++i;next}{print > x;}' file2

The only difference from the above is the inclusion of the next command. Due to the next command, a line containing START enters the first curly braces and then awk immediately starts reading the next line. As a result, the START lines never reach the second curly braces and hence START does not appear in the split files.

$ cat F1
Unix
Linux
$ cat F2
Solaris
Aix
SCO

7. Split the file by inserting a header record in every new file.

$ awk '/START/{x="F"++i;print "ANY HEADER" > x;next}{print > x;}' file2

The change here from the earlier one is this: before the next command, we write the header record into the file. This is the right place to write the header record since this is where the file is first created.

$ cat F1
ANY HEADER
Unix
Linux
$ cat F2
ANY HEADER
Solaris
Aix
SCO

Sample File3: Let us consider a file with the sample contents:

$ cat file3
Unix
Linux
Solaris
AIX
SCO

8. Split the file into multiple files at every 3rd line, i.e., the first 3 lines into F1, the next 3 lines into F2 and so on.

$ awk 'NR%3==1{x="F"++i;}{print > x}' file3

In other words, this is nothing but splitting the file into equal parts. The condition does the trick here: NR%3==1. NR is the line number of the current record. NR%3 is equal to 1 for every 3rd line starting from the first, i.e., the 1st, 4th, 7th and so on. At each of these lines, the file name in the variable x is changed, and hence the records are written to the appropriate files.

$ cat F1
Unix
Linux
Solaris
$ cat F2
AIX
SCO
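The fixed-size split can be generalised to any chunk size, as in the sketch below. The chunk size variable n and the variable name out are assumptions introduced here, and a POSIX shell with mktemp -d is assumed. The condition is written as (NR - 1) % n == 0 rather than NR % n == 1 so that it also behaves for n=1.

```shell
#!/bin/sh
# Sketch: split a file into chunks of n lines named F1, F2, ...
# n and out are hypothetical names; the "F" prefix mirrors the example.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'Unix\nLinux\nSolaris\nAIX\nSCO\n' > file3

awk -v n=3 '
(NR - 1) % n == 0 {          # first line of every chunk
    if (out) close(out)      # release the previous chunk file
    out = "F" (++i)
}
{ print > out }
' file3

f1=$(cat F1)
f2=$(cat F2)
printf 'F1:\n%s\nF2:\n%s\n' "$f1" "$f2"
cd / && rm -rf "$dir"
```

Closing the previous chunk before opening the next one keeps the number of simultaneously open files at one, which matters when a large input produces many chunks.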

Sample File4: Let us update the above file with a header and trailer:

$ cat file4
HEADER
Unix
Linux
Solaris
AIX
SCO
TRAILER

9. Split the file at every 3rd line without the header and trailer in the new files.

$ sed '1d;$d;' file4 | awk 'NR%3==1{x="F"++i;}{print > x}'

The earlier command does the work for us; the only thing needed is to pass the file to it without the header and trailer. sed does that for us: '1d' deletes the 1st line, '$d' deletes the last line.

$ cat F1
Unix
Linux
Solaris
$ cat F2
AIX
SCO

10. Split the file at every 3rd line, retaining the header and trailer in every file.

$ awk 'BEGIN{getline f;}NR%3==2{x="F"++i;a[i]=x;print f>x;}{print > x}END{for(j=1;j<i;j++)print> a[j];}' file4

This one is a little tricky. Before the file is processed, the first line is read using getline into the variable f. NR%3 is compared with 2 instead of 1 as in the earlier case: since the first line is a header, we need to split the files at the 2nd, 5th, 8th lines, and so on. All the file names are stored in the array "a" for later processing. Without the END block, all the files would have the header record, but only the last file would have the trailer record. So, the END block writes the trailer record to all the files other than the last file. The print in the END block outputs $0, which in POSIX awk still holds the last record read, i.e., the trailer.

$ cat F1
HEADER
Unix
Linux
Solaris
TRAILER
$ cat F2
HEADER
AIX
SCO
TRAILER
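The header-and-trailer split can also be written without getline and without relying on $0 inside END. The following is a sketch under assumptions, not the article's command: the variable names header, last, name and out are invented here, the chunk size is passed in as n, and a POSIX shell with mktemp -d is assumed.

```shell
#!/bin/sh
# Sketch: split at every n-th data line, copying the header and the
# trailer into every chunk. All awk variable names are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'HEADER\nUnix\nLinux\nSolaris\nAIX\nSCO\nTRAILER\n' > file4

awk -v n=3 '
NR == 1 { header = $0; next }      # remember the header, do not copy it yet
(NR - 2) % n == 0 {                # lines 2, 5, 8, ... start a new chunk
    out = "F" (++i)
    name[i] = out
    print header > out
}
{ print > out; last = $0 }         # "last" ends up holding the trailer
END {
    # Every chunk except the final one is still missing the trailer.
    for (j = 1; j < i; j++) print last > name[j]
}
' file4

f1=$(cat F1)
f2=$(cat F2)
printf 'F1:\n%s\nF2:\n%s\n' "$f1" "$f2"
cd / && rm -rf "$dir"
```

As with the one-liner, a trailer that happens to fall on a chunk boundary (for example, six data lines with n=3) starts a chunk of its own that holds only the header and the trailer, so the input size is worth checking before relying on this.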

awk - 10 examples to read files with multiple delimiters

In this article of the awk series, we will see how to use awk to read or parse text or CSV files containing multiple delimiters or repeating delimiters. We will also discuss some peculiar delimiters and how to handle them using awk.

Let us consider a sample file. This colon-separated file contains an item, a purchase year and a set of prices separated by semicolons.

$ cat file
Item1:2010:10;20;30
Item2:2012:12;29;19
Item3:2014:15;50;61

1. To print the 3rd column which contains the prices:

$ awk -F: '{print $3}' file
10;20;30
12;29;19
15;50;61

This is straightforward. By specifying colon (:) with the -F option, the 3rd column can be retrieved using the $3 variable.

2. To print the 2nd component of the price column alone:

$ awk -F '[:;]' '{print $4}' file
20
29
50

What did we do here? We specified multiple delimiters: one is : and the other is ; . How does awk parse the file? It is simple. First, it looks at the delimiters, which are colon (:) and semicolon (;). While reading the line, as and when the delimiter : or ; is encountered, the part read so far is stored in $1. It continues further; on encountering one of the delimiters again, the part read is stored in $2. And this continues till the end of the line is reached. In this way, the first price component ends up in $3, and $4 contains the second price component printed above.

Note: Always keep in mind that multiple single-character delimiters have to be specified inside square brackets ( [;:] ).

3. To sum the individual components of the 3rd column and print it:

$ awk -F '[;:]' '{$3=$3+$4+$5;print $1,$2,$3}' OFS=: file
Item1:2010:60
Item2:2012:60
Item3:2014:126

The individual components of the price column are available in $3, $4 and $5. Simply sum them up, store the result in $3, and print the required fields. OFS (output field separator) is used to specify the delimiter while printing the output. Note: If we do not use OFS, awk prints the fields using the default output delimiter, which is a space.

4. Un-group or re-group every record depending on the price column:

$ awk -F '[;:]' '{for(i=3;i<=5;i++){print $1,$2,$i;}}' OFS=":" file
Item1:2010:10
Item1:2010:20
Item1:2010:30
Item2:2012:12
Item2:2012:29
Item2:2012:19
Item3:2014:15
Item3:2014:50
Item3:2014:61

The requirement here is: new records have to be created for every component of the price column. Simply, a loop is run over columns 3 to 5, and each time a record is formed using that price component.

5-6. Read a file in which the delimiters are square brackets:

7-8. Read or parse a file containing a series of delimiters:

$ cat file 123;;;202;;;203 124;;;213;;;203 125;;;222;;;203

The above file contains a series of 3 semi-colons between every 2 values. 7. Using the multiple delimiter method:
$ awk -F'[;;;]' '{print $2}' file

$ cat file 123;abc[202];124 125;abc[203];124 127;abc[204];124

5. To print the value present within the brackets:
$ awk -F '[][]' '{print $2}' file 202 203 204

At first sight, the delimiter used in the above command might be confusing. It's simple. 2 delimiters are to be used in this case: one is [ and the other is ]. Since the delimiters themselves are square brackets, which have to be placed within square brackets, it looks tricky at first. Note: If square brackets are delimiters, they should be put in this way only, meaning first ] followed by [. Using the delimiter like -F '[[]]' will give a different interpretation altogether. 6. To print the first value, the value within brackets, and the last value:

Blank output !!! The above delimiter, though specified as 3 semi-colons, is as good as one delimiter, which is a semi-colon (;), since they are all the same. Due to this, $2 will be the value between the first and the second semi-colon, which in our case is blank, and hence no output. 8. Using the delimiter without square brackets:

$ awk -F';;;' '{print $2}' file 202 213

$ awk -F '[][;]' '{print $1,$3,$5}' OFS=";" file 123;202;124 125;203;124 127;204;124

222

The expected output !!! No square brackets are used and we got the output which we wanted. Difference between using square brackets and not using them: When a set of delimiters is specified using square brackets, it means an OR condition of the delimiters. For example, -F '[;:]' means to separate the contents either on encountering ':' or ';'. However, when a set of delimiters is specified without using square

3 delimiters are used in this case with semi-colon also included.

brackets, awk looks at them literally to separate the contents. For example, -F ':;' means to separate the contents only on encountering a colon followed by a semi-colon. Hence, in the last example, the file contents are separated only when a set of 3 continuous semicolons are encountered. 9. Read or parse a file containing a series of delimiters of varying lengths: In the below file, the 1st and 2nd column are separated using 3 semi-colons, however the 2nd and 3rd are separated by 4 semi-colons

124 203 125 203

In this case, we use the word "Unix" as the delimiter, and hence $1 and $2 contain the appropriate values. Keep in mind that it is not just special characters which can be used as delimiters; letters and whole words can be used as well. P.S: We will discuss the awk split function and how to use it on these types of multiple delimited files.

$ cat file 123;;;202;;;;203 124;;;213;;;;203 125;;;222;;;;203
$ awk -F';+' '{print $2,$3}' file 202 203 213 203 222 203
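The three delimiter styles discussed in this article (a bracket expression, a literal multi-character string, and a regular expression with '+') can be compared side by side on a single record. This is only a small sketch; the variable name `line` is illustrative:

```shell
line='123;;;202;;;;203'

# Bracket expression: every single ';' is a separator, so $2 is empty.
printf '%s\n' "$line" | awk -F'[;]' '{ print "[" $2 "]" }'

# Literal string: only three consecutive semi-colons separate fields.
printf '%s\n' "$line" | awk -F';;;' '{ print $2 }'

# Regular expression ";+": any run of one or more semi-colons separates fields.
printf '%s\n' "$line" | awk -F';+' '{ print $2, $3 }'
```

The first command prints `[]`, the second prints `202`, and the third prints `202 203`.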

awk - Passing awk variables to shell


In one of our earlier articles, we discussed how to access or pass shell variables to awk. In this article, we will see how to access awk variables in the shell, or in other words, how to access awk variables as shell variables. Let us see the different ways in which we can achieve this. Let us consider a file with the sample contents as below:
$ cat file Linux 20 Solaris 30 HPUX 40

1. Access the value of the entry "Solaris" in a shell variable, say x:


$ x=`awk '/Solaris/{a=$2;print a}' file` $ echo $x 30

The '+' is a regular expression quantifier. It indicates one or more occurrences of the preceding character. So ;+ matches one or more semi-colons, and hence both the 3 semi-colons and the 4 semi-colons get matched. 10. Using a word as a delimiter:

This approach is fine as long as we want to access only one value. What if we have to access multiple values in shell? 2. Access the value of "Solaris" in x, and "Linux" in y:
$ z=`awk '{if($1=="Solaris")print "x="$2;if($1=="Linux")print "y="$2}' file` $ echo "$z" y=20 x=30 $ eval $z $ echo $x 30 $ echo $y 20

$ cat file 123Unix203 124Unix203 125Unix203

Retrieve the numbers before and after the word "Unix" :

awk prints the assignments "x=30" and "y=20", and this output is collected in the shell variable "z". The eval command evaluates the variable, meaning it executes the commands present in the variable. As a result, "x=30" and "y=20" get executed, and x and y become shell variables with the appropriate values. 3. Same using the sourcing method:
$ awk '{if($1=="Solaris")print "x="$2;if($1=="Linux")print "y="$2}' file > f1 $ source f1 $ echo $x

$ awk -F'Unix' '{print $1, $2}' file 123 203

30 $ echo $y 20

4. To insert a column before the 2nd last column


$ awk -F, '{$(NF-1)=++i FS $(NF-1);}1' OFS=, file Unix,1,10,A Linux,2,30,B Solaris,3,40,C Fedora,4,20,D Ubuntu,5,50,E

Here, instead of collecting the output of awk command in a variable, it is re-directed to a temporary file. The file is then sourced or in other words executed in the same shell. As a result, "x" and "y" become shell variables. Note: Depending on the shell being used, the appropriate way of sourcing has to be done. The "source" command is used here since the default shell is bash.
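The eval approach described in this article can be tried without creating any file. The sketch below feeds illustrative data through a pipe; the variable names `data` and `assignments` are assumptions made for the example. Keep in mind that eval runs whatever awk prints, so this is only reasonable when the input is trusted:

```shell
# Illustrative data, same shape as the sample file used in this article.
data=$(printf '%s\n' 'Linux 20' 'Solaris 30' 'HPUX 40')

# awk prints shell assignments, one per matching line.
assignments=$(printf '%s\n' "$data" | awk '
    $1 == "Solaris" { print "x=" $2 }
    $1 == "Linux"   { print "y=" $2 }
')

# eval executes the printed assignments in the current shell.
eval "$assignments"

echo "x=$x y=$y"
```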

$(NF-1) points to the 2nd last column. Hence, concatenating the serial number to the beginning of $(NF-1) ends up inserting a column before the 2nd last column. 5. Update 2nd column by adding 10 to the variable:
$ awk -F, '{$2+=10;}1' OFS=, file Unix,20,A Linux,40,B Solaris,50,C Fedora,30,D Ubuntu,60,E

awk - 10 examples to insert / remove / update fields of a CSV file


How to manipulate a text / CSV file using awk/gawk? How to insert/add a column between columns, remove columns, or to update a particular column? Let us discuss in this article. Consider a CSV file with the following contents:
$ cat file Unix,10,A Linux,30,B Solaris,40,C Fedora,20,D Ubuntu,50,E

$2 is incremented by 10. 6. Convert a specific column (1st column) to uppercase in the CSV file:
$ awk -F, '{$1=toupper($1)}1' OFS=, file UNIX,10,A LINUX,30,B SOLARIS,40,C FEDORA,20,D UBUNTU,50,E

1. To insert a new column (say serial number) before the 1st column
$ awk -F, '{$1=++i FS $1;}1' OFS=, file 1,Unix,10,A 2,Linux,30,B 3,Solaris,40,C 4,Fedora,20,D 5,Ubuntu,50,E

Using the toupper function of the awk, the 1st column is converted from lowercase to uppercase. 7. Extract only first 3 characters of a specific column(1st column):
$ awk -F, '{$1=substr($1,1,3)}1' OFS=, file Uni,10,A Lin,30,B Sol,40,C Fed,20,D Ubu,50,E

$1=++i FS $1 => Space is used to concatenate columns in awk. This expression concatenates a new field(++i) with the 1st field along with the delimiter(FS), and assigns it back to the 1st field($1). FS contains the file delimiter. 2. To insert a new column after the last column
$ awk -F, '{$(NF+1)=++i;}1' OFS=, file Unix,10,A,1 Linux,30,B,2 Solaris,40,C,3 Fedora,20,D,4 Ubuntu,50,E,5

Using the substr function of awk, a substring of only the first few characters can be retrieved. 8. Empty the value in the 2nd column:
$ awk -F, '{$2="";}1' OFS=, file Unix,,A Linux,,B Solaris,,C Fedora,,D Ubuntu,,E

$NF indicates the value of the last column. Hence, by assigning something to $(NF+1), a new field is inserted at the end automatically. 3. Add 2 columns after the last column:
$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file Unix,10,A,1,X Linux,30,B,2,X Solaris,40,C,3,X Fedora,20,D,4,X Ubuntu,50,E,5,X

Set the variable of 2nd column($2) to blank(""). Now, when the line is printed, $2 will be blank. 9. Remove/Delete the 2nd column from the CSV file:
$ awk -F, '{for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=2 file Unix,A Linux,B Solaris,C Fedora,D Ubuntu,E

The explanation given for the above 2 examples holds good here.

By just emptying a particular column, the column stays as is with an empty value. To remove a column, all the subsequent columns from that position need to be moved one position ahead. The for loop runs over all the fields. Using the ternary operator, every column except the 2nd is concatenated to the variable "f", using FS as the delimiter. At the end, the variable "f", which contains the updated record, is printed. The column to be removed is passed through the awk variable "x", and hence just by setting the appropriate number in x, any specific column can be removed. 10. Join 3rd column with 2nd column using ':' and remove the 3rd column:
$ awk -F, '{$2=$2":"$x;for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=3 file Unix,10:A Linux,30:B Solaris,40:C Fedora,20:D Ubuntu,50:E

%M for minutes and %S for seconds. In this way, strftime converts Unix time into a date string. 2. Display current date time using strftime without systime:
$ echo | awk '{print strftime("%d-%m-%y %H-%M-%S");}' 14-01-13 12-38-08

Both the arguments of strftime are optional. When the timestamp is not provided, it takes the systime by default. 3. strftime with no arguments:
$ echo | awk '{print strftime();}' Mon Jan 14 12:30:05 IST 2013

strftime without the format specifiers provides the output in the same default format as the Unix date command. mktime: The mktime function converts a given date time string into Unix time, the same format that systime returns. Syntax: mktime(date time string) # where date time string is a string which contains at least 6 components in the following order: YYYY MM DD HH MM SS 1. Printing timestamp for a specific date time :
$ echo | awk '{print mktime("2012 12 21 0 0 0");}' 1356028200

Almost the same as the last example, except that the 3rd column ($3) is first concatenated with the 2nd column ($2) and then removed.
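One caveat with the `f=f?f FS $i:$i` idiom used for removing a column: it tests the accumulator for truth, so a record whose first kept field is empty or 0 can lose that field. A minimal sketch that tracks the separator explicitly instead is shown below; the helper name `drop_col` and the sample records are hypothetical:

```shell
# Hypothetical helper: drop_col N removes the Nth comma-separated column from stdin.
drop_col() {
    awk -F, -v x="$1" '{
        line = ""
        sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == x) continue      # skip the unwanted column
            line = line sep $i        # sep is empty only before the first kept field
            sep = FS
        }
        print line
    }'
}

# The second record starts with 0, which the truth-test idiom could mishandle.
printf '%s\n' 'Unix,10,A' '0,30,B' | drop_col 2
```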

gawk - Date and time calculation functions


gawk has 3 functions to calculate date and time:

systime strftime mktime

This gives the Unix time for the date 21-Dec-12. 2. Using strftime with mktime:
$ echo | awk '{print strftime("%d-%m-%Y",mktime("2012 12 21 0 0 0"));}' 21-12-2012

Let us see in this article how to use these functions: systime: This function is equivalent to the Unix date (date +%s) command. It gives the Unix time, total number of seconds elapsed since the epoch(01-01-1970 00:00:00).
$ echo | awk '{print systime();}' 1358146640

The output of mktime can be validated by formatting the mktime output using the strftime function as above. 3. Negative date in mktime:
$ echo | awk '{print strftime("%d-%m-%Y",mktime("2012 12 -1 0 0 0"));}' 29-11-2012

Note: systime function does not take any arguments. strftime: A very common function used in gawk to format the systime into a calendar format. Using this function, from the systime, the year, month, date, hours, mins and seconds can be separated. Syntax: strftime (<format specifiers>,unix time); 1. Printing current date time using strftime:
$ echo | awk '{print strftime("%d-%m-%y %H-%M-%S",systime());}' 14-01-13 12-37-45

mktime can take negative values as well. Out-of-range values are normalized: day 0 of December is the last day of November (the 30th), so -1 in the date position is one day before that, which in this case leads to 29th Nov 2012. 4. Negative hour value in mktime:
$ echo | awk '{print strftime("%d-%m-%Y %H-%M-%S",mktime("2012 12 3 -2 0 0"));}' 02-12-2012 22-00-00

-2 in the hours position indicates 2 hours before the specified date time which in this case leads to "2-Dec-2012 22" hours.

strftime takes format specifiers which are same as the format specifiers available with the date command. %d for date, %m for month number (1 to 12), %y for the 2 digit year number, %H for the hour in 24 hour format,

gawk - Calculate date / time difference between timestamps

How to find the time difference between timestamps using gawk? Let us consider a file where the 1st column is the Process name, 2nd is the start time of the process, and 3rd column is the end time of the process. The requirement is to find the time consumed by the process which is the difference between the start and the end times. 1. File in which the date and time component are separated by a space:
$ cat file P1,2012 12 4 21 36 48,2012 12 4 22 26 53 P2,2012 12 4 20 36 48,2012 12 4 21 21 23 P3,2012 12 4 18 36 48,2012 12 4 20 12 35

Note: The start and end time has only the date components, no time components Difference in seconds:
$ awk -F, '{gsub(/-/," ",$2);gsub(/-/," ",$3);$2=$2" 0 0 0";$3=$3" 0 0 0";d2=mktime($3);d1=mktime($2);print $1","d2-d1,"secs";}' file P1,172800 secs P2,345600 secs P3,86400 secs

In addition to replacing the '-' with spaces, 0's are appended to the date field since mktime requires the date in the 6-component format. Difference in days:
$ awk -F, '{gsub(/-/," ",$2);gsub(/-/," ",$3);$2=$2" 0 0 0";$3=$3" 0 0 0";d2=mktime($3);d1=mktime($2);print $1","(d2-d1)/86400,"days";}' file P1,2 days P2,4 days P3,1 days

Time difference in seconds:


$ awk -F, '{d2=mktime($3);d1=mktime($2);print $1","d2-d1,"secs";}' file P1,3005 secs P2,2675 secs P3,5747 secs

Using mktime function, the Unix time is calculated for the date time strings, and their difference gives us the time elapsed in seconds. 2. File with the different date format :
$ cat file P1,2012-12-4 21:36:48,2012-12-4 22:26:53 P2,2012-12-4 20:36:48,2012-12-4 21:21:23 P3,2012-12-4 18:36:48,2012-12-4 20:12:35

A day has 86400(24*60*60) seconds, and hence by dividing the duration in seconds by 86400, the duration in days can be obtained
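Dividing by 60 or 86400 gives fractional results such as 50.0833 mins. As a small sketch, assuming a breakdown into days, hours, minutes and seconds is wanted instead, integer division and the modulo operator can be combined. The helper name `fmt_duration` is hypothetical and works in any POSIX awk, since it needs no gawk time functions:

```shell
# Hypothetical helper: print a duration given in seconds as "Nd HH:MM:SS".
fmt_duration() {
    awk -v s="$1" 'BEGIN {
        d = int(s / 86400); s %= 86400      # whole days
        h = int(s / 3600);  s %= 3600       # whole hours
        m = int(s / 60);    s %= 60         # whole minutes, s keeps the leftover seconds
        printf "%dd %02d:%02d:%02d\n", d, h, m, s
    }'
}

fmt_duration 3005      # the 3005 secs of process P1
fmt_duration 172800    # the 172800 secs of the date-only example
```

The two calls print `0d 00:50:05` and `2d 00:00:00` respectively.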

sed - Include or Append a line to a file


sed is one of the most important editors we use in UNIX. It supports a lot of file editing tasks. In this article, we will see a specific set of sed options. Assume I have a flat file, empFile, containing employee name and employee id as shown below:

Note: This file has the start time and end time in different formats Difference in seconds:
$ awk -F, '{gsub(/[-:]/," ",$2);gsub(/[-:]/," ",$3);d2=mktime($3);d1=mktime($2);print $1","d2-d1,"secs";}' file P1,3005 secs P2,2675 secs P3,5747 secs

Hilesh, 1001 Bharti, 1002 Aparna, 1003 Harshal, 1004 Keyur, 1005

Using gsub function, the '-' and ':' are replaced with a space. This is done because the mktime function arguments should be space separated. Difference in minutes:
$ awk -F, '{gsub(/[-:]/," ",$2);gsub(/[-:]/," ",$3);d2=mktime($3);d1=mktime($2);print $1","(d2-d1)/60,"mins";}' file P1,50.0833 mins P2,44.5833 mins P3,95.7833 mins

Dividing the difference in seconds by 60 gives us the difference in minutes. 3. File with only date, without time part:
$ cat file P1,2012-12-4,2012-12-6 P2,2012-12-4,2012-12-8 P3,2012-12-4,2012-12-5

1. How to add a header line say "Employee, EmpId" to this file using sed?

$ sed '1i Employee, EmpId' empFile

Employee, EmpId

Hilesh, 1001 Bharti, 1002 Aparna, 1003 Harshal, 1004 Keyur, 1005

Keyur, 1005

As shown above, the '-i' option edits the file in-place without the need of a temporary file. 2. How to add a line '-------' after the header line or the 1st line?

$ sed -i '1a ---------------' empFile

This command does the following: The number '1' tells sed that the operation is to be done only for the first line. 'i' stands for inserting the following content before the line. So, '1i' means to insert the following content before the first line, and hence we got the header in the output. However, the header appears only in the output; the file itself still holds the old contents. So, if the requirement is to update the original file with this output, the user has to re-direct the output of the sed command to a temporary file and then move it to the original file. GNU sed provides the '-i' option, which edits the file in-place. Let us see the same example using the '-i' option:

$ cat empFile Employee, EmpId --------------- Hilesh, 1001 Bharti, 1002 Aparna, 1003 Harshal, 1004 Keyur, 1005

$ cat empFile Hilesh, 1001 Bharti, 1002 Aparna, 1003 Harshal, 1004 Keyur, 1005 $ sed -i '1i Employee, EmpId' empFile $ cat empFile Employee, EmpId Hilesh, 1001 Bharti, 1002 Aparna, 1003 Harshal, 1004

'1a' is similar to '1i' except that 'i' inserts the content before the line, while 'a' appends the content after the line. Hence, in this case, the '----' line gets included after the 1st line. As you may have guessed, using '2i' would have worked just as well. 3. How to add a trailer line to this file?

$ sed -i '$a ---------------' empFile $ cat empFile Employee, EmpId --------------- Hilesh, 1001 Bharti, 1002 Aparna, 1003 Harshal, 1004 Keyur, 1005

---------------

Say, add the record for the employee 'Aparna' before the employee record of 'Harshal'

To add to the last line of the file, we need to know the total line count of the file to use in the above mentioned methods. However, sed has the '$' symbol which denotes the last line. '$a' tells to include the following content after reading the last line of the file. 4. How to add a record after a particular record? Let us assume the sample file contains only 3 records as shown below:

$ sed -i '/Harshal/i Aparna, 1003' empFile $ cat empFile Employee, EmpId --------------- Hilesh, 1001

Employee, EmpId --------------- Hilesh, 1001 Harshal, 1004 Keyur, 1005 ---------------

Bharti, 1002 Aparna, 1003 Harshal, 1004 Keyur, 1005 ---------------

Now, if I want to insert the record for the employee 'Bharti' after the employee 'Hilesh':

Similarly, /Harshal/i tells to include the following contents before reading the line containing the pattern 'Harshal'. Note: As said above, the '-i' option will only work if the sed is GNU sed. Else the user has to re-direct the output to a temporary file and move it to the original file.
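For a sed without '-i', the temporary-file route mentioned in this article can be sketched as follows. The example works in a scratch directory so that no real file is touched; the variable names `workdir` and `emp` are illustrative. It also uses the POSIX spelling of the insert command, where `i\` is followed by the text on the next line, on the assumption that the one-line form may not be accepted everywhere:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
emp="$workdir/empFile"
printf '%s\n' 'Hilesh, 1001' 'Harshal, 1004' 'Keyur, 1005' > "$emp"

# Write the edited output to a temporary file, then move it over the original.
sed '/Harshal/i\
Aparna, 1003' "$emp" > "$emp.tmp" && mv "$emp.tmp" "$emp"

cat "$emp"
```

The `&&` ensures the original is replaced only if sed succeeded.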

$ sed -i '/Hilesh/a Bharti, 1002' empFile $ cat empFile Employee, EmpId --------------- Hilesh, 1001 Bharti, 1002 Harshal, 1004 Keyur, 1005

sed - Replace or substitute file contents


In one of our earlier articles, we saw how to insert a line or append a line to an existing file using sed. In this article, we will see how we can do data manipulation or substitution in files using sed. Let us consider a sample file, sample1.txt, as shown below:

---------------

apple orange

If you note the above sed command carefully, all we have done is in place of a number, we have used a pattern. /Hilesh/a tells to include the following contents after finding the pattern 'Hilesh', and hence the result. 5. How to add a record before a particular record?

banana pappaya

bAnana

1. To add something to the beginning of a every line in a file, say to add a word Fruit:

pAppaya

$ sed 's/^/Fruit: /' sample1.txt Fruit: apple

Please note that in every line only the first occurrence of 'a' is replaced, not all. The example shown here is for a single character replacement, but the same can easily be done for a word as well. 4. To replace or substitute all occurrences of 'a' with 'A'

Fruit: orange Fruit: banana Fruit: pappaya

$ sed 's/a/A/g' sample1.txt Apple orAnge bAnAnA pAppAyA

The character 's' stands for substitution. What follows 's' is the character, word or regular expression to replace, followed by the character, word or text to replace it with. '/' is used to separate the substitution command 's', the content to replace and the content to replace with. The '^' character anchors the match at the beginning of the line, and hence every line gets the phrase 'Fruit: ' added at its beginning. 2. Similarly, to add something to the end of every line:

5. Replacing the first occurrence or all occurrences is fine. What if we want to replace the second occurrence or the third occurrence, or in other words the nth occurrence? To replace only the 2nd occurrence of a character :

$ sed 's/a/A/2' sample1.txt apple orange banAna pappAya

$ sed 's/$/ Fruit/' sample1.txt apple Fruit orange Fruit banana Fruit pappaya Fruit

The character '$' is used to denote the end of the line. And hence this means, replace the end of the line with 'Fruit' which effectively means to add the word 'Fruit' to the end of the line. 3. To replace or substitute a particular character, say to replace 'a' with 'A'.

Please note above. The 'a' in apple has not changed, and neither has the one in orange, since there is no 2nd occurrence of 'a' in these words. However, the changes have happened appropriately in banana and pappaya. 6. Now, say to replace all occurrences from 2nd occurrence onwards:

$ sed 's/a/A/' sample1.txt Apple orAnge
$ sed 's/a/A/2g' sample1.txt apple orange

banAnA pappAyA
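The occurrence flags covered in this article are easiest to compare on a single word. The sketch below uses only forms defined by POSIX; the combined '2g' form is left out because, as far as I know, combining a number with 'g' is a GNU sed behaviour and may differ elsewhere:

```shell
printf 'banana\n' | sed 's/a/A/'     # first occurrence only  -> bAnana
printf 'banana\n' | sed 's/a/A/2'    # second occurrence only -> banAna
printf 'banana\n' | sed 's/a/A/g'    # every occurrence       -> bAnAnA
```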

the entire line, '&' contains the entire line. This type of matching is really useful when you have a file containing a list of file names and you want to, say, rename them, as we have shown in one of our earlier articles: Rename group of files 10. Using sed, we can also do multiple substitutions. For example, say to replace all 'a' with 'A', and 'p' with 'P':

7. Say, you want to replace 'a' only in a specific line say 3rd line, not in the entire file:

$ sed '3s/a/A/g' sample1.txt apple orange bAnAnA pappaya

$ sed 's/a/A/g; s/p/P/g' sample1.txt APPle orAnge bAnAnA PAPPAyA

'3s' denotes the substitution to be done is only for the 3rd line. 8. To replace or substitute 'a' on a range of lines, say from 1st to 3rd line:

OR This can also be done as:

$ sed -e 's/a/A/g' -e 's/p/P/g' sample1.txt APPle orAnge bAnAnA PAPPAyA

$ sed '1,3s/a/A/g' sample1.txt Apple orAnge bAnAnA pappaya

The option '-e' is used when you have more than one set of substitutions to be done. OR The multiple substitution can also be done as shown below spanning multiple lines:

9. To replace the entire line with something. For example, to replace 'apple' with 'apple is a Fruit'.

$ sed 's/.*/& is a Fruit/' sample1.txt apple is a Fruit orange is a Fruit banana is a Fruit pappaya is a Fruit
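Since '&' stands for whatever the pattern matched, it can appear more than once in the replacement. The sketch below, with made-up file names, builds rename commands from a list of names; it only prints the commands and does not run them:

```shell
# Each input line becomes "mv <name> <name>.bak"; nothing is actually renamed.
printf '%s\n' 'report.txt' 'notes.txt' | sed 's/.*/mv & &.bak/'
```

Reviewing the printed commands first, and piping them to a shell only afterwards, is a cautious way to use this pattern.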

$ sed -e 's/a/A/g' \ > -e 's/p/P/g' sample1.txt APPle orAnge bAnAnA PAPPAyA

The '&' symbol denotes the entire pattern matched. In this case, since we are using '.*' which means matching

sed - Read from a file or write into a file


In this sed article, we will see how to read a file into a sed output, and also how to write a section of a file content to a different file. Let us assume we have 2 files, file1 and file2 with the following content:

2strawberry

r file2 reads the file contents of file2. Since there is no specific number before 'r', it means to read the file contents of file2 for every line of file1. And hence the above output. 2. The above output is not very useful. Say, we want to read the file2 contents after the 1st line of file1:

$ cat file1 1apple 1banana 1mango $ cat file2 2orange 2strawberry $ sed '1r file2' file1 1apple 2orange 2strawberry 1banana 1mango

sed has 2 options for reading and writing:

'1r' indicates to read the contents of file2 only after reading the line1 of file1. 3. Similarly, we can also try to read a file contents on finding a pattern:

r filename : To read the contents of the file specified in filename
w filename : To write to the file specified in filename

Let us see some examples now: 1. Read the file2 after every line of file1.

$ sed '/banana/r file2' file1 1apple 1banana

$ sed 'r file2' file1 1apple 2orange 2strawberry 1banana 2orange 2strawberry 1mango 2orange

2orange 2strawberry 1mango

The file2 contents are read on finding the pattern banana and hence the above output. 4. To read a file content on encountering the last line:

$ sed '$r file2' file1 1apple 1banana

1mango 2orange 2strawberry

Note: Even after running the above command, the file1 contents still remain intact. 2. Write the contents from the 3rd line onwards to a different file:

The '$' indicates the last line, and hence the file2 contents are read after the last line. Hey, hold on. The above example is put to show the usage of $ in this scenario. If your requirement is really something like above, you need not use sed. cat file1 file2 will do :) . Let us now move onto the writing part of sed. Consider a file, file1, with the below contents:

$ sed -n '3,$w file2' file1

$ cat file2 mango orange

$ cat file1 apple banana mango orange strawberry

strawberry

As explained earlier, '3,$' indicates from the 3rd line to the end of the file. 3. To write a range of lines, say to write from lines apple through mango :

$ sed -n '/apple/,/mango/w file2' file1

1. Write the lines from 2nd to 4th to a file, say file2.

$ cat file2 apple

$ sed -n '2,4w file2' file1 banana

The command '2,4w' indicates to write the lines from 2 to 4. What is the option "-n" for? By default, sed prints every line it reads, and hence the above command without "-n" will still print the file1 contents on the standard output. In order to suppress this default output, "-n" is used. Let us print the file2 contents to check the above output.

mango
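The 'r' and 'w' commands from this article can be tried together in a scratch directory. This is a minimal sketch; the names `workdir`, `fruits`, `extra` and `picked` are illustrative. The sed scripts are in double quotes here only so that the shell expands `$workdir` inside the file names:

```shell
workdir=$(mktemp -d)
printf '%s\n' apple banana mango orange strawberry > "$workdir/fruits"
printf '%s\n' '-- inserted --' > "$workdir/extra"

# r: print the contents of "extra" after line 1 of "fruits".
sed "1r $workdir/extra" "$workdir/fruits"

# w: copy lines 2 to 4 into "picked"; -n keeps them off standard output.
sed -n "2,4w $workdir/picked" "$workdir/fruits"
cat "$workdir/picked"
```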

sed - Selective printing


In this sed article, we are going to see the different options sed provides to selectively print contents in a file. Let us take a sample file with the following contents:

$ cat file2 banana mango orange $ cat file Gmail 10 Yahoo 20 Redif 18

1. To print the entire file contents:

Redif 18

$ sed '' file Gmail 10 Yahoo 20

The "d" command deletes the line matching the pattern. As said earlier, the default action of sed is to print. Hence, all the other lines got printed, and the line containing the pattern 'Gmail' got deleted since we have specified the "d" command explicitly. Along the same lines, say to delete the first line of the file:

Redif 18

2. To print only the line containing 'Gmail'. In other words, to simulate the grep command:

$ sed '1d' file Yahoo 20 Redif 18

$ sed '/Gmail/p' file Gmail 10 Gmail 10 Yahoo 20 Redif 18

4. Print lines till you encounter a specific pattern, say till 'Yahoo' is encountered.

$ sed '/Yahoo/q' file

Gmail 10 Yahoo 20

Within the slashes, we specify the pattern which we try to match. The 'p' command tells sed to print the line. Look at the above result carefully: the line Gmail got printed twice. Why? This is because the default behavior of sed is to print every line after processing it. On top of that, since we explicitly asked sed to print the line containing the pattern 'Gmail' by specifying 'p', the line 'Gmail' got printed twice. How to get the desired result now?

The "q" command tells to quit from that point onwards. This sed command tells to keep printing(which is default) and stop processing once the pattern "Yahoo" is encountered. Printing Range of Lines: Till now, what we saw is to retrieve a line or a set of lines based on a condition. Now, we will see how to get the same for a given range: Consider the below sample file:

$ sed -n '/Gmail/p' file Gmail 10

The desired result can be obtained by suppressing the default printing, which is done using the option "-n". Hence the above result. 3. To delete the line containing the pattern 'Gmail'. In other words, to simulate the "grep -v" command option in sed:

$ cat file Gmail 10 Yahoo 20 Redif 18

$ sed '/Gmail/d' file

Inbox 15 Live 23

Yahoo 20

Hotml 09

of the file.

$ sed -n '/Redif/,$p' file Redif 18 Inbox 15 Live 23 Hotml 09

5. To print the first 3 lines, or from lines 1 through 3:

$ sed -n '1,3p' file Gmail 10 Yahoo 20 Redif 18

The option "-n" suppresses the default printing. "1,3p" indicates to print from lines 1 to 3. The same can also be achieved through:

The earlier examples were line number ranges and pattern ranges. sed allows us to use both (line number and pattern) in the same command itself. This command indicates to print the lines from pattern "Redif" till the end of the file($). 8. Similarly, to print contents from the beginning of the file till the pattern "Inbox":

$ sed '3q' file Gmail 10 Yahoo 20 Redif 18

3q denotes to quit after reading the 3rd line. Since the "-n" option is not used, the first 3 lines get printed. 6. Similar to give line number ranges, sed can also work on pattern ranges. Say, to print from lines between patterns "Yahoo" and "Live":

$ sed -n '1,/Inbox/p' file Gmail 10 Yahoo 20 Redif 18 Inbox 15
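The range forms described in this article (line number to pattern, pattern to end of file, and quitting on a pattern) can be checked against the same six-line sample without creating a file. A small sketch, with `mail` as an illustrative variable name:

```shell
mail=$(printf '%s\n' 'Gmail 10' 'Yahoo 20' 'Redif 18' 'Inbox 15' 'Live 23' 'Hotml 09')

printf '%s\n' "$mail" | sed -n '2,/Inbox/p'   # from line 2 up to the "Inbox" line
printf '%s\n' "$mail" | sed -n '/Live/,$p'    # from the "Live" line to the end
printf '%s\n' "$mail" | sed '/Redif/q'        # print up to and including "Redif", then quit
```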

sed - Replace or substitute file contents - Part 2


In one of our earlier articles, we saw about Replace and substitute using sed. In continuation to it, we will see a few more frequent search and replace operations done on files using sed. Let us consider a file with the following contents:

$ sed -n '/Yahoo/,/Live/p' file Yahoo 20 Redif 18 Inbox 15 Live 23

$ cat file RE01:EMP1:25:2500

The pattern is always specified between the slashes. The comma operator is used to specify the range. This command tells sed to print all the lines from the pattern "Yahoo" through the pattern "Live". 7. To print the lines from pattern "Redif" till the end

RE02:EMP2:26:2650 RE03:EMP3:24:3500 RE04:EMP4:27:2900

1. To replace the first two(2) characters of a string or a line with say "XX":

RE02:EMP2:26:26 RE03:EMP3:24:35 RE04:EMP4:27:29

$ sed 's/^../XX/' file XX01:EMP1:25:2500 XX02:EMP2:26:2650 XX03:EMP3:24:3500 XX04:EMP4:27:2900

4. To add a string to the end of a line:

$ sed 's/$/.Rs/' file RE01:EMP1:25:2500.Rs RE02:EMP2:26:2650.Rs RE03:EMP3:24:3500.Rs RE04:EMP4:27:2900.Rs

The "^" symbol anchors the match at the beginning of the line. The two dots match any 2 characters. The same thing can also be achieved without using the caret (^) symbol as shown below. This also works because sed replaces the first match on each line, and two dots first match at the beginning of the line.

Here the string ".Rs" is being added to the end of the line. 5. To add empty spaces to the beginning of every line in a file:

$ sed 's/^/ /' file

sed 's/../XX/' file

2. In the same lines, to remove or delete the first two characters of a string or a line.

RE01:EMP1:25:Rs.2500 RE02:EMP2:26:Rs.2650

$ sed 's/^..//' file 01:EMP1:25:2500 02:EMP2:26:2650 03:EMP3:24:3500 04:EMP4:27:2900

RE03:EMP3:24:Rs.3500 RE04:EMP4:27:Rs.2900

To make any of the sed command change permanent to the file OR in other words, to save or update the changes in the same file, use the option "-i"

Here the string to be substituted is empty, and hence gets deleted. 3. Similarly, to remove/delete the last two characters in the string:

$ sed -i 's/^/ /' file
$ cat file

RE01:EMP1:25:Rs.2500 RE02:EMP2:26:Rs.2650

$ sed 's/..$//' file RE01:EMP1:25:25

RE03:EMP3:24:Rs.3500 RE04:EMP4:27:Rs.2900

6. To remove empty spaces from the beginning of a line :

"RE04:EMP4:27:Rs.2900"

$ sed 's/^ *//' file RE01:EMP1:25:2500 RE02:EMP2:26:2650 RE03:EMP3:24:3500 RE04:EMP4:27:2900

".*" matches the entire line. '&' denotes the pattern matched. The replacement '"&"' therefore puts a double-quote at the beginning and end of the string. 9. To remove the first and last character of a string:

$ sed 's/^.//;s/.$//' file RE01:EMP1:25:2500 RE02:EMP2:26:2650 RE03:EMP3:24:3500 RE04:EMP4:27:2900

"^ *"(space followed by a *) indicates a sequence of spaces in the beginning. 7. To remove empty spaces from beginning and end of string.

10. To remove everything till the first digit comes :


$ sed 's/^ *//; s/ *$//' file RE01:EMP1:25:2500 RE02:EMP2:26:2650 RE03:EMP3:24:3500 RE04:EMP4:27:2900

$ sed 's/^[^0-9]*//' file 01:EMP1:25:2500 02:EMP2:26:2650 03:EMP3:24:3500 04:EMP4:27:2900

This example also shows to use multiple sed command substitutions as part of the same command. The same command can also be written as :

Similarly, to remove everything till the first alphabet comes:

sed -e 's/^ *//' -e 's/ *$//' file

sed 's/^[^a-zA-Z]*//' file

8. To add a character before and after a string. Or in other words, to encapsulate the string with something:

11. To remove a numerical word from the end of the string:

$ sed 's/[0-9]*$//' file RE01:EMP1:25: RE02:EMP2:26: RE03:EMP3:24: RE04:EMP4:27:

$ sed 's/.*/"&"/' file "RE01:EMP1:25:Rs.2500" "RE02:EMP2:26:Rs.2650" "RE03:EMP3:24:Rs.3500"

RE04:EMP4:27:RS.2900

12. To get the last column of a file with a delimiter. The delimiter in this case is ":".

Same as above, \U instead of \L.

sed - 25 examples to delete a line or pattern in a file


In this article of sed tutorial series, we are going to see how to delete or remove a particular line or a particular pattern from a file using the sed command. Let us consider a file with the sample contents as below:

$ sed 's/.*://' file 2500 2650 3500 2900

$ cat file Cygwin Unix Linux Solaris

At first glance, one might expect the output of the above command to be the same contents without the first column and the delimiter. However, regular expression matching in sed is greedy. When we specify '.*:', it matches up to the last colon and consumes everything before it. Hence, we get only the content after the last colon. 13. To convert the entire line into lower case:

AIX

$ sed 's/.*/\L&/' file re01:emp1:25:rs.2500 re02:emp2:26:rs.2650 re03:emp3:24:rs.3500 re04:emp4:27:rs.2900

1. Delete the 1st line or the header line:

$ sed '1d' file Unix Linux Solaris

\L is the sed switch to convert to lower case. The operand following the \L gets converted. Since &(the pattern matched, which is the entire line in this case) is following \L, the entire line gets converted to lower case. 14. To convert the entire line or a string to uppercase :

AIX

d command is to delete a line. 1d means to delete the first line. The above command will show the file content by deleting the first line. However, the source file remains unchanged. To update the original file itself with this deletion or to make the changes permanently in the source file, use the -i option. The same is applicable for all the other examples.
sed -i '1d' file
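Where the -i option is not available, the same effect can be had by writing the output to a temporary file and renaming it over the original. The following is a minimal sketch of that approach; the scratch directory and the name file.tmp are assumptions made for the example.

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf '%s\n' Cygwin Unix Linux Solaris AIX > "$workdir/file"

# Write the result to a temporary file, then rename it over the original.
sed '1d' "$workdir/file" > "$workdir/file.tmp" && mv "$workdir/file.tmp" "$workdir/file"

result=$(cat "$workdir/file")
echo "$result"
rm -rf "$workdir"
```

The rename happens only if sed succeeds, so a failed command does not clobber the source file.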

$ sed 's/.*/\U&/' file RE01:EMP1:25:RS.2500 RE02:EMP2:26:RS.2650 RE03:EMP3:24:RS.3500
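\L and \U are GNU sed extensions. As a hedged sketch for other sed implementations, the same whole-line conversion can be done with tr; the sample record and variable names below are only for illustration.

```shell
record='RE01:EMP1:25:Rs.2500'

# Portable whole-line conversion with tr (no GNU-only \L or \U needed).
lower=$(printf '%s\n' "$record" | tr '[:upper:]' '[:lower:]')
upper=$(printf '%s\n' "$record" | tr '[:lower:]' '[:upper:]')

echo "$lower"   # re01:emp1:25:rs.2500
echo "$upper"   # RE01:EMP1:25:RS.2500
```

Unlike \L and \U, tr always converts the whole input, so it does not help when only part of a line should change case.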

Note: The -i option is not part of POSIX sed. GNU sed supports it as shown; BSD sed also has -i but requires a backup-suffix argument. If your sed lacks it, redirect the sed output to a file, and rename the output file to the original file. 2. Delete a particular line, 3rd line in this case:

Linux Solaris

The ! operator indicates negative condition. 6. Delete the first line AND the last line of a file, i.e, the header and trailer line of a file.

$ sed '3d' file Cygwin Unix Solaris AIX

$ sed '1d;$d' file Unix Linux Solaris

Multiple commands are separated using the ';' operator. Similarly, to delete the 2nd and 4th lines, you can use: '2d;4d'.

3. Delete the last line or the trailer line of the file:
$ sed '$d' file Cygwin Unix Linux Solaris

$ sed '/^L/d' file Cygwin Unix

7. Delete all lines beginning with a particular character , 'L' in this case:

$ indicates the last line.


Solaris

4. Delete a range of lines, from 2nd line till 4th line:


AIX

$ sed '2,4d' file Cygwin AIX

'^L' indicates lines beginning with L. 8. Delete all lines ending with a particular character , 'x' in this case:

The range is specified using the comma operator. 5. Delete lines other than the specified range, line other than 2nd till 4th here:

$ sed '/x$/d' file Cygwin Solaris AIX

$ sed '2,4!d' file Unix
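The range and its negation can be compared side by side. The sketch below recreates the sample file in a scratch directory (an assumption of the example) and also shows that '2,4!d' gives the same result as printing only the range with -n and p.

```shell
workdir=$(mktemp -d)
printf '%s\n' Cygwin Unix Linux Solaris AIX > "$workdir/file"

# Delete lines 2-4: only the lines outside the range survive.
outside=$(sed '2,4d' "$workdir/file")

# Negate the address with '!': only the lines inside the range survive.
inside=$(sed '2,4!d' "$workdir/file")

# Same result as the negated delete, written as "print only the range".
inside_p=$(sed -n '2,4p' "$workdir/file")

printf '%s\n' "$outside" "--" "$inside"
rm -rf "$workdir"
```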

'x$' indicates lines ending with 'x'. AIX did not get deleted because its X is a capital letter. 9. Delete all lines ending with either x or X, i.e. a case-insensitive delete:

'*' indicates 0 or more occurrences of the previous character. '^ *$' indicates a line containing zero or more spaces. Hence, this will delete all lines which are either empty or lines with only some blank spaces. 12. Delete all lines which are entirely in capital letters:

$ sed '/[xX]$/d' file Cygwin Solaris

$ sed '/^[A-Z]*$/d' file Cygwin Unix

[xX] indicates either 'x' or 'X'. So, this will delete all lines ending with either small 'x' or capital 'X'. 10. Delete all blank lines in the file

Linux Solaris

[A-Z] matches any capital letter.


$ sed '/^$/d' file Cygwin Unix Linux Solaris AIX

$ sed '/Unix/d' file Cygwin Linux Solaris

13. Delete the lines containing the pattern 'Unix'.

'^$' indicates lines containing nothing, and hence the empty lines get deleted. However, this won't delete lines containing only some blank spaces. 11. Delete all lines which are empty or which contain just some blank spaces:

AIX

The pattern is specified within a pair of slashes. 14. Delete the lines NOT containing the pattern 'Unix':

$ sed '/^ *$/d' file Cygwin Unix Linux Solaris AIX
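The sample file has no blank lines, so the difference between '^$' and '^ *$' does not show in the output above. The sketch below uses a hypothetical input with one empty line and one line holding only spaces, and counts the lines that survive each command.

```shell
workdir=$(mktemp -d)
# Hypothetical input: line 2 is empty, line 4 holds only two spaces.
printf 'Cygwin\n\nUnix\n  \nLinux\n' > "$workdir/file"

# '^$' removes only the truly empty line; the spaces-only line survives.
count_empty_only=$(sed '/^$/d' "$workdir/file" | wc -l)

# '^ *$' removes both the empty line and the spaces-only line.
count_blank_too=$(sed '/^ *$/d' "$workdir/file" | wc -l)

echo "$count_empty_only $count_blank_too"
rm -rf "$workdir"
```

Note that ' *' covers spaces only; lines made of tabs would need a class such as '[[:space:]]*' instead.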

$ sed '/Unix/!d' file Unix

15. Delete the lines containing the pattern 'Unix' OR 'Linux':

$ sed '/Unix\|Linux/d' file Cygwin

Solaris AIX
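The backslash-pipe alternation is a GNU extension to basic regular expressions. As a sketch of two alternatives, the same OR condition can be written as two separate delete commands, or with extended regular expressions via -E (accepted by GNU and BSD sed). The scratch directory is an assumption of the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' Cygwin Unix Linux Solaris AIX > "$workdir/file"

# Two delete commands: a line is dropped if either pattern matches.
with_two_e=$(sed -e '/Unix/d' -e '/Linux/d' "$workdir/file")

# Extended regular expressions: '|' needs no backslash under -E.
with_ere=$(sed -E '/Unix|Linux/d' "$workdir/file")

echo "$with_two_e"
rm -rf "$workdir"
```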

19. Delete the last line ONLY if it contains either the pattern 'AIX' or 'HPUX':

The OR condition is specified using the | operator. In order not to get the pipe(|) interpreted as a literal, it is escaped using a backslash. 16. Delete the lines starting from the 1st line till encountering the pattern 'Linux':

$ sed '${/AIX\|HPUX/d;}' file Cygwin Unix Linux Solaris

$ sed '1,/Linux/d' file Solaris AIX

$ sed '1,4{/Solaris/d;}' file Cygwin Unix

20. Delete the lines containing the pattern 'Solaris' only if it is present in the lines from 1 to 4.

Earlier, we saw how to delete a range of lines. Range can be in many combinations: Line ranges, pattern ranges, line and pattern, pattern and line. 17. Delete the lines starting from the pattern 'Linux' till the last line:

Linux AIX

$ sed '/Linux/,$d' file Cygwin Unix

This deletes the lines containing the pattern Solaris only if they are within the first four lines, and nowhere else. 21. Delete the line containing the pattern 'Unix' and also the next line:

18. Delete the last line ONLY if it contains the pattern 'AIX':
$ sed '${/AIX/d;}' file Cygwin Unix Linux Solaris

$ sed '/Unix/{N;d;}' file Cygwin Solaris AIX

The N command appends the next line to the pattern space. d then deletes the entire pattern space, which holds the current and the next line. 22. Delete only the line following the line containing the pattern 'Unix', not the matching line itself:

$ is for the last line. To delete a particular line only if it contains the pattern AIX, put the line number in place of the $. This is how we can implement the 'if' condition in sed.
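As a sketch of the same 'if' construct with a line number in place of the $, the snippet below deletes line 3 only if it contains 'Linux', and shows that nothing happens when the condition is tested against a line that does not match. The scratch directory is an assumption of the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' Cygwin Unix Linux Solaris AIX > "$workdir/file"

# Line 3 is 'Linux', so the condition holds and the line is deleted.
hit=$(sed '3{/Linux/d;}' "$workdir/file")

# Line 2 is 'Unix', so the condition fails and nothing is deleted.
miss=$(sed '2{/Linux/d;}' "$workdir/file")

printf '%s\n' "$hit" "--" "$miss"
rm -rf "$workdir"
```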

$ sed '/Unix/{N;s/\n.*//;}' file Cygwin Unix Solaris AIX

24. Delete only the line prior to the line containing the pattern 'Linux', not the very line:

$ sed -n '/Linux/{x;d;};1h;1!{x;p;};${x;p;}' file Cygwin Linux Solaris AIX

Using the substitution command s, we delete from the newline character till the end, which effectively deletes the next line after the line containing the pattern Unix. 23. Delete the line containing the pattern 'Linux', also the line before the pattern:

$ sed -n '/Linux/{s/.*//;x;d;};x;p;${x;p;}' file | sed '/^$/d' Cygwin Solaris AIX

This is almost the same as the last one, with a few changes. On encountering the pattern /Linux/, we exchange (x) and delete (d). As a result of the exchange, the current line remains in hold space, and the previous line, which came into pattern space, gets deleted. 1h;1!{x;p;} - 1h copies the current line to hold space only if it is the first line. For all the other lines, we exchange and print. This could simply have been: x;p. The drawback is that it gives an empty line at the beginning, because during the first exchange between the pattern space and hold space, an empty line comes into the pattern space since the hold space is initially empty.

These are a little tricky. In order to delete the line prior to the pattern, we store every line in a buffer called the hold space. Whenever the pattern matches, we delete the content present in both the pattern space, which contains the current line, and the hold space, which contains the previous line. Let me explain this command: 'x;p;' gets executed for every line. x exchanges the content of pattern space with hold space. p prints the pattern space. As a result, every time, the current line goes to hold space, and the previous line comes to pattern space and gets printed. When the pattern /Linux/ matches, we empty (s/.*//) the pattern space, exchange (x) it with the hold space (as a result of which the hold space becomes empty) and delete (d) the pattern space, which now contains the previous line. Hence, the current and the previous lines get deleted on encountering the pattern Linux. The ${x;p;} prints the last line, which would otherwise remain in the hold space. The second sed removes the empty lines created by the first sed command.
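The one-line lag created by 'x;p' is easier to see without any pattern involved. The sketch below feeds three hypothetical lines (a, b, c) through 'x;p' alone and then with the final flush added; the brackets in the output make the leading empty line visible.

```shell
# Three hypothetical input lines.
input=$(printf 'a\nb\nc')

# 'x;p' alone: each cycle prints the PREVIOUS line, so the output lags
# by one line and starts with the empty initial hold space.
lagged=$(printf '%s\n' "$input" | sed -n 'x;p')

# Adding '${x;p;}' flushes the last line still sitting in hold space.
flushed=$(printf '%s\n' "$input" | sed -n 'x;p;${x;p;}')

printf '[%s]\n' "$lagged" "$flushed"
```

The first result lacks the final line 'c', which is why the commands above end with ${x;p;}.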

25. Delete the line containing the pattern 'Linux', the line before, the line after:

$ sed -n '/Linux/{N;s/.*//;x;d;};x;p;${x;p;}' file | sed '/^$/d' Cygwin AIX

With the explanations of the last 2 commands, this should be fairly simple to understand.

sed - 10 examples to print lines from a file


In this article of sed series, we will see how to print a particular line using the print(p) command of sed. Let us consider a file with the following contents:
$ cat file AIX Solaris Unix Linux HPUX

24. Delete only the line prior to the line containing the

1. Print only the first line of the file:


$ sed -n '1p' file AIX

$ sed -n '/Unix/,${/X$/p;}' file HPUX

Similarly, to print a particular line, put the line number before 'p'. 2. Print only the last line of the file
$ sed -n '$p' file HPUX

$ indicates the last line. 3. Print lines which do not contain 'X':
$ sed -n '/X/!p' file Solaris Unix Linux

The range of lines chosen starts from the line containing the pattern 'Unix' and runs till the end of the file ($). The commands within the braces are applied only to this range of lines. Within this group, only the lines ending with 'X' are printed. 10. Print range of lines excluding the starting and ending line of the range:
$ sed -n '/Solaris/,/HPUX/{//!p;}' file Unix Linux

!p indicates the negative condition to print. 4. Print lines which contain the character 'u' or 'x' :
$ sed -n '/[ux]/p' file Unix Linux

[ux] matches lines containing either 'u' or 'x'. 5. Print lines which end with 'x' or 'X' :
$ sed -n '/[xX]$/p' file AIX Unix Linux HPUX

The range of lines chosen is from 'Solaris' to 'HPUX'. The action within the braces is applied only to this range of lines. If no pattern is provided in pattern matching (//), the last matched pattern is used. For example, when the line containing the pattern 'Solaris' starts the range and gets inside the curly braces, the empty pattern stands for the last pattern used (Solaris). Since this match is true, the line is not printed (!p), and the same holds for the last line in the group, where the last pattern used is 'HPUX'.
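As a sketch of what the empty pattern saves, the snippet below compares the short form with an explicit version that names both boundary patterns. The explicit form is an illustration written for this comparison, not part of the original example; the scratch directory is likewise an assumption.

```shell
workdir=$(mktemp -d)
printf '%s\n' AIX Solaris Unix Linux HPUX > "$workdir/file"

# Empty regex '//' reuses whichever range pattern was matched last.
short_form=$(sed -n '/Solaris/,/HPUX/{//!p;}' "$workdir/file")

# Explicit form: skip lines matching either boundary pattern.
long_form=$(sed -n '/Solaris/,/HPUX/{/Solaris/!{/HPUX/!p;};}' "$workdir/file")

printf '%s\n' "$short_form" "--" "$long_form"
rm -rf "$workdir"
```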

6. Print lines beginning with either 'A' or 'L':


$ sed -n '/^A\|^L/p' file AIX Linux

sed - 10 examples to replace / delete / print lines of CSV file

The pipe is used to provide multiple pattern matching. Like this, multiple patterns can be provided for searching. 7. Print every alternate line:
$ sed 'n;d' file AIX Unix HPUX

How to use sed to work with a CSV file? Or how to work with any file in which fields are separated by a delimiter? Let us consider a sample CSV file with the following content:
$ cat file Solaris,25,11 Ubuntu,31,2 Fedora,21,3 LinuxMint,45,4 RedHat,12,5

n command prints the current line, and immediately reads the next line into pattern space. d command deletes the line present in pattern space. In this way, alternate lines get printed. 8. Print every 2 lines:
$ sed 'n;n;N;d' file AIX Solaris HPUX
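Alternate lines can also be selected with -n, where nothing is printed unless p asks for it. The sketch below is one possible variant, shown next to 'n;d' for comparison; it assumes the sample file is recreated in a scratch directory.

```shell
workdir=$(mktemp -d)
printf '%s\n' AIX Solaris Unix Linux HPUX > "$workdir/file"

# From the text: n prints the current line and loads the next, d drops it.
odd_nd=$(sed 'n;d' "$workdir/file")

# With -n nothing is printed automatically: p prints, n skips a line.
odd_pn=$(sed -n 'p;n' "$workdir/file")

# Swapping the order prints the even-numbered lines instead.
even_np=$(sed -n 'n;p' "$workdir/file")

printf '%s\n' "$odd_pn" "--" "$even_np"
rm -rf "$workdir"
```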

1. To remove the 1st field or column :


$ sed 's/[^,]*,//' file 25,11 31,2 21,3 45,4 12,5

n;n; => This prints 2 lines and leaves the 3rd line in the pattern space. The N command reads the next line and joins it with the current line, and d deletes everything present in the pattern space. With this, the 3rd and 4th lines present in the pattern space get deleted. Since this repeats till the end of the file, it ends up printing every 2 lines. 9. Print lines ending with 'X' within a range of lines:

This regular expression searches for a sequence of non-comma([^,]*) characters and deletes them which results in the 1st field getting removed. 2. To print only the last field, OR remove all fields except the last field:
$ sed 's/.*,//' file 11 2 3 4 5

3 Fedora,21,3 4 LinuxMint,45,4 5 RedHat,12,5

This regex removes everything till the last comma(.*,) which results in deleting all the fields except the last field. 3. To print only the 1st field:
$ sed 's/,.*//' file Solaris Ubuntu Fedora LinuxMint RedHat

This is simulation of cat -n command. awk does it easily using the special variable NR. The '=' command of sed gives the line number of every line followed by the line itself. The sed output is piped to another sed command to join every 2 lines. 8. Replace the last field by 99 if the 1st field is 'Ubuntu':
$ sed 's/\(Ubuntu\)\(,.*,\).*/\1\299/' file Solaris,25,11 Ubuntu,31,99 Fedora,21,3 LinuxMint,45,4 RedHat,12,5

This regex (,.*) removes the characters starting from the 1st comma till the end, resulting in deleting all the fields except the first field. 4. To delete the 2nd field:
$ sed 's/,[^,]*,/,/' file Solaris,11 Ubuntu,2 Fedora,3 LinuxMint,4 RedHat,5

This regex matches 'Ubuntu' and till the end except the last column and groups each of them as well. In the replacement part, the 1st and 2nd group along with the new number 99 is substituted. 9. Delete the 2nd field if the 1st field is 'RedHat':
$ sed 's/\(RedHat,\)[^,]*\(.*\)/\1\2/' file Solaris,25,11 Ubuntu,31,2 Fedora,21,3 LinuxMint,45,4 RedHat,,5
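The same conditional edits can also be written with an address that selects the line first and a simpler substitution that runs only there. This is an alternative sketch, not the method used above; it anchors the first field with '^', which is slightly stricter than the original patterns, and prints only the affected line for brevity.

```shell
workdir=$(mktemp -d)
printf '%s\n' Solaris,25,11 Ubuntu,31,2 Fedora,21,3 LinuxMint,45,4 RedHat,12,5 > "$workdir/file"

# Only on lines whose 1st field is Ubuntu: replace the last field with 99.
ubuntu_line=$(sed -n '/^Ubuntu,/{s/[^,]*$/99/;p;}' "$workdir/file")

# Only on lines whose 1st field is RedHat: empty out the 2nd field.
redhat_line=$(sed -n '/^RedHat,/{s/,[^,]*/,/;p;}' "$workdir/file")

echo "$ubuntu_line"   # Ubuntu,31,99
echo "$redhat_line"   # RedHat,,5
rm -rf "$workdir"
```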

The regex (,[^,]*,) searches for a comma and a sequence of non-comma characters followed by a comma, which matches the 2nd column, and replaces the matched text with just a comma, thereby deleting the 2nd column. Note: Deleting fields in the middle gets tougher in sed since every field has to be matched literally. 5. To print only the 2nd field:
$ sed 's/[^,]*,\([^,]*\).*/\1/' file 25 31 21 45 12

The 1st field 'RedHat', the 2nd field and the remaining fields are matched, and the replacement is done with only the 1st and the last group, resulting in the 2nd field getting deleted. 10. To insert a new column at the end(last column) :
$ sed 's/.*/&,A/' file Solaris,25,11,A Ubuntu,31,2,A Fedora,21,3,A LinuxMint,45,4,A RedHat,12,5,A

The regex matches the first field, the second field and the rest, but groups only the 2nd field. The whole line is then replaced with the 2nd field (\1), hence only the 2nd field gets displayed. 6. Print only lines in which the last column is a single digit number:
$ sed -n '/.*,[0-9]$/p' file Ubuntu,31,2 Fedora,21,3 LinuxMint,45,4 RedHat,12,5

The regex (.*) matches the entire line and replaces it with the line itself (&) followed by the new field. 11. To insert a new column in the beginning(1st column):
$ sed 's/.*/A,&/' file A,Solaris,25,11 A,Ubuntu,31,2 A,Fedora,21,3 A,LinuxMint,45,4 A,RedHat,12,5

Same as the last example, except that the new column is followed by the matched line. Note: sed is generally not preferred for files whose fields are separated by a delimiter, because it is very difficult to access fields in sed, unlike awk or Perl where splitting fields is a breeze.
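To illustrate the point about field access, the sketch below extracts the 2nd field once with the sed substitution used earlier and once with awk, where -F sets the delimiter and $2 names the field directly. The scratch directory is an assumption of the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' Solaris,25,11 Ubuntu,31,2 Fedora,21,3 LinuxMint,45,4 RedHat,12,5 > "$workdir/file"

# sed: match the whole line and keep only the captured 2nd field.
second_sed=$(sed 's/[^,]*,\([^,]*\).*/\1/' "$workdir/file")

# awk: set the field separator to ',' and name the field directly.
second_awk=$(awk -F, '{print $2}' "$workdir/file")

echo "$second_awk"
rm -rf "$workdir"
```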

The regex (,[0-9]$) checks for a single digit in the last field and the p command prints the line which matches this condition. 7. To number all lines in the file:
$ sed = file | sed 'N;s/\n/ /' 1 Solaris,25,11 2 Ubuntu,31,2
