sed and awk are two powerful tools that are often compared because of their similar strength and excellent support for regular expressions. They each have their own scripting languages, with sed mainly used for automated text file modification, while awk can be imagined as a lightweight interpreted language similar to C, commonly used for general purposes, statistics, and reformatting output.
In Chinese books or websites, the explanations of sed and awk are often limited to simple applications and can be ambiguous. Much of the content is also repetitive, which makes it hard for users who want to go deeper to find detailed information. Do not expect to get by on the man pages alone; man pages are written as a reference for people who already know the subject. Trying to learn sed and awk solely from man pages is like trying to learn conversational English from a dictionary. This document therefore attempts to cover the full functionality and potential of sed and awk. If you are a casual user and do not want to spend too much time, the basic usage of sed and awk should cover more than 90% of your needs.
While grep can utilize powerful regular expressions to search for strings in files, it lacks the ability to perform editing actions such as deletion, replacement, or insertion on the matched strings. This is where sed comes in to complement grep's editing functionality. Moreover, sed's programmable features are often used to automate text file modifications.
Although vi (or any text editor) can also be used to search for and modify file contents, manually opening files, making changes, and then saving them can be time-consuming. However, if you are familiar with sed operations, all of these tasks can be automated. For example, when a company relocates, there might be numerous spreadsheet files with the old company address, and by making good use of sed, you can efficiently and automatically update all the files with the new address.
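As a rough sketch of the relocation scenario, assuming the files are plain text (for example CSV exports); the directory, file names, and both addresses below are made up for illustration:

```shell
#!/bin/sh
# Hypothetical setup: two text files that still contain the old address.
workdir=$(mktemp -d)
printf 'Acme Ltd.\n1 Old Road, Taipei\n' > "$workdir/invoice1.txt"
printf 'Ship to: 1 Old Road, Taipei\n' > "$workdir/invoice2.txt"

# Rewrite every file: sed writes to a temporary copy, which replaces
# the original only if sed exited successfully.
for f in "$workdir"/*.txt; do
    sed 's/1 Old Road/99 New Avenue/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

updated=$(cat "$workdir/invoice1.txt" "$workdir/invoice2.txt")
printf '%s\n' "$updated"
rm -rf "$workdir"
```

Writing to a temporary copy first avoids reading and truncating the same file at once.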
The usage of sed may seem a bit abstract, so let's explain the basic usage and each parameter separately.
The basic usage of sed is as follows: sed [-OPTION] [ADD1][,ADD2] [COMMAND] [/PATTERN][/REPLACEMENT]/[FLAG] [FILE].
To make this easier to understand, let's start with an example, since one example is worth a thousand words.
For instance, if we want to change occurrences of "The" or "the" to uppercase "THE" in lines 1 to 8 of the file "MyFile.txt," we can use the following command:
$ sed -e '1,8 s/[Tt]he/THE/g' MyFile.txt
OPTION | ADD | COMMAND | PATTERN | REPLACEMENT | FLAG | FILE |
-e | 1,8 | s | [Tt]he | THE | g | MyFile.txt |
Since each parameter can be complex and abstract, the shell may interpret special characters in them before sed ever sees them. As a general rule, except for the file and option parameters, all other parameters should be enclosed in single quotes (').
In the example above, if the address field is omitted, the command applies to every line. The "-e" option (which tells sed that the next argument is a script to interpret) is the default and can be omitted in common usage, so if no option is given, sed treats the first non-option argument as the script, just as with "-e".
The most basic command in sed is the search and replace command "s," and its basic syntax is "s/PATTERN/REPLACEMENT/". It works similarly to the vi replace command. The pattern can be a valid regular expression or a simple string.
If you want to search for and replace a whole word, you can add a space before both the pattern and the replacement string (since words are separated by spaces). For example, given the sentence "This is a book," suppose you want to make the word "is" uppercase. Without the leading space, the substring "is" inside "This" also matches, and the output becomes "ThIS IS a book." If you are familiar with regular expressions, matching on word boundaries (\< and \>) is a more robust solution.
Examples:
$ echo 'This is a book' | sed 's/is/IS/g' ← Replace "is" with "IS" (no space before the word)
ThIS IS a book
$ echo 'This is a book' | sed 's/ is/ IS/g' ← Add a space before the pattern and replacement
This IS a book
$ echo 'This is a book' | sed 's/\<is\>/IS/g' ← Use a regular expression with word boundaries
This IS a book
$ sed 's/\<is\>/IS/g' fileA ← Replace "is" with "IS" in the file "fileA"
$ sed 's/\<is\>/IS/g' fileA fileB fileC ← Process multiple files at once
The sed FLAG "g" stands for global replacement, which means it replaces all occurrences in a line. Without this flag, sed will only replace the first occurrence and then move on to the next line.
As sed stands for "Stream EDitor," the changes made in the above examples are only written to the screen and do not modify the original files. To save the changes, redirect the output to a file using the redirection operator (>).
For example: sed 's/can/CAN/' INPUT_FILE > SAVE_FILE
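A minimal sketch of this behavior, using temporary files as stand-ins for INPUT_FILE and SAVE_FILE (the sample sentence is made up). The input file stays untouched and only the redirected copy contains the change:

```shell
src=$(mktemp)   # stands in for INPUT_FILE
dst=$(mktemp)   # stands in for SAVE_FILE
printf 'you can do it\n' > "$src"

# sed only writes to standard output; the redirection saves that output.
sed 's/can/CAN/' "$src" > "$dst"

original=$(cat "$src")
saved=$(cat "$dst")
printf 'original: %s\nsaved:    %s\n' "$original" "$saved"
rm -f "$src" "$dst"
```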
$ echo 'this is a apple' | sed 's/a/AN/' ← Replace "a" with "AN" (only replaces the first occurrence)
this is AN apple
$ echo 'this is a apple' | sed 's/a/AN/g' ← Add the "g" flag for global replacement (replaces all occurrences)
this is AN ANpple
$ echo 'this is a apple' | sed 's/ is//g' ← Delete the word "is" (replace with an empty string)
this a apple
$ cat my_file.txt | sed '4,5 s/Google/Yahoo/g' > new.txt ← Replace "Google" with "Yahoo" in lines 4 to 5 of the file "my_file.txt" and save as "new.txt"
$ sed '4,5 s/Google/Yahoo/g' < my_file.txt > new.txt ← Same functionality as above
$ echo 'this is a apple' | sed 's/a/an/' | sed 's/apple/APPLE/' ← Replace "a" with "an" and "apple" with "APPLE"
this is an APPLE
$ echo 'this is a apple' | sed -e 's/a/an/' -e 's/apple/APPLE/' ← Same functionality as above
Because "/" is the default delimiter, any "/" inside the pattern or replacement must be escaped with a backslash. For example, if you want to convert a Linux path "/abc/wxy" to the Windows path representation "\abc\wxy," the sed command would look like sed 's/\//\\/g'. This usage can be difficult to read and understand at first glance.
To address this, sed allows you to use any single character other than a backslash or newline as the delimiter. Choose a delimiter that does not appear in the search pattern or replacement string, and use that same character in all three delimiter positions of the "s" command.
Examples:
$ echo 'this is a apple' | sed 's:a:AN:' ← Replace "a" with "AN" (using ":" as the delimiter)
this is AN apple
$ echo '/home/frank/' | sed 's#/#\\#g' ← Replace "/" with "\" (using "#" as the delimiter)
\home\frank\
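To check that the escaped form and the alternate-delimiter form really are interchangeable, here is a small sketch that runs both on the path from the text; the variable names are only for illustration:

```shell
path='/abc/wxy'

# Default "/" delimiter: the "/" in the pattern has to be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\//\\/g')

# "#" as delimiter: the "/" can be written as-is.
with_hash=$(printf '%s\n' "$path" | sed 's#/#\\#g')

printf '%s\n%s\n' "$with_slash" "$with_hash"
```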
The usage of address range is as follows:
$ sed '1,5 s/ [aA]/ one/' file ← Replace "a" or "A" with "one" in lines 1 to 5
$ sed '5 s/ [aA]/ one/' file ← Only replace "a" or "A" with "one" in line 5
$ sed 's/ [aA]/ one/' file ← Omitting the address replaces "a" or "A" with "one" in the entire file
$ sed '3,$ s/^can/CAN/' file ← Replace "can" at the beginning of a line with "CAN", from line 3 to the end of the file
$ sed '/The/,/Whe/ s/ can/ CAN/g' < re.txt ← Address range from the line with "The" to the line with "Whe", replace "can" with "CAN"
$ sed '/[Cc]an$/ s/a/A/g' re.txt ← Replace "a" with "A" in lines that match the pattern "[Cc]an$"
$ sed '2,/The/ s/[0-9]/#/g' re.txt ← Replace any digit with "#" from the second line to the line containing "The"
$ sed '/google/, $ s/a/A/g' re.txt ← Replace all occurrences of "a" with "A" from the line containing "google" to the last line
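The address forms above can be tried on a small inline sample. This is only a sketch: the five input lines are invented, and "-n" with the "p" flag is used so that just the lines selected by each address are shown:

```shell
input='one can
The can
three can
Whe can
five can'

# Numeric range: lines 2 to 3 only.
by_number=$(printf '%s\n' "$input" | sed -n '2,3 s/can/CAN/p')

# Pattern range: from the line matching "The" to the line matching "Whe".
by_pattern=$(printf '%s\n' "$input" | sed -n '/The/,/Whe/ s/can/CAN/p')

# From line 4 to the last line ("$").
to_last=$(printf '%s\n' "$input" | sed -n '4,$ s/can/CAN/p')

printf '%s\n---\n%s\n---\n%s\n' "$by_number" "$by_pattern" "$to_last"
```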
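sed reads each input line into a buffer called the "pattern space," and the line currently held there is called the "current pattern space." COMMANDs are applied to the pattern space when the line matches the given address or pattern. The COMMANDs "p" or "P" can be used to output the current pattern space to the screen. The basic workflow is as follows: read one line into the pattern space, apply the COMMANDs, print the pattern space (unless "-n" is given), then move on to the next line.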
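A short sketch of how the pattern space is printed: by default sed prints it once at the end of each cycle, "p" prints it an extra time, and "-n" suppresses the default print. The two-line input is made up:

```shell
# Default print plus "p": every line appears twice.
default_out=$(printf 'a\nb\n' | sed 'p')

# "-n" turns off the default print, so "p" is the only output.
quiet_out=$(printf 'a\nb\n' | sed -n 'p')

# "-n" with a pattern address: only matching lines are printed.
match_out=$(printf 'a\nb\n' | sed -n '/b/p')

printf '%s\n---\n%s\n---\n%s\n' "$default_out" "$quiet_out" "$match_out"
```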
Command name / Function / Command user: sed / stream editor / Any
Syntax: sed [-OPTION] [ADD1][,ADD2] [COMMAND] [/PATTERN][/REPLACEMENT]/[FLAG] [FILE]
Option | Function | Note |
-e | Execute the script syntax of sed | If the "-f" option is not used, this is the default option |
-f | Use an external script file to execute | |
-n | Do not output the pattern space to the screen | |
-l # | Often used with COMMAND "l" (lowercase "L") to specify the line-wrap length of each line | # is a number |
-r | Use extended regular expressions | |
--help | Display the command's built-in help | |
$ echo 'this is a pen' | sed -e 's/t/T/' -e 's/pen/&cil/' ← Change "t" to "T" and "pen" to "pencil"
This is a pencil
$ sed -e 's/a/A/' -e '/this/ q' -e 'l' MyFile ← Using multiple sed COMMANDs with the -e option (this example changes "a" to "A" in the file "MyFile" until a line contains the string "this", then quits, and lists non-printable characters)
$ cat sed_scr ← For example, an external script file named "sed_scr" changes a~d to uppercase
s/a/A/g
s/b/B/g
s/c/C/g
s/d/D/g
$ echo 'abcdefg' | sed -f sed_scr
ABCDefg
$ sed -f sed_scr < MyFile > TargetFile ← Using the external script file "sed_scr" to process "MyFile" and save the output in "TargetFile"
$ ls -d /etc/* | sed -n '/[A-Z][0-9]/ p' ← Using the -n option with the "p" command to list lines matching the pattern (similar to the "grep" command)
/etc/X11
$ echo -e 'Line1\nLine2\nLine3' | sed -n 's/Line2/Line two/p' ← Printing only the lines that have changed
Line two
$ echo -e 'This\tIs\tA\tDog' | sed -nl 10 'l' ← Specify the line wrap length as 10
This\tIs\
\tA\tDog$
$ echo 'Why an apple $9.99?' | sed 's/99?/88/g' ← Change the string "99?" to "88"
Why an apple $9.88
↑ In this example, the pattern "99?" is treated as a basic regular expression, in which "?" is an ordinary character. With the -r option, it is interpreted as an extended regular expression, in which "?" makes the preceding "9" optional (as in the following example).
$ echo 'Why an apple $9.99?' | sed -r 's/99?/88/g' ← Using the -r option for extended regular expressions
Why an apple $88.88?
Sed Flags | |
[g][ number] | Global replacement or specify which occurrence to replace |
I | Ignore case in the pattern |
p | Print the current pattern space |
w | Write to a file |
$ echo 'this is an issue' | sed 's/is/IS/' ← "is"→"IS", but by default, only the first occurrence is replaced
thIS is an issue
$ echo 'this is an issue' | sed 's/is/IS/g' ← Using the "g" FLAG for global replacement
thIS IS an ISsue
$ echo 'aaaaa aaaaa' | sed 's/a/A/3' ← Replace only the 3rd occurrence
aaAaa aaaaa
$ echo 'aaaaa aaaaa' | sed 's/a/A/3g' ← Replace starting from the 3rd occurrence
aaAAA AAAAA
$ echo 'this is an apple' | sed 's/APPLE/banana/I'
this is an banana
$ echo 'this is a pen' | sed 's/pen/pencil/p'
this is a pencil ← This line is printed by the "p" FLAG
this is a pencil ← This line is sed's default output of the current pattern space
$ echo -e 'Line1\nLine2\nLine3' | sed -n '/[13]/p' ← Using the -n option with "p" to list only the lines matching the pattern
Line1
Line3
$ man cp | sed 's/copy/{&}/w cp.txt' ← Write the lines in which "copy" was replaced to the file "cp.txt"
$ man cp | sed -n 's/COPY/{&}/Igpw cp.txt' ← When using multiple FLAGS, file-related FLAGS (such as "w") must be placed at the end
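To see what the "w" FLAG actually saves, here is a small sketch with made-up input and a temporary file standing in for "cp.txt". All lines still go to standard output, while only the lines where the substitution happened end up in the file:

```shell
log=$(mktemp)   # hypothetical target file for the "w" flag

# Double quotes let the shell expand $log; the file name must come last
# because "w" reads everything up to the end of the command as the name.
screen=$(printf 'copy a\nmove b\ncopy c\n' | sed "s/copy/{&}/w $log")
written=$(cat "$log")
rm -f "$log"

printf 'screen:\n%s\nfile:\n%s\n' "$screen" "$written"
```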
Flow control is part of the sed COMMAND set, but it is important enough to be treated separately. The syntax and usage of each flow control feature are explained in turn below.
$ cat sed_scr1
# convert a..d to A..D ← This line is a comment and won't be executed
s/a/A/g
!s/b/B/g #← Lines starting with "!" won't be executed
#s/c/C/g #← Lines starting with "#" are comments and won't be executed
s/d/D/g
$ echo 'abcdefg' | sed -f sed_scr1
AbcDefg
$ sed -n '10,15 !p' MyFile ← Do not output lines 10 to 15 of the file
$ cat MyFile | sed -n '/Apple/ !=' ← List line numbers where the string "Apple" is not present
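A quick sketch of "!" on inline data, so the effect is visible without a real "MyFile"; the sample lines are invented:

```shell
# Print every line except lines 2 to 4.
kept=$(printf '1\n2\n3\n4\n5\n' | sed -n '2,4 !p')

# Print the line numbers of lines that do NOT contain "Apple".
nums=$(printf 'Apple\nPear\nApple pie\nPlum\n' | sed -n '/Apple/ !=')

printf '%s\n---\n%s\n' "$kept" "$nums"
```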
$ cat sed_scr2
# grouping with {}
{ #←Command package starts
s/a/A/g
!s/b/B/g
#s/c/C/g
s/d/D/g
} #←Command package ends
$ echo 'abcdefg' | sed -f sed_scr2
AbcDefg
$ cat sed_scr3
# grouping with {}
1,3 {
s/a/A/g
!s/b/B/g
#s/c/C/g
s/d/D/g
}
$ head -n 3 /etc/passwd | sed -f sed_scr3
root:x:0:0:root:/root:/bin/bAsh
bin:x:1:1:bin:/bin:/sbin/nologin
DAemon:x:2:2:DAemon:/sbin:/sbin/nologin
In this example, the commands 's/a/A/g', '!s/b/B/g', and 's/d/D/g' are grouped inside the command package "{}". They are only applied to lines 1 to 3 of the input file, "/etc/passwd".
A crucial point to note in the syntax is that the address range and the opening curly brace "{" must be on the same line. This is because the address range is followed by a command or the beginning of a command package, marked by the opening curly brace "{".
Attempting to write the curly brace on the next line will result in a syntax error, as shown in this incorrect representation:
Incorrect example:
$ cat sed_scr4
/hello/
{ #Error! The opening curly brace for the command package should be on the same line as the address range.
s/a/A/
}
$ cat sed_scr5
# grouping with nested curly braces
10,$ {
    /chapter 1/,/chapter 2/ {
        s/a/A/g
        s/b/B/g
    }
}
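Nested command packages can be tried out with a throwaway script file. In this sketch the markers "START" and "STOP" and the five input lines are hypothetical; the substitutions run only on lines that fall inside both the outer and the inner address range:

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
# Outer package: line 2 to the last line
2,$ {
    # Inner package: only between the START and STOP markers
    /START/,/STOP/ {
        s/a/A/g
        s/b/B/g
    }
}
EOF

result=$(printf 'ab\nSTART\nab\nSTOP\nab\n' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$result"
```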
For example, "b upper" represents an unconditional jump to the label "upper," meaning execution immediately continues at the label "upper" regardless of any conditions.
On the other hand, "/pattern/ b upper" represents a pattern-conditioned jump: if the pattern "pattern" is matched, execution jumps to the label "upper"; otherwise, the next line of the script is executed.
The following script illustrates this behavior:
$ cat sed_scr6
# if pattern 'google' found,char "a"->"A" else "b"->"B"
/google/ b capitalA #← If pattern 'google' is found, jump to label 'capitalA'
{
s/b/B/g
b end #← Unconditionally jump to label 'end'
}
:capitalA #← Label 'capitalA'
{
s/a/A/g
}
:end #← Label 'end'
$ echo -e 'google abc\nyahoo abc' | sed -f sed_scr6
google Abc
yahoo aBc
$ cat sed_scr7
s/g..g../GOOGLE/g
t capitalA #← Jump to label 'capitalA' if the substitution is successful, otherwise execute the next line
b end
:capitalA
s/a/A/g
:end
$ echo -e 'google abc\nyahoo abc' | sed -f sed_scr7
GOOGLE Abc
yahoo abc
$ cat sed_scr8
s/g..g../GOOGLE/g
T end #← Jump to label 'end' if the substitution is unsuccessful, otherwise execute the next line
s/a/A/g
:end
$ echo -e 'google abc\nyahoo abc' | sed -f sed_scr8
GOOGLE Abc
yahoo abc
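Because "t" jumps only after a successful substitution, a label placed before the "s" command turns the pair into a loop that repeats until nothing more can be replaced. The following is a sketch of that idea rather than an example from the scripts above; the label name "again" and the input are made up. Each pass turns one more leading space into a dot:

```shell
# :again          marks the start of the loop
# s/^\(\.*\) /\1./ replaces the first remaining leading space with "."
# t again         repeats while the substitution keeps succeeding
dotted=$(printf '   abc\n' | sed -e ':again' -e 's/^\(\.*\) /\1./' -e 't again')
printf '%s\n' "$dotted"
```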
So far we have only used the sed COMMAND "s". Because sed is powerful, there are many other COMMANDs; fortunately, most of them are similar to those in vi.
The possible sed commands are as follows:
sed COMMAND | Address | Function | Note |
:label | | Label | Reference Flow Control |
# | | Comment | Reference Flow Control |
! | | Disable (negate the address) | Reference Flow Control |
{} | | Command package | Reference Flow Control |
b label | | Branch (jump) unconditionally to label | Reference Flow Control |
t label | | Branch to label if a substitution is successful | Reference Flow Control |
T label | | Branch to label if a substitution is unsuccessful | Reference Flow Control |
= | scope | Print line numbers | |
a , i or a\, i\ | scope | Insert text | |
c or c\ | scope | Replace lines | |
d | scope | Delete pattern space or specified lines | |
D | scope | Delete from the beginning of the pattern space up to and including the first newline | |
g | scope | Copy the hold space to the pattern space | |
G | scope | Append the hold space to the pattern space | |
h | scope | Copy the pattern space to the hold space | |
H | scope | Append the pattern space to the hold space | |
l | scope | Force printing of hidden characters | |
n | scope | Read the next line | |
N | scope | Append the next line to the pattern space | |
p | scope | Print current pattern space | |
P | scope | Print from the beginning of the current pattern space up to the first newline | |
q | single address | Immediately exit sed | |
Q | single address | Exit sed immediately without printing the current pattern space | |
r | single address | Insert the contents of a file | |
s | scope | Search and replace | |
w | scope | Write the current pattern space to a file | |
x | scope | Exchange the contents of the pattern space with the contents of the hold space | |
y | scope | Translate characters | |
$ sed -n '/home/=' /etc/passwd ← List line numbers in the file "/etc/passwd" that contain the string "home"
37
38
39
$ man sed | sed -n '$=' ← List how many lines the sed manual page has
261
$ echo -e 'Line1\nLine2\nLine3' | sed '/Line2/ aINSERT' ← Insert "INSERT" after the line containing "Line2"
Line1
Line2
INSERT ← Inserted line
Line3
$ sed '6,$ aHello' MyFile ← Insert "Hello" after each line from line 6 to the end of the file "MyFile"
$ echo -e 'Apple\nBanana\nCoconut' | sed '3 iOrange\nDurian' ← Insert two lines above the third line
Apple
Banana
Orange ← Inserted line
Durian ← Inserted line
Coconut
$ cat sed_scr9
/pattern/ a\
Insert line1\
Insert Line2
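The multi-line "a\" form can be checked with a temporary script file. This is a sketch in which "/Line2/" replaces the placeholder "/pattern/" and the input lines are made up; every text line except the last ends with a backslash:

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
/Line2/ a\
Insert line1\
Insert Line2
EOF

appended=$(printf 'Line1\nLine2\nLine3\n' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$appended"
```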
$ echo -e 'Line1\nLine2\nLine3' | sed '/2$/ cREPLACE LINE' ← Replace the line ending with "2" with "REPLACE LINE"
Line1
REPLACE LINE
Line3
$ echo -e '1\n2\n3\n4\n5' | sed '1,3 cREPLACE 1-3' ← Replace multiple lines with one line
REPLACE 1-3
4
5
$ sed '4,8 d' MyFile ← Delete lines 4 to 8
$ sed '4,8 !d' MyFile ← Keep lines 4 to 8 and delete the rest
$ sed '/pattern/ d' MyFile ← Delete lines that match the pattern
$ sed '/pattern1/,/pattern2/ d' MyFile ← Delete all lines between pattern1 and pattern2 (inclusive)
$ sed '/^$/ d' ← Delete empty lines
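The delete forms can be verified on inline data instead of "MyFile". A minimal sketch, with invented input and the markers "begin" and "end" standing in for pattern1 and pattern2:

```shell
# Delete lines 2 to 4.
del_range=$(printf '1\n2\n3\n4\n5\n' | sed '2,4 d')

# Keep lines 2 to 4 and delete the rest.
keep_range=$(printf '1\n2\n3\n4\n5\n' | sed '2,4 !d')

# Delete a block between two patterns, inclusive.
del_block=$(printf 'x\nbegin\ny\nend\nz\n' | sed '/begin/,/end/ d')

# Delete empty lines.
no_blank=$(printf 'a\n\nb\n\n' | sed '/^$/ d')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$del_range" "$keep_range" "$del_block" "$no_blank"
```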
$ sed -n '5,12 p' file ← Print lines 5 to 12 of "file"
$ sed -n '/regex1/,/regex2/ p' file ← Print lines between regex1 and regex2 (inclusive) in "file"
The "N" command, however, appends the content of the next line to the existing pattern space without clearing it. The two lines are separated by the newline character "\n" (ASCII 0x0A).
For example, after running echo -e 'LineA\nLineB' | sed 'N', the pattern space will contain:
L | i | n | e | A | \n | L | i | n | e | B |
$ echo -e 'LineA\nLineB' | sed -e 'N' -e 's/\n/+/' ← Using "N" to read the next line and replacing the newline character "\n" with another string
LineA+LineB
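For example, if the pattern space contains the following content, 'LineA\nLineB\nLineC':
L | i | n | e | A | \n | L | i | n | e | B | \n | L | i | n | e | C |
then after the "D" command deletes everything up to and including the first newline, the pattern space contains:
L | i | n | e | B | \n | L | i | n | e | C |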
$ echo -e 'LineA\nLineB\nLineC' | sed -e 'N' -e 'N' -e 'D' ← Using "N" twice to read the next two lines, then "D" to delete the first line
LineB
LineC
$ echo -e 'LineA\nLineB\nLineC' | sed -e 'N' -e 'N' -ne 'P'
LineA
$ cat sed_scr10
/ID/{
    N #if matching "ID" append the next line
    /NAME/ {
        N #if 2nd line matching "NAME" append next line again
        /ADDRESS/ {
            D #delete 1st matched line
        }
    }
}
$ echo -e 'ID:123\nNAME:abc\nADDRESS:taipei' | sed -f sed_scr10
NAME:abc ← The line with "ID" was deleted
ADDRESS:taipei
$ echo -e 'ID:123\nSEX:m\nADDRESS:taipei' | sed -f sed_scr10
ID:123 ← If the patterns "ID", "NAME", and "ADDRESS" are not on consecutive, adjacent lines, nothing is deleted
SEX:m
ADDRESS:taipei
L | i | n | e | A |
L | i | n | e | B |
L | i | n | e | A | \n | L | i | n | e | B |
$ cat sed_scr11
/ID/{
    h # if matching "ID" pattern-space copy to hold-space
    n # read next line
    /NAME/ {
        G # append hold-space to pattern-space
    }
}
p # print the current-pattern-space
$ echo -e 'ID:123\nNAME:abc' | sed -nf sed_scr11
NAME:abc
ID:123
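Two short hold-space one-liners may help build intuition for "G" and "h". They are a sketch on made-up input, not taken from the script above. The first relies on the hold space being empty at the start, so "G" appends only a newline; the second accumulates the lines in reverse order and prints them at the last line:

```shell
# "G" appends the (still empty) hold space: every line gains a blank line.
double_spaced=$(printf 'a\nb\n' | sed 'G')

# 1!G : on every line except the first, append the hold space
# h   : copy the result back to the hold space
# $p  : on the last line, print the accumulated, reversed text
reversed=$(printf 'first\nsecond\nthird\n' | sed -n '1!G;h;$p')

printf '%s\n---\n%s\n' "$double_spaced" "$reversed"
```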
$ echo -e 'as can\aner can can a can '
as canner can can a can ← Output appears normal
$ echo -e 'as can\aner can can a can ' | sed -n '/can$/p' ← The extra space after the last "can" prevents the regex "can$" from matching
$ echo -e 'as can\aner can can a can ' | sed -n '/canner /p' ← The non-displayable "\a" character prevents matching
$ echo -e 'as can\aner can can a can ' | sed -n 'l' ← Using the "l" command to reveal hidden characters
as can\aner can can a can $
$ ls -l / | sed '/dev/ q'
total 142
drwxr-xr-x 2 root root 4096 2012-07-04 19:48 bin
drwxr-xr-x 4 root root 1024 2012-06-09 01:30 boot
drwxr-xr-x 13 root root 4060 2013-10-31 19:50 dev
$ ls -l / | sed '/dev/ Q'
total 142
drwxr-xr-x 2 root root 4096 2012-07-04 19:48 bin
drwxr-xr-x 4 root root 1024 2012-06-09 01:30 boot
$ sed '10 q' MyFile ← Display the first 10 lines of a file, equivalent to "head -n 10"
$ sed -e 's/a/A/' -e '/Hello/ q' MyFile ← Search and replace, but quit when "Hello" is encountered
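The equivalence with "head" can be checked directly. This sketch uses five made-up lines and stops at line 3; the "Q" variant assumes GNU sed, since "Q" is a GNU extension:

```shell
# "q" prints the current line and then quits.
with_sed=$(printf '1\n2\n3\n4\n5\n' | sed '3 q')
with_head=$(printf '1\n2\n3\n4\n5\n' | head -n 3)

# "Q" (GNU sed) quits without printing the current line.
with_Q=$(printf '1\n2\n3\n4\n5\n' | sed '3 Q')

printf '%s\n---\n%s\n---\n%s\n' "$with_sed" "$with_head" "$with_Q"
```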
$ echo -e 'LineA\nLineB' | sed -e 'n' -ne 'p'
LineB
$ sed '3 r INSERT.txt' MyFile ← Insert the content of the file "INSERT.txt" after the third line of "MyFile"
$ cat MyFile | sed '/ch1/,/ch2/ r INSERT.txt' ← Insert the content of "INSERT.txt" after every line in the range from the pattern "ch1" to the pattern "ch2"
$ sed '/ch1/ r INSERT.txt' MyFile ← Insert the content of the file "INSERT.txt" after the line that matches the pattern "ch1"
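A small sketch of "r" with temporary files standing in for "MyFile" and "INSERT.txt"; the file contents are made up. The inserted text appears right after the line that matches "ch1":

```shell
main=$(mktemp)    # stands in for MyFile
extra=$(mktemp)   # stands in for INSERT.txt
printf 'ch1\nbody\nch2\n' > "$main"
printf '[inserted]\n' > "$extra"

# The file name must be the last thing in the command, because "r"
# treats everything up to the end of the line as the name.
merged=$(sed "/ch1/ r $extra" "$main")
rm -f "$main" "$extra"
printf '%s\n' "$merged"
```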
$ cat fileA | sed 's/i/I/g' > fileB ← Use redirection to save the complete output to a different file
$ cat fileA | sed 's/i/I/g w fileB' ← COMMAND "w" writes only the lines where a substitution occurred to "fileB". Do not use the input file itself as the target: sed truncates the "w" file when it starts, so the original content can be lost
$ echo abcdefg | sed 'y/abc/ABC/'
ABCdefg
$ echo 'john smith' | sed 'y/nh/#&/'
jo&# smit&
$ echo '(2+3)*4' | sed 'y/()*/[]X/'
[2+3]X4
For those interested in delving further into sed, the following resources can be valuable references:
GNU sed Official Website (sed, a stream editor Examples):
Bruce Barnett's "Sed - An Introduction and Tutorial":