Linux Regular Expressions

The Linux Newbie Guide

Regular Expressions

1.0 Introduction to Regular Expressions
1.1 Basic Regular Expressions (RE/BRE)
Bracket Expressions (POSIX bracket notation)
POSIX Characters
. : Match any single character
* : Match the preceding character zero or more times
^ : Match the string at the start position
$ : Match the string at the end position
& : Remember the matched string
{a,b} : Match the preceding character a to b times
( ) : Group a string
< > : Match a whole word
( )\1 : Back-reference matching
| : Or match
1.2 Extended Regular Expressions (ERE)
| : Or match
+ : Match the preceding character one or more times
? : Match the preceding character zero or one time

1.0 Introduction to Regular Expressions
"Regular Expressions" (abbreviated as RE, regex or regexp) are used to "match a string or character(s) that meets a certain rule".

For example, a company's product catalogs sometimes spell the word "color" and sometimes "colour". To search for both spellings at once, I only need the pattern "colou*r", which finds both "color" and "colour". I also often use the regular expression "\<[^aeiouyAEIOUY]*\>" to find words without vowels, which are likely typos. A little time spent learning regular expressions pays off: they are very convenient and practical.
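Both patterns can also be tried on the command line. This is a minimal sketch assuming GNU grep (the default on most Linux distributions); the sample lines are made up for illustration.

```shell
# "colou*r" = "colo", then zero or more "u", then "r"
printf '%s\n' 'red color' 'blue colour' 'green colr' | grep 'colou*r'
# prints "red color" and "blue colour" ("colr" lacks the second "o")

# Words containing none of a, e, i, o, u, y: likely typos such as "gd"
printf '%s\n' 'a gd idea' 'a good idea' | grep -o '\<[^aeiouyAEIOUY]*\>'
# prints "gd"
```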

A matching description written as a regular expression or extended regular expression is called a "pattern", such as "\<[^aeiouyAEIOUY]*\>" or "colou*r" in the example above.

The above is just a small taste of regular expressions. Regular expressions were originally popular only in a few UNIX tools such as grep and sed, but because they are so powerful and convenient, they gradually spread elsewhere. For example, Notepad++, a commonly used text editor on Windows, supports regular expressions, so what you learn here can be applied in many places. They are well worth learning.

MS Word wildcard

Many people confuse regular expressions with wildcards and cannot tell them apart. This is no wonder: regular expressions and wildcards use overlapping symbols, but the symbols do not necessarily mean the same thing. Even more confusing, since UNIX V6 was released in 1975, "globbing patterns" (also called "globs") have extended the wildcard syntax, so wildcards also have Bracket Expressions, which overlap with regular expression syntax (although the results are not necessarily the same).

In most cases, wildcards can only be used to match file names (such as ls /dev/[sh]d*). Regular expressions can be thought of as an enhanced version of wildcards: they can also match the contents of files or the output of programs (e.g. seq 1 999 | grep '5\{2,3\}'), and they can describe strings that follow a certain rule more precisely and flexibly. But regular expressions also have obvious shortcomings: their syntax is not easy for us earthlings to read, and not all commands or tools support them.

Before explaining regular expressions, here is an example that illustrates the biggest disadvantage of wildcards. The goal is to use wildcards to list the files or directories in "/etc" whose first character is an uppercase letter.

Example:

$ cd /etc
$ LANG=POSIX ← Set the locale to "POSIX" (equivalent to "LANG=C"; "LANG=" clears the language setting)
$ ls -d [A-Z]* | head -n 5 ← Use wildcards to list the first 5 files or directories whose first character is uppercase
ConsoleKit
DIR_COLORS
Muttrc
DIR_COLORS.xterm
Muttrc.local
$ LANG=en_US.UTF-8 ← Set the locale to "en_US.UTF-8"
$ ls -d [A-Z]* | head -n 5 ← Repeat the same command
bashrc ← The output is now different
blkid
bluetooth
bonobo-activation
capi.conf

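The experiment above illustrates one of the biggest disadvantages of wildcards: the same command can produce different output on different machines or environments, because the language (locale) setting can change which characters a range such as "[A-Z]" covers and how the results are sorted. (Newer shells may not reproduce this; for example, bash 5.0 and later enable the globasciiranges option by default, which makes such ranges follow ASCII order.)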

If the same task is rewritten using ls together with grep, which supports regular expressions, the output is consistent: the result no longer depends on how wildcards behave in different environments.

Example:

$ cd /etc
$ LANG=POSIX
$ ls -d * | grep '^[A-Z]' | head -n 5 ← Use ls with grep to list the first 5 files or directories whose first character is uppercase
ConsoleKit
DIR_COLORS
DIR_COLORS.xterm
Muttrc
Muttrc.local
$ LANG=en_US.UTF-8 ← Set the locale to "en_US.UTF-8"
$ ls -d * | grep '^[A-Z]' | head -n 5 ← Run the command again and check the results
ConsoleKit ← The results are now consistent
DIR_COLORS
DIR_COLORS.xterm
Muttrc
Muttrc.local
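One detail of the grep pattern deserves a closer look: without "^", "[A-Z]" matches an uppercase letter anywhere in the name, not only as the first character. A minimal sketch, assuming GNU grep and using made-up file names rather than a real /etc listing:

```shell
export LC_ALL=C   # pin the locale so [A-Z] means ASCII A..Z
# Hypothetical file names, one per line
printf '%s\n' ConsoleKit bashrc X11 fooBar | grep '^[A-Z]'  # ConsoleKit, X11
printf '%s\n' ConsoleKit bashrc X11 fooBar | grep '[A-Z]'   # also fooBar (uppercase "B" inside)
```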

Regular expressions themselves are not difficult, but writing patterns that match correctly takes experience and practice. Regular expressions come in two flavors: basic regular expressions (BRE) and extended regular expressions (ERE). Support for them varies among software tools; the table below shows the level of support in commonly used Unix/Linux tools.

Utility	basic regular expressions	extended regular expressions
shell
vi	○
locate	○
find	○	○
grep	○	○
sed	○	○
awk	○	○


1.1 Basic Regular Expressions (RE/BRE)

Basic Regular Expressions are often abbreviated as regex, RE, BRE or "posix-basic". They are the most fundamental form of regular expressions, and unless otherwise stated, "regular expression" usually means a basic regular expression. If the man page of a command says it supports "regexp", "regex" or "posix-basic", it supports Basic Regular Expressions.

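In Basic Regular Expressions, the characters "{ }", "( )", "< >" and "|" are ordinary literal characters by default; to give them their special meaning, you must escape them with a backslash, as in "\{ \}", "\( \)", "\< \>" and "\|". This is the most notable difference between Basic Regular Expressions and Extended Regular Expressions (ERE), where "{ }", "( )" and "|" are special without a backslash. (Strictly speaking, "\< \>" and "\|" are GNU extensions rather than part of POSIX BRE, but GNU tools and vim support them.)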
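Because this escaping rule is easy to get backwards, here is a quick check, assuming GNU grep: the same pattern text means different things in BRE (plain grep) and ERE (grep -E). The input string is made up for illustration.

```shell
# BRE: "{" is literal; "\{2\}" is the repetition operator
echo 'aa a{2}' | grep -o 'a\{2\}'    # aa
echo 'aa a{2}' | grep -o 'a{2}'      # a{2}  (literal braces)
# ERE: the other way round
echo 'aa a{2}' | grep -oE 'a{2}'     # aa
echo 'aa a{2}' | grep -oE 'a\{2\}'   # a{2}
```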

Because the vi (vim) editor supports regular expressions well and highlights the matched strings, it is a good place to practice. The following examples therefore use searches in vi's normal mode.
(If the matched strings are not highlighted, enter ":set hlsearch" in vi or add it to the vi configuration. On some Linux distributions such as Fedora, you may need to install "vim-enhanced" to get highlighting; Fedora users can install it with "sudo dnf install vim-enhanced" or "sudo yum install vim-enhanced".)

Enter the following English tongue twisters in vim and save them as "re.txt". The file will be used again when introducing Extended Regular Expressions, grep and sed (you can copy and paste it if you do not want to type it out):

(vi editor)

busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.
can you can a can as a canner can can a can

google Goggles Solves SUDOKU Puzzles.
How much oil boil can a gum boil boil if a gum boil can boil oil?
Where's the peck of pickled peppers Peter Piper picked?
55 Flags freely flutter from the floating frigate


Bracket Expressions (POSIX Bracket Expressions)
Bracket Expressions are also supported by the shell itself (as wildcards). When used in regular expressions, however, they are much less affected by the environment or locale than shell wildcards. (Strictly speaking, POSIX only guarantees the meaning of a range such as "[a-d]" in the C/POSIX locale; the GNU grep manual suggests setting LC_ALL=C when you need the traditional interpretation.) The most notable feature of a bracket expression is that the characters to be matched are enclosed in square brackets "[ ]", and no matter how complex the content inside is, the whole expression matches exactly one character. Common usages are as follows:

[One or more characters] : Matches a single character
Any characters placed inside the square brackets "[ ]" form a set, and the expression matches any one character from that set. For example, "[xyz]" matches the character "x", "y" or "z", but not the string "xyz".

(vi editor)

busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.

(The middle output portion is omitted)

/bu[nms] ← In vi, search for "bu" followed by "n", "m" or "s"

(If the matched text is not highlighted, enter ":set hlsearch" in vi. Note that highlighting is supported only by vim, but most distributions have replaced vi with vim.)
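The same search can be run with grep; "-o" prints each match on its own line, which makes it easy to see exactly what the bracket expression matched. A minimal sketch, assuming GNU grep, using the first line of re.txt:

```shell
echo 'busy buzzing bumblebees buzzing busying' | grep -o 'bu[nms]'
# bus, bum, bus  (one per line; "buz" is not in the set)
```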

[Start character - end character]: Specify the range of characters to match
To match one character from a range, write the start and end characters in ascending order, separated by "-", with no spaces between the characters and the "-".

For example, "[x-z]" matches the character "x", "y" or "z", and is equivalent to "[xyz]".
The "-" binds tightly to the characters on its left and right. For example, "[ab-z]" does not mean the characters "a", "b", "-" or "z"; it means "a" or any character from "b" to "z", so the result is the same as "[a-z]". If you like, the same set could also be written as "[abc-ijkl-z]".

Commonly used usage is as follows:

Regular Expressions	Matches
[0-9]	any number
[a-z]	all lowercase letters
[A-Z]	all uppercase letters
[a-zA-Z] or [A-Za-z]	all letters
[0-9a-zA-Z]	any numbers and letters

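Avoid ambiguous ranges such as "[a-Z]", "[0-z]", "[A-9]" or "[0-/]". In ASCII order, "[a-Z]", "[A-9]" and "[0-/]" run backwards and are invalid, and "[0-z]" silently includes punctuation such as ":" and "@"; moreover, outside the C/POSIX locale the ordering is not guaranteed to follow ASCII, so different tools can produce different results or report errors.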

Some expressions look strange but are legal: "[6-9-]" or "[-6-9]" matches "6" to "9" or "-", because a "-" written first or last inside the brackets is an ordinary character.
An expression such as "[6-9-z]" is harder to read. Read it from left to right, binding each "-" to the characters around it; a character already used as the end of a range cannot start another one, so "[6-9-z]" is read as "6-9", "-" and "z", matching "6" to "9", "-" or "z". However, POSIX leaves the meaning of such an expression undefined and some tools may reject it, so it is safer to write "[-6-9z]".
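The range rules above can be checked with grep -o, which prints every single character the bracket expression matched. A sketch assuming GNU grep, with the locale pinned to C so ranges follow ASCII order; the input strings are made up:

```shell
export LC_ALL=C
echo 'x y z w' | grep -o '[x-z]'    # x, y, z
echo 'a m - z' | grep -o '[ab-z]'   # a, m, z  ("-" is not in the set)
echo '5 7 - z' | grep -o '[-6-9]'   # 7, -     (a leading "-" is literal)
```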


[^ ]: Reverse matching
To negate the set, put "^" first: for example, "[^xyz]" matches any character except "x", "y" or "z".

Negation also works with ranges: "[^A-Z]" matches any character except uppercase letters.

The negating "^" must be the first character inside "[ ]". For example, "[0-9^A-Z]" is not a negated set; it matches any digit, any uppercase letter or the symbol "^".
Therefore, "[^^0-9]" or "[^0-9^]" matches any character except the digits 0-9 and the symbol "^".
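A short grep sketch of the negation rules (assuming GNU grep, C locale, made-up input); note how "^" is only special in the first position:

```shell
export LC_ALL=C
echo 'x1Y^' | grep -o '[^xyz]'      # 1, Y, ^  (only lowercase x, y, z are excluded)
echo 'aB3^' | grep -o '[^A-Z]'      # a, 3, ^
echo 'aB3^' | grep -o '[0-9^A-Z]'   # B, 3, ^  ("^" not first: a literal caret)
echo 'aB3^' | grep -o '[^^0-9]'     # a, B
```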



POSIX Characters
POSIX character classes also exist in shell wildcards. Their biggest advantage is that their meaning does not depend on the collation order of a locale the way ranges such as "[a-z]" do, and they make patterns in regular expressions simpler to write.
For example, to match all punctuation and symbols you might write "[^0-9a-zA-Z ]", but that would still also match non-printing (whitespace) characters such as tabs, whereas the POSIX class "[:punct:]" is simple yet powerful.

In practice a POSIX class must be written inside another pair of square brackets, as in "[[:punct:]]": the outer "[ ]" delimits the bracket expression, and the inner "[: :]" is the class name.
Because POSIX classes are elements of bracket expressions, they can be combined with the other bracket features introduced above, such as "[[:upper:]xyz]", "[^[:upper:]0-9]" or "[^[:upper:][:lower:]]".

The meanings represented by each POSIX-character are as follows:

POSIX character classes and the characters they match
POSIX	ASCII equivalent	Description	Note
[:alnum:]	[A-Za-z0-9]	English letters and digits
[:alpha:]	[A-Za-z]	English letters
[:blank:]	[ \t]	Space (ASCII 0x20) and TAB (ASCII 0x09)
[:cntrl:]	0x00-0x1F, 0x7F	Control characters
[:digit:]	[0-9]	Digits
[:graph:]	0x21-0x7E	Visible characters
[:upper:]	[A-Z]	Uppercase letters
[:lower:]	[a-z]	Lowercase letters
[:print:]	0x20-0x7E	Visible characters plus space
[:punct:]	[\]\[!"#$%&')(*+,./:;<=>?@\^_`{\|}~-]	Punctuation and symbols
[:space:]	[ \t\r\n\v\f]	Whitespace (non-printing characters)	Whitespace characters (such as newline) are not displayed, but Linux has fixed representations for some common ones (see the "Linux Common Representations" column of the ASCII table in [Note])
[:xdigit:]	[A-Fa-f0-9]	Hexadecimal digits
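A few of the classes in the table can be tried with grep -o; this sketch assumes GNU grep and a made-up input line:

```shell
s='Hello, World! 42'
echo "$s" | grep -o '[[:punct:]]'               # , and !
echo "$s" | grep -o '[[:upper:]]'               # H and W
echo "$s" | grep -o '[[:digit:]][[:digit:]]*'   # 42
echo "$s" | grep -o '[^[:alpha:][:space:]]'     # , ! 4 2  (classes combined and negated)
```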


. : Match any single character
A "." matches any single character (a character must be present), somewhat like the "?" wildcard.

"." stands for exactly one character of any kind: "x.y" matches "xay", "xBy" or "xxy", but not "xy".
Inside a bracket expression, "." loses its special meaning and is an ordinary character: "[x.y]" matches only "x", "." or "y".

So "." must be written outside the square brackets to mean "any character". For example, "[A-Z]." matches an uppercase letter followed by any character,
whereas "[A.Z]" matches only "A", "." or "Z".

On Linux and Unix-like systems, "." matches any character, including symbols, spaces and tabs; the usual exception is the newline character (and, in multibyte locales such as UTF-8, byte sequences that are not valid characters).

Example: (vi operation)

busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.

(The middle output portion is omitted)

/s.ll ← Because "." matches any character, both "sill" and "sell" match
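The same search with grep, plus a check that "." needs exactly one character to be present. A sketch assuming GNU grep; the second input line is made up:

```shell
echo '6 silly sisters selling shining shoes' | grep -o 's.ll'   # sill, sell
echo 'xy xay x.y' | grep -o 'x.y'   # xay, x.y  ("xy" has nothing between x and y)
```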

* : Match the preceding character zero or more times
Although "*" is the same symbol as the wildcard, its function is quite different: in regular expressions, "*" means zero or more occurrences of the preceding character.
Pay close attention to which character is the "preceding" one. In the pattern "x*z", the character before "*" is "x", so "x" may appear zero or more times; the pattern therefore matches "z", "xz", "xxz", "xxxxxxxxxxz" and so on.

Because "*" also allows zero occurrences, when you only want to find lines that contain one or more consecutive uppercase letters (such as "A", "BB" or "ABCDEFG"), patterns such as "[A-Z][0-9]*" or "[A-Z]g*" would select the same lines as "[A-Z][A-Z]*", since each of them is satisfied by a single uppercase letter. It is still better to write "[A-Z][A-Z]*": it states the intention, so the pattern is easier to understand, and when the exact matched text matters (highlighting or substitution) it is the one that actually covers the whole run of uppercase letters. Similarly, to match anything at all, write ".*" (any "thing", including nothing), which is more readable than "A*" or "3*", even though those also match every line.
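The difference shows up as soon as you look at what was actually matched. A sketch assuming GNU grep, C locale and a made-up input:

```shell
export LC_ALL=C
echo 'ABC1' | grep -o '[A-Z][A-Z]*'   # ABC        (the whole run of capitals)
echo 'ABC1' | grep -o '[A-Z][0-9]*'   # A, B, C1   (one capital plus optional digits)
echo 'abc' | grep -c 'A*'             # 1: "A*" can match the empty string, so every line matches
```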

Here are some practical examples:

Regular Expression	Matches	Example
.*	Zero or more characters of any kind	[\]\[!"#$ABCabc0123\a\t etc.
[0-9][0-9]*	One or more consecutive digits	0, 11, 12345, 543543543
[A-Z][A-Z]*	One or more consecutive uppercase letters	Y, IJK, ZZZZZZ
[a-z][a-z]*	One or more consecutive lowercase letters	z, xyz, abcdefghijk
Goo*gle	One or more "o" between "G" and "gle"	Gogle, Google, Gooooooooooogle
yaho*	"yah" followed by zero or more "o"	yah, yaho, yahoooooooooooooo
.*k	Everything up to and including the last "k" on the line	This is a book
G.*	From "G" to the end of the line	Good morning Mr. Chen


Example: (vi operation)

In the example above, the pattern "can*" lets the "n" before "*" appear zero or more times, so "ca", "can", "cann" and "cannnnnn" all match.
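A grep version of the same idea, assuming GNU grep; the first line comes from re.txt and "cat" is a made-up extra:

```shell
echo 'can you can a can as a canner can can a can' | grep -o 'can*'
# can, can, can, cann, can, can, can  ("canner" contributes "cann")
echo 'cat' | grep -o 'can*'   # ca  (zero "n" also counts)
```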


^ : Match the string at the start position
If you add "^" in front of the pattern, it means that the pattern must be at the beginning of a certain line to be considered a match.

Example: (vi operation)

In the example above there are many "can" strings, but with "^" in front of the pattern, a string counts as a real match only if it also sits at the start of a line.
Also, do not confuse this with the negated set "[^ ]": "[^ ]" is negation, whereas "^[ ]" anchors the bracket expression to the start of a line.

Example: (vi operation)

busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.
can you can a can as a canner can can a can

google Goggles Solves SUDOKU Puzzles.

(The middle output portion is omitted)

/^[^A-Z] ← Match lines whose first character is not an uppercase letter
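The "^" examples can be reproduced with grep on a copy of re.txt. This sketch, assuming GNU grep, writes the tongue twister to re.txt in the current directory (overwriting any existing file of that name):

```shell
export LC_ALL=C
cat > re.txt <<'EOF'
busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.
can you can a can as a canner can can a can

google Goggles Solves SUDOKU Puzzles.
How much oil boil can a gum boil boil if a gum boil can boil oil?
Where's the peck of pickled peppers Peter Piper picked?
55 Flags freely flutter from the floating frigate
EOF
grep -o 'can' re.txt | wc -l    # 9: every "can" in the file
grep -o '^can' re.txt | wc -l   # 1: only the one at the start of a line
grep '^[^A-Z]' re.txt           # 5 lines (the empty line has no first character)
```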

$ : Match the string at the end position
If you add "$" after the pattern, the pattern must be at the end of a line to be considered a match.

Example: (vi operation)

In the example above there are many "an" strings, but with "$" at the end of the pattern, a string counts as a real match only if it is also at the end of a line.
Therefore, combining "^" and "$" as "^$" matches an empty line (nothing between the start and the end of the line), and "^hello$" matches a line that contains only "hello" and nothing else.
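A grep sketch of "$" and "^$", assuming GNU grep; it recreates re.txt in the current directory, and the "hello" lines are made up:

```shell
cat > re.txt <<'EOF'
busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.
can you can a can as a canner can can a can

google Goggles Solves SUDOKU Puzzles.
How much oil boil can a gum boil boil if a gum boil can boil oil?
Where's the peck of pickled peppers Peter Piper picked?
55 Flags freely flutter from the floating frigate
EOF
grep -c 'an$' re.txt   # 1: only "...can a can" ends with "an"
grep -n '^$' re.txt    # 5:  (line 5 is the empty line)
printf '%s\n' 'hello' 'hello world' | grep -c '^hello$'   # 1
```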

Example: (vi operation)

& : Remember the matched string
"&" stands for the string that was just matched: it acts like a variable whose value is the matched string. Note that "&" is not part of the pattern itself; it is used in the replacement part of a substitution (in vi or sed).
For example, if the regular expression "colou*r" matches the string "color", then &=color; if it matches "colour", then &=colour.

Because "&" is mainly used for replacement (the replacement text can include the matched string through "&"), the following example uses a substitution in vi command-line mode, which should make this clearer.

Example: (Use the replacement of the vi command mode to test)

busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes
(The) driver was drunk and drove the doctor's car into deep ditch.
can you can a can as a canner can can a can

google (Goggles) (Solves) SUDOKU (Puzzles).

(The middle output portion is omitted)

:1,$ s/[A-Z][a-z][a-z]*/(&)/g ← Put "( )" around words that start with an uppercase letter followed by lowercase letters

In the example above, the first match is "The", so &=The, and replacing it with "(&)" gives "(The)"; the next match is "Goggles", and so on.
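The same substitution works in sed, where "&" has the same meaning in the replacement. A sketch assuming GNU sed, C locale, and two lines taken from re.txt:

```shell
export LC_ALL=C
echo 'google Goggles Solves SUDOKU Puzzles.' | sed 's/[A-Z][a-z][a-z]*/(&)/g'
# google (Goggles) (Solves) SUDOKU (Puzzles).
echo 'The driver was drunk' | sed 's/[A-Z][a-z][a-z]*/(&)/g'
# (The) driver was drunk
```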


{a,b} : Match the preceding character a to b times
Because "*" can only match the preceding character zero to infinitely many times, it is not always enough. To specify how many times the preceding character repeats, use "{a,b}"; for example, "o{2,4}" matches 2 to 4 consecutive "o" characters (written "o\{2,4\}" in basic regular expressions).
In "{a,b}", a is the minimum and b the maximum number of repetitions of the preceding character; "{a}" means exactly a repetitions.

Note that in basic regular expressions, "{ }", "( )", "< >" and "|" only take on their special meanings when escaped, so the interval must be written as "\{a,b\}".

Example: (vi operation)

What are the numbers for I II III IV V VI VII VIII VIIII in Roman numerals?

(The middle output portion is omitted)

/VI\{2,3\} ← Search for "V" followed by 2 to 3 consecutive "I"

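In addition, if one of the numbers in "{a,b}" is omitted, there is no limit on that side: "\{a,\}" means at least a repetitions with no upper limit, and "\{,b\}" means at most b repetitions (the empty lower bound is a GNU/vim extension; POSIX requires the lower bound).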

Example: (vi operation)

gogle google gooogle goooogle gooooogle is a gd god good goood

(The middle output portion is omitted)

/go\{,2\}d ← Search for "g", at most two "o", then "d"

Example: (vi operation)

gogle google gooogle goooogle gooooogle is a gd god good goood

(The middle output portion is omitted)

/o\{3\} ← Search for exactly three consecutive "o" (a longer run also contains a match; use "o\{3,\}" to match the whole run)
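The interval examples above, run through grep -o so each match is visible. A sketch assuming GNU grep (the "\{,2\}" form relies on the GNU extension mentioned above):

```shell
echo 'I II III IV V VI VII VIII VIIII' | grep -o 'VI\{2,3\}'   # VII, VIII, VIII
line='gogle google gooogle goooogle gooooogle is a gd god good goood'
echo "$line" | grep -o 'go\{,2\}d'   # gd, god, good
echo "$line" | grep -o 'o\{3\}'      # ooo four times: each match is exactly three "o"
echo "$line" | grep -o 'o\{3,\}'     # ooo, oooo, ooooo, ooo: the whole run
```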

( ) : Group a string
The "*" and "{ }" introduced above repeat a single preceding character, but what about repeating a whole string, as in "AwxywxywxyB"? Enclose the string in "( )" so that it is treated as one unit. For example, "A(wxy)*B" (written "A\(wxy\)*B" in basic regular expressions) matches "AB", "AwxyB" or "AwxywxywxyB".

Combining a group with "*", or with the extended regular expression operators "+" and "?", applies the repetition to the whole group. Common usages are:

"( )*" : The group repeated zero or more times.
"( )+" : The group repeated one or more times (extended regular expressions only).
"( )?" : The group repeated zero or one time (extended regular expressions only).

Example: (vi operation)

1,000 milliliter equal 1 liter.

(The middle output portion is omitted)

/\(li\)*ter ← Match "ter" preceded by zero or more "li"
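Groups behave the same way in grep; note the escaped "\( \)" in BRE versus plain "( )" with "-E". A sketch assuming GNU grep; the "Awxy" line is made up:

```shell
echo 'AwxywxywxyB AB AwxB' | grep -o 'A\(wxy\)*B'    # AwxywxywxyB, AB
echo 'AwxywxywxyB AB AwxB' | grep -oE 'A(wxy)+B'     # AwxywxywxyB only
echo '1,000 milliliter equal 1 liter.' | grep -o '\(li\)*ter'   # liliter, liter
```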


< > : Match a whole word
Matching individual words is sometimes harder than you might think. For example, if you only want to find the word "oil", a plain "oil" also matches inside words such as "boil" or "unspoiled", which are not what you want. The simplest way to avoid this headache is to surround the pattern with the word-boundary anchors "\<" and "\>", as in "\<oil\>".

Example: (vi operation)

Word matching can also be fun to explore. Almost every English word contains a vowel: A, E, I, O or U, or Y in words such as "by" and "fly". So I can use "\<[^aeiouyAEIOUY]*\>" as a first pass to find words without any of these letters, and check them for possible misspellings.
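Word boundaries in grep, assuming GNU grep (where "\<" and "\>" are supported as an extension); the input lines are made up:

```shell
echo 'oil boil unspoiled oil' | grep -o 'oil' | wc -l       # 4: also inside "boil" and "unspoiled"
echo 'oil boil unspoiled oil' | grep -o '\<oil\>' | wc -l   # 2: whole words only
echo 'is a gd god good' | grep -o '\<[^aeiouyAEIOUY]*\>'    # gd
```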

( )\1 : Back-reference matching
"&" can remember only one matched string (the whole match). To remember several parts of a match, use back-references: "( )" marks the part to remember, and "\1" refers to the text matched by the first group.
There can be at most 9 groups, referred to as "\1" through "\9", as in "( )( )( )...\1\2\3...\9" (in basic regular expressions, each group is written "\( \)").
For example, if "\(abc\)\(def\)\(ghi\)\1\2\3" matches, then \1=abc, \2=def and \3=ghi.

Where are back-references useful? Before reading on, try this exercise: use a regular expression to match two adjacent, identical letters (such as "bb", "ee", "AA" or "GG").

Maybe there is another way, but I cannot think of one; with a back-reference, however, "\([a-zA-Z]\)\1" does the job.

Example: (vi operation)

busy buzzing bumblebees buzzing busying
6 silly sisters selling shining shoes

(The middle output portion is omitted)

/\([a-zA-Z]\)\1 ← Search for two adjacent identical letters

In the example above, "[a-zA-Z]" matches any English letter, so the search starts with the first character "b" at the beginning of the text. At that moment "\1" equals "b", so the pattern looks for "bb". It then moves on: "[a-zA-Z]" matches the next character "u", "\1" becomes "u", and the pattern looks for "uu", and so on, until it finds two adjacent identical letters such as the "zz" in "buzzing".

Because up to 9 groups can be referenced with "\1" to "\9", you can build more complicated patterns. For example, what does "\([a-z]\)\([a-z]\)\2\1", which swaps the order of the two references, match? Think about it, or test it yourself.
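The doubled-letter search also works in grep, which lists every pair it finds. A sketch assuming GNU grep, using two lines of re.txt:

```shell
echo 'busy buzzing bumblebees buzzing busying' | grep -o '\([a-zA-Z]\)\1'   # zz, ee, zz
echo '6 silly sisters selling shining shoes' | grep -o '\([a-zA-Z]\)\1'     # ll, ll
```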

Backward reference memory matching is not only used for searching, but also commonly used for replacing, as in the following example.

Example: (Use the replacement of the vi command mode to test)

busy buzzing bumblebees buzzing busying
silly 6 sisters selling shining shoes
The driver was drunk and drove the doctor's car into deep ditch.

(The middle output portion is omitted)

:1,$ s/\(\<[0-9]*\>\) \(\<[a-z]*\>\)/\2 \1/g ← If a word made of digits is followed by a word of lowercase letters, swap them

In the example above, "\(\<[0-9]*\>\)" matches the number "6", so \1=6, and "\(\<[a-z]*\>\)" matches the word "silly", so \2=silly. Writing "\2" before "\1" in the replacement swaps the two words.
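The same swap as a sed one-liner; this sketch assumes GNU sed (for "\<" and "\>") and the C locale for the ranges:

```shell
export LC_ALL=C
echo '6 silly sisters selling shining shoes' |
  sed 's/\(\<[0-9]*\>\) \(\<[a-z]*\>\)/\2 \1/g'
# silly 6 sisters selling shining shoes
```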

| : Or match
The or match "|" belongs to extended regular expressions, but modern Linux and other Unix-like systems also support it in basic regular expressions, written with a backslash as "\|" to tell it apart (this is a GNU extension rather than POSIX). Without "|", basic regular expressions would feel rather limited.

An or match combines the patterns on the left and right of "|": the expression matches if either side matches. For example, "[1-3]\|xyz" matches a digit from 1 to 3 or the string "xyz".

Example: (vi operation)

In addition, you can enclose the "|" in "( )" to limit its scope; for example, "iphone\(4s\|5\)" matches both "iphone4s" and "iphone5".
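Both BRE or-match examples with grep, assuming GNU grep (which accepts "\|" as an extension) and made-up input:

```shell
export LC_ALL=C
echo 'iphone4s iphone5 iphone6' | grep -o 'iphone\(4s\|5\)'   # iphone4s, iphone5
echo 'x 2 xyz 7' | grep -o '[1-3]\|xyz'                       # 2, xyz
```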


1.2 Extended Regular Expressions (ERE)


But why do extended regular expressions exist at all? My personal guess (which may not be correct; I will check when I have time) is that the or match "|" was left out when basic regular expressions were defined.

Is "|" really important enough to justify a new syntax? Yes! For a simple example, suppose I want to match the word "as" or "if" without using "|". I might write "[ai][sf]", but then I must hope it does not also match the word "is".

However, simply adding "|" as an or operator to basic regular expressions would cause compatibility problems: existing patterns may use "|" to match the literal character "|", not to perform an or match. The solution was therefore to keep basic regular expressions compatible by writing the or match as "\|", and to define a new syntax, "extended regular expressions", in which "|" means or directly, with a few other changes made along the way.

Extended Regular Expressions, or "posix-extended", are often abbreviated as ERE. They differ from basic regular expressions as follows.

Add the or match "|" (written "\|" in basic regular expressions).
Drop the backslashes: "\{ \}" and "\( \)" in basic regular expressions are written "{ }" and "( )", without the unsightly escape characters.
Add the metacharacters "+" and "?".

So extended regular expressions are nothing special, are they? Generally, a command that supports extended regular expressions enables them with the option "-E", as in grep -E; grep is a typical command that supports extended regular expressions.

The new items added to the extended regular expression are as follows:

| : Or match
It works the same as the or match of basic regular expressions, but in extended regular expressions no escape character is needed. Because vi does not support extended regular expressions, the following examples use grep instead.

Example:

$ grep -E 'ca(r|n)' re.txt ← Match "car" or "can" (note: no escape characters needed)
The driver was drunk and drove the doctor's car into deep ditch.
can you can a can as a canner can can a can
How much oil boil can a gum boil boil if a gum boil can boil oil?
$ grep 'ca\(r\|n\)' re.txt ← Without "-E" (basic regular expressions), the group and the or match need escape characters
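A self-contained check that the ERE and BRE spellings select the same lines, assuming GNU grep and a few made-up lines instead of re.txt:

```shell
printf '%s\n' 'a red car' 'you can' 'a cat' | grep -E 'ca(r|n)'    # a red car, you can
printf '%s\n' 'a red car' 'you can' 'a cat' | grep 'ca\(r\|n\)'    # same two lines (BRE)
```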

+ : Match the preceding character one or more times
This is equivalent to "\{1,\}" in basic regular expressions. For example, "(xyz)+" matches "xyz", "xyzxyz" or "xyzxyzxyz".

Example:

$ grep -E 'go+gle' re.txt ← List "gogle", "google", "goooooooooooogle", etc.
google Goggles Solves SUDOKU Puzzles.
$ seq 1 10000 | grep -E '^199+' ← List numbers that start with "19" followed by one or more "9"
199
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
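Counting the matches confirms the listing above, and shows that the BRE spelling "\{1,\}" selects the same numbers. A sketch assuming GNU grep and the seq utility; the "gogle" line is made up:

```shell
seq 1 10000 | grep -cE '^199+'       # 11: 199 and 1990-1999
seq 1 10000 | grep -c '^199\{1,\}'   # 11: the BRE spelling of the same pattern
echo 'gogle google gooogle ggle' | grep -oE 'go+gle'   # gogle, google, gooogle
```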

? : Match the preceding character zero or one time
It is equivalent to the usage of "{0,1}" in the basic regular expressions.

Example:

$ grep -E '[+-]?[0-9]+\.[0-9]+([Ee][+-][0-9]+)?' fileA ← List the real numbers with a decimal point (optional sign, optional exponent such as "E+23") in "fileA"
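Since "fileA" is not shown, here is a sketch with made-up input lines (assuming GNU grep); "-o" prints just the numbers the pattern picked out:

```shell
printf '%s\n' 'pi is 3.14' 'x = -0.5' 'N = 6.02E+23' 'answer 42' |
  grep -oE '[+-]?[0-9]+\.[0-9]+([Ee][+-][0-9]+)?'
# 3.14
# -0.5
# 6.02E+23   ("42" has no decimal point, so it is not listed)
```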


[Note]
ASCII table (source: https://en.wikipedia.org/wiki/ASCII)

Dec	Hex	Abbr	Linux Common Representations	Name	Dec	Hex	Glyph	Dec	Hex	Glyph	Dec	Hex	Glyph
0	0	NUL		Null	32	20	(space)	65	41	A	98	62	b
1	1	SOH		start of heading	33	21	!	66	42	B	99	63	c
2	2	STX		start of text	34	22	"	67	43	C	100	64	d
3	3	ETX		end of text	35	23	#	68	44	D	101	65	e
4	4	EOT		end of transmission	36	24	$	69	45	E	102	66	f
5	5	ENQ		enquiry	37	25	%	70	46	F	103	67	g
6	6	ACK		acknowledge	38	26	&	71	47	G	104	68	h
7	7	BEL	\a	bell	39	27	'	72	48	H	105	69	i
8	8	BS	\b	backspace	40	28	(	73	49	I	106	6A	j
9	9	TAB	\t	horizontal tab	41	29	)	74	4A	J	107	6B	k
10	0A	LF	\n	line feed, new line	42	2A	*	75	4B	K	108	6C	l
11	0B	VT	\v	vertical tab	43	2B	+	76	4C	L	109	6D	m
12	0C	FF	\f	NP form feed, new page	44	2C	,	77	4D	M	110	6E	n
13	0D	CR	\r	carriage return	45	2D	-	78	4E	N	111	6F	o
14	0E	SO		Shift out	46	2E	.	79	4F	O	112	70	p
15	0F	SI		Shift in	47	2F	/	80	50	P	113	71	q
16	10	DLE		data link escape	48	30	0	81	51	Q	114	72	r
17	11	DC1		device ctrl. 1 (XON, resume transmission in software flow control)	49	31	1	82	52	R	115	73	s
18	12	DC2		device ctrl. 2	50	32	2	83	53	S	116	74	t
19	13	DC3		device ctrl. 3 (XOFF, pause transmission in software flow control)	51	33	3	84	54	T	117	75	u
20	14	DC4		device ctrl. 4	52	34	4	85	55	U	118	76	v
21	15	NAK		negative ack.	53	35	5	86	56	V	119	77	w
22	16	SYN		syn. idle	54	36	6	87	57	W	120	78	x
23	17	ETB		end of trans. block	55	37	7	88	58	X	121	79	y
24	18	CAN		cancel	56	38	8	89	59	Y	122	7A	z
25	19	EM		end of medium	57	39	9	90	5A	Z	123	7B	{
26	1A	SUB		substitute	58	3A	:	91	5B	[	124	7C	|
27	1B	ESC		escape	59	3B	;	92	5C	\	125	7D	}
28	1C	FS		file separator	60	3C	<	93	5D	]	126	7E	~
29	1D	GS		group separator	61	3D	=	94	5E	^	127	7F	DEL (invisible)
30	1E	RS		record separator	62	3E	>	95	5F	_
31	1F	US		unit separator	63	3F	?	96	60	`
127	7F	DEL		delete	64	40	@	97	61	a