
 Linux Filtering Programs

Filter Introduction
        grep : Search for a string in a file
           egrep
           fgrep
        cut : Extract fields
        col : Filter control characters
        tr : Character translation
        sort : Sort lines
        uniq : Remove adjacent duplicate lines

    Filter Introduction

Unix/Linux has a special group of software tools called "filters." Some filters cannot directly accept input from the keyboard or be used independently. Instead, they operate through pipes or redirection. Filters primarily work on a line-by-line basis to filter, search, modify, replace, insert, delete, or perform statistical operations on data.

The output of filters is usually directed to stdout, which is the screen. If you want to save the output to a file, you need to redirect it, such as using cat fileA | tr -s '\n' > fileB.

Commonly used filter tools in Unix/Linux include grep, cut, col, tr, uniq, sort, sed, and awk. Among them, sed and awk have their own scripting languages and are relatively complex, so they will be explained separately in the sed and awk sections.
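For instance, several of these filters can be chained in a single pipeline. A small preview (using /etc/passwd, which also appears in the examples below):

Example:
$ grep '/home' /etc/passwd | cut -d":" -f1 | sort ←extract the login names of the regular accounts and sort them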


grep : Search for a string in a file
The name grep comes from the search command "g/re/p" (global/regular expression/print) in the ed and ex line editors. As the name suggests, the main function of grep is to search for text in files.

The command find is powerful for finding file names, but it is powerless when it comes to searching the contents of files. Fortunately, Unix/Linux has the incredibly powerful tool grep, which can be used with regular expressions to search for file contents.

The basic usage of grep is grep PATTERN FILE, where the regular expression pattern (template) is used to match and find the corresponding strings in the file.

For example, to find out which accounts are set up on the system by looking for the keyword "/home" in the account configuration file "/etc/passwd," you can use the command grep /home /etc/passwd to display the lines that contain the string "/home".

E.g.:
$ grep home /etc/passwd ←will display the lines in the file "/etc/passwd" that contain the string "home". This can be used to see the currently established accounts.
aaa:x:500:500::/home/aaa:/bin/bash
bbb:x:501:501::/home/bbb:/bin/bash
frank:x:502:502:Frank Wang:/home/frank:/bin/bash
phoebe:x:503:503::/home/phoebe:/bin/bash

To avoid any misinterpretation of the pattern (PATTERN) when using regular expressions, it is generally recommended to enclose the pattern in single quotes ('') or double quotes (""). The example above is therefore better written as grep '/home' /etc/passwd.

grep itself is not difficult; the challenge lies in becoming proficient with regular expressions.

Example:
$ ls -F /etc | grep '/$' ←display only the directories under "/etc" and exclude files from the output
a2ps/
acpi/
alsa/
alternatives/
audisp/
audit/
avahi/
blkid/
bluetooth/
(The following is omitted)

However, the most flexible way to use grep is not by directly operating on files but by utilizing pipelines (pipes) to apply multiple conditions. For example:

$ ls -d /etc/* | grep '[[:digit:]]' | grep '[[:upper:]]' ←List the entries in the "/etc" directory whose names contain both digits and uppercase letters
/etc/X11

When introducing regular expressions, it was mentioned that ls only supports wildcard characters, which can lead to different outputs depending on the settings or environment. To overcome the limitations of wildcard characters, you can utilize the support for regular expressions in grep. By using ls in combination with grep, you can achieve the same functionality without being affected by the environment.

Example:
$ ls -d /etc/* | grep '[A-Z].*' ←list the files or directories under "/etc" whose names contain an uppercase letter.
/etc/ConsoleKit
/etc/DIR_COLORS
/etc/group.OLD
/etc/Muttrc

Sometimes, you may only remember a certain keyword in the content of a file but forget which file it belongs to. In such cases, the -r option, combined with searching in subdirectories, can be very convenient.

Example:
$ grep -r 'colou*r' /etc/gconf ←Search for files in the /etc/gconf directory (including subdirectories) that contain the keywords "color" or "colour" in their contents.
/etc/gconf/gconf.xml.defaults/%gconf-tree-or.xml: <entry name="color_shading_ty
/etc/gconf/gconf.xml.defaults/%gconf-tree-or.xml: <entry name="secondary_color">
/etc/gconf/gconf.xml.defaults/%gconf-tree-or.xml: <entry name="primary_color">

grep has two commonly used options: -F and -E. The -F option is used to disable the interpretation of regular expressions, treating the pattern as a literal string. On the other hand, the -E option enables extended regular expression matching.
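Example (a minimal sketch; "version.txt" is a hypothetical file):
$ grep -F '1.2.3' version.txt ←with "-F" the dots are matched literally rather than as the regex "any character"
$ grep -E 'cat|dog' version.txt ←with "-E", alternation is available: match lines containing "cat" or "dog"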

Here are the common options and usages of grep:

Syntax:[STDIN] grep [-option][--option] PATTERN [FILE] or [STDOUT]
Command name/Function/Command user Options Function
grep/
Find strings in files/
Any
-a Process a binary file as if it were a text file
-A # Display the matching line and the # lines after it, e.g. grep -A 2 "search_string" file.txt
-B # Display the matching line and the # lines before it, e.g. grep -B 2 "search_string" file.txt
-C # Display the matching line along with # lines before and after it, e.g. grep -C 2 "search_string" file.txt
-c Display the number of matching lines
-D read|skip How to handle device files, named pipes (FIFOs) and sockets:
"read": treat the device file as a normal file
"skip": do not process the device file
-d read|skip|recurse How to handle directories (may not be fully supported on every system):
"read": treat the directory as an ordinary file
"skip": do not process the directory
"recurse": process the directory and its subdirectories, the same as option "-r"
-e PATTERN Explicitly specify the pattern; mainly used for patterns that begin with "-"
(such a pattern would otherwise be misread as an option)
-E Interpret the pattern as an extended regular expression
-f FILE Read the pattern(s) from the specified file
-F Search for a fixed string (i.e. the pattern is not interpreted as a regular expression)
-G Interpret the pattern as a basic regular expression
-h Do not list filenames when searching multiple files (only matters for multi-file searches)
-H List the filename together with each matching line (the default when searching multiple files)
-i Ignore case differences
-I Skip binary files, suppressing the "Binary file XXX matches" message that could disrupt the output
-l Only list the names of files that match; mainly used for multi-file searches
-L Only list the names of files that do not match; mainly used for multi-file searches
-n List the line numbers of matching lines
-q No output (quiet mode); mainly used for condition tests in shell scripts
-r Search recursively, including subdirectories
-v Invert the search, i.e. output the lines that do NOT match the pattern
-w Match whole words only; for example, "apple" matches, but "apples" or "applets" do not
-x Match only lines that match the pattern in their entirety
--help Display the command's built-in help and usage information
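For instance, the difference between "-w" and "-v" (a small sketch; "fruit.txt" is a hypothetical file containing the lines "apple", "apples" and "banana"):

Example:
$ grep -w 'apple' fruit.txt ←match the whole word only, so "apples" is excluded
apple
$ grep -v 'apple' fruit.txt ←output the lines that do not contain "apple"
banana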

Other common uses are as follows:

Example:
$ grep -n 'google' re.txt ← List the matching lines together with their line numbers

In addition, the patterns to be searched can be stored in a pattern file; the search rules can then be changed simply by editing that file.

Example:
$ cat MY_PATTERN ← For example, if there is a template file "MY_PATTERN", the content is as follows:
TAIWAN
[Tt]aiwan
$ grep -f MY_PATTERN *.txt ← Use the template in the template file "MY_PATTERN" to search for all files with the extension name "txt"

In addition, if the string to be searched begins with "-", use grep -e to make clear that the "-" is not an option but part of the string being searched.

Example:
$ cat my_file ←For example, the content of the file "my_file" is as follows:
Introduction to Linux
Linux is a multi-user & multi-task OS

$ grep -e '-user' my_file ←Search for the string '-user' in the file "my_file"
Linux is a multi-user & multi-task OS

grep -e has another useful place for multiple string searches.

Example:
$ grep -ne 'mail' -ne 'news' /etc/passwd ←search for the strings "mail" and "news" and list the line numbers
9:mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:news:x:9:13:news:/etc/news:
27:mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin

grep -q produces no output (quiet mode); only the exit status indicates whether a match was found, which makes it useful for condition tests in shell scripts.

Example:
$ grep -q 'google' re.txt && cp re.txt re.txt~ ←If the file "re.txt" contains the string "google", back up the file
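The same test written as an explicit if statement in a script (a sketch, using the same hypothetical "re.txt"):

if grep -q 'google' re.txt; then
    cp re.txt re.txt~    # back up the file only when it contains "google"
fi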

^ back on top ^





cut : Extract fields
The main function of the cut command is to extract fields from text. The following step-by-step examples show how it works.

$ echo -e '12\t3\t456\t789'
12         3         456         789

There are four fields separated by tabs (written as "\t" above; the -e option makes echo interpret the escape sequences). If you want to extract the second and fourth fields using the cut command, you can use the following command:

$ echo -e '12\t3\t456\t789' | cut -f 2,4
3         789

The default field delimiter for cut is the tab character. The -f option specifies the fields to extract, and the selected fields can be written in the following forms (see the examples after this list):
-f n Extract field n
-f n,m,o,p Extract fields n, m, o and p
-f n-m Extract fields n to m
-f n- Extract fields from n to the last
-f -n Extract fields from the first to the nth
-f n-m,o-p,q,r- Extract fields n to m, o to p, q, and r to the end
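Example (reusing the four tab-separated fields from above):
$ echo -e '12\t3\t456\t789' | cut -f 2- ←fields 2 to the last
3       456     789
$ echo -e '12\t3\t456\t789' | cut -f -2,4 ←fields 1 to 2, plus field 4
12      3       789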

Not all data have tab as the field delimiter. For example, in the "/etc/passwd" file, the field delimiter is ":". In such cases, you can use the -d option to specify the character used as the field delimiter.

Example:
$ cat /etc/passwd | grep '/home' | cut -d":" -f1
aaa
bbb
patrick
cindy
danny

cut can extract not only fields but also characters; for example, cut -c 1-4 FILE extracts characters 1 to 4.

Example:
$ echo '123456789' | cut -c 5- ←Extract characters from 5 to the last character
56789

The cut options and usage are as follows:
Syntax:[STDIN] cut [-option][--option] [FILE]
Command name/Function/Command user Options Function
cut/
retrieve fields/
Any
-b Extract by bytes; in a single-byte (e.g. English) locale this is equivalent to "-c"
-c Extract by characters
-d "CHAR" Specify a custom field delimiter
-f FIELD[,FIELD] Select the output fields; used with the delimiter set by "-d" or the default tab
-s Do not output lines that contain no field delimiter
-n Do not split multibyte characters (for multibyte characters used in non-English locales)
--output-delimiter Specify the string used to delimit the output fields
--help Display the command's built-in help and usage information

A few options deserve extra explanation. When using cut -d to set the field delimiter, a line that does not contain the delimiter is by default output in its entirety. Adding the -s option prevents such lines from being output.

The option "--output-delimiter" allows you to freely set the output delimiter string. In the following example, we will change the original tab delimiter to the string "---" for output.



Example:
$ echo -e '12\t3\t456\t789' | cut --output-delimiter="---" -f 1- ←custom output delimiter string "---"
12---3---456---789

The advantage of the cut command is its simplicity; the corresponding disadvantage is a lack of flexibility, since it does not support regular expressions. Take the ls -l command as an example: in the output below, the fields are separated by variable-length runs of whitespace, which makes field extraction with cut tricky.

Example:
$ export TIME_STYLE=long-iso ←Set the date/time format via the environment (different environment settings affect the output format of "ls -l")
$ ls -l --time-style=long-iso ←or specify the same format directly with an option
drwxr-xr-x  2  aaa  aaa  4096  2011-09-07 11:44  Desktop
drwxr-xr-x  2  aaa  aaa  4096  2011-09-07 11:44  Documents
-rw-rw-r--  1  aaa  aaa     0  2011-08-08 12:42  fileA
-rw-rw-r--  1  aaa  aaa    12  2011-08-07 12:34  fileB
-rw-rw-r--  1  aaa  aaa   112  2011-03-08 10:12  MypProject
(each line has 8 whitespace-separated fields; the filename is field 8)

In this listing the whitespace between fields is of variable length (and hidden control characters such as tabs may be mixed in), so extracting the eighth field (the filename) with ls -l | cut -d " " -f8 gives the wrong result: with a single-space delimiter, every additional space starts a new, empty field. There are several ways around this.
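One common workaround is to squeeze the repeated spaces first with tr -s (introduced later in this section) so that exactly one space separates the fields, a sketch:

Example:
$ ls -l | tr -s ' ' | cut -d ' ' -f 8 ←squeeze runs of spaces into one, then the filename really is field 8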

^ back on top ^

col : Filter control characters
If you use the man command to look up the documentation for a specific command, here is an example:

$ man regex

REGEX(3)                   Linux Programmer’s Manual                  REGEX(3)

NAME
regcomp, regexec, regerror, regfree - POSIX regex functions

SYNOPSIS
#include <sys/types.h>
#include <regex.h>


int regcomp(regex_t *preg, const char *regex, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
(The following is omitted)


When viewing the man (manual) pages for a command, you may notice that the documentation contains formatting elements such as bold text or underlines. This is because the man pages are not plain text files; they contain control characters that control the display or operation of the text.

Sometimes, for convenience or editing purposes, you may want to save the manual page of a command to a file. You can do this using the command man regex > regex.txt, which redirects the output of the man command to a file named "regex.txt". However, when you try to open this file using a text editor like vi or others, you may encounter garbled or unreadable characters, as shown below:

(regex.txt opened in vi)
REGEX(3)                   Linux Programmer’s Manual                  REGEX(3)



N^HNA^HAM^HME^HE
regcomp, regexec, regerror, regfree - POSIX regex functions

S^HSY^HYN^HNO^HOP^HPS^HSI^HIS^HS
#^H#i^Hin^Hnc^Hcl^Hlu^Hud^Hde^He <^H<s^Hsy^Hys^Hs/^H/t^Hty^Hyp^Hpe^Hes^Hs
(The following is omitted)

In this case, you can use the col program to filter out control characters from the file. The "-b" option is used to filter out all known control characters like "RLF" (Reverse Line Feed), "HRLF" (Half Reverse Line Feed), "HFLF" (Half Forward Line Feed), etc.

So, in the given example, using the command man regex | col -b > regex.txt will filter out the control characters that cause garbled text, allowing you to use a regular text editor to read or edit the file.

The name col is an abbreviation of the word "colander." Generally, col is operated through pipelines (e.g., COMMAND | col) or redirection (e.g., col -x < FILE > SAVE_FILE).

Another useful option is -x, which converts tabs to corresponding spaces. This can be handy when you have a file that needs to convert tabs to spaces for formatting purposes.


Example:
$ echo -e '12\t\t3\t456\t789' | sed -n 'l' ← use the command "sed" to make the tab appear (display\t)
12\t\t3\t456\t789$
$ echo -e '12\t3\t456\t789' | col -x | sed -n 'l' ← use "col -x" to convert tabs to spaces
12      3       456     789$

For detailed usage, refer to the col manual page (man col).

^ back on top ^

tr : Character translation
If the task is not complex and only involves simple character conversion or deletion (note that it operates on characters, not strings), tr can be thought of as a greatly simplified sed and used intuitively. In the following example, the character "1" is converted to "A":

Example:
$ echo '012345' | tr 1 A ←Convert character "1" to "A"
0A2345

In the previous example, it's better to enclose the characters to be transformed in single quotes (') or double quotes (") for clarity. Writing it as echo '012345' | tr '1' 'A' would be clearer. Additionally, the 'tr' command supports POSIX character classes and range expressions like 'CHAR1-CHAR2'. Here's an example:

Example:
$ echo 'abcdef 123 XYZ' | tr 'a-z' 'A-Z' ← convert lowercase characters to uppercase
ABCDEF 123 XYZ
$ echo 'abcdef 123 XYZ' | tr '[:digit:]' 'i-z' ← convert numbers to lowercase English and start from i
abcdef jkl XYZ

You can use the form tr 'abcde' 'ijklm', where 'i', 'j', 'k', 'l' and 'm' replace 'a', 'b', 'c', 'd' and 'e' respectively. Note that this is a character-by-character mapping; it does not replace the string "abcde" with the string "ijklm".

In the following example, the character '5' is replaced with 'i' and the character '6' is replaced with 's':

Example:
$ echo "Hello 12345 World" | tr '56' 'is'
Hello 1234i World

Although sed is powerful, it operates on data based on the "pattern space" and removes trailing newline characters, making it unable to handle newline characters directly. In such cases, tr can be used easily.

For example, in earlier versions of Apple's Mac OS 9, the newline character was represented as "CR". To convert a UNIX/Linux file to a format readable by this vintage computer, you can perform the following transformation:
tr "\n" "\r" < UNIX_FILE > MAC_OS9_FILE .

Therefore, one of the main functionalities of tr is to handle control characters. tr supports various control character representations, as shown in the table below. If a representation is not listed, you can also use the octal ASCII code ("\OCTAL", where OCTAL represents the octal ASCII code).
ASCII control character representations Dec Hex ASCII abbr. Name/Meaning
\a 7 07 BEL bell
\b 8 08 BS backspace
\t 9 09 TAB horizontal tab
\n 10 0A LF line feed, new line
\v 11 0B VT vertical tab
\f 12 0C FF form feed, new page (NP)
\r 13 0D CR carriage return
\\ 92 5C   the character "\"
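A small sketch of the octal form ("\011" is the octal ASCII code of the tab character):

Example:
$ echo -e 'a\tb' | tr '\011' ' ' ←convert tabs to spaces using the octal code
a b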

In addition to the basic usage, the advanced usage and options of tr are as follows:

Syntax:[STDIN] tr [-option][--option] SET1 [SET2]
Command name/Function/Command user Options Function
tr/
Translate characters/
Any
-c Complement: operate on the characters NOT in SET1
-d Delete the specified characters
-s Squeeze consecutive repeated characters into a single character
-t Truncate SET1 to the length of SET2, ignoring the part of the source set that exceeds the destination set

The option "-s" is commonly used and useful. It is used to delete consecutive duplicate characters and keep only one occurrence of each character.

Example:
$ echo "1         2      3   4" | tr -s " " ←Remove excess whitespace characters.
1 2 3 4
$ echo -e "1\t\t\t2      3       4" | tr -s " \t" ←Remove excess whitespace characters & tab
1      2 3 4
$ tr -s '\n' < fileA > fileB ←Delete the extra blank lines in "fileA" and save the result as "fileB"
$ sed -n '5,20 p' fileA | tr -s "\n" ←Delete redundant blank lines in lines 5~20 of "fileA"

Example:
$ echo 'busy buzzing bumblebees buzzing busying' | tr -d 'busy' ← Delete the characters "b", "u", "s" and "y" (it does not delete the string "busy" as a whole).
zzing mleee zzing ing
$ echo -e "1\t\t\t2      3       4" | tr -s " \t" | tr -d " \t" ← First delete the repeated spce and tabs and then delete the blanks and tabs
1234
$ echo 'abcdef 123 XYZ' | tr -d '[:digit:]' ←delete all digits
abcdef XYZ
$ tr -d '\r' < DOS_FILE > UNIX_FILE ← delete the character '\r' in windows/DOS files

The last example of the above example tr -d '\r' < DOS_FILE > UNIX_FILE is equivalent to the command dos2unix .

Examples of other options are as follows:

Example:
$ echo 'abcdef 123 XYZ' | tr -c '[:alpha:]' '-' ← Convert all non-alphabetic characters to the character "-"
abcdef-----XYZ-
$ echo 'abcdef 123 XYZ' | tr -t 'abcde' 'AB' ←The destination set "AB" is shorter than the source "a"~"e", so with "-t" only the first two characters are converted
ABcdef 123 XYZ


^ back on top ^

sort : Sort lines
sort is used to sort the text on a line-by-line basis, comparing the first character of each line in ascending order. If the first characters are the same, the comparison continues with the next character. An annoying aspect of sort is that its sorting order is affected by the locale setting, similar to wildcard characters. To sort in ASCII code order, you need to set the environment variable "LANG=C" (subsequent tests are based on LANG=C as the reference).
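A minimal demonstration of the locale effect (the non-C ordering may vary from system to system):

Example:
$ printf 'a\nB\n' | LANG=C sort ←ASCII order: all uppercase letters come before lowercase ones
B
a
$ printf 'a\nB\n' | LANG=en_US.UTF-8 sort ←typical dictionary order in an English locale
a
B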

The basic usage of sort is straightforward, and once you see an example, you will understand it. Below is an example with a file named "equip," which is a list of computer equipment purchased by the company this year.

$ cat equip
xerox         apr 4
acer1         feb 1
XEROX-FUJI    may 5
printer1      oct 6
acer2         oct 10
printer1      jul 3
ASUS1         sep 4
Apple         jun 5
IBM2          dec 7
acer2         oct 10
ASUS2         nov 20
IBM1          mar 1

But showing this unorganized list to the boss will definitely get you into trouble. The boss deals with hundreds of thousands of things in a second, so there's no way they would have time to look at unsorted documents. You should at least use the sort command to organize the list.

Example:
$ export LANG=C ← Setting (and exporting) the "LANG" environment variable to "C" or "POSIX" configures the locale to use plain ASCII ordering
$ sort equip
ASUS1         sep 4
ASUS2         nov 20
Apple         jun 5
IBM1          mar 1
IBM2          dec 7
XEROX-FUJI    may 5
acer1         feb 1
acer2         oct 10
acer2         oct 10
printer1      jul 3
printer1      oct 6
xerox         apr 4

The previous example is much better, but it still looks odd that "ASUS" and "acer", which start with the same letter, end up so far apart. The reason is the ASCII table, where all uppercase letters come before the lowercase ones.

To sort the list without considering case, use the "-f" option: it tells sort to fold lowercase letters to uppercase before comparing, so "ASUS" and "acer" are treated as starting with the same letter and are ordered by their remaining characters.

$ sort -f equip ←option "-f" is case-insensitive
acer1         feb 1
acer2         oct 10
acer2         oct 10
Apple         jun 5
ASUS1         sep 4
ASUS2         nov 20
IBM1          mar 1
IBM2          dec 7
printer1      jul 3
printer1      oct 6
xerox         apr 4
XEROX-FUJI    may 5

After the sorting process mentioned above, the list should be in a satisfactory order. The sort command indeed offers more functionalities beyond this. For example, if you want to sort based on specific columns, you can use the "-k" option. In the following example, we will sort based on column 3:

$ sort -k 3 equip ←with the "-k" option you can sort on a column of your choice
IBM1          mar 1
acer1         feb 1
acer2         oct 10
acer2         oct 10
ASUS2         nov 20
printer1      jul 3
ASUS1         sep 4
(The following is omitted)

Please note that the previous example still looks a bit strange. For example, when sorting in ascending order, the value "20" in the third column should come after "3." The reason for this behavior is that the sort command sorts based on the first character of each field. If the first character is the same, it proceeds to compare the next character, and so on. Therefore, based on the first character, "1x" and "2x" will always come before "3."

To resolve this issue, you can use the -n option. This option tells the sort command to perform a numerical sort based on the values in the specified field. Here's an example:

$ sort -n -k 3 equip
acer1         feb 1
IBM1          mar 1
printer1      jul 3
ASUS1         sep 4
xerox         apr 4
Apple         jun 5
XEROX-FUJI    may 5
printer1      oct 6
IBM2          dec 7
acer2         oct 10
acer2         oct 10
ASUS2         nov 20

This command will sort the contents of the file "equip" based on the numerical values in the third column. By using the -n option, the sorting will be done based on the numeric value of the field, ensuring that "20" comes after "3" in ascending order.


The sort command also supports sorting on multiple fields with different priorities: simply give several -k options. In the following example, the list is sorted by field 2, and lines that are equal in field 2 are further sorted by field 5:

$ ls -l /etc | sort -k2 -k5 ←Sort according to field 5 when field 2 is equal
(The above is omitted)

-rw-r--r--  1 root root   77598 Nov 25 00:30 ld.so.cache
-rw-r--r--  1 root root   84649 Aug 23  2007 sensors.conf
-rw-r--r--  1 root root  117276 Sep 17  2007 Muttrc
-rw-r--r--  1 root root  362047 Apr 18  2007 services
-rw-r--r--  1 root root  412666 Jan 26 14:21 prelink.cache
(The following is omitted)

The -M option of the sort command sorts by the English month abbreviations, the three-letter forms "JAN", "FEB", and so on:

$ sort -M -k 2 equip ← Sort according to field 2 (the month)
acer1         feb 1
IBM1          mar 1
xerox         apr 4
XEROX-FUJI    may 5
(The following is omitted)

More on the option "-k": a form such as "-k 2.3" means that sorting starts from the third character of field 2.
In the following example, an employee "id" has been prefixed to each entry of the list. To skip the id prefix and sort by the name that follows it, tell sort to start at the 6th character of field 1, as shown below.

$ cat equip1 ←The id of the employee user is added before the list
id03_xerox        apr 4
id04_XEROX-FUJI   may 5
id06_acer1        feb 1
id05_IBM1         mar 1
id09_IBM2         dec 7
id12_printer1     jul 3
$ sort -k1.6 equip1 ←Start sorting from the 6th character of field 1
id05_IBM1         mar 1
id09_IBM2         dec 7
id04_XEROX-FUJI   may 5
id06_acer1        feb 1
id12_printer1     jul 3
id03_xerox        apr 4

Other commonly used options of sort are as follows:

Syntax:[STDIN] sort [-option] [FILES]
Command name/Function/Command user Options Function
sort/
sorting/
Any
-b Ignore leading whitespace on each line
-d Dictionary order: consider only whitespace, digits and letters
-f Ignore case
-g Sort by numeric value, like "-n" but also accepting scientific notation such as "1.23E10"
-i Ignore non-printable characters
-k FIELD[.START_CHAR] Sort by the specified field (optionally starting at a given character)
-M Sort by the English month abbreviations "JAN", "FEB" ... "DEC"
-n Sort numerically
-o FILE Write output to a file instead of the screen
-R Random order
-r Reverse sort (largest to smallest)
-t CHAR Specify the field delimiter character
-u Delete duplicate lines after sorting
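The "-t" option is handy for delimited files such as "/etc/passwd". A sketch (sorts the accounts numerically by the UID in field 3):

Example:
$ sort -t ':' -k 3 -n /etc/passwd ←use ":" as the field delimiter and sort numerically on field 3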

^ back on top ^

uniq : Remove adjacent duplicate lines
The uniq command in Linux is used to delete adjacent duplicate lines. Therefore, it is commonly used in conjunction with the sort command to sort the lines first and then remove adjacent duplicate lines using uniq. Alternatively, uniq can also be used independently to delete consecutive duplicate empty lines.

Example:
$ sort equip | uniq > result ←sorts the content of the "equip" file and then uses "uniq" to remove adjacent duplicate lines(equivalent to using the "sort -u")
$ echo -e 'lineA\nlineA\nlineB' | uniq ←will delete adjacent duplicate lines. Only consecutive duplicate lines will be removed
lineA
lineB
$ echo -e 'lineA\nlineB\nlineA' | uniq ←will not delete non-adjacent duplicate lines. Only consecutive duplicate lines are removed
lineA
lineB
lineA
$ echo -e 'lineA\n\n\n\n\n\nlineB' | uniq ←remove consecutive duplicate empty lines
lineA

lineB

The uniq function is simple, so there are not many options. The commonly used options are as follows:

Syntax:[STDIN] uniq [-option] [FILES]
Command name/Function/Command user Options Function
uniq/
delete adjacent duplicate lines/
Any
-c Show the number of repetitions of each line
-d Show only the lines that are adjacent and repeated
-f # Skip the first # fields when comparing
-s # Skip the first # characters when comparing
-u Contrary to "-d": list only the lines that appear exactly once
-w # Compare at most # characters per line

Example:
$ echo -e 'lineA\nLineB\nLineB\nLineC' | uniq -u ←lists only the lines that appear exactly once
lineA
LineC
$ echo -e 'lineA\nLineB\nLineB\nLineC' | uniq -c ←displays the count of repeated occurrences of each line
1 lineA
2 LineB
1 LineC
$ uniq -s 9 fileA ←Skip the first 9 characters of each line when comparing
$ uniq -w 9 fileA ←Compare only the first 9 characters of each line
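A classic combination of the filters in this section is frequency counting: extract a field with cut, sort it, count adjacent duplicates with uniq -c, then sort the counts numerically. For example, to count how many accounts use each login shell:

Example:
$ cut -d ':' -f 7 /etc/passwd | sort | uniq -c | sort -rn ←count the login shells, most common first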


^ back on top ^