Linux File Compression

The Linux Newbie Guide

File Compression

1.0 Introduction to File Compression
1.1 Common Compressed File Formats
.gz Files
gzip : Compress/Decompress .gz files
gunzip : Decompress .gz files
zcat : Read gz Compressed files
.bz2 Files
bzip2 : Compress/Decompress .bz2 files
bunzip2 : Decompress .bz2 files
bzcat : Reads bz2 compressed files
bzip2recover : Recovers data from damaged bz2 files
.xz Files
xz : Compress/Decompress .xz files
unxz : Decompress .xz files
xzcat : Read xz Compressed files
.Z Files
compress : Compress/Decompress .Z files
uncompress : Decompress .Z files
.zip Files
zip : Compress files into zip format
unzip : Decompress zip files
zipinfo : Lists information about zip files
1.2 File Archiving
.tar Files
tarball : A Compressed tar File
tar : Archives/Extracts files from a tar file
tar bomb
1.0 Introduction to Compressed Files

The purpose of compressing files is to reduce the size of data. Even users who rarely touch a PC run into compressed files all the time, for example MP3 music or photos taken with a smartphone. However, formats like MP3 for music or JPG for photos usually use "lossy" compression: they discard information that is hard to perceive, sacrificing some quality in exchange for a much smaller file. Some audiophiles claim to hear the distortion that lossy compression introduces in MP3s and prefer not to listen to music in that format. Similarly, many professional photographers prefer working with unprocessed RAW files.

However, most data cannot tolerate any distortion. For example, if you deposit $1000 in a bank, you wouldn't accept it becoming $900 due to compression. Compression that allows no distortion is called "lossless compression." It reproduces the original data exactly upon decompression, but its compression ratio is usually lower than that of lossy compression.

Now, why is data compressible? Let's take an example: it is said that the longest place name in the United States is "Chargoggagoggmanchauggagoggchaubunagungamaugg," which consists of 45 letters. If you look closely, you'll notice that the sequences "gg" and "ago" repeat in many places. To compress this place name, we can use "!" to represent "gg" and "@" to represent "ago," resulting in "Chargo!@!manchau!@!chaubunagungamau!", which is only 36 characters long. Other substitution rules could store this place name in even fewer bytes.
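The substitution described above can be tried directly in the shell. The following is only a toy sketch of the idea, using sed to apply the two rules and then to reverse them; real compressors build their substitution tables automatically, and the variable names are chosen just for this illustration.

```shell
# Toy "compression": replace repeated sequences with single characters.
name="Chargoggagoggmanchauggagoggchaubunagungamaugg"

# Apply the two rules from the text: "gg" -> "!", then "ago" -> "@".
short=$(printf '%s' "$name" | sed -e 's/gg/!/g' -e 's/ago/@/g')

# Reverse the rules in the opposite order to get the original back.
restored=$(printf '%s' "$short" | sed -e 's/@/ago/g' -e 's/!/gg/g')

echo "original : $name (${#name} characters)"
echo "shortened: $short (${#short} characters)"
echo "restored : $restored"
```

Because the restored string is identical to the original, this is a (very small) lossless scheme.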

Compression efficiency depends on the nature of the data. Some compression techniques are particularly effective for text files, while others achieve higher compression ratios for executable files. Compressing a file that has already been compressed, whether lossy or lossless, gains little and may actually increase its size. The variety of compression software exists because each tool uses a different compression algorithm.

1.1 Common Compressed File Formats

A compressed file has to be decompressed with a tool that understands the format it was compressed in. But how do you know which software was used to compress a file? Fortunately, the file extension usually tells you. In general, file extensions in UNIX/Linux are only a hint. For example, a plain text file composed entirely of ASCII characters does not need the extension ".txt", and the file command is normally used to determine a file's real type. The compression tools, however, rely heavily on file extensions.
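A tool such as the file command identifies a format from the first bytes of the file rather than from its name. The sketch below illustrates the same idea by hand for gzip, whose streams begin with the two bytes 1f 8b; the temporary directory and the file name "mystery" are assumptions made for this example.

```shell
# Identify a gzip file by its signature instead of its extension.
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/numbers"

# Compress to a file that deliberately has no ".gz" suffix.
gzip -c "$workdir/numbers" > "$workdir/mystery"

# Print the first two bytes in hexadecimal and strip the whitespace.
magic=$(head -c 2 "$workdir/mystery" | od -An -tx1 | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
    echo "mystery starts with 1f 8b: it is gzip data"
else
    echo "mystery is not gzip data (first bytes: $magic)"
fi
```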

Different compression commands usually have their own file extensions, which indicate that a file is compressed and which format was used. Common extensions for compressed files in UNIX/Linux include ".gz", ".bz2", ".zip", ".Z" and ".xz", which are mainly produced by free or open-source software. Proprietary formats such as ".rar" and ".arj" raise licensing concerns and are generally avoided.

In addition, the UNIX/Linux world popularized the concept of "archiving" files, which means packaging multiple files or directories into a single file for convenient transmission or storage. Archiving itself does not involve compression, so to reduce file size most archives are compressed afterwards. The most common archiving tool is tar, which uses the file extension ".tar". A file that is both archived and compressed is referred to as a "tarball", which may have a file extension such as ".tgz", ".tbz", ".taz" or ".txz".

.gz Files
A ".gz" file is a file compressed by gzip, a format that is very common on UNIX/Linux systems.

gzip : Compress/Decompress .gz files
gzip automatically appends the ".gz" extension to the name of every file it compresses, as shown in the following example:

$ seq 1 1000000 > file ← generate a large file
$ ls -lgGh file ← check the size
-rw-rw-r--. 1 6.6M Jun 3 09:33 file ← size is 6.6M
$ gzip file ← compress the newly generated 6.6M file with gzip
$ ls -lgGh file.gz ← check the compressed file size
-rw-rw-r--. 1 2.1M Jun 3 09:33 file.gz ← only 2.1M, about 1/3 of the original

Note that gzip deletes the original file by default once it has been compressed.
Use the option "-d" to decompress; the compressed ".gz" file is likewise deleted automatically after decompression.

Example:

$ gzip -d linux.words.gz ← decompress the file "linux.words.gz"

The advanced usage of gzip is as follows:

Syntax:gzip [-option][--option] [file/directory]
Command name/Function/Command user	Options	Function
gzip/ Compress and decompress gz files/ Any	-c	Write to standard output (stdout); the original file is left unchanged
	-d	Decompress
	-f	Force compression even if the output file already exists or the file has multiple hard links
	-l	(lowercase "L") List information about the compressed file
	-q	Suppress warning messages
	-r	Recursive processing: files inside the given directories are processed as well
	-S	Use a suffix other than the default ".gz" (Note: by default gunzip only recognizes suffixes such as ".gz", ".z", "-z" and ".Z", so any other suffix must be given again with -S when decompressing)
	-t	Test whether the compressed file is damaged
	-v	Show compression/decompression information
	-1..9	Specify the compression level from 1 to 9; the higher the number, the better the compression but the longer the processing time (the default is 6)
	--help	Display the command's built-in help and usage information

Example:

$ gzip -l my_file.gz ← list information about the compressed file "my_file.gz"
compressed uncompressed ratio uncompressed_name
2608 6492 60.2% my_file.txt
$ gzip -S -z my_file ← compress the file "my_file" to "my_file-z"
$ gzip -9 my_file ← use compression level 9
$ gzip -c file1 > file2.gz ← compress and write to the file "file2.gz" (the original file is not deleted)
$ gzip -r /My_Dir ← compress all files in the directory "My_Dir"
$ gzip *.txt ← compress all files with the extension ".txt"
$ gzip -l *.gz ← list information about all ".gz" compressed files
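To see how the compression level and the integrity test behave in practice, the following sketch compresses the same data at levels 1 and 9, tests the result with -t, and confirms that decompression reproduces the input byte for byte. The temporary directory and file names are assumptions for this illustration, and the exact sizes depend on the gzip version.

```shell
# Compare gzip levels 1 and 9 and verify a lossless round trip.
workdir=$(mktemp -d)
seq 1 200000 > "$workdir/data"

# -c writes to stdout, so the original "data" file is kept.
gzip -1 -c "$workdir/data" > "$workdir/fast.gz"
gzip -9 -c "$workdir/data" > "$workdir/best.gz"

size_fast=$(wc -c < "$workdir/fast.gz")
size_best=$(wc -c < "$workdir/best.gz")
echo "level 1: $size_fast bytes, level 9: $size_best bytes"

# -t checks the compressed file for damage without writing any output.
gzip -t "$workdir/best.gz" && echo "best.gz passes the integrity test"

# Decompress to stdout and compare with the original.
gzip -dc "$workdir/best.gz" | cmp - "$workdir/data" && echo "round trip is lossless"
```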

gunzip : Decompress .gz files
You can also use the gunzip command (short for "GNU unzip") to decompress a ".gz" file directly, without any options, because gunzip is equivalent to gzip -d.

Example:

$ gunzip my_file.gz ← decompress "my_file.gz"
$ gunzip -c my_file.gz | cat ← same as the zcat command: print the contents of the ".gz" file

zcat : Read gz Compressed files
To read a text file that has been compressed using gzip, you can use zcat without the need to decompress it first. zcat is a combination of the gzip and cat commands, allowing you to directly read the compressed file.

Example:

$ zcat my_file.gz | less ← directly read the compressed file "my_file.gz"
$ zcat file1.gz file2.gz > file3 ← decompress both files and concatenate the output into "file3"
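The concatenating behaviour can be checked with two tiny files. This sketch uses gzip -dc, which produces the same output as zcat on GNU systems and is used here only because some other systems ship a zcat that expects ".Z" files; the file names and contents are assumptions for the illustration.

```shell
# Read two compressed files in sequence without removing the .gz files.
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/file1"
printf 'beta\n'  > "$workdir/file2"

# Plain gzip replaces file1 and file2 with file1.gz and file2.gz.
gzip "$workdir/file1" "$workdir/file2"

# Decompress both to stdout and collect the output in file3.
gzip -dc "$workdir/file1.gz" "$workdir/file2.gz" > "$workdir/file3"
cat "$workdir/file3"
```

After the run, file3 holds the two original contents one after the other, and both ".gz" files are still in place.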

.bz2 Files
A file with the extension ".bz2" was compressed by bzip2. bzip2 usually achieves a higher compression ratio than gzip, is becoming more and more popular, and is used in almost the same way as gzip; it also comes with a recovery tool. It can be regarded as a gzip substitute (but there is no equivalent of the gzip -r recursive option, so a directory is usually made into a tarball before it is compressed).

bzip2 : Compress/Decompress .bz2 files
When you compress a file using bzip2, it automatically adds the ".bz2" extension to the file name and deletes the original uncompressed file.
The following example repeats the gzip demonstration with bzip2 so that the two can be compared.

Example:

$ seq 1 1000000 > file ← generate a large file
$ ls -lgGh file ← check the size
-rw-rw-r--. 1 6.6M Jun 3 09:33 file ← size is 6.6M
$ bzip2 file ← compress the newly generated 6.6M file with bzip2
$ ls -lgGh file.bz2 ← check the compressed file size
-rw-rw-r--. 1 1.2M Jun 3 09:33 file.bz2 ← only 1.2M, less than 1/5 of the original

The test above shows that, for this file, bzip2 compresses better than gzip.

Example:

$ bzip2 -c file > file2.bz2 ← compress and write to the file "file2.bz2" (the original file is not deleted)
$ bzip2 -d file.bz2 ←Decompress the file "file.bz2"

The options and usages available for bzip2 are as follows:

Syntax:bzip2 [-option][--option] [file/directory]
Command name/Function/Command user	Options	Function
bzip2/ Compress and decompress bz2 files/ Any	-c	Write to standard output instead of a file; the input file is left unchanged
	-d	Decompress
	-f	Force compression even if the output file already exists or the file has multiple hard links
	-k	Do not delete the source file after compression
	-q	Suppress warning messages
	-t	Test whether the compressed file is damaged
	-v	Show compression/decompression information
	-1..9	Specify the compression level from 1 to 9; the higher the number, the better the compression but the longer the processing time (the default is 9)
	--help	Display the command's built-in help and usage information

bunzip2 : Decompress .bz2 files
bunzip2 is equivalent to bzip2 -d and decompresses a ".bz2" file directly.

Example:

$ bunzip2 file.bz2 ← decompress "file.bz2"
$ bunzip2 -c file.bz2 > file ← decompress and redirect to "file"; the compressed file is not deleted

bzcat : Reads bz2 compressed files
bzcat can directly read ".bz2" compressed files without decompressing them.

Example:

$ bzcat my_file.bz2 | head ← Directly read the first 10 lines of the compressed file "my_file.bz2"

bzip2recover : Recovers data from damaged bz2 files
".bz2" files not only compress better than ".gz" files; if a ".bz2" file gets damaged, a tool called bzip2recover can attempt to salvage it. It does not repair the file in place. Instead, it searches for block boundaries and writes each block it finds to a separate file named "rec00001file.bz2", "rec00002file.bz2" and so on, so that the undamaged blocks can still be decompressed. Recovery is not guaranteed, but it is worth trying as a last resort. Usage is straightforward: simply give the damaged ".bz2" file as the argument.

Example:

$ bzip2recover file.bz2 ← attempt to recover data from the damaged file "file.bz2"
bzip2recover 1.0.4: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
block 1 runs from 80 to 274
block 2 runs from 323 to 392 (incomplete)
bzip2recover: splitting into blocks
writing block 1 to `rec00001file.bz2' ... ← each recovered block is saved as "recNNNNNfile.bz2"
bzip2recover: finished

.xz Files
A file with the extension ".xz" was compressed by xz, which usually achieves a higher compression ratio than bzip2 and can be regarded as a substitute for it.

xz : Compress/Decompress .xz files
The usage and options of xz are very similar to those of bzip2. Commonly used options are as follows:

Syntax:xz [-option][--option] [file/directory]
Command name/Function/Command user	Options	Function
xz/ Compress and decompress xz files/ Any	-c	Write to standard output instead of a file; the input file is left unchanged
	-d	Decompress
	-f	Force compression even if the output file already exists or the file has multiple hard links
	-k	Do not delete the source file after compression
	-q	Suppress warning messages
	-t	Test whether the compressed file is damaged
	-T	Number of threads to use; 0 means one thread per CPU core (older releases default to 1)
	-v	Show compression/decompression information
	-0..9	Specify the compression level from 0 to 9; the higher the number, the better the compression but the longer the processing time (the default is 6)
	--help	Display the command's built-in help and usage information

The following example repeats the gzip demonstration with xz for comparison.

Example:

$ seq 1 1000000 > file ← generate a large file
$ ls -lgGh file ← Check the size
-rw-rw-r--. 1 6.6M Jun 3 10:33 file ← Size is 6.6M
$ xz file ← compress the newly generated 6.6M file with xz
$ ls -lgGh file.xz ← check the compressed file size
-rw-rw-r--. 1 183K Jun 3 10:35 file.xz ← only 183K, about 1/36 of the original

Example:

$ xz -T0 file ← use multithreading to speed up compression
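The gzip, bzip2 and xz demonstrations can also be run side by side on the same input. The loop below is a minimal sketch that assumes nothing beyond the three commands themselves and skips any tool that is not installed; the output file names are made up for the comparison, and the sizes you see will vary with tool versions.

```shell
# Compress the same file with each available tool and print the sizes.
workdir=$(mktemp -d)
seq 1 1000000 > "$workdir/file"
echo "original: $(wc -c < "$workdir/file") bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c keeps the original so that every tool sees the same input.
        "$tool" -c "$workdir/file" > "$workdir/file.$tool"
        printf '%-6s %s bytes\n' "$tool" "$(wc -c < "$workdir/file.$tool")"
    else
        printf '%-6s not installed, skipped\n' "$tool"
    fi
done
```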

unxz : Decompress .xz files
unxz is equivalent to xz -d and decompresses a ".xz" file directly.

Example:

$ unxz archive1.tar.xz ← decompress "archive1.tar.xz"
$ unxz file[123].xz ← Decompress "file1.xz", "file2.xz" and "file3.xz"
$ unxz file.lzma ←unxz can also decompress ".lzma" files compressed by lzma

xzcat : Read xz Compressed files
xzcat can directly read ".xz" compressed files without decompression.

Example:

$ xzcat file.xz | more ← Directly read the content of the compressed file "file.xz"
$ xzcat file.xz | tr "a-z" "A-Z" > file.txt ← Read the content of "file.xz" and save it in uppercase "file.txt"

.Z Files
A ".Z" file is a file compressed with the compress utility, an antique compression tool that was once popular but has a comparatively low compression ratio and has gradually been phased out. Many newer Linux distributions no longer include compress. If you need to decompress a ".Z" file there is no need to worry, because gzip -d (gunzip) can decompress ".Z" files as well. Some people may still be using very old versions of UNIX/Linux, so the traditional compress command is introduced here anyway.

compress : Compress/Decompress .Z files

compress is used as follows:

Syntax:compress [-option] [file/directory]
Command name/Function/Command user	Options	Function
compress/ Compress and decompress Z files/ Any	-c	Write to standard output instead of a file; the input file is left unchanged
	-d	Decompress
	-f	Force compression even if the output file already exists or the file has multiple hard links
	-r	Recursive processing: files inside the given directories are processed as well
	-v	Show compression/decompression information

Example:

$ compress my_file ← Compress the file "my_file" to "my_file.Z"
$ compress -c file1 > file2.Z ← Compress the file to STDOUT, and redirect it to the file "file2.Z" (the original file will not be deleted )
$ compress -d file.Z ← Decompress "file.Z"

uncompress : Decompress .Z files
uncompress is equivalent to compress -d and decompresses a ".Z" file directly.

Example:

$ uncompress file.Z ← decompress the file "file.Z"
$ uncompress -c file.txt.Z | head ← Read contents of compressed file without decompressing

.zip Files
The ".zip" format is a cross-platform compression format commonly encountered not only in UNIX/Linux but also in DOS/Windows, macOS on Apple computers, and the now-discontinued IBM OS/2. Microsoft Windows versions from Windows XP onwards even include built-in functionality for extracting and creating .zip files.

Another distinguishing feature of ".zip" files is something that formats like .gz, .bz2, or .Z do not offer: the ability to both compress and archive. In other words, a ".zip" file can not only compress files but also pack multiple files, including directories, into a single file.

zip : Compress files into zip format
The command for compressing files into the ".zip" format is called zip. Because it both compresses and archives, and because the format is cross-platform, it is slightly more complex than the tools above and its usage differs somewhat from traditional UNIX/Linux commands. In general, Linux commands follow the format "command source destination," but zip follows the format "command destination(compressed file) source," which is closer to the format of DOS/Windows commands.

Another significant difference from gzip or bzip2 is that, by default, zip does not delete the source files after compression. In addition, gzip and bzip2 only compress and do not archive files.

Example:

$ zip services.zip /etc/services ← Compresses the file "/etc/services" into a file called "services.zip".
adding: etc/services (deflated 73%)
$ zip CFG /etc/*.cfg ← compress and package all files with the ".cfg" extension in the "/etc/" directory into a file called "CFG.zip"
adding: etc/a2ps.cfg (deflated 62%)
adding: etc/a2ps-site.cfg (deflated 59%)
adding: etc/enscript.cfg (deflated 53%)

The advanced usage and options of zip are as follows:

Syntax:zip [-option][target file] [file/directory]
Command name/Function/Command user	Options	Function
zip/ Compress into zip files/ Any	-b PATH	Specify the directory for temporary files
	-c	Add a one-line comment to each file in the archive
	-d	Delete files from the archive
	-D	Do not create separate entries for directories
	-f	Freshen: update only files that already exist in the archive and have changed
	-F	Attempt to repair a corrupt zip file
	-h	Display the built-in help
	-i	Include only files that match the given patterns
	-j	Store only the file names, without directory paths
	-l	(lowercase "L") For text files, convert UNIX newlines to DOS/Windows newlines when compressing (for compatibility with DOS/Windows)
	-ll	(lowercase "LL") For text files, convert DOS/Windows newlines to UNIX newlines when compressing
	-m	Delete the source files once compression is complete
	-n	Store files with the given suffixes without compressing them
	-o	Set the modification time of the zip file to that of its newest entry
	-q	Do not show messages
	-P	(uppercase "P") Encrypt the archive with the given password
	-r	Recursive processing: directories and their contents are processed as well
	-T	Test whether the compressed file is damaged
	-u	Update the archive (like -f, but also adds files that are not yet in it)
	-v	Show compression information
	-x	Exclude files that match the given patterns
	-X	Do not store extra file attributes (such as owner and extra time fields) when compressing
	-y	Store symbolic links as links instead of the files they point to
	-z	Add a comment to the whole archive
	-1..9	Specify the compression level from 1 to 9; the higher the number, the better the compression but the longer the processing time

The basic usage of zip is as follows:

$ zip -r folder.zip my_dir/ ←Compresses the entire directory "my_dir" into a file named "folder.zip".
$ zip -r folder.zip dir1/ dir2/ ←create a zip archive named "folder.zip" by compressing and packaging two directories, "dir1" and "dir2", and their contents.
$ zip -r fileA dir/ fileB ←Compresses and packages the directory "dir" along with the file "fileB" into a zip archive named "fileA.zip"

Some files are already compressed, whether with lossy formats such as ".jpg" and ".mp3" or with lossless ones such as ".gif" and ".rar". They can hardly be compressed any further: recompressing them wastes time and may even make the result larger, so you can tell zip to store such files without compressing them.

Example:

$ zip -r my_pic ~/picture -n .rar:.jpg ←compresses and packages the directory "picture" within your home directory into a zip archive named "my_pic.zip". However, files with the extensions ".jpg" and ".rar" within the archive will be stored without compression.

As the previous example shows, several suffixes can be given to the "-n" option by separating them with a colon ":". To remove a specific file from a zip archive without rebuilding the whole archive, use the "-d" option.

For example, to remove certain files from a zip file:

$ zip -d my_pic.zip "*.bmp" ← remove all ".bmp" files from "my_pic.zip" (the pattern is quoted so that zip, not the shell, matches it against the names stored in the archive)

Sometimes you only want to archive certain files, or want to leave some files out. The options "-i" and "-x" do this.

Example:

$ zip -r taipei_trip . -i "*.bmp" "*.raw" ← compress only the ".bmp" and ".raw" files under the working directory
$ zip -r taipei_trip folder/ -x "*.jpg" ← compress the "folder" directory but exclude its ".jpg" files

If you want to encrypt or annotate the compressed file, you can use the options "-P" and "-z" or "-c".

Example:

$ zip -P "123" file.zip file ← "123" is the password; the correct password must be entered to decompress
$ zip -z my_file.zip *.txt ← add a comment to the archive, which is shown when it is extracted
enter new zip file comment (end with .): ← type the comment (it may span several lines)
this is my homework↵ Enter ← enter the comment
about KEELUNG history↵ Enter
Ctrl+d ← press <Ctrl-D> to finish

If many archives need the same comment, typing it every time is tedious. You can write the comment in advance, save it as a file, and then feed it to zip with input redirection.

Example:

$ zip -z my_file.zip < comment.txt ← feed the prepared comment file "comment.txt" to zip through STDIN

zip -z can only annotate the archive as a whole. To add a comment to each file inside the archive, use zip -c.

Example:

$ zip -c my_job file1.xls file2.xls file3.xls ← Comment for each compressed file
Enter comment for file1.xls
Sep ← Enter comment for file "file1.xls"
Enter comment for file2.xls
Oct ← Enter Comment for file "file2.xls"
Enter comment for file3.xls
Nov ← Enter comment for file "file3.xls"

If the content in the compressed file changes frequently, you don’t need to compress it all again, just use the options "-f" and "-u".

Example:

$ zip fileAB fileA fileB ← Compressed file "fileAB.zip" has 2 files
$ echo "update fileB" >> fileB ←Change the content of one
$ zip -f fileAB.zip ←Update the content of the compressed file

In the example above, the archive contains two files and one of them has been modified. The archive can be brought up to date with zip -f ZIP_FILE or zip -u ZIP_FILE, without recompressing everything or naming the source files again.

(Continuing from the previous example)

$ echo "this is a fileC" > fileC ← create a new file "fileC"
$ zip -u fileAB.zip fileC ← add "fileC" to the existing "fileAB.zip"

zip builds the archive in a temporary file, which by default is created in the same directory as the target zip file. If you are compressing a large amount of data and that directory does not have enough free space, compression may fail. In such cases, use zip -b PATH to put the temporary file in a directory with more room.

For example, to specify the directory /path/to/temp as the location for temporary files, you can use the following command:

$ zip -b /path/to/temp archive.zip file1.txt file2.txt

This command will compress the files file1.txt and file2.txt into the archive.zip file, using the /path/to/temp directory as the location for temporary files. Make sure that the specified directory has enough space to accommodate the temporary files generated during the compression process.

unzip : Decompress zip files
The command used to extract ".zip" files is unzip, and its basic usage is straightforward. You can simply append the path to the zip file after the unzip command.

Example:

$ unzip file.zip ←Unzip "file.zip" to the working directory
$ unzip file.zip -d ~/project/ ← unzip to the specified directory "project" in your home directory
$ unzip -p file.zip ← Read the contents of a compressed file without extracting it

The advanced usage and options of unzip are as follows:

Syntax:unzip [-option][zipped file] [file/directory]
Command name/Function/Command user	Options	Function
unzip/ Decompress zip file/ Any	-c	Extract to standard output instead of to files, printing the name of each file
	-d PATH	Extract into the directory PATH (defaults to the working directory)
	-f	Freshen: extract only files that already exist on disk and are newer in the archive
	-l	(lowercase "L") List brief information about the archive without decompressing it
	-p	Extract to stdout or a pipeline
	-t	Test whether the compressed file is damaged
	-u	Same as -f, but also extracts files that do not yet exist on disk
	-v	Display detailed information (more detailed than unzip -l)
	-x FILE	Exclude the given files from extraction
	-Z	Behave like zipinfo

After a long time, you may forget which files a zip archive contains or what they are. Use unzip -l or unzip -v to check.

Example:

$ unzip -l my_job ←Do not decompress, only list short information
Archive: my_job.zip
FT programming schedule ← if a comment was added with "zip -z" when compressing, it is displayed here
Length     Date    Time    Name
--------   ----     ----    ----
6492 06-03-12 12:58   file1.xls
Sep ←If you used the zip -c to add a comment while compressing a file, the comment will be displayed when viewing the contents of the compressed file.
6492 06-03-12 12:58   file2.xls
Oct ←(Same as above)
6492 06-03-12 12:58   file3.xls
Nov ←(Same as above)
--------                     ----
18268                        3 files

$ unzip my_file.zip ← If you use "zip -z" to add comments when compressing, the comments will be displayed before decompression.
this is my homework ←Display Note
about KEELUNG history
extracting: people.txt ←Start decompression

Extracting with unzip may overwrite existing files. Use unzip -f or unzip -u to replace only the files for which the archive holds a newer version.

Example:

$ unzip -f my_job.zip ← compare the existing files in the target directory with those in the zip and replace only the ones that are newer in the zip
$ unzip -u my_job.zip ← same as above, but also extract files that do not yet exist in the target directory

When you run unzip -f my_job.zip, the files in the directory are compared with the files in the "my_job.zip" archive. If a file with the same name exists both in the directory and the archive, the command checks the timestamps of the files. Only if the file in the archive is newer, it will be extracted and replace the corresponding file in the directory. If there are files in the directory that do not exist in the archive or vice versa, they will not be affected.

This command is useful when you want to update the files in a directory with the newer versions from a corresponding zip archive without overwriting files that haven't been modified or are not present in the archive.

zipinfo : Lists information about zip files
zipinfo lets users inspect a zip archive without decompressing it, in a format close to the familiar ls -l output that shows each file's permissions and other attributes.

The usage of zipinfo is as follows:

Syntax:zipinfo [-option][file]
Command name/Function/Command user	Options	Function
zipinfo/ (zip information) List zip file information/ Any	-1	(Number 1) List only the names of the files in the zip file (this option overrides all other output format options)
	-2	Same as -1, but can be combined with -h, -t, -z and other options
	-h	List only the header line: the zip file name, its size in bytes, and the number of files it contains
	-m	Like the default " ls -l " style listing, but also shows the compression ratio of each file
	-M	Page the output, similar to piping it through " more "
	-s	List the files in the zip file in " ls -l " style; this is the default
	-t	List totals: the number of files, their uncompressed and compressed sizes, and the overall compression ratio
	-T	Show file dates and times in a sortable decimal format
	-v	Show detailed information about each file
	-x	Exclude the given files from the listing
	-z	Display the archive comment, if one was added with "zip -z"

Example:

$ zipinfo ft.zip ← similar to the command "ls -l" to list the names of the files in the zip file
Archive: ft.zip   4101 bytes   3 files
-rw-rw-r-- 2.3 unx     7253 tx defN 4-Jun-11 21:16 ft.c
-rw-rw-r-- 2.3 unx      953 tx defN 4-Jun-11 21:16 ft.h
-rw-rw-r-- 2.3 unx     1581 tx defN 4-Jun-11 21:19 lpt_ctr.c
3 files, 9787 bytes uncompressed, 3715 bytes compressed: 62.0%

1.2 File Archiving

tar is a command-line utility for archiving files on Linux and other UNIX-like systems. "tar" stands for "tape archive," as it was originally designed for creating archives on magnetic tape drives. However, it is now commonly used for creating and managing archives on various storage media, including hard drives, solid-state drives, and network shares.

The tar command allows you to combine multiple files and directories into a single archive file, which can then be compressed using various compression algorithms such as gzip or bzip2. Tar archives preserve file permissions, ownership, and directory structure, making them ideal for creating backups, transferring files, or distributing collections of files.

One of the advantages of using tar is its compatibility with various compression formats. It can create uncompressed tar archives, as well as compressed archives using gzip (.tar.gz), bzip2 (.tar.bz2), xz (.tar.xz), and other compression algorithms. This flexibility allows users to choose the compression format that best suits their needs in terms of file size and compression speed.

The tar command provides a wide range of options and flags to control various aspects of the archiving and compression process. It allows you to specify file and directory exclusions, preserve symbolic links and special file attributes, set compression levels, and more.

In summary, tar is a versatile tool for creating and managing file archives and, combined with a compression tool, for compressing them. It is widely used for backups, software distribution, and file transfers.

.tar Files
The original purpose of the ".tar" file format was to bundle multiple files (including directories) into a single file for convenient backup onto magnetic tape. Therefore, it is called a "tape archive" or tar file. Tar files typically have the ".tar" extension.

tarball : A Compressed tar File
The ".tar" file is simply archived using the tar command without compression. When it is further compressed using formats like xz, gzip, or bzip2, it is referred to as a "tarball." The source code of the Linux kernel or other open-source tools is often distributed in the form of tarballs.

For example, if a tar file named "file.tar" is compressed using bzip2, it becomes "file.tar.bz2". However, such double extensions are long, and some file systems, such as FAT with its 8.3 file names, cannot store a name like "file.tar.bz2". Therefore, tarball extensions are commonly abbreviated as follows:

Original tarball file name	Abbreviated tarball file name
.tar.bz2	.tbz or .tb2 or .tbz2
.tar.gz	.tgz
.tar.Z	.taz
.tar.xz	.txz
.tar.lzma	.tlz

Since these abbreviations are informal conventions rather than rules, some people get creative and use extensions like ".tz", which could stand for "tbz", "tgz", or "taz". If you come across a file whose tarball format you cannot determine from its name, use the file command to confirm it.

tar : Archives/Extracts files from a tar file
The tool used for archiving and extracting (restoring) ".tar" files is called "tar". tar can archive multiple files, or even the entire contents of a directory, into a single file, so it is commonly used to back up important directories and files. Because tar combines archiving and extraction with the ability to call various compression tools, it can be a bit complex to use. The syntax and options are listed below, followed by examples of common usage:

Syntax: tar [-options] [target file] [file/directory]
Command name/Function/Command use	options	Function
tar / tape archive / Any	At least one of the following options is required
	-A	Append a tar file to another tar file (may not work with tarball files)
	-c	Create a tar file
	-d	Compare the files in the tar archive with the corresponding files on the file system
	-f	Specify the archive file to operate on
	-r	Append files to an existing tar archive (may not work with tarball files)
	-t	List the contents of a tar file
	-u	Update: append only files that are newer than the copy already in the tar file (may not work with tarball files)
	-x	Extract (restore) files
	--delete	Delete files inside a tar archive (may not work with tarball archives)
	Common options
	-C PATH	Change to the directory PATH before archiving or restoring
	-k	Do not overwrite existing files on the file system when restoring
	-j	Compress/decompress with bzip2
	-J	Use xz to compress/decompress
	-p	Restore file permissions (this is the default value for root)
	-P	Use absolute paths, that is, do not remove the "/" at the beginning of the path
	-v	Show Processing Information
	-z	Compress/decompress with gzip
	-Z	Use compress to compress/decompress (the compress command must be installed on the system)
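The following is a minimal sketch that exercises several options from the table (-c, -r, -t, -x, -f and -C) on a throwaway directory. The paths and file names are hypothetical and were chosen only for this illustration.

```shell
# Hypothetical scratch area; every name below is illustrative only.
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/restore"
echo "alpha" > "$workdir/src/a.txt"
echo "beta"  > "$workdir/src/b.txt"

# -c creates the archive, -f names it; -C changes directory first,
# so the member is stored as "src/a.txt" rather than with the full path.
tar -cf "$workdir/demo.tar" -C "$workdir" src/a.txt

# -r appends another file to the existing (uncompressed) archive.
tar -rf "$workdir/demo.tar" -C "$workdir" src/b.txt

# -t lists the members without extracting anything.
members=$(tar -tf "$workdir/demo.tar")
echo "$members"

# -x extracts; here -C selects the directory to restore into.
tar -xf "$workdir/demo.tar" -C "$workdir/restore"
restored=$(cat "$workdir/restore/src/b.txt")
echo "$restored"
```

Using -C in both directions keeps the stored names relative and keeps the restored files out of the current working directory.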

The usage of tar may seem complicated, but in reality, it is quite simple. Just remember to use the "-c" option for archiving and the "-x" option for extraction. Additionally, as with most UNIX/Linux commands, an option that takes a file name (typically "-f") must be placed at the end of the option group, immediately followed by the file name.

Example:

$ tar -cf etc_gconf.tar /etc/gconf ←Create an archive file named 'etc_gconf.tar' containing all the files within the directory '/etc/gconf'
$ tar -xf etc_gconf.tar ←Extract the contents of the archive file named 'etc_gconf.tar'

It is best to include the "-v" option so that tar lists each file as it is processed, which helps to catch mistakes. As before, the option that takes a file name (usually "-f") should come last in the option group, immediately followed by the file name.

Example:

$ tar -cvf etc_gconf.tar /etc/gconf
$ tar -xvf etc_gconf.tar

Furthermore, the tar command does not automatically add the ".tar" extension to the archive it creates. Add the extension yourself so that the file can be recognized as a tar archive.

In addition, most applications compress the archived files into a tarball to save space. In most Linux distributions, you can use compression options such as "-j," "-J," "-z," or "-Z" to automatically invoke the relevant compression software for compression or decompression. Similarly, tar does not add the file extension to the tarball automatically. You need to add it yourself to ensure that you know the tarball format when it needs to be extracted.

Example:

$ tar -jcvf etc_gconf.tbz /etc/gconf ←Archive and compress using bzip2
$ tar -jxvf etc_gconf.tbz ←Decompress with bzip2 and extract with tar

$ tar -Jcvf etc_gconf.txz /etc/gconf ←Archive and compress using xz
$ tar -Jxvf etc_gconf.txz ←Decompress with xz and extract with tar
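The "-z" option works the same way as "-j" and "-J". The sketch below archives the same directory with and without gzip so the effect on size is visible, then extracts the tarball again to confirm that the content is unchanged. The directory, the log text and the file names are invented for this illustration.

```shell
# Hypothetical scratch area; every name below is illustrative only.
workdir=$(mktemp -d)
mkdir "$workdir/logs" "$workdir/out"

# Repetitive text compresses well, which makes the size difference easy to see.
i=0
while [ "$i" -lt 2000 ]; do
    echo "the same log line again and again"
    i=$((i + 1))
done > "$workdir/logs/app.log"

tar -cf  "$workdir/logs.tar" -C "$workdir" logs   # archive only
tar -zcf "$workdir/logs.tgz" -C "$workdir" logs   # archive and compress with gzip

tar_size=$(wc -c < "$workdir/logs.tar")
tgz_size=$(wc -c < "$workdir/logs.tgz")
echo "logs.tar: $tar_size bytes, logs.tgz: $tgz_size bytes"

# Extract the tarball and compare the restored file with the original.
tar -zxf "$workdir/logs.tgz" -C "$workdir/out"
roundtrip=failed
if cmp -s "$workdir/logs/app.log" "$workdir/out/logs/app.log"; then
    roundtrip=ok
fi
echo "round trip: $roundtrip"
```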

Other common uses are as follows:

$ tar -jtvf etc_gconf.tbz ←List the contents of the tarball
$ tar -jcf bin.tar.bz2 /bin /sbin ←Archive more than one directory
$ tar -Af fileA.tar fileB.tar ←Merge the two tar files ("fileB.tar" is merged into "fileA.tar")
$ tar -jxvf etc_gconf.tbz -C /tmp ←Restore into the directory "/tmp"
$ tar -uf my_sch.tar SCH_DIR/ ←Update the tar file with files in SCH_DIR/ that are new or have changed
$ tar --delete -f my_sch.tar SCH_DIR/*.html ←Delete files in the tar file (delete all .html files in the directory SCH_DIR/)
$ tar -jcvf etc_back.tar.bz2 /etc > tar.log ←Redirect the file listing printed during archiving to "tar.log" as a record
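The update and delete operations are easier to follow on a small archive. The sketch below assumes GNU tar, since "--delete" is not available in every tar implementation, and it works on an uncompressed ".tar" file. The file names inside SCH_DIR are hypothetical.

```shell
# Hypothetical scratch area; every name below is illustrative only.
workdir=$(mktemp -d)
mkdir "$workdir/SCH_DIR"
echo "<p>old</p>" > "$workdir/SCH_DIR/index.html"
echo "notes"      > "$workdir/SCH_DIR/notes.txt"
tar -cf "$workdir/my_sch.tar" -C "$workdir" SCH_DIR

# A file created after the archive was made...
echo "new" > "$workdir/SCH_DIR/todo.txt"
# ...is picked up by -u, which appends files that are missing from the
# archive or newer than the archived copy.
tar -uf "$workdir/my_sch.tar" -C "$workdir" SCH_DIR

# --delete removes a member by the name stored in the archive.
tar --delete -f "$workdir/my_sch.tar" SCH_DIR/index.html

remaining=$(tar -tf "$workdir/my_sch.tar")
echo "$remaining"
```

The final listing should still contain notes.txt and the newly added todo.txt, but no longer index.html.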


tar Bomb
When using the command tar -cf file.tar /etc to archive files into a tar file, you may notice messages like "tar: Removing leading '/' from member names." This means that during the archiving process, the leading '/' in the file paths is removed. By default, the tar file only records relative paths. When extracting (e.g., using tar -xf file.tar), the paths no longer start at the root directory, so the entire archived directory tree is restored beneath the current working directory.
The reason tar doesn't use absolute paths by default is to prevent accidental "tar bombs." A "tar bomb" is an archive that overwrites or clutters existing files when it is extracted. For example, extracting an archive that stores "/home" with absolute paths would overwrite the "/home" directory of the current system, which can cause significant damage. If a system directory is overwritten, it may even lead to system failure.
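The path handling described above can be observed with a short experiment. This is only a sketch: the scratch directory and file names are made up, and the exact wording of the notice tar prints may differ between tar versions.

```shell
# Hypothetical scratch area; every name below is illustrative only.
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo "x" > "$workdir/data/file.txt"

# Archive using an absolute path; tar reports on stderr that it is
# removing the leading "/" from the member names.
tar -cf "$workdir/abs.tar" "$workdir/data"

# The names stored in the archive are relative: none starts with "/".
members=$(tar -tf "$workdir/abs.tar")
echo "$members"
```

Because the stored names are relative, extracting this archive elsewhere recreates the directory tree under the current working directory instead of writing back to the original location.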

Fortunately, the tar command has double protection to prevent tar bombs. Absolute paths are used only if the "-P" option is given when the archive is created and given again when it is extracted. Therefore, unless necessary, it's advisable to avoid using absolute paths during archiving and extraction to avoid getting "bombed."

However, not using absolute paths during archiving or extraction doesn't guarantee complete safety. During the extraction process, there is still a possibility of overwriting data in the working directory. Therefore, when restoring a tar file, it's recommended to use tar -t to list the contents of the tar file first. It's also best to create a temporary directory for extracting the tar file and confirm its contents before copying them to the desired directory.
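The precautions above can be sketched as a three-step routine: list, extract into a scratch directory, then inspect. The archive name "unknown.tar" and its contents are hypothetical stand-ins for a file received from elsewhere, and the pattern check is only a rough heuristic, not a complete safety test.

```shell
# Build a stand-in for an archive of unknown origin (illustrative names only).
workdir=$(mktemp -d)
mkdir "$workdir/project"
echo "data" > "$workdir/project/readme.txt"
tar -cf "$workdir/unknown.tar" -C "$workdir" project

# Step 1: list the members first and look for absolute paths or ".." components.
members=$(tar -tf "$workdir/unknown.tar")
if echo "$members" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
    verdict="suspicious"
else
    verdict="looks safe"
fi
echo "$verdict"

# Step 2: extract into a fresh scratch directory, not the current one.
scratch=$(mktemp -d)
tar -xf "$workdir/unknown.tar" -C "$scratch"

# Step 3: inspect the result before copying it anywhere important.
ls -R "$scratch"
```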
