The purpose of compressing files is to reduce the size of data. Even users who don't work directly with PCs often come across compressed files in various forms, such as MP3 music or photos taken with smartphones. However, formats like MP3 for music or JPG for photos usually use "lossy compression," which means they discard information that is hard to perceive, sacrificing some quality to significantly reduce file size. Some audiophiles claim to perceive the distortion caused by lossy compression in MP3s and prefer not to listen to digital music in that format. Similarly, many professional photographers prefer working with undistorted raw files.
However, most data cannot afford any distortion. For example, if you deposit $1000 in a bank, you wouldn't accept it becoming $900 due to compression. Compression that allows no distortion is called "lossless compression." Lossless compression restores the data with 100% fidelity upon decompression, but its compression ratio is usually lower than that of lossy compression.
Now, why is data compressible? Let's take an example: it is said that the longest place name in the United States is "Chargoggagoggmanchauggagoggchaubunagungamaugg," which consists of 45 letters. If you look closely, you'll notice that the letter sequences "gg" and "ago" repeat in many places. To compress this place name, we can use "!" to represent "gg" and "@" to represent "ago," resulting in "Chargo!@!manchau!@!chaubunagungamau!", which is only 36 characters long. Other rules could be invented to store this place name in even fewer bytes.
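The substitution rules described above can be tried directly in the shell. This is only a toy sketch of the idea, using sed for the replacements; the variable names are made up for illustration, and real compressors use far more elaborate schemes.

```shell
# Toy dictionary compression of the place name from the text.
# '!' stands for "gg" and '@' stands for "ago"; neither symbol occurs in the name.
name='Chargoggagoggmanchauggagoggchaubunagungamaugg'

# Compress: apply the two substitution rules in order.
packed=$(printf '%s\n' "$name" | sed -e 's/gg/!/g' -e 's/ago/@/g')

# Decompress: apply the inverse rules.
restored=$(printf '%s\n' "$packed" | sed -e 's/@/ago/g' -e 's/!/gg/g')

echo "original: ${#name} letters"
echo "packed:   $packed (${#packed} characters)"
echo "restored: $restored"
```

Because the restored name is identical to the original, this counts as lossless compression: the rules only change how the data is written down, not what it says.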
Compression efficiency varies depending on the nature of the data. Some compression techniques are particularly effective for text files, while others achieve higher compression ratios for executable files. However, compressing a file that has already been compressed, whether with a lossy or a lossless method, may actually increase its size. The many compression programs available differ mainly in the compression algorithms they use.
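A small experiment illustrates why compressing again rarely helps. As an assumption for illustration, random bytes from /dev/urandom stand in for already-compressed data here, since both look patternless to a compressor; the file name is hypothetical.

```shell
# Random bytes are used as a stand-in for "already compressed" data
# (an assumption for illustration): neither has patterns left to exploit.
work=$(mktemp -d)
head -c 100000 /dev/urandom > "$work/random.bin"

raw_size=$(( $(wc -c < "$work/random.bin") ))
packed_size=$(( $(gzip -c "$work/random.bin" | wc -c) ))

echo "before gzip: $raw_size bytes"
echo "after gzip:  $packed_size bytes"
rm -rf "$work"
```

With nothing to squeeze out, gzip can only add its own header and bookkeeping, so the output ends up slightly larger than the input.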
As the saying goes, whoever created the problem is the one to fix it: a file should be decompressed with the same software that compressed it. But how do you know which software was used to compress a file? Fortunately, this can be determined from its file extension. In general, file extensions in UNIX/Linux are only for "reference." For example, a plain text file composed entirely of ASCII characters does not necessarily have the extension ".txt". To determine a file's type accurately, the file command is usually used. Compressed files are the exception: compression tools in UNIX/Linux rely heavily on file extensions.
Each compression command usually has its own file extension, which indicates both that a file is compressed and which compression format was used. Common extensions for compressed files in UNIX/Linux include ".gz", ".bz2", ".zip", ".Z", and ".xz", which are mainly used by free or open-source software. Formats with licensing concerns, such as ".rar" and ".arj", are generally avoided.
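The reliance on extensions can be observed with gzip itself. The sketch below assumes GNU gzip, which declines to decompress a file whose name lacks a suffix it recognizes; the ".dat" name is an arbitrary choice for illustration.

```shell
work=$(mktemp -d)
seq 1 1000 > "$work/numbers"
gzip "$work/numbers"                          # creates numbers.gz
mv "$work/numbers.gz" "$work/numbers.dat"     # hide the telltale suffix

# Decompressing by file name is refused, because ".dat" is not a known suffix.
status=0
gzip -d "$work/numbers.dat" 2>/dev/null || status=$?
echo "gzip -d exit status: $status"

# Reading from standard input involves no file name, so no suffix check applies.
first_line=$(gzip -dc < "$work/numbers.dat" | head -n 1)
echo "first line: $first_line"
rm -rf "$work"
```

The data is still valid gzip data, which is why decompression through standard input succeeds; only the name-based lookup depends on the extension.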
In addition, the UNIX/Linux world has popularized the concept of "archiving" files, which means packaging multiple files or directories into a single file for convenient transmission or storage. Archiving itself does not involve compression, so most archives are compressed afterwards to reduce their size. The most common archiving tool is tar, which uses the file extension ".tar". A file that is both archived and compressed is referred to as a "tarball," which may have a file extension such as ".tgz", ".tbz", ".taz", or ".txz".
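The difference between archiving and compressing shows up in the file sizes. The following sketch uses hypothetical file names and generated sample data: it first packs two files with tar, then compresses the resulting archive with gzip.

```shell
work=$(mktemp -d)
mkdir "$work/data"
seq 1 20000 > "$work/data/a.txt"
seq 1 20000 > "$work/data/b.txt"

# Archive only: the .tar holds both files plus tar headers, so nothing shrinks.
tar -cf "$work/data.tar" -C "$work" data
# Compress the archive to turn it into a tarball.
gzip -c "$work/data.tar" > "$work/data.tar.gz"

content_size=$(( $(cat "$work"/data/*.txt | wc -c) ))
tar_size=$(( $(wc -c < "$work/data.tar") ))
tgz_size=$(( $(wc -c < "$work/data.tar.gz") ))
echo "files: $content_size  tar: $tar_size  tar.gz: $tgz_size"
rm -rf "$work"
```

The plain .tar is slightly larger than the files it contains, and only the gzip step reduces the size.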
$ seq 1 1000000 > file   ← generate a large file
$ ls -lgGh file   ← check the size
-rw-rw-r--. 1 6.6M Jun 3 09:33 file   ← size is 6.6M
$ gzip file   ← compress the newly generated 6.6M file with gzip
$ ls -lgGh file.gz   ← check the compressed file size
-rw-rw-r--. 1 2.1M Jun 3 09:33 file.gz   ← size is 2.1M, about 1/3 of the original file
$ gzip -d linux.words.gz   ← decompress the file "linux.words.gz"
Syntax: gzip [-option] [--option] [file/directory]
Command name/Function/Command user: gzip / Compress and decompress gz files / Any
Options | Function |
-c | Write to standard output (stdout); the original file is not changed |
-d | Decompress |
-f | Force the operation even if the output file already exists or the file has multiple hard links |
-l | (lowercase "L") List information about the compressed file |
-q | Suppress warning and error messages |
-r | Recursive processing, that is, files inside the given directories are processed as well |
-S | Use a suffix other than the default ".gz" (Note: only a few suffixes such as ".gz", ".z", "-z" and ".Z" are recognized automatically when decompressing; for any other suffix, pass the same -S option again) |
-t | Test whether the compressed file is damaged |
-v | Show compression/decompression information |
-1..9 | Specify the compression level from 1 to 9; the higher the number, the better the compression but the longer the processing time. The default is 6 |
--help | Display the command's built-in help and usage information |
$ gzip -l my_file.gz   ← list information about the compressed file "my_file.gz"
compressed uncompressed ratio uncompressed_name
2608 6492 60.2% my_file.txt
$ gzip -S -z my_file   ← compress the file "my_file" to "my_file-z"
$ gzip -9 my_file   ← use compression level 9
$ gzip -c file1 > file2.gz   ← compress and write to the file "file2.gz" (the original file is not deleted)
$ gzip -r /My_Dir   ← compress all files in the directory "My_Dir"
$ gzip *.txt   ← compress all files with the extension ".txt"
$ gzip -l *.gz   ← list information about all ".gz" compressed files
$ gunzip my_file.gz   ← decompress "my_file.gz"
$ gunzip -c my_file.gz | cat   ← same as the zcat command: read the contents of the compressed ".gz" file
$ zcat my_file.gz | less   ← directly read the compressed file "my_file.gz"
$ zcat file1.gz file2.gz > file3   ← read the two files decompressed and merge them into "file3"
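The options above can be combined into a small round trip. This is a sketch with hypothetical file names: it compresses the same data at levels 1 and 9 using -c so the original is kept, tests the result with -t, and checks that decompression gives back the original bytes.

```shell
work=$(mktemp -d)
seq 1 100000 > "$work/file"

# -c writes to stdout, so the original file is kept.
gzip -1 -c "$work/file" > "$work/fast.gz"
gzip -9 -c "$work/file" > "$work/best.gz"

orig_size=$(( $(wc -c < "$work/file") ))
fast_size=$(( $(wc -c < "$work/fast.gz") ))
best_size=$(( $(wc -c < "$work/best.gz") ))
echo "original: $orig_size  level 1: $fast_size  level 9: $best_size"

# -t checks the integrity of the compressed file.
gzip -t "$work/best.gz" && echo "best.gz passes the integrity test"

# Decompress to stdout and compare with the original, byte for byte.
if gzip -dc "$work/best.gz" | cmp -s - "$work/file"; then
  roundtrip=ok
else
  roundtrip=failed
fi
echo "round trip: $roundtrip"
rm -rf "$work"
```

How much level 9 gains over level 1 depends on the data, so the sizes are printed rather than assumed.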
$ seq 1 1000000 > file   ← generate a large file
$ ls -lgGh file   ← check the size
-rw-rw-r--. 1 6.6M Jun 3 09:33 file   ← size is 6.6M
$ bzip2 file   ← compress the newly generated 6.6M file with bzip2
$ ls -lgGh file.bz2   ← check the compressed file size
-rw-rw-r--. 1 1.2M Jun 3 09:33 file.bz2   ← size is only 1.2M, about 1/6 of the original file
$ bzip2 -c file > file2.bz2   ← compress and write to the file "file2.bz2" (the original file is not deleted)
$ bzip2 -d file.bz2   ← decompress the file "file.bz2"
Syntax: bzip2 [-option] [--option] [file/directory]
Command name/Function/Command user: bzip2 / Compress and decompress bz2 files / Any
Options | Function |
-c | Write to standard output instead of to a file |
-d | Decompress |
-f | Force the operation even if the output file already exists or the file has multiple hard links |
-k | Keep (do not delete) the source file after compression |
-q | Suppress warning and error messages |
-t | Test whether the compressed file is damaged |
-v | Show compression/decompression information |
-1..9 | Specify the compression level from 1 to 9; the higher the number, the better the compression but the longer the processing time. The default is 9 |
--help | Display the command's built-in help and usage information |
$ bunzip2 file.bz2   ← decompress "file.bz2"
$ bunzip2 -c file.bz2 > file   ← decompress and redirect to "file"; the original compressed file is not deleted
$ bzcat my_file.bz2 | head   ← directly read the first 10 lines of the compressed file "my_file.bz2"
$ bzip2recover file.bz2   ← attempt to repair the damaged file "file.bz2"
bzip2recover 1.0.4: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
block 1 runs from 80 to 274
block 2 runs from 323 to 392 (incomplete)
bzip2recover: splitting into blocks
writing block 1 to `rec00001file.bz2' ...   ← each recovered block is saved as "rec00001xxxx.bz2"
bzip2recover: finished
Syntax: xz [-option] [--option] [file/directory]
Command name/Function/Command user: xz / Compress and decompress xz files / Any
Options | Function |
-c | Write to standard output instead of to a file |
-d | Decompress |
-f | Force the operation even if the output file already exists or the file has multiple hard links |
-k | Keep (do not delete) the source file after compression |
-q | Suppress warning and error messages |
-t | Test whether the compressed file is damaged |
-T | Set the number of threads to use (historically 1 by default); 0 means as many threads as there are CPU cores |
-v | Show compression/decompression information |
-0..9 | Specify the compression preset from 0 to 9; the higher the number, the better the compression but the longer the processing time. The default is 6 |
--help | Display the command's built-in help and usage information |
$ seq 1 1000000 > file   ← generate a large file
$ ls -lgGh file   ← check the size
-rw-rw-r--. 1 6.6M Jun 3 10:33 file   ← size is 6.6M
$ xz file   ← compress the newly generated 6.6M file with xz
$ ls -lgGh file.xz   ← check the compressed file size
-rw-rw-r--. 1 183K Jun 3 10:35 file.xz   ← size is only 183K, about 1/36 of the original file
$ xz -T0 file   ← use the multithreading option to speed up compression
$ unxz archive1.tar.xz   ← decompress "archive1.tar.xz"
$ unxz file[123].xz   ← decompress "file1.xz", "file2.xz" and "file3.xz"
$ unxz file.lzma   ← unxz can also decompress ".lzma" files compressed by lzma
$ xzcat file.xz | more   ← directly read the content of the compressed file "file.xz"
$ xzcat file.xz | tr "a-z" "A-Z" > file.txt   ← read the content of "file.xz" and save it in uppercase as "file.txt"
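To compare the three tools on your own machine, the same input can be fed to each of them. The loop below is a sketch with a hypothetical file name; it skips any tool that is not installed and only prints the sizes, since the exact figures depend on the data and the tool versions.

```shell
work=$(mktemp -d)
seq 1 100000 > "$work/file"
orig_size=$(( $(wc -c < "$work/file") ))
echo "original: $orig_size bytes"

gzip_size=""
for tool in gzip bzip2 xz; do
  # Skip tools that are not installed on this machine.
  command -v "$tool" >/dev/null 2>&1 || continue
  # All three accept -c to write the compressed data to stdout.
  size=$(( $("$tool" -c "$work/file" | wc -c) ))
  echo "$tool: $size bytes"
  if [ "$tool" = gzip ]; then gzip_size=$size; fi
done
rm -rf "$work"
```

Because -c leaves the input untouched, the same file can be reused for every tool.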
Syntax: compress [-option] [file/directory]
Command name/Function/Command user: compress / Compress and decompress Z files / Any
Options | Function |
-c | Write to standard output instead of to a file |
-d | Decompress |
-f | Force the operation even if the output file already exists or the file has multiple hard links |
-r | Recursive processing, that is, files inside the given directories are processed as well |
-v | Show compression/decompression information |
$ compress my_file   ← compress the file "my_file" to "my_file.Z"
$ compress -c file1 > file2.Z   ← compress to STDOUT and redirect to the file "file2.Z" (the original file is not deleted)
$ compress -d file.Z   ← decompress "file.Z"
$ uncompress file.Z   ← decompress the file "file.Z"
$ uncompress -c file.txt.Z | head   ← read the contents of the compressed file without decompressing it on disk
A distinguishing feature of ".zip" files is that they offer something formats like .gz, .bz2, or .Z do not: the ability to both compress and archive. In other words, a ".zip" file not only compresses files but also packages multiple files, including directories, into a single file.
Another significant difference between the zip command and compression software like gzip or bzip2 is that, by default, zip does not delete the source files after compression, whereas gzip and bzip2 replace each source file with its compressed version. In addition, gzip and bzip2 only compress; they do not archive files.
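The difference in how the source file is treated can be checked quickly. In this sketch the file names are hypothetical, and the zip step runs only if zip is installed; the -j option keeps the temporary directory path out of the archive.

```shell
work=$(mktemp -d)
printf 'hello\n' > "$work/a.txt"
cp "$work/a.txt" "$work/b.txt"

# gzip replaces a.txt with a.txt.gz.
gzip "$work/a.txt"
if [ -e "$work/a.txt" ]; then gzip_source=kept; else gzip_source=removed; fi
echo "after gzip: source $gzip_source"

# zip (if installed) adds b.txt to b.zip and leaves b.txt in place.
zip_source=skipped
if command -v zip >/dev/null 2>&1; then
  zip -q -j "$work/b.zip" "$work/b.txt"
  if [ -e "$work/b.txt" ]; then zip_source=kept; else zip_source=removed; fi
fi
echo "after zip: source $zip_source"
rm -rf "$work"
```

To make gzip behave like zip in this respect, write to standard output with -c and redirect the result to a new file.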
$ zip services.zip /etc/services   ← compress the file "/etc/services" into a file called "services.zip"
adding: etc/services (deflated 73%)
$ zip CFG /etc/*.cfg   ← compress and package all files with the ".cfg" extension in the "/etc/" directory into a file called "CFG.zip"
adding: etc/a2ps.cfg (deflated 62%)
adding: etc/a2ps-site.cfg (deflated 59%)
adding: etc/enscript.cfg (deflated 53%)
Syntax: zip [-option] [target file] [file/directory]
Command name/Function/Command user: zip / Create compressed zip files / Any
Options | Function |
-b PATH | Specify a directory for temporary files while working |
-c | Add a comment to each file in the archive |
-d | Delete files from the archive |
-D | Do not create separate archive entries for directories |
-f | Freshen: update files in the archive that have changed on disk |
-F | Attempt to repair a corrupt zip file |
-h | Display the built-in help |
-i | Include only files that match the given patterns |
-j | Store only file names, without directory paths |
-l | (lowercase "L") For text files, convert UNIX newlines to DOS/Windows newlines when compressing (for compatibility with DOS/Windows) |
-ll | (lowercase "LL") For text files, convert DOS/Windows newlines to UNIX newlines when compressing |
-m | Delete the source files once compression is complete |
-n | Store files with the given extensions without compressing them |
-o | Set the mtime of the zip file to that of the most recently modified file in it |
-q | Do not show messages |
-P | (uppercase "P") Encrypt the archive with the given password |
-r | Recursive processing, that is, directories and their contents are processed as well |
-T | Test whether the compressed file is damaged |
-u | Update the content of the archive (similar to -f, but also adds new files) |
-v | Show compression information |
-x | Exclude files that match the given patterns |
-X | Do not store additional file information such as times and owner when compressing |
-y | Store a symbolic link as the link itself, not the file it points to |
-z | Add a comment to the whole archive |
-1..9 | Specify the compression level from 1 to 9; the higher the number, the better the compression but the longer the processing time. The default is 6 |
$ zip -r folder.zip my_dir/   ← compress the entire directory "my_dir" into a file named "folder.zip"
$ zip -r folder.zip dir1/ dir2/   ← compress and package the two directories "dir1" and "dir2" and their contents into "folder.zip"
$ zip -r fileA dir/ fileB   ← compress and package the directory "dir" along with the file "fileB" into a zip archive named "fileA.zip"
$ zip -r my_pic ~/picture -n .rar:.jpg   ← compress and package the directory "picture" in your home directory into "my_pic.zip"; files with the extensions ".jpg" and ".rar" are stored without compression
$ zip -d my_pic.zip ~/picture/*.bmp   ← delete the ".bmp" files from "my_pic.zip"
$ zip -r taipei_trip . -i '*.bmp' '*.raw'   ← only compress ".bmp" and ".raw" files in the working directory
$ zip -r taipei_trip folder/ -x '*.jpg'   ← compress the "folder" directory, excluding ".jpg" files
$ zip -P "123" file.zip file   ← "123" is the password; the correct password must be entered to decompress the file
$ zip -z my_file.zip *.txt   ← add a comment to the archive that is displayed when it is extracted
enter new zip file comment (end with .):   ← enter a comment (it can span several lines)
this is my homework ↵ Enter   ← enter the comment
about KEELUNG of history ↵ Enter
Ctrl+d   ← press <Ctrl-D> to finish
$ zip -z my_file.zip < comment.txt   ← read the comment from the prepared file "comment.txt" via STDIN
$ zip -c my_job file1.xls file2.xls file3.xls   ← add a comment to each compressed file
Enter comment for file1.xls
Sep   ← comment for the file "file1.xls"
Enter comment for file2.xls
Oct   ← comment for the file "file2.xls"
Enter comment for file3.xls
Nov   ← comment for the file "file3.xls"
$ zip fileAB fileA fileB   ← the compressed file "fileAB.zip" contains 2 files
$ echo "update fileB" >> fileB   ← change the content of one of them
$ zip -f fileAB.zip   ← update the changed file in the archive
$ echo "this is a fileC" > fileC   ← create a new file "fileC"
$ zip -u fileAB.zip fileC   ← add "fileC" to the existing "fileAB.zip"
By default, the zip command may create temporary files in the working directory while it runs. If you are compressing a large amount of data and the working directory does not have enough free space, compression may fail. In such cases, use the -b PATH option to place the temporary files in a directory with more space.
For example, to use the directory /path/to/temp for temporary files:
$ zip -b /path/to/temp archive.zip file1.txt file2.txt
$ unzip file.zip   ← unzip "file.zip" into the working directory
$ unzip file.zip -d ~/project/   ← unzip into the specified directory "project"
$ unzip -p file.zip   ← write the contents of the compressed files to standard output without extracting them to disk
Syntax: unzip [-option] [zipped file] [file/directory]
Command name/Function/Command user: unzip / Decompress zip files / Any
Options | Function |
-c | Extract to standard output instead of to files |
-d PATH | Extract into the specified directory (the working directory if not specified) |
-f | Freshen: extract only files that already exist in the target directory and are newer in the archive |
-l | (lowercase "L") List brief information about the archive without decompressing it |
-p | Extract to stdout or a pipeline |
-t | Test whether the compressed file is damaged |
-u | Same as -f, but also extract files that do not yet exist in the target directory |
-v | Display archive information (more detailed than unzip -l) |
-x FILE | Exclude the specified files from extraction |
-Z | Behave like zipinfo |
$ unzip -l my_job   ← do not decompress, only list brief information
Archive: my_job.zip
FT programming schedule   ← a comment added with "zip -z" is displayed here
Length Date Time Name
-------- ---- ---- ----
6492 06-03-12 12:58 file1.xls
Sep   ← a comment added with "zip -c" is displayed with its file
6492 06-03-12 12:58 file2.xls
Oct   ← (same as above)
6492 06-03-12 12:58 file3.xls
Nov   ← (same as above)
-------- ----
18268 3 files
$ unzip my_file.zip   ← a comment added with "zip -z" is displayed before decompression
this is my homework   ← the comment is displayed
about KEELUNG history
extracting: people.txt   ← decompression starts
$ unzip -f my_job.zip   ← compare the existing files in the target directory with the files in the zip archive and keep whichever is newer
$ unzip -u my_job.zip   ← same as above, but also extract files that do not yet exist in the target directory
When you run unzip -f my_job.zip, the files in the directory are compared with the files in the "my_job.zip" archive. If a file with the same name exists both in the directory and in the archive, the command checks the timestamps of the two. The file is extracted, replacing the one in the directory, only if the copy in the archive is newer. Files that exist only in the directory or only in the archive are not affected.
This command is useful when you want to update the files in a directory with the newer versions from a corresponding zip archive without overwriting files that haven't been modified or are not present in the archive.
Syntax: zipinfo [-option] [file]
Command name/Function/Command user: zipinfo / (zip information) List zip file information / Any
Options | Function |
-1 | (digit 1) List only the names of the files in the zip file (this option overrides other options related to the output format) |
-2 | Same as -1, but can be combined with -h, -t, -z and other options |
-h | List only the header line: the zip file name, its size in bytes, and the number of files it contains |
-m | List the files in a format similar to "ls -l", with the compression ratio added |
-M | Page the output, similar to piping it through more |
-s | List the files in a format similar to "ls -l"; this is the default |
-t | List only the totals line: the number of files, their total uncompressed and compressed sizes, and the overall compression ratio |
-T | Print file dates and times in a sortable decimal format |
-v | Show detailed information about each file |
-x | Exclude the specified files from the listing |
-z | Display the archive comment added with "zip -z" |
$ zipinfo ft.zip   ← list the files in the zip file in a format similar to "ls -l"
Archive: ft.zip 4101 bytes 3 files
-rw-rw-r-- 2.3 unx 7253 tx defN 4-Jun-11 21:16 ft.c
-rw-rw-r-- 2.3 unx 953 tx defN 4-Jun-11 21:16 ft.h
-rw-rw-r-- 2.3 unx 1581 tx defN 4-Jun-11 21:19 lpt_ctr.c
3 files, 9787 bytes uncompressed, 3715 bytes compressed: 62.0%
The tar command allows you to combine multiple files and directories into a single archive file, which can then be compressed using various compression algorithms such as gzip or bzip2. Tar archives preserve file permissions, ownership, and directory structure, making them ideal for creating backups, transferring files, or distributing collections of files.
One of the advantages of using tar is its compatibility with various compression formats. It can create uncompressed tar archives, as well as compressed archives using gzip (.tar.gz), bzip2 (.tar.bz2), xz (.tar.xz), and other compression algorithms. This flexibility allows users to choose the compression format that best suits their needs in terms of file size and compression speed.
The tar command provides a wide range of options and flags to control various aspects of the archiving and compression process. It allows you to specify file and directory exclusions, preserve symbolic links and special file attributes, set compression levels, and more.
In summary, Linux tar is a versatile and powerful tool for creating, managing, and compressing file archives in the Linux environment. It is widely used for tasks such as backups, software distribution, and file transfers, offering flexibility and efficiency in handling large collections of files and directories.
For example, if there is a tar file named "file.tar," when it is compressed using bzip2, it becomes "file.tar.bz2." However, the extension for tarballs can be quite long, and many file systems, such as FAT, do not support file names like "file.tar.bz2." Therefore, tarballs are commonly abbreviated as follows:
Original tarball file name | Abbreviated tarball file name |
.tar.bz2 | .tbz or .tb2 or .tbz2 |
.tar.gz | .tgz |
.tar.Z | .taz |
.tar.xz | .txz |
.tar.lzma | .tlz |
Syntax: tar [-option] [target file] [file/directory]
Command name/Function/Command user: tar / tape archive / Any
Options | Function |
Essential options (one of the operations is required) | |
-A | Append a tar file to another tar file (may not work with tarball files) |
-c | Create a tar file |
-d | Compare the files in the tar archive with the corresponding files in the directory |
-f | Specify the tar file to operate on |
-r | Append files to an existing tar archive (may not work with tarball files) |
-t | List the contents of a tar file |
-u | Update: append only files that are newer than the copy in the tar file |
-x | Extract (restore) files |
--delete | Delete files inside a tar archive (may not work with tarball archives) |
Common options | |
-C PATH | Change to the directory PATH, for example to choose where files are restored |
-k | Do not overwrite existing files on the file system when restoring |
-j | Compress/decompress with bzip2 |
-J | Compress/decompress with xz |
-p | Restore file permissions (this is the default for root) |
-P | Use absolute paths, that is, do not remove the "/" at the beginning of the path |
-v | Show processing information |
-z | Compress/decompress with gzip |
-Z | Compress/decompress with compress (the compress command must be installed on the system) |
$ tar -cf etc_gconf.tar /etc/gconf   ← create an archive file named "etc_gconf.tar" containing all the files in the directory "/etc/gconf"
$ tar -xf etc_gconf.tar   ← extract the contents of the archive file "etc_gconf.tar"
$ tar -cvf etc_gconf.tar /etc/gconf   ← same as above, showing each file as it is processed
$ tar -xvf etc_gconf.tar
Note that the tar command does not automatically add the file extension ".tar" when archiving. Add the extension yourself so that the file can be recognized as a tar archive.
In addition, archives are usually compressed into a tarball to save space. In most Linux distributions, the options "-j," "-J," "-z," and "-Z" make tar invoke the corresponding compression software automatically for compression or decompression. Here too, tar does not add a file extension to the tarball. Add it yourself so that the tarball format is known when the file needs to be extracted.
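As a sketch of the whole cycle with the "-z" option, the following creates a gzip-compressed tarball, lists it, and restores it into another directory. The directory and file names are made up for illustration, and the ".tgz" extension is chosen by hand, as described above.

```shell
work=$(mktemp -d)
mkdir -p "$work/project/docs" "$work/restore"
echo "readme" > "$work/project/README"
echo "notes"  > "$work/project/docs/notes.txt"

# Create: -c create, -z compress with gzip, -f archive name (suffix chosen by us).
tar -czf "$work/project.tgz" -C "$work" project

# List: -t shows the members without extracting anything.
listing=$(tar -tzf "$work/project.tgz")
echo "$listing"

# Extract: -x restore, -C choose the directory to restore into.
tar -xzf "$work/project.tgz" -C "$work/restore"
restored=$(cat "$work/restore/project/docs/notes.txt")
echo "restored notes.txt contains: $restored"
rm -rf "$work"
```

Using -C when creating the archive keeps the temporary directory out of the stored paths, so the members are recorded as "project/...".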
$ tar -jcvf etc_gconf.tbz /etc/gconf   ← archive and compress using bzip2
$ tar -jxvf etc_gconf.tbz   ← decompress with bzip2 and extract with tar
$ tar -Jcvf etc_gconf.txz /etc/gconf   ← archive and compress using xz
$ tar -Jxvf etc_gconf.txz   ← decompress with xz and extract with tar
$ tar -jtvf etc_gconf.tbz   ← list the contents of the tar file
$ tar -jcf bin.tar.bz2 /bin /sbin   ← archive more than one directory
$ tar -Af fileA.tar fileB.tar   ← merge the two tar files ("fileB.tar" is merged into "fileA.tar")
$ tar -jxvf etc_gconf.tbz -C /tmp   ← restore into "/tmp"
$ tar -uf my_sch.tar SCH_DIR/   ← update the tar file
$ tar --delete -f my_sch.tar SCH_DIR/*.html   ← delete files in the tar file (all .html files in the directory SCH_DIR/)
$ tar -jcvf etc_back.tar.bz2 /etc > tar.log   ← redirect the output of the entire archiving process to "tar.log" as a record
The reason tar doesn't use absolute paths by default is to prevent accidental "tar bombs." A "tar bomb" is an archive that overwrites important files when it is extracted. For example, extracting another system's "/home" over the current system's "/home" directory can cause significant issues. If a system directory is overwritten, it may even lead to system failure.
Fortunately, the tar command has double protection against tar bombs. Absolute paths are stored only if the "-P" option is used during archiving, and they are restored only if the "-P" option is used again during extraction. Therefore, unless necessary, it's advisable to avoid absolute paths during both archiving and extraction to avoid getting "bombed."
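The first layer of protection is easy to observe. In this sketch (hypothetical file names, run without "-P"), a file is archived by its absolute path and the stored member name is then listed.

```shell
work=$(mktemp -d)
echo "data" > "$work/sample.txt"

# Archive using an absolute path, without -P. tar warns on stderr that it is
# removing the leading "/" from member names; the warning is discarded here.
tar -cf "$work/abs.tar" "$work/sample.txt" 2>/dev/null

# The stored name is relative, so extraction lands below the current directory.
member=$(tar -tf "$work/abs.tar")
echo "stored as: $member"
rm -rf "$work"
```

Since the stored name no longer begins with "/", extracting this archive creates the path below the directory you are in instead of writing to the original location.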
However, not using absolute paths during archiving or extraction doesn't guarantee complete safety. During the extraction process, there is still a possibility of overwriting data in the working directory. Therefore, when restoring a tar file, it's recommended to use tar -t to list the contents of the tar file first. It's also best to create a temporary directory for extracting the tar file and confirm its contents before copying them to the desired directory.
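The cautious procedure described above might look like the following sketch. The archive is built on the spot as a stand-in for a tar file of unknown origin, all names are hypothetical, and the check for "../" in member names is an extra precaution added here rather than something tar requires.

```shell
work=$(mktemp -d)
mkdir "$work/src"
echo "one" > "$work/src/a.txt"
echo "two" > "$work/src/b.txt"
# Stand-in for a tar file of unknown origin: members have no top-level directory.
tar -cf "$work/unknown.tar" -C "$work/src" a.txt b.txt

# Step 1: list the contents first and count absolute or "../" member names.
tar -tf "$work/unknown.tar"
suspicious=$(tar -tf "$work/unknown.tar" | grep -c -e '^/' -e '\.\./' || true)
echo "suspicious member names: $suspicious"

# Step 2: extract into a scratch directory, not the working directory.
scratch=$(mktemp -d)
tar -xf "$work/unknown.tar" -C "$scratch"
extracted=$(( $(ls "$scratch" | wc -l) ))
echo "extracted $extracted entries into $scratch for inspection"

# Step 3 (not shown): after checking the files, copy them to their destination.
rm -rf "$work" "$scratch"
```

An archive like this one, whose members sit at the top level, would scatter files across the working directory if extracted there, which is exactly what the scratch directory avoids.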