The Linux Newbie Guide  ⇒    Fundamentals     Advanced     Supplement   Command Index   ENG⇒中
All rights reserved, please indicate the source when citing
 

File descriptor

1.0 Introduction to File Descriptors
       File descriptors (fd) and redirection
       Directory "/proc/<PID>/fd" and file descriptors
1.1 exec and fd Redirection
           exec X>FILE : Redirect fd X to a file
           exec X>&Y : Redirect fd X to fd Y
           exec X<FILE : Redirect a file to fd X
           exec X<&Y : Redirect fd Y to fd X
           exec X<>FILE : Redirect file to fd X for both reading and writing
           X>&- or X<&- : Close fd X
1.2 Who Stole stdin?

  1.0 Introduction to File Descriptors
One day, I came across a certain book discussing Shell Scripting, which provided an example of reading a file line by line. The sample script is shown below:

(Example ex1.sh)
$ cat ex1.sh
#!/bin/bash
# Read "FILE.txt" line by line
while read -r line #←This line is one of the puzzling parts for me: how can "read" read a file instead of the keyboard?
do
echo "$line"
done <FILE.txt #←This line is another puzzling part for me: what does "done < FILE" mean?

Although the above example consists of only a few lines, its syntax is perplexing. With my limited knowledge I found it hard to understand, and the author did not explain the principles behind it (perhaps it was copied from somewhere?).

Well, not understanding a piece of code is no big deal as long as I can adapt it to my own application. I tested the example, and it works perfectly.

My application is quite simple: I just want to display one line at a time and wait for a key press before showing the next line. So I added one extra command and rewrote it as follows:



(Example ex2.sh)
$ cat ex2.sh
#!/bin/bash
while read -r line
do
echo "$line"
read -p "Press any key to continue" -n 1 #←This is the line I added
done <FILE.txt

Strangely enough, my modified program, ex2.sh, doesn't work as expected! Who's the culprit? Is it a gap in my own understanding? Or is it the line I added myself?

So, I turned into a keyboard detective, determined to catch the culprit and bring them to justice.

After several nights of intense investigation, I finally caught the culprit – it's none other than "File Descriptor," often abbreviated as "fd".

In the original example (ex1.sh), the ability to read a file line by line is achieved through a shell simplification that hides the intricacies of file descriptor operations. However, my modified example (ex2.sh) doesn't work properly because the shell's hidden file descriptor operations have stolen stdin (standard input).

In the literature available in my country, file descriptors are rarely mentioned, and the few treatments that exist do not address the core issues. I therefore decided to write down what I learned over these past few days, both as a personal reminder and as help for others who run into the same problem. This is particularly relevant for shell scripts that combine file reading with keyboard prompts, where file descriptors may remove the need to rely on tools like awk or sed.

In simple terms, a "file descriptor" is a small non-negative integer that a Unix-like operating system assigns to a process when it opens a file. The kernel uses this number as an index to track the process's open files and the input/output performed on them.

For instance, while browsing this article, your browser may have opened around 20 files (HTML files and various image files), each with a unique index (such as 100, 101, 102, and so on). These index numbers are the file descriptors (fd). However, maintaining these file descriptors is the responsibility of the kernel, and regular users need not concern themselves with this level of detail.



File descriptors (fd) and redirection
Three file descriptors (fd) are open by default when a program starts: stdin (keyboard input), stdout (screen output), and stderr (error messages). The POSIX standard reserves the fd numbers 0 to 2 for these three files, allowing users to perform redirection for various applications.

fd Number   Name     Function
0           stdin    Standard Input
1           stdout   Standard Output
2           stderr   Standard Error


Redirection is essentially the process of pointing the three standard file descriptors (0 to 2) at other destinations (usually files or other fds).

Syntax        Function          Example                              Note
COMMAND 1>    Redirect stdout   echo '123' >fileA                    fd 1 output goes to a file
COMMAND 1>>   Append stdout     seq 100 200 >>fileA                  fd 1 output is appended to a file
COMMAND 2>    Redirect stderr   find / -name '*.conf' 2>/dev/null    fd 2 output goes to a file
COMMAND 2>>   Append stderr     find / -name '*.conf' 2>>fileA       fd 2 output is appended to a file
COMMAND 0<    Redirect stdin    cat <fileA                           fd 0 is replaced by a file


For output redirection, the syntax is COMMAND [fd]>, where fd defaults to 1 if omitted. For input redirection, the syntax is COMMAND [fd]<, where fd defaults to 0 if omitted.
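
As a minimal sketch of these forms (the function name both and the temporary file names are made up for illustration), the following sends a command's stdout and stderr to separate files, then appends a second run to each:

```shell
#!/bin/bash
# Hypothetical helper: writes one line to stdout (fd 1) and one to stderr (fd 2).
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)

# First run: ">" and "2>" create (or truncate) the target files.
both >"$workdir/out.txt" 2>"$workdir/err.txt"

# Second run: ">>" and "2>>" append instead of truncating.
both >>"$workdir/out.txt" 2>>"$workdir/err.txt"

# "<" makes the file the command's stdin (fd 0).
out_lines=$(wc -l <"$workdir/out.txt" | tr -d ' ')
err_first=$(head -n 1 <"$workdir/err.txt")

echo "out.txt has $out_lines lines; err.txt starts with: $err_first"
rm -rf "$workdir"
```

Because the second run appends, out.txt ends up with one line per run, and err.txt never receives anything written to stdout.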

Redirection can also send stdout to stderr, or vice versa.
The syntax is X>&Y (where X is the original fd, and Y is the redirected fd; if X is omitted, it defaults to 1). For example, redirecting stderr (2) to stdout (1) is written as "2>&1".

Syntax   Function                            Example
2>&1     Redirect stderr (2) to stdout (1)   ls -R /home > fileA 2>&1
1>&2     Redirect stdout (1) to stderr (2)   find / -name '*readme.txt' 1>&2 2>/dev/null
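
Redirections on a command line are applied from left to right, so the position of "2>&1" matters. The sketch below is only an illustration; the helper both and the variable names are assumptions, not part of the original examples. It compares the two orderings, using command substitution to capture whatever still reaches the original stdout.

```shell
#!/bin/bash
# Hypothetical helper: one line to stdout, one line to stderr.
both() { echo "out"; echo "err" >&2; }

tmp=$(mktemp)

# Case 1: "> file" first, then "2>&1".
# fd 1 points at the file, then fd 2 is made a copy of fd 1: both go to the file.
both >"$tmp" 2>&1
case1_file=$(cat "$tmp")

# Case 2: "2>&1" first, then "> file".
# fd 2 copies the *old* fd 1 (here the command substitution), then fd 1 moves to the file.
case2_captured=$(both 2>&1 >"$tmp")
case2_file=$(cat "$tmp")

printf 'case 1 file:     %s\n' "$case1_file"
printf 'case 2 captured: %s\n' "$case2_captured"
printf 'case 2 file:     %s\n' "$case2_file"
rm -f "$tmp"
```

In the first case the file holds both lines. In the second, only "out" reaches the file while "err" follows the previous stdout.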

Since stdin, stdout, and stderr (fd 0 to 2) are open by default and can be used directly, any fd numbered 3 or higher generally has to be opened first, typically with the exec command.


Directory "/proc/<PID>/fd" and file descriptors
When a process starts, it is assigned a process ID (PID), and its open file descriptors appear as entries in the directory "/proc/<PID>/fd" (where "<PID>" is the process's PID number). Listing this directory shows how the file descriptors are being used. Example:
$ seq 1 1000000
1
2
3
4 Ctrl+Z ←Press <Ctrl+Z> to pause
[1]+ Stopped                seq 1 1000000 ←The program is stopped
$ jobs -p ←List the PIDs of paused commands
2373 ←The PID of the command "seq 1 1000000" is 2373
$ ls -lgG /proc/2373/fd/ ←List /proc/<PID>/fd to observe fd usage
total 0
lrwx------ 1 64 2015-04-26 22:28 0 -> /dev/tty1
lrwx------ 1 64 2015-04-26 22:28 1 -> /dev/tty1
lrwx------ 1 64 2015-04-26 22:28 2 -> /dev/tty1

In the above example, the directory "/proc/<PID>/fd/" contains three entries (symbolic links), corresponding to file descriptors 0, 1, and 2. All of them point to "/dev/tty1" (when testing in a graphical interface, it could be "/dev/pts/N").

This means that in the example, stdin (fd 0), stdout (fd 1), and stderr (fd 2) are all connected to the terminal: /dev/tty1 on a text console, or /dev/pts/N (a pseudo-terminal) in a graphical session.

Let's modify the experiment with the command seq 1 1000000 > fileA 2>&1 and observe the results:
lrwx------ 1 64 2015-04-26 15:04 0 -> /dev/tty1
l-wx------ 1 64 2015-04-26 15:04 1 -> /home/basalt/fileA
l-wx------ 1 64 2015-04-26 15:04 2 -> /home/basalt/fileA

In this example, stdin (fd 0) remains as the tty, but stdout (fd 1) and stderr (fd 2) are both redirected to "fileA."

Therefore, when a command becomes confusing due to piping and redirection, you can gain clarity by observing the information provided by the file descriptors in the directory "/proc/<PID>/fd/."
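
The same check can be made from inside a script, without pausing a job. This is a rough sketch that assumes Linux with /proc mounted. The choice of fd 9 and the variable names are arbitrary.

```shell
#!/bin/bash
# Assumption: Linux with /proc mounted; $$ is the PID of this shell.
tmp=$(mktemp)
tmp_real=$(readlink -f "$tmp")    # canonical path, for comparison

exec 9>"$tmp"                     # open fd 9 for writing to the temp file

# Each entry in /proc/<PID>/fd is a symlink to whatever the fd refers to.
fd9_target=$(readlink "/proc/$$/fd/9")
echo "fd 9 -> $fd9_target"

ls -l "/proc/$$/fd/"              # full listing, as in the examples above

exec 9>&-                         # close fd 9 again
rm -f "$tmp"
```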

Now modify it further to seq 1 100 > fileB >&2. After the command finishes, the file "fileB" is empty. Why is that? Take a look at "/proc/<PID>/fd/" to understand!






   1.1 exec and fd Redirection

Excluding fd 0 (stdin), fd 1 (stdout), fd 2 (stderr), and system-reserved fd 10 to 255, general users are advised to only use fd 3 to 9 for redirection purposes.
(fd 255 is usually reserved for shell scripts, and process substitution may use fd 63 or fd 62, so it's best to avoid using the system's own fd 10 to 255 to prevent conflicts.)

To use fd 3 to 9, the exec command is used. exec has two main uses. Given a command, it replaces the current shell process with that command instead of starting a child process. Given only redirections and no command, it applies those redirections to the current shell, and this second use is what matters for file descriptors.

There are two types of redirection: redirecting one fd to another fd and redirecting an fd to a file. Let's explain each of them:

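Combining these principles, the possible usages are as follows:

exec X>FILE : Redirect fd X to a file
exec X>&Y : Redirect fd X to fd Y
exec X<FILE : Redirect a file to fd X
exec X<&Y : Redirect fd Y to fd X
exec X<>FILE : Redirect a file to fd X for both reading and writing
X>&- or X<&- : Close fd X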
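
The forms named in the outline of this section (exec X>FILE, exec X>&Y, exec X<FILE, exec X<&Y, exec X<>FILE, and X>&- / X<&-) can be tried in one short script. This is only a sketch: the fd numbers 3 to 7, the temporary file, and the variable names are chosen for illustration.

```shell
#!/bin/bash
f=$(mktemp)

exec 3>"$f"            # exec X>FILE : fd 3 now writes to the file (truncating it)
echo "first" >&3
exec 4>&3              # exec X>&Y  : fd 4 becomes a duplicate of fd 3
echo "second" >&4      # lands after "first": the two fds share one file position
exec 3>&- 4>&-         # X>&-       : close both

exec 5<"$f"            # exec X<FILE : fd 5 reads from the file
exec 6<&5              # exec X<&Y  : fd 6 becomes a duplicate of fd 5
read -r line_a <&5
read -r line_b <&6     # continues where fd 5 stopped (shared position again)
exec 5<&- 6<&-         # X<&-       : close both

exec 7<>"$f"           # exec X<>FILE : open for reading and writing, no truncation
printf 'FIRST' >&7     # overwrites the first five bytes in place
exec 7>&-

first_line=$(head -n 1 "$f")
echo "$line_a / $line_b / $first_line"
rm -f "$f"
```

A duplicated fd is a second handle on the same open file rather than an independent one, which is why the writes and reads through the copies continue from where the originals left off.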


1.2 Who Stole stdin?
Let's revisit the mysterious example ex1.sh. When running it and observing the "/proc/<PID>/fd" directory, the result is as follows:

lr-x------ 1 64 2015-04-26 14:45 0 -> /home/basalt/FILE.txt ←stdin changed to a file
lrwx------ 1 64 2015-04-26 14:45 1 -> /dev/tty1
lrwx------ 1 64 2015-04-26 14:45 10 -> /dev/tty1 ←additional fd 10 opened
lrwx------ 1 64 2015-04-26 14:45 2 -> /dev/tty1
lr-x------ 1 64 2015-04-26 14:45 255 -> /home/basalt/ex1.sh

stdin has become a file, and an additional fd 10 has been opened. We can boldly speculate that when the shell encounters the "done < FILE" syntax in ex1.sh, it quietly switches the stdin source to the file. Let's deduce the hidden file descriptor operations performed by the shell:
exec 10<&0 #←Back up fd 0 to fd 10
exec < FILE.txt #←stdin=file
while read -r line
do
echo "$line"
done #←Original command was done<FILE.txt

exec 0<&10 10<&- #←Restore fd 0 from fd 10, then close fd 10
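
If this deduction is right, every command inside the loop body inherits the file as its stdin. The sketch below illustrates the symptom under a simplification: a plain "read -r key" stands in for the "press any key" prompt of ex2.sh, and the file contents and variable names are made up.

```shell
#!/bin/bash
data=$(mktemp)
printf 'line1\nline2\nline3\nline4\n' >"$data"

shown=""
swallowed=""
while read -r line
do
    shown="$shown $line"
    # Stand-in for the key-press prompt: it reads fd 0 too,
    # and inside this loop fd 0 is the file, not the keyboard.
    read -r key
    swallowed="$swallowed $key"
done <"$data"

echo "shown:$shown"
echo "swallowed:$swallowed"
rm -f "$data"
```

Every other line is consumed by the inner read instead of being displayed, and the keyboard is never consulted.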

Because of "done < FILE", stdin is switched to the file. With stdin taken over by the file, the read command added in ex2.sh can no longer read from the keyboard. The fix is to open the file on a separate fd and have the loop's read use that fd, which leaves stdin connected to the keyboard. The modified example "ex6.sh" is as follows:
$ cat ex6.sh
#!/bin/bash
# Read "FILE.txt" line by line

exec 7<FILE.txt #←fd 7=FILE.txt

while read -r -u 7 line #←read reads from fd 7 instead of stdin
do
echo "$line"
read -p "Press any key to continue" -n 1
done

exec 7<&- #←Close fd 7 when finished
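
A variation on the same idea, sketched on the assumption that the file is only needed while the loop runs: attach it to fd 3 on the loop itself rather than with a separate exec. The names paged and data are illustrative, and a pipe stands in for the keyboard so that the sketch can run unattended.

```shell
#!/bin/bash
data=$(mktemp)
printf 'line1\nline2\n' >"$data"

# Hypothetical function: the file travels on fd 3, stdin (fd 0) is left untouched.
paged() {
    while read -r line <&3
    do
        read -r key                # reads the caller's stdin, not the file
        echo "$line [$key]"
    done 3<"$data"                 # fd 3 exists only for the duration of the loop
}

# Simulated key presses arrive on stdin through a pipe.
result=$(printf 'a\nb\n' | paged)
echo "$result"
rm -f "$data"
```

With this form the shell closes fd 3 automatically when the loop ends, so no explicit close is needed.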



On the internet, I came across a classic case of stdin being taken away, an issue many people run into without understanding the reason. A user wanted to write a shell script that lists the files in the working directory and asks, for each one, whether to delete it. It didn't work, so the user sought help online.

The problematic shell script looks like this:
$ cat ex7.sh
#!/bin/bash
while read file_name
do
rm -iv $file_name
done < <(ls)

Readers who are familiar with fd operations should be able to spot the issue (hint: the last part "<(ls)" is process substitution). For a corrected version that works, refer to [Note 1.1a].






[Note 1.1] Example source: Advanced Bash-Scripting Guide

[Note 1.1a]

$ cat ex8.sh
#!/bin/bash
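exec 3<&0 #←Save the original stdin (keyboard) as fd 3
while read -r file_name
do
    rm -iv -- "$file_name" <&3 #←rm reads its answer from the keyboard via fd 3
done < <(ls)
exec 3<&- #←Close fd 3 when finished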

# To exclude directories from the list and avoid error messages, the last line can be modified as follows:
# done < <(ls -F | grep -v '/$')