103.4 Lesson 1
Certificate: | LPIC-1 (101) |
---|---|
Version: | 5.0 |
Topic: | 103 GNU and Unix Commands |
Objective: | 103.4 Use streams, pipes and redirects |
Lesson: | 1 of 2 |
Introduction
All computer programs follow the same general principle: data received from some source is transformed to generate an intelligible outcome. In the context of the Linux shell, the data source can be a local file, a remote file, a device (like a keyboard), etc. The program’s output is usually rendered on a screen, but it is also common to store the output data in a local filesystem, send it to a remote device, play it through audio speakers, etc.
Operating systems inspired by Unix, like Linux, offer a great variety of input/output methods. In particular, file descriptors make it possible to dynamically associate integer numbers with data channels, so that a process can refer to them as its input/output data streams.
Standard Linux processes have three communication channels opened by default: the standard input channel (most times simply called stdin), the standard output channel (stdout) and the standard error channel (stderr). The numerical file descriptors assigned to these channels are 0 to stdin, 1 to stdout and 2 to stderr. Communication channels are also accessible through the special devices /dev/stdin, /dev/stdout and /dev/stderr.
These three standard communication channels allow programmers to write code that reads and writes data without worrying about the kind of media it’s coming from or going to. For example, if a program needs a set of data as its input, it can just ask for data from the standard input and whatever is being used as the standard input will provide that data. Likewise, the simplest method a program can use to display its output is to write it to the standard output. In a standard shell session, the keyboard is defined as stdin and the monitor screen is defined as both stdout and stderr.
The Bash shell has the ability to reassign the communication channels when loading a program. This makes it possible, for example, to replace the screen as the standard output with a file in the local filesystem.
Redirects
The reassignment of a channel’s file descriptor in the shell environment is called a redirect. A redirect is defined by a special character within the command line. For example, to redirect the standard output of a process to a file, the greater than symbol > is positioned at the end of the command and followed by the path to the file that will receive the redirected output:
$ cat /proc/cpuinfo >/tmp/cpu.txt
By default, only the content sent to stdout is redirected. The numerical value of the file descriptor to redirect is specified just before the greater than symbol and, when it is omitted, Bash redirects the standard output. Therefore, using > is equivalent to using 1> (the value of stdout’s file descriptor is 1).
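The equivalence between > and 1> can be checked with a minimal sketch like the one below. The scratch directory and the file names implicit.txt and explicit.txt are assumptions made only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory, used only for this illustration.
tmpdir=$(mktemp -d)

# No file descriptor given: stdout (1) is assumed.
echo "hello" >"$tmpdir/implicit.txt"

# File descriptor 1 given explicitly: same effect.
echo "hello" 1>"$tmpdir/explicit.txt"

# cmp -s exits with status 0 when both files are identical.
if cmp -s "$tmpdir/implicit.txt" "$tmpdir/explicit.txt"; then
  same=yes
else
  same=no
fi
echo "identical: $same"
```

Note that the digit must be written immediately before the greater than symbol; with a space in between, Bash would treat the digit as an ordinary argument of the command.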
To capture the content of stderr, the redirect 2> should be used instead. Most command-line programs send debug information and error messages to the standard error channel. It is possible, for example, to capture the error message triggered by an attempt to read a non-existent file:
$ cat /proc/cpu_info 2>/tmp/error.txt
$ cat /tmp/error.txt
cat: /proc/cpu_info: No such file or directory
Both stdout and stderr are redirected to the same target with &> or >&. The two characters of the operator must be written together, with no space between them: in & >, for instance, Bash takes the ampersand as the instruction to run the process in the background and does not perform the intended redirect.
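As a rough illustration of the combined redirect, the sketch below uses a hypothetical helper function named emit, which writes one line to each channel, and a scratch file named both.txt. Both names are assumptions, not part of the lesson.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
  echo "regular output"
  echo "error output" >&2
}

tmpdir=$(mktemp -d)

# Bash-specific operator: stdout and stderr go to the same file.
emit &>"$tmpdir/both.txt"

# Both lines contain the word "output", so grep -c counts them.
captured=$(grep -c "output" "$tmpdir/both.txt")
echo "lines captured: $captured"
```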
The target must be a path to a writable file, like /tmp/cpu.txt, or a writable file descriptor. A file descriptor target is represented by an ampersand followed by the file descriptor’s numerical value. For example, 1>&2 redirects stdout to stderr. To do the opposite, stderr to stdout, 2>&1 should be used instead.
Although not very useful, given that there is a shorter way to do the same task, it is possible to redirect stdout to a file and then redirect stderr to stdout. For example, a redirect to write both stderr and stdout to a file named log.txt can be written as >log.txt 2>&1. However, the main reason for redirecting stderr to stdout is to allow parsing of debug and error messages. It is possible to redirect the standard output of a program to the standard input of another program, but it is not possible to directly redirect the standard error to the standard input of another program. Thus, a program’s messages sent to stderr first need to be redirected to stdout in order to be read by another program’s stdin.
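The following sketch shows why 2>&1 matters when error messages need to be parsed by another program. The helper function emit and the file log.txt in a scratch directory are assumptions made for this illustration; grep is used only as an example of a program reading from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
  echo "regular output"
  echo "error: something failed" >&2
}

# stderr is merged into stdout, so grep can read the message from the pipe.
errors=$(emit 2>&1 | grep -c "^error")
echo "error lines seen by grep: $errors"

# Without 2>&1 only stdout enters the pipe and grep finds nothing
# (the error line goes straight to the terminal instead).
missed=$(emit | grep -c "^error" || true)
echo "error lines seen without 2>&1: $missed"

# Redirects are processed from left to right: stdout is sent to the file
# first, then stderr is pointed at the same place.
tmpdir=$(mktemp -d)
emit >"$tmpdir/log.txt" 2>&1
logged=$(grep -c "" "$tmpdir/log.txt")
echo "lines in log.txt: $logged"
```

Because redirects are evaluated in order, writing 2>&1 before >log.txt would leave stderr attached to the original stdout and only the regular output would reach the file.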
To just discard the output of a command, its content can be redirected to the special file /dev/null. For example, >log.txt 2>/dev/null saves the contents of stdout in the file log.txt and discards the stderr. The file /dev/null is writable by any user but no data can be recovered from it, as it is not stored anywhere.
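A short sketch of both uses of /dev/null follows. The helper function emit and the scratch file log.txt are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
  echo "regular output"
  echo "error output" >&2
}

tmpdir=$(mktemp -d)

# Keep stdout in a file and throw stderr away.
emit >"$tmpdir/log.txt" 2>/dev/null
kept=$(cat "$tmpdir/log.txt")
echo "kept: $kept"

# Discard everything; only the exit status remains observable.
emit >/dev/null 2>&1
status=$?
echo "exit status: $status"
```

Discarding all output while keeping the exit status is a common way to test whether a command succeeds without cluttering the terminal.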
An error message is presented if the specified target is not writable (if the path points to a directory or a read-only file) and no modification to the target is made. However, an output redirect overwrites an existing writable target without any confirmation. Files are overwritten by output redirects unless the Bash option noclobber is enabled, which can be done for the current session with the command set -o noclobber or set -C:
$ set -o noclobber
$ cat /proc/cpu_info 2>/tmp/error.txt
-bash: /tmp/error.txt: cannot overwrite existing file
To unset the noclobber option for the current session, run set +o noclobber or set +C. To make the noclobber option persistent, it must be included in the user’s Bash profile or in the system-wide profile.
Even with the noclobber option enabled it is possible to append redirected data to existing content. This is accomplished with a redirection written with two greater than symbols >>:
$ cat /proc/cpu_info 2>>/tmp/error.txt
$ cat /tmp/error.txt
cat: /proc/cpu_info: No such file or directory
cat: /proc/cpu_info: No such file or directory
In the previous example, the new error message was appended to the existing one in file /tmp/error.txt. If the file does not exist yet, it will be created with the new data.
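The interaction between noclobber, overwriting and appending can be put together in one self-contained sketch. The file name notes.txt and the scratch directory are assumptions; the shell’s own error message is deliberately discarded so that only the outcome is recorded.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
target="$tmpdir/notes.txt"   # hypothetical file name

echo "first" >"$target"

set -o noclobber
# Overwriting an existing file is now refused. The braces let the
# shell's error message be discarded along with the command's stderr.
if { echo "second" >"$target"; } 2>/dev/null; then
  overwrite=allowed
else
  overwrite=refused
fi

# Appending is still permitted while noclobber is active.
echo "third" >>"$target"

set +o noclobber
content=$(cat "$target")
printf '%s\n' "overwrite: $overwrite" "$content"
```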
The data source of the standard input of a process can be reassigned as well. The less than symbol < is used to redirect the content of a file to the stdin of a process. In this case, data flows from right to left: the redirected descriptor, assumed to be 0 when omitted, is at the left of the less than symbol and the data source (a path to a file) must be at the right of the less than symbol. The command uniq, like most command line utilities for processing text, accepts data sent to stdin by default:
$ uniq -c </tmp/error.txt
      2 cat: /proc/cpu_info: No such file or directory
The -c option makes uniq display how many times a repeated line appears in the text. As the numeric value of the redirected file descriptor was omitted, the example command is equivalent to uniq -c 0</tmp/error.txt. Using a file descriptor other than 0 in an input redirect only makes sense in specific contexts, when a program expects data on file descriptors 3, 4, etc. Indeed, programs can use any integer above 2 as new file descriptors for data input/output. For example, the following C code reads data from file descriptor 3 and just replicates it to file descriptor 4:
Note: The program must handle such file descriptors correctly, otherwise it could attempt an invalid read or write operation and crash.
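#include <stdio.h>

int main(int argc, char **argv){
  FILE *fd_3, *fd_4;
  // Open file descriptor 3 for reading
  fd_3 = fdopen(3, "r");
  // Open file descriptor 4 for writing
  fd_4 = fdopen(4, "w");
  // fdopen may return NULL if a descriptor was not opened by the shell
  if ( fd_3 == NULL || fd_4 == NULL ){
    fprintf(stderr, "File descriptors 3 and 4 must be open\n");
    return 1;
  }
  // Read from file descriptor 3
  char buf[32];
  while ( fgets(buf, 32, fd_3) != NULL ){
    // Write to file descriptor 4
    fprintf(fd_4, "%s", buf);
  }
  // Close both file descriptors
  fclose(fd_3);
  fclose(fd_4);
  return 0;
}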
To test it, save the sample code as fd.c and compile it with gcc -o fd fd.c. This program needs file descriptors 3 and 4 to be available so it can read and write to them. As an example, the previously created file /tmp/error.txt can be used as the source for file descriptor 3 and the file descriptor 4 can be redirected to stdout:
$ ./fd 3</tmp/error.txt 4>&1
cat: /proc/cpu_info: No such file or directory
cat: /proc/cpu_info: No such file or directory
From a programmer’s perspective, using file descriptors avoids having to deal with option parsing and filesystem paths. The same file descriptor can even be used as input and output. In this case, the file descriptor is defined in the command line with both less than and greater than symbols, like in 3<>/tmp/error.txt.
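File descriptors above 2 can also be used from within a shell script, without a compiled program. The sketch below mirrors the idea of copying from descriptor 3 to descriptor 4 using the exec builtin; the file names input.txt and copy.txt and the sample data are assumptions made for the illustration.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' >"$tmpdir/input.txt"   # hypothetical sample data

# Open one file for reading on descriptor 3 and another for writing on
# descriptor 4, for the remainder of the script.
exec 3<"$tmpdir/input.txt" 4>"$tmpdir/copy.txt"

# Copy line by line from descriptor 3 to descriptor 4.
while IFS= read -r line <&3; do
  printf '%s\n' "$line" >&4
done

# Close both descriptors.
exec 3<&- 4>&-

copied=$(cat "$tmpdir/copy.txt")
printf '%s\n' "$copied"
```

Keeping the input on descriptor 3 leaves stdin free, which can be useful when the commands inside the loop also need to read from the keyboard.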
Here Document and Here String
Other ways to redirect input involve the Here document and Here string methods. The Here document redirect makes it possible to type multi-line text that will be used as the redirected content. Two less than symbols << indicate a Here document redirect:
$ wc -c <<EOF
> How many characters
> in this Here document?
> EOF
43
At the right of the two less than symbols << is the ending term EOF. The insertion mode will finish as soon as a line containing only the ending term is entered. Any other term can be used as the ending term, but the closing line must contain nothing else, not even leading or trailing blank characters. In the example above, the two lines of text were sent to the stdin of the wc -c command, which displays the character count. As with input redirects for files, the stdin (file descriptor 0) is assumed if the redirected file descriptor is omitted.
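Here documents are also convenient inside scripts, where the text is part of the script file instead of being typed interactively. In the sketch below, the ending term END_OF_LIST, the file sorted.txt and the sample words are arbitrary choices made for the illustration.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)

# END_OF_LIST is an arbitrary ending term chosen for this illustration.
# The three lines between the markers become the stdin of sort.
sort >"$tmpdir/sorted.txt" <<END_OF_LIST
pear
apple
fig
END_OF_LIST

first=$(head -n 1 "$tmpdir/sorted.txt")
echo "first item: $first"
```

An output redirect and a Here document can be combined on the same command line, since each one affects a different file descriptor.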
The Here string method is much like the Here document method, but for one line only:
$ wc -c <<<"How many characters in this Here string?"
41
In this example, the string at the right of the three less than signs is sent to the stdin of wc -c, which counts the number of characters. Strings containing spaces must be inside quotes, otherwise only the first word will be used as the Here string and the remaining ones will be passed as arguments to the command.
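A Here string is often used to feed the content of a variable to a command. The sketch below assumes a hypothetical variable named phrase; it also shows that Bash appends a newline to the Here string, which is why the counts reported by wc -c are one more than the visible length of the text.

```shell
#!/usr/bin/env bash
phrase="three little words"   # hypothetical sample text

# Quoted: the whole string becomes the standard input of wc.
words=$(wc -w <<<"$phrase")
echo "words: $words"

# Bash appends a newline to the Here string, so the three letters
# below are counted as four bytes.
bytes=$(wc -c <<<"abc")
echo "bytes: $bytes"
```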
Guided Exercises
- In addition to text files, the command cat can also work with binary data, like sending the contents of a block device to a file. Using redirection, how can cat send the contents of device /dev/sdc to the file sdc.img in the current directory?
- What’s the name of the standard channel redirected by the command date 1> now.txt?
- After trying to overwrite a file using redirection, a user gets an error informing that the option noclobber is enabled. How can the option noclobber be deactivated for the current session?
- What will be the result of command cat <<.>/dev/stdout?
Explorational Exercises
- The command cat /proc/cpu_info displays an error message because /proc/cpu_info is nonexistent. The command cat /proc/cpu_info 2>1 redirects the error message to where?
- Will it still be possible to discard content sent to /dev/null if the noclobber option is enabled for the current shell session?
- Without using echo, how could the contents of the variable $USER be redirected to the stdin of command sha1sum?
- The Linux kernel keeps symbolic links in /proc/PID/fd/ to every file opened by a process, where PID is the identification number of the corresponding process. How could the system administrator use that directory to verify the location of log files opened by nginx, supposing its PID is 1234?
- It’s possible to do arithmetic calculations using only shell builtin commands, but floating point calculations require specific programs, like bc (basic calculator). With bc it’s even possible to specify the number of decimal places, with parameter scale. However, bc accepts operations only through its standard input, usually entered in interactive mode. Using a Here string, how can the floating point operation scale=6; 1/3 be sent to the standard input of bc?
Summary
This lesson covers methods to run a program redirecting its standard communication channels. Linux processes use these standard channels as generic file descriptors to read and to write data, making it possible to arbitrarily change them to files or devices. The lesson goes through the following steps:
- What file descriptors are and the role they play in Linux.
- The standard communication channels of every process: stdin, stdout and stderr.
- How to correctly execute a command using data redirection, both for input and output.
- How to use Here Documents and Here Strings in input redirections.
The commands and procedures addressed were:
- Redirection operators: >, <, >>, <<, <<<.
- Commands cat, set, uniq and wc.
Answers to Guided Exercises
- In addition to text files, the command cat can also work with binary data, like sending the contents of a block device to a file. Using redirection, how can cat send the contents of device /dev/sdc to the file sdc.img in the current directory?

  $ cat /dev/sdc > sdc.img

- What’s the name of the standard channel redirected by the command date 1> now.txt?

  Standard output or stdout

- After trying to overwrite a file using redirection, a user gets an error informing that the option noclobber is enabled. How can the option noclobber be deactivated for the current session?

  set +C or set +o noclobber

- What will be the result of command cat <<.>/dev/stdout?

  Bash will enter Heredoc input mode, then exit when a period appears in a line by itself. The typed text will be redirected to stdout (printed on screen).
Answers to Explorational Exercises
- The command cat /proc/cpu_info displays an error message because /proc/cpu_info is nonexistent. The command cat /proc/cpu_info 2>1 redirects the error message to where?

  To a file named 1 in the current directory.

- Will it still be possible to discard content sent to /dev/null if the noclobber option is enabled for the current shell session?

  Yes. /dev/null is a special file not affected by noclobber.

- Without using echo, how can the contents of the variable $USER be redirected to the stdin of command sha1sum?

  $ sha1sum <<<$USER

- The Linux kernel keeps symbolic links in /proc/PID/fd/ to every file opened by a process, where PID is the identification number of the corresponding process. How could the system administrator use that directory to verify the location of log files opened by nginx, supposing its PID is 1234?

  By issuing the command ls -l /proc/1234/fd, which will display the targets of every symbolic link in the directory.

- It’s possible to do arithmetic calculations using only shell builtin commands, but floating point calculations require specific programs, like bc (basic calculator). With bc it’s even possible to specify the number of decimal places, with parameter scale. However, bc accepts operations only through its standard input, usually entered in interactive mode. Using a Here string, how could the floating point operation scale=6; 1/3 be sent to the standard input of bc?

  $ bc <<<"scale=6; 1/3"