103.4 Lesson 2
Certificate: | LPIC-1 (101) |
---|---|
Version: | 5.0 |
Topic: | 103 GNU and Unix Commands |
Objective: | 103.4 Use streams, pipes and redirects |
Lesson: | 2 of 2 |
Introduction
One aspect of the Unix philosophy states that each program should have a specific purpose and should not try to incorporate features outside its scope. Keeping things simple does not mean less elaborate results, however, because different programs can be chained together to produce a combined output. The vertical bar character |, also known as the pipe symbol, creates a pipeline that connects the output of one program directly to the input of another, whereas command substitution stores the output of a program in a variable or uses it directly as an argument to another command.
Pipes
Unlike redirects, with pipes data flows from left to right in the command line and the target is another process, not a filesystem path, file descriptor or Here document. The pipe character | tells the shell to start all distinct commands at the same time and to connect the output of the previous command to the input of the following command, left to right. For example, instead of using redirects, the content of the file /proc/cpuinfo sent to the standard output by cat can be piped to the stdin of wc with the following command:
$ cat /proc/cpuinfo | wc
    208    1184    6096
In the absence of a path to a file, wc counts the number of lines, words and bytes it receives on its stdin, as is the case in the example. Many pipes can be present in a compound command. In the following example, two pipes are used:
$ cat /proc/cpuinfo | grep 'model name' | uniq
model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
The content of the file /proc/cpuinfo produced by cat /proc/cpuinfo was piped to the command grep 'model name', which selects only the lines containing the term model name. The machine running the example has many CPUs, so there are repeated lines with model name. The last pipe connects grep 'model name' to uniq, which is responsible for skipping any line equal to the previous one.
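The behaviour of this pipeline can be tried without depending on the contents of /proc/cpuinfo. The following is a minimal sketch, assuming a Bash-compatible shell; the sample text and the variable names are made up for illustration. It also shows that uniq only drops a line when it equals the line immediately before it, which is why unsorted input is often passed through sort first.

```shell
# Hypothetical sample standing in for /proc/cpuinfo on a two-CPU machine.
sample='processor : 0
model name : Example CPU
processor : 1
model name : Example CPU'

# grep keeps only the matching lines; uniq collapses the adjacent repeats.
unique_models=$(printf '%s\n' "$sample" | grep 'model name' | uniq)
echo "$unique_models"

# uniq compares neighbouring lines only, so separated duplicates survive...
unsorted=$(printf '%s\n' b a b | uniq | wc -l)
# ...unless sort brings the equal lines together first.
sorted=$(printf '%s\n' b a b | sort | uniq | wc -l)
echo "lines without sort: $unsorted, lines with sort: $sorted"
```

In the first pipeline the two model name lines are adjacent once grep has filtered the input, so uniq alone is enough, as in the lesson's example.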
Pipes can be combined with redirects in the same command line. The previous example can be rewritten to a simpler form:
$ grep 'model name' </proc/cpuinfo | uniq
model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
The input redirect for grep is not strictly necessary, as grep accepts a file path as argument, but the example demonstrates how to build such combined commands.
Pipes and redirects are exclusive, that is, one source can be mapped to only one target. Yet, it is possible to redirect an output to a file and still see it on the screen with the program tee. To do so, the first program sends its output to the stdin of tee, and a file name is provided to the latter to store the data:
$ grep 'model name' </proc/cpuinfo | uniq | tee cpu_model.txt
model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
$ cat cpu_model.txt
model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
The output of the last program in the chain, generated by uniq, is displayed and stored in the file cpu_model.txt. To append data to the provided file instead of overwriting its content, the option -a must be given to tee.
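The difference between the default behaviour of tee and tee -a can be checked with a throwaway file. This is only a sketch: the temporary file created with mktemp and the variable names are assumptions made for the demonstration.

```shell
# Temporary file used only for this demonstration.
log=$(mktemp)

# tee copies its stdin to stdout and to the file, replacing the file's content.
printf 'first run\n' | tee "$log"

# With -a the new data is appended instead of overwriting the file.
printf 'second run\n' | tee -a "$log"

# The file now holds both lines.
log_content=$(cat "$log")
rm -f "$log"
```

Without -a in the second command, the file would contain only the line written by the last run.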
Only the standard output of a process is captured by a pipe. Let’s say you must follow a long compilation process on the screen and at the same time save both the standard output and the standard error to a file for later inspection. Assuming your current directory does not have a Makefile, the following command will output an error:
$ make | tee log.txt
make: *** No targets specified and no makefile found. Stop.
Although shown on the screen, the error message generated by make was not captured by tee and the file log.txt was created empty. A redirect needs to be done before a pipe can capture the stderr:
$ make 2>&1 | tee log.txt
make: *** No targets specified and no makefile found. Stop.
$ cat log.txt
make: *** No targets specified and no makefile found. Stop.
In this example the stderr of make was redirected to the stdout, so tee was able to capture it with a pipe, display it on the screen and save it in the file log.txt. In cases like this, it may be useful to save the error messages for later inspection.
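The same effect can be reproduced without make. The sketch below assumes a Bash-compatible shell and uses a hypothetical function named noisy that writes one line to each output stream; the file and variable names are likewise illustrative.

```shell
# Hypothetical helper that writes one line to stdout and one to stderr.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

log=$(mktemp)

# Only stdout travels through the pipe; the error line bypasses tee
# and still shows up on the terminal.
noisy | tee "$log" >/dev/null
stdout_only=$(cat "$log")

# After 2>&1, stderr follows stdout into the pipe and reaches tee as well.
noisy 2>&1 | tee "$log" >/dev/null
both=$(cat "$log")

rm -f "$log"
```

Here the stdout of tee is discarded only to keep the demonstration quiet; in everyday use it would be left attached to the terminal, as in the make example.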
Command Substitution
Another method to capture the output of a command is command substitution. By placing a command inside backquotes, Bash replaces it with its standard output. The following example shows how to use the stdout of a program as an argument to another program:
$ mkdir `date +%Y-%m-%d`
$ ls
2019-09-05
The output of the program date, the current date formatted as year-month-day, was used as an argument to create a directory with mkdir. An identical result is obtained by using $() instead of backquotes:
$ rmdir 2019-09-05
$ mkdir $(date +%Y-%m-%d)
$ ls
2019-09-05
The same method can be used to store the output of a command as a variable:
$ OS=`uname -o`
$ echo $OS
GNU/Linux
The command uname -o outputs the generic name of the current operating system, which was stored in the session variable OS. Assigning the output of a command to a variable is very useful in scripts, making it possible to store and evaluate the data in many distinct ways.
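As a short sketch of how such variables are typically used in a script, assuming a Bash-compatible shell: the variable names and the backup file name below are hypothetical. The example also shows that $() can be nested without extra escaping, which is more awkward with backquotes.

```shell
# Store the output of a command in a variable; trailing newlines are stripped.
today=$(date +%Y-%m-%d)

# Quote the expansion so the value is passed on as a single word.
backup_name="backup-${today}.tar"   # hypothetical file name, for illustration
echo "$backup_name"

# The inner substitution runs first and its result feeds the outer command.
parent_dir=$(basename "$(dirname /usr/share/icons)")
echo "$parent_dir"
```

Quoting the expansions matters whenever the captured output may contain spaces, because an unquoted value is split into several words by the shell.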
Depending on the output generated by the replaced command, the builtin command substitution may not be appropriate. A more sophisticated method to use the output of a program as the argument of another program employs an intermediary called xargs. The program xargs uses the contents it receives via stdin to run a given command with those contents as its arguments. The following example shows xargs running the program identify with arguments provided by the program find:
$ find /usr/share/icons -name 'debian*' | xargs identify -format "%f: %wx%h\n"
debian-swirl.svg: 48x48
debian-swirl.png: 22x22
debian-swirl.png: 32x32
debian-swirl.png: 256x256
debian-swirl.png: 48x48
debian-swirl.png: 16x16
debian-swirl.png: 24x24
debian-swirl.svg: 48x48
The program identify is part of ImageMagick, a set of command-line tools to inspect, convert and edit most image file types. In the example, xargs took all paths listed by find and put them as arguments to identify, which then shows the information for each file formatted as required by the option -format. The files found by find in the example are images containing the distribution logo in a Debian filesystem. Note that -format is a parameter to identify, not to xargs.
Option -n 1 requires xargs to run the given command with only one argument at a time. In the example’s case, instead of passing all paths found by find as a list of arguments to identify, using xargs -n 1 would execute command identify for each path separately. Using -n 2 would execute identify with two paths as arguments, -n 3 with three paths as arguments and so on. Similarly, when xargs processes multiline contents, as is the case with input provided by find, the option -L can be used to limit how many lines will be used as arguments per command execution.
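The grouping done by -n and -L is easy to observe with echo as the executed command. This is a minimal sketch with made-up input; the variable names are illustrative.

```shell
# Five whitespace-separated items on stdin, at most two per echo invocation.
by_two=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$by_two"

# With -L 1, each input line becomes the argument list of one invocation.
by_line=$(printf '%s\n' 'a b' 'c d e' | xargs -L 1 echo)
echo "$by_line"
```

Each line printed corresponds to one execution of echo, so the first command runs echo three times and the second runs it twice.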
Note
|
Using |
If the paths have space characters, it is important to run find with the option -print0. This option instructs find to use a null character between each entry so the list can be correctly parsed by xargs (the output was suppressed):
$ find . -name '*avi' -print0 -o -name '*mp4' -print0 -o -name '*mkv' -print0 | xargs -0 du | sort -n
The option -0 tells xargs the null character should be used as the separator. That way the file paths given by find are correctly parsed even if they have blanks or other special characters in them. The previous example shows how to use the command du to find out the disk usage of every file found and then sort the results by size. Note that the -print0 option must be placed after each search criterion given to find.
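The following sketch reproduces the idea in a scratch directory, assuming find and xargs implementations that support -print0 and -0, such as the GNU versions. The file names are hypothetical. As an alternative to repeating -print0, the search criteria are grouped with escaped parentheses so that a single -print0 applies to all of them.

```shell
# Scratch directory with hypothetical file names, one containing a space.
workdir=$(mktemp -d)
touch "$workdir/holiday video.avi" "$workdir/talk.mp4" "$workdir/notes.txt"

# \( ... \) groups the -name tests so that one -print0 covers both patterns.
found=$(find "$workdir" \( -name '*.avi' -o -name '*.mp4' \) -print0 |
    xargs -0 -n 1 basename | sort)
echo "$found"

rm -r "$workdir"
```

The file name containing a space arrives at basename as a single argument, which would not be the case with the default newline and blank separators.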
By default, xargs places the arguments of the executed command last. To change that behavior, the option -I should be used:
$ find . -mindepth 2 -name '*avi' -print0 -o -name '*mp4' -print0 -o -name '*mkv' -print0 | xargs -0 -I PATH mv PATH ./
In the last example, every file found by find is moved to the current directory. As the source path(s) must be given to mv before the target path, a substitution term is given to the option -I of xargs, which is then placed at the appropriate position in the mv command. By using the null character as separator, it is not necessary to enclose the substitution term in quotes.
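A self-contained variant of this technique is sketched below, assuming GNU-style find and xargs. The directory layout and file names are invented for the demonstration, and the conventional substitution term {} is used in place of PATH.

```shell
# Scratch layout: two files inside a nested directory whose name has a space.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/nested dir" "$workdir/dest"
touch "$workdir/src/nested dir/clip one.avi" "$workdir/src/nested dir/clip two.mkv"

# Each null-terminated path replaces {} and is placed before the target directory.
find "$workdir/src" -mindepth 2 -type f -print0 |
    xargs -0 -I {} mv {} "$workdir/dest/"

moved=$(ls "$workdir/dest" | sort)
echo "$moved"

rm -r "$workdir"
```

With -I, xargs runs the command once per input item, so mv is executed separately for each file found.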
Guided Exercises
- It’s convenient to save the execution date of actions performed by automated scripts. Command date +%Y-%m-%d shows the current date in year-month-day format. How can the output of such a command be stored in a shell variable called TODAY using command substitution?
- Using command echo, how can the contents of variable TODAY be sent to the standard input of command sed s/-/./g?
- How could the output of command date +%Y-%m-%d be used as a Here string to command sed s/-/./g?
- Command convert image.jpeg -resize 25% small/image.jpeg creates a smaller version of image.jpeg and places the resulting image in a likewise named file inside subdirectory small. Using xargs, how is it possible to perform the same command for every image listed in file filelist.txt?
Explorational Exercises
- A simple backup routine periodically creates an image of partition /dev/sda1 with dd < /dev/sda1 > sda1.img. To perform future data integrity checks, the routine also generates a SHA1 hash of the file with sha1sum < sda1.img > sda1.sha1. By adding pipes and command tee, how would these two commands be combined into one?
- Command tar is used to archive many files into a single file, preserving directory structure. Option -T specifies a file containing the paths to be archived. For example, find /etc -type f | tar -cJ -f /srv/backup/etc.tar.xz -T - creates a compressed tar file etc.tar.xz from the list provided by command find (option -T - indicates the standard input as the path list). In order to avoid possible parsing errors due to paths containing spaces, what command options should be present for find and tar?
- Instead of opening a new remote shell session, command ssh can just execute a command indicated as its argument: ssh user@storage "remote command". Given that ssh also allows the standard output of a local program to be redirected to the standard input of the remote program, how would the cat command pipe a local file named etc.tar.gz to /srv/backup/etc.tar.gz at user@storage through ssh?
Summary
This lesson covers traditional inter-process communication techniques employed by Linux. Command pipelining creates a one-way communication channel between two processes, and command substitution stores the output of a process in a shell variable. The lesson goes through the following steps:
- How pipes can be used to stream the output of a process to the input of another process.
- The purpose of commands tee and xargs.
- How to capture the output of a process with command substitution, storing it in a variable or using it directly as a parameter to another command.
The commands and procedures addressed were:
- Command pipelining with |.
- Command substitution with backticks and $().
- Commands tee, xargs and find.
Answers to Guided Exercises
- It’s convenient to save the execution date of actions performed by automated scripts. Command date +%Y-%m-%d shows the current date in year-month-day format. How can the output of such a command be stored in a shell variable called TODAY using command substitution?

  $ TODAY=`date +%Y-%m-%d`

  or

  $ TODAY=$(date +%Y-%m-%d)

- Using command echo, how can the contents of variable TODAY be sent to the standard input of command sed s/-/./g?

  $ echo $TODAY | sed s/-/./g

- How could the output of command date +%Y-%m-%d be used as a Here string to command sed s/-/./g?

  $ sed s/-/./g <<< `date +%Y-%m-%d`

  or

  $ sed s/-/./g <<< $(date +%Y-%m-%d)

- Command convert image.jpeg -resize 25% small/image.jpeg creates a smaller version of image.jpeg and places the resulting image in a likewise named file inside subdirectory small. Using xargs, how is it possible to perform the same command for every image listed in file filelist.txt?

  $ xargs -I IMG convert IMG -resize 25% small/IMG < filelist.txt

  or

  $ cat filelist.txt | xargs -I IMG convert IMG -resize 25% small/IMG
Answers to Explorational Exercises
- A simple backup routine periodically creates an image of partition /dev/sda1 with dd < /dev/sda1 > sda1.img. To perform future data integrity checks, the routine also generates a SHA1 hash of the file with sha1sum < sda1.img > sda1.sha1. By adding pipes and command tee, how would these two commands be combined into one?

  # dd < /dev/sda1 | tee sda1.img | sha1sum > sda1.sha1

- Command tar is used to archive many files into a single file, preserving directory structure. Option -T specifies a file containing the paths to be archived. For example, find /etc -type f | tar -cJ -f /srv/backup/etc.tar.xz -T - creates a compressed tar file etc.tar.xz from the list provided by command find (option -T - indicates the standard input as the path list). In order to avoid possible parsing errors due to paths containing spaces, what command options should be present for find and tar?

  Options -print0 and --null:

  $ find /etc -type f -print0 | tar -cJ -f /srv/backup/etc.tar.xz --null -T -

- Instead of opening a new remote shell session, command ssh can just execute a command indicated as its argument: ssh user@storage "remote command". Given that ssh also allows the standard output of a local program to be redirected to the standard input of the remote program, how would the cat command pipe a local file named etc.tar.gz to /srv/backup/etc.tar.gz at user@storage through ssh?

  $ cat etc.tar.gz | ssh user@storage "cat > /srv/backup/etc.tar.gz"

  or

  $ ssh user@storage "cat > /srv/backup/etc.tar.gz" < etc.tar.gz