103.2 Lesson 1
Certificate: LPIC-1
Version: 5.0
Topic: 103 GNU and Unix Commands
Objective: 103.2 Process text streams using filters
Lesson: 1 of 1
Introduction
Dealing with text is a major part of every systems administrator’s job. Doug McIlroy, a member of the original Unix development team, summarized the Unix philosophy and said (among other important things): “Write programs to handle text streams, because that is a universal interface.” Linux is inspired by the Unix operating system and firmly adopts its philosophy, so an administrator can expect to find many text manipulation tools in any Linux distribution.
A Quick Review on Redirections and Pipes
Also from the Unix philosophy:
- Write programs that do one thing and do it well.
- Write programs to work together.
One major way of making programs work together is through piping and redirection. Pretty much all of your text manipulation programs will get text from standard input (stdin), write it to standard output (stdout) and send any error messages to standard error (stderr). Unless you specify otherwise, standard input will be what you type on your keyboard (the program will read it after you press the Enter key). Similarly, standard output and standard error will be displayed on your terminal screen. Let us see how this works.
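These three streams can be redirected independently. Here is a minimal sketch (the file names out.txt and err.txt are arbitrary choices for this illustration): > redirects stdout, while 2> redirects stderr.

```shell
# Run a compound command that writes one line to stdout and one to stderr,
# sending each stream to its own file (> redirects stdout, 2> redirects stderr).
{ echo "a normal message"; echo "an error message" >&2; } > out.txt 2> err.txt

cat out.txt   # contains only the normal message
cat err.txt   # contains only the error message
```

Keeping the two streams apart like this is why well-behaved filters write their diagnostics to stderr: error messages stay visible even when stdout is piped elsewhere.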
In your terminal, type cat and then hit the Enter key. Then type some random text.
$ cat
This is a test
This is a test
Hey!
Hey!
It is repeating everything I type!
It is repeating everything I type!
(I will hit ctrl+c so I will stop this nonsense)
(I will hit ctrl+c so I will stop this nonsense)
^C
For more information about the cat command (the term comes from “concatenate”) please refer to the man pages.
Note: If you are working on a really plain installation of a Linux server, some commands may not be available.
As demonstrated above, if you do not specify where cat should read from, it will read from standard input (whatever you type) and write whatever it reads to your terminal window (its standard output).
Now try the following:
$ cat > mytextfile
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
^C
$ cat mytextfile
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
The > (greater than) symbol tells the shell to redirect the output of cat to the mytextfile file, not to the standard output. Now try this:
$ cat mytextfile > mynewtextfile
$ cat mynewtextfile
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
This has the effect of copying mytextfile to mynewtextfile. You can verify that these two files have the same content by performing a diff:
$ diff mynewtextfile mytextfile
As there is no output, the files are equal. Now try the append redirection operator (>>):
$ echo 'This is my new line' >> mynewtextfile
$ diff mynewtextfile mytextfile
4d3
< This is my new line
So far we have used redirections to create and manipulate files. We can also use pipes (represented by the symbol |) to redirect the output of one program to another program. Let us find the lines where the word “this” is found:
$ cat mytextfile | grep this
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
$ cat mytextfile | grep -i this
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
Now we have piped the output of cat to another command: grep. Notice that when we ignore case (using the -i option) we get an extra line of output.
Processing Text Streams
Reading a Compressed File
We will create a file called ftu.txt containing a list of the following commands:

bzcat
cat
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
xzcat
zcat
Now we will use the grep command to print all of the lines containing the string cat:
$ cat ftu.txt | grep cat
bzcat
cat
xzcat
zcat
Another way to get this information is to just use the grep command to filter the text directly, without the need to use another application to send the text stream to stdout:
$ grep cat ftu.txt
bzcat
cat
xzcat
zcat
Note: Remember there are many ways to perform the same task using Linux.
There are other commands that handle compressed files (bzcat for bzip2 compressed files, xzcat for xz compressed files and zcat for gzip compressed files), and each one is used to view the contents of a compressed file based on the compression algorithm used.
Verify that the newly created file ftu.txt is the only one in the directory, then create a gzip compressed version of the file:
$ ls ftu*
ftu.txt
$ gzip ftu.txt
$ ls ftu*
ftu.txt.gz
Next, use the zcat command to view the contents of the gzip compressed file:
$ zcat ftu.txt.gz
bzcat
cat
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
xzcat
zcat
Note that gzip will compress ftu.txt into ftu.txt.gz and it will remove the original file. By default, no output from the gzip command will be displayed. However, if you do want gzip to tell you what it is doing, use the -v option for the “verbose” output.
Viewing a File in a Pager
You know cat concatenates a file to the standard output (once a file is provided after the command). The file /var/log/syslog is where your Linux system stores everything important going on in your system. If you use the sudo command to elevate privileges so as to be able to read the /var/log/syslog file:
$ sudo cat /var/log/syslog
…you will see messages scrolling very fast within your terminal window. You can pipe the output to the program less so the results will be paginated. By using less you can use the arrow keys to navigate through the output and also use vi-like commands to navigate and search throughout the text.
However, rather than piping the cat command into a pagination program, it is more pragmatic to just use the pagination program directly:
$ sudo less /var/log/syslog
... (output omitted for clarity)
Getting a Portion of a Text File
If only the start or end of a file needs to be reviewed, there are other methods available. The command head is used to read the first ten lines of a file by default, and the command tail is used to read the last ten lines of a file by default. Now try:
$ sudo head /var/log/syslog
Nov 12 08:04:30 hypatia rsyslogd: [origin software="rsyslogd" swVersion="8.1910.0" x-pid="811" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Nov 12 08:04:30 hypatia systemd[1]: logrotate.service: Succeeded.
Nov 12 08:04:30 hypatia systemd[1]: Started Rotate log files.
Nov 12 08:04:30 hypatia vdr: [928] video directory scanner thread started (pid=882, tid=928, prio=low)
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'A - ATSC'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'C - DVB-C'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'S - DVB-S'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'T - DVB-T'
Nov 12 08:04:30 hypatia vdr[882]: vdr: no primary device found - using first device!
Nov 12 08:04:30 hypatia vdr: [929] epg data reader thread started (pid=882, tid=929, prio=high)
$ sudo tail /var/log/syslog
Nov 13 10:24:45 hypatia kernel: [ 8001.679238] mce: CPU7: Core temperature/speed normal
Nov 13 10:24:46 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Activating via systemd: service name='org.freedesktop.Tracker1.Miner.Extract' unit='tracker-extract.service' requested by ':1.73' (uid=1000 pid=2425 comm="/usr/lib/tracker/tracker-miner-fs ")
Nov 13 10:24:46 hypatia systemd[2004]: Starting Tracker metadata extractor...
Nov 13 10:24:47 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Successfully activated service 'org.freedesktop.Tracker1.Miner.Extract'
Nov 13 10:24:47 hypatia systemd[2004]: Started Tracker metadata extractor.
Nov 13 10:24:54 hypatia kernel: [ 8010.462227] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 502907)
Nov 13 10:24:54 hypatia kernel: [ 8010.462228] mce: CPU4: Core temperature above threshold, cpu clock throttled (total events = 502911)
Nov 13 10:24:54 hypatia kernel: [ 8010.469221] mce: CPU0: Core temperature/speed normal
Nov 13 10:24:54 hypatia kernel: [ 8010.469222] mce: CPU4: Core temperature/speed normal
Nov 13 10:25:03 hypatia systemd[2004]: tracker-extract.service: Succeeded.
To help illustrate the number of lines displayed, we can pipe the output of the head command to the nl command, which will number the lines of text streamed into it:
$ sudo head /var/log/syslog | nl
     1  Nov 12 08:04:30 hypatia rsyslogd: [origin software="rsyslogd" swVersion="8.1910.0" x-pid="811" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
     2  Nov 12 08:04:30 hypatia systemd[1]: logrotate.service: Succeeded.
     3  Nov 12 08:04:30 hypatia systemd[1]: Started Rotate log files.
     4  Nov 12 08:04:30 hypatia vdr: [928] video directory scanner thread started (pid=882, tid=928, prio=low)
     5  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'A - ATSC'
     6  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'C - DVB-C'
     7  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'S - DVB-S'
     8  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'T - DVB-T'
     9  Nov 12 08:04:30 hypatia vdr[882]: vdr: no primary device found - using first device!
    10  Nov 12 08:04:30 hypatia vdr: [929] epg data reader thread started (pid=882, tid=929, prio=high)
And we can do the same by piping the output of the tail command to the wc command, which by default will count the number of words within a document, and using the -l switch to print out the number of lines of text that the command has read:
$ sudo tail /var/log/syslog | wc -l
10
Should an administrator need to review more (or less) of the beginning or end of a file, the -n option can be used to limit the commands' output:
$ sudo tail -n 5 /var/log/syslog
Nov 13 10:37:24 hypatia systemd[2004]: tracker-extract.service: Succeeded.
Nov 13 10:37:42 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Activating via systemd: service name='org.freedesktop.Tracker1.Miner.Extract' unit='tracker-extract.service' requested by ':1.73' (uid=1000 pid=2425 comm="/usr/lib/tracker/tracker-miner-fs ")
Nov 13 10:37:42 hypatia systemd[2004]: Starting Tracker metadata extractor...
Nov 13 10:37:43 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Successfully activated service 'org.freedesktop.Tracker1.Miner.Extract'
Nov 13 10:37:43 hypatia systemd[2004]: Started Tracker metadata extractor.
$ sudo head -n 12 /var/log/syslog
Nov 12 08:04:30 hypatia rsyslogd: [origin software="rsyslogd" swVersion="8.1910.0" x-pid="811" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Nov 12 08:04:30 hypatia systemd[1]: logrotate.service: Succeeded.
Nov 12 08:04:30 hypatia systemd[1]: Started Rotate log files.
Nov 12 08:04:30 hypatia vdr: [928] video directory scanner thread started (pid=882, tid=928, prio=low)
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'A - ATSC'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'C - DVB-C'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'S - DVB-S'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'T - DVB-T'
Nov 12 08:04:30 hypatia vdr[882]: vdr: no primary device found - using first device!
Nov 12 08:04:30 hypatia vdr: [929] epg data reader thread started (pid=882, tid=929, prio=high)
Nov 12 08:04:30 hypatia vdr: [882] no DVB device found
Nov 12 08:04:30 hypatia vdr: [882] initializing plugin: vnsiserver (1.8.0): VDR-Network-Streaming-Interface (VNSI) Server
The Basics of sed, the Stream Editor
Let us take a look at the other files, terms and utilities that do not have cat in their names. We can do this by passing the -v option to grep, which instructs the command to output only the lines not containing cat:
$ zcat ftu.txt.gz | grep -v cat
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
Most of what we can do with grep we can also do with sed, the stream editor for filtering and transforming text (as stated in the sed manual page). First we will recover our ftu.txt file by decompressing our gzip archive of the file:
$ gunzip ftu.txt.gz
$ ls ftu*
ftu.txt
Now, we can use sed to list only the lines containing the string cat:
$ sed -n /cat/p < ftu.txt
bzcat
cat
xzcat
zcat
We have used the less-than sign (<) to direct the contents of the file ftu.txt into our sed command. The word enclosed between slashes (i.e. /cat/) is the term we are searching for. The -n option instructs sed to produce no output (other than what the p command later instructs it to print). Try running this same command without the -n option to see what happens. Then try this:
$ sed /cat/d < ftu.txt
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
If we do not use the -n option, sed will print everything from the file except for what the d command instructs sed to delete from its output.
A common use of sed is to find and replace text within a file. Suppose you want to change every occurrence of cat to dog. You can use sed to do this by supplying the s command to swap out each instance of the first term, cat, for the second term, dog:
$ sed s/cat/dog/ < ftu.txt
bzdog
dog
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
xzdog
zdog
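One detail worth noting: without a trailing g flag, the s command replaces only the first match on each line; that was enough above because no line of ftu.txt contains cat twice. A small sketch of the difference:

```shell
printf 'cat scat cat\n' | sed 's/cat/dog/'
# dog scat cat    (only the first occurrence on the line is replaced)
printf 'cat scat cat\n' | sed 's/cat/dog/g'
# dog sdog dog    (the g flag replaces every occurrence)
```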
Rather than using a redirection operator (<) to pass the ftu.txt file into our sed command, we can just have the sed command operate on the file directly. We will try that next, while simultaneously creating a backup of the original file:
$ sed -i.backup s/cat/dog/ ftu.txt
$ ls ftu*
ftu.txt  ftu.txt.backup
The -i option will perform an in-place sed operation on your original file. If you do not use the .backup suffix after the -i parameter, you would just have rewritten your original file. Whatever text you supply after the -i parameter becomes the suffix of the backup file to which the original contents are saved prior to the modifications you asked sed to perform.
Ensuring Data Integrity
We have demonstrated how easy it is to manipulate files in Linux. There are times where you may wish to distribute a file to someone else, and you want to be sure that the recipient ends up with a true copy of the original file. A very common use of this technique is practiced when Linux distribution servers host downloadable CD or DVD images of their software along with files that contain the calculated checksum values of those disc images. Here is an example listing from a Debian download mirror:
[PARENTDIR]  Parent Directory                              -
[SUM]   MD5SUMS                              2019-09-08 17:46  274
[CRT]   MD5SUMS.sign                         2019-09-08 17:52  833
[SUM]   SHA1SUMS                             2019-09-08 17:46  306
[CRT]   SHA1SUMS.sign                        2019-09-08 17:52  833
[SUM]   SHA256SUMS                           2019-09-08 17:46  402
[CRT]   SHA256SUMS.sign                      2019-09-08 17:52  833
[SUM]   SHA512SUMS                           2019-09-08 17:46  658
[CRT]   SHA512SUMS.sign                      2019-09-08 17:52  833
[ISO]   debian-10.1.0-amd64-netinst.iso      2019-09-08 04:37  335M
[ISO]   debian-10.1.0-amd64-xfce-CD-1.iso    2019-09-08 04:38  641M
[ISO]   debian-edu-10.1.0-amd64-netinst.iso  2019-09-08 04:38  405M
[ISO]   debian-mac-10.1.0-amd64-netinst.iso  2019-09-08 04:38  334M
In the listing above, the Debian installer image files are accompanied by text files that contain checksums of the files, produced by the various algorithms (MD5, SHA1, SHA256 and SHA512).
Note: A checksum is a value derived from a mathematical computation, based on a cryptographic hash function, against a file. There are different types of cryptographic hash functions that vary in strength. The exam will expect you to be familiar with using these checksum utilities.
Once you download a file (for example, the debian-10.1.0-amd64-netinst.iso image) you would then compare the checksum of the file that was downloaded against a checksum value that was provided for you.
Here is an example to illustrate the point. We will calculate the SHA256 value of the ftu.txt file using the sha256sum command:
$ sha256sum ftu.txt
345452304fc26999a715652543c352e5fc7ee0c1b9deac6f57542ec91daf261c  ftu.txt
The long string of characters preceding the file name is the SHA256 checksum value of this text file. Let us create a file that contains that value, so that we can use it to verify the integrity of our original text file. We can do this with the same sha256sum command and redirect the output to a file:
$ sha256sum ftu.txt > sha256.txt
Now, to verify the ftu.txt file, we just use the same command and supply the filename that contains our checksum value along with the -c switch:
$ sha256sum -c sha256.txt
ftu.txt: OK
The value contained within the file matches the calculated SHA256 checksum for our ftu.txt file, just as we would expect. However, if the original file were modified (such as a few bytes lost during a file download, or someone deliberately tampering with it), the value check would fail. In such cases we know that our file is bad or corrupted, and we cannot trust the integrity of its contents. To prove the point, we will add some text at the end of the file:
$ echo "new entry" >> ftu.txt
Now we will make an attempt to verify the file’s integrity:
$ sha256sum -c sha256.txt
ftu.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
And we see that the checksum does not match what was expected for the file. Therefore, we can no longer trust the integrity of this file. We could attempt to download a new copy of the file, report the checksum failure to the sender of the file, or report it to a data center security team, depending on the importance of the file.
Looking Deeper into Files
The octal dump (od) command is often used for debugging applications and various files. By itself, the od command will just list out a file’s contents in octal format. We can use our ftu.txt file from earlier to practice with this command:
$ od ftu.txt
0000000 075142 060543 005164 060543 005164 072543 005164 062550
0000020 062141 066012 071545 005163 062155 071465 066565 067012
0000040 005154 062157 070012 071541 062564 071412 062145 071412
0000060 060550 032462 071466 066565 071412 060550 030465 071462
0000100 066565 071412 071157 005164 070163 064554 005164 060564
0000120 066151 072012 005162 067165 070551 073412 005143 075170
0000140 060543 005164 061572 072141 000012
0000151
The first column of output is the byte offset for each line of output. Since od prints out information in octal format by default, each line begins with a seven-digit octal byte offset, followed by eight columns, each containing the octal value of two bytes of the file's data.
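Both the offset base and the grouping of the data are adjustable: the -A option selects the offset base and -t the output format. A small sketch using a short inline string:

```shell
# Decimal byte offsets (-A d) and one byte per column in hexadecimal (-t x1):
printf 'bzcat\n' | od -A d -t x1
# 0000000 62 7a 63 61 74 0a
# 0000006
```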
Tip: Recall that a byte is 8 bits in length.
Should you need to view a file’s contents in hexadecimal format, use the -x option:
$ od -x ftu.txt
0000000 7a62 6163 0a74 6163 0a74 7563 0a74 6568
0000020 6461 6c0a 7365 0a73 646d 7335 6d75 6e0a
0000040 0a6c 646f 700a 7361 6574 730a 6465 730a
0000060 6168 3532 7336 6d75 730a 6168 3135 7332
0000100 6d75 730a 726f 0a74 7073 696c 0a74 6174
0000120 6c69 740a 0a72 6e75 7169 770a 0a63 7a78
0000140 6163 0a74 637a 7461 000a
0000151
Now each of the eight columns after the byte offset are represented by their hexadecimal equivalents.
One handy use of the od command is for debugging scripts. For example, the od command can show us characters that are not normally seen within a file, such as newline entries. We can do this with the -c option, so that instead of displaying the numerical notation for each byte, these column entries will instead be shown as their character equivalents:
$ od -c ftu.txt
0000000   b   z   c   a   t  \n   c   a   t  \n   c   u   t  \n   h   e
0000020   a   d  \n   l   e   s   s  \n   m   d   5   s   u   m  \n   n
0000040   l  \n   o   d  \n   p   a   s   t   e  \n   s   e   d  \n   s
0000060   h   a   2   5   6   s   u   m  \n   s   h   a   5   1   2   s
0000100   u   m  \n   s   o   r   t  \n   s   p   l   i   t  \n   t   a
0000120   i   l  \n   t   r  \n   u   n   i   q  \n   w   c  \n   x   z
0000140   c   a   t  \n   z   c   a   t  \n
0000151
All of the newline entries within the file are represented by the (otherwise hidden) \n characters. If you just want to view all of the characters within a file, and do not need to see the byte offset information, the byte offset column can be removed from the output like so:
$ od -An -c ftu.txt
   b   z   c   a   t  \n   c   a   t  \n   c   u   t  \n   h   e
   a   d  \n   l   e   s   s  \n   m   d   5   s   u   m  \n   n
   l  \n   o   d  \n   p   a   s   t   e  \n   s   e   d  \n   s
   h   a   2   5   6   s   u   m  \n   s   h   a   5   1   2   s
   u   m  \n   s   o   r   t  \n   s   p   l   i   t  \n   t   a
   i   l  \n   t   r  \n   u   n   i   q  \n   w   c  \n   x   z
   c   a   t  \n   z   c   a   t  \n
Guided Exercises
- Someone just donated a laptop to your school and now you wish to install Linux on it. There is no manual and you were forced to boot it from a USB thumb drive with no graphics whatsoever. You do get a shell terminal and you know that, for every processor you have, there will be a line for it in the /proc/cpuinfo file:

  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (lines skipped)
  processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (more lines skipped)

  - Using the commands grep and wc, display how many processors you have.
  - Do the same thing with sed instead of grep.
- Explore your local /etc/passwd file with the grep, sed, head and tail commands per the tasks below:

  - Which users have access to a Bash shell?
  - Your system has various users that exist to handle specific programs or for administrative purposes. They do not have access to a shell. How many of those exist in your system?
  - How many users and groups exist in your system (remember: use only the /etc/passwd file)?
  - List only the first line, the last line and the tenth line of your /etc/passwd file.
- Consider this /etc/passwd file example. Copy the lines below to a local file named mypasswd for this exercise.

  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
  bin:x:2:2:bin:/bin:/usr/sbin/nologin
  sys:x:3:3:sys:/dev:/usr/sbin/nologin
  sync:x:4:65534:sync:/bin:/bin/sync
  nvidia-persistenced:x:121:128:NVIDIA Persistence Daemon,,,:/nonexistent:/sbin/nologin
  libvirt-qemu:x:64055:130:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
  libvirt-dnsmasq:x:122:133:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/usr/sbin/nologin
  carol:x:1000:2000:Carol Smith,Finance,,,Main Office:/home/carol:/bin/bash
  dave:x:1001:1000:Dave Edwards,Finance,,,Main Office:/home/dave:/bin/ksh
  emma:x:1002:1000:Emma Jones,Finance,,,Main Office:/home/emma:/bin/bash
  frank:x:1003:1000:Frank Cassidy,Finance,,,Main Office:/home/frank:/bin/bash
  grace:x:1004:1000:Grace Kearns,Engineering,,,Main Office:/home/grace:/bin/ksh
  henry:x:1005:1000:Henry Adams,Sales,,,Main Office:/home/henry:/bin/bash
  john:x:1006:1000:John Chapel,Sales,,,Main Office:/home/john:/bin/bash

  - List all users in group 1000 (use sed to select only the appropriate field) from your mypasswd file.
  - List only the full names of all the users for this group (use sed and cut).
Explorational Exercises
- Once more using the mypasswd file from the previous exercises, devise a Bash command that will select one individual from the Main Office to win a raffle contest. Use the sed command to only print out the lines for the Main Office, and then a cut command sequence to retrieve the first name of each user from these lines. Next you will want to randomly sort these names and only print out the top name from the list.
- How many people work in Finance, Engineering and Sales? (Consider exploring the uniq command.)
- Now you want to prepare a CSV (comma separated values) file named names.csv, built from the mypasswd file in the previous example, so you can easily import it into LibreOffice. The file contents will have the following format:

  First Name,Last Name,Position
  Carol,Smith,Finance
  ...
  John,Chapel,Sales

  Tip: Use the sed, cut, and paste commands to achieve the desired results. Note that the comma (,) will be the delimiter for this file.
- Suppose that the names.csv spreadsheet created in the previous exercise is an important file and we want to make sure nobody will tamper with it from the moment we send it to someone and the moment our recipient receives it. How can we ensure the integrity of this file using md5sum?
- You promised yourself that you would read a classic book 100 lines per day and you decided to start with Mariner and Mystic by Herman Melville. Devise a command using split that will separate this book into sections of 100 lines each. In order to get the book in plain text format, search for it at https://www.gutenberg.org.
- Using ls -l on the /etc directory, what kind of listing do you get? Using the cut command on the output of the given ls command, how would you display only the file names? What about the file names and the owners of the files? Along with the ls -l and cut commands, utilize the tr command to squeeze multiple occurrences of a space into a single space to aid in formatting the output with a cut command.
- This exercise assumes you are on a real machine (not a virtual machine). You must also have a USB stick with you. Review the manual pages for the tail command and find out how to follow a file as text is appended to it. While monitoring the output of a tail command on the /var/log/syslog file, insert a USB stick. Write out the full command that you would use to get the Product, Manufacturer and the total amount of memory of your USB stick.
Summary
Dealing with text streams is of great importance when administering any Linux system. Text streams can be processed using scripts to automate daily tasks or to find relevant debugging information in log files. Here is a short summary of the commands covered in this lesson:
cat
- Used to combine or read plain text files.

bzcat
- Allows for the processing or reading of files compressed using the bzip2 method.

xzcat
- Allows for the processing or reading of files compressed using the xz method.

zcat
- Allows for the processing or reading of files compressed using the gzip method.

less
- This command paginates the contents of a file, and allows for navigation and search functionality.

head
- This command will display the first 10 lines of a file by default. With the use of the -n switch fewer or more lines can be displayed.

tail
- This command will display the last 10 lines of a file by default. With the use of the -n switch fewer or more lines can be displayed. The -f option is used to follow the output of a text file as new data is being written to it.

wc
- Short for “word count” but, depending on the parameters you use, it will count characters, words and lines.

sort
- Used for organizing the output of a listing alphabetically, reverse alphabetically, or in a random order.

uniq
- Used to list (and count) matching strings.

od
- The “octal dump” command is used to display a binary file in either octal, decimal, or hexadecimal notation.

nl
- The “number line” command outputs a file's contents with each line prepended by its line number.

sed
- The stream editor can be used to find matching occurrences of strings using regular expressions, as well as to edit files using pre-defined patterns.

tr
- The translate command can replace characters, and can also delete characters or squeeze repeated characters.

cut
- This command can print columns of text files as fields based on a file's character delimiter.

paste
- Joins files in columns based on the usage of field separators.

split
- This command can split larger files into smaller ones depending on the criteria set by the command's options.

md5sum
- Used for calculating the MD5 hash value of a file. Also used to verify a file against an existing hash value to ensure a file's integrity.

sha256sum
- Used for calculating the SHA256 hash value of a file. Also used to verify a file against an existing hash value to ensure a file's integrity.

sha512sum
- Used for calculating the SHA512 hash value of a file. Also used to verify a file against an existing hash value to ensure a file's integrity.
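Several of the commands summarized above (sort, uniq, tr, cut and paste) only appear in the exercises of this lesson; here is a minimal sketch of how they combine (the staff.txt sample data is invented for illustration):

```shell
# Sample data: one "name:department" record per line.
printf 'carol:Finance\ndave:Finance\ngrace:Engineering\nhenry:Sales\n' > staff.txt

# cut extracts the second :-separated field, sort orders the values,
# and uniq -c counts how many times each one appears.
cut -d: -f2 staff.txt | sort | uniq -c

# tr translates characters: here it upper-cases the department names.
cut -d: -f2 staff.txt | tr 'a-z' 'A-Z'

# paste joins files side by side; -d, sets a comma as the delimiter.
cut -d: -f1 staff.txt > names.txt
cut -d: -f2 staff.txt > depts.txt
paste -d, names.txt depts.txt
```

Remember that uniq only collapses adjacent duplicates, which is why its input is sorted first.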
Answers to Guided Exercises
- Someone just donated a laptop to your school and now you wish to install Linux on it. There is no manual and you were forced to boot it from a USB thumb drive with no graphics whatsoever. You do get a shell terminal and you know that, for every processor you have, there will be a line for it in the /proc/cpuinfo file:

  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (lines skipped)
  processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (more lines skipped)

  - Using the commands grep and wc, display how many processors you have.

    Here are two options:

    $ cat /proc/cpuinfo | grep processor | wc -l
    $ grep processor /proc/cpuinfo | wc -l

    Now that you know there are several ways you can do the same thing, when should you be using one or the other? It really depends on several factors, the two most important ones being performance and readability. Most of the time you will use shell commands inside shell scripts to automate your tasks, and the larger and more complex your scripts become, the more you need to worry about keeping them fast.
  - Do the same thing with sed instead of grep.

    Now, instead of grep, we will try this with sed:

    $ sed -n /processor/p /proc/cpuinfo | wc -l

    Here we used sed with the -n parameter so sed will not print anything except for what matches the expression processor, as instructed by the p command. As we did in the grep solutions, wc -l will count the number of lines, thus the number of processors we have.

    Study this next example:

    $ sed -n /processor/p /proc/cpuinfo | sed -n '$='

    This command sequence provides identical results to the previous example, where the output of sed was piped into a wc command. The difference here is that instead of using wc -l to count the number of lines, sed is again invoked to provide equivalent functionality. Once more, we are suppressing the output of sed with the -n option, except for the expression that we are explicitly calling, which is '$='. This expression tells sed to match the last line ($) and then to print that line number (=).
- Explore your local /etc/passwd file with the grep, sed, head and tail commands per the tasks below:

  - Which users have access to a Bash shell?

    $ grep ":/bin/bash$" /etc/passwd

    We will improve this answer by only displaying the name of the user that utilizes the Bash shell.

    $ grep ":/bin/bash$" /etc/passwd | cut -d: -f1

    Since the user name is the first field (the -f1 parameter of the cut command) and the /etc/passwd file uses : as its separator (the -d: parameter of the cut command), we just pipe the output of the grep command to the appropriate cut command.

  - Your system has various users that exist to handle specific programs or for administrative purposes. They do not have access to a shell. How many of those exist in your system?

    The easiest way to find this is by printing out the lines for accounts that do not use the Bash shell:

    $ grep -v ":/bin/bash$" /etc/passwd | wc -l

  - How many users and groups exist in your system (remember: use only the /etc/passwd file)?

    The first field of any given line in your /etc/passwd file is the user name, the second is typically an x indicating the user password is not stored here (it is encrypted in the /etc/shadow file). The third is the user id (UID) and the fourth is the group id (GID). So this should give us the number of users:

    $ cut -d: -f3 /etc/passwd | wc -l

    Well, most of the time it will. However, there are situations where you will set different super users or other special kinds of users sharing the same UID (user id). So, to be on the safe side, we will pipe the result of our cut command to the sort command and then count the number of lines.

    $ cut -d: -f3 /etc/passwd | sort -u | wc -l

    Now, for the number of groups:

    $ cut -d: -f4 /etc/passwd | sort -u | wc -l

  - List only the first line, the last line and the tenth line of your /etc/passwd file.

    This will do:

    $ sed -n -e '1'p -e '10'p -e '$'p /etc/passwd

    Remember that the parameter -n tells sed not to print anything other than what is specified by the p command. The dollar sign ($) used here is a regular expression meaning the last line of the file.
- Consider this `/etc/passwd` file example. Copy the lines below to a local file named `mypasswd` for this exercise.

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
nvidia-persistenced:x:121:128:NVIDIA Persistence Daemon,,,:/nonexistent:/sbin/nologin
libvirt-qemu:x:64055:130:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
libvirt-dnsmasq:x:122:133:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/usr/sbin/nologin
carol:x:1000:2000:Carol Smith,Finance,,,Main Office:/home/carol:/bin/bash
dave:x:1001:1000:Dave Edwards,Finance,,,Main Office:/home/dave:/bin/ksh
emma:x:1002:1000:Emma Jones,Finance,,,Main Office:/home/emma:/bin/bash
frank:x:1003:1000:Frank Cassidy,Finance,,,Main Office:/home/frank:/bin/bash
grace:x:1004:1000:Grace Kearns,Engineering,,,Main Office:/home/grace:/bin/ksh
henry:x:1005:1000:Henry Adams,Sales,,,Main Office:/home/henry:/bin/bash
john:x:1006:1000:John Chapel,Sales,,,Main Office:/home/john:/bin/bash
List all users in group 1000 (use `sed` to select only the appropriate field) from your `mypasswd` file:

The GID is the fourth field in the `/etc/passwd` file. You might be tempted to try this:

$ sed -n /1000/p mypasswd
In this case you will also get this line:
carol:x:1000:2000:Carol Smith,Finance,,,Main Office:/home/carol:/bin/bash
You know this is not correct, since Carol Smith is a member of GID 2000 and the match occurred because of the UID. However, you may have noticed that after the GID the next field starts with an upper case character. We can use a regular expression to solve this problem.
$ sed -n /:1000:[A-Z]/p mypasswd
The expression `[A-Z]` will match any single upper case character. You will learn more about this in the respective lesson.
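If you prefer not to rely on every full name starting with an upper case letter, here is an alternative sketch using only `cut` and `grep` (both covered in this lesson): isolate the user name and GID fields first, then anchor the match on the whole fourth field.

```shell
# Keep only the user name (field 1) and GID (field 4), then match lines
# whose GID field is exactly 1000 (anchored between ':' and end of line)
cut -d: -f1,4 mypasswd | grep ':1000$' | cut -d: -f1
```

Because the GID is now the last field of each intermediate line, `:1000$` cannot accidentally match a UID or any other field.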
List only the full names of all the users for this group (use `sed` and `cut`):

Use the same technique you used to solve the first part of this exercise and pipe it to a `cut` command.

$ sed -n /:1000:[A-Z]/p mypasswd | cut -d: -f5
Dave Edwards,Finance,,,Main Office
Emma Jones,Finance,,,Main Office
Frank Cassidy,Finance,,,Main Office
Grace Kearns,Engineering,,,Main Office
Henry Adams,Sales,,,Main Office
John Chapel,Sales,,,Main Office
Not quite there! Note how the fields inside your results are separated by `,`. So we will pipe the output to another `cut` command, using the `,` as a delimiter.

$ sed -n /:1000:[A-Z]/p mypasswd | cut -d: -f5 | cut -d, -f1
Dave Edwards
Emma Jones
Frank Cassidy
Grace Kearns
Henry Adams
John Chapel
Answers to Explorational Exercises
- Once more using the `mypasswd` file from the previous exercises, devise a Bash command that will select one individual from the Main Office to win a raffle contest. Use the `sed` command to print out only the lines for the Main Office, and then a `cut` command sequence to retrieve the first name of each user from these lines. Next you will want to randomly sort these names and only print out the top name from the list.

First explore how the parameter `-R` manipulates the output of the `sort` command. Repeat this command a couple of times on your machine (note you will need to enclose 'Main Office' within single quotes, so `sed` will handle it as a single string):

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f1 | sort -R
Here is a solution to the problem:
$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f1 | sort -R | head -1
- How many people work in Finance, Engineering and Sales? (Consider exploring the `uniq` command.)

Keep building on top of what you learned from the previous exercises. Try the following:

$ sed -n /'Main Office'/p mypasswd
$ sed -n /'Main Office'/p mypasswd | cut -d, -f2
Notice that now we do not care about the `:` as a delimiter. We just want the second field when we split the lines by the `,` characters.

$ sed -n /'Main Office'/p mypasswd | cut -d, -f2 | uniq -c
      4 Finance
      1 Engineering
      2 Sales
The `uniq` command will only output the unique lines (collapsing the repeating ones) and the parameter `-c` tells `uniq` to count the occurrences of equal lines. There is a caveat here: `uniq` will only consider adjacent lines. When this is not the case you will have to use the `sort` command first.
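A minimal sketch of that caveat, using throw-away input:

```shell
# With non-adjacent duplicates, uniq -c reports 'Sales' twice...
printf 'Sales\nFinance\nSales\n' | uniq -c

# ...but sorting first makes the duplicates adjacent, so the counts are right
printf 'Sales\nFinance\nSales\n' | sort | uniq -c
```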
- Now you want to prepare a CSV (comma separated values) file named `names.csv`, built from the `mypasswd` file in the previous example, so you can easily import it into LibreOffice. The file contents will have the following format:

First Name,Last Name,Position
Carol,Smith,Finance
...
John,Chapel,Sales
Tip: Use the `sed`, `cut`, and `paste` commands to achieve the desired results. Note that the comma (`,`) will be the delimiter for this file.

Start with the `sed` and `cut` commands, building on top of what we learned from the previous exercises:

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d" " -f1 > firstname
Now we have the file `firstname` with the first names of our employees.

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d" " -f2 | cut -d, -f1 > lastname

Now we have the file `lastname` containing the surnames of each employee.

Next we determine which department each employee works in:
$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f2 > department
Before we work on the final solution, try the following commands to see what type of output they generate:
$ cat firstname lastname department
$ paste firstname lastname department
And now for the final solution:
$ paste firstname lastname department | tr '\t' ,
$ paste firstname lastname department | tr '\t' , > names.csv
Here we use the command `tr` to translate `\t`, the tab separator, into a `,`. `tr` is quite useful when we need to exchange one character for another. Be sure to review the man pages for both `tr` and `paste`. For example, we can use the `-d` option of `paste` to specify the delimiter, making the previous command less complex:

$ paste -d, firstname lastname department
We used the `paste` command here because we wanted to get you familiar with it. However, we could have easily performed all of the tasks in a single command chain:

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f1,2 | tr ' ' , > names.csv
- Suppose that the `names.csv` spreadsheet created in the previous exercise is an important file and we want to make sure nobody tampers with it between the moment we send it to someone and the moment our recipient receives it. How can we ensure the integrity of this file using `md5sum`?

If you look into the man pages for `md5sum`, `sha256sum` and `sha512sum` you will see they all start with the following text:

“compute and check XXX message digest”
Where “XXX” is the algorithm that will be used to create this message digest.
We will use `md5sum` as an example and later you can try with the other commands.

$ md5sum names.csv
61f0251fcab61d9575b1d0cbf0195e25  names.csv
Now, for instance, you can make the file available through a secure FTP service and send the generated message digest using another secure means of communication. If the file has been even slightly modified, the message digest will be completely different. Just to prove it, edit `names.csv` and change Jones to James as demonstrated here:

$ sed -i.backup s/Jones/James/ names.csv
$ md5sum names.csv
f44a0d68cb480466099021bf6d6d2e65  names.csv
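The “check” half of “compute and check” from the man page description can automate this comparison. A sketch, assuming you store the digest in a companion file before sending:

```shell
# Store the digest in a companion file...
md5sum names.csv > names.csv.md5

# ...and later verify the file against it; md5sum prints "names.csv: OK"
# and exits with status 0 as long as the file is untouched
md5sum -c names.csv.md5
```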
Whenever you make files available for download, it is always a good practice to also distribute the corresponding message digest, so people who download your file can produce a new message digest and check it against the original. If you browse through https://kernel.org you will find the page https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/sha256sums.asc where you can obtain the sha256sum for all files available for download.
- You promised yourself that you would read a classic book 100 lines per day and you decided to start with Mariner and Mystic by Herman Melville. Devise a command using `split` that will separate this book into sections of 100 lines each. In order to get the book in plain text format, search for it at https://www.gutenberg.org.

First we will get the whole book from the Project Gutenberg site, where you can get this and other books that are available in the public domain.
$ wget https://www.gutenberg.org/files/50461/50461-0.txt
You might need to install `wget` if it is not already installed on your system. Alternatively, you can also use `curl`. Use `less` to verify the book:

$ less 50461-0.txt
Now we will split the book into chunks of 100 lines each:
$ split -l 100 -d 50461-0.txt melville
`50461-0.txt` is the file we will be splitting. `melville` will be the prefix for the split files. The `-l 100` specifies the number of lines and the `-d` option tells `split` to use numeric suffixes for the resulting files. You can use `nl` on any of the split files (probably not on the last one) and confirm that each one of them has 100 lines.
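To see how the chunk sizes work out without downloading the book, here is a sketch on a throw-away file of 250 numbered lines (the file names are just examples):

```shell
# 250 lines split at 100 lines apiece yields chunks chunk00, chunk01, chunk02
seq 250 > /tmp/book.txt
split -l 100 -d /tmp/book.txt /tmp/chunk

wc -l /tmp/chunk00   # a full chunk: 100 lines
wc -l /tmp/chunk02   # the last chunk holds the remaining 50 lines
```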
- Using `ls -l` on the `/etc` directory, what kind of listing do you get? Using the `cut` command on the output of the given `ls` command, how would you display only the file names? What about the filename and the owner of the files? Along with the `ls -l` and `cut` commands, utilize the `tr` command to squeeze multiple occurrences of a space into a single space to aid in formatting the output with a `cut` command.

The `ls` command by itself will give you just the names of the files. We can, however, prepare the output of `ls -l` (the long listing) to extract more specific information.

$ ls -l /etc | tr -s ' ' ,
drwxr-xr-x,3,root,root,4096,out,24,16:58,acpi
-rw-r--r--,1,root,root,3028,dez,17,2018,adduser.conf
-rw-r--r--,1,root,root,10,out,2,17:38,adjtime
drwxr-xr-x,2,root,root,12288,out,31,09:40,alternatives
-rw-r--r--,1,root,root,401,mai,29,2017,anacrontab
-rw-r--r--,1,root,root,433,out,1,2017,apg.conf
drwxr-xr-x,6,root,root,4096,dez,17,2018,apm
drwxr-xr-x,3,root,root,4096,out,24,16:58,apparmor
drwxr-xr-x,9,root,root,4096,nov,6,20:20,apparmor.d
The `-s` parameter instructs `tr` to squeeze the repeated spaces into a single instance of a space. The `tr` command works for any kind of repeating character you specify. Then we replace the spaces with a comma `,`. We actually do not need to replace the spaces in our example, so we will just omit the `,`.

$ ls -l /etc | tr -s ' '
drwxr-xr-x 3 root root 4096 out 24 16:58 acpi
-rw-r--r-- 1 root root 3028 dez 17 2018 adduser.conf
-rw-r--r-- 1 root root 10 out 2 17:38 adjtime
drwxr-xr-x 2 root root 12288 out 31 09:40 alternatives
-rw-r--r-- 1 root root 401 mai 29 2017 anacrontab
-rw-r--r-- 1 root root 433 out 1 2017 apg.conf
drwxr-xr-x 6 root root 4096 dez 17 2018 apm
drwxr-xr-x 3 root root 4096 out 24 16:58 apparmor
If we want just the filenames, then all we need displayed is the ninth field:
$ ls -l /etc | tr -s ' ' | cut -d" " -f9
For the filename and the owner of a file we will need the ninth and the third fields:
$ ls -l /etc | tr -s ' ' | cut -d" " -f9,3
What if we just need the folder names and their owners?
$ ls -l /etc | grep ^d | tr -s ' ' | cut -d" " -f9,3
- This exercise assumes you are on a real machine (not a virtual machine). You must also have a USB stick with you. Review the manual pages for the `tail` command and find out how to follow a file as text is appended to it. While monitoring the output of a `tail` command on the `/var/log/syslog` file, insert a USB stick. Write out the full command that you would use to get the Product, Manufacturer and the total amount of memory of your USB stick.

$ tail -f /var/log/syslog | grep -i 'product\:\|blocks\|manufacturer'
Nov  8 06:01:35 brod-avell kernel: [124954.369361] usb 1-4.3: Product: Cruzer Blade
Nov  8 06:01:35 brod-avell kernel: [124954.369364] usb 1-4.3: Manufacturer: SanDisk
Nov  8 06:01:37 brod-avell kernel: [124955.419267] sd 2:0:0:0: [sdc] 61056064 512-byte logical blocks: (31.3 GB/29.1 GiB)
Of course this is an example and the results may vary depending on your USB memory stick manufacturer. Notice that now we use the `-i` parameter with the `grep` command, as we are not sure whether the strings we are searching for are in upper or lower case. We also used `\|` as a logical OR, so we search for lines containing `product` OR `blocks` OR `manufacturer`.