103.2 Lesson 1
Certificate: LPIC-1
Version: 5.0
Topic: 103 GNU and Unix Commands
Objective: 103.2 Process text streams using filters
Lesson: 1 of 1
Introduction
Dealing with text is a major part of every systems administrator’s job. Doug McIlroy, a member of the original Unix development team, summarized the Unix philosophy and said (among other important things): “Write programs to handle text streams, because that is a universal interface.” Linux is inspired by the Unix operating system and firmly adopts its philosophy, so an administrator can expect to find many text manipulation tools in any Linux distribution.
A Quick Review on Redirections and Pipes
Also from the Unix philosophy:
- Write programs that do one thing and do it well.
- Write programs to work together.
One major way of making programs work together is through piping and redirection. Pretty much all of your text manipulation programs will get text from standard input (stdin), write it to standard output (stdout) and send any error messages to standard error (stderr). Unless you specify otherwise, standard input will be what you type on your keyboard (the program will read it after you press the Enter key). Similarly, standard output and standard error will be displayed on your terminal screen. Let us see how this works.
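These three streams can be redirected independently. Here is a minimal sketch (the file names out.txt and err.txt are arbitrary choices for this illustration): > redirects stdout, while 2> redirects stderr.

```shell
# Run a compound command that writes one line to stdout and one to stderr,
# sending each stream to its own file (> redirects stdout, 2> redirects stderr).
{ echo "a normal message"; echo "an error message" >&2; } > out.txt 2> err.txt

cat out.txt   # contains only the normal message
cat err.txt   # contains only the error message
```

Keeping the two streams apart like this is why well-behaved filters write their diagnostics to stderr: error messages stay visible even when stdout is piped elsewhere.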
In your terminal, type cat and then hit the Enter key. Then type some random text.
$ cat
This is a test
This is a test
Hey!
Hey!
It is repeating everything I type!
It is repeating everything I type!
(I will hit ctrl+c so I will stop this nonsense)
(I will hit ctrl+c so I will stop this nonsense)
^C
For more information about the cat command (the term comes from “concatenate”) please refer to the man pages.
Note: If you are working on a really plain installation of a Linux server, some commands may not be available.
As demonstrated above, if you do not specify where cat should read from, it will read from standard input (whatever you type) and write whatever it reads to your terminal window (its standard output).
Now try the following:
$ cat > mytextfile
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
^C
$ cat mytextfile
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
The > (greater than) symbol tells the shell to redirect the output of cat to the mytextfile file, not to the standard output. Now try this:
$ cat mytextfile > mynewtextfile
$ cat mynewtextfile
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
This has the effect of copying mytextfile to mynewtextfile. You can verify that these two files have the same content by performing a diff:
$ diff mynewtextfile mytextfile
As there is no output, the files are equal. Now try the append redirection operator (>>):
$ echo 'This is my new line' >> mynewtextfile
$ diff mynewtextfile mytextfile
4d3
< This is my new line
So far we have used redirections to create and manipulate files. We can also use pipes (represented by the symbol |) to redirect the output of one program to another program. Let us find the lines where the word “this” is found:
$ cat mytextfile | grep this
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
$ cat mytextfile | grep -i this
This is a test
I hope cat is storing this to mytextfile as I redirected the output
I will hit ctrl+c now and check this
Now we have piped the output of cat to another command: grep. Notice that when we ignore case (using the -i option) we get an extra line of output.
Processing Text Streams
Reading a Compressed File
We will create a file called ftu.txt containing a list of the following commands:

bzcat
cat
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
xzcat
zcat
Now we will use the grep command to print all of the lines containing the string cat:
$ cat ftu.txt | grep cat
bzcat
cat
xzcat
zcat
Another way to get this information is to just use the grep command to filter the text directly, without the need to use another application to send the text stream to stdout:
$ grep cat ftu.txt
bzcat
cat
xzcat
zcat
Note: Remember there are many ways to perform the same task using Linux.
There are other commands that handle compressed files (bzcat for bzip2 compressed files, xzcat for xz compressed files and zcat for gzip compressed files), and each one is used to view the contents of a compressed file based on the compression algorithm used.
Verify that the newly created file ftu.txt is the only one in the directory, then create a gzip compressed version of the file:
$ ls ftu*
ftu.txt
$ gzip ftu.txt
$ ls ftu*
ftu.txt.gz
Next, use the zcat command to view the contents of the gzip compressed file:
$ zcat ftu.txt.gz
bzcat
cat
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
xzcat
zcat
Note that gzip will compress ftu.txt into ftu.txt.gz and it will remove the original file. By default, no output from the gzip command will be displayed. However, if you do want gzip to tell you what it is doing, use the -v option for the “verbose” output.
Viewing a File in a Pager
You know cat concatenates a file to the standard output (once a file is provided after the command). The file /var/log/syslog is where your Linux system stores everything important going on in your system. If you use the sudo command to elevate privileges so as to be able to read the /var/log/syslog file:
$ sudo cat /var/log/syslog
…you will see messages scrolling very fast within your terminal window. You can pipe the output to the program less so the results will be paginated. By using less you can use the arrow keys to navigate through the output and also use vi-like commands to navigate and search throughout the text.
However, rather than piping the cat command into a pagination program, it is more pragmatic to just use the pagination program directly:
$ sudo less /var/log/syslog
... (output omitted for clarity)
Getting a Portion of a Text File
If only the start or end of a file needs to be reviewed, there are other methods available. The command head is used to read the first ten lines of a file by default, and the command tail is used to read the last ten lines of a file by default. Now try:
$ sudo head /var/log/syslog
Nov 12 08:04:30 hypatia rsyslogd: [origin software="rsyslogd" swVersion="8.1910.0" x-pid="811" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Nov 12 08:04:30 hypatia systemd[1]: logrotate.service: Succeeded.
Nov 12 08:04:30 hypatia systemd[1]: Started Rotate log files.
Nov 12 08:04:30 hypatia vdr: [928] video directory scanner thread started (pid=882, tid=928, prio=low)
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'A - ATSC'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'C - DVB-C'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'S - DVB-S'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'T - DVB-T'
Nov 12 08:04:30 hypatia vdr[882]: vdr: no primary device found - using first device!
Nov 12 08:04:30 hypatia vdr: [929] epg data reader thread started (pid=882, tid=929, prio=high)
$ sudo tail /var/log/syslog
Nov 13 10:24:45 hypatia kernel: [ 8001.679238] mce: CPU7: Core temperature/speed normal
Nov 13 10:24:46 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Activating via systemd: service name='org.freedesktop.Tracker1.Miner.Extract' unit='tracker-extract.service' requested by ':1.73' (uid=1000 pid=2425 comm="/usr/lib/tracker/tracker-miner-fs ")
Nov 13 10:24:46 hypatia systemd[2004]: Starting Tracker metadata extractor...
Nov 13 10:24:47 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Successfully activated service 'org.freedesktop.Tracker1.Miner.Extract'
Nov 13 10:24:47 hypatia systemd[2004]: Started Tracker metadata extractor.
Nov 13 10:24:54 hypatia kernel: [ 8010.462227] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 502907)
Nov 13 10:24:54 hypatia kernel: [ 8010.462228] mce: CPU4: Core temperature above threshold, cpu clock throttled (total events = 502911)
Nov 13 10:24:54 hypatia kernel: [ 8010.469221] mce: CPU0: Core temperature/speed normal
Nov 13 10:24:54 hypatia kernel: [ 8010.469222] mce: CPU4: Core temperature/speed normal
Nov 13 10:25:03 hypatia systemd[2004]: tracker-extract.service: Succeeded.
To help illustrate the number of lines displayed, we can pipe the output of the head command to the nl command, which will number the lines of text streamed into it:
$ sudo head /var/log/syslog | nl
     1  Nov 12 08:04:30 hypatia rsyslogd: [origin software="rsyslogd" swVersion="8.1910.0" x-pid="811" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
     2  Nov 12 08:04:30 hypatia systemd[1]: logrotate.service: Succeeded.
     3  Nov 12 08:04:30 hypatia systemd[1]: Started Rotate log files.
     4  Nov 12 08:04:30 hypatia vdr: [928] video directory scanner thread started (pid=882, tid=928, prio=low)
     5  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'A - ATSC'
     6  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'C - DVB-C'
     7  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'S - DVB-S'
     8  Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'T - DVB-T'
     9  Nov 12 08:04:30 hypatia vdr[882]: vdr: no primary device found - using first device!
    10  Nov 12 08:04:30 hypatia vdr: [929] epg data reader thread started (pid=882, tid=929, prio=high)
And we can do the same by piping the output of the tail command to the wc command, which by default will count the number of words within a document, and using the -l switch to print out the number of lines of text that the command has read:
$ sudo tail /var/log/syslog | wc -l
10
Should an administrator need to review more (or less) of the beginning or end of a file, the -n option can be used to limit the commands' output:
$ sudo tail -n 5 /var/log/syslog
Nov 13 10:37:24 hypatia systemd[2004]: tracker-extract.service: Succeeded.
Nov 13 10:37:42 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Activating via systemd: service name='org.freedesktop.Tracker1.Miner.Extract' unit='tracker-extract.service' requested by ':1.73' (uid=1000 pid=2425 comm="/usr/lib/tracker/tracker-miner-fs ")
Nov 13 10:37:42 hypatia systemd[2004]: Starting Tracker metadata extractor...
Nov 13 10:37:43 hypatia dbus-daemon[2023]: [session uid=1000 pid=2023] Successfully activated service 'org.freedesktop.Tracker1.Miner.Extract'
Nov 13 10:37:43 hypatia systemd[2004]: Started Tracker metadata extractor.
$ sudo head -n 12 /var/log/syslog
Nov 12 08:04:30 hypatia rsyslogd: [origin software="rsyslogd" swVersion="8.1910.0" x-pid="811" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Nov 12 08:04:30 hypatia systemd[1]: logrotate.service: Succeeded.
Nov 12 08:04:30 hypatia systemd[1]: Started Rotate log files.
Nov 12 08:04:30 hypatia vdr: [928] video directory scanner thread started (pid=882, tid=928, prio=low)
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'A - ATSC'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'C - DVB-C'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'S - DVB-S'
Nov 12 08:04:30 hypatia vdr: [882] registered source parameters for 'T - DVB-T'
Nov 12 08:04:30 hypatia vdr[882]: vdr: no primary device found - using first device!
Nov 12 08:04:30 hypatia vdr: [929] epg data reader thread started (pid=882, tid=929, prio=high)
Nov 12 08:04:30 hypatia vdr: [882] no DVB device found
Nov 12 08:04:30 hypatia vdr: [882] initializing plugin: vnsiserver (1.8.0): VDR-Network-Streaming-Interface (VNSI) Server
The Basics of sed, the Stream Editor
Let us take a look at the other files, terms and utilities that do not have cat in their names. We can do this by passing the -v option to grep, which instructs the command to output only the lines not containing cat:
$ zcat ftu.txt.gz | grep -v cat
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
Most of what we can do with grep we can also do with sed, the stream editor for filtering and transforming text (as stated in the sed manual page). First we will recover our ftu.txt file by decompressing our gzip archive of the file:
$ gunzip ftu.txt.gz
$ ls ftu*
ftu.txt
Now, we can use sed to list only the lines containing the string cat:
$ sed -n /cat/p < ftu.txt
bzcat
cat
xzcat
zcat
We have used the less-than sign (<) to direct the contents of the file ftu.txt into our sed command. The word enclosed between slashes (i.e. /cat/) is the term we are searching for. The -n option instructs sed to produce no output (other than what the p command later instructs it to print). Try running this same command without the -n option to see what happens. Then try this:
$ sed /cat/d < ftu.txt
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
If we do not use the -n option, sed will print everything from the file except for what the d command instructs sed to delete from its output.
A common use of sed is to find and replace text within a file. Suppose you want to change every occurrence of cat to dog. You can use sed to do this by supplying the s command to swap out each instance of the first term, cat, for the second term, dog:
$ sed s/cat/dog/ < ftu.txt
bzdog
dog
cut
head
less
md5sum
nl
od
paste
sed
sha256sum
sha512sum
sort
split
tail
tr
uniq
wc
xzdog
zdog
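One detail worth noting: without a trailing g flag, the s command replaces only the first match on each line; that was enough above because no line of ftu.txt contains cat twice. A small sketch of the difference:

```shell
printf 'cat scat cat\n' | sed 's/cat/dog/'
# dog scat cat    (only the first occurrence on the line is replaced)
printf 'cat scat cat\n' | sed 's/cat/dog/g'
# dog sdog dog    (the g flag replaces every occurrence)
```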
Rather than using a redirection operator (<) to pass the ftu.txt file into our sed command, we can just have the sed command operate on the file directly. We will try that next, while simultaneously creating a backup of the original file:
$ sed -i.backup s/cat/dog/ ftu.txt
$ ls ftu*
ftu.txt  ftu.txt.backup
The -i option will perform an in-place sed operation on your original file. If you do not use the .backup suffix after the -i parameter, you would just have rewritten your original file. Whatever text you supply after the -i parameter becomes the suffix of the backup file to which the original contents are saved prior to the modifications you asked sed to perform.
Ensuring Data Integrity
We have demonstrated how easy it is to manipulate files in Linux. There are times where you may wish to distribute a file to someone else, and you want to be sure that the recipient ends up with a true copy of the original file. A very common use of this technique is practiced when Linux distribution servers host downloadable CD or DVD images of their software along with files that contain the calculated checksum values of those disc images. Here is an example listing from a Debian download mirror:
[PARENTDIR]  Parent Directory                              -
[SUM]   MD5SUMS                              2019-09-08 17:46  274
[CRT]   MD5SUMS.sign                         2019-09-08 17:52  833
[SUM]   SHA1SUMS                             2019-09-08 17:46  306
[CRT]   SHA1SUMS.sign                        2019-09-08 17:52  833
[SUM]   SHA256SUMS                           2019-09-08 17:46  402
[CRT]   SHA256SUMS.sign                      2019-09-08 17:52  833
[SUM]   SHA512SUMS                           2019-09-08 17:46  658
[CRT]   SHA512SUMS.sign                      2019-09-08 17:52  833
[ISO]   debian-10.1.0-amd64-netinst.iso      2019-09-08 04:37  335M
[ISO]   debian-10.1.0-amd64-xfce-CD-1.iso    2019-09-08 04:38  641M
[ISO]   debian-edu-10.1.0-amd64-netinst.iso  2019-09-08 04:38  405M
[ISO]   debian-mac-10.1.0-amd64-netinst.iso  2019-09-08 04:38  334M
In the listing above, the Debian installer image files are accompanied by text files that contain checksums of the files, produced by the various algorithms (MD5, SHA1, SHA256 and SHA512).
Note: A checksum is a value derived from a mathematical computation, based on a cryptographic hash function, against a file. There are different types of cryptographic hash functions that vary in strength. The exam will expect you to be familiar with using these checksum utilities.
Once you download a file (for example, the debian-10.1.0-amd64-netinst.iso image) you would then compare the checksum of the file that was downloaded against a checksum value that was provided for you.
Here is an example to illustrate the point. We will calculate the SHA256 value of the ftu.txt file using the sha256sum command:
$ sha256sum ftu.txt
345452304fc26999a715652543c352e5fc7ee0c1b9deac6f57542ec91daf261c  ftu.txt
The long string of characters preceding the file name is the SHA256 checksum value of this text file. Let us create a file that contains that value, so that we can use it to verify the integrity of our original text file. We can do this with the same sha256sum command and redirect the output to a file:
$ sha256sum ftu.txt > sha256.txt
Now, to verify the ftu.txt file, we just use the same command and supply the filename that contains our checksum value along with the -c switch:
$ sha256sum -c sha256.txt
ftu.txt: OK
The value contained within the file matches the calculated SHA256 checksum for our ftu.txt file, just as we would expect. However, if the original file were modified (such as a few bytes lost during a file download, or someone deliberately tampering with it), the value check would fail. In such cases we know that our file is bad or corrupted, and we cannot trust the integrity of its contents. To prove the point, we will add some text at the end of the file:
$ echo "new entry" >> ftu.txt
Now we will make an attempt to verify the file’s integrity:
$ sha256sum -c sha256.txt
ftu.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
And we see that the checksum does not match what was expected for the file. Therefore, we can no longer trust the integrity of this file. We could attempt to download a new copy of the file, report the checksum failure to the sender of the file, or report it to a data center security team, depending on the importance of the file.
Looking Deeper into Files
The octal dump (od) command is often used for debugging applications and various files. By itself, the od command will just list out a file’s contents in octal format. We can use our ftu.txt file from earlier to practice with this command:
$ od ftu.txt
0000000 075142 060543 005164 060543 005164 072543 005164 062550
0000020 062141 066012 071545 005163 062155 071465 066565 067012
0000040 005154 062157 070012 071541 062564 071412 062145 071412
0000060 060550 032462 071466 066565 071412 060550 030465 071462
0000100 066565 071412 071157 005164 070163 064554 005164 060564
0000120 066151 072012 005162 067165 070551 073412 005143 075170
0000140 060543 005164 061572 072141 000012
0000151
The first column of output is the byte offset for each line of output. Since od prints out information in octal format by default, each line begins with a seven-digit octal byte offset, followed by eight columns, each containing the octal value of two bytes of the file's data.
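Both the offset base and the grouping of the data are adjustable: the -A option selects the offset base and -t the output format. A small sketch using a short inline string:

```shell
# Decimal byte offsets (-A d) and one byte per column in hexadecimal (-t x1):
printf 'bzcat\n' | od -A d -t x1
# 0000000 62 7a 63 61 74 0a
# 0000006
```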
Tip: Recall that a byte is 8 bits in length.
Should you need to view a file’s contents in hexadecimal format, use the -x option:
$ od -x ftu.txt
0000000 7a62 6163 0a74 6163 0a74 7563 0a74 6568
0000020 6461 6c0a 7365 0a73 646d 7335 6d75 6e0a
0000040 0a6c 646f 700a 7361 6574 730a 6465 730a
0000060 6168 3532 7336 6d75 730a 6168 3135 7332
0000100 6d75 730a 726f 0a74 7073 696c 0a74 6174
0000120 6c69 740a 0a72 6e75 7169 770a 0a63 7a78
0000140 6163 0a74 637a 7461 000a
0000151
Now each of the eight columns after the byte offset are represented by their hexadecimal equivalents.
One handy use of the od command is for debugging scripts. For example, the od command can show us characters that are not normally seen within a file, such as newline entries. We can do this with the -c option, so that instead of displaying the numerical notation for each byte, these column entries will instead be shown as their character equivalents:
$ od -c ftu.txt
0000000   b   z   c   a   t  \n   c   a   t  \n   c   u   t  \n   h   e
0000020   a   d  \n   l   e   s   s  \n   m   d   5   s   u   m  \n   n
0000040   l  \n   o   d  \n   p   a   s   t   e  \n   s   e   d  \n   s
0000060   h   a   2   5   6   s   u   m  \n   s   h   a   5   1   2   s
0000100   u   m  \n   s   o   r   t  \n   s   p   l   i   t  \n   t   a
0000120   i   l  \n   t   r  \n   u   n   i   q  \n   w   c  \n   x   z
0000140   c   a   t  \n   z   c   a   t  \n
0000151
All of the newline entries within the file are represented by the (otherwise hidden) \n characters. If you just want to view all of the characters within a file, and do not need to see the byte offset information, the byte offset column can be removed from the output like so:
$ od -An -c ftu.txt
   b   z   c   a   t  \n   c   a   t  \n   c   u   t  \n   h   e
   a   d  \n   l   e   s   s  \n   m   d   5   s   u   m  \n   n
   l  \n   o   d  \n   p   a   s   t   e  \n   s   e   d  \n   s
   h   a   2   5   6   s   u   m  \n   s   h   a   5   1   2   s
   u   m  \n   s   o   r   t  \n   s   p   l   i   t  \n   t   a
   i   l  \n   t   r  \n   u   n   i   q  \n   w   c  \n   x   z
   c   a   t  \n   z   c   a   t  \n
Guided Exercises
- Someone just donated a laptop to your school and now you wish to install Linux on it. There is no manual and you were forced to boot it from a USB thumb drive with no graphics whatsoever. You do get a shell terminal and you know that, for every processor you have, there will be a line for it in the /proc/cpuinfo file:

  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (lines skipped)
  processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (more lines skipped)

  - Using the commands grep and wc, display how many processors you have.
  - Do the same thing with sed instead of grep.
- Explore your local /etc/passwd file with the grep, sed, head and tail commands per the tasks below:

  - Which users have access to a Bash shell?
  - Your system has various users that exist to handle specific programs or for administrative purposes. They do not have access to a shell. How many of those exist in your system?
  - How many users and groups exist in your system (remember: use only the /etc/passwd file)?
  - List only the first line, the last line and the tenth line of your /etc/passwd file.
- Consider this /etc/passwd file example. Copy the lines below to a local file named mypasswd for this exercise.

  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
  bin:x:2:2:bin:/bin:/usr/sbin/nologin
  sys:x:3:3:sys:/dev:/usr/sbin/nologin
  sync:x:4:65534:sync:/bin:/bin/sync
  nvidia-persistenced:x:121:128:NVIDIA Persistence Daemon,,,:/nonexistent:/sbin/nologin
  libvirt-qemu:x:64055:130:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
  libvirt-dnsmasq:x:122:133:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/usr/sbin/nologin
  carol:x:1000:2000:Carol Smith,Finance,,,Main Office:/home/carol:/bin/bash
  dave:x:1001:1000:Dave Edwards,Finance,,,Main Office:/home/dave:/bin/ksh
  emma:x:1002:1000:Emma Jones,Finance,,,Main Office:/home/emma:/bin/bash
  frank:x:1003:1000:Frank Cassidy,Finance,,,Main Office:/home/frank:/bin/bash
  grace:x:1004:1000:Grace Kearns,Engineering,,,Main Office:/home/grace:/bin/ksh
  henry:x:1005:1000:Henry Adams,Sales,,,Main Office:/home/henry:/bin/bash
  john:x:1006:1000:John Chapel,Sales,,,Main Office:/home/john:/bin/bash

  - List all users in group 1000 (use sed to select only the appropriate field) from your mypasswd file.
  - List only the full names of all the users for this group (use sed and cut).
Explorational Exercises
- Once more using the mypasswd file from the previous exercises, devise a Bash command that will select one individual from the Main Office to win a raffle contest. Use the sed command to only print out the lines for the Main Office, and then a cut command sequence to retrieve the first name of each user from these lines. Next you will want to randomly sort these names and only print out the top name from the list.
- How many people work in Finance, Engineering and Sales? (Consider exploring the uniq command.)
- Now you want to prepare a CSV (comma separated values) file named names.csv, built from the mypasswd file in the previous example, so you can easily import it into LibreOffice. The file contents will have the following format:

  First Name,Last Name,Position
  Carol,Smith,Finance
  ...
  John,Chapel,Sales

  Tip: Use the sed, cut, and paste commands to achieve the desired results. Note that the comma (,) will be the delimiter for this file.
- Suppose that the names.csv spreadsheet created in the previous exercise is an important file and we want to make sure nobody will tamper with it from the moment we send it to someone and the moment our recipient receives it. How can we ensure the integrity of this file using md5sum?
- You promised yourself that you would read a classic book 100 lines per day and you decided to start with Mariner and Mystic by Herman Melville. Devise a command using split that will separate this book into sections of 100 lines each. In order to get the book in plain text format, search for it at https://www.gutenberg.org.
- Using ls -l on the /etc directory, what kind of listing do you get? Using the cut command on the output of the given ls command, how would you display only the file names? What about the file names and the owners of the files? Along with the ls -l and cut commands, utilize the tr command to squeeze multiple occurrences of a space into a single space to aid in formatting the output with a cut command.
- This exercise assumes you are on a real machine (not a virtual machine). You must also have a USB stick with you. Review the manual pages for the tail command and find out how to follow a file as text is appended to it. While monitoring the output of a tail command on the /var/log/syslog file, insert a USB stick. Write out the full command that you would use to get the Product, Manufacturer and the total amount of memory of your USB stick.
Summary
Dealing with text streams is of great importance when administering any Linux system. Text streams can be processed using scripts to automate daily tasks or to find relevant debugging information in log files. Here is a short summary of the commands covered in this lesson:
cat
- Used to combine or read plain text files.

bzcat
- Allows for the processing or reading of files compressed using the bzip2 method.

xzcat
- Allows for the processing or reading of files compressed using the xz method.

zcat
- Allows for the processing or reading of files compressed using the gzip method.

less
- This command paginates the contents of a file, and allows for navigation and search functionality.

head
- This command will display the first 10 lines of a file by default. With the use of the -n switch fewer or more lines can be displayed.

tail
- This command will display the last 10 lines of a file by default. With the use of the -n switch fewer or more lines can be displayed. The -f option is used to follow the output of a text file as new data is being written to it.

wc
- Short for “word count” but, depending on the parameters you use, it will count characters, words and lines.

sort
- Used for organizing the output of a listing alphabetically, reverse alphabetically, or in a random order.

uniq
- Used to list (and count) matching strings.

od
- The “octal dump” command is used to display a binary file in either octal, decimal, or hexadecimal notation.

nl
- The “number line” command outputs a file's contents with each line prepended by its line number.

sed
- The stream editor can be used to find matching occurrences of strings using regular expressions, as well as to edit files using pre-defined patterns.

tr
- The translate command can replace characters, and can also delete characters or squeeze repeated characters.

cut
- This command can print columns of text files as fields based on a file's character delimiter.

paste
- Joins files in columns based on the usage of field separators.

split
- This command can split larger files into smaller ones depending on the criteria set by the command's options.

md5sum
- Used for calculating the MD5 hash value of a file. Also used to verify a file against an existing hash value to ensure a file's integrity.

sha256sum
- Used for calculating the SHA256 hash value of a file. Also used to verify a file against an existing hash value to ensure a file's integrity.

sha512sum
- Used for calculating the SHA512 hash value of a file. Also used to verify a file against an existing hash value to ensure a file's integrity.
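Several of the commands summarized above (sort, uniq, tr, cut and paste) only appear in the exercises of this lesson; here is a minimal sketch of how they combine (the staff.txt sample data is invented for illustration):

```shell
# Sample data: one "name:department" record per line.
printf 'carol:Finance\ndave:Finance\ngrace:Engineering\nhenry:Sales\n' > staff.txt

# cut extracts the second :-separated field, sort orders the values,
# and uniq -c counts how many times each one appears.
cut -d: -f2 staff.txt | sort | uniq -c

# tr translates characters: here it upper-cases the department names.
cut -d: -f2 staff.txt | tr 'a-z' 'A-Z'

# paste joins files side by side; -d, sets a comma as the delimiter.
cut -d: -f1 staff.txt > names.txt
cut -d: -f2 staff.txt > depts.txt
paste -d, names.txt depts.txt
```

Remember that uniq only collapses adjacent duplicates, which is why its input is sorted first.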
Answers to Guided Exercises
- Someone just donated a laptop to your school and now you wish to install Linux on it. There is no manual and you were forced to boot it from a USB thumb drive with no graphics whatsoever. You do get a shell terminal and you know that, for every processor you have, there will be a line for it in the /proc/cpuinfo file:

  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (lines skipped)
  processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 158
  (more lines skipped)

  - Using the commands grep and wc, display how many processors you have.

    Here are two options:

    $ cat /proc/cpuinfo | grep processor | wc -l
    $ grep processor /proc/cpuinfo | wc -l

    Now that you know there are several ways you can do the same thing, when should you be using one or the other? It really depends on several factors, the two most important ones being performance and readability. Most of the time you will use shell commands inside shell scripts to automate your tasks, and the larger and more complex your scripts become, the more you need to worry about keeping them fast.
  - Do the same thing with sed instead of grep.

    Now, instead of grep, we will try this with sed:

    $ sed -n /processor/p /proc/cpuinfo | wc -l

    Here we used sed with the -n parameter so sed will not print anything except for what matches the expression processor, as instructed by the p command. As we did in the grep solutions, wc -l will count the number of lines, thus the number of processors we have.

    Study this next example:

    $ sed -n /processor/p /proc/cpuinfo | sed -n '$='

    This command sequence provides identical results to the previous example, where the output of sed was piped into a wc command. The difference here is that instead of using wc -l to count the number of lines, sed is again invoked to provide equivalent functionality. Once more, we are suppressing the output of sed with the -n option, except for the expression that we are explicitly calling, which is '$='. This expression tells sed to match the last line ($) and then to print that line number (=).
- Explore your local /etc/passwd file with the grep, sed, head and tail commands per the tasks below:

  - Which users have access to a Bash shell?

    $ grep ":/bin/bash$" /etc/passwd

    We will improve this answer by only displaying the name of the user that utilizes the Bash shell.

    $ grep ":/bin/bash$" /etc/passwd | cut -d: -f1

    Since the user name is the first field (the -f1 parameter of the cut command) and the /etc/passwd file uses : as its separator (the -d: parameter of the cut command), we just pipe the output of the grep command to the appropriate cut command.

  - Your system has various users that exist to handle specific programs or for administrative purposes. They do not have access to a shell. How many of those exist in your system?

    The easiest way to find this is by printing out the lines for accounts that do not use the Bash shell:

    $ grep -v ":/bin/bash$" /etc/passwd | wc -l

  - How many users and groups exist in your system (remember: use only the /etc/passwd file)?

    The first field of any given line in your /etc/passwd file is the user name, the second is typically an x indicating the user password is not stored here (it is encrypted in the /etc/shadow file). The third is the user id (UID) and the fourth is the group id (GID). So this should give us the number of users:

    $ cut -d: -f3 /etc/passwd | wc -l

    Well, most of the time it will. However, there are situations where you will set different super users or other special kinds of users sharing the same UID (user id). So, to be on the safe side, we will pipe the result of our cut command to the sort command and then count the number of lines.

    $ cut -d: -f3 /etc/passwd | sort -u | wc -l

    Now, for the number of groups:

    $ cut -d: -f4 /etc/passwd | sort -u | wc -l

  - List only the first line, the last line and the tenth line of your /etc/passwd file.

    This will do:

    $ sed -n -e '1'p -e '10'p -e '$'p /etc/passwd

    Remember that the parameter -n tells sed not to print anything other than what is specified by the p command. The dollar sign ($) used here is a regular expression meaning the last line of the file.
- Consider this `/etc/passwd` file example. Copy the lines below to a local file named `mypasswd` for this exercise.

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
nvidia-persistenced:x:121:128:NVIDIA Persistence Daemon,,,:/nonexistent:/sbin/nologin
libvirt-qemu:x:64055:130:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
libvirt-dnsmasq:x:122:133:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/usr/sbin/nologin
carol:x:1000:2000:Carol Smith,Finance,,,Main Office:/home/carol:/bin/bash
dave:x:1001:1000:Dave Edwards,Finance,,,Main Office:/home/dave:/bin/ksh
emma:x:1002:1000:Emma Jones,Finance,,,Main Office:/home/emma:/bin/bash
frank:x:1003:1000:Frank Cassidy,Finance,,,Main Office:/home/frank:/bin/bash
grace:x:1004:1000:Grace Kearns,Engineering,,,Main Office:/home/grace:/bin/ksh
henry:x:1005:1000:Henry Adams,Sales,,,Main Office:/home/henry:/bin/bash
john:x:1006:1000:John Chapel,Sales,,,Main Office:/home/john:/bin/bash
List all users in group 1000 (use `sed` to select only the appropriate field) from your `mypasswd` file:

The GID is the fourth field in the `/etc/passwd` file. You might be tempted to try this:

$ sed -n /1000/p mypasswd
In this case you will also get this line:
carol:x:1000:2000:Carol Smith,Finance,,,Main Office:/home/carol:/bin/bash
You know this is not correct, since Carol Smith is a member of GID 2000 and the match occurred because of the UID. However, you may have noticed that after the GID the next field starts with an upper case character. We can use a regular expression to solve this problem.
$ sed -n /:1000:[A-Z]/p mypasswd
The expression `[A-Z]` will match any single upper case character. You will learn more about this in the respective lesson.
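If you prefer not to rely on every full name starting with an upper case letter, here is an alternative sketch using only `cut` and `grep` (both covered in this lesson): isolate the user name and GID fields first, then anchor the match on the whole fourth field.

```shell
# Keep only the user name (field 1) and GID (field 4), then match lines
# whose GID field is exactly 1000 (anchored between ':' and end of line)
cut -d: -f1,4 mypasswd | grep ':1000$' | cut -d: -f1
```

Because the GID is now the last field of each intermediate line, `:1000$` cannot accidentally match a UID or any other field.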
List only the full names of all the users for this group (use `sed` and `cut`):

Use the same technique you used to solve the first part of this exercise and pipe it to a `cut` command.

$ sed -n /:1000:[A-Z]/p mypasswd | cut -d: -f5
Dave Edwards,Finance,,,Main Office
Emma Jones,Finance,,,Main Office
Frank Cassidy,Finance,,,Main Office
Grace Kearns,Engineering,,,Main Office
Henry Adams,Sales,,,Main Office
John Chapel,Sales,,,Main Office
Not quite there! Note how the fields inside your results are separated by `,`. So we will pipe the output to another `cut` command, using the `,` as a delimiter.

$ sed -n /:1000:[A-Z]/p mypasswd | cut -d: -f5 | cut -d, -f1
Dave Edwards
Emma Jones
Frank Cassidy
Grace Kearns
Henry Adams
John Chapel
Answers to Explorational Exercises
- Once more using the `mypasswd` file from the previous exercises, devise a Bash command that will select one individual from the Main Office to win a raffle contest. Use the `sed` command to print out only the lines for the Main Office, and then a `cut` command sequence to retrieve the first name of each user from these lines. Next you will want to randomly sort these names and only print out the top name from the list.

First explore how the parameter `-R` manipulates the output of the `sort` command. Repeat this command a couple of times on your machine (note you will need to enclose 'Main Office' within single quotes, so `sed` will handle it as a single string):

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f1 | sort -R
Here is a solution to the problem:
$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f1 | sort -R | head -1
- How many people work in Finance, Engineering and Sales? (Consider exploring the `uniq` command.)

Keep building on top of what you learned from the previous exercises. Try the following:

$ sed -n /'Main Office'/p mypasswd
$ sed -n /'Main Office'/p mypasswd | cut -d, -f2
Notice that now we do not care about the `:` as a delimiter. We just want the second field when we split the lines by the `,` characters.

$ sed -n /'Main Office'/p mypasswd | cut -d, -f2 | uniq -c
      4 Finance
      1 Engineering
      2 Sales
The `uniq` command will only output the unique lines (collapsing the repeating ones) and the parameter `-c` tells `uniq` to count the occurrences of equal lines. There is a caveat here: `uniq` will only consider adjacent lines. When this is not the case you will have to use the `sort` command first.
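A minimal sketch of that caveat, using throw-away input:

```shell
# With non-adjacent duplicates, uniq -c reports 'Sales' twice...
printf 'Sales\nFinance\nSales\n' | uniq -c

# ...but sorting first makes the duplicates adjacent, so the counts are right
printf 'Sales\nFinance\nSales\n' | sort | uniq -c
```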
- Now you want to prepare a CSV (comma separated values) file named `names.csv`, built from the `mypasswd` file in the previous example, so you can easily import it into LibreOffice. The file contents will have the following format:

First Name,Last Name,Position
Carol,Smith,Finance
...
John,Chapel,Sales
Tip: Use the `sed`, `cut`, and `paste` commands to achieve the desired results. Note that the comma (`,`) will be the delimiter for this file.

Start with the `sed` and `cut` commands, building on top of what we learned from the previous exercises:

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d" " -f1 > firstname
Now we have the file `firstname` with the first names of our employees.

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d" " -f2 | cut -d, -f1 > lastname

Now we have the file `lastname` containing the surnames of each employee.

Next we determine which department each employee works in:
$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f2 > department
Before we work on the final solution, try the following commands to see what type of output they generate:
$ cat firstname lastname department
$ paste firstname lastname department
And now for the final solution:
$ paste firstname lastname department | tr '\t' ,
$ paste firstname lastname department | tr '\t' , > names.csv
Here we use the command `tr` to translate `\t`, the tab separator, into a `,`. `tr` is quite useful when we need to exchange one character for another. Be sure to review the man pages for both `tr` and `paste`. For example, we can use the `-d` option of `paste` to specify the delimiter, making the previous command less complex:

$ paste -d, firstname lastname department
We used the `paste` command here because we wanted to get you familiar with it. However, we could have easily performed all of the tasks in a single command chain:

$ sed -n /'Main Office'/p mypasswd | cut -d: -f5 | cut -d, -f1,2 | tr ' ' , > names.csv
- Suppose that the `names.csv` spreadsheet created in the previous exercise is an important file and we want to make sure nobody tampers with it between the moment we send it to someone and the moment our recipient receives it. How can we ensure the integrity of this file using `md5sum`?

If you look into the man pages for `md5sum`, `sha256sum` and `sha512sum` you will see they all start with the following text:

“compute and check XXX message digest”
Where “XXX” is the algorithm that will be used to create this message digest.
We will use `md5sum` as an example and later you can try with the other commands.

$ md5sum names.csv
61f0251fcab61d9575b1d0cbf0195e25  names.csv
Now, for instance, you can make the file available through a secure FTP service and send the generated message digest using another secure means of communication. If the file has been even slightly modified, the message digest will be completely different. Just to prove it, edit `names.csv` and change Jones to James as demonstrated here:

$ sed -i.backup s/Jones/James/ names.csv
$ md5sum names.csv
f44a0d68cb480466099021bf6d6d2e65  names.csv
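The “check” half of “compute and check” from the man page description can automate this comparison. A sketch, assuming you store the digest in a companion file before sending:

```shell
# Store the digest in a companion file...
md5sum names.csv > names.csv.md5

# ...and later verify the file against it; md5sum prints "names.csv: OK"
# and exits with status 0 as long as the file is untouched
md5sum -c names.csv.md5
```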
Whenever you make files available for download, it is always a good practice to also distribute the corresponding message digest, so people who download your file can produce a new message digest and check it against the original. If you browse through https://kernel.org you will find the page https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/sha256sums.asc where you can obtain the sha256sum for all files available for download.
- You promised yourself that you would read a classic book 100 lines per day and you decided to start with Mariner and Mystic by Herman Melville. Devise a command using `split` that will separate this book into sections of 100 lines each. In order to get the book in plain text format, search for it at https://www.gutenberg.org.

First we will get the whole book from the Project Gutenberg site, where you can get this and other books that are available in the public domain.
$ wget https://www.gutenberg.org/files/50461/50461-0.txt
You might need to install `wget` if it is not already installed on your system. Alternatively, you can also use `curl`. Use `less` to verify the book:

$ less 50461-0.txt
Now we will split the book into chunks of 100 lines each:
$ split -l 100 -d 50461-0.txt melville
`50461-0.txt` is the file we will be splitting. `melville` will be the prefix for the split files. The `-l 100` specifies the number of lines and the `-d` option tells `split` to use numeric suffixes for the resulting files. You can use `nl` on any of the split files (probably not on the last one) and confirm that each one of them has 100 lines.
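To see how the chunk sizes work out without downloading the book, here is a sketch on a throw-away file of 250 numbered lines (the file names are just examples):

```shell
# 250 lines split at 100 lines apiece yields chunks chunk00, chunk01, chunk02
seq 250 > /tmp/book.txt
split -l 100 -d /tmp/book.txt /tmp/chunk

wc -l /tmp/chunk00   # a full chunk: 100 lines
wc -l /tmp/chunk02   # the last chunk holds the remaining 50 lines
```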
- Using `ls -l` on the `/etc` directory, what kind of listing do you get? Using the `cut` command on the output of the given `ls` command, how would you display only the file names? What about the filename and the owner of the files? Along with the `ls -l` and `cut` commands, utilize the `tr` command to squeeze multiple occurrences of a space into a single space to aid in formatting the output with a `cut` command.

The `ls` command by itself will give you just the names of the files. We can, however, prepare the output of `ls -l` (the long listing) to extract more specific information.

$ ls -l /etc | tr -s ' ' ,
drwxr-xr-x,3,root,root,4096,out,24,16:58,acpi
-rw-r--r--,1,root,root,3028,dez,17,2018,adduser.conf
-rw-r--r--,1,root,root,10,out,2,17:38,adjtime
drwxr-xr-x,2,root,root,12288,out,31,09:40,alternatives
-rw-r--r--,1,root,root,401,mai,29,2017,anacrontab
-rw-r--r--,1,root,root,433,out,1,2017,apg.conf
drwxr-xr-x,6,root,root,4096,dez,17,2018,apm
drwxr-xr-x,3,root,root,4096,out,24,16:58,apparmor
drwxr-xr-x,9,root,root,4096,nov,6,20:20,apparmor.d
The `-s` parameter instructs `tr` to squeeze the repeated spaces into a single instance of a space. The `tr` command works for any kind of repeating character you specify. Then we replace the spaces with a comma `,`. We actually do not need to replace the spaces in our example, so we will just omit the `,`.

$ ls -l /etc | tr -s ' '
drwxr-xr-x 3 root root 4096 out 24 16:58 acpi
-rw-r--r-- 1 root root 3028 dez 17 2018 adduser.conf
-rw-r--r-- 1 root root 10 out 2 17:38 adjtime
drwxr-xr-x 2 root root 12288 out 31 09:40 alternatives
-rw-r--r-- 1 root root 401 mai 29 2017 anacrontab
-rw-r--r-- 1 root root 433 out 1 2017 apg.conf
drwxr-xr-x 6 root root 4096 dez 17 2018 apm
drwxr-xr-x 3 root root 4096 out 24 16:58 apparmor
If we want just the filenames, then all we need displayed is the ninth field:
$ ls -l /etc | tr -s ' ' | cut -d" " -f9
For the filename and the owner of a file we will need the ninth and the third fields:
$ ls -l /etc | tr -s ' ' | cut -d" " -f9,3
What if we just need the folder names and their owners?
$ ls -l /etc | grep ^d | tr -s ' ' | cut -d" " -f9,3
- This exercise assumes you are on a real machine (not a virtual machine). You must also have a USB stick with you. Review the manual pages for the `tail` command and find out how to follow a file as text is appended to it. While monitoring the output of a `tail` command on the `/var/log/syslog` file, insert a USB stick. Write out the full command that you would use to get the Product, Manufacturer and the total amount of memory of your USB stick.

$ tail -f /var/log/syslog | grep -i 'product\:\|blocks\|manufacturer'
Nov  8 06:01:35 brod-avell kernel: [124954.369361] usb 1-4.3: Product: Cruzer Blade
Nov  8 06:01:35 brod-avell kernel: [124954.369364] usb 1-4.3: Manufacturer: SanDisk
Nov  8 06:01:37 brod-avell kernel: [124955.419267] sd 2:0:0:0: [sdc] 61056064 512-byte logical blocks: (31.3 GB/29.1 GiB)
Of course this is an example and the results may vary depending on your USB memory stick manufacturer. Notice that now we use the `-i` parameter with the `grep` command, as we are not sure whether the strings we are searching for are in upper or lower case. We also used `\|` as a logical OR, so we search for lines containing `product` OR `blocks` OR `manufacturer`.