103.3 Lesson 2

Certificate: LPIC-1
Version: 5.0
Topic: 103 GNU and Unix Commands
Objective: 103.3 Perform basic file management
Lesson: 2 of 2

Introduction

How to Find Files

As you use your machine, files progressively grow in number and size. Sometimes it becomes difficult to locate a particular file. Fortunately, Linux provides find to quickly search and locate files. find uses the following syntax:

find STARTING_PATH OPTIONS EXPRESSION

STARTING_PATH: defines the directory where the search begins.
OPTIONS: controls the behavior and adds specific criteria to optimize the search process.
EXPRESSION: defines the search query.

$ find . -name "myfile.txt"
./myfile.txt

The starting path in this case is the current directory. The option -name specifies that the search is based on the name of the file. myfile.txt is the name of the file to search. When using file globbing, be sure to enclose the pattern in quotation marks. Otherwise the shell expands it against the files in the current directory instead of passing it to find unchanged:

$ find /home/frank -name "*.png"
/home/frank/Pictures/logo.png
/home/frank/screenshot.png

This command finds all files ending with .png in the /home/frank/ directory and beneath. If you do not understand the usage of the asterisk (*), it is covered in the previous lesson.

Using Criteria to Speed Search

You can use find to locate files based on type, size or time. Each criterion you add narrows the search, so you reach the files you want faster.

Options for finding files based on type include:

-type f: search for regular files.
-type d: search for directories.
-type l: search for symbolic links.

$ find . -type d -name "example"

This command finds all directories named example in the current directory and below.

Other criteria which could be used with find include:

-name: performs a search based on the given name.
-iname: searches based on the name but ignores case (e.g. myFile matches MYFILE).
-not: returns those results that do not match the test case.
-maxdepth N: descends at most N levels of subdirectories below the starting directory.
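These criteria can be combined. The following sketch shows -iname, -maxdepth and -not side by side. It assumes GNU find and uses a hypothetical directory tree created in a temporary directory:

```shell
# Work in a throwaway directory so no real files are touched
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical sample tree: two files at the top, one two levels down
mkdir -p docs/old
touch MyFile.txt notes.md docs/old/myfile.txt

# -iname ignores case, so MyFile.txt and docs/old/myfile.txt both match
any_case=$(find . -iname "myfile.txt")

# -maxdepth 1 only looks at the entries directly inside the starting directory
top_only=$(find . -maxdepth 1 -iname "myfile.txt")

# -not inverts the test that follows it: regular files not ending in .txt
not_txt=$(find . -type f -not -name "*.txt")

printf '%s\n' "$any_case" "$top_only" "$not_txt"
```

In GNU find, -maxdepth should come before the other tests, as it does here. Otherwise find prints a warning.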

Locating Files by Modification Time

find can also filter a directory hierarchy based on when a file was last modified:

$ sudo find / -name "*.conf" -mtime -7
/etc/logrotate.conf

This command searches the entire file system (the starting path is the root directory, i.e. /) for files whose names end with .conf and that were modified less than seven days ago. Reading every directory from the base of the system's directory structure requires elevated privileges, hence the use of sudo. The argument passed to -mtime is the number of days, counted in whole 24-hour periods, since the file was last modified. -7 means less than seven days ago, a plain 7 means exactly seven days ago (at least seven but fewer than eight 24-hour periods), and +7 means more than seven whole days ago (at least eight 24-hour periods).
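You can see the difference between -7 and +7 without waiting a week. The sketch below searches a temporary directory instead of / and backdates two hypothetical files with touch -d. The relative date strings such as "3 days ago" are a GNU coreutils feature, which is an assumption here:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Backdate two hypothetical config files (GNU touch relative dates)
touch -d "3 days ago" recent.conf
touch -d "10 days ago" stale.conf

# -mtime -7: modified less than 7 days (24-hour periods) ago
recent=$(find . -name "*.conf" -mtime -7)
echo "$recent"    # ./recent.conf

# -mtime +7: modified at least 8 full 24-hour periods ago
stale=$(find . -name "*.conf" -mtime +7)
echo "$stale"     # ./stale.conf
```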

Locating Files by Size

find can also locate files by size. For example, to search for files larger than 2G in /var:

$ sudo find /var -size +2G
/var/lib/libvirt/images/debian10.qcow2
/var/lib/libvirt/images/rhel8.qcow2

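The -size option matches files whose size corresponds to the argument passed. Some example arguments include:

-size 100c: files which are exactly 100 bytes (c stands for bytes, while b would mean 512-byte blocks).
-size +100k: files larger than 100 kilobytes.
-size -20M: files smaller than 20 megabytes (find rounds sizes up to whole units, so in practice this matches files of at most 19 megabytes).
-size +2G: files larger than 2 gigabytes.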

Note	To find empty files we can use: `find . -size 0b` or `find . -empty`.
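The unit suffixes are easy to mix up because in GNU find c means bytes while b means 512-byte blocks. This sketch creates three hypothetical files of known size in a temporary directory and checks which tests select each one:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical test files: empty, exactly 100 bytes, and exactly 2 MiB
touch empty.txt
head -c 100 /dev/zero > hundred.bin
head -c 2097152 /dev/zero > big.bin

# c = bytes: only hundred.bin is exactly 100 bytes
bytes=$(find . -type f -size 100c)

# b = 512-byte blocks, rounded up: the 100-byte file counts as 1 block
blocks=$(find . -type f -size 1b)

# M = megabytes (MiB): only big.bin is larger than 1M
large=$(find . -type f -size +1M)

# The two ways of finding empty files from the note above
empty=$(find . -type f -empty)
zero=$(find . -type f -size 0b)

printf '%s\n' "$bytes" "$blocks" "$large" "$empty" "$zero"
```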

Acting on the Result Set

Once a search is done, it is possible to perform an action on the resulting set by using -exec:

$ find . -name "*.conf" -exec chmod 644 '{}' \;

This filters every object in the current directory (.) and below for file names ending with .conf and then executes the chmod 644 command to modify file permissions on the results.

For now, do not bother with the meaning of '{}' \; as it will be discussed later.
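To confirm what -exec did, check the permission bits afterwards. In this sketch the files are hypothetical, and stat -c assumes GNU coreutils:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical files, all starting with restrictive permissions
touch app.conf db.conf readme.txt
chmod 600 app.conf db.conf readme.txt

# chmod 644 runs once per .conf file; {} is replaced by each file name
find . -name "*.conf" -exec chmod 644 '{}' \;

# GNU stat: print the octal permission bits followed by the name
stat -c '%a %n' app.conf db.conf readme.txt
```

Only the two .conf files end up with 644, and readme.txt keeps 600.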

Using `grep` to Filter for Files Based on Content

grep searches for lines that contain a keyword or pattern.

Consider a situation where we need to find files based on their content:

$ find . -type f -exec grep "lpi" '{}' \; -print
./.bash_history
Alpine/M
helping/M

This searches every object in the current directory hierarchy (.) that is a file (-type f) and executes the command grep "lpi" for every file that satisfies the condition. grep prints any matching lines. -exec counts as true only when grep exits successfully, that is, when it found a match. As a result, -print displays the name of each file that contains the keyword. The curly braces ({}) are a placeholder that find replaces with the name of each file it finds. The {} are enclosed in single quotes (') to protect them from being interpreted by the shell. The -exec command is terminated with a semicolon (;), which must be escaped (\;) to avoid interpretation by the shell.
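The sketch below runs the command from the text in a temporary directory containing two hypothetical files, only one of which mentions lpi. The matching line from grep appears first, followed by the file name printed by -print:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical files: only notes.txt contains the keyword
echo "studying for lpi exams" > notes.txt
echo "nothing to see here" > other.txt

# grep prints matching lines; -print runs only if grep exited with status 0
result=$(find . -type f -exec grep "lpi" '{}' \; -print)
echo "$result"
```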

Adding the -delete action to the end of an expression deletes all files that match. Use it only when you are certain that the results match just the files you wish to delete. Running the same command without -delete first lets you review the list.

In the example below, find searches the hierarchy starting at the current directory and deletes all files whose names end with .bak:

$ find . -name "*.bak" -delete
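find evaluates its expression from left to right, so -delete must come after the tests that select the files. If it came first, it would delete everything under the starting path. A cautious workflow is to run the search without -delete, check the list, and only then add -delete. The sketch below follows these steps with hypothetical files in a temporary directory:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical files: two backups and one file that must survive
touch report.bak config.bak report.txt

# Dry run: the same expression without -delete lists what would be removed
find . -name "*.bak"

# Real run: -delete goes last, after the test that selects the files
find . -name "*.bak" -delete

ls    # only report.txt is left
```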

Archiving Files

The `tar` Command (Archiving and Compression)

The tar command, short for “tape archive(r)”, combines a group of files into a single tar archive. Archives make it easy to move or back up a group of files together. Think of tar as glue that binds files together so they can be grouped and moved as one unit.

tar can also extract tar archives, list the files included in an archive, and add files to an existing archive.

The tar command syntax is as follows:

tar [OPERATION_AND_OPTIONS] [ARCHIVE_NAME] [FILE_NAME(S)]

OPERATION

Exactly one operation argument is required. The most frequently used operations are:

--create (-c): Create a new tar archive.
--extract (-x): Extract the entire archive or one or more files from an archive.
--list (-t): Display a list of the files included in the archive.

OPTIONS

The most frequently used options are:

--verbose (-v): Show the files being processed by the tar command.
--file=archive-name (-f archive-name): Specifies the archive file name.

ARCHIVE_NAME

The name of the archive.

FILE_NAME(S)

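A space-separated list of file names. When creating an archive, these are the files and directories to include. When extracting, these are the files to extract from the archive. If no names are given during extraction, the entire archive is extracted.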
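The sketch below shows how FILE_NAME(S) behave in each operation. It archives two hypothetical directories and lists the archive with -t, which extracts nothing. It then extracts a single member into an empty directory:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical directories to archive
mkdir stuff1 stuff2 restore
echo "one" > stuff1/a.txt
echo "two" > stuff2/b.txt

# When creating, FILE_NAME(S) are what goes into the archive
tar -cf archive.tar stuff1 stuff2

# --list (-t) prints the members without extracting them
tar -tf archive.tar

# When extracting, FILE_NAME(S) restrict extraction to the named members
cd restore || exit 1
tar -xf ../archive.tar stuff2/b.txt
find . -type f    # ./stuff2/b.txt
```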

Creating an Archive

Let’s say we have a directory named stuff in the current directory and we want to save it to a file named archive.tar. We would run the following command:

$ tar -cvf archive.tar stuff
stuff/
stuff/service.conf

Here’s what those switches actually mean:

-c: Create an archive.
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The -v is always optional in these commands, but it is helpful.
-f: Specifies the file name of the archive. The name must directly follow -f, which is why f comes last in -cvf.

In general, to archive a single directory or a single file on Linux, we use:

tar -cvf NAME-OF-ARCHIVE.tar /PATH/TO/DIRECTORY-OR-FILE

Note	`tar` works recursively. It will perform the required action on every subdirectory inside the directory specified.

To archive multiple directories at once, we list all of them, separated by spaces, in place of /PATH/TO/DIRECTORY-OR-FILE:

$ tar -cvf archive.tar stuff1 stuff2

This would produce an archive of stuff1 and stuff2 in archive.tar.

Extracting an Archive

We can extract an archive using tar:

$ tar -xvf archive.tar
stuff/
stuff/service.conf

This will extract the contents of archive.tar to the current directory.

This command is the same as the archive creation command used above, except that the -x switch replaces the -c switch.

To extract the contents of the archive to a specific directory, we use -C:

$ tar -xvf archive.tar -C /tmp

This will extract the contents of archive.tar to the /tmp directory.

$ ls /tmp
stuff
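GNU tar expects the directory given to -C to exist already. The sketch below uses a hypothetical stuff directory in a temporary location. It extracts into a freshly created directory, then uses diff -r to confirm that the extracted copy matches the original:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical directory matching the earlier example
mkdir stuff
echo "port=8080" > stuff/service.conf
tar -cf archive.tar stuff

# -C switches to the target directory (which must exist) before extracting
mkdir target
tar -xvf archive.tar -C target

# diff -r prints nothing and succeeds when both trees are identical
diff -r stuff target/stuff && echo "identical"
```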

Compressing with `tar`

The GNU tar command included with Linux distributions can create a .tar archive and then compress it with gzip or bzip2 compression in a single command:

$ tar -czvf name-of-archive.tar.gz stuff

This command would create a compressed file using the gzip algorithm (-z).

While gzip compression is most frequently used to create .tar.gz or .tgz files, tar also supports bzip2 compression. This allows the creation of bzip2 compressed files, often named .tar.bz2, .tar.bz or .tbz files.

To do so, we replace -z for gzip with -j for bzip2:

$ tar -cjvf name-of-archive.tar.bz stuff

To decompress the file, we replace -c with -x, where x stands for “extract”:

$ tar -xzvf archive.tar.gz

gzip is faster, but it generally compresses a bit less, so you get a somewhat larger file. bzip2 is slower, but it compresses a bit more, so you get a somewhat smaller file. For everyday use, though, both work well, and tar handles them the same way apart from the -z or -j switch.
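The sketch below builds a gzip-compressed archive of a hypothetical directory, lists it with -t, and extracts it elsewhere. Replacing -z with -j gives the bzip2 equivalent, assuming bzip2 is installed:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical directory with some compressible text
mkdir stuff out
seq 1 1000 > stuff/numbers.txt

# -z filters the archive through gzip while creating it
tar -czf stuff.tar.gz stuff

# The same -z works with --list (-t) and --extract (-x)
tar -tzf stuff.tar.gz
tar -xzf stuff.tar.gz -C out

cmp stuff/numbers.txt out/stuff/numbers.txt && echo "round trip ok"
```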

Alternatively, we may apply compression with the gzip command for gzip compression and the bzip2 command for bzip2 compression. For example, to apply gzip compression, use:

gzip FILE-TO-COMPRESS

gzip creates the compressed file with the same name plus a .gz ending.
gzip removes the original file after creating the compressed file.

The bzip2 command works in a similar fashion.

To uncompress the files, we use either gunzip or bunzip2, depending on the algorithm used to compress a file.
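The following sketch shows that gzip replaces the original file and that gunzip restores it. The file names are hypothetical, and a copy is kept only so the contents can be compared afterwards. bzip2 and bunzip2 behave the same way, using a .bz2 ending:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

seq 1 1000 > report.txt
cp report.txt reference.txt    # kept only for the comparison below

# gzip replaces report.txt with report.txt.gz
gzip report.txt
ls

# gunzip restores report.txt and removes report.txt.gz
gunzip report.txt.gz
ls

cmp report.txt reference.txt && echo "contents unchanged"
```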

The `cpio` Command

cpio stands for “copy in, copy out”. It is used to process archive files such as *.cpio or *.tar files.

cpio performs the following operations:

Copying files to an archive.
Extracting files from an archive.

When creating an archive, cpio reads the list of files to include from standard input (typically the output of ls or find).

To create a cpio archive, we use:

$ ls | cpio -o > archive.cpio

The -o (copy-out) option instructs cpio to create an archive and write it to standard output, which is redirected here to the file archive.cpio. The ls command lists the contents of the current directory, which are the files to be archived.

To extract the archive, we use:

$ cpio -id < archive.cpio

The -i (copy-in) option extracts the archive. The -d option creates leading directories where needed. The < character redirects the file archive.cpio to cpio's standard input, so that is the archive being extracted.

The `dd` Command

dd copies data from one location to another. Its command line syntax differs from that of most other Unix programs: it uses the form option=value for its command line options rather than the GNU standard -option value or --option=value formats:

$ dd if=oldfile of=newfile

This command would copy the content of oldfile into newfile, where if= is the input file and of= refers to the output file.

Note	The `dd` command typically does not output anything to the screen until it has finished. With the `status=progress` option (supported by GNU dd), it periodically reports how much data has been copied so far. For example: `dd status=progress if=oldfile of=newfile`.

dd can also convert data to upper or lower case, or write directly to block devices such as /dev/sdb:

$ dd if=oldfile of=newfile conv=ucase

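This would copy all the contents of oldfile into newfile and convert all of the text to upper case.

The following command backs up the whole hard disk located at /dev/sda to a file named backup.dd, reading and writing 4096 bytes at a time (bs=4096). Reading a raw disk device normally requires root privileges: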
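Here is a minimal check of conv=ucase using a hypothetical one-line file. dd writes its transfer statistics to standard error, which is discarded here to keep the output clean:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

printf 'hello lpi\n' > oldfile

# conv=ucase maps lowercase letters to uppercase while copying
dd if=oldfile of=newfile conv=ucase 2>/dev/null

cat newfile    # HELLO LPI
```

conv=lcase performs the reverse conversion.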

$ dd if=/dev/sda of=backup.dd bs=4096
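Copying a real disk requires root privileges and a spare device. The sketch below therefore substitutes a hypothetical 1 MiB image file for /dev/sda and uses the same bs=4096 form. cmp then confirms that the backup is byte-for-byte identical:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in for /dev/sda: 1 MiB of random data in a regular file
head -c 1048576 /dev/urandom > disk.img

# Same form as the backup command above, reading the image instead of a disk
dd if=disk.img of=backup.dd bs=4096 2>/dev/null

# cmp is silent and exits 0 when both files are identical
cmp disk.img backup.dd && echo "backup matches"
```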

Guided Exercises

Consider the following listing:

$ find /home/frank/Documents/ -type d
/home/frank/Documents/
/home/frank/Documents/animal
/home/frank/Documents/animal/domestic
/home/frank/Documents/animal/wild

What kind of files would this command output?
In which directory does the search begin?

A user wishes to compress his backup folder. He uses the following command:
```
$ tar cvf /home/frank/backup.tar.gz /home/frank/dir1
```
Which option is lacking to compress the backup using the gzip algorithm?

Explorational Exercises

As a system administrator, you are required to perform regular checks in order to remove voluminous files. These voluminous files are located in /var and end with a .backup extension.
- Write down the command, using find, to locate these files:
- An analysis of the sizes of these files reveals that they range from 100M to 1000M. Complete the previous command with this new information, so that you may locate those backup files ranging from 100M to 1000M:
- Finally, complete this command, with the delete action so that these files will be removed:
In the /var directory, there exist four backup files:
```
db-jan-2018.backup
db-feb-2018.backup
db-march-2018.backup
db-apr-2018.backup
```
- Using tar, specify the command that would create an archive file with the name db-first-quarter-2018.backup.tar:
- Using tar, specify the command that would create the archive and compress it using gzip. Take note that the resulting file name should end with .gz:

Summary

In this section, you learned:

How to find files with find.
How to add search criteria based on time, file type or size by supplying arguments to find.
How to act on a returned set.
How to archive, compress and decompress files using tar.
Processing archives with cpio.
Copying files with dd.

Answers to Guided Exercises

Consider the following listing:

$ find /home/frank/Documents/ -type d
/home/frank/Documents/
/home/frank/Documents/animal
/home/frank/Documents/animal/domestic
/home/frank/Documents/animal/wild

What kind of files would this command output?

Directories.
In which directory does the search begin?

/home/frank/Documents

A user wishes to compress his backup folder. He uses the following command:
```
$ tar cvf /home/frank/backup.tar.gz /home/frank/dir1
```
Which option is lacking to compress the backup using the gzip algorithm?

Option -z.

Answers to Explorational Exercises

As a system administrator, you are required to perform regular checks in order to remove voluminous files. These voluminous files are located in /var and end with a .backup extension.
- Write down the command, using find, to locate these files:
  $ find /var -name "*.backup"
- An analysis of the sizes of these files reveals that they range from 100M to 1000M. Complete the previous command with this new information, so that you may locate those backup files ranging from 100M to 1000M:
  $ find /var -name "*.backup" -size +100M -size -1000M
- Finally, complete this command, with the delete action so that these files will be removed:
  $ find /var -name "*.backup" -size +100M -size -1000M -delete

In the /var directory, there exist four backup files:

db-jan-2018.backup
db-feb-2018.backup
db-march-2018.backup
db-apr-2018.backup

Using tar, specify the command that would create an archive file with the name db-first-quarter-2018.backup.tar:

$ tar -cvf db-first-quarter-2018.backup.tar db-jan-2018.backup db-feb-2018.backup db-march-2018.backup db-apr-2018.backup

Using tar, specify the command that would create the archive and compress it using gzip. Take note that the resulting file name should end with .gz:
```
$ tar -zcvf db-first-quarter-2018.backup.tar.gz db-jan-2018.backup db-feb-2018.backup db-march-2018.backup db-apr-2018.backup
```