4.3 Lesson 1
Certificate: | Linux Essentials |
---|---|
Version: | 1.6 |
Topic: | 4 The Linux Operating System |
Objective: | 4.3 Where Data is Stored |
Lesson: | 1 of 2 |
Introduction
For an operating system, everything is considered data. For Linux, everything is considered a file: programs, regular files, directories, block devices (hard disks, etc.), character devices (consoles, etc.), kernel processes, sockets, partitions, links, etc. The Linux directory structure, starting from the root /, is a collection of files containing data. The fact that everything is a file is a powerful feature of Linux as it allows for tweaking virtually every single aspect of the system.
In this lesson we will be discussing the different locations in which important data is stored as established by the Linux Filesystem Hierarchy Standard (FHS). Some of these locations are real directories which store data persistently on disks, whereas others are pseudo filesystems loaded in memory which give us access to kernel subsystem data such as running processes, use of memory, hardware configuration and so on. The data stored in these virtual directories is used by a number of commands that allow us to monitor and handle it.
Programs and their Configuration
Among the most important data on a Linux system are its programs and their configuration files. The former are executable files containing sets of instructions to be run by the computer’s processor, whereas the latter are usually text documents that control the operation of a program. Executable files can be either binary files or text files. Executable text files are called scripts. Configuration data on Linux is traditionally stored in text files too, although there are various styles of representing configuration data.
Where Binary Files are Stored
Like any other file, executable files live in directories hanging ultimately from /. More specifically, programs are distributed across a three-tier structure: the first tier (/) includes programs that can be necessary in single-user mode, the second tier (/usr) contains most multi-user programs and the third tier (/usr/local) is used to store software that is not provided by the distribution and has been compiled locally.
Typical locations for programs include:
/sbin
-
It contains essential binaries for system administration such as parted or ip.
/bin
-
It contains essential binaries for all users such as ls, mv, or mkdir.
/usr/sbin
-
It stores binaries for system administration such as deluser or groupadd.
/usr/bin
-
It includes most executable files that can be used by all users, such as free, pstree, sudo or man.
/usr/local/sbin
-
It is used to store locally installed programs for system administration that are not managed by the system’s package manager.
/usr/local/bin
-
It serves the same purpose as /usr/local/sbin but for regular user programs.
Recently some distributions started to replace /bin and /sbin with symbolic links to /usr/bin and /usr/sbin.
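One way to see whether a system uses this merged layout is to test whether /bin and /sbin are symbolic links. The following is a minimal sketch; the function name describe_dir is made up for illustration.

```shell
#!/bin/sh
# describe_dir is a hypothetical helper: it reports whether a path is a
# symbolic link (and where it points) or a real directory.
describe_dir() {
    if [ -L "$1" ]; then
        printf '%s -> %s\n' "$1" "$(readlink "$1")"
    elif [ -d "$1" ]; then
        printf '%s is a real directory\n' "$1"
    else
        printf '%s does not exist\n' "$1"
    fi
}

for dir in /bin /sbin; do
    describe_dir "$dir"
done
```

The -L test has to come first, because -d also succeeds for a symbolic link that points to a directory.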
Note
|
The |
Apart from these directories, regular users can have their own programs in either:
-
/home/$USER/bin
-
/home/$USER/.local/bin
Tip
|
You can find out what directories are available for you to run binaries from by referencing the |
We can find the location of programs with the which command:
$ which git
/usr/bin/git
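The which command is an external program and may be missing on minimal installations. The shell's own command -v builtin gives similar information for programs, so a script could use it instead. A small sketch, in which locate_prog is an illustrative name and not a standard tool:

```shell
#!/bin/sh
# locate_prog is a hypothetical helper: print where the shell finds a
# program, or a message if nothing with that name is found.
locate_prog() {
    if found=$(command -v "$1"); then
        printf '%s: %s\n' "$1" "$found"
    else
        printf '%s: not found\n' "$1"
    fi
}

locate_prog ls
locate_prog git
```

For shell builtins and aliases, command -v prints the name or the alias definition rather than a path, so the result is only a file location for external programs.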
Where Configuration Files are Stored
The /etc Directory
In the early days of Unix there was a folder for each type of data, such as /bin for binaries and /boot for the kernel(s). However, /etc (meaning et cetera) was created as a catch-all directory to store any files that did not belong in the other categories. Most of these files were configuration files. With the passing of time more and more configuration files were added so /etc became the main folder for configuration files of programs. As said above, a configuration file usually is a local, plain text (as opposed to binary) file which controls the operation of a program.
In /etc we can find different naming patterns for configuration files:
-
Files with an ad hoc extension or no extension at all, for example
group
-
System group database.
hostname
-
Name of the host computer.
hosts
-
List of IP addresses and their hostname translations.
passwd
-
System user database — made up of seven fields separated by colons providing information about the user.
profile
-
System-wide configuration file for Bash.
shadow
-
File storing the hashed passwords of user accounts.
-
Initialization files ending in rc:
bash.bashrc
-
System-wide .bashrc file for interactive bash shells.
nanorc
-
Sample initialization file for GNU nano (a simple text editor that normally ships with any distribution).
-
Files ending in .conf:
resolv.conf
-
Config file for the resolver, which provides access to the Internet Domain Name System (DNS).
sysctl.conf
-
Config file to set system variables for the kernel.
-
Directories with the .d suffix:
Some programs with a unique config file (*.conf or otherwise) have evolved to have a dedicated *.d directory which helps build modular, more robust configurations. For example, to configure logrotate, you will find logrotate.conf, but also the logrotate.d directory.
This approach comes in handy in those cases where different applications need configurations for the same specific service. If, for example, a web server package contains a logrotate configuration, this configuration can now be placed in a dedicated file in the logrotate.d directory. This file can be updated by the web server package without interfering with the remaining logrotate configuration. Likewise, packages can add specific tasks by placing files in the /etc/cron.d directory instead of modifying /etc/crontab.
In Debian and Debian derivatives, such an approach has been applied to the list of reliable sources read by the package management tool apt: apart from the classic /etc/apt/sources.list, now we find the /etc/apt/sources.list.d directory:
$ ls /etc/apt/sources*
/etc/apt/sources.list

/etc/apt/sources.list.d:
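Programs that support a .d directory commonly read their main file first and then every file in the directory. The sketch below imitates that behaviour in a temporary directory. All names in it (demo.conf, demo.d, read_config and the file contents) are invented for illustration, and it does not claim to reproduce how logrotate, cron or apt are implemented.

```shell
#!/bin/sh
# Build a throwaway configuration tree; every name here is invented.
conf_root=$(mktemp -d)
trap 'rm -rf "$conf_root"' EXIT
mkdir "$conf_root/demo.d"
echo 'rotate weekly'    > "$conf_root/demo.conf"
echo 'webserver daily'  > "$conf_root/demo.d/10-webserver"
echo 'database monthly' > "$conf_root/demo.d/20-database"

# read_config prints the main file followed by every drop-in file.
# The shell expands the * pattern in sorted order.
read_config() {
    cat "$1/demo.conf"
    for fragment in "$1"/demo.d/*; do
        if [ -f "$fragment" ]; then
            cat "$fragment"
        fi
    done
}

read_config "$conf_root"
```

Because each package owns its own file inside the directory, adding or removing one fragment never requires editing the others.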
Configuration Files in HOME (Dotfiles)
At user level, programs store their configurations and settings in hidden files in the user’s home directory (also represented ~). Remember, hidden files start with a dot (.), hence their name: dotfiles.
Some of these dotfiles belong to Bash. Most of them are scripts that customize the user’s shell session and are sourced when a shell starts or exits, while .bash_history simply stores data:
.bash_history
-
It stores the command line history.
.bash_logout
-
It includes commands to execute when leaving the login shell.
.bashrc
-
Bash’s initialization script for interactive non-login shells.
.profile
-
Initialization script for login shells; Bash reads it when neither ~/.bash_profile nor ~/.bash_login exists.
Note
|
Refer to the lesson on “Command Line Basics” to learn more about Bash and its init files. |
The configuration files and directories of other user-specific programs are read when their respective programs are started: .gitconfig, .emacs.d, .ssh, etc.
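Since dotfiles are hidden, a plain ls does not show them. A minimal sketch for listing only the hidden entries of a directory follows; list_dotfiles is an illustrative name.

```shell
#!/bin/sh
# list_dotfiles is a hypothetical helper: print the names of hidden
# entries (files and directories starting with a dot) in a directory.
list_dotfiles() {
    # ls -A shows hidden entries but leaves out . and ..
    ls -A "$1" | grep '^\.'
}

list_dotfiles "$HOME" || echo "no dotfiles found"
```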
The Linux Kernel
Before any process can run, the kernel must be loaded into a protected area of memory. After that, the process with PID 1 (more often than not systemd nowadays) sets off the chain of processes, that is to say, one process starts other(s) and so on. Once the processes are active, the Linux kernel is in charge of allocating resources to them (keyboard, mouse, disks, memory, network interfaces, etc).
Note
|
Prior to |
Where Kernels are Stored: /boot
The kernel resides in /boot, together with other boot-related files. Most of these files include the kernel version number components in their names (kernel version, major revision, minor revision and patch number).
The /boot directory includes the following types of files, with names corresponding with the respective kernel version:
config-4.9.0-9-amd64
-
Configuration settings for the kernel such as options and modules that were compiled along with the kernel.
initrd.img-4.9.0-9-amd64
-
Initial RAM disk image that helps in the startup process by loading a temporary root filesystem into memory.
System-map-4.9.0-9-amd64
-
The System-map (on some systems it will be named System.map) file contains memory address locations for kernel symbol names. Each time a kernel is rebuilt the file’s contents will change as the memory locations could be different. The kernel uses this file to look up memory address locations for a particular kernel symbol, or vice versa.
vmlinuz-4.9.0-9-amd64
-
The kernel proper in a self-extracting, space-saving, compressed format (hence the z in vmlinuz; vm stands for virtual memory and started to be used when the kernel first got support for virtual memory).
grub
-
Configuration directory for the grub2 bootloader.
Tip
|
Because it is a critical feature of the operating system, more than one kernel and its associated files are kept in |
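The naming scheme of these files can be taken apart with ordinary shell tools. The sketch below splits a file name from the listing above into the four components named earlier. The function name and the output labels are illustrative only, and the code assumes the name-N.N.N-N-suffix pattern shown in this lesson.

```shell
#!/bin/sh
# split_kernel_name is a hypothetical helper. It expects a name such as
# vmlinuz-4.9.0-9-amd64 and prints its numeric components.
split_kernel_name() {
    release=${1#*-}          # drop the "vmlinuz-" prefix -> 4.9.0-9-amd64
    numbers=${release%%-*}   # keep text before the next dash -> 4.9.0
    rest=${release#*-}       # text after that dash -> 9-amd64
    patch=${rest%%-*}        # keep text before the following dash -> 9

    # Split 4.9.0 on the dots into $1, $2 and $3.
    old_ifs=$IFS
    IFS=.
    set -- $numbers
    IFS=$old_ifs

    printf 'kernel version: %s\n' "$1"
    printf 'major revision: %s\n' "$2"
    printf 'minor revision: %s\n' "$3"
    printf 'patch number:   %s\n' "$patch"
}

split_kernel_name vmlinuz-4.9.0-9-amd64
```

Running uname -r prints the release string of the kernel that is currently running, which can be compared with the names found in /boot.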
The /proc Directory
The /proc directory is one of the so-called virtual or pseudo filesystems since its contents are not written to disk, but loaded in memory. It is dynamically populated every time the computer boots up and constantly reflects the current state of the system. /proc includes information about:
-
Running processes
-
Kernel configuration
-
System hardware
Besides all the data concerning processes that we will see in the next lesson, this directory also stores files with information about the system’s hardware and the kernel’s configuration settings. Some of these files include:
/proc/cpuinfo
-
It stores information about the system’s CPU:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
stepping : 10
cpu MHz : 3696.000
cache size : 12288 KB
(...)
/proc/cmdline
-
It stores the strings passed to the kernel on boot:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-9-amd64 root=UUID=5216e1e4-ae0e-441f-b8f5-8061c0034c74 ro quiet
/proc/modules
-
It shows the list of modules loaded into the kernel:
$ cat /proc/modules
nls_utf8 16384 1 - Live 0xffffffffc0644000
isofs 40960 1 - Live 0xffffffffc0635000
udf 90112 0 - Live 0xffffffffc061e000
crc_itu_t 16384 1 udf, Live 0xffffffffc04be000
fuse 98304 3 - Live 0xffffffffc0605000
vboxsf 45056 0 - Live 0xffffffffc05f9000 (O)
joydev 20480 0 - Live 0xffffffffc056e000
vboxguest 327680 5 vboxsf, Live 0xffffffffc05a8000 (O)
hid_generic 16384 0 - Live 0xffffffffc0569000
(...)
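Because these are ordinary text files from the reader's point of view, the usual text tools work on them. A minimal sketch that summarizes two of the files above; the helper names are invented, and the CPU count relies on the "processor" lines shown in the sample output, which may look different on other architectures.

```shell
#!/bin/sh
# count_cpus is an illustrative helper: count the "processor" entries in
# a cpuinfo-style file.
count_cpus() {
    grep -c '^processor' "$1"
}

# count_modules counts the lines of a modules-style file, one per module.
count_modules() {
    wc -l < "$1"
}

if [ -r /proc/cpuinfo ]; then
    echo "logical CPUs: $(count_cpus /proc/cpuinfo)"
fi
if [ -r /proc/modules ]; then
    echo "loaded modules: $(count_modules /proc/modules)"
fi
```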
The /proc/sys Directory
This directory includes kernel configuration settings in files classified into categories per subdirectory:
$ ls /proc/sys
abi debug dev fs kernel net user vm
Many of these files act like a switch and therefore contain only one of two possible values: 0 (“off”) or 1 (“on”). For instance:
/proc/sys/net/ipv4/ip_forward
-
The value that enables or disables our machine to act as a router (be able to forward packets):
$ cat /proc/sys/net/ipv4/ip_forward
0
There are some exceptions, though:
/proc/sys/kernel/pid_max
-
The maximum PID allowed:
$ cat /proc/sys/kernel/pid_max
32768
Warning
|
Be extra careful when changing the kernel settings as the wrong value may result in an unstable system. |
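In the common case, each file under /proc/sys corresponds to a key understood by the sysctl command and by /etc/sysctl.conf: the path below /proc/sys with the slashes replaced by dots. The sketch below illustrates that mapping with an invented function name. It only reads values and changes nothing.

```shell
#!/bin/sh
# proc_to_key is a hypothetical helper: turn a path under /proc/sys into
# the dotted key form, e.g. net.ipv4.ip_forward.
proc_to_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

for file in /proc/sys/net/ipv4/ip_forward /proc/sys/kernel/pid_max; do
    if [ -r "$file" ]; then
        printf '%s = %s\n' "$(proc_to_key "$file")" "$(cat "$file")"
    fi
done
```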
Hardware Devices
Remember, in Linux “everything is a file”. This implies that hardware device information as well as the kernel’s own configuration settings are all stored in special files that reside in virtual directories.
The /dev Directory
The device directory /dev contains device files (or nodes) for all connected hardware devices. These device files are used as an interface between the devices and the processes using them. Each device file falls into one of two categories:
Block devices
-
Are those in which data is read and written in blocks which can be individually addressed. Examples include hard disks (and their partitions, like /dev/sda1), USB flash drives, CDs, DVDs, etc.
Character devices
-
Are those in which data is read and written sequentially one character at a time. Examples include keyboards, the text console (/dev/console), serial ports (such as /dev/ttyS0 and so on), etc.
When listing device files, make sure you use ls with the -l switch to differentiate between the two. We can, for instance, check for hard disks and partitions:
# ls -l /dev/sd*
brw-rw---- 1 root disk 8, 0 may 25 17:02 /dev/sda
brw-rw---- 1 root disk 8, 1 may 25 17:02 /dev/sda1
brw-rw---- 1 root disk 8, 2 may 25 17:02 /dev/sda2
(...)
Or for serial terminals (TeleTYpewriter):
# ls -l /dev/tty*
crw-rw-rw- 1 root tty 5, 0 may 25 17:26 /dev/tty
crw--w---- 1 root tty 4, 0 may 25 17:26 /dev/tty0
crw--w---- 1 root tty 4, 1 may 25 17:26 /dev/tty1
(...)
Notice how the first character is b for block devices and c for character devices.
Tip
|
The asterisk ( |
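In a script, the same distinction can be made without parsing the output of ls, because the shell's test command has dedicated checks for both device types. A minimal sketch, with device_type as an illustrative name:

```shell
#!/bin/sh
# device_type is a hypothetical helper that classifies a path using the
# shell's file tests: -b for block devices, -c for character devices.
device_type() {
    if [ -b "$1" ]; then
        echo "block"
    elif [ -c "$1" ]; then
        echo "character"
    else
        echo "not a device file"
    fi
}

for node in /dev/sda /dev/null /dev/tty; do
    printf '%s: %s\n' "$node" "$(device_type "$node")"
done
```

A path that does not exist on a given machine, such as /dev/sda on a system without that disk, is simply reported as not being a device file.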
Furthermore, /dev includes some special files which are quite useful for different programming purposes:
/dev/zero
-
It provides as many null characters as requested.
/dev/null
-
Aka bit bucket. It discards all information sent to it.
/dev/urandom
-
It generates pseudo-random numbers.
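The behaviour of these three files can be observed with a few harmless commands. The following is a small sketch; the variable names and byte counts are arbitrary choices for the example.

```shell
#!/bin/sh
# Read exactly 16 null bytes from /dev/zero and count them.
zero_count=$(head -c 16 /dev/zero | wc -c)
echo "bytes read from /dev/zero: $zero_count"

# Read 8 random bytes from /dev/urandom and print them as hexadecimal.
random_hex=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "random bytes: $random_hex"

# Anything written to /dev/null is discarded, so nothing is printed here.
echo "this text disappears" > /dev/null
```

Both /dev/zero and /dev/urandom never run out of data, which is why the reads are limited with head -c.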
The /sys Directory
The sys filesystem (sysfs) is mounted on /sys. It was introduced with the arrival of kernel 2.6 and meant a great improvement on /proc/sys.
Processes need to interact with the devices in /dev and so the kernel needs a directory which contains information about these hardware devices. This directory is /sys and its data is neatly arranged into categories. For instance, to check on the MAC address of your network card (enp0s3), you would cat the following file:
$ cat /sys/class/net/enp0s3/address
08:00:27:02:b2:74
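Interface names differ from machine to machine, so a script would usually loop over whatever is present instead of hard-coding enp0s3. A minimal sketch under that assumption; list_macs is an invented name, and the directory is passed as an argument only to keep the function easy to try out.

```shell
#!/bin/sh
# list_macs is a hypothetical helper. It walks a sysfs-style directory
# (normally /sys/class/net) and prints "interface address" pairs.
list_macs() {
    for iface in "$1"/*; do
        if [ -r "$iface/address" ]; then
            printf '%s %s\n' "${iface##*/}" "$(cat "$iface/address")"
        fi
    done
}

list_macs /sys/class/net
```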
Memory and Memory Types
Basically, for a program to start running, it has to be loaded into memory. By and large, when we speak of memory we refer to Random Access Memory (RAM) and — when compared to mechanical hard disks — it has the advantage of being a lot faster. On the down side, it is volatile (i.e., once the computer shuts down, the data is gone).
When it comes to memory, we can differentiate two main types in a Linux system:
- Physical memory
-
Also known as RAM, it comes in the form of chips made up of integrated circuits containing millions of transistors and capacitors. These, in turn, form memory cells (the basic building block of computer memory). Each of these cells has an associated hexadecimal code — a memory address — so that it can be referenced when needed.
- Swap
-
Also known as swap space, it is the portion of virtual memory that lives on the hard disk and is used when there is no more RAM available.
On the other hand, there is the concept of virtual memory, which is an abstraction of the total amount of usable, addressable memory (RAM, but also disk space) as seen by applications.
free parses /proc/meminfo and displays the amount of free and used memory in the system in a very clear manner:
$ free
              total        used        free      shared  buff/cache   available
Mem:        4050960     1474960     1482260       96900     1093740     2246372
Swap:       4192252           0     4192252
Let us explain the different columns:
total
-
Total amount of physical and swap memory installed.
used
-
Amount of physical and swap memory currently in use.
free
-
Amount of physical and swap memory currently not in use.
shared
-
Amount of physical memory used (mostly) by tmpfs.
buff/cache
-
Amount of physical memory currently in use by kernel buffers, the page cache and slabs.
available
-
Estimate of how much physical memory is available for new processes.
By default free shows values in kibibytes, but allows for a variety of switches to display its results in different units of measurement. Some of these options include:
-b
-
Bytes.
-m
-
Mebibytes.
-g
-
Gibibytes.
-h
-
Human-readable format.
-h is always comfortable to read:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           3,9G        1,4G        1,5G         75M        1,0G        2,2G
Swap:          4,0G          0B        4,0G
Note
|
A kibibyte (KiB) equals 1,024 bytes while a kilobyte (kB) equals 1,000 bytes. The same is respectively true for mebibytes, gibibytes, etc. |
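The difference between the two unit families can be made concrete with a little arithmetic on the total value reported by free earlier. The helper names below are invented for this sketch, and because shell arithmetic is integer-only, fractions are cut off.

```shell
#!/bin/sh
# Illustrative helpers; the names are invented for this sketch.
# Shell arithmetic is integer-only, so fractions are truncated.
kib_to_bytes() { echo $(( $1 * 1024 )); }
kib_to_mib()   { echo $(( $1 / 1024 )); }
kib_to_kb()    { echo $(( $1 * 1024 / 1000 )); }

total_kib=4050960   # the Mem "total" value from the free output above
echo "bytes: $(kib_to_bytes "$total_kib")"
echo "MiB:   $(kib_to_mib "$total_kib")"
echo "kB:    $(kib_to_kb "$total_kib")"
```

For this value, 4,050,960 KiB corresponds to 4,148,183,040 bytes, which is 3,956 MiB but 4,148,183 kB (both rounded down).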
Guided Exercises
-
Use the which command to find out the location of the following programs and complete the table:
Program | which command | Path to Executable (output) | User needs root privileges? |
---|---|---|---|
swapon | | | |
kill | | | |
cut | | | |
usermod | | | |
cron | | | |
ps | | | |
-
Where are the following files to be found?
File | /etc | ~ |
---|---|---|
.bashrc | | |
bash.bashrc | | |
passwd | | |
.profile | | |
resolv.conf | | |
sysctl.conf | | |
-
Explain the meaning of the number elements for kernel file vmlinuz-4.15.0-50-generic found in /boot:
Number Element | Meaning |
---|---|
4 | |
15 | |
0 | |
50 | |
-
What command would you use to list all hard drives and partitions in /dev?
Explorational Exercises
-
Device files for hard drives are represented based on the controllers they use. We saw /dev/sd* for drives using SCSI (Small Computer System Interface) and SATA (Serial Advanced Technology Attachment), but
-
How were old IDE (Integrated Drive Electronics) drives represented?
-
And modern NVMe (Non-Volatile Memory Express) drives?
-
-
Take a look at the file /proc/meminfo. Compare the contents of this file to the output of the command free and identify which key from /proc/meminfo corresponds to the following fields in the output of free:
free output | /proc/meminfo field |
---|---|
total | |
free | |
shared | |
buff/cache | |
available | |
Summary
In this lesson you have learned about the location of programs and their configuration files in a Linux system. Important facts to remember are:
-
Basically, programs are to be found across a three-level directory structure: /, /usr and /usr/local. Each of these levels may contain bin and sbin directories.
-
Configuration files are stored in /etc and ~.
-
Dotfiles are hidden files that start with a dot (.).
We have also discussed the Linux kernel. Important facts are:
-
For Linux, everything is a file.
-
The Linux kernel lives in /boot together with other boot-related files.
-
For processes to start executing, the kernel has to first be loaded into a protected area of memory.
-
The kernel’s job is that of allocating system resources to processes.
-
The /proc virtual (or pseudo) filesystem stores important kernel and system data in a volatile way.
Likewise, we have explored hardware devices and learned the following:
-
The /dev directory stores special files (aka nodes) for all connected hardware devices: block devices or character devices. The former transfer data in blocks; the latter, one character at a time.
-
The /dev directory also contains other special files such as /dev/zero, /dev/null or /dev/urandom.
-
The /sys directory stores information about hardware devices arranged into categories.
Finally, we touched upon memory. We learned:
-
A program runs when it is loaded into memory.
-
What RAM (Random Access Memory) is.
-
What Swap is.
-
How to display the use of memory.
Commands used in this lesson:
cat
-
Concatenate/print file content.
free
-
Display amount of free and used memory in the system.
ls
-
List directory contents.
which
-
Show location of program.
Answers to Guided Exercises
-
Use the which command to find out the location of the following programs and complete the table:
Program | which command | Path to Binary (output) | User needs root privileges? |
---|---|---|---|
swapon | which swapon | /sbin/swapon | Yes |
kill | which kill | /bin/kill | No |
cut | which cut | /usr/bin/cut | No |
usermod | which usermod | /usr/sbin/usermod | Yes |
cron | which cron | /usr/sbin/cron | Yes |
ps | which ps | /bin/ps | No |
-
Where are the following files to be found?
File | /etc | ~ |
---|---|---|
.bashrc | No | Yes |
bash.bashrc | Yes | No |
passwd | Yes | No |
.profile | No | Yes |
resolv.conf | Yes | No |
sysctl.conf | Yes | No |
-
Explain the meaning of the number elements for kernel file vmlinuz-4.15.0-50-generic found in /boot:
Number Element | Meaning |
---|---|
4 | Kernel version |
15 | Major revision |
0 | Minor revision |
50 | Patch number |
-
What command would you use to list all hard drives and partitions in /dev?
ls /dev/sd*
Answers to Explorational Exercises
-
Device files for hard drives are represented based on the controllers they use. We saw /dev/sd* for drives using SCSI (Small Computer System Interface) and SATA (Serial Advanced Technology Attachment), but
-
How were old IDE (Integrated Drive Electronics) drives represented?
/dev/hd*
-
And modern NVMe (Non-Volatile Memory Express) drives?
/dev/nvme*
-
-
Take a look at the file /proc/meminfo. Compare the contents of this file to the output of the command free and identify which key from /proc/meminfo corresponds to the following fields in the output of free:
free output | /proc/meminfo field |
---|---|
total | MemTotal / SwapTotal |
free | MemFree / SwapFree |
shared | Shmem |
buff/cache | Buffers, Cached and SReclaimable |
available | MemAvailable |