051.1 Lesson 1

Certificate:

Open Source Essentials

Version:

1.0

Topic:

051 Software Fundamentals

Objective:

051.1 Software Components

Lesson:

1 of 1

Introduction

Free and open source software — often abbreviated as FOSS — has become an integral part of our everyday lives, mostly without us being aware of it. FOSS, for example, can be found behind all of our activities on the internet in some form: on the computer where we view web pages in the browser, or on servers that store these web pages and deliver them as soon as we call them up.

What is Software?

Before we look at all the specifics of free and open source software, however, we need to clarify what software actually is. Let’s start with a very general description: Software is the non-physical, immaterial part of computers in any form. Software ensures that the physical parts (the hardware) of the computer interact and that the computer can accept commands and execute tasks.

A smartphone, a laptop, or a server in the data center is just a machine made of metal and plastic when it is switched off. As soon as it is turned on, software starts up, in the form of coded command sequences that control the individual components of this machine, that allow the user to interact with the machine, and that perform very specific tasks by invoking individual applications.

It is the work of software developers to analyze the tasks that the computer is supposed to perform and to specify them in a way that allows the computer to implement them. The tools used by the developers are as numerous and diverse as the tasks performed by the software.

Some software tasks are very closely connected to the hardware and the architecture of the computer, e.g., the addressing and management of memory or the handling of different processes. System programmers therefore work close to the hardware.

Application developers, on the other hand, focus more on the user and program applications that enable users to perform their tasks both efficiently and intuitively. An example of a complex application is a word processing program that provides all the functions for text formatting in menus or buttons and also displays the text as it might ultimately be printed.

In general, an algorithm is a way of solving a problem. For instance, to calculate an average, the normal algorithm is to add a collection of values and divide the sum by the total number of values. Although algorithms are traditionally designed and carried out by programmers, algorithms are also generated nowadays by artificial intelligence.
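The averaging algorithm just described can be sketched in a few lines of Python. This is only an illustration: the function name average and the sample values are chosen for this example.

```python
def average(values):
    """Return the arithmetic mean of a non-empty collection of numbers."""
    total = 0
    for value in values:          # step 1: add up all the values
        total += value
    return total / len(values)    # step 2: divide the sum by the count


print(average([4, 8, 15, 16, 23, 42]))  # prints 18.0
```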

The concepts in this chapter can help you understand the strengths and risks of FOSS, make informed choices, and even decide whether you want to be a software developer.

Programming Languages

Programming languages are highly structured, artificial languages that tell a computer what to do. Programs are usually written in text, but some languages are written in graphical form. Software developers write instructions (called code) to the computer in this artificial language. However, the computer hardware does not directly execute this code. Hardware can directly execute only a series of bit patterns stored in memory, called machine code or machine language. All programming languages are either converted into machine code by a compiler or interpreted by another machine code program called an interpreter to make the hardware execute these instructions.

Some of the most widely used programming languages currently include Python, JavaScript, C, C++, Java, C#, Swift, and PHP. Each of these programming languages has its own unique strengths and weaknesses, and the choice of language depends on the project and the needs of the developer. For example, Java is a popular choice for developing large-scale enterprise applications, while Python is often used for scientific computing and data analysis.

Developers have shown impressive creativity in designing programming languages. Originally, they were low-level languages that resembled the instructions in the computer. Languages have become more and more high-level, meaning that they try to represent powerful combinations of instructions in brief terms. Some languages reflect the way people naturally think, while preserving the rigor necessary to run correctly.

About 400 programming languages are currently recognized, although many are used only in very niche applications or legacy environments. Each was developed in order to solve certain tasks.

Characteristics, Syntax, and Structure of Programming Languages

The choice of programming language can have a significant impact on performance, scalability, and ease of development in a software project. These sections lay out important elements of languages.

Characteristics of Programming Languages

Some of the common characteristics and qualities of programming languages include:

Concurrency: Concurrency denotes handling multiple tasks simultaneously, either by running on different hardware processors or by alternating the tasks' use of a single processor. The degree of concurrency supported by a programming language can greatly affect its performance and scalability, especially for applications that require real-time processing or large amounts of data. Each separate piece of work might be called a process, a task, or a thread.
Memory management: Memory management is the allocation and freeing of memory in a program. Depending on the language or runtime environment, memory management can be done manually by the programmer or handled automatically. Proper memory management is crucial for ensuring that a program uses memory effectively, and that it does not run out of memory or cause other problems. If a program fails to free unused memory, it causes a memory leak that gradually increases memory use until performance suffers noticeably or the program crashes.
Shared memory: Shared memory is a type of interprocess communication mechanism that enables multiple processes to read and manipulate a common region of memory. Shared memory is common in hardware such as disk drives, and can also be an efficient way to share data between processes. But the mechanism requires careful synchronization and management to prevent data corruption. An error known as a race condition occurs if one process makes an unanticipated change to data while another process is using it.
Message passing: Message passing is a communication mechanism between processes that enables them to exchange data and coordinate their activities. This is commonly used in concurrent programming to achieve interprocess communication, and can be implemented through various mechanisms such as sockets, pipes, or message queues.
Garbage collection: Garbage collection is an automatic memory management technique used by some programming languages to reclaim memory that is no longer being used while a process is running. This can help prevent memory leaks and make it easier for developers to write correct and efficient code, but it can also introduce performance overhead and make control over the precise behavior of the program more difficult.
Data types: Data types determine what type of information can be represented in the program. The data types can be predefined in the language or user-defined, and can include integers, floating-point numbers (i.e., approximations of real numbers), strings, arrays, and others.
Input and output (I/O): Input and output are mechanisms for reading and writing data to and from a program. Input can come from a variety of sources, such as user clicks and keyboard input, a file, or a network connection, while output can be sent to a variety of destinations, such as a display, a file, or a network connection. I/O allows programs to interact with the outside world and exchange information with other systems.
Error handling: Error handling detects and responds to errors that occur during the execution of a program. This includes errors such as division by zero and a requested file that is not found. Error handling allows programs to continue running even when errors occur, improving their reliability and robustness.
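As a rough sketch of the concurrency and shared memory items above, the following Python program runs four threads that all update one shared variable. Threads are used here because they share memory within a single process by default. The names counter, counter_lock, and add_many are invented for this example. The lock provides the synchronization that prevents a race condition.

```python
import threading

counter = 0                      # data shared by all threads
counter_lock = threading.Lock()  # guards every change to `counter`


def add_many(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread at a time may update
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()                # wait until every thread has finished

print(counter)  # prints 40000
```

Without the lock, two threads could read the same old value and each write back that value plus one, so some increments might be lost.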

The concepts just listed are fundamental to understanding how programming languages work and how to write efficient and maintainable code.
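To make the error handling item more concrete, here is a minimal Python sketch covering the two errors mentioned in the list: division by zero and a file that is not found. The function names safe_divide and read_text, and the file name, are assumptions made up for this illustration.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, returning None instead of crashing on zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None


def read_text(path):
    """Return the contents of a file, or an empty string if it is missing."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        return ""


print(safe_divide(10, 4))   # prints 2.5
print(safe_divide(10, 0))   # prints None
# Hypothetical file name; an empty string comes back if it does not exist.
print(repr(read_text("no-such-file-for-this-example.txt")))
```

In both functions the program keeps running after the error, which is the point of error handling. Whether returning a default value is the right reaction depends on the application.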

Syntax of Programming Languages

The syntax of a programming language refers to rules for writing program statements and expressions. It is important for the syntax to be well-defined and consistent, so that the programmer can effectively write and understand their code. The following are the building blocks of most programming languages:

Procedures and functions: Procedures and functions are used to define reusable blocks of code that can be called multiple times.
Variables: Variables represent pieces of memory, and store data that can be manipulated and passed between procedures and functions.
Operators: Operators are keywords or symbols (such as + and -) that assign values to variables and perform arithmetic operations.
Control structure: Generally, program code is executed in the order that it is written, but conditional statements change the flow of execution. Which code is executed next is based on various conditions, such as the contents of memory, the state of the keyboard, packets arriving from the network, and so on. The loop statement, a special form of conditional statement, is useful for performing the same operations on a series of data sets. An exception, which invokes special code when an error occurs, is another control structure.

The syntax and behavior of these constructs can vary between programming languages, and the choice of language can have a big impact on the readability and maintainability of the code.
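The building blocks listed above might look as follows in Python. This is a sketch, not a prescribed style. The function name count_even and the sample list are invented for the example.

```python
def count_even(numbers):
    """Function: a reusable block that counts the even values in a list."""
    count = 0                      # variable, set with the = operator
    for number in numbers:         # loop over a series of data
        if number % 2 == 0:        # conditional statement
            count = count + 1      # arithmetic operator +
    return count


print(count_even([3, 4, 7, 10, 12]))  # prints 3
```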

Libraries

A good programming language should make it easy to develop programs and easy to reuse existing code. Many programming languages have a mechanism to organize procedures and functions into parts that can be reused in other programs.

A library is a collection of procedures and functions in support of a particular feature or goal, combined into a single file. The availability of many easy-to-use libraries is another very important requirement of a good programming language. For example, Python is widely recognized as a good language for developing AI-related programs because it has a number of libraries suitable for AI processing.

With the increasing size and complexity of programs, libraries as ready-made building blocks are becoming more and more important. This is especially true in the open source world, where people are comfortable taking code that others have created and reusing it. As a result, an ecosystem of libraries has developed for each programming language, and package managers such as Composer for PHP, pip for Python, and RubyGems for Ruby make it easy to install libraries.
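As a small illustration of reusing library code, the sketch below imports statistics, a library that ships with Python, instead of writing the calculations by hand. The variable name readings and its values are made up for the example. Third-party libraries are used in the same way once a package manager has installed them.

```python
import statistics  # a library that is part of Python's standard library

readings = [21.5, 22.0, 19.5, 23.0]   # invented sample data

print(statistics.mean(readings))    # prints 21.5
print(statistics.median(readings))  # prints 21.75
```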

Libraries are also important in compiled languages. Combining multiple binary files and pre-compiled libraries to obtain a single executable file is called linking, and the tool that performs this operation is called a linker. There are two types of linking: static linking, in which only the necessary library code is included in the final application’s executable file, and dynamic linking, in which a library installed in the system is shared by all applications that use that library. Currently, dynamic linking is the preferred approach and is characterized by smaller application executables and less memory usage at runtime.

Note that since the same libraries can be used by multiple programs, differences between versions of a library can be even more of an issue than with applications. Let’s digress for a moment and remember how to look at version numbers. A commonly used scheme is semantic versioning, which indicates versions by three numbers separated by dots. A typical version might be 2.39.16, which indicates a major version of 2 (a number that is likely to change only once every few years), a minor version of 39 within the major version (which may update every few months to contain important feature changes), and a fast-moving revision of 16, also called the patch version (which can change because of a single bug fix). Later versions and revisions have higher numbers.
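Because each part of a version is a number, versions should be compared part by part rather than as plain text. The following sketch shows one way to do that. The function name parse_version and the second version number 2.4.0 are assumptions for this example, and the sketch handles only versions made of exactly three numbers.

```python
def parse_version(text):
    """Turn a string such as '2.39.16' into the tuple (2, 39, 16)."""
    major, minor, revision = text.split(".")
    return int(major), int(minor), int(revision)


installed = parse_version("2.39.16")
required = parse_version("2.4.0")    # hypothetical minimum version

print(installed)              # prints (2, 39, 16)
print(installed >= required)  # prints True
print("2.39.16" >= "2.4.0")   # prints False: plain text comparison misleads
```

The last line shows why the conversion matters: compared as text, the character 3 sorts before 4, so minor version 39 wrongly appears older than minor version 4.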

A Very Simple Example

Let’s look at a very simple example of a computer program in the Python language to get a rough idea of a few of the elements mentioned.

Stated in natural language, the program is supposed to do the following: “Ask the user to enter a number and check whether this number is even or odd. Finally, output the result.”

And here is the code we can save in the file simpleprogram.py:

num = int(input("Enter a number: "))
if (num % 2) == 0:
   print("The given number is EVEN.")
else:
   print("The given number is ODD.")

Even in these few lines of code, we can find many of the characteristics and syntax elements mentioned above:

In line 1 we set the variable num and assign it a value with the = operator.
The assigned value corresponds to the input of the user (via the input() function). In addition, the int() function ensures that this input is converted to the integer data type, if possible. The expression that is passed to a function within parentheses is called a parameter or argument.
If the user enters something that is not a whole number, such as a string of letters or a decimal number like 5.73, int() cannot convert the text, and Python reports an error (a ValueError) as part of its error handling.
In the following lines, the control structure stating a condition, with the keywords if and else, controls what happens in each of the two possible cases (the number is either even or odd).
First, the modulo operator % computes the remainder of dividing the entered number by 2, and the if statement tests whether that remainder is 0. In this case, the number is even. The doubled == is the “is equal to” comparison operator, which is different from the assignment operator = in line 1.
In the other case (else), i.e. when the division by 2 leaves a remainder other than 0, the entered number must be odd.
In both cases, the function print() writes the result to the screen as text output.

And this is what it looks like when we run the program on the command line:

$ python simpleprogram.py
Enter a number: 5
The given number is ODD.
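One possible variation, shown here only as a sketch, moves the same logic into a function and adds explicit error handling for input that is not a whole number. The function name describe_parity and the wording of the error message are assumptions, not part of the original program.

```python
def describe_parity(text):
    """Return a message saying whether the number in `text` is even or odd."""
    try:
        num = int(text)
    except ValueError:             # raised when text is not a whole number
        return "Please enter a whole number."
    if num % 2 == 0:
        return "The given number is EVEN."
    return "The given number is ODD."


print(describe_parity("5"))    # prints The given number is ODD.
print(describe_parity("8"))    # prints The given number is EVEN.
print(describe_parity("abc"))  # prints Please enter a whole number.
```

Putting the logic in a function means it can be reused and tested with different values without typing at the prompt each time.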

When you consider how much language logic is already involved in this small example, you gain an idea of what complex software distributed over thousands of files is capable of; for example, operating systems such as Microsoft Windows, macOS, or Linux, which make all the hardware of a computer available and at the same time ensure that the users can install all other desired applications and use them for work or fun.

Machine Code, Assembly Language and Assemblers

As mentioned earlier, hardware can directly execute only a series of bit patterns called machine code. The CPU reads a bit pattern from memory in units of a word (8 to 64 bits) and executes the correlating instruction. Individual instructions are quite simple, for example, “copy the contents of memory location A to memory location B,” “multiply the contents of memory location C by the contents of memory location D,” or “read the data that has arrived at the device at address X.” In the era of 8-bit CPUs, some people could memorize all the bit patterns used in machine code and write programs directly. Nowadays, the number of instruction patterns has increased by an order of magnitude, and trying to remember all of these patterns is impractical.

Machine code is a sequence of bit patterns, or 0s and 1s, which is not at all intuitive to humans. To make programming more intuitive, assembly language was created, in which the instructions were given names and could be specified by strings. In assembly language, instructions that correspond one-to-one to the machine code are written one at a time. Instructions might look like:

move   [B], [A]       copy the contents of memory A to memory B
multi  R1, [C], [D]   multiply the contents of memory C by the contents of memory D
input  R1, [X]        read the data that has arrived at the device at address X

One instruction in assembly language corresponds to one instruction in machine code, which corresponds to the exact instruction that the hardware can understand and perform. Advantages of assembly language over machine language include:

Improved readability and maintainability: Assembly language is much easier to read and write than machine code. This makes it easier for programmers to understand, debug, and maintain their code.
Address computation automation: Machine code programming can also use the concept of variables and functions, but everything must be expressed in terms of memory addresses. Assembly language lets the programmer assign names to memory addresses, which makes it easier to express the logic of a program.

Because assembly language has access to all of the hardware’s functionality, it is commonly used in the following situations:

Architecture-dependent parts of the operating system: Dedicated instructions that are specific to one CPU architecture, such as those for initialization and security features, can be used only from assembly language.
Developing low-level system components: Assembly language is used to develop system components that need to interact directly with the computer’s hardware, such as device drivers, firmware, and the basic input/output system (BIOS). In particular, high-speed devices that require pushing hardware performance to its limits often need drivers and firmware programmed in assembly language.
Programming microcontrollers: Assembly language is also used to program microcontrollers, which are small, low-powered computers used in a wide range of embedded systems from toys to industrial control. Some microcontrollers have memory capacities of just several hundred bytes, and are commonly programmed in assembly language.

Before it can be executed, assembly language code is converted to machine code by an application called an assembler. The assembler is the oldest programming tool and has brought a number of advantages that were unthinkable in machine code programming. To confuse matters, sometimes people refer to assembly language as assembler.
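The core job of an assembler, translating each named instruction into its numeric form, can be imitated with a toy sketch in Python. Everything in it is invented for illustration: the opcode numbers, the function name assemble, and the use of plain numbers as operands. Real assemblers and real instruction sets are far more involved.

```python
# Hypothetical one-byte opcodes, invented for this sketch only.
OPCODES = {"move": 0x01, "multi": 0x02, "input": 0x03}


def assemble(line):
    """Translate one line of toy assembly into a list of byte values."""
    mnemonic, *operands = line.replace(",", " ").split()
    encoded = [OPCODES[mnemonic]]          # one instruction, one opcode
    for operand in operands:
        encoded.append(int(operand))       # operands are plain numbers here
    return encoded


program = ["move 16, 32", "multi 1, 48, 64", "input 1, 80"]
machine_code = [byte for line in program for byte in assemble(line)]

print(machine_code)
# prints [1, 16, 32, 2, 1, 48, 64, 3, 1, 80]
```

Each line of the toy program produces exactly one opcode followed by its operands, mirroring the one-to-one relationship between assembly instructions and machine code instructions.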

Machine code and assembly language differ from one hardware processor to another. They are called “low-level languages” because they operate directly on hardware. However, the concepts of computation and input/output are the same across all processors. If the common concepts can be expressed in a way that is easier for humans to understand, programming efficiency can be dramatically improved. This is where “high-level languages” come in.

Compiled Languages

Compiled languages are programming languages that are translated either into machine code or into an intermediate format called bytecode. Bytecode is executed on the target computer by a runtime virtual machine. The virtual machine translates the bytecode into the proper machine code for each computer. Bytecode allows programs to be platform-agnostic and to run on any system with a compatible virtual machine.

The translation of the source code written in a high-level programming language into machine code or bytecode is done by a compiler. Examples of compiled languages that produce machine code directly include C and C++. Languages that produce bytecode include Java and C#.

The choice between machine code and bytecode depends on the requirements of the project, such as performance, platform agnosticism, and ease of development.
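To get a feeling for what bytecode looks like, Python can serve as a convenient peephole. Although this lesson groups Python with the interpreted languages, its most widely used implementation, CPython, internally compiles source code into bytecode for its own virtual machine. The sketch below uses the built-in compile() function and the standard dis module. The expression and the variable names price and quantity are made up for the example.

```python
import dis

source = "price * quantity"

# compile() turns source text into a code object that holds bytecode.
code = compile(source, "<example>", "eval")

print(type(code.co_code))   # the raw bytecode is a sequence of bytes
dis.dis(code)               # human-readable listing of the bytecode

# The Python virtual machine then executes the bytecode.
print(eval(code, {"price": 3, "quantity": 4}))  # prints 12
```

The exact instructions listed by dis.dis() differ between Python versions, which is why no listing is reproduced here.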

Interpreted Languages

Interpreted languages are programming languages that are executed by an interpreter, rather than being compiled into machine code. The interpreter reads the source code and executes the statements contained in it. An interpreter is able to directly process the source code, without transforming it into another file format. Thus, unlike a compiler, which translates the entire program into machine code before the program is executed, an interpreter reads each line of code and executes it immediately, allowing the programmer to see the results of each line as it is executed.
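The read-a-line, execute-it-immediately cycle can be sketched with a toy interpreter written in Python. The three-command language (set, add, show) and the function name run are invented for this illustration and do not correspond to any real language.

```python
def run(script):
    """Interpret a tiny made-up language one line at a time."""
    variables = {}
    output = []
    for line in script.splitlines():
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        command, name = parts[0], parts[1]
        if command == "set":              # set x 5
            variables[name] = int(parts[2])
        elif command == "add":            # add x 3
            variables[name] += int(parts[2])
        elif command == "show":           # show x
            output.append(variables[name])
            print(variables[name])
        else:
            raise ValueError(f"unknown command: {command}")
    return output


run("set x 5\nadd x 3\nshow x\nadd x 10\nshow x")
# prints 8, then 18
```

Each line takes effect as soon as it is read. No separate translated file is produced.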

Interpreted languages are commonly used for scripting, which refers to short programs that automate tasks, for command-line interfaces, and for batch and job control. Scripts written in interpreted languages can be easily modified and executed without the need for recompilation, making them well suited for tasks that require rapid prototyping or flexible and rapid iteration. With this convenience come some potential drawbacks. For instance, an interpreted program usually runs more slowly than its equivalent compiled program, and the source code is accessible to anyone in possession of the script.

Examples of interpreted languages include Python, Ruby, and JavaScript. Python is widely used in scientific computing, data analysis, and machine learning, while Ruby is often used in web development and for creating automation scripts. JavaScript is a client-side scripting language embedded in web browsers to create dynamic and interactive web pages.

Data-Oriented Languages

Data-oriented languages are programming languages that are optimized for processing and manipulating large amounts of data. They are designed to efficiently handle large sets of structured or unstructured data and provide a set of tools for working with databases, data structures, and algorithms for data processing and analysis.

Data-oriented languages are used in a variety of applications, including data science, big data analytics, machine learning, and database programming. They are well suited for tasks that involve processing and analyzing large amounts of data, such as data cleaning and transformation, data visualization, and statistical modeling.

Examples of data-oriented languages include SQL (short for Structured Query Language), R, and MATLAB. SQL is a standard language used for managing relational databases and is widely used in business and industry. R is a programming language and environment for statistical computing and graphics and is widely used in data science and machine learning. MATLAB is a numerical computing environment and programming language used in a wide range of applications, including signal processing, image processing, and computational finance.
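As a brief taste of SQL, the sketch below runs a few statements through sqlite3, a small database library included with Python. The table name sales, its columns, and the figures are all invented for the example.

```python
import sqlite3

# An in-memory database that disappears when the program ends.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
connection.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120), ("south", 80), ("north", 200), ("south", 40)],
)

# The SQL statement says what result is wanted, not how to compute it.
rows = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(rows)  # prints [('north', 320), ('south', 120)]
connection.close()
```

The SELECT statement groups and sums the rows in a single line. Written in a general-purpose language, the same task would need an explicit loop and a data structure to hold the running totals.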

Programming Paradigms

Besides the specifics of programming languages, programming paradigms determine the particular solution approach. We can think of a paradigm as a basic strategy with which we approach a task, depending on the specific requirements and conditions.

A comparable example is the construction of a house: Whether masons erect the walls brick by brick or ready-cast concrete components are assembled on site is a fundamental decision that depends on requirements and circumstances. What features do you want the house to have? Where is it located? Is it connected to other houses?

In a similar way, paradigms set the direction of programming: whether and in what way, for example, a software project is broken into smaller, separate parts. Each programming language is suited best to some particular paradigm. Therefore, the choice of paradigm is closely related to the choice of programming language.

The following paradigms are common in programming:

Object-oriented programming (OOP): OOP is based on the concept of objects, which are instances of classes that encapsulate data and behavior. For instance, a language might offer a rectangle as a class to help the programmer display a box on the screen.

OOP focuses on the object-level manipulation of data. OOP makes it easier to write code that is maintainable, reusable, and extensible, and is widely used in desktop software, video games, and web applications. Examples of object-oriented programming languages include Java, C#, and Python.
Procedural programming: Procedural programming performs tasks through procedures, or blocks of code that can be executed in a specific order. This makes it easy to write structured code that is easy to follow, but can lead to code that is less flexible and harder to maintain as the size and complexity of the project grows. Examples of procedural programming languages include C, Pascal, and Fortran.
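The difference between the two paradigms can be sketched by solving the same small task both ways in Python. The names rectangle_area, Rectangle, and Square are chosen for this example only.

```python
# Procedural style: data is passed to a stand-alone procedure.
def rectangle_area(width, height):
    return width * height


# Object-oriented style: the class bundles data with behavior.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """A square reuses everything a rectangle already knows."""

    def __init__(self, side):
        super().__init__(side, side)


print(rectangle_area(3, 4))     # prints 12
print(Rectangle(3, 4).area())   # prints 12
print(Square(5).area())         # prints 25
```

The Square class illustrates the extensibility mentioned above: it adds a new kind of object without changing or copying the code of Rectangle.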

Other approaches to software development are in use today, for which some languages are better suited than others. Additionally, drag-and-drop interfaces allow non-programmers to write programs, and many online services have recently started to generate code through artificial intelligence when given plain-language instructions.

In conclusion, each programming paradigm has its own strengths and weaknesses, and the choice of paradigm often depends on the needs of the project, the experience and preferences of the developer, and the constraints of the platform and development environment. Understanding the different types of paradigms can help you choose the right paradigm for your needs, and can also help you write better and more efficient code.

Guided Exercises

What is the purpose of functions?
What is the advantage of bytecode over a machine code file?
What is the advantage of a machine code file over bytecode?

Explorational Exercises

What are some disadvantages of dividing a program into a large number of processes or tasks?
You have found several open source packages, offered in different versions, that provide features you need for your program. What are some criteria for choosing a package?
In addition to OOP and procedural development paradigms, what other software development approaches exist, and which programming language best supports each approach?

Summary

In this lesson you have learned what software is and how it is developed with the help of programming languages. The numerous programming languages differ not only in their syntax, but also, for example, in their management of hardware resources or handling of data structures.

Programming languages also differ in how the source code, readable by humans, is converted by an interpreter or compiler into the final machine code to be processed by the computer.

Programming paradigms determine the strategy of software projects and thus also the choice of suitable programming languages, depending on the requirements and the size of the respective project.

Answers to Guided Exercises

What is the purpose of functions?

Functions encapsulate certain common activities, such as outputting a string. By making a function, you can allow your program and other programs to perform the function conveniently and repeatedly without having to write their own code for it.
What is the advantage of bytecode over a machine code file?

The bytecode file can run on many different computers, where a virtual machine turns the code into machine code. For instance, a compiled Java program can run on any type of computer that has a Java virtual machine installed.
What is the advantage of a machine code file over bytecode?

Machine code is executed directly by the hardware, so it generally runs faster. Bytecode usually runs more slowly because the virtual machine must turn it into machine code while running the bytecode.

Answers to Explorational Exercises

What are some disadvantages of dividing a program into a large number of processes or tasks?

When a program is divided into processes, they must communicate with each other. If they work on a lot of data in common, the processes could expend a lot of overhead in data exchange and in protecting data from multiple, simultaneous changes (race conditions). Processes also incur overhead when they start up and terminate. The more processes you have, the more complex the program and its interactions become, so errors can be harder to find.

The functional paradigm tends to make it easier to divide programs into many processes, because of immutability. Immutable data does not suffer from race conditions.
You have found several open source packages, offered in different versions, that provide features you need for your program. What are some criteria for choosing a package?

Check bug reports and security advisories for the packages, because some are very buggy and even unsafe. Sometimes the latest version is not the best, because a security flaw might have been introduced into it.

Peruse the forum where developers talk about the package, to see that it is actively maintained. Your program will probably be in use for a long time, and you want the package to be available and robust over time as well.

Try different packages to check their performance, as well as their correctness.

Most packages depend on functions found in other packages (dependencies), so a weakness in one of the dependencies can affect your program.
In addition to OOP and procedural development paradigms, what other software development approaches exist, and which programming language best supports each approach?

In addition to OOP and procedural development paradigms, other types include the following.

Functional programming emphasizes the use of functions and mathematical concepts, such as lambdas and closures, to write code that is based on the evaluation of expressions rather than the execution of statements. Functional programming treats functions as first-class citizens — so that they can be manipulated by the program — and emphasizes immutability, or working with variables that cannot change after being initially set. This makes it easier to reason about and test code, as well as to write concurrent and parallel applications. Examples of functional programming languages include Erlang, Haskell, Lisp, and Scheme.
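Although Python is not one of the functional languages named above, it supports enough of the style to sketch the ideas of lambdas, functions as values, and immutability. The names prices, double, and is_large are invented for this example.

```python
from functools import reduce

prices = (12, 7, 30, 5)   # a tuple is immutable: it cannot be changed

# Functions are values: they can be stored in variables and passed around.
double = lambda price: price * 2
is_large = lambda price: price >= 20

doubled = tuple(map(double, prices))          # new tuple, original untouched
large = tuple(filter(is_large, doubled))
total = reduce(lambda a, b: a + b, large, 0)

print(doubled)  # prints (24, 14, 60, 10)
print(large)    # prints (24, 60)
print(total)    # prints 84
print(prices)   # prints (12, 7, 30, 5)
```

Every step builds a new value from an old one instead of modifying data in place, so the original tuple is still intact at the end.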

Imperative languages focus on the statements required to control the flow of the program’s transitions from and to various states.

Declarative languages describe what result is wanted and the logic behind it, rather than the individual steps needed to reach it. The order in which the instructions are carried out isn’t specified. These instructions, function calls, and other statements can be reordered and optimized by compilers so long as they preserve the underlying logic.

Natural programming is a programming paradigm that uses natural language or other human-friendly representations to describe the desired behavior of a program. The idea is to make programming accessible to people who may not have formal training in computer science. Examples of natural programming languages include Scratch and Alice.