056.2 Lesson 1
Certificate: | Open Source Essentials |
---|---|
Version: | 1.0 |
Topic: | 056 Collaboration and Communication |
Objective: | 056.2 Source Code Management |
Lesson: | 1 of 1 |
Introduction
Anyone who has ever edited a text document in a team knows the problems with such collaboration: Which version is the current one? Where is this version saved? Is it currently being edited by someone? Who has made which comments or changes to the text — when and why? The result is often differing versions of the document and, in the worst case, a collection of versions that nobody has a grasp of.
Now imagine a software project with hundreds of files, on which developers from all over the world are working by developing new features, fixing bugs, splitting off parts and developing them separately, and so on. Such a development process is no longer manageable without suitable tools.
Special software for source code management (SCM), also known as version control systems (VCS) or revision control systems (RCS), addresses the problems just outlined.
In the world of software development, SCM stands as a fundamental pillar, safeguarding the integrity of your codebase. Picture it as a diligent guardian, meticulously tracking every tweak and turn made to your source code over time.
Source Code Management System and Repository
The source code management system is the heart of a software project. Although important work goes on in discussion forums and other places, the SCM system represents both the history of the project and its current state as a living entity.
The source code repository is a kind of digital workshop for a project. Just as a physical workshop stores all the tools and items needed to manufacture a workpiece, a source code repository stores all the files, documents, and code related to a software project. It provides a structured environment for organizing and managing the project’s assets.
Information is normally stored in the form of a directory tree. SCM systems typically use a client-server model, where any user can pull data from the repository as well as push data into it. The system keeps track of who made each change and when it was made, ensuring transparency and accountability within the team.
Imagine you’re working on a project with multiple developers, and a bug is discovered in the code. With an SCM, you can easily examine the changes between versions to pinpoint the specific code change responsible for the bug.
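Comparing two versions can be sketched with Python's standard `difflib` module. This is only an illustration of the idea, not how any particular SCM system computes its diffs; the file name `cart.py` and its contents are invented for the example.

```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old_version = [
    "def total(prices):",
    "    return sum(prices)",
]
new_version = [
    "def total(prices):",
    "    return sum(prices) + 1",
]

# unified_diff yields header lines, then context, removed (-) and added (+) lines.
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="cart.py (version 1)",
        tofile="cart.py (version 2)",
        lineterm="",
    )
)

# Keep only the lines that actually changed, skipping the two file headers.
changed = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("\n".join(diff_lines))
```

The `changed` list then contains just the removed and the added line, which is the information a developer needs to spot the change that introduced the bug.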
Because the system remembers each version of the files as they change, a user has access to any of these versions and can revert to earlier versions (in case incorrect changes were made). So the repository is also a kind of archive, granting access to every change ever made to the project, and to the status of the project at any moment in its history.
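The archive idea can be sketched in a few lines of Python. `FileHistory` is a hypothetical helper for a single file, assuming that reverting simply saves an old version again as the newest one so that nothing is lost from the history.

```python
class FileHistory:
    """Keeps every saved version of one file (hypothetical helper)."""

    def __init__(self):
        self._versions = []

    def save(self, content):
        """Store a new version and return its 1-based version number."""
        self._versions.append(content)
        return len(self._versions)

    def get(self, version):
        """Return the content as it was at the given version number."""
        return self._versions[version - 1]

    def revert_to(self, version):
        """Make an old version current again by saving it as a new version."""
        return self.save(self.get(version))


history = FileHistory()
v1 = history.save("greeting = 'hello'\n")
v2 = history.save("greeting = 'helo'\n")  # an incorrect change
v3 = history.revert_to(v1)                # undo it without losing history

print(v3, history.get(v3))
```

Version 2, the faulty one, is still available after the revert, which mirrors the archive role described above.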
Many SCM systems save space by storing only the changes (deltas) between versions of a file instead of a complete copy each time a change is made. Other systems store complete snapshots and compress them afterwards. Either way, historical versions remain accessible without consuming excessive storage.
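A minimal sketch of change-based storage follows, assuming a file is a list of lines and using Python's standard `difflib.SequenceMatcher` to find the unchanged parts. The functions `make_delta` and `apply_delta` are invented for this illustration; real systems use their own, more compact formats.

```python
import difflib


def make_delta(old, new):
    """Describe `new` as copy/insert operations relative to `old`."""
    delta = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse lines old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # store only the new lines
        # "delete" needs no entry: those lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from the old version plus the delta."""
    result = []
    for operation in delta:
        if operation[0] == "copy":
            result.extend(old[operation[1]:operation[2]])
        else:
            result.extend(operation[1])
    return result


old = [f"line {n}" for n in range(1, 101)]
new = list(old)
new[49] = "line 50 (fixed)"  # change a single line out of 100

delta = make_delta(old, new)
stored_lines = sum(len(op[1]) for op in delta if op[0] == "insert")
print(stored_lines, apply_delta(old, delta) == new)
```

In this example only one line has to be stored for the second version, yet the complete file can be rebuilt on demand.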
Many tools, such as continuous integration/continuous delivery (CI/CD) and testing, revolve around the SCM system. People also build their reputations through the system by making their contributions visible to everyone. Therefore, source code management is not just about tracking changes; it’s about preserving the integrity of your codebase and fostering collaboration among developers, ensuring that your projects can evolve in a dynamic and ever-changing landscape.
Popular SCM systems include Git, Subversion, and CVS. Like many other software functions, SCM is now often provided by specialized vendors. In other words, it is Software as a Service (SaaS) or a “cloud” service: Participants have the software on their local systems to manage their personal changes, but upload their changes to a central repository that benefits from typical cloud features such as 24/7 uptime, backups, and secure access.
Millions of developers now use cloud services, notably GitHub. GitLab is an alternative based on open source code. Both GitHub and GitLab allow people to work in the vendor’s cloud repositories or set up a local version of the SCM system. One advantage of working on those systems is the reputation one can gain through their “star” systems, which allow users to rate each other’s work. The cloud services add extra attractions such as change request trackers, rating systems, wikis, and discussion forums.
Repositories can contain both personal and corporate projects, meaning they are not necessarily exclusive to enterprise use. Any developer who wants to start a development project or work on an open source project, copying and modifying it according to their needs, can do so. In all these cases, it is important to have firm control over who can access your repository.
Understanding the terminology surrounding version control and source code management is essential in navigating its use. In the following sections, we take a closer look at some of these concepts and terms.
Commits, Tags, and Branches
A developer creates a commit each time a change is uploaded into the repository. Commits represent snapshots of changes made to the codebase at a particular point in time. Each commit includes metadata such as the author’s name, a timestamp, and a descriptive message explaining the changes. Commits help developers track the evolution of the codebase and understand the history of specific changes.
Consider a team of developers working on a web application. Each time they make changes to the codebase, they create a commit to document those changes. For instance, someone who adds a new feature to the application might create a commit with a message like “Added user authentication feature.” This commit captures the state of the codebase after the feature was implemented.
When a developer fixes a bug, the commit message usually refers to the number of the bug in the project’s bug tracking system.
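The pieces of a commit can be modeled roughly as follows. The `Commit` class, the way the id is derived, and the `#1234` style of bug reference are assumptions made for this sketch; each SCM system and each project defines its own formats and conventions.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Commit:
    author: str
    timestamp: datetime
    message: str
    files: dict             # snapshot: file name -> file content
    parent: Optional[str]   # id of the preceding commit, None for the first

    @property
    def commit_id(self) -> str:
        """Derive an id from the commit's content (a simplification)."""
        payload = repr((
            self.author,
            self.timestamp.isoformat(),
            self.message,
            sorted(self.files.items()),
            self.parent,
        ))
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def referenced_bugs(message):
    """Find bug numbers written as '#123' (a hypothetical convention)."""
    return [int(number) for number in re.findall(r"#(\d+)", message)]


first = Commit(
    author="Alice <alice@example.org>",
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    message="Added user authentication feature",
    files={"auth.py": "def login(): ...\n"},
    parent=None,
)
second = Commit(
    author="Bob <bob@example.org>",
    timestamp=datetime(2024, 1, 16, 14, 0, tzinfo=timezone.utc),
    message="Fix login timeout (bug #1234)",
    files={"auth.py": "def login(timeout=30): ...\n"},
    parent=first.commit_id,
)

print(second.commit_id[:8], referenced_bugs(second.message))
```

Because every commit records its parent, the chain of commits forms the project history that the text describes.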
Tags are named references to specific commits. They typically mark significant points in the project’s history, such as releases or milestones. Tags provide a way to label and refer to important versions of the codebase, making it easier to manage and navigate the project’s history.
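Conceptually, a tag is little more than a name that points at one commit. The sketch below assumes a linear history and uses invented commit ids and tag names.

```python
# Hypothetical commit ids, oldest first.
commits = ["a1f3", "b27c", "c9e0", "d4b8"]

# A tag maps a readable name to exactly one commit id.
tags = {}


def create_tag(name, commit_id):
    """Attach a name to an existing commit; existing tags are not moved."""
    if commit_id not in commits:
        raise ValueError(f"unknown commit: {commit_id}")
    if name in tags:
        raise ValueError(f"tag already exists: {name}")
    tags[name] = commit_id


def commits_since(tag_name):
    """List the commits made after the tagged one."""
    position = commits.index(tags[tag_name])
    return commits[position + 1:]


create_tag("v1.0", "b27c")
create_tag("v1.1", "d4b8")

print(commits_since("v1.0"))
```

Looking up `v1.0` always leads back to the same commit, which is what makes tags useful for marking releases.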
Branches come into play when developers need to work on different tasks concurrently. Branches are independent lines of development that diverge from the main codebase. They allow developers to work on features or fixes in isolation without affecting the main code until they are ready for integration. Branches help organize development efforts and facilitate collaboration among team members.
For example, if one developer is working on adding a new feature to the application while another is fixing a bug, they can each create separate branches to isolate their changes. Once their work is complete, they can merge their branches back into the main codebase (Branches).
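The scenario above can be sketched by treating each branch as a name that points at its newest commit. The `Repo` class and its method names are hypothetical and ignore file contents entirely; the sketch only shows how the histories diverge and come back together.

```python
class Repo:
    """Tracks only commit ids and their parents (hypothetical model)."""

    def __init__(self):
        self.parents = {}                # commit id -> list of parent ids
        self.branches = {"main": None}   # branch name -> newest commit id
        self._counter = 0

    def _new_commit(self, parents):
        self._counter += 1
        commit_id = f"c{self._counter}"
        self.parents[commit_id] = parents
        return commit_id

    def commit(self, branch):
        """Add a commit on top of the branch and move the branch forward."""
        tip = self.branches[branch]
        commit_id = self._new_commit([tip] if tip else [])
        self.branches[branch] = commit_id
        return commit_id

    def create_branch(self, name, start):
        """A new branch starts at the current tip of an existing branch."""
        self.branches[name] = self.branches[start]

    def merge(self, source, target):
        """Create a merge commit on target that has two parents."""
        commit_id = self._new_commit(
            [self.branches[target], self.branches[source]]
        )
        self.branches[target] = commit_id
        return commit_id

    def history(self, branch):
        """Return all commit ids reachable from the branch tip."""
        seen, stack = set(), [self.branches[branch]]
        while stack:
            commit_id = stack.pop()
            if commit_id and commit_id not in seen:
                seen.add(commit_id)
                stack.extend(self.parents[commit_id])
        return seen


repo = Repo()
repo.commit("main")                      # c1
repo.create_branch("feature", "main")
repo.create_branch("bugfix", "main")
feature_commit = repo.commit("feature")  # c2
bugfix_commit = repo.commit("bugfix")    # c3
before_merge = repo.history("main")      # main is still unaffected

repo.merge("feature", "main")            # c4
repo.merge("bugfix", "main")             # c5
after_merge = repo.history("main")

print(sorted(before_merge), sorted(after_merge))
```

Until the merges happen, the main branch contains neither the feature commit nor the bugfix commit; afterwards it contains both.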
When multiple people work on the same file, or check files into another branch, two people will sometimes change the same line of a file. When the second person tries to check in their change, the system warns of a conflict.
Some VCSs simply don’t let a developer check in a file if it conflicts with the current version in the repository. The developer must check out the current version, figure out how to resolve the conflict, and then check in a new version.
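How a system might notice such a conflict can be sketched by comparing both edited versions with their common starting point. This is a deliberately simplified model that assumes lines are only modified in place, never added or removed; `find_conflicts` and the sample file are invented for the illustration.

```python
def find_conflicts(base, mine, theirs):
    """Return 1-based numbers of lines that both sides changed differently.

    Simplifying assumption: all three versions have the same number of lines.
    """
    conflicts = []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        both_changed = m != b and t != b
        if both_changed and m != t:
            conflicts.append(number)
    return conflicts


base = ["title = Shop", "items_per_page = 10", "theme = light"]
mine = ["title = Shop", "items_per_page = 20", "theme = light"]
theirs = ["title = Store", "items_per_page = 50", "theme = light"]

print(find_conflicts(base, mine, theirs))
```

Only the second line is reported: the first line was changed by one person alone, so it can be combined automatically, while the second was changed by both in different ways.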
Cloud-based systems offer merge or pull requests. These are created by a developer who has worked on a local system or branch and believes their changes are suitable for inclusion in another branch. The developers for the target branch decide whether to accept the request.
Subrepositories
It often happens that the code of another, independent project (e.g., a media player) is required for the development of a software project (e.g., a complex website). Instead of copying the code of the media player in part or in full into your own project, many VCSs offer the feature of subrepositories, also known as submodules.
In our example, the repository of the media player is integrated into the repository of the website as a subrepository. It then appears as a separate directory in the directory tree. This means that the code base of the media player is fully available, but remains independent. If required, it can even be updated from the original repository of the media player.
This capability proves valuable for handling intricate projects with numerous dependencies or integrating third-party libraries and frameworks into a codebase. Submodules enhance project organization and facilitate collaboration by allowing developers to work with interconnected codebases efficiently.
General Use of a Source Control Management System
Each participant in a project starts by creating an identity, normally tied to the participant’s unique email address. A cloud-based system manages identities through accounts, as social media sites and other organizations do.
New developers typically go through the following sequence when using an SCM system:
-
Install the SCM software, if it is not already provided on their operating system.
-
Get the entire project onto the local system one time, also known as cloning.
-
From this step onward, work locally (in one or more branches, if necessary), and push changes to the repository.
-
Pull the recent version from the repository before the next working session.
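The steps above can be sketched with a toy model in which a repository is just a list of commit messages. The `Repository` class and its `clone`, `commit`, `push`, and `pull` methods are hypothetical and only mirror the vocabulary of the list; they are not the interface of any real SCM tool.

```python
class Repository:
    """A list of commit messages stands in for the full project history."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def clone(self):
        """Copy the entire history to a new, independent repository."""
        return Repository(self.commits)

    def commit(self, message):
        """Record a change locally; the remote is not touched."""
        self.commits.append(message)

    def push(self, remote):
        """Upload local commits that the remote does not have yet."""
        if self.commits[:len(remote.commits)] != remote.commits:
            raise RuntimeError("remote has new commits; pull first")
        remote.commits = list(self.commits)

    def pull(self, remote):
        """Download commits that others pushed since the last sync."""
        if remote.commits[:len(self.commits)] != self.commits:
            raise RuntimeError("local and remote histories have diverged")
        self.commits = list(remote.commits)


central = Repository(["Initial commit"])

alice = central.clone()                  # get the whole project once
bob = central.clone()

alice.commit("Add order tracking page")  # work locally
alice.push(central)                      # share the result

bob.pull(central)                        # catch up before the next session
print(bob.commits)
```

The sketch refuses to push or pull when the histories have diverged, which is the point where a real system would ask the developer to merge.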
Project managers decide whom to trust and give access to the repository. Senior developers have the important responsibility of deciding when changes submitted by other contributors are ready to go into the main branch of the repository.
On cloud-based systems, a project owner can control the accessibility of a repository by setting its visibility to either public or private. Public repositories grant read access to anyone on the internet. Private repositories, however, limit access solely to the owner, individuals with whom the owner has explicitly shared access, and, in the case of organization repositories, specific members of the organization.
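The visibility rule can be expressed as a small check. `HostedRepo` and `can_read` are hypothetical names for this sketch; actual hosting services implement finer-grained roles and permissions.

```python
from dataclasses import dataclass, field


@dataclass
class HostedRepo:
    owner: str
    visibility: str = "private"                        # "public" or "private"
    collaborators: set = field(default_factory=set)    # explicitly invited users
    org_members: set = field(default_factory=set)      # organization repositories only


def can_read(repo, user):
    """Decide read access; `user` is None for an anonymous visitor."""
    if repo.visibility == "public":
        return True
    return (
        user == repo.owner
        or user in repo.collaborators
        or user in repo.org_members
    )


shop = HostedRepo(owner="alice", collaborators={"bob"})
docs = HostedRepo(owner="alice", visibility="public")

print(can_read(shop, "bob"), can_read(shop, "mallory"), can_read(docs, None))
```

Here the private repository is readable by its owner and the invited collaborator only, while the public one is readable even without an account.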
Imagine a team of developers working on an e-commerce website. They decide to add a new feature that allows customers to track their orders. To implement this feature, they create a feature branch with a name such as order-tracking, where they can work on the necessary code changes without affecting the main codebase. Once the feature is complete and tested, they merge this branch into the main development branch for further integration and testing.
The main development branch serves as a central hub where all the new features are brought together for testing. For instance, if multiple developers are working on different features concurrently, they can merge their feature branches into the development branch to ensure that everything works together smoothly. This integration process helps identify and resolve any conflicts or compatibility issues early on.
When it’s time to release a new version of the e-commerce website, the team creates a release branch, such as v2.0, from the development branch. They focus on stabilizing the codebase, fixing any last-minute bugs, and conducting thorough testing to ensure a smooth release. Once the release is ready, the code from the release branch is deployed to production, and the cycle begins anew.
Common Version Control Systems
Some of the best-known version control systems are Git, Subversion (also known as SVN), and CVS. All of these are open source.
Git is a distributed version control system widely used in software development and other fields. When using Git, each developer has a complete copy of the codebase on their computer.
This decentralized approach lets developers work offline and commit locally without relying on a central server. Developers can work independently on different features or fixes and merge their changes later. Even if the central server goes offline, they can continue working and share updates with each other.
Consider the development of the Linux kernel, which previously relied on a proprietary version control system called BitKeeper. When BitKeeper’s free-of-charge status was revoked, Linus Torvalds and the Linux community developed Git as an open source, distributed alternative. This decision enabled non-linear development and the efficient handling of large projects such as the Linux kernel. The success of Git for the Linux kernel, an extremely complex project with thousands of developers and innumerable branches, shows the power and scalability of Git.
Most software development these days uses Git. Extremely popular SaaS offerings are built around it.
Git treats conflicts by giving the developer a file with both versions of the changed line, clearly marking which version of the file the lines are from. The developer must decide how to resolve the conflict and check in a coherent version of the file.
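The marked-up file can be imitated in a short sketch. The marker lines (`<<<<<<<`, `=======`, `>>>>>>>`) follow the layout Git writes into a conflicted file, but the merge logic here is a strong simplification that assumes lines are only modified in place; the function name, labels, and sample settings are invented.

```python
def merge_with_markers(base, ours, theirs,
                       ours_label="HEAD", theirs_label="feature"):
    """Merge line by line and mark lines both sides changed differently.

    Simplifying assumption: all three versions have the same number of lines.
    Returns the merged lines and a flag telling whether a conflict occurred.
    """
    merged, has_conflict = [], False
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)      # identical, or only our side changed
        elif o == b:
            merged.append(t)      # only their side changed
        else:
            has_conflict = True   # both changed the line in different ways
            merged += [
                f"<<<<<<< {ours_label}",
                o,
                "=======",
                t,
                f">>>>>>> {theirs_label}",
            ]
    return merged, has_conflict


base = ["timeout = 30", "retries = 3"]
ours = ["timeout = 60", "retries = 3"]
theirs = ["timeout = 45", "retries = 5"]

merged, has_conflict = merge_with_markers(base, ours, theirs)
print("\n".join(merged))
```

The developer would then edit the marked section, keep the intended value, delete the marker lines, and check in the result.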
Subversion (SVN) was probably the most popular SCM system before Git became widespread. Unlike Git, Subversion is centralized: The version history resides on a central server. Developers connect to this server to check out and commit changes, so the server always holds the authoritative, latest version of the codebase.
Assume you are part of a team working on a project using Subversion. Each time you need to make changes to the codebase, you connect to the central SVN server to check out a working copy of the code. This ensures that you’re working with the most up-to-date version of the project. After making your changes, you commit them back to the server, updating the central repository with your modifications. The centralized workflow helps maintain consistency and ensures that everyone is working towards the same goals.
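The centralized workflow can be sketched with a single server object that numbers its revisions. This is a simplified model inspired by the description above, not Subversion's actual behavior or interface: it tracks one file only, and `CentralServer`, `OutOfDateError`, and the method names are hypothetical.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an outdated revision."""


class CentralServer:
    """Holds the only copy of the history (simplified, single file)."""

    def __init__(self, initial_content):
        self._revisions = [initial_content]   # index 0 holds revision 1

    @property
    def head(self):
        """Number of the latest revision."""
        return len(self._revisions)

    def checkout(self):
        """Return the latest revision number and its content."""
        return self.head, self._revisions[-1]

    def commit(self, base_revision, new_content):
        """Accept a change only if it is based on the latest revision."""
        if base_revision != self.head:
            raise OutOfDateError(
                f"working copy is at r{base_revision}, "
                f"server is at r{self.head}; update first"
            )
        self._revisions.append(new_content)
        return self.head


server = CentralServer("version = 1\n")
alice_rev, alice_copy = server.checkout()
bob_rev, bob_copy = server.checkout()

new_rev = server.commit(alice_rev, "version = 2\n")   # accepted as revision 2

try:
    server.commit(bob_rev, "version = 1b\n")          # based on revision 1
    rejected = False
except OutOfDateError:
    rejected = True

bob_rev, bob_copy = server.checkout()                 # update, then retry later
print(new_rev, rejected, bob_rev)
```

Because every commit goes through the one server, it can enforce that nobody overwrites work they have not yet seen.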
Before Subversion, CVS was a very popular, centralized version control system. It had design problems that led to the design of Subversion as an alternative.
Guided Exercises
-
Name three core features of SCM systems.
-
Describe the concept of tagging in SCM systems and explain why it is important for managing software releases.
-
What is the difference between a branch and a subrepository in an SCM system?
Explorational Exercises
-
Compare Git and Subversion (SVN) in terms of their architecture and workflow.
-
What is the “index” or “staging area” in Git?
-
Outline the Git trunk-based branching strategy.
-
Which of the following SCM systems are open source?
Git
Mercurial
Subversion
GitHub
Bitbucket
GitLab
Summary
This lesson explained the central role of source code management systems in modern software development. You learned the basic terms and ways of using the system, including repository, branches, tags, and merges.
Answers to Guided Exercises
-
Name three core features of SCM systems.
-
Logging changes to the source code
-
Management of (simultaneous) access to the source code by developers
-
Ability to restore any state of development of the files or the entire project
-
Describe the concept of tagging in SCM systems and explain why it is important for managing software releases.
Tagging is the practice of assigning descriptive labels or names to specific commits within the codebase, serving as markers for significant points in the project’s history, such as releases or milestones. These tags offer a convenient means of referring to particular versions of the codebase and monitoring the project’s evolution over time.
-
What is the difference between a branch and a subrepository in an SCM system?
A branch is a parallel development line in a project, such as for bug fixing or developing new features, which is usually merged back into the main development branch as soon as the task has been completed. A subrepository or submodule is an independent project whose repository is integrated into a project in order to access its code base. The subrepository appears as a directory in the directory tree of the project and remains independent.
Answers to Explorational Exercises
-
Compare Git and Subversion (SVN) in terms of their architecture and workflow.
Git is a distributed version control system (DVCS), allowing each developer to have a complete copy of the codebase and work on the source code even offline. Git has gained widespread popularity among developers due to its speed, flexibility, and robust branching and merging capabilities. SVN is a centralized VCS, with the version history residing on a central server. It remains popular in certain enterprise environments due to its centralized nature and mature feature set.
-
What is the “index” or “staging area” in Git?
The index or staging area is an intermediate layer between the working copy of the project and the local repository. It is a file in which all the information for a user’s next commit is stored.
-
Outline the Git trunk-based branching strategy.
Trunk-based development is a strategy that emphasizes frequent integration of changes into the main codebase (trunk). Developers work on short-lived feature branches and merge them into the trunk multiple times a day, ensuring continuous integration and rapid feedback.
-
Which of the following SCM systems are open source?
System | Open source |
---|---|
Git | X |
Mercurial | X |
Subversion | X |
GitHub | |
Bitbucket | |
GitLab | X |