Managing software development projects with Git

Git It Right

© Lead Image © Stephen Rees, 123rf.com


The Git version control system is a powerful tool for managing large and small software development projects. We'll show you how to get started.

Special Thanks: This article was made possible by support from Linux Professional Institute

With its egalitarian spirit and tradition of strong community involvement, open source development doesn't scale very well without some form of version control.

Over the past several years, Git [1] has settled in as the most viable and visible version control tool for the Linux space. Git was created by Linus Torvalds himself, and it got its start as the version control system for the Linux kernel development community (see the box entitled "The Birth of Git"). Since then, Git has been adopted by hundreds of open source projects and is the featured tool on several large code-hosting sites, such as GitHub.

The Birth of Git

The Git version control system was born in an unexpected whirlwind of development back in the spring and summer of 2005. At the time, the Linux kernel developers were using the BitKeeper version control system. Unlike many version control tools of the time, BitKeeper had a distributed architecture, which worked well for the remote development process practiced within the dispersed kernel community.

The ever-practical Linus adopted BitKeeper because he saw it as the best available solution for the Linux team; however, BitKeeper came with some complications. The biggest issue was that, although BitKeeper was provided free of charge to the kernel team in what was called a "community" edition, it didn't have a free (as in freedom) open source license. Several restrictions were placed on using the software, including a prohibition against reverse engineering and creating unauthorized extensions. Several leading free software developers, such as Richard Stallman and Alan Cox, objected to a high-profile open source product like Linux using a non-free code management system, but the kernel community continued to use BitKeeper in an uneasy truce.

Then, in 2005, word got out that Andrew Tridgell, the creator of Samba, had reverse-engineered the proprietary BitKeeper protocols and developed a free client, thus violating the BitKeeper license. At the time, Tridgell was an employee of the Open Source Development Labs (OSDL), then the overseer of Linux kernel development. BitMover, the company that owned and maintained BitKeeper, acted swiftly to cancel the community license, suddenly ending the kernel team's access to BitKeeper version control.

Linus already knew he didn't like any of the other code management systems available at the time, so when he lost access to BitKeeper, he started to build his own. The Git project was officially announced on April 6, 2005, and by June 16, it was already operational enough to manage the Linux kernel 2.6.12 release [2]. Linus passed the Git maintainership to Junio Hamano in July 2005, and Hamano has continued to develop and improve Git ever since. In an interview with TechCrunch in 2012, Linus said that one of his biggest successes was "…recognizing how good a developer Junio Hamano was on Git, and trusting him enough to just ask if he would be willing to maintain the project." [3]

Git has seen many improvements since 2005, thanks to Hamano and others within the Git development community, but the rapid rise of Git from the ashes of the BitKeeper debacle, and its gradual emergence as the world's most popular version control system, have added another chapter to the legend of Linus Torvalds as a genius software developer. Linus isn't just the creator of Linux; he is the creator of both Linux and Git, two incredibly successful tools that are in use all over the world.

Even if you aren't a professional developer, if you work in the Linux space, you occasionally need to download and compile source code, and, more often than not, that means interacting with Git. Many Linux users pick up occasional Git commands on the fly without ever getting a formal introduction to what Git is and how it works. This article is the first in a two-part series aimed at building a better understanding of Git for everyday Linux users. This first article shows how to install Git, create a Git project, commit changes, and clone the repository to a remote location. Next month, you'll learn some advanced techniques for managing code in Git.

Git makes it easy to manage different versions of files side by side. The software is decentralized, and a server connection is only required for synchronization. Daily work is handled locally, which leads to much better performance. And after more than ten years of active development, Git is surprisingly easy to use, even for beginners.

Before You Start

Git is available in the package repositories of almost all distributions. On Red Hat systems, you can install it with sudo yum install git; on Ubuntu, enter

sudo apt install git

You first need to configure a name and an email address. Without this information, Git either issues a warning or generates a dummy name and address. For the user John Doe with the email address john.doe@example.org, the process is shown in Listing 1.

Listing 1

Name and Address

$ git config --global user.name "John Doe"
$ git config --global user.email john.doe@example.org
$ git config --list
user.name=John Doe
user.email=john.doe@example.org

The git config --list command displays the settings. You can change these settings at any time. Git uses a multi-level structure for these settings (see the box entitled "A Question of Settings").

A Question of Settings

Where Git stores a setting depends on the option you pass to git config: --system, --global, or --local. The option determines the configuration file in which the setting ends up. If the same setting exists at several levels, the most specific one wins: local, project-specific settings override global ones, which in turn override system-wide ones.

The examples in this article use global, i.e., user-specific, configuration data. Git stores settings tagged as --global in the .gitconfig file in the user's home directory; --local settings end up in .git/config inside the project. See the man page (man git-config) or git help config for more information on configuration options.
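The precedence rules can be tried out safely in a scratch repository. The following shell sketch is an illustration rather than part of the original walk-through: it assumes a POSIX shell with Git installed, and it points HOME (and XDG_CONFIG_HOME) at a temporary directory so that your real ~/.gitconfig is never touched. The second email address is hypothetical.

```shell
#!/bin/sh
# Sketch: local settings override global ones.
# Assumption: HOME and XDG_CONFIG_HOME point to a temporary directory,
# so your real ~/.gitconfig is never modified.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export HOME="$demo/home"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME"

git init -q "$demo/mproject"
cd "$demo/mproject"

# Global (user-specific) value, written to $HOME/.gitconfig
git config --global user.email john.doe@example.org
# Local (project-specific) value, written to .git/config
# (hypothetical project address)
git config --local user.email john@mproject.example.org

# A plain lookup returns the most specific value: the local one
effective=$(git config user.email)
echo "effective user.email: $effective"

# List every user.email entry together with the file it comes from
git config --show-origin --get-all user.email
```

The --show-origin output makes it easy to track down where a surprising setting comes from.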

Let's Go

The project in this example is located in the ~/mproject directory and consists of the text files readme.txt and project.txt. The commands from Listing 2 create the project, but they do not yet create a repository.

Listing 2

Creating a Project

$ cd
$ mkdir mproject
$ cd mproject
$ echo "readme.txt file" > readme.txt
$ echo "file project.txt" > project.txt

Creating the actual repository requires three steps (Listing 3): First, initialize the project in the main directory, in this case ~/mproject, then register the files it contains, and transfer them to the Git database, the repository.

Listing 3

Creating a Repository

$ cd ~/mproject
$ git init
Initialized empty Git repository in /home/john/mproject/.git/
$ git add readme.txt
$ git add project.txt
$ git commit -m "First Commit"
[master (root-commit) 77558e4] First Commit
 2 files changed, 2 insertions(+)
 create mode 100644 project.txt
 create mode 100644 readme.txt

The git init command creates an empty repository in the .git subdirectory (a hidden directory). It doesn't matter whether the directory already contains files: git init does not add any of them to the repository on its own.

Use git add to add a file to the index, also known as the staging area. The index holds the versions of files that are marked for the next commit; after git add, it contains the current version of the file. In Git speak, the file is staged.
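A minimal sketch of this behavior, assuming a POSIX shell and a throwaway directory created with mktemp: git init alone tracks nothing, and only git add puts the files into the index.

```shell
#!/bin/sh
# Sketch: git init creates an empty repository; git add stages files.
# Assumption: a throwaway directory created with mktemp.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/mproject"
cd "$demo/mproject"
echo "readme.txt file" > readme.txt
echo "file project.txt" > project.txt

git init -q                      # creates .git/, but tracks nothing yet
before=$(git ls-files)           # files in the index: none
git status --short               # both files are listed as untracked (??)

git add readme.txt project.txt   # copy the current versions into the index
after=$(git ls-files)            # both files are now in the index
git status --short               # both files are listed as added (A)
```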

The git commit command adds the staged files to the repository. This process is known as a commit or check-in. Each commit carries a message. Git displays the first line of the message in various outputs, so it should be short and as precise as possible. After an empty line, you can describe the commit in more detail.

You pass this message to Git with the -m option. Without this option, Git starts the default editor, unless configured otherwise (Figure 1). If you leave the message empty, Git aborts the commit.

Figure 1: A commit without specifying text brings up the default editor.
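To supply both a short subject line and a longer description without opening an editor, you can pass -m more than once; Git joins the values as separate paragraphs. The sketch below assumes a scratch repository in a temporary directory and sets the author identity through Git's GIT_AUTHOR_* and GIT_COMMITTER_* environment variables instead of git config. The message texts are made up for illustration.

```shell
#!/bin/sh
# Sketch: commit message with a subject line and a separate description.
# Assumptions: scratch repository; identity set via environment variables.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=john.doe@example.org
export GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL=john.doe@example.org

git init -q "$demo/mproject"
cd "$demo/mproject"
echo "readme.txt file" > readme.txt
git add readme.txt

# Each -m becomes its own paragraph: the first is the subject line,
# the second the detailed description after an empty line.
git commit -q -m "Add readme" -m "Describe the purpose of the project."

git log -1 --format=%B    # complete message
git log --oneline         # subject lines only
```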

Properly Managed

A version control system (VCS) manages versions or, more precisely, versions of files. A new version is created whenever you add a file to a project or edit a file the project already contains. Together, these file versions define the state of the whole project, such as a program or a web page.

A VCS logs who made what changes, when, and why. The log makes it possible to trace the changes, compare different versions, and restore previous versions. It also displays the changes that project members have made in parallel.

The software manages the files in a repository, or repo for short, which is basically a directory [4]. Because you usually work on a copy, while the original is typically located in another directory (ideally on a different physical medium), you automatically get a kind of backup.

If several people work on a project, a VCS is practically mandatory; without one, synchronizing everyone's work quickly becomes a problem.

Versions

In a Git-versioned project, a file can exist in three versions at once. The version in the working directory is the one you work on. Once the file has reached a state that you want to keep, transfer it to the staging area with git add file and continue working on the version in the working directory.

You can repeat this process as often as you like, but each git add overwrites the previous version in the staging area: the staging area holds exactly one version of each file. The next commit takes this latest staged version; whatever is in the working directory at that moment does not matter.

Figure 2 shows two different versions of the file project.txt, one in the staging area and a second in the working directory. The repository contains the third version.

Figure 2: Two versions of a file and possibly instructions for working with them.
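The three versions can also be inspected directly from the command line. In this sketch (scratch repository, identity via environment variables, made-up file contents), git show HEAD:project.txt prints the committed version, git show :project.txt the staged version, and cat the working copy.

```shell
#!/bin/sh
# Sketch: one file, three versions (repository, staging area, working directory).
# Assumptions: scratch repository; identity via environment variables.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=john.doe@example.org
export GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL=john.doe@example.org

git init -q "$demo/mproject"
cd "$demo/mproject"
echo "file project.txt" > project.txt
git add project.txt
git commit -q -m "First Commit"          # version in the repository

echo "staged line" >> project.txt
git add project.txt                      # version in the staging area

echo "unstaged line" >> project.txt      # version in the working directory

committed=$(git show HEAD:project.txt)   # last commit
staged=$(git show :project.txt)          # staging area (index)
working=$(cat project.txt)               # working directory

git diff --staged    # last commit vs. staging area
git diff             # staging area vs. working directory
```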

For some commands, Git prints hints in its output; these often explain how to undo the action in question.

Help

In addition to the general manual (man git), the installation comes with several specific manuals (see Table 1). If you call man git, you will find an overview of the subcommands, including a short description in the GIT COMMANDS section (Figure 3).

Table 1

All Cases Covered

Call                   Content
man gittutorial        Git-based project flow
man giteveryday        Frequently used commands, including examples (Fedora)
man gitcore-tutorial   Procedure in detail; partly using outdated commands
man git                General manual

Figure 3: The man git man page gives you a quick overview of git commands.

Further documentation is located in /usr/share/doc/git. The scope of the documentation depends on the distribution. Fedora comes with a manual for users, user-manual.html, and a how-to, howto-index.html.

For help with a subcommand, append its name to git with a hyphen. For instance, man git-add brings up information on the git add command (Figure 4).

Figure 4: In addition to a page with general information, many distributions also have pages explaining the various subcommands.

All distributions used in the test support automatic completion of Git commands and their options with the Tab key. Listing 4, Line 2, shows the output after typing git a and pressing Tab: all subcommands starting with a appear. The call to git --help (Listing 4, Line 3) displays an overview of the Git commands grouped by task; the listing shows only the first line of it.

Listing 4

Completion and Help

01 $ git a
02 add    am    annotate    apply    archive
03 $ git --help
04 usage: git ... <command> [<args>]

Continuing with the Project

The project.txt file changes as the project progresses. You save new versions with the add and commit commands. The git status command shows the status of the files in the working directory. Listing 5 shows how, after git add project.txt, the modified file moves from the changes not staged for commit to the changes staged for the next commit.

Listing 5

Project File Status

$ echo "new line" >> project.txt
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   project.txt
no changes added to commit (use "git add" and/or "git commit -a")
$ git add project.txt
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        modified:   project.txt
$ git commit -m "new line inserted"
[master 9d71c8d] new line inserted
 1 file changed, 1 insertion(+)

The git add command also accepts patterns for files and directories, as well as various options. For example, git add -u stages all changes to files that Git already tracks, including deletions, but ignores new, untracked files. Table 2 summarizes the commands used so far.

Table 2

Getting Started

Command   Function
init      Create or initialize an empty repository
add       Add files to the staging area (basis for a commit)
commit    Transfer staging area versions to the repository
status    Show the status of files in the working directory
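The following sketch illustrates what git add -u does and does not stage. It assumes a scratch repository with two committed files; notes.txt is a hypothetical new file.

```shell
#!/bin/sh
# Sketch: git add -u stages changes to tracked files only.
# Assumptions: scratch repository; notes.txt is a hypothetical new file.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=john.doe@example.org
export GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL=john.doe@example.org

git init -q "$demo/mproject"
cd "$demo/mproject"
echo "readme.txt file" > readme.txt
echo "file project.txt" > project.txt
git add readme.txt project.txt
git commit -q -m "First Commit"

echo "new line" >> project.txt   # modify a tracked file
rm readme.txt                    # delete a tracked file
echo "draft" > notes.txt         # create an untracked file

git add -u                       # stages the change and the deletion
git status --short               # notes.txt is still untracked (??)

staged=$(git diff --staged --name-only)
untracked=$(git ls-files --others --exclude-standard)
```

To stage the new file as well, you would still need an explicit git add notes.txt.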

Where to Next?

Git is now managing the project, but what is it actually doing? Let's look at the history first, which you can display with git log. The excerpt in Listing 6 shows a project with two commits, i.e., two versions.

Listing 6

Two Commits

$ git log
commit 9d71c8dd00db5bfb7e21ac8884356d0af284b1b8 (HEAD -> master)
Author: John Doe <john.doe@example.org>
Date:   Fri May 11 15:22:13 2018 +0200
    new line inserted
commit e29f38d1bc7625090
Author: John Doe <john.doe@example.org>
Date:   Thu May 10 09:35:53 2018 +0200
    First Commit

Each commit is identified by a SHA-1 hash of 40 hexadecimal characters, which I will simply refer to as the hash. The hash serves both as a unique identifier and as a checksum. Many commands accept a hash as a parameter; the first 8 to 10 characters are usually enough, as long as they are unambiguous.

The git log 77558e4ac command shows the history starting at the specified commit, i.e., that commit and its predecessors. In a terminal, you can copy a hash by double-clicking it with the left mouse button and paste it with a single click of the middle mouse button.
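A few commands help when working with hashes. This sketch, again in a scratch repository with made-up commits, prints the full and abbreviated hash of the latest commit, looks up the first (root) commit with git rev-list, and limits git log to that commit.

```shell
#!/bin/sh
# Sketch: working with commit hashes.
# Assumptions: scratch repository with two hypothetical commits.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=john.doe@example.org
export GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL=john.doe@example.org

git init -q "$demo/mproject"
cd "$demo/mproject"
echo "file project.txt" > project.txt
git add project.txt
git commit -q -m "First Commit"
echo "new line" >> project.txt
git commit -q -a -m "new line inserted"

full=$(git rev-parse HEAD)            # full hash of the latest commit
short=$(git rev-parse --short HEAD)   # abbreviated, unambiguous prefix
echo "$full"
echo "$short"
git log --oneline                     # one line per commit: short hash + subject

first=$(git rev-list --max-parents=0 HEAD)   # hash of the root (first) commit
git log "$first"                      # shows only the first commit
```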

Table 3 lists further commands for handling versioned data; each of them supports many more options than shown here.

Table 3

Extended Commands

Command            Function
log                Display versions, including the hash for identification
diff               Display differences between working directory and staging area
diff --staged      Display differences between staging area and last commit
diff hash          Display differences between working directory and commit hash
diff hash1 hash2   Display differences between the specified commits
reset HEAD         Remove files from the staging area (working directory unchanged)
reset --hard       Reset staging area and working directory to the last commit
checkout -- file   Overwrite file in working directory with its version from the staging area (i.e., the last commit, unless changes are staged)
checkout hash      Check out all files of the specified version (note: commits that already exist cannot be modified)
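As an example of the undo commands from Table 3, the following sketch stages an unwanted change, removes it from the staging area with git reset HEAD, and then discards it in the working directory with git checkout --. It assumes a scratch repository with identity set via environment variables.

```shell
#!/bin/sh
# Sketch: undo a staged change and discard it again.
# Assumptions: scratch repository; identity via environment variables.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=john.doe@example.org
export GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL=john.doe@example.org

git init -q "$demo/mproject"
cd "$demo/mproject"
echo "file project.txt" > project.txt
git add project.txt
git commit -q -m "First Commit"

echo "unwanted line" >> project.txt
git add project.txt                  # the change is now staged

git reset -q HEAD project.txt        # unstage it; the file itself is unchanged
git diff --staged --quiet            # succeeds: nothing is staged any more

git checkout -- project.txt          # overwrite the file with the staged version
restored=$(cat project.txt)
```

Because the change was unstaged first, the staged version equals the last commit, so git checkout -- brings back the committed content.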

The git difftool command behaves like git diff but starts an external program (Figure 5). If required, define the external program with

git config --global diff.tool Program_name

Figure 5: The git difftool command calls an external program to view the differences between files, in this case, the Meld visual diff and merge tool.
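For example, to use Meld (as in Figure 5) and skip the confirmation prompt that Git shows before launching the tool for each file, the following sketch sets diff.tool and difftool.prompt. It uses --local in a scratch repository instead of --global so that your personal configuration stays untouched; the choice of meld is an assumption, and the program must be installed for git difftool to launch it.

```shell
#!/bin/sh
# Sketch: configure an external diff tool.
# Assumptions: Meld is the tool of choice; --local in a scratch repository
# is used instead of --global to leave your own settings alone.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
git init -q "$demo/mproject"
cd "$demo/mproject"

git config --local diff.tool meld          # program started by git difftool
git config --local difftool.prompt false   # don't ask before each file

git config --get diff.tool
```

The command git difftool --tool-help lists the tools Git knows about.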

Remote Repository

So far, the project consists only of a local repository in the project directory. However, a typical project usually comes from a remote repository.

To create a remote repository, first create a bare repository from the local data. Unlike the local repo, this does not contain a working directory.

You can then move the bare repository to a suitable directory, ~/gitrepo in Listing 7, and rename or delete the existing project directory. Finally, clone the newly created remote repository; Git creates the project subdirectory automatically.

Listing 7

Remote Repository

$ cd
$ git clone --bare mproject mproject.git
Cloning into bare repository 'mproject.git'...
$ mkdir gitrepo
$ mv mproject.git gitrepo/
$ mv mproject/ mproject_old
$ git clone /home/john/gitrepo/mproject.git
Cloning into 'mproject'...
$ cd mproject
$ git remote show origin
* remote origin
  Fetch URL: /home/john/gitrepo/mproject.git
  Push  URL: /home/john/gitrepo/mproject.git
  HEAD branch: master
  Remote branch:
    master tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)

From now on, you will be editing the project in the newly created mproject directory. Its local repository is connected to the remote repository gitrepo/mproject.git, which Git refers to as origin. The git push command transfers your local commits to the remote repository; git pull fetches changes from it and merges them into your local branch.
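The complete round trip can be reproduced in a scratch directory. The following sketch mirrors Listing 7 (bare clone, working clone) and then commits a change and pushes it. It assumes a POSIX shell, a temporary directory in place of /home/john, and identity set via environment variables. The command git push origin HEAD pushes the current branch, so the sketch works regardless of whether the default branch is called master or main.

```shell
#!/bin/sh
# Sketch: bare repository, clone, commit, and push.
# Assumptions: a temporary directory stands in for /home/john;
# identity is set via environment variables.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
export GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=john.doe@example.org
export GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL=john.doe@example.org
cd "$demo"

# Existing local project with one commit
git init -q mproject
echo "readme.txt file" > mproject/readme.txt
git -C mproject add readme.txt
git -C mproject commit -q -m "First Commit"

# Bare copy (no working directory) in gitrepo/, then a fresh clone
mkdir gitrepo
git clone -q --bare mproject gitrepo/mproject.git
mv mproject mproject_old
git clone -q "$demo/gitrepo/mproject.git" mproject

# Work in the clone and send the new commit to origin
cd mproject
echo "new line" >> readme.txt
git commit -q -a -m "new line inserted"
git push -q origin HEAD     # push the current branch to the same name on origin

remote_subject=$(git --git-dir="$demo/gitrepo/mproject.git" log -1 --format=%s)
echo "latest commit on origin: $remote_subject"
```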

Conclusions

Using Git is quite simple, even without prior knowledge of version control systems. You can complete your daily work efficiently at the command line with just a few commands. And since Git usually saves the changes locally, commands execute quickly.