Git Introduction

1 What is Version Control?

Version control (also called revision control or source control) is a way of tracking changes made to a document through time. For software developers it is an essential tool for collaboration, bug tracking, developing new features, backing up files, etc. Have you ever written a piece of software that you just know was working this morning before you left your house, and now that you're trying to demo it to a professor, it isn't working? Most likely, you accidentally changed some very minor detail of your code, and without some very careful line-by-line combing, this change is not going to be easy to detect. Or have you ever had a script that is working great, but you need to change something? To make sure you don't lose the version that is working well, you make a copy of the file, so now you have two files: script.m and script_new_method.m. Later on you end up doing this again, and again, and then suddenly you have twenty different scripts that are all essentially doing the same thing. Then what happens when you find a mistake in the very first script? You now need to propagate the fix through a mess of other scripts. That is not a good situation! These scenarios are examples of situations where version control can save the day. From an even more basic standpoint, as a student, using version control is a great way to back up your work even if you aren't using some of the more advanced features.

In this program we are going to be using Git for version control. The first chapter of the freely-available Pro Git book has a great description of what version control is and why you might want to use it.

1.1 Why Git?

These days it seems that Git is the VC system that is taking over the world. Major software companies such as Google, Facebook, Microsoft, and Netflix use Git as their primary VC tool. The entire Linux kernel is managed in a Git repository! If you check out that link, you'll notice the repository has over 500,000 commits, has released nearly 500 versions, and has accepted contributions from nearly 6000 people. Even more importantly, Git has become the primary VC system for the robotics world. All of the Robot Operating System packages are maintained and released using Git. Simply due to its popularity, Git is certainly the VC system to learn. Knowing the most popular system will increase the probability of an easy transition into your next position. Fortunately, the requirements of any version control package are fairly universal. Thus, knowing one version control software will certainly help you to transition to any others. So even if you end up in a place that doesn't use Git, if you know Git, you will be fine. For reference, some other popular VC systems include Mercurial, Subversion, and CVS.

Beyond just being popular, why is Git any good? Well it comes down to a few simple design choices that were made early on in Git development. Most version control software packages support very similar workflows and operations. However, under-the-hood, how these operations are carried out may be very different.

Distributed
Every single person that works with the Git repository ends up with a complete history of the repository. That means if two people are both working on a single project and they each have a copy of the project on their computers, then two complete copies of all changes made by each person exist. This drastically reduces the chance that you will end up with corrupted files, or lose any work due to hard drive failures. Further, it is very easy to "push" changes that you make to internet-accessible servers. In this way it is very easy for two people to share their changes. There are dozens of third-party, cloud-based services specifically for allowing people to share their Git repositories. Having your history in the cloud means even better insurance against lost files, and it can be a great chance to share your contributions with the world.
Free and modern
Git is completely open source and freely distributed. Development is extremely active.
Fast
Compared to other systems Git is really fast at many operations that you will find yourself doing on a daily basis. Operations such as merging changes, checking out different versions, and searching through log history are extremely fast, even on large repositories.
More reading
The official Git About page actually has a fantastic summary of what features Git supports. It makes more sense to read this after you understand the majority of the operations that are used in version control.

2 Getting Started with Git

These are the first things that I would recommend doing for people new to Git.

  1. First, you need to install Git. I would also recommend installing some sort of tool that provides graphical representations of Git history. On Linux, I use gitk for this. On Ubuntu/Debian, both Git and gitk can be installed with

    sudo apt-get install git gitk
    

    I'd head to the official Git downloads section for installers for other operating systems. The official site also has a nice list of other GUIs that are available for working with Git. I only use the command line and the Emacs package magit so I have no opinions on any of these GUIs. Students in previous cohorts have used GitKraken occasionally.

  2. It is a really good idea to take a few minutes and read Chapter 1 of the Pro Git book. If you were to only read one piece of this chapter it should be Section 1.3 - Git Basics.
  3. There are some basic things that you should set up on your machine before you really start using Git. Follow the instructions from Pro Git Section 1.6 - First-Time Git Setup.
  4. It is a good idea to have an online service where you can back up your Git repositories. The most popular free services are GitHub and Bitbucket. GitHub has more features and is more popular. However, one of its biggest drawbacks is that it does not freely support private repositories (all repositories are visible to the public). If you're willing to pay for GitHub you gain access to great tools for private repositories. Sometimes GitHub will give free private repositories for educational purposes. Bitbucket on the other hand offers free private repositories for every user. Regardless of which online service you use, I'd recommend creating a GitHub account, and ensuring that by the time you're done with MSR you have a nice variety of well-developed software packages on your profile. Potential employers often want to see this. Note that I personally use a combination of Dropbox, private servers here at Northwestern, GitHub, and even GitLab for backing up repositories. GitLab seems to be getting more popular rapidly. I think this is because it is open source, it is easy to set up your own instance of GitLab, and it is not owned by Microsoft.
  5. While working through the First-Time Git Setup in the Pro Git book, you'll choose what text editor you want to use with Git. On your Ubuntu systems this will likely default to gedit. If you'd rather use Sublime or Atom, check out these tips on GitHub's help.

3 Git Glossary and Concepts

This section provides a brief description of the most common Git concepts that you will encounter. When you are using Git, it is very easy to get help. Below we will talk about many Git verbs (commit, merge, branch, etc.), and help with these verbs can be obtained with the following commands:

$ git help <verb>
$ git <verb> --help
$ man git-<verb>
Committing

Once you have a repository set up, and you have let Git know that there are some files it should be tracking, committing is how you record the changes that have occurred in the tracked files into the repository's history. Every time you commit a set of changes, Git automatically creates a unique hash and assigns it to that commit. In other words, every set of changes that has ever been recorded to the repository has a unique identifier called a hash. A concept that goes along with committing is the Git HEAD. The HEAD basically tells your repository what state you are currently in. So if you have a repository with a group of commits (each with a unique state for all tracked files and a unique hash), you can change your copies of the files to reflect each of these commits by moving your HEAD to point at each commit's hash. Another concept that should be noted is that every commit except the very first one has at least one parent. This is how the time evolution of files is recorded.

The most common commands are

# Commit only the changes that have already been "staged"
$ git commit

# First "stage" all changes to all tracked files that have changed since the
# last commit, and then commit them to the repository history. An editor will
# pop up, and you will be asked to write a message describing this commit.
# This is the most common Git command:
$ git commit -a

# Stage, commit, and fill in a message in one command:
$ git commit -a -m "This is my commit message"
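To make hashes, HEAD, and parents a little more concrete, here is a minimal sketch that builds a throwaway repository and inspects two commits. The directory, file name, commit messages, and identity settings are all made up for illustration.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit
echo "var1 = 2" > file1.txt
git add file1.txt
git commit -q -m "first commit"
first=$(git rev-parse HEAD)     # full hash of the commit HEAD resolves to

# Second commit
echo "var1 = 3" > file1.txt
git commit -q -a -m "second commit"
second=$(git rev-parse HEAD)    # HEAD has moved to the new commit

# HEAD^ names the parent of the commit HEAD resolves to
parent=$(git rev-parse HEAD^)

echo "first:  $first"
echo "second: $second"
echo "parent: $parent"
```

The last line should print the same hash as the first one, since the parent of the second commit is the first commit.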
Staging
Without passing the -a switch to the commit verb, committing will only record changes to files that have been "staged". Imagine you are tracking three files, and all three have changed since the last commit. Also imagine that it would be easier to understand what changes had occurred if you were to commit the changes to each of the files separately. So you could stage the changes to one of the files, and then run git commit to commit just those staged changes. Note that certain operations will automatically stage changes (merging and adding files). Also note that you can unstage changes using the reset verb.
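The three-file scenario above can be sketched as follows. This is only an illustration in a throwaway repository; the file names, commit messages, and identity settings are assumptions.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Track three files and record them in a first commit
for f in a.txt b.txt c.txt; do echo "first" > "$f"; done
git add a.txt b.txt c.txt
git commit -q -m "Add three files"

# Change all three files
for f in a.txt b.txt c.txt; do echo "second" >> "$f"; done

# Stage and commit only a.txt; b.txt and c.txt stay modified but unstaged
git add a.txt
git commit -q -m "Update a.txt"

# Stage b.txt, unstage it with the reset verb, then stage it again and commit
git add b.txt
git reset -q -- b.txt
git add b.txt
git commit -q -m "Update b.txt"

# Only c.txt is still modified and uncommitted
git status --short
```

Running git status between the steps is a good way to watch files move between the modified and staged states.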
Branching

The hashes that are created to identify each commit are very long, useless-to-a-human strings. So Git is really designed to operate using branches in place of hashes. A branch is nothing more than a pointer to a commit, and typically your HEAD is just a pointer to a branch. Branches will usually have nice, descriptive names indicating what you are doing when you are on that branch. Many of the operations discussed below (merges, checkouts, diffs) are typically implemented using branch names. If your HEAD points directly at a commit instead of a branch (a "detached HEAD"), Git still works, but any new commits you make will not belong to any branch and are easy to lose.

# View all branches and which branch your HEAD is currently pointed at:
git branch

# Create a new branch at your current HEAD location:
git branch NEWBRANCH

# Delete a branch that your HEAD is not currently pointing at:
git branch -d NEWBRANCH

# Move your HEAD to a particular commit and create a branch at that commit
# for your HEAD to point at:
git checkout -b NEWBRANCH 9f621b7c9363c36b9ea2afc5400344e8d7c04ed9

Logging
The log verb is how you see information on the commits that have been made. By default, git log only shows the commits that are part of the history of your current branch (branches are discussed above). Note that you can view the log for individual files (git log filename), view the log for all branches (git log --all), and search through log history (using the -G or -S options).
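As a rough sketch of these log options, the script below creates a throwaway repository with two commits on the starting branch and one more on a side branch, then queries the history in a few ways. The branch name "side", the file names, and the commit messages are assumptions made for illustration.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "var1 = 2" > file1.txt
git add file1.txt
git commit -q -m "add file1"
echo "var2 = 2" > file2.txt
git add file2.txt
git commit -q -m "add file2"

# A commit that exists only on a side branch
git checkout -q -b side
echo "var3 = 7" >> file2.txt
git commit -q -a -m "add var3 on side"
git checkout -q -          # go back to the branch we started on

# Commits in the history of the current branch
git log --oneline
# Only the commits on the current branch that touched file2.txt
git log --oneline -- file2.txt
# Commits on every branch
git log --oneline --all
# Commits on any branch that added or removed the string "var3"
git log --oneline --all -S "var3"
```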
Checkouts

Checking out is how you move your HEAD to point at different commits or branches. It can also be used to grab a single file from a commit or branch and apply it to your current HEAD state.

# Change HEAD to a different branch (Git will refuse if this would overwrite uncommitted changes):
git checkout BRANCHNAME

# Get a single file from a different branch and apply it to your current repo state:
git checkout BRANCHNAME -- filename
Merging
This is how you join two branches together. It is one of the most important operations in Git, and also one of the places where things are the most likely to go wrong. Sometimes changes in branches conflict with each other, and merging cannot figure out what change you want. Figuring out how to deal with these merge conflicts will take practice! Note that the git merge BRANCHNAME operation attempts to combine the state of the repo at BRANCHNAME with the state of the repo that your HEAD is currently pointing at, and creates a new commit whose parents are your current commit and the commit at BRANCHNAME. (If your current commit is already an ancestor of BRANCHNAME, Git simply moves your branch forward instead, which is called a fast-forward.)
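Here is a small sketch of a merge conflict and one way to resolve it. It uses a throwaway repository; the branch name "feature", the file contents, and the choice of which value to keep are assumptions for illustration.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "var0 = 3" > file1.txt
git add file1.txt
git commit -q -m "base"
base=$(git rev-parse --abbrev-ref HEAD)   # name of the starting branch

# Change the same line differently on two branches
git checkout -q -b feature
echo "var0 = 4" > file1.txt
git commit -q -a -m "feature: var0 = 4"
git checkout -q "$base"
echo "var0 = 5" > file1.txt
git commit -q -a -m "$base: var0 = 5"

# The merge stops with a conflict and writes conflict markers into the file
git merge feature || echo "merge reported a conflict"
cat file1.txt

# Resolve by writing the content you actually want, then stage and commit
echo "var0 = 4" > file1.txt
git add file1.txt
git commit -q -m "Merge feature, keeping var0 = 4"

# The resulting merge commit has two parents
git log --oneline --graph
```

In practice you would open the conflicted file in an editor and choose between the sections delimited by the conflict markers rather than overwriting the file with echo.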
Adding/Ignoring
The add verb is how you tell Git that you are interested in tracking specific files. The command git rm --cached is how you remove a file from being tracked. In any repository you can create a file called .gitignore. In that file you can list patterns describing files that you never want to track in this particular repository.
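The following sketch shows a small .gitignore and the git rm --cached command in a throwaway repository. The ignored patterns and all file names (build/, *.log, local.cfg) are assumptions chosen for illustration.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Some files that should not be tracked, and one that should
mkdir build
echo "binary" > build/output.o
echo "scratch" > notes.log
echo "print('hi')" > script.py

# Each line of .gitignore is a pattern; "build/" ignores a whole directory
printf '%s\n' 'build/' '*.log' > .gitignore

git add .gitignore script.py
git commit -q -m "Add script and .gitignore"

# Ignored files do not show up as untracked, so this prints nothing
git status --short

# Stop tracking a file that was added by mistake, without deleting it from disk
echo "secret" > local.cfg
git add local.cfg
git commit -q -m "Add local.cfg by mistake"
git rm -q --cached local.cfg
echo "local.cfg" >> .gitignore
git commit -q -a -m "Stop tracking local.cfg"
```

Note that the file is still present in earlier commits; git rm --cached only stops tracking it from this point forward.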
Remotes/Fetching/Pushing/Pulling

Remotes are simply Git repositories located outside of your main repository that you will be synchronizing content with. So for example, a remote could be a second copy of a repository on your local computer, or it could be some internet-accessible place that is hosting your repository (e.g. GitHub). The concept of "Pushing" involves sending data from your repository to a remote, "Fetching" involves downloading content from a remote to analyze changes that have been sent to the remote from somewhere else, and "Pulling" involves first fetching content, and then merging content according to several rules. Note that Git supports several different protocols for communicating with a remote, and that GitHub supports Smart HTTP and SSH. I typically use SSH (it's more secure, and easier to manage authentication). We'll talk later about SSH, but if you wanted to start using it now, check out this article: https://help.github.com/articles/generating-ssh-keys/ As a final note, I use the local protocol all the time inside of a Dropbox folder. So I push/pull/fetch from a copy of the repo in my Dropbox instead of on a GitHub server.

# Add a new remote called "github-origin" that points to a GitHub repository
# (people often use the name "origin" when they only have one remote):
git remote add github-origin https://github.com/username/reponame.git
# Push a local branch "local" to a remote called "github-origin". 
git push github-origin local
# Same as above, but additionally use the "-u" or "--set-upstream" 
# switch to setup an association between the "local" branch on your
# repository and the "local" branch on the remote. After this, any 
# time you want to push from your "local" branch to the server's 
# "local" branch you can just use "git push":
git push -u github-origin local
# Fetch content from the server (if the server's branches have been
# updated since you last fetched from the server, you should see
# your references to the remote branches move, i.e. your local
# understanding of the remote's target commits will change):
git fetch
# Fetch "local" branch from "github-origin" remote and merge with current branch:
git pull github-origin local
Cloning

This is how you copy an entire repository from one place to another place. Typically when people are using remotes the repository located at the remote was created using git init --bare instead of just git init. A bare repository doesn't actually contain any of the tracked files. Rather, it only contains the information relevant to Git. Bare repositories should always be used for remotes shared between people. By convention, bare versions of repositories end in ".git". For example, the bare version of a repository called test-repo should be in a directory called test-repo.git (GitHub follows this convention).

# Clone a copy of "reponame" located on a GitHub server into a local directory 
# called "reponame"
git clone https://github.com/username/reponame.git
# Clone a copy of "reponame" located on a GitHub server into a local directory 
# called "my-name"
git clone https://github.com/username/reponame.git my-name
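Bare repositories and cloning can also be tried entirely on your own machine using the local protocol. The sketch below is a minimal illustration; the directory names alice and bob are hypothetical stand-ins for two people's working copies.

```shell
# Work inside a temporary directory (all names are illustrative)
work=$(mktemp -d)
cd "$work"

# A bare repository acting as the shared remote (note the ".git" suffix)
git init -q --bare test-repo.git

# First working copy: clone the (still empty) remote, commit, and push
git clone -q test-repo.git alice 2> /dev/null
cd alice
git config user.name "Example User"
git config user.email "user@example.com"
echo "var1 = 2" > file1.txt
git add file1.txt
git commit -q -m "first commit"
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q -u origin "$branch"
cd ..

# Second working copy: cloning copies the complete history
git clone -q test-repo.git bob
git -C bob log --oneline --all

# The bare repository holds Git's data but no checked-out files
ls test-repo.git
```

The listing of test-repo.git shows Git's internal files and directories rather than file1.txt.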
Tagging
Tags are just like branches except they typically don't move. They are usually used for indicating important commits that should never move. For example, tags are used for marking releases of software. To add a tag called "tag-name", move HEAD to a branch or commit and run git tag tag-name.
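A minimal sketch of the difference between a tag and a branch, using a throwaway repository (file names and commit messages are made up for illustration):

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file1.txt
git add file1.txt
git commit -q -m "release candidate"

# Tag the commit that HEAD currently resolves to
git tag tag-name
tagged=$(git rev-parse tag-name)

# New commits move the branch forward, but the tag stays where it was
echo "v2" > file1.txt
git commit -q -a -m "more work"

git tag                        # list all tags
git log --oneline --decorate   # shows which commit carries the tag
```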
Stashing
Stashing is a way to quickly save changes that you don't want to be permanent, but you also don't want to lose. For example, imagine you have uncommitted changes that are preventing you from checking out another branch. You could use git stash to save those changes in a special place. Then later, if you returned to the commit where the stash was created, you could re-apply all of the stashed changes.
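The stash scenario described above might look like the following sketch. It runs in a throwaway repository, and the branch name "other" and the file contents are assumptions for illustration.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "var0 = 3" > file1.txt
git add file1.txt
git commit -q -m "base"
base=$(git rev-parse --abbrev-ref HEAD)   # name of the starting branch

# A second branch with a different version of file1.txt
git checkout -q -b other
echo "var0 = 9" > file1.txt
git commit -q -a -m "change on other"
git checkout -q "$base"

# Uncommitted work in progress; checking out "other" now would be refused
echo "var0 = 4" > file1.txt

# Save the changes and return to a clean working tree
git stash
git checkout -q other
git checkout -q "$base"

# Re-apply the stashed changes and remove them from the stash list
git stash pop
cat file1.txt
```

After git stash pop the file again contains the uncommitted edit, and git stash list is empty.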

4 Live demo and log

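The following lists a complete log of some basic Git commands. It may be a good exercise to step through each of these steps, and ensure you understand what happens for each command. Only shell commands were recorded, so where a commit appears without a preceding change, the file was edited in a text editor in between. The l command is the default Ubuntu alias for ls -CF. I would recommend opening gitk using gitk --all & on the command line and then using Shift+F5 to reload the repo info after each Git operation.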

4.1 Git Log

$ cd
$ cd Desktop/
$ l
$ git init newrepo
$ cd newrepo/
$ l
$ echo "var1 = 2" > file1.txt
$ l
$ git status
$ git add file1.txt
$ git status
$ git commit -a
$ cat file1.txt
$ echo "var0 = 3" > file1.txt
$ cat file1.txt
$ git status
$ git commit -a -m "edit file 1"
$ echo "var2 = 2" > file2.txt
$ l
$ git status
$ git add file2.txt
$ git status
$ git commit -a -m "added file 2"
$ git log
$ git log file2.txt
$ git log
$ git branch test
$ git branch
$ cat file1.txt
$ cat file2.txt
$ git checkout master
$ git checkout test
$ cat file1.txt
$ cat file2.txt
$ git status
$ git commit -a -m "comment file 1 on test"
$ git checkout master
$ cat file1.txt
$ git checkout test
$ cat file1.txt
$ git checkout master
$ git status
$ git commit -a -m "comment on master"
$ git status
$ cat file1.txt
$ git checkout test
$ cat file1.txt
$ git branch feature
$ git branch
$ git checkout feature
$ l
$ git status
$ git diff
$ git status
$ git diff
$ git diff file2.txt
$ git diff file1.txt
$ git commit -a
$ git checkout test
$ git merge feature
$ cat file1.txt
$ git checkout feature
$ cat file1.txt
$ git checkout master
$ git merge feature
$ git status
$ git commit -a -m "Merged feature"
$ l
$ mkdir data
$ l
$ cd data/
$ touch file3.txt
$ cd ..
$ l
$ git status
$ git add data/
$ git status
$ git commit -a -m "Added data dir"
$ git status
$ git checkout test
$ cat file1.txt
$ echo "var0 = 5" > file1.txt
$ l
$ git diff
$ git commit -a -m "edit file1 on test branch"
$ git checkout master
$ git merge test
# Note that Git can successfully merge this file using the 'recursive' strategy
$ git status
$ cat file1.txt

5 Git best practices

  • Try not to work on the same file at the same time as someone else. If you do, do it on a different branch.
  • Don't add raw data files, archives, pictures, databases, etc.
  • Always set up a .gitignore properly.
    • Don't track any files that aren't necessary.
  • Avoid re-writing history or forcing anything.
    • git reset --hard
    • git rebase
    • force pushing/pulling
  • Don't make unnecessary whitespace and formatting changes – being sloppy about the specific things that you edit can result in very unhelpful diffs. For example, converting all tabs to spaces in a document will result in a diff that shows every single line changing. If you need to make a large-scale formatting change like this, do it on a single, well-labeled commit that has no substantive code changes.
    • If working with someone else, make sure your editors are set up to be consistent.
    • If editing code from someone else, be cautious about randomly changing their formatting. Try to adhere to any stylistic decisions that the original code author made.
  • There are many different workflows that a team may employ for managing a Git repository, and there is no right answer. How your team chooses to deal with branching, committing, bug reporting, etc. is based on personal preferences, but it should also be informed by the type of project that you're working on. When you join a company, they'll typically already have a set of rules governing how their employees work with their codebase. What if you don't have an employer telling you how to manage your repo? How do you choose the best workflow? In my opinion, this is a difficult question to answer that likely has no "best" solution. I think that the more you work with Git (or any version control system), the more you will develop an internal sense of the pros and cons of various strategies. When you're beginning perhaps the most important point is to make sure that you and anyone you're working with have a workflow in place and that everyone is clear on what the rules are. If all team members understand and follow the same set of rules, you are much less likely to run into complicated Git issues.
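To illustrate the point about whitespace changes in the list above, here is a small sketch in a throwaway repository. The file code.py and its contents are made up for illustration. It converts tabs to spaces and compares a normal diff with one that ignores whitespace (git diff -w).

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# A file indented with tabs
printf 'def f():\n\treturn 1\n\ndef g():\n\treturn 2\n' > code.py
git add code.py
git commit -q -m "add code.py"

# Convert tabs to four spaces without changing any code
printf 'def f():\n    return 1\n\ndef g():\n    return 2\n' > code.py

# Every indented line shows up as changed...
git diff --stat
# ...but when whitespace is ignored, no changed lines are reported
git diff -w
```

A reviewer reading the first diff cannot easily tell whether anything substantive changed, which is why such formatting changes belong in their own commit.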

6 What is GitHub?

GitHub is the most common web-based Git repo hosting service. It is a free place to push all of your changes to be backed up in the cloud. Beyond simply allowing people to back up their repos, GitHub has tons of great features that extend the usefulness of Git. I only talk about GitHub specifically because its use is widespread. In fact, many GitHub-specific concepts and terms have evolved to be part of the standard Git vernacular. Here are some of the primary GitHub features:

Issue Reporting
Many projects that are hosted on GitHub use GitHub as their primary tool for managing bug reports and bug fixes. If you encounter a bug for a project that is on GitHub, it is usually a great idea to go to the main project site and click the Issues button on the right hand side. From here you can search and see if others have had the same issue as you, and maybe even find a workaround/solution. If nobody has reported the issue, you could even submit the bug information yourself. Writing a good bug report takes practice, but submitting bug reports is the only way to practice.
Forking
Let's say you are using a project from someone else on GitHub and you encounter something that you don't like. It could be a bug or maybe a missing feature. Well if you were interested in fixing this bug or adding this feature yourself, then you could go to the main project page and click the Fork button on the top right of the page. This would create a copy (called a "fork") of the original project, but this new copy would be associated with your username. So you'd have the ability to push and pull from your fork as much as you like. The alternative way to get access to a copy of the original repo would be to clone the original repository to your machine, create a new repo under your username, and then push the cloned repo into the new GitHub repo. The forking method is much better because then your fork of the project will always be tied to the original project. Everyone that visits the original project page can see how many forks there are, and who created them, and anyone that visits your fork page will see that it is a fork of the original project. For example, if you visit my profile, you'd see that I have a fork of the ros_tutorials package.
Pull Requests
When you combine issue reporting and forking, you get a new feature – pull requests. If you fork someone's project and then fix a bug or add an enhancement, you can submit a pull request. This alerts the original project owner that you've done something that they would find helpful and gives them the opportunity to merge your changes directly into the original project. Check out the pull request I submitted using the previously-linked ros_tutorials fork. Here are the steps I used to submit this PR:
  1. noticed a small bug
  2. forked the project (https://github.com/jarvisschultz/ros_tutorials)
  3. cloned my fork
  4. fixed the bug and pushed the fix to my fork
  5. submitted a bug report (https://github.com/ros/ros_tutorials/issues/24)
  6. submitted a pull request (https://github.com/ros/ros_tutorials/pull/25)
  7. eventually that pull request was accepted and pulled into the main project
GitHub Pages
GitHub Pages is a great way to host and build websites for yourself or an individual project for free. Basically you create either a special branch or a special repository, and you push Markdown files to the branch/repo, and GitHub will automatically run Jekyll, build your website, and give you a unique URL.
Wiki Hosting
GitHub lets you have either publicly editable or non-editable wikis hosted directly on your page.
Gists
Gists are miniature Git repositories. They are a great way to share individual files, they are really helpful for sharing tutorial code, or minimal examples for bug reports, and they can be set to be either public or secret for free.
Permissions Management
For every Git repository, you have full control over collaborators – people that you allow to push code to your repository.
Organizations
Organizations are used for projects that involve groups of people all working on common projects that may have complex permissions requirements or require multiple administrators/owners.
GitHub Classroom
This is a set of tools that allow educators to create and collect assignments on GitHub. I use this extensively in my course.
Private Repos
By default all repositories on GitHub are public (anyone can see them). If you would like to have a private repository (you control who can see your repo), you either need to pay for an account, or as a student, you can request free private repos.
Fancy Rendering
GitHub has the ability to nicely render many types of files, even some that aren't code. Here are a few examples:

7 Other sources

My advice for learning Git is to use it as often as possible, and when you encounter problems to take the time to understand what is wrong and how to fix it. It will take time, but eventually your Git toolbox will grow to a useful size.

There are a ton of great sources online for learning Git, including many interactive and step-by-step tutorials. In my opinion, the very best thing you could do to really learn everything about Git is to read and study the Pro Git book. However, that is clearly a pretty big undertaking! So, below are a few sources of other tutorials and step-by-step guides that I think are useful:

GitHub guides
These guides cover a wide array of common topics people are interested in. They are generally concise, well-written, and aesthetically pleasing. Many of the topics are GitHub-specific, but that is not necessarily a bad thing.
Git - the simple guide
This is a short, single page introduction to Git that doesn't get into anything too complicated. It would be a great place to return to as a reference as you're learning or to get a refresher after a long absence from Git.
GIT IMMERSION
A very nice step-by-step walkthrough of using Git.
Code School Git Path
Several basic online courses about Git. The intro course is free.
nuitrcs/gitworkshop
Northwestern IT Research Computing Services occasionally runs a Git workshop, and this link is to their GitHub repo supporting the workshop. They primarily link to other teaching tools.

7.1 Custom Git Prompts

Many people that find themselves regularly working with Git end up making a custom prompt in their terminal that helps them to easily see what is happening in their current repo. The prompt in Bash is defined by the PS1 environment variable (see the PROMPTING section of the Bash man page). You certainly don't need to do this, but if you think it would be useful to have a custom prompt, I've described a few options to get you started below.

Note that when you look at a PS1 variable, you will often see many strange character sequences like \[\e[38;5;172m\]. These are ANSI Escape Codes that control terminal text formatting options, usually used in this case to control color. Feel free to experiment and create your own prompt.
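As a quick way to experiment with these escape codes outside of a prompt, the sketch below prints a colored string with printf and then sets a very small prompt. The variable names ORANGE and RESET are made up for illustration, and the exact color you see depends on your terminal supporting the 256-color palette.

```shell
# \033 is the escape character (written \e in PS1); "38;5;172" selects
# color 172 from the 256-color palette, and "00" resets all formatting
ORANGE='\033[38;5;172m'
RESET='\033[00m'
printf "${ORANGE}%s${RESET}\n" "colored text"

# The same codes inside a minimal prompt: the \[ and \] markers tell Bash
# that the enclosed characters take up no space on the screen
PS1="\[\e[38;5;172m\]\w\[\e[00m\]\$ "
```

The second part only has a visible effect in an interactive Bash session, for example when placed in your ~/.bashrc.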

7.1.1 My Byobu Prompt

I use Byobu regularly, and therefore, I built my personal custom prompt on top of Byobu's built-in custom prompt. If you have Byobu installed, you can run byobu-enable-prompt in a terminal, and it will add a line similar to the following in your ~/.bashrc:

[ -r /home/jarvis/.byobu/prompt ] && . /home/jarvis/.byobu/prompt   #byobu-prompt#

It will also create the ~/.byobu/prompt file referenced above. By default that file contains only one line:

. /usr/share/byobu/profiles/bashrc  #byobu-prompt#

Instead, I replace that file with a custom file that controls my prompt when in Byobu. My custom file is located in my system_configurations GitHub repo.

If you like my Byobu prompt but aren't ready to commit to learning Byobu, here's a small snippet that should approximate my prompt that you could put in your ~/.bashrc:

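GREEN="\[\033[0;32m\]"
# Assumed helper (it is called below but was not defined in this snippet):
# prints "*" when the working tree has uncommitted changes
function parse_git_dirty {
    [[ -n "$(git status --porcelain 2> /dev/null)" ]] && echo "*"
}
# Prints " [branch]" for the current branch when inside a Git repository
function parse_git_branch {
    git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e "s/* \(.*\)/ [\1$(parse_git_dirty)]/"
}
PS1="${debian_chroot:+($debian_chroot)}\[\e[38;5;202m\]\[\e[38;5;245m\]\u\[\e[00m\]@\[\e[38;5;13m\]\h\[\e[00m\]:\[\e[38;5;172m\]\w\[\e[00m\]$GREEN\$(parse_git_branch)\[\e[00m\]\$(printf '%s' '⟫') "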

7.1.2 Bash Git Prompt

This is a widely used prompt that contains a lot of useful Git info. They have a handy guide explaining what information is contained in the prompt. They also have a very easy two-step set of instructions on how to install and add this prompt to your terminal. Finally note that they document all of the configuration options that you have at your disposal.

7.2 Custom Git Interfaces

It is worth pointing out that many text editors and IDEs have custom interfaces to Git that can make day-to-day Git operations far more convenient. Here are a few that I use or know people that use regularly.

  • magit is an Emacs package that is incredibly powerful. I do basically all of my Git usage through magit. Last I heard there was active development on porting magit to Sublime and Atom.
  • Most people that use Atom seem to use git-control for Atom-Git integration.
  • Sublime has basic Git integration via the Sublime Git package.
  • VSCode has Git integration built-in.
  • MATLAB also has built-in Git support.
Creative Commons License
ME 495: Embedded Systems in Robotics by Jarvis Schultz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.