Introduction to git

What is Version Control

A tool for tracking changes to files and synchronizing updates between multiple developers.

Why Version Control?

  • Go back to previous versions
  • No saving of final_final_really_final_version.doc
  • Easily merge changes from multiple people
  • Keep a record of why you made changes along with those changes

Non-git version control

  • Concurrent Versions System (cvs)
  • Subversion (svn)
  • Mercurial
  • Perforce
  • BitKeeper (see footnote 1)

What is git?

  1. Git is a version control system originally developed by Linus Torvalds for use with the Linux kernel.
  2. Git is used by Google, Facebook, Microsoft, Netflix, Boston Dynamics, and the Linux Kernel (and almost everyone else too!).
  3. The source code for most programs on your machine is maintained in git

Why git?

Git enables you to easily keep a history of your changes, record the reasons behind them, and combine your work with others. It is ubiquitous, distributed, fast, and flexible. Used properly, it frees you to experiment with your code without worrying about whether you will irrevocably lose something important.

It is also a job requirement for nearly any robotics job involving software (which is to say almost every robotics job).

  1. Strive to be the person that everyone on the team comes to with git questions.
  2. Avoid being the person who messes up the repository for the team.
    • Almost every error in git is recoverable with some effort
    • Each of you will likely make some or even many git mistakes along the way. It is through fixing these mistakes that you may gain mastery of git.

Jupyter Notebooks

  1. While git works with any type of file, it works best with text files designed to be edited by humans.
  2. One type of file you may encounter that does not quite fit this model is a Jupyter notebook.
    • A Jupyter notebook combines source code (often Python) and its output in an interactive format
  3. If you are coding in a Jupyter notebook, I recommend that you look into using git along with jupytext.

Git Help

  1. Often, when you do something slightly wrong in git, git will suggest the appropriate action.
  2. Getting help on a command: git help <verb>
  3. Equivalent form: git <verb> --help
  4. The same documentation as a man page: man git-<verb>

The Three Areas of Git

  1. Working directory - The contents of your files on disk, as they appear with ls etc.
    • You edit files in your working directory as usual
  2. Commit History - The repository contains the full project history, consisting of all of the data that you have committed from the staging area.
  3. Staging Index - The holding area for the changes you make to your working directory.
    • After you change files, you explicitly add the changed files to the staging area, in preparation for saving to the repository (committing).
    • Often shortened to just Index
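
The three areas above can be watched with git status --short. The following is a minimal sketch, not part of the original demonstration: the file name notes.txt, the throwaway identity, and the temporary directory are assumptions made only so the example runs anywhere.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main .                  # -b requires git 2.28 or newer

echo "first draft" > notes.txt         # working directory: the file exists but is untracked
untracked=$(git status --short)        # "?? notes.txt"

git add notes.txt                      # staging index: the file is queued for the next commit
staged=$(git status --short)           # "A  notes.txt"

git commit -q -m "Add notes"           # commit history: the snapshot is saved
clean=$(git status --short)            # empty, nothing left to commit

echo "second draft" >> notes.txt       # edit again: the change lives only in the working directory
modified=$(git status --short)         # " M notes.txt"

printf '%s\n' "$untracked" "$staged" "[$clean]" "$modified"
```

In the two-column status code, the left column describes the staging index and the right column describes the working directory.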

Basic git usage

  1. A git repository is a directed acyclic graph (DAG) of objects, called commits.
    • git commit creates a commit, a snapshot of the files you are tracking in your working directory.
    • To track a file, add it to your staging area using git add
    • You can remove a tracked file using git rm. If you had previously committed it, it will still be recoverable from the repository.
    • You should write a message to accompany the commit. The message explains what the commit does.
    • Automatically stage and commit all changes to tracked files using git commit -a
    • Each commit is identified by a hash, a number calculated from the contents of the commit (the snapshot, its parent commits, and metadata such as the author and message).
    • The HEAD is a pointer to a commit corresponding to the current state of your repository.
  2. Use git status to see current information about the state of your repository.
    • You can run this frequently and there are bash scripts to add various levels of the information it provides to your prompt
  3. You can check out a commit to make your working directory match its state at that commit
    • A checkout moves your HEAD to a different commit and sets your working directory to correspond to that commit.
  4. You can branch to create a name for a commit (rather than using its hash).
    • Every git repository starts with a default branch, usually called either master or main (but it can be named anything).
    • You can create other branches. When you commit, the branch you are on moves to the new commit.
    • By maintaining multiple branches you create a DAG of commits. This enables you to experiment easily
  5. Use git switch to switch branches (see footnote 2)
  6. Use git restore to retrieve a file from a different commit (see footnote 2)
  7. Use git log to display information about the commits made on a branch and the associated meta-information
  8. Join changes from two branches together using git merge.
    • If the current branch has no new commits since the other branch split off from it, a fast-forward merge occurs, which simply moves the current branch forward to include the commits from the branch you are merging
    • Otherwise git creates a merge commit that combines both histories (other merge strategies also exist). If the same part of a file has been modified in incompatible ways, you get a merge conflict that must be resolved manually
    • Git provides tools for managing conflicting edits, but it is best to coordinate with your team to avoid the situation.
  9. When you create a file in the directory, git status lists it (and any other file git does not know about) as "untracked" until you add it.
    • You can prevent a file showing up as untracked by listing it in a .gitignore file
    • .gitignore also accepts globbing patterns (e.g., *.o will ignore all files ending in .o).
    • .gitignore is useful for keeping the repository clean and preventing user-generated files that should not be shared from cluttering git
    • Examples of files that should be in .gitignore
      • Results of compilation (e.g., anything generated from the source code)
      • Personal preferences that apply only on your machine
  10. Remotes connect different repositories and are created with git remote add.
    • The repositories can be in different directories or different computers.
    • Remotes contain branches that you can merge with the base repository.
    • A remote is usually a copy of your repository, but it need not be.
    • Git is distributed so there need not be a central repository. However most projects have a central repository in practice and host it on a website like GitLab, GitHub, BitBucket, or sourcehut.org
    • Git provides commands for keeping remote repositories in sync.
    • git fetch retrieves commits from a remote repository. The branches become available on your local repository for merging.
    • git pull is, by default, the same as git fetch followed by git merge. It is safer to run these commands separately.
    • git push updates a remote branch with the commits on your branch.
  11. Cloning: git clone creates a copy of a repository from another location
  12. Tagging: git tag can be used to label a commit. A tag is like a branch, but it usually stays on the same commit rather than moving as you commit.
    • Projects often provide tags for each release of their software
  13. Stash: git stash saves what you've been working on and restores your working directory to the last commit. You can then restore your changes from the stash. Useful if you need to quickly revisit your last commit without losing what you are currently working on.
  14. Rebase: git rebase. A tool for re-writing git history. Very powerful, used in some git workflows to keep history clean. If used in conjunction with a git workflow with strict rules can be quite useful.
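
Several of the commands in this list can be seen working together in a short session. This is a minimal sketch under assumptions of my own: the file main.c, the branch name experiment, the tag v0.1, and the throwaway identity are hypothetical and chosen only for illustration.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main .                  # -b requires git 2.28 or newer

echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Add main.c"

# A branch is a name for a commit; it moves forward as you commit on it.
git switch -q -c experiment
echo "/* tweak */" >> main.c
git commit -q -a -m "Tweak main.c"

# main has no commits of its own since the split, so this merge is a fast-forward.
git switch -q main
git merge -q --ff-only experiment
main_hash=$(git rev-parse main)
experiment_hash=$(git rev-parse experiment)

# Label the current commit, then keep build products out of git status.
git tag v0.1
echo "*.o" > .gitignore
touch main.o
git add .gitignore
git commit -q -m "Ignore object files"
ignored_status=$(git status --short)   # empty: main.o is ignored
tag_hash=$(git rev-parse v0.1)

git --no-pager log --oneline --decorate
```

After the last commit, main has moved on while the tag v0.1 still names the commit where it was created.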

Git Tips

  1. When in doubt, copy your repository, and try out commands. Repeat until you get the desired result
    • It is very hard to actually lose data in git
    • That does not mean the commands to recover are always straightforward
    • If you just copy your repo, you can experiment without worry.
  2. Commit often! You should try to code incrementally. Set many small goals, and commit each time you accomplish one.
    • Commits create milestones that you can explore and restore your code from at any time
    • I can predict how well a project works based on the number of commits
  3. Don't add generated content, such as compiled files, to your repository. Put these files in your .gitignore
  4. Avoid adding large binary data files such as databases or large datasets
    • These take up a lot of storage when they change, since git needs to store a whole new copy
    • There are tools for handling large binary files like git annex or git-lfs
  5. Keep small formatting changes in separate commits from substantive changes. Try to lump small formatting changes together
    • Makes the history clear. Try to follow a style guide when coding.
  6. Write good commit messages that clearly explain what your commit did.
    • The first line is a summary. Then there is a blank line, followed by more details. Don't forget the more details part!
    • Many projects limit line widths to 80 characters for easy display on a narrow part of the screen
  7. Employ a git workflow for your team.
    • A company will likely have some practices already in place.
    • I provide a workflow below
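
As an illustration of the commit message layout described in tip 6, here is a minimal sketch. The file gains.txt and the message text are invented for the example, and passing -m twice is just one way to produce the summary, blank line, and details without opening an editor.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main .                  # -b requires git 2.28 or newer
echo "kp = 1.5" > gains.txt            # hypothetical file
git add gains.txt

# Each -m becomes its own paragraph; git separates paragraphs with a blank line.
git commit -q \
    -m "Add proportional gain for wheel controller" \
    -m "The robot overshot its target with the old default value.
Storing the gain in gains.txt lets us tune it without recompiling."

subject=$(git log -1 --format=%s)      # first line (the summary)
body=$(git log -1 --format=%b)         # everything after the blank line
git --no-pager log -1
```

Tools such as git log --oneline show only the summary line, which is why it should make sense by itself.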

Simple Git Workflow

Here is a workflow that I have found useful for small student groups. The main benefits of this workflow are

  1. The main branch always has working code.
  2. git will force you to incorporate your teammates' changes prior to changing main

Setup

  1. There is a central repository called origin. Everybody clones origin
  2. The main branch on the central repository must always contain the latest working code.
    • Your local main branch simply tracks origin/main. You never edit code on main directly.
  3. Everybody works in branches of the form name/feature.
    • For smaller projects it is okay if each person has just one feature, but you can have as many branches as you want
    • Do not ever create a branch called just name since it prevents you from making name/feature later due to a quirk in git.
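
The branch naming quirk mentioned above can be reproduced in a scratch repository. This is a minimal sketch: the name elwin is borrowed from the demonstration, and the explanation in the comments assumes git's usual behavior of refusing a branch name that would need another branch name to act as a directory.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main .                  # -b requires git 2.28 or newer
git commit -q --allow-empty -m "Initial commit"

git branch elwin                       # a branch called just "name"
if git branch elwin/feature 2>/dev/null; then
    quirk="absent"
else
    quirk="present"                    # "elwin" cannot be both a branch and a prefix of branches
fi
echo "name/feature blocked by branch 'name': $quirk"

# Removing the plain "name" branch makes name/feature possible again.
git branch -q -d elwin
git branch elwin/feature
```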

Process

  1. Get the latest code from your team. You always track this progress on your local main branch:
    1. git switch main to be sure you are on your main branch
    2. git fetch origin to retrieve the latest updates
    3. git merge --ff-only origin/main to fast-forward your local main branch to the fetched changes.

      Details
      • If this merge fails, it means you edited code on main directly (which is not permitted in this workflow).
      • How to recover depends on what happened, but here is one way to (hopefully) get you back on the workflow path:
        1. Commit your current work.
        2. Create a feature branch at the current location: git branch name/feature
        3. Reset the main branch to the origin location: git reset --hard origin/main (your work remains safe on name/feature)
        4. The git merge --ff-only should now succeed and you can continue with the regular workflow.
    4. main is now up to date with origin/main.
    5. In the normal workflow, this step is the only time you ever modify anything on main
  2. Incorporate the code from the team into your branch
    1. If starting a new branch:
      • git switch -c name/feature to create it and switch to it.
      • You can now begin coding
    2. If working on an existing branch:
      • git switch name/feature to switch to the branch
      • git rebase main to incorporate the upstream changes.

        Details

        This command will

        • Temporarily remove all the commits you have worked on since you last diverged from main
        • Insert the commits from main that you had not yet incorporated into your branch
        • Replay your commits on top of the last commit in main. (So it will be as if you made your changes against the latest main branch).
        • If there are conflicts, prompt you to fix them and continue.

        You can always abort the rebase with git rebase --abort and communicate with your team if you are unsure about the nature of the conflicts.

    3. Test your code and make sure it works before proceeding.
  3. When ready to merge your feature branch to the remote main
    • git push origin name/feature:main
    • If nobody has pushed to main since your last fetch, this will go smoothly. Coordinate with your team prior to pushing to main!
    • If somebody has pushed to main, your push will be rejected and you must incorporate their work before pushing. Go to step 1.
  4. If you want to share code with some group members but are not ready to commit to main, you may establish other shared branches called share/feature. They behave like the main branch except the bar for committing/pushing to them can be set to an arbitrarily low level (i.e., you may even agree to commit broken code that does not compile to it).
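
The process above can be rehearsed end to end on one machine. The sketch below is not the author's demonstration: the developers alice and bob, the directory names, and the file names are hypothetical, and a local bare repository stands in for the central origin.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
# Build a stand-in central repository with one commit, then clone it twice.
git init -q -b main seed               # -b requires git 2.28 or newer
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed central.git
git clone -q central.git alice
git clone -q central.git bob

# Alice works on her feature branch.
cd alice
git switch -q -c alice/feature
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Add alice.txt"

# Meanwhile Bob finishes first and pushes his feature to main.
cd ../bob
git switch -q -c bob/feature
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Add bob.txt"
git push -q origin bob/feature:main

# Alice's push is rejected because main on the central repository has moved.
cd ../alice
if git push -q origin alice/feature:main 2>/dev/null; then
    first_push="accepted"
else
    first_push="rejected"
fi

# Step 1: update local main, fast-forward only.
git switch -q main
git fetch -q origin
git merge -q --ff-only origin/main

# Step 2: replay the feature branch on top of the updated main.
git switch -q alice/feature
git rebase -q main

# Step 3: the push is now a fast-forward of main on the central repository.
git push -q origin alice/feature:main
pushed=$(git rev-parse origin/main)
feature=$(git rev-parse alice/feature)
count=$(git rev-list --count origin/main)
echo "first push: $first_push, commits on origin/main: $count"
```

The rejected push is harmless: nothing on the central repository changes until Alice has incorporated Bob's work.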

Extensions

One shortcoming of this workflow is that your name/feature branch only exists on your local computer. If you are properly backing up your system, this should not be a problem; however it is probably a good idea to always push feature branches to a remote repository just in case.

  1. One option is to make a secondary copy of your repository and branches at a different remote (called, for example, myorigin).
    • You can push your feature branches to this repository as a backup, without sharing them with your team
    • This fork of your main repository can also be useful when showing off your GitHub: the project is now on your own account (just make sure to credit your teammates in the README).
  2. Another option is to push name/feature to origin. As long as everybody on the team agrees that you are the only person who can use that branch everything should work.
    • When you rebase name/feature you will need to force push (git push -f) to your own feature branch since you are changing the history of the branch.
    • It is easiest to have your other teammates never check out name/feature
    • If you violate the previous point, you will need to recover.
Recovery

This is an advanced topic and you want to follow the workflow above to avoid ending up here.

  1. You and a partner are working on a common branch.
  2. The commits on origin/common look like O -> A
  3. Partner has rebased and partner/common looks like O -> B' -> C' -> A'.
  4. You have done some work and mine/common looks like O -> A -> D -> E.
  5. Partner force pushes. origin/common now looks like O -> B' -> C' -> A' (i.e., it exactly matches partner/common).
  6. mine/common and origin/common have now diverged since the rebase replaced commits with copies.
  7. You cannot push (well, you can force push but it will overwrite your partner's work)
  8. You cannot pull, since you have O -> A and origin/common has O -> B'. Which history is correct? Git does not know.
  9. To recover, start while on mine/common.
  10. Create and switch to a new branch off of mine/common called mine/recover (git switch -c mine/recover).
  11. Commit any uncommitted changes (they are now saved on mine/recover)
  12. git switch mine/common
  13. git reset --hard origin/common (you will lose uncommitted changes, which is why we created mine/recover)
  14. Now mine/common looks like O -> B' -> C' -> A', and mine/recover looks like O -> A -> D -> E
  15. You can continue to develop mine/common and you can access the work you had done on mine/recover
    • Getting your changes back onto mine/common will likely be a manual process involving merge, rebase, and possibly cherry-pick
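
The scenario and the recovery steps can be replayed in a scratch directory. This is only a sketch under several assumptions: the local branches are named common and recover (standing in for mine/common and mine/recover), each commit adds its own file so nothing conflicts, the partner's rebase is onto a main branch containing B and C, and cherry-pick is shown as just one of the possible ways to bring D and E back.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Hypothetical helper: one commit per letter, each adding its own file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

cd "$(mktemp -d)"
git init -q -b main seed               # -b requires git 2.28 or newer
cd seed
make_commit O
git switch -q -c common
make_commit A                          # common: O -> A
git switch -q main
make_commit B
make_commit C                          # main:   O -> B -> C
cd ..
git clone -q --bare seed origin.git
git clone -q origin.git partner
git clone -q origin.git mine

# Partner rebases common and force pushes: origin/common becomes O -> B -> C -> A'.
cd partner
git switch -q common
git rebase -q main
git push -q -f origin common

# You kept working on the old history: O -> A -> D -> E, plus uncommitted work.
cd ../mine
git switch -q common
make_commit D
make_commit E
echo "half-finished idea" > wip.txt

# Recovery: save everything on a side branch, then move common to match origin.
git fetch -q origin
git switch -q -c recover
git add wip.txt
git commit -q -m "WIP before recovery"
git switch -q common
git reset -q --hard origin/common

# One possible way to get D and E back (recover~2 is D, recover~1 is E).
git cherry-pick recover~2 recover~1 >/dev/null
count=$(git rev-list --count common)
history=$(git log --reverse --format=%s | paste -sd' ' -)
echo "$history"
```

The unfinished work in wip.txt stays on the recover branch until you decide what to do with it.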
Discussion

This section is a more advanced discussion about two key design choices in the workflow.

  1. Why is it based on rebase and not merge?
    • Merging makes the most sense when each branch corresponds to a single feature and then that feature is merged into the main branch (which represents the latest working code). In this way the order in which features were developed is maintained and it is clear when (chronologically) each feature was implemented and added.
    • In our development environment, each branch usually does not nicely correspond to a single discrete feature. Instead, the main code is always changing, and rebasing allows each developer to pretend that their branch started with the most up-to-date working version of the code.
    • In this workflow we never rebase the main branch, so history in the main branch follows the history of each developer getting their code into main in a linear fashion.
  2. Why not just merge the feature branch into main and then push?
    • If somebody has made changes on main before you attempt to push, you won't be able to push (which alerts you to this situation). However, you will have already merged all the commits you made into your local main branch, which means that you need to take some corrective action to make your main branch incorporate the changes (e.g., merging into main, rebasing main, or even resetting (i.e., moving) a branch). These actions are doable and even the norm in many git workflows, but they amount to an extra maintenance step and require additional thought.
    • By using the simple git workflow, git prevents you from pushing changes to the remote if you have not incorporated the latest changes from main. Because this check happens before you modify your local main branch, the operations you run on main are simplified: you only ever do a fast-forward merge on main. Meanwhile, you can decide whether you want to incorporate the latest changes from main into your branch right away (usually but not always a good idea) or keep working on the branch as is. Whichever you decide, you do not need to take any special actions (which is not true if you directly merged your code into the local main branch).

Git hosting services

  1. Some popular git hosting services are GitLab, GitHub, BitBucket, and sourcehut.
  2. These are convenient places to store your repositories and collaborate with others.
  3. Don't conflate git and GitHub. Git is a software tool distributed freely on the internet, whereas GitHub is one of several companies that provide services related to that tool.
  4. Many have items surrounding git that arguably enhance it (but also lock you into the particular provider) such as
    • Issues: Enable users to file a bug report with your program
    • Pull Requests: Enables another user to alert you that they have a contribution to your code ready to go
    • Forking: Tracks that a repository is a clone of another repository
    • Static Website Hosting: Host a static website for your project or portfolio
    • Wiki: Like wikipedia, but for your project
    • Gists: A quick way to host a single file
    • Permissions: Prevent people from performing actions on the repository
    • Organizations: Groups of repositories

How to learn

  1. Use git often. Use it for everything.
  2. Use the command line rather than GUI tools or GitHub to manipulate the repository.
  3. Use gitk --all to see what is happening in your repository.
  4. Do not just blindly run commands. Especially when getting started, take the time to read about them, experiment with them in a copy of your repository and understand what they are doing.
  5. Explore the Resources provided here

Git Prompt

  1. It is extremely useful to set up your terminal to show information about git
  2. To add this to bash you must modify the PS1 variable in your ~/.bashrc
    • The PS1 variable controls the look of your prompt
    • Add $(__git_ps1 "(%s)") to where PS1 is set in ~/.bashrc, just before the \$
  3. This is just the basic prompt showing you the branch. There are many other possibilities out there
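
A prompt line following the instructions above might look like the sketch below. The surrounding prompt pieces (\u@\h:\w) are only an example, and the fragment assumes that __git_ps1 is already defined in your shell, for example by the git prompt script that many distributions install alongside git.

```shell
# Fragment for ~/.bashrc (assumes __git_ps1 is already defined in the shell).
# Single quotes delay evaluation, so the branch is looked up every time the prompt is drawn.
PS1='\u@\h:\w$(__git_ps1 "(%s)")\$ '
```

Inside a repository the current branch then appears in parentheses just before the $.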
Demonstration

Here is the code from the live demonstration

Basic Branching

git init --bare central # Mimic a central repository (e.g., on GitHub)
git clone central repo1
cd repo1
git status
echo "Good Code" > file1.txt
git add file1.txt
git commit  # First commit
git status
gitk --all &
echo "More code" >> file1.txt
git commit -a # Fixed a bug!
git branch br1
git status
git switch br1
git status
echo "this is branch1" >> file1.txt
git commit -a # Another bug is fixed!
less file1.txt
git checkout main
less file1.txt

Conflicts!

echo "I have other ideas" >> file1.txt
git commit -a # Conflict created!
git merge br1
git status
git merge --abort
git merge br1
cat file1.txt
git checkout --ours file1.txt
git checkout --theirs file1.txt
git add file1.txt
git status
git commit #fixed conflict
git push

Clone Me

cd ..
git clone central repo2
cd repo2
ls
git remote -v
git switch -c elwin/feature
echo "I Solved IT!" > file4.txt
git add file4.txt
git commit # Got it
gitk --all &

Meanwhile bash

cd ../repo1
echo "important bug fix" >> file2.txt
git add file2.txt
git commit # That's it
git push

Ready to commit

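cd ../repo2
git push origin elwin/feature:main   # Rejected: main on central has a commit we do not have
git fetch                            # Retrieve the new commits from central
git checkout main
git merge                            # Fast-forward local main to origin/main
git switch elwin/feature
git rebase main                      # Replay the feature commit on top of the updated main
git push origin elwin/feature:main   # Accepted: now a fast-forward of main on central
git checkout main
git pull                             # Bring local main up to date with what was pushed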

Exercises

Follow along with these Git exercises to practice your git usage. Along the way, if you don't understand something, use the built-in help (git help <verb>) to read more about the command.

You should also follow along with gitk. Note: if you run gitk before making any commits, you will get an error message pop up. Just click Ok. After your first commit, refresh gitk (Shift-F5) and you should see your commits.

After every action on the repository you must refresh gitk to see the changes reflected.

Basic Operations

  1. First time git setup:

    git config --global user.name "Your Name"
    git config --global user.email "Your email"
    git config --global core.editor YourEditor
    git config --global init.defaultBranch main
    

    (for VS Code use 'code --wait' as YourEditor)

  2. Create a new local repository: Create a directory using mkdir, cd into it, and use git init
  3. Create files on your main branch.
    1. Create your first file, quote.txt, and put some content in it (such as a quote that you like).
    2. Run git status to see that there is now a file in your repository that is not tracked by git.
    3. Use git add to have git track the file and place it in the staging area. Confirm with git status
    4. Use git commit to commit the file to the repository. Write a commit message describing what you did.
  4. Launch gitk, and set it to view all branches:
    • gitk --all & (the & runs it in the background so you can keep using the same terminal).
    • After each git operation, use Shift-F5 in the gitk window to update its view of the repository.
    • Try to predict what you are going to see before refreshing the view

      git1.png
  5. Create a second file called robot.txt and commit it to the repository.
    • This file should name your favorite robot and why you like it.

      git2.png
  6. Add another quote to quote.txt and commit the changes.
    • Use git add to stage the file in the index.
    • Use git status to see what git add did.
      • Notice you can unstage the file by using git restore --staged quote.txt.
    • Use git commit -a to automatically stage all tracked files and create a commit.
    • Write a good log message. Remember, the better your log messages are, the more useful git becomes

      git3.png
  7. Create a new branch called test in the repository and switch to it.
    • You can do this with a single command or with a sequence of different commands.
    • Some commands that may be useful are branch and switch.

      git4.png
  8. Append some more information about your favorite robot to robot.txt (on a new line), delete the quote.txt file, and commit the changes.

    • git commit -a will automatically stage the changes to these files (because you have previously added them to git)
    • You can also stage them manually with git add and git rm
    git4_1.png
  9. Switch to the main branch.

    • What has happened to the changes you made on the other branch?
    • What changed in git status and gitk?
    git5.png
  10. Merge the changes from test into main

    • Because main has no new commits since you created test, there should be no conflicts and the merge will go through as a fast-forward merge, essentially just taking the commit from test and adding it to your current main branch
    • If you also committed changes to robot.txt on main in the meantime, you may get merge conflicts
      • use git status to see what files are affected
      • You can edit the affected files, add them, and commit the changes to complete the merge.
    git6.png
  11. Create a file that should not be tracked: for example a zip file of the contents of your repository, using the zip command (zip <zipfile> <files_to_add>).
  12. Use git status to see that the file is not tracked. Create a .gitignore file to ignore the zip file and verify that the zip file does not show up in git status. Commit .gitignore to the repository.

Remotes

  1. Log in to your GitHub account
    • GitHub has many convenience functions that you can use to modify your repository directly from their website.
    • Avoid these features: they create commits directly on your remote repository, which can lead to confusion.
  2. On GitHub, create a new private repository and don't create any files.
  3. Add your ssh key to your GitHub account: https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account
  4. On your local repository, use git remote -v to see remote repositories. You shouldn't see anything.
    • Technically a git remote can point to a git repository anywhere, including somewhere else on your hard drive
    • You do not need a third-party service like GitHub to make git remotes.
  5. The URL for your repository is git@github.com:user-name/repository-name.git
    • If you created the repository on GitHub before doing anything on your system, you would git clone the repository
      • This will automatically add a remote called origin
    • In this case, we already have a repository so instead we just need to manually create the remote origin.
    • There is also an https URL, which is useful for cloning public repositories, but you cannot use ssh keys with this method.
      • The https URL follows the pattern https://github.com/user-name/repository-name.git
  6. Add your remote git repository as a remote called origin.
    • git remote add origin <url_to_repository>
  7. Push your main branch to origin and set up your main branch to track origin/main:
    • git push -u origin main
    • Now that your local branch is tracking a remote branch, git push will automatically push your main to origin/main
    • Be sure to look in gitk to see both your local and remote branches
    • Look on GitHub and see that your changes were indeed pushed.
  8. Make sure your ssh keys are working: if they are working you need not enter a password when doing a git push
    • You may need to log out and log back in for a new key to be unlocked
    • See Linux Notes for how to generate an ssh key
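
As noted in step 4, a remote does not have to live on a hosting service. The following minimal sketch rehearses steps 4 through 7 with a bare repository in a neighboring directory playing the role of GitHub; the directory names backup.git and project and the throwaway identity are assumptions for illustration.

```shell
set -e
# Throwaway identity so the example runs without global git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q --bare backup.git          # empty repository that will act as the remote
git init -q -b main project            # -b requires git 2.28 or newer
cd project
echo "hello" > quote.txt
git add quote.txt
git commit -q -m "Add quote"

before=$(git remote -v)                # empty: no remotes yet
git remote add origin ../backup.git    # a path works where a URL would normally go
git push -q -u origin main             # -u makes main track origin/main
upstream=$(git rev-parse --abbrev-ref "main@{upstream}")
git remote -v
echo "main tracks $upstream"
```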

Collaboration

You will now be divided into teams.

  1. Select one team member to be the partner. Everyone should clone that partner's repository from GitHub.
    • The creator of the GitHub repository should add their partner as a collaborator so that both partners may push
  2. Create a new feature branch in the partner repository named yourname/aboutme.
  3. Create a file called yourname.txt.
    • In this file write two true statements about yourself and one false statement.
    • Place each statement on its own line.
    • This is the "feature" you are adding to the "program"
  4. Commit your changes to your local branch.
  5. One by one, each team member should merge their feature into main
    • Follow the Simple Git Workflow
    • Features should be fully merged one at a time.
    • Pushes and merges should go smoothly.
      • You have no conflicts because you edited different files
  6. At the end of this, everyone should update their local main branch to match origin and their feature branch should be up-to-date as well.

Simultaneous editing

We will now engage in some mayhem.

  1. Everybody should edit every yourname.txt file (on their own feature branches)
    • Place $username.T next to each line that you think is true, and $username.F next to each line you think is false (substitute your username for $username)
  2. Each team member should simultaneously attempt to merge their feature into main
    • Follow the Simple Git Workflow, as before
    • People on your team will encounter conflicts. Resolve them by combining the T/F ratings onto a single line and committing the files.
  3. When you are done, everyone's vote should be on every line. How well did you do?

Resources

  1. Pro Git Book: A definitive guide to git. Definitely recommend reading the first few chapters.
  2. Git from the Bottom Up: Learn git by understanding its internal data structures.
  3. Learning Git Interactive Tutorial.
  4. Git Immersion: step-by-step walkthrough
  5. GitHub Guides: Guides for git by GitHub
  6. Learn more about git Workflows

Footnotes:

1. Linus Torvalds wrote git so he could stop using BitKeeper.

2. The git switch and git restore commands are relatively recent additions. Prior to their introduction, git checkout (somewhat confusingly) served both of these purposes and you will still encounter many references to git checkout.

Author: Matthew Elwin.