Introduction to git
What is Version Control?
A tool for tracking changes to files and synchronizing updates between multiple developers.
Why Version Control?
- Go back to previous versions
- No more saving files as final_final_really_final_version.doc
- Easily merge changes from multiple people
- Keep a record of why you made changes along with those changes
Non-git version control
- Concurrent Versions System (cvs)
- Subversion (svn)
- Mercurial
- Perforce
- BitKeeper[1]
What is git?
- Git is a version control system originally developed by Linus Torvalds for use with the Linux kernel.
- Git is used by Google, Facebook, Microsoft, Netflix, Boston Dynamics, and the Linux Kernel (and almost everyone else too!).
- The source code for most programs on your machine is maintained in git.
Why git?
Git enables you to easily keep a history of your changes and the reasons behind them, and to combine your work with others'. It is ubiquitous, distributed, fast, and flexible. Used properly, it frees you to experiment with your code without worrying about whether you will irrevocably lose something important.
It is also a job requirement for nearly any robotics job involving software (which is to say almost every robotics job).
- Strive to be the person that everyone on the team comes to with git questions.
- Avoid being the person who messes up the repository for the team.
- Almost every error in git is recoverable with some effort
- Each of you will likely make some, or even many, git mistakes along the way. It is through fixing these mistakes that you may gain mastery of git.
Jupyter Notebooks
- While git works with any type of file, it works best with text files designed to be edited by humans.
- One type of file you may encounter that does not quite fit this model is a Jupyter notebook.
- A Jupyter notebook combines source code (often Python) and its output in an interactive format.
- If you code in Jupyter notebooks, I recommend that you look into using git along with jupytext.
Git Help
- Often, when you do something slightly wrong in git, git will suggest the appropriate action.
- Getting help:
git help <verb>
git <verb> --help
man git-<verb>
The Three Areas of Git
- Working directory - The contents of your files on disk, as they appear with ls etc.
- You edit files in your working directory as usual.
- Commit History - The repository contains the full project history, consisting of all of the data that you have committed from the staging area.
- Staging Index - The holding area for the changes you make to your working directory.
- After you change files, you explicitly add the changed files to the staging area, in preparation for saving to the repository (committing).
- Often shortened to just Index
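To see the three areas in action, here is a minimal sketch you can paste into a terminal. It assumes git 2.28 or newer (for git init -b) and works in a throwaway directory created by mktemp; the file name notes.txt and the identity settings are made up for illustration.

```shell
set -e
demo=$(mktemp -d)                  # throwaway directory; nothing real is touched
cd "$demo"
git init -q -b main areas          # -b needs git 2.28 or newer
cd areas
git config user.name "Example User"         # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "first line" > notes.txt      # 1. edit the working directory
git status --short                 # ?? notes.txt  (untracked)

git add notes.txt                  # 2. copy the change into the staging index
git status --short                 # A  notes.txt  (staged, not yet committed)

git commit -q -m "Add notes"       # 3. save the staged snapshot to the commit history

echo "second line" >> notes.txt    # edit again without staging
git diff --stat                    # working directory vs. index: shows the new edit
git diff --staged --stat           # index vs. last commit: prints nothing
```

Running git diff and git diff --staged side by side like this is a quick way to check which area a change currently lives in.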
Basic git usage
- The history of a git repository is a directed acyclic graph (DAG) of objects called commits.
- git commit creates a commit: a snapshot of the tracked files as they are in the staging area.
- To track a file, add it to your staging area using git add.
- You can remove a tracked file using git rm. If you had previously committed it, it will still be recoverable from the repository.
- You should write a message to accompany the commit. The message explains what the commit does.
- Stage and commit all changes to tracked files in one step using git commit -a
- Each commit is identified by a hash, a number calculated from the commit's contents (its snapshot, its parent commits, and metadata such as the author and message).
- HEAD is a pointer to the commit you currently have checked out (usually indirectly, through the current branch).
- Use git status to see current information about the state of your repository.
- You can run this frequently, and there are bash scripts that add various levels of the information it provides to your prompt.
- You can check out a commit (git checkout) to make your working directory match its state at that commit.
- A checkout moves your HEAD to a different commit and sets your working directory to correspond to that commit.
- You can create a branch (git branch) to give a commit a name (rather than its hash).
- Every git repository starts with a default branch, usually called either master or main (but it can be named anything).
- You can create other branches. When you commit, the current branch moves forward to point at the new commit.
- By maintaining multiple branches you create a branching DAG of commits. This enables you to experiment easily.
- Use git switch to switch branches[2]
- Use git restore to retrieve a file from a different commit[2]
- Use git log to display information about the commits made on a branch and the associated meta-information
- Join changes from two branches together using git merge.
- If the current branch has no new commits of its own since the two branches diverged, a fast-forward merge occurs: git simply moves the current branch forward to the latest commit of the branch you are merging.
- Otherwise, git creates a merge commit that combines both branches. If the branches did not edit the same places in a file, this happens automatically.
- Other merge strategies also exist. If a file has been modified in incompatible ways, you get a merge conflict that must be resolved manually.
- Git provides tools for managing conflicting edits, but it is best to coordinate with your team to avoid the situation.
- When you create a new file in your working directory, git status lists it (and any other file git does not know about) as "untracked".
- You can prevent a file from showing up as untracked by listing it in a .gitignore file.
- .gitignore also accepts globbing patterns (e.g., *.o will ignore all files ending in .o).
- .gitignore is useful for keeping the repository clean and preventing user-generated files that should not be shared from cluttering git.
- Examples of files that should be in .gitignore:
- Results of compilation (more generally, anything generated from the source code)
- Personal preferences that apply only on your machine
- Remotes connect different repositories and are created with git remote add.
- The repositories can be in different directories or on different computers.
- Remotes contain branches that you can merge into your local branches.
- A remote is usually a copy of your repository, but it need not be.
- Git is distributed, so there need not be a central repository. However, most projects have a central repository in practice and host it on a website like GitLab, GitHub, Bitbucket, or sourcehut.org.
- Git provides commands for keeping remote repositories in sync:
- git fetch retrieves commits from a remote repository. Its branches become available in your local repository (as remote-tracking branches such as origin/main) for merging.
- git pull is (by default) the same as git fetch followed by git merge. It is safer to run these commands separately.
- git push updates a remote branch with the commits on your branch.
- Cloning: git clone creates a copy of a repository from another location.
- Tagging: git tag can be used to label a commit. A tag is like a branch, but it usually stays on the same commit instead of moving forward.
- Projects often provide tags for each release of their software.
- Stash: git stash saves what you've been working on and restores your working directory to the last commit. You can then restore your changes from the stash (git stash pop). This is useful if you need to quickly revisit your last commit without losing what you are currently working on.
- Rebase: git rebase is a tool for rewriting git history. It is very powerful and is used in some git workflows to keep history clean; in conjunction with a git workflow that has strict rules, it can be quite useful.
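The following sketch strings together several of the commands above (branching, a fast-forward merge, tagging, .gitignore, and stash) in a throwaway repository. It is only an illustration: it assumes git 2.28 or newer, and the names experiment, v1.0, and main.o are hypothetical.

```shell
set -e
demo=$(mktemp -d)                  # throwaway directory
cd "$demo"
git init -q -b main basics         # -b needs git 2.28 or newer
cd basics
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

echo "Good code" > file1.txt
git add file1.txt
git commit -q -m "Add file1"

# A branch is a movable name for a commit; committing on it moves it forward
git switch -q -c experiment
echo "experimental idea" >> file1.txt
git commit -q -a -m "Try an idea"

# main has not moved since experiment was created, so this is a fast-forward
git switch -q main
git merge -q --ff-only experiment

# A tag labels the current commit and, unlike a branch, stays there
git tag v1.0

# Ignore every file ending in .o (a glob pattern)
echo "*.o" > .gitignore
touch main.o                       # pretend build product; git status will not list it
git add .gitignore
git commit -q -m "Ignore object files"

# Set unfinished work aside, then restore it
echo "half-finished thought" >> file1.txt
git stash -q                       # working directory is back to the last commit
git stash pop -q                   # the unfinished edit is back
```

Following along in gitk --all while running these commands one at a time shows how the branch, tag, and HEAD labels move.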
Git Tips
- When in doubt, copy your repository and try out commands on the copy. Repeat until you get the desired result.
- It is very hard to actually lose data in git
- That does not mean the commands to recover are always straightforward
- If you just copy your repo, you can experiment without worry.
- Commit often! You should try to code incrementally: set many small goals, and commit each time you accomplish one.
- Commits create milestones that you can explore and restore your code from at any time
- I can predict how well a project works based on the number of commits
- Don't add generated content, such as compiled files, to your repository. Put these files in your .gitignore.
- Avoid adding large binary data files such as databases or large datasets.
- These take up a lot of storage when they change, since git needs to store a whole new copy.
- There are tools for handling large binary files, such as git-annex or git-lfs.
- Keep small formatting changes in separate commits from substantive changes. Try to lump small formatting changes together.
- This makes the history clear. Try to follow a style guide when coding.
- Write good commit messages that clearly explain what your commit did.
- The first line is a summary. Then there is a blank line, followed by more details. Don't forget the more details part!
- Many projects limit line widths to 80 characters for easy display on a narrow part of the screen
- Employ a git workflow for your team.
- A company will likely have some practices already in place.
- I provide a workflow below
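As a sketch of the commit-message format described in the tips above: each -m option to git commit becomes a separate paragraph, so the first -m gives the summary line and git inserts the blank line before the body. The file robot.cfg and the message text are made up for illustration, and git 2.28 or newer is assumed.

```shell
set -e
demo=$(mktemp -d)                  # throwaway directory
cd "$demo"
git init -q -b main messages       # -b needs git 2.28 or newer
cd messages
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

echo "speed = 0.5" > robot.cfg
git add robot.cfg

# First -m: the summary line. Second -m: the body. Git separates the two
# paragraphs with a blank line. Lines are wrapped well under 80 characters.
git commit -q -m "Lower the default robot speed" \
  -m "The old default made the robot overshoot its waypoints during
testing. Use a lower speed until the controller is retuned."

git log -1 --format=%s             # prints only the summary line
git log -1 --format=%B             # prints summary, blank line, and body
```

For longer messages, running git commit without -m and writing the message in your editor produces the same layout.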
Simple Git Workflow
Here is a workflow that I have found useful for small student groups. The main benefits of this workflow are:
- The main branch always has working code.
- Git forces you to incorporate your teammates' changes before changing main.
Setup
- There is a central repository called origin. Everybody clones origin.
- The main branch on the central repository must always contain the latest working code.
- Your local main branch simply tracks origin/main. You never edit code on main directly.
- Everybody works in branches of the form name/feature.
- For smaller projects it is okay if each person has just one feature, but you can have as many branches as you want.
- Do not ever create a branch called just name, since it prevents you from making name/feature later due to a quirk in git (branch names behave like file paths, so name cannot be both a branch and a directory of branches).
Process
1. Get the latest code from your team. You always track this progress on your local main branch:
- git switch main to be sure you are on your main branch
- git fetch from origin to retrieve the latest updates
- git merge --ff-only to fast-forward your local main branch to the fetched changes (origin/main)
- If this merge fails, it means you edited code on main directly (which is not permitted in this workflow).
- How to recover depends on what happened, but here is one way that should get you back on the workflow path:
- Commit your current work.
- Create a feature branch at the current location: git branch name/feature
- Reset the main branch to the origin location: git reset --hard origin/main (your commits are safe on name/feature)
- The git merge --ff-only should now succeed, and you can continue with the regular workflow.
- Your local main is now up to date with origin/main.
- In the normal workflow, this step is the only time you ever modify anything on main.
2. Incorporate the code from the team into your branch.
- If starting a new branch: git switch -c name/feature to create it and switch to it.
- You can now begin coding.
- If working on an existing branch: git switch name/feature to switch to the branch, then git rebase main to incorporate the upstream changes.
This command will:
- Temporarily remove all the commits you have made since your branch last diverged from main.
- Insert the commits from main that you had not yet incorporated into your branch.
- Replay your commits on top of the last commit in main (so it will be as if you made your changes against the latest main branch).
- If there are conflicts, stop and prompt you to fix them. After fixing a conflict, git add the fixed files and run git rebase --continue.
- You can always abort the rebase with git rebase --abort and communicate with your team if you are unsure about the nature of the conflicts.
3. Test your code and make sure it works before proceeding.
4. When ready to merge your feature branch into the remote main: git push origin name/feature:main
- If nobody has pushed to main since your last fetch, this will go smoothly. Coordinate with your team prior to pushing to main!
- If somebody has pushed to main since your last fetch, your push will be rejected and you must incorporate their work before pushing. Go to step 1.
- If you want to share code with some group members but are not ready to commit to main, you may establish other shared branches called share/feature. They behave like the main branch, except that the bar for committing/pushing to them can be set arbitrarily low (e.g., you may even agree to commit broken code that does not compile).
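The whole workflow can be rehearsed on one machine. In this minimal sketch, a bare repository named origin.git stands in for the central repository, and two hypothetical teammates, alice and bob, each work in their own clone. Bob's first push is rejected because Alice pushed first; he then goes back to step 1 and rebases before pushing again. It assumes git 2.28 or newer, and all names are made up for illustration.

```shell
set -e
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"

# A bare repository stands in for the central repository (e.g., on GitHub)
git init -q --bare -b main origin.git      # -b needs git 2.28 or newer

# Seed origin with a first commit so that the clones get a main branch
git init -q -b main seed
cd seed
git config user.name "Seed"
git config user.email "seed@example.com"
echo "v0" > README
git add README
git commit -q -m "Initial commit"
git push -q "$demo/origin.git" main
cd "$demo"

# Setup: hypothetical teammates alice and bob each clone origin
for who in alice bob; do
  git clone -q "$demo/origin.git" "$demo/$who"
  git -C "$demo/$who" config user.name "$who"
  git -C "$demo/$who" config user.email "$who@example.com"
done

# Alice: start a feature branch from main, commit, and push it to origin's main
cd "$demo/alice"
git switch -q -c alice/feature
echo "alice was here" > alice.txt
git add alice.txt
git commit -q -m "Add alice.txt"
git push -q origin alice/feature:main      # origin/main only moves forward: accepted

# Bob worked at the same time; his push is rejected because origin/main moved
cd "$demo/bob"
git switch -q -c bob/feature
echo "bob was here" > bob.txt
git add bob.txt
git commit -q -m "Add bob.txt"
if git push -q origin bob/feature:main 2>/dev/null; then
  echo "unexpected: push was accepted"; exit 1
fi

# Step 1: bring the local main up to date, fast-forward only
git switch -q main
git fetch -q origin
git merge -q --ff-only                     # merges origin/main, the upstream of main

# Step 2: replay bob's commits on top of the updated main
git switch -q bob/feature
git rebase -q main

# Steps 3 and 4: after testing, push again; now it is a fast-forward
git push -q origin bob/feature:main
```

Because every push to main is a fast-forward, the history of origin/main ends up linear, with no merge commits.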
Extensions
One shortcoming of this workflow is that your name/feature branch only exists on your local computer. If you are properly backing up your system, this should not be a problem; however, it is probably a good idea to always push feature branches to a remote repository just in case.
- One option is to make a secondary copy of your repository and branches at a different remote (called, for example, myorigin).
- You can push your feature branches to this repository as a backup, without sharing them with your team.
- This fork of your main repository can also be useful when showing off your GitHub profile: the project is now on your own account (just make sure to credit your teammates in the README).
- Another option is to push name/feature to origin. As long as everybody on the team agrees that you are the only person who uses that branch, everything should work.
- When you rebase name/feature, you will need to force push (git push -f) to your own feature branch, since you are changing the history of the branch.
- It is easiest if your teammates never check out name/feature.
- If you violate the previous point, you will need to recover.
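Here is a sketch of backing up a feature branch to a hypothetical personal remote named myorigin (a bare repository on the same disk, standing in for a hosted fork). It also shows why a force push is needed after a rebase: the rebased branch no longer contains the commit the remote has, so a normal push is rejected. Assumes git 2.28 or newer.

```shell
set -e
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q --bare -b main myorigin.git    # hypothetical personal backup remote
git init -q -b main work                   # -b needs git 2.28 or newer
cd work
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"
echo "base" > a.txt
git add a.txt
git commit -q -m "Base"
git remote add myorigin "$demo/myorigin.git"

# Back up the feature branch to the personal remote
git switch -q -c name/feature
echo "feature" > b.txt
git add b.txt
git commit -q -m "Feature work"
git push -q -u myorigin name/feature

# main moves on (simulated locally), so the feature branch gets rebased
git switch -q main
echo "upstream fix" >> a.txt
git commit -q -a -m "Upstream fix"
git switch -q name/feature
git rebase -q main

# The rebase rewrote name/feature, so a normal push is rejected ...
if git push -q myorigin name/feature 2>/dev/null; then
  echo "unexpected: push was accepted"; exit 1
fi
# ... and, because only you use this branch, you force push it
git push -q -f myorigin name/feature
```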
Recovery
This is an advanced topic and you want to follow the workflow above to avoid ending up here.
- You and a partner are working on a common branch.
- The commits on origin/common look like O -> A.
- Your partner has rebased, and partner/common looks like O -> B' -> C' -> A'.
- You have done some work, and mine/common looks like O -> A -> D -> E.
- Your partner force pushes. origin/common now looks like O -> B' -> C' -> A' (i.e., it exactly matches partner/common).
- mine/common and origin/common have now diverged, since the rebase replaced commits with copies.
- You cannot push (well, you can force push, but that would overwrite your partner's work).
- You should not pull either: you have O -> A and origin/common has O -> B'. Which history is correct? Git does not know, and merging them would keep both A and its copy A'.
- While on mine/common:
- Create a new branch off of mine/common called mine/recover and switch to it: git switch -c mine/recover
- Commit any uncommitted changes (they are committed on mine/recover).
- git switch mine/common
- git reset --hard origin/common (this discards uncommitted changes, which is why we committed them on mine/recover first)
- Now mine/common looks like O -> B' -> C' -> A', and mine/recover looks like O -> A -> D -> E (plus a commit with any previously uncommitted work).
- You can continue to develop mine/common, and you can access the work you had done on mine/recover.
- Getting your changes back onto mine/common will likely be a manual process involving merge, rebase, and possibly cherry-pick.
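The scenario above can be reproduced in a throwaway directory to practice the recovery before you need it. This is a minimal sketch, assuming git 2.28 or newer: a bare origin.git plays the central repository, and the hypothetical partner creates new commits B and C and rebases common onto them, so only A gets copied (A'). The final cherry-pick is just one possible way to bring your work back; it goes smoothly here only because the commits touch different files, so expect to resolve conflicts in a real recovery.

```shell
set -e
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q --bare -b main origin.git      # -b needs git 2.28 or newer

# Shared starting point: origin/common is O -> A
git init -q -b main start
cd start
git config user.name "Example User"
git config user.email "user@example.com"
echo O > o.txt; git add o.txt; git commit -q -m O
echo A > a.txt; git add a.txt; git commit -q -m A
git push -q "$demo/origin.git" main main:common
cd "$demo"

for who in me partner; do                  # "partner" is a hypothetical teammate
  git clone -q "$demo/origin.git" "$demo/$who"
  git -C "$demo/$who" config user.name "$who"
  git -C "$demo/$who" config user.email "$who@example.com"
done

# You: mine/common becomes O -> A -> D -> E, plus an uncommitted edit
cd "$demo/me"
git switch -q -c mine/common origin/common
echo D > d.txt; git add d.txt; git commit -q -m D
echo E > e.txt; git add e.txt; git commit -q -m E
echo "not committed yet" >> e.txt

# Partner: rebases common onto new commits B and C, then force pushes
cd "$demo/partner"
git switch -q -c partner/common origin/common
git switch -q -c partner/extra partner/common~1     # starts at O
echo B > b.txt; git add b.txt; git commit -q -m B
echo C > c.txt; git add c.txt; git commit -q -m C
git switch -q partner/common
git rebase -q partner/extra                # partner/common is now O -> B -> C -> A'
git push -q -f origin partner/common:common

# Recovery, following the steps above
cd "$demo/me"
git fetch -q origin                        # origin/common is now O -> B -> C -> A'
git switch -q -c mine/recover              # new branch at E; the edit comes along
git commit -q -a -m WIP                    # commit the uncommitted work on mine/recover
git switch -q mine/common
git reset -q --hard origin/common          # mine/common now matches the partner

# One possible way back: replay D, E, and WIP (everything after A) onto A'
git cherry-pick mine/recover~3..mine/recover
```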
Discussion
This section is a more advanced discussion about two key design choices in the workflow.
- Why is it based on rebase and not merge?
- Merging makes the most sense when each branch corresponds to a single feature and then that feature is merged into the main branch (which represents the latest working code). In this way the order in which features were developed is maintained, and it is clear when (chronologically) each feature was implemented and added.
- In our development environment, each branch usually does not nicely correspond to a single discrete feature. Instead, the main code is always changing, and rebasing allows each developer to pretend that their branch started with the most up-to-date working version of the code.
- In this workflow we never rebase the main branch, so the history of main follows the order in which each developer got their code into main, in a linear fashion.
- Why not just merge the feature branch into main and then push?
- If somebody has made changes on main before you attempt to push, you won't be able to push (which alerts you to this situation). However, you will have already merged all the commits you made into your local main branch, which means that you need to take some corrective action to make your main branch incorporate the changes (e.g., merging into main, rebasing main, or even resetting (i.e., moving) a branch). These actions are doable and even the norm in many git workflows, but they amount to an extra maintenance step and require additional thought.
- By using the simple git workflow, git prevents you from pushing changes to the remote if you have not incorporated the latest changes from main. Because this step happens before you modify your local main branch, the operations you run on main are simplified: you only ever do a fast-forward merge on main. Meanwhile, you can decide whether you want to incorporate the latest changes from main into your branch right away (usually, but not always, a good idea) or keep working on the branch as is. Whichever you decide, you do not need to take any special actions (which is not true if you directly merged your code into the local main branch).
Git hosting services
- Some popular git hosting services are GitHub, GitLab, Bitbucket, and sourcehut.
- These are convenient places to store your repositories and collaborate with others.
- Don't conflate git and GitHub. Git is a software tool distributed freely on the internet, whereas GitHub is one of several companies that provide services related to that tool.
- Many hosting services offer features built around git that arguably enhance it (but also lock you into the particular provider), such as:
- Issues
- Enable users to file a bug report with your program
- Pull Requests
- Enable another user to alert you that they have a contribution to your code ready to go
- Forking
- Tracks that a repository is a clone of another repository
- Static Website Hosting
- Host a static website for your project or portfolio
- Wiki
- Like Wikipedia, but for your project
- Gists
- A quick way to host a single file
- Permissions
- Prevent people from performing actions on the repository
- Organizations
- Groups of repositories
How to learn
- Use git often. Use it for everything.
- Use the command line rather than GUI tools or GitHub to manipulate the repository.
- Use gitk --all to see what is happening in your repository.
- Do not just blindly run commands. Especially when getting started, take the time to read about them, experiment with them in a copy of your repository, and understand what they are doing.
- Explore the Resources provided here
Git Prompt
- It is extremely useful to set up your terminal to show information about git.
- To add this to bash, you must modify the PS1 variable in your ~/.bashrc.
- The PS1 variable controls the look of your prompt.
- Add $(__git_ps1 "(%s)") to where PS1 is set in ~/.bashrc, just before the \$.
- This is just the basic prompt showing you the branch. There are many other possibilities out there
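A possible ~/.bashrc fragment is sketched below. The __git_ps1 function is defined by git's git-prompt.sh script; many distributions load it automatically along with bash completion, but where the file lives varies, so the two paths below are assumptions to adjust for your system. GIT_PS1_SHOWDIRTYSTATE is one of the optional settings that script understands.

```shell
# ~/.bashrc fragment: load git's prompt helper if it can be found.
# These two locations are common (Debian/Ubuntu and Fedora style) but are
# assumptions; check where your distribution installs git-prompt.sh.
for f in /usr/lib/git-core/git-sh-prompt \
         /usr/share/git-core/contrib/completion/git-prompt.sh; do
  if [ -f "$f" ]; then
    . "$f"
    break
  fi
done

GIT_PS1_SHOWDIRTYSTATE=1    # optional: '*' marks unstaged and '+' staged changes

# user@host:dir (branch)$ with the branch inserted just before the \$
PS1='\u@\h:\w$(__git_ps1 " (%s)")\$ '
```

Open a new terminal (or run source ~/.bashrc) to see the change.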
Demonstration
Here is the code from the live demonstration
Basic Branching
git init --bare central # Mimic a central repository (e.g., on GitHub)
git clone central repo1
cd repo1
git status
echo "Good Code" > file1.txt
git add file1.txt
git commit # First commit
git status
gitk --all &
echo "More code" >> file1.txt
git commit -a # Fixed a bug!
git branch br1
git status
git switch br1
git status
echo "this is branch1" >> file1.txt
git commit -a # Another bug is fixed!
less file1.txt
git checkout main
less file1.txt
Conflicts!
echo "I have other ideas" >> file1.txt
git commit -a # Conflict created!
git merge br1
git status
git merge --abort
git merge br1
cat file1.txt
git checkout --ours file1.txt
git checkout --theirs file1.txt
git add file1.txt
git status
git commit # fixed conflict
git push
Clone Me
cd ..
git clone central repo2
cd repo2
ls
git remote -v
git switch -c elwin/feature
echo "I Solved IT!" > file4.txt
git add file4.txt
git commit # Got it
gitk --all &
Meanwhile
cd ../repo1
echo "important bug fix" >> file2.txt
git add file2.txt
git commit # That's it
git push
Ready to commit
cd ../repo2
git push origin elwin/feature:main
git fetch
git checkout main
git merge
git switch elwin/feature
git rebase main
git push origin elwin/feature:main
git checkout main
git pull
Exercises
Follow along with these git exercises to practice your git usage. Along the way, if you don't understand something, use the built-in help (git help <command>) to read more about the command.
You should also follow along with gitk. Note: if you run gitk before making any commits, an error message will pop up. Just click OK. After your first commit, refresh gitk (Shift-F5) and you should see your commits. After every action on the repository, you must refresh gitk to see the changes reflected.
Basic Operations
First time git setup:
git config --global user.name "Your Name"
git config --global user.email "Your email"
git config --global core.editor YourEditor
git config --global init.defaultBranch main
(for VS Code, use 'code --wait' as YourEditor)
- Create a new local repository: create a directory using mkdir, cd into it, and use git init.
- Create files on your main branch.
- Create your first file, quote.txt, and put some content in it (such as a quote that you like).
- Run git status to see that there is now a file in your working directory that is not tracked by git.
- Use git add to have git track the file and place it in the staging area. Confirm with git status.
- Use git commit to commit the file to the repository. Write a commit message describing what you did.
- Launch gitk and set it to view all branches: gitk --all & (the & runs it in the background so you can keep using the same terminal).
- After each git operation, use Shift-F5 in the gitk window to update its view of the repository. Try to predict what you are going to see before refreshing the view.
- Create a second file called robot.txt and commit it to the repository. This file should name your favorite robot and say why you like it.
- Add another quote to quote.txt and commit the changes.
- Use git add to stage the file in the index.
- Use git status to see what git add did.
- Notice you can unstage the file by using git restore --staged quote.txt.
- Use git commit -a to automatically stage all tracked files and create a commit. Write a good log message. Remember, the better your log messages are, the more useful git becomes.
- Create a new branch called test in the repository and switch to it.
- You can do this with a single command or with a sequence of different commands. Some commands that may be useful are branch and switch.
- Append some more information about your favorite robot to robot.txt (on a new line), delete the quote.txt file, and commit the changes.
- git commit -a will automatically stage the changes to these files (because you have previously added them to git).
- You can also stage them manually with git add and git rm.
- Switch to the main branch.
- What has happened to the changes you made on the other branch?
- What changed in git status and gitk?
- Merge the changes from test into main.
- Since main has not changed since you created test, there should be no conflicts, and the merge will go through as a fast-forward merge, essentially just moving your current main branch forward to the commit from test.
- If you had also committed changes to robot.txt on main in the meantime, you may get merge conflicts.
- Use git status to see what files are affected.
- You can edit the affected files, add them, and commit the changes to complete the merge.
- Create a file that should not be tracked: for example, a zip file of the contents of your repository, using the zip command (zip <zipfile> <files_to_add>).
- Use git status to see that the file is not tracked. Create a .gitignore file to ignore the zip file and verify that the zip file does not show up in git status. Commit .gitignore to the repository.
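If you want to check your .gitignore work afterwards, the sketch below reproduces the last step in a scratch repository. It assumes git 2.28 or newer; if the zip command is not installed, it writes a placeholder backup.zip instead. git check-ignore -v reports which .gitignore pattern matches a path, which is handy when a file is (or is not) ignored unexpectedly.

```shell
set -e
demo=$(mktemp -d)                  # throwaway directory
cd "$demo"
git init -q -b main ignore-demo    # -b needs git 2.28 or newer
cd ignore-demo
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"
echo "A favorite quote goes here." > quote.txt
git add quote.txt
git commit -q -m "Add a quote"

# A generated file that should not be tracked (placeholder if zip is missing)
zip -q backup.zip quote.txt 2>/dev/null || echo "placeholder" > backup.zip
git status --short                 # ?? backup.zip

echo "*.zip" > .gitignore          # glob: ignore every file ending in .zip
git check-ignore -v backup.zip     # reports the .gitignore line that matched
git status --short                 # ?? .gitignore  (backup.zip is no longer listed)
git add .gitignore
git commit -q -m "Ignore zip archives"
```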
Remotes
- Log in to your GitHub account
- GitHub has many convenience functions that you can use to modify your repository directly from their website.
- Avoid these features: they create commits directly on your remote repository, which can lead to confusion.
- On GitHub, create a new private repository and don't create any files.
- Add your SSH key to your GitHub account: https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account
- On your local repository, use git remote -v to see remote repositories. You shouldn't see anything.
- Technically, a git remote can point to a git repository anywhere, including somewhere else on your hard drive.
- You do not need a third-party service like GitHub to make git remotes.
- The URL for your repository is git@github.com:user-name/repository-name.git
- If you had created the repository on GitHub before doing anything on your system, you would git clone the repository.
- This will automatically add a remote called origin.
- In this case, we already have a repository, so instead we just need to manually create the remote origin.
- There is also an https URL, which is useful for cloning public repositories, but you cannot use SSH keys with this method.
- The https URL follows the pattern https://github.com/user-name/repository-name.git
- Add your remote git repository as a remote called origin: git remote add origin <url_to_repository>
- Push your main branch to origin and set up your main branch to track origin/main: git push -u origin main
- Now that your local branch is tracking a remote branch, git push will automatically push your main to origin/main.
- Be sure to look in gitk to see both your local and remote branches.
- Look on GitHub and see that your changes were indeed pushed.
- Make sure your SSH keys are working: if they are working, you need not enter a password when doing a git push.
- You may need to log out and log back in for a new key to be unlocked.
- See Linux Notes for how to generate an SSH key.
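Since a remote can be any git repository, you can practice these steps without GitHub. In this sketch a bare repository on the local disk stands in for the empty GitHub repository (an assumption for illustration; with GitHub you would use the git@github.com URL above). Assumes git 2.28 or newer.

```shell
set -e
demo=$(mktemp -d)                  # throwaway directory
cd "$demo"
# A bare repository on the local disk stands in for the empty GitHub repository
git init -q --bare -b main remote.git      # -b needs git 2.28 or newer
git init -q -b main local
cd local
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"
echo "Good code" > file1.txt
git add file1.txt
git commit -q -m "First commit"

git remote -v                              # prints nothing: there are no remotes yet
git remote add origin "$demo/remote.git"   # with GitHub: the git@github.com:... URL
git remote -v                              # origin, listed for fetch and for push
git push -q -u origin main                 # push, and make main track origin/main
git rev-parse --abbrev-ref 'main@{upstream}'   # prints origin/main
```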
Collaboration
You will now be divided into teams.
- Select one team member's repository, referred to below as partner. Everyone should clone partner from GitHub.
- The creator of the GitHub repository should add the other team members as collaborators so that everyone on the team may push.
- Create a new feature branch in partner named yourname/aboutme.
- Create a file called yourname.txt.
- In this file, write two true statements and one false statement about yourself that you would like to share with others.
- Place each statement on its own line.
- This is the "feature" you are adding to the "program".
- Commit your changes to your local branch.
- One by one, each team member should merge their feature into main.
- Follow the Simple Git Workflow.
- Features should be fully merged one at a time.
- Pushes and merges should go smoothly.
- You have no conflicts because you edited different files.
- At the end of this, everyone should update their local main branch to match origin/main, and their feature branch should be up to date as well.
Simultaneous editing
We will now engage in some mayhem.
- Everybody should edit every yourname.txt file (on their own feature branches).
- Place $username.T next to each line that you think is true, and $username.F next to each line you think is false (substitute your username for $username).
- Each team member should simultaneously attempt to merge their feature into main.
- Follow the Simple Git Workflow, as before.
- People on your team will encounter conflicts. Resolve them by combining the T/F ratings onto a single line, then git add the files and continue the rebase (git rebase --continue).
- When you are done, everyone's vote should be on every line. How well did you do?
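Here is a minimal sketch of what resolving one of these conflicts looks like during the rebase step of the workflow, simulated in a single scratch repository. The usernames bob and carol, the file alice.txt, and its contents are hypothetical, and git 2.28 or newer is assumed. GIT_EDITOR=true simply accepts the existing commit message instead of opening an editor.

```shell
set -e
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q -b main votes                  # -b needs git 2.28 or newer
cd votes
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"
echo "I have a pet robot" > alice.txt
git add alice.txt
git commit -q -m "Add alice.txt"

# bob votes on his feature branch
git switch -q -c bob/feature
echo "I have a pet robot bob.F" > alice.txt
git commit -q -a -m "bob votes"

# Meanwhile carol's vote reaches main (simulated here by committing to main)
git switch -q main
echo "I have a pet robot carol.T" > alice.txt
git commit -q -a -m "carol votes"

# Step 2 of the workflow: the rebase stops because both edited the same line
git switch -q bob/feature
if git rebase main >/dev/null 2>&1; then
  echo "expected a conflict"; exit 1
fi
git status --short                         # UU alice.txt

# Combine both votes on the single line, stage the result, and continue
echo "I have a pet robot carol.T bob.F" > alice.txt
git add alice.txt
GIT_EDITOR=true git rebase --continue      # keep the original commit message
```

In a real conflict, git writes conflict markers (<<<<<<<, =======, >>>>>>>) into the file; you edit them away by hand instead of overwriting the file as this sketch does.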
Resources
- Pro Git book: A definitive guide to git. I definitely recommend reading the first few chapters.
- Git from the Bottom Up: Learn git by understanding its internal data structures.
- Learning Git: An interactive tutorial.
- Git Immersion: A step-by-step walkthrough.
- GitHub Guides: Guides for git by GitHub.
- Git Workflows: Learn more about git workflows.
Footnotes:
[1] Linus Torvalds wrote git so he could stop using BitKeeper.
[2] The git switch and git restore commands are relatively recent additions (introduced in git 2.23). Prior to their introduction, git checkout (somewhat confusingly) served both of these purposes, and you will still encounter many references to git checkout.