Introduction to git
What is Version Control?
A tool for tracking changes to files and synchronizing updates between multiple developers.
Why Version Control?
- Go back to previous versions
- No saving of final_final_really_final_version.doc
- Easily merge changes from multiple people
- Keep a record of why you made changes along with those changes
Non-git version control
- Concurrent Versions System (cvs)
- Subversion (svn)
- Mercurial
- Perforce
- BitKeeper [1]
What is git?
- Git is a version control system originally developed by Linus Torvalds for use with the Linux kernel.
- Git is used by Google, Facebook, Microsoft, Netflix, Boston Dynamics, and the Linux Kernel (and almost everyone else too!).
- The source code for most programs on your machine is maintained in git
Why git?
Git enables you to easily keep a history of your changes and the reasons behind them, and to combine your work with that of others. It is ubiquitous, distributed, fast, and flexible. Used properly, it frees you to experiment with your code without worrying about whether you will irrevocably lose something important.
It is also a job requirement for nearly any robotics job involving software (which is to say almost every robotics job).
- Strive to be the person that everyone on the team comes to with git questions.
- Avoid being the person who messes up the repository for the team.
- Almost every error in git is recoverable with some effort
- Each of you will likely make some or even many git mistakes along the way. It is through fixing these mistakes that you may gain mastery of git.
Jupyter Notebooks
- While git works with any type of file, it works best with text files designed to be edited by humans.
- One type of file you may encounter that does not quite fit this model is a Jupyter notebook
- A Jupyter notebook combines source code (often python) and its output, in an interactive format
- If you are coding in a Jupyter notebook, I recommend that you look into using git along with jupytext.
Git Help
- Often, when you do something slightly wrong in git, git will suggest the appropriate action.
- Getting help:
  - git help <verb>
  - git <verb> --help
  - man git-<verb>
The Three Areas of Git
- Working directory - The contents of your files on disk, as they appear with ls etc.
  - You edit files in your working directory as usual
- Commit History - The repository contains the full project history, consisting of all of the data that you have committed from the staging area.
- Staging Index - The holding area for the changes you make to your working directory.
  - After you change files, you explicitly add the changed files to the staging area, in preparation for saving to the repository (committing).
  - Often shortened to just Index
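The three areas can be watched in action with git status --short. The following is a minimal sketch, not part of the original notes: it works in a throwaway directory, and the file name notes.txt and the identity set through environment variables are assumptions made only so the example is self-contained.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)                     # throwaway directory for the experiment
cd "$demo"
git init -q

echo "hello" > notes.txt              # exists only in the working directory
in_worktree=$(git status --short)     # reported as untracked: "?? notes.txt"

git add notes.txt                     # copy the file into the staging index
in_index=$(git status --short)        # reported as staged: "A  notes.txt"

git commit -q -m "Add notes.txt"      # save the staged snapshot in the commit history
after_commit=$(git status --short)    # nothing left to report

printf '%s\n' "$in_worktree" "$in_index"
git log --oneline
```

Running git status between each step, as the sketch does, is a cheap way to check which area a change currently lives in.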
Basic git usage
- A git repository is a directed acyclic graph (DAG) of objects, called commits.
- git commit creates a commit, a snapshot of the files you are tracking in your working directory.
  - To track a file, add it to your staging area using git add
  - You can remove a tracked file using git rm. If you had previously committed it, it will still be recoverable from the repository.
  - You should write a message to accompany the commit. The message explains what the commit does.
  - Automatically commit all tracked changes using git commit -a
- Each commit is identified by a hash, a number calculated from the contents of the commit (the snapshot, its parent commits, the author information, and the message).
- The HEAD is a pointer to a commit corresponding to the current state of your repository.
- Use git status to see current information about the state of your repository.
  - You can run this frequently, and there are bash scripts that add various levels of the information it provides to your prompt
- You can checkout a commit to make your working directory match its state at a given commit
  - A checkout moves your HEAD to a different commit and sets your working directory to correspond to that commit.
- You can branch to create a name for a commit (rather than its hash).
  - Every git repository starts with a default branch, usually called either master or main (but it can be named anything).
  - You can create other branches. When you commit, the current branch moves to the new commit.
  - By maintaining multiple branches you create a DAG of commits. This enables you to experiment easily
- Use git switch to switch branches [2]
- Use git restore to retrieve a file from a different commit [2]
- Use git log to display information about the commits made on a branch and the associated meta-information
- Join changes from two branches together using git merge.
  - If the current branch has no new commits since the other branch was created from it, a fast-forward merge occurs, which simply moves the current branch forward onto the commits from the branch you are merging
  - Other merge strategies also exist: if both branches have new commits, git combines them in a merge commit. If the same part of a file has been modified in incompatible ways, you get a merge conflict that must be resolved manually
  - Git provides tools for managing conflicting edits, but it is best to coordinate with your team to avoid the situation.
- Git lists every file in the directory that it does not know about as "untracked".
  - You can prevent a file showing up as untracked by listing it in a .gitignore file
  - .gitignore also accepts globbing patterns (e.g., *.o will ignore all files ending in .o).
  - .gitignore is useful for keeping the repository clean and preventing user-generated files that should not be shared from cluttering git
  - Examples of files that should be in .gitignore
    - Results of compilation (e.g., anything generated from the source code)
    - Personal preferences that apply only on your machine
- Remotes connect different repositories and are created with git remote add.
  - The repositories can be in different directories or on different computers.
  - Remotes contain branches that you can merge with the base repository.
  - A remote is usually a copy of your repository, but it need not be.
  - Git is distributed, so there need not be a central repository. However, most projects have a central repository in practice and host it on a website like GitLab, GitHub, BitBucket, or sourcehut.org
- Git provides commands for keeping remote repositories in sync.
  - git fetch retrieves commits from a remote repository. The branches become available on your local repository for merging.
  - git pull is the same as git fetch followed by git merge. It is safer to separate these commands.
  - git push updates a remote branch with the commits on your branch.
- Cloning: git clone creates a copy of a repository from another location
- Tagging: git tag can be used to label a commit. A tag is like a branch, but usually you keep a tag in the same place.
  - Projects often provide tags for each release of their software
- Stash: git stash saves what you've been working on and restores your working directory to the last commit. You can then restore your changes from the stash. Useful if you need to quickly revisit your last commit without losing what you are currently working on.
- Rebase: git rebase. A tool for re-writing git history. Very powerful, and used in some git workflows to keep history clean. It can be quite useful when combined with a git workflow that has strict rules.
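Two of the commands above, git tag and git stash, are easy to try in isolation. This is a rough sketch under assumptions of my own: the file app.txt, the tag name v1.0, and the environment-variable identity are all invented for illustration.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q

echo "version 1" > app.txt
git add app.txt
git commit -q -m "First release"
git tag v1.0                            # label this commit; the tag stays put

echo "half-finished idea" >> app.txt    # uncommitted work in progress
git stash push -q                       # shelve it; files match the last commit again
clean=$(cat app.txt)

git stash pop -q                        # bring the shelved work back
restored=$(tail -n 1 app.txt)

git commit -q -a -m "Finish the idea"
tagged=$(git rev-parse v1.0)            # commit that the tag points to
previous=$(git rev-parse HEAD~1)        # parent of the newest commit
```

After the second commit the branch has moved on, but v1.0 still names the first commit, which is the difference between a tag and a branch.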
Git Tips
- When in doubt, copy your repository, and try out commands. Repeat until you get the desired result
- It is very hard to actually lose data in git
- That does not mean the commands to recover are always straightforward
- If you just copy your repo, you can experiment without worry.
- Commit often! You should try to code incrementally. Set many small goals, and commit each time you accomplish one.
- Commits create milestones that you can explore and restore your code from at any time
- I can predict how well a project works based on the number of commits
- Don't add generated content, such as compiled files, to your repository. Put these files in your .gitignore
- Avoid adding large binary data files such as databases or large datasets
- These take up a lot of storage when they change, since git needs to store a whole new copy
- There are tools for handling large binary files like git annex or git-lfs
- Keep small formatting changes in separate commits from substantive changes. Try to lump small formatting changes together
- Makes the history clear. Try to follow a style guide when coding.
- Write good commit messages that clearly explain what your commit did.
- The first line is a summary. Then there is a blank line, followed by more details. Don't forget the more details part!
- Many projects limit line widths to 80 characters for easy display on a narrow part of the screen
- Employ a git workflow for your team.
- A company will likely have some practices already in place.
- I provide a workflow below
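The commit-message layout described in the tips above (summary line, blank line, details) can be produced from the command line as well as from an editor. The sketch below is only an illustration: the file config.txt, its contents, and the message text are hypothetical, and the identity is set through environment variables so the example is self-contained.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q

echo "speed = 0.5" > config.txt         # hypothetical file and value
git add config.txt

# Each -m becomes its own paragraph; git puts a blank line between them.
git commit -q -m "Add default speed setting" \
    -m "The controller needs a starting value before the first command
arrives. 0.5 is a placeholder until the robot is tuned."

subject=$(git log -1 --format=%s)       # the summary line
body=$(git log -1 --format=%b)          # everything after the blank line
git log -1
```

Running git commit without -m opens your configured editor instead, where the same layout applies.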
Simple Git Workflow
Here is a workflow that I have found useful for small student groups. The main benefits of this workflow are:
- The main branch always has working code.
- git will force you to incorporate your teammates' changes prior to changing main
Setup
- There is a central repository called origin. Everybody clones origin
- The main branch on the central repository must always contain the latest working code.
  - Your local main branch simply tracks origin/main. You never edit code on main directly
- Everybody works in branches of the form name/feature.
  - For smaller projects it is okay if each person has just one feature, but you can have as many branches as you want
  - Do not ever create a branch called just name, since it prevents you from making name/feature later due to a quirk in how git stores branch names.
Process
- Get the latest code from your team. You always track this progress on your local main branch:
  - git switch main to be sure you are on your main branch
  - git fetch from origin to retrieve the latest updates
  - git merge --ff-only the changes onto your local main branch.
    - If this merge fails, it means you edited code on main directly (which is not permitted in this workflow).
    - How to recover depends on what happened, but here is one way that will hopefully get you back on the workflow path:
      - Commit your current work.
      - Create a feature branch at the current location: git branch name/feature
      - Reset the main branch to the origin location: git reset --hard origin/main
      - The git merge --ff-only should succeed and you can now continue with the regular workflow.
  - main is now up to date with origin/main.
    - In the normal workflow, this step is the only time you ever modify anything on main
- Incorporate the code from the team into your branch
  - If starting a new branch: git switch -c name/feature to create it and switch to it.
    - You can now begin coding
  - If working on an existing branch:
    - git switch name/feature to switch to the branch
    - git rebase main to incorporate the upstream changes. This command will
      - Temporarily remove all the commits you have worked on since you last diverged from main
      - Insert the commits from main that you had not yet incorporated into your branch
      - Replay your commits on top of the last commit in main. (So it will be as if you made your changes against the latest main branch).
      - If there are conflicts, prompt you to fix them; after fixing, git add the resolved files and run git rebase --continue.
      - You can always abort the rebase with git rebase --abort and communicate with your team if you are unsure about the nature of the conflicts.
  - Test your code and make sure it works before proceeding.
- When ready to merge your feature branch to the remote main
  - git push origin name/feature:main
  - If nobody has pushed to main since your last fetch, this will go smoothly. Coordinate with your team prior to pushing to main!
  - If somebody has pushed to main, your push will be rejected and you must incorporate their work before pushing. Go back to the first step (getting the latest code from your team).
- If you want to share code with some group members but are not ready to commit to main, you may establish other shared branches called share/feature. They behave like the main branch except the bar for committing/pushing to them can be set to an arbitrarily low level (i.e., you may even agree to commit broken code that does not compile to it).
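The process above, including a rejected push, can be rehearsed entirely on one machine. This is a sketch under stated assumptions rather than a transcript of real usage: a local bare repository (central.git) stands in for origin, the two developers alice and bob and their files are hypothetical, and the identity is supplied through environment variables.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"

# Seed a central repository that already has one commit on main
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "base" > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed central.git         # plays the role of origin

git clone -q central.git alice
git clone -q central.git bob

# Both developers start a feature branch from the same main
git -C alice switch -q -c alice/feature
echo "alice" > alice/alice.txt
git -C alice add alice.txt
git -C alice commit -q -m "Add Alice's feature"

git -C bob switch -q -c bob/feature
echo "bob" > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "Add Bob's feature"

# Alice gets her feature into main first
git -C alice push -q origin alice/feature:main

# Bob has not incorporated Alice's work, so git rejects his push
if git -C bob push -q origin bob/feature:main 2>/dev/null; then
    first_push="accepted"
else
    first_push="rejected"
fi

# Bob returns to the first step: update main, rebase, push again
git -C bob switch -q main
git -C bob fetch -q origin
git -C bob merge -q --ff-only origin/main
git -C bob switch -q bob/feature
git -C bob rebase -q main
git -C bob push -q origin bob/feature:main

git -C bob log --oneline --graph origin/main
```

The final log shows a single straight line of three commits, which is the linear history on main that this workflow aims for.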
Extensions
One shortcoming of this workflow is that your name/feature branch only exists on your local computer.
If you are properly backing up your system, this should not be a problem; however it is probably a good idea
to always push feature branches to a remote repository just in case.
- One option is to make a secondary copy of your repository and branches at a different remote (called, for example, myorigin).
  - You can push your feature branches to this repository as a backup, without sharing them with your team
  - This fork of your main repository can also be useful when showing off your GitHub: the project is now on your own account (just make sure to credit your teammates in the README).
- Another option is to push name/feature to origin. As long as everybody on the team agrees that you are the only person who can use that branch, everything should work.
  - When you rebase name/feature you will need to force push (git push -f) to your own feature branch, since you are changing the history of the branch.
  - It is easiest to have your other teammates never checkout name/feature
  - If you violate the previous point, you will need to recover.
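Both options above come down to pushing name/feature somewhere and force pushing after a rebase. The following sketch illustrates that with a local bare repository (backup.git) standing in for the myorigin remote; the paths, file names, and identity are assumptions, and the direct commit on main only stands in for main advancing after a fetch.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare backup.git            # stands in for your personal remote
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git remote add myorigin "$work/backup.git"
git switch -q -c name/feature
echo "draft" > feature.txt
git add feature.txt
git commit -q -m "Start feature"
git push -q myorigin name/feature        # backup copy of the feature branch

# Stand-in for main advancing (in the real workflow this arrives via fetch)
git switch -q main
echo "more" >> base.txt
git commit -q -a -m "Update base"
git switch -q name/feature
git rebase -q main                       # rewrites the feature commit

# The rebased branch no longer extends the backed-up one, so a plain push fails
if git push -q myorigin name/feature 2>/dev/null; then
    plain_push="accepted"
else
    plain_push="rejected"
fi
git push -q -f myorigin name/feature     # overwrite the backup with the new history
```

The force push is safe here only because nobody else uses the branch, which is exactly the agreement the second option relies on.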
Recovery
This is an advanced topic and you want to follow the workflow above to avoid ending up here.
- You and a partner are working on a common branch.
- The commits on origin/common look like O -> A
- Partner has rebased and partner/common looks like O -> B' -> C' -> A'.
- You have done some work and mine/common looks like O -> A -> D -> E.
- Partner force pushes. origin/common now looks like O -> B' -> C' -> A' (i.e., it exactly matches partner/common).
- mine/common and origin/common have now diverged since the rebase replaced commits with copies.
  - You cannot push (well, you can force push but it will overwrite your partner's work)
  - You cannot pull, since you have O -> A and origin/common has O -> B'. Which history is correct? Git does not know.
- While on mine/common
  - Create a new branch off of mine/common called mine/recover.
  - Commit any uncommitted changes
  - git switch mine/common
  - git reset --hard origin/common (you will lose uncommitted changes, which is why we created mine/recover)
- Now mine/common looks like O -> B' -> C' -> A', and mine/recover looks like O -> A -> D -> E
- You can continue to develop mine/common and you can access the work you had done on mine/recover
  - Getting your changes back onto mine/common will likely be a manual process involving merge, rebase, and possibly cherry-pick
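The recovery scenario can be reproduced locally to practice it safely. This is one possible sketch, with several assumptions: a local bare repository stands in for origin, the partner's commits B and C come from a hypothetical side branch, each commit adds its own file so that cherry-pick cannot conflict, the local branches are named common and recover, and git reset --hard is used so that the working directory also matches origin/common.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"

# commit_file <repo> <name>: add a one-line file called <name>.txt and commit it
commit_file() {
    echo "$2" > "$1/$2.txt"
    git -C "$1" add "$2.txt"
    git -C "$1" commit -q -m "$2"
}

# origin/common starts as O -> A
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/common
commit_file seed O
commit_file seed A
git clone -q --bare seed central.git
git clone -q central.git partner
git clone -q central.git mine

# Partner rebases common onto other work and force pushes: O -> B -> C -> A'
git -C partner switch -q -c side common~1
commit_file partner B
commit_file partner C
git -C partner switch -q common
git -C partner rebase -q side
git -C partner push -q -f origin common

# Meanwhile, my copy has become O -> A -> D -> E
commit_file mine D
commit_file mine E

# Recovery: keep my work on a separate branch, then match origin/common
git -C mine fetch -q origin
git -C mine switch -q -c recover
git -C mine switch -q common
git -C mine reset -q --hard origin/common

# Bring D and E back on top of the partner's history
git -C mine cherry-pick recover~1 recover > /dev/null
git -C mine push -q origin common

commits=$(git -C mine log --reverse --format=%s common | xargs)
echo "$commits"
```

With real code the cherry-pick step may stop on conflicts, which is why the notes describe getting the changes back as a manual process.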
Discussion
This section is a more advanced discussion about two key design choices in the workflow.
- Why is it based on rebase and not merge?
  - Merging makes the most sense when each branch corresponds to a single feature and then that feature is merged into the main branch (which represents the latest working code). In this way the order in which features were developed is maintained and it is clear when (chronologically) each feature was implemented and added.
  - In our development environment, each branch usually does not nicely correspond to a single discrete feature. Instead, the main code is always changing, and rebasing allows each developer to pretend that their branch started with the most up-to-date working version of the code.
  - In this workflow we never rebase the main branch, so history in the main branch follows the history of each developer getting their code into main in a linear fashion.
- Why not just merge the feature branch into main and then push?
  - If somebody has made changes on main before you attempt to push, you won't be able to push (which alerts you to this situation). However, you will have already merged all the commits you made into your local main branch, which means that you need to take some corrective action to make your main branch incorporate the changes (e.g., merging into main, rebasing main, or even resetting (i.e., moving) a branch). These actions are doable and even the norm in many git workflows, but they amount to an extra maintenance step and require additional thought.
  - By using the simple git workflow, git prevents you from pushing changes to the remote if you have not incorporated the latest changes from main. Because this step happens before you modify your local main branch, the possibilities of what operations you run on main are simplified: you only ever do a fast-forward merge on main. Meanwhile, you can decide whether you want to incorporate the latest changes from main into your branch right away (usually but not always a good idea) or keep working on the branch as is. Whichever you decide, you do not need to take any special actions (which is not true if you directly merged your code into the local main branch).
Git hosting services
- Some popular git hosting services are GitHub, GitLab, BitBucket, and sourcehut.
- These are convenient places to store your repositories and collaborate with others.
- Don't conflate git and GitHub. Git is a software tool distributed freely on the internet, whereas GitHub is one of several companies that provide services related to that tool.
- Many hosting services add features around git that arguably enhance it (but also lock you into the particular provider), such as
- Issues
- Enable users to file a bug report with your program
- Pull Requests
- Enables another user to alert you that they have a contribution to your code ready to go
- Forking
- Tracks that a repository is a clone of another repository
- Static Website Hosting
- Host a static website for your project or portfolio
- Wiki
- Like Wikipedia, but for your project
- Gists
- A quick way to host a single file
- Permissions
- Prevent people from performing actions on the repository
- Organizations
- Groups of repositories
How to learn
- Use git often. Use it for everything.
- Use the command line rather than GUI tools or GitHub to manipulate the repository.
- Use gitk --all to see what is happening in your repository.
- Do not just blindly run commands. Especially when getting started, take the time to read about them, experiment with them in a copy of your repository, and understand what they are doing.
- Explore the Resources provided here
Git Prompt
- It is extremely useful to set up your terminal to show information about git
- To add this to bash you must modify the PS1 variable in your ~/.bashrc
  - The PS1 variable controls the look of your prompt
  - Add $(__git_ps1 "(%s)") to where PS1 is set in ~/.bashrc, just before the \$
- This is just the basic prompt showing you the branch. There are many other possibilities out there
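As a concrete illustration of the PS1 change described above, here is a hedged sketch of a ~/.bashrc fragment. The prompt layout (user, host, directory) is my own assumption and not necessarily what your existing PS1 looks like, and the fragment assumes that __git_ps1 is already defined in your shell.

```shell
# Fragment for ~/.bashrc (a sketch; adapt it to your existing prompt).
# Assumption: __git_ps1 is already defined, i.e. something on your system
# sources git's git-prompt.sh script. If it is not, source that script first.

# Prompt layout: user@host:directory (branch)$
# The single quotes keep $(...) unexpanded here, so bash re-runs __git_ps1
# every time it draws the prompt instead of only once at startup.
PS1='\u@\h:\w $(__git_ps1 "(%s)")\$ '
```

Open a new terminal (or run source ~/.bashrc) and cd into a repository to see the branch name appear.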
Demonstration
Here is the code from the live demonstration
Basic Branching
git init --bare central # Mimic a central repository (e.g., on GitHub)
git clone central repo1
cd repo1 # Work inside the clone
git status
echo "Good Code" > file1.txt
git add file1.txt
git commit # First commit
git status
gitk --all &
echo "More code" >> file1.txt
git commit -a # Fixed a bug!
git branch br1
git status
git switch br1
git status
echo "this is branch1" >> file1.txt
git commit -a # Another bug is fixed!
less file1.txt
git checkout main
less file1.txt
Conflicts!
echo "I have other ideas" >> file1.txt
git commit -a # Conflict created!
git merge br1
git status
git merge --abort
git merge br1
cat file1.txt
git checkout --ours file1.txt
git checkout --theirs file1.txt
git add file1.txt
git status
git commit # fixed conflict
git push
Clone Me
cd ..
git clone central repo2
cd repo2
ls
git remote -v
git switch -c elwin/feature
echo "I Solved IT!" > file4.txt
git add file4.txt
git commit # Got it
gitk --all &
Meanwhile
cd ../repo1
echo "important bug fix" >> file2.txt
git add file2.txt
git commit # That's it
git push
Ready to commit
cd ../repo2
git push origin elwin/feature:main
git fetch
git checkout main
git merge
git switch elwin/feature
git rebase main
git push origin elwin/feature:main
git checkout main
git pull
Exercises
Follow along with these Git exercises to practice your git usage.
Along the way, if you don't understand something, use the built-in git help
to read more about the command (git help <verb>).
You should also follow along with gitk. Note: if you run gitk
before making any commits, an error message will pop up.
Just click Ok. After your first commit, refresh gitk (Shift-F5) and
you should see your commits.
After every action on the repository you must refresh gitk to see the changes reflected.
Basic Operations
- First time git setup:
  git config --global user.name "Your Name"
  git config --global user.email "Your email"
  git config --global core.editor YourEditor
  git config --global init.defaultBranch main
  (for VS Code use 'code --wait' as the YourEditor)
- Create a new local repository: create a directory using mkdir, cd into it, and use git init
- Create files on your main branch.
  - Create your first file, quote.txt, and put some content in it (such as a quote that you like).
  - Run git status to see that there is now a file in your repository that is not tracked by git.
  - Use git add to have git track the file and place it in the staging area. Confirm with git status
  - Use git commit to commit the file to the repository. Write a commit message describing what you did.
- Launch gitk, and set it to view all branches: gitk --all & (the & runs it in the background so you can keep using the same terminal).
  - After each git operation, use Shift-F5 in the gitk window to update its view of the repository. Try to predict what you are going to see before refreshing the view
- Create a second file called robot.txt and commit it to the repository. This file should name your favorite robot and explain why you like it.
- Add another quote to quote.txt and commit the changes.
  - Use git add to stage the file in the index.
  - Use git status to see what git add did.
    - Notice you can unstage the file by using git restore --staged quote.txt.
  - Use git commit -a to automatically stage all tracked files and create a commit. Write a good log message. Remember, the better your log messages are, the more useful git becomes
- Create a new branch called test in the repository and switch to it.
  - You can do this with a single command or with a sequence of different commands. Some commands that may be useful are branch and switch.
- Append some more information about your favorite robot to robot.txt (on a new line), delete the quote.txt file, and commit the changes.
  - git commit -a will automatically stage the changes to these files (because you have previously added them to git)
  - You can also stage them manually with git add and git rm
- Switch to the main branch.
  - What has happened to the changes you made on the other branch?
  - What changed in git status and gitk?
- Merge the changes from test into main
  - If you followed the instructions literally and appended data, there should be no conflicts and the merge will go through as a fast-forward merge, essentially just taking the commit from test and adding it to your current main branch
  - If you modified previously existing text in robot.txt, you may get merge conflicts
    - Use git status to see what files are affected
    - You can edit the affected files, add them, and commit the changes to complete the merge.
- Create a file that should not be tracked: for example a zip file of the contents of your repository, using the zip command (zip <zipfile> <files_to_add>).
- Use git status to see that the file is not tracked. Create a .gitignore file to ignore the zip file and verify that the zip file does not show up in git status. Commit .gitignore to the repository.
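The effect of .gitignore in the last exercise can be checked with git check-ignore as well as git status. The sketch below uses invented file names (main.c, main.o, backup.zip) and an assumed environment-variable identity; empty files stand in for real build output and a real zip archive.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q

echo "int main(void) { return 0; }" > main.c
touch main.o backup.zip                     # stand-ins for generated files
git status --short                          # all three files show up as untracked

printf '%s\n' '*.o' '*.zip' > .gitignore    # one glob pattern per line
git status --short                          # only .gitignore and main.c remain

git add .gitignore main.c
git commit -q -m "Add source file and ignore generated files"
remaining=$(git status --short)             # ignored files are not reported
```

git check-ignore -v <file> prints which pattern matched, which helps when a file is unexpectedly ignored (or not).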
Remotes
- Log in to your GitHub account
- GitHub has many convenience functions that you can use to modify your repository directly from their website.
- Avoid these features: they create commits directly on your remote repository, which can lead to confusion.
- On GitHub, create a new private repository and don't create any files.
- Add your ssh key to your github account: https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account
- On your local repository, use git remote -v to see remote repositories. You shouldn't see anything.
  - Technically a git remote can point to a git repository anywhere, including somewhere else on your hard drive
  - You do not need a third-party service like GitHub to make git remotes.
- The URL for your repository is git@github.com:user-name/repository-name.git
  - If you created the repository on GitHub before doing anything on your system, you would git clone the repository
    - This will automatically add a remote called origin
  - In this case, we already have a repository so instead we just need to manually create the remote origin.
  - There is also an https URL, which is useful for cloning public repositories, but you cannot use ssh keys with this method.
    - The https URL follows the pattern https://github.com/user-name/repository-name.git
- Add your remote git repository as a remote called origin.
  - git remote add origin <url_to_repository>
- Push your main branch to origin and set it up so that your main branch tracks origin/main: git push -u origin main
  - Now that your local branch is tracking a remote branch, git push will automatically push your main to origin/main
  - Be sure to look in gitk to see both your local and remote branches
  - Look on GitHub and see that your changes were indeed pushed.
- Make sure your ssh keys are working: if they are working you need not enter a password when doing a git push
  - You may need to log out and log back in for a new key to be unlocked
  - See Linux Notes for how to generate an ssh key
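Because a remote can live anywhere, including elsewhere on your hard drive, the remote steps above can be rehearsed without GitHub. In this sketch a local bare repository (hosted.git, a name I made up) stands in for the empty GitHub repository; the file contents and identity are likewise assumptions.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare hosted.git               # stands in for the empty GitHub repository
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
echo "To be, or not to be" > quote.txt
git add quote.txt
git commit -q -m "Add a quote"

no_remotes=$(git remote -v)                 # nothing is configured yet
git remote add origin "$work/hosted.git"    # on GitHub: the git@github.com:... URL
git push -q -u origin main                  # push and make main track origin/main

upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
git remote -v
echo "main tracks $upstream"
```

Once the upstream is set, a plain git push from main needs no further arguments.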
Collaboration
You will now be divided into teams.
- Select 1 team member. Everyone should clone that partner repository from GitHub.
  - The creator of the GitHub repository should add their partner as a collaborator so that both partners may push
- Create a new feature branch in partner named yourname/aboutme.
- Create a file called yourname.txt.
  - In this file write something that you would like to share about yourself with others: two true statements about yourself and one false statement.
  - Place each statement on its own line.
  - This is the "feature" you are adding to the "program"
- Commit your changes to your local branch.
- One by one, each team member should merge their feature into main
  - Follow the Simple Git Workflow
  - Features should be fully merged one at a time.
  - Pushes and merges should go smoothly.
  - You have no conflicts because you edited different files
- At the end of this, everyone should update their local main branch to match origin, and their feature branch should be up-to-date as well.
Simultaneous editing
We will now engage in some mayhem.
- Everybody should edit every yourname.txt file (on their own feature branches)
  - Place $username.T next to each line that you think is true, and $username.F next to each line you think is false (substitute your username for $username)
- Each team member should simultaneously attempt to merge their feature into main
  - Follow the Simple Git Workflow, as before
  - People on your team will encounter conflicts. Resolve them by combining the T/F ratings onto a single line and committing the files.
  - When you are done, everyone's vote should be on every line. How well did you do?
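The conflict you will hit in this exercise can be previewed in a single repository. The sketch below is illustrative only: the users ana and bo, the file sam.txt, and the statement in it are hypothetical, the commit made directly on main stands in for a teammate's vote that arrived through a fetch, and GIT_EDITOR=true is used so that git rebase --continue does not open an editor.

```shell
set -e
# Assumed identity so the sketch runs on a machine without git configured
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "I have met a robot" > sam.txt
git add sam.txt
git commit -q -m "Add statement"

# Ana votes on her feature branch
git switch -q -c ana/votes
echo "I have met a robot ana.T" > sam.txt
git commit -q -a -m "Ana votes"

# Stand-in for Bo's vote reaching main first
git switch -q main
echo "I have met a robot bo.F" > sam.txt
git commit -q -a -m "Bo votes"

# Ana rebases onto main; both edits touch the same line, so git stops
git switch -q ana/votes
if git rebase main > /dev/null 2>&1; then
    outcome="clean"
else
    outcome="conflict"
fi

# Resolve by hand: one line that carries both votes
echo "I have met a robot bo.F ana.T" > sam.txt
git add sam.txt
GIT_EDITOR=true git rebase --continue > /dev/null

cat sam.txt
git log --oneline
```

In the real exercise you would open the file, look at the conflict markers git inserted, and merge the votes by hand instead of overwriting the file with echo.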
Resources
- Pro Git Book: A definitive guide to git. Definitely recommend reading the first few chapters.
- Git from the Bottom Up. Learn git by understanding its internal data structures.
- Learning Git Interactive Tutorial.
- Git Immersion step-by-step walkthrough
- GitHub Guides: Guides for git by GitHub
- Learn more about git Workflows
Footnotes:
[1] Linus Torvalds wrote git so he could stop using BitKeeper
[2] The git switch and git restore commands are relatively recent additions. Prior to their introduction, git checkout (somewhat confusingly) served both of these purposes, and you will still encounter many references to git checkout.