Introduction

Distributed Version Control System
- merge collaborated file
- time capsule (time log of source)
- in a distributed VCS, everyone has a copy of the repo.  Better than a central repository because commits are quick, you can work offline and there’s redundancy (everyone has a copy).

git help [command] // help details
git config --global user.name "user name"
git config --global user.email email@email.com
git config --global color.ui true // pretty command line colours

Workflow

1. create file (starts untracked)
2. add file to staging area using git add.  Files here ready to be committed
3. commit changes, which creates a snapshot of staging area

git init // initialise git in directory
git status -s // what has changed since last commit; -s gives short-format output
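
The three workflow steps can be sketched as a short script. This is only an illustration: the temporary directory, the file name notes.txt and the user details are made-up placeholders, not part of the original notes.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "hello" > notes.txt          # 1. create file (starts untracked)
before=$(git status -s)           # "?? notes.txt"
git add notes.txt                 # 2. add file to staging area
staged=$(git status -s)           # "A  notes.txt"
git commit -q -m "Add notes"      # 3. commit a snapshot of the staging area
after=$(git status -s)            # empty: nothing changed since last commit
echo "$before"
echo "$staged"
```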

Staging area

git add [list of files] // adds file/s listed
git add --all // adds all
git add *.txt // all .txt files in current directory
git add "*.txt" // all .txt files in entire project
git add docs/*.txt // all .txt files in docs directory
git add directory/ // adds all in directory including sub-directories
git add . // adds all files in current directory and below
git add -i // stage in interactive mode
git add -p // stage in patches
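
The difference between the unquoted and quoted *.txt forms is easy to miss, so here is a small sketch of it. The file names top.txt and docs/nested.txt are invented for the example.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
mkdir docs
echo a > top.txt
echo b > docs/nested.txt

git add *.txt                     # the shell expands the glob: only top.txt is passed to git
first=$(git ls-files)
git add "*.txt"                   # git expands the pattern itself and also matches docs/nested.txt
second=$(git ls-files)
echo "$first"
echo "$second"
```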

Patch options:

Stage this hunk [y,n,a,d,/,j,J,g,e,?]? ?
y - stage this hunk
n - do not stage this hunk
a - stage this and all the remaining hunks in the file
d - do not stage this hunk nor any of the remaining hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help

Timeline/s

A journal of all changes to the timeline history:
git log --oneline --decorate --all --graph

git commit -a -m 'Initial version' // -m message, -a includes all tracked files (not new files)

Message should describe what the commit does (present tense).

Staging & Remotes

Working on local repository

HEAD refers to the last commit on the current branch or timeline.  By default, HEAD points to your most recent commit.

git diff // show unstaged differences since last commit
git diff --staged // differences that are staged
git diff HEAD // all differences (staged and unstaged) since most recent commit (HEAD)
git reset --soft HEAD^ // (rollback) undo last commit, keeping its changes staged
git reset --hard HEAD^ // undo last commit and discard all changes (HEAD^^ goes back two commits)
git commit --amend -m "new message" // amend adds staged changes to the last commit and replaces its message
git checkout -- [target] // discard unstaged changes to target file
git rm [target] // remove actual files from disk, and stage for removal
git rm --cached [target] // stop tracking (but won’t delete from filesystem)
git push origin --delete [remote-branch] // delete branch on remote (GitHub)
git push origin :old_branch new_branch // delete old remote and push to new remote
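
To see how the soft and hard resets above differ, a rough sketch follows. The repository, file name and commit messages are placeholders chosen for illustration.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "First"
echo two >> file.txt
git commit -q -a -m "Second"

git reset -q --soft HEAD^         # commit undone, change still staged
soft_status=$(git status -s)      # "M  file.txt"

git commit -q -m "Second again"
git reset -q --hard HEAD^         # commit undone and change discarded
hard_status=$(git status -s)      # empty
hard_content=$(cat file.txt)      # back to the content of the first commit
```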

Working with remote repository

Access control (who has access to what) managed by host (GitHub / BitBucket) or self-managed (Gitosis / Gitorious).  Can have multiple remotes (for example, origin, production and testing).

git remote -v // list remotes and their URLs
git remote add [name] [address]
git remote add origin https://github.com/user/repo.git // "origin" refers to official repository that most / all collaborators are using.
git remote rm [name] // removes a remote
git push -u [remote repo name] [local branch to push]
git push -u origin master // -u will remember parameters, so can run 'git push' afterwards
git push [remote repo name] [local branch to push]
git push origin master // push local master branch to origin (e.g. GitHub)
git pull [remote repo name] [remote branch to pull] // fetch / sync local repo with remote
git pull origin master // pull any changes from GitHub made by collaborators
git checkout --track origin/feature/branch // create local branch from remote and track
git checkout -b local-branch origin/remote-branch // same as above but provide different name

Cloning & Branching

git clone [repo] [different directory name] // clone repo locally

Feature branches

- feature branches are separate branches of code that aren’t ready yet for main application
- kept separate so can’t break main app

When to use:
- use a feature branch to work on a lengthy or major change
- on uncharted territory such as a new gem or such
- if you’re new to a project
- general collaboration / all of the above

git branch [branch name] // create new branch
git branch -d [branch name] // delete branch
git branch // see branches available
git checkout [branch name] // switch branches (timelines)
git checkout -b [branch name] // -b creates the branch and checks it out
git merge [branch name] // will merge changes from target into the branch you’re in
git mergetool // brings up git merge tool

Vi commands:
j (down), k (up), h (left), l (right), ESC (leave mode), i (insert mode), :wq (save/write & quit), :q! (cancel & quit).

If changes were made on both branches, Git can’t fast-forward as it would if changes were made only on the branch being merged in.  Instead, Git performs a recursive merge and automatically creates a 'merge commit' to join the branches.
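
A small sketch of the two cases, fast-forward versus merge commit. Branch names feature and feature2, the file names and the identity are invented for the example; the master branch name is set explicitly because the default can differ between Git versions.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo base > base.txt; git add .; git commit -q -m "Base"

# Case 1: only the feature branch has new commits, so master fast-forwards.
git checkout -q -b feature
echo f > feature.txt; git add .; git commit -q -m "Feature work"
git checkout -q master
git merge -q feature
ff_merges=$(git rev-list --count --merges HEAD)       # no merge commit created

# Case 2: both branches have new commits, so Git creates a merge commit.
git checkout -q -b feature2
echo g > feature2.txt; git add .; git commit -q -m "More feature work"
git checkout -q master
echo m > master.txt; git add .; git commit -q -m "Master work"
git merge -q --no-edit feature2
real_merges=$(git rev-list --count --merges HEAD)     # one merge commit
```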

Collaboration Basics

git pull

If behind origin repository on github, can git pull to update master, then git push.

git pull fetches from origin (git fetch), which updates the remote-tracking branch origin/master, then merges origin/master into master (git merge origin/master).

Merge conflicts

Need to fix conflicts manually by editing the conflicted files. Then git commit -a (leave off the -m so Git opens an editor with the prepared merge message).

Remote Branches & Tags

Good practice to create a remote branch to backup work if working for more than a day on a feature.  Also necessary if working on a feature with collaborators.

To create remote branch:
git checkout -b [branch name] // create local branch
git push origin [branch name] // push to remote.

To get remote branch:
git pull [--rebase] // fetch then merge (with --rebase, rebase after the fetch instead of merging)
git branch -r // -r will list remote branches
git checkout [branch name] // check out remote branch (already tracking, so a plain git push works)
git remote show [remote name] // remote name usually origin

Delete a remote branch only:
git push origin :[branch name]

Delete local branch only:
git branch -d [branch name] // will be warned if unmerged changes
git branch -D [branch name] // delete ignoring unmerged changes
git remote show origin // show status of remote branches
git remote prune origin // delete stale remote-tracking branches (those whose remote branch has been deleted)

A tag is a reference to a specific commit (used mostly for release versioning)
git tag // list all tags
git checkout [tag name]
git tag -a [tag name] -m [tag description]
git push --tags // push tags to remote

Rebase

Rebase is an alternative to merge commits which may seem like they’re polluting timeline history.

So, instead of git pull && git push, going to use git fetch && git rebase.

git fetch // pulls down changes without merging (unlike git pull)
git rebase [branch to rebase from] // run from branch want to bring up-to-date

git rebase does three things:
1. moves commits on master that are not in origin/master to a temporary area
2. runs all origin/master commits one at a time
3. runs commits from temporary area one at a time
(no merge commits, one commit after another after another)
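
The effect of a rebase can be sketched locally. As an assumption for this example, a local branch named upstream stands in for origin/master so no real remote is needed; file names and identity are also placeholders.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo base > base.txt; git add .; git commit -q -m "Base"
git branch upstream               # stand-in for origin/master (assumption)

echo local > local.txt; git add .; git commit -q -m "Local work"
git checkout -q upstream
echo remote > remote.txt; git add .; git commit -q -m "Remote work"

git checkout -q master
git rebase -q upstream            # replay "Local work" on top of "Remote work"
history=$(git log --format=%s)    # newest first
merges=$(git rev-list --count --merges HEAD)
echo "$history"
```

The history ends up linear: the local commit sits on top of the upstream commit and no merge commit is created.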

Interactive Mode

Interactive mode rebases every commit after the one you specify.  Brings up an editor that allows you to edit the list of commits to replay.

git rebase -i HEAD~3 // interactive mode - replay last three commits before current HEAD

Note, in git log commits shown in reverse chronological order, whereas in interactive rebase order is original chronological order

Pick: chooses a commit to replay
Reword: allows us to change commit message (will bring up another editor to edit the commit message)
Edit: replays commits but allows us to edit commit. Can be used to split one commit into many commits.  Will bring up the command line prompt when gets to edit keyword.

To edit:
1. git reset [--hard // to erase changes] HEAD^ // will unstage and leave working files uncommitted
2. stage and commit files as needed
3. git rebase --continue // resumes replaying commands in rebase script

Squash: squashes a commit into the previous commit.  Will bring up editor to update commit message for squashed commits.  Squash will merge the commit into the previous.

History & Configuration

System: /etc/gitconfig // every user on the system and all their repositories
Global: ~/.gitconfig (or ~/.config/git/config) //specific to each user
Local: In repo .git/config // local to repository

git config --local user.name "User Name"

Configure for system, global, local. Many flags and options for git log, git diff, git blame: see slides.
git config [--system | --global | --local] --list // list settings for one level (omit the level to list all)
git config --global pull.rebase true
git config --global color.ui true
git config --global --unset core.excludesfile //unset to remove entry

Reuse Recorded Resolution: record merge conflicts and automatically replay.
git config --global rerere.enabled true

Configure alias:
git config --global alias.name-of-alias "command to alias"

Excludes added to .git/info/exclude:
file.type, *.type, folder/, folder/*.filetype

.gitignore to ignore log files
.gitattributes file: conversion settings for file types

Stashing

Store files and work away in temporary area.

git stash [save] [--keep-index] [--include-untracked]
Save work to list of stashes and restore state to last commit. With --keep-index, the staging area will not be stashed (i.e. files that are added). With --include-untracked, new untracked files are stashed as well.

git stash apply [stash@{1}] // brings stashed files back - stash@{0} is default
git stash list [--stat] // see list of stashes. --stat will list details for stashes
git stash drop [stash@{1}] // drops the stash from the list
git stash pop [stash@{1}] // applies and drops the stash in one command
git stash show [stash@{1}] [--patch] // shows details on a stash. --patch shows diff of file changes in the stash
git stash branch [branch-name] [stash@{1}] // Automatically create new branch and pop the stash
git stash clear // clear all stashes
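
A minimal round trip through the stash might look like the sketch below. The file name and identity are placeholders.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo one > file.txt; git add file.txt; git commit -q -m "First"

echo two >> file.txt              # uncommitted work in progress
git stash -q                      # store the work away; working tree returns to last commit
stashed_content=$(cat file.txt)
stash_list=$(git stash list)      # one entry, stash@{0}

git stash pop -q                  # apply the stash and drop it from the list
restored_content=$(cat file.txt)
remaining=$(git stash list)       # empty again
```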

Purging History

Commands can’t fail or filter will fail. -f forces filter to override the backup.

git filter-branch --tree-filter [command] [-- --all // on all commits in all branches | HEAD // only in current branch] // check out each commit in history into working directory, run command and re-commit.
For example: git filter-branch --tree-filter 'rm -f passwords.txt'

git filter-branch --index-filter [command] // index-filter: commands MUST operate on a staging area
git filter-branch -f --prune-empty -- --all // prune all empty commits

Cherry Pick

git cherry-pick -x --signoff [--edit // edit the commit message] [hash] // run from branch to cherry pick into (SHA will change). -x adds the source commit's SHA to the commit message. --signoff adds a Signed-off-by line naming the cherry-picker
git cherry-pick --no-commit [hash] [hash2] // will allow to cherry pick and customise commit/s. For example into one.

Submodules

Submodules are git repos inside git repos (independent of parent repo).

Creating Submodules

Create just like any other git repo, and then add it to the parent as a submodule.

1. Have an existing repo
2. git submodule add git@example.com:css.git // will clone into directory inside current directory, create .gitmodules file containing config for submodules
3. git commit -m "Added submodule" // commit changes

Modifying Submodules

If changes are made to the submodule, the parent needs to be updated to point to the new commit in the submodule.

Update submodule: cd into submodule folder, checkout branch (by default don’t start in branch), make changes, commit, and push submodule.
Update parent: parent repo will still reference the old commit on the submodule. So, on the parent need to add changes (incl. submodule directory), commit, and push parent repo

Setting up project which has existing submodules

1. clone repo locally
2. git submodule init // read .gitmodules file and initialise accordingly.
3. git submodule update // update submodules

git push --recurse-submodules=check // run from parent directory to abort push if submodules have not also been pushed
git push --recurse-submodules=on-demand // will push any un-pushed submodules
git config alias.pushall "push --recurse-submodules=on-demand" // can make this an alias!

Reflog

Git keeps a second log, only in local repo, called reflog.  Git updates reflog whenever HEAD moves (due to new commits, checking out branches, or resets).  For example, reflog can be used to recover commits after a hard reset has blown them away.

git reflog // view the ref log
git reset --hard 6f49f08 // restore by SHA from the reflog, or by reflog shortname such as HEAD@{1}
git log --walk-reflogs // gives more details than one line

If lose access to Reflog, can also retrieve history with git fsck, which checks the git database for integrity.  If run with --full, shows the objects (commits) that aren’t pointed to by another object (branches).

git fsck --full
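
Recovering from a hard reset with the reflog could look like this sketch. The repository, file names and commit messages are placeholders.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo a > a.txt; git add .; git commit -q -m "First"
echo b > b.txt; git add .; git commit -q -m "Second"

lost=$(git rev-parse HEAD)        # the commit about to be blown away
git reset -q --hard HEAD^         # "Second" no longer appears in git log
git reflog                        # HEAD@{1} is where HEAD was before the reset
git reset -q --hard "HEAD@{1}"    # move HEAD back to the lost commit
recovered=$(git rev-parse HEAD)
```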

Search for a commit by message:
git log --all --grep='search query'

Blame

Blame shows what revision and author last modified each line of a file.
git blame [-L line-start,line-end] [file] // show commit history on a file
git show [commit sha] // show an individual commit

GitHub

Fork-based workflow

Great for occasional contributors - fork from upstream and pull request into upstream.

git remote add upstream [path_to_repo] // add original repo as upstream remote
git fetch upstream
git merge upstream/master // run from local master to merge in upstream changes
git push origin master

Single-repo Workflow

Easier for stable team - everyone has a clone of the single repo. Important to use feature branches in this example.

git merge --no-ff feature_branch // no fast forward if want to keep commit history

Releases

Release tags:
- point to a single commit.

git tag [tag name] // lightweight tag
git tag -s [tag name] // signed tag
git tag -a [tag name] // annotated tag

Release branches:
- can be updated with new commits.

GitHub Releases:
Allow you to share binaries without checking into git, or include more release notes.

Issues

Include #1 in the commit message to reference the commit in issue 1.  If you include 'fixes', the issue will be closed automatically: e.g. "fixes #9".

Wikis

Can be used if README.md becomes too large.

API

curl https://api.github.com

Git Internals

Git originally a toolkit for a VCS before a user-friendly VCS.  User-friendly commands known as “porcelain”. Low-level verbs referred to as “plumbing” commands.

git init creates the .git directory where everything related to git stored.
The folder structure is:
index // where git stores staging area information
HEAD // points to branch currently have checked out
config* // contains project-specific configuration options
description // used by GitWeb program
hooks/ // contains client or server-side hook scripts
info/ // keeps a global exclude file for ignored patterns you don’t want to track in a .gitignore file
objects/ // stores all content for database
refs/ // stores pointers into commit objects in the database (branches)

HEAD and index files, and objects and refs directories are core parts of Git.

Git Objects

Git is a content-addressable filesystem: this means git is a key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.

hash-object takes some data, stores in .git directory and gives you back the key the data is stored as.

echo 'test content' | git hash-object -w --stdin

Git stores content as a single file per piece of content, named with the SHA-1 checksum of the content and its header. The subdirectory is named with the first 2 characters of the SHA-1, and the filename is the remaining 38 characters.

Pull the content back:
git cat-file -p [checksum] > file.file

cat-file can also tell you the object type:
git cat-file -t [checksum]
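
Putting hash-object and cat-file together, a short sketch of the key-value round trip (the temporary repository is a placeholder):

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q

key=$(echo 'test content' | git hash-object -w --stdin)   # store content, get its key back
dir=$(echo "$key" | cut -c1-2)    # first 2 characters name the subdirectory
file=$(echo "$key" | cut -c3-)    # remaining characters name the file
path=".git/objects/$dir/$file"

type=$(git cat-file -t "$key")    # object type
content=$(git cat-file -p "$key") # original content
echo "$key stored at $path"
```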

Tree Objects

The tree solves the problem of storing the filename and also allows you to store a group of files together.

Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree with its associated mode, type, and filename.

git cat-file -p master^{tree} //the most recent tree in a project.
The master^{tree} syntax specifies the tree object that is pointed to by the last commit on your master branch.

git ls-tree --full-tree -r HEAD // List the contents of a bare repository

Git normally creates a tree by taking the state of your staging area or index and writing a series of tree objects from it.  So, to create a tree object, you first have to set up an index by staging some files.

update-index command can artificially add the earlier version of the file to a new staging area.
git update-index --add --cacheinfo 100644 83baae61804e65cc73a7201a7252750c76066a30 test.txt
- add because file doesn’t yet exist in staging area
- cacheinfo because the file you’re adding isn’t in your directory but is in the database

modes:
100644 // normal file
100755 // executable file
120000 // symbolic link

write-tree command writes the staging area out to a tree object:
git write-tree

Can also read trees into staging area by calling read-tree. --prefix allows tree to be read as a subdirectory:
git read-tree --prefix=bak [SHA-1]
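
The plumbing steps above can be chained into a small sketch that builds a tree object by hand. The content 'version 1' and the temporary repository are placeholders.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q

blob=$(echo 'version 1' | git hash-object -w --stdin)        # content goes into the database
git update-index --add --cacheinfo 100644 "$blob" test.txt   # stage it without a working file
tree=$(git write-tree)                                       # write the staging area out as a tree
type=$(git cat-file -t "$tree")
listing=$(git cat-file -p "$tree")                           # mode, type, SHA-1 and filename
echo "$listing"
```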

Commit Objects

You have three trees that specify the different snapshots of your project that you want to track, but the earlier problem remains: you must remember all three SHA-1 values in order to recall the snapshots. You also don’t have any information about who saved the snapshots, when they were saved, or why they were saved. This is the basic information that the commit object stores for you.

commit-tree creates a commit object
echo 'first commit' | git commit-tree [tree checksum]
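
As a rough sketch, commit objects can be created by hand from a tree, with -p naming the parent. The content, messages and identity here are invented for the example.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git config user.name "Example User"        # commit-tree records who saved the snapshot
git config user.email "user@example.com"

blob=$(echo 'version 1' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)

first=$(echo 'first commit' | git commit-tree "$tree")
second=$(echo 'second commit' | git commit-tree "$tree" -p "$first")   # -p sets the parent
type=$(git cat-file -t "$first")
parent=$(git rev-parse "$second^")
git cat-file -p "$second"         # shows tree, parent, author, committer and message
```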

These three main Git objects – the blob, the tree, and the commit – are initially stored as separate files in your .git/objects directory.

Object Storage

Git constructs a header that starts with the type of the object. Then, it adds a space followed by the size of the content and finally a null byte.
For example: "blob 16\u0000"
Git concatenates the header and the original content and then calculates the SHA-1 checksum of that new content.
Git compresses the new content with zlib, which you can do in Ruby with the zlib library.
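
The header-plus-content hashing can be checked by hand. The sketch below assumes a repository using the default SHA-1 object format and that either sha1sum or shasum is installed; the helper name sha1 and the sample content are made up for the example.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q

# Hypothetical helper: SHA-1 of stdin using whichever tool is available.
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

content='what is up, doc?'
size=$(printf '%s' "$content" | wc -c | tr -d ' ')     # size of the content in bytes

# header = type, space, size, null byte; then the content itself
manual=$(printf 'blob %s\000%s' "$size" "$content" | sha1)
expected=$(printf '%s' "$content" | git hash-object --stdin)
echo "$manual"
echo "$expected"
```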

Git References

References or refs are files in which the SHA-1 value is stored under a simple name for use as a pointer, rather than the raw SHA-1 value.  These are stored in the .git/refs directory.

To create a new reference, you can add to .git/refs/heads/master:
echo "[checksum]" > .git/refs/heads/master

Preferably, use the update-ref command to update refs:
git update-ref refs/heads/master [commit checksum]

To effectively create a branch, reference other than master:
git update-ref refs/heads/test [commit checksum]

When you run commands like git branch (branchname), Git basically runs that update-ref command to add the SHA-1 of the last commit of the branch you’re on into whatever new reference you want to create.

The HEAD

Git knows the hash of the last commit when you run git branch (branch name) because of the HEAD file.

The HEAD file is a symbolic reference to the branch you’re currently on (symbolic reference meaning doesn’t contain a hash, rather a pointer to another reference).

When you run git commit, it creates the commit object, specifying the parent of that commit object to be whatever SHA-1 value the reference in HEAD points to.

The HEAD file can be manually edited, but best to use symbolic-ref command.
git symbolic-ref HEAD // Reads the HEAD
git symbolic-ref HEAD refs/heads/test //sets the HEAD

Tags

The tag object is very much like a commit object – it contains a tagger, a date, a message, and a pointer. The main difference is that a tag object generally points to a commit rather than a tree.

Lightweight tags:
git update-ref refs/tags/v1.0 [commit hash]

Annotated tags:
If you create an annotated tag, Git creates a tag object and then writes a reference to point to it rather than directly to the commit. You can see this by creating an annotated tag (-a specifies that it’s an annotated tag):
git tag -a v1.1 [commit checksum] -m 'test tag'
cat .git/refs/tags/v1.1 // returns hash of tag
git cat-file -p [tag hash] // prints tag details
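
The difference between the two kinds of tag shows up in the object type each reference points to. A minimal sketch, with placeholder file, identity and tag names:

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory (hypothetical example)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity (used as the tagger)
git config user.email "user@example.com"
echo a > a.txt; git add .; git commit -q -m "First"
commit=$(git rev-parse HEAD)

git update-ref refs/tags/v1.0 "$commit"    # lightweight tag: reference straight to the commit
git tag -a v1.1 "$commit" -m 'test tag'    # annotated tag: reference to a tag object

light_type=$(git cat-file -t v1.0)
annot_type=$(git cat-file -t v1.1)
target=$(git rev-parse 'v1.1^{commit}')    # follow the tag object to its commit
```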

Remotes

Remote references reside in .git/refs/remotes/origin/master or similar.

Remote references differ from branches (refs/heads references) mainly in that they’re considered read-only. You can git checkout to one, but Git won’t point HEAD at one, so you’ll never update it with a commit command. Git manages them as bookmarks to the last known state of where those branches were on those servers.

Packfiles

Any staged and committed changes to a file will result in git storing an entirely new blob object.

However, Git can store one in full and the second object only as the delta between it and the first.

The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.

git gc // stands for garbage collect: cleanup unnecessary files and optimize the local repository

The objects that remain after git gc are the blobs that aren’t pointed to by any commit.
Because you never added them to any commits, they’re considered dangling and aren’t packed up in your new packfile.

The other files are your new packfile and an index. The packfile is a single file containing the contents of all the objects that were removed from your filesystem. The index is a file that contains offsets into that packfile so you can quickly seek to a specific object. What is cool is that although the objects on disk before you ran the gc were collectively about 22K in size, the new packfile is only 7K. You’ve cut your disk usage by ⅔ by packing your objects.

git verify-pack // shows what was packed up
git verify-pack -v .git/objects/pack/pack-[hash].idx

The Refspec

When you add a remote, the reference is stored in the .git/config file. This also includes the refspec for fetching:

[remote "origin"]
  url = https://github.com/schacon/simplegit-progit
  fetch = +refs/heads/*:refs/remotes/origin/*

The format of the refspec is an optional +, followed by [src]:[dst], where [src] is the pattern for references on the remote side and [dst] is where those references will be written locally.

The + tells Git to update the reference even if it is not a fast-forward.

The default configuration can be changed in the .git/config, or it can be adjusted each time:
git fetch origin master:refs/remotes/origin/mymaster // pulls master remote branch and references it as origin/mymaster
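
A one-off refspec can be tried out against a local directory acting as the remote. In this sketch the directories remote and local, the file name and the identity are all placeholders.

```shell
set -e
work=$(mktemp -d)                 # throwaway directory (hypothetical example)

# A local repository standing in for the remote server (assumption).
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo a > a.txt; git add .; git commit -q -m "First"

git init -q "$work/local"
cd "$work/local"
git remote add origin "$work/remote"
git fetch -q origin master:refs/remotes/origin/mymaster   # [src]:[dst] given on the command line

fetched=$(git rev-parse refs/remotes/origin/mymaster)
remote_head=$(git -C "$work/remote" rev-parse master)
```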

The refspec can be updated in the same way for push.

Transfer Protocols

The dumb transfer is simple, but generally not secure or advisable.

The smart protocol requires a process on the remote end that is intelligent about Git.

To upload data to a remote process, Git uses the send-pack and receive-pack processes. The send-pack process runs on the client and connects to a receive-pack process on the remote side.

This can take place over SSH or HTTP(S).

Environment Variables

Details on Git environment variables: https://git-scm.com/book/en/v2/Git-Internals-Environment-Variables.