Getting started with Git

What is “version control”?
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

Local Version Control Systems
Many people’s version-control method is to copy files into another backup directory. This approach is incredibly error prone. It is easy to forget which directory you’re in and accidentally write to the wrong file or copy over files you don’t mean to.

Centralized Version Control Systems
The next major issue that people encounter is that they need to collaborate with developers on other systems. To deal with this problem, Centralized Version Control Systems (CVCSs) were developed. These systems, such as CVS, Subversion, and Perforce, have a single server that contains all the versioned files, and a number of clients that check out files from that central place. The obvious weakness of this setup is the single point of failure: if the central server crashes or its disk fails, nobody can collaborate or save versioned changes until it is restored, and without backups the project history is lost.

Distributed Version Control Systems
This is where Distributed Version Control Systems (DVCSs) step in. In a DVCS (such as Git, Mercurial, Bazaar or Darcs), clients don’t just check out the latest snapshot of the files: they fully mirror the repository. Thus if any server dies, and these systems were collaborating via it, any of the client repositories can be copied back up to the server to restore it. Every clone is a full backup of all the data.

Why Git?
Git stretches the very notion of version control systems (VCS) by its ability to offer almost all of its features for use offline and without a central server.

History of Git
In 2002, the Linux kernel project began using a proprietary DVCS called BitKeeper. In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. Some of the goals of the new system were as follows:

– Speed
– Simple design
– Strong support for non-linear development (thousands of parallel branches)
– Fully distributed
– Able to handle large projects like the Linux kernel efficiently (speed and data size)

Since its birth in 2005, Git has evolved and matured to be easy to use and yet retain these initial qualities. It’s incredibly fast, it’s very efficient with large projects, and it has an incredible branching system for non-linear development.

Snapshots, Not Differences
The major difference between Git and any other VCS is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. Git thinks of its data more like a set of snapshots of a miniature filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.
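As a rough illustration of the snapshot model, the sketch below makes two commits in a throwaway repository and compares the object IDs recorded for each file. The file names, commit messages, and identity are placeholders chosen for this example. The unchanged file should resolve to the same stored object in both snapshots, while the edited file gets a new one.

```shell
set -eu
# Throwaway repository; nothing outside this temporary directory is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stays the same" > stable.txt
echo "v1" > changing.txt
git add .
git commit -q -m "First snapshot"

echo "v2, edited" > changing.txt
git commit -q -a -m "Second snapshot"

# <commit>:<path> names the object stored for that path in that snapshot.
stable_before="$(git rev-parse HEAD~1:stable.txt)"
stable_after="$(git rev-parse HEAD:stable.txt)"
changing_before="$(git rev-parse HEAD~1:changing.txt)"
changing_after="$(git rev-parse HEAD:changing.txt)"

echo "stable.txt:   $stable_before -> $stable_after"
echo "changing.txt: $changing_before -> $changing_after"
```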

Nearly Every Operation Is Local
Most operations in Git need only local files and resources. For example, to browse the history of the project, Git doesn’t need to go out to the server to get the history; it simply reads it directly from your local database. This also means that there is very little you can’t do if you’re offline or off VPN. If you get on an airplane or a train and want to do a little work, you can commit happily until you get to a network connection to upload.

Git Integrity
Everything in Git is check-summed before it is stored and is then referred to by that checksum. The mechanism that Git uses for checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0–9 and a–f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like: 24b9da6552252987aa493b52f8696cd6d3b00373
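To make the checksum idea concrete, the following sketch asks Git to compute the ID it would assign to a short piece of content. It assumes the default SHA-1 object format; the sample strings are arbitrary. The same content should always yield the same ID, and any change to the content should yield a different one.

```shell
set -eu
# hash-object --stdin computes an object ID from standard input
# without writing anything to a repository.
id_one="$(printf 'hello\n' | git hash-object --stdin)"
id_two="$(printf 'hello\n' | git hash-object --stdin)"
id_other="$(printf 'hello!\n' | git hash-object --stdin)"

echo "$id_one"     # ID for the first content
echo "$id_two"     # same content, so the same ID is expected
echo "$id_other"   # different content, so a different ID is expected
```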

The Three States
Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot. This leads us to the three main sections of a Git project: the Git directory, the working directory, and the staging area.

The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. It’s sometimes referred to as the “index”.

The basic Git workflow goes something like this:
1. We modify files in our working directory.
2. We stage the files, adding snapshots of them to our staging area.
3. We do a commit, which takes the files as they are in the staging area and stores that snapshot in our Git directory.

If a particular version of a file is in the Git directory, it’s considered committed. If it has been modified and was added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.
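The three states can be observed with git status. The sketch below walks one file through modified, staged, and committed in a temporary repository; the file name and identity are placeholders. In the short status format, the first column reflects the staging area and the second column reflects the working directory.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"               # notes.txt is committed

echo "second draft" > notes.txt            # changed but not staged: modified
state_modified="$(git status --short)"

git add notes.txt                          # marked for the next commit: staged
state_staged="$(git status --short)"

git commit -q -m "Revise notes"            # stored in the Git directory: committed
state_committed="$(git status --short)"

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' \
  "$state_modified" "$state_staged" "$state_committed"
```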

Install Git
Git has a very light footprint for its installation. Git is primarily written in C, which means there is a unique distribution for each supported platform.

Linux
On Fedora and other RPM-based distributions, use yum:
$ sudo yum install git-all
If you’re on a Debian-based distribution like Ubuntu, try apt-get:
$ sudo apt-get install git-all
For more options, see the instructions at http://git-scm.com/download/linux

Mac
There are several ways to install Git on a Mac. The easiest is probably to install the Xcode Command Line Tools. A binary installer is also available at http://git-scm.com/download/mac
You can also install it as part of the GitHub for Mac install, available at http://mac.github.com

Windows
Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/
Another easy way to get Git installed is by installing GitHub for Windows from http://windows.github.com

First-Time Git Setup
With the binaries on your $PATH, issue the following three commands just once per new machine on which you’ll be using Git. Replace the username and email address with your preferred credentials.

git config --global user.name "username"
git config --global user.email "username@email.com"
git config --global color.ui "auto"

These commands store your preferences in a file named .gitconfig inside your home directory (~ on UNIX and Mac, and %USERPROFILE% on Windows).
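To check what was stored, the values can be read back with git config. The sketch below is a cautious variant: it points HOME at a scratch directory (an assumption made only so the example does not alter a real .gitconfig) and uses placeholder identity values.

```shell
set -eu
# Redirect HOME so the real ~/.gitconfig is left alone (example-only precaution).
HOME="$(mktemp -d)"
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL    # make sure --global resolves to $HOME/.gitconfig

git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global color.ui "auto"

# Read one value back, then show the file Git wrote.
configured_name="$(git config --global --get user.name)"
echo "$configured_name"
cat "$HOME/.gitconfig"
```

On a real machine, running git config --global --list without the HOME redirection shows the same information for your own account.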
Here are several in-depth Git installation guides:

http://help.github.com/win-git-installation/
http://help.github.com/mac-git-installation/
http://help.github.com/linux-git-installation/

Creating a Repository
Now that Git is installed and the user information established, you can begin establishing new repositories. From a command prompt, change directories to either a blank folder or an existing project that you want to put under version control. Then initialize the directory as a Git repository by typing the following commands:

git init
git add .
git commit -m 'The first commit'

The first command, init, builds a .git directory that contains all the metadata and repository history. Unlike many other version control systems, Git stores everything in a single directory at the top of the project, so no metadata folders are scattered through every subdirectory.

Next, the add command with the dot wildcard tells Git to start tracking changes for the current directory, its files, and for all folders beneath.

Lastly, the commit command takes all previous additions and makes them permanent in the repository’s history in a transactional action. Rather than letting Git prompt the user via the default text editor, the -m option preemptively supplies the commit message to be saved alongside the committed files.
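The three commands can be tried safely in a scratch directory. This is a minimal sketch: the README.txt file and the identity values are placeholders, and the identity is set per repository so the example does not depend on global configuration.

```shell
set -eu
project="$(mktemp -d)"
cd "$project"
echo "Hello" > README.txt                  # something to put under version control

git init -q                                # creates the .git directory
git config user.name "Example User"        # placeholder identity for this repository only
git config user.email "user@example.com"
git add .                                  # stage the current directory and everything beneath it
git commit -q -m 'The first commit'        # record the staged snapshot

ls -d .git                                 # the single metadata directory at the top of the project
git log --oneline                          # the history now holds one commit
commit_count="$(git rev-list --count HEAD)"
```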

Cloning Existing Projects
Cloning is similar to the checkout concept in Subversion or other centralized version control systems. The difference in a DVCS is that the entire history, not just the latest version, is retrieved and saved to the local user’s disk.

git clone git://github.com/username/myproject.git
or
git clone http://github.com/username/myproject.git
or
git clone git@github.com:username/myproject.git

The clone command performs several subtasks under the hood. It sets up a remote (a Git repository address bookmark) named origin that points to the location git://github.com/username/myproject.git. Next, clone asks this location for the contents of its entire repository. Git copies those objects in a zlib-compressed manner over the network to the requestor’s local disk. Lastly, clone switches to a branch named master, which is equivalent to Subversion’s trunk, as the current working copy. The local copy of this repo is now ready to have edits made, branches created, and commits issued – all while online or offline.
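The effects of clone can be inspected without a network connection by cloning from a local path. In the sketch below, the "upstream" repository is a stand-in created just for the example, and all names are placeholders. After cloning, the origin remote should point back at the source, and the full history should be available locally.

```shell
set -eu
work="$(mktemp -d)"

# Build a small repository to clone from (stands in for a hosted project).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "version one" > file.txt
git add file.txt
git commit -q -m "One"
echo "version two, a bit longer" > file.txt
git commit -q -a -m "Two"

# Clone it; Git records the source location as the remote named origin.
git clone -q "$work/upstream" "$work/copy"
cd "$work/copy"
git remote -v
origin_url="$(git remote get-url origin)"
history_length="$(git rev-list --count HEAD)"
echo "commits available locally: $history_length"
```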

The Typical Local Workflow

Editing
Once you’ve cloned or initialized a new Git project, just start changing files as needed for your current assignment. There is no locking of files by teammates. To move a file:

git mv originalfile.txt newsubdir/newfilename.txt

To expunge a file:

git rm fileyouwishtodelete.txt

Viewing
Daily work calls for strong support of viewing current and historical facts about your repository.

To check the current status:

git status

Diff
A patch-style view of the difference between the currently edited and committed files, or any two points in the past can easily be summoned. The .. operator signifies a range is being provided. An omitted second element in the range implies a destination of the current committed state, also known as HEAD:

git diff
git diff 32d4..
git diff --summary 32d4..

Git allows for diffing between the local files, the staged files, and the committed files with a great deal of precision.

 git diff            everything unstaged, diffed against the staging area
 git diff --cached   everything staged, diffed against the last commit
 git diff HEAD       everything unstaged and staged, diffed against the last commit
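To see how the three forms differ, the sketch below stages a change to one file and leaves a change to another unstaged, then lists which files each diff reports. File names and identity are placeholders; --name-only is used only to keep the output short.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'alpha\n' > staged.txt
printf 'beta\n' > unstaged.txt
git add .
git commit -q -m "Base"

printf 'alpha changed\n' > staged.txt
git add staged.txt                         # this change is staged
printf 'beta changed\n' > unstaged.txt     # this change is left unstaged

plain="$(git diff --name-only)"            # unstaged changes only
cached="$(git diff --cached --name-only)"  # staged changes only
head_all="$(git diff HEAD --name-only)"    # staged and unstaged together

printf 'git diff:          %s\n' "$plain"
printf 'git diff --cached: %s\n' "$cached"
printf 'git diff HEAD:\n%s\n' "$head_all"
```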

Log
To view the full list of changes since the beginning of time or, optionally, since a certain date:

git log
git log --since=yesterday
git log --since=2weeks

Stashing
Stashing is useful when your changes are in an incomplete state, you aren’t ready to commit them, and you need to temporarily return to the last commit. The stash command pushes all your uncommitted changes onto a stack.

git stash

When you are ready to write the stashed changes back into the working copies of the files, simply pop them back off the stack.

git stash pop
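A short round trip shows the effect on the working copy. This sketch uses a temporary repository with placeholder names: after the stash the file should match the last commit, and after the pop the unfinished edit should be back.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable version"

echo "half-finished idea" >> app.txt       # uncommitted work in progress
git stash -q                               # set it aside; the working copy returns to the last commit
after_stash="$(cat app.txt)"

git stash pop -q                           # bring the work in progress back
after_pop="$(cat app.txt)"

printf 'after stash:\n%s\nafter pop:\n%s\n' "$after_stash" "$after_pop"
```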

Aborting
If you want to abort your current uncommitted changes and restore the working copy to the last committed state, there are two commands that will help you accomplish this.

git reset --hard

Resetting with the hard option recursively discards all of your currently uncommitted (unstaged or staged) changes.
To target just one blob, use the checkout command to restore the file to its previous committed state.

git checkout -- Person.java

Adding (Staging)
When the developer is ready to put files into the next commit, they must be first staged with the add command.

git add <file name, folder name, or wildcard>
git add submodule1/PrimaryClass.java
git add .
git add *.java

Specifying a folder name as the target of a git add recursively stages files in any subdirectories.
The -i option activates interactive add mode:

git add -i

The -p option is a shortcut for activation of the patch sub-mode of the interactive prompt, allowing for precise pieces within a file to be selected for staging.

git add -p

Committing
Once all desired blobs are staged, a commit command transactionally saves the pending additions to the local repository.

git commit
git commit -m "your commit message"

To view the statistics and facts about the last commit:

git show

If a mistake was made in the last commit’s message:

git commit --amend
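Correcting the most recent commit message is done with the --amend option of git commit. The sketch below fixes a deliberate typo in a scratch repository; supplying -m alongside --amend avoids opening an editor. Names and messages are placeholders. Amending replaces the last commit rather than adding a new one, so it is best kept to commits that have not been shared yet.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "data" > file.txt
git add file.txt
git commit -q -m "Add flie"                # message contains a typo
git commit -q --amend -m "Add file"        # rewrite the last commit with a corrected message

git log --oneline
commit_count="$(git rev-list --count HEAD)"
last_subject="$(git log -1 --format=%s)"
```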

Branching
Git branches can be targeted to exist only locally, or be shared with (pushed to) the rest of the team.

git branch <new branch name> <from branch>
git branch <new branch name>

Choosing a Branch
Checking out (switching to) a branch is as simple as providing its name:

git checkout <branch name>

To list the complete set of current local and remote branches known to Git:

git branch -a

Merging
To merge one or more branches into the current branch:

git merge <branch one>
git merge <branch one> <branch two>

If any conflicts are encountered, which is rare with Git, a notification message is displayed and the conflicting portions of the affected files are marked with <<<<<<<, =======, and >>>>>>> lines.
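A conflict is easy to provoke on purpose, which helps in recognizing the markers later. In this sketch, two branches change the same line of a placeholder file in a scratch repository; the branch name feature and the file contents are invented for the example. The merge is expected to stop and leave the markers in the file.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "original line" > story.txt
git add story.txt
git commit -q -m "Base"
main_branch="$(git rev-parse --abbrev-ref HEAD)"   # whatever the initial branch is called

git checkout -q -b feature
echo "feature version" > story.txt
git commit -q -a -m "Feature edit"

git checkout -q "$main_branch"
echo "mainline version" > story.txt
git commit -q -a -m "Mainline edit"

# Both branches changed the same line, so the merge stops with a conflict;
# tolerate its non-zero exit status and look at the result.
git merge feature || true
cat story.txt
```

Resolving the conflict means editing the file to the desired content, removing the marker lines, staging the file with git add, and committing.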

Rebase
Rebasing is the rewinding of existing commits on a branch with the intent of moving the “branch start point” forward, then replaying the rewound commits.

git rebase <source branch name>
git rebase <source branch name> <destination branch name>
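The following sketch shows the replaying behavior on a small history. A topic branch is created, the original branch then gains a commit of its own, and the topic branch is rebased onto it. Branch, file, and message names are placeholders. Afterwards the topic commit should sit on top of the newer mainline commit.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "Topic work"

git checkout -q "$main_branch"
echo "mainline" > main.txt
git add main.txt
git commit -q -m "Mainline work"

git checkout -q topic
git rebase -q "$main_branch"               # replay the topic commit on top of the newer mainline tip

git log --format=%s                        # newest first
history="$(git log --format=%s | tr '\n' ',')"
```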

Tagging
To mark a point in your code timeline with a tag:

git tag <tag name>
git tag <tag name> <treeish>

The Remote Workflow

A remote called origin is automatically created if you cloned a remote repository. The full address of that remote can be viewed with:

git remote -v

To add a new remote name:

git remote add <remote name> <remote address>
git remote add <remote name> git@github.com:matthewmccullough/ts.git

Push
Pushing with Git is the sending of local changes to a colleague or community repository with sufficiently open permissions to allow you to write to it. If the colleague has the pushed-to branch currently checked out, they will have to re-checkout the branch to allow the merge engine to potentially weave your pushed changes into their pending changes.

Fetch
To retrieve remote changes without merging them into your local branches, simply fetch the blobs. This invisibly stores all retrieved objects locally in your .git directory at the top of your project structure, but waits for further explicit instructions for a source and destination of the merge.

git fetch <remote name>
git merge <remote name/remote branch>
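The separation between fetching and merging can be observed with two local repositories. In this sketch, "upstream" stands in for a shared repository and "copy" for your clone; both names and the identity are placeholders. After the fetch the local branch should be unchanged, and only the explicit merge moves it forward.

```shell
set -eu
work="$(mktemp -d)"

git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "One"

git clone -q "$work/upstream" "$work/copy"

# New work lands in the upstream repository after the clone was made.
echo "two, added later" >> file.txt
git commit -q -a -m "Two"

cd "$work/copy"
branch="$(git rev-parse --abbrev-ref HEAD)"
git fetch -q origin                              # download new objects; local branch untouched
before_merge="$(git rev-list --count HEAD)"
git merge -q --ff-only "origin/$branch"          # now bring the fetched commits into the local branch
after_merge="$(git rev-list --count HEAD)"

echo "commits before merge: $before_merge, after merge: $after_merge"
```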

Pull
Pulling is the combination of a fetch and a merge as per the previous section all in one seamless action.

git pull
git pull <remote name>
git pull <remote name> <branch name>

Bundle
Bundle prepares binary diffs for transport on a USB stick or via email. These binary diffs can be used to “catch up” a repository that sits behind a firewall too strict to allow it to be reached directly over the network by push or pull.

git bundle create catchupsusan.bundle HEAD~8..HEAD
git bundle create catchupsusan.bundle --since=10.days master
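A bundle file can be checked and consumed much like a remote. The sketch below bundles a whole branch from a scratch repository, verifies the file, and clones from it; the bundle name catchup.bundle and the other names are placeholders for this example. A bundle covering only a range, as in the commands above, would instead be fetched or pulled into a repository that already has the earlier commits.

```shell
set -eu
work="$(mktemp -d)"
git init -q "$work/source"
cd "$work/source"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for n in 1 2 3; do
  echo "change $n" >> log.txt
  git add log.txt
  git commit -q -m "Change $n"
done
branch="$(git rev-parse --abbrev-ref HEAD)"

git bundle create "$work/catchup.bundle" "$branch"   # pack the branch history into one file
git bundle verify "$work/catchup.bundle"             # confirm the file is a valid bundle

# The bundle file can then be carried elsewhere and used as a clone source.
git clone -q -b "$branch" "$work/catchup.bundle" "$work/restored"
restored_count="$(git -C "$work/restored" rev-list --count HEAD)"
echo "commits restored from bundle: $restored_count"
```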

Cloning a Subversion Repository

git svn clone --stdlayout <svn repo url>

Pushing Git Commits to Subversion

git svn dcommit

Retrieving Subversion Changes

git svn rebase