Oskar Korczak: Basic development cycle in Git

This article is a short overview of Git and how it can be used in development.
I started to use Git, some time ago and almost same time I fell in love in it. Now, I know that I'm fully devoted to Git and I don't want to come back to SVN, any more.
I don't want to slander SVN, as it solved majority of CVS drawbacks, which were teasing our industry. However, the thing is, Git is so much better, than SVN that honestly, I can't see the point of sticking to it, for even a minute longer, than I have to. SVN had its five minutes and used them in a pretty decent way. Now, it's time for the Git!
When it comes to Git's learning curve, it's not extremely steep. There is a well written book, titled GitPro, which shows all necessary concepts useful in every day development. If I were starting learning Git, I would read first three or four chapters, just to get a general idea what is the whole thing about.
In terms of architecture, the most influential decision made, was treating Git as a distributed Software Configuration Management (SCM) tool. That approach turned SVN's flaws into Git's virtues. Now, Git is boasting of its:

speed
failure resistance
spread backup

There is a very interesting observation showing that basing on a single design decision, Linus Torvalds got rid off all ridiculous issues we were heroically fighting with, in SVN. Let's quickly recap all of them and show how Git copes with his ancestors' cons:
Speed - there is no need to wait for syncing with remote repo, every single time you want to commit, as Git is using a local copy of remote repo. The only thing you do is occasionally pulling and pushing changes from/to remote repo.
Failure resistance - there is no concept of central repo, so during development process everybody has a copy of the whole, initial repo. Assuming destruction of one of the repos, the recovery is quick and easy. It bases on a copy of remote repo on other machines. It's much harder to loose all changes on all dev machines. As a consequence of one repo failure, there is no longer a tie up for all developers.
Spread backup - given the remote repo is down, you can still work on your local copy of remote branch and sync with remote one, when it's up and running.

Okay, let's get down to the business and show how basic, Git, development cycle may look like.
First of all, we would need a Git installed on our machine. Furthermore, a GitHub account would be a sort of must for us, too. We will use GitHub to set up a project and have a reliable remote repo. Everything is described on GitHub's bootcamp page, so I'm not diving into it.
When it's done, we have two options. We may start our project on our local machine and add it to remote repo or we can clone existing remote repository. To start a repository at current directory issue:

$ git init

If you like to copy, say remote repo, use:

$ git clone https://github.com/oskarkorczak/sample-repo.git

You are able to list all your repositories (local and remote), using below command:

$ git branch -a

Now, we would like to crete a separate branch and start developing our change:

$ git checkout -b name-of-the-new-branch

Basing on a small steps rule and TDD cycle, you would like to commit every single time you have a green test. In order to check, what changes are going to be tracked or potentially going to be committed issue:

$ git status

You can also commit whenever you feel you did something valuable in branch. That's as simple as saying to Git:

$ git commit -am "Here comes your commit message."

It basically adds your changes under version control (-a switch) and puts an inline message (-m switch).
If you would be curious what is happening in Git's history, you can always ask for it, by issuing commnad:

$ git log

Given that it took you one day to develop a change in your branch. Now, you would like to sync it with others and commit to remote repo. First of all, you have to switch to your local master branch, which is a local copy of remote master (main) branch:

$ git checkout master

Then, download all changes done by other developers to your local master branch:

$ git pull origin master

Origin is a default name of remote repository. It basically works like IP:port address pair in networks. Origin might be an equivalent of IP and port may be a branch name, in this case, it's master. Let's switch back to our branch:

$ git checkout name-of-the-new-branch

Now, we have to stop for a second and clarify how rebase works. In a nutshell, it's highly probable that there have been some commits of other developers. Rebase is a process of lining them up, so that all sync commits from last pull request from local master, would be before your local branch amendments. Simplifying, it means that rebase process will put all your changes at the end of updated master branch.

$ git rebase master

If it happen that you would have some confilcts, rebase command will stop at the first conflit and notify you about that. You have to resolve all conflicts manually or alternatively you can install a merge tool. Resolving process is a step similar to SVN. You have to go to particular file, delete unnecessary entires and tell Git it's done, by adding new changes under version control:

$ git add resolved-file-names

Then, you have to prod Git to carry on rebasing your changes with local master by:

$ git rebase --continue

If you forgot about adding something to the code, you are always able to abort rebasing by:

$ git rebase --abort

After that, it's time to send your branch to GitHub, where it could be picked up by some Continuous Integration system for building and testing purposes.

$ git push origin name-of-the-new-branch

When all tests are green you can raise a pull request. On GitHub it's simple, you are switching to your branch, pressing a "Pull Request" button and it's done. You're getting a link to pull request, which might be send to all developers to be reviewed. When code review is done, your pull request might be merged with remote master automatically by pressing a "Merge" button in pull request. There might be also a case that between the moment you send your changes to GitHub and hitting "Merge" button, somebody else merged his changes in overlapping area of the project. Then, there couldn't be done a fast-forward merge, which means repeating a part of above sequence of steps. You will have to update your local master, do the rebase and once again send updated branch to GitHub. Another "push" will update exiting GitHub's branch automatically. "Merge" button is available once again.

Assuming that you smoothly went through code review and merging procedure, the last thing you should do is cleanup. Remote branch is no longer necessary and it shouldn't clutter remote repository, so we are going to delete it:

$ git push origin :name-of-the-new-branch

To be honest, deleting remote branch is a bit dodgy for me. Who the heck invented colon before branch name as a deletion marker? A sort of -d switch is more natural, isn't it?
On the other hand, removing local branch is totally different and is done by command:

$ git branch -d name-of-the-new-branch

There are also two more, very useful Git commands. First one is responsible for fetching all remotes to local repo:

$ git pull --all

Second one is removing from local repo all branches, which are not existing on remote repository:

$ git remote prune origin

Later on, say next day, when your code base is changed by other team members, you can call a chain of useful commands. They will update master and all other branches, clean branches no longer maintained on remote repo and list local and remote branches, which are currently used. All these things might be done by invoking commands you already know:

$ git pull origin master; git pull --all; git remote prune origin; git br -a

At this point, you should know whole, basic Git development cycle.

There are also couple of golden rules related to development as such. They are especially useful while working with Git:

Baby steps - use TDD and commit every single time you have a green test.
Small scope of changes - keep your changes as small as possible, not to affect many places in the code base. It prevents your changes from alternating between you and other developers.
Short living branches - tightly connected with above point. The longer branch lives, the more vulnerable it is for not merging in a fast-forward mode.

Oskar Korczak

Saturday, 1 September 2012

Basic development cycle in Git

No comments:

Post a Comment