Oskar Korczak: Code Review

Saturday, 10 November 2012

Which game are you playing?

A few days ago, I spoke with a friend about a rather "funny" situation. He told me that the developers on his project are playing a game, which he called Pair Programming vs. Code Review. If two developers like each other, they play Pair Programming. If they do not, they play Code Review.

As you can see, those developers found a problem where none existed. That is a rare and undesirable skill, which often manifests itself in pseudo-pragmatic teams that are probably not sticking to the 5th principle of the Agile Manifesto:
"Build projects around motivated individuals.
Give them the environment and support they need, and trust them to get the job done."
Generally, it is called over-engineering! But that is not all: they are losing time on guerrilla warfare rather than focusing on helping each other. They also seem to have no respect for each other.

The question is: should it be this way? Of course not!
I would like to be clear about Pair Programming and Code Review, so let's start with some definitions.

Scratching Tiger, Hidden Dragon - Pair Programming

One developer types in code at the keyboard. He works at the tactical level. The other one looks for mistakes and thinks strategically about whether the code is being written right and whether the right code is being written.

You can also describe pair programming in a more holistic, Tao-oriented way: as yin and yang, or as unity and harmony arising from diversity. There are many ways of explaining the principles of pair programming, such as coder and observer or driver and navigator; however, I am particularly fond of the scratching tiger and hidden dragon parallel.

In other words, it is a conjunction of opposite qualities, such as active and passive, linear and non-linear, or specific and general, in a collective action such as solving a problem: programming, in this case.

To be honest, you can extend and generalize this approach to any sort of task you have to do.

The overall idea is simple and very easy to grasp. Let's look at pair programming from two different perspectives: tiger and dragon.

Scratching Tiger:

The active person, writing the code, almost scratching it
Keeps focusing on syntax and details
Uses the linear part of the brain, the logical (left) hemisphere, to solve problems as they arise

Hidden Dragon:

The passive person, who pays attention to code quality and the fulfillment of acceptance criteria, almost like a dragon flying in the sky and observing what the tiger does
Can see the whole solution and has an overall idea of subsequent tasks and steps
Uses the non-linear part of the brain, the abstract (right) hemisphere, to imagine the flow and grasp the bigger picture

You would probably agree that the above description is somewhat static and simply a bit boring. Frankly, you are right, but the key here is movement and the synergy you can get from it.

How can we get the movement?

Again, this is simple. You need a proper space: a comfortable desk, two chairs and two keyboards. These things are really important, so do not neglect any of them, especially the two keyboards. These prerequisites enable a higher level of collaboration between developers. On top of that, developers do not lose time and ideas while swapping keyboards.

And now comes the most important part. The developers are creating a collective mind, which is buzzing and boiling with new, fresh ideas. The worst thing you can do is not to switch between the Tiger and Dragon roles. You have to do it frequently, freely and without any hesitation. You cannot be afraid of your partner saying "leave that keyboard" and the like. When you see that the Tiger made a typo, initialized a loop in the wrong way or skipped a refactoring that could have been done, simply take over and use your own keyboard to fix it. You need to change perspectives very often and switch your brain from linear to non-linear mode and vice versa. That is the only way you can solve difficult problems.

Why pair on solving tasks?

Mainly for four reasons:

Readability - two people take care of keeping the code readable, as both of them have to understand what was written

Code quality - pairing finds 40% - 60% of defects (Steve McConnell, "Code Complete")

Bus factor - knowledge sharing and mitigating the risk of a single point of failure

Pressure - a group shields developers from external pressure much better than a single developer can protect himself. What's more, in stressful situations people tend to cluster together to face the difficulty.

Okay, so when is it worth pairing?

It is definitely more expedient to pair on crucial and strategically important tasks. It is also useful to pair on both very hard, non-trivial implementations and tasks of medium difficulty. There is no point in pairing on relatively easy tasks, as pairing on them does not add much value compared to the effort.

Always remember to rotate pairs, so that nobody gets too used to their partner; efficiency decreases when that happens. Moreover, it is better not to pair two newbies, as there is no master-apprentice relationship, which negatively affects the pairing process. And, definitely, people need to like each other.

Code Review - No junk in a trunk

It is a sort of semi-formal walkthrough of the code, aiming to:

share knowledge (mitigate bus factor)
check design from high level perspective
check that at least the SOLID principles are applied
check definition of done (DoD)
increase resilience before holidays

When can you perform code review?

Of course after development and testing, but before merging to master or trunk, so that there is no junk in a trunk.

The programmer or pair who developed a solution walks another team member through the new code. They start by telling the story and its acceptance criteria, then explain the applied design and show the working, green CI system tests. There is also room for Q&A and potentially some code review remarks or changes. Code review is able to discover between 20% and 40% of defects (Steve McConnell, "Code Complete").

Summary

Code Review and Pair Programming are two complementary techniques that improve the general quality of the delivered solution. Apart from that, they mitigate the risk associated with the bus factor through knowledge sharing.

Together, they create a useful unity in the development cycle.

You may even use the scratching tiger, hidden dragon parallel in this case as well. Pair Programming would be the tiger and Code Review the dragon, looking at the solution from a distance.

It is much cheaper (sic!) to let developers be tigers and dragons than to give them a wigging for losing time on crap.

You rarely have to choose between these two approaches. If you do, choose wisely and choose pairing. If your management is trying to be lean-oriented in a clumsy way and tells you not to pair because it is pure waste, at least hold on to code review. Remember, half pairing is still better than not pairing at all. It is always good to discuss the design, start on the trickiest bit together and then split up and finish separately what you started, if time is so valuable.

However, the claim that pairing is a waste of time does not hold true. You write robust and clean solutions faster while pairing and reviewing than without them.

Saturday, 1 September 2012

Basic development cycle in Git

This article is a short overview of Git and how it can be used in development.
I started to use Git some time ago and fell in love with it almost at once. Now I know that I'm fully devoted to Git and I don't want to go back to SVN any more.
I don't want to slander SVN, as it solved the majority of the CVS drawbacks that were plaguing our industry. However, the thing is, Git is so much better than SVN that, honestly, I can't see the point of sticking to SVN for even a minute longer than I have to. SVN had its five minutes and used them in a pretty decent way. Now it's time for Git!
When it comes to Git's learning curve, it's not extremely steep. There is a well-written book, titled Pro Git, which shows all the concepts necessary in everyday development. If I were starting to learn Git, I would read the first three or four chapters, just to get a general idea of what the whole thing is about.
In terms of architecture, the most influential decision was to make Git a distributed Software Configuration Management (SCM) tool. That approach turned SVN's flaws into Git's virtues. Now Git boasts of its:

speed
failure resistance
spread backup

Interestingly, on the basis of a single design decision, Linus Torvalds got rid of all the ridiculous issues we were heroically fighting with in SVN. Let's quickly recap them and show how Git copes with its predecessor's cons:
Speed - there is no need to wait for syncing with the remote repo every single time you want to commit, as Git works on a local copy of the remote repo. The only thing you do is occasionally pull and push changes from/to the remote repo.
Failure resistance - there is no single central repo that everything depends on, so during development everybody has a copy of the whole repo. If one of the repos is destroyed, recovery is quick and easy, as it is based on the copies held on other machines. It is much harder to lose all changes on all dev machines. As a consequence, one repo failure no longer holds up all developers.
Spread backup - if the remote repo is down, you can still work on your local copy of the remote branch and sync with the remote one once it's up and running again.

Okay, let's get down to business and show what a basic Git development cycle may look like.
First of all, we need Git installed on our machine. Furthermore, a GitHub account is a sort of must for us, too. We will use GitHub to set up a project and have a reliable remote repo. Everything is described on GitHub's bootcamp page, so I'm not diving into it.
When that's done, we have two options. We may start our project on our local machine and add it to a remote repo, or we can clone an existing remote repository. To start a repository in the current directory, issue:

$ git init

If you would like to copy an existing remote repo, use:

$ git clone https://github.com/oskarkorczak/sample-repo.git

You can list all your branches (local and remote-tracking) using the command below:

$ git branch -a

Now, we would like to create a separate branch and start developing our change:

$ git checkout -b name-of-the-new-branch

Following the small steps rule and the TDD cycle, you would like to commit every single time you have a green test. To check which changes are tracked and which are going to be committed, issue:

$ git status

You can also commit whenever you feel you did something valuable in the branch. That's as simple as saying to Git:

$ git commit -am "Here comes your commit message."

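It basically stages all your changes to files that are already tracked (-a switch) and attaches an inline message (-m switch). Brand new files are not picked up by -a; they have to be added with git add first.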
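
One detail of the -a switch is easy to miss, so here is a minimal sketch of it. It assumes a throwaway repository created with mktemp; the file names and the demo identity are made up for illustration. It shows that -a commits changes to files Git already tracks, while a brand new file stays untracked.

```shell
set -e
# Throwaway repository; all names below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Put one file under version control.
echo "first line" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

# Modify the tracked file and create a new, untracked one.
echo "second line" >> tracked.txt
echo "brand new" > untracked.txt

# -a stages the modification of tracked.txt, but ignores untracked.txt.
git commit -q -am "Update tracked.txt"

committed_files=$(git ls-files)
leftover=$(git status --short)
echo "In the repository: $committed_files"
echo "Still pending:     $leftover"
```
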
If you are curious what is happening in Git's history, you can always ask for it by issuing the command:

$ git log

Let's say it took you one day to develop a change in your branch. Now you would like to sync it with others and send it to the remote repo. First of all, you have to switch to your local master branch, which is a local copy of the remote master (main) branch:

$ git checkout master

Then, download all changes made by other developers to your local master branch:

$ git pull origin master

Origin is the default name of the remote repository. It basically works like an IP:port address pair in networking: origin is the equivalent of the IP, and the port is the branch name, which in this case is master. Let's switch back to our branch:

$ git checkout name-of-the-new-branch

Now we have to stop for a second and clarify how rebase works. In a nutshell, it's highly probable that other developers have made some commits in the meantime. Rebase is the process of lining commits up, so that all the commits you have just pulled into local master come before the changes from your local branch. Put simply, rebase replays your changes on top of the updated master branch.

$ git rebase master
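
To make the effect of rebase visible, here is a small sketch that plays both roles in a throwaway repository. The file names, commit messages and demo identity are assumptions made for illustration; a commit made directly on master stands in for the changes pulled from other developers.

```shell
set -e
# Throwaway repository; all names below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Your work on a separate branch.
git checkout -q -b name-of-the-new-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "My change"

# Meanwhile master moves on (stands in for "git pull origin master").
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Change from another developer"

# Replay your change on top of the updated master.
git checkout -q name-of-the-new-branch
git rebase -q master

newest=$(git log --format=%s | sed -n 1p)
previous=$(git log --format=%s | sed -n 2p)
git log --format=%s
```

After the rebase the history of the branch is linear: your commit is listed first, with the other developer's commit right below it.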

If you happen to have some conflicts, the rebase command will stop at the first conflict and notify you about it. You have to resolve all conflicts manually or, alternatively, you can install a merge tool. Resolving is similar to what you know from SVN: you go to the particular file, delete the unnecessary entries and tell Git it's done by staging the resolved file:

$ git add resolved-file-names

Then you have to prod Git to carry on rebasing your changes onto local master:

$ git rebase --continue

If something goes wrong, or you forgot to add something to the code, you can always abort the rebase and return your branch to the state from before it started:

$ git rebase --abort
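
The whole conflict scenario can be rehearsed safely in a throwaway repository. The sketch below is only an illustration: the file name, its contents and the demo identity are made up, and overwriting the file stands in for the manual editing you would normally do.

```shell
set -e
# Throwaway repository; all names below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Initial commit"

# Your change on the branch.
git checkout -q -b name-of-the-new-branch
echo "hello from my branch" > greeting.txt
git commit -q -am "My change"

# A competing change on master touching the same line.
git checkout -q master
echo "hello from master" > greeting.txt
git commit -q -am "Change from another developer"

git checkout -q name-of-the-new-branch
if ! git rebase master >/dev/null 2>&1; then
    # Rebase stopped at the conflict; this overwrite stands in for manual editing.
    echo "hello from master and my branch" > greeting.txt
    git add greeting.txt
    # GIT_EDITOR=true keeps the commit message without opening an editor.
    GIT_EDITOR=true git rebase --continue >/dev/null
fi

result=$(cat greeting.txt)
commits=$(git rev-list --count HEAD)
echo "$result ($commits commits)"
```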

After that, it's time to send your branch to GitHub, where it could be picked up by some Continuous Integration system for building and testing purposes.

$ git push origin name-of-the-new-branch

When all tests are green, you can raise a pull request. On GitHub it's simple: you switch to your branch, press the "Pull Request" button and it's done. You get a link to the pull request, which can be sent to all developers for review. When the code review is done, your pull request can be merged into remote master by pressing the "Merge" button in the pull request. It may also happen that between the moment you sent your changes to GitHub and the moment you hit the "Merge" button, somebody else merged changes in an overlapping area of the project. Then a fast-forward merge is no longer possible, which means repeating part of the above sequence of steps. You will have to update your local master, do the rebase and send the updated branch to GitHub once again. Since the rebase rewrote your commits, this second push has to be forced with the --force switch; it then updates the existing branch on GitHub and the "Merge" button becomes available once again.

Assuming that you went smoothly through the code review and merging procedure, the last thing you should do is clean up. The remote branch is no longer necessary and shouldn't clutter the remote repository, so we are going to delete it:

$ git push origin :name-of-the-new-branch

To be honest, deleting a remote branch is a bit dodgy for me. Who the heck invented a colon before the branch name as a deletion marker? A sort of -d switch would be more natural, wouldn't it?
Removing a local branch, on the other hand, is totally different and is done with the command:

$ git branch -d name-of-the-new-branch
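
If the colon syntax bothers you as well, git push also accepts a --delete switch, which does the same job. Below is a minimal sketch of the cleanup; it assumes a local bare repository standing in for GitHub, and the directory names and demo identity are made up for illustration.

```shell
set -e
# A local bare repository stands in for the GitHub remote (illustrative only).
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git local 2>/dev/null
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q origin master

git checkout -q -b name-of-the-new-branch
git push -q origin name-of-the-new-branch

# Same effect as "git push origin :name-of-the-new-branch".
git push -q origin --delete name-of-the-new-branch

# The local branch has to be removed separately.
git checkout -q master
git branch -q -d name-of-the-new-branch

remote_left=$(git ls-remote --heads origin name-of-the-new-branch)
local_left=$(git branch --list name-of-the-new-branch)
echo "remote: '$remote_left' local: '$local_left'"
```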

There are also two more very useful Git commands. The first one fetches from all remotes into the local repo (and merges the upstream changes into your current branch):

$ git pull --all

The second one removes from the local repo all remote-tracking branches that no longer exist on the remote repository:

$ git remote prune origin
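
Here is a small sketch of what prune actually cleans up. It assumes a local bare repository standing in for the remote, and deleting the branch directly in that bare repository simulates a teammate removing it; all names are made up for illustration. Note that only the remote-tracking reference disappears, while your own local branch stays.

```shell
set -e
# A local bare repository stands in for the remote (illustrative only).
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git local 2>/dev/null
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q origin master
git checkout -q -b name-of-the-new-branch
git push -q origin name-of-the-new-branch

# Simulate a teammate deleting the branch on the remote side.
git --git-dir=../remote.git branch -q -D name-of-the-new-branch

tracking_ref=refs/remotes/origin/name-of-the-new-branch
stale_before=$(git for-each-ref --format='%(refname:short)' "$tracking_ref")
git remote prune origin
stale_after=$(git for-each-ref --format='%(refname:short)' "$tracking_ref")
local_branch=$(git for-each-ref --format='%(refname:short)' refs/heads/name-of-the-new-branch)

echo "before: '$stale_before' after: '$stale_after' local: '$local_branch'"
```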

Later on, say the next day, when the code base has been changed by other team members, you can call a chain of useful commands. They will update master, fetch all other branches, clean up branches no longer maintained on the remote repo and list the local and remote branches currently in use. All of this can be done by invoking commands you already know:

$ git pull origin master; git pull --all; git remote prune origin; git branch -a
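
If you run this chain every day, you may prefer to wrap it in a shell function. The sketch below is one possible way to do it: sync_repo is a hypothetical name, the commands are joined with && so that the chain stops at the first failing step, and the bare repository is only a local stand-in for the real remote.

```shell
set -e
# Hypothetical helper; && stops the chain at the first failing step.
sync_repo() {
    git pull -q origin master &&
    git pull -q --all &&
    git remote prune origin &&
    git branch -a
}

# Throwaway setup so the function has something to work on (illustrative only).
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git local 2>/dev/null
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
# -u records origin/master as upstream, which "git pull --all" relies on.
git push -q -u origin master

branches=$(sync_repo)
echo "$branches"
```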

At this point, you should know the whole basic Git development cycle.

There are also a couple of golden rules related to development as such. They are especially useful while working with Git:

Baby steps - use TDD and commit every single time you have a green test.
Small scope of changes - keep your changes as small as possible, so as not to affect many places in the code base. It keeps your changes from colliding with those of other developers.
Short-lived branches - tightly connected with the point above. The longer a branch lives, the more likely it is that it can no longer be merged in fast-forward mode.
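
To check whether a branch can still be merged in fast-forward mode, you can ask Git whether master is an ancestor of that branch. The sketch below assumes a throwaway repository with made-up file names and a demo identity; it uses git merge-base --is-ancestor, which exits with status 0 when the first commit is an ancestor of the second.

```shell
set -e
# Throwaway repository; all names below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b name-of-the-new-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "My change"

# master has not moved yet, so a fast-forward merge is possible.
if git merge-base --is-ancestor master name-of-the-new-branch; then
    fresh_branch="fast-forward possible"
else
    fresh_branch="fast-forward not possible"
fi

# The branch lives on while somebody else commits to master.
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Change from another developer"

if git merge-base --is-ancestor master name-of-the-new-branch; then
    old_branch="fast-forward possible"
else
    old_branch="fast-forward not possible"
fi

echo "fresh branch: $fresh_branch"
echo "old branch:   $old_branch"
```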