Distributed SCMs: Be Smart, Use the Bleeding Edge
Posted by Mathew Abonyi Sun, 05 Aug 2007 04:38:00 GMT
I’ll preface this post by admitting I’ve not paid much attention to my version control software. When you’re smart (cue 16-ton weight), you try to think ahead and choose software or hardware which will vastly reduce the effort of programming. While I stay on the edge with most things, I neglected to really think about version control. I had a very brief and painful time with CVS and promptly converted to Subversion because it was already installed, everyone used it, it was much easier, it solved the immediate problems I had with CVS, and it did what I needed at the time.
Of course, in programming, we all know that “what I needed to do” is never the same as “what I need to do now”. I have a lot more code to manage, a lot of projects which I develop on my own, manage, or take a serious part in, and more projects which I want to dip my feet into. The number of snippets, little things here and there, and feedback or changes from other people has meant a lot of branches, tags, svn cp and other management issues.
Now, this might sound like a “shiny thing” moment, but it is completely different from shiny thing syndrome. My advice to practically everyone reading this is to keep moving at the cutting edge of version control… and that doesn’t mean upgrading to Subversion 1.4.4. It means that there are genuine advances in version control which go well beyond Subversion and well beyond what it will ever be. I’ve known about distributed SCMs for a while, but haven’t really bothered looking into them. I’m writing this post because I think a lot of people are in the same boat. Subversion is supported practically everywhere, if not always as the default SCM: Sourceforge, Rubyforge, Google Code, Unfuddle, Lighthouse, etc. Every respectable project hosting service uses Subversion, and almost every plugin, gem or developer out there at the moment refers to Subversion repositories. Well, I’m going to make a serious splash in this department and say: it’s time to move on from Subversion. I have completely outgrown it, and I’m not even managing a codebase as large as Rails (or even ActiveRecord).
The single biggest development in version control over the last umpteen years, and specifically the last two, is distributed version control. Basically, that means everyone’s working copy is a repository, every commit is local, every ‘commit’ to another repository is a ‘push’ and every ‘update’ from another repository is a ‘pull’. Merging is therefore the default behaviour when you communicate with another repository. In other words, every repository is a branch. This way, distributed version control doesn’t force a single source code management architecture. These systems are completely open, and the advantages of this model-less model are (obviously) endless:
- most importantly, distribution is all about caring more about the coder-to-code relationship than the coder-to-project relationship
- distributed SCMs are orders of magnitude faster in almost every department
- the trade-off is that they usually use a bit more space
- actual commits are almost instant, since they are local
- merging is simplified enormously—it’s the daily business of a DSCM
- no need for an update—you merge when ready
- development can happen anywhere—no connection required
- no default model of management or hierarchy
- much more intuitive code structure—do what you like
- no need for access control, but it’s there if you need it
- etc.
You’ll see the benefits even when developing your own code and merging with no repositories but your own: distributing the management doesn’t require distributing the code. You can use any methodology, any hierarchy, any paradigm of managing your code that you wish.
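To make the “every working copy is a repository” idea concrete, here is a minimal Python sketch of the model described above. It is a toy under loud assumptions, not how Mercurial or Git are implemented: the Repo class, its method names and the 12-character changeset ids are all invented for illustration, and a “merge” here only records a changeset with several parents without touching any file contents.

```python
import hashlib


class Repo:
    """Toy repository: every working copy carries its own full history."""

    def __init__(self, owner):
        self.owner = owner
        self.changesets = {}  # changeset id -> (parent ids, author, message)
        self.heads = set()    # changesets that have no children yet

    def commit(self, message):
        """Record a changeset locally; no other repository is contacted."""
        parents = tuple(sorted(self.heads))
        payload = repr((parents, self.owner, message)).encode("utf-8")
        cid = hashlib.sha1(payload).hexdigest()[:12]  # hypothetical id scheme
        self.changesets[cid] = (parents, self.owner, message)
        self.heads = {cid}
        return cid

    def pull(self, other):
        """Copy changesets we lack from `other`; return how many arrived."""
        missing = {cid: cs for cid, cs in other.changesets.items()
                   if cid not in self.changesets}
        self.changesets.update(missing)
        # A head is any changeset that no other changeset names as a parent.
        referenced = {p for parents, _, _ in self.changesets.values()
                      for p in parents}
        self.heads = set(self.changesets) - referenced
        return len(missing)

    def push(self, other):
        """A push is simply a pull seen from the other side."""
        return other.pull(self)

    def merge(self, message="merge"):
        """Join diverged heads with one changeset that has several parents."""
        if len(self.heads) < 2:
            return None  # nothing has diverged, so there is nothing to merge
        return self.commit(message)


if __name__ == "__main__":
    alice, bob = Repo("alice"), Repo("bob")
    alice.commit("initial import")
    bob.pull(alice)               # a "clone" is just a pull into an empty repo
    alice.commit("add feature")   # both of these commits are purely local
    bob.commit("fix bug")
    alice.pull(bob)               # alice's history now has two heads
    alice.merge()                 # ...which one merge changeset joins again
    alice.push(bob)               # bob receives alice's work and the merge
```

Notice that nothing in the sketch marks one repository as “the server”. Alice and Bob are peers, and any hierarchy between them is a convention you choose, not something the tool imposes.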
Returning to my original point, staying at the bleeding edge doesn’t mean upgrading to the next minor or major version to get the latest feature. It means upgrading, or changing software if necessary, to use the latest breakthroughs which will cut costs and save time for your own projects or your company’s projects.
Now, you’re probably asking which distributed SCM to use. I personally go for Mercurial for the following reasons:
- SVK is a lie: it is not distributed; it just distributes SVN, which is like multiplying a herd of turtles
- much faster than Arch, Darcs and Bazaar
- only Git is faster—but that was developed by Linus Torvalds
- Mercurial and Git are relatively young, meaning they’ve learnt from the mistakes of other DSCMs
- Mercurial and Git come from the same background (the BitKeeper drama) and BitKeeper, according to Torvalds, was the only SCM worth using
- Mercurial compares feature-for-feature with other DSCMs, and then some
- Mercurial uses less space than others
- very user friendly CLI—by far the easiest to learn of the DSCMs
- decent documentation
- a few project hosting sites already (like ShareSource)
- ability to have Mercurial support using the HTTP front-end, even where you normally wouldn’t have it (e.g. Sourceforge and Rubyforge)
- if Mercurial lacks a feature, find or write a Mercurial extension; it is fully customisable
My Forceful Conclusion
As with the language you are using (Ruby 1.8), the framework (Rails 1.2), the testing suite (RSpec 1.0 and Mocha), the operating system (Mac OS X 10.4 and Linux 2.6), the web server (nginx, LiteSpeed, or lighttpd), the application server (LiteSpeed or Mongrel), the blogging software (Typo or Mephisto), even the hardware (Amazon S3/EC2 or many dual core 2+ GHz servers with many GBs of RAM), you need to use the latest version control software (Mercurial 0.9.4 or Git 1.5). If you don’t, it’s like using an old server with 333MHz & 256MB RAM, or Apache 1.3 & mod_ruby. But unlike an outmoded application server, SVN will make you a slower developer than the one sitting next to you using Mercurial, Git, Darcs or Bazaar. Oh, and I needn’t mention that he’ll get all the babes too.
