Distributed SCMs: Be Smart, Use the Bleeding Edge

Posted by Mathew Abonyi Sun, 05 Aug 2007 05:38:00 GMT

I’ll preface this post by admitting I’ve not paid much attention to my version control software. When you’re smart (cue 16-ton weight), you try to think ahead and choose software or hardware which will vastly reduce the effort of programming. While I stay on the edge with most things, I neglected to really think about version control. I had a very brief and painful time with CVS and promptly converted to Subversion because it was already installed, everyone used it, it was much easier, it solved the immediate problems I had with CVS, and it did what I needed to do at the time.

Of course, in programming, we all know that “what I needed to do” is never the same as “what I need to do now”. I have a lot more code to manage, a lot of projects which I develop on my own, manage or take a serious part in, and more projects which I want to dip my feet into. The number of snippets, little things here and there, and feedback or changes from other people has meant a lot of branches, tags, svn cp and other management issues.

Now, this might sound like a “shiny thing” moment, but it is completely different from shiny thing syndrome. My advice to practically everyone reading this is to keep moving at the cutting edge of version control… and that doesn’t mean upgrading to Subversion 1.4.4. It means that there are genuine advances in version control which go well beyond Subversion and well beyond what it will ever be. I’ve known about distributed SCMs for a while, but haven’t really bothered looking into them. I’m writing this post because I think a lot of people are in the same boat. Subversion is supported, if not the default SCM, practically everywhere: Sourceforge, Rubyforge, Google Code, Unfuddle, Lighthouse, etc. Every respectable project hosting service uses Subversion and almost every plugin, gem or developer out there at the moment points to Subversion repositories. Well, I’m going to make a serious splash in this department and say: it’s time to move on from Subversion. I have completely grown out of it, and I’m not even managing a codebase as large as Rails (or even ActiveRecord).

The single biggest development in version control over the last umpteen years, but specifically the last 2, is distributed version control. Basically, that means everyone’s working copy is a repository, every commit is local and every ‘commit’ to another repository is a ‘push’ (every ‘update’ from another repository is a ‘pull’)—that means merging is the default behaviour when you communicate with another repository. In other words, every repository is a branch. This way, distributed version control doesn’t force a single source code management architecture. These systems are completely open and the advantages to this model-less model are (obviously) endless:

  • most importantly, distribution is all about caring more about the coder-to-code relationship than the coder-to-project relationship
  • distributed SCMs are orders of magnitude faster in almost every department
  • they usually use a bit more space (the one trade-off on this list)
  • actual commits are almost instant, since they are local
  • merging is simplified enormously—it’s the daily business of a DSCM
  • no need for an update—you merge when ready
  • development can happen anywhere—no connection required
  • no default model of management or hierarchy
  • much more intuitive code structure—do what you like
  • no need for access control, but it’s there if you need it
  • etc.

You’ll see the benefits even when developing your own code and merging with no repositories but your own: distribution of management doesn’t require distribution of code. You can use any methodology, any hierarchy, any paradigm of managing your code that you wish.
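
To make the “every working copy is a repository” idea concrete, here is a toy sketch in Python. It is not how Mercurial or Git actually store or exchange anything: the Repo and Changeset classes, their method names and the hashing scheme are all invented for illustration, and real content merging is left out entirely. It only shows that a commit touches nothing but the local repository, that a push is a pull seen from the other side, and that pulling someone else’s work leaves you with two heads to merge.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """One recorded change (hypothetical model, not a real DSCM format)."""
    parents: tuple
    author: str
    message: str

    @property
    def node(self) -> str:
        # The id depends on the parents, so history is part of the identity.
        raw = "|".join([",".join(self.parents), self.author, self.message])
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]


class Repo:
    """Toy repository: a changeset store plus the ids of its current heads."""

    def __init__(self, name):
        self.name = name
        self.store = {}     # node id -> Changeset
        self.heads = set()  # changesets that have no children in this repo

    def commit(self, author, message):
        """Record a change locally; no other repository is contacted."""
        cs = Changeset(tuple(sorted(self.heads)), author, message)
        self.store[cs.node] = cs
        self.heads = {cs.node}
        return cs.node

    def pull(self, other):
        """Copy the changesets we lack from `other`; return how many arrived."""
        missing = {n: cs for n, cs in other.store.items() if n not in self.store}
        self.store.update(missing)
        used_as_parent = {p for cs in self.store.values() for p in cs.parents}
        self.heads = set(self.store) - used_as_parent
        return len(missing)

    def push(self, other):
        """A push is simply a pull performed by the receiving repository."""
        return other.pull(self)

    def merge(self, author):
        """Join all current heads into one changeset (file merging omitted)."""
        if len(self.heads) < 2:
            return None
        return self.commit(author, "merge")


# Two developers, each with a full repository of their own.
alice, bob = Repo("alice"), Repo("bob")
alice.commit("alice", "initial import")
bob.pull(alice)                    # bob starts from alice's history

alice.commit("alice", "fix parser")
bob.commit("bob", "add tests")     # both commit without talking to anyone

bob.pull(alice)                    # bob now holds two heads
bob.merge("bob")                   # ...and joins them when he is ready
bob.push(alice)                    # alice receives bob's commit and the merge
print(alice.heads == bob.heads)
```

Neither alice nor bob is “the server” here. Designating one repository as the place everyone pushes to is a team convention layered on top, not something the model requires.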

Returning to my original point, staying at the bleeding edge doesn’t mean upgrading to the next minor or major version to get the latest feature. It means upgrading, or changing software if necessary, to use the latest breakthroughs which will cut costs and save time for your own projects or your company’s projects.

Now, you’re probably asking which distributed SCM to use. I personally go for Mercurial for the following reasons:

  • SVK is a lie: it is not distributed; it just distributes SVN, which is like multiplying a herd of turtles
  • much faster than Arch, Darcs and Bazaar
  • only Git is faster—but that was developed by Linus Torvalds
  • Mercurial and Git are relatively young, meaning they’ve learnt from the mistakes of other DSCMs
  • Mercurial and Git come from the same background (the BitKeeper drama) and BitKeeper, according to Torvalds, was the only SCM worth using
  • Mercurial compares feature-for-feature with other DSCMs, and then some
  • Mercurial uses less space than others
  • very user friendly CLI—by far the easiest to learn of the DSCMs
  • decent documentation
  • a few project hosting sites already (like ShareSource)
  • ability to have Mercurial support using the HTTP front-end, even where you normally wouldn’t have it (e.g. Sourceforge and Rubyforge)
  • if Mercurial lacks a feature, find or code a Mercurial extension; it is fully customisable

My Forceful Conclusion

Like the language you are using (Ruby 1.8), the framework (Rails 1.2), the testing suite (RSpec 1.0 and Mocha), the operating system (Mac OS X 10.4 and Linux 2.6), the web server (nginx, LiteSpeed, or lighttpd), the application server (LiteSpeed or Mongrel), the blogging software (Typo or Mephisto), even the hardware (Amazon S3/EC2 or many dual core 2+ GHz servers with many GBs of RAM), you need to use the latest version control software (Mercurial 0.9.4 or Git 1.5). If you don’t, it’s like using an old server with 333MHz & 256MB RAM, or Apache 1.3 & mod_ruby. But unlike an out-moded application server, SVN will make you a slower developer than the one sitting next to you using Mercurial, Git, Darcs or Bazaar. Oh, and I needn’t mention that he’ll get all the babes too.


Comments

  1. Chris said about 3 hours later:

    Can Mercurial check out from and commit to an SVN repo? This is currently the feature keeping me on Git.

  2. Mathew Abonyi said about 3 hours later:

    Yup. Check out the hgsvn project. It maintains a hybrid SVN/Mercurial working directory, so you can do both. Also, hgsvn can be used to convert an SVN repo to a Mercurial one—that’s the process I used, which, though slow, converts everything.

  3. Jacob Atzen said about 3 hours later:

    I find DSCM very interesting. I do have a few issues with it though which somebody might be able to clarify:

    - Backup, how does one enforce backup of every developer’s code if it’s located on their own machines instead of in a central repository – and how much more disk space does it take, as I assume you will need to back up every single developer’s full tree?

    - Unity, for lack of a better word, how does one ensure that all developers have the most recent code? In a highly agile environment where shared ownership is the order of the day, one needs to be sure that everybody agrees on what the latest version of the code looks like.

    - Tool support, what tools are there out there supporting DSCMs? I’m thinking stuff like Trac and TortoiseSVN.

    - One-point-failure, it seems to me that with DSCM there needs to be one person tasked with the job of tracking the latest “official” version of the code. Why would we want to spend a developer’s time doing this when a centralized repository does it much better?

    I hope someone will care to elaborate on these points.

  4. Vineet Kumar said about 3 hours later:

    You say to use an OS you like, “(Mac OS X 10.4 and Linux 2.6)”. Just so you know, Linus Torvalds had something to do with one of those, too.

  5. andy said about 3 hours later:

    only Git is faster—but that was developed by Linus Torvalds
    Mercurial and Git come from the same background (the BitKeeper drama) and BitKeeper, according to Torvalds, was the only SCM worth using

    So wait... is the Linus affiliation a good thing or a bad thing? :)

  6. andy said about 4 hours later:

    Hey Jacob, I can cover your questions.

    With DSCMs it’s really easy to set up one machine as the “central server” machine. Everyone on the team agrees to push their changes to that machine (like svn commit), and to regularly pull the latest version (like svn update). This doesn’t have to be a machine anyone is using, it can just be a server in the closet. In other words, it can work just like a centralized SCM.

    But DSCMs are awesome even when you use a centralized model. Separate teams can each have their own “central” repos, and they can decide when they want to share changes with everyone else. QA can keep a separate “fully tested” branch. Developers can skip the central repo and just share changes with each other. Etc.

    For disk space, overall you do use more space, seeing as how every developer has a full copy of the repository. But hard drives these days can take it.

    For tool support, most DSCMs have some tool support, but it’s generally not as good as SVN’s tool support (yet).

  7. Mathew Abonyi said about 6 hours later:

    @Vineet: Neither good nor bad. Linus is an extraordinary programmer, but also has his own quirky way of doing things. Linux, I think, is amazingly good as an operating system and I used to use it religiously before Mac OS X. Git, however, is too quirky and specific to Linus’ needs of managing the Linux Kernel. When he says that BitKeeper was the only SCM worth using, I take that as an extraordinary programmer’s point of view who would otherwise use tarballs and patches (no SCM). However, I wouldn’t connect that comment with Git in any way.

    His comment is actually quite important, because it should encourage one to consider what it is that an SCM really gives you—it shouldn’t just be a glorified tarball, which I think things like CVS and SVN actually are when you give it real consideration.

  8. Chris said about 16 hours later:

    For the record:

    For now hgsvn is primarily intended at doing local mirrors, private branches, or patch-driven development (by submitting patches to project maintainers, which is necessary if you don’t have SVN commit access anyway). However, an hgpushsvn command is in the plans that will allow one day to push all changes to SVN automatically.

    So those of us working in SVN teams while using git-svn can’t quite move to hg yet :)

  9. Mathew Abonyi said 1 day later:

    @andy: Actually I find Mercurial uses as much as or less disk space than an SVN repository. This I imagine is due to the way Mercurial stores changesets and creates checkpoints, whereas SVN only creates simple deltas. But in terms of working copies, yes, you sacrifice about 10% extra for a Mercurial repository/working directory vs a Subversion working directory. The advantages, though, as you say, far outweigh the little bit of extra space.

  10. dan ros said 12 days later:

    good job :-)
