Towards better Code Libraries with Git

Many recurring problems in software engineering are commonly solved with one of the following two approaches:

  • Adding a layer between two communicating entities to allow for flexibility in the way they communicate;
  • Refactoring behaviour and moving it somewhere else.

Creating and maintaining a personal (or team-wide, or even company-wide) code library is, in my opinion, closer to the latter than to the former, although not very distant from it.

What I mean by code library is simply that mass of code that has been shown to be repeated over and over again in many of our projects.

I firmly believe (and I stand on the shoulders of giants here) that every time we see the same code, even with slight variations, three or more times in our projects, we could benefit from extracting it and moving it somewhere else (perhaps into its own class), then referencing that new place from each spot where the code used to be.
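As a minimal sketch of that kind of extraction (the function names and URL layout are made up for illustration), imagine three helpers that each used to inline the same lower-case, split and join logic. After the third repetition, the shared part moves into one function:

```python
def slugify(title: str) -> str:
    """Shared helper: lower-case a title and join its words with hyphens."""
    return "-".join(title.lower().split())


# Three call sites that previously each repeated the slug logic inline.
# All names and URL prefixes here are hypothetical.
def article_url(title: str) -> str:
    return f"/articles/{slugify(title)}"


def tag_url(tag: str) -> str:
    return f"/tags/{slugify(tag)}"


def author_url(name: str) -> str:
    return f"/authors/{slugify(name)}"
```

A helper like slugify is exactly the sort of thing that tends to end up in a shared code library.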

Many books have been written on that subject (on how best to perform these extractions and refactorings), with more acronyms than I can recall, so I'll leave that discussion to my peers and focus instead on a few ways such code library management could actually be implemented in practice.

From what little time I've spent thinking about this issue, I think some points stand out:

In all but the very largest projects, it should suffice to create a separate collection of library classes and copy those files (or whatever other code you want to share between your projects) into each project every time you start a new one. It might seem like too much work but, with the help of tools such as Git, once you've copied (cloned, in git-speak) the library project into your new project, you only need to run git pull in your library directory to synchronize the local copy with any changes you might (will) have made in the main library.
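The clone-once, pull-afterwards routine could be wrapped roughly like this. It is only a sketch: the function names, the library directory and the repository URL are assumptions, and it expects the git command to be available on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def sync_command(lib_dir: Path, library_url: str) -> List[str]:
    """Return the git command that brings lib_dir up to date.

    If lib_dir already holds a clone (it contains .git), pull;
    otherwise clone the main library into it.
    """
    if (lib_dir / ".git").exists():
        # --ff-only refuses to merge if the local copy has diverged.
        return ["git", "-C", str(lib_dir), "pull", "--ff-only"]
    return ["git", "clone", library_url, str(lib_dir)]


def sync_library(lib_dir: Path, library_url: str) -> None:
    """Run the command chosen by sync_command, raising if git fails."""
    subprocess.run(sync_command(lib_dir, library_url), check=True)
```

Calling sync_library(Path("myproject/lib"), "<url of the main library>") would clone on the first run and pull on every later one. If the project is itself a Git repository, Git's own git submodule command is worth a look for tracking the nested clone.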

If the task of going through each project and git pulling from the main library repository ever becomes too costly or tiresome (remember, though, that a simple script could probably accomplish this task easily), then it might be better to implement a fully-fledged CDN (content delivery network).
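The "simple script" could look something like the following sketch. It assumes all projects sit side by side under one root directory and that each keeps its library copy in a subdirectory called lib; both the layout and the names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def find_library_copies(projects_root: Path, lib_dirname: str = "lib") -> List[Path]:
    """Return every <project>/<lib_dirname> under projects_root that is a git clone."""
    copies = []
    for project in sorted(projects_root.iterdir()):
        lib_dir = project / lib_dirname
        if project.is_dir() and (lib_dir / ".git").exists():
            copies.append(lib_dir)
    return copies


def pull_all(projects_root: Path, lib_dirname: str = "lib") -> List[Path]:
    """Run git pull in every library copy; return the copies where it failed."""
    failed = []
    for lib_dir in find_library_copies(projects_root, lib_dirname):
        result = subprocess.run(["git", "-C", str(lib_dir), "pull", "--ff-only"])
        if result.returncode != 0:
            failed.append(lib_dir)
    return failed
```

Returning the failures instead of stopping at the first one means a single diverged copy does not block the update of every other project.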

By fully-fledged CDN I mean some way to serve the code library on demand, at runtime, to your projects via the Internet (or perhaps a local network), when and if your applications need it.
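To make the idea of fetching library code at runtime concrete, here is one possible sketch in Python. Everything in it is an assumption made for illustration: the function names, the one-file-per-module layout and the placeholder base URL. Note that it executes whatever source it receives, so it only makes sense against a server you fully trust.

```python
import types
from typing import Callable
from urllib.request import urlopen


def fetch_over_http(name: str, base_url: str = "https://example.invalid/lib") -> str:
    """Download <base_url>/<name>.py and return its source text.

    The base URL is a placeholder, not a real service.
    """
    with urlopen(f"{base_url}/{name}.py") as response:
        return response.read().decode("utf-8")


def load_remote_module(
    name: str, fetch_source: Callable[[str], str] = fetch_over_http
) -> types.ModuleType:
    """Build a module object from source obtained at runtime.

    fetch_source is injectable so the loader can be exercised
    without any network access.
    """
    source = fetch_source(name)
    module = types.ModuleType(name)
    # Run the fetched source inside the new module's namespace.
    exec(compile(source, f"<remote:{name}>", "exec"), module.__dict__)
    return module
```

A project would then call load_remote_module("strings") and use the returned module like any locally imported one, which is what makes every change to the central library take effect immediately.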

Some of the most obvious advantages of this approach centre around the fact that you're using a single repository rather than many local copies which need to be updated regularly.

A central code library repository would probably make for easier maintenance of the code library itself (though not necessarily of the projects that use it). You also wouldn't need a local copy of the main library in each project, and any change to the main repository would instantly propagate to all projects that use it, because the code is fetched directly from the library.

I can also see at least one possible drawback to this approach. If all the projects you work on "drink from the same fountain" (the main repository), changes to that library need to be very well thought out to prevent existing applications from breaking.

This could escalate into maintenance hell or, at the very least, into the creation of many branches in the main library repository to accommodate the old projects that use it, thereby defeating the whole purpose of centralized management.

Even though the cost of copying the central library and updating it locally (git pull) for every project might become a bottleneck in really large projects or large collections of projects, I still think the so-called decentralized approach (copying the library into each new project) offers better maintenance prospects and allows for better per-project customization.
