Why a small atomic commit is easier to work with
Originally posted on May 05, 2016.
A few weeks ago I attended a series of three talks by Martin Fowler at the Australian Technology Park in Redfern, Sydney. One of the talks was about deriving the application state from a sequence of persisted events, also known as Event Sourcing:
[…] The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.
That talk reminded me a lot of Git, which is built on a similar principle: if you replay all committed changes from the beginning, in the same chronological order, you always arrive at exactly the same result, the current state.
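The replay idea can be sketched in a few lines of Python. This is a toy model only: the `Change` record and the `replay` function are hypothetical names made up for illustration, and this is not how Git stores or applies commits internally.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One recorded change to one file (hypothetical shape, for illustration)."""
    path: str
    content: Optional[str]  # None means the file was deleted


def replay(changes: Iterable[Change]) -> Dict[str, str]:
    """Rebuild the file tree by applying every change in its original order."""
    files: Dict[str, str] = {}
    for change in changes:
        if change.content is None:
            files.pop(change.path, None)
        else:
            files[change.path] = change.content
    return files


history = [
    Change("README", "hello"),
    Change("app.py", "print('v1')"),
    Change("app.py", "print('v2')"),
    Change("README", None),
]

# Replaying the full history gives the current state, every time.
current = replay(history)
# Replaying only a prefix gives the state at that earlier point in time.
earlier = replay(history[:2])
```

Here `current` contains only `app.py` at its second version, while `earlier` still has the `README` and the first version of `app.py`. The order matters: applying the same changes in a different order could produce a different tree.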
However, Git is just a tool. It is the engineer's responsibility to use it in a way that produces the most valuable outcome. One part of that responsibility is taking care of how changes enter the VCS: each commit should have a purpose that reflects one change and one change only. This is also known as an atomic change.
According to the Emacs manual, in database terminology an atomic change is an indivisible change: it can succeed entirely or it can fail entirely, but it cannot partly succeed.
In Git, this means that a change can be reverted (with git revert) without causing side effects or conflicts in parts of the system other than the one being reverted. It also means the commit contains a single change that has no real value if applied only partially.
An atomic commit should be able to be reverted or applied without side effects.
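To make the revert argument concrete, here is a minimal Python sketch. It models a commit as a mapping from path to an (old, new) content pair, which is an assumption chosen for illustration; `apply_commit` and `revert_commit` are hypothetical helpers, and the real git revert works on patches rather than on a dictionary like this.

```python
from typing import Dict, Optional, Tuple

# A commit maps each touched path to (old content, new content).
# None stands for "file does not exist". Hypothetical model, not Git's format.
Commit = Dict[str, Tuple[Optional[str], Optional[str]]]
Files = Dict[str, str]


def apply_commit(files: Files, commit: Commit) -> Files:
    """Return a new file map with the commit's new contents applied."""
    result = dict(files)
    for path, (_old, new) in commit.items():
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


def revert_commit(files: Files, commit: Commit) -> Files:
    """Return a new file map with every path the commit touched restored."""
    result = dict(files)
    for path, (old, _new) in commit.items():
        if old is None:
            result.pop(path, None)
        else:
            result[path] = old
    return result


base = {"login.py": "v1", "report.py": "v1"}

# Two atomic commits: one concern each.
fix_login: Commit = {"login.py": ("v1", "v2")}
add_export: Commit = {"report.py": ("v1", "v2")}

# One mixed commit: both concerns bundled together.
mixed: Commit = {"login.py": ("v1", "v2"), "report.py": ("v1", "v2")}

atomic_head = apply_commit(apply_commit(base, fix_login), add_export)
mixed_head = apply_commit(base, mixed)

# Reverting only the login fix keeps the export feature intact.
after_atomic_revert = revert_commit(atomic_head, fix_login)
# Reverting the mixed commit throws away the export feature as well.
after_mixed_revert = revert_commit(mixed_head, mixed)
```

Both histories end in the same tree, but only the atomic one lets you undo the login change while keeping the export work: `after_atomic_revert` still has `report.py` at `v2`, whereas `after_mixed_revert` is back to `base`.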
Another important point about atomic commits is that they should not break the normal flow of your build; each one should simply add or remove something cleanly. If you have a build routine or tests, you should be able to run them successfully whether the commit is there or not, given a specific set of premises. “Premises” in this context means the required state of the codebase in which the commit can be applied with the least amount of code conflicts.
This matters even more if you are committing to the master branch, the one branch whose history should always be consistent and immutable.
An atomic change is a piece of functionality that can be replayed over and over again against a specific set of premises.
Not breaking the build is an important aspect of atomicity, because then it is possible to reset your application to any state in order to see how the application was working at that time. It is also possible to easily leverage built-in tools to find bugs in the history (such as git bisect), something that can’t be easily done if one can’t run the build after a reset.
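The search that a tool like git bisect performs is essentially a binary search over the history, and a rough Python sketch shows why each commit needs to be buildable. The function and the commit names below are hypothetical, and the sketch assumes a linear history where the first commit is good, the last one is bad, and the bug stays present once introduced.

```python
from typing import Callable, List, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary search for the first commit at which is_good() starts failing.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is bad as well.
    """
    low, high = 0, len(commits) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(commits[mid]):
            low = mid
        else:
            high = mid
    return commits[high]


commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
checked: List[str] = []


def build_and_test(commit: str) -> bool:
    """Stand-in for checking out a commit and running the build and tests."""
    checked.append(commit)
    return commits.index(commit) < 5  # pretend the bug was introduced in c5


culprit = first_bad_commit(commits, build_and_test)
```

With eight commits, only three of them need to be built and tested to find the culprit. The whole approach depends on `build_and_test` being able to give an answer for any commit it is handed; a commit that does not build cannot be classified as good or bad, and in Git it has to be skipped (git bisect skip), which weakens the search.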
Finally, there is the principle of traceability of changes. Ideally, one should be able to trace the source and purpose of a change through the commit history without having to talk to the original author, who might no longer be available or might not even remember what the change was about. If a commit does more than one thing, it might be impossible to understand in the future why those lines in the system were changed.
If you create commits with more than one change, it will be hard to find the point in time at which a mistake or feature was introduced into the codebase, it will be hard to reset the codebase to a previous state, and it might be impossible to revert a modification without side effects.
This is a principle, not a law. Be thoughtful and make the best decision given the circumstances.
See also One Pull Request. One Concern.
Thanks for reading. If you have some feedback, reach out to me on Twitter, Facebook or GitHub.