
What is git?

2020-03-25 #tech

git is one of the most important tools for a developer nowadays regardless of what programming language is being used.

Regrettably, it seems that nobody explains what git is to new developers at any point. Over the years, I’ve mentored many fellow developers of different levels of experience, and it’s clear that git is something that’s not explained well enough (if at all) in educational settings. Those who were familiar with it were mostly self-taught or had learned from colleagues.

Many courses seem to cover git, but only cover very advanced topics: branching, tags, git-flow, and the like. This is pretty much like teaching a medical student how to do heart surgery in their first week – sure, it’s important, but it’s definitely not the first topic that should be covered, nor will they be able to successfully assimilate this skill during their first week anyway.

What is it for?

We’re all (hopefully) familiar with the idea of saving a file with code; when saved, it’s written to disk, and replaces the previous version. Sometimes we break code, and want to look back at previous versions to understand what we changed and how we broke it. Sometimes we look at lines of code and wonder “what was the author thinking when writing this?”.

Git allows saving “snapshots” of history for an entire codebase (i.e. multiple files with code in them). While the files with code look the same, git keeps a history of changes in a .git directory so we can look back at changes, compare them, and even roll back entirely.

Each “snapshot” or “past version” of the code that is saved is called a “commit”. Commits aren’t created automatically; they must be created explicitly (e.g. by using the git commit command). Commits are accompanied by a descriptive message of what changed and why, so anyone inspecting the history in future can figure out what was going on.

So what’s a commit?

git doesn’t need to keep a separate full copy of each file for each version. The easiest way to think about a commit is as a set of differences: what changed between one version and the previous one.

Say you have this code in your “version 1” (and you’ve committed this):

def do_stuff(obj):
    obj.do_something()
    print('done!')
    return True

And you have this code in “version 2”:

def do_stuff(obj):
    obj.do_something()
    return True

When you commit “version 2”, git merely records “this version deleted line #3 of somefile.py”. When you ask it to show this commit, it’ll render it something like this:

@@ -1,4 +1,3 @@
 def do_stuff(obj):
     obj.do_something()
-    print('done!')
     return True

Notice that tiny - at the beginning? That means “this line was deleted”. (Side note: git doesn’t literally store it this way; its actual storage is more efficient, but this is the general idea.)
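To see where a rendering like that comes from, here is a minimal sketch that produces the same kind of output for the two versions above. It uses difflib from Python’s standard library, not git itself, so it only illustrates the format; the file labels passed to it are made up for this example.

```python
import difflib

# The two versions of the file from the example above.
version_1 = """def do_stuff(obj):
    obj.do_something()
    print('done!')
    return True
"""

version_2 = """def do_stuff(obj):
    obj.do_something()
    return True
"""

# unified_diff compares two lists of lines and yields the diff line by line.
# The fromfile/tofile labels are illustrative names, not something git produced.
diff_lines = list(
    difflib.unified_diff(
        version_1.splitlines(keepends=True),
        version_2.splitlines(keepends=True),
        fromfile="somefile.py (version 1)",
        tofile="somefile.py (version 2)",
    )
)

print("".join(diff_lines))
```

The printed hunk should match the one shown above, preceded by two header lines naming the files being compared.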

A bit more formally, a commit is a set of changes (a.k.a. a “delta”) applied to a codebase, with an attached human-friendly message.

How do I create a new commit?

So, let’s assume you’re working on some files with code (maybe an assignment?). The first thing you want to do is initialise the current directory as a git repository.

This basically means “tell git to set up its stuff here”. A repository is basically a directory, but you can push it (“send my commits to a repository on a remote machine”), or pull it (“copy commits from a remote machine”).

Initialising is done by running git init with the official command-line client, but your IDE may have some other way of doing it.

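Once you’re working in a repository, you need to follow a few steps to create a commit. With the command-line client these are typically: make your changes, stage them with git add, and record them with git commit, along with a message.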
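As a rough illustration, the usual command-line sequence is git init, then git add, then git commit. The sketch below drives those commands from Python in a throwaway temporary directory. The helper names, the file contents, the commit message, and the placeholder author identity are all assumptions made up for this example, and it only runs if a git executable is installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command inside `cwd` and return what it printed."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def create_first_commit(repo_dir):
    """Initialise `repo_dir` as a repository, add one file, and commit it."""
    repo = Path(repo_dir)
    git("init", cwd=repo)

    # 1. Change something in the working directory.
    (repo / "somefile.py").write_text("print('hello')\n")

    # 2. Stage the change so it becomes part of the next commit.
    git("add", "somefile.py", cwd=repo)

    # 3. Record the commit with a descriptive message.
    #    The -c options set a placeholder identity for this command only.
    git(
        "-c", "user.name=Example Author",
        "-c", "user.email=author@example.com",
        "commit", "-m", "Add somefile.py with a greeting",
        cwd=repo,
    )

    # Return the history, one line per commit.
    return git("log", "--oneline", cwd=repo)


if shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        history = create_first_commit(tmp)
    print(history)
else:
    history = None
    print("git is not installed; skipping the demonstration")
```

In day-to-day use you would type the same three commands in a terminal, or use the equivalent buttons in your IDE, rather than scripting them.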

The importance of commit messages

If you work on the same codebase for days/weeks/months, you’ll often want to look back. “Oh, how was this code written in the old version where it didn’t crash?”, or “Why did I write this line, what was I thinking?”.

The key to writing commit messages is to quickly describe what you did, and why you did anything that might not seem obvious. When you work with a team, they’ll appreciate the attention to detail. When you work alone, your future self will thank you even more.

My general guideline is to think “What would be useful to my future self when trying to understand why I made this commit?”.

Sometimes we don’t leave comments in the code like “added this check because otherwise X and Y happen” because too many of those end up hurting readability and aren’t always important enough. A commit message is the perfect place for this. It’ll never be in the way, but will forever be findable.

As you use git and continue working on more complicated codebases, you should learn to appreciate the ability to look back and figure out how and why you did things.

Final words on commits

Think of commits as checkpoints. Did you break everything and lose? No problem, you can go back to the last checkpoint. Or use git diff to see exactly what you’ve changed since that checkpoint.

You can also use git stash. Stashing “moves aside” all the changes you’ve made, so you can test your last commit and see if something worked (or not). This is great if you want to figure out if you’ve broken something, or if it was always broken. git stash pop will restore the stashed changes.
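Here is a small sketch of that checkpoint workflow: commit a working version, make an uncommitted change, inspect it with git diff, move it aside with git stash, and bring it back with git stash pop. As before, it drives the real git commands from Python in a temporary directory. The helper names, file contents, and placeholder identity are assumptions for illustration, and it needs git to be installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so the example does not depend on your git configuration.
IDENTITY = ["-c", "user.name=Example Author", "-c", "user.email=author@example.com"]


def git(*args, cwd):
    """Run a git command inside `cwd` and return what it printed."""
    result = subprocess.run(
        ["git", *IDENTITY, *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def stash_round_trip(repo_dir):
    """Show what git diff, git stash and git stash pop do to one file."""
    repo = Path(repo_dir)
    code = repo / "somefile.py"

    # Create the checkpoint: a repository with a single commit.
    git("init", cwd=repo)
    code.write_text("print('works')\n")
    git("add", "somefile.py", cwd=repo)
    git("commit", "-m", "Add working version", cwd=repo)

    # Make an uncommitted change and ask git what changed since the commit.
    code.write_text("print('maybe broken')\n")
    changes = git("diff", cwd=repo)

    # Stash: the file goes back to its last committed content.
    git("stash", cwd=repo)
    while_stashed = code.read_text()

    # Pop: the uncommitted change comes back.
    git("stash", "pop", cwd=repo)
    after_pop = code.read_text()

    return changes, while_stashed, after_pop


if shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        result = stash_round_trip(tmp)
    print(result[0])
else:
    result = None
    print("git is not installed; skipping the demonstration")
```

While the change is stashed, the file on disk holds the committed version, which is what lets you check whether the last commit worked on its own.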

Want to know more?

The intention with this article is to teach what git is for, and why you should care. If I’ve achieved my goal, you’ll probably want to learn more and incorporate it into your workflow – learning it pays off pretty fast, and you’ll save so much time so quickly that it’s a great investment early in one’s career as a developer.

A great starting guide is the official git book. It’s pretty short (in under 30 pages you’ll have everything you need to know and lots more!), and very hands-on:

https://git-scm.com/book/en/v2


Update: This article was slightly re-worded in July 2022 and September 2022 to improve readability and clarify some items.
