The Engineering of Git: A Directed Acyclic Graph of Hashes
Many developers treat Git as a black box that uploads code to the cloud. Under
the hood, Git is closer to a content-addressable filesystem than to a
traditional version control system: every object it stores is named by a
cryptographic hash of its own contents, and the version control features are
built on top of that store. To truly master Git, and to recover from most
mistakes, you need to look inside the hidden .git folder.
Part 1: The Three Trees
Git manages three distinct areas on your computer. Understanding how files move between them is the key to mastering Git:
- The Working Directory: The ordinary folder you see in your code editor. Git can detect changes here, but nothing in it is saved until you stage and commit it.
- The Staging Area (Index): A binary file (.git/index) that records exactly which file versions will go into the next snapshot. When you run git add process.js, Git hashes the current content of process.js, stores that content in the object database, and records the hash in the Index.
- The .git Directory (Repository): The actual database. If you delete your working directory but keep the .git folder, you still possess all of the committed history.
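The flow between the three areas can be modeled in a few lines. This is a toy sketch, not Git's implementation: the ToyRepo class and its method names are hypothetical, the Index is reduced to a dictionary, and a "commit" is just a frozen copy of the Index.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob: a type/size header, then the bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Hypothetical in-memory model of the three areas (not Git's real data layout)."""

    def __init__(self) -> None:
        self.working = {}    # working directory: path -> current bytes
        self.index = {}      # staging area: path -> blob id
        self.objects = {}    # object database: blob id -> bytes
        self.snapshots = []  # committed snapshots: each maps path -> blob id

    def add(self, path: str) -> str:
        """Stage a file: store its content as an object and record the id in the index."""
        content = self.working[path]
        oid = blob_id(content)
        self.objects[oid] = content
        self.index[path] = oid
        return oid

    def commit(self) -> dict:
        """Freeze whatever the index currently lists, ignoring later working-copy edits."""
        snapshot = dict(self.index)
        self.snapshots.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.working["process.js"] = b"console.log('v1');\n"
staged = repo.add("process.js")
repo.working["process.js"] = b"console.log('v2');\n"  # edited after staging
snapshot = repo.commit()
print(repo.objects[snapshot["process.js"]])  # the staged v1 bytes, not the v2 edit
```

The point of the sketch is the ordering: the snapshot is taken from the Index, so an edit made after staging stays in the working directory only.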
Part 2: Content-Addressable Storage (Blobs)
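Conceptually, Git does not record differences (deltas) between file versions
the way Subversion (SVN) does. When you run git commit, Git records a complete
snapshot of every tracked file in your repository. (Packfiles later apply delta
compression as a storage optimization, but the object model is snapshot-based.)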
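If your 1,000-page novel is stored as one file per page and you change one letter on page 50, does Git duplicate the other 999 pages? No, thanks to content addressing: only the changed file produces a new object, and every unchanged file is reused by its hash.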
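Every file placed into Git is passed through the SHA-1 cryptographic hash
function, together with a short header giving the object type and size. The
resulting 40-character hexadecimal hash (e.g., a1b2c3d4e5f6...) becomes the
object's permanent name inside .git/objects: the first two characters name a
subdirectory and the remaining 38 name the file. The object is compressed with
zlib and is never modified afterwards. This is called a Blob.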
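The naming scheme is easy to reproduce outside Git. The sketch below uses only Python's standard hashlib and zlib modules; the function names are hypothetical, and it covers loose SHA-1 objects only (packfiles and SHA-256 repositories are out of scope).

```python
import hashlib
import zlib


def blob_object(content: bytes) -> bytes:
    """Build the bytes Git hashes for a blob: 'blob <size>', a NUL byte, then the content."""
    return b"blob %d\x00" % len(content) + content


def blob_id(content: bytes) -> str:
    """Return the 40-character SHA-1 name of the blob."""
    return hashlib.sha1(blob_object(content)).hexdigest()


def loose_object_path(oid: str) -> str:
    """Map an object id to its loose-object location under .git/objects."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def compressed_blob(content: bytes) -> bytes:
    """Return the zlib-compressed bytes that would be written to that path."""
    return zlib.compress(blob_object(content))


oid = blob_id(b"test content\n")
print(oid)
print(loose_object_path(oid))
```

If Git is installed, the printed id can be compared with what git hash-object --stdin reports for the same bytes.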
Crucially, Blobs do not store filenames. If you rename math.js to
calc.js without changing the content, the SHA-1 hash remains identical. Git
sees that it already holds an object with that hash and stores nothing new for
the file content.
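Deduplication falls out of the naming scheme rather than from any special rename logic. A minimal sketch, assuming a hypothetical ToyObjectStore backed by a dictionary, with made-up file content:

```python
import hashlib


class ToyObjectStore:
    """Hypothetical content-addressed store: the key is derived from the content alone."""

    def __init__(self) -> None:
        self.objects = {}  # object id -> content bytes

    def put(self, content: bytes) -> str:
        """Store content under its blob hash; storing the same bytes twice is a no-op."""
        oid = hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()
        self.objects.setdefault(oid, content)
        return oid


store = ToyObjectStore()
source = b"export const add = (a, b) => a + b;\n"  # made-up file content

before_rename = {"math.js": store.put(source)}  # the filename lives outside the blob
after_rename = {"calc.js": store.put(source)}   # same bytes, so same object id

print(len(store.objects))  # number of distinct objects held by the store
```

Only the mapping from name to hash changes between the two dictionaries; the store itself is untouched by the rename.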
Part 3: Directories (Trees)
If Blobs don't store filenames or folder structures, how does Git know your project layout? Through Tree Objects.
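A Tree is a small binary object that lists the entries of one directory. Each entry records a file mode, a human-readable name, and the SHA-1 hash of a Blob (for a file) or of another Tree (for a subdirectory).
Because a Tree can point downward to other Trees, the whole directory hierarchy is captured recursively. When you commit, Git generates a "Root Tree" representing the top level of your project, and that Tree is itself identified by a single 40-character SHA-1. Any change to a file anywhere in the project therefore changes the Root Tree's hash.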
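How a directory becomes a single hash can be sketched as follows. The layout (a README.md next to a src directory holding app.js) and the function names are hypothetical, and only two entry modes are handled: 100644 for a regular file and 40000 for a subdirectory.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """Hash '<kind> <size>', a NUL byte, then the body, as Git does for every object."""
    return hashlib.sha1(kind + b" %d\x00" % len(body) + body).hexdigest()


def blob_id(content: bytes) -> str:
    return object_id(b"blob", content)


def tree_id(entries) -> str:
    """entries: iterable of (mode, name, hex_object_id) tuples for one directory."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the hash.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return object_id(b"tree", body)


def root_tree(readme: bytes, app_js: bytes) -> str:
    """Hash a two-level project: README.md at the top, app.js inside src/."""
    src = tree_id([("100644", "app.js", blob_id(app_js))])
    return tree_id([
        ("100644", "README.md", blob_id(readme)),
        ("40000", "src", src),
    ])


print(root_tree(b"# Demo\n", b"console.log(1);\n"))
```

Changing app.js changes the src Tree's hash, which changes the entry stored in the Root Tree, which changes the Root Tree's hash.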
Part 4: The Commit Object and The DAG
A Commit Object is tiny, typically a few hundred bytes of text. It contains four kinds of information:
- The SHA-1 hash of the Root Tree (the exact snapshot of the filesystem).
- The SHA-1 hash of each Parent Commit: none for the first commit, one for an ordinary commit, and two or more for a merge.
- The Author and Committer, each with a name, email, and timestamp.
- The human-readable Commit Message.
The Commit Object itself is then hashed. Because every commit points to its parents by hash and objects are never modified, Git history forms a Directed Acyclic Graph (DAG). A past commit cannot be altered in place: any change produces a different SHA-1, so every later commit must be recreated with a new parent pointer and therefore a new hash of its own. Rewritten history is always visible as a different set of hashes.
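The ripple effect of changing an old commit can be demonstrated with a short sketch. The commit_id function and the identity string are hypothetical, and the body layout is a simplified version of the commit format (one identity used as both author and committer, no extra headers).

```python
import hashlib


def commit_id(tree: str, parents: list, author: str, message: str) -> str:
    """Serialize a simplified commit object and return its SHA-1 name."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}"]
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of a tree with no entries
WHO = "A. Developer <dev@example.com> 1700000000 +0000"  # hypothetical identity

first = commit_id(EMPTY_TREE, [], WHO, "Initial commit")
second = commit_id(EMPTY_TREE, [first], WHO, "Second commit")

# Rewording the first commit yields a new id, so its child has to be rebuilt as well.
first_reworded = commit_id(EMPTY_TREE, [], WHO, "Initial commit (reworded)")
second_rebuilt = commit_id(EMPTY_TREE, [first_reworded], WHO, "Second commit")

print(first != first_reworded, second != second_rebuilt)
```

The second commit's message and tree are identical in both versions; its hash differs only because the parent line differs.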
Part 5: Branches are 41-Byte Files
In Subversion, a branch is a copy of a directory tree inside the central repository, so creating one or switching to it requires a round trip to the server.
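In Git, a Branch is a plain text file under .git/refs/heads/; the main branch,
for example, lives at .git/refs/heads/main. Open it in a text editor, and you
will see exactly 41 bytes: a 40-character Commit SHA-1 and a newline. (After
git pack-refs or garbage collection, the same name-to-hash pairs may instead be
stored as lines in .git/packed-refs.)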
When you run git branch feature-x, Git creates a new tiny file named
feature-x containing the same SHA-1 hash as your current branch. It takes
milliseconds because no file content is copied.
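Branch creation can be imitated in a scratch directory to see how little is written. This sketch assumes the loose-ref layout and uses hypothetical helper names and a placeholder commit id; a real git branch also validates names and refuses to overwrite an existing branch.

```python
import tempfile
from pathlib import Path


def write_branch(git_dir: Path, name: str, commit_id: str) -> Path:
    """Write a loose ref: the 40-character commit id followed by a newline."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_bytes(commit_id.encode("ascii") + b"\n")
    return ref


def read_branch(git_dir: Path, name: str) -> str:
    """Return the commit id a branch file points to."""
    return (git_dir / "refs" / "heads" / name).read_text().strip()


def create_branch(git_dir: Path, new_name: str, start: str) -> Path:
    """Create new_name pointing at the same commit as the existing branch start."""
    return write_branch(git_dir, new_name, read_branch(git_dir, start))


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"              # scratch stand-in, not a real repository
    write_branch(git_dir, "main", "3f" * 20)  # placeholder 40-character id
    feature = create_branch(git_dir, "feature-x", "main")
    branch_file_size = feature.stat().st_size
    same_target = read_branch(git_dir, "feature-x") == read_branch(git_dir, "main")

print(branch_file_size, same_target)
```

Nothing under the working directory or .git/objects is touched; the only new data is the small ref file.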
When you create a new commit while on feature-x, Git computes the new
commit's hash and overwrites the feature-x file with it. The branch pointer
moves forward through the graph; no other branch is touched.
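The pointer update can be modeled in memory. In this toy sketch the ToyRepo class is hypothetical, refs are dictionary entries instead of files, and the commit id is a stand-in hash of the parent and message rather than of a real commit object.

```python
import hashlib


class ToyRepo:
    """Hypothetical in-memory model: refs and HEAD are plain strings."""

    def __init__(self, first_commit: str) -> None:
        self.refs = {"refs/heads/main": first_commit}
        self.head = "ref: refs/heads/main"
        self.parents = {first_commit: []}  # commit id -> list of parent ids

    def current_ref(self) -> str:
        return self.head[len("ref: "):]

    def branch(self, name: str) -> None:
        """A new branch starts at whatever commit the current branch points to."""
        self.refs[f"refs/heads/{name}"] = self.refs[self.current_ref()]

    def checkout(self, name: str) -> None:
        self.head = f"ref: refs/heads/{name}"

    def commit(self, message: str) -> str:
        parent = self.refs[self.current_ref()]
        # Stand-in for hashing a real commit object: the id depends on parent and message.
        new_id = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()
        self.parents[new_id] = [parent]
        self.refs[self.current_ref()] = new_id  # only the checked-out branch moves
        return new_id


repo = ToyRepo(first_commit="1" * 40)  # placeholder id for the starting commit
repo.branch("feature-x")
repo.checkout("feature-x")
new_commit = repo.commit("Add feature")

print(repo.refs["refs/heads/feature-x"] == new_commit)
print(repo.refs["refs/heads/main"] == "1" * 40)
```

After the commit, feature-x names the new commit while main still names the old one, and the new commit records the old one as its parent.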
Part 6: HEAD and Detached States
How does Git know which branch you are currently on? It checks a special file located at .git/HEAD.
Normally, the HEAD file contains a symbolic reference: ref: refs/heads/main.
It names a branch rather than a commit. If you switch branches via
git checkout feature-x (or git switch feature-x), Git rewrites the HEAD file to
read ref: refs/heads/feature-x and updates the Index and your working directory
to match the target Commit.
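Resolving HEAD is a two-step file lookup, which the sketch below imitates in a scratch directory. The resolve_head function is hypothetical and assumes loose refs only; it ignores .git/packed-refs and other details real Git handles, and the commit ids are placeholders.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path):
    """Return (branch ref or None, commit id) for a loose-ref layout."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]
        # Follow the symbolic reference to the branch file, which holds the commit id.
        return ref_name, (git_dir / ref_name).read_text().strip()
    return None, head  # HEAD holds a raw commit id instead of a branch name


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # scratch stand-in, not a real repository
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text("aa" * 20 + "\n")
    (git_dir / "refs" / "heads" / "feature-x").write_text("bb" * 20 + "\n")

    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    on_main = resolve_head(git_dir)

    # Switching branches, as far as HEAD is concerned, is a one-line rewrite.
    (git_dir / "HEAD").write_text("ref: refs/heads/feature-x\n")
    on_feature = resolve_head(git_dir)

print(on_main)
print(on_feature)
```

The sketch covers only the bookkeeping; a real checkout also updates the Index and the files in the working directory.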
Detached HEAD State
If you run git checkout [specific-commit-hash], you bypass branches
entirely. Git writes that raw 40-character hash directly into the .git/HEAD
file in place of a symbolic reference.
Why is this risky? If you create new commits in this state, they form a new
path in the graph, but no branch pointer moves forward to track them. Once you
switch back to main, those commits are no longer reachable from any branch.
They are not lost immediately: the reflog (git reflog) still records them, and
you can rescue them with git branch [new-branch-name] [commit-hash]. Only after
the reflog entries expire will Git's garbage collector delete the unreachable
objects permanently. To keep the work, create a branch before switching away.
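Reachability is a plain graph walk from the known references back through parent pointers. The sketch below uses a hypothetical four-commit history with single-letter ids; it is a simplification, since real Git also treats reflog entries as starting points for a while.

```python
def reachable(parents: dict, tips) -> set:
    """Return every commit reachable from the given tips by following parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Hypothetical history: A <- B on main, then C <- D committed on a detached HEAD at B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
refs = {"refs/heads/main": "B"}

# While HEAD is detached at D, HEAD itself keeps C and D reachable.
kept_while_detached = reachable(parents, list(refs.values()) + ["D"])

# After switching back to main, only the branch refs remain as starting points.
orphaned = set(parents) - reachable(parents, refs.values())

# Pointing a branch at D before (or after) switching away makes the commits reachable again.
refs["refs/heads/rescue"] = "D"
orphaned_after_rescue = set(parents) - reachable(parents, refs.values())

print(sorted(orphaned), sorted(orphaned_after_rescue))
```

The commits themselves never change in this model; only the set of references decides whether they count as reachable.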