Git Internals: How Git Stores Data and History on Disk
Let’s take a look at some of Git’s “plumbing”: the commands that are mostly used for working “under the hood” with Git, and that you’ll typically only use if you’re trying to debug your Git files themselves.
All the content from our Boot.dev courses is available for free here on the blog. This one is the “Internals” chapter of Learn Git. If you want to try the far more immersive version of the course, do check it out!
Why Do Two Identical Repos Have Different Commit Hashes?
Even though two repos might have the same content, they can still have different commit hashes. While commit hashes are derived from the content of the snapshot, there’s also some other stuff that affects the end hash:
- The commit message
- The author’s name and email
- The date and time
- Parent (previous) commit hashes
Hashes are effectively unique in practice. Git uses a cryptographic hash function called SHA-1 to generate them, so you won’t accidentally create two different commits with the same hash. You might also hear commit hashes referred to as “SHAs”.
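To make the list above concrete, here is a minimal Python sketch of how a commit ID can be derived in a SHA-1 repository: the hash covers a small header (`commit <size>` plus a NUL byte) followed by the commit text. The helper names `git_object_id` and `commit_body`, the author identity, and the tweaked timestamp and message are assumptions made up for illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 repos do: '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, message: str, timestamp: int) -> bytes:
    # Hypothetical author identity; a first commit has no parent line.
    who = f"Example Dev <dev@example.com> {timestamp} -0700"
    return (
        f"tree {tree}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n"
        f"{message}\n"
    ).encode()


TREE = "4e507fdc6d9044ccd8a4a3061324c9f711c4667d"  # same content in all three commits

first = git_object_id("commit", commit_body(TREE, "A: add contents.md", 1705891256))
later = git_object_id("commit", commit_body(TREE, "A: add contents.md", 1705891257))
reworded = git_object_id("commit", commit_body(TREE, "A: add contents", 1705891256))

# Three different IDs, even though the tree (the content) never changed.
print(first, later, reworded, sep="\n")
```

All three commits point at the same tree, yet a one-second difference in the timestamp or a reworded message is enough to produce a different ID.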
Where Is Git’s Data Actually Stored?
All the data in a Git repository is stored directly in the (hidden) .git directory. That includes all the commits and other objects, as well as the branches and tags that point to them.
Git is made up of objects that are stored in the .git/objects directory. A commit is just a type of object. You can poke around in there yourself:
ls -l .git/objects
ls -al .git/objects/5b/
The first two characters of a hash become the directory name, and the remaining characters become the filename. It’s just files all the way down.
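The naming rule can be sketched in a few lines of Python. The function name `loose_object_path` is hypothetical, and the sketch assumes a full 40-character SHA-1 ID for an object that is stored as an individual ("loose") file rather than inside a packfile.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str, git_dir: str = ".git") -> PurePosixPath:
    """Map a full 40-character object ID to its loose-object file path."""
    if len(object_id) != 40:
        raise ValueError("expected a full 40-character SHA-1 object ID")
    # First two hex characters: directory. Remaining 38: filename.
    return PurePosixPath(git_dir) / "objects" / object_id[:2] / object_id[2:]


print(loose_object_path("4e507fdc6d9044ccd8a4a3061324c9f711c4667d"))
# .git/objects/4e/507fdc6d9044ccd8a4a3061324c9f711c4667d
```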
What Do Git Object Files Look Like?
Well, they’re not easily readable. If you try to cat a raw object file, it’s a mess, because the contents have been compressed into binary data. You can use xxd to print the bytes in hex:
cat .git/objects/5b/a786fc...
# x??...garbage...
xxd .git/objects/5b/a786fc...
# 00000000: 789c ...
Git compresses objects before storing them. You’re not supposed to read them directly.
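The compression in question is zlib, which is why the hex dump starts with the byte 78. As a rough sketch of the round trip, the Python below compresses a blob the way a loose object is laid out (header, NUL byte, content) and then decompresses it again. The file contents are a made-up example, and the exact compressed bytes can differ from what Git writes depending on the compression level.

```python
import zlib

content = b"# contents\n"                      # hypothetical file contents
raw = b"blob %d\x00" % len(content) + content  # header + NUL + content
stored = zlib.compress(raw)                    # roughly what lands on disk

print(stored[:2].hex())  # leading bytes of the zlib stream

# Reading an object back is the reverse: decompress, then split at the NUL.
restored = zlib.decompress(stored)
header, _, body = restored.partition(b"\x00")
print(header.decode())
print(body)
```

Pointing `zlib.decompress` at the bytes of a real file under .git/objects should work the same way, which is essentially what git cat-file does for you.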
How Do You Read a Commit With git cat-file?
Git has a built-in plumbing command, git cat-file, that allows us to see the contents of a commit without needing to futz around with the object files directly:
git cat-file -p 5ba786f
tree 4e507fdc6d9044ccd8a4a3061324c9f711c4667d
author ThePrimeagen <[email protected]> 1705891256 -0700
committer ThePrimeagen <[email protected]> 1705891256 -0700
A: add contents.md
Notice that we can see the tree object, the author, the committer, and the commit message. But we cannot see the contents of contents.md itself. That’s because the blob object stores it.
git log is a porcelain command, while git cat-file is a plumbing command. You’ll use git log much more often when working on coding projects, but git cat-file is useful for understanding Git’s internals.
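The pretty-printed commit has a simple shape: header lines of the form `key value`, a blank line, then the message. Here is a small Python sketch that splits it apart. The `parse_commit` helper and the sample author identity are hypothetical, and the sketch ignores multi-line headers such as signatures.

```python
def parse_commit(text: str) -> tuple[dict[str, list[str]], str]:
    """Split pretty-printed commit text into header fields and the message."""
    head, _, message = text.partition("\n\n")
    fields: dict[str, list[str]] = {}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)  # a merge can list several parents
    return fields, message.strip()


SAMPLE = """\
tree 4e507fdc6d9044ccd8a4a3061324c9f711c4667d
author Example Dev <dev@example.com> 1705891256 -0700
committer Example Dev <dev@example.com> 1705891256 -0700

A: add contents.md
"""

fields, message = parse_commit(SAMPLE)
print(fields["tree"][0])   # the tree this commit points at
print("parent" in fields)  # a first commit has no parent
print(message)
```

Nothing in the parsed commit contains file contents. The only link to them is the tree hash.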
What Are Trees and Blobs in Git?
- tree: Git’s way of storing a directory
- blob: Git’s way of storing a file
You walk the chain with cat-file. Inspect the commit to get its tree hash, then inspect the tree to find blobs:
git cat-file -p <tree-hash>
# 100644 blob 9abc0123... contents.md
git cat-file -p <blob-hash>
# # contents
The commit points to a tree, the tree points to blobs. Trees can also point to other trees, which is how nested directories work.
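Under the pretty-printed output, a tree is a compact binary list: each entry is the mode, a space, the name, a NUL byte, and then the raw 20-byte ID of the blob or subtree. The Python sketch below builds and parses a one-entry tree under those assumptions. The helper names and file contents are invented for illustration, and the name sort is a simplification of Git's real ordering rules.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of '<kind> <size>' + NUL + body."""
    return hashlib.sha1(kind + b" %d\x00" % len(body) + body).digest()


def tree_body(entries: list[tuple[bytes, bytes, bytes]]) -> bytes:
    """Serialize (mode, name, raw_id) entries into a raw tree body."""
    ordered = sorted(entries, key=lambda entry: entry[1])  # simplified ordering
    return b"".join(mode + b" " + name + b"\x00" + oid for mode, name, oid in ordered)


def parse_tree(body: bytes):
    """Yield (mode, name, hex_id) for each entry in a raw tree body."""
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].split(b" ", 1)
        yield mode.decode(), name.decode(), body[nul + 1:nul + 21].hex()
        pos = nul + 21  # skip the NUL plus the 20-byte ID


blob = object_id(b"blob", b"# contents\n")  # hypothetical file contents
root = tree_body([(b"100644", b"contents.md", blob)])

for mode, name, hex_id in parse_tree(root):
    print(mode, "blob", hex_id, name)
print("tree", object_id(b"tree", root).hex())
```

A nested directory would simply be another entry whose ID refers to a second tree object instead of a blob.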
Why Does a Second Commit Have a Parent Field?
When you make a second commit and inspect it with cat-file, you’ll notice one extra field that wasn’t in the first commit: parent.
git cat-file -p <second-commit-hash>
# tree abcd1234...
# parent 5ba786fc...
# author ThePrimeagen <[email protected]> ...
That parent pointer is how Git builds history — each commit points back to the one before it. That’s what makes git log work.
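Walking history is then just a matter of following those pointers. The Python sketch below uses a made-up in-memory dictionary in place of the real object store, with invented short IDs and messages, and only follows the first parent of each commit. It is a simplification of what git log does, not its actual implementation.

```python
# Hypothetical stand-in for the object store: commit ID -> commit fields.
COMMITS = {
    "c3": {"parents": ["b2"], "message": "C: third commit"},
    "b2": {"parents": ["a1"], "message": "B: second commit"},
    "a1": {"parents": [], "message": "A: add contents.md"},
}


def first_parent_history(start: str) -> list[str]:
    """Follow first-parent pointers from `start` back to the root commit."""
    history = []
    current = start
    while current is not None:
        commit = COMMITS[current]
        history.append(f"{current} {commit['message']}")
        # The root commit has no parents, which ends the walk.
        current = commit["parents"][0] if commit["parents"] else None
    return history


for line in first_parent_history("c3"):
    print(line)
```

Starting from the newest commit, the walk prints commits newest first and stops at the one with no parent field.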
How Does Git Store Snapshots Without Wasting Space?
Git stores an entire snapshot of files on a per-commit level. This is surprising (or at least it was to me)! You might assume each commit only stores the changes (“diff”) made in that commit. Nope.
But Git has some performance optimizations so that your .git directory doesn’t get too unbearably large:
- Git compresses and packs files to store them more efficiently.
- Git deduplicates files that are the same across different commits. If a file doesn’t change between commits, Git will only store it once.
You can verify this yourself: inspect the blob hash for a file in one commit, then inspect it again in the next commit (where the file didn’t change). Same hash, same blob. Git doesn’t duplicate it.
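The deduplication falls out of how objects are named: the key is the hash of the content, so identical content always maps to the same key. Here is a toy Python sketch of that idea. The `ObjectStore` class, file names, and file contents are hypothetical, and it ignores trees, compression, and packing.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key is the hash of the content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put_blob(self, content: bytes) -> str:
        header = f"blob {len(content)}".encode() + b"\x00"
        key = hashlib.sha1(header + content).hexdigest()
        self.objects.setdefault(key, content)  # identical content is stored once
        return key


store = ObjectStore()

# Two hypothetical snapshots; README.md is unchanged between them.
commit_1 = {"README.md": b"hello world\n", "contents.md": b"# contents\n"}
commit_2 = {"README.md": b"hello world\n", "contents.md": b"# contents\nmore\n"}

snap_1 = {name: store.put_blob(data) for name, data in commit_1.items()}
snap_2 = {name: store.put_blob(data) for name, data in commit_2.items()}

print(snap_1["README.md"] == snap_2["README.md"])  # True: same content, same blob
print(len(store.objects))                          # 3 blobs for 4 file versions
```

Each snapshot still lists every file, but the unchanged file costs nothing extra because both snapshots reference the same stored blob.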
Frequently Asked Questions
Why can two similar Git commits have different hashes?
Because a commit hash is derived from the tree, parent commit, author, committer, timestamp, and message. Change any of those and the hash changes.
What is a Git object?
A Git object is one of the core data types Git stores inside .git/objects. The three main kinds are commits, trees, and blobs.
What is the difference between a tree and a blob in Git?
A tree is Git's way of storing a directory. A blob is Git's way of storing a file. Commits point to trees, and trees point to blobs or other trees.
What does git cat-file do?
git cat-file is a plumbing command that inspects Git objects. With the -p flag it pretty-prints the object so you can see commits, trees, and blobs in a readable format.
Does Git store a full copy of every file in every commit?
Git stores entire snapshots per commit, but it deduplicates files that are the same across different commits. If a file doesn't change, Git only stores it once.