Git Internals: How Git Stores Data and History on Disk


ThePrimeagen Ex-Netflix engineer, NeoVim ricer, and Git rebaser

Last published March 31, 2026

Let’s take a look at some of Git’s “plumbing”: the low-level commands that work “under the hood” and that you’ll mostly use only when you’re trying to debug your Git files themselves.

All the content from our Boot.dev courses is available for free here on the blog. This one is the “Internals” chapter of Learn Git. If you want to try the far more immersive version of the course, do check it out!

Why Do Two Identical Repos Have Different Commit Hashes?

Two repos can contain exactly the same files and still have different commit hashes. A commit hash is derived from the snapshot of your files, but other data also feeds into the final hash:

The commit message
The author’s and committer’s names and emails
The date and time
Parent (previous) commit hashes

Hashes are effectively unique in practice. Git uses a cryptographic hash function called SHA-1 to generate them, so you won’t accidentally create two different commits with the same hash. You might also hear commit hashes referred to as “SHAs”.
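To see why those fields matter, here is a minimal Python sketch of how a commit hash can be computed, assuming Git’s classic SHA-1 object format: Git hashes a header of the form “commit <size>” plus a NUL byte, followed by the commit text. The helper names hash_object and commit_body, and the email address, are hypothetical and only for illustration.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 of Git's object header ("<type> <size>\\0") plus the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str, committer: str, message: str) -> bytes:
    """Build commit text in the same layout git cat-file -p prints (hypothetical helper)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero parents for a first commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode()


TREE = "4e507fdc6d9044ccd8a4a3061324c9f711c4667d"  # tree hash from the cat-file output in this post
# Same person, same tree, same message; only the timestamp differs (email is hypothetical).
earlier = "ThePrimeagen <prime@example.com> 1705891256 -0700"
later = "ThePrimeagen <prime@example.com> 1705891257 -0700"

h1 = hash_object("commit", commit_body(TREE, [], earlier, earlier, "A: add contents.md\n"))
h2 = hash_object("commit", commit_body(TREE, [], later, later, "A: add contents.md\n"))
print(h1 != h2)  # True: a one-second difference yields a completely different hash
```

The parent list goes into the hashed text too, so two otherwise identical commits with different parents will not share a hash.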

Where Is Git’s Data Actually Stored?

All the data in a Git repository is stored directly in the (hidden) .git directory. That includes all the commits, branches, tags, and other objects.

Git stores its data as objects in the .git/objects directory, and a commit is just one type of object. You can poke around in there yourself:

ls -l .git/objects
ls -al .git/objects/5b/

For each object stored as its own file (a “loose” object), the first two hex characters of its hash become the directory name, and the remaining 38 characters become the filename. It’s just files all the way down.
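The path mapping is simple enough to write down. This is a minimal Python sketch that assumes a full 40-character SHA-1 hash and a loose (unpacked) object; objects Git has packed live inside files under .git/objects/pack instead. The function name loose_object_path is hypothetical.

```python
from pathlib import Path


def loose_object_path(git_dir: str, sha: str) -> Path:
    """Map a full SHA-1 hash to <git_dir>/objects/<first 2 chars>/<remaining 38 chars>."""
    if len(sha) != 40:
        raise ValueError("expected a full 40-character SHA-1 hash")
    return Path(git_dir) / "objects" / sha[:2] / sha[2:]


print(loose_object_path(".git", "4e507fdc6d9044ccd8a4a3061324c9f711c4667d"))
# On Linux/macOS: .git/objects/4e/507fdc6d9044ccd8a4a3061324c9f711c4667d
```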

What Do Git Object Files Look Like?

Well, they’re not easily readable. If you try to cat a raw object file, you get a mess, because the contents are stored as compressed binary data. You can use xxd to print them in hex instead:

cat .git/objects/5b/a786fc...
# x??...garbage...

xxd .git/objects/5b/a786fc...
# 00000000: 789c ...

Git compresses every object with zlib before storing it, so you’re not supposed to read these files directly.
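Here is a minimal Python sketch of that compression round trip using the standard zlib module. It assumes the loose-object layout (a “<type> <size>” header, a NUL byte, then the body, all zlib-compressed); the helper names are hypothetical. Compression levels can differ between Git setups, so only the first byte, 0x78 (the usual first byte of a zlib stream, and the 78 at the start of the xxd dump above), is expected to match.

```python
import zlib


def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    """Prepend the '<type> <size>\\0' header, then zlib-compress the result."""
    raw = f"{obj_type} {len(body)}\0".encode() + body
    return zlib.compress(raw)


def decode_loose_object(data: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split its header from its body."""
    raw = zlib.decompress(data)
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


stored = encode_loose_object("blob", b"hello world\n")
print(stored[:1].hex())             # 78: the usual first byte of a zlib stream
print(decode_loose_object(stored))  # ('blob', b'hello world\n')
```

To look at a real object this way, you could pass the bytes of a file from .git/objects (read with open(path, "rb")) to decode_loose_object, although git cat-file does this work for you.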

How Do You Read a Commit With git cat-file?

Git has a built-in plumbing command, git cat-file, that allows us to see the contents of a commit without needing to futz around with the object files directly:

git cat-file -p 5ba786f

tree 4e507fdc6d9044ccd8a4a3061324c9f711c4667d
author ThePrimeagen <[email protected]> 1705891256 -0700
committer ThePrimeagen <[email protected]> 1705891256 -0700

A: add contents.md

Notice that we can see the hash of the tree object, the author, the committer, and the commit message. But we cannot see the contents of contents.md itself. That’s because the file’s contents live in a separate blob object.
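The pretty-printed commit is plain text: header lines of the form “<key> <value>”, a blank line, then the message. As a rough sketch (a hypothetical parser, not Git’s own code), you could split it like this in Python. It treats lines that start with a space as continuations of the previous header, which is how multi-line headers such as commit signatures are laid out.

```python
def parse_commit(text: str) -> dict:
    """Split commit text into header fields and the message (hypothetical helper)."""
    headers, _, message = text.partition("\n\n")
    fields: dict[str, list[str]] = {}
    last_key = None
    for line in headers.splitlines():
        if line.startswith(" ") and last_key:
            # Continuation of a multi-line header (e.g. a signature block).
            fields[last_key][-1] += "\n" + line[1:]
            continue
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)  # a list, since "parent" can repeat
        last_key = key
    return {"fields": fields, "message": message}


# The cat-file output from above, with a hypothetical email address filled in.
sample = (
    "tree 4e507fdc6d9044ccd8a4a3061324c9f711c4667d\n"
    "author ThePrimeagen <prime@example.com> 1705891256 -0700\n"
    "committer ThePrimeagen <prime@example.com> 1705891256 -0700\n"
    "\n"
    "A: add contents.md\n"
)
commit = parse_commit(sample)
print(commit["fields"]["tree"])      # ['4e507fdc6d9044ccd8a4a3061324c9f711c4667d']
print("parent" in commit["fields"])  # False: the first commit has no parent
```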

git log is a porcelain command (one of the high-level, user-friendly commands), while git cat-file is a plumbing command. You’ll use log much more often when working on coding projects, but cat-file is useful for understanding Git’s internals.

What Are Trees and Blobs in Git?

tree: Git’s way of storing a directory
blob: Git’s way of storing a file

You walk the chain with cat-file. Inspect the commit to get its tree hash, then inspect the tree to find blobs:

git cat-file -p <tree-hash>
# 100644 blob 9abc0123... contents.md

git cat-file -p <blob-hash>
# # contents

The commit points to a tree, the tree points to blobs. Trees can also point to other trees, which is how nested directories work.
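Blobs and trees are hashed the same way as commits: a “<type> <size>” header, a NUL byte, then the body. A tree’s body is a list of entries, each written as “<mode> <name>”, a NUL byte, and the 20 raw bytes of the child’s SHA-1. This Python sketch assumes only regular files (mode 100644) and subdirectories (mode 40000); the helper names are hypothetical, and real Git also handles executables, symlinks, and submodules.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 of '<type> <size>\\0' + body, the same scheme for every object type."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_sha) entries into a tree object's body."""

    def sort_key(entry: tuple[str, str, str]) -> str:
        mode, name, _ = entry
        # Git orders subtrees as if their names ended in "/".
        return name + "/" if mode == "40000" else name

    out = b""
    for mode, name, sha in sorted(entries, key=sort_key):
        out += f"{mode} {name}\0".encode() + bytes.fromhex(sha)
    return out


# Hypothetical file content for contents.md.
blob_sha = hash_object("blob", b"# contents\n")
tree_sha = hash_object("tree", tree_body([("100644", "contents.md", blob_sha)]))
print(blob_sha, tree_sha)
```

Running git hash-object on a file with the same bytes should report the same blob hash, and git cat-file -p on a tree shows these entries in readable form.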

Why Does a Second Commit Have a Parent Field?

When you make a second commit and inspect it with cat-file, you’ll notice one extra field that wasn’t in the first commit: parent.

git cat-file -p <second-commit-hash>
# tree abcd1234...
# parent 5ba786fc...
# author ThePrimeagen <[email protected]> ...

That parent pointer is how Git builds history: each commit points back to the commit before it (a merge commit points to more than one parent). Following those pointers is what makes git log work.
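Following parent pointers is a simple linked-list walk. Here is a toy Python sketch over an in-memory dict of commits; only 5ba786f comes from this post, and the other commit IDs are made up. It follows just the first parent of each commit, roughly what git log --first-parent does, whereas plain git log also visits every parent of a merge.

```python
def first_parent_history(commits: dict[str, dict], head: str) -> list[str]:
    """Walk from head back to the root commit via each commit's first parent."""
    history = []
    current = head
    while current is not None:
        history.append(current)
        parents = commits[current]["parents"]
        current = parents[0] if parents else None  # a root commit has no parent
    return history


# Hypothetical commit graph: abbreviated IDs mapped to their parent lists.
commits = {
    "5ba786f": {"parents": []},           # first commit (from the cat-file example)
    "9f1e2d3": {"parents": ["5ba786f"]},  # made-up second commit
    "7a6b5c4": {"parents": ["9f1e2d3"]},  # made-up third commit
}
print(first_parent_history(commits, "7a6b5c4"))
# ['7a6b5c4', '9f1e2d3', '5ba786f']
```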

How Does Git Store Snapshots Without Wasting Space?

Git stores an entire snapshot of your files with every commit. This is surprising (or at least it was to me)! You might assume each commit only stores the changes (“diff”) made in that commit. Nope.

But Git has some performance optimizations so that your .git directory doesn’t get unbearably large:

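Git compresses each object with zlib, and it can pack many objects into a single packfile (where similar objects may be stored as deltas) to save even more space.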
Git deduplicates files that are the same across different commits. If a file doesn’t change between commits, Git will only store it once.

You can verify this yourself: inspect the blob hash for a file in one commit, then inspect it again in the next commit (where the file didn’t change). Same hash, same blob. Git doesn’t duplicate it.
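Deduplication falls out of content addressing: an object’s key is the hash of its contents, so storing identical contents twice lands on the same key. Here is a toy in-memory Python sketch of that idea; the ObjectStore class is hypothetical and skips compression and packing entirely.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: objects are keyed by the SHA-1 of their contents."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, obj_type: str, body: bytes) -> str:
        raw = f"{obj_type} {len(body)}\0".encode() + body
        sha = hashlib.sha1(raw).hexdigest()
        self.objects.setdefault(sha, raw)  # identical content, same key: stored once
        return sha


store = ObjectStore()
v1 = store.put("blob", b"# contents\n")  # contents.md in the first commit
v2 = store.put("blob", b"# contents\n")  # unchanged contents.md in the next commit
print(v1 == v2, len(store.objects))      # True 1
```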

Frequently Asked Questions

Why can two similar Git commits have different hashes?

Because a commit hash is derived from the tree, parent commit, author, committer, timestamp, and message. Change any of those and the hash changes.

What is a Git object?

A Git object is one of the core data types Git stores inside .git/objects. The three main kinds are commits, trees, and blobs.

What is the difference between a tree and a blob in Git?

A tree is Git's way of storing a directory. A blob is Git's way of storing a file. Commits point to trees, and trees point to blobs or other trees.

What does git cat-file do?

git cat-file is a plumbing command that inspects Git objects. With the -p flag it pretty-prints the object so you can see commits, trees, and blobs in a readable format.

Does Git store a full copy of every file in every commit?

Git stores entire snapshots per commit, but it deduplicates files that are the same across different commits. If a file doesn't change, Git only stores it once.
