Introduction to Git: Version Control Explained from Scratch
- A Git repository is just a normal folder with a hidden .git directory inside it — that directory is Git's entire memory, storing every snapshot of your project ever taken.
- Committing is a two-step process by design: 'git add' stages the exact changes you want, and 'git commit' permanently seals that snapshot with a message — this intentional separation gives you precision control over what goes into each save.
- Branches let you experiment in complete isolation from your working code — the industry-standard workflow is to never commit directly to main, always branch, then merge after review.
- Every commit is a permanent, timestamped snapshot with a unique SHA.
- Git is local-first — all operations (branching, committing, history) work without a network.
- Branches are lightweight pointers — creating one costs almost nothing.
- Working directory: your actual files.
- Staging area (index): what you've selected for the next commit.
- Repository (.git): the complete history graph.
- Remote: a shared copy hosted elsewhere (GitHub, GitLab).
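- Git stores content as snapshots, not diffs. Checking out any commit never requires replaying a chain of patches, and branching is instant because a branch is only a pointer to a commit.
- The reflog is your safety net: it records every HEAD movement and keeps entries for 90 days by default (30 days for entries no longer reachable from the branch tip), allowing recovery of "deleted" commits.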
- Force-pushing to shared branches destroys history and breaks teammates' local repos.
You ran `git reset --hard` and lost uncommitted work.

    git stash list
    git reflog

A teammate force-pushed and your branch is now diverged from remote.

    git fetch origin
    git log --oneline origin/main..HEAD

You need to find who introduced a bug and when.

    git blame -L 40,50 src/io/thecodeforge/service/PaymentService.java
    git log --all --oneline --grep="payment"

Your repo is huge and cloning/pulling is slow.

    git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sort -k3 -n -r | head -20
    du -sh .git

Production Incident
Symptom: `git log` on main shows a linear history that doesn't match what was there before.

Root cause: A developer ran `git rebase main` on their feature branch, then force-pushed their branch. When merging, the merge was fast-forwarded. Later, the same developer rebased main locally without first integrating the commits teammates had pushed, then ran `git push --force-with-lease origin main`. The lease did not protect them: `--force-with-lease` only checks that the remote branch still matches the local remote-tracking ref (origin/main), not that the branch being pushed contains those commits. If that ref has been refreshed in the meantime (for example by an editor's background fetch), the check passes. The force-push overwrote main with the stale rebased history.

Fix:
3. Restore main from the known-good commit: `git push --force origin <good-sha>:main`.
4. All teammates must run `git fetch origin && git reset --hard origin/main` to sync.
5. Implement branch protection rules requiring PR reviews and disallowing force-pushes to main.

Lessons:
- Use `--force-with-lease` only on personal feature branches.
- Branch protection rules on main are not optional: they are infrastructure.
- The reflog on any machine that ever pulled the correct main will have the lost commits. Recovery is possible if you act before the reflog entries expire (90 days by default, or 30 days once the commits are no longer reachable from the branch tip).
- Always fetch, and rebase onto the freshly fetched branch, before pushing a rebased history. Stale refs are the root cause of most force-push disasters.

Production Debug Guide
From symptom to resolution for real production scenarios.
2. … `git log --oneline -1`.
3. Create a branch pointing to it: `git branch rescue-branch HEAD`.
4. Switch to it: `git switch rescue-branch`. Your commits are now safe on a named branch.

1. Run `git status` to see conflicted files.
2. Open each file and look for the `<<<<<<<`, `=======`, and `>>>>>>>` markers.
3. Edit the file to keep the correct version (or combine both).
4. Stage the resolved file: `git add <file>`.
5. Continue: `git merge --continue` or `git rebase --continue`.
6. If it's a rebase and you want to abort entirely: `git rebase --abort`.

2. Use `git filter-branch` or `git filter-repo` (the Git documentation recommends the latter) to remove the file from all history.
3. Force-push the cleaned history.
4. Add the file pattern to .gitignore.
5. If the repo is public on GitHub, assume the secret was scraped by bots within minutes.

`git pull` creates unexpected merge commits and pollutes history.
1. This happens when your local branch has commits that the remote doesn't, and you pull with the default merge strategy.
2. Use `git pull --rebase` to replay your commits on top of the remote changes.
3. Set it permanently: `git config --global pull.rebase true`.
4. This keeps history linear and clean.

Git is the version control system underlying virtually every software project. It solves three problems: tracking who changed what and when, enabling parallel development through branches, and providing a mechanism to recover from mistakes.
The common misconception is that Git is just a backup system. It is not. Git is a content-addressable DAG (directed acyclic graph) of snapshots. Understanding this mental model separates engineers who blindly run commands from those who can recover from any state, resolve complex conflicts, and design efficient branching strategies.
This guide covers the fundamentals, then layers on production-grade insights: when rebases go wrong, how to recover from force-push disasters, and why your branching strategy directly impacts deployment velocity.
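To make the "content-addressable graph of snapshots" idea concrete, here is a minimal sketch that builds a throwaway repository and inspects the objects behind one commit. The temporary directory, the file name greeting.txt, and the "Example User" identity are illustrative assumptions, not part of any real project.

```shell
set -e
# Throwaway repository in a temporary directory (all names here are illustrative).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # make sure the branch is called main
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Every object is addressed by a hash of its content.
commit_id="$(git rev-parse HEAD)"
tree_id="$(git rev-parse 'HEAD^{tree}')"
blob_id="$(git rev-parse HEAD:greeting.txt)"

git cat-file -t "$commit_id"   # prints the object type of the commit
git cat-file -p "$commit_id"   # the commit records a tree, an author, and a message
git cat-file -p "$tree_id"     # the tree maps file names to blob IDs
git cat-file -p "$blob_id"     # the blob holds the file content

# Hashing the same content by hand yields the same blob ID.
rehashed_id="$(git hash-object greeting.txt)"
```

The commit points to a tree, the tree points to blobs, and each later commit would point back to its parent, which is what forms the graph.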
What Is a Repository and Why Does Git Need One?
Before you run a single Git command, you need to understand the word 'repository' — because it comes up constantly. A repository (usually shortened to 'repo') is just a folder on your computer that Git is actively watching and tracking. That's it. The moment you tell Git to watch a folder, that folder becomes a repository.
Inside every Git repository, Git creates a hidden folder called .git. This is Git's private notebook where it stores the entire history of your project — every version of every file, every message you attached to your saves, and every branch you've ever created. You'll almost never need to touch that .git folder directly, but knowing it exists explains the magic: as long as that folder is there, your full history is safe.
Think of the repository as a library and the .git folder as the librarian's filing cabinet in the back room. Your actual files are the books on the shelves that you read and edit. The filing cabinet tracks every edition of every book ever checked in, who edited it, and what changed between editions. You work with the books; Git manages the filing cabinet automatically.
Creating a repository is the very first step in any Git workflow. You'll either initialise a fresh one from scratch, or clone (download) an existing one from a platform like GitHub. Let's start from scratch.
    # Step 1: Create a new project folder and navigate into it
    mkdir my-first-project
    cd my-first-project

    # Step 2: Tell Git to start tracking this folder
    # This creates the hidden .git folder inside my-first-project
    git init

    # Step 3: Confirm that Git is now watching this folder
    # The .git folder only appears when you pass the -a flag (show hidden files)
    ls -a

    # Step 4: Check the current state of the repository
    # At this point there are no files yet, so Git will say it's a fresh repo
    git status
. .. .git
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track them)
The .git directory grows over time as you accumulate commits, branches, and objects. In large repositories (monorepos, projects with large binary assets), the .git folder can exceed several gigabytes. Use `git count-objects -vH` to monitor size. If the repository becomes unwieldy, tools like `git gc` (garbage collection) and `git filter-repo` (history rewriting; a separate tool, not bundled with Git) are essential maintenance operations, not optional cleanup.

A repository is a folder with a .git directory that stores the complete object graph of your project's history. The .git folder is the single source of truth: lose it and you lose everything not pushed to a remote. Always push to a remote regularly.

- Run `git init` in an empty directory.
- `git clone <url>`: this creates the directory, downloads all history, and sets up the remote automatically.
- `git clone --depth 1 <url>`: fetches only the latest commit, dramatically reducing clone time for large repos.

Your First Commit: How Git Actually Saves Your Work
In Git, saving your work isn't called 'saving' — it's called 'committing'. A commit is a permanent snapshot of your project at a specific moment in time. Think of it like taking a photograph of your entire project folder. You can take as many photographs as you want, and you can always look back at any photo from any point in the past.
Here's the part that trips up every beginner: Git uses a two-step process to save work, and for good reason. First you 'stage' your changes (step 1), then you 'commit' them (step 2). Staging is like putting items into a box before you seal and label it. It lets you choose exactly which changes go into a commit — maybe you changed three files but only want to snapshot two of them right now. That's completely valid.
The command for staging is git add. The command for committing is git commit. Every commit requires a message — a short human-readable note explaining what changed and why. These messages are invaluable six months later when you're trying to remember why a certain change was made. Write them like you're leaving a note for your future self, because you are.
The analogy: staging is packing the box, committing is sealing it, labelling it, and putting it on the archive shelf permanently.
    # We're inside my-first-project (from the previous step)

    # Step 1: Create a real file with some content
    echo "# My First Project" > README.md
    echo "This project is tracked by Git." >> README.md

    # Step 2: Check what Git sees: it notices the new file but isn't tracking it yet
    # Git calls such files 'untracked': they exist in the folder but not in Git's history
    git status

    # Step 3: Stage the file, telling Git 'yes, include this file in the next snapshot'
    # Naming the file stages only that file; 'git add .' would stage everything in the current folder
    git add README.md

    # Step 4: Check status again: README.md is now in the 'staging area'
    git status

    # Step 5: Commit the staged file with a descriptive message
    # -m lets you write the message inline; use the imperative mood, e.g. 'Add' not 'Added'
    git commit -m "Add README with project description"

    # Step 6: View the commit history; you'll see your commit with its unique ID
    git log --oneline
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: README.md
[main (root-commit) a3f92c1] Add README with project description
1 file changed, 2 insertions(+)
create mode 100644 README.md
a3f92c1 Add README with project description
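The "changed three files, snapshot only two" case mentioned earlier can be tried in isolation. The following is a small sketch in a throwaway repository; the file names (docs.md, style.css, notes.txt) and the identity are made up for illustration.

```shell
set -e
# Throwaway repository (illustrative names throughout).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Three new files, but only two belong in the next snapshot.
echo "# Docs" > docs.md
echo "body { margin: 0; }" > style.css
echo "scratch notes" > notes.txt

# Stage exactly the two files we want.
git add docs.md style.css

# List what is staged, then show the short status for everything.
staged="$(git diff --cached --name-only)"
git status --short

git commit -q -m "Add docs and stylesheet"

# notes.txt was never staged, so it is not part of the commit.
committed="$(git ls-tree -r --name-only HEAD)"
untracked="$(git ls-files --others --exclude-standard)"
echo "committed: $committed"
echo "untracked: $untracked"
```

The commit contains only the two staged files, and notes.txt stays in the working directory as an untracked file until you decide what to do with it.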
`git add -p` (patch mode) lets you stage individual hunks of a file, creating atomic commits that isolate concerns. This is critical for code review: reviewers can understand each commit independently, and `git bisect` can pinpoint exactly which change introduced a regression. Squashing unrelated changes into one commit is a debugging liability.

Atomic commits pay off in `git bisect` and code review clarity. Use `git add -p` for surgical staging.

- Use `git add <file>` to stage the entire file.
- Use `git add -p` to interactively stage individual hunks, then commit each concern separately.
- Use `git add -u` instead of `git add .` to avoid accidentally staging build artifacts (`git add -u` stages only changes to files Git already tracks).
- Use `git reset HEAD <file>` (or `git restore --staged <file>` on Git 2.23 and later) to unstage without losing changes.

Branches: How to Experiment Without Breaking Anything
Here's where Git goes from 'useful' to 'genuinely magical'. A branch is an independent line of development that runs parallel to your main code. You can create a branch, make all kinds of experimental changes in it, and your main code is completely untouched. If the experiment works, you merge the branch back in. If it doesn't, you delete the branch and it's as if it never happened.
Every Git repository starts with one default branch, usually called main (older projects may call it master). This is your stable, production-ready code. Nobody should push broken code directly to main. Instead, every new feature or bug fix gets its own branch.
Picture a river. The main river is your main branch — it keeps flowing steadily. When you want to try something new, you dig a side canal (a new branch). You do all your experimental digging in the canal. If the canal works great, you reconnect it to the main river (merge). If it floods and turns into a swamp, you just fill it back in (delete the branch) — the main river never knew anything happened.
This is the workflow most professional software teams use: features are developed on branches, reviewed, then merged. Understanding this is what separates someone who 'knows some Git commands' from someone who actually understands Git.
    # We're continuing inside my-first-project with our first commit already made

    # Step 1: See all existing branches; the asterisk (*) shows your current branch
    git branch

    # Step 2: Create a new branch called 'add-contact-page'
    # Naming convention: use lowercase, hyphens, descriptive names, never spaces
    git branch add-contact-page

    # Step 3: Switch to the new branch so our changes happen there, not on main
    # (Git versions older than 2.23 use 'git checkout add-contact-page' instead)
    git switch add-contact-page

    # Step 4: Create a new file on this branch
    echo "# Contact Page" > contact.md
    echo "Email us at: hello@myproject.com" >> contact.md

    # Step 5: Stage and commit the new file; this commit lives ONLY on add-contact-page
    git add contact.md
    git commit -m "Add contact page with email address"

    # Step 6: Switch back to main and notice contact.md is GONE from your folder
    # It's not deleted: it simply doesn't exist on the main branch yet
    git switch main
    ls

    # Step 7: Merge the work from add-contact-page into main
    # Now contact.md will appear in main and the commit history merges too
    git merge add-contact-page
    ls

    # Step 8: Delete the branch; the work is merged, we don't need it anymore
    git branch -d add-contact-page
Switched to branch 'add-contact-page'
[add-contact-page 7b12e4f] Add contact page with email address
1 file changed, 2 insertions(+)
create mode 100644 contact.md
Switched to branch 'main'
README.md
Updating a3f92c1..7b12e4f
Fast-forward
contact.md | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 contact.md
README.md contact.md
Deleted branch add-contact-page (was 7b12e4f).
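Why creating a branch is so cheap can be checked directly: a branch is a small ref that holds one commit ID. The sketch below assumes Git's default "files" ref storage (so the ref is readable under .git/refs/heads) and uses a hypothetical branch name, experiment.

```shell
set -e
# Throwaway repository (illustrative names; assumes the default "files" ref storage).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git branch experiment

# A new branch is only a ref: a small file holding one commit ID.
main_id="$(git rev-parse main)"
experiment_id="$(git rev-parse experiment)"
ref_file_content="$(cat .git/refs/heads/experiment)"
echo "main:       $main_id"
echo "experiment: $experiment_id"

# Committing on the branch moves only that branch's pointer.
git switch -q experiment
git commit -q --allow-empty -m "Experiment commit"
main_after="$(git rev-parse main)"
experiment_after="$(git rev-parse experiment)"
echo "main after:       $main_after"
echo "experiment after: $experiment_after"
```

Right after creation both branches point at the same commit; after the extra commit only experiment has moved, which is the isolation the river analogy describes.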
Configuring Git and Connecting to GitHub — The Full Picture
Git runs entirely on your local machine — everything we've done so far lives only on your computer. That's great for personal version control, but most real projects need a remote home so your team can access the code, or so you don't lose everything if your laptop dies. That's where platforms like GitHub, GitLab, and Bitbucket come in.
Think of your local repository as your personal notebook and GitHub as the shared whiteboard in the office. You do your thinking and drafting in your notebook, then when you're ready, you share your updates to the whiteboard. Your colleagues can pull your updates from the whiteboard into their own notebooks.
Before any of this works, Git needs to know who you are. Every commit is stamped with a name and email address so your team knows who made each change. This is a one-time setup on your machine. After that, you point Git at your remote repository with git remote add, push your work up with git push, and pull your teammates' work down with git pull.
You don't need a GitHub account to learn Git locally, but you'll want one before you share any project or apply for jobs — your GitHub profile is your public portfolio.
    # PART 1: One-time global configuration

    # Tell Git your name; this appears on every commit you ever make
    git config --global user.name "Alex Johnson"

    # Tell Git your email; use the same email as your GitHub account
    git config --global user.email "alex.johnson@example.com"

    # Set VS Code as your default editor (optional but recommended for beginners)
    git config --global core.editor "code --wait"

    # Confirm your settings look correct
    git config --list

    # PART 2: Connect your local repo to GitHub
    # Assumes you've created an empty repo on github.com called 'my-first-project'

    # Add the remote; 'origin' is the conventional name for your primary remote
    # Replace YOUR_USERNAME with your actual GitHub username
    git remote add origin https://github.com/YOUR_USERNAME/my-first-project.git

    # Verify the remote was added correctly
    git remote -v

    # PART 3: Push your local commits to GitHub
    # -u sets 'origin main' as the default so future pushes just need 'git push'
    git push -u origin main

    # PART 4: Pull the latest changes from GitHub (use this daily)
    # This fetches + merges any changes your teammates pushed since your last pull
    git pull origin main
user.email=alex.johnson@example.com
core.editor=code --wait
origin https://github.com/YOUR_USERNAME/my-first-project.git (fetch)
origin https://github.com/YOUR_USERNAME/my-first-project.git (push)
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 381 bytes | 381.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/YOUR_USERNAME/my-first-project.git
* [new branch] main -> main
branch 'main' set up to track 'origin/main'.
Already up to date.
The difference between `git fetch` and `git pull` is critical in production. `git fetch` downloads remote changes but does not modify your working directory, so it's safe to run anytime. `git pull` is `git fetch` followed by `git merge` (or `git rebase` if configured), which modifies your local branch. In CI/CD pipelines and shared environments, always prefer `git fetch` first, inspect the changes with `git log HEAD..origin/main`, then decide whether to merge or rebase. Blind `git pull` on a dirty working directory can create confusing merge states.

`git fetch` is a read operation (safe, inspectable); `git pull` is a write operation (it updates your local branch and working directory). In production, prefer fetch-then-inspect over blind pull. SSH keys eliminate credential friction for daily workflows.

- `git fetch origin`, then inspect with `git log HEAD..origin/main`.
- `git pull --rebase origin main` to replay your commits on top. Cleaner history than merge.
- `git stash`, then pull, then `git stash pop`. Pulling with uncommitted changes can create conflicts that are harder to resolve.
- `git fetch origin && git reset --hard origin/main`. This destroys local work, so use with caution.

| Concept | What It Is | Real-World Analogy |
|---|---|---|
| Repository | A folder Git is tracking, with full history stored in .git | A library with a complete archive of every past edition of every book |
| Commit | A permanent snapshot of your project at a specific moment | A dated photograph of your entire project folder, sealed forever |
| Branch | An independent parallel line of development | A side canal dug from the main river — experiments happen there, not in the river |
| Staging Area | A holding zone where you choose what goes into the next commit | Packing items into a box before you seal and label it |
| Merge | Combining the history of one branch into another | Reconnecting the side canal back to the main river |
| Remote (GitHub) | An online copy of your repository that your team can access | The shared whiteboard in the office — everyone reads from and writes to it |
| git push | Sending your local commits up to the remote repository | Copying your notebook updates onto the shared whiteboard |
| git pull | Downloading commits from the remote into your local repo | Copying the shared whiteboard updates back into your notebook |
| Merge vs Rebase | Merge preserves branch history with a merge commit; rebase rewrites history to be linear | Merge is tying two ropes together with a knot; rebase is untying one rope and splicing it onto the end of the other |
| git fetch vs git pull | Fetch downloads remote changes without modifying working directory; pull fetches then merges | Fetch is reading the whiteboard without touching your notebook; pull is reading and immediately copying changes in |
| Reflog | A local log of every HEAD movement — your safety net for recovery | A GPS tracker on your car — even if you drive somewhere and forget how to get back, it remembers every turn |
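The fetch-versus-pull row can be tried without a GitHub account. This is a sketch under stated assumptions: a local bare repository stands in for the remote, and the directory names (remote.git, teammate, me) and the file app.txt are invented for the example.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# A bare repository stands in for GitHub (all paths and names are illustrative).
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# The teammate creates the project and pushes the first commit.
git init -q teammate
cd teammate
git symbolic-ref HEAD refs/heads/main
git config user.name "Teammate"
git config user.email "teammate@example.com"
git remote add origin "$work/remote.git"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app"
git push -q -u origin main

# You clone, then the teammate pushes a second commit.
cd "$work"
git clone -q remote.git me
cd "$work/teammate"
echo "v2" > app.txt
git commit -q -am "Update app"
git push -q

# Fetch downloads the new commit but leaves your branch and files alone.
cd "$work/me"
before_fetch="$(git rev-parse HEAD)"
git fetch -q origin
after_fetch="$(git rev-parse HEAD)"
incoming="$(git log --oneline HEAD..origin/main)"
echo "incoming: $incoming"

# Integrating is a separate, explicit step (this is the half that pull adds).
git merge -q --ff-only origin/main
after_merge="$(git rev-parse HEAD)"
```

Between the fetch and the merge there is a window in which the incoming commits can be reviewed, which is exactly what a plain `git pull` skips.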
🎯 Key Takeaways
- A Git repository is just a normal folder with a hidden .git directory inside it — that directory is Git's entire memory, storing every snapshot of your project ever taken.
- Committing is a two-step process by design: 'git add' stages the exact changes you want, and 'git commit' permanently seals that snapshot with a message — this intentional separation gives you precision control over what goes into each save.
- Branches let you experiment in complete isolation from your working code — the industry-standard workflow is to never commit directly to main, always branch, then merge after review.
- Git is local-first: everything works on your machine without internet access. GitHub is a remote host — it's not Git itself, it's a platform that stores a copy of your Git repository online so teams can collaborate.
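- The reflog is your ultimate recovery tool: it records every HEAD movement and keeps entries for 90 days by default (30 days for entries no longer reachable from the branch tip), so almost any 'lost' commit can be recovered as long as you act before the entries expire and garbage collection prunes the objects.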
- Merge preserves history faithfully; rebase rewrites it to be linear. The choice is a trade-off between auditability (merge) and readability (rebase). Never rebase commits that others have based work on.
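As an illustration of the reflog takeaway, the sketch below deliberately discards a commit with `git reset --hard` and then brings it back. It runs in a throwaway repository; the branch name rescue and the file name are arbitrary choices for the example.

```shell
set -e
# Throwaway repository (illustrative names throughout).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" >> file.txt
git commit -q -am "Second"
lost_id="$(git rev-parse HEAD)"

# Simulate the mistake: throw away the latest commit.
git reset -q --hard HEAD~1

# The reflog still lists where HEAD used to point.
git reflog -3

# HEAD@{1} is where HEAD was before its most recent move (the reset).
git branch rescue 'HEAD@{1}'
rescued_id="$(git rev-parse rescue)"
echo "rescued: $rescued_id"
```

This only works for work that was committed at some point; changes that were never committed or stashed leave no reflog entry to return to.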
Interview Questions on This Topic
- Q: What is the difference between 'git fetch' and 'git pull', and when would you use each one?
- Q: Can you explain the difference between merging and rebasing in Git, and what are the trade-offs of each approach?
- Q: If you accidentally committed a file containing a database password to a public GitHub repository, what would you do immediately and why?
- Q: Explain the Git object model. What are blobs, trees, commits, and tags, and how do they relate to each other?
- Q: Your team's repository has grown to 5GB and cloning takes 30 minutes. How would you diagnose and fix this?
- Q: What is a detached HEAD state, how do you get into it, and how do you recover if you made commits while detached?
Frequently Asked Questions
What is the difference between Git and GitHub?
Git is the version control software that runs on your local computer and tracks changes to your files. GitHub is a website that hosts Git repositories online so teams can share and collaborate on code. You can use Git without GitHub, but you need Git installed to work with GitHub repositories. Think of Git as the engine and GitHub as the parking garage where everyone stores and shares their cars.
Do I need to pay for Git or GitHub?
Git itself is completely free and open-source. GitHub offers a free tier that covers everything individual developers and most small teams need, including unlimited public and private repositories. Paid plans exist for larger organisations that need advanced access controls, audit logs, and enterprise features. For learning and most professional use, the free tier is more than enough.
What happens if two people edit the same file at the same time — does Git break?
Git handles this remarkably well through a concept called merging. If two people change different parts of the same file, Git automatically combines both changes without any conflict. If two people change the exact same line, Git flags a 'merge conflict' and asks a human to decide which version (or combination) to keep. This is a normal, routine part of collaborative development — Git shows you exactly which lines conflict and you resolve them manually, then commit the resolution.
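A conflict is easy to provoke on purpose, which makes the resolution steps less intimidating. In this sketch two branches change the same line of a hypothetical theme.txt; the branch name red-theme and the decision to keep the red version are arbitrary.

```shell
set -e
# Throwaway repository (illustrative names throughout).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "color: blue" > theme.txt
git add theme.txt
git commit -q -m "Add theme"

# Two branches change the same line in different ways.
git switch -q -c red-theme
echo "color: red" > theme.txt
git commit -q -am "Use red"

git switch -q main
echo "color: green" > theme.txt
git commit -q -am "Use green"

# The merge stops and reports a conflict instead of guessing.
merge_status="clean"
git merge red-theme >/dev/null 2>&1 || merge_status="conflict"
conflicted="$(git diff --name-only --diff-filter=U)"
cat theme.txt     # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by writing the version to keep, then stage and conclude the merge.
echo "color: red" > theme.txt
git add theme.txt
git commit -q -m "Merge red-theme, keep red"
parent_count="$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')"
echo "merge commit has $parent_count parents"
```

Until the resolution is committed, `git merge --abort` returns the repository to its state before the merge.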
What is the difference between merge and rebase, and when should I use each?
Merge creates a new commit that ties together two branch histories, preserving the exact timeline of when each change was made. Rebase takes your commits and replays them on top of another branch, creating a linear history. Use merge when you want to preserve the true history of how development happened (important for auditing). Use rebase on feature branches before merging to main to keep the project history clean and linear. The critical rule: never rebase commits that have been pushed to a shared branch, as this rewrites history that others may have based work on.
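The structural difference between the two can be observed in a throwaway repository. This is a sketch with invented branch names (feature-merge, feature-rebase): one branch is rebased onto main, the other is merged into it.

```shell
set -e
# Throwaway repository (illustrative names throughout).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch feature-merge
git branch feature-rebase

# main moves on after both feature branches were created.
echo "main" > main.txt
git add main.txt
git commit -q -m "Main moves on"
main_tip="$(git rev-parse main)"

# Rebase: the feature commit is re-created on top of main with a new ID.
git switch -q feature-rebase
echo "b" > b.txt
git add b.txt
git commit -q -m "Feature B"
before_rebase="$(git rev-parse HEAD)"
git rebase -q main
after_rebase="$(git rev-parse HEAD)"
rebase_parent="$(git rev-parse HEAD~1)"

# Merge: both histories are kept and tied together by a two-parent commit.
git switch -q feature-merge
echo "a" > a.txt
git add a.txt
git commit -q -m "Feature A"
git switch -q main
git merge -q --no-edit feature-merge
merge_parents="$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')"

git log --oneline --graph --all
```

The changed commit ID after the rebase is the reason for the rule above: anyone who based work on the old ID now holds history that no longer matches.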
How do I undo a commit that I already pushed to the remote?
On a shared branch, use `git revert <sha>`: it creates a new commit that undoes the changes without rewriting history, and it works for the most recent commit as well as for commits further back. If you must completely erase the most recent commit (e.g., it contains secrets), use `git reset --hard HEAD~1` locally, then `git push --force-with-lease`, but only if no one else has pulled the commit. If the commit exposed a secret, rotate that secret as well, since anyone who already fetched the commit still has a copy.
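A minimal sketch of the revert path, run in a throwaway repository with a hypothetical config.txt, shows that the bad commit stays in history while its effect is undone.

```shell
set -e
# Throwaway repository (illustrative names throughout).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "timeout=30" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "timeout=0" > config.txt
git commit -q -am "Break timeout"
bad_id="$(git rev-parse HEAD)"

# Revert adds a new commit that applies the inverse of the bad one.
git revert --no-edit "$bad_id" >/dev/null

commit_count="$(git rev-list --count HEAD)"
git log --oneline
cat config.txt
```

Because nothing is rewritten, teammates can pull the revert like any other commit, with no force-push involved.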
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.