Sanitizing Git History: A Security-First Approach to Public Repositories

Nov 24, 2025

6 min read — enough time for 18 password attempts

The Problem Nobody Talks About

You’ve built something useful. You want to share it. You push to GitHub and wait. Did you just publish your AWS account ID? Your email? That SSH key comment with your personal address?

This happened to me. Not in a catastrophic way, but in the slow-burn realization that my “public” repositories contained breadcrumbs of information that didn’t need to be public. Account IDs in Terraform variables. GitLab usernames in OIDC trust policies. Email addresses embedded in SSH public keys.

None of this is immediately exploitable. AWS account IDs aren’t secrets per se, but they’re pieces of a puzzle. Combined with other information, they make reconnaissance easier. And in security, we don’t make reconnaissance easier.

The Dual-Remote Reality

My setup uses two remotes:

GitLab: Production. CI/CD pipelines. Real infrastructure deployment.
GitHub: Public portfolio. Code samples. Open source contributions.

This separation exists because GitLab CI/CD needs actual values to deploy infrastructure. The OIDC trust policy must reference my real GitLab project path. Terraform needs my actual AWS account ID to name S3 buckets.

But GitHub? GitHub is for showing work, not running it. The code should be instructive, not operational.

The challenge: How do you maintain one codebase that serves both purposes?

The Wrong Approach: Manual Scrubbing

My first instinct was to manually edit files before pushing to GitHub. Find-and-replace account IDs. Delete the GitLab-specific configs. Push a “clean” version.

This fails for three reasons:

Git remembers everything. Even if you edit a file, the old version lives in history. git log -p reveals all.
It’s error-prone. Miss one file, one commit, one variable, and you’ve leaked data.
It doesn’t scale. Every commit requires re-sanitization. One slip and you’re back to square one.

The Right Approach: Rewrite History

Git’s history isn’t immutable; it just feels that way. Tools like git-filter-repo can rewrite every commit, replacing sensitive strings across the entire repository history.

Here’s what I did:

Step 1: Create a Replacement Map

XXXXXXXXXXXX==>123456789012
YYYYYYYYYYYY==>234567890123
sensitive-username==>username
personal-email@domain.com==>user@example.com

Each line maps a sensitive value to a safe placeholder. The tool processes every blob in every commit, performing these substitutions.
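To make the substitution step concrete, here is a minimal Python sketch of what a literal old==>new map does to a single blob. It is a simplified model for illustration, not git-filter-repo's implementation: the function names parse_replacements and apply_replacements are hypothetical, and lines without a ==> separator are simply skipped here, which is an assumption of this sketch.

```python
def parse_replacements(text: str) -> list[tuple[str, str]]:
    """Parse 'old==>new' lines into (old, new) pairs, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        old, sep, new = line.partition("==>")
        if not sep:
            # Assumption of this sketch: ignore lines without '==>'.
            continue
        pairs.append((old, new))
    return pairs


def apply_replacements(content: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each literal substitution in order to one blob's text."""
    for old, new in pairs:
        content = content.replace(old, new)
    return content


REPLACEMENTS = """\
XXXXXXXXXXXX==>123456789012
sensitive-username==>username
"""

blob = 'aws_account_id = "XXXXXXXXXXXX"\nowner = "sensitive-username"\n'
print(apply_replacements(blob, parse_replacements(REPLACEMENTS)))
```

The real tool performs this kind of substitution for every blob in every commit, which is why a single map file is enough to cover the whole history.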

Step 2: Run the Filter

git-filter-repo --replace-text replacements.txt --force

This rewrites history. Every commit that contained XXXXXXXXXXXX now contains 123456789012. The SHA hashes change (because the content changed), but the commit messages, dates, and structure remain.
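Why the hashes change can be shown in a few lines. The sketch below computes a blob's object ID the way Git does for repositories using the default SHA-1 object format (an assumption; SHA-256 repositories differ). The helper name git_blob_sha1 is hypothetical.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


before = b'aws_account_id = "XXXXXXXXXXXX"\n'
after = before.replace(b"XXXXXXXXXXXX", b"123456789012")

# The two IDs differ because the hashed bytes differ.
print(git_blob_sha1(before))
print(git_blob_sha1(after))
```

A changed blob ID changes the tree that references it, which changes the commit that references that tree, which changes every descendant commit. That is why one substitution deep in history gives the whole branch new hashes.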

Step 3: Force Push

git push origin main --force

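Yes, force push. This replaces the remote history with the sanitized version. Anyone who cloned before will have conflicts, but that’s the point. The old, sensitive history is no longer reachable from the branch on the remote, although existing clones and forks still hold it, and a hosting provider may keep unreferenced commits around for some time. Note that git-filter-repo removes the origin remote by default, so it may need to be re-added with git remote add before this push. Anything that was a real credential should still be rotated.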
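Ideally before the force push, it is worth confirming that nothing from the left side of the replacement map survives in the rewritten history. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the strings in SENSITIVE stand in for real values, and history_patch_text simply captures the output of git log --all -p.

```python
import subprocess


def find_leftovers(text: str, sensitive: list[str]) -> list[str]:
    """Return the sensitive strings that still occur in the given text."""
    return [s for s in sensitive if s in text]


def history_patch_text(repo_dir: str = ".") -> str:
    """Return every patch in every ref as text (requires git on PATH)."""
    result = subprocess.run(
        ["git", "log", "--all", "-p"],
        cwd=repo_dir,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate non-UTF-8 bytes in old patches
        check=True,
    )
    return result.stdout


# Placeholder values for illustration; use the left side of replacements.txt.
SENSITIVE = ["XXXXXXXXXXXX", "sensitive-username"]

sample = 'aws_account_id = "123456789012"\nowner = "sensitive-username"\n'
print(find_leftovers(sample, SENSITIVE))
```

In a real check, find_leftovers(history_patch_text(), SENSITIVE) should return an empty list. Keep in mind that --replace-text rewrites file contents, so commit messages would need a separate pass.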

The Template Strategy

Sanitizing existing repos is reactive. For the portfolio, I wanted something proactive: a template that’s clean from the start.

The approach:

Copy the production repo to a new directory
Remove .git (fresh history)
Replace all personal content with placeholders
Create example files that demonstrate structure without leaking data
Initialize new repo and push to GitHub
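The first three steps can be sketched in Python. This is an illustration under stated assumptions rather than the exact procedure used for the portfolio: make_template is a hypothetical helper, the placeholder map is supplied by the caller, and files that do not decode as UTF-8 are treated as binary and left untouched.

```python
import shutil
from pathlib import Path


def make_template(src: Path, dst: Path, placeholders: dict[str, str]) -> None:
    """Copy src to dst without .git, then swap personal strings for placeholders."""
    # ignore_patterns(".git") drops the old history from the copy.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(".git"))
    for path in dst.rglob("*"):
        if not path.is_file():
            continue
        try:
            original = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # assume binary file; leave as copied
        updated = original
        for personal, placeholder in placeholders.items():
            updated = updated.replace(personal, placeholder)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

A call such as make_template(Path("portfolio"), Path("portfolio-template"), {"Jane Doe": "Your Name"}) would produce the clean copy (the paths and names here are made up). Writing the example files, running git init, and pushing remain manual steps.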

The result: portfolio-template, a fully functional Astro portfolio with zero personal information. Fork it, customize it, deploy it.

What Got Replaced

| File | Personal Content | Template Placeholder |
| --- | --- | --- |
| `src/consts.ts` | Site title with my name | `Your Name \| Portfolio` |
| `src/pages/index.astro` | Bio, title, specializations | Generic placeholders |
| `src/components/Header.astro` | LinkedIn, GitHub, email URLs | `your-linkedin`, `your-github` |
| `src/data/certifications.ts` | My actual certifications | Example certification objects |
| `src/content/projects/*.md` | Detailed project writeups | Single example template |

Terraform: The Trickier Case

Infrastructure code is harder to sanitize because it needs real values to work. My solution uses gitignored variable files:

Committed (public):

# variables.tf
variable "aws_account_id" {
  description = "AWS account ID"
  type        = string
  default     = "" # Set via terraform.tfvars
}

Gitignored (local only):

# terraform.tfvars
aws_account_id = "XXXXXXXXXXXX"
gitlab_project_path = "sensitive-username/aws-sec"

The committed code has empty defaults. The real values live in .tfvars files that never touch GitHub. Users who clone the repo create their own .tfvars with their own values.

For the S3 backend (which can’t use variables), I use a separate backend.hcl:

# backend.hcl (gitignored)
bucket = "terraform-state-XXXXXXXXXXXX-us-west-1"

Initialize with:

terraform init -backend-config=backend.hcl

Lessons Learned

1. Commit Hygiene Matters From Day One

It’s easier to never commit secrets than to remove them later. Before every commit:

Check git diff for account IDs, emails, keys
Use .gitignore aggressively
Consider pre-commit hooks that scan for patterns
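As one possible shape for such a hook, here is a minimal Python sketch that scans the added lines of a diff for a few patterns. The patterns and the name scan_added_lines are illustrative assumptions, not a vetted ruleset; a dedicated secret scanner will cover far more cases.

```python
import re

# Illustrative patterns only; tune them for your own repositories.
PATTERNS = {
    "AWS account ID (12 digits)": re.compile(r"(?<!\d)\d{12}(?!\d)"),
    "email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_added_lines(diff_text: str) -> list[tuple[str, str]]:
    """Return (label, line) pairs for added diff lines that match a pattern."""
    findings = []
    for line in diff_text.splitlines():
        # Only lines being added matter; skip the '+++ b/file' header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((label, line[1:].strip()))
    return findings


sample_diff = """\
+++ b/terraform.tfvars
+aws_account_id = "123456789012"
+region = "us-west-1"
-old = "user@example.com"
"""

for label, line in scan_added_lines(sample_diff):
    print(f"{label}: {line}")
```

In an actual hook (for example a script at .git/hooks/pre-commit), the diff text would come from git diff --cached, and exiting with a nonzero status when findings exist blocks the commit. Placeholder values such as 123456789012 will match too, so an allowlist is likely needed.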

2. AWS Account IDs Aren’t Secrets, But Treat Them Like One

AWS says account IDs aren’t sensitive. They appear in most ARNs, in CloudTrail logs, and in plenty of error messages. But:

They enable targeted attacks
They’re used in social engineering
They identify you across services

Minimizing exposure is good hygiene, not paranoia.

3. Git History Is a Liability

Every commit is permanent until you actively rewrite it. That quick test with hardcoded credentials? It’s in your history. That config file you deleted? Still there.

Assume anything committed will be found. Act accordingly.

4. Separation of Concerns Applies to Repos

Not everything belongs in the same repository. Production configs, personal data, and example code have different audiences and different risk profiles. Separate them.

5. Templates Are Documentation

A well-structured template teaches more than a tutorial. It shows the right file structure, the correct frontmatter format, the expected configuration. Users learn by customizing, not by reading.

The Power of Doubling Back

The most valuable skill in this process wasn’t any particular git command—it was the willingness to revisit decisions. To look at a “finished” repo and ask: “Is this actually ready to be public?”

Security isn’t a feature you add at the end. It’s a lens you apply continuously. Every commit, every push, every new file deserves the question: “What am I exposing?”

This blog post exists because I doubled back. I looked at repos I’d already pushed and realized they weren’t as clean as I thought. The fix took a few hours. The alternative, leaving sensitive data exposed indefinitely, wasn’t acceptable.

Double back. Check your history. Clean what needs cleaning. Your future self will thank you.

Resources

git-filter-repo - The modern replacement for git filter-branch
BFG Repo-Cleaner - Alternative for removing large files and passwords
GitHub: Removing sensitive data
Portfolio Template - The sanitized template discussed in this post

Security isn’t about perfection; it’s about continuous improvement. Every repo you clean, every secret you catch before commit, every habit you build makes the next project safer.

Johnny Endrihs

Has spent 20+ years securing networks and technology, from military communications to finance. Exploring the frontier of cybersecurity and AI security architecture.