// developer tools

encoding-doctor

Scan, fix, and verify file encoding issues across your entire project in one command. Mojibake, BOM, CRLF, null bytes — automatically detected and repaired.

mojibake fixer · BOM remover · CRLF normalizer · null byte cleaner · encoding detector · no config needed
Read before running

encoding-doctor modifies files in-place. Always run enc-doctor scan first and review the report before running enc-doctor fix.
Backups are created automatically as .bak files, but you are responsible for verifying results. Do not run fix on production files without testing first.

$ pip install encoding-doctor

6 Issue types detected
4 Commands
0 Dependencies
3.9+ Python version
3 OS supported

Built for developers who care about clean code.

If your project runs Python and touches text files, encoding-doctor is for you.

Python developers on Windows
Many Windows tools still save files in the system ANSI code page (cp1252 on Western locales) by default. Every time you edit a UTF-8 file in an editor set to the wrong encoding, you risk silent corruption. encoding-doctor catches it before it reaches Git.
// Notepad, VS Code misconfigured, Excel CSV exports
Teams mixing Windows + Linux + macOS
One developer on Windows pushes CRLF. Another on Linux sees red lines in every diff. encoding-doctor normalizes the whole project in one command.
// Cross-platform dev teams, remote work, freelancers working across machines
Developers migrating legacy codebases
Old Python 2 projects, PHP 5 codebases, VB.NET files — all full of encoding mess. Scan thousands of files at once, fix systematically, verify everything is clean.
// Python 2 → 3 migration, legacy enterprise code
Developers with large or growing projects
The bigger the codebase, the harder it is to spot encoding issues manually. encoding-doctor scans thousands of files in seconds, saves hours of debugging, and protects your project before it reaches production.
// Large Python apps, data pipelines, ETL projects, enterprise codebases

Everything encoding-doctor can do.

Six issue types detected and fixed. Four commands. Zero configuration. Built for real Python projects.

// what it fixes
FIX
Mojibake
UTF-8 characters mis-read as cp1252 and re-saved as garbage bytes. Most common when files are opened on Windows with the wrong locale.
BEFORE: ├óÔÇáÔÇÖ name.py
AFTER: → name.py
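As a rough illustration of the idea (not encoding-doctor's actual algorithm, which this page does not describe), one round of UTF-8-read-as-cp1252 damage can often be undone by reversing the mistaken decode. The function name `repair_mojibake` is hypothetical, and the sketch assumes a single round of damage; text garbled more than once, or through a different code page, needs more work.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes mis-read as cp1252 (illustrative sketch)."""
    try:
        # Encoding with the wrong codec recovers the original bytes;
        # decoding those bytes as UTF-8 recovers the intended characters.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not round-trip cleanly, so leave it untouched.
        return text


# "\u2192 name.py" (arrow + file name) after being mis-read as cp1252
damaged = "\u00e2\u2020\u2019 name.py"
print(repair_mojibake(damaged))
```

Text that was never damaged falls through the `except` branch or round-trips unchanged, so the function is safe to call on clean input.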
FIX
BOM (Byte Order Mark)
Invisible EF BB BF byte prefix silently added by Notepad and Excel. Python's open() with encoding="utf-8" passes it through as a stray U+FEFF character, which can break JSON parsers and HTTP headers.
BEFORE: \xef\xbb\xbf# my_module.py
AFTER: # my_module.py
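A minimal sketch of BOM removal at the byte level, using only the standard library. The helper name `strip_bom` is hypothetical and is not taken from encoding-doctor itself.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):]
    return data


print(strip_bom(b"\xef\xbb\xbf# my_module.py"))
```

When you only need to read such a file rather than rewrite it, opening it with encoding="utf-8-sig" makes Python skip the BOM if it is there.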
FIX
CRLF Line Endings
Windows \r\n mixed with Unix \n. Creates noisy Git diffs, breaks shell scripts, and causes CI failures on Linux runners.
BEFORE: line1\r\n line2\r\n
AFTER: line1\n line2\n
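The normalization itself is a one-line byte replacement. This is a sketch under the assumption that only CRLF pairs should be converted; the function name `normalize_newlines` is hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows CRLF line endings to Unix LF."""
    # Working on bytes avoids Python's own newline translation in text mode.
    return data.replace(b"\r\n", b"\n")


print(normalize_newlines(b"line1\r\nline2\r\n"))
```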
FIX
Null Bytes
Binary corruption from bad FTP transfers, copy-paste from legacy terminals, or corrupted editors. Causes silent parse errors that are hard to trace.
BEFORE: data\x00value\x00
AFTER: datavalue
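A hedged sketch of the cleanup shown above. The name `strip_null_bytes` is hypothetical. Note that UTF-16 text legitimately contains 0x00 bytes, so a real tool would need to rule that encoding out before stripping anything.

```python
def strip_null_bytes(data: bytes) -> bytes:
    """Remove every 0x00 byte from data (assumes the file is not UTF-16/UTF-32)."""
    return data.replace(b"\x00", b"")


def count_null_bytes(data: bytes) -> int:
    """Count 0x00 bytes, which is useful for a read-only report."""
    return data.count(b"\x00")


sample = b"data\x00value\x00"
print(count_null_bytes(sample), strip_null_bytes(sample))
```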
FIX
Mixed Encoding
A single file containing both cp1252 and UTF-8 sections. Common in codebases migrated from Python 2 or files edited across different operating systems.
BEFORE: cp1252 + UTF-8 mixed
AFTER: UTF-8 unified
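One common way to unify such a file, sketched here as an assumption rather than as encoding-doctor's documented method, is to decode line by line: try UTF-8 first and fall back to cp1252 only for lines that are not valid UTF-8. The function name `decode_mixed` is hypothetical.

```python
def decode_mixed(data: bytes) -> str:
    """Decode each line as UTF-8, falling back to cp1252 when that fails."""
    lines = []
    for raw in data.split(b"\n"):
        try:
            lines.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # errors="replace" covers the few byte values cp1252 leaves undefined.
            lines.append(raw.decode("cp1252", errors="replace"))
    return "\n".join(lines)


mixed = "caf\u00e9".encode("utf-8") + b"\n" + "na\u00efve".encode("cp1252")
unified = decode_mixed(mixed).encode("utf-8")  # the whole file is UTF-8 again
print(unified)
```

The per-line granularity is a simplification: a single line that mixes both encodings would be decoded entirely as cp1252.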
FIX
JSON / YAML Corruption
Typographic characters such as curly quotes, pasted into config files from Word, email, or Google Docs. They can break json.load() and PyYAML parsing, for example when a curly quote replaces a straight quote delimiter.
BEFORE: "name’s value"
AFTER: "name's value"
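A small sketch of the quote replacement shown above, using str.translate. The mapping and the name `straighten_quotes` are illustrative assumptions; a blanket replacement like this also rewrites curly quotes that were intentional, so review the result.

```python
import json

# Map typographic quotes to their ASCII equivalents.
TYPOGRAPHIC = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(TYPOGRAPHIC)


# A JSON object pasted from a word processor: the delimiters are curly quotes.
pasted = "{\u201cname\u201d: \u201cname\u2019s value\u201d}"
fixed = straighten_quotes(pasted)
print(json.loads(fixed))
```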
// commands
01
scan
enc-doctor scan <path> [--all]
Recursively scans all text files and reports encoding issues. Non-destructive — never touches files. Use --all to also list clean files.
safe · read-only · recursive
02
fix
enc-doctor fix <path> [--dry-run]
Repairs all detected issues in-place. Automatically creates .bak backups before touching any file. Use --dry-run to preview changes first.
modifies files · auto-backup
03
verify
enc-doctor verify <path>
Checks that every file in the directory is valid UTF-8. Reports a PASS or FAIL per file. Run this after fixing to confirm everything is clean before committing.
safe · read-only · commit-ready check
04
restore
enc-doctor restore <file>
Restores a single file from its .bak backup created by fix. Useful when a fix produces an unexpected result and you need to roll back one file.
undo · single-file
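To make the verify step concrete, here is a minimal sketch of a per-file UTF-8 validity check. It assumes the check is a strict UTF-8 decode of each file's bytes; the names `is_valid_utf8` and `verify_tree` are hypothetical and do not come from encoding-doctor.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True when data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def verify_tree(root: str) -> dict[str, bool]:
    """Map every file under root to True (PASS) or False (FAIL)."""
    return {
        str(path): is_valid_utf8(path.read_bytes())
        for path in sorted(Path(root).rglob("*"))
        if path.is_file()
    }
```

Calling `verify_tree("./my_project")` returns one entry per file, which is enough to print a PASS or FAIL line for each path. Unlike the real command, this sketch does not skip binary files or non-source directories.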
// design principles
Zero config
No config files, no .toml, no setup. Point it at a folder and it works. Everything is opinionated and sensible by default.
No dependencies
Pure Python standard library. No chardet, no ftfy, nothing to pin or audit. Install once, works everywhere Python runs.
Safe by default
Backups are created before every fix. Dry-run mode lets you preview every change. Verify confirms the result. You are always in control.
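The backup-then-modify pattern described under "Safe by default" can be sketched with the standard library. This assumes the backup sits next to the original with .bak appended to the full file name (for example app.py.bak); the page does not state the exact naming, and `backup` and `restore` are hypothetical helper names.

```python
import shutil
from pathlib import Path


def backup(path: Path) -> Path:
    """Copy path to a sibling file with .bak appended and return the backup path."""
    bak = path.with_name(path.name + ".bak")  # assumed naming scheme
    shutil.copy2(path, bak)                   # copy2 also preserves timestamps
    return bak


def restore(path: Path) -> None:
    """Overwrite path with the contents of its .bak sibling."""
    shutil.copy2(path.with_name(path.name + ".bak"), path)
```

Taking the backup before the first write means a failed or unwanted fix can always be rolled back one file at a time.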

Important — Read this first.

enc-doctor fix modifies files on disk. Know what you're doing before you run it.

01
Always run scan first
Review the report carefully before running fix. Understand exactly what will change.
02
Back up your project folder first
enc-doctor creates per-file .bak backups automatically. But also back up the entire folder with xcopy (Windows) or cp -r (macOS/Linux) before running fix for extra safety.
03
Do not run on binary files
encoding-doctor targets text files only. Passing binary directories can cause data loss.
04
Verify after every fix
Run enc-doctor verify and review changed files manually before committing.
05
Not a substitute for proper encoding hygiene
Fix the root cause — configure your editor and Git to always use UTF-8 and LF. This tool treats symptoms, not the source.

Simple workflow. No config.

01
Install
One pip install. No dependencies beyond the standard library.
pip install encoding-doctor
02
Scan your project
Point enc-doctor at any folder. It recursively scans all text files and produces a detailed report.
enc-doctor scan ./my_project
03
Review the report
Read what was found. Each issue is listed with file path, issue type, and occurrence count. Understand what will change before proceeding.
04
Fix
Run fix only after reviewing the report. Backups are created automatically before any file is modified.
enc-doctor fix ./my_project
05
Verify
Confirm all files pass UTF-8 validation. Safe to commit when verify reports clean.
enc-doctor verify ./my_project

Common questions.

Is scan really free forever?
Yes. enc-doctor scan has no license requirement and never will. You can scan any project, any size, as many times as you want — completely free. Only fix, verify, and restore require a license.
What if fix breaks my file?
encoding-doctor creates a .bak backup of every file before modifying it. You can restore any file instantly with enc-doctor restore <file>. We also recommend backing up your entire project folder with xcopy (Windows) or cp -r (macOS/Linux) before running fix.
Does it work on large projects with thousands of files?
Yes. encoding-doctor scans recursively and skips known non-source directories like __pycache__, .git, node_modules, and virtual environments automatically. It is designed for real production codebases, not just small scripts.
Can I use one license on multiple machines?
Solo license is for 1 developer — you can activate it on your machine and deactivate it when switching devices using enc-doctor deactivate. Team license covers up to 5 developers. Each developer activates with the same license key.
Does it work offline or in air-gapped environments?
Scan works fully offline, with no network access required. Fix, verify, and restore require a one-time license activation, but once activated the license is stored locally. encoding-doctor has zero runtime dependencies and runs on any machine with Python 3.9+, including Raspberry Pi, edge devices, and air-gapped servers.
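The directory skipping mentioned in the large-projects answer above can be sketched with os.walk, which allows pruning the directory list in place. The skip set uses the names from that answer plus .venv and venv as assumed stand-ins for "virtual environments"; `iter_source_files` is a hypothetical name.

```python
import os

# .venv and venv are assumptions; the other names come from the FAQ answer.
SKIP_DIRS = {"__pycache__", ".git", "node_modules", ".venv", "venv"}


def iter_source_files(root: str):
    """Yield file paths under root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice prunes the walk itself, not just the output.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning during the walk keeps the scan fast on large trees because directories such as node_modules are never opened at all.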

One command. Clean project.
Pay once, own it forever.

No subscription. No usage limits. No telemetry.
Buy once — license never expires. Works fully offline.

Solo
$19
one-time · lifetime license
  • 1 developer
  • Unlimited projects
  • scan · fix · verify · restore
  • Windows 10 · 11
  • macOS 10.15 Catalina and later
  • Linux: Ubuntu 20.04+ · Debian 10+ · Fedora 36+ · Arch · and more
  • Lifetime license
Get Solo — $19
Instant activation
pip install encoding-doctor
License key by email
No subscription · Pay once · Use forever