I recently used AI to process client data, and the client was clear: sensitive information must not be sent to AI servers. But when I went looking for de-identification tools, I hit a wall: none of the tools I found supported Taiwan’s PII formats (national ID numbers, local phone numbers, tax IDs).
So I combined an open-source library with local adaptations and built pii-guard-tw. It automatically replaces PII in your documents with placeholders, lets you send the sanitized version to AI for processing, and then restores the original data afterward. Your real data never leaves your machine.
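To make the replace-then-restore idea concrete, here is a minimal sketch in Python. It is not pii-guard-tw's actual code: the function names `mask` and `restore`, the `<LABEL_n>` placeholder format, and the two regular expressions are assumptions chosen for illustration, and they cover only email addresses and mobile numbers.

```python
import re

# Illustrative patterns only (assumed, not taken from pii-guard-tw).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 09xx-xxx-xxx or +886 9xx xxx xxx, with optional hyphens.
    "MOBILE": re.compile(r"(?<!\d)(?:\+886-?9|09)\d{2}-?\d{3}-?\d{3}(?!\d)"),
}


def mask(text):
    """Replace each PII match with a placeholder; return (masked_text, mapping)."""
    mapping = {}   # placeholder -> original value, kept only on the local machine
    seen = {}      # (label, original value) -> placeholder, so repeats are reused
    counters = {}  # label -> how many distinct values have been seen so far

    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            key = (label, value)
            if key not in seen:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"<{label}_{counters[label]}>"
                seen[key] = placeholder
                mapping[placeholder] = value
            return seen[key]

        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into text returned by the AI."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "Contact Ms. Wang at 0912-345-678 or wang@example.com; backup 0912-345-678."
masked, mapping = mask(original)
round_trip = restore(masked, mapping)
```

Only `masked` would be sent to the AI. The `mapping` dictionary stays local, and it is the only thing that can turn placeholders back into real values. Reusing one placeholder per distinct value lets the AI still tell that two mentions refer to the same phone number.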
Supported PII Types
- National ID numbers, resident certificate numbers, passport numbers
- Mobile numbers (09xx / +886 format), landlines
- Email addresses, credit card numbers
- Tax ID numbers (統一編號, Taiwan’s business registration number), license plates, dates of birth, bank account numbers
- Person names, organization names, location names (detected via CKIP BERT, a Mandarin NLP model from Academia Sinica)
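Pattern-based types such as national ID numbers can be checked with more than a regular expression. The sketch below shows the publicly documented checksum for Taiwan national ID numbers (one letter followed by nine digits). The function name `is_valid_national_id` is hypothetical, and this is not necessarily how pii-guard-tw detects IDs. It covers only the classic format whose second character is 1 or 2, not resident certificate numbers.

```python
import re

# Letters in the order that maps to the codes 10..35 (A=10, B=11, ..., I=34, O=35).
LETTER_ORDER = "ABCDEFGHJKLMNPQRSTUVXYWZIO"
# Weights for the two digits of the letter code, then the nine ID digits.
WEIGHTS = (1, 9, 8, 7, 6, 5, 4, 3, 2, 1, 1)
# Shape check: uppercase letter, gender digit 1 or 2, then eight more digits.
ID_SHAPE = re.compile(r"[A-Z][12][0-9]{8}")


def is_valid_national_id(candidate):
    """Return True if candidate has the right shape and a valid checksum."""
    if not ID_SHAPE.fullmatch(candidate):
        return False
    code = LETTER_ORDER.index(candidate[0]) + 10
    digits = [code // 10, code % 10] + [int(ch) for ch in candidate[1:]]
    # The weighted sum of all eleven digits must be a multiple of 10.
    return sum(d * w for d, w in zip(digits, WEIGHTS)) % 10 == 0


sample_ok = is_valid_national_id("A123456789")   # commonly used test number
sample_bad = is_valid_national_id("A123456788")  # last digit changed
```

Running the checksum on every 10-character candidate helps cut false positives, since an arbitrary string of that shape passes only about one time in ten.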
Supported File Formats
- Plain text (.txt / .csv / .tsv)
- Excel (.xlsx) — processes cell by cell, preserves formatting
- Word (.docx) — handles both paragraphs and tables
- PDF (.pdf) — extracts text first, then processes
MCP Integration
There’s also an MCP server so you can plug it directly into Claude Code and use it seamlessly in your workflow.
Still very early stage—issues and PRs are welcome.
A Note for API / Enterprise Users
Claude API and Enterprise users can refer to Anthropic’s official ZDR (Zero Data Retention) policy. Note that ZDR is something eligible customers arrange with Anthropic, not a default; what applies by default is that API and Enterprise data isn’t used for model training. For regular subscription users, besides using a de-identification tool, remember to go into your settings and turn off “Allow my data to be used for model training.” That way Anthropic stores your data for 30 days instead of five years.