PIIScanner
Detect and redact personally identifiable information in text.
When to Use This
PIIScanner is your first line of defense when users might include sensitive personal data in prompts. Customer support chatbots, document analysis tools, HR applications, and any LLM app that processes free-text input from real people are all candidates for PII scanning. Without it, names, email addresses, phone numbers, Social Security numbers, credit card numbers, API keys, and dozens of other sensitive identifiers flow directly into your LLM — and potentially into logs, caches, and training pipelines.
PIIScanner covers three broad categories: identity and contact information (names, addresses, phone numbers), financial data (credit cards, bank accounts, IBAN), and credentials (passwords, API keys, cloud service tokens). For workflows where you need the original values back after the LLM processes the redacted text, see the deanonymize workflow.
Quick Example
```python
from meshulash_guard import Guard, Action, Condition
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(
    labels=[PIILabel.EMAIL_ADDRESS, PIILabel.PHONE_NUMBER],
    action=Action.REPLACE,
    condition=Condition.ANY,
)

result = guard.scan_input(
    "My email is sarah@company.com and my phone is 555-867-5309",
    scanners=[pii],
)

print(result.status)  # "secured"
print(result.processed_text)  # "My email is [EMAIL_ADDRESS-A1B2] and my phone is [PHONE_NUMBER-C3D4]"
print(result.placeholders)
# {"[EMAIL_ADDRESS-A1B2]": "sarah@company.com", "[PHONE_NUMBER-C3D4]": "555-867-5309"}
```
Expected output:
```text
secured
My email is [EMAIL_ADDRESS-A1B2] and my phone is [PHONE_NUMBER-C3D4]
{'[EMAIL_ADDRESS-A1B2]': 'sarah@company.com', '[PHONE_NUMBER-C3D4]': '555-867-5309'}
```
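The placeholders mapping printed in the quick example holds everything needed to put the original values back into text returned by the LLM. The sketch below does this with plain string replacement. `restore_placeholders` is a hypothetical helper written for illustration and is not part of meshulash_guard; where the SDK's own deanonymize workflow is available, prefer that.

```python
def restore_placeholders(text: str, placeholders: dict[str, str]) -> str:
    """Swap each placeholder token back to the value it replaced.

    Hypothetical helper: assumes `placeholders` maps token -> original value,
    the same shape as the `result.placeholders` dict in the quick example.
    """
    restored = text
    for token, original in placeholders.items():
        restored = restored.replace(token, original)
    return restored


# Mapping copied from the quick example output.
placeholders = {
    "[EMAIL_ADDRESS-A1B2]": "sarah@company.com",
    "[PHONE_NUMBER-C3D4]": "555-867-5309",
}

# Stand-in for an LLM reply that refers to the redacted tokens.
llm_reply = "I will email [EMAIL_ADDRESS-A1B2] and call [PHONE_NUMBER-C3D4]."

restored_reply = restore_placeholders(llm_reply, placeholders)
print(restored_reply)
# I will email sarah@company.com and call 555-867-5309.
```

Text without any placeholder tokens passes through unchanged, so the helper is safe to call on every reply.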
Labels
PIILabel has 53 members: 47 canonical labels you use individually, 5 bundle shortcuts that expand to groups of labels, and one ALL wildcard.
Individual Labels
| Label | What It Detects |
|---|---|
| PIILabel.PERSON_NAME | Full names of individuals |
| PIILabel.NATIONAL_ID | National ID numbers, SSNs, and government-issued ID numbers |
| PIILabel.DRIVER_LICENSE | Driver's license numbers |
| PIILabel.USERNAME | Usernames and login handles |
| PIILabel.TAX_NUMBER | Tax identification numbers (EIN, VAT, TIN) |
| PIILabel.ACCOUNT_ID | Account or customer IDs |
| PIILabel.BUSINESS_ID | Business registration numbers |
| PIILabel.EMAIL_ADDRESS | Email addresses (RFC 5321/5322 validated) |
| PIILabel.PHONE_NUMBER | Phone numbers in international and local formats |
| PIILabel.STREET_ADDRESS | Street addresses |
| PIILabel.CITY | City names in address context |
| PIILabel.COUNTY | County names in address context |
| PIILabel.STATE | State or province names in address context |
| PIILabel.COUNTRY | Country names in address context |
| PIILabel.POSTAL_CODE | ZIP codes and postal codes |
| PIILabel.IP_ADDRESS | IPv4 and IPv6 addresses |
| PIILabel.MAC_ADDRESS | Network hardware (MAC) addresses |
| PIILabel.URL | URLs and web addresses |
| PIILabel.CREDIT_CARD | Credit and debit card numbers (Luhn validated) |
| PIILabel.IBAN | International Bank Account Numbers |
| PIILabel.BANK_ACCOUNT | Bank account numbers |
| PIILabel.ROUTING_NUMBER | Bank routing (ABA) numbers |
| PIILabel.CRYPTO_ADDRESS | Cryptocurrency wallet addresses (BTC, ETH, etc.) |
| PIILabel.PASSWORD | Password strings (entropy-validated) |
| PIILabel.SECRET_KEY | Generic secret keys |
| PIILabel.ACCESS_TOKEN | OAuth and bearer access tokens |
| PIILabel.PRIVATE_KEY | Private key material (RSA, EC, PEM) |
| PIILabel.CONNECTION_STRING | Database and service connection strings |
| PIILabel.AWS_ACCESS_KEY | AWS access key IDs (AKIA...) |
| PIILabel.AWS_SECRET_KEY | AWS secret access keys |
| PIILabel.GCP_API_KEY | Google Cloud Platform API keys |
| PIILabel.GCP_SERVICE_ACCOUNT | GCP service account JSON credentials |
| PIILabel.OPENAI_API_KEY | OpenAI API keys (sk-...) |
| PIILabel.AZURE_API_KEY | Azure Cognitive Services and other Azure API keys |
| PIILabel.STRIPE_SECRET_KEY | Stripe secret keys (sk_live_..., sk_test_...) |
| PIILabel.STRIPE_PUBLISHABLE_KEY | Stripe publishable keys (pk_live_..., pk_test_...) |
| PIILabel.SLACK_TOKEN | Slack bot and user tokens (xoxb-..., xoxp-...) |
| PIILabel.GITHUB_TOKEN | GitHub personal access tokens and fine-grained tokens |
| PIILabel.TWILIO_API_KEY | Twilio API keys and auth tokens |
| PIILabel.SENDGRID_API_KEY | SendGrid API keys |
| PIILabel.HEROKU_API_KEY | Heroku API keys |
| PIILabel.API_KEY | Generic API keys not matched by a more specific label |
| PIILabel.TOKEN | Generic tokens not matched by a more specific label |
| PIILabel.USER_AGENT | Browser and HTTP user-agent strings |
| PIILabel.AWS_ARN | AWS resource ARNs |
| PIILabel.DATE_TIME | Dates and timestamps in a personal context |
| PIILabel.ORGANIZATION | Organization and company names in a personal context |
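The CREDIT_CARD entry mentions Luhn validation. For readers unfamiliar with it, here is a minimal sketch of the Luhn checksum in plain Python. It illustrates the general algorithm only: `luhn_is_valid` is a name chosen for this example, and how the scanner applies the check internally is not documented here.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111-1111-1111-1111"))  # True
print(luhn_is_valid("4111-1111-1111-1112"))  # False
```

A checksum like this filters out most random 16-digit strings, which is why card detection is usually less prone to false positives than a bare digit pattern.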
Bundles
Bundles are shortcuts that expand to multiple labels. Use them instead of listing every label individually.
PIILabel.PII — 23 labels (expanded by default)
The core PII bundle covers identity, contact, location, and financial labels. This is the most commonly used bundle.
| Label | What It Detects |
|---|---|
| PIILabel.PERSON_NAME | Full names of individuals |
| PIILabel.NATIONAL_ID | National ID and SSN numbers |
| PIILabel.DRIVER_LICENSE | Driver's license numbers |
| PIILabel.USERNAME | Usernames and login handles |
| PIILabel.TAX_NUMBER | Tax identification numbers |
| PIILabel.ACCOUNT_ID | Account or customer IDs |
| PIILabel.BUSINESS_ID | Business registration numbers |
| PIILabel.EMAIL_ADDRESS | Email addresses |
| PIILabel.PHONE_NUMBER | Phone numbers |
| PIILabel.STREET_ADDRESS | Street addresses |
| PIILabel.CITY | City names |
| PIILabel.COUNTY | County names |
| PIILabel.STATE | State or province names |
| PIILabel.COUNTRY | Country names |
| PIILabel.POSTAL_CODE | ZIP and postal codes |
| PIILabel.IP_ADDRESS | IPv4 and IPv6 addresses |
| PIILabel.MAC_ADDRESS | Network MAC addresses |
| PIILabel.URL | URLs and web addresses |
| PIILabel.CREDIT_CARD | Credit and debit card numbers |
| PIILabel.IBAN | International bank account numbers |
| PIILabel.BANK_ACCOUNT | Bank account numbers |
| PIILabel.ROUTING_NUMBER | Bank routing numbers |
| PIILabel.CRYPTO_ADDRESS | Cryptocurrency wallet addresses |
PIILabel.PHI — 1 label
Protected Health Information subset. It currently contains a single label; future versions may add more PHI labels as regulations require.
| Label | What It Detects |
|---|---|
| PIILabel.PERSON_NAME | Patient name |
PIILabel.PCI — 4 labels
Payment Card Industry data. Use it to help meet PCI DSS requirements.
| Label | What It Detects |
|---|---|
| PIILabel.CREDIT_CARD | Credit and debit card numbers |
| PIILabel.IBAN | International bank account numbers |
| PIILabel.BANK_ACCOUNT | Bank account numbers |
| PIILabel.ROUTING_NUMBER | Bank routing numbers |
PIILabel.SECRETS — 20 labels
All credential and secret token labels. Use when scanning developer tools, CI/CD pipelines, or any context where credentials might appear.
| Label | What It Detects |
|---|---|
| PIILabel.PASSWORD | Password strings |
| PIILabel.SECRET_KEY | Generic secret keys |
| PIILabel.ACCESS_TOKEN | OAuth and bearer tokens |
| PIILabel.PRIVATE_KEY | Private key material |
| PIILabel.CONNECTION_STRING | Database connection strings |
| PIILabel.AWS_ACCESS_KEY | AWS access key IDs |
| PIILabel.AWS_SECRET_KEY | AWS secret access keys |
| PIILabel.GCP_API_KEY | Google Cloud API keys |
| PIILabel.GCP_SERVICE_ACCOUNT | GCP service account credentials |
| PIILabel.OPENAI_API_KEY | OpenAI API keys |
| PIILabel.AZURE_API_KEY | Azure API keys |
| PIILabel.STRIPE_SECRET_KEY | Stripe secret keys |
| PIILabel.STRIPE_PUBLISHABLE_KEY | Stripe publishable keys |
| PIILabel.SLACK_TOKEN | Slack bot and user tokens |
| PIILabel.GITHUB_TOKEN | GitHub personal access tokens |
| PIILabel.TWILIO_API_KEY | Twilio API keys |
| PIILabel.SENDGRID_API_KEY | SendGrid API keys |
| PIILabel.HEROKU_API_KEY | Heroku API keys |
| PIILabel.API_KEY | Generic API keys |
| PIILabel.TOKEN | Generic tokens |
PIILabel.TECH — 2 labels
Technical identifiers that can be used to fingerprint a user or device.
| Label | What It Detects |
|---|---|
| PIILabel.USER_AGENT | Browser and HTTP user-agent strings |
| PIILabel.AWS_ARN | AWS resource ARNs |
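PIILabel.ALL expands all five bundles and deduplicates the result, giving 45 unique labels. The bundles hold 23 + 1 + 4 + 20 + 2 = 50 entries in total. PERSON_NAME appears in both PII and PHI, and CREDIT_CARD, IBAN, BANK_ACCOUNT, and ROUTING_NUMBER appear in both PII and PCI, so 5 duplicates are removed. DATE_TIME and ORGANIZATION are not listed in any bundle, so by this count ALL does not include them; list them explicitly if you need them.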
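The arithmetic can be checked with ordinary Python sets. The sketch below uses plain strings copied from the bundle tables rather than the real PIILabel members, so it runs without the SDK; it verifies the counts only and makes no claim about how the SDK expands bundles internally.

```python
# Bundle contents copied from the bundle tables, as plain strings.
PII = {
    "PERSON_NAME", "NATIONAL_ID", "DRIVER_LICENSE", "USERNAME", "TAX_NUMBER",
    "ACCOUNT_ID", "BUSINESS_ID", "EMAIL_ADDRESS", "PHONE_NUMBER",
    "STREET_ADDRESS", "CITY", "COUNTY", "STATE", "COUNTRY", "POSTAL_CODE",
    "IP_ADDRESS", "MAC_ADDRESS", "URL", "CREDIT_CARD", "IBAN",
    "BANK_ACCOUNT", "ROUTING_NUMBER", "CRYPTO_ADDRESS",
}
PHI = {"PERSON_NAME"}
PCI = {"CREDIT_CARD", "IBAN", "BANK_ACCOUNT", "ROUTING_NUMBER"}
SECRETS = {
    "PASSWORD", "SECRET_KEY", "ACCESS_TOKEN", "PRIVATE_KEY",
    "CONNECTION_STRING", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", "GCP_API_KEY",
    "GCP_SERVICE_ACCOUNT", "OPENAI_API_KEY", "AZURE_API_KEY",
    "STRIPE_SECRET_KEY", "STRIPE_PUBLISHABLE_KEY", "SLACK_TOKEN",
    "GITHUB_TOKEN", "TWILIO_API_KEY", "SENDGRID_API_KEY", "HEROKU_API_KEY",
    "API_KEY", "TOKEN",
}
TECH = {"USER_AGENT", "AWS_ARN"}

bundles = [PII, PHI, PCI, SECRETS, TECH]

# Total entries before deduplication, then the deduplicated union.
total_entries = sum(len(bundle) for bundle in bundles)
all_labels = set().union(*bundles)

print(total_entries)    # 50
print(len(all_labels))  # 45
```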
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| labels | list[PIILabel] | required | One or more labels to detect. Cannot be empty. Use PIILabel.ALL to detect everything. |
| action | Action | Action.REPLACE | Default action when a label is detected. |
| condition | Condition | Condition.ANY | Gating condition that decides when the scanner triggers. |
| overrides | dict[PIILabel, Action] | None | Per-label action overrides. Labels in this dict use the specified action instead of action. |
| threshold | float | None | Confidence threshold (0.0–1.0). Detections below this score are ignored. |
| allowlist | list[str] | None | Values to allow through even when detected (e.g., ["test@example.com"]). |
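The threshold and allowlist parameters both act as filters on detections. The sketch below models that filtering locally so the effect is easy to see. Everything in it is an assumption made for illustration: the `Detection` dataclass, its `score` field, and exact-string allowlist matching are hypothetical and may not match how the service represents or compares detections.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """Hypothetical detection record; the SDK's real result type may differ."""
    label: str
    value: str
    score: float


def filter_detections(
    detections: list[Detection],
    threshold: Optional[float] = None,
    allowlist: Optional[list[str]] = None,
) -> list[Detection]:
    """Keep detections at or above `threshold` whose value is not allowlisted."""
    allowed = set(allowlist or [])
    kept = []
    for det in detections:
        if threshold is not None and det.score < threshold:
            continue  # below the confidence threshold: ignored
        if det.value in allowed:
            continue  # explicitly allowed through (exact match assumed)
        kept.append(det)
    return kept


detections = [
    Detection("EMAIL_ADDRESS", "test@example.com", 0.99),
    Detection("EMAIL_ADDRESS", "sarah@company.com", 0.97),
    Detection("PERSON_NAME", "Sky", 0.40),
]

kept = filter_detections(detections, threshold=0.5, allowlist=["test@example.com"])
print([d.value for d in kept])  # ['sarah@company.com']
```

With both parameters left as None, nothing is filtered and every detection is acted on.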
Per-Label Overrides
Per-label overrides let you apply different actions to different labels. For example: replace email addresses (removing them from the text) but only log phone numbers (keeping them visible while recording the detection server-side).
```python
from meshulash_guard import Guard, Action, Condition
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(
    labels=[PIILabel.EMAIL_ADDRESS, PIILabel.PHONE_NUMBER, PIILabel.CREDIT_CARD],
    action=Action.REPLACE,  # Default: replace emails and credit cards
    overrides={
        PIILabel.PHONE_NUMBER: Action.LOG,  # Only log phone numbers
    },
)

result = guard.scan_input(
    "Contact: alice@example.com, 212-555-0100, card 4111-1111-1111-1111",
    scanners=[pii],
)

print(result.status)
# "secured"
print(result.processed_text)
# "Contact: [EMAIL_ADDRESS-A1B2], 212-555-0100, card [CREDIT_CARD-C3D4]"
# Note: phone number is unchanged (Action.LOG keeps original text)
```
When overrides are present, the SDK sends multiple guardline specs to the server — one per distinct action group. This is handled automatically; your code doesn't need to change.
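To make "one per distinct action group" concrete, the sketch below groups labels by their effective action. It is a local illustration using plain strings, not the SDK's internal code; `group_labels_by_action` is a hypothetical helper, and the actual shape of the specs sent to the server is not shown in this page.

```python
from collections import defaultdict


def group_labels_by_action(labels, default_action, overrides=None):
    """Group labels by effective action: the override if present, else the default."""
    overrides = overrides or {}
    groups = defaultdict(list)
    for label in labels:
        groups[overrides.get(label, default_action)].append(label)
    return dict(groups)


# Same configuration as the overrides example, with strings standing in
# for PIILabel and Action members.
groups = group_labels_by_action(
    labels=["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
    default_action="REPLACE",
    overrides={"PHONE_NUMBER": "LOG"},
)
print(groups)
# {'REPLACE': ['EMAIL_ADDRESS', 'CREDIT_CARD'], 'LOG': ['PHONE_NUMBER']}
```

Under this reading, the overrides example would produce two groups, one for REPLACE and one for LOG.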
Using Bundles
Bundles are the fastest way to cover broad categories without listing every label.
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

# Scan for all core PII (23 labels) using the PII bundle
pii = PIIScanner(
    labels=[PIILabel.PII],
    action=Action.REPLACE,
)

result = guard.scan_input(
    "Hi, I'm Marcus Chen, reachable at marcus@corp.io or 555-234-5678",
    scanners=[pii],
)

print(result.status)  # "secured"
print(result.processed_text)
# "Hi, I'm [PERSON_NAME-A1B2], reachable at [EMAIL_ADDRESS-C3D4] or [PHONE_NUMBER-E5F6]"
```
To scan everything with one label:
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(labels=[PIILabel.ALL], action=Action.REPLACE)

result = guard.scan_input(
    "My AWS key is AKIAIOSFODNN7EXAMPLE and my GitHub token is ghp_abc123def456",
    scanners=[pii],
)

print(result.status)  # "secured"
print(result.processed_text)
# "My AWS key is [AWS_ACCESS_KEY-A1B2] and my GitHub token is [GITHUB_TOKEN-C3D4]"
```
Actions and Conditions
PIIScanner defaults to Action.REPLACE (replace detected text with [LABEL-HASH] placeholders) and Condition.ANY (trigger if at least one label is detected). You can use Action.BLOCK to reject the entire input if PII is found, or Action.LOG to record detections without modifying text.
See the Concepts page for the full reference on Actions and Conditions.
scan_input Example
Scanning user input before it reaches your LLM — the most common use case.
```python
from meshulash_guard import Guard, Action, Condition
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

# Use bundles to cover broad categories
pii = PIIScanner(
    labels=[PIILabel.PII, PIILabel.SECRETS],
    action=Action.REPLACE,
    condition=Condition.ANY,
)

user_prompt = (
    "I need help with my account. My name is Jordan Rivera, "
    "email jordan@example.org, and my OpenAI key is sk-proj-abc123xyz789. "
    "SSN: 123-45-6789."
)

result = guard.scan_input(user_prompt, scanners=[pii])

print(f"Status: {result.status}")
print(f"Processed: {result.processed_text}")
print(f"Detections: {len(result.placeholders)} items redacted")

# Send result.processed_text to your LLM, not the original!
```
Expected output:
```text
Status: secured
Processed: I need help with my account. My name is [PERSON_NAME-A1B2], email [EMAIL_ADDRESS-C3D4], and my OpenAI key is [OPENAI_API_KEY-E5F6]. SSN: [NATIONAL_ID-G7H8].
Detections: 4 items redacted
```
scan_output Example
Scanning the LLM's response to catch any PII the model echoed back.
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(
    labels=[PIILabel.EMAIL_ADDRESS, PIILabel.PHONE_NUMBER, PIILabel.CREDIT_CARD],
    action=Action.REPLACE,
)

# Simulate an LLM response that echoed back user PII
llm_response = (
    "Sure! I've found your account linked to user@domain.com. "
    "Your card 4111-1111-1111-1111 will be charged. "
    "We'll call you at 800-555-0199 to confirm."
)

result = guard.scan_output(llm_response, scanners=[pii])

print(f"Status: {result.status}")
print(f"Cleaned: {result.processed_text}")
```
Expected output: