PIIScanner

Detect and redact personally identifiable information in text.

When to Use This

PIIScanner is your first line of defense when users might include sensitive personal data in prompts. Customer support chatbots, document analysis tools, HR applications, and any LLM app that processes free-text input from real people are all candidates for PII scanning. Without it, names, email addresses, phone numbers, Social Security numbers, credit card numbers, API keys, and dozens of other sensitive identifiers flow directly into your LLM — and potentially into logs, caches, and training pipelines.

PIIScanner covers three broad categories: identity and contact information (names, addresses, phone numbers), financial data (credit cards, bank accounts, IBAN), and credentials (passwords, API keys, cloud service tokens). For workflows where you need the original values back after the LLM processes the redacted text, see the deanonymize workflow.

Quick Example

from meshulash_guard import Guard, Action, Condition
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(
    labels=[PIILabel.EMAIL_ADDRESS, PIILabel.PHONE_NUMBER],
    action=Action.REPLACE,
    condition=Condition.ANY,
)

result = guard.scan_input(
    "My email is sarah@company.com and my phone is 555-867-5309",
    scanners=[pii],
)

print(result.status)          # "secured"
print(result.processed_text)  # "My email is [EMAIL_ADDRESS-A1B2] and my phone is [PHONE_NUMBER-C3D4]"
print(result.placeholders)
# {"[EMAIL_ADDRESS-A1B2]": "sarah@company.com", "[PHONE_NUMBER-C3D4]": "555-867-5309"}

Expected output:

secured
My email is [EMAIL_ADDRESS-A1B2] and my phone is [PHONE_NUMBER-C3D4]
{'[EMAIL_ADDRESS-A1B2]': 'sarah@company.com', '[PHONE_NUMBER-C3D4]': '555-867-5309'}
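
The placeholders mapping is what makes the redaction reversible. As a minimal sketch, assuming the mapping has the shape printed above (placeholder string to original value), the original values could be put back into an LLM reply with plain string replacement. The helper name restore_placeholders is hypothetical and not part of meshulash_guard; the SDK's own deanonymize workflow may work differently.

```python
def restore_placeholders(text: str, placeholders: dict) -> str:
    """Swap each placeholder in `text` back to its original value."""
    # Replace longer placeholders first so one can never clobber part of another.
    for placeholder in sorted(placeholders, key=len, reverse=True):
        text = text.replace(placeholder, placeholders[placeholder])
    return text


# Mapping copied from the example output above.
placeholders = {
    "[EMAIL_ADDRESS-A1B2]": "sarah@company.com",
    "[PHONE_NUMBER-C3D4]": "555-867-5309",
}

# A made-up LLM reply that still contains the placeholders.
llm_reply = "I will email [EMAIL_ADDRESS-A1B2] and then call [PHONE_NUMBER-C3D4]."
restored = restore_placeholders(llm_reply, placeholders)
print(restored)
# I will email sarah@company.com and then call 555-867-5309.
```

Only restore values in text that is shown back to the original user; the restored string contains the raw PII again.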

Labels

PIILabel has 53 members: 47 canonical labels you use individually, 5 bundle shortcuts that expand to groups of labels, and one ALL wildcard.

Individual Labels

Label What It Detects
PIILabel.PERSON_NAME Full names of individuals
PIILabel.NATIONAL_ID National ID numbers, SSNs, and government-issued ID numbers
PIILabel.DRIVER_LICENSE Driver's license numbers
PIILabel.USERNAME Usernames and login handles
PIILabel.TAX_NUMBER Tax identification numbers (EIN, VAT, TIN)
PIILabel.ACCOUNT_ID Account or customer IDs
PIILabel.BUSINESS_ID Business registration numbers
PIILabel.EMAIL_ADDRESS Email addresses (RFC 5321/5322 validated)
PIILabel.PHONE_NUMBER Phone numbers in international and local formats
PIILabel.STREET_ADDRESS Street addresses
PIILabel.CITY City names in address context
PIILabel.COUNTY County names in address context
PIILabel.STATE State or province names in address context
PIILabel.COUNTRY Country names in address context
PIILabel.POSTAL_CODE ZIP codes and postal codes
PIILabel.IP_ADDRESS IPv4 and IPv6 addresses
PIILabel.MAC_ADDRESS Network hardware (MAC) addresses
PIILabel.URL URLs and web addresses
PIILabel.CREDIT_CARD Credit and debit card numbers (Luhn validated)
PIILabel.IBAN International Bank Account Numbers
PIILabel.BANK_ACCOUNT Bank account numbers
PIILabel.ROUTING_NUMBER Bank routing (ABA) numbers
PIILabel.CRYPTO_ADDRESS Cryptocurrency wallet addresses (BTC, ETH, etc.)
PIILabel.PASSWORD Password strings (entropy-validated)
PIILabel.SECRET_KEY Generic secret keys
PIILabel.ACCESS_TOKEN OAuth and bearer access tokens
PIILabel.PRIVATE_KEY Private key material (RSA, EC, PEM)
PIILabel.CONNECTION_STRING Database and service connection strings
PIILabel.AWS_ACCESS_KEY AWS access key IDs (AKIA...)
PIILabel.AWS_SECRET_KEY AWS secret access keys
PIILabel.GCP_API_KEY Google Cloud Platform API keys
PIILabel.GCP_SERVICE_ACCOUNT GCP service account JSON credentials
PIILabel.OPENAI_API_KEY OpenAI API keys (sk-...)
PIILabel.AZURE_API_KEY Azure Cognitive Services and other Azure API keys
PIILabel.STRIPE_SECRET_KEY Stripe secret keys (sk_live_..., sk_test_...)
PIILabel.STRIPE_PUBLISHABLE_KEY Stripe publishable keys (pk_live_..., pk_test_...)
PIILabel.SLACK_TOKEN Slack bot and user tokens (xoxb-..., xoxp-...)
PIILabel.GITHUB_TOKEN GitHub personal access tokens and fine-grained tokens
PIILabel.TWILIO_API_KEY Twilio API keys and auth tokens
PIILabel.SENDGRID_API_KEY SendGrid API keys
PIILabel.HEROKU_API_KEY Heroku API keys
PIILabel.API_KEY Generic API keys not matched by a more specific label
PIILabel.TOKEN Generic tokens not matched by a more specific label
PIILabel.USER_AGENT Browser and HTTP user-agent strings
PIILabel.AWS_ARN AWS resource ARNs
PIILabel.DATE_TIME Dates and timestamps in a personal context
PIILabel.ORGANIZATION Organization and company names in a personal context
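
The CREDIT_CARD row mentions Luhn validation. For readers unfamiliar with it, here is a minimal sketch of the standard Luhn checksum. It illustrates the general algorithm only, not the scanner's actual implementation, and luhn_valid is a hypothetical helper name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111-1111-1111-1112"))  # False
```

A checksum like this helps separate real card numbers from arbitrary 16-digit strings, which reduces false positives.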

Bundles

Bundles are shortcuts that expand to multiple labels. Use them instead of listing every label individually.

PIILabel.PII — 23 labels (expanded by default)

The core PII bundle covers identity, contact, location, and financial labels. This is the most commonly used bundle.

Label What It Detects
PIILabel.PERSON_NAME Full names of individuals
PIILabel.NATIONAL_ID National ID and SSN numbers
PIILabel.DRIVER_LICENSE Driver's license numbers
PIILabel.USERNAME Usernames and login handles
PIILabel.TAX_NUMBER Tax identification numbers
PIILabel.ACCOUNT_ID Account or customer IDs
PIILabel.BUSINESS_ID Business registration numbers
PIILabel.EMAIL_ADDRESS Email addresses
PIILabel.PHONE_NUMBER Phone numbers
PIILabel.STREET_ADDRESS Street addresses
PIILabel.CITY City names
PIILabel.COUNTY County names
PIILabel.STATE State or province names
PIILabel.COUNTRY Country names
PIILabel.POSTAL_CODE ZIP and postal codes
PIILabel.IP_ADDRESS IPv4 and IPv6 addresses
PIILabel.MAC_ADDRESS Network MAC addresses
PIILabel.URL URLs and web addresses
PIILabel.CREDIT_CARD Credit and debit card numbers
PIILabel.IBAN International bank account numbers
PIILabel.BANK_ACCOUNT Bank account numbers
PIILabel.ROUTING_NUMBER Bank routing numbers
PIILabel.CRYPTO_ADDRESS Cryptocurrency wallet addresses

PIILabel.PHI — 1 label

Protected Health Information subset. This bundle may grow in future versions as regulations require additional PHI labels.

Label What It Detects
PIILabel.PERSON_NAME Patient name

PIILabel.PCI — 4 labels

Payment Card Industry data. Use to comply with PCI DSS requirements.

Label What It Detects
PIILabel.CREDIT_CARD Credit and debit card numbers
PIILabel.IBAN International bank account numbers
PIILabel.BANK_ACCOUNT Bank account numbers
PIILabel.ROUTING_NUMBER Bank routing numbers

PIILabel.SECRETS — 20 labels

All credential and secret token labels. Use when scanning developer tools, CI/CD pipelines, or any context where credentials might appear.

Label What It Detects
PIILabel.PASSWORD Password strings
PIILabel.SECRET_KEY Generic secret keys
PIILabel.ACCESS_TOKEN OAuth and bearer tokens
PIILabel.PRIVATE_KEY Private key material
PIILabel.CONNECTION_STRING Database connection strings
PIILabel.AWS_ACCESS_KEY AWS access key IDs
PIILabel.AWS_SECRET_KEY AWS secret access keys
PIILabel.GCP_API_KEY Google Cloud API keys
PIILabel.GCP_SERVICE_ACCOUNT GCP service account credentials
PIILabel.OPENAI_API_KEY OpenAI API keys
PIILabel.AZURE_API_KEY Azure API keys
PIILabel.STRIPE_SECRET_KEY Stripe secret keys
PIILabel.STRIPE_PUBLISHABLE_KEY Stripe publishable keys
PIILabel.SLACK_TOKEN Slack bot and user tokens
PIILabel.GITHUB_TOKEN GitHub personal access tokens
PIILabel.TWILIO_API_KEY Twilio API keys
PIILabel.SENDGRID_API_KEY SendGrid API keys
PIILabel.HEROKU_API_KEY Heroku API keys
PIILabel.API_KEY Generic API keys
PIILabel.TOKEN Generic tokens

PIILabel.TECH — 2 labels

Technical identifiers that can be used to fingerprint a user or device.

Label What It Detects
PIILabel.USER_AGENT Browser and HTTP user-agent strings
PIILabel.AWS_ARN AWS resource ARNs

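PIILabel.ALL expands all five bundles and deduplicates the result, giving 45 unique labels. PERSON_NAME appears in both PII and PHI, and CREDIT_CARD, IBAN, BANK_ACCOUNT, and ROUTING_NUMBER appear in both PII and PCI, so 23 + 1 + 4 + 20 + 2 = 50 entries minus 5 duplicates leaves 45. Two of the 47 individual labels, DATE_TIME and ORGANIZATION, are not listed in any bundle above, so they would need to be passed explicitly if you want them detected.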
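
The deduplication arithmetic can be checked with plain Python sets. This sketch uses the label names from the bundle tables above as strings; it does not import the SDK and says nothing about how PIILabel.ALL is implemented internally.

```python
# Label names copied from the bundle tables above, as plain strings.
PII = {
    "PERSON_NAME", "NATIONAL_ID", "DRIVER_LICENSE", "USERNAME", "TAX_NUMBER",
    "ACCOUNT_ID", "BUSINESS_ID", "EMAIL_ADDRESS", "PHONE_NUMBER",
    "STREET_ADDRESS", "CITY", "COUNTY", "STATE", "COUNTRY", "POSTAL_CODE",
    "IP_ADDRESS", "MAC_ADDRESS", "URL", "CREDIT_CARD", "IBAN",
    "BANK_ACCOUNT", "ROUTING_NUMBER", "CRYPTO_ADDRESS",
}
PHI = {"PERSON_NAME"}
PCI = {"CREDIT_CARD", "IBAN", "BANK_ACCOUNT", "ROUTING_NUMBER"}
SECRETS = {
    "PASSWORD", "SECRET_KEY", "ACCESS_TOKEN", "PRIVATE_KEY",
    "CONNECTION_STRING", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", "GCP_API_KEY",
    "GCP_SERVICE_ACCOUNT", "OPENAI_API_KEY", "AZURE_API_KEY",
    "STRIPE_SECRET_KEY", "STRIPE_PUBLISHABLE_KEY", "SLACK_TOKEN",
    "GITHUB_TOKEN", "TWILIO_API_KEY", "SENDGRID_API_KEY", "HEROKU_API_KEY",
    "API_KEY", "TOKEN",
}
TECH = {"USER_AGENT", "AWS_ARN"}

bundles = [PII, PHI, PCI, SECRETS, TECH]

# Sum of bundle sizes counts shared labels more than once.
total_with_duplicates = sum(len(bundle) for bundle in bundles)
# A set union keeps each label exactly once.
all_labels = set().union(*bundles)

print(total_with_duplicates)  # 50
print(len(all_labels))        # 45
```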

Parameters

Parameter Type Default Description
labels list[PIILabel] required One or more labels to detect. Cannot be empty. Use PIILabel.ALL to expand every bundle (45 unique labels).
action Action Action.REPLACE Default action when a label is detected.
condition Condition Condition.ANY Gating condition — when the scanner triggers.
overrides dict[PIILabel, Action] None Per-label action overrides. Labels in this dict use the specified action instead of action.
threshold float None Confidence threshold (0.0–1.0). Detections below this score are ignored.
allowlist list[str] None Values to allow through even when detected (e.g., ["test@example.com"]).
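
To make the threshold and allowlist rows concrete, the sketch below filters a list of detections the way those descriptions read. It runs locally on made-up data. The Detection class, its fields, and keep_detection are hypothetical, and exact-match allowlist comparison is an assumption; none of this shows how the service actually implements either parameter.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Detection:
    """Hypothetical shape of one detection, for illustration only."""
    label: str
    value: str
    score: float  # confidence between 0.0 and 1.0


def keep_detection(
    detection: Detection,
    threshold: Optional[float] = None,
    allowlist: Optional[Sequence[str]] = None,
) -> bool:
    """Return True if the detection should still be acted on."""
    # Detections scoring below the threshold are ignored.
    if threshold is not None and detection.score < threshold:
        return False
    # Allowlisted values pass through untouched (exact match assumed).
    if allowlist is not None and detection.value in allowlist:
        return False
    return True


detections = [
    Detection("EMAIL_ADDRESS", "test@example.com", 0.99),
    Detection("EMAIL_ADDRESS", "sarah@company.com", 0.97),
    Detection("PERSON_NAME", "Will", 0.35),
]

kept = [
    d.value
    for d in detections
    if keep_detection(d, threshold=0.5, allowlist=["test@example.com"])
]
print(kept)  # ['sarah@company.com']
```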

Per-Label Overrides

Per-label overrides let you apply different actions to different labels. For example: replace email addresses (removing them from the text) but only log phone numbers (keeping them visible while recording the detection server-side).

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(
    labels=[PIILabel.EMAIL_ADDRESS, PIILabel.PHONE_NUMBER, PIILabel.CREDIT_CARD],
    action=Action.REPLACE,    # Default: replace emails and credit cards
    overrides={
        PIILabel.PHONE_NUMBER: Action.LOG,   # Only log phone numbers
    },
)

result = guard.scan_input(
    "Contact: alice@example.com, 212-555-0100, card 4111-1111-1111-1111",
    scanners=[pii],
)

print(result.status)
# "secured"

print(result.processed_text)
# "Contact: [EMAIL_ADDRESS-A1B2], 212-555-0100, card [CREDIT_CARD-C3D4]"
# Note: phone number is unchanged (Action.LOG keeps original text)

Expected output:

secured
Contact: [EMAIL_ADDRESS-A1B2], 212-555-0100, card [CREDIT_CARD-C3D4]

When overrides are present, the SDK sends multiple guardline specs to the server — one per distinct action group. This is handled automatically; your code doesn't need to change.
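
The grouping described above can be pictured as follows. This is a sketch of the idea only: labels and actions are plain strings, and group_by_action is a hypothetical helper, not the SDK's internal code or its wire format.

```python
from collections import defaultdict


def group_by_action(labels, default_action, overrides=None):
    """Group labels by the action that applies to each one."""
    overrides = overrides or {}
    groups = defaultdict(list)
    for label in labels:
        # An override wins; otherwise the scanner-wide default applies.
        groups[overrides.get(label, default_action)].append(label)
    return dict(groups)


groups = group_by_action(
    ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
    default_action="REPLACE",
    overrides={"PHONE_NUMBER": "LOG"},
)
print(groups)
# {'REPLACE': ['EMAIL_ADDRESS', 'CREDIT_CARD'], 'LOG': ['PHONE_NUMBER']}
```

With the configuration from the example, two distinct actions are in play, so two groups result.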

Using Bundles

Bundles are the fastest way to cover broad categories without listing every label.

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

# Scan for all core PII (23 labels) using the PII bundle
pii = PIIScanner(
    labels=[PIILabel.PII],
    action=Action.REPLACE,
)

result = guard.scan_input(
    "Hi, I'm Marcus Chen, reachable at marcus@corp.io or 555-234-5678",
    scanners=[pii],
)

print(result.status)          # "secured"
print(result.processed_text)
# "Hi, I'm [PERSON_NAME-A1B2], reachable at [EMAIL_ADDRESS-C3D4] or [PHONE_NUMBER-E5F6]"

Expected output:

secured
Hi, I'm [PERSON_NAME-A1B2], reachable at [EMAIL_ADDRESS-C3D4] or [PHONE_NUMBER-E5F6]

To scan for every bundled label at once, use PIILabel.ALL:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(labels=[PIILabel.ALL], action=Action.REPLACE)

result = guard.scan_input(
    "My AWS key is AKIAIOSFODNN7EXAMPLE and my GitHub token is ghp_abc123def456",
    scanners=[pii],
)

print(result.status)          # "secured"
print(result.processed_text)
# "My AWS key is [AWS_ACCESS_KEY-A1B2] and my GitHub token is [GITHUB_TOKEN-C3D4]"

Expected output:

secured
My AWS key is [AWS_ACCESS_KEY-A1B2] and my GitHub token is [GITHUB_TOKEN-C3D4]

Actions and Conditions

PIIScanner defaults to Action.REPLACE (replace detected text with [LABEL-HASH] placeholders) and Condition.ANY (trigger if at least one label is detected). You can use Action.BLOCK to reject the entire input if PII is found, or Action.LOG to record detections without modifying text.

See the Concepts page for the full reference on Actions and Conditions.
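
Because Action.REPLACE leaves [LABEL-HASH] placeholders in the text, downstream code sometimes needs to locate them, for example to check that an LLM reply kept them intact. Below is a minimal sketch, assuming placeholders look like the ones in this page's examples (an uppercase label, a hyphen, and a short alphanumeric suffix). The exact format is not specified here, so treat the pattern and the find_placeholders helper as assumptions.

```python
import re

# Assumed shape: [LABEL-SUFFIX], e.g. [EMAIL_ADDRESS-A1B2].
PLACEHOLDER_RE = re.compile(r"\[([A-Z][A-Z0-9_]*)-([A-Z0-9]+)\]")


def find_placeholders(text: str) -> list:
    """Return (label, full_placeholder) pairs in order of appearance."""
    return [(m.group(1), m.group(0)) for m in PLACEHOLDER_RE.finditer(text)]


text = "Contact: [EMAIL_ADDRESS-A1B2], 212-555-0100, card [CREDIT_CARD-C3D4]"
found = find_placeholders(text)
print(found)
# [('EMAIL_ADDRESS', '[EMAIL_ADDRESS-A1B2]'), ('CREDIT_CARD', '[CREDIT_CARD-C3D4]')]
```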

scan_input Example

Scanning user input before it reaches your LLM — the most common use case.

from meshulash_guard import Guard, Action, Condition
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

# Use bundles to cover broad categories
pii = PIIScanner(
    labels=[PIILabel.PII, PIILabel.SECRETS],
    action=Action.REPLACE,
    condition=Condition.ANY,
)

user_prompt = (
    "I need help with my account. My name is Jordan Rivera, "
    "email jordan@example.org, and my OpenAI key is sk-proj-abc123xyz789. "
    "SSN: 123-45-6789."
)

result = guard.scan_input(user_prompt, scanners=[pii])

print(f"Status: {result.status}")
print(f"Processed: {result.processed_text}")
print(f"Detections: {len(result.placeholders)} items redacted")

# Send result.processed_text to your LLM — not the original!

Expected output:

Status: secured
Processed: I need help with my account. My name is [PERSON_NAME-A1B2], email [EMAIL_ADDRESS-C3D4], and my OpenAI key is [OPENAI_API_KEY-E5F6]. SSN: [NATIONAL_ID-G7H8].
Detections: 4 items redacted

scan_output Example

Scanning the LLM's response to catch any PII the model echoed back.

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import PIIScanner, PIILabel

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

pii = PIIScanner(
    labels=[PIILabel.EMAIL_ADDRESS, PIILabel.PHONE_NUMBER, PIILabel.CREDIT_CARD],
    action=Action.REPLACE,
)

# Simulate an LLM response that echoed back user PII
llm_response = (
    "Sure! I've found your account linked to user@domain.com. "
    "Your card ending in 4111-1111-1111-1111 will be charged. "
    "We'll call you at 800-555-0199 to confirm."
)

result = guard.scan_output(llm_response, scanners=[pii])

print(f"Status: {result.status}")
print(f"Cleaned: {result.processed_text}")

Expected output:

Status: secured
Cleaned: Sure! I've found your account linked to [EMAIL_ADDRESS-A1B2]. Your card ending in [CREDIT_CARD-C3D4] will be charged. We'll call you at [PHONE_NUMBER-E5F6] to confirm.