BanSubstringsScanner

Block or log text containing any term from a configurable banned-word list.

When to Use This

BanSubstringsScanner enforces an explicit list of prohibited terms — competitor names, offensive words, confidential product names, regulated content, or any string that should never appear in input or output. The list can be provided inline as a Python list or loaded from a plain-text file at construction time.

Compared to classifier-based scanners, BanSubstringsScanner is fully deterministic: you control exactly which terms trigger it, and it fires the same way every time. Use it when you need guaranteed enforcement on a known list rather than probabilistic detection. Common use cases include preventing users from mentioning competitor brands, stopping LLM responses from disclosing internal project names, and enforcing regulatory keyword policies in financial or healthcare applications.

Quick Example

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import BanSubstringsScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

ban = BanSubstringsScanner(
    substrings=["CompetitorX", "CompetitorY", "rival-product"],
    action=Action.BLOCK,
)

result = guard.scan_input(
    "I heard CompetitorX has a better deal than you. Should I switch?",
    scanners=[ban],
)

print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged; Action.BLOCK does not modify it

Expected output:

blocked
I heard CompetitorX has a better deal than you. Should I switch?

Match Types

BanSubstringsScanner supports two matching modes via the MatchType enum.

| Value | Behavior |
| --- | --- |
| MatchType.STRING | Substring can appear anywhere in the text (default). Matches "hack" inside "hackathon". |
| MatchType.WORD | Substring must match as a complete word. Matches "hack" in "the hack was" but NOT in "hackathon". |

Use MatchType.WORD when your banned terms are common English words that could appear as parts of legitimate longer words. For example, banning "gun" with MatchType.STRING would incorrectly block "begun" and "penguin" — switch to MatchType.WORD to avoid those false positives.
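The difference between the two modes can be sketched in plain Python. This is illustrative standard-library code, not the scanner's internal implementation, but whole-word matching is conventionally done with regex word boundaries:

```python
import re

def matches(term: str, text: str, whole_word: bool) -> bool:
    """Case-insensitive check mirroring MatchType.STRING vs MatchType.WORD."""
    if whole_word:
        # \b anchors the term to word boundaries, so "gun" will not
        # match inside "begun" or "penguin".
        return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None
    return term.lower() in text.lower()

print(matches("gun", "It has begun.", whole_word=False))  # True  (substring hit)
print(matches("gun", "It has begun.", whole_word=True))   # False (no whole-word match)
print(matches("gun", "He drew a gun.", whole_word=True))  # True
```

Note the `re.escape` call: without it, banned terms containing regex metacharacters (such as "rival-product" or "c++") could change the meaning of the pattern.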

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| substrings | list[str] \| str \| Path | required | Banned terms: a Python list of strings, a file path as a string, or a pathlib.Path to a text file with one term per line (blank lines ignored). The file is read at construction time; I/O errors raise ValueError immediately. |
| match_type | MatchType | MatchType.STRING | Matching mode: STRING (substring anywhere) or WORD (whole-word boundary match). |
| case_sensitive | bool | False | Whether matching is case-sensitive. Defaults to False (case-insensitive). |
| action | Action | Action.BLOCK | Action when a banned term is found. |

Actions and Conditions

BanSubstringsScanner defaults to Action.BLOCK because the explicit ban list represents a deliberate policy decision — if a term is on the list, it should not pass. Use Action.LOG when monitoring for terms you may want to ban in the future but are still gathering data on.

Set case_sensitive=True when you need to distinguish between a proper noun and its lowercase form — for example, blocking "ACME" (a brand name) without blocking "acme" (common use).
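The effect of case_sensitive can be shown with a minimal plain-Python check (again an illustrative sketch, not the scanner's implementation):

```python
def is_banned(term: str, text: str, case_sensitive: bool) -> bool:
    """Substring check mirroring the case_sensitive parameter."""
    if case_sensitive:
        return term in text
    return term.lower() in text.lower()

# With case_sensitive=True, the brand name matches but the common noun does not.
print(is_banned("ACME", "ACME products ship today.", case_sensitive=True))  # True
print(is_banned("ACME", "the acme of perfection", case_sensitive=True))     # False
print(is_banned("ACME", "the acme of perfection", case_sensitive=False))    # True
```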

See the Concepts page for the full reference on Actions and Conditions.

scan_input Example

Loading banned terms from a file and using MatchType.WORD to avoid false positives on partial matches:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import BanSubstringsScanner, MatchType

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

# banned-terms.txt contains one term per line:
# hack
# exploit
# bypass
# jailbreak
ban = BanSubstringsScanner(
    substrings="banned-terms.txt",
    match_type=MatchType.WORD,
    action=Action.BLOCK,
)

messages = [
    "Is there a way to bypass the age verification?",
    "I attended a hackathon last weekend.",
    "Can you help me exploit this discount code?",
]

for message in messages:
    result = guard.scan_input(message, scanners=[ban])
    print(f"[{result.status}] {message[:55]}...")

Expected output:

[blocked] Is there a way to bypass the age verification?...
[passed] I attended a hackathon last weekend....
[blocked] Can you help me exploit this discount code?...

scan_output Example

Preventing an LLM from mentioning competitor names or internal project codenames in its responses:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import BanSubstringsScanner, MatchType

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

ban = BanSubstringsScanner(
    substrings=["CompetitorX", "CompetitorY", "Project Titan", "internal-api"],
    match_type=MatchType.WORD,
    case_sensitive=False,
    action=Action.BLOCK,
)

llm_response = (
    "Based on your needs, you might also want to consider CompetitorX, "
    "which offers similar features at a lower price point."
)

result = guard.scan_output(llm_response, scanners=[ban])

if result.status == "blocked":
    print("LLM response blocked — model mentioned a banned term.")
else:
    print(result.processed_text)

Expected output:

LLM response blocked — model mentioned a banned term.