LanguageScanner

Block or log text written in languages outside your application's supported set.

When to Use This

LanguageScanner detects the language of incoming text and triggers when that language is not on your allow-list. Use it when your LLM application is designed for a specific language audience — for example, a Hebrew-language customer support bot that should not process English, French, or Arabic input.

Beyond enforcement, LanguageScanner is useful for operational visibility. Deploying with Action.LOG shows you which languages real users actually submit, helping you tune your allow-list before enabling hard blocks. You can also use overrides to give different languages different actions — log some, block others — without multiple scanner instances.

Quick Example

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import LanguageScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

language = LanguageScanner(
    allowed_languages=["en"],
    action=Action.BLOCK,
)

result = guard.scan_input(
    "Bonjour, je voudrais annuler ma commande.",
    scanners=[language],
)

print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged (Action.BLOCK keeps text)

Expected output:

blocked
Bonjour, je voudrais annuler ma commande.

How Language Detection Works

Pass allowed_languages as a list of ISO 639-1 two-letter codes (e.g., ["en", "he"]). Text detected as a language not in that list triggers the scanner. Detection works for most common world languages.

Detection uses a minimum confidence threshold (confidence_threshold, default 0.8). If the system cannot determine the language of the input with at least that level of confidence — for example, a very short message or highly ambiguous text — the scanner does not trigger. Lower the threshold to catch more borderline cases; raise it to reduce false positives on ambiguous input.

# Only allow English and Hebrew; log Arabic instead of blocking it
language = LanguageScanner(
    allowed_languages=["en", "he"],
    confidence_threshold=0.75,
    action=Action.BLOCK,
    overrides={"ar": Action.LOG},
)
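
The triggering rule described above can be sketched in plain Python. This only illustrates the documented behavior and is not the library's implementation: the `decide` function and the local `Action` enum are hypothetical stand-ins, and the detected language and its confidence are passed in as arguments rather than computed.

```python
from enum import Enum


class Action(Enum):
    # Local stand-in for meshulash_guard.Action so the sketch runs on its own.
    BLOCK = "block"
    LOG = "log"


def decide(detected, confidence, allowed_languages,
           confidence_threshold=0.8, action=Action.BLOCK, overrides=None):
    """Return the action to apply, or None when the scanner does not trigger."""
    # Detection below the threshold is treated as "language unknown": no trigger.
    if confidence < confidence_threshold:
        return None
    # Languages on the allow-list never trigger.
    if detected in allowed_languages:
        return None
    # A per-language override wins over the scanner's default action.
    return (overrides or {}).get(detected, action)


print(decide("fr", 0.95, ["en", "he"]))                                # Action.BLOCK
print(decide("ar", 0.90, ["en", "he"], overrides={"ar": Action.LOG}))  # Action.LOG
print(decide("fr", 0.60, ["en", "he"]))                                # None
```

The third call shows why short or ambiguous input passes through: a confident detection is required before the allow-list is consulted at all.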

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| allowed_languages | list[str] | required | ISO 639-1 language codes to allow (e.g., ["en", "he"]). Text in any other language triggers the scanner. At least one code is required. |
| confidence_threshold | float | 0.8 | Minimum detection confidence (0.0–1.0). Lower values catch more borderline cases; higher values reduce false positives on ambiguous or short text. |
| action | Action | Action.BLOCK | Action when a disallowed language is detected. |
| overrides | dict[str, Action] \| None | None | Per-language action overrides keyed by ISO 639-1 code. Allows finer control, e.g., block most languages but only log a specific one. |
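
The constraints in the table can be checked before a scanner is constructed. The helper below is a hypothetical sketch, not part of meshulash_guard: `validate_config` is an illustrative name, and it assumes codes are written as two lowercase letters. It checks only the shape of each code, not whether the code is actually assigned in ISO 639-1.

```python
import re

# Shape check only: two lowercase ASCII letters, e.g. "en" or "he".
_TWO_LETTER_CODE = re.compile(r"[a-z]{2}")


def validate_config(allowed_languages, confidence_threshold=0.8):
    """Raise ValueError if the arguments violate the documented constraints."""
    if not allowed_languages:
        raise ValueError("allowed_languages must contain at least one code")
    bad = [code for code in allowed_languages
           if not isinstance(code, str) or not _TWO_LETTER_CODE.fullmatch(code)]
    if bad:
        raise ValueError(f"not two-letter lowercase codes: {bad}")
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")


validate_config(["en", "he"], confidence_threshold=0.75)  # passes silently
```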

Actions and Conditions

LanguageScanner defaults to Action.BLOCK because receiving text in an unsupported language typically means the request cannot be served correctly. Blocking early prevents downstream errors and wasted LLM calls.

Start with Action.LOG to observe real usage before enforcing blocks. Once you have confirmed which languages appear in production, switch to Action.BLOCK for languages you want to exclude. Use overrides to keep logging for languages you are still deciding on.
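
One way to turn the observation phase into an allow-list is to tally the languages seen while logging. The sketch below assumes you have already collected the detected language codes from your own logs as a plain list. The `suggest_allow_list` helper and the 5% cutoff are illustrative choices, not library features.

```python
from collections import Counter


def suggest_allow_list(observed_codes, min_share=0.05):
    """Return codes whose share of observed traffic is at least min_share,
    ordered from most to least common."""
    counts = Counter(observed_codes)
    total = sum(counts.values())
    if total == 0:
        return []
    return [code for code, n in counts.most_common() if n / total >= min_share]


# Hypothetical sample: 100 logged requests by detected language.
observed = ["he"] * 70 + ["en"] * 25 + ["ru"] * 4 + ["fr"] * 1
print(suggest_allow_list(observed))  # ['he', 'en']
```

Languages that fall below the cutoff are candidates for Action.BLOCK, or for an overrides entry with Action.LOG if you want to keep watching them.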

See the Concepts page for the full reference on Actions and Conditions.

scan_input Example

This example protects an application that supports English and Hebrew. Spanish is logged rather than blocked while it is still under observation:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import LanguageScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

language = LanguageScanner(
    allowed_languages=["en", "he"],
    action=Action.BLOCK,
    overrides={"es": Action.LOG},
)

user_input = "¿Pueden ayudarme con mi pedido, por favor?"

result = guard.scan_input(user_input, scanners=[language])

print(f"Status: {result.status}")

if result.status == "blocked":
    print("Request rejected: unsupported language detected.")
elif result.status == "logged":
    print("Request allowed but flagged for review.")
else:
    print(result.processed_text)

Expected output:

Status: logged
Request allowed but flagged for review.

scan_output Example

Checking LLM responses and blocking any reply that is not in the expected language:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import LanguageScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

language = LanguageScanner(
    allowed_languages=["en"],
    action=Action.BLOCK,
)

# Unexpected LLM response in a different language
llm_response = "Je suis désolé, je ne comprends pas votre demande."

result = guard.scan_output(llm_response, scanners=[language])

if result.status == "blocked":
    print("LLM response blocked — model replied in an unsupported language.")
else:
    print(result.processed_text)

Expected output:

LLM response blocked — model replied in an unsupported language.