Skip to content

GibberishScanner

Detect random character sequences, keyboard mashing, and adversarial noise in text.

When to Use This

GibberishScanner identifies text that lacks the structure of natural language — random keystrokes, repeated nonsense characters, adversarial whitespace and character sequences, or automated junk submissions. Use it to clean up your input pipeline before expensive LLM calls, and to catch adversarial inputs designed to confuse downstream processing.

Common scenarios include rejecting keyboard-mash inputs in chat interfaces, filtering bot-generated spam before content moderation, detecting prompt-injection noise designed to overwhelm token windows, and validating that OCR or file-parsed text is coherent before sending it to an LLM. GibberishScanner supports both English and Hebrew text.

Quick Example

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import GibberishScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

gibberish = GibberishScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

result = guard.scan_input(
    "asdfghjklqwertyuiop zxcvbnm asdfghjkl",
    scanners=[gibberish],
)

print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged (Action.BLOCK keeps text)

Expected output:

blocked
asdfghjklqwertyuiop zxcvbnm asdfghjkl

How Gibberish Detection Works

Text is scored from 0.0 (fully coherent, natural language) to 1.0 (likely gibberish). The threshold controls the cutoff: text scoring above the threshold is flagged.

The detection engine analyzes text structure to distinguish natural language patterns from random character sequences. It works for both English and Hebrew input. Short inputs (a few characters) may score unpredictably — consider pairing GibberishScanner with an input length check for very short messages.

Threshold guidance:

  • 0.7 (default) — balanced; catches clear gibberish while allowing technical strings like API keys or short codes.
  • 0.5 — stricter; flags borderline inputs like random-looking short tokens.
  • 0.9 — permissive; only flags the most obvious random-character sequences.

Parameters

Parameter Type Default Description
threshold float 0.7 Detection threshold (0.0–1.0). Text scoring above this value is flagged as gibberish. Higher = more permissive (fewer flags); lower = stricter (more flags).
action Action Action.BLOCK Action when gibberish is detected.

Actions and Conditions

GibberishScanner defaults to Action.BLOCK because gibberish input almost never has a legitimate purpose and is frequently adversarial. The main reason to switch to Action.LOG is during initial rollout to understand your false-positive rate on your specific user base — some applications receive naturally noisy input (product codes, serial numbers) that might score high.

If you have many legitimate inputs with high entropy (UUIDs, short codes, passwords), raise threshold to 0.85 or higher to avoid false positives on those patterns.

See the Concepts page for the full reference on Actions and Conditions.

scan_input Example

Filtering adversarial noise before sending user input to an LLM, while allowing through legitimate high-entropy strings like order IDs:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import GibberishScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

gibberish = GibberishScanner(
    threshold=0.8,
    action=Action.BLOCK,
)

inputs = [
    "xkqzpwvfjm bqxlm zzzqq pppppp fff",
    "My order number is ORD-2024-8847291",
    "qqqqqqqqqq wwwwwwww eeeeeeee rrrr",
]

for text in inputs:
    result = guard.scan_input(text, scanners=[gibberish])
    print(f"[{result.status}] {text[:45]}...")

Expected output:

[blocked] xkqzpwvfjm bqxlm zzzqq pppppp fff...
[passed] My order number is ORD-2024-8847291...
[blocked] qqqqqqqqqq wwwwwwww eeeeeeee rrrr...

scan_output Example

Catching garbled or corrupted LLM output — for example, after a model generates degenerate text due to a misconfigured sampling parameter:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import GibberishScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

gibberish = GibberishScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

# Degenerate LLM output
llm_response = "thththththth anananananan ststststst inging inginging"

result = guard.scan_output(llm_response, scanners=[gibberish])

if result.status == "blocked":
    print("LLM response blocked — output appears garbled or degenerate.")
else:
    print(result.processed_text)

Expected output:

LLM response blocked — output appears garbled or degenerate.