# GibberishScanner

Detect random character sequences, keyboard mashing, and adversarial noise in text.
## When to Use This
GibberishScanner identifies text that lacks the structure of natural language — random keystrokes, repeated nonsense characters, adversarial whitespace and character sequences, or automated junk submissions. Use it to clean up your input pipeline before expensive LLM calls, and to catch adversarial inputs designed to confuse downstream processing.
Common scenarios include rejecting keyboard-mash inputs in chat interfaces, filtering bot-generated spam before content moderation, detecting prompt-injection noise designed to overwhelm token windows, and validating that OCR or file-parsed text is coherent before sending it to an LLM. GibberishScanner supports both English and Hebrew text.
## Quick Example

```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import GibberishScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

gibberish = GibberishScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

result = guard.scan_input(
    "asdfghjklqwertyuiop zxcvbnm asdfghjkl",
    scanners=[gibberish],
)
print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged (Action.BLOCK keeps text)
```

Expected output:

```
blocked
asdfghjklqwertyuiop zxcvbnm asdfghjkl
```
## How Gibberish Detection Works
Text is scored from 0.0 (fully coherent, natural language) to 1.0 (likely gibberish). The threshold controls the cutoff: text scoring above the threshold is flagged.
The detection engine analyzes text structure to distinguish natural language patterns from random character sequences. It works for both English and Hebrew input. Short inputs (a few characters) may score unpredictably — consider pairing GibberishScanner with an input length check for very short messages.
Threshold guidance:

- `0.7` (default) — balanced; catches clear gibberish while allowing technical strings like API keys or short codes.
- `0.5` — stricter; flags borderline inputs like random-looking short tokens.
- `0.9` — permissive; only flags the most obvious random-character sequences.
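The cutoff rule can be sketched in plain Python. The scores below are hypothetical placeholders (the real detection engine computes them from text structure); the point is how the same score is treated under each candidate threshold:

```python
def is_flagged(score: float, threshold: float) -> bool:
    """Text scoring above the threshold is flagged as gibberish."""
    return score > threshold

# Hypothetical scores for illustration only -- in practice the
# detection engine assigns these, from 0.0 (coherent) to 1.0 (gibberish).
samples = {
    "keyboard mash": 0.93,
    "order ID": 0.42,
    "repeated characters": 0.81,
}

for label, score in samples.items():
    verdicts = {t: is_flagged(score, t) for t in (0.5, 0.7, 0.9)}
    print(f"{label} (score {score}): {verdicts}")
```

With these made-up scores, only the keyboard mash is flagged at `0.9`, while both noisy inputs are flagged at `0.5` and `0.7` — the same permissive-to-strict gradient described in the guidance above.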
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | `float` | `0.7` | Detection threshold (0.0–1.0). Text scoring above this value is flagged as gibberish. Higher = more permissive (fewer flags); lower = stricter (more flags). |
| `action` | `Action` | `Action.BLOCK` | Action when gibberish is detected. |
## Actions and Conditions
GibberishScanner defaults to Action.BLOCK because gibberish input almost never has a legitimate purpose and is frequently adversarial. The main reason to switch to Action.LOG is during initial rollout to understand your false-positive rate on your specific user base — some applications receive naturally noisy input (product codes, serial numbers) that might score high.
If you have many legitimate inputs with high entropy (UUIDs, short codes, passwords), raise `threshold` to 0.85 or higher to avoid false positives on those patterns.
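One way to act on a LOG-mode rollout is to collect scores for known-legitimate inputs and measure the false-positive rate at candidate thresholds. A minimal sketch, with entirely hypothetical scores standing in for logged scan results:

```python
# Hypothetical (score, is_legitimate) pairs gathered while running
# GibberishScanner with Action.LOG -- real values come from your logs.
observed = [
    (0.92, False),  # keyboard mash
    (0.88, True),   # UUID-heavy but legitimate input
    (0.95, False),  # random noise
    (0.83, True),   # serial numbers
    (0.31, True),   # ordinary prose
    (0.76, True),   # short product code
]

def false_positive_rate(threshold: float) -> float:
    """Fraction of legitimate inputs that would be flagged."""
    legit = [score for score, ok in observed if ok]
    return sum(1 for score in legit if score > threshold) / len(legit)

for t in (0.7, 0.85):
    print(f"threshold={t}: false-positive rate {false_positive_rate(t):.0%}")
```

With these made-up numbers, raising the threshold from 0.7 to 0.85 cuts the false-positive rate from 75% to 25% — the kind of evidence that would justify a higher threshold for entropy-heavy traffic.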
See the Concepts page for the full reference on Actions and Conditions.
## scan_input Example
Filtering adversarial noise before sending user input to an LLM, while allowing through legitimate high-entropy strings like order IDs:
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import GibberishScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

gibberish = GibberishScanner(
    threshold=0.8,
    action=Action.BLOCK,
)

inputs = [
    "xkqzpwvfjm bqxlm zzzqq pppppp fff",
    "My order number is ORD-2024-8847291",
    "qqqqqqqqqq wwwwwwww eeeeeeee rrrr",
]

for text in inputs:
    result = guard.scan_input(text, scanners=[gibberish])
    print(f"[{result.status}] {text[:45]}...")
```

Expected output:

```
[blocked] xkqzpwvfjm bqxlm zzzqq pppppp fff...
[passed] My order number is ORD-2024-8847291...
[blocked] qqqqqqqqqq wwwwwwww eeeeeeee rrrr...
```
## scan_output Example
Catching garbled or corrupted LLM output — for example, after a model generates degenerate text due to a misconfigured sampling parameter:
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import GibberishScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

gibberish = GibberishScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

# Degenerate LLM output (e.g. a repetition loop from bad sampling settings)
llm_response = "thththththth anananananan ststststst inging inginging"

result = guard.scan_output(llm_response, scanners=[gibberish])

if result.status == "blocked":
    print("LLM response blocked — output appears garbled or degenerate.")
else:
    print(result.processed_text)
```

Expected output:

```
LLM response blocked — output appears garbled or degenerate.
```