CodeScanner
Detect source code embedded in user input or LLM responses.
When to Use This
CodeScanner identifies text that contains source code — programming languages, markup, stylesheets, query languages, and shell scripts. Use it when your application should not receive or emit raw code, or when you want to monitor which types of code pass through your system.
Common scenarios include blocking SQL injection payloads in chatbot inputs before they reach downstream systems, preventing users from submitting code to services that are not meant to execute it, and catching LLM responses that unexpectedly contain implementation details or shell commands. Use the languages filter to restrict detection to the specific languages that matter for your threat model — for example, only SQL and BASH for a customer-facing assistant.
Quick Example
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

result = guard.scan_input(
    "def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')",
    scanners=[code],
)

print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged (Action.BLOCK keeps text)
```
Expected output: `blocked` on the first line, then the input text, unchanged.
Supported Languages
CodeScanner can detect the following 17 languages. Use CodeLanguage members in the languages parameter to restrict detection to a subset.
| Value | Language |
|---|---|
| `CodeLanguage.PYTHON` | Python |
| `CodeLanguage.JAVASCRIPT` | JavaScript |
| `CodeLanguage.TYPESCRIPT` | TypeScript |
| `CodeLanguage.JAVA` | Java |
| `CodeLanguage.CPP` | C++ |
| `CodeLanguage.C` | C |
| `CodeLanguage.CSHARP` | C# |
| `CodeLanguage.GO` | Go |
| `CodeLanguage.RUST` | Rust |
| `CodeLanguage.PHP` | PHP |
| `CodeLanguage.RUBY` | Ruby |
| `CodeLanguage.SWIFT` | Swift |
| `CodeLanguage.KOTLIN` | Kotlin |
| `CodeLanguage.SQL` | SQL |
| `CodeLanguage.BASH` | Bash / Shell |
| `CodeLanguage.HTML` | HTML |
| `CodeLanguage.CSS` | CSS |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | `float` | `0.7` | Detection confidence threshold (0.0–1.0). Higher values require a stronger code signal before triggering; lower values catch more ambiguous cases. |
| `languages` | `list[CodeLanguage] \| None` | `None` | Filter detection to specific languages. `None` means detect all 17 supported languages. |
| `action` | `Action` | `Action.BLOCK` | Action to take when code is detected. |
| `overrides` | `dict[CodeLanguage, Action] \| None` | `None` | Per-language action overrides. Useful for treating certain languages differently, e.g. block SQL but only log HTML. |
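One way to picture how `languages`, `action`, and `overrides` interact is the small model below. It is a sketch under stated assumptions, not the library's implementation: `Action` and `CodeLanguage` here are local stand-ins with only a few members, `resolve_action` is a hypothetical helper, and both the precedence of an override over the scanner-wide action and the rule that a language outside the `languages` filter is ignored are assumptions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    # Local stand-in for the library's Action (only members named on this page).
    BLOCK = "block"
    LOG = "log"


class CodeLanguage(Enum):
    # Local stand-in with a few of the 17 supported languages.
    SQL = "sql"
    BASH = "bash"
    HTML = "html"


def resolve_action(
    detected: CodeLanguage,
    action: Action = Action.BLOCK,
    languages: Optional[list] = None,
    overrides: Optional[dict] = None,
) -> Optional[Action]:
    """Hypothetical helper: action for a detected language, or None if filtered out."""
    # Assumption: languages=None means every supported language is in scope.
    if languages is not None and detected not in languages:
        return None
    # Assumption: a per-language override wins over the scanner-wide action.
    return (overrides or {}).get(detected, action)


# Block SQL but only log HTML, as in the `overrides` description.
overrides = {CodeLanguage.HTML: Action.LOG}
print(resolve_action(CodeLanguage.SQL, overrides=overrides))   # Action.BLOCK
print(resolve_action(CodeLanguage.HTML, overrides=overrides))  # Action.LOG
print(resolve_action(CodeLanguage.HTML, languages=[CodeLanguage.SQL]))  # None
```

If the real scanner resolves overrides differently, only the last two lines of `resolve_action` would need to change; the model is meant for reasoning about a configuration, not for predicting scanner output.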
Actions and Conditions
CodeScanner defaults to Action.BLOCK because code in user input frequently indicates an injection attempt or policy violation. For a general-purpose assistant where users occasionally paste code snippets, start with Action.LOG to measure frequency before deciding on enforcement.
Raise threshold to 0.85 or higher to reduce false positives on text that superficially resembles code (technical writing, command references). Lower it toward 0.5 to catch obfuscated or minimal code snippets.
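The trade-off can be illustrated with a toy sweep. The confidence scores below are invented for illustration, `flagged` is a hypothetical helper, and treating a score at or above the threshold as a detection is an assumption about how the comparison works.

```python
# Hypothetical confidence scores; real scores come from the scanner, not this table.
samples = {
    "SELECT * FROM users WHERE id = 1": 0.95,
    "run `ls -la` to list files": 0.72,
    "x=1": 0.55,
    "Please reset my password": 0.05,
}


def flagged(scores: dict, threshold: float) -> list:
    """Return the texts whose score meets the threshold (assumed comparison: >=)."""
    return [text for text, score in scores.items() if score >= threshold]


# Sweep the three thresholds discussed above and count detections at each.
for threshold in (0.5, 0.7, 0.85):
    print(threshold, len(flagged(samples, threshold)))
# 0.5 3
# 0.7 2
# 0.85 1
```

With these made-up scores, raising the threshold to 0.85 drops the command reference that merely resembles code, while lowering it to 0.5 also catches the minimal `x=1` snippet.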
See the Concepts page for the full reference on Actions and Conditions.
scan_input Example
This example blocks only SQL and Bash, the two languages most likely to carry injection payloads in a customer-facing assistant:
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner, CodeLanguage

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    threshold=0.65,
    languages=[CodeLanguage.SQL, CodeLanguage.BASH],
    action=Action.BLOCK,
)

# Potential SQL injection payload
user_input = "'; DROP TABLE users; SELECT * FROM accounts WHERE '1'='1"

result = guard.scan_input(user_input, scanners=[code])

print(f"Status: {result.status}")
if result.status == "blocked":
    print("Request rejected: code injection payload detected.")
```
Expected output, if the payload is detected as SQL: `Status: blocked`, followed by the rejection message.
scan_output Example
This example checks whether an LLM response unexpectedly contains source code, for instance in a customer support bot that should never provide shell commands:
```python
from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner, CodeLanguage

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    # HTML is listed so that its override below has a language to apply to
    # (assumption: overrides only affect languages included in the filter).
    languages=[CodeLanguage.BASH, CodeLanguage.PYTHON, CodeLanguage.HTML],
    action=Action.BLOCK,                        # block Bash and Python
    overrides={CodeLanguage.HTML: Action.LOG},  # only log HTML
)

llm_response = (
    "To reset your password, run this command: "
    "curl -X POST https://api.example.com/reset --data 'user=admin'"
)

result = guard.scan_output(llm_response, scanners=[code])

if result.status == "blocked":
    print("LLM response blocked: model included code in its reply.")
else:
    print(result.processed_text)
```
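Expected output, if the `curl` command is detected as Bash: the blocked-response message printed by the first branch.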