CodeScanner

Detect source code embedded in user input or LLM responses.

When to Use This

CodeScanner identifies text that contains source code — programming languages, markup, stylesheets, query languages, and shell scripts. Use it when your application should not receive or emit raw code, or when you want to monitor which types of code pass through your system.

Common scenarios include blocking SQL injection payloads in chatbot inputs before they reach downstream systems, preventing users from submitting code to services that are not meant to execute it, and catching LLM responses that unexpectedly contain implementation details or shell commands. Use the languages filter to restrict detection to the specific languages that matter for your threat model — for example, only SQL and BASH for a customer-facing assistant.

Quick Example

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

result = guard.scan_input(
    "def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')",
    scanners=[code],
)

print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged (Action.BLOCK keeps text)

Expected output:

blocked
def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')

Supported Languages

CodeScanner can detect the following 17 languages. Use CodeLanguage members in the languages parameter to restrict detection to a subset.

Value	Language
`CodeLanguage.PYTHON`	Python
`CodeLanguage.JAVASCRIPT`	JavaScript
`CodeLanguage.TYPESCRIPT`	TypeScript
`CodeLanguage.JAVA`	Java
`CodeLanguage.CPP`	C++
`CodeLanguage.C`	C
`CodeLanguage.CSHARP`	C#
`CodeLanguage.GO`	Go
`CodeLanguage.RUST`	Rust
`CodeLanguage.PHP`	PHP
`CodeLanguage.RUBY`	Ruby
`CodeLanguage.SWIFT`	Swift
`CodeLanguage.KOTLIN`	Kotlin
`CodeLanguage.SQL`	SQL
`CodeLanguage.BASH`	Bash / Shell
`CodeLanguage.HTML`	HTML
`CodeLanguage.CSS`	CSS

Parameters

Parameter	Type	Default	Description
`threshold`	`float`	`0.7`	Detection confidence threshold (0.0–1.0). Higher values require stronger code signal before triggering; lower values catch more ambiguous cases.
`languages`	`list[CodeLanguage] | None`	`None`	Filter detection to specific languages. `None` means detect all 17 supported languages.
`action`	`Action`	`Action.BLOCK`	Action to take when code is detected.
`overrides`	`dict[CodeLanguage, Action] | None`	`None`	Per-language action overrides. Useful for treating certain languages differently — e.g., block SQL but only log HTML.
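The interaction between `languages`, `action`, and `overrides` can be pictured as a per-language lookup. The sketch below is a conceptual model under assumed semantics, not the library's implementation. Plain strings stand in for `CodeLanguage` members and a local `Act` enum stands in for `Action`. It assumes that a language outside the `languages` filter is never reported, while a language inside it uses its override if one exists and the scanner-wide `action` otherwise.

```python
from enum import Enum


class Act(Enum):
    """Hypothetical stand-in for meshulash_guard.Action."""

    BLOCK = "block"
    LOG = "log"


def resolve_action(language, languages, action, overrides):
    """Return the action for a detected language, or None if it is filtered out.

    Assumed semantics (conceptual model only):
    - languages=None puts every supported language in scope
    - a per-language override takes precedence over the scanner-wide action
    """
    if languages is not None and language not in languages:
        return None  # outside the filter: never reported
    return (overrides or {}).get(language, action)


# Block SQL, but only log HTML
langs = ["sql", "html"]
ovr = {"html": Act.LOG}
print(resolve_action("sql", langs, Act.BLOCK, ovr))   # Act.BLOCK
print(resolve_action("html", langs, Act.BLOCK, ovr))  # Act.LOG
print(resolve_action("css", langs, Act.BLOCK, ovr))   # None
```

Under this model, an override only matters for a language that the `languages` filter lets through, so when you set a filter, every overridden language should also appear in it.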

Actions and Conditions

CodeScanner defaults to Action.BLOCK because code in user input frequently indicates an injection attempt or policy violation. For a general-purpose assistant where users occasionally paste code snippets, start with Action.LOG to measure frequency before deciding on enforcement.
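Measuring frequency in log mode can be as simple as tallying, per language, the share of requests in which code was detected. This is a minimal sketch that assumes you already collect the detected languages for each request from your own logs. The record format (a list of language names per request) and the `code_rate_by_language` helper are hypothetical, and this page does not document which result field exposes the detected languages.

```python
from collections import Counter


def code_rate_by_language(records):
    """Fraction of scanned requests in which each language was detected.

    `records` is a hypothetical list with one entry per request, each entry
    being the languages detected in that request (empty if none).
    """
    counts = Counter()
    for detected in records:
        counts.update(set(detected))  # count each language at most once per request
    total = len(records)
    return {lang: n / total for lang, n in counts.items()} if total else {}


# Four logged requests: one with SQL, one with SQL and Python, two without code
logged = [["sql"], ["python", "sql"], [], []]
print(code_rate_by_language(logged))  # {'sql': 0.5, 'python': 0.25}
```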

Raise threshold to 0.85 or higher to reduce false positives on text that superficially resembles code (technical writing, command references). Lower it toward 0.5 to catch obfuscated or minimal code snippets.
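To see why raising the threshold reduces triggers, the sketch below applies three thresholds to the same inputs. The confidence scores are invented for illustration, and the `>=` comparison is an assumption about how the scanner applies `threshold`; real scores are produced by the scanner and will differ.

```python
# Invented confidence scores for four inputs (illustrative only, not real output)
scores = {
    "SELECT * FROM users WHERE id = 1": 0.95,
    "Run `ls -la` to list files": 0.78,
    "Use the --force flag with care": 0.62,
    "Please reset my password": 0.05,
}


def triggered(scores, threshold):
    """Inputs whose score meets the threshold (assumes a >= comparison)."""
    return [text for text, score in scores.items() if score >= threshold]


for t in (0.5, 0.7, 0.85):
    print(t, len(triggered(scores, t)))
# 0.5 3
# 0.7 2
# 0.85 1
```

In this toy data, the borderline command reference at 0.62 only triggers at the lower threshold, which mirrors the trade-off described above.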

See the Concepts page for the full reference on Actions and Conditions.

scan_input Example

Blocking SQL and Bash specifically — the two languages most likely to represent injection payloads in a customer-facing assistant:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner, CodeLanguage

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    threshold=0.65,
    languages=[CodeLanguage.SQL, CodeLanguage.BASH],
    action=Action.BLOCK,
)

# Potential SQL injection payload
user_input = "'; DROP TABLE users; SELECT * FROM accounts WHERE '1'='1"

result = guard.scan_input(user_input, scanners=[code])

print(f"Status: {result.status}")

if result.status == "blocked":
    print("Request rejected: code injection payload detected.")

Expected output:

Status: blocked
Request rejected: code injection payload detected.
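In an application, the scan usually sits in front of the downstream call so that a blocked payload never reaches it. A minimal sketch of that pattern follows. `handle_request` and `call_llm` are hypothetical names, and `scan` is any callable returning an object with `status` and `processed_text`, such as `lambda t: guard.scan_input(t, scanners=[code])`. The fake scanner exists only to make the sketch runnable without an API key, and its `"clean"` status is a placeholder because this page only shows the `"blocked"` value.

```python
from types import SimpleNamespace

REJECTION = "Request rejected: code injection payload detected."


def handle_request(user_input, scan, call_llm):
    """Scan the input first and only forward it downstream if it was not blocked.

    `scan` returns an object with `.status` and `.processed_text`;
    `call_llm` is a hypothetical downstream function.
    """
    result = scan(user_input)
    if result.status == "blocked":
        return REJECTION  # call_llm is never invoked for blocked input
    return call_llm(result.processed_text)


# Stand-ins so the sketch runs offline (hypothetical behavior)
def fake_scan(text):
    status = "blocked" if "DROP TABLE" in text else "clean"  # "clean" is a placeholder
    return SimpleNamespace(status=status, processed_text=text)


def echo_llm(text):
    return f"LLM saw: {text}"


print(handle_request("'; DROP TABLE users; --", fake_scan, echo_llm))
# Request rejected: code injection payload detected.
print(handle_request("What are your opening hours?", fake_scan, echo_llm))
# LLM saw: What are your opening hours?
```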

scan_output Example

Detecting whether an LLM response unexpectedly contains source code — for example, a customer support bot that should never provide shell commands:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner, CodeLanguage

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

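code = CodeScanner(
    # HTML must be in the filter too, or the HTML override below has nothing to act on
    languages=[CodeLanguage.BASH, CodeLanguage.PYTHON, CodeLanguage.HTML],
    action=Action.BLOCK,
    overrides={CodeLanguage.HTML: Action.LOG},  # log HTML instead of blocking it
)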

llm_response = (
    "To reset your password, run this command: "
    "curl -X POST https://api.example.com/reset --data 'user=admin'"
)

result = guard.scan_output(llm_response, scanners=[code])

if result.status == "blocked":
    print("LLM response blocked — model included code in its reply.")
else:
    print(result.processed_text)

Expected output:

LLM response blocked — model included code in its reply.
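Rather than surfacing an error, a support bot can swap a blocked response for a canned reply. The sketch below relies only on the `status` and `processed_text` attributes used on this page; `reply_from_scan` and the fallback wording are hypothetical, and `SimpleNamespace` objects stand in for real scan results.

```python
from types import SimpleNamespace

# Hypothetical canned reply shown instead of a blocked LLM response
FALLBACK = "Sorry, I can't help with that here. Please contact our support team."


def reply_from_scan(result, fallback=FALLBACK):
    """Choose the text to show the user after guard.scan_output."""
    if result.status == "blocked":
        return fallback  # never surface the blocked response itself
    return result.processed_text


# Stand-in results; "clean" is a placeholder for a non-blocked status
blocked = SimpleNamespace(
    status="blocked",
    processed_text="curl -X POST https://api.example.com/reset --data 'user=admin'",
)
allowed = SimpleNamespace(status="clean", processed_text="Your ticket has been updated.")

print(reply_from_scan(blocked))  # Sorry, I can't help with that here. Please contact our support team.
print(reply_from_scan(allowed))  # Your ticket has been updated.
```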