CodeScanner

Detect source code embedded in user input or LLM responses.

When to Use This

CodeScanner identifies text that contains source code — programming languages, markup, stylesheets, query languages, and shell scripts. Use it when your application should not receive or emit raw code, or when you want to monitor which types of code pass through your system.

Common scenarios include blocking SQL injection payloads in chatbot inputs before they reach downstream systems, preventing users from submitting code to services that are not meant to execute it, and catching LLM responses that unexpectedly contain implementation details or shell commands. Use the languages filter to restrict detection to the specific languages that matter for your threat model — for example, only SQL and BASH for a customer-facing assistant.

Quick Example

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    threshold=0.7,
    action=Action.BLOCK,
)

result = guard.scan_input(
    "def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')",
    scanners=[code],
)

print(result.status)          # "blocked"
print(result.processed_text)  # original text unchanged (Action.BLOCK keeps text)

Expected output:

blocked
def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')

Supported Languages

CodeScanner can detect the following 17 languages. Use CodeLanguage members in the languages parameter to restrict detection to a subset.

| Value | Language |
| --- | --- |
| CodeLanguage.PYTHON | Python |
| CodeLanguage.JAVASCRIPT | JavaScript |
| CodeLanguage.TYPESCRIPT | TypeScript |
| CodeLanguage.JAVA | Java |
| CodeLanguage.CPP | C++ |
| CodeLanguage.C | C |
| CodeLanguage.CSHARP | C# |
| CodeLanguage.GO | Go |
| CodeLanguage.RUST | Rust |
| CodeLanguage.PHP | PHP |
| CodeLanguage.RUBY | Ruby |
| CodeLanguage.SWIFT | Swift |
| CodeLanguage.KOTLIN | Kotlin |
| CodeLanguage.SQL | SQL |
| CodeLanguage.BASH | Bash / Shell |
| CodeLanguage.HTML | HTML |
| CodeLanguage.CSS | CSS |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | float | 0.7 | Detection confidence threshold (0.0–1.0). Higher values require a stronger code signal before triggering; lower values catch more ambiguous cases. |
| languages | list[CodeLanguage] \| None | None | Filter detection to specific languages. None means detect all 17 supported languages. |
| action | Action | Action.BLOCK | Action when code is detected. |
| overrides | dict[CodeLanguage, Action] \| None | None | Per-language action overrides. Useful for treating certain languages differently (e.g., block SQL but only log HTML). |
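
To make the interaction between these four parameters concrete, the following is a minimal sketch of one plausible way they could combine. It is not the library's implementation: the `Action` and `CodeLanguage` enums are local stand-ins, `resolve_action` is a hypothetical helper, and the rule that a score equal to or above `threshold` triggers is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    # Local stand-in for meshulash_guard.Action (only the members named on this page).
    BLOCK = "block"
    LOG = "log"


class CodeLanguage(Enum):
    # Local stand-in for a few CodeLanguage members.
    SQL = "sql"
    BASH = "bash"
    HTML = "html"


def resolve_action(
    detected: CodeLanguage,
    score: float,
    threshold: float = 0.7,
    languages: Optional[list[CodeLanguage]] = None,
    action: Action = Action.BLOCK,
    overrides: Optional[dict[CodeLanguage, Action]] = None,
) -> Optional[Action]:
    """Hypothetical model: return the action to apply, or None if nothing triggers."""
    if languages is not None and detected not in languages:
        return None  # language is outside the filter
    if score < threshold:
        return None  # assumed rule: signal below threshold does not trigger
    # A per-language override wins over the scanner-wide default action.
    return (overrides or {}).get(detected, action)


overrides = {CodeLanguage.HTML: Action.LOG}
print(resolve_action(CodeLanguage.SQL, 0.9, overrides=overrides))   # Action.BLOCK
print(resolve_action(CodeLanguage.HTML, 0.9, overrides=overrides))  # Action.LOG
```

Under this model, an override for a language that is excluded by `languages` never takes effect, so the two parameters should be kept consistent.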

Actions and Conditions

CodeScanner defaults to Action.BLOCK because code in user input frequently indicates an injection attempt or policy violation. For a general-purpose assistant where users occasionally paste code snippets, start with Action.LOG to measure frequency before deciding on enforcement.

Raise threshold to 0.85 or higher to reduce false positives on text that superficially resembles code (technical writing, command references). Lower it toward 0.5 to catch obfuscated or minimal code snippets.
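
The trade-off can be illustrated with a small, self-contained sketch. The confidence scores below are invented for illustration and are not output from CodeScanner; `flagged` is a hypothetical helper, and it assumes a detection triggers when its score is at or above the threshold.

```python
# Hypothetical (text, confidence) pairs. The scores are made up for this example.
samples = [
    ("SELECT * FROM users WHERE id = 1", 0.97),
    ("Run `ls -la` to list files", 0.74),
    ("x=1", 0.55),
    ("Please select a plan from the table", 0.20),
]


def flagged(samples, threshold):
    """Return the texts whose score meets or exceeds the threshold."""
    return [text for text, score in samples if score >= threshold]


for threshold in (0.5, 0.7, 0.85):
    print(threshold, len(flagged(samples, threshold)))
# 0.5 3
# 0.7 2
# 0.85 1
```

Replaying logged traffic through a comparison like this, with real scores in place of the invented ones, is one way to pick a threshold before switching from logging to blocking.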

See the Concepts page for the full reference on Actions and Conditions.

scan_input Example

Blocking SQL and Bash specifically — the two languages most likely to represent injection payloads in a customer-facing assistant:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner, CodeLanguage

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

code = CodeScanner(
    threshold=0.65,
    languages=[CodeLanguage.SQL, CodeLanguage.BASH],
    action=Action.BLOCK,
)

# Potential SQL injection payload
user_input = "'; DROP TABLE users; SELECT * FROM accounts WHERE '1'='1"

result = guard.scan_input(user_input, scanners=[code])

print(f"Status: {result.status}")

if result.status == "blocked":
    print("Request rejected: code injection payload detected.")

Expected output:

Status: blocked
Request rejected: code injection payload detected.

scan_output Example

Detecting whether an LLM response unexpectedly contains source code, for example in a customer support bot that should never provide shell commands:

from meshulash_guard import Guard, Action
from meshulash_guard.scanners import CodeScanner, CodeLanguage

guard = Guard(api_key="sk-your-api-key", tenant_id="your-tenant-id")

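code = CodeScanner(
    # HTML is listed so that the LOG override below targets a language the scanner checks.
    languages=[CodeLanguage.BASH, CodeLanguage.PYTHON, CodeLanguage.HTML],
    action=Action.BLOCK,
    overrides={CodeLanguage.HTML: Action.LOG},
)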

llm_response = (
    "To reset your password, run this command: "
    "curl -X POST https://api.example.com/reset --data 'user=admin'"
)

result = guard.scan_output(llm_response, scanners=[code])

if result.status == "blocked":
    print("LLM response blocked — model included code in its reply.")
else:
    print(result.processed_text)

Expected output:

LLM response blocked — model included code in its reply.
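
In an application, the branch above is often wrapped so that end users receive a safe fallback message instead of a blocked response. The sketch below shows one way to do that. `guarded_reply`, `FakeResult`, and `fake_scan` are hypothetical names introduced here, and the `"clean"` status is a placeholder, since this page documents only the `"blocked"` value. The fake scanner stands in for `guard.scan_output` so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeResult:
    # Test double exposing the two result attributes used on this page.
    status: str
    processed_text: str


def guarded_reply(scan: Callable[[str], Any], llm_response: str, fallback: str) -> str:
    """Return the scanned text, or the fallback if the scan reports "blocked"."""
    result = scan(llm_response)
    if result.status == "blocked":
        return fallback
    return result.processed_text


def fake_scan(text: str) -> FakeResult:
    # Stand-in for a real scan: pretends anything mentioning curl is blocked.
    # "clean" is a placeholder for whatever non-blocked status the service returns.
    status = "blocked" if "curl" in text else "clean"
    return FakeResult(status=status, processed_text=text)


fallback = "Sorry, I can't share that. Please contact support."
print(guarded_reply(fake_scan, "Run: curl -X POST https://api.example.com/reset", fallback))
print(guarded_reply(fake_scan, "Use the Reset Password link on the login page.", fallback))
```

With the real client, the same wrapper could be called as `guarded_reply(lambda text: guard.scan_output(text, scanners=[code]), llm_response, fallback)`, assuming the result object behaves as shown in the examples above.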