LMDeploy SSRF (CVE-2026-33626) Weaponized Within 13 Hours of Disclosure — Your AI Inference Box Is a Metadata-API Probe
Introduction
Between the moment GitHub published advisory GHSA-6w67-hwm5-92mq — later assigned CVE-2026-33626 — and the first in-the-wild exploitation attempt, 12 hours and 31 minutes passed. That's how long a sysadmin had to ship a patched LMDeploy build before Sysdig's honeypot fleet caught an attacker using the SSRF as a generic HTTP probe against AWS IMDS, Redis, MySQL, and an internal admin interface, chained with out-of-band DNS exfiltration. The bug is a Server-Side Request Forgery in the vision-language image loader of LMDeploy, the Shanghai AI Laboratory toolkit that serves models like InternVL2, internlm-xcomposer2, and Qwen2-VL through an OpenAI-compatible API.
What Happened
LMDeploy is widely used in production for LLM inference, particularly for vision-language models. The HTTP API accepts OpenAI-style chat completion requests where a message can contain an image_url field. When that field is set, LMDeploy's load_image() function in lmdeploy/vl/utils.py fetches the URL and returns the image to the model's context.
The flaw is an almost textbook SSRF: the function performed no hostname resolution check, no private-IP blocklist, and no redirect hardening before calling requests.get(image_url, …). By default, api_server.py binds the server to 0.0.0.0, and API keys are disabled. Anyone who could reach the inference endpoint could tell it to fetch any URL — including IMDS at http://169.254.169.254/, any RFC 1918 address, or http://127.0.0.1:<port> for loopback service probing.
# Vulnerable pattern in load_image()
if image_url.startswith('http'):
    response = requests.get(image_url, headers=headers, timeout=FETCH_TIMEOUT)
    # No URL validation, no IP blocklist
Proof-of-concept payload:
POST /v1/chat/completions
{
  "model": "internlm-xcomposer2",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image"},
      {"type": "image_url", "image_url": {
        "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
      }}
    ]
  }]
}
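After patching, the same body makes a handy regression check: a fixed server should refuse the link-local fetch instead of relaying IMDS. A minimal replay harness, assuming the default local api_server port (adjust the endpoint for your deployment):

```python
import json
import urllib.request

# Default LMDeploy api_server port; adjust for your deployment.
API = "http://127.0.0.1:23333/v1/chat/completions"

def build_probe(target_url, model="internlm-xcomposer2"):
    """Rebuild the advisory's PoC body, pointed at target_url."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {"type": "image_url", "image_url": {"url": target_url}},
            ],
        }],
    }

if __name__ == "__main__":
    body = build_probe("http://169.254.169.254/latest/meta-data/")
    req = urllib.request.Request(
        API, data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"})
    try:
        # A build >= 0.12.3 should reject the link-local target rather
        # than fetching IMDS on the caller's behalf.
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(resp.status, resp.read(200))
    except OSError as exc:
        print("request failed (is a server listening on 23333?):", exc)
```

An error or refusal response here is the good outcome; IMDS content echoed back means the instance is still vulnerable.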
The CVE, tracked as CVE-2026-33626 (CVSS 7.5), was publicly disclosed on April 20–21, 2026, and fixed in LMDeploy 0.12.3. Within 12 hours and 31 minutes of the advisory going live, Sysdig's TRT observed an attacker at 103.116.72.119 issue 10 distinct requests across three phases in a single eight-minute session:
- Metadata sweep against AWS IMDS and internal admin endpoints.
- Service fingerprinting against Redis (6379), MySQL (3306), and other loopback ports — treating the vision LLM as a generic HTTP GET primitive. Three localhost probes in 36 seconds.
- Out-of-band exfiltration using a public OAST service (Burp Collaborator / Project Discovery interactsh-style) to confirm egress even when responses were filtered.
The attacker switched between internlm-xcomposer2 and OpenGVLab/InternVL2-8B mid-session — a signal they were testing whether one VLM refused suspicious inputs where the other complied. They also exercised an unauthenticated drop_conn endpoint in disaggregated LMDeploy clusters that calls self.zmq_disconnect(drop_conn_request.remote_engine_id) — a way to degrade or break prefill/decode routing for a peer by guessing a live engine ID.
Twelve hours to exploitation. Patch-Tuesday cadences and monthly scans do not close this window.
Why It Matters
AI inference servers have quietly become the most credentialed boxes in most cloud accounts. They hold IMDS-reachable IAM roles that can read S3 buckets, write to Bedrock or Vertex, pull private model weights, and reach whatever vector stores and feature stores the model was built against. An SSRF against the HTTP API effectively hands the attacker the role of the inference box. Because the flaw lives in the image loader path, the attack traffic looks like ordinary inference requests — it will sail past most WAF signatures.
The pattern is also generalizable. Sysdig notes this is one of a string of SSRF and RCE bugs in inference and agent-framework stacks being exploited within hours of advisory publication. If you run any LLM-serving software in production, assume its attack surface is already mapped.
Who Is Affected
- Any LMDeploy instance running a vision-language model before 0.12.3
- Any self-hosted inference service exposed on 0.0.0.0 without an API key or auth layer
- Cloud deployments with IMDSv1 enabled or IMDSv2 without hop-limit hardening
- Disaggregated LMDeploy clusters — the drop_conn endpoint lacks auth by default
- Organizations that run LLM-serving software without egress controls on the inference VM
How to Protect Yourself
Step 1: Patch immediately.
pip install --upgrade "lmdeploy>=0.12.3"
# or with uv
uv pip install --upgrade "lmdeploy>=0.12.3"
# container workloads
docker pull openmmlab/lmdeploy:v0.12.3
Verify the _is_safe_url() validator is present in the installed build:
python -c "from lmdeploy.vl.media import connection; print(hasattr(connection, '_is_safe_url'))"
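The patched validator's internals aren't reproduced in the advisory; as a hedged illustration only (function name, checked ranges, and single-resolution behavior are all assumptions, not LMDeploy's actual code), a safe image loader resolves the hostname and rejects private, loopback, and link-local targets before fetching:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Resolve the hostname and reject private, loopback, link-local,
    reserved, and multicast targets.

    Illustrative sketch only -- not the validator shipped in 0.12.3.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Check every resolved A/AAAA record. A single lookup does not
        # defeat DNS rebinding, but it blocks the straightforward cases.
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 80)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_multicast):
            return False
    return True
```

A production validator would also need to pin the resolved IP for the actual fetch and cap redirects, since a redirect to 169.254.169.254 reopens the hole.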
Step 2: Harden IMDS on every cloud instance running LMDeploy (or any LLM server).
AWS — enforce IMDSv2 with a single-hop limit:
aws ec2 modify-instance-metadata-options \
--instance-id i-0123456789abcdef0 \
--http-tokens required \
--http-endpoint enabled \
--http-put-response-hop-limit 1
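At fleet scale it helps to audit which instances still allow IMDSv1 or multi-hop IMDS before flipping them. A sketch using the real EC2 MetadataOptions fields (the boto3 sweep at the bottom assumes credentials with ec2:DescribeInstances):

```python
def needs_hardening(instance: dict) -> bool:
    """True if an instance dict (describe_instances shape) still allows
    IMDSv1 (HttpTokens != required) or multi-hop IMDS access."""
    opts = instance.get("MetadataOptions", {})
    if opts.get("HttpEndpoint") == "disabled":
        return False  # IMDS off entirely -- nothing to steal
    return (opts.get("HttpTokens") != "required"
            or opts.get("HttpPutResponseHopLimit", 1) > 1)

if __name__ == "__main__":
    try:
        import boto3  # AWS SDK; requires credentials to run the sweep
        ec2 = boto3.client("ec2")
        for page in ec2.get_paginator("describe_instances").paginate():
            for res in page["Reservations"]:
                for inst in res["Instances"]:
                    if needs_hardening(inst):
                        print("IMDS not hardened:", inst["InstanceId"])
    except Exception as exc:  # no boto3 / no credentials in this environment
        print("audit skipped:", exc)
```

Any instance this flags can be fixed with the modify-instance-metadata-options command above.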
GCP — set the metadata-concealment firewall rule:
gcloud compute firewall-rules create deny-metadata-http \
--action=DENY --rules=tcp:80,tcp:443 --priority=100 \
--destination-ranges=169.254.169.254/32 \
--direction=EGRESS --target-tags=lmdeploy
Azure — scope managed identity permissions tightly. Azure IMDS token requests require the Metadata: true request header, and App Service identity endpoints require the X-IDENTITY-HEADER secret, neither of which a URL-only SSRF can supply — but least privilege still limits the blast radius.
Step 3: Put the inference API behind auth and egress controls.
Do not run LMDeploy bound to 0.0.0.0 on the open internet. Put an authenticating reverse proxy in front of it:
server {
    listen 443 ssl;
    server_name llm.example.com;

    # Subrequest target for auth_request; point it at your auth service.
    location = /auth {
        internal;
        proxy_pass http://127.0.0.1:9000/validate;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
    }

    location /v1/ {
        auth_request /auth;
        proxy_pass http://127.0.0.1:23333;
    }
}
Enable LMDeploy's API key option (--api-keys) and restrict egress from the inference VM to the small set of URLs your application legitimately needs the model to fetch. Block all RFC 1918 and link-local addresses outbound.
Step 4: Hunt for exploitation attempts. Look at LMDeploy or reverse-proxy logs for the past few weeks for suspicious image_url values:
grep -E 'image_url.*(169\.254|127\.0\.0\.1|10\.|192\.168|172\.(1[6-9]|2[0-9]|3[0-1]))' /var/log/lmdeploy/*.log
grep -Ei 'image_url.*\.(oast|interact|burpcollaborator|ngrok|oastify)\.' /var/log/lmdeploy/*.log
Watch for unusual combinations: requests that alternate models within the same session, requests from a single IP hitting /v1/chat/completions 10+ times in under a minute with varying image_url targets, and any request pointing at RFC 1918 addresses.
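The one-off greps can be folded into a small scanner. The indicator regexes below mirror the patterns above; log formats vary by deployment, so treat this as a sketch that greps raw lines rather than a parser for LMDeploy's log schema:

```python
import re

# Private / link-local ranges an image_url should never point at.
PRIVATE_RE = re.compile(
    r'image_url.*?(169\.254\.|127\.|10\.|192\.168\.|172\.(?:1[6-9]|2[0-9]|3[01])\.)')
# Common out-of-band (OAST) callback domains.
OAST_RE = re.compile(
    r'image_url.*?\.(oast|interact|burpcollaborator|oastify|ngrok)\.',
    re.IGNORECASE)

def suspicious_lines(lines):
    """Yield (reason, line) pairs for log lines matching the SSRF indicators."""
    for line in lines:
        if PRIVATE_RE.search(line):
            yield ("private-ip", line)
        elif OAST_RE.search(line):
            yield ("oast", line)
```

Feeding it a few weeks of access logs and grouping hits by source IP makes the multi-probe sessions described above stand out quickly.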
Step 5: If you find evidence of exploitation, rotate everything the inference box could reach. IAM role session tokens are the first priority — once IMDS credentials leave the box, they're valid anywhere until they expire or you rotate the role trust policy.
aws iam update-assume-role-policy --role-name LMDeployRole --policy-document file://tight-trust.json
aws iam detach-role-policy --role-name LMDeployRole --policy-arn <each-policy>
aws iam attach-role-policy --role-name LMDeployRole --policy-arn <each-policy>
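Tightening the trust policy does not invalidate sessions already issued. AWS supports revoking in-flight sessions by attaching a deny-all inline policy conditioned on aws:TokenIssueTime (the same document the console's "Revoke active sessions" button generates). A sketch that builds it — attach it with aws iam put-role-policy:

```python
import json
from datetime import datetime, timezone
from typing import Optional

def revoke_older_sessions_policy(cutoff: Optional[datetime] = None) -> str:
    """Inline IAM policy denying everything to sessions issued before
    `cutoff` (defaults to now). Sketch of the standard revocation pattern."""
    cutoff = cutoff or datetime.now(timezone.utc)
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {
                "aws:TokenIssueTime": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
            }},
        }],
    })
```

New sessions assumed after the cutoff work normally, so workloads recover as soon as they refresh their credentials.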
Longer term: add LLM-aware runtime security (Sysdig Falco, Aqua, CrowdStrike Falcon LogScale, or similar) that flags the inference process making anomalous outbound HTTP requests to cloud metadata endpoints.