Key Takeaways

  • The per-vendor MCP server model creates linear context window growth — each vendor adds 50+ tools to the system prompt
  • Action routing collapses N tools per vendor into 1, using a domain.action addressing scheme the LLM pattern-matches naturally
  • Context window consumption drops ~75% with three vendors; savings compound with every additional product
  • The pattern extends to any domain with multiple vendor APIs: GRC platforms, identity providers, SIEM, cloud
  • One binary, one deployment, one CI pipeline per domain replaces N of each — operational overhead scales with domains, not vendors

The N-Server Problem

Every vendor in the ecosystem publishes their own MCP server. Each one is a separate binary, a separate Docker image, a separate CI pipeline, a separate set of credentials. That works when you connect one tool. It breaks when you connect ten.

Consider a security consultant running compliance assessments across a client's stack. They need CrowdStrike for endpoint, SentinelOne for managed detection, Microsoft Defender for XDR, Vanta or Drata for GRC, Okta or Entra ID for identity, and three or four more besides. Each vendor's MCP server registers 30–60 tools. The LLM has to reason over hundreds of tool schemas before it can act.

graph TD
  V1["Vendor A — 55 tools"]
  V2["Vendor B — 38 tools"]
  V3["Vendor C — 18 tools"]
  V4["Vendor D — 42 tools"]
  V5["Vendor E — 27 tools"]
  V1 --> T["180+ tool schemas / ~36,000 tokens / in system prompt"]
  V2 --> T
  V3 --> T
  V4 --> T
  V5 --> T

  style T fill:#2a1a1a,stroke:#e05050,color:#e05050
  style V1 fill:#1a1a2a,stroke:#666,color:#aaa
  style V2 fill:#1a1a2a,stroke:#666,color:#aaa
  style V3 fill:#1a1a2a,stroke:#666,color:#aaa
  style V4 fill:#1a1a2a,stroke:#666,color:#aaa
  style V5 fill:#1a1a2a,stroke:#666,color:#aaa

The context window tax is real. Every tool in the system prompt costs tokens. Every token costs inference time. Every inference costs money. And the LLM has to select from a growing menu of options before it can act — O(N) selection where N is the total tool count across all vendors.

There's an operational tax too: separate binaries mean separate builds, separate deployments, separate credential rotations, separate version tracking. The overhead scales with the number of vendors, not with the number of domains you operate in.

The Action-Routed Pattern

The fix is simple: register one tool per vendor and route actions by domain.

Instead of exposing 55 separate tools for a single vendor — hosts_list, hosts_get, hosts_contain, alerts_list, alerts_get, rtr_session, and 49 more — you register one tool named after the vendor. That tool accepts an action parameter using a dot-separated naming scheme: hosts.list, hosts.get, hosts.contain, alerts.list, alerts.get, rtr.session.
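As a sketch, the single registered tool might advertise a schema like the following. The name/description/inputSchema fields follow MCP's tool-listing convention, but the wording and structure here are illustrative, not a specific vendor's implementation:

```python
# Hypothetical sketch: one action-routed tool schema replaces dozens of
# per-operation tools. Field contents are illustrative assumptions.
VENDOR_TOOL = {
    "name": "crowdstrike",
    "description": (
        "CrowdStrike Falcon API. Pass an action like 'hosts.list', "
        "'hosts.contain', or 'alerts.get' plus action-specific params."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "description": "Dot-separated domain.operation, e.g. hosts.list",
            },
            "params": {
                "type": "object",
                "description": "Parameters for the chosen action",
            },
        },
        "required": ["action"],
    },
}
```

The LLM pays for this one schema once, no matter how many operations the vendor exposes behind it.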

graph TD
  A1["Vendor A — 1 tool, 175 actions"]
  A2["Vendor B — 1 tool, 118 actions"]
  A3["Vendor C — 1 tool, 64 actions"]
  A4["Vendor D — 1 tool, 91 actions"]
  A5["Vendor E — 1 tool, 60 actions"]
  A1 --> T2["5 tool schemas / ~9,000 tokens / in system prompt"]
  A2 --> T2
  A3 --> T2
  A4 --> T2
  A5 --> T2

  style T2 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A1 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A2 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A3 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A4 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A5 fill:#1a2a1a,stroke:#50c878,color:#50c878

The dot-separated format is deliberate. It mirrors how practitioners already think about vendor APIs — in terms of domains and operations. The LLM doesn't iterate through 55 tools to find the right one. It pattern-matches on the domain name, which is effectively constant time regardless of how many actions the vendor supports.

The Context Window Math

Tool schemas are expensive. Each one carries a name, a description, and a JSON schema for its parameters. The average tool costs roughly 200 tokens in the system prompt. Here's what that looks like with three vendors:

Per-Vendor Servers: 22,200 tokens in system prompt (3 vendors)

  • Vendor A: 55 tools × ~200 tokens = 11,000
  • Vendor B: 38 tools × ~200 tokens = 7,600
  • Vendor C: 18 tools × ~200 tokens = 3,600
  • Plus: 3 binaries, 3 Docker images, 3 CI pipelines

Action-Routed Server: 5,500 tokens in system prompt (3 vendors)

  • Vendor A: 1 tool × ~2,500 tokens
  • Vendor B: 1 tool × ~1,800 tokens
  • Vendor C: 1 tool × ~1,200 tokens
  • Plus: 1 binary, 1 Docker image, 1 CI pipeline

That's a 75% reduction in context window consumption with just three vendors. The savings compound because each additional vendor adds one tool instead of dozens. Five vendors? The gap widens to 80%+. Ten vendors? You're saving over 90% of the context budget that would otherwise go to tool schemas.
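The three-vendor arithmetic above can be checked directly (the ~200-tokens-per-schema figure and the action-routed schema sizes are the article's estimates):

```python
# Reproducing the three-vendor token math: per-operation schemas at
# ~200 tokens each vs one larger action-routed schema per vendor.
per_vendor = 55 * 200 + 38 * 200 + 18 * 200   # 22,200 tokens
action_routed = 2500 + 1800 + 1200             # 5,500 tokens
savings = 1 - action_routed / per_vendor
print(f"{per_vendor:,} -> {action_routed:,} tokens ({savings:.0%} saved)")
# prints: 22,200 -> 5,500 tokens (75% saved)
```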

And the operational savings are just as significant: one binary to build, one image to push, one pipeline to maintain, one set of shared libraries to update. Fix a credential-redaction bug once and every vendor benefits.

  • 75% context window savings with just 3 vendors; scales to 90%+ with 10
  • O(1) tool selection: the LLM pattern-matches on the domain name instead of searching linearly through N tools
  • 1/N operational overhead: one deployment per domain instead of one per vendor

How Action Routing Works

When the LLM calls a vendor tool, it includes an action parameter like hosts.list. The server's dispatcher splits on the first dot: hosts becomes the domain, list becomes the operation. It looks up the domain handler, sets the action to the operation, and forwards the request.

graph TD
  LLM["LLM calls vendor tool"] --> Tool["Vendor Tool receives action: hosts.list"]
  Tool --> Dispatch["Dispatcher splits on dot — domain: hosts, op: list"]
  Dispatch --> Handler["Domain handler: hosts → action: list"]
  Handler --> API["Vendor API — GET /hosts?filter=..."]
  API --> Result["JSON response"]
  Result --> Handler
  Handler --> Dispatch2["Parsed result"]
  Dispatch2 --> Tool2["MCP result"]
  Tool2 --> LLM2["Formatted response to LLM"]

  style LLM fill:#1a2a1a,stroke:#50c878,color:#50c878
  style LLM2 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style Tool fill:#1a1a2a,stroke:#666,color:#ccc
  style Tool2 fill:#1a1a2a,stroke:#666,color:#ccc
  style Dispatch fill:#1a1a2a,stroke:#888,color:#ccc
  style Dispatch2 fill:#1a1a2a,stroke:#888,color:#ccc
  style Handler fill:#1a1a2a,stroke:#888,color:#ccc
  style API fill:#2a1a1a,stroke:#e05050,color:#e05050
  style Result fill:#2a1a1a,stroke:#e05050,color:#e05050

The dispatcher is shared across all vendor modules. One implementation, one test suite. Each vendor module only needs to define its domain handlers and register them. The routing logic is infrastructure, not business logic.
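A minimal version of that dispatcher can be sketched in Python. The registry shape and handler signature are assumptions for illustration, not the project's actual API:

```python
# Minimal action-routing dispatcher sketch: split the action on the
# first dot, then forward to a registered domain handler.
HANDLERS = {}

def register(domain, handler):
    HANDLERS[domain] = handler

def dispatch(action: str, params: dict):
    # partition splits on the FIRST dot, so underscored domain names
    # like 'ml_exclusions' route correctly.
    domain, _, operation = action.partition(".")
    if not operation or domain not in HANDLERS:
        raise ValueError(f"unknown action: {action!r}")
    return HANDLERS[domain](operation, params)

# Hypothetical 'hosts' domain handler for one vendor module.
def hosts_handler(operation, params):
    if operation == "list":
        return {"resources": []}  # would call GET /hosts on the vendor API
    raise ValueError(f"unsupported operation: {operation}")

register("hosts", hosts_handler)
result = dispatch("hosts.list", {"filter": "platform:'Windows'"})
```

Each vendor module contributes only its handlers; the split-and-forward logic above is written and tested once.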

Why Not Normalize?

graph TD
  subgraph Normalization["Normalization Layer Approach"]
    direction TB
    N1["Vendor A hosts"] --> NL["Normalization Layer"]
    N2["Vendor B agents"] --> NL
    N3["Vendor C endpoints"] --> NL
    NL --> NM["Unified 'device' schema"]
  end

  subgraph ActionRouted["Action-Routed Approach"]
    direction TB
    A1["Vendor A hosts"] --> AR["Native JSON passthrough"]
    A2["Vendor B agents"] --> AR2["Native JSON passthrough"]
    A3["Vendor C endpoints"] --> AR3["Native JSON passthrough"]
  end

  style Normalization fill:none,stroke:#555
  style ActionRouted fill:none,stroke:#555
  style NL fill:#2a1a1a,stroke:#e05050,color:#e05050
  style NM fill:#2a1a1a,stroke:#e05050,color:#e05050
  style AR fill:#1a2a1a,stroke:#50c878,color:#50c878
  style AR2 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style AR3 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style N1 fill:#1a1a2a,stroke:#666,color:#aaa
  style N2 fill:#1a1a2a,stroke:#666,color:#aaa
  style N3 fill:#1a1a2a,stroke:#666,color:#aaa
  style A1 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A2 fill:#1a2a1a,stroke:#50c878,color:#50c878
  style A3 fill:#1a2a1a,stroke:#50c878,color:#50c878

The obvious alternative to action routing is normalization: map every vendor's "hosts," "agents," and "endpoints" to a unified "device" schema. This feels clean in theory. In practice, it's a trap.

Normalization layers strip vendor-specific fields that turn out to matter. They introduce translation bugs that only surface at runtime. They require constant maintenance as vendors update their APIs. And they force you to make schema decisions — do you keep Vendor A's first_seen timestamp or Vendor B's createdAt? — that the LLM would handle better on its own.

Action routing takes the opposite approach: each product returns its native JSON. The LLM is a pattern matcher. It's better at understanding schema differences than a rigid normalization layer that strips nuance and introduces translation bugs. Give it good descriptions and it figures out the rest.

Two Connection Models

Stdio: Single Product, Single Tenant

The simplest deployment. One binary, one product flag, one set of credentials. The LLM sees it as a single MCP server:

{
  "mcpServers": {
    "edr-vendor": {
      "command": "mcp-server",
      "args": ["--product", "crowdstrike"],
      "env": {
        "CLIENT_ID": "...",
        "CLIENT_SECRET": "..."
      }
    }
  }
}

Same binary, different product flag for each vendor. The LLM still sees separate servers, but you only build and ship one artifact.
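For example, two vendors side by side might look like this; the server entry names and SentinelOne's environment variable are illustrative, not the project's documented configuration:

```json
{
  "mcpServers": {
    "crowdstrike": {
      "command": "mcp-server",
      "args": ["--product", "crowdstrike"],
      "env": { "CLIENT_ID": "...", "CLIENT_SECRET": "..." }
    },
    "sentinelone": {
      "command": "mcp-server",
      "args": ["--product", "sentinelone"],
      "env": { "API_TOKEN": "..." }
    }
  }
}
```

Same `mcp-server` artifact in both entries; only the product flag and credentials change.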

HTTP: Multi-Product, Multi-Tenant

For hosted deployments — MSSPs serving multiple clients, consultants switching between stacks, SaaS platforms embedding AI into multi-tenant workflows — the binary runs as an HTTP server with per-request credential injection:

mcp-server --transport http

// Per-request credentials via headers:
//   X-Product: crowdstrike
//   X-Client-ID: ...
//   X-Client-Secret: ...
//
// Session store caches initialized product instances
// by credential fingerprint. First request creates
// the client; subsequent requests reuse it.

A session store caches initialized product instances by credential fingerprint. The first request for a given set of credentials creates the client; subsequent requests reuse it. Authentication middleware — JWT validation, OAuth bearer tokens, credential redaction at error boundaries — is written once and protects every product.
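A sketch of that session store, assuming a stable hash over the product name and credential pairs as the fingerprint (the hashing scheme and factory signature are illustrative):

```python
# Session store sketch: cache one initialized product client per
# credential fingerprint so repeat requests reuse the same client.
import hashlib
import json

_sessions = {}

def fingerprint(product: str, credentials: dict) -> str:
    # Stable digest over product + sorted credential pairs. Only the
    # digest is ever stored or logged, never the raw credentials.
    payload = json.dumps([product, sorted(credentials.items())])
    return hashlib.sha256(payload.encode()).hexdigest()

def get_client(product: str, credentials: dict, factory):
    key = fingerprint(product, credentials)
    if key not in _sessions:
        _sessions[key] = factory(product, credentials)  # first request
    return _sessions[key]  # subsequent requests reuse the instance
```

In a real server this would sit behind the authentication middleware, with an eviction policy for idle sessions.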

Beyond EDR: The Extension Pattern

The action-routed pattern isn't specific to endpoint security. It applies to any domain where you need to connect to multiple vendor APIs through a single MCP surface:

graph TD
  subgraph Domains["Domain Servers"]
    EDR["EDR Server<br/>9 vendors<br/>672 actions"]
    GRC["GRC Server<br/>Vanta, Drata, Secureframe<br/>~200 actions"]
    IDP["Identity Server<br/>Entra ID, Okta, Authentik<br/>~150 actions"]
    SIEM["SIEM Server<br/>Splunk, Sentinel, Elastic<br/>~300 actions"]
    Cloud["Cloud Server<br/>AWS, GCP, Azure<br/>~400 actions"]
  end

  LLM2["LLM Agent"]
  EDR --> LLM2
  GRC --> LLM2
  IDP --> LLM2
  SIEM --> LLM2
  Cloud --> LLM2

  style Domains fill:none,stroke:#555
  style EDR fill:#1a2a1a,stroke:#50c878,color:#50c878
  style GRC fill:#1a2a1a,stroke:#50c878,color:#50c878
  style IDP fill:#1a2a1a,stroke:#50c878,color:#50c878
  style SIEM fill:#1a2a1a,stroke:#50c878,color:#50c878
  style Cloud fill:#1a2a1a,stroke:#50c878,color:#50c878
  style LLM2 fill:#1a1a2a,stroke:#888,color:#ccc

For a consultant running compliance assessments, this means one MCP server for GRC platforms (Vanta, Drata, Secureframe) instead of three separate binaries with 30+ tools each. For a security operations team, one server for endpoint vendors instead of five. The LLM sees one tool per vendor per domain — manageable, predictable, and fast.

| Domain | Example Vendors | Est. Actions | Per-Vendor Tools | Action-Routed Tools |
|---|---|---|---|---|
| Endpoint / EDR | CrowdStrike, SentinelOne, Defender, Elastic, Wazuh, Bitdefender, Cortex, Trend Micro, Huntress | 672 | ~200 | 9 |
| GRC / Compliance | Vanta, Drata, Secureframe, Hyperproof | ~200 | ~120 | 4 |
| Identity | Entra ID, Okta, Authentik | ~150 | ~90 | 3 |
| SIEM | Splunk, Sentinel, Elastic, Sumo Logic, Google SecOps | ~300 | ~150 | 5 |
| Cloud | AWS, GCP, Azure | ~400 | ~200 | 3 |
| **Total** | | ~1,722 | ~760 | 24 |

Across five domains and 24 vendors, action routing replaces ~760 tool schemas with 24. That's the difference between an LLM spending its context budget on schema parsing and spending it on actual reasoning about your security data.

Design Principles

  1. Don't normalize data models.

    Each vendor returns its native JSON. The LLM handles schema differences — it's better at this than a rigid normalization layer that strips vendor-specific fields and introduces translation bugs.

  2. One tool per vendor per domain.

    The tool name is the vendor name. No intermediate abstraction, no generic "edr" tool that tries to translate between vendor semantics. The LLM already knows which vendor it's talking to. Let it.

  3. Dot-separated actions.

    hosts.list, alerts.get, ml_exclusions.create. The dot avoids ambiguity with underscored domain names. The dispatcher splits on the first dot. It's readable, predictable, and mirrors how practitioners think about vendor APIs.

  4. Shared utilities, not shared abstractions.

    Credential redaction, session caching, JWT validation, OAuth proxying — these are concrete utilities extracted into shared libraries, not abstract interfaces that constrain product implementations. One fix propagates everywhere.

  5. Descriptive bridging over prescriptive normalization.

    When vendors use different terminology for similar concepts (e.g., "alerts" vs "threats" vs "incidents"), handle this in tool descriptions — the one place the LLM actually reads before making a call. Good descriptions let the LLM bridge terminology gaps without hiding vendor-specific nuance.
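A sketch of principle 5 in practice: terminology bridging lives in the description string the LLM reads, not in a translation layer. The per-vendor notes below are illustrative wording, not the project's actual descriptions:

```python
# Hypothetical terminology notes appended to each vendor tool's
# description so the LLM can bridge naming differences itself.
TERMINOLOGY_NOTES = {
    "crowdstrike": "Detections are 'alerts' (alerts.list, alerts.get).",
    "sentinelone": "Detections are 'threats' (threats.list); there is no alerts domain.",
    "defender": "Detections appear as 'incidents' and 'alerts'; incidents group alerts.",
}

def build_description(vendor: str, base: str) -> str:
    note = TERMINOLOGY_NOTES.get(vendor, "")
    return f"{base} {note}".strip()

desc = build_description("sentinelone", "SentinelOne management API.")
```

The vendor's native field names still flow through untouched; only the guidance about what to call things is centralized.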

Why This Matters for Consultants and MSSPs

If you're an organization that serves organizations — a security consultant, an MSSP, a compliance auditor — you don't connect to one vendor. You connect to whatever stack your client runs. That might be CrowdStrike this week and SentinelOne next week. Vanta for one client's SOC 2 and Drata for another's ISO 27001. Entra ID here, Okta there.

graph TD
  C["AI Agent / Consultant"]

  subgraph ClientA["Client A"]
    CA1["CrowdStrike"]
    CA2["Vanta"]
    CA3["Okta"]
  end

  subgraph ClientB["Client B"]
    CB1["SentinelOne"]
    CB2["Drata"]
    CB3["Entra ID"]
  end

  subgraph ClientC["Client C"]
    CC1["Defender"]
    CC2["Secureframe"]
    CC3["Authentik"]
  end

  C -->|"1 tool per vendor"| CA1
  C --> CA2
  C --> CA3
  C --> CB1
  C --> CB2
  C --> CB3
  C --> CC1
  C --> CC2
  C --> CC3

  style ClientA fill:none,stroke:#555
  style ClientB fill:none,stroke:#555
  style ClientC fill:none,stroke:#555
  style C fill:#1a2a1a,stroke:#50c878,color:#50c878
  style CA1 fill:#1a1a2a,stroke:#666,color:#aaa
  style CA2 fill:#1a1a2a,stroke:#666,color:#aaa
  style CA3 fill:#1a1a2a,stroke:#666,color:#aaa
  style CB1 fill:#1a1a2a,stroke:#666,color:#aaa
  style CB2 fill:#1a1a2a,stroke:#666,color:#aaa
  style CB3 fill:#1a1a2a,stroke:#666,color:#aaa
  style CC1 fill:#1a1a2a,stroke:#666,color:#aaa
  style CC2 fill:#1a1a2a,stroke:#666,color:#aaa
  style CC3 fill:#1a1a2a,stroke:#666,color:#aaa

With per-vendor servers, every new client means provisioning a new set of binaries, managing a new set of credentials, and watching your context window budget evaporate. With action-routed multi-product servers, you add one tool per vendor per domain. The context cost stays flat. The operational cost stays flat. You scale with the number of domains you operate in, not the number of vendors you connect to.

This is especially important for GRC workflows. A compliance assessment across Vanta, Drata, or Secureframe involves pulling controls, evidence, tests, and frameworks. Each platform has 30–50 API endpoints. Action routing collapses those into one tool per platform. The LLM reasons about controls.list, evidence.get, tests.run — the same domain language regardless of which GRC platform the client uses.

Why This Matters

The MCP ecosystem is young. Right now, the pattern is one server per vendor. That works when you're connecting one tool. It breaks when you're connecting ten.

Every tool in the system prompt costs tokens. Every token costs inference time. Every inference costs money. And every separate binary costs operational overhead — builds, deploys, monitoring, credential rotation, version tracking.

Action routing collapses all of that. One binary per domain. One tool per vendor. Actions routed by domain. Context window usage drops 75% with three vendors and over 90% with ten. Operational overhead scales with domains, not vendors. And adding a new vendor becomes an afternoon of porting handler code, not a month of scaffolding a new repository.

We built this because we had to — our work connects to every vendor in a client's stack, and running fifteen separate MCP servers wasn't viable. The architecture emerged from the constraint. Now it's the foundation for everything we're building next.

Want to see it in action?

Book a discovery call and we'll walk you through the architecture, show you a live demo, and discuss how it fits your stack.

Book a discovery call → View all solutions