Customer Support Agent

A customer-facing chatbot that combines a multi-turn agent loop with agentic RAG over a knowledge base and schema-constrained business API tools for real-world actions like refunds, order modifications, and account changes. Human-in-the-loop escalation routes complex or sensitive cases to human agents.

Details

This architecture extends the Enterprise RAG Chatbot pattern with an agent loop and transactional tools. The Enterprise RAG Chatbot has no tool access or agent behavior - its blast radius is limited to text output quality. Adding an agent loop with business API tools means a successful attack can trigger real-world financial and operational consequences, fundamentally changing the trust model.

Customer support agents illustrate elements of the agent-native application pattern: schema-constrained API tools serve as atomic primitives, and the agent loop handles routing and edge cases rather than hard-coded branching logic. Observing what customers ask the agent to do - and where it fails or escalates - is a form of latent demand discovery, revealing unmet support needs and missing knowledge base content.

Capabilities

Trust analysis

Four input surfaces feed the agent's context: system instructions (developer-controlled), conversation history (customer-supplied, untrusted), retrieved knowledge base content (semi-trusted internal content maintained by the organization), and customer data from CRM/backend lookups (sensitive but necessary for task completion). The conversation history and retrieved content are the primary prompt injection surfaces, while customer data from backend systems introduces PII that must not leak into responses or tool calls directed at the wrong account.
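Tracking which of these four surfaces each piece of context came from makes downstream checks and audits possible. A minimal sketch of provenance-tagged context assembly (the trust tiers and segment format are illustrative assumptions, not a prescribed implementation):

```python
from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    DEVELOPER = "developer"        # system instructions
    UNTRUSTED = "untrusted"        # customer conversation history
    SEMI_TRUSTED = "semi-trusted"  # retrieved knowledge base content
    SENSITIVE = "sensitive"        # CRM / backend customer data

@dataclass
class ContextSegment:
    source: str
    trust: Trust
    text: str

def assemble_context(segments):
    """Render segments with provenance markers so later checks (and
    auditors) can tell which text is customer-supplied vs. internal."""
    return "\n\n".join(
        f"[{s.trust.value} | {s.source}]\n{s.text}" for s in segments
    )
```

Keeping the trust label attached to each segment is what lets injection filters treat customer turns and retrieved documents differently from developer instructions.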

Business API tools are schema-constrained by function calling definitions - the agent can only invoke predefined operations with typed arguments, unlike code execution where generated code can do anything a sandbox permits. The schema constrains the shape of actions but not their correctness: the agent can call a valid refund endpoint with the wrong amount, process a return on the wrong order, or modify an account based on manipulated context. The tools operate on real money and customer data, making every incorrect tool call a potential financial or operational incident.
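The gap between shape and correctness can be made concrete. Below is a hypothetical function-calling definition for a refund tool plus a server-side validation layer; the tool name, fields, and order record are illustrative assumptions, not a real API:

```python
# Hypothetical function-calling schema: constrains the SHAPE of the call
# (typed, required arguments) but not its correctness.
REFUND_TOOL = {
    "name": "issue_refund",
    "description": "Refund part or all of a paid order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount_cents": {"type": "integer", "minimum": 1},
            "reason": {"type": "string",
                       "enum": ["damaged", "late", "wrong_item", "other"]},
        },
        "required": ["order_id", "amount_cents", "reason"],
    },
}

def validate_refund_call(args, order):
    """Checks the schema cannot express: the call must target the
    authenticated customer's own order and not exceed the refundable
    balance, regardless of what the model put in the arguments."""
    if args["order_id"] != order["id"]:
        raise PermissionError("refund targets a different order")
    remaining = order["paid_cents"] - order["refunded_cents"]
    if args["amount_cents"] > remaining:
        raise ValueError("refund exceeds refundable balance")
    return True
```

A schema-valid call with the wrong amount or wrong order passes function-calling validation but fails here, which is why correctness checks belong in the backend rather than the prompt.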

Human-in-the-loop functions as a tiered escalation boundary rather than per-action approval. Routine queries and low-value actions proceed autonomously, while complex cases, high-value transactions, or low-confidence situations route to human agents. The escalation decision is itself a model judgment call - an overconfident agent may fail to escalate cases that require human judgment, and prompt injection or goal manipulation can suppress escalation to keep the conversation under attacker influence.
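Because the escalation decision is itself a model judgment, deterministic tiers can be layered on top of it. A sketch, assuming illustrative thresholds (the limit values and confidence score are placeholders chosen for the example):

```python
def route_action(value_cents, model_confidence,
                 value_limit_cents=5_000, confidence_floor=0.8):
    """Deterministic escalation tiers layered over the model's own
    judgment: even if prompt injection suppresses the model's inclination
    to escalate, high-value or low-confidence actions still route to a
    human agent."""
    if value_cents > value_limit_cents:
        return "human"   # high-value transactions always escalate
    if model_confidence < confidence_floor:
        return "human"   # low-confidence situations escalate
    return "auto"        # routine, low-value actions proceed autonomously
```

The point of the hard thresholds is that an attacker who manipulates the model's self-reported reasoning still cannot push a high-value transaction through the autonomous path.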

The system typically serves many customers through shared infrastructure. Context isolation between sessions must prevent one customer's data from leaking into another's context. The knowledge base is a shared resource: anyone who can write to the corpus can influence how the agent handles all customer interactions when those documents are retrieved.
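Session isolation is simplest to reason about when context is keyed by both customer and session, with no cross-key lookup path. A minimal sketch (the store interface is an assumption for illustration, not a prescribed design):

```python
class SessionStore:
    """Per-session context keyed by (customer_id, session_id) so one
    customer's CRM data is never read into another customer's context."""

    def __init__(self):
        self._contexts = {}

    def append(self, customer_id, session_id, segment):
        key = (customer_id, session_id)
        self._contexts.setdefault(key, []).append(segment)

    def context_for(self, customer_id, session_id):
        # Only this session's segments are returned; there is
        # deliberately no API that enumerates other customers' keys.
        return list(self._contexts.get((customer_id, session_id), ()))
```

The same keying discipline applies to caches and retrieval layers: any shared component that omits the customer key becomes a cross-tenant leakage path.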

Interaction effects

  • Agent loop + transactional tools + prompt injection: Unlike RAG-only systems where compromise affects text output, a successful prompt injection here can trigger real-world actions - processing fraudulent refunds, modifying accounts, or exfiltrating customer data through tool calls. The blast radius extends from text quality to financial and operational consequences.
  • Agentic RAG + tools + policy enforcement: The agent retrieves policy documents and uses them to make authorization decisions (e.g., refund eligibility, discount limits). If the knowledge base contains incorrect or manipulated policy content, the agent may authorize actions beyond intended limits - a context poisoning vector with direct financial impact.
  • Customer data in context + multi-turn accumulation: Customer PII enters context from CRM lookups and conversation, accumulating across turns. Each tool result adds more sensitive data (order details, payment information, account history). A successful prompt injection late in a conversation has access to all previously loaded PII, amplifying the data exfiltration risk.
  • Human escalation + agent confidence: The model decides when to escalate to human agents. Goal manipulation or prompt injection can suppress escalation, keeping the conversation under attacker influence when it should have been routed to a human. Conversely, misaligned model behaviors like sycophancy can lead the agent to grant unauthorized concessions rather than escalating.
  • Multi-tenant + shared knowledge base: Multiple customers interact with the same system and knowledge base. Insufficient session isolation creates cross-tenant data leakage risk, while the shared corpus creates a single context poisoning surface that affects all customers when manipulated documents are retrieved.
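The policy-enforcement interaction above suggests one mitigation: hard limits enforced outside the model, so retrieved policy text informs but never authorizes. A sketch with hypothetical cap values:

```python
# Hypothetical hard limits enforced in the backend, outside the model.
# Even if poisoned knowledge base content convinces the agent that a
# larger refund is policy-compliant, the backend rejects the call.
POLICY_LIMITS = {"refund_cents_max": 20_000, "discount_pct_max": 15}

def enforce_refund_policy(amount_cents):
    if amount_cents > POLICY_LIMITS["refund_cents_max"]:
        raise PermissionError(
            "refund exceeds policy cap; escalate to a human agent")
    return amount_cents
```

Under this split, the knowledge base can shape the agent's explanation of policy while the financial ceiling stays immune to context poisoning.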

Threats

| Threat | Relevance | Note |
| --- | --- | --- |
| Prompt injection | Primary | Customer messages, knowledge base docs, and CRM data can direct transactional tools |
| Context poisoning | Primary | Poisoned knowledge base documents have direct financial impact through tool-mediated actions |
| Tool misuse | Primary | Wrong refund amounts, wrong account modifications; real financial consequences |
| Data exfiltration | Elevated | Customer PII accumulates across turns from CRM lookups and conversation |
| Cross-tenant / cross-session data leakage | Elevated | Multi-tenant shared infrastructure; customer data isolation failures |
| Goal manipulation | Elevated | Suppressing escalation, redirecting toward unauthorized refunds or account changes |
| Tool output poisoning | Elevated | Corrupted CRM or backend data hijacks subsequent reasoning |
| Misaligned model behaviors | Elevated | Sycophancy leads to unauthorized concessions; skipping verification before transactions |
| Hallucination exploitation | Elevated | Incorrect policy interpretations cause inappropriate transactional actions |
| Guardrail bypass | Elevated | Circumventing action authorization, PII filters; multi-turn jailbreaking |
| Denial of service | Elevated | Tool-call loops exhaust backend resources or inflate inference costs |
| System prompt extraction | Standard | Revealing tool schemas or internal policy rules |
| User manipulation | Standard | Customer trust in retrieval-grounded responses |
| Training data poisoning | Standard | Baseline risk, no architecture-specific amplifier |

Examples

  • An e-commerce support agent that looks up orders, processes returns and refunds, and retrieves from a help center knowledge base.
  • A banking support agent that checks account balances, initiates disputes, and answers policy questions from internal documentation.
  • A telecom support agent that modifies plans, troubleshoots service issues using diagnostic tools, and escalates to human agents for complex billing disputes.