LLM Classification Endpoint

A single LLM inference call where the application assembles a prompt, sends it to the model, and consumes the response directly - no tools, no loop, no agent behavior.

Capabilities

Single inference call
Structured output constraints
Guardrails (input/output classifiers)
Context engineering (prompt assembly)

Trust analysis

This is the simplest AI integration topology. The application developer controls the full context boundary: what enters the prompt and how the output is consumed. All intelligence comes from the model's weights and the quality of the assembled context. There is no tool access, no persistent state, and no ability to take actions beyond generating text.

The prompt is the only input surface, and the output goes directly to the consuming application or user. When the context includes untrusted input (user-supplied text, retrieved documents, third-party data), that input becomes the primary vector for prompt injection. Structured output constraints limit the response format, reducing the range of harmful outputs but not eliminating hallucination exploitation or guardrail bypass risks.

This is the baseline trust model that all more complex systems inherit. Every additional capability (multi-turn conversation, tools, retrieval, agent loops) adds trust surfaces on top of this foundation.

Interaction effects

Minimal - this is the atomic unit. No capabilities interact because there is only one capability (text generation from a prompt). The trust surface is contained entirely within the prompt/response boundary.

Threats

Threat	Relevance	Note
Prompt injection	Primary	Untrusted input in assembled context overrides system instructions
Hallucination exploitation	Standard	Incorrect classifications, fabricated extractions
Guardrail bypass	Standard	Circumventing output format or content restrictions
System prompt extraction	Standard	Revealing instructions instead of producing structured output
User manipulation	Standard	Classification labels treated as ground truth by downstream systems
Misaligned model behaviors	Standard	Systematically biased classifications from sycophancy or shortcut-taking
Training data poisoning	Standard	Systematic misclassification of specific input patterns

Examples

A content moderation classifier that labels user-submitted text as safe or unsafe.
A summarization endpoint that condenses a document into a brief summary.
A translation service that converts text between languages in a single call.
An extraction endpoint that pulls structured data from unstructured text.