LLM Classification Endpoint

A single LLM inference call where the application assembles a prompt, sends it to the model, and consumes the response directly - no tools, no loop, no agent behavior.

Capabilities

Trust analysis

This is the simplest AI integration topology. The application developer controls the full context boundary: what enters the prompt and how the output is consumed. All intelligence comes from the model's weights and the quality of the assembled context. There is no tool access, no persistent state, and no ability to take actions beyond generating text.

The prompt is the only input surface, and the output goes directly to the consuming application or user. When the context includes untrusted input (user-supplied text, retrieved documents, third-party data), that input becomes the primary vector for prompt injection. Structured output constraints limit the response format, reducing the range of harmful outputs but not eliminating hallucination exploitation or guardrail bypass risks.

This is the baseline trust model that all more complex systems inherit. Every additional capability (multi-turn conversation, tools, retrieval, agent loops) adds trust surfaces on top of this foundation.

Interaction effects

Minimal - this is the atomic unit. No capabilities interact because there is only one capability (text generation from a prompt). The trust surface is contained entirely within the prompt/response boundary.

Threats

Threat Relevance Note
Prompt injection Primary Untrusted input in assembled context overrides system instructions
Hallucination exploitation Standard Incorrect classifications, fabricated extractions
Guardrail bypass Standard Circumventing output format or content restrictions
System prompt extraction Standard Revealing instructions instead of producing structured output
User manipulation Standard Classification labels treated as ground truth by downstream systems
Misaligned model behaviors Standard Systematically biased classifications from sycophancy or shortcut-taking
Training data poisoning Standard Systematic misclassification of specific input patterns

Examples

  • A content moderation classifier that labels user-submitted text as safe or unsafe.
  • A summarization endpoint that condenses a document into a brief summary.
  • A translation service that converts text between languages in a single call.
  • An extraction endpoint that pulls structured data from unstructured text.