Building AI Agents That Actually Work in Business Workflows

What I learned building production AI agents for clients. The patterns that work, the ones that fail, and how to scope an agent project without burning money.

February 13, 2026 · 11 min read · By Parker Gawne

The Agent Hype vs. Agent Reality

Everyone is talking about AI agents. The pitch is compelling: autonomous software that reasons, plans, and executes multi-step tasks without human intervention. The reality is more nuanced. Most businesses do not need a fully autonomous agent. They need targeted automation with intelligence at specific decision points.

I run Syntora, a consultancy that builds custom AI automation for small and mid-size businesses. Over the past year, I have built and deployed production agents using the Claude API from Anthropic. Some worked exactly as planned. Others taught me expensive lessons about where agents break down.

This is what I have learned about which patterns actually work and which ones will burn your budget before delivering value.

Can an Agentic AI Handle Complex Decision-Making in Business Workflows?

Yes, but only within well-defined boundaries. The word "complex" is doing a lot of heavy lifting in that question. An agent can handle complex classification, complex extraction, and complex generation. It cannot reliably handle complex judgment calls that require institutional knowledge, regulatory awareness, or relationship context.

The distinction matters. A document classification agent that routes incoming contracts to the right department based on content analysis? That works. An agent that decides whether to accept a contract's terms? That fails in ways that cost real money.

Here are the three patterns I keep coming back to because they deliver consistent results in production.

Three Patterns That Work

1. Extraction Agents: Unstructured to Structured

The most reliable agent pattern takes messy, unstructured input and converts it into clean, structured output. Think invoices, contracts, emails, PDFs, and support tickets. Humans send information in wildly inconsistent formats. An extraction agent normalizes all of it.

This pattern works because the success criterion is clear: either the extracted data matches the source document or it does not. There is no ambiguity, no judgment call, no creative interpretation needed.

Here is a simplified extraction agent using the Anthropic SDK:

import anthropic
import json

client = anthropic.Anthropic()

def extract_invoice_data(raw_text: str) -> dict:
    """Extract structured invoice fields from raw invoice text."""
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Extract the following fields from this invoice text.
Return valid JSON only. No explanation.

Fields: vendor_name, invoice_number, date, line_items (array of description, quantity, unit_price, total), subtotal, tax, grand_total

Invoice text:
{raw_text}"""
            }
        ]
    )
    # Assumes the model returns bare JSON with no surrounding prose.
    return json.loads(message.content[0].text)

Sonnet handles this perfectly. I have never needed to use Opus for extraction tasks. The cost difference matters at scale: processing 10,000 invoices per month on Opus would cost roughly 5x more than Sonnet with no measurable improvement in extraction accuracy.
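One hardening step worth adding before production: models occasionally wrap JSON in markdown fences or prepend a stray sentence, and a bare json.loads will crash on that. Here is a small best-effort parser. This helper is my illustration, not part of the original example; it lets one malformed response fall through to manual review instead of killing a batch run:

import json

def parse_model_json(raw: str) -> dict | None:
    """Best-effort parse of model output as JSON.

    Strips a markdown code fence if present and returns None on
    failure so the caller can queue the item for manual review.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Keep only the content between the fences, dropping an
        # optional "json" language tag after the opening fence.
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        return None

Swap this in for the direct json.loads call and route None results the same way the classification agent below routes low-confidence documents.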

2. Classification Agents: Routing and Triage

Classification agents read incoming content and decide where it should go. Support ticket triage, lead qualification, document routing, email categorization. The agent acts as an intelligent switchboard.

This pattern works because the output space is constrained. The agent picks from a finite set of categories. Even when the input is ambiguous, the worst case is a misroute that a human catches and corrects, not a catastrophic failure.

I will walk through a full example of this pattern below.

3. Generation Agents: Content and Proposals

Generation agents produce first drafts of structured content. Proposal templates, report summaries, email responses, status updates. The key word is "first draft." These agents do not ship content directly to customers. They produce output that a human reviews, edits, and approves.

This pattern works because it eliminates the blank page problem. A business development team that spends 45 minutes writing each proposal from scratch can instead spend 10 minutes editing an agent-generated draft. The agent handles structure and boilerplate. The human adds nuance and relationship context.
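To make the pattern concrete, here is a minimal generation call. It is a sketch under assumptions: the deal_context string and the [REVIEW] marker convention are illustrative, and the returned text goes into a review queue, never straight to the customer.

import anthropic

client = anthropic.Anthropic()

def draft_proposal(client_name: str, deal_context: str) -> str:
    """Return a first-draft proposal for a human to edit and approve."""
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[
            {
                "role": "user",
                "content": f"""Draft a services proposal for {client_name}.

Context from our notes:
{deal_context}

Use a standard structure: summary, scope, timeline, pricing placeholder.
Mark anything you are unsure about with [REVIEW]."""
            }
        ]
    )
    # The draft lands in a review queue; a human edits and sends it.
    return message.content[0].text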

If you are evaluating where AI agents could fit in your workflow, we can help you identify the right use cases.

Three Patterns That Fail

1. Fully Autonomous Customer-Facing Agents

What worries you most about AI agents interacting directly with customers? It should be the failure modes. When an extraction agent makes a mistake, a human catches it during review. When a customer-facing agent makes a mistake, the customer experiences it in real time.

I have seen businesses deploy chatbots that can issue refunds, modify accounts, and make promises. The bot works correctly 95% of the time. The other 5% generates support escalations, social media complaints, and lost customers. A 95% accuracy rate sounds good until you realize that means 1 in 20 customers has a bad experience with no human to intervene.

The fix is not to avoid customer-facing AI entirely. It is to constrain the agent's authority. Let it answer questions, surface relevant information, and draft responses. Do not let it take irreversible actions without human approval.

2. Agents With Unbounded Tool Access

Giving an agent access to your database, your email system, your CRM, and your file storage feels powerful. It is also dangerous. An agent with broad tool access can take actions you did not anticipate in contexts you did not test.

Every tool an agent can access is a surface area for failure. I scope tool access to the minimum required for the task. A classification agent needs read access to incoming documents and write access to a routing queue. It does not need access to your CRM, your email, or your payment system.
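In Claude API terms, the tools list you pass is the entire universe of actions the agent can even request. Here is a sketch of a minimally scoped tool list for a classification agent; the tool names and schemas are illustrative:

import anthropic

client = anthropic.Anthropic()

# The agent can request exactly two actions. There is no email, CRM,
# or payment tool, so the model cannot ask for them.
ROUTING_TOOLS = [
    {
        "name": "read_document",
        "description": "Read the text of an incoming document by ID.",
        "input_schema": {
            "type": "object",
            "properties": {"document_id": {"type": "string"}},
            "required": ["document_id"],
        },
    },
    {
        "name": "enqueue_routing",
        "description": "Place a document on a department's routing queue.",
        "input_schema": {
            "type": "object",
            "properties": {
                "document_id": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["document_id", "destination"],
        },
    },
]

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=512,
    tools=ROUTING_TOOLS,
    messages=[{"role": "user", "content": "Route document doc-1042."}],
)

Even a manipulated prompt cannot invoke a tool that is not on the list, and your own code still decides whether to execute each tool call the model returns.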

3. Agents Without Human-in-the-Loop for High-Stakes Decisions

Any decision that involves money, legal exposure, or customer relationships needs a human checkpoint. Full stop. I build agents that recommend, draft, and classify. I do not build agents that approve, commit, or send without review.

The human-in-the-loop is not a weakness of the system. It is a feature. The agent handles the 80% of the work that is mechanical and repetitive. The human handles the 20% that requires judgment, context, and accountability.
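One way to make the checkpoint explicit in code is to separate drafting from committing and require a named human to release anything high-stakes. A minimal sketch, with illustrative action kinds and an ApprovalRequired exception of my own naming:

from dataclasses import dataclass

HIGH_STAKES = {"send_email", "issue_credit", "accept_terms"}

@dataclass
class DraftedAction:
    kind: str                      # e.g. "send_email"
    payload: dict                  # everything needed to execute later
    approved_by: str | None = None

class ApprovalRequired(Exception):
    pass

def execute(action: DraftedAction) -> None:
    """Run an agent-drafted action, enforcing the human checkpoint."""
    if action.kind in HIGH_STAKES and action.approved_by is None:
        # Agents draft; only a named human reviewer can release.
        raise ApprovalRequired(f"{action.kind} needs human sign-off")
    # ... dispatch to the real system here ...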

Practical Example: Document Classification Agent

Here is a production-style document classification agent that routes incoming PDFs. This is the pattern I deploy most frequently because it solves a real problem at every business that processes inbound documents.

import anthropic
import json
from dataclasses import dataclass

client = anthropic.Anthropic()

CATEGORIES = {
    "contract": {
        "route_to": "legal",
        "priority": "high",
        "sla_hours": 24
    },
    "invoice": {
        "route_to": "accounting",
        "priority": "medium",
        "sla_hours": 48
    },
    "proposal": {
        "route_to": "business_dev",
        "priority": "medium",
        "sla_hours": 48
    },
    "support_request": {
        "route_to": "customer_success",
        "priority": "high",
        "sla_hours": 4
    },
    "general_correspondence": {
        "route_to": "admin",
        "priority": "low",
        "sla_hours": 72
    }
}

@dataclass
class ClassificationResult:
    category: str
    confidence: float
    route_to: str
    priority: str
    sla_hours: int
    summary: str

def classify_document(document_text: str) -> ClassificationResult:
    category_list = ", ".join(CATEGORIES.keys())

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": f"""Classify this document into exactly one category.

Categories: {category_list}

Return valid JSON with these fields:
- category: one of the categories above
- confidence: float between 0 and 1
- summary: one sentence describing the document

Document text:
{document_text}"""
            }
        ]
    )

    result = json.loads(message.content[0].text)

    # Guard against the model inventing a category outside the allowed
    # set: zero out confidence so it falls through to manual review.
    if result.get("category") not in CATEGORIES:
        result["category"] = "general_correspondence"
        result["confidence"] = 0.0

    routing = CATEGORIES[result["category"]]

    return ClassificationResult(
        category=result["category"],
        confidence=result["confidence"],
        route_to=routing["route_to"],
        priority=routing["priority"],
        sla_hours=routing["sla_hours"],
        summary=result["summary"]
    )

def process_incoming_document(document_text: str) -> dict:
    result = classify_document(document_text)

    if result.confidence < 0.7:
        return {
            "action": "manual_review",
            "reason": "Low confidence classification",
            "result": result
        }

    return {
        "action": "auto_route",
        "destination": result.route_to,
        "priority": result.priority,
        "sla_hours": result.sla_hours,
        "result": result
    }

Notice the confidence threshold. When the agent is not sure, it flags the document for manual review instead of guessing. That one check prevents the majority of misrouting errors.
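Wiring this into a pipeline takes a few more lines. This usage sketch assumes text extraction happens upstream; extract_pdf_text is a hypothetical helper, not shown:

def handle_upload(pdf_path: str) -> None:
    text = extract_pdf_text(pdf_path)  # hypothetical upstream step
    outcome = process_incoming_document(text)

    if outcome["action"] == "manual_review":
        print(f"Queued for review: {outcome['reason']}")
    else:
        print(f"Routed to {outcome['destination']} "
              f"(priority={outcome['priority']}, SLA {outcome['sla_hours']}h)")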

Cost Control: How to Not Burn Money

AI agent costs come from three places: API calls, token usage, and iteration loops. Here is how I manage each.

Model selection. Sonnet handles 90% of agent tasks. I only use Opus for tasks that require deep reasoning across long documents or nuanced multi-step analysis. For a typical classification or extraction agent, Sonnet is faster, cheaper, and equally accurate.

Token budgets. Every agent gets a max_tokens ceiling. If your extraction agent is returning 4,000 tokens when the structured output should be 500, something is wrong. Set the ceiling close to your expected output size. This prevents runaway costs from malformed prompts or unexpected inputs.

Batch processing. When you are processing hundreds of documents, use the Anthropic batch API instead of individual requests. The cost savings and rate limit benefits add up fast.
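Here is a minimal sketch of the batch flow using the Message Batches endpoint in the Python SDK; treat the prompt and the result-handling stub as illustrative:

import anthropic

client = anthropic.Anthropic()

def submit_classification_batch(documents: dict[str, str]) -> str:
    """Submit documents (id -> text) as one batch; returns the batch ID."""
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": doc_id,
                "params": {
                    "model": "claude-sonnet-4-20250514",
                    "max_tokens": 512,
                    "messages": [
                        {"role": "user",
                         "content": f"Classify this document.\n\n{text}"}
                    ],
                },
            }
            for doc_id, text in documents.items()
        ]
    )
    return batch.id

# Once the batch has ended, stream results by custom_id:
# for entry in client.messages.batches.results(batch_id):
#     if entry.result.type == "succeeded":
#         handle(entry.custom_id, entry.result.message)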

Here is a rough cost comparison for processing 10,000 documents per month:

| Approach | Estimated Monthly Cost |
|----------|------------------------|
| Opus, individual requests | $150-300 |
| Sonnet, individual requests | $30-60 |
| Sonnet, batch API | $15-30 |

The difference between the cheapest and most expensive approach is 10x. For most classification and extraction tasks, the cheapest option delivers identical results.

Scoping an Agent Project

When a client asks me to build an AI agent, I start with three questions:

  1. What is the input? If the input is unstructured text, images, or documents, an agent can help. If the input is already structured data, you probably need a regular script, not an agent.

  2. What is the output? If the output is a classification, a structured data object, or a first draft, an agent fits. If the output is a final decision with no human review, proceed with extreme caution.

  3. What happens when it is wrong? If a mistake means a document goes to the wrong queue and someone re-routes it, the risk is low. If a mistake means a customer gets the wrong information or money moves to the wrong account, you need human-in-the-loop at minimum and possibly a different approach entirely.

These three questions filter out 80% of bad agent ideas before any code gets written.

The Bottom Line

AI agents work in business when they are scoped tightly, given constrained tool access, and paired with human oversight for high-stakes decisions. The three patterns that consistently deliver value are extraction, classification, and generation. The pattern that consistently fails is full autonomy without guardrails.

Build agents that make your team faster, not agents that replace your team's judgment.


I'm Parker Gawne, founder of Syntora. We build custom Python infrastructure for small and mid-size businesses. syntora.io

Need help implementing this?

We build the automations we write about. Book a free consultation.