Stop Running Slow Local Scripts. Deploy Production Claude Apps.
Local Claude scripts run slowly because synchronous API calls execute one at a time, each blocked on network latency. Production systems use asynchronous batching and cloud functions for parallel processing.
Syntora addresses slow local Claude API workloads by building custom serverless architectures. The approach uses asynchronous batching and cloud functions to parallelize processing, ensuring scalability and efficiency. Syntora helps clients build robust, cost-effective document processing systems by re-engineering local scripts for a production environment.
While local development is excellent for testing and iteration, it lacks the characteristics required for a production environment. A production-ready system must effectively handle concurrent requests, implement intelligent retry mechanisms, cache frequently accessed results, and scale without requiring manual intervention. This necessitates moving core application logic from a single script to a robust, serverless architecture.
Syntora designs and builds custom serverless architectures tailored to specific document processing needs. We would start by auditing your existing local script and understanding your document types, volume, and latency requirements. The scope of an engagement depends on factors such as the complexity of your current logic, the desired processing throughput, and integration points with existing systems. A typical build for this type of system could take 6-10 weeks, and the client would provide access to their current scripts, sample documents, and relevant API keys.
What Problem Does This Solve?
Most developers start with Anthropic's Python library and a simple script. This works for a single file, but it fails at scale because API calls are synchronous. A script analyzing 100 documents makes 100 separate requests, waiting for each one to finish before starting the next. An 8-second API response time means the script takes over 13 minutes to complete, tying up the machine.
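A minimal sketch of that difference, with `asyncio.sleep` standing in for the multi-second Claude API round trip (the function names and latency figure are illustrative, not part of any real client library):

```python
import asyncio
import time

API_LATENCY = 0.05  # stand-in for a multi-second Claude API round trip


async def analyze_document(doc_id: int) -> str:
    """Simulated API call: a real script would send a Claude request here."""
    await asyncio.sleep(API_LATENCY)
    return f"summary-{doc_id}"


async def sequential(n: int) -> list[str]:
    # One request at a time: total time grows as n * API_LATENCY
    return [await analyze_document(i) for i in range(n)]


async def concurrent(n: int) -> list[str]:
    # All requests in flight at once: total time stays near API_LATENCY
    return await asyncio.gather(*(analyze_document(i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    asyncio.run(sequential(20))
    seq = time.perf_counter() - start

    start = time.perf_counter()
    asyncio.run(concurrent(20))
    conc = time.perf_counter() - start
    print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

With a real 8-second response time, the same structural change is what turns a 13-minute sequential run into one that finishes in roughly the time of the slowest single request (rate limits permitting).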
A common next step is to run the script on a cloud server, like a basic DigitalOcean droplet or an AWS EC2 instance. This removes the dependency on a local machine but introduces new problems. Trying to parallelize the script with `multiprocessing` floods the API with uncoordinated requests from every worker process, quickly exceeding the rate limits attached to the shared API key. The API then throttles requests, leading to unpredictable failures and incomplete results.
We saw this with a product team analyzing 500 Intercom conversations for sentiment. Their script took 45 minutes to run locally. When moved to an EC2 instance, it crashed after 20 minutes due to API rate limiting. They were left with a partial dataset and no clear way to resume the job without reprocessing everything, doubling their API costs.
How Would Syntora Approach This?
Syntora's approach involves re-engineering your core processing logic for a serverless environment optimized for the Claude API. The first step in an engagement would be a discovery phase, where we would analyze your current local script, identify its core functions, and define the specific data flow and error handling requirements.
For document ingestion and batch processing, the architecture would typically utilize an S3 bucket for storage and an SQS queue to manage individual processing tasks. When new files are uploaded to the S3 bucket, messages are placed onto the SQS queue. This queue can then trigger multiple concurrent AWS Lambda invocations, immediately parallelizing the workload to accelerate processing of large document batches.
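A sketch of the Lambda handler for that trigger path: each SQS record's body carries the JSON notification that S3 publishes on upload (that nested shape is the standard AWS format; the actual document processing is left as a comment since it depends on the client's logic):

```python
import json


def handler(event: dict, context=None) -> dict:
    """Lambda entry point for SQS messages carrying S3 event notifications.

    Each SQS record's body is the JSON payload S3 publishes when a file
    is uploaded; we extract (bucket, key) pairs for processing.
    """
    processed = []
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            # The real function would download s3://{bucket}/{key} here
            # and send its contents to the Claude API.
            processed.append({"bucket": bucket, "key": key})
    return {"processed": processed}
```

Because SQS delivers batches to many concurrent Lambda invocations, this one handler is all the parallelism machinery the application code needs to contain.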
The AWS Lambda function itself would be a Python service, using httpx for efficient, asynchronous API calls to the Claude API. We would implement intelligent batching strategies, grouping multiple smaller documents into a single API call when feasible, to maximize context window utilization and optimize API costs.
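One way to sketch the grouping step that runs before each API call; character counts stand in for real token counting, and the 12,000-character budget is an illustrative assumption, not a Claude limit:

```python
def batch_documents(docs: list[str], max_chars: int = 12_000) -> list[list[str]]:
    """Group small documents so each batch fits in one Claude prompt.

    max_chars is a rough proxy for a context-window budget; a production
    system would count tokens rather than characters.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for doc in docs:
        # Flush the current batch when adding this doc would exceed the budget
        if current and current_size + len(doc) > max_chars:
            batches.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += len(doc)
    if current:
        batches.append(current)
    return batches
```

An oversized single document still gets its own batch rather than being dropped; each resulting batch then becomes one asynchronous httpx request.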
The resulting data is parsed into a guaranteed structure using Pydantic schemas and Claude's tool-use feature, ensuring data integrity and usability. The structured results are then written to a Supabase Postgres database for persistent storage, querying, and retrieval. For frequently accessed or identical documents, a caching layer is implemented at the database level: if the same document is submitted again, the system returns the cached result from Supabase without incurring a new API call.
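The caching idea can be sketched with a content hash as the cache key; here a plain dict stands in for the Supabase table, and `call_api` is a hypothetical wrapper around the Claude request:

```python
import hashlib


def document_fingerprint(content: bytes) -> str:
    """Deterministic cache key: byte-identical documents hash identically."""
    return hashlib.sha256(content).hexdigest()


def analyze_with_cache(content: bytes, cache: dict, call_api) -> dict:
    """Return a cached result when available, otherwise call the API.

    `cache` stands in for a Supabase table keyed on the fingerprint;
    `call_api` is the (hypothetical) function that invokes Claude.
    """
    key = document_fingerprint(content)
    if key in cache:
        return cache[key]          # cache hit: zero API cost
    result = call_api(content)     # cache miss: exactly one Claude call
    cache[key] = result
    return result
```

Keying on a content hash rather than a filename means a re-uploaded or renamed copy of the same document still hits the cache.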
The entire infrastructure would be defined using the Serverless Framework. This ensures repeatable, version-controlled deployments and simplifies future maintenance and scaling. The delivered system would be a fully operational, custom serverless application designed to meet your specific throughput and latency goals, operating on a pay-per-use cloud model.
What Are the Key Benefits?
From Local Script to Live API in Weeks, Not Months
We convert your proof-of-concept into a production-ready system in a matter of weeks, not the months it would take to learn DevOps and cloud architecture in-house.
Pay Per Millisecond, Not Per Server
AWS Lambda's cost model means you only pay for compute time when the code is running. Idle time costs nothing, unlike an always-on EC2 instance.
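As a rough illustration of that cost model (all prices below are assumptions for illustration only; check current AWS pricing for your region and instance type):

```python
# Illustrative cost comparison. The prices are assumptions, not quotes.
LAMBDA_PER_GB_SECOND = 0.0000166667    # assumed on-demand Lambda compute price
LAMBDA_PER_REQUEST = 0.20 / 1_000_000  # assumed per-request price
EC2_PER_HOUR = 0.0208                  # assumed price of a small instance


def lambda_monthly_cost(invocations: int, avg_seconds: float, memory_gb: float) -> float:
    """Lambda bills only for the milliseconds the function actually runs."""
    compute = invocations * avg_seconds * memory_gb * LAMBDA_PER_GB_SECOND
    requests = invocations * LAMBDA_PER_REQUEST
    return compute + requests


def ec2_monthly_cost(hours: float = 730) -> float:
    """An always-on server bills for every hour, busy or idle."""
    return hours * EC2_PER_HOUR


if __name__ == "__main__":
    # Example workload: 50,000 documents/month, ~5 s each at 512 MB
    print(f"Lambda:  ${lambda_monthly_cost(50_000, 5.0, 0.5):.2f}/month")
    print(f"EC2 24/7: ${ec2_monthly_cost():.2f}/month")
```

For bursty workloads that sit idle most of the day, the per-millisecond model is typically the cheaper of the two; a server that is busy around the clock can flip that comparison.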
You Get the Infrastructure-as-Code Repo
We deliver the complete Serverless Framework configuration and Python source code in your private GitHub repository. You own every line of code.
Alerts on Spikes in Cost or Errors
We configure AWS CloudWatch alarms to send a Slack notification if your error rates exceed 2% or if monthly costs are projected to exceed a set budget.
Connects to Any Data Source via S3
The system triggers on file uploads to an S3 bucket. You can connect any service, from HubSpot to a custom internal tool, by having it export data to S3.
What Does the Process Look Like?
Code Review and Scoping (Week 1)
You provide the existing Python script and grant read-only access to your AWS account. We audit the logic and deliver a fixed-scope proposal with a detailed architecture diagram.
Infrastructure Build (Week 1)
We build the core infrastructure using the Serverless Framework. You receive access to a private GitHub repository to see the deployment configuration as it's built.
Logic Migration and Testing (Week 2)
We port your script's logic into the Lambda function, add error handling, and deploy to a staging environment. You receive a test endpoint to validate results against your script.
Production Handoff and Monitoring (Week 3)
We deploy to production, connect it to your live data source, and monitor for two weeks. You receive a runbook detailing how to deploy updates and check logs.
Frequently Asked Questions
- How much does this type of production wrapper cost?
- Pricing depends on the complexity of the workflow. A single-step summarization task is straightforward. A multi-step process involving data extraction, enrichment, and summarization requires more engineering. After a 30-minute discovery call to review your existing script and goals, we provide a fixed-price proposal. Book a discovery call at cal.com/syntora/discover.
- What happens if the Claude API is down or returns an error?
- The SQS queue is configured with a dead-letter queue (DLQ). If a Lambda function fails after 3 automatic retries (e.g., due to a temporary API outage), the original message is moved to the DLQ. This prevents data loss and allows us to inspect and re-process failed jobs manually once the API is back online.
- Why not just use a managed service like AWS Bedrock?
- Bedrock provides managed access to the LLM endpoint, which solves some scaling issues, but it is not an application framework. We build the application layer on top: the caching, logging, cost tracking, input validation, and complex orchestration logic that a real business process requires. We often use Bedrock as the underlying inference engine.
- How do you manage API keys and secrets securely?
- We use AWS Secrets Manager to store the Anthropic API key and any other credentials. The Lambda function is granted a specific IAM role that allows it to retrieve the key at runtime. Keys are never hardcoded in the source code or stored in environment variables, following AWS security best practices.
- Can this system handle real-time requests, not just batch processing?
- Yes. For real-time use cases, we connect the Lambda function to an Amazon API Gateway endpoint. This provides a secure, public-facing REST API that your web or mobile application can call for synchronous responses. A typical cached response is returned in under 100ms; a non-cached API call to Claude returns in 2-5 seconds.
- What version of Claude do you build on?
- We build on the latest stable model with the best cost-performance ratio, which is currently Claude 3.5 Sonnet. The system is designed to be model-agnostic. The model name is stored in a configuration file, so switching to Opus, Haiku, or a future model requires changing only one line of code and redeploying.
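A minimal sketch of the model-as-configuration pattern described in the answer above; the environment-variable name and default model string are illustrative assumptions:

```python
import os

# The model name lives in configuration, not in application code, so
# switching models is a one-line config change plus a redeploy.
DEFAULT_MODEL = "claude-3-5-sonnet-latest"  # assumed default


def resolve_model() -> str:
    """Read the model name from the environment, falling back to the default."""
    return os.environ.get("CLAUDE_MODEL", DEFAULT_MODEL)
```

Every API call site then asks `resolve_model()` for the model name instead of hardcoding one, which is what keeps the system model-agnostic.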
Ready to Automate Your Technology Operations?
Book a call to discuss how we can implement AI automation for your technology business.
Book a Call