The Problem
A mid-market company needs new project management software. Someone Googles it, opens 15 tabs, closes 12 because pricing is gated. A spreadsheet appears with 40 columns. Stakeholders pile on. Nobody agrees on weights. Vendor demos eat weeks. An analyst report costs $5,000+ and is 6 months stale.
After 2-4 months, the team picks the option with the best demo or the most internal champions. The spreadsheet is abandoned. Total confidence: low.
For M&A analysts, multiply that by ten. Evaluating a target company's product means reading 50-100 pages of documentation per vendor, manually extracting features into a comparison matrix, and cross-referencing reviews and forums. That adds up to 200-500 hours of analyst time per evaluation. Every data point manually sourced. Every comparison manually maintained.
Architecture
The platform spans 4 repositories: a Next.js 15 frontend with 175 components, an Express.js backend with 90+ endpoints across 115 handler classes, a PostgreSQL database with pgvector (119 migrations), and a jobs service for background processing.
Key architectural rules: dependency injection everywhere with zero global singletons. Every handler, service, and worker receives typed dependencies via constructor. One handler per file. A 5,888-line OpenAPI 3.1 spec serves as the single source of truth for frontend types.
The API follows Stripe conventions: header-based versioning, idempotency keys on mutating endpoints, tiered rate limiting (20 req/min for AI chat, 100 for products, 200 for analytics), and structured error responses with documentation URLs. The frontend never touches the database directly.
The Algorithm
The core is a 5-dimension weighted scoring algorithm. Final Score = (Features × 0.30) + (Tags × 0.25) + (Spec × 0.25) + (Context × 0.10) + (Base × 0.10).
Feature Match (30%)
Must-have features weighted heavily (+100 per match). Nice-to-have features as bonus (max +20). Deal-breakers apply a -30 penalty each. Fuzzy matching with Levenshtein similarity >0.8 catches near-matches that exact comparison misses.
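The fuzzy pass can be sketched with a standard normalized edit distance. This is a minimal illustration, not the platform's actual code; the function names are ours:

```typescript
// Levenshtein distance via single-row dynamic programming.
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0];
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1, // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Normalized similarity in [0, 1]; values > 0.8 count as a fuzzy match.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}
```

Under this scheme "single sign-on" vs. "single signon" scores about 0.93, so the near-match counts even though exact string comparison fails.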
Tag Match (25%)
Jaccard similarity (intersection / union) between profile tags and product tags. Returns 0-100 based on overlap.
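The Jaccard computation is small enough to show inline. A sketch assuming tags arrive as sets of normalized strings (the function name is illustrative):

```typescript
// Jaccard similarity: |A ∩ B| / |A ∪ B|, scaled to 0-100.
function tagMatch(profileTags: Set<string>, productTags: Set<string>): number {
  if (profileTags.size === 0 && productTags.size === 0) return 0;
  let intersection = 0;
  for (const t of profileTags) if (productTags.has(t)) intersection++;
  const union = profileTags.size + productTags.size - intersection;
  return Math.round((intersection / union) * 100);
}
```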
Spec Match (25%)
Maps AI-detected specs from free text to structured data in the database. Supports 5 data types. Each spec carries a confidence score (0.0-1.0) and an evidence chain linking back to source documentation.
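A hypothetical shape for one extracted spec, illustrating the confidence score and evidence chain. The field names and the example URL are assumptions; the source does not publish its schema:

```typescript
// One of the 5 supported value types per spec (assumed breakdown).
type SpecValue = string | number | boolean | string[] | { min: number; max: number };

interface ExtractedSpec {
  name: string; // e.g. "sso_protocol"
  value: SpecValue;
  confidence: number; // 0.0-1.0, produced during research
  sources: string[]; // evidence chain: URLs into vendor documentation
  researchVersion: string; // which research run produced this value
}

// Illustrative instance (hypothetical URL).
const ssoSpec: ExtractedSpec = {
  name: "sso_protocol",
  value: "SAML 2.0",
  confidence: 0.92,
  sources: ["https://vendor.example/security-docs"],
  researchVersion: "pass-1-value",
};
```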
Context Match (10%)
Keyword extraction from use-case descriptions, with removal of 90+ stop words. Matches against product descriptions to catch intent signals that structured fields miss.
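A sketch of the extraction step, with a small excerpt standing in for the 90+ entry stop-word list (the excerpt and function name are assumptions):

```typescript
// Hypothetical excerpt of the stop-word list; the real list has 90+ entries.
const STOP_WORDS = new Set([
  "a", "an", "the", "for", "and", "we", "our", "to", "of", "need", "with",
]);

// Lowercase, split on non-alphanumerics, drop short words and stop words.
function extractKeywords(useCase: string): string[] {
  return useCase
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 2 && !STOP_WORDS.has(w));
}
```

The surviving keywords are then matched against product descriptions, which is how a phrase like "kanban boards for our agile team" surfaces products whose structured fields never mention "agile".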
Base Score (10%)
Data quality signal. Products with logos, descriptions, websites, and feature lists score higher. Baseline 50, capped at 100.
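Putting the five dimensions together, the final score is a plain weighted sum. A sketch using the weights from the formula above (each dimension assumed to be pre-scaled to 0-100; rounding is our assumption):

```typescript
interface DimensionScores {
  features: number; // 0-100 each
  tags: number;
  spec: number;
  context: number;
  base: number;
}

const WEIGHTS = { features: 0.3, tags: 0.25, spec: 0.25, context: 0.1, base: 0.1 };

function finalScore(d: DimensionScores): number {
  return Math.round(
    d.features * WEIGHTS.features +
      d.tags * WEIGHTS.tags +
      d.spec * WEIGHTS.spec +
      d.context * WEIGHTS.context +
      d.base * WEIGHTS.base,
  );
}
```

Because the weights sum to 1.0, a product that scores 100 on every dimension lands at exactly 100, and a weak context match can cost at most 10 points.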
A 4-level confidence system (Getting Started through Strong Recommendations) weighs profile completeness, conversation depth, evidence quality, and algorithm performance. Every match score includes a human-readable explanation from a dedicated ScoreExplainerService.
Scout AI
Scout is a conversational AI copilot backed by the scoring algorithm. Not a chatbot. A context-enriched assistant that guides users through decision-making across 4 modes: General (technical keyword auto-detection), Build Profile (guided extraction of 7 decision fields), Deep Dive (single-product analysis with score explanations), and Compare (side-by-side of 2-4 products).
A 5-stage context enrichment pipeline processes every message: technical keyword detection, product name extraction, context enrichment (fetch product details and URLs), intent detection (auto-select mode), and AI generation with enriched context. Token budgets scale by mode from 300 (simple general) to 2,000 (multi-product comparison).
Pipelines
Three BullMQ workers run continuously: a matching worker (concurrency 3) that calculates scores on profile changes, an analytics worker for click and conversion tracking, and a document processing worker that chunks uploads into 500-token segments with 768-dim Gemini embeddings stored in pgvector.
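The chunking step can be sketched as a fixed-size window over the document. Real token counts come from the embedding model's tokenizer; here whitespace-separated words stand in as a rough proxy, which is an assumption:

```typescript
// Split a document into fixed-size chunks (words approximate tokens here;
// production code would count tokens with the model's tokenizer).
function chunkDocument(text: string, maxTokens = 500): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```

Each chunk is then embedded (768 dimensions via Gemini, per the description above) and written to pgvector for retrieval.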
A 2-pass spec research pipeline discovers and extracts product data. Pass 1 (VALUE) crawls vendor documentation via Gemini with Google Search, extracting structured specs with confidence scores and source URLs. Pass 2 (SENTIMENT) crawls review sites for user sentiment scores (-1 to +1). Rate-limited at 3s between products, ~$0.02-0.05 per product.
The database uses 6 HNSW-indexed embedding tables for hybrid search: vector cosine similarity (70%) + BM25 keyword matching (30%) via PostgreSQL RPC. Component-level research breaks products into tiers (Free/Pro/Enterprise) with per-tier specs, pricing models, and dependency graphs.
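The 70/30 blend itself runs inside a PostgreSQL RPC, but the arithmetic is easy to show in application code. A sketch, assuming cosine similarity arrives in [0, 1] and raw BM25 scores are min-max-free and normalized by the batch maximum (both normalization choices are our assumptions):

```typescript
interface SearchHit {
  id: string;
  cosine: number; // 0-1 cosine similarity from pgvector
  bm25: number; // raw BM25 keyword score, normalized below
}

// Rank hits by 0.7 * vector similarity + 0.3 * normalized BM25.
function hybridRank(hits: SearchHit[]): SearchHit[] {
  const maxBm25 = Math.max(...hits.map((h) => h.bm25), 1e-9);
  const score = (h: SearchHit) => 0.7 * h.cosine + 0.3 * (h.bm25 / maxBm25);
  return [...hits].sort((a, b) => score(b) - score(a));
}
```

The weighting means a strong semantic match can outrank a pure keyword hit, while the BM25 term still rescues queries full of exact product or feature names.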
The Takeaway
1. Evidence chains, not claims.
Every spec value links back to its source URL, carries a confidence score, and tracks which research version produced it. This is a structured knowledge graph with provenance, not a spreadsheet with checkmarks.
2. Component-level granularity.
Most comparison tools say "Product X has SSO." This platform says "Product X has SSO on the Enterprise tier, not on Pro or Free, with SAML 2.0 support, confidence 0.92, sourced from their official security documentation."
3. Confidence is computed, not assumed.
The 4-level system weighs profile completeness, conversation depth, evidence quality, and algorithm performance. The system tells you when it does not know enough.
4. The algorithm is explainable.
Every match score decomposes into 5 weighted dimensions. Every dimension shows what matched, what is missing, and what penalized. Not a black box returning a number.
Ready to be our next case study?
Book a call to discuss your workflows. We'll show you exactly what we can build.
Get in touch