Cait’s Skills Architecture: Why It’s Different


May 19, 2026

Ben Levine

Technical Brief is a series from Candidly’s AI product team. Written for builders, evaluators, and anyone who wants to look under the hood, these posts go deeper on the architecture, research, and design decisions behind our AI-native platform.

Written by Ben Levine, Chief Product Officer at Candidly

Most AI assistants work the same way: a large language model, a prompt, and hope. Cait is built differently.

Candidly has developed a proprietary Skills and Executable Scripts architecture — a structured system where Cait’s domain expertise is encoded as verifiable, auditable, human-authored knowledge modules. Rather than relying on a model’s general training to handle PSLF eligibility, IBR calculations, or refinancing decision trees, Cait invokes purpose-built Skills and scripts that encode the exact procedural logic for each financial task.

The research backing this approach is new and specific. SkillsBench (Li et al., 2026), the first rigorous benchmark of Agent Skills, found that human-authored Skills more than doubled task completion rates in Finance — a +121% relative lift, ranking 2nd highest among all domains tested. The domains that benefit most from Skills are those requiring specialized procedural knowledge that general AI models don’t reliably internalize on their own. Financial services is one of them. And the data shows it’s not close: models without Skills frequently fail to recognize they even need domain-specific knowledge on financial tasks, defaulting to general-purpose approaches that miss critical procedural details.

Why this matters for enterprise partners

Accuracy you can verify. Executable scripts produce deterministic outputs. When Cait calculates a monthly payment or compares two loan scenarios, the math is traceable — not a model’s best guess. The benchmark results across financial tasks tell the story:

SEC financial report analysis: Without Skills, the pass rate was 0%. With well-authored Skills encoding the right regulatory procedures, it jumped to 75%.
Financial modeling and QA tasks: Models without Skills defaulted to generic spreadsheet operations, missing domain-specific calculation methods. With Skills, they followed the correct procedural steps — the same kind of structured logic Cait applies to loan comparisons, repayment plan analysis, and eligibility determinations.
Overall Finance domain: A +121% relative lift in task completion — ranking 2nd highest out of all 11 domains tested.

That’s the difference between an AI that sounds financial and one that actually performs.

Domain depth that compounds — and that most competitors can’t replicate. The paper found that the average publicly available Skill scores 6.2 out of 12 on a rigorous quality rubric. The Skills driving benchmark results scored 10.1 out of 12. That authoring quality gap is where differentiation lives. Self-generated AI knowledge provides zero average benefit — the lift comes entirely from human-authored, expert-crafted procedural knowledge. Candidly’s Skills library represents years of accumulated financial domain expertise encoded to a standard that the market hasn’t caught up to.

Built to stay on the frontier, regardless of the pace of AI innovation. One of the most important findings in the research: a smaller model with well-authored Skills outperformed a larger, more expensive model without them. For enterprise partners, this means Cait’s accuracy and reliability aren’t hostage to any single model provider or any particular moment in the AI arms race. Candidly’s Skills layer is model-agnostic — as the underlying models improve, our domain expertise compounds on top of an already-strong foundation. You’re not buying access to today’s best model. You’re buying a system that gets better as the field advances.

Efficiency at scale. Skills increased input costs by only 6-13% while delivering the accuracy improvements described above. The ROI is structural: a marginal increase in inference cost in exchange for transformative gains in task completion, compliance confidence, and user trust.

Partner-ready by design. Skills can be authored to reflect your institution’s specific products, policies, and advice frameworks. Each partner’s integration will differ in its details, and Cait’s architecture accommodates that without rebuilding from scratch each time.

This is how Candidly delivers AI that meets the compliance, accuracy, and trust bar that financial institutions require — not AI that sounds good but can’t be held accountable.

Brief: The “sandbox + skills + scripts” approach

Cait runs inside an isolated sandbox environment — a self-contained workspace where the agent has access to a structured filesystem, a bash shell, and Python execution. At session start, the agent reads its operating context from a set of canonical files:

Soul.md — personality, compliance guardrails, and communication principles (modeled after Anthropic’s constitutional approach)
Agent.md — execution rules, skill routing logic, and a scratchpad the agent updates as it learns during a session
User.json — the user’s full financial profile (read-only), loaded from LangGraph state
skills/ — a library of domain-specific skill directories

This filesystem-first design means Cait doesn’t rely on a massive system prompt stuffed with every possible instruction. Instead, it reads what it needs, when it needs it — keeping the context window focused and the reasoning sharp.
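To make the session-start step concrete, here is a minimal sketch of how an agent harness might read those canonical files. It is an illustration under stated assumptions, not Candidly's implementation: the OperatingContext type and the load_context function are hypothetical names, and only the file and directory names (Soul.md, Agent.md, User.json, skills/) come from the description above.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass(frozen=True)
class OperatingContext:
    """Hypothetical container for what the agent reads at session start."""
    soul: str                 # contents of Soul.md
    agent: str                # contents of Agent.md
    user: Dict[str, Any]      # parsed User.json (treated as read-only here)
    skill_names: List[str]    # directory names under skills/, not their contents


def load_context(workspace: Path) -> OperatingContext:
    """Read the canonical files from the sandbox workspace.

    Only the skill directory names are listed; the skill bodies stay on disk
    until a task needs them, which keeps the initial context small.
    """
    skills_dir = workspace / "skills"
    skill_names: List[str] = []
    if skills_dir.is_dir():
        skill_names = sorted(p.name for p in skills_dir.iterdir() if p.is_dir())

    return OperatingContext(
        soul=(workspace / "Soul.md").read_text(encoding="utf-8"),
        agent=(workspace / "Agent.md").read_text(encoding="utf-8"),
        user=json.loads((workspace / "User.json").read_text(encoding="utf-8")),
        skill_names=skill_names,
    )
```

The design choice worth noting is that the skills directory is indexed rather than loaded, which mirrors the "reads what it needs, when it needs it" behavior described above.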

Two layers of domain expertise

Skills (SKILL.md files) are structured, human-authored procedural knowledge for complex financial judgment calls — PSLF eligibility, repayment plan comparison, ESPP sell-vs-hold analysis. The agent reads the relevant skill before responding, so it reasons through the correct procedure rather than improvising from general training.
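As a rough sketch of "read the relevant skill before responding", the helper below loads one SKILL.md on demand. The function name read_skill and the skills/<name>/SKILL.md layout are assumptions made for illustration; the source only states that Skills are SKILL.md files inside skill directories.

```python
from pathlib import Path
from typing import Optional


def read_skill(workspace: Path, skill_name: str) -> Optional[str]:
    """Return the text of skills/<skill_name>/SKILL.md, or None if it is absent.

    The name must be a single directory name, so a request cannot walk
    outside the skills/ directory of the sandbox.
    """
    if skill_name in ("", ".", "..") or Path(skill_name).name != skill_name:
        raise ValueError(f"invalid skill name: {skill_name!r}")

    skill_file = workspace / "skills" / skill_name / "SKILL.md"
    if not skill_file.is_file():
        return None
    return skill_file.read_text(encoding="utf-8")
```

A caller would pass the returned text into the model's context for that turn only, so that the procedure is in front of the model when it reasons about, for example, PSLF eligibility.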

Executable Scripts (skills/*/scripts/*.py) handle deterministic financial math — monthly payment calculations, tax withholding, contribution limit enforcement. These produce traceable, verifiable outputs, not model-generated approximations.
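For a sense of what a deterministic script of this kind could look like, here is a small sketch of a fixed-rate monthly payment calculation using the standard amortization formula. It is not one of Candidly's scripts: the function name monthly_payment, its parameters, and the half-up rounding to the cent are all assumptions chosen for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_payment(principal: Decimal, annual_rate: Decimal, months: int) -> Decimal:
    """Fixed monthly payment for a fully amortizing loan.

    annual_rate is a fraction (0.06 means 6% per year), compounded monthly.
    Uses payment = P * r * (1 + r)^n / ((1 + r)^n - 1), with r = annual_rate / 12.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    if principal < 0 or annual_rate < 0:
        raise ValueError("principal and annual_rate must be non-negative")

    r = annual_rate / Decimal(12)
    if r == 0:
        # With no interest the loan is repaid in equal slices of principal.
        payment = principal / months
    else:
        growth = (1 + r) ** months
        payment = principal * r * growth / (growth - 1)

    # Rounding rule is an assumption; a real script would follow the lender's rule.
    return payment.quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(monthly_payment(Decimal("10000"), Decimal("0.06"), 12))
```

Using Decimal instead of float keeps the result reproducible to the cent, so the same inputs always yield the same output and every intermediate value can be logged and checked by hand.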

Why this architecture

Recent research (SkillsBench, Li et al., 2026) supports this approach: human-authored Skills produced a +121% relative lift in task completion in Finance, the 2nd highest of all 11 domains tested. Self-generated AI knowledge showed zero average benefit. The gap between average publicly available Skills (6.2/12 on the quality rubric) and the benchmark-driving Skills (10.1/12) is where differentiation lives.

The sandbox gives the agent a controlled, auditable execution environment. The skills and scripts give it domain depth that compounds over time and travels across models. Together, they deliver an AI that doesn’t just sound financial — it actually performs, with math you can trace and reasoning you can verify.