
Inside Spectre: A Conversation with Mav Full-Stack Developer Ilya Klapatok

When we first announced Spectre in October 2024, we set out to solve a problem that was slowing down our own work at Mav: integrating AI into Ruby on Rails felt messy and fragmented. The existing tools weren't Rails-like. They weren't clean. And for developers who just wanted to build, they added too much friction.

Since then, Spectre has evolved quickly. We've added support for Claude and Gemini alongside OpenAI and Ollama. We've built out RAG capabilities with MongoDB. We've refined prompt templates, improved error handling, and normalized the differences between providers so you can swap between them without rewriting your application logic.


But there's a lot that goes into making a tool like this actually work. Deciding what to abstract and what to leave exposed. Figuring out tricky bugs that eat up entire afternoons. Keeping things simple even when you're juggling different AI providers.

We sat down with Ilya, a full-stack developer at Mav and one of the core contributors to Spectre, to talk through how it all comes together: what makes multi-provider support so complex, how we handle prompts at scale, why we went with MongoDB for RAG, and where Spectre's headed next.

This is a behind-the-scenes conversation about the engineering decisions and lessons we've learned building an open-source tool for the Rails community.

If you're a Rails developer curious about how AI integration actually works under the hood, or if you're already using Spectre and want to know more about what's possible, this conversation is for you.

You can find Spectre on GitHub, and we'd love to hear how you're using it.


What problem does Spectre really solve? Why build another AI gem?

Spectre turns a messy pile of AI “remotes” into one universal remote for Rails. Each provider speaks a different dialect, exposes a differently shaped API, and ships features with slightly different rules. Without a unifying layer, your app ends up with branching logic everywhere: different request payloads, different error codes, different token counters, and different streaming semantics. Spectre abstracts all of that behind one clean interface so Rails developers can focus on product, not plumbing.

You often compare Spectre to a translator or a bridge. What’s behind that metaphor?

Think of four islands: OpenAI, Anthropic (Claude), Google (Gemini), and local models via Ollama. Each has unique customs — message formats, tool-calling schemas, function/JSON modes, and token limits. Spectre is the bridge that lets you walk across with the same suitcase: same Ruby code, same prompt templates, same error handling. Where providers differ, Spectre adapts, translating intentions rather than just copying words.

For non-technical folks: what’s the biggest challenge in supporting multiple AI providers?

It’s like teaching one universal remote to talk to TVs, consoles, and soundbars that all use different infrared codes. In practice, that means:

  • Payload translation: One provider wants messages with roles, another wants a prompt string, a third prefers a tool-call schema.

  • Capability gaps: Some models support embeddings, some don’t. Some stream reliably, others batch only.

  • Limits and pricing: Context sizes, rate limits, and token accounting vary.

Spectre hides those differences so you can swap providers without rewriting business logic.
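
To make the payload-translation point concrete, here's a minimal sketch of the general pattern (the class and key names are illustrative, not Spectre's actual internals): one canonical messages array gets reshaped into whatever each provider expects.

  # Hypothetical sketch: one canonical messages array, reshaped per provider.
  # Class and key names are illustrative, not Spectre's internals.
  module PayloadTranslator
    def self.for_openai(messages)
      # OpenAI-style chat payload: role-based messages pass through largely as-is.
      { messages: messages.map { |m| { role: m[:role], content: m[:content] } } }
    end

    def self.for_anthropic(messages)
      # Claude-style payload: the system message is hoisted to a top-level field;
      # only user/assistant turns remain in the messages array.
      system, rest = messages.partition { |m| m[:role] == "system" }
      {
        system: system.map { |m| m[:content] }.join("\n"),
        messages: rest.map { |m| { role: m[:role], content: m[:content] } }
      }
    end
  end

  canonical = [
    { role: "system", content: "You are a helpful support bot." },
    { role: "user",   content: "How do I reset my password?" }
  ]

  PayloadTranslator.for_openai(canonical)     # => { messages: [...] }
  PayloadTranslator.for_anthropic(canonical)  # => { system: "...", messages: [...] }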

Claude doesn’t support embeddings. How do you handle features that some models just can’t do?

Spectre separates “who generates text” from “who generates embeddings.” If the chosen chat model can't embed, Spectre uses an embedding-capable provider under the hood (e.g., OpenAI or a local embedding model via Ollama). It's like an outlet adapter: the appliance stays the same; we just pick the right plug for the wall it's going into.
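
As a rough illustration of that separation (the configuration method and option names here are hypothetical, not Spectre's documented API), the idea is to choose the chat provider and the embedding provider independently:

  # Hypothetical configuration sketch; method and option names are assumptions.
  Spectre.configure do |config|
    config.llm_provider       = :claude   # handles chat/completions
    config.embedding_provider = :openai   # handles embeddings, since Claude offers none
  end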

Can you give an example of something that works very differently between OpenAI and Claude, and how Spectre normalizes it?

  • Message formats: OpenAI prefers role-based messages (system, user, assistant), while Claude takes the system instruction as a separate top-level field and follows slightly different rules for tool calls. Spectre accepts a single, consistent messages array and restructures it per provider.

  • JSON mode: OpenAI has a strict JSON response mode; Claude needs careful prompting for JSON fidelity. Spectre offers a uniform “structured output” helper and adds provider-specific guardrails behind the scenes.

The result: your code calls one completions.create(...); Spectre adapts the wiring per vendor.
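
In practice the calling code might look something like this (a hedged sketch; the constant and option names approximate the idea rather than quote Spectre's exact API):

  # Illustrative call shape, not a verbatim copy of Spectre's API.
  messages = [
    { role: "system", content: "You are a concise support assistant." },
    { role: "user",   content: "Summarize our refund policy in two sentences." }
  ]

  # The same call works whether the configured provider is OpenAI, Claude, Gemini,
  # or Ollama; the adapter layer restructures messages and any structured-output
  # options per vendor.
  response = Spectre::Completions.create(messages: messages)
  response[:content]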

What’s special about Spectre’s prompt templates?

They live like Rails views. Instead of sprinkling prompt strings across controllers and services, you store templates as files, render them with locals, and can reuse partials. Highlights:

  • Files on disk (similar to app/views), versionable, reviewable, and testable

  • ERB-style interpolation and partials to keep prompts DRY and composable

  • Environment-aware variants (e.g., safer, smaller prompts in development)

  • Helpers for system vs. user content, role separation, and reusable sections

Prompt templates become first-class artifacts — you can code-review them, run tests that render them, and roll back like you would a view.
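
Here's a hedged sketch of what that workflow can look like; the template path, file layout, and render helper are assumptions for illustration rather than Spectre's exact conventions:

  # Illustrative only: the path and helper names are assumptions.
  #
  # app/spectre/prompts/support_answer/user.yml.erb
  #   question: <%= question %>
  #   context:
  #   <% documents.each do |doc| %>
  #     - <%= doc %>
  #   <% end %>

  retrieved_chunks = ["Rotate keys from Settings > API.", "Old keys expire after 24 hours."]

  rendered = Spectre::Prompt.render(
    template: "support_answer/user",
    locals: { question: "How do I rotate my API key?", documents: retrieved_chunks }
  )
  # `rendered` is plain text you can pass as the user message in a completion call.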

How does Spectre’s RAG (Retrieval-Augmented Generation) with MongoDB work?

Spectre lets you turn unstructured data (docs, tickets, wikis) into a searchable knowledge base:

  • Chunk: Break documents into semantically meaningful pieces.

  • Embed: Generate vectors using a supported embedding model.

  • Store: Save chunks + vectors + metadata in MongoDB.

  • Retrieve: On a question, embed the query, run vector similarity (plus optional metadata filters), and feed the top matches into the prompt.

To the app, it feels like a clean “ask the knowledge base” API. Behind the scenes, Spectre coordinates chunking, embeddings, indexing strategies, and relevance scoring.
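
A hedged sketch of that loop in Ruby (it assumes Mongoid models and MongoDB Atlas Vector Search; the embedding helper, document fields, and index name are illustrative, not Spectre's exact API):

  # Sketch of chunk -> embed -> store -> retrieve. Assumes Mongoid + Atlas Vector Search;
  # the Spectre::Embeddings call shape and the index name are assumptions.
  class KnowledgeChunk
    include Mongoid::Document
    field :text,      type: String
    field :embedding, type: Array   # vector from the embedding model
    field :source,    type: String  # metadata for optional filtering
  end

  # Ingest: chunk a document, embed each piece, store chunk + vector + metadata.
  chunks = ["Password resets live under Settings.", "Reset links expire after 15 minutes."]
  chunks.each do |text|
    vector = Spectre::Embeddings.create(text)
    KnowledgeChunk.create!(text: text, embedding: vector, source: "docs/passwords.md")
  end

  # Retrieve: embed the query, run a vector similarity search, feed top matches to the prompt.
  query_vector = Spectre::Embeddings.create("How long is a reset link valid?")
  matches = KnowledgeChunk.collection.aggregate([
    { "$vectorSearch" => {
        "index" => "chunk_embeddings", "path" => "embedding",
        "queryVector" => query_vector, "numCandidates" => 100, "limit" => 3 } }
  ])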

What did it take to evolve Spectre across releases? What was the hardest part?

The hardest part is resisting leaky abstractions. Each provider tempts you to pass through one more vendor-specific knob. We held the line by:

  • Defining a narrow core interface for chat, embeddings, and moderation

  • Normalizing error classes and timeouts

  • Agreeing on a canonical message structure and token-count semantics

Any time we broke our own rules, it multiplied complexity. The craft was saying “no” until we could generalize cleanly.
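
One way to picture that narrow core interface (a sketch of the general pattern, with illustrative names rather than Spectre's actual class layout):

  # Every provider adapter implements the same small surface against one canonical
  # message structure; names are illustrative.
  class ProviderAdapter
    # messages: [{ role: "system" | "user" | "assistant", content: String }]
    def complete(messages:, **options)
      raise NotImplementedError
    end

    def embed(text)
      raise NotImplementedError
    end

    def moderate(text)
      raise NotImplementedError
    end
  end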

You mention improved error handling and edge case fixes. Can you share a tricky bug?

A fun one: different providers report Retry-After differently — some in seconds, some effectively in milliseconds, and some omit it. We once over-waited by 1,000x after a rate limit burst. Spectre’s fix:

  • Parse the header robustly, with a fallback for ambiguous units

  • Add jittered exponential backoff

  • Normalize rate limit errors to a single exception with a suggested retry window

It sounds small, but it made bursty production traffic stable across vendors.
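
A minimal sketch of that fix (the error class, header accessor, and thresholds are illustrative; this shows the shape of the approach, not Spectre's code):

  # Illustrative names throughout.
  class RateLimited < StandardError
    attr_reader :retry_after_header
    def initialize(msg = "rate limited", retry_after_header: nil)
      super(msg)
      @retry_after_header = retry_after_header
    end
  end

  def retry_after_seconds(header_value)
    return nil if header_value.to_s.empty?
    value = header_value.to_f
    # Some providers effectively report milliseconds; treat an implausibly large
    # per-request wait as ms and convert.
    value > 300 ? value / 1000.0 : value
  end

  def with_backoff(max_attempts: 5)
    attempt = 0
    begin
      yield
    rescue RateLimited => e
      attempt += 1
      raise if attempt >= max_attempts
      wait = retry_after_seconds(e.retry_after_header) || 2**attempt
      sleep(wait + rand * 0.5) # jitter so bursty clients don't retry in lockstep
      retry
    end
  end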

How do you test something that spans four providers?

  • Contract tests: Every provider adapter must pass the same spec suite (same inputs, same high-level outputs). If it quacks like a Completion, it ships (see the sketch after this list).

  • Deterministic runs: Seeded randomness and temperature controls for reproducible outputs where possible.

  • Recorded sessions: Fixture-style “cassettes” for integration tests to avoid flakiness and cost.

  • Chaos + limits: Simulate timeouts, partial streaming, truncated contexts, and rate limit storms to verify retries and fallbacks.

  • RAG correctness: Golden-answer tests with curated corpora to validate retrieval + prompting pipelines.
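
A hedged RSpec sketch of the contract-test idea above; the adapter class, error class, and stub helper are illustrative, not Spectre's real test suite:

  # Every adapter must pass the same shared examples; names here are illustrative.
  RSpec.shared_examples "a chat adapter" do
    it "returns text for a simple prompt" do
      reply = adapter.complete(messages: [{ role: "user", content: "Say hi" }])
      expect(reply[:content]).to be_a(String)
    end

    it "raises a normalized error on rate limiting" do
      stub_rate_limit(adapter) # hypothetical test helper that forces a 429
      expect {
        adapter.complete(messages: [{ role: "user", content: "Say hi" }])
      }.to raise_error(RateLimited)
    end
  end

  RSpec.describe "OpenAI adapter" do
    it_behaves_like "a chat adapter" do
      let(:adapter) { OpenAIAdapter.new }
    end
  end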

How do you keep prompts maintainable at scale?

  • Treat prompts like views: code reviews, linting for obvious pitfalls (e.g., ambiguous instructions)

  • Versioning and changelogs for important prompt shifts

  • Extract partials for reusable patterns (disclaimers, safety rails)

  • Store domain context in RAG rather than ballooning prompts

  • Telemetry: log template + model + token usage to track regressions and cost

What does multi-model support unlock for Rails developers?

  • Cost/performance tuning per task: Use a small, fast model for routing and a larger model for reasoning (sketched after this list).

  • Latency hedging: Fail over to a second provider on outages.

  • Data residency and privacy: Local models via Ollama where cloud use is restricted.

  • Feature fit: If a provider lacks embeddings or function calling, pair it with another that does — without rewriting app logic.
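
To make the routing and failover ideas above concrete, here's an illustrative sketch (the provider/model options, error class, and model names are assumptions, not Spectre's documented API):

  # Illustrative routing + failover sketch; option and model names are assumptions.
  def answer(question)
    # A small, fast model classifies the request first.
    route = Spectre::Completions.create(
      provider: :openai, model: "small-fast-model",
      messages: [{ role: "user", content: "Classify as faq or complex: #{question}" }]
    )

    provider, model =
      if route[:content].to_s.include?("complex")
        [:claude, "large-reasoning-model"]
      else
        [:openai, "small-fast-model"]
      end

    Spectre::Completions.create(provider: provider, model: model,
                                messages: [{ role: "user", content: question }])
  rescue Spectre::ProviderUnavailable
    # Outage failover: same messages, a local model via Ollama.
    Spectre::Completions.create(provider: :ollama, model: "local-model",
                                messages: [{ role: "user", content: question }])
  end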

Paint a real-world use case that benefits from RAG on MongoDB.

Imagine a support portal. Index your product docs and past tickets into MongoDB as vectors. When a user asks, Spectre retrieves the top relevant snippets and drafts an answer citing exact passages. If the cloud is down, switch to a local LLM for drafting; keep embeddings with a cloud model or local embedder. Same Rails code, different power outlets.

Talking to a dev who’s hesitant to add AI because it seems complex — what do you tell them?

Start with one controller action and one prompt template. Spectre gives you:

  • A single API surface for four different ecosystems

  • Prompt files you can review like views

  • Built-in patterns for RAG without standing up extra infra

  • Clear error handling and sane defaults

The learning curve is “Rails-y”: a few conventions, then you’re productive.
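
A hedged version of that first step (the template path, render helper, and completions call shape are illustrative, not copied from Spectre's docs):

  # One controller action, one prompt template; names are illustrative.
  class AnswersController < ApplicationController
    def create
      prompt = Spectre::Prompt.render(
        template: "faq/user",
        locals: { question: params[:question] }
      )

      reply = Spectre::Completions.create(
        messages: [{ role: "user", content: prompt }]
      )

      render json: { answer: reply[:content] }
    end
  end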

Garrett Wilson
