Baptiste Laget

Founder

Building Memno: A Technical Deep Dive into Personal AI Infrastructure

Jul 4, 2025

When we set out to build Memno, we faced a fundamental architectural challenge: how do you create an AI assistant that truly knows you without becoming a privacy nightmare?

Most AI systems today follow one of two patterns. They either process everything in a shared environment where your data mingles with everyone else's, or they rely on client-side processing that limits their capabilities. We wanted something different: a system with the power of cloud-scale AI and the privacy guarantees of local-first software.

The answer came from applying the same architectural principles we used with Prompteus to the personal AI space.

The Isolation Principle

At the heart of Memno's architecture lies a simple but powerful idea: every user exists in their own universe.

When you sign up for Memno, we don't just create an account in a database. We spin up an entirely isolated environment dedicated to you. Your documents, your conversations, your vector embeddings: everything lives in storage instances that belong to you alone. No shared tables. No multi-tenant databases. No risk of your query accidentally surfacing someone else's data.

This isolation extends to compute as well. Each user gets dedicated processing resources, complete with:

  • Personal vector indexes

  • Isolated document storage

  • Dedicated database instances for structured data and metadata

  • Compute resources that scale with your usage, not your neighbors'
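
To make the contrast with a shared, multi-tenant table concrete, here is a minimal sketch in Python. It is not Memno's implementation: the `UserEnvironment` class and its methods are hypothetical names, and an in-memory SQLite database stands in for whatever storage each user actually gets. The point is only that each user has a separate database, so a query has no shared table to leak from.

```python
import sqlite3


class UserEnvironment:
    """Hypothetical per-user environment: one private database per user."""

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        # Each instance opens its own in-memory database; nothing is shared.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

    def add_document(self, body: str) -> int:
        cur = self.db.execute("INSERT INTO documents (body) VALUES (?)", (body,))
        self.db.commit()
        return cur.lastrowid

    def search(self, term: str) -> list[str]:
        rows = self.db.execute(
            "SELECT body FROM documents WHERE body LIKE ?", (f"%{term}%",)
        )
        return [body for (body,) in rows]


alice = UserEnvironment("alice")
bob = UserEnvironment("bob")
alice.add_document("Q3 marketing budget notes")
bob.add_document("Dentist appointment on Friday")
```

Searching for "budget" in `bob`'s environment returns nothing, not because a filter on a user ID was applied correctly, but because the row does not exist in his database at all.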

This isn't just privacy theater. It's privacy by architecture.

Building at the Edge

We built Memno on infrastructure that operates at the edge—close to where you are, not in some distant data center. This gives us two critical advantages:

Speed: When you ask Memno a question, it's searching through your personal knowledge graph on servers near you. Sub-50ms vector searches aren't just possible; they're the norm. Your AI assistant should think as fast as you do.

Scale: Traditional architectures hit scaling walls because they centralize everything. Our approach scales linearly. Whether we have a thousand users or a million, each one gets the same isolated, high-performance experience.

The edge computing model also enables something crucial: true data locality. Your information is processed and stored in your region, adding another layer of privacy protection through geographic isolation.

The Memory Engine

Memno's intelligence comes from what we call the Memory Engine: a system that transforms everything you share into searchable, semantic knowledge.

When you upload a document, take a photo, or record a voice memo, here's what happens behind the scenes:

  1. Intelligent Processing: Documents are converted to structured formats while preserving their semantic meaning. OCR and object detection run on images. Audio gets transcribed. But we go beyond simple extraction: we understand context, maintain document structure, and preserve the relationships between ideas.

  2. Semantic Chunking: Rather than naive splitting, we use context-aware algorithms to break content into meaningful segments. A paragraph about quarterly results stays together. A recipe doesn't get split between ingredients and instructions. Each chunk maintains metadata about its source and position, enabling us to reconstruct context when needed.

  3. Vector Embedding: Each chunk gets transformed into high-dimensional vectors that capture meaning, not just keywords. These embeddings power semantic search that understands intent rather than matching exact terms.

  4. Knowledge Graph Storage: Everything lands in your personal knowledge graph, where it's instantly searchable and connected to everything else you've shared. The graph structure preserves relationships between documents, temporal connections, and conceptual links that emerge from your data.
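
The chunking step is the easiest one to illustrate. The sketch below is a deliberately simple stand-in, assuming paragraph boundaries (blank lines) are good enough markers of meaning; real context-aware chunking is more involved. `Chunk` and `chunk_paragraphs` are hypothetical names. It shows the two properties described above: a paragraph is never cut in half, and every chunk carries its source and position.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str    # document the chunk came from
    position: int  # index of the chunk within that document
    text: str


def chunk_paragraphs(source: str, text: str, max_chars: int = 400) -> list[Chunk]:
    """Pack whole paragraphs into chunks of at most max_chars where possible.

    A paragraph is never split; one longer than max_chars becomes its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(Chunk(source, len(chunks), current))
            current = para
        else:
            current = candidate
    if current:
        chunks.append(Chunk(source, len(chunks), current))
    return chunks


doc = "Revenue grew in Q3.\n\nIngredients: flour, eggs.\n\nInstructions: mix and bake."
chunks = chunk_paragraphs("notes.txt", doc, max_chars=30)
```

With a small `max_chars` each paragraph lands in its own chunk; with a large one they are packed together, but always along paragraph boundaries.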

The magic is in how these pieces work together. When you ask "What did we decide about the marketing budget?", Memno doesn't just search for those keywords. It understands the semantic relationship between budgets, decisions, and marketing, pulling from meeting notes, emails, and documents to give you a complete answer.
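
Under the hood, this kind of lookup is usually a nearest-neighbor search over embedding vectors. The following sketch ranks chunks by cosine similarity; it is an assumption about the general technique, not a description of Memno's index. The three-dimensional vectors are made up for illustration, and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]


# Made-up vectors standing in for real embeddings.
index = {
    "meeting-notes": [0.9, 0.1, 0.0],
    "budget-email": [0.8, 0.3, 0.1],
    "recipe": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of the budget question
results = top_k(query, index)
```

The two budget-related chunks rank first and the recipe is left out, even though no keyword matching took place.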

Distributed Security by Design

Security in Memno isn't a feature. It's the foundation everything else is built on.

Our Request Engine, a distributed processing system we pioneered with Prompteus, ensures no single AI provider ever sees your complete data. When Memno needs to understand a document or answer a question, it orchestrates across multiple AI models and providers, each seeing only the minimum context required for their specific task.

This is the same battle-tested approach that secures Prompteus for enterprise customers, now applied to personal AI. The Request Engine works like a sophisticated conductor:

  • Query Analysis: Your question is first analyzed to determine what context is needed

  • Distributed Retrieval: Relevant information is pulled from your isolated storage

  • Fragmented Processing: Different AI providers handle different aspects—one might process intent, another generates the response, a third handles refinement

  • Secure Synthesis: Results are combined in a secure environment before returning to you
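
One way to picture the steps above is a stage table in which every stage declares the only fields it may read. This is a sketch under assumptions, not the Request Engine itself: `STAGES`, `run_request`, and the stub providers are hypothetical, and the lambdas stand in for calls to separate AI services.

```python
# Hypothetical stage table: (stage name, fields it may read, field it writes).
STAGES = [
    ("intent", ("question",), "intent"),
    ("generate", ("intent", "passages"), "draft"),
    ("refine", ("draft",), "answer"),
]


def run_request(question, passages, providers):
    """Run each stage with only the fields it is allowed to read."""
    state = {"question": question, "passages": passages}
    seen = {}  # records which fields each provider actually received
    for stage, reads, writes in STAGES:
        view = {field: state[field] for field in reads}
        seen[stage] = set(view)
        state[writes] = providers[stage](view)
    return state["answer"], seen


# Stub providers standing in for calls to separate AI services.
providers = {
    "intent": lambda view: "lookup:" + view["question"].lower(),
    "generate": lambda view: f"{view['intent']} -> {len(view['passages'])} passages",
    "refine": lambda view: view["draft"].strip(),
}
answer, seen = run_request("Budget?", ["note A", "note B"], providers)
```

In this sketch the provider handling intent never receives the retrieved passages, and the one handling refinement never receives the original question.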

This is an approach our team learned while building infrastructure for healthcare systems in a previous life. Think of it like a surgical team where each specialist only sees what they need to see. The radiologist reads the scan. The anesthesiologist monitors vitals. The surgeon operates. No single person has unnecessary access to everything.

This approach protects against both external threats and the AI providers themselves. Even if a provider wanted to reconstruct your data (which they can't, given our agreements), they'd only have fragments without the context to piece them together.

The Workflow Architecture

Memno inherits and extends the workflow architecture we developed for Prompteus, adapted for personal AI use cases. Our workflow system operates on several key principles:

Event-Driven Processing: When you share a 100-page PDF or forward a long email thread, Memno acknowledges receipt instantly and triggers asynchronous workflows. Each workflow is self-contained and idempotent. If something fails, it can safely retry without duplicating work.
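
A minimal sketch of what "idempotent with safe retries" can mean in practice, assuming a content hash is used as the idempotency key (the post does not say how Memno keys its workflows). `IngestWorkflow` and `flaky` are hypothetical names.

```python
import hashlib


class IngestWorkflow:
    """Hypothetical workflow: retries on failure, never repeats finished work."""

    def __init__(self) -> None:
        self.completed: dict[str, str] = {}  # idempotency key -> stored result

    @staticmethod
    def key(doc: bytes) -> str:
        return hashlib.sha256(doc).hexdigest()

    def handle(self, doc: bytes, process, max_attempts: int = 3) -> str:
        k = self.key(doc)
        if k in self.completed:  # duplicate delivery: return the stored result
            return self.completed[k]
        last_error = None
        for _ in range(max_attempts):
            try:
                result = process(doc)
            except RuntimeError as err:
                last_error = err  # treat as transient and try again
                continue
            self.completed[k] = result
            return result
        raise RuntimeError("processing failed after retries") from last_error


attempts = {"n": 0}


def flaky(doc: bytes) -> str:
    """Simulated processor that fails on its first call only."""
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient failure")
    return f"{len(doc)} bytes processed"


wf = IngestWorkflow()
first = wf.handle(b"hello", flaky)   # fails once, then succeeds on retry
second = wf.handle(b"hello", flaky)  # same document again: no reprocessing
```

The processor runs twice in total: one failed attempt and one successful retry. The second delivery of the same document is answered from the stored result.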

Parallel Execution: Document processing happens in parallel streams:

  • Content extraction runs independently of metadata parsing

  • Vector embedding proceeds as soon as chunks are ready

  • Index updates happen in micro-batches for optimal performance
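
The streams above can be sketched with `asyncio`. The coroutines here are placeholders with hypothetical names, and the "embedding" is just the chunk length, so treat this as an outline of the concurrency shape only. As a simplification, embedding in this sketch starts once all chunks are extracted rather than chunk by chunk.

```python
import asyncio


async def extract_content(doc: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for real I/O
    return doc.split(". ")


async def parse_metadata(doc: str) -> dict:
    await asyncio.sleep(0)
    return {"length": len(doc)}


async def embed(chunk: str) -> list[float]:
    await asyncio.sleep(0)
    return [float(len(chunk))]  # placeholder vector, not a real embedding


async def ingest(doc: str, batch_size: int = 2):
    # Content extraction and metadata parsing run concurrently.
    chunks, metadata = await asyncio.gather(extract_content(doc), parse_metadata(doc))
    # All chunks are embedded concurrently.
    vectors = await asyncio.gather(*(embed(c) for c in chunks))
    # Index updates are grouped into micro-batches.
    batches = [vectors[i:i + batch_size] for i in range(0, len(vectors), batch_size)]
    return metadata, batches


metadata, batches = asyncio.run(ingest("One. Two. Three"))
```

Three chunks with a batch size of two produce one full micro-batch and one partial one.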

Intelligent Routing: Workflows dynamically route to appropriate processing resources based on content type and complexity. A simple text note processes differently than a complex PDF with embedded images and tables.

State Management: Every workflow maintains durable state, enabling complex multi-step processes that can pause, resume, and recover from failures. Processing a large document library? Memno remembers exactly where it left off, even across system updates.
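
Durable state can be as simple as a checkpoint written after every completed step. The sketch below assumes a JSON file as the state store, which is an illustration rather than what Memno uses; `process_library` and its `fail_at` parameter (used only to simulate a crash) are hypothetical.

```python
import json
import os
import tempfile


def process_library(docs, state_path, fail_at=None):
    """Process docs in order, saving a checkpoint after each one."""
    done = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)["done"]  # resume from the last checkpoint
    processed = []
    for i in range(done, len(docs)):
        if i == fail_at:
            raise RuntimeError("simulated crash")
        processed.append(docs[i].upper())  # stand-in for real processing
        with open(state_path, "w") as f:
            json.dump({"done": i + 1}, f)
    return processed


state_path = os.path.join(tempfile.mkdtemp(), "state.json")
docs = ["a", "b", "c", "d"]
try:
    process_library(docs, state_path, fail_at=2)  # crashes before the third doc
except RuntimeError:
    pass
resumed = process_library(docs, state_path)  # picks up at the third doc
```

The second run handles only the documents that were not finished before the crash.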

These workflows are resilient. If OCR fails on page 47 of your document, the system retries with different parameters. If a vector embedding times out, it queues for reprocessing. Your data gets processed completely and correctly, even if individual steps encounter issues.

Performance at Scale

Here's what surprised us: isolation doesn't just improve privacy, it improves performance.

In traditional multi-tenant systems, one user's complex query can slow down everyone else. A company uploading thousands of documents creates processing queues that affect all users. Not in Memno.

Your isolated environment means your performance is predictable:

  • Vector searches: 23ms median, 45ms p99

  • API responses: 47ms median, 312ms p99

  • Document ingestion: 2.3s for typical PDFs

  • Concurrent operations: No throttling between users

These aren't theoretical limits. They're production metrics across our daily active users. The architecture supports dozens of concurrent workflows per user, with automatic queuing for burst loads.
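
For readers unfamiliar with the notation, median and p99 are percentiles over recorded latencies. Here is a small sketch of the nearest-rank method, one common way to compute them; the sample data is synthetic and unrelated to the figures above, and `percentile` is a hypothetical helper.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [float(n) for n in range(1, 101)]  # synthetic: 1.0 .. 100.0
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

A p99 figure tells you what the slowest 1% of requests look like, which is why it is reported alongside the median.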

The Phone Call Problem

One of Memno's most ambitious features is the ability to make phone calls on your behalf. This presented unique architectural challenges.

How do you give an AI system enough context to handle a call intelligently while ensuring it can't leak sensitive information? Our solution: containerized execution environments.

When Memno makes a call to reschedule your dentist appointment, it operates in a sandboxed environment with only the specific information needed for that task. It knows about the appointment, your calendar availability, and your preferences. It doesn't have access to your financial documents or that confidential merger memo you uploaded last week.

This same principle applies to email responses and other external interactions. Memno is smart enough to handle complex tasks but architecturally prevented from oversharing. And when in doubt, Memno will ask for your input before sharing anything.
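
The scoping described above can be sketched as an allowlist per task. The task name, field names, and `build_task_context` function are all hypothetical; the sketch only shows the principle that a task's context is built by copying in what is permitted, rather than by trying to filter out what is sensitive.

```python
# Hypothetical allowlist: the only fields each task may receive.
TASK_SCOPES = {
    "reschedule_appointment": {"appointment", "calendar", "preferences"},
}


def build_task_context(task: str, user_data: dict) -> dict:
    """Copy only the allowlisted fields into the sandbox context."""
    allowed = TASK_SCOPES.get(task, set())  # unknown task: nothing is shared
    return {k: v for k, v in user_data.items() if k in allowed}


user_data = {
    "appointment": "Dentist, Tuesday 10:00",
    "calendar": ["Thursday 14:00", "Friday 09:00"],
    "preferences": {"time_of_day": "afternoon"},
    "financial_documents": ["tax-return.pdf"],
    "merger_memo": "confidential",
}
context = build_task_context("reschedule_appointment", user_data)
```

Because the default for an unknown task is an empty set, a new task type shares nothing until someone explicitly grants it fields.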

What's Next

We're just getting started. The architecture we've built for Memno is designed to support capabilities we haven't even announced yet.

The same workflow architecture that powers Prompteus's enterprise features gives us a foundation for sophisticated personal AI capabilities:

Intelligent Agents: Workflows that span days or weeks, maintaining state and context as they help you accomplish complex goals.

Collaborative Intelligence: Share specific knowledge graphs with teammates while maintaining complete isolation of everything else. The same distributed security model that protects your data can enable selective, secure sharing.

Domain-Specific Processing: Specialized pipelines that understand not just text and images, but code, spreadsheets, and structured data with domain-specific intelligence. The workflow system can route to specialized processors based on content type and user needs.

Predictive Assistance: With your permission, Memno could proactively surface relevant information before you ask, powered by the same event-driven architecture that processes your documents today.

The foundation is there. Every architectural decision we've made, from edge computing to isolated environments to distributed processing, sets us up to build features that would be impossible in traditional architectures.

Building Different

When we started Memno, we could have taken the easy path. We could have built another chatbot with a vector database bolted on. Instead, we took the architectural principles that make Prompteus enterprise-grade and reimagined them for personal use.

Memno represents our belief that personal AI should be both powerful and private. That you shouldn't have to choose between capability and control.

We've built Memno to be the AI assistant we wanted for ourselves: one that remembers everything, understands context, and keeps our data completely private.

The result is an AI system that's not just secure by default, but secure by design. Not just fast when it launches, but fast as it scales. Not just private in policy, but private in architecture.

This is how we think personal AI should be built: with privacy and performance as the foundation, not features.

The second brain you've been waiting for.
