XMem — memory API

v1.0 · public beta

Memory that captures reasoning, not just results

Memory API for agent builders that resolves conflicts, tracks versions, and preserves lineage, so your agents reason with the right context.

Your AI tool has access to everything, but it still doesn't understand your business. With XTrace, it does.

Benchmarks

The numbers that matter in production.

92.3% · LOCOMO
Long-form conversational memory. Highest reported score on the leaderboard.

95% · fewer tokens
vs. dumping chat history. Because resolved beliefs beat raw transcripts.

<80ms · p50 recall latency
p99 < 220ms. Fast enough for voice. Global edge on Enterprise.

LOCOMO — long-term conversational memory

Higher is better. Evaluated on the public LOCOMO suite (Snap Research). Competitor scores from published vendor papers & blogs.

XMem · 92.3%
Mem0 (selective) · 91.6%
Zep · 75.1%
Letta (filesystem) · 74.0%
OpenAI Memory · ~52.9%

Notes: Mem0 score = their 2026 token-efficient pipeline (mem0.ai/research). Zep = Zep's own corrected number per their rebuttal (getzep.com). Letta = their published filesystem-only baseline. OpenAI Memory is estimated from Mem0's comparative "26% relative lift over OpenAI" figure.

Under the hood

Why not just a RAG or vector DB?

A vector DB answers "what words are most similar?" A memory engine answers "what's true right now?" Those are different queries, and they need different data structures.

Dimension · Naive RAG ("what words are most similar?") · XMem ("what's true right now?")

What it stores
Naive RAG: Embeddings of strings. No notion of "true right now."
XMem: Structured facts with timestamps, sources, and confidence.

Conflicts
Naive RAG: Contradictions coexist forever. Retrieval picks whichever scores higher.
XMem: Newer facts revise older ones. Retractions are first-class.

Lineage
Naive RAG: None. You can't answer "why do you believe this?"
XMem: Every answer carries lineage. One call returns the fact plus its provenance.

Time decay
Naive RAG: None. A fact from Jan 2024 ranks with a fact from yesterday.
XMem: Decay curves per fact type. Preferences age differently than contracts.

Integration
Naive RAG: You'll end up writing a belief layer on top.
XMem: One SDK call. Real intelligence out of the box; we handle the complexity for you.
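The "newer facts revise older ones" row can be sketched in a few lines. This is purely illustrative: the field names (`key`, `value`, `at`) and the `resolve` helper are assumptions for this example, not the actual XMem schema or engine.

```python
# Illustrative sketch of belief revision: newer facts revise older ones.
# Field names here are assumptions, not the real XMem data model.
facts = [
    {"key": "user.plan", "value": "Startup", "at": "2025-06-01T00:00:00Z"},
    {"key": "user.plan", "value": "Scale",   "at": "2026-01-12T09:30:00Z"},
]

def resolve(facts, key):
    """Answer "what's true right now?" for a key: keep every version
    for lineage, but surface only the most recent belief."""
    candidates = [f for f in facts if f["key"] == key]
    return max(candidates, key=lambda f: f["at"])  # ISO-8601 sorts lexically

print(resolve(facts, "user.plan")["value"])  # Scale
```

Note that the older fact is never deleted: it stays in the list, which is what makes supersede chains and "why do you believe this?" answerable.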

Built from the ground up

Built from the ground up

XMem runs on our own vector store — written from scratch, homomorphically encrypted so embeddings are searchable without ever being decrypted on our servers. The SDK is open source.

XMem runs on our own vector store — written from scratch, homomorphically encrypted so embeddings are searchable without ever being decrypted on our servers. The SDK is open source.

How it works

One write. One recall.
A belief engine in the middle.

Click a node on the timeline to see how memory builds up with XTrace
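The write/recall loop can be sketched with an in-memory stand-in for the engine. `XMem`, `add`, and `recall` below are placeholder names chosen for this sketch, not the real SDK surface.

```python
# Hedged sketch of "one write, one recall" with a toy in-memory engine.
# Class and method names are placeholders, not the actual XMem SDK.
class XMem:
    def __init__(self):
        self._events = []

    def add(self, user_id, text):
        """One write: record an event for a user."""
        self._events.append({"user": user_id, "text": text})

    def recall(self, user_id, query):
        """One recall: return this user's events relevant to the query."""
        return [e["text"] for e in self._events
                if e["user"] == user_id and query.lower() in e["text"].lower()]

mem = XMem()
mem.add("u_42", "User prefers dark mode and the Scale plan")
print(mem.recall("u_42", "plan"))  # ['User prefers dark mode and the Scale plan']
```

In the real engine the belief-revision step sits between the write and the recall; the toy above skips it to keep the surface visible.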

Use cases

Our memory is the difference between an agent and a chatbot.

Support agents

Know the user's plan, their last three tickets, and the policy changes since they signed up. Stop asking "what's your order number?"

Coding agents

Remember which patterns this repo favors, what conventions were deprecated last month, and which files the user owns.

Voice agents

<80ms p50 is the latency budget you need. Facts-only recall means short prompts — fewer tokens, lower cost, faster TTFB.

Research agents

Track what's been read, what's been ruled out, and which claims are still load-bearing. Supersede chains = reproducible research.

Companions

Users want an AI that actually knows them. Not one that forgot their dog's name between sessions.

Workflow / ops agents

Long-running agents that span days and tools. Lineage matters here more than anywhere — XMem gives you audit + replay.

For CTOs

The memory layer you'd build if you had a year.

Every team building agents writes some version of this themselves. De-duping facts. Hand-rolled entity resolution. Deciding whose contradiction to trust. It's a year of distraction from your product.

XMem is that layer — built, battle-tested, benchmarked, compliant. You ship the agent. We ship the brain.

Homomorphically encrypted by default

Our own encrypted vector database keeps embeddings searchable while encrypted — we never see your data in the clear.

0

Zero training on your data. Ever.

Your facts are yours. Never used to train models. Export-on-exit as JSON.

Python SDK — open source.

Apache 2.0 licensed. Self-host the SDK; point it at our managed engine or your own deploy. TypeScript and Go coming soon.

Drop-in. No lock-in.

Three calls to integrate. One call to export everything. We don't hold your data hostage.
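The "few calls in, one call out" shape can be illustrated with a toy client whose whole state round-trips through JSON. The `Client` class and its method names are hypothetical stand-ins for this sketch, not the actual SDK.

```python
# Toy illustration of drop-in / no lock-in: store, recall, export as JSON.
# The Client class and its method names are hypothetical, not the real SDK.
import json

class Client:
    def __init__(self):
        self.store = {}

    def add(self, key, value):
        self.store[key] = value

    def recall(self, key):
        return self.store.get(key)

    def export(self):
        """One call to export everything, as plain JSON."""
        return json.dumps(self.store)

c = Client()
c.add("user.plan", "Scale")
assert c.recall("user.plan") == "Scale"
print(c.export())  # {"user.plan": "Scale"}
```

Because the export is plain JSON, nothing about the stored facts depends on the vendor to be readable later.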

Pricing

Start free. Scale when it works.

Every plan includes every feature. You're only paying for volume.

Hacker
Side projects & hackathons.
$0 · free forever
10K adds / month
100K recalls / month
All primitives & SDKs
Community Discord support
Shared infra · p50 < 120ms

Startup
First agent in production.
$49/mo · + usage beyond included
250K adds / month
2.5M recalls / month
Shared infra · p50 < 100ms
Email support · 1 business day
Private Slack channel
Usage analytics dashboard

Scale · Most popular
Real users, real traffic.
$199/mo · + usage beyond included
5M adds / month
50M recalls / month
Dedicated infra · p50 < 80ms
Priority support · 4hr SLA
Audit log export
99.9% uptime SLA

Enterprise
Regulated, at scale.
Custom · talk to founders
Unlimited volume
VPC / single-tenant deploy
HIPAA · custom DPA · SSO
99.99% uptime SLA
Dedicated solutions engineer
Custom decay & policy models

Priced on adds (events written) and recalls (context requests). Everything else — facts, episodes, artifacts, supersede chains — is just how we think about what's inside.

Same engine, two surfaces

Your agent has memory. Your whole team should too.

Most XMem customers ship their agent, then roll memory out to the rest of the team. That's Memory Hub: the same belief engine, surfaced as a product humans can use.

XMem

Memory for your agent

A drop-in SDK. Your code, your agent, your stack. One engine per application.

· Python SDK (TS + Go coming soon)
· Fact-level belief revision
· Scoped per agent · per user
· $0 → $199 → custom

Memory Hub

Memory for your company

A platform your whole team uses. Captures across every AI tool. Serves every teammate and every agent.

· UI for humans · API for agents
· Notion, Slack, Drive, ChatGPT, Claude...
· Audience tiers · E2E Personal Space
· Per-seat · starts at a team size of 5

Common questions

Developer questions.

How does XMem relate to Memory Hub?
Do I still need my vector DB?
What's the latency overhead?
Can I self-host?
What LLMs does XMem work with?
How do you handle PII?
Is it open source?

Your team's shared brain

© 2026 XTrace
