XMem — memory API

v1.0 · public beta

Memory that captures reasoning, not just results

Memory API for agent builders that resolves conflicts, tracks versions, and preserves lineage, so your agents reason with the right context.

Your AI tool has access to everything, but it still doesn't understand your business. With XMem, it does.


Benchmarks

The numbers that matter in production.

92.3% · LOCOMO

Long-form conversational memory. Highest reported score on the leaderboard.

95% · fewer tokens

vs. dumping chat history. Because resolved beliefs beat raw transcripts.

<80 ms · p50 recall latency

p99 < 220 ms. Fast enough for voice. Global edge on Enterprise.

LOCOMO — long-term conversational memory

Higher is better. Evaluated on the public LOCOMO suite (Snap Research). Competitor scores from published vendor papers & blogs.

XMem: 92.3%
Mem0 (selective): 91.6%
Zep: 75.1%
Letta (filesystem): 74.0%
OpenAI Memory: ~52.9%

Notes: Mem0 score = their 2026 token-efficient pipeline (mem0.ai/research). Zep = Zep's own corrected number per their rebuttal (getzep.com). Letta = their published filesystem-only baseline. OpenAI Memory is estimated from Mem0's comparative "26% relative lift over OpenAI" figure.

Under the hood

Why not just a vector DB?

A vector DB answers "what looks like this?" A memory engine answers "what's true right now?" Those are different queries, and they need different data structures.

Dimension: Vector DB + RAG ("what looks like this?") vs. XMem ("what's true right now?")

What it stores
Vector DB + RAG: Embeddings of strings. No notion of "true right now."
XMem: Structured facts with timestamps, sources, and confidence.

Conflicts
Vector DB + RAG: Contradictions coexist forever. Retrieval picks whichever scores higher.
XMem: Newer facts supersede older ones. Retractions are first-class.

Lineage
Vector DB + RAG: None. You can't answer "why do you believe this?"
XMem: Every answer carries lineage. One call returns the fact + its provenance.

Time decay
Vector DB + RAG: None. A fact from Jan 2024 ranks with a fact from yesterday.
XMem: Decay curves per fact type. Preferences age differently than contracts.

Integration
Vector DB + RAG: You'll end up writing a belief layer on top.
XMem: One SDK call. Works alongside your existing vector DB — we don't replace it.
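The per-type decay idea can be sketched in a few lines. The fact types and half-lives below are illustrative assumptions for this example, not XMem's actual parameters:

```python
# Hypothetical half-lives per fact type, in days. Illustrative only —
# XMem's real decay curves are configured per deployment.
HALF_LIFE_DAYS = {"preference": 90, "contract": 3650, "session_detail": 7}

def decayed_confidence(confidence: float, fact_type: str, age_days: float) -> float:
    """Exponential decay: confidence halves every HALF_LIFE_DAYS[fact_type] days."""
    half_life = HALF_LIFE_DAYS[fact_type]
    return confidence * 0.5 ** (age_days / half_life)

# A 90-day-old preference has lost half its weight; a contract barely moves.
print(round(decayed_confidence(0.94, "preference", 90), 2))  # 0.47
print(round(decayed_confidence(0.94, "contract", 90), 2))    # 0.92
```

This is why "preferences age differently than contracts": the same elapsed time erodes one belief and leaves the other essentially intact.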

Built from the ground up

XMem runs on our own vector store — written from scratch, homomorphically encrypted so embeddings are searchable without ever being decrypted on our servers. The SDK is open source.

Architecture

One write. One recall.
A belief engine in the middle.

Your agent sends raw events. XMem extracts facts, resolves entities, supersedes stale beliefs, and serves back a ranked, scoped, source-tagged context window.

your agent → writes events → XMem engine [ extract → resolve → supersede → decay ] → ranked context → injected into LLM

$ curl -X POST https://api.xmem.ai/v1/recall \

-H "Authorization: Bearer xm_live_..." \

-d '{"agent_id":"support-bot","user_id":"u_4713","query":"reach them?"}'

→ {

"facts": [{

"content": "prefers email over SMS",

"confidence": 0.94,

"source": "chat:2026-04-18",

"supersedes": ["fact_8a1c..."],

"ttl_days": 180

}],

"prompt": "User u_4713 prefers email over SMS (src: chat 2026-04-18).",

"tokens": 42, "latency_ms": 67

}
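The same recall can be issued from Python with nothing but the standard library. The endpoint and request fields come from the curl example above; the helper names are ours, and the response handling assumes the JSON shape shown:

```python
import json
import urllib.request

API_URL = "https://api.xmem.ai/v1/recall"

def build_recall_request(api_key: str, agent_id: str, user_id: str,
                         query: str) -> urllib.request.Request:
    """Assemble the same POST the curl example sends."""
    body = json.dumps(
        {"agent_id": agent_id, "user_id": user_id, "query": query}
    ).encode()
    return urllib.request.Request(
        API_URL,
        data=body,  # presence of data makes this a POST
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

def recall(api_key: str, agent_id: str, user_id: str, query: str) -> str:
    """POST the recall and return the ready-to-inject prompt string."""
    req = build_recall_request(api_key, agent_id, user_id, query)
    with urllib.request.urlopen(req, timeout=5) as resp:
        payload = json.load(resp)
    # payload["facts"] carries confidence, source, and supersede lineage;
    # payload["prompt"] is the compact context string for your LLM call.
    return payload["prompt"]
```

Injecting `payload["prompt"]` rather than raw history is where the token savings come from: the engine ships the resolved belief, not the transcript that produced it.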

Core primitives

Six objects. One coherent mental model.

Everything in XMem is built from these. Learn them in an afternoon; never hit a dead end.

Fact

A structured belief about a user, entity, or world state. Has content, confidence, source, and a decay curve.

mem.write(content=...)

Episode

A conversation, session, or interaction — raw. XMem extracts facts from episodes automatically.

mem.log_episode(...)

Artifact

Structured objects your agent produces — tickets, PRs, summaries. Linked back to the facts that shaped them.

mem.attach_artifact(...)

Supersede chain

New facts supersede old. The chain is queryable — you can always replay the belief history for any entity.

mem.history(fact_id=...)

Policy

Who (which agent, which user tier) sees what. Scoped at fact-level. Useful when one engine serves multiple products.

mem.policy.grant(...)

Audit log

Every read, write, and supersede is logged with actor and timestamp. Export to your SIEM. Enterprise-grade, day one.

mem.audit.stream(...)
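To make the supersede chain concrete, here is a toy in-memory model — emphatically not the XMem SDK — showing how a new fact supersedes the old one while keeping the full belief history queryable:

```python
import itertools

class ToyMemory:
    """Toy model of a supersede chain. Illustration only, not the XMem SDK."""

    def __init__(self):
        self._facts = {}                  # fact_id -> {"content", "supersedes"}
        self._latest = {}                 # topic -> fact_id of current belief
        self._ids = itertools.count(1)

    def write(self, topic: str, content: str) -> str:
        """Write a fact; any existing fact on the topic is superseded."""
        fact_id = f"fact_{next(self._ids)}"
        self._facts[fact_id] = {"content": content,
                                "supersedes": self._latest.get(topic)}
        self._latest[topic] = fact_id
        return fact_id

    def current(self, topic: str) -> str:
        """The belief that is true right now."""
        return self._facts[self._latest[topic]]["content"]

    def history(self, topic: str) -> list[str]:
        """Replay the belief history, newest first, by walking the chain."""
        chain, fact_id = [], self._latest.get(topic)
        while fact_id is not None:
            chain.append(self._facts[fact_id]["content"])
            fact_id = self._facts[fact_id]["supersedes"]
        return chain

mem = ToyMemory()
mem.write("contact_pref", "prefers SMS")
mem.write("contact_pref", "prefers email over SMS")
print(mem.current("contact_pref"))   # prefers email over SMS
print(mem.history("contact_pref"))   # ['prefers email over SMS', 'prefers SMS']
```

In XMem the same idea is exposed through `mem.write(content=...)` and `mem.history(fact_id=...)`, with confidence, sources, and decay attached to each link in the chain.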

Built for

Every agent that holds a conversation longer than a goldfish.

Support agents

Know the user's plan, their last three tickets, and the policy changes since they signed up. Stop asking "what's your order number?"

Coding agents

Remember which patterns this repo favors, what conventions were deprecated last month, and which files the user owns.

Voice agents

<80ms p50 is the latency budget you need. Facts-only recall means short prompts — fewer tokens, lower cost, faster TTFB.

Research agents

Track what's been read, what's been ruled out, and which claims are still load-bearing. Supersede chains = reproducible research.

Companions

Users want an AI that actually knows them. Not one that forgot their dog's name between sessions.

Workflow / ops agents

Long-running agents that span days and tools. Lineage matters here more than anywhere — XMem gives you audit + replay.

For CTOs

The memory layer you'd build if you had a year.

Every team building agents writes some version of this themselves. Chat history truncation. De-duping facts. Hand-rolled entity resolution. Deciding whose contradiction to trust. It's six months of distraction from your actual product.

XMem is that layer — built, battle-tested, benchmarked, compliant. You ship the agent. We ship the brain.

Homomorphically encrypted by default

Our own encrypted vector database keeps embeddings searchable while encrypted — we never see your data in the clear.

Zero training on your data. Ever.

Your facts are yours. Never used to train models. Export-on-exit as JSON.

Python SDK — open source.

Apache 2.0 licensed. Self-host the SDK; point it at our managed engine or your own deploy. TypeScript and Go coming soon.

Drop-in. No lock-in.

Three calls to integrate. One call to export everything. We don't hold your data hostage.

Pricing

Start free. Scale when it works.

Every plan includes every feature. You're only paying for volume.

Hacker

Side projects & hackathons.

$0 · free forever

10K adds / month

100K recalls / month

All primitives & SDKs

Community Discord support

Shared infra · p50 < 120ms

Startup

First agent in production.

$49/mo + usage beyond included

250K adds / month

2.5M recalls / month

Shared infra · p50 < 100ms

Email support · 1 business day

Private Slack channel

Usage analytics dashboard

Scale

Real users, real traffic.

$199/mo + usage beyond included

5M adds / month

50M recalls / month

Dedicated infra · p50 < 80ms

Priority support · 4hr SLA

Audit log export

99.9% uptime SLA

Most popular

Enterprise

Regulated, at scale.

Custom · talk to founders

Unlimited volume

VPC / single-tenant deploy

HIPAA · custom DPA · SSO

99.99% uptime SLA

Dedicated solutions engineer

Custom decay & policy models

Priced on adds (events written) and recalls (context requests). Everything else — facts, episodes, artifacts, supersede chains — is just how we think about what's inside.

Same engine, two surfaces

Your agent has memory. Your whole team should too.

Most XMem customers ship their agent, then roll memory out to the rest of the team. That's Memory Hub: the same belief engine, surfaced as a product humans can use.

you are here

XMem
Memory for your agent

A drop-in SDK. Your code, your agent, your stack. One engine per application.

· Python SDK (TS + Go coming soon)
· Fact-level belief revision
· Scoped per agent · per user
· $0 → $199 → custom

Memory Hub
Memory for your company

A platform your whole team uses. Captures across every AI tool. Serves every teammate and every agent.

· UI for humans · API for agents
· Notion, Slack, Drive, ChatGPT, Claude...
· Audience tiers · E2E Personal Space
· Per-seat · starts at team-size of 5

Common questions

Developer questions.

How does XMem relate to Memory Hub?

Do I still need my vector DB?

What's the latency overhead?

Can I self-host?

What LLMs does XMem work with?

How do you handle PII?

Is it open source?

Your team's shared brain

© 2026 XTrace
