July 9, 2026 · 9 min read

How to test OTP and verification emails in CI

Testing that the OTP or reset email actually arrived is where suites flake. The reliable pattern: a disposable inbox, long-poll, assert the code, clean up.

“Did the email actually arrive?” is one of the most valuable things to test and one of the most commonly skipped — because doing it badly produces a flaky suite everyone learns to ignore. This is the pattern that doesn’t flake. (For a tool-by-tool comparison, see the Mailosaur alternative post; this one is the how-to.)

Why naive email tests flake

The three classic mistakes:

Mocking sendEmail(). This tests that your code called the function, not that an email was delivered, templated correctly, and contained the right code. The interesting bugs live in the gap.
Sleeping then polling. sleep(5000) then checking an inbox is slow and racy — too short and you miss the email, too long and your suite crawls.
Regex-ing raw HTML. A template tweak breaks a test that has nothing to do with the change.

The fix for all three: a real disposable inbox, a long-poll that returns the instant the message lands, and structured extraction of the code/link.

The pattern, end to end

import { MailClient } from "@ollastack/client"; // or call the HTTP API directly from any language
const mail = new MailClient({ baseUrl: "https://login.ollastack.com", token: process.env.TOKEN });

test("signup sends a 6-digit OTP", async () => {
  // 1. A fresh, real receiving address
  const box = await mail.createInbox({ mode: "test", name: "signup" });

  // 2. Drive your app — it emails the OTP to box.address
  await signUp(box.address);

  // 3. Long-poll: returns the moment the email arrives (no sleeps)
  const msg = await mail.waitForEmail({ mailboxId: box.id, timeoutMs: 60000 });

  // 4. Assert on the EXTRACTED code, not the HTML
  expect(msg.codes[0]).toMatch(/^\d{6}$/);
});

codes[] and links[] are pulled out of the message for you, so a template change that keeps the code intact doesn’t break the test. The same shape works for a magic link:

test("password reset sends a working link", async () => {
  const box = await mail.createInbox({ mode: "test", name: "reset" });
  await requestReset(box.address);
  const msg = await mail.waitForEmail({ mailboxId: box.id, timeoutMs: 60000 });

  const resetUrl = msg.links.find((l) => l.includes("/reset"));
  expect(resetUrl).toBeTruthy();
  // ...then drive a browser to resetUrl and assert the flow completes.
});
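
One caveat with the substring check above: includes("/reset") also matches a link that only mentions /reset in its query string (a redirect parameter, say). A minimal sketch of a stricter match follows, assuming msg.links holds absolute URL strings. findLinkByPath is a hypothetical helper written for this post, not part of the client.

```typescript
// Hypothetical helper (not part of @ollastack/client): return the first link
// whose URL *path* starts with `pathPrefix`, ignoring query strings and fragments.
// Assumes `links` contains absolute URL strings.
function findLinkByPath(links: string[], pathPrefix: string): string | undefined {
  return links.find((link) => {
    try {
      return new URL(link).pathname.startsWith(pathPrefix);
    } catch {
      return false; // not a parseable absolute URL, so skip it
    }
  });
}
```

In the reset test this would replace the find call: const resetUrl = findLinkByPath(msg.links, "/reset").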

Isolating parallel test runs

The mistake that bites once you parallelize: two runs share an inbox and read each other’s emails. Two clean fixes:

A fresh inbox per test (cheap — they’re disposable).
One inbox, isolated by subaddress tag. Send to slug+run123@… and filter with ?tag=run123. A unique tag per run keeps each run’s mail separate without provisioning new inboxes.

// fetch only THIS run's messages
const msgs = await mail.listMessages({ mailboxId: box.id, tag: process.env.RUN_ID });
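
Building the tagged address is plain string work. The sketch below assumes box.address has the usual local@domain shape and that the service derives the tag from the +suffix as described above. taggedAddress is a hypothetical helper, and the character whitelist for tags is a conservative choice made here, not a documented rule.

```typescript
// Hypothetical helper: turn "slug@domain" into "slug+tag@domain".
function taggedAddress(address: string, tag: string): string {
  const at = address.lastIndexOf("@");
  if (at <= 0) {
    throw new Error(`not an email address: ${address}`);
  }
  // Assumption: keep tags to characters that are safe in a local part and a query string.
  if (!/^[A-Za-z0-9._-]+$/.test(tag)) {
    throw new Error(`unsafe tag: ${tag}`);
  }
  return `${address.slice(0, at)}+${tag}${address.slice(at)}`;
}
```

A test would then sign up with taggedAddress(box.address, runId) and list messages with the same runId as the tag.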

Cleaning up between runs

Disposable inboxes accumulate. Clear an inbox in one call so a run starts clean:

curl -X DELETE "https://login.ollastack.com/api/mailboxes/$ID/messages" \
  -H "Authorization: Bearer $TOKEN"   # -> { cleared: N }

Test inboxes also have a retention policy, so stale messages purge on their own — but an explicit clear at the start of a suite removes ambiguity.
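
If the suite is already in TypeScript, the same call can live in a setup hook. This is a rough equivalent of the curl command, using the endpoint and the { cleared: N } response shown there. clearInbox and FetchLike are names invented for this sketch, and fetch is passed in as a parameter so the helper can be exercised with a stub.

```typescript
// Minimal fetch-like signature; the global fetch in Node 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: delete every message in one mailbox and
// return how many were cleared.
async function clearInbox(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string,
  mailboxId: string,
): Promise<number> {
  const url = `${baseUrl}/api/mailboxes/${encodeURIComponent(mailboxId)}/messages`;
  const res = await fetchImpl(url, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`clear failed: HTTP ${res.status}`);
  }
  const body = (await res.json()) as { cleared?: number };
  return body.cleared ?? 0; // treat a missing count as "nothing cleared"
}
```

Called as clearInbox(fetch, "https://login.ollastack.com", token, box.id) at suite start, it fails loudly on a non-2xx response instead of letting the run continue with stale mail.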

Prefer push? Use the inbound webhook

If your test harness is event-driven, register a webhook on the inbox and get a signed POST the moment mail arrives, instead of polling. /wait is simpler for a linear test; the webhook is better for a queue-based runner.
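
A signed POST is only useful if the receiver checks the signature. The signing scheme is not spelled out here, so the sketch below assumes a common one: a hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared secret. Treat the algorithm, the encoding, and the function name verifyWebhookSignature as assumptions and check them against the OpenAPI spec before relying on this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: the signature is hex(HMAC-SHA256(secret, rawBody)).
// The real scheme may differ; this only illustrates the verification step.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Verify against the raw body bytes before parsing the JSON; re-serializing a parsed payload can change whitespace and break the comparison.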

A note on scope hygiene

Use a token scoped to mail.test:* for CI. It can drive throwaway inboxes but cannot read or send from your persistent agent identities, so a leaked CI token can’t reach real correspondence. (Agent identities use a separate mail.agent:* scope.) This separation is enforced server-side.

The checklist

✅ Send to a real disposable address, not a mock.
✅ Use /wait (or the webhook), never sleep-and-poll.
✅ Assert on codes[] / links[], not raw HTML.
✅ Isolate parallel runs (fresh inbox or +tag).
✅ Clear the inbox at suite start.
✅ Scope the CI token to mail.test:*.

Do those six and “did the email arrive?” becomes one of the most reliable tests in your suite instead of the flakiest.

Grab a token and point your runner at the OpenAPI spec (/api/openapi.json) — it documents every mailbox endpoint.

Frequently asked questions

How do I test OTP emails in CI?

Create a disposable test inbox, drive your signup with its address, long-poll the wait endpoint for the email, and assert on the extracted code — reliable, no sleeps.

How do I keep the test from flaking?

Use the long-poll wait endpoint instead of fixed sleeps, isolate parallel runs with a fresh inbox or a +tag subaddress, and bulk-clear between runs.

Are test inboxes spam-filtered?

No. They’re never filtered, so a strictly transactional email is never dropped before your assertion can see it.

Last updated June 19, 2026. Spotted something out of date? Email hello@ollastack.com.