Dream Engines

Client

dream.Client is the sync entry point. For async, see AsyncClient.

Construction

PYTHON
import dream

client = dream.Client(
    api_key="dre_...",                               # optional; reads DREAM_API_KEY
    base_url="https://api.dreamengines.run",         # optional; reads DREAM_BASE_URL
    timeout_s=300.0,                                 # default; covers cold starts
    retry_policy=dream.RetryPolicy(max_attempts=3),  # default
)

Every argument is optional. The constructor never makes a network call — the first HTTP request happens when you call a method.

api_key resolution

In order: explicit api_key= argument → DREAM_API_KEY env → DREAM_ENGINE_API_KEY env (legacy) → no auth header. The last is fine for hitting a no-auth dev deployment.
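The precedence above can be sketched as a small helper. This is illustrative only; the SDK resolves the key internally, and resolve_api_key is not part of its public API:

```python
import os

def resolve_api_key(explicit=None):
    """Sketch of the documented precedence: explicit argument, then
    DREAM_API_KEY, then the legacy DREAM_ENGINE_API_KEY, else None
    (no auth header is sent)."""
    if explicit is not None:
        return explicit
    return (
        os.environ.get("DREAM_API_KEY")
        or os.environ.get("DREAM_ENGINE_API_KEY")  # legacy name
        or None
    )
```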

base_url resolution

In order: explicit base_url= argument → DREAM_BASE_URL env → https://api.dreamengines.run (production default). Trailing slash is stripped automatically.
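Sketched the same way (resolve_base_url is an illustrative helper, not SDK API):

```python
import os

def resolve_base_url(explicit=None):
    """Sketch of the documented precedence: explicit argument wins, then
    DREAM_BASE_URL, then the production default; trailing slash stripped."""
    url = explicit or os.environ.get("DREAM_BASE_URL") or "https://api.dreamengines.run"
    return url.rstrip("/")
```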

timeout_s

Per-request timeout in seconds. Default 300 s — generous because the first request to a Modal-hosted engine pays a 70-90 s cold start. Once the container is warm, real predict calls return in ~3 s.

retry_policy

PYTHON
dream.RetryPolicy(
    max_attempts=3,     # 1 disables retries
    base_delay_s=0.25,  # first retry waits 0.25 s, then 0.5, 1, 2, 4 (capped)
    max_delay_s=4.0,
    jitter=True,        # multiply each delay by uniform [0.5, 1.5]
)

Retries fire on 429, 502, 503, 504, and httpx.TransportError. The server's Retry-After header is honored on 429. Auth errors (401/403), validation errors (4xx other), and 500 are non-retryable.
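The delay schedule implied by those fields can be sketched as follows. This mirrors the documented behavior rather than the SDK's internal code:

```python
import random

def retry_delays(max_attempts=3, base_delay_s=0.25, max_delay_s=4.0, jitter=True):
    """Delays between attempts: exponential doubling from base_delay_s,
    capped at max_delay_s, each scaled by uniform [0.5, 1.5] when jitter
    is on. No delay precedes the first attempt, so max_attempts attempts
    produce max_attempts - 1 delays."""
    delays = []
    for attempt in range(max_attempts - 1):
        delay = min(base_delay_s * 2 ** attempt, max_delay_s)
        if jitter:
            delay *= random.uniform(0.5, 1.5)
        delays.append(delay)
    return delays
```

With jitter off, retry_delays(6, jitter=False) yields [0.25, 0.5, 1.0, 2.0, 4.0], matching the 0.25 → 0.5 → 1 → 2 → 4 (capped) sequence above.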

Methods

The client has two layers of public methods.

Model catalog (typed)

PYTHON
client.models.list() # → list[ModelHandle]
client.models.get("dreamdojo-2b-gr1") # → ModelHandle

ModelHandle is the typed surface for a specific spec — predict, predict_batch, and the spec metadata (model.action_dim, model.resolution, model.spec.benchmark, etc).

Wire-level (power users)

PYTHON
client.predict(frame_bytes=..., actions_bytes=...) # → PredictResponse
client.predict_batch(frame_bytes=..., actions_bytes_list=...) # → BatchResponse
client.healthz() # → dict
client.status() # → dict
client.specs() # → dict (legacy; use models.list())

These bypass the ModelHandle validation layer (active-spec check, input coercion, lazy frame decode). Use them when you already have wire-ready bytes and don't want the typed wrapping.

predict_many

Predict a rollout for every row in an IterableSource and write results to a RolloutSink. Runs a concurrency pool internally so multiple requests are in flight simultaneously.

PYTHON
client.predict_many(
    src,
    sink,
    *,
    spec=None,
    concurrency=4,
    on_error="skip",
    progress=True,
    num_steps=None,
    guidance=None,
    seed=0,
) -> PredictManyResult
src (IterableSource): Source yielding SourceRow instances. Any of the dream.io loaders, or a custom subclass.
sink (RolloutSink): Sink receiving RolloutRecord instances. Any of the RolloutSink.* factories, or a custom subclass.
spec (str | None, default None): Model spec slug, e.g. "dreamdojo-2b-gr1". Recorded in each row's metadata under "spec" for downstream provenance.
concurrency (int, default 4): Maximum in-flight requests. Raise to 8–16 if you're not hitting rate limits.
on_error ("skip" | "halt", default "skip"): "skip" records the failure and continues; "halt" raises on the first non-retryable failure. Auth and credit errors always halt regardless.
progress (bool, default True): Show a tqdm progress bar (no-op if tqdm isn't installed).
num_steps (int | None, default None): Forwards to Client.predict's diffusion-step knob. Same value used for every row.
guidance (float | None, default None): Forwards to Client.predict's classifier-free guidance knob.
seed (int, default 0): Forwards to Client.predict. Same value used for all rows; vary per-row by subclassing IterableSource.

Returns: PredictManyResult dataclass:

PYTHON
@dataclass
class PredictManyResult:
    ok: int                    # rows where the engine returned 2xx and the sink accepted the rollout
    failed: int                # rows skipped (on_error="skip"); 0 when on_error="halt"
    output_uri: str            # canonical URI returned by sink.finalize()
    failures: list[FailedRow]  # per-row failure records — error class, row_id, attempts

    @property
    def total(self) -> int:    # ok + failed
        ...

FailedRow carries row_id: str, error: DreamError, attempts: int. Inspect result.failures to see which rows failed and why.
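A quick triage pattern is to group result.failures by error class. The FailedRow below is a stand-in dataclass carrying the documented fields so the snippet is self-contained; in practice you'd use the SDK's own class:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailedRow:  # stand-in mirroring the documented fields
    row_id: str
    error: Exception
    attempts: int

def failure_summary(failures):
    """Count failures per error class, e.g. Counter({'TimeoutError': 2})."""
    return Counter(type(f.error).__name__ for f in failures)
```

Calling failure_summary(result.failures) gives a Counter you can print after a run or assert on in CI.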

Retry behavior: per-row retries on 429 / 5xx / httpx.TransportError (3× exponential backoff). Auth errors (401/403) and insufficient-credits errors (402) fail fast and are not retried — they always raise out of predict_many regardless of the on_error setting (a 402 on row 1 will 402 on every other row in the same job, so retrying them is wasted work). In-flight rows are allowed to drain before the error is re-raised, and the sink is finalized best-effort so partial results aren't lost.

Example:

PYTHON
from dream.io import frames_from_hf, RolloutSink

src = frames_from_hf(
    "kingJulio/dream-engine-example-frames",
    frame_field="start_frame",
    actions_field="action_sequence",
)
sink = RolloutSink.dir("./out")
result = client.predict_many(
    src, sink,
    spec="dreamdojo-2b-gr1",
    concurrency=4,
    on_error="skip",
    progress=True,
)
print(f"{result.ok} ok / {result.failed} failed → {result.output_uri}")

See the Bulk inference quickstart and the dream.io reference for a full walk-through.


estimate_cost

Preview the cost of a predict_many run without doing any GPU work. Peeks at the source's row count and multiplies by the spec's frames-per-rollout × the per-frame price.
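The arithmetic is just rows × frames-per-rollout × per-frame price. A sketch, where the $0.0005/frame figure is an assumption chosen to match the example output further down, not a published price:

```python
def estimate_usd(rows, frames_per_row, price_per_frame_usd):
    # rows × frames-per-rollout × per-frame price
    return rows * frames_per_row * price_per_frame_usd
```

With 16 rows at 49 frames each and the assumed $0.0005/frame, estimate_usd(16, 49, 0.0005) comes to ≈ $0.392.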

PYTHON
client.estimate_cost(
    src,
    *,
    spec=None,
    frames_per_row=None,
) -> CostEstimate
src (IterableSource): Same source you'd pass to predict_many.
spec (str | None, default None): Model spec slug. Used to look up the per-frame price; falls back to the catalog's default when None or unknown.
frames_per_row (int | None, default None): Override for the rollout frame count. When None, peeks the source's first row and infers the count from the actions tensor's T + 1.
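The T + 1 inference reflects that applying T actions to a start frame yields T predicted frames plus the start frame itself; a sketch, assuming an actions tensor of shape (T, action_dim) (infer_frames_per_row is illustrative, not SDK API):

```python
def infer_frames_per_row(actions_shape):
    """T actions applied to the start frame give T + 1 frames total."""
    t, _action_dim = actions_shape
    return t + 1
```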

Returns: CostEstimate dataclass:

PYTHON
@dataclass
class CostEstimate:
    rows: int | None            # None for unbounded streaming sources
    frames_per_row: int         # spec's canonical frame count (e.g. 49)
    total_frames: int | None    # rows × frames_per_row; None if rows is None
    total_usd: float | None     # total_frames × price_per_frame; None if rows is None
    spec: str                   # spec slug the estimate was computed against ("" when no spec passed)
    price_per_frame_usd: float  # per-frame USD charge resolved from the catalog

When rows is None (streaming HF source with no Hub metadata available), total_usd is also None. Either set src.rows_hint = N on the loader directly when you know the count, or skip the estimate.

Note: the first-row peek consumes one item from an iterator over src, but because IterableSource.__iter__ returns a fresh iterator on each call, the peek doesn't affect a later run. The common pattern is to call estimate_cost(src, ...) and then predict_many(src, ...) against the same source; the second pass starts from row 0.

Example:

PYTHON
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
print(f"≈ ${estimate.total_usd:.4f} for {estimate.rows} rows")
# ≈ $0.3920 for 16 rows

Lifecycle

Client holds an httpx connection pool. Either close it explicitly or use it as a context manager:

PYTHON
with dream.Client() as client:
    rollout = client.models.get("dreamdojo-2b-gr1").predict(...)
# pool closed at exit

# or
client = dream.Client()
try:
    ...
finally:
    client.close()

If you forget, Python's garbage collector eventually closes the pool — but you'll see a ResourceWarning in test logs.

Reusing one client

A single Client instance is fine to reuse across many requests in one process. Don't construct a fresh client per request — you'll lose connection pooling and the cold-start retry budget.

For multi-thread / multi-process: each thread can share one client (httpx's pool is thread-safe). For asyncio, use AsyncClient instead.