# Client

`dream.Client` is the sync entry point. For async, see `AsyncClient`.
## Construction

```python
import dream

client = dream.Client(
    api_key="dre_...",                               # optional; reads DREAM_API_KEY
    base_url="https://api.dreamengines.run",         # optional; reads DREAM_BASE_URL
    timeout_s=300.0,                                 # default; covers cold starts
    retry_policy=dream.RetryPolicy(max_attempts=3),  # default
)
```

Every argument is optional. The constructor never makes a network call — the first HTTP request happens when you call a method.
### api_key resolution

In order: explicit `api_key=` argument → `DREAM_API_KEY` env → `DREAM_ENGINE_API_KEY` env (legacy) → no auth header. The last is fine for hitting a no-auth dev deployment.
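A minimal sketch of that resolution order (the key value is a placeholder):

```python
import os

import dream

# No api_key= argument: the client falls back to DREAM_API_KEY, then the
# legacy DREAM_ENGINE_API_KEY, then sends no auth header at all.
os.environ["DREAM_API_KEY"] = "dre_..."
client = dream.Client()  # resolves the key from the environment
```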
### base_url resolution

In order: explicit `base_url=` → `DREAM_BASE_URL` env → `https://api.dreamengines.run` (production default). Trailing slash is stripped automatically.
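For example, with a hypothetical local deployment on port 8000, these two clients end up with the same base URL because of the stripping:

```python
import dream

# Equivalent after the trailing slash is stripped:
a = dream.Client(base_url="http://localhost:8000/")
b = dream.Client(base_url="http://localhost:8000")
```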
### timeout_s

Per-request timeout in seconds. Default 300 s — generous because the first request to a Modal-hosted engine pays a 70–90 s cold start. Once the container is warm, real predict calls return in ~3 s.
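If a process only ever talks to an already-warm engine, you may prefer a tighter timeout so outages fail fast; the value below is illustrative, not a recommendation:

```python
client = dream.Client(timeout_s=30.0)  # illustrative warm-engine timeout
```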
### retry_policy

```python
dream.RetryPolicy(
    max_attempts=3,     # 1 disables retries
    base_delay_s=0.25,  # first retry waits 0.25 s, then 0.5, 1, 2, 4 (capped)
    max_delay_s=4.0,
    jitter=True,        # multiply each delay by uniform [0.5, 1.5]
)
```

Retries fire on 429, 502, 503, 504, and `httpx.TransportError`. The server's `Retry-After` header is honored on 429. Auth errors (401/403), validation errors (other 4xx), and 500 are non-retryable.
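A sketch of the delay schedule those fields imply, assuming a jittered, capped exponential backoff (the library's internals may differ):

```python
import random

def retry_delays(base=0.25, cap=4.0, max_attempts=3, jitter=True):
    """Yield the wait before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    for i in range(max_attempts - 1):  # no delay before the first attempt
        delay = min(base * 2**i, cap)
        yield delay * random.uniform(0.5, 1.5) if jitter else delay
```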
## Methods

The client has two layers of public methods:

### Resource-style (recommended)
```python
client.models.list()                   # → list[ModelHandle]
client.models.get("dreamdojo-2b-gr1")  # → ModelHandle
```

`ModelHandle` is the typed surface for a specific spec — `predict`, `predict_batch`, and the spec metadata (`model.action_dim`, `model.resolution`, `model.spec.benchmark`, etc.).
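For instance (predict arguments elided, as in the lifecycle example below):

```python
model = client.models.get("dreamdojo-2b-gr1")
print(model.action_dim, model.resolution)  # spec metadata on the handle
rollout = model.predict(...)               # typed, validated predict call
```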
### Wire-level (power users)

```python
client.predict(frame_bytes=..., actions_bytes=...)             # → PredictResponse
client.predict_batch(frame_bytes=..., actions_bytes_list=...)  # → BatchResponse
client.healthz()                                               # → dict
client.status()                                                # → dict
client.specs()                                                 # → dict (legacy; use models.list())
```

These bypass the `ModelHandle` validation layer (active-spec check, input coercion, lazy frame decode). Use them when you already have wire-ready bytes and don't want the typed wrapping.
## predict_many

Predict a rollout for every row in an `IterableSource` and write results to a `RolloutSink`. Runs a concurrency pool internally so multiple requests are in flight simultaneously.
```python
client.predict_many(
    src,
    sink,
    *,
    spec=None,
    concurrency=4,
    on_error="skip",
    progress=True,
    num_steps=None,
    guidance=None,
    seed=0,
) -> PredictManyResult
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `src` | `IterableSource` | — | Source yielding `SourceRow` instances. Any of the `dream.io` loaders, or a custom subclass. |
| `sink` | `RolloutSink` | — | Sink receiving `RolloutRecord` instances. Any of the `RolloutSink.*` factories, or a custom subclass. |
| `spec` | `str \| None` | `None` | Model spec slug, e.g. `"dreamdojo-2b-gr1"`. Recorded in each row's metadata under `"spec"` for downstream provenance. |
| `concurrency` | `int` | `4` | Maximum in-flight requests. Raise to 8–16 if you're not hitting rate limits. |
| `on_error` | `"skip" \| "halt"` | `"skip"` | `"skip"`: record the failure, continue. `"halt"`: raise on first non-retryable failure. Auth + credit errors always halt regardless. |
| `progress` | `bool` | `True` | Show a tqdm progress bar (no-op if tqdm isn't installed). |
| `num_steps` | `int \| None` | `None` | Forwards to `Client.predict`'s diffusion-step knob. Same value used for every row. |
| `guidance` | `float \| None` | `None` | Forwards to `Client.predict`'s classifier-free guidance knob. |
| `seed` | `int` | `0` | Forwards to `Client.predict`. Same value used for all rows; vary per-row by subclassing `IterableSource`. |
Returns: `PredictManyResult` dataclass:

```python
@dataclass
class PredictManyResult:
    ok: int                    # rows where the engine returned 2xx and the sink accepted the rollout
    failed: int                # rows skipped (on_error="skip"); 0 when on_error="halt"
    output_uri: str            # canonical URI returned by sink.finalize()
    failures: list[FailedRow]  # per-row failure records — error class, row_id, attempts

    @property
    def total(self) -> int:    # ok + failed
        ...
```

`FailedRow` carries `row_id: str`, `error: DreamError`, `attempts: int`. Inspect `result.failures` to see which rows failed and why.
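For example, a quick way to summarize what went wrong:

```python
for f in result.failures:
    print(f"{f.row_id}: {type(f.error).__name__} after {f.attempts} attempt(s)")
```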
Retry behavior: per-row retries on 429 / 502 / 503 / 504 / `httpx.TransportError` (3× exponential backoff). Auth errors (401/403) and insufficient-credits errors (402) fail fast and are not retried — they always raise out of `predict_many` regardless of the `on_error` setting (a 402 on row 1 will 402 on every other row in the same job, so retrying them is wasted work). In-flight rows are allowed to drain before the error is re-raised, and the sink is finalized best-effort so partial results aren't lost.
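A sketch of catching that fail-fast path, assuming `DreamError` (the base class carried by `FailedRow`) is importable from the top-level package:

```python
try:
    result = client.predict_many(src, sink, spec="dreamdojo-2b-gr1", on_error="skip")
except dream.DreamError as err:  # e.g. 401/403 auth or 402 credits, raised despite on_error="skip"
    # In-flight rows drain and the sink is finalized best-effort before
    # the error re-raises, so completed rollouts are already written out.
    print(f"halted early: {err}")
```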
Example:

```python
from dream.io import frames_from_hf, RolloutSink

src = frames_from_hf(
    "kingJulio/dream-engine-example-frames",
    frame_field="start_frame",
    actions_field="action_sequence",
)
sink = RolloutSink.dir("./out")

result = client.predict_many(
    src,
    sink,
    spec="dreamdojo-2b-gr1",
    concurrency=4,
    on_error="skip",
    progress=True,
)
print(f"{result.ok} ok / {result.failed} failed → {result.output_uri}")
```

See the Bulk inference quickstart and the dream.io reference for a full walk-through.
## estimate_cost

Preview the cost of a `predict_many` run without doing any GPU work. Peeks at the source's row count and multiplies by the spec's frames-per-rollout × the per-frame price.
```python
client.estimate_cost(
    src,
    *,
    spec=None,
    frames_per_row=None,
) -> CostEstimate
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `src` | `IterableSource` | — | Same source you'd pass to `predict_many`. |
| `spec` | `str \| None` | `None` | Model spec slug. Used to look up the per-frame price; falls back to the catalog's default when `None` or unknown. |
| `frames_per_row` | `int \| None` | `None` | Override for the rollout frame count. When `None`, peeks the source's first row and infers from the actions tensor's `T + 1`. |
Returns: `CostEstimate` dataclass:

```python
@dataclass
class CostEstimate:
    rows: int | None            # None for unbounded streaming sources
    frames_per_row: int         # spec's canonical frame count (e.g. 49)
    total_frames: int | None    # rows × frames_per_row; None if rows is None
    total_usd: float | None     # total_frames × price_per_frame; None if rows is None
    spec: str                   # spec slug the estimate was computed against ("" when no spec passed)
    price_per_frame_usd: float  # per-frame USD charge resolved from the catalog
```

When `rows` is None (streaming HF source with no Hub metadata available), `total_usd` is also None. Either set `src.rows_hint = N` on the loader directly when you know the count, or skip the estimate.
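For example, if you know the row count of a streaming loader out of band (the value 16 here is illustrative):

```python
src.rows_hint = 16  # supply the count the Hub metadata couldn't
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
```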
Note: the first-row peek consumes one item from `iter(src)`, but `IterableSource.__iter__` returns a fresh iterator each call, so the common pattern of calling `estimate_cost(src1, ...)` and then `predict_many(src1, ...)` against the same `src1` is safe: the second `iter` pass starts from row 0. Construct a fresh source only if yours is a one-shot iterable that can't re-iterate.
Example:

```python
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
print(f"≈ ${estimate.total_usd:.4f} for {estimate.rows} rows")
# ≈ $0.3920 for 16 rows
```

## Lifecycle
`Client` holds an httpx connection pool. Either close it explicitly or use it as a context manager:

```python
with dream.Client() as client:
    rollout = client.models.get("dreamdojo-2b-gr1").predict(...)
# pool closed at exit

# or
client = dream.Client()
try:
    ...
finally:
    client.close()
```

If you forget, Python's garbage collector eventually closes the pool — but you'll see a `ResourceWarning` in test logs.
## Reusing one client

A single `Client` instance is fine to reuse across many requests in one process. Don't construct a fresh client per request — you'll lose connection pooling and the cold-start retry budget.

For multi-threaded use, threads can share one client (httpx's pool is thread-safe); for multi-process use, give each process its own client. For asyncio, use `AsyncClient` instead.
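A minimal sketch of the shared-client pattern with a thread pool; `rows` and the predict arguments are placeholders, and sharing the `ModelHandle` assumes it is as shareable as the client it wraps:

```python
from concurrent.futures import ThreadPoolExecutor

import dream

client = dream.Client()                        # one pooled client for the whole process
model = client.models.get("dreamdojo-2b-gr1")

def run_row(row):
    # Build frame/actions from `row`; elided here as in the examples above.
    return model.predict(...)

with ThreadPoolExecutor(max_workers=4) as pool:
    rollouts = list(pool.map(run_row, rows))   # `rows`: your own iterable of work items
```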