# Client

`dream.Client` is the sync entry point. For async, see `AsyncClient`.
## Construction

```python
import dream

client = dream.Client(
    api_key="dre_...",                               # optional; reads DREAM_API_KEY
    base_url="https://api.dreamengines.run",         # optional; reads DREAM_BASE_URL
    timeout_s=300.0,                                 # default; covers cold starts
    retry_policy=dream.RetryPolicy(max_attempts=3),  # default
)
```

Every argument is optional. The constructor never makes a network call — the first HTTP request happens when you call a method.
### api_key resolution

In order: explicit `api_key=` argument → `DREAM_API_KEY` env → `DREAM_ENGINE_API_KEY` env (legacy) → no auth header. The last is fine for hitting a no-auth dev deployment.
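A minimal sketch of that resolution order (the key value is a placeholder):

```python
import os

import dream

# No api_key= argument: the client falls back to DREAM_API_KEY, then the
# legacy DREAM_ENGINE_API_KEY, then sends no auth header at all.
os.environ["DREAM_API_KEY"] = "dre_..."
client = dream.Client()  # resolves the key from the environment
```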
### base_url resolution

In order: explicit `base_url=` → `DREAM_BASE_URL` env → `https://api.dreamengines.run` (production default). Trailing slash is stripped automatically.
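For example, with a hypothetical local deployment on port 8000, these two clients end up with the same base URL because of the stripping:

```python
import dream

# Equivalent after the trailing slash is stripped:
a = dream.Client(base_url="http://localhost:8000/")
b = dream.Client(base_url="http://localhost:8000")
```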
### timeout_s

Per-request timeout in seconds. Default 300 s — generous because the first request to a Modal-hosted engine pays a 70–90 s cold start. Once the container is warm, real predict calls return in ~3 s.
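If a process only ever talks to an already-warm engine, you may prefer a tighter timeout so outages fail fast; the value below is illustrative, not a recommendation:

```python
client = dream.Client(timeout_s=30.0)  # illustrative warm-engine timeout
```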
### retry_policy

```python
dream.RetryPolicy(
    max_attempts=3,     # 1 disables retries
    base_delay_s=0.25,  # first retry waits 0.25 s, then 0.5, 1, 2, 4 (capped)
    max_delay_s=4.0,
    jitter=True,        # multiply each delay by uniform [0.5, 1.5]
)
```

Retries fire on 429, 502, 503, 504, and `httpx.TransportError`. The server's `Retry-After` header is honored on 429. Auth errors (401/403), validation errors (other 4xx), and 500 are non-retryable.
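A sketch of the delay schedule those fields imply, assuming a jittered, capped exponential backoff (the library's internals may differ):

```python
import random

def retry_delays(base=0.25, cap=4.0, max_attempts=3, jitter=True):
    """Yield the wait before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    for i in range(max_attempts - 1):  # no delay before the first attempt
        delay = min(base * 2**i, cap)
        yield delay * random.uniform(0.5, 1.5) if jitter else delay
```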
## Methods

The client has two layers of public methods:

### Resource-style (recommended)
```python
client.models.list()                   # → list[ModelHandle]
client.models.get("dreamdojo-2b-gr1")  # → ModelHandle
```

`ModelHandle` is the typed surface for a specific spec — `predict`, `predict_batch`, and the spec metadata (`model.action_dim`, `model.resolution`, `model.spec.benchmark`, etc.).
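For instance (predict arguments elided, as in the lifecycle example below):

```python
model = client.models.get("dreamdojo-2b-gr1")
print(model.action_dim, model.resolution)  # spec metadata on the handle
rollout = model.predict(...)               # typed, validated predict call
```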
### Wire-level (power users)

```python
client.predict(frame_bytes=..., actions_bytes=...)             # → PredictResponse
client.predict_batch(frame_bytes=..., actions_bytes_list=...)  # → BatchResponse
client.healthz()                                               # → dict
client.status()                                                # → dict
client.specs()                                                 # → dict (legacy; use models.list())
```

These bypass the `ModelHandle` validation layer (active-spec check, input coercion, lazy frame decode). Use them when you already have wire-ready bytes and don't want the typed wrapping.
## predict_many

Predict a rollout for every row in an `IterableSource` and write results to a `RolloutSink`. Runs a concurrency pool internally so multiple requests are in flight simultaneously.
```python
client.predict_many(
    src,
    sink,
    *,
    spec=None,
    concurrency=4,
    on_error="skip",
    progress=True,
    num_steps=None,
    guidance=None,
    seed=0,
) -> PredictManyResult
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `src` | `IterableSource` | — | Source yielding `SourceRow` instances. Any of the `dream.io` loaders, or a custom subclass. |
| `sink` | `RolloutSink` | — | Sink receiving `RolloutRecord` instances. Any of the `RolloutSink.*` factories, or a custom subclass. |
| `spec` | `str \| None` | `None` | Model spec slug, e.g. `"dreamdojo-2b-gr1"`. Recorded in each row's metadata under `"spec"` for downstream provenance. |
| `concurrency` | `int` | `4` | Maximum in-flight requests. Raise to 8–16 if you're not hitting rate limits. |
| `on_error` | `"skip" \| "halt"` | `"skip"` | `"skip"`: record the failure, continue. `"halt"`: raise on first non-retryable failure. Auth + credit errors always halt regardless. |
| `progress` | `bool` | `True` | Show a tqdm progress bar (no-op if tqdm isn't installed). |
| `num_steps` | `int \| None` | `None` | Forwards to `Client.predict`'s diffusion-step knob. Same value used for every row. |
| `guidance` | `float \| None` | `None` | Forwards to `Client.predict`'s classifier-free guidance knob. |
| `seed` | `int` | `0` | Forwards to `Client.predict`. Same value used for all rows; vary per-row by subclassing `IterableSource`. |
Returns: `PredictManyResult` dataclass:

```python
@dataclass
class PredictManyResult:
    ok: int                    # rows where the engine returned 2xx and the sink accepted the rollout
    failed: int                # rows skipped (on_error="skip"); 0 when on_error="halt"
    output_uri: str            # canonical URI returned by sink.finalize()
    failures: list[FailedRow]  # per-row failure records — error class, row_id, attempts

    @property
    def total(self) -> int:    # ok + failed
        ...
```

`FailedRow` carries `row_id: str`, `error: DreamError`, `attempts: int`. Inspect `result.failures` to see which rows failed and why.
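For example, a quick way to summarize what went wrong:

```python
for f in result.failures:
    print(f"{f.row_id}: {type(f.error).__name__} after {f.attempts} attempt(s)")
```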
Retry behavior: per-row retries on 429 / 502 / 503 / 504 / `httpx.TransportError` (3× exponential backoff). Auth errors (401/403) and insufficient-credits errors (402) fail fast and are not retried — they always raise out of `predict_many` regardless of the `on_error` setting (a 402 on row 1 will 402 on every other row in the same job, so retrying them is wasted work). In-flight rows are allowed to drain before the error is re-raised, and the sink is finalized best-effort so partial results aren't lost.
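A sketch of catching that fail-fast path, assuming `DreamError` (the base class carried by `FailedRow`) is importable from the top-level package:

```python
try:
    result = client.predict_many(src, sink, spec="dreamdojo-2b-gr1", on_error="skip")
except dream.DreamError as err:  # e.g. 401/403 auth or 402 credits, raised despite on_error="skip"
    # In-flight rows drain and the sink is finalized best-effort before
    # the error re-raises, so completed rollouts are already written out.
    print(f"halted early: {err}")
```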
Example:

```python
from dream.io import frames_from_hf, RolloutSink

src = frames_from_hf(
    "kingJulio/dream-engine-example-frames",
    frame_field="start_frame",
    actions_field="action_sequence",
)
sink = RolloutSink.dir("./out")

result = client.predict_many(
    src,
    sink,
    spec="dreamdojo-2b-gr1",
    concurrency=4,
    on_error="skip",
    progress=True,
)
print(f"{result.ok} ok / {result.failed} failed → {result.output_uri}")
```

See the Bulk inference quickstart and the dream.io reference for a full walk-through.
## estimate_cost

Preview the cost of a `predict_many` run without doing any GPU work. Peeks at the source's row count and multiplies by the spec's frames-per-rollout × the per-frame price.
```python
client.estimate_cost(
    src,
    *,
    spec=None,
    frames_per_row=None,
) -> CostEstimate
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `src` | `IterableSource` | — | Same source you'd pass to `predict_many`. |
| `spec` | `str \| None` | `None` | Model spec slug. Used to look up the per-frame price; falls back to the catalog's default when `None` or unknown. |
| `frames_per_row` | `int \| None` | `None` | Override for the rollout frame count. When `None`, peeks the source's first row and infers from the actions tensor's `T + 1`. |
Returns: `CostEstimate` dataclass:

```python
@dataclass
class CostEstimate:
    rows: int | None            # None for unbounded streaming sources
    frames_per_row: int         # spec's canonical frame count (e.g. 49)
    total_frames: int | None    # rows × frames_per_row; None if rows is None
    total_usd: float | None     # total_frames × price_per_frame; None if rows is None
    spec: str                   # spec slug the estimate was computed against ("" when no spec passed)
    price_per_frame_usd: float  # per-frame USD charge resolved from the catalog
```

When `rows` is None (streaming HF source with no Hub metadata available), `total_usd` is also None. Either set `src.rows_hint = N` on the loader directly when you know the count, or skip the estimate.
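For example, if you know the row count of a streaming loader out of band (the value 16 here is illustrative):

```python
src.rows_hint = 16  # supply the count the Hub metadata couldn't
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
```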
Note: the first-row peek consumes one item from `iter(src)`, but `IterableSource.__iter__` returns a fresh iterator each call, so the common pattern of calling `estimate_cost(src1, ...)` and then `predict_many(src1, ...)` against the same `src1` is safe: the second `iter` pass starts from row 0. Construct a fresh source only if yours is a one-shot iterable that can't re-iterate.
Example:

```python
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
print(f"≈ ${estimate.total_usd:.4f} for {estimate.rows} rows")
# ≈ $0.3920 for 16 rows
```

## Lifecycle
`Client` holds an httpx connection pool. Either close it explicitly or use it as a context manager:

```python
with dream.Client() as client:
    rollout = client.models.get("dreamdojo-2b-gr1").predict(...)
# pool closed at exit

# or
client = dream.Client()
try:
    ...
finally:
    client.close()
```

If you forget, Python's garbage collector eventually closes the pool — but you'll see a `ResourceWarning` in test logs.
## Reusing one client

A single `Client` instance is fine to reuse across many requests in one process. Don't construct a fresh client per request — you'll lose connection pooling and the cold-start retry budget.

For multi-threaded use, threads can share one client (httpx's pool is thread-safe); for multi-process use, give each process its own client. For asyncio, use `AsyncClient` instead.
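A minimal sketch of the shared-client pattern with a thread pool; `rows` and the predict arguments are placeholders, and sharing the `ModelHandle` assumes it is as shareable as the client it wraps:

```python
from concurrent.futures import ThreadPoolExecutor

import dream

client = dream.Client()                        # one pooled client for the whole process
model = client.models.get("dreamdojo-2b-gr1")

def run_row(row):
    # Build frame/actions from `row`; elided here as in the examples above.
    return model.predict(...)

with ThreadPoolExecutor(max_workers=4) as pool:
    rollouts = list(pool.map(run_row, rows))   # `rows`: your own iterable of work items
```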