Running SUEWS on large grids: strategies for city-scale simulations

If you’re running SUEWS on grids with hundreds or thousands of cells — say, a city at 100 m resolution — you may hit memory limits when the entire domain is loaded at once. This post covers how to handle this efficiently.

Why large grids are tractable

SUEWS grid cells are independent — there is no lateral exchange between them. Each cell computes its own surface energy and water balance using shared meteorological forcing. This means you don’t need to run the whole domain in a single pass. You can split it up however you like and reassemble the results afterwards — the output is bit-identical.
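Because cells don't interact, reassembling per-batch results is just concatenation along the grid axis. A toy sketch (the (grid, timestep) MultiIndex mirrors the shape of supy's output DataFrames; the column name and values here are illustrative, not real SUEWS output):

```python
import pandas as pd

# Two hypothetical per-batch outputs, each indexed by (grid, step)
batch_a = pd.DataFrame(
    {"QH": [40.0, 42.0]},
    index=pd.MultiIndex.from_product([[1], [0, 1]], names=["grid", "step"]),
)
batch_b = pd.DataFrame(
    {"QH": [38.0, 39.0]},
    index=pd.MultiIndex.from_product([[2], [0, 1]], names=["grid", "step"]),
)

# Concatenate and sort: identical to running both grids in one pass
full = pd.concat([batch_a, batch_b]).sort_index()
```

Run the batches in any order; the sorted concatenation is the same either way.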

Two strategies to exploit this: reduce problem size per process, and parallelise across processes.

Strategy 1: Reduce problem size

Split your grid into smaller batches. Instead of feeding 1000+ cells into a single run, run subsets — say, 50–100 cells at a time. This alone can bring peak memory from hundreds of GB down to a few GB.
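The batching itself is a few lines of plain Python. A minimal sketch, assuming your domain is a flat list of grid IDs (the IDs and batch size here are placeholders):

```python
# Hypothetical domain of 1200 grid cells; substitute your own IDs
grid_ids = list(range(1200))

def make_batches(ids, batch_size=100):
    """Split a list of grid IDs into fixed-size batches."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

batches = make_batches(grid_ids, batch_size=100)
# 12 batches of 100 cells, each small enough to run in a single process
```

Each batch then becomes one run (or, as in Strategy 2, one worker's job).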

You can also chunk the forcing data temporally. The chunk_day parameter controls this — smaller values reduce peak memory at a small overhead cost:

from supy import SUEWSSimulation

sim = SUEWSSimulation("config.yml")
output = sim.run(chunk_day=365)  # process one year at a time
sim.save("output/")

Strategy 2: Parallelise

Since cells are independent, they’re embarrassingly parallel. If you split your domain into per-grid (or per-batch) YAML configurations, you can run them concurrently across all your CPU cores:

from pathlib import Path
from joblib import Parallel, delayed
from supy import SUEWSSimulation

# one config per grid (or per batch of grids)
grid_configs = sorted(Path("input/").glob("grid_*.yml"))

def run_grid(config_path):
    sim = SUEWSSimulation(str(config_path))
    sim.run(chunk_day=365)
    sim.save(f"output/{config_path.stem}/")
    return config_path.stem

# n_jobs=-1 uses all CPU cores
completed = Parallel(n_jobs=-1)(
    delayed(run_grid)(c) for c in grid_configs
)
print(f"Completed {len(completed)} grids")

The key is to keep each worker process small and short-lived: smaller batches finish sooner, balance load across cores, and keep per-worker peak memory low. Very small batches add per-process startup overhead, though, so there is a sweet spot worth measuring on a test run.

Practical tips

  • Start small: verify your pipeline on 10 cells before scaling to 1000+.
  • GIS integration: output from each grid carries its grid ID, so joining results back to a spatial layer is straightforward.
  • Combine both strategies: for very large domains, chunk temporally (chunk_day) and spatially (grid batches in parallel).
  • Monitor memory: on Linux/macOS, htop or top during a test batch tells you quickly whether your batch size is right.
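On the GIS point: since each grid's output carries its ID, joining back to a spatial layer is a plain key join. A minimal sketch with pandas (the column names and values are illustrative assumptions; with geopandas the same merge works on a GeoDataFrame):

```python
import pandas as pd

# Per-grid summary derived from SUEWS output, e.g. a time-mean flux
results = pd.DataFrame(
    {"grid_id": [101, 102, 103], "qh_mean": [45.2, 38.7, 51.0]}
)

# Attribute table exported from your GIS layer, keyed by the same grid ID
spatial = pd.DataFrame(
    {"grid_id": [101, 102, 103], "easting": [0, 100, 200],
     "northing": [0, 0, 0]}
)

# Left join keeps every spatial cell, attaching results where available
joined = spatial.merge(results, on="grid_id", how="left")
```

Using `how="left"` means cells without results show up as NaN rather than silently disappearing, which makes gaps in a partially completed run easy to spot.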

Related discussion: @Janka raised a question about gridded input data workflows — worth a read if you’re setting up multi-grid runs.

Questions welcome — if you’re working at city scale, share your setup and we can help optimise.