Cluster calculation terminates during the reading phase when the dask array input is large and the chunksize is too small
When running a function on a large data set (100x100x10) with too small a chunksize, the calculation terminates during the reading phase. The function never shows up in the dashboard: only the reading bandwidth and the worker memory climb, until at some point both crash down to 0 and the workers release everything they have stored up to that point.
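The exact loading code is not shown in this report, so the following is only a minimal sketch of the two configurations being compared: the same array shape, once with tiny chunks and once with larger ones. The shape and the specific chunk sizes are assumptions for illustration, not the values from the failing job.

```python
import dask.array as da

# Assumed input shape from the report (100x100x10); random data stands in
# for the real reader, which is not shown in the issue.
shape = (100, 100, 10)

# Run 1 (assumed "too small" chunksize): one element per chunk, producing
# one task per element during the reading phase.
x_small = da.random.random(shape, chunks=(1, 1, 1))

# Run 2 (assumed "reasonable" chunksize): a handful of large chunks.
x_large = da.random.random(shape, chunks=(50, 50, 10))

# Number of chunks/tasks generated by each configuration.
print(x_small.npartitions, x_large.npartitions)  # 100000 vs 4
```

With the tiny chunks, the scheduler has to track orders of magnitude more tasks for the same data, which is consistent with the reading phase stalling before the actual function ever appears in the dashboard.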
See the comparison between the two runs below; the input is the same and the only difference is the chunksize. The log shown in the error case is the one from IO. I have seen this error several times now, and the worker memory is always at ~1.1 GB when it fails, which makes me believe it is a memory issue, even though I reserve much more memory for the workers.
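The cluster setup is not included in the report either, so the snippet below is only a hypothetical local configuration showing how the per-worker memory limit can be set and then read back from the scheduler, to check that the limit the workers actually run with matches what was reserved (i.e. well above the ~1.1 GB at which the runs fail). The worker counts and the 4GB limit are assumptions.

```python
from dask.distributed import Client, LocalCluster

# Hypothetical cluster; the real deployment may differ.
# memory_limit is applied per worker.
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="4GB")
client = Client(cluster)

# Per-worker memory limits as the scheduler reports them (in bytes),
# useful to confirm the reserved memory actually reached the workers.
info = client.scheduler_info()
for addr, worker in info["workers"].items():
    print(addr, worker["memory_limit"])
```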