Vectorization of a function applied to a Dask array results in a MemoryError
I want to evaluate a function on every element of an N-dimensional array on a cluster, using vectorization. Dask array has apply_gufunc with the keyword argument vectorize=True for exactly this kind of problem. However, running the following minimal example I get a MemoryError, even though each worker has enough memory to hold a single chunk (a chunk is about 7 GB vs. 30 GB of worker memory; see the chunk-size check after the example). The error goes away if I use smaller chunks or fewer workers. Could the error indicate a memory overflow on the local (client) side, i.e. that part of the cluster computation is somehow sent back to the client process for pre-evaluation?
import numpy as np
import dask.array

n = 224

def combine(a, b, c):
    # a, b, c arrive as scalars (gufunc signature '(),(),()->(l,m)')
    return a * b * c * np.eye(n)

A, B, C = 100, 80, 121
dask_array_a = dask.array.from_array(np.tile(np.linspace(0, 1, A, dtype=complex).reshape((-1, 1, 1)), (1, B, C)), chunks=(1, 80, C))
dask_array_b = dask.array.from_array(np.tile(np.linspace(0, 1, B, dtype=complex).reshape((1, -1, 1)), (A, 1, C)), chunks=(1, 80, C))
dask_array_c = dask.array.from_array(np.tile(np.linspace(0, 1, C, dtype=complex), (A, B, 1)), chunks=(1, 80, C))
combine_vect = dask.array.apply_gufunc(combine, '(),(),()->(l,m)', dask_array_a, dask_array_b, dask_array_c, vectorize=True, output_sizes={'l': n, 'm': n}, meta=dask.array.empty(shape=(A, B, C, n, n), chunks=(1, 40, C, n, n), dtype='complex'))
combine_test = dask.array.trace(combine_vect, axis1=-1, axis2=-2)  # trace keeps the result small enough for local memory
combine_test.compute()
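
For reference, the size of one output chunk can be checked without computing anything. This is a minimal sketch using only the arrays defined above; the ~7 GB figure quoted earlier comes out of this calculation for complex128 chunks of shape (1, 80, 121, 224, 224):

# size of one output chunk of combine_vect, inspected from the graph only
chunk_shape = [sizes[0] for sizes in combine_vect.chunks]
chunk_bytes = np.prod(chunk_shape) * combine_vect.dtype.itemsize
print(chunk_shape, chunk_bytes / 1e9, "GB per chunk")  # roughly 7.8 GB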
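
The cluster setup itself is not shown above. For anyone trying to reproduce this locally, a hypothetical equivalent with an explicit per-worker memory limit might look like the following; the worker count and the 30 GB limit are assumptions based on the numbers quoted earlier, not the actual configuration:

from dask.distributed import Client, LocalCluster

# hypothetical reproduction setup: values are assumptions, not the
# original cluster configuration
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="30GB")
client = Client(cluster)
combine_test.compute()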