## Parallel MUMPS solver with MPI

MUMPS supports parallelism using MPI, but Kwant currently only uses it in sequential mode.

#### MUMPS documentation

The section numbers referenced below are those of the MUMPS user's guide.

There are several issues to consider:

#### parallelizing the "solve" step

This is reasonably standard; controlling it will require consulting section 5.1.3 of the documentation.

#### constructing the Hamiltonian

MUMPS supports several ways of specifying the matrix to be solved (see sections 5.2 and 5.2.2 of the documentation):

- the full matrix is provided on rank 0 (MUMPS will then split the matrix across the available cores)
- the matrix is pre-split by the application that calls MUMPS
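In the MUMPS user interface these two input modes are selected via the control parameter `ICNTL(18)` (`0` for a centralized assembled matrix provided on the host process, `3` for matrix entries distributed across processes). A minimal sketch of the choice, with a hypothetical `StubMUMPS` class standing in for a real MUMPS binding:

```python
# Sketch only: "StubMUMPS" is a hypothetical stand-in for an actual MUMPS
# binding, used to illustrate how the two input modes map onto ICNTL(18).

class StubMUMPS:
    def __init__(self):
        # real bindings expose ICNTL as a 1-based integer array
        self.icntl = {}

    def set_icntl(self, index, value):
        self.icntl[index] = value

# Mode 1: full assembled matrix provided on rank 0; MUMPS distributes it.
centralized = StubMUMPS()
centralized.set_icntl(18, 0)

# Mode 2: the calling application pre-splits the matrix across ranks.
distributed = StubMUMPS()
distributed.set_icntl(18, 3)
```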

The first method is much easier to implement than the second, as the latter would require hooking the `Builder` into MPI (a mess). One could imagine the following way of operating:

```python
comm = ...  # MPI communicator to parallelize over
syst = make_system().finalized()
parallel_mumps_solver.solve(syst, comm=comm)  # not the actual API, just an example
```

The problem with this is that the system is constructed on all ranks, even though MUMPS would only use it on rank 0! One can get around this by requiring the user to construct the system only on rank 0:

```python
comm = ...  # MPI communicator to parallelize over
syst = None
if comm.rank == 0:
    syst = make_system().finalized()
# note that *everyone* must call `solve` in order to avoid a deadlock
parallel_mumps_solver.solve(syst, comm=comm)
```

This way the system is only constructed on rank 0 (saving memory), but the user has to make sure that they use MPI correctly (e.g. call `solve` on all ranks to avoid a deadlock).
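The contract sketched above — only rank 0 holds the system, but every rank must enter the collective call — can be illustrated without launching MPI by faking the communicator. All names here (`FakeComm`, `parallel_solve`) are hypothetical; with mpi4py the real `comm` would play the role of `FakeComm`:

```python
# Sketch of the collective-call contract for a hypothetical parallel solve.
# "FakeComm" mimics the rank/size attributes of an mpi4py communicator so
# the pattern can be demonstrated in a single process.

class FakeComm:
    def __init__(self, rank, size):
        self.rank = rank
        self.size = size

def parallel_solve(matrix, comm):
    # Collective: every rank must call this function, but only rank 0
    # needs to supply the matrix (MUMPS's centralized input mode).
    if comm.rank == 0:
        assert matrix is not None, "rank 0 must provide the matrix"
        return [x * 2 for x in matrix]  # placeholder for the real solve
    return None  # non-root ranks get no result in this sketch

# Emulate four ranks all entering the collective call.
results = []
for rank in range(4):
    comm = FakeComm(rank, 4)
    matrix = [1, 2, 3] if rank == 0 else None  # constructed on rank 0 only
    results.append(parallel_solve(matrix, comm))
```

A real implementation would additionally have to decide whether the result is returned on rank 0 only or broadcast to all ranks.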