Minimally invasive vectorization of Kwant
Here's a plan to vectorize Kwant with a minimal amount of changes. The idea is that this is something that can be coded quickly, but is still very useful, and comprises a useful step forward towards more generalized systems/operators.
In the following, I only speak about finite systems. Infinite systems should be a straightforward generalization.
Vectorization of lowlevel systems
A kwant (lowlevel) system provides two things:
(1) A graph whose nodes (=sites) and edges (=hoppings) are densely numbered, each starting with zero.
(2) A way to obtain an element of the Hamiltonian matrix that is identified by two site indices and may further depend on system parameters.
I propose to extend the lowlevel system API in the following way:

We introduce numbered "value kinds". (I'm looking for a better name, how about "part"? I don't want to say "term", because of the limitation that each site/hopping belongs only to one value kind.) Each site and each hopping has exactly one value kind associated with it. This association is exposed by the API in two ways: two arrays that store the value kind index for each site and hopping, and for each value kind, a sequence of the associated site/hopping indices.
(We probably want to restrict a given value kind to either sites or hoppings. We even might have two separate numberings, one for site value kinds and one for hopping value kinds, but then, we may also have a single common numbering. This is a detail to be discussed.)

We add a new method that allows to query a single value kind in a vectorized way for all the associated sites/hoppings at once. That method takes the value kind index, and system parameters. It returns a dense 3D array of the shape (n, a, b) where n is the number of sites/hoppings that share the value kind, and (a, b) is the size of the matrix element. Obviously this implies that all values that share the same kind have the same matrix shape (for the same parameters).

For each value kind, we list the necessary parameters.
Observe that the above allows to implement hamiltonian_submatrix
efficiently. It also preserves the numbering of sites and hoppings, so that existing user scripts continue to work unchanged.
Still, to simplify our lifes, we might want to deprecate hamiltonian_submatrix
, and even break support for calling it in the fully general way. In practice, hamiltonian_submatrix
is either called for the full system, or for the unit cell and the intercell hopping of a lead.
Vectorized finalized builders
How to implement the above API for finalized builders?
We introduce a new SiteArray
class that associates multiple tags with a single site family. It's API is like the one of Site
, but in plural (except for family
). So, there's tags
, and positions
that each return a sequence. Note that for the common Bravais lattices, both tags
and positions
can return a single dense 2D array of numbers.
From the user's perspective, vectorized builders are created in exactly the same way. The only difference is that value functions receive SiteArray
instances instead of Site
, and return arrays of values. In addition, vectorization needs to be enabled somehow. This could be done with a global switch, e.g. by calling kwant.vectorize()
.
Of course finalized builders have to provide the new vectorized lowlevel system API as well as the old one. Observe that providing the old one, i.e. the hamiltonian
method poses no problem. We can keep the old code and only adapt it very slightly to call vectorized value functions with site arrays of length one.
The vectorized evaluation interface can be also provided efficiently. Note that builder is free to classify values as value kinds within the constraints imposed by the low level system API (i.e. compatible matrix sizes). So, it makes sense to classify as one value kind all sites or hoppings that share the same value function and in addition share the same site family (pair). For example, for graphene with one kind of onsite value function and one hopping value function, there will be two kinds of onsite values (one for each site family), and one or two types of hopping kinds. (That depends on whether the user has specified only A>B kind of hoppings or also B>A hoppings. IIRC builder will remember the one that has been set.) The Hermitian conjugate opposite hoppings can be treated with the same approach as currently.
Constant values of the same shape that are defined on the same site families should be probably also classified as a single kind of value. If we want, we can treat the case where all of the matrices are equal specially, for example by adopting a storage convention that is similar to NumPy's broadcasting, i.e. instead of storing 100 times the same onsite matrix ([M, M, M, ...]
) we store it just once ([M]
) in a sequence of length 1.
Vectorized lowlevel systems should expose a site_arrays
attribute that contains a sequence of SiteArray
instances, that contains the same information as sites
today, but using drastically less Python objects. The old sites
attribute will continue to exist for backwards compatibility, but it could be made to create site objects ondemand, so that they don't have to be stored.
So, when the request arrives (by the lowlevel API) to evaluate the Hamiltonian for a given value kind, the finalized builder has to get hold of the necessary site arrays before it can call the userprovided value function. These can be built ondemand efficiently by a piece of Cython code or perhaps even by clever usage of NumPy.
Comments
The above does not include any vectorization of the Builder
class, that still contains a dictionary with individual sites as keys. So construction and finalization of builders won't be any faster. Speeding up builders is a separate project.
I wrote here previously:
We can consider making
Site
a derived class ofSiteArray
.
I no longer think that this is a good idea. Instead, it seems much more natural and useful if site arrays are (fancyindexable) sequences of sites. (However, exposing the sequence API is something that is optional and can be postponed.) This means, that unlike sites, site arrays should not be twotuples of family and tag(s).
SiteArrays
and Sites
could be moved to kwant.system
at this occasion.
If we want to implement the code that assembled the SiteArrays
upon evaluation of a finalized builder in Python using NumPy, we may have to modify the internals of graph.CGraph
, at least such that it exposes the arrays it stores as NumPy arrays. Currently the arrays are simply C pointers to memory. That was not done initially, because of limitation of Cython that have disappeared since. It can well be that the graph
submodule can be drastically simplifed or even replaced by something else.