Minimally invasive vectorization of Kwant
Here's a plan to vectorize Kwant with a minimal amount of changes. The idea is that this is something that can be coded quickly, but is still very useful, and comprises a useful step forward towards more generalized systems/operators.
In the following, I only speak about finite systems. Infinite systems should be a straightforward generalization.
Vectorization of low-level systems
A kwant (low-level) system provides two things:
(1) A graph whose nodes (=sites) and edges (=hoppings) are densely numbered, each starting with zero.
(2) A way to obtain an element of the Hamiltonian matrix that is identified by two site indices and may further depend on system parameters.
I propose to extend the low-level system API in the following way:
We introduce numbered "value kinds". (I'm looking for a better name, how about "part"? I don't want to say "term", because of the limitation that each site/hopping belongs only to one value kind.) Each site and each hopping has exactly one value kind associated with it. This association is exposed by the API in two ways: two arrays that store the value kind index for each site and hopping, and for each value kind, a sequence of the associated site/hopping indices.
(We probably want to restrict a given value kind to either sites or hoppings. We even might have two separate numberings, one for site value kinds and one for hopping value kinds, but then, we may also have a single common numbering. This is a detail to be discussed.)
We add a new method that allows to query a single value kind in a vectorized way for all the associated sites/hoppings at once. That method takes the value kind index, and system parameters. It returns a dense 3D array of the shape (n, a, b) where n is the number of sites/hoppings that share the value kind, and (a, b) is the size of the matrix element. Obviously this implies that all values that share the same kind have the same matrix shape (for the same parameters).
For each value kind, we list the necessary parameters.
Observe that the above allows to implement
hamiltonian_submatrix efficiently. It also preserves the numbering of sites and hoppings, so that existing user scripts continue to work unchanged.
Still, to simplify our lifes, we might want to deprecate
hamiltonian_submatrix, and even break support for calling it in the fully general way. In practice,
hamiltonian_submatrix is either called for the full system, or for the unit cell and the inter-cell hopping of a lead.
Vectorized finalized builders
How to implement the above API for finalized builders?
We introduce a new
SiteArray class that associates multiple tags with a single site family. It's API is like the one of
Site, but in plural (except for
family). So, there's
positions that each return a sequence. Note that for the common Bravais lattices, both
positions can return a single dense 2D array of numbers.
From the user's perspective, vectorized builders are created in exactly the same way. The only difference is that value functions receive
SiteArray instances instead of
Site, and return arrays of values. In addition, vectorization needs to be enabled somehow. This could be done with a global switch, e.g. by calling
Of course finalized builders have to provide the new vectorized low-level system API as well as the old one. Observe that providing the old one, i.e. the
hamiltonian method poses no problem. We can keep the old code and only adapt it very slightly to call vectorized value functions with site arrays of length one.
The vectorized evaluation interface can be also provided efficiently. Note that builder is free to classify values as value kinds within the constraints imposed by the low level system API (i.e. compatible matrix sizes). So, it makes sense to classify as one value kind all sites or hoppings that share the same value function and in addition share the same site family (pair). For example, for graphene with one kind of onsite value function and one hopping value function, there will be two kinds of onsite values (one for each site family), and one or two types of hopping kinds. (That depends on whether the user has specified only A->B kind of hoppings or also B->A hoppings. IIRC builder will remember the one that has been set.) The Hermitian conjugate opposite hoppings can be treated with the same approach as currently.
Constant values of the same shape that are defined on the same site families should be probably also classified as a single kind of value. If we want, we can treat the case where all of the matrices are equal specially, for example by adopting a storage convention that is similar to NumPy's broadcasting, i.e. instead of storing 100 times the same onsite matrix (
[M, M, M, ...]) we store it just once (
[M]) in a sequence of length 1.
Vectorized low-level systems should expose a
site_arrays attribute that contains a sequence of
SiteArray instances, that contains the same information as
sites today, but using drastically less Python objects. The old
sites attribute will continue to exist for backwards compatibility, but it could be made to create site objects on-demand, so that they don't have to be stored.
So, when the request arrives (by the low-level API) to evaluate the Hamiltonian for a given value kind, the finalized builder has to get hold of the necessary site arrays before it can call the user-provided value function. These can be built on-demand efficiently by a piece of Cython code or perhaps even by clever usage of NumPy.
The above does not include any vectorization of the
Builder class, that still contains a dictionary with individual sites as keys. So construction and finalization of builders won't be any faster. Speeding up builders is a separate project.
I wrote here previously:
We can consider making
Sitea derived class of
I no longer think that this is a good idea. Instead, it seems much more natural and useful if site arrays are (fancy-indexable) sequences of sites. (However, exposing the sequence API is something that is optional and can be postponed.) This means, that unlike sites, site arrays should not be two-tuples of family and tag(s).
Sites could be moved to
kwant.system at this occasion.
If we want to implement the code that assembled the
SiteArrays upon evaluation of a finalized builder in Python using NumPy, we may have to modify the internals of
graph.CGraph, at least such that it exposes the arrays it stores as NumPy arrays. Currently the arrays are simply C pointers to memory. That was not done initially, because of limitation of Cython that have disappeared since. It can well be that the
graph submodule can be drastically simplifed or even replaced by something else.