This document contains a proposal for the new lowlevel system format in Kwant2.
We will start by describing some preliminary concepts and definitions. We will
then describe the public API exposed by the new lowlevel system format.
After this, we will discuss the concrete implementation provided by Kwant, the
kwant.builder.System
. Finally we will present the CAPI to systems.
Preliminaries
Site Ranges
With the new format we will introduce something similar to the concept of a
site family to the low level systems. A site range is simply a set of sites
that have the same number of orbitals associated with them (and that may also
be related in other ways). For the kwant.builder.System
implementation these
may correspond to highlevel SiteFamily
objects, but this is not a
requirement. There may be more than one site range with the same number of
orbitals. Note that this means that the number of orbitals per site must be
defined a priori, and cannot be defined "on the fly" as it can now in Kwant
1.2.
Sites and Site Ordering
The lowlevel format just considers sites as integers. A concrete
implementation may provide a mapping from these integers to more
rich objects (as kwant.builder.System
does to highlevel Site
objects).
but this mapping is not required by the lowlevel interface.
Sites are arranged in the following way:
indices: 0 1 2 N N+1 N+2 N+M
sites: [a1, a1, a3, ..., aN, b1, b2, ..., bM, ....]
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
site range "a" site range "b"
The site ranges are ordered by their number of orbitals. The order of site ranges with the same number of orbitals is not defined by the lowlevel interface; implementations may choose an order convenient for them. This means that the lowlevel site ranges refer to a contiguous block of sites. This choice of ordering has two advantages:
 site ranges can be represented by 2 integers,
(num_orbitals, first_site)
, as opposed to an integer and an array of integers,(num_orbitals, [site1, site2, ...])
 Slicing into quantities defined over all orbitals of the system (e.g. wavefunctions) is easy, as all sites in a contiguous block have the same number of orbitals. We can also easily calculate the shape of subblocks of the Hamiltonian in this way.
 Optionally we may apply RLE compression to store the site tags.
Symmetries
Every system we will want to describe has an associated symmetry group. We are only concerned with spatial symettries. For a generic finite system this symmetry group is just the trivial group. For an infinite system it is a translational symmetry characterized by N linearly independent realspace vectors. There may also be some rotational symmetries, which may render some symmetry groups nonabelian. For the most common cases, however, the symmetry group is abelian (e.g. nfold rotational symmetry + translational symmetry).
All of the groups that we wish to describe are finitely generated. We can therefore encode any group element as a sequence of pairs of integers. The first integer in each pair is the index of the generator (the order of the generators is defined by the Symmetry object), and the second integer is the power to which the generator is raised. For abelian groups it would be enough to store a sequence of integers  the powers to which of the generators is raised  however this is not enough information for nonabelian groups.
Connections
Once we have a system with sites, hoppings, and a given symmetry group we can
define what we will call the connections of the system. Informally we
can define this in the following way. We take all the sites that are in the
fundamental domain of the symmetry group, or connected by a hopping to such
sites; call this set S
. We now define the connections C
as the set
of symmetry group elements that map sites from the fundamental domain to sites
in S
. Note that the connections always contain the identity connection (and only
this element for a finite system or a completely disconnected infinite system).
Take the example of a 1D chain with nearestneighbor hoppings, the set of
connections is {(0,), (1,), (1,)}
. With secondnearest neighbor hoppings in
addition the connections would be {(0,), (1,), (1,), (2,), (2,)}
. For a
square lattice with nearestneighbor hoppings the connections would be {(0, 0), (1, 0), (1, 0), (0, 1), (0, 1)}
System
API
Here we define the lowlevel system API[2].
Data members

site_ranges
: sequence of pairs of integers,(begin_index, num_orbs)
that define the grouping of sites into ranges. 
symmetry
: An object that knows how to simplify symmetry group elements. 
connections
: sequence of group elements. Each group element is a sequence of pairs of integers, as described in the preliminary section on symmetries. 
terms
: sequence of tuples of integers,(connection, range_to, range_from)
whereconnection
indexesconnections
,range_to
andrange_from
index thesite_ranges
.
Methods

elements
: given an index intoterms
, returns a readonly sequence of hoppings that are present in the given term of the Hamiltonian. Onsite elements are represented by "selfhoppings"(i, i)
. 
hamiltonian_term
: evaluate a single term of the Hamiltonian. Given an index intoterms
, and a dictionary of arguments, returns an array of shape(N, norbs_range_to, norbs_range_from)
whereN
is the number of elements in the term and thenorbs_*
are the number of orbitals inrange_to
andrange_from
for the term. 
hamiltonian
: evaluate the Hamiltonian of the system. Given a sequence of representations for the generators of the symmetry group of the Hamiltonian, returns either a sparse matrix in COO format, or a dense matrix. It would also be nice if we could simply pass the "momenta", for the abelian case, and haveexp[ i k R]
calculated.
Symmetry
API
This is a proposed API to lowlevel symmetries.
Data members

num_directions
: the number of generators of the symmetry group 
abelian
: True if the symmetry group is abelian, False otherwise
Methods

simplify
: given a sequence of pairs of integers (representing a symmetry group element, using the convention of(generator_index, power)
) returns a sequence of pairs of integers representing a simplified symmetry group element.
Discussion
Nonabelian Symmetry groups
Do we really need to support these? This makes the treatment of the symmetries more complicated and for questionable gain.
terms
The meaning of The terms
are just an arbitrary grouping of parts of the full Hamiltonian.
The abstract lowlevel system format requires that all parts of the Hamiltonian
grouped together into a single term
must be associated with the same
symmetry group element and also with the same pair of site ranges.
Subclasses may then give further meaning to the terms
. Finalized Builders,
for example, will most probably make terms
from elements that are evaluated
by the same value function. We could imagine a lowlevel system optimized for
recursive Green's function that groups elements into terms
corresponding
to a single slice.
More general class of systems
Might be useful to have a more general class of systems that that only knows
how to "act" on a wavefunction. If one has a system that is very connected (so
that the Hamiltonian has essentiallyO(N²)
nonzero elements) so that the
Hamiltonian can barely fit in memory this may be useful. Are there some kind
of iterative algorithms that only require this?
Actual data format
Above we have talked about sequences
of various things. These could be
implemented as simple arrays, since they contain only simple types
(integers). In addition this would keep the interface closed to the
C lowlevel interface (see below), and so would make interoperability
easier.
Calculating band structure
Formally we need to solve an infinite eigenvalue problem. In practice we use the symmetry of the system to "fold" the problem into the fundamental domain. The finite eigenvalue problem to solve is then:
[∑ Uₐ(k̲) Hₐ ] u(k̲) = E(k̲) u(k̲)
a
The sum is over all symmetry group elements, a
, in the connection set, k̲
is
a vector of the momenta associated to each symmetry group generator. Uₐ(k̲)
is
a representation of a
in U(N)
. Hₐ
is the Hamiltonian matrix linking the
fundamental domain to the domain under the action of a
. E(k̲)
is the energy
and u(k̲)
is the wavefunction.
For a
that are pure translational symmetries the Uₐ(k̲)
will just look like
exp[i k̲·R̲ₐ] ⊗ 𝟙ₙ
where R̲ₐ
is the tuple of integers associated with a
and
the components of k̲
are in [0, 2π)
. If there is a rotational symmetry
subgroup, then I guess the associated momenta will be quantized in units of
2π/m
for an mfold rotational symmetry.
We can see that the lowlevel format naturally provides the necessary ingredients to be able to set up such a system.
Footnotes
[2]: We should probably define everything as arrays straight away to make it easier to conform to the C system API.
kwant.builder.System
Implemetation
The lowlevel site ranges will correspond to the highlevel site
families[3]. The elements
will be grouped by value: a separate
array of hoppings (onsites) for each value function, and a separate
array for constant values. The terms
will thus refer to elements
that can be efficiently evaluated by a vectorized value function.
Data members

site_arrays
: sequence ofbuilder.SiteArray
objects, one per highlevel site family. The tags in theSiteArray
are ordered lexicographically. This is more efficient than storing a sequence of sites (site family is only stored once). 
sites
: A property which behaves like a sequence ofSite
s. TheSite
s are ordered in the same way as the lowlevel format; by site family and then lexicographically by tag. The implementation just looks up tags insite_arrays
. 
term_elements
: a readonly sequence of arrays of pairs of integers; the elements associated with eachterm
. 
term_values
: a sequence of the same length asterms
. Contains a pair(val, conj)
.conj
is a flag that is true if the Hermitian conjugate of the values is to be calculated.val
is either a callable or an array. If it is an array, it has shape(N, norbs_fam_to, norbs_fam_from)
whereN
is the length of theelements
pointed to by the correspondingterm
. If it is a callable, then the first argument is aSite
(orSiteArray
for vectorized value functions, see below). If theelements
are hoppings (i.esite_to != site_from
) then the the second argument is also aSite
. This means we also need to either explicitly store whether a particular sequence ofelements
is onsites/hoppings, or check if the first pair hassite_to == site_from
or not[4].
Evaluating Hamiltonian terms
If a term has a corresponding value
that is an array, return the array.
Otherwise, if the callable value
has an attribute __vectorized__
then it
is called once with a SiteArray
as its first (and second, for hoppings)
argument. If the callable is not vectorized then it is called once for
each element in the block's els
. If the block has the conjugate
flag
set then the Hermitian conjugate of each value is calculated, and this
is returned.
Discussion
term_elements
Storing the It's kind of ugly to have to "check" if a particular sequence of elements
corresponds to onsites or hoppings. A distinction needs to be made because
in Kwant onsite value functions only take a single Site
, whereas
hopping value functions take two Site
s.
Footnotes
[3]: Or not, see the discussion on implementing RGF
[4]: This is ugly and weird, it should be tweaked. This may require a slight redefinition of the lowlevel interface.
System
API
C Here we define the C lowlevel system API. Systems implemented in C must conform to this interface. Solvers implemented in C may rely on this interface.
Rationale
There are two possible reasons why having a C interface to Kwant systems is important:
 Someone may wish to implement a lowlevel system directly in C for performance reasons
 Someone may wish to interface a Kwant system with an external solver that provides some functionality that is not provided by Kwant (e.g. RGF is not currently implemented in Kwant).
It may be argued that the first reason is not so relevant now that the
lowlevel format supports vectorized evaluation of the Hamiltonian,
which the kwant.builder.System
implementation will do.
A performanceconscious user who has pinpointed the evaluation of the
Hamiltonian as a bottleneck may just vectorize their value functions.
In this way there is only 1 Python function call for each value
function defined in the system (usually less than 10, say). If the
evaluation is still too slow, the user can rewrite their value functions
in Cython or directly in C and wrap with a Python interface.
The second reason is still relevant, however. It is unlikely that
the Kwant package will provide all the functionality that users
may want (e.g. DFT). Kwant does, however, have a very nice highlevel
interface for defining tightbinding models; it is entirely feasible
that someone would want to use Kwant "just" for building their systems.
In this case it is imperative that the System
be accessible in a
supported way from a lowlevel language like C, as most external solvers
would be implemented in such a lowlevel language.
The Proposed Interface
typedef uint32_t uint_t; // may be any suitably large unsigned integer
typedef double complex complex_t ;
struct CSymmetry_t {
uint_t num_directions ;
int abelian ;
void (*simplify)(
const void* symmetry ;
const uint_t *group_element,
uint_t n_factors,
uint_t *out_group_element,
uint_t *out_n_factors
) ;
} ;
struct CSystem_t {
uint_t *site_families ; // shape: (n_site_fams, 2)
uint_t n_site_fams ;
struct CSymmetry_t *symmetry ;
uint_t **connection_set ; // pointers to symmetry group elements
uint_t n_sym_els ; // number of symmetry elements in connection set
uint_t *n_factors ; // number of pairs (generator, power) for each connection set element
uint_t *terms ; // shape: (n_terms, 3)
uint_t n_terms ;
// Get a pointer to the hoppings associated with a term
// The memory for the `out_elements` array is be managed internally
// by subclasses of CSystem_t. This prevents needless copying.
void (*elements)(
const void *system, // the system
uint_t term_index, // the index of the term
const uint_t **out_elements, // shape: (n_elements, 2)
uint_t *out_n_elements // number of elements
) ;
// Calculate a single Hamiltonian block
// The memory of the `out` array is managed by the caller
void (*hamiltonian_term)(
const void *system, // the system
uint_t term_index, // the index of the term to calculate
const void *args, // the arguments to the Hamiltonian
complex_t *out // the array to hold the calculated Hamiltonian block
) ;
// Calculate the Hamiltonian and return in dense format
// The memory of the `out` array is managed by the caller
void (*hamiltonian_dense)(
const void *system, // the system
uint_t sym_index, // the index of the symmetry element to calculate
const void *args, // the arguments to the Hamiltonian
complex_t *out // the array to hold the calculated Hamiltonian block
) ;
// Calculate the Hamiltonian and return in sparse COO format
// The memory of the `out`, `out_row_idx` and `out_col_idx` arrays
// is managed by the caller
void (*hamiltonian_sparse)(
const void *system, // the system
uint_t sym_index, // the index of the symmetry element to calculate
const void *args, // the arguments to the Hamiltonian
uint_t *out_row_idx, // row indices of Hamiltonian elements
uint_t *out_col_idx, // column indices of Hamiltonian elements
complex_t *out_values // Hamiltonian values
) ;
} ;
Discussion
Array Format
Most of the arrays are arrays of some "composite" type and hence are "2D
arrays". Maybe instead we should have a struct
for each one, e.g.:
struct SiteFamily_t {
uint_t first_site, n_orbitals ;
} ;
struct SitePair_t {
uint_t to_site, from_site ;
}
struct Term_t {
uint_t sym_el, fam_to, fam_from ;
} ;
This would be more transparent, and might give us some performance gain from accessing aligned memory (if we use 32bit integer types), although we are certainly not at this level of microoptimization yet. In addition, we would have to make sure that this is compatible with the Python interface.
act
method
Symmetries should have an Instead of explicitly constructing the U(N)
representation we should probably
just be able to "act" with the symmetry.
Interfaces with other lowlevel languages
It might also be nice to provide a proper C++ interface, so that people interfacing with that language do not have to jump through the hoops of doing OOP in pure C, and can have a nicer interface to arrays than raw pointers, which are so 1980's.