Joseph Weston · bd0f2546
--- a/Low-level-System-Format.md
+++ b/Low-level-System-Format.md
-This document specifies a proposal for the new low-level system format in Kwant-2.
+This document contains a proposal for the new low-level system format in Kwant-2.
+
+We will start by describing som preliminary concepts and definitions. We will
+then describe the public API exposed by the new low-level system format.
+Finally we will discuss the concrete implementation provided with Kwant, the
+`FinalizedBuilder`.
+
+## Preliminary
+
+### Low-Level Site Families
+With the new format we will introduce the concept of a site family to the low
+level systems. A low-level site family is simply a set of sites that have the
+same number of orbitals associated with them. For the `FinalizedBuilder`
+implementation these *may* correspond to high-level `SiteFamily` objects,
+but this is not a requirement. There may be more than one low-level site
+family with the same number of orbitals. Note that this means that the
+number of orbitals per site must be defined *a priori*, and cannot
+be defined "on the fly" as it can now in Kwant-1.
+
+### Sites and Site Ordering
+The low-level format just considers sites as integers. A concrete
+implementation *may* provide a mapping from these integers to more
+rich objects (as `FinalizedBuilder` does to high-level `Site` objects).
+but this is not required by the low-level interface.

-We will start by describing the format for a *finite* system, that is, a system with no symmetries.
-We will then move on to *infinite* systems with symmetries. We will then list the operations
-that the low-level system must efficiently support and sketch the implementation of these operations
-given the low-level format. In the final section we briefly discuss a C-level interface to systems.
-
-## Finite Systems
-### Site Ordering
 Sites are arranged in the following way:

    indices:  0   1   2        N   N+1 N+2      N+M
@@ -14,226 +30,213 @@ Sites are arranged in the following way:
              ~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
                site family "a"    site family "b" 

-Where the families are ordered lexicographically and the sites within a particular family are ordered
-lexicographically by their tag. This permits a unique numbering of sites as indicated above.
-The above choice of ordering makes dealing with quantities such as wavefunctions easier, as all sites
-belonging to the same site family necessarily have the same number of orbitals.
+The site families are ordered by their number of orbitals.  This means that the
+low-level site families refer to a contiguous block of sites. This choice of
+ordering has two advantages:
+
+ site families can be represented by 2 integers,
+  `(num_orbitals, first_site)`, as opposed to an integer and an
+  *array* of integers, `(num_orbitals, [site1, site2, ...])`
+ Slicing into quantities defined over all orbitals of the system (e.g.
+  wavefunctions) is easy, as all sites in a contiguous block
+  have the same number of orbitals. We can also easily calculate
+  the shape of sub-blocks of the Hamiltonian in this way.
+
+
+### Symmetries
+Every system we will want to describe has an associated symmetry group.  For a
+finite system this symmetry group is just the trivial group.  For an infinite
+system it is a translational symmetry characterized by N linearly independent
+realspace vectors. All of the symmetry groups that we wish to deal with have a
+representation as the additive group of tuples of integers (or integers modulo
+some n for the case of finite subgroups such as discrete rotational symmetry).
+For example, the symmetry group of a system with 1D translational symmetry and
+6-fold rotational symmetry has a representation as `((ℤ, ℤ/6ℤ), +)`.
+A symmetry can act on sites (by shifting their position) but they can also act
+on wavefunctions (multiplication by a phase factor).
+
+Elements of the symmetry group will be represented in the low-level system as
+tuples of integers. There will be a `Symmetry` object that is capable of
+mapping this tuple-representation to a representation that can act on
+wavefunctions or site positions[1]. We will require that the tuple that
+contains all zeros must act as the identity.
+
+
+### Connection Sets
+Once we have a system with sites, hoppings, and a given symmetry group we can
+define what we will call the *connection set* of the system[2].  Informally we
+can define this in the following way. We take all the sites that are in the
+fundamental domain of the symmetry group, or connected by a hopping to such
+sites; call this set `S`. We now define the *connection set*, `C`, as the set
+of symmetry group elements that map sites from the fundamental domain to sites
+in `S`. Note that the connection set always contains the identity (for a finite
+system or a completely disconnected infinite system).
+
+Take the example of a 1D chain with nearest-neighbor hoppings, the connection
+set is `{(0,), (1,), (-1,)}`. With second-nearest neighbor hoppings in addition
+the connection set would be `{(0,), (1,), (-1,), (2,), (-2,)}`. For a square
+lattice with nearest-neighbor hoppings the connection set would be `{(0, 0),
+(1, 0), (-1, 0), (0, 1), (0, -1)}`
+
+
+## `System` API
+Here we define the low-level system API[3].
+
+### Data members
+
+ `site_families`: sequence of pairs of integers,  `(first_site, num_orbs)`
+that define the grouping of sites into low-level families.
+
+ `symmetry`: an object that can act on a wavefunction, given a group element
+(sequence of integers).
+
+ `connection_set`: sequence of arrays of integers. The arrays correspond to
+the representation of the symmetry group elements mentioned above.
+
+ `elements`: sequence of arrays of pairs of integers; the hoppings in the
+system. Also contains "self hoppings" `(i, i)` that represent on-site
+values. The hoppings in an array must be from and to the same pair of
+low-level site families.
+
+ `blocks`: sequence of tuples of integers,
+`(sym_el, fam_to, fam_from, chunk, conj)` where `sym_el` indexes the
+`connection_set`, `fam_to` and `fam_from` index the `site_families`,
+`chunk` indexes the `elements`, and `conj` is a flag which signifies
+that the complex conjugate of the hoppings specified should be taken.
+
+### Methods
+ `hamiltonian_block`: evaluate the Hamiltonian of a single block.
+Returns an array of shape `(N, norbs_fam_to, norbs_fam_from)` where
+`N` is the number of hoppings in the block and the other numbers
+are the number of orbitals in `fam_to` and `fam_from` for the block.
+This method is implemented by subclasses, it is the equivalent of
+the `hamiltonian` method of Kwant-1. This way, however, implementations
+may vectorize the evaluation of the Hamiltonian over a block.
+
+ `hamiltonian`: evaluate the Hamiltonian associated with a particular
+symmetry group element. This calls `hamiltonian_block` for each block
+that has the given symmetry group element, and forms either a dense
+matrix, or a sparse matrix in COO format.
+
+### Discussion
+
+#### Calculating band structure
+Write this
+
+#### Recursive Greens Function
+Write this
+

-Even though generic low-level systems will not have to provide an explicit interface to `Site` objects,
-such as those found in a `Builder`, it will nevertheless have to have a concept of `SiteFamily` where
-sites from the same family have the same number of orbitals.
+## `Builder.System` Implemetation

-### On-Site Hamiltonian Elements
-For a given site family, `fam`, with `n_orbs` orbitals per site the sites and values are stored in the
-following way.
+The low-level site families will correspond to the high-level site
+families[4]. The `elements` will be grouped by value: a separate
+array of hoppings (onsites) for each value function, and a separate
+array for constant values. The `blocks` will thus refer to elements
+that can be efficiently evaluated by a vectorized value function.

-All of the sites which have *constant* values assigned to them are stored like so:
+### Data members

-    (fam, [i1, i2, i3, ..., iN], [v1, v2, v3, ..., vn])
+ `sites`: sequence of `builder.Site` objects, the sites in the system

-where the `i<n>` are the site indices [1] and the `v<n>` are the values assigned to each
-site. The two sequences can be efficiently stored as arrays, as the shape is known (`(N,)` for the indices
-and `(N, n_orbs, n_orbs)` for the values).
+ `site_values`: a sequence of the same length as `site_families`. Contains
+either a callable or an array of shape `(N, norbs_fam, norbs_fam)`.

-All of the sites which have a value given by a *function* `func` are stored like so:
+ `values`: a sequence of the same length as `blocks`. Contains either
+a callable or an array. The array has shape
+`(N, norbs_fam, norbs_fam)` if the `block` refers to `elements` that
+are onsites, or `(N, norbs_fam_to, norbs_fam_from)` if the block
+refers to `elements` that are hoppings[5].

-    (fam, func, [i1, i2, i3, ..., iN])
+### Evaluating Hamiltonian Blocks
+If a block has a corresponding `value` that is an array, return the array.
+Otherwise, if the callable has an attribute `__vectorized__` then it
+is called once with `SiteArray` as its first (and second, for hoppings)
+argument. If the callable is not vectorized then it is called once for
+each element in the block's `els`. If the block has the `conjugate` flag
+set then the Hermitian conjugate of each value is calculated, and this
+is returned.

-If `func` is vectorized then it can be called *once*, passing all the sites [2], if not
-then it can be called `N` times, passing each site individually.
+### Discussion
+Write this

-### Hopping Hamiltonian Elements
-The hoppings are stored in an analogous way to the on-sites, except that now there is a *pair*
-of families, instead of a single one.

-For constant hoppings we store everything in the following way
+## C `System` API
+Here we define the C low-level system API. Systems implemented in C must conform
+to this interface. Solvers implemented in C may rely on this interface.

-    (fam_to, fam_from, [i1, i2, ..., iN], [j1, j2, ..., jN], [v1, v2, ..., vN])
+```C

-Where the values each have shape `(fam_to.n_orbs, fam_from.n_orbs)`.
+typedef uint32_t uint_t;  // may be any suitably large unsigned integer
+typedef double complex complex_t ;

-For hoppings given by a function `func` we store:
+struct Symmetry_t {
+
+    uint_t num_directions ;
+
+    // get a representation of a group element in GL(C, N)
+    void (*get_representation)(
+        const void* symmetry,
+        const uint_t *group_element,
+        uint_t N,
+        complex_t* out) ;
+
+} ;
+
+struct CSystem_t {
+
+    uint_t *site_families ;  // shape: (n_site_fams, 2)
+    uint_t n_site_fams ;
+
+    struct Symmetry_t *symmetry ;
+    uint_t *connection_set ;  // shape: (n_sym_els, symmetry->num_directions)
+    uint_t n_sym_els ;
+
+
+    // array of all the elements, ordered into chunks
+    uint_t *elements ;  // shape: (n_elements, 2)
+    uint_t n_elements ;
+    // the indices of the first element in each chunk 
+    uint_t *element_chunks ;  // shape: (n_element_chunks,)
+    uint_t n_element_chunks ;
+
+    uint_t *blocks ;  // shape: (n_blocks, 5)
+    uint_t n_blocks ;
+
+    void (*hamiltonian_block_t)(
+        const void *system,  // the system
+        gint_t block_index,  // the index of the block to calculate
+        const void *args,  // the arguments to the Hamiltonian
+        complex_t *out  // the array to hold the calculated Hamiltonian block
+} ;
+```
+
+### Discussion
+
+#### Array Format
+Most of the arrays are arrays of some "composite" type and hence are "2D arrays".
+Maybe instead we should have a `struct` for each one, e.g.:
+```C
+struct Block_t {
+    uint_t sym_el, fam_to, fam_from, chunk, conj ;
+} ;
+```
+This would be more transparent. The disadvantage is that we would be subject
+to padding, unless we define the structure as `packed`.

-    (fam_to, fam_from, func, [i1, i2, ..., iN], [j1, j2, ..., jN])

-As for the on-site value functions, `func` can be evaluated just once if vectorized, or
-called for each pair `i<n>, j<n>` if not.

 ---------------------
 ##### Footnotes
-[1]: Should the indices be the *global* site indices, or just the offset from the start of the sites
-     from the current site family?
-
-[2]: The functions may expect a bona-fide `SiteArray` in the case of a finalized `Builder`,
-     can we efficiently construct such a thing? Seems a bit of a waste to construct a new
-     array and then immediately discard it. If the sites were ordered by their value functions
-     then we could just use a slice.
-
-
-## Infinite Systems
-The difference between finite and infinite systems is that the latter have a discrete
-*translational symmetry*. In Kwant-2 the translational symmetry is characterised by N linearly
-independent real-space vectors (as opposed to a single realspace vector, as in Kwant-1).
-We define an isomorphism `φ: G → (ℤ^N, +)` from the symmetry group `G` to `(ℤ^N, +)`, the
-additive group of N-tuples of integers. Note that there can be several valid choices of `φ`.
-
-We define the sets `S` and `T` respectively to be the sets of all the sites and hoppings in
-the system. We define the *finite representation* of `S`, which we denote `S'`, to be all the elements
-of `S` in the fundamental domain of the symmetry group. We define the finite representation of the
-hoppings, `T'`, to be the elements of `T` which satisfy the following conditions:
-
-+ At least one site is in the fundamental domain, i.e. it is in the set `S'`.
-+ If one site `s` is not in the fundamental domain then the following condition also has
-  to be satisfied. The group element `g ∈ G` for which `s = g·p, p ∈ S'` must have a
-  representation under `φ` in `(ℤ^N, +)` for which the first non-zero element is positive.
-  This prevents double counting of hoppings.
-
-Given these definitions we can further define the *connection set* of the system as the
-subset `G' ⊆ G` which can generate all the sites in the hoppings in `T'`. We define
-the *degree* of the system as `|G'|`. 
-
-Note that the above definitions apply equally well to finite systems. In this case
-`G` is just the trivial group, `S = S'`, `T=T'` and the system hence has degree 1.
-
-Infinite systems are stored in the following way. The on-sites are stored in the
-same way as for finite systems; only sites in the finite representation are stored.
-For each element of the connection set, the associated hoppings (i.e. the hoppings
-between the fundamental domain and its image under the action of the connection
-set element) are stored in the same way as for the finite system.
-
-
-    group element:                  g1                         g2
-                          _______________________    _______________________
-                         /                       \  /                       \ ...
-    pair of families:    fam1,fam2  fam2,fam3 ...   fam1,fam2  fam2,fam3 ... 
-                         ___/ \________________
-                        /                      \
-    value functions:    val1,     val2,    ...
-                       (i1, j1)  (m1, n1)
-    hoppings:          (i2, j2)  (m2, n2)
-                       (i3, j3)  (m3, n3)
-                          .         .
-                          .         .
-
-                       \______/  \______/
-                       
-                            "blocks"
-
-We can thus store the graph of the system in this way. Each combination `(g, fam1, fam2, val)` corresponds
-to a "block" of the graph (we need to think of a better name, as this won't necessarily correspond at all
-to a block matrix of the Hamiltonian). We store the graph as an array of ordered pairs, and we also store
-a sequence of "blocks" which consists of `(g, fam1, fam2, val)`. `val` is either a single function or
-a sequence of matrices, as discussed in the section on finite systems.
-
-    graph:   ⎡i1, i2 ... iN, iN+1, ..., iN+M, ...⎤
-             ⎣j1, j2 ... jN, jN+1, ..., jN+M, ...⎦
-
-    block offsets: [0, N, N+M, ...]
-
-    blocks: [(g1, fam1, fam2, val1), (g1, fam1, fam2, val2), ...
-             (g1, fam1, fam3, valM), ...
-             (g2, fam1, fam2, valK), ...
-            ]
-
-Is there any advantage at all in keeping the graph in a single array, as opposed to split
-into several arrays explicitly associated with each block?.
-
-
-## Supported Operations
-Here we note down the various common operations that the low-level format has to efficiently support
-and how they would be implemented with the above definition for the low-level format.
-
-### Calculating band structure
-This requires evaluating the full Hamiltonian matrices associated with *each* of the elements
-of the connection set, and then solving an eigenvalue equation like:
-
-    [H + V_1 exp(i k_x x) + V_2 exp(2 i k_x x) + ... + h.c
-       + W_1 exp(i k_y y) + W_2 exp(2 i k_y y) + ... + h.c] u = E u
-
-We can easily evaluate the matrices individually, as they correspond to all blocks with
-a particular symmetry group element.
-
-### Evaluating all elements which depend on some parameter (e.g. time)
-Iterate over the blocks, inspect the `value`, if the `value` takes a parameter with the correct
-name we add the block index to a list of block indices. We can cache this list of block indices.
-Whenever we want to evaluate the part of the Hamiltonian which depends on parameter X we have just
-to evaluate the blocks which we cached.
-
-### Application of the Hamiltonian to a vector
-To be written.
-
-### Obtaining a submatrix between user-defined sites
-This is just the current behaviour of `hamiltonian_submatrix`. This is not easy with the above format.
-If the sites of the submatrix are known before finalization then the user can define a site family
-that contains the sites for which they wish to extract a submatrix. As the sites are ordered by
-site family, this enables efficient construction of the submatrix.
-
-
-## Abstract API
-The above proposal will need to be supported by an abstract API that allows
-systems to be implemented in C/C++ by users if they so wish. Below is a
-sketch of a Cython implementation of this abstract API
-```python
-class System:
-
-    # An array of site families. Each site family has the format
-    # (offset, n_orbs) where "offset" is the offset in the ordering
-    # of sites where sites of this site_family start, and "n_orbs" is
-    # the number of orbitals on each site.
-    ndarray[int, ndim=2] site_families
-
-    # An array of pairs (i, j). These are organized into chunks which
-    # have a homogeneous pair of site families.
-    ndarray[int, ndim=2] hoppings
-
-    # An array storing the offsets of the different chunks of the
-    # `hoppings` array.
-    ndarray[int, ndim=1] chunks
-
-    # An array of blocks. A block has the format (sym, fam1, fam2, chunk, conj)
-    # where `sym` is an integer that corresponds to a symmetry group element,
-    #`fam1` and `fam2` are indices into the `site_families`, `chunk` is an index
-    # into `chunks`, and `conj` is a boolean flag that indicates that the
-    # complex conjugate should be calculated (this allows efficient storage
-    # for Hermitian Hamiltonians).
-    ndarray[int, ndim=2] blocks
-
-    # Get the Hamiltonian of the given block. Returns am array of shape
-    # (N, fam2.n_orbs, fam1.n_orbs)
-    def hamiltonian_block(int block_index):
-        ...
-
-    # just a convenience function around "hamiltonian_block
-    # to construct the Hamiltonian corresponding to a symmetry group element
-    def hamiltonian(int symmetry_element):
-        ...
-```
+[1]: Currently, Symmetries are a uniquely high-level concept in Kwant, and only the action of
+the group elements on sites is implemented.
+
+[2]: This is not a very good name, hopefully a better one will be thought up
+
+[3]: Should we actually define everything as arrays straight away to make it easier to
+conform to the C system API?
+
+[4]: Or not, see the discussion on implementing RGF

-Note that the low-level system doesn't have to know *anything at all* about
-symmetries. The *meaning* of the symmetry group elements (which are just
-represented by an integer in the above format) is entirely up to the
-implementations to define.
-
-
-## C API
-In addition to the new low-level format, systems in Kwant-2 should have a C API.
-This is needed in order to interface Kwant systems with external codes
-(e.g. existing RGF codes, DFT...) and also so that users may write their own
-low-level systems directly in C that can be used with the Kwant solvers.
-There are four possible "combinations" of systems and solvers which should be
-possible:
-
-+ Python system, Python solver
-+ Python system, C solver
-+ C system, Python Solver
-+ C system, C solver
-
-In order to "drive" calculations from the Python level all the C-level
-components will have to be wrapped. Ideally the manual wrapping that has
-to be done by the end user should be minimal; to this end we provide a set of
-Cython classes that may be subclassed by the end user.
-
-In addition the case where a C solver uses a C system should be efficient
-in the sense that the solver should be able to call the C-API of the
-system directly without passing through the Python level. A toy example for
-such a framework was written by Christoph in 2011.
+[5]: This is ugly and weird, it should be tweaked. This may require a slight
+re-definition of the low-level interface.