Distinguish public and internal interfaces (not only in documentation)
Motivation
Even though Python is a language for "consenting adults", it's still good to help users not to shoot in their foot, by making it clear what is part of the public API and what is internal.
PEP 8 says: "Public attributes are those that you expect unrelated clients of your class to use, with your commitment to avoid backward incompatible changes. Non-public attributes are those that are not intended to be used by third parties; you make no guarantees that non-public attributes won't change or even be removed."
We have been trying to do that, but Kwant is still behind what PEP 8 recommends and what good libraries (like the stdlib) do. The aim of this proposal is to make Kwant join this illustrious circle.
Global-level functions
We have been careful to put global names that belong to the API into __all__
variables. In addition to that, PEP 8 recommends to use leading underscores for internal global names in parallel with __all__
: "Even with __all__
set appropriately, internal interfaces (packages, modules, classes, functions, attributes or other names) should still be prefixed with a single leading underscore."
I think this is good practice, since it makes it immediately clear to a user what belongs to the API.
Kwant has plenty of global-level functions that are not part of the API (= not included in __all__
) and whose names do not begin with underscores. Some examples:
lattice.short_array_repr()
,
builder.validate_hopping()
,
plotter.set_colors()
,
physics.leads.setup_linsys()
,
linalg.lll.gs()
All of the above (and the many other similar ones) should be perpended with underscores.
Methods
I'm quite confident that the names of methods in Kwant do not require any changes. Because there's no __all__
for classes we have been using underscores already.
Types
Internal types that do not have to be known by the user should get an underscore, e.g.:
kwant.solvers.common.WaveFunction
,
kwant.builder.FiniteSystem
Non-public instance variables
PEP 8 is quite clear: "Use one leading underscore only for non-public methods and instance variables."
It’s reasonable that internal should be the default (PEP 8 also says so). Some examples of instance variables that are clearly internal:
solvers.mumps.Solver
: nrhs
,
lattice.Polyatomic
: reduced_vecs
,
builder.Builder
: H
Note that a class with only non-public instance variables becomes effectively immutable if none of its public methods provides a way to modify the instance. Kwant’s lattice types are good examples of classes that should be effectively immutable, but are currently lacking the corresponding leading underscores.
Public instance variables
When an instance could be useful for external clients, and we can commit ourselves to keep it around, we can define is as part of the public API by (a) documenting it and (b) leaving out the leading underscore. A good example is sites
of finalized builders. A somewhat less clear is symmetry
of builders. I think the latter could be useful, and we are sure to have to keep the symmetry around, so why not make it part of the API?
I think it’s important to make it easy and safe to use public instance variables. I think a reasonable approach is to either make sure that they are immutable (example: sites
of finalized builders), or to treat them as external parameters (example: leads
of builders). If mutation is allowed, the rules should be documented, and we should make sure that there is a clear-enough error message if these rules are violated (often duck typing will be enough).
Write-protecting public instance variables
Users are allowed to modify the leads
attribute of builders. It's even OK to assign somthing new to it. If that something is bogus, it won't lead to silent corruption of the builders state, because leads
is treated as just another parameter to finalized()
.
On the other hand, the sites
instance variable of a finalized builder is clearly a part of the internal state. It is immutable, and its elements are effectively immutable as well, but a user could still simply assign some wrong value to it: fsyst.sites = bogus_list
.
It's not difficult to imagine situations where this will lead to confusing errors.
We could choose not to care about such situations, but it actually is common among proper Python libraries to do so. The following idiom occurs more than 100 times in the stdlib:
@property
def foo(self)
return self._foo
I propose to do the same and make public instance variables properties, whenever reassigning them risks making the internal state inconsistent. This rules out attributes like leads
of builders (consistency is not endangered), or attributes of simple container classes, e.g. the error message of an exception.