tinyarray issues
https://gitlab.kwant-project.org/kwant/tinyarray/-/issues
2017-09-19T09:49:03Z

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/9
Creation of tinyarrays from zero-dimensional numpy arrays with unusual dtype does not work
2017-09-19T09:49:03Z, Christoph Groth

```
>>> ta.array(np.zeros((), np.float))
array(0.0)
>>> ta.array(np.zeros((1), np.float16))
array([0.0])
>>> ta.array(np.zeros((), np.float16))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: A sequence does not support sequence protocol - this is probably due to a bug in numpy for 0-d arrays.
```
The error message is actually misleading. Numpy has no choice in this matter. (See the source code for a pointer.)

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/6
Implement @ operator for Python >= 3.5
2017-09-19T09:49:03Z, Christoph Groth

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/5
Implement comparison of arrays whose shapes differ
2017-09-19T09:49:03Z, Christoph Groth

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/4
Implement indexing of sub-arrays (i.e. a[0] for a 2d-array), perhaps even slicing
2017-09-19T09:49:03Z, Christoph Groth

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/19
Conversion test fails on non-x86 architectures (ARM64, PPC64)
2023-02-28T16:33:12Z, badshah400

When [building tinyarray 1.2.3](https://build.opensuse.org/package/show/home:badshah400:branches:devel:languages:python:numeric/python-tinyarray) on aarch64/ppc64 architectures, the conversion test fails (via `pytest`) with the following output. No such problem seen with tinyarray version 1.2.2.
```
[ 53s] =================================== FAILURES ===================================
[ 53s] _______________________________ test_conversion ________________________________
[ 53s]
[ 53s]     def test_conversion():
[ 53s]         for src_dtype in dtypes:
[ 53s]             for dest_dtype in dtypes:
[ 53s]                 src = ta.zeros(3, src_dtype)
[ 53s]                 tsrc = tuple(src)
[ 53s]                 npsrc = np.array(tsrc)
[ 53s]                 impossible = src_dtype is complex and dest_dtype in [int, float]
[ 53s]                 for s in [src, tsrc, npsrc]:
[ 53s]                     if impossible:
[ 53s]                         raises(TypeError, ta.array, s, dest_dtype)
[ 53s]                     else:
[ 53s]                         dest = ta.array(s, dest_dtype)
[ 53s]                         assert isinstance(dest[0], dest_dtype)
[ 53s]                         assert src == dest
[ 53s]
[ 53s]         # Check correct overflow detection. We assume a typical architecture:
[ 53s]         # sys.maxsize is also the maximum size of an integer held in a tinyarray
[ 53s]         # array, and that Python floats are double-precision IEEE numbers.
[ 53s]         for n in [10**100, -10**100, 123 * 10**20, -2 * sys.maxsize,
[ 53s]                   sys.maxsize + 1, np.array(sys.maxsize + 1),
[ 53s]                   -sys.maxsize - 2]:
[ 53s]             raises(OverflowError, ta.array, n, int)
[ 53s]
[ 53s]         # Check that values just below the threshold of overflow work.
[ 53s]         for n in [sys.maxsize, np.array(sys.maxsize),
[ 53s]                   -sys.maxsize - 1, np.array(-sys.maxsize - 1)]:
[ 53s]             ta.array(n, int)
[ 53s]
[ 53s]         # If tinyarray integers are longer than 32 bit, numbers around the maximal
[ 53s]         # and minimal values cannot be represented exactly as double precision
[ 53s]         # floating point numbers. Check correct overflow detection also in this
[ 53s]         # case.
[ 53s]         n = sys.maxsize + 1
[ 53s]         for dtype in [float, np.float64, np.float32]:
[ 53s]             # The following assumes that n can be represented exactly. This should
[ 53s]             # be true for typical (all?) architectures.
[ 53s]             assert dtype(n) == n
[ 53s]             for factor in [1, 1.0001, 1.1, 2, 5, 123, 1e5]:
[ 53s]
[ 53s]                 for x in [n, min(-n-1, np.nextafter(-n, -np.inf, dtype=dtype))]:
[ 53s]                     x = dtype(factor) * dtype(x)
[ 53s]                     raises(OverflowError, ta.array, x, int)
[ 53s]                     if dtype is not float:
[ 53s]                         # This solicitates the buffer interface.
[ 53s]                         x = np.array(x)
[ 53s]                         assert(x.dtype == dtype)
[ 53s] >                       raises(OverflowError, ta.array, x, int)
[ 53s] E                       Failed: DID NOT RAISE <class 'OverflowError'>
[ 53s]
[ 53s] test_tinyarray.py:200: Failed
```
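The overflow checks in the quoted test depend on how 64-bit integer bounds round when they are converted to double precision. The following is a minimal sketch in pure Python of that rounding and of one bounds check that sidesteps it. It assumes 64-bit tinyarray integers, and `fits_in_int64` is a hypothetical helper written for illustration; it is not tinyarray's conversion code and says nothing about the cause of the failure reported here.

```python
INT64_MAX = 2**63 - 1  # value of sys.maxsize on typical 64-bit platforms


def fits_in_int64(x: float) -> bool:
    """Return True if x lies within the range of a 64-bit signed integer.

    Hypothetical helper for illustration; not part of tinyarray.
    """
    # Both bounds are powers of two and therefore exact as doubles.
    # Any comparison with NaN is False, so NaN is rejected as well.
    return -2.0**63 <= x < 2.0**63


# 2**63 - 1 has no exact double representation; it rounds up to 2**63.
rounded = float(INT64_MAX)
print(rounded == 2.0**63)       # True
print(fits_in_int64(rounded))   # False: the rounded value overflows
print(fits_in_int64(-2.0**63))  # True: the minimum is exactly representable
```

Comparing against `2.0**63` with a strict inequality, rather than against `float(INT64_MAX)` with `<=`, is what keeps the rounded maximum out of the accepted range.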
Brief system info:
* openSUSE Tumbleweed on aarch64 (also on ppc64)
* Python version 3.8.5
* tinyarray version 1.2.3
* GCC 10.2.1
* numpy 1.19.2

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/21
Binary operator tests fail on i386 with python 3.10
2023-02-28T16:35:52Z, Emanuele Rocca

Hi,
tinyarray seems to have issues with python3.10 on i386. I wonder if the problem may somehow be related to https://github.com/python/cpython/issues/82180.
This issue is tracked in Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017051.
```
=================================== FAILURES ===================================
____________________________ test_binary_operators _____________________________
    def test_binary_operators():
        ops = operator
        operations = [ops.add, ops.sub, ops.mul, ops.mod,
                      ops.floordiv, ops.truediv]
        if sys.version_info.major < 3:
            operations.append(ops.div)
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=RuntimeWarning)
            for op in operations:
                for dtype in dtypes:
                    for shape in [(), 1, 3, (3, 2)]:
                        if dtype is complex and op in [ops.mod, ops.floordiv]:
                            continue
                        a = make(shape, dtype)
                        b = make(shape, dtype)
>                       assert_equal(op(ta.array(a.tolist()), ta.array(b.tolist())),
                                     op(a, b))
E                       AssertionError:
E                       Arrays are not equal
E
E                       Mismatched elements: 2 / 3 (66.7%)
E                       Max absolute difference: 1.11022302e-16
E                       Max relative difference: 1.11022302e-16
E                        x: array([nan+nanj, 1. +0.j, 1. +0.j])
E                        y: array([nan+nanj, 1. +0.j, 1. +0.j])
../../../test_tinyarray.py:375: AssertionError
______________________________ test_binary_ufuncs ______________________________
    def test_binary_ufuncs():
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=RuntimeWarning)
            for name in ["add", "subtract", "multiply", "divide",
                         "remainder", "floor_divide"]:
                np_func = np.__dict__[name]
                ta_func = ta.__dict__[name]
                for dtype in dtypes:
                    for shape in [(), 1, 3, (3, 2)]:
                        if dtype is complex and \
                           name in ["remainder", "floor_divide"]:
                            continue
                        a = make(shape, dtype)
                        b = make(shape, dtype)
>                       assert_equal(ta_func(a.tolist(), b.tolist()),
                                     np_func(a, b))
E                       AssertionError:
E                       Arrays are not equal
E
E                       Mismatched elements: 2 / 3 (66.7%)
E                       Max absolute difference: 1.11022302e-16
E                       Max relative difference: 1.11022302e-16
E                        x: array([nan+nanj, 1. +0.j, 1. +0.j])
E                        y: array([nan+nanj, 1. +0.j, 1. +0.j])
../../../test_tinyarray.py:396: AssertionError
```

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/22
Factor out object size calculation
2023-02-28T16:35:27Z, Christoph Groth

There are at least three places (they can be found by grepping for `ndim > 1`) where the size of the array object is computed according to the same rules. This should be factored out into a single (inline) function.

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/23
Mutable tinyarrays?
2023-03-19T22:51:00Z, Christoph Groth

Tinyarrays are currently immutable. This is necessary so that they can be used as dictionary keys, which is the original application of tinyarray (with [Kwant](https://gitlab.kwant-project.org/kwant/kwant)). This issue discusses whether and how the tinyarray module could be extended to support mutable arrays as well. It is motivated by requests from tinyarray users.
## Purpose
- With today’s tinyarray it is possible to write code like
```
a /= sqrt(dot(a, a))
b += a
```
The above results in the allocation of two new tinyarray objects. With mutable tinyarrays, both allocations could be avoided.
- Individual elements of an array could be overwritten, which would allow modifying elements of a matrix or looping over a vector:
```
v = zeros(2, int)
for v[0] in range(10):
    for v[1] in range(10):
        ...
```
- What else?
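The difference between rebinding and in-place modification can be sketched with built-in types. Here `tuple` and `list` stand in for today's immutable tinyarrays and for a hypothetical mutable variant; this is an assumption for illustration only, and no tinyarray API is used.

```python
# Immutable stand-in: augmented assignment builds a new object and rebinds the name.
t = (1, 2)
t_before = t
t += (3,)
print(t is t_before)  # False: a new tuple was allocated

# Mutable stand-in: augmented assignment modifies the existing object.
l = [1, 2]
l_before = l
l += [3]
print(l is l_before)  # True: no new list was allocated

# A subscription is a valid loop target when the container is mutable.
v = [0, 0]
visited = []
for v[0] in range(2):
    for v[1] in range(2):
        visited.append(tuple(v))
print(visited)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```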
## Types
Like tuples and strings, tinyarrays are immutable by design. [Immutability is a prerequisite to hashability](https://docs.python.org/3/faq/design.html#why-must-dictionary-keys-be-immutable), which in turn is necessary for tinyarrays to be usable as dictionary keys. Otherwise, there is technically nothing that prevents tinyarrays from being mutated. After all, the underlying memory is writable.
For some applications mutable tinyarrays would be useful. I can see two ways of achieving that without breaking current applications and backwards compatibility:
- Introduce new tinyarray types that correspond to today’s `ndarray_int`, `ndarray_float`, `ndarray_complex`, except that they are mutable and not hashable. That could probably be done without much code duplication by adding a second template parameter to the `Array` class. That parameter would specify whether the underlying arrays are mutable or not.
- Extend the current types by introducing a flag that determines for each tinyarray instance whether it is mutable or not. Technically, C-API equivalents of magic methods like `__hash__` and `__setitem__` would always be defined, but item assignment would only work for mutable, and hashing only for immutable variants.
The advantage of the first solution is that it would not require accommodating an additional flag in the tinyarray object. Unless we find some trick, an additional boolean flag would increase the object size by 8 bytes (on 64-bit platforms) due to alignment. See the section “Data layout” below.
One advantage of the second solution is that the implementation would most likely be simpler. Another is that, while it should not be possible to turn an existing immutable tinyarray object into a mutable one, the other direction (freezing) would be possible and potentially useful.
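The second solution can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the proposed C++ implementation: the class `FlaggedArray`, its `mutable` argument and its `freeze` method are hypothetical names, and the data is held in a plain list.

```python
class FlaggedArray:
    """Hypothetical 1-d array with a per-instance mutability flag."""

    __slots__ = ("_data", "_mutable")

    def __init__(self, values, mutable=False):
        self._data = list(values)
        self._mutable = bool(mutable)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        # Item assignment is always defined but only works for mutable arrays.
        if not self._mutable:
            raise TypeError("array is immutable")
        self._data[index] = value

    def __eq__(self, other):
        if not isinstance(other, FlaggedArray):
            return NotImplemented
        return self._data == other._data

    def __hash__(self):
        # Hashing is always defined but only works for immutable arrays.
        if self._mutable:
            raise TypeError("unhashable: array is mutable")
        return hash(tuple(self._data))

    def freeze(self):
        """Make the array immutable; there is deliberately no way back."""
        self._mutable = False
        return self


a = FlaggedArray([1, 2], mutable=True)
a[0] = 5                 # allowed while mutable
key = a.freeze()         # now usable as a dictionary key
table = {key: "site"}
print(table[FlaggedArray([5, 2])])  # site
```

Because the flag can only change from mutable to immutable, an object that has been used as a dictionary key can never change its hash afterwards.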
## API
Independently of the implementation, the API could mostly stay the way it is. One would have to come up with a way of constructing mutable arrays. I see the following possibilities:
- Add an option to `tinyarray.array`, `tinyarray.zeros`, etc. Since for performance reasons tinyarray constructors only accept positional arguments, adding such a “mutable” option would mean it has to go behind the existing “dtype” argument. Thus specifying “mutable” would require specifying “dtype” as well. More importantly, a positional argument for mutability would be incompatible with NumPy.
- Prefix the mutable constructors, e.g. `tinyarray.marray`, etc. Because of backwards compatibility, immutable would have to remain the default (=without prefix).
- Put the mutable constructors into a separate namespace, e.g. `tinyarray.mut.array`. This seems attractive, especially if the submodule could have the same contents. Then a user interested in mutable arrays could say `import tinyarray.mut as ta`. On the other hand, `mut.array` means two dict lookups instead of one. Finally, realising a [Python package with submodules as a single file](https://github.com/python/cpython/issues/87533) (which is what we want for simplicity and code reuse) is quirky.
- Initially mark all newly constructed tinyarrays as mutable and freeze them upon hashing or on-demand (`freeze` method). This solution may sound a bit too clever, but it would have the advantage of keeping the API simple. This possibility is [discussed for the case of lists in the Python design FAQ](https://docs.python.org/3/faq/design.html#why-must-dictionary-keys-be-immutable), and rejected because the list elements could be mutable themselves. This, however, is not a problem for tinyarrays! Immutable objects have other uses beyond dictionary keys, for example as default values for function arguments. This occasional need would be covered by a `freeze` method - mutable as default otherwise seems useful. Detail: `tinyarray.array` would always make a copy and thus return a new mutable array. `tinyarray.asarray` would avoid a copy when possible and possibly return an immutable array.
One would also have to decide on the mutability of the result of an operation on two mutable arrays, or on one mutable and one immutable array. I would tend to make results immutable, except perhaps when both operands are mutable.
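The freeze-upon-hashing option and one possible rule for the mutability of results can be sketched together. All names below (`LazyArray`, its `mutable` property) are hypothetical, and the rule "mutable only if both operands are mutable" is just the tentative choice mentioned above, not a settled design.

```python
class LazyArray:
    """Hypothetical array that starts mutable and freezes when first hashed."""

    def __init__(self, values):
        self._data = list(values)
        self._frozen = False

    @property
    def mutable(self):
        return not self._frozen

    def __setitem__(self, index, value):
        if self._frozen:
            raise TypeError("array is frozen")
        self._data[index] = value

    def __eq__(self, other):
        return isinstance(other, LazyArray) and self._data == other._data

    def __hash__(self):
        # The first call to hash() freezes the array for good.
        self._frozen = True
        return hash(tuple(self._data))

    def __add__(self, other):
        if not isinstance(other, LazyArray):
            return NotImplemented
        result = LazyArray(x + y for x, y in zip(self._data, other._data))
        # Tentative rule: the result is mutable only if both operands are.
        result._frozen = not (self.mutable and other.mutable)
        return result


a = LazyArray([1, 2])
b = LazyArray([3, 4])
print((a + b).mutable)  # True: both operands are still mutable
d = {a: "x"}            # using a as a key hashes it and thereby freezes it
print(a.mutable)        # False
print((a + b).mutable)  # False: one operand is frozen
```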
## Views and mutability
Some operations on NumPy arrays return views: new array objects that reference (a part of) the data of the original array. Examples of such operations are indexing with slices or the `transpose` method. It is possible to modify the original array’s contents through a view. Tinyarray so far does not support indexing with slices, but it has `transpose`.
Views introduce a layer of indirection and require the introduction of strides (without strides, `transpose` cannot be implemented using views). Tinyarray has so far avoided introducing both, which has the benefit of keeping data structures and memory access patterns simple and compact. Both are important for tinyarray’s speed, which is of crucial importance.
Anyway, the biggest benefit of views is with large arrays. A `PyVarObject` object without any payload consists of five `size_t`, that is 40 bytes on 64-bit architectures. Hence, allocating and creating a new array involves a certain overhead. The data of a small array is typically comparable in size, so that views seem not worth it for typical (i.e. tiny) arrays.
So far, the absence of views does not break compatibility with NumPy: a copy of an immutable array behaves the same as a view. However, if mutable tinyarrays were to be introduced, we would face the following choice:
- Accept that tinyarrays behave differently from NumPy arrays when views are involved. This is not unheard of: if `l` is a list, `l[1:]` creates a copy, whereas the same expression on a NumPy array creates a view. And tinyarray already deviates from NumPy in other ways: the numerical comparison operators (`==`, `<`, etc.) work as they do for tuples, since otherwise tinyarrays would not be usable as keys for dictionaries.
- Implement views for tinyarrays. This would have benefits for somewhat larger arrays, but also costs in terms of storage and runtime.
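The observable difference between a view and a copy can be shown with the standard library alone. In this sketch, a sliced `memoryview` plays the role of a view and a sliced `list` the role of a copy; neither is a tinyarray or NumPy object.

```python
# View semantics: a sliced memoryview shares memory with the original buffer.
data = bytearray([0, 1, 2, 3])
view = memoryview(data)[1:]
view[0] = 9
print(data[1])   # 9: the write went through to the original

# Copy semantics: slicing a list allocates an independent list.
items = [0, 1, 2, 3]
tail = items[1:]
tail[0] = 9
print(items[1])  # 1: the original is unaffected
```

As long as arrays are immutable, the two behaviours cannot be told apart, which is why the question only arises once mutation is allowed.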
## Data layout
Tinyarrays are `PyVarObjects` and use [a trick](https://discuss.python.org/t/may-types-use-pyvarobject-ob-size-to-store-arbitrary-data/24136) to reduce the storage size: the obligatory `ob_size` field is used in a clever way:
- When `ob_size >= 0`, it stores the length of a 1-dimensional array.
- When `ob_size < -1`, its absolute value stores the number of dimensions (`ndim`). The shape is stored separately.
- The meaning of `ob_size == -1` is `ndim = 0`.
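The three rules above can be modelled in Python as follows. This is only a sketch of the encoding as described in this issue; the function names `encode_ob_size` and `decode_ndim` are hypothetical, and the real logic lives in tinyarray's C++ code.

```python
def encode_ob_size(shape):
    """Return the ob_size value that represents the given shape tuple."""
    ndim = len(shape)
    if ndim == 1:
        return shape[0]  # the length itself; no separate shape is stored
    if ndim == 0:
        return -1        # special value for zero-dimensional arrays
    return -ndim         # ndim >= 2; the shape is stored separately


def decode_ndim(ob_size):
    """Recover the number of dimensions from an ob_size value."""
    if ob_size >= 0:
        return 1
    if ob_size < -1:
        return -ob_size
    return 0


for shape in [(), (0,), (3,), (3, 2), (2, 2, 2)]:
    print(shape, encode_ob_size(shape), decode_ndim(encode_ob_size(shape)))
```

Note that the value `-1` is free for the zero-dimensional case because an array with `ndim == 1` is encoded by its non-negative length rather than by `-1`.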
This way of storing things has two advantages:
- In the common case of `ndim == 1` no shape needs to be stored. This saves one `size_t`, i.e. 8 bytes.
- When the method `Array_base::ndim_shape` (see file `array.hh`) returns the number of dimensions and shape, it is able to return a pointer to the shape even in the special case `ndim == 1`: a pointer to the `ob_size` field is returned, which corresponds to the shape!
Unfortunately, in the case where we want to store a mutability bit for each array, I do not see how to store it in `ob_size` while keeping the above optimization. The problem is not the loss of one bit for `ndim` or size, but that there is no way that `ob_size` can correspond to the shape for both mutable and immutable arrays.
A practical solution would be to use `ob_size` to store `ndim` as well as mutability. The shape would then have to be stored separately, also in the case of one-dimensional arrays. That’s probably negligible.
## Preliminary synthesis
I’d like to hear what others think is the best way to balance the above constraints.
Currently, I tend to:
- Types: Do not introduce new types. Store mutability for each new array instance.
- API: Make each new array initially mutable. Freeze upon first call of `__hash__` or when requested manually through the `freeze` method.
- Views: Do not introduce views. Accept that semantics differ from NumPy (as they do now).
- Data layout: Accept the additional `size_t` for one-dimensional arrays.
Do you see serious problems caused by the above choices?

https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/24
Switch to multi-phase init
2023-03-02T09:55:09Z, Christoph Groth

This will enable compatibility with the new [multiple interpreters](https://peps.python.org/pep-0684/#restricting-extension-modules) feature of Python 3.12.