---
title: 'Adaptive, tools for adaptive parallel sampling of mathematical functions'
journal: 'PeerJ'
author:
  - name: Tinkerer
    affiliation:
      - Kavli Institute of Nanoscience, Delft University of Technology, P.O. Box 4056, 2600 GA Delft, The Netherlands
    email: not_anton@antonakhmerov.org
abstract: |
  Large scale computer simulations are time-consuming to run and often require sweeps over input parameters to obtain a qualitative understanding of the simulation output.
  These parameter sweeps can potentially make the simulations prohibitively expensive.
  Therefore, when evaluating a function numerically, it is advantageous to sample it more densely in the interesting regions (called adaptive sampling) instead of evaluating it on a manually-defined homogeneous grid.
  Such adaptive algorithms exist within the machine learning field.
  These methods can suggest a new point to calculate based on \textit{all} existing data at that time; however, this is an expensive operation.
  An alternative is to use local algorithms---in contrast to the previously mentioned global algorithms---which can suggest a new point based only on the data in the immediate vicinity of that new point.
  This approach works well, even when using hundreds of computers simultaneously, because the point suggestion algorithm is cheap (fast) to evaluate.
  We provide a reference implementation in Python and show its performance.
acknowledgements: |
  We'd like to thank ...
contribution: |
  Bla
---

# Introduction

#### Simulations are costly and often require sampling a region in parameter space.
In the computational sciences, one often does costly simulations---represented by a function that maps input parameters to the simulation outcome---which require sampling a region in parameter space.

#### Choosing new points based on existing data improves the simulation efficiency.
A better alternative, which improves the simulation efficiency, is to choose new, potentially interesting points in parameter space based on the data that is already available.

#### We describe a class of algorithms relying on local criteria for sampling, which allow for easy parallelization and have a low overhead.
Due to parallelization, the algorithm should be local, meaning that the information updates are only in a region around the newly calculated point. Additionally, the algorithm should also be fast in order to handle many parallel workers that calculate the function and request new points. A simple example is greedily optimizing continuity of the sampling by selecting points according to the distance to the largest gaps in the function values. For a one-dimensional function (Fig. @fig:loss_1D) this means that we (1) construct intervals containing neighboring data points, (2) calculate the Euclidean distance of each interval and assign it to the candidate point inside that interval, and finally (3) pick the candidate point with the largest Euclidean distance. In this paper, we describe a class of algorithms that rely on local criteria for sampling, such as in the previously mentioned example. Here we associate a local loss with each of the candidate points within an interval and choose the point with the largest loss. In the case of the integration algorithm, the loss is the error estimate. The most significant advantage of these local algorithms is that they allow for easy parallelization and have a low computational overhead.
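
To make steps (1)-(3) concrete, the following is a minimal, self-contained sketch of this greedy strategy (not the implementation used by the Adaptive package; the helper `suggest_point` and the example function are ours):

```python
import numpy as np

def suggest_point(xs, ys):
    """Return the midpoint of the interval whose endpoints lie farthest
    apart in the (x, y) plane, i.e. the largest-gap criterion above."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    order = np.argsort(xs)
    xs, ys = xs[order], ys[order]
    # (1) intervals between neighboring points and
    # (2) the Euclidean length assigned to each interval's candidate point
    lengths = np.hypot(np.diff(xs), np.diff(ys))
    # (3) pick the candidate in the middle of the longest interval
    i = int(np.argmax(lengths))
    return (xs[i] + xs[i + 1]) / 2

# Greedy sequential sampling of an example function on [-1, 1].
f = lambda x: np.tanh(20 * x)
xs, ys = [-1.0, 1.0], [f(-1.0), f(1.0)]
for _ in range(30):
    x_new = suggest_point(xs, ys)
    xs.append(x_new)
    ys.append(f(x_new))
```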

#### We provide a reference implementation, the Adaptive package, and demonstrate its performance.
We provide a reference implementation, the open-source Python package called Adaptive[@Nijholt2019a], which has previously been used in several scientific publications[@vuik2018reproducing; @laeven2019enhanced; @bommer2019spin; @melo2019supercurrent]. It has algorithms for the sampling strategies discussed in this paper and can distribute the function evaluations over parallel workers using, for example, `ipyparallel`, `mpi4py`, or `dask.distributed`.
It provides auxiliary functionality such as live-plotting, inspecting the data as the calculation is in progress, and automatic saving and loading of the data.
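
As an indication of how the package is used, the following minimal example samples a one-dimensional function until a loss goal is reached (a sketch based on the package's documented interface; exact argument names may differ between versions, and the example function is ours):

```python
import adaptive

def peak(x, offset=0.123):
    # an inexpensive example function with a sharp feature near `offset`
    a = 0.01
    return x + a**2 / (a**2 + (x - offset)**2)

learner = adaptive.Learner1D(peak, bounds=(-1, 1))

# Run sequentially until the loss goal is reached; the parallel
# adaptive.Runner can instead dispatch the evaluations to an executor backed
# by ipyparallel, mpi4py, or dask.distributed.
adaptive.runner.simple(learner, goal=lambda l: l.loss() < 0.01)
```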

# Review of adaptive sampling
Optimal sampling and planning based on data is a mature field with different communities providing their own context, restrictions, and algorithms to solve their problems. To explain the relation of our approach with prior work, we discuss several existing contexts. This is not a systematic review of all these fields, but rather, we aim to identify the important traits and design considerations.

#### Experiment design uses Bayesian sampling because the computational costs are not a limitation.
Optimal experiment design (OED) is a field of statistics that minimizes the number of experimental runs needed to estimate specific parameters and, thereby, reduces the cost of experimentation[@emery1998optimal]. It works with many degrees of freedom and can consider constraints, for example, when the sample space contains settings that are practically infeasible. One form of OED is response-adaptive design[@hu2006theory], which concerns the adaptive sampling of designs for statistical experiments. Here, the acquired data (i.e., the observations) are used to estimate the uncertainties of a certain desired parameter, and further experiments are then suggested that will optimally reduce these uncertainties. Bayesian statistics is frequently used in this step of the calculation because it naturally provides the tools for answering such questions; however, since closed-form solutions are rarely available, Markov chain Monte Carlo (MCMC) sampling is the go-to tool for determining the most promising samples. In a typical non-adaptive experiment, all decisions on which experiments to perform are made and fixed in advance.

#### Plotting and low-dimensional integration use local sampling.
Plotting a low dimensional function in between bounds requires one to evaluate the function on sufficiently many points such that when we interpolate values in between data points, we get an accurate description of the function values that were not explicitly calculated.
In order to minimize the number of function evaluations, one can use adaptive sampling routines.
For example, for one-dimensional functions, Mathematica[@Mathematica] implements a `FunctionInterpolation` class that takes the function and the bounds of its domain and returns an object that interpolates the function, adding more points where needed. For integration, the `CQUAD` algorithm[@gonnet2010increasing] in the GNU Scientific Library[@galassi1996gnu] implements a more sophisticated strategy and is a doubly-adaptive general-purpose integration routine which can handle most types of singularities.
In general, it requires more function evaluations than the integration routines in `QUADPACK`[@galassi1996gnu]; however, it works more often for difficult integrands.
It is doubly-adaptive because it can decide to either subdivide intervals into more intervals or refine an interval by adding more points---that do not lie on a regular grid---to each interval.

#### PDE solvers and computer graphics use adaptive meshing.
Hydrodynamics[@berger1989local; @berger1984adaptive] and astrophysics[@klein1999star] use an adaptive refinement of the triangulation mesh on which a partial differential equation is discretized. By providing smaller mesh elements in regions with a higher variation of the function, they reduce the amount of data and calculation needed at each step of time propagation. The remeshing at each time step happens globally, which is an expensive operation. Therefore, mesh optimization does not fit our workflow, because expensive global updates should be avoided. Computer graphics uses similar adaptive methods, where a smooth surface is represented by a coarser piecewise linear polygon mesh, called a subdivision surface[@derose1998subdivision]. An example of such a polygonal remeshing method is one where the polygons align with the curvature of the space or field; this is called anisotropic meshing[@alliez2003anisotropic].

# Design constraints and the general algorithm

#### We aim to sample low-dimensional, low to intermediate cost functions in parallel.
The general algorithm that we describe in this paper works best for low to intermediate cost functions. The point suggestion step happens in a single sequential process, while the function executions can run in parallel. This means that, to benefit from an adaptive sampling algorithm, the time it takes to suggest a new point must be much smaller than the average duration of a single function evaluation divided by the number of parallel workers; otherwise, the workers spend their time waiting for new points rather than evaluating the function.
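
Stated as an inequality, with symbols that we introduce here only for illustration: if $t_\textrm{suggest}$ is the time needed to suggest a new point, $t_f$ the average duration of one function evaluation, and $N$ the number of parallel workers, the sampling remains efficient as long as

$$t_\textrm{suggest} \ll \frac{t_f}{N}.$$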

#### We propose to use a local loss function as a criterion for choosing the next point.
To minimize the point suggestion overhead, we propose to use a criterion that depends only on the data in the direct neighborhood of each candidate point: a local loss function.

#### As an example, the interpoint distance is a good loss function in one dimension.
An example of such a loss function for a one-dimensional function is the interpoint distance, such as in Fig. @fig:loss_1D. This loss will suggest sampling a point in the middle of the interval with the largest Euclidean distance and thereby ensure the continuity of the sampled function. A more complex loss function that also takes the first neighboring intervals into account is one that adds more points where the second derivative (or curvature) is the highest. Figure @fig:adaptive_vs_grid shows a comparison between a result using this loss and a function that is sampled on a grid.
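
A curvature-based criterion of this kind could, for instance, be sketched as follows (a hypothetical per-interval loss of our own, not the loss function shipped with the package):

```python
import numpy as np

def curvature_loss(xs, ys):
    """Hypothetical per-interval loss: a finite-difference estimate of |f''|
    at the interval's endpoints, weighted by the interval length, so that
    strongly curved regions receive the largest loss.  Assumes xs is sorted
    in increasing order."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    d2 = np.abs(np.gradient(np.gradient(ys, xs), xs))  # |f''| estimate
    dx = np.diff(xs)
    return dx * (d2[:-1] + d2[1:]) / 2  # one loss value per interval
```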

#### In general, local loss functions only have a logarithmic overhead.
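
The reason is that the candidate points (or intervals) can be kept in a priority queue ordered by loss, where both inserting an item and extracting the one with the largest loss scale logarithmically with the number of stored items. A minimal illustration using Python's standard library (not the data structure used internally by the package):

```python
import heapq

# The heap holds (-loss, interval) pairs; the loss is negated because heapq
# implements a min-heap and we want the largest loss first.
heap = []

def push_interval(interval, loss):
    heapq.heappush(heap, (-loss, interval))   # O(log n)

def pop_largest_loss():
    neg_loss, interval = heapq.heappop(heap)  # O(log n)
    return interval, -neg_loss
```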

#### With many points, due to the loss being local, parallel sampling incurs no additional cost.
So far, the description of the general algorithm did not include parallelism. It needs to be able to suggest multiple points at the same time and remember which points it suggests. When a new point has been suggested but not yet evaluated, it is remembered as pending, so that it is not suggested again; once its value arrives, only the losses in the direct neighborhood of that point need to be updated. Because these updates are local, the cost of incorporating a result does not grow with the total number of points, even when many workers request and deliver points simultaneously.
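
The following sketch illustrates this bookkeeping (our own illustration, not the package's internals): pending points are tracked so they are not handed out twice, and a completed point only triggers new losses for its two neighboring intervals.

```python
data = {}        # x -> f(x) for completed points
pending = set()  # suggested x values whose result has not arrived yet
losses = {}      # (x_left, x_right) -> loss of that interval

def interval_loss(a, b):
    # Euclidean length of the interval; a still-pending endpoint is treated
    # as having y = 0 until its real value arrives
    ya, yb = data.get(a, 0.0), data.get(b, 0.0)
    return ((b - a) ** 2 + (yb - ya) ** 2) ** 0.5

def tell(x, y):
    """Incorporate a finished evaluation: the interval that x splits is
    discarded and only the two new intervals around x get a loss."""
    pending.discard(x)
    data[x] = y
    xs = sorted(set(data) | pending)
    i = xs.index(x)
    if 0 < i < len(xs) - 1:
        losses.pop((xs[i - 1], xs[i + 1]), None)  # the old interval x split
    if i > 0:
        losses[(xs[i - 1], x)] = interval_loss(xs[i - 1], x)
    if i < len(xs) - 1:
        losses[(x, xs[i + 1])] = interval_loss(x, xs[i + 1])
```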

# Loss function design

#### A failure mode of such algorithms is sampling only a small neighborhood of one point.

#### A solution is to regularize the loss such that this is avoided.

#### Adding loss functions allows for balancing between multiple priorities.

#### A desirable property is that eventually, all points should be sampled.
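
As an illustration of the last three points (a sketch with names and a weight of our own choosing, not the package's implementation), a feature-seeking loss can be mixed with the plain interval length; the interval-length term keeps the loss of every interval positive, so every region is eventually subdivided, while the weight balances the two priorities:

```python
def combined_loss(interval_length, feature_loss, weight=0.9):
    """Hypothetical combination of two priorities: a feature-seeking loss
    (e.g. a curvature estimate) mixed with the plain interval length.  The
    weight balances the two terms, and because the interval-length term never
    vanishes for a finite interval, every interval keeps a positive loss and
    is eventually subdivided."""
    return weight * feature_loss + (1 - weight) * interval_length
```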

# Examples

## Line simplification loss

#### The line simplification loss is based on the inverse of Visvalingam’s algorithm.
Inspired by a method commonly employed in digital cartography for coastline simplification, we construct a loss function that does its reverse[@visvalingam1990douglas].
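
In Visvalingam-style line simplification, the interior point that forms the smallest-area triangle with its two neighbors is removed first. Reversing this idea, one can assign a large loss where that triangle area is large, so the sampler refines precisely where a simplification algorithm would be most reluctant to drop a point. A minimal sketch of this reversed criterion (our own illustration, not the package's implementation):

```python
def triangle_area(p1, p2, p3):
    # area of the triangle spanned by three points in the (x, y) plane
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

def line_simplification_loss(xs, ys):
    """Hypothetical per-interval loss: every interior point gets the area of
    the triangle it forms with its two neighbors, and each interval inherits
    the largest area among its endpoints.  Assumes xs is sorted."""
    pts = list(zip(xs, ys))
    areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
             for i in range(1, len(pts) - 1)]
    point_area = [0.0] + areas + [0.0]  # the two boundary points get zero area
    return [max(point_area[i], point_area[i + 1])
            for i in range(len(pts) - 1)]
```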