title:  'Adaptive: tools for adaptive parallel sampling of mathematical functions'
journal: 'PeerJ'
author:
- name: Tinkerer
  affiliation:
    - Kavli Institute of Nanoscience, Delft University of Technology, P.O. Box 4056, 2600 GA Delft, The Netherlands
  email: not_anton@antonakhmerov.org
abstract: |
  Large-scale computer simulations are time-consuming to run and often require sweeps over input parameters to obtain a qualitative understanding of the simulation output.
  These sweeps of parameters can potentially make the simulations prohibitively expensive.
  Therefore, when evaluating a function numerically, it is advantageous to sample it more densely in the interesting regions (called adaptive sampling) instead of evaluating it on a manually defined homogeneous grid.
  Such adaptive algorithms exist within the machine learning field.
  These methods can suggest a new point to calculate based on \textit{all} existing data at that time; however, this is an expensive operation.
  An alternative is to use local algorithms---in contrast to the previously mentioned global algorithms---which can suggest a new point based only on the data in the immediate vicinity of the new point.
  This approach works well, even when using hundreds of computers simultaneously because the point suggestion algorithm is cheap (fast) to evaluate.
  We provide a reference implementation in Python and show its performance.
acknowledgements: |
  We'd like to thank ...
contribution: |
  Bla

Introduction

Simulations are costly and often require sampling a region in parameter space.

In the computational sciences, one often performs costly simulations---represented by a function $f$---where a certain region in parameter space $X$ is sampled, mapping to a codomain $Y$: $f \colon X \to Y$. Frequently, the different points in $X$ can be calculated independently. Even though it is suboptimal, one usually resorts to sampling $X$ on a homogeneous grid because of its simple implementation.
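For reference, a minimal sketch of this homogeneous-grid baseline in Python could look as follows; the function `f` and the grid size are hypothetical stand-ins for a costly simulation:

```python
# Hypothetical baseline: evaluate a costly simulation f on a homogeneous
# (uniform) grid over the one-dimensional domain X = [-1, 1].
import numpy as np

def f(x):
    # cheap stand-in for an expensive simulation with a sharp feature near x = 0
    return x + np.exp(-x**2 / 0.01)

xs = np.linspace(-1, 1, 101)  # homogeneous grid: every point costs one run
ys = f(xs)
```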

Choosing new points based on existing data improves the simulation efficiency.

A better alternative, which improves the simulation efficiency, is to choose new, potentially interesting points in $X$ based on existing data [@gramacy2004parameter; @de1995adaptive; @castro2008active; @chen2017intelligent]. Bayesian optimization works well for high-cost simulations where one needs to find a minimum (or maximum) [@takhtaganov2018adaptive]. If the goal of the simulation is to approximate a continuous function with the fewest points, the continuity of the approximation is achieved by a greedy algorithm that samples mid-points of intervals with the largest Euclidean distance or curvature [@mathematica_adaptive]. Such a sampling strategy would trivially speed up many simulations. One of the most significant complications here is parallelizing this algorithm, because it requires a lot of bookkeeping and planning ahead.
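For concreteness, the greedy mid-point strategy can be sketched as follows; this is an illustrative sketch with hypothetical names, not the implementation described later:

```python
# Sketch of greedy sampling: repeatedly bisect the interval whose endpoints
# are farthest apart in the (x, y) plane (largest Euclidean distance).
import math

def greedy_sample(f, a, b, n_points):
    points = {a: f(a), b: f(b)}  # evaluated points, {x: y}
    for _ in range(n_points - 2):
        xs = sorted(points)
        # Euclidean distance spanned by each interval of neighboring points
        dists = [math.hypot(x2 - x1, points[x2] - points[x1])
                 for x1, x2 in zip(xs, xs[1:])]
        i = dists.index(max(dists))
        x_mid = (xs[i] + xs[i + 1]) / 2  # mid-point of the largest interval
        points[x_mid] = f(x_mid)
    return points
```

Note that this sequential loop needs the result of every evaluation before it can suggest the next point, which is exactly what makes parallelization non-trivial.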

We describe a class of algorithms relying on local criteria for sampling, which allow for easy parallelization and have a low overhead.

To allow for parallelization, the algorithm should be local, meaning that the information updates happen only in a region around the newly calculated point. Additionally, the algorithm should be fast in order to handle many parallel workers that calculate the function and request new points. A simple example is greedily optimizing the continuity of the sampling by selecting points in the largest gaps between the known function values. For a one-dimensional function (Fig. @fig:loss_1D) this means that one should (1) construct intervals containing neighboring data points, (2) calculate the Euclidean distance of each interval and assign it to the candidate point inside that interval, and (3) pick the candidate point with the largest Euclidean distance. In this paper, we describe a class of algorithms that rely on local criteria for sampling, such as the example above. Here we associate a local loss with each of the candidate points within an interval and choose the points with the largest loss. In the case of an integration algorithm, the loss could simply be an error estimate. The most significant advantage of these local algorithms is that they allow for easy parallelization and have a low computational overhead.
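The following sketch illustrates this bookkeeping for a one-dimensional function; the class and method names are hypothetical and only mirror the request/receive cycle described above, not the interface of the reference implementation:

```python
# Sketch of loss-based candidate selection: every interval between
# neighboring data points carries a local loss (here its Euclidean length),
# and the candidate points are the mid-points of the worst intervals.
import math

class LocalSampler1D:
    def __init__(self, points):
        self.data = dict(points)  # evaluated points, {x: y}

    def _interval_losses(self):
        xs = sorted(self.data)
        return {(x1, x2): math.hypot(x2 - x1, self.data[x2] - self.data[x1])
                for x1, x2 in zip(xs, xs[1:])}

    def ask(self, n):
        # hand out the mid-points of the n intervals with the largest loss,
        # e.g. one candidate per idle parallel worker
        losses = self._interval_losses()
        worst = sorted(losses, key=losses.get, reverse=True)[:n]
        return [(x1 + x2) / 2 for x1, x2 in worst]

    def tell(self, x, y):
        # store a finished result; in a full implementation only the two
        # intervals adjacent to x would need their losses recomputed,
        # which is what keeps the update local and cheap
        self.data[x] = y
```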

Visualization of a simple point-choosing algorithm for a black-box function (grey). The existing data points (green) $\{x_i, y_i\}_{i \in 1\dots4}$ and corresponding candidate points (red) in the middle of each interval. Each candidate point has a loss $L$ indicated by the size of the red dots. The candidate point with the largest loss will be chosen, which in this case is the one with $L_{1,2}$.

We provide a reference implementation, the Adaptive package, and demonstrate its performance.
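As an illustration, minimal use of the package follows its documented quickstart; the sketch below assumes that interface, and exact argument names may differ between versions:

```python
# Minimal usage sketch of the Adaptive package; argument names follow the
# documented quickstart but may differ between versions.
import math
import adaptive

def f(x):
    # cheap stand-in for a costly simulation
    return x + math.exp(-x**2 / 0.01)

learner = adaptive.Learner1D(f, bounds=(-1, 1))

# Blocking, single-core runner; adaptive.Runner can instead distribute the
# evaluations of f over many parallel workers.
adaptive.runner.simple(learner, goal=lambda l: l.loss() < 0.01)
```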