Even though it is suboptimal, one usually resorts to sampling $X$ on a homogeneous grid.
<!-- This should convey the point that it is advantageous to do this. -->
A better alternative that improves the simulation efficiency is to choose new, potentially interesting points in $X$ based on existing data [@gramacy2004parameter; @de1995adaptive; @castro2008active; @chen2017intelligent]. <!-- cite i.e., hydrodynamics-->
Bayesian optimization works well for high-cost simulations where one needs to find a minimum (or maximum) [@takhtaganov2018adaptive].
If the goal of the simulation is to approximate a continuous function with the fewest points, the continuity of the approximation is achieved by a greedy algorithm that samples mid-points of intervals with the largest Euclidean distance or curvature [@mathematica_adaptive], see Fig. @fig:algo.
Such a sampling strategy would trivially speed up many simulations.
One of the most significant complications here is to parallelize this algorithm, as it requires a lot of bookkeeping and planning ahead.
...
We start by calculating the two boundary points.
Two consecutive existing data points (black) $\{x_i, y_i\}$ define an interval.
Each interval has a loss associated with it that can be calculated from the points inside the interval $L_{i,i+1}(x_i, x_{i+1}, y_i, y_{i+1})$.
At each time step the interval with the largest loss is indicated (red), with its corresponding candidate point (green) picked in the middle of the interval.
The loss function in this example is the curvature loss.
](figures/algo.pdf){#fig:algo}
...
Additionally, the algorithm should also be fast in order to handle many parallel workers.
A simple example is greedily optimizing the continuity of the sampling by selecting new points in the largest gaps between the known function values, as in Fig. @fig:algo.
For a one-dimensional function with three points known (its boundary points and a point in the center), the following steps are repeated (a code sketch follows the list):
(1) keep all points $x$ sorted, where two consecutive points define an interval,
(2) calculate the distance for each interval $L_{1,2}=\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$,
(3) pick a new point $x_\textrm{new}$ in the middle of the largest interval, creating two new intervals around that point,
(4) calculate $f(x_\textrm{new})$,
(5) repeat the previous steps, without redoing calculations for unchanged intervals.
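As a minimal sketch of these steps (not the reference implementation), the Python function below bisects the interval with the largest Euclidean length; the names `f`, `a`, `b`, and `n_points` are illustrative, and for brevity it recomputes every interval length in each iteration instead of reusing the unchanged ones as step (5) prescribes.

```python
import math

def sample_1d(f, a, b, n_points):
    """Greedily sample f on [a, b] by bisecting the longest interval."""
    xs = [a, (a + b) / 2, b]          # boundary points plus the center point
    ys = [f(x) for x in xs]
    while len(xs) < n_points:
        # (2) Euclidean length of every interval between consecutive points
        lengths = [math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i])
                   for i in range(len(xs) - 1)]
        # (3) the midpoint of the interval with the largest length
        i = max(range(len(lengths)), key=lengths.__getitem__)
        x_new = (xs[i] + xs[i + 1]) / 2
        # (4) evaluate f and insert the point, keeping xs sorted as in (1)
        xs.insert(i + 1, x_new)
        ys.insert(i + 1, f(x_new))
    return xs, ys
```

For example, `sample_1d(math.sin, 0, 10, 100)` places more points where the sine is steep than where it is flat.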
In this paper, we describe a class of algorithms that rely on local criteria for sampling, such as in the example above.
Here we associate a *local loss* to each of the *candidate points* within an interval, and choose the points with the largest loss.
In the case of the integration algorithm, the loss is the error estimate.
The most significant advantage of these *local* algorithms is that they allow for easy parallelization and have a low computational overhead.
...
This loss will suggest to sample a point in the middle of an interval with the largest Euclidean distance.
A more complex loss function that also takes the first neighbouring intervals into account is one that adds more points where the second derivative (or curvature) is the highest.
Figure @fig:adaptive_vs_grid shows a comparison between a result using this loss and a function that is sampled on a grid.
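As an illustration only, a curvature-style interval loss can be sketched by combining the Euclidean length of the interval with the area of the triangle it spans with a neighbouring point (which vanishes when the three points are collinear); the function name and the equal weighting of the two terms are assumptions, not necessarily the loss used in the figure.

```python
import math

def curvature_loss(x_prev, y_prev, x_i, y_i, x_next, y_next):
    """Loss of the interval (x_i, x_next), using its left neighbour (x_prev, y_prev)."""
    length = math.hypot(x_next - x_i, y_next - y_i)
    # twice the area of the triangle through the three points;
    # zero when they are collinear, large where the curve bends sharply
    area = abs((x_i - x_prev) * (y_next - y_prev)
               - (x_next - x_prev) * (y_i - y_prev))
    return length + area
```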
#### With many points, due to the loss being local, parallel sampling incurs no additional cost.
<!-- Bas: the text below does not really describe what is written above, but is an essential part nonetheless -->
So far, the description of the general algorithm did not include parallelism.
The algorithm needs to be able to suggest multiple points at the same time and remember which points it has suggested.
When a new point $\bm{x}_\textrm{new}$ with the largest loss $L_\textrm{max}$ is suggested, the interval it belongs to splits up into $N$ new intervals (here $N$ depends on the dimensionality of the function $f$).
A temporary loss $L_\textrm{temp} = L_\textrm{max}/N$ is assigned to these newly created intervals until $f(\bm{x})$ is calculated and the temporary loss can be replaced by the actual loss $L \equiv L((\bm{x},\bm{y})_\textrm{new}, (\bm{x},\bm{y})_\textrm{neighbours})$ of these new intervals, where $L \ge L_\textrm{temp}$.
For a one-dimensional scalar function, this procedure is equivalent to temporarily using the function values of the neighbours of $x_\textrm{new}$ and assigning the interpolated value to $y_\textrm{new}$ until it is known.
When querying $n>1$ points, the above procedure simply repeats $n$ times.
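A hedged sketch of this bookkeeping for a one-dimensional function, where every interval splits into $N = 2$; here `intervals` maps an interval $(x_\textrm{left}, x_\textrm{right})$ to its (possibly temporary) loss, and all names are illustrative rather than the reference implementation's API.

```python
def suggest(intervals, n):
    """Suggest n new x values without waiting for any f(x) to be computed."""
    suggested = []
    for _ in range(n):
        # pick the interval with the largest (actual or temporary) loss
        (x_l, x_r), loss_max = max(intervals.items(), key=lambda kv: kv[1])
        x_new = (x_l + x_r) / 2
        suggested.append(x_new)
        # split into N = 2 sub-intervals carrying the temporary loss L_max / 2;
        # once f(x_new) arrives, these entries are replaced by the actual losses
        del intervals[(x_l, x_r)]
        intervals[(x_l, x_new)] = loss_max / 2
        intervals[(x_new, x_r)] = loss_max / 2
    return suggested
```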
#### In general local loss functions only have a logarithmic overhead.
Due to the local nature of the loss function, the asymptotic complexity is logarithmic.
This is because the losses per interval are stored in a sorted list.
When asking for a new candidate point, the top entry is picked in $\mathcal{O}(1)$.
The interval then splits into $N$ new intervals, as explained in the previous paragraph, and their losses have to be inserted into the sorted list again, which costs $\mathcal{O}(\log{n})$.
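As an illustrative sketch (not the package's internal data structure), the same asymptotics can be obtained with a binary heap instead of a sorted list: peeking at the worst interval is $\mathcal{O}(1)$ and every insertion is $\mathcal{O}(\log{n})$.

```python
import heapq

class LossQueue:
    """Intervals ordered by loss; heapq is a min-heap, so losses are negated."""

    def __init__(self):
        self._heap = []

    def add(self, interval, loss):
        heapq.heappush(self._heap, (-loss, interval))   # O(log n) insertion

    def peek_worst(self):
        loss, interval = self._heap[0]                   # O(1) lookup of the top entry
        return interval, -loss

    def pop_worst(self):
        loss, interval = heapq.heappop(self._heap)       # O(log n) removal
        return interval, -loss
```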
# Loss function design
#### A failure mode of such algorithms is sampling only a small neighbourhood of one point.