@@ -131,6 +131,30 @@ The local loss function values then serve as a criterion for choosing the next p
This means that, upon adding a new data point, only the intervals near that point need to have their loss values updated.
The amortized complexity of the point suggestion algorithm is, therefore, $\mathcal{O}(1)$.
The algorithm can be summarized as follows, where `f` is the function to evaluate, `loss` is the loss function, and `heap_push`, `heap_pop`, and `heap_max` are functions for manipulating a max-heap.
```
data $\gets$ empty_hashmap()
intervals $\gets$ empty_max_heap()
# evaluate the function at both ends of the domain [a, b]
data[a] $\gets$ f(a)
data[b] $\gets$ f(b)
l $\gets$ loss(a, b, data[a], data[b])
heap_push(intervals, (l, a, b))
while heap_max(intervals)[0] > $\epsilon$:
    # split the interval with the largest loss at its midpoint
    _, a, b $\gets$ heap_pop(intervals)
    m $\gets$ (a + b) / 2
    data[m] $\gets$ f(m)
    # compute the losses of the two new subintervals and push them
    l_left $\gets$ loss(a, m, data[a], data[m])
    l_right $\gets$ loss(m, b, data[m], data[b])
    heap_push(intervals, (l_left, a, m))
    heap_push(intervals, (l_right, m, b))
```
In the above, `loss` only gets the data associated with a single interval;
in order to support loss functions that rely on data from neighboring intervals, we would need to maintain a separate data structure that encodes the neighborhood information.
For example, if `data` were a binary tree storing `(x, f(x))` then we could query neighboring points in $\mathcal{O}(\log N)$ time.
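As a rough sketch of that idea (an illustration, not Adaptive's actual implementation), a sorted mapping such as `SortedDict` from the `sortedcontainers` package provides the neighbour lookup in logarithmic time; the helper name `neighbors` below is ours.
```
from sortedcontainers import SortedDict  # sorted mapping with O(log N) operations

def neighbors(data, x):
    """Return the stored x-values immediately to the left and right of x."""
    keys = data.keys()       # sorted view of the sampled x-values
    i = data.bisect_left(x)  # index at which x would be inserted
    left = keys[i - 1] if i > 0 else None
    right = keys[i] if i < len(keys) else None
    return left, right

data = SortedDict({0.0: 1.0, 0.5: 0.2, 1.0: -0.3})
print(neighbors(data, 0.75))  # (0.5, 1.0)
```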
#### As an example, the interpoint distance is a good loss function in one dimension.
An example of such a loss function for a one-dimensional function is the interpoint distance, i.e., the Euclidean distance between the two endpoints $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ of an interval.
This loss suggests sampling a point in the middle of the interval with the largest Euclidean distance, thereby ensuring the continuity of the sampled function.
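A minimal sketch of this loss, using the same `loss(a, b, f_a, f_b)` signature as the pseudocode above (the name `distance_loss` is ours), could be:
```
import math

def distance_loss(a, b, f_a, f_b):
    """Euclidean distance between the points (a, f(a)) and (b, f(b))."""
    return math.hypot(b - a, f_b - f_a)

# An interval over which the function changes rapidly gets a larger loss
# and is therefore subdivided first:
print(distance_loss(0.0, 0.1, 1.0, 1.0))  # 0.1  (flat segment)
print(distance_loss(0.1, 0.2, 1.0, 5.0))  # ~4.0 (steep segment)
```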
...
...
@@ -145,6 +169,44 @@ A temporary loss $L_\textrm{temp} = L_\textrm{max}/N$ is assigned to these newly
For a one-dimensional scalar function, this procedure is equivalent to temporarily assigning to $y_\textrm{new}$ the value interpolated from the function values of the neighbours of $x_\textrm{new}$, until the true value is known.
When querying $n>1$ points, the above procedure is repeated $n$ times.
This is illustrated in the following algorithm, where `parallel_map` takes a function and an array of inputs and evaluates the function on each input in parallel, and `n_per_round` is the number of parallel evaluations to perform at a time.
```
def get_next(data, intervals):
    # pop intervals until one is found whose loss is up to date,
    # or whose endpoints have not both been evaluated yet
    l, a, b, need_update $\gets$ heap_pop(intervals)
    while need_update:
        f_a $\gets$ data[a]
        f_b $\gets$ data[b]
        if f_a is None or f_b is None:
            # an endpoint is still being evaluated; keep the temporary loss
            break
        # both endpoints are known: recompute the real loss and reinsert
        l $\gets$ loss(a, b, f_a, f_b)
        heap_push(intervals, (l, a, b, False))
        l, a, b, need_update $\gets$ heap_pop(intervals)
    return (l, a, b)

data $\gets$ empty_hashmap()  # data[x] is None until f(x) has been evaluated
intervals $\gets$ empty_max_heap()
data[a] $\gets$ f(a)
data[b] $\gets$ f(b)
l $\gets$ loss(a, b, data[a], data[b])
heap_push(intervals, (l, a, b, False))
while heap_max(intervals)[0] > $\epsilon$:
    xs $\gets$ empty_array(n_per_round)
    for i in 0..n_per_round-1:
        l, a, b $\gets$ get_next(data, intervals)
        m $\gets$ (a + b) / 2
        xs[i] $\gets$ m
        # push both halves with a temporary loss, to be updated once f(m) is known
        heap_push(intervals, (l/2, a, m, True))
        heap_push(intervals, (l/2, m, b, True))
    fs $\gets$ parallel_map(f, xs)
    for i in 0..n_per_round-1:
        data[xs[i]] $\gets$ fs[i]
```
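The `parallel_map` above is left abstract; a minimal stand-in using only the Python standard library (an assumption about the execution backend, not necessarily what Adaptive uses) could be:
```
from concurrent.futures import ProcessPoolExecutor

def parallel_map(f, xs, max_workers=None):
    """Evaluate f on every element of xs in parallel, preserving input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(f, xs))
```
In practice one would pick `n_per_round` equal to the number of available workers, so that every round keeps all of them busy.
Note also that, for the interpoint-distance loss, pushing the halved loss `l/2` for both subintervals is exactly what one obtains by temporarily assigning the linearly interpolated value to the midpoint, which illustrates the equivalence with interpolation mentioned above (using the hypothetical `distance_loss` sketch from the previous section):
```
import math

a, b, f_a, f_b = 0.0, 1.0, 2.0, 5.0
l = distance_loss(a, b, f_a, f_b)      # loss of the parent interval
m, y_m = (a + b) / 2, (f_a + f_b) / 2  # midpoint with interpolated value
assert math.isclose(distance_loss(a, m, f_a, y_m), l / 2)
assert math.isclose(distance_loss(m, b, y_m, f_b), l / 2)
```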
#### In general, local loss functions only have a logarithmic overhead.
Due to the local nature of the loss function, the asymptotic complexity of suggesting a new point is logarithmic in the number of intervals.
This is because Adaptive stores the losses per interval in a max-heap