Resolve "(Learner1D) improve time complexity"
Closes #126 and #104.
Improves the time complexity from O(N^2) to O(N log N) (for learning a function, going from 0 to N points).
Speed is now ~1000 points per second on my computer (given the default loss) and remains more or less constant until 5000 points.
Obviously for small N (say 10 points) the old method was faster; for larger N (say N > 20) the new method is faster.
It does need to be said that according to my profiler, most of the time is spent on this line: be74ad62 (line 572).
Please do verify that the speedup is noticeable for you as well.
Also note that the complexity is now as follows:
Operation | Complexity @ master | Complexity @ branch |
---|---|---|
tell(x,y) | O(N) | O(log N) |
ask(1) | O(N) | O(log N) |
loss() | O(N) | O(1) |
ask(k) | O(N*k) | O(k * log N) |
run 1..N | O(N^2) | O(N log N) |
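As a rough, hypothetical sketch of where the log factor in the table comes from (this is not the branch's actual code): keeping interval losses in a max-heap alongside a dict makes updating a loss O(log N) and finding the worst interval O(log N) amortized, instead of an O(N) scan per `ask`. Stale heap entries are purged lazily.

```python
import heapq

class IntervalQueue:
    """Hypothetical sketch: a priority queue of intervals keyed by loss."""

    def __init__(self):
        self._heap = []   # entries: (-loss, interval); max-heap via negation
        self._loss = {}   # interval -> current loss (the source of truth)

    def update(self, interval, loss):
        # O(log N) push; an outdated entry for this interval may remain
        # in the heap, but it is skipped later because the dict disagrees.
        self._loss[interval] = loss
        heapq.heappush(self._heap, (-loss, interval))

    def remove(self, interval):
        # O(1): just forget it; the heap entry becomes stale.
        self._loss.pop(interval, None)

    def peek(self):
        # Return the interval with the largest loss, dropping stale entries.
        while self._heap:
            neg_loss, interval = self._heap[0]
            if self._loss.get(interval) == -neg_loss:
                return interval, -neg_loss
            heapq.heappop(self._heap)  # stale entry: discard and retry
        raise KeyError("empty queue")

q = IntervalQueue()
q.update((0.0, 0.5), 1.2)
q.update((0.5, 1.0), 3.4)
q.update((0.5, 1.0), 0.7)   # loss shrinks after a point is added
print(q.peek())             # -> ((0.0, 0.5), 1.2)
```

With N intervals, a full run of N `tell`/`ask` pairs then costs O(N log N) overall, matching the last row of the table.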
Merge request reports
Activity
mentioned in issue #125 (closed)
- Resolved by Jorn Hoofwijk
- Resolved by Jorn Hoofwijk
@Jorn can you post the benchmarking script so that we can verify, please?
instructions:
- `git checkout master`
- run every cell of the notebook (the last may be skipped)
- `git checkout 126--speed-up-learner1D`
- notebook > kernel > restart & run all
- see the difference
added 2 commits
- Resolved by Jorn Hoofwijk
```python
def _set_data(self, data):
    self.tell_many(*zip(*data.items()))


class LossManager:
```
Maybe it is comparable to a dict sorted by value (sortedcontainers does not implement such a class), and it also implements some behaviour specific to the losses, like choosing the width of the interval as the sorting criterion whenever the loss is infinite, and rounding the loss to 12 digits. The main benefit of using this over what I did in the LearnerND is that this code is more readable: if you take for granted that the LossManager works, you may treat it like a dict whenever you need a dict, and treat it like a PriorityQueue whenever you need the highest loss.
Also I think this will eliminate a lot of potential bugs. The public methods of the LossManager are constructed such that the dict and the SortedKeyList always contain the same data.
This saves a lot of trouble in debugging, as it is generally really hard to find a bug when the data in two structures should be identical, yet isn't.
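A minimal stdlib-only sketch of the idea described above (a hypothetical reconstruction, not the MR's code; the real LossManager uses sortedcontainers' SortedKeyList, and the `_sort_key` rules below are inferred from the description):

```python
import bisect
import math

def _sort_key(interval, loss):
    # Inferred behaviour: infinite losses sort above all finite ones and
    # are ordered among themselves by interval width; finite losses are
    # rounded to 12 digits to avoid float-noise reorderings.
    if math.isinf(loss):
        return (math.inf, interval[1] - interval[0])
    return (round(loss, 12), 0.0)

class LossManager:
    """interval -> loss mapping that can also be read like a priority queue."""

    def __init__(self):
        self._losses = {}   # dict view: interval -> loss
        self._sorted = []   # queue view: (sort_key, interval), ascending

    def __setitem__(self, interval, loss):
        # Every public mutation touches both structures, keeping them in sync.
        if interval in self._losses:
            del self[interval]
        self._losses[interval] = loss
        bisect.insort(self._sorted, (_sort_key(interval, loss), interval))

    def __delitem__(self, interval):
        loss = self._losses.pop(interval)
        self._sorted.remove((_sort_key(interval, loss), interval))

    def __getitem__(self, interval):
        return self._losses[interval]

    def peek(self):
        """Return the (interval, loss) pair with the largest loss."""
        _, interval = self._sorted[-1]
        return interval, self._losses[interval]

lm = LossManager()
lm[(0.0, 1.0)] = math.inf   # infinite loss: interval gets picked first
lm[(1.0, 2.0)] = 5.0
print(lm.peek())            # -> ((0.0, 1.0), inf)
```

Note that `bisect.insort` on a plain list costs O(N) per insertion from element shifting; the SortedKeyList in the actual branch avoids that, which is where the complexity win comes from.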
changed this line in version 6 of the diff
After your comment, I understood what happens there.
There is a package called sortedcollections (from the same author as sortedcontainers) that implements an ItemSortedDict. This is essentially what we needed. I reimplemented the LossManager using that, which makes it much simpler. However, looking at the code now, I think we might not even need a subclass, because the only relevant code is the finite_loss function (@jbweston and @Jorn, opinions?). I'll add another commit to suggest how it would look without the extra class.
added 2 commits
- Resolved by Bas Nijholt
(Here 'jorn' is 774c0e6e)
@basnijholt good work
If I understand correctly now it's ready to be merged?
added 1 commit
- 4b1641f6 - implement LossManager using sortedcollections.ItemSortedDict
enabled an automatic merge when the pipeline for 4b1641f6 succeeds
mentioned in commit 669cf077
mentioned in merge request !141 (merged)