Skip to content
Snippets Groups Projects

Resolve "(Learner1D) improve time complexity"

Merged Jorn Hoofwijk requested to merge 126--speed-up-learner1D into master
1 unresolved thread

Closes #126 (closed) #104 (closed)

improves time complexity from O(N^2) to O(N log N) (for learning a function from 0 to N points).

Speed is now ~1000 points per second on my computer (given the default loss) and remains more or less constant untill 5000 points

obviously for small N (say 10 points) the old method was faster, for larger N (say >20, the new method is faster).

It does need to be said that according to my profiler, most of the time is spend on this line: be74ad62 (line 572)

please do verify that the results are noticeable for you as well.

Also do note that the complexity now is as follows:

Operation Comblexity @ master Complexity @ branch
tell(x,y) O(N) O(log N)
ask(1) O(N) O(log N)
loss() O(N) O(1)
ask(k) O(N*k) O(k * log N)
run 1..N O(N^2) O(N log N)
Edited by Jorn Hoofwijk

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Jorn Hoofwijk changed the description

    changed the description

  • @Jorn can you post the benchmarking script so that we can veryify, please?

  • This MR introduces a nice clean abstraction over the loss calculation. Once we've addressed those 2 minor points this is good to go.

  • learner1d_complexity.ipynb

    instructions:

    • git checkout master
    • run every cell of the notebook (the last may be skipped)
    • git checkout 126--speed-up-learner1D
    • notebook > kernel > restart & run all
    • see difference
  • Jorn Hoofwijk added 2 commits

    added 2 commits

    • a881ee38 - uncomment __init__.py deletions
    • 3103d784 - add more_itertools to requirements

    Compare with previous version

  • Jorn Hoofwijk added 1 commit

    added 1 commit

    • 1b162cb6 - with dash instead of underscore

    Compare with previous version

  • 658 666
    659 667 def _set_data(self, data):
    660 668 self.tell_many(*zip(*data.items()))
    669
    670
    671 class LossManager:
    • Could you add some comments about what this LossManager is doing in general?

    • Maybe it is comparable to a dict sorted by value (sortedcontainers does not implement such class), and it also implements some behaviour specific to the losses. Like chosing the width of the interval as sorting criterion whenever the loss is infinite, and also rounding the loss to 12 digits.

      The main benefit of using this over what I did in the LearnerND is that this code is more readable: If you take for granted that the LossManager works, you may treat it like a dict whenever you need a dict, and treat it like a PriorityQueue whenever you need to have the highest loss.

      Also I think this will eliminate a lot of potential bugs. The public functions of the LossManager are constructed such that the dict and the SortedKeyList always have the same data.

      This then saves a lot of trouble in debugging, as it is generally really hard to find a bug when the data in two structures should be identical, yet isn't.

    • changed this line in version 6 of the diff

    • After your comment, I understood what happens there.

      There is a package called sortedcollections (from the same author as sortedcontainers) that implements a ItemSortedDict. This is essentially what we needed.

      I reimplemented the LossManager using that, which makes it much simpler. However looking at the code now, I think we might not even need a subclass, because the only relevant code is the finite_loss function (@jbweston and @Jorn opinion?).

      I'll add another commit to suggest how it would look without the extra class.

    • You are indeed correct, I had no clue that this ItemSortedDict existed, I think this indeed what we need

    • Please register or sign in to reply
  • Jorn Hoofwijk added 1 commit

    added 1 commit

    • 6100436c - add documentation to the lossmanager

    Compare with previous version

  • Jorn Hoofwijk added 1 commit

    added 1 commit

    Compare with previous version

  • Bas Nijholt added 2 commits

    added 2 commits

    • e3bd3609 - implement LossManager using sortedcollections.ItemSortedDict
    • b8c0269b - rename _losses -> losses

    Compare with previous version

  • Bas Nijholt added 3 commits

    added 3 commits

    • a8b69de9 - remove peek from the LossManager
    • d6b33c10 - remove discard from the LossManager
    • 02f80b5b - add sortedcollections to the dependencies

    Compare with previous version

  • Bas Nijholt added 1 commit

    added 1 commit

    • 400f929e - get rid of the more_itertools dependency

    Compare with previous version

  • Bas Nijholt added 2 commits

    added 2 commits

    • 4de6c2a3 - remove the LossManager
    • b3d315bf - change the LossManager into a function

    Compare with previous version

  • Joseph Weston
  • Other than that seems good

  • Bas Nijholt added 1 commit

    added 1 commit

    Compare with previous version

  • Bas Nijholt added 1 commit

    added 1 commit

    Compare with previous version

  • Just need to verify that the speedup is still there with sortedcollections.

  • Still good speed

    (Here 'jorn' is 774c0e6e)

  • @basnijholt good work

    If I understand correctly now it's ready to be merged?

  • Jorn Hoofwijk unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Bas Nijholt added 3 commits

    added 3 commits

    • c0012a9a - 1 commit from branch master
    • 877f65b2 - improve the time complexity of the Learner1D and implement a LossManager
    • 93df432d - implement LossManager using sortedcollections.ItemSortedDict

    Compare with previous version

  • Bas Nijholt added 1 commit

    added 1 commit

    • 4b1641f6 - implement LossManager using sortedcollections.ItemSortedDict

    Compare with previous version

  • Bas Nijholt enabled an automatic merge when the pipeline for 4b1641f6 succeeds

    enabled an automatic merge when the pipeline for 4b1641f6 succeeds

  • Yes, it will be merged when the tests succeed!

  • Bas Nijholt mentioned in commit 669cf077

    mentioned in commit 669cf077

  • merged

  • Jorn Hoofwijk mentioned in merge request !141 (merged)

    mentioned in merge request !141 (merged)

  • Please register or sign in to reply
    Loading