New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bag.top(bottom)Occurrences
#1471
Comments
Bag.top(bottom)Occurrences
Bag.top(bottom)Occurrences
the topOccurrences() method has an inefficient implementation that ought to be optimized. The current implementation sorts, as you said, making it O(n log n). The selection algorithm is O(n) and sorting would take O(k log k) in the absence of ties. |
I took a look at Python's |
That's only decent if I wonder how bad it would be to stuff the first k elements in a tree, then remove the worst one as new ones are added (the tree would have to be a list at each node to contain the equals). |
Sorry that was a typo, I forgot the logarithm. One implementation is to keep a min heap of fixed size k. Insert every (item, count) pair into it and remove the minimum whenever the heap size exceeds k. Inserting and removing in this heap is O(log k) since its fixed to size k. Handling ties is tricky in this implementation but can be done. Another implementation is to heapify all items into a max heap, and then pop the max until we're done. This approach makes it easier to handle ties, since they come at the end. Heapify is O(n). Popping the max from a max heap is O(log n). So overall, this approach is O(k log n). |
The API documentation says that these methods return
ListIterable<ObjectInPair<T>>
but it didn't say if the element at the beginning of the returned list would have the most or the least occurrence. I think it's a bit unclear there. The unit testMutableBagTestCase.topOccurences
says that the 1st element in the returned list has the most occurrences. Is there a unit test for this case (mentioned in the API doc) "In the event of a tie, all the items with the number of occurrences that match the occurrences of the last item will be returned"?The implementation for those
top(Bottom)Occurrences
, if I'm not wrong, is inAbstractBag.occurrencesSortingBy
. In this method, all elements in the bag are put into a list with occurrences. Then this list is sorted by some function. Then to get the top/bottom, e.g. 5, 5 first element in that list are returned. IMO, in some cases, this is inefficient: to get the top/bottom 5 occurrences, we have to sort the whole list.The text was updated successfully, but these errors were encountered: