Incidentally, you are not the first. Redis (an in-memory data structure server written in C) has had an indexable skiplist as part of its available data structures for over 2 years.

In contrast, the world of skiplists is sparsely documented. Aside from a homework problem in some course handouts and a brief mention (without source code) in the original skiplist paper, it seems to have been completely forgotten that skiplists can be augmented for O(log n) indexing.

Both data structures can solve the running median problem. The purported advantage of skiplists over binary search trees is that balancing is accomplished trivially through randomization. That makes the implementation comparatively simple.

The CLRS algorithms book has a chapter dedicated to order statistics for binary search trees. These trees support insertion, deletion, and finding the i-th item in sorted order, all in O(log n). It’s clear how to implement the sliding window median if you have one of these already built.

When you get a new item in the stream, delete the oldest item from the window, insert the new item into the window, and then ask for the median.
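Here is a rough sketch of that delete / insert / ask-for-the-median loop. I don't have an order-statistic tree handy, so a plain sorted Python list stands in for one (that substitution is my assumption, not what CLRS describes): the binary search is O(log n), but the list shifts elements on every insert and delete, so each step is really O(n). The names `sliding_median`, `window`, and `ordered` are made up for the example.

```python
from bisect import bisect_left, insort
from collections import deque


def sliding_median(stream, window_size):
    """Yield the median of each full window of `window_size` items."""
    window = deque()  # items in arrival order, oldest on the left
    ordered = []      # the same items kept in sorted order

    for item in stream:
        # Delete the oldest item once the window is full.
        if len(window) == window_size:
            oldest = window.popleft()
            del ordered[bisect_left(ordered, oldest)]

        # Insert the new item into both views of the window.
        window.append(item)
        insort(ordered, item)

        # Ask for the median by indexing into the sorted view.
        if len(window) == window_size:
            mid = window_size // 2
            if window_size % 2:
                yield ordered[mid]
            else:
                yield (ordered[mid - 1] + ordered[mid]) / 2
```

Swapping `ordered` for a structure with O(log n) insert, delete, and index-by-rank would keep the loop the same and give the bound quoted above.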

A lecture on order statistics for BSTs

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-seven

And a homework problem assigned (I think for undergrads?) in 2004 to add order statistics to skip lists: Problem 5-3 here:

http://courses.csail.mit.edu/6.046/spring04/handouts/ps5.pdf

Uh…

I think we can use 2 heaps (one for each side of the median), that’s a lot easier than skip lists.

When you add a value, just put it in the appropriate heap. If the heaps become unbalanced, so that the median is no longer at the top, just move a value from one heap to the other.

To delete a value from a heap, mark it as “deleted” in a table. When the item at the top of a heap is marked “deleted”, pop it (that’s a “lazy” heap).
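Something like the following, as a minimal sketch of the idea and not a tuned implementation. It assumes the values are numbers and that you only ever delete a value that is currently in the window. The class name `TwoHeapMedian`, its fields, and the helper `two_heap_sliding_median` are invented for the example; only `heapq` and `Counter` come from the standard library.

```python
import heapq
from collections import Counter


class TwoHeapMedian:
    def __init__(self):
        self.low = []             # max-heap (values stored negated): smaller half
        self.high = []            # min-heap: larger half
        self.deleted = Counter()  # value -> copies marked "deleted" but not yet popped
        self.low_size = 0         # number of live (not deleted) items in each heap
        self.high_size = 0

    def _prune(self, heap):
        # Pop items from the top for as long as the top is marked "deleted".
        sign = -1 if heap is self.low else 1
        while heap and self.deleted[sign * heap[0]] > 0:
            self.deleted[sign * heap[0]] -= 1
            heapq.heappop(heap)

    def _rebalance(self):
        # Keep low_size equal to high_size or exactly one larger.
        if self.low_size > self.high_size + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
            self.low_size -= 1
            self.high_size += 1
            self._prune(self.low)
        elif self.low_size < self.high_size:
            heapq.heappush(self.low, -heapq.heappop(self.high))
            self.low_size += 1
            self.high_size -= 1
            self._prune(self.high)

    def add(self, value):
        if not self.low or value <= -self.low[0]:
            heapq.heappush(self.low, -value)
            self.low_size += 1
        else:
            heapq.heappush(self.high, value)
            self.high_size += 1
        self._rebalance()

    def remove(self, value):
        # Assumption: `value` is currently live in one of the heaps.
        self.deleted[value] += 1
        if value <= -self.low[0]:
            self.low_size -= 1
            if value == -self.low[0]:
                self._prune(self.low)
        else:
            self.high_size -= 1
            if value == self.high[0]:
                self._prune(self.high)
        self._rebalance()

    def median(self):
        if self.low_size > self.high_size:
            return -self.low[0]
        return (-self.low[0] + self.high[0]) / 2


def two_heap_sliding_median(values, k):
    window, out = TwoHeapMedian(), []
    for i, v in enumerate(values):
        if i >= k:
            window.remove(values[i - k])  # drop the item leaving the window
        window.add(v)
        if i >= k - 1:
            out.append(window.median())
    return out
```

The live-size counters are there because the heaps still physically hold deleted items, so `len()` of a heap can't be used for balancing.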

It’s great for implementing ordered (indexed) sets where you would also like to retrieve elements by name. Again, sorting by index and sorting by name would give two different lists.
