<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Deep Thoughts by Raymond Hettinger</title>
	<atom:link href="http://rhettinger.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://rhettinger.wordpress.com</link>
	<description>Ruminations on Computers, Programming and Life</description>
	<lastBuildDate>Sat, 12 Jan 2013 03:49:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='rhettinger.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Deep Thoughts by Raymond Hettinger</title>
		<link>http://rhettinger.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://rhettinger.wordpress.com/osd.xml" title="Deep Thoughts by Raymond Hettinger" />
	<atom:link rel='hub' href='http://rhettinger.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Python’s super() considered super!</title>
		<link>http://rhettinger.wordpress.com/2011/05/26/super-considered-super/</link>
		<comments>http://rhettinger.wordpress.com/2011/05/26/super-considered-super/#comments</comments>
		<pubDate>Thu, 26 May 2011 16:15:42 +0000</pubDate>
		<dc:creator>rhettinger</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[Inheritance]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://rhettinger.wordpress.com/?p=134</guid>
		<description><![CDATA[If you aren’t wowed by Python’s super() builtin, chances are you don’t really know what it is capable of doing or how to use it effectively. Much has been written about super() and much of that writing has been a failure. This article seeks to improve on the situation by: providing practical use cases giving [...]]]></description>
				<content:encoded><![CDATA[<p>If you aren’t wowed by Python’s super() builtin, chances are you don’t really know what it is capable of doing or how to use it effectively.</p>
<p>Much has been written about super() and much of that writing has been a failure. This article seeks to improve on the situation by:</p>
<ul>
<li>providing practical use cases</li>
<li>giving a clear mental model of how it works</li>
<li>showing the tradecraft for getting it to work every time</li>
<li>giving concrete advice for building classes that use super()</li>
<li>favoring real examples over abstract ABCD <a href="http://en.wikipedia.org/wiki/Diamond_problem">diamond diagrams</a>.</li>
</ul>
<p>The examples for this post are available in both <a href="http://code.activestate.com/recipes/577721-how-to-use-super-effectively-python-27-version/">Python 2 syntax</a> and <a href="http://code.activestate.com/recipes/577720-how-to-use-super-effectively/">Python 3 syntax</a>.</p>
<p>Using Python 3 syntax, let’s start with a basic use case, a subclass for extending a method from one of the builtin classes:</p>
<pre>import logging

class LoggingDict(dict):
    def __setitem__(self, key, value):
        logging.info('Setting %r to %r' % (key, value))
        super().__setitem__(key, value)</pre>
<p>This class has all the same capabilities as its parent, <em>dict</em>, but it extends the __setitem__ method to make log entries whenever a key is updated. After making a log entry, the method uses super() to delegate the work for actually updating the dictionary with the key/value pair.</p>
<p>Before super() was introduced, we would have hardwired the call with <em>dict.__setitem__(self, key, value)</em>. However, super() is better because it is a computed indirect reference.</p>
<p>One benefit of indirection is that we don’t have to specify the delegate class by name. If you edit the source code to switch the base class to some other mapping, the super() reference will automatically follow. You have a single source of truth:</p>
<pre>class LoggingDict(SomeOtherMapping):            # new base class
    def __setitem__(self, key, value):
        logging.info('Setting %r to %r' % (key, value))
        super().__setitem__(key, value)         # no change needed</pre>
<p>In addition to isolating changes, there is another major benefit to computed indirection, one that may not be familiar to people coming from static languages. Since the indirection is computed at runtime, we have the freedom to influence the calculation so that the indirection will point to some other class.</p>
<p>The calculation depends on both the class where super is called and on the instance’s tree of ancestors. The first component, the class where super is called, is determined by the source code for that class. In our example, super() is called in the <em>LoggingDict.__setitem__</em> method. That component is fixed. The second and more interesting component is variable (we can create new subclasses with a rich tree of ancestors).</p>
<p>Let’s use this to our advantage to construct a logging ordered dictionary without modifying our existing classes:</p>
<pre>import collections

class LoggingOD(LoggingDict, collections.OrderedDict):
    pass</pre>
<p>The ancestor tree for our new class is: <em>LoggingOD</em>,<em> LoggingDict</em>,<em> OrderedDict</em>,<em> dict</em>,<em> object</em>. For our purposes, the important result is that <em>OrderedDict</em> was inserted after <em>LoggingDict</em> and before <em>dict</em>! This means that the super() call in <em>LoggingDict.__setitem__</em> now dispatches the key/value update to <em>OrderedDict</em> instead of <em>dict</em>.</p>
<p>Think about that for a moment. We did not alter the source code for <em>LoggingDict</em>. Instead we built a subclass whose only logic is to compose two existing classes and control their search order.</p>
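<p>To make the redirected dispatch concrete, here is a minimal, self-contained sketch. It assumes a module-level list named <em>log</em> as a stand-in for the logging module (that name is illustrative and not part of the recipe) so the effect is easy to inspect:</p>

```python
import collections

log = []    # hypothetical stand-in for the logging module

class LoggingDict(dict):
    def __setitem__(self, key, value):
        log.append((key, value))            # record the update
        super().__setitem__(key, value)     # delegate to the next class in the MRO

class LoggingOD(LoggingDict, collections.OrderedDict):
    pass

d = LoggingOD()
d['b'] = 1
d['a'] = 2
d.move_to_end('b')      # an OrderedDict-only method, available on the composed class

mro_names = [cls.__name__ for cls in LoggingOD.__mro__]
```

<p>Both assignments pass through <em>LoggingDict.__setitem__</em>, and the same instance still behaves like an <em>OrderedDict</em>, which is what the MRO in <em>mro_names</em> predicts.</p>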
<p>__________________________________________________________________________________________________________________</p>
<p style="text-align:center;"><strong>Search Order</strong></p>
<p>What I’ve been calling the search order or ancestor tree is officially known as the Method Resolution Order or MRO. It’s easy to view the MRO by printing the __mro__ attribute:</p>
<pre>&gt;&gt;&gt; from pprint import pprint
&gt;&gt;&gt; pprint(LoggingOD.__mro__)
(&lt;class '__main__.LoggingOD'&gt;,
 &lt;class '__main__.LoggingDict'&gt;,
 &lt;class 'collections.OrderedDict'&gt;,
 &lt;class 'dict'&gt;,
 &lt;class 'object'&gt;)</pre>
<p>If our goal is to create a subclass with an MRO to our liking, we need to know how it is calculated. The basics are simple. The sequence includes the class, its base classes, and the base classes of those bases and so on until reaching <em>object</em> which is the root class of all classes. The sequence is ordered so that a class always appears before its parents, and if there are multiple parents, they keep the same order as the tuple of base classes.</p>
<p>The MRO shown above is the one order that follows from those constraints:</p>
<ul>
<li>LoggingOD precedes its parents, LoggingDict and OrderedDict</li>
<li>LoggingDict precedes OrderedDict because LoggingOD.__bases__ is (LoggingDict, OrderedDict)</li>
<li>LoggingDict precedes its parent which is dict</li>
<li>OrderedDict precedes its parent which is dict</li>
<li>dict precedes its parent which is object</li>
</ul>
<p>The process of solving those constraints is known as linearization. There are a number of good papers on the subject, but to create subclasses with an MRO to our liking, we only need to know the two constraints: children precede their parents and the order of appearance in <em>__bases__</em> is respected.</p>
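<p>The two constraints can be checked mechanically. The sketch below is only an illustration: <em>follows_constraints</em> is a hypothetical helper name, and it verifies an MRO that Python has already computed rather than performing the linearization itself. It also shows that Python refuses to create a class whose bases make the constraints contradictory:</p>

```python
import collections

class LoggingDict(dict):
    pass

class LoggingOD(LoggingDict, collections.OrderedDict):
    pass

def follows_constraints(cls):
    'Check the two ordering rules against the MRO that Python computed.'
    position = {c: i for i, c in enumerate(cls.__mro__)}
    for c in cls.__mro__:
        bases = c.__bases__
        # Rule 1: a class comes before each of its parents.
        if any(position[c] > position[base] for base in bases):
            return False
        # Rule 2: parents keep the order given in __bases__.
        if sorted(bases, key=position.get) != list(bases):
            return False
    return True

# Listing dict ahead of its own subclass contradicts rule 1,
# so Python cannot build a consistent MRO and raises TypeError.
try:
    class Impossible(dict, LoggingDict):
        pass
    conflict_detected = False
except TypeError:
    conflict_detected = True
```

<p>Here <em>follows_constraints(LoggingOD)</em> returns True, and <em>conflict_detected</em> ends up True because no ordering can place <em>dict</em> both before and after <em>LoggingDict</em>.</p>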
<p>__________________________________________________________________________________________________________________</p>
<p style="text-align:center;"><strong>Practical Advice</strong></p>
<p>super() is in the business of delegating method calls to some class in the instance’s ancestor tree. For reorderable method calls to work, the classes need to be designed cooperatively. This presents three easily solved practical issues:</p>
<ul>
<li>the method being called by super() needs to exist</li>
<li>the caller and callee need to have a matching argument signature</li>
<li>and every occurrence of the method needs to use super()</li>
</ul>
<p>1) Let’s first look at strategies for getting the caller&#8217;s arguments to match the signature of the called method. This is a little more challenging than traditional method calls where the callee is known in advance. With super(), the callee is not known at the time a class is written (because a subclass written later may introduce new classes into the MRO).</p>
<p>One approach is to stick with a fixed signature using positional arguments. This works well with methods like __setitem__ which have a fixed signature of two arguments, a key and a value. This technique is shown in the <em>LoggingDict</em> example where __setitem__ has the same signature in both <em>LoggingDict</em> and <em>dict</em>.</p>
<p>A more flexible approach is to have every method in the ancestor tree cooperatively designed to accept keyword arguments and a keyword-arguments dictionary, to remove any arguments that it needs, and to forward the remaining arguments using **kwds, eventually leaving the dictionary empty for the final call in the chain.</p>
<p>Each level strips-off the keyword arguments that it needs so that the final empty dict can be sent to a method that expects no arguments at all (for example, <em>object.__init__</em> expects zero arguments):</p>
<pre>class Shape:
    def __init__(self, shapename, **kwds):
        self.shapename = shapename
        super().__init__(**kwds)        

class ColoredShape(Shape):
    def __init__(self, color, **kwds):
        self.color = color
        super().__init__(**kwds)

cs = ColoredShape(color='red', shapename='circle')</pre>
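<p>One consequence of this design is worth seeing in action. In the sketch below, <em>size</em> is a made-up keyword that no class in the chain consumes, so it is still in the dictionary when the call reaches <em>object.__init__</em>, which rejects it:</p>

```python
class Shape:
    def __init__(self, shapename, **kwds):
        self.shapename = shapename
        super().__init__(**kwds)

class ColoredShape(Shape):
    def __init__(self, color, **kwds):
        self.color = color
        super().__init__(**kwds)

# Every keyword is consumed by some class, so the final call gets no arguments.
cs = ColoredShape(color='red', shapename='circle')

# 'size' is a hypothetical keyword that nobody strips off.
try:
    ColoredShape(color='red', shapename='circle', size=3)
    leftover_rejected = False
except TypeError:
    leftover_rejected = True
```

<p>The TypeError acts as a useful check: a misspelled or unsupported keyword fails loudly instead of being silently ignored.</p>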
<p>2) Having looked at strategies for getting the caller/callee argument patterns to match, let’s now look at how to make sure the target method exists.</p>
<p>The above example shows the simplest case. We know that <em>object</em> has an __init__ method and that <em>object</em> is always the last class in the MRO chain, so any sequence of calls to <em>super().__init__</em> is guaranteed to end with a call to the <em>object.__init__</em> method. In other words, the target of the super() call is guaranteed to exist and won’t fail with an <em>AttributeError</em>.</p>
<p>For cases where <em>object</em> doesn’t have the method of interest (a draw() method for example), we need to write a root class that is guaranteed to be called before <em>object</em>. The responsibility of the root class is simply to eat the method call without making a forwarding call using super().</p>
<p><em>Root.draw</em> can also employ <a href="http://en.wikipedia.org/wiki/Defensive_programming">defensive programming</a> using an assertion to ensure it isn&#8217;t masking some other draw() method later in the chain.  This could happen if a subclass erroneously incorporates a class that has a draw() method but doesn&#8217;t inherit from <em>Root</em>:</p>
<pre>class Root:
    def draw(self):
        # the delegation chain stops here
        assert not hasattr(super(), 'draw')

class Shape(Root):
    def __init__(self, shapename, **kwds):
        self.shapename = shapename
        super().__init__(**kwds)
    def draw(self):
        print('Drawing.  Setting shape to:', self.shapename)
        super().draw()

class ColoredShape(Shape):
    def __init__(self, color, **kwds):
        self.color = color
        super().__init__(**kwds)
    def draw(self):
        print('Drawing.  Setting color to:', self.color)
        super().draw()

cs = ColoredShape(color='blue', shapename='square')
cs.draw()</pre>
<p>If subclasses want to inject other classes into the MRO, those other classes also need to inherit from <em>Root</em> so that no path for calling draw() can reach <em>object</em> without having been stopped by <em>Root.draw</em>. This should be clearly documented so that someone writing new cooperating classes will know to subclass from <em>Root</em>. This restriction is not much different than Python’s own requirement that all new exceptions must inherit from <em>BaseException</em>.</p>
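<p>As a rough illustration of what the assertion in <em>Root.draw</em> protects against, the sketch below introduces a hypothetical class named <em>Rogue</em> that has a draw() method but does not inherit from <em>Root</em>. The cut-down <em>Shape</em> here keeps only the draw() method:</p>

```python
class Root:
    def draw(self):
        # the delegation chain stops here
        assert not hasattr(super(), 'draw')

class Shape(Root):
    def draw(self):
        super().draw()

class Rogue:
    'Hypothetical third-party class: has draw() but does not inherit from Root.'
    def draw(self):
        pass

class Mixed(Shape, Rogue):
    pass

Shape().draw()      # fine: nothing after Root defines draw()

# The MRO is Mixed, Shape, Root, Rogue, object, so Root.draw would
# silently hide Rogue.draw if the assertion were not there.
try:
    Mixed().draw()
    masked_draw_caught = False
except AssertionError:
    masked_draw_caught = True
```

<p>Keep in mind that assert statements are skipped when Python runs with the -O option, so this is a development-time check rather than a runtime guarantee.</p>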
<p>3) The techniques shown above assure that super() calls a method that is known to exist and that the signature will be correct; however, we’re still relying on super() being called at each step so that the chain of delegation continues unbroken. This is easy to achieve if we’re designing the classes cooperatively – just add a super() call to every method in the chain.</p>
<p>The three techniques listed above provide the means to design cooperative classes that can be composed or reordered by subclasses.</p>
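<p>To see why the third rule matters, consider this small sketch. The class names and the <em>trace</em> parameter are invented for the illustration; each draw() appends its class name to the list so the path of the delegation chain is visible:</p>

```python
class Root:
    def draw(self, trace):
        trace.append('Root')            # end of the chain

class Shape(Root):
    def draw(self, trace):
        trace.append('Shape')
        super().draw(trace)

class Border(Root):
    def draw(self, trace):
        trace.append('Border')
        super().draw(trace)

class Selfish(Root):
    def draw(self, trace):
        trace.append('Selfish')         # forgets to call super().draw()

class Cooperative(Shape, Border):
    pass

class Broken(Shape, Selfish, Border):
    pass

good, bad = [], []
Cooperative().draw(good)    # visits Shape, Border, Root
Broken().draw(bad)          # stops at Selfish; Border and Root never run
```

<p>A single method that omits the super() call cuts off every class that follows it in the MRO, even though those classes were written correctly.</p>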
<p>__________________________________________________________________________________________________________________</p>
<p style="text-align:center;"><strong>How to Incorporate a Non-cooperative Class</strong></p>
<p>Occasionally, a subclass may want to use cooperative multiple inheritance techniques with a third-party class that wasn’t designed for it (perhaps its method of interest doesn’t use super() or perhaps the class doesn’t inherit from the root class). This situation is easily remedied by creating an <a href="http://en.wikipedia.org/wiki/Adapter_pattern">adapter class</a> that plays by the rules.</p>
<p>For example, the following <em>Moveable</em> class does not make super() calls, and it has an __init__() signature that is incompatible with <em>object.__init__</em>, and it does not inherit from <em>Root</em>:</p>
<pre>class Moveable:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def draw(self):
        print('Drawing at position:', self.x, self.y)</pre>
<p>If we want to use this class with our cooperatively designed <em>ColoredShape</em> hierarchy, we need to make an adapter with the requisite super() calls:</p>
<pre>class MoveableAdapter(Root):
    def __init__(self, x, y, **kwds):
        self.movable = Moveable(x, y)
        super().__init__(**kwds)
    def draw(self):
        self.movable.draw()
        super().draw()

class MovableColoredShape(ColoredShape, MoveableAdapter):
    pass

MovableColoredShape(color='red', shapename='triangle',
                    x=10, y=20).draw()</pre>
<p>__________________________________________________________________________________________________________________</p>
<p style="text-align:center;"><strong>Complete Example – Just for Fun</strong></p>
<p>In Python 2.7 and 3.2, the collections module has both a <em>Counter</em> class and an <em>OrderedDict</em> class. Those classes are easily composed to make an <em>OrderedCounter</em>:</p>
<pre>from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
     'Counter that remembers the order elements are first seen'
     def __repr__(self):
         return '%s(%r)' % (self.__class__.__name__,
                            OrderedDict(self))
     def __reduce__(self):
         return self.__class__, (OrderedDict(self),)

oc = OrderedCounter('abracadabra')</pre>
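<p>A short usage sketch, repeating the class so that it runs on its own, shows the two behaviors combined: counts come from <em>Counter</em> and first-seen ordering comes from <em>OrderedDict</em>:</p>

```python
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first seen'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__,
                           OrderedDict(self))
    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

oc = OrderedCounter('abracadabra')
pairs = list(oc.items())    # letters in first-seen order with their counts

oc.update('zebra')          # existing keys keep their place; new keys go to the end
keys_after_update = list(oc)
```

<p>After the update, the letters z and e appear at the end of the ordering, while the count for a rises from 5 to 6 without changing its position.</p>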
<p>__________________________________________________________________________________________________________________</p>
<p style="text-align:center;"><strong>Notes and References</strong></p>
<p><strong>*</strong> When subclassing a builtin such as dict(), it is often necessary to override or extend multiple methods at a time. In the above examples, the __setitem__ extension isn’t used by other methods such as <em>dict.update</em>, so it may be necessary to extend those also. This requirement isn’t unique to super(); rather, it arises whenever builtins are subclassed.</p>
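<p>A quick sketch of that caveat follows. The list named <em>log</em> and the class <em>FullLoggingDict</em> are illustrative names, and the update() override shown is just one possible way to route bulk updates through __setitem__:</p>

```python
log = []    # hypothetical stand-in for the logging module

class LoggingDict(dict):
    def __setitem__(self, key, value):
        log.append(key)
        super().__setitem__(key, value)

d = LoggingDict()
d['x'] = 1          # logged
d.update(y=2)       # not logged: dict.update bypasses our __setitem__
log_after_plain_update = list(log)

class FullLoggingDict(LoggingDict):
    def update(self, *args, **kwds):
        # Funnel every pair through __setitem__ so it gets logged too.
        for key, value in dict(*args, **kwds).items():
            self[key] = value

f = FullLoggingDict()
f.update(y=2, z=3)  # both keys are logged now
```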
<p><strong>*</strong> If a class relies on one parent class preceding another (for example, <em>LoggingOD</em> depends on <em>LoggingDict</em> coming before <em>OrderedDict</em> which comes before <em>dict</em>), it is easy to add assertions to validate and document the intended method resolution order:</p>
<pre>position = LoggingOD.__mro__.index
assert position(LoggingDict) &lt; position(OrderedDict)
assert position(OrderedDict) &lt; position(dict)</pre>
<p><strong>*</strong> Good write-ups for linearization algorithms can be found at <a href="http://www.python.org/download/releases/2.3/mro/">Python MRO documentation</a> and at <a href="http://en.wikipedia.org/wiki/C3_linearization">Wikipedia entry for C3 Linearization</a>.</p>
<p><strong>*</strong> The <a href="http://en.wikipedia.org/wiki/Dylan_(programming_language)">Dylan programming language</a> has a <em>next-method</em> construct that works like Python&#8217;s super(). See <a href="http://www.opendylan.org/books/dpg/db_347.html">Dylan&#8217;s class docs</a> for a brief write-up of how it behaves.</p>
<p><strong>*</strong> The Python 3 version of super() is used in this post. The full working source code can be found at:  <a href="http://code.activestate.com/recipes/577720-how-to-use-super-effectively/">Recipe 577720</a>. The Python 2 syntax differs in that the <em>type</em> and <em>object</em> arguments to super() are explicit rather than implicit. Also, the Python 2 version of super() only works with new-style classes (those that explicitly inherit from <em>object</em> or other builtin type). The full working source code using Python 2 syntax is at <a href="http://code.activestate.com/recipes/577721-how-to-use-super-effectively-python-27-version/">Recipe 577721</a>.<br />
__________________________________________________________________________________________________________________</p>
<p style="text-align:center;"><strong>Acknowledgements</strong></p>
<p style="text-align:left;">Several Pythonistas did a pre-publication review of this article.  Their comments helped improve it quite a bit.</p>
<p style="text-align:left;">They are:  Laura Creighton, Alex Gaynor, Philip Jenvey, Brian Curtin, David Beazley, Chris Angelico, Jim Baker, Ethan Furman, and Michael Foord.  Thanks one and all.</p>
]]></content:encoded>
			<wfw:commentRss>http://rhettinger.wordpress.com/2011/05/26/super-considered-super/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/bc2071afd499daef001e75e14d7f9cce?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rhettinger</media:title>
		</media:content>
	</item>
		<item>
		<title>Open Source Challenge: Open Your Source, More</title>
		<link>http://rhettinger.wordpress.com/2011/01/28/open-your-source-more/</link>
		<comments>http://rhettinger.wordpress.com/2011/01/28/open-your-source-more/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 13:15:59 +0000</pubDate>
		<dc:creator>rhettinger</dc:creator>
				<category><![CDATA[Documentation]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://rhettinger.wordpress.com/?p=40</guid>
		<description><![CDATA[We all agree, open source is a good thing. It is posited that many eyeballs make all bugs shallow. With crowd sourcing, your users become contributors. And even seasoned programmers benefit from time spent reading code. Further, code can serve as an adjunct to documentation. If you really want to know what software is doing, [...]]]></description>
				<content:encoded><![CDATA[<p><strong>We all agree, open source is a good thing</strong>. It is posited that <a title="Linus' Law" href="http://en.wikipedia.org/wiki/Linus'_Law" target="_blank">many eyeballs make all bugs shallow</a>. With <a title="Wikipedia: Crowd Sourcing" href="http://en.wikipedia.org/wiki/Crowdsourcing" target="_blank">crowd sourcing</a>, your users become contributors. And even seasoned programmers benefit from time spent reading code. Further, code can serve as an adjunct to documentation. If you really want to know what software is doing, read the source.</p>
<p>Once upon a time in Python land, it seemed everyone had the standard library source at their fingertips. Newbies would read the source to learn from programming luminaries like <a title="Tim Peters in a Nutshell" href="http://everything2.com/title/Tim+Peters" target="_blank">Tim Peters</a> and <a title="Fredrik Lundh's Site" href="http://effbot.org" target="_blank">Fredrik Lundh</a>. Respondents on the famously friendly <a title="Newsgroup: comp.lang.python" href="http://groups.google.com/group/comp.lang.python/topics" target="_blank">Python newsgroup</a> would answer questions by referring to the source, and the phrase &#8220;<a title="History of &quot;Use the Source Luke&quot;" href="http://encyclopedia2.thefreedictionary.com/Use+the+Source+Luke" target="_blank">Use the Source Luke</a>&#8221; was a familiar refrain.</p>
<p>Times have changed. <strong>While core-developers have the source at their fingertips, other developers seem to have lost touch with the source.</strong> People get to know code in their own projects, but rarely peer into other projects.</p>
<p>I&#8217;ve worked in several Python shops, have done consulting work with first-rate programmers, and discussed programming at meet-ups and conventions. I&#8217;m disappointed to report that <strong>it is becoming increasingly rare to look at another project&#8217;s source code</strong>, even among top-notch programmers.</p>
<p>At first, I wanted to blame the practice of tool <a title="Define: Mashup" href="http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)">mashups</a>. If you&#8217;re furiously grafting a dozen tools together, who has time to study them in depth? But then, that&#8217;s not true; when the docs don&#8217;t answer your questions,<strong> it&#8217;s not a waste of time to examine the relevant source code, provided you know exactly where to look.</strong></p>
<p>Why don&#8217;t people look for answers in the code? <strong>I place the blame on huge code repositories and on installers that move the code out-of-sight and out-of-mind.</strong></p>
<p>A fresh checkout of Python 3.2 has 216 sub-directories with almost 478,000 lines of Python and 398,000 lines of C.<a href="#footnote1">¹</a> I&#8217;ve worked with that code extensively for almost a decade, yet every day I find something new. Unless you know where to look, <strong>searching the source for an answer can be a time-consuming intellectual investment</strong>.</p>
<p>Repository labyrinths are only half the problem. The rest of the blame lies with installers. They <a title="Sweeping generalization" href="http://www.fallacydetective.com/news/read/sweeping-generalization" target="_blank">all</a> treat human-readable scripts like they were binaries and <strong>tuck the code away in a dark corner</strong>. On my Mac, the source for the <em>heapq</em> module is stored at <em>/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/heapq.py</em>. Likewise, the <a title="Python's PIP installer" href="http://pip.openplans.org/" target="_blank">pip installer</a> puts the source for <a title="Mercurial Distributed Version Control System" href="http://mercurial.selenic.com/" target="_blank">Mercurial</a> in <em>/Library/Python/2.6/site-packages/hgext</em>. <strong>News flash</strong>: no one is going to view this code.</p>
<p>Large code bases and obscure installation paths are a fact of life, <strong>but there is something we can do</strong> to reconnect our users with the source code. Starting with Python 3.2, I&#8217;ve gone through the module documentation and added <strong>direct links to the relevant source code</strong>.</p>
<p>For example, in the <a title="Documentation for Python's heapq module" href="http://docs.python.org/dev/library/heapq.html#module-heapq" target="_blank">heapq documentation</a>, the first line under the heading now provides a source link:</p>
<ul>
<li><strong>Source code</strong>: <a title="Source Code for the heapq module" href="http://svn.python.org/view/python/branches/py3k/Lib/heapq.py?view=markup" target="_blank">Lib/heapq.py</a></li>
</ul>
<p>This idea is not new. The <a title="Documentation for Go's heap package" href="http://golang.org/pkg/container/heap/" target="_blank">heap package documentation</a> for the Go Programming Language also starts with a source link:</p>
<ul>
<li><strong>Package files</strong>: <a title="Source code for Go's heap package" href="http://golang.org/src/pkg/container/heap/heap.go" target="_blank">heap.go</a></li>
</ul>
<p>There you have it &#8212; direct, convenient and relevant navigation into a large code base. Perhaps users will start reading code again. Maybe you&#8217;ll receive better bug reports, perhaps accompanied by code patches and documentation improvements. Maybe someone will read your code and become a better programmer. Maybe you&#8217;ll be inspired to write better code.</p>
<p><strong>My challenge to you open-sourcers is to open your source, more.</strong> Provide direct links from the docs. It is not sufficient to expose your repository as one big ball of mud. Make it easy to go right to the heart of an issue.</p>
<hr />
<blockquote><p>&#8220;Publishing programs is a healthy habit. Every program I&#8217;ve written knowing it was to be published was improved by that knowledge. I think more clearly when I&#8217;m writing for an audience, and find it helps to pretend there always is one.&#8221; <em>&#8211; Rob Pike (quoted from Programming Pearls)</em></p></blockquote>
<blockquote><p>&#8220;I had known for a long time that the programs I construct for publication in a book, or the programs that I construct in front of a class, have tended to be comparatively free of errors, because I am forced to clarify my thoughts as I do the programming.&#8221; <em>&#8211; Donald Knuth (quoted from Literate Programming)</em></p></blockquote>
<hr />
<p><strong>My challenge to you and hope for the future:</strong></p>
<ul>
<li>A universal practice of <strong>docs linking back to source code</strong></li>
<li> Users getting reacquainted with the sources of their tools</li>
<li> Programmers writing code as if they expected it to be read</li>
</ul>
<hr />
<h5>Footnote:</h5>
<p><a name="footnote1">[1]</a> Sizing the Python 3.2 repository:</p>
<pre>$ svn co http://svn.python.org/projects/python/branches/py3k tmp
$ cd tmp
$ find . -type 'd' | grep -v 'svn' | wc -l
     216
$ wc -l $(find . -name "*.py" | egrep -v 'svn|build') | tail -1
  477868 total
$ wc -l $(find . -name "*.[ch]" | egrep -v 'svn|build') | tail -1
  398407 total</pre>
]]></content:encoded>
			<wfw:commentRss>http://rhettinger.wordpress.com/2011/01/28/open-your-source-more/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/bc2071afd499daef001e75e14d7f9cce?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rhettinger</media:title>
		</media:content>
	</item>
		<item>
		<title>Regaining Lost Knowledge</title>
		<link>http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/</link>
		<comments>http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 23:28:30 +0000</pubDate>
		<dc:creator>rhettinger</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Lost Knowledge]]></category>
		<category><![CDATA[Running Median]]></category>
		<category><![CDATA[Skiplists]]></category>
		<category><![CDATA[Wikipedia]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Rediscovery of Indexable Skiplists in the solution of the Running Median problem.]]></description>
				<content:encoded><![CDATA[<p>A <a href="http://groups.google.com/group/comp.lang.python/browse_frm/thread/d0e011c87174c2d0">recent Python newsgroup query</a> asked for an efficient solution to the problem of computing a running median as a large sliding window advances over a stream of data.</p>
<p>One category of replies can be classified as <strong>clever</strong>.  The respondents used their innate intelligence and knowledge of Python for a fresh look at the problem.  Their solutions focused on the fact that the position of the median doesn&#8217;t move much between successive updates.  Unfortunately, these solutions were catastrophically slow for large data windows.</p>
<p>Another category of reply relied on <strong>education</strong>.  A respondent remembered that <a href="http://en.wikipedia.org/wiki/Selection_algorithm">QuickSelect</a> is a fast <em>O(n)</em> way of finding a median in unsorted data.  I responded with an <a href="http://code.activestate.com/recipes/269554">ASPN recipe implementing QuickSelect</a> (written by yours truly).  These posts represented progress, a triumph of education over cleverness, but even that improved solution was unusably slow for large window sizes.</p>
<p>A more promising type of reply relied on <strong>research</strong>.  Surely, this problem had been solved before. Indeed, there is a published paper: <a href="http://www.ligo.caltech.edu/docs/T/T030168-00.pdf">Efficient Algorithm for Computing a Running Median by Soumya D. Mohanty</a> with an <em>O(sqrt(n))</em> solution.  Score one for science!</p>
<p>However, that solution was trumped by respondents who characterized the solution <strong>mathematically</strong>, “the obvious way to compute a running median involves a tree structure so you can quickly insert and delete elements, and find the median. That would be asymptotically <em>O(log n)</em> but messy to implement.” Fortunately, such an implementation exists using the <a href="http://pypi.python.org/pypi/blist/0.9.4">blist</a> Python extension.  Alas, we had a good solution but not a portable one.  Without the extension module, the B+ tree structure is non-trivial to implement.</p>
<p>When I thought about the problem, the mathematical characterization suggested data structures that maintained sorted data with <em>O(log n)</em> updates, and previous education indicated a <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.9211&amp;rep=rep1&amp;type=pdf">skiplist</a> would fit the bill, but it took cleverness to discover that indexing the skiplist to find the median could be reduced to <em>O(log n)</em> time by adding link widths to the structure.  This thinking led to <a href="http://code.activestate.com/recipes/576930/">my solution</a> which is easily portable across languages and scales well to very large window sizes.</p>
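<p>For readers who want to experiment with the problem itself, here is a deliberately simple baseline. It is not the indexable skiplist from the linked recipe: it keeps the window in a plain sorted list using the bisect module, so each update costs <em>O(n)</em> because of list insertion and deletion. The function name and the choice to report the upper middle element for even window sizes are assumptions made for this sketch:</p>

```python
from bisect import bisect_left, insort
from collections import deque

def running_median(data, window_size):
    'Yield the median of each full window as it slides over data.'
    window = deque()    # values in arrival order
    ordered = []        # the same values, kept sorted
    for value in data:
        window.append(value)
        insort(ordered, value)
        if len(window) > window_size:
            oldest = window.popleft()
            del ordered[bisect_left(ordered, oldest)]   # drop the expired value
        if len(window) == window_size:
            yield ordered[window_size // 2]             # middle of the sorted window

medians = list(running_median([5, 1, 3, 2, 4], 3))     # [3, 2, 3]
```

<p>The structure is the same one the mathematical characterization calls for: keep the window sorted and index into the middle. A faster solution only needs to replace the sorted list with a container that supports insertion, deletion and indexing in <em>O(log n)</em> time.</p>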
<p>Had I discovered something new under the sun?  Yes and no.</p>
<p>Yes, as far as I can tell the idea of using an indexable skiplist to solve the running median problem in <em>O(log n)</em> time had never been presented before anywhere else.  The best published solution was Mohanty&#8217;s <em>O(sqrt(n))</em> solution. Score one for combining <strong>mathematical characterization</strong> with <strong>education</strong> and <strong>cleverness</strong>.</p>
<p>And no, the big inspiration of figuring out how to make a skiplist indexable was not a new result.  Score a big failure for <strong>research</strong>.  Everywhere I had looked for skiplist resources, only the basics were presented (insertion and deletion in <em>O(log n)</em> time).  No resource mentioned indexable skiplists.  The <a href="http://eprints.kfupm.edu.sa/21076/">previous work on the problem</a> had effectively been <strong>lost</strong>.  An entire generation of programmers was learning about skiplists but not being taught that they could be made efficiently indexable.</p>
<p>To help the world regain this lost knowledge, I updated the <a href="http://en.wikipedia.org/wiki/Skip_list">Wikipedia entry for skiplists</a> to show how to make them indexable with my Python recipe and I&#8217;ve added a link to Pugh&#8217;s earlier research on the problem.</p>
<p>Will that Wikipedia entry really solve the <strong>problem of lost knowledge</strong>?  The <a href="http://stats.grok.se/en/200912/Skip_list">page view statistics</a> suggest that it will.  Only time will tell.</p>
]]></content:encoded>
			<wfw:commentRss>http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/bc2071afd499daef001e75e14d7f9cce?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rhettinger</media:title>
		</media:content>
	</item>
	</channel>
</rss>
