Open Source Challenge: Open Your Source, More
We all agree, open source is a good thing. It is posited that many eyeballs make all bugs shallow. With crowd sourcing, your users become contributors. And even seasoned programmers benefit from time spent reading code. Further, code can serve as an adjunct to documentation. If you really want to know what software is doing, read the source.
Once upon a time in Python land, it seemed everyone had the standard library source at their fingertips. Newbies would read the source to learn from programming luminaries like Tim Peters and Fredrik Lundh. Respondents on the famously friendly Python newsgroup would answer questions by referring to the source, and the phrase “Use the Source, Luke” was a familiar refrain.
Times have changed. While core developers have the source at their fingertips, other developers seem to have lost touch with it. People get to know the code in their own projects, but rarely peer into other projects.
I’ve worked in several Python shops, have done consulting work with first-rate programmers, and discussed programming at meet-ups and conventions. I’m disappointed to report that it is becoming increasingly rare to look at another project’s source code, even among top-notch programmers.
At first, I wanted to blame the practice of tool mashups. If you’re furiously grafting a dozen tools together, who has time to study them in depth? But that excuse doesn’t hold up: when the docs don’t answer your questions, it’s not a waste of time to examine the relevant source code, provided you know exactly where to look.
Why don’t people look for answers in the code? I place the blame on huge code repositories and on installers that move the code out-of-sight and out-of-mind.
A fresh checkout of Python 3.2 has 216 sub-directories with almost 478,000 lines of Python and 398,000 lines of C.¹ I’ve worked with that code extensively for almost a decade, yet every day I find something new. Unless you know where to look, searching the source for an answer can be a time-consuming intellectual investment.
Repository labyrinths are only half the problem. The rest of the blame lies with installers. They all treat human-readable scripts like they were binaries and tuck the code away in a dark corner. On my Mac, the source for the heapq module is stored at /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/heapq.py. Likewise, the pip installer puts the source for Mercurial in /Library/Python/2.6/site-packages/hgext. News flash: no one is going to view this code.
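If you do want to dig a module out of its dark corner, the interpreter can tell you where it lives. Here is a minimal sketch using the standard inspect module; the helper name source_path is mine, made up for illustration, and the path it reports will differ from one installation to the next.

```python
import heapq
import inspect
import sys


def source_path(module):
    """Return the file a module's Python source lives in, or None if there is none."""
    try:
        return inspect.getsourcefile(module)
    except TypeError:
        # Built-in modules are compiled into the interpreter and have no source file.
        return None


# Wherever the installer happened to tuck heapq.py away on this machine.
print(source_path(heapq))

# sys is built in, so there is nothing to point at.
print(source_path(sys))

# The source text itself is one call away.
print(len(inspect.getsource(heapq).splitlines()), "lines in heapq")
```

This only helps someone who already has an interpreter open and knows to ask, which is exactly why a link in the docs is the better answer.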
Large code bases and obscure installation paths are a fact of life, but there is something we can do to reconnect our users with the source code. Starting with Python 3.2, I’ve gone through the module documentation and added direct links to the relevant source code.
For example, in the heapq documentation, the first line under the heading now provides a source link:
- Source code: Lib/heapq.py
This idea is not new. The heap package documentation for the Go Programming Language also starts with a source link:
- Package files: heap.go
There you have it — direct, convenient and relevant navigation into a large code base. Perhaps users will start reading code again. Maybe you’ll receive better bug reports, perhaps accompanied by code patches and documentation improvements. Maybe someone will read your code and become a better programmer. Maybe you’ll be inspired to write better code.
My challenge to you open-sourcers is to open your source, more. Provide direct links from the docs. It is not sufficient to expose your repository as one big ball of mud. Make it easy to go right to the heart of an issue.
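Generating such links need not be much work. The following is only a sketch of one way a documentation build might produce the "Source code:" line for a pure-Python module. The function name source_link and the example.org base URL are assumptions for illustration, not part of any real documentation tool, and the sketch assumes the repository layout mirrors the import path.

```python
import importlib.util


def source_link(module_name, base_url):
    """Build a 'Source code:' line for a pure-Python module or package.

    base_url is a hypothetical web view of the directory holding the code.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ImportError("cannot find module %r" % module_name)
    parts = module_name.split(".")
    if spec.submodule_search_locations is not None:
        # A package's own code lives in its __init__.py file.
        parts.append("__init__")
    return "Source code: %s/%s.py" % (base_url.rstrip("/"), "/".join(parts))


print(source_link("heapq", "https://example.org/repo/Lib"))
print(source_link("json", "https://example.org/repo/Lib"))
```

Extension modules written in C would need a different mapping, so treat this as a starting point rather than a finished tool.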
“Publishing programs is a healthy habit. Every program I’ve written knowing it was to be published was improved by that knowledge. I think more clearly when I’m writing for an audience, and find it helps to pretend there always is one.” — Rob Pike (quoted from Programming Pearls)
“I had known for a long time that the programs I construct for publication in a book, or the programs that I construct in front of a class, have tended to be comparatively free of errors, because I am forced to clarify my thoughts as I do the programming.” — Donald Knuth (quoted from Literate Programming)
My challenge to you and hope for the future:
- A universal practice of docs linking back to source code
- Users getting reacquainted with the sources of their tools
- Programmers writing code as if they expected it to be read
¹ Sizing the Python 3.2 repository:
$ svn co http://svn.python.org/projects/python/branches/py3k tmp
$ cd tmp
$ find . -type 'd' | grep -v 'svn' | wc -l
216
$ wc -l $(find . -name "*.py" | egrep -v 'svn|build') | tail -1
477868 total
$ wc -l $(find . -name "*.[ch]" | egrep -v 'svn|build') | tail -1
398407 total
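For anyone without those shell tools handy, roughly the same line count can be sketched in Python. The function name count_lines is my own, and the filtering is an approximation: it skips any directory whose name contains "svn" or "build", whereas the egrep above drops every path containing those strings, file names included. The totals may therefore differ slightly from the figures quoted.

```python
import os


def count_lines(root, extensions, skip=("svn", "build")):
    """Count newline characters in files under root whose names end with extensions."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune unwanted directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not any(s in d for s in skip)]
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), "rb") as f:
                    # Counting b"\n" matches what wc -l reports.
                    total += f.read().count(b"\n")
    return total
```

From the top of a checkout, count_lines(".", (".py",)) gives the Python total and count_lines(".", (".c", ".h")) gives the C total.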