Open Source Challenge: Open Your Source, More
We all agree, open source is a good thing. It is posited that many eyeballs make all bugs shallow. With crowd sourcing, your users become contributors. And even seasoned programmers benefit from time spent reading code. Further, code can serve as an adjunct to documentation. If you really want to know what software is doing, read the source.
Once upon a time in Python land, it seemed everyone had the standard library source at their fingertips. Newbies would read the source to learn from programming luminaries like Tim Peters and Fredrik Lundh. Respondents on the famously friendly Python newsgroup would answer questions by referring to the source, and the phrase “Use the Source Luke” was a familiar refrain.
Times have changed. While core developers have the source at their fingertips, other developers seem to have lost touch with the source. People get to know code in their own projects, but rarely peer into other projects.
I’ve worked in several Python shops, have done consulting work with first-rate programmers, and discussed programming at meet-ups and conventions. I’m disappointed to report that it is becoming increasingly rare to look at another project’s source code, even among top-notch programmers.
At first, I wanted to blame the practice of tool mashups. If you’re furiously grafting a dozen tools together, who has time to study them in depth? But then, that’s not true; when the docs don’t answer your questions, it’s not a waste of time to examine the relevant source code, provided you know exactly where to look.
Why don’t people look for answers in the code? I place the blame on huge code repositories and on installers that move the code out-of-sight and out-of-mind.
A fresh checkout of Python 3.2 has 216 sub-directories with almost 478,000 lines of Python and 398,000 lines of C.¹ I’ve worked with that code extensively for almost a decade, yet every day I find something new. Unless you know where to look, searching the source for an answer can be a time-consuming intellectual investment.
Repository labyrinths are only half the problem. The rest of the blame lies with installers. They all treat human-readable scripts like they were binaries and tuck the code away in a dark corner. On my Mac, the source for the heapq module is stored at /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/heapq.py. Likewise, the pip installer puts the source for Mercurial in /Library/Python/2.6/site-packages/hgext. News flash: no one is going to view this code.
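If the installed path is hard to remember, Python can report it for you. A minimal sketch, assuming a standard CPython install where heapq ships as a plain .py file; the printed path depends on your platform and Python version and will not match the Mac path above:

```python
import heapq
import inspect

# inspect.getsourcefile returns the path of the Python source file
# that defines the module, or None if no source file can be found.
path = inspect.getsourcefile(heapq)
print(path)

# Print the first few lines to confirm the file is ordinary, readable Python.
with open(path, encoding="utf-8") as f:
    for _ in range(5):
        print(f.readline().rstrip())
```

From there, opening the file in an editor is one step away.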
Large code bases and obscure installation paths are a fact of life, but there is something we can do to reconnect our users with the source code. Starting with Python 3.2, I’ve gone through the module documentation and added direct links to the relevant source code.
For example, in the heapq documentation, the first line under the heading now provides a source link:
- Source code: Lib/heapq.py
This idea is not new. The heap package documentation for the Go Programming Language also starts with a source link:
- Package files: heap.go
There you have it — direct, convenient and relevant navigation into a large code base. Perhaps users will start reading code again. Maybe you’ll receive better bug reports, perhaps accompanied by code patches and documentation improvements. Maybe someone will read your code and become a better programmer. Maybe you’ll be inspired to write better code.
My challenge to you open-sourcers is to open your source, more. Provide direct links from the docs. It is not sufficient to expose your repository as one big ball of mud. Make it easy to go right to the heart of an issue.
“Publishing programs is a healthy habit. Every program I’ve written knowing it was to be published was improved by that knowledge. I think more clearly when I’m writing for an audience, and find it helps to pretend there always is one.” — Rob Pike (quoted from Programming Pearls)
“I had known for a long time that the programs I construct for publication in a book, or the programs that I construct in front of a class, have tended to be comparatively free of errors, because I am forced to clarify my thoughts as I do the programming.” — Donald Knuth (quoted from Literate Programming)
My challenge to you and hope for the future:
- A universal practice of docs linking back to source code
- Users getting reacquainted with the sources of their tools
- Programmers writing code as if they expected it to be read
Footnote:
[1] Sizing the Python 3.2 repository:
$ svn co http://svn.python.org/projects/python/branches/py3k tmp
$ cd tmp
$ find . -type 'd' | grep -v 'svn' | wc -l
216
$ wc -l $(find . -name "*.py" | egrep -v 'svn|build') | tail -1
477868 total
$ wc -l $(find . -name "*.[ch]" | egrep -v 'svn|build') | tail -1
398407 total
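The same measurement can be approximated in Python. This is a rough sketch, not what produced the numbers above: size_tree is a hypothetical helper name, and its counts can differ slightly from the shell pipeline (it prunes both svn and build directories from the directory count, and it counts a final line even when the trailing newline is missing).

```python
import os

def size_tree(root, suffixes, skip=("svn", "build")):
    """Return (directory_count, line_count) for files under root ending in suffixes."""
    dirs = lines = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories whose names contain a skipped word,
        # so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not any(s in d for s in skip)]
        dirs += 1  # counts root itself, like "find . -type d" does
        for name in filenames:
            if name.endswith(suffixes):
                # Binary mode avoids decoding errors on odd source files.
                with open(os.path.join(dirpath, name), "rb") as f:
                    lines += sum(1 for _ in f)
    return dirs, lines
```

Calling size_tree(".", (".py",)) from the top of a checkout returns a (directories, lines) pair; pass (".c", ".h") to size the C sources instead.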
January 28, 2011 at 9:38 am
Agreed — reading stdlib source code is very helpful and a great skill for coders. Note that Python application developers who use the Rope IDE library http://rope.sourceforge.net/ can get to Standard Library source code as easily as they ask for any other code or documentation: move your cursor over the class or function, enter your editor’s Rope command for “show me the definition of {this}”, and it will jump right into stdlib source code wherever it happens to be sitting on your system.
January 28, 2011 at 12:47 pm
That’s hardly a “blame” of installers. The modules are placed in such locations because they need to be accessible in a hierarchy at a location that is known to not conflict with anything else, and be predictable for the Python interpreter on the system.
If you want to “blame” anything for this (though I think it’s inaccurate to frame it as a problem), the correct target of your accusation is the fact that a filesystem path is the identifier for these modules that will be used by programs to find them.
As for reading the source and making it more available to programmers, yes, I agree wholeheartedly. Encouraging the routine reading of other projects’ source code is a good thing, and thank you for beating the drum.
January 28, 2011 at 3:12 pm
Right on!
Clojure has a built-in function called “source” which shows the source of a function. Also there is a bot in the IRC channel (clojurebot) that, when you ask it “source ”, will give you the right URL into GitHub. Maybe we can add both?
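Python has a close analogue in the standard library: inspect.getsource. A small sketch, using heapq.merge only as an example of a function written in pure Python:

```python
import heapq
import inspect

# inspect.getsource returns the source text of a module, class, or
# function that is defined in Python.
src = inspect.getsource(heapq.merge)
print(src)
```

For objects implemented in C, getsource raises TypeError or OSError because there is no Python source to show.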
January 28, 2011 at 3:29 pm
That’s a very good point, thanks for posting this. I’ll start taking this into consideration more seriously with my own projects, for sure.
March 23, 2011 at 5:50 pm
I’m a little late to the party, but after reading this, I found an excellent extension for Sphinx that comes with 1.0.x: sphinx.ext.viewcode
Basically makes linking to the Python source from generated reference docs a no-brainer.
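For reference, enabling the extension is a small change to a project’s conf.py. A minimal sketch, assuming a Sphinx project that documents its modules with autodoc (the autodoc line is an assumption about the project, not something stated above):

```python
# conf.py (Sphinx configuration file)
extensions = [
    "sphinx.ext.autodoc",   # pulls API documentation from docstrings
    "sphinx.ext.viewcode",  # adds links from documented objects to highlighted source
]
```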
May 11, 2011 at 4:04 pm
Hi Raymond
Just found your blog following our tweeting today.
One thing I find interesting is that back in the 80’s there were a lot of people using tag files to jump to source code. Both vi and emacs supported (and still support) jumping to tags, and the ctags and etags commands recognize Python. We used to have the full BSD source code tags in a well-known location (like /usr/src/tags), and it was trivial to jump to the source of pretty much anything.
I often wonder why that functionality, which seemed widespread, seems to have fallen into neglect.
Another thing that could be done therefore would be for distributions to ship Python with tags and TAGS files for the source. Then you don’t care where it is, you just hit C-] (vi, if I recall right) or M-. (emacs) when you’re sitting on a function call name or class name, etc., and boom – you’re there.
Regards.
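To make the tags idea concrete, here is a simplified sketch that builds ctags-style entries for one Python file using the standard ast module. It is not a replacement for ctags or etags: tag_lines is a hypothetical helper name, and it emits only the basic name, file, and line-number fields.

```python
import ast

def tag_lines(path):
    """Return sorted ctags-style lines (name<TAB>file<TAB>line) for one Python file."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    tags = []
    # ast.walk visits every node, so nested functions and methods are included.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            tags.append(f"{node.name}\t{path}\t{node.lineno}")
    return sorted(tags)
```

Writing the lines returned for each file in a library into a single sorted file named tags gives an editor something to jump with.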
May 11, 2011 at 5:00 pm
[Terry Jones]
>> Another thing that could be done therefore would be for distributions to ship Python with tags and TAGS files for the source
+1
May 17, 2011 at 1:48 pm
I work in the Launchpad.net codebase daily. It relies upon a large number of standard and external libraries. Some of the external libraries are installed on the system, some by buildout. Since last year I’ve been using a script, pyctags, to help generate tags for those libraries I’m interested in (alongside tags for Launchpad’s code):
http://allenap.tumblr.com/post/1248890048/tagging-installed-python-libraries
I usually run pyctags with the Python executable that buildout creates so that I can choose to generate tags for libraries that it manages. The same would probably apply with virtualenv and pip or easy_install.
May 29, 2011 at 10:23 am
I really appreciate the links to the source code in the documentation! I was wondering who had added that 🙂
July 15, 2011 at 5:30 am
As a young programmer, I’m in agreement. It was only a year ago that I actually spent time to deliberately read the source code of an open source project. That project was Python 3.
I have tried in the past to read source code. I remember each and every time I would see a bunch of C files, tried my very very hardest to find main() and when I invariably failed, I just gave up because I had no idea what to do next. Do I just start reading random files? Most of these projects didn’t have very good documentation for that sort of thing.
I found a series of blog posts online about dissecting the Python source code and a PEP that described how the parser worked. After reading those over 15 times, I was running python3 through GDB and understanding what was happening along the way. The surprising amount of resources out there for Python source code is what got me over that hurdle.
What I’d like to see more of is blog posts and articles actually about source code. Real source code, in real projects. Somebody needs to popularize the learning of source code by making it as fun and easy as possible.
October 15, 2011 at 1:03 pm
Way late here…
“Starting with Python 3.2, I’ve gone through the module documentation and added direct links to the relevant source code.”
Yay! I was wishing for this a few weeks ago when reading the Python 2.7 docs. Your blog here prompted me to suggest that a feature be added to Sphinx, but as Casey Duncan points out above, it already exists. Its use should be more strongly encouraged. Thanks.
March 4, 2012 at 8:10 am
Hi, I’m a programmer too. Just wanna share: I’ve worked for 1.5 years. Correct me if I’m wrong, but it’s more interesting to learn a free JavaScript framework such as ExtJS. I think it’s now the best, the most complete, and the easiest JavaScript framework that I’ve learned.
May 9, 2012 at 5:58 am
I usually consider it a weakness in the docs if someone needs to view code I’ve given them in a library in order to make effective use of it. But these days we take that a little too far and actively hide the code and pretend it doesn’t exist. I think linking to the source from the docs is a great idea.
May 9, 2012 at 6:48 am
For Python code, IPython is a big help with this. You can look up the source code for any module, function or object – so long as it’s defined in Python – with “foo??”.
May 9, 2012 at 7:54 am
Good points, Raymond.
One reason people find it hard to “read the source” is, as you correctly noted, largish repositories and checkouts with many sub-directories. Conventional tools (find | xargs grep) are somewhat lacking in this respect, so some folks use special source-searching tools like ack. I wrote ‘pss’ (https://bitbucket.org/eliben/pss – shameless plug!) to be a Python clone of ack, precisely for the goal of making search inside large source code repositories easier.
September 2, 2012 at 5:36 pm
Well said sir! Using “the source luke” has gone from obscure to impossible. We need a standard toolset that can index code and documentation.
June 21, 2015 at 2:20 am
I wouldn’t say that. It’s really a cultural thing. I teach my students that if they want to know about a function from a module they can just do:
import module
module.__file__[:-1]  # drops the trailing "c" of a Python 2 ".pyc" path; on Python 3, use module.__file__ as-is
And open the given file in their editor when it’s pure python code.
They never do it.
Partly it’s because they are scared.
Partly it’s because no teacher teaches that anymore at university.
But essentially it’s because any other method is more productive, so they are trained to use anything but looking at the source code.
Googling, asking on SO, on a forum, on IRC, calling a colleague and even reading the doc. Anything is just faster and easier than reading the source code, since reading the source code assumes you understand what’s going on.
Reading the source code made sense when you didn’t have 4 computers at hand, all with fast internet connection to the entire human knowledge.
Of course, they got used to the speed. And above all, their teacher/boss got used to it, and their behaviour implicitly broadcast the expectation that you don’t read the source.
It took 5 years of professional programming before I started to read the source by myself on a regular basis.
And yet, I do it only for Python code, because I have few C skills and never took the time to download the C files. I do know C is an important skill to have. I do want to learn it. And Rust, and Erlang, and Haskell, and Spanish, and get better at CSS management, sys admin, system security, and try Riak and OpenStack… BTW, that’s just the technical stuff, I got a whole life on the side.
Here lies a second truth: the time when you could master it all is over.
I now spend one HOUR every day to keep up, just in my narrow field of expertise, and I can’t even read it all, let alone practice. That’s 300 hours a year of self-training just to keep afloat.
Now imagine the newcomers. They arrive in this world with so many languages, libs, frameworks, tools and techs. So many layers, so many levels of indirection. They will try to get the answer as easily as they can, because you had much more time than they have to absorb the knowledge that lets you read the source efficiently (source is not a free meal; that’s an illusion you get once you are good enough to read it effortlessly). And yet, they have more things to learn.
As the market changed, customers started to expect much more complex software experiences: polished UI, animations, speed, network connection, synchronisation, real-time notification, service interconnections, Web APIs… That’s just for the app/web products, but every market has similarly higher expectations.
So you are expected to deliver all that stuff. My students will try to find a way to have the result they need as fast as they can, with as little effort as they can.
It’s sad, but I understand.