Machine learning makes a difference, and makes a lot of money worldwide. Yet the JMLR has a free web site, and it only costs $75 ($101 outside the US) for an annual individual print subscription, a small fraction of the cost of subscribing to a conventional science journal. The journal runs like a collective, with MIT Press taking just the paper print rights, so costs are minimised. Turn-around time for authors is dramatically reduced. If somebody, say, in the third world wants to know anything up-to-date and rigorous about machine learning, this is the definitive place to reference. This is an excellent model for all journals to copy, especially in science. As one of the editors says: "What is the role of the scientist in academic publishing? Doing the publishing!"
The bulk of the journal's papers are devoted to discussing and evaluating learning methods. I was interested to see how ideas talked about in the journal actually worked, because that's really the whole point. So, as the journal is available on-line, I looked at every paper and then emailed the authors to ask them about their ideas. After a few weeks I had over 100 replies. I drafted this review, and then bounced it off the editorial board and the authors again. The enthusiasm of authors for their work was impressive; I had replies covering every paper published.
I asked whether the system described in each paper was available. Of course, some papers were theoretical; some replies said my question was irrelevant. Of the remainder, about a third specifically said their systems were unavailable: they were private, commercially confidential, or incomplete in some way. Consider some quotes from replies I got: "Unfortunately, I do not have the system in a state where I can give it away right now" and "We don't have the data ready to be published". Further quotes are quite revealing about authors' attitudes. Somehow research, even stuff published in the journal, isn't considered public: "The system is a research prototype developed in my group, and is not appropriate for public dissemination" and "The implementations we had were very much 'research code', and not suitable for public consumption".
My informal survey suggests some authors have a relaxed regard for scientific virtues: reproducibility, testability, and availability of data, methods and programs -- the openness and attention to detail that support other researchers. It's a widespread problem in computer science generally. I'm guilty, too. We programmers tend not to keep the equivalent of lab books, and reconstructing what we have done is often unnecessarily hard. As I wrote elsewhere (see www.uclic.ucl.ac.uk/harold/warp) there can be problems with publishing work that is not rigorously supported. It is the computer science equivalent of fudging experimental data -- whether this really matters for the progress of science is another question.
Then there is the problem of who owns the work. As one author put it: "We have not had the time to turn our experimental code into something other people can use (and anyway our employers wouldn't like to see things given away)." Certainly there needs to be a balance between science and protecting intellectual property; it's a big problem, as turning research ideas into code that really works might involve a company that then owns it. On the other hand, there is no reason why open source code cannot be made freely and immediately available, at least to the depth the ideas are discussed in the papers. And it is possible: look at sites like the GNU-licensed open source Weka machine learning project (www.cs.waikato.ac.nz/ml), which provides a framework in which people can give and take shared work. Many other sites have papers, code, demos and data too.
The Journal of Machine Learning Research does try to encourage authors to add electronic appendices with source code, data, demonstrations: anything, as the journal puts it, that will make life easier or more interesting for readers and researchers who follow in the authors' footsteps. Some authors do an excellent job, but spreading the good practice is an uphill struggle! Machine learning will change our uses of computers dramatically, so let's hope the journal achieves its goals with more and more success.
Harold Thimbleby is Director of UCLIC, the UCL Interaction Centre, and Gresham Professor of Geometry. He is a Royal Society-Wolfson Research Merit Award Holder.