By definition, scientific researchers strive to deepen understanding within their chosen communities. However, I have noticed that in computational biology the desire to get to the next result is so great that simple "future-proofing" steps in our code are ignored.
My adventures benchmarking the Eel Pond Protocol over the past few weeks drove this point home. I am going to share my experiences on this project to illustrate issues that seem to pop up quite a bit in open-source work.
I have developed a Makefile that allows me to run the entire protocol with a single command-line call. Based on the initial testing I mentioned in my last post (found here), I decided to add some commands to break down a few of the stats I had collected. I then began a new run and walked away, expecting it to run as seamlessly as before. Every few hours I peeked in on the machine just to make sure it was behaving; the control freak in me, I suppose.
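The driver behind that one-command call can be sketched roughly as follows. This is a minimal illustration, not the actual protocol Makefile: the step names are hypothetical stand-ins, and the real steps differ. The key habit it shows is aborting at the first failing step (`set -e`) and timestamping each step, so a periodic check-in reveals exactly how far the run got.

```shell
#!/bin/sh
# Minimal sketch of a one-command pipeline driver. Step names below are
# hypothetical stand-ins for the real protocol's commands.
set -e   # stop at the first failing step instead of continuing on bad data

run_step() {
    # Timestamp each step so a quick check-in shows how far the run got.
    echo "[$(date '+%F %T')] running: $*"
    "$@"
}

run_step echo "quality trim"            # stand-in for the real trim command
run_step echo "digital normalization"   # stand-in
run_step echo "assembly"                # stand-in

echo "pipeline finished"
```

With `set -e` in place, a broken script call stops the run immediately with a visible error rather than letting later steps churn on missing output.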
To my surprise, the first time I checked in, the script had thrown an error and stopped. Luckily, the stopping point was a call to a khmer script whose path was outdated due to package updates. At that point I realized I hadn't subscribed to the mailing list and had therefore missed the announcement. In this case I plead ignorance; Titus and the team had announced the changes in a timely manner, which is all we can ask. I will say, though, that not only had the script been moved out of the sandbox (yay, an official feature release), the author had also renamed the file, so even a search of all the khmer files couldn't find it. Thus my first tip for open-source coding: if you are moving a script, it is good practice not to change the file name. That way, a user with a fair amount of command-line experience can find its new home!
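On the user side, a cheap defense is a pre-flight check: verify that every external script the pipeline calls actually exists before starting a 30-hour run. The sketch below is illustrative only; the stand-in file is created just for the demo, and in practice you would point the variable at the real dependency path (for instance, wherever your installed khmer scripts live).

```shell
#!/bin/sh
# Sketch of a pre-flight check that fails fast when a dependency script has
# moved or been renamed, instead of dying hours into a run. The stand-in
# file is created only for demonstration; point SCRIPT at whatever your
# pipeline actually calls.
SCRIPT="$(mktemp)"   # stand-in for a real dependency script path

if [ ! -e "$SCRIPT" ]; then
    echo "error: $SCRIPT not found; did a package update move or rename it?" >&2
    exit 1
fi
echo "pre-flight check passed"
```

Running this check at the top of the driver turns an hours-in surprise into a seconds-in error message.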
Having fixed my broken script, I was off to the races again! This time I made it all the way to assembly (around 30 hours in) and realized that Trinity had also pushed out a new version. Each release of the software installs to a folder whose name contains the date it was made public. As a consequence, our protocol, which downloads and installs the latest release of Trinity, needed its call path revised. While this is a rather trivial task, given a roughly six-month update cycle it would need to be repeated fairly often. As a result, a few of us bounced around ideas on the best way to make this a relative path. The public protocol now reflects this, effectively eliminating that maintenance burden for this part of the protocol.
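One common way to remove a hardcoded, dated folder name is to resolve the newest matching directory at run time. The sketch below is not the protocol's actual fix; the directory names are invented for the demo (real Trinity releases follow their own naming scheme). It relies on zero-padded dates sorting chronologically under a plain lexicographic sort.

```shell
#!/bin/sh
# Sketch: resolve the newest dated install directory at run time instead of
# hardcoding a release folder. Directory names are invented for this demo.
BASE="$(mktemp -d)"
mkdir -p "$BASE/trinity_2014-04-13" "$BASE/trinity_2014-07-17"

# Zero-padded YYYY-MM-DD dates sort chronologically under a lexicographic
# sort, so the last match is the newest release.
TRINITY_DIR="$(ls -d "$BASE"/trinity_* | sort | tail -n 1)"
echo "using $TRINITY_DIR"
```

With this pattern, installing a new release drops a new dated folder into place and the pipeline picks it up automatically; no edit to the call path is needed.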
As many of you are probably thinking, both of these are extremely minor issues in the grand scheme of things. In the spirit of transparency, running the protocol to find the errors took far more time than fixing them did. By the same token, once a project runs correctly, taking an extra hour or two to review and modify the code from a "future-proofing" perspective can save both time and resources down the line. Then again, I have a tendency to overcomplicate logic on the first pass, so I automatically go back in an effort to reduce runtime complexity. Simple future-proofing steps feel like a natural extension of that workflow.
At the end of the day, there is a lingering thought: had I simply stumbled upon the tutorial in an effort to learn more computational biology, I would have moved on to another resource.