echo "hey, it works" > /dev/null

just enough to be dangerous

Close to completion


When I feel like I'm not making much progress with my PhD, I tell myself that everything I do is progress towards completion. Most of the time that motivates me to do something, anything, to move forward.[1]

But if I keep making progress, why aren't I ever finished?

[Image: eternity.png]

  1. "Moving forward" will be one of those phrases that no-one can use without rolling their eyes. Thanks, Julia.

How did I get here, or a history of my thesis topic


Talking to my beloved last night, I said something along the lines that I'd never been excited by my thesis topic, and she pressed me. If that was the case, how did I end up choosing it? During the explanation I found that I'd raised my voice and become quite animated, and I realised that I might have rewritten history. I've previously written about some of the practical history of my candidature, but not the topical history. So how did I get here?

In the early noughties, I was doing a course in my masters called intelligent web systems, and doing a pretty bad job of it actually, mostly because I wasn't spending much time on it. That, and the fact that we could write our assignments in any language we liked, so I decided to start the night before it was due and do it in Python, which I'd never used before. I digress. The final assessment was to write an essay of a couple of thousand words about anything that broadly fit into intelligent web systems. I was teaching at the time, so I was thinking about educational things, and so I chose to write about Edutella, a P2P network for searching semantic web data with the aim of facilitating the exchange of educational resources.

In my final semester of the masters, I enrolled in a "dissertation", a one-semester course writing a few thousand words on a topic of my choice. I did the dissertation because I wanted to avoid having to enrol in the research masters program, preferring to try to go straight into the PhD, and needed to demonstrate that I could write. Not that I actually wanted to do a PhD, but the powers that be had made clear I had to do it if I wanted to keep my job. Still teaching and thinking about education, and in an attempt to make things as easy as possible for myself, I decided to continue on from my essay, looking at retrieval of educational resources.

The state of play was that, mostly, educational resources were stored in repositories. To retrieve them, systems would search human‑assigned descriptive metadata. That's pretty much still the case. I talked to some wise people in the school, and came to the conclusion that this library-style approach to retrieval could be improved using techniques drawn from the information retrieval community. For a start, I could extract text and search the primary resource, rather than secondary data.

That's where I got excited about things, but it's where I should have started to realise that doing a PhD wasn't a great idea for me. The dissertation ended up being about 5000 words, and I struggled. But I went ahead and blindly enrolled in the PhD anyway.

On the strength of my dissertation, and knowing the right people, I was seconded to RMIT's Teaching and Learning Portfolio, to do a scoping project aimed at helping improve the management and reuse of educational resources. During this time, along with Henric Beiers, I conducted a bunch of interviews, focus groups, and a survey. How did other universities manage resources? What are the barriers to reuse? How did educators want to find resources? This work would become a chunk of my thesis, and by that time I felt my path was pretty much set.

After three years part time, I decided to leave my job, go full time, and spend the next 18 months finishing my PhD. Six months in, working on applying IR methods to retrieval of resources from repositories, I realised I had no faith in the area. Millions and millions of dollars were being spent around the world trying to set up these repositories, and the top-down approach just didn't seem to be working. I'm sure there are many, many people working on these projects who would vehemently disagree with me, but that's how I felt at the time. How had those educators said they wanted to find educational resources? They wanted to search with Google.

So I threw away six months' work and tried to regroup. I changed focus to filtering educational resources from search results returned by a regular search engine (Yahoo! because their API was easier to work with), and started thinking more about learners in general rather than academics. The question of how such filtering systems should be evaluated became the next chunk of my thesis. The final part was the implementation of a simple filtering system, throwing a bunch of resource features at some machine learning classifiers and seeing what worked.

I'm now approaching the end of the road. I'm due to submit my thesis to the school in six weeks (I'm quivering with stress at the thought of how much work there's still left to do).

And after all that, it seems that at times I have been excited. But I sure as hell hate it now.

Two goals for 2010


Along with a host of smaller goals for 2010, there are two major things I have to get done this year. In this post, I'll talk about those two goals and look back a bit as well.

Finish my PhD

Yes, this will be the year that I finish my PhD. I first enrolled in 2004, studying part time while working as an academic at RMIT's School of CS and IT. My main role at the time was the operations manager of the delivery of the African Virtual University project. Like all good roles, it was at the edge of my comfort zone when I started, and I spent a lot of time trying to do it well. In hindsight, I might have been able to delegate some of the work, but I think that was part of the learning experience.

Delivery to Africa wrapped up at the end of 2006, and I saw it as an opportunity to focus on my research, which had been terribly neglected for the three years of my enrolment. I quit my job and went full time. It took a few months to make the transition to full time study, something I hadn't done for more than a decade, and by the time I was starting to gain a bit of momentum I realised I had no faith in the direction my research was heading. Painful though it was, I ditched what I'd been doing and took a couple of big steps backwards, resulting in six months of work that won't make it into the main body of my thesis.

From there I've battled with all the usual things that PhD candidates battle with: distractions, procrastination, yak shaving, family stuff, a stint back in industry, loss of motivation, what the hell is this all for anyway, et cetera. But now I'm finally within striking distance of the end, so it really must be finished this year.

That means trying to focus more, compartmentalising the worthy distractions, not spending too much time surfing engaging music sites, and writing regularly.

Establish a reliable, enjoyable source of income

And then what? I could spend a lifetime just exploring the stuff that I find interesting, working on Habari and other open source projects, but in and of itself that doesn't put food on the table. The next goal is much more nebulous. How do I turn the stuff I enjoy into an income stream?

While I'm happy to work hard and work long hours, I don't really want to go back to traditional full-time work, a 9-to-5 job. The idea makes me yawn, though I guess I'd do it if the job was awesome in other ways. I definitely don't want to go back into traditional academia. Teaching can be fun, but it can also suck up any amount of available time, and given the amount of bureaucracy that seems to be required, the chances of getting any research done as a junior academic are slim.

My ideal job would allow me to work on web stuff and interesting open source projects, particularly Habari, and to be engaged with people. I don't need a huge income, but flexibility is important. I don't want to be tied to a physical location, mostly because I want to be able to work from the farm when we're there and I don't want my work to tie down Rachel's job opportunities, wherever they may be.

I would like to continue to collaborate on research work, stuff related to the web and to open source, evaluating the stuff I'm working on so that other people can benefit from it and build upon it. Maybe twofish creative will be re-energised, and we can do web sites for people we like (we're doing a bit of that, but not much). Maybe I'll start another business with like-minded folk. Maybe I'll do some freelance coding. Maybe interesting projects will pop up.

Whatever shape the thing is, I have to wrangle it by the end of the year. Wish me luck (or, when I've submitted the thesis, suggest something or make me an offer).

Why would you do a PhD?


A friend who started his PhD at the same time as me, and finished a while ago, answered a question on LinkedIn recently. Paraphrased, the question was, "Why would someone do a PhD?" and his answer was something like, "Well, you get a title, and you never know, it may come in handy some day."

Compelling reasons, for sure.

This "WTF am I doing" moment was brought to you by the universe.

Stereotyping of information retrieval evaluation methodology


Last Friday I went to an interesting seminar by William Webber. The basic premise was that IR researchers should consider constructing their own test collections, and he outlined how to go about that. Here's the abstract.

I thought one slide pointing out how reusing test collections can lead to an unhealthily narrow focus was especially pithy, and I reproduce it here with William's permission.

Methodology section before TREC:

We identify as experimental variables: user characteristics;
problem statement; question statement; question characteristics;
search strategy; search characteristics …

Methodology section after TREC:

We take the TREC 8 AdHoc track collection. Our evaluation metrics
are P@10 and MAP.
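
For anyone who hasn't met the two metrics named on that second slide, here is a minimal sketch of how they are usually computed. It is my own illustration rather than anything from William's talk, and the function names, document ids and relevance judgements are all made up for the example.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even if the ranking is shorter than k (a common convention).
    """
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k


def average_precision(ranking, relevant):
    """Sum of the precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the query."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP: average precision averaged over (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries with invented document ids.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7", "d8"], {"d6"}),
]
print(precision_at_k(runs[0][0], runs[0][1], k=4))
print(mean_average_precision(runs))
```

Both numbers depend entirely on a fixed set of relevance judgements, which is part of why a reused test collection makes the methodology section so short.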

History of IR: evaluation methodologies


Much of the discussion in 1966 has continued to revolve around methodological issues and has consisted largely of a repetitious dissection of a very limited amount of experimental activity with little theoretical basis. Refutations of rebuttals are often interesting but they typically generate little additional knowledge. Thus, the dialogue between those who accept the Cranfield methodology and those who, for a variety of reasons, are critical of it, has not been particularly productive from the point of view of advancing the state of the art. There is, and can be, no one way to test and evaluate retrieval systems, and it is absurd to imagine that any particular testing technique, or set of measures, will solve the problem of evaluation. Rather than engage only in carping criticism of the deficiencies of any one research project (thus giving rise to a new round of justifications of the procedures employed), it would be more desirable to devise and test alternative methodologies. Unfortunately, only scattered instances of this more positive approach can be found.

This is Alan Rees in 1967, as quoted by Cyril Cleverdon, one of the founders of modern information retrieval, in 1968. A common problem in many areas of research, especially in their youth. I love that Cleverdon doesn't directly go after his (or at least his methodology's) detractors, but quotes someone else doing it.

Presentation tips


I recently attended my school's research conference, where PhD candidates present their work. While there's lots of great research going on, some of the speakers weren't able to get that great research across effectively. I'm no great speaker, and my talk at the conference was completely devoid of content, so my advice is freely ignorable. Take it or leave it.

Don't cover too much

I know you want to include all the cool things in your presentation but all you're really doing is forcing yourself to speak fast so that you can fit everything in. Either that or you're going over time, which is just rude. And if you talk into your question time then you might miss some of the best stuff that can happen at a gig like this -- interaction. I know you might be scared of it, but it will do you good.

So take your time, don't have too much content, even have some slides that you can skip if you're running short of time.

Don't crowd your slides

On the subject of slides, cut the text! If you have a heap of text, people will read the text and they won't listen to you talk. People can't focus on both you and the slides, so lots of text is just distracting. Slides should provide some structure but they shouldn't provide all the content.

You should also think very carefully before including tables of data in your presentation. Graphs are fine, as long as they're not too complex, but tables are hard for your audience to absorb at a glance. If a table isn't absolutely crucial to your talk, leave it out.

Don't distract your audience

Laser pointers are like digital watches. You might think they're cool, but you're wrong. I've seen very few good uses of laser pointers in presentations (mostly pointing out interesting spots on graphs), and very many bad uses (like when the speaker reads the slides and follows along with the laser pointer, karaoke style). If you really must draw attention to something on your slides (you've already cut the text down, right, so there shouldn't be much distraction on there anyway) consider using the presentation software itself to call things out (but be subtle about that too, no vomit-inducing animations). And if a laser pointer is essential, practice holding it steady. Don't wave it around like a light sabre.

Most importantly ...

... try to have a good time!

So you want to get a PhD, huh?


No, I'm not trying to sell you anything.

There are all sorts of reasons to do a PhD, all of them insane. I haven't finished mine yet, so I may not be the best person to give advice on the topic of how to get the thesis out the door, but here are a few things that I wish people had said to me when I started. Hopefully they'll help to make your journey a little bit easier.

Summarise everything

You'll read lots of stuff. So that you don't read lots of stuff, forget lots of stuff, read it again with a vaguely familiar feeling, forget it again, then design an experiment that seems perfect only to find you've redone an experiment you read about two years ago, summarise all the papers you read, even the crap ones. This will help clarify your thoughts on the papers and provide pointers to them later. You still may end up reading the same paper a few times but hopefully summarising them will let you get more out of them each time.

You could choose to implement a specialised system for keeping track of the papers that you've read, but just putting them in the literature review chapter of your thesis template works pretty well. At the same time, you should enter all the required bibliographic information in whatever bibliographic information management software you're going to use.

Keep track of your citations

Did I mention that you should use bibliographic information management software? You must, whether it's BibTeX, EndNote or something else. The last thing you want to do is go back to that paper you read that's been sitting on your desk for a year because it's just completely on point and then spend a week trying to find the goddamn citation details because you didn't write down where you got it from.

Start writing

In fact, don't just write about the papers you've read. Write about anything and everything that pops into your head that's even vaguely related to your research. Maybe even stuff that's not related. It will improve your written expression and it will clarify your thoughts. Some people keep a paper journal, but I recommend blogging, partly because your words will be searchable and viewable in different ways--by date or by tag, for example--but also because I type faster, and more legibly, than I write.

You can choose to keep your super special secret sauce under wraps so that you don't get scooped. I keep two blogs, one private, one public (that would be this one), but in hindsight I probably should have written more about my research publicly. An additional benefit of a public blog is that you can hand out the address to those people who come up to you at conferences. Sure, they may be completely bonkers to be interested in what you're doing, but insane recognition is better than none.

Get stuff in the template

If you haven't already, download your thesis template now. Right now. Become familiar with the layout. While you're there, find a couple of theses in related areas and have a look at them too. It's fun to read what your supervisor wrote nearly 20 years ago. Try and work out what needs to go where. Put in some headings and regenerate the table of contents. That's your plan, see?

You don't have to constrain yourself to only filling in the literature review; write down a paragraph or two under each heading. With all the guff in the template it's probably 20 pages long already. Not far to go, hey?

Start versioning

This is a bit of a tricky one if you're doing a PhD in a discipline where versioning is not common, but I highly recommend it. If you're programming at all, you have no choice: set up a repository using your favourite versioning software. If you don't have a favourite, ask someone clever. I use Subversion, and I'm happy.

Just to be clear, I'm not just talking about code. I'm talking about versioning your writing output. In research, that's your thesis and any papers you're writing. Submit a paper to a conference? Tag the release. Got accepted? Well done. Reviewers give you good feedback? Fix the paper and tag the final release.

Getting work done

If you're anything like me, you're likely to find the world an immensely interesting place. Hugely more interesting than your research topic, in fact. The one rule that helped me start making some real progress was that whenever I sat down at the computer, the first thing I did had to be research related. Go to the toilet, get something to eat, get some exercise--highly recommended--go to bed, but when you come back to the computer, research is the first priority. It doesn't really matter what it is, it's just got to be something to do with your research. What is important is what it's not. It's not checking your email, or catching up on the news headlines, or reading the 612 unread items in your feed reader, or playing NetHack. Yes, I know, I'm linking to NetHack. Don't go there.

A final word ...

Don't procrastinate by writing crap on your blog though.

Good luck with your study!