echo "hey, it works" > /dev/null

just enough to be dangerous


And unless you're a bank or an airline, I'll never mention it again.

On pride and the PhD

At the risk of turning this blog into a research whinge-fest ...

I'm working on my completion seminar. That means I'm really close to done with this whole PhD journey, and I should be feeling happy and bouncy and joyful about the prospect of prattling on about what I've achieved over these last mumble years. But I don't.

I've learnt an enormous amount about lots of things during my candidature: technical things, computer science theory, research practice, critical thinking, statistics, writing, personal things like my tendency to procrastinate, and about my capacity to follow things through. I am proud to have managed to follow this through, and to have a complete thesis, and I'm happy to have learnt a great deal.

But here's the reason I don't feel happy and bouncy and joyful: I don't actually feel proud of the work.

Part of the reason is that I'd do everything differently (better) now. I know the PhD is supposed to be training to become a researcher, so by definition I wasn't ready to be a researcher when I was doing much of the work. I list a bunch of contributions in my conclusion, and I think my work matches the illustrated guide to a PhD. I think the thesis is probably good enough to pass.

William Webber, in a comment on another of my navel-gazing research posts, suggests I should wait 15 years for the pride to kick in.

I wonder if this lack of pride is a common feeling for students reaching the end of their candidature.

Close to completion

When I feel like I'm not making much progress with my PhD, I tell myself that everything I do is progress towards completion. Most of the time that motivates me to do something, anything, to move forward.[1]

But if I keep making progress, why aren't I ever finished?


  1. "Moving forward" will be one of those phrases that no-one can use without rolling their eyes. Thanks, Julia.

How did I get here, or a history of my thesis topic

Talking to my beloved last night, I said something along the lines that I'd never been excited by my thesis topic, and she pressed me. If that was the case, how did I end up choosing it? During the explanation I found that I'd raised my voice and become quite animated, and I realised that I might have rewritten history. I've previously written about some of the practical history of my candidature, but not the topical history. So how did I get here?

In the early noughties, I was doing a course in my masters called intelligent web systems, and doing a pretty bad job of it actually, mostly because I wasn't spending much time on it. That, and the fact that we could write our assignments in any language we liked, so I decided to start the night before it was due and do it in Python, which I'd never used before. I digress. The final assessment was to write an essay of a couple of thousand words about anything that broadly fit into intelligent web systems. I was teaching at the time, so I was thinking about educational things, and so I chose to write about Edutella, a p2p network for searching semantic web data with the aim of facilitating the exchange of educational resources.

In my final semester of the masters, I enrolled in a "dissertation", a one-semester course writing a few thousand words on a topic of my choice. I did the dissertation because I wanted to avoid having to enrol in the research masters program, preferring to try to go straight into the PhD, and needed to demonstrate that I could write. Not that I actually wanted to do a PhD, but the powers that be had made clear I had to do it if I wanted to keep my job. Still teaching and thinking about education, and in an attempt to make things as easy as possible for myself, I decided to continue on from my essay, looking at retrieval of educational resources.

The state of play was that, mostly, educational resources were stored in repositories. To retrieve them, systems would search human‑assigned descriptive metadata. That's pretty much still the case. I talked to some wise people in the school, and came to the conclusion that this library-style approach to retrieval could be improved using techniques drawn from the information retrieval community. For a start, I could extract text and search the primary resource, rather than secondary data.

That's where I got excited about things, but it's where I should have started to realise that doing a PhD wasn't a great idea for me. The dissertation ended up being about 5000 words, and I struggled. But I went ahead and blindly enrolled in the PhD anyway.

On the strength of my dissertation, and knowing the right people, I was seconded to RMIT's Teaching and Learning Portfolio, to do a scoping project aimed at helping improve the management and reuse of educational resources. During this time, along with Henric Beiers, I conducted a bunch of interviews, focus groups, and a survey. How did other universities manage resources? What are the barriers to reuse? How did educators want to find resources? This work would become a chunk of my thesis, and by that time I felt my path was pretty much set.

After three years part time, I decided to leave my job, go full time, and spend the next 18 months finishing my PhD. Six months in, working on applying IR methods to retrieval of resources from repositories, I realised I had no faith in the area. Millions and millions of dollars were being spent around the world trying to set up these repositories, and the top down approach just didn't seem to be working. I'm sure there are many, many people working on these projects who would vehemently disagree with me, but that's how I felt at the time. How had those educators said they wanted to find educational resources? They wanted to search with Google.

So I threw away six months' work and tried to regroup. I changed focus to filtering educational resources from search results returned by a regular search engine (Yahoo! because their API was easier to work with), and changed to thinking more about learners in general rather than academics. The question of how such filtering systems should be evaluated became the next chunk of my thesis. The final part was the implementation of a simple filtering system, throwing a bunch of resource features at some machine learning classifiers and seeing what worked.

I'm now approaching the end of the road: I'm due to submit my thesis to the school in six weeks (I'm quivering with stress at the thought of how much work is still left to do).

And after all that, it seems that at times I have been excited. But I sure as hell hate it now.

Two goals for 2010

Along with a host of smaller goals for 2010, there are two major things I have to get done this year. In this post, I'll talk about those two goals and look back a bit as well.

Finish my PhD

Yes, this will be the year that I finish my PhD. I first enrolled in 2004, studying part time while working as an academic at RMIT's School of CS and IT. My main role at the time was the operations manager of the delivery of the African Virtual University project. Like all good roles, it was at the edge of my comfort zone when I started, and I spent a lot of time trying to do it well. In hindsight, I might have been able to delegate some of the work, but I think that was part of the learning experience.

Delivery to Africa wrapped up at the end of 2006, and I saw it as an opportunity to focus on my research, which had been terribly neglected for the three years of my enrolment. I quit my job and went full time. It took a few months to make the transition to full time study, something I hadn't done for more than a decade, and by the time I was starting to gain a bit of momentum I realised I had no faith in the direction my research was heading. Painful though it was, I ditched what I'd been doing and took a couple of big steps backwards, resulting in six months of work that won't make it into the main body of my thesis.

From there I've battled with all the usual things that PhD candidates battle with: distractions, procrastination, yak shaving, family stuff, a stint back in industry, loss of motivation, what the hell is this all for anyway, et cetera. But now I'm finally within striking distance of the end, so it really must be finished this year.

That means trying to focus more, compartmentalise the worthy distractions, not spend too much time surfing engaging music sites, and writing regularly.

Establish a reliable, enjoyable source of income

And then what? I could spend a lifetime just exploring the stuff that I find interesting, working on Habari and other open source projects, but in and of itself that doesn't put food on the table. The next goal is much more nebulous. How do I turn the stuff I enjoy into an income stream?

While I'm happy to work hard and work long hours, I don't really want to go back to traditional full-time work, a 9-to-5 job. The idea makes me yawn, though I guess I'd do it if the job was awesome in other ways. I definitely don't want to go back into traditional academia. Teaching can be fun, but it can also suck up any amount of available time, and given the amount of bureaucracy that seems to be required, the chances of getting any research done as a junior academic are slim.

My ideal job would allow me to work on web stuff and interesting open source projects, particularly Habari, and be engaged with people. I don't need a huge income, but flexibility is important. I don't want to be tied to a physical location, mostly because I want to be able to work from the farm when we're there and I don't want my work to tie down Rachel's job opportunities, wherever they may be.

I would like to continue to collaborate on research work, stuff related to the web and to open source, evaluating the stuff I'm working on so that other people can benefit from it and build upon it. Maybe twofish creative will be re-energised, and we can do web sites for people we like (we're doing a bit of that, but not much). Maybe I'll start another business with like-minded folk. Maybe I'll do some freelance coding. Maybe interesting projects will pop up.

Whatever shape the thing is, I have to wrangle it by the end of the year. Wish me luck (or, when I've submitted the thesis, suggest something or make me an offer).

Why would you do a PhD?

A friend who started his PhD at the same time as me, and finished a while ago, answered a question on LinkedIn recently. Paraphrased, the question was, "Why would someone do a PhD?" and his answer was something like, "Well, you get a title, and you never know, it may come in handy some day."

Compelling reasons, for sure.

This "WTF am I doing" moment was brought to you by the universe.

Stereotyping of information retrieval evaluation methodology

Last Friday I went to an interesting seminar by William Webber (blog). The basic premise was that IR researchers should consider constructing their own test collections, and the talk outlined how to go about that. Here's the abstract.

I thought one slide pointing out how reusing test collections can lead to an unhealthily narrow focus was especially pithy, and I reproduce it here with William's permission.

Methodology section before TREC:

We identify as experimental variables: user characteristics; problem statement; question statement; question characteristics; search strategy; search characteristics …

Methodology section after TREC:

We take the TREC 8 AdHoc track collection. Our evaluation metrics are P@10 and MAP.
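For anyone who hasn't met the two metrics named on that second slide, here's a minimal Python sketch of how they're usually computed. It's my own illustration, not anything from William's talk or from the TREC tooling: the function names are made up, and I'm assuming binary relevance judgements (1 for relevant, 0 for not) listed in ranked order.

```python
def precision_at_k(ranked_relevance, k=10):
    """P@k: the fraction of the top k results that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance, total_relevant):
    """Average of the precision values at each rank where a relevant
    result appears, divided by the number of relevant documents that
    exist for the query (hypothetical helper, binary judgements)."""
    hits = 0
    score = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / total_relevant if total_relevant else 0.0


def mean_average_precision(runs):
    """MAP: the mean of average precision over a set of queries.
    Each run is a (ranked_relevance, total_relevant) pair."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)


# One query, ten results, relevant documents at ranks 1 and 3.
ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
p10 = precision_at_k(ranking, k=10)
ap = average_precision(ranking, total_relevant=2)
```

Which rather proves the slide's point: two small functions and you have a methodology section.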

History of IR: evaluation methodologies

Much of the discussion in 1966 has continued to revolve around methodological issues and has consisted largely of a repetitious dissection of a very limited amount of experimental activity with little theoretical basis. Refutations of rebuttals are often interesting but they typically generate little additional knowledge. Thus, the dialogue between those who accept the Cranfield methodology and those who, for a variety of reasons, are critical of it, has not been particularly productive from the point of view of advancing the state of the art. There is, and can be, no one way to test and evaluate retrieval systems, and it is absurd to imagine that any particular testing technique, or set of measures, will solve the problem of evaluation. Rather than engage only in carping criticism of the deficiencies of any one research project (thus giving rise to a new round of justifications of the procedures employed), it would be more desirable to devise and test alternative methodologies. Unfortunately, only scattered instances of this more positive approach can be found.

This is Alan Rees in 1967, as quoted by Cyril Cleverdon, one of the founders of modern information retrieval, in 1968. A common problem in many areas of research, especially in their youth. I love that Cleverdon doesn't directly go after his (or at least his methodology's) detractors, but quotes someone else doing it.

Presentation tips

I recently attended my school's research conference, where PhD candidates present their work. While there's lots of great research going on, some of the speakers weren't able to get that great research across effectively. I'm no great speaker, and my talk at the conference was completely devoid of content, so my advice is freely ignorable. Take it or leave it.

Don't cover too much

I know you want to include all the cool things in your presentation but all you're really doing is forcing yourself to speak fast so that you can fit everything in. Either that or you're going over time, which is just rude. And if you talk into your question time then you might miss some of the best stuff that can happen at a gig like this -- interaction. I know you might be scared of it, but it will do you good.

So take your time, don't have too much content, even have some slides that you can skip if you're running short of time.

Don't crowd your slides

On the subject of slides, cut the text! If you have a heap of text, people will read the text and they won't listen to you talk. People can't focus on both you and the slides, so lots of text is just distracting. Slides should provide some structure but they shouldn't provide all the content.

You should also think very carefully before including tables of data in your presentation. Graphs are fine, as long as they're not too complex, but tables are hard for your audience to absorb at a glance. If a table isn't absolutely crucial to your talk, leave it out.

Don't distract your audience

Laser pointers are like digital watches. You might think they're cool, but you're wrong. I've seen very few good uses of laser pointers in presentations (mostly pointing out interesting spots on graphs), and very many bad uses (like when the speaker reads the slides and follows along with the laser pointer, karaoke style). If you really must draw attention to something on your slides (you've already cut the text down, right, so there shouldn't be much distraction on there anyway) consider using the presentation software itself to call things out (but be subtle about that too, no vomit-inducing animations). And if a laser pointer is essential, practise holding it steady. Don't wave it around like a light sabre.

Most importantly ...

... try to have a good time!