Seems Atom is popping up everywhere in my world at the moment, after a bit of a quiet period.

Chris J. Davis announced his plans to create an AtomPub client for iPhone. I had a discussion with Rick Cockrum about AtomPub and the metaweblog API, on his blog, which was raised again on the Habari developers list. And then Habari trunk was merged into the AtomPub development branch, because no-one has had a chance to do any work there recently. Well, no-one being me, and I got a poke from freakerz. Better start working on some of those AtomPub tickets.

The Habari developers have done a great job getting AtomPub implemented early on. The AtomPub implementation is now error free, according to the APE, but there are still warnings and it is still incomplete. As I see things, these are the outstanding issues that I know about, in no particular order. Each of these should probably be entered as a separate issue in the tracker. As I'm not super familiar with the spec, I've probably missed bits that Habari doesn't support. If any of this is wrong, or it's incomplete, or you think my solution is off, please let me know. [I've posted a similar message to habari-dev.]

Atom and AtomPub share a URL

While this works okay, because Habari uses the presence of authentication details to tell the difference between a request for a feed and a request for an AtomPub service document, it relies on the client sending a correct username and password with the initial request. If the client wants a service document and they don't send authentication, they'll get a feed instead of a 401 authentication challenge.
Solution: Provide a separate URL for the service document, probably /atom/service.

PHP as a CGI

When PHP is running as a CGI the HTTP Authorization header is not passed on to the script. Currently this means that authorisation will fail.
Solution: There is a simple (but ugly) workaround that uses mod_rewrite to write variables that the script can read. I have a patch ready for this, but haven't submitted it yet.
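To make the script side of that workaround concrete, here is a minimal sketch in Python rather than PHP, so it stays self-contained. It assumes the rewrite rule has copied the raw Authorization value into a variable the script can read. A commonly used rule of this kind is RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}], though whether the patch uses exactly that is an assumption. The function name parse_basic_auth is hypothetical.

```python
import base64


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic Authorization value, or None.

    header_value is the raw header text, e.g. "Basic YWxpY2U6c2VjcmV0",
    as it might be recovered from a variable set by mod_rewrite (assumed).
    """
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers both invalid base64 and invalid UTF-8.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```

If this returns None, the server should answer with a 401 challenge rather than silently treating the request as anonymous.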

Support WSSE authentication

Habari currently supports only HTTP Basic Authentication. Some clients and devices only implement WSSE Authentication, such as the Nokia N73.
Solution: As a proof of concept in the past, I implemented WSSE in WordPress. One slight catch, and the reason it didn't work in WordPress, is that WSSE requires plain text passwords to be retrievable on the server. Are they available in Habari? If so, it should be trivial for me to write a patch.

AtomPub categories support

Atom entries can contain category elements. While Habari doesn't support categories, it does support tags, so categories supplied in an Atom entry should be written as tags on the post.
Solution: I've submitted a patch to do this.
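As an illustration only (this is not the submitted patch, and it is Python rather than Habari's PHP), the mapping could look something like the sketch below. It reads the term attribute of each atom:category child of the entry and returns the values as a list of tag names. The function name category_terms is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def category_terms(entry_xml):
    """Return the distinct category terms of an Atom entry, in document order."""
    entry = ET.fromstring(entry_xml)
    terms = []
    # Only direct children of the entry are considered.
    for category in entry.findall(f"{{{ATOM_NS}}}category"):
        term = (category.get("term") or "").strip()
        if term and term not in terms:
            terms.append(term)
    return terms


sample_entry = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <category term="atom"/>
  <category term="habari" scheme="http://example.org/tags"/>
  <category term="atom"/>
</entry>
"""
tags = category_terms(sample_entry)
```

The scheme and label attributes are ignored here, which loses information if a client relies on them.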

AtomPub draft support

Habari does not currently support the AtomPub protocol provision for a client to send a request for an Atom entry to be considered a draft.
Solution: Detect the app:control element in Atom entries and honour the app:draft setting, where a value of 'yes' means the entry is a draft.
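A minimal sketch of that detection, in Python rather than PHP so that it is self-contained. It assumes the entry arrives as a well-formed XML string and uses the AtomPub namespace http://www.w3.org/2007/app. The function name is_draft is hypothetical.

```python
import xml.etree.ElementTree as ET

APP_NS = "http://www.w3.org/2007/app"


def is_draft(entry_xml):
    """Return True if the entry carries app:control/app:draft with value 'yes'."""
    entry = ET.fromstring(entry_xml)
    draft = entry.find(f"{{{APP_NS}}}control/{{{APP_NS}}}draft")
    return draft is not None and (draft.text or "").strip() == "yes"


draft_entry = """
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:app="http://www.w3.org/2007/app">
  <title>Work in progress</title>
  <app:control><app:draft>yes</app:draft></app:control>
</entry>
"""
```

An entry with no app:control element, or with app:draft set to 'no', is treated as publishable.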

Atom media collection support

Habari does not currently support the upload of media collections.
Solution: This is contingent upon Habari implementing file uploads, and will be slightly more complex than publishing of simple Atom entries because there's a level of indirection between uploaded resources and their Atom entries.

AtomPub summary support

I'm not sure I've got a handle on this one completely. In the section entitled Media Link Entries, the AtomPub spec says, "[RFC4287] specifies that Atom Entries MUST contain an atom:summary element" but the Atom spec clearly says in both the (normative) text and (non-normative) RelaxNG fragments that atom:summary is optional. The Atom spec says, "It is advisable that each atom:entry element contain ... a non-empty atom:summary element when the entry contains no atom:content element." So, given an uploaded media entry, Habari may create a summary element for the media link entry associated with the media entry.
Solution: I don't really see any way of producing a summary automatically. I think the only thing we can do is to support summary elements provided by the client. As file uploads have not yet been implemented, it will be trivial to add a summary column when they are.

Support foreign markup delivered by AtomPub

I'm a bit hazy on this one too. The Ape produces a warning when Habari drops a subject element that is a child of the entry element. The relevant section seems to be section 6 of the Atom spec, but it doesn't specifically say that foreign markup should be preserved. Section 6.2 of the AtomPub spec says, "Unrecognized markup in an Atom Publishing Protocol document is considered "foreign markup" [...] Clients SHOULD preserve foreign markup when transmitting such documents." While this isn't talking about servers, it implies foreign markup in legal locations should be preserved. I might be missing a reference here, but it seems to me the spec isn't clear on this point.
Solution: Explore the issue more.

Decouple app:edited and atom:updated

Atom entries can have a client provided atom:updated element. The purpose of this element is to signify that a significant change has been made to the entry. AtomPub provides the app:edited element, which is a server provided timestamp representing the last time that the entry was changed in any way. An AtomPub server can override the atom:updated value and use its own, which Habari currently does. Habari's Post class's updated column is written on all edits and is used to fill app:edited elements.

Solution: Create an edited column for the Post class, and use this to populate app:edited, and use the client provided atom:updated value (or a timestamp if it isn't provided) to fill the updated column. This would have the side effect that with a small UI change in admin, users could mark things as minor edits, such as fixing typos or adding tags. However, changing Post would also have far-reaching consequences. We could also sort the Atom feed by atom:updated, as this makes semantic sense and order is not mandated in the Atom spec, and still have AtomPub sending in app:edited order.
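To show how the two timestamps would move independently, here is a rough sketch in Python (not Habari's actual Post class). The Post dataclass, the apply_edit function and the minor flag are all hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Post:
    content: str
    updated: datetime  # feeds atom:updated (significant changes only)
    edited: datetime   # feeds app:edited (any change at all)


def apply_edit(post, content, now, client_updated=None, minor=False):
    """Apply an edit, always touching 'edited' but only sometimes 'updated'."""
    post.content = content
    post.edited = now
    if client_updated is not None:
        # The client supplied atom:updated, so honour it.
        post.updated = client_updated
    elif not minor:
        # No client value: fall back to the server timestamp,
        # unless the edit was flagged as minor.
        post.updated = now
    return post


created = datetime(2007, 9, 1, 12, 0, tzinfo=timezone.utc)
later = datetime(2007, 9, 2, 8, 30, tzinfo=timezone.utc)
post = Post("Frist post", updated=created, edited=created)
apply_edit(post, "First post", now=later, minor=True)
```

After the typo fix above, post.edited has moved to the later time while post.updated still holds the original timestamp.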

get_collection and get_entry violate DRY

Solution: Refactor the code.

  • Write a mini AtomPub desktop client with a Shoes frontend
  • Write a bibliographic tool (distributed?)
  • Complete the AtomPub implementation in Habari (media and comments collections, content types, categories (tags), summary)
  • Implement a templated workflow system
  • Migrate to Habari
  • ... um, finish my PhD, I guess

freakerz asked an interesting question on the habari issues tracker.

How should we handle this kind of behavior? If the client does not use the APP properly, should we prevent the loss of information by catching/handling bad requests?

Of course, this is time to mention Postel's law, "be conservative in what you do, be liberal in what you accept from others." Again, of course, this solves nothing because there is an enormous amount of argument about what that means for implementors. There's already been a lot of discussion of this in relation to how aggregators should handle broken feeds.

The AtomPub spec does have something to say about how consumers should behave when they receive non-conforming content.

The Atom Protocol imposes few restrictions on the actions of servers. Unless a constraint is specified here, servers can be expected to vary in behavior, in particular around the manipulation of Atom Entries sent by clients. [...] Servers can choose to accept, reject, delay, moderate, censor, reformat, translate, relocate or re-categorize the content submitted to them. [...] The same series of requests to two different publishing sites can result in a different series of HTTP responses, different resulting feeds or different entry contents.

Aside: The anchor to this section is named "lark's vomit." Those wacky spec authors, I did but chuckle.

This doesn't answer the question at all for server implementers. It simply says you can do whatever you like when you receive stuff. The end user doesn't care, but they don't want to have their content rejected. This is a competitive business, and you don't want to lose clients, and you will if a competitor accepts something that you don't. People will switch.

Well-formedness is a minimum, so processing should stop for anything that's not well-formed. But it should stop in a nice way that upsets the user as little as possible. So, we're really talking about invalid entries. Should there be specific code for producers that are known to create borked entries? More usefully, like testing for browser behaviour rather than a specific user agent, consumers should do their best to catch classes of errors and deal with them appropriately.

The server should accept all content and fail gracefully when it can't be consumed properly. When broken content is received, server implementers have a responsibility to do everything they possibly can to work with the producer's developers to fix the problem, but in the meantime they should try to handle the broken content.

Another suggestion was that we do away with the Atom autodiscovery <link> element and just use an HTTP header, because parsing HTML is perceived as being hard and parsing HTTP headers is perceived as being simple. This does not work for Bob either, because he has no way to set arbitrary HTTP headers. It also ignores the fact that the HTML specification explicitly states that all HTTP headers can be replicated at the document level with the <meta http-equiv="..."> element. So instead of requiring clients to parse HTML, we should just require them to parse HTTP headers… and HTML.

I was excited to learn recently that the Nokia N73 can speak AtomPub, and that a friend of mine owns one. I thought I'd try to make it talk to the new AtomPub implementation in WordPress, but reading through the N73 documentation I found that it only supports WSSE authentication, and WordPress only speaks HTTP Basic Authentication. I'd never heard of WSSE, but Mark Pilgrim has a good write up on XML.com, and the Ape has the ability to speak WSSE, so I thought I'd implement it in WordPress. Bear in mind that I'm not writing this from a security point of view, I'm just looking at authentication as a necessary evil to get cool AtomPub things working. And there's a spoiler: it can't be done :)

A WSSE client will send an Authorization header which, as we know, will get dropped if Apache is passing the request off to a CGI, and an X-WSSE header that looks like this:
X-WSSE: UsernameToken Username="USERNAME", PasswordDigest="PASSWORDDIGEST", Nonce="NONCE", Created="2007-09-08T05:52:36Z"

PasswordDigest is a base64 encoded SHA1 digest of the concatenation of the nonce, the timestamp and the password. The nonce is of course some random string.
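The digest calculation described above can be sketched in a few lines. This is Python rather than PHP, purely to keep the example self-contained, and the function name wsse_password_digest is hypothetical. It assumes the nonce and timestamp are concatenated exactly as they appear in the header.

```python
import base64
import hashlib


def wsse_password_digest(nonce, created, password):
    """Return base64(SHA1(nonce + created + password)) as text."""
    raw = (nonce + created + password).encode("utf-8")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


# Example values only; the nonce would normally be random.
digest = wsse_password_digest("d36e316282959a9ed4c89851497a717f",
                              "2007-09-08T05:52:36Z",
                              "not-a-real-password")
```

Note that the raw 20-byte SHA1 value is what gets base64 encoded, not its hexadecimal representation.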

So, to add WSSE into WordPress AtomPub, we can add some code to the authentication function in wp-app.php.

First, we check if the client is trying to authenticate using WSSE by looking for an X-WSSE header.
if (isset($_SERVER['HTTP_X_WSSE'])) {
    $username_token = $_SERVER['HTTP_X_WSSE'];

We then take the Username Token contained therein and split out the user, digest, nonce, created information sent by the client. There are probably nicer ways to do this.
    $wsse = array('user' => "", 'digest' => "", 'nonce' => "", 'created' => "", 'password' => "");
    $tokens = explode(", ", trim(strstr(stripslashes($username_token), " ")));
    foreach ($tokens as $token) {
        $pivot = strpos($token, '=');
        $key = substr($token, 0, $pivot);
        $value = trim(substr($token, $pivot + 1), '"');
        switch ($key) {
            case "Username": $wsse['user'] = $value; break;
            case "PasswordDigest": $wsse['digest'] = $value; break;
            case "Nonce": $wsse['nonce'] = $value; break;
            case "Created": $wsse['created'] = $value; break;
        }
    }

Finally, we recreate the digest on the server, and compare it to what was sent, and close the if.
    $wsse['password'] = get_password_by_login($wsse['user']);
    // pack("H*", ...) turns the hex SHA1 string into raw bytes before encoding.
    $server_digest = base64_encode(pack("H*", sha1($wsse['nonce'] . $wsse['created'] . $wsse['password'])));
    if ($server_digest == $wsse['digest']) {
        $login_data = array('login' => $wsse['user'], 'password' => $wsse['password']);
    }
}

If you have familiarity with WordPress's code, you might be saying something like, "WTF is this get_password_by_login() function call? I've never seen such a thing!" Good question. And the dirty little secret is that no such function exists. A weakness of the WSSE authentication scheme appears to be that to recalculate the digest the password needs to be stored in plain text on the server. This is probably at least as bad as sending the password in plain text over the wire, the thing that WSSE is trying to avoid. WordPress, sensibly, does not store passwords in plain text, but computes an md5 hash of them and stores that.

So, as far as I can tell, there is no way to implement WSSE in WordPress in any sensible way.

One little word on security. If we could implement WSSE, the code should keep track of nonces and make sure they aren't repeated, and should reject UsernameTokens created more than a couple of minutes ago (leaving aside any discussion of synchronisation of your client's clock with my server).
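Those two checks could be sketched roughly as follows, in Python for self-containment. The class name NonceGuard, the in-memory dictionary and the two-minute window are all assumptions. A real server would need shared, persistent storage. This sketch also rejects timestamps that are too far in the future, which goes slightly beyond what is described above.

```python
from datetime import datetime, timedelta, timezone


class NonceGuard:
    """Reject stale UsernameTokens and repeated nonces (in-memory sketch)."""

    def __init__(self, max_age=timedelta(minutes=2)):
        self.max_age = max_age
        self._seen = {}  # nonce -> parsed Created timestamp

    def accept(self, nonce, created, now):
        try:
            created_at = datetime.strptime(
                created, "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
        except ValueError:
            return False
        # Too old, or too far in the future.
        if abs(now - created_at) > self.max_age:
            return False
        # Forget nonces whose tokens would now be rejected on age alone.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= self.max_age}
        if nonce in self._seen:
            return False
        self._seen[nonce] = created_at
        return True


guard = NonceGuard()
now = datetime(2007, 9, 8, 5, 53, 0, tzinfo=timezone.utc)
first = guard.accept("abc123", "2007-09-08T05:52:36Z", now)
replay = guard.accept("abc123", "2007-09-08T05:52:36Z", now)
```

Here the first request is accepted and the replay of the same nonce is refused.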

P.S. I hadn't read Joe Cheng's comment or Joseph Scott's reply in the comments of the post I linked to above before I started off on this wild goose chase.

I mentioned previously that the AtomPub server in my WordPress installation wasn't successfully deleting entries. More specifically, I get a 403 Forbidden when trying to PUT or DELETE posts or media files. I posted to the wp-testers mailing list and Joseph Scott passed the question along to Sam Ruby, Tim Bray, Elias Torres, and Pete Lacey, and I basically eavesdropped on their conversation.

Turns out that the problem is likely to be a firewall rejecting PUTs and DELETEs. This started a discussion of how WordPress should work around the problem. Some options are URL munging, custom headers, or message in a message. All horrible for various and obvious reasons, and none going to make it into 2.3. Sam cautioned, and suggested revisiting PacePutDelete, a method proposed by Joe Gregorio on the Atom lists to overcome a lack of PUT and DELETE support on clients, which led me off on a fascinating journey through the Atom mailing list archives.

I'm not sure how big the problem really is now. It may take more time and effort to come up with a "fix" than it would to lobby purveyors of broken internet devices to really fix them. The biggest concern with a workaround, however, is that it may slow said fix. Why should we fix our firewall when your client can just use X-Method-That-Really-Is: delete? If such a workaround is implemented it should be a fallback in case a correct request fails.
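The "fallback only" idea might look like this on the client side. It is a sketch in Python, with the transport passed in as a function so nothing here depends on a particular HTTP library. The X-HTTP-Method-Override header is one convention that exists in the wild, but using it here is an assumption, not something WordPress or the AtomPub spec defines. The function name delete_with_fallback and the choice of status codes that trigger the fallback are also assumptions.

```python
def delete_with_fallback(send, url):
    """Try a real DELETE first and tunnel through POST only if it is refused.

    send(method, url, headers) must return the HTTP status code.
    """
    status = send("DELETE", url, {})
    if status in (403, 405, 501):
        # The correct request failed, possibly at an intermediary, so retry
        # as a POST that declares the intended method in a header.
        status = send("POST", url, {"X-HTTP-Method-Override": "DELETE"})
    return status


calls = []


def fake_send(method, url, headers):
    """Pretend to be a network path where a firewall blocks DELETE."""
    calls.append((method, headers))
    return 403 if method == "DELETE" else 200


result = delete_with_fallback(fake_send, "http://example.org/atom/entry/1")
```

Because the correct request is always attempted first, a fixed firewall immediately stops the workaround from being exercised.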

It sure is frustrating that the solution is baked right in to HTTP.

I've posted a small WordPress AtomPub FAQ. It's a temporary home until I can find somewhere sensible to put it. If you have anything you'd like to add or correct or mock, comment here. If you can offer a sensible place to put it, let me know.

So, WordPress 2.3 beta adds support for AtomPub. All good. I installed it (separately to this blog, I'm just playing around), but all I could get out of the APE was a 401, even though I'd provided the correct authentication credentials. Looking at the code, with liberal use of the logging therein, I worked out that PHP_AUTH_* weren't being set, so I pulled out some auth code and tried it on its own. No luck. Weird. I then grabbed a previously working snippet and tried that, but it was broken too. On both the servers to which I have easy access. I sent the snippet to Donal, and the snippet worked for him. WTF?

Turns out that both the servers I was testing on are running PHP as a CGI, not using mod_php, and if PHP is being run as a CGI PHP_AUTH_* aren't available. Who knew? Well, someone, but not me. And they knew a workaround too.

Thanks for your help, Donal! And to all the cool people who are working on this (Elias Torres, Pete Lacey, Sam Ruby, Tim Bray get special mention).

The APE now exercises my beta blog. Now I have to work out why it can't delete stuff ...