Page:Aaron Swartz s A Programmable Web An Unfinished Work.pdf/25


2. BUILDING FOR USERS: DESIGNING URLS 13


    http://posterous.com/p/235
    http://posterous.com/p/236

Nothing wrong with that. However, if your site is a little more popular, the IDs can get quite long and confusing:

    http://books.example.org/b/30283833

In a situation like this, you might want to encode numbers using base 36 instead of base 10. Base 36 means you get to use all the letters in addition to the digits, but only one case, so there’s no confusion about how to capitalize an identifier. (Imagine someone reading the URL over the phone. It’s a lot easier to say “gee, five, enn, four” than “lower-case gee, the number five, upper-case enn, the number four.”)

To be super-careful, you might want to go a step further and skip any numbers whose encoded form contains zero, O, one, L, or I, since those characters are easily confused with one another.

The result is that you have URLs that look like:

    http://books.example.org/b/3j7is

and end up being a lot shorter. While four base 10 digits can only go up to 9999, in base 36 zzzz is actually 1,679,615. Not bad.
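A minimal sketch of the encoding described above, in Python. The function names, the choice of lowercase letters, and the exact set of confusable characters are assumptions made for illustration; the text does not prescribe an implementation.

```python
import string

# Digits 0-9 followed by lowercase a-z: 36 symbols, one case only.
ALPHABET = string.digits + string.ascii_lowercase

# Assumed set of easily confused characters: zero, O, one, L, I (lowercase forms).
CONFUSABLE = set("0o1li")


def encode_base36(n: int) -> str:
    """Return the base 36 representation of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base36(text: str) -> int:
    """Parse a base 36 string (either case) back into an integer."""
    return int(text, 36)


def is_unambiguous(identifier: str) -> bool:
    """True if the identifier contains none of the confusable characters."""
    return not (set(identifier.lower()) & CONFUSABLE)


def next_unambiguous_id(n: int) -> int:
    """Smallest integer >= n whose base 36 form passes is_unambiguous."""
    while not is_unambiguous(encode_base36(n)):
        n += 1
    return n


print(encode_base36(1679615))  # zzzz
print(decode_base36("zzzz"))   # 1679615
```

Skipping ambiguous identifiers this way wastes some of the number space, so the IDs grow a little faster than plain base 36 would; whether that trade is worth it depends on how often people really read your URLs aloud or copy them by hand.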

One problem with numerical identifiers, however, is that they’re not “optimized” for search engines. Search engines don’t just look at the content of a page to decide whether it’s a good result for someone’s search; they also look at the URL, which, because it’s so limited, is given special weight. But if your URLs are just numbers, they’re unlikely to contain anything that matches people’s search queries, making the pages less likely to be found in search results. To fix this, people append some text after the number, as in:

    http://www.hulu.com/watch/17003/saturday-night-live-weekend-update-judy-grimes
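One way such a URL might be generated is to reduce the page title to lowercase words joined by hyphens and append the result to the number. The sketch below is an assumption about how a site could do this, not a description of any particular site's code; `slugify`, `watch_url`, the host `video.example.org`, and the sample title are all made up for illustration.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its runs of ASCII letters and digits with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def watch_url(video_id: int, title: str) -> str:
    """Build a URL of the form /watch/<number>/<text-from-title> on a hypothetical host."""
    return f"http://video.example.org/watch/{video_id}/{slugify(title)}"


print(watch_url(17003, "Weekend Update: Judy Grimes"))
# http://video.example.org/watch/17003/weekend-update-judy-grimes
```

This version simply drops punctuation and any non-ASCII characters; a real site would probably want to transliterate accented letters and cap the length of the text.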

The text at the end is part of the URL, but it’s not used to identify the right page. Instead, the system looks only at the number. Once it has found the page, it looks up the current title it has for that number, checks whether it matches the text in the URL, and, if it doesn’t, redirects users to the correct URL. That way they can type in:

    http://www.hulu.com/watch/17003/
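The lookup-and-redirect behavior described above could be sketched roughly as follows. Everything here is hypothetical: the `CURRENT_SLUGS` table stands in for whatever store holds the current title text for each number, the slug shown is a placeholder, and the use of a permanent (301) redirect is an assumption, since the text only says that users are redirected.

```python
import re
from typing import Optional, Tuple

# Hypothetical store mapping each number to the text currently derived from its title.
CURRENT_SLUGS = {17003: "current-title-slug"}

# /watch/<number>, optionally followed by /<text>
PATH_RE = re.compile(r"/watch/([0-9]+)(?:/(.*))?")


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, canonical_path) for a request path.

    Only the number identifies the page. The trailing text is compared
    against the current one, and any mismatch produces a redirect.
    """
    match = PATH_RE.fullmatch(path)
    if match is None:
        return 404, None
    video_id = int(match.group(1))
    current = CURRENT_SLUGS.get(video_id)
    if current is None:
        return 404, None
    canonical = f"/watch/{video_id}/{current}"
    supplied = match.group(2) or ""
    if supplied != current:
        return 301, canonical  # assumed: permanent redirect to the canonical URL
    return 200, canonical


print(resolve("/watch/17003/"))           # (301, '/watch/17003/current-title-slug')
print(resolve("/watch/17003/old-title"))  # (301, '/watch/17003/current-title-slug')
```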