Page:Aaron Swartz s A Programmable Web An Unfinished Work.pdf/35

This page has been proofread, but needs to be validated.

3. BUILDING FOR SEARCH ENGINES: FOLLOWING REST 23

can pull up your file, cookies can allow servers to build pages customized just for you. There’s nothing wrong with that.

(Although you have to wonder whether users might not be better served by the more secure Digest authentication features built into HTTP, but since just about every application on the Web uses cookies at this point, that’s probably a lost cause. There’s some hope for improvement in HTML5 (the next version of HTML) since they’re– oh, wait, they’re not fixing this. Hmm, well, I’ll try suggesting it.)[1]

The real problem comes when you use cookies to create sessions. For example, imagine if Amazon.com just had one URL: http://www.amazon.com/. The first time you visited it’d give you the front page and a session number (let’s say 349382). Then, you’d send call back and say “I’m session number 349382 and I want to look at books” and it’d send you back the books page. Then you’d say call back and say “I’m session number 349382 and I want to search for Dostoevsky.” And so on.

Crazy as it sounds, a lot of sites work this way (and many more used to). For many years, the worst offender was probably a toolkit called WebObjects, which most famously runs Apple’s Web store. But, after years and years, it seems WebObjects might have been fixed. Still, new frameworks like Arc and Seaside are springing up to take its place. All do it for the same basic reason: they’re software for building Web apps that want to hide the Web from you. They want to make it so that you just write some software normally and it magically becomes a web app, without you having to do any of the work of thinking up URLs or following REST. Well, you may get an application you can use through a browser out of it, but you won’t get a web app.

The next major piece of Web architecture is caching. Since we have this long series of stateless requests, it sure would be nice if we could cache them. That is, wouldn’t it be great if every time you hit the back button, your browser didn’t have to go back to the server and redownload the whole page? It sure would. That’s why all browsers cache pages—they keep a copy of them locally and just present that back to you if you request it again.

But there’s no reason things need to be limited to just browser caches. ISPs also sometimes run caches. That way, if one person downloads the hot new movie trailer, the ISP can keep a copy of it and just serve the same file to all of their customers. This makes things much faster for the customers (who aren’t competing with the whole world for the same files) and much easier on the server operator

  1. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-October/016742.html.