27 Aug

Tantek – How many ways can you slice a URL and name the pieces?

I was developing a small single purpose microsite and decided to build it using CASSIS not just for application logic, but for the server-side runtime execution and flow as well. I figured the needs of a simple real world site would work well to drive the design of a simple runtime.

No need to invent anything new, just re-use Apache/CGI environment variables (e.g. as used in PHP, like SERVER_NAME). But they look like old C constants, and CASSIS coders will be more familiar with Javascript.

Window.location’s properties seem reasonable, until you get to "search" for the "?" query part of a URL. What about the source, the specs for URL and HTTP? And that’s when I started to see the problem.

With a little more research I found a half-dozen different ways to slice and dice URLs. Kevin Marks asked me, what about Python? And that made seven. I published my research publicly on the microformats wiki, which is a good place to document existing formats for something (a key step in the microformats process).

Among all the differences (and overloading of the same terms to mean different things) it did seem that there were some patterns. So I made a diagram of a sample URL, chopped into pieces and named according to seven different conventions over the years, in the hopes that doing so might reveal such patterns.

via Tantek – How many ways can you slice a URL and name the pieces?. Or standards are so awesome everyone keeps making a new one.