URL encoding in Web applications can be a pain, and .NET, with its various utilities that all behave slightly differently for various edge cases, doesn't make it any easier. I wrote about the pain of URL encoding in .NET before. The conclusion of that previous post was that Uri.EscapeDataString() is as close as it gets to a best solution out of the box.
But even with that knowledge I ran into trouble again with this topic, this time with URL paths. As part of an update to an old WebForms application I added routing in order to provide cleaner URLs for accessing the product pages and categories. Here I'm not actually encoding query string parameters or post values, but instead encoding path segments on URL routes.
Essentially, I created extensionless URLs for a few select pages of the application. When encoding extensionless URLs, extra care has to be given to properly encoding path strings, as paths are more sensitive to special rules that determine how the paths are parsed by the Web server and ASP.NET.
Routing 101 in Web Forms
The process of adding routing features to an old Web Forms application is pretty straightforward, and a few MapPageRoute() calls make short work of it.
In this case I'm routing URLs in my Web Store by mapping out products and categories like this (fired off global.asax's Application_Init()):
// Specific Route mapping
routes.MapPageRoute("ProductPage", "product/{sku}", "~/item.aspx");
routes.MapPageRoute("ProductPageWithQty", "product/{sku}/{qty}", "~/item.aspx");
// List Views Routings
routes.MapPageRoute("ProductCategory", "products/{category}", "~/itemlist_abstract.aspx");
routes.MapPageRoute("ProductWithoutCategory", "products", "~/itemlist_abstract.aspx");
This turns URLs like:
item.aspx?sku=ProductID
into
product/ProductID
and
itemlist.aspx?Category=Books
into
products/Books
Note that unlike MVC, there is no direct support for optional parameters in MapPageRoute(), so each path configuration requires its own explicit route config.
So far so good. This is nice and easy to accomplish even in a WebForms application. This is an old app so I only updated a few URLs that are the most commonly externally accessed and crawled links, but it would be easy enough to do most of the application links using a similar approach.
Capturing the Route Data in WebForm
Capturing the RouteData in the routed pages is also very easy to do. Previously the code captured only the query string; now it checks both the query string and the RouteData collection for the URL parameters. Here's the item SKU and QTY mapping logic:
void GetSkuAndQty()
{
    Sku = Request.QueryString["Sku"];
    if (string.IsNullOrEmpty(Sku))
    {
        Sku = RouteData.Values["sku"] as string;
    }
    string Qty = Request.QueryString["qty"];
    if (string.IsNullOrEmpty(Qty))
    {
        Qty = RouteData.Values["qty"] as string;
    }
    // redirect permanently to new url
    if (Request.Url.AbsoluteUri.Contains(".aspx"))
    {
        if (!string.IsNullOrEmpty(Sku))
        {
            string newUrl = "~/product/" + Sku + "/" + Qty;
            Response.RedirectPermanent(ResolveUrl(newUrl));
        }
    }
}
From here the code is identical to the original code, using the Sku and Qty to load the Item business object and displaying the inventory item purchase UI.
Creating Route Links Manually
In many places in the application the links to the product and category pages are generated in code. It seems easy enough, using code like this (for the category list):
ItemListForm = ResolveUrl("~/products");
foreach (LineItem item in LineItems)
{
    sb.AppendFormat(@"<div class='menurow'><a class='menulink' href='{0}/{1}'>{2}</a></div>",
        ItemListForm,
        Uri.EscapeDataString(item.Category),
        HttpUtility.HtmlEncode(item.Category));
}
Note that I URL encode the category for the URL and HtmlEncode the category for the display text.
This produces URLs like:
products/Books and products/Development%20Tools. We're golden, right?
It works great - until it doesn't!
Yes it works great, until you use a few categories that use special formatting. This is not obvious, because the vast majority of categories work just fine - it's just a couple of specific ones that will fail.
The problem shows up when a name includes certain special characters. Specifically a . (period) or a # can throw all this off. .NET and C# are good examples of where this can get hosed.
Dot me Not
So this URL is a problem:
products/.NET
Note that EscapeDataString() doesn't encode the period. Per the URI spec that's actually correct, since . is an unreserved character that doesn't need to be URL encoded. Even if you DO fix the period to:
products/%2ENET
it still causes problems, as the value is still decoded to .NET.
Why? IIS/ASP.NET doesn't parse this URL as an extensionless URL. Because of the period, the last segment looks like a file name with an extension, so ASP.NET treats the request like a page that cannot be found. Luckily there's a simple workaround for this problem, which is to add a trailing slash:
products/.NET/
works just fine. For generated code that is. If you generate URLs in your app it's easy enough to slap on the trailing slash. But if somebody decides to navigate to your site via a manually typed URL without the trailing slash they'll get a failure.
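For generated links, the trailing slash can be centralized in one small helper. The following is only a sketch: RouteLinks and BuildSegmentUrl are hypothetical names that are not part of the original application, and the helper assumes the base path has already been resolved (for example via ResolveUrl()).

```csharp
using System;

public static class RouteLinks
{
    // Hypothetical helper: encodes one path segment with
    // Uri.EscapeDataString() and always appends a trailing slash,
    // so a segment like ".NET" is not mistaken for a file extension.
    public static string BuildSegmentUrl(string basePath, string segment)
    {
        string root = basePath.TrimEnd('/');

        // No segment: just return the base path with a trailing slash.
        if (string.IsNullOrEmpty(segment))
            return root + "/";

        return root + "/" + Uri.EscapeDataString(segment) + "/";
    }
}
```

With this, RouteLinks.BuildSegmentUrl("/products", ".NET") yields /products/.NET/ and RouteLinks.BuildSegmentUrl("/products", "Development Tools") yields /products/Development%20Tools/. The period is left alone by the encoder, so only the trailing slash makes the first URL routable.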
Don't be a #ie
The other one that has caused me pain is a # in the url, for example: C#. If you try using:
products/c#
or
products/c#/
you find that RouteData.Values["category"] returns just 'c' rather than 'c#'. The problem is that the hash character (#) has meaning in a URL: it starts the fragment, which is meant for page-level anchor jumps. More recently # has also been hijacked for history management in AJAX/SPA applications, but regardless, a # in a URL is not treated as content. The browser doesn't send anything from the # onward to the server.
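The split is easy to see with System.Uri, which is used here only as a stand-in for how a client parses the URL. FragmentDemo and the example.com host are made up for illustration.

```csharp
using System;

public static class FragmentDemo
{
    // Returns the path and the fragment as System.Uri parses them.
    public static (string Path, string Fragment) Split(string url)
    {
        var uri = new Uri(url);
        return (uri.AbsolutePath, uri.Fragment);
    }
}
```

FragmentDemo.Split("http://example.com/products/c#/") returns the path /products/c and the fragment #/, so the routing code only ever has 'c' to work with. With the hash encoded, FragmentDemo.Split("http://example.com/products/c%23/") returns the path /products/c%23/ and an empty fragment.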
In this scenario URL encoding DOES solve the # problem, as long as you use Uri.EscapeDataString() rather than Uri.EscapeUriString(). The latter is meant for encoding complete URLs, so it leaves reserved characters like # untouched.
So this URL:
products/c%23
works just fine. Again this sucks for the user who happens to type this in manually, but again, that's an edge case.
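To see how the encoders differ on path segments, a small console program like the following can help. It is a sketch rather than code from the application: EncoderComparison is a made-up name, HttpUtility requires a reference to System.Web on the full framework, and Uri.EscapeUriString() is flagged as obsolete on newer .NET versions, so expect a compiler warning there.

```csharp
using System;
using System.Web; // HttpUtility

public static class EncoderComparison
{
    public static void Main()
    {
        // Two category names from the examples above.
        foreach (string category in new[] { "C#", "Development Tools" })
        {
            Console.WriteLine(category);
            Console.WriteLine("  EscapeDataString: " + Uri.EscapeDataString(category));
            Console.WriteLine("  EscapeUriString:  " + Uri.EscapeUriString(category));
            Console.WriteLine("  UrlEncode:        " + HttpUtility.UrlEncode(category));
        }
    }
}
```

For C# the three calls produce C%23, C# and C%23. For Development Tools they produce Development%20Tools, Development%20Tools and Development+Tools. Only EscapeDataString() gives a usable path segment in both cases: EscapeUriString() leaves the # in place, and UrlEncode() turns the space into a +, which is a query string convention rather than a path encoding.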
Summarizing Path Encoding
What I described here applies to any non-MVC routing scenarios, using the standard ASP.NET routing mechanisms. MVC adds a host of features on top of routing that solve these issues - mainly through ActionLink() functionality which is effectively a URL builder based on existing routes. The raw ASP.NET routing has no such construct so it's up to you to ensure that urls are properly encoded and terminated.
When you're encoding extensionless URLs, extra care has to be given to properly encoding path strings, as they are a bit more sensitive than query string values. Basically you're creating a path, so all the rules for URL path formatting apply, and those are much stricter than what's legal in query strings. If you have many long, complex strings to pass, it's probably better to stick to query strings or POST data.
If you do use paths for route parameters remember to:
Always terminate your routed paths with a / to force an extensionless path
Don't use HttpUtility.UrlEncode() or Uri.EscapeUriString()
Always use Uri.EscapeDataString() to encode your paths
or else strip out or replace problem characters before encoding
and do the same when you try to match the routes.
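The last option, replacing problem characters and doing the same when matching, could look roughly like this. Everything here is an assumption for illustration: RouteKeys, ToRouteKey and FindCategory are hypothetical names, and the replacement words ("dot", "sharp") and the dash for whitespace are arbitrary choices, not something the application actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class RouteKeys
{
    // Hypothetical: turns a category name into a URL-friendly key by
    // replacing the characters that cause trouble in routed paths.
    public static string ToRouteKey(string category)
    {
        var sb = new StringBuilder();
        foreach (char ch in category.Trim().ToLowerInvariant())
        {
            if (ch == '.') sb.Append("dot");
            else if (ch == '#') sb.Append("sharp");
            else if (char.IsWhiteSpace(ch)) sb.Append('-');
            else sb.Append(ch);
        }
        return sb.ToString();
    }

    // Hypothetical: maps a (decoded) route value back to the original
    // category by applying the same replacement to every known category.
    public static string FindCategory(IEnumerable<string> categories, string routeValue)
    {
        return categories.FirstOrDefault(c =>
            string.Equals(ToRouteKey(c), routeValue, StringComparison.OrdinalIgnoreCase));
    }
}
```

Here ToRouteKey("C#") gives csharp, ToRouteKey(".NET") gives dotnet, and FindCategory(new[] { "Books", "C#", ".NET" }, "csharp") returns C#. The key should still go through Uri.EscapeDataString() when the link is generated, since other characters may remain. Replacing rather than stripping keeps C and C# apart, but two names can still collide (a category literally named "csharp", for instance), so this only works if the category names are under your control.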