Jan 10, 2013

Nginx - nested locations and pagination

While building this code blog I looked into different ways of handling things like pagination on a statically generated site. While investigating, I came across nested location blocks.

What are nested location blocks?

In the configuration file you define locations to handle different URLs. These can be configured as literal strings and regular expressions.
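
As a minimal sketch of the two kinds (the /static/ prefix and the .png pattern are made-up examples, not part of this site's configuration):

location /static/ {
    # Literal string (prefix) location: matches any URI starting with /static/.
    try_files $uri =404;
}

location ~ \.png$ {
    # Regular expression location: matches any URI ending in .png.
    try_files $uri =404;
}

Note that a request for /static/logo.png would end up in the second block, since a matching regular expression location takes precedence over a plain prefix match.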

Regular expressions are more expensive to evaluate than literal strings, and this is where nesting can come in handy.

Let's consider the following configuration example:

location /code/ {
    try_files $uri $uri/index.html =404;
}

# uri_path is everything before the final "/<digits>", page is the digits.
location ~ ^(?<uri_path>/code(?:/.*)?)/(?<page>\d+)$ {
    try_files $uri $uri_path/index$page.html =404;
}

location ~ ^/foo/.*\.html {
    return 404;
}

When a request is made to, e.g., /foobar/, Nginx first checks the literal string locations, in this case /code/, which does not match. It then moves on to the regular expression locations and tests them in order: first the one matching /code/ and then the one matching /foo/. Neither matches, so a 404 is sent to the client.

By nesting we could remove one regular expression check in the example above, resulting in the following configuration:

location /code/ {
    # The "/" before the page number keeps multi-digit pages (e.g. /12) intact.
    location ~ ^(?<uri_path>/code(?:/.*)?)/(?<page>\d+)$ {
        try_files $uri $uri_path/index$page.html =404;
    }

    try_files $uri $uri/index.html =404;
}

location ~ ^/foo/.*\.html {
    return 404;
}

The impact in this example might not be that noticeable, but if you do a lot of magic under one URL path it might be a good idea. Even if performance is not a concern, nesting makes the configuration cleaner and easier to follow. One thing to notice is that the regular expression in a nested location is still matched against the full request URI; it isn't relative to the parent location block.
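
To illustrate that last point, here is a minimal sketch. The /code/tags/ path is only an assumed example, chosen to show what a nested pattern has to look like:

location /code/ {
    # A nested pattern like ^/tags/ would never match here: the regex is
    # tested against the full URI, which starts with /code/, not /tags/.

    # This one matches, because it spells out the /code/ prefix itself.
    location ~ ^/code/tags/ {
        try_files $uri $uri/index.html =404;
    }

    try_files $uri $uri/index.html =404;
}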

Pagination for static sites

I'm not a big fan of URLs like /code/index.html or /post/nginx-amazing-web-server.html. When I can, I try to strip the file extension, which made solving pagination a bit more cumbersome.

The index pages are generated as index.html, index2.html, ..., indexN.html. There are different ways to paginate when it comes to the URL scheme. The two most common are /code/2 and /code/?page=2 (i.e. appending to the URI or using a query string). I will explain how to solve both variants in an easy and efficient way, all thanks to the great power that is Nginx.

Using the query string

This might be the easiest of the two alternatives to get up and running. Nginx parses the query string and saves it in the variable $args and also gives you the possibility to access a specific parameter in the query string using $arg_PARAMETER.
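
A quick way to see these variables in action is a throwaway location that echoes them back. This is only a sketch; the /debug-args path is a hypothetical name chosen for the example:

location = /debug-args {
    # $args is the whole query string, $arg_page only the value of "page".
    return 200 "args=$args page=$arg_page\n";
}

Requesting /debug-args?page=2&sort=asc should then respond with args=page=2&sort=asc page=2.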

With the information above we could simply add this to solve pagination:

location / {
    try_files $uri $uri/index$arg_page.html =404;
}

If you request /foo/, Nginx first tests $uri itself. If that doesn't match an existing file, it tries foo/index.html, and if that fails too it returns status 404.

Does it look a bit strange? If the parameter ?page=x is missing, $arg_page is an empty string, which in turn gives foo/index.html. If we were to request /foo/?page=2, Nginx would try foo/index2.html. Pretty nifty, right?!

Using "pretty" URLs

If you (like me) don't want to use GET arguments for pagination, there is another solution. By using named captures in the location's regular expression, we can parse the URI to find the requested page.

Looking at the example earlier in this post we can see the following code:

# The "/" before the page number keeps multi-digit pages (e.g. /12) intact.
location ~ ^(?<uri_path>/code(?:/.*)?)/(?<page>\d+)$ {
    try_files $uri $uri_path/index$page.html =404;
}

This is what you need to support a URL scheme like /code/tags/foo/2 to show the second page of the tag foo. The named captures are used in the try_files directive. If the first page is requested (which doesn't end with /1), we let the default location block handle it (which e.g. serves $uri/index.html).
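
Putting the pieces together, a complete server block could look roughly like the sketch below. The listen port, server_name and root are placeholder assumptions, and the regular expression is a variant that requires a slash before the page number so that page numbers with several digits are captured whole:

server {
    listen 80;
    server_name example.com;    # placeholder, not the real host
    root /var/www/blog;         # placeholder document root

    location /code/ {
        # /code/2           -> /code/index2.html
        # /code/tags/foo/2  -> /code/tags/foo/index2.html
        location ~ ^(?<uri_path>/code(?:/.*)?)/(?<page>\d+)$ {
            try_files $uri $uri_path/index$page.html =404;
        }

        # /code/            -> /code/index.html (first page)
        try_files $uri $uri/index.html =404;
    }
}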

Marcus Carlsson

Code monkey.
