API Pagination

Pagination is how an API returns a large collection without sending the entire thing in one response. A request for “all issues” or “all transactions” might match millions of rows; serializing them all into a single payload could exhaust server memory, time out, or overwhelm the client. Instead the server returns the results in bounded chunks, called pages, and gives the client a way to ask for the next one. Nearly every web API that exposes large collections does this, and the way it does it has real consequences for performance and correctness.

The simplest scheme is offset and limit, sometimes expressed as page and per-page. The client asks for a page number and a page size; the server skips the rows that belong to earlier pages (the offset, which is the page size multiplied by the number of preceding pages) and returns at most one page size of rows. GitHub’s official REST API pagination documentation describes exactly this style, noting that endpoints support a “page” query parameter and that “if an endpoint supports the per_page query parameter, then you can control how many results are returned on a page.” Offset pagination is easy to understand and lets a client jump directly to any page, which is why it is so common.
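As a rough illustration of the page and per-page scheme, the sketch below translates a 1-indexed page number into a LIMIT/OFFSET query against an in-memory SQLite table. The table name `issues`, its columns, and the helper names `make_db` and `fetch_page` are assumptions made for the example; they are not taken from GitHub or any other real API.

```python
import sqlite3


def make_db(n_rows: int) -> sqlite3.Connection:
    """Build a throwaway in-memory table of n_rows issues (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO issues (id, title) VALUES (?, ?)",
        [(i, f"issue {i}") for i in range(1, n_rows + 1)],
    )
    return conn


def fetch_page(conn: sqlite3.Connection, page: int, per_page: int) -> list[tuple[int, str]]:
    """Return one 1-indexed page using LIMIT/OFFSET."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # Rows belonging to earlier pages are skipped by the database.
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT id, title FROM issues ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
```

With 25 rows and a page size of 10, page 3 holds the last five rows and page 4 is empty. The explicit ORDER BY matters: without a stable ordering, the same offset is not guaranteed to return the same rows twice.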

Offset pagination has two well-known weaknesses. The first is performance: to return page 10,000, the database must still read and discard all the preceding rows before it can begin returning results, so deep pages get progressively slower. The second is correctness under change: if rows are inserted or deleted while a client is paging through, the offsets shift, and the client can see a record twice or skip one entirely because the data moved out from under the page boundary. For example, with results sorted newest first, a row inserted at the head between two requests pushes the last row of page 1 onto page 2, where the client sees it again.
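The correctness problem can be reproduced with nothing more than a list. The sketch below stands in for a collection sorted newest first; the function names and the sample values are made up for the demonstration.

```python
def offset_page(rows: list[int], page: int, per_page: int) -> list[int]:
    """Slice a 1-indexed page out of an already sorted list."""
    start = (page - 1) * per_page
    return rows[start:start + per_page]


def demo_insert() -> tuple[list[int], list[int]]:
    """A row inserted between requests causes a duplicate on the next page."""
    rows = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # newest first
    first = offset_page(rows, 1, 5)
    rows.insert(0, 11)  # a new row arrives at the head of the collection
    second = offset_page(rows, 2, 5)
    return first, second


def demo_delete() -> tuple[list[int], list[int]]:
    """A row deleted between requests causes the next page to skip a row."""
    rows = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    first = offset_page(rows, 1, 5)
    rows.remove(9)  # a row the client has already seen is deleted
    second = offset_page(rows, 2, 5)
    return first, second
```

In `demo_insert`, row 6 appears at the end of the first page and again at the start of the second. In `demo_delete`, row 5 slides back into the range of page 1 after the client has already fetched it, so it is never returned at all.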

Cursor-based, or keyset, pagination addresses both problems. Rather than a page number, the server hands the client an opaque cursor that encodes a stable position in the result set, typically the sort key of the last row returned. For the position to be unambiguous the sort key must be unique, so a non-unique column such as a timestamp is usually paired with a tie-breaker such as the primary key. The next request carries that cursor, and the server fetches the rows that come after it using an indexed comparison rather than a count-and-skip. GitHub’s documentation notes that paginated endpoints may use “before”/“after” or “since” parameters in addition to plain page numbers, which is consistent with this position-based approach. Because the query seeks to an indexed position instead of scanning past every skipped row, the cost of fetching a page stays roughly constant no matter how deep the client pages, and rows inserted or deleted elsewhere in the collection no longer shift the page boundary.
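A minimal sketch of keyset pagination follows, again using SQLite. It assumes a table named `issues` sorted by a unique integer `id`, and it encodes the cursor as base64-wrapped JSON. The cursor format and the names `encode_cursor`, `decode_cursor`, and `fetch_after` are illustrative choices, not a description of how GitHub or any particular API builds its cursors.

```python
import base64
import json
import sqlite3
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen sort key in an opaque, URL-safe token."""
    raw = json.dumps({"last_id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Recover the sort key from a token produced by encode_cursor."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return int(json.loads(raw)["last_id"])


def fetch_after(
    conn: sqlite3.Connection, cursor: Optional[str], limit: int
) -> tuple[list[tuple[int, str]], Optional[str]]:
    """Return up to `limit` rows after the cursor, plus the next cursor (or None)."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Ask for one extra row so we can tell whether another page exists.
    if cursor is None:
        sql = "SELECT id, title FROM issues ORDER BY id LIMIT ?"
        params: tuple = (limit + 1,)
    else:
        # Seek directly to the position via the primary-key index.
        sql = "SELECT id, title FROM issues WHERE id > ? ORDER BY id LIMIT ?"
        params = (decode_cursor(cursor), limit + 1)
    rows = conn.execute(sql, params).fetchall()
    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1][0]) if has_more else None
    return page, next_cursor
```

A client starts with `cursor=None` and keeps passing back the returned cursor until it receives `None`. Because each request filters on `id > last_id`, deleting or inserting rows before the cursor position does not change which rows the next page contains.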

The trade-off is that cursor pagination gives up the ability to jump to an arbitrary page; a client can move forward (and often backward) through the sequence but cannot ask for “page 500” directly. For the large, append-heavy collections that dominate modern APIs, that is usually an acceptable price. The navigation links themselves are commonly delivered outside the response body: GitHub returns them in a Link response header containing “the previous, next, first, and last page of results,” so the client follows server-provided URLs rather than constructing pagination parameters itself.
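Following server-provided links can be sketched as below. The parser handles only the simple form in which each entry looks like `<url>; rel="name"`, with `rel` as the first parameter; a production client would want a fuller Link header parser. The `fetch` callable is a hypothetical stand-in for an HTTP request that returns the response body and the raw Link header value, and the example URLs in the tests are placeholders.

```python
import re
from typing import Any, Callable, Iterator, Optional

# Matches entries of the form: <https://example/...>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value: str) -> dict[str, str]:
    """Map each rel name (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value)}


def iter_pages(
    fetch: Callable[[str], tuple[Any, Optional[str]]], start_url: str
) -> Iterator[Any]:
    """Yield each page body, following rel="next" until the server stops sending one."""
    url: Optional[str] = start_url
    while url is not None:
        body, link_header = fetch(url)
        yield body
        # No Link header, or no "next" entry, means this was the last page.
        url = parse_link_header(link_header or "").get("next")
```

The client never builds a page number or cursor itself, so the same loop works whether the server uses offset or cursor pagination behind the URLs it hands out.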
