TL;DR: There's a very common pagination setup that may end up with users never seeing some of their results because of how SQL handles sorts by columns other than the index. I'll show why and how to fix it.
I've always felt a bit of discomfort when using applications that utilize pagination. Sure, it makes sense from the standpoint of performance optimization, and avoids browser slowdowns and endless scrolling. But I could never quite kick the nagging voice in the back of my head, saying: "But how do you know you're seeing all of the results?"
Recently, I came across a oddly-behaving pagination bug in one of our applications. I wasn't seeing all of the expected results, and (spoilers) it was my fault. When I finally figured out what was going on, I described the behavior to a few other developers at Tighten—their reactions were the same: "Whoa, that's unexpected behavior."
It turns out there's a very common pagination setup that might leave your apps hiding some key results from your users. None of us at Tighten had ever heard people talk about this potential problem, so I want to share a summary and hopefully a nugget of wisdom.
I'd finally put together a paginated entity index page that I liked. It was a single-column sort with subtle visual cues, a smart default sort, and clear and useful secondary sort logic.
A user reported a few missing items from their index page. Since I'm a developer, I assumed PEBKAC, but took a look at the user's account and, indeed, couldn't find the entities in question. My next step was to fiddle: change the per-page pagination settings, navigate around the various pages, and try to find behaviors that would explain why they were missing.
Depending on the sort, per-page count, and which page I was on, these clandestine rows would appear and disappear seemingly at random. Additionally, there were other rows which would appear twice.
If you're impatient, you can skip to the solution.
Here are each of the queries and their respective results. I've named some according to their behavior: Disapearing Contacts don't show up in pagination results, and Reappearing Contacts show up in multiple result sets.
select pledges.* from `pledges` where `pledges`.`account_id` = 123order by `pledges`.`started_at` desc limit 10 offset 0;
select pledges.* from `pledges` where `pledges`.`account_id` = 123order by `pledges`.`started_at` desc limit 10 offset 10;
Listing all results showed each contact exactly once, as expected:
select pledges.* from `pledges` where `pledges`.`account_id` = 123order by `pledges`.`started_at` desc;
This is a small result set with clearly-labeled names. It's much harder when debugging this type of issue with a result set of over a hundred.
I printed the SQL that Laravel Pagination was executing to pull the paginated result, threw it into SequelPro, and found the inconsistent behavior was still occurring. My results were deterministic for a given query (they would return the same items and order for a given query), but MySQL would not maintain an overall sort order between pages (different
limits) – causing some rows to appear multiple times throughout the various
offsets, and never returning other rows.
Pro Tip: When you're running into trouble, you can stringify any Laravel Query Builder Instance:
What I discovered about the duplicated or missing rows was that they all had similar content, in that they shared the same date or monetary value.
My code's problem was that it wasn't giving MySQL enough uniquness in the sort to determine exactly how to sort all of the potential rows.
In order to return consistent ordering through pagination, the query must have a single or series of order directives that uniquely identify a row.
What this means is, for example, if you ask MYSQL to sort based on last name and a dozen people have the same last name, and you're paginating your results, MySQL doesn't know enough to know exactly which entry-with-the-same-last-name should go on which page. So, it's not consistent the way we would expect.
So, how could I modify my queries to make the sort more unique—even in the case when a given pledge had exactly the same entries in my sorted columns?
id of the entity as a final order was all I needed to solve the issue. Since the primary index is already indexed, the impact on query speed is minimal.
select pledges.* from `pledges` where `pledges`.`account_id` = 123order by `pledges`.`started_at` desc, `pledges`.`id` desc limit 10 offset 0;select pledges.* from `pledges` where `pledges`.`account_id` = 123order by `pledges`.`started_at` desc, `pledges`.`id` desc limit 10 offset 10;
Once I had a general idea of what was going on, I tried to write some regression tests. However, I couldn't recreate this issue in SQLite. SQLite would return rows in a consistent order regardless of the ambiguity of the defined sort order. (I guess we'll just add this to the ever-expanding list of subtle difference between MySQL and SQLite.) For now, the test I wrote is commented out and ready for when we get this application over to MySQL-based testing.
So go forth and check your paginated applications!