MySQL: on implicit ORDER BY

During a conversation on SO I promised to show an example where implicit ORDER BY may make sense.

Here is an example based on the data I have at hand.

Let’s suppose we have two tables: corpora and requests within the corpora. Here are the definitions:

CREATE TABLE  `corpora` (
  `corpus_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `corpus_name` varchar(50) NOT NULL,
  PRIMARY KEY (`corpus_id`),
  KEY `corpus_name` (`corpus_name`)
) ENGINE=InnoDB COLLATE utf8_general_ci;

CREATE TABLE `requests` (
  `request_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `corpus_id` smallint(5) unsigned NOT NULL,
  `req_text` varchar(1024) NOT NULL,
  `references` varchar(4000) NOT NULL DEFAULT '',
  `sorter` char(5) NOT NULL,
  `req_hash` char(32) COLLATE latin1_general_ci NOT NULL,
  PRIMARY KEY (`request_id`) USING BTREE,
  UNIQUE KEY `unique_item` (`corpus_id`,`req_hash`),
  KEY `sorter` (`corpus_id`,`sorter`) USING BTREE,
  CONSTRAINT `FK_corpus_id` FOREIGN KEY (`corpus_id`)
    REFERENCES `corpora` (`corpus_id`)
      ON DELETE NO ACTION ON UPDATE NO ACTION  
) ENGINE=InnoDB COLLATE utf8_general_ci;

Corpora contains 25 rows and requests contain 20641 rows. Now, let’s look at the following query:

SELECT SQL_NO_CACHE corpus_name, req_text, `references`
FROM corpora c
JOIN requests r
USING (corpus_id)
ORDER BY corpus_name, sorter
LIMIT 10;

Sorter is a prefix for the req_text column, we do not need the exact sorting but requests are better observable when sorted. Thus, the query displays the first 10 rows of the first (alphanumeric) corpus.

Here is the EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: index
possible_keys: PRIMARY
          key: corpus_name
      key_len: 152
          ref: NULL
         rows: 25
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: r
         type: ref
possible_keys: unique_item,sorter
          key: sorter
      key_len: 2
          ref: test.c.corpus_id
         rows: 521
        Extra:

The average speed of the query on my PC is 0,7 secs. Unfortunatelly, we can not do much with any indexes to make it perform better, since only an index from the first table can positively affect the ORDER BY. MySQL reads corpora names from the coverage index, finds the corresponding rows in requests, fetches the required columns, puts everything into a temporary table, sorts and takes the first 10 rows. The more rows in the requests table, the quicker the temporary table grows, the slower our query is.

What we can do is to reduce our table to the information taken from the coverage indexes only and join to the table with the haviest fields. Most notable here is that we will not need to order the results since that data will be read in the order of the subquery rows (since for each row in the subquery there is only one row in requests found by the primary key):

SELECT SQL_NO_CACHE corpus_name, req_text, `references`
FROM (
  SELECT corpus_name, request_id
  FROM corpora
  JOIN requests USING (corpus_id)
  ORDER BY corpus_name, sorter
  LIMIT 10) ids
JOIN requests USING (request_id);

The query return the same but takes about 0,27 secs. Here is the EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: requests
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: ids.request_id
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: corpora
         type: index
possible_keys: PRIMARY
          key: corpus_name
      key_len: 152
          ref: NULL
         rows: 25
        Extra: Using index; Using temporary; Using filesort
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: requests
         type: ref
possible_keys: unique_item,sorter
          key: sorter
      key_len: 2
          ref: test.corpora.corpus_id
         rows: 521
        Extra: Using index

But we would not need to do any ordering if the data would be read in the required order from the beginning. To make it possible we will need to give some hints to MySQL:

SELECT SQL_NO_CACHE
  STRAIGHT_JOIN corpus_name, req_text, `references`
FROM corpora
JOIN requests USE INDEX (sorter)
USING (corpus_id)
LIMIT 10;

The query takes about 0,009 secs. Here is the EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: corpora
         type: index
possible_keys: PRIMARY
          key: corpus_name
      key_len: 152
          ref: NULL
         rows: 25
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: requests
         type: ref
possible_keys: sorter
          key: sorter
      key_len: 2
          ref: test.corpora.corpus_id
         rows: 521
        Extra:

Nevertheless, explicit is better than implicit. Much better is to avoid such sorts at all or adjust the schema so that all columns in ORDER BY would be handled by a single index.

Tags: mysql