2011/06/28

Using O_DIRECT for the InnoDB transaction log [by Mark Callaghan]

We want to use large transaction log files for InnoDB to reduce the page writes done for fuzzy checkpoints. The sum of the log file sizes is limited to 4G in official MySQL and can be much larger in Percona XtraDB. I previously published results that show the benefit from using larger log files. Until recently we were concerned about the impact of large log files on crash recovery time, but crash recovery is much faster now. We are also concerned about the impact on the buffer pool. We run with innodb_flush_method=O_DIRECT. When this is set, log files still use buffered IO and can use space in the OS buffer cache. We prefer not to dedicate 4G or more of RAM to caching log files.

The alternative on Linux is to use O_DIRECT for log files. I don't think O_SYNC would be a good choice in this case because it still allows the file to be stored in the OS buffer cache, which again dedicates too much RAM to log files.

There is another potential problem when the log file can be cached in the OS buffer cache. The first write to an uncached page requires the page to be read into the buffer cache when the write covers only part of that page. This is not a good use of disk IO. We think that this read-modify-write will not occur when O_DIRECT is used. To test this I added a new option for the innodb_flush_method configuration variable. When it is set to all_direct, both database and log files are opened with O_DIRECT. I had to change a few places in the log code to use 512-byte aligned buffers when reading and writing.
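
To make the alignment requirement concrete, here is a minimal Python sketch of writing through O_DIRECT with a 512-byte aligned buffer. It is not the InnoDB change, which lives in the C log code; the names round_up, aligned_buffer and write_block are hypothetical. The fallback to buffered IO is my own addition so that the sketch still runs on filesystems that reject O_DIRECT.

```python
import mmap
import os

SECTOR = 512  # alignment the post uses for log buffers


def round_up(n, align=SECTOR):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def aligned_buffer(payload, align=SECTOR):
    """Copy payload into a zero-padded buffer with aligned address and length.

    An anonymous mmap starts on a page boundary, so it is also 512-byte aligned.
    """
    size = max(round_up(len(payload), align), align)
    buf = mmap.mmap(-1, size)
    buf[:len(payload)] = payload
    return buf


def write_block(path, payload, offset=0):
    """Write payload at an aligned offset. Returns True if O_DIRECT was used."""
    if offset % SECTOR:
        raise ValueError("offset must be a multiple of %d" % SECTOR)
    buf = aligned_buffer(payload)
    base = os.O_WRONLY | os.O_CREAT
    direct = getattr(os, "O_DIRECT", 0)  # only defined on some platforms (Linux)
    attempts = [base | direct, base] if direct else [base]
    try:
        for flags in attempts:
            try:
                fd = os.open(path, flags, 0o644)
            except OSError:
                continue  # e.g. the filesystem does not support O_DIRECT
            try:
                os.pwrite(fd, buf, offset)
                os.fsync(fd)
                return flags != base
            except OSError:
                continue  # e.g. the device needs a larger alignment
            finally:
                os.close(fd)
        raise OSError("could not write " + path)
    finally:
        buf.close()
```

With this sketch a 700-byte payload is padded to 1024 bytes before it is written, because an O_DIRECT write needs an aligned length, file offset and memory address.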

The results from testing so far are not conclusive. I need to ask my peers to look at the data from vmstat and /proc/meminfo and we need more results.

I first tested this with a simple benchmark. I used sysbench to update 1 row per transaction using a cached 1M row table. The test was run for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 concurrent connections using MySQL 5.1.52 with the Facebook patch.

This lists the throughput in commits/second for innodb_flush_method=all_direct and O_DIRECT. Results below are for 1 to 128 connections to avoid line-wrap. Results for all_direct are better at 1 and 2 connections, the two are about equal at 4 connections, and O_DIRECT is faster for 8 or more concurrent connections.
   1       2       4       8      16      32      64     128  connections
1952    2777    3530    3755    3829    3741    3760    3803  all_direct
1608    2479    3507    4541    4550    4644    4698    4581  O_DIRECT
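
The size of the gap is easier to see as a percentage. This is a small Python sketch over the numbers in the table above; the helper names percent_change and first_slower are mine and are only for illustration.

```python
CONNECTIONS = [1, 2, 4, 8, 16, 32, 64, 128]
ALL_DIRECT = [1952, 2777, 3530, 3755, 3829, 3741, 3760, 3803]  # commits/second
O_DIRECT = [1608, 2479, 3507, 4541, 4550, 4644, 4698, 4581]    # commits/second


def percent_change(new, old):
    """Percent difference of each value in new relative to the value in old."""
    return [100.0 * (n - o) / o for n, o in zip(new, old)]


def first_slower(connections, new, old):
    """Lowest connection count at which new is slower than old, or None."""
    for c, n, o in zip(connections, new, old):
        if n < o:
            return c
    return None


if __name__ == "__main__":
    for c, pct in zip(CONNECTIONS, percent_change(ALL_DIRECT, O_DIRECT)):
        print("%4d connections: all_direct %+.1f%% vs O_DIRECT" % (c, pct))
```

For these results all_direct is about 21% faster at 1 connection and about 17% slower at 8 connections, which is the first concurrency level where it falls behind.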

But throughput on an artificial benchmark isn't the only way to determine whether the change is useful. I have also been running this on a test server that gets a mirror of the production workload. Response time for write transactions is unchanged, which is good. I also want to know whether this increases the read IO rate for the transaction log, but I need to put the log on a separate filesystem to collect that data and that is work for the future.

The final data I have is the output from vmstat -sa. This is from two test servers that get a mirror of the production workload. Each server uses two 2G transaction log files. With O_DIRECT the 4G of log files might be in the OS buffer cache, which can consume 4G of RAM for no good reason. The value for active memory with O_DIRECT is about 1.7G larger than for all_direct. This might be a benefit from using all_direct, and it might allow me to make the InnoDB buffer pool larger when all_direct is used.

Output for all_direct
     74178160  total memory
     73788576  used memory
     66792708  active memory
      5973644  inactive memory

Output for O_DIRECT
     74178688  total memory
     73782800  used memory
     68543584  active memory
      4138808  inactive memory
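
The 1.7G figure can be checked from the two listings above. This Python sketch assumes the values are in KB, which is the default unit for vmstat -s; the unit suffix is not shown in the output above. The function name parse_vmstat is hypothetical.

```python
ALL_DIRECT_OUT = """
     74178160  total memory
     73788576  used memory
     66792708  active memory
      5973644  inactive memory
"""

O_DIRECT_OUT = """
     74178688  total memory
     73782800  used memory
     68543584  active memory
      4138808  inactive memory
"""


def parse_vmstat(text):
    """Map each label (for example 'active memory') to its integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            stats[parts[1].strip()] = int(parts[0])
    return stats


def kb_to_gb(kb):
    """Convert KB to GB using 1 GB = 1024 * 1024 KB (values assumed to be KB)."""
    return kb / (1024.0 * 1024.0)


if __name__ == "__main__":
    a = parse_vmstat(ALL_DIRECT_OUT)
    o = parse_vmstat(O_DIRECT_OUT)
    for key in ("active memory", "inactive memory"):
        delta = o[key] - a[key]
        print("%-16s O_DIRECT - all_direct = %+d KB (%+.2f GB)"
              % (key, delta, kb_to_gb(delta)))
```

Under that assumption active memory is 1750876 KB (about 1.67G) higher with O_DIRECT, and inactive memory is 1834836 KB (about 1.75G) lower.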
