Nearly a year has passed since our descent into the 9th ring of latency Hades, and I wanted to make an update post on ZFS' interaction with SAN arrays containing battery-backed cache. (For the full details, please check out this older post.)
For one thing, the instructions I previously gave for ignoring cache flushes on the STK FLX200/300 series (and similar LSI OEM'd products) don't seem to work very well on the new-generation Sun StorageTek 6x00 arrays. Not to mention it's kind of nasty to have to modify your array's NVRAM settings to get good write latency.
But thanks to the brilliant engineers on the ZFS team, you no longer have to modify your array (since circa May '07 in the OpenSolaris tree). Simply add this line to your Solaris /etc/system file and ZFS will no longer issue SYNCHRONIZE CACHE commands to your array:
set zfs:zfs_nocacheflush=1
I can confirm that this works REALLY well on both the older (FLX200/300) and newer (6140/6540) Sun/Engenio arrays! It seems to me that since the new way is a ZFS configuration directive, it should be portable/functional against any array in existence. Please note that setting this directive will disable cache flushing for ALL zpools on the system, which would be dangerous for any zpools using local disks. As always, caveat emptor. Your mileage may vary so please do let others know through the comments what works/doesn't work for you.
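Since the setting applies to every zpool on the host, it can be handy to check whether a machine already has it in place before you add local-disk pools. Here is a minimal sketch, not an official tool: the function name nocacheflush_enabled is made up for illustration, and it assumes a POSIX grep that supports -E and -q (on Solaris 10 that may mean /usr/xpg4/bin/grep). It only reads the file you hand it; it does not inspect the running kernel.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): succeed if an
# /etc/system-style file contains an active "set zfs:zfs_nocacheflush=1".
nocacheflush_enabled() {
    # Default to the real file, but allow a path for testing on a copy.
    file="${1:-/etc/system}"
    # Lines beginning with '*' are comments in /etc/system, so the
    # pattern only matches lines that start with an uncommented "set".
    grep -Eq '^[[:space:]]*set[[:space:]]+zfs:zfs_nocacheflush[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$file"
}
```

For example, running nocacheflush_enabled /etc/system && echo "cache flushes are disabled for ALL zpools" prints the warning only when the line is present and not commented out. Keep in mind that /etc/system is read at boot, so the file can disagree with what the running kernel is doing until the next reboot.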
P.S.
We've tested the zfs:zfs_nocacheflush directive successfully in Build 72 of OpenSolaris. It should also work in Solaris 10 Update 4, though we haven't tested that ourselves.
d.s.ivanov
February 14th, 2008 at 6:46 pm
It works on Solaris 10u4.
UB
May 22nd, 2008 at 1:55 pm
Works with a patched s10u3.
Tip: Check your kernel:
echo "zfs_nocacheflush/D" | mdb -k
Joe
July 6th, 2008 at 3:56 am
I don't get it…
Your premise (in the prior article) is:
* Tell your array to ignore ZFS' flush commands. This is pretty safe, and massively beneficial.
The former option is really a no-go because it opens you up to losing data. The second option really works well and is darn safe. It ends up being safe because if ZFS is waiting for the write to complete, that means the write made it to the array, and if it's in the array cache you're golden. Whether famine or flood or a loose power cable come, your array will get that write to the disk eventually. So it's OK to have the array lie to ZFS and release ZFS almost immediately after the ZIL flush command executes. On our StorageTek FLX210 this took the idle latencies to 1ms and the heavy load latencies to 9ms. 9 bloody milliseconds! Our InnoDB problems disappeared like sand down a rat hole.
How do you KNOW that the write will be made within 72 hours?
In the event of "flood or famine" it likely will not.
How about I wire you a few million and you reply immediately that you received the cash, then, before you actually write the authentication to your HD, a tornado hits. I have my "proof" that I gave you the money, but you don't have the money in your hand.
Gerardo Diaz
October 15th, 2008 at 8:14 am
If the option to not flush the cache is used, would that cause the filesystem's memory buffers (server memory, not the array's cache) to not flush too?
Shenanigans with ZFS flushing and intelligent arrays… - Jason’s .plan
January 15th, 2009 at 12:29 pm
[...] NOTE: ZFS has been enhanced to better address the situation described below by using ZFS configuration directives. This article is still accurate and provides decent background on the problem. However, an update has been posted with the newer, stronger, better way of resolving the problem: Back in the sandbox…ZFS flushing shenanigans revisited. [...]