[Users] discard support for SSD in OpenVZ kernel

David Brown david at westcontrol.com
Tue Aug 27 18:47:23 EDT 2013


On 27/08/13 21:48, spameden wrote:
> Hi.
> 
> 
> 2013/8/27 David Brown <david at westcontrol.com>
> 
>     The answer is quite simple - don't use "discard" mounts.  They make your
>     SSD much slower, especially for metadata-heavy operations.
> 
> 
> I tend to disagree with this.
> 
> On write operations I saw at least 2x speed with discard / TRIM
> enabled on a dm-crypt LUKS partition.
> 

There must be something else causing this difference - or you have
tested using a very old or cheap SSD which has seen a lot of writes, but
currently has little data on it.  If you have a decent SSD of a sensible
size, you will always have plenty of free blocks to write - that's what
overprovisioning gives you.  And the garbage collection will keep things
sorted in the background.  TRIM can give /marginal/ write speed
increases on such devices - never anything close to 2x - and only at
the cost of significant latency on deletes.
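
If you want to check what is actually in effect before comparing
numbers, the kernel will tell you whether each layer of the stack
advertises discard support, and whether "discard" is really among the
mount options.  A rough sketch - the device and mount point here are
only examples, so substitute your own:

# lsblk --discard /dev/sda
# findmnt -o TARGET,SOURCE,OPTIONS /home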

> On another note, it's not secure to use TRIM on dm-crypt, because an
> attacker might identify which blocks are free and potentially detect
> the filesystem.

With an encrypted filesystem, the data written is encrypted - it is not
going to be useful to an attacker (except conceivably to get a rough
idea of how much has been written to the disk).  But certainly with an
SSD, you can never know when a physical sector has been deleted - that
applies equally whether you are using TRIM or not.

> 
> The fix for dm-crypt devices will only be in 3.1.x mainline, so I
> don't think it's going to be ported to the current 2.6.32 branch.
> 
> 
> 
> 
>     The problem is that the halfwits that added TRIM to the SATA
>     specifications made it a synchronous operation, so it blocks all queues
>     and buffers.  They also failed to give it proper semantics, such as
>     specifying that reading from a TRIM'ed sector would give all zeros.  If
>     the people behind this nonsense had read the SCSI specifications for the
>     equivalent operation, they would have made a much better TRIM that was
>     asynchronous, could be queued, and would guarantee reading all zeros
>     from a TRIM'ed block - which would have been much more useful.
> 
> 
> Well, if you don't use TRIM, you'll eventually see write performance
> degrade because of the SSD internals.
>  

No, you won't.

With the earliest SSDs, before TRIM was supported (we are talking about
SATA flash SSDs here - SCSI had its equivalent, UNMAP, and RAM-based
SSDs don't need TRIM), you got a drop in write speed as you ended up
with little or no blank space to write new data.  The disk then had to
do garbage collection - collect the mapped sectors from an erase block
and copy them to new sectors in a fresh erase block until the old erase
block had nothing but unmapped sectors and could thus be erased and
re-used.  The trouble was, once a sector was written (mapped), it only
became unmapped, or known-unused, when that same logical sector was
written again.  When a file is deleted, the SSD does not know that the
file's sectors can be unmapped until the filesystem re-uses those same
logical sectors.

So TRIM was invented as a way for the filesystem to tell the SSD that
the deleted sectors could be unmapped and recycled.  And there was much
rejoicing, because now the completely meaningless benchmarks used by
testers worked faster - you could write lots to an SSD, erase the files,
and re-write them at the same speed as before.

Of course, these artificial benchmarks did not test the speed of the
disks in the real world, as they get used and filled up.  Once you have
a reasonable amount of data on the disk, TRIM does nothing useful at all
- you still have lots of mapped sectors scattered around so garbage
collection is hard, and writes to new files over-write the logical
sectors used by old deleted files - so the old sectors get unmapped that
way.  At best, the SSD gets to unmap the old sectors a little earlier.


Then SSDs got better garbage collection that runs at off-peak times
with little I/O, and they got overprovisioning (i.e., your 128 GB SSD
actually has 140 GB flash).  This meant that you always had empty erase
blocks on hand - if your SSD has 10% overprovisioning, then you are
guaranteed at least 10% of your sectors are unmapped at any given time
(less the small percentage of bad blocks) - and background garbage
collection means that these will be organized reasonably well when you
want to write, unless you have really weird and difficult usage patterns.

So what does TRIM give you now?  Very little, for most purposes.  It can
still help a bit if you have a mostly full SSD, delete /lots/ of data,
TRIM, then write lots again - as it lets the garbage collection do a
better job.  But online TRIM, such as with ext4's "discard" mount
option, means that TRIMs are sent on every deletion, as soon as a
logical sector is no longer in use.  These commands are slow and
synchronous, so they block the queue and the flow of commands.  If
you have a lot of small
commands, as you would get for something like a recursive deletion of
lots of files, TRIM slows down the whole thing by orders of magnitude.
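
For reference, online TRIM is nothing more than the "discard" mount
option - the fstab line below is only an illustration with a made-up
device name.  Leave "discard" out and you avoid those per-delete TRIM
commands entirely:

/dev/mapper/home   /home   ext4   defaults,discard   0   2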

Off-line TRIM using fstrim gives you the same minor benefits as online
TRIM, without the slowdown.  It is thus the recommended solution for
setups that support TRIM - but if your layers of dm-crypt, RAID, etc.
don't support TRIM, then there is nothing to worry about.

<https://patrick-nagel.net/blog/archives/337>
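
If you do want to try it, an occasional manual or cron'ed fstrim run is
all it takes; and to let discards pass through dm-crypt at all, the
device has to be opened with them explicitly allowed (which needs the
3.1+ dm-crypt fix you mentioned).  The names below are only examples
based on your dmsetup output, so adjust to your setup:

# fstrim -v /home
# cryptsetup --allow-discards luksOpen /dev/mapper/vg0-home home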

> 
>     Off-line TRIM using fstrim is useful, but not essential if you have
>     bought a half-decent SSD that is not too small, and not too old.  So use
>     fstrim if it works on your setup - but don't worry if it doesn't.  It's
>     very unlikely that you will notice the difference.
> 
> 
> I read about it, but I don't think it's a proper solution; I'm
> thinking about trying another virtualization technology instead.

Read about it again, learn about it /properly/, and try it out with
/real-world/ loads rather than simple benchmarks.

There are many reasons for changing virtualisation technology - this is
not one of them.  You will /never/ see more than a marginal increase in
speed from enabling TRIM, and you may well see a significant decrease
for some operations.  But even if your claims were right and "discard"
mounts gave you twice the write speed, that is still not a reason to
change platforms unless the half-speed writes are too slow for the job
you need to do.  If the half-speed writes are fast enough, then doubling
that speed is unnecessary - and not worth making drastic changes for.

> 
> LXC doesn't seem to be stable enough for production yet.

That depends on your needs.  It gives some reasonable isolation for the
virtual machines, but it is not as secure as OpenVZ (root users in the
containers can escape to the host) and it doesn't have the same number
of tunable limits.  But for some uses, it is perfectly good.

> 
> Thinking about trying XEN on a more recent kernel.

XEN is a totally different beast - it does full virtualisation, rather
than the lightweight virtualisation of OpenVZ and LXC.  There is no
doubt whatsoever that the extra overhead of XEN will far outweigh any
difference you see in SSD speed with TRIM enabled or not.

David

>  
> 
> 
>     Hope that helps,
> 
>     David
> 
> 
> 
>     On 27/08/13 17:10, spameden wrote:
>     > is it implemented?
>     >
>     > I've tried it on the 3.2.0-4 Debian wheezy default kernel and
>     > it's working just fine:
>     > # dmsetup table
>     > vg0-home: 0 443277312 linear 9:1 25166208
>     > home: 0 443273216 crypt aes-cbc-plain
>     > 0000000000000000000000000000000000000000000000000000000000000000 0
>     253:2
>     > 4096 1 allow_discards
>     >
>     > But not on OpenVZ's 2.6.32.xxxx:
>     > # dmsetup table
>     > vg0-home: 0 443277312 linear 9:1 25166208
>     > home: 0 443273216 crypt aes-cbc-plain
>     > 0000000000000000000000000000000000000000000000000000000000000000 0
>     253:2
>     > 4096
>     > vg0-swap: 0 4194304 linear 9:1 20971904
>     > vg0-root: 0 20971520 linear 9:1 384
>     >
>     >
>     >
>     >
>     >
>     > _______________________________________________
>     > Users mailing list
>     > Users at openvz.org
>     > https://lists.openvz.org/mailman/listinfo/users
>     >
> 
> 


