
Write amplification

Write amplification (WA) is an undesirable phenomenon associated with flash memory and solid-state drives (SSDs) where the actual amount of information physically written to the storage media is a multiple of the logical amount intended to be written.

Basic SSD operation
Flash memory can sustain only a limited number of program-erase cycles (P/E cycles) over its life. Single-level cell (SLC) flash, designed for higher performance and longer endurance, can typically operate between 50,000 and 100,000 cycles. Multi-level cell (MLC) flash is designed for lower-cost applications and has a greatly reduced cycle count, typically between 3,000 and 5,000. Since 2013, triple-level cell (TLC) flash (e.g., 3D NAND) has been available, with cycle counts dropping to 1,000 P/E cycles. A lower write amplification is more desirable, as it corresponds to a reduced number of P/E cycles on the flash memory and thereby to an increased SSD life. Wear of the flash memory may also degrade performance, such as I/O speed.
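The link between P/E cycles, write amplification and drive life can be illustrated with a rough first-order estimate. This is a sketch only: it assumes perfect wear leveling and a constant write amplification, and the function name and the sample capacity are hypothetical.

```python
def host_bytes_before_wearout(capacity_bytes, pe_cycles, write_amplification):
    """Rough estimate of how much host data a drive can absorb.

    Assumes every block is worn evenly and reaches exactly pe_cycles,
    and that write amplification stays constant over the drive's life.
    """
    if write_amplification <= 0:
        raise ValueError("write_amplification must be positive")
    return capacity_bytes * pe_cycles / write_amplification


# Hypothetical 128 GB drive with MLC-class endurance (3,000 cycles).
capacity = 128 * 10**9
low_wa = host_bytes_before_wearout(capacity, 3000, 1.5)
high_wa = host_bytes_before_wearout(capacity, 3000, 3.0)
```

Under these assumptions, doubling the write amplification halves the amount of host data the drive can accept before the flash wears out.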
Calculating the value
Write amplification was present in SSDs before the term was defined, but it was in 2008 that both Intel and SiliconSystems started using the term in their papers and publications. All SSDs have a write amplification value, and it is based on both what is currently being written and what was previously written to the SSD. In order to accurately measure the value for a specific SSD, the selected test should be run for enough time to ensure the drive has reached a steady state condition.
\text{write amplification} = \frac{\text{data written to the flash memory}}{\text{data written by the host}}
The two quantities used for the calculation can be obtained via SMART statistics (ATA F7/F8; ATA F1/F9).
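The ratio above can be computed directly from two counter readings. The sketch below assumes the counters have already been read and converted to bytes; how they are obtained and which units they use vary by drive, and the function names are illustrative only.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Data written to the flash memory divided by data written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


def interval_write_amplification(before, after):
    """Write amplification over a test interval.

    before and after are (flash_bytes, host_bytes) counter snapshots.
    Using the difference excludes writes made before the test started,
    which matters when measuring a drive in steady state.
    """
    flash_delta = after[0] - before[0]
    host_delta = after[1] - before[1]
    return write_amplification(flash_delta, host_delta)


# Hypothetical snapshots taken before and after a long test run.
wa = interval_write_amplification(before=(1000, 500), after=(1600, 700))
```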
Factors affecting the value
Many factors affect the write amplification of an SSD. The table below lists the primary factors and how they affect the write amplification. For factors that are variable, the table notes whether the factor has a direct relationship or an inverse relationship. For example, as the amount of over-provisioning increases, the write amplification decreases (inverse relationship). If the factor is a toggle (enabled or disabled) function, then it has either a positive or negative relationship.
Garbage collection
Background garbage collection
The process of garbage collection involves reading and rewriting data to the flash memory. This means that a new write from the host will first require a read of the whole block, a write of the parts of the block which still include valid data, and then a write of the new data. This can significantly reduce the performance of the system. Many SSD controllers implement background garbage collection (BGC), sometimes called idle garbage collection or idle-time garbage collection (ITGC), where the controller uses idle time to consolidate blocks of flash memory before the host needs to write new data. This enables the performance of the device to remain high.
The SandForce SSD controllers [...] It is not clear if this feature is still available in currently shipping SSDs from these manufacturers. Systemic data corruption has been reported on these drives if they are not formatted properly using MBR and NTFS.
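The read-and-rewrite step of garbage collection can be sketched with a toy model. This is an illustration under simplifying assumptions, not how any particular controller works: a block is a Python list of pages, None marks a stale page, and erasing a block is modelled by emptying the list.

```python
def garbage_collect(victim, spare):
    """Copy still-valid pages from victim into spare, then erase victim.

    Returns the number of pages that had to be rewritten.
    """
    moved = 0
    for page in victim:
        if page is not None:
            spare.append(page)   # rewrite valid data into the spare block
            moved += 1
    victim.clear()               # a block can only be erased as a whole
    return moved


# Hypothetical 8-page block in which 3 pages are stale.
victim = ["a", None, "b", None, None, "c", "d", "e"]
spare = []
moved = garbage_collect(victim, spare)

# Reclaiming the block makes room for 3 new host pages,
# but the flash absorbs 5 relocated pages plus those 3 new pages.
new_pages = 3
local_wa = (moved + new_pages) / new_pages
```

In this example the flash receives 8 page writes for 3 pages of host data, so the fewer valid pages a reclaimed block holds, the lower the resulting write amplification.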
TRIM
TRIM is a SATA command that enables the operating system to tell an SSD which blocks of previously saved data are no longer needed as a result of file deletions or volume formatting. When an LBA is replaced by the OS, as with an overwrite of a file, the SSD knows that the original LBA can be marked as stale or invalid, and it will not save those blocks during garbage collection. If the user or operating system erases a file (not just removes parts of it), the file will typically be marked for deletion, but the actual contents on the disk are never actually erased. Because of this, the SSD does not know that it can erase the LBAs previously occupied by the file, so the SSD will keep including such LBAs in the garbage collection.
The introduction of the TRIM command resolves this problem for operating systems that support it, such as Windows 7, FreeBSD since version 8.1, and Linux since version 2.6.33 of the Linux kernel mainline. When a file is permanently deleted or the drive is formatted, the OS sends the TRIM command along with the LBAs that no longer contain valid data. This informs the SSD that those LBAs can be erased and reused, which reduces the number of LBAs that need to be moved during garbage collection. The result is that the SSD has more free space, enabling lower write amplification and higher performance.
Limitations and dependencies
The TRIM command also needs the support of the SSD. If the firmware in the SSD does not have support for the TRIM command, the LBAs received with the TRIM command will not be marked as invalid, and the drive will continue to garbage collect the data, assuming it is still valid. Only when the OS saves new data into those LBAs will the SSD know to mark the original LBA as invalid. SSD manufacturers that did not originally build TRIM support into their drives can either offer a firmware upgrade to the user or provide a separate utility that extracts the information on the invalid data from the OS and separately TRIMs the SSD. The benefit would be realized only after each run of that utility by the user. The user could set up that utility to run periodically in the background as an automatically scheduled task.
Just because an SSD supports the TRIM command does not necessarily mean it will be able to perform at top speed immediately after a TRIM command. The space which is freed up after the TRIM command may be at random locations spread throughout the SSD. It will take a number of passes of writing data and garbage collecting before those spaces are consolidated to show improved performance.
Even after the OS and SSD are configured to support the TRIM command, other conditions might prevent any benefit from TRIM. Databases and RAID systems that are not TRIM-aware will not know how to pass that information on to the SSD. In those cases the SSD will continue to save and garbage collect those blocks until the OS uses those LBAs for new writes.
The actual benefit of the TRIM command depends upon the free user space on the SSD. If the user capacity on the SSD is 100 GB and the user has saved 95 GB of data to the drive, any TRIM operation would not add more than 5 GB of free space for garbage collection and wear leveling. In those situations, increasing the amount of over-provisioning by 5 GB would allow the SSD to have more consistent performance, because it would always have the additional 5 GB of free space without having to wait for the TRIM command to come from the OS.
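The effect of TRIM on garbage collection can be sketched with a minimal model of the drive's bookkeeping. The class and method names below are hypothetical and the model tracks only which LBAs the drive believes are valid; it is not a description of any real firmware.

```python
class TinyFTL:
    """Toy model: tracks which LBAs the SSD still considers valid."""

    def __init__(self, supports_trim=True):
        self.supports_trim = supports_trim
        self.valid_lbas = set()

    def write(self, lba):
        self.valid_lbas.add(lba)

    def trim(self, lbas):
        # Without firmware support the hint is ignored, so the drive
        # keeps treating the deleted data as valid.
        if self.supports_trim:
            self.valid_lbas.difference_update(lbas)

    def pages_to_relocate(self, block_lbas):
        """Pages garbage collection must copy before erasing this block."""
        return sum(1 for lba in block_lbas if lba in self.valid_lbas)


def relocation_cost_after_delete(supports_trim):
    ftl = TinyFTL(supports_trim)
    for lba in range(8):          # a file occupying LBAs 0..7
        ftl.write(lba)
    ftl.trim(range(4))            # the OS deletes the first half
    return ftl.pages_to_relocate(range(8))


with_trim = relocation_cost_after_delete(True)
without_trim = relocation_cost_after_delete(False)
```

With TRIM support the drive relocates only the four pages that are still in use; without it, all eight pages are copied even though half of them belong to deleted data.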
Over-provisioning
[...] tables. Mid-range and high-end flash products usually have larger over-provisioning spaces. Over-provisioning is represented as a percentage ratio of extra capacity to user-available capacity:
\text{over-provisioning} = \frac{\text{physical capacity} - \text{user capacity}}{\text{user capacity}}
Over-provisioning typically comes from three sources:
• The computation of the capacity and use of gigabyte (GB) as the unit instead of gibibyte (GiB). Both HDD and SSD vendors use the term GB to represent a decimal GB or 1,000,000,000 (= 10^9) bytes. Like most other electronic storage, flash memory is assembled in powers of two, so calculating the physical capacity of an SSD would be based on 1,073,741,824 (= 2^30) bytes per binary GB or GiB. The difference between these two values is 7.37% (= (2^30 − 10^9) / 10^9 × 100%). Therefore, a 128 GB SSD with 0% additional over-provisioning would provide 128,000,000,000 bytes to the user (out of 137,438,953,472 total). This initial 7.37% is typically not counted in the total over-provisioning number, and the true amount available is usually less, as some storage space is needed for the controller to keep track of non-operating system data such as block status flags.
Free user space
The SSD controller will use free blocks on the SSD for garbage collection and wear leveling. The portion of the user capacity which is free from user data (either already TRIMed or never written in the first place) will look the same as over-provisioning space (until the user saves new data to the SSD). If the user saves data consuming only half of the total user capacity of the drive, the other half of the user capacity will look like additional over-provisioning (as long as the TRIM command is supported in the system).
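The over-provisioning formula and the 128 GB example can be checked with a few lines of arithmetic. The helper names are illustrative, and the second function simply treats all space not holding user data as spare, which is the simplification the text describes for a system with working TRIM.

```python
GB = 10**9    # decimal gigabyte, as used on product labels
GIB = 2**30   # binary gigabyte (gibibyte), as flash is physically built


def over_provisioning(physical_bytes, user_bytes):
    """(physical capacity - user capacity) / user capacity."""
    return (physical_bytes - user_bytes) / user_bytes


def effective_spare_bytes(physical_bytes, stored_bytes):
    """Space the controller can treat as spare: everything not holding user data."""
    return physical_bytes - stored_bytes


physical = 128 * GIB   # 137,438,953,472 bytes of flash
user = 128 * GB        # 128,000,000,000 bytes exposed to the user

inherent_op_percent = over_provisioning(physical, user) * 100

# If the user fills only half of the user capacity, the unused half
# behaves like additional over-provisioning.
spare_when_half_full = effective_spare_bytes(physical, user // 2)
```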
DRAM buffer
The DRAM buffer (if present) on flash devices (usually SSDs) can be used for caching the FTL table, buffering data writes, and garbage collection.
Secure erase
The ATA Secure Erase command is designed to remove all user data from a drive. With an SSD without integrated encryption, this command will put the drive back to its original out-of-box state. This will initially restore its performance to the highest possible level and the best (lowest number) possible write amplification, but as soon as the drive starts garbage collecting again, the performance and write amplification will start returning to the former levels. Many tools use the ATA Secure Erase command to reset the drive and provide a user interface as well. One free tool that is commonly referenced in the industry is called HDDerase. GParted and Ubuntu live CDs provide a bootable Linux system of disk utilities including secure erase.
Drives which encrypt all writes on the fly can implement ATA Secure Erase in another way. They simply zeroize the encryption key and generate a new random one each time a secure erase is done. In this way the old data cannot be read any more, as it cannot be decrypted. Some drives with integrated encryption will physically clear all blocks after that as well, while other drives may require a TRIM command to be sent to the drive to put it back to its original out-of-box state (as otherwise their performance may not be maximized).
Wear leveling
If a particular block was programmed and erased repeatedly without writing to any other blocks, that block would wear out before all the other blocks, thereby prematurely ending the life of the SSD. For this reason, SSD controllers use a technique called wear leveling to distribute writes as evenly as possible across all the flash blocks in the SSD. In a perfect scenario, this would enable every block to be written to its maximum life so they all fail at the same time. Unfortunately, the process of evenly distributing writes requires data previously written and not changing (cold data) to be moved, so that data which is changing more frequently (hot data) can be written into those blocks. Each time data is relocated without being changed by the host system, this increases the write amplification and thus reduces the life of the flash memory. The key is to find an optimal algorithm which maximizes them both.
Separating static and dynamic data
The separation of static (cold) and dynamic (hot) data to reduce write amplification is not a simple process for the SSD controller. The process requires the SSD controller to separate the LBAs with data which is constantly changing and requiring rewriting (dynamic data) from the LBAs with data which rarely changes and does not require any rewrites (static data). If the data is mixed in the same blocks, as with almost all systems today, any rewrites will require the SSD controller to rewrite both the dynamic data (which caused the rewrite initially) and static data (which did not require any rewrite). Any garbage collection of data that would not have otherwise required moving will increase write amplification. Therefore, separating the data will enable static data to stay at rest, and if it never gets rewritten it will have the lowest possible write amplification for that data. The drawback to this process is that the SSD controller must still find a way to wear level the static data, because those blocks that never change will not get a chance to be written to their maximum P/E cycles.
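The two sides of wear leveling described above can be sketched as a pair of simple policies. Both functions are hypothetical illustrations: real controllers use more elaborate heuristics, and the erase-count threshold is an arbitrary parameter chosen for the example.

```python
def pick_block_for_write(erase_counts, free_blocks):
    """Dynamic wear leveling: write new data to the least-worn free block.

    erase_counts maps block id -> number of erases so far.
    Ties are broken by block id so the choice is deterministic.
    """
    return min(free_blocks, key=lambda block: (erase_counts[block], block))


def needs_static_relocation(erase_counts, threshold):
    """Static wear leveling trigger.

    Returns True when the gap between the most-worn and least-worn block
    exceeds threshold, meaning cold data should be moved off the
    least-worn blocks even though the host did not change it.
    """
    counts = erase_counts.values()
    return max(counts) - min(counts) > threshold


# Hypothetical drive state: block 1 holds cold data and is barely worn.
erase_counts = {0: 10, 1: 3, 2: 7}
target = pick_block_for_write(erase_counts, free_blocks={0, 2})
relocate = needs_static_relocation(erase_counts, threshold=5)
```

Moving the cold data out of block 1 costs extra flash writes, which is the write amplification penalty of wear leveling, but it lets that block take its share of future erases.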
Performance implications
Sequential writes
When an SSD is writing large amounts of data sequentially, the write amplification is equal to one, meaning there is no write amplification. The reason is that, as the data is written, the entire (flash) block is filled sequentially with data related to the same file. If the OS determines that the file is to be replaced or deleted, the entire block can be marked as invalid, and there is no need to read parts of it to garbage collect and rewrite into another block. It will need only to be erased, which is much easier and faster than the read–erase–modify–write process needed for randomly written data going through garbage collection.
Random writes
The peak random write performance on an SSD is driven by plenty of free blocks after the SSD is completely garbage collected, secure erased, 100% TRIMed, or newly installed. The maximum speed will depend upon the number of parallel flash channels connected to the SSD controller, the efficiency of the firmware, and the speed of the flash memory in writing to a page. During this phase the write amplification will be the best it can ever be for random writes and will be approaching one. Once the blocks are all written once, garbage collection will begin and the performance will be gated by the speed and efficiency of that process. Write amplification in this phase will increase to the highest levels the drive will experience.
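The difference between sequential and random writes can be reproduced with a small simulation. This is a toy model under stated assumptions: a log-structured drive with 8 blocks of 4 pages, greedy garbage collection that always reclaims the full block with the fewest valid pages, and no wear leveling. All names and sizes are hypothetical.

```python
import random


def simulate_wa(host_lbas, logical_pages, num_blocks=8, pages_per_block=4):
    """Return flash page writes divided by host page writes for a toy SSD."""
    if logical_pages >= (num_blocks - 2) * pages_per_block:
        raise ValueError("toy model needs at least two blocks of spare capacity")

    valid = [set() for _ in range(num_blocks)]  # LBAs whose current copy is in each block
    used = [0] * num_blocks                     # pages programmed since the last erase
    location = {}                               # LBA -> block holding its current copy
    free = list(range(1, num_blocks))           # erased blocks
    open_block = 0                              # block currently receiving writes
    flash_writes = 0

    def program(lba):
        nonlocal open_block, flash_writes
        if used[open_block] == pages_per_block:
            open_block = free.pop()
        old = location.get(lba)
        if old is not None:
            valid[old].discard(lba)             # the previous copy becomes stale
        valid[open_block].add(lba)
        used[open_block] += 1
        location[lba] = open_block
        flash_writes += 1

    def collect():
        full = [b for b in range(num_blocks)
                if b != open_block and used[b] == pages_per_block]
        victim = min(full, key=lambda b: len(valid[b]))
        for lba in list(valid[victim]):
            program(lba)                        # relocate still-valid pages
        used[victim] = 0                        # erase the victim
        free.append(victim)

    host_writes = 0
    for lba in host_lbas:
        while len(free) < 2:                    # keep one block in reserve for relocation
            collect()
        program(lba)
        host_writes += 1
    return flash_writes / host_writes


sequential = [lba for _ in range(50) for lba in range(20)]
rng = random.Random(0)
scattered = [rng.randrange(20) for _ in range(1000)]

wa_sequential = simulate_wa(sequential, logical_pages=20)
wa_random = simulate_wa(scattered, logical_pages=20)
```

In this model sequential overwrites always leave a fully stale block to reclaim, so nothing is relocated, whereas random overwrites leave valid pages scattered across blocks that must be copied before each erase.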
Impact on performance
The overall performance of an SSD is dependent upon a number of factors, including write amplification. Writing to a flash memory device takes longer than reading from it. An SSD generally uses multiple flash memory components connected in parallel as channels to increase performance. If the SSD has a high write amplification, the controller will be required to write that many more times to the flash memory. This requires even more time to write the data from the host. An SSD with a low write amplification will not need to write as much data and can therefore finish writing sooner than a drive with a high write amplification.
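The relationship between write amplification and write time can be put into a simple estimate. This sketch assumes the flash write bandwidth is the only bottleneck and ignores caching, parallelism changes and read overhead; the function name and the numbers are hypothetical.

```python
def host_write_time_s(host_bytes, write_amplification, flash_bytes_per_s):
    """Seconds needed to absorb host_bytes when flash bandwidth is the limit.

    The flash must physically write host_bytes * write_amplification bytes.
    """
    if flash_bytes_per_s <= 0:
        raise ValueError("flash_bytes_per_s must be positive")
    return host_bytes * write_amplification / flash_bytes_per_s


# Same host workload and flash bandwidth, two different write amplifications.
fast = host_write_time_s(1000, 1.0, 500)
slow = host_write_time_s(1000, 3.0, 500)
```

Under these assumptions the drive with three times the write amplification takes three times as long to complete the same host write.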
Product statements
In September 2008, Intel announced the X25-M SATA SSD with a reported WA as low as 1.1. In April 2009, SandForce announced the SF-1000 SSD Processor family with a reported WA of 0.5, which uses data compression to achieve a sub-1.0 WA. Before this announcement, a write amplification of 1.0 was considered the lowest that could be attained with an SSD.