From NAND to EDSFF: 8 SSD Features You May Not Know About, But Should

By Praveen Midha

Director of Segment Marketing & Technical Marketing in Data Center Flash

Western Digital Corporation

March 05, 2024


In the ever-evolving landscape of data storage, selecting the right enterprise-class solid-state drive (SSD) for your specific data center application or workload has become a critical decision.

While factors like price and performance are undoubtedly important, there are numerous other aspects that must be considered. Get the form factor wrong and it may not fit into your server. Choose the wrong endurance rating and you may find yourself replacing the SSD in several months’ time.

SSDs have evolved tremendously over the past several years, pushing endurance, capacity, speed, and power efficiency to meet the expanding need for storage in a data-centric world. This includes NVMe™, a newer storage protocol that can deliver up to five times the speed of traditional serial-attached SCSI (SAS) and SATA drives.

However, given this rapid pace of innovation, choosing the right SSD for your applications may require a deep dive into new features and capabilities that you may not know about but that are transforming data center and cloud storage.

A Guide to Eight SSD Features

Here are eight SSD features and characteristics to evaluate carefully so you can better understand today’s enterprise SSDs and how they can meet the needs of your organization:

  1. Capacity: One of the top purchasing considerations for any type of storage is capacity, the total amount of data the drive can store. Today this is usually measured in gigabytes (GB) or terabytes (TB). Be aware, however, that the as-built and as-sold capacities of an SSD differ, because some of the drive’s space is reserved for overprovisioning. This reserved capacity, usually between 0% and 28%, is used for the drive’s internal operations, such as garbage collection, to improve overall drive performance. For example, with about 7% overprovisioning, a 1024GB as-built drive has 960GB of as-sold capacity. Higher overprovisioning can also increase endurance: the reserved capacity acts as a pool of spare blocks, letting the drive spread write (program/erase) cycles across a larger population of memory cells and extend the drive’s lifetime.
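
The overprovisioning percentage in the example above can be checked with a little arithmetic. This is a minimal sketch assuming one common convention, overprovisioning = (as-built − as-sold) ÷ as-sold; vendors may define or round it differently, and the function name `overprovisioning_pct` is hypothetical.

```python
def overprovisioning_pct(as_built_gb: float, as_sold_gb: float) -> float:
    """Overprovisioning as a percentage of as-sold (user) capacity.

    Assumes the common convention OP = (as-built - as-sold) / as-sold.
    """
    if as_sold_gb <= 0 or as_built_gb < as_sold_gb:
        raise ValueError("as-built capacity must be >= as-sold capacity > 0")
    return (as_built_gb - as_sold_gb) / as_sold_gb * 100


# The article's example: 1024GB as built, 960GB as sold.
print(f"{overprovisioning_pct(1024, 960):.1f}%")  # 6.7%, i.e. "about 7%"
# The same raw capacity sold as 800GB sits at the top of the 0-28% range.
print(f"{overprovisioning_pct(1024, 800):.1f}%")  # 28.0%
```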

  2. Endurance: Flash cells can be written only a finite number of times, typically from around 100 to more than 10,000 program/erase cycles, and exceeding this limit degrades the integrity of the device. A drive’s endurance, or how long it is expected to last under given conditions, is measured using two metrics.

Drive writes per day (DWPD) measures how many times you can write the drive’s entire capacity each day over its warranty period. Typical ratings fall between 0.3 and 1 DWPD, although drives designed for write-intensive workloads are rated higher. For example, a 1TB drive with a 5-year warranty and a rating of 1 DWPD lets you write 1TB of data to the drive every day for 5 years.

Terabytes written (TBW) or petabytes written (PBW) measures the cumulative amount of data you can write to the drive over its entire lifetime, and it can be calculated from DWPD. Continuing the example above, TBW = capacity (1TB) × DWPD (1) × 365 (days per year) × 5 (years of warranty) = 1825TB.
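
Because DWPD and TBW are two views of the same warranty budget, converting between them is simple multiplication. A minimal sketch, assuming 365 days per year (leap days ignored, as in the example above) and decimal terabytes; the function names are illustrative.

```python
DAYS_PER_YEAR = 365  # matches the article's example; leap days ignored


def tbw_from_dwpd(capacity_tb: float, dwpd: float, warranty_years: float) -> float:
    """Total terabytes that may be written over the warranty period."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * warranty_years


def dwpd_from_tbw(tbw: float, capacity_tb: float, warranty_years: float) -> float:
    """Full-drive writes per day implied by a TBW rating."""
    return tbw / (capacity_tb * DAYS_PER_YEAR * warranty_years)


print(tbw_from_dwpd(1, 1, 5))     # 1825
print(dwpd_from_tbw(1825, 1, 5))  # 1.0
```

The same arithmetic shows why PBW appears on high-capacity drives: a 3.84TB drive rated at 1 DWPD for 5 years works out to 7,008TB, or about 7PB.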

While higher DWPD and TBW ratings are generally better, some applications are mostly read-intensive with very few writes, so it’s important to choose an SSD that meets application requirements without adding unnecessary cost. Look for an SSD with endurance features that automatically track how many writes have been executed and how much endurance remains. With this measurement built into the drive, you can determine your needs without instrumenting your own application to get exact values, which helps with resource optimization, procurement planning, and overprovisioning decisions without manually keeping track of writes.
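
On NVMe drives, one place this write accounting surfaces is the SMART / Health Information log, which reports Data Units Written (counted in thousands of 512-byte units) along with a Percentage Used estimate; the `nvme smart-log` command from nvme-cli displays both. The sketch below, with hypothetical function names and an illustrative counter value rather than data from a real drive, converts that counter into terabytes and compares it with a TBW rating such as the 1825TBW drive from the example above.

```python
BYTES_PER_DATA_UNIT = 512 * 1000  # NVMe counts data units in thousands of 512-byte blocks


def terabytes_written(data_units_written: int) -> float:
    """Convert the NVMe 'Data Units Written' counter to decimal terabytes."""
    return data_units_written * BYTES_PER_DATA_UNIT / 1e12


def endurance_used_pct(data_units_written: int, tbw_rating: float) -> float:
    """Share of the rated TBW consumed so far, as a percentage."""
    return terabytes_written(data_units_written) / tbw_rating * 100


# Illustrative counter value (not from a real drive) against a 1825TBW rating.
units = 365_000_000
tb = terabytes_written(units)
print(f"{tb:.1f} TB written, {endurance_used_pct(units, 1825):.1f}% of rated TBW")
```

The drive’s own Percentage Used field is the vendor’s estimate of life consumed and need not match this TBW-based ratio exactly.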

  3. Error Rates: Data integrity should always be a consideration when choosing data storage. The error rate of a device is how often errors occur when data is read from or written to the drive, and it is closely tied to the drive’s expected failure rate over its lifetime. Together, these are commonly described using three metrics.

The first is the unrecoverable bit error rate (UBER), which measures data corruption that the drive cannot correct. It is the ratio of the number of bits that cannot be recovered to the total number of data bits read from the SSD. It is reported in scientific notation, typically in a range of 1 × 10⁻¹⁶ to 1 × 10⁻¹⁸. For example, a UBER of 1 × 10⁻¹⁶ means that there is one uncorrectable error for every 10¹⁶ bits read.
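
To see what a UBER specification means in practice, multiply it by the number of bits read. The sketch below does that and, under an added simplifying assumption that errors occur independently (a Poisson model, which UBER ratings themselves do not state), estimates the chance of at least one uncorrectable error. Function names are illustrative.

```python
import math


def expected_uncorrectable_errors(bytes_read: float, uber: float) -> float:
    """Expected uncorrectable bit errors = bits read x UBER."""
    return bytes_read * 8 * uber


def prob_at_least_one_error(bytes_read: float, uber: float) -> float:
    """Assumes independent errors (Poisson model): P(>=1) = 1 - exp(-expected)."""
    return 1 - math.exp(-expected_uncorrectable_errors(bytes_read, uber))


PETABYTE = 1e15  # decimal bytes
for uber in (1e-16, 1e-17, 1e-18):
    expected = expected_uncorrectable_errors(PETABYTE, uber)
    p = prob_at_least_one_error(PETABYTE, uber)
    print(f"UBER {uber:.0e}: {expected:.3f} expected errors per PB read, P(>=1) = {p:.1%}")
```

Reading 1PB from a drive rated at 1 × 10⁻¹⁶ therefore implies about 0.8 expected uncorrectable errors, while 1 × 10⁻¹⁸ lowers that to about 0.008.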

Mean time between failures (MTBF) measures the average expected time between failures and directly relates to the drive’s reliability. It is stated in hours, and most SSDs are rated at 1 to 2 million hours. This does not mean that any specific drive is guaranteed to last that long, however; it is a statistical estimate of the average time before a drive fails.

Annual failure rate (AFR) is the probability of a drive failing during a full year of use. It is usually stated as a percentage, with most SSDs rated between 0.3% and 0.4%. AFR can be estimated from MTBF as AFR ≈ 8,760 hours (one year of continuous operation) ÷ MTBF hours.
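
The MTBF-to-AFR conversion is easy to script. The approximation 8,760 ÷ MTBF holds when MTBF is far larger than one year; the sketch also shows the form that follows from assuming a constant failure rate (an exponential model), 1 − e^(−8,760/MTBF), which is a modeling assumption rather than a method every vendor uses. Function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours x 365 days of continuous operation


def afr_approx(mtbf_hours: float) -> float:
    """AFR ~= operating hours per year / MTBF (valid when MTBF >> 1 year)."""
    return HOURS_PER_YEAR / mtbf_hours


def afr_exponential(mtbf_hours: float) -> float:
    """AFR assuming a constant failure rate: 1 - exp(-hours_per_year / MTBF)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for mtbf in (1_000_000, 2_000_000, 2_500_000):
    print(f"MTBF {mtbf:,} h -> AFR ~{afr_approx(mtbf):.2%} "
          f"(exponential model: {afr_exponential(mtbf):.2%})")
```

At 2 million hours this gives about 0.44%; working backward, an AFR of 0.3% to 0.4% corresponds to an MTBF of roughly 2.2 to 2.9 million hours.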

  4. NAND: The NAND cell is the most fundamental storage component in an SSD, and the cell type determines the number of bits stored per cell and, ultimately, density and cost. In single-level cell (SLC) NAND, only 1 bit of data is stored per cell. SLC has the highest speed and endurance, but it also has the lowest data density, which makes it the most expensive. Multi-level cell (MLC) NAND stores 2 bits per cell. Its higher density makes it less expensive than SLC, but it is slower and has less endurance. Triple-level cell (TLC) NAND stores, you guessed it, 3 bits per cell, with higher density than MLC that gives it a cost advantage, but again, it is slower and has less endurance. Quad-level cell (QLC) NAND stores 4 bits per cell for higher density still, but it is slower and has less endurance than TLC. Penta-level cell (PLC) NAND, with 5 bits per cell, is the next step and is actively being developed. Choosing the correct type means balancing your particular needs for durability, speed, and cost.
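
One way to see the density trade-off is to count the distinct charge (threshold-voltage) levels each cell type must hold: n bits per cell require 2ⁿ levels, so each added bit doubles the number of levels packed into the same voltage window. The sketch below tabulates that, along with how much data a fixed number of cells stores; the one-billion-cell figure is arbitrary and purely illustrative.

```python
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5}


def voltage_levels(bits: int) -> int:
    """Distinct threshold-voltage states needed to encode `bits` bits in one cell."""
    return 2 ** bits


def capacity_gb(cells: int, bits: int) -> float:
    """Decimal gigabytes stored by `cells` cells at `bits` bits per cell."""
    return cells * bits / 8 / 1e9


CELLS = 1_000_000_000  # arbitrary illustrative cell count
for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s)/cell, {voltage_levels(bits):2d} levels, "
          f"{capacity_gb(CELLS, bits):.3f} GB from {CELLS:,} cells")
```

More levels in the same voltage window leave smaller margins between states, which is a key reason denser NAND needs slower, more careful programming and tolerates fewer program/erase cycles.
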
  5. NVMe: Nonvolatile memory express (NVMe) is a command set and protocol that is typically transported over the PCIe® bus. Earlier NAND SSDs used legacy protocols built for hard drives, like SATA and SAS, which did not take full advantage of the speed and characteristics of NAND flash. NVMe was created to optimize data transfers to flash devices and raise the performance ceiling. PCIe is the transport for mainstream NVMe drives, but it is not the only option. NVMe over Fabrics (NVMe-oF™) was introduced to expand the use cases for NVMe devices by allowing them to be connected via Fibre Channel, Ethernet, InfiniBand™, or other network fabrics.

NVMe SSDs provide higher performance, throughput, scalability, and efficiency, with lower latency, than SAS or SATA drives. For use cases that require speed and responsiveness, NVMe offers numerous advantages.
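
On Linux hosts, one quick way to see which transport each NVMe controller uses (PCIe for local drives, or a fabric transport such as TCP, RDMA, or Fibre Channel for NVMe-oF) is to read the attributes the kernel exposes under /sys/class/nvme. The sketch below assumes that Linux sysfs layout, with `model` and `transport` attribute files per controller; the function names are hypothetical, and other operating systems expose this information differently.

```python
from pathlib import Path


def _read_attr(ctrl_dir: Path, name: str) -> str:
    """Read one sysfs attribute, or return 'unknown' if it is absent."""
    path = ctrl_dir / name
    return path.read_text().strip() if path.is_file() else "unknown"


def list_nvme_controllers(sysfs_root: str = "/sys/class/nvme"):
    """Return (controller, model, transport) tuples, e.g. transport 'pcie' or 'tcp'."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or the NVMe driver is not loaded
    return [
        (ctrl.name, _read_attr(ctrl, "model"), _read_attr(ctrl, "transport"))
        for ctrl in sorted(root.iterdir())
    ]


if __name__ == "__main__":
    for name, model, transport in list_nvme_controllers():
        print(f"{name}: {model} via {transport}")
```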

  6. Enterprise & Data Center SSD Form Factor:

Form factor determines how and where the SSD physically fits in the storage environment, and it’s important to remember that one size does not fit all. SSDs still come as 2.5-inch small form factor (SFF) drives, add-in cards (AIC), and long, thin “gum stick” M.2 modules, but a more recent addition is the Enterprise & Data Center SSD Form Factor (EDSFF) family of designs.

Optimized for storage servers to achieve high density in a small footprint, EDSFF helps increase petabytes of storage per server rack in cloud infrastructures, a requirement for modern applications that generate, consume, and deliver huge volumes of information. EDSFF now includes drives in multiple sizes and dimensions, including long, short, thin, and thick, depending on the desired characteristics and the type of enclosure. There are four main variants of EDSFF: E1.L, E1.S, E3.S, and E3.L. In addition to increased density, EDSFF drives can improve airflow for better cooling and overall performance, provide easy access for serviceability, and support hot-swapping, the ability to replace a drive while the system stays powered on. Together, these features can improve total cost of ownership (TCO).

  7. Power loss protection: Power loss protection (PLP) aims to protect data against sudden power loss. During normal operation, a portion of the data (usually active, or hot, data) is temporarily held in volatile memory rather than in NAND. This volatile memory needs constant power to retain that data. A normal shutdown includes a step that flushes the data from volatile memory into NAND so it is preserved. During a sudden or unexpected power loss, however, the flush cannot be completed, so any data still in volatile memory is lost.

PLP buys time for the flush under sudden power loss by using capacitors within the device that charge while power is present and, when power fails, supply enough energy to keep the volatile memory powered while its data is moved to NAND. Without PLP, data in flight cannot be guaranteed to be written, which may result in corrupt, incomplete, or lost data.
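
The difference PLP makes can be illustrated with a toy model. The sketch below is hypothetical and deliberately simplified: it assumes the drive acknowledges a write once the data reaches its volatile cache, and it treats the capacitor hold-up energy as simply allowing the normal flush to complete.

```python
class ToyDrive:
    """Hypothetical, simplified drive with a volatile write cache in front of NAND."""

    def __init__(self, has_plp: bool):
        self.has_plp = has_plp
        self.cache = {}  # volatile memory: contents vanish without power
        self.nand = {}   # nonvolatile flash: contents survive power loss

    def write(self, lba: int, data: bytes) -> None:
        # Assumed write-back behavior: the write is acknowledged once it is cached.
        self.cache[lba] = data

    def flush(self) -> None:
        # Normal shutdown step: move cached data into NAND.
        self.nand.update(self.cache)
        self.cache.clear()

    def sudden_power_loss(self) -> None:
        if self.has_plp:
            # Capacitor energy keeps the drive alive long enough to finish the flush.
            self.flush()
        else:
            # No hold-up energy: data still in volatile memory is lost.
            self.cache.clear()


for plp in (False, True):
    drive = ToyDrive(has_plp=plp)
    drive.write(0, b"acknowledged but not yet in NAND")
    drive.sudden_power_loss()
    print(f"PLP={plp}: blocks preserved in NAND = {len(drive.nand)}")
```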

  8. Performance: When choosing the right SSD, there are two ways to look at performance measurements: burst and steady state. These metrics give insight into how a drive behaves under different conditions, and your use case and application will determine which one matters more.

Client drives are measured by their burst performance; you can think of them as the sprinters of the flash world. Most client workloads are short and bursty, like opening a program or saving a file, and can use an SLC cache to speed them up. Burst performance is tested when the drive is not full of data and has not had any I/O commands issued to it. From this rested state, the drive is issued a read or write command to see how quickly it completes. This lets the drive use clever algorithms and caching technology to rapidly serve the data the user asks for or write what they need saved.

Enterprise drives are the workhorses of the flash world; you can compare them to ultramarathon runners. They are typically built to handle constant activity from multiple users at the same time, so they are measured differently from client drives, with more rigorous testing and different workloads. Before performance is measured on an enterprise SSD, the drive must first reach “steady state.” To reach steady state, the drive is filled with data and then runs a workload until performance is consistent, which can take multiple hours depending on the capacity and speed of the drive. Once performance has stabilized, measurements are taken that show how the drive performs in a worst-case scenario. There is also a heavy focus on Quality of Service (QoS), the consistency of response times, during steady state, since predictable response time is imperative for real-time applications.
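
Deciding when a drive has reached steady state is usually done with explicit criteria rather than by eye. The sketch below is modeled on the steady-state test in SNIA’s Solid State Storage Performance Test Specification (PTS) as it is commonly summarized: over a five-round measurement window, the spread of results stays within 20% of their average, and the least-squares best-fit line changes by no more than 10% of the average. Treat the thresholds, window size, function name, and sample IOPS values as assumptions, and consult the specification for the exact procedure.

```python
from statistics import mean


def is_steady_state(results, window=5, range_limit=0.20, slope_limit=0.10):
    """Check the last `window` per-round results against PTS-style criteria."""
    if window < 2:
        raise ValueError("window must contain at least two rounds")
    if len(results) < window:
        return False
    y = results[-window:]
    x = list(range(window))
    avg = mean(y)
    # Data excursion: max - min within the window must stay within 20% of the average.
    if max(y) - min(y) > range_limit * avg:
        return False
    # Slope excursion: change of the best-fit line across the window within 10% of average.
    x_bar = mean(x)
    slope = (sum((xi - x_bar) * (yi - avg) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return abs(slope) * (window - 1) <= slope_limit * avg


# Hypothetical per-round IOPS: high at first, then settling as the drive fills.
samples = [400_000, 250_000, 150_000, 110_000, 100_000, 98_000, 101_000, 99_000, 100_500]
for n in range(1, len(samples) + 1):
    if is_steady_state(samples[:n]):
        print(f"Steady state reached after round {n}")
        break
```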

Finding the Right SSD for Your Applications

Recent advances have made SSDs ideally suited for many of today’s high-performance applications across enterprise data center and cloud environments. As applications grow more diverse in their storage needs, it’s important to stay up to date on advanced SSD technology so you can achieve the best capacity, speed, and density at the right price-performance for your enterprise applications.

Praveen is focused on expanding Western Digital’s enterprise flash storage business across data center customers, including product strategy, portfolio expansion, and key go-to-market (GTM) initiatives. Praveen earned a master’s in business administration from Santa Clara University and a bachelor’s degree from the Indian Institute of Technology (BHU) Varanasi, India. He is based at the company’s headquarters in San Jose, CA.
