Last September, at SNIA’s Storage Developer Conference, Microsoft presented a prototype of the Project Denali SSD.
Project Denali drives provide the flexibility needed to optimize for the workloads of a wide variety of cloud applications, the simplicity to keep pace with rapid innovations in NAND flash memory and application design, and the scale required for multitenant hardware that is so common in the cloud.
This month, Azure Storage senior software engineer Laura Caulfield is attending the Open Compute Project (OCP) U.S. Summit 2018 to begin formalization of the specification that will define the interface to Project Denali drives.
Once in place, the specification will allow both hardware vendors and cloud providers to build and release their final products.
The specification defines a new abstraction, which separates the roles of managing NAND and managing data placement.
The former will remain in the hardware, close to the NAND and in the product that reinvents itself with every new generation of NAND.
The latter, once separated from the NAND management algorithms, will be allowed to follow its own schedule for innovation, and won’t be prone to bugs introduced by product cycles that track solely with NAND generations.
Caulfield states in a blog post, “One of the primary drivers behind the Project Denali architecture is the mismatch of write and reclaim size between the hardware and the application.
“Cloud-scale applications tend to scale out the number of tenants on a machine, whereas the hardware scales up the size of its architectural parameters. As the number of cores in a server increases, a single machine can support more VMs.
“When storage servers increase their capacity, they typically increase the number of tenants using each as a back-end. While there are some notable exceptions, there is still a need for cloud hardware to provide enough flexibility to efficiently serve these multi-tenant designs.”
SSDs’ caching further increases this divide.
Caulfield continues, “To make the best use of the SSD’s parallelism, its controller collects enough writes to fill one flash page in every plane, about 4MB in our current drives.
“If the data comes from multiple applications, the drive mixes their data in a way that is difficult to predict or control. This becomes a problem when the device performs garbage collection (GC).”
By this time, some applications may have freed their data, or updated it at a faster rate than other applications.
For efficient GC, the ideal is to free or update the data in a single block.
But when caching brings the effective block size to 1GB, it’s very unlikely that the host can issue enough data to the drive for one tenant before servicing writes for the next tenant.
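The arithmetic behind those figures, and the GC penalty of mixed tenant data, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the geometry (16 KiB pages, 2 planes per die, 128 dies, 256 pages per block) is hypothetical and was picked only because it reproduces the roughly 4MB and 1GB figures quoted above (treated here as binary units), and the helper names are not part of any Denali interface. The second half counts how many units of still-valid data GC would have to copy after one tenant frees its data, first with interleaved writes and then with each tenant's data grouped into its own block.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def write_stripe_bytes(page_bytes, planes_per_die, dies):
    """Bytes buffered to fill one flash page in every plane of the drive."""
    return page_bytes * planes_per_die * dies


def effective_block_bytes(stripe_bytes, pages_per_block):
    """Bytes reclaimed together when the striped blocks are erased as a unit."""
    return stripe_bytes * pages_per_block


def fill_blocks(writes, units_per_block):
    """Pack tenant IDs into blocks in arrival order."""
    return [writes[i:i + units_per_block]
            for i in range(0, len(writes), units_per_block)]


def relocation_cost(blocks, freed_tenant):
    """Valid units GC must copy to reclaim every block holding freed data."""
    cost = 0
    for block in blocks:
        if freed_tenant in block:
            cost += sum(1 for tenant in block if tenant != freed_tenant)
    return cost


# Hypothetical geometry, chosen to match the figures quoted in the text.
stripe = write_stripe_bytes(page_bytes=16 * KIB, planes_per_die=2, dies=128)
effective = effective_block_bytes(stripe, pages_per_block=256)
print(stripe // MIB, "MiB per write stripe")
print(effective // GIB, "GiB effective block")

# Four tenants each write eight units; a block holds eight units.
tenants, units_per_tenant, units_per_block = 4, 8, 8

# Drive-controlled placement: writes land in the order they arrive.
mixed = [t for _ in range(units_per_tenant) for t in range(tenants)]
# Host-controlled placement: each tenant's data fills its own block.
grouped = [t for t in range(tenants) for _ in range(units_per_tenant)]

# Tenant 0 frees all of its data; compare the copying GC has to do.
mixed_cost = relocation_cost(fill_blocks(mixed, units_per_block), 0)
grouped_cost = relocation_cost(fill_blocks(grouped, units_per_block), 0)
print("mixed placement copies", mixed_cost, "units")
print("grouped placement copies", grouped_cost, "units")
```

In this toy model, interleaving leaves every block holding a little of tenant 0's freed data, so GC has to copy the other tenants' data out of all of them. When each tenant's data sits in its own block, that block can simply be erased.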
The final nail in the coffin is how flash block sizes change over time.
Advances in the density of NAND flash typically increase the size of the block.
For this reason, in future designs it will be even more important for the host to be able to place its data at the granularity of the native block size.
Caulfield concludes, “To address the challenges described above, we took a closer look at the algorithms in SSD’s flash translation layers, and in cloud applications.
“We look forward to finalizing the Denali specification in the months ahead through the Denali group and plan to make this specification available broadly later this year.
“Refactoring the flash translation layer will promote innovation across the storage stack.”