Unity ECC: Unified Memory Protection Against Bit and Chip Errors
Description
DRAM vendors utilize On-Die Error Correction Codes (OD-ECC) to correct random bit errors internally. Meanwhile, system companies utilize Rank-Level ECC (RL-ECC) to protect data against chip errors. Separate protection increases the redundancy ratio to 32.8% in DDR5 and incurs significant performance penalties. This paper proposes a novel RL-ECC, Unity ECC, that can correct both single-chip and double-bit error patterns. Unity ECC corrects double-bit errors using unused syndromes of single-chip correction. Our evaluation shows that Unity ECC without OD-ECC can provide the same reliability level as Chipkill RL-ECC with OD-ECC. Moreover, it can significantly improve system performance and reduce DRAM energy and area by eliminating OD-ECC.
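The two numerical ideas in the description, the 32.8% redundancy ratio and the "unused syndromes" of single-chip correction, can be checked with a short counting sketch. The description does not state the memory layout or code parameters, so everything below is an assumption for illustration: 8 data chips plus 2 ECC chips per DDR5 sub-channel, 8 on-die parity bits per 128 data bits, and a rank-level code with ten 8-bit symbols (one per chip) and two parity symbols. The function names are hypothetical. The count only shows that enough spare syndromes exist; it does not show that a particular code maps every double-bit error to a distinct one.

```python
from math import comb

# Assumed DDR5 layout (not given in the description).
DATA_CHIPS, ECC_CHIPS = 8, 2            # rank-level: 2 ECC chips per 8 data chips
OD_DATA_BITS, OD_PARITY_BITS = 128, 8   # on-die: 8 parity bits per 128 data bits

# Assumed rank-level code shape: one 8-bit symbol per chip.
SYMBOL_BITS = 8
SYMBOLS = DATA_CHIPS + ECC_CHIPS
PARITY_SYMBOLS = ECC_CHIPS


def redundancy_ratio():
    """Extra stored bits per useful data bit when OD-ECC and RL-ECC are stacked."""
    stored = (DATA_CHIPS + ECC_CHIPS) * (OD_DATA_BITS + OD_PARITY_BITS)
    useful = DATA_CHIPS * OD_DATA_BITS
    return stored / useful - 1


def syndrome_budget():
    """Count syndromes used by single-chip correction and what is left over."""
    total = 2 ** (SYMBOL_BITS * PARITY_SYMBOLS)          # all possible syndromes
    single_chip = SYMBOLS * (2 ** SYMBOL_BITS - 1)       # any nonzero error in one symbol
    unused = total - 1 - single_chip                     # minus the all-zero (no error) syndrome
    # Double-bit errors that span two chips; those inside one chip are
    # already covered by single-chip correction.
    cross_chip_double_bit = (
        comb(SYMBOLS * SYMBOL_BITS, 2) - SYMBOLS * comb(SYMBOL_BITS, 2)
    )
    return total, single_chip, unused, cross_chip_double_bit


if __name__ == "__main__":
    print(f"redundancy ratio: {redundancy_ratio():.1%}")
    total, single_chip, unused, double_bit = syndrome_budget()
    print(f"syndromes: {total}, single-chip: {single_chip}, "
          f"unused: {unused}, cross-chip double-bit patterns: {double_bit}")
```

Under these assumptions the stacked redundancy comes out at about 32.8%, matching the figure in the description, and the cross-chip double-bit patterns are far fewer than the spare syndromes, which is consistent with the claim that double-bit correction can reuse them.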
Event Type
Time
Wednesday, 15 November 2023, 11:30am - 12pm MST
Architecture and Networks
Data Analysis, Visualization, and Storage
Fault Handling and Tolerance
Registration Categories
Award Finalists
Best Student Paper Finalist