Noted hardware historian and reverse-engineer Ken Shirriff recently pinpointed the exact spots on the original Intel Pentium die where missing transistors caused the "FDIV bug", which led to a $475 million recall in 1994. As seen on his Mastodon thread, Shirriff took a microscopic dive into the PLA that holds the faulty division table, tracking down the root cause of Intel's first major failure 30 years ago.
The image seen above is a photo of the CPU die of the original Pentium chip, Intel's first CPU on the P5 architecture, which helped the company become a household name. The Pentium was made on an 800nm process, and the die shot above was assembled by stitching together microscope photographs. The die contains 3.1 million transistors. At this scale, the grids of transistors are visible under a microscope, and the function of individual blocks on the die can be identified. Compare this to today's processors, which have tens of billions of transistors and are nigh-indecipherable.
The FDIV bug stems from faulty entries in a lookup table stored in a PLA (programmable logic array). The Pentium's floating point unit was much faster than those of contemporary chips, thanks to the SRT division algorithm. SRT produces two bits of the quotient per clock cycle, compared to the one bit per clock cycle of the Pentium's predecessor.
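To make "two bits per clock cycle" concrete, here is a minimal Python sketch of a radix-4 SRT-style recurrence. It is not the Pentium's circuit: real hardware picks each quotient digit from a lookup table indexed by a few leading bits of the remainder and divisor, whereas this sketch swaps in an exact comparison using rational arithmetic. The function names and the requirement that the dividend be pre-scaled so that |x| <= 2/3 * d are assumptions made for the illustration.

```python
from fractions import Fraction


def select_digit(shifted, d):
    """Pick a quotient digit in {-2, ..., 2} by exact comparison.

    Stand-in for the hardware lookup table (an assumption of this sketch).
    """
    return max(-2, min(2, round(shifted / d)))


def srt_radix4_divide(x, d, steps):
    """Approximate x / d, producing one radix-4 digit (two bits) per step.

    Returns (quotient, remainder) such that
    x == quotient * d + remainder / 4**steps holds exactly.
    """
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or abs(x) > Fraction(2, 3) * d:
        raise ValueError("requires d > 0 and |x| <= 2/3 * d")
    remainder = x
    quotient = Fraction(0)
    for j in range(1, steps + 1):
        shifted = 4 * remainder              # shift left by two bits
        digit = select_digit(shifted, d)     # digit in -2..2
        remainder = shifted - digit * d      # stays within 2/3 * d
        quotient += Fraction(digit, 4 ** j)  # append the new digit
    return quotient, remainder
```

Because the digits can be negative, a slightly-too-large digit in one step can be corrected by later steps, which is what allows the hardware to choose digits from only a few leading bits. After `steps` iterations the quotient is within (2/3) / 4**steps of the true value, so each step adds two bits of accuracy.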
To work, SRT requires a 2,048-cell lookup table on the die, holding the values -2, -1, 0, 1, and 2 in a very compact 112 rows. Each value is encoded by the presence or absence of transistors at grid points. This would have been a brilliant strategy, if not for one flaw: five entries in the table are missing their crucial transistors, so they read as 0 rather than the correct 2.
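The effect of a missing entry can be sketched with a toy model in Python. Everything here is a simplifying assumption rather than the Pentium's actual layout: the table is one-dimensional, indexed by the ratio of the shifted remainder to the divisor in steps of 1/8, and the "hole" position is chosen only for illustration. What it shares with the real table is the rule that an absent entry reads as 0.

```python
from fractions import Fraction

BOUND = Fraction(2, 3)  # remainder must stay within 2/3 of the divisor


def correct_digit(ratio):
    """Digit in {-2, ..., 2} that keeps the next remainder in bounds."""
    return max(-2, min(2, round(ratio)))


def build_table(holes=()):
    """Toy table: index i stands for ratio i/8, for |ratio| <= 21/8.

    Only nonzero digits are stored, mimicking 'transistor present'.
    Indices listed in `holes` are left out, as in the flawed chip.
    """
    table = {}
    for i in range(-21, 22):
        digit = correct_digit(Fraction(i, 8))
        if digit != 0 and i not in holes:
            table[i] = digit
    return table


def lookup(table, i):
    """An absent entry reads as 0, like a grid point with no transistor."""
    return table.get(i, 0)


def next_remainder_ratio(table, i):
    """Remainder (as a fraction of the divisor) after one step."""
    return Fraction(i, 8) - lookup(table, i)


good = build_table()
bad = build_table(holes={21})  # hypothetical missing "2" entry
```

With the complete table, every index leaves a remainder within the bound. With the hole at index 21, the lookup returns 0 instead of 2, the remainder jumps to 21/8 of the divisor, and no later digit in the range -2 to 2 can pull it back, so the quotient comes out wrong.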
The missing entries cause errors in floating point division, but how often those errors occur was hotly debated at the time. After Professor Thomas Nicely discovered the FDIV bug, Intel dismissed it as unimportant, claiming a typical user would encounter it only once every 27,000 years. IBM countered that it could happen every 24 days and halted sales of Pentiums. Intel caved to immense monetary pressure and recalled all affected chips at a loss of $475 million (read our 30th-anniversary post on the event for more of the history).
"Smart mathematicians figured out Pentium's division algorithm and the missing entries in 1995 by examining the pattern of errors," says Shirriff. "But I can confirm it in silicon." What's more, Shirriff's investigation found 16 missing entries, 11 more than the five originally believed. These 11 don't cause errors simply "due to luck." Intel later fixed the problem by filling all unused entries in the table with 2s, a quick fix that worked and saved room on future revisions of the Pentium.
For a fuller description of the Pentium die and the error, see Shirriff's full Mastodon thread. Shirriff promises a deeper dive into his investigation on his blog in the coming days, which may cover whether it's possible to fix bug-affected Pentiums by physically editing the PLA.