IT LOOKS LIKE the wait for more Cypress/HD58xx cards just got a little longer, but not by much. One has to wonder how TSMC can be so bad for so long.
The short story is that the flow of Cypress chips was supposed to uncork on December 1, plus or minus a week or so. ATI just told AIBs that it will now be another week or two on top of that. Instead of December 1, think December 15. Not the end of the world, but it is still annoying.
The blame has to be laid squarely at the feet of TSMC. Yields going from good to awful without any changes to the chip are not a design problem, they are a process problem. How TSMC could backslide like that is beyond me.
Officially, the problem is a ‘chamber mismatch’. This is where a tool, likely a plasma etch chamber, is out of calibration. Basically, if you set it to 5, and it works like a 3 or a 7, it is ‘off’. This can happen for a number of reasons, but semiconductor process engineers who spoke with SemiAccurate say that this is a bring-up error. It doesn’t just happen in the middle of a run.
It looks like this happened at TSMC, and not only that, but it went undetected for weeks or months. Given the number of chips affected, or more to the point, the number of chips that were supposed to come out but didn’t, it went undetected for a long, long time.
If this ‘mismatch’ happened early in the chip making process, then you might think it was plausible that the problem wasn’t detected until the dead chips came off the line. This isn’t likely given the cost of bad parts. Chipmakers check and double check the results of tools with stunning regularity. In the industry parlance, this is called metrology.
So, TSMC claims that it essentially had a mis-calibrated chamber. Unlikely, but fair enough. In the industry technical parlance, this is called ‘sh*t happens’. For TSMC’s metrology checks not to catch this for a month or more, however, is not acceptable. It should have taken a day at most to catch, then a little longer to figure out why it happened, and maybe a few days to get the darn thing re-calibrated.
For TSMC to screw up this badly for this long is not an understandable problem. It goes beyond error, and falls well into that nebulous zone where too many things went wrong for them to all have been chance. You have a bring-up error, a sudden tool miscalibration, and then two-plus months of no one noticing something that should have set off alarm bells within an hour or two. This should have been tested for and caught within a wafer batch or two.
TSMC has been making a mess of things lately, but it does have a 40nm semiconductor process that mostly works. This makes it one of the few places in the world that has that high a level of technology. It didn’t get there by chance; that took work, engineering, and attention to detail.
With that in mind, the sheer incompetence of this error, coupled with the sheer incompetence of the metrology, makes you wonder if it was actually by chance. From there, things get really odd. S|A