WE’VE SPENT A fair bit of time over the last month pondering the upcoming Radeon series from AMD. Specifically, the card rumored to be the 6770 is of interest. Sure, the bigger cards get the crowns and the bigger fanfare, but this is the card most of us are more likely to buy, given its price range.
The 6770, or Barts, has been rumored to have a 256-bit memory bus. This seems to break with Radeon tradition in this price category. A 128-bit bus is simpler, takes up less die space and saves a little on the PCB. Keeping costs down has been a definite AMD/ATi strategy of late; a few dollars here and there really makes a difference on these cards. Why would AMD want to increase costs? Is a bigger memory controller really going to make a worthwhile difference? Many are saying no, Barts doesn’t need twice the bandwidth of Juniper. Or does it? Let’s examine.
A few tests
We began with tests on a Radeon 5770, adjusting both GPU frequency and memory frequency. Many other games were tested and produced results similar to the two shown below, as well as to what other sites have found in the past. As testing progressed, there was a growing concern that GDDR5 error correction was affecting the results. It wasn’t until Crysis and Vantage were tested that the methodology looked valid, though we don’t pretend it covers everything.
Memory clocks were adjusted in small increments until the tests failed. The resulting percentages never really wavered. Confident in those findings, we simply used 3DMark Vantage and Crysis in the charts below and left out the rest, rather than spend another couple of days re-running benchmarks.
Vantage refused to cooperate with an 8.3% memory overclock. Note that the whole is greater than the sum of its parts.
These tests were performed on a Phenom X4 @ 3.2 GHz with 4GB of RAM under Windows 7 64-bit. Similar benchmarks were performed on faster Core i7 systems, indicating that a CPU bottleneck is not a problem.
The results are somewhat mixed but do show that GPU overclocking benefits performance more than memory overclocking in just about every case. Underclocking tests were also run to be sure that the error-correcting abilities of GDDR5 weren’t skewing the results. In the end, performance scaled at roughly a net 35% of the change in memory bandwidth. There’s one interesting point to note in the Crysis scores: combining the memory and GPU overclocks provided a greater improvement than the sum of the separate increases. The GPU seems to have been limited by bandwidth, at least in this case. So far, the information presented shows a fairly balanced 5770. There are benefits from both forms of overclocking, and in cases like Crysis Warhead the benefits approach linear scaling when both GPU and memory clocks are raised together.
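To make the “whole is greater than the sum” observation concrete, here is a minimal sketch of the check we are describing. The FPS numbers below are hypothetical placeholders, not our actual benchmark results; only the method of comparison is the point.

```python
# Hypothetical illustration (not the actual benchmark numbers): checking
# whether a combined GPU + memory overclock scales better than the sum
# of the separate overclocks would predict.

def gain(baseline_fps, overclocked_fps):
    """Percent improvement over the baseline."""
    return (overclocked_fps - baseline_fps) / baseline_fps * 100

baseline = 30.0   # hypothetical average FPS at stock clocks
gpu_only = 32.4   # hypothetical: GPU core overclocked alone
mem_only = 31.2   # hypothetical: memory overclocked alone
combined = 34.0   # hypothetical: both overclocked together

sum_of_parts = gain(baseline, gpu_only) + gain(baseline, mem_only)
whole = gain(baseline, combined)

print(f"sum of separate gains: {sum_of_parts:.1f}%")  # 8.0% + 4.0% = 12.0%
print(f"combined gain:         {whole:.1f}%")         # 13.3%
print("superlinear" if whole > sum_of_parts else "sublinear")
```

When the combined gain beats the sum of the parts, as it did in Crysis, the core was being held back by bandwidth.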
Investigating other GPUs
After the 5770 examination, the comparison to the HD 4890 seems almost mandatory. These two chips have seemingly identical architectures, barring DX11 support and, of course, the memory controllers. It has been theorized that the 20% performance difference between these two cards is due to memory bandwidth. Note that the 4890 does not have double the bandwidth even though its memory controller is twice the width (256-bit vs 128-bit). With its differently rated memory, the 4890 has 62.5% more bandwidth: 124.8 GB/sec vs 76.8 GB/sec. Let’s observe this in the tables alongside the final memory figures. We’ll use a 20% performance difference between the 4890 and 5770 for this comparison, since a perfectly accurate delta can never be achieved.
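The bandwidth figures above fall out of simple arithmetic: peak bandwidth is the bus width in bytes times the effective data rate. A quick sketch, using the cards’ rated memory clocks (975 MHz x4 for the 4890, 1200 MHz x4 for the 5770):

```python
# Peak-bandwidth arithmetic behind the 4890 vs 5770 comparison.
# Peak bandwidth = (bus width in bytes) * (effective data rate).

def peak_bandwidth_gbs(bus_width_bits, effective_mhz):
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * effective_mhz / 1000

hd4890 = peak_bandwidth_gbs(256, 3900)  # 256-bit GDDR5 at 975 MHz (x4)
hd5770 = peak_bandwidth_gbs(128, 4800)  # 128-bit GDDR5 at 1200 MHz (x4)

print(f"HD 4890: {hd4890:.1f} GB/s")                   # 124.8 GB/s
print(f"HD 5770: {hd5770:.1f} GB/s")                   # 76.8 GB/s
print(f"delta:   {(hd4890 / hd5770 - 1) * 100:.1f}%")  # 62.5%
```

Double the bus width, but slower chips: the 4890 ends up with 62.5% more bandwidth rather than 100%.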
Interesting? The benefits of memory bandwidth seem to be mirrored in the new comparison. You’re not going to get 100% scaling from bandwidth increases. You are, however, going to get something on the order of 30-40% with this architecture, at least up to a certain point; sooner or later the returns will diminish. It’s worth noting that the 4890 was given faster memory than the 4870. Had the 4870 already had more than enough bandwidth (it has 50% more than the 5770), it would not have been necessary to add faster memory chips to the 4890. Usually, though, faster memory is added at each faster step in a series to be sure every last bit of performance is squeezed out.
Another scenario we can consider is Nvidia’s 9600 GT and 9800 GT. Both have the same 256-bit memory controller. Here are the comparative specs.
The 9800 GT has a couple of major advantages over the 9600 GT: its texel fill rate and floating point throughput are 62% higher. Memory bandwidth is identical, which is why the 9800 GT was chosen over the GTX for this comparison. It’s generally accepted that the 9800 holds a 15% performance advantage over the 9600. All things considered, scaling shaders and texture units here is less effective than the scaling we saw from memory bandwidth. Once again, calculating these scenarios is not an exact science, but the overall effects of the architectures can still be reasonably examined.
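For reference, here is where the 62% figure comes from, using the public clock and unit counts of the two cards (these specs are from memory and worth double-checking against the table above):

```python
# Sketch of the 9600 GT vs 9800 GT spec comparison. Both cards share a
# 256-bit bus with 900 MHz (x2) GDDR3, so bandwidth is a wash.

def texel_fill_gts(core_mhz, tmus):
    """Texel fill rate in gigatexels/s."""
    return core_mhz * tmus / 1000

gt9600 = texel_fill_gts(650, 32)  # 9600 GT: 650 MHz core, 32 TMUs -> 20.8 GT/s
gt9800 = texel_fill_gts(600, 56)  # 9800 GT: 600 MHz core, 56 TMUs -> 33.6 GT/s

advantage = (gt9800 / gt9600 - 1) * 100
print(f"9800 GT texel fill advantage: {advantage:.0f}%")  # 62%

bandwidth = 256 / 8 * 1800 / 1000  # identical on both cards
print(f"shared bandwidth: {bandwidth:.1f} GB/s")           # 57.6 GB/s
```

A 62% advantage in fill rate and shader throughput buying only 15% in real games is poor scaling compared to what bandwidth delivered above.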
The cost of bandwidth
X-bit labs once calculated the memory controller of the RV770 to be between 40-45 million transistors. That represents less than 5% of the RV770’s total transistor count. Given the similarities between Juniper and the RV770, it seems reasonable that the controller hasn’t changed much in transistors per bit of width. That would put Juniper’s 128-bit controller at around 22 million transistors, or around 2% of the chip. What’s the size occupied on the die? We don’t know, and can only estimate from transistor count without knowing the transistor density of that particular block. The point is, the controller isn’t taking up a whole lot of die space, whether it’s 128-bit or 256-bit. The cost of the die will rise a little more than the percentages suggest, due to yields as well as size. If we add in the cost of a PCB with more layers, we could be looking at an additional $10 per card, with most of that attributable to the PCB. Is that increase in cost worth it? Let’s continue.
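The transistor-budget arithmetic above is straightforward to lay out. The chip totals are the published figures; the controller estimates follow X-bit labs’ number, with the assumption that halving the width roughly halves the transistor count:

```python
# Rough transistor-budget arithmetic for the memory controllers.
# Published totals; controller figures are estimates per the text.

rv770_total = 956_000_000       # published RV770 transistor count
juniper_total = 1_040_000_000   # published Juniper transistor count

rv770_ctrl = 45_000_000         # X-bit labs' upper estimate, 256-bit
juniper_ctrl = rv770_ctrl // 2  # assumption: 128-bit is about half

print(f"RV770 controller:   {rv770_ctrl / rv770_total:.1%}")      # ~4.7%
print(f"Juniper controller: {juniper_ctrl / juniper_total:.1%}")  # ~2.2%
```

Either way, the controller is a small slice of the transistor budget; the PCB, not the die, is where most of the extra cost of a 256-bit design lands.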
It’s also reasonable to consider alternative methods of increasing performance, such as adding shaders and texture units as seen in the 9600/9800 comparison. Doubling the width of the memory controller may only increase the size of a chip by 2-10%, depending on the chip. Increasing shader counts and texture units requires more die space for similar benefits on the architectures discussed here: the benefits come at a greater cost in die area but save on the cost of the PCB. There are many factors to consider when deciding which route to take. AMD and Nvidia need to keep their gross margins high without destroying the gross margins of their partners. Yields need to be anticipated and considered. Appropriate costs need to be maintained for the performance categories being targeted. A balance is needed.
Will Barts have a 256 bit memory controller?
It’s pretty important to remember what the competition for Barts looks like: Nvidia’s GTX 460, which has a 256-bit controller. Can AMD compete against this series with just another 128-bit memory controller? Not likely. Even if the GDDR5 speculated for the Cayman series were available for Barts at 1600 MHz (x4), we’d only be looking at 25% more bandwidth. If the net benefit of extra bandwidth is as high as 40%, that works out to only a 10% performance increase from memory alone. A 256-bit controller will probably net over a 20% improvement. It’s fairly certain that Barts will have other architectural improvements, and more bandwidth will be needed to accommodate them. Remember the excellent scaling when both GPU and memory clocks were raised together: the last thing anyone would want is to cap the benefits of a new architecture by starving it of memory bandwidth when the die space cost would be so small. In the end, a bigger memory controller could be the biggest contributor to performance improvements in the Barts series of GPUs.
Does AMD agree? We’ll know in a few more weeks. S|A