Nvidia Turing GPU deep dive: What's inside the radical GeForce RTX 2080 Ti


It’s time to pull back the curtain on the Turing GPU inside Nvidia’s radical new GeForce RTX 20-series, the first-ever graphics cards designed to handle real-time ray tracing thanks to the inclusion of dedicated tensor and RT cores. But the GeForce RTX 2080 and RTX 2080 Ti were also designed to significantly improve performance in traditionally rendered games, with enough power to feed those blazing-fast 4K, 144Hz G-Sync HDR gaming monitors

Nvidia revealed plenty of numbers during the GeForce RTX 2080 Ti announcement. Clock speeds, memory bandwidth, CUDA core counts—it was all there. This deeper dive explains the underlying, architectural changes that make Nvidia’s Turing GPU more potent than its Pascal predecessor. We’ll also highlight some new Nvidia tools that developers can embrace to speed up performance even more, or bring the AI-boosted power of Nvidia’s Saturn V supercomputer into your graphics card.

This isn’t a review of the GeForce RTX 2080 or GeForce RTX 2080 Ti, both of which are scheduled to hit the streets on September 20. But with RTX preorders already open, hopefully this gives you a better idea of what you’re spending your money on. Again, be sure to read our coverage of the GeForce GTX 2080 Ti’s announcement for feeds, speeds, and product details about the graphics cards themselves.

Nvidia Turing GPU overview

Before we dig in, here’s a high-level specifications overview for the Turing TU102 GPU inside the flagship GeForce RTX 2080 Ti.

lkdfghpzhx 17 Nvidia

Nvidia’s TU102 GPU, found inside the GeForce RTX 2080 Ti. (Click on any image in this article to enlarge it.)

Here’s Nvidia’s high-level overview, in case what you’re looking at isn’t clear:

“The TU102 GPU includes six Graphics Processing Clusters (GPCs), 36 Texture Processing Clusters (TPCs), and 72 Streaming Multiprocessors (SMs). Each GPC includes a dedicated raster engine and six TPCs, with each TPC including two SMs. Each SM contains 64 CUDA Cores, eight Tensor Cores, a 256 KB register file, four texture units, and 96 KB of L1/shared memory which can be configured for various capacities depending on the compute or graphics workloads… Tied to each memory controller are eight ROP units and 512 KB of L2 cache.”

You’ll also find a single RT processing core within each SM, so there are 72 in the GeForce RTX 2080 Ti. Because the RT and tensor cores are baked right into each streaming multiprocessor, the lower you go in the GeForce RTX 20-series lineup, the fewer you’ll find of each. The RTX 2080 has 46 RT cores and 368 tensor cores, for example, and the RTX 2070 will have 36 RT cores and 288 tensor cores.

With all that cutting-edge stuff packed in, all requiring dedicated hardware, it shouldn’t come as a surprise that Turing is absolutely massive. The die measures in at a whopping 754mm, compared to the 471mm Pascal GPU inside the GTX 1080 Ti.

Inside Nvidia Turing GPU: Shading and memory improvements

Let’s explain the improvements to the long-established stuff before digging into the exotic new tensor and RT cores.



Source link

Leave a Reply

Your email address will not be published.