Nvidia Turing GPU Architecture

Turing GPU Architecture In-Depth
NVIDIA Turing GPU Architecture WP-09183-001_v01 | 20
GDDR6 is the next major advance in high-bandwidth GDDR DRAM design. Enhanced with
many high-speed SerDes and RF techniques, the GDDR6 memory interface circuits in Turing GPUs
have been completely redesigned for speed, power efficiency, and noise reduction. The new
interface design includes circuit and signal training improvements that minimize
noise and variations due to process, temperature, and supply voltage. Extensive clock gating
minimizes power consumption during periods of lower utilization, resulting in a significant
overall power efficiency improvement. Turing’s GDDR6 memory subsystem delivers 14 Gbps
signaling rates and a 20% power efficiency improvement over the GDDR5X memory used in Pascal
GPUs.
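The 14 Gbps figure is a per-pin signaling rate, so it only becomes a bandwidth once a bus width is chosen. The following is a minimal sketch of that arithmetic. The function name is illustrative, and the 384-bit bus width is an assumption, since this section does not state the width of the memory interface.

```python
def peak_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Each data pin carries `data_rate_gbps` gigabits per second; multiplying
    by the number of pins and dividing by 8 bits per byte gives GB/s.
    """
    return data_rate_gbps * bus_width_bits / 8.0


GDDR6_DATA_RATE_GBPS = 14.0    # from the text
ASSUMED_BUS_WIDTH_BITS = 384   # assumption: not stated in this section

print(peak_bandwidth_gb_per_s(GDDR6_DATA_RATE_GBPS, ASSUMED_BUS_WIDTH_BITS))  # 672.0
```

This is an upper bound set by signaling rate alone. Sustained bandwidth depends on access patterns and memory controller behavior, which the sketch does not model.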
Achieving this speed increase requires end-to-end optimizations. Using extensive signal and
power integrity simulations, NVIDIA carefully crafted Turing’s package and board designs to meet
the higher speed requirements. An example is a 40% reduction in signal crosstalk, which is one of
the most severe impairments in large memory systems.
To reach 14 Gbps, every aspect of the memory subsystem was designed to meet the demanding
standards required for such high-frequency operation. Every signal in the design was
optimized to provide the cleanest possible memory interface signaling (see Figure 11).
Figure 11. Turing GDDR6
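To put the 14 Gbps signaling rate in timing terms, the duration of one bit on a data pin (the unit interval) is the reciprocal of the data rate. The sketch below shows that conversion. The function name is illustrative and not taken from the whitepaper.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Duration of one bit on a single data pin, in picoseconds.

    1 Gbps corresponds to 1 ns (1000 ps) per bit, so the unit interval
    is 1000 / rate when the rate is given in Gbps.
    """
    return 1000.0 / data_rate_gbps


print(f"{unit_interval_ps(14.0):.1f} ps per bit")  # 71.4 ps per bit
```

All noise, crosstalk, and timing variation has to fit inside that window of roughly 71 ps, which is why signal cleanliness matters so much at this speed.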
L2 Cache and ROPs
Turing GPUs add larger and faster L2 caches in addition to the new GDDR6 memory subsystem.
The TU102 GPU ships with 6 MB of L2 cache, double the 3 MB of L2 cache offered in the
prior-generation GP102 GPU used in the TITAN Xp. TU102 also provides significantly higher L2
cache bandwidth than GP102.
Like prior-generation NVIDIA GPUs, each ROP partition in Turing contains eight ROP units, and
each unit can process a single color sample. A full TU102 chip contains 12 ROP partitions for a
total of 96 ROPs.
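The ROP and L2 figures above follow from simple arithmetic, sketched below. The constant and function names are illustrative, and the 11-partition configuration is a hypothetical example of a reduced part, not a figure from this section.

```python
ROP_UNITS_PER_PARTITION = 8   # from the text
TU102_ROP_PARTITIONS = 12     # from the text (full TU102 chip)
TU102_L2_MB = 6               # from the text
GP102_L2_MB = 3               # from the text


def total_rops(partitions: int, units_per_partition: int = ROP_UNITS_PER_PARTITION) -> int:
    """Total ROP units: number of partitions times ROP units per partition."""
    return partitions * units_per_partition


print(total_rops(TU102_ROP_PARTITIONS))  # 96
print(TU102_L2_MB / GP102_L2_MB)         # 2.0

# Hypothetical configuration with one partition disabled (assumption for illustration).
print(total_rops(11))                    # 88
```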