Maxwell (microarchitecture)

Maxwell is the codename for a GPU microarchitecture developed by Nvidia as the successor to the Kepler microarchitecture. The Maxwell architecture was introduced in later models of the GeForce 700 series and is also used in the GeForce 800M series, GeForce 900 series, and Quadro Mxxx series, as well as some Jetson products.

{{Anchor|SMM}}First generation Maxwell (GM10x)

First generation Maxwell GPUs (GM107/GM108) were released as GeForce GTX 745, GTX 750/750 Ti, GTX 850M/860M (GM107) and GeForce 830M/840M (GM108). These new chips introduced few consumer-facing additional features, as Nvidia instead focused more on increasing GPU power efficiency. The L2 cache was increased from 256 KiB on Kepler to 2 MiB on Maxwell, reducing the need for more memory bandwidth. Accordingly, the memory bus was reduced from 192 bit on Kepler (GK106) to 128 bit, reducing die area, cost, and power draw. The latter necessitated an SMX-wide crossbar that used unnecessary power to allow all execution units to be shared. Nvidia also claims an eight to ten times performance increase in PureVideo Feature Set E video decoding due to the video decoder cache, paired with increases in memory efficiency. However, H.265 is not supported for full hardware decoding in first generation Maxwell GPUs, relying on a mix of hardware decoding and software decoding (CPU decoding). but they actually use tiled caching. Since first generation Maxwell, UEFI Graphics Output Protocol is fully supported on NVIDIA GPUs. Chips • GM107 • GM108 == Second generation Maxwell (GM20x) ==

Second generation Maxwell (GM20x)

Second generation Maxwell GPUs introduced several new technologies: Dynamic Super Resolution, Third Generation Delta Color Compression, Multi-Pixel Programming Sampling, Nvidia VXGI (Real-Time-Voxel-Global Illumination), VR Direct, Multi-Projection Acceleration, (however, support for Coverage-Sampling Anti-Aliasing(CSAA) was removed), and Direct3D12 API at Feature Level 12_1. HDMI 2.0 support was also added. The ROP to memory controller ratio was changed from 8:1 to 16:1. However, some of the ROPs are generally idle in the GTX 970 because there are not enough enabled SMMs to give them work to do, reducing its maximum fill rate. The Polymorph Engine responsible for tessellation was upgraded to version 3.0 in second generation Maxwell GPUs, resulting in improved tessellation performance per unit/clock. Second generation Maxwell also has up to 4 SMM units per GPC, compared to 5 SMM units per GPC. GM20x GPUs have an upgraded NVENC which supports HEVC encoding and adds support for H.264 encoding resolutions at 1440p/60FPS & 4K/60FPS (compared to NVENC on Maxwell first generation GM10x GPUs which only supported H.264 1080p/60FPS encoding). Nvidia revealed that it is able to disable individual units, each containing 256KB of L2 cache and 8 ROPs, without disabling whole memory controllers. This comes at the cost of dividing the memory bus into high speed and low speed segments that cannot be accessed at the same time for reads, because the L2/ROP unit managing both of the GDDR5 controllers shares the read return channel and the write data bus between the GDDR5 controllers. This makes simultaneous reading from both GDDR5 controllers or simultaneous writing to both GDDR5 controllers impossible. This is used in the GeForce GTX 970, which therefore can be described as having 3.5 GB in a high-speed segment on a 224-bit bus and 512 MB in a low-speed segment on a 32-bit bus. The peak speed of such a GPU can still be attained, but the peak speed figure is only reachable if one segment is executing a read operation while the other segment is executing a write operation. Chips • GM200 • GM204 • GM206 == Performance ==

Performance

The theoretical single-precision processing power of a Maxwell GPU in FLOPS is computed as 2 (operations per FMA instruction per CUDA core per cycle) × number of CUDA cores × core clock speed (in Hz). The theoretical double-precision processing power of a Maxwell GPU is 1/32 of the single precision performance (which has been noted as being very low compared to the previous generation Kepler). == Successor ==

Successor

The successor to Maxwell is codenamed Pascal. The Pascal architecture features higher bandwidth unified memory and NVLink. == See also ==

Source: Wikipedia ↗

tickerdossier.com tickerdossier.substack.com