On September 1st, 2020, Nvidia announced its latest flagship GPUs, the GeForce RTX 3000 series, based on the state-of-the-art “Ampere” architecture. Nvidia claims the new GPUs offer nearly twice the performance and efficiency of the GeForce RTX 2000 series. Let us see what’s new in this series.
Nvidia Ampere Architecture
Researchers and engineers are trying to solve scientific, industrial, and big data challenges with AI and high-performance computing. The NVIDIA Ampere architecture is designed to accelerate these workloads at every scale. Its key innovations are:
- Third-Generation Tensor Cores: Tensor Cores were first introduced in the NVIDIA Volta architecture. The NVIDIA Ampere architecture builds on them by adding new precisions, Tensor Float 32 (TF32) and Floating-Point 64 (FP64).
- Multi-Instance GPU (MIG): Not every application needs the performance of a full A100 GPU (launched May 14th, 2020). With MIG, an A100 can be partitioned into as many as seven GPU instances, fully isolated and secured at the hardware level.
- Third-Generation NVLink: Scaling big data across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA NVLink in A100 doubles the GPU-to-GPU direct bandwidth to 600 GB/s.
- Structural Sparsity: Deep learning networks are getting bigger, reaching billions of parameters. Not all these parameters are needed for accurate predictions and inference, and some can be converted to zeros. Tensor Cores in A100 can provide up to 2X higher performance for sparse models.
- Smarter and Faster Memory: To keep its compute engines fully utilized, A100 provides 1.6 TB/sec of memory bandwidth. In addition, A100 has significantly more on-chip memory, including a 40 MB level 2 cache, 7X larger than the previous generation's, to maximize compute performance.
- Converged Acceleration at the Edge: The Mellanox SmartNIC includes security offloads that decrypt at line rates up to 200 Gb/s and GPUDirect that transfers video frames directly into GPU memory for AI processing.
- CUDA 11: CUDA 11 supports the new hardware capabilities in Ampere architecture to accelerate HPC, genomics, 5G, rendering, deep learning, data analytics, data science, robotics, and many more diverse workloads.
- cuDNN 8: cuDNN 8 is optimized for Ampere architecture GPUs, delivering up to 5x higher performance than Volta architecture GPUs out of the box. It includes new optimizations and APIs for applications such as conversational AI and computer vision.
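To make the Structural Sparsity point above more concrete, here is a minimal sketch in plain Python. It assumes the 2:4 pattern that NVIDIA documents for A100, where at most two values in every group of four weights are non-zero. The helper `prune_2_of_4` is a hypothetical name used only for illustration. It is not part of any NVIDIA library, and real workflows typically fine-tune the network after pruning to recover accuracy.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse = prune_2_of_4(weights)
print(sparse)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Half of the values in each group become zero, which is where the "up to 2X" figure for sparse models comes from: the hardware can skip the zeroed half of the multiplications.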
After going through all these technicalities, the first question that comes to mind is “Why should I buy it?” or “What’s in it for me?”. To answer this question, let us first look at the headline claims.
Nvidia calls this its greatest generational leap ever. The new RTX 3070 provides virtually the same performance as the RTX 2080 Ti at less than half the price. The RTX 3080 provides up to 2x the performance of the RTX 2080 Super at the same price. The RTX 3090 is a beast on a completely different level: being able to play games in 8K resolution at 60 frames per second is just something else. Taken together, this is a huge jump in performance per dollar.
Volta vs Ampere Architecture Comparison
Now let us compare the Tesla A100 GPU, based on the Ampere architecture, with the Tesla V100 GPU, based on the older Volta architecture.
|Comparison|V100|A100|Speedup|
|---|---|---|---|
|A100 FP16 vs. V100 FP16|31.4 TFLOPS|78 TFLOPS|2.5x|
|A100 FP16 TC vs. V100 FP16 TC|125 TFLOPS|312 TFLOPS|2.5x|
|A100 BF16 TC vs. V100 FP16 TC|125 TFLOPS|312 TFLOPS|2.5x|
|A100 FP32 vs. V100 FP32|15.7 TFLOPS|19.5 TFLOPS|1.25x|
|A100 TF32 TC vs. V100 FP32|15.7 TFLOPS|156 TFLOPS|10x|
|A100 FP64 vs. V100 FP64|7.8 TFLOPS|9.7 TFLOPS|1.25x|
|A100 FP64 TC vs. V100 FP64|7.8 TFLOPS|19.5 TFLOPS|2.5x|
|A100 INT8 TC vs. V100 INT8|62 TOPS|624 TOPS|10x|
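The speedup figures in the table are rounded. As a quick sanity check, the short sketch below recomputes each ratio from the two throughput values in its row, taking the smaller value as the V100 figure and the larger as the A100 figure. The `rows` list and the `speedup` helper are illustrative names, not part of any NVIDIA tool.

```python
# (comparison, V100 peak, A100 peak, speedup quoted in the table)
# Units are TFLOPS, except the INT8 row, which is in TOPS.
rows = [
    ("FP16", 31.4, 78.0, 2.5),
    ("FP16 TC", 125.0, 312.0, 2.5),
    ("BF16 TC vs. FP16 TC", 125.0, 312.0, 2.5),
    ("FP32", 15.7, 19.5, 1.25),
    ("TF32 TC vs. FP32", 15.7, 156.0, 10.0),
    ("FP64", 7.8, 9.7, 1.25),
    ("FP64 TC vs. FP64", 7.8, 19.5, 2.5),
    ("INT8 TC vs. INT8", 62.0, 624.0, 10.0),
]


def speedup(v100, a100):
    """Ratio of A100 peak throughput to V100 peak throughput."""
    return a100 / v100


for name, v100, a100, quoted in rows:
    print(f"{name:<22} {speedup(v100, a100):5.2f}x (table: {quoted}x)")
```

Every recomputed ratio lands within about one percent of the quoted value, so the table is internally consistent. Note that the two 10x rows compare Tensor Core modes on A100 against the corresponding V100 paths, not like-for-like FP32.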
We can clearly see that the latest Ampere architecture is up to 10x faster than the previous Volta architecture in peak throughput. This comparison, however, is for the Ampere-based Tesla A100 GPU, which was launched in May 2020. The RTX 3000 series has a large number of CUDA cores, ranging from 5888 in the RTX 3070 to a whopping 10496 in the RTX 3090, whereas the A100 has 6912. With this, we can only imagine the improvement in performance the RTX 3000 series will have.
Three new GPUs are available in the latest RTX 3000 series based on the Ampere architecture. Let’s go through them one by one.
GeForce RTX 3070
The GeForce RTX 3070 is based on Ampere. It has enhanced Ray Tracing (RT) Cores and Tensor Cores, new streaming multiprocessors, and high-speed GDDR6 memory.
- NVIDIA CUDA Cores: 5888
- Base Clock: 1.50 GHz
- Standard Memory Config: 8 GB GDDR6
- Memory Interface Width: 256-bit
- Graphics Card Power: 220 W
- 3rd generation Tensor Cores
- Price: $499
GeForce RTX 3080
The GeForce RTX 3080 delivers ultra performance, powered by Ampere. It’s built with enhanced RT Cores and Tensor Cores, new streaming multiprocessors, and superfast GDDR6X memory.
- NVIDIA CUDA Cores: 8704
- Base Clock: 1.44 GHz
- Standard Memory Config: 10 GB GDDR6X
- Memory Interface Width: 320-bit
- Graphics Card Power: 320 W
- 3rd generation Tensor Cores
- Price: $699
GeForce RTX 3090
The GeForce RTX 3090 is a GPU with TITAN-class performance. It’s powered by Ampere and doubles down on ray tracing and AI performance with enhanced Ray Tracing (RT) Cores, Tensor Cores, and new streaming multiprocessors.
- NVIDIA CUDA Cores: 10496
- Base Clock: 1.40 GHz
- Standard Memory Config: 24 GB GDDR6X
- Memory Interface Width: 384-bit
- Graphics Card Power: 350 W
- 3rd generation Tensor Cores
- Price: $1499
The RTX 3070 is claimed to give better performance than the RTX 2080 Ti at about half the price. On the other hand, the RTX 3090 is just a beast, with 24 GB of memory and 10496 CUDA cores. With these GPUs, Nvidia has made a big jump in performance, efficiency, and cost. The RTX 3080 and RTX 3090 are slated to launch on September 17th, 2020, and September 24th, 2020, respectively; the RTX 3070 will follow in October 2020.
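For a rough feel of how the three cards compare, the sketch below works only from the specifications listed above. It estimates FP32 throughput as 2 × CUDA cores × clock, which assumes one fused multiply-add (two floating-point operations) per core per cycle. Because it uses the base clocks listed here rather than boost clocks, the results should be read as conservative estimates, not as Nvidia's official figures. The `Card` class and its method name are hypothetical, chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    cuda_cores: int
    base_clock_ghz: float
    memory_gb: int
    power_w: int
    price_usd: int

    def fp32_tflops_at_base(self) -> float:
        # Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.
        return 2 * self.cuda_cores * self.base_clock_ghz / 1000


# Values copied from the specification lists in this article.
CARDS = [
    Card("RTX 3070", 5888, 1.50, 8, 220, 499),
    Card("RTX 3080", 8704, 1.44, 10, 320, 699),
    Card("RTX 3090", 10496, 1.40, 24, 350, 1499),
]

for card in CARDS:
    tflops = card.fp32_tflops_at_base()
    print(
        f"{card.name}: {tflops:.1f} TFLOPS at base clock, "
        f"{card.cuda_cores / card.price_usd:.1f} cores per dollar, "
        f"{1000 * tflops / card.power_w:.0f} GFLOPS per watt"
    )
```

By these rough numbers, the RTX 3070 and RTX 3080 offer noticeably more CUDA cores per dollar than the RTX 3090, whose price also pays for its 24 GB of memory. Real-world performance depends on boost clocks, memory bandwidth, and the workload, none of which this estimate captures.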