Neat Blog

RTX 4090. New and powerful?

It’s time: NVIDIA has launched its new RTX 4090. The company has promised it will deliver outstanding results and perform 2x to 4x faster than the previous generation flagship, RTX 3090 Ti.

After a first look at the specifications of RTX 4090, we were not 100% convinced that we would see that much of an improvement.

As we have written in our previous articles, the main factors affecting GPU efficiency in Neat Video include:

  • GPU processing power which is mostly dictated by the number of GPU cores and their frequency
  • GPU memory size (VRAM size)
  • GPU memory bandwidth
  • CPU-GPU connection speed (or, to be more precise, the speed of data exchange between main system memory and GPU memory). This is determined by the bus interface.

Comparing those characteristics across RTX 3090, RTX 3090 Ti and RTX 4090 doesn’t give a clear picture of how NVIDIA has been able to speed up its new GPU.

Let’s have a look at the numbers:

| GPU | Processing power, single precision (TFLOPS) | Number of GPU cores | Frequency (MHz) | VRAM (GB) | Memory bandwidth (GB/s) | Bus interface |
|---|---|---|---|---|---|---|
| RTX 3090 | 29.284 | 10,496 | 1395 | 24 | 936 | PCIe 4.0 |
| RTX 3090 Ti | 33.546 | 10,752 | 1560 | 24 | 1008 | PCIe 4.0 |
| RTX 4090 | 73.073 | 16,384 | 2230 | 24 | 1008 | PCIe 4.0 |

As you can see, while NVIDIA has increased the number of GPU cores and their frequency, the other factors affecting performance have not changed compared with RTX 3090 Ti. Most importantly, the internal GPU memory bandwidth did not increase. What this means is that, yes, every second all the GPU cores of RTX 4090 can perform 2.17x more calculations than those of RTX 3090 Ti, but most likely they won’t, as the data will not arrive in time. This was our line of thinking.
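The reasoning above can be checked with a little arithmetic. The sketch below reproduces the processing-power column from the core counts and frequencies, then estimates how many calculations the cores could perform in the time VRAM delivers a single byte. It assumes each core completes one fused multiply-add (counted as 2 operations) per clock, which is the usual convention for such peak figures; the function names are ours and purely illustrative.

```python
# Spec values copied from the table above.
SPECS = {
    "RTX 3090":    {"cores": 10_496, "mhz": 1395, "bandwidth_gbs": 936},
    "RTX 3090 Ti": {"cores": 10_752, "mhz": 1560, "bandwidth_gbs": 1008},
    "RTX 4090":    {"cores": 16_384, "mhz": 2230, "bandwidth_gbs": 1008},
}

# Assumption: one fused multiply-add (2 FLOPs) per core per clock cycle.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_tflops(cores: int, mhz: int) -> float:
    """Theoretical single-precision peak in TFLOPS."""
    return cores * mhz * 1e6 * FLOPS_PER_CORE_PER_CLOCK / 1e12


def flops_per_byte(cores: int, mhz: int, bandwidth_gbs: float) -> float:
    """FLOPs the cores could execute while VRAM delivers one byte."""
    return peak_tflops(cores, mhz) * 1e12 / (bandwidth_gbs * 1e9)


if __name__ == "__main__":
    for name, spec in SPECS.items():
        print(f"{name}: {peak_tflops(spec['cores'], spec['mhz']):.3f} TFLOPS, "
              f"{flops_per_byte(**spec):.1f} FLOPs per byte of VRAM traffic")
```

With the same 1008 GB/s of bandwidth, RTX 4090 would need to do more than twice as much work on every byte it reads from VRAM as RTX 3090 Ti to keep its cores busy, which is exactly why we doubted the promised gains.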

To be honest, we were very skeptical before running our tests on RTX 4090…

Speed race

But if NVIDIA is claiming that RTX 4090 is much more powerful than its predecessor, they must have some proof. There must be some way in which they have managed to speed up the new GPU by utilizing other resources; otherwise customers won’t trust them again. Right?

To begin our comparison of RTX 3090 and RTX 4090, we ran the NeatBench test with FullHD frames:

| Test | RTX 3090 speed, FPS | RTX 4090 speed, FPS | Speed improvement |
|---|---|---|---|
| FullHD clip, 2 frames radius | 67.9 | 120.0 | 77% |
| FullHD clip, 5 frames radius | 49.3 | 86.4 | 75% |

WOW! Who would have thought… a 77% speed increase is definitely more than we expected of RTX 4090. Not the advertised 2x to 4x improvement, but still quite impressive.

Now, let’s check what happens when we increase frame size to 4K.

| Test | RTX 3090 speed, FPS | RTX 4090 speed, FPS | Speed improvement |
|---|---|---|---|
| 4K clip, 2 frames radius | 20.6 | 31.3 | 52% |
| 4K clip, 5 frames radius | 15.3 | 23.6 | 54% |

Still not too bad, although somewhat less spectacular. Let’s keep going and run the 8K tests:

| Test | RTX 3090 speed, FPS | RTX 4090 speed, FPS | Speed improvement |
|---|---|---|---|
| 8K clip, 2 frames radius | 5.31 | 6.85 | 29% |
| 8K clip, 5 frames radius | 3.93 | 5.31 | 35% |
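For readers who want to verify the percentages, here is a small sketch that recomputes the "Speed improvement" column from the FPS figures in the three tables. The dictionary layout and the function name are ours and only serve the illustration.

```python
# (RTX 3090 FPS, RTX 4090 FPS) copied from the NeatBench tables above,
# keyed by (frame size, temporal radius in frames).
RESULTS = {
    ("FullHD", 2): (67.9, 120.0),
    ("FullHD", 5): (49.3, 86.4),
    ("4K", 2): (20.6, 31.3),
    ("4K", 5): (15.3, 23.6),
    ("8K", 2): (5.31, 6.85),
    ("8K", 5): (3.93, 5.31),
}


def improvement_percent(old_fps: float, new_fps: float) -> int:
    """Relative speed gain of new_fps over old_fps, rounded to whole percent."""
    return round((new_fps / old_fps - 1) * 100)


if __name__ == "__main__":
    for (size, radius), (fps_3090, fps_4090) in RESULTS.items():
        gain = improvement_percent(fps_3090, fps_4090)
        print(f"{size}, radius {radius}: +{gain}%")
```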

The 8K figures are getting closer to our initial expectations. But still, how does the GPU manage to perform so well on smaller frames, especially on FullHD? And why is the gain over its predecessor becoming smaller when we increase the frame size?

The “new” kid on the block

To get the answer to this mystery, we unpacked our microscope and looked at the specs again. We have already checked the core configuration, memory bandwidth and CPU interconnect, but these parameters alone have offered no explanation. What are we missing?

The answer came from our years of experience in optimizing CPU code: L2 cache. While NVIDIA has been equipping its GPUs with L2 cache for ages, its amount has been quite small and has increased only slowly from generation to generation. So small that its slight variation between different GPUs has not affected Neat Video performance. Until RTX 4090, that is. While RTX 3090 and 3090 Ti have only 6 MB of cache, the 4090 has a whopping 72 MB on board! That compares like an elephant to a pony…

So, what does the large amount of cache mean? Basically, RTX 4090 now has a large pool where it can hold lots of the data it is currently working with, so it’s easy to get any byte of information needed for the current task. RTX 3090 and RTX 3090 Ti have smaller pools, and when they need some data, it is often not in the pool, so they have to switch the tap on and wait until all the needed information has flowed into the pool from VRAM. When that happens, the old data gets pushed out of the pool, and if you need it back, you’ll have to switch the tap on again…
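The pool analogy can be played out in a toy simulation. The sketch below repeatedly sweeps over a working set of data blocks and counts how often a block is already in the cache. It assumes a simple least-recently-used eviction policy and treats one block as 1 MB; both are our simplifications for illustration, not a description of how the L2 cache in these GPUs actually works.

```python
from collections import OrderedDict


def hit_rate(cache_blocks: int, working_set_blocks: int, passes: int = 4) -> float:
    """Sweep the working set `passes` times; return the fraction of cache hits."""
    cache = OrderedDict()  # keys are block numbers, ordered oldest to newest
    hits = accesses = 0
    for _ in range(passes):
        for block in range(working_set_blocks):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                cache[block] = True            # "switch the tap on": fetch from VRAM
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # oldest data is pushed out of the pool
    return hits / accesses


if __name__ == "__main__":
    # Hypothetical 48-block working set, swept four times.
    print("6-block pool: ", hit_rate(6, 48))
    print("72-block pool:", hit_rate(72, 48))
```

In this toy model the small pool never finds what it needs, because every block is pushed out before it is requested again, while the large pool only has to fetch each block once.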

That is why we are seeing such a great speed improvement when running FullHD and 4K tests. The 72 MB cache pool is large enough to hold a significant portion of the working data set, so calculations are done extremely quickly thanks to the increased computing power of the new GPU.

But when we throw an 8K clip at RTX 4090, its advantage over RTX 3090 drops significantly, as the 72 MB cache is no longer enough and the pool tap gets used more often.
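To get a feel for the scale, here is a back-of-the-envelope sketch comparing raw frame sizes with the two cache sizes. It assumes 4 bytes per pixel and takes 4K and 8K to mean 3840×2160 and 7680×4320; these are our assumptions for illustration only, since the data format and working set Neat Video actually uses on the GPU are not covered here.

```python
MIB = 1024 * 1024

# Assumed frame dimensions (width, height) in pixels.
FRAME_SIZES = {
    "FullHD": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

# Assumption for illustration only: 4 bytes per pixel (e.g. 8-bit RGBA).
BYTES_PER_PIXEL = 4


def frame_bytes(name: str) -> int:
    """Raw size of one frame in bytes under the assumptions above."""
    width, height = FRAME_SIZES[name]
    return width * height * BYTES_PER_PIXEL


def frames_that_fit(name: str, cache_mib: int) -> float:
    """How many whole-or-partial frames of this size fit into the cache."""
    return cache_mib * MIB / frame_bytes(name)


if __name__ == "__main__":
    for name in FRAME_SIZES:
        print(f"{name}: {frame_bytes(name) / MIB:.1f} MiB per frame, "
              f"{frames_that_fit(name, 6):.2f} frames in 6 MB, "
              f"{frames_that_fit(name, 72):.2f} frames in 72 MB")
```

Under these assumptions a 72 MB cache holds several FullHD frames and about two 4K frames, but less than one 8K frame, while a 6 MB cache cannot hold even a single FullHD frame.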

Keep in mind that the tests we ran were bench tests, not measurements of real-life render performance. When other consumers of GPU resources (most importantly VRAM) are added, such as your main video editing application and other effects, the speed improvement isn’t going to be as high as we have seen in the tests. That is true for all frame sizes.

Should you upgrade immediately?

RTX 4090 is certainly the fastest GPU we have seen so far. In fact, this is the first graphics card capable of running Neat Video denoising on 4K clips in real time (assuming, of course, that the host application supplies data efficiently enough). Well done NVIDIA!

If you are in the market for a new GPU, then yes, RTX 4090 (and most likely RTX 4090 Ti in the future) could be a good option. However, if you are on a budget, RTX 3090 or RTX 3090 Ti will deliver great results and you won’t be disappointed.

Should you get this video card if you already have RTX 3090 or RTX 3090 Ti? Probably not. Those two cards are still quite fast for most rendering tasks. However, if you do want to get an RTX 4090, that card won’t disappoint.