How to choose card for GPU acceleration?

cdmikelis
Posts: 7
Joined: Thu May 17, 2012 6:21 pm

560ti vs 570 and 2500K vs 2600K

Post by cdmikelis »

I took the time to compare a GTX560ti and a GTX570 on the same machine, with the same clip and the same NeatVideo settings, to see whether the much more expensive graphics card pays off with faster rendering.

It does not (in most real life situations).

Here are links to the full test results (performance optimizing in NeatVideo):
http://dl.dropbox.com/u/9205816/NeatVid ... _560ti.txt
http://dl.dropbox.com/u/9205816/NeatVid ... eo_570.txt
http://dl.dropbox.com/u/9205816/NeatVid ... _570OC.txt

In short:
- The smaller the frame size, the smaller the difference between the GPUs; RAM latency and other components evidently play a big role here. The bigger the frame size, the bigger the difference between the GPUs.
- Overclocking the CPU and GPU has an obvious effect only at big frame sizes such as 4K.
- The differences in performance are visible when you run "Optimize" in the NeatVideo filter, but in real life, when you render a clip, they are more or less gone most of the time. Rendering 200 frames with the 570 finishes 2 seconds before the 560ti, or 4 seconds before if the 570 is overclocked to the maximum. That means about 8 minutes less rendering time when rendering 30 minutes of video (133 min instead of 141). Whether that is worth buying the much more expensive 570 is for everyone to decide for themselves.
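
As a rough sanity check of that estimate, here is a small Python sketch. The 25 fps frame rate is an assumption (it is not stated in the post); the savings per 200-frame test render are the figures quoted above.

```python
# Rough estimate of render time saved over a long clip.
# Assumption: 25 fps footage (not stated in the post). The savings per
# 200-frame test render are the figures quoted above.

FPS = 25            # assumed frame rate
TEST_FRAMES = 200   # length of the test render, in frames


def minutes_saved(video_minutes, seconds_saved_per_test):
    """Scale a saving measured on TEST_FRAMES frames up to a full clip."""
    total_frames = video_minutes * 60 * FPS
    batches = total_frames / TEST_FRAMES
    return batches * seconds_saved_per_test / 60


for label, saved in (("570 stock", 2), ("570 overclocked", 4)):
    print(f"{label}: about {minutes_saved(30, saved):.1f} min saved on 30 min of video")
```

Under these assumptions the 2-second saving scales to roughly 7 to 8 minutes per 30 minutes of video, which is in line with the estimate above.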

Comparison of the tests I did, taken from the "Optimize" option in the Neat Video filter.
HD = 1920x1080 [1080p]
SD = 720x576 [576p]
4K = "4K" option
8b = 8-bit
32b = 32-bit
Radius = always 1
570 = [ENGTX570 DCII/2DIS/1280MD5]
570OC = 570 overclocked to 900/4000 MHz, the same clocks as the 560ti [GV-N560OC-1GI]
[x] = fps of the best option calculated by "Optimize"

SD8b = 560ti[50], 570[52,6], 570OC[52,6]
SD32b = 560ti[43,5], 570[45,5], 570OC[47,6]
HD8b = 560ti[11,5], 570[13,2], 570OC[14,5]
HD32b = 560ti[10,6], 570[11,2], 570OC[12,2]
4K8b = 560ti[2,98], 570[3,38], 570OC[3,51]
4K32b = 560ti[2,41], 570[2,7], 570OC[2,75]
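
To make the table easier to read, the relative gain of the 570 over the 560ti can be computed for each row. This is only a sketch over the figures listed above (decimal commas converted to points); the helper name `gain_percent` is made up for illustration.

```python
# Relative gain of the GTX570 (stock and overclocked) over the GTX560ti,
# using the "Optimize" fps figures listed above (decimal commas -> points).

results = {
    # test:   (560ti, 570, 570OC)
    "SD8b":  (50.0, 52.6, 52.6),
    "SD32b": (43.5, 45.5, 47.6),
    "HD8b":  (11.5, 13.2, 14.5),
    "HD32b": (10.6, 11.2, 12.2),
    "4K8b":  (2.98, 3.38, 3.51),
    "4K32b": (2.41, 2.70, 2.75),
}


def gain_percent(baseline, value):
    """Percentage improvement of value over baseline."""
    return (value / baseline - 1.0) * 100.0


for test, (gtx560ti, gtx570, gtx570oc) in results.items():
    print(f"{test:6s} 570: {gain_percent(gtx560ti, gtx570):+5.1f}%   "
          f"570OC: {gain_percent(gtx560ti, gtx570oc):+5.1f}%")
```

At SD the stock 570 is only around 5% ahead of the 560ti, while the gap is noticeably larger in the HD 8-bit and 4K rows.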

These results are an improvement over those in my previous posts. I upgraded NeatVideo from 3.1.0 to 3.2.0 and tuned (overclocked) the RAM from 9-9-9-27 @ 1600 to 9-10-9-24 @ 1866. Everything else is the same; I had the latest nVidia driver.

My real-life tests on a 1920x1080 clip: rendering with the 560ti took 40-42 sec; rendering with the overclocked 570 (which doubles the power consumption compared to the 560ti) took 37-39 sec.

In my various projects the footage I need to clean is about 50% 1440x1080 50i HDV, 30% 1280x720 50p HDV, 19% various SD footage and 1% full HD. No 4K yet... So in most cases I work at frame sizes where speed is held back by frame transport to the processor and storage rather than by filter speed. The 570, it seems, was a waste of my money.

I can now either start playing games in my free time (GPUs make a huge difference in games, though!), sell the 570 and buy a 560ti again, invest in super-fast overclocked RAM (my Asus P8Z68-V PRO supports RAM up to 2200 MHz), or sell everything but the GPU and go to Sandy Bridge-E with 4-channel RAM.

Has anyone tried a really tuned-up X79 platform yet to see if there is any improvement in speed? Looking at benchmarks of quad-channel RAM, the bandwidth is much, much bigger than with dual channel.

**********
Regarding 2500K vs 2600K:
The 2600K wins every time at the same clock, even when Neat chooses only 4 cores of the i7 as the best option. The 2500K must be overclocked much higher to keep up with the 2600K, so saving money here is not a good idea (always buy an i7 for video!). I do not have exact figures for the 2500K, but what renders in 37-42 seconds on the 2600K (@ 4500 MHz) takes 120+ seconds on the 2500K @ 4300 MHz. That is the only figure I remember, as the 2500K was so slow I didn't want to waste my time with this CPU. The 2500K was tested with the 560ti on the same motherboard, RAM and SSD.

**********
REGARDS,
MIHAEL
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

Thank you very much for extensive testing and posting your results. It looks like they did a very good job with GTX560ti. It was developed significantly later than GTX570, which perhaps allowed the developers to better optimize the parameters to offer the performance that is so close to a more expensive GTX570.

Regarding the four-channel memory and new generation of CPUs, we have a sample measurement:
System: P9X79 PRO, 3930K @ 4000 Mhz, DDR3-2133

Frame: 1920x1080 progressive, 8 bits per channel, Radius: 1 frame
Running the test data set on up to 12 CPU cores

CPU only (1 core): 1.8 frames/sec
CPU only (2 cores): 3.65 frames/sec
CPU only (3 cores): 5.32 frames/sec
CPU only (4 cores): 6.94 frames/sec
CPU only (5 cores): 8.06 frames/sec
CPU only (6 cores): 9.17 frames/sec
CPU only (7 cores): 9.62 frames/sec
CPU only (8 cores): 10 frames/sec
CPU only (9 cores): 10.1 frames/sec
CPU only (10 cores): 10.2 frames/sec
CPU only (11 cores): 10.3 frames/sec
CPU only (12 cores): 10.2 frames/sec
The fast four-channel memory does indeed help.
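
One way to read the CPU-only figures above is as scaling efficiency, i.e. fps with n cores divided by n times the single-core fps. The sketch below only re-uses the numbers from this measurement; the i7-3930K has 6 physical cores and 12 threads, so entries 7 to 12 are hyper-threads.

```python
# Per-core scaling of the CPU-only results above
# (i7-3930K: 6 physical cores, 12 threads).
cpu_fps = [1.8, 3.65, 5.32, 6.94, 8.06, 9.17, 9.62, 10.0, 10.1, 10.2, 10.3, 10.2]


def scaling_efficiency(fps_by_cores):
    """Return fps(n) / (n * fps(1)) for n = 1..len(fps_by_cores)."""
    single = fps_by_cores[0]
    return [fps / (n * single) for n, fps in enumerate(fps_by_cores, start=1)]


for n, eff in enumerate(scaling_efficiency(cpu_fps), start=1):
    print(f"{n:2d} cores: {cpu_fps[n - 1]:5.2f} fps, efficiency {eff:.0%}")
```

Scaling stays close to linear up to about 4 cores and is still around 85% at 6 cores; beyond the six physical cores the extra threads add very little.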

Vlad
cdmikelis
Posts: 7
Joined: Thu May 17, 2012 6:21 pm

Post by cdmikelis »

Hi, it's me again :)

Last week I put a Gigabyte 660ti GPU in. It renders at exactly the same speed as my current GTX570, but that was expected.

My new update concerns RAM. I previously suspected that RAM speed is VERY important, and today I can confirm it again. I replaced the Kingston HyperX Blu 1600 (overclocked to 1866) with the 2133 Genesis version. The upgrade cost 23 EUR (sold the old sticks, bought new ones). It brought a 9,9% (let's round it to 10%) speed improvement! And the RAM is not even overclocked yet :)

Before, optimizing for the best NV settings always settled on 4 cores + GPU. Now it is 7 cores + GPU. Just changing the RAM sticks gave this improvement:

CPU from 8,55 -> 9,26 fps
GPU from 9 -> 12,7 fps (!)*
CPU+GPU from 14,1 -> 16,9 fps.

* The GTX570 is now paying me back for my patience in not tossing it out. :)

10% exceeded all my expectations.

Bye, MIHAEL
vvulture
Posts: 24
Joined: Sat Jul 23, 2011 5:51 am

Post by vvulture »

Well, here are my results... Was expecting better to be honest :-( I can't believe a single GTX570 is faster than my 2 x 6950's... something is wrong with this...

Using :
Win 7 64bit
AMD FX-8150 Stock
2 x HD6950 GPU's
16Gig 1866Mhz RAM


Frame: 1920x1080 progressive, 8 bits per channel, Radius: 1 frame
Running the test data set on up to 8 CPU cores and on up to 2 GPUs

CPU only (1 core): 0.978 frames/sec
CPU only (2 cores): 1.96 frames/sec
CPU only (3 cores): 2.86 frames/sec
CPU only (4 cores): 3.64 frames/sec
CPU only (5 cores): 3.98 frames/sec
CPU only (6 cores): 4.22 frames/sec
CPU only (7 cores): 4.5 frames/sec
CPU only (8 cores): 4.74 frames/sec
GPU only (AMD Radeon HD 6900 Series #1): 5.41 frames/sec
GPU only (AMD Radeon HD 6900 Series #2): 5.41 frames/sec
GPU only (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (1 core) and GPU (AMD Radeon HD 6900 Series #1): 4.63 frames/sec
CPU (2 cores) and GPU (AMD Radeon HD 6900 Series #1): 5.46 frames/sec
CPU (3 cores) and GPU (AMD Radeon HD 6900 Series #1): 6.21 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1): 6.62 frames/sec
CPU (5 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.35 frames/sec
CPU (6 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.41 frames/sec
CPU (7 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.63 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.81 frames/sec
CPU (1 core) and GPU (AMD Radeon HD 6900 Series #2): 4.44 frames/sec
CPU (2 cores) and GPU (AMD Radeon HD 6900 Series #2): 5.43 frames/sec
CPU (3 cores) and GPU (AMD Radeon HD 6900 Series #2): 6.25 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #2): 6.85 frames/sec
CPU (5 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.3 frames/sec
CPU (6 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.35 frames/sec
CPU (7 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.63 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.87 frames/sec
CPU (2 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (3 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 8.06 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.17 frames/sec
CPU (5 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.09 frames/sec
CPU (6 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.09 frames/sec
CPU (7 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 8.93 frames/sec

Best combination: CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2)
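
Benchmark dumps like the one above are easier to compare once parsed. The following is a minimal Python sketch that assumes every result line has the form `<configuration>: <number> frames/sec`, as in the output pasted in this thread; the function name `parse_benchmark` is made up for illustration and is not part of Neat Video.

```python
import re

# Matches lines such as "CPU only (4 cores): 3.64 frames/sec".
LINE_RE = re.compile(r"^(?P<config>.+): (?P<fps>\d+(?:\.\d+)?) frames/sec$")


def parse_benchmark(text):
    """Return a list of (configuration, fps) pairs from pasted Optimize output."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match.group("config"), float(match.group("fps"))))
    return rows


# A few lines copied from the results above.
sample = """\
CPU only (8 cores): 4.74 frames/sec
GPU only (AMD Radeon HD 6900 Series #1): 5.41 frames/sec
GPU only (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.81 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.17 frames/sec
"""

rows = parse_benchmark(sample)
best_config, best_fps = max(rows, key=lambda row: row[1])
print(f"Best: {best_config} at {best_fps} frames/sec")
```

Header lines such as "Frame: 1920x1080 progressive, ..." do not end in "frames/sec" and are simply skipped.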
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

That looks normal. We received reports of HD6950 performance in NeatBench and the results were quite similar.

Please also remember that this is the first version of AMD GPU support. We will of course try to optimize it further to make NV run faster on AMD cards, and on NVidia cards too.

Vlad
vvulture
Posts: 24
Joined: Sat Jul 23, 2011 5:51 am

Post by vvulture »

Vlad,
Looking at my results above your last post, can you explain why my CPU+GPU score does not improve on my GPU only score ?

I would think that my GPU+CPU score would be a few fps faster than GPU only..

Thx.
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

Usually CPU adds some speed indeed. In this case, I guess, either CPU itself or the system RAM is so busy with providing data to the GPUs that CPU-based processing (which also requires memory bandwidth) is slowed down to the point when CPU no longer contributes much to the overall processing speed. CPU does contribute when only one GPU is working, which seems to be in line with my guess.
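
To put numbers on this, the sketch below compares how much the CPU adds on top of one GPU and on top of two GPUs, using figures from the FX-8150 + 2 x HD6950 benchmark earlier in the thread. The "purely additive" value is just a naive upper bound for comparison, not something Neat Video reports.

```python
# How much does adding the CPU help on top of the GPU(s)?
# Figures are taken from the FX-8150 + 2 x HD6950 benchmark earlier in the thread.

cpu_only_8_cores = 4.74
cpu_only_4_cores = 3.64

cases = {
    # name: (GPU-only fps, CPU+GPU fps, CPU-only fps at the same core count)
    "one GPU + 8 cores":  (5.41, 7.81, cpu_only_8_cores),
    "two GPUs + 4 cores": (9.01, 9.17, cpu_only_4_cores),
}


def cpu_contribution(gpu_only, combined):
    """Extra frames/sec gained by adding the CPU to the GPU(s)."""
    return combined - gpu_only


for name, (gpu_only, combined, cpu_only) in cases.items():
    gained = cpu_contribution(gpu_only, combined)
    ideal = gpu_only + cpu_only  # naive upper bound if nothing were shared
    print(f"{name}: +{gained:.2f} fps from the CPU, "
          f"{combined:.2f} measured vs {ideal:.2f} if purely additive")
```

With one GPU the CPU adds about 2.4 fps; with both GPUs running it adds under 0.2 fps, which fits the explanation that something shared (CPU time or memory bandwidth) is already saturated.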

Vlad
cdmikelis
Posts: 7
Joined: Thu May 17, 2012 6:21 pm

i7-3930K + GTX570 - not worth the money (right now)

Post by cdmikelis »

NVTeam wrote: Usually CPU adds some speed indeed. In this case, I guess, either CPU itself or the system RAM is so busy with providing data to the GPUs that CPU-based processing (which also requires memory bandwidth) is slowed down to the point when CPU no longer contributes much to the overall processing speed. CPU does contribute when only one GPU is working, which seems to be in line with my guess.

Vlad
Hi,

I again forked out big money hoping to gain some performance, but I lost big this time.

I have DX79SR board from Intel + i7-3930K + 16GB HyperX 2133 (same sticks from my previous Z68 system) + bunch of Intel and Crucial SSD's.

RAM SPEED is everything with NeatVideo, I can confirm.
On Z68 I got up to 26GB/s RAM WRITE SPEED, versus 22GB/s at most on this "server" platform (measured with the same AIDA64, which is only a dual-threaded RAM benchmark).

So I actually LOST performance. The CPU does render a bit faster (10,5 fps, as stated above), but my GTX570 now does ONLY 7,5 fps, which is much less than on Z68. On this platform CPU+GPU does not bring any benefit unless the RAM is overclocked. Performance in NEATVIDEO is the same whether I give this board 4 sticks of RAM or just 2.

CONCLUSION: NeatVideo is optimized for DUAL channel ONLY.
While SiSoftware SANDRA 2012 shows over 50GB/s on this system with 4 sticks of RAM, NeatVideo does not benefit from it.
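
For context, the measured figures can be set against the theoretical peak bandwidth of the memory. A DDR3 channel is 64 bits (8 bytes) wide, so the peak is transfers per second x 8 bytes x number of channels. The sketch below uses the nominal 2133 MT/s rating; real benchmarks always land below this.

```python
# Theoretical peak bandwidth of DDR3 memory: transfers/s x 8 bytes x channels.
# Each DDR3 channel is 64 bits (8 bytes) wide.

BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mega_transfers_per_sec, channels):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_sec * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


for channels, label in ((2, "dual channel (Z68)"), (4, "quad channel (X79)")):
    peak = peak_bandwidth_gbs(2133, channels)
    print(f"DDR3-2133 {label}: {peak:.1f} GB/s theoretical peak")
```

This gives roughly 34 GB/s for dual channel and 68 GB/s for quad channel, so a benchmark that only runs two threads may simply not be able to exercise all four channels.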

SO right now it is best to have IvyBridge (or even SandyBridge with Z68), really fast ram and OC'ed CPU. Even regarding money investment! It's really cheap now to buy used Z68 platform with 2600K. :)

But I hope NEATVIDEO v4 will be 4-channel optimized too.

best regards,
MIHAEL
vvulture
Posts: 24
Joined: Sat Jul 23, 2011 5:51 am

Post by vvulture »

An update...

I upgraded my ram to 2133Mhz CL9 and it made zero difference...
I overclocked my cpu and it made zero difference...
I overclocked my video cards and I get on average about a 0.5 fps gain

I have no idea what my limiting factor is now...
cdmikelis
Posts: 7
Joined: Thu May 17, 2012 6:21 pm

reply to vvulture -

Post by cdmikelis »

Hi vvulture.

1) Do you have an HDD or an SSD?
I have a 2x SSD system, where the operating system and temp files are on the first 6Gbps SSD and the video files are on the second 6Gbps SSD.

2) Did you confirm the RAM speed with proper software? Does it really run at that frequency? When an overclock does not work (settings too high), the system boots with the "old" (stock) settings, but the BIOS sometimes still shows it as overclocked. If you saw NO difference, something could be wrong. Overclocking the CPU in my system ALWAYS gives better performance, because the whole system works faster. Overclocking the GPU showed an improvement only when RAM was not the limiting factor: with 1600 RAM no difference, with 1866 some, with 2133 more.

3) Overclocking the RAM on the GPU boosts fps more than overclocking the GPU engine itself. At least with my nVidia.

4) I don't know how the modern AMD architecture works, because I left that camp when I got into video. I still have AMD office computers, where they shine. But for video AMD is not the right choice, as your 8-core results show. Your score is only a fraction better than what my old Q9550 (overclocked a little) could do.

5) On my new machine (6-core Intel X79) my GPU now makes only 7,6 fps, which is half of what it did on my previous Z68 platform. NeatVideo no longer uses the GPU if I overclock the CPU; it uses the GPU only at stock frequencies. But overall I'm stuck at 10-11 fps either way. I'm crying too :(

Please run the AIDA64 RAM benchmark tests and post the results. Then run the SiSoftware SANDRA 2012 benchmark and post the results (GB/s). In AIDA64 the RAM frequency and timings are shown beside the results, so you can check the current settings. My RAM is CL11, BTW.

6) Does your mobo even support 2133 RAM?
vvulture
Posts: 24
Joined: Sat Jul 23, 2011 5:51 am

Post by vvulture »

1 - Just using a standard 7200rpm SATA 2.0 300Gig drive
2 - RAM speed is confirmed at 2133Mhz ( 9-11-10-28 2T )
3 - Overclocking GPU yielded very small gain 9.43fps up from 9.17.
4 - Agreed
5 - RAM 22 GB/s transfer speed
6 - Mobo supports 2133 yes

cheers
apefos
Posts: 14
Joined: Thu Jun 28, 2012 11:45 am

a way to predict

Post by apefos »

I did some math calculation to try to predict the GPUs performance in this topic:

http://www.neatvideo.com/nvforum/viewtopic.php?t=839

I was trying to predict the benefits from the new GTX650ti and I found some interesting results.
apefos
Posts: 14
Joined: Thu Jun 28, 2012 11:45 am

3570k + GTX650Ti benchmark

Post by apefos »

Ivy Bridge 3570k OC 4.2GHZ Turbo Off + Gigabyte GTX650Ti 2GB 1033MHZ
Works great, no issues.
Real World test rendering footage (Radius 2, 1920x1080p):
Cpu only 4 cores = 3.51 fps
Cpu 4 cores + GTX650Ti 2GB = 4.54 fps
1440 frames footage (1 minute 24p footage) Cineform 422 I frames 1920x1080p file
Cpu only 4 cores render time = 6min50sec
Cpu 4 cores + GTX650Ti 2GB 1033MHZ render time = 5min17sec
(1.03 fps increase) 1.29x speed increase compared to cpu only
The real world performance is slower than the Neatvideo benchmark, but is pretty good.
I chose Radius 2 for the tests because it gives the best results when denoising my GH2 footage
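
The fps and speed-up figures follow directly from the render times. Here is a short Python sketch of that arithmetic, using only the frame count and times given above.

```python
# Convert the render times above into frames/sec and a speed-up factor.

FRAMES = 1440  # 1 minute of 24p footage


def to_seconds(minutes, seconds):
    """Convert a min/sec render time to seconds."""
    return minutes * 60 + seconds


def fps(frames, elapsed_seconds):
    """Average frames rendered per second."""
    return frames / elapsed_seconds


cpu_time = to_seconds(6, 50)   # CPU only, 4 cores
gpu_time = to_seconds(5, 17)   # CPU 4 cores + GTX650Ti

cpu_fps = fps(FRAMES, cpu_time)
gpu_fps = fps(FRAMES, gpu_time)
speedup = cpu_time / gpu_time

print(f"CPU only:  {cpu_fps:.2f} fps")
print(f"CPU + GPU: {gpu_fps:.2f} fps")
print(f"Speed-up:  {speedup:.2f}x")
```

The same two helpers can be reused for any other clip by changing `FRAMES` and the two render times.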

Benchmarks from Neatvideo Optimize tool (shows more fps than real render):

One curious observation: the benchmarks for Cpu only 2 cores and Cpu 2 cores + Gpu are very close to the real render speed using cpu 4 cores only and cpu 4 cores + gpu.

Frame: 1920x1080 progressive, 8 bits per channel, Radius: 2 frames
Running the test data set on up to 4 CPU cores and on up to 1 GPU

CPU only (1 core): 1.75 frames/sec
CPU only (2 cores): 3.55 frames/sec
CPU only (3 cores): 4.98 frames/sec
CPU only (4 cores): 6.02 frames/sec
GPU only (GeForce GTX 650 Ti): 3.86 frames/sec
CPU (1 core) and GPU (GeForce GTX 650 Ti): 3.66 frames/sec
CPU (2 cores) and GPU (GeForce GTX 650 Ti): 4.52 frames/sec
CPU (3 cores) and GPU (GeForce GTX 650 Ti): 6.06 frames/sec
CPU (4 cores) and GPU (GeForce GTX 650 Ti): 7.25 frames/sec

Best combination: CPU (4 cores) and GPU (GeForce GTX 650 Ti)


Frame: 1920x1080 progressive, 16 bits per channel, Radius: 2 frames
Running the test data set on up to 4 CPU cores and on up to 1 GPU

CPU only (1 core): 1.43 frames/sec
CPU only (2 cores): 3.03 frames/sec
CPU only (3 cores): 4.22 frames/sec
CPU only (4 cores): 4.93 frames/sec
GPU only (GeForce GTX 650 Ti): 3.73 frames/sec
CPU (1 core) and GPU (GeForce GTX 650 Ti): 2.71 frames/sec
CPU (2 cores) and GPU (GeForce GTX 650 Ti): 4.37 frames/sec
CPU (3 cores) and GPU (GeForce GTX 650 Ti): 5.59 frames/sec
CPU (4 cores) and GPU (GeForce GTX 650 Ti): 6.33 frames/sec

Best combination: CPU (4 cores) and GPU (GeForce GTX 650 Ti)


Frame: 1920x1080 progressive, 32 bits per channel, Radius: 2 frames
Running the test data set on up to 4 CPU cores and on up to 1 GPU

CPU only (1 core): 1.73 frames/sec
CPU only (2 cores): 3.45 frames/sec
CPU only (3 cores): 4.81 frames/sec
CPU only (4 cores): 5.59 frames/sec
GPU only (GeForce GTX 650 Ti): 3.76 frames/sec
CPU (1 core) and GPU (GeForce GTX 650 Ti): 3.28 frames/sec
CPU (2 cores) and GPU (GeForce GTX 650 Ti): 4.65 frames/sec
CPU (3 cores) and GPU (GeForce GTX 650 Ti): 6.29 frames/sec
CPU (4 cores) and GPU (GeForce GTX 650 Ti): 6.9 frames/sec

Best combination: CPU (4 cores) and GPU (GeForce GTX 650 Ti)
Last edited by apefos on Sat Mar 02, 2013 12:17 am, edited 1 time in total.
peederj
Posts: 4
Joined: Thu May 16, 2013 9:17 pm

FCPX on a 2011 MacBook Pro 17"

Post by peederj »

On my late 2011 MacBook Pro 17" running Neat Video 3.3 in FCPX 10.8 and OS 10.8.3, the Radeon HD 6770M takes so long to spin up that it looks like it is actually slower to use than CPU only. The progress pie takes a while to show its first wedges and then speeds up, so maybe it would work faster on a stream than in the benchmark? I had hoped GPU acceleration would be a bigger payoff than it appears to be, but here it is actually a cost. I found 50% GPU memory (half of the 1GB available) to be the fastest setting. Changing the radius and bit depth did not alter this equation.

Please let me know if there's anything I'm doing wrong or if that's just the way it is.


Frame: 1920x1080 progressive, 32 bits per channel, Radius: 1 frame
Running the test data set on up to 8 CPU cores and on up to 1 GPU

CPU only (1 core): 1.59 frames/sec
CPU only (2 cores): 3.11 frames/sec
CPU only (3 cores): 4.17 frames/sec
CPU only (4 cores): 4.61 frames/sec
CPU only (5 cores): 4.61 frames/sec
CPU only (6 cores): 4.5 frames/sec
CPU only (7 cores): 4.31 frames/sec
CPU only (8 cores): 4.15 frames/sec
GPU only (ATI Radeon HD 6770M): 1.73 frames/sec
CPU (1 core) and GPU (ATI Radeon HD 6770M): 1.52 frames/sec
CPU (2 cores) and GPU (ATI Radeon HD 6770M): 2.79 frames/sec
CPU (3 cores) and GPU (ATI Radeon HD 6770M): 3.12 frames/sec
CPU (4 cores) and GPU (ATI Radeon HD 6770M): 3.03 frames/sec
CPU (5 cores) and GPU (ATI Radeon HD 6770M): 3 frames/sec
CPU (6 cores) and GPU (ATI Radeon HD 6770M): 2.99 frames/sec
CPU (7 cores) and GPU (ATI Radeon HD 6770M): 2.9 frames/sec
CPU (8 cores) and GPU (ATI Radeon HD 6770M): 2.82 frames/sec

Best combination: CPU only (4 cores)
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

With most mobile GPUs, it is currently quite typical that the CPU alone is faster. Setting the GPU memory to 50-70% may help, but not much, simply because the GPU itself is not fast enough to beat the CPU.

We continue to work on further optimizations, so the balance of power, CPU vs GPU, may change in the future.

Vlad