What's wrong here is that the file isn't the same. A basic principle in programming is that a computation should always produce the same result, and that the chosen procedure should lead to the best achievable solution. If the result changes depending on the hardware it was computed on, something is evidently wrong. What good is it to convert an 80-bit FPU computation to 32-bit SSE2 and speed it up 4x, when the output quality gets worse? The same goes for multithreading: the more threads, the more often logical dependencies have to be broken, and the more the quality drops.
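The precision point can be demonstrated directly. The following sketch (NumPy used only as a convenient way to force 32-bit arithmetic; the exact error magnitudes are illustrative) evaluates the same sum at two precisions, standing in for 80-bit x87 vs. 32-bit SSE2 arithmetic:

```python
import numpy as np

# The same mathematical sum, accumulated at two precisions.
# 0.1 is not exactly representable in binary, so every addition rounds;
# at 32-bit precision the rounding errors accumulate visibly.
x64 = 0.1
x32 = np.float32(0.1)

acc64 = 0.0
acc32 = np.float32(0.0)
for _ in range(1_000_000):
    acc64 += x64
    acc32 += x32

print(acc64)         # very close to 100000
print(float(acc32))  # off by a clearly visible margin
```

The two "results" of the identical computation differ, which is exactly the kind of hardware- and precision-dependent divergence described above.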
WMV codec:
"Yes, although it's probably negligible in most cases. Multithreaded encoding is done by splitting up the image into splices and then encoding each splice in a separate thread. This can affect the estimations because each thread only has access to a part of the image."
LAME:
"Of course the MT version is of lesser quality if you want to use it in MT, as it requires to disable bit reservoir."
x264 (MPEG-4 AVC):
"x264 has poor communication between threads regarding the state of the VBV buffer, meaning that it often runs out of bits to encode with and you get serious artifacting on some frames. When encoding a 1-hour TV show from DVD for my iPod I get about 20 or so VBV underflow warnings, even with the VBV patches applied. The underflows are much less frequent (though not eliminated altogether) when encoding with one thread."
VC1:
"Classic multithreaded encoding uses slices, basically cutting the frame into horizontal stripes which are encoded independently.
Speaking of our current VC-1 implementation, we do good rate control between the slices (so if one band has more detail, it'll get more bits than the other bands, in order to keep the overall level of compression constant in each frame). But motion vectors won't cross over between threads. Now, most video doesn't have that much vertical motion (most motion is horizontal), but a wide shot of a pogo-stick contest might not look as good at the same bitrate with 4 slices compared to a single slice. But most of the time, it's fine (especially when you factor in the more accurate work we can do within each slice given the greater horsepower). But we normally recommend to have at least 64 pixels per slice.
Our current codec implementation goes up to 4 slices per instance, and for >4 core machines, we find that doing temporal segmenting (encoding different parts of the movie at the same time) is a better way to speed things up further.
The big reason why codecs do this is because they're predicting future frames compared to the previously predicted frame, and things get gnarly when the reference frame is a different size than the encode frame. There are other approaches that are being looked at, but require more raw horsepower per pixel, and quite a bit more memory for the total encode. But for now, I think segment parallelism is the right approach for >4 core encoding."
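The temporal-segmenting approach mentioned above can be sketched as follows. This is a dummy model (`encode_segment` is a hypothetical stand-in, not any real codec API): the movie is cut into contiguous runs of frames, each run is encoded independently in parallel, and the bitstreams are concatenated. Since prediction never crosses a segment boundary, each segment must open with an independently decodable keyframe.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_segment(frames):
    # First frame of a segment is intra-coded ("I"); the rest are
    # predicted ("P") from earlier frames in the same segment only.
    return [("I", frames[0])] + [("P", f) for f in frames[1:]]

movie = list(range(12))        # 12 toy "frames"
n_segments = 4                 # e.g. one segment per core on a 4-core box
size = len(movie) // n_segments
segments = [movie[i * size:(i + 1) * size] for i in range(n_segments)]

with ThreadPoolExecutor(max_workers=n_segments) as pool:
    encoded = list(pool.map(encode_segment, segments))

# Concatenate the independently produced bitstreams in order.
bitstream = [pkt for seg in encoded for pkt in seg]
print(bitstream)
```

The cost is visible in the output: four keyframes instead of one, which is the compression price paid for being able to encode the segments fully in parallel.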