

All the better for us running Linux!
This is great, but the context is that this is for specific inner loops, and the comparison is against the C version of that specific inner loop. Typically what was used before this on a computer with avx512 was the avx2 version of the inner loop, and the speedup compared to that version appears to be up to 60%: https://x.com/FFmpeg/status/1852542388851601913 . And since that specific inner loop isn’t running all the time, the overall speedup is probably much less than 60%. This is still sizeable, but the actual speedup in practice with this implementation is far, far from 94x.
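To put rough numbers on that argument, here is a minimal sketch using Amdahl's law. The runtime fractions below are made-up illustrative values, not measurements of ffmpeg, and reading "up to 60%" as a 1.6x kernel speedup is my assumption.

```python
def overall_speedup(fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only `fraction` of the
    original runtime is spent in the code that got `kernel_speedup`x faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Hypothetical shares of total runtime spent in the optimized inner loop.
# These are assumptions for illustration only, not profiling results.
for fraction in (0.05, 0.10, 0.25):
    # 1.6x assumes "up to 60% faster than the avx2 version".
    total = overall_speedup(fraction, 1.6)
    print(f"loop is {fraction:.0%} of runtime -> overall {total:.3f}x")
```

Whatever the real fraction is, the overall figure can't exceed the kernel's own speedup, and it shrinks toward 1x as the loop's share of the runtime goes down.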
Yeah 7000-series Ryzen benefits from the avx512 code paths in ffmpeg. I’ve benchmarked a 5900x vs a 7900x specifically for software H.265 decoding and there was a sizeable difference.
For example, maybe branching is something you’d like to be able to do without it being a nightmare?
Mmm yes, compress me harder baby