Pixel pipeline does not seem to get "optimized" with OpenCL enabled, causing performance drops (especially on low-end GPUs)
I recently switched back to Manjaro Linux from Windows, so I needed a replacement for my Windows photo workflow software (PhotoDirector). I remembered the darktable project and took a closer look at it.
I'm also fairly experienced in GPU programming, have built projects that use GPU acceleration directly through OpenGL, and was happy to read about OpenCL support in darktable. I managed to get darktable running with OpenCL on my 7th-gen Intel Core i5 with integrated graphics (HD 620 or so) by compiling the master branch without the NEO blacklist entry (by the way, I don't have the problems mentioned in #12541).
With that setup I ran into problems similar to those described in #12308 (better performance without OpenCL), which was attributed to a "slow GPU". However, in my experience with GPU programming, it is close to impossible for a CPU to beat even a low-end GPU, let alone by a factor of 3 as in my case, as long as the application gives the GPU enough demanding work to be worth the memory copy overhead (which darktable easily should ;) ). So I started profiling to find out why this actually happens.
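The break-even argument above can be written as a small check. This is a rough sketch, not darktable code: it assumes one full-resolution buffer stored as 4-channel 32-bit float (RGBA), one upload plus one download per offloaded step, and a hypothetical transfer bandwidth. The timings in the example are made-up illustration values, not measurements from this issue.

```python
def buffer_bytes(megapixels: float, channels: int = 4, bytes_per_channel: int = 4) -> int:
    """Size of one full-resolution buffer, assuming float RGBA (4 x 4 bytes per pixel)."""
    return int(megapixels * 1_000_000) * channels * bytes_per_channel


def gpu_pays_off(cpu_s: float, gpu_kernel_s: float, n_bytes: int, bandwidth_bytes_per_s: float) -> bool:
    """True if offloading one step is faster end-to-end than running it on the CPU.

    Assumes the buffer is copied twice (upload input, download result);
    bandwidth_bytes_per_s is a hypothetical figure, not a measured one.
    """
    copy_s = 2 * n_bytes / bandwidth_bytes_per_s
    return gpu_kernel_s + copy_s < cpu_s


n = buffer_bytes(16)  # 16 MP image -> 256,000,000 bytes
print(gpu_pays_off(cpu_s=0.2, gpu_kernel_s=0.05, n_bytes=n, bandwidth_bytes_per_s=10e9))   # True
print(gpu_pays_off(cpu_s=0.08, gpu_kernel_s=0.05, n_bytes=n, bandwidth_bytes_per_s=10e9))  # False
```

With these illustrative numbers the copy costs about 0.05 s, so offloading only pays off for steps whose CPU time clearly exceeds kernel time plus transfer time.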
I decided to open a new issue for this instead of only commenting on #12541, as that bug entry is pretty "generic" and does not point to a specific problem. I used a very simple test setup: a 16 MP RAW image from my Sony NEX-5R (ARW format), changing exposure as the single action (it's also the only action visible in the history stack), and compared the actual work done with OpenCL activated and deactivated. Total processing time was around 0.3 s on the CPU vs. 0.9 s on the GPU; I repeated this several times for each setting.
I've attached some files showing the results. As a brief summary:
In CPU mode, darktable seems to skip (i.e. apparently reuse the results of) certain, partly very time-consuming modules, namely "Demosaic", "White balance", "Highlights reconstruction" and "Raw Black/White levels" (names translated from German, so they might differ a bit). They seem to be computed once and then reused, which leads to a very "clean" pipeline with only a few actions when changing the exposure setting, and it performs quite well.
In OpenCL mode it is the opposite: the results of the modules mentioned above do not seem to be reused, resulting in a much longer pixel pipeline in which every step is processed again on each exposure change. This comes with several additional colorspace transformations, which is not really surprising. Comparing performance per action, processing times are better on the GPU, as expected. However, the far more complex pipeline leads to longer overall processing times.
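The difference between the two modes can be illustrated with a toy model of a pixel pipeline cache. This is a hypothetical sketch, not darktable's implementation: each step's output is cached under a key built from its own parameters and those of every step upstream of it, so changing only the last step (exposure) leaves all earlier keys valid. The step order and the single float parameter per step are simplifying assumptions, and `use_cache=False` only mimics what the OpenCL profile appears to show.

```python
from typing import Dict, List, Tuple

# A step is (name, parameter); one float parameter per step is a simplification.
Step = Tuple[str, float]
# Hypothetical cache key: the step plus every step upstream of it.
Key = Tuple[Step, ...]


def run_pipeline(steps: List[Step], cache: Dict[Key, str], use_cache: bool = True) -> List[str]:
    """Process steps in order and return the names of the steps actually computed."""
    computed: List[str] = []
    key: Key = ()
    for step in steps:
        key = key + (step,)  # any upstream change produces a different key
        if use_cache and key in cache:
            continue  # nothing upstream changed: reuse the cached output
        computed.append(step[0])
        cache[key] = f"output of {step[0]}"  # placeholder for an image buffer
    return computed


# Assumed order of the modules named in this issue, with exposure last.
pipeline: List[Step] = [
    ("raw black/white levels", 0.0),
    ("white balance", 1.0),
    ("highlights reconstruction", 1.0),
    ("demosaic", 0.0),
    ("exposure", 0.0),
]
cache: Dict[Key, str] = {}
first_run = run_pipeline(pipeline, cache)  # cold cache: every step runs

pipeline[-1] = ("exposure", 0.5)  # move the exposure slider
after_change = run_pipeline(pipeline, cache)
no_cache = run_pipeline(pipeline, cache, use_cache=False)  # every step runs again
print(after_change)  # ['exposure']
print(no_cache)
```

With the cache, an exposure change only pays for the steps downstream of the change; without it, every slider move pays for demosaic and the other early steps again, which faster individual GPU steps may not make up for.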
It may be that high-performance GPUs simply compensate for this problem, which leads to the short answer "slow GPU". Although the issue is especially noticeable on low-end GPUs, I guess everyone would benefit if it can be fixed. Of course, I'm not deep enough into darktable's architecture to judge whether this is possible or not.