Bug #12423

OpenCL / Local Contrast / Local Laplacian - issue with AMD/ROCM: 'amplified effect'

Added by Ari El 3 months ago. Updated 11 days ago.

Status: New
Priority: Low
Assignee: -
Category: Lighttable
Target version: -
Start date: 11/26/2018
Due date:
% Done: 0%
Affected Version: git master branch
System: Ubuntu
bitness: 64-bit
hardware architecture: amd64/x86

Description

Applying Local Contrast > Local Laplacian gives a totally different result with OpenCL ON and with OpenCL off, for the exact same module settings. The higher the Detail percentage, the more striking the difference.

This issue does NOT happen with Local Contrast > Bilateral Grid; it is specific to Local Laplacian. When I select Bilateral Grid, the result is the same regardless of whether OpenCL is enabled or not.

It looks like the Local Contrast > Local Laplacian OpenCL implementation has a problem. See the attached snapshots, taken with the exact same module settings, one with OpenCL ON and one with OpenCL off.

The issue happens in BOTH current stable and GitHub master (as downloaded from OBS (openSUSE)), tested on Ubuntu 18.10 with the stock 4.18 kernel and rocm-opencl from AMD.

With OpenCL enabled, the "Detail" effect of the Local Contrast filter appears grossly amplified.

Full log attached, which captures: launching darktable, opening the image in darkroom mode, disabling and re-enabling the Local Contrast module.

I can see no errors at all in the log or CLI.

darktable-cltest output attached.

To be fair, I used amdgpu-pro OpenCL for many months with darktable 2.4.x and never noticed the problem. It could be that ROCM OpenCL somehow triggers the issue.

I use many other modules that use OpenCL, including Denoise (Profiled), and they show NO issue with ROCM-OpenCL: they produce the same output regardless of whether OpenCL is enabled or disabled. With OpenCL, of course, they work much faster.

Hardware: AMD RX-560.

Again this seems to point to some issue with the CL implementation of Local Laplacian.

OpenCL OFF - Local Contrast - Local Laplacian - detail = 200.png (317 KB) Ari El, 11/26/2018 03:08 AM

OpenCL ON - Local Contrast - Local Laplacian - detail = 200.png (431 KB) Ari El, 11/26/2018 03:08 AM

OpenCL Tests - original image.NEF (27.2 MB) Ari El, 11/26/2018 03:09 AM

OpenCL Tests Debug (OpenCL ON).log (789 KB) Ari El, 11/26/2018 03:16 AM

OpenCL-darktable-cltest.log (31.9 KB) Ari El, 11/26/2018 03:21 AM

OpenCL ON (AMDGPU-PRO OCL instead of ROCM)- Local Contrast - Local Laplacian - detail = 200.png (342 KB) Ari El, 11/26/2018 03:37 AM

History

#1 Updated by Ari El 3 months ago

I did another test using the OpenCL libraries extracted from the AMDGPU-PRO driver instead of ROCM-OpenCL. The issue is gone.
So this problem is specific to ROCM-OpenCL. It is also specific to darktable's Local Laplacian implementation: none of the other OpenCL kernels in darktable that I have used shows any problem with ROCM.

The ROCM implementation is more modern than AMDGPU-PRO and is the future of AMD's drivers, so sorting this problem out would help most AMD users.

Please let me know if there is any debugging or testing I could do to help isolate the problem.

#2 Updated by Ari El 3 months ago

For the record:

apt show rocm-opencl
Package: rocm-opencl
Version: 1.2.0-2018111340
Priority: optional
Section: devel
Maintainer: Laurent Morichetti
Installed-Size: unknown
Depends: hsa-rocr-dev (>= 1.1.5)
Download-Size: 41.2 MB
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: OpenCL/ROCm

My OpenCL settings in darktablerc (same settings applied to both ROCM and AMDGPU-PRO):

opencl=TRUE
opencl_async_pixelpipe=true
opencl_avoid_atomics=false
opencl_checksum=3106484759
opencl_device_priority=*/!0,*/*/*
opencl_disable_drivers_blacklist=false
opencl_library=
opencl_mandatory_timeout=200
opencl_memory_headroom=300
opencl_memory_requirement=768
opencl_micro_nap=0
opencl_number_event_handles=150
opencl_scheduling_profile=default
opencl_size_roundup=16
opencl_synch_cache=false
opencl_use_cpu_devices=false
opencl_use_pinned_memory=true

#3 Updated by Piotr Ryszkiewicz 2 months ago

I confirm the bug. I have a similar setup, just a different graphics card: RX 570. Tested with darktable 2.4.4.

#4 Updated by Ari El 2 months ago

Tested with darktable 2.6RC2 and ROCM 2.0 (which introduces OpenCL 2.0 support)
Same issue :(

apt show rocm-opencl
Package: rocm-opencl
Version: 1.2.0-2018121317
Priority: optional
Section: devel
Maintainer: Laurent Morichetti
Installed-Size: unknown
Depends: hsa-rocr-dev (>= 1.1.5)
Download-Size: 43.1 MB
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: OpenCL/ROCm

#5 Updated by Rene van Rijsselt 18 days ago

Same issue with an AMD Vega 56 and Rocm. The tile-like artifacts are already quite noticeable with detail set to the default 120%. It completely ruined a bunch of printed photos for me (I installed the gfx card and exported previous edits without checking... I was amazed by the speed improvement though!).

Installing the AMDGPU-Pro drivers indeed solves the issue (with the --headless option). Removing the locallaplacian.cl, and thereby forcing cpu for this module, also works great as a temporary workaround.

I tried to compare the cached binaries for the locallaplacian kernel, but the difference is huge and difficult to analyze (the Rocm binary is 43 kB vs 175 kB for Pro). I also tried to compile the kernel with Radeon GPU Analyzer, but it only supports Rocm and so cannot be used for a comparison. I also tried some custom builds of darktable with different OpenCL compiler options, to no effect (except that disabling optimizations completely freezes my machine). AMD CodeXL might also give some insight, but so far I have had no luck with it.

#6 Updated by Roman Lebedev 18 days ago

It is most likely a bug in the driver.
Was this reported upstream already?
If not, I doubt it will ever be fixed.

#7 Updated by Ari El 17 days ago

Thanks @Rene for diving deep into this issue!

@Roman until there is something specific to report in the ROCM tracker, it will be difficult to get them engaged. I am hoping Rene's analysis will show whether the locallaplacian implementation is triggering a ROCM bug, or the other way around.

ROCM is not only faster and easier to deploy but also the future of AMD OpenCL, so I am really hopeful we can get to the bottom of this one.

#8 Updated by Rene van Rijsselt 16 days ago

Managed to trace the issue to the laplacian_assemble kernel. Confirmed that the inputs to this kernel are equal for both drivers (by dumping all the intermediate images to disk). The coarsest level output already shows the first artifacts.
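For anyone who wants to repeat this kind of check, here is a minimal sketch of the comparison step. It assumes, hypothetically, that each driver run dumped its intermediate buffers into its own directory under identical file names, and that the names sort in pipeline order. The function name and the directory layout are made up for illustration; they are not part of darktable.

```shell
# Sketch only: print the first dump file that differs between two driver runs.
# Assumption: both directories hold files with the same names, one per buffer.
first_differing_dump() {
    dir_a=$1
    dir_b=$2
    for file_a in "$dir_a"/*; do
        [ -f "$file_a" ] || continue
        name=${file_a##*/}              # strip the directory part
        # cmp -s is silent; it returns non-zero when the files differ
        # or when the counterpart in dir_b is missing.
        if ! cmp -s "$file_a" "$dir_b/$name"; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1                            # every file matched byte for byte
}
```

Called as `first_differing_dump dumps-rocm dumps-pro` (both directory names are placeholders), it prints the first file where the two runs diverge. The comparison is byte-exact, so harmless rounding differences between drivers would also be reported.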

I am not sure yet how to continue the investigation inside this kernel. I am guessing somewhere there is a rounding or boundary error leading to the wrong gamma scale selection. I could try to isolate different parts of the algorithms and keep verifying all the inputs and outputs. But that would take some time without understanding all the math stuff. And if it is some optimization issue in the driver then that would be all for nothing... any ideas?

I wish there were a better way to compare the binary kernels, or even to generate some readable assembly. All the applications I tried to compile the kernel with seem to use rocm no matter which driver is installed. Does anyone know whether it is possible to get ISA/assembly with the pro driver, and whether it can even be compared to rocm?

#9 Updated by Ari El 11 days ago

Just to confirm: rocm is usable (no other issues besides this one) once the extension of the file

/usr/share/darktable/kernels/locallaplacian.cl is changed to something else.

Even without locallaplacian acceleration rocm is worth it.
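The rename workaround can be wrapped in two small helpers so it is easy to undo. This is a sketch, not an official darktable procedure: the `.disabled` suffix and the function names are arbitrary choices, and `KERNEL_DIR` defaults to the path quoted above, which may differ on other installs.

```shell
# Sketch only: hide or restore the Local Laplacian OpenCL kernel source.
# KERNEL_DIR is an assumption based on the path quoted in this thread.
KERNEL_DIR="${KERNEL_DIR:-/usr/share/darktable/kernels}"

disable_locallaplacian_cl() {
    src="$KERNEL_DIR/locallaplacian.cl"
    # Rename only if the kernel source is actually there.
    if [ -f "$src" ]; then
        mv -- "$src" "$src.disabled"
    fi
}

restore_locallaplacian_cl() {
    src="$KERNEL_DIR/locallaplacian.cl"
    # Undo the rename if a disabled copy exists.
    if [ -f "$src.disabled" ]; then
        mv -- "$src.disabled" "$src"
    fi
}
```

On a system-wide install both functions need root, and darktable has to be restarted afterwards.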

@Rene not sure what you are asking about, but just in case this helps: I switch back and forth between the rocm and amdgpupro libraries easily. I extracted the libraries

  • libamdocl12cl64, libamdocl64.so, libamdocl-orca64.so,
  • libcltrace.so, libOpenCL.so, libOpenCL.so.1

from the amdgpupro package, placed them in a directory under /opt, and added that directory to ld's configuration like so:

cat /etc/ld.so.conf.d/opencl-amdgpupro.conf
/opt/amdgpu-pro/OpenCL-amdgpupro-Libs

After removing the rocm-opencl package and running ldconfig, darktable-cltest immediately starts using these. To switch back to rocm, just commenting out the entry and running ldconfig does it.
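The comment-out / uncomment step could be scripted roughly as follows. This is only a sketch under the setup described above: `CONF` defaults to the file shown in this comment, the function names are invented, and the functions deliberately do not call ldconfig themselves.

```shell
# Sketch only: toggle the amdgpupro library path in the ld.so.conf.d entry.
# CONF is an assumption taken from the path shown in this comment.
CONF="${CONF:-/etc/ld.so.conf.d/opencl-amdgpupro.conf}"

disable_amdgpupro_libs() {
    # Put "#" in front of every non-empty line that is not already a comment.
    sed 's/^\([^#]\)/#\1/' "$CONF" > "$CONF.tmp" && mv "$CONF.tmp" "$CONF"
}

enable_amdgpupro_libs() {
    # Remove one leading "#" from each commented line.
    sed 's/^#//' "$CONF" > "$CONF.tmp" && mv "$CONF.tmp" "$CONF"
}
```

After either call, run ldconfig as root and check with darktable-cltest which OpenCL implementation is picked up.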
