Bug #9117

release 1.1 crashing when exporting using OpenCL

Added by Albert Castells about 7 years ago. Updated about 6 years ago.

Status: Closed: invalid
Priority: Low
Assignee: -
Category: -
Target version: -
Start date: 12/06/2012
Due date: -
% Done: 0%
Estimated time: -
Affected Version: 1.1.0
System: Ubuntu
bitness: 64-bit
hardware architecture: amd64/x86

Description

I installed DT 1.1 and it always crashes when exporting to JPG using OpenCL.

If I uncheck OpenCL, restart and export again, everything is fine.

My system: Ubuntu 12.10, 8 GB RAM, NVIDIA GT 240 1 GB, AMD 640 quad-core, OS installed on SSD, photos on SATA HDD.

darktable_bt_ZBE6OW.txt (51.6 KB), Albert Castells, 12/06/2012 06:51 AM
log.txt (16.4 KB), Albert Castells, 12/06/2012 06:51 AM
tiling_1.c (63.5 KB), Ulrich Pegelow, 12/06/2012 06:55 PM
darktable_bt_UOW7OW.txt (32.7 KB), Jiri Netopil, 12/16/2012 09:55 PM
crash.txt (12.7 KB), Jiri Netopil, 12/17/2012 09:22 PM
darktable_bt_MD9BPW.txt (57.6 KB), Jiri Netopil, 12/17/2012 09:22 PM
start_w-o_opencl.txt (800 Bytes), Jiri Netopil, 12/17/2012 09:22 PM
shadhi_1.c (24.1 KB), Ulrich Pegelow, 12/18/2012 06:43 AM
darktable_bt_ZHZXPW.txt (60.1 KB), Jiri Netopil, 12/18/2012 11:26 PM

History

#1 Updated by Ulrich Pegelow about 7 years ago

Thanks for reporting. I'd like to ask you for some tests.

1) Go into $HOME/.config/darktable and edit the file darktablerc. Search for the config parameter
opencl_memory_headroom, change its value from 300 to 350 and test again. If this doesn't help, increase
it to 400 and test again.

2) If 1) didn't help, try a recent development version of darktable. Either take the recent PPA or
compile it yourself. There have been some OpenCL fixes recently. I'd like to know if they already fix your problem.

3) If 2) didn't help and you are able to compile darktable yourself, please replace the file src/develop/tiling.c
with the attached file tiling_1.c, recompile and test again.

#2 Updated by Albert Castells about 7 years ago

None of those things worked.

1) I increased opencl_memory_headroom from 300 to 500 in steps of 50.

2) Then I updated using the unstable PPA and also tried compiling it myself.

3) I replaced tiling.c with tiling_1.c (renamed to tiling.c) and compiled again.

No luck....

#3 Updated by Ulrich Pegelow about 7 years ago

Seems to be a new bug then :(

Judging from your log.txt, the crash does not seem to happen every time. Can you find a pattern? Is there
a specific module that triggers the crash? At first sight it seems that the equalizer module (atrous) runs
fine for several export jobs, whereas the shadows&highlights module crashes. Is the pattern that simple?
You should make a few tests with different combinations of modules, focusing on: equalizer, s&h,
nlmeans denoise, highpass.

Another indication is the kind of OpenCL error before darktable crashes. The value -5 indicates an out-of-resources
situation in your OpenCL driver (this should not normally lead to a crash, but that's life with OpenCL). I
suspect that a lack of event handles might be the cause. In branch opencl there is a way to test this.

You would need to check out branch opencl ('git checkout opencl') and recompile everything. After the first
start you will find a new configuration parameter opencl_use_events in $HOME/.config/darktable/darktablerc. Set
it to FALSE and test again, please. This deactivates all use of event handles.

#4 Updated by Ulrich Pegelow about 7 years ago

BTW, could you please also supply an example RAW+XMP for an image that crashes?

You can attach the XMP in Redmine. The RAW is too big and needs to be uploaded
somewhere (e.g. Dropbox). Just post the link here.

#5 Updated by Ulrich Pegelow about 7 years ago

Albert, any news here?

#6 Updated by Albert Castells about 7 years ago

Hi,

In my Dropbox public folder (https://www.dropbox.com/sh/kx416wyptry5xpd/foIqh_Y4Ib/DT_crash) you will find a picture and an XMP.

Using stable version 1.1, I tried exporting picture by picture, and this one made darktable crash.

Then I focused on it and exported it repeatedly, adding one step of the history stack each time (beginning from 0).

It crashes when the last step of the history is selected, which is a sharpen. If you look at the history stack (which is not compressed) you will see another sharpen (step 6) before the last one (step 10).

Hope it helps...

#7 Updated by Ulrich Pegelow about 7 years ago

Albert,

thank you for the example image. On my system it exports without any issues.

From your description I assume that the problem is initiated by shadows&highlights. For some reason your GPU
runs out of some resources (that's the meaning of error code -5 you get in log.txt). The next time an OpenCL kernel
is called, darktable crashes. This is the sharpen module, no. 10 in your history stack.

The shadows&highlights module in this case uses the bilateral filter, which we have already found to be quite demanding.

Please recheck with darktable 1.1.1 as soon as it's published. Please also do the test I describe in the last paragraph
of my post #3.

#8 Updated by Albert Castells about 7 years ago

  • Target version set to 1.1.1

I just did an update, and now version 1.1.1 says "failed to get parameters from storage module, aborting export..."

I tried removing and reimporting the film roll, but I'm unable to export anything from lighttable or darkroom...

#9 Updated by Ulrich Pegelow about 7 years ago

The new issue with 1.1.1 might be related to an installation problem. darktable runs into problems
when it stumbles over an old, incompatible module (darkroom or input/output plugin). The best way to resolve this is
a fresh install after removing all old darktable binaries. If you compile yourself, make sure to remove
./build before calling build.sh.

Concerning your OpenCL issue, I have detected one potential problem with the estimation of memory demand for
the bilateral filter (used in shadows&highlights). This problem could lead to OpenCL failures when darktable uses
tiling, i.e. during export.

I will need to investigate a bit further and might come up with a provisional fix for you to test later.

#10 Updated by Ulrich Pegelow about 7 years ago

Albert,

any progress on this topic?

#11 Updated by Albert Castells about 7 years ago

I completely removed darktable (sudo apt-get purge darktable) and cleaned packages (sudo apt-get clean/autoclean/autoremove).

Then I installed it again and the crash is still there...

#12 Updated by Ulrich Pegelow about 7 years ago

Sure :)

We need to get some more info to solve the issue.

To me this looks like a problem with the combination of shadows&highlights and tiling. Within shadows&highlights,
it might be the bilateral filter that causes problems.

Maybe you can do the following three tests for us:

1) With current darktable 1.1.1 and the image you supplied earlier: could you please check whether it also crashes
when you switch from bilateral filter to gaussian blur in shadows&highlights?

2) With a freshly self-compiled version from branch opencl ('git checkout opencl' and 'git pull'): does the crash
still happen?

3) With the same version as in 2): set the parameter opencl_use_events to FALSE in $HOME/.config/darktable/darktablerc.
Does it still crash?

#13 Updated by Ulrich Pegelow almost 7 years ago

  • Status changed from Triaged to Incomplete

#14 Updated by Jiri Netopil almost 7 years ago

Hi Ulrich,

when I finished editing my last set of images after you worked around the "system freezes on RAW open" issue, I discovered that I'm also affected by the dt crash when exporting images to JPG with OpenCL on. Right now I export with OpenCL off, and it works fine so far. Tomorrow evening I'll try the three tests you proposed in your last post and will let you know. I attach the backtrace for my current issue. I'm running the production 1.1.1 version right now.

#15 Updated by Jiri Netopil almost 7 years ago

Hi Ulrich,

I did all 3 test scenarios you proposed in your last post. My test result is quite straightforward. It behaved the same way in all of those tests:
a) Export of an unedited image (only the basecurve and sharpen modules applied by default) with OpenCL on: OK
b) Export of an image with the shadows&highlights module with gaussian blur applied and OpenCL on: CRASH
c) The same scenario as in b) but with the bilateral filter instead of gaussian blur applied: OK

Note: The "Warning: Directory Thumbnail, entry 0x0201: Data area exceeds data buffer, ignoring it." message appears twice in the terminal on every successful export described in c) above.

And one strange thing: dt crashes on export of an image even with the shadows&highlights module disabled, if it has previously crashed on export of that particular image with shadows&highlights and gaussian blur applied.

#16 Updated by Ulrich Pegelow almost 7 years ago

These opencl issues drive me crazy :/

At least this time it does not seem to be bilateral. Can you please generate a crash like in b), redirect darktable's -d opencl
output into a file and attach it here?

Also, when darktable crashes even without s&h active, as you said, please generate an output like the one above.

#17 Updated by Jiri Netopil almost 7 years ago

First I have to retract my previous statement that dt crashes even on an image without s&h active. It seems I didn't actually deactivate it.

I attach the file crash.txt, taken at the crash in situation b).

But weird things began to happen. I'm doing all my testing on the current opencl branch checked out from git. At some point dt began to crash on opening every RAW file I tried. On crashing, dt generates the attached backtrace. When I run dt again after this crash, it starts with OpenCL disabled, and I cannot enable it again because the corresponding checkbox in the settings is grayed out. The terminal output is attached in the file start_w-o_opencl.txt. And strangest of all, if I start dt after a couple of minutes, it seems to start with OpenCL enabled again as if there had been no problem. This has happened once; now I'm trying to reproduce it. But on opening a RAW it crashes again. opencl_omit_whitebalance is TRUE and opencl_use_events is FALSE.

#18 Updated by Jiri Netopil almost 7 years ago

I can confirm my last observation. After waiting some 45 minutes I started dt, and it finally started with OpenCL enabled. In my previous attempt, some 10 minutes earlier, dt started without OpenCL, with the same terminal output as in the file start_w-o_opencl.txt attached to my last post. I did nothing regarding dt between these two attempts; I just waited. Now I'm able to open a JPG file, but on opening a RAW dt freezes and has to be killed manually. The terminal output shows nothing interesting; the last line says "[pixelpipe_process] [full] using device 0", and the backtrace contains only this single line: "this is darktable 1.1+100~gd8d12a7 reporting a segfault:". After killing dt I can start it again, but only with OpenCL disabled, and with the same terminal output as in start_w-o_opencl.txt. I'm going round in circles :-(

Huh, it looks like another hard OpenCL issue to cope with...

#19 Updated by Ulrich Pegelow almost 7 years ago

Of course we don't know how a specific OpenCL device recovers from memory-related errors; maybe
there is some garbage collection going on in the background that sorts things out after a while.

On the other hand, this smells a bit like a potential thermal problem with your hardware. That would
also explain why dt crashes during export and not during normal runs. Do you have a chance to
monitor your GPU's temperature? For NVIDIA there is a small program, nvidia-settings, that
can do that. You could even limit the GPU frequency as a test. Maybe there is something similar for
your AMD card.

On the software side, you might test the attached file shadhi_1.c as a replacement. The only thing it does is
free the intermediate buffers in shadows&highlights later. No idea whether this will help.

#20 Updated by Ulrich Pegelow almost 7 years ago

Jiri, if you have a chance, please also check the latest branch opencl.

#21 Updated by Jiri Netopil almost 7 years ago

Hi Ulrich,

a thermal problem is definitely not the cause. My graphics card sits in a wind tunnel even though it is not overclocked. I monitor my GPU temperature with the AMDOverdriveCtrl tool, and it stays below 50 °C. Moreover, the crash occurs on the very first exported image, and dt doesn't crash at all when the s&h module is not used or is used with the bilateral filter instead of gaussian blur.

I propose focusing on the crash-on-export issue, not the other weird things I described in my last two posts. They may be a consequence of the export crash and may disappear once we get rid of it.

I will test the latest opencl branch and then your file this evening.

#22 Updated by Ulrich Pegelow almost 7 years ago

Jiri, there is another test I'd like to ask you to do. Still in branch opencl, go to src/common/opencl.c and add the following code at line 173:

cl->dev[dev].max_mem_alloc = 128*1024*1024;

#23 Updated by Jiri Netopil almost 7 years ago

Ulrich, my test findings follow.

First I pulled the opencl branch (at approx. 21:00). I didn't apply the shadhi_1.c file because the git pull output showed that shadhi.c had already been changed. I set opencl_use_events=FALSE and opencl_omit_whitebalance=TRUE, checked that OpenCL was on and began testing, with these results:

1) RAW image with s&h and gaussian blur => CRASH, the terminal says:
[pixelpipe_process] [export] using device 0
tiling->factor 4.034867
tiling->maxbuf 1.017433
tiling->factor 4.034867
tiling->maxbuf 1.017433
[default_process_tiling_cl_ptp] use tiling on module 'shadhi' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 3760 x 3072 and overlap 400
[default_process_tiling_cl_ptp] tile (0, 0) with 3760 x 3072 at origin [0, 0]
[opencl copy_buffer_to_image] could not copy buffer: -5
Segmentation fault (SIGSEGV)

2) RAW image with s&h and bilateral filter => NO CRASH, the terminal says:
[pixelpipe_process] [export] using device 0
tiling->factor 2.000086
tiling->maxbuf 1.000000
[opencl_sharpen] couldn't enqueue kernel! -5
[opencl_pixelpipe] failed to run module 'sharpen'. fall back to cpu path
[opencl_colorout] couldn't enqueue kernel! -5
[opencl_pixelpipe] failed to run module 'colorout'. fall back to cpu path
Warning: Directory Thumbnail, entry 0x0201: Data area exceeds data buffer, ignoring it.
Warning: Directory Thumbnail, entry 0x0201: Data area exceeds data buffer, ignoring it.
[export_job] exported to `/home/jirka/FOTO/Praha/test/darktable_exported/img_0001_02.jpg'

3) Then I changed nothing and repeated test 2) => CRASH! The terminal says:
[pixelpipe_process] [export] using device 0
[opencl_demosaic] couldn't enqueue kernel! -5
[opencl_pixelpipe] failed to run module 'demosaic'. fall back to cpu path
backtrace written to /tmp/darktable_bt_ZHZXPW.txt
Segmentation fault (SIGSEGV)

The backtrace is attached. I tried this several times on several images with the same result. Export with gaussian blur always fails on the first try, while export with the bilateral filter passes on the first try but crashes on the second, no matter whether I keep the bilateral filter or change it to gaussian blur.

After these tests I added line 173 to src/common/opencl.c and recompiled. Now export works without any crash, no matter whether gaussian blur or the bilateral filter is applied! After the successful export, the terminal says:
[pixelpipe_process] [export] using device 0
[default_process_tiling_cl_ptp] use tiling on module 'demosaic' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 2728 x 3072 and overlap 6
[default_process_tiling_cl_ptp] tile (0, 0) with 2728 x 3072 at origin [0, 0]
[default_process_tiling_cl_ptp] tile (1, 0) with 1892 x 3072 at origin [2716, 0]
[default_process_tiling_cl_ptp] use tiling on module 'basecurve' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 2728 x 3072 and overlap 0
[default_process_tiling_cl_ptp] tile (0, 0) with 2728 x 3072 at origin [0, 0]
[default_process_tiling_cl_ptp] tile (1, 0) with 1880 x 3072 at origin [2728, 0]
[default_process_tiling_cl_ptp] use tiling on module 'colorin' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 2728 x 3072 and overlap 0
[default_process_tiling_cl_ptp] tile (0, 0) with 2728 x 3072 at origin [0, 0]
[default_process_tiling_cl_ptp] tile (1, 0) with 1880 x 3072 at origin [2728, 0]
tiling->factor 4.034867
tiling->maxbuf 1.017433
tiling->factor 4.034867
tiling->maxbuf 1.017433
[default_process_tiling_cl_ptp] use tiling on module 'shadhi' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (3 x 1) tiles with max dimensions 2680 x 3072 and overlap 400
[default_process_tiling_cl_ptp] tile (0, 0) with 2680 x 3072 at origin [0, 0]
[default_process_tiling_cl_ptp] tile (1, 0) with 2680 x 3072 at origin [1880, 0]
[default_process_tiling_cl_ptp] tile (2, 0) with 848 x 3072 at origin [3760, 0]
[default_process_tiling_cl_ptp] use tiling on module 'sharpen' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 2728 x 3072 and overlap 5
[default_process_tiling_cl_ptp] tile (0, 0) with 2728 x 3072 at origin [0, 0]
[default_process_tiling_cl_ptp] tile (1, 0) with 1890 x 3072 at origin [2718, 0]
[default_process_tiling_cl_ptp] use tiling on module 'colorout' for image with full size 4608 x 3072
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 2728 x 3072 and overlap 0
[default_process_tiling_cl_ptp] tile (0, 0) with 2728 x 3072 at origin [0, 0]
[default_process_tiling_cl_ptp] tile (1, 0) with 1880 x 3072 at origin [2728, 0]
Warning: Directory Thumbnail, entry 0x0201: Data area exceeds data buffer, ignoring it.
Warning: Directory Thumbnail, entry 0x0201: Data area exceeds data buffer, ignoring it.
[export_job] exported to `/home/jirka/FOTO/Praha/test/darktable_exported/img_0001.jpg'

So it seems that the new line 173 in src/common/opencl.c did the trick!

I observed one more thing. During export with the bilateral filter applied, the system becomes unresponsive three times per image, for approx. 2.5 seconds each time. This happened before the change in opencl.c and still happens after it, but it doesn't lead to a crash. It does not happen with gaussian blur.

#24 Updated by Tobias Ellinghaus about 6 years ago

  • System set to Ubuntu

What is the state of this bug? Can it be closed?

#25 Updated by Simon Spannagel about 6 years ago

  • bitness set to 64-bit
  • % Done changed from 20 to 0
  • Status changed from Incomplete to Closed: invalid

apparently yes.
