Bug #11592

Local contrast module with laplacian crashes on Windows

Added by Peter Budai about 2 years ago. Updated about 2 years ago.

Status: Fixed
Priority: Low
Assignee: -
Category: Darkroom
Target version:
Start date: 04/25/2017
Due date:
% Done: 100%
Affected Version: git master branch
System: Windows
bitness: 64-bit
hardware architecture: amd64/x86

Description

1. Select any picture and open it in the darkroom.
2. Open the "local contrast" module.
3. Select "local | laplacian filter".
4. Turn on the module.

Result: the application crashes.

Note: the bilateral grid mode works.

Associated revisions

Revision c355237f
Added by Peter Budai about 2 years ago

Replacing free() with dt_free_align() for buffers
which were allocated using dt_alloc_align. Fixes #11592
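The fix pairs every dt_alloc_align() allocation with dt_free_align(). A minimal sketch of such a matched pair is shown here. The names my_alloc_align(), my_free_align() and ringbuf_roundtrip() are hypothetical, and the sketch assumes (this report does not confirm it) that the Windows build is backed by _aligned_malloc()/_aligned_free() while other platforms use posix_memalign()/free(). The five-row ring buffer size is inferred from the "5 to 10" buffer-size experiment in the chat log.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign() on POSIX systems */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <malloc.h> /* _aligned_malloc(), _aligned_free() */
#endif

/* Hypothetical stand-in for dt_alloc_align(): returns `size` bytes aligned to
 * `alignment` (a power of two and a multiple of sizeof(void *)), or NULL. */
static void *my_alloc_align(size_t alignment, size_t size)
{
#ifdef _WIN32
  return _aligned_malloc(size, alignment);
#else
  void *ptr = NULL;
  if(posix_memalign(&ptr, alignment, size) != 0) return NULL;
  return ptr;
#endif
}

/* Hypothetical stand-in for dt_free_align(): every pointer returned by
 * my_alloc_align() must be released here. On Windows, handing it to plain
 * free() instead is undefined behaviour and can corrupt the CRT heap. */
static void my_free_align(void *ptr)
{
#ifdef _WIN32
  _aligned_free(ptr);
#else
  free(ptr); /* posix_memalign() memory is released with free() */
#endif
}

/* Allocates and releases a ring buffer of 5 rows of `stride` floats, assuming
 * that is how ringbuf is sized. Returns 1 if the buffer was 16-byte aligned,
 * 0 otherwise (including allocation failure). */
static int ringbuf_roundtrip(size_t stride)
{
  float *ringbuf = my_alloc_align(16, sizeof(*ringbuf) * stride * 5);
  if(!ringbuf) return 0;
  memset(ringbuf, 0, sizeof(*ringbuf) * stride * 5);
  const int aligned = ((uintptr_t)ringbuf % 16) == 0;
  my_free_align(ringbuf); /* matching deallocator, not free() */
  return aligned;
}
```

If those assumptions hold, they would also explain why only Windows crashed: memory obtained from posix_memalign() may legitimately be released with plain free(), while memory from _aligned_malloc() must go through _aligned_free().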

History

#1 Updated by Tobias Ellinghaus about 2 years ago

  • Status changed from New to Triaged
  • % Done changed from 0 to 20

Jo, Roman and I didn't find anything obvious when looking over the code. It might be stack memory corruption: it typically crashes with gdb complaining about a broken call stack, and the Windows code that raises the unhandled exception is also in stack-traversal functions.

#2 Updated by Tobias Ellinghaus about 2 years ago

(gdb) bt full
#0  0x00000000771cf3b0 in ntdll!RtlUnhandledExceptionFilter ()
   from C:\Windows\SYSTEM32\ntdll.dll
No symbol table info available.
#1  0x00000000771cf9c6 in ntdll!EtwEnumerateProcessRegGuids ()
   from C:\Windows\SYSTEM32\ntdll.dll
No symbol table info available.
#2  0x00000000771d0592 in ntdll!RtlQueryProcessLockInformation ()
   from C:\Windows\SYSTEM32\ntdll.dll
No symbol table info available.
#3  0x00000000771d2204 in ntdll!RtlLogStackBackTrace ()
   from C:\Windows\SYSTEM32\ntdll.dll
No symbol table info available.
#4  0x000000007716d21c in ntdll!RtlIsDosDeviceName_U ()
   from C:\Windows\SYSTEM32\ntdll.dll
No symbol table info available.
#5  0x000007feff1310c8 in msvcrt!free () from C:\Windows\system32\msvcrt.dll
No symbol table info available.
#6  0x000000006362807e in gauss_reduce_sse2 (ht=<optimized out>, wd=743,
    coarse=0x17af3b40, input=0x18400050)
    at C:/darktable/darktable/src/common/locallaplacian.c:207
        cw = 372
        ringbuf = 0x17816070
        rowj = 579
        ch = 290
        stride = 376
#7  local_laplacian_internal (input=0x1ed70050, out=0x14970050, wd=487,
    ht=324, sigma=0.200000003, shadows=1, highlights=1, clarity=0.200000003,
    use_sse2=1) at C:/darktable/darktable/src/common/locallaplacian.c:497
        l = <optimized out>
        num_levels = 8
        max_supp = 128
        w = 743
        h = 580
        padded = {0x18400050, 0x17af3b40, 0x17b5d130, 0x17808bb0, 0x177a64c0,
          0xa4d22a0, 0x177eb190, 0x174f0970, 0x0 <repeats 22 times>}
        output = {0x19010050, 0x17b776e0, 0x17be0cd0, 0x1780f610, 0x1781b410,
          0x1781cf90, 0xa9ea630, 0x17865e30, 0x0 <repeats 22 times>}
        gamma = {0, 0, 0, 0, 0, 0}
        buf = {{0x4080000040800000, 0x4080000040800000, 0x3b8000003b800000,
            0x3b8000003b800000, 0x17b5c540, 0x178177f0, 0x17816070,
            0x17816650, 0x17816c30, 0x17817210, 0x174,
            0x0 <repeats 19 times>}, {0x0 <repeats 30 times>}, {
            0x0 <repeats 14 times>, 0x3fe0000000000000, 0x0,
            0x4069000000000000, 0x0, 0x3ff0000000000000, 0x0, 0x4522a000,
            0x0, 0x45744000, 0x0, 0x0, 0x5952edbfad2581b9, 0x4db9780,
            0xa4eb9e0, 0x4438080, 0x4c8bd20}, {0x4c8b5f0, 0x4438360,
            0x4438330, 0x6367a6b9 <dt_dev_pixelpipe_process_rec+329>, 0x0,
            0x0, 0x4438328, 0x0, 0x4438338, 0x4437f40, 0x4db9820, 0x4cbe080,
            0x25, 0x0 <repeats 17 times>}, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
            0x0, 0x0, 0x0, 0x144000001e7, 0x3dff36c3,
            0x0 <repeats 19 times>}, {0x0 <repeats 27 times>,
            0x76ef3ae5 <KERNEL32!K32GetProcessMemoryInfo+85>, 0x0, 0x0}}
#8  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

#3 Updated by Peter Budai about 2 years ago

<peterbud> now I have a list of ringbuf allocation/free messages

ringbuf after dt_alloc_align: 000000000ea16e90
ringbuf after dt_alloc_align: 0000000023f1ad60
ringbuf before free: 000000000ea16e90
ringbuf before free: 0000000023f1ad60
ringbuf after dt_alloc_align: 0000000027aa2400
ringbuf before free: 0000000027aa2400
ringbuf after dt_alloc_align: 0000000023bc5050
ringbuf before free: 0000000023bc5050
ringbuf after dt_alloc_align: 000000000e8d2930
ringbuf before free: 000000000e8d2930
ringbuf after dt_alloc_align: 000000000e3d4ba0
ringbuf before free: 000000000e3d4ba0
ringbuf after dt_alloc_align: 000000000e3d7530
ringbuf before free: 000000000e3d7530
ringbuf after dt_alloc_align: 000000000dfcd940
ringbuf before free: 000000000dfcd940
ringbuf after dt_alloc_align: 0000000023a18430
ringbuf before free: 0000000023a18430
ringbuf after dt_alloc_align: 0000000023f47140
ringbuf before free: 0000000023f47140
ringbuf after dt_alloc_align: 00000000240b81a0
ringbuf before free: 00000000240b81a0
ringbuf after dt_alloc_align: 000000002403cf90
ringbuf before free: 000000002403cf90
ringbuf after dt_alloc_align: 000000000e8d4a00
ringbuf before free: 000000000e8d4a00
ringbuf after dt_alloc_align: 000000000e3d3ea0
ringbuf before free: 000000000e3d3ea0
ringbuf after dt_alloc_align: 000000000e3d5f30
ringbuf before free: 000000000e3d5f30
ringbuf after dt_alloc_align: 0000000027aa0860
ringbuf before free: 0000000027aa0860
ringbuf after dt_alloc_align: 0000000023a19930
ringbuf before free: 0000000023a19930
ringbuf after dt_alloc_align: 0000000023f83be0

<LebedevRI> okay, so why the hell is the last "ringbuf before free: " missing?
<peterbud> No clue, it should be there, as it is before the free() which is supposed to crash
<peterbud> maybe even trying to printf the pointer crashes?
<LebedevRI> can you show the bt with all these printfs?
<peterbud> let me try
<peterbud> but one more info:
<peterbud> I have added another printf() within this cycle: https://github.com/darktable-org/darktable/blob/master/src/common/locallaplacian.c#L146
<peterbud> and while for each ringbuf it was executed 200+ times, at the crash (the last cycle) it ran only 15 times, then crashed

<LebedevRI> did we try increasing sizes of buffers yet?
<LebedevRI> try float *ringbuf = dt_alloc_align(16, sizeof(*ringbuf)*stride*10);
<LebedevRI> and float *rows[10] = {0};
<peterbud> wait, first here is the bt: https://pastebin.com/maBxVjiz
<peterbud> and searching for the error "Critical error detected c0000374", I found this, maybe it can help: https://blogs.msdn.microsoft.com/jiangyue/2010/03/15/windows-heap-overrun-monitoring/
<LebedevRI> try only float *ringbuf = dt_alloc_align(16, sizeof(*ringbuf)*stride*10);
<peterbud> increasing buffer sizes (changing from 5 to 10) is not helping, it still crashes
<peterbud> without the float *rows[10] = {0}; ?

<LebedevRI> try both
<peterbud> it seems to me that the code somehow overwrites a memory area which it should not
<peterbud> I'll try the second version as well
<peterbud> both versions crash
<peterbud> I believe it doesn't matter how big the allocation is, as the error message indicates that the heap block header got corrupted/overwritten
<LebedevRI> even float *rows[10] = {0}; does not help?
<peterbud> None of them
<peterbud> As I read the article, it's getting clearer
<peterbud> we got this error only when we are trying to free the memory
<peterbud> as this is the time, when the heap manager checks the integrity
<peterbud> meaning the memory is being overwritten earlier,
<peterbud> no error happens
<peterbud> but when the code tries to free it, then the heap manager detects the error, and then application crashes
<peterbud> does that make sense?
<LebedevRI> yeah
<LebedevRI> then i don't understand yet what overflows
<peterbud> well, I wish I could help, but this part of the code is still Chinese to me
<peterbud> but it is using a bunch of raw pointers
* LebedevRI is as usual trying to do two things at once
<peterbud> so it could happen easily
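The heap-corruption reading in the chat also fits the observation that enlarging ringbuf did not help: if the real problem is a mismatched deallocator, nothing is overflowing. The mechanism can be illustrated with a minimal hand-rolled aligned allocator that over-allocates and stores the pointer malloc() actually returned just before the aligned address, roughly the approach commonly described for MSVC's _aligned_malloc(). The function names are hypothetical, and this is an illustration, not the CRT's real implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator. `alignment` must be a power of two and at
 * least sizeof(void *), so the stored pointer slot is itself aligned. */
static void *sketch_aligned_malloc(size_t size, size_t alignment)
{
  if(alignment < sizeof(void *) || (alignment & (alignment - 1)) != 0) return NULL;
  const size_t extra = alignment - 1 + sizeof(void *);
  if(size > SIZE_MAX - extra) return NULL;
  unsigned char *raw = malloc(size + extra);
  if(!raw) return NULL;
  /* leave room for one pointer, then round up to the requested alignment */
  uintptr_t addr = (uintptr_t)(raw + sizeof(void *));
  addr = (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);
  ((void **)addr)[-1] = raw; /* remember the real start of the heap block */
  return (void *)addr;
}

/* Matching deallocator: recovers the original block start and frees that. */
static void sketch_aligned_free(void *ptr)
{
  if(ptr) free(((void **)ptr)[-1]);
}

/* Distance between the aligned pointer and the block malloc() returned. It is
 * always at least sizeof(void *), so calling plain free() on the aligned
 * pointer would pass the heap manager an address it never handed out. */
static size_t sketch_offset_from_block_start(void *ptr)
{
  return (size_t)((unsigned char *)ptr - (unsigned char *)((void **)ptr)[-1]);
}
```

Releasing such a pointer through sketch_aligned_free() is safe; passing it to free() is undefined behaviour. Consistent with the chat above, the damage need not show up at the faulty call itself, but only when the heap manager next validates its block headers.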

#4 Updated by Peter Budai about 2 years ago

  • Status changed from Triaged to Fixed
  • % Done changed from 20 to 100

#5 Updated by Roman Lebedev about 2 years ago

  • Target version set to 2.4.0
