
Bug #11167

Downscaling without demosaicing: artefacts

Added by Roman Lebedev about 2 years ago. Updated 12 months ago.

Status: In Progress
Priority: Low
Assignee: -
Category: Darkroom
Target version: -
Start date: 09/20/2016
Due date: -
% Done: 50%
Affected Version: git master branch
System: all
bitness: 64-bit
hardware architecture: amd64/x86

Description

New MIP_F code only downscales, preserving mosaic.
While this is awesome, it currently produces somewhat-visible color noise, "confetti", near the edges.
It would be nice to get rid of those artefacts.

See also: https://github.com/darktable-org/darktable/pull/1263#issuecomment-247379812

pistons-3a76ec89e.png - bayer-subsample-bandlimited (117 KB) Dan Torop, 04/21/2017 07:18 PM

pistons-master.png - git master (127 KB) Dan Torop, 04/21/2017 07:18 PM


Related issues

Related to darktable - Bug #11272: Module `demosaic' outputs NaNs! [preview] Fixed 10/29/2016

Associated revisions

Revision 517622ca
Added by Dan Torop about 2 years ago

demosaic: hack to interpolate previews

Right now for previews, roi_out->scale is always 1, hence
full_scale_demosaicing will generally occur. This can produce negative
values in the results of demosaic for images with very dark pixels,
perhaps because the good demosaicing algorithms produce negative values
in response to noisy low-quality downscaled data? Auto exposure will set
a negative black level in order to compensate for these outlying
negative pixels, making the image low-contrast, see #11340.

This change is a total hack which ignores the value of roi_out->scale if
demosaic is working on a preview. Please see #11167, as there are other
problems here -- MIP_F's coming out of demosaic in the preview pipe
should be a smaller 720x450, rather than, as now, approx. 1440x900.

Revision 8273513e
Added by Dan Torop about 2 years ago

demosaic: downscale MIP_Fs

Up until demosaic, work with larger 1-channel images, but as 4-channel
make them fit in same size buffer by halving height/width.

See bug #11167 for description of edge artifacts. This downscaling
lessens the visibility of the artifacts, though doesn't solve them.

Also cleans up the formatting of the downscaling calls in process().

Revision 551f29a6
Added by Dan Torop almost 2 years ago

imageop_math: blur while downsample (x-trans)

For the x-trans int path, do a simple blur while downsampling mosaiced
images. This gets rid of color fringes (confetti). See #11167.

History

#1 Updated by Roman Lebedev about 2 years ago

  • Target version set to 2.2.0

#2 Updated by Roman Lebedev about 2 years ago

  • Related to Bug #11272: Module `demosaic' outputs NaNs! [preview] added

#3 Updated by Dan Torop about 2 years ago

Have been looking at this all-too-sporadically.

As a thought-experiment, I replaced the contents of dt_iop_clip_and_zoom_mosaic_third_size_xtrans() with:

  for(int y = 0; y < roi_out->height; y++)
  {
    const uint16_t *pin = in + in_stride * y;
    uint16_t *pout = out + out_stride * y;
    for(int x = 0; x < roi_out->width; x++, pin++, pout++)
      *pout = *pin;
  }

In other words, instead of downscaling the image, it merely crops it. I still see colored fringes at image edges, which suggests to me that the flaw isn't in the downscaling algorithm but somewhere else (presumably further along) in the pipeline. I'll follow through on that speculation, but wanted to note even this preliminary result.

I've made some fixes to the x-trans downscaling code in https://github.com/dtorop/darktable/tree/xtrans-better-downscale. Mostly to get rid of residual not-useful whitepoint handling and to optimize a bit (it seems slightly faster to not calculate all 3 colors when downscaling if only using one, now that the output is mosaiced). This branch still needs work, and regardless doesn't address the confetti artifacts. I haven't yet implemented sub-sampling as in the Bayer code.

Incidentally, is there any code path which would lead to dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f() getting called?

#4 Updated by Roman Lebedev about 2 years ago

Dan Torop wrote:

Have been looking at this all-too-sporadically.

That is still more than what i'm doing :)

As a thought-experiment, I replaced the contents of dt_iop_clip_and_zoom_mosaic_third_size_xtrans() with:

[...]

In other words, instead of downscaling the image, it merely crops it. I still see colored fringes at image edges, which suggests to me that the flaw isn't in the downscaling algorithm but somewhere else (presumably further along) in the pipeline. I'll follow through on that speculation, but wanted to note even something so preliminary.

Aha, interesting..

I've made some fixes to the x-trans downscaling code in https://github.com/dtorop/darktable/tree/xtrans-better-downscale. Mostly to get rid of residual not-useful whitepoint handling and to optimize a bit (it seems slightly faster to not calculate all 3 colors when downscaling if only using one, now that the output is mosaiced).

Yep, makes sense and looks good except the last line (*outc = (uint16_t)(col[c] / (num * div[c]));), i'm not too sure about integer division here.

This branch still needs work, and regardless doesn't address the confetti artifacts. I haven't yet implemented sub-sampling as in the Bayer code.

Incidentally, is there any code path which would lead to dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f() getting called?

Yes. https://github.com/darktable-org/darktable/blob/a7d0fd4a1dbca5c8da184eee5eb402728257b17e/src/common/mipmap_cache.c#L1061-L1065
Take any X-Trans RAF raw file, select it in lighttable, and click on "create hdr" in the "Selected image(s)" module.
That DNG will still be mosaiced, with the X-Trans mosaic, and with the data stored as floats.
And then import that DNG and open it in darkroom.

Thanks for looking into this!

#5 Updated by Dan Torop about 2 years ago

Good to know about HDR and where to find a float codepath.

And good point about the integer division. Fixed in that branch, too many optimizations piling on each other.

I'm not sure if there'd be a speed-up in the Bayer codepath to only calculate one color per downscale or if the requisite conditionals would slow things back down. I can experiment, but think none of that is as pressing as eliminating the confetti.

More if/as I get any traction on the latter, and thank you for the information and for looking over the code.

#6 Updated by Dan Torop about 2 years ago

Some questions on this (actually from a couple weeks ago, but I got waylaid, and was hoping to answer them on my own):

In mipmap_cache.c _init_f(), since 91d54ac8cfe15e052a745e05012c79ddcb9fab9f, mosaiced MIP_F's start out the pipeline with *2 greater width/height than demosaiced ones. My understanding of the reasoning for this is that while MIP_F's should be no larger than 720x450 in their demosaiced state (same as mip2), in the case of mosaiced Bayer MIP_F's, we start out with them in the pipeline at up to twice that on each dimension (maximum of 1440x900). This is in expectation of a quick downscale of them in demosaic via dt_iop_clip_and_zoom_demosaic_half_size_f() (or for X-Trans dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f()). But I don't actually see that quick downscale happening in demosaic's process(), at least in the x-trans case (which I tested). The modify_roi_in()/modify_roi_out() code in demosaic seems to guarantee that roi_out->scale is always 1, hence full_scale_demosaicing is always true.

And to make things more tricky, as 91d54ac8cfe15e052a745e05012c79ddcb9fab9f notes, the x2 scaling coeff is right for Bayer, but X-Trans still needs work (it should be x3). The catch to making this happen is that _init_f() is dealing with an already allocated buffer which is meant for up to 720x450 4-channel, which turns out to be just fine for up to 1440x900 1-channel. But not workable, of course, for up to 2160x1350 1-channel, which is what X-Trans should want. I don't think dt_mipmap_cache_alloc(), which allocates the buffer, knows enough to understand if it is dealing with an X-Trans image and hence should allocate the extra space.

I'm not clear if any of this actually relates to the confetti edges, nor if two downscales (once in init_f and another in the demosaic iop) could be part of the problem. But throwing all this thinking out while I get back to experimenting, particularly in case it turns out to be a false lead on which someone (Roman?) could easily set me right.

#7 Updated by Johannes Hanika about 2 years ago

oh, you're saying that the mipf buffer got doubled? that may explain some weird behaviour i've been seeing with expensive modules (that preview pipeline took like 4x the amount of time to process the full buffer..). not sure this is a good idea. while memory etc is not an issue with the mosaiced mipf, once it is interpolated this makes the preview pipe run very slowly (and even a 1440x900 buffer seems excessive for the little preview image in the top left corner..). i'd be in favour of making this smaller again.

#8 Updated by Ulrich Pegelow about 2 years ago

Please be very cautious here. Effectively reducing size of the preview buffer can have unexpected negative effects on several modules which depend on it. Namely those that need to see the full image to shape their parameter settings. Ashift is an example: it derives all values for automatic correction by fitting lines that are detected from the preview buffer. Why the preview buffer? The full buffer can be scaled to whatever level and we often only see a minor portion of the full image there. I'd hate to see ashift break days before the release.

#9 Updated by Roman Lebedev about 2 years ago

Johannes Hanika wrote:

oh, you're saying that the mipf buffer got doubled?

The mipf buffer size did not get doubled, but since it no longer contains RGBA but a mosaic, the mipf now covers 4x the image area it used to (i.e. width and height are both 2x).

that may explain some weird behaviour i've been seeing with expensive modules (that preview pipeline took like 4x the amount of time to process the full buffer..). not sure this is a good idea.

Back when i changed it, i did tell you that i changed it

while memory etc is not an issue with the mosaiced mipf, once it is interpolated this makes the preview pipe run very slowly (and even a 1400x buffer seems excessive for the little preview image in the top left corner..).

I think demosaic should ideally use halfsampling for preview, and scale it down.

i'd be in favour of making this smaller again.

Not unless you prefer ugly pixellization of preview.
So at least not before this issue is solved properly.

#10 Updated by Roman Lebedev about 2 years ago

Dan Torop wrote:

Some questions on this (actually from a couple weeks ago, but I got waylaid, and was hoping to answer them on my own):

In mipmap_cache.c _init_f(), since 91d54ac8cfe15e052a745e05012c79ddcb9fab9f, mosaiced MIP_F's start out the pipeline with *2 greater width/height than demosaiced ones.

Right.

My understanding of the reasoning for this is that while MIP_F's should be no larger than 720x450 in their demosaiced state (same as mip2),

True. At least that was the case before my MIPF changes.

in the case of mosaiced Bayer MIP_F's, we start out with them in the pipeline at up to twice that on each dimension (maximum of 1440x900).
This is in expectation of a quick downscale of them in demosaic via dt_iop_clip_and_zoom_demosaic_half_size_f() (or for X-Trans dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f()).

Yes, quick downscale is expected, but the real reason was
1. all that MIPF size was just lying around unused for downscaled mosaic
2. it "fixed" ugly pixellization of preview.

But I don't actually see that quick downscale happening in demosaic's process(), at least in the x-trans case (which I tested). The modify_roi_in()/modify_roi_out() code in demosaic seems to guarantee that roi_out->scale is always 1, hence full_scale_demosaicing is always true.

Again, i do think that demosaic should do halfsampling, but it currently does not do it.
I'll try to have a look, i have an idea.

And to make things more tricky, as 91d54ac8cfe15e052a745e05012c79ddcb9fab9f notes, the x2 scaling coeff is right for Bayer, but X-Trans still needs work (it should be x3).

Wrong commit id maybe?
I got 2x coeff this way:
1. We have MIPF 720x450 x4 channels
2. but we do not output 4 channels
3. which means, we can double each dimension (2*2=4)
4. so we have a MIPF holding an image of size 1440x900 with one channel

There was no other thought behind it, at all.
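
To make that arithmetic concrete, here is a standalone sanity check (my sketch, not darktable code):

  #include <assert.h>

  int main(void)
  {
    /* the 2x coefficient: a 720x450 4-channel MIPF holds exactly as many
       floats as a 1440x900 1-channel mosaic */
    assert(720 * 450 * 4 == 1440 * 900 * 1); /* both are 1,296,000 */
    /* the 3x coefficient X-Trans would want does not fit the same buffer:
       2160 * 1350 * 1 = 2,916,000 > 1,296,000 */
    assert(2160 * 1350 * 1 > 720 * 450 * 4);
    return 0;
  }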

The catch to making this happen is that _init_f() is dealing with an already allocated buffer which is meant for up to 720x450 4-channel, which turns out to be just fine for up to 1440x900 1-channel. But not workable, of course, for up to 2160x1350 1-channel, which is what X-Trans should want.

You have a logic inversion here: i did not choose the size so that stupid half/third sampling could always just happen, it just happens to be true for bayer.

I don't think dt_mipmap_cache_alloc(), which allocates the buffer, knows enough to understand if it is dealing with an X-Trans image and hence should allocate the extra space.

Indeed, MIPF size was meant to stay constant, always, for any content it is to hold.
Ideally, the fix needs to be made not by changing the MIPF size, but by other logic downstream.

I'm not clear if any of this actually relates to the confetti edges, nor if two downscales (once in init_f and another in the demosaic iop) could be part of the problem.

I'm thinking it is somehow caused by the combination of the two downscales, the one in init_f and the one in the demosaic iop.
Else, we would have seen it before, without this smart downsampling.

But throwing all this thinking out while I get back to experimenting, particularly in case it turns out to be a false lead on which someone (Roman?) could easily set me right.

No, i'm pretty sure i'm the one to blame here :D

#11 Updated by Roman Lebedev about 2 years ago

Ulrich Pegelow wrote:

Please be very cautious here. Effectively reducing size of the preview buffer can have unexpected negative effects on several modules which depend on it. Namely those that need to see the full image to shape their parameter settings. Ashift is an example: it derives all values for automatic correction by fitting lines that are detected from the preview buffer. Why the preview buffer? The full buffer can be scaled to whatever level and we often only see a minor portion of the full image there. I'd hate to see ashift break days before the release.

+1. the preview size after demosaic should be no smaller than MIPF.
However, i do think demosaic should use halfsampling and output a smaller buffer than it does currently for preview..

#12 Updated by Roman Lebedev about 2 years ago

Dan Torop wrote:

the x2 scaling coeff is right for Bayer, but X-Trans still needs work (it should be x3).

BTW, due to the unusable borders that are cropped off in rawprepare, the demosaic is not guaranteed to receive that whole 1440x900 image.
E.g. with a canon 5dmk4 sample i just tested, demosaic receives a 1327x890 image.

#13 Updated by Dan Torop about 2 years ago

All very interesting.

Roman Lebedev wrote:

The catch to making this happen is that _init_f() is dealing with an already allocated buffer which is meant for up to 720x450 4-channel, which turns out to be just fine for up to 1440x900 1-channel. But not workable, of course, for up to 2160x1350 1-channel, which is what X-Trans should want.

You have a logic inversion here: i did not choose the size so that stupid half/third sampling could always just happen, it just happens to be true for bayer.

So it sounds like both Bayer and X-Trans mosaiced images should continue to come out of _init_f() at 1440x900 1-channel. This will fully inhabit the buffer dt_mipmap_cache_alloc() allocates, without putting extra logic into the allocation. Then in the demosaic iop, in the case of Bayer, they should be quickly downsampled via dt_iop_clip_and_zoom_demosaic_half_size_f() to 720x450 4-channel, but X-Trans would get a quick downsample via dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f() to 480x300. Assuming that 480x300 gives the rest of the pipe sufficient information?

The pre-demosaic modules in the pipeline will run a bit slower with a 1440x900 image than they did prior to the PR 1263 MIP_F revamp, when they dealt with 720x450 4-channel images, but the overall improvement is worth it?

BTW, due to the unusable borders that are cropped off in rawprepare, the demosaic is not guaranteed to receive that whole 1440x900 image.
E.g. with a canon 5dmk4 sample i just tested, demosaic receives a 1327x890 image.

Definitely, and even before rawprepare, due to aspect ratio, e.g. for an X-Trans image I'm seeing _init_f() producing 1363x900 output.

#14 Updated by Roman Lebedev about 2 years ago

Dan Torop wrote:

All very interesting.

Roman Lebedev wrote:

The catch to making this happen is that _init_f() is dealing with an already allocated buffer which is meant for up to 720x450 4-channel, which turns out to be just fine for up to 1440x900 1-channel. But not workable, of course, for up to 2160x1350 1-channel, which is what X-Trans should want.

You have a logic inversion here: i did not choose the size so that stupid half/third sampling could always just happen, it just happens to be true for bayer.

So it sounds like both Bayer and X-Trans mosaiced images should continue to come out of _init_f() at 1440x900 1-channel. This will fully inhabit the buffer dt_mipmap_cache_alloc() allocates, without putting extra logic into the allocation.

Yes.

Then in the demosaic iop, in the case of Bayer, they should be quickly downsampled via dt_iop_clip_and_zoom_demosaic_half_size_f() to 720x450 4-channel, but X-Trans would get a quick downsample via dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f() to 480x300. Assuming that 480x300 gives the rest of the pipe sufficient information?

Actually now that i think about it, no.
The preview pipe, at the end, should produce an image with the size of MIPF, i.e. 720x450, not "demosaic should output a MIPF-sized buffer".
Because there may be some modules after demosaic that change the size of the processed buffer (lens, borders, cropping).

Whether demosaic can/can't do fast half/third-sampling is not very important.
Even if it always does a full demosaic, it will still be faster than what currently happens - the rest of the preview pipe is processing a ~1440x900 image.

The pre-demosaic modules in the pipeline will run a bit slower with a 1440x900 image than they did prior to the PR 1263 MIP_F revamp, when they dealt with 720x450 4-channel images, but the overall improvement is worth it?

Most of the modules that the user interacts with are located after demosaic, so this shouldn't be too much of a performance hit.

Also, if all goes well, i will add an inpainting module working on mosaic data, and for that you definitely want as much resolution as you can get, as long as it does not come at too much of a cost.

BTW, due to the unusable borders that are cropped off in rawprepare, the demosaic is not guaranteed to receive that whole 1440x900 image.
E.g. with a canon 5dmk4 sample i just tested, demosaic receives a 1327x890 image.

Definitely, and even before rawprepare, due to aspect ratio, e.g. for an X-Trans image I'm seeing _init_f() producing 1363x900 output.


I have looked at making demosaic output a smaller image for the preview pipe. Right now it results in even more artefacts.
So i'm proposing to not think about decreasing the preview pipe image size, but to first fix the artefacts.

#15 Updated by Dan Torop about 2 years ago

Roman Lebedev wrote:

The preview pipe, at the end, should produce an image with the size of MIPF, i.e. 720x450, not "demosaic should output a MIPF-sized buffer".
Because there may be some modules after demosaic that change the size of the processed buffer (lens, borders, cropping).

This makes sense.

Whether demosaic can/can't do fast half/third-sampling is not very important.

Indeed in current master it is doing slow (full) demosaicing.

Even if it always does a full demosaic, it will still be faster than what currently happens - the rest of the preview pipe is processing a ~1440x900 image.

I have looked at making demosaic output a smaller image for the preview pipe. Right now it results in even more artefacts.
So i'm proposing to not think about decreasing the preview pipe image size, but to first fix the artefacts.

Yes, artifacts are the worrying part. If even a full demosaic as in the current master branch produces them, I worry that there is still some other cause for them?

#16 Updated by Roman Lebedev about 2 years ago

Dan Torop wrote:

I have looked at making demosaic output a smaller image for the preview pipe. Right now it results in even more artefacts.
So i'm proposing to not think about decreasing the preview pipe image size, but to first fix the artefacts.

Yes, artifacts are the worrying part. If even a full demosaic as in the current master branch produces them, I worry that there is still some other cause for them?

[22:49:46] <hanatos> also, super weird that we get these massive coloured fringes in the mipf demosaicing
[22:49:55] <hanatos> must be some stupid signal sub-sampling
[22:50:05] <hanatos> if i find some time i shall play with more blurring here :)
[22:50:16] <hanatos> probably needs some blur on green and red-green and blue-green

Though, #11340 seems to be not another manifestation of this issue, but a proper issue on its own.

#17 Updated by Dan Torop about 2 years ago

I created PR #1382 at least for the sake of discussion. One argument is that the artifacts are so notable because viewing the preview in place of the full image at 100% is essentially pixel-peeping, and all the flaws of the quickly rendered preview are visible. By forcing a quick downscale of the preview by 50% during demosaic, some of the flaws are less visible (plus the rest of the pipeline should run faster). I believe that this matches the preview resolution as it was prior to the MIP_F revamps, but haven't tested this.

I still need to work on sub-sampling.

#18 Updated by Roman Lebedev almost 2 years ago

  • Target version changed from 2.2.0 to 2.4.0

#19 Updated by Dan Torop almost 2 years ago

I updated https://github.com/dtorop/darktable/commits/xtrans-better-downscale such that for X-Trans demosaicing, there are no longer edge artifacts -- though on the other hand, the results are more blurry. I also rebased onto current master.

Ideas behind these changes:

  1. Prior to the new MIP_F code, subsampling was useful in order to simultaneously downscale and demosaic at reasonable quality. As dt_iop_clip_and_zoom_mosaic_third_size_xtrans() and ilk now only downsample, they can be vastly simpler. They no longer need to align to the CFA matrix, with special cases for subsampling edges which don't align to that matrix. This should be true for both the Bayer and X-Trans cases.
  2. The confetti at edges may be aliasing which is the result of naively downsampling/decimating the mosaiced data. Hence it makes sense to naively anti-alias the results. I do this by expanding the dimensions of the downsampled block by 2/3 (a number fairly naively chosen by experimentation). I experimented briefly with anti-aliasing the green channel less, with the argument that it was higher frequency, but couldn't produce better results. Someone with a proper understanding of signal processing could do better?

Some of PR 1382 still seems useful (forcing quick demosaic of previews in the demosaic iop), but I haven't yet integrated this here.

If these changes seem sensible, I'd be interested to apply them to dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f() as well as the various Bayer downscaling paths. That, plus integrating what is good about 1382, could make for a healthy improvement.

#20 Updated by Roman Lebedev almost 2 years ago

  • % Done changed from 0 to 50
  • Status changed from New to In Progress

Dan Torop wrote:

I updated https://github.com/dtorop/darktable/commits/xtrans-better-downscale such that for X-Trans demosaicing, there are no longer edge artifacts -- though on the other hand, the results are more blurry. I also rebased onto current master.

Hi!
Very nice!

Ideas behind these changes:

  1. Prior to the new MIP_F code, subsampling was useful in order to simultaneously downscale and demosaic at reasonable quality. As dt_iop_clip_and_zoom_mosaic_third_size_xtrans() and ilk now only downsample, they can be vastly simpler. They no longer need to align to the CFA matrix, with special cases for subsampling edges which don't align to that matrix. This should be true for both the Bayer and X-Trans cases.
  2. The confetti at edges may be aliasing which is the result of naively downsampling/decimating the mosaiced data. Hence it makes sense to naively anti-alias the results. I do this by expanding the dimensions of the downsampled block by 2/3 (a number fairly naively chosen by experimentation). I experimented briefly with anti-aliasing the green channel less, with the argument that it was higher frequency, but couldn't produce better results. Someone with a proper understanding of signal processing could do better?

So i have pushed everything but the last commit, which should go in as a new pr; i'd like jo to take a look at it.
And i kept the output division as float.

That last commit does seem to work, very good.

Some of PR 1382 still seems useful (forcing quick demosaic of previews in the demosaic iop), but I haven't yet integrated this here.

If these changes seem sensible, I'd be interested to apply them to dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f() as well as the various Bayer downscaling paths. That, plus integrating what is good about 1382, could make for a healthy improvement.

Please PR just that last commit, and let's discuss applying similar fixes to other codepaths in that pr.

#21 Updated by Dan Torop almost 2 years ago

Roman Lebedev wrote:

So i have pushed everything but the last commit, which should go in as a new pr; i'd like jo to take a look at it.
And i kept the output division as float.

Good point regarding output division. In this case, is *outc = (uint16_t)((float)col / (float)num); just a slower and more verbose version of *outc = col / num;? If so, I will change this appropriately.

That last commit does seem to work, very good.

Please PR just that last commit, and let's discuss applying similar fixes to other codepaths in that pr.

Great! I created PR 1397 as suggested.

#22 Updated by Dan Torop almost 2 years ago

I've got another piece of the puzzle, and alas it turns out that, of my recent PRs, PR 1397 helped with the X-Trans artifacts but PR 1382 made X-Trans and Bayer previews needlessly blurry. The problem is that the quick downscaling routines in imageop_math are no longer appropriate for MIP_Fs:

Since the MIP_F revamp (PR 1263), MIP_Fs are not downscaled in the demosaic iop but rather in _init_f() (which for mosaiced MIP_Fs scales them to a bit smaller than 1440x900). PR 1382 makes the demosaic iop always perform a quick downscale (via dt_iop_clip_and_zoom_demosaic_half_size_f() or dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f()) for MIP_Fs. But these functions downscale and demosaic images by merging CFA cells into single pixels, which only works if the scale is <1/2 or <1/3 (Bayer/X-Trans). As the demosaic iop now always uses a scale of 1 for previews (MIP_Fs), the result is terribly blurry previews (especially for X-Trans).

I think a better solution would be to perform the fastest possible full demosaic in the case of MIP_Fs. Even with a full demosaic, the demosaic iop is relatively fast for previews, e.g. in a quick test on a laptop, something like 0.08 sec. for a decent full demosaic vs. 0.04 sec. for a (flawed) downsample.

Another possible fix is to go back to downscaling MIP_Fs in the demosaic iop (by 1/2 for Bayer, 1/3 for X-Trans), though as Ulrich Pegelow points out, this may cause routines later in the pipeline which depend on preview data to produce worse results.

Another PR forthcoming to try to fix this. Apologies for the wrong call in PR 1382.

#23 Updated by Dan Torop almost 2 years ago

It looks like the best solution is just to use the same demosaic method in the preview pipeline as the full pipeline would use. Anything worse looks lousy, and the demosaic only needs to be done once per preview (unless something changes at or before demosaic in the pipeline).

#24 Updated by Dan Torop almost 2 years ago

I roughed out a version of the Bayer downscale which:

  1. linearly interpolates to produce a 4-channel image
  2. performs a Gaussian blur on each channel
  3. samples the blurred image at intervals to produce a 1-channel downsampled mosaic

The good news is that there are fewer (no?) artifacts, presumably due to the Gaussian blur bandlimiting the image. The result also just plain looks good, like a properly subsampled image. I'm using src/common/gaussian.c with sigma of px_footprint (1/roi_out->scale) and DT_IOP_GAUSSIAN_ZERO.
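
Taken together, the three steps amount to a Gaussian-weighted average of same-color sites for each output mosaic position. A compact sketch of that combined behavior (hypothetical code, not the branch itself; it assumes an RGGB pattern and ignores SSE, CFA offsets and whitelevel handling, where the branch uses src/common/gaussian.c and separate steps):

  #include <math.h>

  static int fc_rggb(int row, int col) /* CFA color at (row,col): 0=R, 1=G, 2=B */
  {
    return (row & 1) ? ((col & 1) ? 2 : 1) : ((col & 1) ? 1 : 0);
  }

  void downscale_mosaic_bandlimited(const float *in, int iw, int ih,
                                    float *out, int ow, int oh)
  {
    const float px_footprint = (float)iw / ow; /* = 1/roi_out->scale */
    const float sigma = px_footprint;
    const int r = (int)ceilf(3.0f * sigma);    /* Gaussian kernel radius */

    for(int oy = 0; oy < oh; oy++)
      for(int ox = 0; ox < ow; ox++)
      {
        const int c = fc_rggb(oy, ox); /* the output stays mosaiced */
        const float cy = (oy + 0.5f) * px_footprint - 0.5f;
        const float cx = (ox + 0.5f) * px_footprint - 0.5f;
        float sum = 0.0f, wsum = 0.0f;
        /* gather every same-color input pixel under the Gaussian footprint */
        for(int y = (int)cy - r; y <= (int)cy + r; y++)
          for(int x = (int)cx - r; x <= (int)cx + r; x++)
          {
            if(y < 0 || y >= ih || x < 0 || x >= iw) continue;
            if(fc_rggb(y, x) != c) continue;
            const float d2 = (y - cy) * (y - cy) + (x - cx) * (x - cx);
            const float w = expf(-d2 / (2.0f * sigma * sigma));
            sum += w * in[(long)y * iw + x];
            wsum += w;
          }
        out[(long)oy * ow + ox] = wsum > 0.0f ? sum / wsum : 0.0f;
      }
  }

The nested gather also makes clear why the cost grows with the downscale factor: the kernel radius scales with px_footprint.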

The bad news is that it's painfully slow, and allocates a bunch of memory. To eliminate a lot of wasted processing work, I'll try to collapse the steps. I'll also experiment with whether Gaussian blur is the best choice. I haven't checked if the bandlimited results address #11340.

Also, though edge artifacts appear to be gone, blown out highlights are notably magenta -- unlike the same image processed via the full pipeline. More as I sort this out.

#25 Updated by Roman Lebedev almost 2 years ago

Dan Torop wrote:

I roughed out a version of the Bayer downscale which:

  1. linearly interpolates to produce a 4-channel image
  2. performs a Gaussian blur on each channel
  3. samples the blurred image at intervals to produce a 1-channel downsampled mosaic

Nice, could you please push just that as a branch, that is exactly what i think i need for highlight inpainting.
I have (had) a kind-of-working code, but variety is good :)

Also, though edge artifacts appear to be gone, blown out highlights are notably magenta -- unlike the same image processed via the full pipeline. More as I sort this out.

That is totally expected, and totally unacceptable. The problem is, if you average a pixel slightly above clipping threshold (1.0, but here the threshold will be different per-channel, say 15000), and a pixel below threshold, the result will be below clipping threshold (thus highlight clipping won't touch it), but it also will be wrong, as you can see.
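
A tiny worked example of that failure mode (numbers invented for illustration):

  #include <stdio.h>

  int main(void)
  {
    const float whitelevel = 15000.0f; /* per-channel clipping threshold */
    const float clipped    = 15000.0f; /* saturated sensel; true value unknown, >= 15000 */
    const float unclipped  =  9000.0f; /* same-color neighbour below the threshold */
    const float averaged   = (clipped + unclipped) / 2.0f; /* = 12000 */
    /* 12000 < 15000, so the highlights iop no longer detects this pixel as
       clipped, yet its value is garbage: the clipped channel is now
       underestimated, which is what tints blown-out areas magenta. */
    printf("averaged = %g, detected as clipped: %s\n",
           averaged, averaged >= whitelevel ? "yes" : "no");
    return 0;
  }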

#26 Updated by Dan Torop almost 2 years ago

Nice, could you please push just that as a branch, that is exactly what i think i need for highlight inpainting.
I have (had) a kind-of-working code, but variety is good :)

I put the branch so far up at https://github.com/dtorop/darktable/tree/bayer-subsample-bandlimited. It's a pretty messy bunch of commits, and as noted above, a naive/wasteful implementation (very proof-of-concept). I am certainly curious about any thoughts on the results, and about instances where it fails.

For a next step, it seems possible to combine the interpolation and the Gaussian blur. That is, for every output pixel, use a lookup to find the input pixels and weights which would produce an interpolated/blurred result.
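
A sketch of how such a lookup could be precomputed (hypothetical code; it assumes an RGGB mosaic and an integer scale factor S, so the tap pattern only depends on the output pixel's 2x2 phase):

  #include <math.h>

  typedef struct { int dx, dy; float w; } tap_t;

  /* collect same-color taps and Gaussian weights for the output site at
     phase (phase_y, phase_x), each in {0,1}; taps must be sized by the caller */
  static int build_taps(int phase_y, int phase_x, int S, float sigma, tap_t *taps)
  {
    const int r = (int)ceilf(3.0f * sigma);
    int n = 0;
    for(int dy = -r; dy <= r; dy++)
      for(int dx = -r; dx <= r; dx++)
      {
        /* in an RGGB mosaic, colors match iff row and column parities match */
        const int in_y = phase_y * S + dy, in_x = phase_x * S + dx;
        if(((in_y & 1) == (phase_y & 1)) && ((in_x & 1) == (phase_x & 1)))
          taps[n++] = (tap_t){ dx, dy,
                               expf(-(float)(dx * dx + dy * dy)
                                    / (2.0f * sigma * sigma)) };
      }
    return n; /* weights get normalized at apply time */
  }

At apply time each output pixel is then a single weighted sum over its phase's tap list, instead of an interpolation pass followed by a blur pass.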

I'm not 100% sure that the linear interpolation is the right way to go -- it seems a bit baroque to perform what is essentially a sloppy demosaic of the image, just to bandlimit it, but I don't know a way otherwise to bandlimit just one color of a mosaic. I'm also curious if the linear interpolation is good enough for this purpose (in quick tests it does seem good, though). Or, indeed, if it is overkill and some sort of nearest-neighbor would be good enough (and faster!). Also, hanatos suggested working on R-G and B-G planes, which could be interesting...

Also, though edge artifacts appear to be gone, blown out highlights are notably magenta -- unlike the same image processed via the full pipeline. More as I sort this out.

That is totally expected, and totally unacceptable. The problem is, if you average a pixel slightly above clipping threshold (1.0, but here the threshold will be different per-channel, say 15000), and a pixel below threshold, the result will be below clipping threshold (thus highlight clipping won't touch it), but it also will be wrong, as you can see.

Interesting, that makes sense. That's helpful in figuring out what to fix, though I don't have a great idea yet for a solution...

#27 Updated by Dan Torop almost 2 years ago

An in-progress update on this:

I rolled my own Gaussian blur code, and the code now interpolates as it bandlimits and downsamples. Things are edging beyond proof-of-concept, though not ready for PR/review. The in-progress code (Bayer non-SSE only) is at https://github.com/dtorop/darktable/tree/bayer-subsample-bandlimited.

Good news:
  • Recent changes get rid of a big intermediary buffer and a bunch of code, and generally speed things up.
  • The results continue to be a vast improvement (preview way more smooth, preview in 100% view almost entirely without artifacts), though I think I may have introduced a bug in the last few versions as I'm seeing a bit of purple/green fringing on fine detail again.
Bad news:
  • It is still substantially slower (4x-9x slower compared to the pre v2.2.2 non-SSE path!). As a worst case example, for a 50MP image, on my 2012-era laptop, the pre v2.2.2 non-SSE downscale path takes 0.1 sec, vs. about 0.8 sec. for my branch. A 16MP image required 0.07 sec. but now takes nearly 0.3 sec. This is principally because the Gaussian blur kernel is pretty wide, and the more the downscale, the wider the kernel needs to be. I think it would take a new approach to reach a speed near that of the prior code.
  • I still haven't dealt with the blown-out highlights turning magenta, though that'll be the next thing at which I look.
  • The downscale code allocates an intermediary buffer, principally to store roi_in->width * roi_out->height 1-channel float pixels. There's probably a way to make this smaller (perhaps even roi_in->height * kernel_width) at the cost of more complexity.

#28 Updated by Roman Lebedev almost 2 years ago

Dan Torop wrote:

An in-progress update on this:

I rolled my own Gaussian blur code, and the code now interpolates as it bandlimits and downsamples. Things are edging beyond proof-of-concept, though not ready for PR/review. The in-progress code (Bayer non-SSE only) is at https://github.com/dtorop/darktable/tree/bayer-subsample-bandlimited.

Good news:
  • Recent changes get rid of a big intermediary buffer and a bunch of code, and generally speed things up.
  • The results continue to be a vast improvement (preview way more smooth, preview in 100% view almost entirely without artifacts), though I think I may have introduced a bug in the last few versions as I'm seeing a bit of purple/green fringing on fine detail again.

Cool!

Bad news:
  • It is still substantially slower (4x-9x slower compared to the pre v2.2.2 non-SSE path!). As a worst case example, for a 50MP image, on my 2012-era laptop, the pre v2.2.2 non-SSE downscale path takes 0.1 sec, vs. about 0.8 sec. for my branch. A 16MP image required 0.07 sec. but now takes nearly 0.3 sec. This is principally because the Gaussian blur kernel is pretty wide, and the more the downscale, the wider the kernel needs to be. I think it would take a new approach to reach a speed near that of the prior code.
  • I still haven't dealt with the blown-out highlights turning magenta, though that'll be the next thing at which I look.

I'm not quite sure how this could possibly be done.
At that point, the whitelevel (i.e. the threshold used for detection of these clipped/garbage highlights) is simply not known.
And it can't really be known. Only the rawprepare iop knows about it.
And it can be changed by the user. And then if the user changes it, the thumbnail will be wrong.
Which was the very reason why i removed demosaicing from MIP_F generation...

This is not what you wanted to hear, i know :)

  • The downscale code allocates an intermediary buffer, principally to store roi_in->width * roi_out->height 1-channel float pixels. There's probably a way to make this smaller (perhaps even roi_in->height * kernel_width) at the cost of more complexity.

#29 Updated by Dan Torop almost 2 years ago

I still haven't dealt with the blown-out highlights turning magenta, though that'll be the next thing at which I look.

I'm not quite sure how this could possibly be done.
At that point, the whitelevel (i.e. the threshold used for detection of these clipped/garbage highlights) is simply not known.
And it can't really be known. Only the rawprepare iop knows about it.
And it can be changed by the user. And then if the user changes it, the thumbnail will be wrong.
Which was the very reason why i removed demosaicing from MIP_F generation...

This is not what you wanted to hear, i know :)

Indeed, but makes sense. It does seem possible to use some heuristic, for example if the pixel brightness is >=90% of the maximum brightness for that channel, to use an interpolated value for it rather than a blurred one. This gets rid of big magenta areas (at least in my one test image), though the edges of blown out areas can still turn magenta... Will look it over...

#30 Updated by Dan Torop almost 2 years ago

Another thought is to make the blur adaptive/selective, so that it only goes up to its full strength in more contrasty areas. Hence it should still bandlimit the image, but there wouldn't be a lot of CPU time wasted using a big kernel on softer areas of the image, and at least some blown out areas wouldn't get blurred into their surroundings.

This is getting into subjective territory, of course...

#31 Updated by Dan Torop over 1 year ago

I still haven't dealt with the blown-out highlights turning magenta

At that point, the whitelevel (i.e. the threshold used for detection of these clipped/garbage highlights) is simply not known.
And it can't really be known. Only the rawprepare iop knows about it.
And it can be changed by the user. And then if the user changes it, the thumbnail will be wrong.

Perhaps the solution isn't to have the downscale anticipate the needs of later iops, but rather to have it pass on to later iops the information otherwise lost by the downscale. I think it would be enough if it not only generated a nicely downscaled output, but also, for each output pixel, the maximum of the source pixels of that color.

This would double the needed output buffer size, of course... (Or reduce preview resolution.) And I don't have a sense right away of how it would work with the caching setup. And I do worry that it would require either another codepath or an awkward conditional for the highlights iop.

#32 Updated by Dan Torop over 1 year ago

have the downscale anticipate the needs of later iops ... have it pass on necessary information otherwise lost by downscale to later iops

I tried to play this out, without the best results. I quadrupled the size of the buffer allocated by dt_mipmap_cache_init() and caches for all iops prior to highlights. In the extra space, for each downscaled pixel, I also stored the nearest useful input pixel and the min/max from the source pixels of the blur. (Minimum is necessary in case the invert iop is enabled.) Each iop prior to highlights processes the orig/min/max as they do normal data. Then the highlights iop, in the case that the corresponding maximum for a pixel is > clip, knows that the pixel needs extra reconstruction work.
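
For reference, the per-pixel payload described above could look something like this (a hypothetical layout, not the actual experimental code):

  /* hypothetical 4-float record per downscaled mosaic site, so that iops
     up to highlights can carry clipping information forward */
  typedef struct
  {
    float blurred; /* bandlimited, downscaled value (what the pipe uses) */
    float orig;    /* nearest useful source pixel, un-blurred */
    float min;     /* minimum under the blur footprint (for the invert iop) */
    float max;     /* maximum under the footprint; if > clip, the highlights
                      iop knows the pixel needs reconstruction */
  } mipf_px_t;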

Sadly, that's where it all falls apart. Just plugging in the original pixel in the place of the blurred pixel gets back to nasty aliasing noise. This is especially bad as the problem pixels tend to be at the edge of highlit areas, and hence already prone to aliasing. I haven't been able to figure out any algorithm which, given an original pixel, downsampled data, and the minimum/maximum of the source data for the downsample, can handle blown out areas as if the source data had been properly clipped.

Which is another way of saying that good highlight reconstruction may need to happen prior to downsampling.

Possibilities of where to go from here:

  1. Leave code as in master, essentially a box filter downsample, and expand to other cases (float, SSE). Code in master is a bit noisy in fine detail and does produce magenta edges on blown out highlights.
  2. Move on to the Gaussian downsample code which I've been working on, which produces smoother output, but produces worse magenta edges and is substantially slower and more complicated.
  3. Do something tricky with the pixel pipe: either a way that highlights can backtrack and reprocess the original image in the case of blown out highlights, or can somehow reverse transform whitelevel back to the downscale code. Or make the downscale code know something about the work of later iops (rawprepare to highlights) so it can do some clipping work on downscale. This is all pretty messy!
  4. In the case of previews, don't downscale them until after the highlights iop. Which acknowledges that reconstructing highlights of blurred downscaled data isn't a solved problem.

I'm curious to play out the fourth possibility. Could MIP_F processing parallel that of MIP_FULL through the highlights iop? Such that the early stages would essentially be NOPs, and then post-highlights the MIP_FULL results would be requested from cache then downscaled? In the case of a preview in darkroom mode, the MIP_FULL should already have been generated by the time the preview is being processed, so this should be a low cost operation. If a MIP_F is being used as a source for a thumbnail, this would be a bit more processor/memory intensive, but hard to imagine much worse than the sort of contortions which I've been having the code perform.

For reference, I'm attaching screenshots of a portion of a preview using the code in git master and from my bayer-subsample-bandlimited branch.

#33 Updated by Roman Lebedev over 1 year ago

Dan Torop wrote:

have the downscale anticipate the needs of later iops ... have it pass on necessary information otherwise lost by downscale to later iops

I tried to play this out, without the best results. I quadrupled the size of the buffer allocated by dt_mipmap_cache_init() and caches for all iops prior to highlights. In the extra space, for each downscaled pixel, I also stored the nearest useful input pixel and the min/max from the source pixels of the blur. (Minimum is necessary in case the invert iop is enabled.) Each iop prior to highlights processes the orig/min/max as they do normal data. Then the highlights iop, in the case that the corresponding maximum for a pixel is > clip, knows that the pixel needs extra reconstruction work.

Sadly, that's where it all falls apart. Just plugging in the original pixel in the place of the blurred pixel gets back to nasty aliasing noise. This is especially bad as the problem pixels tend to be at the edge of highlit areas, and hence already prone to aliasing. I haven't been able to figure out any algorithm which, given an original pixel, downsampled data, and the minimum/maximum of the source data for the downsample, can handle blown out areas as if the source data had been properly clipped.

Fascinating, thank you for looking into this!

Which is another way of saying that good highlight reconstruction may need to happen prior to downsampling.

Just to put it all in context, i "un-demosaiced" the preview pipe so that the highlight reconstruction module in the preview pipe is functional, and receives the same unclipped data as the main view. So any highlight tampering will pretty much null and void that reasoning.
Why is it important? Well, there were three reasons: first, i wanted to be able to apply white balance before demosaic. Second, i wanted to finally get rid of the "stuck" preview (when you go from normal demosaic mode to passthrough, the old thumb would still be in cache..). And most importantly, i'm still eyeing the idea of a proper highlight inpainting module. And for it to work, the preview pipe MUST have the same, untampered, highlight information as the main view...

Possibilities of where to go from here:

  1. Leave code as in master, essentially a box filter downsample, and expand to other cases (float, SSE). Code in master is a bit noisy in fine detail and does produce magenta edges on blown out highlights.

I must say, i'm beginning to think this may be the answer, at least for the time being (i.e. to be revisited after some time, when new knowledge is gained.)

  2. Move on to the Gaussian downsample code which I've been working on, which produces smoother output, but produces worse magenta edges and is substantially slower and more complicated.

I would love to see what jo was talking about - some kind of per-channel cfa-aware blurring.

  3. Do something tricky with the pixel pipe: either a way that highlights can backtrack and reprocess the original image in the case of blown out highlights, or can somehow reverse transform whitelevel back to the downscale code. Or make the downscale code know something about the work of later iops (rawprepare to highlights) so it can do some clipping work on downscale. This is all pretty messy!
  4. In the case of previews, don't downscale them until after the highlights iop. Which acknowledges that reconstructing highlights of blurred downscaled data isn't a solved problem.

I'm curious to play out the fourth possibility. Could MIP_F processing parallel that of MIP_FULL through the highlights iop? Such that the early stages would essentially be NOPs, and then post-highlights the MIP_FULL results would be requested from cache then downscaled?

In the case of a preview in darkroom mode, the MIP_FULL should already have been generated by the time the preview is being processed, so this should be a low cost operation.

I'm not too sure that is a safe assumption to make, in light of those recent upegelow opencl changes.
Also, do note that MIP_FULL likely contains a cropped image when you zoom in in darkroom.

If a MIP_F is being used as a source for a thumbnail, this would be a bit more processor/memory intensive, but hard to imagine much worse than the sort of contortions which I've been having the code perform.

For reference, I'm attaching screenshots of a portion of a preview using the code in git master and from my bayer-subsample-bandlimited branch.

#34 Updated by Dan Torop over 1 year ago

Fascinating, thank you for looking into this!

Just to put it all in context, i "un-demosaiced" the preview pipe so that...

And thank you for the thoughts/perspective... It's an interesting problem.

1. Leave code as in master, essentially a box filter downsample, and expand to other cases (float, SSE). Code in master is a bit noisy in fine detail and does produce magenta edges on blown out highlights.

I must say, i'm beginning to think this may be the answer, at least for the time being (i.e. to be revisited after some time, when new knowledge is gained.)

Certainly this would be fast. And as the preview is viewed small most of the time, and only fleetingly large in the main view, for most images this would be good enough.

I would love to see what jo was talking about - some kind of per-channel cfa-aware blurring.

I never figured this out, but am curious as well, and will keep on thinking about it. The Gaussian variant isn't bad, being pretty ideal for eliminating artifacts. It is a bit of a hack to make it understand a mosaiced image.

In the case of a preview in darkroom mode, the MIP_FULL should already have been generated by the time the preview is being processed, so this should be a low cost operation.

I'm not too sure that is a safe assumption to make, in light of those recent upegelow opencl changes.
Also, do note that MIP_FULL likely contains a cropped image when you zoom in in darkroom.

Good point, I was forgetting all of this. In the case of a fit-to-screen main view, perhaps it would still be possible to borrow work from the full pipeline up to the point of demosaic? I'm curious to experiment with this, even in an unoptimized form. Downscaling previews after highlights (or during demosaic?) seems like the best quality solution.

#35 Updated by Dan Torop over 1 year ago

Still need to get back to this.

It does seem like for now a slightly better box filtered downsample is the way to go, with SSE and X-Trans implementations. I think the current code could be improved a bit by taking advantage of the separable nature of a box filter. That should make it faster, at the cost of allocating a temporary buffer.

It also bothers me that the start/end points of the sample are on pixel boundaries of the input image. The first and last pixels should be weighted by their fractional coverage. I'd want to see if that would make the output cleaner without slowing things down.
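
A 1-D sketch of that weighting (my sketch, not the master-branch routine; a mosaic-aware version would additionally have to restrict each window to same-color sites):

  #include <math.h>

  /* box-resample one row, weighting the first/last input pixels of each
     sample window by fractional coverage (assumes n_out <= n_in) */
  void box_resample_1d(const float *in, int n_in, float *out, int n_out)
  {
    const float step = (float)n_in / n_out;
    for(int o = 0; o < n_out; o++)
    {
      const float lo = o * step, hi = lo + step;
      float sum = 0.0f, wsum = 0.0f;
      for(int i = (int)lo; i < (int)ceilf(hi) && i < n_in; i++)
      {
        /* overlap of input pixel [i, i+1) with the sample window [lo, hi) */
        const float w = fminf(hi, i + 1.0f) - fmaxf(lo, (float)i);
        sum += w * in[i];
        wsum += w;
      }
      out[o] = wsum > 0.0f ? sum / wsum : 0.0f;
    }
  }

Run per-row and then per-column, this also gives the separable two-pass variant, with an intermediate buffer of the roi_in->width * roi_out->height sort mentioned in #27.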

#36 Updated by Roman Lebedev 12 months ago

  • Target version changed from 2.4.0 to 2.6.0
