Bug #9872

Image import time grows quadratically with the size of the image collection

Added by Pedro Côrte-Real over 5 years ago. Updated over 5 years ago.

Status:
In Progress
Priority:
Low
Category:
General
Target version:
-
Start date:
03/23/2014
Due date:
% Done:

50%

Estimated time:
Affected Version:
1.4.1
System:
Ubuntu
bitness:
32-bit
hardware architecture:
amd64/x86

Description

As discussed on the darktable-users list:

https://www.mail-archive.com/darktable-users@lists.sourceforge.net/msg04204.html
https://www.mail-archive.com/darktable-users@lists.sourceforge.net/msg04205.html

Importing images becomes slower and slower as the database grows. The reason for this seems to be that image.c:dt_image_import() does a SQL SELECT for images with the same filename on each import. Apparently sqlite is doing a full table scan to respond to that SELECT. Adding an index to the filename field in the images table should fix this.

import_timestamps.jpg (28.5 KB) import_timestamps.jpg Tobias Ellinghaus, 03/25/2014 03:10 PM
20140323 DarktableImportStatistics.gnumeric (7.08 KB) 20140323 DarktableImportStatistics.gnumeric Pedro Côrte-Real, 03/25/2014 04:22 PM
FirstRunChart.png (37.9 KB) FirstRunChart.png Pedro Côrte-Real, 03/25/2014 04:22 PM

Associated revisions

Revision b0cc3396 (diff)
Added by Tobias Ellinghaus over 5 years ago

Add an index on images.filename

See bug #9872.

History

#1 Updated by Tobias Ellinghaus over 5 years ago

  • % Done changed from 0 to 50
  • Assignee set to Tobias Ellinghaus
  • Status changed from New to In Progress

I am not sure if an index on filename or on (film_id, filename) would be better, but since we already have one on film_id, and I hope that sqlite is smart, I will just add one on filename. Please report back if you can try it with master once it's in. I don't have a big collection around to test the speed improvement.
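For the filename vs. (film_id, filename) question, a toy sketch (again not darktable's real schema; names are illustrative) shows the trade-off: SQLite can use a composite index for its leftmost-prefix column, but not to seek on filename alone, so a composite index would not speed up a filename-only SELECT:

```python
import sqlite3

# A composite index on (film_id, filename) serves film_id-prefix lookups,
# but a WHERE clause on filename alone still has to visit every row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, film_id INTEGER, filename VARCHAR)")
db.execute("CREATE INDEX images_film_filename ON images (film_id, filename)")

def plan(sql):
    return " ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

by_film = plan("SELECT id FROM images WHERE film_id = 7")
by_both = plan("SELECT id FROM images WHERE film_id = 7 AND filename = 'a.jpg'")
by_name = plan("SELECT id FROM images WHERE filename = 'a.jpg'")
print(by_film)   # SEARCH using the composite index (leftmost prefix)
print(by_both)   # SEARCH using the composite index
print(by_name)   # still a linear SCAN -- cannot seek on filename alone
```

So if the import-time SELECT filters on filename only, a dedicated filename index (as added) is the right fix; the composite form would only help if film_id were also in the WHERE clause.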

#2 Updated by Tobias Ellinghaus over 5 years ago

I added the index to the git master code in b0cc339692e55381bce4ed80b1ff3789c33b306f.

#3 Updated by Pedro Côrte-Real over 5 years ago

As discussed on IRC the slowdown is actually caused by the refresh of the lighttable view on every image import. Tobias is on the case! :)

#4 Updated by Tobias Ellinghaus over 5 years ago

I just wanted to see how bad the situation is and did the following test:
  • I added printf("XXXX %f\n", dt_get_wtime()); to dt_image_import() to get the wall clock time when an image got imported
  • I created 10k JPEG files using convert -size 6000x4000 -compress LZW xc:white 0.jpg; for ((i=1; i<10000; i++)); do cp 0.jpg $i.jpg; done;
  • I imported those into darktable running it as /opt/darktable/bin/darktable | grep XXXX > /tmp/benchmark.txt (/tmp/ is a ramdisk)
  • I imported the resulting file into libreoffice and generated a graph from the times; it's attached

The result looked quite linear so I am not sure why you get quadratic runtimes.

In the "Affected Version" field you said 1.4.1; did you also try it with a build from git master? (I guess the answer is yes, since you tried the indexing commit.)

#5 Updated by Pedro Côrte-Real over 5 years ago

I've attached my chart and gnumeric spreadsheet. I measured with a stopwatch, noting the elapsed time as the interface's counter passed given numbers of images. It's clearly quadratic (it fits a quadratic almost perfectly). This graph was done with 1.4.1, as was the bug report, since that was what I had installed. I tested with master though and the results were the same.

I'm not sure how you imported into darktable, but is it possible the lighttable view update wasn't being run? Were you in map mode, perhaps? Or had you disabled the dt_image_import signal? It could also be that your sqlite is faster than mine for some reason, so the penalty isn't as large. Maybe it's a newer version, or running under a VM makes it much slower for me?
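One way to settle the linear-vs-quadratic question from the logged timestamps themselves, without eyeballing a chart: if the cumulative time t(n) is quadratic, per-image deltas grow linearly and their second differences are a constant positive value; if t(n) is linear, the second differences hover around zero. A sketch with synthetic data (the numbers below are made up, not from either measurement):

```python
# Synthetic illustration: distinguishing quadratic from linear growth in a
# series of cumulative import timestamps via second differences.

def second_differences(times):
    deltas = [b - a for a, b in zip(times, times[1:])]   # per-image cost
    return [b - a for a, b in zip(deltas, deltas[1:])]

quadratic = [0.001 * n * n for n in range(1, 8)]  # e.g. a per-import table scan
linear = [0.1 * n for n in range(1, 8)]           # constant per-image cost

print(second_differences(quadratic))  # roughly constant positive (~0.002)
print(second_differences(linear))     # roughly zero
```

Running this on the raw timestamps from both setups would show directly whether the per-image cost is growing.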
