add thread lock debugging facilities
it seems that import/thumbnail creation speed might be limited by lock contention.
this should be debugged by wrapping all pthread_* functions and types into custom functions and structs.
dt_pthread_lock() would then (if compiled in debug mode) measure the time taken to lock the mutex (i.e. measure directly before and after the pthread_mutex_lock call to get wait times).
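a minimal sketch of what such a wrapper could look like. only the name dt_pthread_lock() comes from the description above; the struct layout, the field names, the DT_DEBUG_LOCKS switch and the other helper functions are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* assumption: the build system would define this in debug builds only */
#ifndef DT_DEBUG_LOCKS
#define DT_DEBUG_LOCKS 1
#endif

/* hypothetical wrapper type: the real mutex plus wait statistics */
typedef struct dt_pthread_mutex_t
{
  pthread_mutex_t mutex;
  double time_sum_wait;     /* total seconds spent waiting in lock() */
  unsigned long lock_count; /* number of successful lock() calls */
} dt_pthread_mutex_t;

/* monotonic wall clock in seconds */
static inline double dt_debug_now(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static inline int dt_pthread_mutex_init(dt_pthread_mutex_t *m)
{
  assert(m != NULL);
  m->time_sum_wait = 0.0;
  m->lock_count = 0;
  return pthread_mutex_init(&m->mutex, NULL);
}

static inline int dt_pthread_lock(dt_pthread_mutex_t *m)
{
#if DT_DEBUG_LOCKS
  /* measure directly before and after the real call to get the wait time */
  const double t0 = dt_debug_now();
  const int ret = pthread_mutex_lock(&m->mutex);
  const double t1 = dt_debug_now();
  if(ret == 0)
  {
    /* the mutex is held here, so updating the counters is race free */
    m->time_sum_wait += t1 - t0;
    m->lock_count++;
  }
  return ret;
#else
  return pthread_mutex_lock(&m->mutex);
#endif
}

static inline int dt_pthread_unlock(dt_pthread_mutex_t *m)
{
  return pthread_mutex_unlock(&m->mutex);
}
```

in a non-debug build the wrapper collapses to the plain pthread_mutex_lock() call, so the timing overhead is only paid when debugging.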
additionally, for each acquired lock, the time it was held and the function name/line number where it was acquired should be stored.
these statistics can then be analysed to find the problem mutexes and the problem functions (those which keep mutexes locked for a long time).
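a rough sketch of how the call site and hold time could be recorded and then analysed. every name here (dt_debug_mutex_t, dt_debug_lock(), dt_debug_worst(), the fields) is hypothetical and not an existing api; the call site is captured with the standard __FILE__, __LINE__ and __func__.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* hypothetical debug mutex: remembers who holds it and for how long */
typedef struct dt_debug_mutex_t
{
  pthread_mutex_t mutex;
  const char *name;       /* label used when reporting */
  double locked_at;       /* time of the current acquisition */
  const char *cur_file, *cur_func;
  int cur_line;
  double time_sum_locked; /* total seconds this mutex was held */
  double max_hold;        /* longest single hold in seconds */
  const char *max_file, *max_func; /* call site of the longest hold */
  int max_line;
} dt_debug_mutex_t;

static inline double dt_debug_time(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static inline int dt_debug_mutex_init(dt_debug_mutex_t *m, const char *name)
{
  assert(m != NULL);
  m->name = name;
  m->locked_at = m->time_sum_locked = m->max_hold = 0.0;
  m->cur_file = m->cur_func = m->max_file = m->max_func = "";
  m->cur_line = m->max_line = 0;
  return pthread_mutex_init(&m->mutex, NULL);
}

static inline int dt_debug_lock_at(dt_debug_mutex_t *m, const char *file,
                                   int line, const char *func)
{
  const int ret = pthread_mutex_lock(&m->mutex);
  if(ret == 0)
  {
    /* we own the mutex now, so these writes cannot race */
    m->locked_at = dt_debug_time();
    m->cur_file = file;
    m->cur_line = line;
    m->cur_func = func;
  }
  return ret;
}

/* records the call site of the lock automatically */
#define dt_debug_lock(m) dt_debug_lock_at((m), __FILE__, __LINE__, __func__)

static inline int dt_debug_unlock(dt_debug_mutex_t *m)
{
  /* update the statistics while the mutex is still held */
  const double hold = dt_debug_time() - m->locked_at;
  m->time_sum_locked += hold;
  if(hold > m->max_hold)
  {
    m->max_hold = hold;
    m->max_file = m->cur_file;
    m->max_line = m->cur_line;
    m->max_func = m->cur_func;
  }
  return pthread_mutex_unlock(&m->mutex);
}

/* returns the mutex with the largest total hold time, or NULL if n == 0.
   reads the counters without locking, so call it after the workers finished. */
static inline dt_debug_mutex_t *dt_debug_worst(dt_debug_mutex_t *const *all, size_t n)
{
  dt_debug_mutex_t *worst = NULL;
  for(size_t i = 0; i < n; i++)
    if(!worst || all[i]->time_sum_locked > worst->time_sum_locked) worst = all[i];
  return worst;
}
```

the mutex returned by dt_debug_worst() points at the problem mutex, and its max_func/max_file/max_line fields at the function that held it longest. this sketch keeps only the single worst call site per mutex; a per-call-site table would be needed to rank all locking functions.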