
Wednesday 27 May 2015

PSA: Use nautilus (GTK) in Kubuntu if Dolphin crashes

So, for some reason my (brand new) Kubuntu install is not very stable. KDE tends to crash randomly whenever I send a large-ish file to the recycling bin. Dunno why; it looks like some threading + caching issue in Dolphin, and I couldn't find a solution anywhere on the web. Well, there are two workarounds:

1. Don't use the recycling bin. This sort of breaks my workflow, so I prefer to:
2. Use nautilus.

You'll have to install GTK packages, but that's a small price to pay to have KDE not crash every couple of minutes.
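
For reference, on a stock Kubuntu something along these lines should pull in nautilus together with the GTK bits it needs (the exact dependency set varies by release):

sudo apt-get install nautilus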

Thursday 7 May 2015

Vim tip: Stop escaping slashes

If you do a lot of search & replace in Vim, eventually you'll end up escaping a lot of slashes. Whenever you have to replace a path, for example. Isn't that annoying? After a couple of directory levels you end up with a horrible "\/path\/to\/foo\/bar" pattern. And if you forget to escape one slash, good luck. It's better to scrap the whole thing and start over.

Luckily, when you are using the 's'ubstitute command you can pick a different separator. Instead of typing "%s/\/foo\/bar\/baz\//foo\/bar\//g", you can simply type "%s#/foo/bar/baz/#foo/bar/#g". Vim will automagically detect that you want to use '#' as the delimiter, and you'll end up with a much more readable pattern.

Extra tip: this also works in sed.
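
For example, this (made-up) path rewrite uses '#' as the separator, with no escaping required:

sed 's#/foo/bar/baz/#foo/bar/#g' some_file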

Tuesday 5 May 2015

C++: A jump table with a template device

A few articles ago we saw how gcc might need some help when mixing template instantiation (pure compile time data) with function calls (deducible compile time information, but not available to the template expander). Now we'll go one step further and combine all three types: pure compile time data, deducible compile time data and pure run time data (*). Just to annoy the compiler, and to see how gcc is able to optimize the result.

Let's build a simple example, similar to what we used last time: an object that determines which power-of-two range an integer falls in, then invokes a callback with that bound. Something like this could be used, for example, to pick the size of a buffer to allocate.

typedef void (*func)(int);

void boring(int x, func f) {
    if (x < 2) {
        f(2);
    } else if (x < 4) {
        f(4);
    } else if (x < 8) {
        f(8);
    } else if (x < 16) {
        // You get the idea...
    }
}

Can we build a prettier template version of this code, without any overhead? Let's try:

#include <cstddef>  // for size_t

typedef void (*func)(int);

template <int My_Size>
struct Foo {
    void bar(size_t size, func callback) {
        if (size > My_Size) {
            callback(My_Size);
        } else {
            next_foo.bar(size, callback);
        }
    }

    Foo<My_Size/2> next_foo;
};

// Stop condition
template<> struct Foo<0> {
    void bar(size_t, func) { }
};

void wrapper(int x, func f) {
    Foo<512> jump_table;
    jump_table.bar(x, f);
}
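
In case you want to try it out, a minimal driver might look like the sketch below. It is not part of the foo.cpp we compile next (otherwise main would show up in the listings), and print_size is just a made-up callback:

#include <cstdio>

void print_size(int chosen) {
    std::printf("callback invoked with %d\n", chosen);
}

int main() {
    wrapper(300, print_size);   // walks 512 -> 256, ends up calling print_size(256)
    wrapper(3, print_size);     // walks all the way down, ends up calling print_size(2)
    return 0;
}
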
And now, let's compile it with "g++ -fverbose-asm -S -O0 -c foo.cpp -o /dev/stdout | c++filt". You'll see something like this:

wrapper(int, void (*)(int)):
    call    Foo<512>::bar(unsigned long, void (*)(int))

Foo<512>::bar(unsigned long, void (*)(int)):
    cmpq    $512, %rsi    #, size
    jbe    .L4
    call    *%rdx    # callback
    jmp    .L3
.L4:
    call    Foo<256>::bar(unsigned long, void (*)(int))    #
.L3:
    leave

Foo<256>::bar(unsigned long, void (*)(int)):
    cmpq    $256, %rsi    #, size
    jbe    .L4
    call    *%rdx    # callback
    jmp    .L3
.L4:
    call    Foo<128>::bar(unsigned long, void (*)(int))    #
.L3:
    leave

# You get the idea, right?

Foo<0>::bar(unsigned long, void (*)(int)):
    # Stop condition, do nothing

That doesn't look too good, does it? We don't need to worry: we already learned that gcc needs help from the optimizer to handle template expansion and non-static function calls. Let's move to -O1:

wrapper(int, void (*)(int)):
.LFB14:
    cmpq    $512, %rdi    #, D.2974
    jbe    .L2    #,
    movl    $512, %edi    #,
    call    *%rsi    # f
    jmp    .L1    #
.L2:
    cmpq    $256, %rdi    #, D.2974
    jbe    .L4    #,
    movl    $256, %edi    #,
    call    *%rsi    # f
    jmp    .L1    #

# Again, it should be clear what's going on...

.L11:
    cmpq    $1, %rdi    #, D.2974
    .p2align 4,,2
    jbe    .L1    #,
    movl    $1, %edi    #,
    .p2align 4,,2
    call    *%rsi    # f
.L1:

It's better than last time, but it doesn't look great either: gcc managed to inline all the calls, but it stopped there. Let's move to -O2 and see what happens:


wrapper(int, void (*)(int)):
    movslq    %edi, %rdi    # x, D.2987
    cmpq    $512, %rdi    #, D.2987
    ja    .L13    #,
    cmpq    $256, %rdi    #, D.2987
    ja    .L14    #,
    [ .... ]
    cmpq    $2, %rdi    #, D.2987
    ja    .L21    #,

.L13:
    movl    $512, %edi    #,
    jmp    *%rsi    # f

.L14:
    movl    $256, %edi    #,
    jmp    *%rsi    # f

[ .... ]

.L21:
    movl    $2, %edi    #,
    jmp    *%rsi    # f

.L1:
    rep
    ret
    .p2align 4,,10
    .p2align 3

Now that looks much better, and we can see that gcc generates the same code at -O2 for both versions of our code.
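
If you want to double-check that claim, one way (assuming the plain if/else version lives in boring.cpp and the template version in foo.cpp, each behind an equivalent wrapper) is to dump both at -O2 and compare the demangled listings by eye; labels and symbol names will differ, but the branch structure should match:

g++ -fverbose-asm -S -O2 boring.cpp -o boring.s
g++ -fverbose-asm -S -O2 foo.cpp -o foo.s
c++filt < boring.s
c++filt < foo.s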

(*) Just for the sake of completeness (a small sketch follows the list):

  • Pure compile time data is information directly available at compile time, like a constant.
  • Deducible compile time data means something that can easily be deduced, like a call to a non-virtual method whose definition the compiler can see.
  • Run-time only data means something that a compiler could never deduce, like a volatile variable or the parameter of a function called from outside the current translation unit.
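
A rough illustration of the three flavors (the names are made up, and what counts as "deducible" ultimately depends on what the optimizer can actually see):

typedef void (*func)(int);              // same callback type as above

static const int a_constant = 512;      // pure compile time data

int deducible() { return 42; }          // calls to this are deducible: the body is
                                        // visible in this translation unit

void example(volatile int *p, func f) {
    int x = deducible() + a_constant;   // the compiler can fold all of this
    f(*p + x);                          // *p and f are run time only: their values
                                        // cannot be known at compile time
}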