Infinite monkey - Nico Brailovsky's blog: Templates

Wednesday, 18 January 2017

Fixing templates with constexpr's

For my hundredth (and a bit) c++ post I decided to do something I never did before: fix my old code!

A long time ago I wrote about template metaprogramming devices. There, I tried to explain that many atrocities have been committed in the name of performance and "compile time evaluation". Template metaprogramming is probably one of the worst culprits of job security. Its (ab)use can create monstrosities, all in the name of runtime performance. Like, for example, my template device to calculate e. Let's remember what that atrocious code looks like (follow the link if you want an explanation of how this works):

template <int N, int D> struct Frak {
        static const long Num = N;
        static const long Den = D;
};
  
template <int N, typename F> struct ScalarMultiplication {
    typedef Frak<N*F::Num, N*F::Den> result;
};
  
template <int X, int Y> struct MCD {
        static const long result = MCD<Y, X % Y>::result;
};
  
template <int X> struct MCD<X, 0> {
        static const long result = X;
};
  
template <class F> struct Simpl {
        static const long mcd = MCD<F::Num, F::Den>::result;
        typedef Frak< F::Num / mcd, F::Den / mcd > result;
};
  
template <typename X1, typename Y1> struct SameBase {
    typedef typename ScalarMultiplication< Y1::Den, X1>::result X;
    typedef typename ScalarMultiplication< X1::Den, Y1>::result Y;
};
  
template <typename X, typename Y> struct Sum {
    typedef SameBase<X, Y> B;
    static const long Num = B::X::Num + B::Y::Num;
    static const long Den = B::Y::Den; // == B::X::Den
    typedef typename Simpl< Frak<Num, Den> >::result result;
};

template <int N> struct Fact {
    static const long result = N * Fact<N-1>::result;
};
template <> struct Fact<0> {
    static const long result = 1;
};
 
template <int N> struct E {
    // e = S(1/n!) = 1/0! + 1/1! + 1/2! + ...
    static const long Den = Fact<N>::result;
    typedef Frak< 1, Den > term;
    typedef typename E<N-1>::result next_term;
    typedef typename Sum< term, next_term >::result result;
};
template <> struct E<0> {
    typedef Frak<1, 1> result;
};
 
#include <iostream>

int main() {
  typedef E<8>::result X;
  std::cout << "e = " << (1.0 * X::Num / X::Den) << "\n";
  std::cout << "e = " << X::Num <<"/"<< X::Den << "\n";
  return 0;
}

While this is just a toy example to play with templates, it does illustrate code I've seen in the wild. Would this look cleaner in c++11? Yes, it would. Constexprs are, in my opinion, one of the most overlooked "killer" features of c++11.

Starting with a simple example:

constexpr int foo(int a, int b) { return a+b; }
static constexpr int n = foo(1, 2);
int bar() { return n; }

Try to compile it with "g++ -std=c++11 -fverbose-asm -O0 -c -S -o /dev/stdout" and see what happens. You should get the equivalent of "return 3" - just as anyone would expect - but note that no optimizations were enabled. What about loops? Let's try this:

constexpr int f(int n) {
    return (n<2)? 1 : n + f(n-1);
}

constexpr int n = f(999);

You'll probably get an error about maximum depth exceeded, but that's alright: we have loops (in the form of recursion) in constexprs too! (Note that some of these restrictions were relaxed in C++14.)
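One way to check that the evaluation really happens at compile time, without reading any assembly, is a static_assert. This is only a minimal sketch, assuming a C++11 compiler; the name sum_to is made up for the example, and the recursion is kept shallow so it stays well below gcc's default limit (which can be raised with -fconstexpr-depth):

```cpp
// Same shape as f() above: a single return statement, recursion instead of a loop.
constexpr int sum_to(int n) {
    return (n < 2) ? 1 : n + sum_to(n - 1);
}

// static_assert only accepts constant expressions, so this line compiles
// only if the compiler itself evaluated sum_to(100).
static_assert(sum_to(100) == 5050, "sum_to(100) should be 1 + 2 + ... + 100");

// The result can also be used where the language demands a constant.
int buffer[sum_to(4)];  // sum_to(4) == 10
```

If the static_assert compiles, the value was computed by the compiler, whatever the optimization level.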

In general, if you can express your function as a single const return statement, it should be a valid constexpr. With this new knowledge, let's convert the template meta-atrocity above to something a bit less hideous:

struct PodFrac {
    int num;
    int den;
};

constexpr int mcd(int a, int b) {
    return (b==0)? a : mcd(b, a%b);
}

constexpr PodFrac simpl(const PodFrac &f) {
    return PodFrac{f.num / mcd(f.num, f.den), f.den / mcd(f.num, f.den)};
}

constexpr PodFrac sum(const PodFrac &a, const PodFrac &b) {
    return simpl(PodFrac{a.num*b.den + b.num*a.den, a.den*b.den});
}

constexpr int fact(int n) {
    return (n==0)? 1 : n*fact(n-1);
}

constexpr PodFrac e(int n) {
    return (n==0)? PodFrac{1, 1} :
                   sum(PodFrac{1, fact(n)}, e(n-1));
}

constexpr float e_num = 1.0 * e(8).num / e(8).den;

float get_e() {
    return e_num;
}

Disclaimer: while I explicitly stated this multiple times in my "C++ template metaprogramming introduction" article, it's worth re-stating it: this code is meant as an example to showcase a c++ feature, not as a proper way of deriving a mathematical constant in production code.

First thoughts after comparing the two versions: much, much [, much]*100 cleaner.

As you may notice, every constexpr function needs to be a single return statement. There's no multi-statement constexpr in c++11, which explains why loops are not really supported. For the same reason the implementation of e() is a bit hindered by this limitation: its code would be much more readable if it were split into a few lines with proper names. Good news: some of these restrictions were relaxed in C++14.
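For comparison, this is roughly what the same calculation could look like once multi-statement constexpr functions are allowed. It is a sketch, assuming a compiler running with -std=c++14 or later; gcd_loop and e_loop are names invented here, and PodFrac mirrors the struct used above:

```cpp
struct PodFrac {
    int num;
    int den;
};

// Euclid's algorithm, written as a plain loop.
constexpr int gcd_loop(int a, int b) {
    while (b != 0) {
        int rest = a % b;
        a = b;
        b = rest;
    }
    return a;
}

// Adds the terms 1/0! + 1/1! + ... + 1/n!, simplifying after each step.
constexpr PodFrac e_loop(int n) {
    PodFrac acc{1, 1};                          // the 1/0! term
    int fact = 1;
    for (int k = 1; k <= n; ++k) {
        fact *= k;                              // k!
        int num = acc.num * fact + acc.den;     // acc + 1/k!
        int den = acc.den * fact;
        int g = gcd_loop(num, den);
        acc.num = num / g;
        acc.den = den / g;
    }
    return acc;
}

static_assert(e_loop(8).num == 109601 && e_loop(8).den == 40320,
              "e_loop(8) should be 109601/40320");
```

The intermediate products stay within the range of a 32 bit int for n = 8; larger values of n would need a wider type.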

Note that if you analyze your compiler's output when building without optimizations, you may see either a const with e's value, or a static initializer that does some trivial operation, like loading e's value from a fraction: gcc seems to get tired of constexpr evaluation after a few recursive calls, so your results may vary (slightly).

I called constexpr's one of c++11's killer features, and hopefully you can see why I'm so enthusiastic about them now: there's much less incentive for people to write horrible template metaprogramming devices when simply adding a little keyword to a normal function has the same effect, only cleaner.

Tuesday, 5 May 2015

C++: A jump table with a template device

A few articles ago we saw how gcc might need some help when mixing template instantiation (pure compile time data) with function calls (deducible compile time information, but not available to the template expander). Now we'll go one step further and combine all three types: pure compile time data, deducible compile time data and pure run time data (*). Just to annoy the compiler, and to see how well gcc is able to optimize the results.

Let's build a simple example, similar to what we used last time: an object that will determine the range of an integer and then invoke a callback with the closest range. Something like this could be used, for example, to allocate a buffer.

typedef void (*func)(int);

void boring(int x, func f) {
    if (x < 2) {
        f(2);
    } else if (x < 4) {
        f(4);
    } else if (x < 8) {
        f(8);
    } else if (x < 16) {
        // You get the idea...
    }
}

Can we build a prettier template version of this code, without any overhead? Let's try:

#include <stddef.h>  // size_t

typedef void (*func)(int);

template <int My_Size>
struct Foo {
    void bar(size_t size, func callback) {
        if (size > My_Size) {
            callback(My_Size);
        } else {
            next_foo.bar(size, callback);
        }
    }

    Foo<My_Size/2> next_foo;
};

// Stop condition
template<> struct Foo<0> {
    void bar(size_t, func) { }
};

void wrapper(int x, func f) {
    Foo<512> jump_table;
    jump_table.bar(x, f);
}

And now, let's compile with "g++ -fverbose-asm -S -O0 -c foo.cpp -o /dev/stdout | c++filt". You'll see something like this:

wrapper(int, void (*)(int)):
    call    Foo<512>::bar(unsigned long, void (*)(int))

Foo<512>::bar(unsigned long, void (*)(int)):
    cmpq    $512, %rsi    #, size
    jbe    .L4
    call    *%rdx    # callback
    jmp    .L3
.L4:
    call    Foo<256>::bar(unsigned long, void (*)(int))    #
.L3:
    leave

Foo<256>::bar(unsigned long, void (*)(int)):
    cmpq    $256, %rsi    #, size
    jbe    .L4
    call    *%rdx    # callback
    jmp    .L3
.L4:
    call    Foo<128>::bar(unsigned long, void (*)(int))    #
.L3:
    leave

# You get the idea, right?

Foo<0>::bar(unsigned long, void (*)(int)):
    # Stop condition, do nothing

That doesn't look too good, does it? We don't need to worry: we already learned that gcc needs help from the optimizer to handle template expansion and non static function calls. Let's move to O1:

wrapper(int, void (*)(int)):
.LFB14:
    cmpq    $512, %rdi    #, D.2974
    jbe    .L2    #,
    movl    $512, %edi    #,
    call    *%rsi    # f
    jmp    .L1    #
.L2:
    cmpq    $256, %rdi    #, D.2974
    jbe    .L4    #,
    movl    $256, %edi    #,
    call    *%rsi    # f
    jmp    .L1    #

# Again, it should be clear what's going on...

.L11:
    cmpq    $1, %rdi    #, D.2974
    .p2align 4,,2
    jbe    .L1    #,
    movl    $1, %edi    #,
    .p2align 4,,2
    call    *%rsi    # f
.L1:

It's better than last time, but it doesn't look great either: gcc managed to inline all calls, but it stopped there. Let's move to O2 and see what happens:


wrapper(int, void (*)(int)):
    movslq    %edi, %rdi    # x, D.2987
    cmpq    $512, %rdi    #, D.2987
    ja    .L13    #,
    cmpq    $256, %rdi    #, D.2987
    ja    .L14    #,
    [ .... ]
    cmpq    $2, %rdi    #, D.2987
    ja    .L21    #,

.L13:
    movl    $512, %edi    #,
    jmp    *%rsi    # f

.L14:
    movl    $256, %edi    #,
    jmp    *%rsi    # f

[ .... ]

.L21:
    movl    $2, %edi    #,
    jmp    *%rsi    # f

.L1:
    rep
    ret
    .p2align 4,,10
    .p2align 3

Now, that looks much better. And we can now see that gcc generates the same code at -O2 for both versions of our code.

(*) Just for the sake of completeness:

Pure compile time data is information directly available during compilation time, like a constant.
Deducible compile time data means something that can easily be deduced, like a function call to a non virtual method.
Run-time only data means something that a compiler could never deduce, like a volatile variable or the parameter of a function called from outside the current translation unit.
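To make the three categories a bit more concrete, here is a tiny sketch. All the names (kSize, Doubler, sensor and the three functions) are made up for illustration, and what the optimizer actually does with each function depends on the compiler and the flags:

```cpp
#include <cassert>

// Pure compile time data: a constant the compiler can see directly.
static const int kSize = 4;

// Deducible compile time data: twice() is a non virtual method, so an
// optimizer can look inside the call and fold the result.
struct Doubler {
    int twice(int v) const { return 2 * v; }
};

// Run-time only data: volatile forces a real load on every read.
volatile int sensor = 21;

int pure()         { return kSize; }
int deducible()    { Doubler d; return d.twice(kSize); }
int runtime_only() { Doubler d; return d.twice(sensor); }

void demo() {
    assert(pure() == 4);
    assert(deducible() == 8);
    assert(runtime_only() == 42);
}
```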

Tuesday, 21 April 2015

gcc: Optimization levels and templates

Analyzing the assembly output for template devices can be a bit discouraging at times, especially when we spend hours trying to tune a mean looking template class only to find out the compiler is not able to reduce its value like we expected. But hold on, before throwing all your templates away you might want to figure out why they are not optimized.

Let's start with a simple example: a template device to return the next power of 2:

template <int n, long curr_pow, bool stop>
struct Impl_Next_POW2 {
    static const bool is_smaller = n < curr_pow;
    static const long next_pow = Impl_Next_POW2<n, curr_pow*2, is_smaller>::pow;
    static const long pow = is_smaller? curr_pow : next_pow;
};

template <int n, long curr_pow>
struct Impl_Next_POW2<n, curr_pow, true> {
    // This specialization is important to stop the expansion
    static const long pow = curr_pow;
};

template <int n>
struct Next_POW2 {
    // Just a wrapper for Impl_Next_POW2, to hide away some
    // implementation details
    static const long pow = Impl_Next_POW2<n, 1, false>::pow;
};

Gcc can easily optimize that away, if you compile with "g++ foo.cpp -c -S -o /dev/stdout" you'll just see the whole thing is replaced by a compile time constant. Let's make gcc's life a bit more complicated now:

template <int n, long curr_pow, bool stop>
struct Impl_Next_POW2 {
    static long get_pow() {
        static const bool is_smaller = n < curr_pow;
        return is_smaller?
                    curr_pow :
                    Impl_Next_POW2<n, curr_pow*2, is_smaller>::get_pow();
    }
};

template <int n, long curr_pow>
struct Impl_Next_POW2<n, curr_pow, true> {
    static long get_pow() {
        return curr_pow;
    }
};

template <int n>
struct Next_POW2 {
    static long get_pow() {
        return Impl_Next_POW2<n, 1, false>::get_pow();
    }
};

Same code but instead of using plain static values we wrap everything in a method. Compile with "g++ foo.cpp -c -S -fverbose-asm -o /dev/stdout | c++filt" and you'll see something like this now:

main:
    call    Next_POW2<17>::get_pow()

Next_POW2<17>::get_pow():
    call    Impl_Next_POW2<17, 1l, false>::get_pow()

Impl_Next_POW2<17, 1l, false>::get_pow():
    call    Impl_Next_POW2<17, 2l, false>::get_pow()

Impl_Next_POW2<17, 2l, false>::get_pow():
    call    Impl_Next_POW2<17, 4l, false>::get_pow()

Impl_Next_POW2<17, 4l, false>::get_pow():
    call    Impl_Next_POW2<17, 8l, false>::get_pow()

Impl_Next_POW2<17, 8l, false>::get_pow():
    call    Impl_Next_POW2<17, 16l, false>::get_pow()

Impl_Next_POW2<17, 16l, false>::get_pow():
    call    Impl_Next_POW2<17, 32l, false>::get_pow()

Impl_Next_POW2<17, 32l, false>::get_pow():
    movl    $32, %eax    #, D.2171

What went wrong? It's very clear to us that the whole thing is just a chain of calls which could be replaced by the last one. However, that information is now only available if you "inspect" the body of each function, and this is something the template instantiator (at least in gcc) can't do. Luckily you just need to enable optimizations (-O1 is enough) to have gcc output the reduced version again.

Keep it in mind the next time you're optimizing your code with template metaprogramming: sometimes the template expander needs some help from the optimizer too.
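As a side note, the value-only variant can be checked without looking at any assembly, and a C++11 constexpr function is another way of asking the compiler for the same constant. The following is a sketch under those assumptions; next_pow2 is a hypothetical name that does not appear in the code above:

```cpp
// The value-only template device from the start of the post.
template <int n, long curr_pow, bool stop>
struct Impl_Next_POW2 {
    static const bool is_smaller = n < curr_pow;
    static const long next_pow = Impl_Next_POW2<n, curr_pow * 2, is_smaller>::pow;
    static const long pow = is_smaller ? curr_pow : next_pow;
};

template <int n, long curr_pow>
struct Impl_Next_POW2<n, curr_pow, true> {
    static const long pow = curr_pow;
};

// Hypothetical constexpr equivalent: one return statement, same recursion.
constexpr long next_pow2(int n, long curr_pow = 1) {
    return (n < curr_pow) ? curr_pow : next_pow2(n, curr_pow * 2);
}

// Both must be compile time constants for these lines to build.
static_assert(Impl_Next_POW2<17, 1, false>::pow == 32, "template version");
static_assert(next_pow2(17) == 32, "constexpr version");
static_assert(next_pow2(16) == 32, "the result is strictly greater than n");
```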

Tuesday, 15 October 2013

A C++ template device to obtain an underlying type

What happens when you need to get the underlying data type of a pointer or reference? You can write some crazy metaprogram to do it for you. Like this:

template <typename T> struct get_real_type      { typedef T type; };
template <typename T> struct get_real_type<T*>  { typedef T type; };
template <typename T> struct get_real_type<T&>  { typedef T type; };

template <class T>
int foo() {
    return get_real_type<T>::type::N;
}

struct Bar {
    static const int N=24;
};

#include <iostream>
using namespace std;
int main() {
    cout << foo<Bar*>() << endl;
    cout << foo<Bar&>() << endl;
    cout << foo<Bar>() << endl;
}

Incidentally, this is also the basis for the implementation of std::remove_reference. Actually you'd be better off using std::remove_reference, for your own sanity.
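A rough equivalent built from the standard traits could look like the sketch below. The helper strip is a made-up name; it chains std::remove_reference and std::remove_pointer because std::remove_reference on its own leaves Bar* untouched. It assumes C++11 and the <type_traits> header:

```cpp
#include <type_traits>

struct Bar {
    static const int N = 24;
};

// Hypothetical helper: strip a reference first, then a pointer.
template <typename T>
struct strip {
    typedef typename std::remove_pointer<
        typename std::remove_reference<T>::type>::type type;
};

template <class T>
constexpr int foo() {
    return strip<T>::type::N;
}

static_assert(std::is_same<strip<Bar*>::type, Bar>::value, "pointer stripped");
static_assert(std::is_same<strip<Bar&>::type, Bar>::value, "reference stripped");
static_assert(std::is_same<strip<Bar>::type, Bar>::value, "plain type unchanged");
static_assert(foo<Bar*>() == 24, "N is reachable through the stripped type");
```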

Thursday, 27 June 2013

Useless code: a template device to calculate e

Recently I needed to flex my template metaprogrammingfooness a bit, so I decided to go back and review an old article I wrote about it (C++11 made some parts of those articles obsolete, but I'm surprised by how well it's aged). To practice a bit I decided to tackle a problem I'm sure no one ever had before: defining a mathematical constant at compile time. This is what I ended up with; do you have a better version? Shouldn't be too hard.

template <int N, int D> struct Frak {
	static const long Num = N;
	static const long Den = D;
};

template <class X, int N> struct MultEscalar {
	typedef Frak< N*X::Num, N*X::Den > result;
};

template <class X1, class Y1> struct IgualBase {
	typedef typename MultEscalar< X1, Y1::Den >::result X;
	typedef typename MultEscalar< Y1, X1::Den >::result Y;
};

template <int X, int Y>	struct MCD {
	static const long result = MCD<Y, X % Y>::result;
};
template <int X> struct MCD<X, 0> {
	static const long result = X;
};

template <class F> struct Simpl {
	static const long mcd = MCD<F::Num, F::Den>::result;
	typedef Frak< F::Num / mcd, F::Den / mcd > result;
};

template <class X, class Y> struct Suma {
	typedef IgualBase<X, Y> B;
	static const long Num = B::X::Num + B::Y::Num;
	static const long Den = B::Y::Den; // == B::X::Den
	typedef typename Simpl< Frak<Num, Den> >::result result;
};

template <int N> struct Fact {
	static const long result = N * Fact<N-1>::result;
};
template <> struct Fact<0> {
	static const long result = 1;
};

template <int N> struct E {
	// e = S(1/n!) = 1/0! + 1/1! + 1/2! + ...
	static const long Den = Fact<N>::result;
	typedef Frak< 1, Den > term;
	typedef typename E<N-1>::result next_term;
	typedef typename Suma< term, next_term >::result result;
};
template <> struct E<0> {
	typedef Frak<1, 1> result;
};

#include <iostream>
int main() {
	typedef E<8>::result X;
	std::cout << "e = " << (1.0 * X::Num / X::Den) << "\n";
	std::cout << "e = " << X::Num <<"/"<< X::Den << "\n";
	return 0;
}

Thursday, 29 November 2012

Cool C++0X features XIII: auto and ranged for, cleaner loops FTW

Long time without updating this series. Last time we saw how the ugly

for (FooContainer::const_iterator i = foobar.begin(); i != foobar.end(); ++i)

could be transformed into the much cleaner

for (auto i = foobar.begin(); i != foobar.end(); ++i)

Yet we are not done: we can clean that up a lot more using range-based for statements.

Ranged for is basically syntactic sugar (no flamewar intended) for shorter for statements. It's nothing new and it's been part of many languages for many years already, so there will be no lore about the greatness of C++ innovations (flamewar intended), but it still is a nice improvement to have, considering how tedious it can be to write nested loops. This certainly looks much cleaner:

for (auto x : foobar)

This last for-statement, even though it looks good enough to print and hang on a wall, raises a lot of questions. What's the type of x? What if I want to change its value? Let's try to answer that.

The type of x will be the element type of the container, so in this case (a vector of ints) x would be an int:

std::vector<int> foobar;
for (auto x : foobar) {
	std::cout << (x+2);
}

And now, what happens if you want to alter the contents of a list and not only display them? That's easy too, just declare x as an auto reference:

std::vector<int> foobar;
for (auto& x : foobar) {
	x += 2;  // x is a reference, so this changes the element itself
}

This looks really nice but it won't really do anything, for two different reasons:

Ranged fors won't work until g++ 4.5.6 is released
The list is empty!

There are many ways to initialize that list, but we'll see how C++0X lets you do it in a new way the next time.
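In the meantime, a small sketch of the difference between the two loops, assuming a compiler that already supports ranged for. The function names are made up, and the vector is filled with the old-style (count, value) constructor so as not to spoil the next entry:

```cpp
#include <cassert>
#include <vector>

// By value: x is a copy of each element, so the vector is left untouched.
int total(const std::vector<int>& foobar) {
    int sum = 0;
    for (auto x : foobar) {
        sum += x;
    }
    return sum;
}

// By reference: x aliases each element, so the change sticks.
void add_two(std::vector<int>& foobar) {
    for (auto& x : foobar) {
        x += 2;
    }
}

void demo() {
    std::vector<int> foobar(3, 1);   // three elements, all of them 1
    add_two(foobar);                 // now 3, 3, 3
    assert(total(foobar) == 9);
}
```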

Thursday, 1 November 2012

stlfilt: read ugly tmpl errors

There's nothing better for Monday mornings than the smell of hundreds of template errors after a make clean all. When using template metaprogramming, a tiny misplaced comma can generate enough error output that, if printed, would crush you under tons of paper. And don't even try to read them, it'll make your head explode.

Luckily STLFilt can be quite a relief when dealing with this kind of error. Granted, it won't make a steaming pile of poo seem to be a nice poem, but if you have something like the dog in the picture, to use a metaphor, at least it would put a blanket on its face.

Tuesday, 4 October 2011

Cool C++0X features XII: type inference with auto

In the last four entries we worked on a simple example, like the one I'm pasting below, of type inference with decltype, which led us to learn about delayed type declaration and decltypes with auto. This time I want to focus just on the auto keyword instead.

template <class... Args>
auto wrap(Args... a) -> decltype( do_something(a...) ) {
	std::cout << __PRETTY_FUNCTION__ << "\n";
	return do_something(a...);
}

We saw last time how decltype can be used in a contrived way to create a local variable without specifying its type, only how to deduce the type for this variable. Luckily, that verbose method of type declaration can be shortened. This:

	int x = 2;
	int y = 3;
	decltype(x*y) z = x*y;

Should be written as:

	int x = 2;
	int y = 3;
	auto z = x*y;

That's right, when you are declaring local variables it's easier and cleaner to just use auto. This feature isn't even "in the wild" yet, so you can't really predict what people will do with it, but it seems to me that limiting its use to local variables with a very short lived scope is the best strategy. We are yet to see what monstrosities the abuse of this feature will produce, and I'm sure there will be many. Regardless of its potential to drive maintainers insane, its best use probably comes in loops.

In any C++ application, you'll find code like this:

for (FooContainer<Bar>::const_iterator i = foobar.begin(); i != foobar.end(); ++i)

This ugly code can be eliminated with something much more elegant:

for (auto i = foobar.begin(); i != foobar.end(); ++i)

Looks nicer indeed, but we can improve it much further with other tools. We'll see how next time. For the time being, let's see what auto should not be used for.

When using auto, keep in mind it was designed to simplify the declaration of a variable with a complex or hard to spell type, not as a replacement for other language features like templates. This is a common mistake:

Wrong:

void f(auto x) {
	cout << x;
}

Less wrong:

template <typename T>
void f(T x) {
	cout << x;
}

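It makes no sense to use auto in the place of a template, since a template means that the type will be completed later whereas auto means it should be deduced from an initializer. (This holds for C++0x; much later, C++20 made void f(auto x) legal, precisely as a shorthand for the template version.)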
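The two features do work nicely together, though. In the sketch below (twice and demo are made-up names) the template picks the parameter type at each call site, and auto then picks up whatever the call returns:

```cpp
#include <cassert>
#include <type_traits>

// The template decides the parameter type at each call site...
template <typename T>
T twice(T x) {
    return x + x;
}

void demo() {
    // ...while auto takes the type of its initializer.
    auto a = twice(21);     // int
    auto b = twice(1.5);    // double
    static_assert(std::is_same<decltype(a), int>::value, "a is an int");
    static_assert(std::is_same<decltype(b), double>::value, "b is a double");
    assert(a == 42);
    assert(b == 3.0);
}
```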

Friday, 10 June 2011

Cool C++0X features X: type inference with decltype

After creating a wrapper object in the last entries, we were left with three syntax changes to analyze:

-> (delayed declaration)
decltype
auto

We already saw the first, and we'll be talking about the other two this time. This was the original wrapper function which led us here:

template <class... Args>
auto wrap(Args... a) -> decltype( do_something(a...) ) { 
	std::cout << __PRETTY_FUNCTION__ << "\n";
	return do_something(a...);
}

Back on topic: decltype

This operator (yes, decltype is an operator) is a cousin of sizeof which will yield the type of an expression. Why do I say it's a cousin of sizeof? Because it's been in the compilers for a long time, only in disguise. This is because you can't get the size of an expression without knowing its type, so even though its implementation has existed for a long time, only now is it available to the programmer.

One of its interesting features is that the expression with which you call decltype won't be evaluated, so you can safely use a function call within a decltype, like this:

auto foo(int x) -> decltype( bar(x) ) { 
	return bar(x);
}

Doing this with, say, a macro, would get bar(x) evaluated twice, yet with decltype it will be evaluated only once. Any valid C++ expression can go within a decltype operator, so for example this is valid too:

template <typename A, typename B>
auto multiply(A x, B y) -> decltype( x*y )
{ 
	return x*y;
}

What's the type of A and B? What's the type of A*B? We don't care, the compiler will take care of that for us. Let's look again at that example, more closely:

-> (delayed declaration) and decltype

Why bother creating a delayed type declaration at all and not just use the decltype in place of the auto? That's because of a scope problem, see this:

// Declare a template function receiving two types as param
template <typename A, typename B>
// If we are declaring a multiplication operation, what's the return type of A*B?
// We can't multiply classes, and we don't know any instances of them
auto multiply(A x, B y)
// Luckily, the method signature now defined both parameters, meaning
// we don't need to expressly know the type of A*B, we just evaluate
// x*y and use whatever type that yields
	-> decltype( x*y )
{ 
	return x*y;
}
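Both claims (the operand of decltype is never evaluated, and the return type of multiply follows whatever x*y yields) can be checked with a small sketch. The counter calls and the body of bar are invented here just to make the effect observable:

```cpp
#include <cassert>
#include <type_traits>

int calls = 0;                       // hypothetical counter, only for this sketch
int bar(int x) { ++calls; return 2 * x; }

// bar(x) appears twice in the source but runs once per call to foo.
auto foo(int x) -> decltype( bar(x) ) {
    return bar(x);
}

template <typename A, typename B>
auto multiply(A x, B y) -> decltype( x*y ) {
    return x*y;
}

// The return type is whatever the multiplication yields for each pair of types.
static_assert(std::is_same<decltype(multiply(2, 3)), int>::value, "int * int");
static_assert(std::is_same<decltype(multiply(2, 3.5)), double>::value, "int * double");

void demo() {
    calls = 0;
    assert(foo(21) == 42);
    assert(calls == 1);              // the decltype operand was never executed
    assert(multiply(2, 3.5) == 7.0);
}
```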

decltype

As you see, decltype can be a very powerful tool if the return type of a function is not known to the programmer when writing the code, but you can use it to declare any type, anywhere, if you are too lazy to type. If you, for example, are very bad at math and don't remember that the integers are closed under multiplication, you could write this:

	int x = 2;
	int y = 3;
	decltype(x*y) z = x*y;

Yes, you can use it as VB's dim! (kidding, just kidding, please don't hit me). Even though this works and it's perfectly legal, auto is a better option for this. We'll see that on the next entry.

Tuesday, 7 June 2011

Cool C++0X features IX: delayed type declaration

In the last two entries we worked on a wrapper object which allows us to decorate a method before or after calling (hello aspects!), or at least that's what it should do when g++ fully implements decltypes and variadic templates. Our wrapper function looks something like this (check out the previous entry for the wrapper object):

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }
void do_something(const char*) { std::cout << __PRETTY_FUNCTION__ << "\n"; }
int do_something(int) { std::cout << __PRETTY_FUNCTION__ << "\n"; return 123; }

template <class... Args>
auto wrap(Args... a) -> decltype( do_something(a...) ) { 
	std::cout << __PRETTY_FUNCTION__ << "\n";
	return do_something(a...);
}

int main() {
	wrap();
	wrap("nice");
	int x = wrap(42);
	std::cout << x << "\n";
	return 0;
}

After the example, we were left with three new syntax changes to analyze:

-> (delayed declaration)
decltype
auto

Let's study the -> operator this time.

-> (delayed declaration)

This is the easiest one. When a method is declared auto (I've left this one for the end because auto is used for other things too) it means its return type will be defined somewhere else. Note that in this regard the final implementation differs from Stroustrup's FAQ.

The -> operator in a method's definition says "Here's the return type". I'll paste the same simple example we had last time, the following two snippets of code are equivalent:

void foo() {}

Is the same as:

auto foo() -> void {}
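The equivalence can be verified by the compiler itself. The sketch below uses made-up names (add_classic, add_trailing, Outer) and also shows one small practical perk of the new syntax: when a member function is defined outside its class, whatever comes after the -> is already looked up in class scope.

```cpp
#include <cassert>
#include <type_traits>

int add_classic(int a, int b) { return a + b; }
auto add_trailing(int a, int b) -> int { return a + b; }

// Both declarations produce the very same function type, int(int, int).
static_assert(std::is_same<decltype(add_classic), decltype(add_trailing)>::value,
              "a trailing return type does not change the signature");

// Hypothetical struct: after the ->, Nested needs no Outer:: prefix.
struct Outer {
    struct Nested { int value; };
    Nested make(int v);
};

auto Outer::make(int v) -> Nested {
    Nested n = { v };
    return n;
}

void demo() {
    assert(add_classic(2, 3) == add_trailing(2, 3));
    Outer o;
    assert(o.make(7).value == 7);
}
```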

Tuesday, 31 May 2011

Cool C++0X features VIII: Variadic wrapper and type inference with decltype

The wrapper function we built last time looks something like this now:

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }
void do_something(const char*) { std::cout << __PRETTY_FUNCTION__ << "\n"; }

template <class... Args>
void wrap(Args... a) {
	std::cout << __PRETTY_FUNCTION__ << "\n";
	do_something(a...);
}

int main() {
	wrap();
	wrap("nice");
	return 0;
}

But, as we saw last time, this approach has the problem of requiring the return type of do_something to be known beforehand. What can we do to remove this dependency? In C++, not much. You can't really declare a type based on the return type of another function. You do have the option of using lots of metaprogramming wizardry, but this is both error prone and ugly (see Stroustrup's C++0x FAQ).

C++0x lets you do some magic with type inference using decltype; decltype(expr) will yield the type of that expression. It works much like sizeof does; decltype is resolved at compile time and the expression with which it's being called is not evaluated (more on this later).

How would this work on our example?

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }
void do_something(const char*) { std::cout << __PRETTY_FUNCTION__ << "\n"; }
int do_something(int) { std::cout << __PRETTY_FUNCTION__ << "\n"; return 123; }

template <class... Args>
auto wrap(Args... a) -> decltype( do_something(a...) ) {
	std::cout << __PRETTY_FUNCTION__ << "\n";
	return do_something(a...);
}

int main() {
	wrap();
	wrap("nice");
	int x = wrap(42);
	std::cout << x << "\n";
	return 0;
}

Try it (remember to add -std=c++0x), it works great! The syntax is not as terribly difficult to grasp as it was with variadic templates. The auto keyword says "hey, compiler, the return type for this method will be defined later", and then the -> actually declares the return type. This means that the auto/-> idiom isn't part of decltype but a helper, which in turn means that, even if not useful, this is valid C++0x code:

auto wrap() -> void {
}

This means that we have three interesting components to analyze in this scenario:

-> (delayed declaration)
auto
decltype

We'll go over each one the next time.

Closing remark: At first I chose the following example to introduce delayed return types and decltype (warning, untested code ahead):

#include <iostream>

struct Foo {
	void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }
	void do_something(const char*) { std::cout << __PRETTY_FUNCTION__ << "\n"; }
	int do_something(int) { std::cout << __PRETTY_FUNCTION__ << "\n"; return 123; }
};

// Untested code ahead
// This makes g++ coredump (v 4.4.5)
template <class T>
struct Wrap : public T {
	template <class... Args>
	auto wrap(Args... a) -> decltype( T::do_something(a...) ) {
		std::cout << __PRETTY_FUNCTION__ << "\n";
		return T::do_something(a...);
	}
};

int main() {
	Wrap<Foo> w;
	w.wrap();
	w.wrap("nice");
	std::cout << w.wrap(42) << "\n";
	return 0;
}

Though this looks MUCH better (and useful), at the time of writing this article mixing variadic templates with decltypes in a template class makes g++ segfault. It should be valid C++, but I can't assure it's correct code since I've never tried it.
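For readers on a compiler with complete C++11 support, a variation along these lines should build. Take it as a sketch and not as the fix for the crash above: it swaps T::do_something(a...) inside the decltype for std::declval<T&>() (from <utility>), which stands in for "some object of type T", and the print statements are dropped to keep it short.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Foo {
    void do_something() {}
    void do_something(const char*) {}
    int do_something(int) { return 123; }
};

template <class T>
struct Wrap : public T {
    // The return type is whatever the matching do_something overload returns.
    template <class... Args>
    auto wrap(Args... a) -> decltype( std::declval<T&>().do_something(a...) ) {
        return this->do_something(a...);
    }
};

void demo() {
    Wrap<Foo> w;
    w.wrap();                        // picks void do_something()
    w.wrap("nice");                  // picks void do_something(const char*)
    assert(w.wrap(42) == 123);       // picks int do_something(int)
}
```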

Tuesday, 24 May 2011

Cool C++0X features VII: A variadic wrapper solution

Last time we were trying to build a wrapper function, in which we don't control the class being wrapped nor the user of the wrapper (meaning we can't change either of those but they could change without warning).

This was the first approach:

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }

void wrap() {
	std::cout << __PRETTY_FUNCTION__ << "\n";
	do_something();
}

int main() {
	wrap();
	return 0;
}

Yet, as we saw, it's not scalable: when either part changes the whole thing breaks. We then proposed a variadic template solution, which, if you tried it yourself, should look something like this:

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }
void do_something(const char*) { std::cout << __PRETTY_FUNCTION__ << "\n"; }

template <class... Args>
void wrap(Args... a) {
	std::cout << __PRETTY_FUNCTION__ << "\n";
	do_something(a...);
}

int main() {
	wrap();
	wrap("nice");
	return 0;
}

That's better. Now we don't care about which parameters do_something should get, nor how many of them there are supposed to be, just how it's called. If you read the previous entries on variadic templates this should be a walk in the park. It still has a flaw though: we need to know the return type of do_something!

Is there a way to write a wrapper without knowing the return type of a function you are wrapping? Yes, in Ruby you can. But now you can do it in C++0x too, and we'll see how to do it next time.

A closing remark: You could do something like this wrapping everything in a class:

#include <iostream>

struct Foo {
	void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }
	void do_something(const char*) { std::cout << __PRETTY_FUNCTION__ << "\n"; }
};

template <class Base>
struct Wrapper : public Base {
	template <class... Args>
	void wrap(Args... a) {
		std::cout << __PRETTY_FUNCTION__ << "\n";
		Base::do_something(a...);
	}
};

int main() {
	Wrapper<Foo> w;
	w.wrap();
	w.wrap("nice");
	return 0;
}

The above works just fine, but due to some limitations in the current (stable) version of gcc we will use the former version (the problem with this form will be clear later, I promise).

Tuesday, 17 May 2011

Cool C++0X features VI: A variadic wrapper

Let's work on the last variadic exercise, a wrapper. Say you have something like this:

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }

int main() {
	do_something();
	return 0;
}

And you want to wrap do_something with something else (Remember __PRETTY_FUNCTION__?). This is a solution, the worst one though (or, to be accurate, the most boring one):

#include <iostream>

void do_something() { std::cout << __PRETTY_FUNCTION__ << "\n"; }

void wrap() {
	std::cout << __PRETTY_FUNCTION__ << "\n";
	do_something();
}

int main() {
	wrap();
	return 0;
}

Why is it so bad? Let's say you don't control do_something, you just control the wrapper. You may not even control main(), it may be beyond your scope. That means each time do_something changes, or adds an overload, you have to change your code. That's ugly, and you should already know how to set up a variadic function to forward the arguments to do_something. Give it a try; the solution comes next time.

Tuesday, 10 May 2011

Cool C++0X features V: Templates and angle brackets, a short interlude

At the heart of C++ template metaprogramming and code obfuscation lies the (ab)use of angle brackets. This seemingly innocent token can turn the cleanest looking code into the mess that template-detractors like so much to complain about.

C++0x doesn't do much to clean up this mess, it's probably impossible, but it does offer a subtle feature to improve the legibility of C++ template code, a nifty little feature we have (inadvertently) used.

Before C++0x, two consecutive closing angle brackets (>>) were always parsed as the right-shift operator (the one cin uses), so nested templates written without a space in between produced a lot of compiler errors. C++0x corrects this, meaning that code which in C++03 had to be written like this:

Reduce<Sum, Lst<Num<2>, Lst<Num<4>, Lst<Num<6>, Lst< Num<8> > > > > >

Can now be written like this:

Reduce<Sum, Lst<Num<2>, Lst<Num<4>, Lst<Num<6>, Lst< Num<8>>>>>>
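A tiny sketch of the same rule outside the Reduce example, assuming a compiler in C++11 mode. Box and Sixteen are made-up names for illustration only. It also shows the one corner where you still have to help the parser: a real shift inside a template argument needs parentheses.

```cpp
#include <type_traits>

// Hypothetical wrapper, only here to get some nesting going
template <typename T> struct Box { typedef T type; };

// C++03 needed "Box<Box<Box<int> > >"; C++11 accepts the compact form
typedef Box<Box<Box<int>>> Nested;

template <int N> struct Num { static const int value = N; };

// A genuine right shift inside a template argument must be parenthesized,
// otherwise the first > of >> would close the argument list
typedef Num<(256 >> 4)> Sixteen;

static_assert(std::is_same<Nested::type::type::type, int>::value,
              "three levels of Box unwrap to int");
static_assert(Sixteen::value == 16, "256 >> 4 is 16");
```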

Aaand, back to the normal schedule...

Tuesday, 3 May 2011

Cool C++0X features IV: Variadic templates again

Last time we finally solved the varargs problem. Let's review what we learned:

- Variadic templates let us create something receiving a variable set of arguments
- We can process the head of that set, then recursively process the tail
- It adds weird new syntax
  - When declaring typename... T you are saying "here goes a list of types"
  - When declaring T... t you are saying t is a list of objects, each of a possibly different type
  - When you write t..., you are saying "expand the list of arguments"
- It's type safe
- It's very neat to confuse your coworkers

So, what can we do with it besides implementing our own version of printf? Let's do something better, let's try adding up a list of numbers to start flexing our variadic templatefooness (?).

What's the usual way of adding a list of numbers? In templates, that is. We need something like this:

sum (H:T) <- H + sum(T)
sum () <- 0

Of course, in C++ templates you don't have values, you just have types. We could implement it like this (if this looks like a new language you may want to check my template metaprogramming series):

#include <iostream>

struct Nil{};
template <typename H, typename T=Nil> struct Lst {
	typedef H Head;
	typedef T Tail;
};

template <
		template<typename A, typename B> class Op,
		typename Head,
		typename Lst>
struct intForeach
{
	typedef typename intForeach
		< Op, typename Lst::Head, typename Lst::Tail >::result Next;
	typedef typename Op< Head, Next >::result result;
};

template <
		template<typename A, typename B> class Op,
		typename Head>
struct intForeach <Op, Head, Nil>
{
	typedef Head result;
};

template <
		typename Lst,
		template<typename A, typename B> class Op>
struct Reduce
{
	typedef typename intForeach
		< Op, typename Lst::Head, typename Lst::Tail >::result result;
};

template <int N> struct Num {
	const static int value = N;
};

template <typename A, typename B> struct Sum {
	static const int r = A::value + B::value;
	typedef Num<r> result;
};

int main() {
	std::cout << Reduce<
		Lst<Num<2>, Lst<Num<4>, Lst<Num<6>, Lst< Num<8> > > > >,
		Sum >::result::value << "\n";
	return 0;
}

Nothing too fancy, plain old recursion with a sum. Yet it's quite verbose; can we make this a little more terse and, hopefully, clearer? Yes, we can. Take a look at that Lst<..., Lst<...> > nesting. It sucks. And it's the perfect place to use variadic templates: we just need a structure that takes a list of types (our Num<N> wrappers), like this:

template <
	// The operation we wish to apply
	template<typename A, typename B> class Op,
	// Current element to process
	class H,
	// All the rest
	class... T>
struct Reduce_V
{
	// TODO
};

That one should look familiar from last time's article. Now, to implement a reduce operation we need to combine the current element with the result of reducing the tail, so we have to do something like this:

	// Remember how T... means to expand T for the next instance
	typedef typename Reduce_V<Op, T...>::result Tail_Result;

There's something missing. Can you see what? The ending condition, of course. Let's add it and we'll get something like this:

template <
        // The operation we wish to apply
        template<typename A, typename B> class Op,
        // All the rest
        class... T>
struct Reduce_V
{
};

template <
        // The operation we wish to apply
        template<typename A, typename B> class Op,
        // The only element left: this is the ending condition
        class H>
struct Reduce_V<Op, H>
{
	typedef H result;
};
 
template <
        // The operation we wish to apply
        template<typename A, typename B> class Op,
        // Current element to process
        class H,
        // All the rest
        class... T>
struct Reduce_V<Op, H, T...>
{
	// Remember how T... means to expand T for the next instance
	typedef typename Reduce_V<Op, T...>::result Tail_Result;

	// Reduce current value with the next in line
	typedef typename Op<H, Tail_Result>::result result;
};

And using it is very simple too:

std::cout << Reduce_V< Sum, Num<1>, Num<2>, Num<3>, Num<4>>::result::value << "\n";
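For reference, here is how the pieces could be assembled into one self-contained file. Take it as a sketch rather than the canonical version: the primary template is only declared instead of given an empty body, and Max is a made-up second operation added just to show that Reduce_V doesn't care which Op it gets. The results are checked with static_assert, so the compiler itself confirms them.

```cpp
template <int N> struct Num {
	static const int value = N;
};

template <typename A, typename B> struct Sum {
	typedef Num<A::value + B::value> result;
};

// Hypothetical extra operation: keep the bigger of the two numbers
template <typename A, typename B> struct Max {
	typedef Num<(A::value > B::value) ? A::value : B::value> result;
};

// Primary template: declared only, the specializations do the work
template <template<typename A, typename B> class Op, class... T>
struct Reduce_V;

// Ending condition: a single element reduces to itself
template <template<typename A, typename B> class Op, class H>
struct Reduce_V<Op, H> {
	typedef H result;
};

// General case: reduce the tail, then combine it with the head
template <template<typename A, typename B> class Op, class H, class... T>
struct Reduce_V<Op, H, T...> {
	typedef typename Reduce_V<Op, T...>::result Tail_Result;
	typedef typename Op<H, Tail_Result>::result result;
};

static_assert(Reduce_V<Sum, Num<1>, Num<2>, Num<3>, Num<4>>::result::value == 10,
              "1 + 2 + 3 + 4");
static_assert(Reduce_V<Max, Num<3>, Num<9>, Num<4>>::result::value == 9,
              "max of 3, 9, 4");
```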

Next time we'll see another example for variadic templates and a new C++0x feature.

Tuesday, 26 April 2011

Cool C++0X features III: Variadic templates, a fix for varargs

Last time we saw why a function with varargs may bring lots of problems. Then we saw how to solve it, but never explained why that last solution doesn't have the problems the varargs solution had, nor how it works. Let's start by copying the solution here:

#include <iostream>

// Stop condition
void println() {}

// General case
template <typename H, typename... T>
void println(H p, T... t)
{
   std::cout << p;
   println(t...);
}

int main() {
   println("Hola", " mundo ", 42, '\n');
   return 0;
}

It certainly looks much better than the varargs function, even though some new strange syntax has been introduced. Keep in mind some template-foo is required, not only because of the syntax but because we'll be talking about functional programming too.

With all that intro (the last 2 articles were just an intro!) we are now in good shape to ask what a variadic template really is. In its simplest form, it's just a list of template arguments, like this:

template <typename... T> void foo(T... t) {}

That simple template can accept as many parameters as you need, of any type. This is much safer than a vararg because:

- No argument count to pass: the user never tells foo how many args there are, so that count just can't get out of sync.
- It's typesafe: since C++ templates are type-safe, variadic templates are type safe too. You won't be able to request an int where a char was passed, you'll just get a compiler error.
- Compile time check: everything is resolved while compiling, so a mistake shows up as a compiler error (albeit a little cryptic) instead of a crash.
- Non-POD types are supported: each argument is passed as an ordinary, typed function argument.
- Better performance: a small gain, but a gain indeed. Since this is all done at compile time there's no need to walk the stack dynamically, nor to loop over the args. It's all known when you compile, thus the compiler can just optimize the hell out of everything.
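To make the "can't get out of sync" point concrete, here is a minimal sketch, assuming a compiler with full C++11 support. count_args is a made-up helper, and sizeof... is a standard operator this series hasn't introduced: it simply asks the compiler how many elements the list has.

```cpp
#include <cstddef>

// Hypothetical helper: the compiler counts the arguments for us,
// so there is no hand-written counter to keep in sync
template <typename... T>
constexpr std::size_t count_args(T...) {
	return sizeof...(T);
}

static_assert(count_args() == 0, "empty list");
static_assert(count_args(1, 'a', "three") == 3, "an int, a char and a string");
```

Compare that with va_println(3, ...), where the 3 is whatever the caller remembered to type.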

Pretty neat, huh? But how does it work? Variadic templates are actually very similar to how Haskell handles lists, you get all the arguments as a list of types in which you can either get the head or the tail. To do something useful, get the head and continue processing the tail recursively.

template <typename H, typename... T>
void do_something(H h, T... t)
{
	// Do something useful with h
	really_do_something(h);
	// Continue processing the tail
	do_something(t...);
}

Of course, you'll eventually need a condition to stop processing: (we'll explain the new syntax later)

void do_something()
{
	// Do nothing :)
}

When the list is completely processed the empty do_something function will be called. Easy, right? But it does have a lot of weird syntax. Let's see what each of those ellipses means:

- When declaring typename... T you are saying "here goes a list of types". That is, when you use ellipses after the typename (or class) keyword but before the name of the type, you are expecting a list of types there.
- When declaring T... t you are saying t is a list of objects, each of a possibly different type. That is, you declared T as a list of types, therefore t, declared as T... t, is a list of objects, one for each of those types.
- When you write t..., you are saying "expand the list of arguments". You declared t as a list of objects, but you have no way of accessing each of those objects, just the list as a whole. When you write the name of the object followed by ellipses, you are saying "expand these objects, with their types, as arguments for the called function".
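The three uses of the ellipsis, side by side in a small sketch. add is a made-up function that sums values (not types), and constexpr is used only so the result can be checked at compile time; it assumes a compiler with full C++11 support.

```cpp
// Stop condition: an empty list adds up to zero
constexpr long add() { return 0; }

template <typename H, typename... T>   // (1) "here goes a list of types"
constexpr long add(H h, T... t)        // (2) t is a list of objects
{
	return h + add(t...);              // (3) expand the list of arguments
}

static_assert(add(1, 2, 3) == 6, "1 + 2 + 3");
static_assert(add(1, 2L, short(3)) == 6, "the arguments can have different types");
```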

With all that in mind, let's put together our typesafe printf:

#include <iostream>

// Condition to stop processing
void println() {}

// Println receives a list of arguments. We don't know their types nor
// how many there are, so we just get the head and expand the rest
template <typename H, typename... T>
void println(H p, T... t)
{
	// Do something useful with the head
	std::cout << p;
	// Expand the rest (pass it recursively to println)
	println(t...);
}

int main() {
	// See how it works even better than varargs?
	println("Hola", " mundo ", 42, '\n');
	return 0;
}

Next time, we'll see a more complex (and fun) example of variadic templates.

Tuesday, 19 April 2011

Cool C++0X features II, Variadic templates: What's wrong with varargs

Last time we explained what variadic templates are. We'll see what they can do now. We mentioned that solving the problem of having a type-safe varargs is one of the best ways of applying variadic templates, but what's varargs?

Varargs functions (from C world, not even from C++!) are functions which have a variable number of arguments, just like printf. These are usually very dangerous functions, since they are not typesafe. Let's see how they are implemented with an example:

#include <stdarg.h>
#include <iostream>

// My god, it's full of bugs
void va_println(int args_left, ...) {
   va_list arg_lst;
   va_start(arg_lst, args_left);

   while(args_left--) {
      const char *p = va_arg(arg_lst, const char*);
      std::cout << p;
   }

   va_end(arg_lst);
}

int main() {
   va_println(3, "Hola ", "mundo", "\n");
   return 0;
}

This implementation of a function with variable arguments is, more or less, the best C can give us, yet it is riddled with bugs and hidden problems. Let's go one by one:

- Arg num will get out of sync: You need to specify the list of args as well as how many you have. That WILL get out of sync. Trust me, it's just a matter of time. And when it does, you'll have a coredump.
- Type-unsafe: You just tell varargs "Hey, get me an int". And it will give you an int, no warranties included. If it was supposed to be a short instead, tough luck, you end up with a coredump.
- No, really, coredump: Where are so many coredumps coming from, you may ask. Easy, varargs is just a way of walking the stack. Calling va_arg just moves a pointer into the argument area by the size of the datatype you requested. That means no compile-time checks are included.
- No non-POD types: Remember POD types? Try running this code:

#include <stdarg.h>

struct X { virtual ~X(){} };

void va_println(int args_left, ...) {
   va_list arg_lst;
   va_start(arg_lst, args_left);

   while(args_left--) {
      X *p = va_arg(arg_lst, X*);
   }

   va_end(arg_lst);
}

int main() {
   X x, y, z;
   va_println(3, x, y, z);
   return 0;
}

And how do we fix it?

The fix is easy. Too easy. You just need C++0X. We will discuss why this is better next time, but just as a sneak peek:

#include <iostream>

void println() {}
template <typename H, typename... T> void println(H p, T... t) {
   std::cout << p;
   println(t...);
}

int main() {
   println("Hola", " mundo ", 42, '\n');
   return 0;
}

Remember to compile using -std=c++0x in gcc. (Thanks Hugo Arregui for correcting the POD example)

Monday, 18 April 2011

Cool C++0X features I: Intro

C++0X brings some very cool changes, and I wanted to start a series of posts regarding some of these changes, with a small explanation of each new feature (that I currently understand, at least), an example of its usage and why I think it's a cool thing. Notice these two may be mutually exclusive, some of these may just be cool but I wouldn't recommend using them on a day to day basis. An example of a very cool feature which I wouldn't normally use in a project is the one I want to write about today: variadic templates.

What's not to love about variadic templates? Its name implies (correctly) that it uses templates, and it also has a "variadic" thingy, which you can use to look smart since no one really knows what it means.

Templates themselves can quickly get complicated if used by inexperienced padawans in the art of martial C++, yet their hypnotic beauty draws every programmer to use them just like flies are drawn to fire. When used correctly they can produce very elegant code; if not for the template programmer, at least for the end user. Yet for all their power, templates in C++ have been lacking a fundamental aspect: a variable number of arguments.

There are ways to work around this limitation, like using a list of types paired with a template-paramlist-object. Sound familiar? (I know it doesn't, don't worry). You could also generate N constructors, one overload for each parameter count. The drawback is exponential compile time (as in, say, TR1). These are all hacks, which are in place only because there wasn't a safe way of passing a list of types associated with a list of arguments. This is over now with variadic templates in C++0X.
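To get a feel for what those hacks look like, here is a rough sketch of one member of the family: a template with a fixed maximum number of parameters, each defaulted to a marker type. Nil, IsNil and TypeList3 are made-up names for illustration, not code from any particular library.

```cpp
// Marker for "no type here"
struct Nil {};

template <typename T> struct IsNil      { enum { value = 0 }; };
template <>           struct IsNil<Nil> { enum { value = 1 }; };

// Pre-variadic emulation: room for at most three types. Supporting a
// fourth means editing this template (and everything that uses it).
template <typename A = Nil, typename B = Nil, typename C = Nil>
struct TypeList3 {
	enum { size = 3 - IsNil<A>::value - IsNil<B>::value - IsNil<C>::value };
};
```

With this, TypeList3<int, char>::size is 2 and TypeList3<>::size is 0, but TypeList3<int, char, long, short> just doesn't compile. A class... T parameter has no such ceiling.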

So, what kind of problem would variadic templates solve? Let's name a few:

- A typesafe varargs function (a function with a variable number of arguments)
- Easily create a template object which acts as a tuple
- An easier implementation of a reduce (inject) function

This entry is getting quite long so we'll start seeing these examples on the next post.

Thursday, 31 March 2011

CRTP for static dispatching

So, virtual dispatching is just too much overhead for you? I bet you do need every femtosecond from your CPU. Even if you don't, who doesn't like weird C++ constructs? Take CRTP, for example, the Curiously Recurring Template Pattern:

template <class Derived> struct CRTP {
    const char* greeting() const {
        const Derived* self = static_cast<const Derived*>(this);
        return self->greeting();
    }
};

struct Hello : public CRTP<Hello> {
    const char* greeting() const { return "Hello world"; }
};

struct Bye : public CRTP<Bye> {
    const char* greeting() const { return "Bye world"; }
};

#include <iostream>
template <class T> void print(const CRTP<T> &x) {
    std::cout << x.greeting() << "\n";
}

int main() {
    print(Hello());
    print(Bye());
    return 0;
}

Using this weird looking (ain't them all?) template device you can have static dispatching with most of the flexibility of dynamic dispatching. As a bonus, you'll drive all your cow-orkers insane! Bonus non-useful information: In C++0X you could use variadic templates and have a proxy object with static dispatching. How cool is that?

Thursday, 23 September 2010

Template Metaprogramming XV: Gemini

This is the end. My only reader, the end. After 15 chapters of template metaprogramming you should have learned why staying away from them is a good idea, but if you have been following this series then you should know now when and why they could be useful.

These posts were a compendium of mostly isolated data I found during my travels through the depths of metaprogramming tricks; there are books and people much more capable than me if you want to learn more about this subject (Modern C++ Design by Andrei Alexandrescu comes to mind).

The whole idea of having a cache and a virtual template method was nice, but after seeing the result I decided it was best to have a factory method and an IDL. It may not be so l33t, but whoever has to maintain the code after me will be grateful.

This is the last post on this topic because I feel I have written most, if not everything, I can transmit through this medium, but also for another important reason: most likely I won't be working with C++ code so much from now on [1], so there won't be as many chances for me to see the dark, insane side of this beautiful (in its own way) programming language in a programming language. I know most of you must have barely skimmed through these articles, but I still hope you enjoyed them.

[1] That's right, I'm leaving C++ for the dark side of development, I'll be working with Java from now on. Keep in mind this article may have been written a long time before it's published.

[2] Wow, it was a long time since I used the meta-post category
