Tuesday 30 July 2013

Force a program to output to stdout

Silly but handy CLI trick on Linux: some programs don't have an option to write their output to stdout. Gcc comes to mind. In that case the symlink /dev/stdout will come in handy: for each process, /dev/stdout resolves to that process's own standard output.

With this trick you could, for example, run "gcc -S foo.cpp -o /dev/stdout" to get the assembly listing for foo.cpp.
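
For instance (foo.cpp being any source file you have around):

# On most Linux systems /dev/stdout is a symlink to /proc/self/fd/1,
# which is why it resolves to each process's own stdout:
ls -l /dev/stdout

# Pipe the assembly listing straight into a pager, no temp file needed:
gcc -S foo.cpp -o /dev/stdout | less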

You probably shouldn't use this trick for anything other than CLI scripting (keep in mind /dev/stdout might be closed or not accessible for some processes).

Thursday 25 July 2013

C++ exceptions under the hood appendix III: RTTI and exceptions orthogonality

Exception handling in C++ requires a lot of reflection. I don't mean the programmer should be reflecting on exception handling (though that's probably not a bad idea); I mean that a piece of C++ code should be able to understand things about itself. This sounds a lot like run-time type information, RTTI. Are they the same? If they are, does exception handling work without RTTI?

We might be able to get a clue about the difference between RTTI and exception handling by using -fno-rtti on gcc when compiling our ABI project. Let's use the throw.cpp file:

g++ -fno-rtti -S throw.cpp -o throw.nortti.s
g++ -S throw.cpp -o throw.s
diff throw.s throw.nortti.s

If you try that yourself you should see there's no difference between the RTTI and the no-RTTI versions. Can we conclude, then, that gcc's exception handling is done with a mechanism different from RTTI? Not yet; let's see what happens if we try to use RTTI ourselves:

// Exception here is the class defined in throw.cpp
void raise() {
    Exception ex;
    typeid(ex);          // this is the RTTI bit gcc rejects under -fno-rtti
    throw Exception();
}

If you try to compile that, gcc will complain: you can't use typeid with -fno-rtti specified. Which makes sense. Let's see what typeid compiles into with a simple test:

#include <typeinfo>

class Bar {};

const std::type_info& foo()
{
    Bar bar;
    return typeid(bar);
}

If we compile this with "g++ -O0 -S", we'll see foo compiled into something like this:

_Z3foov:
.LFB19:
    # Prologue stuff...

    subl    $16, %esp
    # Bar bar

    movl    $_ZTI3Bar, %eax
    # typeid(bar)

    leave
    # Epilogue stuff...

_ZTS3Bar:
    # Definition for _ZTS3Bar...

_ZTI3Bar:
    .long   _ZTVN10__cxxabiv117__class_type_infoE+8
    .long   _ZTS3Bar
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

Does that look familiar? If it doesn't, then try changing the sample code to this one:

class Bar {};
void foo() { throw Bar(); }

Compile it with "g++ -O0 -fno-rtti -S test.cpp" and look at the resulting file. You should now see something like this:

_Z3foov:
    # Prologue stuff...

    # Initialize exception
    subl    $24, %esp
    movl    $1, (%esp)
    call    __cxa_allocate_exception
    movl    $0, 8(%esp)

    # Specify Bar as exception thrown
    movl    $_ZTI3Bar, 4(%esp)
    movl    %eax, (%esp)

    # Handle exception
    call    __cxa_throw

    # Epilogue stuff...

_ZTS3Bar:
    # Definition for _ZTS3Bar...

_ZTI3Bar:
    .long   _ZTVN10__cxxabiv117__class_type_infoE+8
    .long   _ZTS3Bar
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

That should indeed look familiar: the type information record for the class being thrown is exactly the same as the one generated for typeid!

We can now conclude what's going on: throwing an exception requires reflection, and the type information it relies on is exactly the same RTTI structure that backs typeid and the other RTTI friends. Specifying -fno-rtti on g++ only disables the "frontend" RTTI functions: you won't be able to use typeid, and no RTTI records will be generated... unless an exception is thrown, in which case the needed RTTI records are generated regardless of -fno-rtti being present (you still won't be able to access the RTTI information of that class via typeid, though).
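
You don't even need to read assembly to check this: compile the two-line throwing example with -fno-rtti and look for the type info symbol in the object file (a quick sketch, assuming the file is called test.cpp):

g++ -O0 -fno-rtti -c test.cpp
# _ZTI3Bar ("typeinfo for Bar") gets emitted anyway, to support the throw:
nm test.o | grep _ZTI3Bar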

Tuesday 23 July 2013

A random slideshow in Ubuntu

The other day I wanted to use my TV for a slideshow of my travel pictures. Something simple: just select a folder and have a program like Shotwell show a slideshow, in random order, on my TV. Of course, Ubuntu plus dual screens equals fail. For some reason, all the programs I tried either were incapable of using the TV as the slideshow screen (even after cloning screens... now that's a wtf) or were not able to recursively use all the pictures in a folder.

feh to the rescue. It's not pretty, but feh seems to be exactly what I was looking for. It's a CLI application for Linux and after some RTFM I came up with this script:

feh ~/Pictures \
     --scale-down \
     --geometry 1920x760 \
     --slideshow-delay 9 \
     --recursive \
     --randomize \
     --auto-zoom \
     --draw-filename \
     --image-bg black

You can probably figure out by yourself what each option means. If not, just man feh.

Thursday 18 July 2013

Mocking in C++: the virtual problem

Mocking objects is crucial for a good test suite. If you don't have a way to mock heavy objects you'll end up with slow and unreliable tests that depend on database status to work. On the other hand, mocking in C++ tends to be a bit harder than it is in dynamic languages. A frequent problem people run into when mocking is virtual methods.

What's the problem with virtual methods? C++ has a policy of "not paying for what you don't use". That means not using virtual methods is "cheaper" than using them: classes with no virtual methods don't require virtual dispatch or a vtable, which speeds things up. So, for a lot of performance-critical objects, people will try to avoid virtual methods.

How is this a problem for mocking? A mock is usually a class that inherits from the real class, as a way to get the proper interface and to be compatible with code that uses the real implementation. But if a mock inherits from the real thing, you need to define all of the mocked methods as virtual, even if you otherwise wouldn't, just so the mock can actually override them.

A possible solution

The problem is clear: we need some methods to behave as virtual, without defining them as virtual.

A solution to this problem, the one I personally choose, is using a TEST_VIRTUAL macro in the definition of each mockable method of a class; in release builds I compile with -DTEST_VIRTUAL="", and for testing builds I compile with -DTEST_VIRTUAL="virtual". This method is very simple and works fine, but it has the (very severe) downside that the release code you ship is different from the code you test; this might be acceptable, but it's a risk nonetheless.
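
A minimal sketch of the idea (Database and query are made-up names; the macro is the point):

// Release build: compile with -DTEST_VIRTUAL="" (or rely on the fallback
// below). Test build: compile with -DTEST_VIRTUAL="virtual".
#ifndef TEST_VIRTUAL
#define TEST_VIRTUAL
#endif

class Database {
public:
    TEST_VIRTUAL ~Database() {}
    TEST_VIRTUAL int query(int id) { return id * 2; /* pretend this hits a real DB */ }
};

// Only meaningful in the test build, where TEST_VIRTUAL expands to 'virtual':
class MockDatabase : public Database {
public:
    int query(int) { return 42; }  // overrides under test, merely hides in release
};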

Other possible solutions I've seen in the past are:

  • Making everything virtual, even if not strictly necessary. Quite an ugly solution, in my opinion: it can affect performance, and the code ends up stating that a method can be overridden even when that doesn't make sense.
  • Using some kind of CRTP for static dispatching: probably one of the cleanest solutions, but I think it adds too much overhead to the definition of each class (see the sketch after this list).
  • Don't make the mock inherit from the real implementation; make the user code deduce the correct type (e.g. by using templates). It's also a clean solution, but you lose a lot of type information (which might or might not be important) and it might also severely impact the build time.
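
Here's a rough sketch of the CRTP approach from the list above (the class names are, again, made up for illustration):

// CRTP sketch: the interface is a template and dispatch is static,
// so no vtable is ever created.
template <typename Impl>
class Database {
public:
    int query(int id) { return static_cast<Impl*>(this)->query_impl(id); }
};

class RealDatabase : public Database<RealDatabase> {
public:
    int query_impl(int id) { return id * 2; /* pretend this hits a real DB */ }
};

class MockDatabase : public Database<MockDatabase> {
public:
    int query_impl(int) { return 42; }
};

// The user code has to become a template too; that's part of the
// per-class overhead mentioned above:
template <typename DB>
int process(Database<DB>& db) { return db.query(1); }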

To conclude, I don't think there's a perfect solution to the virtual problem. Just choose what looks better and accept we live in an imperfect world.

Tuesday 16 July 2013

Counting lines per second with bash

The other day I wanted to quickly monitor the status of a production nginx after applying some iptables rules and changing some VPN stuff. It's easy to know if you completely screwed up the server: the number of requests per second will drop to zero, all requests will have an HTTP status different from 200, or there will be some other dramatic and easy-to-measure side effect.

What happens if you broke something in a slightly more subtle way? Say you screwed up something in ipsec (now, I wonder how that could happen...) and now networking is slow. Or iptables now enforces some kind of throttling in a way you didn't expect. To detect this type of error I wrote a quick bash script that outputs how many lines per second are added to a file. This way I was able to verify that the throughput of my nginx install didn't decrease after my config changes, without installing a full-fledged solution like Zabbix.

I didn't find anything like this readily available, so I'm posting it here in case someone else finds it useful.

#!/bin/bash

# Time between checks
T=5

# argv[1] is the file to check
LOG_FILE="$1"

while true; do
    tmp="$(mktemp)"
    # tail the file into a temp file. -n0 means don't output anything at the
    # start, so we can sleep $T seconds without worrying about previous entries
    tail -n0 -f "$LOG_FILE" > "$tmp" 2>/dev/null & sleep "$T"
    kill $! > /dev/null 2>&1
    echo "Requests in $LOG_FILE in the last $T seconds: $(wc -l < "$tmp")"
    rm "$tmp"
done
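
Run it with the log file you want to watch as its only argument; for example (the script name and log path here are just placeholders):

./lines_per_second.sh /var/log/nginx/access.log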

Thursday 11 July 2013

Starting an EMR job with Boto

I've noticed there are not many articles about boto and Amazon Web Services. Although boto's documentation is quite good, it lacks some practical examples. More specifically, I found a fair amount of RTFM was needed to get an Elastic MapReduce job started on Amazon using boto (and I did it from Google App Engine, just to go full cloud!). So here it goes, a very basic EMR job launcher using boto:

# These imports are for boto 2.x
from boto.emr.connection import EmrConnection
from boto.emr.step import JarStep
from boto.regioninfo import RegionInfo

zone_name = 'eu-west-1'
access_id = ...      # your AWS access key id
private_key = ...    # your AWS secret key

# Connect to EMR
conn = EmrConnection(access_id, private_key,
                     region=RegionInfo(name=zone_name,
                                       endpoint=zone_name + '.elasticmapreduce.amazonaws.com'))

# Create a step for the EC2 instance to install Hive
args = [u's3://' + zone_name + '.elasticmapreduce/libs/hive/hive-script',
        u'--base-path', u's3://' + zone_name + '.elasticmapreduce/libs/hive/',
        u'--install-hive', u'--hive-versions', u'0.7.1']
start_jar = 's3://'+zone_name+ \
            '.elasticmapreduce/libs/script-runner/script-runner.jar'
setup_step = JarStep('Hive setup', start_jar, step_args=args)

# Create a jobflow using the connection to EMR and specifying the
# Hive setup step. log_bucket (the S3 bucket for the EMR logs) and
# keep_alive are assumed to be defined elsewhere.
jobid = conn.run_jobflow(
                    "Hive job", log_bucket.get_bucket_url(),
                    steps=[setup_step],
                    keep_alive=keep_alive, action_on_failure='CANCEL_AND_WAIT',
                    master_instance_type='m1.medium',
                    slave_instance_type='m1.medium',
                    num_instances=2,
                    hadoop_version="0.20")

# Set the termination protection, so the jobflow won't be killed after the
# script is finished (that way we can reuse the instance for something else).
# Don't forget to shut it down when you're done!
conn.set_termination_protection(jobid, True)

s3_url = 'Link to a Hive SQL file in S3'
args = ['s3://'+zone_name+'.elasticmapreduce/libs/hive/hive-script',
        '--base-path', 's3://'+zone_name+'.elasticmapreduce/libs/hive/',
        '--hive-versions', '0.7.1',
        '--run-hive-script', '--args',
        '-f', s3_url]

start_jar = 's3://'+zone_name+'.elasticmapreduce/libs/script-runner/script-runner.jar'
step = JarStep('Run SQL', start_jar, step_args=args)
conn.add_jobflow_steps(jobid, [step])

Tuesday 9 July 2013

A coverage report for C++ unit tests

A lot of tools and metrics which are pretty much a given for some dynamic languages are quite esoteric in C++ land. Unit testing is one of these tools, and code coverage metrics are even more obscure in C++. Turns out it's not impossible: I have uploaded an example C++ project with unit tests and code coverage report generation. It shouldn't be too hard to adapt this code to your own project.

Let's analyze some of the core concepts of this example.

Unit testing

A coverage report only makes sense if you have a suite of unit/integration tests. gtest and gmock have worked the best for me, but I guess anything that can run a suite of tests will do for getting a coverage report.

Getting some coverage

gcov is a simple utility you can find on Linux to generate coverage reports. gcc has support for it: you just need to compile with "-fprofile-arcs -ftest-coverage --coverage" and link with "--coverage -lgcov". If you look at line 10 of the makefile for the example project, you'll see I define a new build type, special for the coverage report.

Once the project is built with support for gcov, running the tests will generate a bunch of stats for lcov to pick up. The makefile includes a target that takes care of all these steps: compiling the program with gcov support, running the tests, and then collecting the results into a nice HTML report.
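
If you're curious what that target boils down to, the manual steps look roughly like this (a sketch; the file and binary names are placeholders, not the ones from the example project):

# Compile and link with coverage instrumentation
g++ -fprofile-arcs -ftest-coverage --coverage -o tests tests.cpp

# Running the tests drops .gcno/.gcda stat files next to the objects
./tests

# Let lcov collect the stats and genhtml render the report
lcov --capture --directory . --output-file coverage.info
genhtml coverage.info --output-directory coverage_report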

Getting it running

Unfortunately, generating a coverage report has a lot of dependencies in C++. For the example on my github repository you'll have to install lcov, cppcheck, gtest, gmock and vera++ (a code style checker for C++ which is now discontinued... you should probably search for a replacement). Once you have it running, though, you can easily integrate it with your jenkins setup.

Thursday 4 July 2013

My own gdb cheatsheet, just because

Gdb is the de facto tool for debugging applications on GNU/Linux. The first time you see it, it appears to be a very simple application with very limited capabilities. The truth is, gdb is a very complex tool for a very difficult job, and becoming a proficient user can be a daunting task. To top it off, gdb's graphical interfaces don't help at all, so you are better off learning how to use it in console mode.

There are a ton of guides to learn the basics of gdb, so I'll just leave here a very quick list on the very basics needed to start understanding it:

Running stuff

  • Start your debugging session with "gdb $path_to_app"
  • If you have a core dump you need to analyze, start it as "gdb $path_to_app $path_to_core"
  • Don't forget to 'ulimit -c unlimited' if you want to get core files
  • Don't forget to compile with debug symbols ("-g3")
  • Are you using gcc? Then instead of -g3 use -ggdb
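
Putting those together, a typical session starts something like this (file names are placeholders):

# Build with debug symbols, allow core dumps, and start debugging
g++ -ggdb -O0 test.cpp -o test
ulimit -c unlimited
gdb ./test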

Breaking stuff

  • Set breakpoints by typing "break"
  • Break on functions by typing "break 'Namespace::Class::InnerClass::function(overload_t)'"
  • When breaking on function names, use tab autocompletion. It's your best friend (don't forget the quotes around the function name, otherwise the double colons will break the autocompletion)
  • You can also "break filename.cpp:line_number"
  • Start the show by typing "run"

Viewing the source

  • "list" will show the source code for your current location
  • "list foo" will show the source code for function foo
  • "list *0x080483c7" will list the source code for whatever there is at address 0x080483c7
  • Replace list with disassemble for extra fun
  • "disassemble /r ..." will additionally print a hex dump
  • "disassemble /m ..." will also interleave the original source

While running

  • "step" will continue execution until the next line
  • "stepi" will continue execution until the next assembly instruction
  • "next" will continue execution until the next line, skipping function calls (i.e. it won't step into another function)
  • "continue" will run until the next breakpoint

Inspecting stuff

  • "print x" will print an expression. You can print pretty much any valid C/C++ expression.
  • "print *0x080483b4" will print whatever there is at 0x080483b4
  • "info locals" will print the local vars
  • "info registers" will print cool stuff
  • "backtrace", bt for its friends, will print the current call stack.

This cheatsheet is far from being "advanced stuff" but it should be enough to get you started. The rest is practice.

Tuesday 2 July 2013

A tardis in gdb? Reverse a program's execution!

Have you ever been running a long debug session only to find you missed the spot by overstepping? I sure have, and that can be one of the strongest motivators to invent a time machine. And it seems I'm not the only one who thinks so, given that gdb can now travel back in time. That's right: you can save a snapshot of a running program and then reverse the polarity to go back in time, to just before you missed your breakpoint!

It's very simple to use, too; you don't need six people to operate this feature. Just type "checkpoint" in gdb to let it know you want to record the execution state, then "restart N" to go back in time. I've recorded a sample debugging session:

(gdb) list 
1	int main()
2	{
3	    int a = 1;
4	    int b =2 ;
5	    a = b;
6	    b = 42;
7	    return 0;
8	}

(gdb) run
Breakpoint 1, main () at test.cpp:3
3	    int a = 1;
(gdb) n
4	    int b =2 ;
(gdb) p a
$1 = 1

Next, create a checkpoint:

(gdb) checkpoint 
checkpoint: fork returned pid 29192.

Interesting: a checkpoint is actually implemented as a fork. Moving on:

(gdb) n
5	    a = b;
(gdb) n
6	    b = 42;
(gdb) p a
$2 = 2

Ohnoes! We overstepped. Let's go back:

(gdb) restart 1
Switching to process 29192
#0  main () at test.cpp:4
4	    int b =2 ;
(gdb) p a
$3 = 1

And we're back in time.

How does it work

Reversing to a previous execution state is not an easy task. Gdb implements this feature by forking out a new process, a process we can later switch to. This means that reverting to a previous state might break things. The way forking is implemented in Linux, things like open files shouldn't be much of a problem. Sockets should still be connected but, of course, whatever you already sent won't be "unsent".

Gdb's internals docs have some useful information on the limitations of this feature.