Infinite monkey - Nico Brailovsky's blog: Programming


Tuesday, 14 May 2019

Say nice things

As software developers, we need to put much more emphasis on positive interactions with our peers. Engineering requires critical thinking: looking for cases where something (code!) will break. Criticizing what we do, in the hope of doing it better, is a key and necessary aspect of our profession. However, even when done properly (already a hard enough job!) this practice emphasizes negative interactions. In a normal job, what would you say is the ratio of times you hear "this might be better if..." vs "I really liked X"?

Saying "this was good" is hard. More often than not, it's hard to even notice good code (and I'd go as far as saying that noticing good things is hard in general!). Unlike criticism, positive interactions don't lead to direct improvements. No code will be enhanced by saying "I liked this solution", though people may be more inclined to consider criticism if the positive aspect is also noted.

In the end, maybe someone just had a slightly better day because you said something nice. That's already a small victory.

Tuesday, 15 November 2016

Ageing Stack Overflow?

Hacking away on a little side project of mine, I found myself checking Stack Overflow for implementation tips about things I don't usually work with. Android UI stuff, mostly, which apparently is a very dynamic and ever-changing ecosystem. After more than a few wasted hours, I noticed a worrying trend: on SO, answers tend to age horribly. If you are looking for "How to do X in platform Y", you may find a 4-year-old answer that solves the problem, but only for platform Y, version "ancient".

Information ageing is quite a problem on its own. The answer is still valid, and, for people working on that specific platform, probably relevant. This will make it the first answer, leaving a lot of people (like myself) frustrated because the solution won't work in newer platforms. Is there a solution? Implement some kind of ageing time-window for information? Make the date a more prominent search parameter? Explicitly specify your platform and environment's version when asking a question? I have no idea.

While Stack Overflow seems to exacerbate the issue, this is a problem even for products with a company actively maintaining their documentation. A very annoying example: looking for ways to manage the media key, I ended up on a page which (as of August 2016) points to a very outdated API (registerMediaButtonEventReceiver, in case you are wondering). If even Google encounters problems when managing documentation ageing for their own products, what can we expect of people like us, who only have a tiny fraction of that budget?

/Rant

Friday, 8 April 2016

Code and Google translate: awesomeness

Some time ago I found out one of my articles was translated to another language (yay for that, boo for not letting me know about it). To understand what my own article said, I had to use Google Translate on the site. Guess what? C++ and Google Translate can produce hilarious results:


# The include "throw.h" the extern "the C" {void seppuku () {throw statement the Exception () ; }}

Another one I liked:


the struct the Exception {};
#ifdef __cplusplus
  the extern "the C" {
#endif

void seppuku ();
# Ifdef __cplusplus
}
# endif

Now you know. Next time you're looking at some incomprehensible C++ code, run it through Google Translate. It may improve it.

Thursday, 24 March 2016

Gimple

Lately I've been toying around with gcc to learn a bit better how its optimization phases work. Understanding Gimple, the intermediate representation used by gcc, is a useful skill for this. Of course actually *understanding* it is quite an ambitious and daunting task, so it may be a bit more useful to skim through it.

Turns out that using -fdump-tree-all and -fdump-rtl-all it's possible to get a lot of interesting information on the phases the compiler follows to get your code optimized, but the sheer amount of information produced makes it rather hard to make sense of it. During the next few posts (weeks? months? probably until I satisfy my curiosity about gcc) I will be investigating the output of the -fdump options in gcc a little, to see what can be learned from it.

Tuesday, 23 June 2015

Useful predefined variables in make

I always forget about two very useful make variables, so I'll leave this here: $(COMPILE.cpp), $(LINK.cpp). It's easy: instead of writing a rule as

foo.o: foo.cpp
	g++ -c foo.cpp

you should instead write this:

foo.o: foo.cpp
	$(COMPILE.cpp) foo.cpp

COMPILE.cpp will have the default compiler you are supposed to use, and probably some helpful parameters as well. Likewise, LINK.cpp will have the linker you are supposed to use.

There are many useful predefined variables in make. Be sure to check them all by running "make -p" in a console.

Tuesday, 9 June 2015

Debugging multiple processes with gdb

If your buggy program generates lots of child processes, gdb will stay attached to the parent program and let all the children run loose. If you're having trouble finding what causes your crash this is probably not what you want: for those occasions gdb has two very helpful settings, detach-on-fork and follow-fork-mode.

By default gdb keeps debugging the parent. With "set follow-fork-mode child" it follows the children instead, and with "set detach-on-fork off" it keeps track of all processes. Must be nice to troubleshoot forkbombs with these options.

Tuesday, 28 April 2015

gdb: Print very long strings

gdb defaults are usually quite sensible and just "let you work". Sometimes, though, your project is not very sensible and you have to do weird things in gdb. An example: printing huge strings or vectors to try and reproduce a heisenbug. A lot of people get surprised at first because gdb will refuse to print very long strings, printing only the first few chars. Same for vectors. And if you have many repeating elements (e.g. "f000000000000000000000000000000bar") you might see something like "fooo0bar".

Just type these magic commands to see the whole string:

> set print repeats 0
> set print elements 0

Thursday, 9 April 2015

Code natural selection

A funny thought just came to me: if you write nice clean code, it's easy to replace it. Just plug out an object somewhere, replace it with another one implementing the same interface, run the tests. Tada! On the other hand, if you write crappy code it might be nigh impossible to replace it. It will probably be worked around whenever a change is needed, simply adding layers of crust. Maybe that's why legacy code sucks: it's simply code natural selection - and the fittest to survive is simply the crappiest one. I think I'm depressed now.

Tuesday, 22 October 2013

Some gratuitous MSVC bashing

Recently I found out Microsoft's Visual Studio doesn't support alternative tokens (i.e. "and" instead of "&&"). Even worse than that, apparently they don't think it's even necessary. And by the looks of this thread, the people working on MSVC need to take some time to actually READ the C++ standard. You know... it's kind of like a spec for your product. It's always good to take some time to understand the specs for your product...

I can only imagine how incredibly ugly their lexer must be to say it's not a fixable problem.

Thursday, 5 September 2013

Stopping commits on git

Who hasn't committed debug code by mistake? It's only normal to forget to remove an #include we added only to test some stuff. Luckily it's easy to tell git that we don't want to commit any changes with a certain string.

On any (git) repo you'll find a .git/hooks folder; add this script in .git/hooks/pre-commit (and don't forget to chmod +x it):

#!/bin/sh

# Look only at staged changes: those are what the commit will contain
if git diff --cached | grep -q "STOPCOMMIT"; then
    echo "Error: STOPCOMMIT found, remove it before committing"
    git diff --cached
    exit 1
fi

Now git will check your commits and stop them if they contain the STOPCOMMIT string. You can add all the debug changes you want: as long as you add a //STOPCOMMIT after them you'll never end up committing them by mistake.
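Hooks can be any executable, so the same idea can be sketched in Python. This is only a variant under a few assumptions: it looks at staged changes (git diff --cached), it only complains about lines the commit would add, and the helper name added_lines_with_marker is made up for this example.

```python
#!/usr/bin/env python3
import subprocess

MARKER = "STOPCOMMIT"


def added_lines_with_marker(diff_text, marker=MARKER):
    """Return the added lines of a unified diff that contain the marker."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; lines starting with "+++" are file
        # headers (this simple check also skips added lines that begin "++").
        if line.startswith("+") and not line.startswith("+++") and marker in line:
            hits.append(line[1:].strip())
    return hits


def main():
    """Return 1 (stop the commit) if the staged diff adds the marker, else 0."""
    # --cached limits the diff to what is staged for this commit
    diff = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = added_lines_with_marker(diff)
    if hits:
        print("Error: %s found, remove it before committing:" % MARKER)
        for hit in hits:
            print("    " + hit)
        return 1
    return 0
```

To use it as the hook, save it as .git/hooks/pre-commit, end the file with a call to sys.exit(main()) (after an import sys), and chmod +x it as before.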

Thursday, 22 August 2013

Crazy git error

Have you ever run into this error message on git before?

fatal: example.com/repo.git/info/refs not found: did you run git update-server-info on the server?

It can be very baffling, because it may happen even if you change absolutely nothing in your git's configuration. I've read most people attribute this to a typo, and that seems to be the most common case, but I found yet another thing that might trigger this error: if you have set a proxy server, for example for wget, using an environment variable like http_proxy, https_proxy or ftp_proxy then git might be tripping up on your proxy and producing this error message.
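A quick way to check whether the proxy is the culprit is to run git once with those variables removed. Here's a small Python sketch of that; the function names are made up for this example, and matching the variable names in any letter case is my assumption, not something git documents in this form.

```python
import os
import subprocess

# The variables mentioned above, compared case-insensitively
PROXY_VARS = ("http_proxy", "https_proxy", "ftp_proxy")


def env_without_proxies(env=None):
    """Copy an environment mapping, leaving out the proxy variables."""
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k.lower() not in PROXY_VARS}


def run_git_without_proxy(args):
    """Run `git <args>` with the proxy variables stripped from its environment."""
    return subprocess.run(["git"] + list(args), env=env_without_proxies())
```

For example, run_git_without_proxy(["fetch"]) inside the repo: if the error goes away, the proxy settings are the likely cause.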

Tuesday, 13 August 2013

Git tip: auto update your ctags

On any (git) repo you'll find a .git/hooks folder; add this script in .git/hooks/post-merge (and don't forget to chmod +x it):

#!/bin/sh
ctags -R -f .ctags .

Now every time you do a git pull your ctags file will automagically update. You might also want to copy or ln -s this script for the post-commit hook, if you want to run a ctags update on each git commit. Be aware that this will make your commits slower, if generating your tags file takes a long time.

Extra tip: "-f .ctags" will make ctags write into a hidden file, .ctags, which you can then add to .gitignore. Now ctags magically works in Vim and you won't even need to see your tags file (just don't forget to "set tags=./.ctags;/" on vim).

Tuesday, 16 July 2013

Counting lines per second with bash

The other day I wanted to quickly monitor the status of a production nginx after applying some iptables rules and changing some VPN stuff. It's easy to know if you completely screwed up the server: the number of requests per second will drop to zero, all requests will have an HTTP status different from 200, or there will be some other dramatic and easy to measure side effect.

What happens if you broke something in a slightly more subtle way? Say, you screwed up something in ipsec (now, I wonder how that can happen...) and now networking is slow. Or iptables now enforces some kind of throttling in a way you didn't expect. To detect this type of error I wrote a quick bash script to output how many lines per second are added to a file. This way I was able to check that the throughput of my nginx install didn't decrease after my config changes, without installing a full-fledged solution like zabbix.

I didn't find anything like this readily available, so I'm posting it here in case someone else finds it useful.

#!/bin/bash

# Time between checks
T=5

# argv[1] will be the file to check
LOG_FILE="$1"

while true; do
    tmp=$(mktemp)
    # tail a file into a temp. -n0 means don't output anything at the start so
    # we can sleep $T seconds and we don't need to worry about previous entries
    tail -n0 -f "$LOG_FILE" > "$tmp" 2>/dev/null &
    sleep "$T"
    # $! is still the pid of the backgrounded tail
    kill $! > /dev/null 2>&1
    echo "Requests in $LOG_FILE in the last $T seconds: $(wc -l < "$tmp")"
    rm "$tmp"
done
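If spawning tail and a temp file on every round bothers you, the same measurement can be sketched in Python by remembering the file size between checks. This is a variant, not a port of the script: count_new_lines and monitor are names made up for this example, and treating a shrinking file as "rotated, start from zero" is my assumption.

```python
import os
import time


def count_new_lines(path, offset):
    """Count the newlines written to `path` after byte `offset`.

    Returns (line_count, new_offset) so the caller can pass new_offset back in.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)   # jump to the end to learn the current size
        size = f.tell()
        if size < offset:        # file shrank: assume it was truncated or rotated
            offset = 0
        f.seek(offset)
        data = f.read(size - offset)
    return data.count(b"\n"), size


def monitor(path, interval=5):
    """Print how many lines were appended to `path` every `interval` seconds."""
    offset = os.path.getsize(path)   # ignore whatever is already in the file
    while True:
        time.sleep(interval)
        lines, offset = count_new_lines(path, offset)
        print("Requests in %s in the last %d seconds: %d" % (path, interval, lines))
```

Call monitor("/path/to/access.log") and stop it with Ctrl-C, same as the bash version.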

Thursday, 11 July 2013

Starting an EMR job with Boto

I've noticed there are not many articles about boto and Amazon web services. Although boto's documentation is quite good, it lacks some practical examples. Most specifically, I found quite a fair amount of RTFM was needed to get an elastic map reduce job started on Amazon using Boto (and I did it from Google app engine, just to go full cloud!). So here it goes, a very basic EMR job launcher using boto:

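from boto.emr.connection import EmrConnection
from boto.emr.step import JarStep
from boto.regioninfo import RegionInfo

zone_name = 'eu-west-1'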
access_id = ...
private_key = ...

# Connect to EMR
conn = EmrConnection(access_id, private_key,
                    region=RegionInfo(name=zone_name,
                    endpoint= zone_name + '.elasticmapreduce.amazonaws.com'))

# Create a step for the EC2 instance to install Hive
args = [u's3://'+zone_name+'.elasticmapreduce/libs/hive/hive-script',
            u'--base-path', u's3://'+zone_name+'.elasticmapreduce/libs/hive/',
            u'--install-hive', u'--hive-versions', u'0.7.1']
start_jar = 's3://'+zone_name+ \
            '.elasticmapreduce/libs/script-runner/script-runner.jar'
setup_step = JarStep('Hive setup', start_jar, step_args=args)

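# Create a jobflow using the connection to EMR and specifying the
# Hive setup step (log_bucket and keep_alive are not defined in this
# snippet; they are assumed to be set up elsewhere)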
jobid = conn.run_jobflow(
                    "Hive job", log_bucket.get_bucket_url(),
                    steps=[setup_step],
                    keep_alive=keep_alive, action_on_failure='CANCEL_AND_WAIT',
                    master_instance_type='m1.medium',
                    slave_instance_type='m1.medium',
                    num_instances=2,
                    hadoop_version="0.20")

# Set the termination protection, so the job flow won't be killed after the
# script is finished (that way we can reuse the instance for something else).
# Don't forget to shut it down when you're done!
conn.set_termination_protection(jobid, True)

s3_url = 'Link to a Hive SQL file in S3'
args = ['s3://'+zone_name+'.elasticmapreduce/libs/hive/hive-script',
        '--base-path', 's3://'+zone_name+'.elasticmapreduce/libs/hive/',
        '--hive-versions', '0.7.1',
        '--run-hive-script', '--args',
        '-f', s3_url]

start_jar = 's3://'+zone_name+'.elasticmapreduce/libs/script-runner/script-runner.jar'
step = JarStep('Run SQL', start_jar, step_args=args)
conn.add_jobflow_steps(jobid, [step])

Thursday, 4 July 2013

My own gdb cheatsheet, just because

Gdb is the de facto tool for debugging applications on GNU/Linux. The first time you see it, it would appear to be a very simple application with very limited capabilities. Truth is, gdb is a very complex tool for a very difficult job, and becoming a proficient user can be a daunting task. To top it off, gdb graphical interfaces don't help at all when using it, so you are better off learning how to use it in console mode.

There are a ton of guides to learn the basics of gdb, so I'll just leave here a very quick list on the very basics needed to start understanding it:

Running stuff

Start your debugging session with "gdb $path_to_app"
If you have a core dump you need to analyze, start it as "gdb $path_to_app $path_to_core"
Don't forget to 'ulimit -c unlimited' if you want to get core files
Don't forget to compile with debug symbols ("-g3")
Are you using gcc? Then instead of -g3 use -ggdb

Breaking stuff

Set breakpoints by typing "break"
Break on functions by typing "break 'Namespace::Class::InnerClass::function(overload_t)'"
When breaking on function's names, use tab's autocompletion. It's your best friend (don't forget the quotes in the function's name, otherwise the double colon symbol will break the autocompletion)
You can also "break filename.cpp:line_number"
Start the show by typing "run"

Viewing the source

"list" will show the source code for your current location
"list foo" will show the source code for function foo
"list *0x080483c7" will list the source code for whatever there is at address 0x080483c7
Replace list with disassemble for extra fun
"disassemble /r ..." will additionally print a hex dump
"disassemble /m ..." will also interleave the original source

While running

step will continue execution until next line
stepi will continue execution until next assembly instruction
next will continue execution until next line, skipping function calls (i.e. won't step into another function)
continue will run until the next breakpoint

Inspecting stuff

'print x' will print an expression. You can print pretty much any valid C/C++ expression.
"print *0x080483b4" will print whatever there is at 0x080483b4
"info locals" will print local vars
"info registers" will print cool stuff
"backtrace", bt to its friends, will print the current call stack.

This cheatsheet is far from being "advanced stuff" but it should be enough to get you started. The rest is practice.

Tuesday, 2 July 2013

A tardis in gdb? Reverse a program's execution!

Have you ever been running a long debug session only to find you missed the spot by overstepping? I sure have and that can be one of the strongest motivators to invent a time machine. And it seems I'm not the only one who thinks so, given that gdb can now travel back in time. That's right, you can save a snapshot of a running program and then reverse the polarity to go back in time, just before you missed your breakpoint!

It's very simple to use too, you don't need six people to use this feature. Just type "checkpoint" in gdb to let it know you want to record the execution's state, then "restart N" to go back in time. I've recorded a sample debugging session:

(gdb) list 
1	int main()
2	{
3	    int a = 1;
4	    int b =2 ;
5	    a = b;
6	    b = 42;
7	    return 0;
8	}

(gdb) run
Breakpoint 1, main () at test.cpp:3
3	    int a = 1;
(gdb) n
4	    int b =2 ;
(gdb) p a
$1 = 1

Next, create a checkpoint:

(gdb) checkpoint 
checkpoint: fork returned pid 29192.

Interesting: a checkpoint is actually implemented as a fork. Moving on:

(gdb) n
5	    a = b;
(gdb) n
6	    b = 42;
(gdb) p a
$2 = 2

Ohnoes! We overstepped. Let's go back:

(gdb) restart 1
Switching to process 29192
#0  main () at test.cpp:4
4	    int b =2 ;
(gdb) p a
$3 = 1

And we're back in time.

How does it work

Reversing to a previous execution state is not an easy task. Gdb implements this feature by forking out a new process, a process we can later switch to. This means that reverting to a previous state might break things. The way forking is implemented in Linux, things like open files shouldn't be much of a problem. Sockets should still be connected but, of course, whatever you already sent won't be "unsent".

Gdb internals docs have some useful information on the limitations of this feature.
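To get a feel for why a fork works as a checkpoint, here's a rough Python analogy (POSIX only, and in no way gdb's actual implementation): the forked child keeps a copy of the state as it was at the checkpoint, while the parent keeps running and oversteps. The name checkpoint_demo and the pipe plumbing are made up for this sketch.

```python
import os


def checkpoint_demo():
    """Return (value in the running process, value kept by the checkpoint)."""
    a = 1
    wake_r, wake_w = os.pipe()     # parent -> child: "switch to the checkpoint"
    back_r, back_w = os.pipe()     # child -> parent: the value the child kept

    pid = os.fork()                # the "checkpoint": the child copies our state
    if pid == 0:
        os.close(wake_w)
        os.close(back_r)
        os.read(wake_r, 1)         # sleep until someone asks for the checkpoint
        os.write(back_w, str(a).encode())
        os._exit(0)                # leave without running the parent's cleanup

    os.close(wake_r)
    os.close(back_w)
    a = 2                          # the parent keeps going and oversteps
    os.write(wake_w, b"x")         # the "restart": wake the saved copy
    snapshot = int(os.read(back_r, 16).decode())
    os.waitpid(pid, 0)
    os.close(wake_w)
    os.close(back_r)
    return a, snapshot
```

The parent ends up with a == 2 while the child still reports the 1 it had when the fork happened. It also shows the limitation mentioned above: anything the parent already sent to the outside world stays sent.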

Tuesday, 25 June 2013

Watchpoints in gdb: wake me up when foo changes

I've noticed a lot of people claim gdb is not a good debugger because it doesn't support feature X. Often X is the ability to monitor changes to a memory location (i.e. when the value of a variable changes). Most times, though, people believe gdb doesn't implement X only because not enough time was spent reading its manual.

In gdb it's very easy to monitor variable changes using watchpoints. Here's a very simple example session:

(gdb) list 
1	int main()
2	{
3	    int a = 1;
4	    int b;
5	    a = b;
6	    b = 42;
7	    return 0;
8	}

Of course we need to be in the proper scope to set a watchpoint:

(gdb) run
Breakpoint 1, main () at test.cpp:3

Let's try to catch when b changes value:

(gdb) watch b
Hardware watchpoint 2: b

Interesting: a hardware watchpoint was set. What might that be?

(gdb) continue
Hardware watchpoint 2: b
  Old value = 0
  New value = 42
main () at test.cpp:7

Nice! gdb alerted us to the value change by breaking program execution. This can come in handy to fix race conditions.

Hardware and software watchpoints

Gdb will use hardware watchpoints if the underlying platform provides them; that means your architecture should provide some kind of hook for gdb to be alerted when a memory write at a certain address occurs. Hardware watchpoints are quite easy to use, relatively speaking, but not all platforms support them. In that case gdb will use software watchpoints, which are quite expensive and slow. Did you ever try to run a program by pressing "step" continuously? Well, a software watchpoint is similar, gdb will have to execute a program step by step and check if the value has changed in between steps.
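The "step and compare" idea behind a software watchpoint can be sketched in Python with sys.settrace, which calls a function before every line much like pressing "step" over and over. This is only an analogy for how the technique works, not how gdb implements it; watch_local, UNSET and example are names made up for this sketch.

```python
import sys

UNSET = object()  # marks "the variable does not exist yet"


def watch_local(func, name):
    """Run func() and return a list of (old, new) pairs for its local `name`."""
    changes = []
    last = [UNSET]

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None            # some other function: don't step through it
        if event in ("line", "return"):
            # We get called before each line runs, so compare the variable
            # against the value we saw one step ago.
            current = frame.f_locals.get(name, UNSET)
            if current != last[0]:
                changes.append((last[0], current))
                last[0] = current
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)     # always put back whatever tracer was there
    return changes


def example():
    a = 1
    b = 0
    a = b
    b = 42
    return 0
```

watch_local(example, "b") reports b appearing with value 0 and then changing from 0 to 42. The price is the same as in gdb: every single line of the watched function now goes through the tracer.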

As usual, gdb's manual has a lot more info.

PS: Once you find your bug with the aid of a watchpoint, please go and read some books about encapsulation!

Thursday, 20 June 2013

Detecting and ignoring third party memory problems with Valgrind

Lots of people seem to give up on Valgrind after they see the dreaded "More than ### errors detected, go and fix your program". If the bulk of these errors are caused by crappy code in third party libraries there's very little to be done to fix them, other than creating a ticket for the library maintainer (and if the bulk of these errors are caused by your own code... well, don't write a watchdog please, do fix your program!). And that's assuming the reported error is not a false positive, since Valgrind can report problems for crazy optimizations -O3 might apply or for weird pointer arithmetic.

If these spurious memory errors are there for too long most people will start ignoring Valgrind's output. Luckily, ignoring errors we can't fix is a possibility too, using Valgrind's ignore files (Valgrind itself calls them suppression files, and loads them with --suppressions=<file>).

Check if someone else has already found this issue. Many times libraries do have an "official" ignore file for the lib.
If you find no ignore file, make really really sure the problem is not in your code. Preferably write a minimal unit test that triggers the warning on Valgrind. Make sure you're not misusing the library.
Add whatever warnings you found which were not in your application to a new ignore file (running Valgrind with --gen-suppressions=all prints ready-made entries).
Share your ignore file with the world! Other people will either find it useful or tell you that what you thought was a bug in a lib is actually a problem in your code. That happens more often than not.

Thursday, 7 March 2013

Hive speedup trick

I've been playing around with Hive on top of Hadoop using AWS lately, but until recently I only knew about optimizing your query for better data-crunching throughput. Turns out you can also parallelize the subqueries execution, but this feature is not enabled by default.

Try running an explain on your query: if you have many root stages without dependencies then run this magic command: "set hive.exec.parallel=true", and then try your query again. If everything worked out fine you should be running multiple stages in parallel. Use hive.exec.parallel.thread.number to control exactly how many stages to run in parallel.

Thursday, 21 February 2013

Bebugging / Fault injection

Releasing any software with a degree of confidence in its quality can be a difficult task. A way to improve this confidence is adding tests. Easy enough, but then how do you know if you are actually testing what needs to be tested? Metrics like code coverage are very helpful but they can also provide a false sense of security that can be even worse than having no unit testing at all.

One way of determining how reliable your test suite is, is to test your test suite. No, not by writing more tests but by writing more bugs!

The idea is simple: give your code to someone who is not a programmer on the project, someone who knows nothing about the implicit assumptions and preconditions you and your team mates already have about the code, the same assumptions and preconditions necessary to keep stuff from crashing. Ask them to break things. Nothing too fancy, only subtle stuff; a memory leak over there, a signed vs unsigned comparison over here, an equals changed to a not-equals in an if (and please do it in a branch!).

Once you get a faulty branch of your project, try to see how many of the bugs you can detect using your test suite (yes, valgrind should probably be part of your test suite, albeit not as a unit test). No diffs, please.

Seeing how many bugs go unnoticed on a fault injection session can give you an idea of how comprehensive your unit tests are. It can be a very humbling experience, too.
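A toy version of such a session, sketched in Python: one function, one injected fault, and two test suites of different quality. Everything here (is_valid_port, the suites, detects_fault) is made up for illustration.

```python
def is_valid_port(port):
    """Original code: TCP ports go from 1 to 65535."""
    return 0 < port <= 65535


def is_valid_port_bebugged(port):
    """Injected fault: "<=" became "<", so port 65535 is now rejected."""
    return 0 < port < 65535


def weak_suite(fn):
    """Only checks values far away from the limits."""
    return fn(80) and not fn(0) and not fn(70000)


def strict_suite(fn):
    """Also checks both ends of the valid range."""
    return weak_suite(fn) and fn(1) and fn(65535)


def detects_fault(suite, original, faulty):
    """A suite detects the fault if it passes on the original but not on the faulty copy."""
    return suite(original) and not suite(faulty)
```

The weak suite passes on both versions, so the injected bug goes unnoticed; only the strict suite catches it. Counting how many injected faults end up in the first group is the humbling part.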

You can find more information about bebugging in Wikipedia.
