Useful info for debugging

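  • hexdump -C <file_name>. Displays the bytes of the file in hexadecimal alongside their printable ASCII characters. Very useful when an error depends on the exact contents of a string (hidden whitespace, stray control characters).
  • Keep only one breakpoint enabled when debugging a multithreaded application under GDB. If only one thread can reach that breakpoint, every stop lands in the same thread, so the execution flow you follow stays within a single thread.
  • To view the elements behind a pointer as an array, cast the dereferenced pointer to an array type of the matching size. The other way is the @ operator with the number of elements. (gdb) print (char[1024]) *ptr or (gdb) print *ptr@1024
  • (gdb) set print pretty on (prints structures one member per line with indentation, which makes nested structures easier to read)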
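
To make the hexdump tip concrete, here is a minimal Python sketch that formats bytes roughly the way hexdump -C does (offset, hex bytes, printable ASCII). The function name hexdump_c and the sample bytes are made up for illustration; the layout only approximates the real tool.

```python
def hexdump_c(data: bytes, width: int = 16) -> str:
    """Format bytes in a layout similar to `hexdump -C` (an approximation)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Hex column: two groups of eight bytes, each byte as two hex digits.
        first = " ".join(f"{b:02x}" for b in chunk[:8])
        second = " ".join(f"{b:02x}" for b in chunk[8:])
        hex_col = f"{first:<23}  {second:<23}"
        # ASCII column: printable characters as-is, everything else as '.'.
        ascii_col = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_col}  |{ascii_col}|")
    # Final line: total number of bytes, as an offset.
    lines.append(f"{len(data):08x}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical config line with a trailing space, CR LF and a NUL byte.
    print(hexdump_c(b"key=value \r\n\x00"))
```

In the hex column the trailing space shows up as 20, the carriage return as 0d and the NUL as 00, while the ASCII column only shows dots for the last three. This is the kind of detail that is invisible when you simply print the string.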

Rules for abort handling

An “Abort” is a special type of error in a system, usually injected by an external actor. In a multi-threaded application, managing abort requests becomes painful. I am sharing a few observations that could help minimize mistakes.

  • Implement one single handler for abort requests.
  • Outside the handler, if a thread is going to wait and an abort may arrive in the meantime, the thread should check for an abort as its first task once the wait ends.
  • If the abort is signalled through a flag, protect that flag with a lock.
  • Never mix the abort path with the regular path in the application. It is not wise to scatter abort-related functionality across threads.
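
The rules above can be sketched in Python as follows. This is only one possible arrangement under my own assumptions: the AbortController class, its method names and the worker function are hypothetical, and a threading.Condition plays the role of the lock guarding the abort flag.

```python
import threading


class AbortController:
    """Hypothetical helper: the only place where abort state lives."""

    def __init__(self):
        # The condition's lock guards both flags below.
        self._cond = threading.Condition()
        self._aborted = False
        self._work_ready = False

    def request_abort(self):
        """The single abort handler: set the flag under the lock, wake waiters."""
        with self._cond:
            self._aborted = True
            self._cond.notify_all()

    def post_work(self):
        """Regular path: announce that one unit of work is available."""
        with self._cond:
            self._work_ready = True
            self._cond.notify_all()

    def wait_for_work(self, timeout=None):
        """Block until work or an abort arrives; returns 'abort', 'work' or 'timeout'."""
        with self._cond:
            self._cond.wait_for(lambda: self._aborted or self._work_ready, timeout)
            # First task after the wait ends: check for abort before anything else.
            if self._aborted:
                return "abort"
            if self._work_ready:
                self._work_ready = False
                return "work"
            return "timeout"


def worker(ctl, log):
    """Worker thread: the abort path is one early return, apart from regular work."""
    while True:
        outcome = ctl.wait_for_work(timeout=5.0)
        if outcome == "abort":
            log.append("aborted")
            return
        if outcome == "work":
            log.append("work")


if __name__ == "__main__":
    ctl = AbortController()
    log = []
    t = threading.Thread(target=worker, args=(ctl, log))
    t.start()
    ctl.request_abort()  # the external actor injects the abort
    t.join()
    print(log)
```

Because the abort flag is checked before the work flag, an abort that arrives together with new work still wins, and the worker never has to know how the abort was raised.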

Debugging production system with gdb

GDB can take a core dump of a live application without killing it. This is very useful:

  • if you have a hung application (deadlocks, infinite waits)
  • if you can’t afford to stall the whole system for debugging

The “gcore” facility allows you to save the memory image of a running process to a core file.

This core file can be analyzed just like an ordinary core.
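
If you want to take such a snapshot without an interactive session, a small wrapper can drive gdb in batch mode. The sketch below is an assumption-laden illustration: the helper names gcore_command and snapshot are made up, and it relies only on gdb's -p, -batch and -ex options plus the gcore and “thread apply all bt” commands. You typically need the same privileges as for an interactive attach.

```python
import shlex
import subprocess


def gcore_command(pid, core_path):
    """Build a gdb command line that attaches, dumps all backtraces and saves a core."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "thread apply all bt",   # backtrace of every thread
        "-ex", f"gcore {core_path}",    # save the memory image to core_path
    ]


def snapshot(pid, core_path):
    """Run gdb once; it detaches from the process when it exits."""
    return subprocess.run(gcore_command(pid, core_path),
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Only print the command here; PID 4560 is the example process from this post.
    print(" ".join(shlex.quote(arg) for arg in gcore_command(4560, "core.4560")))
```

Calling snapshot(pid, path) on a real PID returns the completed process, whose stdout holds the backtraces of all threads.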

Example: The process with PID 4560 is a hung multi-threaded application.

kanaujia@ubuntu:~$ sudo gdb -p 4560

GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
This GDB was configured as "i686-linux-gnu".
Attaching to process 4560
Reading symbols from /home/kanaujia/work/a.out...(no debugging symbols found)...done.
Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb6da3b40 (LWP 4562)]
[New Thread 0xb75a4b40 (LWP 4561)]
Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0
Reading symbols from /lib/i386-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb7758e1c in pthread_join () from /lib/i386-linux-gnu/libpthread.so.0
#2  0x0804866b in main ()
(gdb) gcore
Saved corefile core.4560
(gdb) q
A debugging session is active.

    Inferior 1 [process 4560] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/kanaujia/work/a.out, process 4560
kanaujia@ubuntu:~$ ls
core.4560                   Dropbox           PDF       VirtualBox VMs

Analyse the core

kanaujia@ubuntu:~/work$ gdb ./a.out ./core.4560 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Reading symbols from /home/kanaujia/work/a.out...(no debugging symbols found)...done.
[New LWP 4561]
[New LWP 4562]
[New LWP 4560]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `a.out'.
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb775e5a2 in __lll_lock_wait () from /lib/i386-linux-gnu/libpthread.so.0
#2  0xb7759ead in _L_lock_686 () from /lib/i386-linux-gnu/libpthread.so.0
#3  0xb7759cf3 in pthread_mutex_lock () from /lib/i386-linux-gnu/libpthread.so.0
#4  0x080486e0 in functionCount1 ()
#5  0xb7757d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#6  0xb7695d3e in clone () from /lib/i386-linux-gnu/libc.so.6
(gdb) info threads
  Id   Target Id         Frame 
  3    LWP 4560          0xb7789424 in __kernel_vsyscall ()
  2    LWP 4562          0xb7789424 in __kernel_vsyscall ()
* 1    LWP 4561          0xb7789424 in __kernel_vsyscall ()

(gdb) thread
[Current thread is 1 (LWP 4561)]
(gdb) thread 2
[Switching to thread 2 (LWP 4562)]
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb775b96b in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/i386-linux-gnu/libpthread.so.0
#2  0x08048765 in functionCount2 ()
#3  0xb7757d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#4  0xb7695d3e in clone () from /lib/i386-linux-gnu/libc.so.6
(gdb) thread 3
[Switching to thread 3 (LWP 4560)]
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb7758e1c in pthread_join () from /lib/i386-linux-gnu/libpthread.so.0
#2  0x0804866b in main ()
(gdb) thread 1
[Switching to thread 1 (LWP 4561)]
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb775e5a2 in __lll_lock_wait () from /lib/i386-linux-gnu/libpthread.so.0  <<<- This is the cause of hang
#2  0xb7759ead in _L_lock_686 () from /lib/i386-linux-gnu/libpthread.so.0
#3  0xb7759cf3 in pthread_mutex_lock () from /lib/i386-linux-gnu/libpthread.so.0
#4  0x080486e0 in functionCount1 ()
#5  0xb7757d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#6  0xb7695d3e in clone () from /lib/i386-linux-gnu/libc.so.6

In thread 1, the backtrace shows that the thread is blocked in pthread_mutex_lock(), waiting on a lock.

How to debug with GDB: Philosophy

Debugging is an art.

In general, debugging is the most admired skill in the developer’s arsenal. My points in this article relate to the GNU debugger (GDB) on the Linux operating system.

What is your first reaction when you see a core-dump?

Let’s debug it!

Hold on. Wait, and first ruminate over the problem. I’d suggest performing a few rituals before you start debugging; they save time and effort.

o) Try to reproduce the problem. Ensure that the problem is consistent.

o) Check the logs emitted by the application. If they are not available, see if you can get them.

o) Note down the application configuration, including the input, the system configuration and every tiny detail that makes sense to you.

o) Then check the stack frames with “backtrace”.

o) Inspect the locals of the failing frame (“info locals”) and dig into the parent frames (“up”) as well.

o) It is also useful to look at the code and dry-run it two or three levels up from the point of failure. Say you have a stack trace like:
foo()
bar()
pop()
top() -> dumps core
hop()

Check how execution got from, say, bar() to top(). Look out for any ancillary function calls made along the way and note their side effects.

Remember, debugging is more than digging deep; it’s understanding the code tree and its branches. Always approach the problem at a high level and think about what might have caused the failure.

Then, just jump in.