Debugging a production system with gdb

GNU GDB can take a core dump of a live application without terminating it. This is very useful:

  • if you have a hung application (deadlocks, infinite waits)
  • if you can’t afford to stall the whole system for debugging

The “gcore” command saves the memory image of a process to a core file.

This core file can be analyzed later just like an ordinary core dump, while the process itself keeps running once gdb detaches.
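If you only need the core and not an interactive session, the same snapshot can be scripted with gdb's batch mode. Below is a minimal sketch, assuming Linux (it checks /proc) and an installed gdb. The helper name snapshot_core is hypothetical and not part of gdb. The target is paused only while gdb is attached and writing the core.

```shell
# Minimal sketch (hypothetical helper, not part of gdb): snapshot a live
# process into a core file, then detach so it keeps running.
# Assumes Linux (/proc) and gdb; attaching usually needs root or a
# relaxed ptrace policy.

# snapshot_core PID [PREFIX]: writes PREFIX.PID (default ./core.PID).
snapshot_core() {
    pid=${1:-}
    prefix=${2:-./core}
    case $pid in
        ''|*[!0-9]*)                       # missing or non-numeric PID
            echo "usage: snapshot_core PID [PREFIX]" >&2
            return 2 ;;
    esac
    if [ ! -d "/proc/$pid" ]; then         # no such process
        echo "no such process: $pid" >&2
        return 1
    fi
    # Batch mode: attach with -p, write the core with gdb's "gcore"
    # command, then detach explicitly before gdb exits.
    gdb -batch -p "$pid" -ex "gcore $prefix.$pid" -ex detach
}
```

For example, from a root shell, `snapshot_core 4560` would write ./core.4560. gdb also installs a standalone gcore script that wraps the same steps. On most distributions, `sudo gcore -o ./core 4560` produces the same file.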

Example: process 4560 is a hung multi-threaded application. Attach gdb to it, look at the backtrace, save a core with gcore, and detach:

kanaujia@ubuntu:~$ sudo gdb -p 4560

GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
This GDB was configured as "i686-linux-gnu".
Attaching to process 4560
Reading symbols from /home/kanaujia/work/a.out...(no debugging symbols found)...done.
Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb6da3b40 (LWP 4562)]
[New Thread 0xb75a4b40 (LWP 4561)]
Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0
Reading symbols from /lib/i386-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb7758e1c in pthread_join () from /lib/i386-linux-gnu/libpthread.so.0
#2  0x0804866b in main ()
(gdb) gcore
Saved corefile core.4560
(gdb) q
A debugging session is active.

    Inferior 1 [process 4560] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/kanaujia/work/a.out, process 4560
kanaujia@ubuntu:~$ ls
core.4560                   Dropbox           PDF       VirtualBox VMs

Analyse the core by loading it into gdb together with the binary that produced it:

kanaujia@ubuntu:~/work$ gdb ./a.out ./core.4560 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Reading symbols from /home/kanaujia/work/a.out...(no debugging symbols found)...done.
[New LWP 4561]
[New LWP 4562]
[New LWP 4560]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `a.out'.
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb775e5a2 in __lll_lock_wait () from /lib/i386-linux-gnu/libpthread.so.0
#2  0xb7759ead in _L_lock_686 () from /lib/i386-linux-gnu/libpthread.so.0
#3  0xb7759cf3 in pthread_mutex_lock () from /lib/i386-linux-gnu/libpthread.so.0
#4  0x080486e0 in functionCount1 ()
#5  0xb7757d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#6  0xb7695d3e in clone () from /lib/i386-linux-gnu/libc.so.6
(gdb) info threads
  Id   Target Id         Frame 
  3    LWP 4560          0xb7789424 in __kernel_vsyscall ()
  2    LWP 4562          0xb7789424 in __kernel_vsyscall ()
* 1    LWP 4561          0xb7789424 in __kernel_vsyscall ()

(gdb) thread
[Current thread is 1 (LWP 4561)]
(gdb) thread 2
[Switching to thread 2 (LWP 4562)]
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb775b96b in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/i386-linux-gnu/libpthread.so.0
#2  0x08048765 in functionCount2 ()
#3  0xb7757d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#4  0xb7695d3e in clone () from /lib/i386-linux-gnu/libc.so.6
(gdb) thread 3
[Switching to thread 3 (LWP 4560)]
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb7758e1c in pthread_join () from /lib/i386-linux-gnu/libpthread.so.0
#2  0x0804866b in main ()
(gdb) thread 1
[Switching to thread 1 (LWP 4561)]
#0  0xb7789424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7789424 in __kernel_vsyscall ()
#1  0xb775e5a2 in __lll_lock_wait () from /lib/i386-linux-gnu/libpthread.so.0  <<<- This is the cause of the hang
#2  0xb7759ead in _L_lock_686 () from /lib/i386-linux-gnu/libpthread.so.0
#3  0xb7759cf3 in pthread_mutex_lock () from /lib/i386-linux-gnu/libpthread.so.0
#4  0x080486e0 in functionCount1 ()
#5  0xb7757d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#6  0xb7695d3e in clone () from /lib/i386-linux-gnu/libc.so.6

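In thread 1, the backtrace shows that functionCount1 is waiting on a lock: it is blocked inside pthread_mutex_lock() and never acquires the mutex. Meanwhile thread 2 (functionCount2) is waiting in pthread_cond_wait(), and the main thread (thread 3) is waiting in pthread_join() for the worker threads to finish, so the whole process hangs.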
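Instead of switching threads one at a time, gdb's `thread apply all bt` prints every thread's backtrace in a single batch run. Below is a minimal sketch that does this and then filters for threads stuck in pthread_mutex_lock. The helper names all_backtraces and mutex_waiters are hypothetical. The exact "Thread N ..." header format differs between gdb versions, so the filter assumes only that each thread's section starts with a line beginning with "Thread ".

```shell
# Minimal sketch (hypothetical helpers): dump all thread backtraces from a
# core, then list the threads blocked acquiring a mutex.
# Assumes each thread's section of "thread apply all bt" output starts with
# a line beginning "Thread " (exact header text varies by gdb version).

# all_backtraces BINARY CORE: print the backtrace of every thread in CORE.
all_backtraces() {
    gdb -batch -ex "thread apply all bt" "$1" "$2"
}

# mutex_waiters: read "thread apply all bt" output on stdin and print the
# header of each thread whose stack contains pthread_mutex_lock.
mutex_waiters() {
    awk '
        /^Thread / { header = $0; printed = 0; next }   # new thread section
        /pthread_mutex_lock/ && header != "" && !printed {
            print header; printed = 1                   # report each thread once
        }
    '
}
```

With the core from this example, `all_backtraces ./a.out ./core.4560 | mutex_waiters` would report only the thread running functionCount1.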
