During my PhD research, I have to run a large number of numerical simulations. These simulations are mainly written in C++ and are executed from the Linux console. When the executable is compiled with all debug macros enabled, it prints a lot of messages and numbers computed during the simulation. When something unexpected happens and I need to analyse this output, I often redirect the standard output to a file, so that I can study the behaviour of the simulation afterwards. For example:
./sim > dump.txt
runs the command ./sim and redirects all standard output to dump.txt. This is a simple way to create a log of a small simulation. As the complexity and the running time of the simulation grow, however, the log can easily reach a few hundred gigabytes. That is a problem: the log file can consume all free space on the hard drive, and writing all debug information to disk slows down the simulation. So initially one wants to send all output to the bit bucket, and only redirect it to a file once something unexpected happens. For example:
./sim > /dev/null
starts the simulation and throws all output away. When something does happen (say, the simulation gets stuck in a certain time step), the output of the already running process has to be redirected. To do so, the process ID needs to be found first. Using
ps aux | grep sim
one obtains a list which contains the PID of the simulation, i.e.,
user 25965 96.4 1.1 165372 47652 pts/5 R+ 10:40 0:27 ./sim
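(On most Linux systems, pgrep sim prints the matching PIDs directly and saves the extra grep step.)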
25965 is the PID of our process. To see which files are in use by the process, one executes the following command:
user@node2:~/simulation$ ls -l /proc/25965/fd
total 0
lrwx------ 1 user user 64 Oct 28 10:41 0 -> /dev/pts/5
l-wx------ 1 user user 64 Oct 28 10:44 1 -> /dev/null
lrwx------ 1 user user 64 Oct 28 10:41 2 -> /dev/pts/5
l-wx------ 1 user user 64 Oct 28 10:44 3 -> /home/user/simulation/stats.csv
File 0 is the standard input stream, which is attached to the console. Files 1 and 2 are the standard output and standard error streams. In this case the standard output is redirected to the bit bucket and the standard error stream is displayed in the console. Descriptor 3 is an additional file to which the simulation writes some statistics, and which can be used to determine its status. In order to redirect stream 1, we attach the debugger to the process by executing
gdb ./sim 25965
which halts the execution of the simulation and allows us to perform some debugging tasks. In our case we just want to close stream 1 and attach a different file to it. The following commands perform this redirection:
(gdb) p close(1)
$1 = 0
(gdb) p creat("/home/user/simulation/dump2.txt", 0600)
$2 = 1
(gdb) q
A debugging session is active.
    Inferior 1 [process 25965] will be detached.
Quit anyway? (y or n) y
The first command closes stream 1 with a close system call. The second command creates a new file with creat, which returns the lowest unused file descriptor; since descriptor 1 has just been freed, the new file takes over the role of the standard output. Quitting the debugger detaches it from the process, and the process continues. Using the following command we can see that the standard output is indeed now written to disk.
user@node2:~/simulation$ ls -l /proc/25965/fd
total 0
lrwx------ 1 user user 64 Oct 28 10:41 0 -> /dev/pts/5
l-wx------ 1 user user 64 Oct 28 10:44 1 -> /home/user/simulation/dump2.txt
lrwx------ 1 user user 64 Oct 28 10:41 2 -> /dev/pts/5
l-wx------ 1 user user 64 Oct 28 10:44 3 -> /home/user/simulation/stats.csv
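The listing also shows that the standard error stream (descriptor 2) still points at the terminal. If it should end up in the dump file as well, the same trick can be applied: attach gdb once more and duplicate the new descriptor onto stream 2. This is only a sketch of the idea, not part of the session above; dup2 is a standard libc call, so it can be invoked from the debugger just like close and creat:
(gdb) p dup2(1, 2)
On success the call returns 2, and both streams then point to dump2.txt.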
I’ve found this approach very useful in cases where a process gets stuck but is still running and producing useful debug information. Had I redirected all output to a file from the start, the execution would have been much slower and the log file would have consumed a large part of the free disk space.
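For completeness: the reason creat lands on descriptor 1 is the POSIX rule that open and creat always return the lowest unused file descriptor. The same sequence can be reproduced in a few lines of C. This is only a sketch, assuming a POSIX system; the file name simply mirrors the example above.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* stdout currently uses file descriptor 1 (the terminal) */
    printf("this line goes to the terminal\n");
    fflush(stdout);

    close(1);                              /* free descriptor 1                    */
    int fd = creat("dump2.txt", 0600);     /* returns the lowest unused descriptor */

    /* fd is now 1, so everything written to stdout lands in dump2.txt */
    printf("this line goes to dump2.txt (fd = %d)\n", fd);
    fflush(stdout);
    return 0;
}

Running this leaves the second line in dump2.txt, which is exactly what the two calls injected via gdb achieve in the running simulation.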