Apple’s command-line development tools

In addition to the GUI-based development tools, Apple has included some very powerful and useful command-line tools for debugging and monitoring Mac OS X applications. You may wonder why you need to use UNIX-like command-line tools for developing Mac OS X GUI applications when GUI tools are available.

Mac OS X applications primarily use the Cocoa and Carbon frameworks for their services, but these services use the underlying Darwin operating system, which is a preemptive multitasking system that supports many programs running concurrently. Understanding this interaction and being able to use it to your best advantage can make all the difference between a snappy, properly performing program and a sluggish program that is no fun to use. Currently, the command-line tools supplied by Apple let you peek into the operating system while your program is running and see how it is using the system’s resources. This is a powerful ability that will help you understand how to design and potentially optimize your program to make the best use of Darwin’s power. In addition, the command-line tools offer a greater level of detail than the GUI tools.

Another application of these tools is troubleshooting programs you did not write but suspect are causing problems (reverse engineering programs). Imagine you are using a script to insert data records into a database. Some insert operations are very slow and cause excessive disk thrashing in the database program. By using the command-line tools, you can get a snapshot of how the database program interacts with the operating system, which may shed light on the cause of the problem.

All the command-line tools are simple to use, but they do require some study to understand their use and features. In truth, you must understand the operating system and the memory allocation scheme, and you need some experience using the tools on real problems. Luckily, man pages are available for all the tools.

Some of these tools, like top and gprof, will be familiar to most UNIX developers; others are specific to the Mac OS X environment. In this section, I will try to minimize repeating information from the man pages and instead concentrate on showing examples of how you can use these tools for common development activities.

4.6.1 ps (process status) and top (system usage statistics)

Both the ps and top commands will be familiar to most UNIX users. You use the ps command to get status information for a process. Its typical command-line invocation is in one of the following forms:

ps aux

ps aux | grep [process-name]

The first form lists extended information for all processes on the system, for all users. The second displays the same information, but only for the specified process name.

The top command repeatedly displays system usage statistics for the top processes.

The Mac OS X implementation is somewhat different from those running on other flavors of UNIX. It displays more information that is specific to Mac OS X and gives you a quick snapshot of what is going on in the system. Figure 4.19 shows the output of the top command for a Mac OS X machine (Darwin Kernel Version 5.5: Thu May 30 14:51:26 PDT 2002; root:xnu/xnu-201.42.3.obj~1/RELEASE_PPC Power Macintosh powerpc).

Figure 4.20 shows the output of the top command for a Linux machine (Linux 2.4.7-10smp #1 SMP Thu Sep 6 17:09:31 EDT 2001 i686 unknown).

Figure 4.19 Output of the top command on a Mac OS X machine

Figure 4.20 Output of the top command for a Linux machine

The output of the top command shown in figure 4.21 comes from a Solaris 2.7 machine (SunOS 5.7 Generic sun4u sparc SUNW,Ultra-60).

As you can see, the Mac OS X implementation provides information about thread usage at both the system and individual process level. Another valuable feature of the Mac OS X version is that if a process’s VSIZE (the total address space currently allocated) is increasing, the command places a + after the value.

This is a quick indicator that the program’s memory usage is increasing, which could indicate a memory leak. See the man pages for more detailed information about the ps and top commands’ usage and options under Mac OS X.

4.6.2 sc_usage: showing system call usage statistics

Suppose you are developing a simulation program that requires the processing of large amounts of data. Ideally, you would like to read the data into physical memory, perform your calculations on the data, and write the result. Alternatively, perhaps you are writing a multithreaded program and you need to get detailed information about what system calls the program makes, as well as thread performance, cache hits, and timing. You need a tool that enables you to peek into a program as it runs and view its runtime state. In either case, the sc_usage command is a good choice.

The sc_usage command samples an application at a specified interval, showing the system calls it makes as well as other information such as the number of generated page faults. This information helps you understand the kinds of system calls your program makes, and also lets you determine potential performance bottlenecks.

Let’s use sc_usage to look at the threaded server program described in section 4.5.17:

Figure 4.21 Output of the top command for a Solaris 2.7 machine

1 Run the ThreadedServer program and get its process identifier (using top or ps).

2 Open a shell in the Terminal application and enter the following command (you must be root or have root privileges to run this command):

% sudo sc_usage [pid-of-server]

3 By default, sc_usage samples the server application every second. Send it some messages with the client Perl script (see section 4.5.17 for more information). Figure 4.22 shows the output of the program.

As you can see, the output includes a lot of useful information. The upper part of the display tells you the number of threads in the program, the current system time (21:15:31), how long the sc_usage command has been running, and some global state information. The next columns of data show the system calls made thus far, the number of times each call was made (from when the sc_usage command was run), the CPU time consumed by the command at the current sample time, and the time the process has been waiting. Below this, the output lists the current system calls, the last path name that was blocked, the cumulative thread block time, the thread number, and the thread priority.

Figure 4.22 Output of the threaded server program using the sc_usage command

With this information, you can see that the server makes a lot of calls to the read and write functions, as you would expect, and a high number of calls to accept. You may be able to improve performance by changing the server from an accept-based server to a select-based server. The information at the bottom of the display is also helpful: it tells you that four of the threads are in the read system call, one is blocked on a semaphore, and the other is waiting on the accept call. It also provides cumulative timing information for each thread.

The output of sc_usage provides detailed information about the current state of the program. Compare this with the output of the Thread Viewer program run on the same example—Thread Viewer provides a nice graphical view of the program and threads, but it does not offer the same level of information.

The sc_usage command works for a range of applications and types of problems. Try it with programs you did not write, to look at the system call distribution and timing information. It is an excellent reverse-engineering tool, and it is especially useful for looking at Mac OS X programs that you suspect are really Cocoa interfaces that call UNIX commands for services.

4.6.3 fs_usage: reporting system calls and page faults related to the filesystem in real-time

One of the great things about UNIX is the number of cool programming tools that come with the system. Apple continues this tradition by providing a useful file system utility called fs_usage. This command presents a continuous display of system-call usage information for file system operations. In its normal form (run with no command-line arguments), it displays information about all instantiated processes except the running fs_usage process, Terminal, telnetd, sshd, rlogind, tcsh, csh and sh. A less noisy way to use the command is to supply a process identifier (pid) as its only command-line argument. In this form, it reports all activity for the specified process.

The following listing shows an example of the program’s output while monitoring the ThreadedServer program:

% sudo fs_usage 3008

11:12:03  read   0.002226 W  ThreadedServ
11:12:03  write  0.000017    ThreadedServ
11:12:03  write  0.000070    ThreadedServ
11:12:04  read   1.007925 W  ThreadedServ
11:12:04  close  0.000105    ThreadedServ
11:12:04  read   0.000016    ThreadedServ
11:12:04  write  0.000032    ThreadedServ
11:12:04  write  0.000064    ThreadedServ
11:12:06  read   1.044468 W  ThreadedServ
11:12:06  close  0.000066    ThreadedServ
11:12:06  read   0.000014    ThreadedServ
11:12:06  write  0.000021    ThreadedServ
11:12:06  write  0.000063    ThreadedServ
11:12:07  read   1.041570 W  ThreadedServ

4.6.4 gprof: displaying execution profile data

As you saw earlier, the Apple developer tools come with a program called Sampler that helps you profile your program for performance bottlenecks. Because it uses a sample-based monitoring technique, it will tell you the number of times the program called a function, but not the percentage of time each function takes within the program’s total runtime. For this type of analysis, gprof is the right tool.

GNU gprof displays the execution profile of a program. Let’s look at an example of how you can get this type of information using gprof.

The SamplerServer project implements two versions of a server, each with a different socket read function. The first, called SlowServer, reads from a socket one byte at a time until it reaches the string terminator. The second, called FastServer, reads a specific number of bytes from the socket. Richard Stevens discusses this problem, along with design choices and solutions, in his classic book on network programming: the byte-by-byte version spends most of its time in the kernel (trapping into the kernel with repeated system calls), whereas the other version significantly reduces kernel overhead and improves performance.

Both server implementations enable you to test this behavior and see for yourself the performance differences.

The project is set up with two targets: one for the slow socket read (SlowServer) and one for the fast read (FastServer). Let’s test them, use gprof, and evaluate the results:

1 Select the Targets tab, select SlowServer from the Target list, select the Build Settings tab in the Editor pane, and make sure the Generate Profiling Code checkbox is selected (under Compiler Options).

2 Build and run the program (Command-R).

3 Open the Terminal application, open a new shell, change to the project directory, and run the send script as follows. Make sure you send the server a very long string (say, longer than 3000 characters) so you can really look at the socket read times. This script sends the string 100 times to the server, pausing one second between sends:

% perl send.pl 100 1 localhost 4444 [enter-long-string]

4 After reading 100 messages, the server will exit and generate a profiling data file called gmon.out. The gprof program uses the gmon.out file to print program statistics. To view the program’s runtime statistics, change to the build directory and enter the following command:

% gprof SlowServer.app/Contents/MacOS/SlowServer gmon.out | less

Notice the full path to the executable. SlowServer.app is the bundle, not the executable program, so you need to specify the full path to the executable within the bundle.

NOTE The gprof tool has been used for years by UNIX programmers to generate execution profiles of programs. Here’s how it works:

1 The first step is to add the -pg option when building the program you wish to profile. This option tells the compiler to compile source files for profiling and link with the profiling library. For example, to compile and link for profiling, use the following command:

% gcc -o foo foo.c bar.c -g -pg

2 Now that the program is built for profiling, you can run it to generate the profiling information (the execution profile). Run the program as you normally would and let it execute and exit as usual.

3 When the program exits, it writes to the current directory a file called gmon.out. This file contains the program’s runtime profile.

4 Run the gprof tool, passing it the executable and the gmon.out file; it displays the program’s runtime execution profile:

% gprof foo gmon.out > gprof.log

5 Repeat this process for the FastServer to get its performance statistics.

Figure 4.23 shows the output of the gprof program for the SlowServer implementation (reading from the socket one byte at a time).

The output for the FastServer implementation (reading a specific number of bytes from the socket at one time) is shown in figure 4.24.

The SlowServer version spends 78.9 percent of its total runtime in the read system call, accounting for 0.15 seconds. The FastServer version is quite different: its runtime statistics are so negligible that they do not even show up in the profiling output.

From this example, you can see that using gprof to profile your program can help you pinpoint and diagnose potential performance problems. In addition, try running the same example with the Sampler program and compare the result.

Although doing so is like comparing apples to oranges, it will give you a feel for how these tools differ and how to use them together to solve certain problems.

For more information, see gprof’s man page, as well as its GNU documentation.

4.6.5 leaks: searching a process’s memory for unreferenced malloc buffers

The Mac OS X development tools provide several programs you can use to detect memory leaks in your application. A memory leak occurs when you allocate memory within a program and never free it. The command-line tool called leaks performs a role similar to that of the GUI-based profiling tools discussed earlier in the chapter: it detects malloc-allocated memory blocks to which your program has lost the pointer, causing a memory leak. The leaks command takes one argument, the pid of the process you wish to examine.

Let’s look at a simple example. The LeaksExample project implements a simple example of a memory leak. Open this project in Project Builder and build and run the program. You will see several logging messages in the output window followed by the program displaying a window. While the program is still running, get its process ID from the output window and run the following command in a shell (you must be user root or have root privileges to run the leaks command):

Figure 4.23 Output of the gprof program for the SlowServer implementation

Figure 4.24 Output of the gprof program for the FastServer implementation

% leaks 11786

Process 11786: 7424 nodes malloced
Process 11786: 7 leaks
Leak: 0x0008fda0 size=46
    0x80813ff0 0x80813ae0 0x80813ffc 0xa1b1c1d3
    0x0008fd00 0x00091ef0 0x00091f70 0x00000000
    0x00000000 0x00000000 0x00000000
Leak: 0x0008fcd0 size=46 instance of 'NSCFDictionary'
    0x00058610 0x00010395 0x00000003 0x00000003
    0x00000004 0xa1b1c1d3 0xa1b1c1d5 0x00000000
    0x0008fda0 0x0008fdb0 0x00000000
Leak: 0x0007f340 size=46
    0x00530068 0x006f0075 0x006c0064 0x0020006e
    0x006f0074 0x00200073 0x00650065 0x0020006d
    0x0065002e 0x00000000 0x00000000
Leak: 0x0007f040 size=46
    0x00530068 0x006f0075 0x006c0064 0x0020006e
    0x006f0074 0x00200073 0x00650065 0x0020006d
    0x0065002e 0x00000000 0x00000000
Leak: 0x0008fcb0 size=30 instance of 'NSUserDefaults'
    0x808190bc 0x00082850 0x0008fcd0 0x0008e980
    0x00000000 0x00000000 0x00000000
Leak: 0x0007f370 size=30 instance of 'NSCFString'
    0x80160880 0x000107f0 0x0007f340 0x00000012
    0x0007f2b0 0x00000000 0x00000000
Leak: 0x0007f320 size=30 instance of 'NSCFString'
    0x80160880 0x000107f0 0x0007f040 0x00000012
    0x0007f2b0 0x00000000 0x00000000

As you can see, the leaks command detects that the program contains memory leaks and provides you with information about their locations.

Let’s look at the format of the information using the last record in the display. The first line lists the address of the leaked memory block, its size (in bytes), and the source (in this case, an instance of the NSCFString class). The lines that follow show the contents of the allocated memory buffer in hexadecimal. You can use the -nocontext option to suppress the display of the allocated memory contents:

% leaks -nocontext 11786

Process 11786: 7424 nodes malloced
Process 11786: 7 leaks
Leak: 0x0008fda0 size=46
Leak: 0x0008fcd0 size=46 instance of 'NSCFDictionary'
Leak: 0x0007f340 size=46
Leak: 0x0007f040 size=46
Leak: 0x0008fcb0 size=30 instance of 'NSUserDefaults'
Leak: 0x0007f370 size=30 instance of 'NSCFString'
Leak: 0x0007f320 size=30 instance of 'NSCFString'

This information should give you a good idea where your program is leaking.

4.6.6 heap: listing all the malloc-allocated buffers in the process’s heap

The heap command is an experimental BSD tool that displays memory objects, including Objective-C objects, allocated on the heap of the specified process.

You run the command, passing it the pid of the program you wish to monitor.

The following listing shows a condensed example of heap’s output:

% heap [pid]

% sudo heap 3186 | more
Process 3186: 6 zones
Zone CoreGraphicsDefaultZone_0x1671d0: Overall size: 256KB;
    278 nodes malloced for 48KB (18% of capacity);
    largest unused: [0x0017331e-207KB]
Zone kCFAllocatorNull_0x701e6944: Overall size: 0KB
Zone kCFAllocatorMalloc_0x701e6914: Overall size: 0KB
Zone DefaultMallocZone_0x11f1d0: Overall size: 852KB;
    6849 nodes malloced for 618KB (72% of capacity);
    largest unused: [0x01eed88e-205KB]
Zone Custom CFAllocator_0x701e698c: Overall size: 0KB
Zone kCFAllocatorSystemDefault_0x701e6928: Overall size: 0KB
All zones: 7127 nodes malloced - 666KB
--- Zone DefaultMallocZone_0x11f1d0: 6849 nodes (632582 bytes)
<not Objective C object> = 6424 (613064 bytes)
NSMenuItem = 52 (4056 bytes)
NSDynamicSystemColor = 29 (870 bytes)
NSImage = 21 (630 bytes)
NSBitmapImageRep = 20 (1240 bytes)
NSMethodSignature = 20 (2280 bytes)
NSPathStore2 = 15 (1474 bytes)
NSMenu = 10 (300 bytes)
NSCarbonMenuImpl = 10 (140 bytes)
NSCachedWhiteColor = 9 (126 bytes)
NSDistantObject = 3 (74 bytes)
NSConcreteMutableData = 3 (90 bytes)
NSWindowGraphicsContext = 3 (74 bytes)
NSBundle = 2 (92 bytes)
NSView = 2 (188 bytes)

4.6.7 malloc_history: showing malloc allocations that a process has performed

The malloc_history command is another command-line tool that detects nonfreed memory allocations and buffer overwrites in your application. To use the command, follow these steps:

1 Open a shell and set the environment variables MallocStackLogging and MallocStackLoggingNoCompact to 1 (or place them in your .cshrc file):

% setenv MallocStackLogging 1

% setenv MallocStackLoggingNoCompact 1

2 Make sure you compile the program you wish to debug with debugging turned on (either through the -g option or by checking the Generate Debugging Symbols checkbox in Project Builder, under the Target Build settings).

3 Run the program. While it is running, run the following malloc_history command. (The leading clear command will clear the current shell’s output, enabling you to view the results more easily.) If the program contains leaks, malloc_history displays them, along with the call stack:

% clear; malloc_history 11941 -all_by_size

The malloc_history command has many more options than are described here. For more information, see the command’s man page.

4.6.8 sample: profiling a process during a time interval

The sample command acquires performance statistics for the specified application by sampling its execution at an interval specified by the user. It gathers data the same way as the GUI Sampler application, covered in section 4.5.16. The command takes three arguments: the pid of the process to monitor, how long the command should sample the program (in seconds), and the sampling rate (in milliseconds). For example, the following command collects performance statistics for the SamplerExample program:

% sample 12525 20 10

Sampling 12525 each 10 msecs 2000 times
Now analyzing results...
Samples: 42696 bytes
Analysis written to file /tmp/SamplerExample_12525.sample.txt

As you can see, a performance report containing a textual representation of the collected statistics is written to the /tmp directory. Here’s an example of the output generated by the sample program:

752 UnOptimized_LoopFusion(double *, double *, int)
  392 sqrt
    361 sqrt [STACK TOP]
