HOWTO - using the library with perf {#howto_perf}
===================================

@brief Using command line perf and OpenCSD to collect and decode trace.

This HOWTO explains how to use the perf command line tools and the openCSD
library to collect and extract program flow traces generated by the
CoreSight IP blocks on a Linux system. The examples were generated on
an aarch64 Juno-r0 platform. All information is considered accurate and has
been tested using the latest version of the library and the `master` branch
of the [perf-opencsd github repository][1].

On Target Trace Acquisition - Perf Record
-----------------------------------------
Not all of the enhancements to the Perf tools that support the new `cs_etm`
PMU have been upstreamed yet. To get the required functionality, branch
`perf-opencsd-master` needs to be downloaded to the target system where
traces are to be collected. This branch is a vanilla upstream kernel
supplemented with modifications to the CoreSight framework and drivers that
make them usable by the Perf core. The remaining out-of-tree patches are being
upstreamed incrementally.

From there, compiling the perf tools with `make -C tools/perf` will yield a
`perf` executable that supports CoreSight trace collection. Note that if
traces are to be decompressed *off* target, there is no need to download and
compile the openCSD library on the target.

Before launching a trace run, a sink that will collect trace data needs to be
identified. All CoreSight blocks identified by the framework are registered
under `/sys/bus/coresight/devices/`:

    linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/
    20010000.etf 20040000.main_funnel 22040000.etm 22140000.etm
    230c0000.A53_funnel 23240000.etm replicator@20020000 20030000.tpiu
    20070000.etr 220c0000.A57_funnel 23040000.etm 23140000.etm 23340000.etm
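
To narrow a listing like the one above down to devices that can act as a sink, a small helper can filter on the device name. This is only a sketch: the function name `list_candidate_sinks` is hypothetical, and matching on the `.etf`, `.etr`, `.etb` and `.tpiu` suffixes is an assumption based on the naming seen on the Juno listing, not something the CoreSight framework guarantees.

```shell
# Hypothetical helper: print device names that look like CoreSight sinks.
#   $1  directory to scan (defaults to the sysfs location shown above)
# Assumption: sinks can be recognised by their name suffix.
list_candidate_sinks() {
    dir="${1:-/sys/bus/coresight/devices}"
    for path in "$dir"/*; do
        name="${path##*/}"          # strip the directory part
        case "$name" in
            *.etf|*.etr|*.etb|*.tpiu) printf '%s\n' "$name" ;;
        esac
    done
}

# On a host without CoreSight devices this simply prints nothing.
list_candidate_sinks
```

The name printed for the chosen sink is what goes after the `@` in the `cs_etm` event on the perf command line.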

CoreSight blocks are listed in the device tree for a specific system and
discovered at boot time. Since tracers can be linked to more than one sink,
the sink that will receive trace data needs to be identified and given as an
option on the perf command line. Once a sink has been identified, trace
collection can start. An easy and yet interesting example is the `uname`
command:

    linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@20070000.etr/ --per-thread uname

This will generate a `perf.data` file where execution has been traced for both
user and kernel space. To narrow the field to either user or kernel space the
`u` and `k` options can be specified. For example, the following limits
tracing to user space:

    linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@20070000.etr/u --per-thread uname
    Problems setting modules path maps, continuing anyway...
    -----------------------------------------------------------
    { sample_period, sample_freq } 1
    sample_type IP|TID|IDENTIFIER
    ------------------------------------------------------------
    sys_perf_event_open: pid 11375 cpu -1 group_fd -1 flags 0x8
    ------------------------------------------------------------
    { sample_period, sample_freq } 1
    sample_type IP|TID|IDENTIFIER
    ------------------------------------------------------------
    sys_perf_event_open: pid 11375 cpu -1 group_fd -1 flags 0x8
    AUX area mmap length 131072
    perf event ring buffer mmapped per thread
    Synthesizing auxtrace information
    auxtrace idx 0 old 0 head 0x11ea0 diff 0x11ea0
    [ perf record: Woken up 1 times to write data ]
    7f99daf000-7f99db0000 0 [vdso]
    7f99d84000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
    7f99d84000-7f99daf000 0 /lib/aarch64-linux-gnu/ld-2.21.so
    7f99db0000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
    failed to write feature 8
    failed to write feature 9
    failed to write feature 14
    [ perf record: Captured and wrote 0.072 MB perf.data ]

    linaro@linaro-nano:~/kernel$ ls -l ~/.debug/ perf.data
    -rw------- 1 linaro linaro 77888 Mar 2 20:41 perf.data

    /home/linaro/.debug/:
    drwxr-xr-x 2 linaro linaro 4096 Mar 2 20:40 [kernel.kallsyms]
    drwxr-xr-x 2 linaro linaro 4096 Mar 2 20:40 [vdso]
    drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 bin
    drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 lib

The amount of trace data generated by CoreSight tracers is staggering, even
for the simplest trace scenario. Reducing trace generation to specific areas
of interest is desirable to save trace buffer space and to avoid getting lost
in trace data that isn't relevant. Supplementing the 'k' and 'u' options
described above is the notion of address filters.

On CoreSight two types of address filter have been implemented: address range
and start/stop filters.

**Address range filters:**
With address range filters, traces are generated if the instruction pointer
falls within the specified range. Any work done by the CPU outside of that
range will not be traced. Address range filters can be specified for both
user and kernel space sessions:

    perf record -e cs_etm/@20070000.etr/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname

    perf record -e cs_etm/@20070000.etr/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main

When dealing with kernel space trace, addresses are typically taken from the
'System.map' file. In user space, addresses are relocatable and can be
extracted from an objdump output:

    $ aarch64-linux-gnu-objdump -d libcstest.so.1.0
    000000000000072c <coresight_test1>: <------------ Beginning of traces
    72c: d10083ff sub sp, sp, #0x20
    730: b9000fe0 str w0, [sp,#12]
    734: b9001fff str wzr, [sp,#28]
    738: 14000007 b 754 <coresight_test1+0x28>
    73c: b9400fe0 ldr w0, [sp,#12]
    740: 11000800 add w0, w0, #0x2
    744: b9000fe0 str w0, [sp,#12]
    748: b9401fe0 ldr w0, [sp,#28]
    74c: 11000400 add w0, w0, #0x1
    750: b9001fe0 str w0, [sp,#28]
    754: b9401fe0 ldr w0, [sp,#28]
    758: 7100101f cmp w0, #0x4
    75c: 54ffff0d b.le 73c <coresight_test1+0x10>
    760: b9400fe0 ldr w0, [sp,#12]
    764: 910083ff add sp, sp, #0x20

The address is followed by the number of bytes to trace and, when tracing in
user space, by the full path to the binary (or library) being traced.
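
The size part of a range filter is easy to get wrong when reading addresses off a disassembly. The following sketch computes it from a start address and the first address past the range, then prints the filter string in the `filter <start>/<size>[@<path>]` form used above. The helper name `range_filter` is hypothetical, and treating the end address as exclusive is an assumption chosen so that `0x72c` to `0x76c` gives the `0x40` used in the user space example. Very large kernel addresses depend on the shell's 64-bit arithmetic and are not covered here.

```shell
# Hypothetical helper: build an address range filter for `perf record --filter`.
#   $1  start address (e.g. 0x72c)
#   $2  first address past the range (assumed exclusive)
#   $3  optional path to the binary or library (user space only)
range_filter() {
    start="$1"; end="$2"; object="${3:-}"
    size=$((end - start))           # number of bytes covered by the filter
    if [ "$size" -le 0 ]; then
        echo "end must be greater than start" >&2
        return 1
    fi
    if [ -n "$object" ]; then
        printf 'filter %s/0x%x@%s\n' "$start" "$size" "$object"
    else
        printf 'filter %s/0x%x\n' "$start" "$size"
    fi
}

# Prints: filter 0x72c/0x40@/opt/lib/libcstest.so.1.0
range_filter 0x72c 0x76c /opt/lib/libcstest.so.1.0
```

The printed string can be passed as the argument of `--filter`.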

**Start/Stop filters:**
With start/stop filters, traces are generated from the moment the instruction
pointer is equal to the start address. Likewise, traces stop being generated
when the instruction pointer is equal to the stop address. Anything that
happens between these two events is traced:

    perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname

    perf record -vvv -e cs_etm/@20070000.etr/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \
    stop 0x40082c@/home/linaro/main' \

**Limitation on address filters:**
The only limitations on address filters are the number of address comparators
found on an implementation and the mutual exclusion between range and
start/stop filters. As such the following example would _not_ work:

    perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop
    filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \ // address range
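
The mutual exclusion rule can be checked before a trace run is started. The sketch below splits a filter specification on commas and reports failure when both an address range (`filter`) and a `start` or `stop` entry are present. The function name `filters_compatible` is hypothetical, and the check only looks at the leading keyword of each entry; it does not count how many address comparators the hardware provides.

```shell
# Hypothetical helper: succeed only if a filter spec does not mix
# address range filters with start/stop filters.
#   $1  filter specification as passed to `perf record --filter`
filters_compatible() {
    local spec="$1" part keyword
    local has_range=0 has_startstop=0
    local IFS=,
    for part in $spec; do
        # Trim leading whitespace, then keep the first word (the keyword).
        part="${part#"${part%%[![:space:]]*}"}"
        keyword="${part%% *}"
        case "$keyword" in
            filter)     has_range=1 ;;
            start|stop) has_startstop=1 ;;
        esac
    done
    [ "$has_range" -eq 0 ] || [ "$has_startstop" -eq 0 ]
}

if ! filters_compatible 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, filter 0x72c/0x40@/opt/lib/libcstest.so.1.0'; then
    echo "range and start/stop filters cannot be combined"
fi
```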

Additional Trace Options
------------------------
Additional options can be used during trace collection to add information to the captured trace.

- Timestamps: these packets are added to the trace streams to allow correlation of different sources, where tools support this.
- Cycle Counts: these packets are added to get a count of cycles for blocks of executed instructions. Adding cycle counts will considerably increase the amount of generated trace.
  The relationship between cycle counts and executed instructions differs according to the trace protocol.
  For example, the ETMv4 protocol will emit counts for groups of instructions according to a minimum count threshold.
  Presently this threshold is fixed at 256 cycles for `perf record`.

The `perf record` command line options for these features are part of the options for the `cs_etm` event:

    perf record -e cs_etm/timestamp,cycacc,@20070000.etr/ --per-thread uname

In the current version, `perf record` and `perf script` do not use this additional information.
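
Since the sink, the `u`/`k` modifier and the extra options all end up in one event string, it can help to assemble that string in one place. The following is a minimal sketch; `cs_etm_event` is a hypothetical helper name, and it only concatenates its arguments without checking that the options are supported by the perf build in use.

```shell
# Hypothetical helper: assemble a cs_etm event string.
#   $1  sink name (as listed under /sys/bus/coresight/devices)
#   $2  modifier: "" (user and kernel), "u" or "k"
#   remaining arguments: extra event options such as timestamp or cycacc
cs_etm_event() {
    sink="$1"; mode="$2"; shift 2
    opts=""
    for opt in "$@"; do
        opts="$opts$opt,"           # options are comma separated, before the sink
    done
    printf 'cs_etm/%s@%s/%s\n' "$opts" "$sink" "$mode"
}

cs_etm_event 20070000.etr "" timestamp cycacc   # cs_etm/timestamp,cycacc,@20070000.etr/
cs_etm_event 20070000.etr u                     # cs_etm/@20070000.etr/u
```

The result would be used as the argument of `-e`, for example `perf record -e "$(cs_etm_event 20070000.etr u)" --per-thread uname`.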

On Target Trace Collection
--------------------------
The entire program flow will have been recorded in the `perf.data` file.
Information about libraries and executables is stored under `$HOME/.debug`:

    linaro@linaro-nano:~/kernel$ tree ~/.debug
    ├── [kernel.kallsyms]
    │ └── 0542921808098d591a7acba5a1163e8991897669
    │ └── 551fbbe29579eb63be3178a04c16830b8d449769
    │ └── ed95e81f97c4471fb2ccc21e356b780eb0c92676
    └── aarch64-linux-gnu
    │ └── 94912dc5a1dc8c7ef2c4e4649d4b1639b6ebc8b7
    └── 169a143e9c40cfd9d09695333e45fd67743cd2d6
    13 directories, 5 files
    linaro@linaro-nano:~/kernel$

All this information needs to be collected in order to successfully decode
the traces:

    linaro@linaro-nano:~/kernel$ tar czf uname.trace.tgz perf.data ~/.debug

Note that the `vmlinux` file should also be added to the bundle if kernel
traces have been collected.
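
The bundling step, including the optional `vmlinux`, can be wrapped in a small function so the kernel image is not forgotten. This is a sketch under the assumption that it is run from the directory containing `perf.data`; the name `bundle_trace` is hypothetical.

```shell
# Hypothetical helper: create a trace bundle from the current directory.
#   $1  name of the archive to create
#   $2  optional path to vmlinux (needed when kernel traces were collected)
bundle_trace() {
    out="$1"; vmlinux="${2:-}"
    # Always include the capture and perf's cache of binaries.
    set -- perf.data "$HOME/.debug"
    if [ -n "$vmlinux" ]; then
        set -- "$@" "$vmlinux"
    fi
    tar czf "$out" "$@"
}

# Example (user space only):   bundle_trace uname.trace.tgz
# Example (with kernel trace): bundle_trace uname.trace.tgz ./vmlinux
```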

Off Target OpenCSD Compilation
------------------------------
The openCSD library is not part of the perf tools. It is available on
[github][1] and needs to be compiled before the perf tools. Check out the
required branch/tag version into a local directory.

    linaro@t430:~/linaro/coresight$ git clone -b v0.8 https://github.com/Linaro/OpenCSD.git my-opencsd
    Cloning into 'OpenCSD'...
    remote: Counting objects: 2063, done.
    remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063
    Receiving objects: 100% (2063/2063), 2.51 MiB | 1.24 MiB/s, done.
    Resolving deltas: 100% (1399/1399), done.
    Checking connectivity... done.
    linaro@t430:~/linaro/coresight$ ls my-opencsd
    decoder LICENSE README.md HOWTO.md TODO

Once the source code has been acquired, the openCSD library can be compiled.
For Linux two options are available, LINUX and LINUX64, chosen according to
the architecture of the host doing the build, which has nothing to do with
the architecture of the target:

    linaro@t430:~/linaro/coresight/$ cd my-opencsd/decoder/build/linux/
    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls
    makefile rctdl_c_api_lib ref_trace_decode_lib

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ make LINUX64=1 DEBUG=1

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls ../../lib/linux64/dbg/
    libopencsd.a libopencsd_c_api.a libopencsd_c_api.so libopencsd.so

From there the header files and libraries need to be installed on the system,
something that requires root privileges. The default installation path is
`/usr/include/opencsd` for the header files and `/usr/lib/` for the libraries:

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ sudo make install
    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/include/opencsd
    drwxr-xr-x 2 root root 4096 Dec 12 10:19 c_api
    drwxr-xr-x 2 root root 4096 Dec 12 10:19 etmv3
    drwxr-xr-x 2 root root 4096 Dec 12 10:19 etmv4
    -rw-r--r-- 1 root root 28049 Dec 12 10:19 ocsd_if_types.h
    drwxr-xr-x 2 root root 4096 Dec 12 10:19 ptm
    drwxr-xr-x 2 root root 4096 Dec 12 10:19 stm
    -rw-r--r-- 1 root root 7264 Dec 12 10:19 trc_gen_elem_types.h
    -rw-r--r-- 1 root root 3972 Dec 12 10:19 trc_pkt_types.h

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/lib/libopencsd*
    -rw-r--r-- 1 root root 598720 Dec 12 10:19 /usr/lib/libopencsd_c_api.so
    -rw-r--r-- 1 root root 4692200 Dec 12 10:19 /usr/lib/libopencsd.so
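
Before moving on to the perf build, the installation can be checked with a few file tests instead of reading the listings by eye. The sketch below looks for the header directory and the two shared libraries shown above; `opencsd_installed` is a hypothetical name, and the optional prefix argument is an assumption added so that a non-default location can be checked as well.

```shell
# Hypothetical helper: succeed if the openCSD headers and shared libraries
# are present under the given prefix (default: /usr, as in the listing above).
opencsd_installed() {
    prefix="${1:-/usr}"
    [ -d "$prefix/include/opencsd" ] &&
    [ -f "$prefix/lib/libopencsd.so" ] &&
    [ -f "$prefix/lib/libopencsd_c_api.so" ]
}

if opencsd_installed; then
    echo "openCSD found under /usr"
else
    echo "openCSD not found under /usr"
fi
```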

A "clean_install" target is also available so that openCSD installed files can
be removed from a system. Going forward the goal is to have the openCSD library
packaged as a Debian or RPM archive so that it can be installed from a
distribution without having to be compiled.

Off Target Perf Tools Compilation
---------------------------------
As mentioned above, the openCSD library is not part of the perf tools' code
base and needs to be installed on a system prior to compilation. Information
about the status of the openCSD library on a system is given at compile time
by the perf tools build script:

    linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf
    Auto-detecting system features:
    ... dwarf_getlocations: [ on ]
    ... numa_num_possible_cpus: [ OFF ]
    ... libpython: [ on ]
    ... libcrypto: [ on ]
    ... libunwind: [ OFF ]
    ... libdw-dwarf-unwind: [ on ]
    ... get_cpuid: [ on ]
    ... libopencsd: [ on ] <-------

At the end of the compilation a new perf binary is available in `tools/perf/`:

    linaro@t430:~/linaro/linux-kernel$ ldd tools/perf/perf
    linux-vdso.so.1 => (0x00007fff135db000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f15f9176000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f15f8f6e000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f15f8c64000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f15f8a60000)
    libopencsd_c_api.so => /usr/lib/libopencsd_c_api.so (0x00007f15f884e000) <-------
    libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f15f8635000)
    libdw.so.1 => /usr/lib/x86_64-linux-gnu/libdw.so.1 (0x00007f15f83ec000)
    libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007f15f81c5000)
    libslang.so.2 => /lib/x86_64-linux-gnu/libslang.so.2 (0x00007f15f7e38000)
    libperl.so.5.22 => /usr/lib/x86_64-linux-gnu/libperl.so.5.22 (0x00007f15f7a5d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f15f7693000)
    libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f15f7104000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f15f6eea000)
    /lib64/ld-linux-x86-64.so.2 (0x0000559b88038000)
    libopencsd.so => /usr/lib/libopencsd.so (0x00007f15f6c62000) <-------
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f15f68df000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f15f66c9000)
    liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f15f64a6000)
    libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f15f6296000)
    libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f15f605e000)
    libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f15f5e5a000)

Additional debug output from the decoder can be compiled in by setting the
`CSTRACE_RAW` environment variable. Setting this to `packed` gets trace frame
output such as:

    Frame Data; Index 576; RAW_PACKED; d6 d6 d6 d6 d6 d6 d6 d6 fc fb d6 d6 d6 d6 e0 7f
    Frame Data; Index 576; ID_DATA[0x14]; d7 d6 d7 d6 d7 d6 d7 d6 fd fb d7 d6 d7 d6 e0

Setting it to any other value will remove the RAW_PACKED lines.

Working with a debug version of the openCSD library
---------------------------------------------------
When compiling the perf tools it is possible to reference a version of the
openCSD library other than the one installed on the system. This is useful
when working with multiple development trees or when system libraries should
be kept intact. Two environment variables are available to tell the perf tools
build script where to get the header files and libraries, namely CSINCLUDES
and CSLIBS:

    linaro@t430:~/linaro/linux-kernel$ export CSINCLUDES=~/linaro/coresight/my-opencsd/decoder/include/
    linaro@t430:~/linaro/linux-kernel$ export CSLIBS=~/linaro/coresight/my-opencsd/decoder/lib/linux64-rel/
    linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf

This will have the effect of compiling and linking against the provided library.
Since the system's openCSD library is in the loader's search path, the
LD_LIBRARY_PATH environment variable needs to be set so that the provided
library is found first:

    linaro@t430:~/linaro/linux-kernel$ export LD_LIBRARY_PATH=$CSLIBS
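
When switching between several openCSD trees, the three variables can be set together so they never point at different builds. The following is a sketch only: `use_local_opencsd` is a hypothetical function name, the library directory is passed in explicitly because its name depends on how the library was built, and prepending to an existing `LD_LIBRARY_PATH` rather than overwriting it is a design choice of this sketch.

```shell
# Hypothetical helper: point the perf build and the loader at a local
# openCSD tree.
#   $1  root of the openCSD source tree
#   $2  directory containing the built libraries
use_local_opencsd() {
    tree="$1"; libdir="$2"
    export CSINCLUDES="$tree/decoder/include/"
    export CSLIBS="$libdir"
    # Put the local build first, keeping anything already configured.
    export LD_LIBRARY_PATH="$CSLIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example:
#   use_local_opencsd ~/linaro/coresight/my-opencsd \
#                     ~/linaro/coresight/my-opencsd/decoder/lib/linux64-rel/
#   make VF=1 -C tools/perf
```

The function has to be defined and called in the shell that later runs `make` and `perf`, since exported variables do not propagate back from a child process.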

Trace Decoding with Perf Report
-------------------------------
Before working with custom traces it is suggested to use a trace bundle that
is known to be working properly. A sample bundle has been made available
[here][2]. Trace bundles can be extracted anywhere and have no dependencies on
where the perf tools and openCSD library have been compiled.

    linaro@t430:~/linaro/coresight$ mkdir sept20
    linaro@t430:~/linaro/coresight$ cd sept20
    linaro@t430:~/linaro/coresight/sept20$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
    linaro@t430:~/linaro/coresight/sept20$ md5sum uname.v4.user.sept20.tgz
    f53f11d687ce72bdbe9de2e67e960ec6 uname.v4.user.sept20.tgz
    linaro@t430:~/linaro/coresight/sept20$ tar xf uname.v4.user.sept20.tgz
    linaro@t430:~/linaro/coresight/sept20$ ls -la
    drwxrwxr-x 3 linaro linaro 4096 Mar 3 10:26 .
    drwxrwxr-x 5 linaro linaro 4096 Mar 3 10:13 ..
    drwxr-xr-x 7 linaro linaro 4096 Feb 24 12:21 .debug
    -rw------- 1 linaro linaro 78016 Feb 24 12:21 perf.data
    -rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.sept20.tgz
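
The checksum comparison in the session above can be scripted so that a corrupted download is caught before extraction. This sketch assumes GNU `md5sum` is available, as in the session; `verify_md5` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the MD5 digest of a file matches.
#   $1  file to check
#   $2  expected digest in lowercase hexadecimal
verify_md5() {
    file="$1"; expected="$2"
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Example:
#   verify_md5 uname.v4.user.sept20.tgz f53f11d687ce72bdbe9de2e67e960ec6 \
#       && tar xf uname.v4.user.sept20.tgz
```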

Perf expects the files related to the trace capture (`perf.data`) to be
located under `~/.debug` [3]. This example removes the current `~/.debug`
directory to be sure everything is clean.

    linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug
    linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    # Total Lost Samples: 0
    # Samples: 0 of event 'cs_etm//u'
    # Event count (approx.): 0
    # Children Self Command Shared Object Symbol
    # ........ ........ ....... ............. ......
    # Samples: 0 of event 'dummy:u'
    # Event count (approx.): 0
    # Children Self Command Shared Object Symbol
    # ........ ........ ....... ............. ......
    # Samples: 115K of event 'instructions:u'
    # Event count (approx.): 522009
    # Children Self Command Shared Object Symbol
    # ........ ........ ....... ................ ......................
    4.13% 4.13% uname libc-2.21.so [.] 0x0000000000078758
    3.81% 3.81% uname libc-2.21.so [.] 0x0000000000078e50
    2.06% 2.06% uname libc-2.21.so [.] 0x00000000000fcaf4
    1.65% 1.65% uname libc-2.21.so [.] 0x00000000000fcae4
    1.59% 1.59% uname ld-2.21.so [.] 0x000000000000a7f4
    1.50% 1.50% uname libc-2.21.so [.] 0x0000000000078e40
    1.43% 1.43% uname libc-2.21.so [.] 0x00000000000fcac4
    1.31% 1.31% uname libc-2.21.so [.] 0x000000000002f0c0
    1.26% 1.26% uname ld-2.21.so [.] 0x0000000000016888
    1.24% 1.24% uname libc-2.21.so [.] 0x0000000000078e7c
    1.24% 1.24% uname libc-2.21.so [.] 0x00000000000fcab8
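
Removing `~/.debug` outright discards whatever perf had cached there from earlier sessions. Where that matters, the existing directory can be moved aside first. The sketch below does this; the function name `install_debug_cache` and the `.debug.bak.<pid>` backup name are hypothetical choices, and `cp -dpR` is used as in the session above, which assumes GNU `cp`.

```shell
# Hypothetical helper: install a bundle's .debug directory as ~/.debug,
# keeping any existing ~/.debug as a backup instead of deleting it.
#   $1  path to the .debug directory extracted from the bundle
install_debug_cache() {
    src="$1"
    if [ -e "$HOME/.debug" ]; then
        mv "$HOME/.debug" "$HOME/.debug.bak.$$"
    fi
    cp -dpR "$src" "$HOME/.debug"
}

# Example, run from the directory where the bundle was extracted:
#   install_debug_cache .debug
```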

Additional data, containing a dump of the trace packets received, can be
obtained using the command

    mjl@ubuntu-vbox:./perf-opencsd-master/coresight/tools/perf/perf report --stdio --dump

This results in a large amount of data, with the trace looking like:

    0x618 [0x30]: PERF_RECORD_AUXTRACE size: 0x11ef0 offset: 0 ref: 0x4d881c1f13216016 idx: 0 tid: 15244 cpu: -1

    . ... CoreSight ETM Trace data: size 73456 bytes

    0: I_ASYNC : Alignment Synchronisation.
    12: I_TRACE_INFO : Trace Info.
    17: I_TRACE_ON : Trace On.
    18: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F24D80; Ctxt: AArch64,EL0, NS;
    28: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
    29: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
    30: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
    32: I_ATOM_F6 : Atom format 6.; EEEEN
    33: I_ATOM_F1 : Atom format 1.; E
    34: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows;
    36: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C;
    45: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS;
    56: I_TRACE_ON : Trace On.
    57: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; Ctxt: AArch64,EL0, NS;
    68: I_ATOM_F3 : Atom format 3.; NEE
    69: I_ATOM_F3 : Atom format 3.; NEN
    70: I_ATOM_F3 : Atom format 3.; NNE
    71: I_ATOM_F5 : Atom format 5.; ENENE
    72: I_ATOM_F5 : Atom format 5.; NENEN
    73: I_ATOM_F5 : Atom format 5.; ENENE
    74: I_ATOM_F5 : Atom format 5.; NENEN
    75: I_ATOM_F5 : Atom format 5.; ENENE
    76: I_ATOM_F3 : Atom format 3.; NNE
    77: I_ATOM_F3 : Atom format 3.; NNE
    78: I_ATOM_F3 : Atom format 3.; NNE
    80: I_ATOM_F3 : Atom format 3.; NNE
    81: I_ATOM_F3 : Atom format 3.; ENN
    82: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows;
    84: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0;
    93: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS;
    104: I_TRACE_ON : Trace On.
    105: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; Ctxt: AArch64,EL0, NS;
    116: I_ATOM_F5 : Atom format 5.; NNNNN
    117: I_ATOM_F5 : Atom format 5.; NNNNN
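
Given the volume of a packet dump, a per-packet-type count is often a more useful first look than the raw listing. The sketch below tallies packet names read from standard input. It assumes the `<offset>: <PACKET_NAME> : ...` layout shown above, with the packet name as the second whitespace-separated field and starting with `I_`; other decoder versions may format the dump differently. `count_packets` is a hypothetical name.

```shell
# Hypothetical helper: count trace packets by type from a dump on stdin.
# Assumption: the packet name is the second field and starts with "I_".
count_packets() {
    awk '$2 ~ /^I_/ { n[$2]++ } END { for (k in n) print n[k], k }' |
        sort -k1,1nr -k2
}

# Example:
#   perf report --stdio --dump | count_packets
```

The output is sorted by descending count, so the dominant packet types appear first.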

Trace Decoding with Perf Script
-------------------------------
Working with perf scripts needs more command line options but yields
disassembled output such as the following:

    linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
    linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
    linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump

    7f89f24d80: 910003e0 mov x0, sp
    7f89f24d84: 94000d53 bl 7f89f282d0 <free@plt+0x3790>
    7f89f282d0: d11203ff sub sp, sp, #0x480
    7f89f282d4: a9ba7bfd stp x29, x30, [sp,#-96]!
    7f89f282d8: 910003fd mov x29, sp
    7f89f282dc: a90363f7 stp x23, x24, [sp,#48]
    7f89f282e0: 9101e3b7 add x23, x29, #0x78
    7f89f282e4: a90573fb stp x27, x28, [sp,#80]
    7f89f282e8: a90153f3 stp x19, x20, [sp,#16]
    7f89f282ec: aa0003fb mov x27, x0
    7f89f282f0: 910a82e1 add x1, x23, #0x2a0
    7f89f282f4: a9025bf5 stp x21, x22, [sp,#32]
    7f89f282f8: a9046bf9 stp x25, x26, [sp,#64]
    7f89f282fc: 910102e0 add x0, x23, #0x40
    7f89f28300: f800841f str xzr, [x0],#8
    7f89f28304: eb01001f cmp x0, x1
    7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>
    7f89f28300: f800841f str xzr, [x0],#8
    7f89f28304: eb01001f cmp x0, x1
    7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>
    7f89f28300: f800841f str xzr, [x0],#8
    7f89f28304: eb01001f cmp x0, x1
    7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>

Kernel Trace Decoding
---------------------

When dealing with kernel space traces the vmlinux file has to be communicated
explicitly to perf using the "--vmlinux" command line option:

    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio --vmlinux=./vmlinux

    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf script --vmlinux=./vmlinux

When using scripts things get a little more convoluted. Using the same example
as above but for kernel traces, the command line becomes:

    linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
    linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
    linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script \
        --vmlinux=./vmlinux \
        --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- \
        -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump \
        -k ./vmlinux

The option "--vmlinux=./vmlinux" is interpreted by the "perf script" command
the same way it is for "perf report". The option "-k ./vmlinux" is dependent
on the script being executed and is unrelated to "--vmlinux", though it
is highly advised to keep them synchronized.
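
One way to keep the two vmlinux options synchronized is to derive both from a single variable. The sketch below only prints the arguments that would follow `perf`, so it can be inspected before anything is run. `perf_script_kernel_args` is a hypothetical helper, and the `-d` and `-k` script options are taken from the description above rather than checked against a particular version of `cs-trace-disasm.py`.

```shell
# Hypothetical helper: print the `perf script` arguments for kernel trace
# decoding, using the same vmlinux for perf and for the python script.
#   $1  path to vmlinux
#   $2  path to cs-trace-disasm.py
#   $3  path to the aarch64 objdump binary
perf_script_kernel_args() {
    vmlinux="$1"; script="$2"; objdump="$3"
    printf '%s\n' "script --vmlinux=$vmlinux --script=python:$script -- -d $objdump -k $vmlinux"
}

perf_script_kernel_args ./vmlinux scripts/python/cs-trace-disasm.py aarch64-linux-gnu-objdump
```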

Perf Test Environment Scripts
-----------------------------

The decoder library comes with a number of `bash` scripts that ease setting up the
offline build and test environment for perf, and executing tests.

These scripts can be found in

    decoder/tests/perf-test-scripts

There are three scripts provided:

- `perf-setup-env.bash` : sets up all the environment variables mentioned above.
- `perf-test-report.bash` : runs `perf report` using the environment set up by `perf-setup-env.bash`.
- `perf-test-script.bash` : runs `perf script` using the environment set up by `perf-setup-env.bash`.

1. Prior to building perf, edit `perf-setup-env.bash` to conform to your environment. There are four lines at the top of the file that will require editing.

2. Execute the script using the command

        source perf-setup-env.bash

   This will set up all the environment variables mentioned in the sections on building and running
   perf above, and these are used by the `perf-test...` scripts to run the tests.

3. Build perf as described above.
4. Follow the instructions for downloading the test capture, or create a capture from your target.
5. Copy the `perf-test...` scripts into the capture data directory, that is, the one that contains `perf.data`.

6. The scripts can now be run. No options are required for the default operation, but any command line options will be added to the perf report / perf script command line. For example,

        ./perf-test-report.bash --dump

   will add the --dump option to the end of the command line and run

        ${PERF_EXEC_PATH}/perf report --stdio --dump
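
The way extra options are appended in step 6 can be illustrated with a dry-run function that prints the command line instead of executing it. This is not the content of `perf-test-report.bash`; `perf_report_cmdline` is a hypothetical name, and the only things taken from the text above are the `PERF_EXEC_PATH` variable and the `perf report --stdio` base command.

```shell
# Hypothetical dry-run helper: print the perf report command line that
# results from appending the given options.
perf_report_cmdline() {
    # Stop early if the environment has not been set up.
    : "${PERF_EXEC_PATH:?PERF_EXEC_PATH is not set}"
    printf '%s' "${PERF_EXEC_PATH}/perf report --stdio"
    for arg in "$@"; do
        printf ' %s' "$arg"         # each extra option goes at the end
    done
    printf '\n'
}

# Example, after `source perf-setup-env.bash`:
#   perf_report_cmdline --dump
```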

Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

Below is an example of using ARM ETM for AutoFDO. The perf support for this
is experimental and available on the 'autoFDO' branch of
the [perf-opencsd github repository][1].

It also requires autofdo (https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

    $ gcc-5 -O3 sort.c -o sort_optimized
    $ taskset -c 2 ./sort_optimized
    Bubble sorting array of 30000 elements

    $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements

The Linaro CoreSight Team
-------------------------

We welcome help on this project. If you would like to add features or help
improve the way things work, we want to hear from you.

*The Linaro CoreSight Team*

--------------------------------------

[1]: https://github.com/Linaro/perf-opencsd "perf-opencsd Github"

[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz

[3]: Get in touch with us if you know a way to change this.