# Heap Profiler Internals

This document describes how the heap profiler works and how to add heap
profiling support to your allocator. If you just want to know how to use it,
see [Heap Profiling with MemoryInfra](heap_profiler.md).

[TOC]

## Overview

The heap profiler consists of three main components:

* **The Context Tracker**: Responsible for providing context (pseudo stack
  backtrace) when an allocation occurs.
* **The Allocation Register**: A specialized hash table that stores allocation
  details by address.
* **The Heap Dump Writer**: Extracts the most important information from a set
  of recorded allocations and converts it into a format that can be dumped into
  the trace log.

These components are designed to work well together, but to be usable
independently as well.

When there is a way to get notified of all allocations and frees, this is the
normal flow:

1. When an allocation occurs, call
   [`AllocationContextTracker::GetInstanceForCurrentThread()->GetContextSnapshot()`][context-tracker]
   to get an [`AllocationContext`][alloc-context].
2. Insert that context together with the address and size into an
   [`AllocationRegister`][alloc-register] by calling `Insert()`.
3. When memory is freed, remove it from the register with `Remove()`.
4. On memory dump, collect the allocations from the register, call
   [`ExportHeapDump()`][export-heap-dump], and add the generated heap dump to
   the memory dump.
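
The flow above can be sketched with toy stand-ins. This is a minimal,
self-contained illustration and not the Chromium implementation:
`ToyProfiler`, `ToyContext`, `ToyAllocation`, and all of their methods are
hypothetical names, a plain `std::unordered_map` stands in for the
`AllocationRegister`, and a string stands in for the backtrace.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an allocation context.
struct ToyContext {
  std::string backtrace;  // For example "MessageLoop::Run > Parse".
};

// Hypothetical stand-in for what the register stores per address.
struct ToyAllocation {
  std::size_t size;
  ToyContext context;
};

class ToyProfiler {
 public:
  // Stands in for the context tracker's pseudo stack.
  void set_current_backtrace(const std::string& backtrace) {
    current_.backtrace = backtrace;
  }

  // Steps 1 and 2: snapshot the context and record the allocation by address.
  void OnAlloc(const void* address, std::size_t size) {
    assert(address != nullptr);
    register_[address] = ToyAllocation{size, current_};
  }

  // Step 3: forget the allocation when its memory is freed.
  void OnFree(const void* address) { register_.erase(address); }

  // Step 4: collect the live allocations and group their sizes by context.
  std::map<std::string, std::size_t> Dump() const {
    std::map<std::string, std::size_t> bytes_by_backtrace;
    for (const auto& entry : register_)
      bytes_by_backtrace[entry.second.context.backtrace] += entry.second.size;
    return bytes_by_backtrace;
  }

 private:
  ToyContext current_;
  // Unlike this map, a register used from malloc hooks must not allocate
  // through malloc itself.
  std::unordered_map<const void*, ToyAllocation> register_;
};
```

With allocations of 16 and 32 bytes recorded under `"Parse"`, one of 8 bytes
under `"Paint"`, and the 16-byte allocation freed again, `Dump()` reports 32
bytes for `"Parse"` and 8 bytes for `"Paint"`.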

[context-tracker]: https://chromium.googlesource.com/chromium/src/+/master/base/trace_event/heap_profiler_allocation_context_tracker.h
[alloc-context]: https://chromium.googlesource.com/chromium/src/+/master/base/trace_event/heap_profiler_allocation_context.h
[alloc-register]: https://chromium.googlesource.com/chromium/src/+/master/base/trace_event/heap_profiler_allocation_register.h
[export-heap-dump]: https://chromium.googlesource.com/chromium/src/+/master/base/trace_event/heap_profiler_heap_dump_writer.h

*** aside
An allocator can skip steps 2 and 3 if it is able to store the context itself,
and if it is able to enumerate all allocations for step 4.
***

When heap profiling is enabled (the `--enable-heap-profiling` flag is passed),
the memory dump manager calls `OnHeapProfilingEnabled()` on every
`MemoryDumpProvider` as early as possible, so allocators can start recording
allocations. This should be done even when tracing has not been started,
because these allocations might still be around when a heap dump happens during
tracing.

## Context Tracker

The [`AllocationContextTracker`][context-tracker] is a thread-local object. Its
main purpose is to keep track of a pseudo stack of trace events. Chrome has
been instrumented with lots of `TRACE_EVENT` macros. These trace events push
their name to a thread-local stack when they go into scope, and pop when they
go out of scope, if all of the following conditions have been met:

* A trace is being recorded.
* The category of the event is enabled in the trace config.
* Heap profiling is enabled (with the `--enable-heap-profiling` flag).

This means that allocations that occur before tracing is started will not have
backtrace information in their context.

A thread-local instance of the context tracker is initialized lazily when it is
first accessed. This might be because a trace event was pushed or popped, or
because `GetContextSnapshot()` was called when an allocation occurred.
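
The pseudo stack and its lazy per-thread initialization can be sketched as
follows. This is an illustration under assumptions, not the code behind
`TRACE_EVENT`: `ToyContextTracker`, `ToyTraceState`, and `ToyScopedEvent` are
hypothetical names, and the three conditions listed above are modelled as
plain booleans.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tracker: one instance per thread, created on first access.
class ToyContextTracker {
 public:
  static ToyContextTracker* GetInstanceForCurrentThread() {
    // A function-local thread_local is constructed lazily, the first time
    // the calling thread reaches this declaration.
    thread_local ToyContextTracker tracker;
    return &tracker;
  }

  void Push(const char* name) { pseudo_stack_.push_back(name); }

  void Pop() {
    assert(!pseudo_stack_.empty());
    pseudo_stack_.pop_back();
  }

  // Returns a copy of the current pseudo stack, outermost event first.
  std::vector<const char*> GetContextSnapshot() const { return pseudo_stack_; }

 private:
  ToyContextTracker() = default;
  std::vector<const char*> pseudo_stack_;
};

// Hypothetical stand-in for the three conditions listed above.
struct ToyTraceState {
  bool recording = false;
  bool category_enabled = false;
  bool heap_profiling_enabled = false;

  bool ShouldCapture() const {
    return recording && category_enabled && heap_profiling_enabled;
  }
};

// Pushes its name when it goes into scope and pops it when it goes out of
// scope, but only if all conditions held at construction time.
class ToyScopedEvent {
 public:
  ToyScopedEvent(const ToyTraceState& state, const char* name)
      : pushed_(state.ShouldCapture()) {
    if (pushed_)
      ToyContextTracker::GetInstanceForCurrentThread()->Push(name);
  }

  ~ToyScopedEvent() {
    if (pushed_)
      ToyContextTracker::GetInstanceForCurrentThread()->Pop();
  }

  ToyScopedEvent(const ToyScopedEvent&) = delete;
  ToyScopedEvent& operator=(const ToyScopedEvent&) = delete;

 private:
  const bool pushed_;
};
```

Remembering in `pushed_` whether the push happened keeps pushes and pops
balanced even if the conditions change while the event is in scope.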

[`AllocationContext`][alloc-context] is what is used to group and break down
allocations. Currently `AllocationContext` has the following fields:

* Backtrace: filled by the context tracker, obtained from the thread-local
  pseudo stack.
* Type name: to be filled in at a point where the type of a pointer is known,
  set to _[unknown]_ by default.

It is possible to modify this context after insertion into the register, for
instance to set the type name if it was not known at the time of allocation.
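
As a rough sketch of how a context acts as a grouping key, and of how setting
the type name after insertion changes the grouping, consider the following.
All names here (`ToyContext`, `ToyContextHash`, `ToyRegister`, `SetTypeName`,
`GroupByContext`) are hypothetical, and a single string stands in for the real
backtrace representation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical simplification of an allocation context.
struct ToyContext {
  std::string backtrace;
  std::string type_name = "[unknown]";  // Default until the type is known.

  bool operator==(const ToyContext& other) const {
    return backtrace == other.backtrace && type_name == other.type_name;
  }
};

// Combines both fields so that contexts can be used as hash table keys.
struct ToyContextHash {
  std::size_t operator()(const ToyContext& context) const {
    std::size_t h1 = std::hash<std::string>()(context.backtrace);
    std::size_t h2 = std::hash<std::string>()(context.type_name);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

struct ToyAllocation {
  std::size_t size;
  ToyContext context;
};

using ToyRegister = std::unordered_map<const void*, ToyAllocation>;

// Sets the type name of an allocation that is already in the register.
// Returns false if |address| is not registered.
bool SetTypeName(ToyRegister* reg, const void* address,
                 const std::string& type_name) {
  assert(reg != nullptr);
  auto it = reg->find(address);
  if (it == reg->end())
    return false;
  it->second.context.type_name = type_name;
  return true;
}

// Sums the sizes of all registered allocations per distinct context.
std::unordered_map<ToyContext, std::size_t, ToyContextHash> GroupByContext(
    const ToyRegister& reg) {
  std::unordered_map<ToyContext, std::size_t, ToyContextHash> bytes_by_context;
  for (const auto& entry : reg)
    bytes_by_context[entry.second.context] += entry.second.size;
  return bytes_by_context;
}
```

Two allocations with the same backtrace fall into one group while both have
the default type name, and split into two groups once one of them is given a
type name.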

## Allocation Register

The [`AllocationRegister`][alloc-register] is a hash table specialized for
storing `(size, AllocationContext)` pairs by address. It has been optimized for
Chrome's typical number of unfreed allocations, and it is backed by `mmap`
memory directly so there are no reentrancy issues when using it to record
`malloc` allocations.

The allocation register is threading-agnostic. Access must be synchronised
properly.
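
To illustrate both points, here is a minimal sketch of a table whose storage
comes directly from `mmap`, wrapped in a class that serializes access. It is
POSIX-only, stores just a size per address, and is not the real
`AllocationRegister`: `ToyRegister`, `LockedToyRegister`, the linear-probing
layout, and the fixed capacity are all assumptions made for the example.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>

// Hypothetical fixed-capacity table. Its storage comes straight from mmap(),
// so Insert() and Remove() never call malloc().
class ToyRegister {
 public:
  explicit ToyRegister(std::size_t capacity) : capacity_(capacity) {
    void* memory = mmap(nullptr, capacity_ * sizeof(Cell),
                        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
                        -1, 0);
    if (memory == MAP_FAILED)
      std::abort();
    cells_ = static_cast<Cell*>(memory);  // Anonymous pages start zeroed.
  }
  ~ToyRegister() { munmap(cells_, capacity_ * sizeof(Cell)); }
  ToyRegister(const ToyRegister&) = delete;
  ToyRegister& operator=(const ToyRegister&) = delete;

  // Records |size| for |address|. Returns false if the table is full.
  bool Insert(const void* address, std::size_t size) {
    if (address == nullptr) return false;
    Cell* free_cell = nullptr;
    for (std::size_t i = 0; i < capacity_; ++i) {
      Cell* cell = &cells_[(Hash(address) + i) % capacity_];
      if (cell->address == address) {
        cell->size = size;
        return true;
      }
      if (cell->address == nullptr) {
        if (free_cell == nullptr) free_cell = cell;
        if (!cell->deleted) break;  // Never used, so |address| is not stored.
      }
    }
    if (free_cell == nullptr) return false;
    free_cell->address = address;
    free_cell->size = size;
    free_cell->deleted = false;
    return true;
  }

  void Remove(const void* address) {
    Cell* cell = Find(address);
    if (cell != nullptr) {
      cell->address = nullptr;
      cell->deleted = true;
    }
  }

  bool Get(const void* address, std::size_t* size) const {
    const Cell* cell = Find(address);
    if (cell == nullptr) return false;
    *size = cell->size;
    return true;
  }

 private:
  struct Cell {
    const void* address;  // nullptr means empty or deleted.
    std::size_t size;
    bool deleted;  // Tombstone that keeps probe chains intact.
  };

  static std::size_t Hash(const void* address) {
    return static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(address) >> 3);
  }

  Cell* Find(const void* address) const {
    if (address == nullptr) return nullptr;
    for (std::size_t i = 0; i < capacity_; ++i) {
      Cell* cell = &cells_[(Hash(address) + i) % capacity_];
      if (cell->address == address) return cell;
      if (cell->address == nullptr && !cell->deleted) return nullptr;
    }
    return nullptr;
  }

  std::size_t capacity_;
  Cell* cells_;
};

// The table does no locking itself, so callers serialize access. Here a
// hypothetical wrapper does that with a std::mutex.
class LockedToyRegister {
 public:
  explicit LockedToyRegister(std::size_t capacity) : register_(capacity) {}
  bool Insert(const void* address, std::size_t size) {
    std::lock_guard<std::mutex> guard(lock_);
    return register_.Insert(address, size);
  }
  void Remove(const void* address) {
    std::lock_guard<std::mutex> guard(lock_);
    register_.Remove(address);
  }
  bool Get(const void* address, std::size_t* size) {
    std::lock_guard<std::mutex> guard(lock_);
    return register_.Get(address, size);
  }

 private:
  std::mutex lock_;
  ToyRegister register_;
};
```

Keeping the lock outside the table mirrors the threading-agnostic design: the
owner decides how coarse the lock is and what else it protects.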

## Heap Dump Writer

Dumping every single allocation in the allocation register straight into the
trace log is not an option due to the sheer volume (~300k unfreed allocations).
The role of the [`ExportHeapDump()`][export-heap-dump] function is to group
allocations, striking a balance between trace log size and detail.
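
The grouping strategy used by `ExportHeapDump()` is not spelled out here. As
one possible way to trade detail for size, the sketch below keeps contexts
that account for at least a threshold number of bytes and folds everything
smaller into a single bucket. `SummarizeForDump`, the `min_bytes` parameter,
and the `"<other>"` bucket name are hypothetical and are not taken from the
heap dump writer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical reduction step: contexts with at least |min_bytes| keep their
// own entry, all smaller ones are summed into one "<other>" entry.
std::map<std::string, std::size_t> SummarizeForDump(
    const std::map<std::string, std::size_t>& bytes_by_context,
    std::size_t min_bytes) {
  assert(min_bytes > 0);
  std::map<std::string, std::size_t> summary;
  for (const auto& entry : bytes_by_context) {
    if (entry.second >= min_bytes)
      summary[entry.first] = entry.second;
    else
      summary["<other>"] += entry.second;
  }
  return summary;
}
```

The total number of bytes is preserved, only the breakdown of small
contributors is lost.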

See the [Heap Dump Format][heap-dump-format] document for more details about the
structure of the heap dump in the trace log.

[heap-dump-format]: https://docs.google.com/document/d/1NqBg1MzVnuMsnvV1AKLdKaPSPGpd81NaMPVk5stYanQ

## Instrumenting an Allocator

Below is an example of adding heap profiling support to an allocator that has
an existing memory dump provider.

```cpp
class FooDumpProvider : public MemoryDumpProvider {
 public:
  static FooDumpProvider* GetInstance();

  void InsertAllocation(void* address, size_t size) {
    AllocationContext context =
        AllocationContextTracker::GetInstanceForCurrentThread()
            ->GetContextSnapshot();
    AutoLock lock(allocation_register_lock_);
    // The register only exists while heap profiling is enabled.
    if (allocation_register_)
      allocation_register_->Insert(address, size, context);
  }

  void RemoveAllocation(void* address) {
    AutoLock lock(allocation_register_lock_);
    if (allocation_register_)
      allocation_register_->Remove(address);
  }

  // Will be called as early as possible by the memory dump manager.
  void OnHeapProfilingEnabled(bool enabled) override {
    AutoLock lock(allocation_register_lock_);
    allocation_register_.reset(enabled ? new AllocationRegister() : nullptr);

    // When |enabled| is true, make sure that from now on, for every
    // allocation and free, |FooDumpProvider::GetInstance()->InsertAllocation()|
    // and |RemoveAllocation()| are called.
  }

  bool OnMemoryDump(const MemoryDumpArgs& args,
                    ProcessMemoryDump* pmd) override {
    // Do regular dumping here.

    // Dump the heap only for detailed dumps.
    if (args.level_of_detail == MemoryDumpLevelOfDetail::DETAILED) {
      TraceEventMemoryOverhead overhead;
      hash_map<AllocationContext, size_t> bytes_by_context;

      {
        AutoLock lock(allocation_register_lock_);
        if (allocation_register_) {
          // Group allocations in the register into |bytes_by_context|, but do
          // no additional processing inside the lock.
          for (const auto& alloc_size : *allocation_register_)
            bytes_by_context[alloc_size.context] += alloc_size.size;

          allocation_register_->EstimateTraceMemoryOverhead(&overhead);
        }
      }

      if (!bytes_by_context.empty()) {
        scoped_refptr<TracedValue> heap_dump = ExportHeapDump(
            bytes_by_context,
            pmd->session_state()->stack_frame_deduplicator(),
            pmd->session_state()->type_name_deduplicator());
        pmd->AddHeapDump("foo_allocator", heap_dump);
        overhead.DumpInto("tracing/heap_profiler", pmd);
      }
    }

    return true;
  }

 private:
  // Kept as pointer because |AllocationRegister| allocates a lot of virtual
  // address space when constructed, so only construct it when heap profiling is
  // enabled.
  scoped_ptr<AllocationRegister> allocation_register_;
  Lock allocation_register_lock_;
};
```

*** aside
The implementation for `malloc` is more complicated because it needs to deal
with reentrancy.
***