Ramoops oops/panic logger
=========================

Sergiu Iordache <sergiu@chromium.org>

Updated: 10 Feb 2021

Introduction
------------

Ramoops is an oops/panic logger that writes its logs to RAM before the system
crashes. It works by logging oopses and panics in a circular buffer. Ramoops
needs a system with persistent RAM so that the content of that area can
survive after a restart.

Ramoops concepts
----------------

Ramoops uses a predefined memory area to store the dump. The start, size,
and type of the memory area are set using the following variables:

  * ``mem_address`` for the start
  * ``mem_size`` for the size. The memory size will be rounded down to a
    power of two.
  * ``mem_type`` to specify the memory type (default is pgprot_writecombine).
  * ``mem_name`` to specify a memory region defined by the ``reserve_mem``
    command line parameter.

Typically the default value of ``mem_type=0`` should be used, as that sets the
pstore mapping to ``pgprot_writecombine``. Setting ``mem_type=1`` attempts to
use ``pgprot_noncached``, which only works on some platforms. This is because
pstore depends on atomic operations. At least on ARM, ``pgprot_noncached``
causes the memory to be mapped strongly ordered, and atomic operations on
strongly ordered memory are implementation defined and won't work on many ARMs,
such as OMAPs. Setting ``mem_type=2`` attempts to treat the memory region as
normal memory, which enables full caching on it. This can improve performance.

The memory area is divided into ``record_size`` chunks (also rounded down to
a power of two) and each kmsg dump writes a ``record_size`` chunk of
information.
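
The rounding and chunking described above can be sketched in userspace C. This
is only an illustration, not the driver's code: the helper names are
hypothetical, and the sketch assumes the whole memory area is used for kmsg
dump records.

.. code-block:: c

  #include <stddef.h>

  /* Largest power of two that is <= n; returns 0 for n == 0. */
  static size_t round_down_pow2(size_t n)
  {
      size_t p = 1;

      if (n == 0)
          return 0;
      /* p <= n / 2 guarantees that p * 2 never exceeds n (no overflow). */
      while (p <= n / 2)
          p *= 2;
      return p;
  }

  /*
   * Hypothetical helper: number of record_size chunks that fit in the area,
   * assuming (for this sketch) that nothing else shares the region.
   */
  static size_t sketch_record_count(size_t mem_size, size_t record_size)
  {
      size_t area = round_down_pow2(mem_size);
      size_t rec = round_down_pow2(record_size);

      return rec ? area / rec : 0;
  }

For a 0x100000-byte area and 0x4000-byte records, this sketch gives 64 chunks.
Sizes that are not powers of two are rounded down first, so a ``record_size``
of 0x5000 is treated as 0x4000.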

Which kinds of kmsg dumps are stored can be limited via the ``max_reason``
value, as defined in include/linux/kmsg_dump.h's ``enum kmsg_dump_reason``.
For example, to store both Oopses and Panics, ``max_reason`` should be set to
2 (KMSG_DUMP_OOPS); to store only Panics, ``max_reason`` should be set to
1 (KMSG_DUMP_PANIC). Setting this to 0 (KMSG_DUMP_UNDEF) means the reason
filtering will be controlled by the ``printk.always_kmsg_dump`` boot param:
if unset, it'll be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX.
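
The filtering rule can be sketched in userspace C as follows. The enum is a
local stand-in that copies only the values named above (the upper bound is a
placeholder, not the real ``KMSG_DUMP_MAX``), the function names are
hypothetical, and the sketch assumes a dump is stored when its reason is
numerically no greater than the effective ``max_reason``.

.. code-block:: c

  #include <stdbool.h>

  /* Local stand-ins for the values named in the text; not the kernel enum. */
  enum sketch_dump_reason {
      SKETCH_DUMP_UNDEF = 0,
      SKETCH_DUMP_PANIC = 1,
      SKETCH_DUMP_OOPS  = 2,
      SKETCH_DUMP_MAX   = 100, /* placeholder upper bound for this sketch */
  };

  /* Resolve max_reason == 0 using the always_kmsg_dump setting. */
  static int sketch_effective_max_reason(int max_reason, bool always_kmsg_dump)
  {
      if (max_reason == SKETCH_DUMP_UNDEF)
          return always_kmsg_dump ? SKETCH_DUMP_MAX : SKETCH_DUMP_OOPS;
      return max_reason;
  }

  /* A dump is kept when its reason does not exceed the effective limit. */
  static bool sketch_should_store(int reason, int max_reason,
                                  bool always_kmsg_dump)
  {
      return reason <= sketch_effective_max_reason(max_reason,
                                                   always_kmsg_dump);
  }

With ``max_reason`` set to 1, this sketch keeps a panic and drops an oops; with
``max_reason`` left at 0 and ``printk.always_kmsg_dump`` unset, it behaves as if
``max_reason`` were 2.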

The module uses a counter to record multiple dumps, but the counter gets reset
on restart (i.e. new dumps after the restart will overwrite old ones).

Ramoops also supports software ECC protection of persistent memory regions.
This might be useful when a hardware reset was used to bring the machine back
to life (e.g. a watchdog triggered). In such cases, RAM may be somewhat
corrupt, but usually it is restorable.

Setting the parameters
----------------------

Setting the ramoops parameters can be done in several different ways:

 A. Use the module parameters (which have the names of the variables described
 above). For quick debugging, you can also reserve parts of memory during
 boot and then use the reserved memory for ramoops. For example, assuming a
 machine with > 128 MB of memory, the following kernel command line will tell
 the kernel to use only the first 128 MB of memory, and place an ECC-protected
 ramoops region at the 128 MB boundary::

	mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1

 B. Use Device Tree bindings, as described in
 ``Documentation/devicetree/bindings/reserved-memory/ramoops.yaml``.
 For example::

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		ramoops@8f000000 {
			compatible = "ramoops";
			reg = <0 0x8f000000 0 0x100000>;
			record-size = <0x4000>;
			console-size = <0x4000>;
		};
	};

 C. Use a platform device and set the platform data. The parameters can then
 be set through that platform data. An example of doing that is:

 .. code-block:: c

  #include <linux/pstore_ram.h>
  [...]

  static struct ramoops_platform_data ramoops_data = {
        .mem_size               = <...>,
        .mem_address            = <...>,
        .mem_type               = <...>,
        .record_size            = <...>,
        .max_reason             = <...>,
        .ecc                    = <...>,
  };

  static struct platform_device ramoops_dev = {
        .name = "ramoops",
        .dev = {
                .platform_data = &ramoops_data,
        },
  };

  [... inside a function ...]
  int ret;

  ret = platform_device_register(&ramoops_dev);
  if (ret) {
        printk(KERN_ERR "unable to register platform device\n");
        return ret;
  }

 D. Use a region of memory reserved via the ``reserve_mem`` command line
    parameter. The address and size will be defined by the ``reserve_mem``
    parameter. Note that ``reserve_mem`` may not always allocate memory
    in the same location, and cannot be relied upon. Testing will need
    to be done, and it may not work on every machine, nor every kernel.
    Consider this a "best effort" approach. The ``reserve_mem`` option
    takes a size, alignment and name as arguments. The name is used
    to map the memory to a label that can be retrieved by ramoops.
    For example::

	reserve_mem=2M:4096:oops  ramoops.mem_name=oops

You can specify either RAM memory or peripheral devices' memory. However, when
specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
very early in the architecture code, e.g.::

	#include <linux/memblock.h>

	memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);

Dump format
-----------

The data dump begins with a header, currently defined as ``====`` followed by a
timestamp and a new line. The dump then continues with the actual data.
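
A userspace tool that post-processes a dump could skip this header along the
following lines. This is a minimal sketch with a hypothetical function name;
it relies only on the ``====`` marker and the terminating new line, and makes
no assumption about how the timestamp itself is formatted.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  #define SKETCH_HDR "===="

  /*
   * Return a pointer to the data that follows the header line, or NULL if
   * buf does not begin with the "====" marker or the header line has no
   * terminating new line within len bytes.
   */
  static const char *sketch_skip_dump_header(const char *buf, size_t len)
  {
      const size_t hdr_len = strlen(SKETCH_HDR);
      const char *nl;

      if (len < hdr_len || memcmp(buf, SKETCH_HDR, hdr_len) != 0)
          return NULL;
      nl = memchr(buf + hdr_len, '\n', len - hdr_len);
      return nl ? nl + 1 : NULL;
  }

The returned pointer refers to the caller's buffer, so no copy is made; the
timestamp text sits between the marker and the returned position if a caller
wants to inspect it.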

Reading the data
----------------

The dump data can be read from the pstore filesystem. The format for these
files is ``dmesg-ramoops-N``, where N is the record number in memory. To delete
a stored record from RAM, simply unlink the respective pstore file.
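
A script or tool that walks the pstore mount point may want to recover N from
each file name. The sketch below shows one way to do that in userspace C; the
function name is hypothetical, and it accepts only names of exactly the form
given above, so relax the final check if the files on your system carry a
trailing suffix.

.. code-block:: c

  #include <ctype.h>
  #include <stdlib.h>
  #include <string.h>

  /*
   * Extract N from a name of the form "dmesg-ramoops-N".
   * Returns 0 on success, -1 if the name does not match.
   */
  static int sketch_parse_record_number(const char *name, unsigned long *n)
  {
      static const char prefix[] = "dmesg-ramoops-";
      const size_t plen = sizeof(prefix) - 1;
      char *end;

      if (strncmp(name, prefix, plen) != 0)
          return -1;
      /* Require a digit so that signs and whitespace are rejected. */
      if (!isdigit((unsigned char)name[plen]))
          return -1;
      *n = strtoul(name + plen, &end, 10);
      return *end == '\0' ? 0 : -1;
  }

Once the record of interest has been identified, calling ``unlink()`` on its
path removes it, as described above.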

Persistent function tracing
---------------------------

Persistent function tracing might be useful for debugging software or hardware
related hangs. The function call chain log is stored in a ``ftrace-ramoops``
file. Here is an example of usage::

 # mount -t debugfs debugfs /sys/kernel/debug/
 # echo 1 > /sys/kernel/debug/pstore/record_ftrace
 # reboot -f
 [...]
 # mount -t pstore pstore /mnt/
 # tail /mnt/ftrace-ramoops
 0 ffffffff8101ea64  ffffffff8101bcda  native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0
 0 ffffffff8101ea44  ffffffff8101bcf6  native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0
 0 ffffffff81020084  ffffffff8101a4b5  hpet_disable <- native_machine_shutdown+0x75/0x90
 0 ffffffff81005f94  ffffffff8101a4bb  iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90
 0 ffffffff8101a6a1  ffffffff8101a437  native_machine_emergency_restart <- native_machine_restart+0x37/0x40
 0 ffffffff811f9876  ffffffff8101a73a  acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0
 0 ffffffff8101a514  ffffffff8101a772  mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0
 0 ffffffff811d9c54  ffffffff8101a7a0  __const_udelay <- native_machine_emergency_restart+0x110/0x1e0
 0 ffffffff811d9c34  ffffffff811d9c80  __delay <- __const_udelay+0x30/0x40
 0 ffffffff811d9d14  ffffffff811d9c3f  delay_tsc <- __delay+0xf/0x20
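
Lines like the ones in the sample output can be split into fields for further
processing. The following userspace C sketch assumes the layout seen in that
sample (a small integer, two hexadecimal addresses, a function name, ``<-``,
and the caller); the struct, its field names, and the interpretation of the
first column as a CPU number are assumptions made for illustration, and other
kernel versions may print these records differently.

.. code-block:: c

  #include <stdio.h>

  /* Hypothetical container for one parsed line of ftrace-ramoops output. */
  struct sketch_ftrace_rec {
      unsigned int cpu;             /* first column (assumed CPU number) */
      unsigned long long ip;        /* first address column */
      unsigned long long parent_ip; /* second address column */
      char func[64];                /* traced function */
      char caller[128];             /* caller, e.g. symbol+offset/size */
  };

  /* Returns 0 if all five fields were parsed, -1 otherwise. */
  static int sketch_parse_ftrace_line(const char *line,
                                      struct sketch_ftrace_rec *r)
  {
      int got = sscanf(line, "%u %llx %llx %63s <- %127s",
                       &r->cpu, &r->ip, &r->parent_ip, r->func, r->caller);

      return got == 5 ? 0 : -1;
  }

The field widths in the format string keep ``sscanf()`` from overrunning the
fixed-size name buffers; longer symbol names would be truncated rather than
rejected.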