Linux and the Device Tree
The Linux usage model for device tree data

Author: Grant Likely <grant.likely@secretlab.ca>

This article describes how Linux uses the device tree. An overview of
the device tree data format can be found at the <a
href="http://devicetree.org/Device_Tree_Usage">Device Tree Usage</a>
page on <a href="http://devicetree.org">devicetree.org</a>.


	All the cool architectures are using device tree. I want to
	use device tree too!

The "Open Firmware Device Tree", or simply Device Tree (DT), is a data
structure and language for describing hardware. More specifically, it
is a description of hardware that is readable by an operating system
so that the operating system doesn't need to hard code details of the
machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and
nodes may have an arbitrary number of named properties encapsulating
arbitrary data. A mechanism also exists to create arbitrary
links from one node to another outside of the natural tree structure.

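To make the structure concrete, here is a minimal userspace sketch in C
of nodes carrying named properties. The dt_node and dt_property types
and the two lookup helpers are hypothetical names invented for this
illustration; they are not the kernel's data structures, and the links
between nodes (phandles) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a device tree; not the kernel's types. */
struct dt_property {
	const char *name;	/* property name, e.g. "compatible" */
	const void *value;	/* opaque bytes; the binding gives them meaning */
	size_t length;		/* size of value in bytes */
};

struct dt_node {
	const char *name;			/* node name, e.g. "soc" */
	const struct dt_property *props;	/* array of nprops properties */
	size_t nprops;
	const struct dt_node *children;		/* array of nchildren nodes */
	size_t nchildren;
};

/* Return the named property of a node, or NULL if it has none. */
static const struct dt_property *
dt_find_property(const struct dt_node *node, const char *name)
{
	assert(node && name);
	for (size_t i = 0; i < node->nprops; i++)
		if (strcmp(node->props[i].name, name) == 0)
			return &node->props[i];
	return NULL;
}

/* Return the direct child with the given name, or NULL. */
static const struct dt_node *
dt_find_child(const struct dt_node *node, const char *name)
{
	assert(node && name);
	for (size_t i = 0; i < node->nchildren; i++)
		if (strcmp(node->children[i].name, name) == 0)
			return &node->children[i];
	return NULL;
}
```

With a tree built from these types, dt_find_child(&root, "soc")
followed by dt_find_property(soc, "compatible") walks from the root to
the soc node's compatible data.
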
Conceptually, a common set of usage conventions, called 'bindings',
is defined for how data should appear in the tree to describe typical
hardware characteristics including data busses, interrupt lines, GPIO
connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to
maximize use of existing support code, but since property and node
names are simply text strings, it is easy to extend existing bindings
or create new ones by defining new nodes and properties.

<h2>History</h2>
The DT was originally created by Open Firmware as part of the
communication method for passing data from Open Firmware to a client
program (such as an operating system). An operating system used the
Device Tree to discover the topology of the hardware at runtime, and
thereby support a majority of available hardware without hard coded
information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms,
the Linux support for those architectures has for a long time used the
Device Tree.

In 2005, when PowerPC Linux began a major cleanup and a merge of 32-bit
and 64-bit support, the decision was made to require DT support on all
powerpc platforms, regardless of whether or not they used Open
Firmware. To do this, a DT representation called the Flattened Device
Tree (FDT) was created which could be passed to the kernel as a binary
blob without requiring a real Open Firmware implementation. U-Boot,
kexec, and other bootloaders were modified to support both passing a
Device Tree Binary (dtb) and modifying a dtb at boot time.

Some time later, FDT infrastructure was generalized to be usable by
all architectures. At the time of this writing, 6 mainlined
architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
out of mainline (nios) have some level of DT support.

<h2>Data Model</h2>
If you haven't already read the <a
href="http://devicetree.org/Device_Tree_Usage">Device Tree Usage</a>
page, then go read it now. It's okay, I'll wait....

<h3>High Level View</h3>
The most important thing to understand is that the DT is simply a data
structure that describes the hardware. There is nothing magical about
it, and it doesn't magically make all hardware configuration problems
go away. What it does do is provide a language for decoupling the
hardware configuration from the board and device driver support in the
Linux kernel (or any other operating system for that matter). Using
it allows board and device support to become data driven, that is, to
make setup decisions based on data passed into the kernel instead of on
per-machine hard coded selections.

Ideally, data driven platform setup should result in less code
duplication and make it easier to support a wide range of hardware
with a single kernel image.

Linux uses DT data for three major purposes:
1) platform identification,
2) runtime configuration, and
3) device population.

<h4>Platform Identification</h4>
First and foremost, the kernel will use data in the DT to identify the
specific machine. In a perfect world, the specific platform shouldn't
matter to the kernel because all platform details would be described
perfectly by the device tree in a consistent and reliable manner.
Hardware is not perfect though, and so the kernel must identify the
machine during early boot so that it has the opportunity to run
machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the
kernel will instead select setup code based on the machine's core
CPU or SoC. On ARM for example, setup_arch() in
arch/arm/kernel/setup.c will call setup_machine_fdt() in
arch/arm/kernel/devicetree.c which searches through the machine_desc
table and selects the machine_desc which best matches the device tree
data. It determines the best match by looking at the 'compatible'
property in the root device tree node, and comparing it with the
dt_compat list in struct machine_desc.

The 'compatible' property contains a sorted list of strings starting
with the exact name of the machine, followed by an optional list of
boards it is compatible with sorted from most compatible to least. For
example, the root compatible properties for the TI BeagleBoard and its
successor, the BeagleBoard xM board might look like:

	compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
	compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Here "ti,omap3-beagleboard-xm" specifies the exact model, and the
property also claims that the board is compatible with the OMAP 3450
SoC and with the omap3 family of SoCs in general. You'll notice that
the list is sorted from most specific (exact board) to least specific
(SoC family).

Astute readers might point out that the Beagle xM could also claim
compatibility with the original Beagle board. However, one should be
cautioned about doing so at the board level since there is typically a
high level of change from one board to another, even within the same
product line, and it is hard to nail down exactly what is meant when one
board claims to be compatible with another. For the top level, it is
better to err on the side of caution and not claim one board is
compatible with another. The notable exception would be when one
board is a carrier for another, such as a CPU module attached to a
carrier board.

One more note on compatible values. Any string used in a compatible
property must be documented as to what it indicates. Add
documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if
any of the dt_compat list entries appear in the compatible property.
If one does, then that machine_desc is a candidate for driving the
machine. After searching the entire table of machine_descs,
setup_machine_fdt() returns the 'most compatible' machine_desc based
on which entry in the compatible property each machine_desc matches
against. If no matching machine_desc is found, then it returns NULL.

The reasoning behind this scheme is the observation that in the majority
of cases, a single machine_desc can support a large number of boards
if they all use the same SoC, or same family of SoCs. However,
invariably there will be some exceptions where a specific board will
require special setup code that is not useful in the generic case.
Special cases could be handled by explicitly checking for the
troublesome board(s) in generic setup code, but doing so very quickly
becomes ugly and/or unmaintainable if it is more than just a couple of
cases.

Instead, the compatible list allows a generic machine_desc to provide
support for a wide common set of boards by specifying "less
compatible" values in the dt_compat list. In the example above,
generic board support can claim compatibility with "ti,omap3" or
"ti,omap3450". If a bug was discovered on the original beagleboard
that required special workaround code during early boot, then a new
machine_desc could be added which implements the workarounds and only
matches on "ti,omap3-beagleboard".

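The selection rule described above can be sketched in plain C. This is
only an illustration under simplifying assumptions, not the code in
arch/arm/kernel/devicetree.c: struct machine stands in for struct
machine_desc, and machine_score() and select_machine() are hypothetical
names. A machine's score is the position in the root 'compatible' list
of the first entry it claims, so a lower score means a more specific
match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct machine_desc. */
struct machine {
	const char *name;
	const char *const *dt_compat;	/* NULL-terminated list */
};

/*
 * Return the index into the root 'compatible' list of the first entry
 * that this machine claims, or -1 if it claims none.
 */
static int machine_score(const struct machine *m,
			 const char *const *compatible)
{
	for (int i = 0; compatible[i]; i++)
		for (int j = 0; m->dt_compat[j]; j++)
			if (strcmp(compatible[i], m->dt_compat[j]) == 0)
				return i;
	return -1;
}

/* Pick the machine with the lowest score; NULL if nothing matches. */
static const struct machine *select_machine(const struct machine *table,
					    size_t count,
					    const char *const *compatible)
{
	const struct machine *best = NULL;
	int best_score = -1;

	assert(compatible != NULL);
	for (size_t n = 0; n < count; n++) {
		int score = machine_score(&table[n], compatible);

		if (score >= 0 && (best == NULL || score < best_score)) {
			best = &table[n];
			best_score = score;
		}
	}
	return best;
}
```

Given one generic entry that lists "ti,omap3" and one workaround entry
that lists only "ti,omap3-beagleboard", the original BeagleBoard's
compatible list selects the workaround entry (score 0 beats score 2),
while the xM's list falls through to the generic entry.
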
PowerPC uses a slightly different scheme where it calls the .probe()
hook from each machine_desc, and the first one returning TRUE is used.
However, this approach does not take into account the priority of the
compatible list, and probably should be avoided for new architecture
support.

<h4>Runtime configuration</h4>
In most cases, a DT will be the sole method of communicating data from
firmware to the kernel, so it also gets used to pass in runtime and
configuration data like the kernel parameters string and the location
of an initrd image.

Most of this data is contained in the /chosen node, and when booting
Linux it will look something like this:

	chosen {
		bootargs = "console=ttyS0,115200 loglevel=8";
		initrd-start = <0xc8000000>;
		initrd-end = <0xc8200000>;
	};

The bootargs property contains the kernel arguments, and the initrd-*
properties define the start and end addresses (and therefore the size)
of an initrd blob. The chosen node may also optionally contain an
arbitrary number of additional properties for platform-specific
configuration data.

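As a small worked example of the arithmetic, the size of the initrd
follows from the two properties as end minus start. The sketch below
assumes, as in the example above, that each property holds a single
32-bit cell, and relies on the fact that cells in a flattened tree are
stored big-endian. The helper names dt_read_cell() and initrd_size()
are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 32-bit cell; flattened device tree cells are big-endian. */
static uint32_t dt_read_cell(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Size in bytes of the initrd; 0 if the range is empty or inverted. */
static uint32_t initrd_size(const unsigned char *start_prop,
			    const unsigned char *end_prop)
{
	uint32_t start, end;

	assert(start_prop && end_prop);
	start = dt_read_cell(start_prop);
	end = dt_read_cell(end_prop);
	return end > start ? end - start : 0;
}
```

For the values shown in the chosen node, 0xc8200000 - 0xc8000000 gives
0x200000 bytes, or 2 MiB.
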
During early boot, the architecture setup code calls of_scan_flat_dt()
several times with different helper callbacks to parse device tree
data before paging is set up. The of_scan_flat_dt() code scans through
the device tree and uses the helpers to extract information required
during early boot. Typically the early_init_dt_scan_chosen() helper
is used to parse the chosen node including kernel parameters,
early_init_dt_scan_root() to initialize the DT address space model,
and early_init_dt_scan_memory() to determine the size and
location of usable RAM.

On ARM, the function setup_machine_fdt() is responsible for early
scanning of the device tree after selecting the correct machine_desc
that supports the board.

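The scan-with-callback pattern can be modelled in a few lines of
userspace C. This is a sketch of the idea only: struct flat_node,
scan_flat() and scan_chosen() are hypothetical names, the node list is
a plain array instead of a real flattened blob, and the assumption that
a helper stops the scan by returning non-zero is part of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view: one entry per node, in tree order. */
struct flat_node {
	const char *name;	/* node name, e.g. "chosen" or "memory" */
	int depth;		/* 0 for the root node */
	const char *bootargs;	/* only meaningful for the chosen node */
};

/* Callback type: return non-zero to stop the scan early. */
typedef int (*scan_fn)(const struct flat_node *node, void *data);

/* Visit every node in order until a callback returns non-zero. */
static int scan_flat(const struct flat_node *nodes, size_t count,
		     scan_fn fn, void *data)
{
	int rc = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < count && rc == 0; i++)
		rc = fn(&nodes[i], data);
	return rc;
}

/* Helper in the spirit of early_init_dt_scan_chosen(): grab bootargs. */
static int scan_chosen(const struct flat_node *node, void *data)
{
	const char **cmdline = data;

	if (node->depth != 1 || strcmp(node->name, "chosen") != 0)
		return 0;	/* not /chosen, keep scanning */
	*cmdline = node->bootargs;
	return 1;		/* found it, stop */
}
```

Calling scan_flat() once per helper, each with its own data pointer,
mirrors how the setup code runs several passes over the same tree.
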
<h4>Device population</h4>
After the board has been identified, and after the early configuration data
has been parsed, then kernel initialization can proceed in the normal
way. At some point in this process, unflatten_device_tree() is called
to convert the data into a more efficient runtime representation.
This is also when machine-specific setup hooks will get called, like
the machine_desc .init_early(), .init_irq() and .init_machine() hooks
on ARM. The remainder of this section uses examples from the ARM
implementation, but all architectures will do pretty much the same
thing when using a DT.

As can be guessed by the names, .init_early() is used for any
machine-specific setup that needs to be executed early in the boot
process, and .init_irq() is used to set up interrupt handling. Using a DT
doesn't materially change the behaviour of either of these functions.
If a DT is provided, then both .init_early() and .init_irq() are able
to call any of the DT query functions (of_* in include/linux/of*.h) to
get additional data about the platform.

The most interesting hook in the DT context is .init_machine() which
is primarily responsible for populating the Linux device model with
data about the platform. Historically this has been implemented on
embedded platforms by defining a set of static clock structures,
platform_devices, and other data in the board support .c file, and
registering it en masse in .init_machine(). When DT is used, then
instead of hard coding static devices for each platform, the list of
devices can be obtained by parsing the DT, and allocating device
structures dynamically.

The simplest case is when .init_machine() is only responsible for
registering a block of platform_devices. A platform_device is a concept
used by Linux for memory or I/O mapped devices which cannot be detected
by hardware, and for 'composite' or 'virtual' devices (more on those
later). While there is no 'platform device' terminology for the DT,
platform devices roughly correspond to device nodes at the root of the
tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example. Here is part of the
device tree for the NVIDIA Tegra board.

/{
	compatible = "nvidia,harmony", "nvidia,tegra20";
	#address-cells = <1>;
	#size-cells = <1>;
	interrupt-parent = <&intc>;

	chosen { };
	aliases { };

	memory {
		device_type = "memory";
		reg = <0x00000000 0x40000000>;
	};

	soc {
		compatible = "nvidia,tegra20-soc", "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		intc: interrupt-controller@50041000 {
			compatible = "nvidia,tegra20-gic";
			interrupt-controller;
			#interrupt-cells = <1>;
			reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >;
		};

		serial@70006300 {
			compatible = "nvidia,tegra20-uart";
			reg = <0x70006300 0x100>;
			interrupts = <122>;
		};

		i2s1: i2s@70002800 {
			compatible = "nvidia,tegra20-i2s";
			reg = <0x70002800 0x100>;
			interrupts = <77>;
			codec = <&wm8903>;
		};

		i2c@7000c000 {
			compatible = "nvidia,tegra20-i2c";
			#address-cells = <1>;
			#size-cells = <0>;
			reg = <0x7000c000 0x100>;
			interrupts = <70>;

			wm8903: codec@1a {
				compatible = "wlf,wm8903";
				reg = <0x1a>;
				interrupts = <347>;
			};
		};
	};

	sound {
		compatible = "nvidia,harmony-sound";
		i2s-controller = <&i2s1>;
		i2s-codec = <&wm8903>;
	};
};

At .init_machine() time, Tegra board support code will need to look at
this DT and decide which nodes to create platform_devices for.
However, looking at the tree, it is not immediately obvious what kind
of device each node represents, or even if a node represents a device
at all. The /chosen, /aliases, and /memory nodes are informational
nodes that don't describe devices (although arguably memory could be
considered a device). The children of the /soc node are memory mapped
devices, but the codec@1a is an i2c device, and the sound node
represents not a device, but rather how other devices are connected
together to create the audio subsystem. I know what each device is
because I'm familiar with the board design, but how does the kernel
know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks
for nodes that have a 'compatible' property. First, it is generally
assumed that any node with a 'compatible' property represents a device
of some kind, and second, it can be assumed that any node at the root
of the tree is either directly attached to the processor bus, or is a
miscellaneous system device that cannot be described any other way.
For each of these nodes, Linux allocates and registers a
platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption?
Well, for the way that Linux models devices, just about all bus_types
assume that their devices are children of a bus controller. For
example, each i2c_client is a child of an i2c_master. Each spi_device
is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The
same hierarchy is also found in the DT, where I2C device nodes only
ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB,
etc. The only devices which do not require a specific type of parent
device are platform_devices (and amba_devices, but more on that
later), which will happily live at the base of the Linux /sys/devices
tree. Therefore, if a DT node is at the root of the tree, then it
is probably best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL)
to kick off discovery of devices at the root of the tree. The
parameters are all NULL because when starting from the root of the
tree, there is no need to provide a starting node (the first NULL) or a
parent struct device (the last NULL), and we're not using a match
table (the middle NULL) yet. For a board that only needs to register
devices, .init_machine() can be completely empty except for the
of_platform_populate() call.

In the Tegra example, this accounts for the /soc and /sound nodes, but
what about the children of the SoC node? Shouldn't they be registered
as platform devices too? For Linux DT support, the generic behaviour
is for child devices to be registered by the parent's device driver at
driver .probe() time. So, an i2c bus device driver will register an
i2c_client for each child node, an SPI bus driver will register
its spi_device children, and similarly for other bus_types.
According to that model, a driver could be written that binds to the
SoC node and simply registers platform_devices for each of its
children. The board support code would allocate and register an SoC
device, an SoC device driver would bind to the SoC device, and
register platform_devices for /soc/interrupt-controller, /soc/serial,
/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right? It is,
however, a lot of mucking about for just registering platform devices.

It turns out that registering children of certain platform_devices as
more platform_devices is a common pattern, and the device tree support
code reflects that. The second argument to of_platform_populate() is
an of_device_id table, and any node that matches an entry in that
table will also get its child nodes registered. In the Tegra case,
the code can look something like this:

static struct of_device_id harmony_bus_ids[] __initdata = {
	{ .compatible = "simple-bus", },
	{}	/* sentinel: an empty entry terminates the table */
};

static void __init harmony_init_machine(void)
{
	/* ... */
	/* Register root devices, plus children of any "simple-bus" node */
	of_platform_populate(NULL, harmony_bus_ids, NULL);
}

"simple-bus" is defined in the ePAPR 1.0 specification as a 'compatible'
value meaning a simple memory mapped bus, so the of_platform_populate()
code could be written to just assume simple-bus compatible nodes will
always be traversed. However, we pass it in as an argument so that
board support code can always override the default behaviour.

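The effect of the match table can be seen in a small userspace model.
This is a rough sketch of the behaviour described above, not the
implementation in the kernel: struct pnode, struct registry,
matches_bus() and populate() are hypothetical names, and "registering"
a device here just means recording the node's path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVICES 16

/* Hypothetical node model; not the kernel's struct device_node. */
struct pnode {
	const char *path;			/* e.g. "/soc/serial" */
	const char *const *compatible;		/* NULL-terminated, or NULL */
	const struct pnode *children;
	size_t nchildren;
};

/* Records the paths that would have become platform_devices. */
struct registry {
	const char *paths[MAX_DEVICES];
	size_t count;
};

/* Non-zero if any compatible string appears in the bus match list. */
static int matches_bus(const struct pnode *n, const char *const *bus_ids)
{
	if (!bus_ids)
		return 0;
	for (size_t i = 0; n->compatible[i]; i++)
		for (size_t j = 0; bus_ids[j]; j++)
			if (strcmp(n->compatible[i], bus_ids[j]) == 0)
				return 1;
	return 0;
}

/* Register each child with a 'compatible'; descend only into buses. */
static void populate(const struct pnode *parent,
		     const char *const *bus_ids, struct registry *reg)
{
	assert(parent && reg);
	for (size_t i = 0; i < parent->nchildren; i++) {
		const struct pnode *n = &parent->children[i];

		if (!n->compatible || reg->count == MAX_DEVICES)
			continue;
		reg->paths[reg->count++] = n->path;
		if (matches_bus(n, bus_ids))
			populate(n, bus_ids, reg);
	}
}
```

Run against a tree shaped like the Tegra example, a NULL match list
registers only /soc and /sound, while a list containing "simple-bus"
also registers the children of /soc. The codec under the i2c node is
left alone in both cases, since the i2c node is not in the match list.
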
<h2>Appendix A: AMBA devices</h2>

ARM Primecells are a certain kind of device attached to the ARM AMBA
bus which includes some support for hardware detection and power
management. In Linux, struct amba_device and the amba_bus_type are
used to represent Primecell devices. However, the fiddly bit is that
not all devices on an AMBA bus are Primecells, and for Linux it is
typical for both amba_device and platform_device instances to be
siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device. This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
be too invasive. If a node is compatible with "arm,amba-primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.
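
The decision itself is simple enough to sketch. The function below is a
hypothetical userspace illustration, not kernel code: classify_node()
and enum dev_kind are invented names, and the only rule modelled is the
one stated above, namely that a node listing "arm,amba-primecell" in
its compatible property becomes an amba_device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_kind { DEV_PLATFORM, DEV_AMBA };

/* Choose the device type from a NULL-terminated compatible list. */
static enum dev_kind classify_node(const char *const *compatible)
{
	assert(compatible != NULL);
	for (size_t i = 0; compatible[i]; i++)
		if (strcmp(compatible[i], "arm,amba-primecell") == 0)
			return DEV_AMBA;
	return DEV_PLATFORM;
}
```

Because the check looks at every entry in the list, a Primecell node
can still put its own, more specific compatible string first.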