ARM Architecture Basics


Overview


ARM originally stood for Acorn RISC Machine, after Acorn Computers, the company that started designing the ARM architecture back in 1983.
ARM Holdings’ primary business is selling IP cores, which licensees use to create microcontrollers (MCUs) and CPUs based on those cores.
In this article we will study the ARM7TDMI in detail, as covering every ARM core would be a little too much. The ARM7TDMI is one of the most successful ARM implementations, with hundreds of millions sold, and many later ARM cores build on its design.


I. Features


  • RISC (Reduced Instruction Set Computer)
  • High performance, low power and small size
  • Load/store architecture
  • Pipelining
  • Uniform and fixed length instructions
  • ALU and Shifter control
  • Multiple load/store register instructions
  • Coprocessor instruction interface
  • THUMB support (16-bit dense compressed instruction set)
  • 7 Processor Modes

II. Pipelining


ARM7TDMI instructions are executed in 3 stages:

1. Fetch: the instruction is fetched from memory into the pipeline
2. Decode: the instruction is decoded and the registers it uses are identified
3. Execute: the operands are read, the shifter and ALU operate on them, and the result is written to the destination register

Later processors such as the ARM9 use a five-stage pipeline, adding Memory access and Write back stages.
So what is pipelining? Let's understand.
The hardware that fetches an instruction would sit idle during that instruction’s decode and execute phases. This leaves room to start fetching the next instruction before the first one finishes its decode or execute phase.
So, in the optimized case, while the first instruction is being executed, the second instruction can be decoded and a third can be fetched. This is what pipelining is. Simple!
The figure below will help you remember it.

arm_pipeline
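
The same overlap can be written down as a tiny model. This is only a sketch of an ideal 3-stage pipeline (one instruction enters per cycle, no stalls or branches); the function names are made up for this illustration.

```c
#include <stdio.h>

#define STAGES 3  /* fetch, decode, execute */

/* Index of the instruction occupying `stage` (0 = fetch, 1 = decode,
 * 2 = execute) during `cycle` (counted from 0), or -1 if that stage is
 * empty. Assumes an ideal pipeline with no stalls. */
static int instr_in_stage(int cycle, int stage, int n_instr)
{
    int i = cycle - stage;
    return (i >= 0 && i < n_instr) ? i : -1;
}

/* Cycles needed to push n_instr instructions through the pipeline. */
static int total_cycles(int n_instr)
{
    return n_instr > 0 ? n_instr + STAGES - 1 : 0;
}

/* Print which instruction sits in which stage on every cycle. */
static void print_pipeline(int n_instr)
{
    static const char *names[STAGES] = { "Fetch", "Decode", "Execute" };

    for (int c = 0; c < total_cycles(n_instr); c++) {
        printf("cycle %d:", c + 1);
        for (int s = 0; s < STAGES; s++) {
            int i = instr_in_stage(c, s, n_instr);
            if (i >= 0)
                printf("  %s=I%d", names[s], i + 1);
        }
        printf("\n");
    }
}
```

Calling print_pipeline(4) shows that from the third cycle onwards all three stages are busy at once, so four instructions need 6 cycles instead of the 12 they would take without any overlap.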


III. Processor Modes


 

  1. USER
  2. Fast Interrupt FIQ
  3. Interrupt IRQ
  4. Supervisor SVC
  5. Abort ABT
  6. Undefined UND
  7. System SYS

Let's get a quick understanding of these modes:
Most application programs run in USER mode. A program in USER mode cannot access protected system resources; to use them, the mode must be changed from USER to some other mode by raising an exception.
Modes other than USER are called privileged modes.
Modes other than USER and SYSTEM are called exception modes.
The processor enters an exception mode when the corresponding exception occurs.
Each exception mode has a few additional registers of its own, so that the USER state registers are not corrupted when an exception occurs.
SYSTEM mode uses the same registers as USER mode.
Summary:

modes
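
The classification above is small enough to express in a few lines of C. The values are the mode numbers the processor keeps in the low five bits of the CPSR; the enum and helper names are my own, chosen just for this sketch.

```c
#include <stdbool.h>

/* Mode numbers as stored in CPSR bits [4:0]. */
enum arm_mode {
    MODE_USR = 0x10,  /* User */
    MODE_FIQ = 0x11,  /* Fast interrupt */
    MODE_IRQ = 0x12,  /* Interrupt */
    MODE_SVC = 0x13,  /* Supervisor */
    MODE_ABT = 0x17,  /* Abort */
    MODE_UND = 0x1B,  /* Undefined */
    MODE_SYS = 0x1F   /* System */
};

/* Every mode except USER is privileged. */
static bool is_privileged(enum arm_mode m)
{
    return m != MODE_USR;
}

/* Every mode except USER and SYSTEM is an exception mode. */
static bool is_exception_mode(enum arm_mode m)
{
    return m != MODE_USR && m != MODE_SYS;
}
```

Note that SYSTEM mode is privileged but is not an exception mode, which is why it can share the USER registers.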


IV. Registers


The ARM7TDMI has 37 registers, each 32 bits wide:

30 – General purpose
5 – SPSR (Saved Program Status Register)
1 – CPSR (Current Program Status Register)
1 – PC (program counter)

General purpose registers : 

At most 15 general purpose registers, named R0 to R14, are visible in any one mode (for example, in USER mode).
R0 to R7 are unbanked registers (i.e. the same physical registers in every mode). R8 to R14 are banked registers (i.e. some modes have their own separate copies): FIQ mode has its own R8 to R14, and each of the other exception modes has its own R13 and R14.
The thing to remember is that the contents of a banked register are preserved across a mode change, so there is no need to save them.
R13 is used as the stack pointer, commonly known as SP.
R14 is used as the link register (LR), which stores the return address for an exception or subroutine. If there are multiple nested levels, the previous return address goes on the stack pointed to by R13, and the latest return address is kept in R14.
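
To see where the numbers 30 and 37 come from, the banking rules can be modelled and counted. This is just a sketch under the banking scheme described above (FIQ banks R8 to R14, the other exception modes bank R13 and R14); the enum and function names are invented for the example.

```c
#include <stdbool.h>

enum mode { USR, SYS, FIQ, IRQ, SVC, ABT, UND, MODE_COUNT };

/* Bank that holds Rn (n = 0..14) while the core is in mode m.
 * Bank USR stands for the set shared with USER mode. */
static enum mode bank_of(enum mode m, int n)
{
    if (m == FIQ && n >= 8)
        return FIQ;   /* R8_fiq .. R14_fiq */
    if ((m == IRQ || m == SVC || m == ABT || m == UND) && n >= 13)
        return m;     /* R13_<mode> and R14_<mode> */
    return USR;       /* unbanked, or SYSTEM sharing the USER registers */
}

/* Count the distinct physical registers behind R0..R14 over all modes. */
static int count_general_purpose(void)
{
    bool seen[MODE_COUNT][15] = { { false } };
    int count = 0;

    for (int m = 0; m < MODE_COUNT; m++) {
        for (int n = 0; n <= 14; n++) {
            enum mode b = bank_of((enum mode)m, n);
            if (!seen[b][n]) {
                seen[b][n] = true;
                count++;
            }
        }
    }
    return count;
}

/* General purpose registers plus PC, CPSR and the five SPSRs. */
static int count_all_registers(void)
{
    return count_general_purpose() + 1 /* PC */ + 1 /* CPSR */ + 5 /* SPSRs */;
}
```

The count works out as 15 registers visible in USER mode, 7 extra for FIQ and 2 extra for each of the four remaining exception modes.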

Program Counter:

R15 is known as the PC. As each instruction is fetched, the program counter increments by 4 bytes in ARM state and by 2 bytes in THUMB state.
Because of pipelining, the PC runs ahead of the instruction being executed: the executing instruction is typically at PC-8 in ARM state and at PC-4 in THUMB state.
In ARM state, bits 1 and 0 of the PC are always 0, as instructions are word-aligned.
In THUMB state, bit 0 of the PC is always 0, as instructions are halfword-aligned.
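
As a quick worked example of the PC offset, here is a helper (the name is hypothetical) that recovers the address of the executing instruction from the PC value, assuming the 3-stage pipeline described earlier, where the PC is two instructions ahead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the instruction currently in the execute stage.
 * The PC is two instructions ahead: 2 * 4 bytes in ARM state,
 * 2 * 2 bytes in THUMB state. */
static uint32_t executing_address(uint32_t pc, bool thumb)
{
    return pc - (thumb ? 4u : 8u);
}
```

So if an ARM instruction at 0x8000 reads the PC, it sees 0x8008; a THUMB instruction at the same address sees 0x8004.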

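CPSR (Current Program Status Register):

As the name suggests, the CPSR holds the current state of the processor: the condition flags (N, Z, C, V), the interrupt disable bits (I, F), the THUMB state bit (T) and the mode bits.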

CPSR
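
The layout in the figure can be turned into a small decoder. The bit positions are the architectural ones (N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4 to 0); the struct and function names are only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_N         (1u << 31)  /* negative */
#define CPSR_Z         (1u << 30)  /* zero */
#define CPSR_C         (1u << 29)  /* carry */
#define CPSR_V         (1u << 28)  /* overflow */
#define CPSR_I         (1u << 7)   /* IRQ disabled when set */
#define CPSR_F         (1u << 6)   /* FIQ disabled when set */
#define CPSR_T         (1u << 5)   /* THUMB state when set */
#define CPSR_MODE_MASK 0x1Fu       /* processor mode */

struct cpsr_fields {
    bool n, z, c, v;
    bool irq_disabled, fiq_disabled, thumb;
    unsigned mode;
};

/* Split a raw CPSR value into its individual fields. */
static struct cpsr_fields decode_cpsr(uint32_t cpsr)
{
    struct cpsr_fields f;

    f.n = (cpsr & CPSR_N) != 0;
    f.z = (cpsr & CPSR_Z) != 0;
    f.c = (cpsr & CPSR_C) != 0;
    f.v = (cpsr & CPSR_V) != 0;
    f.irq_disabled = (cpsr & CPSR_I) != 0;
    f.fiq_disabled = (cpsr & CPSR_F) != 0;
    f.thumb = (cpsr & CPSR_T) != 0;
    f.mode = cpsr & CPSR_MODE_MASK;
    return f;
}
```

For example, the value 0x600000D3 decodes to Z and C set, both interrupts disabled, ARM state, and mode 0x13 (Supervisor).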

SPSR (Saved Program Status Register):

Used to store the CPSR when an exception occurs. Each exception mode has its own separate SPSR. USER mode and SYSTEM mode do not have an SPSR, as they are not entered through an exception.

Thumb State :

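The THUMB instruction set is a 16-bit subset of the ARM instruction set. In THUMB state most instructions can access only R0 to R7; R8 to R12 are reachable only through a few instructions such as MOV, ADD and CMP.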

Summary:

ARM registers


V. Exceptions


As the processor enters an exception mode, some registers are automatically switched to that mode's banked copies. This ensures that the task state is not corrupted by the exception.
When an exception occurs, the ARM core completes its current instruction, then:

Step 1: saves the return address (derived from the PC) in the new mode’s LR (R14)
Step 2: saves the CPSR in the new mode’s SPSR
Step 3: changes to the mode corresponding to the exception
Step 4: disables interrupts as required (IRQ is disabled on every exception; FIQ is also disabled on FIQ and reset)
Step 5: loads the exception vector address into the PC, which leads to the exception handler or ISR

A unique address is predefined for each exception. The address to which the processor is forced to branch is called the exception/interrupt vector.

Exception/Interrupt Vector:

exception_vector

Once the exception handler has handled the exception, the mode is changed back to the interrupted mode (typically USER) and the interrupted task is resumed. The handler must restore the interrupted state exactly as it was before the exception.
Any modified register must be restored from the handler's stack.
The CPSR must be restored from the SPSR.
The PC must be set back to where execution was interrupted; the LR (R14) holds the address needed for this.
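
The entry steps and the return described above can be sketched as a small software model of an IRQ. This is a simplification, not how real handler code looks: on real hardware the LR holds an offset from the return address and the handler corrects for it (for an IRQ, typically with SUBS PC, LR, #4), while this model stores the return address directly. The struct and function names are invented for the example.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_T         (1u << 5)  /* THUMB state bit */
#define CPSR_I         (1u << 7)  /* IRQ disable bit */
#define MODE_IRQ       0x12u
#define VECTOR_IRQ     0x18u      /* IRQ exception vector address */

/* Only the state touched by an IRQ is modelled. */
struct cpu {
    uint32_t pc;
    uint32_t cpsr;
    uint32_t lr_irq;    /* banked R14 of IRQ mode */
    uint32_t spsr_irq;  /* SPSR of IRQ mode */
};

/* What the core does on its own when an IRQ is taken. */
static void enter_irq(struct cpu *c, uint32_t return_addr)
{
    c->lr_irq = return_addr;                         /* step 1 */
    c->spsr_irq = c->cpsr;                           /* step 2 */
    c->cpsr = (c->cpsr & ~(CPSR_MODE_MASK | CPSR_T)) /* steps 3 and 4: */
            | MODE_IRQ | CPSR_I;                     /* IRQ mode, ARM state, IRQs off */
    c->pc = VECTOR_IRQ;                              /* step 5 */
}

/* What the handler must arrange at the end. */
static void return_from_irq(struct cpu *c)
{
    c->pc = c->lr_irq;      /* resume the interrupted code */
    c->cpsr = c->spsr_irq;  /* restore mode, flags and interrupt masks */
}
```

Starting in USER mode, enter_irq leaves the core in IRQ mode at address 0x18 with IRQs masked, and return_from_irq brings back exactly the CPSR and PC the task had before.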

If multiple exceptions occur at the same time, they are serviced in order of their priority.


VI. Core


core_arm7tdmi_Arch

The core has two main blocks: the data path and the decoder.
The register bank has two read ports, driving the A bus and the B bus, and one write port fed from the ALU.
Barrel shifter: shifts or rotates the second operand by any number of bits
ALU: performs arithmetic and logic functions
Address register and address incrementer: hold either the PC address or an operand address
Data registers: hold data read from or written to memory
Instruction decoder: decodes machine code into control signals
In a single cycle, data values are read onto the A and B buses, and the result from the ALU is written back to the registers.
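
To make the barrel shifter's role concrete, here is a C sketch of the shift and rotate operations it applies to the second operand before the ALU sees it. The function names are mine, and shift amounts are assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Logical shift left: zeros enter from the right. */
static uint32_t lsl(uint32_t v, unsigned n) { return v << n; }

/* Logical shift right: zeros enter from the left. */
static uint32_t lsr(uint32_t v, unsigned n) { return v >> n; }

/* Arithmetic shift right: the sign bit is replicated from the left. */
static uint32_t asr(uint32_t v, unsigned n)
{
    uint32_t r = v >> n;
    if ((v & 0x80000000u) && n > 0)
        r |= ~(0xFFFFFFFFu >> n);  /* fill vacated bits with the sign bit */
    return r;
}

/* Rotate right: bits leaving on the right re-enter on the left. */
static uint32_t ror(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Models ADD Rd, Rn, Rm, LSL #shift: the second operand passes through
 * the shifter and the ALU then adds, all in one instruction. */
static uint32_t add_shifted(uint32_t rn, uint32_t rm, unsigned shift)
{
    return rn + lsl(rm, shift);
}
```

For instance, add_shifted(100, 3, 2) computes 100 + (3 << 2) = 112, which is how an ARM instruction can index an array of words without a separate multiply.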
The ARM7 core has a Von Neumann architecture, which means a single 32-bit data bus carries both data and instructions. Later ARM cores such as the ARM9 use a Harvard architecture, which means separate buses for data and instructions.


That was a top-level view of the ARM processor; I hope it helps.
Please let me know your feedback or concerns in the comment section below.

Saurabh Sengar
mailto: saurabh.truth@gmail.com

 

* all images used in this blog are from google images search, and I don’t own them

Device Tree Tutorial (ARM)


Overview


 

The Linux kernel requires a complete description of the hardware: which board it is booting on (machine type), which devices it is using, their addresses (device/bus addresses), their interrupt numbers (irq), the MFP pin configuration (pin muxing/GPIOs), and also some board-level information such as memory size, kernel command line, etc.

Before device tree, all this information used to be set in a huge cluster of board files. Information such as the command line and memory size used to be passed by the bootloader as part of ATAGS through register R2 (ARM). The machine type used to be set separately in register R1 (ARM).
At that time each kernel build used to be for only one specific chip and one specific board.

So there was a long pending wish to compile the kernel for all ARM processors, and let the kernel somehow detect its hardware and apply the right drivers as needed just like your PC.
But how? On a PC, the initial registers are hardcoded, and the rest of the information is supplied by the BIOS. But ARM processors don’t have a BIOS.
The solution chosen was device tree, also referred to as Open Firmware (abbreviated OF) or Flattened Device Tree (FDT). This is essentially a data structure in byte code format which contains information that is helpful to the kernel when booting up.

The bootloader now loads two binaries: the kernel image and the DTB.
DTB is the device tree blob. The bootloader passes the DTB address through R2 instead of ATAGS, and the R1 register is no longer required.

For a one-line bookish definition: “A device tree is a tree data structure with nodes that describe the physical devices in a system.”

Currently device tree is supported by ARM, x86, Microblaze, PowerPC, and Sparc architectures.

 


I. Device Tree Compilation


The device tree compiler and its source code are located at scripts/dtc/ in the kernel tree.
On ARM, all device tree sources are located at arch/arm/boot/dts/.
The Device Tree Blob (.dtb) is produced by the compiler, and it is the binary that gets loaded by the bootloader and parsed by the kernel at boot time.

$ scripts/dtc/dtc -I dts -O dtb -o /path/my_tree.dtb arch/arm/boot/dts/my_tree.dts

This produces my_tree.dtb.

To recreate the dts from a dtb:

$ scripts/dtc/dtc -I dtb -O dts -o /path/my_tree.dts /path/my_tree.dtb

This produces my_tree.dts.

 


 II. Device Tree Basics


 

Each module in the device tree is defined by a node, and all its properties are defined under that node. Depending on the hardware, a node can have child nodes and a parent node.
For example, a device connected to an i2c bus will have the i2c node as its parent, and that device will be one of the child nodes of the i2c node. The i2c node may have the apb bus as its parent, and so on. Everything leads up to the root node, which is the parent of all. (Don’t worry, the example after this section will make it clearer.)
Under the root of the device tree, one typically finds the following most common top-level nodes:

  • cpus: its sub-nodes describe each CPU in the system.
  • memory: defines the location and size of the RAM.
  • chosen: defines parameters chosen or defined by the system firmware at boot time. In practice, one of its uses is to pass the kernel command line.
  • aliases: shortcuts to certain nodes.
  • One or more nodes defining the buses in the SoC
  • One or more nodes defining on-board devices

 


III. Device Tree Structure example


Here we will take a dummy dts as an example for explanation.

#include "pxa910.dtsi"
/ {
    compatible = "mrvl,pxa910-dkb", "mrvl,pxa910";
    chosen {
        bootargs = "<boot args here>";
    };
    memory {
        reg = <0x00000000 0x10000000>;
    };
    soc {
        apb@d4000000 {
            uart1: uart@d4017000 {
                status = "okay";
            };
            twsi1: i2c@d4011000 {
                #address-cells = <1>;
                #size-cells = <0>;
                status = "okay";
                pmic: 88pm860x@34 {
                    compatible = "marvell,88pm860x";
                    reg = <0x34>;
                    interrupts = <4>;
                    interrupt-parent = <&intc>;
                    interrupt-controller;
                    #interrupt-cells = <1>;
                };
            };
        };
    };
};

Figure 1

Each module is defined inside one pair of curly brackets under its node, and any sub-modules can be defined further inside.

Explaining the above tree, starting from the first line:

#include: includes a header file, just as in a C file
.dtsi: a device tree include file; a single dts can include any number of dtsi files, but by convention does not include another dts file
/: the root node; the device tree structure starts here


IV. Properties


 

Data is defined in the dts in the form of properties, which are read by the kernel code. Let's look at some of the major properties.

Compatible

The top-level compatible property typically defines a compatible string for the board. Entries are always listed from most specific first to least specific last. It is used to match against the dt_compat field of the DT_MACHINE structure.
Inside a device or bus node, it is the most crucial property, as it is the link between the hardware and its driver. Each node carries one or more compatible strings, and the kernel matches a device driver with its data in the device tree node based on the compatible string alone.
The connection between a kernel driver and the “compatible” entries it should be attached to is made by a code segment like the following in the driver’s source code:

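#include <linux/module.h>
#include <linux/of.h>

/* Compatible strings this driver handles; the empty entry ends the table. */
static const struct of_device_id dummy_of_match[] = {
	{ .compatible = "marvell,88pm860x", },
	{}
};
MODULE_DEVICE_TABLE(of, dummy_of_match);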

The above code in the driver matches it to the pmic node of the device tree structure shown in Figure 1.
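
To get a feel for what matching on the compatible string means, here is a simplified user-space model. It is not the kernel's implementation (the real matching code is more elaborate), and struct match_entry and match_compatible are hypothetical names that only stand in for struct of_device_id and the kernel's matching logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct of_device_id. */
struct match_entry {
    const char *compatible;
};

/* Walk the node's compatible list from most specific to least specific
 * and return the index of the first string found in the driver's table,
 * or -1 if the driver does not support the node. The table ends with an
 * entry whose compatible pointer is NULL, like the empty {} entry in a
 * real match table. */
static int match_compatible(const char *const node_compat[], size_t n_compat,
                            const struct match_entry *table)
{
    for (size_t i = 0; i < n_compat; i++) {
        for (const struct match_entry *e = table; e->compatible; e++) {
            if (strcmp(node_compat[i], e->compatible) == 0)
                return (int)i;
        }
    }
    return -1;
}
```

With a table holding only "marvell,88pm860x", the pmic node from Figure 1 matches at index 0, while a node that lists no string from the table is simply never bound to this driver.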

reg

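defines the address (and, where applicable, the size) of that node/device as seen on its parent bus

#address-cells

indicates how many cells (i.e. 32-bit values) are needed to form the base address part of the reg property in child nodes

#size-cells

indicates how many cells are needed to form the size part of the reg property in child nodes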
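
The way reg, #address-cells and #size-cells fit together is easiest to see by decoding a reg value by hand. The sketch below does this in plain user-space C on an array of already byte-swapped cells; the names are invented for the example, and it assumes at most two cells per address or size.

```c
#include <stddef.h>
#include <stdint.h>

struct reg_entry {
    uint64_t address;
    uint64_t size;
};

/* Combine n_cells 32-bit cells (most significant cell first) into one
 * value. Assumes n_cells <= 2 so the result fits in 64 bits. */
static uint64_t read_cells(const uint32_t *cells, unsigned n_cells)
{
    uint64_t v = 0;

    for (unsigned i = 0; i < n_cells; i++)
        v = (v << 32) | cells[i];
    return v;
}

/* Decode the idx-th (address, size) tuple of a reg property that holds
 * n cells in total. addr_cells and size_cells come from the parent
 * node's #address-cells and #size-cells. Returns 0 on success, -1 if
 * the tuple does not fit in the property. */
static int parse_reg(const uint32_t *cells, size_t n,
                     unsigned addr_cells, unsigned size_cells,
                     size_t idx, struct reg_entry *out)
{
    size_t stride = (size_t)addr_cells + size_cells;

    if (stride == 0 || (idx + 1) * stride > n)
        return -1;

    const uint32_t *p = cells + idx * stride;
    out->address = read_cells(p, addr_cells);
    out->size = read_cells(p + addr_cells, size_cells);
    return 0;
}
```

With one address cell and one size cell, the memory node's reg = <0x00000000 0x10000000> decodes to address 0 and size 0x10000000 (256 MB). Under the i2c node, where #size-cells is 0, reg = <0x34> is just the address 0x34 with no size.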

interrupt-controller

is a boolean property that indicates that the current node is an interrupt controller

#interrupt-cells

indicates the number of cells in the interrupts property for the interrupts managed by the selected interrupt controller

interrupt-parent

is a phandle that points to the interrupt controller for the current node. There is generally a top-level interrupt-parent definition for the main interrupt controller.

The label and node name

First, the label (“pmic”) and the node name (“88pm860x@34”). The label could have been omitted altogether, while the node name should stick to the format some-name@address. This tells the kernel that the device is named 88pm860x and is connected to its parent bus (i2c in this case) at address 0x34 (the i2c slave address here). pmic is the label, which can be used as a phandle to refer to this node inside the dts.
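
As a small illustration of the some-name@address format, the following user-space helper (a made-up name, not a kernel API) splits a node name into its two parts. The unit address is written in hexadecimal without a 0x prefix.

```c
#include <stdlib.h>
#include <string.h>

/* Split "some-name@address" into the name and the unit address.
 * The part after '@' is parsed as hexadecimal.
 * Returns 0 on success, -1 if the string does not follow the format
 * or the name does not fit in the buffer. */
static int parse_node_name(const char *node, char *name, size_t name_len,
                           unsigned long *unit_addr)
{
    const char *at = strchr(node, '@');
    char *end;
    size_t len;

    if (!at)
        return -1;

    len = (size_t)(at - node);
    if (len == 0 || len >= name_len)
        return -1;

    memcpy(name, node, len);
    name[len] = '\0';

    *unit_addr = strtoul(at + 1, &end, 16);
    if (end == at + 1 || *end != '\0')
        return -1;  /* missing or malformed address */
    return 0;
}
```

For "88pm860x@34" this yields the name 88pm860x and the unit address 0x34, which is the same value as the node's reg property.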

 


 V. Getting the resources from DTS


 

Below are a few of the major APIs in the current kernel (4.3) for reading the various properties from the DTS.

of_address_to_resource: Reads the memory address of the device, as defined by the reg property, into a struct resource.

irq_of_parse_and_map: Parses the interrupts and interrupt-parent properties and maps the interrupt to a Linux IRQ number, which can then be passed to request_irq.

of_find_property(np, propname, NULL): Finds whether the property named in argument 2 is present or not.

of_property_read_bool: Reads the bool property named in argument 2. As it is a bool property, this is just like checking whether the property is present. Returns true or false.

of_get_property: Reads any property named in argument 2.

of_property_read_u32: Reads a 32-bit property and stores it in argument 3. Leaves argument 3 untouched in case of error.

of_property_read_string: Reads a string property.

of_match_device: Checks that the device matches an entry in the driver's match table. It is optional, and I don’t see much use for it.

 

Let me know in the comment section below, or by personal email, if you have any doubts related to device tree.

 

Saurabh Singh Sengar

email to: saurabh.truth@gmail.com