Chapter_01_See_Program_Running
Chapter 1
Computer and Assembly Language
Spring 2018
Embedded Systems
Amazon Warehouse
Kiva Robot
Assembly Programs
http://www.andysinger.com/
Why do we learn Assembly?
Assembly isn’t “just another language”.
It helps you understand how the processor works.
Hand-written assembly can run faster than compiled high-level code, so performance-critical code is sometimes written in assembly.
Use profiling tools to find the performance bottleneck and rewrite that code section in assembly.
Latency-sensitive applications, such as aircraft controllers
Standard C has no operator for some operations available on ARM processors, such as ROR (Rotate Right) and RRX (Rotate Right Extended).
Hardware/processor-specific code:
Processor booting code
Device drivers
A test-and-set atomic assembly instruction can be used to implement locks and semaphores.
Cost-sensitive applications
Embedded devices where the size of code is limited, such as washing machine controllers and automobile controllers
The best applications are written by those who've mastered assembly language or
Why ARM processors?
As of 2005, 98% of the more than one billion mobile phones sold each year used ARM processors.
In 2010 alone, 6.1 billion ARM-based processors were shipped, representing 95% of smartphones, 35% of digital televisions and set-top boxes, and 10% of mobile computers.
Figure labels: USART, SPI, I2C, advanced timers, LCD driver, motor control
iPhone 7
Teardown
A10 processor:
• 64-bit system on chip (SoC)
• ARMv8-A core
Apple Watch
Apple S1 Processor
32-bit ARMv7-A compatible
# of Cores: 1
CMOS Technology: 28 nm
L1 cache 32 KB data
L2 cache 256 KB
GPU PowerVR SGX543
Kindle HD Fire
Texas Instruments OMAP 4460 dual-core processor
source: http://www.ifixit.com
Fitbit Flex Teardown
STMicroelectronics 32L151C6 Ultra Low Power ARM Cortex M3 Microcontroller
Nordic Semiconductor nRF8001 Bluetooth Low Energy Connectivity IC
source: www.ifixit.com
Samsung Galaxy Gear
STMicroelectronics STM32F401B ARM-Cortex M4 MCU with 128KB Flash
source: ifixit.com
Pebble Smartwatch
STMicroelectronics STM32F205RE ARM Cortex-M3 MCU, with a maximum speed of 120 MHz
source: ifixit.com
Oculus VR
STMicroelectronics 32F072R8 ARM Cortex-M0 Microcontroller
source: ifixit.com
Nest Learning Thermostat
STMicroelectronics STM32F439ZI 180 MHz, 32 bit ARM Cortex-M4 CPU
source: ifixit.com
Memory
Data: 8 bits
Address: 32 bits
Computer Architecture
Von-Neumann: instructions and data are stored in the same memory.
Harvard: data and instructions are stored in separate memories.
ARM Cortex-M Series Family
Von-Neumann: instructions and data are stored in the same memory.
Harvard: data and instructions are stored in separate memories.
Levels of Program Code
C Program -> (compile) -> Assembly Program -> (assemble) -> Machine Program

C Program (high-level language)
Level of abstraction closer to problem domain
Provides for productivity and portability
    int main(void){
        int i;
        int total = 0;
        for (i = 0; i < 10; i++) {
            total += i;
        }
        while(1); // Dead loop
    }

Assembly Program (assembly language)
Textual representation of instructions

Machine Program (hardware representation)
Binary digits (bits)
Encoded instructions and data
    0010000100000000
    0010000000000000
    1110000000000001
    0100010000000001
    0001110001000000
    0010100000001010
    1101110011111011
    1011111100000000
    1110011111111110
See a Program Run
C Code
    int main(void)
    {
        int a = 0;
        int b = 1;
        int c;
        c = a + b;
        return 0;
    }
(compiler)
Assembly Code
    MOVS r1, #0x00      ; int a = 0
    MOVS r2, #0x01      ; int b = 1
    ADDS r3, r1, r2     ; c = a + b
    MOVS r0, #0x00      ; set return value
    BX   lr             ; return
(assembler)
Machine Code
    In Binary           In Hex
    0010000100000000    2100    ; MOVS r1, #0x00
    0010001000000001    2201    ; MOVS r2, #0x01
    0001100010001011    188B    ; ADDS r3, r1, r2
    0010000000000000    2000    ; MOVS r0, #0x00
    0100011101110000    4770    ; BX lr
Processor Registers
Fastest way to read and write
Registers are within the processor chip
A register stores a 32-bit value
STM32L has
R0-R12: 13 general-purpose registers
Low registers: R0-R7
High registers: R8-R12
R13: Stack pointer (SP), shadow of MSP or PSP
R14: Link register (LR)
R15: Program counter (PC)
Special-purpose registers: xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL
Program Execution
Program Counter (PC) is a register that holds the memory address of the next instruction to be fetched from the memory.
1. Fetch instruction at PC address
2. Decode the instruction
3. Execute the instruction
Memory    Address
4770      0x080001B4
2000      0x080001B2
188B      0x080001B0
2201      0x080001AE
2100      0x080001AC
PC = 0x080001B0
Instruction = 188B or 2000188B or 8B180020
Three-stage pipeline:
Fetch, Decode, Execution
Pipelining allows hardware resources to be fully utilized.
One 32-bit instruction or two 16-bit instructions can be fetched.
Example:
Calculate the Sum of an Array
Instruction Memory (Flash), starting memory address 0x08000000:
    int main(void){
        int i;
        total = 0;
        for (i = 0; i < 10; i++) {
            total += a[i];
        }
        while(1);
    }
Data Memory (RAM), starting memory address 0x20000000:
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;
Block diagram labels: CPU, I/O Devices
Example:
Calculate the Sum of an Array
Instruction Memory (Flash), starting memory address 0x08000000
Assembly code:
           MOVS r1, #0x00
           LDR  r2, =total_addr
           STR  r1, [r2, #0x00]
           MOVS r0, #0x00
           B    Check
    Loop:  LDR  r1, =a_addr
           LDR  r1, [r1, r0, LSL #2]
           LDR  r2, =total_addr
           LDR  r2, [r2, #0x00]
           ADD  r1, r1, r2
           LDR  r2, =total_addr
           STR  r1, [r2, #0x00]
           ADDS r0, r0, #1
    Check: CMP  r0, #0x0A
           BLT  Loop
           NOP
Machine code in instruction memory (16-bit values, in the order shown on the slide):
    0010 0001 0000 0000
    0100 1010 0000 1000
    0110 0000 0001 0001
    0010 0000 0000 0000
    1110 0000 0000 1000
    0100 1001 0000 0111
    1111 1000 0101 0001
    0001 0000 0010 0000
    0100 1010 0000 0100
    0110 1000 0001 0010
    0100 0100 0001 0001
    0100 1010 0000 0011
    0110 0000 0001 0001
Example:
Calculate the Sum of an Array
Data Memory (RAM)
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;
Assume the starting memory address of the data memory is 0x20000000
    Memory address    Memory content
    (in bytes)
    0x20000000        0x0001            a[0] = 0x00000001
    0x20000002        0x0000
    0x20000004        0x0002            a[1] = 0x00000002
    0x20000006        0x0000
    0x20000008        0x0003            a[2] = 0x00000003
    0x2000000A        0x0000
    0x2000000C        0x0004            a[3] = 0x00000004
    0x2000000E        0x0000
    0x20000010        0x0005            a[4] = 0x00000005
    0x20000012        0x0000
    0x20000014        0x0006            a[5] = 0x00000006
    0x20000016        0x0000
    0x20000018        0x0007            a[6] = 0x00000007
    0x2000001A        0x0000
    0x2000001C        0x0008            a[7] = 0x00000008
    0x2000001E        0x0000
    0x20000020        0x0009            a[8] = 0x00000009
    0x20000022        0x0000
    0x20000024        0x000A            a[9] = 0x0000000A
    0x20000026        0x0000
Loading Code and Data into Memory
• Stack is mandatory
• Heap is used only if dynamic allocation (e.g. malloc, calloc) is used.
View of a Binary Program
Figures: from st.com
STM32L4 Memory Map, from st.com