Sheet 2 Answers

The document compares various computer architecture concepts including endianness, memory architecture, and processor registers. It also discusses FIR filtering using a circular buffer, cache behavior in loops, and performance metrics between ARM and TI C55x processors. The TI C55x DSP outperforms the ARM Cortex-M in terms of code size and execution speed for DSP tasks.

Q2-1 : little endian: the least significant (lowest-order) byte of a word is stored at the lowest memory address.

big endian: the most significant (highest-order) byte of a word is stored at the lowest memory address.
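A minimal C sketch of the two byte orders, assuming 8-bit bytes and a 32-bit word. The names store_le32, store_be32 and host_is_little_endian are hypothetical helpers: the first two write a word to memory in each order regardless of the host, and the third checks which order the host itself uses.

```c
#include <stdint.h>
#include <string.h>

/* Little endian: least significant byte at the lowest address (p[0]). */
void store_le32(uint8_t p[4], uint32_t w) {
    p[0] = (uint8_t)w;
    p[1] = (uint8_t)(w >> 8);
    p[2] = (uint8_t)(w >> 16);
    p[3] = (uint8_t)(w >> 24);
}

/* Big endian: most significant byte at the lowest address (p[0]). */
void store_be32(uint8_t p[4], uint32_t w) {
    p[0] = (uint8_t)(w >> 24);
    p[1] = (uint8_t)(w >> 16);
    p[2] = (uint8_t)(w >> 8);
    p[3] = (uint8_t)w;
}

/* Returns 1 if the host stores words little endian, 0 otherwise. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* byte at the lowest address of probe */
    return first == 1;
}
```

For the word 0x12345678, store_le32 writes the bytes 78 56 34 12 at increasing addresses, while store_be32 writes 12 34 56 78.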

Q2-2 : Von Neumann: machine has shared program and data memory.

Harvard: machine has separate program and data memory.

Q2-3 : a) 16 general purpose registers.

b) The purpose of the CPSR (Current Program Status Register) is to expose and record useful information about the result of the current arithmetic operation.

c) The Z bit indicates whether or not every bit of the result of the current arithmetic operation is 0.

d) The PC is kept in R15.

Q2-4 : a) 2 - 3

2 - 3 = 0x00000002 - 0x00000003 = 0xffffffff (-1), with NZCV = 1000 (the result is negative, and the subtraction borrows, so C = 0)

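b) -2^32 + 1 - 1

In 32-bit arithmetic -2^32 wraps around to 0, so -2^32 + 1 - 1 = 1 - 1 = 0, with NZCV = 0110 (the result is zero, and ARM sets C = 1 because the subtraction does not borrow)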

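c) -4 + 5

-4 + 5 = 0xfffffffc + 0x00000005 = 0x00000001, with NZCV = 0010 (the addition carries out of bit 31, so C = 1; operands of opposite sign cannot overflow, so V = 0)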
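The flag values in Q2-4 can be checked with a small C model of the ARM ADDS/SUBS flag rules. This is a sketch, not ARM reference code: nzcv_add and nzcv_sub are hypothetical names, and packing the result as a 4-bit value with N in bit 3, Z in bit 2, C in bit 1 and V in bit 0 is an assumption chosen to match the NZCV notation used here.

```c
#include <stdint.h>

/* Flags for a + b as set by an ARM ADDS (packed as N=bit3, Z=bit2, C=bit1, V=bit0). */
unsigned nzcv_add(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    uint32_t r = (uint32_t)wide;
    unsigned n = r >> 31;                       /* sign bit of the result */
    unsigned z = (r == 0);                      /* every result bit is 0 */
    unsigned c = (unsigned)(wide >> 32);        /* carry out of bit 31 */
    /* Signed overflow: operands share a sign that the result does not. */
    unsigned v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return (n << 3) | (z << 2) | (c << 1) | v;
}

/* Flags for a - b as set by an ARM SUBS, which computes a + ~b + 1.
   C = 1 therefore means "no borrow". */
unsigned nzcv_sub(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)(~b) + 1u;
    uint32_t r = (uint32_t)wide;
    unsigned n = r >> 31;
    unsigned z = (r == 0);
    unsigned c = (unsigned)(wide >> 32);
    /* Signed overflow: operands differ in sign and the result's sign differs from a. */
    unsigned v = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

With this packing, nzcv_sub(2, 3) returns 0x8 (1000), nzcv_sub(1, 1) returns 0x6 (0110) and nzcv_add(0xfffffffc, 5) returns 0x2 (0010). Note that ARM sets C = 1 whenever a subtraction does not borrow, which is why a zero result from 1 - 1 still has the carry flag set.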

Q2-5 : a) EQ : equal (Z = 1)

b) NE : not equal (Z = 0)

c) MI : minus, i.e. negative (N = 1)

d) VS : overflow set (V = 1)

e) GE : signed greater than or equal (N = V)

f) LT : signed less than (N ≠ V)
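A sketch of how these six conditions can be evaluated from the flags in C. The flags are assumed to be packed as a 4-bit NZCV value (N in bit 3, Z in bit 2, C in bit 1, V in bit 0); cond_t and cond_passes are hypothetical names, not part of any ARM toolchain.

```c
#include <stdbool.h>

/* Only the six condition codes listed in Q2-5. */
typedef enum { COND_EQ, COND_NE, COND_MI, COND_VS, COND_GE, COND_LT } cond_t;

/* Returns true if an instruction with this condition would execute. */
bool cond_passes(cond_t cond, unsigned nzcv) {
    bool n = (nzcv >> 3) & 1u;
    bool z = (nzcv >> 2) & 1u;
    bool v = nzcv & 1u;
    switch (cond) {
    case COND_EQ: return z;        /* result was zero */
    case COND_NE: return !z;
    case COND_MI: return n;        /* result was negative */
    case COND_VS: return v;        /* signed overflow occurred */
    case COND_GE: return n == v;   /* signed >= after a compare */
    case COND_LT: return n != v;   /* signed < after a compare */
    }
    return false;
}
```

For example, after 2 - 3 (NZCV = 1000) LT and MI pass while GE fails, and after 1 - 1 (NZCV = 0110) EQ passes while NE fails. GE and LT compare N with V rather than testing N alone so that they remain correct when the subtraction overflows.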

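Q2-6 : The BL (branch and link) instruction is used to call subroutines. BL works like a branch except that, in addition to modifying the PC to point to its target, it stores the return address (the address of the instruction following the BL) in R14, the link register.

Ex. r14 <- (address of the BL) + 4, i.e. the next instruction
r15 <- branch target (PC + branch offset)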

Q2-7 : Move r14 to r15. Executing the instruction MOV r15, r14 copies the saved return address back into the PC, which returns from the subroutine.

Q2-8 : 1. After calling scum(3): {3}
2. After calling foo(3,5): {5, 3, return address of scum}
3. After returning from foo(3,5): {8, return address of scum}
4. After calling foo(4,5): {5, 4, return address of scum}
5. After returning from foo(4,5): {9, return address of scum}
6. After calling foo(5,5): {5, 5, return address of scum}
7. After returning from foo(5,5): {10, return address of scum}
8. After returning from scum(3): {}
9. After calling baz(2): {2}
10. After returning from baz(2): {3}

Q2-9 : Smaller code, higher performance, lower power consumption

Q2-10 : Yes

Q2-11 : 8 levels

Q2-12 : PCL and PCLATH.


Q2-13 : Word (16 bits)
Longword (32 bits)
A few instructions operate on bits.

Q2-14 : four 40-bit accumulators.

Q2-15 : ST0_55, ST1_55, ST2_55, ST3_55

Q2-16 : It executes a block of instructions several times. The size of the block and the number of iterations are determined by registers.

Q2-17 : Data and program are mapped to the same physical memory.

Q2-18 : They are at the beginning of main data page 0.

Q2-19 : Auxiliary register.

Q2-20 : DP : addresses the data pages.

PDP : addresses the I/O pages.

Q2-21 : Two stacks: - user (SP)

- system (SSP)

Q2-22 : The CSR register.

Q2-23 : slow return: the return address and loop context are stored on the stack.

fast return: the return address and loop context are stored in registers (RETA and CFCT).


Q2-24 : 8 functional units

Q2-25 :

fetch packet : a unit of memory. It holds eight 32-bit instruction words (256 bits) and is aligned on a 256-bit boundary.

execute packet : a unit of execution. It is the set of instructions that are executed together, in parallel, in the same cycle.

------------------------------------------------------------------------------------------------------------------------------------------

L2-1 :

A circular buffer stores the most recent input samples and wraps around to the beginning when it reaches the end.
In the FIR filter, each new sample overwrites the oldest one, and the filter output is calculated by multiplying the buffered samples by the filter coefficients and summing the products.
This handles continuous data streams efficiently because no samples have to be shifted.

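#define N 5 // Filter length

int buffer[N] = {0};              // Circular buffer holding the N most recent samples
int coeffs[N] = {1, 2, 3, 2, 1};  // Example coefficients
int buf_index = 0;                // Slot that receives the next input sample

int FIR_filter(int input) {
    int output = 0;

    buffer[buf_index] = input;    // Overwrite the oldest sample
    int j = buf_index;            // Start at the newest sample, x[n]

    for (int i = 0; i < N; i++) {
        output += buffer[j] * coeffs[i];      // coeffs[i] multiplies x[n-i]
        j = (j == 0) ? (N - 1) : (j - 1);     // Circular decrement
    }

    buf_index = (buf_index + 1) % N;  // Advance the write position, wrapping at the end
    return output;
}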
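To make the "no shifting" point concrete, here is a minimal sketch that places a circular buffer next to the conventional shift-register delay line it replaces. The names TAPS, fir_shift, fir_circ and fir_circ_t are hypothetical, and keeping the filter state in a struct instead of globals is a design choice for this sketch. Both functions compute y[n] = sum over k of h[k] * x[n-k] and should return identical outputs for the same input stream.

```c
#include <string.h>

#define TAPS 5

/* Conventional delay line: every new sample shifts the whole history by one slot.
   history[k] holds x[n-k] after the shift. */
int fir_shift(int history[TAPS], const int h[TAPS], int input) {
    memmove(&history[1], &history[0], (TAPS - 1) * sizeof history[0]);  /* O(TAPS) copies per sample */
    history[0] = input;
    int acc = 0;
    for (int k = 0; k < TAPS; k++)
        acc += h[k] * history[k];
    return acc;
}

/* Circular buffer: only the write position moves; no samples are copied. */
typedef struct {
    int samples[TAPS];
    int head;              /* slot that receives the next input */
} fir_circ_t;

int fir_circ(fir_circ_t *s, const int h[TAPS], int input) {
    s->samples[s->head] = input;
    int acc = 0;
    int j = s->head;                                  /* newest sample, x[n] */
    for (int k = 0; k < TAPS; k++) {
        acc += h[k] * s->samples[j];
        j = (j == 0) ? TAPS - 1 : j - 1;              /* step back to the previous sample, wrapping */
    }
    s->head = (s->head + 1) % TAPS;
    return acc;
}
```

With the coefficients {1, 2, 3, 2, 1}, feeding an impulse (1, 0, 0, ...) returns the coefficients one at a time (1, 2, 3, 2, 1), which is a quick sanity check. The shift version moves TAPS - 1 samples on every call, while the circular version only updates head; DSPs such as the C55x can perform the wrap-around in the address generator, which makes the circular form essentially free.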

L2-2 :

A simple loop is used to access an array repeatedly.
As the number of statements or data accesses in the loop increases, the code and data used by the loop may exceed the processor's cache size.
This causes more cache misses, resulting in more memory (bus) accesses.
The effect of cache behavior can be seen by measuring execution time or bus activity.

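#define DATA_SIZE 1000

volatile int data[DATA_SIZE]; // volatile keeps the compiler from removing the stores

void cache_test(int loop_size) {
    if (loop_size > DATA_SIZE)
        loop_size = DATA_SIZE;   // Stay inside the array
    for (int i = 0; i < loop_size; i++) {
        data[i] = i * 2;
    }
}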
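One way to observe this on a hosted machine is a timing sketch that sweeps arrays of increasing size and reports the average time per access. It assumes a 64-byte cache line (LINE_INTS) and uses the standard clock() timer; sweep_ns_per_access is a hypothetical name, and the measured numbers depend entirely on the machine, so no expected values are given.

```c
#include <stdlib.h>
#include <time.h>

/* Assumed cache-line size: 64 bytes is common but not universal. */
#define LINE_INTS (64 / (int)sizeof(int))

/* Sweeps an array of n_ints ints `passes` times, touching one int per assumed
   cache line, and returns the average time per touch in nanoseconds
   (or -1.0 if allocation fails). Once the array no longer fits in a cache
   level, each pass misses again and the time per access should rise. */
double sweep_ns_per_access(size_t n_ints, int passes) {
    volatile int *buf = calloc(n_ints, sizeof *buf);
    if (buf == NULL)
        return -1.0;

    size_t touches = 0;
    clock_t start = clock();
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < n_ints; i += LINE_INTS) {
            buf[i] += 1;   /* read-modify-write through volatile forces a real access */
            touches++;
        }
    }
    clock_t end = clock();
    free((void *)buf);

    if (touches == 0)
        return 0.0;
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)touches;
}
```

Calling it for sizes from a few KB up to several MB and printing the results should show the per-access time stepping up as the array outgrows each cache level. Because clock() has coarse resolution, small arrays need many passes; a hardware cycle counter, where available, gives finer measurements.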

L2-3 : Two processors : ARM and TI C55x


Code Size:

ARM Cortex-M:

Compiled binary size: 2.5 KB.

The general-purpose processor has a larger code size due to less specialized
instructions for DSP tasks.

TI C55x DSP:

Compiled binary size: 1.0 KB.

The DSP processor has a smaller code size due to specialized instructions for FIR filtering (such as multiply-accumulate) and hardware repeat loops, which need no explicit loop-control code.

Performance:

The TI C55x DSP is significantly faster, with 6.25x higher throughput and 6.25x fewer cycles compared to the ARM Cortex-M. The DSP processor is optimized for tasks like FIR filtering.
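The two 6.25x figures agree only if both processors run at the same clock frequency, because throughput is the clock rate divided by the cycles needed per sample. A minimal sketch of that arithmetic follows; samples_per_second and throughput_ratio are hypothetical names, and the clock rates and cycle counts are inputs you would measure, not values taken from this comparison.

```c
/* Samples processed per second by a filter that needs cycles_per_sample
   cycles on a core clocked at clock_hz. */
double samples_per_second(double clock_hz, double cycles_per_sample) {
    return clock_hz / cycles_per_sample;
}

/* Throughput of processor A relative to processor B. With equal clocks this
   reduces to cyc_b / cyc_a, so "k times fewer cycles" equals "k times the throughput". */
double throughput_ratio(double clk_a, double cyc_a, double clk_b, double cyc_b) {
    return samples_per_second(clk_a, cyc_a) / samples_per_second(clk_b, cyc_b);
}
```

For example, with equal clocks and a 6.25x cycle ratio, throughput_ratio returns 6.25; if the DSP ran at half the ARM's clock, the same cycle ratio would give only a 3.125x throughput advantage.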

TI C55x DSP:

- Smaller code size.

- Higher performance (faster execution, higher throughput).

ARM Cortex-M:

- Larger code size.

- Lower performance for DSP tasks, efficient cache usage.
