Assembler for Swift developers - part 2

You’ve mastered “Hello, Assembly!” and peeked inside registers. Now let’s plunge into pointers, arrays, function calls, iteration, and memory regions. You’ll learn how ARM64 turns high-level constructs into pointer arithmetic, branches, and stack frames, and what happens when you overstep your allocated buffer: a fault if you hit an unmapped page, silent corruption if you don’t.

Pointers & Arrays: Address, Address, Address

In C, an array name decays to a pointer to its first element in most expressions, so indexing is pointer arithmetic under the hood:

int nums[] = {10, 20, 30};
printf("%d\n", nums[1]);

In ARM64 assembly, you build the base address of nums with a two-instruction sequence, then load relative to it:

adrp    x0, nums@PAGE        // load page base of nums
add     x0, x0, nums@PAGEOFF // add the low-12-bit offset → &nums[0]
ldr     w1, [x0, #4]         // load nums[1]

How ADRP/ADD Works

ARM64 splits a full 64-bit address into:

full 64-bit address (0x1234_5678_9ABC_DEF0)
 ────────────────────────────────────────────────
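 |         PAGE (top 52 bits)   │ OFFSET (12 bits) │
 |    0x1234_5678_9ABC_D        │      0xEF0       │
 ────────────────────────────────────────────────
  • ADRP computes the page base relative to the program counter and leaves it in the register with the low 12 bits zeroed (here 0x1234_5678_9ABC_D000).
  • ADD …@PAGEOFF tacks on the low-12-bit offset (here 0xEF0).
  • For ADRP, one page is always 4 KB = 2¹² bytes, so 12 bits cover every byte inside. This is independent of the OS page size, which is 16 KB on Apple silicon.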
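
The page/offset split is plain bit masking. Here is a minimal sketch in C; the helper names page_base and page_offset are made up for illustration, and the address is just the example value from this section, not a real mapping.

```c
#include <stdint.h>

/* ADRP-style pages are always 4 KB, so the offset is the low 12 bits. */
#define PAGE_OFFSET_MASK UINT64_C(0xFFF)

/* What ADRP leaves in the register: the address with its low 12 bits zeroed. */
uint64_t page_base(uint64_t addr) {
    return addr & ~PAGE_OFFSET_MASK;
}

/* What @PAGEOFF supplies to ADD: the low 12 bits. */
uint64_t page_offset(uint64_t addr) {
    return addr & PAGE_OFFSET_MASK;
}
```

For 0x1234_5678_9ABC_DEF0, page_base gives 0x1234_5678_9ABC_D000 and page_offset gives 0xEF0. Adding the two gives back the original address, which is exactly what the ADRP/ADD pair does.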

Visualizing a Page

Page boundary ───────────────────────────────────────────────
┌───────────────────────────────────┐
│    PAGE = 0x1234_5678_9ABC_D000   │  ← adrp x0, nums@PAGE
│                                   │
│ [0xEF0] nums[0] → offset 0xEF0    │  ← add  x0, x0, nums@PAGEOFF
│ [0xEF4] nums[1] → offset 0xEF4    │
│ [0xEF8] nums[2] → offset 0xEF8    │
└───────────────────────────────────┘
↑ lower addresses                  ↑ higher

Multi-Page Buffers

If your data spans more than one page (e.g., a 6 KB JPEG buffer that starts on a page boundary), two 4 KB pages hold it:

Buffer: 0x..._9ABC_C000 – 0x..._9ABC_D7FF (6 KB)
──────────────────────────────────────────────────────────
PAGE 1: 0x..._9ABC_C000 [offsets 0x000–0xFFF] (4 KB)
PAGE 2: 0x..._9ABC_D000 [offsets 0x000–0x7FF] (2 KB)

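Your loop doesn’t need special page-crossing code in user mode: virtual addresses are contiguous, so the MMU translates an access to the next mapped page like any other. If you stray outside your allocated range (stack, heap, or data segment) onto an unmapped or protected page, the MMU raises a fault that the OS reports as a segfault, stopping you in your tracks. An overrun that lands on memory that is still mapped does not fault; it silently reads or corrupts whatever lives there.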
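
To get a feel for how a buffer maps onto pages, here is a small C sketch that counts the 4 KB pages a byte range touches. The name pages_spanned is hypothetical, and the 4 KB page size is an assumption chosen to match the diagrams in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KB pages, matching the diagrams in this post. */
#define PAGE_SIZE_4K UINT64_C(4096)

/* Number of 4 KB pages touched by the byte range [start, start + len). */
uint64_t pages_spanned(uint64_t start, size_t len) {
    if (len == 0) {
        return 0;
    }
    uint64_t first_page = start / PAGE_SIZE_4K;
    uint64_t last_page  = (start + len - 1) / PAGE_SIZE_4K;
    return last_page - first_page + 1;
}
```

A 6 KB buffer that starts on a page boundary touches two pages. Start the same buffer 0xF00 bytes into a page and it touches three, yet the loop that walks it stays exactly the same.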

Iteration & Branching: Comparison Operators

Loops in assembly boil down to compare + branch:

  • cmp rn, rm computes rn – rm, sets the flags, and discards the result.
  • subs rd, rn, #imm subtracts an immediate, writes the result to rd, and sets the flags.
  • Branches:
    • b.eq / b.ne → equal / not equal
    • b.lt / b.ge → signed less-than / signed greater-or-equal
    • b.lo / b.hs → unsigned less-than / unsigned ≥
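
The signed/unsigned split matters because the same bits compare differently depending on how you read them. A minimal C sketch (the function names are made up for illustration) shows the two flavors that end up using the lt/ge and lo/hs condition codes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed comparison: maps to the lt / ge condition codes. */
bool less_signed(int32_t a, int32_t b) {
    return a < b;
}

/* Unsigned comparison of the same bit patterns: maps to lo / hs. */
bool less_unsigned(uint32_t a, uint32_t b) {
    return a < b;
}
```

With the bit pattern 0xFFFFFFFF, less_signed(-1, 1) is true, while less_unsigned(0xFFFFFFFF, 1) is false. Picking the wrong branch flavor in assembly gives you the wrong answer without any warning.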

Example: walking an array

    adrp    x0, nums@PAGE
    add     x0, x0, nums@PAGEOFF
    mov     x2, #3           // count = 3

loop:
    ldr     w3, [x0], #4     // load then x0 += 4
    // … process w3 …
    subs    x2, x2, #1       // x2 = x2 – 1
    b.ne    loop             // if x2 ≠ 0, jump to loop
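
For comparison, here is roughly the same loop shape in C: bump the pointer after each load and count down to zero. The “process” step is a placeholder in the assembly, so this sketch assumes a hypothetical one (track the largest element). Like the assembly, it assumes the count is at least 1, because the counter is tested only after the first load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "process" step: keep the largest element seen.
   Assumes count >= 1, as the assembly loop does. */
int32_t max_element(const int32_t *p, size_t count) {
    int32_t best = INT32_MIN;
    do {
        int32_t value = *p++;      /* ldr w3, [x0], #4 : load, then advance */
        if (value > best) {
            best = value;
        }
    } while (--count != 0);        /* subs x2, x2, #1 ; b.ne loop */
    return best;
}
```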

Functions & Calling Conventions

High-level calls become a choreographed dance:

  1. Prologue: set up a stack frame
  2. Body: take arguments in x0–x7, return the result in x0, and save any callee-saved registers (x19–x28) you use
  3. Epilogue: tear down the frame and ret

    .global _sum
_sum:                            // ← label: a branch target, like loop: above
    stp     x29, x30, [sp, #-16]!// push FP (x29) & LR (x30)
    mov     x29, sp              // new frame pointer
    // x0 = arr ptr, x1 = count

    mov     w2, #0               // total = 0
    mov     w3, #0               // i = 0

loop_sum:
    cmp     w3, w1               // compare i and count
    b.ge    done_sum             // if i ≥ count, break

    ldr     w4, [x0, w3, uxtw #2] // w4 = arr[i]; a W-register index needs uxtw/sxtw, not lsl
    add     w2, w2, w4           // total += w4
    add     w3, w3, #1           // i++
    b       loop_sum             // goto loop_sum

done_sum:
    mov     w0, w2               // return total in w0
    ldp     x29, x30, [sp], #16  // restore FP & LR, pop stack
    ret                          // jump to address in LR
  • Labels like _sum:, loop_sum:, and done_sum: mark code positions for branches. They have no direct analog in Swift/ObjC; think of them as named “goto” anchors under the hood.
  • Saving x29 (frame pointer) and x30 (link register) in the prologue and restoring them in the epilogue gives the function its own stack frame and lets it call other functions without losing its return address.
  • Return uses ret, which branches to the address in x30.
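
If it helps to map the listing back to something familiar, this is approximately the C function it corresponds to. The name sum32 is made up here so it does not clash with the 64-bit version later in the post, and the register comments are a reading aid rather than actual compiler output.

```c
#include <stdint.h>

/* C rendering of the _sum listing: arr arrives in x0, count in w1. */
int32_t sum32(const int32_t *arr, int32_t count) {
    int32_t total = 0;                      /* mov w2, #0 */
    for (int32_t i = 0; i < count; i++) {   /* cmp w3, w1 ; b.ge done_sum */
        total += arr[i];                    /* ldr w4, [x0, w3, ...] ; add w2, w2, w4 */
    }
    return total;                           /* mov w0, w2 ; ret */
}
```

Because the comparison is signed (b.ge), a zero or negative count skips the loop entirely and returns 0.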

Heap vs. Stack: Memory Management

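  • Stack: auto-managed LIFO region addressed via sp. Fast, but limited in size and prone to overflow if you reserve too much. It grows downward, and sp must stay 16-byte aligned.
  • Heap: malloc/free under the hood. These are library calls; the allocator asks the kernel for more pages only when it needs them. Allocations live until you free them, and fragmentation is the price of that flexibility.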
    // Reserve 32 bytes on stack
    sub     sp, sp, #32
    str     x0, [sp, #0]         // store a local var
    // … use locals …
    add     sp, sp, #32          // release stack

    // Call malloc(64)
    mov     x0, #64
    bl      _malloc              // link-branch to libc’s malloc
    // x0 = pointer to heap memory
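
The same contrast in C, as a rough sketch: a local array lives on the stack and disappears on return, while a malloc’d block lives until you free it. The function names are hypothetical, and the assembly in the comments only indicates which instruction plays the equivalent role, not what a compiler would emit.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copies `count` values into a malloc'd buffer and returns it,
   or NULL if the allocation fails. The caller must free() the result. */
int64_t *copy_to_heap(const int64_t *src, size_t count) {
    int64_t *dst = malloc(count * sizeof *dst);   /* bl _malloc */
    if (dst != NULL) {
        memcpy(dst, src, count * sizeof *dst);
    }
    return dst;
}

/* Sums four values after a round trip through the heap.
   Returns -1 if the allocation fails. */
int64_t sum_via_heap(void) {
    int64_t local[4] = {1, 2, 3, 4};              /* 32 bytes of stack, like sub sp, sp, #32 */
    int64_t *heap = copy_to_heap(local, 4);
    if (heap == NULL) {
        return -1;
    }
    int64_t total = heap[0] + heap[1] + heap[2] + heap[3];
    free(heap);                                   /* the heap needs an explicit release */
    return total;                                 /* local[] is released automatically */
}
```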

Putting it all together

Below are the two assembly files that add up 10, 20, and 30 and print “Sum = 60” every time.

sum.s

This is the sum function. It expects two parameters:

  • pointer to an array of 64-bit Ints in x0
  • count of elements in x1 (the code reads its low 32 bits as w1)

It loops over the values and leaves the result in x0.

// sum.s
    .section    __TEXT,__text
    .global     _sum

_sum:
    // Prologue: save FP & LR
    stp     x29, x30, [sp,#-16]!
    mov     x29, sp

    mov     x2, #0              // total = 0

loop:
    cbz     w1, done            // if count==0, exit
    ldr     x3, [x0],#8         // load *ptr → x3; ptr += 8
    add     x2, x2, x3          // total += x3
    subs    w1, w1,#1           // count--, set flags
    b.ne    loop                // if count!=0, repeat

done:
    mov     x0, x2              // return total in x0

    // Epilogue: restore FP & LR
    ldp     x29, x30, [sp],#16
    ret

main.s

The main file prepares the array, puts the arguments into the proper registers, calls our function, and prints the result. You can freely change the array and the count to experiment!

.section    __TEXT,__text
.globl      _main
.extern     _sum
.extern     _printf
.align      2

_main:
// — Prologue: save FP & LR, allocate 16 bytes
stp     x29, x30, [sp, #-16]!
mov     x29, sp
sub     sp, sp, #16

// — Compute sum(arr,3) into x0 —
adrp    x0, arr@PAGE
add     x0, x0, arr@PAGEOFF
mov     w1, #3
bl      _sum          // returns sum in x0

// — Store sum on stack for printf %d —
mov     x8, x0        // move full 64-bit sum into x8
str     x8, [sp]      // write sum at [sp]; Apple's arm64 ABI passes variadic args on the stack

// — Print it in decimal —
adrp    x0, Lstr@PAGE
add     x0, x0, Lstr@PAGEOFF
bl      _printf

// — Teardown & return 0 —
add     sp, sp, #16
ldp     x29, x30, [sp], #16
mov     w0, #0
ret

// — Data: format string then array —
.section    __TEXT,__cstring
.align      2
Lstr:
.asciz      "Sum = %d\n"

.section    __DATA,__data // arr is not a C string, so keep it out of __cstring
.align      3             // ensure arr is 8-byte aligned
arr:
.quad       10, 20, 30    // 3×64-bit integers
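
As a cross-check, here is a C sketch of what the two files compute together. The names sum64 and format_sum are made up for illustration. Unlike main.s, it writes into a caller-supplied buffer instead of stdout, and it uses PRId64 rather than %d so the format matches the 64-bit value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors sum.s: 64-bit elements, 32-bit count, pointer bumped by 8 bytes. */
int64_t sum64(const int64_t *ptr, uint32_t count) {
    int64_t total = 0;
    while (count != 0) {       /* cbz w1, done */
        total += *ptr++;       /* ldr x3, [x0], #8 ; add x2, x2, x3 */
        count--;               /* subs w1, w1, #1 */
    }
    return total;
}

/* Mirrors main.s, but formats into `out` instead of printing.
   Returns whatever snprintf returns. */
int format_sum(char *out, size_t out_size, const int64_t *arr, uint32_t count) {
    return snprintf(out, out_size, "Sum = %" PRId64 "\n", sum64(arr, count));
}
```

If your assembly build prints something different from what format_sum produces for the same array and count, the bug is almost certainly in the register setup before bl _sum or bl _printf.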

Extras

If you have been building in Xcode, as in the first post, try the command line instead! Make sure you have clang installed (it ships with Xcode and the Command Line Tools).

clang -arch arm64 main.s sum.s -o sum_demo
./sum_demo

Why? Xcode does not show clear errors for assembly code. When the assembler fails, it may only say that the exit code was not 0.

The command line gives you the line number and the type of error, which is very good for debugging!

Xcode can still step through assembly instruction by instruction, and lldb can print registers!

Try this next time:

register read x1

Happy inspecting!

Closing Thoughts

You now understand:

  • How ADRP/ADD constructs full addresses (and why 4 KB pages matter).
  • How to iterate with cmp/subs + b.* branches.
  • How stack frames and labels underpin every function call.
  • That touching memory outside your allocation faults only when you hit an unmapped or protected page; smaller overruns silently corrupt neighboring data.
  • How to build from the command line and inspect registers in lldb.

In our final part, we’ll wield bitwise ops to filter JPEG data and sprinkle inline assembly into a real iOS example. Until then, write small tests: walk a buffer byte-by-byte, implement your own strlen, and debug with the Xcode disassembler open. Happy hacking!