Assembler for Swift developers - part 2
You’ve mastered “Hello, Assembly!” and peeked inside registers. Now let’s plunge into pointers, arrays, function calls, iteration, and memory regions. You’ll learn how ARM64 turns high-level constructs into pointer arithmetic, branches, and stack frames, and why overstepping your allocated buffer can end in a fault.
Pointers & Arrays: Address, Address, Address
In C, an array decays to a pointer to its first element in most expressions:
int nums[] = {10, 20, 30};
printf("%d\n", nums[1]);
In ARM64 assembly, you load the base address of nums with a two-instruction sequence:
adrp x0, nums@PAGE // load page base of nums
add x0, x0, nums@PAGEOFF // add the low-12-bit offset → &nums[0]
ldr w1, [x0, #4] // load nums[1]
How ADRP/ADD Works
ARM64 splits a full 64-bit address into:
full 64-bit address (0x1234_5678_9ABC_DEF0)
────────────────────────────────────────────────
| PAGE (top 52 bits) │ OFFSET (12 bits) │
| 0x1234_5678_9ABC_D │ 0xEF0 │
────────────────────────────────────────────────
- ADRP computes the page base relative to the current PC (low 12 bits zeroed): here 0x1234_5678_9ABC_D000.
- ADD …@PAGEOFF tacks on the low-12-bit offset.
- One page = 4 KB = 2¹² bytes, so 12 bits cover every byte inside.
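The split that ADRP and ADD perform can be mimicked with two bit masks. This is only a sketch in C, not what the assembler emits: the helper names `page_base` and `page_offset` and the constant `LOW12_MASK` are made up for illustration, and the 4 KB granule is the one ADRP assumes.

```c
#include <stdint.h>

/* Low 12 bits: the byte offset inside a 4 KB page (assumed granule). */
#define LOW12_MASK 0xFFFULL

/* Hypothetical helper: the part ADRP materializes (low 12 bits zeroed). */
uint64_t page_base(uint64_t addr) {
    return addr & ~LOW12_MASK;
}

/* Hypothetical helper: the part that ADD ...@PAGEOFF supplies. */
uint64_t page_offset(uint64_t addr) {
    return addr & LOW12_MASK;
}
```

For any address, `page_base(a) + page_offset(a)` gives back `a`, which is why the two-instruction sequence can reach every byte of the page.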
Visualizing a Page
Page boundary ───────────────────────────────────────────────
┌───────────────────────────────────┐
│ PAGE = 0x1234_5678_9ABC_D000 │ ← adrp x0, nums@PAGE
│ │
│ [0xEF0] nums[0] → offset 0xEF0 │ ← add x0, x0, nums@PAGEOFF
│ [0xEF4] nums[1] → offset 0xEF4 │
│ [0xEF8] nums[2] → offset 0xEF8 │
└───────────────────────────────────┘
↑ lower addresses ↑ higher
Multi-Page Buffers
If your data spans more than one page (e.g., a page-aligned 6 KB JPEG buffer), two pages hold it:
Buffer: 0x..._9ABC_D000 – 0x..._9ABC_E7FF (6 KB)
──────────────────────────────────────────────────────────
PAGE 1: 0x..._9ABC_D [offsets 0x000–0xFFF] (4 KB)
PAGE 2: 0x..._9ABC_E [offsets 0x000–0x7FF] (2 KB)
Your loop doesn’t need special page-crossing code in user mode: virtual addresses are contiguous, and the MMU translates each page transparently. If you stray outside your allocation (stack, heap, or data segment) and hit an address that is unmapped or protected, the MMU raises a fault (segfault), stopping you in your tracks. If the stray address lands in other mapped memory, such as a neighboring stack slot or heap block, nothing stops you and you silently read or corrupt that data.
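To check how many 4 KB pages a buffer touches, you can compare the page numbers of its first and last byte. A small sketch in C, where `pages_spanned` is a hypothetical helper and the 4 KB page size is an assumption carried over from the text:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 4 KB pages touched by [start, start + len). */
uint64_t pages_spanned(uint64_t start, size_t len) {
    if (len == 0) {
        return 0;                                /* empty range touches nothing */
    }
    uint64_t first = start >> 12;                /* page number of first byte */
    uint64_t last = (start + len - 1) >> 12;     /* page number of last byte */
    return last - first + 1;
}
```

A page-aligned 6 KB buffer gives 2, while the same buffer starting near the end of a page can touch 3.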
Iteration & Branching: Comparison Operators
Loops in assembly boil down to compare + branch:
- cmp rn, rm computes rn – rm and sets flags (the result is discarded).
- subs rd, rn, #imm subtracts an immediate and sets flags (and writes the result to rd).
- Branches:
  - b.eq / b.ne → equal / not equal
  - b.lt / b.ge → signed less-than / signed greater-or-equal
  - b.lo / b.hs → unsigned less-than / unsigned ≥
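The signed and unsigned branch pairs matter because the same bit pattern compares differently depending on how you read it. A sketch in C, with `less_signed` and `less_unsigned` as hypothetical names standing in for the conditions that b.lt and b.lo test after a cmp:

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed view: the condition b.lt tests after cmp a, b. */
bool less_signed(int32_t a, int32_t b) {
    return a < b;
}

/* Unsigned view: the condition b.lo tests after cmp a, b. */
bool less_unsigned(uint32_t a, uint32_t b) {
    return a < b;
}
```

The bit pattern 0xFFFFFFFF is -1 when signed (less than 1) but 4294967295 when unsigned (greater than 1), so picking the wrong branch silently flips a loop bound check.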
Example: walking an array
adrp x0, nums@PAGE
add x0, x0, nums@PAGEOFF
mov x2, #3 // count = 3
loop:
ldr w3, [x0], #4 // load then x0 += 4
// … process w3 …
subs x2, x2, #1 // x2 = x2 – 1
b.ne loop // if x2 ≠ 0, jump to loop
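For comparison, here is roughly what that loop looks like in C. It is a sketch, not compiler output: `last_visited` is a hypothetical function, and returning the last value stands in for the "process" step. Like the assembly, it assumes count is at least 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walks `count` 32-bit values and returns the last one.
   Assumes count >= 1, exactly like the assembly loop above. */
int32_t last_visited(const int32_t *p, size_t count) {
    int32_t value;
    do {
        value = *p++;            /* ldr w3, [x0], #4 : load, then advance by 4 bytes */
        /* ... process value ... */
    } while (--count != 0);      /* subs x2, x2, #1 ; b.ne loop */
    return value;
}
```

The do/while shape mirrors the assembly: the body runs before the counter is tested, so a count of 0 would wrap around instead of skipping the loop.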
Functions & Calling Conventions
High-level calls become a choreographed dance:
- Prologue: set up a stack frame
- Body: use x0–x7 for arguments and x0 for the return value; save any callee-saved registers (x19–x28) you modify
- Epilogue: tear down the frame and ret
.global _sum
_sum: // ← label: a branch target, like loop: above
stp x29, x30, [sp, #-16]! // push FP (x29) & LR (x30)
mov x29, sp // new frame pointer
// x0 = arr ptr, w1 = count
mov w2, #0 // total = 0
mov w3, #0 // i = 0
loop_sum:
cmp w3, w1 // compare i and count
b.ge done_sum // if i ≥ count, break
ldr w4, [x0, w3, uxtw #2] // w4 = arr[i] (a W index needs uxtw/sxtw; lsl requires an X index)
add w2, w2, w4 // total += w4
add w3, w3, #1 // i++
b loop_sum // goto loop_sum
done_sum:
mov w0, w2 // return total in w0
ldp x29, x30, [sp], #16 // restore FP & LR, pop stack
ret // jump to address in LR
- Labels like _sum:, loop_sum:, and done_sum: mark code positions for branches. No direct analog in Swift/ObjC; think of them as named “goto” anchors under the hood.
- Saving and restoring x29 (frame pointer) and x30 (link register) preserves the caller’s frame and return address, so the function could safely call others.
- Return uses ret, which branches to the address in x30.
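The indexed load in loop_sum scales the index by the element size before adding it to the base. The sketch below spells that arithmetic out in C; `scaled_offset` and `load_indexed` are hypothetical helpers, and 4-byte elements are assumed as in the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of element i for 4-byte elements (i << 2). */
uint64_t scaled_offset(uint32_t i) {
    return (uint64_t)i << 2;
}

/* Hypothetical helper: load arr[i] using explicit byte arithmetic. */
int32_t load_indexed(const int32_t *arr, uint32_t i) {
    const unsigned char *bytes = (const unsigned char *)arr;
    int32_t value;
    memcpy(&value, bytes + scaled_offset(i), sizeof value);  /* read 4 bytes at base + i*4 */
    return value;
}
```

In everyday C you would simply write `arr[i]`; the compiler performs the same shift-and-add for you.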
Heap vs. Stack: Memory Management
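- Stack: auto-managed LIFO region via sp. Fast, but limited and prone to overflow if you reserve too much. Keep sp 16-byte aligned.
- Heap: malloc/free from libc, which requests memory from the kernel when it needs more. Flexible in size and lifetime, but slower, and prone to leaks and fragmentation.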
// Reserve 32 bytes on stack
sub sp, sp, #32
str x0, [sp, #0] // store a local var
// … use locals …
add sp, sp, #32 // release stack
// Call malloc(64)
mov x0, #64
bl _malloc // link-branch to libc’s malloc
// x0 = pointer to heap memory
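The same two kinds of storage in C, as a rough sketch: `copy_to_heap` is a hypothetical function, and the 32 and 64 byte sizes only echo the numbers used above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stages src in a stack buffer, then copies it to the heap.
   Returns NULL if malloc fails; the caller must free() the result. */
char *copy_to_heap(const char *src) {
    char local[32];                          /* stack: released when we return */
    strncpy(local, src, sizeof local - 1);   /* copy at most 31 bytes */
    local[sizeof local - 1] = '\0';          /* always terminate */

    char *heap = malloc(64);                 /* heap: outlives this call */
    if (heap == NULL) {
        return NULL;
    }
    strcpy(heap, local);                     /* safe: local holds at most 32 bytes */
    return heap;
}
```

Returning a pointer to `local` instead would be a bug, because that stack slot is reused as soon as the function returns.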
Putting it all together
Below are the two assembly files that compute and print “Sum = 60” every time.
sum.s
This is the sum function. It expects two parameters:
- pointer to an Int array in x0
- count of elements in w1
It loops over the values and leaves the result in x0.
// sum.s
.section __TEXT,__text
.global _sum
_sum:
// Prologue: save FP & LR
stp x29, x30, [sp,#-16]!
mov x29, sp
mov x2, #0 // total = 0
loop:
cbz w1, done // if count==0, exit
ldr x3, [x0],#8 // load *ptr → x3; ptr += 8
add x2, x2, x3 // total += x3
subs w1, w1,#1 // count--, set flags
b.ne loop // if count!=0, repeat
done:
mov x0, x2 // return total in x0
// Epilogue: restore FP & LR
ldp x29, x30, [sp],#16
ret
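A C reference for the same routine is handy for checking your results. This is a sketch with a made-up name, `sum_ref`, assuming 64-bit elements and a 32-bit count as in sum.s.

```c
#include <stdint.h>

/* Hypothetical reference for _sum: pointer in x0, count in w1, result in x0. */
int64_t sum_ref(const int64_t *ptr, uint32_t count) {
    int64_t total = 0;        /* mov x2, #0 */
    while (count != 0) {      /* cbz w1, done */
        total += *ptr++;      /* ldr x3, [x0], #8 ; add x2, x2, x3 */
        count--;              /* subs w1, w1, #1 */
    }
    return total;             /* mov x0, x2 */
}
```

Feeding both versions the same array and comparing the results is a quick sanity check when you start changing the assembly.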
main.s
The main file prepares the array, puts the arguments into the proper registers, calls our function, and prints the result. You can freely change the array and count values to experiment!
.section __TEXT,__text
.globl _main
.extern _sum
.extern _printf
.align 2
_main:
// — Prologue: save FP & LR, allocate 16 bytes
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16
// — Compute sum(arr,3) into x0 —
adrp x0, arr@PAGE
add x0, x0, arr@PAGEOFF
mov w1, #3
bl _sum // returns sum in x0
// — Store sum on stack for printf %d (Apple arm64 passes variadic args on the stack) —
mov x8, x0 // move full 64-bit sum into x8
str x8, [sp] // write sum at [sp]
// — Print it as decimal —
adrp x0, Lstr@PAGE
add x0, x0, Lstr@PAGEOFF
bl _printf
// — Teardown & return 0 —
add sp, sp, #16
ldp x29, x30, [sp], #16
mov w0, #0
ret
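// — Data: format string, then array —
.section __TEXT,__cstring
.align 2
Lstr:
.asciz "Sum = %d\n"
// arr gets its own section: __cstring is meant for NUL-terminated strings,
// and the linker may split or merge its contents
.section __DATA,__data
.align 3 // ensure arr is 8-byte aligned
arr:
.quad 10, 20, 30 // 3×64-bit integers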
Extras
If you are using Xcode, as in the first post, try the command line instead! Make sure you have clang installed.
clang -arch arm64 main.s sum.s -o sum_demo
./sum_demo
Why? Xcode does not provide clear errors for assembly code. Got an assembler error? It will just say that the exit code was not 0.
The CLI reports the line number and the type of error, which is very good for debugging!
Also, Xcode can step through assembly instruction by instruction, and lldb can print registers!
Try this next time:
register read x1
Happy inspecting!
Closing Thoughts
You now understand:
- How ADRP/ADD constructs full addresses (and why 4 KB pages matter).
- How to iterate with cmp/subs + b.* branches.
- How stack frames and labels underpin every function call.
- That touching unmapped or protected memory is caught by the MMU and results in a fault, while overruns into other mapped memory go unnoticed.
- How to build from the CLI and inspect registers in lldb.
In our final part, we’ll wield bitwise ops to filter JPEG data and sprinkle inline assembly into a real iOS example. Until then, write small tests: walk a buffer byte-by-byte, implement your own strlen, and debug with the Xcode disassembler open. Happy hacking!