System prompt
This is the exact instruction the agent receives, verbatim. It is identical for every model and provider, so the only variable is the model itself. It is assembled in `harness/generate.py` from the task, the ABI contract, and a short response instruction.
# TASK — Implement a Game Boy (DMG) emulator from scratch
You are a senior systems engineer. Your task is to implement an accurate emulator of the
original Game Boy (model **DMG**) in Rust, compiled to WebAssembly.
This is a **long-horizon** task. There is no deadline and no reward for finishing quickly.
Work carefully and incrementally; prefer correctness over coverage. The task does not end —
keep improving accuracy for as long as you are running. If you believe you are "done",
pick the next most-impactful inaccuracy and fix it.
## What you must produce
A Rust cargo project rooted at `/task/` whose library crate is named `gb_emu` and builds,
with **exactly**:
`cargo build --release --lib --target wasm32-unknown-unknown`
to `target/wasm32-unknown-unknown/release/gb_emu.wasm`, exporting the ABI specified in
`/task/spec/ABI.md`. Do **not** add a `.cargo/config.toml`, custom `RUSTFLAGS`, or anything
that changes the build command or the artifact path. The ABI is the only fixed contract;
every internal design decision (CPU, PPU, timing, memory mapping, MBCs, APU) is yours.
## What you are given (read-only)
- `/task/spec/ABI.md` — the exact WASM exports your module must provide.
- `/task/spec/pandocs.html` — Pan Docs, the community Game Boy hardware reference.
- `/task/spec/boot_rom.bin` — an open-source DMG boot ROM. Load it via `load_boot_rom`
so that your power-on behavior matches the reference.
- `/task/dev-roms/` — a handful of **homebrew** Game Boy ROMs you may use to self-test.
(These are for your own debugging; the grading ROMs and input recordings are different
and are held out from you.)
## Tools available to you
- A normal shell, `rustc`/`cargo` with the `wasm32-unknown-unknown` target, and `wasmtime`.
- `oracle` — a command-line client to a **reference Game Boy emulator** running as a remote
black box. You may run any ROM through it and observe its output (framebuffers, audio),
but you cannot see its source. Use it as ground truth:
  - batch: `oracle run <rom> <frames> [--keys <replay.txt>] [--dump-frames <dir>] [--dump-audio <wav>]`
  - session: `oracle session ...` exposing `set-keys` / `run-frame` / `framebuffer` /
    `audio`, shaped exactly like your own ABI, so you can diff your emulator against the
    reference one frame at a time and localize exactly where you diverge.
You have **no internet access**. Everything you need is on disk or behind `oracle`.
## How you will be graded (so you know what "good" means)
Offline, after you stop, your `gb_emu.wasm` is driven in lockstep against the reference and
scored on a composite in [0,1]:
`overall = 0.60 · replay + 0.20 · audio + 0.20 · procedural`
- **replay** (largest weight) — short recorded input sequences are played into both your
emulator and the reference; each frame is compared with a structural-similarity metric
that rewards "looks almost the same" rather than exact pixel equality. This measures
whether real games actually play correctly.
- **procedural** — standard self-checking Game Boy test ROMs, scored on their final state.
- **audio** — your audio output is compared to the reference in the spectral domain.
A perfect reference-vs-itself run scores 1.00. You will not be told the grading ROMs or
recordings; do not special-case anything.
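As a worked instance of the composite above, the following sketch computes `overall` from three sub-scores. The function name `overall_score` and the clamping of each input to [0, 1] are assumptions made for illustration; the text states only the weights.

```rust
/// Weighted composite from the grading description:
/// 0.60 replay, 0.20 audio, 0.20 procedural.
/// Clamping each input to [0, 1] is an assumption of this sketch.
pub fn overall_score(replay: f64, audio: f64, procedural: f64) -> f64 {
    let unit = |x: f64| x.clamp(0.0, 1.0);
    0.60 * unit(replay) + 0.20 * unit(audio) + 0.20 * unit(procedural)
}
```

With these weights, a perfect replay score with no audio and no passing test ROMs still yields 0.60, which is why rendering correctness dominates the result.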
## Suggested order of attack (not a constraint)
A working SM83 CPU first, then the memory bus and timers (DIV/TIMA), then the PPU
(background, window, sprites, the STAT/LY timing) so something renders, then joypad input,
then MBC1/MBC3 so larger ROMs load, then the APU for audio. Use `oracle` constantly to
localize divergences frame by frame.
--- spec/ABI.md ---
# Candidate ABI — Game Boy (DMG) emulator, `wasm32-unknown-unknown`
This is the **only** fixed contract. The internal design of the emulator is entirely up
to the candidate; the grader interacts with it exclusively through the exports below.
## Artifact
- A Rust **cargo** project whose library crate is named **`gb_emu`** with
`crate-type = ["cdylib"]`.
- Built with **exactly**:
```
cargo build --release --lib --target wasm32-unknown-unknown
```
- Producing the artifact at the **fixed path**:
```
target/wasm32-unknown-unknown/release/gb_emu.wasm
```
- **No** `.cargo/config.toml`, **no** custom `RUSTFLAGS`, no build scripts that alter the
target or linker. The module must export its linear memory as `memory` (the default for
a `wasm32-unknown-unknown` cdylib).
## Determinism (load-bearing)
The Game Boy has no entropy source. Given the same boot ROM, cartridge ROM, and input
sequence, the emulator **must** produce a bit-identical sequence of framebuffers and audio
on every run. No wall-clock, no time-seeded RNG, no host I/O.
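One way to self-check this property is to hash every framebuffer of a run and compare the digests of two runs with identical inputs. The sketch below uses 64-bit FNV-1a purely for convenience; the helper names `fnv1a64_update`, `fnv1a64`, and `run_digest` are hypothetical, and nothing in this spec says the grader compares hashes.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Continue a 64-bit FNV-1a hash from `state` over `bytes`.
pub fn fnv1a64_update(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// 64-bit FNV-1a of a single buffer.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    fnv1a64_update(FNV_OFFSET, bytes)
}

/// Fold a whole run (one byte buffer per frame) into a single digest.
/// Two runs with the same inputs should produce the same value.
pub fn run_digest<'a, I>(frames: I) -> u64
where
    I: IntoIterator<Item = &'a [u8]>,
{
    frames
        .into_iter()
        .fold(FNV_OFFSET, |h, frame| fnv1a64_update(h, frame))
}
```

A digest mismatch between two otherwise identical runs points at hidden state, such as an uninitialized buffer or state that `reset()` does not clear.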
## Memory & data exchange
The host reads results directly from the module's exported linear `memory`. Buffers
returned by `framebuffer()` / `audio()` must remain valid and stable until the next call
to `run_frame()`.
All exports use the C ABI (`#[no_mangle] pub extern "C"`). Pointers are `u32` byte offsets
into linear memory. Integers are little-endian.
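A minimal sketch of one way to keep a returned buffer stable: allocate the frame once and never resize it, so the address stays fixed across frames. The type `FrameBuffer`, its methods, and the grey ramp are hypothetical; the 160x144 RGBA8888 layout is the one the Exports section specifies for `framebuffer()`.

```rust
pub const WIDTH: usize = 160;
pub const HEIGHT: usize = 144;
pub const FRAME_BYTES: usize = WIDTH * HEIGHT * 4; // 92160

/// Owns one completed frame. The heap allocation is made once and never
/// resized, so the address handed to the host does not move between frames.
pub struct FrameBuffer {
    pixels: Box<[u8]>,
}

impl FrameBuffer {
    pub fn new() -> Self {
        FrameBuffer { pixels: vec![0u8; FRAME_BYTES].into_boxed_slice() }
    }

    /// Write one pixel from a 2-bit DMG shade (0 = lightest, 3 = darkest).
    /// The grey ramp is an arbitrary choice; alpha is fixed at 255.
    pub fn put_shade(&mut self, x: usize, y: usize, shade: u8) {
        const GREY: [u8; 4] = [0xFF, 0xAA, 0x55, 0x00];
        let g = GREY[usize::from(shade & 3)];
        let i = (y * WIDTH + x) * 4; // row-major from top-left
        self.pixels[i..i + 4].copy_from_slice(&[g, g, g, 0xFF]);
    }

    /// Address of the first byte; on wasm32 this is the offset into `memory`.
    pub fn as_ptr(&self) -> *const u8 {
        self.pixels.as_ptr()
    }

    pub fn bytes(&self) -> &[u8] {
        &self.pixels
    }
}
```

On `wasm32-unknown-unknown` the exported `framebuffer()` could return `as_ptr() as u32`; that cast is omitted here so the sketch also compiles on a 64-bit host.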
## Exports
### Allocation
```
alloc(size: u32) -> u32 // allocate `size` bytes, return offset (ptr); 0 on failure
dealloc(ptr: u32, size: u32) // free a prior allocation
```
Used by the host to hand ROM bytes to the module: `alloc` a buffer, the host writes bytes
into `memory` at that offset, then calls `load_rom` / `load_boot_rom`.
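The allocation pair can be sketched on top of `std::alloc`. The helper names `buf_alloc` and `buf_dealloc`, the 8-byte alignment, and the choice to treat a zero-size request as a failure are assumptions of this sketch, not requirements of the ABI.

```rust
use std::alloc::{alloc as raw_alloc, dealloc as raw_dealloc, Layout};

/// Alignment used for every host buffer (an assumption of this sketch).
const ALIGN: usize = 8;

/// Allocate `size` bytes. Returns null on failure or when `size` is 0;
/// on wasm32 a null pointer is offset 0, matching "0 on failure".
pub fn buf_alloc(size: usize) -> *mut u8 {
    if size == 0 {
        return std::ptr::null_mut();
    }
    match Layout::from_size_align(size, ALIGN) {
        // SAFETY: the layout has a non-zero size.
        Ok(layout) => unsafe { raw_alloc(layout) },
        Err(_) => std::ptr::null_mut(),
    }
}

/// Free a buffer obtained from `buf_alloc` with the same `size`.
pub fn buf_dealloc(ptr: *mut u8, size: usize) {
    if ptr.is_null() || size == 0 {
        return;
    }
    if let Ok(layout) = Layout::from_size_align(size, ALIGN) {
        // SAFETY: the caller passes a pointer and size from `buf_alloc`.
        unsafe { raw_dealloc(ptr, layout) };
    }
}
```

The exported `alloc` and `dealloc` would then be thin wrappers that cast between these pointers and `u32` offsets; the wrappers are left out so the sketch also runs on a 64-bit host.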
### Lifecycle
```
init() // construct/initialize the emulator instance (call once, first)
load_boot_rom(ptr: u32, len: u32) // copy the DMG boot ROM (256 bytes). Optional to call, but needed
// for faithful lockstep with the oracle.
load_rom(ptr: u32, len: u32) // copy the cartridge ROM image
reset() // power-cycle. If a boot ROM was loaded, execution begins in it
// (so the boot animation is reproduced); otherwise begin at
// 0x0100 with the canonical post-boot register/IO state.
```
Call order for a run: `init()` → `load_boot_rom(...)` → `load_rom(...)` → `reset()`.
### Per-frame stepping (lockstep)
```
set_keys(mask: u32) // current joypad state; 1 = pressed. Bit layout:
// bit0=A bit1=B bit2=Select bit3=Start
// bit4=Right bit5=Left bit6=Up bit7=Down
run_frame() // advance until exactly one video frame has been produced
// (one DMG frame = 70224 T-cycles; return at VBlank onset)
framebuffer() -> u32 // ptr to the completed frame: 160x144 pixels, RGBA8888,
// 4 bytes/pixel, row-major from top-left = 92160 bytes.
// Alpha is ignored by the grader (set 255). The candidate may
// use any DMG palette; the grader color-normalizes before SSIM.
audio() -> u32 // ptr to audio generated during the last run_frame:
audio_len() -> u32 // number of i16 samples (interleaved stereo L,R) in that buffer.
// Sample format: signed 16-bit, 48000 Hz, stereo interleaved.
// (Audio grading is post-M4; until implemented a candidate may
// return audio_len()==0, but the exports must exist.)
```
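Two pieces of arithmetic implied by the comments above can be sketched as follows. `p1_read` maps the `set_keys` mask onto the joypad register as Pan Docs describes it (a pressed key reads as 0, and bits 4 and 5 select the key group when cleared). `SampleClock` shows how many 48000 Hz stereo pairs fall into one 70224-cycle frame at the DMG clock of 4194304 Hz. Both names are hypothetical, and whether the reference emulator yields the same per-frame sample counts is not stated in this spec.

```rust
/// Value read back from the joypad register P1 (0xFF00).
/// `select` is the last value written to P1; `keys` is the ABI mask.
pub fn p1_read(select: u8, keys: u32) -> u8 {
    let mut low = 0x0Fu8; // 1 = released
    if select & 0x10 == 0 {
        low &= !((keys >> 4) as u8 & 0x0F); // Right, Left, Up, Down
    }
    if select & 0x20 == 0 {
        low &= !(keys as u8 & 0x0F); // A, B, Select, Start
    }
    0xC0 | (select & 0x30) | low // unused bits 6-7 read as 1
}

pub const CPU_HZ: u64 = 4_194_304; // DMG T-cycles per second
pub const SAMPLE_HZ: u64 = 48_000;
pub const CYCLES_PER_FRAME: u64 = 70_224;

/// Carries the fractional sample position from frame to frame so that
/// rounding never accumulates into drift.
pub struct SampleClock {
    acc: u64,
}

impl SampleClock {
    pub fn new() -> Self {
        SampleClock { acc: 0 }
    }

    /// Number of stereo (L, R) pairs that become due after `cycles` T-cycles.
    pub fn advance(&mut self, cycles: u64) -> u64 {
        self.acc += cycles * SAMPLE_HZ;
        let pairs = self.acc / CPU_HZ;
        self.acc %= CPU_HZ;
        pairs
    }
}
```

Under these assumptions the first two frames are due 803 and then 804 pairs, which would correspond to `audio_len()` values of 1606 and 1608, because `audio_len()` counts individual `i16` values rather than pairs.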
### Optional debug exports (used only by the internal CPU smoke-test, never by replay grading)
```
cpu_pc() -> u32 // current PC (optional)
cycles() -> u64 // total T-cycles elapsed since reset (optional)
```
## Frame-boundary convention
One `run_frame()` call == one produced video frame. The grader aligns candidate frame *N*
to oracle frame *N*, starting from `reset()`. Frames produced during the boot animation are
included in the sequence. The oracle (SameBoy) is driven through an identically-shaped
session API so that frame *N* means the same thing on both sides.
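Because frame *N* lines up on both sides, a debugging aid can search two recorded runs for the first index at which they differ. The helpers below are hypothetical and compare bytes exactly, which is stricter than the grader (it color-normalizes and uses a similarity metric), so a palette difference alone would register as a divergence here.

```rust
/// Index of the first frame at which two runs differ, or `None` if they are
/// identical. A length mismatch counts as a divergence at the shorter length.
pub fn first_divergence(candidate: &[Vec<u8>], oracle: &[Vec<u8>]) -> Option<usize> {
    let common = candidate.len().min(oracle.len());
    (0..common)
        .find(|&n| candidate[n] != oracle[n])
        .or(if candidate.len() != oracle.len() {
            Some(common)
        } else {
            None
        })
}

/// Number of pixels whose RGB bytes differ between two RGBA frames.
/// Alpha is skipped, since the grader ignores it.
pub fn differing_pixels(a: &[u8], b: &[u8]) -> usize {
    a.chunks_exact(4)
        .zip(b.chunks_exact(4))
        .filter(|(p, q)| p[..3] != q[..3])
        .count()
}
```

Once the first diverging frame is known, `differing_pixels` gives a rough sense of whether the error is a single glitch or a whole-screen failure.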
## Reference skeleton (signatures only — internal design is yours)
```rust
// lib.rs — crate `gb_emu`, crate-type = ["cdylib"]
#[no_mangle] pub extern "C" fn alloc(size: u32) -> u32 { /* ... */ 0 }
#[no_mangle] pub extern "C" fn dealloc(ptr: u32, size: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn init() { /* ... */ }
#[no_mangle] pub extern "C" fn load_boot_rom(ptr: u32, len: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn load_rom(ptr: u32, len: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn reset() { /* ... */ }
#[no_mangle] pub extern "C" fn set_keys(mask: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn run_frame() { /* ... */ }
#[no_mangle] pub extern "C" fn framebuffer() -> u32 { /* ... */ 0 }
#[no_mangle] pub extern "C" fn audio() -> u32 { /* ... */ 0 }
#[no_mangle] pub extern "C" fn audio_len() -> u32 { /* ... */ 0 }
```
--- HOW TO RESPOND ---
The cargo project and Cargo.toml already exist and MUST NOT change (package `gb_emu`,
crate-type cdylib, NO external crates — std only). You implement ONLY `src/lib.rs`.
Reply with the COMPLETE contents of `src/lib.rs` in a single ```rust code block and nothing
else. Every export in spec/ABI.md must be present.