System prompt

This is the exact instruction the agent receives, verbatim. It is identical for every model and provider, so the only variable is the model itself. The prompt is assembled in harness/generate.py from three parts: the task, the ABI contract, and a short response instruction.

# TASK — Implement a Game Boy (DMG) emulator from scratch

You are a senior systems engineer. Your task is to implement an accurate emulator of the
original Game Boy (model **DMG**) in Rust, compiled to WebAssembly.

This is a **long-horizon** task. There is no deadline and no reward for finishing quickly.
Work carefully and incrementally; prefer correctness over coverage. The task does not end —
keep improving accuracy for as long as you are running. If you believe you are "done",
pick the next most-impactful inaccuracy and fix it.

## What you must produce

A Rust cargo project rooted at `/task/` whose library crate is named `gb_emu` and builds,
with **exactly**:

    cargo build --release --lib --target wasm32-unknown-unknown

to `target/wasm32-unknown-unknown/release/gb_emu.wasm`, exporting the ABI specified in
`/task/spec/ABI.md`. Do **not** add a `.cargo/config.toml`, custom `RUSTFLAGS`, or anything
that changes the build command or the artifact path. The ABI is the only fixed contract;
every internal design decision (CPU, PPU, timing, memory mapping, MBCs, APU) is yours.

## What you are given (read-only)

- `/task/spec/ABI.md` — the exact WASM exports your module must provide.
- `/task/spec/pandocs.html` — Pan Docs, the community Game Boy hardware reference.
- `/task/spec/boot_rom.bin` — an open-source DMG boot ROM. Load it via `load_boot_rom`
  so that your power-on behavior matches the reference.
- `/task/dev-roms/` — a handful of **homebrew** Game Boy ROMs you may use to self-test.
  (These are for your own debugging; the grading ROMs and input recordings are different
  and are held out from you.)

## Tools available to you

- A normal shell, `rustc`/`cargo` with the `wasm32-unknown-unknown` target, and `wasmtime`.
- `oracle` — a command-line client to a **reference Game Boy emulator** running as a remote
  black box. You may run any ROM through it and observe its output (framebuffers, audio),
  but you cannot see its source. Use it as ground truth:
    - batch:   `oracle run <rom> <frames> [--keys <replay.txt>] [--dump-frames <dir>] [--dump-audio <wav>]`
    - session: `oracle session ...` exposing `set-keys` / `run-frame` / `framebuffer` /
      `audio`, shaped exactly like your own ABI, so you can diff your emulator against the
      reference one frame at a time and localize exactly where you diverge.

You have **no internet access**. Everything you need is on disk or behind `oracle`.

## How you will be graded (so you know what "good" means)

Offline, after you stop, your `gb_emu.wasm` is driven in lockstep against the reference and
scored on a composite in [0,1]:

    overall = 0.60 · replay  +  0.20 · audio  +  0.20 · procedural

- **replay** (largest weight) — short recorded input sequences are played into both your
  emulator and the reference; each frame is compared with a structural-similarity metric
  that rewards "looks almost the same" rather than exact pixel equality. This measures
  whether real games actually play correctly.
- **procedural** — standard self-checking Game Boy test ROMs, scored on their final state.
- **audio** — your audio output is compared to the reference in the spectral domain.

A perfect reference-vs-itself run scores 1.00. You will not be told the grading ROMs or
recordings; do not special-case anything.
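
As a quick illustration of the weighting above, the composite can be written as a plain function. This is only a sketch of the stated formula: the function name `overall` and the assumption that each component score already lies in [0,1] are illustrative, and nothing here reflects the grader's actual code.

```rust
/// Weighted composite from the formula above.
/// Assumes each component score is already in [0, 1] (an assumption, not
/// something the grader's implementation is known to enforce).
fn overall(replay: f64, audio: f64, procedural: f64) -> f64 {
    0.60 * replay + 0.20 * audio + 0.20 * procedural
}
```

With these weights, an emulator that renders games perfectly but produces no audio and passes no test ROMs would still reach 0.60, which is why the replay component dominates the suggested priorities.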

## Suggested order of attack (not a constraint)

A working SM83 CPU first, then the memory bus and timers (DIV/TIMA), then the PPU
(background, window, sprites, the STAT/LY timing) so something renders, then joypad input,
then MBC1/MBC3 so larger ROMs load, then the APU for audio. Use `oracle` constantly to
localize divergences frame by frame.


--- spec/ABI.md ---
# Candidate ABI — Game Boy (DMG) emulator, `wasm32-unknown-unknown`

This is the **only** fixed contract. The internal design of the emulator is entirely up
to the candidate; the grader interacts with it exclusively through the exports below.

## Artifact

- A Rust **cargo** project whose library crate is named **`gb_emu`** with
  `crate-type = ["cdylib"]`.
- Built with **exactly**:
  ```
  cargo build --release --lib --target wasm32-unknown-unknown
  ```
- Producing the artifact at the **fixed path**:
  ```
  target/wasm32-unknown-unknown/release/gb_emu.wasm
  ```
- **No** `.cargo/config.toml`, **no** custom `RUSTFLAGS`, no build scripts that alter the
  target or linker. The module must export its linear memory as `memory` (the default for
  a `wasm32-unknown-unknown` cdylib).

## Determinism (load-bearing)

The Game Boy has no entropy source. Given the same boot ROM, cartridge ROM, and input
sequence, the emulator **must** produce a bit-identical sequence of framebuffers and audio
on every run. No wall-clock, no time-seeded RNG, no host I/O.
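
One place where determinism is easy to lose is audio sample timing, if it is derived from floating-point time. The sketch below shows one possible integer-only approach: an accumulator that decides how many 48000 Hz stereo sample pairs are due after a given number of T-cycles. The `SampleClock` type and its method names are hypothetical, and the 4,194,304 Hz master clock is the commonly documented DMG value rather than something this ABI states.

```rust
/// DMG master clock in Hz (commonly documented value; not stated in this ABI).
const CPU_HZ: u64 = 4_194_304;
/// Output sample rate required by the ABI.
const SAMPLE_HZ: u64 = 48_000;
/// T-cycles per video frame, as given in the ABI.
const FRAME_CYCLES: u64 = 70_224;

/// Hypothetical integer accumulator: no floats, so results are identical on every run.
struct SampleClock {
    acc: u64, // remainder carried between calls, always < CPU_HZ
}

impl SampleClock {
    fn new() -> Self {
        SampleClock { acc: 0 }
    }

    /// Advance by `cycles` T-cycles and return how many stereo sample
    /// pairs (one L and one R value each) are now due.
    fn advance(&mut self, cycles: u64) -> u64 {
        self.acc += cycles * SAMPLE_HZ;
        let due = self.acc / CPU_HZ;
        self.acc %= CPU_HZ;
        due
    }
}
```

Because 70224 T-cycles do not divide evenly into 48000 Hz samples, the count per frame varies slightly from frame to frame; carrying the remainder in `acc` keeps the long-run rate exact without any host-dependent rounding.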

## Memory & data exchange

The host reads results directly from the module's exported linear `memory`. Buffers
returned by `framebuffer()` / `audio()` must remain valid and stable until the next call
to `run_frame()`.

All exports use the C ABI (`#[no_mangle] pub extern "C"`). Pointers are `u32` byte offsets
into linear memory. Integers are little-endian.
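
A minimal sketch of a frame buffer that satisfies the stability requirement above: the backing storage is allocated once and never resized, so its address does not change between calls. The `FrameBuffer` type, the `put_shade` helper, and the grayscale palette are illustrative assumptions; the ABI only fixes the 160x144 RGBA8888 row-major layout.

```rust
const WIDTH: usize = 160;
const HEIGHT: usize = 144;
const FRAME_BYTES: usize = WIDTH * HEIGHT * 4; // 92160 bytes, RGBA8888

/// Hypothetical frame buffer: allocated once, never grown or shrunk,
/// so the pointer handed to the host stays valid.
struct FrameBuffer {
    pixels: Vec<u8>,
}

impl FrameBuffer {
    fn new() -> Self {
        FrameBuffer { pixels: vec![0xFF; FRAME_BYTES] }
    }

    /// Write one pixel given a 2-bit DMG shade (0 = lightest, 3 = darkest).
    /// The grayscale values are an arbitrary choice; any DMG palette is allowed.
    fn put_shade(&mut self, x: usize, y: usize, shade: u8) {
        const GRAY: [u8; 4] = [0xFF, 0xAA, 0x55, 0x00];
        let g = GRAY[(shade & 3) as usize];
        let off = (y * WIDTH + x) * 4; // row-major from top-left
        self.pixels[off..off + 4].copy_from_slice(&[g, g, g, 0xFF]);
    }

    /// Address of the first byte; on wasm32 this would be cast to `u32`
    /// and returned from `framebuffer()`.
    fn as_ptr(&self) -> *const u8 {
        self.pixels.as_ptr()
    }
}
```

Writing in place with `copy_from_slice` never reallocates the `Vec`, which is what keeps the offset stable across frames.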

## Exports

### Allocation
```
alloc(size: u32) -> u32        // allocate `size` bytes, return offset (ptr); 0 on failure
dealloc(ptr: u32, size: u32)   // free a prior allocation
```
Used by the host to hand ROM bytes to the module: `alloc` a buffer, the host writes bytes
into `memory` at that offset, then calls `load_rom` / `load_boot_rom`.
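
One possible way to back these two exports, using only `std`, is the global allocator with a fixed alignment. This is a sketch under stated assumptions: the helper names `alloc_bytes` / `dealloc_bytes` and the 8-byte alignment are illustrative choices, and the `u32` wrappers are gated to `wasm32`, where pointers are 32 bits wide.

```rust
use std::alloc::{alloc as raw_alloc, dealloc as raw_dealloc, Layout};

/// Alignment used for every host-visible buffer (an arbitrary choice).
const ALIGN: usize = 8;

/// Allocate `size` bytes; returns null for size 0 or on failure.
fn alloc_bytes(size: usize) -> *mut u8 {
    if size == 0 {
        return std::ptr::null_mut();
    }
    match Layout::from_size_align(size, ALIGN) {
        // SAFETY: the layout has a non-zero size.
        Ok(layout) => unsafe { raw_alloc(layout) },
        Err(_) => std::ptr::null_mut(),
    }
}

/// Free a buffer previously returned by `alloc_bytes` with the same `size`.
fn dealloc_bytes(ptr: *mut u8, size: usize) {
    if ptr.is_null() || size == 0 {
        return;
    }
    if let Ok(layout) = Layout::from_size_align(size, ALIGN) {
        // SAFETY: caller promises `ptr` came from `alloc_bytes(size)`.
        unsafe { raw_dealloc(ptr, layout) }
    }
}

// ABI-facing wrappers: on wasm32 a pointer is a 32-bit offset into `memory`.
#[cfg(target_arch = "wasm32")]
#[no_mangle]
pub extern "C" fn alloc(size: u32) -> u32 {
    alloc_bytes(size as usize) as u32 // null maps to 0, the failure value
}

#[cfg(target_arch = "wasm32")]
#[no_mangle]
pub extern "C" fn dealloc(ptr: u32, size: u32) {
    dealloc_bytes(ptr as usize as *mut u8, size as usize)
}
```

Returning null (offset 0) for a zero-sized request is a design choice of this sketch; the ABI only says that 0 signals failure.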

### Lifecycle
```
init()                         // construct/initialize the emulator instance (call once, first)
load_boot_rom(ptr: u32, len: u32)  // copy the DMG boot ROM (256 bytes). May be skipped, but is
                                   //   needed for faithful lockstep with the oracle.
load_rom(ptr: u32, len: u32)   // copy the cartridge ROM image
reset()                        // power-cycle. If a boot ROM was loaded, execution begins in it
                               //   (so the boot animation is reproduced); otherwise begin at
                               //   0x0100 with the canonical post-boot register/IO state.
```
Call order for a run: `init()` → `load_boot_rom(...)` → `load_rom(...)` → `reset()`.

### Per-frame stepping (lockstep)
```
set_keys(mask: u32)            // current joypad state; 1 = pressed. Bit layout:
                               //   bit0=A  bit1=B  bit2=Select  bit3=Start
                               //   bit4=Right bit5=Left bit6=Up bit7=Down
run_frame()                    // advance until exactly one video frame has been produced
                               //   (one DMG frame = 70224 T-cycles; return at VBlank onset)
framebuffer() -> u32           // ptr to the completed frame: 160x144 pixels, RGBA8888,
                               //   4 bytes/pixel, row-major from top-left = 92160 bytes.
                               //   Alpha is ignored by the grader (set 255). The candidate may
                               //   use any DMG palette; the grader color-normalizes before SSIM.
audio() -> u32                 // ptr to the audio generated during the last run_frame.
audio_len() -> u32             // number of i16 samples (interleaved stereo L,R) in that buffer.
                               //   Sample format: signed 16-bit, 48000 Hz, stereo interleaved.
                               //   (Audio grading is post-M4; until implemented a candidate may
                               //    return audio_len()==0, but the exports must exist.)
```
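
The `set_keys` bit layout lines up with the two halves of the DMG joypad register (JOYP, `0xFF00`) as described in Pan Docs: the low nibble of the mask holds the action buttons and the high nibble holds the d-pad, in the same bit order the hardware reports them. A minimal sketch of the register read follows; the function name `joyp_read` and its parameters are hypothetical, and how the last written select bits are stored is left to the implementation.

```rust
/// Value read back from JOYP (0xFF00).
/// `mask`   : the ABI `set_keys` mask (1 = pressed).
/// `select` : the last byte the game wrote to JOYP; only bits 4 and 5 matter.
///            Bit 5 = 0 selects the action buttons, bit 4 = 0 selects the d-pad.
fn joyp_read(mask: u32, select: u8) -> u8 {
    let buttons = (mask & 0x0F) as u8; // bit0=A, bit1=B, bit2=Select, bit3=Start
    let dpad = ((mask >> 4) & 0x0F) as u8; // bit0=Right, bit1=Left, bit2=Up, bit3=Down

    let mut pressed = 0u8;
    if select & 0x20 == 0 {
        pressed |= buttons;
    }
    if select & 0x10 == 0 {
        pressed |= dpad;
    }

    // Hardware reports 0 for a pressed key; the two unused top bits read as 1.
    0xC0 | (select & 0x30) | (!pressed & 0x0F)
}
```

For example, with only A held and the action buttons selected (`select = 0x10`), the low nibble reads `0xE`; with neither group selected, it reads `0xF` regardless of the mask.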

### Optional debug exports (used only by the internal CPU smoke-test, never by replay grading)
```
cpu_pc() -> u32                // current PC (optional)
cycles() -> u64                // total T-cycles elapsed since reset (optional)
```

## Frame-boundary convention

One `run_frame()` call == one produced video frame. The grader aligns candidate frame *N*
to oracle frame *N*, starting from `reset()`. Frames produced during the boot animation are
included in the sequence. The oracle (SameBoy) is driven through an identically-shaped
session API so that frame *N* means the same thing on both sides.

## Reference skeleton (signatures only — internal design is yours)

```rust
// lib.rs — crate `gb_emu`, crate-type = ["cdylib"]
#[no_mangle] pub extern "C" fn alloc(size: u32) -> u32 { /* ... */ 0 }
#[no_mangle] pub extern "C" fn dealloc(ptr: u32, size: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn init() { /* ... */ }
#[no_mangle] pub extern "C" fn load_boot_rom(ptr: u32, len: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn load_rom(ptr: u32, len: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn reset() { /* ... */ }
#[no_mangle] pub extern "C" fn set_keys(mask: u32) { /* ... */ }
#[no_mangle] pub extern "C" fn run_frame() { /* ... */ }
#[no_mangle] pub extern "C" fn framebuffer() -> u32 { /* ... */ 0 }
#[no_mangle] pub extern "C" fn audio() -> u32 { /* ... */ 0 }
#[no_mangle] pub extern "C" fn audio_len() -> u32 { /* ... */ 0 }
```


--- HOW TO RESPOND ---
The cargo project and Cargo.toml already exist and MUST NOT change (package `gb_emu`,
crate-type cdylib, NO external crates — std only). You implement ONLY `src/lib.rs`.
Reply with the COMPLETE contents of `src/lib.rs` in a single ```rust code block and nothing
else. Every export in spec/ABI.md must be present.