Hacking Super Mario 64

Stephen Schroeder
9 min read · Feb 1, 2018

The first video game console I fell in love with was the N64, and at the time I was blown away by Super Mario 64. One of the first things I can remember designing as a kid, racing home for dinner from my friend’s house, was a set of ideas for custom levels for the game, scribbled down on paper that probably remains in a box somewhere in my parents’ house to this day. I was dimly aware that people made brutal ROM hacks of the older 2D Mario games, but it was only recently that I decided to see if there was a community for hacking SM64.

Perhaps the most famous SM64 ROM hack, “Star Road”

As it turned out, there are a surprising number of full-length SM64 hacks, some even featuring over 150 stars! And it’s not all editing 1s and 0s: there’s a slew of custom tools made specifically for this game, covering object placement, importing level geometry, and editing text.

Toad’s Tool 64

However, one thing most hacks have in common is the lack of custom code. While they have unique star counts, level geometry, and textures, for the most part they all share the same moveset, the same cap powerups, and the same set of objects found in the original game. The most notable exception was Super Mario 64: Last Impact, which managed to recreate every powerup from Super Mario Galaxy 1 and 2, including a rideable Yoshi, using assembly code injected into the ROM file. Clearly the information was out there to do anything. So, I decided to try creating a parkour wall running mechanic for Mario!

The Origami64 Wiki

The two primary resources for injecting your custom code are the documentation (largely from the Origami64 wiki) and a trusty debugger of your choice (I’d recommend the Project 64 debugger mod by Shygoo). The human-readable language this debugger provides is assembly (ASM). I’ll cover some ASM briefly, but for a more thorough explanation check out this material on Hexadecimal, ASM, and SM64-specific commands.

The Project 64 Debugger by Shygoo

The right side of the debugger shows a line-by-line view of each ASM command and its arguments, as disassembled by the debugger. ASM takes the form of a series of cryptic abbreviated commands, followed by a combination of two-character alphanumeric register names and hexadecimal immediate values. Hexadecimal is a base-16 number system that is useful in computing because a single digit represents exactly 4 of the computer’s binary ones and zeros (2⁴ = 16). Since conventional decimal runs out of digits at 9, hex continues with English letters. In hex: 9=9, A=10, B=11, F=15, and 10=16. Here’s a code example:

BEQZ T3, 0x80370454

The command BEQZ stands for “Branch on EQual to Zero”. It’s part of a family of commands called branch commands that examine some condition and jump to somewhere else in the code if it’s true; BEQZ does this if its argument is zero. So what is its argument? Why, it’s the register T3, which is a place where the computer can store a temporary value for math and logic operations. T3 can be seen on the list of registers on the right of the debugger: its value in the image is 00000000 801A6BC0, a hexadecimal number. Finally, the last argument of a BEQZ command, a hexadecimal immediate, represents the location in code to jump to should the condition be met. So, if T3 = 00000000 801A6BC0, which is not zero, BEQZ T3 will not jump to its immediate location, and execution continues with the next instruction as normal. A more complete list of ASM commands can be found here.
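To make the hexadecimal and the branch decision concrete, here is a minimal Python sketch. It is not N64 code: the function name `beqz_taken` is made up for illustration, and it only models the yes/no decision BEQZ makes, nothing else the CPU does.

```python
# Toy model of the BEQZ decision. Names are illustrative only and are not
# part of any SM64 tool or of the debugger.

def beqz_taken(register_value: int) -> bool:
    """BEQZ branches only when the register holds zero."""
    return register_value == 0

# The 64-bit value the debugger shows for T3, written as one hex literal.
t3 = 0x00000000_801A6BC0

# Each hex digit covers exactly four binary digits.
assert int("F", 16) == 0b1111 == 15
assert int("10", 16) == 16

print(hex(t3), "=", t3)      # the same number in hex and in decimal
print(beqz_taken(t3))        # T3 is nonzero, so the branch is not taken
print(beqz_taken(0))         # a zero register would take the branch
```

The register value is the only input that matters to the decision; the branch target is just where execution goes when the answer is yes.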

Another related command is BNEZ. This command does the opposite of BEQZ: BNEZ stands for “Branch on Not Equal to Zero”. As you might expect, we can swap one for the other to invert a test. One interesting place to apply this is the line of code at 80256498:

BNEZ AT, 0x802564BC

which branches on perpendicular wall collision; it’s the part of the code that gives the player the opportunity to walljump on perpendicular walls. So, if we swap it out for a BEQZ we should be able to walljump on parallel walls!

Take that, physics!
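As a rough illustration of how small that swap is, the sketch below encodes both versions of the instruction as 32-bit MIPS words. It assumes the debugger’s BNEZ and BEQZ are shorthand for the standard BNE and BEQ instructions compared against the zero register; the helper name `encode_branch` is hypothetical. The article does not give the ROM offset where the word lives, so the sketch only computes the word itself.

```python
# Sketch: encode the original and swapped branch as 32-bit MIPS words.
# Assumes BNEZ/BEQZ are shorthand for BNE/BEQ compared against $zero.

OPCODE_BEQ = 0x04
OPCODE_BNE = 0x05
REG_ZERO = 0
REG_AT = 1  # $at is register number 1

def encode_branch(opcode: int, rs: int, rt: int, pc: int, target: int) -> int:
    """Build an I-type branch word; the offset counts instructions from pc + 4."""
    offset = (target - (pc + 4)) // 4
    return (opcode << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)

pc, target = 0x80256498, 0x802564BC
original = encode_branch(OPCODE_BNE, REG_AT, REG_ZERO, pc, target)
swapped = encode_branch(OPCODE_BEQ, REG_AT, REG_ZERO, pc, target)

print(f"{original:08X}")  # BNEZ AT, 0x802564BC
print(f"{swapped:08X}")   # BEQZ AT, 0x802564BC
```

Under these assumptions the two words differ only in the opcode field, so the swap changes a single byte of the instruction.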

Now, let’s see what injecting a custom function looks like. We’ll be looking at an example that overrides a debug function (it does nothing, but the game calls it anyway) so that the player can override any state with jumping by pressing L:

// [ARMIPS 0.9+] Moon-Jump Example by Davideesk

// When the player holds the L button, Mario will float up into the air

.defineLabel BTN_L, 0x20
.defineLabel MARIO_STRUCT, 0x8033B170

.orga 0x861C0 ; Set ROM address, we are overwriting a useless loop function as our hook.
.area 0xA4 ; Set data import limit to 0xA4 bytes
addiu sp, sp, -0x18
sw ra, 0x14 (sp)

// Tests if the player is holding down the L button.
.f_testInput BUTTON_L, BUTTON_PRESSED, proc802CB1C0_end

li t0, MARIO_STRUCT
li t1, ACTION_JUMP
sw t1, 0x0C(t0) ; Set mario’s action to jumping.
li t1, 30.0
mtc1 t1, f2
swc1 f2, 0x4C(t0) ; Set mario’s y-speed to be 30.0

proc802CB1C0_end:
lw ra, 0x14 (sp)
jr ra
addiu sp, sp, 0x18
.endarea

Let’s start with the bread and butter of functions:

.orga 0x861C0 ; Set ROM address, we are overwriting a useless loop function as our hook.
.area 0xA4 ; Set data import limit to 0xA4 bytes
addiu sp, sp, -0x18
sw ra, 0x14 (sp)

Functions in ASM pretty much all start the same way, with an addiu sp, sp, [negative immediate] command and a sw ra, [immediate](sp). Since this is ASM and everything has to be difficult, we must allocate our own memory on the stack for any values we want to preserve across function calls. We do this by moving the stack pointer down (the stack grows toward lower addresses, which is why the immediate is negative). The most important value to preserve is the return address, which allows the program to return to whichever function called the one we’re currently executing. When we use a command like sw ra, 0x14 (sp) that has a register in parentheses, we are using the immediate as an offset from that register, so this saves the return address 0x14 bytes (20 in base ten) after the stack pointer’s memory address. This way we’ll be able to retrieve it later. The “.area” directive is an instruction to the ARMIPS assembler rather than to the N64, so it doesn’t translate into actual code. It isn’t strictly necessary, but it helps prevent us from unknowingly overwriting key code and causing crashes, so it’s good practice. In this case the example tells us that it’s safe to overwrite the first 0xA4 bytes (164 in our familiar base 10), so the assembler will stop us from writing anything beyond that.
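The numbers in that prologue are easy to sanity-check. The short Python sketch below just does the arithmetic; the constant names are mine, and the only outside fact it relies on is that every MIPS instruction is 4 bytes long.

```python
# Arithmetic behind the prologue and the .area limit. Constant names are
# illustrative; the values come from the example above.

AREA_BYTES = 0xA4        # .area 0xA4
FRAME_BYTES = 0x18       # addiu sp, sp, -0x18
RA_OFFSET = 0x14         # sw ra, 0x14(sp)
INSTRUCTION_BYTES = 4    # every MIPS instruction is one 4-byte word
WORD_BYTES = 4           # ra is stored as a 4-byte word

max_instructions = AREA_BYTES // INSTRUCTION_BYTES

print(AREA_BYTES, "bytes of space")
print(max_instructions, "instructions fit in the area")
print(RA_OFFSET, "is the byte offset of the saved return address")

# The saved return address must fit inside the frame we reserved.
assert RA_OFFSET + WORD_BYTES <= FRAME_BYTES
```

Keep in mind that the number of source lines is not the number of instructions: a line like li with a full 32-bit value can assemble to two instructions, and macros can expand to several.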

The end of the function looks somewhat similar:

proc802CB1C0_end:
lw ra, 0x14 (sp)
jr ra
addiu sp, sp, 0x18
.endarea

“[locationName]:” defines a label, an assembler shortcut that gives us a human-readable location to jump to, so we can end the function early by jumping or branching to proc802CB1C0_end. Functions all end in largely the same way: lw ra, [immediate](sp) loads the return address we saved on the stack earlier, jr ra jumps to that address so we return to the parent function that called us, and an addiu sp, sp, [positive immediate] command frees up the stack memory we had been hoarding. The addiu comes after jr ra but still runs: on MIPS, the instruction immediately following a jump or branch (the “delay slot”) executes before the jump takes effect.
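Here is a toy Python model of the prologue and epilogue working together. The memory size, the starting stack pointer, and the return address are all made-up values; the point is only that whatever the prologue saves at 0x14(sp), the epilogue gets back, and the stack pointer ends up where it started.

```python
import struct

# Toy model of the prologue/epilogue. All addresses here are hypothetical.
memory = bytearray(0x100)   # stand-in for a small piece of stack memory
sp = 0x80                   # made-up stack pointer (an index into `memory`)
ra = 0x80123456             # made-up return address
sp_on_entry, ra_on_entry = sp, ra

# Prologue: addiu sp, sp, -0x18 ; sw ra, 0x14(sp)
sp -= 0x18
struct.pack_into(">I", memory, sp + 0x14, ra)   # N64 memory is big-endian

# Pretend the function body called something else, which overwrote ra.
ra = 0

# Epilogue: lw ra, 0x14(sp) ; jr ra ; addiu sp, sp, 0x18
(ra,) = struct.unpack_from(">I", memory, sp + 0x14)
sp += 0x18

print(hex(ra), hex(sp))
```

If the two immediates in the addiu commands did not match, `sp` would drift on every call, which is the kind of bug that tends to show up as a crash somewhere far away from the function that caused it.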

Now that that busy work’s out of the way, we can get to the meat of the function:

// Tests if the player is holding down the L button.
.f_testInput BUTTON_L, BUTTON_PRESSED, proc802CB1C0_end

li t0, MARIO_STRUCT
li t1, ACTION_JUMP
sw t1, 0x0C(t0) ; Set mario’s action to jumping.
li t1, 30.0
mtc1 t1, f2
swc1 f2, 0x4C(t0) ; Set mario’s y-speed to be 30.0

The first line is an ARMIPS compiler shortcut that allows us to test if the L button was pressed on this frame, and if not, jumps to the proc802CB1C0_end location we put at the end of our function. This acts as a branch that lets us skip the meat of the function if the player didn’t press L.

The li command stands for “Load Immediate”. The line “li t0, MARIO_STRUCT” loads the immediate defined at the top of the file into register t0. MARIO_STRUCT is the location in memory where Mario’s state is stored: this includes active powerups, his current action, and his speeds in various directions. What we want to do is set his current action and his y speed in order to jump. In order to do that, we’ll need a new family of store commands:

sw t1, 0x0C(t0) ; Set mario’s action to jumping.

This stands for “Store Word”, and in ASM a word is a 4 byte value. It takes the register in the first argument and saves whatever value it holds to the memory location given by the second argument. Since t0 currently holds Mario’s memory location, using 0x0C(t0) saves the value of ACTION_JUMP to a location 12 bytes into Mario’s memory, which is the location for his current action. We also need to save a positive value to Mario’s y speed in order to make him float. However, Mario’s speeds are stored in floating point format. Rather than storing, say, “6” as “6”, it’s stored as “0x40C00000”. So we’ll have to get our value into floating point format and then use a special floating point store command.

li t1, 30.0
mtc1 t1, f2
swc1 f2, 0x4C(t0) ; Set mario’s y-speed to be 30.0

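Once the value 30.0 is in t1, we use the mtc1 command (“Move To Coprocessor 1”) to copy it into the floating point register f2. Note that mtc1 does not convert anything: it copies the bits exactly as they are, so this relies on the assembler having encoded the literal 30.0 as its floating point bit pattern when it was loaded into t1. Once it’s in f2, swc1 lets us store it at an offset of 0x4C bytes from Mario’s memory position: this is where his y speed is stored.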
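The floating point bit patterns and the two stores can be checked with a short Python sketch. The `mario` buffer is a stand-in for the memory starting at MARIO_STRUCT, the helper `float_bits` is my own, and the numeric value used for ACTION_JUMP is an assumption on my part (the example gets the real constant from its helper library).

```python
import struct

# Assumed value for illustration; the example's library defines the real one.
ACTION_JUMP = 0x03000880

def float_bits(value: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern for `value`."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

print(f"0x{float_bits(6.0):08X}")    # the "6" example from the text
print(f"0x{float_bits(30.0):08X}")   # the y speed used in the function

# Stand-in for the bytes starting at MARIO_STRUCT (big-endian, like the N64).
mario = bytearray(0x100)
struct.pack_into(">I", mario, 0x0C, ACTION_JUMP)   # sw   t1, 0x0C(t0)
struct.pack_into(">f", mario, 0x4C, 30.0)          # swc1 f2, 0x4C(t0)

print(mario[0x4C:0x50].hex())        # raw bytes of the stored y speed
```

Reading the bytes back at 0x4C shows the same pattern `float_bits(30.0)` reports, which is what you would expect to see for Mario’s y speed in the debugger’s memory view after the function runs.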

After compiling this into a Super Mario 64 ROM, we have a new ability!

Well, this makes things easier.

Wonderful!

What I’ve been intent on recreating is this parkour move from Prince of Persia:

Wall running in Prince of Persia: Sands of Time

Our goal is to make a move that allows Mario to run along a wall for a short distance, and to jump off early if the player so chooses. Now, armed with a working knowledge of what all this means, we can start digging! The easiest way to find what we’re looking for is to start with something useful from the documentation: the Wall Collision Triangle at 0x8033B1D0, a reference to the wall geometry Mario is currently touching. Remember, we’re making a wall running function, so walls will be pretty important to us! So, we’ll set a breakpoint (see the center of the debugger window) for WRITE at 0x8033B1D0 to find all the places that care about our wall collision.
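A bit of address arithmetic ties this back to the moon-jump example. The sketch below uses only addresses and offsets that appear in this article; the conclusion that the wall reference lives inside the same structure as Mario’s action and y speed is my inference from the numbers, not something quoted from the documentation.

```python
# Address arithmetic using only values that appear in the article.
MARIO_STRUCT = 0x8033B170
WALL_TRIANGLE = 0x8033B1D0
ACTION_OFFSET = 0x0C
Y_SPEED_OFFSET = 0x4C

wall_offset = WALL_TRIANGLE - MARIO_STRUCT
action_address = MARIO_STRUCT + ACTION_OFFSET
y_speed_address = MARIO_STRUCT + Y_SPEED_OFFSET

print(hex(wall_offset))       # distance of the wall reference from MARIO_STRUCT
print(hex(action_address))    # where sw t1, 0x0C(t0) writes
print(hex(y_speed_address))   # where swc1 f2, 0x4C(t0) writes
```

If that inference holds, custom code that already has MARIO_STRUCT in a register could read the wall reference with an offset load, the same way the example writes the action and y speed.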

Functions are called with the JAL (“Jump And Link”) command, and this gives us most of what we need to know to start navigating through the code to find the region of interest. Step into JAL functions to see what they do, return to parent functions to see who called us, and watch what changes and when these functions are called to work out their purpose. This is how the perpendicular wall collision test above was found.

A wallrun will have a lot of small subtasks in it: keeping Mario aloft while he’s wall running, allowing him to jump off the wall into a jumping state, rotating him along the angle of the wall, keeping him adjacent to the wall while running, and determining what speeds he should have for all of these movements. In the end, it might look something like this!

Smoke shown while he can wall run, Sparkles while actually running.

That’s some serious air time!
