Prerequisites

The most common mistake newbies make is to decompile the whole EXE and try to understand everything from the start. But we are not trying to rewrite the whole engine from scratch, so there is no need to know everything. We only need the part that gets us what we want: the relevant part.

This tutorial is about locating the ASM code that helps us understand and work with data. Naturally, the reader is expected to recognize which kind of ASM code the game is executing.

  • PSX/PS2/PSP run MIPS code.

  • Saturn runs SH-2 code.

  • GBA/NDS run ARM code.

  • etc.

The reader is also expected to understand the 4 basic data types:

  • int

  • float

  • bool or bitflags

  • text string or binary AOB (array of byte)

And understand their fixed-size varieties:

  • int is read/parsed from a text file. Its size is the native register size of the operating system.

  • int16 is read from a binary file. It is an int in 16 bits, or 2 bytes.

  • int32 is read from a binary file. It is an int in 32 bits, or 4 bytes.

  • int64 is read from a binary file. It is an int in 64 bits, or 8 bytes.

When exploring the data with a hex editor, this is how they are represented as Array of Byte (AOB):

Endianness      int32 0x123456 as AOB
Big endian      00 12 34 56
Little endian   56 34 12 00
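The byte orders above are easy to reproduce with Python's standard struct module. This is a small sketch; the helper name to_aob is just for illustration.

```python
import struct

def to_aob(value, big_endian):
    """Format an int32 as a space-separated Array of Byte string."""
    # ">I" = big-endian unsigned 32-bit, "<I" = little-endian unsigned 32-bit
    fmt = ">I" if big_endian else "<I"
    return " ".join(f"{b:02x}" for b in struct.pack(fmt, value))

print(to_aob(0x123456, big_endian=True))   # 00 12 34 56
print(to_aob(0x123456, big_endian=False))  # 56 34 12 00
```

PSX/PS2/PSP and GBA/NDS are little endian while the Saturn is big endian, so the same int32 shows up as 56 34 12 00 in a PSX RAM dump and as 00 12 34 56 in a Saturn one.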

The nonsensical ASM jargon across different architectures:

  • Be able to distinguish doubleword (int) from double (float).

  • On DOS, byte is int8, word is int16, and doubleword is int32.

  • On MIPS, byte is int8, halfword is int16, and word is int32.

  • On SH-2, byte is int8, word is int16, and longword is int32.

  • On MIPS, lb is load signed byte and lbu is load unsigned byte.

  • On ARM, ldrb is load unsigned byte and ldrsb is load signed byte.

  • etc.
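The signed/unsigned distinction matters as soon as you read the same byte yourself. A minimal Python sketch (the function name is made up) of what lb/lbu on MIPS, or ldrsb/ldrb on ARM, do with a single byte:

```python
def load_byte(mem, addr, signed):
    """Read one byte like MIPS lb/lbu or ARM ldrsb/ldrb."""
    value = mem[addr]          # indexing a bytes object gives 0..255 (lbu / ldrb)
    if signed and value >= 0x80:
        value -= 0x100         # sign-extend 0x80..0xff to -128..-1 (lb / ldrsb)
    return value

ram = bytes([0x7F, 0x80, 0xFF])
print(load_byte(ram, 2, signed=False), load_byte(ram, 2, signed=True))  # 255 -1
```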

Warning
Be careful about shorthands! If longword is shortened to long and halfword to half, then doubleword would be shortened to double, which is easily confused with the floating-point double.

Lastly, know your tools! Study and learn the features of the tool you’re using. Try to find a workaround if a required feature is not available. In the worst case, when the tool is simply not good enough, you will need to replace it with something better.

The Method Overview

RAM holds many things, including the data you want. Any ASM code reading or writing that data is, by definition, relevant ASM code working on it.

Why would the code want to read compressed data? Most likely it is trying to decompress it. So you’ll find the decompression function.

Why would the code want to read animation data? Most likely it is trying to transform drawing XY coordinates on the screen. So you’ll find the transformation function.

[Figure: ram read write]

Hence, the basic steps are:

  1. Identify the data you are looking for. It can be anything, including texture data, animation data, dialogue text, etc.

  2. Run the game until the data is loaded into memory. Pause the game and look for the RAM address (or user memory address) that holds the data.

  3. Put a READ/WRITE breakpoint on that RAM address region and resume running the game. The game will automatically break when it executes the ASM code working with the data. That region of ASM code is the relevant function.

A visual example of the 3 steps above:

With Input Data

  1. From compressed data (game file) to RAM address 0x8007d160. [Figure: over input data]

  2. Use RAM address 0x8007d160 to put a READ breakpoint. [Figure: over input break 1]

  3. The game breaks on ASM code 0x8002896c. [Figure: over input break 2]

With Output Data

  1. From decompressed data to RAM address 0x801a067c. [Figure: over output data]

  2. Use RAM address 0x801a067c to put a WRITE breakpoint. [Figure: over output break 1]

  3. The game breaks on ASM code 0x800289e8. [Figure: over output break 2]

Using the 2 nearest "jr ra" (return) instructions as boundaries, the ASM code from 0x80028960 to 0x80028a8c is identified as the "decompression function".
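The "find the surrounding jr ra" step can also be scripted against a RAM dump. This is a rough Python sketch, assuming little-endian MIPS (PSX/PS2/PSP) and a function with a single return at its end; function_bounds is a hypothetical helper name. Functions with early returns, or data mixed into the code, will fool it, so always confirm the range in the debugger.

```python
import struct

JR_RA = 0x03E00008  # encoding of the MIPS instruction "jr ra"

def function_bounds(ram, base, hit):
    """Guess the [start, end] address range of the function containing `hit`.

    ram:  bytes of a RAM dump (little-endian MIPS)
    base: RAM address of ram[0], e.g. 0x80000000 on PSX
    hit:  ASM address where the READ/WRITE breakpoint stopped
    """
    limit = base + len(ram) // 4 * 4

    def word_at(addr):
        return struct.unpack_from("<I", ram, addr - base)[0]

    # Backwards: the previous function ends with "jr ra" plus its delay slot.
    addr = (hit & ~3) - 4
    while addr >= base and word_at(addr) != JR_RA:
        addr -= 4
    start = addr + 8 if addr >= base else base

    # Forwards: this function's own "jr ra"; its delay slot is the last instruction.
    addr = hit & ~3
    while addr < limit and word_at(addr) != JR_RA:
        addr += 4
    if addr >= limit:
        raise ValueError("no jr ra found after the breakpoint address")
    return start, addr + 4
```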

After that, you can use a decompiler such as Ghidra [ghidra] to help understand the ASM code.

[Figure: over ghidra]

Some notes about workarounds

  • If your tool doesn’t have the ability to explore RAM, try to find a way to dump the RAM to a file and explore it with a hex editor.

  • A save state (not an in-game Save File) has a copy of the RAM dump. Start from there!

  • You can also use Cheat Engine [cheat], but unless you are working on a PC game, the RAM address you get belongs to the emulator process. You’ll need to find the base address and do some math to map the address back to the game.

  • Your tool must have the ability to put Memory READ or WRITE breakpoints!

Examples

Sprite Data

Game: PSX Choujin Gakuen - Gowcaizer

Debugger: nocash PSX [nopsx] + Cheat Engine [cheat]

Let’s start with something simple: we are trying to rip a character sprite. Using the nocash PSX VRAM Viewer, we can see that the sprite in VRAM is complete.

[Figure: decode vram]

Remember: data needs to be read into RAM before it can be transferred to VRAM. So we need to explore RAM for traces of the sprite.

Unfortunately, while nocash PSX is very useful, it lacks some features we need to explore RAM, so we use Cheat Engine to work around its limitations.

The first thing we need to find is the base RAM address, so that we can tie Cheat Engine addresses back to nocash PSX addresses. So we go to Cheat Engine Scan Settings and turn on MEM_MAPPED.

Now make a save state in nocash PSX. Then go to RAM 80010000 and type in the byte sequence "41 79 61" (Aya). Use Cheat Engine to search for this byte sequence.

Tip
Always make a save state before editing RAM. If something goes wrong, you can always revert by loading the state.

[Figures: decode aya 1, decode aya 2]

So we get address b09100 from Cheat Engine, which corresponds to 80010000 on nocash PSX.

The size of PSX RAM is 2MB, from 80000000 to 80200000. That translates into Cheat Engine addresses as:

PSX 80000000 = CE b09100 - 10000
	= CE af9100

PSX 80200000 = CE af9100 + 200000
	= CE cf9100
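The base-address math is the same every time, so it is worth wrapping in a tiny helper. A Python sketch (class and method names are made up for illustration): it only needs one known pair of addresses, found with a marker such as "Aya", and works the same way for a Cheat Engine address or a save state file offset.

```python
PSX_RAM_START = 0x80000000

class AddressMap:
    """Translate between PSX RAM addresses and another address space
    (Cheat Engine process address, save state file offset, ...)."""

    def __init__(self, marker_psx, marker_other):
        # "other" address of PSX 80000000
        self.base = marker_other - (marker_psx - PSX_RAM_START)

    def to_other(self, psx_addr):
        return psx_addr - PSX_RAM_START + self.base

    def to_psx(self, other_addr):
        return other_addr - self.base + PSX_RAM_START

ce = AddressMap(marker_psx=0x80010000, marker_other=0xB09100)
print(hex(ce.base))                  # 0xaf9100
print(hex(ce.to_other(0x80200000)))  # 0xcf9100
```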

Now we can use Cheat Engine to explore PSX RAM. We are looking for graphic data, so we open the Memory Viewer subwindow and select View → Graphical memory view.

We need to look for anything suspicious in the address range af9100 to cf9100. Refer to Raw Image Data for more details.

[Figures: decode cheat 1, decode cheat 2]

We found the graphic data at c9f05a. Using the base address, we can translate it back to a PSX address.

CE c9f05a = c9f05a - af9100 + PSX 80000000
	= PSX 801a5f5a

Let’s go back to nocash and double-check if we are doing it correctly and looking at the same data.

[Figures: decode cheat 3, decode cheat 4]

YES! They are both indeed the same!

Now we can put a WRITE breakpoint to look for the decompression function:

[Figures: decode break 1, decode break 2]

The data is READ through t2 from RAM 8004ef70 and WRITTEN through a0 to RAM 801a5f50. Now we can use the data at t2 to locate the original game file.

Since the game files still have their original names, the task becomes very easy. The character name is Karin, so the file is obviously data/act/kari.act. The data is located in that file at offset 0x4f1e9.
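To find the file offset, copy a few bytes at the t2 address out of a RAM dump and search for them in the candidate file. A minimal Python sketch; find_all is a hypothetical helper, and the pattern should be long enough (a dozen bytes or more is a reasonable guess) to be unique in the file.

```python
def find_all(haystack, needle):
    """Return every offset where `needle` occurs in `haystack` (overlaps included)."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits

# Synthetic demo: pretend these 8 bytes were copied from RAM at the t2 address.
pattern = b"\x12\x34\x56\x78\x9a\xbc\xde\xf0"
game_file = bytes(0x100) + pattern + bytes(0x10)
print([hex(o) for o in find_all(game_file, pattern)])  # ['0x100']
```

More than one hit means the pattern is too short or too common; extend it with the next bytes from RAM until only one offset is left.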

[Figures: decode break 3, decode break 4]

With the original game file and the decompression function found, plus decompressed data to verify against, we have pretty much everything we need to start coding.

Here is the result with all sprites decompressed and packed into a texture atlas:

[Figure: kari.act]
Tip
This example starts from the output data (VRAM) and works its way back to the input data (game file). Try doing the steps in reverse to link data from the game file to VRAM.

Sprite Data (no Cheat Engine)

This is an alternative method to explore RAM when Cheat Engine is not available, doesn’t work, or cannot be used.

A save state is a backup of the whole program state, so it contains a copy of the RAM. The first step is to find a way to extract that RAM dump from the save state.

Luckily, nocash PSX has the option to make an uncompressed save state:

[Figure: save aya 1]

Just like with Cheat Engine before, we need to find the base RAM address. Go to RAM 80010000, type in the byte sequence "41 79 61" (Aya), and then make a save state.

Since the save state is not compressed, we can simply use a hex editor and look for "Aya" (as a string or byte sequence).

[Figures: save aya 2, save aya 3]

We found "Aya" at file offset 99064. Converting the PSX RAM range 80000000 to 80200000 into save state file offsets:

PSX 80000000 = FILE 99064 - 10000
	= FILE 89064

PSX 80200000 = FILE 89064 + 200000
	= FILE 289064
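The same steps can cut the whole 2MB RAM out of the save state in one go. A Python sketch, assuming an uncompressed save state and a marker that appears nowhere earlier in the file (only the first hit is used); extract_ram and its parameters are hypothetical names.

```python
PSX_RAM_START = 0x80000000
PSX_RAM_SIZE = 0x200000  # 2MB

def extract_ram(state, marker=b"Aya", marker_psx=0x80010000):
    """Cut the PSX RAM out of an uncompressed save state.

    `marker` must have been typed into RAM at `marker_psx` before saving.
    """
    pos = state.find(marker)
    if pos == -1:
        raise ValueError("marker not found; is the save state compressed?")
    start = pos - (marker_psx - PSX_RAM_START)  # file offset of PSX 80000000
    if start < 0 or start + PSX_RAM_SIZE > len(state):
        raise ValueError("the 2MB RAM range does not fit inside the save state")
    return state[start:start + PSX_RAM_SIZE]
```

Saved to its own file, the result is a plain RAM dump where file offset = PSX address - 80000000, which makes every later lookup a simple subtraction.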

With that, we can start exploring the RAM dump for sprite data. Refer to Raw Image Data for more details.

[Figures: save sprite 1, save sprite 2]

We discovered the graphic data at save state offset 22efe0. Using the base address, we can translate it back to PSX RAM at:

FILE 22efe0 = 22efe0 - 89064 + PSX 80000000
	= PSX 801a5f7c
Tip
It never hurts to go back to nocash PSX and double-check that both addresses hold the same data.

Then we can put a WRITE breakpoint to look for the decompression function. Everything is the same after that.

Other Emulators
  • A PCSX2 save state is a normal ZIP file. Just rename the extension from .p2s to .zip; the eeMemory.bin file inside is the RAM dump.
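Python's zipfile module reads the archive contents regardless of the file extension, so the renaming step can be skipped in a script. A sketch, assuming the eeMemory.bin entry uses a compression method zipfile supports (stored or deflate); if your PCSX2 build writes it with another method, extract it with an external archive tool instead. read_ps2_ram is a hypothetical name.

```python
import zipfile

def read_ps2_ram(path):
    """Read the EE RAM dump straight out of a PCSX2 .p2s save state.

    `path` may be a file name or a binary file object.
    """
    with zipfile.ZipFile(path) as z:
        return z.read("eeMemory.bin")
```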

Meta Data

Game: PS2 Odin Sphere

Debugger: PCSX2 [pcsx2]

Starting with output data is only possible for visual data. For meta data like texture atlases, hitboxes, and animation data, you’ll want to start from input data instead.

The process is mostly the same. Start from a game file (or a section of it), look for it in RAM, and then put a READ breakpoint on the whole thing.

Note
If the game file is compressed, you’ll need to do this twice. The first pass finds the decompression function. Once you have the data decompressed, you can look for it in RAM and put a READ breakpoint on it as usual.

[Figures: meta file 1, meta file 2]

The file is loaded at RAM da8c00. The offsets for each section are stored at 0x54 to 0x80. The data we’re interested in is in section[8] (or s8), from offset 0x13b440 to 0x157080 (size = 0x1bc40).

Caution
Notice that the offsets have been updated in RAM! So DO NOT include them in an Array of Byte search.
offset start = 54 + PS2 da8c00
	= PS2 da8c54

offset end = 80 + PS2 da8c00
	= PS2 da8c80

s8 start = 13b440 + PS2 da8c00
	= PS2 ee4040

s8 end = 157080 + PS2 da8c00
	= PS2 effc80
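The section ranges and breakpoint addresses can be computed from the game file on disc (not from RAM, where the offsets have changed). A Python sketch under loudly stated assumptions: the table is consecutive little-endian uint32 offsets from 0x54 up to and including 0x80, and section[i] spans offsets[i] to offsets[i+1]. Both function names are hypothetical.

```python
import struct

def section_ranges(data, table_start=0x54, table_last=0x80):
    """Read the section offset table (assumed layout, see above)."""
    count = (table_last - table_start) // 4 + 1
    offsets = struct.unpack_from(f"<{count}I", data, table_start)
    return list(zip(offsets, offsets[1:]))

def ram_range(load_addr, start, end):
    """RAM address range to put a READ breakpoint on one section."""
    return load_addr + start, load_addr + end

start, end = ram_range(0xDA8C00, 0x13B440, 0x157080)
print(hex(start), hex(end))  # 0xee4040 0xeffc80
```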

With that, we can put a READ breakpoint on the whole section[8].

[Figure: meta break]

By noting down the ASM code at every break, we get a very good idea of how the data is read.

Remember: we are looking for the relevant functions for further analysis later, so write down the ASM addresses along the way.

Data[]  ASM  Type   Note
00      lhu  int16  172488 , n * 18 + section[6] offset
                    17248c , n * 18 + section[6] offset
                    172494 , n * 18 + section[6] offset
                    1724c8 , n * 18 + section[6] offset
02      -    -      -
03      -    -      -
04      lhu  int16  172500 , n * 30 + section[7] offset
                    172504 , n * 30 + section[7] offset
06      lhu  int16  172278
                    172394
08      lw   int32  172060 , AND 80
                    1720f0 , AND 100 , 4 , 8
                    17232c , AND 40
                    172a44 , AND 1
                    172a50 , AND 2
                    172a64 , AND 2
                    172c28 , AND 400
                    172ecc , AND 20 , 10
0c      lhu  int16  172160
0e      lbu  int8   172820 , == 2 , == 1
0f      lbu  int8   17239c
10      lbu  int8   1724c4 , == 2 , == 1
11      lbu  int8   1725bc , == 2 , == 1
                    172f7c
12      lbu  int8   172f74
13      lb   int8   17246c
14      lw   int32  172084
                    17209c
18      lhu  int16  172088
                    1720a8
1a      lhu  int16  17208c
                    1720ac
1c      lw   int32  172080
                    1720b8
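To read section[8] yourself, the table above translates directly into a struct format string. A Python sketch under these assumptions: each entry is 0x20 bytes (data[1c] is the last field read), values are little-endian as on the PS2, lhu/lw/lbu are read unsigned and lb signed, bytes 02-03 are skipped because nothing was seen reading them, and the field names are hypothetical labels, not names from the game.

```python
import struct

# One 0x20-byte entry of section[8], laid out as in the table above.
# "<" = little-endian, H = lhu, I = lw, B = lbu, b = lb, 2x = skip bytes 02-03.
RECORD = struct.Struct("<H2xHHIHBBBBBbIHHI")
FIELDS = ("d00", "d04", "d06", "d08", "d0c", "d0e", "d0f", "d10",
          "d11", "d12", "d13", "d14", "d18", "d1a", "d1c")  # hypothetical names
assert RECORD.size == 0x20 and len(FIELDS) == 15

def parse_records(section):
    """Split section[8] into one dict per 0x20-byte entry."""
    return [dict(zip(FIELDS, RECORD.unpack_from(section, pos)))
            for pos in range(0, len(section) - RECORD.size + 1, RECORD.size)]

def flag_set(record, mask):
    """data[08] is tested with AND masks such as 0x80, 0x100 and 0x40."""
    return (record["d08"] & mask) != 0
```

As a quick consistency check, 0x1bc40 / 0x20 = 0xde2, so section[8] divides evenly into 3554 entries of this size.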

Using the ASM addresses above, we can conclude that there are 3 functions parsing the data.

  1. function A from 171fc0 to 1722c0

  2. function B from 1722d0 to 1723f0

  3. function C from 1723f0 to 173020

Function A seems to perform the end-of-animation and looping checks. It also handles SFX/voice playback.

Function B seems to do the math that normalizes the animation rate from FPS to a value between 0.0 and 1.0.

Function C is a rather huge function and seems to be the drawing function, ending with a draw call to the GPU.


With the ASM addresses, we can properly understand how certain things work. For example, data[10], read at ASM address 1724c4, leads us to 2 if’s and 2 functions.

[Figure: meta ghidra s7]

When data[10] is 2 or 1, each value leads to its own function. For anything else, only the current frame is used.

Let us examine what these 2 functions do:

[Figure: meta ghidra s7 2]

The function accepts 4 frames as arguments. Since the code computes squares and cubes, we can guess the interpolation is based on a polynomial formula:

$$P(t) = at^3 + bt^2 + ct + d$$

Arranging the algorithm from the screenshot according to the polynomial formula above:

result =   prev * (-0.5t^3 +  1.0t^2 + -0.5t + 0)
         + cur  * ( 1.5t^3 + -2.5t^2 +     0 + 1)
         + nxt1 * (-1.5t^3 +  2.0t^2 +  0.5t + 0)
         + nxt2 * ( 0.5t^3 + -0.5t^2 +     0 + 0)

Then convert it to matrix form:

$$
\text{result} =
\begin{bmatrix} t^3 & t^2 & t & 1 \end{bmatrix}
\begin{bmatrix}
-0.5 & 1.5 & -1.5 & 0.5 \\
1.0 & -2.5 & 2.0 & -0.5 \\
-0.5 & 0 & 0.5 & 0 \\
0 & 1.0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \text{prev} \\ \text{cur} \\ \text{nxt1} \\ \text{nxt2} \end{bmatrix}
$$

The 4x4 matrix is the characteristic matrix, and it matches the Catmull-Rom spline [catmull].

We can say for certain: when data[10] is 2, do a 4-frame Catmull-Rom spline interpolation.
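To sanity-check the matrix, it can be evaluated directly. A Python sketch for one value at a time (an X, a Y, or whatever the game feeds through this path, which is not shown here); the names are illustrative.

```python
# Characteristic matrix: rows = t^3, t^2, t, 1; columns = prev, cur, nxt1, nxt2.
CATMULL_ROM = (
    (-0.5,  1.5, -1.5,  0.5),
    ( 1.0, -2.5,  2.0, -0.5),
    (-0.5,  0.0,  0.5,  0.0),
    ( 0.0,  1.0,  0.0,  0.0),
)

def catmull_rom(prev, cur, nxt1, nxt2, t):
    """[t^3 t^2 t 1] * M * [prev cur nxt1 nxt2]^T for one value."""
    powers = (t ** 3, t ** 2, t, 1.0)
    frames = (prev, cur, nxt1, nxt2)
    # weight of each frame = power row vector times one column of M
    weights = [sum(powers[r] * CATMULL_ROM[r][c] for r in range(4)) for c in range(4)]
    return sum(w * f for w, f in zip(weights, frames))

print(catmull_rom(0, 1, 2, 3, 0.5))  # 1.5
```

A quick property check: at t = 0 the weights become (0, 1, 0, 0) and at t = 1 they become (0, 0, 1, 0), so the curve passes exactly through cur and nxt1, as a Catmull-Rom spline should.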

[Figure: meta ghidra s7 1]

This function is much simpler. The formula is just:

result = cur * (1.0 - t) + next * t

So when data[10] is 1, do a 2-frame linear interpolation.
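Putting the data[10] branches together gives a short per-value routine. A Python sketch of the logic as read from the decompiled code; the function names, and treating "next" in the linear formula as nxt1, are assumptions.

```python
def lerp(cur, nxt, t):
    """2-frame linear interpolation: cur * (1 - t) + next * t."""
    return cur * (1.0 - t) + nxt * t

def interpolate(mode, prev, cur, nxt1, nxt2, t):
    """mode = data[10]: 2 = Catmull-Rom, 1 = linear, anything else = current frame."""
    if mode == 2:
        t2, t3 = t * t, t * t * t
        return (prev * (-0.5 * t3 + 1.0 * t2 - 0.5 * t)
                + cur * (1.5 * t3 - 2.5 * t2 + 1.0)
                + nxt1 * (-1.5 * t3 + 2.0 * t2 + 0.5 * t)
                + nxt2 * (0.5 * t3 - 0.5 * t2))
    if mode == 1:
        return lerp(cur, nxt1, t)
    return cur

print(interpolate(1, 0, 10, 20, 30, 0.25))  # 12.5
```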


Meta data can be hard to observe from videos and screenshots. Games can also lag and skip frames to maintain performance, which complicates things further. Being able to refer to the ASM code eliminates a lot of this guesswork.

Loading Files

Game: PSP Princess Crown

Debugger: PPSSPP [ppsspp]

To be able to start from input data, you’ll need to know which files are actually loaded for the scene.

Let’s start by looking for a file list in RAM, using a file name from the game.

Tip
File name matching can be case insensitive, so the names in RAM may use any case. Try all uppercase, all lowercase, and/or mixed case.
Tip
You can start with the file extension.

[Figures: load file 1, load file 2]

We found the file at RAM 8aeb410. Observing the pattern, each file entry is a record ["FILE.NAME", int32 LBA, int32 file size] with a fixed size of 0x1c bytes.

Looking for where the list begins and ends, we discovered that the whole list spans RAM 8ae6ae4 to RAM 8af51e4 (size = 0xe700).
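Once the list area is dumped, it can be parsed in a few lines. A Python sketch under assumptions: little-endian (as on the PSP), and a 0x14-byte NUL-padded name field, which is only inferred from the 0x1c entry size minus the two int32 fields (that layout would make the lw s2, 18(v0) seen later a read of the file size). The names are hypothetical.

```python
import struct

ENTRY_SIZE = 0x1c
ENTRY = struct.Struct("<20sii")  # name (0x14 bytes), int32 LBA, int32 file size

def parse_file_list(blob):
    """Turn the RAM file list into (name, lba, size) tuples."""
    entries = []
    for pos in range(0, len(blob) - ENTRY_SIZE + 1, ENTRY_SIZE):
        raw_name, lba, size = ENTRY.unpack_from(blob, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((name, lba, size))
    return entries

print((0x8af51e4 - 0x8ae6ae4) // ENTRY_SIZE)  # entries in the list: 2112
```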

[Figure: load break 1]

As usual, when we have something we want to inspect, we’ll put a READ breakpoint on the whole thing.

[Figures: load break 2, load break 3]

As the game searches by file name, the ASM code compares every character for a match. PPSSPP is even able to recognize this function as strcmp() (string compare).

We are not interested in strcmp() itself, but in the parent function that called it. We use "Step Into" until the function returns, and end up at ASM address 89e38e4.

[Figures: load ghidra 1, load ghidra 2]

With that, we can use Ghidra [ghidra] to decompile the function.

SUB_001ecdbc is strcmp(), so the function basically loops over a list of 3000 file entries and returns the file entry that matches.
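In Python terms the decompiled loop is just a linear search. A sketch (find_entry is a hypothetical name); the real code appears to hand back a pointer to the matched entry in v0, while this version returns the index plus the two int32 fields instead.

```python
def find_entry(entries, wanted):
    """Compare each entry's name (strcmp) and return the first match, or None.

    entries: list of (name, lba, size) tuples.
    """
    for index, (name, lba, size) in enumerate(entries):
        if name == wanted:  # strcmp(name, wanted) == 0
            return index, lba, size
    return None
```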

Note
You can skip this step with Ghidra if you can understand the ASM code directly. This is a very simple function to begin with.

Since the READ breakpoint breaks at strcmp() for every file entry, and the list has 3000 entries, it is no longer useful to us.

This is where we disable the READ breakpoint and use an EXECUTE breakpoint instead.

[Figure: load opcode]

We put EXECUTE breakpoints at these 2 ASM addresses, where the file entry has matched and is ready to be returned:

89e3904  lw  s2, 18(v0)
89e39a0  lw  s2, 18(v0)
Note
EXECUTE breakpoints are for discovering the inner workings of a function, which is only possible after the ASM code has been found using READ or WRITE breakpoints.

By keeping track of the value in v0, we get a list of the file names the game is trying to load. For example, when Gradriel goes into the Valenadine City Pub, the game looks for these files (in order):

v0         Filename
8af1d2c    TORUNEKO.VOL
8aed0d4    GODY.VOL
8aea00c    BABA.VOL
8aed294    GORO.VOL
8af4b1c    WINE.VOL
8af4dbc    WN1C.VOL
8ae7388    002_01_4.EVN
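Logged v0 values can be checked against the list bounds found earlier. A small Python sketch (entry_index is a hypothetical name) that turns each v0 into an entry index, assuming v0 points at the start of a 0x1c-byte entry:

```python
LIST_START = 0x8ae6ae4
LIST_END = 0x8af51e4
ENTRY_SIZE = 0x1c

def entry_index(v0):
    """Convert a logged v0 (pointer to the matched entry) into a list index."""
    if not LIST_START <= v0 < LIST_END:
        raise ValueError(f"{v0:#x} is outside the file list")
    index, rest = divmod(v0 - LIST_START, ENTRY_SIZE)
    if rest:
        raise ValueError(f"{v0:#x} does not point at the start of an entry")
    return index

log = [0x8af1d2c, 0x8aed0d4, 0x8aea00c, 0x8aed294, 0x8af4b1c, 0x8af4dbc, 0x8ae7388]
print([entry_index(v0) for v0 in log])  # [1630, 932, 486, 948, 2050, 2074, 79]
```

All 7 logged values land exactly on a 0x1c boundary, which is a nice confirmation of the entry size.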

Once the file loading order is known, we have also reduced the number of files to work with from 2000+ to just 7.

This technique is also useful in these situations:

  1. When the game files are named with meaningless numbers, like 000_00_0.EVN.

  2. When the game files come in sets, such as 2’s (texture + atlas) or 3’s (texture + atlas + palette).

  3. When the game files are shared and have very weird or unknown combinations.

Fundamental Terminology

RAM Memory

Some interesting facts about RAM (Random Access Memory):

  1. It holds pretty much everything currently active on the screen.

  2. The executable is loaded into RAM, hence the PC (program counter) is also a RAM address.

  3. Breakpoints are triggered by ASM code on the CPU side. DMA (Direct Memory Access) transfers will not trigger any breakpoints.

    1. DMA from CD-ROM to RAM will not trigger any WRITE breakpoints.

    2. DMA from RAM to GPU and SPU will not trigger any READ breakpoints.

  4. Data read from CD-ROM cannot be transferred directly to the GPU and SPU. The (partial) data needs to be loaded into RAM first, then transferred to the GPU and SPU for video/audio streaming playback.

  5. That also means you cannot transfer data in a custom format to the GPU/SPU. Such files need to be converted first: raw pixel data for the GPU and raw audio data for the SPU.

  6. Modern GPUs/SPUs may accept compressed texture/audio data, but only in a few specific formats, and you need to check for support before transferring.

  7. Compressed data is decompressed within RAM, from one RAM address to another RAM address.

  8. An emulator save state has a full dump of the whole RAM.

  9. By removing all temporary data and data loaded from game files, RAM can be trimmed down and backed up as an in-game Save File.

  10. C dynamic memory allocation affects data only. Hence variables, texture atlases, animation data, and so on can end up at a different RAM address each time they are loaded.

  11. ASM code from the main executable is only loaded once, hence it is always at the same RAM address.

  12. ASM code from overlays has a fixed RAM address to load to, but, like data, overlays can be unloaded any time they are not needed.

Raw Image Data

In essence, image data is like ASCII art, with 1 character representing 1 pixel. Example:

+---------------------+
| BBBB   IIII    GGGG |
| BB  B   II    GG    |
| BBBB    II    GG GG |
| BB  B   II    GG  G |
| BBBB   IIII    GGG  |
+---------------------+

The ASCII art "BIG" above has 21 characters per line and 5 lines. Converted to image data, it becomes a 21x5 pixel image, with each letter "B", "I", "G" being a 7x5 pixel image.
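In memory, raw image data has no line breaks: it is just width x height pixels in a row, and the width is what turns it back into lines. A Python sketch using the "BIG" art above (names are illustrative):

```python
ROWS = [
    " BBBB   IIII    GGGG ",
    " BB  B   II    GG    ",
    " BBBB    II    GG GG ",
    " BB  B   II    GG  G ",
    " BBBB   IIII    GGG  ",
]
WIDTH, HEIGHT = len(ROWS[0]), len(ROWS)
RAW = "".join(ROWS)  # what a raw image looks like in a file: no line breaks

def pixel(x, y):
    """Pixel at (x, y): row y starts WIDTH pixels after row y - 1."""
    return RAW[y * WIDTH + x]

print(WIDTH, HEIGHT, len(RAW))  # 21 5 105
```

Viewing the same 105 characters with the wrong width makes every row start at the wrong place, which is why a raw image shown with a wrong width looks slanted or scrambled in a raw image viewer.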

Here is an example of raw image data, shown as a PNG image and as HTML text:

[Figure: elisa sprite 24]
Tip
Try inspecting every pixel for the terms "Color", "LookUp", and "Table" (CLUT).
Note
A raw image is pretty much a giant pixel table. You can either use a "class" to look up a color (Palette Image), or use a "style" to define a color directly (True Color Image).
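The "class" versus "style" idea in Python terms: a sketch with a made-up 4-pixel image and made-up RGBA colors, where the palette image stores indices into a CLUT and the true color image stores the colors themselves.

```python
# Hypothetical 4-pixel image; RGBA colors made up for illustration.
CLUT = [(0, 0, 0, 0), (255, 0, 0, 255), (0, 0, 255, 255)]  # the color lookup table
indexed = [0, 1, 1, 2]  # palette image: each pixel is an index ("class")

# true color image: each pixel stores its color directly ("style")
true_color = [(0, 0, 0, 0), (255, 0, 0, 255), (255, 0, 0, 255), (0, 0, 255, 255)]

def palette_to_rgba(pixels, clut):
    """Resolve every index through the CLUT, producing true color pixels."""
    return [clut[i] for i in pixels]

print(palette_to_rgba(indexed, CLUT) == true_color)  # True
```

Changing CLUT entry 1 recolors both middle pixels without touching the pixel data, which is how the same sprite data can be shown with different palettes.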

Here’s how the same raw image data looks in different apps:

  • HxD [hxd] hex editor. [Figure: raw image hxd]

  • HexView, from psxtools/tsr_hexview.php. [Figure: raw image hexview]

  • Cheat Engine [cheat]. [Figure: raw image cheat]

  • Nana [nana] raw image viewer. [Figure: raw image nana]

  • GIMP [gimp] photo editor. [Figures: raw image gimp 1, raw image gimp 2]

Pixel Space

The drawing XY coordinates live in a space, similar to 3D space. X and Y can range from -infinity to +infinity px, and all sprites are aligned to [0,0].

Basically, like with 3D models, [0,0] makes the sprite coordinates self-contained. The game can then "transform" the sprite to its in-game entity position.

Some people refer to [0,0] as the "origin" or "root". I call it the "center point". And here is an article referring to it as "hot spots" [mckids].

Here is how I convert XY from pixel space to a canvas position:

[Figure: elise space 1]

From the image above, the drawing XY for the sprite is [-130,-160 , -130,10 , 150,10 , 150,-160].

First, we’ll need to prepare a canvas large enough to hold the full sprite. We will use the absolute value [absval] of the drawing XY to calculate the size.

Note
The absolute value of a number may be thought of as its distance from zero. It is thus always either a positive number or zero, but never negative.

The absolute values for X are [130,130,150,150]. So the canvas width is:

max value * 2 = 150 px * 2
	= 300 px

The absolute values for Y are [160,10,160,10]. So the canvas height is:

max value * 2 = 160 px * 2
	= 320 px

With that, we prepare a 300x320 px canvas, with [150,160] representing [0,0] in pixel space. The sprite will be drawn at:

[-130+150 , -160+160] = [ 20 ,   0]
[-130+150 ,   10+160] = [ 20 , 170]
[ 150+150 ,   10+160] = [300 , 170]
[ 150+150 , -160+160] = [300 ,   0]

The result drawing position on the canvas is [20,0 , 20,170 , 300,170 , 300,0].

[Figure: elise space 2]

It is also very simple to convert a canvas position back to pixel space: just subtract [150,160].

Since everything is aligned to [0,0] in pixel space, it doesn’t matter what the final canvas size is; the sprites will all line up correctly for animations.
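The whole conversion fits in a few lines. A Python sketch of the steps above (function names are illustrative), using the sprite's drawing XY as the example:

```python
def canvas_size(points):
    """Width/height big enough for the sprite, with [0,0] at the canvas center."""
    width = max(abs(x) for x, _ in points) * 2
    height = max(abs(y) for _, y in points) * 2
    return width, height

def to_canvas(points, width, height):
    """Shift pixel-space XY so that [0,0] lands on [width/2, height/2]."""
    cx, cy = width // 2, height // 2
    return [(x + cx, y + cy) for x, y in points]

def to_pixel_space(points, width, height):
    """Undo to_canvas: subtract the canvas center again."""
    cx, cy = width // 2, height // 2
    return [(x - cx, y - cy) for x, y in points]

quad = [(-130, -160), (-130, 10), (150, 10), (150, -160)]
w, h = canvas_size(quad)
print(w, h)                   # 300 320
print(to_canvas(quad, w, h))  # [(20, 0), (20, 170), (300, 170), (300, 0)]
```

Sizing the canvas from the absolute values, rather than from a tight min/max bounding box, is what keeps [0,0] exactly at the center, so every frame drawn this way shares the same center point.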