Prerequisites
The most common mistake beginners make is to decompile the whole EXE and try to understand everything from the start. But we are not trying to rewrite the whole engine from scratch, so there is no need to know everything. We only need the relevant part: the code that gets us what we want.
This tutorial is about locating the ASM code that helps us understand and work with the data. Naturally, the reader is expected to be able to read the ASM code of the architecture the game runs on.
-
PSX/PS2/PSP run MIPS code.
-
Saturn is running SH-2 code.
-
GBA/NDS run ARM code.
-
etc…
The reader is also expected to understand the 4 basic data types:
-
int
-
float
-
bool or bitflags
-
text string or binary AOB (array of byte)
And understand the fixed data size varieties:
-
int is read/parsed from a text file. It has the native register size of the operating system.
-
int16 is read from a binary file. It is an int in 16 bits, or 2 bytes.
-
int32 is read from a binary file. It is an int in 32 bits, or 4 bytes.
-
int64 is read from a binary file. It is an int in 64 bits, or 8 bytes.
When exploring the data with a hex editor, this is how they are represented as Array of Byte (AOB):
endianness | int32 0x123456 as AOB |
---|---|
big endian byte order | 00 12 34 56 |
little endian byte order | 56 34 12 00 |
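The byte orders above can be checked with a few lines of Python. This is only a small sketch using the standard `struct` module; the variable names are made up for illustration.

```python
import struct

value = 0x123456

# ">I" packs an unsigned 32-bit integer in big endian byte order,
# "<I" packs the same integer in little endian byte order.
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)

def as_aob(data: bytes) -> str:
    """Format bytes the way a hex editor shows them."""
    return " ".join(f"{byte:02x}" for byte in data)

print(as_aob(big_endian))     # 00 12 34 56
print(as_aob(little_endian))  # 56 34 12 00

# Reading little endian bytes as big endian gives a different number.
misread = struct.unpack(">I", little_endian)[0]
print(hex(misread))           # 0x56341200
```

If a value found in a hex editor looks far too large, reading it with the other byte order is a cheap first check.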
The confusing ASM jargon across different architectures:
-
Be able to distinguish doubleword (int) and double (float).
-
On DOS, byte is int8, word is int16, and doubleword is int32.
-
On MIPS, byte is int8, halfword is int16, and word is int32.
-
On SH-2, byte is int8, word is int16, and longword is int32.
-
On MIPS, lb is load signed byte and lbu is load unsigned byte.
-
On ARM, ldrb is load unsigned byte and ldrsb is load signed byte.
-
etc…
Warning
|
Be careful about shorthands! If longword is shortened to long, and halfword is shortened to half, then doubleword will be shortened to double…
|
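The difference between a signed and an unsigned byte load can be illustrated in Python. This is a rough sketch only: `load_byte` is a made-up helper that imitates what `lb`/`ldrsb` (signed) and `lbu`/`ldrb` (unsigned) put into a register; it is not emulator code.

```python
def load_byte(memory: bytes, address: int, signed: bool) -> int:
    """Imitate a one-byte load.

    signed=True  behaves like MIPS lb  / ARM ldrsb (sign-extend).
    signed=False behaves like MIPS lbu / ARM ldrb  (zero-extend).
    """
    return int.from_bytes(memory[address:address + 1], "little", signed=signed)

memory = bytes([0x7f, 0x80, 0xff])

for address in range(len(memory)):
    as_signed = load_byte(memory, address, signed=True)
    as_unsigned = load_byte(memory, address, signed=False)
    print(f"{memory[address]:02x}: signed {as_signed:4d}, unsigned {as_unsigned:3d}")
```

Bytes below 0x80 read the same either way; from 0x80 upward the signed load returns a negative number. That is why the load instruction used by the game is a good hint for the data type.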
Lastly, know your tools! Study and learn the features of the tool you’re using. Try to find a workaround if a required feature is not available. In the worst case, when the tool is simply not good enough, you will need to replace it with something better.
The Method Overview
RAM holds many things, including the data you want. Any ASM code that reads or writes that data must therefore be the code that works on it.
Why would the code want to read compressed data? Most likely it is trying to decompress it. So you’ll find the decompression function.
Why would the code want to read animation data? Most likely it is trying to transform drawing XY coordinates on the screen. So you’ll find the transformation function.
Hence, the basic steps are
-
Identify the data you are looking for. It can be anything, including texture data, animation data, dialogue text, etc.
-
Run the game until the data is loaded to memory. Pause the game and look for the RAM address (or user memory address) that holds the data.
-
Put a READ/WRITE breakpoint on that RAM address region and resume running the game. The game will automatically break when it executes ASM code that works with the data. That region of ASM code is the relevant function.
A visual example of the 3 steps above:
With Input Data | With Output Data |
---|---|
Using 2 "jr ra" (return), the ASM code |
After that, you can use a decompiler such as Ghidra [ghidra] to help you understand the ASM code.
Some notes about workarounds
-
If your tool doesn’t have the ability to explore RAM, try to find a way to dump the RAM to a file and explore it with a hex editor.
-
A save state (not an in-game Save File) will have a copy of the RAM dump. Start from there!
-
You can also use Cheat Engine [cheat], but unless you are working on a PC game, the RAM address you get belongs to the emulator. You’ll need to find the base address and do some math to map the address back to the game.
-
Your tool must have the ability to put Memory READ or WRITE breakpoints!
Examples
Sprite Data
Let’s start with something simple - we are trying to rip a character sprite. Using nocash PSX VRAM Viewer, we can see the sprite is a complete sprite.
Remember - data needs to be read into RAM before it can be transferred to VRAM. So we need to explore RAM for traces of the sprite.
Unfortunately, while nocash PSX is very useful, it lacks some features we need to explore RAM, so we use Cheat Engine to work around its limitations.
The first thing we need to find is the base RAM address, which ties Cheat Engine addresses back to nocash PSX addresses. So we go to Cheat Engine Scan Settings and turn on MEM_MAPPED.
Now make a save state in nocash PSX. Then go to RAM 80010000 and write the byte sequence "41 79 61" (Aya). Use Cheat Engine to search for this byte sequence.
Tip
|
Always make a save state before editing RAM. If something goes wrong, you can always revert by loading the state. |
So we get the address b09100 from Cheat Engine, which corresponds to 80010000 on nocash PSX.
The size of PSX RAM is 2MB, from 80000000 to 80200000. That translates into Cheat Engine addresses as:
PSX 80000000 = CE b09100 - 10000 = CE af9100
PSX 80200000 = CE af9100 + 200000 = CE cf9100
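The same base address math can be written as a small Python helper. This is a sketch under the numbers from this example; the function names are made up, and the Cheat Engine base is likely to differ between emulator sessions, so it would have to be found again each time.

```python
PSX_RAM_START = 0x80000000
PSX_RAM_SIZE = 0x200000  # 2MB

def find_ce_base(ce_marker: int, psx_marker: int) -> int:
    """Return the Cheat Engine address that corresponds to PSX 80000000."""
    return ce_marker - (psx_marker - PSX_RAM_START)

def ce_to_psx(ce_address: int, ce_base: int) -> int:
    """Translate a Cheat Engine address into a PSX RAM address."""
    offset = ce_address - ce_base
    if not 0 <= offset < PSX_RAM_SIZE:
        raise ValueError("address is outside the emulated PSX RAM")
    return PSX_RAM_START + offset

def psx_to_ce(psx_address: int, ce_base: int) -> int:
    """Translate a PSX RAM address into a Cheat Engine address."""
    return ce_base + (psx_address - PSX_RAM_START)

# "Aya" was written at PSX 80010000 and found at CE b09100.
ce_base = find_ce_base(0xb09100, 0x80010000)
print(f"CE range: {ce_base:x} to {psx_to_ce(0x80200000, ce_base):x}")
# CE range: af9100 to cf9100
```

The range check in `ce_to_psx` catches Cheat Engine hits that belong to the emulator itself and not to the emulated RAM.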
Now we can use Cheat Engine to explore PSX RAM. We are looking for graphic data, so we open Memory Viewer subwindow, and select View → Graphical memory view.
We need to look for anything suspicious at addresses from af9100 to cf9100. Refer to Raw Image Data for more details.
We found the graphic data at c9f05a. Using the base address, we can translate it back to a PSX address.
CE c9f05a = c9f05a - af9100 + PSX 80000000 = PSX 801a5f50
Let’s go back to nocash and double-check if we are doing it correctly and looking at the same data.
YES! They are both indeed the same!
Now we can put a WRITE breakpoint to look for the decompression function:
The data is READ from t2 at RAM 8004ef70 and WRITTEN to a0 at RAM 801a5f50. Now we can use the data at t2 to locate the original game file.
Since the game files still have their original names, the task becomes very easy. The character name is Karin, so obviously the file is data/act/kari.act. The data is located at offset 0x4f1e9.
With the original game file and the decompression function found, and with decompressed data to verify against, we have pretty much everything we need to start coding.
Here is the result with all sprites decompressed and packed into a texture atlas:
Tip
|
This example starts from output data (VRAM) and works its way back to input data (game file). Try doing the steps in reverse to link data from the game file to VRAM. |
Sprite Data (no Cheat Engine)
This is an alternative method to explore RAM when Cheat Engine is not available, doesn’t work, or cannot be used.
A save state is a backup of the whole program, so it has a copy of the RAM dump in it. The first step is to find a way to extract the RAM dump from the save state.
Luckily, nocash PSX has an option to make an uncompressed save state:
Just like with Cheat Engine before, we need to find the base RAM address. Go to RAM 80010000, write the byte sequence "41 79 61" (Aya), and then make a save state.
Since the save state is not compressed, we can simply use a hex editor and look for "Aya" (as string or byte sequence).
We found "Aya" at 99064. Using that, convert the PSX RAM range from 80000000 to 80200000 into file offsets:
PSX 80000000 = FILE 99064 - 10000 = FILE 89064
PSX 80200000 = FILE 89064 + 200000 = FILE 289064
With that, we can start exploring RAM dump for sprite data. Refer to Raw Image Data for more details.
We discover the graphic data at save state offset 22efe0. Using the base address, we can translate it back to PSX RAM at:
FILE 22efe0 = 22efe0 - 89064 + PSX 80000000 = PSX 801a5f7c
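The marker search and the offset math above can be sketched in Python. The helper names are hypothetical, and the demo runs on a synthetic byte string standing in for a real uncompressed save state, whose header size will differ.

```python
PSX_RAM_START = 0x80000000

def find_ram_offset(save_state: bytes, marker: bytes, psx_marker_address: int) -> int:
    """Return the file offset that corresponds to PSX 80000000.

    Only the first match is used, so pick a marker that is unlikely
    to appear anywhere else in the save state.
    """
    position = save_state.find(marker)
    if position < 0:
        raise ValueError("marker not found in save state")
    return position - (psx_marker_address - PSX_RAM_START)

def file_to_psx(file_offset: int, ram_offset: int) -> int:
    """Translate a save state file offset into a PSX RAM address."""
    return PSX_RAM_START + (file_offset - ram_offset)

# Synthetic stand-in: a 0x64-byte header followed by the RAM contents,
# with "Aya" written at PSX 80010000.
ram = bytearray(0x20000)
ram[0x10000:0x10003] = b"Aya"
save_state = bytes(0x64) + bytes(ram)

ram_offset = find_ram_offset(save_state, b"Aya", 0x80010000)
print(hex(ram_offset))                      # 0x64

# The numbers from the text: RAM starts at file offset 89064.
print(hex(file_to_psx(0x22efe0, 0x89064)))  # 0x801a5f7c
```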
Tip
|
It never hurts to go back to nocash PSX to double-check that both addresses hold the same data. |
And then we can put a WRITE breakpoint to look for the decompression function. Everything is the same after that.
- Other Emulators
-
-
PCSX2 save state is a normal ZIP file. Just rename the extension from .p2s to .zip; the eeMemory.bin file is the RAM dump.
-
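As a sketch, the RAM dump can also be pulled out of a PCSX2 save state with Python's standard `zipfile` module, without renaming the file. `read_ram_dump` is a made-up helper, and this assumes the archive uses a compression method that `zipfile` can read; if it does not, the call raises an error and an external archive tool is needed.

```python
import zipfile

def read_ram_dump(save_state_path: str, member: str = "eeMemory.bin") -> bytes:
    """Return the contents of the RAM dump stored inside a save state.

    zipfile looks at the file contents, not the extension, so a .p2s
    file can be opened directly.
    """
    with zipfile.ZipFile(save_state_path) as archive:
        return archive.read(member)
```

The returned bytes can then be searched with `bytes.find()` or written to disk for a hex editor.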
Meta Data
- Game
-
PS2 Odin Sphere
- Debugger
-
PCSX2 [pcsx2]
Starting with output data is only possible for visual data. For meta data like texture atlas, hitbox, and animation data, you’ll want to start from input data instead.
The process is mostly the same. Start from a game file (or a section of it), look for it in RAM, and then put a READ breakpoint on the whole thing.
Note
|
If the game file is compressed, then you’ll need to do it 2 times. The first time is to find the decompression function. When you have the data decompressed, you can look for it on RAM and put READ breakpoint as usual. |
The file is loaded at RAM da8c00. Offsets for each section are at 0x54 to 0x80. The data we’re interested in is in section[8] (or s8), from offset 0x13b440 to 0x157080 (size = 0x1bc40).
Caution
|
Notice that the offsets are updated in RAM? So DO NOT use offsets for an Array of Byte search! |
offset start = 54 + PS2 da8c00 = PS2 da8c54
offset end = 80 + PS2 da8c00 = PS2 da8c80
s8 start = 13b440 + PS2 da8c00 = PS2 ee4040
s8 end = 157080 + PS2 da8c00 = PS2 effc80
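The breakpoint range above is plain addition, which is easy to double-check in Python. This is a small sketch with the numbers from this example; `file_to_ram` is a made-up helper name.

```python
FILE_BASE = 0xda8c00  # RAM address where the file was loaded in this example

def file_to_ram(offset: int, base: int = FILE_BASE) -> int:
    """Translate an offset inside the game file into a RAM address."""
    return base + offset

s8_start = file_to_ram(0x13b440)
s8_end = file_to_ram(0x157080)

print(f"READ breakpoint: {s8_start:x} to {s8_end:x} (size {s8_end - s8_start:x})")
# READ breakpoint: ee4040 to effc80 (size 1bc40)
```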
With that, we can put a READ breakpoint on the whole section[8].
By noting down the ASM code on every break, we get a very good idea of how the data is read.
Remember - we are looking for the relevant functions for further analysis later. So write down the ASM addresses along the way.
Data[] | ASM | data | note |
---|---|---|---|
00 | lhu | int16 | |
02 | - | - | - |
03 | - | - | - |
04 | lhu | int16 | |
06 | lhu | int16 | |
08 | lw | int32 | |
0c | lhu | int16 | |
0e | lbu | int8 | |
0f | lbu | int8 | |
10 | lbu | int8 | |
11 | lbu | int8 | |
12 | lbu | int8 | |
13 | lb | int8 | |
14 | lw | int32 | |
18 | lhu | int16 | |
1a | lhu | int16 | |
1c | lw | int32 | |
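A table like this translates almost directly into a `struct` format string. The sketch below rests on assumptions that the table alone does not confirm: one record is taken to be 0x20 bytes (the last field at 0x1c plus 4), the `lw` fields are treated as unsigned, and the fields are keyed by offset because their meaning is not known yet. Little endian is used because the PS2 CPU is little endian.

```python
import struct

# lhu -> H (uint16), lbu -> B (uint8), lb -> b (int8), lw -> I (assumed unsigned).
# "2x" skips the two bytes at 02 and 03 that were never read.
RECORD = struct.Struct("<H2xHHIHBBBBBbIHHI")

# One offset per unpacked value, in the same order as the format string.
FIELD_OFFSETS = (0x00, 0x04, 0x06, 0x08, 0x0c, 0x0e, 0x0f,
                 0x10, 0x11, 0x12, 0x13, 0x14, 0x18, 0x1a, 0x1c)

def parse_record(data: bytes, offset: int = 0) -> dict:
    """Unpack one record and return {field offset: value}."""
    values = RECORD.unpack_from(data, offset)
    return dict(zip(FIELD_OFFSETS, values))

# Demo on dummy bytes 00 01 02 ... 1f, with ff at 0x13 to show the signed read.
dummy = bytearray(range(0x20))
dummy[0x13] = 0xff
record = parse_record(bytes(dummy))
print(hex(record[0x00]), record[0x10], record[0x13])  # 0x100 16 -1
```

Keying fields by offset keeps the parser honest: a field only gets a real name once the ASM code shows what it is used for.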
Using the ASM addresses above, we can conclude there are 3 functions parsing the data.
-
function A from 171fc0 to 1722c0
-
function B from 1722d0 to 1723f0
-
function C from 1723f0 to 173020
Function A seems to be doing the End of Animation + Looping check. It also has SFX/voice playback.
Function B seems to be doing math to normalize the animation rate from FPS to a value between 0.0 and 1.0.
Function C is a rather huge function and seems to be the drawing function, with a final draw call to the GPU.
With the ASM addresses, we can get a proper understanding of how certain things work. For example, data[10] read at ASM address 1724c4 will lead us to 2 if’s and 2 functions.
When data[10] is 2 or 1, each will lead us to a function. For anything else, use the current frame only.
Let us examine what these 2 functions do:
The function accepts 4 frames as arguments. Based on the code using power of 2 and power of 3, we can guess the interpolation is based on a polynomial formula:
P(t) = at^3 + bt^2 + ct + d
Arrange the algorithm from the screenshot according to the polynomial formula above:
result = prev * (-0.5t^3 +  1.0t^2 + -0.5t + 0)
       + cur  * ( 1.5t^3 + -2.5t^2 +  0    + 1)
       + nxt1 * (-1.5t^3 +  2.0t^2 +  0.5t + 0)
       + nxt2 * ( 0.5t^3 + -0.5t^2 +  0    + 0)
Then convert it to matrix form:
The 4x4 matrix is the Characteristic Matrix, and it matches the Catmull-Rom Spline [catmull].
We can say for certain - when data[10] is 2, do 4-frame Catmull-Rom Spline interpolation.
This function is much simpler. The formula is just:
result = cur * (1.0 - t) + next * t
So when data[10] is 1, do 2-frame linear interpolation.
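Both formulas can be written out in Python as a sketch. The coefficients are copied from the formulas above and arranged as a 4x4 matrix (rows are t^3, t^2, t, 1; columns are prev, cur, nxt1, nxt2). The `interpolate` dispatcher and its argument layout are made up for illustration and are not the game's actual code.

```python
CATMULL_ROM = (
    (-0.5,  1.5, -1.5,  0.5),  # t^3
    ( 1.0, -2.5,  2.0, -0.5),  # t^2
    (-0.5,  0.0,  0.5,  0.0),  # t
    ( 0.0,  1.0,  0.0,  0.0),  # 1
)

def catmull_rom(prev: float, cur: float, nxt1: float, nxt2: float, t: float) -> float:
    """4-frame Catmull-Rom interpolation between cur (t=0) and nxt1 (t=1)."""
    powers = (t ** 3, t ** 2, t, 1.0)
    frames = (prev, cur, nxt1, nxt2)
    weights = [sum(powers[row] * CATMULL_ROM[row][col] for row in range(4))
               for col in range(4)]
    return sum(weight * frame for weight, frame in zip(weights, frames))

def linear(cur: float, nxt: float, t: float) -> float:
    """2-frame linear interpolation."""
    return cur * (1.0 - t) + nxt * t

def interpolate(mode: int, prev: float, cur: float, nxt1: float, nxt2: float, t: float) -> float:
    """Pick the interpolation based on data[10], as described in the text."""
    if mode == 2:
        return catmull_rom(prev, cur, nxt1, nxt2, t)
    if mode == 1:
        return linear(cur, nxt1, t)
    return cur  # anything else: current frame only

print(interpolate(2, 0.0, 1.0, 2.0, 3.0, 0.5))  # 1.5
print(interpolate(1, 0.0, 1.0, 2.0, 3.0, 0.5))  # 1.5
print(interpolate(0, 0.0, 1.0, 2.0, 3.0, 0.5))  # 1.0
```

For frames that lie on a straight line, both interpolations agree. They only differ when the values curve, which is where the spline gives a smoother result.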
Meta data can be hard to observe with videos and screenshots. The game can also lag and skip frames to maintain performance, which further complicates things. Having the ability to refer to the ASM code will help eliminate a lot of this guesswork.
Loading Files
- Game
-
PSP Princess Crown
- Debugger
-
PPSSPP [ppsspp]
To be able to start from input data, you’ll need to know which files are actually loaded for the scene.
Let’s start by looking for a file list in RAM using a file name from the game.
Tip
|
File names can be matched case-insensitively. So try all uppercase, all lowercase, and/or mixed case. |
Tip
|
You can start with file extension. |
We found the file at RAM 8aeb410. By observing the pattern, the file entry is an array ["FILE.NAME", int32 LBA, int32 file size] and has a fixed size of 0x1c bytes.
Looking for where the list begins and ends, we discovered the area for the whole list is from RAM 8ae6ae4 to RAM 8af51e4 (size = 0xe700).
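A fixed-size entry like this is straightforward to parse from a RAM dump. The sketch below assumes the name field takes the first 0x14 bytes (0x1c minus two int32) and is padded with zero bytes, that LBA comes before file size as in the pattern above, and that the values are little endian like the PSP CPU. `parse_file_list` is a made-up helper.

```python
import struct

# 0x14-byte name, int32 LBA, int32 file size = 0x1c bytes per entry.
ENTRY = struct.Struct("<20sII")

def parse_file_list(ram: bytes, start: int, end: int) -> list:
    """Return (name, lba, size) for every entry between start and end."""
    entries = []
    for offset in range(start, end, ENTRY.size):
        raw_name, lba, size = ENTRY.unpack_from(ram, offset)
        # Cut the name at the first zero byte (assumed padding).
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((name, lba, size))
    return entries

# Demo on two hand-made entries standing in for a real RAM dump.
demo = ENTRY.pack(b"GODY.VOL", 0x100, 0x2000) + ENTRY.pack(b"BABA.VOL", 0x104, 0x1800)
for name, lba, size in parse_file_list(demo, 0, len(demo)):
    print(f"{name:12s} lba={lba:x} size={size:x}")
```

With the numbers from the text, the list area of 0xe700 bytes holds 0xe700 / 0x1c = 2112 entries of this size.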
As usual, when we have something we want to inspect, we’ll put a READ breakpoint on the whole thing.
As the game searches by file name, the ASM code compares every character for a match. Even PPSSPP is able to recognize this function as strcmp() (string compare).
We are not interested in strcmp() itself, but in the parent function that called it. We will use "Step Into" until the function returns. We end up at ASM address 89e38e4.
With that, we can use Ghidra [ghidra] to decompile the function.
SUB_001ecdbc is strcmp(), so basically the function loops over a list of 3000 file entries and returns the file entry when it matches.
Note
|
You can skip this step with Ghidra if you can understand the ASM code directly. This is a very simple function to begin with. |
Since the READ breakpoint will always break at strcmp() on every file entry, with a list of 3000 entries, it is no longer useful for us.
This is where we’ll disable the READ breakpoint and use an EXECUTE breakpoint instead.
We will put an EXECUTE breakpoint where the file entry has matched and is ready to be returned, at these 2 ASM addresses:
89e3904 lw s2, 18(v0)
89e39a0 lw s2, 18(v0)
Note
|
An EXECUTE breakpoint is for discovering the inner workings of a function, which is only possible after the ASM code has been discovered by using a READ or WRITE breakpoint. |
By keeping track of the value at v0, we will get a list of file names the game is trying to load. For example, when Gradriel goes into Valenadine City Pub, the game will look for these files (in order):
v0 | filename |
---|---|
8af1d2c | TORUNEKO.VOL |
8aed0d4 | GODY.VOL |
8aea00c | BABA.VOL |
8aed294 | GORO.VOL |
8af4b1c | WINE.VOL |
8af4dbc | WN1C.VOL |
8ae7388 | 002_01_4.EVN |
When the file loading order is known, we also reduce the number of files we are working with from 2000+ files to just 7 files.
This technique is also useful in these situations:
-
When the game files are a bunch of meaningless numbers, like 000_00_0.EVN.
-
When the game files are in sets, either in 2’s (texture + atlas) or in 3’s (texture + atlas + palette).
-
When the game files are shared and have very weird or unknown combinations.
Fundamental Terminology
RAM Memory
Some interesting facts about RAM (Random Access Memory)
-
It holds pretty much everything currently active on the screen.
-
Executable data is loaded to RAM, hence PC (program counter) is also a RAM address.
-
Breakpoint is triggered by ASM code on CPU side. DMA (Direct Memory Access) transfers will not trigger any breakpoints.
-
DMA from CD-ROM to RAM will not trigger any WRITE breakpoints.
-
DMA from RAM to GPU and SPU will not trigger any READ breakpoints.
-
-
Data read from CD-ROM cannot directly transfer to GPU and SPU. The (partial) data will need to be loaded to RAM first, then transfer to GPU and SPU for video/audio streaming playback.
-
That also means you cannot transfer any custom format data to GPU/SPU. These files will need to be converted first, so raw pixel data to GPU and raw audio data to SPU.
-
Modern GPU/SPU may accept compressed texture/audio file formats, but only a certain few. You’ll need to test for support first before transferring.
-
Compressed data are decompressed within RAM, from one RAM address to another RAM address.
-
Emulator Save State has a full dump of the whole RAM.
-
By removing all temporary data and data loaded from game files, RAM can be trimmed down and backed up as an in-game Save File.
-
C Dynamic Memory Allocation affects data only. Hence the variables, texture atlas, animation data and so on are always at different RAM addresses when loaded.
-
ASM code from the main executable is only loaded once, hence it is always at the same RAM address.
-
ASM code from overlays has a fixed RAM address to load to, but like data, it can be unloaded at any time when not needed.
Raw Image Data
In essence, image data is like an ASCII art, with 1 character representing 1 pixel. Example:
+---------------------+
| BBBB   IIII    GGGG |
| BB  B   II    GG    |
| BBBB    II    GG GG |
| BB  B   II    GG  G |
| BBBB   IIII    GGG  |
+---------------------+
The ASCII art "BIG" above has 21 characters per line and 5 lines. Converted to image data, you’ll get a 21x5 pixel image, with each letter "B", "I", "G" being a 7x5 pixel image.
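The same idea works in reverse: raw image data is just one long run of bytes, and the width decides where each line breaks. Here is a small Python sketch with a made-up 4x3 "image" and a made-up helper name.

```python
def rows_from_raw(data: bytes, width: int) -> list:
    """Split raw image data into rows of `width` bytes (1 byte = 1 pixel)."""
    return [data[i:i + width] for i in range(0, len(data), width)]

# 12 bytes of raw data for a 4x3 image.
raw = b"#..#" b".##." b"#..#"

print("width 4:")
for row in rows_from_raw(raw, 4):
    print(row.decode("ascii"))

print("width 5:")
for row in rows_from_raw(raw, 5):
    print(row.decode("ascii"))
```

With the right width the picture lines up. With the wrong width every row starts one pixel too late and the picture looks sheared, which is the usual sign to try another width in a raw image viewer.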
Here’s how the same raw image data looks on different apps:
App | Screenshot |
---|---|
HxD [hxd] hex editor |
|
HexView , from |
|
Cheat Engine [cheat] |
|
Nana [nana] raw image viewer |
|
GIMP [gimp] photo editor |
Pixel Space
The drawing XY are in space-like coordinates, similar to 3D space. The XY can range from -infinity to +infinity px. So all sprites are aligned with [0,0].
Basically, like 3D models, [0,0] makes the sprite coordinates self-contained. The game can then "transform" the sprite to its in-game entity position.
Some people refer to [0,0] as "origin" or "root". I call it "center point". And here is an article referring to it as "hot spots" [mckids].
Here is the process I use to convert XY from pixel space to canvas position:
From the image above, the drawing XY for the sprite is [-130,-160 , -130,10 , 150,10 , 150,-160].
First, we’ll need to prepare a canvas large enough to hold the full sprite. We will use the absolute value [absval] of the drawing XY to calculate the size.
Note
|
The absolute value of a number may be thought of as its distance from zero. It is thus always either a positive number or zero, but never negative. |
The absolute values for X are [130,130,150,150]. So the canvas width is:
max value * 2 = 150 px * 2 = 300 px
The absolute values for Y are [160,10,10,160]. So the canvas height is:
max value * 2 = 160 px * 2 = 320 px
With that we will prepare a 300x320 px canvas, with [150,160] representing [0,0] in pixel space. The sprite will be drawn at:
[-130+150 , -160+160] = [ 20 ,   0]
[-130+150 ,   10+160] = [ 20 , 170]
[ 150+150 ,   10+160] = [300 , 170]
[ 150+150 , -160+160] = [300 ,   0]
The resulting drawing position on the canvas is [20,0 , 20,170 , 300,170 , 300,0].
It is also very simple to revert a canvas position back to pixel space - just subtract [150,160].
Since everything is aligned to [0,0] in pixel space, it doesn’t matter what the final canvas size is; all sprites will align together correctly for animations.
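The canvas calculation above can be sketched in Python. The function names are made up, and the points are the ones from this example.

```python
def canvas_for(points: list) -> tuple:
    """Return ((width, height), (origin_x, origin_y)) for a list of XY points.

    The canvas is twice the largest absolute X and Y, so that [0,0]
    in pixel space lands on the canvas center.
    """
    half_width = max(abs(x) for x, _ in points)
    half_height = max(abs(y) for _, y in points)
    return (half_width * 2, half_height * 2), (half_width, half_height)

def to_canvas(points: list, origin: tuple) -> list:
    """Move points from pixel space to canvas position."""
    return [(x + origin[0], y + origin[1]) for x, y in points]

def to_pixel_space(points: list, origin: tuple) -> list:
    """Move points from canvas position back to pixel space."""
    return [(x - origin[0], y - origin[1]) for x, y in points]

sprite = [(-130, -160), (-130, 10), (150, 10), (150, -160)]
size, origin = canvas_for(sprite)
on_canvas = to_canvas(sprite, origin)

print(size)       # (300, 320)
print(origin)     # (150, 160)
print(on_canvas)  # [(20, 0), (20, 170), (300, 170), (300, 0)]
```

For an animation, passing the points of every frame to `canvas_for` at once gives a single canvas size that fits all frames while keeping [0,0] at the same spot.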
Links
- Debugger
- Tool
- Ghidra + Java
-
- Art + Assert
- Extra Readings
-
-
[mckids] https://web.archive.org/web/20241230132023/https://games.greggman.com/game/programming_m_c__kids/
-
[] https://web.archive.org/web/20240229214551/https://www.angelfire.com/tx5/someone42/psfrip.txt
-
[] https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
-
[] https://gitlab.winehq.org/wine/wine/-/wikis/Wine-Developer%27s-Guide/Debugging-Wine
-