Debug symbols wooh yeah!
- Enables introspection of binary executables
- Type information!
- Global symbols & address resolution
- Global methods
- Secrets the government doesn't want us to know 👽
Formats?
- PDB - Produced by MS compilers, proprietary format
- DWARF - Everybody else?
Core Ideas
struct Struct {
    Struct* StructPtr;
    int yeet[2];
    int yolo[3][2];
};
is (approximately) represented in debug symbol format as (pseudocode)
Struct = {
    members: [
        ("StructPtr", StructPtrType),
        ("yeet", ArrayType1),
        ("yolo", ArrayType2)
    ]
}
StructPtrType = Pointer(Struct)
ArrayType1 = Array(2, int)
ArrayType2 = Array(3, ArrayType1)
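To make the "types point at other types by reference" idea a bit more concrete, here's a minimal C++ sketch. Everything in it is made up for illustration: `Kind`, `TypeRecord`, `size_of` and `build_example` are hypothetical names, not anything from PDB or DWARF, and it assumes 4-byte ints, 8-byte pointers and no padding. The only point is that records refer to each other by table index, so the self-referencing pointer inside `Struct` is not a problem.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record kinds; real formats have many more.
enum class Kind { Int, Pointer, Array, Struct };

// One entry in the type table. Other types are referenced by table index.
struct TypeRecord {
    Kind kind;
    std::uint32_t target;  // Pointer: pointee index, Array: element type index
    std::uint32_t count;   // Array: number of elements
    std::vector<std::pair<std::string, std::uint32_t>> members;  // Struct: (name, type index)
};

// Size in bytes, ASSUMING 4-byte int, 8-byte pointers and no padding.
std::uint32_t size_of(const std::vector<TypeRecord>& types, std::uint32_t idx) {
    assert(idx < types.size());
    const TypeRecord& t = types[idx];
    switch (t.kind) {
        case Kind::Int:     return 4;
        case Kind::Pointer: return 8;  // pointee is not followed, which breaks the cycle
        case Kind::Array:   return t.count * size_of(types, t.target);
        case Kind::Struct: {
            std::uint32_t total = 0;
            for (const auto& member : t.members) total += size_of(types, member.second);
            return total;
        }
    }
    return 0;
}

// The type table for `struct Struct` from the example above.
std::vector<TypeRecord> build_example() {
    std::vector<TypeRecord> types;
    types.push_back({Kind::Int, 0, 0, {}});      // 0: int
    types.push_back({Kind::Struct, 0, 0,         // 1: Struct
                     {{"StructPtr", 2}, {"yeet", 3}, {"yolo", 4}}});
    types.push_back({Kind::Pointer, 1, 0, {}});  // 2: StructPtrType = Pointer(Struct)
    types.push_back({Kind::Array, 0, 2, {}});    // 3: ArrayType1 = Array(2, int)
    types.push_back({Kind::Array, 3, 3, {}});    // 4: ArrayType2 = Array(3, ArrayType1)
    return types;
}
```

With this table, `size_of(build_example(), 1)` adds up 8 + 8 + 24 = 40 bytes under the assumptions above.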
Libraries
- DbgHelp lel
- DIA SDK
- pdbparse on GitHub and pip by Moyix. Pretty old? Requires full parse before usable? No documentation?
- Roslyn has a C# implementation for Windows and Portable PDBs
- CCI is an old C# implementation
- libpdb is a Linux (platform limited? 🤔) lib to parse PDB files
- pdbex recreates struct/unions from PDB-files using DIA
- willglynn/pdb is a Rust library to lazily explore PDB files. Is Rust always hard to follow, or is the implementation spooky?
- microsoft-pdb from MS is the actual implementation of stuff. Seriously who wrote this, I get it, '92 and stuff, but when is `mpspnpn` or `mpsnsi` ever a good variable name???? 🙂🔫
References/Sources
- GREAT WRITEUP at LLVM by zjturner (see this)
- A presentation, also from LLVM.
- A Twitter thread which asks for non-DIA PDB parsers
  - Moyix recommended CCI
  - Others recommend DIA
  - Some bloke recommends Roslyn
- CCI from Microsoft - Old code for reading/writing PDB files in C#
- llvm-pdbutil which does work with PDB files
- Radare2 - OSS hacking/reversing/whatever tool with PDB capabilities
- MS_Symbol_Type_v1.0.pdf
C# packages?
Those are a bit messy, but after trying to sort things out it boils down to "Windows PDB in Roslyn" and "Portable PDB in symreader-portable".
- Microsoft.DiaSymReader
  - Project Website & Source Repo -> symreader
- symreader
  - Managed COM definition for DiaSymReader
  - Windows PDB implementation -> Microsoft.DiaSymReader.Native
  - Portable PDB implementation -> Microsoft.DiaSymReader.PortablePdb
- Microsoft.DiaSymReader.Native
  - Project Website -> roslyn
- Microsoft.DiaSymReader.PortablePdb
  - Project Website & Source Repo -> symreader-portable
- symreader-portable
  - COM interface definition -> Microsoft.DiaSymReader
  - Implements DiaSymReader interface for Portable PDB
Portable PDB
- Portable PDB format intro.
Can't find any source that explicitly says "This is CLI only and not for any native images" ¯\_(ツ)_/¯
MS PDB implementation notes
- Most of the code deals with `PB` (pointer bytes) for everything. What is typing? Who knows.
- `PB` often points at a `REC` which consists of a `(whole_record_byte_count : u16, leaf_type : u16)`-tuple that is directly followed by type-specific data.
- The method for extracting the name from a type record calls `cbNumField` which calls `CbGetNumericData` which is lacking an implementation.
  - BUT searching GitHub for methods of that name gives a few interesting repos, like this and this. Claims to be NT 4 source code?
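The `REC` layout noted above (a u16 byte count, a u16 leaf type, then type-specific data) can be walked with something like the sketch below. `RecordView`, `read_record` and `next_record_offset` are hypothetical names, not from microsoft-pdb. It assumes little-endian fields and that the byte count covers everything after the count field itself (leaf type plus payload), which is how the LLVM CodeView docs describe it, so treat that as an assumption when comparing with the "whole record" wording above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of one type record inside a raw byte buffer.
struct RecordView {
    std::uint16_t length;      // byte count stored in the record header
    std::uint16_t leaf;        // leaf type, e.g. 0x1505 for LF_STRUCTURE
    const std::uint8_t* data;  // type-specific payload right after the header
    std::size_t data_size;     // payload size in bytes
};

// Reads a little-endian u16 without relying on alignment.
static std::uint16_t read_u16_le(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Parses the record header at `offset`. Returns false if the buffer is too
// short or the stored length is inconsistent.
// ASSUMPTION: `length` counts the bytes following the length field.
bool read_record(const std::uint8_t* buf, std::size_t size,
                 std::size_t offset, RecordView* out) {
    if (offset > size || size - offset < 4) return false;  // need length + leaf
    const std::uint16_t length = read_u16_le(buf + offset);
    if (length < 2 || length > size - offset - 2) return false;
    out->length = length;
    out->leaf = read_u16_le(buf + offset + 2);
    out->data = buf + offset + 4;
    out->data_size = static_cast<std::size_t>(length) - 2;
    return true;
}

// Offset of the record that follows `rec`, under the same length assumption.
std::size_t next_record_offset(std::size_t offset, const RecordView& rec) {
    return offset + 2 + rec.length;
}
```

Calling `read_record` at offset 0 and then repeatedly at `next_record_offset(...)` steps through a buffer one record at a time until it returns false.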
- TI from name at `BOOL TPI1::QueryTiForUDT(const char *sz, BOOL fCase, OUT TI* pti)`
- hash string:
  ```c++
  LHASH hashSz(SZ_CONST sz) const {
      size_t cch = strlen(sz);
      return LHashPbCb((PB)sz, cch, hdr.tpihash.cHashBuckets);
  }
  ```
  More hashing in misc.h
- LLVM hashing
  LLVM hashes in `\llvm-project\llvm\lib\DebugInfo\PDB\Native\Hash.cpp`. LLVM TPI hash in `llvm-project\llvm\lib\DebugInfo\PDB\Native\TpiStream.cpp` and writing in `TpiStreamBuilder.cpp`.
  Seems that UDTs (non-forward?) are hashed on name (unique or not depending on scope-flag?) using the V1 hash. FWD entries are hashed on the whole bytes of the record tho??? But then that's not used. Seems they are also hashed on name and that's used.
  In the file there's a hash for every TI, and bitfields with {valid, valid} kinda, so for every TI there is {hash, valid, valid}. Make a bucket idx from the hash by % (#buckets). Make a list yourself at HashLists[BucketId] and append the TI to it. Then to find a thing by name, make the hash, walk the list, match the LeafID, hash(?), name.
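The bucket-list lookup described above could look roughly like this. It's a sketch under loud assumptions: `TypeEntry`, `NameIndex` and `toy_hash` are hypothetical names, and `toy_hash` is plain FNV-1a standing in for the real V1 hash, so the bucket numbers will not match an actual PDB. In a real file the per-TI hash would be read from the stream instead of recomputed from the name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, already-decoded type record: TI, leaf type and name.
struct TypeEntry {
    std::uint32_t ti;
    std::uint16_t leaf;
    std::string name;
};

// Stand-in hash (FNV-1a). NOT the PDB V1 hash, just something deterministic.
std::uint32_t toy_hash(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

class NameIndex {
public:
    // Builds HashLists[bucket] by appending every entry to its bucket.
    NameIndex(const std::vector<TypeEntry>& entries, std::uint32_t bucket_count)
        : entries_(entries), buckets_(bucket_count) {
        assert(bucket_count > 0);
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            buckets_[toy_hash(entries_[i].name) % bucket_count].push_back(i);
        }
    }

    // Hashes the name, walks that bucket's list and matches leaf type + name.
    // Returns 0 when nothing matches (hypothetical "not found" convention).
    std::uint32_t find(std::uint16_t leaf, const std::string& name) const {
        const std::vector<std::size_t>& chain =
            buckets_[toy_hash(name) % buckets_.size()];
        for (std::size_t i : chain) {
            const TypeEntry& e = entries_[i];
            if (e.leaf == leaf && e.name == name) return e.ti;
        }
        return 0;
    }

private:
    std::vector<TypeEntry> entries_;
    std::vector<std::vector<std::size_t>> buckets_;
};
```

A tiny bucket count forces collisions, which is a handy way to check that the walk really compares leaf type and name instead of trusting the bucket alone.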
- Index&Offset: A plain "skip list" of {TI, offset}. Given a TI, "guess" where it should be in the list (like ((TI-min_ti)/TI_Count) * ListLen) and walk up/down until a {TI, offset} is found that is as close below the target TI as possible, then iterate records from that offset forward until found.
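A sketch of that guess-and-walk lookup, assuming the list is sorted by TI and its first entry is at or below the target. `IndexOffset` and `start_offset_for` are hypothetical names, and the guess uses integer math (multiply first, then divide) so it doesn't collapse to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One {TI, offset} pair from the index/offset list.
struct IndexOffset {
    std::uint32_t ti;
    std::uint32_t offset;
};

// Returns the record offset to start scanning from: the offset of the last
// entry whose TI is <= target.
// ASSUMES: table is non-empty, sorted by ti, table.front().ti <= target,
// and min_ti <= target < min_ti + ti_count.
std::uint32_t start_offset_for(const std::vector<IndexOffset>& table,
                               std::uint32_t min_ti, std::uint32_t ti_count,
                               std::uint32_t target) {
    assert(!table.empty() && ti_count > 0);
    assert(target >= min_ti && target - min_ti < ti_count);
    assert(table.front().ti <= target);

    // Initial guess: ((TI - min_ti) / TI_Count) * ListLen.
    std::size_t i = static_cast<std::size_t>(
        (static_cast<std::uint64_t>(target - min_ti) * table.size()) / ti_count);
    if (i >= table.size()) i = table.size() - 1;

    while (i > 0 && table[i].ti > target) --i;                      // guessed too high
    while (i + 1 < table.size() && table[i + 1].ti <= target) ++i;  // guessed too low
    return table[i].offset;
}
```

From the returned offset, records are then read one after another, counting TIs upward from the entry's TI, until the target TI is reached.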
- CCI field types
MS PDB implementation Glossary
- Leaf - A thing that identifies a type record. Though the type record doesn't need to be a leaf in a graph/tree of the type composition.