Introduction to Name Mangling and Demangling
09/09/2021 Tags: C_C_plus_plus Programming
Introduction
When developing in C++, a common task is demangling the name of a C++ function or method. Linkers only support C-style identifiers for symbol names and have no knowledge of C++ namespaces, classes, overloaded functions, and so on, so the C++ compiler has to generate C-compatible symbols for C++ constructs. This process is called “name mangling”, the resulting symbol is a “mangled symbol”, and reconstructing the original C++ name is “demangling”. In this post, I would like to go through the concept of name mangling and explain some ways to demangle names.
Name Mangling
There is no single mangling scheme. GCC and Clang follow the scheme defined by the Itanium C++ ABI on most platforms, while Microsoft’s compiler (MSVC) uses its own, different scheme. Name mangling is the process in which the compiler adds descriptive data to a function’s identifier at compile time, so that the linker sees a unique symbol for each function. This data indicates which namespace and class (if any) the function belongs to, along with the types of the arguments it takes and the order in which they are passed.
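To make the motivation concrete, here is a minimal sketch with two overloads and one extern "C" function. The function names are made up for illustration, and the symbols in the comments are what the Itanium C++ ABI scheme (used by GCC and Clang) is expected to produce; other compilers will differ, and Mach-O targets add one more leading underscore.

```cpp
// Two overloads share the source-level name "add", but the linker needs
// a distinct symbol for each, so the parameter types are encoded.
int add(int a, int b) { return a + b; }           // expected symbol: _Z3addii
double add(double a, double b) { return a + b; }  // expected symbol: _Z3adddd

// extern "C" asks for C linkage, which disables mangling. The symbol stays
// "c_add", and for the same reason such functions cannot be overloaded.
extern "C" int c_add(int a, int b) { return a + b; }
```

Compiling this file to an object file and listing its symbols with nm should show one entry per function.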
How names are mangled
The C++ standard doesn’t define a scheme by which even trivial C++ identifiers are mangled, and consequently different compilers can mangle public symbols in radically different ways. Here I use the “clang-1205.0.22.9” C++ compiler to mangle a simple class with an empty constructor and a member function that takes a container.
// Basic class
#include <map>
#include <utility>
using namespace std;

class CA
{
public:
    CA() {}
    void func1(map<pair<int, int>, int>& container);
};
$ nm basic_class.o | rg "CA"
0000000000000000 T __ZN2CA5func1ERNSt3__13mapINS0_4pairIiiEEiNS0_4lessIS3_EENS0_9allocatorINS2_IKS3_iEEEEEE
0000000000000600 T __ZN2CAC1Ev
00000000000009e0 T __ZN2CAC2Ev
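<Note> Attributes in the mangled name:
- Indication that the name is mangled: the _Z prefix (Mach-O symbols on macOS carry one more leading underscore, hence __Z above)
- Nested name indication: N ... E
- Each component of the method name, including namespaces, classes, and the function name, encoded as its length followed by the identifier (for example 2CA and 5func1)
- Argument types (for example i for int and v for an empty argument list)
- Const indication: K
- Reference indication: R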
You can find more information on Wikipedia: Name mangling
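The length-prefixed encoding in the note above can be sketched in a few lines. This is a toy, not a real mangler: mangle_nested_void is a hypothetical helper that only handles plain identifiers and a function with no parameters, and it ignores templates, substitutions, constructors, the std abbreviations, and everything else in the Itanium scheme.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): builds an Itanium-style
// symbol for a function that takes no arguments, given its name components
// from outermost scope to the function name itself.
std::string mangle_nested_void(const std::vector<std::string>& components) {
    std::string out = "_Z";                  // marks the symbol as mangled
    const bool nested = components.size() > 1;
    if (nested) out += 'N';                  // opens a nested (qualified) name
    for (const std::string& part : components) {
        out += std::to_string(part.size());  // length prefix
        out += part;                         // the identifier itself
    }
    if (nested) out += 'E';                  // closes the nested name
    out += 'v';                              // "v": empty parameter list
    return out;
}
```

Calling mangle_nested_void({"CA", "func1"}) yields _ZN2CA5func1Ev, which matches the symbol nm reports for CA::func1() apart from the extra leading underscore on macOS.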
Useful Utilities
To make mangled names readable, GNU Binutils provides two useful utilities for deciphering symbols and demangling compiled C++ names: c++filt and nm (LLVM ships compatible versions, which are the ones used in the examples below). The c++filt utility demangles low-level names into user-level names, reversing the encoding that keeps overloaded functions from clashing at link time. The nm utility examines binary files and lists the symbols they contain, and it also supports an option to demangle low-level symbol names into user-level names (an optional demangling style argument can be used to choose an appropriate demangling style for your compiler).
Using the c++filt binary tool
The c++filt utility takes mangled names either as command-line arguments or on standard input. You can pipe an entire assembler source file containing mangled names through it and get back the same file with the names demangled. Here’s the utility’s syntax:
$ c++filt -h
OVERVIEW: llvm symbol undecoration tool
USAGE: c++filt [options] <mangled>
The following example shows a simple way to use the command (-n tells c++filt not to strip a leading underscore from the name):
$ c++filt -n _Z1fv
f()
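The same demangling can be done from inside a program. The sketch below assumes a GCC or Clang toolchain, where <cxxabi.h> provides abi::__cxa_demangle. The demangle wrapper and the FreeDeleter struct are illustrative names, and the input is assumed to be in the _Z... form, without the extra Mach-O underscore.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'ed buffer, so it must be released with free.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Illustrative wrapper: returns the demangled name, or the input unchanged
// when the demangler reports a failure (non-zero status).
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    if (status == 0 && result) {
        return std::string(result.get());
    }
    return mangled;
}
```

For example, demangle("_Z1fv") gives f(), the same answer as the c++filt invocation above.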
Using the nm binary tool
The nm utility lists the symbols in object files. Here’s the utility’s syntax:
$ nm -h
OVERVIEW: llvm symbol table dumper
USAGE: nm [options] <input files>
The following examples show how the command works:
$ nm inherit.o
0000000100003e54 s GCC_except_table19
0000000100003e94 s GCC_except_table22
0000000100003eb8 s GCC_except_table54
0000000100003ecc s GCC_except_table59
U __Unwind_Resume
0000000100002b70 T __ZN2CA5func1Ev
0000000100002c80 T __ZN2CA5func2Ev
0000000100002cc0 T __ZN2CA5func3Ev
0000000100002fd0 t __ZN2CAC1Ev
0000000100003030 t __ZN2CAC2Ev
...
$ nm --demangle inherit.o
0000000100003e54 s GCC_except_table19
0000000100003e94 s GCC_except_table22
0000000100003eb8 s GCC_except_table54
0000000100003ecc s GCC_except_table59
U __Unwind_Resume
0000000100002b70 T CA::func1()
0000000100002c80 T CA::func2()
0000000100002cc0 T CA::func3()
...
<Note> Common symbol types (a lowercase letter usually means the symbol is local, an uppercase letter means it is global):
Shorthand | Type |
---|---|
t or T | The symbol is in the text (code) section |
s or S | The symbol is in an uninitialized or zero-initialized data section for small objects. |
U | The symbol is undefined |
g or G | The symbol is in an initialized data section for small objects. Some object file formats permit more efficient access to small data objects, such as a global int variable as opposed to a large global array. |
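To connect the table to source code, here is a hypothetical translation unit. The function names are invented for illustration, and the symbol types in the comments are what one would typically expect from an unoptimized build; an optimizing compiler may inline the static helper so that it never appears in the symbol table.

```cpp
#include <cstdio>

// External linkage: expected to be listed with an uppercase T.
int exported_square(int x) { return x * x; }

// Internal linkage: if emitted out of line, expected with a lowercase t.
static int local_helper(int x) { return x + 1; }

// puts is defined in the C library, not in this object file, so the
// reference to it is expected to be listed as U (undefined).
int report(int x) {
    std::puts("report called");
    return exported_square(local_helper(x));
}
```

Compiling this with -c and running nm --demangle on the resulting object file should show the three categories side by side.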
Reference
[1] Programming Utilities Guide: C++ Mangled Symbols
[3] The Secret Life of C++: Symbol Mangling
[4] GNU Binary Utilities: c++filt
[5] cpp_demangle: a C++ linker symbol demangler
Feel free to leave comments below or email me. Any advice is always welcome. :)