Louis Better than before

Introduction to Name Mangling and Demangling

Introduction

When developing in C++, one common task is demangling the name of a C++ method. Linkers only support C identifiers for symbol names and have no knowledge of C++ namespaces, classes, overloaded functions, etc. That means the C++ compiler needs to generate C-identifier-compatible symbols for C++ constructs. This process is called “name mangling”, the resulting symbol is a “mangled symbol”, and reconstructing the original C++ name is “demangling”. In this post, I would like to go through the concept of name mangling and explain some ways to demangle names.

Name Mangling

No single mangling scheme is shared by all compilers: GCC and Clang follow the scheme defined by the Itanium C++ ABI, while Microsoft Visual C++ uses its own. Name mangling is the process in which the compiler adds descriptive data to a function’s identifier when it emits the symbol, before the linker ever sees it. This data indicates which namespace and class (if any) the function belongs to, along with the types of the arguments it is designed to handle and the order in which those arguments are passed.
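To make this concrete, here is a small sketch of my own (the function and namespace names are hypothetical, not taken from any real project). The symbol names in the comments are what I would expect from a compiler that follows the Itanium C++ ABI, such as GCC or Clang; other compilers will produce something different, and macOS adds one more leading underscore.

```cpp
// Two overloads share the source-level name "add", so the compiler has to
// give each one a distinct linker symbol by encoding the parameter types.
int add(int a, int b) { return a + b; }           // expected symbol: _Z3addii
double add(double a, double b) { return a + b; }  // expected symbol: _Z3adddd

namespace geometry {
// The enclosing namespace becomes part of the (nested) mangled name.
int area(int w, int h) { return w * h; }          // expected symbol: _ZN8geometry4areaEii
}

// extern "C" turns mangling off, so the symbol stays "c_add".
// The price is that such a function cannot be overloaded.
extern "C" int c_add(int a, int b) { return a + b; }
```

Compiling this to an object file and listing it with nm should show the four symbols, assuming an Itanium-ABI toolchain.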

How names are mangled

There isn’t a standardized scheme by which even trivial C++ identifiers are mangled, and consequently different compilers can mangle public symbols in radically different ways. Here I use the “clang-1205.0.22.9” C++ compiler to mangle a simple class with an empty constructor and a member function that takes a container.

// Basic class
#include <map>      // std::map
#include <utility>  // std::pair
using namespace std;

class CA
{
    public:
        CA() {}
        void func1(map<pair<int, int>, int>& container);
};

$ nm basic_class.o | rg "CA"
0000000000000000 T __ZN2CA5func1ERNSt3__13mapINS0_4pairIiiEEiNS0_4lessIS3_EENS0_9allocatorINS2_IKS3_iEEEEEE
0000000000000600 T __ZN2CAC1Ev
00000000000009e0 T __ZN2CAC2Ev

<Note> Attributes in the mangled name:

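  • Indication that the name is mangled: _Z (macOS adds one more leading underscore, hence __Z above)
  • Nested name indication: N, closed by a matching E
  • Each component of the method name, including namespaces, classes, and the method itself, written as a length followed by the identifier (2CA, 5func1)
  • Argument types (for example v for an empty parameter list, i for int)
  • Const indication: K
  • Reference indication: R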
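The attributes above can be sketched as a toy encoder. This is only my own illustration of the pattern, not how a real compiler does it: the helper name mangle_nested is hypothetical, the parameter list must already be encoded by the caller, and templates, substitutions, constructors and operators are all out of scope.

```cpp
#include <string>
#include <vector>

// Toy encoder for the pattern _Z N <len><name>... E <params>.
// Handles only plain namespace/class/function identifiers.
std::string mangle_nested(const std::vector<std::string>& components,
                          const std::string& encoded_params = "v") {
    std::string out = "_ZN";                  // _Z: mangled name, N: nested name begins
    for (const std::string& c : components) {
        out += std::to_string(c.size());      // length prefix
        out += c;                             // the identifier itself
    }
    out += "E";                               // nested name ends
    out += encoded_params;                    // "v" for (), "Ri" for (int&), "RKi" for (const int&)
    return out;
}
```

For example, mangle_nested({"CA", "func1"}) gives _ZN2CA5func1Ev, which matches the symbol nm prints for CA::func1() later in this post (minus the extra macOS underscore).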

You can find more information in Wiki: Name mangling.

Useful Utilities

To make mangled names readable, GNU Binutils provides two useful utilities that can decipher individual symbols and demangle compiled C++ names: c++filt and nm. LLVM ships compatible versions, which are the ones shown in the output below. Mangling is what keeps overloaded functions from clashing at link time; the c++filt utility reverses it, turning low-level names back into user-level names. The nm utility examines binary files and displays their symbols, and it also supports an option to demangle low-level symbol names into user-level names (with GNU nm, an optional demangling style argument can be used to choose an appropriate demangling style for your compiler).

Using c++filt binary tool

The c++filt utility can decipher individual symbols given on the command line. It can also work as a filter: pass it an entire assembler source file containing mangled names and you get back the same file with demangled names. Here’s the utility’s syntax:

$ c++filt -h
OVERVIEW: llvm symbol undecoration tool

USAGE: c++filt [options] <mangled>

The following example shows a simple way to use this command:

$ c++filt -n _Z1fv
f()
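The same demangling can also be done from inside a program. Below is a minimal sketch, assuming a GCC or Clang toolchain whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is my own. Symbols copied from macOS nm output carry one extra leading underscore, which may need to be stripped first depending on the runtime.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang runtimes)
#include <cstdlib>    // std::free
#include <string>

// Returns the demangled form of an Itanium-ABI symbol, or the input
// unchanged when the runtime reports that it could not demangle it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer the runtime allocates the result with
    // malloc, so the caller is responsible for freeing it.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Calling demangle("_Z1fv") should give f(), the same answer c++filt printed above.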

Using nm binary tool

The nm utility lists the symbols in object files. Here’s the utility’s syntax:

$ nm -h
OVERVIEW: llvm symbol table dumper
USAGE: nm [options] <input files>

And the following examples show how the command works:

$ nm inherit.o
0000000100003e54 s GCC_except_table19
0000000100003e94 s GCC_except_table22
0000000100003eb8 s GCC_except_table54
0000000100003ecc s GCC_except_table59
                 U __Unwind_Resume
0000000100002b70 T __ZN2CA5func1Ev
0000000100002c80 T __ZN2CA5func2Ev
0000000100002cc0 T __ZN2CA5func3Ev
0000000100002fd0 t __ZN2CAC1Ev
0000000100003030 t __ZN2CAC2Ev
...

$ nm --demangle inherit.o
0000000100003e54 s GCC_except_table19
0000000100003e94 s GCC_except_table22
0000000100003eb8 s GCC_except_table54
0000000100003ecc s GCC_except_table59
                 U __Unwind_Resume
0000000100002b70 T CA::func1()
0000000100002c80 T CA::func2()
0000000100002cc0 T CA::func3()
...

<Note> The common symbol types:

Shorthand   Type
t or T      The symbol is in the text (code) section.
s or S      The symbol is in an uninitialized or zero-initialized data section for small objects.
U           The symbol is undefined.
g or G      The symbol is in an initialized data section for small objects. Some object file formats permit more efficient access to small data objects, such as a global int variable as opposed to a large global array.
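If you want to post-process nm output yourself, each row splits into an optional address, a type letter, and a symbol name. Here is a rough sketch; NmEntry and parse_nm_line are hypothetical names of mine, and the parser assumes the default output format with mangled (space-free) names, i.e. nm run without --demangle.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One row of default nm output.
struct NmEntry {
    std::string address;  // empty for undefined symbols
    char type;            // shorthand letter such as 'T', 's' or 'U'
    std::string name;     // mangled symbol name
};

// Parses lines such as "0000000100002b70 T __ZN2CA5func1Ev" or
// "                 U __Unwind_Resume". Returns false for any other shape.
bool parse_nm_line(const std::string& line, NmEntry& out) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    for (std::string f; in >> f;) fields.push_back(f);

    if (fields.size() == 3 && fields[1].size() == 1) {
        out.address = fields[0];
        out.type = fields[1][0];
        out.name = fields[2];
        return true;
    }
    if (fields.size() == 2 && fields[0].size() == 1) {
        out.address.clear();   // undefined symbols have no address column
        out.type = fields[0][0];
        out.name = fields[1];
        return true;
    }
    return false;
}
```

The type letter can then be looked up in the table above, and the name can be handed to a demangler.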

Reference

[1] Programming Utilities Guide: C++ Mangled Symbols

[2] Wiki: Name mangling

[3] The Secret Life of C++: Symbol Mangling

[4] GNU Binary Utilities: c++filt

[5] cpp_demangle: a C++ linker symbol demangler

Feel free to leave comments below or email me. Any advice is always welcome. :)