Assembly Language: The Heart of Reverse Engineering

Posted on: 7 September 2023 at 01:19 (18 min read)

Image by Bruno from Pixabay

Introduction

As of September 2023, Wikipedia lists 679 languages in its List of Programming Languages.

Within the overwhelming landscape of languages, Assembly stands out as one of the most fundamental. It occupies a unique position, acting as a bridge between the abstract world of high-level languages and the raw binary of machine code.

It’s a language that might seem arcane to some, but for those delving into the world of reverse engineering, it’s an essential companion. It’s a world where the programmer is in direct conversation with the computer’s hardware, crafting instructions that are both explicit and powerful!

What to expect from this article

As we journey through the intricacies of Assembly and its role in reverse engineering, readers can expect to:

After reading this article, you will have a solid grasp of why Assembly is often dubbed the “heart” of reverse engineering and why, despite the rise of modern high-level languages, it remains a vital skill for many.

What is Assembly Language?

Assembly language, often simply referred to as ‘Assembly’ or ASM for short, is a low-level programming language that is specific to a computer’s architecture.

The Bridge Between High-Level Languages and Machine Code

Imagine high-level languages as the spoken ones we use in our daily lives – rich, expressive, and abstract. They allow us to convey complex ideas without delving into the minutiae of how our thoughts are processed.

Machine code, on the other hand, is like the electrical impulses in our brain – direct, binary, and not meant for human interpretation.

Unlike high-level languages like Python or Java, which are designed for readability and abstraction, Assembly is much closer to machine code – the binary language that computers natively understand.

When a programmer writes in Assembly, they’re essentially giving direct instructions to the computer’s hardware, making it a powerful tool for those who truly want mastery over a machine’s operation.

Language Processing Flow

To understand the role of Assembly in reverse engineering, it’s important to understand the flow of language processing in a computer.

At a high level, it looks something like this:

The Role of an Assembler

An assembler is a specialised tool that plays a pivotal role in the world of Assembly language. Its function is to take the code written by programmers and assemble it into machine code that can be executed by a computer’s central processing unit (CPU).

Think of it as a translator, converting the structured, mnemonic instructions of Assembly into the binary language of 1s and 0s that a machine understands.

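For instance, an Assembly instruction to load a value into a register might look like MOV AX, 1. The assembler would take this instruction and convert it into the corresponding binary code, which in 16-bit mode is 10111000 00000001 00000000 (hex B8 01 00): one opcode byte that selects the AX register, followed by the 16-bit value 1 stored least significant byte first.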

This process of translation ensures that the programmer can work with a language that is more comprehensible than raw binary, while still producing code that runs with the efficiency and speed of machine code.
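To make the translation step concrete, here is a minimal Python sketch of what an assembler does for a single instruction form. It only handles MOV with an 8-bit or 16-bit register and an immediate value, and it assumes 16-bit mode. The function names `assemble_mov_imm` and `as_bits` are made up for this illustration; a real assembler handles far more.

```python
# Register numbers as they appear in the low three bits of the opcode byte.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def assemble_mov_imm(register: str, value: int) -> bytes:
    """Encode `MOV register, value` for 8-bit and 16-bit registers (16-bit mode)."""
    register = register.upper()
    if register in REG8:
        # MOV r8, imm8: opcode 0xB0 + register number, then one value byte.
        return bytes([0xB0 + REG8[register]]) + value.to_bytes(1, "little")
    if register in REG16:
        # MOV r16, imm16: opcode 0xB8 + register number, then two value bytes,
        # least significant byte first (little-endian).
        return bytes([0xB8 + REG16[register]]) + value.to_bytes(2, "little")
    raise ValueError(f"unsupported register: {register}")


def as_bits(code: bytes) -> str:
    """Render machine code as space-separated groups of eight bits."""
    return " ".join(f"{byte:08b}" for byte in code)


print(as_bits(assemble_mov_imm("AL", 1)))  # 10110000 00000001
print(as_bits(assemble_mov_imm("AX", 1)))  # 10111000 00000001 00000000
```

Note how the choice of register changes both the opcode byte and the number of bytes that follow it.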


In understanding Assembly language, one appreciates the delicate balance it strikes – offering a glimpse into the machine’s soul while still being accessible to human intellect.

It’s this balance that makes Assembly both challenging and rewarding for those who persist despite the difficulty and attain fluency. That being said, there is still great value in having a basic familiarity with the low-level operation of our tools.

The Role of Assembly in Reverse Engineering

Reverse engineering is the art and science of taking apart a product (often software) to understand how it works, either for replicating it, improving upon it, or identifying vulnerabilities.

In the context of software, this often means diving deep into its compiled code to decipher the original source or functionality.

Since most software is compiled down to machine code for execution, understanding that machine code is critical for reverse engineers.

x86 and x64

x86 refers to a family of instruction set architectures (ISAs) that originated with the 16-bit Intel 8086 microprocessor. Over time, the term “x86” has come to refer mostly to the 32-bit processors in that family, which share a common instruction set and architectural features.

x64, on the other hand, refers to the 64-bit extension of the x86 instruction set, which remains backward compatible with it. It is also known as x86-64 or AMD64.

These architectures matter because they are the ones you will most often find in modern desktops, laptops, and servers.

Key Components of Assembly Language

As we’ve established, Assembly language is inherently different from high-level languages. Its components are more rudimentary, reflecting the basic operations that a computer can perform.

Let’s delve into some of these essential components and understand their significance.

Registers

Registers are one of the foundational elements of computer architecture. They’re akin to small storage units where data can be stored, retrieved, and manipulated during program execution.

Function of Registers

A register is a small, fast storage location directly within the CPU. It can hold data, instructions, addresses, or any other kind of information that the CPU might need immediate access to.

Because of their location and size, registers allow for rapid data retrieval and manipulation, which makes them essential for efficient program execution.

Commonly Used Registers and Their Roles

Different computer architectures have different registers, and the names will vary depending on the size. For instance, in the x86 architecture, there are 8-bit, 16-bit, 32-bit, and 64-bit registers.

Below, I’ll mostly be referring to the 32-bit registers; however, it’s good to be aware that the smaller registers are views onto the least significant bytes of the larger ones.

For example:
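As a rough illustration, the Python sketch below models how the names RAX, EAX, AX, AH, and AL select different slices of the same 64-bit value. The function name `register_views` and the sample value are made up for this example; on real hardware these are simply different ways of addressing one register.

```python
def register_views(rax: int) -> dict[str, int]:
    """Return the value seen through each named slice of a 64-bit RAX value."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8-15, the high byte of AX
        "AL": rax & 0xFF,                 # low 8 bits
    }


for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
# RAX = 0x1122334455667788
# EAX = 0x55667788
#  AX = 0x7788
#  AH = 0x77
#  AL = 0x88
```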

With that in mind, here are some of the common 32-bit registers to be aware of:

Instructions

Instructions are the heart of the Assembly language. They dictate the operations that the CPU should perform.

Understanding Common Assembly Instructions

The Role of Operands

Operands are the values or registers that an instruction operates on. For instance, in the instruction MOV AX, 1, AX and 1 are the operands.

The instruction specifies the operation, while the operands determine the data or locations involved in that operation.
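The split between an instruction and its operands can be sketched with a toy interpreter in Python. This is an assumption-laden model, not how a CPU works internally: it supports only MOV, ADD, and SUB, four 16-bit registers, and register or immediate operands. The function name `run` is invented for the example.

```python
def run(program: list[str]) -> dict[str, int]:
    """Execute a toy subset of Assembly (MOV, ADD, SUB) on 16-bit registers."""
    registers = {"AX": 0, "BX": 0, "CX": 0, "DX": 0}

    def read(operand: str) -> int:
        # An operand is either a register name or an immediate number.
        return registers[operand] if operand in registers else int(operand, 0)

    for line in program:
        # "MOV AX, 1" -> mnemonic "MOV", destination "AX", source "1".
        mnemonic, _, rest = line.strip().partition(" ")
        dest, src = (part.strip().upper() for part in rest.split(","))
        if dest not in registers:
            raise ValueError(f"destination must be a register: {dest}")
        value = read(src)
        mnemonic = mnemonic.upper()
        if mnemonic == "MOV":
            result = value
        elif mnemonic == "ADD":
            result = registers[dest] + value
        elif mnemonic == "SUB":
            result = registers[dest] - value
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
        registers[dest] = result & 0xFFFF  # wrap around at 16 bits
    return registers


state = run(["MOV AX, 1", "MOV BX, 41", "ADD AX, BX", "SUB BX, 1"])
print(state)  # {'AX': 42, 'BX': 40, 'CX': 0, 'DX': 0}
```

The mnemonic picks the operation; the operands decide which register is written and where the input value comes from.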

Flags

Flags are individual bits in a special status register (EFLAGS on 32-bit x86) that record the outcomes of certain operations, especially arithmetic ones. They play a pivotal role in decision-making and control-flow within Assembly programs.

The Purpose of Flags in Decision-making

After an operation, flags can be set, cleared, or left unchanged based on the result. These flags can then be checked to make decisions.

For instance, after a subtraction operation, if the result is zero, the Zero flag will be set. This can be checked to make decisions in the program.
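The subtraction case can be modelled in a few lines of Python. This sketch assumes 32-bit operands and computes only three flags (Zero, Sign, and Carry); the function name `sub32` is hypothetical, and a real CPU updates further flags as well.

```python
def sub32(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """Model a 32-bit SUB: return the result and the flags it would set."""
    mask = 0xFFFFFFFF
    a, b = a & mask, b & mask
    result = (a - b) & mask
    flags = {
        "ZF": result == 0,         # Zero flag: the result is zero
        "SF": bool(result >> 31),  # Sign flag: top bit of the result is set
        "CF": a < b,               # Carry flag: an unsigned borrow occurred
    }
    return result, flags


result, flags = sub32(5, 5)
if flags["ZF"]:
    # This is the branch a conditional jump such as JZ would take.
    print("operands were equal")
```

Comparing two values in Assembly typically works this way: subtract, then branch on the flags.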

Common Flags


Understanding these components is important for anyone delving into Assembly language – they form the basic building blocks upon which all Assembly programs are constructed.

Basic Assembly Syntax and Structure

Assembly language, while being low-level, still has a structure and syntax that programmers must adhere to. This structure varies slightly between different Assembly syntaxes, the most common of which are AT&T syntax and Intel syntax.

AT&T vs Intel Syntax

AT&T and Intel syntax are two different ways of writing Assembly code. They differ in several ways, including operand order, immediate value representation, and register naming.

Understanding these differences is crucial when reading or writing Assembly code, as the same instruction can look very different.
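To show how mechanical some of these differences are, here is a small Python sketch that rewrites a very limited form of Intel syntax as AT&T syntax. It assumes two-operand instructions on 32-bit registers, with register or immediate operands only (no memory operands), and the function name `intel_to_att` is made up for this example.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}


def intel_to_att(instruction: str) -> str:
    """Convert a two-operand, 32-bit Intel-syntax instruction to AT&T syntax.

    Only register and immediate operands are handled; memory operands are not.
    """
    mnemonic, _, rest = instruction.strip().partition(" ")
    dest, src = (part.strip().lower() for part in rest.split(","))
    if dest not in REGISTERS:
        raise ValueError(f"destination must be a 32-bit register: {dest}")

    def convert(operand: str) -> str:
        if operand in REGISTERS:
            return f"%{operand}"  # registers get a % prefix
        int(operand, 0)           # raises ValueError if this is not a number
        return f"${operand}"      # immediates get a $ prefix

    # AT&T puts the source first and conventionally adds a size suffix
    # to the mnemonic ("l" for 32-bit operands).
    return f"{mnemonic.lower()}l {convert(src)}, {convert(dest)}"


print(intel_to_att("mov eax, 1"))    # movl $1, %eax
print(intel_to_att("add ebx, eax"))  # addl %eax, %ebx
```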

Comments, Directives, and Labels

Assembly language also includes comments, directives, and labels, which help to structure the code and make it more readable.
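As a rough sketch of how these elements differ on the page, the Python function below sorts source lines into categories. It assumes NASM-style conventions (comments start with a semicolon, labels end with a colon) and recognises only a handful of directives; the name `classify` and the directive list are choices made for this example, and other assemblers use different conventions.

```python
DIRECTIVES = {"section", "global", "extern", "bits"}  # a small NASM-style subset


def classify(line: str) -> str:
    """Label one source line as blank, comment, label, directive, or instruction."""
    # Drop anything after ';' (this ignores semicolons inside string literals).
    code = line.split(";", 1)[0].strip()
    if not code:
        return "comment" if ";" in line else "blank"
    if code.endswith(":"):
        return "label"
    if code.split()[0].lower() in DIRECTIVES:
        return "directive"
    return "instruction"


source = [
    "; exit with status 0",
    "section .text",
    "global _start",
    "_start:",
    "    mov eax, 1    ; system call number",
]
for line in source:
    print(f"{classify(line):<12}{line}")
```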

Tools of the Trade

When working with Assembly language, especially in the context of reverse engineering, there are several tools that can aid in the process. Some of the most notable ones are disassemblers, debuggers, and decompilers.

Let’s dig into each of these tools a bit and understand their role in the reverse engineering process.

Disassemblers

A disassembler is a tool that translates machine code back into Assembly code. This is vital in reverse engineering, as it allows the reverse engineer to read and understand the code that a program is executing.

There are many disassemblers available, but some of the most popular include IDA Pro and Ghidra. These tools not only disassemble code, but also provide features like cross-referencing, graphing, and more.
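At its core, a disassembler is a table lookup from opcode bytes back to mnemonics. The Python sketch below decodes only one instruction form (MOV with a register and an immediate value, in 16-bit mode), so it is a toy rather than a usable tool. The function name `disassemble` is invented here and has nothing to do with how IDA Pro or Ghidra are implemented.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def disassemble(code: bytes) -> list[str]:
    """Decode a byte string containing only `MOV reg, imm` (16-bit mode)."""
    listing = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if 0xB0 <= opcode <= 0xB7:    # MOV r8, imm8
            name, size = REG8[opcode - 0xB0], 1
        elif 0xB8 <= opcode <= 0xBF:  # MOV r16, imm16
            name, size = REG16[opcode - 0xB8], 2
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {offset}")
        operand = code[offset + 1:offset + 1 + size]
        if len(operand) != size:
            raise ValueError(f"truncated instruction at offset {offset}")
        value = int.from_bytes(operand, "little")  # immediates are little-endian
        listing.append(f"MOV {name}, {value:#x}")
        offset += 1 + size
    return listing


print(disassemble(bytes.fromhex("b001b83412")))
# ['MOV AL, 0x1', 'MOV AX, 0x1234']
```

Real disassemblers face a much harder problem: instructions vary in length, and code and data are often mixed in the same region.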

IDA Pro

IDA Pro, short for Interactive Disassembler Professional, is a disassembler widely used in the field. It is developed by Hex-Rays, a private software company based in Belgium.

IDA Pro is known for its interactivity, graphical representation, and extensibility, making it a favourite among security researchers and reverse engineers. It provides a rich set of features such as multi-target disassembly, cross-references between code and data, a built-in debugger, and more.

However, IDA Pro is a commercial product and its cost can be a barrier for individual users. As of 2023, the cost of a named licence for IDA Pro starts at $1,859 for the base version, with additional costs for optional decompilers and support plans.

Ghidra

Ghidra is a suite of tools developed by the United States National Security Agency’s (NSA) Research Directorate. This open-source software includes a disassembler, assembler, and decompiler, among other tools, and is widely used in the cybersecurity community for analysing malicious code and malware.

Ghidra’s plugin architecture allows users to add new functionality to the software using Java plugins. The tool also includes a built-in scripting engine that supports Python and Java scripts.

Perhaps the most attractive feature of Ghidra is its cost – it’s completely free. The NSA released Ghidra as open-source software under the Apache License 2.0, making it accessible to anyone interested in reverse engineering.

Debuggers

A debugger is a tool that allows a programmer to step through code execution, set breakpoints, and inspect the state of the program at any point. This is incredibly useful in reverse engineering, as it allows the reverse engineer to see exactly what the program is doing at any given moment.

If you are a programmer, you have likely used a debugger before; perhaps there’s one packaged with your IDE (Integrated Development Environment). The benefit of an external debugging tool is that it can be used with any program, not just those that you have written.

Some popular debuggers include x64dbg and GDB. These debuggers offer features like step-by-step execution, breakpoint setting, register and memory inspection, and more.

The Process of Reverse Engineering with Assembly

Reverse engineering with Assembly can be approached in several ways. Three of the primary techniques are static analysis, dynamic analysis, and decompilation.

Static Analysis

Static analysis involves reviewing the Assembly code without actually executing it. This can help identify patterns, libraries, and functionalities in the code.

During static analysis, the reverse engineer might look for things like function calls, loop structures, and conditional statements. They might also look for calls to known libraries or APIs, which can give clues about what the program is doing.
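One simple static technique for spotting library or API names is to pull printable strings out of a binary without running it. The Python sketch below is a minimal version of that idea; the function name `extract_strings` and the sample bytes are made up for illustration, and the sample is not a real executable.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII characters found in a binary blob."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Hypothetical bytes standing in for part of a compiled program.
blob = b"\x7fELF\x02\x01\x00\x00CreateFileA\x00\x90\x90WriteFile\x00\x01\x02ab\x00"
print(extract_strings(blob))  # ['CreateFileA', 'WriteFile']
```

Names like these hint that the program works with files, which helps decide where to look first in the disassembly.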

Dynamic Analysis

Dynamic analysis involves running the software and observing its behaviour in real-time. This is typically done with a debugger, which allows the reverse engineer to step through the code, set breakpoints, and modify values.

Dynamic analysis can provide insights that static analysis can’t, such as how the program behaves under specific conditions, or how it interacts with the operating system and other programs.
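The idea of pausing at a breakpoint and inspecting state can be sketched in Python with a toy program. This is only a model under simplifying assumptions: instructions are given as tuples, only MOV and ADD exist, and there are just two 16-bit registers. The name `run_with_breakpoints` is invented for the example and does not correspond to any real debugger API.

```python
def run_with_breakpoints(program, breakpoints):
    """Run a toy MOV/ADD program, yielding a register snapshot at each pause."""
    registers = {"AX": 0, "BX": 0}
    for index, (mnemonic, dest, src) in enumerate(program):
        if index in breakpoints:
            # Pause before executing the instruction, as a debugger would.
            yield index, dict(registers)
        value = registers[src] if src in registers else src
        if mnemonic == "MOV":
            registers[dest] = value & 0xFFFF
        elif mnemonic == "ADD":
            registers[dest] = (registers[dest] + value) & 0xFFFF
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    # Final snapshot after the last instruction has run.
    yield len(program), dict(registers)


program = [("MOV", "AX", 5), ("MOV", "BX", 7), ("ADD", "AX", "BX")]
for index, snapshot in run_with_breakpoints(program, breakpoints={2}):
    print(index, snapshot)
# 2 {'AX': 5, 'BX': 7}
# 3 {'AX': 12, 'BX': 7}
```

The first snapshot shows the registers just before the ADD executes, which is the kind of view a debugger gives when execution stops at a breakpoint.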

Decompilation

Decompilation is the process of transforming Assembly code back into high-level source code. This can make the code easier to read and understand, but it’s not always accurate or possible.

Decompilation is a complex process that can be fraught with inaccuracies due to things like optimisation, obfuscation, and the inherent loss of information that occurs when code is compiled.

Despite these challenges, decompilation can be a useful tool in the reverse engineer’s toolkit, providing another perspective on the code.

Real-world Applications of Reverse Engineering

Reverse engineering has many real-world applications, including software security, legacy software maintenance, and intellectual property disputes.

Ethical Considerations

While reverse engineering can be a powerful tool, it’s important to use it ethically and legally. This means obtaining permission before reverse engineering software, respecting copyright laws, and not using reverse engineering for malicious purposes.

Conclusion

Assembly language, with its close ties to machine architecture, is a valuable tool in the world of reverse engineering. Understanding and mastering these tools and techniques can open a door to a whole new realm of control.

Whatever your motivations are for learning Assembly, it’s a skill that will serve you well in the years to come.

Whether you’re a seasoned programmer looking to expand your skills, or a beginner just starting out, I hope this article has sparked your interest in Assembly and reverse engineering. Happy coding!


This is the first article I’ve written on the subject of reverse engineering, and I plan to dive deeper into the specifics and demonstrate some practical examples in future posts.

If you have any feedback or suggestions, please feel free to reach out!

Further Reading and Resources

If you’re interested in learning more about Assembly and reverse engineering, here are some resources that might help: