Segmentation in Intel x64 (IA-32e) architecture - explained using Linux

In this article we will go through the basics of segmentation and then cover how it works on x64 (IA-32e) processors by extracting the details from a running Linux system.


Segmentation and paging are the two components of the Intel architecture responsible for memory management and all kinds of memory-related protection. Segmentation is often ignored because its use is limited compared to paging. On top of that, in recent Intel architectures (64-bit processors) it is almost disabled and used only for a few specific purposes. Knowing about those limited uses, and verifying them yourself, will help you understand Intel memory management much more clearly. Besides, there are not many data-driven resources on segmentation in 64-bit mode.
In this article we will try to understand segmentation by extracting the segmentation data from a 64-bit Linux machine.

I am using a Debian-based distribution with Linux kernel version 4.10.5.

Basics of Segmentation (What happens in 32 bit?)

Physical RAM (memory) holds different types of data, which need to be organized for easy access and protection. One of the features Intel processors provide for this is segmentation. Segmentation provides a mechanism for isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another.

Every memory address you come across inside a process is really the offset part of a logical address, which consists of a segment selector plus that offset. The logical address is converted into a linear address using segmentation, and then into a physical address (an address in RAM) using paging. Segmentation is the mechanism that divides the linear address space into smaller protected address spaces called segments. To get a good understanding of this, let's go through the diagram below.

[Figure: Address conversion using segmentation and paging]

From the figure above, we only care about the segmentation part, which is shown in more detail below. Before that, note that the logical address goes through a segment descriptor to produce an address in the linear address space. That linear address is then divided into 3 parts, which go through the page directory and the page table to get the physical address. Now let's take a closer look at segmentation.

Segmentation
If paging is not used, the processor's linear address space is mapped directly to the physical address space.

From the diagram above, you can conclude that converting a logical address to a linear address requires a segment selector, an offset, the GDT (Global Descriptor Table) and a segment descriptor inside the GDT.

Segment selector: a 16-bit value held in a segment register that selects a particular segment descriptor in the GDT (or LDT) by index.
Offset (effective address): the memory address you see inside a program or anywhere else in the system.
Global Descriptor Table: the table whose base address is held in the GDTR register; it contains the segment descriptors.
Segment descriptor: contains the base address of the segment in the linear address space (plus a few more fields). The offset is added to this base to get the exact linear address.

Segment Selectors

A segment selector is a 16-bit value held in a segment register. It selects a segment descriptor, by index, from one of two tables:

GDT - Global Descriptor Table - for use system-wide
LDT - Local Descriptor Table - intended to be a per-process table, switched when the kernel switches between process contexts

There are six segment registers used to store these segment selectors.

  • CS - Code Segment
  • SS - Stack Segment
  • DS - Data Segment
  • ES/FS/GS - Extra (usually data) segment registers

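For any kind of program execution to take place, at least the CS, SS and DS segment registers must be loaded with valid segment selectors. Each segment register also has a hidden part, next to the visible segment selector, which caches information from the segment descriptor (base address, limit and access rights) so that the processor does not have to read the descriptor on every memory access.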

[Figure: Segment register]

A segment selector has the following format.

[Figure: Segment selector]

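Index: 13 bits (bits 3 to 15). Selects one of up to 8192 descriptors in the GDT or LDT. The processor multiplies the index by 8 (the size of a descriptor) and adds the result to the table's base address.

Table Indicator (TI): 1 bit (bit 2). Denotes whether the descriptor this selector points to is part of the GDT (TI = 0) or the LDT (TI = 1).

RPL (Requested Privilege Level): 2 bits (bits 0 and 1). Can be between 0 and 3. It is the privilege level requested through this selector. We will talk more about this later.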

So, in the bigger picture, segmentation looks like this:

How does my logical address get translated?

Let's take an example of a simple program.

#include<stdio.h>
int main(){
	return 0;
}

After compiling the program and loading it in gdb, you can see that the first instruction is at offset 0x660 in the process memory. This 0x660 is the offset, or effective address, mentioned above.

When your EIP/RIP points to an instruction, you're implicitly using the CS (code segment) register as the segment selector, so your EIP is actually CS:EIP. Similarly, when you're accessing the stack, you're implicitly using a logical address that has the SS (stack segment) register as the segment selector (i.e. ESP is really SS:ESP).

Even though the disassembler doesn't show the segment register, it is always there, implicitly, in every memory access.

Global Descriptor Table Register (GDTR) and GDT

The GDTR register holds the base address of the GDT (32 bits in 32-bit protected mode, 64 bits in IA-32e (x64) mode) and a 16-bit table limit.

[Figure: GDTR]

The GDT is the table that contains all the segment descriptors. Each segment has a segment descriptor, which specifies the size of the segment, the access rights and privilege level required to access it, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment).

A segment descriptor is usually 8 bytes (or 16 bytes for some system descriptors, such as LDT and TSS descriptors, in IA-32e mode). Its format looks like this:

[Figure: Segment descriptor]

You can read more about each field in the Intel Software Developer's Manual, Vol. 3A, section 3.4.5. We will cover only a few important fields.

Base (32 bits) - linear address where the segment starts.
Limit (20 bits) - size of the segment (either in bytes or in 4 KB blocks). The last byte of the segment is at base + limit, with the limit scaled according to G.
G (Granularity) flag - if 0, the limit is interpreted in bytes; if 1, it is interpreted in 4 KB blocks.
D/B - default operation size flag. 0 = 16-bit default, 1 = 32-bit default. This is what actually controls whether an overloaded opcode is interpreted as dealing with 16-bit or 32-bit register/memory sizes.
DPL (Descriptor Privilege Level, 2 bits) - specifies the privilege level of the segment. More on this in the next section.

Privilege levels

The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled conditions. The processor's segment-protection mechanism recognizes 4 privilege levels, numbered from 0 to 3. The greater the number, the lesser the privileges. The figure below shows how these levels of privilege can be interpreted as rings of protection.

The center (reserved for the most privileged code, data, and stacks) is used for the segments containing the critical software, usually the kernel of an operating system. Outer rings are used for less critical software. (Most operating systems, including Linux, use only two of these levels: 0 and 3.)

In segmentation, the privilege to access a resource is checked using the CPL (Current Privilege Level, held in the low two bits of CS), the RPL (Requested Privilege Level) of the segment selector, and the DPL (Descriptor Privilege Level) of the segment descriptor. For a data segment, a program or task can access the segment only when both the CPL and the RPL are numerically less than or equal to the DPL, i.e. max(CPL, RPL) <= DPL.

Flat model

At the start of the article we said that operating systems use segmentation only in a limited way. The reason for this is the flat model that operating systems set up by default.

In the flat model, the operating system and application programs have access to a continuous, unsegmented address space. To the greatest extent possible, this basic flat model hides the segmentation mechanism of the architecture from both the system designer and the application programmer.
To implement a basic flat memory model with the IA-32 architecture, at least two segment descriptors must be created: one for referencing a code segment and one for referencing a data segment (see figure below). Both of these segments, however, are mapped to the entire linear address space: that is, both segment descriptors have the same base address value of 0 and the same segment limit of 4 GBytes (a limit of 0xFFFFFFFF). By setting the segment limit to 4 GBytes, the segmentation mechanism is kept from generating exceptions for out-of-limit memory references.

Segmentation in x64 (IA-32e) Intel processors

Finally, we are in a position to talk about segmentation in recent 64-bit processors. According to Intel's documentation, segmentation in IA-32e (x64) mode is defined like this:

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures.
Note that the processor does not perform segment limit checks at runtime in 64-bit mode.

So, the bases of CS, DS, ES and SS are treated as 0, and these segments are not used for memory addressing. FS and GS, on the other hand, can still take part in linear address calculation. Let's try to verify this on our Linux machine.

The simple program below copies the value of each segment register into a variable and prints it.

#include<stdio.h>

int main(){
	short int cs;
	short int ds;
	short int gs;
	short int ss;
	short int es;
	short int fs;

	__asm__ __volatile__("mov %%cs, %[cs]"
			     : /* output */ [cs]"=rm"(cs));
	__asm__ __volatile__("mov %%ds, %[ds]"
		 		 : /* output */ [ds]"=rm"(ds));
	__asm__ __volatile__("mov %%gs, %[gs]"
				 : /* output */ [gs]"=rm"(gs));
	__asm__ __volatile__("mov %%ss, %[ss]"
			 	 : /* output */ [ss]"=rm"(ss));
	__asm__ __volatile__("mov %%es, %[es]"
			 	: /* output */ [es]"=rm"(es));
	__asm__ __volatile__("mov %%fs, %[fs]"
				: /* output */ [fs]"=rm"(fs));


	printf("Data in CS segment is - 0x%x\n",cs);
	printf("Data in DS segment is - 0x%x\n",ds);
	printf("Data in GS segment is - 0x%x\n",gs);
	printf("Data in FS segment is - 0x%x\n",fs);
	printf("Data in SS segment is - 0x%x\n",ss);
	printf("Data in ES segment is - 0x%x\n",es);

	return 0;
}
Printing all segment registers.

The output in my case looks like this.

[Figure: Output of running the above program]

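Interestingly, all registers except CS and SS hold 0. Note that the Intel manual talks about the segment bases being treated as 0, which is not the same thing as the selector values stored in the registers, so the nonzero CS and SS values do not contradict it. Before looking into the reason for these values, let's try the same thing at kernel level through an LKM (loadable kernel module).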

#include <linux/init.h>
#include <linux/module.h>
#include <linux/const.h>
#include <linux/errno.h>


int __init start_init(void)
{
	uint16_t cs;
	uint16_t ds;
	uint16_t gs;
	uint16_t ss;
	uint16_t es;
	uint16_t fs;

	__asm__ __volatile__("mov %%cs, %[cs]"
			     : /* output */ [cs]"=rm"(cs));
	__asm__ __volatile__("mov %%ds, %[ds]"
		 		 : /* output */ [ds]"=rm"(ds));
	__asm__ __volatile__("mov %%gs, %[gs]"
				 : /* output */ [gs]"=rm"(gs));
	__asm__ __volatile__("mov %%ss, %[ss]"
			 	 : /* output */ [ss]"=rm"(ss));
	__asm__ __volatile__("mov %%es, %[es]"
			 	: /* output */ [es]"=rm"(es));
	__asm__ __volatile__("mov %%fs, %[fs]"
				: /* output */ [fs]"=rm"(fs));


	printk(KERN_INFO "Data in CS segment is - 0x%x\n",cs);
	printk(KERN_INFO "Data in DS segment is - 0x%x\n",ds);
	printk(KERN_INFO "Data in GS segment is - 0x%x\n",gs);
	printk(KERN_INFO "Data in FS segment is - 0x%x\n",fs);
	printk(KERN_INFO "Data in SS segment is - 0x%x\n",ss);
	printk(KERN_INFO "Data in ES segment is - 0x%x\n",es);

	return 0;
}

static void __exit end_exit(void)
{
    printk(KERN_INFO "Unloading the driver\n");
	return;
}

module_init(start_init);
module_exit(end_exit);


MODULE_LICENSE("GPL V3");
MODULE_AUTHOR("Shubham Dubey");
MODULE_DESCRIPTION("Segment registers ");

The output looks like this.

[Figure: Above code output using dmesg]

We can notice the same thing here: besides CS and SS, all other segment registers hold 0.

To understand the reason, let's first interpret these values according to the segment selector format.

For user mode

CS = 0x33 = 0b110011 -> RPL = 3, TI = 0 (GDT), Index = 6
SS = 0x2b = 0b101011 -> RPL = 3, TI = 0 (GDT), Index = 5

For kernel mode

CS = 0x10 = 0b10000 -> RPL = 0, TI = 0 (GDT), Index = 2
SS = 0x18 = 0b11000 -> RPL = 0, TI = 0 (GDT), Index = 3

(Indexes count from 0, so index 2 is the third entry of the GDT.)

With these values decoded, we can draw a few conclusions about segmentation in 64-bit mode.

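  • In 64-bit mode, segmentation is not used for addressing (apart from the FS and GS bases); it is mainly used to know whether code runs in user mode (ring 3) or kernel mode (ring 0). This privilege level is later used by paging as well, for example for the user/supervisor check on pages.
  • Only CS and SS hold non-null selectors because the current privilege level (CPL) is stored in the low two bits of CS (and the RPL of SS must match it), while in 64-bit mode DS and ES are not used for addressing and may simply hold a null selector.
  • In a working 64-bit OS, at least 4 segment descriptors need to be defined besides the null descriptor: a code and a data descriptor for user mode, and the same pair for kernel mode.
  • FS and GS are used by operating systems as base registers for per-thread or per-CPU data (like the TEB structure on Windows), but they are optional and may not be used inside all processes. In 64-bit mode their bases are typically set through MSRs rather than loaded from the GDT, which is why FS and GS can hold a 0 selector and still have a nonzero base.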

Now let's read the GDTR register to get the base and limit of the GDT.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/const.h>
#include <linux/errno.h>
#include <asm/desc.h>	/* struct desc_ptr: the limit + base image written by sgdt */


static inline uint64_t get_gdt_base1(void)
{
	struct desc_ptr gdt;
	/* sgdt stores the GDTR (16-bit limit followed by the base) to memory */
	__asm__ __volatile__("sgdt %[gdt]"
			     : /* output */ [gdt]"=m"(gdt));
	return gdt.address;
}

static inline uint64_t get_gdt_limit(void)
{
	struct desc_ptr gdt;
	__asm__ __volatile__("sgdt %[gdt]"
			     : /* output */ [gdt]"=m"(gdt));
	return gdt.size;
}
static int __init start_init(void)
{
    uint64_t gdt_base = get_gdt_base1();
    uint64_t gdt_limit = get_gdt_limit();
    printk(KERN_INFO "Address of gdt is %llx\n", (unsigned long long)gdt_base);
    printk(KERN_INFO "Limit of gdt is %llx\n", (unsigned long long)gdt_limit);
    return 0;
}

static void __exit end_exit(void)
{
    printk(KERN_INFO "Unloading the driver\n");
	return;
}

module_init(start_init);
module_exit(end_exit);


MODULE_LICENSE("GPL V3");
MODULE_AUTHOR("Shubham Dubey");
MODULE_DESCRIPTION("GDT limit and base ");

In the above code we use the sgdt (store GDT) instruction to store the contents of the GDTR into a local variable on the stack. I use the kernel's predefined struct desc_ptr, which matches the layout that sgdt writes (a 16-bit limit followed by the base address). The output in my case looks like this.

[Figure: GDT base and limit]

From the above output we can conclude the following:

  • Base address of the GDT is 0xffff88024f289000
  • End address of the GDT is 0xffff88024f28907f (base + limit 0x7f)
  • The table is 0x7f + 1 = 0x80 = 128 bytes long, which gives room for 128 / 8 = 16 eight-byte descriptor slots (fewer actual descriptors, since some system descriptors take 16 bytes rather than 8)

Now let's take a look at the segment descriptors themselves. First, extract the data of descriptor 0.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/const.h>
#include <linux/errno.h>
#include <asm/desc.h>	/* struct desc_ptr */

static inline uint64_t get_gdt_base1(void)
{
	struct desc_ptr gdt;
	__asm__ __volatile__("sgdt %[gdt]"
			     : /* output */ [gdt]"=m"(gdt));
	return gdt.address;
}


static int __init start_init(void)
{
    uint64_t gdt_base = get_gdt_base1();
    uint64_t *disc_address;
    uint64_t descriptor;
    printk(KERN_INFO "Address of gdt is %llx\n", (unsigned long long)gdt_base);
    disc_address = (uint64_t *)gdt_base;	/* the GDT is an array of 8-byte entries */
    descriptor = *disc_address;		/* entry 0: the null descriptor */
    printk(KERN_INFO "Value in 1st descriptor is %llx\n", (unsigned long long)descriptor);
    return 0;
}

static void __exit end_exit(void)
{
    printk(KERN_INFO "Unloading the driver\n");
	return;
}

module_init(start_init);
module_exit(end_exit);


MODULE_LICENSE("GPL V3");
MODULE_AUTHOR("Shubham Dubey");
MODULE_DESCRIPTION("Segment descriptor ");

The output in my case looks like this:

[Figure: First segment descriptor value]

Interesting: the first descriptor appears to be 0. Let's see what the Intel manual has to say about that.

The first descriptor in the GDT is not used by the processor. A segment selector to this “null descriptor” does not generate an exception when loaded into a data-segment register (DS, ES, FS, or GS), but it always generates a general-protection exception (#GP) when an attempt is made to access memory using the descriptor. By initializing the segment registers with this segment selector, accidental reference to unused segment registers can be guaranteed to generate an exception.

What about the second descriptor?

#include <linux/init.h>
...
int __init start_init(void){
...
	disc_address = (uint64_t *)(gdt_base + 8);	/* index 1: 8 bytes past the null descriptor */
	descriptor = *disc_address;
    printk(KERN_INFO "Value in 2nd descriptor is %llx\n", (unsigned long long)descriptor);
...

The output I get is:

After interpreting this value according to the descriptor format, we get:

  • Base address -> 0
  • Limit -> 0xFFFFFFFF (a 20-bit limit of 0xFFFFF with G = 1, i.e. counted in 4 KB blocks)
  • DPL -> 00 -> 0 (kernel segment)

So, it is true that these segments are set up as a flat model, with base address 0x0 and limit 0xFFFFFFFF. Note that gdt_base + 8 is index 1; the descriptor our kernel-mode CS (0x10) points to is index 2, at gdt_base + 16, and it can be read the same way. On Linux x86_64 it is also a flat, DPL 0 code descriptor, and since the processor ignores the base and limit of CS in 64-bit mode, we can conclude that it is just used for checking privilege and not for memory mapping in any way. Similarly, you can check the values of the other segment descriptors to verify these things further.

We end the article here, since we have covered the most common things you can find about segmentation on a 64-bit system. As a researcher, you can extend your knowledge further by verifying things like gates, the LDT, etc.