CatCompiler.Dev

Floats in 64-bit assembly

No, no, don’t click away yet! You haven’t seen these before – this will cover the modern floating point instructions that use the XMM registers.

This article discusses the SSE floating point instructions in the context of x86_64 assembly language. SSE stands for “Streaming SIMD Extensions”, a group of instructions built to process lots of data. The subset we’ll explore today is SSE2. Intel introduced it with the Pentium 4 in 2000, and it is part of the baseline x86_64 instruction set, so every 64-bit x86 CPU supports it. SIMD stands for “Single Instruction, Multiple Data”: these instructions take in several values at once and operate on all of them. I won’t be covering the full extent of that here.
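The C sketch below makes the packed versus scalar distinction concrete. It uses SSE2 intrinsics from <emmintrin.h>. _mm_add_pd typically compiles to addpd, a single instruction that adds two pairs of doubles. _mm_add_sd typically compiles to addsd, the scalar form this article sticks to. The function names are made up for illustration, and a compiler is free to choose different instructions.

```c
#include <emmintrin.h> /* SSE2 intrinsics: __m128d, _mm_*_pd, _mm_*_sd */

/* Packed: addpd adds both pairs at once: out = {a[0]+b[0], a[1]+b[1]}. */
void add_packed(const double a[2], const double b[2], double out[2]) {
    __m128d va = _mm_loadu_pd(a); /* unaligned load of two doubles */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}

/* Scalar: addsd adds only the low doubles; the high lane is copied from va. */
void add_scalar(const double a[2], const double b[2], double out[2]) {
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_sd(va, vb));
}
```

With a = {1.0, 2.0} and b = {10.0, 20.0}, the packed version produces {11.0, 22.0} and the scalar one produces {11.0, 2.0}. The scalar add leaves the upper lane of its first operand alone.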

Let’s laser focus on the assembly today. The best resource on this topic is the Intel Intrinsics Guide. It is great documentation, but reading it takes a trained eye, so let’s jump right in!

Reading the Intrinsics Guide

The Intel Intrinsics Guide mainly documents C and C++ wrappers (intrinsics), but it still gives us valuable information about the underlying instructions. If you know some C/C++, the function signature gives some insight into what an intrinsic does, but that knowledge isn’t required.

On the left, you can filter by the kind of instruction you’d like. On the top right, you can search by function name or instruction. Clicking on an entry opens a panel below with details. The most important field for an assembly developer is Instruction, which gives the exact assembly instruction used. Description explains how it works, and Operation gives a pseudocode explanation.
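As an example of turning the Operation field into something you can check: the guide’s entry for _mm_div_sd lists divsd as its instruction. Its operation amounts to dividing the low doubles of a and b and copying the high double of a unchanged. The C sketch below writes that out by hand next to the intrinsic. div_sd_by_hand is a hypothetical helper, not part of any library.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* The intrinsic: typically compiles to a single divsd. */
__m128d div_sd_intrinsic(__m128d a, __m128d b) {
    return _mm_div_sd(a, b);
}

/* The same operation written out lane by lane:
   low lane  = a.low / b.low
   high lane = a.high (b.high is ignored) */
__m128d div_sd_by_hand(__m128d a, __m128d b) {
    double av[2], bv[2];
    _mm_storeu_pd(av, a); /* av[0] = low lane, av[1] = high lane */
    _mm_storeu_pd(bv, b);
    return _mm_set_pd(av[1], av[0] / bv[0]); /* _mm_set_pd takes (high, low) */
}
```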

The names of these float instructions are quite cryptic, so let’s decode them now. Consider the divss instruction, which works on any XMM register.

To break it down:

div: division
s: it works on Scalar values (just one at a time)
s: it works on Single-precision floating point numbers (the normal 32-bit floats)

Putting it all together, the divss instruction divides one single-precision float by another.

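Here is the same scalar single-precision division as a C sketch, assuming the SSE intrinsics from <xmmintrin.h>. _mm_div_ss typically compiles to divss. The divss instruction actually dates back to the original SSE, and SSE2 added the double-precision forms such as divsd. The helper name divide_ss is hypothetical.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_*_ss */

/* Divide one 32-bit float by another using a scalar single-precision
   division; _mm_div_ss typically compiles to divss. */
float divide_ss(float x, float y) {
    __m128 a = _mm_set_ss(x); /* x in the low lane, upper three lanes zeroed */
    __m128 b = _mm_set_ss(y);
    return _mm_cvtss_f32(_mm_div_ss(a, b)); /* read the low lane back out */
}
```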
Let’s consider movsd:

mov: do a move, aka copy the value
s: it is Scalar, working on only one value
d: it works on Double-precision floating point numbers (64-bit floats)

Pretty straightforward, right? It’s worth reading over these examples a couple of times because they are important context.
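One detail worth knowing about movsd is that its behavior depends on the operands. Loading from memory zeroes the upper 64 bits of the destination register, while a register-to-register movsd replaces only the low 64 bits. The C sketch below shows both forms, using the SSE2 intrinsics that normally map to them, _mm_load_sd and _mm_move_sd. The wrapper names are made up.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* movsd xmm, m64 (_mm_load_sd): loads one double into the low lane
   and zeroes the high lane. */
__m128d load_low(const double *p) {
    return _mm_load_sd(p);
}

/* movsd xmm, xmm (_mm_move_sd): copies b's low double into the low
   lane and keeps a's high double unchanged. */
__m128d move_low(__m128d a, __m128d b) {
    return _mm_move_sd(a, b);
}
```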

The Registers in Depth

The SSE2 float instructions work on the XMM registers. There are 16 of them, XMM0 through XMM15, and each is 128 bits wide. This article won’t use the full width. The scalar instructions covered here operate only on the low 32 bits (one float) or the low 64 bits (one double) of a register, so for our purposes you can think of each register as holding a single float or double.
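For a picture of what a 128-bit XMM register holds, this C sketch stores two doubles in one __m128d and then reinterprets the same 16 bytes as four floats. It uses SSE2 intrinsics from <emmintrin.h>, and xmm_views is a hypothetical name. The scalar sd instructions in this article only touch the first of the two doubles, and the ss instructions only the first of the four floats.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* One 128-bit value viewed two ways. The cast intrinsic only changes
   the C type; no bits are moved or converted. */
void xmm_views(double as_doubles[2], float as_floats[4]) {
    __m128d v = _mm_set_pd(2.0, 1.0);           /* high lane 2.0, low lane 1.0 */
    _mm_storeu_pd(as_doubles, v);               /* [0] = low lane, [1] = high lane */
    _mm_storeu_ps(as_floats, _mm_castpd_ps(v)); /* same 16 bytes as four floats */
}
```

On x86_64 the doubles come out as {1.0, 2.0} and the float view is {0.0f, 1.875f, 0.0f, 2.0f}. Each float is just half of a double’s bit pattern.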

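Under the System V AMD64 calling convention used on Linux and macOS, floating point arguments are passed in XMM0 through XMM7. Every XMM register is caller-saved, so a called function may overwrite any of them. Floating point return values come back in XMM0, with XMM1 also used when the result needs a second register. Windows x64 differs: it passes floating point arguments in XMM0 through XMM3 and treats XMM6 through XMM15 as callee-saved.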
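Here is a sketch of how that convention lines up with the example snippets later in this article: a hypothetical C function, square_plus, written with SSE2 intrinsics. Under the System V ABI, its double argument arrives in XMM0, its integer argument arrives in RDI, and its result leaves in XMM0. On Windows x64 the integer would arrive in RDX instead. The intrinsics typically compile to mulsd, cvtsi2sd and addsd, though a compiler may choose differently.

```c
#include <emmintrin.h> /* SSE2 intrinsics; _mm_cvtsi64_sd needs x86_64 */

/* System V: x in XMM0, n in RDI, result returned in XMM0. */
double square_plus(double x, long long n) {
    __m128d vx = _mm_set_sd(x);
    vx = _mm_mul_sd(vx, vx);                          /* mulsd: x * x */
    __m128d vn = _mm_cvtsi64_sd(_mm_setzero_pd(), n); /* cvtsi2sd: (double)n */
    return _mm_cvtsd_f64(_mm_add_sd(vx, vn));         /* addsd; read the low lane */
}
```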

SSE2 Floats Cheat Sheet

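Instruction	Summary
divsd	divide one double-float by another
movsd	move (copy) a double-float between XMM registers or to/from memory
cvtsi2sd	convert a signed integer to a double-float
cvtsd2si	convert a double-float to a signed integer, rounding with the current rounding mode (cvttsd2si truncates instead)
comisd	compare two double-floats and set ZF, PF and CF in EFLAGS, useful for branching with jb/ja and friends (ucomisd is the quiet variant)
addsd	add one double-float to another
mulsd	multiply two double-floats
subsd	subtract one double-float from another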

Those are the most useful instructions I have found so far for working with floats in x86_64. Each of them has plenty of variants, so it’s always worth looking them up if you think you need something similar.
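The rounding behavior of cvtsd2si is easy to trip over, and its variant cvttsd2si is usually the one you want when mirroring a C cast. The C sketch below shows the difference under the default MXCSR rounding mode, round-to-nearest-even. It uses the SSE2 intrinsics _mm_cvtsd_si64 and _mm_cvttsd_si64, which are available on x86_64 only. The wrapper names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics; the 64-bit conversions need x86_64 */

/* cvtsd2si: rounds using the current MXCSR rounding mode
   (round-to-nearest-even unless the program changes it). */
long long to_int_rounded(double x) {
    return _mm_cvtsd_si64(_mm_set_sd(x));
}

/* cvttsd2si: always truncates toward zero, like a C (long long) cast. */
long long to_int_truncated(double x) {
    return _mm_cvttsd_si64(_mm_set_sd(x));
}
```

With the default rounding mode, to_int_rounded(2.5) returns 2 and to_int_rounded(3.5) returns 4, while to_int_truncated(-2.7) returns -2. For NaN or out-of-range inputs, both instructions return the “integer indefinite” value 0x8000000000000000.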

Some important notes are:

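Each of these instructions takes at least one XMM register operand.
You cannot load an immediate float directly into an XMM register. You have to load it from memory (for example, the .data section), convert it from an integer with cvtsi2sd, or copy its raw bit pattern in from a general-purpose register with movq.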
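The C sketch below builds the constant 3.0 in three ways, assuming SSE2 intrinsics from <emmintrin.h>:

- loading it from a static constant in memory (movsd);
- converting the integer 3 (cvtsi2sd);
- moving the raw IEEE 754 bit pattern 0x4008000000000000 in from a general-purpose register (_mm_cvtsi64_si128, which maps to movq).

The function and constant names are hypothetical. An optimizing compiler may also fold all three into a plain memory load.

```c
#include <emmintrin.h> /* SSE2 intrinsics; the 64-bit moves need x86_64 */

/* 1. From memory: the constant lives in a data section and is loaded
      with movsd (_mm_load_sd). */
static const double THREE = 3.0;

__m128d three_from_memory(void) {
    return _mm_load_sd(&THREE);
}

/* 2. From an integer: cvtsi2sd converts 3 into 3.0. */
__m128d three_from_integer(void) {
    return _mm_cvtsi64_sd(_mm_setzero_pd(), 3);
}

/* 3. From a bit pattern: 0x4008000000000000 is 3.0 as an IEEE 754 double.
      _mm_cvtsi64_si128 moves it from a general-purpose register (movq),
      and the cast only reinterprets the bits as a double. */
__m128d three_from_bits(void) {
    return _mm_castsi128_pd(_mm_cvtsi64_si128(0x4008000000000000LL));
}
```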

Example Snippets

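mulsd xmm0, xmm0        ; xmm0 = xmm0 * xmm0
cvtsi2sd xmm1, rdi      ; xmm1 = (double)rdi, converting a signed 64-bit integer

; return early (with a pointer to normal_msg) if [normal_const] >= xmm1
movsd xmm0, [normal_const]  ; load a double stored in the .data section
ucomisd xmm0, xmm1          ; compare xmm0 with xmm1, setting ZF, PF and CF
jb nm_end                   ; CF = 1: xmm0 < xmm1 (or unordered), skip the early return
    lea rax, [normal_msg]
    ret
nm_end: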
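For comparison, here is a C sketch of what the comparison snippet computes. It assumes SSE2 intrinsics and uses hypothetical values for normal_const and normal_msg, which the snippet doesn’t define. _mm_ucomige_sd maps to ucomisd and returns 1 when the low double of its first argument is greater than or equal to that of the second. The snippet doesn’t show what follows nm_end, so the hypothetical check function returns NULL there as a placeholder.

```c
#include <emmintrin.h> /* SSE2 intrinsics; _mm_cvtsi64_sd needs x86_64 */
#include <stddef.h>    /* NULL */

/* Hypothetical stand-ins for the snippet's .data labels. */
static const double normal_const = 10.0;
static const char normal_msg[] = "normal";

/* Take n (RDI), convert it to a double, and return normal_msg early
   if normal_const >= (double)n. */
const char *check(long long n) {
    __m128d xmm1 = _mm_cvtsi64_sd(_mm_setzero_pd(), n); /* cvtsi2sd xmm1, rdi */
    __m128d xmm0 = _mm_load_sd(&normal_const);          /* movsd xmm0, [normal_const] */
    if (_mm_ucomige_sd(xmm0, xmm1)) {                   /* ucomisd xmm0, xmm1; jb nm_end */
        return normal_msg;                              /* lea rax, [normal_msg]; ret */
    }
    return NULL; /* nm_end: the original code continues here (not shown) */
}
```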
