No, not this kind of gas, though that would actually be a lot funnier. Maybe I need to back up a bit.
I have been playing around with the idea for a new project that would involve doing a little assembly language. I've done x86 assembly "back in the day" when I contributed some code for Retrocade, an arcade game emulator that was the fastest thing going at the time.
[Image caption: Too Bad It Died]
This brought me to think about FASM. FASM has a variant called FASMARM that can assemble ARM mnemonics. However, FASMARM doesn't actually run on ARM itself. It can only generate code on a PC that then would have to be transferred to the Pi. That would kind of suck too.
Anybody that does any assembly is aware of the GNU Assembler, gas (often written as, well, "as"). gas definitely sucks. It is the only commonly used assembler that uses AT&T syntax. In a nutshell, AT&T syntax is:
a) backwards (the source operand comes first and the destination last)
b) full of superfluous % signs
This seminal article on IBM Developerworks provides all the gory details in comparing NASM to gas. Or forget about beating around the bush and read this quote from the 646 page tome Assembly Language Step By Step:
"The GNU assembler gas uses a peculiar syntax that is utterly unlike that of all the other familiar assemblers used in the x86 world, including NASM. It has a whole set of instruction mnemonics unique to itself. I find them ugly, nonintuitive, and hard to read. This is the AT&T syntax, so named because it was created by AT&T as a portable assembly notation to make Unix easier to port from one underlying CPU to another. It’s ugly in part because it was designed to be generic, and it can be recast for any reasonable CPU architecture that might appear."
Suffice to say, gas is generally avoided by programmers like Amy Winehouse avoided rehab.
[Image caption: Should Have Gone Into Rehab]
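To make those two complaints concrete, here is a minimal sketch (mine, not from the Developerworks article) of the same two instructions written both ways in one gas source file. The file name is made up, and I'm assuming a 32-bit x86 Linux box. It leans on the .att_syntax and .intel_syntax directives to switch modes mid-file.

```asm
# Hypothetical file name: syntax_compare.s (illustration only)
# Assumed build steps on a 32-bit x86 Linux system:
#   as -o syntax_compare.o syntax_compare.s
#   ld -o syntax_compare syntax_compare.o
.section .text
.globl _start
_start:
# AT&T syntax: source first, destination last, % on registers, $ on immediates
.att_syntax prefix
    movl $1, %eax
    movl $2, %ebx
# The same two instructions in Intel syntax: destination first, no decorations
.intel_syntax noprefix
    mov eax, 1
    mov ebx, 2
# eax = 1 selects the exit system call, ebx = 2 is the exit status
    int 0x80
```

If it builds the way I expect, running it and then typing "echo $?" should print 2, since both halves load exactly the same values.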
However (and there is always a however), the GNU gods saw fit to introduce a compatibility mode to gas that allows the use of Intel style syntax, the type used by programs such as NASM and FASM. Sounds good. I decided to give it a shot by taking the NASM examples in the Developerworks article and seeing if I could get them to work in gas using Intel-style syntax. The binutils documentation led me to believe that this should be a pretty straightforward process. I was wrong. There was some headbanging on the way and I nearly gave up at one point, but now I have everything working well. This is what I learned along the way, presented as gists on GitHub that you can dive into and fork as you please.
Here is Listing 1 from the Developerworks article in Intel style syntax and assembled using gas. This luckily worked right off the bat.
# Example adapted from http://www.ibm.com/developerworks/library/l-gas-nasm/
# Assemble with: as --gstabs+ -o nasm2gas1.o nasm2gas1.s
# Link with: ld -o nasm2gas1 nasm2gas1.o
# Check program return status with "echo $?"
#
# Support Intel syntax vs. AT&T and don't use % before register names
.intel_syntax noprefix
.section .text
# Program entry point
.globl _start
_start:
    # Put the code number for system call
    mov eax, 1
    # Return value. Check returned value with "echo $?" on command line
    mov ebx, 2
    # Call the OS
    int 0x80
.intel_syntax noprefix
That is the magic incantation that makes it all happen. This key tidbit from Stackoverflow is what got me going down this road in the first place. This directive tells gas to use Intel syntax and to not need the % prefix before register names. Put that in there and, besides the directives specific to gas, the code itself looks quite a bit like NASM.
Then I got to Listing 2 and cruised through it too, mostly because it didn't introduce much in the way of new concepts. This looked like it was going to be easy.
# Example adapted from http://www.ibm.com/developerworks/library/l-gas-nasm/
# Assemble with: as --gstabs+ -o nasm2gas2.o nasm2gas2.s
# Link with: ld -o nasm2gas2 nasm2gas2.o
# Check program return status with "echo $?"
#
# Support Intel syntax vs. AT&T and don't use % before register names
.intel_syntax noprefix
.section .data
var1: .int 40
var2: .int 20
var3: .int 30
.section .text
# Program entry point
.globl _start
_start:
    # Move the contents of variables
    mov ecx, [var1]
    cmp ecx, [var2]
    jg check_third_var
    mov ecx, [var2]
check_third_var:
    cmp ecx, [var3]
    jg _exit
    mov ecx, [var3]
_exit:
    mov eax, 1
    mov ebx, ecx
    int 0x80
The problem I had with Listing 3 was that I could not get the strings in .rodata to print to save my life. Running the code in the debugger showed that the string lengths were being read properly. The problem was getting the address of the strings that were then sent off to the write macro. My read macro wasn't working right either. I banged my head against a brick wall for an entire evening before I found this on the net the next morning. The key text is in the code but I want to put it down again below to drive the point home.
"The somewhat mysterious OFFSET FLAT: incantation tells the assembler to figure out the (4-byte) address where the variable x will end up when the program is loaded. Even the assembler does not have all the information to figure this out, since a program may be in several pieces and the assembler does not know where each piece will go. It is up to the loader to figure this out, so in fact all the assembler does with the OFFSET FLAT: reference is to make a note in the object file and it is the loader that will finally fill in the right value in the generated instruction. This is one of the respects in which object code (which ends up in a .o file after assembly) is not pure machine code."
Well, tie me to the side of a pig and roll me in the mud. This and similar code
mov ecx, \str
as it was in the Developerworks article would refuse to work, and no amount of BYTE PTR, brackets, dollar signs, or percent signs brought me any joy. But this code worked just fine
mov ecx, OFFSET FLAT:\str
You'll see that what I actually ended up using here was
lea ecx, \str
just because this version is clearer (I'm trying to load the effective address of that string), I don't need "OFFSET FLAT:", and lea is generally a more powerful instruction for loading an address that I could end up using later.
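Here is a minimal sketch of the difference, separate from the Developerworks listings. The file name and the label "value" are made up for illustration, and I'm again assuming a 32-bit x86 Linux build with as and ld. As far as I can tell, gas in Intel mode treats a bare label as a memory operand, which is why the plain mov never handed me an address.

```asm
# Hypothetical file name: offset_demo.s (illustration only)
.intel_syntax noprefix
.section .data
value: .int 42
.section .text
.globl _start
_start:
    mov ecx, value              # bare label is a memory operand: ecx = 42
    mov ecx, [value]            # same load as the line above
    mov ecx, OFFSET FLAT:value  # ecx = address of value
    lea ecx, value              # ecx = address of value, no OFFSET needed
    mov ebx, [ecx]              # follow the address: ebx = 42
    mov eax, 1                  # exit system call
    int 0x80                    # exit status should be 42
```

Checking with "echo $?" after running it should show 42, which confirms that the last two forms really did load an address that points at the variable.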
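I would need "OFFSET FLAT:" for a mov, push or any other instruction that needs the address of a memory location in .data or .rodata. Without it, gas in Intel mode treats a bare label as a memory operand and fetches the contents instead of the address. Note that I don't need it to access the constants for the string lengths in those sections. Those are plain numbers defined with .set, so the assembler can at least figure that much out on its own. The code below works perfectly.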
# Example adapted from http://www.ibm.com/developerworks/library/l-gas-nasm/
# Assemble with: as --gstabs+ -o nasm2gas3.o nasm2gas3.s
# Link with: ld -o nasm2gas3 nasm2gas3.o
#
# Support Intel syntax vs. AT&T and don't use % before register names
.intel_syntax noprefix
.section .rodata
prompt_str: .ascii "Enter Your Name: \n"
.set STR_SIZE, . - prompt_str
greet_str: .ascii "Hello "
.set GSTR_SIZE, . - greet_str
.section .bss
# Reserve 32 bytes of memory
.lcomm buff, 32
# The somewhat mysterious OFFSET FLAT: incantation tells the assembler to figure
# out the (4-byte) address where the variable x will end up when the program is
# loaded. Even the assembler does not have all the information to figure this
# out, since a program may be in several pieces and the assembler does not know
# where each piece will go. It is up to the loader to figure this out, so in fact
# all the assembler does with the OFFSET FLAT: reference is to make a note in the
# object file and it is the loader that will finally fill in the right value in
# the generated instruction. This is one of the respects in which object code
# (which ends up in a .o file after assembly) is not pure machine code.
#
# So
# mov ecx, OFFSET FLAT:\str
#
# and
# lea ecx, \str
#
# will do the same thing.
# A macro with two parameters implements the write system call
.macro write str, str_size
    mov eax, 4
    mov ebx, 1
    lea ecx, \str
    mov edx, \str_size
    int 0x80
.endm
# Implements the read system call
.macro read buff, buff_size
    mov eax, 3
    mov ebx, 0
    lea ecx, \buff
    mov edx, \buff_size
    int 0x80
.endm
.section .text
# Program entry point
.globl _start
_start:
    write prompt_str, STR_SIZE
    read buff, 32
    # Read returns the length in eax
    push eax
    # Print the hello text
    write greet_str, GSTR_SIZE
    pop edx
    # edx = length returned by read
    write buff, edx
_exit:
    mov eax, 1
    mov ebx, 0
    int 0x80
There are a couple of things to note in Listing 3 above. First, the gas syntax for macros is still used. Variable substitution is made with backslashed references to the macro parameters. Using Intel style syntax doesn't change that part of it. No problem. The macro syntax in gas isn't arcane like the AT&T syntax itself.
The second thing to notice is how I have coded the string length. The Developerworks example says I should use something like this
greet_str: .ascii "Hello "
gstr_end: .set GSTR_SIZE, gstr_end - greet_str
when in fact I have used this (thanks again, Stackoverflow).
greet_str: .ascii "Hello "
.set GSTR_SIZE, . - greet_str
Either the Developerworks article was written before the "." was introduced to keep track of the current location or the author wasn't aware of it. This part of the code becomes cleaner and more NASM like since the extra label "gstr_end" is not required.
On to Listing 4 and we're on a roll. You'll see the liberal scattering of "OFFSET FLAT:" because of all the non-lea instruction references to the strings in the .data section. There are a couple of things a little different in this one though. First (shown in the code and mentioned in the Developerworks article), the parameters passed to the linker command are changed to link the external C standard library. Second is the use of "BYTE PTR" where NASM got away with just using "byte". I tried a search and replace of "BYTE PTR" to "byte" and the program stopped working properly, though gas did not complain. I could have experimented a bit more to see what was going on here but didn't bother. My initial read of the gas docs suggested to me that "byte" should have done the trick.
# Example adapted from http://www.ibm.com/developerworks/library/l-gas-nasm/
# Assemble with: as --gstabs+ -o nasm2gas4.o nasm2gas4.s
# Link with: ld --dynamic-linker /lib/ld-linux.so.2 -lc -o nasm2gas4 nasm2gas4.o
#
# Support Intel syntax vs. AT&T and don't use % before register names
.intel_syntax noprefix
.section .data
array: .byte 89, 10, 67, 1, 4, 27, 12, 34, 86, 3
.set ARRAY_SIZE, . - array
array_fmt: .asciz " %d"
usort_str: .asciz "unsorted array:"
sort_str: .asciz "sorted array:"
newline: .asciz "\n"
.section .text
# Program entry point
.globl _start
_start:
    push OFFSET FLAT:usort_str
    call puts
    add esp, 4
    push ARRAY_SIZE
    push OFFSET FLAT:array
    push OFFSET FLAT:array_fmt
    call print_array10
    add esp, 12
    push ARRAY_SIZE
    push OFFSET FLAT:array
    call sort_routine20
    # Adjust the stack pointer
    add esp, 8
    push OFFSET FLAT:sort_str
    call puts
    add esp, 4
    push ARRAY_SIZE
    push OFFSET FLAT:array
    push OFFSET FLAT:array_fmt
    call print_array10
    add esp, 12
    jmp _exit
print_array10:
    push ebp
    mov ebp, esp
    sub esp, 4
    mov edx, [ebp + 8]
    mov ebx, [ebp + 12]
    mov ecx, [ebp + 16]
    mov esi, 0
push_loop:
    mov [ebp - 4], ecx
    mov edx, [ebp + 8]
    xor eax, eax
    mov al, BYTE PTR [ebx + esi]
    push eax
    push edx
    call printf
    add esp, 8
    mov ecx, [ebp - 4]
    inc esi
    loop push_loop
    push OFFSET FLAT:newline
    call printf
    add esp, 4
    mov esp, ebp
    pop ebp
    ret
sort_routine20:
    push ebp
    mov ebp, esp
    # Allocate a word of space in stack
    sub esp, 4
    # Get the address of the array
    mov ebx, [ebp + 8]
    # Store the array size
    mov ecx, [ebp + 12]
    dec ecx
    # Prepare for outer loop here
    xor esi, esi
outer_loop:
    # This stores the min index
    mov [ebp - 4], esi
    mov edi, esi
    inc edi
inner_loop:
    cmp edi, ARRAY_SIZE
    jge swap_vars
    xor al, al
    mov edx, [ebp - 4]
    mov al, BYTE PTR [ebx + edx]
    cmp BYTE PTR [ebx + edi], al
    jge check_next
    mov [ebp - 4], edi
check_next:
    inc edi
    jmp inner_loop
swap_vars:
    mov edi, [ebp - 4]
    mov dl, BYTE PTR [ebx + edi]
    mov al, BYTE PTR [ebx + esi]
    mov BYTE PTR [ebx + esi], dl
    mov BYTE PTR [ebx + edi], al
    inc esi
    loop outer_loop
    mov esp, ebp
    pop ebp
    ret
_exit:
    mov eax, 1
    mov ebx, 0
    int 0x80
Finally, we get to Listing 5. This was smooth sailing. Again there is liberal use of "OFFSET FLAT:" to reference memory locations in the .data section. There is also the introduction of the .rept directive to simplify the allocation of ten memory locations, but this translated over straight from the gas code in the Developerworks article without a hitch.
# Example adapted from http://www.ibm.com/developerworks/library/l-gas-nasm/
# Assemble with: as --gstabs+ -o nasm2gas5.o nasm2gas5.s
# Link with: ld --dynamic-linker /lib/ld-linux.so.2 -lc -o nasm2gas5 nasm2gas5.o
#
# Support Intel syntax vs. AT&T and don't use % before register names
.intel_syntax noprefix
.section .data
# Command table to store at most 10 command line arguments
cmd_tbl: .rept 10
.long 0
.endr
.section .text
# Program entry point
.globl _start
_start:
    # Set up the stack frame
    mov ebp, esp
    # Top of stack contains the number of command line arguments. Default is 1.
    mov ecx, [ebp]
    # Exit if arguments are more than 10
    cmp ecx, 10
    jg _exit
    mov esi, 1
    mov edi, 0
    # Store the command line arguments in the command table
store_loop:
    mov eax, [ebp + esi * 4]
    mov [OFFSET FLAT:cmd_tbl + edi * 4], eax
    inc esi
    inc edi
    loop store_loop
    mov ecx, edi
    mov esi, 0
print_loop:
    # Make some local space
    sub esp, 4
    # puts function corrupts ecx
    mov [ebp - 4], ecx
    mov eax, [OFFSET FLAT:cmd_tbl + esi * 4]
    push eax
    call puts
    add esp, 4
    mov ecx, [ebp - 4]
    inc esi
    loop print_loop
    jmp _exit
_exit:
    mov eax, 1
    mov ebx, 0
    int 0x80
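As an aside on the .rept block in Listing 5, here is a small sketch of another way I believe the same table could be reserved, using the gas .fill directive (repeat count, size in bytes, value). The label cmd_tbl_alt is made up so it doesn't clash with the real cmd_tbl, and I have not swapped this into the listing itself.

```asm
.section .data
# Hypothetical alternative to the .rept/.long/.endr block:
# ten 4-byte slots, each initialized to zero (40 bytes in total)
cmd_tbl_alt: .fill 10, 4, 0
```

Either form should lay down the same 40 zeroed bytes. The .rept version has the advantage of matching the gas code in the Developerworks article line for line.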
And that about wraps it up. Knowing a few rather arcane tricks (the .intel_syntax noprefix directive, "OFFSET FLAT:" or lea for addresses, and "BYTE PTR" instead of "byte") makes gas much easier to work with. gas is also everywhere there is a gcc compiler, so you can use it pretty much anywhere, on everything. Now you know those tricks. Give it a shot.
How does the nasm address es:[bx + 0x1a] transfer to gas?
I've done some looking around and the data inside the location [bx + 0x1a] would transfer as %bx(0x1a,1) right?