Gentoo Forums

Learning 64-bit ASM (amd64) on Linux
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Fri Oct 18, 2013 3:48 pm    Post subject: Learning 64-bit ASM (amd64) on Linux

Hi,

I often hear people on IRC ask how to learn assembler, which is IMO essential knowledge for any programmer: even if you never write assembly code on a daily basis in your career, once you have coded asm, nothing computers do is ever that mysterious to you. You gain a much better intuitive understanding of what computers really do, which informs all of your other code. As such it is something I believe people should learn before they learn C, C++ or Java. To my mind it should be your second language, once you've understood the basics of typing in text which a computer processes to decide what to do, and of how variables work. If you're completely new to coding, I highly recommend the awkbook. Buy a copy when you can, you will not regret it: "The Awk Programming Language" (Aho, Kernighan and Weinberger, 1988) -- it is the best introduction to programming I have ever read, and it covers Core Computing much better than any course at a University, afaic.

So let's assume you know a bit about programming, and that you're a Gentoo user, which means you know your way around a terminal, so commands don't scare you. First off, the best book for learning asm that I've been able to find is the second edition of "Introduction to 64 bit Intel Assembly Language Programming for Linux" by Seyfarth (2012): http://rayseyfarth.com/asm

If you want to learn asm, you will need to buy a copy. Basically you can take it from there, just by reading the book and working through every exercise. The author provides a nice build environment called ebe, which you can download from his site. Hmm, looks like it's on sourceforge now, so maybe we should look at an ebuild. I've also set up kate to do exactly the same thing, but in my usual work environment: I'll explain after you've set up your assembler (the most vital part.)

To actually assemble, you'll need yasm on Gentoo, which is a nasm-compatible assembler that works well for 64-bit coding too, and provides correct debugging info for gdb.

However you should install nasm too, with the doc USE flag set in package.use. That gives you the html docs for nasm, which you'll need in order to understand the syntax etc. of yasm.

For me this was as simple as update -ia nasm yasm, then hitting 'e' to Edit the list and setting the doc USE flag for nasm. I didn't want the python flag for yasm, but you might: I don't know what it does (I imagine it's useful in conjunction with gdb's python support), and I'm not interested at this point. Alternatively:
Code:
echo 'dev-lang/nasm doc' >> /etc/portage/package.use
emerge -av nasm yasm

I have DOC_SYMLINKS_DIR="/doc" in /etc/portage/make.conf which means the html docs are symlinked from /doc/dev-lang/nasm
Code:
$ ls -ld /doc/dev-lang/nasm
lrwxrwxrwx 1 root root 32 Oct 17 13:36 /doc/dev-lang/nasm -> /usr/share/doc/nasm-2.10.07/html

I did this yesterday when I found that man yasm doesn't explain the syntax or usage: it refers you to nasm(1). It turned out I already had the doc USE flag set, so I must have installed it before, then removed it when I thought I only needed yasm.
I don't really need to do asm at the moment: I've done it before, many years ago; I bought the Seyfarth book for my kid, but she ofc is not interested, heh. I just wanted a bit of a downtime thing to play with, to take my mind off other code, and insanely long bash scripts ;)

Since I love kate so much, I decided just to start with the first chapter, in order to see whether I could use kate easily enough. I tried ebe a few months ago when I got the book, and while I like it, and may use it at some point, I'm already used to the Build Plugin and the GDB Plugin for C code. So I'll document how to do it with kate, for those of us who use it: feel free to post how to set up the same thing in vim, emacs, jupp and so on: the idea is to get into asm, not to argue about editor choices, which is about as useful as arguing about what desktop you prefer. It doesn't improve anything: so please don't. :-)

If you don't know the Build Plugin in kate, I highly recommend it. You can set up multiple "Targets" per session, each with their own working directory and separate command-line scriptlets for Build, Clean and Quick Compile. I have Build shortcut to Ctrl-Shift-B and Quick Compile to Ctrl-Shift-C, and I run Clean from the menubar. If you leave the working directory empty, kate uses the directory of the current file, which is what we want for small asm files; for a larger project Build would run make in the main project directory, and for pretty much any project Compile would pass the name of the source file, transformed to run eg make foo.o

For my small asm session though, I just wanted to compile the current file. If you look at Seyfarth, there are two small asm files we can start with: exit.asm, the most minimal program you could think of, in Section 1.4, which just exits with a value of 5 (returned to the shell as $?), and fp.asm in Section 2.4, which is just 7 float values in the data segment.

This is exit.asm
Code:
; exit: exit(5): Seyfarth, 1.4
segment   .text
global   _start

_start:
   mov eax, 1   ; syscall number
   mov ebx, 5   ; param
   int 0x80

; kate: replace-tabs off
; kate: syntax Intel x86 (NASM)

The last mode line is required to get kate to open with the correct Syntax Highlighter (you can use hl instead of syntax, but the latter is clearer) in any session. I recommend you also set the highlighter from the Tools menu, as otherwise on save you get a little flicker/glitch as it first picks the standard highlighter for the mimetype (6502 asm) and then picks up on the modeline. If it's set from the menubar, then kate saves that setting for the file in the session, and never glitches. I'll ask #kate about that; some of you may actually prefer it as it's a visual indicator of the save. I found it a bit annoying, personally. The nasm highlighter needs some stuff added to it, perhaps as a yasm.xml, eg SSE2 insns, and some of it could do with a tweak, so we'll sort that out soon (patches welcome;)
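
Incidentally, the example above uses the old 32-bit int 0x80 convention, which still works on amd64 as long as the kernel has 32-bit emulation enabled. The native 64-bit way is the syscall instruction, with the syscall number in rax (60 for exit) and the first argument in rdi; here's a minimal sketch of the same program done that way (my own, so check it against the book):
Code:
; exit64.asm: native 64-bit exit(5) -- illustrative sketch, not from the book
segment   .text
global   _start

_start:
   mov rax, 60   ; syscall number for exit on x86-64
   mov rdi, 5    ; exit status, returned to the shell as $?
   syscall

; kate: replace-tabs off
; kate: syntax Intel x86 (NASM)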

Now the Build Plugin, which you enable from Settings -> Configure Kate -> Plugins: while you're there add the GDB Plugin.
I also recommend the Multiline tab bar, which you have to setup from the spanner icon once it's loaded; eg I use 2 tablines, order by Document Name, set the width to minimum 75, maximum 150, and add highlighting colours for modified (red) active (green) and previous (blue): note you should turn the opacity up (I go to half) to make the highlighting clearly visible. After that you can highlight other tabs with custom colours by right-clicking on them. eg in a busier session, I sometimes highlight central header files with purple, or the makefile orange etc.

So first off, click on the Build Plugin's Target Settings tab, and type Run into the drop-down in the top left corner, just above the 3 small New, Copy and Delete buttons: we'll set up a target to run the above program, which we can use with any asm file we want to build as a main program and run.
You can leave the working directory empty, or click the directory folder on the right, and it will start you off in the directory of the current file: if that's exit.asm above, which I saved to ~/code/amd64, then that's fine: you may prefer simply to leave it blank. Either way, if you hover over the Quick Compile bar, you'll see a tooltip about %f, %n and %d: these all expand to absolute paths, which is a bit annoying for the listing files you get. So we'll tweak %f (source filename) and %n (filename without suffix, similar in intent to $* in make.)

Here's what I have in the Quick Compile bar:
Code:
f='%f'; f=${f##*/};  n=${f%.*}; yasm -f elf64 -g dwarf2 -l "$n.lst" "$f" && ld -o "$n" "$n.o" && { ./"$n"; echo "$f: $n: $?" >&2; }
With the above exit.asm file, Ctrl-Shift-C (or selecting it from the Build menu) assembles and links it, runs the output file and displays the exit status on stderr, which shows up in both the Output tab and the Errors & Warnings tab. From the Filesystem Browser (I have that on the left, and moved my Documents view to the right) you can then simply click on exit.lst to see the listing file. You'll notice that it says
Code:
%line 1+1 exit.asm
at the top; if you run yasm using %f instead of $f above, you get the absolute path there instead.

Within a project we would not want to run the actual output, just build it, so for a target Prog (in an assembler session) you'd remove the last part, the && { .. ; } above, as shown below.
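
So the Quick Compile line for that Prog target is simply the same command minus the run-and-echo part:
Code:
f='%f'; f=${f##*/};  n=${f%.*}; yasm -f elf64 -g dwarf2 -l "$n.lst" "$f" && ld -o "$n" "$n.o"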

Here's fp.asm
Code:
; fp.asm: Seyfarth, 2.4
; float data file
segment   .data

zero  dd     0.0
one   dd     1.0
neg1  dd    -1.0
a     dd     1.75
b     dd   122.5
d     dd     1.1
e     dd  10000000000.0

; kate: syntax Intel x86 (NASM)

Now to build a data file we just want the standard yasm command to run; iow we're doing the normal Quick Compile, to build foo.o.
So click either the New or Copy button in the Target Settings tab, and call the new target Asm (or something else you prefer.) Then in Quick Compile you just want:
Code:
f='%f'; f=${f##*/};  n=${f%.*}; yasm -f elf64 -g dwarf2 -l "$n.lst" "$f"
Then click on fp.lst to see the output.

That's enough to get you started: I'm sure you can see how to use yasm yourself, and all you need to do is work through Seyfarth. If you don't know binary and hexadecimal, be sure to spend enough time on 2.1 and 2.2: using a pencil and paper to work through examples is the best way. Note that he covers SSE2 and AVX instructions, as well as how to do standard things like interfacing with C, structs and pure assembly hash-tables. ;-)
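
To give a taste of the C-interfacing side (a toy example of my own, not from the book): under the System V AMD64 calling convention the first two integer arguments arrive in edi/rdi and esi/rsi, and the return value goes in eax/rax, so a function callable from C can be as small as this:
Code:
; addints.asm: int add_ints(int a, int b); callable from C -- illustrative only
segment .text
global  add_ints

add_ints:
   mov eax, edi   ; first argument
   add eax, esi   ; plus the second; the int result is returned in eax
   ret

Assemble it with the Asm target above and link it into a C program with gcc, eg gcc main.c addints.o, where main.c declares int add_ints(int, int);.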

As stated, please do post how to set up vim, emacs, joe, jupp, geany, et al. to do the same thing, and tie in with gdb, for yasm coding on Gentoo, or ask if you're having an issue understanding something in the book. Between us, we know (or can find out;) just about all we need to help :-) And don't forget ##asm on IRC:chat.freenode.net, and I'm always around in #friendly-coders when online: my nick is igli.

HTH,
steveL.
_________________
creaker wrote:
systemd. It is a really ass pain

update - "a most excellent portage wrapper"

#friendly-coders -- We're still here for you™ ;)


Last edited by steveL on Sat Oct 19, 2013 11:16 am; edited 1 time in total
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54211
Location: 56N 3W

PostPosted: Fri Oct 18, 2013 6:24 pm

steveL,

I'm a hardware guy: I had to design and build my CPU before I could program it; an assembler would have been a luxury :)

I see knowing assembler as a double edged sword. It depends on where your software output is targeted.
If you are writing target hardware independent code, you have to trust the compiler, so it makes little difference if you know assembler or not.
There is even an argument that says not knowing assembler is best.

If you write in a high-level language for hardware that you understand at the machine level, you tend to know how your compiler of choice plants code, so you pick some code constructs over others to help the compiler produce leaner, meaner code. It may still be portable, but not as efficient on other hardware where the compiler does things differently.

Don't get me started on byte code based interpreters, where the 'machine' doesn't actually exist.

I don't program much any more; it's not as much fun now as it was on 8-bit micros, where every instruction you squeezed out made it faster.
Assembler for PPC made my head hurt - all that out of order execution, which you had to write by hand, now in Intel/AMD too.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Fri Oct 18, 2013 7:38 pm

NeddySeagoon wrote:
I'm a hardware guy, I had to design and build my CPU before I could program it, assember would have been a luxury :)

Heh: and it shows OM ;-)
Quote:
I see knowing assembler as a double edged sword. It depends on where your software output is targeted.
If you are writing target hardware independent code, you have to trust the compiler, so it makes little difference if you know assembler or not.

The first part of your statement is true. As a whole however it is a non-sequitur. The difference it makes is the perspective it gives you as a programmer: as I said at the beginning: "if you have ever coded asm, then nothing computers do is ever that mysterious to you. You gain a much better intuitive understanding of what computers really do, which informs all of your other code."

You are falling into the trap of conflating "inform" with some other idea about making you write code in a particular language to suit some CPU you might know. But you just said above that you have no influence at all over what the compiler outputs: and indeed most novices are amazed at what the compiler outputs. At least, the ones who actually know asm in the first place. The rest have no clue what a machine really does.

A hardware guy does, sure. But we're not talking about hardware nor EEng, we're talking about programmers. For programmers to be able to appreciate what a compiler does, they need to know asm in the first place, or its output might as well be a net layout.
Quote:
There is even an argument that says not knowing assembler is best.

"An argument can be made" is the biggest crap I've heard: in general an argument can be made for anything. So let's discuss the argument itself. Sorry: I react badly to that phrase, even when implicit, since it was used by the Attorney-General in his rewritten-to-Bliar's-order "advice" on the Iraq War. Where he had stated unequivocally that it would be illegal, in the suppressed Opinion, the rewrite was all about "an argument can be made," an argument everyone involved knew to be specious.
Quote:
If you write in a high-level language for hardware that you understand at the machine level, you tend to know how your compiler of choice plants code, so you pick some code constructs over others to help the compiler produce leaner, meaner code. It may still be portable, but not as efficient on other hardware where the compiler does things differently.

Those are all classic mistakes of someone who isn't really a programmer. If you were a programmer, and not a hardware guy, you'd know damn well by now to keep your C clean and portable first and foremost, and never to do anything as idiotic as what you suggest. With all your years of experience, you'd have read UPE many decades ago, in fact you'd embody it since the philosophy is very much your modus operandi, and K&R would be at your desktop, along with several other books. And you would have programmed across several architectures, with an awareness that different machines do things very differently: which is why 99% of a *nix is in C.

"Knowledge comes from experience. And experience comes from making mistakes," and learning from them.
Quote:
I don't program much any more; it's not as much fun now as it was on 8-bit micros, where every instruction you squeezed out made it faster.
Assembler for PPC made my head hurt - all that out of order execution, which you had to write by hand, now in Intel/AMD too.

Read the book: it's by a guy of about your age. In fact he makes it clear to his students that they should not replicate looping constructs in the same way, but use the standard C for. And remember: some of us were around back then too. ;-)

Essentially, everyone's gone back to "RISC" style coding, and further, the memory model means all our instincts are spot-on; I'd argue they always were, and it just took the Marketing^W ICT people, who bury everything in layers, 20 years to realise it, same as it took C++ people 30 years to emulate LISP, badly. So enjoy your golden years: you're the old road-warrior all the youngsters can learn from, and this time they realise it. :-)


Last edited by steveL on Sat Mar 15, 2014 12:11 am; edited 1 time in total
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54211
Location: 56N 3W

PostPosted: Fri Oct 18, 2013 8:03 pm

steveL,

Heh - like I said at the outset, I'm a hardware guy.

steveL wrote:
For programmers to be able to appreciate what a compiler does, they need to know asm in the first place, or its output might as well be a net layout.
:) if you want to upset the FPGA or HDL guys, call them programmers :)

steveL wrote:
Essentially, everyone's gone back to "RISC" style coding

I hadn't thought of it that way, but it's very true.
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Sat Oct 19, 2013 12:06 am

NeddySeagoon wrote:

Assembler for PPC made my head hurt - all that out of order execution, which you had to write by hand, now in Intel/AMD too.

This sort of caught my attention; I've never programmed PPC. Is the OOO exposed to the assembly writer? I suppose the "coding rules" of VLIW in ia64 (meaning the fixed instruction templates) are a kind of OOO exposure, but even on the 2-banger processors, code compiled for the 1-banger will work just fine. Yes, it may not run as quickly as if it were optimally scheduled, but it should still work fine.

As hardware designers we all should have studied the MIPS R2000 pipeline exposure of instructions back to the assembly writer ("Delayed Branch") and we all knew this was great at the time, but horrible for future processors.

For backwards compatibility, I thought it didn't matter whether x86/x86_64 instructions were running on an in-order or OOO machine: the machine handles it. Granted, those CPU core designers have to deal with it...

And yes, what do you call Verilog coders... And I have my beef with Verilog as well: the current extensions, though great for reusing code quickly, I'm sure add gate bloat in the synthesized model...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Sat Oct 19, 2013 10:19 am

If you mean VHDL/RTL, we call those people: microcoders. (note: not micro-coders.)
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Sat Oct 19, 2013 3:00 pm

Microcode is yet another aspect of chip design; it gets even more confusing there.

Who uses VHDL these days, is it still popular? I thought most people have switched over to Verilog. Not sure though; I've seen a bit of switchover though...
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54211
Location: 56N 3W

PostPosted: Sat Oct 19, 2013 7:18 pm

eccerr0r,

The out of order execution was exposed in the dim and distant past. If you wrote in a high-level language, the compiler handled it.
True - it was never essential for getting the right answers.

I don't know how it's handled on modern PPC.
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Sun Oct 20, 2013 7:08 am

eccerr0r wrote:
Who uses VHDL these days, is it still popular? I thought most people have switched over to Verilog. Not sure though; I've seen a bit of switchover though...

No idea: I'm a software guy ;)
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Sun Oct 20, 2013 11:24 pm

I suppose i/o instructions and memory instructions may be dumped out onto the bus out of order; that may be the only externally visible architectural issue with OOO machines. But that is sort of a problem with caches in general, too.

I'll always remember what the EIEIO instruction was for... and it was indeed for the PPC. x86/ia64 had special uncacheable bits and fences to force things to exit the CPU in the proper order... but mostly it was for writeback cache issues.
John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

PostPosted: Mon Oct 21, 2013 2:19 am

I never really understood what those transistors all the hardware guys were talking about did until I studied physics.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
miket
Guru

Joined: 28 Apr 2007
Posts: 488
Location: Gainesville, FL, USA

PostPosted: Mon Oct 21, 2013 4:26 am

John R. Graham wrote:
I never really understood what those transistors all the hardware guys were talking about did until I studied physics.
Funny, I was a physics major before I switched to computer science. I really loved it when I got to the electrical-engineering courses we were required to take--they filled in the gap between transistors and logic gates. After that, I never thought of processors as being full of magic.

Microprocessors back then were very straightforward: fetch the opcode, fetch immediate operands as needed, fetch operands from memory as needed, execute the operation, store the result if that's what the operation required. There were no execution pipelines or branch prediction, and there was only one processor mode. There weren't many addressing modes. There was no floating point, and there were no multiplication or division instructions. On machines like this, if you needed to eke out performance, you'd pay attention to the processor spec sheets to find the number of cycles each instruction took to perform, and pay attention to the amount of storage you had. Now this was a great starting point for learning to use assembler! I hand-assembled a lot of Z-80 code.

I did more assembler when I moved to 16 bits. The 8086 had a much larger address space and new addressing modes to go with it, but (like them or not), it also had segment registers. Oh my, though, that was assembler in all its glory. With the stack-relative addressing mode on that processor, I could easily set up and use automatic variables (allocated on the stack) just as well as I could use them in C. This meant that I could do painless linkage between C functions and functions I wrote in assembler--a good thing for the places where I needed to improve performance. Even by the 80286, I was mindful of instruction timings.

But the fun stopped with the 80386. Compilers were generating better code, and the multiplicity of word-size modes was getting hairy to deal with. The things that really started to get me, though, were the concepts of pipelining and out-of-order execution. I was coming to understand that my choice of instruction ordering could cause pipeline stalls and suboptimal branches that could make my hand-written assembler perform more poorly than the output of a good compiler. Bummer.

Oh well. Concerns like the need for portability to different architectures also militated against most projects' using assembler code.


I do indeed see steveL's point, though. Just as I often say that many programmers today are impoverished by never having programmed in C and had to worry about memory allocation and freeing, array addressing, and guarding against dereferencing null pointers, they are also missing out by never having written in assembler and worried about instruction timings, bit-shift operations, and addressing modes. Just as we can talk about taking away the mystery surrounding how the hardware works, we should clear away the mystery of the foundation of the software. (I also think that programmers should have taken a course on compilers.)

While setting up an environment for writing assembler that targets the architecture of the machine you're using has the merit of being able to link with other programs and libraries on the machine and do something useful, I don't think that you should count on writing typical user-level programs this way. Though it would make a nice learning environment or test bed for writing device drivers or specialized functions for unusual cases, most of the time--even though you might do a better job doing branch prediction than the compiler--I wouldn't be surprised if the compiler came out better than you in terms of optimal instruction ordering. Bigger gains would come from improvements in the algorithm.

It would, however, be profitable to examine compiler output and note what it has to do for the funky things you might write in C.

Where I think you'd really get good bang for your buck as a learning environment would be 80286 assembler to run in an emulator. Pipelining was too primitive in that processor to be much of a crutch. Also, this is an environment where you could kick the compiler's butt.

Having seen the kind of output a C compiler might generate for a 64-bit machine, you'd get a basic idea of what a C compiler does to set up and use a stack frame for a function. You'd find that going from 64- to 16-bit code involves both complication and simplification. The biggest simplification is having to deal with only 16-bit values and pointers, not 64-bit ones.
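
For instance, the sort of prologue and epilogue you see in compiler output for a small function on amd64 looks roughly like this (an illustrative sketch in NASM syntax, not actual compiler output):
Code:
; frame.asm: roughly, long myfunc(long x) { long tmp = x; return tmp; }
segment .text
global  myfunc

myfunc:
   push  rbp            ; save the caller's frame pointer
   mov   rbp, rsp       ; establish our own frame
   sub   rsp, 16        ; reserve space for locals, keeping rsp 16-byte aligned
   mov   [rbp-8], rdi   ; spill the first argument into a local
   mov   rax, [rbp-8]   ; read it back as the return value
   leave                ; equivalent to: mov rsp, rbp / pop rbp
   ret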

In any event, using '286 assembler gets you closer faster to a lot of things that matter when seeing what kinds of instructions you are generating. Bit-shift operations, for example, are a big deal to me. I had to be very familiar with them way back in those early days, and I still use them a lot now. Here's one example of a very practical use for bit shifting when programming in assembler: being able to address entries in an array of structures. For quite a long time (and probably even now, even though I'm not entirely sure about the complete set of amd64 addressing modes), it has been much more efficient to multiply the structure size by the index using bit-shift-and-add operations than with an integer multiply instruction. I wrote macros that did multiply-by-N operations just for this purpose.
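
To make that concrete, here's a small sketch in amd64/NASM syntax (my illustration, not one of those macros): computing the address of element [index] in an array of hypothetical 24-byte structures with shift-and-add (via lea) instead of an integer multiply:
Code:
; struct_addr.asm: illustrative only -- rdi = array base, rsi = index, result in rax
segment .text
global  struct_addr

struct_addr:
   lea   rax, [rsi + rsi*2]   ; index * 3
   lea   rax, [rdi + rax*8]   ; base + index * 24 = &array[index]
   ret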

What I don't have, alas, are recommendations for setting up such an environment, or even for those vim tie-ins that steveL is looking for (vim being the only editor I'd want to use).
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Oct 21, 2013 8:55 am

miket wrote:
Funny, I was a physics major before I switched to computer science. I really loved it when I got to the electrical-engineering courses we were required to take--they filled in the gap between transistors and logic gates. After that, I never thought of processors as being full of magic.

Yeah, EEng is essential: unfortunately what they call "Computer Science" nowadays doesn't include it: I had to take first-year EEng myself, for no credit, and that was in '96. Seems to me that most CS depts now are really ICT, which is a Marketing term for Business Studies, and even many professors have never coded. Which sucks for their students, afaic. We see the effects on IRC, especially in ##c when someone comes in with completely idiotic advice from their "teacher."
Quote:
Microprocessors back then were very straightforward: fetch the opcode, fetch immediate operands as needed, fetch operands from memory as needed, execute the operation, store the result if that's what the operation required. There were no execution pipelines or branch prediction, and there was only one processor mode. There weren't many addressing modes. There was no floating point, and there were no multiplication or division instructions. On machines like this, if you needed to eke out performance, you'd pay attention to the processor spec sheets to find the number of cycles each instruction took to perform, and pay attention to the amount of storage you had. Now this was a great starting point for learning to use assembler! I hand-assembled a lot of Z-80 code.

Yeah, I loved Z80; could have lost half its insns quite happily, but that's the price for 8080 compat. Much nicer to work with than 6502 though. DJNZ and LDIR were lovely, for example. Nowadays it's ARM for me.
Quote:
I do indeed see steveL's point, though. Just as I often say that many programmers today are impoverished by never having programmed in C and had to worry about memory allocation and freeing, array addressing, and guarding against dereferencing null pointers, they are also missing out by never having written in assembler and worried about instruction timings, bit-shift operations, and addressing modes. Just as we can talk about taking away the mystery surrounding how the hardware works, we should clear away the mystery of the foundation of the software. (I also think that programmers should have taken a course on compilers.)

Exactly (underlined for effect); as for compiler theory, the awkbook teaches it much better than Core Computing ever did. Not surprising when you consider who the authors are; they basically invented what Core Computing 101 teaches. After that, once you have K&R done, you work through UPE (linked above) for implementation, along with the Dragon book for theory.
Quote:
While setting up an environment for writing assembler that targets the architecture of the machine you're using has the merit of being able to link with other programs and libraries on the machine and do something useful, I don't think that you should count on writing typical user-level programs this way. Though it would make a nice learning environment or test bed for writing device drivers or specialized functions for unusual cases, most of the time--even though you might do a better job doing branch prediction than the compiler--I wouldn't be surprised if the compiler came out better than you in terms of optimal instruction ordering. Bigger gains would come from improvements in the algorithm.

Oh yeah totally: algorithmic improvement opens the door for other optimisations as well. "The Practice of Programming" (Kernighan & Pike, 1999) which is the other book I recommend to beginners, lays this out well. "Once you have chosen the right algorithm, performance optimization is generally the last thing to worry about as you write a program."

Quote:
It would, however, be profitable to examine compiler output and note what it has to do for the funky things you might write in C.

Where I think you'd really get good bang for your buck as a learning environment would be 80286 assembler to run in an emulator. Pipelining was too primitive in that processor to be much of a crutch. Also, this is an environment where you could kick the compiler's butt.

To my mind the combination of a decent editor you are comfortable with and the Seyfarth book is all that's needed in this context. We're not trying to raise a generation of hardcore asm coders like we used to be, so "kicking the compiler's butt" is both a non-goal and in fact a dangerous game to play, since it encourages them to think they can out-do the compiler, which nowadays is simply not feasible, except for very controlled, very specific things like interrupt handling. Instruction sets change too fast, and even if we can sometimes, the goal really is to get them to understand that a computer is a really simple thing: it works at a very basic level and is effectively a very quick idiot.

Once they get that, the insecurity of not really knowing what's happening goes away. That in turn leads to more self-confident, less elitist coders, since they don't need the crutch of snobbery about other languages; they also learn to be more careful, since a crash really is a crash, and at every stage you know w/e it did is exactly what you told it to do: it would not have jumped to that bad address if you hadn't fed it garbage. Grokking GIGO is really vital, and much less appreciated nowadays, afaict.

amd64 feels a lot more like ARM to me than x86 ever did (though I've never coded x86 and never wanted to), since you have more registers and things like cmov.
Quote:
In any event, using '286 assembler gets you closer faster to a lot of things that matter when seeing what kinds of instructions you are generating. Bit-shift operations, for example, are a big deal to me. I had to be very familiar with them way back in those early days, and I still use them a lot now. Here's one example of a very practical use for bit shifting when programming in assembler: being able to address entries in an array of structures. For quite a long time (and probably even now, even though I'm not entirely sure about the complete set of amd64 addressing modes), it has been much more efficient to multiply the structure size by the index using bit-shift-and-add operations than with an integer multiply instruction. I wrote macros that did multiply-by-N operations just for this purpose.

You should check out "Hacker's Delight" (Warren); there's a 2nd edition out now as well, though it's not really worth buying both unless you're a compiler-writer. Essentially, multiplication by a small constant is a standard compiler optimisation nowadays.
Quote:
What I don't have, alas, are recommendations for setting up such an environment, or even for those vim tie-ins that steveL is looking for (vim being the only editor I'd want to use).

Well, the vim link in the main post is actually to Kate's vi input mode ;-) so you could try that.. Nah, just kidding: if you have vim set up for make then you can use that, or just tell it to run the same commands as given for the kate Quick Compile; apart from that you just need a syntax highlighter for nasm, which must be available already.
creaker
l33t

Joined: 14 Jul 2012
Posts: 651

PostPosted: Mon Oct 21, 2013 2:07 pm

It was relevant back when my box had only 16kB of RAM. It was a real treat - fitting the program into 16kB.
The last time I wrote in assembler was for my old Athlon XP 2400. That was in 2008, when I could still (with great difficulty) compete with gcc and vc on optimization. Since I switched to a Core2Duo I use it very rarely, for random tasks like PIC or Atmega programming.
Though if masm32 can be used on Linux, maybe...
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Oct 21, 2013 2:30 pm

creaker wrote:
It was relevant back when my box had only 16kB of RAM. It was a real treat - fitting the program into 16kB.
The last time I wrote in assembler was for my old Athlon XP 2400. That was in 2008, when I could still (with great difficulty) compete with gcc and vc on optimization. Since I switched to a Core2Duo I use it very rarely, for random tasks like PIC or Atmega programming.
Though if masm32 can be used on Linux, maybe...

I don't understand why you'd want to code in x86 when you can use amd64/x86_64, where SSE is guaranteed and you're not register-starved. At least on an 8-bit machine it was understandable, and even then you could bank-switch with EXX on the Z80, which was the "CISC" chip compared to 6502 "RISC" in those days.

Got a nice email from Ray Seyfarth (I wrote to him to ask about using exit.asm and fp.asm) which relates to the OoO execution discussion:
Seyfarth wrote:

Thanks for your write-up on the Gentoo forum. You're very kind.

Have you tried my new ebe based on Qt? (qtebe.sf.net) It has a lot of
worthwhile features. My 2 favorites are the "toy box" and the "bit
bucket". These 2 are things which are not normally part of an IDE. The
rest is mostly as expected in an IDE.

I had some fun figuring out how to get some efficiency out of the AVX
instructions. By using separate registers and unrolling, I managed to get
some benefit from the out-of-order execution and the multiple pipelines.
My Core i7 produced about 6 double precision floating point results per
machine cycle. I doubt that I would have hung in there if I had to design
the actual flow as one would have done on a vector processor years ago. I
viewed this as "giving the CPU some freedom in instruction ordering".

The AVX correlation function in chapter 19 mentions this rate; I imagine that's what he meant.
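
For anyone wondering what "separate registers and unrolling" looks like in practice, here's a rough sketch of my own (nothing to do with his correlation code, so treat it as illustrative only): summing an array of doubles with four independent ymm accumulators, so the additions don't form one long dependency chain and the OoO core can overlap them:
Code:
; avx_sum.asm: double avx_sum(const double *p, long n) -- n must be a positive multiple of 16
segment .text
global  avx_sum

avx_sum:
   vxorpd  ymm0, ymm0, ymm0      ; four independent accumulators
   vxorpd  ymm1, ymm1, ymm1
   vxorpd  ymm2, ymm2, ymm2
   vxorpd  ymm3, ymm3, ymm3
.loop:
   vaddpd  ymm0, ymm0, [rdi]     ; 4 doubles per register, 16 per iteration
   vaddpd  ymm1, ymm1, [rdi+32]
   vaddpd  ymm2, ymm2, [rdi+64]
   vaddpd  ymm3, ymm3, [rdi+96]
   add     rdi, 128
   sub     rsi, 16
   jnz     .loop
   vaddpd  ymm0, ymm0, ymm1      ; fold the four partial sums together
   vaddpd  ymm2, ymm2, ymm3
   vaddpd  ymm0, ymm0, ymm2
   vextractf128 xmm1, ymm0, 1    ; add the high 128 bits to the low
   vaddpd  xmm0, xmm0, xmm1
   vhaddpd xmm0, xmm0, xmm0      ; horizontal add: scalar result in xmm0
   vzeroupper
   ret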

As for "you're very kind" that's referring to my summary of his book:
All in all, a remarkable work: concise and comprehensive, while taking the reader step-by-step through how to build effective applications, from the beginning. It stands well alongside Kernighan's work, and is as useful for the modern toolchain, imo.
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Mon Oct 21, 2013 3:55 pm

I always thought the register bank switch opcode on the Z80 was kind of pointless except in a true embedded application where you won't even run general purpose code... When I coded assembly on the Z80 it was on a general purpose computer (TRS-80) - that's how I got this impression...

Programs have pretty much gotten too complicated for a mere human to write in assembly anymore, because there are too many variables to keep track of - yes, it makes it easier when you have a large register file, but the problems get even larger. Plus it seems that abstracting data structures tends to make it easier for humans to write code faster...

I try not to write in assembly anymore, mainly because it's very difficult to maintain and to reuse from one program to another. It does help save memory though. I was thinking about writing a nonstandard 24-bit floating point library for AVR to save register memory, but trying to write it in C failed pretty badly.
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Oct 21, 2013 5:14 pm

eccerr0r wrote:
I always thought the register bank switch opcode on the Z80 was kind of pointless except in a true embedded application where you won't even run general purpose code... When I coded assembly on the Z80 it was on a general purpose computer (TRS-80) - that's how I got this impression...

The "application where you won't even run general purpose code" was any game you wanted to have a chance of selling, ime. But then I had to hand-draw and mask every pixel; man did I want a C64.. ;-)
Quote:
Programs have pretty much gotten too complicated for a mere human to write in assembly anymore, because there are too many variables to keep track of - yes, it makes it easier when you have a large register file, but the problems get even larger. Plus it seems that abstracting data structures tends to make it easier for humans to write code faster...

Agreed: that's what high-level languages like C are for; and as I stated above there are reasons why 99% of a Unix is in C.
Quote:
I try not to write in assembly anymore, mainly because it's very difficult to maintain and to reuse from one program to another. It does help save memory though. I was thinking about writing a nonstandard 24-bit floating point library for AVR to save register memory, but trying to write it in C failed pretty badly.

Yup: if you're writing that 3% of code that isn't a premature optimisation, then nothing beats it. But nothing will save you from an exponential algorithm, or you wouldn't be trying to sweat it out with asm. N is not usually large, but when it is, the fundamental thing to get right is the algorithm and the data structures: everything else flows from there (or not.)
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54211
Location: 56N 3W

PostPosted: Mon Oct 21, 2013 8:20 pm

eccerr0r,

The register bank switch on the Z-80 was really only useful for interrupt routines.
No need to push things onto the stack and pop them off afterwards.
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Mon Oct 21, 2013 9:05 pm

Right, the bank was great for interrupt routines that have other interrupts masked (and hope that NMI doesn't happen), which means you can't use them in user applications if you don't know when interrupts (like clock or i/o) will happen and overwrite your values. So general purpose applications can't use them. Or are you going to disable interrupts whenever you're using the other bank, and pray for no NMI?

Or will people end up pushing everything to the stack anyway...
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54211
Location: 56N 3W

PostPosted: Tue Oct 22, 2013 5:55 pm

eccerr0r,

You just connect the NMI pin on the Z-80 to +5v
If you get an NMI then, you have bigger things to worry about :)
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Oct 23, 2013 5:11 am

Well, I'm not sure what the firmware did with NMI on the CPC-464, but I just used to DI at the start and get on with it.

I doubt it did much, since you could bank-switch it out (and we did: for video; to the best of my recollection: this was 1982 I think, so bear with me;) and iirc it was only jump vectors in page 0. Interrupts weren't a concern in any case, for me. Sync was to the frame refresh, which I polled once everything had been done for the current cycle, and it was all about making that deadline. That's the sense in which I mean this was not a general-purpose application as you described, eccerr0r: interrupts were always disabled. With hindsight, it's what would be called hard-realtime now I guess, albeit with no real-world disastrous consequence: if you missed that pulse, you'd failed, since your program stalled until the next one, which was ages in terms of what got done. At the time I'd never even heard the word "realtime", nor "embedded", but there was a hard constraint on game-engine performance. That's just the way it was, and it wasn't considered anything special: just game coding.

I certainly never did anything with the clock, and the only i/o was a tape drive, which at the time didn't seem a big deal; it was only used to load your game (though cassette copy-protection was a fun diversion when I finally got a disk-drive and the games companies refused a reasonable upgrade..;) Oh, keyboard, display and audio, ofc, but those go without saying. I spent most time, in every sense of the word, on graphics and never did audio; but envy of the C64 made me weep. ;)
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Wed Oct 23, 2013 4:49 pm

Well, that's exactly the issue at hand. If you were to switch banks, you had to know exactly what interrupts were coming in or not, and how to deal with it. This means that general purpose software can't use bank swaps unless you could disable OS and random housekeeping in the system. Seems most people are pointing Z80s to embedded apps - exactly. If you know your application you can stop using things like NMI and interrupts when they can cause realtime issues.

However it seems that computers have gotten so fast that people don't even care about cycle timing. Even worse, with the newer CPUs you just don't know how many cycles pass between when the instruction gets read and when the bus responds to that instruction.
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Thu Oct 24, 2013 5:41 pm

eccerr0r wrote:
Well, that's exactly the issue at hand. If you were to switch banks, you had to know exactly what interrupts were coming in or not, and how to deal with it. This means that general purpose software can't use bank swaps unless you could disable OS and random housekeeping in the system. Seems most people are pointing Z80s to embedded apps - exactly. If you know your application you can stop using things like NMI and interrupts when they can cause realtime issues.

Heh, it's funny to look back on it like that: at the time it didn't seem "embedded"; it felt like my whole world.
Yeah as I remember it, we had double buffer in top 32K of RAM, and switched between each 16K half, at refresh.

What kind of thing were you coding? Sounds complex (and a bit grown-up;)
Quote:
However it seems that computers have gotten so fast that people don't even care about cycle timing. Even worse, with the newer CPUs you just don't know how many cycles pass between when the instruction gets read and when the bus responds to that instruction.

Man I used to know the number of T states for everything, well the things I used, not the whole insn set(!) ofc, and add them up as I went along; Zaks was my constant companion. I still consider it one of the best Computing books I own.

On the wider point, I see it as similar to processes running on a multi-tasking OS; write simple code as if we are the only process and let the system worry about it. The same thinking that leads to small, tight code using minimal RAM, helps in the world of i-caches and d-caches; favour sequential access for data streams, since that is a case everyone optimises. The processor pre-"compiling" i-cache sequences is quite freaky to me, but not something I have to worry about, just be aware of. I never did like self-modifying code in any case (it's hard enough getting normal code to work.)
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

PostPosted: Thu Oct 24, 2013 6:10 pm

Me, actually, I didn't do much software. I had this general purpose computer (a TRS-80 Model III, 2.5MHz Z-80) when I was 10-ish, and that's where I learned BASIC. One program I really wanted to write was a graphics editor I could paint on the screen with (with its hideous graphics resolution). However, when it came time to save creations to disk, I wrote it in BASIC.

It took like a few minutes to write it to disk and read it back. Awful!

So I looked into assembly language. I had no assembler, unfortunately, so I had to hand-assemble.

The two USR() routines I ended up writing for the TRS-80 were one to read/write the screen to disk (that took seconds to do!) and one that filled the screen with random characters or an arbitrary character. I did this because I was fascinated by these routines that were written into a few games and wanted to write something equally fast where BASIC wouldn't do. I had a whiteout routine that someone had written as a template, but I still had to write the assembly language, refer to the Z80 spec sheets to hand-assemble it to machine code, and then POKE it into a variable... Ah, the memories...

I never bank-switched as I didn't need to; it was the foreground task. As the Model III had disk drives and a clock, there'd be plenty of interrupts flying around. I don't remember how DRAM refresh affected user programs, but I do recall reading the refresh register to seed my "random" screen routine.

Sometimes I wish SMC (self-modifying code) had never been created. It's the most vile creation ever made, yet it's almost ingenious for saving space. I disapprove of SMC because it's a PITA to implement on chips, and it seems the only reason for SMC these days is to write viruses. I really hate having to write special RTL code to make sure viruses work. That just seems wrong.
Yamakuzure
Advocate

Joined: 21 Jun 2006
Posts: 2282
Location: Adendorf, Germany

PostPosted: Fri Oct 25, 2013 10:06 am

A bit back on topic:

If you do C/C++ (or any other compiled language) programming and sometimes have problems with debuggers getting stuck in instructions between your code lines, you'll end up looking at disassembler output. This often looks rather different from what you'd have written in asm yourself. If you have the hex available along with the disassembled code (for gdb, use the /r modifier on the "disas" command), you might find this opcode reference helpful:
http://ref.x86asm.net/

It helped me greatly when working on a JIT compiler. (Injecting opcodes directly into memory and executing them.)
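
For example, with the exit program from the first post (built there with -g dwarf2), a quick gdb session to look at the raw bytes might go something like this (just a sketch of the commands; output omitted):
Code:
$ gdb ./exit
(gdb) break _start
(gdb) run
(gdb) disassemble /r _start
(gdb) stepi
(gdb) info registers eax ebx

The /r modifier prints the raw opcode bytes next to each instruction, stepi executes one machine instruction at a time, and info registers lets you watch the values the program loads.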
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.
Page 1 of 2