Inline Assembler
	
D, being a systems programming language, provides an inline assembler. The inline assembler is standardized for D implementations across the same CPU family. For example, the Intel Pentium inline assembler for a Win32 D compiler will be syntax compatible with the inline assembler for Linux running on an Intel Pentium.
Implementations of D on different architectures, however, are free to innovate upon the memory model, function call/return conventions, argument passing conventions, etc.
This document describes the x86 implementation of the inline assembler.
AsmInstruction:
    Identifier : AsmInstruction
    align IntegerExpression
    even
    naked
    db Operands
    ds Operands
    di Operands
    dl Operands
    df Operands
    dd Operands
    de Operands
    Opcode
    Opcode Operands
Operands:
    Operand
    Operand , Operands
Labels
Assembler instructions can be labeled just like other statements. They can be the target of goto statements. For example:
void *pc;
asm
{
  call L1          ;
 L1:               ;
  pop  EBX         ;
  mov  pc[EBP],EBX ; // pc now points to code at L1
}
align IntegerExpression
IntegerExpression:
    IntegerLiteral
    Identifier
	Causes the assembler to emit NOP instructions to align the next assembler instruction on an IntegerExpression boundary. IntegerExpression must evaluate at compile time to an integer that is a power of 2.
Aligning the start of a loop body can sometimes have a dramatic effect on the execution speed.
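As a rough sketch of how align might be placed ahead of a loop body, the hypothetical function below sums the integers from n down to 1. The names sumTo, Lloop and Ldone are illustrative only and do not come from the specification. A 32-bit x86 target is assumed, and whether the alignment helps at all depends on the CPU.

```d
// Hypothetical example: sums n + (n-1) + ... + 1 using an aligned loop body.
// Assumes a 32-bit x86 target.
int sumTo(int n)
{
    int total;
    asm
    {
        mov ECX, n       ; // loop counter
        xor EAX, EAX     ; // running sum starts at 0
        test ECX, ECX    ;
        jle Ldone        ; // nothing to add when n <= 0
        align 16         ; // pad with NOPs so Lloop starts on a 16 byte boundary
    Lloop:
        add EAX, ECX     ;
        dec ECX          ;
        jnz Lloop        ;
    Ldone:
        mov total, EAX   ; // store the sum in the local variable
    }
    return total;
}
```

The padding NOPs are executed once, when control falls through into the loop; the backward jump lands directly on the aligned label.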
even
Causes the assembler to emit NOP instructions to align the next assembler instruction on an even boundary.
naked
Causes the compiler to not generate the function prolog and epilog sequences. Writing them then becomes the responsibility of the inline assembly programmer. naked is normally used when the entire function is to be written in assembler.
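A minimal sketch of a function written entirely in assembler follows. The function name answer is hypothetical, and the code assumes the common x86 convention that an int result is returned in EAX, which this document does not itself specify.

```d
// Hypothetical example: the whole function body is assembler.
// Assumption: an int result is returned in EAX on the x86 target.
int answer()
{
    asm
    {
        naked        ; // no prolog or epilog is generated
        mov EAX, 42  ; // place the result in the assumed return register
        ret          ; // the programmer must return explicitly
    }
}
```

Because no stack frame is set up here, the function avoids referring to parameters or local variables by name.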
db, ds, di, dl, df, dd, de
These pseudo ops are for inserting raw data directly into the code. db is for bytes, ds is for 16 bit words, di is for 32 bit words, dl is for 64 bit words, df is for 32 bit floats, dd is for 64 bit doubles, and de is for 80 bit extended reals. Each can have multiple operands. If an operand is a string literal, it is as if there were length operands, where length is the number of characters in the string. One character is used per operand. For example:
asm
{
  db 5,6,0x83;   // insert bytes 0x05, 0x06, and 0x83 into code
  ds 0x1234;     // insert bytes 0x34, 0x12
  di 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00
  dl 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
  df 1.234;      // insert float 1.234
  dd 1.234;      // insert double 1.234
  de 1.234;      // insert real 1.234
  db "abc";      // insert bytes 0x61, 0x62, and 0x63
  ds "abc";      // insert bytes 0x61, 0x00, 0x62, 0x00, 0x63, 0x00
}
Opcodes
A list of supported opcodes is at the end.
The following registers are supported. Register names are always in upper case.
Register:
    AL AH AX EAX
    BL BH BX EBX
    CL CH CX ECX
    DL DH DX EDX
    BP EBP
    SP ESP
    DI EDI
    SI ESI
    ES CS SS DS GS FS
    CR0 CR2 CR3 CR4
    DR0 DR1 DR2 DR3 DR6 DR7
    TR3 TR4 TR5 TR6 TR7
    ST
    ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
    MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
    XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7
Special Cases
- lock, rep, repe, repne, repnz, repz
 - These prefix instructions do not appear in the same statement as the instructions they prefix; they appear in their own statement. For example:
asm { rep ; movsb ; }
- pause
 - This opcode is not supported by the assembler; instead use
asm { rep ; nop ; }
which produces the same result.
- floating point ops
 - Use the two operand form of the instruction format:
fdiv ST(1);     // wrong
fmul ST;        // wrong
fdiv ST,ST(1);  // right
fmul ST,ST(0);  // right
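Returning to the prefix rule for rep described above, here is a sketch of a byte copy in which the prefix and the string instruction are separate statements. The function copyBytes and its parameter names are hypothetical, and the code assumes 32-bit flat model code in which DS and ES address the same memory.

```d
// Hypothetical example: copy n bytes from src to dst.
// Assumes 32-bit flat model code (DS and ES refer to the same memory).
void copyBytes(void* dst, void* src, uint n)
{
    asm
    {
        mov ESI, src  ; // source pointer
        mov EDI, dst  ; // destination pointer
        mov ECX, n    ; // number of bytes to copy
        cld           ; // clear the direction flag so the copy runs forward
        rep           ; // the prefix is a statement of its own
        movsb         ; // the instruction being prefixed
    }
}
```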
Operands
Operand:
    AsmExp
AsmExp:
    AsmLogOrExp
    AsmLogOrExp ? AsmExp : AsmExp
AsmLogOrExp:
    AsmLogAndExp
    AsmLogAndExp || AsmLogAndExp
AsmLogAndExp:
    AsmOrExp
    AsmOrExp && AsmOrExp
AsmOrExp:
    AsmXorExp
    AsmXorExp | AsmXorExp
AsmXorExp:
    AsmAndExp
    AsmAndExp ^ AsmAndExp
AsmAndExp:
    AsmEqualExp
    AsmEqualExp & AsmEqualExp
AsmEqualExp:
    AsmRelExp
    AsmRelExp == AsmRelExp
    AsmRelExp != AsmRelExp
AsmRelExp:
    AsmShiftExp
    AsmShiftExp < AsmShiftExp
    AsmShiftExp <= AsmShiftExp
    AsmShiftExp > AsmShiftExp
    AsmShiftExp >= AsmShiftExp
AsmShiftExp:
    AsmAddExp
    AsmAddExp << AsmAddExp
    AsmAddExp >> AsmAddExp
    AsmAddExp >>> AsmAddExp
AsmAddExp:
    AsmMulExp
    AsmMulExp + AsmMulExp
    AsmMulExp - AsmMulExp
AsmMulExp:
    AsmBrExp
    AsmBrExp * AsmBrExp
    AsmBrExp / AsmBrExp
    AsmBrExp % AsmBrExp
AsmBrExp:
    AsmUnaExp
    AsmBrExp [ AsmExp ]
AsmUnaExp:
    AsmTypePrefix AsmExp
    offsetof AsmExp
    seg AsmExp
    + AsmUnaExp
    - AsmUnaExp
    ! AsmUnaExp
    ~ AsmUnaExp
    AsmPrimaryExp
AsmPrimaryExp:
    IntegerLiteral
    FloatLiteral
    __LOCAL_SIZE
    $
    Register
    DotIdentifier
DotIdentifier:
    Identifier
    Identifier . DotIdentifier
The operand syntax more or less follows the Intel CPU documentation conventions. In particular, the convention is that for two operand instructions the source is the right operand and the destination is the left operand. The syntax differs from Intel's in order to be compatible with the D language tokenizer and to simplify parsing.
seg means load the segment number that the symbol is in. This is not relevant for flat model code; instead, do a move from the relevant segment register.
Operand Types
AsmTypePrefix:
    near ptr
    far ptr
    byte ptr
    short ptr
    int ptr
    word ptr
    dword ptr
    qword ptr
    float ptr
    double ptr
    real ptr
	In cases where the operand size is ambiguous, as in:
add  [EAX],3 ;
	it can be disambiguated by using an AsmTypePrefix:
add  byte ptr [EAX],3 ;
add  int ptr [EAX],7  ;
	far ptr is not relevant for flat model code.
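To make the effect of a type prefix concrete, the sketch below adds 3 to a single byte through a pointer. The function name addThree is hypothetical and a 32-bit x86 target is assumed.

```d
// Hypothetical example: add 3 to the byte that p points to.
// Assumes a 32-bit x86 target, so the pointer fits in EAX.
void addThree(ubyte* p)
{
    asm
    {
        mov EAX, p             ; // load the pointer
        add byte ptr [EAX], 3  ; // byte ptr selects an 8 bit memory operand
    }
}
```

Because the operation is 8 bits wide, an overflow wraps within that byte and the neighbouring byte in memory is left unchanged.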
Struct/Union/Class Member Offsets
To access members of an aggregate, given that a pointer to the aggregate is in a register, use the qualified name of the member:
struct Foo { int a,b,c; }
int bar(Foo *f) {
  asm {
    mov EBX,f          ; // load the pointer to the aggregate into EBX
    mov EAX,Foo.b[EBX] ; // load member b, at offset Foo.b from EBX, into EAX
  }
}
Stack Variables
Stack variables (variables local to a function and allocated on the stack) are accessed via the name of the variable indexed by EBP:
int foo(int x) {
  asm {
    mov EAX,x[EBP] ; // loads value of parameter x into EAX
    mov EAX,x      ; // does the same thing
  }
}
	If the [EBP] is omitted, it is assumed for local variables. If naked is used, this no longer holds.
Special Symbols
- $
 - Represents the program counter of the start of the next instruction. So,
jmp $ ;
branches to the instruction following the jmp instruction. The $ can only appear as the target of a jmp or call instruction.
- __LOCAL_SIZE
 - This gets replaced by the number of local bytes in the local stack frame. It is most handy when naked is invoked and a custom stack frame is programmed.
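The following is a sketch of the custom stack frame case mentioned for __LOCAL_SIZE, assuming a 32-bit x86 target, an EBP based frame, and an int result returned in EAX. The function name frameDemo and the local x are hypothetical. Since naked is used, the [EBP] index is written out explicitly.

```d
// Hypothetical example: a naked function that builds its own stack frame.
// Assumptions: 32-bit x86, EBP based frame, int result returned in EAX.
int frameDemo()
{
    int x;
    asm
    {
        naked                  ;
        push EBP               ; // hand-written prolog
        mov EBP, ESP           ;
        sub ESP, __LOCAL_SIZE  ; // reserve room for the locals
        mov EAX, 7             ;
        mov x[EBP], EAX        ; // store to the local; [EBP] must be explicit
        mov EAX, x[EBP]        ; // load it back as the result
        mov ESP, EBP           ; // hand-written epilog
        pop EBP                ;
        ret                    ;
    }
}
```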
 
Opcodes Supported
| aaa | aad | aam | aas | adc | 
| add | addpd | addps | addsd | addss | 
| and | andnpd | andnps | andpd | andps | 
| arpl | bound | bsf | bsr | bswap | 
| bt | btc | btr | bts | call | 
| cbw | cdq | clc | cld | clflush | 
| cli | clts | cmc | cmova | cmovae | 
| cmovb | cmovbe | cmovc | cmove | cmovg | 
| cmovge | cmovl | cmovle | cmovna | cmovnae | 
| cmovnb | cmovnbe | cmovnc | cmovne | cmovng | 
| cmovnge | cmovnl | cmovnle | cmovno | cmovnp | 
| cmovns | cmovnz | cmovo | cmovp | cmovpe | 
| cmovpo | cmovs | cmovz | cmp | cmppd | 
| cmpps | cmps | cmpsb | cmpsd | cmpss | 
| cmpsw | cmpxch8b | cmpxchg | comisd | comiss | 
| cpuid | cvtdq2pd | cvtdq2ps | cvtpd2dq | cvtpd2pi | 
| cvtpd2ps | cvtpi2pd | cvtpi2ps | cvtps2dq | cvtps2pd | 
| cvtps2pi | cvtsd2si | cvtsd2ss | cvtsi2sd | cvtsi2ss | 
| cvtss2sd | cvtss2si | cvttpd2dq | cvttpd2pi | cvttps2dq | 
| cvttps2pi | cvttsd2si | cvttss2si | cwd | cwde | 
| da | daa | das | db | dd | 
| de | dec | df | di | div | 
| divpd | divps | divsd | divss | dl | 
| dq | ds | dt | dw | emms | 
| enter | f2xm1 | fabs | fadd | faddp | 
| fbld | fbstp | fchs | fclex | fcmovb | 
| fcmovbe | fcmove | fcmovnb | fcmovnbe | fcmovne | 
| fcmovnu | fcmovu | fcom | fcomi | fcomip | 
| fcomp | fcompp | fcos | fdecstp | fdisi | 
| fdiv | fdivp | fdivr | fdivrp | feni | 
| ffree | fiadd | ficom | ficomp | fidiv | 
| fidivr | fild | fimul | fincstp | finit | 
| fist | fistp | fisub | fisubr | fld | 
| fld1 | fldcw | fldenv | fldl2e | fldl2t | 
| fldlg2 | fldln2 | fldpi | fldz | fmul | 
| fmulp | fnclex | fndisi | fneni | fninit | 
| fnop | fnsave | fnstcw | fnstenv | fnstsw | 
| fpatan | fprem | fprem1 | fptan | frndint | 
| frstor | fsave | fscale | fsetpm | fsin | 
| fsincos | fsqrt | fst | fstcw | fstenv | 
| fstp | fstsw | fsub | fsubp | fsubr | 
| fsubrp | ftst | fucom | fucomi | fucomip | 
| fucomp | fucompp | fwait | fxam | fxch | 
| fxrstor | fxsave | fxtract | fyl2x | fyl2xp1 | 
| hlt | idiv | imul | in | inc | 
| ins | insb | insd | insw | int | 
| into | invd | invlpg | iret | iretd | 
| ja | jae | jb | jbe | jc | 
| jcxz | je | jecxz | jg | jge | 
| jl | jle | jmp | jna | jnae | 
| jnb | jnbe | jnc | jne | jng | 
| jnge | jnl | jnle | jno | jnp | 
| jns | jnz | jo | jp | jpe | 
| jpo | js | jz | lahf | lar | 
| ldmxcsr | lds | lea | leave | les | 
| lfence | lfs | lgdt | lgs | lidt | 
| lldt | lmsw | lock | lods | lodsb | 
| lodsd | lodsw | loop | loope | loopne | 
| loopnz | loopz | lsl | lss | ltr | 
| maskmovdqu | maskmovq | maxpd | maxps | maxsd | 
| maxss | mfence | minpd | minps | minsd | 
| minss | mov | movapd | movaps | movd | 
| movdq2q | movdqa | movdqu | movhlps | movhpd | 
| movhps | movlhps | movlpd | movlps | movmskpd | 
| movmskps | movntdq | movnti | movntpd | movntps | 
| movntq | movq | movq2dq | movs | movsb | 
| movsd | movss | movsw | movsx | movupd | 
| movups | movzx | mul | mulpd | mulps | 
| mulsd | mulss | neg | nop | not | 
| or | orpd | orps | out | outs | 
| outsb | outsd | outsw | packssdw | packsswb | 
| packuswb | paddb | paddd | paddq | paddsb | 
| paddsw | paddusb | paddusw | paddw | pand | 
| pandn | pavgb | pavgw | pcmpeqb | pcmpeqd | 
| pcmpeqw | pcmpgtb | pcmpgtd | pcmpgtw | pextrw | 
| pinsrw | pmaddwd | pmaxsw | pmaxub | pminsw | 
| pminub | pmovmskb | pmulhuw | pmulhw | pmullw | 
| pmuludq | pop | popa | popad | popf | 
| popfd | por | prefetchnta | prefetcht0 | prefetcht1 | 
| prefetcht2 | psadbw | pshufd | pshufhw | pshuflw | 
| pshufw | pslld | pslldq | psllq | psllw | 
| psrad | psraw | psrld | psrldq | psrlq | 
| psrlw | psubb | psubd | psubq | psubsb | 
| psubsw | psubusb | psubusw | psubw | punpckhbw | 
| punpckhdq | punpckhqdq | punpckhwd | punpcklbw | punpckldq | 
| punpcklqdq | punpcklwd | push | pusha | pushad | 
| pushf | pushfd | pxor | rcl | rcpps | 
| rcpss | rcr | rdmsr | rdpmc | rdtsc | 
| rep | repe | repne | repnz | repz | 
| ret | retf | rol | ror | rsm | 
| rsqrtps | rsqrtss | sahf | sal | sar | 
| sbb | scas | scasb | scasd | scasw | 
| seta | setae | setb | setbe | setc | 
| sete | setg | setge | setl | setle | 
| setna | setnae | setnb | setnbe | setnc | 
| setne | setng | setnge | setnl | setnle | 
| setno | setnp | setns | setnz | seto | 
| setp | setpe | setpo | sets | setz | 
| sfence | sgdt | shl | shld | shr | 
| shrd | shufpd | shufps | sidt | sldt | 
| smsw | sqrtpd | sqrtps | sqrtsd | sqrtss | 
| stc | std | sti | stmxcsr | stos | 
| stosb | stosd | stosw | str | sub | 
| subpd | subps | subsd | subss | sysenter | 
| sysexit | test | ucomisd | ucomiss | ud2 | 
| unpckhpd | unpckhps | unpcklpd | unpcklps | verr | 
| verw | wait | wbinvd | wrmsr | xadd | 
| xchg | xlat | xlatb | xor | xorpd | 
| xorps | 
Pentium 4 (Prescott) Opcodes Supported
| addsubpd | addsubps | fisttp | haddpd | haddps | 
| hsubpd | hsubps | lddqu | monitor | movddup | 
| movshdup | movsldup | mwait | 
AMD Opcodes Supported
| pavgusb | pf2id | pfacc | pfadd | pfcmpeq | 
| pfcmpge | pfcmpgt | pfmax | pfmin | pfmul | 
| pfnacc | pfpnacc | pfrcp | pfrcpit1 | pfrcpit2 | 
| pfrsqit1 | pfrsqrt | pfsub | pfsubr | pi2fd | 
| pmulhrw | pswapd | 