//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend: FP stack related stuff
//===---------------------------------------------------------------------===//
//===---------------------------------------------------------------------===//
Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html
//===---------------------------------------------------------------------===//
This should use fiadd on chips where it is profitable:
double foo(double P, int *I) { return P+*I; }
We have fiadd patterns now, but the following two patterns have the same cost
and complexity. We need a way to specify that the latter is more profitable.
def FpADD32m  : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (extloadf64f32 addr:$src2)))]>;
                // ST(0) = ST(0) + [mem32]

def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (X86fild addr:$src2, i32)))]>;
                // ST(0) = ST(0) + [mem32int]
//===---------------------------------------------------------------------===//
The FP stackifier should handle simple permutations to reduce the number of
shuffle instructions, e.g. turning:
fld P   ->      fld Q
fld Q           fld P
fxch
or:
fxch    ->      fucomi
fucomi          jl X
jg X
Ideas: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
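To see why the first rewrite (fld P, fld Q, fxch) is safe, here is a toy C
model of the x87 register stack. It is only a sketch: FPStack, fld, fxch and
same_after_rewrite are hypothetical names invented for this note, not LLVM
code, and the model ignores stack overflow and the tag word.

```c
/* Toy model of the x87 register stack (hypothetical, illustration only). */
typedef struct {
    double st[8];  /* st[0] models the stack top, ST(0) */
    int depth;     /* number of live entries; must stay below 8 here */
} FPStack;

/* fld: push a value; every existing entry moves one slot down. */
static void fld(FPStack *s, double v) {
    for (int i = s->depth; i > 0; --i)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
}

/* fxch: exchange ST(0) and ST(1). */
static void fxch(FPStack *s) {
    double t = s->st[0];
    s->st[0] = s->st[1];
    s->st[1] = t;
}

/* Returns 1 if "fld P; fld Q; fxch" and "fld Q; fld P" leave the same stack. */
static int same_after_rewrite(double P, double Q) {
    FPStack a = {{0}, 0}, b = {{0}, 0};
    fld(&a, P); fld(&a, Q); fxch(&a);   /* original three-instruction form */
    fld(&b, Q); fld(&b, P);             /* rewritten two-instruction form */
    return a.depth == b.depth && a.st[0] == b.st[0] && a.st[1] == b.st[1];
}
```

Both sequences end with P in ST(0) and Q in ST(1), so loading in the other
order makes the fxch unnecessary.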
//===---------------------------------------------------------------------===//
Add a target specific hook to DAG combiner to handle SINT_TO_FP and FP_TO_SINT when the source operand is already in memory.
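For reference, source patterns of the kind this note is about look like the
two C functions below (the function names are made up for illustration). In
the first, the integer operand of the conversion is loaded straight from
memory, which is the form fild consumes directly.

```c
/* SINT_TO_FP whose integer operand is already in memory. */
double int_in_memory_to_double(const int *p) {
    return *p;
}

/* FP_TO_SINT whose floating-point operand is already in memory. */
int double_in_memory_to_int(const double *p) {
    return (int)*p;   /* C conversion truncates toward zero */
}
```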
//===---------------------------------------------------------------------===//
Open code rint,floor,ceil,trunc:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html
Open code the sincos[f] libcall.
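As a rough picture of what "open coding" means for floor, the C sketch below
computes it inline with a truncating conversion and a fix-up instead of a
libcall. This is not the x87 sequence from the patches linked above, just one
possible inline form. The name floor_open_coded is hypothetical, and the
sketch assumes x fits in a long long; it ignores NaN, infinities, and the sign
of zero.

```c
/* Sketch: floor without a libcall. Valid only when x fits in a long long. */
static double floor_open_coded(double x) {
    long long i = (long long)x;     /* C conversion truncates toward zero */
    double t = (double)i;
    /* Truncation rounds negative non-integers up, so step down once. */
    return (t > x) ? t - 1.0 : t;
}
```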
//===---------------------------------------------------------------------===//
None of the FPStack instructions are handled in X86RegisterInfo::foldMemoryOperand, which prevents the spiller from folding spill code into the instructions.
//===---------------------------------------------------------------------===//
Currently the x86 codegen isn't very good at mixing SSE and FPStack code:
unsigned int foo(double x) { return x; }
foo:
        subl $20, %esp
        movsd 24(%esp), %xmm0
        movsd %xmm0, 8(%esp)
        fldl 8(%esp)
        fisttpll (%esp)
        movl (%esp), %eax
        addl $20, %esp
        ret
This just requires being smarter when custom expanding fptoui.
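One possible shape for such an expansion, written in C for clarity, stays
entirely in signed 32-bit conversions (the kind cvttsd2si provides in 32-bit
mode) and so avoids the round trip through the FP stack. This is an assumption
about what "smarter" could look like, not a description of the actual
lowering. fptoui_sketch is a hypothetical name, and the sketch assumes
0 <= x < 2^32.

```c
/* Sketch: double -> unsigned 32-bit using only signed 32-bit conversions.
   Assumes 0 <= x < 2^32; other inputs are not handled. */
static unsigned int fptoui_sketch(double x) {
    const double two31 = 2147483648.0;          /* 2^31 */
    if (x < two31)
        return (unsigned int)(int)x;            /* already fits in a signed int */
    /* Bias into signed range, convert, then restore the top bit. */
    return (unsigned int)(int)(x - two31) + 0x80000000u;
}
```

Subtracting 2^31 is exact for doubles in this range, so the result matches a
direct truncating conversion.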
//===---------------------------------------------------------------------===//