Optimize sin/cos polynomial evaluation using FMA

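This change conditionally uses Fused Multiply-Add (FMA) operations to
speed up evaluation of the polynomial approximation that implements sin
and cos. The polynomial is already written in Horner form in x², so each
multiply-then-add step maps onto one MulAdd() call. The MulAdd()
intrinsic emits an FMA instruction when one is available and deemed more
efficient than a separate multiplication and addition; otherwise it
computes the product and the sum separately.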

Bug: b/216472189
Bug: b/169754022
Change-Id: I423425250b1d5489514683d63f3d5261f5b59dbb
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/63548
Kokoro-Result: kokoro <noreply+kokoro@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Sean Risser <srisser@google.com>
diff --git a/src/Pipeline/ShaderCore.cpp b/src/Pipeline/ShaderCore.cpp
index c2ab391..68e7fcd 100644
--- a/src/Pipeline/ShaderCore.cpp
+++ b/src/Pipeline/ShaderCore.cpp
@@ -182,7 +182,7 @@
 
 	Float4 x2 = x * x;
 
-	return ((A * x2 + B) * x2 + C) * x;
+	return MulAdd(MulAdd(A, x2, B), x2, C) * x;
 }
 
 Float4 Sin(RValue<Float4> x)