Don't make use of cvtps2dq in MSan builds

MemorySanitizer's instrumentation currently does not handle the cvtps2dq
intrinsic/instruction. It falls back to checking the source operand for
any uninitialized data. Shaders can contain conditional code which
initializes only some SIMD lanes, and only those lanes are logically
used in further computations. It is therefore valid in these cases to
apply cvtps2dq to a partially initialized vector, but MSan reports it
as an error.

This can be worked around by avoiding the use of cvtps2dq and instead
relying on lowerRoundInt(), which translates into the nearbyint
intrinsic followed by an FPToSI. MemorySanitizer handles nearbyint in
the maybeHandleSimpleNomemIntrinsic() method, which copies the shadow
of the source vector to the destination vector (i.e. it propagates the
shadow without checking it).
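
As a rough illustration of the difference between the two behaviors, the
toy model below tracks one "uninitialized" flag per lane. It is only a
sketch: Shadow4, strictCheckReports() and propagate() are hypothetical
names invented here and do not correspond to MemorySanitizer's actual
data structures or code.

```cpp
// Hypothetical toy model of per-lane shadow state (not MSan's real code).
struct Shadow4
{
	bool uninit[4];  // true = the lane holds uninitialized data
};

// Models the strict fallback: report as soon as any source lane is
// uninitialized, even if that lane is never logically used.
constexpr bool strictCheckReports(const Shadow4 &src)
{
	return src.uninit[0] || src.uninit[1] || src.uninit[2] || src.uninit[3];
}

// Models shadow propagation: the destination inherits the source shadow
// lane by lane, and nothing is reported at this point.
constexpr Shadow4 propagate(const Shadow4 &src)
{
	return src;
}
```

With only lane 0 uninitialized, the strict check reports an error, while
propagation merely carries the flag along to lane 0 of the result.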

Note that cvtps2dq does not follow the same code path as nearbyint
because it has different types for the source and destination vector.
MSan could handle it explicitly by doing the same shadow propagation.
However, since the workaround in Reactor simply uses two cheap
instructions instead of one, fixing this in LLVM is not performance
critical.

The rr::RoundIntClamped() intrinsic required a fix in the fallback path:
0x80000000 was meant to represent -2147483648 but was instead converted
to the positive float value 2147483648.0f. Explicitly casting it to
int first produces the desired negative integer value before the
conversion to float.
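
The literal issue can be reproduced in isolation. The sketch below
assumes a 32-bit int, in which case the hexadecimal literal 0x80000000
does not fit in int and has type unsigned int. The helper names
fromLiteral() and fromSignedCast() are made up for this illustration and
are not part of Reactor.

```cpp
#include <type_traits>

// Assumption: int is 32 bits wide, so the literal is an unsigned int.
static_assert(std::is_same<decltype(0x80000000), unsigned int>::value,
              "this sketch assumes a 32-bit int");

// Converting the unsigned literal directly yields a positive float.
constexpr float fromLiteral()
{
	return static_cast<float>(0x80000000);
}

// Casting to int first yields INT_MIN on two's complement targets
// (implementation-defined before C++20), then a negative float.
constexpr float fromSignedCast()
{
	return static_cast<float>(static_cast<int>(0x80000000));
}
```

Under that assumption fromLiteral() evaluates to 2147483648.0f and
fromSignedCast() to -2147483648.0f, which is the lower clamp bound the
fallback path needs.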

Bug: b/172238865
Change-Id: I4f07bb8cb6d25d914dab836f64510f8b2bad18ba
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/65608
Kokoro-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
diff --git a/src/Reactor/LLVMReactor.cpp b/src/Reactor/LLVMReactor.cpp
index 77b9fbc..05002a6 100644
--- a/src/Reactor/LLVMReactor.cpp
+++ b/src/Reactor/LLVMReactor.cpp
@@ -2671,7 +2671,7 @@
 RValue<Int4> RoundInt(RValue<Float4> cast)
 {
 	RR_DEBUG_INFO_UPDATE_LOC();
-#if defined(__i386__) || defined(__x86_64__)
+#if(defined(__i386__) || defined(__x86_64__)) && !__has_feature(memory_sanitizer)
 	return x86::cvtps2dq(cast);
 #else
 	return As<Int4>(V(lowerRoundInt(V(cast.value()), T(Int4::type()))));
@@ -2683,7 +2683,7 @@
 	RR_DEBUG_INFO_UPDATE_LOC();
 
 // TODO(b/165000222): Check if fptosi_sat produces optimal code for x86 and ARM.
-#if defined(__i386__) || defined(__x86_64__)
+#if(defined(__i386__) || defined(__x86_64__)) && !__has_feature(memory_sanitizer)
 	// cvtps2dq produces 0x80000000, a negative value, for input larger than
 	// 2147483520.0, so clamp to 2147483520. Values less than -2147483520.0
 	// saturate to 0x80000000.
@@ -2698,7 +2698,7 @@
 	    jit->module.get(), llvm::Intrinsic::fptosi_sat, { T(Int4::type()), T(Float4::type()) });
 	return RValue<Int4>(V(jit->builder->CreateCall(fptosi_sat, { rounded })));
 #else
-	RValue<Float4> clamped = Max(Min(cast, Float4(0x7FFFFF80)), Float4(0x80000000));
+	RValue<Float4> clamped = Max(Min(cast, Float4(0x7FFFFF80)), Float4(static_cast<int>(0x80000000)));
 	return As<Int4>(V(lowerRoundInt(V(clamped.value()), T(Int4::type()))));
 #endif
 }
@@ -3591,6 +3591,8 @@
 
 RValue<Int4> cvtps2dq(RValue<Float4> val)
 {
+	ASSERT(!__has_feature(memory_sanitizer));  // TODO(b/172238865): Not correctly instrumented by MemorySanitizer.
+
 	return RValue<Int4>(createInstruction(llvm::Intrinsic::x86_sse2_cvtps2dq, val.value()));
 }