Subzero: Add support for SSE4.1 instructions.

* Add initial support for code generation with SSE4.1 instructions. The
following operations are affected:
 - multiplication of v4i32 vectors
 - select
 - insertelement
 - extractelement
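
For context on why v4i32 multiplication benefits from SSE4.1: SSE4.1
provides pmulld, a lane-wise 32-bit multiply, whereas SSE2 only has
pmuludq, which multiplies lanes 0 and 2 into two 64-bit products. A common
SSE2 fallback therefore runs pmuludq twice and recombines the low halves.
The sketch below is a portable scalar model of that idea, not the lowering
code from this change; all function and type names are hypothetical.

```cpp
#include <array>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>;
using V2i64 = std::array<uint64_t, 2>;

// Models pmulld (SSE4.1): lane-wise multiply keeping the low 32 bits.
// The low 32 bits are the same for signed and unsigned operands.
V4i32 mulSSE41(const V4i32 &a, const V4i32 &b) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] * b[i];
  return r;
}

// Models pmuludq (SSE2): unsigned 32x32->64 multiply of lanes 0 and 2.
V2i64 pmuludq(const V4i32 &a, const V4i32 &b) {
  return {uint64_t(a[0]) * b[0], uint64_t(a[2]) * b[2]};
}

// Models a shuffle (such as pshufd) that moves lanes 1 and 3 into
// positions 0 and 2 so that pmuludq can reach them.
V4i32 oddToEven(const V4i32 &v) { return {v[1], v[1], v[3], v[3]}; }

// Assumed SSE2-style fallback: two pmuludq operations, then interleave
// the low 32 bits of each 64-bit product back into lane order.
V4i32 mulSSE2(const V4i32 &a, const V4i32 &b) {
  V2i64 even = pmuludq(a, b);
  V2i64 odd = pmuludq(oddToEven(a), oddToEven(b));
  return {uint32_t(even[0]), uint32_t(odd[0]),
          uint32_t(even[1]), uint32_t(odd[1])};
}
```

Both models should agree on every input, which is the property the
crosstests exercise when run in SSE2 and SSE4.1 mode.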

* Add appropriate lit checks for SSE4.1 instructions. Run the crosstests
in both SSE2 and SSE4.1 modes.

* Introduce the -mattr flag to llvm2ice to select which instruction set
is used for code generation.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/427843002
diff --git a/src/IceInstX8632.def b/src/IceInstX8632.def
index ece6a0a..932500c 100644
--- a/src/IceInstX8632.def
+++ b/src/IceInstX8632.def
@@ -88,9 +88,9 @@
   X(IceType_i64,   IceType_void, "si", ""  , "" ,  "qword ptr")   \
   X(IceType_f32,   IceType_void, "ss", "ss", "" ,  "dword ptr")   \
   X(IceType_f64,   IceType_void, "sd", "sd", "" ,  "qword ptr")   \
-  X(IceType_v4i1,  IceType_i32 , "?" , ""  , "" ,  "xmmword ptr") \
-  X(IceType_v8i1,  IceType_i16 , "?" , ""  , "" ,  "xmmword ptr") \
-  X(IceType_v16i1, IceType_i8  , "?" , ""  , "" ,  "xmmword ptr") \
+  X(IceType_v4i1,  IceType_i32 , "?" , ""  , "d",  "xmmword ptr") \
+  X(IceType_v8i1,  IceType_i16 , "?" , ""  , "w",  "xmmword ptr") \
+  X(IceType_v16i1, IceType_i8  , "?" , ""  , "b",  "xmmword ptr") \
   X(IceType_v16i8, IceType_i8  , "?" , ""  , "b",  "xmmword ptr") \
   X(IceType_v8i16, IceType_i16 , "?" , ""  , "w",  "xmmword ptr") \
   X(IceType_v4i32, IceType_i32 , "dq", ""  , "d",  "xmmword ptr") \