Subzero: Initial implementation of multithreaded translation.

Provides a single-producer, multiple-consumer translation queue where the number of translation threads is given by the -threads=N argument.  The producer (i.e., bitcode parser) blocks if the queue size is >=N, in order to control the memory footprint.  If N=0 (which is the default), execution is purely single-threaded.  If N=1, there is a single translation thread running in parallel with the parser thread.  "make check" succeeds with the default changed to N=1.
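
As a rough illustration of the queueing policy described above (not the code in this CL), the following C++11 sketch shows a bounded single-producer, multiple-consumer queue. BoundedQueue, translateAll, and the integer "function IDs" are hypothetical names chosen for the example. The assumption is that the producer blocks once N items are pending, and that N=0 falls back to translating inline on the calling thread.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded work queue; a sketch, not the class used in this CL.
template <typename T> class BoundedQueue {
public:
  explicit BoundedQueue(std::size_t MaxSize) : MaxSize(MaxSize), Closed(false) {
    assert(MaxSize > 0 && "a zero-capacity queue would block the producer");
  }

  // Producer side: blocks while MaxSize items are already pending.
  void push(T Item) {
    std::unique_lock<std::mutex> L(Lock);
    NotFull.wait(L, [this] { return Items.size() < MaxSize; });
    Items.push_back(std::move(Item));
    NotEmpty.notify_one();
  }

  // Consumer side: blocks until an item is available or the queue is
  // closed.  Returns false once the queue is closed and drained.
  bool pop(T &Out) {
    std::unique_lock<std::mutex> L(Lock);
    NotEmpty.wait(L, [this] { return !Items.empty() || Closed; });
    if (Items.empty())
      return false;
    Out = std::move(Items.front());
    Items.pop_front();
    NotFull.notify_one();
    return true;
  }

  // Called by the producer after the last push.
  void close() {
    std::lock_guard<std::mutex> L(Lock);
    Closed = true;
    NotEmpty.notify_all();
  }

private:
  const std::size_t MaxSize;
  bool Closed;
  std::deque<T> Items;
  std::mutex Lock;
  std::condition_variable NotFull, NotEmpty;
};

// "Translates" function IDs 0..NumFunctions-1 and returns how many were
// handled.  NumThreads == 0 means purely single-threaded execution.
std::size_t translateAll(std::size_t NumFunctions, std::size_t NumThreads) {
  std::atomic<std::size_t> Translated(0);
  auto Translate = [&Translated](std::size_t /*FuncId*/) { ++Translated; };

  if (NumThreads == 0) {
    for (std::size_t I = 0; I < NumFunctions; ++I)
      Translate(I);
    return Translated.load();
  }

  BoundedQueue<std::size_t> Queue(NumThreads);
  std::vector<std::thread> Workers;
  for (std::size_t I = 0; I < NumThreads; ++I)
    Workers.emplace_back([&Queue, &Translate] {
      std::size_t Id;
      while (Queue.pop(Id))
        Translate(Id);
    });

  for (std::size_t I = 0; I < NumFunctions; ++I)
    Queue.push(I); // Blocks while NumThreads items are pending.
  Queue.close();
  for (auto &W : Workers)
    W.join();
  return Translated.load();
}
```

Capping the queue at N keeps at most N parsed-but-untranslated functions alive at once, which is the memory-footprint control mentioned above.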

Currently, emission is also done by the translation threads, which limits scalability because the emit stream has to be locked.  Also, because the ELF writer stream is not locked, it is not safe to use N>1 with the ELF writer.  Furthermore, for N>1, the order of emitted functions is nondeterministic and needs to be made deterministic.  This will all be fixed in a follow-on CL.
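
To make the locking constraint above concrete, here is a minimal sketch, assuming each translation thread writes a whole function's text to a shared stream while holding a mutex. LockedEmitter and emitFromThreads are hypothetical names for illustration; only the GlobalLockType typedef mirrors what this CL adds to IceDefs.h.

```cpp
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

typedef std::mutex GlobalLockType; // Same typedef this CL adds to IceDefs.h.

// Hypothetical stand-in for the shared emit stream.
class LockedEmitter {
public:
  // Writes one function's text atomically with respect to other threads.
  void emit(const std::string &Text) {
    std::lock_guard<GlobalLockType> L(Lock);
    Stream << Text;
  }
  std::string str() {
    std::lock_guard<GlobalLockType> L(Lock);
    return Stream.str();
  }

private:
  GlobalLockType Lock;
  std::ostringstream Stream;
};

// Each thread emits one label.  The lock keeps each label intact, but
// the order in which labels appear depends on thread scheduling.
std::string emitFromThreads(std::size_t NumThreads) {
  LockedEmitter Emitter;
  std::vector<std::thread> Workers;
  for (std::size_t I = 0; I < NumThreads; ++I)
    Workers.emplace_back([&Emitter, I] {
      Emitter.emit("func" + std::to_string(I) + ":\n");
    });
  for (auto &W : Workers)
    W.join();
  return Emitter.str();
}
```

The lock prevents interleaved output but serializes all emitters and leaves the output order dependent on scheduling, which is why both scalability and deterministic ordering are listed as follow-on work.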

The -timing option is broken for N>0.  This will be fixed in a follow-on CL.

Verbose flags are now managed in the Cfg instead of (or in addition to) the GlobalContext, because the -verbose-focus option needs to temporarily change the verbose level for a particular function.

TargetLowering::emitConstants() and related methods are changed to be static, so that a valid TargetLowering object isn't required.  This is because a TargetLowering object needs to hold a valid Cfg, and no Cfg exists once all functions are translated and the constant pool is ready for emission.

The Makefile.standalone now has a TSAN=1 option to enable ThreadSanitizer.

BUG= none
R=jfb@chromium.org

Review URL: https://codereview.chromium.org/870653002
diff --git a/src/IceDefs.h b/src/IceDefs.h
index 9add7ba..051edbb 100644
--- a/src/IceDefs.h
+++ b/src/IceDefs.h
@@ -23,7 +23,9 @@
 #include <limits>
 #include <list>
 #include <map>
+#include <mutex>
 #include <string>
+#include <system_error>
 #include <vector>
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/BitVector.h"
@@ -58,6 +60,7 @@
 class LiveRange;
 class Liveness;
 class Operand;
+class TargetGlobalLowering;
 class TargetLowering;
 class Variable;
 class VariableDeclaration;
@@ -120,6 +123,17 @@
 typedef uint32_t TimerStackIdT;
 typedef uint32_t TimerIdT;
 
+// Use alignas(MaxCacheLineSize) to isolate variables/fields that
+// might be contended while multithreading.  Assumes the maximum cache
+// line size is 64.
+enum {
+  MaxCacheLineSize = 64
+};
+// Use ICE_CACHELINE_BOUNDARY to force the next field in a declaration
+// list to be aligned to the next cache line.
+#define ICE_CACHELINE_BOUNDARY                                                 \
+  alignas(MaxCacheLineSize) struct {}
+
 // PNaCl is ILP32, so theoretically we should only need 32-bit offsets.
 typedef int32_t RelocOffsetT;
 enum { RelocAddrSize = 4 };
@@ -163,6 +177,37 @@
 typedef llvm::raw_ostream Ostream;
 typedef llvm::raw_fd_ostream Fdstream;
 
+typedef std::mutex GlobalLockType;
+
+enum ErrorCodes {
+  EC_None = 0,
+  EC_Args,
+  EC_Bitcode,
+  EC_Translation
+};
+
+// Wrapper around std::error_code for allowing multiple errors to be
+// folded into one.  The current implementation keeps track of the
+// first error, which is likely to be the most useful one, and this
+// could be extended to e.g. collect a vector of errors.
+class ErrorCode : public std::error_code {
+  ErrorCode(const ErrorCode &) = delete;
+  ErrorCode &operator=(const ErrorCode &) = delete;
+
+public:
+  ErrorCode() : HasError(false) {}
+  void assign(ErrorCodes Code) {
+    if (!HasError) {
+      HasError = true;
+      std::error_code::assign(Code, std::generic_category());
+    }
+  }
+  void assign(int Code) { assign(static_cast<ErrorCodes>(Code)); }
+
+private:
+  bool HasError;
+};
+
 // Reverse range adaptors written in terms of llvm::make_range().
 template <typename T>
 llvm::iterator_range<typename T::const_reverse_iterator>