Subzero: Add branch optimization.

1. Unconditional branch to the next basic block is removed.

2. For a conditional branch with a "false" edge to the next basic block, remove the unconditional branch to the fallthrough block.

3. For a conditional branch with a "true" edge to the next basic block, invert the condition and do like #2.

This is enabled only for O2, particularly because inverting the branch condition is a marginally risky operation.

This decreases the instruction count by about 5-6%.

Also, --stats prints a final tally to make it easier to post-process the output.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/580903005
diff --git a/src/IceTargetLoweringX8632.cpp b/src/IceTargetLoweringX8632.cpp
index 464a2e8..31c11d4 100644
--- a/src/IceTargetLoweringX8632.cpp
+++ b/src/IceTargetLoweringX8632.cpp
@@ -395,6 +395,14 @@
   T_genFrame.printElapsedUs(Context, "genFrame()");
   Func->dump("After stack frame mapping");
 
+  // Branch optimization.  This needs to be done just before code
+  // emission.  In particular, no transformations that insert or
+  // reorder CfgNodes should be done after branch optimization.  We go
+  // ahead and do it before nop insertion to reduce the amount of work
+  // needed for searching for opportunities.
+  Func->doBranchOpt();
+  Func->dump("After branch optimization");
+
   // Nop insertion
   if (shouldDoNopInsertion()) {
     Func->doNopInsertion();
@@ -444,6 +452,13 @@
   }
 }
 
+bool TargetX8632::doBranchOpt(Inst *I, const CfgNode *NextNode) {
+  if (InstX8632Br *Br = llvm::dyn_cast<InstX8632Br>(I)) {
+    return Br->optimizeBranch(NextNode);
+  }
+  return false;
+}
+
 IceString TargetX8632::RegNames[] = {
 #define X(val, init, name, name16, name8, scratch, preserved, stackptr,        \
           frameptr, isI8, isInt, isFP)                                         \