Process independent vertex elements

Previously, vertices were processed in groups of four consecutive
indices (one per SSE/NEON lane). Now four independent indices are read
from the index buffer. Reading the vertex input was already a gather
operation, but with a constant stride. The vertex cache now performs a
scatter. The vertices are written to the cache in reverse order, so
that when two lanes map to the same cache entry, the first vertex of
the group is written last and is therefore always present in the cache.
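A scalar C++ model of the new batch handling may make the gather and
reverse-order scatter concrete. It is a sketch under stated assumptions,
not SwiftShader's code: the direct-mapped cache, `kCacheSize`,
`VertexCache`, `reset`, and `processBatch` are hypothetical names, and
multiplying the index by two stands in for running the vertex shader.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical direct-mapped vertex cache; SwiftShader's real layout may differ.
constexpr uint32_t kCacheSize = 16;         // assumed power of two
constexpr uint32_t kInvalid = 0xFFFFFFFFu;  // tag of an empty slot

struct VertexCache {
    std::array<uint32_t, kCacheSize> tag;   // vertex index held by each slot
    std::array<float, kCacheSize> value;    // stand-in for a shaded vertex
};

// Marks every slot as empty.
void reset(VertexCache &cache) {
    cache.tag.fill(kInvalid);
    cache.value.fill(0.0f);
}

// Processes one batch of four independent indices starting at position `first`.
void processBatch(VertexCache &cache, const uint32_t *indexBuffer, uint32_t first) {
    assert(indexBuffer != nullptr);

    // Gather: each lane takes whatever index the buffer holds, so the
    // four vertices need not be consecutive.
    std::array<uint32_t, 4> batch;
    for (int lane = 0; lane < 4; lane++)
        batch[lane] = indexBuffer[first + lane];

    // Placeholder "vertex shader", evaluated once per lane.
    std::array<float, 4> shaded;
    for (int lane = 0; lane < 4; lane++)
        shaded[lane] = static_cast<float>(batch[lane]) * 2.0f;

    // Scatter in reverse lane order: if two lanes share a slot, lane 0 is
    // written last, so the first vertex of the batch stays in the cache.
    for (int lane = 3; lane >= 0; lane--) {
        uint32_t slot = batch[lane] & (kCacheSize - 1);
        cache.tag[slot] = batch[lane];
        cache.value[slot] = shaded[lane];
    }
}
```

With indices {3, 19, 5, 7} and 16 slots, indices 3 and 19 both map to
slot 3; because lane 0 is written last, slot 3 ends up holding vertex 3.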

Also use 2^32-1 (0xFFFFFFFF, which matches the primitive restart index)
instead of 0x80000000 as the invalid vertex cache index, since
maxDrawIndexedIndexValue is UINT32_MAX and 0x80000000 can therefore be
a legitimate vertex index.
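The sketch below illustrates why the sentinel matters, assuming a
direct-mapped cache indexed by the low bits of the vertex index
(`kCacheSize`, `TagStore`, and `checkSentinels` are hypothetical names,
not SwiftShader's). An empty slot must never match a real index, or a
lookup would return a vertex that was never shaded.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kCacheSize = 16;            // hypothetical, power of two
constexpr uint32_t kOldInvalid = 0x80000000u;  // previous sentinel
constexpr uint32_t kNewInvalid = 0xFFFFFFFFu;  // 2^32-1, equal to the 32-bit primitive restart index

// Hypothetical direct-mapped tag store; only the sentinel handling matters here.
struct TagStore {
    std::array<uint32_t, kCacheSize> tag;

    // Every slot starts out marked with the "invalid" sentinel.
    explicit TagStore(uint32_t invalid) { tag.fill(invalid); }

    // A lookup hits when the slot's tag equals the requested vertex index.
    bool hit(uint32_t index) const { return tag[index & (kCacheSize - 1)] == index; }
};

// Usage: on an empty cache, index 0x80000000 falsely hits with the old
// sentinel but misses with the new one.
void checkSentinels() {
    TagStore oldCache(kOldInvalid);
    TagStore newCache(kNewInvalid);
    assert(oldCache.hit(0x80000000u));   // false hit: nothing was ever cached
    assert(!newCache.hit(0x80000000u));  // correct miss
}
```

In this model the only index that matches an empty slot under the new
sentinel is 0xFFFFFFFF itself, the value the commit message ties to the
primitive restart index.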

Bug: b/27351835
Test: dEQP-VK.glsl.loops.special.do_while_dynamic_iterations.dowhile_trap_vertex
Change-Id: Ic69dbf53c67cbda50e44913ccae91aaca2b86e21
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/32609
Presubmit-Ready: Nicolas Capens <nicolascapens@google.com>
Kokoro-Presubmit: kokoro <noreply+kokoro@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>
diff --git a/src/Pipeline/VertexProgram.cpp b/src/Pipeline/VertexProgram.cpp
index 647ff3a..e240e7f 100644
--- a/src/Pipeline/VertexProgram.cpp
+++ b/src/Pipeline/VertexProgram.cpp
@@ -73,17 +73,23 @@
 	{
 	}
 
-	void VertexProgram::program(UInt &index)
+	void VertexProgram::program(Pointer<UInt> &batch)
 	{
 		auto it = spirvShader->inputBuiltins.find(spv::BuiltInVertexIndex);
 		if (it != spirvShader->inputBuiltins.end())
 		{
 			assert(it->second.SizeInComponents == 1);
+
+			Int4 indices;
+			indices = Insert(indices, As<Int>(batch[0]), 0);
+			indices = Insert(indices, As<Int>(batch[1]), 1);
+			indices = Insert(indices, As<Int>(batch[2]), 2);
+			indices = Insert(indices, As<Int>(batch[3]), 3);
 			routine.getVariable(it->second.Id)[it->second.FirstComponent] =
-					As<Float4>(Int4(As<Int>(index) + *Pointer<Int>(data + OFFSET(DrawData, baseVertex))) + Int4(0, 1, 2, 3));
+					As<Float4>(indices + Int4(*Pointer<Int>(data + OFFSET(DrawData, baseVertex))));
 		}
 
-		auto activeLaneMask = SIMD::Int(0xFFFFFFFF); // TODO: Control this.
+		auto activeLaneMask = SIMD::Int(0xFFFFFFFF);
 		spirvShader->emit(&routine, activeLaneMask, descriptorSets);
 
 		spirvShader->emitEpilog(&routine);