Process independent vertex elements

Previously, vertices were processed in consecutive groups of four
(for SSE/NEON). Now four indices are read from the index buffer, so
the vertices in a group no longer have to be consecutive. Reading the
input was already a gather operation, but with a constant stride. The
vertex cache now performs a scatter. The vertices are written in
reverse order so that the first vertex in a group is always present in
the cache.
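
The reverse-order write can be illustrated with a minimal sketch. It is
not the routine generated by SwiftShader: SketchCache, scatterGroup, the
16-entry size and the "index modulo size" slot mapping are assumptions
made for illustration. If several indices in a group map to the same
slot, the last write wins, so writing lanes 3..0 leaves lane 0 in the
cache.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, simplified stand-in for a vertex cache (not SwiftShader's type).
struct SketchCache
{
	static constexpr uint32_t SIZE = 16;  // assumed size, power of two

	std::array<uint32_t, SIZE> tag;  // vertex index stored in each slot
	std::array<float, SIZE> value;   // stand-in for the processed vertex

	SketchCache()
	{
		tag.fill(0xFFFFFFFF);  // all slots start out invalid
		value.fill(0.0f);
	}
};

// Scatter four processed vertices into the cache, last lane first.
// When two lanes collide on a slot, the lower lane overwrites the higher one.
void scatterGroup(SketchCache &cache, const uint32_t index[4], const float result[4])
{
	for(int lane = 3; lane >= 0; lane--)
	{
		uint32_t slot = index[lane] & (SketchCache::SIZE - 1);  // assumed mapping

		cache.tag[slot] = index[lane];
		cache.value[slot] = result[lane];
	}
}
```

With indices {0, 16, 32, 5}, the first three lanes all map to slot 0 in
this sketch, and slot 0 ends up holding vertex 0.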

Also use 2^32-1 (0xFFFFFFFF) as the invalid vertex cache index (it
corresponds to the primitive restart index) instead of 0x80000000,
since maxDrawIndexedIndexValue is UINT32_MAX.
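
A small sketch of why the sentinel matters, assuming a cache hit is a
plain tag comparison. The names INVALID_TAG, OLD_INVALID_TAG and cacheHit
are hypothetical and only serve the illustration.

```cpp
#include <cstdint>

constexpr uint32_t OLD_INVALID_TAG = 0x80000000;  // previous sentinel
constexpr uint32_t INVALID_TAG = 0xFFFFFFFF;      // new sentinel, 2^32-1

// Hypothetical hit test: a slot matches when its tag equals the vertex index.
constexpr bool cacheHit(uint32_t tag, uint32_t index)
{
	return tag == index;
}

// A cleared slot tagged with the old sentinel would falsely match the
// legal vertex index 0x80000000.
static_assert(cacheHit(OLD_INVALID_TAG, 0x80000000u), "old sentinel aliases a legal index");

// With the new sentinel, that index no longer matches a cleared slot.
static_assert(!cacheHit(INVALID_TAG, 0x80000000u), "new sentinel does not alias it");
```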

Bug: b/27351835
Test: dEQP-VK.glsl.loops.special.do_while_dynamic_iterations.dowhile_trap_vertex
Change-Id: Ic69dbf53c67cbda50e44913ccae91aaca2b86e21
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/32609
Presubmit-Ready: Nicolas Capens <nicolascapens@google.com>
Kokoro-Presubmit: kokoro <noreply+kokoro@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>
diff --git a/src/Device/VertexProcessor.cpp b/src/Device/VertexProcessor.cpp
index 82c4547..5c66309 100644
--- a/src/Device/VertexProcessor.cpp
+++ b/src/Device/VertexProcessor.cpp
@@ -25,9 +25,9 @@
 {
 	void VertexCache::clear()
 	{
-		for(int i = 0; i < 16; i++)
+		for(uint32_t i = 0; i < SIZE; i++)
 		{
-			tag[i] = 0x80000000;
+			tag[i] = 0xFFFFFFFF;
 		}
 	}