Process independent vertex elements

Previously, vertices were processed in consecutive groups of four
(for SSE/NEON). Now four indices are read from the index buffer and
the vertices they reference are processed as a group. Reading the
input was already a gather operation, but with constant stride. The
vertex cache now performs a scatter. The vertices are written in
reverse order so that the first vertex in a group is always present
in the cache, even when several vertices of the group map to the same
cache entry.
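
The reverse-order scatter can be sketched as follows. This is only an
illustration, not the code in this change: the Vertex stand-in and the
writeGroup() helper are hypothetical, while SIZE and TAG_MASK mirror
the constants added to VertexCache.

```cpp
#include <cstdint>

// Hypothetical stand-in for the real Vertex type.
struct Vertex
{
	uint32_t sourceIndex;
};

// Direct mapped cache: each index maps to exactly one entry.
struct VertexCache
{
	static constexpr uint32_t SIZE = 64;
	static constexpr uint32_t TAG_MASK = SIZE - 1;  // Size must be power of 2.

	Vertex vertex[SIZE];
	uint32_t tag[SIZE];
};

// Hypothetical helper: scatter a group of four processed vertices.
// Writing from last to first means that when several indices collide
// on the same entry, the first vertex of the group is written last
// and therefore survives.
void writeGroup(VertexCache &cache, const uint32_t index[4], const Vertex processed[4])
{
	for(int i = 3; i >= 0; i--)
	{
		uint32_t entry = index[i] & VertexCache::TAG_MASK;

		cache.vertex[entry] = processed[i];
		cache.tag[entry] = index[i];
	}
}
```

With indices 5, 69, 133 and 197, which all map to entry 5, the entry
ends up holding the vertex for index 5.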

Also use 2^32-1 as the invalid vertex cache index (it matches the
primitive restart index) instead of 0x80000000, since
maxDrawIndexedIndexValue is UINT32_MAX, which makes 0x80000000 a
valid vertex index.
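
A minimal sketch of why the sentinel matters, assuming the cache is
invalidated by filling the tags with the invalid index. TagArray,
INVALID_TAG and hit() are hypothetical names used for illustration
only. It also assumes 0xFFFFFFFF is never looked up as a vertex index.

```cpp
#include <cstdint>

constexpr uint32_t SIZE = 64;
constexpr uint32_t TAG_MASK = SIZE - 1;

// Hypothetical name for the invalid index: 2^32-1.
constexpr uint32_t INVALID_TAG = 0xFFFFFFFFu;

struct TagArray
{
	uint32_t tag[SIZE];

	// Mark every entry as empty.
	void clear()
	{
		for(uint32_t &t : tag)
		{
			t = INVALID_TAG;
		}
	}

	// An entry hits only if its tag equals the full index.
	bool hit(uint32_t index) const
	{
		return tag[index & TAG_MASK] == index;
	}
};
```

Had clear() filled the tags with 0x80000000, a lookup of index
0x80000000 right after clearing would report a false hit on entry 0.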

Bug: b/27351835
Test: dEQP-VK.glsl.loops.special.do_while_dynamic_iterations.dowhile_trap_vertex
Change-Id: Ic69dbf53c67cbda50e44913ccae91aaca2b86e21
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/32609
Presubmit-Ready: Nicolas Capens <nicolascapens@google.com>
Kokoro-Presubmit: kokoro <noreply+kokoro@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>
diff --git a/src/Device/VertexProcessor.hpp b/src/Device/VertexProcessor.hpp
index 811ac32..a17e86a 100644
--- a/src/Device/VertexProcessor.hpp
+++ b/src/Device/VertexProcessor.hpp
@@ -25,12 +25,16 @@
 {
 	struct DrawData;
 
-	struct VertexCache   // FIXME: Variable size
+	// Basic direct mapped vertex cache.
+	struct VertexCache
 	{
+		static constexpr uint32_t SIZE = 64;  // TODO: Variable size?
+		static constexpr uint32_t TAG_MASK = SIZE - 1;  // Size must be power of 2.
+
 		void clear();
 
-		Vertex vertex[16][4];
-		unsigned int tag[16];
+		Vertex vertex[SIZE];
+		uint32_t tag[SIZE];
 
 		int drawCall;
 	};