Clarify 3-level caching of sampling routines

Bug: documentation
Change-Id: I708546425c3fee85844884ed5dfc8f52af1f1830
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/51428
Kokoro-Result: kokoro <noreply+kokoro@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>

commit: b9e179f1f239bd1da8fffbd997a41095345d1f6a [log] [tgz]
author: Nicolas Capens <capn@google.com> Fri Jan 01 22:32:28 2021 -0500
committer: Nicolas Capens <nicolascapens@google.com> Tue Jan 05 16:56:01 2021 +0000
tree: deafb02cb83635a6568e33329209ee3f2bfa4fa9
parent: ff29e249d3179766a40c9486183559fe3e5c1456 [diff]
diff --git a/docs/SamplingRoutines.md b/docs/SamplingRoutines.md
index e965781..9c4716a 100644
--- a/docs/SamplingRoutines.md
+++ b/docs/SamplingRoutines.md
@@ -20,11 +20,11 @@
 
 We cache the generated sampling routines, using the descriptors as well as the type of sampling instruction, as the key. This is done at three levels, described in reverse order for easier understanding:
 
-At the third and last level, we use a least-recently-used (LRU) cache, just like the caches of the pipeline stages' routines. It is protected by a mutex, and it may experience high contention due to all shader worker threads needing the sampling routines.
+L3: At the third and last level, we use a generic least-recently-used (LRU) cache, just like the caches of the pipeline stages' routines. It is protected by a mutex, which may experience high contention due to all shader worker threads needing the sampling routines.
 
-To mitigate that, there's a second-level cache which contains a 'snapshot' of the last-level cache, which can be queried concurrently without locking. The snapshot is updated at pipeline barriers. While much faster than the last-level cache's critical section, the lookup is still a lot of work per sampling instruction.
+L2: To mitigate that, there's a second-level cache which contains a 'snapshot' of the last-level cache, which can be queried concurrently without locking. The snapshot is updated at pipeline barriers. While much faster than the last-level cache's critical section, the hash table lookup is still a lot of work per sampling instruction.
 
-Often the descriptors being used don't change between executions of the sampling instruction. Which is where the first-level cache comes in. It is a single-entry cache implemented at the compiled shader level. Before calling out to the C++ function to retrieve the routine, we check if the sampler and image descriptor haven't changed since the last execution of the instruction. Note that this cache doesn't use the instruction type as part of the lookup key, since each sampling instruction instance gets its own first-level cache.
+L1: Often the descriptors being used don't change between executions of the sampling instruction. Which is where the first-level or '[inline](https://en.wikipedia.org/wiki/Inline_caching)' cache comes in. It is a single-entry cache implemented at the compiled sampling instruction level. Before calling out to the C++ function to retrieve the routine, we check if the sampler and image descriptor haven't changed since the last execution of the instruction. Note that this cache doesn't use the instruction type as part of the lookup key, since each sampling instruction instance gets its own inline cache.
 
 Descriptor Identifiers
 ----------------------
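
The three levels described in the patched text can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not SwiftShader's code: `Key`, `RoutineCache`, `SamplingInstruction`, and the integer `Routine` handle are hypothetical names, and in the real design the inline (L1) check is emitted as part of the compiled sampling instruction rather than written as a C++ method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical types for illustration only.
using Routine = int;  // stand-in for a handle to generated sampling code

struct Key {
    uint32_t samplerId;
    uint32_t imageId;
    uint32_t instructionType;
    bool operator==(const Key &o) const {
        return samplerId == o.samplerId && imageId == o.imageId &&
               instructionType == o.instructionType;
    }
};

struct KeyHash {
    size_t operator()(const Key &k) const {
        return (size_t(k.samplerId) * 31u + k.imageId) * 31u + k.instructionType;
    }
};

class RoutineCache {
public:
    explicit RoutineCache(size_t capacity) : capacity(capacity) { assert(capacity > 0); }

    // L2: hash table lookup in the snapshot, no lock taken. This is only safe
    // if updateSnapshot() never runs concurrently with readers (assumed here
    // to be guaranteed by calling it at pipeline barriers).
    bool querySnapshot(const Key &key, Routine &out) const {
        auto it = snapshot.find(key);
        if(it == snapshot.end()) return false;
        out = it->second;
        return true;
    }

    // L3: mutex-protected LRU cache; generates the routine on a miss.
    Routine lookupOrGenerate(const Key &key) {
        std::lock_guard<std::mutex> lock(mutex);
        auto it = index.find(key);
        if(it != index.end()) {
            lru.splice(lru.begin(), lru, it->second);  // mark most recently used
            return it->second->second;
        }
        Routine routine = generate(key);
        lru.emplace_front(key, routine);
        index[key] = lru.begin();
        if(lru.size() > capacity) {  // evict the least recently used entry
            index.erase(lru.back().first);
            lru.pop_back();
        }
        return routine;
    }

    // Copies the current L3 contents into the L2 snapshot.
    void updateSnapshot() {
        std::lock_guard<std::mutex> lock(mutex);
        snapshot.clear();
        for(const auto &entry : lru) snapshot.emplace(entry.first, entry.second);
    }

    int generated = 0;  // counts routine generations, for demonstration

private:
    Routine generate(const Key &) { return ++generated; }  // placeholder codegen

    size_t capacity;
    std::mutex mutex;
    std::list<std::pair<Key, Routine>> lru;
    std::unordered_map<Key, std::list<std::pair<Key, Routine>>::iterator, KeyHash> index;
    std::unordered_map<Key, Routine, KeyHash> snapshot;
};

// L1: single-entry inline cache, one per sampling instruction instance.
// The instruction type is fixed per instance, so it is not compared here.
struct SamplingInstruction {
    explicit SamplingInstruction(uint32_t type) : instructionType(type) {}

    Routine getRoutine(RoutineCache &cache, uint32_t samplerId, uint32_t imageId) {
        if(valid && samplerId == lastSamplerId && imageId == lastImageId) return lastRoutine;
        Key key{ samplerId, imageId, instructionType };
        Routine routine;
        if(!cache.querySnapshot(key, routine)) routine = cache.lookupOrGenerate(key);
        lastSamplerId = samplerId;
        lastImageId = imageId;
        lastRoutine = routine;
        valid = true;
        return routine;
    }

    uint32_t instructionType;
    uint32_t lastSamplerId = 0;
    uint32_t lastImageId = 0;
    Routine lastRoutine = 0;
    bool valid = false;
};
```

In this sketch, repeated calls to `getRoutine` with unchanged descriptors return from the inline check without touching either hash table, which is the case the L1 paragraph describes as common.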