There are two ways to cast between vector register types with AVX2:
__m256i b = /* ...set register... */;
auto c = (__m256d)b; // version 1
auto d = _mm256_castsi256_pd(b); // version 2
I assume that both of these give the same result. Intel's official manual says that version 2 has zero runtime latency. Can I also use version 1 under the same zero-latency assumption? In addition, can I assume that casting from any register type to any other with version 2 has zero latency?
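To make the "same results" assumption concrete, here is a minimal sketch that compares the bits produced by both versions. The helper names `casts_agree_avx` and `casts_agree` are hypothetical, and the sketch assumes GCC or Clang on x86: the C-style vector cast in version 1 is a GNU extension, and the `target("avx")` attribute and `__builtin_cpu_supports` are compiler-specific as well. It only checks bit patterns and says nothing about latency.

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fills a register with `pattern`, reinterprets it with
// both versions, and compares the raw bytes of the results.
// target("avx") lets this one function use AVX without a global -mavx flag.
__attribute__((target("avx")))
static bool casts_agree_avx(std::int64_t pattern) {
    __m256i b = _mm256_set1_epi64x(pattern);

    __m256d c = (__m256d)b;              // version 1: GNU vector cast (GCC/Clang)
    __m256d d = _mm256_castsi256_pd(b);  // version 2: Intel intrinsic

    double cs[4], ds[4];
    _mm256_storeu_pd(cs, c);
    _mm256_storeu_pd(ds, d);

    // Both results should carry exactly the bits that were put into b.
    std::int64_t first;
    std::memcpy(&first, cs, sizeof first);
    return std::memcmp(cs, ds, sizeof cs) == 0 && first == pattern;
}

// Hypothetical wrapper: skips the comparison on CPUs without AVX.
bool casts_agree(std::int64_t pattern) {
    if (!__builtin_cpu_supports("avx")) {
        return true;  // nothing to compare on this machine
    }
    return casts_agree_avx(pattern);
}
```

Calling `casts_agree(0x4000000000000000LL)` compares the two versions for the bit pattern of the double 2.0. To look at the latency side of the question, one option is to inspect the generated assembly (for example with `-O2 -S`) and see whether either cast emits an instruction.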