let summary = "Four-way byte dot product-accumulate instruction.";
let description = [{
Performs a four-way byte dot product that is accumulated into a 32-bit
result.
Operands `a` and `b` are vectors of 4 bytes between which the dot product is
computed.
The `a_type` and `b_type` attributes specify the type of the elements in `a`
and `b` respectively.
If `a_type` or `b_type` is `s8`, then the elements in the corresponding
vector are sign-extended to 32 bits before the dot product is computed.
If `a_type` or `b_type` is `u8`, then the elements in the corresponding
vector are zero-extended to 32 bits instead.
Operand `c` is a 32-bit integer to which the result is accumulated. It is
treated as holding a signed integer if either `a_type` or `b_type` is `s8`.

[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-dp4a)
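
The semantics described above can be sketched as a small reference model. This is an illustrative Python sketch, not part of the dialect: the names `extend`, `wrap32`, and `dot_accumulate_4way` are hypothetical, the vectors are modeled as sequences of four raw byte values (0 to 255), and wrap-around to 32 bits on overflow is an assumption of the sketch rather than something the description states.

```python
def extend(byte, signed):
    """Widen one raw byte: sign-extend for s8, zero-extend for u8."""
    byte &= 0xFF
    return byte - 0x100 if signed and byte >= 0x80 else byte


def wrap32(value, signed):
    """Reduce to 32 bits and interpret as signed or unsigned.

    Wrap-around on overflow is an assumption of this sketch.
    """
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if signed and value >= 0x8000_0000 else value


def dot_accumulate_4way(a, b, c, a_type="s8", b_type="s8"):
    """Hypothetical reference model of the four-way byte dot product-accumulate."""
    assert len(a) == 4 and len(b) == 4, "a and b must each hold 4 bytes"
    a_signed = a_type == "s8"
    b_signed = b_type == "s8"
    # The accumulator is treated as signed if either input type is s8.
    result_signed = a_signed or b_signed
    acc = wrap32(c, result_signed)
    for x, y in zip(a, b):
        acc += extend(x, a_signed) * extend(y, b_signed)
    return wrap32(acc, result_signed)
```

For instance, with `a = [0xFF, 0, 0, 0]`, `b = [2, 0, 0, 0]`, and `c = 0`, the model yields `-2` when `a_type` is `s8` (the byte `0xFF` sign-extends to -1) and `510` when both types are `u8`.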