Concept
Scaled Dot-Product Attention
Scaled Dot-Product Attention is a mechanism that computes attention scores as the dot products of query and key vectors, scaled down by the square root of the key dimension. Without this scaling, large dot products would push the softmax into saturated regions where gradients become vanishingly small. This technique is fundamental to transformer models, enabling them to focus on the relevant parts of the input sequence efficiently.
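The standard formulation, from the original transformer paper (Vaswani et al., 2017), is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are the query, key, and value matrices and d_k is the key dimension. A minimal single-head sketch in Python with NumPy follows; the function name and array shapes are illustrative assumptions, not a specific library's API:

import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Illustrative shapes (assumed, not prescribed):
    # q: (len_q, d_k), k: (len_k, d_k), v: (len_k, d_v)
    d_k = q.shape[-1]
    # Dot-product scores, scaled by sqrt(d_k) so large key dimensions
    # do not push the softmax into its low-gradient saturated region.
    scores = q @ k.T / np.sqrt(d_k)            # (len_q, len_k)
    # Numerically stable softmax over the key axis.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v                         # (len_q, d_v)

In practice, transformer implementations apply this per attention head and add masking before the softmax; this sketch shows only the core scaled dot-product step.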