Concept
Scaled Dot-Product Attention
Scaled Dot-Product Attention is a mechanism that computes attention scores as the dot products of query and key vectors, scaled down by the square root of the key dimension. Without this scaling, large dot products would push the softmax into saturated regions where gradients become vanishingly small. This technique is fundamental to transformer models, enabling them to focus on the relevant parts of the input sequence efficiently.
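The standard formulation, from the original transformer paper (Vaswani et al., 2017), is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are the query, key, and value matrices and d_k is the key dimension. A minimal single-head sketch in Python with NumPy follows; the function name and array shapes are illustrative assumptions, not a specific library's API:

import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Illustrative shapes (assumed, not prescribed):
    # q: (len_q, d_k), k: (len_k, d_k), v: (len_k, d_v)
    d_k = q.shape[-1]
    # Dot-product scores, scaled by sqrt(d_k) so large key dimensions
    # do not push the softmax into its low-gradient saturated region.
    scores = q @ k.T / np.sqrt(d_k)            # (len_q, len_k)
    # Numerically stable softmax over the key axis.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v                         # (len_q, d_v)

In practice, transformer implementations apply this per attention head and add masking before the softmax; this sketch shows only the core scaled dot-product step.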