Talk:Attention (machine learning)


Confusing line "X is the input matrix of word embeddings, size 4 x 300. x is the word vector for "that". "

After "4x300" it immediately says "x is the word for 'that'." That's super-confusing, because one might think that the second x refers to the x between 4 and 300. There are three different uses of x in the sentence. Someone familiar with the field will be able to understand it, but wikipedia is meant to be clear as possible. ThinkerFeeler (talk) 00:20, 30 July 2023 (UTC)[reply]

typo in "asterix"?

In the following extract: "The asterix within parenthesis "(*)" denotes the softmax" shouldn't the word be asterisk, not asterix? :-) Jrob kiwi (talk) 16:35, 23 August 2023 (UTC)

Does RNN mean "recursive neural network" or "recurrent neural network"?

In this article, is RNN supposed to mean "recursive neural network" or "recurrent neural network", or maybe sometimes one and sometimes the other? Once we figure this out, let's replace all occurrences with the correct three words, so that it is immediately clear even to novices. —Quantling (talk | contribs) 16:14, 24 October 2023 (UTC)

I'm pretty sure it is "recurrent". I am going to go ahead and edit. If I have it wrong, please accept my apologies ... and fix my edit. —Quantling (talk | contribs) 16:23, 24 October 2023 (UTC)

hard vs soft weights

The intro mentions hard and soft weights, which I haven't heard before in this context. Can someone provide a citation showing that this terminology is actually used? DMH43 (talk) 15:15, 26 December 2023 (UTC)
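The terms do appear in the literature, e.g. the soft vs. hard attention of Xu et al. (2015), "Show, Attend and Tell": soft weights form a differentiable softmax distribution over the inputs, while hard weights make a discrete one-hot selection. A minimal sketch of that reading (the alignment scores are made up):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    scores = np.array([2.0, 0.5, -1.0])   # made-up alignment scores

    # "Soft" weights: every input gets a nonzero, differentiable weight.
    soft_w = softmax(scores)              # approx [0.79, 0.18, 0.04]

    # "Hard" weights: a discrete one-hot selection of a single input.
    hard_w = np.zeros_like(scores)
    hard_w[np.argmax(scores)] = 1.0       # [1.0, 0.0, 0.0]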

'word' should be replaced with something more generic

The article frequently uses the word "word" when talking about attention. For example, the opening paragraph states: "It calculates "soft" weights for each word, more precisely for its embedding, in the context window." However, attention is a concept that is independent of input type; it can be, and has been, applied to words, pixel values, quantities, etc. I believe it would be clearer to replace the use of "word" in reference to the inputs that attention is applied to with something more generic such as "input element" or "token". 180.150.65.6 (talk) 14:31, 5 March 2024 (UTC)
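To illustrate the point: the standard scaled dot-product formulation only ever sees embedding vectors and is indifferent to what they encode. A minimal sketch (the image-patch embeddings here are hypothetical):

    import numpy as np

    def attention(Q, K, V):
        """Scaled dot-product attention. The rows of Q, K, V can embed
        words, image patches, audio frames -- any tokenized input."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # row-wise softmax
        return w @ V

    # 5 "tokens" that are not words: image-patch embeddings, 64 dims each.
    E = np.random.rand(5, 64)
    out = attention(E, E, E)   # self-attention over generic input elements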