Talk:Attention (machine learning)


Confusing line "X is the input matrix of word embeddings, size 4 x 300. x is the word vector for "that". "

After "4x300" it immediately says "x is the word for 'that'." That's super-confusing, because one might think that the second x refers to the x between 4 and 300. There are three different uses of x in the sentence. Someone familiar with the field will be able to understand it, but wikipedia is meant to be clear as possible. ThinkerFeeler (talk) 00:20, 30 July 2023 (UTC)[reply]

typo in "asterix"?

In the following extract: "The asterix within parenthesis "(*)" denotes the softmax" shouldn't the word be asterisk, not asterix? :-) Jrob kiwi (talk) 16:35, 23 August 2023 (UTC)

Does RNN mean "recursive neural network" or "recurrent neural network"?

In this article, is RNN supposed to mean "recursive neural network" or "recurrent neural network", or maybe sometimes one and sometimes the other? Once we figure this out, let's replace all occurrences with the correct three words, so that it is immediately clear even to novices. —Quantling (talk | contribs) 16:14, 24 October 2023 (UTC)

I'm pretty sure it is "recurrent". I am going to go ahead and edit. If I have it wrong, please accept my apologies ... and fix my edit. —Quantling (talk | contribs) 16:23, 24 October 2023 (UTC)

hard vs soft weights

The intro mentions hard and soft weights, which I haven't heard before in this context. Can someone provide a citation showing that this terminology is actually used? DMH43 (talk) 15:15, 26 December 2023 (UTC)
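The terms do appear in the literature, e.g. the soft vs. hard attention of Xu et al. (2015), "Show, Attend and Tell": soft weights form a differentiable softmax distribution over the inputs, while hard weights make a discrete one-hot selection. A minimal sketch of that reading (the alignment scores are made up):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    scores = np.array([2.0, 0.5, -1.0])   # made-up alignment scores

    # "Soft" weights: every input gets a nonzero, differentiable weight.
    soft_w = softmax(scores)              # approx [0.79, 0.18, 0.04]

    # "Hard" weights: a discrete one-hot selection of a single input.
    hard_w = np.zeros_like(scores)
    hard_w[np.argmax(scores)] = 1.0       # [1.0, 0.0, 0.0]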

'word' should be replaced with something more generic

The article frequently uses the word "word" when talking about attention. For example, the opening paragraph states: "It calculates "soft" weights for each word, more precisely for its embedding, in the context window." However, attention is a concept that is independent of input type; it can be, and has been, applied to words, pixel values, quantities, etc. I believe it would be clearer to replace the use of "word" in reference to the inputs that attention is applied to with something more generic such as "input element" or "token". 180.150.65.6 (talk) 14:31, 5 March 2024 (UTC)
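To illustrate the point: the standard scaled dot-product formulation only ever sees embedding vectors and is indifferent to what they encode. A minimal sketch (the image-patch embeddings here are hypothetical):

    import numpy as np

    def attention(Q, K, V):
        """Scaled dot-product attention. The rows of Q, K, V can embed
        words, image patches, audio frames -- any tokenized input."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # row-wise softmax
        return w @ V

    # 5 "tokens" that are not words: image-patch embeddings, 64 dims each.
    E = np.random.rand(5, 64)
    out = attention(E, E, E)   # self-attention over generic input elements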