Talk:Singular value decomposition/Archive 1

This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

error in theorem

I think they are errors in the statement of the theorem: first U should be an m-by-m matrix (not an m-by-n matrix)... a unitary matrix is necessarily a square matrix. and Σ should be m-by-n with diagonal entries (it is neither n-by-n nor diagonal matrix)

best regards, Sebastien.

You're absolutely right on the dimensions, thanks. It got changed about a month ago, and sadly nobody noticed. The rectangular matrix Σ is sometimes said to be a diagonal matrix, even though it is rectangular, since its off-diagonal entries are zero. However, I saw that Horn & Johnson does not do this, so I also changed that for clarity. -- Jitse Niesen (talk) 22:50, 5 September 2005 (UTC)

I removed this:

An important extension of the singular-value decomposition theorem says that if M is a symmetric square matrix then one may take G = H, and in the case in which n=m=r (the full-rank case) and all of the singular values are different one must take G = H. That is the spectral theorem.

This is incorrect: a symmetric matrix can have negative eigenvalues, and then the spectral theorem decomposition wouldn't be a singular-value decomposition.

Something is also fishy with the term "singular vector": they are not uniquely determined by M; different singular-value decompositions of M can yield different singular vectors. What is the clean formulation of this situation? AxelBoldt 00:17 Dec 13, 2002 (UTC)

For all the textbooks and articles on matrix theory I have ever read, this is the one and only one article uses the term singular vector!!! Wshun

Well, Google gives about 3000 hits for the phrase "singular vector", and most of them seem to refer to the columns/rows of the matrices in the singular value decomposition. So the term is apparently in use, it's just that the definition is not quite clear. I restored some of the removed material about them, and also mentioned the fact that singular value deompositions always exist. AxelBoldt 19:30 Apr 6, 2003 (UTC)

Singular vector is standard terminology. Its good enough for Strang, Golub and for Hansen, so its good enough for me Billlion 14:21, 3 Sep 2004 (UTC)

Why directing the article Singular value to Singular value decomposition? It is as bizarre as directing prime number to prime number theorem, or calculus to the fundamental theorem of calculus. It is better to reverse the direction. In my opinion, the two articles should not be combined together! Wshun

I would be happy for the content of s-number to go to into this one, I think the article should cover operator case Billlion 08:23, 17 Sep 2004 (UTC)

Response to Wshun

Singular vector is indeed standard terminology. See Gilbert Strang's books. Michael Hardy 14:54, 3 Sep 2004 (UTC)

Algorithms please

How does one compute the SVD in practice? Lupin 13:19, 5 Feb 2005 (UTC)

See the article

'LAPACK users manual (http://www.netlib.org/lapack/lug/node53.html) gives details of subroutines to calculate the SVD.'

The numerical algorithms are explained and referenced there.In Matlab/Octave is easy, just use the svd command, or in Mathematica SingularValueDecomposition,Billlion 19:45, 5 Feb 2005 (UTC)

I tried to understand the methods used but on the very first algorithm they use ("reduced bidiagonal form") I can find very little explanatory material, anywhere. Yet, I have noticed that most of the source code I have come by tend to use this undocumented technique. --ANONYMOUS COWARD0xC0DE 02:12, 27 November 2006 (UTC)

I'm not sure what "reduced bidiagonal form" is; are you talking about computing the SVD by reducing the matrix to bidiagonal form? One reference for the Golub–Kahan algorithm is Golub and Van Loan, Matrix computations, John Hopkins University Press, Section 8.6, "Computing the SVD". This is a standard reference for numerical linear algebra. -- Jitse Niesen (talk) 03:17, 27 November 2006 (UTC)

Yes that's exactly what I am talking about. Thank you for your reference I just ordered it at my library (ISBN:0801854148). --ANONYMOUS COWARD0xC0DE 05:23, 28 November 2006 (UTC)

Ok I implemented the house and bidiagonalization functions (algorithm 5.1.1 and 5.4.2 respectivly) and these are the results I have obtained

\operatorname {house} \left({\begin{bmatrix}1\\4\\7\\10\end{bmatrix}}\right)={\begin{bmatrix}1\\.310\\.543\\.776\end{bmatrix}}

\operatorname {householderbidiagonalization} \left({\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{bmatrix}}\right)={\begin{bmatrix}11.5&9.39&0.0967\\0.0264&-0.510&0.995\\0.0461&0.681&1.05\\0.0659&0.404&-0.216\end{bmatrix}}

. I was expecting the Bidiagonal matrix B and my results are definitly not bidiagonal. I have further concerns regarding algorithm 5.4.2; the example results don't seem to hold the property U⋅B⋅V^T=A as the first column of A ends up negative. Could someone verify that these are the correct results. --ANONYMOUS COWARD0xC0DE 00:47, 2 January 2007 (UTC)

Well, it seems I will have to ask how to compute the SVD elsewhere, when I find my answers I will post back here. --ANONYMOUS COWARD0xC0DE 07:07, 5 February 2007 (UTC)

M={\begin{bmatrix}-5&3\\0&8\\-1&5\end{bmatrix}}

M^{\top }M={\begin{bmatrix}26&-20\\-20&98\end{bmatrix}}

M\cdot M^{\top }={\begin{bmatrix}34&24&20\\24&64&40\\20&40&26\end{bmatrix}}

\Sigma ={\begin{bmatrix}10.16&0\\0&4.563\\0&0\end{bmatrix}}

eigenvector(M^{\top }M)={\begin{bmatrix}-.2508&.968\\.968&.2508\end{bmatrix}}=V

eigenvector(M\cdot M^{\top })={\begin{bmatrix}.4094&.8959&-.1726\\.7624&-.4398&-.4747\\.5012&-.06272&.8631\end{bmatrix}}{\overset {?}{=}}U

First you get the eigenvector of mTm *the smaller matrix* and assign it to V. In this setting U and V are orthogonal however

U\cdot \Sigma \cdot V^{\top }={\begin{bmatrix}2.914&5.052\\-3.885&6.995\\-1.554&4.857\end{bmatrix}}

is not even close. So instead you calculate U via

U=M\cdot V\cdot \Sigma ^{-1}

where

\Sigma ^{-1}=\Sigma ^{\top }

followed by inverting each element on the diagonal giving

\Sigma ^{-1}={\begin{bmatrix}{\frac {1}{10.16}}&0&0\\0&{\frac {1}{4.563}}&0\end{bmatrix}}

U={\begin{bmatrix}-5&3\\0&8\\-1&5\end{bmatrix}}{\begin{bmatrix}-.2508&.968\\.968&.2508\end{bmatrix}}{\begin{bmatrix}{\frac {1}{10.16}}&0&0\\0&{\frac {1}{4.563}}&0\end{bmatrix}}={\begin{bmatrix}.4093&-.8958&0\\.7622&.4397&0\\.5011&.06268&0\end{bmatrix}}

Thats the simplest way to solve an M-by-N matrix where M>=N, and I believe if N>=M you can use

V=M\cdot \Sigma ^{-1}\cdot U^{\top }

haven't tried it yet. However U is not orthogonal in my example, and I don't know how to correct that. My method seems to be the same as that used by the $50 .net matrix library, for example input my 3-by-2 matrix M into their free trial interface, it will cut off the offending columns in U, the dimensions of U would be 3-by-2 instead of 3-by-3 in violation of the orthogonality property. Thus I am left with just one question, how do I assign the appropriate numbers/values (the exact values in the columns are irrelevant as their eigenvalues are zero) to the remaining columns of a matrix to fulfill the orthogonality property? --ANONYMOUS COWARD0xC0DE 23:09, 27 November 2006 (UTC)

Dear AC, the singular vectors u you get by calculating the eigenvectors of M M* may each be +1 or -1 times the singular vectors u_v corresponding to the eigenvectors v of M* M. (Can you see that this is so?)

More generally, if there are degenerate singular vectors v, ie more than one sharing the same singular value, then an SV u that you get from M M* may be any unit linear combination of the vectors u_v that correspond to the vectors v that share that singular value. (See the section in the article on whether the SVD is uniquely determined).

The way round this, if you are calculating the SVD by hand, is to calculate the vectors v, then multiply each by M; and then σ_v = the magnitude of M v, and u_v = (1/σ)M v. That will ensure you get left and right singular vectors that match. Jheald 09:48, 4 December 2006 (UTC)

For the null space of M you can use any set of orthonormal u vectors that correspond to zero eigenvalues of M M*; you can find v vectors (if any more are needed) by finding the eigenvectors that correspond to zero eigenvalues of M* M. Alternatively, you can find either set of zero singular vectors by orthogonalising random vectors with respect to the vectors you've already got (u or v respectively), until you've successfully found a full basis. Jheald 10:07, 4 December 2006 (UTC)

I have noticed the {+1,-1} pattern. I also tried

u=(1/\sigma )Mv

and it works perfectly and is equivalent to my method of

U=M\cdot V\cdot \Sigma ^{-1}

where

\Sigma ^{-1}=\Sigma ^{\top }

followed by inverting each element on the diagonal (awkward notation, but equivalent). As for "orthogonalising random vectors [...] until you've successfully found a full basis" I was hoping for something that was more definite (in other words purely in complexity class P and not ZPP).--ANONYMOUS COWARD0xC0DE 06:36, 25 December 2006 (UTC)

I found a temporary solution to my problem that I should have seen earlier. The QR decomposition. Take the orthogonal vectors I have and then preform a QR decomposition on the vector matrix and replace the vector matrix with the resulting Q matrix. The QR decomposition will maintain the currently orthogonal vectors and further see that the entire vector matrix is orthogonal. But I would still like to see a more "main stream" way of computing the full svd. --ANONYMOUS COWARD0xC0DE 07:07, 5 February 2007 (UTC)

Also regarding your statement "if you are calculating the SVD by hand"; In my mind I am not capable of doing it any other way but the algorithmic way, when I think "do it by hand", I see no difference from computing it on a computer library package (besides being on paper and far slower). --ANONYMOUS COWARD0xC0DE 07:07, 5 February 2007 (UTC)

Well, I added a bit about "By hand" calculation, but then Mct mht removed it, stating that it's already in the article, so apparently he's seing something I'm not... Sim 02:51, 30 April 2007 (UTC)

I guess Mct mht is referring to the section Singular value decomposition#Linear algebraic proof. I think the bit you added goes too much into detail given that section, but there should probably be a bit about the method in the computation section. -- Jitse Niesen (talk) 04:45, 30 April 2007 (UTC)

Ah. Okay. Thanks for the heads-up to the two of you. :) Sim 03:13, 2 May 2007 (UTC)

Singular values, singular vectors, and their relation to the SVD

The first math expression in this section appears in the code but not the page, at least on my screen. I don't know enough about the expression mark-up to fix it. proteus71 17:22, 2 Nov 2005 (UTC)

Could you please check whether it is still not appearing? It appears on my screen. It might have something to do with the servers being slow today (probably because they're overloaded). -- Jitse Niesen (talk) 22:30, 2 November 2005 (UTC)

Freedom in U and V

I removed the following text:

"If M is real, and the singular values are all distinct, then unique orthogonal matrices U and V^T can be obtained. Otherwise U and V can be transformed by any unitary matrix S that commutes with Σ, such that U' = U S and (V')* = S* V*."

It is always possible to replace U and V by −U and −V, respectively. For the second part to be true, we need n = m.

Obviously, there is something to be said about the freedom in U and V. There is some discussion about it further down, but there are probably things to add. -- Jitse Niesen (talk) 00:06, 14 January 2006 (UTC)

Recent changes

In the intro, I changed 'is analogous' (to eigenvalue decomposition) to 'is similar', and added some text which makes it clearer that the two decompositions are not too similar. Also, I dropped the square characterization of symmetric and Hermitian matrizes since such matrices are always square.

In the 'Singular values, singular vectors, and their relation to the SVD' section I changed to "if and only if" since this is a definition of what a singular value is. Also, we must require that u and v are normalized in order to talk about a singular value. Added some text about the relation between the theorem and the singular values and vectors defined here. Added a defintion of what degenerate signular value is. Tried to make it clearer in when and in what way the SVD is unique or not.

The discussion on the relation with eigenvalue decomposition is expanded and given a section of its own.

Added a discussion on the relation between SVD and the range and null space of M. Changed singular value = sqrt(eigenvalue) to singular value squared = eigenvalue.

Added a discussion on range and null space in "applications" section.

Moved small section on rank to applications, since it is an application and it follows from the range and null space stuff.

--KYN 20:28, 16 January 2006 (UTC)

Great stuff. I tried to further clarify the uniqueness question; hope that's okay with you. Two further comments: firstly, it is quite common for mathematicians to use "if" in definitions where it should logically be "if and only if" (see, for instance, If and only if#Definitions); secondly, out of curiosity, why must u and v be normalized? I checked a couple of books and they all defined the singular vectors as columns of the matrices U and V, which means that they are indeed of unit length, but I don't see a particular reason to demand that the singular vectors are normalized. -- Jitse Niesen (talk) 22:07, 17 January 2006 (UTC)

You want the singular vectors to be normalised because (i) that's what makes them useful as unit basis vectors; and (ii) it's only if you normalise the singular vectors that you get the singular values coming out as well-defined gain factors in the Σ part of the decomposition. -- Jheald 22:23, 17 January 2006 (UTC).

Could you please explain (ii)? I don't see it. Sorry to be such a nuisance, but I'm just trying to understand why eigenvectors are not normalized while singular vectors are. -- Jitse Niesen (talk) 22:44, 17 January 2006 (UTC)

Eigenvectors don't need to be normalised if the only property that you're interested in is A v = λ v. That is often enough the case that often enough you can work quite happily with un-normalised eigenvectors. But if you're using the eigenvectors for a spectral decomposition, A = V Λ V*, then those column vectors in V of course do need to be normalised.

The difference with SVD is that it is essentially always the property M = U Σ V* that makes SVD practically useful, and for that the singular vectors need to be normalised; so it is essentially only normalised singular vectors that are ever of interest. It is of course true that M (k v) would still equal σ (k u), but this relation is not what makes the SVD useful. -- Jheald 23:27, 17 January 2006 (UTC).

I agree and add to that my copy of Golub and Van Loan Matrix Computations (2nd ed, page 71) states that u and v are "unit 2-norm vectors". To make the above point even more clear: Assume that M v=sigma u and M^{\star} u=sigma v for normalized u and v. Now, look at the same relations with u replaced by 2u and sigma by sigma': M v = 2 sigma' u and M^{\star} 2 u = sigma' v. Identifying with the first relations gives (1) sigma = 2 sigma' and (2) sigma = sigma'/2, which implies sigma = sigma' = 0! Consequently, we need to impose some restrictions on u and v in order to make an interpretation of the two relations. This fact is not obvious from the outset, and maybe worth mentioning somewhere in the article? --KYN 10:08, 18 January 2006 (UTC)

In reply to Jheald, I kind of see your point, but the relation Mv = σu is important for the geometric interpretation of the SVD, which was historically the motivation. However, I'll accept this as the reason for the normalization condition.

In reply to KYN, what you showed is that if you replace u by 2u and sigma is not zero, the relations M v=sigma u and M^{\star} u=sigma v cannot be satisfied, unless you replace v by 2v. This is also what Jheald means with his/her last sentence: you have to scale u and v by the same factor. If you do that, sigma does not change. -- Jitse Niesen (talk) 12:23, 18 January 2006 (UTC)

In "Statement of the theorem" section

it says in the bullet list that U and V "describe" the row or columns of M. This sound a bit fuzzy. What exactly do they describe and in what way? --KYN 22:45, 16 January 2006 (UTC)

Usage of &lowast;-symbol

Hi, I just have seen, that the &lowast;-symbol, especially "<sup>&lowast;</sup>" as used in the paragraph "Relation to eigenvalue decomposition" is not rendered correctly in Opera8 and IE (because of the font I've chosen?) while being correctly shown in FF. Is there a reason, why not just use a simple asterisk instead of the &lowast;? I mean, the asterisk is used anywhere else for this purpose. --(Tobi)134.109.148.28 14:53, 21 March 2006 (UTC)

I think the &lowast; symbol looks a bit nicer, but it's not essential. And because it does not display correctly on some systems, it has to go, so I removed it. Thanks for your comment. -- Jitse Niesen (talk) 05:18, 22 March 2006 (UTC)

How to compute compact or truncated SVD?

Is there a way to compute SVD eigenvectors in a sequence? That is to say, first compute the eigenvector for the largest eigenvalue, then compute the eigenvector for the second largest eigenvalue, etc. 04 March 2006

Yes, this is possible. The idea is that the singular values of A are related to the eigenvalues of

A^{*}A

and apply an iterative eigenvalue algorithm like Lanczos iteration to find the eigenvalues of the symmetric matrix

A^{*}A

. I'm afraid I can't tell you the details though. One problem is that you might lose accuracy when forming the product

A^{*}A

. I note that Matlab uses the matrix

{\begin{bmatrix}0&A\\A^{*}&0\end{bmatrix}}

instead. -- Jitse Niesen (talk) 04:43, 5 April 2006 (UTC)

Q: Thanks for the answer. But as I understand, Lanczos iteration can only find the eigenvector corresponding to the largest eigenvalue. After the largest eigenvector being computed, how is the eigenvector corresponding to the second largest eigenvalue computed? I appreciate if you give a specific reference. Apr 05 2006

Sorry, I forgot that you're after the vectors. It seems that Lanczos can do this, at least this is what the theorem in Section VI.6.3 in Y. Saad, Numerical methods for large eigenvalue problems, indicates. I know only very little about large-scale numerical linear algebra, so I can only recommend you to look in that book (which is online at Saad's web site). Another interesting reference is Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980. I haven't read it, but it's a whole book so it's bound to touch on computing the eigenvectors, though it might be a bit out-dated. -- Jitse Niesen (talk) 02:06, 6 April 2006 (UTC)

One way into getting your head around the Lanczos method is to imagine using the Conjugate gradient method to minimise the potential function

{\frac {1}{2}}x^{T}Ax-x^{T}b

for some known positive definite symmetrical matrix A, starting from an initial position x = x₀, and then iteratively getting closer to the solution x = A^-1 b.

As a reminder, at each step of conjugate gradient we minimise the function in a particular search direction; then at the minimum point for that search direction, we calculate the negative of the gradient, which is just b - A x_n. This gives a new direction, orthogonal to the subspace we have just minimised. However, although the negative of the gradient gives the direction of steepest descent, it can be shown that it is actually suboptimal in its component in the direction we have just come: the optimal search direction is the conjugate direction, which is a linear combination of the new gradient and direction we have examined immediately previously.

By searching along the conjugate direction, rather than the steepest descent direction, we don't botch up the minimisation w.r.t. the previous search directions, so there is none of the "zig-zagging" that you get using pure gradient descent; instead by the end of the nth step, the Conjugate gradient algorithm should have accurately minimised the quadratic function in the space spanned by

[Ax_{0},A^{2}x_{0},A^{3}x_{0},...A^{n}x_{0}]

added to the initial guess x₀.

The connection with the Lanczos algorithm is this: suppose we want to give an estimate of the matrix A, just from that sequence of gradients we have calculated. The conjugate gradient property is equivalent to saying that the truncation of A in the Krylov space

[Ax_{0},A^{2}x_{0},...A^{n}x_{0}]

can be represented by

UTU^{T}

, where T is a symmetric tridiagonal matrix (ie a diagonal and one off-diagonal, repeated above and below), and U is the (normalised) set of (already orthogonal) gradient directions that we have progressively calculated.

It is quick to eigen-decompose T into V Λ V^T as it's (i) not very big (usually we will stop after a number of iterations n which is very much less big than the whole rank of A), and (ii) already tridiagonalised.

So this quite readily gives the eigen-decomposition of the truncation of A in this Krylov space.

Note that this is not the same as the first n eigenvalues and eigenvectors of A. However, because of the wonders of exponential increase, it should give you quite a good handle on, say, the first n/4 eigenvalues and eigenvectors of A, even if they are only hardly present in your initial seed vector x₀. (The more different the largest eigenvalues are from the rest of the pack, the more quickly more of their eigenvectors will become well approximated).

For more details on Conjugate gradient, and how it relates to the Lanczos method, see eg Golub and van Loan (3e) section 10.2 (pp. 520-532). A more direct presentation of the Lanczos algorithm by itself is given earlier in chapter 9 (pp.470-490).

Note that in real-world numerics (as opposed to idealised infinite precision numerics), issues arise, essentially because the minimisation can't be done perfectly, so the new gradient vectors aren't automatically perfectly orthogonalised to the foregoing ones, and if left untreated this creates a seed which can allow duplicate appearances of particular eigenvectors to build up. The numerical analysis of how this occurs has been well known since the early 70s; various fixes (of various levels of computational expense) are known, and discussed in the book.

It's well worth playing with simple codes to get your own feel for what's going on; but if you need to apply it really seriously, for that reason it may well be worth searching out pre-canned routines which have been tuned into the ground both for accuracy and for efficiency.

Hope this helps,

All best -- Jheald 20:55, 6 April 2006 (UTC).

BTW: At the moment the WP articles on Lanczos algorithm, Gradient descent, and Conjugate gradient method could all use some work IMO. In particular

the Gradient descent article should make much clearer what is wrong with Gradient descent as a method, and why Conjugate gradient fixes it.
the CG article isn't so bad so far as it goes, but should make it much clearer that CG is a method for general nonlinear minimisation, not just for solving linear systems; more clearly explain (with a picture?) why it fixes the problems of gradient descent; and explain how the approximation to the curvature (Hessian) matrix can be built up through successive CG minimisations.
the Lanczos method article could use a lot more expansion, particularly regarding explanation, motivation, some of the numerical issues that arise, and how to address them.

I can't offer any time of my own at the moment to fix them, sadly; but it would be rather good if somebody could have a go. -- Jheald 21:13, 6 April 2006 (UTC).

BTW (2): The Lanczos method is particularly good if either (i) you're only interested in the largest k eigenvectors, where k << n the size of the matrix; and/or (ii) if it is unusually fast to apply the matrix (or operator) A to a vector (e.g. if A is sparse; or if it is (say) a Fourier operator).

In cases where you just want a quicker SVD, eg because you know the matrix is very non square (m >> n), and you're not interested in the null-space singular vectors, then, as the article says, you can save a lot of time by doing a QR decomposition first, to give an m × n orthogonal Q matrix and an n × n triangular matrix R, and then calculating the much smaller SVD of R (an approach called R-SVD). Also note that often (including in R-SVD), it is not worth explicitly calculating the coefficients of the orthogonal matrices; instead, don't calculate them at all if you don't need them; or, if you're just calculating them only then to apply them to a few vectors, instead apply the elementary rotations directly to those vectors as the rotations get calculated, and never explicitly form the orthogonal matrices at all.

Most "canned" SVD routines should give you some or all of these options. Golub and van Loan (3e) gives an idea of how much the required flop counts can be reduced, depending on how much of the SVD you don't actually need, on their page 254. -- Jheald 21:34, 6 April 2006 (UTC).

Possible errata in "Relation to eigenvalue decomposition"

I think instead of "the squares of the non-zero singular values of M" it should say "the squares of the modulus of the non-zero singular values of M". Don't you think so?

singular values are by definition non-negative, therefore real. Mct mht 17:40, 20 June 2006 (UTC)

Dense article

I find this article overly dense. When trying to understand SVD it was unhelpful. For example compare with http://www.uwlax.edu/faculty/will/svd/svd/index.html, where the intuition behind the matrix decomposition is presented using a traditional graphic form.

Request: Non - mathematician understanding of SVD

Can someone please try to explain in the article , to the non-mathematician, what is SVD, what is the meaning SVD, and what is it good for? thanks.

69.107.60.124 21:06, 28 January 2007 (UTC) Dear Anonymous... I share your sentiment. Many articles on Wikipedia, while mathematically brilliant, are usually of less value to the general public using them. An artical such as the one found at http://www.kwon3d.com/theory/jkinem/svd.html is much more valuable on the topic to the public at large IMO. It generally takes a majority of the people many many reads at such articals... I feel these articles on Wikipedia are tailored for the < 5% of the population.

I added an example demonstrating the form and properties of the three matrices. While this does not fully address the request, I think that it helps. --Diomidis Spinellis 16:22, 29 January 2007 (UTC)

Losing the intent of Wiki

The authors of the math sections on Wiki need to quickly realize that wiki isnt meant just for them and for the actual majority of the public. In writing artcles so densly, it doesent matter how correct the article is, it just comes accross as gibberish. These authors would do the wiki community much better if they refrained from writing such useless explanations on topics in the first place. These guys are really doing more harm than helping... If they want to write a PhD thesis, please have them do it elsewhere. —The preceding unsigned comment was added by 69.28.107.2 (talk) 21:58, 19 February 2007 (UTC).

The author of the above needs to realize that vague whinging is not constructive, nor is promoting his/her own favorite article, putting a link at the top of the list and even having the guts to call it a better explanation while it is in fact equally dense and clearly incomplete. Specific suggestions are welcome, attempts to improve the article are even better, and I do hope that the author of the above decides to become more constructive. -- Jitse Niesen (talk) 00:05, 20 February 2007 (UTC)

Problems with "Linear Algebraic Proof"

Please pardon the LaTeX...

\begin{document}
The ``Linear Algebraic Proof'' does not specify dimensions.  Assuming
consistency -- with what appears previously in the article --
$M$ is $m$-by-$n$, hence $M^{\ast}M$ is $n$-by-$n$ and $U$ is $n$-by-$n$
(this already conflicts with the dimensions of $U$ given in the
``Statement of the theorem'', but let's ignore that for now).  
Consequently,
$$
U^{\ast} M^{\ast} M U = 
\left( \begin{array}{cc} \Sigma & 0 \\ 0 & 0\end{array} \right)
$$
is $n$-by-$n$.  However, $\Sigma$ is supposed to be $m$-by-$n$ which 
means that the partitioned matrix
$$
\left( \begin{array}{cc} \Sigma & 0 \\ 0 & 0\end{array} \right)
$$
is actually 
$$
\left( \begin{array}{c} \Sigma \\ 0 \end{array} \right)
$$
It does not help to begin with $M M^{\ast}$, since the partitioned 
matrix 
$$
\left( \begin{array}{cc} \Sigma & 0 \\ 0 & 0\end{array} \right)
$$
in that case must actually be
$$
\left( \begin{array}{cc} \Sigma & 0 \end{array} \right)
$$
Perhaps it is {\em impossible\/} to present a ``Linear Algebraic Proof''
that is consistent with previous notation... but then alternate
notation should be used (and dimensions should be given explicitly)
so that the ``Proof'' actually makes sense!  Moreover, a real proof
would most likely need to consider cases ($n \le m$ and $n > m$).
\end{document}

—The preceding unsigned comment was added by 160.36.59.7 (talk) 17:03, 19 January 2007 (UTC).

The matrices U and Σ in "Linear Algebraic Proof" are not the same as the matrices U and Σ in the statement of the theorem. This is rather confusing, so I renamed the variables in the proof. I don't think it's necessary to give the dimensions of the matrices, but I did add the dimensions in one place to be absolutely clear. A quick check didn't show me where the proof needs to distinguish cases. -- Jitse Niesen (talk) 12:28, 26 January 2007 (UTC)

thanks, Jitse. to anon: the argument is pretty standard. the matrix dimensions should be clear by inspection. starting with MM*, the same argument goes through with appropriate (in fact very little) modification. Mct mht 13:07, 26 January 2007 (UTC)

question about the number of zeroes

What does the number of zeroes show after computing the SVD for a matrix? I mean comparing 2 matrixes with the same size what could we understand from comparing the number of zeroes in there SVD matrix?Amirshahi62 15:09, 1 March 2007 (UTC)

if you talking about the diagonal matrix D where A = UDV, then the number of nonzero elements on the diagonal of D is the rank of A, because D² = UAA*U* (say) means range D = range UA. Mct mht 18:19, 1 March 2007 (UTC)

add PROPACK SVD implementation

please add a link to PROPACK sparse SVD implementation:

http://soi.stanford.edu/~rmunk/PROPACK/index.html

it's actually ONLY implemenation that i've managed to get working on large sparse matrices. both SVDPACK and SVDPACKC are terribly written, they barely work on their test matrices (die with SEGFAULT). —The preceding unsigned comment was added by 195.5.55.14 (talk) 15:31, 2 March 2007 (UTC).

Looks okay, so I added the link. -- Jitse Niesen (talk) 03:27, 3 March 2007 (UTC)

This page is quite technical

I also agree-- this page could use some examples. I'm just interested in taking the svd of a matrix for an engineering problem, and all the theorems here, though nice, don't help me any. This page is also frequently visited, and i think most visitors are looking for something simpler than this. Apologies for posting in potentially the wrong place-- first post. 129.170.66.44 (talk) 04:11, 12 March 2008 (UTC)

I agree, at least in principle with the poster who remarked earlier about this page being too dense and technical. Maybe that's unavoidable in this subject, but I do think it's worth at least looking at this page with a serious eye and asking "Is there any way that the reach of this page can be broadened, is there anyway the usefulness of it can be increased to the average person?" . Even just saying how mathmaticians and engineers use it in the introduction would help a lot. Maybe moving the history section to the top as well. FrozenPurpleCube 15:21, 14 March 2007 (UTC)

This is a topic that can be explained to people just learning some linear algebra, so it should certainly be made accessible. Convincing college freshmen/sophomores of the utility of something like this is a different story however :-). Anyway, the introduction seems quite weak in terms of context and content. So the tag seems warranted. I'll take a better look some time. --C S (Talk) 12:38, 20 March 2007 (UTC)

More explicitly if

\ \{s_{1},...,s_{p}\}

are the eigenvalues of

M^{*}M.\!

then S is a m-by-n matrix with p non-zero diagonal elements

S={\begin{bmatrix}{\sqrt {s}}_{1}&0&\ &...&\ &\ \\0&{\sqrt {s}}_{2}&0&...&\ &...\\\vdots &0&\ddots \\\ &\vdots &\ &{\sqrt {s}}_{p}&0&...\\\ &\ &\ &0&0&\ \\&\ &&\vdots &&\ddots \\\end{bmatrix}}

and

V={\begin{bmatrix}v_{1}&v_{2}&v_{3}&...&v_{p}&v_{p+1}&...&v_{n}\end{bmatrix}}

The SVD via the eigenvalues of $M*M$

Here is a version of the section The SVD via the eigenvalues of $M*M$

The approach of computing the SVD via the eigenvalues of $M*M$ is correct from a mathematical point of view. But is it really as fast and accurate compared to dedicated methods such as those presented in Golub & van Loan? --KYN 14:41, 18 July 2007 (UTC)

not a numerical analyst, but i would guess answer is no. that section is also redundant, perhaps it can be removed all together. Mct mht 22:43, 18 July 2007 (UTC)

I also propose to move the Application section (now at #13) to an earlier position, perhaps after the formal definition has been presented. Right now the article appears too technical for the reader not familiar with the concept. --KYN 14:46, 18 July 2007 (UTC)

---

The application section has now been moved to position 4. The remaining issue is what to do with the section "The SVD via the eigenvalues of $M*M$ ". I propose to remove it entirely because

It does not provide a straightforward computational approach since it relies on eigenvalue decompositions, so it remains to answer how to accomplish such a decomposition.

It appears to be significantly slower and probably of less numerical accuracy compared to a dedicated SVD algorithm.

--KYN 09:43, 31 July 2007 (UTC)

Since there has been no support for keeping the section on computing SVD via eigenvalue decomposition I have removed it. If someone want to put it back in, please provide justification in accordance to this discussion.

I am starting to look at the section "Linear algebraic proof" with a similar view. Although formally correct, the "proof" makes a strong reference to the spectral theorem. However, I don't think that proving the spectral theorem is any easier than proving the spectral decomposition using a variational method. Please correct me if I'm wrong. A proper proof must base statements on "simpler" facts rather than move laterally. Proposal: remove also section "Linear algebraic proof". Alternatively, start a new article on the various connections/relations between SVD and EVD which exist, without assuming that EVD is more basic than SVD. --KYN 16:05, 6 August 2007 (UTC)

the proof is a standard one (i doubt you can find a mathematician who disgrees) and it contains essentially the relationship between eigenvalue decomposition and SVD. anything else one would wanna say in this regard is a trivial consequence (e.g. the statements in the "Relation to eigenvalue decomposition" and "Singular values, singular vectors, and their relation to the SVD" sections). but i for one am not going to object if you wanna remove it, as the article is getting long-ish.

as an aside, the two proofs in the present version present a nice contrast: one algebraic while the other analytic. we also see how they generalize to bounded Hilbert space operators somewhere below. your comment on what constitutes a "proper" proof aside, which is very wrong (sry, :-) ), the spectral theorem for Hermitian matrices follows easily from the fact that every matrix has an eigenvector. Mct mht 16:56, 6 August 2007 (UTC)

I'm for keeping the algebraic proof. its reliance on the eigenvalue decomposition is really good an many more people are familiar with eigenvector decomposition. One of the fun bits is to take an maximizing problem and solve it with an algebraic expression. So to see the proof from the algebraic perspective is nice .

I'm also for putting back the computing SVD via eigenvalue decomposition section. (I have to own up to being the author) The reasons are

we need to give at least one algorithim not just links to libraries of code so people can have a crack themselves .
It is a simple method to calculate the SVD given an EVD (there are many algorithms for calculation the EVD given in wikipedia)
its and speed accuracy are tied to the speed and accuracy of the EVD and the multiplication of M.V1.Dinv.

About the proof I'm not totally against the so called algebraic proof and I agree that it illustrates some interesting relations between SVD and EVD. My objection is simply that it suggests that EVD is simpler or more general or basic than SVD is. Of course, this is true for some readers but not necessary for all. I notice that at least two books on the subject; Golub & van Loan and Numerical Recipes describe SVD rather early and EVD toward the end, which to me suggests that at least these authors prefer to present it the other way around; SVD is the general approach which works for all matrices and EVD is a special case which only works for normal matrices. I propose to at least change the heading from "Algebraic proof" to something like "Based on the spectral theorem". This goes more along the lines just before that heading which refer to the two "proofs" as "arguments".

About the section on computing the SVD via EVD. I must confess that this topic is slightly outside my expertise, but I'm guessing that the relevance of computing SVD via EVD decreases with the size of the matrix. Please correct me if I wrong here. For example, I can imagine that when the round-off errors, which are inevitable when a large matrix is multiplied by its own transpose, is of the same order of magnitude as the smaller singular values there may be problems. What we could say in the article is that if you you want to compute SVD and don't have a dedicated algorithm for that BUT you do have one for EVD or are prepared to implement one (instead of implementing a dedicated SVD algorithm), then you can find SVD along the lines which were presented (and now removed). What we should not do, is to suggest that this approach is a standard algorithm for computing SVD. --KYN 11:42, 10 August 2007 (UTC)

The algebraic proof is useful in that it shows the relationship with the eigenvalue decomposition of M*M. I think that most people are more familiar with eigenvalues than singular values. However, the relationship SVD/EVD is already (though in less detail) explained earlier in the article, and wonder how much the proof adds.

Calculating the SVD via the EVD is not a good algorithm and I added that to the article. The problem is that the EVD may be ill-conditioned even though the SVD is not. For those that nevertheless want to implement the algorithm, the explanation of the relation between SVD and EVD in the article should suffice.

Incidentally, you can compute the SVD stably via the EVD, but you have to use the relation between the SVD of M and the EVD of

{\begin{bmatrix}0&M\\M^{*}&0\end{bmatrix}}.

The obvious problem is that this matrix is bigger than M. Apparently, the standard methods for computing the SVD are somehow related to this, but they don't explicitly construct this matrix. However, I don't know the details. -- Jitse Niesen (talk) 13:18, 10 August 2007 (UTC)

History

I am not a mathematician, so I do not want to edit the page directly, but it occurred to me that the History section does not seem to be quite correct. It ends thus:

 Practical methods for computing the SVD were unknown until 1965 when Gene Golub and William Kahan published their algorithm.
 In 1970, Golub and Christian Reinsch published a variant of the Golub/Kahan algorithm that is still the one most-used today.

Did not Cornelius Lanczos publish his algorithm for what has later been called SVD in 1950? And is his algorithm not the most-used one today? (For instance, it is the only algorithm implemented in svdlibc.) -Peachris (talk) 23:47, 10 December 2007 (UTC)

P.S. Here is the reference: Lanczos, C. (1951). An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators. Journal of research of the National Bureau of Standards, 45, 255-282. -Peachris (talk) 19:43, 11 December 2007 (UTC)

Hmmm, I don't know. I thought that paper introduced the Lanczos algorithm for finding eigenvalues. Are singular values (possibly under another name) discussed in that paper? Golub and Kahan mention in their paper that the Lanczos algorithm can be used for computing singular values, so perhaps this idea was already floating around. However, the first reference I know of that discusses it in detail is Paige, Bidiagonalization of Matrices and Solution of Linear Equations, SIAM J. Numer. Anal., 1974.

I think that the Lanczos algorithm is not suitable for computing the full SVD. It seems to be very useful for the computation of the largest few singular values of a sparse matrix (which is a fairly common problem in practice), but the smaller singular values converge relatively slowly. See for instance the documentation of SVDPACK, on which svdlibc is based.

I should stress that numerical linear algebra is not my specialization and that I may well be wrong. -- Jitse Niesen (talk) 18:37, 16 December 2007 (UTC)

Thank you Jitse, your comments were very helpful and convincing. As a practitioner, I only need to perform truncated SVD on sparse matrices, which explains why I have not needed anything else than the Lanczos algorithm as implemented in svdpack/svdpackc/svdlibc. Thanks also for the link to the svdpack documentation – it is more helpful than the svdpackc documentation I was aware of. I'm afraid I cannot help explaining Lanczos' 1951 paper – it goes way above my head. It does not seem to mention singular value decomposition, but of course he might use another term for the same thing. -Peachris (talk) 01:54, 19 December 2007 (UTC)