In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix that, when multiplied with a vector, has the same effect as subtracting the mean of the vector's components from every component of that vector.
Definition
The centering matrix of size n is defined as the n-by-n matrix
![{\displaystyle C_{n}=I_{n}-{\tfrac {1}{n}}J_{n}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/bbb03ad63e536cfd637cf7f8c6f966f61b199194)
where $I_{n}$ is the identity matrix of size n and $J_{n}$ is an n-by-n matrix of all 1's.
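For example,

$$C_{1}={\begin{bmatrix}0\end{bmatrix}},$$

$$C_{2}=\left[{\begin{array}{rr}1&0\\0&1\end{array}}\right]-{\frac {1}{2}}\left[{\begin{array}{rr}1&1\\1&1\end{array}}\right]=\left[{\begin{array}{rr}{\frac {1}{2}}&-{\frac {1}{2}}\\-{\frac {1}{2}}&{\frac {1}{2}}\end{array}}\right],$$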
![{\displaystyle C_{3}=\left[{\begin{array}{rrr}1&0&0\\0&1&0\\0&0&1\end{array}}\right]-{\frac {1}{3}}\left[{\begin{array}{rrr}1&1&1\\1&1&1\\1&1&1\end{array}}\right]=\left[{\begin{array}{rrr}{\frac {2}{3}}&-{\frac {1}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&{\frac {2}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&-{\frac {1}{3}}&{\frac {2}{3}}\end{array}}\right]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a280df1cf0b14b46e0f24b487f0c0f0dc22269a3)
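The definition can be checked numerically. The following is a minimal sketch in plain Python using exact fractions; the helper name `centering_matrix` is an illustrative choice and not part of any library.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

C3 = centering_matrix(3)
for row in C3:
    print([str(x) for x in row])
# ['2/3', '-1/3', '-1/3']
# ['-1/3', '2/3', '-1/3']
# ['-1/3', '-1/3', '2/3']
```

The diagonal entries are 1 − 1/n and the off-diagonal entries are −1/n, matching the matrix shown above for n = 3.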
Properties
Given a column vector $\mathbf{v}$ of size n, the centering property of $C_{n}$ can be expressed as
![{\displaystyle C_{n}\,\mathbf {v} =\mathbf {v} -({\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v} )J_{n,1}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c16ea55745fb01fb2ede7f6098a1caf0c11af37f)
where $J_{n,1}$ is a column vector of ones and ${\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v}$ is the mean of the components of $\mathbf{v}$.
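As a small worked illustration of the centering property, the sketch below multiplies an example vector by the centering matrix and compares the result with subtracting the mean directly. The vector values and the helper names `centering_matrix` and `matvec` are assumptions chosen for the example.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [Fraction(x) for x in (2, 4, 9)]   # illustrative data
mean = sum(v) / len(v)                 # equals 5

centered = matvec(centering_matrix(len(v)), v)
direct = [x - mean for x in v]         # subtract the mean component-wise

# Both routes give the same vector, whose components are -3, -1, 4.
assert centered == direct
```

The direct subtraction needs only O(n) operations, whereas the matrix product needs O(n²), which is why the matrix form is mainly of analytical interest.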
- $C_{n}$ is symmetric positive semi-definite.
- $C_{n}$ is idempotent, so that $C_{n}^{k}=C_{n}$, for $k=1,2,\ldots$. Once the mean has been removed, it is zero and removing it again has no effect.
- $C_{n}$ is singular. The effects of applying the transformation $C_{n}\,\mathbf{v}$ cannot be reversed.
- $C_{n}$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.
- $C_{n}$ has a nullspace of dimension 1, along the vector $J_{n,1}$.
- $C_{n}$ is an orthogonal projection matrix. That is, $C_{n}\,\mathbf{v}$ is a projection of $\mathbf{v}$ onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace $J_{n,1}$. (This is the subspace of all n-vectors whose components sum to zero.)
- The trace of $C_{n}$ is $n(n-1)/n=n-1$.
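Several of these properties can be spot-checked for a particular size. The sketch below does so for n = 4 in exact arithmetic; it is a numerical check under that assumption rather than a proof, and the helper names are illustrative.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n = 4
C = centering_matrix(n)
ones = [[Fraction(1)] for _ in range(n)]            # the column vector J_{n,1}

is_symmetric = C == [list(col) for col in zip(*C)]  # C equals its transpose
is_idempotent = matmul(C, C) == C                   # C^2 = C
trace = sum(C[i][i] for i in range(n))              # n - 1
maps_ones_to_zero = matmul(C, ones) == [[0]] * n    # J_{n,1} is in the nullspace

print(is_symmetric, is_idempotent, trace, maps_ones_to_zero)
# True True 3 True
```

For a symmetric idempotent matrix the eigenvalues are 0 or 1, so the trace n − 1 equals the number of unit eigenvalues, which is consistent with the multiplicities listed above.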
Application
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix $X$.
The left multiplication by $C_{m}$ subtracts a corresponding mean value from each of the n columns, so that each column of the product $C_{m}\,X$ has a zero mean. Similarly, the multiplication by $C_{n}$ on the right subtracts a corresponding mean value from each of the m rows, and each row of the product $X\,C_{n}$ has a zero mean.
The multiplication on both sides creates a doubly centred matrix $C_{m}\,X\,C_{n}$, whose row and column means are equal to zero.
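The three products can be illustrated on a small matrix. The sketch below uses an arbitrary 2-by-3 example (the data and helper names are assumptions for illustration) and computes the column-centered, row-centered and doubly centred versions.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[1, 2, 3],
     [4, 6, 11]]                       # illustrative data, m = 2, n = 3
C_m, C_n = centering_matrix(2), centering_matrix(3)

col_centered = matmul(C_m, X)          # left multiplication: columns sum to zero
row_centered = matmul(X, C_n)          # right multiplication: rows sum to zero
doubly_centered = matmul(matmul(C_m, X), C_n)

assert all(sum(col) == 0 for col in zip(*col_centered))
assert all(sum(row) == 0 for row in row_centered)
assert all(sum(row) == 0 for row in doubly_centered)
assert all(sum(col) == 0 for col in zip(*doubly_centered))
```

Here the row means of X are 2 and 7, so the row-centered matrix has rows (−1, 0, 1) and (−3, −1, 4).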
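The centering matrix provides in particular a succinct way to express the scatter matrix, $S=(X-\mu J_{n,1}^{\mathrm {T} })(X-\mu J_{n,1}^{\mathrm {T} })^{\mathrm {T} }$ of a data sample $X$, where $\mu ={\tfrac {1}{n}}XJ_{n,1}$ is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as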
![{\displaystyle S=X\,C_{n}(X\,C_{n})^{\mathrm {T} }=X\,C_{n}\,C_{n}\,X\,^{\mathrm {T} }=X\,C_{n}\,X\,^{\mathrm {T} }.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f48540634556e73366fd755b57d98e926e164d5b)
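The equality of the two expressions for the scatter matrix can be checked on a small sample. The sketch below assumes, as in the text, that the n samples are stored as the columns of an m-by-n matrix; the data and helper names are illustrative.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

X = [[1, 2, 3],
     [4, 6, 11]]                       # m = 2 variables, n = 3 samples (columns)
m, n = len(X), len(X[0])

# Direct route: subtract the sample mean from every column, then form D D^T.
mu = [sum(row) / Fraction(n) for row in X]
D = [[X[i][j] - mu[i] for j in range(n)] for i in range(m)]
S_direct = matmul(D, transpose(D))

# Compact route: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

assert S_direct == S_compact           # both equal [[2, 7], [7, 26]]
```

The middle step of the identity relies on the centering matrix being symmetric and idempotent, so that the product of the centering matrix with its own transpose collapses to the centering matrix itself.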
$C_{n}$ is the covariance matrix of the multinomial distribution, in the special case where the number of trials equals the number of categories, n, and every category is equally likely, $p_{1}=p_{2}=\cdots =p_{n}={\frac {1}{n}}$.
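This can be verified from the standard multinomial covariance formulas, in which the variance of the i-th count is (trials)·pᵢ(1 − pᵢ) and the covariance of two distinct counts is −(trials)·pᵢpⱼ. The sketch below builds that covariance matrix for n trials over n equally likely categories and compares it with the centering matrix; the function name `multinomial_cov` is a hypothetical helper written for this example.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def multinomial_cov(trials, p):
    """Covariance matrix of multinomial counts with the given probabilities."""
    size = len(p)
    return [[trials * p[i] * (1 - p[i]) if i == j else -trials * p[i] * p[j]
             for j in range(size)]
            for i in range(size)]

n = 5
uniform = [Fraction(1, n)] * n         # p_1 = ... = p_n = 1/n
assert multinomial_cov(n, uniform) == centering_matrix(n)
```

With these parameters each variance is n·(1/n)(1 − 1/n) = 1 − 1/n and each covariance is −n·(1/n)² = −1/n, which are exactly the diagonal and off-diagonal entries of the centering matrix.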
References
- [1] John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.