Quantum-Inspired Stochastic Regression

Updated on December 23, 2018

Reference: https://arxiv.org/abs/1811.04909

Definitions and Assumptions

Let $b \in \mathbb{C}^m$ and $A \in \mathbb{C}^{m \times n}$ s.t. $\|A\| \le 1$, where $\|\cdot\|$ signifies the operator norm (or spectral norm). Furthermore, require that $\operatorname{rank}(A) = k$ and $\|A^+\| \le \kappa$, where $A^+$ is the pseudoinverse of $A$. Hence, observe that $\|A\| \le 1$ is equivalent to $A$ having maximum singular value at most $1$ (to see this, simply apply the Spectral Theorem to the Hermitian matrix $A^\dagger A$). Similarly, $A^+$ has the singular values of $A$ inverted, and so $\|A^+\|$ is equal to the reciprocal of the minimum nonzero singular value of $A$. Therefore, the condition number of $A$ is given by $\kappa_A = \|A\|\,\|A^+\|$.
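
These norm facts are easy to verify numerically. Below is a minimal NumPy sketch (the matrix here is an arbitrary full-rank example, chosen only for illustration):

```python
import numpy as np

# Illustration: ||A|| is the largest singular value, ||A^+|| is the
# reciprocal of the smallest nonzero singular value, and hence
# kappa_A = ||A|| * ||A^+|| = sigma_max / sigma_min.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))          # arbitrary full-rank example
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending order

assert np.isclose(np.linalg.norm(A, 2), s[0])
assert np.isclose(np.linalg.norm(np.linalg.pinv(A), 2), 1 / s[-1])
kappa_A = s[0] / s[-1]
```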

So, define $x$ to be the least-squares solution to the linear system $Ax = b$, i.e. $x = A^+ b$. Then, in terms of these definitions, we define two primary goals:

  • Query a vector $\tilde{x}$ s.t. $\|\tilde{x} - x\| \le \epsilon \|x\|$.
  • Sample from a distribution that approximates $\frac{|x_j|^2}{\|x\|^2}$ within total variation distance $2\epsilon$.

In order to do this, we simply assume that we have length-squared sampling access to $A$. In other words, we are able to sample row indices of $A$ from the distribution $\frac{\|A_{i,\cdot}\|^2}{\|A\|_F^2}$.
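
For concreteness, here is a minimal sketch of this sampling access over a dense NumPy array. (In the regime the paper cares about, $A$ would instead live in a data structure supporting these operations in time polylogarithmic in $m$ and $n$; the dense representation here is purely illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)

def row_probabilities(A):
    """Length-squared distribution over rows: p_i = ||A_{i,.}||^2 / ||A||_F^2."""
    row_norms_sq = np.sum(np.abs(A) ** 2, axis=1)
    return row_norms_sq / row_norms_sq.sum()

def sample_row_index(A):
    """Sample a row index i of A with probability p_i."""
    return rng.choice(A.shape[0], p=row_probabilities(A))
```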

Sequence of Approximations

First, we’ll summarize the sequence of approximations that we’ll perform using length-squared sampling techniques. We’ll describe these steps in depth in the following sections.

Of course, we know that the least squares solution of the linear system is given by the orthogonal projection

$$x = (A^\dagger A)^+ A^\dagger b = A^+ b$$
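
This is the standard normal-equations identity $A^+ = (A^\dagger A)^+ A^\dagger$; a quick numerical sanity check on an arbitrary example matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)

# The normal-equations form (A^† A)^+ A^† b coincides with the
# least-squares solution A^+ b.
x_normal = np.linalg.pinv(A.T @ A) @ A.T @ b
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x_normal, x_pinv)
```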

So, we first approximate $A^\dagger A$ by $R^\dagger R$, where $R \in \mathbb{C}^{r \times n}$, $r \ll m$, is constructed by length-squared sampling $r$ rows of $A$. Now, denote the spectral decomposition

$$(A^\dagger A)^+ \approx (R^\dagger R)^+ = \sum_{l=1}^{k} \frac{1}{\sigma_l^2} \, |v^{(l)}\rangle\langle v^{(l)}|$$

where, of course, $\sigma_l$ and $|v^{(l)}\rangle \in \mathbb{C}^n$ are the singular values and right singular vectors of $R$, respectively.

We see that computing these right singular vectors of $R$ can still be computationally prohibitive given the dimension $n$. Hence, we can use length-squared sampling again, this time on the columns of $R$, to give a matrix $C \in \mathbb{C}^{r \times c}$, $c \ll n$. Now, the left singular vectors of $C$, which we denote as $|w^{(l)}\rangle \in \mathbb{C}^r$, can be efficiently computed via standard SVD methods. So,

$$(R R^\dagger)^+ \approx (C C^\dagger)^+ = \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^2} \, |w^{(l)}\rangle\langle w^{(l)}|$$

We can then show that

$$|\tilde v^{(l)}\rangle := R^\dagger |w^{(l)}\rangle / \tilde\sigma_l$$

provides a good approximation of $|v^{(l)}\rangle$. Note that the $\tilde\sigma_l$ are the singular values of $C$, which approximate the singular values of $R$, which in turn approximate the singular values of $A$. This follows from $A^\dagger A \approx R^\dagger R$ and $R R^\dagger \approx C C^\dagger$ by the Hoffman–Wielandt inequality, detailed in Lemma 2.7 of Kannan and Vempala and stated without proof below.
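
Before stating the inequality, here is a minimal dense sketch of this step, assuming $R$ has already been constructed (its construction is detailed in the last section): length-squared sample $c$ columns of $R$ to form $C$ (scaled so that $\mathbb{E}[C C^\dagger] = R R^\dagger$), take the SVD of the small $r \times c$ matrix $C$, and lift each left singular vector back through $R^\dagger$. The function names and the dense-array representation are illustrative assumptions, not the paper's data structures.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_columns(R, c):
    """Length-squared sample c columns of R, rescaled so E[C C^†] = R R^†."""
    col_norms_sq = np.sum(np.abs(R) ** 2, axis=0)
    q = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(R.shape[1], size=c, p=q)
    return R[:, idx] / np.sqrt(c * q[idx])[None, :]

def approx_right_singular_vectors(R, c, k):
    """Return (sigma_tilde, v_tilde) with v_tilde[:, l] approximating |v^(l)>."""
    C = sample_columns(R, c)
    w, sigma_tilde, _ = np.linalg.svd(C)         # columns of w are |w^(l)>
    w, sigma_tilde = w[:, :k], sigma_tilde[:k]   # keep the top k
    v_tilde = (R.conj().T @ w) / sigma_tilde     # |v~^(l)> = R^† |w^(l)> / sigma~_l
    return sigma_tilde, v_tilde
```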

Hoffman–Wielandt inequality

If $P, Q$ are two real, symmetric $n \times n$ matrices and $\lambda_1, \ldots, \lambda_n$ denote eigenvalues in non-decreasing order, then

$$\sum_{t=1}^{n} \left(\lambda_t(P) - \lambda_t(Q)\right)^2 \le \|P - Q\|_F^2$$
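
The bound is easy to sanity-check numerically on random symmetric matrices (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
P = rng.standard_normal((n, n)); P = (P + P.T) / 2   # random symmetric
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2

# eigvalsh returns eigenvalues in ascending order, matching the pairing above.
lhs = np.sum((np.linalg.eigvalsh(P) - np.linalg.eigvalsh(Q)) ** 2)
rhs = np.linalg.norm(P - Q, "fro") ** 2
assert lhs <= rhs + 1e-12
```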

At this point, it seems like we haven't made much progress, since computing $R^\dagger |w^{(l)}\rangle$ is still expensive. However, it turns out that all we need to enable query access to $\tilde x$ is the ability to efficiently estimate the trace inner product $\operatorname{tr}(U^\dagger V)$, where $U$ and $V$ are operators such that $U$ can be length-squared sampled and $V$ can be queried. To see this, we write our solution, $\tilde x$, in terms of the approximations thus far:

$$\tilde x \approx A^+ |b\rangle \approx (R^\dagger R)^+ A^\dagger |b\rangle \approx \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^2} \, |\tilde v^{(l)}\rangle\langle \tilde v^{(l)}| \, A^\dagger |b\rangle$$

Hence, define $U := A$ and $V := |b\rangle\langle \tilde v^{(l)}|$, in which case

$$\operatorname{tr}(U^\dagger V) = \operatorname{tr}\left(A^\dagger |b\rangle\langle \tilde v^{(l)}|\right) = \operatorname{tr}\left(\langle \tilde v^{(l)}| A^\dagger |b\rangle\right) = \langle \tilde v^{(l)}| A^\dagger |b\rangle$$

since $\langle \tilde v^{(l)}| A^\dagger |b\rangle$ is a scalar. Therefore, say that

$$\tilde\lambda_l \approx \operatorname{tr}\left(A^\dagger |b\rangle\langle \tilde v^{(l)}|\right)$$

and assume that we can compute and memoize these scalars $\tilde\lambda_l$ efficiently. In which case,

$$\tilde x \approx \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^2} \, |\tilde v^{(l)}\rangle \, \tilde\lambda_l$$

Recalling the definition of $|\tilde v^{(l)}\rangle$,

$$= \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^3} \, R^\dagger |w^{(l)}\rangle \, \tilde\lambda_l = R^\dagger \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^3} \, |w^{(l)}\rangle \, \tilde\lambda_l$$

and so, defining $z := \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^3} |w^{(l)}\rangle \tilde\lambda_l$,

$$= R^\dagger z$$

We see that we can compute $z$ efficiently (and memoize it for future queries) because it is a linear combination of $k$ left singular vectors in $\mathbb{C}^r$. So, say that we wish to query an element $\tilde x_j$. We can simply query the column $R_{\cdot,j} \in \mathbb{C}^r$ (or, equivalently, the row $(R^\dagger)_{j,\cdot}$) and compute $\tilde x_j = R_{\cdot,j}^\dagger z$. Hence, we've achieved our first goal.
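
To make the bookkeeping concrete, here is a hedged sketch of this query procedure. The Monte Carlo estimator for $\tilde\lambda_l \approx \langle \tilde v^{(l)}| A^\dagger |b\rangle$ samples entries of $A$ from its length-squared distribution; this is one simple unbiased estimator (shown for real matrices), and the paper's variance analysis dictates the actual number of samples. All names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_lambda(A, b, v_tilde_l, n_samples=10_000):
    """Monte Carlo estimate of <v~^(l)| A^† |b> (real case shown).

    Sample entries (i, j) with probability A_ij^2 / ||A||_F^2; each sample
    contributes ||A||_F^2 * b_i * v~_j / A_ij, an unbiased estimator of
    sum_ij A_ij b_i v~_j.
    """
    fro_sq = np.sum(A ** 2)
    flat = rng.choice(A.size, size=n_samples, p=(A ** 2).ravel() / fro_sq)
    i, j = np.unravel_index(flat, A.shape)
    return np.mean(fro_sq * b[i] * v_tilde_l[j] / A[i, j])

def build_z(A, b, R, w, sigma_tilde):
    """z = sum_l sigma~_l^{-3} |w^(l)> lambda~_l; computed once, then memoized."""
    v_tilde = (R.T @ w) / sigma_tilde            # columns are |v~^(l)>
    lam = np.array([estimate_lambda(A, b, v_tilde[:, l])
                    for l in range(w.shape[1])])
    return w @ (lam / sigma_tilde ** 3)

def query_entry(R, z, j):
    """x~_j = R_{.,j}^† z: each query touches only a single column of R."""
    return R[:, j] @ z
```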

In order to achieve our second goal, enabling sample access to a distribution that approximates $\frac{|x_j|^2}{\|x\|^2}$, we require one more trick: rejection sampling, which we detail in the QML Notes linked at the end of the document.

All in all, we’ve performed the chain of approximations,

$$|x\rangle = A^+ |b\rangle = (A^\dagger A)^+ A^\dagger |b\rangle \approx (R^\dagger R)^+ A^\dagger |b\rangle = \sum_{l=1}^{k} \frac{1}{\sigma_l^2} \, |v^{(l)}\rangle\langle v^{(l)}| \, A^\dagger |b\rangle$$
$$\approx \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^2} \, |\tilde v^{(l)}\rangle\langle \tilde v^{(l)}| \, A^\dagger |b\rangle \qquad \left(|\tilde v^{(l)}\rangle := R^\dagger |w^{(l)}\rangle / \tilde\sigma_l\right)$$
$$\approx \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^2} \, |\tilde v^{(l)}\rangle \, \tilde\lambda_l = R^\dagger \sum_{l=1}^{k} \frac{1}{\tilde\sigma_l^3} \, |w^{(l)}\rangle \, \tilde\lambda_l = R^\dagger z$$

Now that we’ve sketched the steps of this process, we detail each approximation and show that we can achieve the claimed correctness and complexity bounds.

Computing Approximate Singular Vectors

As described above, we begin by length-squared sampling the original matrix $A \in \mathbb{C}^{m \times n}$. So, pick row index $i$ with probability $p_i = \frac{\|A_{i,\cdot}\|^2}{\|A\|_F^2}$ and output the random row $Y = A_{i,\cdot} / \sqrt{p_i} = \frac{A_{i,\cdot}}{\|A_{i,\cdot}\|} \|A\|_F$. After sampling $r$ rows, we implicitly define the matrix $R$ to be the concatenation of the outputted random rows. Furthermore, we rescale $R$ so that $\mathbb{E}[R^\dagger R] = A^\dagger A$, which requires scaling by $\frac{1}{\sqrt{r}}$. Therefore,

$$R = \frac{1}{\sqrt{r}} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_r \end{bmatrix} \in \mathbb{C}^{r \times n}$$
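
A minimal dense sketch of this construction (again, the real algorithm never materializes $A$ or $R$ as dense arrays; each $Y_t$ is just a sampled, rescaled row):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_R(A, r):
    """Length-squared sample r rows of A; scale by 1/sqrt(r) so E[R^† R] = A^† A."""
    row_norms_sq = np.sum(np.abs(A) ** 2, axis=1)
    p = row_norms_sq / row_norms_sq.sum()
    idx = rng.choice(A.shape[0], size=r, p=p)
    Y = A[idx] / np.sqrt(p[idx])[:, None]   # Y_t = A_{i_t,.} / sqrt(p_{i_t})
    return Y / np.sqrt(r)

# Sanity check: R^† R concentrates around A^† A as r grows.
A = rng.standard_normal((1000, 30)) / 40    # keeps ||A|| <= 1, roughly
R = sample_R(A, r=200)
err = np.linalg.norm(R.T @ R - A.T @ A, "fro")
```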

Writing in progress…

For the filled-in details, please check out my QML Notes.