diff options
author | julie <julielangou@users.noreply.github.com> | 2016-04-28 05:35:01 +0000 |
---|---|---|
committer | julie <julielangou@users.noreply.github.com> | 2016-04-28 05:35:01 +0000 |
commit | f46ce64a4f7e1187047f1a527ffbca919e97fe40 (patch) | |
tree | 8452aa844e76493c97baa1fc60e09683dc9c82ff /SRC/dgesdd.f | |
parent | 64ebc20f17d045fe04ed01f661fe56efeb82248f (diff) |
From Mark Gates (UTK) - Fix Bug 157 - Various bugs in SVD routines (*gesdd, *gesvd, and *bdsdc).
Items are labelled (a) through (m), omitting (l).
Several are not bugs, just suggestions.
Most bugs are in *gesdd.
There's one bug (g) in *bdsdc. This is the underlying cause of LAPACK bug #111.
There's one bug (m) in [cz]gesvd. I also added an INT() cast in these
assignments to silence compiler warnings. Changed:
LWORK_ZGEQRF=CDUM(1)
to:
LWORK_ZGEQRF = INT( CDUM(1) )
Where possible, I ran a test showing the wrong behavior, then a test showing the
corrected behavior. These use a modified version of the MAGMA SVD tester
(calling LAPACK), because I could adjust the lwork as needed. The last 3 columns
are the lwork type, the lwork size, and the lwork formula. The lwork types are:
doc_old as documented in LAPACK 3.6.
doc as in the attached, updated documentation.
min_old minwrk, as computed in LAPACK 3.6.
min minwrk, as computed in the attached, updated code.
min-1 minimum - 1; this should cause gesdd to return an error.
opt optimal size.
max the maximum size LAPACK will take advantage of;
some cases, the optimal is n*n + work, while the max is m*n + work.
query what gesdd returns for an lwork query; should equal opt or max.
After the lwork, occasionally there is a ! or ? error code indicating:
Error codes: ! error: lwork < min. For (min-1), this ought to appear.
? compatability issue: lwork < min_old, will fail for lapack <= 3.6.
I also tested the update routines on a wide variety of sizes and jobz, with
various lwork.
Besides fixing the bugs below, I made two significant changes.
1) Changed *gesdd from computing each routine's workspace using, e.g.:
N*ilaenv(...)
to querying each routine for its LWORK, e.g.:
CALL ZGEBRD( M, N, CDUM(1), M, CDUM(1), DUM(1), CDUM(1),
$ CDUM(1), CDUM(1), -1, IERR )
LWORK_ZGEBRD_MN = INT( CDUM(1) )
This matches how *gesvd was changed in LAPACK 3.4.0.
2) Changed the Workspace: comments, which were incredibly deceptive.
For instance, in Path 2 before dbdsdc, it said
Workspace: need N + N*N + BDSPAC
since dbdsdc needs the [e] vector, [U] matrix, and bdspac.
However, that ignores that the [tauq, taup] vectors and [R] matrix
are also already allocated, though dbdsdc doesn't need them.
So the workspace needed at that point in the code is actually
Workspace: need N*N [R] + 3*N [e, tauq, taup] + N*N [U] + BDSPAC
For clarity, I added in [brackets] what matrices or vectors were allocated,
and the order reflects their order in memory.
I may do a similar change for *gesvd eventually. The workspace comments in
MAGMA's *gesvd have already been updated as above.
================================================================================
a) Throughout, to simplify equations, let:
mn = min( M, N )
mx = max( M, N )
================================================================================
b) [sdcz]gesdd Path 4 (m >> n, job="all") has wrong minwrk formula in code:
minwrk = bdspac + mn*mn + 2*mn + mx
= 4*mn*mn + 6*mn + mx
This is an overestimate, needlessly rejecting the documented formula:
doc = 4*mn*mn + 7*mn
In complex, the correct min fails, but the doc matches the wrong minwrk.
Solution: fix code to:
minwrk = mn*mn + max( 3*mn + bdspac, mn + mx )
= mn*mn + max( 3*mn*mn + 7*mn, mn + mx )
Test cases:
m=40, ..., 100; n=20; jobz='A'
See bug (d) showing documentation is also wrong.
Also, see bug (i), complex [cz]gesdd should return -12 instead of -13.
================================================================================
bt) transposed case
[sd]gesdd Path 4t (n >> m, job="all") has a different wrong minwrk; see bug (c).
[cz]gesdd Path 4t exhibits same bug as Path 4.
Test cases:
m=20; n=40, ..., 100; jobz='A'
================================================================================
c) [sd]gesdd Path 4t (n >> m, job="all") has wrong minwrk formula in code,
different than bug (b):
minwrk = bdspac + m*m + 3*m
= 4*mn*mn + 7*mn
This formula lacks any dependence on N, so the code will fail (without
setting info; orglq calls xerbla) if N is too large, N > 3*M*M + 6*M.
Bug does not occur in complex.
Test cases:
m=20; n = 1320; jobz='A' ok with documented lwork
m=20; n > 1320; jobz='A' fails with documented lwork
Solution: as in bug (b), fix code to:
minwrk = mn*mn + max( 3*mn + bdspac, mn + mx )
= mn*mn + max( 3*mn*mn + 7*mn, mn + mx )
See bug (d) showing documentation is also wrong.
================================================================================
d) [sd]gesdd documentation lists the same minimum size for jobz='S' and 'A':
If JOBZ = 'S' or 'A', LWORK >= min(M,N)*(7 + 4*min(M,N))
However, jobz='A' actually also depends on max(M,N):
minwrk = mn*mn + max( 3*mn*mn + 7*mn, mn + mx )
This causes the formula to fail for mx > 3*mn*mn + 6*mn.
Test cases:
m > 1320; n = 20; jobz='A' fails with document lwork, even after fixing bugs (b) and (c).
m = 20; n > 1320; jobz='A' fails also.
Solution: in docs, split these two cases. This fix uses an overestimate,
so that codes using it will be backwards compatible with LAPACK <= 3.6.
If JOBZ = 'S', LWORK >= 4*mn*mn + 7*mn.
If JOBZ = 'A', LWORK >= 4*mn*mn + 6*mn + mx.
================================================================================
e) [sd]gesdd, Path 5, jobz='A' has wrong maxwrk formula in the code:
MAXWRK = MAX( MAXWRK, BDSPAC + 3*N )
Should be:
MAXWRK = MAX( WRKBL, BDSPAC + 3*N )
This causes the lwork query to ignore WRKBL, and return the minimum
workspace size, BDSPAC + 3*N, instead of the optimal workspace size.
However, it only affects the result for small sizes where min(M,N) < NB.
Path 5t has the correct maxwrk formula.
Complex is correct for both Path 5 and 5t.
Test case:
Compare lwork query with
M = 30, N = 20, jobz='A', lwork query is 1340
M = 20, N = 30, jobz='A', lwork query is 3260
These should be the same.
Solution: fix code as above.
================================================================================
f) Not a bug, just a suggestion.
The lwork minimum sizes are not actually minimums, and can be larger than
the queried lwork size.
Solution: add a comment:
These are not tight minimums in all cases; see comments inside code.
================================================================================
g) [sd]bdsdc segfaults due to too small workspace size. Its documentation claims:
If COMPQ = 'N' then LWORK >= (4 * N).
Based on this, in zgesdd, the rwork size >= 5*min(M,N).
However, LAPACK bug 111 found that rwork size >= 7*min(M,N) was required.
In dbdsdc, if uplo='L', then it rotates lower bidiagonal to upper bidiagonal,
and saves 2 vectors of Givens rotations in work. It shifts WSTART from
1 to 2*N-1. Then it calls dlasdq( ..., work( wstart ), info ).
As dlasdq requires 4*N, dbdsdc would now require 6*N in this case.
This caused zgesdd to require rwork size >= 7*min(M,N) when N > M and jobz='N'.
My preferred solution is to change WSTART to 1 in the DLASDQ call inside dbdsdc:
IF( ICOMPQ.EQ.0 ) THEN
CALL DLASDQ( 'U', 0, N, 0, 0, 0, D, E, VT, LDVT, U, LDU, U,
$ LDU, WORK( WSTART ), INFO )
GO TO 40
END IF
to:
IF( ICOMPQ.EQ.0 ) THEN
* Ignores WSTART, which is needed only for ICOMPQ = 1 or 2;
* using WSTART would change required workspace to 6*N for uplo='L'.
CALL DLASDQ( 'U', 0, N, 0, 0, 0, D, E, VT, LDVT, U, LDU, U,
$ LDU, WORK( 1 ), INFO )
GO TO 40
END IF
The [cz]gesdd documentation, which was changed to 7*min(M,N) in LAPACK 3.6,
may be reverted to 5*min(M,N), if desired.
================================================================================
h) [sd]gesdd for jobz='N' requires bdspac = 7*n for the dbdsdc workspace.
However, dbdsdc requires only 4*n, or 6*n before fixing bug (g).
For backwards compatability, I did not change the code, but added a comment
for clarification.
================================================================================
i) [cz]gesdd returns info = -13 instead of info = -12 for lwork errors.
================================================================================
j) In zgesdd, for computing maxwrk, these paths:
Path 6, jobz=A
Path 6t, jobz=S
Path 6t, jobz=A
query ilaenv( 1, zungbr, ... )
when the code actually calls zunmbr (twice). I corrected it.
================================================================================
k) In zgesdd documentation, currently
lrwork >= max( 5*mn*mn + 7*mn, 2*mx*mn + 2*mn*mn + mn )
It doesn't need that much, particularly for (mx >> mn) case.
If (mx >> mn), lrwork >= 5*mn*mn + 5*mn;
else, lrwork >= max( 5*mn*mn + 5*mn,
2*mx*mn + 2*mn*mn + mn ).
I changed this in the documentation. Feel free to revert if you prefer.
================================================================================
m) [cz]gesvd, Path 10 and 10t, have minwrk inside the wrong conditional:
IF( .NOT.WNTVN ) THEN
MAXWRK = MAX( MAXWRK, 2*N+LWORK_ZUNGBR_P )
MINWRK = 2*N + M
END IF
So Path 10 with jobvt='N', and Path 10t with jobu='N', have minwrk = 1,
so an invalid lwork is not correctly rejected.
================================================================================
mt) transposed case
broken: with old routine, Path 10t with jobu='N' doesn't enforce minwrk
Diffstat (limited to 'SRC/dgesdd.f')
-rw-r--r-- | SRC/dgesdd.f | 721 |
1 files changed, 420 insertions, 301 deletions
diff --git a/SRC/dgesdd.f b/SRC/dgesdd.f index 54e2652e..4bdc8a64 100644 --- a/SRC/dgesdd.f +++ b/SRC/dgesdd.f @@ -18,8 +18,8 @@ * Definition: * =========== * -* SUBROUTINE DGESDD( JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, -* LWORK, IWORK, INFO ) +* SUBROUTINE DGESDD( JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, +* WORK, LWORK, IWORK, INFO ) * * .. Scalar Arguments .. * CHARACTER JOBZ @@ -154,8 +154,8 @@ *> \param[in] LDVT *> \verbatim *> LDVT is INTEGER -*> The leading dimension of the array VT. LDVT >= 1; if -*> JOBZ = 'A' or JOBZ = 'O' and M >= N, LDVT >= N; +*> The leading dimension of the array VT. LDVT >= 1; +*> if JOBZ = 'A' or JOBZ = 'O' and M >= N, LDVT >= N; *> if JOBZ = 'S', LDVT >= min(M,N). *> \endverbatim *> @@ -169,16 +169,18 @@ *> \verbatim *> LWORK is INTEGER *> The dimension of the array WORK. LWORK >= 1. -*> If JOBZ = 'N', -*> LWORK >= 3*min(M,N) + max(max(M,N),7*min(M,N)). -*> If JOBZ = 'O', -*> LWORK >= 3*min(M,N) + -*> max(max(M,N),5*min(M,N)*min(M,N)+4*min(M,N)). -*> If JOBZ = 'S' or 'A' -*> LWORK >= min(M,N)*(7+4*min(M,N)) -*> For good performance, LWORK should generally be larger. -*> If LWORK = -1 but other input arguments are legal, WORK(1) -*> returns the optimal LWORK. +*> If LWORK = -1, a workspace query is assumed. The optimal +*> size for the WORK array is calculated and stored in WORK(1), +*> and no other work except argument checking is performed. +*> +*> Let mx = max(M,N) and mn = min(M,N). +*> If JOBZ = 'N', LWORK >= 3*mn + max( mx, 7*mn ). +*> If JOBZ = 'O', LWORK >= 3*mn + max( mx, 5*mn*mn + 4*mn ). +*> If JOBZ = 'S', LWORK >= 4*mn*mn + 7*mn. +*> If JOBZ = 'A', LWORK >= 4*mn*mn + 6*mn + mx. +*> These are not tight minimums in all cases; see comments inside code. +*> For good performance, LWORK should generally be larger; +*> a query is recommended. *> \endverbatim *> *> \param[out] IWORK @@ -212,9 +214,11 @@ *> Ming Gu and Huan Ren, Computer Science Division, University of *> California at Berkeley, USA *> +*> @precisions fortran d -> s * ===================================================================== - SUBROUTINE DGESDD( JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, - $ LWORK, IWORK, INFO ) + SUBROUTINE DGESDD( JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, + $ WORK, LWORK, IWORK, INFO ) + implicit none * * -- LAPACK driver routine (version 3.6.0) -- * -- LAPACK is a software package provided by Univ. of Tennessee, -- @@ -243,6 +247,15 @@ $ IR, ISCL, ITAU, ITAUP, ITAUQ, IU, IVT, LDWKVT, $ LDWRKL, LDWRKR, LDWRKU, MAXWRK, MINMN, MINWRK, $ MNTHR, NWORK, WRKBL + INTEGER LWORK_DGEBRD_MN, LWORK_DGEBRD_MM, + $ LWORK_DGEBRD_NN, LWORK_DGELQF_MN, + $ LWORK_DGEQRF_MN, + $ LWORK_DORGBR_P_MM, LWORK_DORGBR_Q_NN, + $ LWORK_DORGLQ_MN, LWORK_DORGLQ_NN, + $ LWORK_DORGQR_MM, LWORK_DORGQR_MN, + $ LWORK_DORMBR_PRT_MM, LWORK_DORMBR_QLN_MM, + $ LWORK_DORMBR_PRT_MN, LWORK_DORMBR_QLN_MN, + $ LWORK_DORMBR_PRT_NN, LWORK_DORMBR_QLN_NN DOUBLE PRECISION ANRM, BIGNUM, EPS, SMLNUM * .. * .. Local Arrays .. @@ -256,9 +269,8 @@ * .. * .. External Functions .. LOGICAL LSAME - INTEGER ILAENV DOUBLE PRECISION DLAMCH, DLANGE - EXTERNAL DLAMCH, DLANGE, ILAENV, LSAME + EXTERNAL DLAMCH, DLANGE, LSAME * .. * .. Intrinsic Functions .. INTRINSIC INT, MAX, MIN, SQRT @@ -267,13 +279,13 @@ * * Test the input arguments * - INFO = 0 - MINMN = MIN( M, N ) - WNTQA = LSAME( JOBZ, 'A' ) - WNTQS = LSAME( JOBZ, 'S' ) + INFO = 0 + MINMN = MIN( M, N ) + WNTQA = LSAME( JOBZ, 'A' ) + WNTQS = LSAME( JOBZ, 'S' ) WNTQAS = WNTQA .OR. WNTQS - WNTQO = LSAME( JOBZ, 'O' ) - WNTQN = LSAME( JOBZ, 'N' ) + WNTQO = LSAME( JOBZ, 'O' ) + WNTQN = LSAME( JOBZ, 'N' ) LQUERY = ( LWORK.EQ.-1 ) * IF( .NOT.( WNTQA .OR. WNTQS .OR. WNTQO .OR. WNTQN ) ) THEN @@ -294,115 +306,140 @@ END IF * * Compute workspace -* (Note: Comments in the code beginning "Workspace:" describe the -* minimal amount of workspace needed at that point in the code, +* Note: Comments in the code beginning "Workspace:" describe the +* minimal amount of workspace allocated at that point in the code, * as well as the preferred amount for good performance. * NB refers to the optimal block size for the immediately -* following subroutine, as returned by ILAENV.) +* following subroutine, as returned by ILAENV. * IF( INFO.EQ.0 ) THEN MINWRK = 1 MAXWRK = 1 + BDSPAC = 0 + MNTHR = INT( MINMN*11.0D0 / 6.0D0 ) IF( M.GE.N .AND. MINMN.GT.0 ) THEN * * Compute space needed for DBDSDC * - MNTHR = INT( MINMN*11.0D0 / 6.0D0 ) IF( WNTQN ) THEN +* dbdsdc needs only 4*N (or 6*N for uplo=L for LAPACK <= 3.6) +* keep 7*N for backwards compatability. BDSPAC = 7*N ELSE BDSPAC = 3*N*N + 4*N END IF +* +* Compute space preferred for each routine + CALL DGEBRD( M, N, DUM(1), M, DUM(1), DUM(1), DUM(1), + $ DUM(1), DUM(1), -1, IERR ) + LWORK_DGEBRD_MN = INT( DUM(1) ) +* + CALL DGEBRD( N, N, DUM(1), N, DUM(1), DUM(1), DUM(1), + $ DUM(1), DUM(1), -1, IERR ) + LWORK_DGEBRD_NN = INT( DUM(1) ) +* + CALL DGEQRF( M, N, DUM(1), M, DUM(1), DUM(1), -1, IERR ) + LWORK_DGEQRF_MN = INT( DUM(1) ) +* + CALL DORGBR( 'Q', N, N, N, DUM(1), N, DUM(1), DUM(1), -1, + $ IERR ) + LWORK_DORGBR_Q_NN = INT( DUM(1) ) +* + CALL DORGQR( M, M, N, DUM(1), M, DUM(1), DUM(1), -1, IERR ) + LWORK_DORGQR_MM = INT( DUM(1) ) +* + CALL DORGQR( M, N, N, DUM(1), M, DUM(1), DUM(1), -1, IERR ) + LWORK_DORGQR_MN = INT( DUM(1) ) +* + CALL DORMBR( 'P', 'R', 'T', N, N, N, DUM(1), N, + $ DUM(1), DUM(1), N, DUM(1), -1, IERR ) + LWORK_DORMBR_PRT_NN = INT( DUM(1) ) +* + CALL DORMBR( 'Q', 'L', 'N', N, N, N, DUM(1), N, + $ DUM(1), DUM(1), N, DUM(1), -1, IERR ) + LWORK_DORMBR_QLN_NN = INT( DUM(1) ) +* + CALL DORMBR( 'Q', 'L', 'N', M, N, N, DUM(1), M, + $ DUM(1), DUM(1), M, DUM(1), -1, IERR ) + LWORK_DORMBR_QLN_MN = INT( DUM(1) ) +* + CALL DORMBR( 'Q', 'L', 'N', M, M, N, DUM(1), M, + $ DUM(1), DUM(1), M, DUM(1), -1, IERR ) + LWORK_DORMBR_QLN_MM = INT( DUM(1) ) +* IF( M.GE.MNTHR ) THEN IF( WNTQN ) THEN * -* Path 1 (M much larger than N, JOBZ='N') +* Path 1 (M >> N, JOBZ='N') * - WRKBL = N + N*ILAENV( 1, 'DGEQRF', ' ', M, N, -1, - $ -1 ) - WRKBL = MAX( WRKBL, 3*N+2*N* - $ ILAENV( 1, 'DGEBRD', ' ', N, N, -1, -1 ) ) - MAXWRK = MAX( WRKBL, BDSPAC+N ) + WRKBL = N + LWORK_DGEQRF_MN + WRKBL = MAX( WRKBL, 3*N + LWORK_DGEBRD_NN ) + MAXWRK = MAX( WRKBL, BDSPAC + N ) MINWRK = BDSPAC + N ELSE IF( WNTQO ) THEN * -* Path 2 (M much larger than N, JOBZ='O') -* - WRKBL = N + N*ILAENV( 1, 'DGEQRF', ' ', M, N, -1, -1 ) - WRKBL = MAX( WRKBL, N+N*ILAENV( 1, 'DORGQR', ' ', M, - $ N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+2*N* - $ ILAENV( 1, 'DGEBRD', ' ', N, N, -1, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'QLN', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*N ) +* Path 2 (M >> N, JOBZ='O') +* + WRKBL = N + LWORK_DGEQRF_MN + WRKBL = MAX( WRKBL, N + LWORK_DORGQR_MN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DGEBRD_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_QLN_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_PRT_NN ) + WRKBL = MAX( WRKBL, 3*N + BDSPAC ) MAXWRK = WRKBL + 2*N*N MINWRK = BDSPAC + 2*N*N + 3*N ELSE IF( WNTQS ) THEN * -* Path 3 (M much larger than N, JOBZ='S') -* - WRKBL = N + N*ILAENV( 1, 'DGEQRF', ' ', M, N, -1, -1 ) - WRKBL = MAX( WRKBL, N+N*ILAENV( 1, 'DORGQR', ' ', M, - $ N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+2*N* - $ ILAENV( 1, 'DGEBRD', ' ', N, N, -1, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'QLN', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*N ) +* Path 3 (M >> N, JOBZ='S') +* + WRKBL = N + LWORK_DGEQRF_MN + WRKBL = MAX( WRKBL, N + LWORK_DORGQR_MN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DGEBRD_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_QLN_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_PRT_NN ) + WRKBL = MAX( WRKBL, 3*N + BDSPAC ) MAXWRK = WRKBL + N*N MINWRK = BDSPAC + N*N + 3*N ELSE IF( WNTQA ) THEN * -* Path 4 (M much larger than N, JOBZ='A') -* - WRKBL = N + N*ILAENV( 1, 'DGEQRF', ' ', M, N, -1, -1 ) - WRKBL = MAX( WRKBL, N+M*ILAENV( 1, 'DORGQR', ' ', M, - $ M, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+2*N* - $ ILAENV( 1, 'DGEBRD', ' ', N, N, -1, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'QLN', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*N ) +* Path 4 (M >> N, JOBZ='A') +* + WRKBL = N + LWORK_DGEQRF_MN + WRKBL = MAX( WRKBL, N + LWORK_DORGQR_MM ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DGEBRD_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_QLN_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_PRT_NN ) + WRKBL = MAX( WRKBL, 3*N + BDSPAC ) MAXWRK = WRKBL + N*N - MINWRK = BDSPAC + N*N + 2*N + M + MINWRK = N*N + MAX( 3*N + BDSPAC, N + M ) END IF ELSE * -* Path 5 (M at least N, but not much larger) +* Path 5 (M >= N, but not much larger) * - WRKBL = 3*N + ( M+N )*ILAENV( 1, 'DGEBRD', ' ', M, N, -1, - $ -1 ) + WRKBL = 3*N + LWORK_DGEBRD_MN IF( WNTQN ) THEN - MAXWRK = MAX( WRKBL, BDSPAC+3*N ) +* Path 5n (M >= N, jobz='N') + MAXWRK = MAX( WRKBL, 3*N + BDSPAC ) MINWRK = 3*N + MAX( M, BDSPAC ) ELSE IF( WNTQO ) THEN - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'QLN', M, N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, N, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*N ) +* Path 5o (M >= N, jobz='O') + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_PRT_NN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_QLN_MN ) + WRKBL = MAX( WRKBL, 3*N + BDSPAC ) MAXWRK = WRKBL + M*N - MINWRK = 3*N + MAX( M, N*N+BDSPAC ) + MINWRK = 3*N + MAX( M, N*N + BDSPAC ) ELSE IF( WNTQS ) THEN - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'QLN', M, N, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, N, -1 ) ) - MAXWRK = MAX( WRKBL, BDSPAC+3*N ) +* Path 5s (M >= N, jobz='S') + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_QLN_MN ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_PRT_NN ) + MAXWRK = MAX( WRKBL, 3*N + BDSPAC ) MINWRK = 3*N + MAX( M, BDSPAC ) ELSE IF( WNTQA ) THEN - WRKBL = MAX( WRKBL, 3*N+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*N+N* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, N, -1 ) ) - MAXWRK = MAX( MAXWRK, BDSPAC+3*N ) +* Path 5a (M >= N, jobz='A') + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*N + LWORK_DORMBR_PRT_NN ) + MAXWRK = MAX( WRKBL, 3*N + BDSPAC ) MINWRK = 3*N + MAX( M, BDSPAC ) END IF END IF @@ -410,106 +447,129 @@ * * Compute space needed for DBDSDC * - MNTHR = INT( MINMN*11.0D0 / 6.0D0 ) IF( WNTQN ) THEN +* dbdsdc needs only 4*N (or 6*N for uplo=L for LAPACK <= 3.6) +* keep 7*N for backwards compatability. BDSPAC = 7*M ELSE BDSPAC = 3*M*M + 4*M END IF +* +* Compute space preferred for each routine + CALL DGEBRD( M, N, DUM(1), M, DUM(1), DUM(1), DUM(1), + $ DUM(1), DUM(1), -1, IERR ) + LWORK_DGEBRD_MN = INT( DUM(1) ) +* + CALL DGEBRD( M, M, A, M, S, DUM(1), DUM(1), + $ DUM(1), DUM(1), -1, IERR ) + LWORK_DGEBRD_MM = INT( DUM(1) ) +* + CALL DGELQF( M, N, A, M, DUM(1), DUM(1), -1, IERR ) + LWORK_DGELQF_MN = INT( DUM(1) ) +* + CALL DORGLQ( N, N, M, DUM(1), N, DUM(1), DUM(1), -1, IERR ) + LWORK_DORGLQ_NN = INT( DUM(1) ) +* + CALL DORGLQ( M, N, M, A, M, DUM(1), DUM(1), -1, IERR ) + LWORK_DORGLQ_MN = INT( DUM(1) ) +* + CALL DORGBR( 'P', M, M, M, A, N, DUM(1), DUM(1), -1, IERR ) + LWORK_DORGBR_P_MM = INT( DUM(1) ) +* + CALL DORMBR( 'P', 'R', 'T', M, M, M, DUM(1), M, + $ DUM(1), DUM(1), M, DUM(1), -1, IERR ) + LWORK_DORMBR_PRT_MM = INT( DUM(1) ) +* + CALL DORMBR( 'P', 'R', 'T', M, N, M, DUM(1), M, + $ DUM(1), DUM(1), M, DUM(1), -1, IERR ) + LWORK_DORMBR_PRT_MN = INT( DUM(1) ) +* + CALL DORMBR( 'P', 'R', 'T', N, N, M, DUM(1), N, + $ DUM(1), DUM(1), N, DUM(1), -1, IERR ) + LWORK_DORMBR_PRT_NN = INT( DUM(1) ) +* + CALL DORMBR( 'Q', 'L', 'N', M, M, M, DUM(1), M, + $ DUM(1), DUM(1), M, DUM(1), -1, IERR ) + LWORK_DORMBR_QLN_MM = INT( DUM(1) ) +* IF( N.GE.MNTHR ) THEN IF( WNTQN ) THEN * -* Path 1t (N much larger than M, JOBZ='N') +* Path 1t (N >> M, JOBZ='N') * - WRKBL = M + M*ILAENV( 1, 'DGELQF', ' ', M, N, -1, - $ -1 ) - WRKBL = MAX( WRKBL, 3*M+2*M* - $ ILAENV( 1, 'DGEBRD', ' ', M, M, -1, -1 ) ) - MAXWRK = MAX( WRKBL, BDSPAC+M ) + WRKBL = M + LWORK_DGELQF_MN + WRKBL = MAX( WRKBL, 3*M + LWORK_DGEBRD_MM ) + MAXWRK = MAX( WRKBL, BDSPAC + M ) MINWRK = BDSPAC + M ELSE IF( WNTQO ) THEN * -* Path 2t (N much larger than M, JOBZ='O') -* - WRKBL = M + M*ILAENV( 1, 'DGELQF', ' ', M, N, -1, -1 ) - WRKBL = MAX( WRKBL, M+M*ILAENV( 1, 'DORGLQ', ' ', M, - $ N, M, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+2*M* - $ ILAENV( 1, 'DGEBRD', ' ', M, M, -1, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, M, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'PRT', M, M, M, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*M ) +* Path 2t (N >> M, JOBZ='O') +* + WRKBL = M + LWORK_DGELQF_MN + WRKBL = MAX( WRKBL, M + LWORK_DORGLQ_MN ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DGEBRD_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_PRT_MM ) + WRKBL = MAX( WRKBL, 3*M + BDSPAC ) MAXWRK = WRKBL + 2*M*M MINWRK = BDSPAC + 2*M*M + 3*M ELSE IF( WNTQS ) THEN * -* Path 3t (N much larger than M, JOBZ='S') -* - WRKBL = M + M*ILAENV( 1, 'DGELQF', ' ', M, N, -1, -1 ) - WRKBL = MAX( WRKBL, M+M*ILAENV( 1, 'DORGLQ', ' ', M, - $ N, M, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+2*M* - $ ILAENV( 1, 'DGEBRD', ' ', M, M, -1, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, M, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'PRT', M, M, M, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*M ) +* Path 3t (N >> M, JOBZ='S') +* + WRKBL = M + LWORK_DGELQF_MN + WRKBL = MAX( WRKBL, M + LWORK_DORGLQ_MN ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DGEBRD_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_PRT_MM ) + WRKBL = MAX( WRKBL, 3*M + BDSPAC ) MAXWRK = WRKBL + M*M MINWRK = BDSPAC + M*M + 3*M ELSE IF( WNTQA ) THEN * -* Path 4t (N much larger than M, JOBZ='A') -* - WRKBL = M + M*ILAENV( 1, 'DGELQF', ' ', M, N, -1, -1 ) - WRKBL = MAX( WRKBL, M+N*ILAENV( 1, 'DORGLQ', ' ', N, - $ N, M, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+2*M* - $ ILAENV( 1, 'DGEBRD', ' ', M, M, -1, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, M, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'PRT', M, M, M, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*M ) +* Path 4t (N >> M, JOBZ='A') +* + WRKBL = M + LWORK_DGELQF_MN + WRKBL = MAX( WRKBL, M + LWORK_DORGLQ_NN ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DGEBRD_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_PRT_MM ) + WRKBL = MAX( WRKBL, 3*M + BDSPAC ) MAXWRK = WRKBL + M*M - MINWRK = BDSPAC + M*M + 3*M + MINWRK = M*M + MAX( 3*M + BDSPAC, M + N ) END IF ELSE * -* Path 5t (N greater than M, but not much larger) +* Path 5t (N > M, but not much larger) * - WRKBL = 3*M + ( M+N )*ILAENV( 1, 'DGEBRD', ' ', M, N, -1, - $ -1 ) + WRKBL = 3*M + LWORK_DGEBRD_MN IF( WNTQN ) THEN - MAXWRK = MAX( WRKBL, BDSPAC+3*M ) +* Path 5tn (N > M, jobz='N') + MAXWRK = MAX( WRKBL, 3*M + BDSPAC ) MINWRK = 3*M + MAX( N, BDSPAC ) ELSE IF( WNTQO ) THEN - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'PRT', M, N, M, -1 ) ) - WRKBL = MAX( WRKBL, BDSPAC+3*M ) +* Path 5to (N > M, jobz='O') + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_PRT_MN ) + WRKBL = MAX( WRKBL, 3*M + BDSPAC ) MAXWRK = WRKBL + M*N - MINWRK = 3*M + MAX( N, M*M+BDSPAC ) + MINWRK = 3*M + MAX( N, M*M + BDSPAC ) ELSE IF( WNTQS ) THEN - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'PRT', M, N, M, -1 ) ) - MAXWRK = MAX( WRKBL, BDSPAC+3*M ) +* Path 5ts (N > M, jobz='S') + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_PRT_MN ) + MAXWRK = MAX( WRKBL, 3*M + BDSPAC ) MINWRK = 3*M + MAX( N, BDSPAC ) ELSE IF( WNTQA ) THEN - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'QLN', M, M, N, -1 ) ) - WRKBL = MAX( WRKBL, 3*M+M* - $ ILAENV( 1, 'DORMBR', 'PRT', N, N, M, -1 ) ) - MAXWRK = MAX( WRKBL, BDSPAC+3*M ) +* Path 5ta (N > M, jobz='A') + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_QLN_MM ) + WRKBL = MAX( WRKBL, 3*M + LWORK_DORMBR_PRT_NN ) + MAXWRK = MAX( WRKBL, 3*M + BDSPAC ) MINWRK = 3*M + MAX( N, BDSPAC ) END IF END IF END IF + MAXWRK = MAX( MAXWRK, MINWRK ) WORK( 1 ) = MAXWRK * @@ -559,17 +619,18 @@ * IF( WNTQN ) THEN * -* Path 1 (M much larger than N, JOBZ='N') +* Path 1 (M >> N, JOBZ='N') * No singular vectors to be computed * ITAU = 1 NWORK = ITAU + N * * Compute A=Q*R -* (Workspace: need 2*N, prefer N+N*NB) +* Workspace: need N [tau] + N [work] +* Workspace: prefer N [tau] + N*NB [work] * CALL DGEQRF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Zero out below R * @@ -580,7 +641,8 @@ NWORK = ITAUP + N * * Bidiagonalize R in A -* (Workspace: need 4*N, prefer 3*N+2*N*NB) +* Workspace: need 3*N [e, tauq, taup] + N [work] +* Workspace: prefer 3*N [e, tauq, taup] + 2*N*NB [work] * CALL DGEBRD( N, N, A, LDA, S, WORK( IE ), WORK( ITAUQ ), $ WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1, @@ -588,14 +650,14 @@ NWORK = IE + N * * Perform bidiagonal SVD, computing singular values only -* (Workspace: need N+BDSPAC) +* Workspace: need N [e] + BDSPAC * CALL DBDSDC( 'U', 'N', N, S, WORK( IE ), DUM, 1, DUM, 1, $ DUM, IDUM, WORK( NWORK ), IWORK, INFO ) * ELSE IF( WNTQO ) THEN * -* Path 2 (M much larger than N, JOBZ = 'O') +* Path 2 (M >> N, JOBZ = 'O') * N left singular vectors to be overwritten on A and * N right singular vectors to be computed in VT * @@ -603,42 +665,45 @@ * * WORK(IR) is LDWRKR by N * - IF( LWORK.GE.LDA*N+N*N+3*N+BDSPAC ) THEN + IF( LWORK .GE. LDA*N + N*N + 3*N + BDSPAC ) THEN LDWRKR = LDA ELSE - LDWRKR = ( LWORK-N*N-3*N-BDSPAC ) / N + LDWRKR = ( LWORK - N*N - 3*N - BDSPAC ) / N END IF ITAU = IR + LDWRKR*N NWORK = ITAU + N * * Compute A=Q*R -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need N*N [R] + N [tau] + N [work] +* Workspace: prefer N*N [R] + N [tau] + N*NB [work] * CALL DGEQRF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Copy R to WORK(IR), zeroing out below it * CALL DLACPY( 'U', N, N, A, LDA, WORK( IR ), LDWRKR ) - CALL DLASET( 'L', N-1, N-1, ZERO, ZERO, WORK( IR+1 ), + CALL DLASET( 'L', N - 1, N - 1, ZERO, ZERO, WORK(IR+1), $ LDWRKR ) * * Generate Q in A -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need N*N [R] + N [tau] + N [work] +* Workspace: prefer N*N [R] + N [tau] + N*NB [work] * CALL DORGQR( M, N, N, A, LDA, WORK( ITAU ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) IE = ITAU ITAUQ = IE + N ITAUP = ITAUQ + N NWORK = ITAUP + N * -* Bidiagonalize R in VT, copying result to WORK(IR) -* (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB) +* Bidiagonalize R in WORK(IR) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + N [work] +* Workspace: prefer N*N [R] + 3*N [e, tauq, taup] + 2*N*NB [work] * CALL DGEBRD( N, N, WORK( IR ), LDWRKR, S, WORK( IE ), $ WORK( ITAUQ ), WORK( ITAUP ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * WORK(IU) is N by N * @@ -648,7 +713,7 @@ * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in WORK(IU) and computing right * singular vectors of bidiagonal matrix in VT -* (Workspace: need N+N*N+BDSPAC) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + N*N [U] + BDSPAC * CALL DBDSDC( 'U', 'I', N, S, WORK( IE ), WORK( IU ), N, $ VT, LDVT, DUM, IDUM, WORK( NWORK ), IWORK, @@ -656,21 +721,23 @@ * * Overwrite WORK(IU) by left singular vectors of R * and VT by right singular vectors of R -* (Workspace: need 2*N*N+3*N, prefer 2*N*N+2*N+N*NB) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + N*N [U] + N [work] +* Workspace: prefer N*N [R] + 3*N [e, tauq, taup] + N*N [U] + N*NB [work] * CALL DORMBR( 'Q', 'L', 'N', N, N, N, WORK( IR ), LDWRKR, $ WORK( ITAUQ ), WORK( IU ), N, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', N, N, N, WORK( IR ), LDWRKR, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Multiply Q in A by left singular vectors of R in * WORK(IU), storing result in WORK(IR) and copying to A -* (Workspace: need 2*N*N, prefer N*N+M*N) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + N*N [U] +* Workspace: prefer M*N [R] + 3*N [e, tauq, taup] + N*N [U] * DO 10 I = 1, M, LDWRKR - CHUNK = MIN( M-I+1, LDWRKR ) + CHUNK = MIN( M - I + 1, LDWRKR ) CALL DGEMM( 'N', 'N', CHUNK, N, N, ONE, A( I, 1 ), $ LDA, WORK( IU ), N, ZERO, WORK( IR ), $ LDWRKR ) @@ -680,7 +747,7 @@ * ELSE IF( WNTQS ) THEN * -* Path 3 (M much larger than N, JOBZ='S') +* Path 3 (M >> N, JOBZ='S') * N left singular vectors to be computed in U and * N right singular vectors to be computed in VT * @@ -693,38 +760,41 @@ NWORK = ITAU + N * * Compute A=Q*R -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need N*N [R] + N [tau] + N [work] +* Workspace: prefer N*N [R] + N [tau] + N*NB [work] * CALL DGEQRF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Copy R to WORK(IR), zeroing out below it * CALL DLACPY( 'U', N, N, A, LDA, WORK( IR ), LDWRKR ) - CALL DLASET( 'L', N-1, N-1, ZERO, ZERO, WORK( IR+1 ), + CALL DLASET( 'L', N - 1, N - 1, ZERO, ZERO, WORK(IR+1), $ LDWRKR ) * * Generate Q in A -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need N*N [R] + N [tau] + N [work] +* Workspace: prefer N*N [R] + N [tau] + N*NB [work] * CALL DORGQR( M, N, N, A, LDA, WORK( ITAU ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) IE = ITAU ITAUQ = IE + N ITAUP = ITAUQ + N NWORK = ITAUP + N * * Bidiagonalize R in WORK(IR) -* (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + N [work] +* Workspace: prefer N*N [R] + 3*N [e, tauq, taup] + 2*N*NB [work] * CALL DGEBRD( N, N, WORK( IR ), LDWRKR, S, WORK( IE ), $ WORK( ITAUQ ), WORK( ITAUP ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Perform bidiagonal SVD, computing left singular vectors * of bidiagoal matrix in U and computing right singular * vectors of bidiagonal matrix in VT -* (Workspace: need N+BDSPAC) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'U', 'I', N, S, WORK( IE ), U, LDU, VT, $ LDVT, DUM, IDUM, WORK( NWORK ), IWORK, @@ -732,19 +802,20 @@ * * Overwrite U by left singular vectors of R and VT * by right singular vectors of R -* (Workspace: need N*N+3*N, prefer N*N+2*N+N*NB) +* Workspace: need N*N [R] + 3*N [e, tauq, taup] + N [work] +* Workspace: prefer N*N [R] + 3*N [e, tauq, taup] + N*NB [work] * CALL DORMBR( 'Q', 'L', 'N', N, N, N, WORK( IR ), LDWRKR, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * CALL DORMBR( 'P', 'R', 'T', N, N, N, WORK( IR ), LDWRKR, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Multiply Q in A by left singular vectors of R in * WORK(IR), storing result in U -* (Workspace: need N*N) +* Workspace: need N*N [R] * CALL DLACPY( 'F', N, N, U, LDU, WORK( IR ), LDWRKR ) CALL DGEMM( 'N', 'N', M, N, N, ONE, A, LDA, WORK( IR ), @@ -752,7 +823,7 @@ * ELSE IF( WNTQA ) THEN * -* Path 4 (M much larger than N, JOBZ='A') +* Path 4 (M >> N, JOBZ='A') * M left singular vectors to be computed in U and * N right singular vectors to be computed in VT * @@ -765,16 +836,18 @@ NWORK = ITAU + N * * Compute A=Q*R, copying result to U -* (Workspace: need N*N+N+M, prefer N*N+N+M*NB) +* Workspace: need N*N [U] + N [tau] + N [work] +* Workspace: prefer N*N [U] + N [tau] + N*NB [work] * CALL DGEQRF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DLACPY( 'L', M, N, A, LDA, U, LDU ) * * Generate Q in U -* (Workspace: need N*N+N+M, prefer N*N+N+M*NB) +* Workspace: need N*N [U] + N [tau] + M [work] +* Workspace: prefer N*N [U] + N [tau] + M*NB [work] CALL DORGQR( M, M, N, U, LDU, WORK( ITAU ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Produce R in A, zeroing out other entries * @@ -785,7 +858,8 @@ NWORK = ITAUP + N * * Bidiagonalize R in A -* (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB) +* Workspace: need N*N [U] + 3*N [e, tauq, taup] + N [work] +* Workspace: prefer N*N [U] + 3*N [e, tauq, taup] + 2*N*NB [work] * CALL DGEBRD( N, N, A, LDA, S, WORK( IE ), WORK( ITAUQ ), $ WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1, @@ -794,7 +868,7 @@ * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in WORK(IU) and computing right * singular vectors of bidiagonal matrix in VT -* (Workspace: need N+N*N+BDSPAC) +* Workspace: need N*N [U] + 3*N [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'U', 'I', N, S, WORK( IE ), WORK( IU ), N, $ VT, LDVT, DUM, IDUM, WORK( NWORK ), IWORK, @@ -802,18 +876,19 @@ * * Overwrite WORK(IU) by left singular vectors of R and VT * by right singular vectors of R -* (Workspace: need N*N+3*N, prefer N*N+2*N+N*NB) +* Workspace: need N*N [U] + 3*N [e, tauq, taup] + N [work] +* Workspace: prefer N*N [U] + 3*N [e, tauq, taup] + N*NB [work] * CALL DORMBR( 'Q', 'L', 'N', N, N, N, A, LDA, $ WORK( ITAUQ ), WORK( IU ), LDWRKU, - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', N, N, N, A, LDA, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Multiply Q in U by left singular vectors of R in * WORK(IU), storing result in A -* (Workspace: need N*N) +* Workspace: need N*N [U] * CALL DGEMM( 'N', 'N', M, N, N, ONE, U, LDU, WORK( IU ), $ LDWRKU, ZERO, A, LDA ) @@ -828,7 +903,7 @@ * * M .LT. MNTHR * -* Path 5 (M at least N, but not much larger) +* Path 5 (M >= N, but not much larger) * Reduce to bidiagonal form without QR decomposition * IE = 1 @@ -837,21 +912,24 @@ NWORK = ITAUP + N * * Bidiagonalize A -* (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB) +* Workspace: need 3*N [e, tauq, taup] + M [work] +* Workspace: prefer 3*N [e, tauq, taup] + (M+N)*NB [work] * CALL DGEBRD( M, N, A, LDA, S, WORK( IE ), WORK( ITAUQ ), $ WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1, $ IERR ) IF( WNTQN ) THEN * +* Path 5n (M >= N, JOBZ='N') * Perform bidiagonal SVD, only computing singular values -* (Workspace: need N+BDSPAC) +* Workspace: need 3*N [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'U', 'N', N, S, WORK( IE ), DUM, 1, DUM, 1, $ DUM, IDUM, WORK( NWORK ), IWORK, INFO ) ELSE IF( WNTQO ) THEN +* Path 5o (M >= N, JOBZ='O') IU = NWORK - IF( LWORK.GE.M*N+3*N+BDSPAC ) THEN + IF( LWORK .GE. M*N + 3*N + BDSPAC ) THEN * * WORK( IU ) is M by N * @@ -859,6 +937,8 @@ NWORK = IU + LDWRKU*N CALL DLASET( 'F', M, N, ZERO, ZERO, WORK( IU ), $ LDWRKU ) +* IR is unused; silence compile warnings + IR = -1 ELSE * * WORK( IU ) is N by N @@ -869,53 +949,59 @@ * WORK(IR) is LDWRKR by N * IR = NWORK - LDWRKR = ( LWORK-N*N-3*N ) / N + LDWRKR = ( LWORK - N*N - 3*N ) / N END IF NWORK = IU + LDWRKU*N * * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in WORK(IU) and computing right * singular vectors of bidiagonal matrix in VT -* (Workspace: need N+N*N+BDSPAC) +* Workspace: need 3*N [e, tauq, taup] + N*N [U] + BDSPAC * CALL DBDSDC( 'U', 'I', N, S, WORK( IE ), WORK( IU ), $ LDWRKU, VT, LDVT, DUM, IDUM, WORK( NWORK ), $ IWORK, INFO ) * * Overwrite VT by right singular vectors of A -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need 3*N [e, tauq, taup] + N*N [U] + N [work] +* Workspace: prefer 3*N [e, tauq, taup] + N*N [U] + N*NB [work] * CALL DORMBR( 'P', 'R', 'T', N, N, N, A, LDA, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * - IF( LWORK.GE.M*N+3*N+BDSPAC ) THEN + IF( LWORK .GE. M*N + 3*N + BDSPAC ) THEN * +* Path 5o-fast * Overwrite WORK(IU) by left singular vectors of A -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need 3*N [e, tauq, taup] + M*N [U] + N [work] +* Workspace: prefer 3*N [e, tauq, taup] + M*N [U] + N*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, N, N, A, LDA, $ WORK( ITAUQ ), WORK( IU ), LDWRKU, - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Copy left singular vectors of A from WORK(IU) to A * CALL DLACPY( 'F', M, N, WORK( IU ), LDWRKU, A, LDA ) ELSE * +* Path 5o-slow * Generate Q in A -* (Workspace: need N*N+2*N, prefer N*N+N+N*NB) +* Workspace: need 3*N [e, tauq, taup] + N*N [U] + N [work] +* Workspace: prefer 3*N [e, tauq, taup] + N*N [U] + N*NB [work] * CALL DORGBR( 'Q', M, N, N, A, LDA, WORK( ITAUQ ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Multiply Q in A by left singular vectors of * bidiagonal matrix in WORK(IU), storing result in * WORK(IR) and copying to A -* (Workspace: need 2*N*N, prefer N*N+M*N) +* Workspace: need 3*N [e, tauq, taup] + N*N [U] + NB*N [R] +* Workspace: prefer 3*N [e, tauq, taup] + N*N [U] + M*N [R] * DO 20 I = 1, M, LDWRKR - CHUNK = MIN( M-I+1, LDWRKR ) + CHUNK = MIN( M - I + 1, LDWRKR ) CALL DGEMM( 'N', 'N', CHUNK, N, N, ONE, A( I, 1 ), $ LDA, WORK( IU ), LDWRKU, ZERO, $ WORK( IR ), LDWRKR ) @@ -926,10 +1012,11 @@ * ELSE IF( WNTQS ) THEN * +* Path 5s (M >= N, JOBZ='S') * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in VT -* (Workspace: need N+BDSPAC) +* Workspace: need 3*N [e, tauq, taup] + BDSPAC * CALL DLASET( 'F', M, N, ZERO, ZERO, U, LDU ) CALL DBDSDC( 'U', 'I', N, S, WORK( IE ), U, LDU, VT, @@ -938,20 +1025,22 @@ * * Overwrite U by left singular vectors of A and VT * by right singular vectors of A -* (Workspace: need 3*N, prefer 2*N+N*NB) +* Workspace: need 3*N [e, tauq, taup] + N [work] +* Workspace: prefer 3*N [e, tauq, taup] + N*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, N, N, A, LDA, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', N, N, N, A, LDA, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) ELSE IF( WNTQA ) THEN * +* Path 5a (M >= N, JOBZ='A') * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in VT -* (Workspace: need N+BDSPAC) +* Workspace: need 3*N [e, tauq, taup] + BDSPAC * CALL DLASET( 'F', M, M, ZERO, ZERO, U, LDU ) CALL DBDSDC( 'U', 'I', N, S, WORK( IE ), U, LDU, VT, @@ -961,20 +1050,21 @@ * Set the right corner of U to identity matrix * IF( M.GT.N ) THEN - CALL DLASET( 'F', M-N, M-N, ZERO, ONE, U( N+1, N+1 ), + CALL DLASET( 'F', M - N, M - N, ZERO, ONE, U(N+1,N+1), $ LDU ) END IF * * Overwrite U by left singular vectors of A and VT * by right singular vectors of A -* (Workspace: need N*N+2*N+M, prefer N*N+2*N+M*NB) +* Workspace: need 3*N [e, tauq, taup] + M [work] +* Workspace: prefer 3*N [e, tauq, taup] + M*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, N, A, LDA, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', N, N, M, A, LDA, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) END IF * END IF @@ -989,17 +1079,18 @@ * IF( WNTQN ) THEN * -* Path 1t (N much larger than M, JOBZ='N') +* Path 1t (N >> M, JOBZ='N') * No singular vectors to be computed * ITAU = 1 NWORK = ITAU + M * * Compute A=L*Q -* (Workspace: need 2*M, prefer M+M*NB) +* Workspace: need M [tau] + M [work] +* Workspace: prefer M [tau] + M*NB [work] * CALL DGELQF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Zero out above L * @@ -1010,7 +1101,8 @@ NWORK = ITAUP + M * * Bidiagonalize L in A -* (Workspace: need 4*M, prefer 3*M+2*M*NB) +* Workspace: need 3*M [e, tauq, taup] + M [work] +* Workspace: prefer 3*M [e, tauq, taup] + 2*M*NB [work] * CALL DGEBRD( M, M, A, LDA, S, WORK( IE ), WORK( ITAUQ ), $ WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1, @@ -1018,68 +1110,69 @@ NWORK = IE + M * * Perform bidiagonal SVD, computing singular values only -* (Workspace: need M+BDSPAC) +* Workspace: need M [e] + BDSPAC * CALL DBDSDC( 'U', 'N', M, S, WORK( IE ), DUM, 1, DUM, 1, $ DUM, IDUM, WORK( NWORK ), IWORK, INFO ) * ELSE IF( WNTQO ) THEN * -* Path 2t (N much larger than M, JOBZ='O') +* Path 2t (N >> M, JOBZ='O') * M right singular vectors to be overwritten on A and * M left singular vectors to be computed in U * IVT = 1 * -* IVT is M by M +* WORK(IVT) is M by M +* WORK(IL) is M by M; it is later resized to M by chunk for gemm * IL = IVT + M*M - IF( LWORK.GE.M*N+M*M+3*M+BDSPAC ) THEN -* -* WORK(IL) is M by N -* + IF( LWORK .GE. M*N + M*M + 3*M + BDSPAC ) THEN LDWRKL = M CHUNK = N ELSE LDWRKL = M - CHUNK = ( LWORK-M*M ) / M + CHUNK = ( LWORK - M*M ) / M END IF ITAU = IL + LDWRKL*M NWORK = ITAU + M * * Compute A=L*Q -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need M*M [VT] + M*M [L] + M [tau] + M [work] +* Workspace: prefer M*M [VT] + M*M [L] + M [tau] + M*NB [work] * CALL DGELQF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Copy L to WORK(IL), zeroing about above it * CALL DLACPY( 'L', M, M, A, LDA, WORK( IL ), LDWRKL ) - CALL DLASET( 'U', M-1, M-1, ZERO, ZERO, - $ WORK( IL+LDWRKL ), LDWRKL ) + CALL DLASET( 'U', M - 1, M - 1, ZERO, ZERO, + $ WORK( IL + LDWRKL ), LDWRKL ) * * Generate Q in A -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need M*M [VT] + M*M [L] + M [tau] + M [work] +* Workspace: prefer M*M [VT] + M*M [L] + M [tau] + M*NB [work] * CALL DORGLQ( M, N, M, A, LDA, WORK( ITAU ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) IE = ITAU ITAUQ = IE + M ITAUP = ITAUQ + M NWORK = ITAUP + M * * Bidiagonalize L in WORK(IL) -* (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB) +* Workspace: need M*M [VT] + M*M [L] + 3*M [e, tauq, taup] + M [work] +* Workspace: prefer M*M [VT] + M*M [L] + 3*M [e, tauq, taup] + 2*M*NB [work] * CALL DGEBRD( M, M, WORK( IL ), LDWRKL, S, WORK( IE ), $ WORK( ITAUQ ), WORK( ITAUP ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U, and computing right singular * vectors of bidiagonal matrix in WORK(IVT) -* (Workspace: need M+M*M+BDSPAC) +* Workspace: need M*M [VT] + M*M [L] + 3*M [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'U', 'I', M, S, WORK( IE ), U, LDU, $ WORK( IVT ), M, DUM, IDUM, WORK( NWORK ), @@ -1087,21 +1180,24 @@ * * Overwrite U by left singular vectors of L and WORK(IVT) * by right singular vectors of L -* (Workspace: need 2*M*M+3*M, prefer 2*M*M+2*M+M*NB) +* Workspace: need M*M [VT] + M*M [L] + 3*M [e, tauq, taup] + M [work] +* Workspace: prefer M*M [VT] + M*M [L] + 3*M [e, tauq, taup] + M*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, M, WORK( IL ), LDWRKL, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', M, M, M, WORK( IL ), LDWRKL, $ WORK( ITAUP ), WORK( IVT ), M, - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Multiply right singular vectors of L in WORK(IVT) by Q * in A, storing result in WORK(IL) and copying to A -* (Workspace: need 2*M*M, prefer M*M+M*N) +* Workspace: need M*M [VT] + M*M [L] +* Workspace: prefer M*M [VT] + M*N [L] +* At this point, L is resized as M by chunk. * DO 30 I = 1, N, CHUNK - BLK = MIN( N-I+1, CHUNK ) + BLK = MIN( N - I + 1, CHUNK ) CALL DGEMM( 'N', 'N', M, BLK, M, ONE, WORK( IVT ), M, $ A( 1, I ), LDA, ZERO, WORK( IL ), LDWRKL ) CALL DLACPY( 'F', M, BLK, WORK( IL ), LDWRKL, @@ -1110,7 +1206,7 @@ * ELSE IF( WNTQS ) THEN * -* Path 3t (N much larger than M, JOBZ='S') +* Path 3t (N >> M, JOBZ='S') * M right singular vectors to be computed in VT and * M left singular vectors to be computed in U * @@ -1123,38 +1219,41 @@ NWORK = ITAU + M * * Compute A=L*Q -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need M*M [L] + M [tau] + M [work] +* Workspace: prefer M*M [L] + M [tau] + M*NB [work] * CALL DGELQF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Copy L to WORK(IL), zeroing out above it * CALL DLACPY( 'L', M, M, A, LDA, WORK( IL ), LDWRKL ) - CALL DLASET( 'U', M-1, M-1, ZERO, ZERO, - $ WORK( IL+LDWRKL ), LDWRKL ) + CALL DLASET( 'U', M - 1, M - 1, ZERO, ZERO, + $ WORK( IL + LDWRKL ), LDWRKL ) * * Generate Q in A -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need M*M [L] + M [tau] + M [work] +* Workspace: prefer M*M [L] + M [tau] + M*NB [work] * CALL DORGLQ( M, N, M, A, LDA, WORK( ITAU ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) IE = ITAU ITAUQ = IE + M ITAUP = ITAUQ + M NWORK = ITAUP + M * -* Bidiagonalize L in WORK(IU), copying result to U -* (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB) +* Bidiagonalize L in WORK(IU). +* Workspace: need M*M [L] + 3*M [e, tauq, taup] + M [work] +* Workspace: prefer M*M [L] + 3*M [e, tauq, taup] + 2*M*NB [work] * CALL DGEBRD( M, M, WORK( IL ), LDWRKL, S, WORK( IE ), $ WORK( ITAUQ ), WORK( ITAUP ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in VT -* (Workspace: need M+BDSPAC) +* Workspace: need M*M [L] + 3*M [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'U', 'I', M, S, WORK( IE ), U, LDU, VT, $ LDVT, DUM, IDUM, WORK( NWORK ), IWORK, @@ -1162,18 +1261,19 @@ * * Overwrite U by left singular vectors of L and VT * by right singular vectors of L -* (Workspace: need M*M+3*M, prefer M*M+2*M+M*NB) +* Workspace: need M*M [L] + 3*M [e, tauq, taup] + M [work] +* Workspace: prefer M*M [L] + 3*M [e, tauq, taup] + M*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, M, WORK( IL ), LDWRKL, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', M, M, M, WORK( IL ), LDWRKL, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * * Multiply right singular vectors of L in WORK(IL) by * Q in A, storing result in VT -* (Workspace: need M*M) +* Workspace: need M*M [L] * CALL DLACPY( 'F', M, M, VT, LDVT, WORK( IL ), LDWRKL ) CALL DGEMM( 'N', 'N', M, N, M, ONE, WORK( IL ), LDWRKL, @@ -1181,7 +1281,7 @@ * ELSE IF( WNTQA ) THEN * -* Path 4t (N much larger than M, JOBZ='A') +* Path 4t (N >> M, JOBZ='A') * N right singular vectors to be computed in VT and * M left singular vectors to be computed in U * @@ -1194,17 +1294,19 @@ NWORK = ITAU + M * * Compute A=L*Q, copying result to VT -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need M*M [VT] + M [tau] + M [work] +* Workspace: prefer M*M [VT] + M [tau] + M*NB [work] * CALL DGELQF( M, N, A, LDA, WORK( ITAU ), WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DLACPY( 'U', M, N, A, LDA, VT, LDVT ) * * Generate Q in VT -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need M*M [VT] + M [tau] + N [work] +* Workspace: prefer M*M [VT] + M [tau] + N*NB [work] * CALL DORGLQ( N, N, M, VT, LDVT, WORK( ITAU ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Produce L in A, zeroing out other entries * @@ -1215,7 +1317,8 @@ NWORK = ITAUP + M * * Bidiagonalize L in A -* (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB) +* Workspace: need M*M [VT] + 3*M [e, tauq, taup] + M [work] +* Workspace: prefer M*M [VT] + 3*M [e, tauq, taup] + 2*M*NB [work] * CALL DGEBRD( M, M, A, LDA, S, WORK( IE ), WORK( ITAUQ ), $ WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1, @@ -1224,7 +1327,7 @@ * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in WORK(IVT) -* (Workspace: need M+M*M+BDSPAC) +* Workspace: need M*M [VT] + 3*M [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'U', 'I', M, S, WORK( IE ), U, LDU, $ WORK( IVT ), LDWKVT, DUM, IDUM, @@ -1232,18 +1335,19 @@ * * Overwrite U by left singular vectors of L and WORK(IVT) * by right singular vectors of L -* (Workspace: need M*M+3*M, prefer M*M+2*M+M*NB) +* Workspace: need M*M [VT] + 3*M [e, tauq, taup]+ M [work] +* Workspace: prefer M*M [VT] + 3*M [e, tauq, taup]+ M*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, M, A, LDA, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', M, M, M, A, LDA, $ WORK( ITAUP ), WORK( IVT ), LDWKVT, - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Multiply right singular vectors of L in WORK(IVT) by * Q in VT, storing result in A -* (Workspace: need M*M) +* Workspace: need M*M [VT] * CALL DGEMM( 'N', 'N', M, N, M, ONE, WORK( IVT ), LDWKVT, $ VT, LDVT, ZERO, A, LDA ) @@ -1258,7 +1362,7 @@ * * N .LT. MNTHR * -* Path 5t (N greater than M, but not much larger) +* Path 5t (N > M, but not much larger) * Reduce to bidiagonal form without LQ decomposition * IE = 1 @@ -1267,28 +1371,33 @@ NWORK = ITAUP + M * * Bidiagonalize A -* (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB) +* Workspace: need 3*M [e, tauq, taup] + N [work] +* Workspace: prefer 3*M [e, tauq, taup] + (M+N)*NB [work] * CALL DGEBRD( M, N, A, LDA, S, WORK( IE ), WORK( ITAUQ ), $ WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1, $ IERR ) IF( WNTQN ) THEN * +* Path 5tn (N > M, JOBZ='N') * Perform bidiagonal SVD, only computing singular values -* (Workspace: need M+BDSPAC) +* Workspace: need 3*M [e, tauq, taup] + BDSPAC * CALL DBDSDC( 'L', 'N', M, S, WORK( IE ), DUM, 1, DUM, 1, $ DUM, IDUM, WORK( NWORK ), IWORK, INFO ) ELSE IF( WNTQO ) THEN +* Path 5to (N > M, JOBZ='O') LDWKVT = M IVT = NWORK - IF( LWORK.GE.M*N+3*M+BDSPAC ) THEN + IF( LWORK .GE. M*N + 3*M + BDSPAC ) THEN * * WORK( IVT ) is M by N * CALL DLASET( 'F', M, N, ZERO, ZERO, WORK( IVT ), $ LDWKVT ) NWORK = IVT + LDWKVT*N +* IL is unused; silence compile warnings + IL = -1 ELSE * * WORK( IVT ) is M by M @@ -1298,52 +1407,58 @@ * * WORK(IL) is M by CHUNK * - CHUNK = ( LWORK-M*M-3*M ) / M + CHUNK = ( LWORK - M*M - 3*M ) / M END IF * * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in WORK(IVT) -* (Workspace: need M*M+BDSPAC) +* Workspace: need 3*M [e, tauq, taup] + M*M [VT] + BDSPAC * CALL DBDSDC( 'L', 'I', M, S, WORK( IE ), U, LDU, $ WORK( IVT ), LDWKVT, DUM, IDUM, $ WORK( NWORK ), IWORK, INFO ) * * Overwrite U by left singular vectors of A -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need 3*M [e, tauq, taup] + M*M [VT] + M [work] +* Workspace: prefer 3*M [e, tauq, taup] + M*M [VT] + M*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, N, A, LDA, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) * - IF( LWORK.GE.M*N+3*M+BDSPAC ) THEN + IF( LWORK .GE. M*N + 3*M + BDSPAC ) THEN * +* Path 5to-fast * Overwrite WORK(IVT) by left singular vectors of A -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need 3*M [e, tauq, taup] + M*N [VT] + M [work] +* Workspace: prefer 3*M [e, tauq, taup] + M*N [VT] + M*NB [work] * CALL DORMBR( 'P', 'R', 'T', M, N, M, A, LDA, $ WORK( ITAUP ), WORK( IVT ), LDWKVT, - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Copy right singular vectors of A from WORK(IVT) to A * CALL DLACPY( 'F', M, N, WORK( IVT ), LDWKVT, A, LDA ) ELSE * +* Path 5to-slow * Generate P**T in A -* (Workspace: need M*M+2*M, prefer M*M+M+M*NB) +* Workspace: need 3*M [e, tauq, taup] + M*M [VT] + M [work] +* Workspace: prefer 3*M [e, tauq, taup] + M*M [VT] + M*NB [work] * CALL DORGBR( 'P', M, N, M, A, LDA, WORK( ITAUP ), - $ WORK( NWORK ), LWORK-NWORK+1, IERR ) + $ WORK( NWORK ), LWORK - NWORK + 1, IERR ) * * Multiply Q in A by right singular vectors of * bidiagonal matrix in WORK(IVT), storing result in * WORK(IL) and copying to A -* (Workspace: need 2*M*M, prefer M*M+M*N) +* Workspace: need 3*M [e, tauq, taup] + M*M [VT] + M*NB [L] +* Workspace: prefer 3*M [e, tauq, taup] + M*M [VT] + M*N [L] * DO 40 I = 1, N, CHUNK - BLK = MIN( N-I+1, CHUNK ) + BLK = MIN( N - I + 1, CHUNK ) CALL DGEMM( 'N', 'N', M, BLK, M, ONE, WORK( IVT ), $ LDWKVT, A( 1, I ), LDA, ZERO, $ WORK( IL ), M ) @@ -1353,10 +1468,11 @@ END IF ELSE IF( WNTQS ) THEN * +* Path 5ts (N > M, JOBZ='S') * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in VT -* (Workspace: need M+BDSPAC) +* Workspace: need 3*M [e, tauq, taup] + BDSPAC * CALL DLASET( 'F', M, N, ZERO, ZERO, VT, LDVT ) CALL DBDSDC( 'L', 'I', M, S, WORK( IE ), U, LDU, VT, @@ -1365,20 +1481,22 @@ * * Overwrite U by left singular vectors of A and VT * by right singular vectors of A -* (Workspace: need 3*M, prefer 2*M+M*NB) +* Workspace: need 3*M [e, tauq, taup] + M [work] +* Workspace: prefer 3*M [e, tauq, taup] + M*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, N, A, LDA, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', M, N, M, A, LDA, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) ELSE IF( WNTQA ) THEN * +* Path 5ta (N > M, JOBZ='A') * Perform bidiagonal SVD, computing left singular vectors * of bidiagonal matrix in U and computing right singular * vectors of bidiagonal matrix in VT -* (Workspace: need M+BDSPAC) +* Workspace: need 3*M [e, tauq, taup] + BDSPAC * CALL DLASET( 'F', N, N, ZERO, ZERO, VT, LDVT ) CALL DBDSDC( 'L', 'I', M, S, WORK( IE ), U, LDU, VT, @@ -1388,20 +1506,21 @@ * Set the right corner of VT to identity matrix * IF( N.GT.M ) THEN - CALL DLASET( 'F', N-M, N-M, ZERO, ONE, VT( M+1, M+1 ), + CALL DLASET( 'F', N-M, N-M, ZERO, ONE, VT(M+1,M+1), $ LDVT ) END IF * * Overwrite U by left singular vectors of A and VT * by right singular vectors of A -* (Workspace: need 2*M+N, prefer 2*M+N*NB) +* Workspace: need 3*M [e, tauq, taup] + N [work] +* Workspace: prefer 3*M [e, tauq, taup] + N*NB [work] * CALL DORMBR( 'Q', 'L', 'N', M, M, N, A, LDA, $ WORK( ITAUQ ), U, LDU, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) CALL DORMBR( 'P', 'R', 'T', N, N, M, A, LDA, $ WORK( ITAUP ), VT, LDVT, WORK( NWORK ), - $ LWORK-NWORK+1, IERR ) + $ LWORK - NWORK + 1, IERR ) END IF * END IF |