python - Many small matrices speed-up for loops
I have a large coordinate grid (vectors a and b), for which I generate and solve a matrix (10x10) equation at each point. Is there a way for scipy.linalg.solve to accept vector input? So far my solution was to run for cycles over the coordinate arrays.
import time
import numpy as np
import scipy.linalg

N = 10
a = np.linspace(0, 1, 10**3)
b = np.linspace(0, 1, 2*10**3)
A = np.random.random((N, N))   # input matrix, not static

def f(a, b, n):   # array-filling function
    return a*b*n

def sol(x, y):   # matrix solver
    D = np.arange(0, N)
    B = f(x, y, D)**2 + f(x-1, y+1, D)   # source vector
    X = scipy.linalg.solve(A, B)
    return X   # output N-size vector

start = time.time()

answer = np.zeros(shape=(a.size, b.size))   # predefine the output array

for egg in range(a.size):   # an ugly double-for cycle
    for ham in range(b.size):
        aa = a[egg]
        bb = b[ham]
        answer[egg, ham] = sol(aa, bb)[0]

print time.time() - start
@Yevgeniy is right: efficiently solving multiple independent linear systems A x = b with scipy is a bit tricky (assuming that the A array changes at every iteration).
For instance, here is a benchmark for solving 1000 systems of the form A x = b, where A is a 10x10 matrix and b a 10-element vector. Surprisingly enough, the approach of putting everything into one block diagonal matrix and calling scipy.linalg.solve once is indeed slower with both dense and sparse matrices.
import numpy as np
from scipy.linalg import block_diag, solve
from scipy.sparse import block_diag as sp_block_diag
from scipy.sparse.linalg import spsolve

N = 10
M = 1000   # number of coordinates

Ai = np.random.randn(N, N)
# we could compute the inverse here, but let's assume
# that Ai is a different matrix at every loop iteration
bi = np.random.randn(N)

%timeit [solve(Ai, bi) for el in range(M)]
# 10 loops, best of 3: 32.1 ms per loop

Afull = sp_block_diag([Ai]*M, format='csr')
bfull = np.tile(bi, M)

%timeit Afull = sp_block_diag([Ai]*M, format='csr')
# 1 loops, best of 3: 303 ms per loop
%timeit spsolve(Afull, bfull)
# 100 loops, best of 3: 5.55 ms per loop

Afull = block_diag(*[Ai]*M)

%timeit Afull = block_diag(*[Ai]*M)
# 100 loops, best of 3: 14.1 ms per loop
%timeit solve(Afull, bfull)
# 1 loops, best of 3: 23.6 s per loop
For the solution of the linear system itself, sparse arrays are faster, but the time needed to create the block diagonal array is very slow. As to dense arrays, they are simply slower in this case (and take lots of RAM).
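As an aside (this is not part of the original benchmark): newer NumPy releases offer a third option, since np.linalg.solve broadcasts over stacks of matrices (NumPy >= 1.8). A minimal sketch, assuming the per-coordinate matrices can be collected into a single (M, N, N) array; the names A_stack and b_stack are mine:

import numpy as np

N, M = 10, 1000
A_stack = np.random.randn(M, N, N)   # one N x N matrix per coordinate
b_stack = np.random.randn(M, N, 1)   # matching right-hand sides as column vectors

# a single gufunc call solves all M independent systems at once,
# with no block diagonal matrix to assemble
x = np.linalg.solve(A_stack, b_stack)[..., 0]   # shape (M, N)

This avoids both the Python-level loop and the block-diagonal construction, at the cost of holding all M matrices in memory at once.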
Maybe I'm missing something about how to make this work efficiently with sparse arrays, but if you are keeping the for loops, there are two things that you could do for optimizations.
From pure python, look at the source code of scipy.linalg.solve: remove unnecessary tests and factorize repeated operations outside of the loops. For instance, assuming your arrays are not symmetric positive-definite, we could do
from numpy.linalg import LinAlgError
from scipy.linalg import get_lapack_funcs

gesv, = get_lapack_funcs(('gesv',), (Ai, bi))

def solve_opt(A, b, gesv=gesv):
    # not sure if copying A and b is necessary, but just in case
    # (it would be faster if the arrays were not copied)
    lu, piv, x, info = gesv(A.copy(), b.copy(), overwrite_a=False, overwrite_b=False)
    if info == 0:
        return x
    if info > 0:
        raise LinAlgError("singular matrix")
    raise ValueError('illegal value in %d-th argument of internal gesv|posv' % -info)

%timeit [solve(Ai, bi) for el in range(M)]
# 10 loops, best of 3: 30.1 ms per loop
%timeit [solve_opt(Ai, bi) for el in range(M)]
# 100 loops, best of 3: 3.77 ms per loop
which results in a 6.5x speed-up.
If you need even better performance, you would have to port this for loop to Cython and interface the gesv BLAS/LAPACK functions directly in C, as discussed here, or better with the Cython API for BLAS/LAPACK in Scipy 0.16.
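Short of going to Cython, one more pure-python step in the same direction is to let LAPACK work in place when the inputs are scratch data. This is a sketch under that assumption (the copy is only actually avoided if the array is Fortran-ordered), reusing the gesv handle obtained above from get_lapack_funcs:

def solve_inplace(A, B, gesv=gesv):
    # A and B are overwritten by LAPACK, so only pass
    # arrays whose contents you no longer need
    lu, piv, x, info = gesv(A, B, overwrite_a=True, overwrite_b=True)
    if info != 0:
        raise ValueError("gesv failed with info=%d" % info)
    return x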
Edit: as @Eelco Hoogendoorn mentioned, if your A matrix is fixed, there is a much simpler and more efficient approach.
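For completeness, a minimal sketch of that case (my addition, using the names from the benchmark above): with a fixed A you factor it once with scipy.linalg.lu_factor, and every subsequent solve is only a cheap triangular back-substitution via scipy.linalg.lu_solve:

from scipy.linalg import lu_factor, lu_solve

lu, piv = lu_factor(Ai)   # O(N**3) LU factorization, done once

# each solve is now only O(N**2)
xs = [lu_solve((lu, piv), bi) for el in range(M)]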