Mostly empty numeric data appears in graph adjacency tables, one-hot feature rows, finite-element assemblies, and text or recommender matrices. SciPy sparse arrays store only the coordinates and values that matter, so a small coordinate list can represent a much larger two-dimensional array.
coo_array() takes three aligned arrays, passed as the tuple (data, (row, col)): the stored values, their row coordinates, and their column coordinates. The coordinates do not need to be sorted. Without an explicit shape, SciPy infers the smallest shape that fits the largest row and column index, so passing shape keeps any empty trailing rows or columns.
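A minimal sketch of both points, assuming a SciPy version that provides coo_array (1.8 or later); the three entries are made up for illustration. It builds the same entries in two coordinate orders, shows the inferred shape, and then pads the shape explicitly.

```python
import numpy as np
from scipy.sparse import coo_array

# Three illustrative entries, listed in an unsorted coordinate order.
data = np.array([5.0, 7.0, 9.0])
row = np.array([1, 0, 2])
col = np.array([3, 0, 1])

unsorted = coo_array((data, (row, col)))

# Reorder all three aligned arrays together so each (row, col, value) triple stays intact.
order = np.lexsort((col, row))  # sort by row, then by column
sorted_coo = coo_array((data[order], (row[order], col[order])))

print("inferred shape:", unsorted.shape)  # (max row + 1, max col + 1) = (3, 4)
same_values = np.array_equal(unsorted.toarray(), sorted_coo.toarray())
print("same values:", same_values)

# An explicit shape keeps an extra empty trailing row and column.
padded = coo_array((data, (row, col)), shape=(4, 5))
print("explicit shape:", padded.shape)
```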
Use COO while collecting entries, then convert to CSR or CSC before arithmetic or matrix-vector work. Calling toarray() is a readable check on a tiny test array, but it allocates every zero, so large sparse arrays should stay sparse except for brief inspections.
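The collect-then-convert pattern can be sketched as follows. The edge list is a hypothetical 4-node weighted graph; entries are appended to plain Python lists, assembled once as COO, and then converted to CSR for row totals (outgoing weight) and CSC for column totals (incoming weight).

```python
import numpy as np
from scipy.sparse import coo_array

# Hypothetical directed edges: (source, target, weight).
edges = [(0, 1, 1.0), (2, 3, 0.5), (0, 2, 2.0), (3, 0, 1.5)]

# Collect coordinates and values in plain lists while reading the input.
rows, cols, vals = [], [], []
for src, dst, w in edges:
    rows.append(src)
    cols.append(dst)
    vals.append(w)

adj = coo_array((vals, (rows, cols)), shape=(4, 4))

# Convert once assembly is finished.
adj_csr = adj.tocsr()  # row-oriented format
adj_csc = adj.tocsc()  # column-oriented format

out_weight = np.asarray(adj_csr.sum(axis=1)).ravel()  # total weight leaving each node
in_weight = np.asarray(adj_csc.sum(axis=0)).ravel()   # total weight entering each node
print("out:", out_weight)
print("in: ", in_weight)
```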
```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

coo = coo_array((data, (row, col)), shape=(3, 4))
csr = coo.tocsr()
weights = np.array([1.0, 2.0, 0.5, 0.0])

print("coo format:", coo.format)
print("coo shape:", coo.shape)
print("coo stored values:", coo.nnz)
print("csr format:", csr.format)
print("csr stored values:", csr.nnz)
print("dense rows:")
for dense_row in csr.toarray():
    print(dense_row)
print("row totals:", np.asarray(csr.sum(axis=1)).ravel())
print("matrix-vector product:", csr @ weights)
```
COO stores each entry as a (row, column, value) triple, which makes it cheap to add entries while values are being assembled.
```console
$ python3 sparse_array_create.py
coo format: coo
coo shape: (3, 4)
coo stored values: 5
csr format: csr
csr stored values: 4
dense rows:
[10.  0.  3.  0.]
[0. 0. 8. 0.]
[10.  0.  0.  0.]
row totals: [13.  8. 10.]
matrix-vector product: [11.5  4.  10. ]
```
```python
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)
```
The final two values both target coordinate (2, 0). COO can hold those duplicate entries before conversion.
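Duplicates can also be merged without leaving COO. A minimal sketch using the same three arrays: sum_duplicates() adds repeated coordinates together in place, so the stored count drops while the dense values stay the same.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

coo = coo_array((data, (row, col)), shape=(3, 4))
nnz_before = coo.nnz  # 5 stored entries, two of them at (2, 0)

# Merge repeated coordinates in place by adding their values.
coo.sum_duplicates()
nnz_after = coo.nnz   # 4 stored entries

print("stored before:", nnz_before)
print("stored after: ", nnz_after)
print("value at (2, 0):", coo.toarray()[2, 0])  # tiny array, so densifying is fine here
```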
```python
coo = coo_array((data, (row, col)), shape=(3, 4))
```
The fourth column has no stored values. Passing shape=(3, 4) keeps that empty column in the result.
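To see why the explicit shape matters for the later matrix-vector product, here is a sketch that leaves shape out. SciPy then infers (3, 3) from the largest indices, and multiplying by the length-4 weights vector raises a ValueError for the dimension mismatch.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)
weights = np.array([1.0, 2.0, 0.5, 0.0])

# No shape=: inferred as (max row + 1, max col + 1) = (3, 3).
inferred = coo_array((data, (row, col))).tocsr()
print("inferred shape:", inferred.shape)

failed = False
try:
    inferred @ weights  # 3 columns against a length-4 vector
except ValueError as exc:
    failed = True
    print("matrix-vector product failed:", exc)
```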
```python
csr = coo.tocsr()
```
Converting to CSR sums the duplicate (2, 0) entries (4 + 6 = 10), so nnz drops from 5 stored coordinate entries to 4 stored positions.
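The merged result is visible in the three arrays CSR keeps internally. In this sketch, indptr[i]:indptr[i + 1] selects the stored entries of row i, indices holds their column numbers, and data holds their values; row 2 ends up with a single stored value of 10.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)
csr = coo_array((data, (row, col)), shape=(3, 4)).tocsr()

# Row i's stored entries live at positions indptr[i] up to (not including) indptr[i + 1].
print("indptr: ", csr.indptr)
print("indices:", csr.indices)
print("data:   ", csr.data)

row2 = slice(csr.indptr[2], csr.indptr[3])
print("row 2 columns:", csr.indices[row2], "values:", csr.data[row2])
```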
```python
for dense_row in csr.toarray():
    print(dense_row)
```
toarray() materializes every zero. Use it for tiny checks, not for large sparse datasets.
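For larger arrays, stored entries can be inspected through a COO view instead of densifying. This sketch uses a hypothetical 100,000 × 100,000 diagonal array and compares its CSR storage with the bytes a dense float64 copy from toarray() would need, computed without allocating that copy.

```python
import numpy as np
from scipy.sparse import coo_array

# Hypothetical large diagonal array: 100_000 stored values out of 10**10 positions.
n = 100_000
idx = np.arange(n)
big = coo_array((np.ones(n), (idx, idx)), shape=(n, n)).tocsr()

# Print a few stored (row, col, value) entries without materializing zeros.
view = big.tocoo()
for r, c, v in zip(view.row[:3], view.col[:3], view.data[:3]):
    print(f"({r}, {c}) = {v}")

# CSR storage versus what a dense copy would allocate.
sparse_bytes = big.data.nbytes + big.indices.nbytes + big.indptr.nbytes
dense_bytes = n * n * big.dtype.itemsize
print(f"CSR: {sparse_bytes / 1e6:.1f} MB, dense: {dense_bytes / 1e9:.0f} GB")
```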
```python
weights = np.array([1.0, 2.0, 0.5, 0.0])
print(csr @ weights)
```
The matrix-vector product works on the CSR sparse array directly and returns [11.5  4.  10. ] for this input: row 0 gives 10 × 1.0 + 3 × 0.5 = 11.5, row 1 gives 8 × 0.5 = 4, and row 2 gives 10 × 1.0 = 10.
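How CSR makes this product cheap can be sketched in pure Python. The helper csr_matvec below is hypothetical and only illustrative (SciPy's own implementation is compiled code); it visits each row's stored entries through indptr and indices and never touches the zeros, and its result matches csr @ weights.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)
csr = coo_array((data, (row, col)), shape=(3, 4)).tocsr()
weights = np.array([1.0, 2.0, 0.5, 0.0])

def csr_matvec(indptr, indices, values, x):
    """Hypothetical, illustrative CSR matrix-vector product."""
    n_rows = len(indptr) - 1
    out = np.zeros(n_rows, dtype=np.result_type(values, x))
    for i in range(n_rows):
        start, stop = indptr[i], indptr[i + 1]
        # Only row i's stored entries contribute; zero positions are skipped.
        out[i] = np.dot(values[start:stop], x[indices[start:stop]])
    return out

manual = csr_matvec(csr.indptr, csr.indices, csr.data, weights)
print("manual:", manual)
print("matches csr @ weights:", np.allclose(manual, csr @ weights))
```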
```console
$ rm sparse_array_create.py
```