Mostly empty numeric data appears in graph adjacency tables, one-hot feature rows, finite-element assemblies, and text or recommender matrices. SciPy sparse arrays store only the coordinates and values that matter, so a small coordinate list can represent a much larger two-dimensional array.
coo_array() accepts a (data, (row, col)) tuple of three aligned arrays: stored values, row coordinates, and column coordinates. The coordinates do not need to be sorted, and an explicit shape keeps empty trailing rows or columns instead of letting SciPy infer the smallest shape that fits the largest coordinates.
Use COO while collecting entries, then convert to CSR or CSC before arithmetic or matrix-vector work. A small toarray() check is readable for a tiny test array, but large sparse arrays should stay sparse outside short inspections.
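The paragraph above names CSC as the other conversion target. The sketch below is a minimal illustration of that path, assuming a small set of made-up sample values; it converts a COO array to CSC and takes column totals, which is the kind of column-oriented work CSC suits.

```python
import numpy as np
from scipy.sparse import coo_array

# Illustrative sample values (assumed for this sketch).
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

# Collect entries in COO, then convert to CSC for column-oriented work.
coo = coo_array((data, (row, col)), shape=(3, 4))
csc = coo.tocsc()

# np.asarray(...).ravel() flattens the result to a plain 1-D array.
column_totals = np.asarray(csc.sum(axis=0)).ravel()

print("csc format:", csc.format)
print("column totals:", column_totals)
```

Columns 1 and 3 hold no stored values, so their totals are zero.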
Steps to create a SciPy sparse array:
- Create a Python script named sparse_array_create.py.
sparse_array_create.py
import numpy as np
from scipy.sparse import coo_array

# One row index and one column index for each stored value.
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

# Build in COO, then convert to CSR before arithmetic.
coo = coo_array((data, (row, col)), shape=(3, 4))
csr = coo.tocsr()
weights = np.array([1.0, 2.0, 0.5, 0.0])

print("coo format:", coo.format)
print("coo shape:", coo.shape)
print("coo stored values:", coo.nnz)
print("csr format:", csr.format)
print("csr stored values:", csr.nnz)

# Dense output is only for this tiny correctness check.
print("dense rows:")
for dense_row in csr.toarray():
    print(dense_row)
print("row totals:", np.asarray(csr.sum(axis=1)).ravel())
print("matrix-vector product:", csr @ weights)
COO stores coordinate triples efficiently while values are being assembled.
- Run the script.
$ python3 sparse_array_create.py
coo format: coo
coo shape: (3, 4)
coo stored values: 5
csr format: csr
csr stored values: 4
dense rows:
[10.  0.  3.  0.]
[0. 0. 8. 0.]
[10.  0.  0.  0.]
row totals: [13.  8. 10.]
matrix-vector product: [11.5  4.  10. ]
- Use one row coordinate and one column coordinate for each stored value.
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)
The final two values both target coordinate (2, 0). COO can hold those duplicate entries before conversion.
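Because each value carries its own coordinate pair, the order of the triples should not affect the result. The sketch below checks that on the same data, assuming an arbitrary permutation (the `order` array is made up for this illustration) applied identically to all three arrays.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

# Hypothetical permutation; any reordering works as long as the
# three arrays are shuffled together and stay aligned.
order = np.array([4, 2, 0, 3, 1])

original = coo_array((data, (row, col)), shape=(3, 4))
shuffled = coo_array((data[order], (row[order], col[order])), shape=(3, 4))

# Tiny arrays, so a dense comparison is acceptable here.
same = np.array_equal(original.toarray(), shuffled.toarray())
print("same dense result:", same)
```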
- Set the sparse array shape explicitly.
coo = coo_array((data, (row, col)), shape=(3, 4))
The fourth column has no stored values. Passing shape=(3, 4) keeps that empty column in the result.
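To see what the explicit shape changes, the following sketch builds the same entries twice, once with `shape` omitted and once with `shape=(3, 4)`. The variable names `inferred` and `explicit` are chosen for this illustration only.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

# Without shape, SciPy sizes the array from the largest coordinates.
inferred = coo_array((data, (row, col)))

# With shape, the empty fourth column is preserved.
explicit = coo_array((data, (row, col)), shape=(3, 4))

print("inferred shape:", inferred.shape)  # (3, 3)
print("explicit shape:", explicit.shape)  # (3, 4)
```

The inferred array would reject a four-element weight vector in a matrix-vector product, which is one practical reason to state the shape up front.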
- Convert the coordinate array to CSR before row-oriented work.
csr = coo.tocsr()
Converting to CSR sums the duplicate (2, 0) entries, so nnz changes from 5 stored coordinate entries to 4 stored positions.
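The count change can be checked directly. The sketch below is one way to do it: it compares `nnz` before and after conversion, and also shows `sum_duplicates()`, which merges duplicates in place on a COO array. It works on a copy here so the original COO entries stay available for comparison.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)

coo = coo_array((data, (row, col)), shape=(3, 4))
csr = coo.tocsr()

# The conversion returns a new array; the COO source keeps its 5 entries.
print("coo stored values:", coo.nnz)
print("csr stored values:", csr.nnz)

# sum_duplicates() modifies the array in place, so work on a copy.
deduped = coo.copy()
deduped.sum_duplicates()
print("deduplicated coo stored values:", deduped.nnz)
```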
- Use dense output only for a small correctness check.
for dense_row in csr.toarray():
    print(dense_row)
toarray() materializes every zero. Use it for tiny checks, not for large sparse datasets.
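A rough size estimate shows why. The sketch below assumes a hypothetical 100,000 by 100,000 array with only three stored values, and compares the bytes a dense copy would need against the bytes held by the three CSR buffers. It never calls toarray(); the dense figure is computed from the shape alone.

```python
import numpy as np
from scipy.sparse import coo_array

# Hypothetical large shape with very few stored values.
shape = (100_000, 100_000)
row = np.array([0, 500, 99_999])
col = np.array([0, 70_000, 99_999])
data = np.array([1.0, 2.0, 3.0])

csr = coo_array((data, (row, col)), shape=shape).tocsr()

# A dense copy needs one item per cell, zeros included.
dense_bytes = shape[0] * shape[1] * csr.dtype.itemsize

# CSR keeps the values, their column indices, and one row pointer array.
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print("dense bytes needed:", dense_bytes)
print("csr bytes used:", sparse_bytes)
```

The dense figure is 80 GB for float64 values, while the CSR buffers stay well under a megabyte. The exact CSR size depends on the index dtype SciPy selects.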
- Test one sparse operation against the created array.
weights = np.array([1.0, 2.0, 0.5, 0.0])
print(csr @ weights)
The matrix-vector product runs on the CSR sparse array directly, without a dense copy, and returns [11.5, 4.0, 10.0] for this input.
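For a tiny array like this one, the sparse result can be cross-checked against the dense equivalent. The sketch below is one possible check, using `np.allclose` to compare the two products; the names `sparse_result` and `dense_result` are illustrative.

```python
import numpy as np
from scipy.sparse import coo_array

row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 0])
data = np.array([10, 3, 8, 4, 6], dtype=float)
weights = np.array([1.0, 2.0, 0.5, 0.0])

csr = coo_array((data, (row, col)), shape=(3, 4)).tocsr()

# Sparse product, computed without materializing zeros.
sparse_result = csr @ weights

# Dense reference, acceptable only because the array is tiny.
dense_result = csr.toarray() @ weights

print("results match:", np.allclose(sparse_result, dense_result))
```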
- Remove the demo script if it was created only for this check.
$ rm sparse_array_create.py
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.