Categorical arrays often contain repeated labels after logs, survey answers, or model outputs are loaded into NumPy. Finding the distinct entries shows which labels are present, while counts show whether one category dominates the sample.
np.unique() returns the unique array in sorted order by default, and optional outputs can provide counts, first input positions, and inverse positions. Those extra arrays stay aligned with the returned unique values, so a label and its count use the same index.
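Because the unique values and their counts share an index, they can be paired directly. The sketch below uses hypothetical sample labels (not the data from this article's script) to build a label-to-count mapping and to find the label that dominates the sample.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
labels = np.array(["ok", "error", "ok", "ok", "timeout", "ok"])

values, counts = np.unique(labels, return_counts=True)

# values[i] and counts[i] describe the same label, so they can be zipped.
count_by_label = {str(v): int(c) for v, c in zip(values, counts)}

# The most frequent label sits at the position of the largest count.
top = int(np.argmax(counts))
dominant_label = str(values[top])
dominant_share = float(counts[top] / counts.sum())

print(count_by_label)
print(dominant_label, round(dominant_share, 3))
```

If several labels tie for the largest count, np.argmax() returns the first of them, which here means the one that sorts first.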
If reports need the original first-seen order, take the argsort of the returned first-position indexes and apply that ordering to both the unique values and the counts. The inverse output is useful for checking an encoding, because indexing the unique array with the inverse rebuilds the original labels.
import numpy as np

labels = np.array(["api", "web", "api", "batch", "web", "api"])

unique, first_index, inverse, counts = np.unique(
    labels,
    return_index=True,
    return_inverse=True,
    return_counts=True,
)

first_seen_order = np.argsort(first_index)

print("input:", labels)
print("unique sorted:", unique)
print("counts:", counts)
print("first positions:", first_index)
print("first-seen unique:", unique[first_seen_order])
print("first-seen counts:", counts[first_seen_order])
print("inverse:", inverse)
print("reconstructed:", unique[inverse])
print("matches input:", np.array_equal(unique[inverse], labels))
return_index=True stores the first input position for each returned unique value. return_inverse=True stores indexes that can reconstruct the original array from the unique array.
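One way to read the inverse output is as a set of integer codes, one per input element, that point into the sorted unique array. The sketch below, using hypothetical labels, treats the inverse that way and cross-checks it: counting the codes with np.bincount() should give the same numbers as return_counts=True.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
labels = np.array(["web", "api", "web", "batch", "api", "web"])

values, inverse = np.unique(labels, return_inverse=True)

# For a 1-D input the inverse is already 1-D; ravel() is only a safeguard.
codes = np.asarray(inverse).ravel()

# Counting how often each code occurs reproduces the per-label counts.
counts_from_codes = np.bincount(codes, minlength=len(values))
_, counts = np.unique(labels, return_counts=True)

print("codes:", codes)
print("counts from codes:", counts_from_codes)
print("same as return_counts:", np.array_equal(counts_from_codes, counts))
```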
$ python unique-values-find.py
input: ['api' 'web' 'api' 'batch' 'web' 'api']
unique sorted: ['api' 'batch' 'web']
counts: [3 1 2]
first positions: [0 3 1]
first-seen unique: ['api' 'web' 'batch']
first-seen counts: [3 2 1]
inverse: [0 2 0 1 2 0]
reconstructed: ['api' 'web' 'api' 'batch' 'web' 'api']
matches input: True
np.unique() sorts the returned unique values by default. The first-seen arrays reorder the same labels and counts by their first position in the input.
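When the first-seen ordering is needed in more than one place, the reordering step can be wrapped in a small helper. The function name unique_first_seen below is hypothetical, not a NumPy function, and the sketch assumes a 1-D input; it simply combines return_index=True with np.argsort() as described above.

```python
import numpy as np


def unique_first_seen(values):
    """Return (unique values, counts) ordered by first appearance.

    Hypothetical helper for illustration; assumes a 1-D input.
    """
    arr = np.asarray(values)
    unique, first_index, counts = np.unique(
        arr, return_index=True, return_counts=True
    )
    # Sorting the first positions gives the order in which labels appeared.
    order = np.argsort(first_index)
    return unique[order], counts[order]


# Hypothetical sample data in which the sorted and first-seen orders differ.
seen_labels, seen_counts = unique_first_seen(
    ["web", "api", "web", "batch", "api", "web"]
)
print(seen_labels)
print(seen_counts)
```

Each first position is distinct, so the argsort has no ties to break and the resulting order is unambiguous.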