How to find unique values in a NumPy array

Categorical arrays often contain repeated labels after logs, survey answers, or model outputs are loaded into NumPy. Finding the distinct entries shows which labels are present, while counts show whether one category dominates the sample.

np.unique() returns the distinct values in sorted order by default, and optional outputs provide counts, the position of each value's first occurrence, and inverse indexes. The counts and first positions stay aligned with the returned unique values, so a label and its count share the same index. The inverse output instead has one entry per input element.
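Because unique values and counts share an index, they can be paired directly. The following is a minimal sketch with hypothetical sample labels (not taken from any real dataset) that builds a label-to-count mapping and picks the most frequent label.

```python
import numpy as np

# Hypothetical sample labels, chosen only for illustration.
labels = np.array(["web", "api", "web", "batch", "web"])

unique, counts = np.unique(labels, return_counts=True)

# unique[i] and counts[i] describe the same label, so zip pairs them safely.
label_counts = {str(label): int(count) for label, count in zip(unique, counts)}
print(label_counts)

# The most frequent label sits at the index where counts is largest.
most_common = str(unique[np.argmax(counts)])
print(most_common)
```

Converting to str and int keeps the mapping free of NumPy scalar types, which can be convenient when the result is serialized or printed in a report.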

If a report needs the original first-seen order, apply np.argsort() to the returned first-position indexes and use that order to index both the unique values and the counts. The inverse output helps verify an encoding: indexing the unique array with the inverse rebuilds the original labels.
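As a small illustration of the inverse output, the sketch below treats the inverse indexes as integer codes for a hypothetical label array. It checks that the codes rebuild the input and that tallying the codes with np.bincount() gives one count per unique value.

```python
import numpy as np

# Hypothetical labels used only to illustrate the inverse output.
labels = np.array(["web", "api", "web", "batch"])

unique, inverse = np.unique(labels, return_inverse=True)

# inverse holds one integer code per input element; each code is an
# index into the sorted unique array.
rebuilt = unique[inverse]
round_trip_ok = np.array_equal(rebuilt, labels)
print("round trip ok:", round_trip_ok)

# Tallying the codes yields one count per unique value.
code_counts = np.bincount(inverse, minlength=len(unique))
print("code counts:", code_counts)
```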

Steps to find unique values with NumPy:

  1. Create a script that asks np.unique() for the unique labels, first positions, inverse indexes, and counts.
    unique-values-find.py
    import numpy as np
     
    labels = np.array(["api", "web", "api", "batch", "web", "api"])
     
    unique, first_index, inverse, counts = np.unique(
        labels,
        return_index=True,
        return_inverse=True,
        return_counts=True,
    )
    # Sorting the first positions gives the order in which labels first appear.
    first_seen_order = np.argsort(first_index)
     
    print("input:", labels)
    print("unique sorted:", unique)
    print("counts:", counts)
    print("first positions:", first_index)
    print("first-seen unique:", unique[first_seen_order])
    print("first-seen counts:", counts[first_seen_order])
    print("inverse:", inverse)
    print("reconstructed:", unique[inverse])
    print("matches input:", np.array_equal(unique[inverse], labels))

    return_index=True returns the position of the first occurrence of each unique value in the input. return_inverse=True returns, for each input element, the index of its value in the unique array, so unique[inverse] reconstructs the original array.

  2. Run the script to confirm that the inverse indexes reconstruct the original labels.
    $ python unique-values-find.py
    input: ['api' 'web' 'api' 'batch' 'web' 'api']
    unique sorted: ['api' 'batch' 'web']
    counts: [3 1 2]
    first positions: [0 3 1]
    first-seen unique: ['api' 'web' 'batch']
    first-seen counts: [3 2 1]
    inverse: [0 2 0 1 2 0]
    reconstructed: ['api' 'web' 'api' 'batch' 'web' 'api']
    matches input: True

    np.unique() sorts the returned unique values by default. The first-seen arrays reorder the same labels and counts by their first position in the input.
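The first-seen reordering from the steps above can be wrapped in a small helper for reuse. This is a sketch only: the function name unique_first_seen and the sample labels are hypothetical, not part of NumPy.

```python
import numpy as np


def unique_first_seen(values):
    """Return unique values and their counts ordered by first appearance.

    Hypothetical helper for illustration; it is not a NumPy function.
    """
    values = np.asarray(values)
    unique, first_index, counts = np.unique(
        values, return_index=True, return_counts=True
    )
    # argsort of the first positions reorders the sorted output
    # into the order in which each value first appears in the input.
    order = np.argsort(first_index)
    return unique[order], counts[order]


ordered_labels, ordered_counts = unique_first_seen(
    ["web", "api", "web", "batch", "api", "web"]
)
print(ordered_labels)
print(ordered_counts)
```

Applying the same order array to both outputs keeps each label aligned with its count after the reordering.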