Introducing unsig Color Clusters

It took quite a lot of time, but we now have a way to enjoy the unsigned_algorithms collection by color! In this post we’ll look at what color clusters are and how to explore them here on unsig.info.

It started with output colors

During the unsigned_algorithms mint, a dataset was created that split all of the output colors in the collection into what we call bins. Even though the collection contains millions of colors, having only 64 bins provided us with an approachable way to discuss and analyze unsigs. This was especially key since unsigned_algorithms was created as a study of color.
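
To make the idea of bins concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the 64 bins come from splitting each RGB channel into 4 equal ranges (4 x 4 x 4 = 64). The actual binning used for the unsigned_algorithms dataset may well differ, and the function names are hypothetical.

```python
from collections import Counter

LEVELS = 4  # assumption: 4 ranges per channel, so 4**3 = 64 bins


def color_to_bin(r, g, b, levels=LEVELS):
    """Map an 8-bit RGB color to a bin index in [0, levels**3)."""
    step = 256 // levels  # width of each channel range (64 when levels=4)
    return (r // step) * levels * levels + (g // step) * levels + (b // step)


def bin_histogram(pixels, levels=LEVELS):
    """Return the share of pixels that falls into each occupied bin."""
    counts = Counter(color_to_bin(r, g, b, levels) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_id: n / total for bin_id, n in counts.items()}


# Hypothetical pixels: two near-identical reds, one blue, one black.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 0)]
hist = bin_histogram(pixels)
print(hist)
```

The two reds land in the same bin, which is the point of binning: millions of distinct colors collapse into a small vocabulary that can be compared across unsigs.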

Here on unsig.info visitors are able to explore the collection by the number of output colors. What has been missing since the collection was minted is a method to cluster unsigs by their shared output color combinations. The community has manually picked out and named certain groupings, like the Sakura collection with its standout green and pink, but there was no technical way to approach unsigs by color combination.

Now is the time!

The role of k-nearest neighbors

Enter the k-nearest neighbors (k-nn) algorithm. If you’re not familiar, k-nn is a staple in the machine learning world: it categorizes a data point by looking at the points it sits closest to in an existing dataset. Alexander Watanabe, the creator of unsigned_algorithms, suggested over a year ago that using k-nn could be the answer to the color combination challenge.
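
For anyone who has not seen k-nn before, the core idea fits in a few lines of Python. This is a generic textbook sketch, not the code behind unsig.info: the three-number vectors stand in for color proportions, and the "warm" and "cool" labels are made up for the example.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_label(query, labeled, k=3):
    """Return the majority label among the k labeled vectors nearest to query."""
    nearest = sorted(labeled, key=lambda item: sq_dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: each vector is a (red, green, blue) share of a piece.
labeled = [
    ((0.8, 0.1, 0.1), "warm"),
    ((0.7, 0.2, 0.1), "warm"),
    ((0.6, 0.3, 0.1), "warm"),
    ((0.1, 0.1, 0.8), "cool"),
    ((0.2, 0.1, 0.7), "cool"),
]
print(knn_label((0.75, 0.15, 0.1), labeled))
```

The choice of distance and of k both shape the result, which is one reason different setups can group the same pieces differently.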

After rebuilding unsig.info last year, my earliest forays into Python centered around experimenting with k-nn for this purpose. It was promising, but the resulting clusters had more variation than I’d hoped. A breakthrough came with the aid of ChatGPT, and a few refinements to my k-nn algorithm brought everything together.

Using the “elbow method”, which is a way to pinpoint the most effective number of clusters for data, 132 surfaced as the magic number. This led to unsigs being meaningfully grouped by color, in line with observations about color combinations made visually by the community. My results are just one approach to categorizing the collection by color though. A different number of clusters or a different solution to the problem could allow us to talk about color combinations in even more ways.
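
The elbow method itself can be sketched in plain Python. A few loud assumptions: the method is most often described alongside k-means, so this example uses a tiny k-means as a stand-in rather than my actual k-nn pipeline; the data is a toy set of three obvious groups, not unsigs; and picking the elbow by the largest second difference is just one simple heuristic (people usually read it off a plot).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_inertia(points, k, iters=20):
    """Run a basic k-means and return the total within-cluster squared distance."""
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def pick_elbow(inertias):
    """Pick the k where the curve bends most (largest second difference)."""
    ks = sorted(inertias)
    return max(ks[1:-1], key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1])


# Toy data: three tight groups of five points each.
offsets = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
blobs = [(0, 0), (10, 0), (0, 10)]
points = [(bx + dx, by + dy) for bx, by in blobs for dx, dy in offsets]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
elbow = pick_elbow(inertias)
print(elbow)
```

On this toy data the error drops sharply up to three clusters and barely improves afterwards, so the bend sits at 3. On real color data the curve is much less clean, which is why the chosen number is a judgment call rather than a fact about the collection.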

Exploring by color cluster

Among these 132 clusters, many unsigs seem to share a color story. But it’s not just about the colors; it’s about their proportions too. Take the Astatke collection as an example. Here, yellow/red/green combos rule, but the shades vary in dominance. Some unsigs might lean heavy on yellow, while others contain small amounts of blue and magenta.

unsig #03526 - Cluster 97
unsig #03527 - Cluster 108
unsig #10426 - Cluster 8
unsig #11079 - Cluster 53
unsig #12309 - Cluster 110
unsig #12725 - Cluster 43
unsig #13205 - Cluster 69

To really capture these differences, we took the top 18 colors from each cluster and examined their proportions. This gave us new visuals that highlighted the color spectra within clusters, helping to distinguish one from the other. You might not be able to spot which cluster your unsig belongs to from the spectrum alone, but if you search for a specific unsig, its Details page now shows its cluster data. (Special thanks to @Mar5man for creating the script to generate these images!)
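
The "top 18 colors and their proportions" step can be sketched as follows. This is not @Mar5man's script. It assumes each unsig is represented as a dictionary of bin ID to pixel count, and it normalizes the proportions within the top colors only; both choices are assumptions made for the example, as are the numbers.

```python
from collections import Counter

TOP_N = 18  # the number of colors examined per cluster in this post


def cluster_spectrum(members, top_n=TOP_N):
    """members: one {bin_id: pixel_count} dict per unsig in the cluster.

    Returns (bin_id, share) pairs for the top_n bins, largest first,
    with shares that sum to 1 across the returned bins.
    """
    totals = Counter()
    for hist in members:
        totals.update(hist)  # adds the counts bin by bin
    top = totals.most_common(top_n)
    top_total = sum(count for _, count in top)
    return [(bin_id, count / top_total) for bin_id, count in top]


# Hypothetical cluster with two members, trimmed to its top 2 colors.
members = [
    {48: 600, 3: 300, 0: 100},
    {48: 500, 3: 100, 12: 350},
]
spectrum = cluster_spectrum(members, top_n=2)
print(spectrum)
```

Each (bin, share) pair then maps naturally to one band in a spectrum image, with the band's width set by the share.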

Cluster 97
Cluster 108
Cluster 8
Cluster 53
Cluster 110
Cluster 43
Cluster 69

As you can see, these unsigs all seem to belong in the same “super cluster,” but some of their neighbors are closer than others.

Unusual cases

Given that some pieces in the collection are monochromatic or even achromatic, it makes sense to ask which clusters they were placed in. The k-nn algorithm slotted these pieces into clusters where they fit very well, even if they could belong to any number of clusters. Rather than handpick exceptions, it seemed better to just let these pieces live in their assigned clusters. A piece is no more or less special because of its cluster; the clustering is simply one way to organize the collection.

Do clusters inform rarity?

We need to be careful when using the color clusters to describe rarity, at least initially. The clusters aren’t any kind of objective measure of the collection, and a different number of clusters could have resulted in pieces being sorted into a larger or smaller cluster.

The clustering is an organizational structure that lets us see that some color combinations, such as the Astatke collection above, are very common. It can also shed light on color combinations like the red/cyan Anaglyph collection, which is split into only two relatively small clusters, suggesting it’s on the rare side. More analysis to come on that in time!
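
If you want to compare cluster sizes yourself, the arithmetic is simple. The sketch below assumes you have a mapping of unsig ID to cluster number; the IDs and cluster numbers here are invented, and, as noted above, a different number of clusters would change every figure it produces.

```python
from collections import Counter


def cluster_sizes(assignments):
    """assignments: {unsig_id: cluster_id}.

    Returns {cluster_id: (count, share_of_collection)}.
    """
    counts = Counter(assignments.values())
    total = len(assignments)
    return {cid: (n, n / total) for cid, n in counts.items()}


# Hypothetical assignments for four pieces.
assignments = {"unsig A": 1, "unsig B": 1, "unsig C": 1, "unsig D": 2}
sizes = cluster_sizes(assignments)
print(sizes)
```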

What's next?

With this milestone behind us, I’m eager to see what the community will bring to the table. Maybe now that it’s easier to isolate color collections, some new names will be on the horizon!