How to Convert a Picture to Numbers

The experience of seeing the world around us is all but impossible to capture in a handful of words, from the careful steps of a marching ant, to the works of Pablo Picasso and Beatrix Potter, to a solitary oak tree, twisted and dignified. It’s ridiculous to think that we could ever reduce it all to ones and zeros. Except that we have. In fact, our images are so realistic now that we go to pains to re-introduce artifacts, like the washed out colors of Polaroids or the scratches on celluloid.

This may be a blow to romanticism, but it’s great luck for machine learning practitioners. Reducing images to numbers makes them amenable to computation.

Color perception

Color fascinates me because it is less about physics than it is about the physiology and psychology of human perception. All our standards are determined by what humans perceive. The range that needs to be covered, the number of channels necessary to represent a color, the resolution with which a color must be specified, and hence the information density and storage requirements, all depend on human retinas and visual cortices.

It also means that, as with everything human, there is a great deal of variability. There are deficiencies like color blindness (I myself experience deuteranomaly, a type of red-green colorblindness) and there are those with unusual abilities, like tetrachromats, who have not three types of color receptors, but four, and can distinguish colors that the rest of us can’t. Because of this, keep in mind that all of the statements we can make about perception are generalizations only, and there will be individual differences.

Although photons vibrate at all frequencies, we have three distinct types of color-sensing cones, each with its characteristic frequency response, a particular color that it responds strongly to. That means that a combination of just three light sources with carefully chosen colors in carefully chosen intensities can make us experience any color that we’re capable of seeing in the natural world.

Making color

In computer screens, this is done with a red, a green, and a blue light, often light-emitting diodes (LEDs).

In practice the red, green, and blue LEDs in a computer screen can’t represent all the colors we can see. To make a colored LED, a chemical is introduced that fluoresces at about the right color. These are close to ideal red, blue, and green but they aren’t perfect. For this reason there is a bit of a gap between the range of colors that you can see in the real world (the gamut) and what you can see on a computer screen.

As a side note, lasers are capable of producing color much closer to the ideal. Commercially available laser projection systems cover much more of the human perceivable gamut, and laser micro arrays for computer screens are a current topic of research and development.

Turning color into numbers

Each pixel in a screen is a triplet of a red, a green, and a blue light source, but when you look at them from far enough away they are too small for your eye to distinguish, and they look like a single small patch of color. One way to determine which color is produced is to specify the intensity levels of each of the light sources. Since the just noticeable difference (JND) in human perception of color intensity tends to stay in the neighborhood of one part in a hundred, using 256 discrete levels gives enough fine-grained control that color gradients look smooth.

256 intensity levels can be represented with 8 bits or 1 byte. It can also be represented with two hexadecimal numbers, between 0x00 for zero brightness and 0xff for maximum brightness. Specifying the intensity of three colors takes triple that: 6 hexidecimal numbers (24 bits or 3 bytes). The hex representation gives a concise way to call out a red-green-blue color. The first two digits show the red level, the second two correspond to the green level, and the third pair correponds to the blue level. Here are a few extremes.

There are a lot more useful color hex codes here. For convenience and code readability, colors can also be represented as triples of decimals, as in (255, 255, 255) for white or (0, 255, 0) for green.