Hui Wang

108. The Representation of Images (3)


In the chapters that have unfolded before us, we embarked on a journey to unravel a question that bridges the realms of perception and technology: How do the "scenes" captured by our eyes transform into the "image data" processed by our gadgets like smartphones and computers? This inquiry led us through a series of explorations in "The Representation of Images (1)" and "The Representation of Images (2)," where we delved into the essence of what an image truly is, the principles behind image formation, and the mathematical descriptions that allow us to capture images in the digital age. Now, we stand at the threshold of further discovery, venturing into the realms of digitalizing images and understanding the nature of digital image data.


The Digitalization of Images


The journey to digitalize an image, akin to translating the physical world into a language understood by machines, is intricate, filled with numerous steps and detailed processes. Imagine a modern digital camera, a marvel of technology that serves as our bridge between the seen and the recorded. Even a simplified overview of its process, a mere snapshot of the intricate dance of capturing light and shadow, involves complexities far beyond what can be contained within a few articles. Hence, our aim is not to exhaustively cover every facet of this process but to sketch a basic understanding, to lay the groundwork for the conceptual framework that underpins the digitalization of images.



As we journey further into the labyrinth of digital transformation, we find parallels between the realms of images and sounds—both seek to capture the essence of our world through the lens of technology. The process, at its core, is about converting analog signals, those continuous whispers of the physical world, into digital signals, the discrete language of computers. This transformation is twofold, encompassing both sampling and quantization.


Imagine you are painting a scene, where each stroke and color choice is guided by your perception. In the digital realm, this artistry is mirrored by the process of creating a digital image. To digitize the vast continuum of visual information, we represent the colors of the pixels within an image through a function f(x, y), where 'x' and 'y' are coordinates on the plane of the image. This function, owing to the continuous nature of our world, could theoretically assume an infinite number of values. However, such continuous functions are beyond the grasp of digital systems, which thrive on precision and discreteness. They cannot process, transmit, or store an infinite spectrum of values without first translating them into a language they understand.


Therefore, we undertake two critical steps to bridge this divide: sampling, which digitizes the coordinates, and quantization, which converts the infinite possibilities of color values into a finite set of digital equivalents. Sampling slices the continuous spatial domain into discrete pixels, while quantization compresses the color spectrum into a manageable palette of digital color values. Through these processes, the fluid beauty of the visual world is distilled into the binary clarity of digital images, ready to be processed, shared, and preserved in the digital domain.
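The two steps can be sketched in a few lines of Python. This is only an illustrative toy, not a camera pipeline: the `scene` function and the grid and level counts are invented stand-ins for a real continuous scene.

```python
import math

def scene(x, y):
    """A stand-in for the continuous scene: brightness in [0, 1]."""
    return 0.5 + 0.5 * math.sin(3 * x) * math.cos(2 * y)

def digitize(f, width, height, levels):
    """Sample f on a width x height grid, then quantize each sample
    to one of `levels` evenly spaced gray values."""
    image = []
    for j in range(height):              # sampling: discrete y coordinates
        row = []
        for i in range(width):           # sampling: discrete x coordinates
            value = f(i / width, j / height)  # continuous amplitude
            q = round(value * (levels - 1))   # quantization: nearest level
            row.append(q)
        image.append(row)
    return image

img = digitize(scene, 8, 8, 256)  # an 8 x 8 image with 256 gray levels
```

Sampling shows up as the two discrete loops over `i` and `j`; quantization is the single `round` call that snaps each continuous amplitude onto a finite ladder of gray values.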



Imagine, if you will, a scene captured within the bounds of an image, where myriad grays paint a picture as complex as life itself. The journey from this analog canvas to a digital masterpiece involves meticulous steps, exemplified in the process of creating a digital image from the continuous amplitude values (grayscale levels) along a segment AB in an image.


The upper right corner of our illustrative example reveals a curve—a one-dimensional function representing the grayscale values along AB, with its variations and the whisper of image noise. To digitize this continuous melody of grays, we sample the function at equal intervals along AB, as shown in the lower left illustration. Here, each sample's spatial location is marked by vertical scales at the image's base, and the samples themselves are denoted by small squares on the function curve. This collection of discrete positions forms our sampled function, a step towards capturing the essence of the scene in digital form.


However, the journey is not complete without quantization. The samples, while discrete in position, still span a continuous range of grayscale values. To craft a digital function, these grayscales must also be discretized. The process is visualized in the diagram, where the grayscale range is segmented into eight discrete intervals, from black to white. Vertical scales assign specific values to these eight gray levels, and each sample is quantized by assigning it the discrete level it lies closest to.


Through the meticulous processes of sampling and quantization, we arrive at the digital samples depicted in the lower right illustration. Executing this procedure row by row, from the top of the image to the bottom, we weave a two-dimensional digital image—a tapestry of digital bits that captures the essence of our original scene.
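The nearest-level assignment along a scan line can be mimicked numerically. The sample values below are made up for illustration; only the eight-level quantization mirrors the example above.

```python
# Hypothetical continuous amplitudes sampled at equal intervals along AB
samples = [0.07, 0.31, 0.52, 0.94, 0.88, 0.46, 0.20, 0.63]

levels = 8                  # eight discrete gray levels, black to white
step = 1.0 / (levels - 1)   # spacing between adjacent levels

# Assign each sample to the nearest of the eight levels
quantized = [round(s / step) for s in samples]

print(quantized)  # level indices from 0 (black) to 7 (white)
```

Repeating this for every row of samples, top to bottom, yields the full two-dimensional array of quantized values described in the text.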


Once we have this digital image, how do we bring it to life and display this digital creation for the world to see?



Our guide in this journey, the function f(x, y), serves as a beacon, illuminating three distinct paths for displaying digital images, each with its own narrative and audience.


  • The first path employs a three-dimensional perspective, where the axes x and y map the spatial coordinates, and a third axis reveals the corresponding grayscale value. This method, akin to sculpting with light and shadow, offers a detailed landscape of grayscale intensities. However, like a dense forest, its complexity can be overwhelming, making it less accessible for interpreting intricate images filled with an abundance of detail.

  • The second path leads us to a more familiar scene, reminiscent of viewing an image on a monitor. Here, the grayscale intensity at each point is directly proportional to the value of f at that point. Imagine a world of simplicity, where grayscale values are normalized to the range [0, 1], and each point on the image is an echo of either pure black, a shade of gray, or pristine white. This method, akin to painting with a limited palette, allows us to quickly grasp the essence of an image, observing its outcomes with clarity and immediacy.

  • The third path unfolds into a grid, a matrix representation where the numerical values of f(x, y) are laid bare. This approach, while less visually intuitive and more cumbersome for print, shines in the realm of numerical analysis and algorithm development. In this matrix, each element, known as a pixel, becomes a unit of measurement, a building block of the digital image.


Among these, the second and third paths stand out for their utility. The second offers a quick and intuitive visualization, allowing us to effortlessly perceive the image's overall composition. The third, though more austere, becomes a powerful tool in the hands of those sculpting algorithms, where each pixel's value is a note in the symphony of digital image processing.
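The second and third paths are easy to relate in code: the matrix of raw values is the third representation, and normalizing it to [0, 1] gives the intensities of the second. The tiny 4 x 4 image below is invented purely for illustration.

```python
# A tiny hypothetical 3-bit grayscale image as a matrix of pixels (path three)
pixels = [
    [0, 1, 2, 3],
    [1, 3, 5, 7],
    [2, 5, 7, 7],
    [3, 7, 7, 6],
]

levels = 8  # 3-bit depth: values 0..7

# Path two: normalize each pixel to [0, 1], where 0 is black and 1 is white
normalized = [[v / (levels - 1) for v in row] for row in pixels]

for row in normalized:
    print(" ".join(f"{v:.2f}" for v in row))
```

The same numbers serve both purposes: an algorithm indexes the matrix directly, while a display maps each normalized value to a brightness on screen.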


As we navigate through the digital landscape, we encounter fundamental attributes that define our journey:

  • image resolution

  • pixel depth


These attributes, like the dimensions of a map, guide us in understanding the scale and detail of our digital creations, marking the boundaries of our exploration in the digital domain.


In the quest to understand the essence of digital images, one term frequently crosses our paths: image resolution. Often, you might hear phrases like "This image has a resolution of 1024 x 1024 pixels." However, this statement only scratches the surface of the true meaning of resolution. Without the context of spatial dimensions within which these pixels are arranged, the statement lacks depth.


Image resolution, or more accurately, spatial resolution, is the measure of the smallest detail that can be discerned within an image. It's a testament to the image's clarity and depth, offering a glimpse into the intricacies hidden within its confines. The measurement of spatial resolution can be approached in several ways, with the most common being the number of line pairs per unit distance and the number of pixels per unit distance.


Let's explore the concept of line pairs per unit distance with an example. Imagine crafting an image solely with alternating black and white vertical lines. If one line measures 0.1 mm in width, within a unit distance (let's say a millimeter), you could fit 5 line pairs, equating to 10 alternating black and white lines. This measure gives us an insight into the level of detail that the image can convey.
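The arithmetic in this example is simple enough to state as a one-line function; the 0.1 mm line width is the figure from the text.

```python
def line_pairs_per_mm(line_width_mm):
    """One line pair = one black line + one white line,
    so a pair spans twice the width of a single line."""
    return 1.0 / (2 * line_width_mm)

print(line_pairs_per_mm(0.1))  # 5.0 pairs per millimeter, i.e. 10 lines
```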


In the realms of printing and publishing, resolution is often discussed in terms of pixels or dots per unit distance. In the United States, this is commonly expressed as dots per inch (dpi). For instance, newspapers might be printed at 75 dpi, offering a level of detail suitable for their medium, while high-quality book pages can be printed at an astounding 2400 dpi, capturing every nuance of the image.


For electronic devices, the resolution is usually measured in pixels per inch (ppi), providing a gauge of the screen's imaging resolution. For example, the iPhone 4 boasted a screen resolution of 326 ppi, while the iPhone 15 Pro Max impresses with a 460 ppi display, showcasing the evolution of display technology and its ability to render images with increasing clarity and detail.
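A device's ppi follows from its pixel dimensions and physical diagonal. The sketch below assumes the nominal iPhone 4 panel of 640 x 960 pixels on a 3.5-inch diagonal; the computed figure comes out near the quoted 326 ppi, with the small gap down to rounding in the nominal dimensions.

```python
import math

def ppi(width_px, height_px, diagonal_inches):
    """Pixels per inch along the screen diagonal."""
    diagonal_px = math.hypot(width_px, height_px)  # pixel count on the diagonal
    return diagonal_px / diagonal_inches

# Nominal iPhone 4 panel: 640 x 960 pixels on a 3.5-inch diagonal
print(round(ppi(640, 960, 3.5)))  # ~330, close to the quoted 326 ppi
```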


To visualize the impact of different spatial resolutions, consider the following imagery.



As we delve deeper into the canvas of digital imaging, another crucial element emerges, painting the picture of how images come to life: pixel depth.


Pixel depth, the number of bits used to represent each pixel, determines how many color levels an image can display; it is the artist's palette in the digital world. For grayscale images, it dictates the number of possible gray levels, which is a power of 2: a depth of n bits yields 2^n levels. For instance, a pixel depth of 24 bits unlocks a palette of 16,777,216 (2^24) different colors, each pixel a drop in the vast ocean of visual possibilities.
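The relationship between bits and levels is a single power of two, which a trivial function makes explicit:

```python
def color_count(bits):
    """Number of distinct values a pixel of the given bit depth can take."""
    return 2 ** bits

print(color_count(8))   # 256 gray levels for 8-bit grayscale
print(color_count(24))  # 16,777,216 colors for 24-bit RGB
```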


Increasing pixel depth enriches the image, allowing for a more nuanced and natural portrayal of the scene. Each pixel can display more color variations, adding depth and life to the image. However, this richness comes with a caveat. The human eye, with its finite resolution, can only perceive up to 10 million colors. Thus, a 24-bit depth, offering just over 16 million colors, is more than sufficient to satisfy our visual needs, painting a picture as rich and detailed as the world around us.


Consider the effect of varying levels of gray in a grayscale image. At lower levels, the image might appear stark, its contrasts bold but lacking subtlety. As we increase the gray levels, the image gains depth, its shadows and highlights blending smoothly to reveal textures and details previously lost in translation. This gradient of grays, from the darkest black to the brightest white, illustrates not just the technical prowess of digital imaging but also the aesthetic beauty that can be achieved through the careful manipulation of pixel depth.
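One way to see this effect numerically is to requantize an 8-bit gray value down to fewer levels, as a display with a shallower pixel depth would. This is a minimal sketch; the sample value 180 is arbitrary.

```python
def requantize(value, levels):
    """Map an 8-bit gray value (0-255) onto `levels` evenly spaced
    gray levels, returning the result back on the 0-255 scale."""
    step = 255 / (levels - 1)
    return round(round(value / step) * step)

# The same gray pixel rendered at decreasing pixel depths
for k in (256, 16, 4, 2):
    print(k, requantize(180, k))
```

As the level count shrinks, neighboring values collapse onto the same output gray, which is exactly the loss of subtlety the paragraph above describes.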



The Digital Image Data


As we navigate the digital night sky, a question twinkles brightly: What exactly is digital image data? This query leads us into the heart of our digital devices, where images, once captured by our eyes, are transformed and stored.


In the vast universe of audio-visual development, two types of image data frequently traverse our screens and networks: RGB and YCbCr. These color models, which we've ventured through in previous discussions, serve distinct yet complementary roles in the digital realm. RGB, with its direct correlation to the way screens display colors through red, green, and blue pixels, is the beacon for on-screen visuals. YCbCr, on the other hand, offers a canvas for image processing, encoding, and transmission, prized for its efficient compression capabilities. The alchemy of converting between these color models hinges on specific standards, sampling formats, and the transformation matrices we've previously explored, a dance of numbers and equations that we shall not repeat here.
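For a taste of what such a conversion looks like in practice, here is one common variant, the full-range BT.601 matrix used by JPEG, as a sketch; which matrix applies in a real pipeline depends on the standard in force.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion, the variant used by JPEG.
    Inputs and outputs are 8-bit values (0-255)."""
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(y), clamp(cb), clamp(cr)

print(rgb_to_ycbcr(255, 255, 255))  # white -> (255, 128, 128)
print(rgb_to_ycbcr(128, 128, 128))  # mid gray -> (128, 128, 128)
```

Note how any neutral gray leaves both chroma channels at the midpoint 128, which is what makes YCbCr so amenable to chroma subsampling and compression.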


As we delve deeper into the fabric of digital imagery, we encounter the vessels that carry these visual treasures: the formats of PNG, JPEG, GIF, and others. These are not mere names but encodings, methods of compressing and storing the vast arrays of image data into manageable, transferable files. To bring these images to life, to display them on our screens and devices, we embark on a journey of decoding, unraveling the compressed data back into visible light and color.


In this realm, we tread lightly, for the intricacies of these formats and the processes of encoding and decoding are vast, filled with technicalities and standards. Here, in our exploration, we acknowledge their presence, their importance in the digital tapestry, without diving into the depths of their secrets. After all, our journey is one of understanding and appreciation, of seeing the digital image not just as data, but as a window into the world around us, captured, transformed, and preserved in the digital night sky.


 
