Fun With Filters and Frequencies!

Derry Xu

CS 180, Fall 2024

1.1: Finite Difference Operator

In this section of the project, we use finite difference filters to approximate the partial derivatives in the $x$ and $y$ directions. More specifically:

$$ D_x = \begin{bmatrix}1 & -1\end{bmatrix}, D_y = \begin{bmatrix}1 \\ -1\end{bmatrix} $$

We use $D_x$ and $D_y$ as filters to convolve with our image, which picks up large changes in pixel values in the horizontal and vertical directions. To get the edges, we take the absolute value of the partials (since both negative and positive changes indicate edges) and binarize them using a threshold.
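In outline, this step looks roughly like the following (assuming a grayscale image loaded as a float array in $[0, 1]$; the helper name is illustrative):

```python
import numpy as np
from scipy.signal import convolve2d

D_x = np.array([[1, -1]])    # horizontal finite difference
D_y = np.array([[1], [-1]])  # vertical finite difference

def binarized_partials(im, threshold=0.25):
    """Convolve with D_x and D_y, then binarize the absolute partials."""
    partial_x = convolve2d(im, D_x, mode="same", boundary="symm")
    partial_y = convolve2d(im, D_y, mode="same", boundary="symm")
    edges_x = (np.abs(partial_x) > threshold).astype(float)
    edges_y = (np.abs(partial_y) > threshold).astype(float)
    return edges_x, edges_y
```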

To test these filters, we convolve them with an image of a cameraman, and get the following gradients:

[Figure: original cameraman image]
[Figure: horizontal edges, i.e. convolved with $D_y$, threshold = 0.25]
[Figure: vertical edges, i.e. convolved with $D_x$, threshold = 0.25]

To create edges that aren't just horizontal or vertical, we can compute the gradient magnitude:

$$\|\nabla f\| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$$

Essentially, at each pixel we take the square root of the sum of the squared $x$ and $y$ partial derivatives, which in our case are approximated by our convolutions. Using the above operation, our previously found partial derivative approximations, and a new threshold, we can try to find the complete edges of the picture:
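Building on the snippet above, the gradient magnitude and its binarization look roughly like:

```python
def gradient_magnitude(im, threshold=0.3):
    """Gradient magnitude from the finite difference approximations."""
    partial_x = convolve2d(im, D_x, mode="same", boundary="symm")
    partial_y = convolve2d(im, D_y, mode="same", boundary="symm")
    grad_mag = np.sqrt(partial_x**2 + partial_y**2)
    return grad_mag, (grad_mag > threshold).astype(float)
```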

[Figure: gradient magnitude before thresholding]
[Figure: gradient magnitude binarized, threshold = 0.3]

1.2: Derivative of Gaussian Filter

To make our edges more defined and eliminate noise, we use a Gaussian low pass filter to remove the high frequencies and blur the image before applying the gradient magnitude to detect edges. The main difference ends up being the strength of the lines around the cameraman, along with some minimal elimination of noise, though some unwanted edges remain.
[Figure: Gaussian-filtered cameraman before gradient magnitude; filter size = 5x5, $\sigma = 1$]
[Figure: Gaussian filter + gradient magnitude, before thresholding]
[Figure: Gaussian filter + gradient magnitude, after thresholding; threshold = 0.1]

To replicate the last image with a single convolution, we can construct derivative of Gaussian (DoG) filters: we convolve the 2D Gaussian filter with each of the $D_x$ and $D_y$ filters (using a full convolution, so the result isn't clipped back to the Gaussian's dimensions), then use these results as our DoG filters. The code for this is in my submitted notebook.
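In outline (the gaussian2d helper is illustrative; any normalized 2D Gaussian works):

```python
def gaussian2d(ksize, sigma):
    """Normalized 2D Gaussian built as an outer product of 1D Gaussians."""
    ax = np.arange(ksize) - (ksize - 1) / 2
    g = np.exp(-(ax**2) / (2 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

def dog_filters(ksize=5, sigma=1.0):
    g2d = gaussian2d(ksize, sigma)
    # 'full' mode keeps the whole support of the result rather than
    # clipping it back to the Gaussian's dimensions.
    dog_x = convolve2d(g2d, D_x, mode="full")
    dog_y = convolve2d(g2d, D_y, mode="full")
    return dog_x, dog_y
```

Convolving the image once with dog_x or dog_y then matches blurring first and differentiating second, up to boundary handling.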

2.1: Image "Sharpening"

In this section, we try to sharpen images by taking the high frequency components of the image and adding more of them to the base image. To demonstrate the process, I used the provided image of the Taj Mahal.

We first use a Gaussian filter (from the previous section) to isolate the low frequency components of the image. Then, by subtracting this low passed image from the original base image, we can extract the high frequencies that were removed by the Gaussian filter. Finally, by adding some multiple of these high frequencies to the original image, and clipping any out of bounds pixel values, we can create a sharpened image. This is done for each channel of the image independently.
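Roughly, reusing the gaussian2d helper from section 1.2 (assuming an HxWx3 float image in $[0, 1]$):

```python
def sharpen(im, alpha=1.0, ksize=5, sigma=1.0):
    """Unsharp masking: blur, extract high frequencies, add them back scaled."""
    g2d = gaussian2d(ksize, sigma)
    out = np.zeros_like(im)
    for c in range(im.shape[2]):  # each channel independently
        low = convolve2d(im[:, :, c], g2d, mode="same", boundary="symm")
        high = im[:, :, c] - low                   # frequencies the blur removed
        out[:, :, c] = im[:, :, c] + alpha * high  # add them back, scaled
    return np.clip(out, 0, 1)                      # clip out-of-bounds values
```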

[Figure: original image]
[Figure: Gaussian-filtered Taj Mahal; filter size = 5x5, $\sigma = 1$]
[Figure: normalized high pass of the image]
[Figure: sharpened Taj Mahal using $\alpha = 1$ and clipping]

This entire process can be captured in a single convolution using the following filter: $((1 + \alpha)e - \alpha g)$, where $e$ is the unit impulse filter of the same size as the Gaussian, and $g$ is the 2D Gaussian filter. In my code, you can see that this approach gives us the same sharpened image as using multiple steps (blur -> highpass -> add).
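As a sketch, the combined filter can be built like so (again with gaussian2d from earlier):

```python
def unsharp_filter(alpha=1.0, ksize=5, sigma=1.0):
    """(1 + alpha) * e - alpha * g, where e is the unit impulse."""
    e = np.zeros((ksize, ksize))
    e[ksize // 2, ksize // 2] = 1.0  # unit impulse, same size as the Gaussian
    return (1 + alpha) * e - alpha * gaussian2d(ksize, sigma)
```

Convolving each channel once with this filter should reproduce the multi-step result.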

Using the same technique, I sharpened an image of the Berkeley night skyline, though I personally feel I still prefer the unsharpened version.

[Figure: original image]
[Figure: sharpened Berkeley skyline using $\alpha = 0.75$, filter size = 20x20, $\sigma = 5$]

2.2: Hybrid Images

In this section of the project, we combine low and high frequencies to make hybrid images where the image changes depending on the distance you view it from.

To construct the hybrid images, we use techniques from the previous sections. More specifically, after aligning the images, we create a low frequency version of one image (the one seen from afar) using a Gaussian filter, and a high frequency version of the other image (the one seen up close) by subtracting its low passed version from the original. Averaging the two gives our final result.
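In outline (reusing gaussian2d from earlier; both inputs are aligned float arrays of the same shape, processed per channel or in grayscale):

```python
def hybrid(im_far, im_near, ksize, sigma):
    """im_far contributes the low frequencies, im_near the high frequencies."""
    g2d = gaussian2d(ksize, sigma)
    low = convolve2d(im_far, g2d, mode="same", boundary="symm")
    high = im_near - convolve2d(im_near, g2d, mode="same", boundary="symm")
    return np.clip((low + high) / 2, 0, 1)  # average the two components
```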

The first hybrid image created was the one provided by the staff:

[Figure: aligned Derek]
[Figure: aligned Nutmeg]
[Figure: hybrid between Derek (low pass) and Nutmeg (high pass), filter size = 50x50, $\sigma = 15$]

I also chose an intentionally mismatched pair of images to show a potentially difficult case: the images don't structurally line up well, and one of them contains many high frequencies:

[Figure: forest mixed with building; the forest contributes so many high frequencies that the result just looks like noise]

I think my best result was mixing two football players on the Philadelphia Eagles: Jalen Hurts and AJ Brown. This proved more effective because I used headshots of each player, giving matching angles, and since both are human faces, matching features as well. The most interesting part is their teeth: to my eye, the hybrid image smiles wider as you move farther away (Hurts, the low pass, has a wider smile than Brown).

[Figure: aligned Hurts]
[Figure: aligned Brown]
[Figure: hybrid between Hurts (low pass) and Brown (high pass), filter size = 10x10, $\sigma = 3$]

I also conducted a Fourier analysis on this hybrid image.

[Figure: FFT of the aligned Hurts image; it includes a strong vertical line because of cropping]
[Figure: FFT of the aligned Brown image; it includes both a strong vertical and a strong horizontal line because of cropping]
[Figure: low pass of Hurts in FFT space]
[Figure: high pass of Brown in FFT space]
[Figure: hybrid image in FFT space]

2.3: Gaussian and Laplacian Stacks

To set up our blending in the next section, we create Gaussian and Laplacian stacks, which will let us blend different frequency bands separately later on. To create the Gaussian stack, we simply apply a Gaussian filter successively to an image without downsampling. For demonstration, we apply it to both the apple and the orange:

[Figure: apple Gaussian stack (depth of 8)]
[Figure: orange Gaussian stack (depth of 8)]
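In outline, the stack construction (reusing gaussian2d from earlier):

```python
def gaussian_stack(im, depth=8, ksize=20, sigma=10):
    stack = [im]  # the original image is kept as the first level
    g2d = gaussian2d(ksize, sigma)
    for _ in range(depth - 1):
        # Blur the previous level again; no downsampling, unlike a pyramid.
        stack.append(convolve2d(stack[-1], g2d, mode="same", boundary="symm"))
    return stack
```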

The Gaussian stacks are then used to create the Laplacian stacks, which divide the image into frequency bands. Each level of the Laplacian stack is the difference between consecutive levels of the Gaussian stack: the sharper image minus the next, blurrier one. We also keep the blurriest image of the Gaussian stack as the bottom of the Laplacian stack. Here are the min-max normalized Laplacian stacks:

[Figure: apple Laplacian stack (depth of 8)]
[Figure: orange Laplacian stack (depth of 8)]
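In outline, given a Gaussian stack:

```python
def laplacian_stack(gstack):
    # Each level is a frequency band: a level minus the next, blurrier one.
    lstack = [gstack[i] - gstack[i + 1] for i in range(len(gstack) - 1)]
    lstack.append(gstack[-1])  # keep the blurriest level as the bottom
    return lstack
```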

2.4: Multiresolution Blending

Now that we have our Laplacian stacks, we can blend some images. To do so, we create a mask; for the orange and apple, we use a mask with a simple vertical boundary. We then create a Gaussian stack from this mask (the displayed stack has length 15, which is what I used for the oraple). I actually built a length 16 stack and threw out the original binary mask, because it was drawing a noticeable line on my oraple.

[Figure: mask Gaussian stack (depth of 15)]

Then, using the stack of masks, and the Laplacian stacks of both the orange and the apple, we use the following equation to blend:

$$LS_l(i, j) = GR_l(i, j) LA_l(i, j) + (1 - GR_l(i, j))LB_l(i, j)$$

Where $LS$ is the combined stack, $GR$ is the Gaussian stack of masks, $LA, LB$ are the Laplacian stacks, and $l$ is the layer. In essence, we multiply the first Laplacian stack elementwise by the Gaussian stack, the second Laplacian stack by one minus the Gaussian stack, and add them together to blend. This is done independently for each color channel.
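A sketch of the per-layer blend, together with the collapse step described below (each stack is a list of equal-shape float arrays, processed one channel at a time):

```python
def blend_stacks(la_stack, lb_stack, gr_stack):
    """LS_l = GR_l * LA_l + (1 - GR_l) * LB_l, elementwise per layer."""
    return [gr * la + (1 - gr) * lb
            for la, lb, gr in zip(la_stack, lb_stack, gr_stack)]

def collapse(ls_stack):
    # Sum all layers of the blended stack and clip to valid pixel values.
    return np.clip(np.sum(ls_stack, axis=0), 0, 1)
```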

Finally, to make the blended image, we add up all layers of the stack for each image channel, then stack the channels to create our final image:

[Figure: original apple]
[Figure: original orange]
[Figure: oraple, using a 15 layer stack; Laplacians built from Gaussian stacks with size = 20x20, $\sigma = 10$; mask stack size = 50x50, $\sigma = 20$]

I made three additional blended images shown below.

[Figure: some dude]
[Figure: a robot]
[Figure: cyborg, using the same parameters as the oraple]
[Figure: tiger]
[Figure: lion]
[Figure: liger, using an 8 layer stack; Laplacians built from Gaussian stacks with size = 20x20, $\sigma = 10$; mask stack size = 20x20, $\sigma = 5$]
[Figure: dog with big nose]
[Figure: muffin]
[Figure: muffin dog, using an irregular mask (small box in the center) and 8 layers; otherwise the same parameters as the oraple/cyborg]

For the liger (my best result, probably because they line up structurally the best), we can see the final blended stack before being added up to better understand how we blend at multiple resolutions:

[Figure: blended Laplacian stack for the liger, normalized]

We can see that at the higher frequencies on the right there is minimal blending, more of a strict cutoff, while on the left, at the lower frequencies, the pictures bleed into one another more. After adding this stack together and clipping values, we get the liger displayed above.