Aligning images – an engineer’s solution

Recently I was struggling with a dataset that contained the same images in two versions that were not correctly aligned with each other. Since one version came with location annotations that I used for training a Music Object Detector, I had to align them somehow.

To get an impression of the alignment error, look at the following images:

Black-and-White Image

Unaligned Grayscale Image

Bitwise difference between the images

The top-left image is the binarized image, which serves as the reference. The top-right image is the original grayscale image, which is misaligned by such a tiny amount that you can't notice it just by looking at the two. So I generated the bitwise difference between the two images, shown at the bottom. There you can almost read the full score, because the two versions are slightly shifted against each other.

Generating such a diff-image from two images in Python with the Pillow library basically boils down to:

from PIL import Image, ImageChops
diff_image = ImageChops.difference(Image.open(image1_path), Image.open(image2_path))
diff_image.save(output_path)

allowing me to visually verify whether or not the images were aligned correctly.
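
Looking at a diff image works well for a handful of pages, but a single number is easier to compare across many files. The following is a minimal sketch of one way to get such a number with Pillow. The helper name differing_pixel_ratio and the synthetic test page are made up for illustration and are not part of the original code:

```python
from PIL import Image, ImageChops


def differing_pixel_ratio(image_a: Image.Image, image_b: Image.Image) -> float:
    """Return the fraction of pixels whose grayscale values differ."""
    # Convert both images to 8-bit grayscale so that modes do not matter.
    a = image_a.convert("L")
    b = image_b.convert("L")
    if a.size != b.size:
        raise ValueError("images must have the same size to be compared")
    diff = ImageChops.difference(a, b)
    # Bin 0 of the histogram counts the pixels with a difference of zero.
    identical = diff.histogram()[0]
    total = a.size[0] * a.size[1]
    return (total - identical) / total


# Synthetic demo (hypothetical data): a white page with one black bar,
# and a copy of it that is shifted two pixels to the right.
page = Image.new("L", (100, 100), 255)
page.paste(0, (20, 40, 80, 50))
shifted = ImageChops.offset(page, 2, 0)

ratio = differing_pixel_ratio(page, shifted)
print(f"{ratio:.2%} of the pixels differ")
```

A ratio close to zero suggests that two images are aligned. What counts as "close enough" depends on the data, for example on how much noise the binarization introduces.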

It turns out that almost every image in the dataset was transformed a little bit. Since the dataset contains 1000 images in multiple flavors, I needed some automation. As you can see, the images are not very far apart from each other. While searching for a clever solution, I found a nice blog entry that aligns the slightly misaligned color channels of an image by applying an iterative algorithm to find an affine transformation (which is generally a very hard task). Luckily, that algorithm is readily implemented in OpenCV as cv2.findTransformECC. Using it is almost newbie-friendly:

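import cv2
import numpy as np

# findTransformECC expects single-channel images, so load both as grayscale
im1 = cv2.imread(path_to_desired_image, cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread(path_to_image_to_warp, cv2.IMREAD_GRAYSCALE)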

warp_mode = cv2.MOTION_AFFINE
warp_matrix = np.eye(2, 3, dtype=np.float32)

# Specify the number of iterations.
number_of_iterations = 100

# Specify the threshold of the increment in the correlation 
# coefficient between two iterations
termination_eps = 1e-7

criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 
            number_of_iterations, termination_eps)

# Run the ECC algorithm. The results are stored in warp_matrix.
(cc, warp_matrix) = cv2.findTransformECC(im1, im2, warp_matrix, 
                                         warp_mode, criteria)

Lastly, one “only” needs to warp the image with the found affine transformation:

# Get the target size from the desired image
target_shape = im1.shape

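# WARP_INVERSE_MAP is needed because warp_matrix maps coordinates of
# im1 (the reference) to coordinates of im2 (the image to warp)
aligned_image = cv2.warpAffine(
    im2,
    warp_matrix,
    (target_shape[1], target_shape[0]),
    flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP,
    borderMode=cv2.BORDER_CONSTANT,
    borderValue=0)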

cv2.imwrite(destination_path, aligned_image)
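
For readers who wonder what the 2x3 warp_matrix and the WARP_INVERSE_MAP flag actually do: with that flag, OpenCV looks up each output pixel (x, y) at the position that the matrix assigns to it in the source image. The sketch below illustrates this mapping with plain NumPy. The helper names apply_affine and invert_affine as well as the example matrix are made up for illustration:

```python
import numpy as np


def apply_affine(matrix, points):
    """Map an (N, 2) array of (x, y) points through a 2x3 affine matrix."""
    matrix = np.asarray(matrix, dtype=np.float64)
    points = np.asarray(points, dtype=np.float64)
    # The left 2x2 block rotates/scales/shears, the last column translates.
    return points @ matrix[:, :2].T + matrix[:, 2]


def invert_affine(matrix):
    """Return the 2x3 matrix that undoes the given affine transformation."""
    matrix = np.asarray(matrix, dtype=np.float64)
    linear_inverse = np.linalg.inv(matrix[:, :2])
    translation_inverse = -linear_inverse @ matrix[:, 2]
    return np.hstack([linear_inverse, translation_inverse[:, None]])


# Hypothetical warp: the content of the unaligned image sits
# 3 px further right and 2 px further up than in the reference.
example_warp = np.array([[1.0, 0.0, 3.0],
                         [0.0, 1.0, -2.0]])

# Pixel (10, 20) of the aligned output is read from this source position.
output_pixel = np.array([[10.0, 20.0]])
source_position = apply_affine(example_warp, output_pixel)

# Applying the inverted matrix leads back to the output pixel.
round_trip = apply_affine(invert_affine(example_warp), source_position)
```

Without WARP_INVERSE_MAP, cv2.warpAffine would treat the matrix as the forward transformation, so the image would be moved in the wrong direction.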

The final result is remarkable. Can you still see the difference?

Diff of aligned images: bitwise difference between the aligned images. A black pixel appears where the images are not the same. Since the images are aligned, the image is almost completely white.

Just a few pixels remain, and these are due to errors during binarization of the image, which is necessarily a lossy operation. A cool side effect is that the images are now not only aligned but also have the same size.

The only things I needed to tweak a little were the two parameters number_of_iterations and termination_eps. Both are passed to cv2.findTransformECC as termination criteria: the first caps the number of iterations the algorithm spends searching for a solution, and the second sets the required quality, that is, the threshold on the increment of the correlation coefficient between two iterations. As soon as either is satisfied, the algorithm stops and returns the solution found so far. Letting the algorithm run for a few hours yielded a perfectly aligned dataset, which now allows me to go back to training my networks to detect musical objects.

If you are interested in the full source code, you can find it in this GitHub repository.

The score images depicted in this article are from the CVC-MUSCIMA dataset by Alicia Fornés, Anjan Dutta, Albert Gordo, and Josep Lladós, licensed under CC BY-NC-SA 4.0. More information on the dataset can also be found here as well as in their original paper.

6 thoughts on “Aligning images – an engineer’s solution”

juerg says:

January 1, 2019 at 5:55 pm

Interesting but unfortunately not working as published. You should mention that we need gray images and you use sz[0] and sz[1] but what are these???


1. apacha says:
  
  January 1, 2019 at 7:14 pm
  
  Hi Juerg, thanks for your feedback. I’ve just updated the post to clarify what this sz is. I think I mentioned it more than once, that I’m working with grayscale images. Please refer to Github for the whole source-code: https://github.com/apacha/CVC-MUSCIMA/blob/master/CompareDatasets.py
  And what do you mean by “not working as published”?
  BTW: If you are just interested in the aligned images of the CVC-MUSCIMA dataset, I’ve uploaded them to Github: https://github.com/apacha/OMR-Datasets/releases/tag/datasets
  
  Hope this helps.
  
  
Georgios Evangelidis says:

September 12, 2019 at 12:32 pm

thumbs up!


Paul Sikat says:

October 20, 2021 at 9:22 pm

Way cool! Some very valid points! I appreciate you penning this article and also the rest of the site is extremely good.


Carla Cowling says:

October 23, 2021 at 12:39 am

Excellent post. I am experiencing some of these issues as well..


Hiroko Barrows says:

October 27, 2021 at 8:47 am

bookmarked!!, I really like your website!
