Recently I was struggling with one of the datasets I was working with: it contained the same images in multiple flavors, but they were not correctly aligned. Since one of them had location annotations that I used for training a Music Object Detector, I had to align them somehow.
To get an impression of the alignment error, look at the following images:
The top-left image is the binarized image, which serves as the reference. The top-right image is the original grayscale image, which is misaligned by a tiny bit that you can't even notice from just looking at them. So I've generated the bit-wise difference between the two images, which is shown at the bottom, and there you can almost read the full scores, because the two copies are slightly shifted and misplaced.
Generating such a diff image from two images in Python with the Pillow library basically boils down to:
from PIL import Image, ImageChops

diff_image = ImageChops.difference(Image.open(image1_path), Image.open(image2_path))
diff_image.save(output_path)
allowing me to visually verify whether or not the images were aligned correctly.
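To avoid eyeballing a thousand diff images, one can also quantify the error with a few extra lines. Here is a minimal sketch (image1_path and image2_path are the same placeholders as above) that simply counts how many pixels differ:

from PIL import Image, ImageChops
import numpy as np

diff_image = ImageChops.difference(Image.open(image1_path).convert("L"),
                                   Image.open(image2_path).convert("L"))
# A perfectly aligned pair yields an all-black diff image,
# so the number of non-zero pixels measures the misalignment.
differing_pixels = np.count_nonzero(np.array(diff_image))
print(f"{differing_pixels} pixels differ")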
Turns out that almost every image in the dataset was transformed a little bit. Since the dataset contains 1000 images in multiple flavors, I needed some automation. As you can notice, the images are not very far apart from each other. So upon searching for a clever solution, I found a nice blog entry that attempts to align the color channels of slightly misaligned images by applying an iterative algorithm to find an affine transformation (which is generally a very hard task). Luckily, that algorithm is readily implemented in OpenCV and is called cv2.findTransformECC. Using it is almost newbie-friendly:
import cv2
import numpy as np

# findTransformECC expects single-channel images,
# so load both images directly in grayscale.
im1 = cv2.imread(path_to_desired_image, cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread(path_to_image_to_warp, cv2.IMREAD_GRAYSCALE)

warp_mode = cv2.MOTION_AFFINE
warp_matrix = np.eye(2, 3, dtype=np.float32)

# Specify the number of iterations.
number_of_iterations = 100

# Specify the threshold of the increment in the correlation
# coefficient between two iterations.
termination_eps = 1e-7

criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
            number_of_iterations, termination_eps)

# Run the ECC algorithm. The results are stored in warp_matrix.
(cc, warp_matrix) = cv2.findTransformECC(im1, im2, warp_matrix, warp_mode, criteria)
Lastly, one “only” needs to warp the image with the found affine transformation:
# Get the target size from the desired image
target_shape = im1.shape

# WARP_INVERSE_MAP is required because the estimated matrix maps
# coordinates of the reference image into the unaligned image.
aligned_image = cv2.warpAffine(
    im2, warp_matrix, (target_shape[1], target_shape[0]),
    flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP,
    borderMode=cv2.BORDER_CONSTANT, borderValue=0)

cv2.imwrite(destination_path, aligned_image)
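As a side note, the first return value cc is the final correlation coefficient between the two images, so it can serve as a rough quality check. A small sketch; the 0.9 threshold is my own guess and would need tuning for a real dataset:

# cc close to 1.0 means the warped image matches the reference well.
# The 0.9 threshold is a made-up value, not taken from the actual script.
if cc < 0.9:
    print(f"Alignment of {path_to_image_to_warp} looks poor (cc={cc:.3f})")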
The final result is remarkable. Can you still see the difference?

Just a few pixels remain, and these are due to errors during the binarization of the image, which is necessarily a lossy operation. A cool side-effect is that the images are now not only aligned but also have the same size.
The only things I needed to tweak a little bit were the two parameters number_of_iterations and termination_eps. Both are required by cv2.findTransformECC and control when it stops searching: the algorithm terminates as soon as it has either reached the maximum number of iterations or the improvement of the correlation coefficient between two iterations has dropped below the threshold, and it returns the best solution found so far. Letting the algorithm run for a few hours yielded a perfectly aligned dataset, which now allows me to go back to training my networks to detect musical objects.
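For reference, a batch run over the whole dataset could look roughly like the following sketch. The folder names are hypothetical and the actual script on GitHub differs, but the structure is the same:

import os
import cv2
import numpy as np

# Hypothetical folder layout: matching filenames in both directories.
reference_dir = "binarized"
unaligned_dir = "grayscale"
output_dir = "aligned"
os.makedirs(output_dir, exist_ok=True)

criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-7)

for filename in os.listdir(reference_dir):
    reference = cv2.imread(os.path.join(reference_dir, filename), cv2.IMREAD_GRAYSCALE)
    unaligned = cv2.imread(os.path.join(unaligned_dir, filename), cv2.IMREAD_GRAYSCALE)
    warp_matrix = np.eye(2, 3, dtype=np.float32)
    # findTransformECC raises an error if it does not converge, so in
    # practice you may want a try/except around this call.
    cc, warp_matrix = cv2.findTransformECC(reference, unaligned, warp_matrix,
                                           cv2.MOTION_AFFINE, criteria)
    aligned = cv2.warpAffine(unaligned, warp_matrix,
                             (reference.shape[1], reference.shape[0]),
                             flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP,
                             borderMode=cv2.BORDER_CONSTANT, borderValue=0)
    cv2.imwrite(os.path.join(output_dir, filename), aligned)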
If you are interested in the full source code, you can find it in this GitHub repository.
The score images depicted in this article are from the CVC-MUSCIMA dataset by Alicia Fornés, Anjan Dutta, Albert Gordo, and Josep Lladós, licensed under CC BY-NC-SA 4.0. More information on the dataset can also be found here as well as in their original paper.
Interesting but unfortunately not working as published. You should mention that we need gray images and you use sz[0] and sz[1] but what are these???
Hi Juerg, thanks for your feedback. I've just updated the post to clarify what this sz is. I think I mentioned more than once that I'm working with grayscale images. Please refer to GitHub for the whole source code: https://github.com/apacha/CVC-MUSCIMA/blob/master/CompareDatasets.py
And what do you mean by “not working as published”?
BTW: If you are just interested in the aligned images of the CVC-MUSCIMA dataset, I’ve uploaded them to Github: https://github.com/apacha/OMR-Datasets/releases/tag/datasets
Hope this helps.
thumbs up!
Way cool! Some very valid points! I appreciate you penning this article and also the rest of the site is extremely good.
Excellent post. I am experiencing some of these issues as well..
bookmarked!!, I really like your website!