Hi 👋,
In this article I would like to talk about image hashing.
Image hashing algorithms are specialized hashing functions that compute the hash of an image from the image’s visual properties. Duplicate images produce the same hash value, and visually similar images produce hash values that differ only slightly.
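To simplify: duplicate images give identical hashes, and similar images give hashes that differ in only a few bits.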
Some use cases for image hashing are:
- Duplicate Image Detection
- Anti-Impersonation / Image Stealing
- Image filtering
- Reverse image search
Let’s play around with image hashing techniques using Python and the ImageHash library. Install the library with `pip install ImageHash`.
To obtain some sample images, I used Pexels and searched for terms like “white cat” and “firetruck”.
Here are the images that I’m using: cat1, cat2, cat3 and firetruck1.
I’m going to import the necessary modules and add a function, hash_to_int, that converts the hexadecimal string produced by ImageHash to an integer.
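A minimal sketch of such a helper could look like the following. It assumes that calling `str()` on an ImageHash object returns its hexadecimal form, and the sample hex string is made up purely for illustration.

```python
def hash_to_int(image_hash) -> int:
    """Convert an image hash (or its hex string) to a Python integer.

    Assumes str(image_hash) yields a hexadecimal string,
    which is how ImageHash objects print themselves.
    """
    return int(str(image_hash), 16)


# Made-up 64-bit hash value, used here only for illustration.
sample_hex = "ffd8c0c0e0f0f8fc"
sample_int = hash_to_int(sample_hex)
print(sample_int)

# Formatting the integer back to 16 hex digits recovers the original string.
print(format(sample_int, "016x") == sample_hex)
```

Because the helper goes through `str()`, it accepts either a hash object or a plain hex string.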
The reason for the hash_to_int function is that it is much easier to do computations with integers than with strings. If we later build a service that uses image hashing and computes Hamming distances, we can store the integer hashes in an OLAP database such as ClickHouse and use bitHammingDistance to compute the Hamming distance.
The next snippet of code opens the images and computes the average hash and the color hash of each one. Then, for every image in the dataset, it adds the Hamming distance between the average hashes to the Hamming distance between the color hashes, each measured against the source image.
The lower the Hamming distance, the more similar the images. A Hamming distance of 0 means the hashes are identical, so the images are treated as equal.
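The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact program: the helper names `load_hashes` and `rank_by_similarity` are hypothetical, the file names are assumed, and each entry is assumed to be a `(filename, average hash, color hash)` tuple so that index 1 holds the average hash. The ImageHash calls used are `imagehash.average_hash` and `imagehash.colorhash`.

```python
from pathlib import Path


def hash_to_int(image_hash) -> int:
    # str() of an ImageHash object is its hexadecimal representation.
    return int(str(image_hash), 16)


def hamming_distance(a: int, b: int) -> int:
    # XOR leaves a 1 wherever the two hashes differ; count those bits.
    return bin(a ^ b).count("1")


def load_hashes(paths):
    """Return (filename, average hash, color hash) tuples with integer hashes."""
    # Imported here so the distance helpers work without Pillow/ImageHash.
    from PIL import Image
    import imagehash

    hashes = []
    for path in paths:
        with Image.open(path) as image:
            hashes.append((
                Path(path).name,
                hash_to_int(imagehash.average_hash(image)),
                hash_to_int(imagehash.colorhash(image)),
            ))
    return hashes


def rank_by_similarity(source, image_hashes):
    """Sort images by summed average-hash and color-hash distance to source."""
    scored = [
        (
            hamming_distance(source[1], image[1])
            + hamming_distance(source[2], image[2]),
            image[0],
        )
        for image in image_hashes
    ]
    return sorted(scored)


# Example usage (assumed file names, expected in the working directory):
# image_hashes = load_hashes(["cat1.jpg", "cat2.jpg", "cat3.jpg", "firetruck1.jpg"])
# source = image_hashes[0]
# for distance, name in rank_by_similarity(source, image_hashes):
#     print(f"{name}: {distance}")
```

Summing the two distances is a simple way to let both shape (average hash) and color (color hash) contribute to the score. The source image always ranks first with a distance of 0.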
To compute the Hamming distance, XOR the two integers and then count the number of 1 bits: `bin(source[1] ^ image[1]).count("1")`. That’s it.
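To see why XOR followed by a bit count gives the Hamming distance, here is a small worked example. The two 8-bit values are made up for illustration and are far shorter than a real image hash.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Two made-up 8-bit values that differ in exactly two positions.
a = 0b10110100
b = 0b10011100

# XOR produces a 1 only where the bits of a and b disagree.
print(format(a ^ b, "08b"))    # 00101000
print(hamming_distance(a, b))  # 2
```

On Python 3.10 and later, `(a ^ b).bit_count()` returns the same count without building a string.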
If we run the program with the source variable set to cat1.jpg (`source = image_hashes[0]`), we get the following result:
|
|
If we look at our dataset, the first image, cat1, is somewhat visually similar to the image of the firetruck.
If we run the program with the source variable set to cat2.jpg, we can see that cat2 is similar to cat3, since both images contain white cats.
|
|
Conclusion
We used a Python image hashing library to compute the average and color hashes of some images, and then determined which images are similar to each other by computing the Hamming distance between their hashes.
Thanks for reading and build something fun! 🔨
References
- https://practicaldatascience.co.uk/data-science/how-to-use-image-hashing-to-identify-visually-similar-or-duplicate-images
- https://pypi.org/project/ImageHash/
Full Code
|
|