DLSiteFSearch/DLSiteFSearchObsidian/Studying imsearch.md
2025-04-07 00:40:21 +02:00


imsearch is an image reverse-search server written in Rust. It mixes C++ code from ORB_SLAM3 with its own custom wrapper, and also links against FAISS and OpenCV. For some reason the repository vendors the C++ source code of FAISS instead of just including FAISS as an external dependency, but that is beyond me.

The core idea is that this program lets the server owner add a bunch of images, and then exposes an API and a CLI to reverse-search any image against the added database.

This program is not using any GPU acceleration features.

While adding images, it extracts features using ORB from the ORB_SLAM3 module and adds them to a FAISS index; meanwhile, it also stores the image path (the image's metadata) in a separate key-value database backed by RocksDB. This separate database stores the image path and its correlation with the FAISS index ID of the vector (the image feature).

While adding, the features are extracted from the images and their metadata is stored alongside. Since imsearch uses a BinaryIVF index from FAISS, this type of index requires "training" before it is useful for reverse image search. GPU training can take several tens of hours, if not days, on large datasets.

While searching, the program extracts the features from the query image and runs an ANN (Approximate Nearest Neighbor) search against all vectors stored in the FAISS index. If similar feature vectors exist in the index, this returns them together with their FAISS index IDs; querying RocksDB with those IDs then yields the likely original image.
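Since ORB descriptors are binary, the distance computed by a FAISS binary index is the Hamming distance: the number of differing bits between two descriptors. A minimal numpy sketch with synthetic descriptors (not real ORB output):

```python
import numpy as np

# Two synthetic 32-byte (256-bit) binary descriptors, like ORB produces.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=32, dtype=np.uint8)
b = a.copy()
b[0] ^= 0b00000111  # flip 3 bits in the first byte

def hamming(x, y):
    """Number of differing bits between two uint8 descriptor vectors."""
    return int(np.unpackbits(x ^ y).sum())

print(hamming(a, a))  # identical descriptors -> distance 0
print(hamming(a, b))  # 3 flipped bits -> distance 3
```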

imsearch is a fucking pain in the ass to compile; at least on Windows I wasn't able to build a working binary. I cannot build FAISS on Windows with GPU support for some reason, and strange linking errors also occur during compilation.

It also kind of interests me how the C++ <--> Rust integration works in this program. From what I can observe, OpenCV and ORB_SLAM3 are both C++ dependencies, which basically means a wrapper is needed for them to work together with the Rust code.

What puzzles me is that ORB_SLAM3 itself also depends heavily on OpenCV. So if Rust is using an OpenCV wrapper, what is ORB_SLAM3 supposed to use? Especially since ORB_SLAM3 returns vectors that are OpenCV types, and Rust may not understand C++ OpenCV types.

After a bit of digging, I found that imsearch uses a premade wrapper for OpenCV (the opencv-rust binding), which is fine. During compilation of imsearch, linking OpenCV is the step that most often fails, because the binding requires you to bring your own OpenCV or use your system package's OpenCV. My hypothesis is that ORB_SLAM3 inside the imsearch code links against the same OpenCV library that the Rust side uses; the two sides can then pass raw pointers to each other, which the Rust OpenCV binding allows. The header imsearch/src/ORB_SLAM3/ocvrs_common.hpp indicates that ORB_SLAM3 and imsearch are indeed passing pointers around, and the custom wrapper is imsearch/src/ORB_SLAM3/ORBwrapper.cc.

Search method

First, imsearch needs a source dataset: an image database that future queries will be compared against.

During imsearch add-image (directory), the program loops through all image files in the directory and extracts all feature vectors using ORB_SLAM3.

The "feature vector", which is actually the source descriptor (an OpenCV descriptor), is stored in RocksDB. A source descriptor is a matrix of size $\text{Number of Features} \times 32$, with each element being a uint8 value. (See test_slamorb.ipynb)
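A quick numpy sketch of that layout (synthetic data standing in for real ORB output): an image with 480 keypoints yields a 480×32 uint8 matrix, and unpacking each 32-byte row gives the 256 bits that later become the dimension of the FAISS binary index.

```python
import numpy as np

# Synthetic stand-in for an ORB descriptor matrix: 480 features x 32 bytes.
descriptors = np.random.default_rng(0).integers(
    0, 256, size=(480, 32), dtype=np.uint8
)
print(descriptors.shape)  # (480, 32)  -> 480 feature vectors

# Each 32-byte row is really a 256-bit binary vector.
bits = np.unpackbits(descriptors, axis=1)
print(bits.shape)         # (480, 256)
```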

All the features are stored in an internal key-value database using RocksDB. Several tables are used in imsearch.

The first table is the Features table. Each feature gets its own ID, assigned in incremental order. Note that this table does not store the original image path, which is the only important piece of metadata.

The second table is the Image Path table, which stores all image paths (the only metadata kept about images) added to imsearch. Each image path gets its own ID.

The third table is the Relation table. For every feature stored in RocksDB, it establishes a relation between the Feature ID and the Image Path ID.

For example, running ORB_SLAM3 on an image may return a 480×32 matrix, meaning the image has 480 feature vectors. When adding a single image to the database, imsearch extracts all the feature vectors and stores each of them (all 480) in the Features table, then inserts the image path into the Image Path table. Finally, it inserts into the Relation table every Feature ID of the image together with its corresponding Image Path ID.

```mermaid
erDiagram
	fid[ImageFeatureColumn] {
		uint64 FeatureID
		vector feature
	}
	ipid[ImagePathColumn] {
		uint64 ImagePathID
		string path
	}
	fid2ipid[FeatureID2ImagePathID] {
		uint64 FeatureID
		uint64 ImagePathID
	}
	fid ||--|| fid2ipid : Relation
	ipid ||--|{ fid2ipid : Relation
```

This establishes a many-to-one relationship between features and image paths.
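The three tables can be sketched with plain dicts (RocksDB is just a key-value store, so each table is effectively one keyspace; the names and helper below are illustrative, not imsearch's actual identifiers):

```python
# Illustrative in-memory stand-in for the three RocksDB keyspaces.
features = {}      # FeatureID   -> 32-byte descriptor
image_paths = {}   # ImagePathID -> path string
fid_to_ipid = {}   # FeatureID   -> ImagePathID (many features -> one image)

def add_image(path, descriptor_rows):
    """Insert one image and all of its feature vectors."""
    ipid = len(image_paths)
    image_paths[ipid] = path
    for row in descriptor_rows:
        fid = len(features)  # incremental feature IDs
        features[fid] = row
        fid_to_ipid[fid] = ipid

add_image("cat.jpg", [b"\x00" * 32] * 3)  # pretend this image has 3 features
add_image("dog.jpg", [b"\xff" * 32] * 2)  # pretend this one has 2

print(fid_to_ipid)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
```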

After adding all the features into RocksDB, imsearch export-data is called and all the feature vectors are exported into a serialized NumPy array. After exporting, the provided Python script utils/train.py creates a new FAISS index using IndexBinaryIVF with dimension 256 (uint8 is 8 bits and there are 32 uint8 values in one feature vector, so a single feature vector occupies 256 bits; thus the dimension of the binary vector index is 256). The nlist parameter is at the discretion of the user, depending on the number of features in the database: it divides all feature vectors into clusters, and indicates how many clusters to form. This kind of index requires training, so the script trains the index using all the feature vectors contained in the serialized NumPy array exported by imsearch. Once training is complete, the newly created FAISS index is serialized and saved into ~/.config/imsearch/.

Afterwards, running imsearch build-index actually adds all the vectors into the index. During the training process the index itself stays empty; training exists to better cluster the feature vectors that will be added afterwards, and to make the KNN search much more performant.

After index building is complete, imsearch can finally be used for reverse image search, either via the CLI or the Web API.

During search, a query image is passed in, and ORB_SLAM3 extracts all the feature vectors present in it, yielding a 2D matrix. If the image has 480 feature vectors, the matrix will be 480×32; each row of the matrix corresponds to a single feature vector of the image.

imsearch then performs a KNN search for every feature present in the image (all 480 feature vectors are searched), returning each one's neighbor vectors: their indices and their distances.

After getting all the neighbor vectors (their IDs) and their distances, we look up each neighbor's vector ID in the RocksDB FeatureID2ImagePathID table to find its corresponding image file path. We then assign each feature a similarity score based on its distance to the neighbor vector. We effectively obtain a statistical chart: a HashMap with the image path as the key, and as the value a list of similarity scores, one per feature-vector comparison between the query image and the neighbor vectors (and their image). Please see lolishinshi/imsearch/src/imdb.rs
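The aggregation can be sketched like this (the distance-to-score mapping below is a placeholder of my own; imsearch's actual formula lives in src/imdb.rs):

```python
from collections import defaultdict

# Hypothetical KNN output for 3 query features: (neighbor_feature_id, distance).
knn_results = [
    [(0, 4), (3, 90)],
    [(1, 7), (4, 95)],
    [(2, 2), (3, 88)],
]
# FeatureID -> image path, as resolved through the RocksDB relation table.
fid_to_path = {0: "cat.jpg", 1: "cat.jpg", 2: "cat.jpg",
               3: "dog.jpg", 4: "dog.jpg"}

def score(distance, max_distance=256):
    """Placeholder similarity score: 1.0 at distance 0, 0.0 at the maximum."""
    return 1.0 - distance / max_distance

# Build the HashMap: image path -> list of per-feature similarity scores.
scores_by_path = defaultdict(list)
for neighbors in knn_results:
    for fid, dist in neighbors:
        scores_by_path[fid_to_path[fid]].append(score(dist))

# cat.jpg accumulates many high scores -> likely the original image.
print({p: [round(s, 3) for s in v] for p, v in scores_by_path.items()})
```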

Usually, if an image has a match, the search finds various image paths, each with various feature-vector similarity scores attached. If all the scores are high and there are plenty of scores attached under the same image path, then it is probably the original image we are trying to find. If not, the scores will be low, or the image paths will be scattered across different images, each with low similarity scores on its feature-vector comparisons.

Finally, all the scores are weighted using the Wilson score, giving each image path a single uniform similarity score. The result is then passed back to the end user.
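A minimal implementation of the Wilson score interval's lower bound (treating each per-feature comparison as a success/failure trial; whether imsearch binarizes the scores exactly this way is my assumption):

```python
import math

def wilson_lower_bound(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval (95% confidence by default).

    Rewards image paths that have both a high match ratio AND many
    matching features: a path with 2/2 matches scores lower than one
    with 90/100, even though its raw ratio is higher.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * trials)) / trials)
    return (center - margin) / denom

print(round(wilson_lower_bound(2, 2), 3))     # 0.342 -- few matches, penalized
print(round(wilson_lower_bound(90, 100), 3))  # 0.826 -- many matches, high score
```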

Still, it's not a trivial process; whoever came up with the idea, I must give you my praise. But holy shit, the source code for lolishinshi/imsearch is hard to read. It comes with basically no documentation (other than how to use it). Reading Rust code is extremely hard for me, especially when there is some chaining action going on, like this:

```rust
// Fragment of lolishinshi/imsearch/src/index.rs @ L185
// Note: `.chunks()` on an iterator comes from the itertools crate (Itertools trait).
pub fn search<M>(&self, points: &M, knn: usize) -> Vec<Vec<Neighbor>>
where
	M: Matrix,
{
	// Each row of `points` is one query vector; width is in bytes, self.d in bits.
	assert_eq!(points.width() * 8, self.d as usize);
	// FAISS writes `knn` distances and indices per query row into these flat buffers.
	let mut dists = vec![0i32; points.height() * knn];
	let mut indices = vec![0i64; points.height() * knn];
	let start = Instant::now();
	unsafe {
		faiss_IndexBinary_search(
			self.index,
			points.height() as i64,
			points.as_ptr(),
			knn as i64,
			dists.as_mut_ptr(),
			indices.as_mut_ptr(),
		);
	}

	debug!("knn search time: {:.2}s", start.elapsed().as_secs_f32());
	// Pair each index with its distance, then split the flat result back
	// into one Vec<Neighbor> of length `knn` per query vector.
	indices
		.into_iter()
		.zip(dists.into_iter())
		.map(|(index, distance)| Neighbor {
			index: index as usize,
			distance: distance as u32,
		})
		.chunks(knn)
		.into_iter()
		.map(|chunk| chunk.collect())
		.collect()
}
```

I had to whip out GitHub Copilot for this hieroglyphic, because between the nonexistent code documentation, the ludicrous amount of into_iter() and chaining, and the unwraps, Results, and unfamiliar macros, it's definitely a frustrating experience reading the code if you are not a Rust developer.

I will be adapting this image search method into Python and Milvus. Thank you lolishinshi.