diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..25da6e6
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+.obsidian
\ No newline at end of file
diff --git a/DLSiteFSearchObsidian/Approach.md b/DLSiteFSearchObsidian/Approach.md
new file mode 100644
index 0000000..67f93c1
--- /dev/null
+++ b/DLSiteFSearchObsidian/Approach.md
@@ -0,0 +1,40 @@
+My objective: given a DLSite ASMR audio, which we will call the Search Audio because it is the audio that needs to be reverse-searched, and about which we know nothing other than that it came from DLSite, recover the original DLSite product information or metadata.
+
+My approach is heavily inspired by the [lolishinshi/imsearch](https://github.com/lolishinshi/imsearch) project, an image search engine built on local image features: it extracts local features (vectors) using ORB_SLAM3, indexes and searches them with [FAISS](https://github.com/facebookresearch/faiss), and ultimately stores all indexed image metadata in [RocksDB](https://rocksdb.org/).
+(wow, 2 Facebook projects)
+
+I assume we need two things:
+- Audio to be indexed (basically, all works from DLSite)
+- A vector database (acting as the index and search engine: FAISS or Milvus, or a traditional database with vector support: Postgres, Apache Solr/Lucene/Cassandra, MySQL, MongoDB, etc.)
+
+The audio to be indexed presents an obstacle: I don't have the money to purchase every DLSite ASMR work; it would be ridiculous. I also do not think DLSite would want to collaborate on a project whose main focus is reverse-searching similar audio. It would also put a ton of responsibility on me, and I don't want that.
+So we are going for the second-best option: sailing the high seas. Fortunately there are already data hoarders with tons of ASMR audio; their collections are ridiculously big, so I will have to build the index in batches.
+
+For the vector database we could just use FAISS, but the training stage would probably be a problem, because I don't want my system maxed out at 100% GPU usage for days on end. I will try different database solutions, like Milvus; traditional (and better-known) databases may also be an option, but I will have to find that out.
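Whatever engine ends up hosting the index, the core operation is the same: nearest-neighbor search over feature vectors. As a correctness reference before committing to FAISS or Milvus, here is a minimal brute-force cosine search in NumPy (the dimensions and data below are made up for illustration, not from the real pipeline):

```python
import numpy as np

def build_index(vectors: np.ndarray) -> np.ndarray:
    # L2-normalize so that inner product == cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q                 # cosine similarity to every stored vector
    return np.argsort(-scores)[:k]     # ids of the k best matches

# Toy demo: 1000 random 128-dim "audio feature" vectors
rng = np.random.default_rng(0)
db = build_index(rng.normal(size=(1000, 128)))
query = db[42] + rng.normal(scale=0.05, size=128)  # noisy copy of item 42
top = search(db, query, k=3)           # item 42 should rank first
```

FAISS's exhaustive flat indexes (e.g. `IndexFlatIP`) do exactly this search; the GPU-heavy training stage only matters for the compressed/partitioned index types such as IVF and PQ.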
+
+I will conduct a small-scale test on about 316 DLSite audio works (translated and raw, with additional images, documents, etc.) to see how I can construct this application.
+
+The first BIG **problem** is: how on earth do I convert the audio to vectors? For images, we just run ORB_SLAM3 feature extraction and that works quite well. For text, there are text-to-embedding models that should also work; I just need to make sure to pick open-source models.
+
+But the audio... There are commercial products that use algorithms to identify songs (Shazam), but what I have is ASMR audio.
+My planned use case: the end user possesses an audio that potentially came from DLSite, knowing only that it probably came from DLSite and nothing else; the audio could be lossy-compressed; and my application's job is to find which product ID corresponds to that audio.
+From the articles that I could find:
+https://milvus.io/docs/audio_similarity_search.md
+https://zilliz.com/ai-faq/what-is-audio-similarity-search
+https://www.elastic.co/search-labs/blog/searching-by-music-leveraging-vector-search-audio-information-retrieval
+https://zilliz.com/learn/top-10-most-used-embedding-models-for-audio-data
+
+One path is to use embedding models from deep learning to extract feature vectors from the audio. I have my doubts, since these models are trained on real-world audio or music, and they might not be suitable for ASMR audio works. But I could be proven wrong, and I wish to be proven wrong.
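Before pulling in a pretrained model, it is worth having a trivial hand-rolled embedding as a baseline: mean log-magnitude per frequency band, pooled into one fixed-size vector. This is emphatically NOT a learned embedding, just a naive stand-in to get the audio-to-vector pipeline shape right (the band count and frame sizes below are arbitrary assumptions):

```python
import numpy as np

def naive_embedding(signal: np.ndarray, n_bands: int = 64,
                    frame: int = 1024, hop: int = 512) -> np.ndarray:
    """Mean log-magnitude per frequency band: a crude fixed-size vector."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame + 1, hop)]
    mags = np.abs(np.fft.rfft(np.stack(frames), axis=1))  # (n_frames, frame//2+1)
    bands = np.array_split(mags, n_bands, axis=1)          # group FFT bins into bands
    emb = np.log1p(np.array([b.mean() for b in bands]))
    return emb / np.linalg.norm(emb)                       # unit length for cosine search

# Demo: 2 seconds of a 440 Hz tone at a 22050 Hz sample rate
sr = 22050
t = np.arange(sr * 2) / sr
emb = naive_embedding(np.sin(2 * np.pi * 440 * t))
print(emb.shape)   # (64,)
```

If a pretrained model beats this baseline on the 316-work test set, that is evidence the learned embeddings transfer to ASMR; if it barely does, my doubts were justified.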
+
+Another path is this paper I found while searching:
+![[Efficient_music_identification_using_ORB_descripto.pdf]]
+
+This paper applies the ORB algorithm to the spectrogram image, which is interesting. The paper specifically tests music identification, not ASMR audio, although I am sure a spectrogram is just another image as far as the ORB algorithm is concerned. But typical ASMR audio ranges from a few minutes to hours long, and I am not sure ORB can handle such extreme image proportions (extremely large images, with the audio length proportional to the X dimension of the image).
+One way I came up with is to chop the audio into pieces and then run the ORB algorithm on each piece to extract features; that way we don't end up with extraordinary spectrogram image sizes. But I am not sure of its effectiveness, so I will have to experiment with that as well.
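The chunking idea can be sketched with SciPy: split the audio into fixed-length pieces and render one 8-bit spectrogram image per piece, each of which would then go through ORB. The chunk length and overlap here are guesses to be tuned, not values from the paper:

```python
import numpy as np
from scipy.signal import spectrogram

def spectrogram_chunks(signal, sr, chunk_s=30.0, overlap_s=5.0):
    """Yield one 8-bit spectrogram image per fixed-length audio chunk."""
    size = int(chunk_s * sr)
    step = int((chunk_s - overlap_s) * sr)
    for start in range(0, max(len(signal) - size, 0) + 1, step):
        piece = signal[start:start + size]
        _, _, sxx = spectrogram(piece, fs=sr, nperseg=1024, noverlap=512)
        db = 10 * np.log10(sxx + 1e-10)                     # power -> dB
        img = ((db - db.min()) / (np.ptp(db) + 1e-12) * 255).astype(np.uint8)
        yield img                                           # ready for ORB per chunk

# Demo: 90 s of noise at 22050 Hz -> three 30 s chunks at a 25 s step
sr = 22050
rng = np.random.default_rng(1)
chunks = list(spectrogram_chunks(rng.normal(size=sr * 90), sr))
print(len(chunks))   # 3
```

The overlap between consecutive chunks is there so a query clip straddling a chunk boundary still matches at least one indexed chunk.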
+
+So my current approach will be to experiment with these two methods using the local DLSite audio that I have, and compare the results against each other.
+
+Also, I want to index more aspects of the audio works: these DLSite packages usually come with images and documents, and I want to index those as well. The documents can be converted to vectors using embedding models, and for the images we can use the same approach as `imsearch`.
+
+I will have to do some analysis on the files I have on hand. The collection of DLSite works I was able to find has approximately 50k audio works, each weighing in at 3 GB to 8 GB, with some outliers eating up 20 GB to 110 GB of space. A rough estimate is that all of these works combined will use more than 30 to 35 TB of space. I don't have the space for that, so I will have to do the indexing in batches.
+
diff --git a/DLSiteFSearchObsidian/Efficient_music_identification_using_ORB_descripto.pdf b/DLSiteFSearchObsidian/Efficient_music_identification_using_ORB_descripto.pdf
new file mode 100644
index 0000000..09b82bc
Binary files /dev/null and b/DLSiteFSearchObsidian/Efficient_music_identification_using_ORB_descripto.pdf differ