Due to space (disk partition) constraints, the local dataset is split into three subsets:
- ASMROne
- ASMRTwo
- ASMRThree
There are no substantial differences between the subsets. Subset sizes and audio work counts (sizes are in binary units, 1 GiB = 1024³ bytes, truncated to whole GiB):
- ASMR One --> 119 audio works, 470 GiB (504 791 391 855 bytes)
- ASMR Two --> 90 audio works, 439 GiB (471 683 782 635 bytes)
- ASMR Three --> 121 audio works, 499 GiB (536 552 753 022 bytes)
Total: 330 audio works, 1409 GiB (1 513 027 927 512 bytes)
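The byte counts above can be cross-checked against the rounded sizes with a short sketch. It assumes the sizes are binary gigabytes (1024³ bytes) truncated to whole units, which is what the listed byte counts imply; the names `GIB`, `subset_bytes`, and `to_gib` are illustrative only.

```python
GIB = 1024 ** 3  # bytes per binary gigabyte (assumption: sizes are binary units)

# Byte counts copied from the list above.
subset_bytes = {
    "ASMR One": 504_791_391_855,
    "ASMR Two": 471_683_782_635,
    "ASMR Three": 536_552_753_022,
}


def to_gib(n_bytes: int) -> int:
    """Convert a byte count to whole GiB, truncating the fraction."""
    return n_bytes // GIB


total_bytes = sum(subset_bytes.values())

for name, size in subset_bytes.items():
    print(f"{name}: {to_gib(size)} GiB")
print(f"Total: {total_bytes} bytes, {to_gib(total_bytes)} GiB")
```

Because each subset size is truncated separately, the three rounded values add up to 1408 while the total computed from bytes is 1409.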
The works vary in language (audio language, or the presence of a translated subtitle file), size, audio encoding format, etc.
Basic filesystem-level statistics:
| Subset | File count | Folder count |
|---|---|---|
| ASMR One | 6317 | 1017 |
| ASMR Two | 7435 | 760 |
| ASMR Three | 6694 | 1066 |
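Counts like those in the table could be gathered with a recursive directory walk. The following is a minimal sketch, not necessarily how the figures above were produced: it assumes the subset root itself is not counted as a folder and that symbolic links are not followed. The function name `count_entries` and the example path are hypothetical.

```python
import os
from typing import Tuple


def count_entries(root: str) -> Tuple[int, int]:
    """Return (file_count, folder_count) for everything below root.

    The root directory itself is not counted as a folder, and
    os.walk does not follow symbolic links by default.
    """
    file_count = 0
    folder_count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)
        file_count += len(filenames)
    return file_count, folder_count


# Example call (hypothetical path):
# files, folders = count_entries("/data/ASMROne")
```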
Average audio work size:
$$1409 \, \text{GiB} \div 330 \, \text{works} = 4.2\overline{69} \, \text{GiB/work}$$
That is approximately 4.27 GiB per work.
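As a sanity check, the average can be computed both from the rounded total and from the exact byte total. This sketch assumes binary gigabytes (1024³ bytes); the constant names are illustrative.

```python
GIB = 1024 ** 3  # assumption: sizes are binary gigabytes

TOTAL_BYTES = 1_513_027_927_512  # total size from the subset list
TOTAL_ROUNDED = 1409             # the same total, truncated to whole units
WORK_COUNT = 330

avg_from_rounded = TOTAL_ROUNDED / WORK_COUNT
avg_from_bytes = TOTAL_BYTES / WORK_COUNT / GIB

print(f"from rounded total: {avg_from_rounded:.2f} per work")
print(f"from byte total:    {avg_from_bytes:.2f} per work")
```

Both values round to 4.27, so using the truncated total does not change the result at two decimal places.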
In this project we will index only the following types of files:
- Audio
- Image
- Document
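One simple way to restrict indexing to these three types is to classify files by extension. The sketch below is only an illustration: the extension sets are assumptions, since the source does not list which formats count as audio, image, or document, and the names `INDEXED_TYPES` and `classify` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed extension sets; adjust them to the formats actually present in the dataset.
INDEXED_TYPES = {
    "audio": {".mp3", ".wav", ".flac", ".ogg", ".m4a"},
    "image": {".jpg", ".jpeg", ".png", ".webp"},
    "document": {".txt", ".pdf", ".vtt", ".lrc"},
}


def classify(path: str) -> Optional[str]:
    """Return 'audio', 'image', or 'document', or None if the file is not indexed."""
    suffix = Path(path).suffix.lower()  # case-insensitive extension match
    for kind, extensions in INDEXED_TYPES.items():
        if suffix in extensions:
            return kind
    return None


for name in ["track01.MP3", "cover.png", "readme.txt", "preview.mp4"]:
    print(name, "->", classify(name))
```

Files whose extension is in none of the sets return `None` and would be skipped by the indexer.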
An in-depth analysis of the dataset contents is located in LocalDatasetAnalysis.ipynb.