[{"content":"\nDescription Another parody of the covers of a well-known book publisher.\nThe work continues the folk art series of O RLY book covers.\nThe endangered species on the cover of this book is Brainlet, a member of Wojak\u0026rsquo;s kin.\n","permalink":"https://siv-radio.github.io/posts/2026-04-01-a-new-o-rly-book/","summary":"Finally, the deep learning book we deserve.","title":"A New O RLY Book"},{"content":"Introduction There is a usual way of solving classification problems with artificial neural network models, described in books [link], articles [link], courses [link], and library documentation [link], [link], [link]. It is based on selecting a pre-trained model, attaching a blank classifier head, and fine-tuning the result on a task-specific dataset. The idea for this work appeared during the development of a classifier with a varying number of classes. The list of classes was mostly the same after each update, but some classes could disappear and new ones could be included. Therefore, it was necessary to fine-tune a model each time on the whole set of data, with lots of old and only a few new instances. The problem here is that a classifier model stores class information in a tangled form, and it is hard to separate the embedding vector of one class from the embedding vectors of the others. The ability to separate the embeddings would provide a way to change only those embeddings that have to be changed, without fine-tuning the entire model. This is where embedding creation, similarity search, and vector databases come in.\nThe technique below is applied to an image classification task, but it also works well for sentence classification problems. 
This approach is a good first step for image and sentence classification tasks, while fine-tuning a classifier model is a possible second step if the results from the first step are not good enough.\nLibraries and environment This work was done in Python with some popular deep learning and supporting libraries.\nPyTorch is used as the core artificial neural network framework. timm (PyTorch Image Models) provides access to state-of-the-art computer vision models. Sentence Transformers provides a semantic search function. sklearn is used to build a confusion matrix. Pillow (Python Imaging Library (Fork)) is used for image processing. NumPy and SciPy are used as supporting mathematical tools. pandas is used for its Comma Separated Values (CSV) file support. tqdm renders progress bars. matplotlib is used for plotting. A standalone x86-64 computer with CUDA support was used during this work.\nCPU: Intel Core i5-9300H RAM: 16 GB, DDR4-3200, dual channel SSD: Intel 660p, 1 TB, NVMe GPU: NVIDIA GeForce RTX 2060 Mobile, 6 GB Operating system: Windows 10 Home 64-bit, version 22H2 CUDA Toolkit: v12.9 Package manager: Miniforge3 v25.3.0-1 IDE: Spyder IDE v6.0.5 To be clear, no Linux, Jupyter Notebook, or Google Colab were used.\nThe list of installed packages is shown below.\nVirtual environment name: ml-dev-2 channels: - pytorch - nvidia - conda-forge dependencies: - aiohappyeyeballs=2.6.1=pyhd8ed1ab_0 - aiohttp=3.11.18=py312h31fea79_0 - aiosignal=1.3.2=pyhd8ed1ab_0 - alabaster=1.0.0=pyhd8ed1ab_1 - annotated-types=0.7.0=pyhd8ed1ab_1 - anyio=4.9.0=pyh29332c3_0 - asgiref=3.8.1=pyhd8ed1ab_1 - asttokens=3.0.0=pyhd8ed1ab_1 - attrs=25.3.0=pyh71513ae_0 - autopep8=2.3.2=pyhd8ed1ab_0 - aws-c-auth=0.9.0=h3b843a2_4 - aws-c-cal=0.9.0=hd30f992_0 - aws-c-common=0.12.2=h2466b09_0 - aws-c-compression=0.3.1=hd30f992_4 - aws-c-event-stream=0.5.4=h12f1610_7 - aws-c-http=0.10.0=h74fe21f_0 - aws-c-io=0.18.1=hfd8e7f4_2 - aws-c-mqtt=0.12.3=hd3945f4_4 - 
aws-c-s3=0.7.16=hf4a4381_1 - aws-c-sdkutils=0.2.3=hd30f992_4 - aws-checksums=0.2.7=hd30f992_0 - aws-crt-cpp=0.32.4=hf35b9f3_2 - aws-sdk-cpp=1.11.510=h1c8c2b7_6 - babel=2.17.0=pyhd8ed1ab_0 - backoff=2.2.1=pyhd8ed1ab_1 - bcrypt=4.3.0=py312h2615798_0 - beautifulsoup4=4.13.4=pyha770c72_0 - blas=1.0=mkl - blinker=1.9.0=pyhff2d567_0 - brotli=1.1.0=h2466b09_2 - brotli-bin=1.1.0=h2466b09_2 - brotli-python=1.1.0=py312h275cf98_2 - bzip2=1.0.8=h2466b09_7 - c-ares=1.34.5=h2466b09_0 - ca-certificates=2025.4.26=h4c7d964_0 - cachetools=5.5.2=pyhd8ed1ab_0 - cairo=1.18.4=h5782bbf_0 - certifi=2025.1.31=pyhd8ed1ab_0 - cffi=1.17.1=py312h4389bb4_0 - charset-normalizer=3.4.2=pyhd8ed1ab_0 - chroma-hnswlib=0.7.6=py312hbaa7e33_1 - chromadb=1.0.7=py312h2cd702c_0 - click=8.1.8=pyh7428d3b_0 - cloudpickle=3.1.1=pyhd8ed1ab_0 - colorama=0.4.6=pyhd8ed1ab_1 - coloredlogs=15.0.1=pyhd8ed1ab_4 - comm=0.2.2=pyhd8ed1ab_1 - contourpy=1.3.2=py312hd5eb7cc_0 - cpython=3.12.10=py312hd8ed1ab_0 - cryptography=44.0.3=py312h9500af3_0 - cuda-cccl=12.9.27=0 - cuda-cccl_win-64=12.9.27=0 - cuda-cudart=12.4.127=0 - cuda-cudart-dev=12.4.127=0 - cuda-cupti=12.4.127=0 - cuda-libraries=12.4.1=0 - cuda-libraries-dev=12.4.1=0 - cuda-nvrtc=12.4.127=0 - cuda-nvrtc-dev=12.4.127=0 - cuda-nvtx=12.4.127=0 - cuda-opencl=12.9.19=0 - cuda-opencl-dev=12.9.19=0 - cuda-profiler-api=12.9.19=0 - cuda-runtime=12.4.1=0 - cuda-version=12.9=3 - cycler=0.12.1=pyhd8ed1ab_1 - datasets=3.5.0=pyhd8ed1ab_0 - debugpy=1.8.14=py312h275cf98_0 - decorator=5.2.1=pyhd8ed1ab_0 - deprecated=1.2.18=pyhd8ed1ab_0 - dill=0.3.8=pyhd8ed1ab_0 - dlfcn-win32=1.4.1=h63175ca_0 - dnspython=2.7.0=pyhff2d567_1 - docutils=0.21.2=pyhd8ed1ab_1 - double-conversion=3.3.1=he0c23c2_0 - durationpy=0.9=pyhd8ed1ab_1 - email-validator=2.2.0=pyhd8ed1ab_1 - email_validator=2.2.0=hd8ed1ab_1 - exceptiongroup=1.2.2=pyhd8ed1ab_1 - executing=2.2.0=pyhd8ed1ab_0 - faiss=1.8.0=py312cuda120h1068afa_1_cuda - faiss-gpu=1.8.0=h3722977_2 - fastapi=0.115.9=pyh29332c3_0 - 
fastapi-cli=0.0.7=pyhd8ed1ab_0 - filelock=3.18.0=pyhd8ed1ab_0 - flake8=7.2.0=pyhd8ed1ab_0 - flask=3.1.0=pyhd8ed1ab_1 - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 - font-ttf-inconsolata=3.000=h77eed37_0 - font-ttf-source-code-pro=2.038=h77eed37_0 - font-ttf-ubuntu=0.83=h77eed37_3 - fontconfig=2.15.0=h765892d_1 - fonts-conda-ecosystem=1=0 - fonts-conda-forge=1=0 - fonttools=4.57.0=py312h31fea79_0 - freetype=2.13.3=h57928b3_1 - frozenlist=1.5.0=py312h31fea79_1 - fsspec=2024.12.0=pyhd8ed1ab_0 - gdown=5.2.0=pyhd8ed1ab_1 - google-auth=2.39.0=pyhd8ed1ab_0 - googleapis-common-protos=1.70.0=pyhd8ed1ab_0 - graphite2=1.3.13=h63175ca_1003 - grpcio=1.71.0=py312h18946f6_1 - h11=0.16.0=pyhd8ed1ab_0 - h2=4.2.0=pyhd8ed1ab_0 - harfbuzz=11.1.0=h8796e6f_0 - hpack=4.1.0=pyhd8ed1ab_0 - httpcore=1.0.9=pyh29332c3_0 - httptools=0.6.4=py312h4389bb4_0 - httpx=0.28.1=pyhd8ed1ab_0 - huggingface_hub=0.30.2=pyhd8ed1ab_0 - humanfriendly=10.0=pyh7428d3b_8 - hyperframe=6.1.0=pyhd8ed1ab_0 - icu=75.1=he0c23c2_0 - idna=3.10=pyhd8ed1ab_1 - imagesize=1.4.1=pyhd8ed1ab_0 - importlib-metadata=8.6.1=pyha770c72_0 - importlib-resources=6.5.2=pyhd8ed1ab_0 - importlib_resources=6.5.2=pyhd8ed1ab_0 - iniconfig=2.0.0=pyhd8ed1ab_1 - intel-openmp=2025.1.0=h57928b3_980 - ipykernel=6.29.5=pyh4bbf305_0 - ipython=8.36.0=pyh9ab4c32_0 - ipywidgets=8.1.7=pyhd8ed1ab_0 - itsdangerous=2.2.0=pyhd8ed1ab_1 - jedi=0.19.2=pyhd8ed1ab_1 - jinja2=3.1.6=pyhd8ed1ab_0 - joblib=1.5.0=pyhd8ed1ab_0 - jsonschema=4.23.0=pyhd8ed1ab_1 - jsonschema-specifications=2025.4.1=pyh29332c3_0 - jupyter_client=8.6.3=pyhd8ed1ab_1 - jupyter_core=5.7.2=pyh5737063_1 - jupyterlab_widgets=3.0.15=pyhd8ed1ab_0 - khronos-opencl-icd-loader=2024.10.24=h2466b09_1 - kiwisolver=1.4.8=py312hc790b64_0 - krb5=1.21.3=hdf4eb48_0 - lcms2=2.17=hbcf6048_0 - lerc=4.0.0=h6470a55_1 - libabseil=20250127.1=cxx17_h4eb7d71_0 - libarrow=20.0.0=h10765f2_0_cpu - libarrow-acero=20.0.0=h7d8d6a5_0_cpu - libarrow-dataset=20.0.0=h7d8d6a5_0_cpu - libarrow-substrait=20.0.0=hb76e781_0_cpu - 
libblas=3.9.0=1_h8933c1f_netlib - libbrotlicommon=1.1.0=h2466b09_2 - libbrotlidec=1.1.0=h2466b09_2 - libbrotlienc=1.1.0=h2466b09_2 - libcblas=3.9.0=12_hb3dda5d_netlib - libclang13=20.1.4=default_h6e92b77_0 - libcrc32c=1.1.2=h0e60522_0 - libcublas=12.4.5.8=0 - libcublas-dev=12.4.5.8=0 - libcufft=11.2.1.3=0 - libcufft-dev=11.2.1.3=0 - libcurand=10.3.10.19=0 - libcurand-dev=10.3.10.19=0 - libcurl=8.13.0=h88aaa65_0 - libcusolver=11.6.1.9=0 - libcusolver-dev=11.6.1.9=0 - libcusparse=12.3.1.170=0 - libcusparse-dev=12.3.1.170=0 - libdeflate=1.23=h76ddb4d_0 - libevent=2.1.12=h3671451_1 - libexpat=2.7.0=he0c23c2_0 - libfaiss=1.8.0=cuda120h2ee710b_1_cuda - libffi=3.4.6=h537db12_1 - libfreetype=2.13.3=h57928b3_1 - libfreetype6=2.13.3=h0b5ce68_1 - libglib=2.84.1=h7025463_0 - libgoogle-cloud=2.36.0=hf249c01_1 - libgoogle-cloud-storage=2.36.0=he5eb982_1 - libgrpc=1.71.0=h8c3449c_1 - libhwloc=2.11.2=default_hc8275d1_1000 - libiconv=1.18=h135ad9c_1 - libintl=0.22.5=h5728263_3 - libjpeg-turbo=3.1.0=h2466b09_0 - liblapack=3.9.0=12_h13b7882_netlib - liblzma=5.8.1=h2466b09_0 - libnpp=12.2.5.30=0 - libnpp-dev=12.2.5.30=0 - libnvfatbin=12.9.19=0 - libnvfatbin-dev=12.9.19=0 - libnvjitlink=12.4.127=0 - libnvjitlink-dev=12.4.127=0 - libnvjpeg=12.3.1.117=0 - libnvjpeg-dev=12.3.1.117=0 - libparquet=20.0.0=ha850022_0_cpu - libpng=1.6.47=h7a4582a_0 - libprotobuf=5.29.3=he9d8c4a_1 - libpulsar=3.7.0=h5b24947_1 - libre2-11=2024.07.02=hd248061_3 - libsentencepiece=0.2.0=h98a84dd_11 - libsodium=1.0.20=hc70643c_0 - libsqlite=3.49.1=h67fdade_2 - libssh2=1.11.1=h9aa295b_0 - libthrift=0.21.0=hbe90ef8_0 - libtiff=4.7.0=h797046b_4 - libutf8proc=2.10.0=hf9b99b7_0 - libuv=1.50.0=h2466b09_0 - libwebp=1.5.0=h3b0e114_0 - libwebp-base=1.5.0=h3b0e114_0 - libxcb=1.16=h013a479_1 - libxml2=2.13.7=h442d1da_1 - libxslt=1.1.39=h3df6e99_0 - libzlib=1.3.1=h2466b09_2 - lz4-c=1.10.0=h2466b09_1 - m2w64-gcc-libgfortran=5.3.0=6 - m2w64-gcc-libs=5.3.0=7 - m2w64-gcc-libs-core=5.3.0=7 - m2w64-gmp=6.1.0=2 - 
m2w64-libwinpthread-git=5.0.0.4634.697f757=2 - markdown-it-py=3.0.0=pyhd8ed1ab_1 - markupsafe=3.0.2=py312h31fea79_1 - matplotlib=3.10.1=py312h2e8e312_0 - matplotlib-base=3.10.1=py312h90004f6_0 - matplotlib-inline=0.1.7=pyhd8ed1ab_1 - mccabe=0.7.0=pyhd8ed1ab_1 - mdurl=0.1.2=pyhd8ed1ab_1 - mkl=2023.1.0=h6a75c08_48682 - mmh3=5.1.0=py312h275cf98_1 - monotonic=1.6=pyhd8ed1ab_0 - mpmath=1.3.0=pyhd8ed1ab_1 - msys2-conda-epoch=20160418=1 - multidict=6.4.3=py312h31fea79_0 - multiprocess=0.70.16=py312h4389bb4_1 - munkres=1.1.4=pyh9f0ad1d_0 - nest-asyncio=1.6.0=pyhd8ed1ab_1 - networkx=3.4.2=pyh267e887_2 - numpy=1.26.4=py312h8753938_0 - oauthlib=3.2.2=pyhd8ed1ab_1 - onnxruntime=1.21.1=py312h414cfab_0_cpu - opencl-headers=2024.10.24=he0c23c2_0 - openjpeg=2.5.3=h4d64b90_0 - openssl=3.5.0=ha4e3fda_1 - opentelemetry-api=1.32.1=pyhd8ed1ab_0 - opentelemetry-exporter-otlp-proto-common=1.32.1=pyhd8ed1ab_0 - opentelemetry-exporter-otlp-proto-grpc=1.32.1=pyhd8ed1ab_0 - opentelemetry-instrumentation=0.53b1=pyhd8ed1ab_0 - opentelemetry-instrumentation-asgi=0.53b1=pyhd8ed1ab_0 - opentelemetry-instrumentation-fastapi=0.53b1=pyhd8ed1ab_0 - opentelemetry-proto=1.32.1=pyhd8ed1ab_0 - opentelemetry-sdk=1.32.1=pyhd8ed1ab_0 - opentelemetry-semantic-conventions=0.53b1=pyh3cfb1c2_0 - opentelemetry-util-http=0.53b1=pyhd8ed1ab_0 - orc=2.1.1=h35764e3_1 - orjson=3.10.18=py312h2615798_0 - overrides=7.7.0=pyhd8ed1ab_1 - packaging=25.0=pyh29332c3_1 - pandas=2.2.3=py312h72972c8_3 - parso=0.8.4=pyhd8ed1ab_1 - pcre2=10.44=h99c9b8b_2 - pickleshare=0.7.5=pyhd8ed1ab_1004 - pillow=10.4.0=py312h381445a_1 - pip=25.1.1=pyh8b19718_0 - pixman=0.46.0=had0cd8c_0 - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_2 - platformdirs=4.3.7=pyh29332c3_0 - pluggy=1.5.0=pyhd8ed1ab_1 - posthog=3.6.5=pyhd8ed1ab_0 - prompt-toolkit=3.0.51=pyha770c72_0 - propcache=0.3.1=py312h31fea79_0 - protobuf=5.29.3=py312h275cf98_0 - psutil=7.0.0=py312h4389bb4_0 - pthread-stubs=0.4=hcd874cb_1001 - pthreads-win32=2.9.1=h2466b09_4 - 
pulsar-client=3.6.1=py312he90df90_0 - pure_eval=0.2.3=pyhd8ed1ab_1 - pyarrow=20.0.0=py312h2e8e312_0 - pyarrow-core=20.0.0=py312h6a9c419_0_cpu - pyasn1=0.6.1=pyhd8ed1ab_2 - pyasn1-modules=0.4.2=pyhd8ed1ab_0 - pycodestyle=2.13.0=pyhd8ed1ab_0 - pycparser=2.22=pyh29332c3_1 - pydantic=2.11.3=pyh3cfb1c2_0 - pydantic-core=2.33.1=py312hfe1d9c4_0 - pyflakes=3.3.2=pyhd8ed1ab_0 - pygments=2.19.1=pyhd8ed1ab_0 - pyjwt=2.10.1=pyhd8ed1ab_0 - pyopenssl=25.0.0=pyhd8ed1ab_0 - pyparsing=3.2.3=pyhd8ed1ab_1 - pypika=0.48.9=pyhd8ed1ab_1 - pyproject_hooks=1.2.0=pyhd8ed1ab_1 - pyreadline3=3.5.4=py312hf8493c8_0 - pyside6=6.9.0=py312h520aab8_0 - pysocks=1.7.1=pyh09c184e_7 - pytest=8.3.5=pyhd8ed1ab_0 - pytest-mock=3.14.0=pyhd8ed1ab_1 - python=3.12.10=h3f84c4b_0_cpython - python-build=1.2.2.post1=pyhff2d567_1 - python-dateutil=2.9.0.post0=pyhff2d567_1 - python-dotenv=1.1.0=pyh29332c3_1 - python-flatbuffers=25.2.10=pyhbc23db3_0 - python-kubernetes=32.0.1=pyhd8ed1ab_0 - python-multipart=0.0.20=pyhff2d567_0 - python-tzdata=2025.2=pyhd8ed1ab_0 - python-xxhash=3.5.0=py312h4389bb4_2 - python_abi=3.12=7_cp312 - pytorch=2.5.1=py3.12_cuda12.4_cudnn9_0 - pytorch-cuda=12.4=h3fd98bf_7 - pytorch-ignite=0.5.1=pyh36561fd_0 - pytorch-mutex=1.0=cuda - pytz=2025.2=pyhd8ed1ab_0 - pyu2f=0.1.5=pyhd8ed1ab_1 - pywin32=307=py312h275cf98_3 - pyyaml=6.0.2=py312h31fea79_2 - pyzmq=26.4.0=py312hd7027bb_0 - qhull=2020.2=hc790b64_5 - qt6-main=6.9.0=h83cda92_1 - re2=2024.07.02=haf4117d_3 - referencing=0.36.2=pyh29332c3_0 - regex=2024.11.6=py312h4389bb4_0 - requests=2.32.3=pyhd8ed1ab_1 - requests-oauthlib=2.0.0=pyhd8ed1ab_1 - rich=14.0.0=pyh29332c3_0 - rich-toolkit=0.11.3=pyh29332c3_0 - roman-numerals-py=3.1.0=pyhd8ed1ab_0 - rpds-py=0.24.0=py312hfe1d9c4_0 - rsa=4.9.1=pyhd8ed1ab_0 - safetensors=0.5.3=py312h2615798_0 - scikit-learn=1.6.1=py312h816cc57_0 - scipy=1.15.2=py312h451d5c4_0 - sentence-transformers=4.1.0=pyhd8ed1ab_0 - sentencepiece=0.2.0=h459e5fc_11 - sentencepiece-python=0.2.0=py312h1f37e12_11 - 
sentencepiece-spm=0.2.0=h98a84dd_11 - setuptools=80.1.0=pyhff2d567_0 - shellingham=1.5.4=pyhd8ed1ab_1 - six=1.17.0=pyhd8ed1ab_0 - snappy=1.2.1=h500f7fa_1 - sniffio=1.3.1=pyhd8ed1ab_1 - snowballstemmer=2.2.0=pyhd8ed1ab_0 - soupsieve=2.7=pyhd8ed1ab_0 - sphinx=8.2.3=pyhd8ed1ab_0 - sphinxcontrib-applehelp=2.0.0=pyhd8ed1ab_1 - sphinxcontrib-devhelp=2.0.0=pyhd8ed1ab_1 - sphinxcontrib-htmlhelp=2.1.0=pyhd8ed1ab_1 - sphinxcontrib-jsmath=1.0.1=pyhd8ed1ab_1 - sphinxcontrib-qthelp=2.0.0=pyhd8ed1ab_1 - sphinxcontrib-serializinghtml=1.1.10=pyhd8ed1ab_1 - spyder-kernels=3.0.3=win_pyh7428d3b_0 - stack_data=0.6.3=pyhd8ed1ab_1 - starlette=0.45.3=pyha770c72_0 - sympy=1.14.0=pyh04b8f61_5 - tbb=2021.13.0=h62715c5_1 - tenacity=9.1.2=pyhd8ed1ab_0 - threadpoolctl=3.6.0=pyhecae5ae_0 - timm=1.0.15=pyhd8ed1ab_0 - tk=8.6.13=h5226925_1 - tokenizers=0.21.1=py312h21dd274_0 - tomli=2.2.1=pyhd8ed1ab_1 - tornado=6.4.2=py312h4389bb4_0 - tqdm=4.67.1=pyhd8ed1ab_1 - traitlets=5.14.3=pyhd8ed1ab_1 - transformers=4.51.3=pyhd8ed1ab_0 - typer=0.15.3=pyhf21524f_0 - typer-slim=0.15.3=pyh29332c3_0 - typer-slim-standard=0.15.3=h1a15894_0 - typing-extensions=4.13.2=h0e9735f_0 - typing-inspection=0.4.0=pyhd8ed1ab_0 - typing_extensions=4.13.2=pyh29332c3_0 - typing_utils=0.1.0=pyhd8ed1ab_1 - tzdata=2025b=h78e105d_0 - ucrt=10.0.22621.0=h57928b3_1 - unicodedata2=16.0.0=py312h4389bb4_0 - urllib3=2.4.0=pyhd8ed1ab_0 - uvicorn=0.34.2=pyh5737063_0 - uvicorn-standard=0.34.2=h5737063_0 - vc=14.3=h2b53caa_26 - vc14_runtime=14.42.34438=hfd919c2_26 - vs2015_runtime=14.42.34438=h7142326_26 - watchfiles=1.0.5=py312h2615798_0 - wcwidth=0.2.13=pyhd8ed1ab_1 - websocket-client=1.8.0=pyhd8ed1ab_1 - websockets=15.0.1=py312h4389bb4_0 - werkzeug=3.1.3=pyhd8ed1ab_1 - wheel=0.45.1=pyhd8ed1ab_1 - widgetsnbextension=4.0.14=pyhd8ed1ab_0 - win_inet_pton=1.1.0=pyh7428d3b_8 - wrapt=1.17.2=py312h4389bb4_0 - xorg-libxau=1.0.11=hcd874cb_0 - xorg-libxdmcp=1.1.3=hcd874cb_0 - xxhash=0.8.3=hbba6f48_0 - yaml=0.2.5=h8ffe710_2 - 
yarl=1.20.0=py312h31fea79_0 - zeromq=4.3.5=ha9f60a1_7 - zipp=3.21.0=pyhd8ed1ab_1 - zstandard=0.23.0=py312h4389bb4_2 - zstd=1.5.7=hbeecb71_2 - pip: - torch-lr-finder==0.2.2 - torchaudio==2.5.1 - torchvision==0.20.1 Algorithm description Building the system:\nSeparate the whole dataset into 2 or 3 datasets: training, validation, and test. Make it possible to iterate through dataset items related to a particular class. Choose a data transformation algorithm that will be used to preprocess each dataset item. Data augmentation techniques can be used. The final choice can be made after experimenting. Choose an artificial neural network model that can produce embedding vectors out of dataset items. There can be several candidate models for this role, and the final choice can be made after experimenting. Choose a way of creating a single embedding vector out of the item set related to a class. The vector represents the whole class.\nNote. In the general case, there can be multiple ways of embedding vector creation and several embedding vectors representing one class. That is not the case in this blog post. Choose a database organization to store and search embedding vectors. The main feature of the database here is its algorithm for finding similarity between a query vector and stored vectors. Several alternatives may be considered.\nNote. In the general case, there can be several embedding vectors for a query object. Construct an embedding vector creation algorithm. It must be able to fill a database with the vectors created out of a training dataset. Construct a validation / test algorithm to check how well the system works on unseen data. Different metrics can be used, but accuracy may be considered the main one. Construct an algorithm to process arbitrary queries to the database. Database creation:\nRun the embedding vector creation algorithm on a training dataset in order to build a vector database. 
For each class, the algorithm creates an embedding vector out of the training dataset. All items of the same class eventually produce only one embedding vector, which represents this class. Then the algorithm places the vectors into the database. Check the accuracy of the system and other metrics on a validation dataset. Items from the validation dataset are converted into embedding vectors, which are compared with the vectors inside the previously created database in order to determine the predicted labels of the validation data. If the known target label of a validated item coincides with the label of the respective database response, then the prediction is correct. The more correct predictions there are, the higher the accuracy. Using the system:\nUse the arbitrary query processing algorithm. When a query occurs, the model is used to create a new embedding vector from an input object. Then the vector most similar to the query vector is found in the vector database. The label of the found database vector is the answer to the query.\nBuilding datasets Sourav Banerjee\u0026rsquo;s Animal Image Dataset (90 Different Animals) is used in this work. This non-standard dataset has been chosen for several reasons. It is small enough to be processed on an average personal computer, has a few flaws (which makes it realistic), does not have enough data to provide stable validation results, and has no ready-to-use application programming interface. These features make working with this dataset similar to some real-world tasks. On the other hand, dozens of (newbie) researchers have confirmed its usability in their works. Of course, another dataset, or a subset of a popular dataset like ImageNet, could have been used instead.\nData augmentation techniques can be applied to virtually enlarge this small dataset: in particular, resizing, cropping, horizontal flipping, and color jittering. Data augmentation can be applied to training, validation, and test datasets. 
Whether it should be applied or not has to be discovered during research. In this work, only one data-augmented instance of each item can be used for database creation. In the general case, there can be multiple data-augmented instances of each item.\nLoading and preprocessing images from a solid-state drive is the most time-consuming part of script execution in this work. It is worth considering a caching mechanism that keeps loaded data in memory for tasks like this one. However, this may be difficult in the general case, particularly if a dataset is too big to be fully placed in random access memory. Caching is not used here because it does not suit data augmentation techniques well. Caching could probably be used if the original images were downsized and stored in PIL or Tensor formats prepared for further transformations.\nScript execution time can be reduced using the multiprocessing feature of a \u0026ldquo;torch.utils.data.DataLoader\u0026rdquo; object. In this case, data loading can occur in separate processes. It is inapplicable if reproducibility is required, since multiprocessing works differently on Linux and NT systems, which leads to different results of random number generator calls [link].\nEmbedding generation At the time of starting this work, there was a quick and simple way to get a notion of the best publicly available models for image classification: Papers With Code. At the moment of writing this text (August 2025), the site link redirects to its GitHub page; the site has been shut down since June 2025 [link]. This is bad for future work, but for this one, the interesting materials are still accessible via the Internet Archive [link]. Some of the best models for image classification are available via the timm library. 
In particular, the lists timm Top-20 Fastest Models, Fastest timm models \u0026gt; 80% Top-1 ImageNet-1k, and Fastest timm models \u0026gt; 83% ImageNet-1k Top-1 were considered for their small size and relatively high accuracy. It is worth noting here that these models were trained on the ImageNet-1k dataset, which has images of 1000 classes, while Banerjee\u0026rsquo;s Animal Image Dataset has only 90 classes. Therefore, these models can potentially show higher accuracy here than on the ImageNet-1k dataset.\nIn this work, some models produce significantly higher accuracy if a new task-specific head is trained (without backbone fine-tuning) during transfer learning rather than when they are used for embedding creation; the EfficientViT family is an example.\nExperiments showed that distilled TinyViT models [link], [link], [link] are far ahead (by \u0026gt; 4 % accuracy) of the other considered models in this particular task. These models have been chosen as the main models for this work (see table 1).\nTable 1. Distilled TinyViT models for embedding creation.\nModel name | Params (M) | GMACs | Activations (M) | Image size | Embedding size\ntiny_vit_5m_224.dist_in22k_ft_in1k | 5.4 | 1.2 | 9.3 | 224 x 224 | 320\ntiny_vit_11m_224.dist_in22k_ft_in1k | 11.0 | 1.9 | 10.7 | 224 x 224 | 448\ntiny_vit_21m_224.dist_in22k_ft_in1k | 21.2 | 4.1 | 15.9 | 224 x 224 | 576\nNotes: 1) the models are pre-trained on the ImageNet-22k dataset and fine-tuned on ImageNet-1k; 2) GMACs - giga multiply-accumulate operations per inference.\nCreation of a vector that represents each class in a database is a task similar to pooling. Different techniques can be applied to a set of vectors of the same size to create one output vector. One of the most obvious is averaging. 
Suppose there is a set of embedding vectors of one class\n\\begin{align}\rV_{i}=\\begin{pmatrix} v_{i,0} \u0026amp; \\cdots \u0026amp; v_{i,n-1} \\end{pmatrix},\rS=\\begin{pmatrix} V_{0} \\\\ \\cdots \\\\ V_{m-1} \\end{pmatrix},\rS=\\begin{pmatrix}\rv_{0,0} \u0026amp; \\cdots \u0026amp; v_{0,n-1} \\\\\r\\cdots \u0026amp; \\cdots \u0026amp; \\cdots \\\\\rv_{m-1,0} \u0026amp; \\cdots \u0026amp; v_{m-1,n-1}\r\\end{pmatrix}, \\nonumber\r\\end{align}\nwhere m - the number of embedding vectors, n - the size of an embedding vector, Vi - the i-th embedding vector, vi,j - the j-th element of the i-th embedding vector, S - a matrix of size (m x n) that contains the embedding vectors. Then the average embedding vector is defined as\n\\begin{align}\rV_{avg}=\\frac{1}{m}\\begin{pmatrix} \\displaystyle\\sum_{i=0}^{m-1}v_{i,0} \u0026amp; \\cdots \u0026amp; \\displaystyle\\sum_{i=0}^{m-1}v_{i,n-1} \\end{pmatrix}. \\nonumber\r\\end{align}\nThis technique was applied as the first alternative in this work and yielded satisfying results. Other techniques were not tested.\nThere are several measures for comparing a query vector with database vectors in practice. For example, cosine similarity from the Sentence Transformers library [link], Euclidean / L2 distance from the FAISS library [link], and inner product from the FAISS library [link]. In this work, only cosine similarity was tested. It does not require using a separate database class; a \u0026ldquo;torch.Tensor\u0026rdquo; object stores the database vectors. The author\u0026rsquo;s previous experience suggests that the accuracy and performance of a system with FAISS may be very similar to this one, but verification of this hypothesis is beyond the scope of this work. 
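To make the averaging and cosine-similarity search concrete, here is a minimal self-contained sketch. It is pure Python with made-up toy vectors standing in for real model embeddings; the helper names (rand_vec, average, cosine, classify) are illustrative and not from the project code, and the work itself keeps the database in a torch.Tensor rather than plain lists.

```python
import math
import random

random.seed(0)
EMB_SIZE = 8  # tiny embedding size, for illustration only

def rand_vec(center: float) -> list[float]:
    # Stand-in for a model-produced embedding vector (made-up numbers).
    return [random.gauss(center, 1.0) for _ in range(EMB_SIZE)]

# Hypothetical per-class embedding sets: several vectors per class.
class_items = {"cat": [rand_vec(3.0) for _ in range(5)],
               "dog": [rand_vec(-3.0) for _ in range(5)]}

def average(vectors: list[list[float]]) -> list[float]:
    # Element-wise mean over equally sized vectors: the V_avg formula above.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Database creation: one average vector per class.
database = {label: average(items) for label, items in class_items.items()}

def classify(query: list[float]) -> str:
    # The label of the most cosine-similar database vector is the answer.
    return max(database, key=lambda label: cosine(query, database[label]))

print(classify(rand_vec(3.0)))  # a "cat"-like query maps to "cat"
```

Adding or removing a class amounts to adding or removing one entry of the database dictionary, which is exactly the separability that motivated this approach.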
FAISS becomes more useful when there are many more vectors in a database and direct search methods are inapplicable.\nDifferent subsets of data items (images in this case) are used for database creation and validation. A validation procedure helps to assess the accuracy of a search in the database. Besides accuracy, top-k accuracy is used here as an additional measure of system quality.\nSimulation results and their precision Suppose 80 % of 5400 images are used for database creation, 10 % for validation, and 10 % for the final test. Hence, only 540 images are used for validation. The total number of classes is 90, which means there are 6 images per class in the validation dataset. One misclassified image costs 0.185 % of total accuracy and 16.7 % of class accuracy. In practice, validation results are unstable and strongly depend on the validation subset selection. Accuracy may vary by ± 2 % from a single validation result. That is, after getting one validation accuracy value, another simulation with another validation subset may bring an accuracy value that differs by ± 2 % from the first one. Different techniques exist to mitigate this kind of problem.\nCross-validation is based on the idea of iteratively using different subsets of one dataset for training and validation. A final validation metric (for example, accuracy) can be estimated as the average value of the validation metric measured in a series of experiments. If 80 % of a dataset is for training and 20 % is for validation, then there are 5 disjoint validation subsets, which means that all available data will be used for the final validation results. However, if 5 measured values are not enough to calculate the final value with the required confidence, another data partitioning technique should be used.\nBootstrap resampling is a technique of iteratively using random dataset partitioning to create training and validation subsets. 
It requires many iterations to build statistics that show the distribution of metric values by percentiles. It does not require preliminary knowledge of the distribution law, which is a big advantage of this technique.\nEach iteration of dataset partitioning requires execution of the training and validation procedures, which consume computational resources and time. In order to reduce the number of iterations, knowledge of, or a hypothesis about, the distribution law may be used. Several values of a validation metric can then be enough to calculate the distribution parameters with the required confidence. For example, suppose that the accuracy calculated for different validation subsets can be approximately described by Student\u0026rsquo;s t-distribution. Of course, unlike the t-distribution, which has infinite tails, accuracy values are distributed between 0 and 1 inclusively. Probably, the most precise way is to find the true distribution law and get its parameters from experimental data; a slightly worse way is to use something like a metalog distribution, which can approximate a wide variety of statistics. However, the t-distribution is used here for its simplicity, as an approximation near the mean value. 
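A sketch of the bootstrap-percentile idea, assuming a list of accuracy values has already been collected from repeated random dataset partitioning; here the measurements are simulated with a random generator purely for illustration, one value standing in for each partitioning iteration:

```python
import random

# Stand-in for accuracy values measured over 1000 random train / validation
# partitionings; real measurements are replaced by simulated ones here.
random.seed(42)
acc = sorted(random.gauss(0.951, 0.003) for _ in range(1000))

# Percentiles of the collected metric values give an interval estimate
# without assuming any distribution law.
lo, hi = acc[24], acc[974]  # ~2.5th and ~97.5th percentiles
print(f"accuracy is within [{lo:.4f}, {hi:.4f}] in ~95 % of partitionings")
```

The price is that each of the 1000 values would require a full database-creation-plus-validation run, which is exactly the cost the distribution-law shortcut below tries to avoid.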
A set of expressions to calculate the mean accuracy value is given below\n\\begin{align}\r\\mu_{samp}=\\frac{1}{n}\\displaystyle\\sum_{i=0}^{n-1}x_{i},\r\\sigma_{samp}=\\sqrt{\\frac{1}{n-1}\\displaystyle\\sum_{i=0}^{n-1}(x_{i}-\\mu_{samp})^2}, \\nonumber\r\\end{align}\n\\begin{align}\rp(\\mu_{samp}-\\delta_{req}\u0026lt;\\mu_{true}\u0026lt;\\mu_{samp}+\\delta_{req})=p_{req},\r\\delta_{req}=t(p_{req},n-1)\\frac{\\sigma_{samp}}{\\sqrt{n}}, \\nonumber\r\\end{align}\n\\begin{align}\r\\mu_{true}=\\mu_{samp}\\pm\\delta_{req} \\text{ with } p_{req} \\text{ confidence}, \\nonumber\r\\end{align}\nwhere xi - the i-th experimental value of an observed variable, n - the size of a sample, μsamp and μtrue - the sample and true mean values, σsamp and σtrue - the unbiased sample and true standard deviations, δreq - the margin of error, preq - the required confidence level, t(preq, n – 1) - the two-sided critical value for a given confidence level preq and (n – 1) degrees of freedom.\nTheoretically, it may be possible to improve the accuracy of this system by fine-tuning a model on a classification task using this particular dataset. The shortage of data, imprecise results, and the already achieved high accuracy values complicate this task. A single validation accuracy value after fine-tuning of the classifier is not enough to assess the progress, because the value will very likely fall inside the interval where 95 % of the values lie. To mitigate this issue, the model can be fine-tuned and validated several times in order to reduce the confidence interval of the accuracy mean value. 
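As a worked example of the margin-of-error expressions above, suppose n = 20 accuracy measurements at a 95 % confidence level; the two-sided critical value t(0.95, 19) ≈ 2.093 is taken from a t-distribution table, and the accuracy values below are made up for illustration:

```python
import math
import statistics

# Made-up accuracy values from n = 20 simulations.
acc = [0.946, 0.957, 0.950, 0.953, 0.948, 0.952, 0.955, 0.949, 0.951, 0.954,
       0.947, 0.950, 0.953, 0.951, 0.949, 0.956, 0.948, 0.952, 0.950, 0.951]

n = len(acc)
mu_samp = statistics.mean(acc)        # sample mean
sigma_samp = statistics.stdev(acc)    # unbiased sample standard deviation
t_crit = 2.093                        # t(0.95, 19), from a t-table
delta_req = t_crit * sigma_samp / math.sqrt(n)  # margin of error

print(f"accuracy = {mu_samp:.4f} +/- {delta_req:.4f} (95 % confidence)")
```

Note that statistics.stdev already uses the (n - 1) denominator, matching the unbiased sample standard deviation in the formula.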
This problem may be a subject of future work.\nTable 2 presents simulation results for 3 distilled TinyViT models. In practice, it is much easier to notice the difference in accuracy between the 5M and 11M models than between the 11M and 21M models. It is interesting to see how the error rate decreases when a larger model is used.\n\\text{5M to 11M: } (0.966-0.951)/(1-0.951)\\approx0.306.\n\\text{11M to 21M: } (0.973-0.966)/(1-0.966)\\approx0.206.\nIt may not be obvious, but despite the small accuracy differences, the larger models actually provide a significant improvement in prediction quality by reducing the error rate.\nTable 2. Simulation results for distilled TinyViT models.\nModel name | Encoder params (M) | Accuracy | Top-4 accuracy\ntiny_vit_5m_224.dist_in22k_ft_in1k | 5.07 | 0.951 ± 0.002 | 0.993 ± 0.001\ntiny_vit_11m_224.dist_in22k_ft_in1k | 10.5 | 0.966 ± 0.002 | 0.995 ± 0.001\ntiny_vit_21m_224.dist_in22k_ft_in1k | 20.6 | 0.973 ± 0.002 | 0.996 ± 0.001\nNotes: 1) 80 % of the whole dataset is for training / encoding and 20 % is for validation; 2) the average accuracy is based on 20 simulations with arbitrary dataset partitioning.\nModel accuracy shows the average prediction accuracy across all classes. This number does not show whether all classes have similar accuracy or some classes have significantly higher or lower accuracy than the average. It is not a big deal to assess prediction accuracy when there are, for example, 2 or 10 classes, because it does not take too much time to check the accuracy of every class. The problem steadily grows as the number of classes rises.\nIn probability theory, there is the cumulative distribution function (CDF), which gives the probability that a random variable is not greater than a specific value. 
It is a non-negative, monotonically non-decreasing function bounded between 0 and 1. A similar function can describe the dependency of accuracy on the share of classes. The share of classes and the accuracy of each class are real numbers in the [0, 1] interval. For a given share of classes, the per-class accuracy is not greater than the corresponding value. For an ideal classifier, it is a horizontal line with a value of 1. The function can be called a cumulative accuracy function (CAF). The idea of this function originally came from an attempt to assess the quality of a classifier with tens of thousands of classes. The function used in this work shows the share of problematic classes.\nThe cumulative accuracy function in picture 1 shows that some classes are significantly less predictable than others. For example, a 0.35 share of the classes has accuracy not greater than 0.9167. Here are the classes with accuracy ≤ 0.8: duck (0.500), rat (0.583), possum (0.583), whale (0.667), mouse (0.750), and ox (0.750). This information can be used for further data analysis and system improvement.\nPicture 1. Cumulative accuracy function of a system based on the TinyViT 5M model.\nPicture 2 shows some examples of images with their true and predicted labels. Most such examples show only correct predictions, but this instance was selected because it contains an incorrect one. Probable reasons for the incorrect prediction are the resemblance between mice and rats (not every human neural network can distinguish a mouse from a rat) and the low quality of the picture. On the other hand, the system somehow managed to identify a flamingo in that cropped and gridded picture of a (paper?) flamingo model.\nPicture 2. Some predictions of a system based on the TinyViT 5M model.\nProject files The project files are listed below in alphabetical order with short descriptions.\nbanerjee.py\n# Copyright (C) 2025 Igor Sivchek\n# Licensed under the MIT License.\n# See license text at [https://opensource.org/license/mit].
\u0026#34;\u0026#34;\u0026#34; Tools to work with Sourav Banerjee\u0026#39;s Animal Image Dataset (90 Different Animals) v5. Changes in the directory structure of the dataset. ------------------------------------------------------- original -\u0026gt; this work ------------------------------------------------------- animals/ banerjee-animal-90/ animals/ animals/ name of the animals.txt names-of-the-animals.txt ------------------------------------------------------- Requires Python \u0026gt;= 3.12. References: 1. \u0026#34;Animal Image Dataset (90 Different Animals)\u0026#34;, by Sourav Banerjee, v5, 2022.07.17. https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals 2. \u0026#34;Datasets \u0026amp; DataLoaders\u0026#34;, PyTorch v2.7.0+cu126, 2024.11.05. https://docs.pytorch.org/tutorials/beginner/basics/data_tutorial.html 3. \u0026#34;pathlib - Object-oriented filesystem paths\u0026#34;, Python v3.13.3, 2025.06.02. https://docs.python.org/3/library/pathlib.html 4. \u0026#34;Built-in Exceptions\u0026#34;, Python v3.13.3, 2025.06.02. https://docs.python.org/3/library/exceptions.html \u0026#34;\u0026#34;\u0026#34; from copy import deepcopy import pathlib from typing import Any, Callable, Optional, Union import PIL import timm import torch import torchvision __all__ = [\u0026#34;Animal90\u0026#34;, \u0026#34;DatasetMaker\u0026#34;, \u0026#34;augment_data\u0026#34;] DATASET_PATH = \u0026#34;../data/banerjee-animal-90\u0026#34; DATASET_DIR = \u0026#34;animals\u0026#34; # LABEL_FILE = \u0026#34;names-of-the-animals.txt\u0026#34; # Type aliases for better readability. LabelIdx = int ImageIdx = int ActLabelIdx = int # Active label index. ActImageIdx = int # Active image index. Label = str ImageName = str # It has a method to select only necessary classes from a dataset. # Each label has its own unique internal number. External (active) label # numbers have sequential numeration after selecting a subset of labels. 
These # external numbers may differ from the internal numbers. class Animal90(torch.utils.data.Dataset): def __init__( self, *, dataset_path: pathlib.Path, images: list[tuple[ImageName, LabelIdx]], labels: list[tuple[Label, ImageIdx, ImageIdx]], pil_images: bool = True, transform: Optional[Callable] = None ) -\u0026gt; None: self.__dataset_path = dataset_path # All images and labels. self.__labels = labels self.__images = images # Output data type of images: PIL (True) or Tensor (False). self.__pil_images = pil_images # A transformation applied to each image. self.__transform = transform # Provide images with only specified label indexes. self.__act_labels: list[tuple[LabelIdx, ActImageIdx, ActImageIdx]] self.__act_images: list[tuple[ImageIdx, ActLabelIdx]] self.use_labels() def __len__(self) -\u0026gt; ActImageIdx: return len(self.__act_images) def __getitem__( self, idx: ActImageIdx ) -\u0026gt; tuple[Union[torch.Tensor, PIL.ImageFile.ImageFile], ActLabelIdx]: # The main drawback here is that it loads an image by each call. 
image_idx, act_label_idx = self.__act_images[idx] image_name, label_idx = self.__images[image_idx] label = self.__labels[label_idx][0] image_path = self.__dataset_path.joinpath(label, image_name) if self.__pil_images: # image: PIL.ImageFile.ImageFile image = PIL.Image.open(image_path) else: # image: torch.Tensor image = torchvision.io.decode_image(image_path) if self.__transform is not None: image = self.__transform(image) return image, act_label_idx def get_num_labels(self) -\u0026gt; LabelIdx: return len(self.__labels) def get_num_act_labels(self) -\u0026gt; ActLabelIdx: return len(self.__act_labels) def act_label_idx_to_label_idx( self, act_label_idx: ActLabelIdx ) -\u0026gt; LabelIdx: return self.__act_labels[act_label_idx][0] def get_label(self, act_label_idx: ActLabelIdx) -\u0026gt; Label: label_idx = self.__act_labels[act_label_idx][0] return self.__labels[label_idx][0] def get_labels(self) -\u0026gt; list[Label]: # O(n), where n = len(self.__labels). return [label for label, _, _ in self.__labels] def get_act_labels(self) -\u0026gt; list[tuple[Label, LabelIdx]]: # O(n), where n = len(self.__labels) (the worst case). return [ (self.__labels[label_idx][0], label_idx) for label_idx, _, _ in self.__act_labels ] def get_image_path(self, act_image_idx: ActImageIdx) -\u0026gt; pathlib.Path: image_idx = self.__act_images[act_image_idx][0] image_name, label_idx = self.__images[image_idx] label = self.__labels[label_idx][0] return self.__dataset_path.joinpath(label, image_name) def get_num_images(self) -\u0026gt; ImageIdx: return len(self.__images) def get_num_images_by_label( self, act_label_idx: ActLabelIdx ) -\u0026gt; ActImageIdx: begin_act_image_idx, end_act_image_idx = ( self.__act_labels[act_label_idx][1:3] ) return end_act_image_idx - begin_act_image_idx def get_state(self) -\u0026gt; dict[str, list]: # Warning: it does not contain image transformations. 
return dict( images=deepcopy(self.__images), labels=deepcopy(self.__labels), act_labels=[label_idx for label_idx, _, _ in self.__act_labels], ) def get_transform(self) -\u0026gt; Optional[Callable]: return self.__transform def set_transform(self, transform: Optional[Callable]) -\u0026gt; None: self.__transform = transform def use_labels( self, *, label_indexes: Optional[list[LabelIdx]] = None, share: Optional[float] = None ) -\u0026gt; None: # share: 0.1112 -\u0026gt; 10 labels out of 90; 0.89 -\u0026gt; 80 labels out of 90. if label_indexes is not None and share is not None: raise ValueError( \u0026#34;An attempt to select labels by indexes and by share simultaneously.\u0026#34; ) if share is not None: if share \u0026lt; 0.0 or 1.0 \u0026lt; share: raise ValueError( f\u0026#34;A share must be in [0, 1] interval, but given: {share}\u0026#34; ) num_act_labels = int(share * len(self.__labels)) label_indexes = ( torch.randperm(n=len(self.__labels)) .narrow(dim=0, start=0, length=num_act_labels) .tolist() ) if label_indexes is None: # Use all labels by default. 
label_indexes = list(range(len(self.__labels))) if not isinstance(label_indexes, list): label_indexes = list(label_indexes) label_indexes.sort() self.__act_labels = list() self.__act_images = list() for act_label_idx, label_idx in enumerate(label_indexes): label, begin_image_idx, end_image_idx = self.__labels[label_idx] begin_act_image_idx = len(self.__act_images) self.__act_images.extend([ (image_idx, act_label_idx) for image_idx in range(begin_image_idx, end_image_idx) ]) end_act_image_idx = len(self.__act_images) self.__act_labels.append(( label_idx, begin_act_image_idx, end_act_image_idx )) def set_image_type(self, t: str) -\u0026gt; None: t = t.lower() if t == \u0026#34;pil\u0026#34;: self.__pil_images = True elif t == \u0026#34;tsr\u0026#34; or t == \u0026#34;tensor\u0026#34;: self.__pil_images = False else: raise ValueError(f\u0026#34;An unknown image type: {t}\u0026#34;) class DatasetMaker: def __init__(self, *, dataset_path: Union[str, pathlib.Path]) -\u0026gt; None: if not isinstance(dataset_path, pathlib.Path): dataset_path = pathlib.Path(dataset_path) self.__dataset_path = dataset_path self.__labels: list[tuple[Label, ImageIdx, ImageIdx]] self.__images: list[tuple[ImageName, LabelIdx]] # Set ``labels`` and ``images`` variables. self.__enum_images() def make_datasets( self, *, train_transform: Optional[Callable] = None, eval_transform: Optional[Callable] = None, valid_share: float = 0.1, test_share: float = 0.1 ) -\u0026gt; tuple[Animal90, Animal90, Animal90]: # train_share = 1.0 - valid_share - test_share # Check that the share values are in [0, 1] interval. # Check that there is at least one image per label. 
train_images = list() # list[tuple[ImageName, LabelIdx]] valid_images = list() test_images = list() train_labels = list() # list[tuple[LabelName, ImageIdx, ImageIdx]] valid_labels = list() test_labels = list() for label, begin_image_idx, end_image_idx in self.__labels: num_images = end_image_idx - begin_image_idx num_valid_images = int(num_images * valid_share) num_test_images = int(num_images * test_share) num_train_images = num_images - num_valid_images - num_test_images indexes = torch.randperm(n=num_images) indexes.add_(begin_image_idx) train_indexes, valid_indexes, test_indexes = indexes.split([ num_train_images, num_valid_images, num_test_images ]) # train_indexes.sort() # valid_indexes.sort() # test_indexes.sort() # Training dataset. begin_train_image_idx = len(train_images) train_images.extend([ self.__images[train_idx] for train_idx in train_indexes ]) end_train_image_idx = len(train_images) train_labels.append(( label, begin_train_image_idx, end_train_image_idx )) # Validation dataset. begin_valid_image_idx = len(valid_images) valid_images.extend([ self.__images[valid_idx] for valid_idx in valid_indexes ]) end_valid_image_idx = len(valid_images) valid_labels.append(( label, begin_valid_image_idx, end_valid_image_idx )) # Test dataset. begin_test_image_idx = len(test_images) test_images.extend([ self.__images[test_idx] for test_idx in test_indexes ]) end_test_image_idx = len(test_images) test_labels.append(( label, begin_test_image_idx, end_test_image_idx )) # A training dataset may have data augmentation. train_ds = Animal90( dataset_path=self.__dataset_path, images=train_images, labels=train_labels, transform=train_transform ) # There should be no data augmentation in validation and test datasets. 
valid_ds = Animal90( dataset_path=self.__dataset_path, images=valid_images, labels=valid_labels, transform=eval_transform ) test_ds = Animal90( dataset_path=self.__dataset_path, images=test_images, labels=test_labels, transform=eval_transform ) return train_ds, valid_ds, test_ds def restore_datasets( self, *, train_ds_state: Optional[dict[str, list]] = None, valid_ds_state: Optional[dict[str, list]] = None, test_ds_state: Optional[dict[str, list]] = None, train_transform: Optional[Callable] = None, eval_transform: Optional[Callable] = None, ) -\u0026gt; tuple[Optional[Animal90], Optional[Animal90], Optional[Animal90]]: if train_ds_state is not None: train_ds = Animal90( dataset_path=self.__dataset_path, images=train_ds_state[\u0026#34;images\u0026#34;], labels=train_ds_state[\u0026#34;labels\u0026#34;], transform=train_transform ) train_ds.use_labels(label_indexes=train_ds_state[\u0026#34;act_labels\u0026#34;]) else: train_ds = None if valid_ds_state is not None: valid_ds = Animal90( dataset_path=self.__dataset_path, images=valid_ds_state[\u0026#34;images\u0026#34;], labels=valid_ds_state[\u0026#34;labels\u0026#34;], transform=eval_transform ) valid_ds.use_labels(label_indexes=valid_ds_state[\u0026#34;act_labels\u0026#34;]) else: valid_ds = None if test_ds_state is not None: test_ds = Animal90( dataset_path=self.__dataset_path, images=test_ds_state[\u0026#34;images\u0026#34;], labels=test_ds_state[\u0026#34;labels\u0026#34;], transform=eval_transform ) test_ds.use_labels(label_indexes=test_ds_state[\u0026#34;act_labels\u0026#34;]) else: test_ds = None return train_ds, valid_ds, test_ds def get_num_labels(self) -\u0026gt; LabelIdx: return len(self.__labels) def __enum_images(self) -\u0026gt; None: # ``pathlib.Path.walk`` added in Python 3.12. 
dataset_iter = pathlib.Path.walk(self.__dataset_path) # -\u0026gt; # -\u0026gt; (dirpath, dirnames, filenames) # next(dataset_iter) -\u0026gt; # -\u0026gt; (Path(\u0026#34;\u0026lt;path\u0026gt;/animals\u0026#34;), [\u0026#34;antelope\u0026#34;, ..., \u0026#34;zebra\u0026#34;], []) labels = next(dataset_iter)[1] self.__labels = list() self.__images = list() for label_idx, (root, dirs, files) in enumerate(dataset_iter): # next(dataset_iter) -\u0026gt; # -\u0026gt; (Path(\u0026#34;\u0026lt;path\u0026gt;/animals/\u0026lt;label\u0026gt;\u0026#34;), [], [\u0026#34;\u0026lt;image.jpg\u0026gt;\u0026#34;, ...]) # Each label: (label, begin_image_idx, end_image_idx) label = labels[label_idx] begin_image_idx = len(self.__images) for file in files: self.__images.append((file, label_idx)) end_image_idx = len(self.__images) self.__labels.append((label, begin_image_idx, end_image_idx)) def augment_data( data_config: dict[str, Any], ) -\u0026gt; torchvision.transforms.transforms.Compose: # data_config[\u0026#34;input_size\u0026#34;] -\u0026gt; (channels, height, width) cropping_size = data_config[\u0026#34;input_size\u0026#34;][1:3] resizing_size = int(1.05 * max(cropping_size)) data_mean = data_config[\u0026#34;mean\u0026#34;] data_std = data_config[\u0026#34;std\u0026#34;] return torchvision.transforms.transforms.Compose([ torchvision.transforms.transforms.Resize( size=resizing_size, interpolation=torchvision.transforms.InterpolationMode.BICUBIC, max_size=None, antialias=True ), torchvision.transforms.transforms.CenterCrop(size=cropping_size), torchvision.transforms.transforms.RandomHorizontalFlip(p=0.5), torchvision.transforms.transforms.ColorJitter( brightness=(0.6, 1.4), contrast=(0.6, 1.4), saturation=(0.6, 1.4) ), timm.data.transforms.MaybeToTensor(), torchvision.transforms.transforms.Normalize( mean=torch.Tensor(data_mean), std=torch.Tensor(data_std) ), ]) if __name__ == \u0026#34;__main__\u0026#34;: dataset_path = pathlib.Path(DATASET_PATH) / DATASET_DIR dsm = 
DatasetMaker(dataset_path=dataset_path) train_ds, valid_ds, test_ds = dsm.make_datasets() The main class is \u0026ldquo;DatasetMaker\u0026rdquo; that randomly splits the whole Banerjee\u0026rsquo;s dataset into training, validation, and test datasets with the \u0026ldquo;make_datasets\u0026rdquo; method. \u0026ldquo;restore_datasets\u0026rdquo; can restore datasets using their previously stored states.\n\u0026ldquo;Animal90\u0026rdquo; is a dataset class with necessary methods to build PyTorch data loaders and operate with underlying data.\n\u0026ldquo;augment_data\u0026rdquo; is a data augmentation function used to randomly transform the original dataset images.\ncumacc.py # Copyright (C) 2025 Igor Sivchek # Licensed under the MIT License. # See license text at [https://opensource.org/license/mit]. \u0026#34;\u0026#34;\u0026#34; Cumulative accuracy function. This metric may be valuable for multiclass classification problems. \u0026#34;\u0026#34;\u0026#34; import matplotlib.pyplot as plt import numpy as np import scipy __all__ = [\u0026#34;cumulative_accuracy\u0026#34;] # It is like the cumulative distribution function in probability theory. # x-axis - the share of classes or the number of classes. # y-axis - accuracy. # At a particular share of classes, the accuracy is not greater than the # respective value. # In an ideal case, it is a horizontal line with a value of 1.0. 
def cumulative_accuracy( *, accuracies: np.ndarray, num_intervals: int = 10 ) -\u0026gt; tuple[np.ndarray]: if num_intervals \u0026lt; 1: raise ValueError( \u0026#34;The number of intervals must be not less than 1, but given: {0}\u0026#34; .format(num_intervals) ) num_pts = num_intervals + 1 sorted_accs = np.sort(accuracies) shares = np.linspace(start=0.0, stop=1.0, num=num_pts) pts_per_interv = len(sorted_accs) / num_intervals cum_accs = np.zeros(num_pts, dtype=np.float32) for idx in range(1, num_pts): cum_num_pts = int(idx * pts_per_interv) acc_idx = (cum_num_pts - 1 if cum_num_pts \u0026gt; 0 else 0) cum_accs[idx] = sorted_accs[acc_idx] return cum_accs, shares if __name__ == \u0026#39;__main__\u0026#39;: labels = np.array([2, 3, 4, 5, 6, 7, 8], dtype=np.int64) # accuracies = np.array([0.25, 0.5, 1.0, 0.75, 0.8, 0.4, 0.6, 0.65, 0.9, 0.85, 0.95, 0.7, 0.55, 0.2, 0.8, 0.6, 0.85, 0.9, 0.95, 0.7, 0.45, 0.8, 0.95, 0.75, 0.1], dtype=np.float32) accuracies = np.array([0.2, 0.5, 0.75, 0.8, 0.4, 0.6, 0.65, 0.9, 0.85, 0.95, 0.7, 0.55, 0.2, 0.8, 0.6, 0.85, 0.9, 0.95, 0.7, 0.45, 0.8, 0.95, 0.75, 0.1], dtype=np.float32) cum_accs, shares = cumulative_accuracy(accuracies=accuracies) shares_cs = np.linspace(shares[0], shares[-1], 100) cum_accs_cs = scipy.interpolate.CubicSpline(shares, cum_accs) fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 8), constrained_layout=True) fig.suptitle(\u0026#34;Cumulative accuracy function\u0026#34;) ax.plot(shares, cum_accs) ax.plot(shares_cs, cum_accs_cs(shares_cs)) # ax.set_xlim([0.0, 1.0]) # ax.set_ylim([0.0, 1.0]) ax.set_xlabel(\u0026#34;share\u0026#34;) ax.set_ylabel(\u0026#34;accuracy\u0026#34;, rotation=\u0026#34;horizontal\u0026#34;) ax.yaxis.set_label_coords(0.0, 1.05) ax.grid() fig.show() Implementation of a cumulative accuracy function. The function is calculated from an accuracy array for a selected number of points.\nengine.py # Copyright (C) 2025 Igor Sivchek # Licensed under the MIT License. 
# See license text at [https://opensource.org/license/mit]. \u0026#34;\u0026#34;\u0026#34; Tools to work with artificial neural networks and datasets. Creation and usage of a vector database. \u0026#34;\u0026#34;\u0026#34; import datetime from types import SimpleNamespace from typing import Any, Callable, Optional, Union import numpy as np from sentence_transformers.util import semantic_search import torch from tqdm import tqdm __all__ = [\u0026#34;Encoder\u0026#34;] class Encoder: def __init__( self, *, model: torch.nn.Module, device: torch.device, train_dl: torch.utils.data.Dataset, valid_dl: torch.utils.data.Dataset, test_dl: torch.utils.data.Dataset, config: SimpleNamespace, ) -\u0026gt; None: self.model = model.to(device) self.device = device self.train_dl = train_dl self.valid_dl = valid_dl self.test_dl = test_dl self.config = config self.embeddings: Optional[torch.Tensor] = None self.labels: Optional[np.ndarray] = None # It uses all labels of a training dataset. def encode(self, *, inform: bool = False) -\u0026gt; dict[str, Any]: starting_time = datetime.datetime.now() if inform: print( \u0026#34;[Encode/TrainDS] Embedding creation started at: {0}\u0026#34; .format(starting_time) ) embeddings = list() labels = list() total_num_proc_batches = 0 total_num_proc_samples = 0 pbar = tqdm( iterable=range(self.train_dl.dataset.get_num_labels()), desc=\u0026#34;[Encode/TrainDS]\u0026#34;, leave=True, disable=(not inform), position=0, ) self.model.eval() with torch.inference_mode(): for label_idx in pbar: # Process the dataset by active labels. self.train_dl.dataset.use_labels(label_indexes=[label_idx]) label_embeddings = list() for batch in self.train_dl: # Process images with a specific label. 
if ( self.config.short_run_batches is not None and self.config.short_run_batches \u0026lt;= total_num_proc_batches ): if inform: pbar.write( \u0026#34;[Encode/TrainDS] A short run completed: {0} batches processed.\u0026#34; .format(total_num_proc_batches) ) break # type(batch) -\u0026gt; list # type(batch[0]) -\u0026gt; torch.Tensor # batch[0].shape -\u0026gt; torch.Size([batch_size, num_channels, img_height, img_width]) # batch[0].dtype -\u0026gt; torch.float32 # type(batch[1]) -\u0026gt; torch.Tensor # batch[1].shape -\u0026gt; torch.Size([batch_size]) # batch[1].dtype -\u0026gt; torch.int64 # Check that there is no another active label index (all # active indexes are 0). assert(batch[1].sum() == 0) inputs = batch[0].to(self.device) outputs = self.model(inputs) label_embeddings.append(outputs.cpu().numpy()) # type(outputs) -\u0026gt; torch.Tensor # outputs.shape -\u0026gt; torch.Size([batch_size, embedding_dim]) # outputs.dtype -\u0026gt; torch.float32 total_num_proc_batches += 1 total_num_proc_samples += len(batch[1]) if len(label_embeddings) \u0026gt; 0: label_embeddings = np.concatenate(label_embeddings) average_label_embedding = label_embeddings.mean(axis=0) embeddings.append(average_label_embedding) labels.append(self.train_dl.dataset.get_label(0)) else: # Short run; last batch was not processed. break pbar.set_description(\u0026#34;[Encode/TrainDS] Progress\u0026#34;) # Ideally, a context manager should be used here to restore the # original set of labels. self.train_dl.dataset.use_labels() # Use all labels. 
assert( self.train_dl.dataset.get_num_act_labels() == self.train_dl.dataset.get_num_labels() ) self.labels = np.array(labels) self.embeddings = ( torch .from_numpy(np.array(embeddings)) .to(self.device) ) assert(self.labels.shape[0] == self.embeddings.shape[0]) num_trainable_params = sum( p.numel() for p in self.model.parameters() if p.requires_grad ) embedding_dim = self.model.head_hidden_size ending_time = datetime.datetime.now() duration = ending_time - starting_time if inform: print( \u0026#34;[Encode/TrainDS] Embedding creation finished at: {0}\u0026#34; .format(ending_time) ) stats = dict( model_name=self.model.default_cfg[\u0026#34;hf_hub_id\u0026#34;], num_trainable_params=num_trainable_params, embedding_dim=embedding_dim, dataset_part=\u0026#34;train\u0026#34;, num_labels=self.labels.shape[0], num_images=total_num_proc_samples, duration=duration, ) if inform: total_num_labels = self.train_dl.dataset.get_num_labels() total_num_images = self.train_dl.dataset.get_num_images() print(\u0026#34;-- Stats --\u0026#34;) print(\u0026#34;Model name:\u0026#34;, stats[\u0026#34;model_name\u0026#34;]) print( \u0026#34;Number of trainable model parameters:\u0026#34;, stats[\u0026#34;num_trainable_params\u0026#34;] ) print(\u0026#34;Size of an embedding vector:\u0026#34;, stats[\u0026#34;embedding_dim\u0026#34;]) print(\u0026#34;Dataset part:\u0026#34;, stats[\u0026#34;dataset_part\u0026#34;]) print( \u0026#34;Number of processed labels: {0}/{1}\u0026#34; .format(stats[\u0026#34;num_labels\u0026#34;], total_num_labels) ) print( \u0026#34;Number of processed images: {0}/{1}\u0026#34; .format(stats[\u0026#34;num_images\u0026#34;], total_num_images) ) print(\u0026#34;Calculation duration:\u0026#34;, stats[\u0026#34;duration\u0026#34;]) print(\u0026#34;-----------\u0026#34;) return stats def evaluate( self, *, dataset_part: str, inform: bool = False, ) -\u0026gt; dict[str, Any]: dataset_part = dataset_part.lower() if dataset_part == \u0026#34;train\u0026#34;: dataloader = 
self.train_dl elif dataset_part == \u0026#34;valid\u0026#34;: dataloader = self.valid_dl elif dataset_part == \u0026#34;test\u0026#34;: dataloader = self.test_dl else: raise ValueError(\u0026#34;An unknown dataset part: {0}\u0026#34;.format(dataset_part)) starting_time = datetime.datetime.now() dataset_part_disp = dataset_part.capitalize() + \u0026#34;DS\u0026#34; if inform: print( \u0026#34;[Eval/{0}] Accuracy calculation started at: {1}\u0026#34; .format(dataset_part_disp, starting_time) ) all_predictions = list() all_targets = list() total_num_proc_samples = 0 total_num_corr_preds = 0 total_num_corr_topk_preds = 0 pbar = tqdm( iterable=dataloader, desc=f\u0026#34;[Eval/{dataset_part_disp}]\u0026#34;, leave=True, disable=(not inform), position=0, ) self.model.eval() with torch.inference_mode(): for batch_idx, batch in enumerate(pbar): # Process the dataset by image batches. if ( self.config.short_run_batches is not None and self.config.short_run_batches \u0026lt;= batch_idx ): if inform: pbar.write( \u0026#34;[Eval/{0}] A short run completed: {1} batches processed.\u0026#34; .format(dataset_part_disp, batch_idx) ) break inputs = batch[0].to(self.device) targets = batch[1] outputs = self.model(inputs) hits = semantic_search(outputs, self.embeddings, top_k=self.config.topk) num_corr_preds = 0 num_corr_topk_preds = 0 for prediction, target in zip(hits, targets): predicted_label_indexes = [p[\u0026#34;corpus_id\u0026#34;] for p in prediction] all_predictions.append(predicted_label_indexes[0]) all_targets.append(target.item()) num_corr_preds += int(predicted_label_indexes[0] == target) num_corr_topk_preds += int(target in predicted_label_indexes) total_num_proc_samples += len(targets) total_num_corr_preds += num_corr_preds total_num_corr_topk_preds += num_corr_topk_preds accuracy = total_num_corr_preds / total_num_proc_samples pbar.set_description(f\u0026#34;[Eval/{dataset_part_disp}] Accuracy: {accuracy:.4}\u0026#34;) accuracy = total_num_corr_preds / 
total_num_proc_samples topk_accuracy = total_num_corr_topk_preds / total_num_proc_samples num_trainable_params = sum( p.numel() for p in self.model.parameters() if p.requires_grad ) embedding_dim = self.model.head_hidden_size ending_time = datetime.datetime.now() duration = ending_time - starting_time if inform: print( \u0026#34;[Eval/{0}] Accuracy calculation finished at: {1}\u0026#34; .format(dataset_part_disp, ending_time) ) stats = dict( model_name=self.model.default_cfg[\u0026#34;hf_hub_id\u0026#34;], num_trainable_params=num_trainable_params, embedding_dim=embedding_dim, dataset_part=dataset_part, targets=all_targets, predictions=all_predictions, num_labels=len(set(all_targets)), num_images=total_num_proc_samples, accuracy=accuracy, topk=self.config.topk, topk_accuracy=topk_accuracy, duration=duration, ) if inform: total_num_labels = dataloader.dataset.get_num_act_labels() total_num_images = len(dataloader.dataset) print(\u0026#34;-- Stats --\u0026#34;) print(\u0026#34;Model name:\u0026#34;, stats[\u0026#34;model_name\u0026#34;]) print( \u0026#34;Number of trainable model parameters:\u0026#34;, stats[\u0026#34;num_trainable_params\u0026#34;] ) print(\u0026#34;Size of an embedding vector:\u0026#34;, stats[\u0026#34;embedding_dim\u0026#34;]) print(\u0026#34;Dataset part:\u0026#34;, stats[\u0026#34;dataset_part\u0026#34;]) print( \u0026#34;Number of processed labels: {0}/{1}\u0026#34; .format(stats[\u0026#34;num_labels\u0026#34;], total_num_labels) ) print( \u0026#34;Number of processed images: {0}/{1}\u0026#34; .format(stats[\u0026#34;num_images\u0026#34;], total_num_images) ) print(\u0026#34;Accuracy:\u0026#34;, stats[\u0026#34;accuracy\u0026#34;]) print(f\u0026#34;Top {stats[\u0026#39;topk\u0026#39;]} accuracy: {stats[\u0026#39;topk_accuracy\u0026#39;]}\u0026#34;) print(\u0026#34;Calculation duration:\u0026#34;, stats[\u0026#34;duration\u0026#34;]) print(\u0026#34;-----------\u0026#34;) return stats def infer( self, *, inputs: torch.Tensor, ) -\u0026gt; 
list[list[dict[str, Any]]]: # type(batch) -\u0026gt; list # type(batch[0]) -\u0026gt; torch.Tensor # batch[0].shape -\u0026gt; torch.Size([batch_size, num_channels, img_height, img_width]) # batch[0].dtype -\u0026gt; torch.float32 # type(batch[1]) -\u0026gt; torch.Tensor # batch[1].shape -\u0026gt; torch.Size([batch_size]) # batch[1].dtype -\u0026gt; torch.int64 self.model.eval() with torch.inference_mode(): inputs = inputs.to(self.device) outputs = self.model(inputs) hits = semantic_search(outputs, self.embeddings, top_k=self.config.topk) return hits Creation and validation of a vector database are implemented in the \u0026ldquo;Encoder\u0026rdquo; class. It works with a given artificial neural network model, data loaders (for training, validation, and test), and configuration settings. Its main methods inform the user about the current progress of the respective long-running operation.\nThe \u0026ldquo;encode\u0026rdquo; method creates a vector database from the training dataset. The model produces an embedding vector for each dataset item (i.e., a transformed image); these vectors are then averaged per item class (cats, dogs, etc.).\nThe \u0026ldquo;evaluate\u0026rdquo; method calculates prediction accuracy on a selected dataset. The model produces an embedding vector for each request, and a semantic search then finds the most similar vectors in the database. It reports both accuracy and top-k accuracy and is mainly useful for validation.\nThe \u0026ldquo;infer\u0026rdquo; method processes arbitrary user requests. The logic is similar to that of the \u0026ldquo;evaluate\u0026rdquo; method, but it takes arbitrary input data instead of a particular dataset.\np1_similarity_search_rsch.py # Copyright (C) 2025 Igor Sivchek # Licensed under the MIT License. # See license text at [https://opensource.org/license/mit]. \u0026#34;\u0026#34;\u0026#34; Part 1: similarity search using a pretrained model. References: 1. 
\u0026#34;Getting Started with PyTorch Image Model (timm): a practitioner\u0026#39;s guide\u0026#34;, by Chris Hughes, 2022.02.01, https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055-2/ 2. \u0026#34;Getting Started With Embeddings\u0026#34;, by Omar Espejel, 2022.06.23. https://huggingface.co/blog/getting-started-with-embeddings 3. \u0026#34;TinyViT: Fast Pretraining Distillation for Small Vision Transformer\u0026#34;, by Wu et al., 2022.07.21. https://arxiv.org/abs/2207.10666 A TinyViT image classification model. Pretrained on ImageNet-22k with distillation and fine-tuned on ImageNet-1k by paper authors. https://huggingface.co/timm/tiny_vit_21m_224.dist_in22k_ft_in1k 4. \u0026#34;MobileNetV4: Universal Models for the Mobile Ecosystem\u0026#34;, by Qin et al., 2024.09.29. https://arxiv.org/abs/2404.10518 A MobileNet-V4 image classification model. Trained on ImageNet-1k by Ross Wightman. https://huggingface.co/timm/mobilenetv4_conv_small.e2400_r224_in1k 5. \u0026#34;Searching for MobileNetV3\u0026#34;, by Howard et al., 2019.11.20. https://arxiv.org/abs/1905.02244 A MobileNet-v3 image classification model. Trained on ImageNet-1k in timm using recipe template described below. https://huggingface.co/timm/mobilenetv3_small_100.lamb_in1k 6. \u0026#34;EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention\u0026#34;, by Liu et al., 2023.05.11. https://arxiv.org/abs/2305.07027 An EfficientViT (MSRA) image classification model. Trained on ImageNet-1k by paper authors. 
https://huggingface.co/timm/efficientvit_m0.r224_in1k https://huggingface.co/timm/efficientvit_m1.r224_in1k https://huggingface.co/timm/efficientvit_m2.r224_in1k https://huggingface.co/timm/efficientvit_m3.r224_in1k https://huggingface.co/timm/efficientvit_m4.r224_in1k \u0026#34;\u0026#34;\u0026#34; import datetime import pathlib import random from types import SimpleNamespace from typing import Any, Callable, Optional, Union import numpy as np from sklearn.metrics import multilabel_confusion_matrix import timm import torch from tqdm import tqdm import banerjee import cumacc import engine import randomness import store import utils import visual AUGMENT_ENCODE_DATA = False AUGMENT_EVAL_DATA = False BATCH_SIZE = 32 DATASET_PATH = \u0026#34;../data/banerjee-animal-90\u0026#34; DATASET_DIR = \u0026#34;animals\u0026#34; DATA_AUGMENTATION = False EMBEDDINGS_FILENAME = \u0026#34;embeddings, tiny_vit_11m_224\u0026#34; # 196_000 bytes FORCE_CPU = False # MODEL_NAME = \u0026#34;efficientvit_m0.r224_in1k\u0026#34; # Params: 2.3M. Valid acc.: 0.67, n = 1. # MODEL_NAME = \u0026#34;efficientvit_m1.r224_in1k\u0026#34; # Params: 3.0M. Valid acc.: 0.82, n = 1. # MODEL_NAME = \u0026#34;efficientvit_m2.r224_in1k\u0026#34; # Params: 4.2M. Valid acc.: 0.80, n = 1. # MODEL_NAME = \u0026#34;efficientvit_m3.r224_in1k\u0026#34; # Params: 6.9M. Valid acc.: 0.69, n = 1. # MODEL_NAME = \u0026#34;efficientvit_m4.r224_in1k\u0026#34; # Params: 8.8M. Valid acc.: 0.68, n = 1. # MODEL_NAME = \u0026#34;mobilenetv3_small_100.lamb_in1k\u0026#34; # Params: 2.5M. Valid acc.: 0.84, n = 1. # MODEL_NAME = \u0026#34;mobilenetv4_conv_small.e2400_r224_in1k\u0026#34; # Params: 3.8M. Valid acc.: 0.87, n = 1. # MODEL_NAME = \u0026#34;tiny_vit_5m_224.dist_in22k_ft_in1k\u0026#34; # Params: 5.4M. Valid acc.: 0.9513 +/- 0.0021 with 0.95 confidence, n = 20. MODEL_NAME = \u0026#34;tiny_vit_11m_224.dist_in22k_ft_in1k\u0026#34; # Params: 11.0M. Valid acc.: 0.9664 +/- 0.0021 with 0.95 confidence, n = 20. 
# MODEL_NAME = \u0026#34;tiny_vit_21m_224.dist_in22k_ft_in1k\u0026#34; # Params: 21.2M. Valid acc.: 0.973 +/- 0.0023 with 0.95 confidence, n = 20. NUM_WORKERS = 0 # 0 - no multiprocessing. PROFILE = False # Do profiling. PROFILING_FILENAME = \u0026#34;profiling-dump\u0026#34; RANDOM_SEED = 17 REPRODUCIBLE = True SAVE_EMBEDDINGS = False SHORT_RUN = False SHORT_RUN_BATCHES = 4 STUDENT_N = 20 # STUDENT_T = 3.182 # n = 4, 2-sided confidence 0.95. # STUDENT_T = 2.365 # n = 8, 2-sided confidence 0.95. STUDENT_T = 2.093 # n = 20, 2-sided confidence 0.95. STUDENT_TWO_SIDED_CONFIDENCE = 0.95 TOPK = 4 #%% Script. if __name__ == \u0026#34;__main__\u0026#34;: print(\u0026#34;Part 1: similarity search using a pretrained model.\u0026#34;) #%% Create a model and a dataset maker. device = utils.get_device(force_cpu=FORCE_CPU) if REPRODUCIBLE: randomness.set_determinism( seed=RANDOM_SEED, use_deterministic_algorithms=False ) tgen = randomness.make_torch_generator(seed=RANDOM_SEED) else: tgen = randomness.make_torch_generator() # https://huggingface.co/docs/timm/reference/models#timm.create_model model = timm.create_model( model_name=MODEL_NAME, pretrained=True, # Load model parameters. num_classes=0, # Remove the model head. # global_pool=\u0026#34;\u0026#34;, # Remove pooling. ) # \u0026#34;discrepancy between number of features\u0026#34; by Elsospi, 2024.09.13. # https://huggingface.co/timm/mobilenetv4_conv_small.e2400_r224_in1k/discussions/3 # model.conv_head = torch.nn.Identity() # if hasattr(model, \u0026#34;conv_norm\u0026#34;): # model.conv_norm = torch.nn.Identity() # model.to(device) # model.eval() # model.training -\u0026gt; False # Get model specific transforms (normalization, resizing). 
data_config = timm.data.resolve_model_data_config(model) if AUGMENT_ENCODE_DATA: encode_transform = banerjee.augment_data(data_config) else: encode_transform = timm.data.create_transform( **data_config, is_training=False, ) if AUGMENT_EVAL_DATA: eval_transform = banerjee.augment_data(data_config) else: eval_transform = timm.data.create_transform( **data_config, is_training=False, ) # transforms = timm.data.create_transform(**data_config, is_training=False) # type(transforms) -\u0026gt; torchvision.transforms.transforms.Compose # print(transforms) -\u0026gt; # Compose( # Resize(size=235, interpolation=bicubic, max_size=None, antialias=True) # CenterCrop(size=(224, 224)) # MaybeToTensor() # Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250])) # ) # Note: this pipeline of transfroms is aimed to work with a PIL image # object as input and produces a PyTorch tensor object as output. dataset_path = pathlib.Path(DATASET_PATH) / DATASET_DIR dsm = banerjee.DatasetMaker(dataset_path=dataset_path) config = dict( short_run_batches=(SHORT_RUN_BATCHES if SHORT_RUN else None), topk=TOPK, ) config = SimpleNamespace(**config) encoder = engine.Encoder( model=model, device=device, train_dl=None, valid_dl=None, test_dl=None, config=config, ) #%% Create embeddings and calculate accuracy on training and validation # datasets. 
with utils.Profile(active=PROFILE) as profiler: run_results = list() pbar = tqdm( iterable=range(STUDENT_N), desc=\u0026#34;[Run]\u0026#34;, leave=True, disable=(not True), position=0, ) for run_idx in pbar: train_ds, valid_ds, _ = dsm.make_datasets( train_transform=encode_transform, eval_transform=eval_transform, valid_share=0.2, test_share=0.0, ) train_dl = torch.utils.data.DataLoader( dataset=train_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS, worker_init_fn=randomness.seed_worker, generator=tgen ) valid_dl = torch.utils.data.DataLoader( dataset=valid_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS, worker_init_fn=randomness.seed_worker, generator=tgen ) encoder.train_dl = train_dl encoder.valid_dl = valid_dl encres = encoder.encode(inform=True) evres_train = encoder.evaluate(dataset_part=\u0026#34;Train\u0026#34;, inform=True) evres_valid = encoder.evaluate(dataset_part=\u0026#34;Valid\u0026#34;, inform=True) run_results.append(dict( encode=encres, eval_train=evres_train, eval_valid=evres_valid )) profiler.dump(filename=PROFILING_FILENAME, sort_by=\u0026#34;cumtime\u0026#34;) #%% Calculate the final stats in case of a series of simulations. if len(run_results) \u0026gt; 1: total_num_valid_labels = valid_dl.dataset.get_num_act_labels() total_num_valid_images = len(valid_dl.dataset) total_valid_accuracy = 0.0 total_valid_topk_accuracy = 0.0 total_valid_duration = datetime.timedelta(0.0) for runres in run_results: total_valid_accuracy += runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;accuracy\u0026#34;] total_valid_topk_accuracy += runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;topk_accuracy\u0026#34;] total_valid_duration += runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;duration\u0026#34;] total_valid_accuracy /= len(run_results) total_valid_topk_accuracy /= len(run_results) # Standard deviations of a sample. 
total_valid_accuracy_std_dev = 0.0 total_valid_topk_accuracy_std_dev = 0.0 for runres in run_results: total_valid_accuracy_std_dev += (runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;accuracy\u0026#34;] - total_valid_accuracy)**2 total_valid_topk_accuracy_std_dev += (runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;topk_accuracy\u0026#34;] - total_valid_topk_accuracy)**2 # With Bessel\u0026#39;s correction. total_valid_accuracy_std_dev = np.sqrt(total_valid_accuracy_std_dev / (len(run_results) - 1)) total_valid_topk_accuracy_std_dev = np.sqrt(total_valid_topk_accuracy_std_dev / (len(run_results) - 1)) # Confidence interval. total_valid_accuracy_ci = STUDENT_T * total_valid_accuracy_std_dev / np.sqrt(len(run_results)) total_valid_topk_accuracy_ci = STUDENT_T * total_valid_topk_accuracy_std_dev / np.sqrt(len(run_results)) final_stats = dict( model_name=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;model_name\u0026#34;], num_trainable_params=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;num_trainable_params\u0026#34;], embedding_dim=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;embedding_dim\u0026#34;], dataset_part=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;dataset_part\u0026#34;], num_labels=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;num_labels\u0026#34;], num_images=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;num_images\u0026#34;], accuracy=total_valid_accuracy, accuracy_confidence=STUDENT_TWO_SIDED_CONFIDENCE, accuracy_ci=total_valid_accuracy_ci, topk=run_results[0][\u0026#34;eval_valid\u0026#34;][\u0026#34;topk\u0026#34;], topk_accuracy=total_valid_topk_accuracy, topk_accuracy_ci=total_valid_topk_accuracy_ci, duration=total_valid_duration, ) # A couple of interesting variables to explore manually. 
valid_accuracies = [runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;accuracy\u0026#34;] for runres in run_results] valid_topk_accuracies = [runres[\u0026#34;eval_valid\u0026#34;][\u0026#34;topk_accuracy\u0026#34;] for runres in run_results] print(\u0026#34;-- Final stats --\u0026#34;) print(\u0026#34;Model name:\u0026#34;, final_stats[\u0026#34;model_name\u0026#34;]) print( \u0026#34;Number of trainable model parameters:\u0026#34;, final_stats[\u0026#34;num_trainable_params\u0026#34;] ) print(\u0026#34;Size of an embedding vector:\u0026#34;, final_stats[\u0026#34;embedding_dim\u0026#34;]) print(\u0026#34;Dataset part:\u0026#34;, final_stats[\u0026#34;dataset_part\u0026#34;]) print( \u0026#34;Number of processed labels: {0}/{1}\u0026#34; .format(final_stats[\u0026#34;num_labels\u0026#34;], total_num_valid_labels) ) print( \u0026#34;Number of processed images: {0}/{1}\u0026#34; .format(final_stats[\u0026#34;num_images\u0026#34;], total_num_valid_images) ) print( \u0026#34;Accuracy: {0:.4} +/- {1:.2} with {2:.3} confidence\u0026#34; .format( final_stats[\u0026#34;accuracy\u0026#34;], final_stats[\u0026#34;accuracy_ci\u0026#34;], final_stats[\u0026#34;accuracy_confidence\u0026#34;] ) ) print( \u0026#34;Top {0} accuracy: {1:.4} +/- {2:.2} with {3:.3} confidence\u0026#34; .format( final_stats[\u0026#39;topk\u0026#39;], final_stats[\u0026#39;topk_accuracy\u0026#39;], final_stats[\u0026#34;topk_accuracy_ci\u0026#34;], final_stats[\u0026#34;accuracy_confidence\u0026#34;] ) ) print(\u0026#34;Calculation duration:\u0026#34;, final_stats[\u0026#34;duration\u0026#34;]) print(\u0026#34;-----------------\u0026#34;) #%% Save embeddings. if SAVE_EMBEDDINGS: embeddings_filename = pathlib.Path(\u0026#34;../data/models/\u0026#34;) / (MODEL_NAME + \u0026#34;.embed\u0026#34;) store.save_embeddings( embeddings=encoder.embeddings.cpu().numpy(), targes=encoder.labels, filename=embeddings_filename, inform=True, ) #%% Cumulative accuracy function. 
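The hard-coded STUDENT_T constants and the interval formula used above can be cross-checked with SciPy; a minimal sketch on hypothetical per-run accuracies (`mean_with_ci` is an illustrative helper, not part of the script):

```python
import numpy as np
from scipy import stats

def mean_with_ci(samples, confidence=0.95):
    """Sample mean with a two-sided Student-t confidence interval half-width."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    mean = samples.mean()
    std_dev = samples.std(ddof=1)  # Bessel's correction, as in the script.
    t = stats.t.ppf((1.0 + confidence) / 2.0, df=n - 1)
    return mean, t * std_dev / np.sqrt(n)

# The t quantile for n = 20 runs matches the hard-coded STUDENT_T value.
print(round(float(stats.t.ppf(0.975, df=19)), 3))  # 2.093

# Hypothetical per-run validation accuracies.
accs = [0.95, 0.96, 0.97, 0.96]
mean, ci = mean_with_ci(accs)
```

Computing the quantile directly avoids keeping a table of commented-out STUDENT_T values in sync with STUDENT_N.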
all_label_indexes = np.arange( start=0, stop=evres_valid[\u0026#34;num_labels\u0026#34;], step=1 ) mcm_valid = multilabel_confusion_matrix( y_true=evres_valid[\u0026#34;targets\u0026#34;], y_pred=evres_valid[\u0026#34;predictions\u0026#34;], labels=all_label_indexes ) true_pos = mcm_valid[:, 1, 1] # True positive. false_neg = mcm_valid[:, 1, 0] # False negative. class_samples = true_pos + false_neg assert(np.sum(class_samples) == len(encoder.valid_dl.dataset)) class_accuracies = true_pos / class_samples cum_accuracies, class_shares = cumacc.cumulative_accuracy( accuracies=class_accuracies, num_intervals=20 ) visual.draw_cumulative_accuracy( shares=class_shares, cum_accs=cum_accuracies, figsize=(4, 4) ) #%% The most problematic classes. indexes_of_sorted_class_accuracies = np.argsort(class_accuracies) class_labels = encoder.valid_dl.dataset.get_labels() class_accuracy_threshold = 0.8 least_accurate_classes = list() for idx in indexes_of_sorted_class_accuracies: if class_accuracies[idx] \u0026gt; class_accuracy_threshold: break least_accurate_classes.append(dict( label=class_labels[idx], accuracy=float(class_accuracies[idx]) )) #%% Manual checking. 
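The per-class accuracy extraction above relies on the layout of `multilabel_confusion_matrix`: for each class, cell [1, 1] counts true positives and [1, 0] false negatives, so TP / (TP + FN) is the per-class recall. A small self-contained check on toy labels (hypothetical data, three classes):

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Hypothetical targets and predictions for 3 classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# For each class i, mcm[i] is [[TN, FP], [FN, TP]].
mcm = multilabel_confusion_matrix(y_true=y_true, y_pred=y_pred, labels=[0, 1, 2])
true_pos = mcm[:, 1, 1]   # Correctly predicted samples of each class.
false_neg = mcm[:, 1, 0]  # Samples of the class predicted as something else.
class_accuracies = true_pos / (true_pos + false_neg)
print(class_accuracies)  # [0.5 1.  0.5]
```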
num_images = len(encoder.valid_dl.dataset) # image_idx = 23 image_idx = random.randrange(num_images) image_path = encoder.valid_dl.dataset.get_image_path(image_idx) image, target_label_idx = encoder.valid_dl.dataset[image_idx] inputs = image.unsqueeze(0) result = encoder.infer(inputs=inputs) predicted_label_indexes = [p[\u0026#34;corpus_id\u0026#34;] for p in result[0]] target_label = encoder.valid_dl.dataset.get_label(target_label_idx) predicted_labels = [ encoder.valid_dl.dataset.get_label(idx) for idx in predicted_label_indexes ] print(\u0026#34;-- Manual checking --\u0026#34;) print(\u0026#34;Image index:\u0026#34;, image_idx) print(\u0026#34;Image path:\u0026#34;, image_path) print(\u0026#34;Target label:\u0026#34;, target_label) print(\u0026#34;Predicted labels:\u0026#34;, predicted_labels) print(\u0026#34;Correct:\u0026#34;, predicted_labels[0] == target_label) print(\u0026#34;Target is among predicted:\u0026#34;, target_label in predicted_labels) print(\u0026#34;---------------------\u0026#34;) #%% Visual checking. 
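The `corpus_id` fields in the result above follow the output convention of Sentence Transformers' `util.semantic_search`, which the engine appears to use internally. The underlying operation is a top-k cosine-similarity lookup, sketched here in plain NumPy on toy vectors (`topk_cosine` is an illustrative stand-in, not the script's `Encoder.infer`):

```python
import numpy as np

def topk_cosine(query, corpus, k=2):
    """Return the k corpus vectors most similar to the query by cosine similarity."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q  # Cosine similarity against every corpus vector.
    order = np.argsort(-scores)[:k]  # Highest similarity first.
    return [{"corpus_id": int(i), "score": float(scores[i])} for i in order]

# Hypothetical embedding database (three 2-d vectors) and one query vector.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.1])
hits = topk_cosine(query, corpus, k=2)
```

In the real system, `corpus` holds one embedding per database image and `query` is the embedding of the image being classified; the label of the nearest corpus entry is the prediction.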
# https://docs.pytorch.org/tutorials/beginner/transfer_learning_tutorial.html # https://docs.pytorch.org/vision/main/auto_examples/others/plot_visualization_utils.html # https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.imshow.html num_images = len(encoder.valid_dl.dataset) image_indexes = np.random.permutation(num_images)[:6] inputs = list() target_labels = list() for image_idx in image_indexes: image, target_label_idx = encoder.valid_dl.dataset[image_idx] inputs.append(image) target_labels.append(encoder.valid_dl.dataset.get_label(target_label_idx)) inputs = torch.stack(inputs) result = encoder.infer(inputs=inputs) predicted_label_indexes = [r[0][\u0026#34;corpus_id\u0026#34;] for r in result] predicted_labels = [ encoder.valid_dl.dataset.get_label(idx) for idx in predicted_label_indexes ] visual.show_images( inputs=inputs, targets=target_labels, predictions=predicted_labels, mean=data_config[\u0026#34;mean\u0026#34;], std=data_config[\u0026#34;std\u0026#34;], figsize=(4, 4) ) The main program is written in a script file. It describes calculations necessary for this research. The script contains different code sections, which can be called separately, but mostly they require pre-execution of a previous section. The code sections:\nGlobal constants used during a script run. Loading of an artificial neural network model and creation of an \u0026ldquo;engine.Encoder\u0026rdquo; instance with that model. After an encoder object has been created, it is necessary to create a vector database and assess the accuracy of answers that the system produces. It is done in a \u0026ldquo;for\u0026rdquo; loop, which creates datasets with respective data loaders, and after that, runs \u0026ldquo;Encoder.encode\u0026rdquo; and \u0026ldquo;Encoder.evaluate\u0026rdquo; methods. The loop is capable of collecting prediction accuracy using different dataset partitioning in order to calculate the mean accuracy. 
Calculation of the mean accuracy and the mean top-k accuracy with confidence intervals when there is more than one dataset partitioning. Finally, it prints the simulation results in a human-readable format. Saving of the embedding vectors with related dataset partitioning if the operation is required. Calculation and plotting of a cumulative accuracy function. Creation of a list of the least accurate classes. Manual checking of the system work by forming a request with a randomly chosen image from a validation dataset. \u0026ldquo;Encoder.infer\u0026rdquo; method processes the request. It prints thorough request and response information. Visual checking picks up 6 random images from a validation dataset and forms a request to the system. It draws images with their true and predicted labels. randomness.py \u0026#34;\u0026#34;\u0026#34; Tools to make results reproducible. References: 1. \u0026#34;Reproducibility\u0026#34;, PyTorch v2.7, updated on 2024.11.26. https://docs.pytorch.org/docs/stable/notes/randomness.html 2. \u0026#34;2.1.4. Results Reproducibility\u0026#34;, cuBLAS v12.9, \u0026#34;1. Introduction\u0026#34;, updated on 2025.05.03. https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility 3. \u0026#34;Random seeds and reproducible results in PyTorch\u0026#34;, by Vandana Rajan, 2021.05.11. https://vandurajan91.medium.com/random-seeds-and-reproducible-results-in-pytorch-211620301eba 4. \u0026#34;How to Set Random Seeds in PyTorch and TensorFlow\u0026#34;, by Hey Amit, 2024.12.06. Note: probably, AI-generated, but useful. 
https://medium.com/we-talk-data/how-to-set-random-seeds-in-pytorch-and-tensorflow-89c5f8e80ce4 \u0026#34;\u0026#34;\u0026#34; import os import random from typing import Optional import numpy as np import torch __all__ = [\u0026#34;make_torch_generator\u0026#34;, \u0026#34;seed_worker\u0026#34;, \u0026#34;set_determinism\u0026#34;] def set_determinism(*, seed: int, use_deterministic_algorithms: bool) -\u0026gt; None: os.environ[\u0026#34;PYTHONHASHSEED\u0026#34;] = str(seed) os.environ[\u0026#34;CUBLAS_WORKSPACE_CONFIG\u0026#34;] = \u0026#34;:4096:8\u0026#34; random.seed(seed) np.random.seed(seed) torch.manual_seed(seed) torch.cuda.manual_seed(seed) torch.cuda.manual_seed_all(seed) torch.use_deterministic_algorithms(use_deterministic_algorithms) torch.backends.cudnn.deterministic = True torch.backends.cudnn.benchmark = False def seed_worker(worker_id: int) -\u0026gt; None: worker_seed = torch.initial_seed() % 2**32 np.random.seed(worker_seed) random.seed(worker_seed) def make_torch_generator( *, seed: Optional[int] = None, device: Optional[str] = None ) -\u0026gt; torch._C.Generator: if device is not None: generator = torch.Generator(device=device) else: generator = torch.Generator() if seed is not None: generator.manual_seed(seed) return generator # An example: # # generator = make_torch_generator(seed=random_seed) # # DataLoader( # train_dataset, # batch_size=batch_size, # num_workers=num_workers, # worker_init_fn=seed_worker, # generator=generator, # ) There are common recommendations on how to make results reproducible using PyTorch [link]. They were used here with some extra features.\nIt is unnecessary and, maybe, even improper to turn off randomness during most of the training and validation runs, because they show a variety of results and may point to some data and algorithmic issues. On the other hand, it is appropriate to have the final results reproducible.\nstore.py # Copyright (C) 2025 Igor Sivchek # Licensed under the MIT License. 
# See license text at [https://opensource.org/license/mit]. \u0026#34;\u0026#34;\u0026#34; Utilities for storing (saving and loading) embeddings and checkpoints. \u0026#34;\u0026#34;\u0026#34; import pathlib from typing import Any, Callable, Optional, Union import numpy import pandas import timm import torch import banerjee __all__ = [ \u0026#34;save_embeddings\u0026#34;, \u0026#34;load_embeddings\u0026#34;, \u0026#34;save_checkpoint\u0026#34;, \u0026#34;load_checkpoint\u0026#34; ] def save_embeddings( *, embeddings: numpy.ndarray, targes: numpy.ndarray, filename: Union[str, pathlib.Path], inform: bool = False ) -\u0026gt; None: filename = pathlib.Path(filename) embeddings_df = pandas.DataFrame(data=embeddings) embeddings_df.insert(loc=0, column=\u0026#39;target\u0026#39;, value=targes) embeddings_df.to_csv( f\u0026#34;{filename}.zip\u0026#34;, index=False, compression={ \u0026#39;method\u0026#39;: \u0026#39;zip\u0026#39;, \u0026#39;archive_name\u0026#39;: f\u0026#34;{filename.name}.csv\u0026#34;, } ) if inform: print( \u0026#34;Embeddings have been saved:\u0026#34;, f\u0026#34;{filename}.zip\u0026#34;, sep=\u0026#34;\\n \u0026#34;, ) def load_embeddings( *, filename: Union[str, pathlib.Path], inform: bool = False ) -\u0026gt; tuple[numpy.ndarray, numpy.ndarray]: filename = pathlib.Path(filename) embeddings_df = pandas.read_csv(filename) targets = embeddings_df.iloc[:, 0].to_numpy() embeddings = embeddings_df.iloc[:, 1:].to_numpy() if inform: print( \u0026#34;Embeddings have been loaded:\u0026#34;, filename, sep=\u0026#34;\\n \u0026#34;, ) return targets, embeddings def save_checkpoint( *, filename: Union[str, pathlib.Path], model: torch.nn.Module, train_ds: Optional[banerjee.Animal90] = None, valid_ds: Optional[banerjee.Animal90] = None, test_ds: Optional[banerjee.Animal90] = None, inform=False, ) -\u0026gt; None: # Warning: it does not save image transformations of the datasets. state = dict( model_config = model.default_cfg, # May not be universal.
model=model.state_dict(), train_ds_state=(train_ds.get_state() if train_ds is not None else None), valid_ds_state=(valid_ds.get_state() if valid_ds is not None else None), test_ds_state=(test_ds.get_state() if test_ds is not None else None), ) torch.save(obj=state, f=filename) if inform: print( \u0026#34;A checkpoint has been saved:\u0026#34;, f\u0026#34;{filename}\u0026#34;, sep=\u0026#34;\\n \u0026#34;, ) def load_checkpoint( *, checkpoint_path: Union[str, pathlib.Path], dataset_path: Optional[Union[str, pathlib.Path]], inform=False, ) -\u0026gt; tuple: state = torch.load( f=checkpoint_path, map_location=\u0026#34;cpu\u0026#34;, weights_only=True ) model_state = state[\u0026#34;model\u0026#34;] if \u0026#34;head.linear.weight\u0026#34; in model_state: # EfficientViT. num_labels = len(model_state[\u0026#34;head.linear.weight\u0026#34;]) elif \u0026#34;classifier.weight\u0026#34; in model_state: # MobileNetV4. num_labels = len(model_state[\u0026#34;classifier.weight\u0026#34;]) elif \u0026#34;head.fc.weight\u0026#34; in model_state: # TinyViT. num_labels = len(model_state[\u0026#34;head.fc.weight\u0026#34;]) else: if inform: print(\u0026#34;state[\u0026#39;model\u0026#39;] keys:\u0026#34;, model_state.keys()) raise ValueError(\u0026#34;An unexpected model. Cannot define \u0026#39;num_classes\u0026#39; attribute.\u0026#34;) model = timm.create_model( model_name=state[\u0026#34;model_config\u0026#34;][\u0026#34;hf_hub_id\u0026#34;], pretrained=False, # Do not load model parameters. num_classes=num_labels, # Set the model head.
) model.load_state_dict(model_state) train_ds_state = state[\u0026#34;train_ds_state\u0026#34;] valid_ds_state = state[\u0026#34;valid_ds_state\u0026#34;] test_ds_state = state[\u0026#34;test_ds_state\u0026#34;] if ( dataset_path is not None and ( train_ds_state is not None or valid_ds_state is not None or test_ds_state is not None ) ): dsm = banerjee.DatasetMaker(dataset_path=dataset_path) train_ds, valid_ds, test_ds = dsm.restore_datasets( train_ds_state=train_ds_state, valid_ds_state=valid_ds_state, test_ds_state=test_ds_state, ) else: train_ds = valid_ds = test_ds = None if inform: print( \u0026#34;A checkpoint has been loaded:\u0026#34;, f\u0026#34;{checkpoint_path}\u0026#34;, sep=\u0026#34;\\n \u0026#34;, ) return model, train_ds, valid_ds, test_ds There are functions to save and load an embedding vector database. A file in CSV format is used to store the vectors, and the file is placed into a ZIP archive for compression.\nA couple of other functions save and load an artificial neural network model after transfer learning, together with the respective dataset information. The latter is important for reusing the datasets since they are created randomly by default. The checkpoints store only the image names and labels rather than the full dataset data, which keeps them small.\nutils.py # Copyright (C) 2025 Igor Sivchek # Licensed under the MIT License. # See license text at [https://opensource.org/license/mit]. \u0026#34;\u0026#34;\u0026#34; Utilities of different kinds.
\u0026#34;\u0026#34;\u0026#34; import cProfile from collections.abc import Iterable import pstats from typing import Any, Callable, Optional, Union import timm import torch __all__ = [\u0026#34;NativeScalerV2\u0026#34;, \u0026#34;Profile\u0026#34;, \u0026#34;check_cuda_support\u0026#34;, \u0026#34;get_device\u0026#34;] def check_cuda_support(*, inform: bool = False) -\u0026gt; tuple[bool, bool]: has_cuda = False has_bf16 = False if torch.cuda.is_available(): has_cuda = True if torch.cuda.is_bf16_supported(): has_bf16 = True print(f\u0026#34;CUDA support: {has_cuda}\u0026#34;) print(f\u0026#34;BF16 support: {has_bf16}\u0026#34;) return (has_cuda, has_bf16) def get_device(*, force_cpu: bool = False) -\u0026gt; torch.device: if force_cpu: return torch.device(\u0026#34;cpu\u0026#34;) else: return torch.device(\u0026#34;cuda:0\u0026#34; if torch.cuda.is_available() else \u0026#34;cpu\u0026#34;) # Mixed precision training may cause \u0026#34;UserWarning: Detected call of # `lr_scheduler.step()` before `optimizer.step()`\u0026#34; issue. A special technique # should be used to avoid this problem. See # https://discuss.pytorch.org/t/optimizer-step-before-lr-scheduler-step-error-using-gradsclaer/92930 # https://discuss.pytorch.org/t/userwarning-detected-call-of-lr-scheduler-step-before-optimizer-step/164814 # https://discuss.pytorch.org/t/userwarning-detected-call-of-lr-scheduler-step-before-optimizer-step-in-pytorch-1-1-0-and-later-you-should-call-them-in-the-opposite-order-optimizer-step-before-lr-scheduler-step/88295 # Basic resources # https://docs.pytorch.org/docs/2.7/amp.html # https://github.com/pytorch/pytorch/blob/v2.7.0/torch/amp/grad_scaler.py # https://github.com/huggingface/pytorch-image-models/blob/v1.0.15/timm/utils/cuda.py # It seems, ideally, there should be retraining on a problematic batch, not # just skipping of a learning rate step. There should be a loop to find an # applicable scale. Of course, the number of cycles should be limited. 
class NativeScalerV2(timm.utils.cuda.NativeScaler): def __init__(self, device: str = \u0026#34;cuda\u0026#34;) -\u0026gt; None: super().__init__(device=device) self._stepped: bool = False def __call__( self, *, loss: Union[torch.Tensor, Iterable[torch.Tensor]], optimizer: torch.optim.Optimizer, clip_grad: Optional[float] = None, clip_mode: str = \u0026#34;norm\u0026#34;, parameters: Optional[Iterable] = None, create_graph: bool = False, need_update: bool = True, ) -\u0026gt; None: scale = self.get_scale() super().__call__( loss=loss, optimizer=optimizer, clip_grad=clip_grad, clip_mode=clip_mode, parameters=parameters, create_graph=create_graph, need_update=need_update, ) self._stepped = (scale \u0026lt;= self.get_scale()) def has_stepped(self) -\u0026gt; bool: return self._stepped def get_scale(self) -\u0026gt; float: return self._scaler.get_scale() # A context manager to wrap a ``cProfile.Profile`` object. It can be activated # or deactivated without the necessity to change code containing it. 
# https://docs.python.org/3/library/profile.html # https://stackoverflow.com/questions/29630667/how-can-i-analyze-a-file-created-with-pstats-dump-statsfilename-off-line class Profile: def __init__(self, *, active: bool = True): self.obj = cProfile.Profile() self.__active = active def __enter__(self): if self.__active: self.obj.enable() return self def __exit__(self, exc_type, exc_value, exc_tb): if self.__active: self.obj.disable() if exc_type is not None: return False def __copy__(self): raise TypeError(\u0026#34;A \u0026#39;copy\u0026#39; operation is not supported.\u0026#34;) def __deepcopy__(self, memo): raise TypeError(\u0026#34;A \u0026#39;deepcopy\u0026#39; operation is not supported.\u0026#34;) def dump( self, *, filename: str, sort_by: str = \u0026#34;stdname\u0026#34;, strip_dirs: bool = False, ): if self.__active: with open(file=(filename + \u0026#34;.log\u0026#34;), mode=\u0026#34;w\u0026#34;) as out_stream: stats = pstats.Stats(self.obj, stream=out_stream) if strip_dirs: stats.strip_dirs() stats.sort_stats(sort_by) stats.dump_stats(filename + \u0026#34;.prof\u0026#34;) stats.print_stats() There are some general functions that can be useful in different deep learning projects. One of them checks CUDA and bfloat16 format support. Another handy function returns a device (a CPU or a CUDA device) to place an artificial neural network on.\nA special scaler class is used for mixed precision calculations [link] in transfer learning. This scaler is designed to avoid a known issue where a scheduler step effectively happens before an optimizer step (because the scaler has skipped the optimizer step), which triggers a warning [link]. Note that it actually does not make much sense to use mixed precision in this work for performance reasons.\nA context manager wrapper around a \u0026ldquo;cProfile.Profile\u0026rdquo; object is used for convenient activation of the profiler when it is necessary.
It also has a method to save profiling data in binary and human-readable formats.\nvisual.py # Copyright (C) 2025 Igor Sivchek # Licensed under the MIT License. # See license text at [https://opensource.org/license/mit]. \u0026#34;\u0026#34;\u0026#34; Some visualization tools. \u0026#34;\u0026#34;\u0026#34; from collections.abc import Sequence from typing import Any, Callable, Optional, Union import matplotlib import numpy as np import torch __all__ = [\u0026#34;draw_cumulative_accuracy\u0026#34;, \u0026#34;show_images\u0026#34;] def draw_image( *, ax: matplotlib.axis.Axis, image: torch.Tensor, mean: np.ndarray, std: np.ndarray, title: Optional[str] = None, ) -\u0026gt; None: image = image.numpy().transpose((1, 2, 0)) # This procedure is unnecessary if data is not transformed. image = std * image + mean image = np.clip(image, 0, 1) ax.imshow(image) if title is not None: ax.set_title(title) def show_images( *, inputs: torch.Tensor, targets: list[str], predictions: list[str], mean: Sequence[float], std: Sequence[float], figsize: tuple[int] = (8, 8), ) -\u0026gt; None: inputs = inputs.detach().cpu() mean = np.array(mean) std = np.array(std) num_rows = 2 num_cols = 3 num_plots = num_rows * num_cols fig, axes = matplotlib.pyplot.subplots( nrows=num_rows, ncols=num_cols, figsize=figsize, constrained_layout=True ) plot_idx = 0 for input_idx, (image, target, prediction) in ( enumerate(zip(inputs, targets, predictions)) ): if plot_idx \u0026gt;= num_plots: break row_idx = input_idx // num_cols col_idx = input_idx % num_cols title = \u0026#34;targ.: {0}\\npred.: {1}\u0026#34;.format(target, prediction) ax = axes[row_idx, col_idx] ax.axis(\u0026#34;off\u0026#34;) draw_image(ax=ax, image=image, mean=mean, std=std, title=title) plot_idx += 1 def draw_cumulative_accuracy( *, shares: np.ndarray, cum_accs: np.ndarray, figsize: tuple[int] = (8, 8), ) -\u0026gt; None: fig, ax = matplotlib.pyplot.subplots( nrows=1, ncols=1, figsize=figsize, constrained_layout=True ) 
fig.suptitle(\u0026#34;Cumulative accuracy function\u0026#34;) ax.plot(shares, cum_accs) ax.set_xlabel(\u0026#34;share\u0026#34;) ax.set_ylabel(\u0026#34;accuracy\u0026#34;, rotation=\u0026#34;horizontal\u0026#34;) ax.yaxis.set_label_coords(0.0, 1.05) ax.grid() fig.show() There is a function to visualize predictions. It shows images with their target and predicted labels. It is worth visually inspecting the model\u0026rsquo;s results whenever possible.\nAnother function draws a cumulative accuracy function plot.\nConclusion This blog post describes a viable approach to solving a classification task using a pre-trained artificial neural network model without fine-tuning. It is based on an embedding vector database and cosine similarity search. The presented image classification system achieves up to 0.973 validation accuracy on Banerjee\u0026rsquo;s Animal Image Dataset, which has 90 classes. The core of the system is one of the distilled TinyViT models with 5.07 - 20.5 million parameters.\nDue to the relatively small size of the dataset (5400 images), a special technique was used to increase the precision of the final accuracy calculation: the data was repeatedly re-partitioned into different training and validation subsets in order to calculate the sample mean validation accuracy.\nThe experience of this work shows that although it is possible to build an image classifier from a small dataset (with only 60 images per class), it is hard to assess its accuracy precisely and therefore improve it.
Special techniques, like cross-validation and bootstrapping, can be applied to mitigate this issue and improve the precision of the final accuracy calculation, but they require extra computational and time resources.\n","permalink":"https://siv-radio.github.io/posts/2025-10-20-similarity-search-for-an-image-classification-task/","summary":"This work shows how to use a pre-trained artificial neural network for an image classification task without fine-tuning of the network.","title":"Similarity Search for an Image Classification Task"}]