This article, “Using DuckDB for Embeddings and Vector Search” by Sören Brunk, shows a number of DuckDB features I wasn’t aware of.
- DuckDB can read datasets directly from Hugging Face
- DuckDB can read just the parts of a Parquet file it needs, even over HTTP
- DuckDB lets you write custom functions in Python
- DuckDB now has a vector similarity search extension
I’ve recently become a DuckDB fan and continue to be impressed.