Document index vector enhancements

On a proof-of-concept project we're working on, we have successfully tested Microsoft's Vector Hybrid Index. It extends the standard 'vanilla' MS index with embeddings, i.e. document content encoded as numeric vectors so that search can match on meaning rather than exact keywords. Under the hood the index is searched twice: once with a vector similarity search and once with the vanilla keyword search we use today, with MS merging the two into a single result set.
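To illustrate the merge step: as far as we understand it, Microsoft's documentation describes the merging of the keyword and vector result lists as Reciprocal Rank Fusion (RRF). The sketch below is only an illustration of that technique, not Microsoft's implementation. The function name, the document ids and the constant k=60 (a commonly quoted default) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranked list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by both searches rise to the top.
    k=60 is an assumed constant, not a value taken from our project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two searches.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents found by both searches (doc_a and doc_c here) end up ahead of those found by only one.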

Search results across thousands of documents are improved, and queries can be phrased in more natural language.

With regard to indexing, uploading documents would require vectorising the document content with a recommended embedding model as it's uploaded.

Then, to perform a vector search, we must pass a vectorised version of the search query, generated with the same embedding model as the content. So the search documents action would need updating (or a separate action would be needed). If it is a shared action, it would need:

An option to choose hybrid search or vanilla index search (as today). There is no need to search embeddings if you don't need to, and skipping it saves generating a query embedding.
If hybrid is selected, the action must create the query embedding on the fly before querying the search index with a different, hybrid payload.
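The two requirements above could be sketched roughly as follows. This is a minimal illustration, not how the action is or should be built. The function name, the `embed` callable and the `contentVector` field name are hypothetical, and the payload shape is assumed to resemble the Azure AI Search REST request body, so it should be checked against the API version actually in use.

```python
from typing import Callable, Optional, Sequence


def build_search_payload(
    query: str,
    use_hybrid: bool,
    embed: Optional[Callable[[str], Sequence[float]]] = None,
    vector_field: str = "contentVector",  # hypothetical index field name
    top: int = 10,
) -> dict:
    """Build a vanilla keyword payload, or a hybrid one when requested."""
    # The keyword part is the same in both modes.
    payload = {"search": query, "top": top}
    if not use_hybrid:
        # Vanilla search: no embedding call is made at all.
        return payload
    if embed is None:
        raise ValueError("hybrid search needs an embedding function")
    # Hybrid search: embed the query on the fly with the same model
    # that was used for the document content.
    payload["vectorQueries"] = [
        {
            "kind": "vector",
            "vector": list(embed(query)),
            "fields": vector_field,
            "k": top,
        }
    ]
    return payload
```

Passing the embedding function in as a parameter keeps the sketch independent of any particular embeddings client and makes the "no embedding cost for vanilla search" behaviour easy to verify.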

The embeddings API has request size limits, which will need to be considered when vectorising large documents.
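One common way to stay under such limits is to split long content into overlapping chunks and embed each chunk separately. The sketch below shows the idea only. The function name and the default sizes are assumptions, and it counts characters as a simple stand-in, whereas real limits are usually expressed in tokens and depend on the embedding model chosen.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a sentence
    cut at a boundary still appears whole in one of the chunks.
    Both defaults are illustrative, not real API limits.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


# Tiny example with small numbers so the behaviour is easy to see.
example_chunks = chunk_text("abcdefghij", max_chars=4, overlap=1)
```

Each chunk would then be embedded and stored as its own searchable entry, which is a design decision in itself since it changes what a single search hit represents.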

I'd be really interested to hear your thoughts on this and I'm more than happy to provide more information or a demo if that will help you add this into Tenjin.
