.Net: [MEVD] Consider returning a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable #11291

roji · 2025-03-31T21:41:30Z

We're considering doing the same thing with the database-generated IDs returned by IVectorStoreRecordCollection.UpdateBatchAsync (issue): working with IAsyncEnumerable is more difficult, and it's very unlikely that the streaming aspect is important here (or even supported by any actual connector).

westey-m · 2025-04-15T15:38:07Z

Here is a comparison of what the different DBs support today:

Database	Filtering Supported	Advertised as Lightweight Collections	CollectionsListingResponseType	MaxNumberOfCollections
AzureAISearch	N	N	Streaming AsyncPageable	3 000
MongoDB Atlas	Y	N	Streaming IAsyncCursor	100 000
Cosmos MongoDB Vcore	Y	N	Streaming IAsyncCursor	1 000
Cosmos NoSQL	Y	N	Streaming FeedIterator	499
Chroma	Limit/Offset	N	Http Response, so streamable	Not Documented
Milvus	Differs by SDK	N	Non Streaming List via GPRC / Http Response	65 536
Pinecone	N	N	Non Streaming List over GRPC	No limit?
Postgres	Y	N	Streaming NpgsqlDataReader	No limit
Qdrant	N	N	Non Streaming list over GRPC	No limit
Redis	N	N	Non Streaming list	Not Documented
Sqlite	Y	N	Streaming SqliteDataReader	Theoretical limit of 4 294 967 294
SqlServer	Y	N	Streaming SqlDataReader	Theoretical limit of 2 147 483 647 for all objects
Weaviate	N	N	Http Response, so streamable	Default: 1000, can be changed

Some DBs certainly support very large numbers of collections. Some also support it, combined with streaming retrieval methods.

Based on this, I think it makes sense to continue supporting IAsyncEnumerable. For those DBs that support streaming, it would certainly be preferable to locking up an application for a long time, while internally streaming the results and packaging them into a collection. Having an IAsyncEnumerable signals to users that many results may be there and getting all results may take time.

CC @roji @dmytrostruk @adamsitnik

roji · 2025-04-15T17:27:25Z

Specifically about the streaming retrieval methods, I'll point out that most (all?) are just general, low-level retrieval methods that aren't specific to fetching the collection names. For example, all the SQL connectors use DbDataReader, since that's just the way any data is fetched from the database (same for Cosmos FeedIterator and I'm assuming the others) - so obviously they support streaming.

At the end of the day, I think what matters is how we expect our high-level API to actually be used... Even if someone has 20000 collections in the database and calls ListCollectionAsync, streaming only helps if they want to get earlier results super early (low latency), or maybe if they want to cancel in the middle; though as you say, this API isn't made for this (no filtering, no ordering/top/limit...), and we do support a cancellation token if the entire request takes too much time (so just not "in-the-middle cancellation" after some data was downloaded). Checking whether a specific collection exists is there as a separate API, so partial downloads of collection names without filtering/ordering/top just seems... weird...

Basically it's hard for me to imagine how streaming would be actually useful here, in the vast majority of concrete user scenarios. It's also worth mentioning that users with specific edge-case needs can always drop down to the low-level SDK and get collections in whatever way they want. My main concern here is to make sure the 95% usage case is as easy/nice as possible.

This is definitely not a hill I'm willing to die on - if you strongly feel we should keep IAsyncEnumerable I'm OK with it.

markwallace-microsoft · 2025-04-23T11:17:28Z

Discussed with the team and decided to keep the current implementation dependant on feedback from the API review

roji added .NET Issue or Pull requests regarding .NET code Build Features planned for next Build conference msft.ext.vectordata Related to Microsoft.Extensions.VectorData labels Mar 31, 2025

roji assigned dmytrostruk and westey-m Mar 31, 2025

roji added this to Semantic Kernel Mar 31, 2025

markwallace-microsoft added the triage label Mar 31, 2025

github-actions bot changed the title ~~[MEVD] Consider return a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable~~ .Net: [MEVD] Consider return a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable Mar 31, 2025

markwallace-microsoft removed the triage label Mar 31, 2025

roji changed the title ~~.Net: [MEVD] Consider return a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable~~ .Net: [MEVD] Consider returning a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable Apr 7, 2025

markwallace-microsoft unassigned dmytrostruk Apr 15, 2025

westey-m moved this to Backlog: Planned in Semantic Kernel Apr 15, 2025

westey-m moved this from Backlog: Planned to Sprint: Planned in Semantic Kernel Apr 15, 2025

westey-m moved this from Sprint: Planned to Sprint: In Review in Semantic Kernel Apr 15, 2025

westey-m moved this from Sprint: In Review to Sprint: In Progress in Semantic Kernel Apr 15, 2025

markwallace-microsoft moved this from Sprint: In Progress to Sprint: Done in Semantic Kernel Apr 23, 2025

markwallace-microsoft moved this from Sprint: Done to Backlog: Planned in Semantic Kernel Apr 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.Net: [MEVD] Consider returning a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable #11291

.Net: [MEVD] Consider returning a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable #11291

roji commented Mar 31, 2025

westey-m commented Apr 15, 2025

roji commented Apr 15, 2025

markwallace-microsoft commented Apr 23, 2025

.Net: [MEVD] Consider returning a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable #11291

.Net: [MEVD] Consider returning a simple list from IVectorStore.ListCollectionNamesAsync instead of IAsyncEnumerable #11291

Comments

roji commented Mar 31, 2025

westey-m commented Apr 15, 2025

roji commented Apr 15, 2025

markwallace-microsoft commented Apr 23, 2025