Feat: Adds OpenSearch2.19.1 as the vector_database support #7140

Merged
KevinHuSh merged 19 commits into infiniflow:main on Apr 24, 2025

Conversation

pyyuhao
Contributor

@pyyuhao pyyuhao commented Apr 18, 2025

What problem does this PR solve?

This PR adds support for the latest OpenSearch 2.19.1 as a storage engine and search engine option for RAGFlow.

Main Benefit

  1. OpenSearch 2.19.1 is licensed under the [Apache v2.0 License], which is more permissive than Elasticsearch's licensing.
  2. For search, OpenSearch 2.19.1 supports full-text search, vector search, and hybrid search, with a schema similar to Elasticsearch's.
  3. For storage, OpenSearch 2.19.1 stores text and vectors, again with a schema quite similar to Elasticsearch's.
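
To illustrate the "similar but not identical" schema point, here is a minimal sketch of how a vector field is typically declared in each engine. The field names `content` and `embedding`, the helper names, and the HNSW/cosine choices are assumptions for illustration only; this is not the content of conf/os_mapping.json.

```python
def es_vector_field(dims: int) -> dict:
    """Elasticsearch 8.x style vector field (dense_vector)."""
    return {
        "type": "dense_vector",
        "dims": dims,
        "index": True,
        "similarity": "cosine",
    }


def os_vector_field(dims: int) -> dict:
    """OpenSearch k-NN plugin style vector field (knn_vector)."""
    return {
        "type": "knn_vector",
        "dimension": dims,
        "method": {
            "name": "hnsw",
            "space_type": "cosinesimil",
            "engine": "lucene",
        },
    }


def os_index_body(dims: int) -> dict:
    """Index body for OpenSearch; k-NN must be enabled in the index settings."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "content": {"type": "text"},          # hypothetical text field
                "embedding": os_vector_field(dims),   # hypothetical vector field
            }
        },
    }
```

The text field is declared the same way in both engines; the visible differences are the vector field type (`dense_vector` vs `knn_vector`), the dimension key (`dims` vs `dimension`), and the `index.knn` setting that OpenSearch needs.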

Changes

  • Adds an OpenSearch Python connector. It needed many adaptations because the schema and API methods differ between ES and OpenSearch in many ways (the k-NN search in particular differs significantly): rag/utils/opensearch_coon.py
  • Adapts the static config: conf/service_conf.yaml, api/settings.py, rag/settings.py
  • Covers the store and search schema differences between OpenSearch and ES: conf/os_mapping.json
  • Adds the OpenSearch Python SDK as a dependency: pyproject.toml
  • Adds Docker config for OpenSearch 2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template
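
The k-NN gap mentioned in the first bullet can be sketched as follows. This only shows the general shape of the request bodies in each engine and is not the code in rag/utils/opensearch_coon.py; the helper names and the `num_candidates` default are assumptions.

```python
from typing import List


def es_knn_body(field: str, vector: List[float], k: int,
                num_candidates: int = 100) -> dict:
    """Elasticsearch 8.x: kNN is a top-level `knn` section of the search body."""
    return {
        "knn": {
            "field": field,
            "query_vector": vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "size": k,
    }


def os_knn_body(field: str, vector: List[float], k: int) -> dict:
    """OpenSearch: kNN is a `knn` query clause keyed by the field name."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {"vector": vector, "k": k},
            }
        },
    }
```

With opensearch-py, a body like the second one would typically be passed as `client.search(index=..., body=os_knn_body(...))`. Because the vector clause lives in a different place in the request, a connector written for ES cannot be reused unchanged.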

How to use

  • ES remains the default doc/search engine; I did not change that priority. OpenSearch is used only when DOC_ENGINE=${DOC_ENGINE:-opensearch} is set in docker/.env.
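
As a rough sketch of how such a switch is usually read on the Python side (the function name and validation are assumptions, not RAGFlow's actual settings code), the engine can be resolved from the environment with Elasticsearch as the fallback:

```python
import os
from typing import Mapping, Optional

# Engine names taken from the options listed in docker/.env.
SUPPORTED_ENGINES = {"elasticsearch", "infinity", "opensearch"}


def resolve_doc_engine(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured doc engine, defaulting to elasticsearch.

    Mirrors the shell form ${DOC_ENGINE:-elasticsearch}: the default applies
    when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    engine = (env.get("DOC_ENGINE") or "elasticsearch").strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported DOC_ENGINE: {engine!r}")
    return engine
```

For example, `resolve_doc_engine({"DOC_ENGINE": "opensearch"})` selects OpenSearch, while an empty mapping falls back to Elasticsearch.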

Others

Our team tested many documents in our environment with OpenSearch as the vector database, and it works very well.
All of the OpenSearch config changes are necessary.

Type of change

  • New Feature (non-breaking change which adds functionality)

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. 🌈 python Pull requests that update Python code 💞 feature Feature request, pull request that fullfill a new feature. labels Apr 18, 2025
@KevinHuSh KevinHuSh requested a review from asiroliu April 18, 2025 08:31
@KevinHuSh KevinHuSh added the ci Continue Integration label Apr 18, 2025
@yingfeng
Member

Thanks for the contribution. OpenSearch can be added as an alternative doc engine. However, maintaining OpenSearch support is not easy: after this PR is merged, OpenSearch might stop working within several weeks.

docker/.env Outdated
@@ -2,7 +2,8 @@
 # Available options:
 # - `elasticsearch` (default)
 # - `infinity` (https://github.com/infiniflow/infinity)
-DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
+# - `opensearch` (https://github.com/opensearch-project/OpenSearch)
+DOC_ENGINE=${DOC_ENGINE:-opensearch}
Contributor

the default DOC_ENGINE should be elasticsearch

Contributor Author

@pyyuhao pyyuhao Apr 18, 2025

@asiroliu I've changed the default back to elasticsearch in my commit. It was a config mistake: I forgot to revert the setting from my local environment.

@pyyuhao
Contributor Author

pyyuhao commented Apr 18, 2025

> Thanks for the contribution. OpenSearch can be added as an alternative doc engine. However, maintaining OpenSearch support is not easy: after this PR is merged, OpenSearch might stop working within several weeks.

Thanks for your reply. I've fixed the small problems mentioned above. I am a search-engine engineer who has focused on OpenSearch/Elasticsearch for years, and I have also written some plugins for OpenSearch. I will continue to pay close attention to ES/OS. Over the past two years, we have been focusing more on RAG.

@pyyuhao pyyuhao requested a review from asiroliu April 18, 2025 10:10
@pyyuhao
Contributor Author

pyyuhao commented Apr 18, 2025

@asiroliu @yingfeng @KevinHuSh Hi, I've pushed some commits, mainly about formatting and comments. Please review again. Thanks a lot.

@yingfeng
Member

CI does not pass: the Elasticsearch container cannot be started. See the CI logs: https://github.com/infiniflow/ragflow/actions/runs/14533575135/job/40777952188

@pyyuhao
Contributor Author

pyyuhao commented Apr 18, 2025

> CI does not pass: the Elasticsearch container cannot be started. See the CI logs: https://github.com/infiniflow/ragflow/actions/runs/14533575135/job/40777952188

It works well in my local environment now. I will check the code again and create a virtual machine to verify it.
Maybe it is because I run OpenSearch on port 9200, the same port as Elasticsearch (in most cases they are the same). Is there a rule about this that would block the test job for ES?
I will check this in the next few days.

Have a nice weekend
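
One quick way to test the port hypothesis from the comment above on a local machine is to check whether something is already listening on 9200 before starting the second engine. This is a minimal sketch; the helper name is hypothetical and it is unrelated to how the CI job is configured.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9200 is the port mentioned in the comment for both engines.
    print("port 9200 in use:", port_in_use(9200))
```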

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. size:XXL This PR changes 1000+ lines, ignoring generated files. labels Apr 21, 2025
@yingfeng yingfeng added ci Continue Integration and removed ci Continue Integration labels Apr 21, 2025
@yingfeng
Member

Still the same error for CI:
https://github.com/infiniflow/ragflow/actions/runs/14573690004/job/40875405448?pr=7140

Error response from daemon: container 41fe9ca43b5b1f497269237c45423bf7c4e3b53687d48b7f94b235cee24095e0 is not running

@yingfeng yingfeng removed the ci Continue Integration label Apr 21, 2025
@pyyuhao pyyuhao force-pushed the main branch 2 times, most recently from 00d0b0d to c6f9337 Compare April 22, 2025 09:13
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 22, 2025
pyyuhao and others added 16 commits April 22, 2025 17:30
### What problem does this PR solve?

Documentation for MCP server

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <[email protected]>
Add MCP support with a client example.

Issue link: infiniflow#4344

- [x] New Feature (non-breaking change which adds functionality)
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Apr 22, 2025
@yingfeng yingfeng added the ci Continue Integration label Apr 22, 2025
@pyyuhao
Contributor Author

pyyuhao commented Apr 22, 2025

> Still the same error for CI: https://github.com/infiniflow/ragflow/actions/runs/14573690004/job/40875405448?pr=7140
>
> Error response from daemon: container 41fe9ca43b5b1f497269237c45423bf7c4e3b53687d48b7f94b235cee24095e0 is not running

Hi, there were some config adaptation mistakes before. I rechecked in my new environment with both ES and OpenSearch. I re-pushed today, and I see that CI has passed.
Thanks for your review.

@wanpdsantos
Contributor

Shouldn't tests for OpenSearch be added here: https://github.com/infiniflow/ragflow/blob/main/.github/workflows/tests.yml ?

@asiroliu
Contributor

@pyyuhao
Apologies for not reviewing your PR immediately - I've been fully occupied with finalizing the 0.18.0 release. I'll prioritize reviewing your contribution as soon as the release is complete.

@pyyuhao
Contributor Author

pyyuhao commented Apr 23, 2025

> @pyyuhao Apologies for not reviewing your PR immediately - I've been fully occupied with finalizing the 0.18.0 release. I'll prioritize reviewing your contribution as soon as the release is complete.

Thanks for your review.
No rush, take your time.

@KevinHuSh KevinHuSh changed the title Added OpenSearch2.19.1 as the vector_database support Feat: Adds OpenSearch2.19.1 as the vector_database support Apr 24, 2025
@KevinHuSh KevinHuSh merged commit c8c3b75 into infiniflow:main Apr 24, 2025
4 checks passed
Labels
ci Continue Integration 💞 feature Feature request, pull request that fullfill a new feature. 🌈 python Pull requests that update Python code size:XL This PR changes 500-999 lines, ignoring generated files.
6 participants