Commit 45f2023

Add ADI Based Skillset (#11)
Co-authored-by: Ben Constable <[email protected]>
1 parent 65a3d2d commit 45f2023

22 files changed: +2376 -12 lines changed

.vscode/extensions.json

+6
@@ -0,0 +1,6 @@
+{
+  "recommendations": [
+    "ms-azuretools.vscode-azurefunctions",
+    "ms-python.python"
+  ]
+}

.vscode/launch.json

+15
@@ -0,0 +1,15 @@
+{
+  "configurations": [
+    {
+      "connect": {
+        "host": "localhost",
+        "port": 9091
+      },
+      "name": "Attach to Python Functions",
+      "preLaunchTask": "func: host start",
+      "request": "attach",
+      "type": "debugpy"
+    }
+  ],
+  "version": "0.2.0"
+}

.vscode/settings.json

+7
@@ -0,0 +1,7 @@
+{
+  "azureFunctions.projectLanguage": "Python",
+  "azureFunctions.projectLanguageModel": 2,
+  "azureFunctions.projectRuntime": "~4",
+  "azureFunctions.scmDoBuildDuringDeployment": true,
+  "debug.internalConsoleOptions": "neverOpen"
+}

.vscode/tasks.json

+15
@@ -0,0 +1,15 @@
+{
+  "tasks": [
+    {
+      "command": "host start",
+      "isBackground": true,
+      "label": "func: host start",
+      "options": {
+        "cwd": "${workspaceFolder}/ai_search_with_adi_function_app"
+      },
+      "problemMatcher": "$func-python-watch",
+      "type": "func"
+    }
+  ],
+  "version": "2.0.0"
+}

README.md

+2-1
@@ -7,7 +7,8 @@ It is intended that the plugins and skills provided in this repository, are adap
 ## Components
 
 - `./text2sql` contains an Multi-Shot implementation for Text2SQL generation and querying which can be used to answer questions backed by a database as a knowledge base.
-- `./ai_search_with_adi` contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt4o) to interpret and understand these.
+- `./ai_search_with_adi_function_app` contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt4o) to interpret and understand these.
+- `./deploy_ai_search` provides an easy Python based utility for deploying an index, indexer and corresponding skillset for AI Search.
 
 The above components have been successfully used on production RAG projects to increase the quality of responses. The code provided in this repo is a sample of the implementation and should be adjusted before being used in production.
 
adi_function_app/.funcignore

+8
@@ -0,0 +1,8 @@
+.git*
+.vscode
+__azurite_db*__.json
+__blobstorage__
+__queuestorage__
+local.settings.json
+test
+.venv

ai_search_with_adi/README.md renamed to adi_function_app/README.md

+21-11
@@ -38,35 +38,46 @@ The properties returned from the ADI Custom Skill are then used to perform the f
 
 ## Provided Notebooks & Utilities
 
-- `./ai_search.py`, `./deployment.py` provide an easy Python based utility for deploying an index, indexer and corresponding skillset for AI Search.
-- `./function_apps/indexer` provides a pre-built Python function app that communicates with Azure Document Intelligence, Azure OpenAI etc to perform the Markdown conversion, extraction of figures, figure understanding and corresponding cleaning of Markdown.
+- `./ai_search_with_adi_function_app` provides a pre-built Python function app that communicates with Azure Document Intelligence, Azure OpenAI etc to perform the Markdown conversion, extraction of figures, figure understanding and corresponding cleaning of Markdown.
 - `./rag_with_ai_search.ipynb` provides example of how to utilise the AI Search plugin to query the index.
 
+## Deploying AI Search Setup
+
+To deploy the pre-built index and associated indexer / skillset setup, see instructions in `./ai_search/README.md`.
+
 ## ADI Custom Skill
 
-Deploy the associated function app and required resources. You can then experiment with the custom skill by sending an HTTP request in the AI Search JSON format to the `/adi_2_ai_search` HTTP endpoint.
+Deploy the associated function app and required resources. You can then experiment with the custom skill by sending an HTTP request in the AI Search JSON format to the `/adi_2_deploy_ai_search` HTTP endpoint.
 
 To use with an index, either use the utility to configure a indexer in the provided form, or integrate the skill with your skillset pipeline.
 
-### function_app.py
+### Deployment Steps
+
+1. Update `.env` file with the associated values. Not all values are required dependent on whether you are using System / User Assigned Identities or a Key based authentication. Use this template to update the environment variables in the function app.
+2. Make sure the infra and required identities are setup. This setup requires Azure Document Intelligence and GPT4o.
+3. [Deploy your function app](https://learn.microsoft.com/en-us/azure/azure-functions/functions-deployment-technologies?tabs=windows) and test with a HTTP request.
 
-`./function_apps/indexer/function_app.py` contains the HTTP entrypoints for the ADI skill and the other provided utility skills.
+### Code Files
 
-### adi_2_aisearch
+#### function_app.py
 
-`./function_apps/indexer/adi_2_aisearch.py` contains the methods for content extraction with ADI. The key methods are:
+`./indexer/ai_search_with_adi_function_app.py` contains the HTTP entrypoints for the ADI skill and the other provided utility skills.
 
-#### analyse_document
+#### adi_2_aisearch
+
+`./indexer/adi_2_aisearch.py` contains the methods for content extraction with ADI. The key methods are:
+
+##### analyse_document
 
 This method takes the passed file, uploads it to ADI and retrieves the Markdown format.
 
-#### process_figures_from_extracted_content
+##### process_figures_from_extracted_content
 
 This method takes the detected figures, and crops them out of the page to save them as images. It uses the `understand_image_with_vlm` to communicate with Azure OpenAI to understand the meaning of the extracted figure.
 
 `update_figure_description` is used to update the original Markdown content with the description and meaning of the figure.
 
-#### clean_adi_markdown
+##### clean_adi_markdown
 
 This method performs the final cleaning of the Markdown contents. In this method, the section headings and page numbers are extracted for the content to be returned to the indexer.
 
@@ -181,7 +192,6 @@ If `chunk_by_page` header is `False`:
 
 **Page wise analysis in ADI is recommended to avoid splitting tables / figures across multiple chunks, when the chunking is performed.**
 
-
 ## Production Considerations
 
 Below are some of the considerations that should be made before using this custom skill in production:
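
Note on the "ADI Custom Skill" section of `adi_function_app/README.md`: the README says the skill can be exercised by sending an HTTP request "in the AI Search JSON format" to the `/adi_2_deploy_ai_search` endpoint. As a rough sketch of what that envelope looks like, the Python below builds a request body and splits a response into successes and failures. The `values` / `recordId` / `data` / `errors` keys follow the general AI Search custom skill contract; the input field name `source` and the example URL are assumptions made for illustration, since this diff does not show the fields the function app actually expects.

```python
import json


def build_skill_request(records):
    """Wrap per-document inputs in the AI Search custom skill envelope.

    Each input dict becomes one entry under "values" with a unique recordId.
    The keys inside "data" are whatever the skill expects; "source" below is
    a hypothetical field name, not taken from the repository.
    """
    return {
        "values": [
            {"recordId": str(index), "data": dict(data)}
            for index, data in enumerate(records)
        ]
    }


def split_skill_response(response):
    """Separate records that succeeded from records that reported errors."""
    succeeded, failed = {}, {}
    for value in response.get("values", []):
        if value.get("errors"):
            failed[value["recordId"]] = value["errors"]
        else:
            succeeded[value["recordId"]] = value.get("data", {})
    return succeeded, failed


if __name__ == "__main__":
    # Hypothetical input; replace with the fields the deployed skill expects.
    payload = build_skill_request([{"source": "https://example.invalid/report.pdf"}])
    print(json.dumps(payload, indent=2))
```

The printed body can be sent with any HTTP client to the deployed function's endpoint; matching each response entry back to its input by `recordId` is how per-document failures are isolated from the rest of the batch.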
