Skip to content

Nishan Jain - Technical Writer Task (RAG and Text-to-SQL) #89

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gr8nishan
Copy link

@gr8nishan gr8nishan commented Mar 14, 2025

This LlamaIndex-based application is an AI-powered query agent that retrieves data from both a database and a file-based system using LlamaIndex and OpenAI's GPT models. It utilizes an agent router to determine the appropriate source for answering user queries.

Summary by CodeRabbit

  • New Features

    • Launched an AI-powered, hybrid query interface that enables users to input queries and receive responses powered by both SQL and semantic search engines.
    • Enhanced query routing to automatically direct queries to the most suitable processing engine.
  • Documentation

    • Introduced comprehensive setup guides and usage instructions, covering prerequisites and environment configuration.
  • Chores

    • Updated dependency management to ensure stability with controlled library versions.

Copy link

coderabbitai bot commented Mar 14, 2025

Walkthrough

This pull request introduces the foundation for an AI-powered SQL-RAG query agent. It adds documentation and dependency declarations along with implementation files. The new Streamlit application integrates SQL database queries with semantic search via LlamaCloud and OpenAI’s GPT models. Additionally, a custom router agent workflow is defined using event classes and asynchronous methods to manage tool calls and chat interactions. Environment setup, error handling, and query processing flows are clearly structured across the new files.

Changes

File(s) Change Summary
Readme.md, requirements.txt Added documentation detailing the SQL-RAG LlamaIndex Agent’s features, installation instructions, and usage. Introduced a new requirements file listing the necessary dependencies with specific versions (e.g., llama_index, streamlit, sqlalchemy).
sql_rag.py, tool_workflow.py Created the main Streamlit application to integrate SQL querying with semantic search via LlamaCloud. Implemented functions for API key retrieval, logging, database initialization, and query execution. Added a router workflow with event classes and asynchronous methods to manage tool calls and chat.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI as "Streamlit UI"
    participant App as "SQL-RAG App"
    participant DB as "SQLite DB"
    participant Workflow as "RouterWorkflow"
    participant Llama as "LlamaCloud/LLM"

    User->>UI: Enter query
    UI->>App: Invoke main() & initialize app
    App->>DB: Set up in-memory SQLite DB
    App->>App: Retrieve API keys & setup logging
    App->>Llama: Initialize LlamaCloud query engine
    App->>Workflow: Initialize router workflow
    User->>App: Submit query
    App->>Workflow: Run query asynchronously
    Workflow->>Llama: Process query via chat method
    Llama-->>Workflow: Return response
    Workflow-->>App: Return final result
    App-->>UI: Display response
Loading
sequenceDiagram
    participant User
    participant Workflow as "Router Workflow"
    participant LLM as "Language Model"
    participant Tool as "Query Tool"

    User->>Workflow: Provide query
    Workflow->>LLM: Prepare chat & validate input
    LLM-->>Workflow: Chat response (with possible tool calls)
    alt Tool call detected
        Workflow->>Tool: Dispatch tool call
        Tool-->>Workflow: Return tool result
        Workflow->>LLM: Append result & continue chat
    else No tool call
        Workflow-->>User: Return final message
    end
Loading

Poem

I hop through lines of freshly spun code,
With carrots of logic on a new episode.
SQL and Llama dance in rhythmic delight,
Workflows and queries bloom in the night.
Every commit makes my rabbit heart light!

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true under in the settings.
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@gr8nishan gr8nishan changed the title Nishan Jain - Technical Writer Task Nishan Jain - Technical Writer Task (RAG and Text-to-SQL) Mar 14, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
Task1-sql-router-assessment/requirements.txt (1)

4-4: Remove unintended spacing in version specifier.

There's an extra space after "==" in line 4 for the sqlalchemy package, which might cause issues in some parsing contexts.

-sqlalchemy== 2.0.39
+sqlalchemy==2.0.39
Task1-sql-router-assessment/Readme.md (2)

79-79: Use a Markdown hyperlink instead of a bare URL.

Avoid using bare URLs to comply with Markdown linting best practices.

-https://typefully.com/t/Vkxs861
+[Typefully Link](https://typefully.com/t/Vkxs861)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

79-79: Bare URL used
null

(MD034, no-bare-urls)


82-82: Consider rephrasing "feel free to" for a more professional tone.

“Feel free to” is often overused. Replacing it can polish your wording.

-Contributions are welcome! Feel free to fork the repo and submit a pull request.
+Contributions are welcome! You can fork the repo and submit a pull request.
🧰 Tools
🪛 LanguageTool

[style] ~82-~82: The phrase ‘feel free to’ is used quite frequently. Consider using a less frequent alternative to set your writing apart from others and make it sound more professional.
Context: ...Contributing Contributions are welcome! Feel free to fork the repo and submit a pull request...

(FEEL_FREE_TO_STYLE_ME)

Task1-sql-router-assessment/sql_rag.py (2)

41-41: Raise exceptions using the original chain for better traceback context.

When re-raising exceptions, it's preferable to use raise ... from e to preserve the original traceback chain.

Here’s a sample fix:

- except Exception as e:
-     raise RuntimeError(f"Failed to initialize LlamaCloudIndex: {e}")
+ except Exception as e:
+     raise RuntimeError(f"Failed to initialize LlamaCloudIndex: {e}") from e

...
- except Exception as e:
-     raise RuntimeError(f"Failed to set up database: {e}")
+ except Exception as e:
+     raise RuntimeError(f"Failed to set up database: {e}") from e

...
- except Exception as e:
-     raise RuntimeError(f"Failed to setup SQL query engine: {e}")
+ except Exception as e:
+     raise RuntimeError(f"Failed to setup SQL query engine: {e}") from e

...
- except Exception as e:
-     raise RuntimeError(f"Failed to initialize workflow: {e}")
+ except Exception as e:
+     raise RuntimeError(f"Failed to initialize workflow: {e}") from e

...
- except Exception as e:
-     raise RuntimeError(f"Query execution failed: {e}")
+ except Exception as e:
+     raise RuntimeError(f"Query execution failed: {e}") from e

Also applies to: 72-72, 86-86, 94-94, 102-102

🧰 Tools
🪛 Ruff (0.8.2)

41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


130-130: Consider asynchronous Streamlit patterns if concurrency becomes necessary.

You’re calling asyncio.run(run_query(...)) within a button callback. This is fine for simple use, but if you need additional concurrency, explore non-blocking approaches or Streamlit’s async patterns in the future.

Task1-sql-router-assessment/tool_workflow.py (6)

1-2: Use explicit Python version union syntax or future import if targeting Python < 3.10.

The file uses Python 3.10+ union syntax (e.g., GatherToolsEvent | StopEvent). If this code must run under earlier Python versions, consider either importing annotations from __future__ or using Union[GatherToolsEvent, StopEvent].


16-18: Event subclass defined concisely.

The InputEvent class is straightforward, but consider adding typed fields or docstrings if you anticipate storing data in InputEvent later.


20-24: GatherToolsEvent usage.

tool_calls is typed as Any. If possible, refine the type to something more explicit to improve clarity (e.g., List[ToolSelection]), making the code more self-documenting.


41-48: Consider clarifying parameter docstrings.

The __init__ signature is thorough, but documenting each parameter’s purpose (e.g., tools, timeout, disable_validation, etc.) with docstrings could help other developers understand usage more quickly.


58-62: Reset method clarity.

reset simply clears chat history. Consider resetting other session-related attributes (like the tools_dict or partial states) if relevant to the workflow’s state machine.


63-73: Validate message existence.

You correctly raise an error when 'message' is missing and then store user input in chat history. No immediate issues, but you may want to handle empty messages or sanitize user inputs if security is a concern.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6140fc3 and 09aa4c5.

📒 Files selected for processing (4)
  • Task1-sql-router-assessment/Readme.md (1 hunks)
  • Task1-sql-router-assessment/requirements.txt (1 hunks)
  • Task1-sql-router-assessment/sql_rag.py (1 hunks)
  • Task1-sql-router-assessment/tool_workflow.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
Task1-sql-router-assessment/Readme.md

[grammar] ~63-~63: Please check the verb form.
Context: ...y via the selected engine. ## Usage 1. Open the Streamlit UI. 2. Enter a query, e.g...

(SHE_LIVE)


[style] ~82-~82: The phrase ‘feel free to’ is used quite frequently. Consider using a less frequent alternative to set your writing apart from others and make it sound more professional.
Context: ...Contributing Contributions are welcome! Feel free to fork the repo and submit a pull request...

(FEEL_FREE_TO_STYLE_ME)

🪛 markdownlint-cli2 (0.17.2)
Task1-sql-router-assessment/Readme.md

79-79: Bare URL used
null

(MD034, no-bare-urls)

🪛 Ruff (0.8.2)
Task1-sql-router-assessment/sql_rag.py

41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


72-72: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


86-86: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


94-94: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (11)
Task1-sql-router-assessment/requirements.txt (1)

1-6: All dependencies are pinned and look consistent.

Having pinned versions helps ensure reproducible installs, and the listed dependencies align with the rest of the project requirements.

Task1-sql-router-assessment/Readme.md (2)

63-63: Grammar check appears fine; no changes necessary.

The grammar in line 63 is acceptable, and there's no clear verb form issue.

🧰 Tools
🪛 LanguageTool

[grammar] ~63-~63: Please check the verb form.
Context: ...y via the selected engine. ## Usage 1. Open the Streamlit UI. 2. Enter a query, e.g...

(SHE_LIVE)


1-87: Comprehensive and well-structured documentation.

This README effectively covers the setup, usage, and architecture. No major concerns regarding clarity or consistency.

🧰 Tools
🪛 LanguageTool

[grammar] ~63-~63: Please check the verb form.
Context: ...y via the selected engine. ## Usage 1. Open the Streamlit UI. 2. Enter a query, e.g...

(SHE_LIVE)


[style] ~82-~82: The phrase ‘feel free to’ is used quite frequently. Consider using a less frequent alternative to set your writing apart from others and make it sound more professional.
Context: ...Contributing Contributions are welcome! Feel free to fork the repo and submit a pull request...

(FEEL_FREE_TO_STYLE_ME)

🪛 markdownlint-cli2 (0.17.2)

79-79: Bare URL used
null

(MD034, no-bare-urls)

Task1-sql-router-assessment/sql_rag.py (1)

1-143: Overall, this integration is well-designed and clear.

The code connects SQL and LlamaIndex seamlessly under Streamlit, with robust error handling and logical structure.

🧰 Tools
🪛 Ruff (0.8.2)

41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


72-72: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


86-86: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


94-94: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Task1-sql-router-assessment/tool_workflow.py (7)

3-15: Imports look appropriate.

You are importing classes and types from llama_index. Ensure all these dependencies are correctly installed to avoid runtime errors. Otherwise, no issues here.


26-30: ToolCallEvent type suggestion.

tool_call is strongly typed as ToolSelection, good job. If you expect additional fields or clarifications, consider adding them here for better maintainability.


32-36: ToolCallEventResult type usage.

The msg field is typed as ChatMessage, which is consistent. Make sure you always attach relevant context in additional_kwargs for traceability, if needed.


38-40: High-level workflow class.

The class-level docstring nicely encapsulates the purpose of the workflow.


49-57: Tool dictionary creation and fallback LLM.

  1. Good approach to create a dictionary keyed by tool names.
  2. self.llm falling back to OpenAI is convenient, but be cautious of environment variables or credentials that might be required.

74-97: Conditional stop event logic.

  1. Code uses achat_with_tools to retrieve tool calls.
  2. If no tool calls exist, it returns a StopEvent immediately.
  3. This is a neat approach to short-circuit when no tool calls are needed.

No critical issues noted; just confirm that the short-circuit scenario is expected behavior in your user flow.


138-151: Chat history accumulation and re-loop.

  1. Collecting tool events and appending messages is clear.
  2. The final return of InputEvent() restarts the loop, ensuring the workflow cycles again.
  3. Confirm that the agent truly requires re-entry after tool invocation rather than finalizing if no further user input is expected.

Otherwise, looks good.

Comment on lines +98 to +110
@step(pass_context=True)
async def dispatch_calls(self, ctx: Context, ev: GatherToolsEvent) -> ToolCallEvent:
"""Dispatches calls."""

tool_calls = ev.tool_calls
await ctx.set("num_tool_calls", len(tool_calls))

# trigger tool call events
for tool_call in tool_calls:
ctx.send_event(ToolCallEvent(tool_call=tool_call))

return None

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Return type annotation mismatch.

The method signature indicates -> ToolCallEvent, but the function returns None. This mismatch can cause confusion or break type-checking. Either adjust the return type or return a ToolCallEvent (or a list of them).

-async def dispatch_calls(self, ctx: Context, ev: GatherToolsEvent) -> ToolCallEvent:
+async def dispatch_calls(self, ctx: Context, ev: GatherToolsEvent) -> None:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@step(pass_context=True)
async def dispatch_calls(self, ctx: Context, ev: GatherToolsEvent) -> ToolCallEvent:
"""Dispatches calls."""
tool_calls = ev.tool_calls
await ctx.set("num_tool_calls", len(tool_calls))
# trigger tool call events
for tool_call in tool_calls:
ctx.send_event(ToolCallEvent(tool_call=tool_call))
return None
@step(pass_context=True)
async def dispatch_calls(self, ctx: Context, ev: GatherToolsEvent) -> None:
"""Dispatches calls."""
tool_calls = ev.tool_calls
await ctx.set("num_tool_calls", len(tool_calls))
# trigger tool call events
for tool_call in tool_calls:
ctx.send_event(ToolCallEvent(tool_call=tool_call))
return None

Comment on lines +111 to +137
@step()
async def call_tool(self, ev: ToolCallEvent) -> ToolCallEventResult:
"""Calls tool."""

tool_call = ev.tool_call

# get tool ID and function call
id_ = tool_call.tool_id

if self._verbose:
print(f"Calling function {tool_call.tool_name} with msg {tool_call.tool_kwargs}")

# call function and put result into a chat message
tool = self.tools_dict[tool_call.tool_name]
output = await tool.acall(**tool_call.tool_kwargs)
msg = ChatMessage(
name=tool_call.tool_name,
content=str(output),
role="tool",
additional_kwargs={
"tool_call_id": id_,
"name": tool_call.tool_name
}
)

return ToolCallEventResult(msg=msg)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Potential KeyError for missing tools.

tool = self.tools_dict[tool_call.tool_name] will raise KeyError if no matching tool exists. Consider handling this with a graceful error message or fallback logic.

+if tool_call.tool_name not in self.tools_dict:
+    raise RuntimeError(f"Tool '{tool_call.tool_name}' not found.")
 tool = self.tools_dict[tool_call.tool_name]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@step()
async def call_tool(self, ev: ToolCallEvent) -> ToolCallEventResult:
"""Calls tool."""
tool_call = ev.tool_call
# get tool ID and function call
id_ = tool_call.tool_id
if self._verbose:
print(f"Calling function {tool_call.tool_name} with msg {tool_call.tool_kwargs}")
# call function and put result into a chat message
tool = self.tools_dict[tool_call.tool_name]
output = await tool.acall(**tool_call.tool_kwargs)
msg = ChatMessage(
name=tool_call.tool_name,
content=str(output),
role="tool",
additional_kwargs={
"tool_call_id": id_,
"name": tool_call.tool_name
}
)
return ToolCallEventResult(msg=msg)
@step()
async def call_tool(self, ev: ToolCallEvent) -> ToolCallEventResult:
"""Calls tool."""
tool_call = ev.tool_call
# get tool ID and function call
id_ = tool_call.tool_id
if self._verbose:
print(f"Calling function {tool_call.tool_name} with msg {tool_call.tool_kwargs}")
# call function and put result into a chat message
if tool_call.tool_name not in self.tools_dict:
raise RuntimeError(f"Tool '{tool_call.tool_name}' not found.")
tool = self.tools_dict[tool_call.tool_name]
output = await tool.acall(**tool_call.tool_kwargs)
msg = ChatMessage(
name=tool_call.tool_name,
content=str(output),
role="tool",
additional_kwargs={
"tool_call_id": id_,
"name": tool_call.tool_name
}
)
return ToolCallEventResult(msg=msg)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant