(EAI-923) universal tagging system #671

Open
wants to merge 8 commits into
base: main
@@ -21,7 +21,7 @@ import {
Factuality,
} from "autoevals";
import { strict as assert } from "assert";
import { MongoDbTag } from "../mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import { fuzzyLinkMatch } from "./fuzzyLinkMatch";
import { binaryNdcgAtK } from "./scorers/binaryNdcgAtK";
import { ConversationEvalCase as ConversationEvalCaseSource } from "mongodb-rag-core/eval";
@@ -10,24 +10,26 @@
Can be run through npm script or directly using node:

```bash
npm run generate-eval-cases -- <csvFileName> <yamlFileName> [transformationType]
npm run generate-eval-cases -- <csvFileName> <yamlFileName> [transformationType] [transformationOptions]
```

Or:

```bash
node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType]
node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType] [transformationOptions]
```

### Arguments

- `csvFilePath`: (Required) Absolute path to the input CSV file
- `yamlFileName`: (Required) Name of the output YAML file (without .yml extension)
- `transformationType`: (Optional) Type of transformation to apply to the cases
- `transformationOptions`: (Optional) Additional options for the transformation

### Available Transformations

- `web`: Adds a "web" tag to all evaluation cases
- `addTags`: Adds specified tags to all evaluation cases
- `addCustomTags`: Adds specified custom tags to all evaluation cases

### File Paths

@@ -36,14 +38,14 @@
### Example

```bash
npm run generate-eval-cases -- Users/first.lastname/Downloads/input-file.csv output-file-name web
npm run generate-eval-cases -- /path/to/input.csv output-name addTags tag1 tag2
```

This will:
1. Read from: /Users/first.lastname/Downloads/input-file.csv
2. Apply the web transformation
3. Write to: evalCases/output-file-name.yml
4. Log missing resources to the console in a warning
1. Read from: /path/to/input.csv
2. Add tags "tag1" and "tag2" to all cases, after validating them against the MongoDbTags enum.
3. Write to: evalCases/output-name.yml
4. Log any missing resources to the console as warnings
*/

import fs from "fs";
@@ -55,6 +57,7 @@ import {
} from "mongodb-rag-core/eval";
import { MONGODB_CONNECTION_URI, MONGODB_DATABASE_NAME } from "../../config";
import { makeMongoDbPageStore } from "mongodb-rag-core";
import { validateTags } from "mongodb-rag-core";

const SRC_ROOT = path.resolve(__dirname, "../");

@@ -63,24 +66,30 @@ const pageStore = makeMongoDbPageStore({
databaseName: MONGODB_DATABASE_NAME,
});

function addWebDataSourceTag(evalCases: ConversationEvalCase[]) {
return evalCases.map((caseItem) => {
const tags = caseItem.tags || [];
if (!tags.includes("web")) {
tags.push("web");
}
return {
...caseItem,
tags,
};
});
function addTags({
evalCases,
tagNames,
custom = false,
}: {
evalCases: ConversationEvalCase[];
tagNames: string[];
custom?: boolean;
}): ConversationEvalCase[] {
validateTags(tagNames, custom);
return evalCases.map((caseItem) => ({
...caseItem,
tags: [...(caseItem.tags || []), ...tagNames],
}));
}

const transformationMap: Record<
string,
(cases: ConversationEvalCase[]) => ConversationEvalCase[]
(cases: ConversationEvalCase[], options?: string[]) => ConversationEvalCase[]
> = {
web: addWebDataSourceTag,
addTags: (cases: ConversationEvalCase[], options?: string[]) =>
addTags({ evalCases: cases, tagNames: options || [] }),
addCustomTags: (cases: ConversationEvalCase[], options?: string[]) =>
addTags({ evalCases: cases, tagNames: options || [], custom: true }),
// Add more transformation functions here as needed
};
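
One behavioral difference worth noting: the removed `addWebDataSourceTag` skipped the tag when it was already present, while the new `addTags` appends unconditionally, so a case that already carries a tag could end up with it twice. A minimal sketch of a deduplicating variant follows. The `EvalCase` type, `KNOWN_TAGS` set, and `validateTagsSketch` are hypothetical stand-ins for illustration only; the real `ConversationEvalCase` and `validateTags` live in `mongodb-rag-core`, and how `validateTags` treats the `custom` flag is an assumption here.

```typescript
// Hypothetical stand-ins: simplified versions of the types and validator
// that the PR imports from mongodb-rag-core. Not the library's actual API.
type EvalCase = { name: string; tags?: string[] };

// Assumed tag values, for illustration only.
const KNOWN_TAGS = new Set(["web", "atlas", "aggregation"]);

function validateTagsSketch(tagNames: string[], custom = false): void {
  // Assumption: custom tags bypass the enum check.
  if (custom) return;
  const unknown = tagNames.filter((tag) => !KNOWN_TAGS.has(tag));
  if (unknown.length > 0) {
    throw new Error(`Unknown tags: ${unknown.join(", ")}`);
  }
}

function addTagsDeduped(
  evalCases: EvalCase[],
  tagNames: string[],
  custom = false
): EvalCase[] {
  // Validate once, before touching any case.
  validateTagsSketch(tagNames, custom);
  return evalCases.map((caseItem) => ({
    ...caseItem,
    // A Set keeps first-seen order and drops repeated tags.
    tags: Array.from(new Set([...(caseItem.tags ?? []), ...tagNames])),
  }));
}
```

Building a new `tags` array per case also avoids mutating the input, which the removed implementation did via `tags.push`.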

@@ -103,15 +112,20 @@ async function main({
csvFilePath,
yamlFileName,
transformationType,
transformationOptions,
}: {
csvFilePath: string;
yamlFileName: string;
transformationType?: keyof typeof transformationMap;
transformationOptions?: string[];
}): Promise<void> {
console.log(`Reading from: ${csvFilePath}`);
const evalCases = await getConversationEvalCasesFromCSV(
csvFilePath,
transformationType ? transformationMap[transformationType] : undefined
transformationType
? (cases) =>
transformationMap[transformationType](cases, transformationOptions)
: undefined
);
const expectedUrls = Array.from(
new Set(evalCases.flatMap((caseItem) => caseItem.expectedLinks ?? []))
@@ -142,15 +156,20 @@ async function main({
// Checks if the script is being run directly (not imported as a module) and handles command-line arguments.
if (require.main === module) {
const args = process.argv.slice(2);
const [csvFilePath, yamlFileName, transformationType] = args;
const [
csvFilePath,
yamlFileName,
transformationType,
...transformationOptions
] = args;
const availableTransformationTypes = Object.keys(transformationMap);
if (
args.length < 2 ||
(transformationType &&
!availableTransformationTypes.includes(transformationType))
) {
console.error(
"Usage: node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType]\n" +
"Usage: node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType] [transformationOptions]\n" +
"Arguments:\n" +
" csvFileName: Input CSV file name (required)\n" +
" yamlFileName: Output YAML file name (required)\n" +
@@ -166,6 +185,7 @@ if (require.main === module) {
csvFilePath,
yamlFileName,
transformationType,
transformationOptions,
})
.catch((error) => {
console.error("Error:", error);
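
The argument handling in this file can be exercised in isolation. The sketch below mirrors the rest-destructuring and validation shown in the diff; `parseArgs`, `ParsedArgs`, and the hard-coded `TRANSFORMATION_TYPES` list are hypothetical names introduced for illustration (the script itself derives the list from `Object.keys(transformationMap)` and prints the usage message instead of returning `undefined`).

```typescript
type ParsedArgs = {
  csvFilePath: string;
  yamlFileName: string;
  transformationType?: string;
  transformationOptions: string[];
};

// Assumed list for illustration; the script reads the keys of transformationMap.
const TRANSFORMATION_TYPES = ["addTags", "addCustomTags"];

function parseArgs(args: string[]): ParsedArgs | undefined {
  // Everything after the third positional argument becomes an option.
  const [csvFilePath, yamlFileName, transformationType, ...transformationOptions] =
    args;
  if (
    args.length < 2 ||
    (transformationType !== undefined &&
      !TRANSFORMATION_TYPES.includes(transformationType))
  ) {
    // The caller would print usage and exit here.
    return undefined;
  }
  return { csvFilePath, yamlFileName, transformationType, transformationOptions };
}

// Equivalent of: npm run generate-eval-cases -- /path/to/input.csv output-name addTags tag1 tag2
const parsed = parseArgs([
  "/path/to/input.csv",
  "output-name",
  "addTags",
  "tag1",
  "tag2",
]);
console.log(parsed?.transformationOptions);
```

Because the options are collected with a rest element, `transformationOptions` is always an array, and it is empty when no options are passed.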
@@ -17,7 +17,10 @@ async function conversationEval() {
// Get dotcom question set eval cases from YAML
const basePath = path.resolve(__dirname, "..", "..", "..", "evalCases");
const conversationEvalCases = getConversationsEvalCasesFromYaml(
fs.readFileSync(path.resolve(basePath, "uni_skills_evaluation_questions.yml"), "utf8")
fs.readFileSync(
path.resolve(basePath, "uni_skills_evaluation_questions.yml"),
"utf8"
)
);

const generateConfig = {
@@ -52,4 +55,4 @@ async function conversationEval() {
generate: generateConfig,
});
}
conversationEval();
conversationEval();
2 changes: 1 addition & 1 deletion packages/chatbot-server-mongodb-public/src/lib.ts
@@ -3,4 +3,4 @@
Export some modules from the implementation for use in things like evaluation.
*/
export { systemPrompt } from "./systemPrompt";
export * as mongoDbMetadata from "./mongoDbMetadata";
export * as mongoDbMetadata from "mongodb-rag-core";

This file was deleted.