(EAI-923) universal tagging system #671

Open
wants to merge 8 commits into
base: main
Changes from 5 commits
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ import {
Factuality,
} from "autoevals";
import { strict as assert } from "assert";
import { MongoDbTag } from "../mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import { fuzzyLinkMatch } from "./fuzzyLinkMatch";
import { binaryNdcgAtK } from "./scorers/binaryNdcgAtK";
import { ConversationEvalCase as ConversationEvalCaseSource } from "mongodb-rag-core/eval";
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,24 +10,26 @@
Can be run through npm script or directly using node:

```bash
npm run generate-eval-cases -- <csvFileName> <yamlFileName> [transformationType]
npm run generate-eval-cases -- <csvFileName> <yamlFileName> [transformationType] [transformationOptions]
```

Or:

```bash
node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType]
node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType] [transformationOptions]
```

### Arguments

- `csvFilePath`: (Required) Absolute path to the input CSV file
- `yamlFileName`: (Required) Name of the output YAML file (without .yml extension)
- `transformationType`: (Optional) Type of transformation to apply to the cases
- `transformationOptions`: (Optional) Additional options for the transformation

### Available Transformations

- `web`: Adds a "web" tag to all evaluation cases
- `addTags`: Adds specified tags to all evaluation cases
- `addCustomTags`: Adds specified custom tags to all evaluation cases

### File Paths

Expand All @@ -36,14 +38,14 @@
### Example

```bash
npm run generate-eval-cases -- Users/first.lastname/Downloads/input-file.csv output-file-name web
npm run generate-eval-cases -- /path/to/input.csv output-name addTags tag1 tag2
```

This will:
1. Read from: /Users/first.lastname/Downloads/input-file.csv
2. Apply the web transformation
3. Write to: evalCases/output-file-name.yml
4. Log missing resources to the console in a warning
1. Read from: /path/to/input.csv
2. Add tags "tag1" and "tag2" to all cases, after validating them against the MongoDbTags enum.
3. Write to: evalCases/output-name.yml
4. Log any missing resources to the console as warnings
*/

import fs from "fs";
Expand All @@ -55,6 +57,7 @@ import {
} from "mongodb-rag-core/eval";
import { MONGODB_CONNECTION_URI, MONGODB_DATABASE_NAME } from "../../config";
import { makeMongoDbPageStore } from "mongodb-rag-core";
import { validateTags } from "mongodb-rag-core";

const SRC_ROOT = path.resolve(__dirname, "../");

Expand All @@ -63,24 +66,30 @@ const pageStore = makeMongoDbPageStore({
databaseName: MONGODB_DATABASE_NAME,
});

function addWebDataSourceTag(evalCases: ConversationEvalCase[]) {
return evalCases.map((caseItem) => {
const tags = caseItem.tags || [];
if (!tags.includes("web")) {
tags.push("web");
}
return {
...caseItem,
tags,
};
});
function addTags({
evalCases,
tagNames,
custom = false,
}: {
evalCases: ConversationEvalCase[];
tagNames: string[];
custom?: boolean;
}): ConversationEvalCase[] {
validateTags(tagNames, custom);
return evalCases.map((caseItem) => ({
...caseItem,
tags: [...(caseItem.tags || []), ...tagNames],
}));
}
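
A note on the new `addTags` above: the removed `addWebDataSourceTag` skipped the tag when a case already had it, whereas spreading `tagNames` onto the existing array can leave duplicates if a case already carries one of the tags. A minimal sketch of a de-duplicating variant follows. It assumes a simplified `EvalCaseLike` type (hypothetical, standing in for `ConversationEvalCase`) and leaves out the validation step; `addTagsDeduped` is an illustrative name, not part of this PR.

```typescript
// Hypothetical stand-in for ConversationEvalCase; only `tags` matters here.
interface EvalCaseLike {
  name: string;
  tags?: string[];
}

// Appends tagNames to each case, keeping first-seen order and dropping repeats.
function addTagsDeduped(
  evalCases: EvalCaseLike[],
  tagNames: string[]
): EvalCaseLike[] {
  return evalCases.map((caseItem) => ({
    ...caseItem,
    // A Set keeps insertion order, so existing tags stay in front.
    tags: Array.from(new Set([...(caseItem.tags ?? []), ...tagNames])),
  }));
}

const tagged = addTagsDeduped(
  [{ name: "a", tags: ["web"] }, { name: "b" }],
  ["web", "atlas"]
);
console.log(tagged.map((c) => c.tags));
```

Whether duplicates matter depends on how downstream consumers read `tags`; if they only test membership, the simpler spread is probably sufficient.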

const transformationMap: Record<
string,
(cases: ConversationEvalCase[]) => ConversationEvalCase[]
(cases: ConversationEvalCase[], options?: string[]) => ConversationEvalCase[]
> = {
web: addWebDataSourceTag,
addTags: (cases: ConversationEvalCase[], options?: string[]) =>
addTags({ evalCases: cases, tagNames: options || [] }),
addCustomTags: (cases: ConversationEvalCase[], options?: string[]) =>
addTags({ evalCases: cases, tagNames: options || [], custom: true }),
// Add more transformation functions here as needed
};

Expand All @@ -103,15 +112,20 @@ async function main({
csvFilePath,
yamlFileName,
transformationType,
transformationOptions,
}: {
csvFilePath: string;
yamlFileName: string;
transformationType?: keyof typeof transformationMap;
transformationOptions?: string[];
}): Promise<void> {
console.log(`Reading from: ${csvFilePath}`);
const evalCases = await getConversationEvalCasesFromCSV(
csvFilePath,
transformationType ? transformationMap[transformationType] : undefined
transformationType
? (cases) =>
transformationMap[transformationType](cases, transformationOptions)
: undefined
);
const expectedUrls = Array.from(
new Set(evalCases.flatMap((caseItem) => caseItem.expectedLinks ?? []))
Expand Down Expand Up @@ -142,15 +156,20 @@ async function main({
// Checks if the script is being run directly (not imported as a module) and handles command-line arguments.
if (require.main === module) {
const args = process.argv.slice(2);
const [csvFilePath, yamlFileName, transformationType] = args;
const [
csvFilePath,
yamlFileName,
transformationType,
...transformationOptions
] = args;
const availableTransformationTypes = Object.keys(transformationMap);
if (
args.length < 2 ||
(transformationType &&
!availableTransformationTypes.includes(transformationType))
) {
console.error(
"Usage: node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType]\n" +
"Usage: node generateEvalCasesYamlFromCSV.js <csvFileName> <yamlFileName> [transformationType] [transformationOptions]\n" +
"Arguments:\n" +
" csvFileName: Input CSV file name (required)\n" +
" yamlFileName: Output YAML file name (required)\n" +
Expand All @@ -166,6 +185,7 @@ if (require.main === module) {
csvFilePath,
yamlFileName,
transformationType,
transformationOptions,
})
.catch((error) => {
console.error("Error:", error);
Expand Down
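
The argument handling in this file can be sketched in isolation. The rest element in the destructuring collects everything after `transformationType`, and those strings are forwarded to the selected transformation. The snippet below is a simplified illustration, not the script itself: `Case`, `transformations`, and `parseArgs` are hypothetical names, and the single `addTags` entry does no validation.

```typescript
type Case = { tags?: string[] };
type Transformation = (cases: Case[], options?: string[]) => Case[];

// Hypothetical map with one entry, mirroring the shape of transformationMap.
const transformations: Record<string, Transformation> = {
  addTags: (cases, options = []) =>
    cases.map((c) => ({ ...c, tags: [...(c.tags ?? []), ...options] })),
};

// Splits argv-style input into two required positionals, an optional
// transformation type, and any remaining arguments as options.
function parseArgs(args: string[]) {
  const [csvFilePath, yamlFileName, transformationType, ...transformationOptions] =
    args;
  return { csvFilePath, yamlFileName, transformationType, transformationOptions };
}

const parsed = parseArgs([
  "/path/to/input.csv",
  "output-name",
  "addTags",
  "tag1",
  "tag2",
]);
const transform = parsed.transformationType
  ? transformations[parsed.transformationType]
  : undefined;
const result = transform ? transform([{}], parsed.transformationOptions) : [{}];
console.log(parsed.transformationOptions, result);
```

One detail worth noting: when no extra arguments follow the transformation type, the rest element is an empty array rather than `undefined`, so a transformation invoked this way receives `[]` as its options.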
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,10 @@ async function conversationEval() {
// Get dotcom question set eval cases from YAML
const basePath = path.resolve(__dirname, "..", "..", "..", "evalCases");
const conversationEvalCases = getConversationsEvalCasesFromYaml(
fs.readFileSync(path.resolve(basePath, "uni_skills_evaluation_questions.yml"), "utf8")
fs.readFileSync(
path.resolve(basePath, "uni_skills_evaluation_questions.yml"),
"utf8"
)
);

const generateConfig = {
Expand Down Expand Up @@ -52,4 +55,4 @@ async function conversationEval() {
generate: generateConfig,
});
}
conversationEval();
conversationEval();
2 changes: 1 addition & 1 deletion packages/chatbot-server-mongodb-public/src/lib.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,4 @@
Export some modules from the implementation for use in things like evaluation.
*/
export { systemPrompt } from "./systemPrompt";
export * as mongoDbMetadata from "./mongoDbMetadata";
export * as mongoDbMetadata from "mongodb-rag-core";
17 changes: 0 additions & 17 deletions packages/chatbot-server-mongodb-public/src/mongoDbMetadata/tags.ts

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ import {
} from "./extractMongoDbMetadataFromUserMessage";
import { Eval } from "braintrust";
import { Scorer } from "autoevals";
import { MongoDbTag } from "../mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import {
OPENAI_PREPROCESSOR_CHAT_COMPLETION_DEPLOYMENT,
openAiClient,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ import { OpenAI } from "mongodb-rag-core/openai";
import {
mongoDbProductNames,
mongoDbProgrammingLanguageIds,
} from "../mongoDbMetadata";
} from "mongodb-rag-core";

export const ExtractMongoDbMetadataFunctionSchema = z.object({
programmingLanguage: z
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ import {
} from "./makeStepBackUserQuery";
import { Message, updateFrontMatter } from "mongodb-chatbot-server";
import { ObjectId } from "mongodb-rag-core/mongodb";
import { MongoDbTag } from "../mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import {
OPENAI_PREPROCESSOR_CHAT_COMPLETION_DEPLOYMENT,
OPENAI_API_KEY,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -110,8 +110,9 @@ const fewShotExamples: OpenAI.ChatCompletionMessageParam[] = [
mongoDbProduct: "MongoDB University",
})
),
makeAssistantFunctionCallMessage(name,{
transformedUserQuery: "What is the skill badge program on MongoDB University?",
makeAssistantFunctionCallMessage(name, {
transformedUserQuery:
"What is the skill badge program on MongoDB University?",
} satisfies StepBackUserQueryMongoDbFunction),
];

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ import { binaryNdcgAtK } from "../eval/scorers/binaryNdcgAtK";
import { f1AtK } from "../eval/scorers/f1AtK";
import { precisionAtK } from "../eval/scorers/precisionAtK";
import { recallAtK } from "../eval/scorers/recallAtK";
import { MongoDbTag } from "../mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import {
extractMongoDbMetadataFromUserMessage,
ExtractMongoDbMetadataFunction,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ import {
} from "./userMessageMongoDbGuardrail";
import { Eval } from "braintrust";
import { Scorer, LLMClassifierFromTemplate } from "autoevals";
import { MongoDbTag } from "../mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import {
JUDGE_LLM,
OPENAI_PREPROCESSOR_CHAT_COMPLETION_DEPLOYMENT,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -147,9 +147,7 @@ const fewShotExamples: OpenAI.ChatCompletionMessageParam[] = [
rejectMessage: false,
} satisfies UserMessageMongoDbGuardrailFunction),
// Example 16
makeUserMessage(
"What is a skill?"
),
makeUserMessage("What is a skill?"),
makeAssistantFunctionCallMessage(name, {
reasoning:
"This query is asking about MongoDB University's skills program, which allows users to earn a skill badge for taking a short course and completing an assessment. Therefore, it is relevant to MongoDB.",
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import { Eval, EvalCase, EvalScorer } from "braintrust";
import { MongoDbTag } from "./mongoDbMetadata";
import { MongoDbTag } from "mongodb-rag-core";
import {
findVerifiedAnswer,
verifiedAnswerConfig,
Expand Down
1 change: 1 addition & 0 deletions packages/mongodb-rag-core/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ export * from "./References";
export * from "./VectorStore";
export * from "./arrayFilters";
export * from "./assertEnvVars";
export * from "./mongoDbMetadata";
Collaborator

Suggested change
export * from "./mongoDbMetadata";

pls do this as a named export. in the PR #678, i added much of the metadata stuff.

it's exported as mongodb-rag-core/mongoDbMetadata.

pls update the imports throughout this PR accordingly.

Original file line number Diff line number Diff line change
Expand Up @@ -311,12 +311,14 @@ export const mongoDbProducts = [
{
id: "mongodb_university",
name: "MongoDB University",
description: "Online platform that offers certifications, courses, labs, and skills badges",
description:
"Online platform that offers certifications, courses, labs, and skills badges",
},
{
id: "skills",
name: "MongoDB University Skills",
description: "An educational program that allows users to earn a skill badge after taking a short course and completing an assessment",
description:
"An educational program that allows users to earn a skill badge after taking a short course and completing an assessment",
},
] as const satisfies MongoDbProduct[];

Expand Down
76 changes: 76 additions & 0 deletions packages/mongodb-rag-core/src/mongoDbMetadata/tags.ts
Collaborator

export this from mongoDbMetadata/index.ts

Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
import { mongoDbProducts, mongodbDrivers } from "./products";
import { mongoDbProgrammingLanguageIds } from "./programmingLanguages";
import { mongoDbTopics } from "./topics";

// Helpers for constructing the `MongoDbTag` union type
const mongoDbProductIds = mongoDbProducts.map((product) => product.id);
const mongoDbDriverIds = mongodbDrivers.map((driver) => driver.id);
const mongoDbTopicIds = mongoDbTopics.map((topic) => topic.id);

/**
All possible MongoDB tags. Useful for tagging evaluations.
*/
export type MongoDbTag =
| (typeof mongoDbProgrammingLanguageIds)[number]
| (typeof mongoDbProductIds)[number]
| (typeof mongoDbDriverIds)[number]
| (typeof mongoDbTopicIds)[number];

/**
All possible MongoDB tags as enum.
*/
export const mongoDbTags = {
Collaborator

i'm confused by this. isn't this a record, not an enum?

Collaborator

typescript does support enums as a first-class construct https://www.w3schools.com/typescript/typescript_enums.php

consider also doing a Set

Collaborator Author

You're right, it's not an enum. I made some changes that simplify it a bit.

// Add all programming language IDs
...mongoDbProgrammingLanguageIds.reduce((acc, id) => {
acc[id] = id;
return acc;
}, {} as Record<string, string>),

// Add all product IDs
...mongoDbProductIds.reduce((acc, id) => {
acc[id] = id;
return acc;
}, {} as Record<string, string>),

// Add all driver IDs
...mongoDbDriverIds.reduce((acc, id) => {
acc[id] = id;
return acc;
}, {} as Record<string, string>),

// Add all topic IDs
...mongoDbTopicIds.reduce((acc, id) => {
acc[id] = id;
return acc;
}, {} as Record<string, string>),
} as const;

/**
Validates an array of tag names against the MongoDbTags enum.

@param tagNames - An array of strings representing tag names to validate
@param custom - A boolean flag indicating whether custom tags are allowed
@throws {Error} When non-custom tags are used that don't exist in MongoDbTags enum

@remarks
If custom is false, all tags must exist in the MongoDbTags enum.
If any invalid tags are found, throws an error with the list of invalid tags
and the allowed tags from MongoDbTags enum.
*/
export const validateTags = (tagNames: string[], custom: boolean): void => {
if (!custom) {
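// check that every tag is an own key of mongoDbTags
// (the `in` operator would also match inherited keys such as "toString")
const invalidTags = tagNames.filter(
(tag) => !Object.prototype.hasOwnProperty.call(mongoDbTags, tag)
);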
if (invalidTags.length > 0) {
throw new Error(
`Invalid tags found: ${invalidTags.join(
", "
)} \nUse the "addCustomTags" transformation instead or use allowed tags: \n - ${Object.keys(
mongoDbTags
)
.sort()
.join("\n - ")}`
);
}
}
};
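
The review thread suggests a `Set` as an alternative to the record. A minimal sketch of what that could look like is below. The tag IDs are hypothetical placeholders (the real ones would come from the product, driver, programming language, and topic lists), and `findInvalidTags` is an illustrative helper, not a function from this PR.

```typescript
// Hypothetical tag IDs standing in for the combined metadata ID lists.
const allowedTagIds = ["atlas", "python", "aggregation"] as const;
const allowedTags = new Set<string>(allowedTagIds);

// Returns the tags that are not in the allowed set (empty array means all valid).
function findInvalidTags(tagNames: string[]): string[] {
  return tagNames.filter((tag) => !allowedTags.has(tag));
}

// For comparison: a plain-object lookup with `in` also sees inherited keys.
const asObject: Record<string, string> = { atlas: "atlas" };
console.log("toString" in asObject); // true (inherited from Object.prototype)
console.log(allowedTags.has("toString")); // false
console.log(findInvalidTags(["atlas", "toString", "nope"])); // [ 'toString', 'nope' ]
```

A `Set` only contains what was explicitly added, so membership checks cannot be satisfied by names inherited from `Object.prototype`, and `Array.from(allowedTags).sort()` still gives the list of allowed tags for an error message.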