[Schema Inaccuracy] Duplicate title properties and inline schemas #4622

Open
wolfy1339 opened this issue Mar 12, 2025 · 3 comments
@wolfy1339

Schema Inaccuracy

The schema contains many inline schemas that should instead reference reusable components, and many inline schemas and components share the same title property.

This causes problems when using tools like json-schema-to-typescript where you end up with many TypeScript interfaces named User_3, Repository_1, etc.
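To illustrate how repeated titles can turn into numbered type names, here is a minimal sketch. `assignTypeNames` is a hypothetical helper written for this example; it is not json-schema-to-typescript's actual implementation, only an assumption about how a generator might disambiguate colliding names.

```javascript
// Hypothetical naming helper (an assumption for illustration, not the real
// json-schema-to-typescript algorithm): the first schema with a given title
// keeps the plain name, and every later one gets a numeric suffix.
function assignTypeNames(titles) {
  const counts = new Map();
  return titles.map((title) => {
    // Strip characters that cannot appear in an identifier.
    const base = title.replace(/[^A-Za-z0-9]+/g, "");
    const seenSoFar = counts.get(base) ?? 0;
    counts.set(base, seenSoFar + 1);
    return seenSoFar === 0 ? base : `${base}_${seenSoFar}`;
  });
}

const names = assignTypeNames(["User", "Repository", "User", "User", "Repository"]);
console.log(names);
// [ 'User', 'Repository', 'User_1', 'User_2', 'Repository_1' ]
```

The suffixed names carry no information about how `User_2` differs from `User`, which is the practical problem for consumers of the generated types.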

Expected

No inline schemas for re-used definitions, or schemas that have the same title

Reproduction Steps

This script writes every duplicate title property it finds to duplicates.txt:

import schema from "./packages/openapi-webhooks/generated/api.github.com.json" with { type: "json" };
import { writeFileSync } from "node:fs";

function findDuplicateTitles(schema) {
  const seen = new Set();
  const duplicates = [];

  function traverse(obj, path = []) {
    if (typeof obj !== "object" || obj === null) return;

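    // Only string titles count; a schema property that is itself named "title" is an object.
    if (typeof obj.title === "string") {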
      const titlePath = path.join("/");
      if (seen.has(obj.title)) {
        duplicates.push({ title: obj.title, path: titlePath });
      } else {
        seen.add(obj.title);
      }
    }

    // Traverse properties like oneOf, anyOf, etc.
    for (const [key, value] of Object.entries(obj)) {
      if (Array.isArray(value)) {
        value.forEach((item, index) => traverse(item, [...path, key, index]));
      } else if (typeof value === "object") {
        traverse(value, [...path, key]);
      }
    }
  }

  traverse(schema);
  return duplicates;
}

async function main() {
  const duplicates = findDuplicateTitles(schema)
    .sort((a, b) => {
      if (a.title < b.title) return -1;
      if (a.title > b.title) return 1;
      return 0;
    })
    .map(({ title, path }) => `- [ ] Title: ${title}, Path: \`#/${path}\``);

  writeFileSync("duplicates.txt", duplicates.join("\n"));
}

main().catch(console.error);

Here is the list of duplicates I have found, which is too long to post directly into the issue body:
duplicates.txt
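Since the full list is long, a per-title summary may be easier to scan. The sketch below assumes the `{ title, path }` records produced by `findDuplicateTitles`; `summarizeByTitle` and the sample records are hypothetical and only illustrate the idea.

```javascript
// Group { title, path } records by title and sort by descending count.
// Note: findDuplicateTitles does not record the first sighting of a title,
// so each count is the number of *repeat* occurrences.
function summarizeByTitle(duplicates) {
  const counts = new Map();
  for (const { title } of duplicates) {
    counts.set(title, (counts.get(title) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort(([titleA, a], [titleB, b]) => b - a || titleA.localeCompare(titleB))
    .map(([title, count]) => ({ title, count }));
}

// Made-up sample records, not taken from the real schema.
const summary = summarizeByTitle([
  { title: "User", path: "paths/a/schema" },
  { title: "Repository", path: "paths/b/schema" },
  { title: "User", path: "paths/c/schema" },
]);
console.log(summary);
```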

@bearcherian
Contributor

bearcherian commented Mar 27, 2025

@wolfy1339 Thanks for opening this issue. The JSON Schema spec provides this documentation for the title field [1]:

The title keyword in JSON Schema is used to provide a human-readable label for a schema or its parts. It does not affect data validation but serves as an informative annotation.

Since the title is just metadata for the schema and is not intended to be a unique identifier, we don't consider the duplicate titles an issue in our schema. I would work with the maintainers of json-schema-to-typescript to see if they have a way to work around that. Alternatively, GitHub does provide openapi-types, a library of TypeScript definitions generated from our OpenAPI schema.

Footnotes

  1. https://www.learnjsonschema.com/2020-12/meta-data/title/

@wolfy1339
Author

Yes, I am aware of all those points. I am also very aware of openapi-types, as I help maintain it.

The duplicate titles point to a bigger issue: those items should most likely either use the reusable components or have a different title that explains the difference between the reusable component and the inline definition.

For example, having the title be App Instance on many of them isn't very informative, considering there is already a reusable component with the same title. What is different between that one and the reusable component? Maybe it's App Instance With Organization, or App Instance Owned By Organization?

I understand that titles aren't necessarily unique, but I believe that this points to a bigger issue of general duplication within the OpenAPI spec and should be looked at for each occurrence to see if there is a way to reduce the duplication.
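One way to triage each occurrence is to look for inline schemas that are structurally identical to an entry under `components.schemas`, since those could be replaced by a `$ref`. The sketch below is only an illustration under stated assumptions: `canonical` and `findInlineCopies` are hypothetical helpers, the comparison is exact equality after sorting keys, and re-serializing every node is quadratic, so a 10MB spec would need a smarter approach. Trivial components such as a bare string type would also match in many places.

```javascript
// Serialize with sorted object keys so that key order does not affect equality.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(value[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Report every node that is an exact copy of a reusable component.
function findInlineCopies(spec) {
  const byShape = new Map();
  for (const [name, schema] of Object.entries(spec.components?.schemas ?? {})) {
    byShape.set(canonical(schema), `#/components/schemas/${name}`);
  }

  const matches = [];
  function traverse(node, path) {
    if (node === null || typeof node !== "object") return;
    // Paths are joined naively, like the reproduction script; keys containing "/" are not escaped.
    const here = `#/${path.join("/")}`;
    const ref = byShape.get(canonical(node));
    if (ref !== undefined && ref !== here) {
      matches.push({ path: here, ref });
      return; // children of a full copy add no new information
    }
    for (const [key, value] of Object.entries(node)) traverse(value, [...path, key]);
  }
  traverse(spec, []);
  return matches;
}

// Made-up miniature spec, not taken from the real schema.
const sample = {
  components: {
    schemas: {
      app: { title: "App Instance", type: "object", properties: { id: { type: "integer" } } },
    },
  },
  paths: {
    hook: { schema: { type: "object", title: "App Instance", properties: { id: { type: "integer" } } } },
  },
};
const found = findInlineCopies(sample);
console.log(found);
// [ { path: '#/paths/hook/schema', ref: '#/components/schemas/app' } ]
```

Inline schemas that share a title but do not match exactly are the cases where a more descriptive title would help.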

The schema is already a mighty 10MB.

I hope you understand the point I'm trying to make with this issue.

@bearcherian
Contributor

Thanks for clarifying the issue. I'll create an issue to track this internally and de-duplicate the components, or use better titles.
