Abhishek Chaudhary

Music Schema.org JSON-LD by Hand for a Hindi Catalog

I hand-rolled the MusicRecording and MusicComposition JSON-LD for a 49+ track Hindi catalogue. Here is the graph, the spec gotchas, and why.

Abhishek Chaudhary · 11 min read

Music schema.org JSON-LD is the structured-data layer that tells Google, Perplexity, and Claude what a song actually is: who composed it, who performed it, when it was released, which album it belongs to, which ISRC identifies it, and whether it is a reimagining of another track. I ship a 49+ track Hindi and Urdu catalogue with this markup hand-rolled in one file, lib/seo.tsx, on a Next.js 16 site running SQLite. This post is the exact graph I settled on, three spec gotchas most music-SEO blogs get wrong, and what the schema still does not do.

TL;DR

  • composer, lyricist, and lyrics belong on MusicComposition, not MusicRecording. Most plugins get this wrong.
  • isrcCode is a first-class property on MusicRecording. releaseUpc is not, so it goes in as a PropertyValue identifier.
  • Reimagined variants link back to the original with isBasedOn, which mirrors the parentId foreign key in the media table.
  • A site-wide @id graph (#person, #website, #musicgroup) lets every per-track schema resolve by reference instead of duplicating the artist block.
  • Next.js 16's official recommendation is a plain <script type="application/ld+json"> tag, not next/script.

Why composer, lyricist, and lyrics belong on MusicComposition, not MusicRecording

This is the first place almost every "music schema for SEO" post I have read in 2026 fails. The spec is explicit and most plugins ignore it. On schema.org/MusicRecording the only authorship-adjacent property the type defines itself is byArtist, which points at the performer (generic CreativeWork properties such as producer and copyrightHolder are inherited). There is no composer, no lyricist, no lyrics property on MusicRecording. Those three live on MusicComposition, the abstract song as a work, which the recording points to through recordingOf.

In lib/seo.tsx the getMusicRecordingSchema builder encodes this directly. The code produces a MusicRecording object with byArtist, producer, copyrightHolder, duration, image, datePublished, copyrightYear, copyrightNotice, and recordingOf, and the recordingOf block is a nested MusicComposition carrying composer, lyricist, and lyrics. The two-level shape matches the real distinction: the song is a composition, the recording is one rendering of that composition. A future live version or a cover would be a new MusicRecording with the same recordingOf target.
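The two-level shape described above can be sketched roughly as follows. This is a minimal illustration, not the actual getMusicRecordingSchema: the Track interface, its field names, and the buildMusicRecording function are assumptions made for the example, and the real builder emits more properties than shown here.

```typescript
// Hypothetical track shape for illustration; the real media row has more columns.
interface Track {
  slug: string;
  title: string;
  durationIso: string; // ISO 8601 duration string
  datePublished: string; // ISO 8601 date string
  lyricsText?: string; // assumed field, only emitted when present
}

const SITE_URL = "https://abhishekchaudhary.com";
const PERSON_REF = { "@id": `${SITE_URL}/#person` };

function buildMusicRecording(track: Track): Record<string, unknown> {
  // Authorship of the work lives on the composition, not on the recording.
  const composition: Record<string, unknown> = {
    "@type": "MusicComposition",
    name: track.title,
    composer: PERSON_REF,
    lyricist: PERSON_REF,
  };
  if (track.lyricsText) {
    composition["lyrics"] = { "@type": "CreativeWork", text: track.lyricsText };
  }

  // The recording carries performance and release facts, then points at the work.
  return {
    "@context": "https://schema.org",
    "@type": "MusicRecording",
    "@id": `${SITE_URL}/music/${track.slug}`,
    name: track.title,
    byArtist: PERSON_REF,
    duration: track.durationIso,
    datePublished: track.datePublished,
    recordingOf: composition,
  };
}
```

A live version or a cover would call the same function with a different slug and release data, and its recordingOf would describe the same composition.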

The practical consequence is that crawlers that obey the spec can answer different queries off the same markup. "Who wrote Aaj Bhi?" resolves through the composition; "who performed the 2015 release?" resolves through the recording. When the composition and the recording are collapsed into one object, both questions bleed into each other and LLMs tend to pick the wrong signal.

How the @id graph ties Person, WebSite, and MusicGroup into one resolvable node

Schema.org supports JSON-LD @id anchors, which are URIs that other nodes can point at by reference. I use three at the site root: ${SITE_URL}/#person, ${SITE_URL}/#website, and ${SITE_URL}/#musicgroup. Every per-page and per-track schema that needs to reference the artist uses { "@id": "${SITE_URL}/#person" } instead of duplicating the full Person block.

This is emitted once from app/layout.tsx, which calls <JsonLd data={[getPersonSchema(), getWebSiteSchema(), getMusicGroupSchema(), getProfilePageSchema()]} /> inside <body>. Every /music/[slug] page then emits a MusicRecording whose byArtist, producer, and copyrightHolder are all the same one-line @id reference. The recordingOf.composer and recordingOf.lyricist are the same reference. Five properties, one resolution. The crawler parses the root layout schema once and reuses the Person node across every subsequent track page.
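A rough sketch of the reference pattern, assuming a small helper called ref and abbreviated node contents. The helper name, personNode, and trackArtistRefs are hypothetical and only illustrate how five properties collapse to one @id; the real Person block carries many more fields.

```typescript
const SITE_URL = "https://abhishekchaudhary.com";

type NodeRef = { "@id": string };

// Build a one-line reference to one of the three site-root nodes.
function ref(fragment: "person" | "website" | "musicgroup"): NodeRef {
  return { "@id": `${SITE_URL}/#${fragment}` };
}

// Full definition, emitted once from the root layout (fields abbreviated here).
function personNode(): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": ref("person")["@id"],
    name: "Abhishek Chaudhary",
    url: SITE_URL,
  };
}

// Per-track properties that all resolve to the same Person node.
function trackArtistRefs(): Record<string, unknown> {
  const person = ref("person");
  return {
    byArtist: person,
    producer: person,
    copyrightHolder: person,
    recordingOf: {
      "@type": "MusicComposition",
      composer: person,
      lyricist: person,
    },
  };
}
```

Because the reference is a plain object with a single key, adding a field to the Person definition never touches the per-track output.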

Without the @id pattern, every one of the 49 track pages would carry its own full Person block with social links, job titles, knowsAbout, address, and alternateName. That is 49 copies of the same data, which is exactly the "spammy markup" pattern Google has penalised since 2024. The @id reference is both smaller on the wire and more correct semantically. Perplexity in particular seems to resolve @id-linked graphs cleanly when answering "who is Abhishek Chaudhary" queries, because the node is definitional and cited once.

ISRC as a property, UPC as a PropertyValue: the real schema.org distinction

The media table stores both isrcCode and releaseUpc as nullable text columns. When the MusicRecording schema is built, they render differently, and the reason is in the spec.

MusicRecording.isrcCode is a first-class property on the type. If an ISRC exists on a track, the schema emits isrcCode: "IN-XXX-XX-XXXXX" directly at the top level.

UPC is a Universal Product Code, which identifies a release (an album or a single's packaging) rather than the recording itself. There is no first-class upcCode on MusicRecording. The correct schema.org pattern for any external identifier that does not have its own property is the identifier property with a nested PropertyValue. In lib/seo.tsx that renders as:

{
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "UPC",
    "value": "123456789012"
  }
}

Most music-schema plugins I have reviewed in 2026 either drop UPC entirely or, worse, invent a non-standard upcCode property. The PropertyValue wrapper is the right answer and it also scales: EAN, catalogue numbers, and label-assigned identifiers all fit the same shape with a different propertyID.
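The split between the two identifiers could be expressed along these lines. The function name buildIdentifiers is an assumption for illustration; only the isrcCode and releaseUpc column names and the PropertyValue shape come from the description above.

```typescript
interface PropertyValue {
  "@type": "PropertyValue";
  propertyID: string;
  value: string;
}

interface IdentifierFields {
  isrcCode?: string;
  identifier?: PropertyValue;
}

// Both columns are nullable, so each field is emitted only when a value exists.
function buildIdentifiers(
  isrcCode: string | null,
  releaseUpc: string | null,
): IdentifierFields {
  const fields: IdentifierFields = {};
  if (isrcCode) {
    // First-class property on MusicRecording.
    fields.isrcCode = isrcCode;
  }
  if (releaseUpc) {
    // No dedicated UPC property exists, so wrap it in a PropertyValue.
    fields.identifier = {
      "@type": "PropertyValue",
      propertyID: "UPC",
      value: releaseUpc,
    };
  }
  return fields;
}
```

The result can be spread into the MusicRecording object, and an EAN or catalogue number would follow the same branch with a different propertyID.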

Linking a reimagined NCS variant to the original with isBasedOn

Half of the catalogue now exists as both the original recording and one or more reimagined variants released under Creative Commons Attribution 4.0 for the NCS catalogue. Aaj Bhi has two variants: Aaj Bhi (Reimagined v1) and Aaj Bhi (Reimagined v2). The database encodes this with a parentId foreign key on the media table, self-referencing media.id.

In JSON-LD the equivalent relationship is isBasedOn. When the track loader resolves a parent, getMusicRecordingSchema emits:

{
  "isBasedOn": {
    "@type": "MusicRecording",
    "@id": "https://abhishekchaudhary.com/music/aaj-bhi",
    "name": "Aaj Bhi"
  }
}

This does three things at once. It gives Google an explicit derivative relationship, which is the "cover version" pattern in Google's music structured data guidance. It lets an LLM that encounters the variant page trace back to the canonical original in one hop. And it preserves copyright attribution: the variant's copyrightNotice says it is CC BY 4.0, but the isBasedOn tells the crawler the composition and the artist are the same as the original. Without isBasedOn, two tracks with the same title and different licences read as either a contradiction or a duplicate.

The copyrightNotice branch is explicit in the same builder. If track.ncs is true, the notice is the CC BY 4.0 attribution string. If not, it is the all-rights-reserved string for the same year. Two tracks, same composition, different recordings, different licences, one consistent schema graph.
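Put together, the parent link and the licence branch might look like the sketch below. The TrackRow interface, the buildVariantFields name, and the wording of both notice strings are assumptions; the post does not quote the exact strings the builder uses.

```typescript
// Hypothetical slice of a media row; `parent` stands in for the row parentId resolves to.
interface TrackRow {
  slug: string;
  title: string;
  year: number;
  ncs: boolean;
  parent?: { slug: string; title: string };
}

const SITE_URL = "https://abhishekchaudhary.com";

function buildVariantFields(track: TrackRow): Record<string, unknown> {
  const fields: Record<string, unknown> = {
    // Placeholder wording for both notices, chosen only for this example.
    copyrightNotice: track.ncs
      ? `Licensed under CC BY 4.0 (${track.year})`
      : `Copyright ${track.year}. All rights reserved.`,
  };

  // Only variants with a resolved parent get an isBasedOn link.
  if (track.parent) {
    fields["isBasedOn"] = {
      "@type": "MusicRecording",
      "@id": `${SITE_URL}/music/${track.parent.slug}`,
      name: track.parent.title,
    };
  }
  return fields;
}
```

An original track passes through with no isBasedOn at all, so the property only appears where the parentId foreign key is set.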

inLanguage "hi" and why most Hindi catalog sites skip it

Every MusicRecording in lib/seo.tsx carries inLanguage: "hi". Every WebPage and ProfilePage at the site level carries inLanguage: "en-US", because the editorial surface is in English. The separation is deliberate: the site is English-language, the recordings are Hindi.

Most Hindi-catalogue sites I have looked at in 2026 either omit inLanguage on MusicRecording entirely or set it to the page language, which is usually English. Both are wrong. The property is about the content, not the container. A reader searching "hindi songs by independent artists" or an LLM answering "what Hindi tracks are available on independent artist sites" benefits from the language being explicit on the recording itself. A crawler that sees inLanguage: "en-US" on a Hindi track has no reason to surface it for Hindi-intent queries and may silently filter it out.

The BCP 47 tag "hi" covers Hindi. For Urdu tracks, the correct tag is "ur". The current builder sets "hi" globally, which is a known limitation documented in the author_gaps list and worth fixing in the next pass.
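One possible shape for the fix, assuming the media row gains an optional language field or that tags can be inspected at render time. Neither exists in the current builder as described, so resolveInLanguage, the field names, and the "urdu" tag value are all hypothetical.

```typescript
type TrackLanguage = "hi" | "ur";

// Hypothetical inputs: an explicit language column and a list of free-text tags.
interface LanguageSource {
  language?: string | null;
  tags?: string[];
}

function resolveInLanguage(track: LanguageSource): TrackLanguage {
  // An explicit, recognised value on the row wins.
  if (track.language === "hi" || track.language === "ur") {
    return track.language;
  }
  // Otherwise fall back to a tag lookup (the tag name is an assumption).
  const tags = track.tags ?? [];
  if (tags.some((tag) => tag.trim().toLowerCase() === "urdu")) {
    return "ur";
  }
  // Default matches the current behaviour of the builder.
  return "hi";
}
```

Keeping "hi" as the default means existing tracks render exactly as before until a row or tag says otherwise.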

Hand-rolled JsonLd in Next.js 16 without next/script

The JsonLd helper in lib/seo.tsx renders a plain HTML <script type="application/ld+json"> via dangerouslySetInnerHTML. It does not use next/script. This is the official Next.js recommendation for JSON-LD, documented in the app-router data section, and it has held through the Next.js 16 proxy migration.

The reason matters for solo ops. next/script is designed for client-runtime script loading with a strategy field (beforeInteractive, afterInteractive, lazyOnload). JSON-LD is data embedded in the document, not a runtime script. It should ship inline on server-rendered HTML so that every crawler visit gets the structured data in the initial HTML response. Using next/script for JSON-LD breaks that contract and sometimes defers the payload to a place where a headless crawler does not see it.

The one thing the helper adds over a raw <script> tag is an escape for the < character: the code does JSON.stringify(graph, null, 0).replace(/</g, "\\u003c") before injection, so a stray </script> inside the data can never close the tag early. That guards against a malicious track title or description breaking out of the script context. It is a small defence for a one-line payload and worth shipping even when all inputs come from the admin panel.
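The serialisation step on its own can be sketched as a pure function. The name serializeJsonLd is an assumption for this example; in a component, its return value would be passed as the __html field of dangerouslySetInnerHTML on the script element.

```typescript
// Serialise a schema object for inline embedding in a JSON-LD script tag.
// Every "<" becomes the JSON unicode escape \u003c, so the output can never
// contain "</script>" while still parsing back to the original data.
function serializeJsonLd(data: unknown): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}

// Example input with a hostile title, purely for illustration.
const hostile = { "@type": "MusicRecording", name: "</script><script>alert(1)" };
const payload = serializeJsonLd(hostile);

// The payload is still valid JSON and round-trips to the same value.
const roundTripped = JSON.parse(payload) as { name: string };
console.log(payload.includes("<")); // false
console.log(roundTripped.name === hostile.name); // true
```

The escape is safe because \u003c is a standard JSON string escape, so any JSON-LD parser decodes it back to the original character.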

The same pattern works on the blog. BlogPosting JSON-LD on each published post is emitted from getBlogPostingSchema. The SQLite vs Postgres post is an example of the shape on a shipped article.

What this schema graph does not do yet

Four things the current builder does not cover:

First, Urdu-forward tracks still carry inLanguage: "hi". The right fix is to thread a language field through the media row or derive it from tags at render time.

Second, there is no MusicAlbum emission on the per-track page unless the track explicitly belongs to an album. The 2015 Echoes at Taj album and Aaj Bhi are both legitimate MusicAlbum targets and the builder exists (getMusicAlbumSchema) but is not yet wired from every track that should resolve to an album.

Third, speakable is not emitted on blog posts. It is a small schema addition that lights up voice-search surfaces in 2026 and costs nothing to add.

Fourth, recordingOf.lyrics is only populated when the track's description field is present, which conflates the lyrics with the editorial description. A dedicated lyrics column on media would separate the two and let the schema carry real lyrics text rather than an editorial summary, which is the single change I would queue first for the next schema pass.

The spec itself will not stop moving. Schema.org/MusicComposition has picked up properties I did not have when the builder shipped, and the right cadence for a music-catalogue site is a quarterly schema review, not a one-time implementation. Every new property added to the builder cascades across the 49 tracks with no further work, which is the argument for keeping everything in one typed file and one typed database column.

FAQ

What is the difference between MusicRecording and MusicComposition in schema.org?

MusicRecording is a specific rendering of a song: a performance captured at a point in time, with a duration, an ISRC, a release date, and a byArtist. MusicComposition is the song as an abstract work: the composer, the lyricist, the lyrics, the date the composition was written, and the conceptual entity that multiple recordings point back to. In the spec, MusicRecording.recordingOf is the property that connects the two. Every recording has one composition; every composition can have many recordings. Collapsing the two into a single node is the most common mistake I see in music-schema markup.

Where does lyrics actually belong in JSON-LD for a song?

On MusicComposition, not on MusicRecording. The lyrics property is typed as a CreativeWork, so the canonical shape is a nested object with text and optionally inLanguage. A recording has a duration and a release date; a composition has lyrics and a lyricist. Putting lyrics on the recording is a spec violation that most WordPress music plugins commit by default. The correct pattern is MusicRecording.recordingOf = MusicComposition { lyrics: { "@type": "CreativeWork", text: "..." } }.

Is ISRC a property or an identifier in schema.org?

ISRC is a first-class property on MusicRecording, spelled isrcCode. UPC and other catalogue or release identifiers are not first-class and should use the identifier property with a nested PropertyValue carrying propertyID: "UPC" and the value. The distinction matters because isrcCode is something Google's music structured-data pipeline understands directly, while PropertyValue is the generic fallback for any identifier schema.org has not promoted to first class. If a plugin exposes a upcCode property, that is the plugin inventing an invalid name.

How do I mark up a reimagined NCS version of an original Hindi track?

With a MusicRecording for the variant, a distinct ISRC if assigned, a distinct copyrightNotice reflecting the CC BY 4.0 licence, and an isBasedOn pointing at the original MusicRecording by @id. The variant's recordingOf still points at the same MusicComposition as the original, because the song as a work has not changed. The composition stays one; the recordings split. That keeps authorship and lyrics in one place while allowing licence, release date, and mix to differ per recording.

Does Google show rich results for MusicRecording schema in 2026?

Google's visible music rich-result surfaces in 2026 are narrower than the schema's nominal coverage, and most MusicRecording markup does not produce a rich card on a standard SERP. The value is not the rich card. Structured data is load-bearing for AI Overviews and for Perplexity citations even when no visible snippet appears, because it is the entity graph those systems resolve against when answering a query. A track with clean MusicRecording and MusicComposition schema gets cited as a source even when it has no visible rich result attached to the page.

Can I hand-write JSON-LD in Next.js 16 without a plugin?

Yes, and it is the official recommendation. The pattern is a plain <script type="application/ld+json"> rendered via dangerouslySetInnerHTML with the JSON payload serialised from a typed helper. Do not use next/script for JSON-LD; it is designed for client-runtime scripts, not inline document data. The schema should ship on the server-rendered HTML so that every crawler sees the payload on first fetch. The defensive < escape in the payload (.replace(/</g, "\\u003c")) guards against injection from any text field that flows into the schema from the database.