Astro Content Collections: Real-World schemas you can steal

Astro's content collections solve a problem every content-heavy site eventually runs into: you have a folder of Markdown files or JSON entries, and at some point a typo in a frontmatter key or a missing date silently breaks your build — or worse, breaks at runtime. Collections move that failure to a type-checked, build-time contract you define once in src/content.config.ts.

The official docs show the basics. This post goes further — five schemas modeled on real production use cases, with patterns for z.coerce.date(), reference(), z.discriminatedUnion, transform(), and the file() loader. Steal what fits, adapt the rest.

How content collections work (the thirty-second version)

You create src/content.config.ts, define one collection per content type using defineCollection(), attach a Zod schema, then export them all under the collections named export. Astro validates every entry at build time and generates TypeScript types automatically. Querying is done with getCollection() and getEntry() from astro:content.

// src/content.config.ts
import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ base: './src/content/blog', pattern: '**/*.{md,mdx}' }),
  schema: z.object({
    title: z.string(),
    pubDate: z.coerce.date(),
  }),
});

export const collections = { blog };

That's the skeleton. Now the useful part.

Schema 1: Blog posts

A blog schema needs more than just a title and date. You want optional draft suppression, tag arrays, an optional cover image, and a computed reading-time hook. Here's a schema that handles all of it:

// src/content.config.ts
import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ base: './src/content/blog', pattern: '**/*.{md,mdx}' }),
  schema: z.object({
    title: z.string().min(10).max(100),
    description: z.string().min(50).max(200),
    pubDate: z.coerce.date(),
    updatedDate: z.coerce.date().optional(),
    author: z.string().default('The Summit Team'),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    // image() validates the file exists on disk and returns
    // a typed object with processed src — import it from 'astro:content'
    // cover: image().optional(),
  }),
});

export const collections = { blog };

A few things worth noting here. z.coerce.date() is the right choice for date fields — it accepts the ISO-8601 strings YAML produces and turns them into real Date objects, so pubDate.getFullYear() works without a manual parse. Using z.string().min(10).max(100) on titles adds real guardrails: you get a build error if a title is suspiciously short or truncated.

To filter out drafts at query time:

// src/pages/blog/index.astro
import { getCollection } from 'astro:content';

const posts = await getCollection('blog', ({ data }) => {
  return import.meta.env.PROD ? !data.draft : true;
});

The callback form of getCollection() receives the full typed entry — data.draft is typed as boolean, not unknown, because the schema said so.

Schema 2: Product or theme catalog (JSON-backed)

When your content is structured data rather than prose, use the file() loader to pull from a single JSON file. This pattern is well-suited to product catalogs, theme listings, or any dataset that doesn't need individual Markdown files.

import { defineCollection, z } from 'astro:content';
import { file } from 'astro/loaders';

const themes = defineCollection({
  loader: file('src/data/themes.json'),
  schema: z.object({
    slug: z.string(),
    name: z.string(),
    category: z.enum(['Services', 'Healthcare', 'Real Estate', 'Retail']),
    status: z.enum(['live', 'coming-soon']),
    tagline: z.string().max(120),
    description: z.string().min(100).max(600),
    demoUrl: z.string().url(),
    price: z.object({
      astro: z.number().int().positive(),
      agency: z.number().int().positive(),
    }),
    features: z.array(z.string()).min(3).max(12),
    pages: z.array(z.string()),
    lighthouse: z.object({
      performance: z.number().min(0).max(100),
      seo: z.number().min(0).max(100),
    }),
    versions: z.object({
      astro: z.string(),
      tailwind: z.string(),
    }),
    changelog: z.array(
      z.object({
        version: z.string(),
        date: z.coerce.date(),
        notes: z.array(z.string()),
      })
    ),
  }),
});

export const collections = { themes };

The nested z.object() calls for price, lighthouse, and versions give you deep validation with no extra work. z.string().url() on demoUrl catches fat-finger errors like a missing https://. z.number().min(0).max(100) on Lighthouse scores means someone can't accidentally enter 1000 and make the marketing page look ridiculous.

Schema 3: Authors with cross-collection references

Once you have multiple collections, you want to link them. Astro's reference() function creates a typed foreign-key relationship between collections. The classic case is blog posts that reference an authors collection:

import { defineCollection, reference, z } from 'astro:content';
import { glob, file } from 'astro/loaders';

const authors = defineCollection({
  loader: file('src/data/authors.json'),
  schema: z.object({
    name: z.string(),
    bio: z.string().max(300),
    avatar: z.string().url(),
    social: z.object({
      twitter: z.string().optional(),
      github: z.string().optional(),
    }),
  }),
});

const blog = defineCollection({
  loader: glob({ base: './src/content/blog', pattern: '**/*.{md,mdx}' }),
  schema: z.object({
    title: z.string(),
    pubDate: z.coerce.date(),
    author: reference('authors'),           // typed foreign key
    relatedPosts: z.array(reference('blog')).default([]),
  }),
});

export const collections = { blog, authors };

At query time, reference() fields come back as { collection: 'authors', id: 'jason' }. To hydrate them into real data, use getEntry() or pass the array to getEntries():

import { getEntry, getEntries } from 'astro:content';

// Resolving a single reference
const authorEntry = await getEntry(post.data.author);

// Resolving an array of references
const related = await getEntries(post.data.relatedPosts);

This keeps your frontmatter lean — just an ID string — while giving you the full typed object whenever you need it.

Schema 4: Service pages with a discriminated union

Service-business sites often have content that looks similar but isn't identical: a city landing page needs different required fields than a service-category page. Cramming both into one loose schema leads to optional fields everywhere. A discriminated union is cleaner:

import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const servicePageBase = z.object({
  title: z.string(),
  description: z.string(),
  pubDate: z.coerce.date(),
});

const cityPage = servicePageBase.extend({
  type: z.literal('city'),
  city: z.string(),
  state: z.string().length(2),             // e.g. "TX"
  serviceArea: z.array(z.string()),
});

const categoryPage = servicePageBase.extend({
  type: z.literal('category'),
  category: z.string(),
  parentService: z.string().optional(),
  icon: z.string().optional(),
});

const pages = defineCollection({
  loader: glob({ base: './src/content/pages', pattern: '**/*.{md,mdx}' }),
  schema: z.discriminatedUnion('type', [cityPage, categoryPage]),
});

export const collections = { pages };

With a discriminated union, Zod looks at the type field first and applies the matching schema. A city page that's missing city fails at build time. A category page that doesn't have the irrelevant serviceArea field doesn't get flagged. TypeScript narrows correctly in your components too — after checking entry.data.type === 'city', the IDE knows entry.data.city exists.

Schema 5: Changelog entries

Changelogs are a good fit for the file() loader with a tightly typed schema. One JSON file, one schema, one getCollection() call — sorted by date.

import { defineCollection, z } from 'astro:content';
import { file } from 'astro/loaders';

const changelog = defineCollection({
  loader: file('src/data/changelog.json'),
  schema: z.object({
    version: z.string().regex(/^\d+\.\d+\.\d+$/),   // enforces semver format
    date: z.coerce.date(),
    type: z.enum(['feature', 'fix', 'breaking', 'chore']).default('feature'),
    theme: z.string().optional(),                     // which theme this applies to
    notes: z.array(z.string()).min(1),
    breakingChanges: z.array(z.string()).default([]),
  }),
});

export const collections = { changelog };

The regex on version is worth including — it prevents entries like "v1.2" or "1.2" from slipping in and breaking the sort. z.enum on type means your changelog page can render different visual badges per type without a fragile string comparison.

Querying and sorting:

const entries = await getCollection('changelog');
const sorted = entries.sort(
  (a, b) => b.data.date.valueOf() - a.data.date.valueOf()
);

Patterns worth memorizing

z.coerce.date() instead of z.date()

YAML and JSON dates are strings. z.date() will reject them. z.coerce.date() calls new Date(value) for you. Use it any time a date comes from a content file.

Defaults push complexity to the schema, not the template

Rather than writing entry.data.draft ?? false in every template, define draft: z.boolean().default(false) in the schema. The value is always defined and typed when it reaches your component.

Transforms let you compute derived values at parse time

slug: z.string().transform((s) => s.toLowerCase().replace(/\s+/g, '-')),

A transform() call runs during validation and returns the derived value as the field's type. Use it sparingly — usually for normalization, not business logic.

Shared base schemas reduce duplication

If several collections share common fields, define a base schema and extend it:

const seoBase = z.object({
  title: z.string().max(70),
  description: z.string().max(160),
  canonical: z.string().url().optional(),
});

const blog = defineCollection({
  loader: glob({ base: './src/content/blog', pattern: '**/*.{md,mdx}' }),
  schema: seoBase.extend({
    pubDate: z.coerce.date(),
    tags: z.array(z.string()).default([]),
  }),
});

Wrapping up

The payoff of a well-typed content collection shows up immediately — build-time errors on missing fields, full autocomplete when you query entries, and no defensive ?.data?.title ?? '' chains in templates. Start with the blog schema, add authors when you need them, reach for discriminatedUnion when two page types are almost-but-not-quite the same thing. The schema is cheap to write and expensive not to have.