Static Cache Wrangler – Headless Assistant

Static Cache Wrangler – Headless Assistant is a free companion plugin for the WordPress Static Cache Wrangler plugin. It identifies core Gutenberg and Kadence blocks and generates the core files needed for import into Sanity®, a popular headless CMS. The tooling is 95% command-line driven and requires WP-CLI and Bash skills.

WP-CLI Tooling

Headless Assistant features rich, composable WP-CLI tooling that is fully compatible with WordPress and other shell tools you count on.

wp scw-headless analyze <file> [--pattern=<name>] [--format=<format>]
wp scw-headless convert [--cms=<target>] [--limit=<number>] [--verbose]
wp scw-headless detectors [--format=<format>] [--type=<type>]
wp scw-headless info [--format=<format>]
wp scw-headless normalize <file> [--output=<path>] [--verbose]
wp scw-headless patterns [--format=<format>] [--enabled-only] [--verbose]
wp scw-headless prepare-assets [--force] [--verbose]
wp scw-headless scan [--format=<format>]
wp scw-headless targets [--format=<format>]

Background

Static Cache Wrangler – Headless Assistant is a composable set of tools designed to help developers analyze, decompose, and transform WordPress sites for use in headless content architectures. It is not a wizard, a migration service, or an opinionated framework. It is tooling for developers who want control over how WordPress content is extracted, represented, and reused.

The ecosystem consists of a main WordPress plugin and a companion plugin (both free), plus a soon-to-be-released JavaScript reference implementation for uploading media to headless CDN infrastructure. Today, the core WordPress plugin is available for free on WordPress.org. Additional plugins and tools are under development and may be made available to qualified professionals for evaluation.


WordPress vs. Headless

For over two decades, the WordPress community has invested in open-source software—core, themes, and plugins—making WordPress one of the most widely adopted content publishing platforms on the internet. The creator of Static Cache Wrangler has been part of that ecosystem since 2008.

WordPress is an end-to-end system that tightly integrates design, functionality, and content. While this approach has proven effective for a wide range of use cases, no single architectural model fits every problem.

“Headless CMS” describes an architectural pattern where content management is decoupled from presentation and delivery. This pattern has led to commercial platforms such as Sanity, Contentful, and Strapi, among others. WordPress is often labeled a “monolith” due to its tightly coupled install model, yet it exposes APIs and can be used solely as a content source when the business and technical case supports that decision.

The conversation should not be framed as WordPress versus headless, but rather as a set of architectural tradeoffs grounded in cost, scale, performance, and long-term maintainability.


Static Cache Wrangler Tool Ecosystem

Static Cache Wrangler is not a point solution and not a guided workflow. It is a collection of composable tools that reduce WordPress output to its simplest, most portable form: HTML.

The core Static Cache Wrangler plugin supports a variety of use cases, including:

  • Building static websites
  • Creating static, redundant failover sites
  • Reducing operational complexity by treating WordPress as a build-time system

A companion plugin, Static Cache Wrangler – Coverage Assistant, helps site owners and developers understand how much of a site is being actively cached and can assist in automating cache generation for specific site areas.

Together, these tools treat WordPress as a system that produces artifacts, rather than a system that must remain live and dynamic at runtime.


Static Cache Wrangler – Headless Assistant

Static Cache Wrangler – Headless Assistant began as an exploratory project with a specific question:

Can WordPress content and presentation formats (blocks) be decomposed from rendered output and transformed into data structures usable by headless CMS architectures?

The project assumed that most developers are not interested in “apples-to-apples” WordPress migrations to headless systems. A move to headless is typically transformative—affecting UI, data models, and editorial workflows.

The experiment demonstrated that rendered HTML, when combined with knowledge of block structures, could serve as a reliable starting point for analysis and transformation. The result is a WordPress.org plugin, released in early 2026, that uses the HTML output generated by Static Cache Wrangler and exposes a series of composable WP-CLI tools.

These tools allow developers to analyze and transform sites built with core Gutenberg blocks and selected third-party blocks (such as Kadence) into import formats for headless CMS platforms. The current implementation supports generating Sanity imports. From that point forward, developers retain full control over how content, structure, and presentation are reimagined.


Step-by-Step Instructions

# Migration Guide: WordPress to Sanity CMS

**Static Cache Wrangler Headless Assistant v2.1.0**

## Overview

This guide covers the complete migration process from WordPress to Sanity CMS using the Static Cache Wrangler (SCW) Headless Assistant plugin. The approach treats cached HTML files as the source of truth, extracting semantic content and metadata to create portable, structured data.

**Key Principles:**
- Documents represent intent, not templates
- Objects represent composition, not layout  
- References over duplication
- Everything is safe to delete or evolve
- Nothing assumes a frontend framework

## General Approach

### Migration Philosophy

Traditional WordPress migrations extract data from the database, but this creates challenges:
- Database schema dependencies
- Plugin-specific data structures
- Active WordPress instance required
- Complex query relationships

The SCW approach is fundamentally different:

1. **Cache First**: Use SCW to generate static HTML snapshots of your WordPress site
2. **Extract Semantics**: Parse the cached HTML to identify content patterns (blocks, images, links, metadata)
3. **Convert to Portable Format**: Transform extracted patterns into generic PortableText/NDJSON
4. **Import to Target CMS**: Load the structured data into Sanity (or other headless CMS)
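
Step 2 can be illustrated with a small Node.js sketch. It is not the plugin's detector. It only assumes that rendered block markup carries `wp-block-*` class names, and the helper name `countBlockClasses` is made up for this example.

```javascript
// Hypothetical helper: tally wp-block-* class names found in rendered HTML.
// This illustrates pattern detection in general, not the plugin's own logic.
function countBlockClasses(html) {
  const counts = {};
  // Match every class="..." attribute, then pick out wp-block-* tokens.
  for (const attr of html.matchAll(/class\s*=\s*"([^"]*)"/g)) {
    for (const token of attr[1].split(/\s+/)) {
      if (token.startsWith('wp-block-')) {
        counts[token] = (counts[token] || 0) + 1;
      }
    }
  }
  return counts;
}

const sampleHtml = `
  <h2 class="wp-block-heading">Title</h2>
  <figure class="wp-block-image size-large"><img src="a.jpg" alt=""></figure>
  <figure class="wp-block-image"><img src="b.jpg" alt=""></figure>
`;
console.log(countBlockClasses(sampleHtml));
```

A tally like this gives a quick feel for which block types dominate a cached page before running the real conversion.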

### Prerequisites

- WordPress site with Static Cache Wrangler plugin active
- SCW Headless Assistant plugin v2.1.0+ installed
- Site fully cached (run `wp scw cache` to generate HTML snapshots)
- WP-CLI access for command execution
- Sanity project created with v3 schema design (see V13SCHEMADESIGN.md)

### Migration Workflow

```
WordPress Site
    ↓ (Static Cache Wrangler)
Cached HTML Files + Assets
    ↓ (SCW Headless Assistant v2.1.0)
NDJSON Export + Portable Text
    ↓ (Sanity CLI)
Sanity CMS Dataset
```

## Phase 1: Extraction and Validation

### Step 1: Generate Cached Files

Ensure your WordPress site is fully cached:

```bash
# Check cache status before converting
wp scw status

# Verify cache directory
ls -la /path/to/wordpress/cache/
```

### Step 2: Run Conversion

Execute the headless conversion command:

```bash
# Convert all cached files to NDJSON
wp scw-headless convert \
  --output-dir=/path/to/exports \
  --format=sanity

# For verbose output with debugging
wp scw-headless convert \
  --output-dir=/path/to/exports \
  --format=sanity \
  --verbose
```

This generates:
- `posts.ndjson` - All post content with portable text
- `pages.ndjson` - All page content  
- `authors.ndjson` - Author profiles
- `categories.ndjson` - Category taxonomy
- `tags.ndjson` - Tag taxonomy
- `images.ndjson` - Image assets with metadata
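
Before importing, it can help to confirm that each exported file is well-formed NDJSON. The sketch below is a hypothetical pre-import check, not part of the plugin. It assumes only that every line is one JSON document carrying `_id` and `_type`.

```javascript
// Hypothetical pre-import check: every non-blank NDJSON line should parse
// as JSON and carry the _id and _type fields that Sanity documents need.
function validateNdjson(text) {
  const errors = [];
  let count = 0;
  text.split('\n').forEach((line, index) => {
    if (line.trim() === '') return; // ignore blank lines
    count += 1;
    try {
      const doc = JSON.parse(line);
      if (typeof doc._id !== 'string') errors.push(`line ${index + 1}: missing _id`);
      if (typeof doc._type !== 'string') errors.push(`line ${index + 1}: missing _type`);
    } catch (err) {
      errors.push(`line ${index + 1}: invalid JSON`);
    }
  });
  return { count, errors };
}

const ndjsonReport = validateNdjson(
  '{"_id":"post-123","_type":"post","title":"Hello"}\n{"_type":"post"}\n'
);
console.log(ndjsonReport);
```

To check a real export, pass the file contents instead of the inline string, for example the result of `fs.readFileSync` on `posts.ndjson`.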

### Step 3: GROQ Validation Queries

After importing to Sanity, validate the conversion quality with these GROQ queries in Sanity Vision:

#### Check Total Document Counts

```groq
{
  "posts": count(*[_type == "post"]),
  "pages": count(*[_type == "page"]),
  "authors": count(*[_type == "author"]),
  "categories": count(*[_type == "category"]),
  "images": count(*[_type == "sanity.imageAsset"])
}
```

#### Verify Content Structure

```groq
// Sample post with all relationships
*[_type == "post"][0] {
  title,
  slug,
  publishedAt,
  author->{name, slug},
  categories[]->{title, slug},
  "blockCount": length(body),
  "imageCount": length(body[_type == "image"]),
  "hasExcerpt": defined(excerpt)
}
```

#### Analyze Block Type Distribution

```groq
// What content patterns were detected?
*[_type == "post"] {
  "blocks": body[]._type
} | {
  "allBlocks": @.blocks[]
} | {
  "blockTypes": array::unique(@.allBlocks),
  "distribution": @.allBlocks
}
```

#### Check Semantic Conversion Rate

```groq
// How much content was successfully converted?
*[_type == "post"] {
  _id,
  title,
  "totalBlocks": length(body),
  "semanticBlocks": length(body[_type != "block"]),
  "fallbackBlocks": length(body[_type == "block"]),
  "conversionRate": length(body[_type != "block"]) / length(body) * 100
} | order(conversionRate desc)
```

**Target Metrics:**
- Semantic conversion rate: 70%+ (varies by site complexity)
- Zero fallback blocks for standard WordPress/Gutenberg content
- All images present with proper references
- All internal links preserved as references
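
The conversion-rate arithmetic used in the query above can also be reproduced locally on a single `body` array. This is a minimal sketch. The `heading` type in the sample data is only a placeholder for whatever custom block types the converter emits.

```javascript
// Mirrors the GROQ conversion-rate calculation for a single body array.
// Returns null for an empty body instead of dividing by zero.
function conversionRate(body) {
  if (!Array.isArray(body) || body.length === 0) return null;
  const fallback = body.filter((item) => item._type === 'block').length;
  return ((body.length - fallback) / body.length) * 100;
}

const sampleBody = [
  { _type: 'block' },
  { _type: 'image' },
  { _type: 'heading' }, // placeholder custom type
  { _type: 'image' },
];
console.log(conversionRate(sampleBody)); // 75
```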

#### Validate Link Preservation

```groq
// Check internal link conversion
*[_type == "post"][0].body[_type == "block"][0] {
  children[]{
    _type,
    marks,
    "hasInternalLink": defined(marks[0]) && marks[0] match "link-*"
  }
}
```

## Phase 2: Migration Script and Verification

### Sanity Import Script

Create a `migrate.js` file in your Sanity project:

```javascript
const fs = require('fs');
const path = require('path');
const { createClient } = require('@sanity/client');
const { pipeline } = require('stream/promises');
const ndjson = require('ndjson');

// Initialize Sanity client (createClient is the named export in current @sanity/client releases)
const client = createClient({
  projectId: 'your-project-id',
  dataset: 'production',
  token: 'your-token-with-write-access',
  apiVersion: '2024-01-01',
  useCdn: false
});

/**
 * Import NDJSON file to Sanity dataset
 * 
 * @param {string} filePath - Path to NDJSON file
 * @param {string} documentType - Sanity document type
 */
async function importNDJSON(filePath, documentType) {
  console.log(`Importing ${documentType} from ${filePath}...`);
  
  const documents = [];
  
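  // Stream the file and collect every parsed document.
  // The final step is a plain async function (not a generator), so nothing
  // is left unconsumed downstream and the pipeline resolves once reading ends.
  await pipeline(
    fs.createReadStream(filePath),
    ndjson.parse(),
    async function (source) {
      for await (const doc of source) {
        documents.push(doc);
      }
    }
  );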

  // Batch import with transaction
  const transaction = client.transaction();
  
  documents.forEach(doc => {
    transaction.createOrReplace(doc);
  });

  const result = await transaction.commit();
  console.log(`✓ Imported ${documents.length} ${documentType} documents`);
  
  return result;
}

/**
 * Main migration process
 */
async function migrate() {
  const exportDir = './exports'; // Path to your NDJSON exports
  
  try {
    // Import in dependency order
    console.log('Starting migration...\n');
    
    // 1. Authors (no dependencies)
    await importNDJSON(
      path.join(exportDir, 'authors.ndjson'),
      'author'
    );
    
    // 2. Taxonomies (no dependencies)
    await importNDJSON(
      path.join(exportDir, 'categories.ndjson'),
      'category'
    );
    await importNDJSON(
      path.join(exportDir, 'tags.ndjson'),
      'tag'
    );
    
    // 3. Images (no dependencies)
    await importNDJSON(
      path.join(exportDir, 'images.ndjson'),
      'sanity.imageAsset'
    );
    
    // 4. Content (references authors, categories, tags, images)
    await importNDJSON(
      path.join(exportDir, 'posts.ndjson'),
      'post'
    );
    await importNDJSON(
      path.join(exportDir, 'pages.ndjson'),
      'page'
    );
    
    console.log('\n✓ Migration complete!');
    
  } catch (error) {
    console.error('Migration failed:', error);
    process.exit(1);
  }
}

// Run migration
migrate();
```
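
The script above commits each file as one transaction. For large exports that may run into request size limits, so one option is to commit in smaller batches. The helpers below are a sketch under that assumption. The names `chunk` and `commitInBatches` and the default batch size of 100 are illustrative choices, not values from the plugin or from Sanity.

```javascript
// Hypothetical helper: split documents into fixed-size batches so each
// batch can be committed as its own transaction.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Commit each batch separately; `client` is the configured Sanity client.
// The default batch size is an arbitrary starting point.
async function commitInBatches(client, documents, size = 100) {
  for (const batch of chunk(documents, size)) {
    const transaction = client.transaction();
    batch.forEach((doc) => transaction.createOrReplace(doc));
    await transaction.commit();
  }
}
```

Inside `importNDJSON`, the single transaction could then be replaced with `await commitInBatches(client, documents)`.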

### Run the Migration

```bash
# Install dependencies
npm install @sanity/client ndjson

# Execute migration
node migrate.js
```

### Post-Migration GROQ Verification

After running the migration script, verify data integrity:

#### Referential Integrity Check

```groq
// Find posts with missing author references
*[_type == "post" && !defined(author)] {
  _id,
  title,
  "issue": "Missing author reference"
}

// Find posts with missing category references
// (covers both an absent categories field and an empty array)
*[_type == "post" && (!defined(categories) || count(categories) == 0)] {
  _id,
  title,
  "issue": "No categories assigned"
}
```

#### Image Reference Validation

```groq
// Check for broken image references
*[_type == "post"] {
  _id,
  title,
  "images": body[_type == "image"]{
    asset->{_id, url},
    "isBroken": !defined(asset->url)
  }
}[count(images[isBroken == true]) > 0]
```

#### Content Quality Audit

```groq
// Posts with low semantic conversion
*[_type == "post"] {
  _id,
  title,
  "totalBlocks": length(body),
  "fallbackBlocks": length(body[_type == "block"]),
  "conversionRate": (length(body) - length(body[_type == "block"])) / length(body) * 100
}[conversionRate < 50] | order(conversionRate asc)
```

#### URL Slug Uniqueness

```groq
// Detect duplicate slugs
*[_type == "post"] {
  "slug": slug.current,
  "count": count(*[_type == "post" && slug.current == ^.slug.current])
}[count > 1]
```

## Schema Overview

### Sanity v3 Schema Design

SCW Headless Assistant v2.1.0 generates content compatible with Sanity's v3 schema design. For complete schema documentation, see **V13SCHEMADESIGN.md** in the plugin documentation.

### Core Document Types

The migration creates these Sanity schema types:

#### Post

```javascript
{
  name: 'post',
  type: 'document',
  fields: [
    {name: 'title', type: 'string'},
    {name: 'slug', type: 'slug'},
    {name: 'author', type: 'reference', to: [{type: 'author'}]},
    {name: 'publishedAt', type: 'datetime'},
    {name: 'categories', type: 'array', of: [{type: 'reference', to: [{type: 'category'}]}]},
    {name: 'tags', type: 'array', of: [{type: 'reference', to: [{type: 'tag'}]}]},
    {name: 'excerpt', type: 'text'},
    {name: 'body', type: 'array', of: [
      {type: 'block'},
      {type: 'image'},
      // Custom block types detected during conversion
    ]},
    {name: 'featuredImage', type: 'image'}
  ]
}
```

#### Page

```javascript
{
  name: 'page',
  type: 'document',
  fields: [
    {name: 'title', type: 'string'},
    {name: 'slug', type: 'slug'},
    {name: 'body', type: 'array', of: [{type: 'block'}, {type: 'image'}]},
    {name: 'publishedAt', type: 'datetime'}
  ]
}
```

#### Author

```javascript
{
  name: 'author',
  type: 'document',
  fields: [
    {name: 'name', type: 'string'},
    {name: 'slug', type: 'slug'},
    {name: 'bio', type: 'text'},
    {name: 'image', type: 'image'}
  ]
}
```

### Custom Block Types

The converter detects and creates schema for these WordPress-specific blocks:

- **Gutenberg Core Blocks**: `core/heading`, `core/list`, `core/quote`, `core/code`, etc.
- **Kadence Blocks**: `kadence/advancedheading`, `kadence/rowlayout`, `kadence/column`, etc.
- **Custom Patterns**: Automatically detected through DOM analysis

Each block type preserves its semantic meaning while converting to portable format.

### Metadata Preservation

The migration preserves WordPress metadata through `_meta` fields:

```javascript
{
  _id: 'post-123',
  _type: 'post',
  _meta: {
    wpId: 123,
    wpSlug: 'original-wordpress-slug',
    wpUrl: 'https://example.com/original-url',
    sourceEnvelope: {
      // Original WordPress metadata
      postDate: '2024-01-01',
      modifiedDate: '2024-01-15',
      postType: 'post',
      postStatus: 'publish'
    }
  }
}
```
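
One use for the preserved `_meta.wpUrl` value is building a redirect map from old WordPress paths to new slugs. The sketch below assumes each document also carries a Sanity-style `slug.current` value and that new URLs take the form `/<slug>`. Both are assumptions to adapt to your own routing.

```javascript
// Hypothetical helper: map each original WordPress path to the new slug,
// using the _meta.wpUrl field preserved during conversion.
function buildRedirects(docs) {
  const redirects = [];
  for (const doc of docs) {
    const wpUrl = doc._meta && doc._meta.wpUrl;
    const slug = doc.slug && doc.slug.current;
    if (!wpUrl || !slug) continue; // skip documents without both values
    redirects.push({ from: new URL(wpUrl).pathname, to: `/${slug}` });
  }
  return redirects;
}

console.log(
  buildRedirects([
    {
      _id: 'post-123',
      slug: { current: 'new-slug' },
      _meta: { wpUrl: 'https://example.com/original-url' },
    },
  ])
);
```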

## Troubleshooting

### Low Conversion Rates

If semantic conversion is below 70%:

1. Check which blocks are falling back with the block distribution query
2. Review your site's HTML cache for proper WordPress block comments
3. Consider adding custom extractors for your theme's specific markup
4. Verify SCW Headless Assistant is version 2.1.0 or later

### Missing References

If author/category references are broken:

1. Verify the dependency import order (authors/taxonomies before posts)
2. Check slug consistency between WordPress and Sanity
3. Review the NDJSON files for proper `_ref` formatting
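
Reference problems can often be caught before import by checking the exported documents against each other. The following sketch assumes all documents from the NDJSON files have been parsed into one array. The helper names are hypothetical.

```javascript
// Collect every _ref value found anywhere inside a document.
function collectRefs(value, refs = []) {
  if (Array.isArray(value)) {
    value.forEach((item) => collectRefs(item, refs));
  } else if (value && typeof value === 'object') {
    if (typeof value._ref === 'string') refs.push(value._ref);
    Object.values(value).forEach((item) => collectRefs(item, refs));
  }
  return refs;
}

// Report references that point at an _id missing from the export.
function findDanglingRefs(docs) {
  const ids = new Set(docs.map((doc) => doc._id));
  const dangling = [];
  for (const doc of docs) {
    for (const ref of collectRefs(doc)) {
      if (!ids.has(ref)) dangling.push({ from: doc._id, ref });
    }
  }
  return dangling;
}
```

References to documents that already exist in the target dataset, rather than in the export, will also be reported, so treat the output as a list of things to review.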

### Image Import Failures

If images aren't importing:

1. Verify the images exist in the WordPress uploads directory
2. Check Sanity asset upload permissions
3. Review image URLs in the NDJSON export for correct paths

## Version History

- **v2.1.0** - Current release with full Sanity v3 compatibility
  - Improved pattern detection for Gutenberg and Kadence blocks
  - Enhanced metadata preservation through source envelopes
  - Generic PortableText output for multi-CMS compatibility

---

**Next Steps**: After successful migration, use Sanity Studio to review content, adjust schemas as needed, and begin building your headless frontend. Because the output is Portable Text, the content can be rendered from React, Vue, Svelte, or another frontend framework.

For detailed schema specifications, refer to **V13SCHEMADESIGN.md** in the plugin documentation.

Export Schema

# Trustworthy Metadata for Phase 2 Migration

**Static Cache Wrangler Headless Assistant v2.1.0**

## Philosophy: "The Hard Parts"

Developers migrating from WordPress to a headless CMS need **reliable, queryable metadata** they can trust. SCW Headless Assistant provides structured data that makes Phase 2 migration scriptable, predictable, and safe.

---

## What Makes Metadata "Trustworthy"?

### Stable IDs
```json
"_assetId": "image-hero-banner"
```
- **Never changes** across imports
- **Deterministic** (same source = same ID)
- **Queryable** with GROQ/GraphQL
- **Unique** per asset

### Clear Phase Tracking
```json
"_migration": {
  "phase": 1  // 1 = external URL, 2 = Sanity CDN
}
```
- **Know the state** of each asset
- **Query by phase** to find pending migrations
- **Track progress** across large datasets

### Complete Source Information
```json
"_migration": {
  "sourceUrl": "https://oc2.co/wp-content/uploads/stcw-assets/hero.jpg",
  "placeholderRef": "asset-placeholder-image-hero-banner"
}
```
- **Original URL** for reference
- **Placeholder reference** for replacement strategy
- **Verifiable** against asset manifest

### Upload Status Tracking
```json
"_migration": {
  "uploaded": false,
  "sanityAssetId": null,
  "uploadedAt": null
}
```
- **Resume capability** (skip already-uploaded)
- **Verify completion** (query for `uploaded == false`)
- **Audit trail** (when was it migrated?)

---

## Complete Data Structure

### Phase 1: Initial Import (External URLs)
```json
{
  "_type": "imageBlock",
  
  // Works immediately
  "url": "https://oc2.co/wp-content/uploads/stcw-assets/hero-banner.jpg",
  "alt": "Hero banner",
  "width": 1920,
  "height": 1080,
  
  // NOT populated yet (Phase 2)
  "asset": null,
  
  // Trustworthy metadata for migration
  "_assetId": "image-hero-banner",
  "_migration": {
    "phase": 1,
    "placeholderRef": "asset-placeholder-image-hero-banner",
    "sourceUrl": "https://oc2.co/wp-content/uploads/stcw-assets/hero-banner.jpg",
    "uploaded": false,
    "sanityAssetId": null,
    "uploadedAt": null
  }
}
```

### Phase 2: After Asset Upload (Sanity CDN)
```json
{
  "_type": "imageBlock",
  
  // Still works as fallback
  "url": "https://oc2.co/wp-content/uploads/stcw-assets/hero-banner.jpg",
  "alt": "Hero banner",
  "width": 1920,
  "height": 1080,
  
  // NOW populated - real Sanity asset
  "asset": {
    "_type": "reference",
    "_ref": "image-abc123xyz-1234567890ab"  // Real Sanity asset ID
  },
  
  // Updated metadata
  "_assetId": "image-hero-banner",  // Still stable
  "_migration": {
    "phase": 2,  // Updated
    "placeholderRef": "asset-placeholder-image-hero-banner",
    "sourceUrl": "https://oc2.co/wp-content/uploads/stcw-assets/hero-banner.jpg",
    "uploaded": true,  // Updated
    "sanityAssetId": "image-abc123xyz-1234567890ab",  // Updated
    "uploadedAt": "2025-12-26T18:45:00Z"  // Updated
  }
}
```

---

## How Developers Use This

### Query 1: Find All Pending Migrations
```groq
// All images still in Phase 1
*[_type == "page"] {
  _id,
  title,
  "pendingImages": content[
    _type == "imageBlock" && 
    _migration.phase == 1
  ] {
    _assetId,
    url,
    alt
  }
}[count(pendingImages) > 0]
```

### Query 2: Find Specific Asset Across All Pages
```groq
// Where is "image-hero-banner" used?
// (`in` tests membership; comparing the whole array with == never matches)
*[_type == "page" && "image-hero-banner" in content[]._assetId] {
  _id,
  title,
  "heroImages": content[_assetId == "image-hero-banner"]
}
```

### Query 3: Migration Progress Report
```groq
{
  "total": count(*[_type == "page"].content[_type == "imageBlock"]),
  "phase1": count(*[_type == "page"].content[
    _type == "imageBlock" && 
    _migration.phase == 1
  ]),
  "phase2": count(*[_type == "page"].content[
    _type == "imageBlock" && 
    _migration.phase == 2
  ]),
  "percentComplete": (
    count(*[_type == "page"].content[
      _type == "imageBlock" && 
      _migration.phase == 2
    ]) / 
    count(*[_type == "page"].content[_type == "imageBlock"])
  ) * 100
}
```
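
The same report can be computed locally from parsed page documents, for example to check an NDJSON export before anything is imported. This is a sketch. It assumes pages keep their blocks in a `content` array, as in the queries above.

```javascript
// Summarise Phase 1 / Phase 2 counts for imageBlock entries across pages.
function migrationProgress(pages) {
  const blocks = pages
    .flatMap((page) => page.content || [])
    .filter((block) => block._type === 'imageBlock');
  const inPhase = (n) =>
    blocks.filter((b) => b._migration && b._migration.phase === n).length;
  const total = blocks.length;
  const phase2 = inPhase(2);
  return {
    total,
    phase1: inPhase(1),
    phase2,
    // Avoid dividing by zero when no image blocks exist.
    percentComplete: total === 0 ? null : (phase2 / total) * 100,
  };
}
```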

### Query 4: Verify All Assets Uploaded
```groq
// Should return empty array when done
*[_type == "page"] {
  "pageId": _id,
  "pageTitle": title,
  "pending": content[
    _type == "imageBlock" && 
    _migration.uploaded == false
  ]._assetId
}[count(pending) > 0]
```

---

## Migration Script Patterns

### Pattern 1: Upload & Update (In-Place Patches)
```javascript
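// Assumes `fs`, `path`, a configured Sanity `client`, and the parsed
// asset manifest (`manifest`) are in scope, inside an async function.
// ASSET_ROOT is a placeholder for the local directory that holds the files.
const ASSET_ROOT = '/path/to/wp-content/uploads';

// For each asset in manifest
for (const asset of manifest.assets) {
  // 1. Upload to Sanity
  const fileStream = fs.createReadStream(path.join(ASSET_ROOT, asset.relativePath));
  const uploaded = await client.assets.upload('image', fileStream);
  
  // 2. Find all documents with this _assetId
  // (`in` tests membership; comparing the array with == never matches)
  const query = `*[_type == "page" && $assetId in content[]._assetId]`;
  const pages = await client.fetch(query, {assetId: asset.assetId});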
  
  // 3. Update each page's content array
  for (const page of pages) {
    const updatedContent = page.content.map(block => {
      if (block._assetId === asset.assetId) {
        return {
          ...block,
          asset: {_type: 'reference', _ref: uploaded._id},
          _migration: {
            ...block._migration,
            phase: 2,
            uploaded: true,
            sanityAssetId: uploaded._id,
            uploadedAt: new Date().toISOString()
          }
        };
      }
      return block;
    });
    
    await client.patch(page._id).set({content: updatedContent}).commit();
  }
}
```

### Pattern 2: Upload Then Re-Import (Batch Strategy)
```javascript
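// Step 1: Upload all assets, collect mapping
// Assumes `fs`, `path`, `client`, and `manifest` are in scope; ASSET_ROOT
// is a placeholder for the local directory that holds the files.
const ASSET_ROOT = '/path/to/wp-content/uploads';
const assetMapping = {}; // assetId → sanityAssetId

for (const asset of manifest.assets) {
  const fileStream = fs.createReadStream(path.join(ASSET_ROOT, asset.relativePath));
  const uploaded = await client.assets.upload('image', fileStream);
  assetMapping[asset.assetId] = uploaded._id;
}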

// Step 2: Transform original data.ndjson
const originalData = fs.readFileSync('data.ndjson', 'utf8');
const lines = originalData.split('\n').filter(Boolean);

const updatedLines = lines.map(line => {
  const doc = JSON.parse(line);
  
  if (doc.content) {
    doc.content = doc.content.map(block => {
      if (block._type === 'imageBlock' && block._assetId) {
        const sanityAssetId = assetMapping[block._assetId];
        
        if (sanityAssetId) {
          return {
            ...block,
            asset: {_type: 'reference', _ref: sanityAssetId},
            _migration: {
              ...block._migration,
              phase: 2,
              uploaded: true,
              sanityAssetId
            }
          };
        }
      }
      return block;
    });
  }
  
  return JSON.stringify(doc);
});

fs.writeFileSync('data-with-assets.ndjson', updatedLines.join('\n'));

// Step 3: Re-import
// sanity dataset import data-with-assets.ndjson production --replace
```

### Pattern 3: Incremental Migration (High-Priority First)
```javascript
// Sort a copy of the assets by priority (from manifest), highest first
const priorityOrder = {high: 3, medium: 2, low: 1};
const sortedAssets = [...manifest.assets].sort(
  (a, b) => priorityOrder[b.usage.priority] - priorityOrder[a.usage.priority]
);

// Migrate high-priority assets first.
// uploadAndUpdate(asset) stands for the per-asset body of Pattern 1,
// wrapped in an async function.
for (const asset of sortedAssets) {
  if (asset.usage.priority === 'high') {
    await uploadAndUpdate(asset);
  }
}

// Later: migrate medium/low priority
```

---

## Why This Approach Works

### 1. **Resumable**
If the script fails at asset #47 of 147:
```javascript
// Skip already-uploaded
const pending = manifest.assets.filter(a => !a.migration?.uploaded);
```
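
The filter above only helps if the manifest is updated as uploads succeed. A minimal sketch of that bookkeeping follows. The helper names are hypothetical, and writing the returned manifest back to `asset-manifest.json` after each upload is left to the calling script.

```javascript
// Return a copy of the manifest with one asset marked as uploaded.
// Persisting this copy after each upload lets a later run skip that asset.
function markUploaded(manifest, assetId, sanityAssetId, uploadedAt = new Date().toISOString()) {
  return {
    ...manifest,
    assets: manifest.assets.map((asset) =>
      asset.assetId === assetId
        ? {
            ...asset,
            migration: { ...asset.migration, phase: 2, uploaded: true, sanityAssetId, uploadedAt },
          }
        : asset
    ),
  };
}

// Assets that still need uploading, using the same test as the filter above.
function pendingAssets(manifest) {
  return manifest.assets.filter((asset) => !asset.migration?.uploaded);
}
```

Because `markUploaded` returns a new object instead of mutating its input, a failed write leaves the in-memory manifest from the previous step untouched.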

### 2. **Verifiable**
Audit migration state anytime:
```groq
*[_type == "page"].content[
  _type == "imageBlock"
]._migration.phase
```

### 3. **Flexible**
Use any workflow:
- Upload all at once (Pattern 2)
- Incremental by priority (Pattern 3)
- Per-page selective (Pattern 1)
- Resume after failure (all patterns)

### 4. **Safe**
External URLs work as fallback:
- Phase 1 content displays immediately
- Phase 2 migration can happen later
- No broken images during migration

### 5. **Traceable**
Every decision documented:
```json
"_migration": {
  "uploadedAt": "2025-01-15T10:30:00Z"  // Audit trail
}
```

---

## Asset Manifest Structure (v2.1.0)

**File:** `asset-manifest.json`

```json
{
  "version": "2.1.0",
  "generatedAt": "2025-01-15T09:00:00Z",
  "sourceUrl": "https://oc2.co",
  "assetCount": 147,
  
  "assets": [
    {
      "assetId": "image-hero-banner",
      "type": "image",
      "sourceUrl": "https://oc2.co/wp-content/uploads/stcw-assets/hero-banner.jpg",
      "relativePath": "stcw-assets/hero-banner.jpg",
      "dimensions": {
        "width": 1920,
        "height": 1080
      },
      "fileSize": 245760,
      "mimeType": "image/jpeg",
      "usage": {
        "count": 12,
        "pages": ["/", "/about/", "/services/"],
        "priority": "high"
      },
      "migration": {
        "phase": 1,
        "uploaded": false,
        "sanityAssetId": null,
        "uploadedAt": null
      }
    }
  ],
  
  "summary": {
    "totalAssets": 147,
    "totalSize": 12845760,
    "byType": {
      "image": 145,
      "video": 2
    },
    "byPriority": {
      "high": 23,
      "medium": 89,
      "low": 35
    }
  }
}
```

### Manifest Fields Explained

#### Asset Entry
- **assetId**: Stable, deterministic identifier (never changes)
- **type**: Asset type (image, video, etc.)
- **sourceUrl**: Original WordPress URL (for reference)
- **relativePath**: Path for rsync/deployment (for developers)
- **dimensions**: Image width/height (for responsive layouts)
- **fileSize**: Bytes (for optimization decisions)
- **mimeType**: MIME type (for proper handling)

#### Usage Tracking
- **count**: How many times this asset appears
- **pages**: Which pages use it (for impact analysis)
- **priority**: high/medium/low (for incremental migration)

#### Migration Status
- **phase**: 1 (external URL) or 2 (Sanity CDN)
- **uploaded**: Boolean (for resume capability)
- **sanityAssetId**: Populated after upload (for queries)
- **uploadedAt**: ISO timestamp (for audit trail)
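
Since scripts depend on these fields, a quick consistency check on the manifest can catch problems early. The checks below are examples chosen for this sketch, not an official validation routine.

```javascript
// Compare the manifest's declared totals with what the assets array holds,
// and flag entries whose migration status looks inconsistent.
function checkManifest(manifest) {
  const problems = [];
  const assets = manifest.assets || [];
  if (manifest.assetCount !== assets.length) {
    problems.push(`assetCount is ${manifest.assetCount} but ${assets.length} assets are listed`);
  }
  const seen = new Set();
  for (const asset of assets) {
    if (seen.has(asset.assetId)) problems.push(`duplicate assetId: ${asset.assetId}`);
    seen.add(asset.assetId);
    if (asset.migration && asset.migration.uploaded && !asset.migration.sanityAssetId) {
      problems.push(`${asset.assetId} is marked uploaded without a sanityAssetId`);
    }
  }
  return problems;
}
```

An empty array means none of these particular checks failed. It does not confirm that the files themselves exist.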

---

## FAQ

### Q: Why not just use Sanity asset references from the start?
**A:** WordPress assets aren't in Sanity yet. Phase 1 (external URLs) works immediately; Phase 2 (Sanity CDN) happens when developers are ready.

### Q: Can I skip Phase 2 entirely?
**A:** Yes! Phase 1 (external URLs) keeps working for as long as the source URLs stay online. Phase 2 is an optional optimization.

### Q: What if I lose the asset manifest?
**A:** The `_assetId` and `_migration` fields in the content itself contain all critical info. The manifest is for convenience.

### Q: Can I update _migration fields manually?
**A:** Absolutely. They're just data. Query and patch however you want.

### Q: What happens if I delete the external URL source?
**A:** If Phase 2 migration is complete (asset refs populated), external URLs aren't needed. If not, images will break (which is why Phase 2 exists).

### Q: Should I keep external URLs after Phase 2?
**A:** It is optional, but keeping them as a fallback is safer. If Sanity CDN has issues, images can still load from the original URLs.

### Q: What if I want to use a different CDN (not Sanity)?
**A:** Replace the `asset` reference with your CDN URL. The `_assetId` is stable, so you can query and update.

### Q: Can I migrate some assets and not others?
**A:** Absolutely. Assets on high-traffic pages can move to Phase 2 while those on low-traffic pages stay in Phase 1 indefinitely.

### Q: How do I know which assets to prioritize?
**A:** Check `usage.priority` in the asset manifest. High-priority assets are used most frequently.

### Q: Can I change the metadata fields?
**A:** Yes! This is your system. Add fields, remove fields, adapt to your needs. The structure is a reference.

---

## Summary: Trust Through Structure

**Trustworthy metadata is:**
- **Stable** - IDs don't change
- **Queryable** - GROQ/GraphQL friendly
- **Complete** - All info needed for migration
- **Verifiable** - Can audit progress
- **Resumable** - Safe to interrupt/restart
- **Flexible** - Adapt to any workflow

**Developers can:**
- Query exact migration state
- Script updates confidently
- Resume from failures
- Verify completion
- Audit the process
- Use any tools they prefer

**We provide:**
- Data structure (reliable)
- Example scripts (reference)
- Documentation (clear)
- **NOT:** Opinionated workflows

This is **composable CLI design**: give smart people good data, they'll figure out the rest.

---

## Version History

- **v2.1.0** - Current release
  - Enhanced metadata preservation through source envelopes
  - Improved asset manifest structure
  - Additional GROQ query examples
  - Expanded FAQ section

---

**Document Version:** 2.1.0  
**Last Updated:** January 2025  
**Maintained By:** Derick Schaefer  
**Contributors:** Claude (Anthropic)  

For migration workflow documentation, see **MIGRATION.md** in the plugin documentation.