Description
Markdown for Agents and Statistics converts your WordPress content to Markdown and serves it
to AI agents and language model tools that request it via HTTP content negotiation
(Accept: text/markdown).
The Chancery Lane Project is a charity that helps organisations reduce emissions using the power of legal documents and processes. We've published this plugin because we believe that making content more legible for AI agents makes a meaningful difference to their energy usage: it reduces the number of tokens required to consume the content (by up to 90% compared with HTML) and minimises the server resources required to render, process and display pages at source.
How it works:
- Posts and taxonomy archive pages are converted to Markdown and saved as static files on disk inside wp-content/uploads/.
- When a visitor (or AI agent) requests a page with Accept: text/markdown in the HTTP headers, WordPress serves the pre-generated .md file directly, with no page render required.
- A <link rel="alternate" type="text/markdown"> tag is added to each page's <head> so agents can discover Markdown versions automatically.
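As a rough illustration of the discovery step from the agent's side, the sketch below scans a page's HTML for the alternate link described above. It uses only the Python standard library; the class and function names are hypothetical and are not part of the plugin.

```python
from html.parser import HTMLParser


class MarkdownLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="text/markdown"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None for bare attributes, so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        is_markdown = attr.get("type", "").lower() == "text/markdown"
        if "alternate" in rel_tokens and is_markdown and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_markdown_links(html):
    """Return every advertised Markdown URL found in the given HTML."""
    finder = MarkdownLinkFinder()
    finder.feed(html)
    return finder.hrefs
```

An agent would fetch the HTML page once, call find_markdown_links on the body, and then request the first URL it returns.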
Features:
- Content negotiation (Accept: text/markdown, ?output_format=md, or known AI User-Agents)
- Taxonomy archive support: category, tag, and custom taxonomy term pages served as Markdown post listings
- Automatic Markdown generation on post save; taxonomy archives auto-update when any post in the term changes
- AJAX bulk generation with live progress counter, so there are no page timeouts on large sites
- Per-post-type field configuration: choose which meta/ACF fields go in frontmatter or body
- ACF support with dot notation for nested group fields (e.g. group.subfield)
- Content fields option: use ACF fields as the body content instead of post_content
- Manifest generation with content hashes and change tracking per post type
- Incremental export: only re-export changed documents (--incremental)
- Delta file (changes.json) for RAG system sync
- Access statistics: logs AI agent requests with a dedicated stats admin page
- Access grouping by class of agent
- Optional frontmatter fields: hierarchy (parent/ancestors/children IDs), author display name, root-relative featured image paths
- Topics section: appends a ## Topics section with linked taxonomy terms to the Markdown body
- Export preview: preview generated Markdown inline in the post editor without writing to disk
- WP-CLI commands: generate, generate-taxonomies, prune-stats, status, delete
- Fully unit-tested
Installation
- Upload the plugin to /wp-content/plugins/markdown-for-agents/, or install via the WordPress Plugins screen.
- Activate the plugin through the Plugins screen in WordPress.
- Visit Settings > Markdown for Agents and choose which post types and taxonomies to generate.
- Enable Auto-generate on save so files stay in sync as you publish or edit content (optional).
- Click Generate All to create Markdown for your existing content. On large sites you can also run wp markdown-agents generate and wp markdown-agents generate-taxonomies from WP-CLI.
- Verify by appending ?output_format=md to any post URL (or using an AI User-Agent) to confirm Markdown is served.
FAQ
Where are the Markdown files stored?
Inside wp-content/uploads/{export_dir}/ (configurable in Settings). Post files live under {export_dir}/{post-type}/{slug}.md. Taxonomy archive files live under {export_dir}/taxonomy/{taxonomy}/{term-slug}.md. Files in this directory are served by WordPress when content negotiation is triggered.
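The path layout above can be sketched as two small helpers. This is only an illustration of the documented pattern; the function names are hypothetical, and wp-mfa-exports is used as the export directory because it appears as an example elsewhere on this page.

```python
from pathlib import PurePosixPath

UPLOADS = PurePosixPath("wp-content/uploads")


def post_markdown_path(export_dir, post_type, slug):
    """{export_dir}/{post-type}/{slug}.md under the uploads directory."""
    return UPLOADS / export_dir / post_type / f"{slug}.md"


def term_markdown_path(export_dir, taxonomy, term_slug):
    """{export_dir}/taxonomy/{taxonomy}/{term-slug}.md under the uploads directory."""
    return UPLOADS / export_dir / "taxonomy" / taxonomy / f"{term_slug}.md"


print(post_markdown_path("wp-mfa-exports", "post", "hello-world"))
print(term_markdown_path("wp-mfa-exports", "category", "news"))
```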

Will this slow down my site?
No. Markdown files are generated ahead of time (on post save or via manual/CLI bulk generation). Serving them is a simple file read, much faster than rendering a full WordPress page.

AI agents are getting HTML instead of Markdown. Why?
Almost always this is a CDN, firewall, or page cache sitting in front of WordPress, not the plugin. On many hosts (for example Cloudflare in front of WP Engine) the edge answers a request before it ever reaches the plugin: a full-page cache can return the cached HTML, or a bot/WAF rule can block a known AI crawler with a 403/429.
The reliable route is the query parameter: append ?output_format=md to any post or archive URL. Because that is a distinct URL, caches store it separately and firewalls treat it as an ordinary request, so it reaches the plugin even on a hardened stack. The plugin advertises this URL automatically via a <link rel="alternate" type="text/markdown"> tag in each page's <head>, so agents that read the page can discover and follow it.
The Accept: text/markdown header and User-Agent routes also work, but only if your CDN/cache is configured to let them through (see the next question).
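For a client that wants to take the query-parameter route, a minimal sketch of building that URL safely (preserving any existing query string and fragment) might look like this. The helper name is hypothetical; only the output_format=md parameter comes from the plugin.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def markdown_url(page_url):
    """Return page_url with output_format=md set exactly once."""
    parts = urlsplit(page_url)
    # Keep existing parameters, but drop any earlier output_format value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "output_format"
    ]
    query.append(("output_format", "md"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


print(markdown_url("https://example.com/your-post/"))
```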

How do I let my CDN or cache serve Markdown to agents?
This is host/CDN configuration, not a plugin setting. Two changes help:
- Page cache (WP Engine, LiteSpeed, Varnish, nginx): exclude agent-shaped requests from the full-page cache, meaning any request whose Accept header contains text/markdown, whose query string contains output_format=md, or whose User-Agent is a known AI bot. Do not add User-Agent to the cache key; that fragments the cache for every visitor. Exclude from caching, do not key on it.
- Firewall / bot rules (Cloudflare): add a skip/allow rule for the AI User-Agents you want to serve (for example GPTBot, ClaudeBot, PerplexityBot, Google-Extended). Otherwise they receive a 403/429 and get nothing.
If you skip this, nothing breaks; agents simply use the ?output_format=md URL via discovery instead. The plugin already protects against the reverse problem: Markdown responses are sent with Cache-Control: private, no-store and Vary: Accept, User-Agent, so a shared cache cannot replay the Markdown to a human browser on the same URL.
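The cache-exclusion rule described above can be expressed as a single predicate. The sketch below is written in Python purely to make the logic concrete; a real rule would be written in your CDN's or cache's own configuration language. The function name and the marker list are assumptions, with the bot names taken from the examples above.

```python
from urllib.parse import parse_qs

# Illustrative only: substrings matched against the User-Agent header.
AI_BOT_MARKERS = ("gptbot", "claudebot", "perplexitybot", "google-extended")


def should_bypass_page_cache(accept="", query_string="", user_agent=""):
    """True when a request looks agent-shaped and should skip the full-page cache."""
    if "text/markdown" in accept.lower():
        return True
    if "md" in parse_qs(query_string).get("output_format", []):
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in AI_BOT_MARKERS)
```

Note that the User-Agent is only used to decide whether to bypass the cache; it is never part of a cache key, which matches the advice above.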

How can I check what an agent actually receives?
Request a page the way an agent would and inspect the response headers:
```
# Query-param route (the reliable one)
curl -sI 'https://example.com/your-post/?output_format=md'

# Accept-header route
curl -sI -H 'Accept: text/markdown' 'https://example.com/your-post/'
```
A genuine Markdown response from the plugin has Content-Type: text/markdown and an X-Markdown-Source: markdown-for-agents header. If you instead see Content-Type: text/html, the request was answered by a cache or firewall before reaching the plugin (see the previous questions). Note that running these from your own server may bypass your CDN; testing from an external network shows what real agents experience.
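If you want to automate this check, a small sketch like the following could classify the header block printed by curl -sI. The function names and return labels are hypothetical; the two header checks are the ones described above.

```python
def parse_header_block(raw):
    """Turn the text printed by `curl -sI` into a dict with lower-cased names."""
    headers = {}
    for line in raw.splitlines()[1:]:  # the first line is the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def classify_response(raw):
    """Label a response as Markdown from the plugin, HTML, or something else."""
    headers = parse_header_block(raw)
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    from_plugin = headers.get("x-markdown-source") == "markdown-for-agents"
    if media_type == "text/markdown" and from_plugin:
        return "markdown-from-plugin"
    if media_type == "text/html":
        # Per the answer above: likely answered by a cache or firewall.
        return "html"
    return "other"
```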

Should I publish an llms.txt file?
llms.txt is a proposed convention for a single Markdown index of your site at https://example.com/llms.txt, aimed at AI tools that look for a site-level manifest. It is an emerging community convention, not an official standard, and there is limited evidence that the major AI crawlers consume it yet, so treat it as low-cost, optional, and complementary to the per-page discovery this plugin already provides.
This plugin does not generate llms.txt. If you want one, publish a static file at your web root listing your key pages with their ?output_format=md URLs, and keep it in sync with published and retired content or it will point agents at missing pages.
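If you do decide to maintain one by hand, a minimal sketch of producing such a file from a list of pages is shown below. Everything here is an assumption: the function name, the "Pages" heading, and the heading-plus-link-list layout, which only loosely follows the llms.txt proposal. The plugin does none of this for you.

```python
def build_llms_txt(site_title, pages):
    """Build an llms.txt body from (title, url) pairs, linking to Markdown URLs."""
    lines = [f"# {site_title}", "", "## Pages", ""]
    for title, url in pages:
        # Append the plugin's query parameter, respecting an existing query string.
        separator = "&" if "?" in url else "?"
        lines.append(f"- [{title}]({url}{separator}output_format=md)")
    return "\n".join(lines) + "\n"


print(build_llms_txt("Example Site", [("Your post", "https://example.com/your-post/")]))
```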

What are taxonomy archive files?
For every public taxonomy term (categories, tags, custom taxonomies) the plugin generates a Markdown file listing all published posts in that term with links and excerpts. These are served automatically when an AI agent requests a taxonomy archive URL. This lets agents navigate your site structure by exploring term listings, not just individual posts.

What is the manifest.json file?
When you generate with --with-manifest or --incremental, a manifest.json is created inside each post-type export folder (e.g. wp-mfa-exports/post/manifest.json). It contains a registry of all exported documents with content hashes and change tracking (new/modified/unchanged/deleted), enabling RAG systems to identify what changed since the last export without reprocessing all documents.

How does incremental export work?
Use wp markdown-agents generate --incremental to only re-export documents that have changed since the last export. The plugin compares content hashes against the previous manifest.json and skips unchanged posts. This also generates a changes.json delta file listing new, modified, and deleted documents, which your RAG system can read to know exactly what to re-embed.
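The hash-comparison idea can be sketched in a few lines. This is not the plugin's code, and it does not assume anything about the real manifest.json or changes.json schema: it simply takes two hypothetical mappings of document ID to content hash and sorts documents into the four states named above. SHA-256 is an assumed choice of hash.

```python
import hashlib


def content_hash(markdown):
    """Hash a document body; SHA-256 is an assumption, not the plugin's documented choice."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def classify_changes(previous, current):
    """Compare {doc_id: hash} mappings from the last export and the current one."""
    changes = {"new": [], "modified": [], "unchanged": [], "deleted": []}
    for doc_id, digest in current.items():
        if doc_id not in previous:
            changes["new"].append(doc_id)
        elif previous[doc_id] != digest:
            changes["modified"].append(doc_id)
        else:
            changes["unchanged"].append(doc_id)
    # Anything present last time but missing now has been deleted.
    changes["deleted"] = sorted(set(previous) - set(current))
    return changes
```

A RAG pipeline would re-embed the "new" and "modified" documents, remove the "deleted" ones from its index, and leave the rest untouched.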

How do I configure fields per post type?
In Settings > Markdown for Agents, each enabled post type has its own "Field Configuration" section with two textareas:
- Frontmatter fields: meta or ACF fields added to the YAML frontmatter.
- Content fields: meta or ACF fields used as the body content. When set, post_content is automatically excluded.
Use dot notation for ACF group fields (e.g. clause_fields.clause_summary). Plain meta keys work too (e.g. _yoast_wpseo_title). ACF relationship fields are automatically converted to a list of post titles.
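To make the dot notation concrete, here is a small sketch of how a key such as clause_fields.clause_summary can be resolved against nested field data. It illustrates the general technique only; the function name and the sample data are hypothetical and do not describe the plugin's internals.

```python
def resolve_field(fields, key):
    """Walk nested dicts following a dotted key; return None if any part is missing."""
    value = fields
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


fields = {
    "clause_fields": {"clause_summary": "Short summary"},
    "_yoast_wpseo_title": "SEO title",
}
print(resolve_field(fields, "clause_fields.clause_summary"))
print(resolve_field(fields, "_yoast_wpseo_title"))
```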

Can I customise the Markdown output?
Yes. Several filters and actions are available:
- markdown_for_agents_pre_convert: filter HTML before conversion
- markdown_for_agents_post_convert: filter Markdown after conversion
- markdown_for_agents_frontmatter: modify frontmatter fields for a post
- markdown_for_agents_taxonomy_frontmatter: modify frontmatter fields for a taxonomy archive
- markdown_for_agents_serve_enabled: enable/disable serving for a specific post
- markdown_for_agents_serve_taxonomies: enable/disable serving for taxonomy archive pages
- markdown_for_agents_cache_headers: override the cache-related headers sent with the Markdown response
- markdown_for_agents_file_generated: action fired after a file is written
- markdown_for_agents_file_deleted: action fired after a file is deleted

Can I let CDNs/full-page caches cache the Markdown responses?
By default the Markdown response is sent with Cache-Control: private, no-store, max-age=0 (plus X-LiteSpeed-Cache-Control, X-Accel-Expires and Vary: Accept, User-Agent). This is deliberate: the Markdown is negotiated on the same URL as the HTML page, so a shared cache that ignores or normalises Vary could otherwise store the Markdown variant and replay it to ordinary browsers expecting HTML.
If your CDN/cache layer honours Vary correctly (or you serve Markdown from distinct URLs), you can relax this with the markdown_for_agents_cache_headers filter. Map any header to an empty string to omit it entirely:
```
add_filter( 'markdown_for_agents_cache_headers', function ( array $headers, string $filepath ) {
    // Allow shared caches to store the Markdown response for five minutes.
    $headers['Cache-Control']             = 'public, max-age=300';
    // An empty string omits the header entirely.
    $headers['X-LiteSpeed-Cache-Control'] = '';
    $headers['X-Accel-Expires']           = '';
    return $headers;
}, 10, 2 );
```
This filter governs only the cache-related headers listed above. The Content-Signal and X-Markdown-Source headers are sent separately and are unaffected (Content-Signal has its own markdown_for_agents_content_signal filter).
Override with caution: incorrectly cached Markdown will be served to browsers.
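The "empty string omits the header" rule can be modelled in a few lines. The Python sketch below only illustrates that merge semantics and is not the plugin's implementation. The default values are the ones quoted on this page (the FAQ and the 1.4.4 changelog entry), and the function name is hypothetical.

```python
# Defaults as described on this page; treat them as illustrative.
DEFAULT_CACHE_HEADERS = {
    "Cache-Control": "private, no-store, max-age=0",
    "X-LiteSpeed-Cache-Control": "no-cache",
    "X-Accel-Expires": "0",
    "Vary": "Accept, User-Agent",
}


def effective_headers(defaults, overrides):
    """Apply overrides on top of defaults, dropping headers mapped to an empty string."""
    merged = {**defaults, **overrides}
    return {name: value for name, value in merged.items() if value != ""}


relaxed = effective_headers(
    DEFAULT_CACHE_HEADERS,
    {"Cache-Control": "public, max-age=300", "X-LiteSpeed-Cache-Control": "", "X-Accel-Expires": ""},
)
print(relaxed)
```

With the same overrides as the PHP filter example, only Cache-Control and Vary remain, and Vary keeps its default value because it was not overridden.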

How do I generate taxonomy archives via WP-CLI?
wp markdown-agents generate-taxonomies
wp markdown-agents generate-taxonomies --taxonomy=category
wp markdown-agents generate-taxonomies --dry-run
Contributors & Developers
"Markdown for Agents and Statistics" is open source software.
Changelog
1.5.1
- Add markdown_for_agents_cache_headers filter so the cache-related headers on Markdown responses can be customised (e.g. to allow CDN caching where Vary is honoured). Defaults are unchanged and remain cache-bypassing.
1.5.0
- Add new ’skipped’ grouping on generating MD files to show those that have been skipped for good reason (password or draft etc) rather than failed.
- Add new ’Agent Class’ graph display on Agent Stats page which mimics Known Agents classifications to help understand traffic patterns
- Better documentation for caching and generation logic
1.4.5
- Fix: Issues where memcache could cause problems on CLI invoked rebuilds on large sites. Also resolves minor issues with and outputs generated by post filters appearing in MD output, while allowing for same in
blocks where needed.
1.4.4
- Fix: full-page caches (LiteSpeed, Varnish, nginx fastcgi_cache) could store the Markdown response under a page URL when an AI agent or ?output_format=md request hit it first, then replay the .md body to subsequent HTML browser requests. Markdown responses now send Cache-Control: private, no-store, X-LiteSpeed-Cache-Control: no-cache, X-Accel-Expires: 0, and Vary: Accept, User-Agent unconditionally.
1.4.3
- Update to fix deleting posts on status change outside of auto-update flow
1.4.2
- Fixed an issue where private/draft posts were created as MD files, and added a checkbox to post edit pages to exclude posts from MD generation. Also fixes a small issue where unusual taxonomy slugs produced incorrect URLs in the Topics section of the MD body. Adds Strauss namespacing to html-to-markdown/Composer includes to avoid collisions.
1.4.1
- Removed llms.txt index generation. The LlmsTxtGenerator class, its --with-llmstxt WP-CLI flag on wp markdown-agents generate, and the corresponding unit tests have been dropped.
1.4.0
- Add notices and copy around generating and regenerating content on install and updates to Settings
- Add transient to store and note when content needs regenerating
1.3.0
- Optional hierarchy frontmatter fields (parent, ancestors, children IDs) for hierarchical post types (pages, etc.).
- Optional author display name in frontmatter.
- Optional root-relative paths for featured images (survives domain migrations).
- Optional ## Topics section appended to the Markdown body with linked taxonomy terms.
- Export preview: "Preview Markdown" button in the post meta box renders generated Markdown inline without writing to disk.
- New WP-CLI command: wp markdown-agents prune-stats [--days=<n>] [--yes], which removes access stats older than N days.
- Manifest hash now covers taxonomy term slugs, so incremental export correctly detects posts whose terms changed.
1.2.0
- Taxonomy archive support — generates Markdown index files for all public taxonomy terms (categories, tags, custom taxonomies), served via content negotiation.
- Taxonomy archives auto-regenerate when any post in the term is saved or deleted.
- AJAX bulk generation for taxonomy archives on the Settings page with live progress counter.
- New WP-CLI command: wp markdown-agents generate-taxonomies [--taxonomy=<slug>] [--dry-run].
- <link rel="alternate" type="text/markdown"> tag now emitted on taxonomy archive pages.
- New filter: markdown_for_agents_serve_taxonomies to enable/disable taxonomy archive serving globally.
- New filter: markdown_for_agents_taxonomy_frontmatter to modify taxonomy archive frontmatter before serialisation.
- Bulk generation buttons converted to AJAX with live counter, so there are no more page timeouts on large sites.
1.1.0
- Per-post-type field configuration for frontmatter and content fields.
- ACF support with dot notation for nested group fields.
- Content fields option — use ACF/meta fields as body content instead of post_content.
- ACF relationship fields automatically normalised to post titles.
- Added manifest.json generation with content hashes and change tracking.
- New --with-manifest flag for wp markdown-agents generate.
- Manifest is generated per post-type folder for independent change tracking.
- Incremental export via --incremental: skips unchanged documents.
- Delta file (changes.json) generated for RAG system integration.
- Access statistics: logs AI agent requests; dedicated stats admin page.
- UA detection — configurable User-Agent strings force Markdown serving.
1.0.0
- Initial release.
