Standalone Usage
The core API of next-markdown-mirror works without Next.js. You can use it in Express, plain Node.js scripts, or any JavaScript/TypeScript environment.
The core works anywhere
The Next.js-specific code (proxy, createMarkdownHandler, etc.) lives in the next-markdown-mirror/nextjs entry point. Everything else — HtmlToMarkdown, extractContent, filterContent, generateLlmsTxt, and more — is in the base next-markdown-mirror entry point and has no Next.js dependency.
Express.js middleware
Serve Markdown responses from an Express app:
import express from 'express';
import { HtmlToMarkdown, isMarkdownRequest } from 'next-markdown-mirror';

const app = express();
const converter = new HtmlToMarkdown({
  contentSelectors: ['article', 'main'],
  extractJsonLd: true,
});

app.get('*', async (req, res, next) => {
  // Only intercept requests that ask for Markdown; pass everything else through
  if (!isMarkdownRequest(req)) {
    return next();
  }

  try {
    // Fetch the HTML from your own server or a CMS
    const htmlResponse = await fetch(`http://localhost:3000${req.path}`);
    if (!htmlResponse.ok) {
      // Let the regular handlers deal with missing or failing pages
      return next();
    }
    const html = await htmlResponse.text();
    const result = converter.convert(html);

    res.set({
      'Content-Type': 'text/markdown; charset=utf-8',
      'Vary': 'Accept',
      'x-markdown-tokens': String(result.tokenCount),
    });
    res.send(result.markdown);
  } catch (err) {
    // Forward fetch or conversion errors to Express's error handler
    next(err);
  }
});
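The middleware above passes Express's `req` straight to `isMarkdownRequest`. If the check turns out to expect a standard Fetch `Request` (the request-checking example later on this page uses one), an adapter along these lines may help. This is only a sketch, assuming Node.js 18+ for the global `Request` and `Headers`; `ExpressLikeRequest` and `toFetchRequest` are hypothetical names, not part of next-markdown-mirror.

```typescript
// Minimal shape of the fields read from an Express request (hypothetical type).
interface ExpressLikeRequest {
  protocol: string;
  originalUrl: string;
  headers: Record<string, string | string[] | undefined>;
  get(name: string): string | undefined;
}

// Build a standard Fetch API Request from an Express-style request.
function toFetchRequest(req: ExpressLikeRequest): Request {
  const host = req.get('host') ?? 'localhost';
  const url = new URL(req.originalUrl, `${req.protocol}://${host}`);

  const headers = new Headers();
  for (const [name, value] of Object.entries(req.headers)) {
    if (value === undefined) continue;
    // Node represents repeated headers as arrays; append each value.
    for (const v of Array.isArray(value) ? value : [value]) {
      headers.append(name, v);
    }
  }

  return new Request(url, { method: 'GET', headers });
}
```

With such an adapter, the guard in the middleware could be written as `if (!isMarkdownRequest(toFetchRequest(req))) return next();`.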
Node.js script
Convert a remote page to Markdown in a script:
import { HtmlToMarkdown } from 'next-markdown-mirror';

const converter = new HtmlToMarkdown({
  contentSelectors: ['article', 'main'],
  extractJsonLd: true,
});

const response = await fetch('https://example.com/blog/my-post');
const html = await response.text();
const result = converter.convert(html);

console.log('Title:', result.title);
console.log('Tokens:', result.tokenCount);
console.log('JSON-LD:', result.jsonLd);
console.log('---');
console.log(result.markdown);
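The script above assumes the fetch succeeds and returns HTML. A small guard can catch error pages or non-HTML responses before they reach the converter. The following is a sketch only; `assertHtmlResponse` is a hypothetical helper, not something the library provides.

```typescript
// Throw a descriptive error unless the response is a successful HTML response.
function assertHtmlResponse(response: Response): void {
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const contentType = response.headers.get('content-type') ?? '';
  if (!contentType.toLowerCase().includes('text/html')) {
    throw new Error(`Expected HTML but received "${contentType}"`);
  }
}
```

In the script, it would be called between `fetch(...)` and `response.text()`, for example `assertHtmlResponse(response);`.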
Using individual functions
You can use the extraction and filtering functions independently for more control:
Extract main content
import { extractContent } from 'next-markdown-mirror';
const html = '<html><body><nav>...</nav><main><h1>Hello</h1></main></body></html>';
const mainContent = extractContent(html, ['main', 'article']);
// '<h1>Hello</h1>'
Filter non-content elements
import { filterContent } from 'next-markdown-mirror';

// `html` is any HTML string, such as the one defined in the previous example
const cleaned = filterContent(html, {
  exclude: ['.sidebar', '.ads', '.cookie-banner'],
  include: ['pre'], // keep code blocks even if inside excluded areas
});
Extract JSON-LD
import { extractJsonLd } from 'next-markdown-mirror';
const schemas = extractJsonLd(html);
// [{ "@type": "Article", "headline": "My Post", ... }]
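For context, JSON-LD is embedded in pages as `<script type="application/ld+json">` elements. The sketch below shows roughly what pulling those blocks out involves. It is regex-based for brevity and is not the library's implementation; `findJsonLd` is a hypothetical name, and a real HTML parser would be more robust.

```typescript
// Collect the parsed contents of every JSON-LD script block in an HTML string.
function findJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results: unknown[] = [];
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks whose contents are not valid JSON.
    }
  }
  return results;
}
```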
Count tokens
import { countTokens } from 'next-markdown-mirror';
const tokens = countTokens(markdownContent);
console.log(`Approximately ${tokens} tokens`);
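As the log message suggests, the count is an approximation. To give a feel for the order of magnitude, one common rule of thumb for English text is roughly four characters per token. The sketch below applies that heuristic; `estimateTokens` is a hypothetical helper, and it is an assumption that the result is in the same ballpark as `countTokens`, whose actual algorithm is not described here.

```typescript
// Rough token estimate based on a characters-per-token ratio (heuristic only).
function estimateTokens(text: string, charsPerToken = 4): number {
  if (charsPerToken <= 0) {
    throw new RangeError('charsPerToken must be positive');
  }
  return Math.ceil(text.length / charsPerToken);
}
```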
Check if a request wants Markdown
import { isMarkdownRequest } from 'next-markdown-mirror';

// Works with any standard Request object
const request = new Request('https://example.com/?v=md');
isMarkdownRequest(request); // true

const request2 = new Request('https://example.com/', {
  headers: { Accept: 'text/markdown' },
});
isMarkdownRequest(request2); // true
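The two examples above point to two signals: a `v=md` query parameter and an `Accept` header that mentions `text/markdown`. The sketch below illustrates a check of that kind. `looksLikeMarkdownRequest` is a hypothetical stand-in written for illustration; `isMarkdownRequest` itself may be implemented differently or consider additional signals.

```typescript
// Return true when the request asks for Markdown via query string or Accept header.
function looksLikeMarkdownRequest(request: Request): boolean {
  const url = new URL(request.url);
  if (url.searchParams.get('v') === 'md') {
    return true;
  }

  // The Accept header may list several media types, each with optional parameters.
  const accept = request.headers.get('accept') ?? '';
  return accept
    .split(',')
    .some((part) => part.trim().toLowerCase().startsWith('text/markdown'));
}
```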
llms.txt generation
Generate llms.txt files standalone without a route handler:
import { writeFileSync } from 'fs';
import { generateLlmsTxt, generateLlmsFullTxt } from 'next-markdown-mirror';

// Basic llms.txt
const txt = await generateLlmsTxt({
  siteName: 'My Site',
  baseUrl: 'https://example.com',
  description: 'A great website.',
  pages: [
    { url: '/', title: 'Home', description: 'Welcome page' },
    { url: '/about', title: 'About' },
    { url: '/docs', title: 'Documentation', section: 'resources' },
  ],
  sections: {
    resources: { title: 'Resources' },
  },
});

// Write to a file
writeFileSync('public/llms.txt', txt);

// Full-text variant
const fullTxt = await generateLlmsFullTxt({
  siteName: 'My Site',
  baseUrl: 'https://example.com',
  pages: [{ url: '/', title: 'Home' }],
});
writeFileSync('public/llms-full.txt', fullTxt);
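To make the shape of the output more concrete, the sketch below renders a configuration like the one above into the general layout used by the llms.txt proposal: an H1 title, a blockquote summary, and H2 sections containing link lists. `renderLlmsTxt` and its types are hypothetical, the "Pages" heading for unsectioned entries is an assumption, and the exact text that `generateLlmsTxt` produces may differ.

```typescript
interface PageEntry {
  url: string;
  title: string;
  description?: string;
  section?: string;
}

interface LlmsTxtInput {
  siteName: string;
  baseUrl: string;
  description?: string;
  pages: PageEntry[];
  sections?: Record<string, { title: string }>;
}

function renderLlmsTxt(input: LlmsTxtInput): string {
  const lines: string[] = [`# ${input.siteName}`, ''];
  if (input.description) {
    lines.push(`> ${input.description}`, '');
  }

  // Group pages by section key, preserving first-seen order.
  const groups = new Map<string, PageEntry[]>();
  for (const page of input.pages) {
    const key = page.section ?? '';
    const group = groups.get(key) ?? [];
    group.push(page);
    groups.set(key, group);
  }

  groups.forEach((pages, key) => {
    // Unsectioned pages go under an assumed default heading.
    const heading = key === '' ? 'Pages' : (input.sections?.[key]?.title ?? key);
    lines.push(`## ${heading}`, '');
    for (const page of pages) {
      const link = `- [${page.title}](${new URL(page.url, input.baseUrl).href})`;
      lines.push(page.description ? `${link}: ${page.description}` : link);
    }
    lines.push('');
  });

  return lines.join('\n');
}
```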
Using a sitemap
Pass a sitemap URL as the pages value to auto-discover pages:
import { generateLlmsTxt, parseSitemap } from 'next-markdown-mirror';

// Option 1: pass sitemap URL directly
const txt = await generateLlmsTxt({
  siteName: 'My Site',
  baseUrl: 'https://example.com',
  pages: 'https://example.com/sitemap.xml',
});

// Option 2: parse sitemap separately for more control
const pages = await parseSitemap(
  'https://example.com/sitemap.xml',
  'https://example.com'
);

// Filter or modify pages before generating
const docsPages = pages.filter(p => p.url.startsWith('/docs'));
const txt2 = await generateLlmsTxt({
  siteName: 'My Docs',
  baseUrl: 'https://example.com',
  pages: docsPages,
});
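For readers curious what sitemap discovery involves, a sitemap is an XML file whose `<loc>` elements list page URLs. The sketch below extracts those entries from an XML string and converts same-origin URLs to site-relative paths, which is the shape the `/docs` filter above appears to rely on. `pagesFromSitemapXml` is a hypothetical function, not the library's `parseSitemap`, and it ignores XML entities and sitemap index files for brevity.

```typescript
interface SitemapPage {
  url: string;
}

// Extract <loc> entries and keep same-origin URLs as site-relative paths.
function pagesFromSitemapXml(xml: string, baseUrl: string): SitemapPage[] {
  const base = new URL(baseUrl);
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const pages: SitemapPage[] = [];
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(xml)) !== null) {
    const loc = new URL(match[1], base);
    if (loc.origin === base.origin) {
      pages.push({ url: loc.pathname });
    }
  }
  return pages;
}
```

The result can then be filtered in the same way as in Option 2, for example with `pages.filter(p => p.url.startsWith('/docs'))`.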