Web to Markdown Converter

Overview

Convert web page content to Markdown format by providing a URL. Supports three modes: standard Markdown for human reading, AI-optimized format for use as AI agent context, and dual mode that generates both. Ideal for archiving web documentation as local Markdown files or preparing content for RAG systems.

When to Use

Trigger this skill when the user requests:

"Convert this web page to Markdown"
"Save this URL as Markdown"
"Archive this web page"
"Save this blog post as Markdown"
"Convert to AI-readable format" (AI-optimized mode)
"Make it suitable for context use" (AI-optimized mode)
"Give me both the original and AI-optimized version" (dual mode)

Core Workflow

Step 1: Get the URL

Obtain the target web page URL from the user.

Example:

Claude: Please enter the web page URL to convert.
User: https://example.com/article

Important:

URL must start with http:// or https://
HTTP URLs are automatically upgraded to HTTPS
Invalid URLs return an error
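
The URL rules above can be sketched as a small check. This is a minimal illustration, not the skill's actual implementation: the function name `normalize_url` is hypothetical, and the error text is the one listed under Error Handling.

```python
from urllib.parse import urlsplit, urlunsplit

INVALID_URL_MESSAGE = (
    "Invalid URL. Please enter a complete URL starting with http:// or https://."
)


def normalize_url(raw: str) -> str:
    """Validate a user-supplied URL and upgrade http:// to https://."""
    parts = urlsplit(raw.strip())
    # Reject anything without an http(s) scheme or without a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(INVALID_URL_MESSAGE)
    # Mirror the documented behaviour: plain HTTP is upgraded to HTTPS.
    return urlunsplit(parts._replace(scheme="https"))
```

Calling `normalize_url("http://example.com/article")` returns the HTTPS form of the same address, while input without a scheme raises `ValueError` carrying the user-facing message.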

Step 2: Select Conversion Mode

Analyze the user's request to choose the appropriate mode.

Conversion Modes:

Standard Mode (default): Convert to human-readable Markdown
AI-Optimized Mode: Convert to a format best suited for use as AI agent context
Dual Mode: Generate both the standard and the AI-optimized files

Auto-detect keywords:

"AI readable", "for context", "AI learning" -> AI-optimized mode
"original", "both", "two versions" -> Dual mode
All other requests -> Standard mode
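
The keyword rules above could be expressed roughly as follows. This is a sketch under two assumptions that the rules do not state: dual-mode keywords take precedence when both kinds appear, and the hyphenated spelling "ai-readable" is accepted alongside "AI readable". The name `detect_mode` is hypothetical.

```python
DUAL_KEYWORDS = ("original", "both", "two versions")
# "ai-readable" is an assumed spelling variant of the listed "AI readable".
AI_KEYWORDS = ("ai readable", "ai-readable", "for context", "ai learning")


def detect_mode(request: str) -> str:
    """Return "dual", "ai", or "standard" for a user request."""
    text = request.lower()
    # Assumption: dual is checked first, so a request that mentions both
    # "original" and an AI keyword yields two files rather than one.
    if any(keyword in text for keyword in DUAL_KEYWORDS):
        return "dual"
    if any(keyword in text for keyword in AI_KEYWORDS):
        return "ai"
    return "standard"
```

Plain substring matching is enough for a sketch; in practice the request is interpreted in context, so this only illustrates the default ordering.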

Step 3: Confirm Save Options

Ask the user for save location and filename.

Example:

Claude: Where should the Markdown file be saved?
Options:
1. Current directory (./)
2. Specify a path
3. Don't save, just display content

Filename? (default: webpage.md)
User: Save in current directory, filename article.md

Step 4: Fetch and Convert

Use the WebFetch tool to retrieve and convert the web page.

Standard Mode Prompt

url = "https://example.com/article"
prompt = "Convert the entire web page content to Markdown format. Include all headings, body text, links, and images, but exclude unnecessary navigation and ads."

AI-Optimized Mode Prompt

url = "https://example.com/article"
prompt = """Convert this web page to a format optimized for AI agent context use:

**Required structure:**

1. **Front Matter (YAML format)**
---
title: "Page title"
url: "Original URL"
author: "Author (if available)"
date: "Publication date (if available)"
word_count: Approximate word count
topics: ["topic1", "topic2", "topic3"]
summary: |
  3-5 line summary of the core content
  for quick AI comprehension
main_points:
  - Key point 1
  - Key point 2
  - Key point 3
content_type: "tutorial|guide|article|documentation|news|blog"
difficulty: "beginner|intermediate|advanced"
---

2. **Body Structure**
# [Original Title]

## Core Summary
[3-5 lines clearly explaining the content]

## Main Content
[Clear hierarchical sections using H2/H3]

## Key Insights
- Insight 1
- Insight 2

## Practical Applications
[How to apply this content]

## Related Resources
[Links with descriptions, if available]

## Conclusion
[Summary]

**Conversion rules:**
- Remove all ads, navigation, footers, sidebars
- Code blocks must specify language
- Links use [description](URL) format
- Images use ![description](URL) format
- Remove unnecessary filler words, keep it concise
- Lists use clear bullet points
- Important concepts in **bold**

**Goal:**
Enable an AI to grasp the core content within 3 seconds
and accurately answer user questions about it.
"""

Important:

WebFetch automatically converts HTML to Markdown
Responses are cached for 15 minutes, so repeated requests to the same URL return faster
Redirects are automatically followed
AI-optimized output uses roughly 30-50% fewer tokens than standard output and has a clearer structure
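
Because the AI-optimized prompt asks for a fixed front matter layout, the result can be sanity-checked before saving. The sketch below is one possible check, not part of the skill itself: the function name and the choice of required keys are assumptions (`author` and `date` are left out because the prompt marks them "if available"), and it only looks for top-level key names rather than parsing YAML.

```python
import re

REQUIRED_KEYS = (
    "title", "url", "word_count", "topics",
    "summary", "main_points", "content_type", "difficulty",
)


def missing_front_matter_keys(markdown: str) -> list[str]:
    """Return required front matter keys absent from a converted document."""
    match = re.match(r"\A---\n(.*?)\n---\n", markdown, re.DOTALL)
    if match is None:
        # No front matter block at all: every key is missing.
        return list(REQUIRED_KEYS)
    # Top-level keys start at column 0; indented lines belong to block values.
    present = set(re.findall(r"^([A-Za-z_]+):", match.group(1), re.MULTILINE))
    return [key for key in REQUIRED_KEYS if key not in present]
```

An empty list means the expected keys are all present; a non-empty list is a reasonable signal to rerun the conversion.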

Step 5: Save the Markdown

Save the converted Markdown as a file.

Recommended filenames:

Standard: article.md
AI-optimized: article-ai-optimized.md or article.context.md

Step 6: Confirm Result

Show the user the saved file path and brief statistics:

Web page converted to Markdown!

File: article.md
Path: /path/to/article.md
Size: ~1,234 words
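
The confirmation message above can be assembled from the saved path and the converted text. This is a minimal sketch; `format_result` is a hypothetical helper, and the word count is a rough whitespace-based estimate, which is why the output keeps the "~".

```python
from pathlib import Path


def format_result(path: Path, markdown: str) -> str:
    """Build the confirmation message shown after a successful save."""
    # Whitespace-delimited tokens give an approximate word count.
    words = len(markdown.split())
    return (
        "Web page converted to Markdown!\n\n"
        f"File: {path.name}\n"
        f"Path: {path}\n"
        f"Size: ~{words:,} words"
    )
```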

Dual Mode Workflow

Dual mode generates both a standard Markdown file and an AI-optimized version of the same page.

Dual Mode Steps

1. Confirm URL and filename with the user
2. Generate standard Markdown using standard mode prompt via WebFetch
3. Generate AI-optimized version using AI-optimized prompt on the same URL
4. Report results:

Dual mode conversion complete! 2 files generated.

Standard Markdown:
- File: useState.md
- Size: ~3,500 words
- Use: Human-readable original

AI-Optimized:
- File: useState.context.md
- Size: ~2,100 words (40% saved)
- Use: AI agent context

Tips:
- Standard (.md) for human reading
- AI-optimized (.context.md) for RAG systems or AI agent context

Dual Mode File Naming

Pattern	Standard	AI-Optimized
Extension (recommended)	`article.md`	`article.context.md`
Suffix	`article.md`	`article-ai-optimized.md`
Folder	`docs/original/article.md`	`docs/optimized/article.md`
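
The three naming patterns in the table can be derived mechanically from the standard filename. The helper below is an illustrative sketch; its name and the pattern labels `"extension"`, `"suffix"`, and `"folder"` are assumptions chosen to match the table rows.

```python
from pathlib import PurePosixPath


def dual_mode_paths(standard: str, pattern: str = "extension") -> tuple[str, str]:
    """Return (standard_path, ai_optimized_path) for one naming pattern."""
    path = PurePosixPath(standard)
    if pattern == "extension":
        # article.md -> article.context.md
        return str(path), str(path.with_name(f"{path.stem}.context.md"))
    if pattern == "suffix":
        # article.md -> article-ai-optimized.md
        return str(path), str(path.with_name(f"{path.stem}-ai-optimized.md"))
    if pattern == "folder":
        # docs/article.md -> docs/original/article.md and docs/optimized/article.md
        base = path.parent
        return str(base / "original" / path.name), str(base / "optimized" / path.name)
    raise ValueError(f"Unknown naming pattern: {pattern}")
```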

Dual Mode Benefits

Preserve original: Human-readable content stays intact
AI efficiency: AI version saves tokens with structured format
Separate by use: Choose the right file for the purpose
Backup effect: Two formats archived simultaneously
Comparable: Analyze differences between original and optimized

Advanced Options

Batch Convert Multiple URLs

User: Convert all these URLs to Markdown
- https://example.com/article1
- https://example.com/article2
- https://example.com/article3

Claude: Converting 3 web pages. Auto-generate filenames or specify each?
User: Auto-generate

Claude: [Converts each URL and saves as article1.md, article2.md, article3.md]
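
Auto-generated filenames in the exchange above follow the last path segment of each URL. One way to do that is sketched here; the function name, the `-2`, `-3` collision suffixes, and the reuse of the default `webpage` stem for URLs without a path are assumptions.

```python
from urllib.parse import urlsplit


def auto_filenames(urls: list[str]) -> list[str]:
    """Derive one unique .md filename per URL from its last path segment."""
    used: set[str] = set()
    names: list[str] = []
    for url in urls:
        segment = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
        # Strip an existing extension; fall back to the default stem if empty.
        stem = segment.rsplit(".", 1)[0] or "webpage"
        candidate, counter = f"{stem}.md", 2
        # Assumption: name collisions are resolved with a numeric suffix.
        while candidate in used:
            candidate = f"{stem}-{counter}.md"
            counter += 1
        used.add(candidate)
        names.append(candidate)
    return names
```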

Extract Specific Sections

User: https://example.com/docs -- only extract the "Installation" section
Claude: [Specifies "extract only the Installation section" in WebFetch prompt]

Custom Markdown Format

Specify the desired Markdown styling (for example, heading depth, list markers, or link style) in the WebFetch prompt during conversion.

Dynamic Content Handling

Problem: JavaScript-Rendered Pages

WebFetch only fetches static HTML, so React, Vue, Next.js, and similar JavaScript-rendered pages may return empty content.

Symptoms:

Converted Markdown is nearly empty
Only "Loading..." placeholders appear
Core content is missing

Solution: Playwright Fallback

When WebFetch returns empty or insufficient content, ask the user if they want to use Playwright.

MCP Playwright (Recommended)

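// Note: tool names depend on the installed Playwright MCP server; the names below are illustrative.
// 1. Navigate to the page
mcp__playwright__navigate({ url: "https://example.com" })

// 2. Wait for JavaScript rendering to finish
mcp__playwright__waitForLoadState({ state: "networkidle" })

// 3. Get the rendered HTML
const htmlContent = mcp__playwright__getContent()

// 4. Convert htmlContent to Markdown using the same rules as the selected mode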

Node Playwright (Fallback)

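node << 'EOF'
// Render a JavaScript-heavy page in headless Chromium and print its HTML.
const playwright = require('playwright');

(async () => {
  const browser = await playwright.chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-side rendering can finish.
    await page.goto('https://example.com', { waitUntil: 'networkidle' });
    // Extra grace period for late-running scripts.
    await page.waitForTimeout(3000);
    console.log(await page.content());
  } finally {
    // Close the browser even if navigation fails.
    await browser.close();
  }
})().catch((err) => {
  console.error(err);
  process.exit(1);
});
EOF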

Workflow Summary

1. WebFetch attempt (fast)
   |
2. Validate result (is content sufficient?)
   | Insufficient
3. Ask user (use Playwright?)
   | Yes
4. Playwright retry
   ├─ MCP Playwright (recommended) or
   └─ Node Playwright (fallback)
   |
5. Convert to Markdown and save
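
The fallback flow above can be sketched as plain control logic. Everything here is illustrative: `fetch`, `render`, and `confirm` are caller-supplied stand-ins for WebFetch, Playwright, and the user prompt, and the 50-word threshold is an assumed heuristic rather than a documented value.

```python
import re

MIN_WORDS = 50  # assumed threshold; tune per site


def is_content_sufficient(markdown: str, min_words: int = MIN_WORDS) -> bool:
    """Heuristic check for the symptoms of a JavaScript-rendered page."""
    # Drop "Loading..." placeholders before counting what is left.
    text = re.sub(r"\bloading\b(\.{3}|…)?", "", markdown, flags=re.IGNORECASE)
    return len(text.split()) >= min_words


def convert(url, fetch, render, confirm):
    """Try the fast fetch first; fall back to a browser render on consent.

    fetch, render, and confirm are hypothetical callables standing in for
    WebFetch, Playwright, and the question put to the user.
    """
    markdown = fetch(url)
    if is_content_sufficient(markdown):
        return markdown
    if confirm("Content looks incomplete. Retry with Playwright?"):
        return render(url)
    # The user declined: return whatever the fast path produced.
    return markdown
```

Passing the three steps in as callables keeps the decision logic testable without a network connection or a browser.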

MCP Playwright vs Node Playwright

Feature	MCP Playwright	Node Playwright
Install	MCP server setup	`npm install playwright`
Invocation	MCP tool call	Bash command
Session mgmt	Automatic	Manual (script required)
Error handling	Simple	Complex
Claude Code integration	Native	Indirect
Recommendation	Highly recommended	General-purpose fallback

Error Handling

Error	Message
Invalid URL	"Invalid URL. Please enter a complete URL starting with http:// or https://."
Inaccessible page	"Cannot access the web page. The page may be deleted, require access permissions, or have a network error."
File save error	"Cannot save file. Please verify the path is correct, you have write permissions, and the directory exists."
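
As an illustration of how the file save error might surface, the sketch below writes the Markdown and maps any filesystem failure to the message from the table. The helper name `save_markdown` and the choice to return the message instead of raising are assumptions.

```python
from pathlib import Path
from typing import Optional

SAVE_ERROR_MESSAGE = (
    "Cannot save file. Please verify the path is correct, "
    "you have write permissions, and the directory exists."
)


def save_markdown(path: Path, markdown: str) -> Optional[str]:
    """Write the file; return None on success or the user-facing error message."""
    try:
        path.write_text(markdown, encoding="utf-8")
    except OSError:
        # Covers a missing directory, a permission problem, or a bad path.
        return SAVE_ERROR_MESSAGE
    return None
```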

Best Practices

Use descriptive filenames that reflect the content
Organize folders by topic when batch converting
Verify URLs before conversion
Respect copyright of web page content
Use converted content primarily for personal archival and reference

Tips

Long documents: Very long web pages may be summarized rather than converted in full
Dynamic content: JavaScript-rendered content can be handled via Playwright
Images: Included as original URL links (not downloaded)
Repeated requests: Same URL within 15 minutes uses cached version
MCP Playwright: Recommended for sites with heavy dynamic content
