WXR to CSV Converter - WordPress Data Migration
A Python utility that converts WordPress eXtended RSS (WXR) export files to CSV format, making WordPress data easily accessible for analysis, migration, and integration with other systems.
Overview
The WXR to CSV Converter is a Python script that reads a WordPress export file and writes its posts, pages, and custom post types to a structured CSV file that external tools and platforms can read for analysis, migration, or integration.
Key Features
Comprehensive Data Conversion
- Complete Export Handling: Converts posts, pages, and custom post types from WordPress exports
- Metadata Preservation: Maintains categories, tags, custom fields, and all WordPress metadata
- Content Processing: Properly handles HTML content, special characters, and formatting
- Flexible Output: Comprehensive CSV format with all essential WordPress data points
- Custom Post Types: Support for any custom post type defined in WordPress
Command-Line Interface
- Simple Usage: Easy command-line operation perfect for automation and scripting
- Flexible Options: Specify output files, filter by post types, and customize processing
- No Dependencies: Uses only Python standard library for maximum compatibility
- Cross-Platform: Works seamlessly on Windows, macOS, and Linux systems
- Batch Processing: Script-friendly, so multiple exports can be converted one after another
Data Integrity
- UTF-8 Encoding: Proper handling of international characters and special symbols
- HTML Preservation: Maintains content formatting and structure during conversion
- Complete Metadata: Exports all WordPress post and page metadata without loss
- Error Handling: Robust error handling with informative messages
- Data Validation: Ensures data integrity throughout the conversion process
Technology Stack
- Language: Python 3.6+ (uses standard library only)
- Format Support: XML parsing with CSV output
- Dependencies: None (fully self-contained)
- Compatibility: Works with all WordPress versions that support WXR export
Installation & Setup
Requirements
- Python 3.6 or higher
- No additional packages needed (uses Python standard library only)
Installation Steps
- Clone or download the repository:
  git clone https://github.com/LunarBit-dev/WXR-to-CSV.git
  cd WXR-to-CSV
- Verify Python installation:
  python --version
  # or
  python3 --version
- Test the script:
  python wxr_to_csv.py --help
Usage
Basic Usage
Convert a WordPress export file to CSV:
python wxr_to_csv.py your_wordpress_export.xml
Advanced Usage
Specify Output File
python wxr_to_csv.py your_wordpress_export.xml -o output.csv
Filter by Post Types
# Include only posts and pages
python wxr_to_csv.py export.xml -t post page
# Include custom post types
python wxr_to_csv.py export.xml -t post page product testimonial
Auto-run Script
# Interactive guided conversion
python autorun.py
Python Script Usage
from wxr_to_csv import WXRToCSVConverter
# Create converter instance
converter = WXRToCSVConverter()
# Convert specific post types
converter.convert_to_csv('export.xml', 'output.csv', ['post', 'page'])
Command Line Options
usage: wxr_to_csv.py [-h] [-o OUTPUT] [-t TYPES [TYPES ...]] input_file

Convert WordPress eXtended RSS (WXR) files to CSV format

positional arguments:
  input_file            Path to the WXR file to convert

optional arguments:
  -h, --help            Show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output CSV file path (default: same name as input with .csv extension)
  -t TYPES [TYPES ...], --types TYPES [TYPES ...]
                        Post types to include (default: post page)
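For readers who want to wrap or extend the command line, here is a minimal sketch of an argparse parser that reproduces the usage text above. It is an illustration built from the documented options only, not the script's actual source; build_parser and resolve_output are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that mirrors the documented usage text (a sketch, not the script's source)."""
    parser = argparse.ArgumentParser(
        prog="wxr_to_csv.py",
        description="Convert WordPress eXtended RSS (WXR) files to CSV format",
    )
    parser.add_argument("input_file", help="Path to the WXR file to convert")
    parser.add_argument("-o", "--output", help="Output CSV file path")
    parser.add_argument(
        "-t", "--types", nargs="+", default=["post", "page"],
        help="Post types to include (default: post page)",
    )
    return parser


def resolve_output(args):
    """Apply the documented default: same name as the input, with a .csv extension."""
    return args.output or str(Path(args.input_file).with_suffix(".csv"))


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["export.xml", "-t", "post", "product"])
print(args.types, resolve_output(args))
```

Because -t accepts one or more values, a parser declared this way needs the input file to come before -t (as in every example in this guide); otherwise the file name would be read as another post type.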
CSV Output Format
The generated CSV file includes comprehensive WordPress data with the following columns:
Basic Information
- post_id: WordPress post ID
- title: Post/page title
- post_type: Type of content (post, page, custom types)
- status: Publication status (publish, draft, private, etc.)
- creator: Author username
- link: Post URL/permalink
Content Data
- description: Post excerpt/description
- content: Full post content (HTML preserved)
- excerpt: Post excerpt
- post_name: URL slug/post name
Dates & Timestamps
- post_date: Publication date
- post_modified: Last modification date
- pub_date: RSS publication date
- post_date_gmt: Publication date (GMT)
- post_modified_gmt: Modification date (GMT)
Taxonomy & Classification
- categories: Categories (semicolon-separated)
- tags: Tags (semicolon-separated)
Settings & Configuration
- comment_status: Comment settings (open, closed)
- ping_status: Pingback/trackback settings
- post_parent: Parent post ID (for hierarchical content)
- menu_order: Menu order
- is_sticky: Sticky post flag
- post_password: Password protection
Extended Data
- custom_fields: Custom field data (JSON format)
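The semicolon-separated and JSON columns need one extra decoding step when the CSV is read back. The sketch below shows one way to do that with the standard library. The sample rows are made up, and it assumes that custom_fields holds a JSON object and that an empty cell means no custom fields; check both against your own output.

```python
import csv
import io
import json


def make_sample():
    """Build a tiny in-memory CSV with a subset of the documented columns (hypothetical data)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["post_id", "title", "categories", "tags", "custom_fields"])
    writer.writerow(["12", "Hello world", "News;Updates", "intro", json.dumps({"_thumbnail_id": "34"})])
    writer.writerow(["15", "About", "", "", ""])
    buffer.seek(0)
    return buffer


def split_terms(value):
    """Split a semicolon-separated cell into a list, ignoring empty entries."""
    return [term.strip() for term in value.split(";") if term.strip()]


def load_rows(handle):
    """Read converter output and decode the list-like and JSON columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["categories"] = split_terms(row["categories"])
        row["tags"] = split_terms(row["tags"])
        # Assumption: an empty cell means "no custom fields".
        row["custom_fields"] = json.loads(row["custom_fields"]) if row["custom_fields"] else {}
        rows.append(row)
    return rows


for row in load_rows(make_sample()):
    print(row["post_id"], row["categories"], row["custom_fields"])
```

For a real file, pass open("output.csv", newline="", encoding="utf-8") to load_rows instead of the in-memory sample.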
Getting WordPress Export Files
Export Process
- Log into WordPress Admin: Access your WordPress dashboard
- Navigate to Tools: Go to Tools → Export
- Select Content: Choose All content or select specific content types:
  - Posts
  - Pages
  - Media (attachments)
  - Custom post types
- Download: Click Download Export File
- Save File: Save the .xml file to your computer
Export Options
- All Content: Exports everything (recommended for migration)
- Posts: Only blog posts
- Pages: Only static pages
- Media: Only attachments and media files
- Specific Authors: Filter by author
- Date Range: Filter by publication date
Use Cases
Data Migration
- Platform Migration: Moving from WordPress to other CMS platforms
- Database Migration: Transferring content to different database systems
- System Integration: Importing WordPress data into custom applications
- Backup Processing: Converting WordPress backups for external storage
Content Analysis
- Content Audit: Analyzing website content structure and metadata
- SEO Analysis: Examining post titles, descriptions, and taxonomy
- Author Analysis: Reviewing content creation patterns by author
- Performance Review: Analyzing content publication and modification patterns
Business Intelligence
- Content Metrics: Analyzing content production and publication trends
- Category Analysis: Understanding content organization and classification
- Timeline Analysis: Tracking content creation and modification over time
- Custom Field Analysis: Extracting and analyzing custom metadata
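As a small illustration of the analysis use cases above, the following sketch counts items per month and per author using only the standard library. The sample data is invented, and it assumes post_date keeps the YYYY-MM-DD HH:MM:SS form that WordPress writes into its exports; summarize is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Invented sample using three of the documented columns.
SAMPLE = """creator,post_type,post_date
alice,post,2023-01-15 10:30:00
bob,post,2023-01-20 08:00:00
alice,page,2023-02-01 12:00:00
"""


def summarize(handle):
    """Count rows per publication month and per author."""
    by_month, by_author = Counter(), Counter()
    for row in csv.DictReader(handle):
        # Assumption: post_date starts with "YYYY-MM", so the first 7 characters name the month.
        by_month[row["post_date"][:7]] += 1
        by_author[row["creator"]] += 1
    return by_month, by_author


by_month, by_author = summarize(io.StringIO(SAMPLE))
print(sorted(by_month.items()))
print(by_author.most_common())
```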
Development & Integration
- API Development: Creating data sources for custom APIs
- Third-party Integration: Preparing data for external systems
- Data Warehousing: Loading WordPress data into analytics platforms
- Backup Processing: Converting WordPress exports for data processing
Conversion Examples
Basic Conversion
# Convert all posts and pages
python wxr_to_csv.py wordpress_export.xml
# Output: wordpress_export.csv
Selective Conversion
# Convert only blog posts
python wxr_to_csv.py wordpress_export.xml -t post
# Output: wordpress_export.csv (posts only)
# Convert custom post types
python wxr_to_csv.py wordpress_export.xml -t product testimonial event
# Output: wordpress_export.csv (custom types only)
Custom Output
# Specify custom output filename
python wxr_to_csv.py wordpress_export.xml -o my_wordpress_data.csv
# Output: my_wordpress_data.csv
# Full workflow example
python wxr_to_csv.py site_backup.xml -o site_data.csv -t post page product
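To convert a whole folder of exports with the same options, the commands above can be generated in a loop. The sketch below only uses the documented -o and -t flags. build_commands and run_all are hypothetical helpers, and run_all assumes the script exits with a non-zero status on failure, which this guide does not confirm.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_commands(export_dir, out_dir, types=("post", "page"), script="wxr_to_csv.py"):
    """Return one command line per .xml file in export_dir, in name order."""
    commands = []
    for xml_path in sorted(Path(export_dir).glob("*.xml")):
        csv_path = Path(out_dir) / (xml_path.stem + ".csv")
        commands.append(
            [sys.executable, script, str(xml_path), "-o", str(csv_path), "-t", *types]
        )
    return commands


def run_all(commands):
    """Run each command in turn and return the ones that failed."""
    failed = []
    for cmd in commands:
        # Assumption: a non-zero exit status signals a failed conversion.
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed


# Dry run against empty placeholder files: print the commands without executing them.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("blog.xml", "shop.xml"):
        (Path(tmp) / name).touch()
    for cmd in build_commands(tmp, tmp, types=("post", "product")):
        print(" ".join(cmd[1:]))
```

Printing the commands first makes it easy to review them before passing the list to run_all.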
Troubleshooting
Common Issues
File Format Errors
- "Error parsing WXR file": The XML file may be corrupted or not a valid WXR export
- Solution: Re-export from WordPress or check file integrity
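Before re-exporting, it can help to find out whether the file is broken XML or simply not a WordPress export. The following sketch is a standalone check, separate from the converter; check_wxr is a hypothetical name, and the test for an rss root with a channel child reflects the usual shape of WXR files.

```python
import io
import xml.etree.ElementTree as ET


def check_wxr(source):
    """Return (ok, message) for a file path or binary file object."""
    try:
        root = ET.parse(source).getroot()
    except ET.ParseError as exc:
        line, column = exc.position
        return False, f"not well-formed XML (line {line}, column {column}): {exc}"
    if root.tag != "rss" or root.find("channel") is None:
        return False, "well-formed XML, but no <rss><channel> structure"
    return True, f"{len(root.findall('channel/item'))} <item> elements found"


# Three in-memory samples: a valid skeleton, a truncated file, and unrelated XML.
for sample in (
    b"<rss><channel><item/></channel></rss>",
    b"<rss><channel>",
    b"<html/>",
):
    print(check_wxr(io.BytesIO(sample)))
```

For a real export, call check_wxr("your_wordpress_export.xml"). A truncated download typically shows up as a parse error near the end of the file.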
No Data Found
- "No posts found": Check that specified post types exist in the export
- Solution: Check the post types passed to the --types parameter, or re-export with the content types you need
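To see which post types an export actually contains, the file can be scanned directly. The sketch below is independent of the converter and uses only the standard library; count_post_types is a hypothetical helper. It matches the post_type element by local name so that the namespace version of the export does not matter.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Minimal made-up export with three items.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
<item><title>Hello</title><wp:post_type>post</wp:post_type></item>
<item><title>About</title><wp:post_type>page</wp:post_type></item>
<item><title>Mug</title><wp:post_type><![CDATA[product]]></wp:post_type></item>
</channel>
</rss>"""


def count_post_types(source):
    """Stream through a WXR file (path or binary file object) and count items per post type."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Compare the local name only, so any export namespace version is accepted.
        if elem.tag.endswith("}post_type"):
            counts[(elem.text or "").strip()] += 1
        elif elem.tag == "item":
            elem.clear()  # drop the finished item's children to limit memory use
    return counts


for post_type, count in sorted(count_post_types(io.BytesIO(SAMPLE)).items()):
    print(post_type, count)
```

Any name printed here can be passed to -t. Names that do not appear are not in the export.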
Encoding Issues
- Character display problems: Usually related to encoding
- Solution: The script handles UTF-8 by default; check source file encoding
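One quick way to check the source file's encoding is to try decoding it as UTF-8 and note where it first fails. This is a minimal sketch; first_invalid_utf8 is a hypothetical helper and is not part of the converter.

```python
def first_invalid_utf8(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the data is clean."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# The same word encoded two ways: valid UTF-8, then Latin-1 bytes that are not valid UTF-8.
print(first_invalid_utf8("café".encode("utf-8")))
print(first_invalid_utf8("café".encode("latin-1")))
```

For a real export, pass the file's bytes, for example pathlib.Path("export.xml").read_bytes(). An offset in the result points at the first byte that is not valid UTF-8.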
Memory Issues
- Large file processing: Very large WordPress exports may consume significant memory
- Solution: Filter by post type to reduce dataset size or process in smaller chunks
Performance Optimization
Large Files
- Memory Usage: The script loads the entire XML file into memory
- Filtering: Use post type filtering to reduce the amount of converted data; the XML file itself is still loaded in full
- Batch Processing: Consider splitting large exports into smaller files
Processing Speed
- File Size: Larger exports take longer to process
- Content Complexity: Posts with extensive custom fields or metadata require more processing time
- Hardware: Faster CPU and more RAM improve processing speed
Validation Tips
- Test with Small Export: Start with a small export to verify format
- Check Column Headers: Ensure CSV headers match expected format
- Verify Data Integrity: Spot-check converted data against original posts
- Custom Fields: Pay special attention to custom field conversion
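The header check in the tips above can be automated. The sketch below compares a CSV header against the columns listed under CSV Output Format; check_header is a hypothetical helper, and the comparison ignores column order because this guide does not specify it.

```python
import csv
import io

# Column names as listed in the CSV Output Format section of this guide.
EXPECTED_COLUMNS = [
    "post_id", "title", "post_type", "status", "creator", "link",
    "description", "content", "excerpt", "post_name",
    "post_date", "post_modified", "pub_date", "post_date_gmt", "post_modified_gmt",
    "categories", "tags",
    "comment_status", "ping_status", "post_parent", "menu_order", "is_sticky",
    "post_password", "custom_fields",
]


def check_header(handle):
    """Return (missing, unexpected) column names for the first row of a CSV file."""
    header = next(csv.reader(handle), [])
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    unexpected = [name for name in header if name not in EXPECTED_COLUMNS]
    return missing, unexpected


# Made-up header with three known columns and one unknown column.
sample = io.StringIO("post_id,title,content,seo_score\n1,Hello,<p>Hi</p>,90\n")
missing, unexpected = check_header(sample)
print("missing:", len(missing), "unexpected:", unexpected)
```

For a real file, pass open("output.csv", newline="", encoding="utf-8") to check_header. Two empty lists mean the header matches the documented columns.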
Advanced Features
Custom Field Handling
- JSON Format: Custom fields are exported in JSON format for easy parsing
- Nested Data: Supports complex custom field structures
- Metadata Preservation: All WordPress metadata is retained
Error Recovery
- Graceful Handling: Script continues processing even if individual posts have issues
- Error Reporting: Clear error messages for troubleshooting
- Partial Success: Completes processing even with some failed entries
Data Validation
- Format Checking: Validates WXR format before processing
- Content Verification: Ensures data integrity during conversion
- Output Validation: Verifies CSV format and structure
Contributing
We welcome contributions to improve the WXR to CSV Converter!
How to Contribute
- Fork the repository
- Create a feature branch:
  git checkout -b feature/your-feature-name
- Make your changes
- Test thoroughly with various WordPress exports
- Commit your changes:
  git commit -am 'Add some feature'
- Push to the branch:
  git push origin feature/your-feature-name
- Submit a pull request
Development Areas
- GUI Interface: Create a user-friendly graphical interface
- Additional Output Formats: Support for JSON, XML, or database formats
- Advanced Filtering: More sophisticated content filtering options
- Performance Optimization: Improve processing speed for large files
- Error Handling: Enhanced error recovery and reporting
Roadmap
Planned Features
- Graphical Interface: User-friendly GUI for non-technical users
- Multiple Output Formats: JSON, XML, SQL, and other format support
- Advanced Filtering: Complex filtering rules and conditions
- Batch Processing: Convert multiple WXR files simultaneously
- Database Export: Direct export to various database systems
- Progress Tracking: Progress bars and status updates for large files
- Data Validation: Enhanced validation and error checking
- Configuration Files: Save and reuse conversion settings
Recent Updates
- Improved Error Handling: Better error messages and recovery
- Enhanced Encoding: Better support for international characters
- Performance Optimization: Faster processing for large exports
- Documentation: Comprehensive usage guide and examples
Support
Getting Help
- GitHub Issues: Report bugs or request features on our GitHub repository
- Documentation: This guide covers most common use cases
- Community Support: Connect with other users for help and tips
Reporting Issues
When reporting issues, please include:
- Python version
- Operating system
- WordPress version (source of export)
- Error messages (if any)
- Sample data (if possible)
License
This project is open source and available under the MIT License. You're free to use, modify, and distribute it according to the terms of the license.
WXR to CSV Converter - Bridging the gap between WordPress and your data analysis tools.