User Guide
Welcome to Page2Table AI! This guide will help you get started quickly and make the most of all the extension's features.
Getting Started
Install the Extension
- Download and install the Page2Table AI extension from the Chrome Web Store.
- After installation, click the extension icon in the browser toolbar.
- On first use, you'll need to log in with your Google account (this is required to use the AI analysis service).
Note: The extension requires access to webpage content to extract data. These permissions are only used when you actively click the "Convert Current Page" button.
First Extraction
- Open the webpage you want to extract data from (e.g., a product listing or an article list).
- Click the extension icon to open the sidebar workbench.
- Click the "⚡️ Convert Current Page" button in the top-left corner.
- Wait for AI analysis to complete (first analysis may take a few seconds).
- Data will be displayed in table format in the workbench.
Basic Usage
Convert Current Page
This is the core feature of the extension. The AI automatically analyzes the page structure, identifies structured data such as lists and tables, and generates an extraction schema.
💡 Tip: If the page contains multiple logical data areas (such as "Product List" and "Filter Conditions"), they will be automatically separated into different worksheets, which you can switch between using the worksheet tabs at the bottom.
View Extracted Data
- Data is displayed in table format with support for sorting and filtering.
- If data contains links, a ⚡️ icon will appear next to the link, indicating that you can drill down to extract detail page data.
- Each worksheet has its own tab for easy management of different types of data.
Export Data
Click the "🗂️ Local Storage" button in the top-right corner to open the file manager:
- Single file: Click the "Download" button to export it in .xlsx format.
- Related files: Click "Download" on a parent file to package it and all of its child files into a .zip archive.
- Exported Excel files include relationship ID fields for easy data relationship reconstruction in external tools.
Advanced Features
🔗 Drill Down to Detail Pages
In the extracted table, click a cell containing a link or the ⚡️ icon next to it to automatically extract detail page data.
- After clicking the link, AI will automatically analyze the detail page using the default extraction schema.
- On first drill-down, AI will learn and cache the extraction schema for this type of page.
- Subsequent drill-downs to similar pages will directly reuse the cached schema for instant extraction.
- After extraction is complete, detail page data will open as a new child file in the workbench, and the icon in the original table will change to ✅.
🔄 Smart Pagination Collection
If a list worksheet is identified as having pagination functionality, pagination controls will appear below the table.
- The pagination control shows the pagination mode identified by the AI (e.g., "Click Next Page" or "Infinite Scroll").
- Set the maximum number of pages (or scroll times) you want to capture in the input box.
- Click the "Start Pagination" button, and the extension will automatically simulate clicks or scrolling.
- Newly captured data is deduplicated and appended to the end of the current table. You can click the "Stop" button at any time to interrupt.
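The guide does not say how duplicates are detected. The Python sketch below shows one common approach under stated assumptions: a row counts as a duplicate only when every field matches a row already in the table, `fetch_next` is a hypothetical stand-in for the simulated click or scroll step, and the page limit counts the page you already extracted. It illustrates the idea and is not the extension's actual code.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

def collect_pages(
    first_page: List[Row],
    fetch_next: Callable[[int], List[Row]],  # hypothetical: rows of page n, [] when no more pages
    max_pages: int,
    should_stop: Callable[[], bool] = lambda: False,  # hypothetical "Stop" button check
) -> List[Row]:
    """Append rows from later pages to the table, skipping exact duplicates."""
    table = list(first_page)
    # A row's identity is the sorted tuple of its (field, value) pairs.
    seen = {tuple(sorted(row.items())) for row in table}
    # Page 1 is the already-extracted page, so capture pages 2..max_pages.
    for page_no in range(2, max_pages + 1):
        if should_stop():
            break
        new_rows = fetch_next(page_no)
        if not new_rows:
            break  # nothing more to load
        for row in new_rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                table.append(row)
    return table

pages = {2: [{"name": "B"}, {"name": "C"}], 3: [{"name": "C"}, {"name": "D"}]}
rows = collect_pages([{"name": "A"}, {"name": "B"}], lambda n: pages.get(n, []), max_pages=3)
print([r["name"] for r in rows])  # ['A', 'B', 'C', 'D']
```

Checking the stop flag before each page mirrors the guide's note that you can interrupt at any time; rows captured before the stop are kept.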
⚡ Batch Drill-Down
If a column contains multiple drill-down links, a ⚡️ batch drill-down button will appear in the column header.
- Click the batch drill-down button. If there's no cached schema for this type of page, the program will prompt you to enter extraction requirements for the detail page (you can use the default prompt).
- After confirmation, the program will automatically access all unextracted links in that column in the background.
- Icons in corresponding rows of the table will update in real-time (loading 🔄 → completed ✅).
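As a rough illustration of "access all unextracted links", the Python sketch below walks a link column, skips rows already marked as done, and reports the loading and completed states for each row it processes. `batch_drill_down`, `extract_detail`, and `on_update` are hypothetical names, not the extension's API, and links are handled one at a time for clarity; the extension may schedule them differently.

```python
from typing import Callable, Dict, List

def batch_drill_down(
    rows: List[Dict[str, str]],
    link_column: str,
    status: Dict[int, str],  # row index -> "done" for rows already drilled down
    extract_detail: Callable[[str], Dict[str, str]],  # hypothetical detail-page extractor
    on_update: Callable[[int, str], None] = lambda i, s: None,  # hypothetical UI callback
) -> Dict[int, Dict[str, str]]:
    """Extract detail data for every row whose link has not been extracted yet."""
    results: Dict[int, Dict[str, str]] = {}
    for i, row in enumerate(rows):
        url = row.get(link_column)
        if not url or status.get(i) == "done":
            continue  # empty cell or already extracted (✅)
        on_update(i, "loading")  # corresponds to 🔄
        results[i] = extract_detail(url)
        status[i] = "done"
        on_update(i, "done")  # corresponds to ✅
    return results
```

Keeping the per-row status outside the function means a second batch run naturally picks up only the rows that are still unextracted.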
📦 Batch Operations
After batch drill-down is complete, a batch operations panel will appear. If the newly generated child pages also contain paginated lists, options will be provided to perform batch pagination on all these child pages.
💡 Tip: Batch operations multiply the amount of data you can collect. For example, first batch drill down into 100 product detail pages, then run batch pagination on each of those detail pages to quickly collect large amounts of data.
🧠 Schema Cache & Reuse
For similar pages on the same website (such as multiple product detail pages), Page2Table AI only needs to analyze once. Subsequent extractions will automatically reuse cached extraction schemas, greatly improving speed and saving AI call costs.
Note: Cached extraction schemas only contain data structure information (such as field names, selectors, etc.) and do not contain any original webpage content, ensuring data privacy and security.
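To make the reuse idea concrete, here is a minimal Python sketch of a schema cache. It assumes, purely for illustration, that "similar pages" means URLs on the same host whose paths differ only in numeric segments; `page_type_key`, `SchemaCache`, and the `analyze` callback are hypothetical names, and the real extension may group pages differently. Only the schema returned by the analysis is stored, never the HTML, which mirrors the privacy note above.

```python
import re
from typing import Any, Callable, Dict
from urllib.parse import urlparse

def page_type_key(url: str) -> str:
    """Hypothetical grouping: replace purely numeric path segments with {id}."""
    parts = urlparse(url)
    path = re.sub(r"/\d+(?=/|$)", "/{id}", parts.path)
    return f"{parts.netloc}{path}"

class SchemaCache:
    def __init__(self, analyze: Callable[[str], Dict[str, Any]]):
        self._analyze = analyze  # hypothetical AI analysis call: HTML -> schema
        self._schemas: Dict[str, Dict[str, Any]] = {}  # page-type key -> schema only
        self.ai_calls = 0

    def get_schema(self, url: str, html: str, force: bool = False) -> Dict[str, Any]:
        """Return the cached schema for this page type, analyzing only on a miss or when forced."""
        key = page_type_key(url)
        if force or key not in self._schemas:
            self.ai_calls += 1
            self._schemas[key] = self._analyze(html)  # the HTML is not retained
        return self._schemas[key]

cache = SchemaCache(lambda html: {"fields": ["title", "price"]})
cache.get_schema("https://shop.example.com/product/123", "<html>...</html>")
cache.get_schema("https://shop.example.com/product/456", "<html>...</html>")
print(cache.ai_calls)  # 1
```

The `force` flag plays the role of a "Force Reanalyze" action: it bypasses the cache and overwrites the stored schema.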
Data Management
Local Storage
All extracted data, file relationships, and extraction schemas are securely stored in your local browser:
- Data is stored entirely on your device, and we cannot access it.
- Data will not be lost after closing the extension.
File Manager
Click the "🗂️ Local Storage" button in the top-right corner to:
- View all saved extraction tasks, displayed in a parent-child tree structure.
- Click the \"View\" button to reopen the file in the workbench.
- Download single files or related file packages.
- Delete files and all their child files.
FAQ
Q: Why do I need to log in with a Google account?
A: Google account is used for user authentication and AI analysis service call management. We only collect your email address and do not obtain other sensitive information.
Q: Will extracted data be sent to the server?
A: No. All extracted data is stored locally in your browser. Only HTML content used to generate extraction schemas is temporarily sent to the server for AI analysis and deleted immediately after analysis is complete.
Q: How can I improve extraction accuracy?
A: Make sure the page is fully loaded before extraction. For dynamically loaded content, wait for all content to finish loading. If the extraction results are not ideal, try clicking the "Force Reanalyze" button to let the AI reanalyze the page structure.
Q: What types of webpages are supported?
A: Page2Table AI supports extracting any webpage containing structured data, including product lists, article lists, search results, table data, etc. For dynamically loaded content, it's recommended to wait for the page to fully load before extraction.
Q: Can I extract webpages that require login?
A: Yes. As long as you are logged in and can normally access the webpage in your browser, the extension can extract data from it.
Q: What is the relationship ID in exported Excel files?
A: Relationship IDs are used to identify parent-child relationships between data. The parent_file_sheet_row_id field in child file data corresponds to the unique ID (file_sheet_row_id) of a row in the parent file, making it easy to reconstruct data relationships in databases or BI tools.
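The sketch below shows that reconstruction in Python: each child row is matched to its parent through `parent_file_sheet_row_id` and `file_sheet_row_id`, the same inner join you would write in SQL or a BI tool. The rows are inlined as illustrative dictionaries; reading them from the exported .xlsx files would need a library such as openpyxl or pandas, which is left out to keep the example self-contained. Parent columns are prefixed with `parent_` (an arbitrary choice) so they cannot overwrite child columns of the same name.

```python
from typing import Dict, List

Row = Dict[str, str]

# Illustrative rows; only the two ID field names come from the exported files.
parent_rows: List[Row] = [
    {"file_sheet_row_id": "p1", "product": "Lamp"},
    {"file_sheet_row_id": "p2", "product": "Desk"},
]
child_rows: List[Row] = [
    {"file_sheet_row_id": "c1", "parent_file_sheet_row_id": "p1", "price": "19.99"},
    {"file_sheet_row_id": "c2", "parent_file_sheet_row_id": "p2", "price": "149.00"},
]

def join_parent_child(parents: List[Row], children: List[Row]) -> List[Row]:
    """Inner join: attach each child row to the parent row it was drilled down from."""
    by_id = {p["file_sheet_row_id"]: p for p in parents}
    joined: List[Row] = []
    for child in children:
        parent = by_id.get(child["parent_file_sheet_row_id"])
        if parent is None:
            continue  # orphaned child row with no matching parent
        merged = dict(child)
        for key, value in parent.items():
            merged[f"parent_{key}"] = value  # prefix avoids clobbering child fields
        joined.append(merged)
    return joined

for row in join_parent_child(parent_rows, child_rows):
    print(row["parent_product"], row["price"])
# Lamp 19.99
# Desk 149.00
```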
Troubleshooting
Analysis Failed
- Ensure you are logged in with your Google account.
- Check if your network connection is normal.
- If the page content is too complex, try clicking the "Force Reanalyze" button.
Incomplete Data
- Ensure the page is fully loaded (wait for dynamic content to finish loading).
- If the page uses infinite scroll, you can use the pagination feature to collect more data.
- For complex page structures, try clicking "Force Reanalyze" to have the AI analyze the page again.
Extension Not Working
- Ensure Chrome browser version is 88 or higher.
- Try refreshing the page and operating again.
- Check that the extension is enabled (view it at chrome://extensions/).
- If the problem persists, please contact us at support@page2table.com.
⚠️ Important Note: If you encounter any issues, first try refreshing the page or reopening the extension. If the problem persists, please contact us via email and we will resolve it as soon as possible.