I'm seeking an experienced developer to create a robust web scraping tool that extracts detailed product information from Walmart and Target category or storefront URLs. The tool must effectively navigate anti-bot measures to prevent IP blocking and ensure reliable data retrieval.
Project Overview:
The scraper should accept a Walmart or Target category/storefront URL and output structured data (CSV or JSON) containing the following fields:
- Product URL
- Brand
- Title
- Price
- Rating
- Total Number of Reviews
- Inventory Count (for Walmart, this data is available in the page source)
- Third-Party Seller Name and Inventory Count (if applicable)
- Indicator of Whether Walmart Is the Seller (Yes/No)
- UPC (if available)
- Walmart Product ID
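One way to pin down this output is a single record type that both the CSV and JSON writers share. The sketch below is only an illustration, not a required format. The field names (`product_url`, `sold_by_walmart`, etc.), their types, and the helpers `to_csv`/`to_json` are hypothetical. Missing values are left blank in CSV and `null` in JSON, and the seller indicator is rendered as Yes/No in CSV as requested above.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict, fields
from typing import Optional

# Hypothetical record layout: one attribute per field in the list above.
# Names and types are illustrative assumptions, not a fixed specification.
@dataclass
class ProductRecord:
    product_url: str
    brand: Optional[str] = None
    title: Optional[str] = None
    price: Optional[float] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    inventory_count: Optional[int] = None
    seller_name: Optional[str] = None          # third-party seller, if any
    seller_inventory_count: Optional[int] = None
    sold_by_walmart: Optional[bool] = None     # rendered as Yes/No in CSV
    upc: Optional[str] = None
    walmart_product_id: Optional[str] = None

def _csv_value(value):
    # Booleans become Yes/No; missing values become empty cells.
    if isinstance(value, bool):
        return "Yes" if value else "No"
    return "" if value is None else value

def to_csv(records):
    """Serialize records to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ProductRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow({k: _csv_value(v) for k, v in asdict(rec).items()})
    return buf.getvalue()

def to_json(records):
    """Serialize records to a JSON array; missing values become null."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

For example, `to_csv([ProductRecord(product_url="https://www.walmart.com/ip/example", brand="Acme", price=9.99, sold_by_walmart=True)])` produces a 12-column header plus one row with `Yes` in the `sold_by_walmart` column. Keeping one dataclass as the source of truth means both output formats always carry the same fields.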
Anti-Bot and Anti-Blocking Requirements:
To ensure uninterrupted scraping, the tool should incorporate the following measures:
- IP Rotation: Implement rotating residential proxies to distribute requests and avoid IP bans.
- User-Agent Management: Use realistic and rotating user-agent strings to mimic genuine browser behavior.
- Request Throttling: Introduce randomized delays between requests to simulate human browsing patterns.
- Headless Browser Support: Utilize headless browsers like Puppeteer or Playwright to render JavaScript-heavy pages accurately.
- CAPTCHA Handling: Detect and solve CAPTCHAs as needed to maintain scraping continuity.
- Monitoring and Adaptation: Continuously monitor scraper performance and adapt to changes in the sites' page structures or anti-scraping measures.
If you're interested in this project, please comment below or send me a direct message with the subject line "Walmart & Target Scraper Project." In your response, include:
- Your bid for the project
- Estimated timeline for completion
- Examples of previous similar work
- Preferred programming language and frameworks
- Any questions or clarifications you need to provide an accurate quote
Deliverables:
- A functional scraper tool (command-line interface or simple GUI) that meets the specifications outlined above
- A well-documented codebase for future maintenance and updates
I look forward to collaborating with you.
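For the command-line option, the entry point mainly needs to accept one category/storefront URL, confirm it belongs to Walmart or Target, and pick an output format. The sketch below shows only that front end, using the standard `argparse` and `urllib.parse` modules. The names `detect_retailer`, `build_parser`, and `main`, the `--format`/`--output` flags, and the returned tuple are all hypothetical. Fetching and parsing pages are deliberately left out.

```python
import argparse
from urllib.parse import urlparse

# Hypothetical mapping of accepted domains to retailer names.
SUPPORTED_HOSTS = {"walmart.com": "walmart", "target.com": "target"}

def detect_retailer(url: str) -> str:
    """Return 'walmart' or 'target' for a supported URL, else raise ValueError."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in SUPPORTED_HOSTS.items():
        # Match the bare domain or any subdomain (e.g. www.walmart.com),
        # but not look-alikes such as notwalmart.com.
        if host == domain or host.endswith("." + domain):
            return name
    raise ValueError(f"Unsupported URL (expected Walmart or Target): {url}")

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Walmart/Target category or storefront scraper (sketch)")
    parser.add_argument("url", help="Walmart or Target category/storefront URL")
    parser.add_argument("--format", choices=["csv", "json"], default="csv",
                        help="output format")
    parser.add_argument("-o", "--output", default="-",
                        help="output file path, or '-' for stdout")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    retailer = detect_retailer(args.url)
    # Placeholder: a real tool would dispatch to a per-retailer scraper here
    # and write the results in the chosen format.
    return retailer, args.format, args.output
```

Called as `main(["https://www.target.com/c/example", "--format", "json"])`, this returns `("target", "json", "-")`. In a packaged tool, `main()` would normally be invoked under an `if __name__ == "__main__":` guard. Validating the host up front lets the tool fail fast on unsupported URLs before any network work starts.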