Hi,
Web Data Engineer | Web Scraping & Data Extraction
I build Python systems that automatically extract data from websites, PDFs, and APIs, clean it, and deliver it in the format you need (CSV, Excel, or Google Sheets).
I focus on scraping dynamic sites, handling JavaScript-rendered content, pagination, and anti-bot challenges.
My solutions reduce manual effort by 70–80% and deliver consistent, reliable data, so teams can focus on insights instead of data wrangling.
Let’s connect and streamline your data needs!
About Me
I’m a Web Data Engineer focused on building reliable extraction systems that automatically pull data from websites, PDFs, and APIs, then clean and structure it into usable formats such as CSV, Excel, or Google Sheets.
Most of my work involves Python and Playwright to solve scraping challenges on dynamic websites by handling JavaScript-rendered content, pagination, session states, and access restrictions. Where possible, I use direct API integration to reduce overhead, and PDF parsing and OCR tools to process large volumes of complex PDF data.
I design systems that reduce manual data handling by 70–80% while improving consistency and reliability, enabling immediate business impact.
My approach is shaped by my background in mathematics and years of teaching, which trained me to break down problems and choose the most efficient solution, whether through API access or parsing-based extraction.
The result is simple: cleaned, structured, reliable data delivered in the format you need.
If you’re working with data that’s difficult to collect or organise, I can help simplify it.
My Skills
My Technical Skills
Soft Skills
Featured Projects
Real-World Web Scraping — From E-Commerce to Property Markets
Built a reliable Python–Playwright scraping pipeline for NigeriaPropertyCentre.com with retry logic, configurable filters, robust error handling, and clean CSV export using Pandas. Designed for resilient multi-page extraction under unstable page loads.
End-to-end automated pipeline for tracking Galaxy A06 listings on Jumia. Powered by Playwright, playwright-stealth, and GitHub Actions with structured historical tracking and change detection using pandas. Enhanced with autogenerated user agents, human-like interaction simulation, and exponential backoff retry logic for improved resilience against bot detection and transient failures. This solution delivers continuous competitive pricing intelligence.
Extracted structured listing data from JS-rendered pages using Playwright with pagination control, DOM synchronization, and reliable CSV normalization.
Automated data extraction from a JavaScript-rendered e-commerce site with cookie handling, pagination logic, and reliable selector strategies.
Engineered an asyncio-powered Playwright pipeline to extract and normalize data for the top 250 TV series at scale, with resilient error handling and structured export.
Automated daily extraction of product data — price, stock, discounts, reviews, and ratings — from Jumia over 5 days, producing a validated historical dataset ready for trend analysis.
Data Projects
My Recent Projects
Project Summary: Designed and implemented a data extraction solution for NigeriaPropertyCentre.com using Playwright and Python. The system handles dynamic content rendering, pagination, and configurable search filters such as listing type, city, and bedroom count to extract structured property data including pricing, location, and key features (agent info, bedrooms, bathrooms, parking spaces, and square metre area). Built with a focus on reliability, the scraper incorporates retry logic for intermittent page failures, logging, dynamic user-agent rotation, and efficient error handling to ensure consistent data collection across multiple pages. Post-processing includes data cleaning, normalisation, and anonymisation of sensitive agent information, with output structured in CSV format. The result is an efficient, reliable, and repeatable extraction pipeline for NigeriaPropertyCentre.
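The retry logic for intermittent page failures can be sketched as a small wrapper; the function name `with_retries` and the attempt/delay defaults are illustrative, not the production code.

```python
import random
import time


def with_retries(func, max_attempts=4, base_delay=1.0):
    """Call func, retrying on failure with exponential backoff and jitter.

    A generic sketch of the retry pattern used for flaky page loads;
    the attempt count and delays are illustrative defaults.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == max_attempts:
                    raise
                # Exponential backoff: 1s, 2s, 4s, ... plus a little jitter
                delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
                time.sleep(delay)
    return wrapper
```

Wrapping the page-fetch function this way keeps the pagination loop clean: a transient timeout is retried automatically, and only a persistent failure surfaces to the caller.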
Project Summary: A fully automated price monitoring system that tracks Galaxy A06 listings on Jumia Nigeria. Runs daily via GitHub Actions, using Playwright with stealth enhancements for real-time extraction and pandas for historical tracking with price change detection. Automatically classifies each price movement (new, increased, decreased, no change) while preserving complete historical records. Built with anti-bot resilience techniques including playwright-stealth integration, autogenerated user agents to mask identity, human-like browsing behaviors to evade detection, and retry logic with exponential backoff to absorb transient failures. It also includes robust error handling, structured logging, and a controlled execution timeline of 4 weeks and 3 days. This solution delivers continuous competitive intelligence on pricing dynamics, ensures reliable data collection, and provides a scalable foundation for advanced analytics such as dashboards or predictive modeling.
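The price-movement classification step might look like the following; the function name and the use of `None` for an unseen product are assumptions for illustration, though the four labels mirror the categories described above.

```python
def classify_price_movement(previous, current):
    """Label a price change the way the tracker's history log does.

    `previous` is None when the product has not been seen before.
    Labels (new / increased / decreased / no change) mirror the
    categories above; the function itself is an illustrative sketch.
    """
    if previous is None:
        return "new"
    if current > previous:
        return "increased"
    if current < previous:
        return "decreased"
    return "no change"
```

Keeping the classifier as a pure function makes it trivial to apply row-by-row over the pandas history table and to unit-test without touching the network.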
Project Summary: A Python-based web scraping pipeline using Playwright to extract structured real estate listing data from Movoto, with a configurable city parameter (currently set to Phoenix, AZ). The scraper programmatically navigates JavaScript-rendered listing pages, iterates through pagination, collects property URLs, and extracts key details from individual listings including address, price, bedrooms, bathrooms, square footage, property type, and year built. Implements dynamic Chrome user-agent generation, controlled navigation timing, and DOM-state synchronization via explicit selector waits, with defensive error handling for dynamic content shifts to ensure high reliability. Output is normalised for CSV export using Pandas, ideal for market research, price monitoring, and real estate analysis.
Project Summary: Automated the daily extraction of e-commerce product data (price, stock, discount, reviews, ratings, URL) from Jumia, eliminating manual tracking. The scraper ran automatically for 5 days, generating a historical dataset that was cleaned and validated for reliable trend analysis and reporting. The Python script handles scraping, data cleaning, validation, and storage of the results in a structured CSV file ready for analysis.
Project Summary: Built a Python Playwright scraper that simulates real browser behavior to extract ST-style guitar data from Thomann UK, a dynamic e-commerce website. Implemented automated cookie consent handling, pagination control, and reliable extraction from JavaScript-rendered content using safe locator strategies, DOM manipulation, and controlled waits to capture product titles, prices, reviews, ratings, and availability. Output can easily be extended to CSV, JSON, or a database.
Project Summary: This Python web scraper extracts luxury property listings from LuxuryEstate.com, focusing on New York City, United States. It navigates pagination, collects unique listing URLs, and parses key details such as title, price, room counts, construction year, amenities, and views using BeautifulSoup for HTML parsing and httpx for robust HTTP requests. Randomized user agents and adaptive sleep delays help reduce detection. The data collected is compiled into a Pandas DataFrame and exported as a CSV file for easy analysis or integration. Ideal for market research or data aggregation tasks and easily configurable for other cities/countries through simple variable updates.
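The anti-detection measures mentioned above (randomized user agents and adaptive delays) can be sketched roughly as below; the agent strings shown and the delay bounds are illustrative placeholders, not the project's actual values.

```python
import random

# A small pool of desktop user agents; in practice the list is larger
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


def random_headers():
    """Build request headers with a randomly chosen user agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def adaptive_delay(page_number, base=1.5):
    """Return a sleep duration that grows slightly on deeper pages,
    so the request rhythm looks less robotic."""
    low = base + 0.1 * page_number
    return random.uniform(low, low + 2.0)
```

Each paginated httpx request would then use fresh `random_headers()` and sleep for `adaptive_delay(page)` seconds before the next fetch.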
Project Summary: Engineered an asynchronous web scraping pipeline with Playwright, asyncio, and Pandas to extract web data from IMDb's Top 250 TV series. The pipeline automated navigation of JavaScript-rendered pages, normalized show URLs with regex, and applied resilient error handling for failed requests—achieving reliable data extraction. This initiative produced a clean CSV dataset ready for analytics, visualization, or integration into media recommendation systems.
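The asyncio pattern behind that pipeline (bounded concurrency plus per-URL error capture so one failed request never aborts the batch) can be sketched as follows; `fetch_all`, the concurrency cap, and the error-record shape are assumptions for illustration.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=5):
    """Run fetch(url) concurrently with a concurrency cap, turning
    failures into error records instead of crashing the whole batch.

    `fetch` is any coroutine taking a URL; this is a sketch of the
    asyncio pattern, not the original IMDb pipeline.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:
                # Record the failure so it can be logged or retried later
                return {"url": url, "error": str(exc)}

    return await asyncio.gather(*(guarded(u) for u in urls))
```

The results list preserves input order, so failed URLs are easy to filter out and re-queue after the first pass.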
Project Summary: This project automates the extraction of real-time cryptocurrency data, including coin name, price, change, change percentage, and volume, from Yahoo Finance every 6 hours over a 3-day period. The goal is to track market behavior and spot short-term trends. The script uses Python libraries to scrape, store and visualize the data in a structured CSV file for ongoing analysis.
Project Summary: A Python scraper that extracts whisky product data (name, brand, and price) from The Whisky Exchange search API. The scraper uses requests.Session() for persistent connections, handles headers and cookies to mimic browser requests, iterates through paginated results, and stores the structured output in a CSV file using pandas. This project demonstrates skills in web data extraction, JSON parsing, session management, and data structuring for analysis.
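The session-plus-pagination pattern can be sketched like this; the JSON field names (`results`, `name`, `brand`, `price`), the page parameter, and the header values are assumptions for illustration, as the real API's shape may differ.

```python
import requests


def parse_products(payload):
    """Pull name, brand, and price out of one page of API results.

    The JSON shape here is assumed for illustration; the real API
    fields may differ.
    """
    return [
        {"name": p.get("name"), "brand": p.get("brand"), "price": p.get("price")}
        for p in payload.get("results", [])
    ]


def scrape(base_url, pages=3):
    """Walk paginated search results over one persistent session."""
    session = requests.Session()
    # Persistent headers and cookies mimic a normal browser session
    session.headers.update({"User-Agent": "Mozilla/5.0", "Accept": "application/json"})
    rows = []
    for page in range(1, pages + 1):
        resp = session.get(base_url, params={"page": page})
        resp.raise_for_status()
        rows.extend(parse_products(resp.json()))
    return rows
```

Separating `parse_products` from the network code keeps the JSON-handling logic testable on saved sample payloads without hitting the live API.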
Project Summary: Analyzed 42,992 e-commerce transactions using RFM segmentation to classify customers into top-value, potential, and low-engagement tiers, and to uncover revenue trends across age groups, countries, and product categories. This project delivers a comprehensive view of customer behaviour and revenue dynamics to support strategic growth initiatives. The analysis was conducted using Python to ensure scalable, data-driven insights for decision-making.
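A quartile-based RFM scoring step in pandas might look like the sketch below; the column names and the 1–4 scoring convention are assumptions shown for illustration, not the project's exact methodology.

```python
import pandas as pd


def rfm_scores(df, snapshot):
    """Score customers 1-4 on recency, frequency, and monetary value.

    Expects columns customer_id, order_date, amount. Quartile-based
    scoring is one common RFM convention; ranks break ties so qcut
    always finds four bins.
    """
    rfm = df.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )
    # Lower recency is better, so its quartile labels are reversed
    rfm["r"] = pd.qcut(rfm["recency"].rank(method="first"), 4, labels=[4, 3, 2, 1]).astype(int)
    rfm["f"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)
    rfm["m"] = pd.qcut(rfm["monetary"].rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)
    rfm["rfm"] = rfm[["r", "f", "m"]].sum(axis=1)
    return rfm
```

Customers can then be bucketed into top-value, potential, and low-engagement tiers by thresholding the combined `rfm` score.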
Project Summary: Developed a Random Forest model to predict customer churn using demographic and behavioral data. Evaluated performance with precision, recall, F1-score, and optimized thresholds to improve churn detection, delivering actionable insights to support proactive business decisions.
Project Summary: This project forecasts natural gas prices for any date using historical monthly data and interpolated daily values. It identifies seasonal trends, visualizes monthly patterns, and applies an ARIMA model to predict prices from October 2024 to September 2025, supporting storage contract pricing and trading desk decisions.
Project Summary: Developed a logistic regression model to predict personal loan default probabilities and estimate expected losses (10% recovery), providing actionable insights for credit risk management and capital allocation.
Explore skills I bring to the table
My Experience
My Work Experience
Experience in the Field
Portfolio of Projects
Satisfied Customers
Available for freelance & contract work — Let's talk