# Web Scraping with Python and Selenium
*Learn how to use Python and Selenium to automate browsing and extract data from websites when no API is available.*
Web scraping is a powerful technique for data collection when APIs aren't available. In this post, I'll show you how to use Python and Selenium to automate web browsing and data extraction.
## When to Use Web Scraping
While APIs are generally preferred, web scraping becomes necessary when:
- No API is available
- API access is limited or expensive
- You need data from diverse sources
- The website's content changes frequently
## Setting Up Selenium
First, install the necessary packages:
```bash
pip install selenium webdriver-manager pandas
```
Then, set up a basic scraper:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas as pd

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run in background
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

# Set up the webdriver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=chrome_options,
)

# Navigate to the website
driver.get("https://example.com")

# Extract data
elements = driver.find_elements(By.CSS_SELECTOR, ".product-item")
data = []

for element in elements:
    name = element.find_element(By.CSS_SELECTOR, ".product-name").text
    price = element.find_element(By.CSS_SELECTOR, ".product-price").text
    data.append({"name": name, "price": price})

# Convert to a DataFrame
df = pd.DataFrame(data)
print(df.head())

# Close the driver
driver.quit()
```
## Best Practices
1. **Be respectful** (see the robots.txt sketch after this list)
   - Check the website's robots.txt
   - Add delays between requests
   - Don't overload the server
2. **Handle dynamic content** (see the explicit-wait sketch below)
   - Use explicit waits for elements to load
   - Handle AJAX requests
   - Navigate pagination properly
3. **Error handling** (see the try/except sketch below)
   - Implement try/except blocks
   - Log errors and continue
   - Plan for site structure changes
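To put the first point into practice, here's a minimal sketch of a polite fetch, reusing the `driver` from the setup above. It checks robots.txt with Python's standard `urllib.robotparser` and sleeps before moving on; the user agent string, the target URL, and the two-second delay are illustrative choices, not values from the scraper above.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyScraperBot/1.0"  # hypothetical user agent, for illustration

# Check robots.txt before fetching anything
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

url = "https://example.com/products"  # hypothetical page to scrape
if robots.can_fetch(USER_AGENT, url):
    driver.get(url)
    time.sleep(2)  # polite delay so we don't overload the server
else:
    print(f"robots.txt disallows fetching {url}")
```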
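For dynamic content, an explicit wait is more reliable than a fixed sleep: it blocks only until the AJAX-loaded elements actually appear. The sketch below assumes the same `.product-item` selector as the basic scraper and a hypothetical `.pagination-next` button; `WebDriverWait` and `expected_conditions` are Selenium's standard tools for this.

```python
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, timeout=10)

while True:
    # Wait until the AJAX-loaded items are present instead of sleeping blindly
    try:
        items = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".product-item"))
        )
    except TimeoutException:
        break  # nothing loaded within 10 seconds; stop paginating

    # ... extract data from items as in the basic scraper ...

    # Move to the next page, if there is one
    try:
        driver.find_element(By.CSS_SELECTOR, ".pagination-next").click()
    except NoSuchElementException:
        break  # no next button: we've reached the last page
```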
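Finally, here's the extraction loop hardened with try/except and logging, so one malformed listing doesn't abort the whole run. The selectors match the basic scraper above; the logging configuration is a minimal illustrative choice.

```python
import logging

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")

data = []
for element in driver.find_elements(By.CSS_SELECTOR, ".product-item"):
    try:
        data.append({
            "name": element.find_element(By.CSS_SELECTOR, ".product-name").text,
            "price": element.find_element(By.CSS_SELECTOR, ".product-price").text,
        })
    except NoSuchElementException as exc:
        # Log and continue: the site's structure may have changed for this item
        logger.warning("Skipping item with unexpected structure: %s", exc)
        continue

logger.info("Collected %d items", len(data))
```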
## Real-World Application
At Ventask, I used web scraping to automate data collection from various job boards when APIs weren't available. This automation reduced manual data entry by 80% and ensured we had timely data for our operations.
## Conclusion
Web scraping with Selenium is a powerful tool in your data collection arsenal. Use it responsibly and in conjunction with other techniques like API integration for comprehensive data solutions.
