alex-petr/coldwell_banker_scraper
Coldwell Banker Scraper

Description

This is a Ruby script that scrapes products (real estate listings) from https://www.coldwellbankerhomes.com.

The scraped state, region, and product data is saved as CSV files in the output/ directory.

The scraper uses:

  • Static HTML (DOM) parsing for links and general info
  • Recognition of semantic annotations (the product/residence microformat) to parse the estate-specific data embedded in product pages

Workflow: how it works

[Figure: workflow diagram]

Features

  • Service Object pattern: each service provides a single public method, #call
  • Ruby executable script
  • All required gems installed with Bundler
  • curl support via Curb for fetching page HTML
  • Nokogiri for HTML parsing with XPath and CSS selector support
  • CSV export via CSV Ruby class
  • Logging via Logger Ruby class
  • Code style checked with RuboCop
  • Code quality reports via RubyCritic
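
As a rough illustration of how the Service Object pattern, CSV export, and logging listed above can fit together, here is a minimal sketch that uses only Ruby's standard CSV and Logger classes. The class name ExportToCsv, its constructor arguments, and the column names are hypothetical and are not taken from this repository.

```ruby
require 'csv'
require 'logger'

# Hypothetical service object: the name, arguments and columns are
# illustrative assumptions, not the repository's actual code.
class ExportToCsv
  # rows is an array of hashes that all share the same keys.
  def initialize(rows, path, logger: Logger.new($stdout))
    @rows = rows
    @path = path
    @logger = logger
  end

  # The single public entry point, as the Service Object pattern suggests.
  # Returns the written path, or nil when there is nothing to export.
  def call
    return log_empty if @rows.empty?

    write_rows
    @logger.info("Saved #{@rows.size} rows to #{@path}")
    @path
  end

  private

  def log_empty
    @logger.warn("Nothing to export to #{@path}")
    nil
  end

  # Uses the keys of the first row as the CSV header line.
  def write_rows
    headers = @rows.first.keys.map(&:to_s)
    CSV.open(@path, 'w', write_headers: true, headers: headers) do |csv|
      @rows.each { |row| csv << row.values }
    end
  end
end
```

Under these assumptions, a call such as ExportToCsv.new(rows, 'output/states.csv').call would write one file and log one line. Keeping every service behind #call makes services easy to chain and to replace.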

Requirements

  • System: Linux or macOS
  • Git
  • Ruby version manager (rbenv or RVM)
  • Ruby 2.5.0
  • Bundler
  • Gems installed via Bundler Gemfile

Installation

Download code from repository

Clone with SSH:

$ git clone git@github.com:alex-petr/coldwell_banker_scraper.git

Or clone with HTTPS:

$ git clone https://github.com/alex-petr/coldwell_banker_scraper.git

rbenv (for macOS)

$ cd coldwell_banker_scraper/ && brew install rbenv

Ruby

$ rbenv install 2.5.0

Install Bundler and all required gems

$ gem install bundler && bundle

Tests

No test suite is available. To check that the scraper works, run it and inspect the terminal output and the CSV files in the output/ directory.

Usage

$ bin/scraper

When run, the script generates a set of CSV files in the output/ directory.
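
Since there is no test suite, a quick sanity check of the generated files can help. The sketch below counts the data rows in each CSV file of a directory. The helper name csv_row_counts is hypothetical, and it assumes each file begins with a header row, which this README does not confirm.

```ruby
require 'csv'

# Hypothetical helper: returns { file name => number of data rows }.
# Assumes every CSV file starts with a header row (not confirmed here).
def csv_row_counts(dir = 'output')
  Dir.glob(File.join(dir, '*.csv')).sort.map do |path|
    [File.basename(path), CSV.read(path, headers: true).size]
  end.to_h
end

# Print one summary line per file found in output/.
csv_row_counts.each { |name, rows| puts "#{name}: #{rows} rows" }
```

A file that reports 0 rows, or an output/ directory with no CSV files at all, suggests the run did not scrape anything.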