This article shows how to scrape websites with Ruby and Selenium, using CSS class selectors to locate the data.
We are going to scrape the Stack Overflow questions page and pull out the following information for each question:
- Question Text
- Answer Count
- Views
- Vote Count
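First, install the selenium-webdriver gem (gem install selenium-webdriver) and download the ChromeDriver build that matches your Chrome version from the official ChromeDriver download page (we will only be scraping with Google Chrome). Put the downloaded executable (chromedriver, or chromedriver.exe on Windows) in the same directory as the Ruby script.
Go to the questions page and open the browser's Inspect Element window. The main element of focus will be the item that represents each question in the list.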
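Before launching the browser, it can help to confirm that the driver binary really sits next to the script. The sketch below is only an illustration: chromedriver_path is a hypothetical helper name, and the file names it assumes are the ones ChromeDriver is usually shipped with.

```ruby
# Hypothetical helper: build the path where the driver binary is expected.
# The file name differs on Windows, so the platform check is overridable.
def chromedriver_path(dir, windows: Gem.win_platform?)
  name = windows ? 'chromedriver.exe' : 'chromedriver'
  File.join(dir, name)
end

# __dir__ is the directory of the running script; fall back to the
# current working directory when it is not available.
path = chromedriver_path(__dir__ || Dir.pwd)
warn "chromedriver not found at #{path}" unless File.exist?(path)
```

Running this check first turns a confusing Selenium start-up failure into a plain message about the missing file.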
Get all the elements with the class “question-summary”:
questions = @driver.find_elements(class: "question-summary")
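The class: locator takes a single class name. The same lookup can also be written as a CSS selector, which is handy when an element has to carry more than one class. The helper below is a hypothetical sketch that builds such a selector; it does not touch the browser.

```ruby
# Hypothetical helper: build a CSS selector that matches elements
# carrying every one of the given class names.
def css_for_classes(*class_names)
  class_names.map { |name| ".#{name}" }.join
end

selector = css_for_classes('question-summary')
puts selector
# With a live driver, this could be passed to the css: locator, e.g.
# @driver.find_elements(css: selector)
```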
Now loop over the results and, within each question, look up the elements with the classes “question-hyperlink”, “vote-count-post”, “status”, and “views” to get the question text, vote count, answer count, and views respectively.
questions.each do |question|
  question_text = question.find_element(class: "question-hyperlink").text
  puts question_text
  vote_count = question.find_element(class: "vote-count-post").text
  puts vote_count
  answer_count = question.find_element(class: "status").text
  puts answer_count
  views = question.find_element(class: "views").text
  puts views
end
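One thing to keep in mind with the loop above: find_element raises an error when nothing matches, so a single question that lacks one of these classes would stop the whole run. find_elements returns an empty array instead. A minimal sketch of a more forgiving lookup, with text_or_nil as a hypothetical helper name:

```ruby
# Return the text of the first descendant with the given class,
# or nil when no such element exists.
# `parent` is anything that responds to find_elements(class: ...),
# such as a Selenium element. The helper name is hypothetical.
def text_or_nil(parent, class_name)
  first_match = parent.find_elements(class: class_name).first
  first_match&.text
end
```

Inside the loop, a call such as text_or_nil(question, "views") could then stand in for question.find_element(class: "views").text.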
The complete code will look like this:
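require 'selenium-webdriver'

class Scraper
  def initialize
    @long = 3 # seconds to keep the browser open before quitting
    @root_path = "https://stackoverflow.com/questions"
    # chromedriver is expected next to this script. Newer selenium-webdriver
    # releases expect a Service object instead of :driver_path, e.g.
    # service: Selenium::WebDriver::Service.chrome(path: './chromedriver')
    @driver = Selenium::WebDriver.for :chrome, driver_path: './chromedriver'
  end

  # Open the questions page, print the data, then close the browser.
  def main
    @driver.navigate.to @root_path
    get_questions
    sleep @long
  ensure
    # Close the browser even if scraping raised an error.
    @driver.quit
  end

  # Print the text, votes, answers and views of every question on the page.
  def get_questions
    questions = @driver.find_elements(class: "question-summary")
    questions.each do |question|
      question_text = question.find_element(class: "question-hyperlink").text
      puts question_text
      vote_count = question.find_element(class: "vote-count-post").text
      puts vote_count
      answer_count = question.find_element(class: "status").text
      puts answer_count
      views = question.find_element(class: "views").text
      puts views
    end
  end
end

start = Scraper.new
start.main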
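The script prints every field as raw text, so counts come back as strings that may include a label or an abbreviation. If numbers are needed, a small conversion step can follow the scrape. The sketch below is an assumption-laden example: parse_count is a hypothetical helper, and the formats it handles ("12", "3" followed by a label, "1,234 views", "1.2k views") are guesses about what the page may display.

```ruby
# Hypothetical helper: turn scraped count text into an Integer.
# Handles plain numbers, thousands separators, and k/m suffixes.
# Returns nil when the text contains no number.
def parse_count(text)
  cleaned = text.to_s.downcase.delete(',')
  # The \b after the suffix keeps words like "minutes" from being
  # read as an "m" multiplier.
  match = cleaned.match(/(-?\d+(?:\.\d+)?)\s*([km])?\b/)
  return nil unless match

  multiplier = { 'k' => 1_000, 'm' => 1_000_000 }.fetch(match[2], 1)
  (match[1].to_f * multiplier).round
end

puts parse_count("1.2k views")
puts parse_count("3\nanswers")
```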
Now you know how to scrape websites using Ruby and Selenium! Happy scraping!