It’s a Big One

I was walking around Coney Island when a Great Black-backed Gull (size: between a crow and a goose) on the sidewalk was swallowing a hamburger bun whole. A lady in a nearby ticket booth, just opened for the season, booms out on her microphone, “IT’S A BIG ONE, ISN’T IT?”

My heart attack? Yes, I think it is the Big One.

Simple Ruby Scraping Walk-Through

I’ll get better at using CSS selectors, I promise, but in the meantime, I wanted to scrape a live site with Ruby as quickly and simply as possible. I wanted to make sure I understood the basic process of pulling data from a public website and parsing it into a useful format. For more information on what scraping is, who does it and why, check out my post here.

1. Set up
A site to scrape — I wrote a post to scrape on my personal site, just four h2 headlines, “Dogs, Cats, Truth, Rain.” I ended up deleting the post, but one is easy enough to make on your own blog.

Files — Next, I created two blank files using my text editor (Sublime), and saved them to the desktop as:

scraper.rb
list_items.rb

Folder — Then, I created a new folder, named it lib, and also saved it to my desktop. lib/, the library directory, is where code usually lives in basic Ruby programs. I dragged and dropped the two blank files inside.

Nokogiri — Now, install Nokogiri, a gem that collects a site’s HTML into one huge string and “treats it like nested nodes to extract the desired information,” according to the Flatiron School.

In your terminal (on a Mac, it’s in applications > utilities), enter:

gem install nokogiri

2. Code the list_items.rb file
Enter some basic code in list_items.rb. Instances of the List_items class can then be initialized and stored in @@all = [] (which we’ll use in scraper.rb).

class List_items

  attr_accessor :name

  @@all = []

  def initialize(name)
    @name = name
    @@all << self
  end

  def self.all
    @@all
  end

  def self.reset_all
    @@all.clear
  end
end

3. Require
At the top of scraper.rb, type:

require 'nokogiri'
require 'open-uri'
require_relative './list_items.rb'

Open-URI is a module whose open method takes one argument, the URL of the website you want to scrape, and returns the site’s HTML. Nokogiri then helps parse it.

When we eventually run the program, we’re going to call scraper.rb, which tells Ruby to also load (require) list_items.rb. So, scraper.rb is kind of the brain of the operation here.

4. Collect the HTML
Write a single line that grabs the HTML with Open-URI and parses it into a Nokogiri document (swap in the URL of your own post):

doc = Nokogiri::HTML(open("https://yoursite.com/your-post"))

5. Functionality
Put that line inside a Ruby method, and the method inside a Scraper class (below). We’ll eventually be calling a new instance of Scraper and acting on it with four methods.

require 'nokogiri'
require 'open-uri'

require_relative './list_items.rb'

class Scraper

  def get_page
    Nokogiri::HTML(open("https://yoursite.com/your-post")) # your post's URL here
  end
end

The post I created to scrape only has a couple layers of CSS, “.entry-content” and “h2”. (The period before “entry” indicates that it’s a CSS class.)

I found the CSS using the inspect tool (click here for the Google tutorial).
Here’s the finished code:

require 'nokogiri'
require 'open-uri'

require_relative './list_items.rb'

class Scraper

  def get_page
    Nokogiri::HTML(open("https://yoursite.com/your-post")) # your post's URL here
  end

  def get_items
    get_page.css(".entry-content")
  end

  def make_items
    get_items.each do |content|
      content.css("h2").each do |headline|
        List_items.new(headline.text)
      end
    end
  end

  def print_items
    List_items.all.each do |item|
      puts "Is this working: #{item.name}"
    end
  end
end

scraper = Scraper.new
scraper.make_items
scraper.print_items

To run the program, open your terminal, navigate from your home directory into the lib folder, and run the script:

cd ~
cd Desktop
cd lib
ruby scraper.rb

In trying to achieve the simplest scrape possible, I probably left something out. Connect with me on Twitter and let a girl know. @SaraHarvy

MoCCA Fest Roundup!

The MoCCA Fest Arts Festival wrapped up its 2019 edition yesterday in New York City. Here are a few things I want MoCCA to do again next year, and what I want it to add (or take away — nylon giants are especially scary for introverts).

10/10 awesome time. Would do again.

Getting started: coding Ruby in the terminal

A New York UPS center/metaphor

I typed my first HTML in a Codecademy browser window before moving on to build HTML, CSS and PHP sites. Now, I’m enrolled in the Flatiron School boot camp, coding in an online learning environment again, and realized I’ve never actually coded Ruby (the first part of the curriculum) in the wild.

Here’s my quick take on setting up to code in the terminal on a Mac, in case anyone else’s Ruby journey thus far has been mostly online.

Install Rails, Homebrew, git, RVM, gems, SQLite and Node on Mac OS

I ended up using a Flatiron tutorial to install these programs, but only after trying the first tutorial I found online, which left me with lots of errors. So, I uninstalled and reinstalled, and things went a lot more smoothly the second time around.

I didn’t have the Learn IDE installed, and I didn’t set up the Learn gem because I’m still planning to finish my coursework online.

Adding an SSH key to GitHub

This is one of the steps in the above tutorial. For specific instructions from GitHub on using an SSH key, click here. I was afraid this would somehow disrupt Flatiron’s system, but it didn’t. I added my key to all the other keys.

Commands for coding in the terminal

You code in the terminal (Applications > Utilities > Terminal), where you just installed Rails and the other systems, using commands. As a Ruby newbie, I’m not ashamed to say that this simple, obvious fact was a revelation.

Here’s a basic tutorial by Railsbridge on using commands in the terminal.
Another tutorial by Thoughtco on using the command line.

Coding Ruby locally

Single file
I opened my text editor, Sublime, wrote some Ruby code and saved the file as my_program.rb to my desktop. I navigated to it in the terminal (by typing: cd desktop) and then ran it: ruby my_program.rb. The code ran in the terminal.
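my_program.rb can hold anything; here is a minimal sketch (the contents are my own example, not the original file):

```ruby
# my_program.rb — a tiny script to confirm Ruby runs from the terminal
greeting = "Hello from the terminal!"
puts greeting
```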

Running a program with multiple files
I wrote a sample Ruby CLI with multiple files, including a bin and a lib folder. To run it, I navigated to the program folder on the desktop, all the way to greet.rb in the bin folder (which requires the lib file):

cd desktop (enter)
cd greetings_earthing (enter)
cd bin (enter)
ruby greet.rb
=> Hello, earthling! What is your name?
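Condensed into one file for illustration, the two-file layout works roughly like this (the file and method names are my guesses, not the actual project):

```ruby
# lib/greet.rb would define the logic:
def greet
  "Hello, earthling! What is your name?"
end

# bin/greet.rb would `require_relative '../lib/greet'` and call it:
puts greet
```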

Using GitHub from the terminal

I created a new repository on GitHub and set it to private (since I’m just practicing). Then, I actually selected upload files and dragged and dropped my program in from the desktop. So, I’m not exactly interacting with GitHub from the terminal yet, but there will be opportunities.

Other ways to upload to GitHub
Cheat sheet of Git commands

If I missed anything in this simple guide, connect with me on Twitter and let me know! @SaraHarvy

10 Examples of Web Scraping in Use

Live web projects as cool as scraped ice

I’m studying Ruby through the Flatiron School, so I researched real-life examples of scraping, a common Ruby task. I understand web scraping is the process of finding and pulling information from public websites, but who does it and why? Here, in capsule form, are five common uses with examples (though not all involve Ruby).

Big business

Walmart vs. Amazon
E-commerce sites regularly use software to send waves of bots and web crawlers across multitudes of sites at a time. Bots search for information such as product reviews, contact information to use for marketing, or prices for comparison websites, and save copies in a spreadsheet or database.

It’s a web search that retrieves and saves enormous amounts of information. As Columbia University puts it, “Google scraped (emphasis mine) the web to catalogue all of the information on the internet and make it accessible.” A Google search, in essence, is a scrape.

Walmart was checking prices on Amazon several million times a day in 2016 before Amazon blocked its bots, according to Reuters. In e-commerce, a customer toggling between two competitor sites can choose the product that’s 50 cents cheaper with a click. So, the stakes are high.


Proven
Skincare startup Proven scrapes millions of customer reviews, information on thousands of beauty products, and scientific articles to personalize health and beauty products. The brand both consults and creates a product specifically for each customer, according to Forbes.

Co-founder Amy Yuan searched thousands of ingredients, products, consumer reviews and scientific journal articles to find a product that worked for her own skin, according to Forbes. She and co-founder Ming Zhao extended the project with machine learning and artificial intelligence algorithms to understand the correlations between people’s skin and the ingredients that work for each person.

Naked Apartments
Type in what you’re looking for in a New York apartment, and Naked Apartments retrieves the real estate options, scraped from the web, that fill the bill. It also allows brokers to contact an apartment seeker with permission, and ranks the best sellers, a distinction they can’t buy. Zillow bought the Webby Award-nominated company in 2016.

Naked Apartments launched its in-demand listings service in 2010 and turned a profit in 2011. Travel sites scrape to offer comparison prices, and other sites offer scraped job listings and directories. Businesses of almost any size also scrape for competitor prices, leads, and search engine results to track SEO and marketing.

Community and social justice organizations
MCSafetyFeed scrapes public information in cooperation with Monroe County, New York, to maximize open data in government. The site provides a history of all the 911 calls that come in to the Monroe County dispatch center and additional data about each call that isn’t immediately available on the dispatch website or Monroe’s RSS feed. Read more about the tech aspects, here.

Community projects may provide scraped digital data, or information from reports requested from governing bodies, or a combination. The goal is usually just to get the information out there for civic transparency and accountability.


Journalism

Pfizer’s disclosures of payments to doctors
Journalism is obviously another powerful way to get information to the people. ProPublica offers a guide for journalists who may want to scrape public records themselves for reporting. The guide leads aspiring data journalists through the process of scraping pharmaceutical giant Pfizer’s record of payments made to doctors.

Another example of journalistic scraping is James Tozer’s Economist article on the countries likely to have liberal abortion laws. Read more about his entrance into data journalism, here.

Public health

“What if you had an idea for an ecological study, but the data you needed wasn’t available to you?” – a pitch for Columbia University’s Mailman School of Public Health’s courses on web scraping and population health methods.

In public health, big data helps chart disease occurrence based on time, location and demographics, according to the SUNY Downstate Medical Center Department of Epidemiology and Biostatistics. Students use data to identify the relative contributions of biological, behavioral, socio-economic, and environmental risk factors to disease incidence.

Google Flu Trends
It feels more accurate to describe the complex Google Flu Trends as “a big data tool for epidemiologists” than as simply “scraping.” (Browse the Google project’s summary on Wikipedia.)

The idea was that when people have the flu, they search for flu-related information on Google, indicating the time and location of a potential outbreak. The goal was to predict outbreaks weeks earlier than the Centers for Disease Control and Prevention (CDC). Read more about the outcome, and how the process might be refined in the future, on Wired.

For more on the topic of tech and public health, epidemiology in particular, browse the 2015 article “Epidemiology in the Era of Big Data,” from the National Center for Biotechnology Information, a branch of the National Institutes of Health.

Academics and student work

Perception of French hip hop
Developer Alexandre Robin helped a friend write a student paper on the perception of French hip hop through the decades by scraping 7,000 newspaper articles using Node. Read more about his project, here.

Flatiron alumni Clinton Nguyen and Jason Decker used web scraping to refine browser searches. As their project’s pitch puts it: “Sign up to post your own educational trails and follow them, or simply search through on the main page for user-submitted trails that are already out there.”

If there’s anything that can be improved in this article, connect with me through Twitter! @SaraHarvy