Web Scraping: Get Data You Want On Your Terms (4 Part Series)

Part I: Overview


I know, I know, building ML applications is fun. But have you ever tried training a model with limited data? Huh?

Web Crawler:

Being able to get data however and whenever you want is 🔥.
Today, we’ll look at how I reverse engineered StockX’s web and mobile APIs to get more historical data on items.

WHY?
It’s common knowledge that StockX is the fountain of data when it comes to sneaker sales history. You can get whatever you want, right on there!
Super amazing, right?

Case Study
Let’s take a look at one of my favourite kicks of all time, drum roll…
The Jordan 1 Retro High OG

Overview
As with every webpage you want to crawl, we first need an overview of what the page looks like and which data we want to extract.
I can’t speak for you, but I definitely don’t want to store every single thing on that page 😆.

You see it already, right? So much meaningful data 💃 💃 💃

What Do We Want To Scrape
As you might’ve guessed already, we want to store:

  • Title of this sneaker

  • Breadcrumb of this page, just so we know all the tiny little categories 😅

  • Style

  • Retail Price

  • Release Date

  • Colorway

  • Sales History
    - date
    - time
    - size
    - price

And of course, you can add more fields, whatever you want!
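
To make that field list concrete, here is a minimal sketch of how one scraped item could be modelled in Python. The class names (`SneakerRecord`, `Sale`), the field types, and every sample value are my own assumptions for illustration. They are not StockX’s actual schema, and the breadcrumb and sale values are placeholders rather than real data.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Sale:
    """One row of the sales history (hypothetical shape)."""
    date: str     # kept as plain strings here; parse into datetime later if needed
    time: str
    size: str     # a string, since sizes are not always plain numbers
    price: float


@dataclass
class SneakerRecord:
    """Everything we want to keep for one sneaker page (hypothetical shape)."""
    title: str
    breadcrumb: List[str] = field(default_factory=list)
    style: Optional[str] = None
    retail_price: Optional[float] = None
    release_date: Optional[str] = None
    colorway: Optional[str] = None
    sales_history: List[Sale] = field(default_factory=list)


# Placeholder values only, to show how a record is filled in.
record = SneakerRecord(
    title="Jordan 1 Retro High OG",
    breadcrumb=["placeholder-category", "placeholder-subcategory"],
)
record.sales_history.append(
    Sale(date="2000-01-01", time="00:00:00", size="10", price=100.0)
)

# asdict() turns the nested dataclasses into plain dicts, ready for JSON or a database.
record_dict = asdict(record)
```

Fields other than the title are optional, so a page with missing details still produces a valid record, and adding a new field later is a one-line change.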
