`I know, I know, building ML applications is fun, yes, but have you ever tried training a model with limited data? Huh?`

### Web Crawler:

![](https://miro.medium.com/v2/resize:fit:300/0*GD1lz09N010bm86K.jpg align="center")

Being able to get data however and whenever you want is 🔥.  
Today, we’ll look at how I reverse engineered StockX’s web and mobile APIs to get more historical data on items.

**WHY?**  
It’s common knowledge that StockX is the fountain of data when it comes to sneaker sales history. You can get whatever you want, right on there!  
Super amazing, right?

**Case Study**  
Let’s take a look at one of my favourite kicks of all time, drum roll…  
The [Jordan 1 Retro High OG](https://stockx.com/air-jordan-1-retro-high-og-unc-toe-gs)

![](https://miro.medium.com/v2/resize:fit:280/0*DKBzdebXgGAOq8YM align="center")

**Overview**  
As with every webpage you want to crawl, we first need an overview of what the page looks like, so we know which data we want to extract.  
I can’t speak for you, but I definitely don’t want to store every single thing on that page 😆.

![](https://miro.medium.com/v2/resize:fit:700/1*45RpPemrqANUvKWW5db9Mw.png align="center")

![](https://miro.medium.com/v2/resize:fit:700/1*y9Hvq6Vki4ZA9ZIUYAyhJQ.png align="center")

![](https://miro.medium.com/v2/resize:fit:505/1*XJzYt5aDZdQ4oCpfu3EmXQ.png align="center")

You see it already, right? So much meaningful data 💃 💃 💃

**What Do We Want To Scrape**  
As you might’ve guessed already, we want to store:

* Title of this sneaker
    
* Breadcrumb of this page, just so we know all the tiny little categories 😅
    
* Style
    
* Retail Price
    
* Release Date
    
* Colorway
    
* Sales History
    * date
    * time
    * size
    * price
    

And of course, you can add more fields, whatever you want!
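
To make that wish list a bit more concrete, here is a minimal sketch of how one scraped page could be modelled in Python. Everything in it is my own assumption for illustration: the class names `Sale` and `SneakerRecord`, the field names, the types (for example keeping `size` as text), and the sample values, which are placeholders and not real StockX data.

```python
import datetime as dt
from dataclasses import asdict, dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class Sale:
    """One row of the sales history: date, time, size, price."""
    date: dt.date
    time: dt.time
    size: str        # text, on the assumption that sizes are not always plain numbers
    price: Decimal


@dataclass
class SneakerRecord:
    """Everything we want to keep from one product page."""
    title: str
    breadcrumb: List[str]          # one entry per category level
    style: str
    retail_price: Decimal
    release_date: dt.date
    colorway: str
    sales_history: List[Sale] = field(default_factory=list)


# Placeholder values only, not real StockX data.
record = SneakerRecord(
    title="Example Sneaker",
    breadcrumb=["Home", "Sneakers", "Example Brand"],
    style="XX0000-000",
    retail_price=Decimal("100.00"),
    release_date=dt.date(2020, 1, 1),
    colorway="Color A/Color B",
)
record.sales_history.append(
    Sale(date=dt.date(2020, 1, 2), time=dt.time(12, 30), size="5.5", price=Decimal("150.00"))
)

row = asdict(record)  # nested plain dict, ready to be serialised or stored
```

Adding another field later is just one more line in `SneakerRecord`, which is why I like pinning the shape down before writing any crawling code.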

Web Scraping: Get Data You Want On Your Terms (4 Part Series)

Part I: Overview
