Extract data directly from websites with import.io

I’ve discussed before how to extract data from static published graphs in Matlab. But what if you want to extract table data from a website that doesn’t have an API?

import.io does just this. You show a few examples of the data you want to extract from a site, and it guesses intelligently how to extract other data points from the same or similar pages. You can then access that data as JSON through a REST API, or just download a .csv file.
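Once the data comes back as JSON, turning it into a table takes only a few lines. Here is a minimal Python sketch of that step. The payload below is made up for illustration: the `results` key and the `rank`/`artist`/`album` fields are assumptions, not import.io's documented response format, so adjust them to whatever your extractor actually returns.

```python
import csv
import io
import json

# Hypothetical payload: the real import.io response format may differ.
SAMPLE_RESPONSE = """
{
  "results": [
    {"rank": 1, "artist": "Artist A", "album": "Album A"},
    {"rank": 2, "artist": "Artist B", "album": "Album B"}
  ]
}
"""


def rows_to_csv(payload: str, key: str = "results") -> str:
    """Convert a JSON list of flat records (stored under `key`) into CSV text."""
    records = json.loads(payload)[key]
    if not records:
        return ""
    # Use the first record's keys as the column order.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(SAMPLE_RESPONSE))
```

In practice you would replace `SAMPLE_RESPONSE` with the body of the HTTP response from your own extractor. The sketch assumes every record is flat and shares the same fields.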

You could use it to extract data tables from Wikipedia, government agencies, commercial websites, etc. for legit scientific purposes. Better yet, you can extract top 1000 lists from, say, Les Inrocks, and get them into Spotify for some infinite lab playlists.

 
