I’ve discussed before how to use Matlab to extract data from static published graphs. But what if you wanted to extract table data from a website that doesn’t have an API?
import.io does just this. You show it a few examples of the data you want to extract from a site, and it infers how to extract the other data points from the same or similar pages. You can then access that data as JSON through a REST API, or just download a .csv file.
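Once you have the .csv export, getting it into a script takes only a few lines. Here's a minimal sketch in Python, using only the standard library. The sample data and the column names (`rank`, `artist`, `title`) are made up for illustration and stand in for whatever your own extraction produces; this is not import.io's actual output format.

```python
import csv
import io


def load_table(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Hypothetical sample standing in for a downloaded export.
# In practice you would read the file's contents from disk instead.
sample = "rank,artist,title\n1,Artist A,Song A\n2,Artist B,Song B\n"

rows = load_table(sample)
for row in rows:
    print(row["rank"], row["artist"], row["title"])
```

If you go the JSON route instead, the standard `json` module's `json.loads` would play the same role as `csv.DictReader` here.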
You could use it to extract data tables from Wikipedia, government agencies, commercial websites, etc. for legit scientific purposes. Better yet, you can extract top 1000 lists from, say, Les Inrocks, and get them into Spotify for some infinite lab playlists.