I had several people express an interest in the presentation I used to talk about data wrangling during an unconference session this afternoon. It’s here:
And we’re releasing an updated version of the OpenRefine CKAN Connector and open-sourcing it next week!
Another tool people might be interested in is http://tabula.technology/
Pretty much my go-to nowadays for pulling data from tables in PDFs.
Agreed. Tabula is awesome! I worked with them at the PDF Liberation Hackathon last year and at a recent NYPL Labs hackathon. FYI, they have an API. It’s a WIP, but I hope to use it in the City Record project.