Built a JSON API around a clunky gov website


(Chris Whong) #1

Child Care Connect lets you look up daycares and their violations:
https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do

No open data of course, though you may recall this project by Anita Schmidt and others to map them.
I wanted access to the violations data to make it more usable. Instead of writing a scraper that outputs to a file, I built a small API that scrapes the site on the fly and exposes a JSON endpoint. This way, the data is always fresh!
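For anyone curious what scrape-on-the-fly looks like, here is a minimal sketch in Python using only the standard library. It is not the actual implementation: the `id` query parameter, the three column names, and the assumption that violations sit in plain table rows are all guesses for illustration, since the real page layout isn't described here.

```python
import json
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

SOURCE_URL = "https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do"


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells end up empty
                self.rows.append(self._row)
            self._row = None


def parse_violations(html):
    """Turn table rows into dicts. The column names are hypothetical."""
    parser = TableRows()
    parser.feed(html)
    keys = ("date", "category", "summary")
    return [dict(zip(keys, row)) for row in parser.rows if len(row) == len(keys)]


def fetch_html(daycare_id):
    """Fetch the source page at request time (hypothetical 'id' parameter)."""
    url = SOURCE_URL + "?" + urlencode({"id": daycare_id})
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Nothing is stored: every request re-scrapes the source page.
        query = parse_qs(urlparse(self.path).query)
        daycare_id = query.get("id", [""])[0]
        body = json.dumps(parse_violations(fetch_html(daycare_id))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/?id=...` would return whatever rows the parser finds as a JSON array. The parsing is kept separate from the fetching so it can be tried against saved HTML without touching the live site.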

The plan is to let the user click on a daycare on the map, then use this API to present the freshest possible violations data for it. No need to store anything, just get it right from the source on demand. Thoughts? Does this approach make sense compared with running a scraper periodically that hits the page for EVERY daycare? Unless you run that daily, the data may go stale fast.

TODO: compile the violations data into a CSV for people to download and analyze
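A rough sketch of what that CSV step could look like in Python, assuming the violations are available as a list of dicts per daycare. The field names here are hypothetical placeholders, not the real schema.

```python
import csv
import io

# Hypothetical column names; the real fields depend on what the scraper returns.
FIELDS = ["daycare_id", "date", "category", "summary"]


def violations_to_csv(records_by_daycare):
    """Flatten {daycare_id: [violation dicts]} into CSV text, one row per violation."""
    out = io.StringIO()
    # extrasaction="ignore" drops any keys that are not listed in FIELDS.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for daycare_id, violations in records_by_daycare.items():
        for violation in violations:
            writer.writerow({"daycare_id": daycare_id, **violation})
    return out.getvalue()
```

Daycares with no violations simply contribute no rows, and the returned text can be written to a file or served as a download.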