This morning, I put together a scraper I’ve been meaning to write for at least four years now.
The Kings County Supreme Court posts PDFs of the upcoming week’s foreclosure sales, but changes the page every week. AFAIK there is no way to go back in time to get prior weeks’ PDFs, which contain the address, the remaining principal of the loan, the owner of the note, and attorney info.
For the time being, I’m hosting the data here myself.
The code is all on GitHub, and you can try running it yourself. I run it every day in case they change files, but it’s smart enough to check headers and not save duplicate PDFs.
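For anyone curious what "checking headers" can look like, here is a minimal sketch of the general idea, not the code from the repo. It assumes the server sends at least one of `ETag`, `Last-Modified`, or `Content-Length`; the names `fingerprint`, `should_download`, and `fetch_if_new`, and the date-prefixed filename scheme, are all made up for illustration.

```python
import urllib.request
from datetime import date
from pathlib import Path
from urllib.parse import urlparse

# Headers that usually change when the file behind a URL changes.
COMPARE_HEADERS = ("etag", "last-modified", "content-length")


def fingerprint(headers):
    """Join the comparison headers into one string, or None if none are present."""
    lowered = {k.lower(): v for k, v in headers.items()}
    parts = [lowered.get(name, "") for name in COMPARE_HEADERS]
    return "|".join(parts) if any(parts) else None


def should_download(url, headers, seen):
    """True if we have no usable headers, or they differ from the last run."""
    fp = fingerprint(headers)
    return fp is None or seen.get(url) != fp


def fetch_if_new(url, dest_dir, seen):
    """HEAD the URL and save the PDF only when its headers have changed.

    `seen` is a dict of url -> fingerprint that the caller persists between runs.
    Returns the saved path, or None if the file looked like a duplicate.
    """
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        headers = dict(resp.headers.items())
    if not should_download(url, headers, seen):
        return None

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with today's date so a reused filename does not overwrite last week's PDF.
    name = Path(urlparse(url).path).name or "notice.pdf"
    dest = dest_dir / f"{date.today().isoformat()}-{name}"
    with urllib.request.urlopen(url) as resp:
        dest.write_bytes(resp.read())

    seen[url] = fingerprint(headers)
    return dest
```

If the server sends none of those headers, this sketch falls back to downloading every time, which is wasteful but never silently skips a new file.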
As I noted in the repo, there are a few things I’d love to see:
- Some basic OCR on the PDFs, which are scans. There are a few keywords we could check for. It would be amazing to start throwing what we can pick up into a database.
- Posting to data.beta.nyc!
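To make the OCR idea a bit more concrete, here is a rough sketch of the keyword-check half only. It assumes some OCR tool has already turned a scanned PDF into plain text; the keyword list is a placeholder I made up, not a vetted set, and `find_keyword_lines` is a hypothetical helper.

```python
import re

# Placeholder keywords; swap in whatever terms the notices actually use.
KEYWORDS = ("plaintiff", "defendant", "index no")


def find_keyword_lines(ocr_text, keywords=KEYWORDS):
    """Map each keyword to the OCR'd lines that contain it (case-insensitive)."""
    hits = {kw: [] for kw in keywords}
    for line in ocr_text.splitlines():
        # OCR output tends to have ragged spacing, so collapse whitespace first.
        normalized = re.sub(r"\s+", " ", line).strip()
        for kw in keywords:
            if kw.lower() in normalized.lower():
                hits[kw].append(normalized)
    return hits
```

The matched lines could then be written to a database table keyed by PDF filename, leaving finer parsing for later.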
These could both be wonderful CodeAcross projects. I’ve got something else on my plate for that event, but please do get in touch with me if you’re interested in taking either of those (or something else) on.
AFAIK, the other boroughs don’t post these foreclosure sale notices online. To be fair, though, I’ve only checked Manhattan.