Since 2013, members of BetaNYC have advocated for the City’s open data to conform to open standards.
While trying to scrape the City Record, we ran into headaches defining what is where. In a city as complex as the one beneath our feet, there are at least three ways to describe any given location.
Last year, we supported the passage of Local Law 108 of 2015. Earlier this year, we worked with NYC Parks to explore a handful of data formats that keep data both human-readable and locatable.
Today, we are excited to support the City’s efforts to gather the widest feedback possible. We encourage everyone to provide feedback and help grow a new data standard.
Note: public comments close on Thursday, September 15th.
From MODA’s Post on Draft Standards
For any dataset on the NYC Open Data Portal that includes row-level address fields, agencies must separate locational information into “core address” and “core geospatial reference” attributes. These attributes will appear on the Portal according to a standard column naming convention.
Agencies will be responsible for separating core address information into five standard column fields:
Agencies will also be required, with technical guidance from the Open Data team, to include six standard column fields of core geospatial reference information:
“BIN” (Building Identification Number)
Geosupport: We recommend that agencies whose datasets do not already contain the six core geospatial reference fields use Geosupport, a publicly available tool maintained by the Department of City Planning that also serves as the City of New York’s geocoder of record. Core address data entered into Geosupport can return all required core geospatial reference data. Agencies may geocode their locational data at the database level or the extraction level. Alternatively, agencies may elect to have the Open Data team establish an automated feed, in which datasets are passed through an ETL where they are geocoded and uploaded directly to the Portal.
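To make the ETL idea concrete, here is a minimal sketch of a geocoding step, written in Python. It is an illustration under stated assumptions, not the Open Data team's pipeline and not the Geosupport API: `geocode_rows`, `fake_geocoder`, the `address` column name, and the placeholder BIN value are all hypothetical. A real feed would swap in an actual call to the geocoder of record.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical geocoder signature: takes an address string and returns
# geospatial reference fields, or None when no match is found.
# This stands in for a real geocoder; it is NOT the Geosupport API.
Geocoder = Callable[[str], Optional[Dict[str, str]]]


def geocode_rows(rows: List[dict], address_field: str, geocoder: Geocoder) -> List[dict]:
    """Return copies of rows, each merged with geocoder output when a match is found."""
    enriched_rows = []
    for row in rows:
        result = geocoder(row.get(address_field, ""))
        enriched = dict(row)  # copy so the agency's source rows stay untouched
        if result is not None:
            for key, value in result.items():
                # Keep any value the agency already reported for this field.
                enriched.setdefault(key, value)
        # Record whether geocoding succeeded so failures can be counted later.
        enriched["geocoded"] = result is not None
        enriched_rows.append(enriched)
    return enriched_rows


def fake_geocoder(address: str) -> Optional[Dict[str, str]]:
    """Toy lookup table with placeholder values, for illustration only."""
    lookup = {"1 Example Street": {"BIN": "0000000"}}
    return lookup.get(address)


rows = [{"address": "1 Example Street"}, {"address": "nowhere"}]
geocoded = geocode_rows(rows, "address", fake_geocoder)
```

Passing the geocoder in as a function keeps the sketch neutral about whether geocoding happens at the database level, at extraction, or in an automated feed.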
When a dataset is geocoded by Geosupport, its data dictionary must designate which attribute fields were reported directly by the agency, and which attribute fields were created by geocoding in order to meet these standards. Data dictionaries must also include the version number of the geocoder and error rates that result from geocoding. Finally, agencies with datasets that do not have address fields but include other locational data are encouraged, but not required, to populate as many core geospatial reference fields as possible using Geosupport.
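As a rough illustration of what such a data dictionary might record, here is a minimal Python sketch. The structure, the function name `build_data_dictionary`, the `address` column, the version string, and the sample values are all assumptions made for the example. The draft standard does not define how an error rate is calculated; this sketch assumes a row counts as a geocoding error when any geocoded field is missing or empty.

```python
from typing import Dict, List


def build_data_dictionary(
    rows: List[dict],
    reported_fields: List[str],
    geocoded_fields: List[str],
    geocoder_version: str,
) -> Dict[str, object]:
    """Summarize field provenance, geocoder version, and geocoding error rate."""
    total = len(rows)
    # Assumption: a row is a geocoding error if any geocoded field is missing or empty.
    failures = sum(
        1 for row in rows if any(not row.get(field) for field in geocoded_fields)
    )
    provenance = {field: "reported by agency" for field in reported_fields}
    provenance.update({field: "created by geocoding" for field in geocoded_fields})
    return {
        "fields": provenance,
        "geocoder_version": geocoder_version,
        "geocoding_error_rate": failures / total if total else 0.0,
    }


# Placeholder rows: two geocoded successfully, two did not.
sample_rows = [
    {"address": "a", "BIN": "0000000"},
    {"address": "b", "BIN": ""},
    {"address": "c"},
    {"address": "d", "BIN": "0000001"},
]
dictionary = build_data_dictionary(
    sample_rows,
    reported_fields=["address"],
    geocoded_fields=["BIN"],
    geocoder_version="example-version",  # hypothetical version label
)
```

Keeping provenance per field, rather than per dataset, lets a reader see at a glance which columns came from the agency and which were derived.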