Proposal to manage schema

Hi guys,

While playing around with the dataset, I found using an IPython Notebook very useful both for exploring the data and for documenting the “schema”. Check out the example notebook I uploaded, which I built while working through some of the Mayor’s office ads.

It would be nice to move away from the large Google documents and consolidate the information into a notebook. A rich media format that provides context alongside the ability to explore the data might help us get up and running faster.


This is great. Thanks, Amal. I’ll look at it a bit more to get a sense of the workflow. As we’re moving from designing the schema in Google Spreadsheets to a JSON Schema representation, is this something we could use, do you think?
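To make the JSON Schema idea concrete, here is a minimal sketch of what one ad record’s schema might look like as a Python dict, plus a small hand-rolled check. The field names (`advertiser`, `station`, `air_date`, `cost`) and the `check_record` helper are hypothetical placeholders, not the team’s actual schema. The check covers only required keys and simple types; a full validator such as the `jsonschema` package would be the more complete option.

```python
import json

# Hypothetical field names for a Mayor's office ad record; the real
# schema is still being designed, so treat these as placeholders.
AD_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Ad",
    "type": "object",
    "properties": {
        "advertiser": {"type": "string", "description": "Who paid for the ad"},
        "station": {"type": "string", "description": "Broadcast station"},
        "air_date": {"type": "string", "description": "Date aired, YYYY-MM-DD"},
        "cost": {"type": "number", "description": "Cost in US dollars"},
    },
    "required": ["advertiser", "air_date"],
}

# Map JSON Schema type names to Python types for a minimal check.
_PY_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def check_record(record, schema=AD_SCHEMA):
    """Return a list of problems; an empty list means the record passed.

    Only required keys and simple property types are checked. This is
    not a full JSON Schema validator.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in record:
            problems.append("missing required field: " + key)
    for key, value in record.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append("unexpected field: " + key)
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema actually asks for a boolean.
        if isinstance(value, bool) and spec["type"] != "boolean":
            problems.append("wrong type for " + key)
        elif not isinstance(value, _PY_TYPES[spec["type"]]):
            problems.append("wrong type for " + key)
    return problems


# Made-up sample records, just to show the output shape.
print(check_record({"advertiser": "Example Co", "air_date": "2014-01-01", "cost": 100}))
print(check_record({"station": "X"}))
```

Because the schema is a plain dict, it can be written out with `json.dump` and kept as a standalone file, then loaded into a notebook for exploration.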

@bahijnyc, what do you think? Have you used this before?


I’m totally new to data handling, regex, Python and pretty much all of this stuff, so I spent the last couple of hours doing some research. I’m a bit confused by what you mean by a notebook. I agree that it makes sense to use a collaborative parsing tool like IPython, and if it facilitates the development of the application code, we should absolutely work with it.

My personal concern is that if we move the schema to a notebook without a Google Sheet equivalent, it will be inaccessible to anyone who doesn’t know how to access the data using Python. I might be the only one in that position, and I certainly don’t want to confuse things by adding a Google doc that has to be kept in sync.

For now, could you keep parsing the data with tools like IPython while we finalize the schema in a flat format? It makes sense to me to translate the schema into a rich media format once it’s better defined. Can you explain, though: would putting it in a notebook actually make it hard to access? I just don’t know.
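One possible answer to the access concern, sketched under assumptions: if the schema lives in a single JSON file, a few lines of Python can flatten it into a CSV that anyone can open or import into Google Sheets, so the flat view is generated from the same source rather than synced by hand. The trimmed-down `SCHEMA` and the `schema_to_csv` function below are hypothetical illustrations, not an agreed workflow.

```python
import csv
import io

# A hypothetical, trimmed-down schema; in practice this would be the
# team's JSON Schema file loaded with json.load().
SCHEMA = {
    "properties": {
        "advertiser": {"type": "string", "description": "Who paid for the ad"},
        "air_date": {"type": "string", "description": "Date aired, YYYY-MM-DD"},
        "cost": {"type": "number", "description": "Cost in US dollars"},
    },
    "required": ["advertiser", "air_date"],
}


def schema_to_csv(schema):
    """Flatten a JSON-Schema-style object into CSV text, one row per field."""
    required = set(schema.get("required", []))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["field", "type", "required", "description"])
    for name, spec in schema["properties"].items():
        writer.writerow([
            name,
            spec.get("type", ""),
            "yes" if name in required else "no",
            spec.get("description", ""),
        ])
    return out.getvalue()


print(schema_to_csv(SCHEMA))
```

Each field becomes one row with its type, whether it is required, and its description; descriptions that contain commas are quoted automatically by the `csv` module, so the file still opens cleanly in a spreadsheet.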