Wednesday, March 4
Hope all are staying dry in the wet weather. Just doing a quick (“scrum”) check-in to sum up the past week, identify some of the issues that are impeding our progress, and outline our next steps. I will send these weekly or biweekly, so these posts should be a good opportunity for all to catch up if you weren’t involved in all of the past discussions - and yes, future ones will be shorter. Here we go.
Many of you have already started exploring and testing the state of the current data. @mattalhonte did some preliminary analysis of the blurb that is the “Additional description text” using NLTK, looking at the structure and analyzing phrases that repeat themselves. @csudama has also been active and has converted the data into JSON, which might be easier to feed to your parsers.
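For anyone who wants to try the repeating-phrase idea on their own, here is a minimal sketch. It is not @mattalhonte's actual analysis: it swaps NLTK for the standard library (a regex tokenizer plus collections.Counter), and the function name repeated_phrases and the sample blurbs are made up for illustration.

```python
import re
from collections import Counter


def repeated_phrases(texts, n=3, min_count=2):
    """Count word n-grams across all texts; keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


# Hypothetical blurbs, invented for this example (not real notice data).
SAMPLE_BLURBS = [
    "A public hearing will be held on the proposed contract.",
    "Notice: a public hearing will be held at City Hall.",
]

if __name__ == "__main__":
    for phrase, count in repeated_phrases(SAMPLE_BLURBS, n=4):
        print(count, phrase)
```

Phrases that show up in many notices are good candidates for boilerplate a parser can anchor on. To run this on the JSON export, load the records with the json module and pass in the description strings.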
Schema standardization talk
Last week I spoke with the good people at popoloproject.com, a government schema standardization initiative, and got some valuable feedback on our proposed Public Notice Data Standard. I have integrated some of these notes under the Schema board on Trello.com.
Leading the way
Representing Commune [kuh-myoon], Mikael (the undersigned) has, since CodeAcross, taken up a bigger role on the CROW project and will lead the community development going forward. There are many overlaps between CROW and Commune (standardizing schemas, structuring notice publication, making public meetings more accessible, etc.), and we will be dedicating a substantial amount of our time to taking a lead role in realizing CROW’s objectives. The successful implementation of these will benefit us and any other person, startup or organization interested in structured public notice information.
Timeline - starting simple
As many of you may have seen, @dclark added a proposed timeline for how we can proceed with parsing the objects. The idea is to first set up the testing suite and an MVP library that “kind of works”. With this in place, we can start adding incremental changes to get better and better, and closer to our goals.
We like the approach and I have added it to Trello for the address card as a suggestion, but those working on other parsers are also welcome to draw some inspiration from it. (Be sure to read @dclark’s original post though, as my rendition does not do it proper justice.)
Anyone having issues or stuck? Add it to the issue tracker. There is a lot of expertise on this team - you will get an answer.
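To make the “tests first, then an MVP that kind of works” idea concrete, here is a rough sketch for the address card. It is only an illustration, not @dclark's proposal: the regex, the parse_address function, the output keys and the sample sentences are all assumptions.

```python
import re

# Deliberately naive MVP pattern: house number, capitalized words, street type.
ADDRESS_RE = re.compile(
    r"\b(\d+)\s+((?:[A-Z][a-z]+\s+)+(?:Street|Avenue|Road|Place))\b"
)


def parse_address(text):
    """Return the first address found as a dict, or None if nothing matches."""
    match = ADDRESS_RE.search(text)
    if match is None:
        return None
    return {"number": match.group(1), "street": match.group(2)}


# The "testing suite": (input, expected) pairs. Samples are invented.
CASES = [
    ("Hearing at 100 Example Street, Manhattan.",
     {"number": "100", "street": "Example Street"}),
    ("No address in this notice.", None),
]


def run_cases():
    """Return (text, passed) for every case so failures are easy to spot."""
    return [(text, parse_address(text) == expected) for text, expected in CASES]


if __name__ == "__main__":
    for text, passed in run_cases():
        print("PASS" if passed else "FAIL", text)
```

The pattern misses plenty on purpose, for example numbered streets such as “100 West 42nd Street”. Each miss becomes a new entry in CASES, and the parser is improved until that case passes, which is the incremental loop described above.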
Important! Break your task down this week
That is, define the user story/goal, and list 3 or more steps that you will need to accomplish that goal. This will help the others to know what you are thinking, as well as to offer their expertise on individual points. Feel free to add steps if your card has an approach - this will only spur discussion. See the schema or the address card for examples. I will come back to you to discuss some expected delivery dates.
The hacknights are a perfect opportunity to take some of your well-developed ideas and hack them out in real life. We won’t do weekly full team meetings, but you are free, nay urged, to coordinate with the other person(s) on your card on Trello and let them know that you’ll be going. (E.g. commenting “Going to hacknight tonight” on a card is enough to notify the people on your team about what you are doing.) Also, you are encouraged to meet with your partner somewhere else convenient, which I know some of you have done already. The only thing we ask is that you quickly sum up the takeaways from the meeting.
Good data card
After last week’s discussions I’m creating a new issue on GitHub dedicated to building a pipeline to connect to, and massage, the DCAS data output so it’s clean and easily accessible by parsing libraries.
Since all the libraries depend on some level of clean data, I thought we could perhaps centralize at least some of this effort. Anyone interested in taking the lead on this? All are welcome to - and probably should - weigh in on what they want the data to look like. This group will also be an important participant in the discussions with the DCAS team.
We will move a substantial part of our issue tracking and development management to GitHub. More on this, together with timelines, toward the end of next week.
Make sure you are “Watching” the City Record Workgroup on Talk. That’s the only way to make sure you don’t miss any developments, or my engaging emails. I will be following up with all of you individually in the next few days to make sure no one is in the dark.
That’s it for today! Enjoy hacknight, those of you who are going, and you will hear back from me later this week (albeit in shorter form) to discuss the delivery timetables…