Some Preliminary Analyses

I did some preliminary analyses on our data using NLTK, a natural language processing package for Python, mostly looking at phrases that repeat themselves.

I put them online in a couple of IPython Notebooks.

The first one is mostly just me messing around with NLTK functions on a small subset of the data (partly to give a tour of NLTK to folks who haven't played with it):

The second one goes deeper into the data, analyzing longer phrases across the whole body of entries:


Nice work @mattalhonte. I'm new to NLTK. Can this be used to turn the sentences/fragments into some kind of tokenized grammar or language-model tree that would give us a better chance of identifying entities, addresses, and times? Really cool stuff, man. You should do a talk on what you've learned; I'm very interested :smile:
