Parsing project published

Hey Guys,

I merged what I’ve been working on recently. Comments and feedback appreciated.

  • CROW’s parsing folder.
    • Overview and status for procPublicationRequest is here

Some data integrity concerns have popped up.

There is a fair bit of critical information missing in the DCAS data we use. The AdditionalDescription field is empty or incomplete for about 1/3 of all entries. When it is properly populated, this field is a text dump of an HTML fragment.

See this summary of null (missing) and non-null (parse-able) messages for the procPublicationRequest we’re using.
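
If anyone wants to reproduce that kind of count locally, here is a rough sketch of how it could be done. It is not the code in the parsing folder. It assumes the records are already loaded as a list of dicts keyed by field name, and `fragment_text` and `summarize_descriptions` are names I made up for illustration. Only the `AdditionalDescription` field name comes from the data.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only text nodes that contain something besides whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def fragment_text(fragment):
    """Return the text content of an HTML fragment as one string."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


def summarize_descriptions(records, field="AdditionalDescription"):
    """Count records whose description is missing, has no text, or is parseable.

    Assumption: each record is a dict; a missing key, None, or a
    whitespace-only value all count as "missing".
    """
    counts = {"missing": 0, "no_text": 0, "parseable": 0}
    for record in records:
        value = record.get(field)
        if value is None or not str(value).strip():
            counts["missing"] += 1
        elif not fragment_text(str(value)):
            # Markup is present but it carries no text at all.
            counts["no_text"] += 1
        else:
            counts["parseable"] += 1
    return counts
```

Calling `summarize_descriptions(records)` returns a dict with the three counts. Note that this only separates empty from non-empty; spotting the "incomplete" entries would need a check for the specific pieces we expect inside the fragment.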

@mikael, @bahijnyc, @bmadhusu, @mattalhonte - I’m not sure what the best way to proceed is. Any ideas? Help!