Workgroup introductions

If you are joining this group, please provide an introduction on why you want to improve MTA’s service alerts.

Hi all,

I’m Danielle, an information designer, data visualization expert and map enthusiast. I introduced this project idea during the Code Across NYC hackathon and was lucky enough to have a group join me. This is a really interesting dataset that is not currently being archived.

My goal is two-fold:

  1. To parse and save the alerts in a database, so we can explore patterns
    through a map and other data graphics; and
  2. To map service alerts in real time.

Hello, my name is Kevin Diaz. I am a freelance developer with a preference for back-end programming.

I am interested in this project because it’s important to be able to deliver this data to developers without the pain of going through hard-to-read documents.

I hope to be of help.


Hi. I’m Nathan. I’ve been working with MTA data for the last year or so, primarily to quantify bus performance. I’m interested in service alerts because they describe (in words) the issues affecting observed performance (in numbers). I’ve been scraping the service_status.txt feed since the end of November and can add to your archive, if needed. I’m using a crontab on an AWS ec2 instance to capture the data, which is then uploaded to an S3 bucket at the end of each day. How have you been capturing the data, how far back does it go, and where is it stored?
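
For anyone setting up a similar capture, a minimal sketch of the per-minute snapshot step might look like the following. It is not Nathan's actual setup: the feed URL, the folder layout, and the function names are all placeholders chosen for illustration, and the end-of-day upload to S3 is left out.

```python
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical placeholder; substitute the real service status feed URL.
FEED_URL = "https://example.com/service_status.txt"


def snapshot_path(archive_dir: Path, now: datetime) -> Path:
    """Build a per-day folder and a per-timestamp file name for one snapshot."""
    return archive_dir / now.strftime("%Y-%m-%d") / now.strftime("%H%M%S.xml")


def save_snapshot(body: bytes, archive_dir: Path, now: datetime) -> Path:
    """Write one raw feed snapshot to disk and return where it went."""
    path = snapshot_path(archive_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path


def capture(archive_dir: Path) -> Path:
    """Fetch the feed once; intended to be run from cron every minute."""
    with urllib.request.urlopen(FEED_URL, timeout=30) as resp:
        body = resp.read()
    return save_snapshot(body, archive_dir, datetime.now(timezone.utc))
```

Keeping the raw XML untouched, one file per fetch, means the parsing can be redone later as the parser improves.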

Hello! Is this project still active?

I’m an urban planner / designer / UI engineer and I’m really interested in helping out. I did freelance work for a real-time transit data startup where we had the same problems trying to parse existing MTA status data, so I’d love to be able to help.

I also have this idea for a side project – it might take a couple of different forms, but one is to create a mobile-friendly alternative to the Weekender site with this data, and the second is to create a NYC subway map that is a “live snapshot” of the system at the current time (so it only shows lines and stations that are currently in service, which would be great for weekends and when services have changed).


Hi all!

I and a few others are interested in helping out with this project. Specifically, we’d like to help out with the parsing and API design. We’ve done a small bit of work so far but wanted to understand the status of the project so that we’re not overlapping.

Would someone be able to give an update of the status and an idea of the project plan? It seems like much of the interest in the project involves using already-parsed data to design visualizations/possibly apps, so we’d love to work on exposing a better-parsed API for those kinds of projects.

We’ve been at most of the recent Hacknights to plan this part of the project, and will continue to do so. Hope to work with you all on this!


Hey all, I’m a front end developer and occasionally frustrated subway rider. I’d like to work with yall to try different ideas around using MTA notifications to help alleviate some of that frustration (generally around not knowing that one of my main lines is going to close without looking at the signs at the subway stops).

@lou I like your idea of showing an active state of the subway system!

+1 for a status update, or is there a repo we can start to investigate?


Hi everyone!

Glad to see so much interest! The last commit to the GitHub repo was in April:

@jimjshields I’d love to touch base with you on what you’ve been working on and what ideas you have. I’ll be at the next few hacknights.

One of the devs on this project reached out to the MTA about improving their feed in the following ways but, to my knowledge, we haven’t seen any changes.

Request for static lists of information

The official MTA service alert feed we’ve been using uses an internally consistent syntax to refer to station names and line directions (e.g., “Times Square-42 St” is always “Times Square-42 St” and never “42 St/Times Square”). Unfortunately these names often have subtle variations from the GTFS names which are found in stops.txt (e.g., “Broadway Junction” (feed) vs “Broadway Jct” (GTFS)).

It would be extremely helpful to have a canonical list of all the station names and their GTFS stop_ids used to generate the service status feed. As it stands, we have to wait for an issue to happen at a station in order to find out the “official” name, and then write an exception for it.
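
To illustrate the kind of exception handling described here, a minimal sketch follows. The table and function names are hypothetical; the only entry taken from this thread is the Broadway Junction pair, and a real table would grow as new names show up in the feed.

```python
# Hand-maintained exceptions: feed station name -> GTFS stop_name.
# Only the Broadway Junction entry comes from the example above.
FEED_TO_GTFS_NAME = {
    "Broadway Junction": "Broadway Jct",
}


def to_gtfs_name(feed_name: str) -> str:
    """Map a feed station name to its GTFS stop_name.

    Names without a recorded exception are returned unchanged (after
    collapsing stray whitespace), which does NOT guarantee they match GTFS.
    """
    cleaned = " ".join(feed_name.split())
    return FEED_TO_GTFS_NAME.get(cleaned, cleaned)


print(to_gtfs_name("Broadway Junction"))
```

With a canonical list from the MTA, this table could be generated once instead of being filled in one incident at a time.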

Similarly, while most trains are referred to as being “southbound” or “northbound”, some are station-bound. Once again, these names are internally consistent, but sometimes difficult to predict (e.g., an [M] bound for “Forest Hills-71 Av” is said to be “71 Av-bound”). A list of the station-bound train directions would be very useful.
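
A rough sketch of how a direction phrase could be classified is shown below. It assumes the direction has already been isolated from the alert text as a single token; the lookup table and function name are hypothetical, and the only known entry is the “71 Av” example above.

```python
# Short names used in "<name>-bound" phrases -> full station name.
# Only the "71 Av" entry comes from the example above.
STATION_BOUND_NAMES = {
    "71 Av": "Forest Hills-71 Av",
}

SUFFIX = "-bound"


def parse_direction(token: str):
    """Classify a direction token as compass-based or station-bound.

    Returns ("compass", "northbound" | "southbound"),
    ("station", <station name>), or None if the token is neither.
    """
    text = token.strip()
    lowered = text.lower()
    if lowered in ("northbound", "southbound"):
        return ("compass", lowered)
    if lowered.endswith(SUFFIX):
        short = text[: -len(SUFFIX)]
        # Fall back to the short name when no full name is recorded yet.
        return ("station", STATION_BOUND_NAMES.get(short, short))
    return None
```

Unrecognized short names fall through unchanged, so they can be logged and added to the table later.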

Potential changes to the feed

The service status feed works by overwriting a publicly accessible XML document approximately once a minute with all currently active alerts. We can easily track alerts of type “Planned Work” across this refresh because they expose their internal MTA id. Unfortunately, alerts of type “Delays” lack this id, which forces us to compare the body text of the alerts from the current minute with those of the last minute to maintain continuity.

If “Delay”-type alerts have a unique identifier that is used internally at the MTA, it would be extremely helpful if it were to be exposed in the feed, similar to the “Planned Work”-type alerts.

The lack of a unique identifier can be particularly troublesome for long-lasting delays which, for some reason, update their “Posted” time approximately every two hours. This change causes our system to identify them as two distinct alerts.
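
The continuity problem can be sketched as follows. This is not the project's actual code: it assumes each alert has already been parsed into a dict with hypothetical keys "id" (present only for “Planned Work”), "body", and "posted". Leaving "posted" out of the key is one possible way to keep a long-lasting delay from splitting in two when that value changes.

```python
import hashlib


def alert_key(alert: dict) -> str:
    """Return a stable key for an alert across feed refreshes.

    Uses the MTA id when the alert exposes one; otherwise falls back to a
    hash of the whitespace-normalized, lowercased body text. The "posted"
    value is deliberately not part of the key.
    """
    if alert.get("id"):
        return f"id:{alert['id']}"
    normalized = " ".join(alert["body"].split()).lower()
    return "body:" + hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def diff_snapshots(previous: list, current: list) -> dict:
    """Compare two consecutive snapshots and group alert keys by what happened."""
    prev_keys = {alert_key(a) for a in previous}
    curr_keys = {alert_key(a) for a in current}
    return {
        "continuing": prev_keys & curr_keys,
        "started": curr_keys - prev_keys,
        "ended": prev_keys - curr_keys,
    }
```

The trade-off is that any rewording of a delay's body text still looks like a new alert, which is why an exposed identifier would be the better fix.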

@danielle That sounds great! Unfortunately I couldn’t make tonight’s but I can give you an idea of what we’ve talked about so far.

A few weeks ago I spoke with Henry, who was also working on the project and said he had access to the app that’s scraping the data. I wasn’t able to get access (seems like Heroku might be the issue) but understood from him that the current status of the project is somewhat on hold, and the roadblock seemed to be consistently parsing the data (what you allude to in that request — inconsistent station names, directions, etc.).

Regardless of access to the app, it seems like there are a few separate pieces to the project, as we understood it.

  1. The scraping/storing of the XML in some database. This seems to be happening in a few places. Is there an ideal source for this historically, and if so, how would one get access to it?
  2. The parsing of the XML. We understood that this is somewhat in progress, but it’s hard to determine what exactly the status is. The way we saw it, this piece would take the XML, either from the feed or from a DB w/ historical XMLs, and store parsed information in a well-designed database.
  3. An API for that parsed data. This would hit that database of parsed data and could query for things like line, station, or delay type.
  4. Any apps/visualizations that communicate directly with the API.
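
To make pieces 2 and 3 more concrete, here is a deliberately simplified sketch using SQLite. The single `alerts` table, its column names, and the function names are hypothetical illustrations, not the tables the group planned.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) alerts table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS alerts (
               id INTEGER PRIMARY KEY,
               line TEXT NOT NULL,
               station TEXT,
               alert_type TEXT NOT NULL,
               posted TEXT NOT NULL,
               body TEXT NOT NULL
           )"""
    )
    return conn


def add_alert(conn, line, station, alert_type, posted, body):
    """Store one parsed alert."""
    conn.execute(
        "INSERT INTO alerts (line, station, alert_type, posted, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (line, station, alert_type, posted, body),
    )


def find_alerts(conn, line=None, station=None, alert_type=None):
    """Query alerts by any combination of line, station, and alert type."""
    clauses, params = [], []
    for column, value in (("line", line), ("station", station),
                          ("alert_type", alert_type)):
        if value is not None:
            clauses.append(f"{column} = ?")  # column names are fixed above
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = ("SELECT line, station, alert_type, posted, body FROM alerts"
           + where + " ORDER BY posted")
    return conn.execute(sql, params).fetchall()
```

An API endpoint for piece 3 would then mostly be a thin wrapper that passes request parameters through to something like `find_alerts`.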

We were most interested in parts 2 and 3, and have been discussing and planning how this would work. Here’s the (extremely simple) breakdown of the tables we’d ideally want, at least to start:

Our next step was either to get access to historical data, or start to store the XML historically ourselves. Without some store of data, it’s hard to tell how to design the tables in an ideal database.

I’d love to hear your thoughts on this — whether it’s consistent with what you’d planned or is redundant/detractive.


Where is the data?

For what it’s worth, I’ve posted the data I’m collecting to (also includes elevator/escalator status, Bus Time data, and an archive of GTFS files.)


@nathan_johnson This is way later than I’d like, but thank you so much for sharing that data — it’s exactly what we’d need. I haven’t stored any data myself, just started to write some of the parsing, so that data is extraordinarily helpful as a base from which to test parsing.


Hi All,

Great topic! I am the co-founder of an app called TravAlarm, which won two prizes in the last MTA App Quest. The latest version we are working on can pull calendar events from your various calendars and plan journeys automatically for those events.

We are starting to create a very similar service alert data set which will be plugged back in to GTFS data to create accurate alerts and a dynamic subway map. We need more people on board to help with this. Would any of you be interested?




Hi @arka777 — would be interested to hear what exactly you’d need help with. A few folks have done some work but I think it stalled a bit in the last few months.

Hi @jimjshields, thanks a lot for getting back to me. We are designing Java apps that would actually modify GTFS data based on delays and also create APIs which can report other issues. Pretty big undertaking :slightly_smiling: I would love to do a Skype call to explain what we need help with. My Skype id is arka_bala. What’s yours?

This may be of interest to the group: Citymapper now interprets Subway planned and unplanned service change text in real time and automatically applies the changes to the schedules, so disruptions are routed around: