
Four observations on using road traffic accident #openData

When I walk my kids to school in the morning we navigate a crossroad junction busy with commuters going to work, shoppers going to Asda and several buses normally arriving at once.  There aren’t any controls for pedestrians so we – and countless others throughout the day – try and make it across in the split-second between the lights changing.  

 

[Image: Hulme junction]

I wrote and spoke to the council people but got the message that they were all spent up and that it wasn’t deemed a priority.  The unsaid logic was that there hadn’t been enough serious accidents there… *shakes fist*

The next step could have been a petition and wider campaign to highlight the perils of the junction, but for now I’ve turned to the data.  I’m interested to see if the council are “right”, or at least try and have a discussion based upon some facts.  I knew that via DataGM, datasets of road accidents since 2005 had been published, so I started to take a look through them.  It’s resulted in a map of accidents in Hulme – but I wanted to share some wider concerns and observations about using opendata specifically in this post.

1 – Data use means making editorial decisions

The published datasets are quite large in terms of coverage and values. I had to think about how to segment and group the data according to the analysis I wanted to undertake – settling upon:

  • When (dates, time etc)
  • Where (geographic and administrative geography)
  • What (number of vehicles and casualties, etc).  

The wider point here is that I was already making editorial decisions on a raw dataset.

2 – Transforming and cleaning opendata takes time

I spent a lot of time in Google Refine:

  • Transforming the published Eastings and Northings values to lat/long, postcodes and then administrative geography elements (ward, local authority) – using Refine URL lookups and the uk-postcodes API
  • Building upon the published datetime stamp to isolate values for year, month and day – and also group times into arbitrary time slots through the day
  • Converting the numeric codes for each question into the text/English version – so that the end user can navigate the data easily
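The date and code steps can be sketched outside Refine too. This is only a rough Python illustration, not what I actually ran: the timestamp format, the field names and the time slot boundaries are all assumptions made up for the example. The severity labels (1 Fatal, 2 Serious, 3 Slight) follow the Stats19 coding.

```python
from datetime import datetime

# Assumed slot boundaries (start hour, label); the real slots were arbitrary too.
TIME_SLOTS = [
    (0, "night"),
    (7, "morning peak"),
    (10, "daytime"),
    (16, "evening peak"),
    (19, "evening"),
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Stats19 severity codes mapped to their English labels.
SEVERITY = {1: "Fatal", 2: "Serious", 3: "Slight"}


def time_slot(hour):
    """Return the label of the last slot whose start hour is <= hour."""
    label = TIME_SLOTS[0][1]
    for start, name in TIME_SLOTS:
        if hour >= start:
            label = name
    return label


def enrich(record):
    """Add year, month, day name, time slot and a readable severity label."""
    # Assumed input format: "dd/mm/yyyy hh:mm".
    stamp = datetime.strptime(record["datetime"], "%d/%m/%Y %H:%M")
    enriched = dict(record)
    enriched.update({
        "year": stamp.year,
        "month": stamp.month,
        "day": DAY_NAMES[stamp.weekday()],
        "time_slot": time_slot(stamp.hour),
        "severity_label": SEVERITY.get(record["severity"], "Unknown"),
    })
    return enriched


# Made-up record, purely to show the shape of the output.
example = enrich({"datetime": "14/03/2008 08:45", "severity": 3})
print(example)
```

Keeping the slot boundaries and code tables as plain data at the top makes the editorial decisions visible, which is half the point of sharing the steps.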

So – I’ve made a lot of adjustments to the original data.  Ideally, I’d like my derivation to be openly published (currently via Google Fusion Tables), but more important is sharing and attributing the steps I’ve gone through.  Again, in my usage of the open data I’m moving beyond the raw data via subjective decisions I make.  What happens to the “added value” I’m creating?

3 – Sub-datasets only tell part of the story

I’ve created a map of Hulme using the Exhibit software and scripts.  That’s all very well, but I’m aware that this area is pretty meaningless outside of local politics.  Ideally, I’d like to lift the whole dataset onto such a facet browsing platform (if someone can help with Exhibit 3.0 then please shout) – but I’m aware that people may want to split and view the data via other factors – bus routes for example?

[Image: Hulme map]

This data is derived from the Stats19 dataset, which requires each accident to be recorded in a standard way.  So far, roughly one third of the scope of this dataset has been published via my source – DataGM.  There are potentially tons of other insights to be gleaned from the details on the accident, according to the forms that are in use.

4 – Did I choose the right dataset?

It figures to look at accident data when looking at road safety – or does it?  What about data on traffic flows, bus routes, cycle and pedestrian throughput? Or wider data around local services and demographics?  At this point I start to get into the “overwhelmed by opendata” state – and cling back to my initial little map.  But how do we take this further and engage people?  I posted the map to the local email news forum – not a single response so far…..people probably have far more interesting things to do.

 

This has been a great process to get to grips with a few things personally.  In the meantime, I’ll keep on jumping the lights ….

 

Road Accident data – slow progress

At the Open Data Hackday on Saturday, I started a project I’ve long been meaning to attempt – around the road accident data that has been opened for Greater Manchester.

Context/disclosure: there is a really dangerous (imho) junction near our school, which I think merits a pedestrian crossing. I wrote to Manchester City Council, who replied that budget cuts meant it was not a priority, etc, etc.  One option was to start a petition and campaign on a single issue – the other was to take a look at the data and (perhaps) some evidence….

So – the idea is to first of all crunch the five years’ (2005-10) worth of accident data in Greater Manchester, and add information such as ward and super-output area.  With that, I’m hoping a Drupal site that can group accidents by ward would be a good starting point to look at the evidence and contact local councillors…
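To give an idea of the grouping I have in mind, here is a small Python sketch (not the Drupal build itself). The rows and the field names "ward" and "casualties" are placeholders I have made up for illustration, not real DataGM records.

```python
from collections import Counter

# Made-up placeholder rows, not real DataGM records.
incidents = [
    {"ward": "Ward A", "casualties": 1},
    {"ward": "Ward B", "casualties": 2},
    {"ward": "Ward A", "casualties": 3},
]


def totals_by_ward(rows):
    """Count accidents and sum casualties for each ward."""
    accidents = Counter()
    casualties = Counter()
    for row in rows:
        accidents[row["ward"]] += 1
        casualties[row["ward"]] += row["casualties"]
    return accidents, casualties


accidents, casualties = totals_by_ward(incidents)

# List wards from most to fewest accidents.
for ward, count in accidents.most_common():
    print(ward, count, casualties[ward])
```

Once every incident carries a ward, a table like this is the sort of thing that could go to a local councillor.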

I’ve aggregated the data together and opened all 44,389 incidents in Google Refine. As I’ve only got a lat and a long for each incident, the next step has been to use the URL function in Refine to bring in some administrative geography data.  So far, I’ve tried two services:

  1. Uk-postcodes.com returns some nice JSON that I can easily parse.  Trouble is it takes a while – around 24 hours for each “day” of incidents (so far, I’ve done Monday, Tuesday, and am 66% through Wednesday!)
  2. MapIt from mySociety is far quicker, but I’m absolutely stumped as to how to parse the resultant JSON data.  I’ve read around and asked a few people, but nothing concrete on this so far….
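On the MapIt parsing problem, one possible route is sketched below in Python rather than as a Refine expression. As far as I understand it, a point lookup returns a JSON object keyed by area ID, so the trick is to ignore the keys and loop over the values, picking out each area by its "type" code. The sample response is hand-written with made-up IDs and names, and the URL pattern and the type codes (MTW for ward, MTD for district) are my assumptions about the service, so treat this as a starting point rather than a tested recipe.

```python
import json

# Hand-written sample in the assumed shape of a MapIt point response.
# The IDs and names are invented for illustration.
SAMPLE = """
{
  "1001": {"id": 1001, "name": "Example Ward", "type": "MTW",
           "type_name": "Metropolitan district ward"},
  "1002": {"id": 1002, "name": "Example District", "type": "MTD",
           "type_name": "Metropolitan district"}
}
"""


def point_url(lat, lon):
    """Build a point lookup URL (assumed pattern; note longitude comes first)."""
    return f"https://mapit.mysociety.org/point/4326/{lon},{lat}"


def areas_by_type(payload):
    """Map each area type code to its name, ignoring the area ID keys."""
    data = json.loads(payload)
    return {area["type"]: area["name"] for area in data.values()}


areas = areas_by_type(SAMPLE)
print(point_url(53.46, -2.25))
print(areas.get("MTW"), "/", areas.get("MTD"))
```

If two areas of the same type ever came back for one point, this would keep only the last one, which seems acceptable for ward and local authority lookups.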

[Image: Refine]

So, currently at the data processing stage…. running in the background whilst I get on with other stuff.

In the meantime, Ric sent me this project from the BBC – which is a useful check:

[Image: BBC map]