Saturday, 26 February 2011

Open the door (close the door)

Open data as an overall banner is a very useful one. It says exactly what it means - opening up data (for benchmarking, performance improvement, resource allocation efficiencies etc etc).

What is becoming foggier for me as time passes is the role of all the individual stakeholders under this banner. I had to try to explain to a colleague from Policy how the open data landscape currently looks, as she will be directly involved in the Single Data List roll out once it's out of consultation and into implementation. Believe me when I say that, without a pen and paper to draw all the bubbles of the interested parties, it was quite hard.

So here's my attempt at drawing the open data landscape as it is currently. All mistakes are mine, if I insult your job title, please correct me in the comments. Because this is my post, I'm going to cover them in the order in which I became aware of them - this is not necessarily the order of creation. Apologies if you feel it's 'simplistic' - I'm writing for a different audience to the one you write for. I'm writing for the Policy people who have to get their heads around all this.

Rewired State
Starting at the end of the process, just my style. Rewired State convene hack days, and developers within those hack days, to take data which has been released by organisations of all kinds within the public sector and do something interesting with it - in the case of hack days, in one day. Prizes are usually given for the best 'hack', where hack does not mean breaking things but creating things by 'hacking' around with the data. Rewired State has recently started to employ developers, I think, though their web page makes no mention of this for some reason.

What they also have is 7 simple rules for releasing open data:
1. No PDFs
2. CSVs not Excel (I wonder how many people would know how to create a CSV without using Excel)
3. Consistent formatting of documents (can be hard when source varies from Access databases to Oracles to Sharepoint)
4. Know what data you've got/can provide access to
5. Appoint a decision maker and contact point
6. Stick to standards (which ones?)
7. Listen to feedback

I've commented on them because I want to make a point - it's not that simple for a 4000 employee organisation. It's just not. It should be, perhaps, in the eyes of the outside world, but the simple fact is, Rome wasn't built in a day, so for now you're going to need to accept our systems were not built for this, are disparate, legacy systems we often inherited from 3rd party bods (or which are currently sat with 3rd party bods) and what should be something simple, actually isn't.


Datadotgovuk

Datadotgovuk is a data signpost. It doesn't host the data, individual organisations do that; instead it simply provides the ability to signpost it and, crucially, for everyone in the public sector to do that all in one place. In theory, if you want to find some public sector data which has already been released, you should be able to locate it through datadotgovuk - and on this front I think it is a brilliant resource. It integrates with something called CKAN, which stores all the location data which people upload - so the website is almost a pretty cover on the front of a database. In addition to this, it was also supposed to be somewhere developers could come and ask for more data and show off the apps which they'd made using the data. That side of things seems to be going less well and the forums are very quiet. Some readers will know about this site as it's where we have to signpost our over £500 spend data from.

Public Data Corporation

Datadotgovuk is all about providing data for free and Rewired State hack days rely on data being free too. So the Public Data Corporation (PDC) is raising concern in some quarters, as it has not been made clear whether data which is provided to the PDC for free by the public sector will then be charged for 'in the interests of taxpayers'. We are not sure, because the existence of the PDC was announced before the actual datasets or code of conduct had been decided. Many people are worried that the PDC will crush innovation and curiosity. If a charge is imposed before it is even possible to see what a dataset contains, what format it is in, whether it's 'clean' and, finally, whether it actually contains something useful to mash up with something else, then bored data gatherers will no longer be able to simply have a wander through some data to see what possibilities it holds for making something. There is no doubt that the implementation of this needs careful examination, not only from the point of view of the taxpayer but also from the point of view of people like Rewired State. One assumes they are involved in any consultations currently happening.

Single Data List

The Single Data List is the proposed 'replacement' for the National Indicator Set (NIS), which was abolished last year along with the Audit Commission, whose job it was to monitor the NIS. There are over 400 requirements for Councils to provide data in the Single Data List, some of which directly correlate with the old NIS and some of which do not. Looking at this from a purely open data stance, there is an assumption that we will be required to publish, either centrally or locally, the detailed equations and processes behind our Single Data List results, but it is not clear whether we will be locked into an agreement with the PDC on this, and this is quite crucial. If you wish to see others' responses to the Single Data List, please Google. It's an interesting read.

Protection of Freedoms Bill - Section 92 (Release and publication of datasets held by public authorities)

Included because it affects us local govvies - it covers how we release data as a result of a Freedom of Information request and states that we will make the data permanently available electronically once someone has requested it in an FOI request. It doesn't mention whether this needs to be signposted from datadotgovuk and it doesn't mention whether it will need to be submitted to the PDC either. It also mentions that the format of the data should be 'capable of re-use', something which seems a little fluffy to me - PDFs fine then?
There's also a small issue of it mentioning a licence in 92 (3) 11A (2) (apologies if not right syntax, it's been 15 years since I did law) which is never explained, but which is quite important as it mentions copyright and us making data available under the relevant licence.

It does specify in the next sub section that:
“the specified licence” is the licence specified by the Secretary of State in a code of practice issued under section 45, and the Secretary of State may specify different licences for different

However, S45 in that Bill refers to Devolution of Scotland and Wales, while S45 in the original Freedom of Information Act, which I am told this Bill is actually amending, is 'Issue of Code of Practice' by the Secretary of State. So we appear to have a dangling reference to a 'licence', which one hopes will be clarified before the Bill becomes an Act, or there are going to be far more confused people than just me.

Code of recommended practice for Local Authorities (consultation)

Taking all the above into account, the Department for Communities and Local Government would like to know what you think about open data and local government. You have until the 14th March to contribute and, somewhat unusually I think, comments on the forum linked above will count as actual comments on the consultation and will be fed back as part of the formal consultation process.

Somewhat interestingly, there are already mutterings about the fact that it is a Recommended Code of Practice only, that there will be no penalties for not complying and that this might lead to no one actually taking the blindest bit of notice, especially when set against fewer staff and the Localism agenda. It is also interesting to note, in particular, Annex B Section 17, which states that data must be made available as quickly as possible and, if this results in errors in the data, so be it. A difficult one to reconcile when crucial financial and resource allocation decisions may be made as a result of this possibly incorrect data.

In typical style we're looping back around to the end of the process again:

Linked Gov

Linked Gov is a social enterprise and quite a clever one (I'm declaring an interest here). The idea is to acknowledge three things: that the public sector is new to data publishing on the scale which is becoming expected; that benchmarking can't happen if the data from all contributing Authorities is not comparable in format, in content and in the location of that content within the format; and that accuracy and cleanliness of the data is paramount if it is not to yield incorrect performance assessments, incorrect resource allocations, monetary issues, coding allocation issues etc. In acknowledging this, they have come up with a rather interesting way to rectify all these issues - incentivise the act of cleaning the data and interlinking the same data from different sources, and turn that action into game play, where some visible tracking and reward is possible for the voluntary effort which you contribute. It is in fledgling stages at the moment, but taking all of the above post into consideration I believe it identifies something no one else does (though the next entry covers this too), which is that data can be unwieldy, massively scaled, inappropriately presented and incredibly, horribly time consuming to clean and present in the correct format in the correct place at the correct time. And I speak from personal experience here, believe me.

There is far, far more to Linked Gov than simply this - the scale of it is breathtaking once you start to investigate and I would encourage you to do so - semantic is becoming a reality.

And to the final player in this epic post of epicness:

Making A Difference With Data

Making A Difference With Data (MADWD) was created, I think, at UK Gov Camp 2011, which I posted about before. I must confess it was completely off my radar until it launched a few days ago, so my understanding of where it fits, the reasons and motivations behind it and who funds and runs it is nil. However, the front page places it, very emphatically, in the 'doing' camp and I think we're going to see some considerable developments of data mash ups and collisions in the next few months from this site. I suspect this will also be the place you will come to in order to demonstrate to your peers in Finance and Legal what can actually be done with open data, why it matters and why it's worth the effort.

One more. Last, but definitely not least.

London Data Store and now Manchester Data Store

Not directly relevant to my colleagues and me; nevertheless, both these sites are incredibly helpful collections for developers and those who are simply 'data curious' (I'd class myself in that, I think). Collations of all relevant data to a place or city might seem obvious, but I think it is fair to say that only now do we have the scale, quality and relevance of data available to make these sites the absolute goldmines that they are. Go and have a play - yes, London and Manchester have the audience for the apps which are being created, and yes, that audience might not exist on the same scale elsewhere, but all you have to do is think on a County level instead of a city level and perhaps the potential can start to be realised. Admittedly some out of silo working will be required, but one assumes the London Data Store has been liaising with a not inconsiderable number of London Boroughs to get to where it is - the challenge is for others to step up to the plate and stake their claims as collators and diplomats.

And there you have it. The reason why an hour's lunch on Tuesday turned into 90 minutes and could have gone on far longer - so this post is an attempt at being somewhere that, at least for the next few weeks, people can point others to as a snapshot of where open data is at in the UK - before it changes and someone launches something else.

As ever, please comment if I have made any glaring omissions or mistakes. This post brought to you by @pip_cross, @emercoleman, @hadleybeeman, @paul_clarke, @annelidworm, @socialtechno - all of whom have contributed massively to my knowledge and understanding of all these sites and issues and have patiently listened or answered questions when I have been trying to get my head around it all.


  1. Good post. Couple of minor points. Re the CSV/Excel point on the Rewired State section, it reads a little as though it should be CSV files produced separately from Excel, whereas the point is that if you have Excel spreadsheets you should put them online not as Excel files but as CSV files (using Save As...).
    Also would hope OpenlyLocal is worthy of a mention, as the single largest source of local government data in the UK, with over 10,000 councillors and 1,500,000 Local Authority payments, all open data.

  2. Thanks for putting this together. It's getting re-tweeted and I hope it attracts a lot of attention.

    I thought it might be a good idea if I cleared up the mystery about CSVs. CSV stands for 'Comma Separated Values' and is, quite simply, the Easiest Structured Data Format Ever (ESDFE).

    All you have to do is
    1) put your data in a text file
    2) use one row per item
    3) put descriptive names in the top row
    4) separate values with commas!
    Like this:

    Club, Ground, Capacity
    Blackburn Rovers, Ewood Park, 31400
    Burnley, Turf Moor, 22546
    Bolton Wanderers, Reebok Stadium, 28723

    No Excel required, just a logical mind and a bit of patience, and you have data that anybody can write a program to play tunes on.

    But if you DO have a copy of Excel (or a compatible spreadsheet application), you can import your CSV file and watch it format itself into columns. This is a helpful way of checking you haven't made any mistakes.
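
    For anyone who would rather let a program do the typing, the four steps above can be sketched in Python using only its standard csv module. This is a minimal illustration, not a recommended publishing process: the names ROWS, HEADER, to_csv_text and from_csv_text are made up for this example, and the rows are simply the sample from this comment.

```python
import csv
import io

# Hypothetical example data: the sample rows from the comment above.
HEADER = ("Club", "Ground", "Capacity")
ROWS = [
    ("Blackburn Rovers", "Ewood Park", 31400),
    ("Burnley", "Turf Moor", 22546),
    ("Bolton Wanderers", "Reebok Stadium", 28723),
]


def to_csv_text(header, rows):
    """Return CSV text: descriptive names in the top row, one row per item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv_text(HEADER, ROWS)
records = from_csv_text(csv_text)

# Every value comes back as text, so convert before doing arithmetic.
total_capacity = sum(int(record["Capacity"]) for record in records)

print(csv_text)
print("Total capacity:", total_capacity)
```

    Reading the file straight back in, as the last few lines do, is the programmer's version of opening it in a spreadsheet to check for mistakes. Note that the writer puts no space after each comma; a file typed by hand with spaces would need csv.reader's skipinitialspace=True option to read back cleanly.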

  3. Thank you for your kind words and warm inclusion in this really helpful blog post. We are honoured to be in such wonderful company.

    To clarify a couple of things:

    1. Our 7 points on open data are just what developers were repeatedly saying at the end of every hack event in the Q&A. We were forever being asked to write them down, so we have. Totally not meant as gospel! But you raise some very valid points above, we will have a look at the wording of them - thanks for that.

    2. We haven't hired any developers, we run hack days - but we are massive developer fans. HUGE developer fans.

    (For future ref, if you are writing more about this and you need to ask us anything or clarify stuff that is confusing you, please feel free to use our contacts page here - you can call, email, tweet, whatever you like - or pop in for tea. But we have just redesigned the site and it might have been a bit tricky to find this page - sorry about that!)