Government Releases Spending Data

The raw data from the government's Combined Online Information System (COINS) was released on Friday.

The information can be found here.

COINS records government expenditure and categorises it by various headings, such as government department, project, account or month. The categories are fiendishly difficult to translate into meaningful, real-life things. If you'd like to know exactly how much the previous government spent on ID cards, for example, you'll be disappointed. According to The Guardian, they've cunningly hidden the figures in a general 'identity and passport' category.

To anyone familiar with local government expenditure, the released data may resemble council budget codes and expenditure records.

The data will likely provide some good information on where taxpayers' money has been spent, but it's probably vague, obfuscated and in some cases perhaps misleading.

For example, one department might spend money on assets and services under one category, whilst a different department begins monthly payments to the first department under a different spending category. A few months later the entire spend against services is mysteriously refunded, and two years later the assets are amortised. (See the Olympic funding contribution category for an example of this sort of confusion.)

It would take serious time and effort to uncover useful expenditure information from amongst the inter-departmental cost-code juggling and accountancy-speak.

Hopefully future expenditure data releases will be more straightforward: all central government expenditure over £25,000 and all local government expenditure over £500 are to be published online from November.


The Guardian's COINS data query tool

The Open Knowledge Foundation

Notes and Research

The newly published database is difficult to work with and requires access to a high-performance database system such as Oracle, MySQL, PostgreSQL or Microsoft SQL Server. There are over three million records in the files, which is more than Microsoft Excel can load and is probably impractical for Microsoft Access.
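For a first look at the data, something lighter than a full database server may be enough. The sketch below uses SQLite through Python's standard library, which is a substitute for the servers named above rather than anything the Treasury recommends. It assumes the file has already been decompressed and converted to plain ASCII, that the first line holds the column names, and that fields are separated by a single character. The "@" delimiter, the table name "coins" and the function name load_delimited are all assumptions made for illustration, so check them against the real files.

```python
import csv
import itertools
import sqlite3


def load_delimited(txt_path, db_path, table="coins", delimiter="@", batch_size=50_000):
    """Bulk-load a delimited text file into a SQLite table, every column as TEXT.

    Returns the number of rows inserted. Rows whose field count does not
    match the header are skipped rather than guessed at.
    """
    def quote(name):
        # Quote identifiers so odd column names cannot break the SQL.
        return '"' + name.replace('"', '""') + '"'

    conn = sqlite3.connect(db_path)
    try:
        with conn, open(txt_path, newline="", encoding="ascii") as f:
            reader = csv.reader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            header = next(reader)  # assumed: first line names the columns
            columns = ", ".join(quote(c) + " TEXT" for c in header)
            conn.execute(f"CREATE TABLE IF NOT EXISTS {quote(table)} ({columns})")
            placeholders = ", ".join("?" for _ in header)
            insert = f"INSERT INTO {quote(table)} VALUES ({placeholders})"

            total = 0
            while True:
                # Read in batches so a multi-gigabyte file never sits in memory.
                batch = list(itertools.islice(reader, batch_size))
                if not batch:
                    break
                rows = [r for r in batch if len(r) == len(header)]
                conn.executemany(insert, rows)
                total += len(rows)
        return total
    finally:
        conn.close()
```

Loading every column as text keeps the sketch free of guesses about the schema; numeric columns can be cast in later queries once their meaning is known.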

Even for the IT-literate, there are additional problems. Firstly, the files are compressed using a proprietary PKZip compression extension to handle the large file size. Windows users can use WinZip 9.0 to decompress them, while Linux users can use p7zip.

Once the data is uncompressed, you'll find that it's presented in Unicode UTF-16 format, despite containing no characters outside the ASCII range. This means the file size is twice as large as it needs to be: every other byte in the file is effectively meaningless and set to zero. UTF-16 of this kind can be converted to plain ASCII by removing the first two bytes (the byte order mark) and then removing every second byte. The largest file is over 4 GB, which may present problems for older filesystems such as FAT32.
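The byte-stripping conversion described above can also be left to a decoder, which is safer because it fails loudly if a non-ASCII character does turn up. What follows is a minimal sketch in Python, assuming the file begins with a byte order mark; the function name utf16_to_ascii and the chunk size are illustrative choices, not part of the COINS release.

```python
def utf16_to_ascii(src_path, dst_path, chunk_chars=1 << 20):
    """Stream-convert a UTF-16 text file (with BOM) to plain ASCII.

    The file is processed in chunks, so a 4 GB input does not need to
    fit in memory. Line endings are passed through unchanged.
    """
    # The "utf-16" codec reads the byte order mark, uses it to pick the
    # byte order, and does not include it in the decoded text.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            # Raises UnicodeEncodeError if any character is outside ASCII.
            dst.write(chunk)
```

The output should come out at roughly half the size of the input, which is a quick sanity check that the conversion did what was expected.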

The data also includes the "x number of rows affected" output from the database server at the very end of each file, which has to be removed.
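Rewriting a multi-gigabyte file just to drop its last line is wasteful, so one option is to truncate it in place. The sketch below is a rough illustration in Python under several assumptions: that the file has already been converted to ASCII, that the trailer looks like "(3000000 rows affected)" or "(1 row(s) affected)", and that it sits within the last few kilobytes of the file. The exact wording of the trailer should be checked against the real data, and the function name strip_rows_affected is hypothetical.

```python
import os
import re

# Assumed trailer shapes: "(123 rows affected)", "(1 row(s) affected)",
# with or without the brackets, followed only by whitespace.
TRAILER = re.compile(rb"\(?\d+ row(?:s|\(s\))? affected\)?\s*\Z")


def strip_rows_affected(path, tail_bytes=4096):
    """Remove a trailing 'N rows affected' line by truncating the file in place.

    Returns True if a trailer was found and removed, False otherwise.
    Only the last `tail_bytes` bytes are read, so the file size does not matter.
    """
    size = os.path.getsize(path)
    start = max(0, size - tail_bytes)
    with open(path, "r+b") as f:
        f.seek(start)
        tail = f.read()
        match = TRAILER.search(tail)
        if match is None:
            return False
        # Keep everything before the trailer, minus the blank lines
        # that separate it from the last data row.
        kept = tail[:match.start()].rstrip(b"\r\n")
        f.truncate(start + len(kept))
    return True
```

Because the function edits the file directly, it is worth running it on a copy first. Note that it leaves the last data row without a line terminator, which most import tools accept.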

The Treasury notes that accompany the data say that a more accessible version will be released in August.

Edited 07.06.10.