Converting a CSV to JSON in PHP

I recently was working on a project where I needed to take a CSV file and convert it to json with a specific structure. Whenever faced with a problem that can be solved with technology I lean on a pragmatic approach of how to accomplish it with the tools I already know. I am fairly adept at creating PHP scripts, so I chose to use PHP to write a script to make the conversion.

I could have written it to be executed once and then done, but I predicted that I will need to be able to run this logic multiple times, so I made it more robust than a one-off solution.

CSV to Array

The first step I needed was a convenient way to just convert the CSV into a plain old PHP array, since dealing with the CSV file directly is cumbersome.

Let's assume we're starting with the following CSV data:

First Name Last Name Date Age Pizza Topping
Justin Bailey 2014-09-21 25 Onion
Ricky Martin 2013-05-14 22 Cheese
Sarah Connor 2015-01-28 31 Artichoke Hearts

We want it to be converted to this:

$data = array (
    array(
        'First Name' => 'Justin',
        'Last Name' => 'Bailey',
        'Date' => '2014-09-21',
        'Age' => '25',
        'Pizza Topping' => 'Onion',
    ),
    array(
        'First Name' => 'Ricky',
        'Last Name' => 'Martin',
        'Date' => '2013-05-14',
        'Age' => '22',
        'Pizza Topping' => 'Cheese',
    ),
    array(
        'First Name' => 'Sarah',
        'Last Name' => 'Connor',
        'Date' => '2015-01-28',
        'Age' => '31',
        'Pizza Topping' => 'Artichoke Hearts',
    ),
);

A search on Google revealed a gist with a nice PHP function to convert a CSV to a PHP array. Thanks Jay Williams.

In this gist, the function csv_to_array() expects a path to a CSV file as input and it returns an array of the rows using the first row as headers. Let's copy that function and paste it into a new PHP file csv-to-json.

Here is our script so far:

<?php

$file = 'data.csv';
$data = csv_to_array($file);

function csv_to_array($filename='', $delimiter=',')
{
    // Function from gist above, copy pasted here.
}

We can run this from the command line with php csv-to-json (note the file doesn't have to have a .php extension).

Capturing the Input

Now let's make it easier to specify the file we want to convert and allow our script to be reusable.

First, let's make it so you can pass in the name of the file from the command line. To do this, we'll use the $argv variable, which is an array of arguments passed into the PHP interpreter when invoked from the command line.

The first index (0) of $argv is always the name of the PHP script. The first argument is index 1. This means if we ran the script with the command php csv-to-json mycoolfilename.csv, then the $argv variable will be populated with this:

Array
(
    [0] => csv-to-json
    [1] => mycoolfilename.csv
)

With this we can grab the name of the file we should read in as the CSV file. We'll add this to our script.

<?php

$file = $argv[1];
$data = csv_to_array($file);

function csv_to_array($filename='', $delimiter=',')
{
    // Function from gist above, copy pasted here.
}

Check an Argument is Passed to Script

To make the script smarter and easier to use we'll add an error message if there is no argument passed to the program. Add the following:

if (!isset($argv[1])) {
    print "Missing parameter\n";
    exit(1);
}

Note that if there is a problem we exit the script with an exit status of 1. Exiting with a non-zero value in a shell script means that something went wrong. So doing this is a good practice.

Check File Exists

Before we pass the file off to our conversion function, let's make sure the file exists first. Add the following to our script before the invocation of csv_to_array():

if (!file_exists($file)) {
    printf("File '%s' not found or not readable.\n", $file);
    exit(2);
}

Here we're exiting with a different non-zero value. In my head (as we're building out this script), I'm defining that an exit status of 1 means we're missing an argument and an exit status of 2 means the file could not be read.

Add a Shebang

Next let's add a shebang line for PHP (This is for Mac/Linux, sorry Windows users). What this does is tells the shell which program to use to execute this script.

Add this to the first line of the script:

#!/usr/bin/env php
<?php
// ... rest of script

With the shebang in place we can run this script without typing php first. Simplifying our execution to typing just ./csv-to-json <filename>.

The file must also be executable for this to work. Run the command chmod a+x csv-to-json to make it executable for all users.

I also added a comment block at the top of the file so we know what it is for and how to use it. Here is our script so far:

#!/usr/bin/env php
<?php
/**
 * This script is used to convert a CSV file 
 * into json format
 *
 * Run it from the command line like so:
 * $ ./csv-to-json data.csv > data.json
 */

if (!isset($argv[1])) {
    print "Missing parameter\n";
    exit(1);
}

$file = $argv[1];

if (!file_exists($file)) {
    printf("File '%s' not found or not readable.\n", $file);
    exit(2);
}

$data = csv_to_array($file);

The JSON

Now let us work on the part to convert this data to JSON. Let's pretend that we were asked to make the data the following JSON structure. A hash is calculated from the data (an md5 checksum) and the current date is inserted to let us know when the json data was generated. The people data are in a node called "persons." Additionally, the invidual data should be organized by person's name and then the date captured.

{
    "hash": "ec048083dda179acd193892afcb231b3",
    "date": "2015-05-21 11:57:59",
    "persons": {
        "Justin Bailey": {
            "2014-09-21": {
                "name": "Justin Bailey",
                "date": "2014-09-21",
                "age": "25",
                "pizza_topping": "Onion"
            }
        },
        "Ricky Martin": {
            "2013-05-14": {
                "name": "Ricky Martin",
                "date": "2013-05-14",
                "age": "22",
                "pizza_topping": "Cheese"
            }
        },
        "Sarah Connor": {
            "2015-01-28": {
                "name": "Sarah Connor",
                "date": "2015-01-28",
                "age": "31",
                "pizza_topping": "Artichoke Hearts"
            }
        }
    }
}

To tackle this challenge we are going to iterate through the data rows (in our array from the CSV) and create new StdClass() objects (PHP's 'blank' object), which works well for translating to JSON objects.

Make the Person Objects

Here is the basic logic to create the persons collection.

$persons = array();

foreach ($data as $row) {
    $person = new StdClass();

    if (!isset($row['First Name'])) {
        continue;
    }

    $person->name = $row['First Name'] . ' ' . $row['Last Name'];
    $person->date = $row['Date'];
    $person->age = $row['Age'];
    $person->pizza_topping = $row['Pizza Topping'];

    $persons[] = $person;
}

Make the Base Object

Now we will set up the elements of the base JSON object, which will store the hash, the date and the persons collection. We're calling it a $personRegistry, because this is a registry of people (with their declared pizza toppings, very important information).

$personRegistry = new StdClass();
$personRegistry->hash = '';
$personRegistry->date = date('Y-m-d H:i:s');
$personRegistry->persons = new StdClass();
$p = $personRegistry->persons;

foreach ($persons as $person) {
    $p->{$person->name} = new StdClass();
    $p->{$person->name}->{$person->date} = $person;
}

The foreach loop above correctly maps each person object to the required keys based on our JSON structure requirement (name):(date):(object).

Generate the Hash

Finally, the logic to create the hash and to output the JSON data to stdout.

To create the hash, we are going to call json_encode on the data so it is represented as a string for which we can calculate an md5 sum. This is merely used so that if the data changes, the hash would become invalid. It also let's us know quickly if there are differences in two files` data when comparing them.

// Make the entire registry into a string so we can give it a unique hash
$registryString = json_encode($personRegistry);
$registryHash = md5($registryString);

// Save the hash to the object
$personRegistry->hash = $registryHash;

// Output the registry
print(json_encode($personRegistry, JSON_PRETTY_PRINT));

Since we're outputting the JSON data, it is up to the user on how that gets saved to another file. Using a unix pipe or redirector you can pipe it to another file or process.

This entire script and the sample CSV file are available in this gist.

Try it out with the following command:

./csv-to-json data.csv > data.json

Now we will have our JSON data in the file data.json. Mission accomplished.

JMS Serializer

Creating the JSON output could also have been done using Johannes Schmitt's Serializer jms/serializer. I leave that as an exercise to the reader.