Converting a CSV to JSON in PHP
I recently was working on a project where I needed to take a CSV file and convert it to json with a specific structure. Whenever faced with a problem that can be solved with technology I lean on a pragmatic approach of how to accomplish it with the tools I already know. I am fairly adept at creating PHP scripts, so I chose to use PHP to write a script to make the conversion.
I could have written it to be executed once and then done, but I predicted that I will need to be able to run this logic multiple times, so I made it more robust than a one-off solution.
CSV to Array
The first step I needed was a convenient way to just convert the CSV into a plain old PHP array, since dealing with the CSV file directly is cumbersome.
Let's assume we're starting with the following CSV data:
First Name | Last Name | Date | Age | Pizza Topping |
---|---|---|---|---|
Justin | Bailey | 2014-09-21 | 25 | Onion |
Ricky | Martin | 2013-05-14 | 22 | Cheese |
Sarah | Connor | 2015-01-28 | 31 | Artichoke Hearts |
We want it to be converted to this:
$data = array (
array(
'First Name' => 'Justin',
'Last Name' => 'Bailey',
'Date' => '2014-09-21',
'Age' => '25',
'Pizza Topping' => 'Onion',
),
array(
'First Name' => 'Ricky',
'Last Name' => 'Martin',
'Date' => '2013-05-14',
'Age' => '22',
'Pizza Topping' => 'Cheese',
),
array(
'First Name' => 'Sarah',
'Last Name' => 'Connor',
'Date' => '2015-01-28',
'Age' => '31',
'Pizza Topping' => 'Artichoke Hearts',
),
);
A search on Google revealed a gist with a nice PHP function to convert a CSV to a PHP array. Thanks Jay Williams.
In this gist, the function csv_to_array()
expects a path to a CSV file as
input and it returns an array of the rows using the first row as headers. Let's
copy that function and paste it into a new PHP file csv-to-json
.
Here is our script so far:
<?php
$file = 'data.csv';
$data = csv_to_array($file);
function csv_to_array($filename='', $delimiter=',')
{
// Function from gist above, copy pasted here.
}
We can run this from the command line with php csv-to-json
(note the file
doesn't have to have a .php
extension).
Capturing the Input
Now let's make it easier to specify the file we want to convert and allow our script to be reusable.
First, let's make it so you can pass in the name of the file from the command
line. To do this, we'll use the $argv
variable, which is an array of
arguments passed into the PHP interpreter when invoked from the command line.
The first index (0) of $argv
is always the name of the PHP script. The first
argument is index 1. This means if we ran the script with the command php
csv-to-json mycoolfilename.csv
, then the $argv
variable will be populated
with this:
Array
(
[0] => csv-to-json
[1] => mycoolfilename.csv
)
With this we can grab the name of the file we should read in as the CSV file. We'll add this to our script.
<?php
$file = $argv[1];
$data = csv_to_array($file);
function csv_to_array($filename='', $delimiter=',')
{
// Function from gist above, copy pasted here.
}
Check an Argument is Passed to Script
To make the script smarter and easier to use we'll add an error message if there is no argument passed to the program. Add the following:
if (!isset($argv[1])) {
print "Missing parameter\n";
exit(1);
}
Note that if there is a problem we exit the script with an exit status of 1. Exiting with a non-zero value in a shell script means that something went wrong. So doing this is a good practice.
Check File Exists
Before we pass the file off to our conversion function, let's make sure the
file exists first. Add the following to our script before the invocation of
csv_to_array()
:
if (!file_exists($file)) {
printf("File '%s' not found or not readable.\n", $file);
exit(2);
}
Here we're exiting with a different non-zero value. In my head (as we're building out this script), I'm defining that an exit status of 1 means we're missing an argument and an exit status of 2 means the file could not be read.
Add a Shebang
Next let's add a shebang line for PHP (This is for Mac/Linux, sorry Windows users). What this does is tells the shell which program to use to execute this script.
Add this to the first line of the script:
#!/usr/bin/env php
<?php
// ... rest of script
With the shebang in place we can run this script without typing php
first.
Simplifying our execution to typing just ./csv-to-json <filename>
.
The file must also be executable for this to work. Run the command chmod a+x
csv-to-json
to make it executable for all users.
I also added a comment block at the top of the file so we know what it is for and how to use it. Here is our script so far:
#!/usr/bin/env php
<?php
/**
* This script is used to convert a CSV file
* into json format
*
* Run it from the command line like so:
* $ ./csv-to-json data.csv > data.json
*/
if (!isset($argv[1])) {
print "Missing parameter\n";
exit(1);
}
$file = $argv[1];
if (!file_exists($file)) {
printf("File '%s' not found or not readable.\n", $file);
exit(2);
}
$data = csv_to_array($file);
The JSON
Now let us work on the part to convert this data to JSON. Let's pretend that we were asked to make the data the following JSON structure. A hash is calculated from the data (an md5 checksum) and the current date is inserted to let us know when the json data was generated. The people data are in a node called "persons." Additionally, the invidual data should be organized by person's name and then the date captured.
{
"hash": "ec048083dda179acd193892afcb231b3",
"date": "2015-05-21 11:57:59",
"persons": {
"Justin Bailey": {
"2014-09-21": {
"name": "Justin Bailey",
"date": "2014-09-21",
"age": "25",
"pizza_topping": "Onion"
}
},
"Ricky Martin": {
"2013-05-14": {
"name": "Ricky Martin",
"date": "2013-05-14",
"age": "22",
"pizza_topping": "Cheese"
}
},
"Sarah Connor": {
"2015-01-28": {
"name": "Sarah Connor",
"date": "2015-01-28",
"age": "31",
"pizza_topping": "Artichoke Hearts"
}
}
}
}
To tackle this challenge we are going to iterate through the data rows (in our
array from the CSV) and create new StdClass()
objects (PHP's 'blank' object),
which works well for translating to JSON objects.
Make the Person Objects
Here is the basic logic to create the persons
collection.
$persons = array();
foreach ($data as $row) {
$person = new StdClass();
if (!isset($row['First Name'])) {
continue;
}
$person->name = $row['First Name'] . ' ' . $row['Last Name'];
$person->date = $row['Date'];
$person->age = $row['Age'];
$person->pizza_topping = $row['Pizza Topping'];
$persons[] = $person;
}
Make the Base Object
Now we will set up the elements of the base JSON object, which will store the
hash, the date and the persons collection. We're calling it a
$personRegistry
, because this is a registry of people (with their declared
pizza toppings, very important information).
$personRegistry = new StdClass();
$personRegistry->hash = '';
$personRegistry->date = date('Y-m-d H:i:s');
$personRegistry->persons = new StdClass();
$p = $personRegistry->persons;
foreach ($persons as $person) {
$p->{$person->name} = new StdClass();
$p->{$person->name}->{$person->date} = $person;
}
The foreach
loop above correctly maps each person object to the required
keys based on our JSON structure requirement (name):(date):(object).
Generate the Hash
Finally, the logic to create the hash and to output the JSON data to stdout.
To create the hash, we are going to call json_encode
on the data so it is
represented as a string for which we can calculate an md5 sum. This is merely
used so that if the data changes, the hash would become invalid. It also let's
us know quickly if there are differences in two files` data when comparing them.
// Make the entire registry into a string so we can give it a unique hash
$registryString = json_encode($personRegistry);
$registryHash = md5($registryString);
// Save the hash to the object
$personRegistry->hash = $registryHash;
// Output the registry
print(json_encode($personRegistry, JSON_PRETTY_PRINT));
Since we're outputting the JSON data, it is up to the user on how that gets saved to another file. Using a unix pipe or redirector you can pipe it to another file or process.
This entire script and the sample CSV file are available in this gist.
Try it out with the following command:
./csv-to-json data.csv > data.json
Now we will have our JSON data in the file data.json
. Mission accomplished.
JMS Serializer
Creating the JSON output could also have been done using Johannes Schmitt's Serializer jms/serializer. I leave that as an exercise to the reader.