quick method to clone a MySQL database

let’s say you have a MySQL database on db1.db and you want to clone it to db2.db

the “official” way to do this is to run a “mysqldump” on db1.db and then import the resulting .sql file into the db2.db server.

There are problems with this approach:

  • by default, mysqldump locks the tables it is dumping, so the source database can be blocked for writes while the dump is happening.
  • mysqldump creates files which may be many times the size of the source database’s binary files, potentially exhausting the space on your source server before it’s even done.
  • the resulting file then needs to be imported into the target server, which could take hours depending on the size.

I needed to clone some databases in a hurry that are about 20G in size. The method I used ended up taking less than half an hour to complete, and the source database (db1.db) only had to be down for less than a minute, instead of the potential /hours/ in the mysqldump method.

  1. use rsync on db2.db to copy the data directories from db1.db to db2.db:
    cd /var/lib/ && rsync root@db1.db:/var/lib/mysql ./ -rva --progress --delete
  2. use rsync on db2.db to copy binary logs from db1.db to db2.db:
    cd /var/log/ && rsync root@db1.db:/var/log/mysql ./ -rva --progress --delete
  3. repeat 1&2 (the first time around would take some time. the second time around will be quick)
  4. on db1.db, stop the database
    service mysqld stop
  5. on db2.db, repeat 1&2 one last time
  6. on db1.db, start the database again, and start the slave service if you need to
    service mysqld start
  7. on db2.db, remove auto.cnf and any innodb log files
    cd /var/lib/mysql/ && rm -f auto.cnf ib_logfile*
  8. start the database, and start the slave if needed
    service mysqld start

With the above method, your source database will be down for only a minute or so (steps 4-6).

The reason that 1&2 are repeated 3 times:

  1. clone the db1.db database from scratch. this will take a while
  2. because it took so long to run #1, there are probably a lot of changes. repeat to get those changes
  3. when you stop db1.db, the server writes some final changes to its files as it shuts down. grab those after db1.db has been stopped

The existing InnoDB log files are deleted in step 7 because they might otherwise cause the server to attempt to “fix” tables it thinks are broken. Because we did a clean shutdown in step 4, no such recovery is needed, so it is safe to delete the log files (they will be recreated automatically).

If you are doing the clone because you want to create a new slave database, then the database needs a new internal ID that it will send to the master. By deleting auto.cnf, you force the MySQL server to create a new unique ID.

front right top corner

I’ve done the back corners of the printer. Now, I can tackle the front.

The front corners are where I will put the motors that control the X/Y coordinates of the hot-end.

So far, everything I’ve printed is symmetrical, but the two belts are at different heights, so in this case, one motor will be higher than the other.

I’ve decided that the right motor (right when facing out from the printer. left when facing the printer) will be the top motor.

I’ve designed the model for this so that it can wrap over the end of the case edge (and you can screw into it) and bolt the motor into the model.

front right top corner, with model Nema-17 motor in place

front right top corner model

It’s best to print this one on its side, so that no support is needed and there is less cleanup in the space between the walls. After printing this out for the first time, I found that the wall space in my print was too tight, so I adjusted the STL file to add 1mm more space. This should not matter much.

motor and printed model added to box

back top corners

CoreXY printers have two timing belts overlaid on each other around the box. To allow the belts to move, bearings are placed in various corners. Today, I’ll tackle the back top corners of the printer box.

In the image below (taken from a scene of this video), you can see how it’s handled usually:

image showing back corners of CoreXY belt system

Because I’m trying to avoid using any rods or other forms of complex structure, I decided to come up with a printed solution that I could attach to the wooden corners of the box.

The design with bearings and a washer in place will look like this:

back top corners of print, with two bearings and a washer in place

This slots neatly over the wood at the back top corners of the box.

The design is not yet perfect. I anticipate there will be pressure towards the center of the box on the bottom bearing, so I should have screw holes at the bottom of those walls as well. But, I think this will do for the “bootstrap” printer.

An improvement I will be making as soon as the prototype is complete is to replace the metal bearings with 3d-printed bearings, like in this video. That will get me closer to having a purely 3d-printed 3d printer. Also, 3d-printed bearings will be cheaper than metal bearings, reducing the cost for future printers.

So to create the corners, we will need to print out two each of the outer back top corners, and the inner back top corners. Don’t slot them together until you have your bearings. Otherwise you will find it difficult (or impossible) to separate them without breaking them.

inner back top corner. bearings and washer go on the pole

outer back top corner. the hole on the top slots onto the inner corner’s pole to keep it still

Once your pieces are printed, place an LM8UU bearing on each pole, then a washer, and then another LM8UU bearing. Slot the bottom piece with the pole into the top piece so that the pole goes into its corresponding circular hole in the top piece. You might need to shave the top of the pole slightly to make this fit. Don’t shave too much.

Finally, place the corner pieces over the back top corners of the box and bolt them in place. For the other edge and corner pieces so far, you could use screws, but this one will need bolts because there will be inward pulling force on the pieces from the belts going through them.

where to put the inner back top corner pieces

I don’t yet have the bearings for the corners, so the photo below is of installation on one side without the bearings. When the bearings arrive, I’ll update this post.

back top corner. the belts loop around the pole on this (after bearings are added)

putting the box together

KV Printer 1 will be basically a cube with 50cm sides, giving quite a large printable area.

Obtain a 5mm plywood sheet and cut four 50cm × 50cm squares from it. These form the base and walls.

Next, we need to stick this together at the corners. To do that, print out 2 corner pieces and 6 edge pieces. Using these as templates, drill 2mm diameter holes in all corners of the wooden squares (they’ll be 20.5mm in from X and Y), then screw the squares together like in the third image below.

outer corner for 3d printer

outer edge piece for 3d printer

placement of outer corner and edge pieces

Notice that we have not yet fastened the back top edges together. That will be done in the next post.

The finished product at this stage looks like this:

printer box after installation of back bottom corner and side edge pieces

building a new 3D printer

after working with the MakiBox 3D printer for 8 months, I think I’ve learned enough about its failings to start building my own.

I’ve started building a 3D printer of my own, based on the SmartCore idea, but with enough changes that this will be my own design.

Makibox (on the right) printing out pieces for the new KVPrinter version 1. The wood on the left is for the walls and base of KV Printer 1

The MakiBox printer’s major failing (as far as I’m concerned) is in how it controls the X/Y position of the hot-end.

To do this, it has two long horizontal threaded rods, against the back wall and the left wall. These rods have long arms positioned on the threads, extending out above the print bed. Where the arms cross each other, the hot-end hangs down. Thus, the position of the hot-end can be adjusted by turning the rods.

The problem with this method is easy to see when you consider an analogy. Hold a pencil normally, and draw a 1mm line. Now, hold the pencil by the eraser end and try to draw a 1mm line. The precision is just not there. The further away from the fingers the pencil lead gets, the harder it is to control it precisely.

One solution to this which I thought of, is to use a Bowden cable (bicycle brake cables, for example) to fix the position of the arms at the screw side to the position of the arms at their opposite sides. This would work, and would increase the precision of prints drastically, but it’s a lot of work and would look ugly.

After seeing the SmartCore printer, I decided that instead of fixing what I have, I would use what I have to make a new printer. In a way, I am printing a new printer. At least, parts of one.

The SmartCore printer is based on the CoreXY positioning technology, which is similar to the Bowden solution I came up with. Here is a video showing CoreXY in motion

In CoreXY, the hot-end (or drawing thing in the video) is positioned on a moving platform. It can move in X along the platform, and the platform itself moves in Y along rods in the sides of the frame.
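The usual CoreXY relationship between the two motors and the hot-end position can be sketched in a few lines of JavaScript. This is only an illustration, not firmware code: the function names are made up, and the sign convention (which motor is A, and which direction counts as positive) is an assumption that varies between builds.

```javascript
// CoreXY kinematics sketch. All distances are belt or carriage travel in mm.
// Assumed convention: motor A moves X + Y, motor B moves X - Y.
function carriageToMotors(dx, dy) {
  return { a: dx + dy, b: dx - dy };
}

// The inverse: recover the carriage movement from the two motor movements.
function motorsToCarriage(da, db) {
  return { x: (da + db) / 2, y: (da - db) / 2 };
}

// A pure X move turns both motors the same way;
// a pure Y move turns them in opposite directions.
console.log(carriageToMotors(10, 0)); // { a: 10, b: 10 }
console.log(carriageToMotors(0, 10)); // { a: 10, b: -10 }
```

Because every move uses both motors, neither motor has to ride on the moving platform, which is why both can be bolted to the frame.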

To reduce cost in my own printer, I will replace the Y and Z rods with ledges that the platform will slide along.

My calculations suggest that the material cost of my printer will end up being below €150. If this ends up being correct, and the printer is as good as I hope it to be, then I will sell kit packages of the printer for €200.

Bill of Materials:

item                   amt  cost per piece  total
nema 17 motors         2    €18.61          €37.22
nema 17 motors         2    €12.675         €25.35
rods, 8x500mm          2    €4.58           €9.16
lm8uu bearings         12   €0.5075         €6.09
608 bearings           10   €0.237          €2.37
timing belts (meters)  5    €1.004          €5.02
controller board       1    €25.71          €25.71
ptfe bowden tube       1    €7.97           €7.97
hot end                1    €8.53           €8.53
psu, 12v 20a           1    €21.29          €21.29
Total                                       €148.71
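
If you want to adjust the list for your own build, the total is easy to recompute. The sketch below simply re-adds the figures from the table above; the variable names are my own, and the prices are the ones listed, not current ones.

```javascript
// [item, amount, cost per piece in euro], copied from the bill of materials
const bom = [
  ['nema 17 motors', 2, 18.61],
  ['nema 17 motors', 2, 12.675],
  ['rods, 8x500mm', 2, 4.58],
  ['lm8uu bearings', 12, 0.5075],
  ['608 bearings', 10, 0.237],
  ['timing belts (meters)', 5, 1.004],
  ['controller board', 1, 25.71],
  ['ptfe bowden tube', 1, 7.97],
  ['hot end', 1, 8.53],
  ['psu, 12v 20a', 1, 21.29]
];

// Sum in whole cents so floating-point rounding cannot creep into the total.
const totalCents = bom.reduce(
  (sum, [, amount, each]) => sum + Math.round(amount * each * 100),
  0
);

console.log('€' + (totalCents / 100).toFixed(2)); // €148.71
```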

I’m working on construction at the moment. I’ll write more articles as I go.

Makibox 3D Printer

I had the option to get my birthday present about two months early. Jumped at the chance.

Makibox, a 3D printer company, is selling off its entire stock of printers (makiboxclearance.co.uk), so it was a chance to get something cheap that I can hack on.

The package only took a week or so to deliver, which is much better service than I expected, based on some of the messages I’d seen online.

I bought the unheated version (here) in kit form.

It took a few hours to put the machine together. I didn’t try printing anything until the next day.

The printer works by raising and lowering a print bed (the Z axis), and moving a “hot end” around on top of that in X and Y. The hot end hangs from the centre of two crossbeams, one of which moves in X and the other in Y.

The first problem I encountered, was that when I went to print for the first time, the hot end immediately started carving a pretty pattern into the bed. The printer didn’t know where the bed was, so was lowering the hotend too far down.

This kept happening even after I used the “bed leveling wizard” in Cura, the first step of which is /supposed/ to define where the bed is. But, no matter how accurately I did the first step, it totally ignored that and reset automatically to a level where it thought the bed was a few millimetres lower than it actually was, making the hot end drive straight into the bed.

It took me a while to figure out the problem: the bed depth was “hard-coded” into the printer’s hardware. Before every print, it would raise the bed right up until the platform-raising piece on the Z axis screw touched against the “end-stop” switch at the top.

The solution to that was to glue something to the top of the platform-raising piece so it would hit the switch sooner. In the end, I glued a scrabble piece and a sim card (I had them at hand) on. This artificially lowered the expected bed depth by about 2cm, which is much more than is needed for the hot-end that comes with the printer, but is perfect for the replacement hot-end I ordered next.

The original hot-end sucks. They even say it themselves – in their words, “the standard hotend in the makibox kit is not the greatest piece of engineering ever made by man, it does have a tendency to burn out”.

The first problem I encountered with that hot-end was that it has no way of cooling off. There is an aluminium wall on one side of the base-plate, which could hold a heat-sink, but the heat-sink would be a case of “too little, too late”, as the hot-end should really be cooled right above the heating element, not 3cm above it. The problem is that when the hot-end’s heat spreads upwards, the plastic being pushed into it melts too soon, and it ends up like trying to push goo through a small hole at the bottom of a can, using a piece of spaghetti.

I /was/ going to try to solve this by wrapping some tubing around the hollow bolt above the heating element, and run water through it, but the hot-end just stopped working on me completely, so I decided to pay for a better solution.

This solution was the E3D V6 (Lite), which has a proper heatsink, and a fan.

The E3D V6 took a few days to arrive, and when it did, there were a few hours of assembly needed. The hardest part was figuring out how to connect the Bowden tube to the Makibox’s extruder. I managed this in the end by taking an M6 (I think. maybe it was M5?) nut and screwing it directly onto the end of the Bowden tube, then the new tube would dock into the extruder just like the original.

The next problem is one I’m still working on solving. The hot-end is positioned by moving two beams. The hot-end hangs from where the beams cross each other. The problem is that the beams are moved by long screws on /one end/ of the beam. The part that connects to the screw tries to keep the beams perpendicular to the screw, but it’s like trying to lift a plank of wood by lifting just one end – difficult.

The solution for this, I think, is to run some strings around a series of wheels that guide the strings such that when one end of the Y axis moves (for example), the other end is pulled by the string to keep the beams perpendicular to each other.

So, the first prints I’m doing are holders for the wheels. The prints are really terrible, as the printer is obviously not yet in perfect working order, but after I finish fixing this problem, I can print them again in better quality 🙂

Gauss gun – part 1

At the end of the last semester of Monaghan Coder Dojo, I promised the students we’d do something cool for the next series of classes. We’re going to build a Gauss gun.

A Gauss gun is a rail, which a metal projectile travels along. It has a series of electro-magnets on it. As the projectile approaches each magnet, the magnet turns on, accelerating the projectile in towards the center of the magnet.

As the projectile reaches the center, the magnet turns off, so the projectile travels through it, and on towards the next magnet.

The same trick is done a few times, accelerating the projectile more and more each time, until it finally reaches the end of the track.

The first thing I had to do was design a circuit which you can turn on electronically that will stay on, and which you can then turn off electronically. I mean, the circuit should not involve a switch that requires physical effort to turn on and off, as that may slow down the projectile.

So, the solution I came up with was:
1. a circuit which uses a transistor to turn on. This way it can be enabled by shoving a little bit of power through the transistor’s base.
2. the circuit, once completed, will feed a little bit of its output electricity back into the transistor’s base by using a capacitor to give a smooth and continuous power line.
3. to turn it off, we will short-circuit the capacitor.
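
Ignoring the electronics, the behaviour I want from that circuit is a simple latch. The sketch below models only the on/off logic in JavaScript; the class and method names are made up for illustration and nothing here simulates real voltages.

```javascript
// Behavioural model only: just the on/off logic of the circuit.
class Latch {
  constructor() {
    this.on = false;
  }
  // A pulse on the transistor's base turns the circuit on.
  trigger() {
    this.on = true;
  }
  // The feedback through the capacitor keeps it on with no further input.
  isOn() {
    return this.on;
  }
  // Short-circuiting the capacitor removes the feedback and turns it off.
  shortCapacitor() {
    this.on = false;
  }
}

const coil = new Latch();
coil.trigger();
console.log(coil.isOn()); // true
coil.shortCapacitor();
console.log(coil.isOn()); // false
```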

I did a quick “proof of concept” with an LED.

In the next article, I’ll show how to adapt this so that the “switches” are photoelectric cells, so you can turn the circuit on by disrupting one light beam, and turn it off by disrupting the next.

kbarcode part 1 – JavaScript

Short story:

Github repository for kbarcode – the JavaScript part of the solution. You can use it on its own, without needing Cordova at all.

Demo of kbarcode finding a barcode and then the barcode parameters being printed out onto the image that the barcode was found in.

Long story:

There are already a few barcode readers for Cordova. The most popular one is the official Phonegap barcode plugin, which is based on the amazingly comprehensive ZXing library of algorithms.

At FieldMotion, we were using the official plugin, but it had a few short-comings that meant we had to look for a better solution:

  • When looking for a barcode, the plugin opens up an external camera application. This means that your own application stops, the external app is started, and when you find your barcode, your app is started up again. This process is very jarring, and noticeably slow.
  • You have absolutely no say over the look of the barcode scanner.
  • If you want multiple barcodes, you are out of luck – you’re just going to have to go through the selection process manually for every one of them.

What we wanted was:

  • A small camera view to appear when we press to select a barcode.
  • To be able to style this ourselves in whatever way we want.
  • To optionally keep the scanner open after it has found a barcode, so that it can keep scanning for others if need be.
  • It must feel natural and fast.

So, we went looking.

The nearest thing to a solution that we found was a combination of two plugins – Moonware’s CameraPlus plugin, which allows the camera to be opened in the background and its photos returned to a JavaScript callback for you to handle however you wish, and Eddie Larsson’s JOB (JavaScript-only Barcode Reader).

In combination, these appear to be perfect – we could get images via CameraPlus, display them in a popup UI that could be used by the user to center the barcode, and then use JOB to scan the image and retrieve the code.

Unfortunately, this method is SLOW.

I identified two main reasons for this:

  1. Streaming images to JavaScript via a Java bridge is very slow, because the images need to be encoded in Base64 (increasing their size), and the images also need to be in high resolution in order to give the barcode reader the best chance it can get.
  2. The method that Eddie’s algorithm uses is to find the barcode in the image, no matter where it is, which involves reading the entire image. In JavaScript. Brilliant, but slow.
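
To put a number on the Base64 cost in the first point: every 3 bytes become 4 characters, so the payload grows by about a third before it even crosses the bridge. A quick check, assuming Node.js (for `Buffer`) and a made-up 640x480 gray-scale frame:

```javascript
// Assumes Node.js. The frame size is an arbitrary example, not a measured one.
const frame = Buffer.alloc(640 * 480); // 307200 raw bytes, one per pixel
const encoded = frame.toString('base64');

console.log(frame.length, encoded.length); // 307200 409600
console.log((encoded.length / frame.length).toFixed(2)); // 1.33
```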

After some wracking of the brain, I came up with this solution:

  • Tweak the CameraPlus plugin so it returns just a small image to be displayed, and also a 1px high gray-scale strip from the center of a higher-resolution image (in byte array format).
  • Write a barcode decoding algorithm that will find a barcode in a 1D array of gray-scale values, instead of a 2D image.

This worked wonderfully. We now have a very usable barcode reader that is not very laggy, and finds the barcodes incredibly quickly. We’re also only interested in the EAN-13 encoding, so we don’t need to check for other encodings.
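
Sticking to EAN-13 also gives a cheap sanity check on any result, because the thirteenth digit is a checksum of the first twelve. The following is a minimal sketch of the standard EAN-13 check, not code taken from kbarcode; the function name is hypothetical.

```javascript
// Returns true if the 13-digit string has a valid EAN-13 check digit.
function isValidEan13(code) {
  if (!/^\d{13}$/.test(code)) {
    return false;
  }
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    // 1st, 3rd, 5th... digits have weight 1; 2nd, 4th, 6th... have weight 3
    sum += Number(code[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return (10 - (sum % 10)) % 10 === Number(code[12]);
}

console.log(isValidEan13('4006381333931')); // true
console.log(isValidEan13('4006381333932')); // false
```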

The reason we chose to use a 1D strip instead of the entire image, is that if you have a UI which has a marker displayed where you want the user to put the barcode, they are psychologically inclined to do so, so you really only need to consider that single central strip, and can safely ignore the rest of the image.
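
To give a feel for what working on a single strip means, here is a rough sketch of a typical first step: turning the gray-scale values into alternating runs of dark and light. This is not kbarcode’s actual code; the function name is made up, and picking the threshold halfway between the darkest and lightest value is just a simple assumption.

```javascript
// Collapse a strip of gray-scale values into runs of dark/light pixels.
function stripToRuns(strip) {
  const min = Math.min(...strip);
  const max = Math.max(...strip);
  const threshold = (min + max) / 2; // assumption: simple midpoint threshold
  const runs = [];
  for (const value of strip) {
    const dark = value < threshold;
    const last = runs[runs.length - 1];
    if (last && last.dark === dark) {
      last.length++; // same colour as the previous pixel: extend the run
    } else {
      runs.push({ dark: dark, length: 1 }); // colour changed: start a new run
    }
  }
  return runs;
}

const runs = stripToRuns([200, 200, 90, 85, 80, 190, 200, 95]);
console.log(runs.map(r => (r.dark ? 'D' : 'L') + r.length).join(' '));
// L2 D3 L2 D1
```

The widths of those runs, relative to each other, are what a decoder would then match against the bar patterns of the encoding.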

It’s a Worker, so it runs in a separate thread to the rest of your code. No need to include it in your HTML file – just correct the reference to the file in the code example below.

Example usage:

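var kbarcode=new Worker('kbarcode.js');
kbarcode.addEventListener('message', function(result) {
  // a Worker's reply arrives in the event's data property;
  // logging it is only a placeholder for your own handling
  if (result.data) {
    console.log(result.data);
  }
});
kbarcode.postMessage({
  'cmd': 'decode',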
  'img': [43,43,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,41,41,41,41,41,41,41,41,41,41,41,41,41,41,42,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,41,41,41,41,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,39,39,39,39,39,39,39,40,39,39,39,39,38,38,38,38,38,39,39,39,40,40,40,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,40,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,40,40,40,40,40,40,39,40,40,40,40,39,39,39,39,39,39,38,38,38,38,38,38,38,39,39,39,39,39,39,39,39,39,39,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,39,39,39,39,39,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,41,41,41,41,41,42,42,42,42,42,43,43,43,43,43,42,42,42,42,42,42,43,43,43,43,43,44,44,44,44,44,45,45,45,45,45,46,46,46,46,46,47,47,47,47,48,48,48,48,49,49,49,49,50,50,50,51,51,51,52,52,52,53,53,53,53,54,54,54,55,55,56,56,57,57,58,58,59,59,59,60,61,61,62,63,63,64,65,65,65,66,66,67,68,69,70,71,71,73,74,79,115,166,191,200,202,204,204,204,205,206,206,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,207,206,206,206,206,206,206,206,205,205,205,205,203,198,184,145,93,97,143,178,181,156,105,90,114,162,175,152,105,83,81,80,79,80,80,81,88,121,168,176,141,99,86,85,85,85,98,159,189,196,198,200,200,200,198,194,175,123,95,113,162,189,196,196,195,186,146,98,96,135,174,177,147,100,91,116,171,189,195,195,191,173,127,85,81,80,79,79,78,78,84,107,163,174,157,107,83,79,79,79,79,78,78,78,78,78,85,116,167,175,155,108,88,104,156,180,171,124,95,100,147,186,195,199,201,201,201,201,202,201,199,193,172,120,91,104,150,186,196,198,199,200,199,197,191,168,101,84,82,82,82,98,149,178,175,132,94,90,123,169,179,155,101,89,108,154,174,170,121,88,89,135,170,171,137,90,85,120,165,176,156,111,85,80,80,80,80,81,82,91,131,179,191,193,193,188,167,115,90,102,155,174,168,126,89,82,82,82,82,82,83
,84,106,161,187,193,194,193,182,142,96,95,155,183,193,193,194,190,172,117,95,115,163,186,195,198,198,198,197,196,189,159,113,91,104,161,176,171,128,96,97,138,181,192,196,198,198,199,199,199,199,196,193,182,142,99,92,130,175,189,192,192,189,173,127,87,81,81,80,79,79,80,81,110,155,171,159,118,87,81,80,79,79,79,80,85,112,161,173,158,116,94,99,141,176,187,189,189,181,154,113,93,99,152,173,168,131,97,96,131,174,191,196,198,198,198,199,200,201,201,201,201,201,201,200,199,199,199,198,196,193,181,150,101,91,124,159,169,155,116,89,82,82,82,89,133,173,187,189,189,186,167,124,96,97,131,166,167,142,103,84,79,78,77,77,77,77,85,123,161,170,150,105,89,104,146,169,164,133,92,89,126,166,182,186,187,185,171,125,93,80,78,78,77,78,78,80,107,149,166,158,120,91,94,136,169,180,183,183,181,165,125,94,81,80,79,82,134,169,182,184,184,179,163,121,88,97,138,161,159,128,93,88,129,164,179,186,187,188,188,187,180,163,124,91,79,77,77,81,128,157,159,133,97,85,105,145,157,143,106,87,97,146,174,181,183,184,184,184,182,174,132,96,82,77,76,78,106,154,158,135,99,86,105,150,172,182,185,186,186,187,188,189,189,189,189,189,189,189,189,189,189,189,189,186,185,184,184,183,183,179,167,136,100,79,70,68,66,65,64,64,62,61,60,60,60,60,59,59,59,59,58,58,58,58,57,57,57,57,56,56,55,55,54,54,53,53,53,52,52,51,50,50,49,48,47,46,45,45,44,43,43,42,41,41,40,39,39,38,38,38,38,38,38,37,37,37,37,37,37,37,37,36,36,36,37,37,37,38,38,38,38,39,39,38,37,37,37,36,35,34,33,33,32,32,32,32,32,31,31,31,31,31,31,31,31,31,31,31,32,32,32,31,31,31,31,31,30,30,29,29,29,29,29,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,28,28,28]
});

In the next post in this series, I’ll upload the Cordova plugin we developed to use with this.

Why gay marriage will NOT be bad for children

I was shocked earlier today by one of my friends (now unfriended on Facebook), who argued for the No side of the marriage equality issue.

His argument was that marriage is all about children, and that if gay people are allowed to marry, then there will be a mass market for “child farm”-created children, and that children “need” both a mother and a father.

This, despite the facts that

  1. gay couples can already get children through adoption or surrogacy [1][2]
  2. there are many countries that have already legalised gay marriage and yet this has not caused a surge in “child farm” creation. [3]
  3. 12.5% of children live in a one-parent family, so if they “needed” both a mother and father, wouldn’t this “need” show itself in some way? Research shows that there is no difference between children raised by gay parents and children raised by straight parents. [4]
  4. Almost half of all children born into a straight family are from an unplanned pregnancy, but children in a gay family are always planned. [5]
  5. Children of gay cohabiting parents (remember, marriage is still illegal…) do better in school than children of straight cohabiting parents [6]
  6. Where child abuse is concerned, the parents are usually straight. In fact, “a child’s risk of being molested by his or her relative’s heterosexual partner is over one hundred times greater than by someone who might be identifiable as being homosexual” [7][8]

Despite all of these arguments, this person continued to call me unanalytical, yet refused to provide any references backing up his own version of reality.

I don’t need to surround myself with mad people. I’m already mad enough as it is.

So, I unfriended him.

References for the above:
1. http://www.irishtimes.com/news/politics/gay-adoption-law-due-before-same-sex-marriage-referendum-1.2073215
2. http://www.irishtimes.com/opinion/why-surrogacy-has-nothing-to-do-with-same-sex-marriage-1.2189717
3. http://en.wikipedia.org/wiki/Same-sex_marriage#Legal_recognition
4. http://eu.wiley.com/WileyCDA/PressRelease/pressReleaseId-67057.html
5. http://en.wikipedia.org/wiki/Unintended_pregnancy#Europe
6. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3000058/table/t1-dem-47-0755/
7. http://www.childwelfare.gov/pubPDFs/f_gay.pdf
8. http://www.huffingtonpost.com/2010/11/10/lesbians-child-abuse-0-percent_n_781624.html

Distributed File Storage, using PHP and MongoDB


  • Alice creates an entry on Server1, and uploads an image to it.
  • Bob views that entry on Server2, but can’t view the image because the server doesn’t have it.

There are a number of solutions to this.

  1. after each upload, push the new file out to all servers so they also have a copy
  2. mount an external file system, networked to all servers
  3. create a caching distributed file system centered around an external database

The first solution, ensuring that every uploaded file is simultaneously uploaded to all servers, is wrong for an obvious reason: hard-drive space. Imagine you have 20 servers and the file is likely only to ever be read on 3 of them (maybe they’re location-based?) – by uploading to all servers, you waste space, increasing storage costs and also slowing down the servers as they are busy doing work that they really don’t need to be doing.

The second solution is better – an external mounted solution such as NFS, S3QL, or Samba can store your files on file servers that are backed up and replicated, and are simultaneously available to all your web servers. But these solutions come at a huge speed cost – every file check involves network access, lock checking, POSIX compliance and other ugliness. Also, network file systems of this sort are very sensitive to network outages, however temporary they are.

The solution we will build in this article is to create an external file system that

  1. supports local caching of files for speed
  2. has immediate availability of files across all servers
  3. is “shardable”, so files only exist on servers where they are actually needed


To store the files, you need an external storage solution. For reasons that we will see later, the solution I use is MongoDB and its GridFS solution.

MongoDB is a NoSQL database that stores information as binary JSON (BSON) documents. It is extremely scalable, and shards nicely as well, allowing us to concentrate more on our application and less on database maintenance.

To store the files, we will upload them into the MongoDB network, where they will be stored as “chunks”. Retrieving and storing the files is a simple matter, as we’ll see.

Saving Files

Up until now, all your files were recorded on the system using direct access – using file_put_contents(), for example.

We need to find all instances of these calls and route them through a new function called mdbFileSet (MongoDB File Set) that will record the file as requested, but will also upload it to the database.

In most cases, this is a simple matter – if the user-files directory is $_SERVER['DOCUMENT_ROOT'].'/userfiles/', then a call such as file_put_contents($_SERVER['DOCUMENT_ROOT'].'/userfiles/'.$filename, $filecontent) will be replaced with mdbFileSet($filename, $filecontent). This is obviously more readable, and we are abstracting the user-files location as well, making it flexible.

The actual mdbFileSet() function works like this

  1. parameters are $fname and $file, which contain the filename (including the directories, delimited by ‘/’), and the file content as a string.
  2. check GridFS to see if the file already exists. If it does:
    1. delete the existing file (see Deleting Files in this article)
  3. copy the uploaded file to the local user-files location (to act as a cache)
  4. upload the file using GridFS
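
Stripped of MongoDB and the filesystem, the flow of those steps looks like this in JavaScript. It is only an illustration: `cache` and `store` are plain `Map` objects standing in for the local user-files directory and for GridFS, and the function name is made up.

```javascript
const cache = new Map(); // stand-in for the local user-files directory
const store = new Map(); // stand-in for GridFS, shared by all servers

function fileSet(fname, content) {
  if (fname.includes('..')) { // refuse path traversal attempts
    return false;
  }
  if (store.has(fname)) {
    store.delete(fname); // drop the old copy before storing the new one
  }
  cache.set(fname, content); // local copy, which acts as the cache
  store.set(fname, content); // shared copy, visible to every server
  return true;
}

console.log(fileSet('test/file.php', 'hello')); // true
console.log(fileSet('../etc/passwd', 'nope'));  // false
console.log(store.get('test/file.php'));        // hello
```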

Code for the mdbFileSet function:

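function mdbFileSet($fname, $file) {
  if (strpos($fname, '..')!==false) { // hack attempt
    return false;
  }
  global $MDBVARS;
  if (strpos($fname, '/')!==false) { // create the cache sub-directory if needed
    @mkdir($MDBVARS['cache'].preg_replace('/[^\/]*$/', '', $fname), 0755, true);
  }
  file_put_contents($MDBVARS['cache'].$fname, $file);
  $conn=new Mongo($MDBVARS['dbhost']);
  $db=$conn->selectDB($MDBVARS['dbname']); // assumed key: name of the database to use
  $db->authenticate($MDBVARS['username'], $MDBVARS['password']);
  $grid=$db->getGridFS();
  $existing=$grid->findOne(array('filename'=>$fname));
  if (!is_null($existing)) {
    $grid->remove(array('filename'=>$fname)); // assumed: remove the old copy (see Deleting Files)
  }
  $grid->storeBytes($file, array('filename'=>$fname), array('safe'=>true));
}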

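You will need to set the $MDBVARS global array before running the function. I keep mine in the server’s config.php. The functions here read 'cache' (the local user-files directory, with a trailing slash), 'dbhost', 'username' and 'password' from it, and they also need the name of the database to select. Set these to your own values.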

You can test this easily. Create a test.php file with the following code:

require_once 'php/basics.php'; // link to file containing common functions
mdbFileSet('test/file.php', file_get_contents(__FILE__));

The above code will upload a copy of the test.php file you just created, and will store a copy in your cache as well. After loading the file in your browser, you can test this by looking in your cache on the server:

[root@cp3 server]# ls userfiles/test/file.php -l
-rw-r--r-- 1 apache apache 142 Nov  6 10:36 userfiles/test/file.php

And also by logging into the MongoDB server and searching for the file:

> db.fs.files.find({filename:'test/file.php'})
{ "_id" : ObjectId("545b4f1560b99367688b456b"), "filename" : "test/file.php", "uploadDate" : ISODate("2014-11-06T10:36:05.121Z"), "length" : NumberLong(142), "chunkSize" : NumberLong(261120), "md5" : "00397d7306c53cda5ea9446d7bd62594" }

Before going any further, you should go through your code now and edit all your user-file-writing functions so they use the mdbFileSet() function. Everything should still work as before, but now, there will be a copy of each file saved in the MongoDB database as well.

Reading Files

Okay, so let’s say all your work so far in this article has been done on Server1. You now switch over to Server2 and want to open a record that includes an image uploaded to Server1. The image is obviously not on Server2, so how do we transparently download it to Server2 such that the end-user never needs to know?

For this, we will write a function called mdbFileGet (MongoDB File Get), which will retrieve it from the MongoDB server if it is not already cached locally. How it works:

  1. there is one parameter, $fname, which is the filename including the directories.
  2. if the file already exists in the local server’s cache, then return that file’s contents.
  3. otherwise, download the file from GridFS, store a copy in the local cache, and return the file’s contents.

There is an issue to do with the cache, which I’ll explain in a moment, but in the meantime, here is the code for the function:

function mdbFileGet($fname) {
  if (strpos($fname, '..')!==false) { // hack attempt
    return false;
  }
  global $MDBVARS;
  if (file_exists($MDBVARS['cache'].$fname)) { // already cached locally
    return file_get_contents($MDBVARS['cache'].$fname);
  }
  $conn=new Mongo($MDBVARS['dbhost']);
  $db=$conn->selectDB($MDBVARS['dbname']); // 'dbname' is an assumed config key
  $db->authenticate($MDBVARS['username'], $MDBVARS['password']);
  $grid=$db->getGridFS();
  $file=$grid->findOne(array('filename'=>$fname));
  if (is_null($file)) { // file doesn't exist
    return false;
  }
  $bytes=$file->getBytes();
  if (strpos($fname, '/')!==false) {
    @mkdir($MDBVARS['cache'].preg_replace('/[^\/]*$/', '', $fname), 0755, true);
  }
  file_put_contents($MDBVARS['cache'].$fname, $bytes);
  $ftime=$file->file['uploadDate']->sec; // touch() expects a Unix timestamp
  touch($MDBVARS['cache'].$fname, $ftime);
  return $bytes;
}

For an example of this in use, let’s consider an image, /userfiles/1/image.jpg that was uploaded to Server1. It’s obviously not yet on Server2, so how do we view it there?
When loading the file up (let’s say http://server2.yourcomp.any/userfiles/1/image.jpg), the server looks directly for the image, and doesn’t find it. We need to route the request through a script that makes sure the file is there before sending it back.
To do that in this case, we can use mod_rewrite so that calls to /userfiles/[whatever] are routed to something like /php/file-get.php, which handles the work.
Edit your .htaccess file, and add in something like this:

RewriteEngine on
RewriteRule ^userfiles/.*$ /php/file-get.php [QSA,L]

Now create the file php/file-get.php:

require_once 'basics.php'; // load common functions and config.php
$fname=preg_replace('/^\/userfiles\/|\?.*/', '', $_SERVER['REQUEST_URI']);
if (strpos($fname, '..')!==false) { // hack attempt
  exit;
}
$ext=strtolower(preg_replace('/.*\./', '', $fname));
switch ($ext) {
  case 'png':
    header('Content-type: image/png');
    break;
  case 'jpg': case 'jpeg':
    header('Content-type: image/jpeg'); // the registered type is image/jpeg
    break;
  case 'gif':
    header('Content-type: image/gif');
    break;
  default: // unknown extension: send no specific type
    header('Content-type: ');
}
echo mdbFileGet($fname);

You can see that most of the file’s code is actually just figuring out the mime-type to show. The downloading and showing of the file is done right at the last line.
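
If the list of types grows, the switch could be swapped for a lookup table. Here is a minimal sketch of that idea; the function name mimeTypeForFilename is made up for this example, and the application/octet-stream fallback is my own assumption rather than what file-get.php does.

```php
<?php
// Hypothetical helper: map a filename's extension to a Content-type value.
function mimeTypeForFilename($fname) {
  $types=array(
    'png'=>'image/png',
    'jpg'=>'image/jpeg',
    'jpeg'=>'image/jpeg',
    'gif'=>'image/gif'
  );
  $ext=strtolower(pathinfo($fname, PATHINFO_EXTENSION));
  // fall back to a generic binary type for anything not in the table
  return isset($types[$ext])?$types[$ext]:'application/octet-stream';
}

echo mimeTypeForFilename('1/image.JPG'), "\n"; // prints image/jpeg
```

Adding a new type is then a one-line change to the array.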

You can now transparently upload files on one server and view them on another!

In fact, once the file is uploaded, you can remove it completely from all servers, and then when you next need it, just load it up through mdbFileGet() as normal and it will download again.

The caching issue that I mentioned earlier has to do with cache invalidation. Let’s say we upload image.jpg and it is distributed to a number of servers. After a few hours, we might upload a replacement image – how do we tell the servers that the old image is invalid and it should be downloaded again?

We will start solving that in the next section.

Deleting Files

Deleting files is not as obvious as it sounds. On a one-server system, it’s simply a matter of using unlink() to remove the file, and there’s no more to be said about that.

However, in a multi-server system, we have three steps:

  1. delete the local cached file
  2. delete the database-stored file
  3. find all servers that have a copy of the file and delete the file from those servers.

#1 and #2 can be solved immediately in a very simple function:

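function mdbFileRemove($fname) {
  if (strpos($fname, '..')!==false) { return false; } // hack attempt
  global $MDBVARS;
  @unlink($MDBVARS['cache'].$fname); // step 1: remove the local cached copy
  $conn=new Mongo($MDBVARS['dbhost']);
  $db=$conn->selectDB($MDBVARS['dbname']); // 'dbname' is an assumed config key
  $db->authenticate($MDBVARS['username'], $MDBVARS['password']);
  $grid=$db->getGridFS();
  $existing=$grid->findOne(array('filename'=>$fname));
  if (!is_null($existing)) {
    $grid->delete($existing->file['_id']); // step 2: remove the database-stored copy
  }
}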

The above will delete a file from the local server and from the MongoDB database, but will not clear the file from other server caches.

To delete from the other machines, we need to set up a deletion queue, which we’ll do in the Creating File Delete Queues section.

Creating File Delete Queues

To delete files from all servers, we need to send a message to those servers to tell them to delete their local copies of the file.

Sending a message to every single server in your network is a waste of resources, as most of the servers may not actually have a copy of the file you are trying to delete.

So, we need to adapt the mdbFileSet and mdbFileGet functions so they add a record to the database telling it exactly what servers have copies of the files. This will then allow us to target just those servers and to know that we’re not wasting time.

Edit the mdbFileSet function and change this line:

$grid->storeBytes($file, array('filename'=>$fname), array('safe'=>true));

to this:

$grid->storeBytes($file,
  array('filename'=>$fname, 'servers'=>array($_SERVER['HTTP_HOST'])), array('safe'=>true));

As a test, I uploaded an image called 3184/user-photos/3184.jpg, then checked my MongoDB instance:

> db.fs.files.find({filename:'3184/user-photos/3184.jpg'})
{ "_id" : ObjectId("545b68a160b993b86b8b4567"), "filename" : "3184/user-photos/3184.jpg", "servers" : [ "cp3.myserver.com" ], "uploadDate" : ISODate("2014-11-06T12:25:05.344Z"), "length" : NumberLong(37182), "chunkSize" : NumberLong(261120), "md5" : "9def8b14cb1611097e755692d04dcbdd" }

Note the servers array in the output. As part of the file upload, we are initialising an array which states what servers have a copy of that file.

An important thing to note as well, is that in GridFS, the file is recorded in a set of chunks which are standard MongoDB documents, and the metadata of the file is recorded in another normal document. What we look at with db.fs.files.find is the metadata, not the file chunks. It would be uneconomical to store metadata within the same document(s) as the file chunks, as checking something as simple as its creation date, or the list of servers that have it, would then involve downloading the entire file.

Next, we need to adapt the mdbFileGet() function. Change the following:

touch($MDBVARS['cache'].$fname, $ftime);

to this:

touch($MDBVARS['cache'].$fname, $ftime);
$db->selectCollection('fs.files')->update(array('filename'=>$fname),
  array('$addToSet'=>array('servers'=>$_SERVER['HTTP_HOST'])));

In this, we inline-update the server array that we created in mdbFileSet(). There is no need to download, change, and re-upload the record. In fact, there is a race condition there, in that some other server may be doing the same thing at the same time. It is safer to have the MongoDB server handle the update of the document directly.
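
To make the race concrete, here is a small simulation that uses plain PHP arrays in place of the stored document, so no MongoDB is involved. The addToSet function is a stand-in I wrote to mimic a server-side add (what MongoDB's $addToSet operator does), and cp5.myserver.com is a made-up third server.

```php
<?php
// Plain arrays stand in for the stored document; no MongoDB is involved here.
$stored=array('servers'=>array('cp3.myserver.com'));

// Read-modify-write: two servers read the same copy, then each writes back.
$copyA=$stored['servers'];
$copyB=$stored['servers'];
$copyA[]='cp4.myserver.com';
$copyB[]='cp5.myserver.com';
$stored['servers']=$copyA;
$stored['servers']=$copyB; // overwrites the first write
$lost=$stored['servers'];  // cp4 has been lost

// Server-side add: each request only names the value to add.
function addToSet(array $doc, $field, $value) {
  if (!in_array($value, $doc[$field], true)) {
    $doc[$field][]=$value;
  }
  return $doc;
}
$doc=array('servers'=>array('cp3.myserver.com'));
$doc=addToSet($doc, 'servers', 'cp4.myserver.com');
$doc=addToSet($doc, 'servers', 'cp5.myserver.com');
$kept=$doc['servers']; // all three servers are recorded

print_r($lost);
print_r($kept);
```

In the first half, whichever server writes last wins. In the second half the order of the two requests does not matter, because neither of them ever sends the whole array back.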

If you then open the image on another server and check the file again on the MongoDB server, you’ll see something like this:

> db.fs.files.find({filename:'3184/user-photos/3184.jpg'})
{ "_id" : ObjectId("545b68a160b993b86b8b4567"), "filename" : "3184/user-photos/3184.jpg", "servers" : [ "cp3.myserver.com", "cp4.myserver.com" ], "uploadDate" : ISODate("2014-11-06T12:25:05.344Z"), "length" : NumberLong(37182), "chunkSize" : NumberLong(261120), "md5" : "9def8b14cb1611097e755692d04dcbdd" }

Note that the servers array has an extra entry in it, but nothing else was touched. Exactly what we want.

Next, we need to adapt the mdbFileRemove function, so it builds the queue of files to delete (and what servers to delete them from).

To do that, change the following:

  if (!is_null($existing)) {

to this:

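  if (!is_null($existing)) {
    if (isset($existing->file['servers'])) {
      $servers=$existing->file['servers'];
      $idx=array_search($_SERVER['HTTP_HOST'], $servers);
      if ($idx!==false) {
        unset($servers[$idx]); // the local copy is handled directly
      }
      $list=array_values(array_map(function($server) use ($fname) {
        return array('filename'=>$fname, 'server'=>$server);
      }, $servers));
      if ($list) { // an insert command with no documents is rejected
        $ret=$db->command(array('insert'=>'deletes', 'documents'=>$list));
      }
    }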

This code inserts an entry into a db.deletes collection on the MongoDB server for every server that has a cached copy of the file. Of course, it removes a reference to the local server before doing so, as we can handle that immediately.
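
The list-building step can be tried on its own, without a database. This is only a sketch: buildDeleteList is a hypothetical helper name, and it takes the local hostname as a parameter instead of reading $_SERVER['HTTP_HOST'] so that it can be run from the command line.

```php
<?php
// Hypothetical helper: one queue entry per remote server holding a cached copy.
function buildDeleteList($fname, array $servers, $localHost) {
  $list=array();
  foreach ($servers as $server) {
    if ($server===$localHost) {
      continue; // the local copy is removed directly, so it needs no queue entry
    }
    $list[]=array('filename'=>$fname, 'server'=>$server);
  }
  return $list;
}

$list=buildDeleteList(
  '3184/user-photos/3184.jpg',
  array('cp3.myserver.com', 'cp4.myserver.com'),
  'cp3.myserver.com'
);
print_r($list);
```

Each element of the returned array has the same shape as a document in the deletes collection.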

After doing an update of the image on cp3.myserver.com, I then checked the MongoDB deletes collection:

> db.deletes.find()
{ "_id" : ObjectId("545b7f6165f402bccee49573"), "filename" : "3184/user-photos/3184.jpg", "server" : "cp4.myserver.com" }

This means we can now work on the next part: writing a deletion daemon.

Running a File Deletion Queue

We now have a list of the cached files and the servers that have them. But how do we tell those servers to delete those cached files?

A way to do this is to write a cron job that runs every minute and checks the MongoDB deletes collection to see if there are any cached files that need to be deleted, then call those servers and tell them to delete the files.

This script will need to run directly on the MongoDB server, so install PHP on that server. In particular, you will need the command-line version of PHP. On CentOS 7, it is installed like this:

[root@mdb1 ~]# yum install php-cli php-devel php-pear gcc openssl-devel
[root@mdb1 ~]# pecl install mongo
[root@mdb1 ~]# echo "extension=mongo.so" >> /etc/php.ini

On the MongoDB server, create a user called mongo (useradd mongo), and create a file called /home/mongo/checkCaches.php:

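$conn=new Mongo($MDBVARS['dbhost']);
$db=$conn->selectDB($MDBVARS['dbname']); // 'dbname' is an assumed config key
$db->authenticate($MDBVARS['username'], $MDBVARS['password']);
$fdata=$db->deletes->find();
while ($d=$fdata->getNext()) {
        $time=time();
        $md5=md5($d['filename'].'|'.$time.'|'.$MDBVARS['apikey']); // '|' separator is an assumption
        $ch=curl_init('http://'.$d['server'].'/php/cacheClear.php'); // http scheme is an assumption
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, array(
                'filename'=>$d['filename'], 'time'=>$time, 'md5'=>$md5));
        $ret=curl_exec($ch);
        if ($ret=='ok') {
                $db->deletes->remove(array('_id'=>new MongoId($d['_id'])));
        }
}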


The $MDBVARS array is almost the same as those on the application servers. We add a new item, though, apikey, which helps us provide some authentication without needing usernames and passwords. By running the filename, the time, and the apikey through an MD5 function, we create a value that can only reasonably be reproduced by another MD5 function that knows the same details. So, we send the filename, time and MD5 result through to the target server, and if the target server can reproduce the MD5 result by MD5ing the filename, time, and its own copy of the apikey, then that’s enough proof that the call is valid.

Make sure to add the apikey entry to all your servers’ $MDBVARS arrays.
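
The signing and checking can be sketched as a pair of small functions. The function names, the '|' separator between the fields, and the example key are all assumptions made for this illustration; the only requirement is that both sides combine the three values in exactly the same way.

```php
<?php
// Hypothetical helpers; the '|' separator between fields is an assumption.
function cacheClearToken($fname, $time, $apikey) {
  return md5($fname.'|'.$time.'|'.$apikey);
}

// The receiving server recomputes the token with its own copy of the key.
function cacheClearTokenIsValid($fname, $time, $md5, $apikey) {
  return cacheClearToken($fname, $time, $apikey)===$md5;
}

$apikey='example-shared-secret'; // placeholder, not a real key
$time=1415276705;
$token=cacheClearToken('3184/user-photos/3184.jpg', $time, $apikey);

var_dump(cacheClearTokenIsValid('3184/user-photos/3184.jpg', $time, $token, $apikey)); // bool(true)
var_dump(cacheClearTokenIsValid('other.jpg', $time, $token, $apikey)); // bool(false)
```

The strict === comparison is deliberate: with a loose comparison, PHP can treat two different hashes that both look like numbers (for example, ones starting with 0e) as equal.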

On the target server, then, we create the /php/cacheClear.php file:

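require_once 'basics.php';
$fname=$_REQUEST['filename'];
$md5=md5($fname.'|'.$_REQUEST['time'].'|'.$MDBVARS['apikey']); // must match the sender's recipe
if ($md5!==$_REQUEST['md5']) {
  echo 'incorrect API key';
  exit;
}
if (strpos($fname, '..')!==false) { exit; } // check for hacks
@unlink($MDBVARS['cache'].$fname); // remove the cached copy, if there is one
echo 'ok';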

As usual, there is a potential flaw to consider. The checkCaches.php file on the MongoDB server goes through every delete entry in the database, but what if this takes more than a minute to finish?

If it takes more than a minute to finish, and the script is being called once a minute, then eventually, the server will have multiple copies of the script running against overlapping lists of files, and it will crash.

The solution to this is simply to add a timeout to the script, so it runs for 55 seconds (say) and then stops.

In checkCaches.php on the MongoDB server, change the following:

while ($d=$fdata->getNext()) {

to this:

$now=time(); // when this run started
while ($d=$fdata->getNext()) {
  if (time()-$now>55) { break; } // out of time; the next run picks up the rest

Now it will simply stop after second 55, and continue when it is called again.
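
The same time-budget idea can be tried in isolation. In this sketch, processWithBudget is a hypothetical helper that works on a plain array rather than a MongoDB cursor, and the handler just records what it was given.

```php
<?php
// Hypothetical helper: handle queued items until the time budget is used up.
// Returns the items that were not reached, to be picked up by the next run.
function processWithBudget(array $items, $handler, $budgetSeconds) {
  $start=microtime(true);
  while ($items) {
    if (microtime(true)-$start>=$budgetSeconds) {
      break; // out of time; leave the rest queued
    }
    call_user_func($handler, array_shift($items));
  }
  return $items;
}

$handled=array();
$left=processWithBudget(
  array('a.jpg', 'b.jpg', 'c.jpg'),
  function($item) use (&$handled) { $handled[]=$item; },
  55
);
print_r($handled); // all three items, as they take far less than 55 seconds
print_r($left);    // nothing left over
```

The check happens before each item, so a run can overshoot the budget by at most the time one item takes.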

Edit cron for the mongo user (su mongo -c "crontab -e"), add this line, then save the file:

* * * * * php /home/mongo/checkCaches.php >/dev/null 2>/dev/null

That’s it! You now have a working distributed filesystem.