Open document formats and why we need them

The whole world is moving to open document formats for the obvious reason that in order to read and write documents one shouldn’t need to waste money on Microsoft’s increasingly ungainly Office application when perfectly good free and open source alternatives exist (thinking particularly of the mighty LibreOffice). The UK Government has formally adopted an open standard for all of its documents.

Despite this many public bodies in the UK, including the NHS, councils, and many other organisations, continue to use Microsoft’s proprietary formats for documents. It may seem to the casual user that the arguments for proprietary versus open document formats are rather arcane. I would like to tell a story from my own life that illustrates why in actual fact every citizen and certainly every public body should be interested in open document formats.

My wife has just applied for a job. The application process involved filling out a Microsoft Word document. Naturally, she wanted to appear efficient and professional and so was concerned that if she used LibreOffice the tables within the document would be mangled. We don’t own a copy of Microsoft Word. I consider it to be a waste of money and besides object to funding Microsoft since they have done so much to oppose open standards.

As a consequence, I have spent three hours, which I could have spent working, or looking after my family, or just relaxing from doing either, using a huge pile of technology (a Citrix enabled Word program that I can access through my job, Google’s own document preview function in their email, a .pdf to .docx converter, and LibreOffice) to ensure that the document renders nicely.

This was further complicated by the fact that the document itself, in its original state, was badly formatted, and so I had to ensure that the bad formatting left in the document really was in the original or if we had introduced it using LibreOffice.

I’m a computer programmer fluent in R, moderately proficient in PHP/ MySQL, and manage two Linux servers. Not only that but I have extensive experience with Microsoft Office from my PhD. I actually made the document better.

Not everybody will want to or be able to pay for Microsoft Office. Not everybody will know somebody who can fiddle around with the huge pile of technology necessary to make the form look reasonable. But I think it’s pretty obvious that we don’t want to make job decisions based on an application form’s looking unprofessional because the applicant didn’t have Office and wasn’t married to a geek.

And that, ladies and gentlemen, is why we need open document standards.

Backing up a remote MySQL database on a remote server using cron

This is more a collection of links, really, than a proper blog post, but I just spent quite a while figuring out how all of these different technologies all fit together to do the job I wanted so if you want to do what it says on the tin of this blog post then you should find it useful.

I have a remote server running a MySQL database. I wish to backup this server daily. I would rather do this on my other remote server than on my local machine because, obviously, my server is always switched on whereas my local machine isn’t. So the task is to dump a MySQL database to a file and upload it to a remote server.

To begin, you’re going to make sure that the server with the MySQL database on it can access the other server. This is easiest using ssh key login. I use this for all my servers and all of my local machines. There is a summary here. The beauty of using a key based login is it eliminates the need for passwords, which is not only secure in general but also means you do not need to store passwords in cron scripts or anywhere else on the computer which is highly insecure. Unless you really know what you’re doing you should never do this.

Next you’ll need to create a user on your MySQL and give it SELECT privileges so it can read the data to make the backup. This is very simple

GRANT SELECT, LOCK TABLES ON mydatabase.* to ‘backup’@’localhost’ IDENTIFIED BY ‘password’;

Now you need to set up a cron job to back up the database every day. There’s some stuff about it here, but you won’t need the bit about passwords because we’re using key authentication. As I mentioned above, it’s easier and safer. We’re also going to add some terminal magic to automatically change the file name every day to the current date so you can have daily backups on the server. Here’s the finished cron entry.

0 1 * * * mysqldump -u backup SUCE | ssh chrisbeeley@ “cat > ~/backupDB/$(date +\%F)_SUCE_DB.sql”

This runs the backup operation every day at 1am.

You’ll need to clean up the old files occasionally, otherwise your server will get full of very old copies of the database. The following command will remove all files from the directory that are older than 31 days old:

find ~/backupDB -type f -mtime +31 -exec rm {} +

BE VERY CAREFUL with this command. If you run it with the wrong directory or have any strange errors with it you may find yourself deleting a lot of files that you did not want to delete. ALWAYS be safe and test by running find ~/backupDB -type f -mtime +31 first and reviewing the files which come back. Only when you’re happy that it’s finding the right files should you add -exec rm{} +

I haven’t added this to a cron job, partly because it doesn’t seem worth the risk (the command is safe in my environment today, but always be wary of PEBKAC, even for yourself), and also because it means I can choose when to flush the old database files, just in case I find some current problem in the database which I can fix with an older version.

Make your own HTML tables in R

I’ve avoid writing HTML tables by hand in R, reasoning that it would be far too complicated. But the problem with using packages like pander and rmarkdown, much as I love those packages, is you can’t fine tune the outputs. So if a colleague asks you to add little extra touches for a report you can’t really do it.

So it was a nice surprise to find that it’s actually not that hard. Here’s a table with 3 columns which looks very nice with bootstrap (don’t know what the styling options are on WordPress so it just looks unformatted here)

Service quality 95 90
Promoter 90 85
SUCE returns 1000 1200


paste0(c("<table class='table table-striped'>", 
  "<th>Score</th><th>Current quarter</th><th>Previous quarter</th>",
  "<tr>", paste0("<td>", c("Service quality", 95, 90), "</td>"), "</tr>",
  "<tr>", paste0("<td>", c("Promoter", 90, 85), "</td>"), "</tr>",
  "<tr>", paste0("<td>", c("SUCE returns", 1000, 1200), "</td>"), "</tr>",
  "</table>"), collapse = "")

Obviously you can make it dynamic pretty easily, the table above is from a Shiny app I’m currently developing, I’ve just replaced the numbers with static values for the blog post. So the next time someone asks me to style a table it should be pretty easy.

G++ error compiling on R

I was installing R packages (the httpuv package, but I imagine others would generate the same problem) on my Ubuntu server today and I got a very strange error message:

g++: error: Welcome: No such file or directory
g++: error: to: No such file or directory
g++: error: R!: No such file or directory

It was a problem with my file. That’s why it looks funny, because R is saying “Welcome to R”, but for some reason this is messing with g++. In my case it seemed to be generated by running R as sudo, but I need to do that to make the package library writable. Just invoke R at the terminal with:

R --no-site-file

and it will be fixed.

Sharing files from host machine on virtualbox

I’ve really started getting into virtual machines recently. Firstly because I like to try different flavours of Linux and constantly reinstalling partitions is starting to get a bit of a drag, especially when it often takes ages to get the graphical desktop to work because of my silly graphics card. Secondly, and more importantly, because I’m finding if you want to do programming, or especially write about programming, you often need a “clean” system or a system set up a particular way to test things and work on multiple things at once. So the code I write for Shiny is based on the newest 0.12 version. However, my server is all the way back on 0.7 for various reasons to do with browser compatibility. So this means I need a local 0.7 installation to test programs that I’m updating on the server. Or sometimes it can be important to check what the dependencies for your programs are. You may have installed something a year ago that makes everything work beautifully that you’ve long since forgotten about, and if you tell someone else to do what you’re doing it won’t work on their machine. In the Linux world technologies like Docker which package up whole environments at the operating system level and help you to manage dependencies (amongst other things) are starting to spring up and in the R world things like packrat roll up packages into bundles in order that all package versions are the same and dependencies satisfied.

I haven’t learned those technologies yet so I’m going to stick with virtualisation for now (Docker is probably a bit of a sledgehammer to crack a nut for me, anyway, really).

I am mainly making this post because I keep forgetting how to access the host file system on the guest system, so I’ll be able to refer to it in future. I hope it helps you. It’s based on Ubuntu. Some of it is probably similar in other OS’s, I really have no idea I’m afraid.

Go to Devices (on the Virtualbox host machine, i.e. on the window that contains the graphical environment of the guest machine) and select Insert Guest Additions CD image. It will be mounted for you. You need to find out where it’s mounted so you can access it from the terminal. Mine mounted at /media/chris/VBOXADDITIONS_4.3.10_93012/.

Run the Linux script within the mounted folder, on my machine it was:

cd /media/chris/VBOXADDITIONS_4.3.10_93012/

sudo sh ./

Presuming this works okay, reboot the virtual machine, and now select Devices… Shared folder settings on the host machine. Click “Add shared folder” over on the right. Find the folder and click “Auto-mount”. Select “Read only” (a lot of people seem to do this, don’t know why, seems safer anyway).

If you restart the folder should now be in /media. If you get “Unknown filesystem”, “Cannot open folder”, or something like that then you need the advice from here.

From the VBox User Manual:

Access to auto-mounted shared folders is only granted to the user group vboxsf, which is created by the VirtualBox Guest Additions installer. Hence guest users have to be member of that group to have read/write access or to have read-only access in case the folder is not mapped writable.

Therefore run:

sudo adduser yourusername vboxsf

Restart and it should all work. I restarted the virtual machine after each stage, just in case. If it still doesn’t work, sorry, off to the forums with you.

Single file Shiny app to take GET parameters and return date range input boxes

I’ve spent quite a long time fiddling around with some code for the feedback website for which I am responsible for all the reporting so we can distribute custom URLs to individuals who have specific reports they want to run. We use Shiny to do custom reports, the application lives here, and many of the people who run the reports (our staff in Nottinghamshire Healthcare NHS Trust, mainly) would welcome the ability to have the date range preselected by means of a URL containing GET parameters which they could just click so that the date range is pre-selected.

The code is not very well commented and does not handle bad URLs very well. In time I may fix this or I may just tell people that if they can’t figure out how to write well formed GET querys that I will make some for them.

Nonetheless, I thought it would be useful to share in case anyone else is doing something similar and wants to get started or steal some of the code. It’s all hosted as a single file shiny app on Gist here which means that by the magic of Shiny you need only run:

runGist(gist = "79a592a83ccda153efae")

for a demo. You will need to paste the correct GET parameters in yourself, obviously. The app does one of three things, all selectable using a type=”xxx” GET parameter. There follows a list with a URL that does something for each one. The output to the right of the date widget is just some debugging stuff so I can see what’s happening as I go along, you may wish to keep something like that in anything you write while you’re building it.

You can run it on your own machine or I host it on my server, if you run on your own machine just add to the end of the address bar everything after and including the “?”, or click the link to see it on my server. It looks a bit funny because I have an old version of Shiny on the server, I can’t upgrade at the moment because of various issues with IE7 etc. but hope to soon. It works, anyway, which is the point, and looks fine on the newest version of Shiny (0.12 as I write this).


This is the simplest one, you just put the date (begin and/ or ending) straight in as dmy format (which UK people will prefer, more on which later) like this:

Missing out either is fine, if you miss the beginning one it will just go to a year ago and if you miss the end one it picks today.


This is for people who want to find the previous n months, rounding to months (e.g., today, in September, n = 3 would bring back the three months previous to September- June, July, and August).


And lastly and most complicated-ly, people who want to use just quarters and years. It took me ages to debug this because the quarter() function from the marvellous lubridate package returns January as 1, April as 2, etc., which apparently is the US way of doing things, whereas in the UK January is 4 (of the previous year, obviously), April is 1, July is 2, etc. I never had any idea there were different systems. In fact, there are different systems all over the world. Amazing what you learn some days.

Converting the output of quarter() to what I want entailed moving from a year that cycled 1, 2, 3, 4 to a year that cycled 4, 1, 2, 3 and so I used our old friend modulo arithmetic to convert between the two (%% in R). I won’t say anything further about that because I’m completely hopeless when it comes to modulo arithmetic and if I’m honest the code came about more through trial and error than anything else.

And that’s basically it. Browse the code, run the application, I hope it’s of some use to someone looking to do something similar.

Producing headings in Word files programmatically using knitr

I love knitr. Sometimes I have to send people documents in Word. It works beautifully except when I want to produce headings programmatically in code using cat(). This is fine in HTML:

sapply(headings, function(x){
  cat(paste0("<h1>", x, "</h1>"))
  cat(paste0("<p>", rep("Some text here", 10), "</p>"))

However, when writing Word .docx files using Rmarkdown the heading tags are not recognised. Well, they’re not printed, so they obviously are recognised, but they don’t work.

I’ve struggled for a long time to understand how to produce headings in Rmarkdown and to be honest in the past I’ve actually done it with a lot of guesswork and some silly hacky unnecessary code. Today I have finally figured out a foolproof way to do it very easily every time, so I present the whole document to you and to my future self when I inevitably forget how to do it:

title: "Minimal example"
author: "Chris Beeley"
date: "1 September 2015"
output: word_document

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)


```{r, results = 'asis', echo = FALSE}

headings = letters[1:4]

  sapply(headings, function(x){
    cat(paste0("#", x, "\n"))
    cat(paste0("<p>", rep("Some text here", 10), "</p>"))


That’s it! Don’t forget the invisible() because otherwise if you use lists and things like that you’ll get the names of the list at the end.

Digikam crashes on startup

To be honest, this post is really designed as a reminder to me in case I ever reinstall my OS, but it will probably get a bit of Google juice and could possibly help somebody else.

If you’re using Digikam (a fine piece of software for organising your photos, well worth a look) and you find that it crashes on startup with Linux Mint, then

sudo apt-get install kdelibs-bin kdelibs5-data kdelibs5-plugins

Convert pdf to text on Ubuntu

We had rather an ugly scanned pdf of a very lovely poem over on our feedback website so I thought I would try to post the text from it using Optical Character Recognition using tesseract.

On Ubuntu start with:

sudo apt-get install tesseract-ocr imagemagick

You’ll need to convert the pdf to an image file:

convert -density 600 input.pdf output.tif

and then output to output.txt like this:

tesseract myscan.png out

Do note that if you have problems where an empty text file is returned (as I did) this could be because the margins around the image are too large- crop it down (not too tightly) and it should work nicely (although I would say it returned “l” a lot when in natural English that is obviously going to be an “I”, seems kind of obvious to a user but there must be some technical problem there, I suppose).

Five tips for five years

I saw some wonderful tweets the other day where the author gave their former naive self some time-travelling advice from their present self about how best to get started with programming. I did intend to keep the link but sadly my filing system has failed and it’s fallen through the cracks. I’m grateful to the author, in any case, for inspiring this post, written on the fifth anniversary of my own initial foray into the world of programming. I will give myself five pieces of advice for five years, also because five is a nice round-ish number.

1. It may sound obvious, but it’s worth saying and it’s probably the most important of the points in this post- write code. Write code to solve problems you face in your work. Write code to find out why the code that you’re writing doesn’t work. Write code to make sure that the code that you’re writing gives you the right answer. Write code for fun. Write code to see if you can do something with code instead of by hand. Write code to write “Hey, look what I did” blog posts. Write code to produce minimal examples so you can get help on forums. Writing code is an art (we’ll come back to the points in this post that caution against churning our code that “just works”), and, like all arts, you need to log the hours. So log the hours.

2. Following on from this, and as espoused by the pre-eminent stats guru, Hadley Wickham, never be ashamed of your code. The code I wrote five years ago is laughably bad. At a glance it’s quite obvious I had no idea what I was doing. Many of the functions are the wrong functions. It’s by turns ridiculously verbose and frustratingly taciturn. It’s badly commented and baffling to me, never mind about any other poor individual who might have to maintain or change it. But you learn by experience and coming back to all this dreadful code taught me WHY it’s important to write legible, well commented code and WHY it’s important to refactor code and WHY it’s important to use the paradigms in the language you’re writing in (functions, objects…).

3. At the same time as you’re writing code that solves problems, having fun, and learning how to make code that works, listen to the experts. You won’t understand it all yet but you need to file it away for later. There are many brilliant people on the internet sharing their knowledge with anybody who cares to listen and you need to listen to each and explore for yourself what they’re saying. In my time I’ve run down many dead-ends (or, I should say, things that are dead-ends for me now, with my level of skill) but have learned a great deal when I crashed into the wall at the end. Emacs is the best text editor in the world? Great! Wham! No, Vim is the best editor. Great! Wham! You must use MVC for PHP/ MySQL applications? I’ll use CakePHP! Wham!

CakePHP was probably the brick wall that I hit the hardest. I didn’t understand it at all. Now I’m starting to understand the principles, and the why, but haven’t quite got to the how.

4. While it’s good to just roam the internet, writing terrible code, reading advice from experts, and generally just having fun, there is definitely value in doing a paid course where you get help with your own code from an expert. I’ve taken two courses now, one with the Open University, Introduction to object oriented programming with Java, and one with O’Reilly, Web security with PHP and MySQL. Having a real person read, assess, and improve my actual own real code not only taught me a lot but also gave me discipline to write code that could be understood and easily criticised by somebody other than me. This is also a good way to dip your toes into different waters and start to understand the strengths and weaknesses of different languages and approaches. I have learnt Java and Python (on and although I haven’t yet used them for anything serious it’s helped me to understand the situations in which you would use them for something serious and why.

5. This only applies to certain types of programmer, but it certainly applied to me. Run your own server. They’re quite cheap to rent (or you could use an old computer in your own house) and you can do fun things with them like run WordPress (I self host this blog) and Shiny Server. Running a server will cause you a lot of headaches but will teach you a tremendous amount about your operating system as well as giving you an insight into the world of running applications and servers in real world applications. This only applies to people who want to write applications for the web, of course. Some programmers will spend their whole lives blissfully ignorant about such matters.

6. One extra one for fun. Use Linux. It’s fun. You can “take the back off” quite easily which makes it good for learning and the community is great. You’ll probably break and replace your operating system many times getting your graphics/ sound/ network card working and that’s more grist to the mill learning how your operating system works.