Convert pdf to text on Ubuntu

We had a rather ugly scanned pdf of a very lovely poem over on our feedback website, so I thought I would try to post the text from it using Optical Character Recognition (OCR) with tesseract.

On Ubuntu start with:

sudo apt-get install tesseract-ocr imagemagick

You’ll need to convert the pdf to an image file:

convert -density 600 input.pdf output.tif

and then run tesseract on that image, which writes the recognised text to out.txt (tesseract adds the .txt extension itself):

tesseract output.tif out
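If you do this more than once, the two steps can be wrapped in a small shell function. This is only a sketch: the function name pdf_to_text and its optional second argument are my own invention, not part of either tool, and it assumes convert and tesseract are installed as above.

```shell
# Hypothetical helper (not part of tesseract or ImageMagick):
#   pdf_to_text INPUT.pdf [OUTPUT_BASE]
# Rasterises the pdf to a TIFF, runs tesseract on it, and prints
# the name of the text file that was written.
pdf_to_text() {
    local pdf="$1"
    local base="${2:-${pdf%.pdf}}"   # default: input name without .pdf
    local image="${base}.tif"

    if [ ! -f "$pdf" ]; then
        echo "pdf_to_text: no such file: $pdf" >&2
        return 1
    fi

    # Same density as the command above
    convert -density 600 "$pdf" "$image" || return 1

    # tesseract appends .txt to the output base on its own
    tesseract "$image" "$base" || return 1

    echo "${base}.txt"
}
```

Calling pdf_to_text input.pdf would then leave input.tif and input.txt next to the original file.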

Do note that if an empty text file is returned (as it was for me), this could be because the margins around the image are too large. Crop it down (not too tightly) and it should work nicely. One caveat: it returned a lowercase “l” a lot where natural English obviously calls for a capital “I”. That is easy for a human reader to spot, but presumably the two glyphs look too similar in the scanned typeface for the software to tell apart, so expect to tidy these up afterwards.
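Both fixes can be sketched in shell. The function names crop_margins and fix_lone_l are hypothetical, and the 10% fuzz and 40 pixel border are guesses that will need tuning for a particular scan. The cropping function relies on ImageMagick's -trim, which removes edges matching the corner colour, so it may struggle with a noisy scan. The second function only replaces a lowercase “l” standing on its own, so it will also change any genuine lone “l” in the text.

```shell
# Hypothetical helper: crop_margins INPUT_IMAGE OUTPUT_IMAGE
# Trims the near-uniform margin, then adds a small white border back
# so the crop is not too tight.
crop_margins() {
    convert "$1" -fuzz 10% -trim +repage \
        -bordercolor white -border 40 "$2"
}

# Hypothetical helper: reads text on stdin, writes corrected text to stdout.
# Replaces "l" with "I" only when it is not touching a letter, digit or
# underscore. The loop (:a ... ta) repeats until nothing is left to replace,
# which catches cases such as "l l" where matches would otherwise overlap.
fix_lone_l() {
    sed -E -e ':a' \
        -e 's/(^|[^[:alnum:]_])l([^[:alnum:]_]|$)/\1I\2/' \
        -e 'ta'
}
```

For example, crop_margins output.tif cropped.tif followed by tesseract cropped.tif out, and then fix_lone_l < out.txt > out_fixed.txt.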

Five tips for five years

I saw some wonderful tweets the other day where the author gave their former naive self some time-travelling advice from their present self about how best to get started with programming. I did intend to keep the link but sadly my filing system has failed and it’s fallen through the cracks. I’m grateful to the author, in any case, for inspiring this post, written on the fifth anniversary of my own initial foray into the world of programming. I will give myself five pieces of advice for five years, also because five is a nice round-ish number.

1. It may sound obvious, but it’s worth saying and it’s probably the most important of the points in this post: write code. Write code to solve problems you face in your work. Write code to find out why the code that you’re writing doesn’t work. Write code to make sure that the code that you’re writing gives you the right answer. Write code for fun. Write code to see if you can do something with code instead of by hand. Write code to write “Hey, look what I did” blog posts. Write code to produce minimal examples so you can get help on forums. Writing code is an art (we’ll come back to the points in this post that caution against churning out code that “just works”), and, like all arts, you need to log the hours. So log the hours.

2. Following on from this, and as espoused by the pre-eminent stats guru, Hadley Wickham, never be ashamed of your code. The code I wrote five years ago is laughably bad. At a glance it’s quite obvious I had no idea what I was doing. Many of the functions are the wrong functions. It’s by turns ridiculously verbose and frustratingly taciturn. It’s badly commented and baffling to me, never mind about any other poor individual who might have to maintain or change it. But you learn by experience and coming back to all this dreadful code taught me WHY it’s important to write legible, well commented code and WHY it’s important to refactor code and WHY it’s important to use the paradigms in the language you’re writing in (functions, objects…).

3. At the same time as you’re writing code that solves problems, having fun, and learning how to make code that works, listen to the experts. You won’t understand it all yet but you need to file it away for later. There are many brilliant people on the internet sharing their knowledge with anybody who cares to listen and you need to listen to each and explore for yourself what they’re saying. In my time I’ve run down many dead-ends (or, I should say, things that are dead-ends for me now, with my level of skill) but have learned a great deal when I crashed into the wall at the end. Emacs is the best text editor in the world? Great! Wham! No, Vim is the best editor. Great! Wham! You must use MVC for PHP/ MySQL applications? I’ll use CakePHP! Wham!

CakePHP was probably the brick wall that I hit the hardest. I didn’t understand it at all. Now I’m starting to understand the principles, and the why, but haven’t quite got to the how.

4. While it’s good to just roam the internet, writing terrible code, reading advice from experts, and generally just having fun, there is definitely value in doing a paid course where you get help with your own code from an expert. I’ve taken two courses now, one with the Open University, Introduction to object oriented programming with Java, and one with O’Reilly, Web security with PHP and MySQL. Having a real person read, assess, and improve my actual own real code not only taught me a lot but also gave me discipline to write code that could be understood and easily criticised by somebody other than me. This is also a good way to dip your toes into different waters and start to understand the strengths and weaknesses of different languages and approaches. I have learnt Java and Python (on https://www.coursera.org/) and although I haven’t yet used them for anything serious it’s helped me to understand the situations in which you would use them for something serious and why.

5. This only applies to certain types of programmer, but it certainly applied to me. Run your own server. They’re quite cheap to rent (or you could use an old computer in your own house) and you can do fun things with them like run WordPress (I self-host this blog) and Shiny Server. Running a server will cause you a lot of headaches but will teach you a tremendous amount about your operating system, as well as giving you an insight into what it takes to run applications and servers in the real world. This only applies to people who want to write applications for the web, of course. Some programmers will spend their whole lives blissfully ignorant about such matters.

6. One extra one for fun. Use Linux. It’s fun. You can “take the back off” quite easily which makes it good for learning and the community is great. You’ll probably break and replace your operating system many times getting your graphics/ sound/ network card working and that’s more grist to the mill learning how your operating system works.

Don’t apologise- it’s just feedback

I’m cross-posting this to my team’s blog, which can be found here, to bring together my two worlds: programming and Linux-geekery on the one hand, and the Involvement and experience work in UK health services that I use them for in my job on the other. Here it is, a story from a meal out in a pub and the lessons the NHS might learn from it.

I witnessed a rather amusing event after a meal in a pub last week, but thinking about it the next day I thought perhaps there are some lessons in it.

A man approaches the bar after eating his meal, wishing to pay, and asks “Do you have a feedback form?”. It’s only a village pub, not the sort of place you’d really expect to have a feedback form. They look a little bit baffled and suggest he could email. Unperturbed, he rattles off three complaints, counting each off on his fingers as he goes.

They apologise, and he responds “You don’t have to apologise, it’s just feedback. But I don’t think I’ll be coming back”. And with that he walks off, presumably never to be seen again.

The incident stuck with me for a few reasons. Firstly, he clearly felt quite irate. His complaints, to be honest, were pretty valid and I basically agree with most of what he said. I had a pleasant evening myself so will be going back, but he was essentially spot on with his assessment of the experience. Despite being irate, however, he clearly had no desire for an apology, since he dismissed the offer of one.

Secondly, because he felt irate, the whole experience was quite uncomfortable for everyone. You could see the bar staff (all junior in age and seniority, the manager being inside the kitchen somewhere) were quite taken aback at his manner and didn’t know quite how to respond.

Thirdly, although he clearly wanted to feed back, and didn’t even seem necessarily to want to vent his spleen in person (given that he first asked for a feedback form), he wasn’t really interested in improving the venue for his own benefit (as regular customers might) since he vowed never to return.

So what can the NHS learn from all this? I think there are a few lessons.

Firstly, although feedback forms are often criticised for being impersonal, it’s clear that in this situation it would have helped the staff by removing them from this confrontational situation. The individual in question clearly didn’t want an apology, or compensation, or even to see improvements on their return, so feeding their very accurate impressions of the venue back on a form would have spared the staff the awkwardness of meeting this challenge face to face.

Secondly, it’s a good reminder that everybody has their own idiosyncratic relationship with feedback. Some people just want to have a rant at somebody and get it off their chest. Some people want to constructively engage and keep using a service and watch it improve. Some people just want an apology. Some people (as in this case) just have a good assessment to share and feel natural about sharing it on a form or face to face, and have no interest in apologies, or improvements, or anything else.

Thirdly, it just goes to show that good feedback can be found anywhere. It was quite a difficult situation and proved impossible to resolve face to face, with the poor old staff just looking uncomfortable, and the man disappeared into the night before they’d even replied at all. Nonetheless, he was spot on and I certainly hope that they do make all the changes which he suggested ready for my next visit.