Having extracted the Ordnance Survey Open Names data from the OS web site and found that the precision of the data was insufficient for the project in hand, I then had to improve on the coordinates provided.
This is a non-trivial exercise and implies either manually amending coordinates or looking up the associated coordinate data from other sources. To put this in perspective: There are just under 6,000 place names defined in the OS Open Names data set for Cornwall. Ignoring for the moment whether all 6,000 are actually needed for my purposes, or whether additional settlement names need to be included, it is obvious that an industrial-scale approach is needed to improve associated coordinate data, and an ad-hoc or manual data entry approach is not practicable.
As a matter of preference, it would be better to look up someone else’s data first rather than create new data: in other words, to geocode the name data using a geocoding service.
There are two basic approaches to using a geocoding service: (i) use a user interface provided by the geocoding service, or (ii) use its application programming interface (API) and write your own code. A pre-existing user interface is adequate for small amounts of data or a one-off geocoding exercise, but if more significant processing is required, potentially comprising several ‘runs’ of the same data, then more control over the process is needed. This means writing your own code, which must work with the input and output formats defined by the geocoding service itself.
For clarity, this may be illustrated thus:
This is all well and good, but the question remains: which programming language, and which programming tools, should be used?
There are several useful geocoding environments for ad-hoc geocoding, including:
- Doogal – which converts postcodes to map output, but also provides latitude/longitude and easting/northing coordinates.
- Nominatim – which uses OpenStreetMap data to locate a place (but you have to follow the debug output link at the top of the page to see the coordinates).
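Nominatim can also be queried directly over HTTP rather than through its web page, which is the API route mentioned earlier. The sketch below is a minimal example, assuming Java 8 or later: it builds a search URL for a single place name and, optionally, fetches the raw response. The class and method names (`NominatimQuery`, `buildSearchUrl`, `fetch`) are hypothetical, and the parameter choices (`format=xml`, `countrycodes=gb`, `limit=1`) are assumptions made for this project, not recommendations from the service.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class NominatimQuery {
    private static final String SEARCH_ENDPOINT = "https://nominatim.openstreetmap.org/search";

    /** Percent-encodes a query value (URLEncoder turns spaces into '+'). */
    private static String encode(String value) {
        try {
            // The String charset-name overload also works on Java 8.
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds a free-text search URL: XML output, Great Britain only, at most one result. */
    static String buildSearchUrl(String placeName) {
        return SEARCH_ENDPOINT
                + "?q=" + encode(placeName)
                + "&format=xml"
                + "&countrycodes=gb"
                + "&limit=1";
    }

    /** Sends the request with an identifying User-Agent and returns the raw response body. */
    static String fetch(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try (InputStream in = conn.getInputStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = buildSearchUrl("Truro, Cornwall");
        System.out.println(url);
        // Uncomment to send a real request (needs network access); the User-Agent is a placeholder.
        // System.out.println(fetch(url, "cornwall-geocoding-test/0.1 (contact: you@example.com)"));
    }
}
```

Running `main` only prints the URL; the network call is left commented out. Nominatim’s public usage policy asks for an identifying User-Agent and no more than one request per second, so at that rate the roughly 6,000 Cornwall names would take about 100 minutes per run.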
The Programming Environment
In theory, there are a large number of combinations of programming language and programming tools that might be used, but in practice these options are far more limited than might be supposed. There are several factors that constrain choice and limit the availability of specific tools, some of which are identified here.
The operating system platform: The Windows environment has the richest selection of tools in terms of number available, maturity and usability of resources, but many of these are commercial (paid for) offerings. By comparison, the Linux environment provides significant open source and free resources, but support may be patchy depending on the size of the user base and maturity of product. Finally, the Mac environment is the most limited of mainstream environments in terms of choice, but the user interface is often better crafted.
Ultimately, users end up using the platform they are most familiar with or that is available to them. In my case, the Mac is my ‘default’ platform, and the best equipped machine I have in terms of processing power and RAM, so I only really have the illusion of choice.
A further consideration is the availability of geocoding ‘libraries’ for whatever programming language you end up using. Ideally, there will be a wide range of coding examples to draw on. In practice, library support for geocoding (or indeed for any other application) may be limited to one or two programming languages and to particular output formats, such as XML, without necessarily including JSON or CSV. If you cannot read or process XML, this may be a problem: for example, the XML libraries for Excel are available on Windows, but not on Mac OS.
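XML output need not be an obstacle in Java, whose standard library includes a DOM parser (`javax.xml.parsers`). The sketch below assumes a response shaped like Nominatim’s XML search output, that is, a `searchresults` element containing `place` elements with `lat`, `lon` and `display_name` attributes. The class name `PlaceXmlParser` is hypothetical, and the coordinates in the sample string are illustrative placeholders, not looked-up data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class PlaceXmlParser {
    /** One search result: display name plus latitude/longitude in decimal degrees. */
    static final class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Parses every <place> element and reads its lat, lon and display_name attributes. */
    static List<Place> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations: a web service response should not need them.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("place");
            List<Place> places = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                places.add(new Place(e.getAttribute("display_name"),
                        Double.parseDouble(e.getAttribute("lat")),
                        Double.parseDouble(e.getAttribute("lon"))));
            }
            return places;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse geocoder XML response", ex);
        }
    }

    public static void main(String[] args) {
        // Illustrative response only: placeholder values, not real lookup results.
        String sample = "<searchresults><place lat=\"50.26\" lon=\"-5.05\""
                + " display_name=\"Truro, Cornwall\"/></searchresults>";
        for (Place p : parse(sample)) {
            System.out.println(p.name + " -> " + p.lat + ", " + p.lon);
        }
    }
}
```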
Finally, the extent to which the integrated development environment (IDE) supports the preferred programming language, and the time and expertise needed to configure it for that language, may rule out either the IDE or your favourite language. For example: (i) a developer may prefer to use C# (C-sharp) with the Visual Studio IDE, but this may be an unrealistic combination for a particular geocoding environment; or (ii) the platform may be a Mac and the preferred language Java, but Xcode (the native IDE for the Mac) has almost no support for Java.
Having tried out various combinations for the Mac, I ended up ditching both Visual Studio and Xcode. Having shortlisted several other IDEs, I finally chose NetBeans, since its Java support appeared to be among the most complete and its interface is mature and relatively user-friendly. Yes, I would have to learn Java, but since I have previous experience of programming in a variety of other languages, including C# and Fortran, I thought the time spent would be acceptable.
The Java Architecture
In order to use Java for development, there are several sub-systems that need to be installed, and a whole new terminology needs to be learned, for example:
- The JDK (Java Development Kit): the compiler (javac) and other development tools, together with the standard class libraries that support different types of function, from reading and writing files to handling databases and errors.
- The JRE (Java Runtime Environment): This allows completed Java programs to run on the operating system platform and includes application programming interfaces (APIs) and the JVM (Java Virtual Machine), an environment in which your program runs.
- Java applications (programs), made up of code modules and classes: the programs you will have to write.
These components are either installed during Java setup, created by you, the developer, or created ‘on the fly’ when you run one of your programs; together they make up the Java ‘architecture’.
Preparing to Develop on the Mac
Developing inevitably means being comfortable with working at the command prompt. This means being able to change directory with the cd command, list files with the ls command, and translate between where Finder says you are and the path shown at the command prompt. This is crucial.
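Knowing where you are matters for more than navigation: a Java program launched from the command prompt resolves relative file names against the directory it was started from. The short sketch below, with a hypothetical class name `WhereAmI` and a hypothetical input file `names.csv`, prints that working directory, lists its contents much as `ls` would, and shows where a relative file name would be looked for.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class WhereAmI {
    /** The directory that relative paths are resolved against (where the program was launched). */
    static Path workingDirectory() {
        return Paths.get("").toAbsolutePath();
    }

    /** Resolves a relative file name the same way the program will when it opens the file. */
    static Path resolveInput(String fileName) {
        return Paths.get(fileName).toAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Working directory: " + workingDirectory());
        // Rough equivalent of `ls`: list the entries in the working directory.
        try (Stream<Path> entries = Files.list(workingDirectory())) {
            entries.sorted().forEach(p -> System.out.println("  " + p.getFileName()));
        }
        // Hypothetical input file name, shown with the full path the program would use.
        System.out.println("Will look for: " + resolveInput("names.csv"));
    }
}
```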
One of the things you may need to change along the way is the shell used by the command prompt. By default, Mac OS uses bash, but this is being deprecated in favour of Z Shell (zsh). Annoyingly, though, most examples of working with the command shell on the Internet are for bash.
Configuring the Environment
It should be stressed that getting the Mac, NetBeans and Java to work seamlessly together is not ‘point and click’, and some knowledge of installing, uninstalling and configuring Java is required.
Firstly, it is likely that your machine will be running a legacy version of Java unless you have consciously updated it, whereas most geocoding requirements will be for a recent version, such as Java 8 update 111 (reported by the runtime as version 1.8.0_111). In my case, this meant installing Java 8 (version 1.8). Secondly, your system needs to recognise the new version when you use your IDE and try to code, which is typically done by making it the default version. If you plan on using Python, the same will also apply.
Thirdly, you will need to install whatever programming libraries are needed, if they are not already present. Thankfully, NetBeans is fairly helpful when being set up for the first time and will download and enable various plugins; and when the full Java environment is installed, it includes the Java Development Kit (JDK), which contains most or all of the relevant libraries.
In all cases, it is advisable to install ‘complete packages’ and to use the recommended installers, rather than attempting to implement the environment one piece at a time.
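Once Java is installed and set as the default, one way to confirm which installation a program actually picks up, in the IDE or at the command prompt, is to have it report its own version. The sketch below is illustrative rather than a standard tool: the class name `JavaVersionCheck` and the minimum of Java 8 are assumptions. It relies on the fact that Java 8 and earlier report versions in the form `1.8`, while Java 9 onwards report just the major number, such as `11`.

```java
class JavaVersionCheck {
    /** Minimum major version assumed for this project (Java 8). */
    static final int REQUIRED_MAJOR = 8;

    /** Extracts the major version from strings such as "1.8", "1.8.0_111", "11" or "17-ea". */
    static int majorVersion(String version) {
        // Java 8 and earlier use the legacy "1.x" form; drop the leading "1.".
        String v = version.startsWith("1.") ? version.substring(2) : version;
        int end = 0;
        while (end < v.length() && Character.isDigit(v.charAt(end))) {
            end++;
        }
        return Integer.parseInt(v.substring(0, end));
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Running on Java " + major);
        System.out.println("Installation: " + System.getProperty("java.home"));
        if (major < REQUIRED_MAJOR) {
            System.err.println("Java " + REQUIRED_MAJOR + " or later is required; check the default version.");
            System.exit(1);
        }
    }
}
```

Running the same class from NetBeans and from the command prompt and comparing the reported installation is a quick way to spot the two using different versions.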
Writing Some Example Code: A Proof of Principle
Before diving into writing geocoding applications, the whole setup needs testing. The first thing is to write a ‘Hello World’ application and prove to yourself that it runs. If you’re using Java, try running it from within the IDE first, then build it (compiling it to bytecode and packaging it as a JAR file) and run the built file from the command prompt.
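A minimal Java ‘Hello World’ might look like the following; the `greeting()` helper is an addition so that the output can be checked, not part of the usual form.

```java
class HelloWorld {
    /** The text to print, kept separate so it can be tested. */
    static String greeting() {
        return "Hello World";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

From the command prompt, `javac HelloWorld.java` compiles it to `HelloWorld.class` and `java HelloWorld` runs it; `jar cfe HelloWorld.jar HelloWorld HelloWorld.class` packages it as a JAR, which `java -jar HelloWorld.jar` will then run.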
I built up my Java skills one plank at a time: assigning variables, writing out a file, reading a file, and then reading file input, doing something with it and writing lines out to a file. This simulates the basic structure of a geocoding application. In each case, I used NetBeans to manage the entire process and to build up a library of examples.
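The read, process, write pattern can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `NameFilePipeline`, the one-name-per-line input, the quoted CSV output and the stub lookup function (standing in for a real geocoding call) are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class NameFilePipeline {
    /** Quotes a CSV field so that commas or quotes inside place names do not break the columns. */
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /**
     * Reads one place name per line, applies the lookup to each non-blank name and
     * writes "name","result" lines to the output file. Returns the number of lines written.
     */
    static int run(Path input, Path output, Function<String, String> lookup) throws IOException {
        List<String> results = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            String name = line.trim();
            if (name.isEmpty()) {
                continue; // skip blank lines
            }
            results.add(csvField(name) + "," + csvField(lookup.apply(name)));
        }
        Files.write(output, results, StandardCharsets.UTF_8);
        return results.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java NameFilePipeline <input file> <output file>");
            return;
        }
        // Placeholder lookup: a real run would call a geocoding service here.
        Function<String, String> stubLookup = name -> "NOT_LOOKED_UP";
        int written = run(Paths.get(args[0]), Paths.get(args[1]), stubLookup);
        System.out.println("Wrote " + written + " lines");
    }
}
```

Passing the lookup in as a function keeps the file handling separate from the geocoding service, so the same pipeline can be re-run against a different service, or against cached results, without changes.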