Tweets, blogs and corpora: How computer technology helps us make better dictionaries

I have worked in the dictionary business since 1980, following brief and not very brilliant periods as an academic and then as an English language teacher. My career in dictionaries is bookended by two major revolutions in lexicography: the arrival of corpora in the early 1980s and - in the last few years - the transfer of reference resources from print to digital media. With the second of these revolutions still in its early stages, I'm on an exciting new learning curve (like everyone else in the dictionary world), as the nature of the business undergoes a complete transformation.

I got into dictionaries via ELT, and have been lucky enough to be involved in all the major developments of the last 25 years or so. I worked for a time at COBUILD during the earliest days of corpus lexicography, then for over 10 years as Managing Editor at Longman, before becoming Editor-in-Chief of dictionaries at Macmillan in 1998.

At that point, there was no Macmillan dictionary list. The first Macmillan English Dictionary for Advanced Learners (MED1) was planned and created from scratch, and published at the beginning of 2002. It gained two major awards (English Speaking Union, and British Council Eltons), and quickly overtook the Cambridge and COBUILD learner's dictionaries in the sales rankings.

As well as designing, editing and project-managing dictionaries, I've been involved in the design and collection of several major corpora, including the British National Corpus. I've also worked closely with Adam Kilgarriff, a leading computational linguist, to design software tools for automating some of the jobs lexicographers do. Two pieces of lexicographic software which are now quite standard in the field were first developed for use in Macmillan dictionaries: 'Word Sketches', which provide a systematic account of collocation for any word in a corpus, and 'GDEX', a software tool for automatically extracting the 'best' examples of a word from the corpus.

Over the past 20 years or so, I have trained hundreds of lexicographers from all over the world, especially through the annual Lexicom Workshops which I run with Sue Atkins and Adam Kilgarriff (the three of us form the company Lexicography MasterClass, which provides training and project-management services). I have also done university teaching in the area of lexicography and lexical computing, both at UK universities (Brighton, Exeter, Aston) and in Tokyo, Guangzhou, and Barcelona. With my colleague Sue Atkins, I was editorial director of the project which produced the Dante lexical database - the most complete record in existence of the core vocabulary of English.

I have published over 20 papers on lexicography, corpus linguistics, and lexical computing, and am the co-author, with Sue Atkins, of the Oxford Guide to Practical Lexicography (2008).

On the home front, I've been going to Tai Chi classes for several years, and I've recently revived my attempts to learn Spanish (seriously, this time). I also enjoy watching (but not playing) cricket, and am the author of The Wisden Dictionary of Cricket (A&C Black, 2006).