Michal Měchura

Language technologist, information architect

Hello. I am the author of the open-source dictionary writing system Lexonomy and the open-source terminology management platform Terminologue. I have written the Irish-language book An Ríomhaire Ilteangach, a guide to language technology for general readers. I have built or co-built many Irish-language reference websites including the National Terminology Database for Irish, the Placenames Database of Ireland, the National Folklore Collection and the Dictionary and Language Library. I am the author of Xonomy, an open-source, browser-based XML editor. I have written a computational grammar of Irish called Gramadán and I maintain the Irish National Morphology Database.
Fiontar & Scoil na Gaeilge, Dublin City University, Ireland
Foras na Gaeilge, Dublin, Ireland
Natural Language Processing Centre, Masaryk University, Brno, Czech Republic
Dioplóma Iarchéime sa Ghaeilge Fheidhmeach | Postgraduate Diploma in Applied Irish
Dublin Institute of Technology, 2010
MPhil in Speech and Language Processing
Trinity College, University of Dublin, 2008

Publications

2018

CONFERENCE PAPER

Shareable Subentries in Lexonomy as a Solution to the Problem of Multiword Item Placement

EVENT EURALEX 2018, Ljubljana, Slovenia
PUBLISHED IN Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts
This paper introduces a new way of dealing with phraseology in dictionaries. A classical question in lexicography is whether multiword items such as third time lucky should be listed under third, time or lucky. The ideal answer is ‘under all of them’ but, until now, the only way to do that in conventional tree-structured dictionaries has been to keep multiple copies (of what conceptually is one and the same item) in several places throughout the dictionary. We present a way to achieve the same goal without copying. The multiword item becomes a semi-independent subentry which exists in only one copy but appears simultaneously in several places in the dictionary. The structure of the dictionary remains a tree but the lexicographer is empowered to occasionally ‘break out’ of the tree in order to avoid duplication. This paper explains the reasoning behind the concept of shareable subentries and shows how this new functionality has been implemented in the dictionary writing system Lexonomy.
CONFERENCE PAPER with Krasimir Angelov

Editing with Search and Exploration for Controlled Languages

PUBLISHED IN Proceedings of the Sixth International Workshop on Controlled Natural Language
PUBLISHER IOS Press, Maynooth, Ireland
We present an editor for controlled languages which is a combination of a syntax editor and a predictive editor.
TALK with Miloš Jakubíček, Vojtěch Kovář and Pavel Rychlý

Practical Post-Editing Lexicography with Lexonomy and Sketch Engine

EVENT XVIII EURALEX International Congress: Lexicography in Global Contexts

2017

BOOK

An Ríomhaire Ilteangach

PUBLISHER Cois Life, Dublin
ISBN 978-1-907494-70-3
Treoirleabhar don teicneolaíocht teanga atá dírithe ar an léitheoir ginearálta. Léitheoireacht riachtanach é seo do gach duine a láimhseálann breis is teanga amháin ar an ríomhaire. | A guide to language technology for general readers. This book is required reading for everybody who uses more than one language on their computer.
CONFERENCE PAPER

Towards a metadata infrastructure for online dictionaries

EVENT European Network of e-Lexicography, Budapest
TALK

Ar thairseach na haoise digití: mionteangacha agus an ríomhaireacht

EVENT ‘Ar an Imeall i Lár an Domhain?’: An tairseachúlacht i litríocht agus i gcultúr na hÉireann agus na hEorpa, Prague
TALK

How (not) to build a European Dictionary Portal

EVENT Final Conference of the European Network of e-Lexicography, Leiden
CONFERENCE PAPER

Introducing Lexonomy: an open-source dictionary writing and publishing system

PUBLISHER Electronic lexicography in the 21st century: Proceedings of eLex 2017 conference, Leiden
This demo introduces Lexonomy (www.lexonomy.eu), a free, open-source, web-based dictionary writing and publishing system. In Lexonomy, users can take a dictionary project from initial set-up to final online publication in a completely self-service fashion, with no technical skills required and no financial cost.
TALK with Miloš Jakubíček, Vojtěch Kovář and Pavel Rychlý

One-Click Dictionary

EVENT Electronic lexicography in the 21st century (eLex) conference

2016

CONFERENCE PAPER

Data Structures in Lexicography: from Trees to Graphs

PUBLISHED IN Recent Advances in Slavonic Natural Language Processing
In lexicography, a dictionary entry is typically encoded in XML as a tree: a hierarchical data structure of parent-child relations where every element has at most one parent. This choice of data structure makes some aspects of the lexicographer’s work unnecessarily difficult, from deciding where to place multi-word items to reversing an entire bilingual dictionary. This paper proposes that these and other notorious areas of difficulty can be made easier by remodelling dictionaries as graphs rather than trees. However, unlike other authors who have proposed a radical departure from tree structures and whose proposals have remained largely unimplemented, this paper proposes a conservative compromise in which existing tree structures become augmented with specific types of inter-entry relations designed to solve specific problems.
TALK with Brian Ó Raghallaigh and Katie Ní Loingsigh

Towards a database of Irish surnames

EVENT 25th Spring Conference of the Society for Name Studies in Britain and Ireland
TALK

Things to think about when building a dictionary website

EVENT European Network of e-Lexicography, Barcelona, Catalonia

2015

TALK

Do minority languages need the same language technology as majority languages?

EVENT British-Irish Council conference on language technology in indigenous, minority and lesser-used languages, Dublin Castle, Ireland
BLOG

Do minority languages need machine translation?

I want to bust the myth that machine translation is necessary for the revival of minority languages.

2014

CONFERENCE PAPER

Irish National Morphology Database: a high-accuracy open-source dataset of Irish words

PUBLISHED IN Proceedings of the First Celtic Language Technology Workshop
The Irish National Morphology Database is a human-verified, Official Standard-compliant dataset containing the inflected forms and other morphosyntactic properties of Irish nouns, adjectives, verbs and prepositions. It is being developed by Foras na Gaeilge as part of the New English-Irish Dictionary project. This paper introduces this dataset and its accompanying software library Gramadán.
BLOG

10 reasons why Irish is an absolutely awesome language

And these are proper linguistic reasons, too – none of that starry-eyed sentimental nonsense about the language being ‘beautiful’ or ‘romantic’.

Breathing new life into old data: how to retro-digitize a dictionary

What I learned from a project where we retro-digitized two Irish dictionaries and published them on the web.

2013

BLOG

The linguistic relativity of up and down

A nice and simple example of how learning a new language causes you to start perceiving the world differently.

2012

CONFERENCE PAPER with Brian Ó Raghallaigh

The logainm.ie Placenames Database of Ireland: Software demonstration

EVENT Placenames Workshop 2012
PUBLISHER Dublin City University
TALK

Idir foclóir agus léarscáil: Bunachar Logainmneacha na hÉireann

EVENT Daonscoil na Mumhan, Waterford, Ireland
CONFERENCE PAPER

Léacslann: a platform for building dictionary writing systems

PUBLISHED IN Proceedings of the 15th Euralex International Congress
PUBLISHER University of Oslo, Oslo
The purpose of this demo is to introduce Léacslann, a new platform for building dictionary writing systems (DWS) and terminology management systems (TMS) as well as other lexicographic and reference applications. Léacslann can be used without any knowledge of programming to create a basic lexical database with an arbitrary structure. This will be demonstrated in the first half of the demo, while the second half will show how a software developer can customize Léacslann for more demanding applications.
REPORT

Léacslann Tutorial

PUBLISHER Dublin City University

2010

CONFERENCE PAPER with Brian Ó Raghallaigh

How to build a termbase for 500,000 users (and live to tell the story)

PUBLISHED IN Proceedings of Terminology and Knowledge Engineering (TKE) Conference
PUBLISHER Dublin City University, Dublin, Ireland
CONFERENCE PAPER

When definitions are not enough

PUBLISHED IN Proceedings of Terminology and Knowledge Engineering (TKE) Conference
PUBLISHER Dublin City University
This paper introduces Compositional Term Diagrams (CTDs) as a formalism for analysing the structure of multi-word terms. CTDs have the potential to help terminologists resolve ambiguities related to transitivity (“who does what to whom”), modification (“what modifies what”) and evocation (“which sense is evoked by this word?”).
CONFERENCE PAPER with Brian Ó Raghallaigh

The Focal.ie National Terminology Database for Irish

PUBLISHED IN Proceedings of the 14th Euralex International Congress
PUBLISHER Fryske Akademy, Ljouwert/Leeuwarden
CONFERENCE PAPER

What WordNet does not know about selectional preferences

PUBLISHED IN Proceedings of the 14th Euralex International Congress
PUBLISHER Fryske Akademy, Ljouwert/Leeuwarden
Selectional preferences are the tendencies of words to co-occur with other words that belong to certain semantic types. In this paper, I will investigate how closely these corpus-attested preferences correspond to WordNet. For example, for all possible direct objects of cancel, is there a single category (or a union of several categories) in WordNet that subsumes them, and only them? Selectional preferences manifest themselves in authentic texts and can be revealed through corpus analysis. I will introduce an experimental tool I have built which attempts to do this automatically by aligning corpus-extracted lists of collocates (for example a list of the direct objects of cancel) with WordNet. The strength of this method is that it can discover and name selectional preferences automatically, but its weakness is that it can only do so when WordNet contains a suitable category. We will see that WordNet often lacks a category (or even a union of several categories) that fully corresponds to an attested selectional preference – for example, there is no category in WordNet that includes all the kinds of events that can be direct objects of cancel (meeting, wedding, concert etc.) but excludes those that cannot (accident, sunset, invention etc.).
BLOG

Living with a diacritic

No, this is not an article about living with an obscure illness. It’s an article about living with a name no-one can spell correctly.

2009

TALK with Brian Ó Raghallaigh

User-Friendliness: the key to promoting a minority language on the Internet

EVENT 12th International Conference on Minority Languages, Tartu, Estonia
BLOG

Flags as language symbols – so what is the problem?

Using country flags as if they were language symbols is bad. So why does everybody keep on doing it? And is it really so bad?

Linguistic relativity: fact or wishful thinking?

Most linguists secretly wish the Sapir-Whorf Hypothesis to be true. But is it?

2008

CONFERENCE PAPER

Giving them what they want: search strategies for electronic dictionaries

PUBLISHED IN Proceedings of the 13th Euralex International Congress
PUBLISHER Universitat Pompeu Fabra, Barcelona
This paper deals with how humans search electronic dictionaries. It raises the point that users often make dictionary searches with misspellings, with inflected words copied and pasted from elsewhere, with complete sentences or fragments thereof, and with other kinds of low-quality input, and suggests methods for dealing with such phenomena in a pre-emptive manner. The issues addressed include searching with inflections, dealing with multi-word items, misspelling detection and text normalization. Additionally, the value of log files is emphasized as a source of information on user behaviour.
TALK

Cá bhfuil mo shínte fada? – ionchódú téacs ar ríomhairí

EVENT Engineers Ireland, Dublin, Ireland
DISSERTATION

Selectional Preferences, Corpora and Ontologies

PUBLISHER Trinity College, University of Dublin
This work presents a technique for exploring the selectional preferences of words in a semi-automatic way. The technique combines corpora with ontologies such as WordNet. The term selectional preference denotes a word’s tendency to co-occur with words that belong to certain lexical sets. For example, the adjective delicious prefers to modify nouns that denote food and the verb marry prefers subjects and objects that denote humans. This work develops techniques for associating corpus-attested selectional preferences with concepts in an ontology. It shows how lexical sets can be derived from ontologies and how corpus-extracted collocates of a word can then be aligned with these lexical sets to reveal any selectional preferences the word has. An additional contribution provided here is an insight into the limitations of this method. The work presents evidence for the conclusion that aligning selectional preferences with an ontology is useful for some purposes, but fundamentally inaccurate because currently existing ontologies do not accurately reflect the mental categories evoked in selectional preferences.
BLOG

Sub Specie Aeternitatis

An essay by the Czech linguist Pavel Eisner which looks at the revival of Czech and at what the future holds for it and for other minority languages.

2007

MAGAZINE ARTICLE

Ionchódú Téacs ar Ríomhairí

PUBLISHED IN Comhar
MAGAZINE ARTICLE

Localization into Irish

PUBLISHED IN Multilingual Computing and Technology

2006

CONFERENCE PAPER

Finding the right structure for lexicographical data: experiences from a terminology project

PUBLISHED IN Proceedings of the 12th Euralex International Congress
PUBLISHER Edizioni dell'Orso, Turin
MANUSCRIPT

Uimhreacha na Gaeilge

This work gives a complete account of the rules governing the use of numbers in Irish. As the reader knows, the Irish number system is very complicated, which tempts the authors of grammar books to simplify their accounts of the system and to leave certain questions without a clear answer, because the answer would be complicated and difficult to understand. This work takes the opposite approach. I have tried to describe the number system as completely as possible, in spite of its complexity. This work will serve anyone in search of accuracy.

2005

MANUSCRIPT

A practical guide for functional text analysis: Analyzing English texts for field, mode, tenor and communicative effectiveness

This document provides a scheme for analyzing English texts from a functional perspective. The document contains information adapted from Chapters 8, 10 and 12 – 16 of Books 2 and 3 of the Open University course E303 English Grammar in Context as it was presented in 2005, as well as from the set book Longman Student Grammar of Spoken and Written English and from the course’s associated readings. Skills in functional analysis are developed in the course books; this document re-iterates in concise form the main points to consider when performing the analysis.

2004

MANUSCRIPT

Czech–English translation difficulties arising from differences in word order

This work deals with Czech-English translation difficulties that result from differences in word order between the two languages. A functional framework is used to interpret the implications of the syntactic differences. Both English and Czech have a tendency to present given information at the beginning of a clause and new information at the end, but the flexibility of Czech word order allows this principle to be observed more consistently than English syntax does. Additionally, Czech, unlike English, does not observe the end-weight principle, and therefore long stretches of circumstantial information do not tend to be placed at the end of a clause. Both these differences result in significant mismatches in word order between Czech clauses and their English translation equivalents.