Grammatical Features Home
Welcome to the Features website. This page gives a brief introduction to grammatical features and an explanation of some academic and technical conventions adopted in this website.
Both the content and the structure of the Features website, in particular the Feature Inventory, will be updated on a regular basis. When accessing the Features website, please check the date the page was last updated (this is given at the bottom of each page).
We hope you will find this resource useful, and that you may help us improve it by sharing your expertise with us. Comments, corrections and contributions to the website are very welcome and will be gratefully acknowledged. You will find our e-mail address on the Contact page. Thank you.
– Anna Kibort & Greville Corbett
In attempting to understand language, many researchers use features, the elements into which linguistic units, such as words, can be broken down. Examples of features are NUMBER (singular, plural, dual, ...), PERSON (1st, 2nd, 3rd), and TENSE (present, past, ...). Features have proved invaluable for analysis and description, and have a major role in contemporary linguistics, from the most abstract theorising to the most practical computational applications. Yet little is firmly established about features: we have no inventory of which features are found in the world's languages, no agreed account of how they operate across different components of language, no certainty about how they interact, and thus no general theory of features. They are widely used, but little discussed and poorly understood. This is a central gap in the conceptual underpinning of much linguistic investigation.
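As a rough illustration of what it means to break a word down into features, the sketch below represents a word form as a bundle of feature-value pairs and checks it against a small inventory. The inventory holds only the example features and values named above and is deliberately incomplete; the function name `validate` and the analysis of English "walks" are our own assumptions for illustration, not part of the Feature Inventory.

```python
# Illustrative inventory: only the features and values named in the text.
# The text marks these lists as open-ended ("..."), so this is not exhaustive.
FEATURES = {
    "NUMBER": {"singular", "plural", "dual"},
    "PERSON": {"1st", "2nd", "3rd"},
    "TENSE": {"present", "past"},
}


def validate(bundle: dict[str, str]) -> list[str]:
    """Return a list of problems; the list is empty if every pair is known."""
    problems = []
    for feature, value in bundle.items():
        if feature not in FEATURES:
            problems.append(f"unknown feature: {feature}")
        elif value not in FEATURES[feature]:
            problems.append(f"unknown value for {feature}: {value}")
    return problems


# Hypothetical analysis of English "walks" as a feature bundle.
walks = {"PERSON": "3rd", "NUMBER": "singular", "TENSE": "present"}

print(validate(walks))                   # prints []
print(validate({"NUMBER": "paucal"}))    # prints ['unknown value for NUMBER: paucal']
```

A value such as paucal is attested in some languages; it is rejected here only because this toy inventory omits it, which is the kind of gap a fuller inventory is meant to close.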
The Feature Inventory offered on the pages of this website is an attempt to put the notion of linguistic 'feature' on a sounder empirical and conceptual base. It aims to provide evidence for the diverse content of features in the world's languages, as well as discuss some of their formal properties, particularly in morphology (word structure) and syntax (sentence structure).
We envisage that the Features website will be useful to theoretical and applied linguists of any persuasion, including computational linguists. We hope that the careful catalogue of the various types and uses of features will aid further work on the typology of features and provide the basis for a theoretical conceptualisation of the notion 'feature'. The catalogue will also help demonstrate the types of features on which linguistic theory can legitimately call, and the implications of adopting different theoretical perspectives on features while using them for the same descriptive goals.
The website may also prove to be of particular value to fieldworkers and psycholinguists. It will enable fieldworkers to check previous work before proposing a new feature for the description or analysis of the language under investigation. Psycholinguists will be able to consult the Inventory while designing experiments. For example, various experiments use two-valued features when a different choice would have led to richer data. The Inventory can be used to inform the choice of language and feature in such instances.
The systematisation of language description offered in the Features website may also be of interest to language educators and lexicographers, including the providers of on-line dictionaries and automated translation tools, and anyone who deals with the description as well as formal and semantic classification of words.
Teachers of well-known languages will find in the Inventory information that is not readily available in textbooks (for example, why some Russian textbooks list only six cases while others list up to ten, which raises the question of how many cases there actually are). Teachers of various languages will often be able to find a linguistic explanation of a problematic phenomenon, an illumination of a native-speaker intuition, or a debunking of a language myth. Anyone interested in languages will be able to find out exciting facts about various languages and be amazed at the diversity displayed by natural language, particularly as they explore the sections of the Inventory listing 'Feature values', 'Oddly behaving feature markers' or 'Problem cases'. We believe that the Inventory may also be accessible to inquisitive high school students and may encourage some of them to become linguists in the future, whether theoretical linguists, computational linguists, or fieldworkers documenting languages in danger of extinction.
When referring to features and values, we use lower case if the feature or value expresses a cross-linguistic generalisation or a logical possibility (e.g. feminine gender, past tense), and an initial capital letter whenever we need to distinguish the particular, language-specific morphological exponent of the feature or value from the basic cross-linguistic or logical set that we have identified (e.g. the Present Perfect in English).
In our own transcriptions of data, we follow the Leipzig Glossing Rules (Conventions for interlinear morpheme-by-morpheme glosses). Data cited from other sources may follow other conventions.
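The morpheme-by-morpheme principle of the Leipzig Glossing Rules can be sketched in code: segmentable morphemes are separated by hyphens in both the object-language word and its gloss, and the two must contain the same number of hyphens. The function `align_gloss` and the English example "dog-s" are our own illustrative assumptions; the sketch covers only hyphen-separated segments, not the other separators the Rules define.

```python
def align_gloss(word: str, gloss: str) -> list[tuple[str, str]]:
    """Pair each hyphen-separated morpheme with its gloss label.

    Raises ValueError if the word and gloss have different numbers of segments,
    since a morpheme-by-morpheme gloss needs a one-to-one correspondence.
    """
    morphemes = word.split("-")
    labels = gloss.split("-")
    if len(morphemes) != len(labels):
        raise ValueError(
            f"{word!r} has {len(morphemes)} segments but {gloss!r} has {len(labels)}"
        )
    return list(zip(morphemes, labels))


# Hypothetical example: English "dogs" segmented as stem plus plural marker.
print(align_gloss("dog-s", "dog-PL"))   # prints [('dog', 'dog'), ('s', 'PL')]
```

A mismatch such as `align_gloss("dog-s", "dog")` raises an error rather than guessing an alignment, which mirrors the requirement that every segmented morpheme receive its own gloss.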
In references to languages, we have tried to follow the SIL classification and have mostly used the language names recommended in the Ethnologue (version 15): Gordon, Raymond G., Jr. (ed.), 2005. Ethnologue: Languages of the World, Fifteenth edition. Dallas, TX: SIL International. Online version: http://www.ethnologue.com/. We have occasionally diverged from the Ethnologue where we had reason to believe that the community in which the language is spoken prefers a different name.
As our research is relevant to moves towards standardising the annotation conventions in computational linguistics and electronic language documentation, we have initiated contact or entered into collaboration with several external bodies involved with standards in linguistic annotation and linguistic data resources.
Since July 2005 we have acted as advisors to E-MELD (Electronic Metastructure for Endangered Languages Data), a US-based project run by the LINGUIST List, the world's largest online linguistic resource and forum. The goal of E-MELD is to promote consensus about key aspects of the infrastructure for linguistic archives by advocating adequate collaboration among archivists, field linguists, and language engineers. The E-MELD team are working towards the development of a common standard for the digitisation of linguistic data, in order to minimise the variation in archiving practices and language representation that could seriously inhibit data access, searching, and cross-linguistic comparison. One of their main concerns, shared by us, is that standards may be set without guidance from descriptive linguists, the people who best know the range of structural possibilities in human language.
Together with our colleagues from the Surrey Morphology Group, we participate in the recently created Ontology Wiki ('ontowiki'), whose aim is to map out a General Ontology for Linguistic Description (GOLD). The ontology is intended to capture the knowledge of a well-trained linguist, and can thus be viewed as an attempt to codify the general knowledge of the field. It is intended to give a formalised account of the most basic categories and relations used in the scientific description of human language, and thus to facilitate automated reasoning over linguistic data and establish the basic concepts through which intelligent search can be carried out. An example of an implementation of an ontology-driven search is ODIN (The Online Database of Interlinear Text), based on an earlier version of the ontology.
We have been in contact with the ISO Technical Committee 37 (TC 37), Sub-Committee 4 (SC 4), regarding their current work on the standard for morphosyntactic annotation ISO 24613. TC 37 concern themselves with terminology and other language and content resources, with SC 4 dealing more specifically with language resource management. One of the standards they are working on currently (ISO 24613), which is to be published in 2008, concerns an abstract metamodel, called the Lexical Markup Framework (LMF), that will provide a common, standardised framework for the construction of computational lexicons. We have provided detailed comments on a few working drafts of ISO 24613 in the hope of making a positive contribution to the standard.