The collective information system that we call language bridges knowledge across people, organizations, and cultures. It enables quite astonishing social coordination, at many levels, across wide landscapes of space and time, and is a foundation of virtually every other information system. Language has been a defining feature of humanity for more than seventy thousand years — but it didn't always exist, and it changes constantly. Understanding the origins and dynamics of language — one of the major transitions of evolution 1 — is a fascinating multidisciplinary effort in its own right.2
The study of language as an evolving information system has deep socio-technical import: Language is a "model organism" for many other modern dynamic, distributed, semantic systems such as web services, digital libraries, and multi-agent systems. The emergent, evolutionary, stability, and sensemaking properties of language can be used to understand many kinds of "self-organizing" information systems: How can coordinated concepts, signs, semantics, information organizations, and languages arise in information systems in general? What laws govern their structure and how they change? How can infosystems maintain a balance between globally stable interoperability and locally adaptive innovation? Understanding the natural evolution of infosystems' interactivity promises exciting new capabilities in representation, integration, usability, autonomy, and sustainability.
This study group/seminar covers mathematical, computational, experimental and empirical research at the intersection of language evolution and distributed information systems. Some central issues include: how artificial agents can create and adapt their own communication languages; models of human language emergence and change; symbol grounding; emergent web ontologies and "folksonomies"; population and network models of language dynamics; online consensus and agreement in very large spaces; emergence/evolution of signalling and communication in biological systems (molecular to population levels); language as a complex adaptive system; dynamic resource description/discovery systems and metadata; distributed information integration; etc. The study group is intended as a way for interested people in the UIUC community to participate and track work in this rapidly expanding area — the core UIUC group has been meeting steadily since 2003. The process is reading and presentation of research papers; a prior reading of each paper is generally assumed in the discussion.
The principal (but not exclusive) source of materials is the UIUC Language Evolution and Computation Repository.
Recommended introductory/background materials and samples of readings from earlier semesters are linked below the schedule to give an idea of topics and level.
1 Maynard-Smith, J. and Szathmary, E. (1997) The Major Transitions in Evolution, ch. 17. New York: Oxford University Press.
2 E.g., Maggie Tallerman, Ed., Language Origins: Perspectives on Evolution. Oxford University Press, 2005; M.H. Christiansen and S. Kirby, Eds., Language Evolution: The States of the Art. Oxford University Press, 2003; Chris Knight, James R. Hurford and Michael Studdert-Kennedy, Eds., The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form. Cambridge: Cambridge University Press, 2000. See also the UIUC Language Evolution and Computation Repository.
Time: 1:30pm to 3:00pm
Place: LIS Building, Room 340 (the "ISRL Commons", East end of the LIS building)
- James W. Minett and William S-Y. Wang, "Modelling Endangered Languages: The Effects of Bilingualism and Social Structure", Lingua 118 (2008). (You must use a UIUC-domain machine to have direct access using this link, via CITES VPN or an on-campus machine.) Alternatively, the preprint is available through the UIUC Langev repository here, or you can use UIUC's online journal access, if you're authorized.
Abstract: The mathematical model for language competition developed by Abrams and Strogatz allows the evolution of the numbers of monolingual speakers of two competing languages to be estimated. In this paper, we extend the model to examine the role of bilingualism and social structure, neither of which are addressed in the previous model. We consider the impact of two strategies for language maintenance: (1) adjusting the status of the endangered language; and (2) adjusting the availability of monolingual and bilingual educational resources. The model allows us to predict for which scenarios of intervention language maintenance is more likely to be achieved. Qualitative analysis of the model indicates a set of intervention strategies by which the likelihood of successful maintenance is expected to increase.
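For readers who want to experiment before the meeting, here is a minimal sketch of the baseline Abrams and Strogatz dynamics that the abstract refers to. It is not the Minett and Wang extension: there are no bilinguals and no social structure. It assumes the commonly cited form dx/dt = (1 - x) c x^a s - x c (1 - x)^a (1 - s), where x is the fraction of speakers of one language and s is that language's relative status. The function names, the Euler step size, and the parameter values are illustrative assumptions rather than anything taken from the paper.

```python
def abrams_strogatz_rate(x, s, a=1.31, c=1.0):
    """Rate of change of x, the fraction of the population speaking language X.

    s is the relative status of X (0 < s < 1); a and c are model constants.
    The default values are illustrative assumptions.
    """
    y = 1.0 - x                      # fraction speaking the competing language Y
    gain = y * c * x**a * s          # Y speakers switching to X
    loss = x * c * y**a * (1.0 - s)  # X speakers switching to Y
    return gain - loss


def simulate(x0, s, a=1.31, c=1.0, dt=0.01, steps=20000):
    """Integrate the model with a plain forward-Euler scheme; return the final x."""
    x = x0
    for _ in range(steps):
        x += dt * abrams_strogatz_rate(x, s, a, c)
        x = min(1.0, max(0.0, x))    # keep x a valid fraction
    return x


if __name__ == "__main__":
    # Same starting share of speakers, two different status values.
    for status in (0.4, 0.6):
        print(f"status={status}: final fraction={simulate(0.4, status):.4f}")
```

With these assumed settings, raising the status parameter is enough to turn decline into growth from the same starting share, which is the baseline behavior that a status-adjustment strategy like (1) in the abstract relies on.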
There are many occasions to consider statistical physics models for language dynamics. The reading is a dense and long (but good) overview paper on a variety of models; Section V is the most important as it focuses on models for language in particular.
- Claudio Castellano, Santo Fortunato and Vittorio Loreto. "Statistical physics of Social Dynamics", ArXiv, 2007.
Description from Soren Wichmann:
Reviews work by physicists on various social phenomena, including languages. It also contains a section describing some social models often used.
Matt Garley, UIUC Linguistics Ph.D. student, will attend today and we've asked him to give a short intro to his work - which he described to Les in an email as: "a sociolinguistic corpus analysis of English borrowings in the German hip-hop fan community by collecting and analyzing information from several online forums."
- Andrew B. Wedel, "Exemplar Models, Evolution and Language Change", The Linguistic Review, 23 (2006) 247-274. (UIUC subscribes to this journal, so you can get the official 28 page version from the online journals page.)
Abstract: Evidence supporting a rich memory for associations suggests that people can store perceptual details in the form of exemplars. The resulting particulate model of category contents allows the application of evolution theory in modeling category change, because variation in categorized percepts is reflected in the distribution of exemplars in a category. Within a production-perception feedback loop, variation within an exemplar-based category provides a reserve of variants that can serve as the seeds for shifts in the system over time through random or selection-driven asymmetries in production and perception. Here, three potential pathways for evolutionary change are identified in linguistic categories: pruning of lines of inheritance, blending inheritance and natural selection. Simulations of each of these pathways are shown within a simple exemplar-based model of category production and perception, showing how consideration of evolutionary processes may contribute to our understanding of linguistic category change over time.
- S. Wichmann, D. Stauffer, C. Schulze and E.W. Holman, "Do Language Change Rates Depend on Population Size?", Advances in Complex Systems, 11 (3): 357-369 (June 2008). (UIUC subscribes to this journal; however, there is a 12-month delay.)
Abstract: An earlier study (24) concluded, based on computer simulations and some inferences from empirical data, that languages will change the more slowly the larger the population gets. We replicate this study using a more complete language model for simulations (the Schulze model combined with a Barabasi-Albert network) and a richer empirical dataset (12). Our simulations show either a negligible or a strong dependence of language change on population sizes, depending on the parameter settings; while empirical data, like some of the simulations, show a negligible dependence.
- R.D. Gray, A. J. Drummond, and S. J. Greenhill, "Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlements", Science 323, 479 (2009). (You must use a UIUC-domain machine to have direct access using this link, via CITES VPN or an on-campus machine. Alternatively, you can use UIUC's online journal access, if you're authorized.)
Abstract: Debates about human prehistory often center on the role that population expansions play in shaping biological and cultural diversity. Hypotheses on the origin of the Austronesian settlers of the Pacific are divided between a recent "pulse-pause" expansion from Taiwan and an older "slow-boat" diffusion from Wallacea. We used lexical data and Bayesian phylogenetic methods to construct a phylogeny of 400 languages. In agreement with the pulse-pause scenario, the language trees place the Austronesian origin in Taiwan approximately 5230 years ago and reveal a series of settlement pauses and expansion pulses linked to technological and social innovations. These results are robust to assumptions about the rooting and calibration of the trees and demonstrate the combined power of linguistic scholarship, database technologies, and computational phylogenetic methods for resolving questions about human prehistory.
- P. F. Dominey, "Emergence of Grammatical Constructions: Evidence from Simulation and Grounded Agent Experiments", Connection Science, 17(3-4): 289-306. (UIUC subscribes to this journal, so you can get the official version from the UIUC Catalog entry.)
Abstract: This research takes grammatical constructions (sentence form-to-meaning mappings) as an alternative to abstract generative grammars in the context of understanding the emergence of language. A model of sentence processing based on this construction grammar approach is presented, and then a series of neuropsychological and neurophysiological studies are reviewed that attempt to validate the model and to establish its neurophysiological underpinnings. The resulting model is demonstrated to provide insight into a developmental and evolutionary passage from unitary idiom-like holophrases to progressively more abstract grammatical constructions. The model is then functionally validated by its insertion into a perceptually grounded system that allows spoken language interaction with a human interlocutor. The potential utility of this emergence approach in understanding language is discussed.
- Mark Pagel, Quentin D. Atkinson and Andrew Meade, "Frequency of word-use predicts rates of lexical evolution throughout Indo-European history.", Nature, 449(11 Oct. 2007):717-721.
Abstract: Greek speakers say 'ουρά', Germans 'schwanz' and the French 'queue' to describe what English speakers call a 'tail', but all of these languages use a related form of 'two' to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as 'tail') evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly--such as the number 'two', for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
- Y. Lee, T.C. Collier, C.E. Taylor and E.P. Stabler, "Cohesion of Languages in Grammar Networks." To appear in Cooperative Control of Distributed Multi-Agent Systems, Jeff Shamma, ed. Wiley, 2008.
- Readings: David Gil, "How Much Grammar Does It Take to Sail a Boat?" In G. Sampson, D. Gil & P. Trudgill (Eds.), Language Complexity as an Evolving Variable. Oxford University Press, 2008.
- Damon Centola, Juan Carlos Gonzalez-Avella, Victor M. Eguiluz and Maxi San Miguel, "Homophily, Cultural Drift, and the Co-Evolution of Cultural Groups." In Journal of Conflict Resolution, Volume 51, Number 6, December 2007.
Abstract: Studies of cultural differentiation have shown that social mechanisms that normally lead to cultural convergence (homophily and influence) can also explain how distinct cultural groups can form. However, this emergent cultural diversity has proven to be unstable in the face of cultural drift: small errors or innovations that allow cultures to change from within. The authors develop a model of cultural differentiation that combines the traditional mechanisms of homophily and influence with a third mechanism of network homophily, in which network structure co-evolves with cultural interaction. Results show that in certain regions of the parameter space, these co-evolutionary dynamics can lead to patterns of cultural diversity that are stable in the presence of cultural drift. The authors address the implications of these findings for understanding the stability of cultural diversity in the face of increasing technological trends toward globalization.
- Background reading: Robert Axelrod, "The Dissemination of Culture: A Model with Local Convergence and Global Polarization", in The Journal of Conflict Resolution, Volume 41, Issue 2, April 1997, 203-226. (This paper was previously read in LEADS on 3/7/2008.)
(You must use a UIUC-domain machine to have direct access using these links, via CITES VPN or an on-campus machine. Alternatively, you can use UIUC's online journal access, if you're authorized.)
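As a warm-up for the discussion, the two "traditional" mechanisms in the abstract (homophily and influence) can be sketched along the lines of Axelrod's model from the background reading. This is a simplified illustration under stated assumptions, not the Centola et al. model: it has no cultural drift and no co-evolving network. Agents sit on a small square lattice, each holding a list of cultural features. A randomly chosen agent interacts with a lattice neighbor with probability equal to their similarity, then copies one trait on which they differ. The function names, grid size, and parameter values are all hypothetical choices for illustration.

```python
import random


def similarity(a, b):
    """Fraction of cultural features on which two agents hold the same trait."""
    return sum(1 for u, v in zip(a, b) if u == v) / len(a)


def axelrod_step(grid, rng):
    """One interaction attempt on a square lattice with non-periodic edges."""
    size = len(grid)
    i, j = rng.randrange(size), rng.randrange(size)
    di, dj = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
    ni, nj = i + di, j + dj
    if not (0 <= ni < size and 0 <= nj < size):
        return  # simplification: a pick that falls off the lattice is skipped
    agent, neighbor = grid[i][j], grid[ni][nj]
    sim = similarity(agent, neighbor)
    # Homophily: the chance of interacting equals the current similarity.
    if sim < 1.0 and rng.random() < sim:
        # Influence: copy one trait on which the two agents differ.
        differing = [k for k in range(len(agent)) if agent[k] != neighbor[k]]
        k = rng.choice(differing)
        agent[k] = neighbor[k]


def run_axelrod(size=5, features=3, traits=2, steps=50000, seed=0):
    """Random initial cultures, then repeated local interactions."""
    rng = random.Random(seed)
    grid = [[[rng.randrange(traits) for _ in range(features)]
             for _ in range(size)] for _ in range(size)]
    for _ in range(steps):
        axelrod_step(grid, rng)
    return grid


def count_cultures(grid):
    """Number of distinct feature vectors present on the lattice."""
    return len({tuple(cell) for row in grid for cell in row})


if __name__ == "__main__":
    print("distinct cultures remaining:", count_cultures(run_axelrod()))
```

Two frozen configurations are worth noticing: a lattice where every agent is identical, and one where every pair of neighbors shares nothing. The second case is how local convergence can coexist with global polarization in this kind of model.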
- Readings: Samarth Swarup and Les Gasser, "The Iterated Classification Game: A New Model of the Cultural Transmission of Language", Accepted for publication in Adaptive Behavior.
Abstract: The Iterated Classification Game (ICG) combines the Classification Game with the Iterated Learning Model (ILM) to create a more realistic model of the cultural transmission of language through generations. It includes both learning from parents and learning from peers. Further, it eliminates some of the chief criticisms of the ILM: that it does not study grounded languages, that it does not include peer learning, and that it builds in a bias for compositional languages. We show that, over the span of a few generations, a stable linguistic system emerges that can be acquired very quickly by each generation, is compositional, and helps the agents to solve the classification problem with which they are faced. The ICG also leads to a different interpretation of the language acquisition process. It suggests that the role of parents is to initialize the linguistic system of the child in such a way that subsequent interaction with peers results in rapid convergence to the correct language.
LEADS meetings have ended for the Spring 2009 semester. LEADS will start up again in early June.