World’s largest grammar database, Grambank, reveals accelerating loss of language diversity
A new grammatical database, Grambank, documents the enormous diversity of current languages on Earth, highlighting just how much humanity stands to lose and why it’s worth saving.

The database, debuted in a study published today in Science Advances, could play a key role in preserving languages that are going extinct at an increasingly rapid pace. Global language experts estimate that, without intervention, about one language will be lost every month for the next 40 years due to social, political and economic pressures.
Grambank, now the world’s largest publicly available comparative grammatical database, reveals that language loss is occurring unevenly across major linguistic regions of the world. The analysis of more than 400,000 data points and 2,400 separate languages and dialects shows that Indigenous languages in northeastern South America, from Alaska to Oregon, and in northern Australia are at highest risk.
“Right now we’re at a critical state, in terms of language endangerment,” said Hannah Haynie, co-first author of the study and assistant professor in the Department of Linguistics at CU Boulder. “Grambank is showing us the importance of working on language documentation and revitalization in order to preserve this legacy of human communication, culture and cognition.”
Compared with previous databases, Grambank is larger in scale and more thorough. It encodes 195 possible grammatical features for about 215 language families. So far, the database has recorded all possible grammar information for approximately 4,300 languages, meaning it’s over halfway to encoding the more than 7,000 known languages in the modern world from existing data sources.
Of all 2,400 languages and dialects in the dataset, only five are coded identically in Grambank. Though vocabulary may play a big role in the mutual unintelligibility that linguists rely on to determine what counts as separate languages, Grambank shows that the grammatical ‘fingerprints’ of languages are also typically unique, Haynie said.
This work comes on the heels of the United Nations declaring 2022 to 2032 the International Decade of Indigenous Languages to promote language preservation, documentation and revitalization. Some of the most at-risk languages include Aleut in Alaska and the Salish languages of the Pacific Northwest, Yagua and Tariana in South America, and Kuuk-Thayorre and Wardaman in northern Australia.
The years-long, global data project was initiated by scholars in the Department of Linguistic and Cultural Evolution at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. More than 100 authors from 68 institutions, including CU Boulder, contributed to it.
The Grambank database is an open-access comprehensive resource maintained by the Max Planck Society.