
Gutierrez-Vasques, X., Bentz, C., and Samardžić, T. (2023). Languages through the looking glass of BPE compression. Computational Linguistics.

Petrini, S., Casas-i-Muñoz, A., Cluet-i-Martinell, J., Wang, M., Bentz, C., and Ferrer-i-Cancho, R. (2023). Direct and indirect evidence of compression of word lengths. Zipf’s law of abbreviation revisited. Glottometrics.

Ehret, K., Berdicevskis, A., Bentz, C., and Blumenthal-Dramé, A. (2023). Measuring language complexity: challenges and opportunities. Linguistics Vanguard, 9 (s1), pp. 1-8.

Bentz, C., Gutierrez-Vasques, X., Sozinova, O., and Samardžić, T. (2022). Complexity trade-offs and equi-complexity in natural languages: A meta-analysis. Linguistics Vanguard.
Data and Code

Ferrer-i-Cancho, R., Bentz, C., and Seguin, C. (2022). Optimal coding and the origins of Zipfian laws. Journal of Quantitative Linguistics, 29(2), p. 165-194.

Ehret, K., Blumenthal-Dramé, A., Bentz, C., and Berdicevskis, A. (2021). Meaning and measures: Interpreting and evaluating complexity metrics. Frontiers in Communication, 6, p. 66.
Data and Code

Dutkiewicz, E., Lee, S., Russo, G., and Bentz, C. (2020). SignBase, a collection of geometric signs on mobile objects in the Paleolithic. Nature Scientific Data, 7, 364.
Data and Code

Bentz, C., Dediu, D., Verkerk, A. and Jäger, G. (2018). The evolution of language families is shaped by the environment beyond neutral drift. Nature Human Behaviour, 2, 816-821.
Supplementary   Data and Code

Sahle, Y., Reyes-Centeno, H. and Bentz, C. (2018). Modern human origins and dispersal: current state of knowledge and future directions. Evolutionary Anthropology, 1-4, DOI: 10.1002/evan.21573.

Bentz, C., Alikaniotis, D., Cysouw, M. and Ferrer-i-Cancho, R. (2017). The entropy of words - learnability and expressivity across more than 1000 languages. Entropy, 19(6), 275; doi:10.3390/e19060275
R package Hrate   unigram entropies   entropy rates

Bentz, C., Alikaniotis, D., Samardžić, T. and Buttery, P. (2017). Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology. Journal of Quantitative Linguistics, 24(2-3), 128-162.
R package NFD

Bentz, C., Verkerk, A., Kiela, D., Hill, F. and Buttery, P. (2015). Adaptive communication: Languages with more non-native speakers tend to have fewer word forms. PLoS ONE 10(6): e0128254. doi:10.1371/journal.pone.0128254
R script

Bentz, C., Kiela, D., Hill, F. and Buttery, P. (2014). Zipf's law and the grammar of languages: A quantitative study of Old and Modern English parallel texts. Corpus Linguistics and Linguistic Theory, 10 (2), 175-211.

Hill, F., Korhonen, A. and Bentz, C. (2014). A quantitative empirical analysis of the abstract/concrete distinction. Cognitive Science 38, 162-177.

Bentz, C. and Winter, B. (2013). Languages with more second language learners tend to lose nominal case. Language Dynamics and Change 3, 1-27.
Data   R script


Samardžić, T., Gutierrez-Vasques, X., Bentz, C., Moran, S., and Sozinova, O. (2024). A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets. In: Findings of the Association for Computational Linguistics (NAACL 2024), p. 3367–3382.

Bentz, C. (2023). The Zipfian Challenge: Learning the statistical fingerprint of natural languages. In: Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL 2023).
Appendix   Data and Code

Moran, S., Bentz, C., Gutierrez-Vasques, X., Sozinova, O., and Samardžić, T. (2022). TeDDi sample: Text Data Diversity sample for language comparison and multilingual NLP. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), p. 1150-1158.

Gutierrez-Vasques, X., Bentz, C., Sozinova, O., and Samardžić, T. (2021). From characters to words: the turning point of BPE merges. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, p. 3454–3468.

Caines, A., Bentz, C., Knill, K., Rei, M., and Buttery, P. (2020). Grammatical error detection in transcriptions of spoken English. In: Proceedings of the 28th International Conference on Computational Linguistics, p. 2144–2162. Barcelona, Spain.

Berdicevskis, A., Çöltekin, Ç., Ehret, K., von Prince, K., Ross, D., Thompson, B., Yan, C., Demberg, V., Lupyan, G., Rama, T., and Bentz, C. (2018). Using Universal Dependencies in cross-linguistic complexity research. In: Proceedings of UDW-18, Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium.

Bentz, C. and Berdicevskis, A. (2016). Learning pressures reduce morphological complexity: linking corpus, computational and experimental evidence. In: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 26th International Conference on Computational Linguistics (COLING 2016), Osaka, Japan.

Bentz, C., Ruzsics, T., Koplenig, A. and Samardžić, T. (2016). A comparison between morphological complexity measures: typological data vs. language corpora. In: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 26th International Conference on Computational Linguistics (COLING 2016), Osaka, Japan.

Bentz, C. and Ferrer-i-Cancho, R. (2016). Zipf's law of abbreviation as a language universal. In: Bentz, C., Jäger, G. and Yanovich, I. (eds.) Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics. University of Tübingen, online publication system.

Bentz, C. (2016). The Low-Complexity-Belt: evidence for large-scale language contact in human prehistory? In: Roberts, S.G., Cuskley, C., McCrohon, L., Barceló-Coblijn, L., Feher, O. and Verhoef, T. (eds.) The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). doi:10.17617/2.2248195.

Caines, A., Bentz, C., Alikaniotis, D., Katushemererwe, F. and Buttery, P. (2016). The Glottolog data explorer: Mapping the world's languages. Proceedings of LREC, Portoroz, Slovenia, 2016.

Caines, A., Bentz, C., Graham, C., Polzehl, T. and Buttery, P. (2016). Crowdsourcing a multilingual speech corpus: recording, transcription and annotation of the CROWDED CORPUS. Proceedings of LREC, Portoroz, Slovenia, 2016.

Bentz, C. and Buttery, P. (2014). Towards a computational model of grammaticalization and lexical diversity. CogACLL: Workshop on Cognitive Aspects of Computational Language Learning at the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden.

Bentz, C. and Kiela, D. (2014). Zipf's law across languages of the world: Towards a quantitative measure of lexical diversity. In: Cartmill, E. A., Lyn, H., Roberts, S., and Cornish, H. (Eds.) The Evolution of Language: Proceedings of the 10th International Conference Evolang 10, Vienna 2014. World Scientific.

Bentz, C. (2014). What's next? (possible) agenda for evolutionary linguistics after Evolang9. To appear in: Evolang student volume.

Bentz, C. (2013). Beyond rule versus rote? Processing of distinctive dative and genitive case markers in German. CogSci 2013, The annual meeting of the Cognitive Science Society.

Hill, F., Korhonen, A. and Bentz, C. (2013). Large-scale empirical analyses of the abstract/concrete distinction. CogSci 2013, The annual meeting of the Cognitive Science Society.

Bentz, C. and Winter, B. (2012). The impact of L2 speakers on the evolution of case marking. In: Scott-Phillips, T. C., Tamariz, M., Cartmill, E. A., and Hurford, J. R. (Eds.), Proceedings of the 9th International Conference on the Evolution of Language (pp. 58-63). New Jersey: World Scientific.

Bentz, C. and Christiansen, M. H. (2010). Linguistic adaptation at work? The change of word order and case system from Latin to the Romance languages. In: A. Smith, M. Schouwstra, B. de Boer and K. Smith (Eds.), Proceedings of the Eighth International Conference on the Evolution of Language (pp. 26-33). London: World Scientific Publishing.


Bentz, C. (2018). Adaptive languages: An information-theoretic account of linguistic diversity. Trends in Linguistics. Studies and Monographs (TiLSM), volume 316. Berlin/Boston, De Gruyter Mouton.

Edited Volumes

Debowski, Ł., and Bentz, C. (eds.) (2020). Information theory and language. Entropy, Volume 22.

Sahle, Y., Reyes-Centeno, H., and Bentz, C. (eds.) (2019). Modern human origins and dispersal. Words, Bones, Genes, Tools: DFG Center for Advanced Studies Series. Tübingen: Kerns Verlag.

Bentz, C., Jäger, G. and Yanovich, I. (eds.) (2016). Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics. University of Tübingen, online publication system.

Book Chapters

Daneyko, T. and Bentz, C. (2019). Click languages tend to have large phoneme inventories: Implications for language evolution and change. In: Modern human origins and dispersal, ed. by Sahle, Y., Reyes-Centeno, H., and Bentz, C. Words, Bones, Genes, Tools: DFG Center for Advanced Studies Series. Tübingen: Kerns Verlag. p. 315-329.

Nichols, J. and Bentz, C. (2018). Morphological complexity of languages reflects the settlement history of the Americas. In: New Perspectives on the Peopling of the Americas, ed. by Harvati, K., Jäger, G., and Reyes-Centeno, H. Words, Bones, Genes, Tools: DFG Center for Advanced Studies Series. Tübingen: Kerns Verlag. p. 13-26.

Lozano, A., Casas, B., Bentz, C., and Ferrer-i-Cancho, R. (2017). Fast calculation of entropy with Zhang's estimator. In: Studies in Quantitative Linguistics 23, ed. by E. Kelih, R. Knight, J. Mačutek, and A. Wilson. Lüdenscheid: RAM Verlag. p. 273-285.

Bentz, C. and Winter, B. (2014). Languages with more second language learners tend to lose nominal case. In: Quantifying Language Dynamics: On the Cutting Edge of Areal and Phylogenetic Linguistics, ed. S. Wichmann and J. Good. Leiden: Brill.

Bentz, C. and Christiansen, M. H. (2013). Linguistic Adaptation: The trade-off between case marking and fixed word orders in Germanic and Romance languages. In: Eastward flows the great river. Festschrift in honor of Prof. William S-Y. Wang on his 80th birthday, ed. Feng Shi and Gang Peng. City University of Hong Kong Press, 48-56.

Invited Talks

Bentz, C. (2023). Beyond Words - Lower and upper bounds on the entropy of subword units in diverse languages. University of Düsseldorf, 16th International Cognitive Linguistics Conference (ICLC).

Bentz, C. and Dutkiewicz, E. (2023). Zipf’s law of abbreviation holds for geometric signs of the Upper Paleolithic. MPI Leipzig, Communicative Efficiency Workshop.

Bentz, C. (2020). Quantitative comparison of natural languages and other sequences. University of Jerusalem, Language Learning and Processing Laboratory.

Bentz, C. (2020). SignBase - A collection of geometric signs in the Paleolithic. University of Aarhus, CLIOARCH webinar.

Bentz, C. (2020). Language change as a (random?) walk in entropy space. University of Cambridge, Cambridge Language Sciences Annual Symposium, December 2019.

Bentz, C., Dediu, D., Verkerk, A. and Jäger, G. (2018). The evolution of language families is shaped by the environment beyond neutral drift. University of York, Workshop on Phylogenetic Linguistics and Linguistic Theory, November 2018.

Bentz, C., Alikaniotis, D., Cysouw, M. and Ferrer-i-Cancho, R. (2017). Word entropy across more than 1000 languages: the linear relationship between unigram entropy and entropy rate. University of Warsaw, Workshop on Statistics of Languages: Theories and Experiments, July 2017.

Bentz, C. (2017). Adaptive languages: An information-theoretic account of linguistic diversity. University of Jena, English Department, January 2017.

Nichols, J. and Bentz, C. (2016). Morphological complexity as an indicator of population isolation. University of Tübingen, Inaugural Symposium: Words, Bones, Genes, Tools, November 2016.

Bentz, C. (2015). The distribution of information-theoretic complexity across languages of the world. University of Tübingen, Inaugural Symposium: Words, Bones, Genes, Tools, November 2015.

Bentz, C. (2015). The impact of non-native speakers on word forms and (potentially) word order. Massachusetts Institute of Technology, TedLab talks, February 2015.

Bentz, C. (2014). Towards measuring and modelling the (potential) impact of non-native speakers on language structures. University of Zurich, Linguistisches Kolloquium, December 2014.

Bentz, C. (2014). Measuring and modelling the impact of non-native speakers on morphological productivity. University of Manchester, Department of Linguistics and English Language, Langwidge Sandwidge, October 2014.

Bentz, C. (2013). Zipf's law and the grammar of languages: A (potential) cross-linguistic measure of syntheticity. NLIP seminar series, Computer Laboratory, Cambridge University.

Conference Talks

Dutkiewicz, E., and Bentz, C. (2019). SignBase – a data-driven approach to abstract signs in the Paleolithic. International Conference on Mesolithic Art – Abstraction, Decoration, Messages, Halle (Saale), Germany.

Bentz, C., Dediu, D., Verkerk, A., and Jäger, G. (2019). The evolution of language families is shaped by the environment beyond neutral drift. Inaugural Workshop for the Center for the Interdisciplinary Study of Language Evolution (ISLE), Zurich, Switzerland.

Bentz, C., Nichols, J., and Jäger, G. (2017). Assessing the effect of geographical isolation on morphological complexity. 12th Conference of the Association for Linguistic Typology (ALT), Canberra, Australia.

Bentz, C. and Berdicevskis, A. (2016). Learning pressures simplify morphology: corpus, computational and experimental evidence. Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 26th International Conference on Computational Linguistics (COLING 2016), December 2016.

Bentz, C. (2016). Phylogenetic signals of language "external" factors. 46th Poznan Linguistic Meeting (PLM2016), September 2016.

Bentz, C. (2016). The Low-Complexity-Belt: Evidence for large-scale language contact in human prehistory? 11th International Conference on the Evolution of Language (EVOLANG 11, New Orleans), March 2016.

Bentz, C. and Ferrer-i-Cancho, R. (2015). Zipf's law of abbreviation as an absolute linguistic universal. Capturing Phylogenetic Algorithms for Linguistics. Lorentz Center Workshop, Leiden, October 2015.

Bentz, C. (2015). Causality in historical language change. Causality in the language sciences. Workshop by the Max Planck Institute for Mathematics in the Sciences, Leipzig, April 2015.

Bentz, C. (2014). Measuring and modelling lexical diversity across languages. 5th UK Cognitive Linguistics Conference (Lancaster, UK), July 2014.

Bentz, C. and Buttery, P. (2014). Towards a computational model of grammaticalization and lexical diversity. CogACLL Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics (Gothenburg), May 2014.

Bentz, C. and Kiela, D. (2014). Zipf's law across languages of the world: Towards a cross-linguistic measure of lexical diversity. 10th International Conference on the Evolution of Language (EVOLANG 10, Vienna), April 2014.

Bentz, C. (2014). Adaptive languages: Modeling lexical diversity cross-linguistically. Workshop in Computational Linguistics. Cambridge University, February 2014.

Bentz, C. (2013). Beyond rule versus rote? Processing of distinctive genitive and dative case markers in German. 35th annual meeting of the Cognitive Science Society. Humboldt University Berlin, August 2013.

Bentz, C. (2013). Measuring morphological productivity in synchronic and diachronic corpora. English Profile meeting. The Cass Center, Cambridge University Press, Cambridge.

Bentz, C. (2012). What frequency distributions of words in parallel texts might tell us about the grammatical structures of languages. 7th Newcastle-upon-Tyne Postgraduate Conference in Linguistics.

Bentz, C. and Winter, B. (2012). The impact of L2 speakers on the evolution of case marking. 9th International Conference on the Evolution of Language (EVOLANG 9, Kyoto).

Bentz, C. and Christiansen, M. H. (2010). Linguistic adaptation at work? The change of word order and case system from Latin to the Romance languages. 8th International Conference on the Evolution of Language (EVOLANG 8, Utrecht).


Scheiffele, S., Bentz, C., Haidle, M., and Stolarczyk, R. (2017). Assessing tool complexity: Combining approaches from Cognitive Archaeology and Information Theory. 7th Annual Meeting of the European Society for the study of Human Evolution (ESHE), Leiden, September 2017.

Bentz, C. (2016). A comparison between morphological complexity measures: Typological data vs. language corpora. Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 26th International Conference on Computational Linguistics (COLING 2016), December 2016.

Bentz, C. (2014). Zipf's law and the grammar of languages. Synthetic and analytic encoding strategies across languages of the world. Nijmegen Lectures at the MPI for Psycholinguistics, Nijmegen (27-29 January).

Bentz, C. (2013). Zipf's law and the grammar of languages. Synthetic and analytic encoding strategies across languages of the world. Language Sciences in the 21st century, Cambridge (3-4 October).