Paleogenetics and linguistics work hand in hand, revealing prehistory

Paleogenetics and linguistics: disparate at first sight, but together they are helping to reveal the mysterious history of human migrations thousands of years ago. Pavel Flegontov, a 35-year-old scientist of Russian origin at the University of Ostrava (Czech Republic), and his team have recently published an article in Nature that aims to settle a long-standing dispute over North American prehistory by combining data and knowledge from paleogenetics, linguistics and archaeology.

Their breakthrough results suggest that a hypothesis discounted by most linguists might actually be true: the Yeniseian language family of central Siberia, now almost extinct, and the Na-Dene languages spoken by many Native Americans might share a common ancestor. This success has brought Flegontov to the top of his field: he will continue his research at Harvard University with the world-famous geneticist David Reich.

Paleogenetics and linguistics seem too disparate to complement each other. How exactly does it work?
Historical linguists are able, with some degree of certainty, to reconstruct proto-languages by meticulously comparing word roots and grammatical rules across related languages. This has been done for the Indo-European language family, so we know roughly what the ancestral language looked like 6 000 years ago. For instance, we know that the Indo-Europeans most likely had wheels, wagons and horses, because words for them can be reconstructed in the Proto-Indo-European language. But although it is the most widely spoken language family in the world, comprising among others the Slavic, Germanic, Celtic and Italic branches (the Romance languages belong to the latter), its origins long remained unknown.

And is the origin of the Indo-European language family clear nowadays?
A classic theory of its origin was put forward on the basis of “linguistic archaeology” and real archaeology. Once a proto-language has been reconstructed, we look for an archaeological culture that had those elements and try to locate it. This points to the Pontic-Caspian steppe and a culture called Yamnaya (Pit Grave) that existed roughly 5 000 years ago, and it seems that it might actually have been a community that spoke a very early Indo-European language. Later another hypothesis appeared, arguing that the Indo-European languages were spread by the first wave of farmers and herders migrating from Anatolia (modern Turkey) to Europe, that is, much earlier, about 10 000 years ago. For a long time it was almost impossible to decide which of these hypotheses was correct.

But now that we’ve sequenced the first farmers (by “we” I mean paleogeneticists in general, not my group in particular), ancient Europeans from different time periods and the Yamnaya people themselves, we can see that there was a massive, perhaps even violent, influx of Yamnaya people from the steppe into Europe around 4 500 years ago, and that up to half of the European gene pool was replaced. We also see Yamnaya-related people moving east as far as China and south into India. We are now tracing these migrations, and they match pretty well with what we know about the ancient Indo-Europeans from archaeology and linguistic reconstruction. So we have found good archaeogenetic support for the classic hypothesis; it is now very strongly supported, although some debate is still going on. We are now trying to build similar stories for other regions of the world, such as the relationship between the Yeniseian and Na-Dene language families.

That is the subject of your article recently published in Nature: a dispute about North American prehistory. What was unclear, and how did your team contribute to resolving it?
Yeniseian is an almost extinct language family in Siberia, currently spoken only by the Kets, a small group scattered across a few villages in the middle Yenisei River basin, deep in the taiga, very far from civilisation. Ket will probably be dead within a few years, like its relatives, which died out in the 19th and 20th centuries or earlier. But those languages were recorded and studied, and based on word lists and other data, linguists have reconstructed the Proto-Yeniseian language.

There is also a large language family in North America called Na-Dene. These are the languages of various Native American groups living mostly in Alaska and northwestern Canada. Na-Dene looks really distinct from the other language families, but a few linguists have proposed that the Na-Dene and Yeniseian languages are related, i.e. that they descend from a common ancestor. This hypothesis remains controversial and is not accepted by most linguists. However, it is still considered among the best supported in the class of “not universally accepted language relationship hypotheses”.

We assembled a wide arsenal of methods, both standard and novel, obtained genomic data for ancient Aleuts, Athabaskans and Eskimos of Chukotka for the first time, and got breakthrough results using two independent graph-based methods. We have seen migration flows that could be responsible for the spread of the Dene-Yeniseian language family, which of course does not prove that a Dene-Yeniseian proto-language existed. It is up to linguists to reach a common opinion on that matter, but it now seems more likely that this family existed.

Another debate, this time in the genetic literature, concerned the origin of Na-Dene. Our major competitors, a large archaeogenetic team from Denmark led by Eske Willerslev, published a series of papers between 2014 and 2018 in the most prestigious journals (Nature, Science), claiming that the so-called Paleo-Eskimos, the first inhabitants of the American Arctic, did not admix with Na-Dene ancestors. This contradicts earlier genetic results published in 2012 by our principal co-author David Reich. But by applying graph-based methods carefully, we were able to show that the models favoured by Willerslev’s group (in which Koryak- and Chukchi-related people admixed with Na-Dene ancestors) are not the best supported ones, and that the earlier result published by David Reich’s team is correct. We were also able to develop a rather detailed model of population movements back and forth across the Bering Strait during the last 5 000 years.

That is quite an achievement. What are your scientific plans now?
I’m now working on a large-scale project on the history of Asia over roughly the last 10 000 years. Unlike for western Eurasia, no detailed graph describing Asian population history has been published so far. We have developed, for the first time, a robust graph for the major Asian lineages and used it to reveal the genetic affiliations of hundreds of ancient individuals and of present-day populations. So we can learn what the genetic composition of the Wusun, Kangju, Xiongnu and dozens of other mysterious peoples mentioned in Chinese chronicles was. I have been invited to continue this and similar projects as a member of David Reich’s team at Harvard Medical School. I hope to move to Harvard in September and work there as a staff scientist for about 5 years. I plan to keep a part-time position and my research group at the University of Ostrava, and I hope to return to Czechia in 5 years.

Wow, I’m sure the University of Ostrava is also hoping you will come back. Why did you choose the University of Ostrava for your career in the first place?
I didn’t choose the University of Ostrava; it chose me. Back in 2013 Marek Eliáš kindly offered me the chance to start my own research group here, originally funded fully by the University. He had noticed my skills in microbial genomics, and genome biologists were rare in Czechia at that time. But I immediately opened a new research direction in my group alongside the old one. I still work on the biodiversity and ecology of marine microbes, but I mostly focus on human history.

All the facts about prehistoric migration you have mentioned are fascinating. But to me, the most interesting thing would actually be how and why people migrated. Are you also interested in this?
Of course, I am also interested in that aspect. As for the Indo-European expansion, its main driver was technological innovation: these people were the second to domesticate the horse and among the first to use the wheel and wheeled vehicles, and thanks to that they became a formidable military power. They conquered and killed extensively.

We see a similar situation with the Eskimo-Inuit peoples in a completely different part of the world, one to two millennia ago. They were the first groups in the Arctic to become really successful at whale hunting, so they gained access to a huge source of meat and fuel (fat) and could feed large villages. They developed advanced technology such as large boats and sophisticated harpoons and spears, and spread rapidly throughout the American Arctic. Their precursors, the Paleo-Eskimos, were probably the first people to hunt marine mammals such as seals and walruses on a regular basis, and within only a few centuries they had occupied huge territories along the American Arctic coast. They moved from Alaska to northern Greenland, which was warmer than today about 4 000 years ago. But the polar night there lasts for several months, and we do not know how they survived: they could not hunt during the polar night and probably had to live on stocks of meat stored in the permafrost. They also had only a little driftwood for fuel, since oil lamps burning seal, walrus or whale oil came much later. They might have slept under piles of animal furs to preserve heat, in a sort of hibernation. But we don’t know for sure.

Where do you get the information from? How do you get the samples?
My colleagues and I in Ostrava don’t collect any samples ourselves, but there are people who travel around the world, take teeth or skull fragments from museums and anthropological collections, and provide them to archaeogenetic centres. There the samples are analysed and a massive amount of data is obtained. I collaborate with two large archaeogenetic centres, in Germany and in the USA. The samples are mostly teeth or pieces of the petrous part of the temporal bone, which surrounds the cochlea of the inner ear. That is the densest bone in our body, and for reasons that are not fully understood, DNA preservation is exceptionally good there.

How exactly do you get data from a piece of skull? Is it a complicated process?
That part is a routine protocol now, but it became routine only a few years ago, and it is still technically complicated. First, you clean a certain part of the surface of the skull and irradiate it with ultraviolet light and/or wash it with bleach to reduce contamination with modern DNA. Then you drill into it, take a small piece of the dense bone around the cochlea, mill it into fine powder and extract the DNA. Sequencing “libraries” are then made and sequenced on a high-throughput sequencing machine. We check whether there is enough human DNA in the ancient sample, because typically 90 % or even 99 % of the DNA is bacterial. We then prioritize well-preserved samples and treat them with an enzyme (uracil-DNA glycosylase) that removes the most common type of ancient DNA damage (DNA that sits in bones for millennia gets degraded and chemically modified). Afterwards, we sequence the libraries at high coverage and extract as much information as possible. Nowadays another technology called “targeted enrichment” is available. It uses a set of, let’s say, one million short DNA probes that correspond to variable positions in the human genome. We apply the ancient sample (actually the sequencing library made from it) to these probes; they capture the human DNA, and the unbound DNA is washed away, so we get an enriched sample. The method is now common, and it helps extract information from samples that would have been inaccessible a few years ago, such as samples from the tropics.

What can we learn from DNA these days? Can we tell what people looked like for example?
We can learn about the phenotypes of ancient people, what they looked like, with some level of certainty, because certain genes are responsible for eye colour, skin colour, hair colour, lactose tolerance, etc. It has been found that hunter-gatherers living in Europe before farming arrived (some 8 000 years ago and earlier) had dark hair, blue eyes and dark skin. The combination of blond hair and blue eyes that is common in northern Europe today appeared fairly late. Blond hair was possibly brought by the Indo-European expansion and derives from the eastern hunter-gatherers of the East European Plain. It is not native to central and western Europe; it was brought there. So we can reconstruct only simple traits; we cannot reconstruct the shape of the face or body.

We can also learn about ancient pathogens; that’s a very popular area of research now. From a tooth of an ancient skeleton we can sequence the DNA of some pathogens, such as the plague bacterium, Yersinia pestis. Findings suggest that it might have been the Huns who brought the plague that struck the Byzantine Empire in the 6th century AD. This massive epidemic is known as the Justinian plague. Plague struck again in the 14th century as the Black Death, although genomic studies indicate that this outbreak was caused by a different lineage of the bacterium.

Is it possible to tell the ancestry of an individual? Based on my DNA, can I learn whether I am a pure Czech or have some ancestors in other countries or parts of the world?
Such estimates are extremely imprecise. Especially for those of us living in a relatively homogeneous Slavic society, genetic testing generally cannot tell us much of interest about our ancestry, because it is hard to distinguish Slavs from Germans genetically, for example. For people living in highly mixed societies such as the USA, genetic testing is more informative, because they often don’t know their ancestral proportions even at the continental scale, for instance whether they have African, East Asian or Native American ancestors.

For anyone who is interested in the topic but is not a scientist in the field, is there any book you would recommend, let’s say, for average brains?
There is “Who We Are and How We Got Here” by David Reich, which has been discussed a lot recently because in it Reich supports the scientific concept of “race”, while there is a trend in the US nowadays to purge this word and concept from the scientific literature (oddly enough, the word is extremely common in the US media). The book gives a detailed account of all the recent discoveries, but its style is a little heavy. There is also “A Brief History of Everyone Who Ever Lived” by Adam Rutherford.