Large Language Models and Who Owns Our Knowledge

This transcript is dictated by Peter Murray-Rust and doesn't represent anyone else's views.

The Rise of Large Language Models

This is about the use of large language models and who owns what.

In the last 5 years, a new technology has developed, or rather, a fairly old technology has been brought to production, where it's possible to scrape the whole of published knowledge, if you can get it, and if you're large enough and have the resources to do it.

It's then scraped and condensed into something called large language models, which I won't go into here. But I've had some involvement with these models.

How LLMs Work

These models are a condensation of the material that comes in, so information is lost during this process. Therefore, what comes out may include inaccuracies. It may also include hallucinations, where the large language model system makes up answers that are linguistically plausible but make no sense or are completely untrue.

However, these models are sufficiently valuable that a very large number of people are using them, and they're becoming mainstream.

The Copyright Question

Can we do anything about this? Should we do anything about it?

In principle, our copyright should prevent the reuse of this material as derivative works. But the only way that we can find out whether they are derivative works is to have the question answered in a court, and of course, it depends on what jurisdiction you're in and how far that extends. So it's a complex matter, and one that very few individuals can become involved with.

The Reality

In practice, the large language model system owners can ride roughshod over our knowledge, and they can do what they like with it.

There are many downsides to this. The authorship of the material is often, or usually, not referenced. In other words, we can't say who wrote it. They may take our material and assert our authorship of it when it's combined with something else, and people have found their voices and their faces associated with material that they did not create.

The Future of Knowledge

Coming back to the largely textual approach of open access, we can expect a large amount of material which is a composite of the scraped material, and which may or may not be trustworthy in detail.

So, I don't see any end to this.

semanticclimate
