How to Protect the Data that Powers Artificial Intelligence

Adam Philipp
June 7, 2021

Data fuels AI,
But can it be protected?
It’s not so easy.


Many Artificial Intelligence (AI) applications run on data. For example, as ZDNet explains, machine learning (a subset of AI) dates back to 1959, when the phrase was coined by Arthur Samuel, who developed the Samuel Checkers-playing Program — a self-learning project.

To “learn,” machine learning systems are fed huge amounts of data, which they then use to learn how to carry out a specific task, such as understanding speech or captioning a photograph. The quality and size of this dataset is important for building a system able to accurately carry out its designated task. For example, if you were building a machine-learning system to predict house prices, the training data should include more than just the property size, but other salient factors such as the number of bedrooms or the size of the garden.
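For readers who want to see the house-price point in concrete terms, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the seven listings, the prices, and the helper names (`fit_least_squares`, `mean_abs_error`) are invented, and ordinary least squares stands in for whatever method a real system might use. It fits one model on property size alone and another on size, bedrooms, and garden size, then compares how far off each one is.

```python
# Illustrative only: all listings and prices below are made up.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Return [intercept, w1, w2, ...] minimizing squared prediction error."""
    xs = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def mean_abs_error(weights, rows, targets):
    """Average absolute gap between predicted and actual prices."""
    total = 0.0
    for row, actual in zip(rows, targets):
        predicted = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
        total += abs(predicted - actual)
    return total / len(rows)


# Hypothetical listings: (size in square meters, bedrooms, garden in square meters)
listings = [
    (80, 2, 0), (100, 2, 50), (120, 3, 100), (150, 3, 20),
    (200, 4, 300), (90, 1, 200), (170, 5, 10),
]
# Invented prices that depend on all three factors.
prices = [20000 + 1000 * s + 15000 * b + 100 * g for s, b, g in listings]

size_only = [(s,) for s, _, _ in listings]
error_size_only = mean_abs_error(fit_least_squares(size_only, prices), size_only, prices)
error_full = mean_abs_error(fit_least_squares(listings, prices), listings, prices)

print("Average error using size only:", round(error_size_only, 2))
print("Average error using all three factors:", round(error_full, 2))
```

Because the invented prices depend on all three factors, the size-only model is left with a sizable error while the fuller model fits these listings almost exactly. Real data is far messier, but the same logic explains why richer training data is worth more.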

Many “free” consumer applications are really harvesters to collect information that can be monetized – sometimes in the form of “food” for machine learning applications. As has often been said, “If you’re not paying for it, you’re not the customer; you’re the product being sold.”

As we noted in this recent blog, when a cloud storage app misrepresented its privacy practices and used uploaded photos to develop a facial recognition model, the Federal Trade Commission (FTC) not only ordered the company to delete and destroy all photos and videos collected from users, it also ordered the deletion or destruction of any “Affected Work Product” – any models or algorithms developed using the improperly collected photos.

Collecting vast amounts of data can be tedious, time-consuming, and expensive, and that data clearly has value. But is data a form of intellectual property (IP) that can be protected against unauthorized use and copying by others?
Let’s look at the options.

Facts, per se, can’t be protected as any form of IP.

As New Media Rights explains,

Generally, facts and utilitarian language can’t receive copyright protection. Facts about the natural world or current and past events may be discovered, but that discovery isn’t an act of authorship that the law deems worthy enough to protect. This means that even if someone spends a lot of time and mental energy discovering a fact, you can still copy that fact and use it in your own work in any way you want without issue.

As Emory Libraries explains,

Databases as a whole can be protected by copyright as a compilation, but only under certain conditions. The first is that mere collection of data is not enough. The arrangement and selection of data must be sufficiently creative or original.
As the Supreme Court put it in Feist Publications v. Rural Telephone:
“Factual compilations… may possess the requisite originality. The compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers. These choices as to selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity, are sufficiently original that Congress may protect such compilations through the copyright laws.”

That degree of creativity generally isn’t present in the buckets of data used to train AI applications.

Similarly, under patent law, the US Supreme Court has held that facts like the laws of nature (for example, E = mc²) and natural phenomena aren’t patentable because the “manifestations of laws of nature” are “part of the storehouse of knowledge,” “free to all men and reserved exclusively to none.”

Thus, the only form of US IP protection available for databases may be trade secret law. This protects data that the owner has made reasonable efforts to keep secret.

Trade secret protection may work fine as long as the AI training data is kept in-house, to train the owner’s own AI products. However, if the data owner wants to license out that data to others, trade secret protection becomes much more tenuous.

Once “the cat is out of the bag” and the data becomes public, there may be little to nothing the data owner can do to prevent others from using its data.

Just like the haiku above, we like to keep our posts short and sweet. Hopefully, you found this bite-sized information helpful. If you would like more information, please do not hesitate to contact us here.
