Sampling, AI Training, and Fair Use: The Misplaced Notion of Ethical/Non-Ethical Training
Immediately after any new major technological advancement emerges, people take sides. They become entrenched in hardcore positions that are based more on their opinion (what they feel) and far less on fact. And more often than not, the immediate reaction is that the new technology is bad for artists and creativity.
Centuries ago, when print technology was invented, many writers cried foul… When electric instruments like the electric guitar were invented, some musicians swore that it was the end of “real” music and “real” musicians… The same sentiment was echoed by many musicians when drum machines and digital samplers were invented…
Every major technological advancement, especially where writing, music, and creativity are concerned, generates immediate controversy. People take sides. And those who rail against the new technology inevitably end up on the wrong side of history.
The AI Training Debate
AI technology is here. And while it certainly holds various implications for creativity and authorship, if you see it as a “bad” thing, as something that will ultimately “hurt” creativity and undermine artists, writers, musicians, painters, and the like, you’re on the wrong side of history.
With regard to AI training, there is currently strong debate as to whether fair use protects such activity. Recent high-profile copyright infringement cases involving AI have raised the profile of this issue, demonstrating that, while the courts may seem split on various AI-related questions, the general consensus is that AI training meets the threshold of fair use. Also, the 2025 report “Copyright and Artificial Intelligence, Part 3: Generative AI Training,” released by the United States Copyright Office, points to one thing: “Various uses of copyrighted works in AI training are likely to be transformative… The fair use determination requires balancing multiple statutory factors in light of all relevant circumstances…it is not possible to prejudge the result in any particular case.”
And even while the Copyright Office states that it “will continue to monitor developments in technology, case law, and markets, and to offer further assistance to Congress as it considers these issues,” there’s no getting around the reality that AI “training” on copyrighted works is the sort of activity that fair use seeks to protect.
Hence, the “ethical”/“non-ethical” and “consent” sub-issue that has arisen alongside the general debate about AI training is largely misplaced. The public does not need consent to learn from, study, digest, and thereby “train” on, a piece of music or any other copyrighted work. Consent comes into the picture at the point of publication and distribution of the element(s) appropriated from a copyrighted work. And when an appropriation of a copyrighted work leads to a formal claim of copyright infringement in a court of law, the outcome is determined on a case-by-case basis; there is no bright line. Nothing in the Copyright Act imposes a blanket consent requirement for simply “training.”
This is the standard. Unfortunately, most people know very little about how copyright law actually works; and they know even less about fair use, a critical safe harbor for permissive borrowing, and its primary objectives. Thus, some people rush to inject the concept of consent where consent isn’t needed.
Beyond what the Copyright Office has to say on this matter (and it’s clear that the Copyright Office’s position is that AI training on copyrighted works is not de facto copyright infringement), it’s becoming increasingly clear that the issue isn’t really about using copyrighted works for “training”; again, recent court decisions point to such training as being transformative and therefore fair use. The real issue is access to copyrighted works, i.e. how the copyrighted works were accessed, not the training itself.
The Fundamental Fact At Play Here
Beyond this debate about AI training on copyrighted works, however, remains a fundamental fact that some people are overlooking. At the core, imitation and referencing are taking place when AI music generation is at work. This is the same process, imitation and referencing via analysis, “digestion,” “learning,” and “training”, that takes place when any musician listens to, studies, and digests any piece of music. Such activity has never required consent (permission) under copyright law. And the specific doctrine that has always protected this activity is fair use.
A garage band honing its skills on Led Zeppelin songs, i.e. training, is fair use, whether the band is using a vinyl record, a YouTube video, or a digital download. How the garage band accessed the Led Zeppelin music is a different matter altogether; even an illegal download does not negate the public’s right to make fair use of the material. Again, access to the copyrighted work is the real issue here. By contrast, a garage band selling cover versions of Led Zeppelin songs to the public without a mechanical license is not fair use. That activity, unlike mere training, is not protected by copyright law. See the difference?
Access to music is not something copyright law itself restricts. More importantly, AI companies, like the rest of the public, already have access to any music that has ever been published and made available to the public. That an AI company uses publicly available material, in this case songs, to train its AI generator isn’t copyright infringement. If the AI company is not republishing or redistributing the music (in whole or substantial part) that it analyzes and learns from, i.e. “trains” on, then that’s not copyright infringement. It’s also worth noting that at the time of this article’s publication, there is no law that specifically prohibits the use of copyrighted works in AI training. In fact, the Copyright Office, whose primary policy concern is the public good, has only recently begun to weigh in on this. And reading the Copyright Office’s initial reaction, it’s clear that the Office isn’t convinced that the use of copyrighted works in AI training is illegal.
Again, it doesn’t matter whether the AI company accesses music through a major streamer like Spotify or Apple Music, through YouTube, through a direct upload from a user, or through any other point of access. If an AI company’s music generator simply analyzes the characteristics of the music and songs that it ingests (digests), then imitates (“uses”) what it’s learned to create sound-alike songs or sound-alike musical elements, that’s also not copyright infringement.
Viewed in this light, when you get right down to it, every musician, every recording artist since the 1930s — roughly the dawn of the music industry — has been “trained” on copyrighted music! Any musician who’s ever listened to a copyrighted song played on the radio or contained on a vinyl record or a cassette tape or a CD or in a download or on any capture media known to date has “trained” himself on copyrighted music. Led Zeppelin “trained” on music by the great blues musicians Robert Johnson, Willie Dixon, Muddy Waters, and Howlin’ Wolf. And Greta Van Fleet “trained” on songs by Led Zeppelin.
Finally, whether you sample from a vinyl record, a YouTube video, an mp3 file, or any other audio capture medium doesn’t make the sampling more “cool.” The art of sampling is not limited to any one method. And while the tradition of sampling vinyl records has always been historically significant, it doesn’t render the use of any other audio format arbitrarily inferior. In fact, throughout the mid-’80s and ’90s, sample-based beatmakers (producers) routinely sampled other audio formats. So while sampling vinyl may be “cooler” to some, what counts, as always, is the final result. If the beat/song is dope, it’s dope! And sampling in general is already cool.
This article is adapted from The Art of Sampling, 3rd Edition