Google defends AI training as fair use in its governance document

Since the launch of AI Overviews, search and publishing professionals have been watching closely how AI companies handle the content used to train their models. Google has now set out its position: it leans on fair use and opt-out controls, and points to paid agreements for specific situations.

In a policy paper published June 25, Google says training models on publicly available web data is a “transformative, non-expressive use” that should remain protected under fair use in the United States. The company presents opt-out controls and existing copyright law as its primary answers to publisher concerns.

The paper, titled “A Pragmatic Approach to AI Governance in America,” brings together points Google has made before. It arrives as regulators and publishers push for more than opt-outs: clearer attribution and, in some cases, compensation. For publishers deciding how to manage AI access to their content, it offers a useful overview of Google’s position.

Google’s position on copyright

Google likens AI training to “an art student getting inspiration from a gallery walk.” It also argues that the same level of protection should be extended internationally through text and data mining exceptions.

For site owners who don’t want their content used, Google recommends machine-readable controls such as Google-Extended in robots.txt. When AI outputs copy existing work, the paper argues that the remedy is not to filter outputs for being “too similar” but to rely on established notice-and-takedown processes.
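As a rough illustration of the robots.txt control mentioned above, the sketch below checks how a sample robots.txt would treat the Google-Extended token compared with Googlebot, using Python's standard urllib.robotparser. The robots.txt contents, the example.com URL, and the is_allowed helper are assumptions made for this example; they are not taken from Google's paper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for this example):
# block the Google-Extended token, allow every other user agent.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

This only simulates how a standards-following parser reads the rules. Whether and how a given crawler honors them, and which products a token covers, depends on the crawler operator's own documentation.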

Google also says it is exploring new ways to create value, such as partnerships with websites whose content helps keep AI responses current and accurate, and agreements to pay for access to specialized, non-public content. The paper does not name any specific program, terms, or timeline.

Where the paper lands

This month, the UK CMA introduced a new conduct requirement that lets websites opt out of AI search features and requires Google to attribute publisher content. The regulator said the measure is meant to strengthen publishers' negotiating power. Google has already started testing an opt-out toggle, although the reports publishers would use to inform that decision do not yet include click data.

US publishers have taken an even firmer line. Digital Content Next recently sent a cease-and-desist letter to the Common Crawl Foundation, stressing that “copyright law is not an opt-out regime.” In other words, scrapers should ask permission before using content, rather than publishers having to ask to be excluded. That view directly challenges the opt-out model described in Google's paper.

Why it matters

The paper sets out Google’s position as policymakers consider new rules. Google recommends leaving the current approach unchanged.

Publishers and regulators want more than the paper currently offers. They are asking for compensation, permission-based scraping, and detailed click-level data. The paper, in response, offers opt-out controls and deals negotiated case by case.

Looking to the future

These are policy positions, not product commitments. The partnerships and content deals Google mentions could shape how value reaches publishers, but the paper leaves the details open. Watch whether Google attaches programs, terms, or figures to the value-exchange language it now includes in its policy papers.

Featured image: FotoField/Shutterstock
