Navigating the Risks of Artificial Intelligence on the Digital News Landscape

Caitlin Chin | 2023.08.31

Artificial intelligence (AI) is the most recent dilemma confronting the news industry, particularly following the public research release of ChatGPT in November 2022. A few outlets, including BuzzFeed, News Corp Australia, and G/O Media, quickly moved to incorporate generative AI into their content production. In early 2023, BuzzFeed rolled out ChatGPT-fueled quizzes, travel articles, and a recipe recommendation chatbot named Botatouille. Many others are scoping longer-term strategies: the Washington Post, for example, announced the creation of two internal teams in May 2023 to explore future uses for AI. Writers, on the other hand, have generally been more cautious. Both the Writers Guild of America, East and the Gizmodo Media Group Union condemned G/O Media in July 2023 for publishing AI-generated articles without first consulting editorial staffers, warning that “unreliable AI programs notorious for creating falsehoods and plagiarizing the work of real writers” were “an existential threat to journalism.”

Some AI developers are attempting to get ahead of the controversy by framing their chatbots as value-added features for the news industry — in other words, helpers, not displacers, of human journalists. Over the past few months, Google has reportedly met with both national and local news outlets to pitch Genesis, a generative AI chatbot that can draft headlines, social media posts, and articles, which it frames as a productivity booster. In July 2023, OpenAI partnered with the American Journalism Project to provide $5 million in direct grants to enable local newsrooms to test-drive AI. The same month, it struck an agreement with the Associated Press to access archived articles dating back to 1985 to train large language models (LLMs) in exchange for both licensing fees and experimental use of OpenAI software. But these limited partnerships gloss over technology’s strained history with newsrooms, one in which most journalists have received no compensation for the use of their work to train algorithms, even as digital ad-tech monopolies have contributed to a long-term decline in publishers’ marketing revenue.

A common refrain has been that newsrooms must evolve to accommodate technological advancements, but this characterization is neither accurate nor fair. Even publishers that have adapted to the whims of powerful technology corporations have faced repercussions for doing so. For example, some digital news outlets redesigned their distribution strategies to capitalize on social media’s peak growth in the early 2010s, allowing individual users to view and share article links on decentralized channels in exchange for a steady stream of clicks. BuzzFeed, which initially gained traction through social media virality instead of traditional print subscriptions, epitomized this novel business model. But when Facebook unilaterally modified its content-ranking algorithm in January 2018 to prioritize advertiser and connection-based engagement, which reduced the visibility of external news websites, early movers like BuzzFeed were hit the hardest. BuzzFeed abruptly closed its Pulitzer-winning news division in April 2023, citing revenue shortfalls, while outlets like the New York Times, which had diversified their income streams with traditional subscriptions, were less vulnerable to opaque decisions by large technology companies.

The sustainability of news cannot fall on publishers alone; large digital platforms must share responsibility to understand and address their sizable impacts on society. Yet search engine and social media companies operate with relatively few U.S. legal requirements to build fairness and transparency into algorithms, protect sensitive personal information when serving personalized advertisements, engage in ad-tech practices that promote fair competition with news publishers, and mitigate the spread of harmful content online. Without bright-line U.S. regulations for technology companies, the recent acceleration in AI adoption presents at least four major risks that could severely undermine both news availability and public access to information in the long term.

(1) Search engines may adopt AI to answer user queries, which would significantly decrease web traffic to external news websites.

Newspapers are in a reciprocal but largely unequal relationship with search engines. Google, which controls approximately 92 percent of the search engine market worldwide, sends news websites approximately 24 billion views per month. This may account for over one-third of publishers’ online traffic, which is a critical metric for digital advertisements. Shortly after the research release of ChatGPT, Google and Microsoft both announced plans to harness generative AI to answer user queries directly in paragraph form. Unlike the current version of ChatGPT, which is not connected to the internet and reflects only historical training data with a 2021 cutoff, Microsoft’s Bing (which incorporates ChatGPT) and Google’s Bard both intend to derive responses from real-time data across the internet ecosystem, which could enable them to analyze breaking news. In this manner, LLMs could increase the gatekeeper power of dominant search engines that aim to maximize user engagement or screen time on their platforms.

Should LLM-generated answers lead fewer readers to click through from Google to external websites, digital news organizations risk losing a major source of online visibility, audience engagement, and advertising revenue. If news publishers cannot reliably count on search engine traffic in the long term, websites may increasingly depend on paywalls to draw revenue independent of large technology corporations. In 2019, 76 percent of U.S. newspapers employed paywalls, compared to 60 percent in 2017. Many substantially hiked subscription rates during this time frame as their advertising revenues simultaneously faltered. Paid subscriptions can help some news organizations build around loyal reader bases, especially if their content is specialized or exclusive. But the subscription pot is not large enough to sustain all publications, and smaller or more niche publications are disproportionately likely to fold.

There are also negative societal externalities to walling off access to accurate and relevant information on topics including climate change, public health, and civil rights. Stephen Bates, a professor at the University of Nevada, Las Vegas, warns that the rising prevalence of paywalls could create “income rather than geographic news deserts.” In other words, individuals who cannot afford multiple newspaper subscriptions may be more likely to believe misinformation and lower-quality content — whether human- or AI-generated — that they view on social media or search engines for free. In a more fragmented internet, people are more likely to exist within their ideological bubbles, as chatbots cannot offer diverse perspectives like a human journalist can. Social media algorithms, which typically recommend or promote content based on past browsing activity or personal interests, further reinforce echo chambers based on user engagement and not the common good.

(2) Social media platforms are using AI to automatically rank posts, which enables the mass de-prioritization of legitimate news outlets in favor of fake, spammy, or manipulative user-uploaded content.

Prior to the internet age, news outlets controlled public attention in centralized destinations, effectively serving as the primary window for mass audiences to understand current events. But over the past two decades, social media platforms democratized publishing by allowing anyone to gain international virality, transforming content-ranking algorithms into the new gatekeepers of attention and relevance. Newspapers face legal liability for publishing defamatory or false claims, but social media platforms generally do not: Section 230 of the Communications Decency Act grants “interactive computer services” immunity for most types of content that third-party users upload. Consequently, many social media platforms employ AI recommendation systems that automatically rank content based on users’ predicted interests or personal connections, with the goal of maximizing screen time instead of collective public knowledge.
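
To make the ranking mechanism concrete, the following is a minimal, hypothetical sketch (not any platform's actual system) of an engagement-optimized feed: each post is scored purely on predicted engagement signals, with no term for accuracy or public value, so a provocative user post can outrank a publisher's news link. All names, weights, and numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Post:
    source: str              # "user" or "publisher"
    predicted_clicks: float  # model-estimated click probability
    predicted_shares: float  # model-estimated share probability
    predicted_dwell: float   # expected seconds of attention

def engagement_score(post: Post) -> float:
    """Toy objective: maximize engagement and screen time; accuracy and public value carry no weight."""
    return 1.0 * post.predicted_clicks + 2.0 * post.predicted_shares + 0.01 * post.predicted_dwell

feed = [
    Post("publisher", predicted_clicks=0.04, predicted_shares=0.01, predicted_dwell=45),  # news link
    Post("user", predicted_clicks=0.12, predicted_shares=0.10, predicted_dwell=30),       # outrage bait
]

# Ranked purely by predicted engagement, the provocative user post (0.62) outranks the news link (0.51).
for post in sorted(feed, key=engagement_score, reverse=True):
    print(f"{post.source}: {engagement_score(post):.2f}")
```

Under this kind of objective, de-prioritizing publisher content is simply a matter of adjusting weights, which helps explain why a single algorithmic change can reroute so much traffic.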

When Facebook chose to algorithmically de-prioritize public news content in January 2018, external news websites lost visitors. Within six months of that algorithmic change, BuzzFeed’s traffic decreased by 13 percent and ABC News’s by 12 percent, according to the analytics firm Comscore. The Pew Research Center found that only 31 percent of U.S. adults reported consuming news on Facebook by 2022, compared to 66 percent in 2016. Facebook’s power to singlehandedly decrease automated referrals to news websites, coupled with the platform’s first-ever decrease in U.S. users in 2022, had the indirect effect of deepening many publishers’ reliance on Google for web visitors and their ensuing digital advertising dollars. Furthermore, as Facebook whistleblower Frances Haugen revealed in 2021, the 2018 algorithmic policy shift may have harmed not only the bottom line of newspapers but also their perceived legitimacy within the social media ecosystem itself. In a leaked internal memo, company data scientists discovered the decision “had unhealthy side effects on important slices of public content, such as politics and news,” since the algorithm frequently ranked user-generated misinformation higher than trustworthy publisher-generated news.

Beyond text, widely available generative AI tools allow any internet user to easily post doctored images, video, and audio online, which could facilitate the impersonation of newsrooms or even threaten the safety of individual journalists. In 2022, Graphika detected AI-generated videos on Facebook simulating a nonexistent news agency called Wolf News, which appeared to broadcast messaging supporting the Chinese Communist Party. In 2018, far-right groups spread deepfake pornography videos containing journalist Rana Ayyub’s manipulated image in retaliation for her investigative reporting, subjecting her to years-long harassment, doxxing, and death threats. There are no U.S. federal laws that specifically regulate deepfake AI technologies, so every social media platform, app store, search engine, and online forum treats this content differently. Meta’s policy is to remove synthetic media that “would likely mislead someone into thinking that a subject of the video said words that they did not” or that “merges, replaces, or superimposes content on a video, making it appear to be authentic.” However, the company exempts “parody or satire.” Furthermore, as deepfake imagery becomes more realistic and commonplace, synthetic media policies will likely become progressively more difficult to enforce. Content detection algorithms must continuously advance, too; otherwise, the internet ecosystem may become a more perilous space for public-facing journalists, with audiences who are less receptive to the information they convey.

(3) Chatbots cannot perform the same functions as a human journalist, but news executives may still leverage AI to streamline operations or justify workforce reductions in the short term.

At the moment, artificial intelligence cannot match human writers and editors in technical capability. LLMs like ChatGPT are best equipped to automate specific functions like summarizing documents — but not advanced editorial skills like relationship building with sources, original analytical thinking, contextual understanding, or long-form creative writing. LLMs predict patterns and word associations based on their training datasets but, during large-scale deployments, are known to produce factual inaccuracies or even fabricate stories altogether. In February 2023, Penn State researchers also found that LLMs can spit out plagiarized text, whether by inadequately paraphrasing or copying training material verbatim. Such behavior is doubly problematic for models, like ChatGPT, that do not attribute or cite sources by default. In addition, since many LLMs are built upon text from online websites and forums — many of which have historically excluded or exhibited hostility toward individuals based on factors like gender identity, race, or sexual orientation — their automated outputs can reproduce broader societal biases.
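
Verbatim reuse of the kind the Penn State researchers describe is straightforward to measure in principle. Below is a minimal illustrative sketch (not the researchers' actual method) that estimates how much of a generated passage overlaps word-for-word with a source passage using 7-word n-grams; the example strings and the 7-gram window are assumptions for demonstration.

```python
def ngrams(text: str, n: int = 7) -> set:
    """Return the set of lowercased word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(generated: str, source: str, n: int = 7) -> float:
    """Fraction of the generated text's n-grams that also appear verbatim in the source."""
    gen, src = ngrams(generated, n), ngrams(source, n)
    return len(gen & src) / len(gen) if gen else 0.0

# Hypothetical example: a model output that copies most of a source sentence.
source_article = "The city council voted on Tuesday to approve the new transit budget after months of debate"
model_output = "According to reports, the city council voted on Tuesday to approve the new transit budget"

print(f"Verbatim 7-gram overlap: {verbatim_overlap(model_output, source_article):.0%}")
```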

Despite these shortcomings, some corporate news executives may leverage LLMs to cut expenditures in the short term and not simply to boost productivity or create new value in the long term. When G/O Media, the parent company of Gizmodo and Deadspin, published AI-generated entertainment articles in July 2023, it attracted intense public backlash over the articles’ many factual errors, lack of human editorial oversight, and overall substandard quality of writing. CNET paused its use of LLMs in January 2023 after a significant number of errors and instances of plagiarized language were detected within its AI-generated articles, which the outlet admitted to having “quietly” published for months without clear disclosures. As historian David Walsh puts it, “The issue with AI is not that it will actually replace us, but that it will be used to justify catastrophic business decisions that will destroy entire industries precisely because AI cannot actually replace us.”

In March 2023, OpenAI, OpenResearch, and University of Pennsylvania researchers estimated that LLMs could affect job functions for 80 percent of the U.S. workforce — with writers, reporters, and journalists among the most vulnerable. Moreover, MIT, London School of Economics, and Boston University researchers detected a negative correlation between AI adoption and job recruitment between 2010 and 2018: for every 1 percent increase in AI deployment, companies cut hiring by approximately 1 percent. It is hardly surprising that CNET staffers cited long-term uncertainty from AI as one reason for unionizing in May 2023, or that the Writers Guild of America (WGA) proposed banning AI in screenwriting and prohibiting the use of creative material to train algorithms when it went on strike the same month. (A later WGA proposal contemplated allowing studios to use AI to craft screenplays, but with human employees retaining full economic residuals and credits.) The impact of AI on the workforce is not simply a long-term issue; many writers and journalists are already facing a significant amount of labor uncertainty.

(4) Generative AI can increase the prevalence of spammy or false content online, which obscures legitimate news and funnels advertising dollars away from traditional publishers.

While present-day LLMs cannot compose original prose comparable to that of a highly skilled journalist, they are well suited to churning out low-cost, low-quality, and high-volume clickbait. Clickbait production does little for most traditional newsrooms, but it benefits made-for-advertising (MFA) websites, which are spammy, traffic-driven sites designed solely to maximize page views and advertising dollars. As of August 2023, the news-rating firm NewsGuard had identified at least 437 websites that deployed generative AI to churn out large quantities of fictitious articles — many containing unsubstantiated conspiracy theories, unreliable medical advice, or fabricated product reviews. These sites draw clicks with headlines ranging from “Can lemon cure skin allergy?” to “I’m sorry for the confusion, as an AI language model I don’t have access to external information or news updates beyond my knowledge cutoff data. However, based on the given article title, an eye-catching news headline could be.”
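
NewsGuard has not published its exact detection methodology, but the telltale chatbot boilerplate quoted above suggests one simple screening approach. The sketch below is purely illustrative and hypothetical: it flags headlines or article text containing common refusal phrases that leak from an LLM into auto-published copy; the phrase list and example headlines are assumptions.

```python
import re

# Hypothetical telltale phrases that leak from chatbot refusals into auto-published articles.
AI_BOILERPLATE_PHRASES = [
    r"as an ai language model",
    r"i'?m sorry for the confusion",
    r"my knowledge cutoff",
    r"i don'?t have access to external information",
]
BOILERPLATE_PATTERN = re.compile("|".join(AI_BOILERPLATE_PHRASES), re.IGNORECASE)

def looks_machine_generated(text: str) -> bool:
    """Flag text that contains unedited chatbot boilerplate."""
    return bool(BOILERPLATE_PATTERN.search(text))

headlines = [
    "Can lemon cure skin allergy?",
    "I'm sorry for the confusion, as an AI language model I don't have access to external information",
    "City council approves new transit budget",
]

for headline in headlines:
    print(looks_machine_generated(headline), "-", headline)
```

A phrase check like this catches only the sloppiest output, which is partly why detection tools must keep advancing as AI-generated spam becomes more polished.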

MFA websites provide no material public benefit but, without proper safeguards, could create significant negative externalities in an AI era. LLMs are designed to generate output at scale — a perfect fit for content farms whose sole purpose is search engine optimization (SEO) through nonsensical keywords, summarized or verbatim text from news sources, and highly repetitive spam. These articles often list fake authors or anonymous bylines and appear to lack human oversight. The rising prevalence of AI-generated spam could decrease public trust in and understanding of critical current events, especially if it distorts the market for real news and obscures legitimate newsrooms as centralized sources of information. It will become increasingly difficult for human journalists to disseminate trustworthy information when the internet ecosystem is stuffed with bots.

Content farms divert more than user attention away from legitimate news websites; they also siphon away valuable digital advertising dollars. The AI-generated websites that NewsGuard detected were stuffed with programmatic advertisements, including from major brands like Subaru and Citigroup — almost all of which were automatically routed through Google’s Ad Exchange. Google Ads maintains policies against serving ads on “spammy automatically-generated content” but does not publicly reveal the results of its placement algorithm or content review outcomes. In June 2023, an Adalytics study showed that Google frequently served video ads on lower-quality clickbait or junk websites without the awareness of its buy-side advertising clients. The same month, the Association of National Advertisers estimated that about $13 billion in digital advertising revenue is algorithmically funneled into clickbait MFA websites — approximately 15 percent of the $88 billion that marketers spend on automated ad exchanges every year. Absent the proliferation of AI-generated MFA content, those funds could provide a much-needed lifeline for legitimate news outlets.

Analysis of Policy Approaches

A massive legislative push to compel large technology platforms that host news content to pay publishers is playing out all over the world. In June 2023, the Canadian Parliament enacted the Online News Act, which requires designated search engines and social media platforms to pay news publishers for any external article links or quotes their users view or share. Australia passed the News Media Bargaining Code (NMBC) in 2021, the European Union adopted its Copyright Directive in 2019, and legislators in Brazil, India, the United Kingdom, the United States, and California have either proposed or are actively considering similar measures.

Canada’s parliamentary budget officer predicts that news organizations could share an additional $329 million in annual revenue after the Online News Act becomes effective. However, this figure is a small fraction of the estimated $4.9 billion that Canadian news outlets lost from 2010 to 2022, and it will never be realized if Google and Meta choose to boycott the law altogether. Just hours after the passage of the Online News Act, Meta announced plans to permanently shut down news access for Canadian users. Shortly after, Google stated it too would block all Canadian news links on its search engine. Their responses should not come as a surprise: directly prior to Australia’s passage of the NMBC in 2021, Meta abruptly cut off users from viewing news pages, and Google announced it might have “no real choice” but to withdraw search services from the country. Faced with those ultimatums, Australian lawmakers soon amended the NMBC’s final text in a manner that effectively exempted Meta and Google from any binding obligations. And after France began enforcing the Copyright Directive in 2021, Google restricted article previews for users in France, which drastically decreased click-throughs. These actions underscore the problem with forced negotiation: it is very difficult to enforce payment schemes when digital gatekeepers can simply choke off access to the news content internet users see.

These legislative measures, sometimes referred to as “link taxes,” create the wrong incentives. In the past, they have discouraged Google and Meta from displaying news content on their platforms, which decreases critical streams of traffic to external news websites. In the future, such policies may even motivate search engines to accelerate the adoption of generative AI to answer user queries instead of displaying external links. Forced payment measures also risk reinforcing newspapers’ dependency on large technology companies, as they do not address the structural reasons for Google and Meta’s market dominance. For these reasons, U.S. technology companies need bright-line rules that meaningfully prevent harmful ad-tech, data collection, and AI practices. Such rules, in turn, can foster a healthier and more sustainable online environment in which newsrooms can evolve in the long term.

(1) Dominant technology platforms need clear ex ante rules to prevent anticompetitive practices that reinforce their gatekeeper power over news publishers.

Two-party negotiations cannot work if the playing field is not level. Because Google and Meta have taken steps to lock in gatekeeper power over digital advertising and content distribution in recent years, they essentially own the league in which newspapers must play. For example, Google’s 2008 acquisition of DoubleClick enabled it to effectively monopolize all three stages of the ad-tech process: the buy-side advertiser network, sell-side publisher tools, and the ad exchange through which most news websites auction online advertising spots. In turn, this market dominance enables the search giant to demand up to 35 percent of proceeds that would otherwise flow to publishers. It also gives Google ample means to compel news websites to adopt Accelerated Mobile Pages formatting and to control their ability to engage in header bidding, among other actions. Meta likewise increased its gatekeeper power by acquiring nascent competitors like Instagram (2012) and WhatsApp (2014), which allowed it to combine user data across multiple subsidiaries and curate personalized advertisements far more granularly than traditional newspapers can.

These behaviors have raised alarm bells in numerous jurisdictions. In June 2023, the European Commission filed a formal statement of objections over Google’s ad-tech practices, arguing that the company’s control over all stages of the digital advertising process allows it to illegally disadvantage website publishers. In January 2023, the U.S. Department of Justice similarly sued Google over alleged anticompetitive actions that distort free competition in the ad-tech space, seeking to split up its Ad Manager suite. In November 2021, the Federal Trade Commission (FTC) challenged Meta’s acquisitions of Instagram and WhatsApp, seeking a possible divestiture of both platforms. Also in 2021, an Australian Competition and Consumer Commission (ACCC) investigation found that Google’s conduct raised “systemic competition concerns,” including blocking ad-tech competitors from placing ads on YouTube and other subsidiaries. Further, ACCC chair Rod Sims noted at the time, “Investigation and enforcement proceedings under general competition laws are not well suited to deal with these sorts of broad concerns, and can take too long if anti-competitive harm is to be prevented.” The ACCC report captures a widespread issue: enforcement actions occur after the fact and are not guaranteed to undo the years of consolidation that have helped Google and Meta lock in market power and divert advertising revenue from news organizations.

Traditional antitrust law requires a modernized approach in the digital age — one that implements forward-looking guardrails to prevent dominant technology companies from harming nascent rivals, news publishers, and society at large. The European Union recently put new ex ante rules into place with its Digital Markets Act, which aims to prohibit gatekeeper technology platforms from abusing their control over multiple sides of a market. Members of the U.S. Congress have floated several bills containing similar proposals to limit practices like self-prioritization and acquisitions, but their momentum stalled following debates over their possible effects on malware prevention, content moderation, and other issues. In March 2023, Canada’s Competition Bureau put forward over 50 recommendations to modernize its antitrust legal framework, which has not undergone significant updates since the 1980s. Comprehensive antitrust reform is never quick or straightforward to implement, but it is essential to preventing anticompetitive acquisitions, growing news websites’ ad-tech options and revenue, and fostering a more diverse and sustainable news ecosystem overall.

(2) Both technology platforms and newsrooms need formal guardrails to promote ethics, fairness, and transparency in any development and deployment of AI.

Approximately 100 million users registered for ChatGPT within two months of its release, and numerous companies, including search engines and newsrooms, are deploying LLMs before direct legal safeguards are in place. The United States has existing federal and state privacy, copyright, consumer protection, and civil rights laws that apply to some aspects of the digital space, but there are broad legal uncertainties about how to interpret them in the context of generative AI (see sections 3 and 4).

In July 2023, the White House announced voluntary commitments from OpenAI, Google, Meta, and four other AI developers to invest in algorithms to “address society’s greatest challenges” and create “robust technical mechanisms to ensure that users know when content is AI generated.” This announcement follows previous nonbinding strategies like the White House’s Blueprint for an AI Bill of Rights (2022) and the National Institute of Standards and Technology’s AI Risk Management Framework (2023), which both call upon companies to prioritize transparency, accountability, fairness, and privacy in AI development. Broad voluntary principles like these are a first step in the absence of a mandatory legal framework that directly regulates generative AI, but LLM developers will need to take significant strides to meet them. For example, OpenAI released a tool in January 2023 to help identify AI-generated text but withdrew it six months later due to high error rates. Furthermore, the generative AI industry largely continues to obscure how it collects data, assesses and mitigates risk, and promotes internal accountability.

As politicians debate mandatory safeguards to mitigate the risks of AI, it is important to consider how any forthcoming laws could better support journalism and trustworthy information-sharing online. In 2022, Congress introduced the draft American Data Privacy and Protection Act (ADPPA), which contains provisions for large companies to publicly explain how high-risk AI systems make decisions, incorporate training data, and generate output. In April 2023, the National Telecommunications and Information Administration at the Department of Commerce issued a request for comment on AI accountability measures like audits and certifications. Transparency measures such as these could help news readers evaluate the credibility and fairness of the AI-generated text they view. They could also help marketers contest automated advertisement placements on MFA websites rather than on traditional news publishers’ sites. Both internet users and news publishers could benefit from increased public visibility into all AI development, regardless of an algorithm’s perceived level of risk, including high-level statistics on methodology, specific sources of training data, generalized outcomes, and error rates.

In June 2023, the European Parliament passed the draft AI Act, which could require developers to proactively mitigate automated output that perpetuates existing societal inequities. Under the act, “general purpose” algorithms (which would likely include LLMs like ChatGPT) would be required to identify “reasonably foreseeable risks” in their design and test training datasets for bias. Furthermore, “high-risk systems” (which would include social media ranking algorithms with over 45 million users) would be subject to more intensive standards like human oversight, assessments of an algorithm’s potential impact in specific contexts, and documentation of training datasets. Going further, evaluations for high-risk AI use by large search engines and social media companies should also include their potential impacts on journalism and information-sharing, including the spread of harmful content or burying of legitimate news online.

While technology platforms need legal responsibilities to ensure fairness and accountability in AI development, any newsrooms that choose to deploy LLMs must also develop clear and transparent processes when doing so. Some news organizations have already published initial principles for generative AI. For example, the Guardian and the News/Media Alliance (NMA) both recommend public disclosures of any AI-generated output. The Guardian additionally pledges to retain human oversight over generative AI deployment, while the NMA also states that publishers who use LLMs should continue to bear responsibility for any false or discriminatory outcomes. However, there is a clear gap in the development and publication of formal standards: according to a May 2023 World Association of News Publishers survey, 49 percent of newsroom respondents had deployed LLMs, but only 20 percent had implemented formal guidelines. As a baseline, newsrooms need to identify clear purposes or contexts in which they might deploy LLMs, including conditions, safeguards, and limitations. Going further, newsrooms also need to strengthen labor protections for positions that AI deployment might substantially affect.

(3) Technology platforms should recognize the IP rights of news outlets and human creators, especially when using copyrighted articles to train algorithms.

AI developers have trained LLMs by scraping billions of written articles, images, audio files, and lines of software code from humans, typically without compensating, citing, obtaining permission from, or even informing the original creators. Professionals ranging from the NMA to comedian Sarah Silverman to computer programmers are asking — or, in some cases, suing — AI developers to pay their training data sources, arguing that the unlicensed use of content violates IP rights. Days after the Associated Press reached a licensing deal with OpenAI in July 2023, thousands of authors signed an open letter urging LLM developers to both obtain consent from and compensate writers before scraping their work. In January 2023, a group of software developers sued OpenAI and GitHub for building the code-generating algorithm Copilot based on their licensed work. That same month, several artists filed a class action lawsuit against Stability AI, Midjourney, and DeviantArt for processing their copyrighted material to train algorithms that generate images in their unique styles. Shortly after, Getty Images sued Stability AI in the United Kingdom and the United States for training algorithms on 12 million copyrighted images. In addition, the Daily Mail is reportedly considering legal action against Google for scraping hundreds of thousands of copyrighted articles to develop Bard without permission.

These cases could take years to resolve in court, and their outcomes are uncertain. Generative AI has created novel questions over the interpretation of existing IP rights, particularly whether training algorithms on copyrighted works falls under the fair use exception in the Copyright Act. Although AI developers have acknowledged their history of scraping copyrighted material without consent, they have also argued that generative AI qualifies as fair use because the output is sufficiently “transformative” compared to the original input. The plaintiffs in these lawsuits disagree, arguing that fair use does not protect the exploitation of copyrighted material in highly commercial contexts where AI developers benefit financially at the expense of human creators. Furthermore, generative AI tools reproduce copyrighted text or images in many cases, sometimes even quoting source text verbatim, which may undercut the transformative use argument. Going forward, Congress or the courts will need to clarify the definitions of “fair use” and “derivative works” to help writers and other content creators exercise their IP rights in the production of AI.

But even if some copyright holders manage to successfully negotiate or sue for compensation from AI developers, one-time payments are a narrow solution that will not prevent more seismic long-term impacts on journalism and other professional careers. ChatGPT is estimated to require trillions of data points, while OpenAI is currently valued at up to $29 billion. In other words, the sheer scale of training datasets alone means that most creators will not receive substantial payments. Better-known creators might wield more leverage to negotiate payouts than smaller or lesser-known ones, but technology corporations would likely retain disproportionate bargaining power either way. Moreover, any licensing agreements would likely be short term or otherwise limited, while the disruption to writers’ jobs and living wages would be permanent. Since algorithms continually generate inferences based on past outputs, it would be difficult, if not impossible, to engineer a long-term residual payment system that both quantifies the monetary value of original data points and tracks subsequent usage in perpetuity.

Although copyright infringement lawsuits, if successful, are unlikely to lead to a long-term residual solution, they could drastically slow or even pause commercial sales of LLMs. Some image hosting websites, such as Getty Images, have already banned AI-generated images to limit their exposure to litigation. Stability AI, for its part, has announced plans to allow content creators to opt out of the processing of their work. In the case of generative AI, a more cautious and gradual pace of adoption could perhaps benefit the field in the long term. AI developers need time to devise creative ways to work collaboratively with copyright holders, increase the integrity of their training data, and mitigate their algorithms’ broader harms to journalism. They should not commercially deploy these tools without a solid understanding of the legal and ethical IP risks they raise.

(4) Modernized data privacy regulations are necessary to curb surveillance-based advertising and, in turn, return some market power from large technology companies to news publishers.

Because LLMs are built upon billions of news articles, social media posts, online forums, and other text-based conversations from across the web, they inevitably sweep up sensitive personal information. In turn, their automated outputs could reveal personal details related to specific individuals, whether accurate or fabricated, which carries privacy and reputational risks. In March 2023, the Italian Data Protection Authority temporarily banned ChatGPT from processing local users’ data but restored access weeks later after OpenAI agreed to allow EU individuals to exclude their personal information from training data sets and delete inaccuracies. In April 2023, the European Data Protection Board formed an ongoing task force to coordinate potential enforcement actions against ChatGPT amid investigations by data protection authorities in France, Spain, Germany, and other member countries. In May 2023, the Office of the Privacy Commissioner of Canada, along with three provincial authorities, opened probes into OpenAI’s collection, processing, and disclosure of personal information without sufficient consent, transparency, or accountability mechanisms.

In July 2023, the FTC requested information on OpenAI’s training data sources, risk mitigation measures, and automated outputs that reveal details about specific people. However, the consumer protection agency primarily acts against companies that engage in “unfair or deceptive” practices, as the United States lacks a comprehensive federal privacy law that directly regulates how LLMs collect and process personal information. Dozens of privacy bills were introduced in the 116th and 117th Congresses that would have modernized U.S. privacy protections, most prominently the ADPPA in 2022, but none were enacted into law. Many of these proposals, including the ADPPA, shared a similar framework that would (a) allow individuals to access, modify, and delete personal information that companies hold; (b) restrict companies to processing personal information only as necessary to provide an initial service that users request; and (c) require minimum transparency standards in data usage.

Most of these U.S. bills were introduced before the public release of ChatGPT and entirely exempted publicly available information — a significant omission that could allow many LLMs, which are often trained on data scraped from public-facing web pages, to avoid forthcoming privacy restrictions. Even so, systemic boundaries on how all technology platforms process even nonpublic personal information could still significantly help shift some digital advertising dollars away from Google and Meta and back to news websites. With a more limited capability to algorithmically track and microtarget ads based on individuals’ browsing behavior or other personal attributes, marketers might increasingly favor contextual ads based on the content of a webpage. In other words, marketers might place protein bar ads in the sports section of a local newspaper instead of targeting Facebook users who browse health-related posts, or they might place diaper ads in a parenting magazine instead of identifying shoppers between the ages of 25 and 45 who recently purchased a pregnancy test. Because contextual advertising does not depend on granular data analytics about individual website visitors, it can better support the news publishers that produce content instead of the social media platforms and search engines that track and distribute it.
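
The following is a minimal sketch of the contextual approach described above, using hypothetical keyword lists and page text: ads are matched to the content of the page itself, so no behavioral profile of the individual reader is required.

```python
# Hypothetical ad categories mapped to page-content keywords; contextual placement
# keys on the page being read, not on a behavioral profile of the reader.
AD_CATEGORIES = {
    "protein bars": {"sports", "fitness", "marathon", "training"},
    "diapers": {"parenting", "newborn", "baby", "childcare"},
}

def contextual_ads(page_text: str) -> list:
    """Return ad categories whose keywords appear in the page text."""
    words = set(page_text.lower().split())
    return [ad for ad, keywords in AD_CATEGORIES.items() if words & keywords]

sports_page = "Local marathon results and training tips from the sports desk"
parenting_page = "A newborn sleep guide for first-time parents in our parenting section"

print(contextual_ads(sports_page))     # ['protein bars']
print(contextual_ads(parenting_page))  # ['diapers']
```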

(5) Large technology platforms need robust content moderation policies that promote a safe and healthy information ecosystem for news organizations to thrive in.

Section 230 of the 1996 Communications Decency Act indirectly reinforces the gatekeeper power of large social media platforms and search engines. With the legal power to independently choose which content to promote, demote, host, or block, technology platforms exercise substantial control over the distribution and visibility of news content, even as they directly compete with external websites for traffic and screen time. Gatekeepers have economic incentives to keep users hooked on their platforms, which sometimes means algorithmically promoting scandalous or enraging clickbait that captures the most user attention while de-ranking news reporting that benefits the public interest. In turn, a greater influx of false or toxic posts both subjects journalists to increased hostility and impedes readers’ ability to sift through online junk and identify real news.

Despite legitimate concerns about Section 230, a complete repeal of the statute could negatively impact both the news industry and internet users. Section 230 protects the free exchange of information and allows technology platforms to host news content without fear of frivolous litigation from right-wing extremists. For example, it shields technology platforms that host news articles about abortion access, even as some states like Texas have tried to block people from obtaining reproductive health information in the aftermath of Dobbs v. Jackson Women’s Health Organization (2022). As seen from the unintended consequences of the Allow States and Victims to Fight Online Sex Trafficking Act (FOSTA), a Section 230 repeal would likely lead technology platforms to drastically reduce the availability of third-party content. In turn, journalists would likely lose social media users as a diverse resource for leads and article ideas. Independent or freelance journalists might have difficulty maintaining their online audiences or public brands, and smaller news start-ups could disproportionately struggle to get off the ground, especially if technology platforms face legal pressure to exclusively work with well-known incumbent entities.

Instead, many researchers — including some news publishers — have supported middle-ground approaches to amend Section 230 or otherwise enact reasonable guardrails for technology platforms to address harmful or illegal content. The European Union will begin enforcing the Digital Services Act (DSA) in 2024, and the law could provide one possible model for the United States. The DSA requires technology platforms to adhere to minimum transparency standards like publishing content takedown statistics and explaining recommendation algorithms. Furthermore, it requires them to maintain user controls like opt-outs of personalized content-ranking algorithms and notice-and-action systems to flag illegal material. The DSA prevents technology platforms from targeting paid advertisements based on a person’s sexual orientation or political affiliation and prohibits behavioral advertising aimed at children, which could reduce platforms’ edge over newspapers in digital marketing. The law also requires larger digital platforms — including Facebook and Google — to assess the “systemic” and “societal or economic” risks of their services, share publicly available data with approved researchers, and allow external compliance audits. While the DSA is one of the first major laws to require external transparency and user controls over ranking algorithms, U.S. and global legislators have also proposed numerous other frameworks. Each raises its own set of debates, but it is important to weigh how any potential measure can better foster a healthy ecosystem for journalism to thrive in.

(6) Governments should promote policies that recognize the value of journalism as a public good.

The news industry creates positive externalities that benefit far more than direct subscribers or readers. Newsrooms dedicate substantial resources to sourcing, fact-checking, and disseminating information in the public interest, and journalists serve as independent mechanisms to hold powerful institutions accountable. However, journalism’s immense societal value does not fit neatly into the free-market system in which it operates. Newsrooms earn income from advertisements and subscriptions, not from the public benefit of the information they communicate, leaving their bottom lines vulnerable to ranking algorithms, reader or marketer demand, and even macroeconomic fluctuations. Some venture capital firms and wealthy individuals have attempted to invest in newsrooms, but their goals can be misaligned. Andreessen Horowitz invested $50 million in BuzzFeed in 2014, but its constant pressure for perpetual growth, high returns, and profitability ultimately did not fit the company’s journalistic mission.

Recognizing the civic value of journalism, some governments have considered direct or indirect public funding for the news industry. In 2018, Canada established a pot of C$50 million (around $39 million) to support local newsrooms, dispensed by a third-party intermediary to preserve press independence from the government. However, public funding may not work in every country, especially given differing legal, cultural, and political norms around press independence. U.S. politicians have a particularly tumultuous relationship with both the mainstream media and technology companies, evident in their lackluster support for public news systems. The United States spent just $3.16 per capita on public broadcasting in 2019, barely a fraction of France’s $75.89, Australia’s $35.78, and Canada’s $26.51. As Politico’s Jack Shafer points out, even this sparse amount has been highly controversial: “Politicians — usually Republicans like President Donald Trump — routinely issue threats to defund NPR and PBS every time they object to the outlets’ coverage. Do we really want to make the print press beholden to such political whims?”

Apart from public funding, governments could consider other avenues to help newspapers diversify revenue sources, which, in turn, could reduce reliance on volatile traffic streams. For example, both France and Canada offer tax credits to incentivize individuals to subscribe to newspapers, and Canada amended its tax laws in 2020 to permit newsrooms to seek charitable donations. U.S. legislators could take a similar route: some pitched tax deductions for newspaper subscribers, advertisers, and employers in the Local Journalism Sustainability Act in 2020 and 2021, though these measures did not reach a vote. Congress could also consider mechanisms to help newsrooms function as nonprofit or hybrid organizations — for example, by changing rules that prevent nonprofit editorial boards from endorsing candidates. In March 2023, the nonprofit Texas Observer reversed its closure decision after crowdfunding over $300,000, demonstrating the potential for newsrooms to tap into alternative support like philanthropic donations or grants. That said, nonprofit status alone is not a cure-all; the pool of foundation grants is limited, and the relatively low rates of existing news subscriptions suggest the onus cannot fall on grassroots donors to sustain the industry.

Conclusion

As AI becomes more ubiquitous, the news industry will need to carve out space in a more crowded, more chaotic, and less original information ecosystem. The relationship between technology platforms and newsrooms will continue to evolve in both the short and long terms, but robust data governance frameworks are necessary now to support the financial viability of newspapers and cultivate a diverse and trustworthy online sphere. Large search engines and social media platforms need clear boundaries around their monetization of personal information to target advertisements, acquisitions of nascent competitors, exclusionary actions like self-prioritization, use of copyrighted material, and amplification or de-amplification of online traffic. In turn, both technology platforms and newsrooms require bright-line responsibilities to promote ethical and human-centered standards at every stage in the AI development and deployment process.

These policies are not exhaustive. The long-term health and sustainability of the news industry will require more than technological solutions alone. Direct financial support for newsrooms is critical — whether through nonprofit models, direct or indirect government funding, or even nontraditional monetization methods. For example, some newsrooms have embraced side ventures like consulting or hosting events to raise income. But neither the production requirements nor the societal benefits of journalism alone can translate into dollars and cents. To succeed, news outlets also require a civically engaged society — one bound by critical thinking and collective interest in the community. In addition, corporate executives will need to urgently prioritize the input and well-being of human writers, including through job protections and union contracts, in order to sustain journalism as a stable and accessible career option. Ultimately, the actions that technology platforms, newsrooms, governments, and individuals take today will shape the long-term trajectory of the news industry.


Caitlin Chin-Rothmann is a fellow at the Center for Strategic and International Studies (CSIS), where she researches the impact of technology on geopolitics and society. Her current research interests include the relationships between data brokers and government agencies, the evolution of news in a digital era, and the role of technology platforms in countering online harmful content.
