It’s no secret that social media platforms like Facebook host thousands of spam pages and massive amounts of content intended to generate clicks and views. Although much spam content is created to collect personal data or generate financial rewards, it is often viewed as benign and simply annoying rather than suspicious.
However, spam networks can also be used by actors to generate and disseminate disinformation to achieve financial and strategic aims. In this blog, we will investigate a series of Facebook pages to better understand how spam news sites can be wielded as tools to generate revenue, influence audiences and promote harmful narratives online. We will also look at how AI-generated content is assisting distributors in reaching a larger audience. And, as always, we will include some OSINT tradecraft tips and tricks for open-source researchers and investigators.
The background
A primary goal of clickbait sites and spam news networks is, simply, money. The sites carry advertising to generate revenue, which is no real surprise, and financial gain is generally thought to be the main reason spam ‘news’ pages proliferate on platforms like Facebook. It is worth noting, though, that clickbait news pages designed mainly for financial gain can serve a secondary purpose as well: they may promote harmful or controversial narratives, spread politically or ideologically motivated disinformation, or seek to foster division among their online audience.
In 2017, CNN reported on the scale of Macedonian-run spam networks and troll farms targeting Western audiences in the lead-up to the 2016 US presidential election, noting that the clickbait news sites contained ‘bogus stories’ promoting particular candidates. In 2021, it was reported that content from Facebook troll farms reached an audience of 140 million Americans in the lead-up to the 2020 election. Many of those troll farms, based in Eastern Europe, appeared to be targeting the same audiences that Russia’s Internet Research Agency (IRA) had targeted during the 2016 election.
The network
In May 2024, we viewed a Facebook page named ‘Proud American’ that was primarily sharing AI-generated images purporting to show female US veterans. Although the names of real individuals were sometimes used, the stories and images bore no relation to reality (we’ll take a closer look at an example further on). At the time, the content was largely formulaic and single-issue.
When we returned to the page a few months later, in July 2024, the content had changed. The page name was the same, but the posts had shifted mainly to clickbait celebrity stories. Although the Facebook page itself did not feature advertising, the ‘news’ websites it linked to were typical clickbait sites carrying advertising to generate revenue. Further searching (using a combination of Facebook hashtag searches and Google dorking) uncovered a network of ten or so similar, often generically named Facebook pages linking to the same content across a few different domains.
Combined, the Facebook pages had over 260,000 followers and almost as many ‘likes’. Other characteristics of the pages included:
US-based addresses and (sometimes) phone numbers listed in the page information
Generic profile pictures and logos, sometimes with misspellings
Regular, rapid posting of clickbait news linking to an external domain
Most of the pages had been created within the previous six months, but some were older; one page was created in 2017.
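The Google dorking mentioned above is easier to repeat and document if the queries are generated rather than typed by hand. The Python sketch below is one way to do that, assuming standard Google search operators (`site:`, quoted phrases and `OR`). The helper names are hypothetical, the seed domains are the two named later in this post (kept defanged until the query is built), and Google may return different results from one day to the next.

```python
from urllib.parse import quote_plus

GOOGLE_SEARCH = "https://www.google.com/search?q="

def refang(domain: str) -> str:
    """Turn a defanged domain such as 'example[.]org' back into 'example.org'."""
    return domain.replace("[.]", ".")

def facebook_dork(domains: list[str]) -> str:
    """Build a query for Facebook pages that mention any of the given domains."""
    quoted = " OR ".join(f'"{refang(d)}"' for d in domains)
    return f"site:facebook.com ({quoted})"

def search_url(query: str) -> str:
    """URL-encode a query into a Google search link to open in a browser."""
    return GOOGLE_SEARCH + quote_plus(query)

# Hypothetical seed list, using domains mentioned in this investigation.
seeds = ["dailynewsbreak[.]org", "stayinginformed[.]info"]
query = facebook_dork(seeds)
print(query)
print(search_url(query))
```

Each new domain found on a page can be appended to the seed list and the query regenerated, which keeps a record of how the network was expanded.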
Facebook’s Page Transparency section can tell us where a page’s owner is located. To find it, navigate to the ‘About’ section of the page, choose ‘Page Transparency’, and then select ‘See all’.
This will show us the likely location of the owner of the page. For all pages we identified sharing the same content and domains, the location listed was Macedonia.
This came as no real surprise, considering the previous reporting on Macedonian troll farms and clickbait news sites. Still, it was evidence that efforts to remove and prevent spam and troll networks on the platform hadn’t been entirely successful.
The content
Although we did not have time to investigate every piece of content across the ‘news’ sites, the content appears to fall into three main categories:
Content taken from other clickbait sources, entertainment news sites or satirical news sites—a quick keyword search helps to reveal when articles are copied outright from other news sources.
AI-generated text and imagery.
Inauthentic content and headlines that appear to be unique to the clickbait network.
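Alongside a manual keyword search, a rough similarity score can flag articles that have been lifted wholesale from another source. The sketch below uses word shingles (overlapping five-word sequences) and Jaccard similarity, a common near-duplicate detection technique that we are substituting in for illustration; it is not the method used in this investigation, and the 0.5 threshold is an arbitrary assumption.

```python
import re

def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of two texts' shingle sets (0.0 = no overlap, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def likely_copied(candidate: str, source: str, threshold: float = 0.5) -> bool:
    """Flag a candidate whose overlap with a known source meets the (assumed) threshold."""
    return jaccard(candidate, source) >= threshold
```

Because punctuation and case are ignored, an article that has only been lightly re-punctuated still scores as a near-copy; heavier paraphrasing will lower the score, so a low value does not rule out copying.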
Most of the ‘news’ websites linked from the Macedonian-run Facebook pages feature formulaic celebrity and gossip news that, at first glance, doesn’t seem particularly interesting or divisive. There are, however, some recurring buzzwords that are more politically or culturally divisive: for example, references to ‘wokeism’ were particularly evident across a range of celebrity ‘news’ stories and posts.
Additionally, the Telegram channel for one of the key domains shared by the Facebook pages, dailynewsbreak[.]org, includes politicised election-related messages in each post, and some pieces of content posted across the different domains leverage narratives that promote specific candidates and values.
Since spam and troll networks may often pivot their content to appeal to a new audience or generate more clicks, it is also useful to investigate archives, like Archive.today and the Wayback Machine, to see whether there have been noticeable changes.
Archive.today: https://archive.md/
Wayback Machine: https://web.archive.org/
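Checking archives by hand can be slow for long date ranges. The Wayback Machine exposes a CDX API that lists captures for a URL, and the sketch below (covering only the Wayback Machine, and assuming Python 3.9+ and the CDX parameters described in the Internet Archive's documentation: `output`, `fl`, `filter` and `collapse`) builds a query that returns at most one successful capture per month, then turns the response into snapshot links that can be stepped through to spot shifts in content. The helper names are our own, and coverage for any given domain depends on what the archive happened to crawl.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain: str, start: str, end: str) -> str:
    """Build a CDX query listing at most one successful capture per month."""
    params = {
        "url": domain,
        "from": start,              # e.g. "2021" or "202101"
        "to": end,
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        "collapse": "timestamp:6",  # first 6 digits = YYYYMM, so one row per month
    }
    return CDX_ENDPOINT + "?" + urlencode(params)

def snapshot_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response (header row first) into browsable snapshot links."""
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]

def fetch(url: str) -> str:
    """Download the CDX response (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `snapshot_urls(fetch(cdx_query_url("dailynewsbreak.org", "2021", "2022")))` would list one capture per month across 2021 and 2022, if the archive holds any.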
Archives of dailynewsbreak[.]org show that in the past, content was typically more sensationalist and divisive, with evidence of regular reposting from conspiracy theorist websites (see images below from 2021 and 2022).
As always, we want to consider the agenda. Are the networks of Macedonian-run troll farms and clickbait sites pushing an ideological or political agenda? Or are they simply choosing whatever content is most likely to generate clicks from followers? The domains in question post inauthentic content, but the motivation of the actors behind it isn’t always obvious.
Changes over time in the type of content posted may be nothing more than a pivot to appeal to a wider audience, but they may also be a tactic for presenting more divisive and harmful content to an audience that has, perhaps, built up some familiarity with the news source. For open-source investigators and others seeking to understand how disinformation reaches and influences audiences, establishing actor intent can be a significant challenge, particularly when the individuals behind a domain or social media account are largely anonymous.
To gauge whether the spammers and clickbait creators have a secondary (or even primary) political motivation or agenda, we might choose to:
Undertake detailed analysis of keywords, topics and narratives across all related domains—this may be time-consuming but can help to reveal patterns that may not be evident at first glance.
Monitor pages and domains for changes in the type of content over time, particularly in the lead-up to significant world events and elections.
Investigate the audience—who are the key groups targeted by this content? Which audiences are engaging with or sharing the content? Is there a specific target audience?
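The first two steps above (keyword analysis across domains and monitoring for change over time) can be combined into a simple tally of keyword hits per domain per month. The sketch below assumes posts have already been collected, for instance from archived pages, into hypothetical `Post` records; the watch-list is illustrative only, with ‘woke’ chosen because it also matches ‘wokeism’, and it should be tuned to the narratives being tracked.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Post:
    domain: str
    month: str  # "YYYY-MM"
    text: str

# Illustrative watch-list only; each entry also matches longer forms ("woke" -> "wokeism").
KEYWORDS = ["woke", "election", "veteran"]

def keyword_trends(posts: list[Post], keywords: list[str]) -> dict[tuple[str, str], Counter]:
    """Count keyword hits per (domain, month) so shifts in focus stand out."""
    patterns = {k: re.compile(rf"\b{re.escape(k)}\w*", re.IGNORECASE) for k in keywords}
    trends: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for post in posts:
        bucket = trends[(post.domain, post.month)]
        for keyword, pattern in patterns.items():
            bucket[keyword] += len(pattern.findall(post.text))
    return dict(trends)
```

Sorting the result by month for each domain gives a rough timeline; a sudden rise in election-related terms ahead of a vote would be a prompt for closer reading rather than proof of intent.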
AI-generated content
Let’s take a moment now to look at one of the clearest trends in inauthentic content online: the increasing use of AI to generate fake content quickly and at scale. In the race to harvest as many clicks and site visits as possible, AI-generated content is an easy way to post more often and stay visible in social media feeds. However, it can also be used to fuel disinformation and ‘fake news’.
Let’s look at an example of content from one of the troll network domains and explore some of the implications. One of the prominent themes in the clickbait sites that we looked at was ‘veteran stories’—specifically stories of female US veterans, which were usually accompanied by AI-generated pictures.
The image above shows a story from May 2024, taken from the website stayinginformed[.]info, which is linked from the ‘Proud American’ Facebook page. The text of the story suggests that it was AI-generated—some of the indicators from this story included:
Odd choices of words and phrases that we wouldn’t expect to see in legitimate reporting (e.g. ‘the vibrant heart of St. Louis’).
A lack of concrete details or context—although there are some references to historical dates and events, there is nothing in the text to tell us when the purported event (the birthday celebration) is happening.
Flowery language—this, for me, is usually the first thing I notice when I encounter content that I think is AI. Take this paragraph, for example:
“Cathay, seated in a place of honor, beamed with joy. Despite her advanced age, her eyes sparkled with the same determination and warmth that had carried her through her remarkable life. Throughout the day, she mingled with guests, sharing stories and laughter, her presence a testament to the enduring power of the human spirit.”
The real Cathay Williams—for she was a real person—never saw her 85th birthday, but she was, indeed, a veteran. The image used in the article bears no relation to the story of the real person. Ultimately, this isn’t content that seems particularly harmful or divisive—it is a made-up story, drawing from real-life characters, that isn’t likely to have a long-term influence on readers.
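None of the phrasing indicators listed earlier is conclusive on its own, but checking for them can be made repeatable with a small watch-list. The sketch below counts stock phrases in a piece of text; the phrase list is simply lifted from the Cathay Williams story, so it is an illustration rather than a validated lexicon, and a match is a reason to read more closely, not evidence that the text was AI-generated.

```python
import re

# Illustrative watch-list taken from the story discussed above; not a validated lexicon.
STOCK_PHRASES = (
    "vibrant heart of",
    "testament to the enduring",
    "beamed with joy",
    "eyes sparkled",
    "remarkable life",
)

def flag_phrases(text: str, phrases: tuple[str, ...] = STOCK_PHRASES) -> dict[str, int]:
    """Return each watch-list phrase found in the text with its count (case-insensitive)."""
    # Collapse runs of whitespace so phrases split across line breaks still match.
    lowered = re.sub(r"\s+", " ", text.lower())
    return {p: lowered.count(p) for p in phrases if p in lowered}
```

Running the same list across every article on a domain can highlight clusters of near-identical phrasing, which is more telling than any single hit.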
So, why does it matter? Sure, these websites are generating revenue from clicks, while failing to provide accurate or even interesting information, but we could say the same thing about, well, most of the internet.
However, presenting AI-generated content as real news (even clickbait entertainment-style news) further muddies the information environment: fabricated content that promotes narratives or agendas may appear more genuine or believable when it sits alongside ‘real’ stories. Additionally, the ability to generate realistic-looking images to accompany AI-generated stories increases the likelihood that some readers will be convinced a story is real. After all, ‘seeing is believing’!
AI-generated images – challenges for verification
We have all seen those odd, discombobulating AI-generated images of public figures with too many fingers or animals with extra limbs, or perhaps deepfake videos of politicians or celebrities that are too bizarre to be truly believable. But AI-generated imagery has continued to improve, along with users’ knowledge of effective prompting. As a result, fabricated images aren’t as easy to spot as they once were.
The image above was generated in a matter of seconds using fal.ai, and while it bears some of the hallmarks of AI—missing and mismatched buttons on the figure’s clothing; a wonky finger on their left hand; a slightly nonsensical arrangement of generic items in the background, etc.—it doesn’t look markedly different from real photos used in advertising that have undergone heavy editing. There are no extra limbs or fingers, or blobs of noise to catch our eye.
So, if we are examining suspected inauthentic content but there are no obvious indicators of AI, what should our next steps be? The inauthentic content analysis map, which we covered in a previous blog, suggests some online tools for detecting AI-generated text, and there are tools that attempt to detect AI images as well (including https://isitai.com/ai-image-detector/ and https://huggingface.co/spaces/umm-maybe/AI-image-detector). However, as with all free browser-based tools, these might not offer a conclusive result. In the picture below, we can see the results from testing an image (quite clearly AI-generated) from one of the clickbait domains:
This is where deeper source evaluation methods are needed. The ‘R2C2’ process for source evaluation (covered in a previous blog) can help us ask the right questions to evaluate the reliability, credibility, relevance and corroboration of online sources. When examining an image attached to a news article, we might ask:
Is the information provided by the source convincing and able to be believed?
Is the information in the image plausible? If not, under what conditions would the information be plausible?
Does it make sense? Is it free from logical contradictions?
If we conclude that an image (or any other piece of content) is fabricated, AI-generated, or, in some way, inauthentic, we can then ask questions about the actor’s motivation in producing such a source. Most casual readers, though, won’t go through this process—and that is part of the reason that fake imagery and content can, and likely will, be wielded effectively by troll networks, scammers, spammers and bad actors.
Key takeaways
In this blog, we’ve looked at a network of (likely) Macedonian-operated Facebook pages that share the same or similar posts and domains; most of these posts are examples of inauthentic and fabricated content. These pages use AI-generated imagery and text as part of their content strategy. They also pivot the focus of their content, possibly to reach a wider audience and possibly to promote more divisive and harmful content.
Key challenges for investigators and researchers include:
Validating and verifying information, including the provenance of stories and topics.
Detecting AI-generated content.
Understanding the motivation and intent of content creators.
Tools and processes such as the inauthentic content analysis map and R2C2 can help us verify and evaluate online content and suspected disinformation. Understanding the tactics and techniques that trolls and spammers use to reach their audience can help us identify inauthentic content online more effectively. However, the persistence of troll and spam networks, and their ability to ‘hide in the noise’ by disseminating seemingly harmless content, make their output particularly challenging to counter or debunk.
To support your OSINT collection and analytical capability uplift, contact us at training@osintcombine.com to learn about our off-the-shelf and bespoke training offerings, including 'Disinformation: Detection, Collection, and Analysis'. This course is also available as an on-demand course via our Academy.