Inconsistent enforcement, suppressed speech, unchecked hate mar Tamil content moderation

“These platforms were not made for us.” Many content creators and digital rights advocates expressed this sentiment as we interviewed them for a recent study evaluating how Tamil speakers experienced content moderation on social media.

Content moderation is a complex process of enforcing a platform’s content policies, complying with local laws, and keeping users safe online, while also engendering trust in the platform. If this sounds nearly impossible, it is. This is all the more true of content moderation in languages other than English, which is under-examined and under-considered when content moderation systems are developed and deployed, leading to inequitable outcomes.

In the study exploring how social media companies conduct content moderation in Tamil, a majority of the over 100 Tamil-speaking users surveyed reported inconsistent moderation. They pointed to chronic over-moderation. Users described a perception of suppressed speech and unexplained takedowns of political speech, even as large volumes of harassment and hate speech remained on the platforms. For many, this inconsistent moderation caused frustration and distrust.

For some users, erroneous moderation was not just the norm, but the price they paid for being online. Women, LGBTQ+ users, and those belonging to caste-oppressed groups told us that, very often, content moderation doubly penalised them. They felt that the frequency and prevalence of harassment and slur-ridden targeted attacks required them to develop a thick skin just to go online every day. Some users noted that these attacks were compounded in the case of intersecting identities. For instance, one user said: “You know the treatment of any Indian woman is very different. The treatment of an English-speaking Dalit woman is going to be very different from a Tamil-speaking Dalit woman.”

Researchers have pointed to such incidents in the past and shown how online platforms and their design often give rise to coordinated gender-based and caste-based attacks. Counter-intuitively, these users’ efforts to shine a light on targeted harassment fell short: their own posts were taken down while the offending content remained online. This double standard resulted in confusion and frustration, and ultimately distrust. Some of these users sought psychological support such as therapy, or took breaks from going online. They spoke of poor content moderation as a plain fact, and not as an anomaly.

Digital rights advocates, the media, and even former platform representatives say part of this is by design. Our report highlighted three shortcomings in how companies pursue content moderation in non-English languages generally, and in Tamil specifically, and offered recommendations on how these can be addressed.

Poor data availability

First, social media companies say they struggle with non-English language moderation because of the unavailability of digitised data to train automated moderation systems. There is simply far more English-language data on the Web to train automated systems. Among Indic languages, Dravidian languages such as Tamil are especially under-resourced. As a result, the automated tools companies build to moderate content are often trained on far more data in English than in Tamil or any other language. AI experts say even the available data in Tamil does not adequately represent how users speak the language online. Tamil users, we found, often wrote in Tamlish (mixing Tamil and English, or transliterating Tamil using Roman script), used coded language, or mixed classical and spoken Tamil interchangeably. As a result, moderation fell short simply because companies had not invested in robust tooling to moderate Tamil speech online.

Second, companies do not prioritise hiring people with subject matter, linguistic, or regional expertise even when they invest in a South Asia or Asia-Pacific presence. One former company representative told us that companies pursue a “coverage model” when deploying resources for content moderation: they ensure there is at least one person tasked with the region, and deploy additional resources only in the event of a crisis or government scrutiny that warrants more attention to a region or language. Long-time Tamil computing experts assert that this is a departure from previous eras of company engagement.

Finally, companies do not always disclose to users when or why their posts were taken down. This engenders distrust and suspicion among users, particularly against a backdrop of media reports and researchers pointing to instances where the Government of India has ordered the removal of posts.

User control

Companies can take multiple steps to improve moderation in Tamil and for the region. First, companies should focus on making tools that help users control their own information environment. Already, companies that have long used centralised moderation claim to want to make moderation more dynamic and nimble, including by shifting towards a community notes approach to fact-checking rather than a top-down approach in which platforms or professional fact-checkers decide on content’s veracity and take action on that basis. Yet these approaches fall short, particularly when they are not designed in tandem with local users and with local nuances in mind.

Investments in moderation should be accompanied by robust engagement with local language experts and researchers, especially when those groups and that expertise are in abundance. These organisations include those building high-quality datasets, such as Karya or Tattle, which also builds user-controlled filters to block gender-based and caste-based harassment. They also include efforts such as Vaani NLP or the Center for Tamil Natural Language Processing Research (CTNLPR), which contribute research and natural language processing tools in Tamil.

Woeful underinvestment in language capabilities and in improving moderation across Indic languages remains a chronic issue plaguing companies’ operations in India and other parts of South Asia. Users have ultimately paid the price, experiencing a persistent double standard in how their content is moderated.

Aliya Bhatia is Senior Policy Analyst and Dhanaraj Thakur is Research Director at the Center for Democracy & Technology, a non-profit non-partisan policy and research organisation based in Washington DC and Brussels; views expressed are personal

Published – September 18, 2025 12:20 am IST
