
Are there tools that exist to anonymize writing styles?

I feel like with the rise of AI, something that anonymizes writing styles should exist. For example, it could look for differences in American versus British spelling, like color versus colour, or contextual things, like soccer versus football, and make edits accordingly. ChatGPT could be fed a prompt that says "Rewrite the following paragraphs as if they were written by an Australian", but I don't know if it would have a good enough grasp of the objective or if it would start shoehorning in references to koalas and fairy floss.

I tried searching online to see if something like this existed and found a few articles from around the 2010s, such as "Software Helps Identify Anonymous Writers or Helps Them Stay That Way" by the New York Times. It talks about stylometry and Anonymouth, but it seems like Anonymouth hasn't been updated in years. All recent articles seem to be about plagiarism and AI.

For context, what got me thinking about the topic was remembering J.K. Rowling being revealed as the author of a mystery novel called The Cuckoo's Calling. Smithsonian wrote an article about it called "How Did Computers Uncover J.K. Rowling's Pseudonym?". I thought it could make for a neat post here.

Syn_Attck ,

There is a program built into Whonix, I believe it's called Kloak, that randomizes your keyboard input timings so you can't be identified via keystroke-timing JavaScript. There's also research into defeating stylometric analysis, such as Anonymouth, but I'm sure there are plenty of new tools; if anyone finds any that work well, please reply here, as I haven't looked in some years. 'Stylometric analysis' is the key phrase to search for.
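Kloak works at the OS input layer, so the sketch below is not its actual code, just a toy illustration of the idea: add a random delay to each key event and the per-user timing signal that fingerprinting scripts measure gets drowned out. The interval numbers and function names are made up for the example.

```typescript
// Toy illustration (not Kloak's code): adding random delay to key events
// masks the timing pattern that keystroke-timing JavaScript measures.

// Hypothetical sample of a user's natural inter-keystroke intervals (ms).
const naturalIntervals = [92, 130, 85, 210, 95, 140, 88, 180, 100, 125];

// Kloak-style defense: delay each event by a random amount.
function jitter(intervals: number[], maxDelayMs = 100): number[] {
  return intervals.map((ms) => ms + Math.random() * maxDelayMs);
}

function stats(intervals: number[]) {
  const mean = intervals.reduce((a, b) => a + b, 0) / intervals.length;
  const variance =
    intervals.reduce((a, b) => a + (b - mean) ** 2, 0) / intervals.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

console.log("natural:", stats(naturalIntervals));
console.log("jittered:", stats(jitter(naturalIntervals)));
```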

With AI this will get worse (better identification based on typing styles), but it will also get better, because you can set up a local LLM and ask it to re-write your text in a certain style. Touching on this, everyone uses a combination of unique phrases and misspellings or mis-spellings (see?) of words, and with enough text from a given account the probability of statistical attribution is very high. It's how the Unabomber was identified after his manifesto was published: he used a very specific phrase incorrectly, his brother recognized it, and his brother's wife convinced him to report it to the FBI tip line.
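As a rough sketch of the "local LLM as a style scrubber" idea: if you have something like Ollama serving a model on your own machine, a few lines are enough to pipe text through it. The model name, prompt, and endpoint below are just examples; adjust them to whatever you actually run locally.

```typescript
// Sketch: ask a locally hosted LLM to rewrite text in a neutral style, so the
// original never leaves your machine. Assumes an Ollama server on localhost
// with a model already pulled; model name and endpoint are examples.
async function neutralizeStyle(text: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3", // whichever model you have pulled locally
      prompt:
        "Rewrite the following text in plain, neutral English. " +
        "Keep the meaning but drop distinctive phrasing, idioms and regional spelling:\n\n" +
        text,
      stream: false,
    }),
  });
  const data = await res.json();
  return data.response; // Ollama puts the generated text in `response`
}

neutralizeStyle("Reckon the footy was a cracker, colour me impressed.")
  .then(console.log);
```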

SomeGuy69 ,

Yeah, it would need to be a browser extension, adding a button to scramble every text input field.
Or maybe even at the OS level, opening an input field above the browser's, so the original text never enters the browser.
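The extension version could be little more than a content script. Rough sketch below; the local rewriting endpoint is a placeholder for whatever actually does the scrambling (a local LLM, an offline translator, etc.).

```typescript
// Sketch of a content script for the "button on every text field" idea.
// The rewriting endpoint is a placeholder; it could be backed by any local
// service that rewrites text (LLM, offline translator, ...).
async function rewriteLocally(text: string): Promise<string> {
  const res = await fetch("http://localhost:8080/rewrite", { // placeholder service
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  return (await res.json()).text;
}

// Add a "Scramble style" button after every textarea on the page.
for (const field of document.querySelectorAll<HTMLTextAreaElement>("textarea")) {
  const button = document.createElement("button");
  button.type = "button";
  button.textContent = "Scramble style";
  button.addEventListener("click", async () => {
    field.value = await rewriteLocally(field.value);
  });
  field.insertAdjacentElement("afterend", button);
}
```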

delirious_owl ,
@delirious_owl@discuss.online avatar

Translate to some foreign language. Then translate to some other foreign language. Then translate back to your language. Congrats, your writing style changed.

Syn_Attck ,

Ah, the classic game of Google Translate Telephone.

delirious_owl ,
@delirious_owl@discuss.online avatar

Better to do the translations locally, so the original never leaves your device.
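For example, against a self-hosted translation server. Rough sketch; the endpoint and field names follow LibreTranslate's API as I remember it, and the language chain is arbitrary, so double-check against whatever you actually run.

```typescript
// Sketch of the round-trip translation trick against a locally hosted
// translation server (e.g. a self-hosted LibreTranslate instance), so the
// text never leaves your machine. Verify the endpoint and field names
// against the version you run.
async function translate(text: string, source: string, target: string): Promise<string> {
  const res = await fetch("http://localhost:5000/translate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ q: text, source, target, format: "text" }),
  });
  return (await res.json()).translatedText;
}

async function roundTrip(text: string): Promise<string> {
  const de = await translate(text, "en", "de"); // English -> German
  const es = await translate(de, "de", "es");   // German -> Spanish
  return translate(es, "es", "en");             // Spanish -> back to English
}

roundTrip("I reckon this tool is ace, colour me impressed.").then(console.log);
```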

mctoasterson ,

Yes. I would use the privacy-focused ones (there are several on F-Droid). If your threat model includes anonymity against a state actor, such that they will be attempting to trace your writing style, you can be certain they could and would also just subpoena Google for matching translation requests. It would be a lot easier to work backwards to identifying you that way.

bitwolf ,

My coworkers use ChatGPT for this.
Since it always answers in the same generic way, it's helpful for anonymizing their peer reviews.

ReversalHatchery ,

I don't understand people who want to anonymize their writing but then use ChatGPT to do it. For me at least, OpenAI is not exactly the kind of business I would trust.

MachineFab812 ,

You can run your own offline instance. Not that randos are likely to, but still.

TheAnonymouseJoker Mod ,

Offline ChatGPT hosting needs hardware impossible for individuals to have.

bitwolf ,

They offer different-sized models; you don't have to use the fully fledged version.

bitwolf ,

The concern here is less about OpenAI knowing and more about their coworkers identifying them.

Zerush ,
@Zerush@lemmy.ml avatar

You can look here, maybe you'll find something useful if you search for AI (150k+ apps).

utopiah ,

I wouldn't just trust random Lemmy users (no offense) but instead check the actual fields, e.g. stylometry or writeprint, and from there check the state of the art. Not being an expert would make that tricky, so I would take a recently published paper, e.g. https://arxiv.org/abs/2203.11849, to understand the challenge. As is always the case, they'll review the field (e.g. section 2 here) and clarify the two sides of the arms race, here obfuscation/deobfuscation. The former, in 3.2, mentions examples of techniques the authors estimate to be good starting points, e.g. writeprintsRFC. I'd then search for such tools if they don't directly provide a link to an open-source repository, e.g. theirs: https://github.com/reginazhai/Authorship-Deobfuscation . I would then try a recent one that I can easily set up, e.g. via Docker, and give it a go. I would then read the rest of the paper, see who cites it, and try to get a more up-to-date picture.

TL;DR: I don't know, but there is dedicated research whose results I'd trust more than the opinions of strangers who are probably not experts.

sqgl ,

Just ask ChatGPT to paraphrase.

Lemongrab ,

Not a great solution unless you think you can trust OpenAI and their security implementation (which you shouldn't). We have seen simple PHP-scripted prompts in the past get the AI to recount an entire conversation from another user. Not safe at all.

sqgl ,

Fair point. Depends on what the document is used for I suppose — whether such security is an issue (vs simply anonymising the style).

LazaroFilm ,
@LazaroFilm@lemmy.world avatar

I wonder if Google Translate through multiple languages can do the trick?

CorrodedCranium OP ,
@CorrodedCranium@leminal.space avatar

I feel like if someone wanted to give off the impression that they were a non-native English speaker, that might work. I think it would be limited to a surface level though. Whoever attempted to use it would likely miss out on a lot of the common pitfalls someone learning a new language would run into, like mixing up the order of adjectives.

That, and content run through a translator multiple times might get warped. I am not sure if going back and forth messes things up as badly as it did 10 years ago though.

delirious_owl ,
@delirious_owl@discuss.online avatar

This, but translated offline for privacy.

breakcore , (edited )

There was a talk about detecting patterns and writing styles at the Chaos Communication Congress a bunch of years ago.

The researchers also presented a tool to anonymize text as far as I can remember.

I will go look for the talk.

Edit: Found it!

https://media.ccc.de/v/31c3_-_6173_-_en_-_saal_g_-_201412291715_-_source_code_and_cross-domain_authorship_attribution_-_aylin_-_greenie_-_rebekah_overdorf

They talk about their software to find who wrote what, but also how to use that knowledge to write software that attempts to anonymize text.

CorrodedCranium OP ,
@CorrodedCranium@leminal.space avatar

The New York Times article I linked mentioned that. I will have to watch that video though so I can get a better understanding of the mechanics of it. Thanks for the link.

montar ,

Non-native English speakers tend to mix up various styles; you could ask someone to paraphrase your text.

Bristle1744 ,
@Bristle1744@lemmy.today avatar

Probably like the cut-and-paste-from-magazines approach, except you copy and paste the sentences you want to use.
A lot of extra work, but no AI to rat you out.

CorrodedCranium OP ,
@CorrodedCranium@leminal.space avatar

Serial killer style

shutz ,

Autocorrect?

If you use it before it has learned your writing idiosyncrasies?

CorrodedCranium OP ,
@CorrodedCranium@leminal.space avatar

That would be an interesting way of doing it. Someone could probably couple that with predictive text for decent results.

ciferecaNinjo , (edited )

ChatGPT will probably remember it was you who asked and doxx you in retaliation when it discovers you've plagiarized ChatGPT.

Another thought is to translate it into Scottish. But then again, you probably still want to be understood.

Changing dialect may be too small of a change. But if you could say "write this as if by someone 1-2 generations younger/older, using the high school slang of the time", you might get a useful difference.

CorrodedCranium OP ,
@CorrodedCranium@leminal.space avatar

Changing dialect may be too small of a change. But if you could say "write this as if by someone 1-2 generations younger/older, using the high school slang of the time", you might get a useful difference.

I feel like knowing the correct use of slang for a demographic would be a challenge and require a lot of constant research. Even if someone were to go off the slang younger people are using, I feel like there's a risk of it being a regional term.

Trying to force it, I'd probably end up with something like "Those elf bars be dripping but that extra popcorn lung was a vibe check on god", which gives off "How Do You Do, Fellow Kids?" vibes.

https://leminal.space/pictrs/image/409c5a55-cc09-488c-911f-be587daabb99.jpeg

MigratingtoLemmy ,

I had asked for the same thing a while back but didn't really get much. The roundabout method that I have found is to finetune FOSS LLMs on the data you want them to represent (largely text) and then dive into some prompt engineering to get them to say something you like.

However, I haven't been able to find a test which can accurately show that a text doesn't rely on specific weights. Cue the attacks on GPT-4 which de-anonymise data it was trained on. You might also want to read the literature on DPT and shadowing techniques for red-teaming LLMs and LLM-generated text.

Cheers
