Pawb.Social Feedback


An official community for users of Pawb.Social services (furry.engineer, pawb.fun, and pawb.social) to provide feedback and suggestions.

founded 2 years ago

MODERATORS

crashdoom@pawb.social

[RFC] Use of Automated Moderation Tools (pawb.social)

submitted 1 year ago by crashdoom@pawb.social to c/pawbsocial_feedback@pawb.social


Due to the recent spam waves affecting the Fediverse, we'd like to open requests for comment on the use of automated moderation tools across Pawb.Social services.

We have a few ideas on what we'd like to do, but want to make sure users would feel comfortable with this before we go ahead with anything.

For each of these, please let us know whether you consider the use case acceptable or not, and if you feel like sharing additional info, we'd appreciate it.

1. Monitoring of Public Streaming Feed

We would like to set up a bot that monitors the public feed (all posts with Public visibility that appear in the Federated timeline) to flag any posts that meet our internally defined heuristic rules.

Flagged posts would be reported as normal from a special system-user account, but reports would not be forwarded to remote instances, so that false positives are not sent on to other instances.

These rules would be fixed and based on metadata from the posts (account indicators, mentions, links, etc.), not on the content of the posts themselves.
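To make idea #1 more concrete, here is a minimal sketch of what metadata-only rules could look like. Everything in it is an assumption for illustration: the `PostMeta` fields, the rule names, and the cut-off values are hypothetical and are not the actual internal rules. Note that the post text itself is never inspected.

```python
from dataclasses import dataclass


@dataclass
class PostMeta:
    """Hypothetical metadata for one public post (no post text is stored)."""
    account_age_days: int
    follower_count: int
    mention_count: int
    link_count: int


# Hypothetical fixed rules; the names and cut-offs are illustrative only.
RULES = [
    ("new_account", lambda p: p.account_age_days < 1),
    ("no_followers", lambda p: p.follower_count == 0),
    ("many_mentions", lambda p: p.mention_count >= 5),
    ("has_links", lambda p: p.link_count >= 1),
]


def matched_rules(post: PostMeta) -> list[str]:
    """Return the names of every rule the post's metadata matches."""
    return [name for name, rule in RULES if rule(post)]


def should_flag(post: PostMeta, min_matches: int = 3) -> bool:
    """Flag a post for moderator review when enough rules match at once."""
    return len(matched_rules(post)) >= min_matches
```

Requiring several rules to match at once, rather than any single one, is one way to avoid flagging an established account just for posting a link.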

2. Building of a local AI spam-detection model

Taking this a step further, we would like to experiment with using TensorFlow Lite and Google Coral Edge TPUs to make a fully local model, trained on the existing decisions made by our moderation team. To stress, the model would be local only and would not share data with any third party or service.

This model would analyze the contents of the post for known spam-style content and identifiers, and raise a report to the moderation team where it exceeds a given threshold.

However, we do recognize that this would result in us processing posts from remote instances and users, so we would commit to not using any remote posts for training unless they are identified as spam by our moderators.
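As a rough sketch of the reporting step in idea #2, assuming the model outputs a spam score between 0 and 1: the `Report` type, the `maybe_report` helper, and the 0.9 default threshold are all hypothetical, and the model call itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    """Hypothetical report raised to the moderation team for human review."""
    post_id: str
    spam_score: float


def maybe_report(post_id: str, spam_score: float,
                 threshold: float = 0.9) -> Optional[Report]:
    """Raise a report only when the model's score exceeds the threshold.

    The threshold value is an assumption; the model never takes action
    by itself here, it only hands the post to a moderator.
    """
    if spam_score > threshold:
        return Report(post_id=post_id, spam_score=spam_score)
    return None
```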

3. Use of local posts for non-spam training

If we see support for #2, we'd also like to request permission from users, on a voluntary basis, to provide their posts as "ham" (non-spam / known good posts) for training the spam-detection model.

While new posts would be run through the model, they would not be used for training unless you give us explicit permission to use them in that manner.
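Put together, the training commitments in #2 and #3 could be expressed as a single eligibility check. This is only a sketch: the `Post` fields and the `training_label` helper are hypothetical names, and treating moderator-confirmed spam from local accounts the same way as remote spam is our assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Hypothetical view of a post for training-set selection."""
    is_local: bool
    marked_spam_by_moderator: bool
    author_opted_in: bool


def training_label(post: Post) -> Optional[str]:
    """Return "spam", "ham", or None if the post must not be used for training."""
    # Spam examples come only from decisions made by human moderators.
    if post.marked_spam_by_moderator:
        return "spam"
    # Ham examples come only from local users who explicitly opted in.
    if post.is_local and post.author_opted_in:
        return "ham"
    # Everything else (including all non-spam remote posts) is never trained on.
    return None
```

Every post can still be scored by the model; this check only decides what is allowed into the training set.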

I'm hoping this method will allow users who feel comfortable with this to assist in development of the model, while not compelling anyone to provide permission where they dislike or are uncomfortable with the use of their data for AI training.

4. Temporarily limiting suspected spam accounts

If our heuristics and / or AI detection identify a significant risk or pattern of spammy behavior, we would like to be able to temporarily hide / suppress content from the offending account until a moderator is able to review it. We've also suggested an alternative idea to Glitch-SOC, the fork we run for furry.engineer and pawb.fun, to allow hiding a post until it can be reviewed.

Limiting the account would prevent anyone not following them from seeing posts or mentions by them, until their account restriction is lifted by a moderator.
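The visibility rule described above boils down to a small check. The sketch below uses a hypothetical `is_visible` helper and is not how Mastodon or Glitch-SOC implement limiting internally.

```python
def is_visible(author_limited: bool, viewer_follows_author: bool) -> bool:
    """Posts from a limited account are shown only to its existing followers."""
    return (not author_limited) or viewer_follows_author
```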

In a false-positive scenario, an innocent user may not have their posts or replies seen by a user on furry.engineer / pawb.fun until their account restriction is lifted, which may break existing conversations or prevent new ones.

We'll be leaving this Request for Comment open-ended to allow for evolving opinions over time, but are looking for initial feedback within the next few days for Idea #1, and before the end of the week for ideas #2 through #4.


[–] savvywolf@pawb.social 6 points 1 year ago

It all feels generally ok by me, but I have some thoughts.

With #4, I am worried about timing. Imagine someone makes some art, and puts it in the main furry hashtag. But they trigger your bot because they link to their personal website or Patreon or something. It could take some number of hours, depending on who's awake or not, for it to be "cleared". By that time, if it's inserted into the feed based on its date, nobody would see it. If you are going to release transparency statistics (which IMO is important), I'd like to see "time until released" be a metric.
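For what it's worth, a "time until released" metric could be as simple as the median gap between an account being limited and a moderator lifting the limit. A minimal sketch, assuming each false-positive case is logged as a (limited_at, released_at) pair; the function name and the log format are made up for illustration:

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def median_time_until_released(
    cases: list[tuple[datetime, datetime]],
) -> Optional[timedelta]:
    """Median time between a limit being applied and a moderator lifting it.

    Each case is a hypothetical (limited_at, released_at) pair.
    Returns None when there are no cases to report on.
    """
    if not cases:
        return None
    seconds = [(released - limited).total_seconds() for limited, released in cases]
    return timedelta(seconds=median(seconds))
```

A median is used rather than a mean so that one case left overnight does not hide how long a typical release takes.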

If this system goes into place, will the limit on mastodon.social etc. be lifted if you can screen them for spam?

On a personal note, I'm a bit wary of moderation by AI because of false positives. I'm neurodivergent, so I speak and communicate in a "different" way to neurotypical people. So I'm worried that an AI will pick up on that and block me unfairly because of the lack of training data. But that shouldn't be a problem here if you guys are committed to keeping a person in the loop (unlike most other places with AI moderation...). Although, apparently over half the furry fediverse is neurodivergent, so it's probably the neurotypicals that'll need to be worried about that.

Lastly, and this may be a hot take so I'm curious about other pawb people's thoughts on this, but does it need to be local only? This may be a crazy idea, but since it's opt in anyway, why not throw up all the training data on GitHub or somewhere? Maybe even work with similarly minded instance admins and create one giant set of data. It'd be an ethically sourced dataset for moderation that can also be audited by anyone (to make sure you don't have specific political parties "accidentally" marked as spam). Most of my (and I presume many other people's) hangups with AI moderation are about how closed and secretive it is. Would be nice to break that trend with something open.

Might increase uptake as well. I bet a lot of privacy-focused Linux users still give Valve their system information because they see where it's going, and the value of doing so (i.e. to beat those filthy Mac users).