this post was submitted on 18 Aug 2023
1256 points (92.9% liked)

Technology


Get out, now.

[–] r1veRRR@feddit.de 1 points 1 year ago (3 children)

Imagine a website where EVERYONE sees the exact same content. You could just calculate that content once, save the result, and give everyone that pre-calculated result. This is called caching (roughly speaking).

Now imagine the other extreme: NO ONE sees the same content. That means you have to do your (comparatively) expensive calculations every single time. That requires a lot more compute power, esp. if you want to maintain a decent speed.

Most websites aren't entirely one or the other, but in general anything customizable will make things just a little less cache-able, and therefore everything a little more compute-intensive. Blocking is one of those customizations.
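To make the two extremes concrete, here is a minimal sketch in Python. Everything in it (`render_page`, `shared_page`, `personal_page`, the sample posts) is made up for illustration and is not how any real site does it. The shared page is computed once no matter how many people ask for it, while the personalized page is computed once per distinct viewer and block list.

```python
from functools import lru_cache

# Counts how often the "expensive" work actually runs (illustrative only).
stats = {"renders": 0}

def render_page(blocked):
    """Stand-in for the expensive calculation that builds a page."""
    stats["renders"] += 1
    posts = ["alice: hi", "bob: hello", "carol: hey"]
    # Drop posts whose author is in the viewer's block list.
    return tuple(p for p in posts if p.split(":")[0] not in blocked)

@lru_cache(maxsize=None)
def shared_page():
    # Everyone sees the same content: compute once, reuse the result.
    return render_page(frozenset())

@lru_cache(maxsize=None)
def personal_page(viewer_id, blocked):
    # The cache key now includes the viewer and their block list
    # (a frozenset, so it is hashable), so every distinct viewer
    # costs one full render.
    return render_page(blocked)
```

Calling `shared_page()` a thousand times triggers one render. Calling `personal_page` for a thousand different viewers triggers a thousand renders, even if their block lists happen to be identical.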

[–] bjorney@bjorney.lol 21 points 1 year ago (1 children)

That would make sense if there wasn't such a thing as a "follow" feature.

Everyone already sees different content

[–] Quill7513@slrpnk.net -1 points 1 year ago

It still adds complexity. Let's break out each layer of complexity on a community site.

  1. The site has no direct logins. There's an HTML file some people update together
  2. There's no curated feed. There's a single feed all posts go into. When you hit the API you just ask it "give me all the latest posts." The API server returns you a reasonable number of posts. 20, let's say
  3. Each user has a feed of their own posts. Now, in addition to what you could get before, you also can make queries for the most recent 20 posts made by a specific user.
  4. Each user has a feed of posts from users they follow. Now when you query the API for your following feed, multiple queries are made. A query is made to see who you follow, and then a query is made to get the most recent 20 posts from a blended feed. This part is actually already quite expensive, but it's still sort of the bare minimum you'd expect for a community site like MySpace, Facebook, Twitter, Reddit, Mastodon, or Lemmy
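Steps 2 through 4 above can be sketched roughly like this. It is an in-memory toy, not a real API: the `Post` type, the function names, and the `follows` mapping are all assumptions made for the example. The point is only that the following feed needs two lookups (who do I follow, then the blended feed) where the earlier feeds needed one.

```python
from dataclasses import dataclass

PAGE_SIZE = 20  # "a reasonable number of posts"

@dataclass(frozen=True)
class Post:
    author: str
    timestamp: int
    text: str

def latest_posts(posts, limit=PAGE_SIZE):
    # Step 2: a single global feed, newest first.
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)[:limit]

def user_feed(posts, author, limit=PAGE_SIZE):
    # Step 3: the most recent posts made by one specific user.
    return latest_posts([p for p in posts if p.author == author], limit)

def following_feed(posts, follows, viewer, limit=PAGE_SIZE):
    # Step 4, query one: who does the viewer follow?
    followed = follows.get(viewer, set())
    # Step 4, query two: the blended feed of those users' posts.
    return latest_posts([p for p in posts if p.author in followed], limit)
```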

This is where we will deviate from history. It was at this point that blocks were implemented. However, I want to explain blocks at the level of complexity at which they exist today. We will revisit later that Elon could reduce complexity by removing any of these other features, leading us to the conclusion that his motivation is not technical in nature

  1. Posts can be threaded, meaning that some posts are only relevant in relation to other posts. You don't want to show them when a user makes a query for their own or another user's feed, but you do need to show them when someone opens a feed for a post. These feeds are threaded in reverse of your other feeds. This necessitates a different database mechanism than the one you use for normal posts. Now you have an architectural problem
  2. Posts can be reposted, including posts in threads, so sometimes you have to load a lot of context when someone goes to their usual home feed
  3. It takes a long time to calculate thread posts, especially now that they sometimes show in the default feed. You determine from usage data that it's more important to update the feed once the initial feed load is complete, so when you load the initial feed, you also load the first 10 posts of every thread in that initial feed. The initial feed load goes from 20 posts to 200 posts even though users only see 20 posts when they see the feed. It's just that now, when you click on one of those feeds, it opens instantly because it was already preloaded
  4. Even though your users engage more if there are preloaded feeds, they'd still engage EVEN MORE if you brought the initial load time down more. You introduce caching. Now parts of feeds are ready to go from your inexpensive to query database instead of your expensive to query database. The cost is you do need some processes in the background ALWAYS updating the expensive to query database so the cheap to query one is always up to date with queries people are likely to make
  5. Your website has become really popular. You need to be able to up and down scale it so that you can meet demand and keep costs low. This means introducing architectural complexity so that different parts can run separately. Your three databases (regular, threaded posts, and cache) are all run on separate hardware. Your core service is run on a clustering mechanism so you can scale up multiple instances of it at the same time. You even identify parts of the core service that are accessed a lot that could be their own services run on a different clustering mechanism so that they can scale up and down even faster than your core service can
  6. Your website is a global phenomenon. It's not good enough to always be available, users across the globe need to have a good experience. You need cache databases in every region you do business in. You pick as your regions of focus: Western US, Central Canada, Eastern US, Western Europe, Eastern Europe, the Middle East, India, Northern Asia, Southeast Asia, the South Pacific, Latin America, and Africa. You need to keep data in sync across all regions, and you must also have local databases for storing posts from users in those regions until you can sync them to the other regions. Your database integrity model has to change entirely now, requiring you to implement a much more complicated database mechanism.
  7. California and Europe have special rules you must follow. You develop special features to satisfy their requirements, and must add special checks for those requirements to all of your API calls.
  8. For marketing reasons, instead of the most recent 20 posts in the main feed, you want to show the top 20 posts based on a calculation of recency, activity from other users, and engagement with your sponsored ads. Loading the feed is now 3x more expensive (in terms of computational time)
  9. Users don't want to see posts from other users. You must add a list of users for every user representing the users they don't want to see posts from. You must check on every feed update for posts that don't conform to these rules and throw them out, and then load a new post in their place. Every time a user blocks another user, you must invalidate the parts of the cache that were relevant to that user's feed. Invalidating caches is very expensive
  10. Users don't want users they block to see their own posts. Now instead of checking a single user's block list as you load the feed, you must check both that block list and the block list of every poster of every post you load. This can make the feed load anywhere between 2x and 200x as expensive, depending on the diversity of posters in the feed you're calculating. You must also invalidate more caches for every block
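The last two steps (one-way blocks, then two-way blocks, plus the cache invalidation each one forces) can be sketched like this. It is a toy under stated assumptions: `visible`, `build_feed`, and `FeedCache` are hypothetical names, the block lists live in a plain dict, and the cache is a single in-memory map rather than the regional caches described above.

```python
def visible(author, viewer, blocks):
    """blocks maps each user to the set of users they have blocked."""
    if author in blocks.get(viewer, set()):
        return False  # the viewer blocked the author
    if viewer in blocks.get(author, set()):
        return False  # the author blocked the viewer
    return True

def build_feed(candidates, viewer, blocks, limit=20):
    # candidates are (author, text) pairs, already ranked. Posts that
    # fail the block check are thrown out and the next one takes their place.
    feed = []
    for author, text in candidates:
        if visible(author, viewer, blocks):
            feed.append((author, text))
            if len(feed) == limit:
                break
    return feed

class FeedCache:
    """One cached feed per viewer (illustrative, in-memory only)."""

    def __init__(self, blocks):
        self.blocks = blocks
        self._feeds = {}
        self.rebuilds = 0  # how many times a feed had to be recomputed

    def get(self, viewer, candidates, limit=20):
        if viewer not in self._feeds:
            self.rebuilds += 1
            self._feeds[viewer] = build_feed(candidates, viewer, self.blocks, limit)
        return self._feeds[viewer]

    def block(self, blocker, blocked):
        self.blocks.setdefault(blocker, set()).add(blocked)
        # A two-way block changes what BOTH users may see, so both
        # cached feeds have to be invalidated.
        self._feeds.pop(blocker, None)
        self._feeds.pop(blocked, None)
```

Note that the second check in `visible` needs the block list of every author in the candidate set, which is why the cost depends on how many different posters show up in the feed.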

So yes. Blocking users on a website like Twitter is complicated (with Twitter's current implementation and feature set) but there are ways to simplify this better than what Elon is doing. Not preloading tweets on page load or reducing the complexity of the sorting algorithm for the main feed could both do what Elon is claiming this is about. He doesn't want to get rid of those though. He views them as special important features. He picked blocking as the thing to reduce complications for because he views it as an unimportant feature, antithetical to his personal Twitter goals

[–] pinkdrunkenelephants@sopuli.xyz 11 points 1 year ago* (last edited 1 year ago)

Computational power has jack shit to do with his decision. That's just a cheap talking point meant to justify it in our eyes as if most of us even know what it means.

Stop letting obvious concern trolls manipulate you into accepting absurdities.

[–] mpa92643@lemmy.world 5 points 1 year ago

That's not really a solid argument. Blocking is likely implemented as a very tiny piece of what is already very likely a massive table join operation. Computationally, it's likely to have as much an impact on their compute costs as the floor mats in your car have on fuel efficiency.

Everyone already sees different content. It's an inherent part of Twitter. It's not a static site where everyone sees the same thing. You see the tweets of who you're following, and don't see tweets of those you've muted. All that filtering is happening at the server level. Any new tweets or edited tweets or deleted tweets change that content too, which is happening potentially hundreds of times a second for some users.

Anyway, caching would be implemented after a query for what tweets the user sees is performed to reduce network traffic between a browser and the Twitter servers. There's some memoization that can be done at the server level, but the blocking feature is likely to have almost no impact on that given the fundamental functionality of Twitter.
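As a rough illustration of the "tiny piece of a massive table join" point, here is a sketch using Python's built-in `sqlite3`. The schema (`tweets`, `follows`, `mutes`, `blocks`) and the function names are invented for the example and say nothing about Twitter's actual tables. The idea is just that once the query already joins on follows and filters on mutes, the block check is one more join clause of the same shape.

```python
import sqlite3

def make_db():
    # Hypothetical schema, in memory, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tweets  (id INTEGER PRIMARY KEY, author TEXT, body TEXT);
        CREATE TABLE follows (follower TEXT, followee TEXT);
        CREATE TABLE mutes   (muter TEXT, muted TEXT);
        CREATE TABLE blocks  (blocker TEXT, blocked TEXT);
    """)
    return conn

def timeline(conn, viewer, limit=20):
    # Follows pick the tweets; mutes and blocks are anti-joins that
    # drop rows. The blocks join covers both directions.
    return conn.execute(
        """
        SELECT t.id, t.author, t.body
        FROM tweets AS t
        JOIN follows AS f
          ON f.followee = t.author AND f.follower = :viewer
        LEFT JOIN mutes AS m
          ON m.muter = :viewer AND m.muted = t.author
        LEFT JOIN blocks AS b
          ON (b.blocker = :viewer AND b.blocked = t.author)
          OR (b.blocker = t.author AND b.blocked = :viewer)
        WHERE m.muter IS NULL AND b.blocker IS NULL
        ORDER BY t.id DESC
        LIMIT :limit
        """,
        {"viewer": viewer, "limit": limit},
    ).fetchall()
```

Whether the extra join is cheap in practice depends on indexing and on how the real system is sharded, which this sketch does not model.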