RoundSparrow

joined 1 year ago

These issues have been obvious for months, but lemmy.ml wasn't sharing the server logs.

Now at least there are multiple sites with a modest amount of data that see these issues:

https://github.com/LemmyNet/lemmy/issues/4017

2023-09-28 (bulletintree.com)
 

Is there something special about the 28th day of the month and precisely 90 days?

A very obvious server-crashing / denial of service problem was called out in Lemmy code two days before the Reddit deadline. https://github.com/LemmyNet/lemmy/issues/3394

Observations:

  1. Why would anyone think 5 is a good design for production in the first place? It calls into question the developers' 4+ years of experience: they clearly understand the technical issue, and it is the same coding / parameter issue in any programming language. What is the motivation / priority here?

  2. The developer-run lemmy.ml server (then the Lemmy server with the most data) was crashing from PostgreSQL overloads every day in May and June 2023...

  3. There were active countdowns to the July 1 Reddit API change. This was June 28.

  4. The change takes about 30 seconds to code, and it is by no means difficult to understand. But it must be approved by the core developers, who have over 4 years on the project... and they could even notify live sites to urgently edit the Rust source code and re-compile. (And why not move this value to an environment variable that can be set without recompiling Rust code?)
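To make the environment-variable idea in point 4 concrete, here is a minimal sketch. It is in Python rather than Rust only to keep it short, and the names `EXAMPLE_LIMIT` and `read_limit` are made up for illustration; they are not from Lemmy's source or configuration. The compiled-in value stays as the fallback, and a site operator can override it without rebuilding anything.

```python
import os

# Hypothetical names for illustration; not taken from Lemmy's source or configuration.
ENV_NAME = "EXAMPLE_LIMIT"
DEFAULT_LIMIT = 5  # the hard-coded value, kept only as a fallback


def read_limit(environ=os.environ, name=ENV_NAME, default=DEFAULT_LIMIT):
    """Return the limit from the environment, or the default if unset or invalid."""
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # A typo in the variable should not take the server down.
        return default
    # Reject zero and negative values rather than applying them silently.
    return value if value > 0 else default
```

An operator would then set `EXAMPLE_LIMIT=64` in the service environment and restart, with no edit to source code and no re-compile.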

June 28 issue opened / code created
July 1 Reddit API deadline
September 28 code published

90 days to change what has contributed to lemmy.ml, beehaw, lemmy.world - and the entire network of Lemmy servers - crashing constantly from Lemmy overload. Almost as bad as GitHub Issue 2910 being ignored for the whole month of June 2023!

https://github.com/LemmyNet/lemmy/issues/3394

[–] RoundSparrow@bulletintree.com 1 points 1 year ago* (last edited 1 year ago)

On a positive note, an actual concern for data integrity expressed by a core developer!

"This is a major issue with moderation, we should consider publishing the fix in an 0.18.5 release."

https://github.com/LemmyNet/lemmy/pull/3988

Although the project has an obsession over deleting data. Removing data. For communist flag waving project leaders, why is there such a focus on messages being deleted/removed? Why isn't it more like Wikipedia, where the commons is emphasized and the terms-of-service emphasize that this is primarily a public forum and contributing to the commons (communism?! hello?).

The lack of care for data and for actually noticing data on the site keeps being demonstrated. This issue has been ignored for months - and it was a newly introduced bug from when all the post Reddit API change activity was going on...

Just opened today, and people are repeating it, yet the site creators do not treat these easily solved bugs as priorities. Why not have a "top 20 bugs" list and organize it 2 ways: ease of fix and importance of fix? https://github.com/LemmyNet/lemmy/issues/3987

2023-09-22

Communications... still really odd how May, June, July there was so little: https://sh.itjust.works/post/5652703

The claims to support Reddit level performance without listening to what Reddit had to say about PostgreSQL scaling more than a decade ago are... still really bad. They still claim 'high performance' on the front page of the project, as they have for a long time, when it isn't, because it lacks any caching and there are still bugs lurking in the database due to lack of testing with significant data.

Claiming that federation scales to Reddit when Reddit is a single site (and has no federation equivalent) is a pretty odd performance claim.

 

A change in direction for the project this week?

Maybe the reputation of stability on lemmy.world and people realizing that the amount of activity really wasn't that high - and lemm.ee shutting out images. Most of all, Beehaw's criticism maybe finally resonated.

Beehaw was online a full year before Reddit - and saw just how long-term issues were not being addressed... maybe that is what it took.

It is worth keeping a positive eye on things.

The logging that comes out of the Rust code on errors really says it all. Over 4.5 years of coding on the same project and running it on the live public Internet at lemmy.ml - and there is no way for a site operator/admin to view the Rust code failure logs without having to do all that independently (and no recommendations on the importance of viewing logs). And when GitHub issues get posted with log problems, the developers with all the Rust code experience ignore them.

[–] RoundSparrow@bulletintree.com 1 points 1 year ago* (last edited 1 year ago)

This gem of a quote: "That index is great, I didn't see that because on my database I guess the person table is too small for it to matter." https://github.com/LemmyNet/lemmy/pull/3960

That's exactly the problem with the whole project, no data, not concerned about data, ignoring all the problems on lemmy.ml since April despite 4 years of experience... data is not the focus of the project developers. lemmy.ml ran for 4 years on public Internet with nearly zero data. Not even a couple hundred megabytes of data.... and no testing and observation of problems with scaling and more data. It seems like they have 4 years of experience that isn't experience... just like Beehaw has shown with moderation experience and tools.

They ignored Issue 2910 for months during the critical Reddit API issue period. It worked fine with no data in the system, but was surely crashing lemmy.ml, lemmy.world, beehaw and all my test systems once even a modest amount of data was populated (it really does not take much)!

Data loss... this has been going on for more weeks than I can count, and the developers with over 4 years of experience are not the ones fixing such data-damage/data-loss bugs... that has just been the pattern since Issue 2910 was ignored for months.

https://github.com/LemmyNet/lemmy/pull/3965

 

It seems api_tests is unstable, failing around half the time... it has been that way for days, it seems.

What it came down to for me in May was that, after 4 years of coding, if they knew they had scale problems (the queue system of federation and the PostgreSQL queries were both buggy and performing badly)...

The natural answer was to split the code out. Push more to lemmy-ui, such as caching the API calls for "trending communities" and the getSite call in NodeJS... something.
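The kind of caching meant here can be sketched in a few lines. This is a generic time-to-live cache in Python, not anything from lemmy-ui or the Lemmy API; `TTLCache` and `fetch_trending_communities` are hypothetical names, and the fetch function is only a stand-in for the slow database-backed call.

```python
import time


class TTLCache:
    """Cache the result of an expensive call for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the backend is not touched
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


def fetch_trending_communities():
    """Hypothetical stand-in for the slow database-backed API call."""
    return ["community_a", "community_b"]


cache = TTLCache(ttl_seconds=60)
trending = cache.get_or_compute("trending_communities", fetch_trending_communities)
```

With a 60 second lifetime, a listing that every visitor requests hits the database at most once a minute instead of once per page view, at the cost of the list being up to a minute stale.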

If they wanted to maintain their Rust approach to development, create a temporary app to get out of the problems and have a fresh approach. Nginx would allow even specific API paths to be redirected to another application. The read-only post and comment listings, community listings, could have been forked out.

The API is why people left Reddit, and Lemmy had an API while kbin did not.

 

Cambridge Analytica was well underway in 2013, now 10 years ago. People like to think that because researchers found x on Twitter and y on Facebook, those clearly documented cases are the whole of it, and that the tactics and general psychology weren't copied everywhere.

Cambridge Analytica is mostly famous for Facebook... but I don't view their direct targeting of individuals to be the long-term damage. The long-term damage is that they legitimized psychological manipulation and falsehoods as a form of winning audiences. They were Psychology/Psychiatry professionals who applied human history and experience towards making people believe false things. Like a rebirth of Dr. AA Brill from 1929 on a new scale. The legitimization of it without any ethical uprising...

 

The only instance with significant creation activity that isn't all bot content... had to resort to cloudflare due to the data performance... and now the problems with that solution have started to be taken on... https://lemmy.world/post/4366376

"This is how the Fediverse works. There is so much bad practices, so much haphazardly implemented functionality and so much bad API documentation all over the place that I wonder why nothing has extremely exploded so far." - Dirk at lemmy.ml

Dirk's comment is from here: https://lemmy.world/post/4128651

Dirk is from Germany, not American like I am - and joined the same server I did - creating this site. The core developers left lemmy.ml crashing for months - now it's working better because there is less activity - not because the code scales and is improved.

The terrible SQL from Rust and Diesel was why I never opened shop. I knew lemmy.ml was online for over 4 full years..

Another new front-end was the 2023 priority from all this. July 11 is when they started: https://github.com/LemmyNet/lemmy-ui-leptos

When a dozen independent projects are building front-end apps, the people with over 4 years of experience with the code take all that back-end experience and bottle it up. Instead of actually improving documentation and API testing surface... WITH ALL those YEARS of experience... off to start a new Rust-centered project. The advantage being that Rust can use the same Rust objects... but that doesn't do anything for the front-ends that don't use Rust, the smartphone apps, etc.

"For the longer term, I have some further ideas:
4) Invite-based registrations
I believe that one of the best ways to effectively combat spam and malicious users is to implement an invite system on Lemmy. I have wanted to work on such a system ever since I first set up this instance, but real life and other things have been getting in the way, so I haven’t had a chance. However, with the current situation, I believe this feature is more important then ever, and I’m very hopeful I will be able to make time to work on it very soon.

My idea would be to grant our users a few invites, which would replenish every month if used. An invite will be required to sign up on lemm.ee after that point. The system will keep track of the invite hierarchy, and in extreme cases (such as spambot sign-ups), inviters may be held responsible for rule breaking users they have invited.

While this will certainly create a barrier of entry to signing up on lemm.ee, we are already one of the biggest instances, and I think at this point, such a barrier will do more good than harm."

 

Yesterday .world had to turn off sign-ups and even shut down the shitposting community.

This is basically the front-door of Lemmy. And as others are starting to notice, 60K users after all the people seeking better and trying out things isn't that many.. And there are seemingly a lot of people who use multiple servers given the technical instability of Lemmy's code... I am one of those people who spends 10 minutes a day on 4 servers, but I'm cutting back because content just isn't there and it's now often gaming topics and news making the rounds as duplicate stories over a 2 to 4 day period (besides memes)

Social media in general... there was so much Facebook hate for the past 8 years... but not much betterment came of it. Today looking over Reddit, it hasn't really changed that much in the past 4 months... there was a group of dissatisfied people who don't seem to want to actually build something better - just want to protest Reddit.

YouTube and TV advertising - that's a huge topic. YouTube does have a lot of small-time original creators, but is the money the reason why? You can't make money on Reddit or Lemmy unless you have a business shop or something related to specialty topics (such as auto repair in a discussion community about same).

 

It isn't just the constant database crashes in Lemmy, GitHub issue 2910...

https://github.com/LemmyNet/lemmy/pull/3708

On July 24, sanitization of HTML was added to the code. But testing was not called for, and it broke titles of postings, link parameter ampersands, and discussion of programming code in code blocks.

It's data... and now it's very difficult to undo all the damaged data that has been put into the database for weeks now.

Lemmy is a link aggregator, and it damages links now... the ampersand parameter delimiter in URL links is now broken because this code was not tested. Why wasn't there a call for testing of something that was going to alter every new post and comment from both federation and Lemmy itself? How did such obvious things as a ? parameter list in a URL get overlooked... and then a new bugfix release comes out after this was known as an issue - and it's still not fixed.
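The general failure mode is easy to reproduce. The sketch below uses Python's standard `html.escape` as a stand-in sanitizer; it is an assumption for illustration and not the code from the pull request above. It shows what happens when text is escaped on the way into storage: every ampersand becomes `&amp;`, so the three kinds of content mentioned here are all stored in altered form. `sanitize_for_storage` and `damaged_samples` are hypothetical names.

```python
import html


def sanitize_for_storage(text: str) -> str:
    """Hypothetical storage-time sanitizer: escapes HTML special characters."""
    return html.escape(text, quote=True)


SAMPLES = [
    "https://example.com/search?q=lemmy&page=2",  # link with two query parameters
    "Tom & Jerry",                                # post title with an ampersand
    "if a < b && b > c { }",                      # code pasted into a comment
]


def damaged_samples(sanitize, samples):
    """Return the samples that the sanitizer would store in altered form."""
    return [s for s in samples if sanitize(s) != s]


for sample in damaged_samples(sanitize_for_storage, SAMPLES):
    print(sample, "->", sanitize_for_storage(sample))
```

A spot-check like this, run against a handful of real-looking posts before release, is the sort of test that would flag the broken links. It also shows why the stored data is hard to repair afterwards: once `&amp;` is in the database, it cannot be told apart from text where the user really typed `&amp;`.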

Database crashing that results in lost data from unsaved post and comments, failure to deliver Federation data without any way client or server operators are notified, and damaged data as fundamental as URL website links...

I'm all for code changes going in fast, but there is a lack of actually testing things and spot-checking on Lemmy, instead of just changing Rust code without really realizing that a link aggregator uses ampersands in URL links... and not asking people to help think of side-effects...

The development process could even ask just a couple of sites with more attentive operators to try out the code for a few days and ask people to report any problems before advising all sites to upgrade and break their URL links.

 

I had hoped to be able to do this with mostly API tricks and leveraging existing data structures within Lemmy's database... but I think it's proving that August 2023 isn't the right timing.

I learned a lot by prototyping this and thinking about multi-person editing like moderators can do in a community...

 

June 4, 2023 is when I felt I had to get involved. And June 8, 2023 is when I created my own testing-focused experimental Lemmy install that compiled Main from GitHub source and was ready to test the changes I thought were surely going to come to the SQL because of GitHub issue 2910. I wanted to test the code that the developers, who had been working on and running lemmy.ml for over 4 full years, would surely address. It was June 4 with Issue 2910, and the June 30 Reddit API cutoff deadline countdown was well under way. Lemmy.ml put in major hardware upgrades on June 13, 2023 - and I was puzzled why such an easy 2 or 3 hour fix for Issue 2910 wasn't put in... but I still had hope that everyone would see the pending countdown to the June 30 API deadline and a fix could go in within 10 days - by June 23... for some slack time before June 30..

Watching Beehaw, up and running for 17 months on Lemmy - crashing constantly... I thought, surely the developers were seeing Issue 2910 happening over there... but June 28 came, June 29 came, June 30... nothing. With 4 years of experience on the Rust code base and such, they were the ones to fix Issue 2910.... but from June 4 to June 30, it just didn't happen.
