News and Discussions about Reddit
Welcome to !reddit. This is a community for all news and discussions about Reddit.
The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:
Rules
Rule 1- No brigading.
**You may not encourage brigading any communities or subreddits in any way. **
YSKs are about self-improvement on how to do things.
Rule 2- No illegal or NSFW or gore content.
**No illegal or NSFW or gore content. **
Rule 3- Do not seek mental, medical and professional help here.
Do not seek mental, medical and professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.
Rule 4- No self promotion or upvote-farming of any kind.
That's it.
Rule 5- No baiting or sealioning or promoting an agenda.
Posts and comments which, instead of being of an innocuous nature, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed and the authors warned - or banned - depending on severity.
Rule 6- Regarding META posts.
Provided it is about the community itself, you may post non-Reddit posts using the [META] tag on your post title.
Rule 7- You can't harass or disturb other members.
If you vocally harass or discriminate against any individual member, you will be removed.
Likewise, if you are a member, sympathiser or a resemblant of a movement that is known to largely hate, mock, discriminate against, and/or want to take lives of a group of people, and you were provably vocal about your hate, then you will be banned on sight.
Rule 8- All comments should try to stay relevant to their parent content.
Rule 9- Reposts from other platforms are not allowed.
Let everyone have their own content.
:::spoiler Rule 10- Majority of bots aren't allowed to participate here.
view the rest of the comments
Too bad the website is still openly accessible and still capable of being scraped
Probably what they’re targeting. Such as sites like safereddit, etc.
Well that's part of the thing. Web scraping doesn't get covered by policies. Like, they could ban your ip or any accounts you have, but web scraping itself will always be acceptable. It's why projects like NewPipe and Invidious don't care about YouTube cease and desist letters.
Oops look like this community hasn't been reviewed. Login if you still want to see the content.
Yea, I've seen those pop-ups when trying to find something out. It sucks but isn't a significant barrier to web scraping
I tried that on my desktop. So long as you are not actually logged in you cannot see the communities that are too small for a review or too adult after a review.
Parsing absolutely comes with a lot more overhead. Especially since many websites integrate a lot of JS interactivity nowadays, you oftentimes don’t get the full contents you’re looking for straight out of the HTML you’re getting out of your HTTP request, depending on the site.
In what way?
HTML definitely provides more overhead than json if you only care about the data.
Still waiting for the news that they took down old.reddit. Without the third party apps, that was the only way it could still be usable.
We use Akamai where I work for security, CDN, etc. Their services make it largely trivial to identify traffic from bots. They can classify requests in real time as coming from known bots like Googlebot to programming frameworks like python & java to bots that impersonate Googlebot, to virtually any other automated traffic from unknown bots.
If Reddit was smart they’d leverage something like that to allow Google, Bing, etc. to crawl their data and block all others, or poison others with bogus data. But we’re talking about Reddit here…