Why did I think for a moment the post was about Scilab (Matlab replacement)? I was confused about Scilab being centralized. It runs on your computer, after all.

[–] Kena@lemm.ee 2 points 7 hours ago

AI slop

[–] belated_frog_pants@beehaw.org 7 points 14 hours ago

Ugh, this shit doesn't need AI.

[–] SweetCitrusBuzz@beehaw.org 26 points 1 day ago (2 children)

It had me up until 'AI'.

[–] TheMachineStops@discuss.tchncs.de 8 points 23 hours ago

Yeah, the AI thing is stupid; everyone suddenly wants to incorporate AI. Check out the Telegram bot though: you can request research papers or books through the bots, and someone uploads them within a couple of hours.

[–] hendrik@palaver.p3x.de 2 points 17 hours ago (1 children)

If you do it right, you can have that AI replace all the complicated pirating and downloading process. I think someone already came up with a paper writer AI. You just give it the topic, and it fabricates a whole paper, including nice diagrams and pictures. 😅

Yeah, but that also made me worry. I wonder how AI and science mix. Supposedly, some researchers use AI. Especially "Retrieval-Augmented Generation" (information retrieval) and such. I'm not a scientist, but I didn't have much luck with AI and factual information. It just makes a lot of stuff up. To the point where I'm better off without.

[–] Mirodir@discuss.tchncs.de 4 points 13 hours ago (1 children)

AI can be good but I'd argue letting an LLM autonomously write a paper is not one of the ways. The risk of it writing factually wrong things is just too great.

To give you an example from astronomy: AI can help filter out "uninteresting" data, which encompasses a large majority of data coming in. It can also help by removing noise from imaging and by drastically speeding up lengthy physical simulations, at the cost of some accuracy.

None of those use cases use LLMs though.

[–] hendrik@palaver.p3x.de 3 points 10 hours ago (1 children)

Right, the public and journalists often lump everything together under the term "AI", when there's really a big difference between some domain-specific pattern recognition task that can be done with machine learning and >99% accuracy... and an ill-suited use case where an LLM gets slapped on.

For example I frequently disagree with people using LLMs for summarization. That seems to be something a lot of people like. And I think they're particularly bad at it. All my results were riddled with inaccuracies, sometimes it'd miss the whole point of the input text. And it'd rarely summarize at all. It just picks a topic/paragraph here and there and writes some shorter version of that. Missing what a summary is about, providing me with the main points and conclusion, reducing the details and roughly outlining how the author got there. I think LLMs just can't do it.

I like them for other purposes, though.

[–] Mirodir@discuss.tchncs.de 3 points 10 hours ago* (last edited 10 hours ago) (1 children)

Re LLM summaries: I've noticed that too. For some of my classes shortly after the ChatGPT boom we were allowed to bring along summaries. I tried to feed it input text and told it to break it down into a sentence or two. Often it would just give a short summary about that topic but not actually use the concepts described in the original text.

Also, minor nitpick, but be wary of the term "accuracy". It is a terrible metric for most use cases, and when a company advertises its AI as having high accuracy, they're likely hiding something. For example, let's say we wanted to develop a model that can detect cancer on medical images. If our test set consists of 1% cancer images and 99% normal tissue, then 99% accuracy is achieved trivially by a model that just predicts "no cancer" every time. A lot of the more interesting problems have class imbalances far worse than this one too.
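
To put rough numbers on that, here is a minimal sketch in plain Python. The data is made up (10 cancer images out of 1,000), and the helper names `confusion_counts`, `accuracy`, `recall` and `balanced_accuracy` are my own for illustration, not from any particular library.

```python
# Toy illustration of why accuracy misleads on imbalanced data.
# All data is invented: 1 = cancer, 0 = normal tissue.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)

def recall(y_true, y_pred):
    """Fraction of actual positives that the model found."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class and recall on the negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

# 1% cancer images, 99% normal tissue.
y_true = [1] * 10 + [0] * 990
# A "model" that predicts "no cancer" every time.
y_pred = [0] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

The do-nothing model scores 99% accuracy while finding none of the cancer cases (recall 0), and its balanced accuracy is 0.5, the same as guessing. That is why metrics like recall, precision or balanced accuracy tend to say more than plain accuracy when the classes are this lopsided.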

[–] hendrik@palaver.p3x.de 2 points 8 hours ago* (last edited 8 hours ago)

What's the correct term within casual language? "correctness"? But that has the same issue... I'm not a native speaker...

By the way, I forgot my main point. I think that paper generator was kind of a joke. At least the older one, which predates AI and uses "hand-written context-free grammar":

SCIgen

And there are projects like Papergen and several others. But I think what I was referring to was the AI scientist which does everything from brainstorming research ideas, to simulating experiments, writing reports etc. That's not meant to be taken seriously, in the sense that you'll publish the generated results. But seems pretty creative to me, to write a paper about an artificial scientist...

[–] Andromxda@lemmy.dbzer0.com 52 points 1 day ago (3 children)

Doesn't Anna's Archive already include a full backup of Sci-Hub and distribute it via Torrent and IPFS in addition to their website and the providers and mirrors they usually use for uploading?

[–] Imgonnatrythis@sh.itjust.works 32 points 1 day ago (1 children)

The Sci-Hub database stops in 2021. Big win for corporate publishers and wealthy scientists; they've had an edge since then. It's super important to have access to up-to-date resources. The database here seems to fill the gap. Merry Christmas to me!!

[–] Andromxda@lemmy.dbzer0.com 14 points 1 day ago (2 children)

Anna's Archive allows new uploads though. From their website:

We have the full Sci-Hub collection, as well as new papers.

https://annas-archive.org/scidb

[–] albert180@discuss.tchncs.de 19 points 1 day ago* (last edited 1 day ago)

Wrong. Anna's Archive doesn't accept uploads directly themselves (at least <10,000), and they recommend STC too.

To upload academic papers, please also (in addition to Library Genesis) upload to STC Nexus. They are the best shadow library for new papers.

https://annas-archive.org/faq

[–] Imgonnatrythis@sh.itjust.works 3 points 1 day ago

That's great! STC looks like a different effort. Variety is great on this front, as archiving these papers is a monumental task.

[–] hendrik@palaver.p3x.de 16 points 1 day ago* (last edited 1 day ago) (2 children)

I think they stopped endorsing IPFS. I can't find a good source right now. If you want to support Anna's Archive, you can help seed their torrents. They don't seem to have that much redundancy.

[–] doeknius_gloek@discuss.tchncs.de 15 points 1 day ago (2 children)

You're right.

We’ve decided that IPFS is not yet ready for prime time. We’ll still link to files on IPFS from Anna’s Archive when possible, but we won’t host it ourselves anymore, nor do we recommend others to mirror using IPFS. Please see our Torrents page if you want to help preserve our collection.

[–] eleitl@lemm.ee 4 points 16 hours ago

IPFS is not for bulk mirroring, it's for content delivery. IPFS works well enough if the content publishers and end users know what they're doing.

[–] itslilith@lemmy.blahaj.zone 5 points 1 day ago (2 children)

I'm curious, could anyone more knowledgeable about IPFS give an impression of the state of the protocol? It seems like a really interesting technology, but it also leans heavily into web3 and crypto bullshit. Is that reflective of the network, or just bad marketing?

[–] eleitl@lemm.ee 2 points 16 hours ago

You can use IPFS fine without any crypto bullshit.

[–] ComradeMiao@lemmy.dbzer0.com 7 points 1 day ago (2 children)

It seems like most big projects have dropped it. I remember reading that one of the big drawbacks was that it had one central node hosted via Cloudflare, and then they dropped it or something. Only half remembering. It sounds so cool! Sad it doesn’t work.

[–] eleitl@lemm.ee 2 points 16 hours ago (1 children)

IPFS is designed for decentralized pinning and decentralized use. You're supposed to run a local node or use browsers with built-in IPFS to access content. If you're using it wrong it will suck.

[–] ComradeMiao@lemmy.dbzer0.com 2 points 15 hours ago (1 children)

Yes of course you need to run the program but it all runs through a central node which barely works

[–] eleitl@lemm.ee 2 points 15 hours ago

Again, if you're using central anything with IPFS, you're using it wrong and you're getting the worst of both worlds.

[–] Natanox@discuss.tchncs.de 10 points 23 hours ago (1 children)

Can confirm. Meddled with it a little bit a while ago trying to productively use it to host Lutris installer files. It's an absolute mess; slow, unreliable, without proper documentation and a really bad default node application.

Also, it managed to get our server temporarily banned by the hosting provider, since the "sane default settings" include the node doing a whole sweep of your local subnet on all NICs, knocking at multiple ports of every device it can find. Because the expected environment of a node apparently is your home network… a default setting that has caused problems for many people for many years by now.

A project like the one in this post might benefit from looking at more modern/mature reimplementations of IPFS' concept, like Veilid (which would also offer additional features).

[–] TheMachineStops@discuss.tchncs.de 6 points 23 hours ago* (last edited 22 hours ago) (1 children)

Just looked up Veilid; it seems to be similar to I2P, but it is still in development and can't be used for now. Also, I agree that IPFS is horrible, and not just the setup: the developers themselves are against piracy. What is the point of a decentralised network that picks and chooses what it hosts? BitTorrent, Tor, Freenet, and I2P never did this as far as I know.

DMCA denylist: https://github.com/ipfs-inactive/faq/issues/36#issuecomment-140567411

[–] hendrik@palaver.p3x.de 3 points 18 hours ago (1 children)

I think you're all making it look a bit worse than it is. I downloaded a few PDFs via IPFS and it worked for me. And I was happy it provided me with what I needed at that time. I can't comment on reliability or other nuances. It also was slow in my case, but I took that as the usual trade-off. Usually, you either get speed or anonymity, not both. And there are valid use cases for denylists. For example viruses, malware, CSAM and spam. I'd rather not have my node spread those. It's complicated. And I also talk in public like that. I think what matters is what you do and implement, not whether you say you comply with regulation and the DMCA...

Thanks for the links, I'll have a look.

[–] TheMachineStops@discuss.tchncs.de 4 points 17 hours ago (2 children)

What are you talking about? IPFS isn't an anonymity network. It is similar to torrenting: everyone can see your IP.

[–] hendrik@palaver.p3x.de 2 points 16 hours ago* (last edited 16 hours ago) (1 children)

I thought it had that factored in. But yeah, if it's BitTorrent, just as a CDN, this isn't anonymous. I'll look it up.

[–] TheMachineStops@discuss.tchncs.de 6 points 15 hours ago

When I first heard of IPFS, I also thought it was anonymous, but I researched it and it is just like BitTorrent: everyone can see your IP. You have to use a VPN.

https://discuss.ipfs.tech/t/how-to-make-ipfs-node-ip-address-anonymous/12359/3

If you want an anonymous P2P network, try Freenet or I2P. There is also a new anonymous network currently being developed called Veilid, which seems promising.

[–] eleitl@lemm.ee 1 points 16 hours ago

So use an anonymizing network overlay if you want anonymity.

[–] Andromxda@lemmy.dbzer0.com 14 points 1 day ago (1 children)

I'm seeding around 10TB of Anna's Archive data

[–] Zoop@beehaw.org 3 points 14 hours ago

Thank you. 💖

[–] ComradeMiao@lemmy.dbzer0.com 2 points 1 day ago

The IPFS backups of that and libgen are dead but the torrents work!