FediDB has stopped crawling until they add robots.txt support
-
How is it air gapped and federated? Do you unairgap it periodically for a refresh then reairgap it? I’ve not heard of airgapped federated servers before and am intrigued. Is it purely for security purposes or also bandwidth savings? Are there other reasons one may want to run an air gapped instance?
-
I don't think that'll work. Asking for consent by retrieving the robots.txt is yet another request with a similar workload. By that logic, we couldn't do anything on the internet, since asking for consent is itself work that requires consent, which requires consent... And if you're concerned with efficiency alone, you'd cut the extra asking and complexity and just make the single request directly.
Plus, it's not even that complex. Sending a few bytes of JSON with daily precalculated numbers is a fraction of what a single user interaction does. It's maybe zero point something of a request, or with a lot more zeros in between if we look at what a server does each day. Every single refresh of the website, or me opening the app, loads several files and API endpoints, regularly pulls hundreds of kilobytes of JavaScript, images, etc. There are lots of calculations and database requests involved in displaying a few posts along with votes and so on. I'd say one single pageview of mine counts like FediDB collecting stats daily for 1000 years.
I invented those numbers. They're wrong. But I think you get what I'm trying to say... For all practical purposes, these requests are free and cost effectively nothing. And if efficiency were the concern, it's always better not to ask to ask, but to just do it and deal with it in the response. So it really can't be computational cost. It has to be about consent.
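For what it's worth, here's a rough sketch of what "asking first" amounts to for a crawler like this: one extra robots.txt fetch per host (cacheable for the day) before the single stats request. This is not FediDB's actual code; the user agent string and the nodeinfo-style endpoint path are assumptions for illustration.

```python
# Minimal sketch, not FediDB's implementation. Shows that consent checking is
# a single small extra request per host before the stats fetch.
import json
import urllib.robotparser
import urllib.request

USER_AGENT = "ExampleStatsCrawler/1.0"  # hypothetical user agent


def fetch_stats(host: str):
    # One small extra request per host; the parsed result can be cached daily.
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(f"https://{host}/robots.txt")
    robots.read()

    stats_url = f"https://{host}/nodeinfo/2.0"  # assumed stats endpoint
    if not robots.can_fetch(USER_AGENT, stats_url):
        return None  # host opted out; do nothing

    req = urllib.request.Request(stats_url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())  # a few bytes of precalculated JSON
```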
-
In this scenario, I have multiple servers which are networked together and federated via ActivityPub but the server cluster itself is air gapped.
As to your questions about feasibility and purposes, I will admit I both didn't think about that, and should have been more clear that this air gapped federated instance was theoretical lol
-
You're definitely right that I went a bit extreme with what I used as a reason against it, but I feel like the point still stands about "just ask before you slam people's servers with yet another bot on the pile of millions of bots hitting their F2B system"
-
It was a good read. Personally speaking, I think it probably would have been better to just block gotosocial until proper robots.txt support was in place; I found it weird that they paused the entire system.
-
It is not possible to detect bots. Attempting to do so will invariably lead to false positives, denying access to your content for what are usually the most at-risk and marginalized folks.
Just implement a cache and forget about it. If read-only content is causing you too much load, you're doing something terribly wrong.
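As a rough illustration of the "just cache it" approach, here's a minimal sketch using only the Python standard library; the handler, paths, TTL, and render function are all illustrative, not anyone's real setup.

```python
# Minimal sketch of caching read-only content in memory with a TTL,
# using only the standard library. Names and values are illustrative.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

CACHE_TTL = 60  # seconds a rendered page stays fresh
_cache: dict = {}  # path -> (timestamp, body)


def render_page(path: str) -> bytes:
    # Stand-in for the expensive part: database queries, templating, etc.
    return f"<html><body>rendered {path} at {time.time()}</body></html>".encode()


class CachedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        now = time.time()
        hit = _cache.get(self.path)
        if hit is None or now - hit[0] > CACHE_TTL:
            body = render_page(self.path)  # only do the work on a cache miss
            _cache[self.path] = (now, body)
        else:
            body = hit[1]  # repeat hits on the same URL cost almost nothing
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Also lets a CDN or browser cache the response downstream.
        self.send_header("Cache-Control", f"public, max-age={CACHE_TTL}")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), CachedHandler).serve_forever()
```

The Cache-Control header in the sketch is also what would let a CDN absorb repeat traffic instead of the origin server.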
-
Thank you for providing the link.
-
Then I’m not sure what point you were trying to make in the above conversation lol.
-
The point was "don't add another bot into the pile of millions of bots that hit people's servers every day unless you're gonna be polite about it"
-
I agree with you, but the number of robots has greatly increased of late. They're still not as numerous as users, but they hit every link and wreck your caches by not focusing on hotspots the way humans do.
-
You need a bigger cache. If you don't have enough RAM, host it on a CDN.
-
Sure thing! Help me pay for it?
-
That website feels like uncovering a piece of ancient alien weaponry.
-
False positives? Meh, who cares... that's what appeals are for. Real people should figure it out before too long.
-
Every time I tried to appeal, I either got no response or met someone who had no idea what I was talking about.
Occasionally a bank fixes the problem for a week. Then it's back again.