

I’ve been thinking about setting up Anubis to protect my blog from AI scrapers, but I’m not clear on whether this would also block search engines. It would, wouldn’t it?
I use them quite heavily in combination with Cookie AutoDelete. I then create a separate profile for each surveillance capitalist service I work with. So for example, here’s my list of containers:
Every time I visit one of these sites, Firefox opens them in the respective container, and the cookies they create are isolated to that container. When I’m in the LinkedIn container, Cookie AutoDelete nukes every cookie that isn’t from LinkedIn (including Google, GitHub, etc.). When I’m not in any container, all cookies are deleted everywhere.
Basically it’s a nice way to leverage Cookie AutoDelete without having to whitelist Big Tech for all my browsing.
I don’t think there’s an official “way”, but here’s mine (which I love):
On start-up I open all the apps I usually use, one per designated workspace:
Workspaces 6-9 are left empty, ready for whatever app I need in the moment, but only ever one app per workspace.
With this setup, I’ve mapped Ctrl+Fx
to each workspace, so Ctrl+F4
takes me to PyCharm where I write the code, and Ctrl+F5
followed by another F5 takes me to Firefox and reloads the page. Ctrl+F3
is always the terminal, etc., so you quickly start building these shortcuts to mean Fwhatever is $APP_NAME.
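The actual wiring depends on your desktop of course; under GNOME, for example, a sketch like this binds Ctrl+F1 through Ctrl+F9 to workspaces 1-9:

for i in $(seq 1 9); do
    gsettings set org.gnome.desktop.wm.keybindings "switch-to-workspace-${i}" "['<Control>F${i}']"
done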
I almost never use the mouse, unless what I’m doing is necessarily mouse-driven: browsing or drawing charts etc. Everything else is keyboard-driven.
I have a few interesting ones.
Download a video:
alias yt="yt-dlp -o '%(title)s-%(id)s.%(ext)s' "
Execute the previous command as root:
alias please='sudo $(fc -n -l -1)'
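So a session looks something like this (hypothetical command):

systemctl restart nginx    # fails: permission denied
please                     # re-runs it as: sudo systemctl restart nginx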
Delete all the Docker things. I do this surprisingly often:
alias docker-nuke="docker system prune --all --volumes --force"
This is a handy one for detecting a hard link:

function is-hardlink {
    # %h is the file's hard link count
    count=$(stat -c %h -- "${1}")
    if [ "${count}" -gt 1 ]; then
        echo "Yes. There are ${count} links to this file."
    else
        echo "Nope. This file is unique."
    fi
}
I run this one pretty much every day. Regardless of the distro I’m using, it Updates All The Things:
function up {
    # Arch-based distros: yay wraps pacman and covers the AUR too
    if [[ $(command -v yay) ]]; then
        yay -Syu --noconfirm
        yay -Yc --noconfirm  # clean out unneeded dependencies
    # Debian-based distros
    elif [[ $(command -v apt) ]]; then
        sudo apt update
        sudo apt upgrade -y
        sudo apt autoremove -y
    fi
    # Flatpak is distro-agnostic, so always update those too
    flatpak update --assumeyes
    flatpak remove --unused --assumeyes
}
I maintain an aliases file in GitLab with all the stuff I have in my environment if anyone is curious.
I have much the same:
The only difference is that I’m using a Synology 'cause I have 15TB and don’t know how to do RAID myself, let alone how to do it with an old laptop. I can’t really recommend a Synology though. It’s got too many useless add-ons and simple tools like rsync never work properly with it.
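For what it’s worth, the mdadm incantation for a basic two-disk mirror is apparently not much more than this (device names are placeholders, and this wipes them):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mkfs.ext4 /dev/md0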
Yeah this was a deal-breaker for me too.
TIL about using lsblk instead of just reading through the output of journalctl to find the disk and partitions. Thanks!
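For reference, something like this prints a tidy tree of every disk and partition (the columns are just a personal preference):

lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT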
That’s not been my experience. It may be using a web view under the hood, but the functionality is quite different: additional features, breaking the video call out of the primary pane, etc. To suggest that they’re essentially the same is not accurate.
Really? All I’ve seen is a Flatpak that’s really just a wrapped web view. Is there now a native version of Teams for Linux?
Yes. Tailscale is surprisingly simple.
# systemctl start tailscaled
# tailscale up
Lowering the barrier to entry by moving from a technology few use (Mercurial) to something popular (Git) makes sense. Requiring participation on a proprietary platform owned by Microsoft instead of an open one like Codeberg or GitLab is just lazy. If someone wants to contribute to Firefox, asking them to create an account is a small ask, and I’d argue that if they’re unwilling to do even that, then their participation in the community is likely to be far from useful.
They could have opted for Codeberg for example and made a public donation to the project of a few hundred dollars a month. Instead, they opted for funnelling more power and support into a terrible company.
This is what I get for posting at 1am. Thanks for the clarification. Yeah I just assumed it was the same situation as coreutils.
Granted, sudo isn’t in coreutils, but it’s sufficiently standard that I’d argue that the licence is very relevant to the wider Linux community.
Anyway, I answered this at length the last time this subject came up here, but the TL;DR is that private companies (like Canonical, who owns Ubuntu) love the MIT license because it allows them to take the code and make proprietary versions of it without having to release the source code. Consider the implications of a sudo binary that’s Built For Ubuntu™ with closed-source proprietary hooks into Canonical’s cloud auth provider. It’s death by a thousand MIT-licensed cuts to our once Free operating system.
Is it GPL though? If this is a case of MIT-licensed stuff weaseling its way into Linux core utils, I’m not interested.
The version of Firefox that ships with Debian is quite old, if I recall correctly. You might want to try installing it either as a Flatpak or via Mozilla’s own apt repo directly, to see if that solves it.
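If you go the Flatpak route, it should just be:

flatpak install flathub org.mozilla.firefox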
I mean, you can buy it and use it in a general purpose fashion, and yeah, those cores would do wonders for all sorts of compiles. Also, it can be useful if you’re like me and do a lot of Dockerised development. Given that most games are x86 only though, sadly this would be no good :-(
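For example, with buildx (image name is a placeholder), one Dockerfile can produce images for both architectures:

docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest .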
The Ampere Altra runs from 32 to 128 cores (dear gods that’s beautiful), but with that architecture, and the company’s stated purpose, it makes more sense in a computer meant to be used as a server rather than a desktop gaming rig. You’d use a chip like that in a Kubernetes cluster for example.
Combined with an Nvidia card, a brand notorious for being a Pain In The Ass in Linuxland, I’m going to go out on a limb here and suggest that the intended purpose of a box like this is a server for AI/ML-based services.
This all appears to be based on the user agent, so wouldn’t that mean that bad-faith scrapers could just declare themselves to be a typical search engine’s user agent?
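For illustration, claiming to be Googlebot is a one-liner (the URL is a placeholder):

curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/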