Troels' blog

Sunday, June 08, 2025

Yak Shaving: So There I Am, Shaving a Yak…

An important term to know in the software development industry is yak shaving. Wiktionary defines the term properly, but its etymology section is odd, and it doesn't really convey why hairy cattle are being referred to.

On the web, there are many different anecdotes illustrating yak shaving. In my opinion, the following is the best one. It was posted by the late Bill Gaiennie in 2009 on a website which now only exists in the Wayback Machine. I think it deserves to exist on a live website, so here it is:

I simply wanted to snap some pictures of my dogs running around the park last weekend. As I was about to round up the dogs to head to the park, I went to grab my camera, but then realized that I had left it at work.
So I jumped in the car to run by my office to grab the camera, but realized that I didn’t have a key to get into the building. I knew that a co-worker always had a key, so I started to drive to his house, but then realized that the last time I borrowed something from him, an exotic suit, I never returned it because I had accidentally ripped a pretty big hole in the jacket. I knew that my co-worker wouldn’t trust me with the office key if I didn’t return the suit, sans hole, but I had not been able to get it repaired because it was made out of yak hair, something I wasn’t even sure I could get.
I contacted a tailor who would be able to repair the suit, but wouldn’t be able to do it unless I was able to provide a supply of yak hair so that she could weave it into a fabric, and then from there use the yak fabric to repair the yak suit jacket….

So there I was at the zoo, shaving a yak, all so I could take a few pictures of my dogs at the park.

Sunday, March 16, 2025

Disable Fedora's Postgres packages when using PGDG packages

In a poll in the Copenhagen Postgres User Group, I asked folks which software source they use when installing Postgres. For some strange reason, the poll is gone from LinkedIn, but I'm rather sure the majority uses packages from postgresql.org, i.e. the "PGDG" packages.

A word of caution for those who use PGDG packages on cutting-edge distributions like Fedora: Your distribution may sometimes get ahead of PGDG, and then a software update will install your distribution's Postgres. This may wreak havoc, because your distribution's Postgres will have data in a different directory than PGDG's. Consequently:

On RPM-based systems, you should add a line like this in /etc/yum.repos.d/fedora.repo and /etc/yum.repos.d/fedora-updates.repo:

exclude=postgresql* postgis*

That way there's no ambiguity about which flavor of Postgres packages your system tracks.
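The edit can also be scripted. The following is a minimal sketch, not a tested procedure: add_pg_exclude is a hypothetical helper name, it relies on GNU sed, and it assumes that putting the line into every section of the file (including the debuginfo and source sections) is acceptable.

```shell
#!/bin/sh
# Hypothetical helper: add the exclude line from this post to a repo file.
add_pg_exclude() {
    repo_file=$1
    # Leave files alone which already carry an exclude= setting.
    if grep -q '^exclude=' "$repo_file"; then
        echo "$repo_file already has an exclude= line; edit it by hand" >&2
        return 1
    fi
    # GNU sed: append the line after every "[section]" header.
    sed -i '/^\[.*\]$/a exclude=postgresql* postgis*' "$repo_file"
}
```

Run it as root, once for each of the two repo files mentioned above, and review the result before the next software update.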

Sunday, March 02, 2025

Overcoming stubborn jstack

Situation: Need to peek into what the different threads of a Java process are doing. But when running jstack, something like the following happens after a little while:

target process 1985458 doesn't respond within 10500ms or HotSpot VM not loaded

You could try adding the "-F" flag to jstack, but often it will not work, because of jstack's timeout.

One way to handle this is to create a core dump of the Java process and then run "jhsdb jstack ..." on the resulting core file. Since jhsdb does not have a timeout, this makes a difference.

To generate the core file on Linux, you first need to install gdb on the server, if it's not already around. Then, assuming the Java process ID is 54321:

gdb -p 54321

In the gdb shell, run:

generate-core-file

You should now have a file "core.54321".
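The same steps can probably be run without an interactive gdb session. A minimal sketch, assuming gdb's --batch, -p and -ex options behave as documented; dump_core and the GDB override variable are hypothetical names introduced here, and the detach-on-exit behaviour should be verified on a non-critical process first.

```shell
#!/bin/sh
# Hypothetical helper: write core.<pid> for a running process in one go.
dump_core() {
    pid=$1
    # GDB may be overridden (e.g. GDB=echo for a dry run); defaults to "gdb".
    # --batch makes gdb exit once the -ex command has run.
    "${GDB:-gdb}" --batch -p "$pid" -ex "generate-core-file core.$pid"
}
```

For example, "GDB=echo dump_core 54321" prints the gdb arguments without touching the process, and "dump_core 54321" performs the actual dump.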

Things which may cause trouble for core file generation:

When running gdb, your current working directory is on a file system which is too small, or you don't have write permissions.
Your login session has limits on the size of core files. For this, run "ulimit -c unlimited".
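The first pitfall can be checked up front. A minimal sketch, assuming a POSIX df; has_room is a hypothetical helper name, and the size to ask for is only an estimate.

```shell
#!/bin/sh
# Hypothetical helper: is DIR writable, and does it have NEED_KB kilobytes free?
has_room() {
    dir=$1
    need_kb=$2
    [ -w "$dir" ] || { echo "$dir is not writable" >&2; return 1; }
    # df -P guarantees one line per file system; column 4 is available KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ] || {
        echo "$dir has only $avail_kb KB free, need $need_kb KB" >&2
        return 1
    }
}
```

On Linux, the VmRSS value (in kB) from /proc/54321/status can serve as a rough lower bound for the size, e.g. has_room . "$(awk '/^VmRSS/ { print $2 }' /proc/54321/status)". The core file may turn out larger than that.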

My experience is that core file generation will not cause the process to fail (if you answer yes to leave the process running when exiting gdb), but the process will freeze while the core file is being created.

Now, having produced a core file, analyze it without a timeout (it can take a long time, depending on the size of the Java process). One long command line:

jhsdb jstack --exe /usr/lib/jvm/java-17-openjdk-amd64/bin/java --core core.54321 > output.txt

Of course, you may need to adjust the path to the Java which was used by process 54321.
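While the process is still running, Linux can tell which Java binary it uses. A minimal sketch, assuming a Linux /proc file system and permission to inspect the process; exe_of_pid is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the resolved path of the binary behind a PID.
exe_of_pid() {
    # /proc/<pid>/exe is a symlink to the executable of the process.
    readlink -f "/proc/$1/exe"
}
```

The result can be passed on directly, e.g. jhsdb jstack --exe "$(exe_of_pid 54321)" --core core.54321 > output.txt. If the Java installation was upgraded after the process started, the link may point to a deleted file, so check the printed path before relying on it.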

By the way, you may re-use the core file to get a memory dump (one long line):

jhsdb jmap --exe /usr/lib/jvm/java-17-openjdk-amd64/bin/java --core core.54321 > 54321-jmap.txt

Saturday, January 04, 2025

Fixing LDAP auth for Postgres with AD

I've spent hours trying to find out why I couldn't get Postgres LDAP auth to work against a Samba active directory server in one setup (it worked well against a Samba active directory in another...).

I kept getting this in Postgres' logs:

2025-01-04 19:03:32.037 CET [58282] LOG: could not search LDAP with scope 2 for filter "(sAMAccountName=troels)" on server "dcsrv.test": Operations error
2025-01-04 19:03:32.037 CET [58282] DETAIL: LDAP diagnostics: 00002020: Operation unavailable without authentication

Adding this line in /etc/ldap/ldap.conf made things work:

REFERRALS off

In Red Hat derived Linux distributions, the path to ldap.conf is /etc/openldap/ldap.conf.
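To check whether a given host already has the setting, something like the following may help. It is a naive sketch: referrals_off is a hypothetical helper name, it only looks at the one file it is given (not at per-user files or environment overrides), and it assumes the boolean spellings off, false and no listed in the ldap.conf manual page.

```shell
#!/bin/sh
# Hypothetical helper: does the given ldap.conf switch referral chasing off?
referrals_off() {
    # Option names are matched case-insensitively.
    grep -Eiq '^[[:space:]]*REFERRALS[[:space:]]+(off|false|no)[[:space:]]*$' "$1"
}
```

Usage: referrals_off /etc/ldap/ldap.conf || echo "REFERRALS off is not set".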

For why this makes a difference, search for "referrals" in the Python FAQ.

(I'm getting the impression it would be nice if "off" was the default for REFERRALS.)

Thursday, October 03, 2024

Antivirus software considered harmful

The IT profession needs to confront policies which demand antivirus be installed as a general measure

IT professionals still have in fresh memory the failure of CrowdStrike's Falcon endpoint protection product which caused a US$10 billion global IT breakdown. The incident has been extensively covered in news outlets, on YouTube channels, and in blog posts – and rightly so.

However, no-one seems to question the idea of having such software installed on servers in the first place. That has long been a mystery to me.

Lack of independent justification for AV

After seven years, the following well-written ITPro article is still relevant: Does antivirus software do more harm than good? In the article, several sources describe how antivirus (AV) software itself has security holes, and that AV may interfere with other security measures. The problem with AV software is that it often runs in close proximity to the guts of the underlying system, so vulnerabilities in AV software (which are not uncommon) can have extraordinarily nasty security implications, as well as stability implications, see CrowdStrike.

In other sectors like research and medicine, it's considered natural to ask "Is there any independent evidence supporting the net gain from spending time and money on X?". Even though the IT world spends huge amounts of resources on AV, I've yet to see studies which provide such evidence. Sure, there are reports demonstrating that a given product detects malware XYZ (though not necessarily malware ZYX). But where's a study which documents how that detection makes a system more secure? Bear in mind that a file with malware might not get a chance to infect, because it relies on specific user interaction which is unlikely to happen, or it may rely on operating system vulnerabilities which have been patched long ago. On the other hand, when the AV product sometimes introduces gaping security holes, then the overall gain may well be negative.

Down-to-earth evidence of AV's questionable role

The above may appear overly academic, so here are some down-to-earth observations:

Billions of Internet-connected devices run without AV software, and it generally works very well. Phone operating systems get software through well-understood, curated channels, and they have built-in robustness such as sandboxing and ASLR. So even though the smartphone ecosystem mainly consists of happy-go-lucky non-technical end-users, we have seen very little (if any) akin to an ILOVEYOU pandemic, and we see very little ransomware on phones.
I did not have to clean up after the CrowdStrike incident. But in my professional life, I've lately had to deal with detrimental effects from another AV product: That AV product had suddenly decided to "eat" a DLL file from a perfectly valid piece of software which had been installed weeks before. As a consequence, a rather important service stopped working. Paradoxically, the impacted service was helping us conduct safe IT practices. It took time and money to get things back running, and it has happened several times. I'm sure many of you have also seen AV software constitute a source of arbitrary and unpredictable outages.

AV at odds with pillars of secure IT practice

Let's step back and review decade-old basic system administration principles, and how they are often broken by AV:

Install as little software as possible in order to limit attack surfaces – especially minimize software which runs with high privileges. AV breaks this principle, because it's perfectly possible to run a computer without an AV add-on (which millions of Macs and Linux servers, and probably also some Windows servers, do).
Keep your software up-to-date in order to close the holes which malware will try to exploit. I've seen several cases where AV would conflict with software updates from Microsoft, making this principle harder to follow.
Run software with the lowest possible privileges: All non-trivial software has bugs, so processes should be constrained as much as possible. I claim that many AV products run with excessively high privileges and/or in kernel space, breaking the principle. See this article about DLL search order hijacking for extra suspense.
Make software hard to infect, primarily by disallowing software to write to its own binaries, but also by using techniques like address space layout randomization (ASLR). The article from 2017 quotes a software developer who spent lots of time fighting tricks performed by AV which could conflict with the product's built-in robustness.
Prefer to have software be delivered via well known channels, such as yum repositories, app stores, etc. Many AV products introduce proprietary update services which see little scrutiny, and a product may even decide to replace itself with another product, see Kaspersky's recent surprise act. On top of that, CrowdStrike's update system is hardly the only one which made it impossible to roll out updates in non-production before applying them to production.
Stay away from software which shows signs of sloppy software engineering. Kudos to AV vendors for publishing CVE reports. But less so for what seems to be sloppiness: CrowdStrike has huge financial ballast, yet it failed to conduct basic testing steps, and it was not the first time in recent history. Other AV products also have questionable track records, but I've yet to hear such aspects being part of IT departments' decision making.
Absolutely remove abandoned or end-of-life software. Yet, AV software is often rather hard to remove, causing friction when trying to uphold the principle. I've seen many servers running AV software which was no longer being updated, after a vendor change long ago (in one case, because removal required down-time where the server had to be booted into safe mode, before the abandoned product could be removed).
Spend time wisely, because there is never time for it all. Time spent administering/maintaining AV software and battling AV-derived problems may result in less time spent on security enhancing activities like log analysis, systems/code cleanup, restore tests, AD hardening, etc.
I've been in a meeting where a server break-in was discussed. The server had known vulnerabilities which would take time to address. Someone proposed installing AV on the server to buy time. If AV had been chosen, I'm rather sure it would have stolen much-needed attention from getting the root cause handled, because it would have added a false sense of security.

On top of that, AV software is often seen incurring significant system overhead, adding latency and affecting end-user productivity. System overhead also requires extra CPU and RAM, which goes against green IT ambitions.

Culture and profit

Given all the technical arguments against AV, we may instead be dealing with a cultural issue: What is considered a given in one community is foreign to another. You will probably agree that in some parts of the IT landscape, it would be surprising to run across AV. For example:

Internet routers don't run AV, even though they are directly exposed to all sorts of traffic.
I claim most VMware administrators would strongly object, if someone requested AV be run directly on hypervisors.
Printers don't run AV (and are often not patched, but that's a separate, sad story).
Millions of compute nodes in high-performance compute environments deliver gazillions of compute hours, typically without having AV interfere.

The list goes on. And fortunately, many Linux and Mac hosts are still allowed to run without AV, although many overly-general IT policies are starting to force AV into those environments.

AV got started in the old days when Microsoft had not yet introduced proper kernel/userland and administrator/user separation into Windows, and where organizations had no good way to curb users from installing software from arbitrary sources. Fast-forward to 2024 where Windows systems can be very robust, but AV thinking prevails -- why? Inertia is not foreign to the IT business, of course: "Nobody Gets Fired For Buying ~~IBM~~AV". Let me propose an additional reason: Some software vendors and certain security consultants have found AV to be a very profitable business, and they would naturally hate to see profits go down.

Time to start questioning

AV uselessness may appear to be a controversial proposition, and many will disagree with me. On the other hand, I also know of many system administrators and programmers who agree.

IT professionals, can we at least agree that it's our responsibility to start questioning AV as a general requirement? The compliance crowd will not have the guts to do it. So-called IT security consultants often profit from antivirus products, so they don't want to rock the boat.

It's on IT professionals' plate to address this.

PS 1

There may be situations where AV makes good sense to have. I argue that AV can make sense to have:

on mail gateways, as part of general anti-junkmail efforts
on file shares which are shared by many users, depending on the environment
on hosts which – for some reason – can neither be regularly patched, nor brought onto a segregated network (but don't expect it to be very effective)

PS 2

While I generally object to the notion of buying security as an add-on product, I acknowledge that some products can provide real security. A product like Nessus' vulnerability scanner, for example, is a good addition to an IT organization's arsenal, especially because it can run in an unintrusive manner which neither installs anything on a server, nor tries to intercept all network traffic.

Wednesday, September 04, 2024

Find recently updated files

The following task comes up once in a while, but I keep forgetting how to do it. So I decided to finally write it down.

Find the ten youngest directory entries in the current directory and all subdirectories:

find . -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

Same task, but solely with regards to files (not directories):

find . -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

Note: In case no files/directories are found, the following error message may be returned: "stat: missing operand".

Note also that this may not work on non-Linux operating systems: For example, I've seen it fail on an old FreeBSD server where the stat command behaves very differently from what I'm used to.
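Where GNU find is available, its -printf action may be an alternative which avoids both xargs and stat, and thereby also the "missing operand" message when nothing is found. A minimal sketch, assuming GNU findutils; newest is a hypothetical function name, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list the ten most recently modified files below a
# directory (default: current directory), newest first.
newest() {
    # %T@ = modification time in seconds since the epoch (used for sorting),
    # followed by a readable timestamp and the path.
    find "${1:-.}" -type f -printf '%T@ %TY-%Tm-%Td %TH:%TM %p\n' |
        sort -nr | head -n 10 | cut -d' ' -f2-
}
```

Drop "-type f" to include directories as well.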

Saturday, October 07, 2023

(Un)safe English locales

I recently spent a frustrating amount of time troubleshooting a Java application which had stopped working properly after an upgrade: Certain features of the product were broken due to the application running on a Linux server which had been configured with the "en_DK" locale.

"en_DK" had been chosen expecting to have a system with messages in English, but with currency symbols etc. suitable for Denmark. This makes sense, because it's not uncommon for IT systems to provide more precise (error) messages in English, compared to a small language like Danish.

Unfortunately, there is not universal agreement about locales. Even within Linux distributions, there is not 100% consistency.

The GNU C Library (glibc) seems to have the longest list of recognized locales, including 19 English ones. Java's list of supported locales is somewhat shorter and includes only 11 English locales. Interestingly, Java has an English locale for Malta (en_MT) which glibc does not have.

I haven't been able to find macOS's list of locales, but forum posts suggest it has only 6 English locales: en_AU, en_CA, en_GB, en_IE, en_NZ, en_US.

Ignoring Mac for a moment, these are unsafe English locales, i.e. not supported by both glibc and Java:

Locale	Country
en_AG	Antigua and Barbuda
en_BW	Botswana
en_DK	Denmark
en_HK	Hong Kong
en_IL	Israel
en_MT	Malta
en_NG	Nigeria
en_SC	Seychelles
en_ZM	Zambia
en_ZW	Zimbabwe

On the other hand, the following locales should be safe:

Locale	Country	Safe even on Mac
en_AU	Australia	🍎
en_CA	Canada	🍎
en_GB	Great Britain	🍎
en_IE	Ireland	🍎
en_IN	India
en_NZ	New Zealand	🍎
en_PH	Philippines
en_SG	Singapore
en_US	USA	🍎
en_ZA	South Africa
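The safe-list above can be turned into a small check, for instance in a provisioning script. A minimal sketch: is_safe_en_locale is a hypothetical function name, and the list is simply copied from the table above, so it only knows about English locales.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the locale is on the safe-list above.
is_safe_en_locale() {
    # Strip codeset and modifier: "en_IE.UTF-8" or "en_IE@euro" -> "en_IE".
    name=${1%%[.@]*}
    case $name in
        en_AU|en_CA|en_GB|en_IE|en_IN|en_NZ|en_PH|en_SG|en_US|en_ZA) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: is_safe_en_locale "$LANG" || echo "locale $LANG is not on the safe-list". On glibc systems, "locale -a" shows which locales are actually installed.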

Looking beyond Unix/POSIX-like systems: Windows recognizes more than 100 English locale identifiers. Windows' list does not invalidate the above safe-list.

For Danes, I suggest using one of the following locales:

da_DK (and sometimes accept poor error messages)
en_IE, as the Irish are sane enough to use a 24-hour time format, etc.
C, which is the fall-back "POSIX system default" locale