Saturday, January 04, 2025

Fixing LDAP auth for Postgres with AD

I spent hours trying to find out why I couldn't get Postgres LDAP auth to work against a Samba Active Directory server in one setup (it worked well against a Samba Active Directory server in another...).

I kept getting this in Postgres' logs:

2025-01-04 19:03:32.037 CET [58282] LOG:  could not search LDAP with scope 2 for filter "(sAMAccountName=troels)" on server "dcsrv.test": Operations error
2025-01-04 19:03:32.037 CET [58282] DETAIL:  LDAP diagnostics: 00002020: Operation unavailable without authentication

Adding this line in /etc/ldap/ldap.conf made things work:

REFERRALS off

In Red Hat derived Linux distributions, the path to ldap.conf is /etc/openldap/ldap.conf.
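
If the setting has to be rolled out to several hosts, a small shell sketch like the following could add it without duplicating the line on repeated runs. The function name ensure_referrals_off is made up for illustration, and the sketch does not try to handle a file which already contains a conflicting "REFERRALS on" line.

```shell
# Hypothetical helper (not part of OpenLDAP or Postgres): append
# "REFERRALS off" to the given ldap.conf unless such a line already exists.
ensure_referrals_off() {
    conf="$1"
    # Match the option case-insensitively, allowing surrounding whitespace.
    if grep -Eiq '^[[:space:]]*REFERRALS[[:space:]]+off[[:space:]]*$' "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'REFERRALS off\n' >> "$conf"
}

# Usage, as root:
#   ensure_referrals_off /etc/ldap/ldap.conf        # Debian-like systems
#   ensure_referrals_off /etc/openldap/ldap.conf    # Red Hat derived systems
```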

For why this makes a difference, search for "referrals" in the Python FAQ.

(I'm getting the impression it would be nice if "off" was the default for REFERRALS.)

Thursday, October 03, 2024

Antivirus software considered harmful

The IT profession needs to confront policies which demand antivirus be installed as a general measure

IT professionals still have in fresh memory the failure of CrowdStrike's Falcon endpoint protection product which caused a US$10 billion global IT breakdown. The incident has been extensively covered in news outlets, on YouTube channels, and in blog posts – and rightly so.

However, no-one seems to question the idea of having such software installed on servers in the first place. That has long been a mystery to me.

Lack of independent justification for AV

After seven years, the following well-written ITPro article is still relevant: Does antivirus software do more harm than good? In the article, several sources describe how antivirus (AV) software itself has security holes, and that AV may interfere with other security measures. The problem with AV software is that it often runs in close proximity to the guts of the underlying system, so vulnerabilities in AV software (which are not uncommon) can have extraordinarily nasty security implications, as well as stability implications, see CrowdStrike.

In other sectors like research and medicine, it's considered natural to ask "Is there any independent evidence supporting the net gain from spending time and money on X?". Even though the IT world spends huge amounts of resources on AV, I've yet to see studies which provide such evidence. Sure, there are reports demonstrating that a given product detects malware XYZ (though not necessarily malware ZYX). But where's a study which documents how that detection makes a system more secure? Bear in mind that a file with malware might not get a chance to infect, because it relies on specific user interaction which is unlikely to happen, or it may rely on operating system vulnerabilities which have been patched long ago. On the other hand, when the AV product sometimes introduces gaping security holes, then the overall gain may well be negative.

Down-to-earth evidence of AV's questionable role

The above may appear overly academic, so here are some down-to-earth observations:

  • Billions of Internet-connected devices run without AV software, and they generally work very well. Phone operating systems get software through well-understood, curated channels, and they have built-in robustness such as sandboxing and ASLR. So even though the smartphone ecosystem mainly consists of happy-go-lucky non-technical end-users, we have seen very little (if anything) akin to an ILOVEYOU pandemic, and we see very little ransomware on phones.

  • I did not have to clean up after the CrowdStrike incident. But in my professional life, I've lately had to deal with detrimental effects from another AV product: That AV product had suddenly decided to "eat" a DLL file from a perfectly valid piece of software which had been installed weeks before. As a consequence, a rather important service stopped working. Paradoxically, the impacted service was helping us conduct safe IT practices. It took time and money to get things back running, and it has happened several times. I'm sure many of you have also seen AV software constitute a source of arbitrary and unpredictable outages.


AV at odds with pillars of secure IT practice

Let's step back and review decade-old basic system administration principles, and how they are often broken by AV:

  • Install as little software as possible in order to limit attack surfaces – especially minimize software which runs with high privileges. AV breaks this principle, because it's perfectly possible to run a computer without an AV add-on (which millions of Macs and Linux servers, and probably also some Windows servers, do).

  • Keep your software up-to-date in order to close the holes which malware will try to exploit. I've seen several cases where AV would conflict with software updates from Microsoft, making this principle harder to follow.

  • Run software with the lowest possible privileges: All non-trivial software has bugs, so processes should be constrained as much as possible. I claim that many AV products run with excessively high privileges and/or in kernel space, breaking the principle. See this article about DLL search order hijacking for extra suspense.

  • Make software hard to infect, primarily by disallowing software to write to its own binaries, but also by using techniques like address space layout randomization (ASLR). The article from 2017 quotes a software developer who spent lots of time fighting tricks performed by AV which could conflict with the product's built-in robustness.

  • Prefer to have software be delivered via well known channels, such as yum repositories, app stores, etc. Many AV products introduce proprietary update services which see little scrutiny, and a product may even decide to replace itself with another product, see Kaspersky's recent surprise act. On top of that, CrowdStrike's update system is hardly the only one which made it impossible to roll out updates in non-production before applying them to production.

  • Stay away from software which shows signs of sloppy software engineering. Kudos to AV vendors for publishing CVE reports. But less so for what seems to be sloppiness: CrowdStrike has huge financial ballast, yet it failed to conduct basic testing steps, and it was not the first time in recent history. Other AV products also have questionable track records, but I've yet to hear such aspects being part of IT departments' decision making.

  • Absolutely remove abandoned or end-of-life software. Yet, AV software is often rather hard to remove, causing friction when trying to uphold the principle. I've seen many servers running AV software which was no longer being updated, after a vendor change long ago (in one case, because removal required down-time where the server had to be booted into safe mode, before the abandoned product could be removed).

  • Spend time wisely, because there is never time for it all. Time spent administering/maintaining AV software and battling AV-derived problems may result in less time spent on security enhancing activities like log analysis, systems/code cleanup, restore tests, AD hardening, etc.
    I've been in a meeting where a server break-in was discussed. The server had known vulnerabilities which would take time to address. Someone proposed installing AV on the server, buying time. If AV had been chosen, I'm rather sure it would have stolen much needed attention from getting the root cause handled, because it would have added a false sense of security.

On top of that, AV software is often seen incurring significant system overhead, adding latency and affecting end-user productivity. System overhead also requires extra CPU and RAM, which goes against green IT ambitions.

Culture and profit

Given all the technical arguments against AV, we may instead be dealing with a cultural issue: What is considered a given in one community is foreign to another. You will probably agree that in some parts of the IT landscape, it would be surprising to run across AV. For example:

  • Internet routers don't run AV, even though they are directly exposed to all sorts of traffic.
  • I claim most VMware administrators would strongly object, if someone requested AV be run directly on hypervisors.
  • Printers don't run AV (and are often not patched, but that's a separate, sad story).
  • Millions of compute nodes in high-performance compute environments deliver gazillions of compute hours, typically without having AV interfere.

The list goes on. And fortunately, many Linux and Mac hosts are still allowed to run without AV, although many overly-general IT policies are starting to force AV into those environments.

AV got started in the old days when Microsoft had not yet introduced proper kernel/userland and administrator/user separation into Windows, and when organizations had no good way to curb users from installing software from arbitrary sources. Fast-forward to 2024 where Windows systems can be very robust, but AV thinking prevails -- why? Inertia is not foreign to the IT business, of course: "Nobody Gets Fired For Buying AV", to adapt the old saying about IBM. Let me propose an additional reason: Some software vendors and certain security consultants have found AV to be a very profitable business, and they would naturally hate to see profits go down.

Time to start questioning

AV uselessness may appear to be a controversial proposition, and many will disagree with me. On the other hand, I also know of many system administrators and programmers who agree.

IT professionals, can we at least agree that it's our responsibility to start questioning AV as a general requirement? The compliance crowd will not have the guts to do it. So-called IT security consultants often profit from antivirus products, so they don't want to rock the boat.

It's on IT professionals' plate to address this.



PS 1

There may be situations where AV makes good sense to have. I argue that AV can make sense to have

  • on mail gateways, as part of general anti-junkmail efforts
  • on file shares which are shared by many users, depending on the environment
  • on hosts which – for some reason – can neither be regularly patched, nor brought onto a segregated network (but don't expect it to be very effective)


PS 2

While I generally object to the notion of buying security as an add-on product, I acknowledge that some products can provide real security. A product like the Nessus vulnerability scanner, for example, is a good addition to an IT organization's arsenal, especially because it can run in an unintrusive manner which neither installs anything on a server, nor tries to intercept all network traffic.

Wednesday, September 04, 2024

Find recently updated files

The following task comes up once in a while, but I keep forgetting how to do it. So I decided to finally write it down.

Find the ten youngest directory entries in the current directory and all subdirectories:

find . -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

Same task, but solely with regard to files (not directories):

find . -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

Note: In case no files/directories are found, the following error message may be returned: "stat: missing operand". 

Note also that this may not work on non-Linux operating systems: For example, I've seen it fail on an old FreeBSD server where the stat command behaves very differently from what I'm used to.
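
A possible way to avoid the "missing operand" message is to let xargs skip running stat when find produces no output. The following is only a sketch, assuming GNU findutils and coreutils (the -r option of xargs and the --format option of stat are GNU features); the function name newest_files and its two optional arguments are made up for illustration.

```shell
# Sketch, assuming GNU find, xargs and stat (i.e. a typical Linux system).
# Lists the most recently modified regular files below a directory.
newest_files() {
    dir="${1:-.}"       # directory to search, default: current directory
    count="${2:-10}"    # number of entries to show, default: 10
    find "$dir" -type f -print0 \
        | xargs -0 -r stat --format '%Y :%y %n' \
        | sort -nr \
        | cut -d: -f2- \
        | head -n "$count"
}

# Usage:
#   newest_files            # ten youngest files below the current directory
#   newest_files /var/log 3 # three youngest files below /var/log
```

With -r, xargs does not run stat at all when there is no input, so an empty directory simply yields no output.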


Saturday, October 07, 2023

(Un)safe English locales

I recently spent a frustrating amount of time troubleshooting a Java application which had stopped working properly after an upgrade: Certain features of the product were broken due to the application running on a Linux server which had been configured with the "en_DK" locale.

"en_DK" had been chosen expecting to have a system with messages in English, but with currency symbols etc. suitable for Denmark. This makes sense, because it's not uncommon for IT systems to provide more precise (error) messages in English, compared to a small language like Danish.

Unfortunately, there is not universal agreement about locales. Even within Linux distributions, there is not 100% consistency.

The GNU C Library (glibc) seems to have the longest list of recognized locales, including 19 English ones. Java's list of supported locales is somewhat shorter and includes only 11 English locales. Interestingly, Java has an English locale for Malta (en_MT) which glibc does not have.

I haven't been able to find macOS's list of locales, but forum posts suggest it has only 6 English locales: en_AU, en_CA, en_GB, en_IE, en_NZ, en_US.

Ignoring Mac for a moment, these are unsafe English locales, i.e. not supported by both glibc and Java:

Locale  Country
en_AG   Antigua and Barbuda
en_BW   Botswana
en_DK   Denmark
en_HK   Hong Kong
en_IL   Israel
en_MT   Malta
en_NG   Nigeria
en_SC   Seychelles
en_ZM   Zambia
en_ZW   Zimbabwe

On the other hand, the following locales should be safe:

Locale  Country        Safe even on Mac
en_AU   Australia      🍎
en_CA   Canada         🍎
en_GB   Great Britain  🍎
en_IE   Ireland        🍎
en_IN   India
en_NZ   New Zealand    🍎
en_PH   Philippines
en_SG   Singapore
en_US   USA            🍎
en_ZA   South Africa

Looking beyond unix/POSIX-like systems: Windows recognizes more than 100 English locale identifiers. Windows' list does not invalidate the above safe-list.
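
The safe list above could be turned into a quick check, for example in a deployment script. This is only a sketch: the function name is_safe_english_locale is made up, and the list is simply copied from the table above, so it is no more authoritative than that table.

```shell
# Hypothetical helper: succeed if the given locale name is on the
# safe list above (English locales known to both glibc and Java).
is_safe_english_locale() {
    # Strip a codeset or modifier suffix such as ".UTF-8" or "@euro".
    name="${1%%[.@]*}"
    case "$name" in
        en_AU|en_CA|en_GB|en_IE|en_IN|en_NZ|en_PH|en_SG|en_US|en_ZA)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Usage:
#   is_safe_english_locale "${LC_ALL:-${LANG:-C}}" \
#       || echo "warning: locale is not on the safe list" >&2
```

Note that the check only covers English locales, so names like da_DK or C are reported as not being on the list.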

For Danes, I suggest using one of the following locales:

  • da_DK (and sometimes accept poor error messages)
  • en_IE, as the Irish are sane enough to use a 24-hour time format, etc.
  • C, which is the fall-back "POSIX system default" locale

Tuesday, July 11, 2023

Unbootable RHEL 9 when using BP-028

When installing Red Hat Enterprise Linux 9, you may choose to apply a security profile, such as ANSSI-BP-028 High.

I've recently seen two VMware-virtualized RHEL 9.2 servers not being able to boot properly when installed with the ANSSI-BP-028 High profile. Instead, they booted into emergency mode.

The way to fix it:

Add file /etc/modules-load.d/for_uefi.conf containing a single line:

vfat 

Then run the following two commands:

dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
reboot
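
The file from the first step can also be created from the shell. The following is just a sketch of the steps above; the function name write_vfat_modules_load and its optional directory argument are made up for illustration (the argument mainly makes it possible to try the function against a scratch directory first).

```shell
# Hypothetical helper: write the one-line modules-load.d file described above.
write_vfat_modules_load() {
    dir="${1:-/etc/modules-load.d}"   # target directory, default as in the text
    printf 'vfat\n' > "$dir/for_uefi.conf"
}

# Usage on the affected server, as root (rebuilds the initramfs, then reboots):
#   write_vfat_modules_load
#   dracut -f "/boot/initramfs-$(uname -r).img" "$(uname -r)"
#   reboot
```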

Friday, March 21, 2014

parallel_ntp_scan

NTP-based DDoS attacks are fashionable, currently.

I've coded a little application which quickly scans a network for NTP servers. For those found, it rates them according to their susceptibility to being implicated in an amplification DDoS attack.

Saturday, October 01, 2011

What a dying SFP looks like

Fibre channel (FC) storage is handy, and generally very reliable, in my experience. I certainly do not miss the days of messing around with disks in a server-room. And I like the fact that RAIDs may be cut up into slices (LUNs) which may be shared by many servers, resulting in very efficient use of the disks (if so wanted).

One part about FC that I dislike (in addition to the price tags): SFPs. Why on earth are transceivers not an integral part of a Fibre Channel switch? Having the transceivers be separate units means more electrical contact points, and a potential support mess (it's not hard to imagine a situation where the support contract of an SFP has run out, while the switch itself is still covered).

Anyway: Today, I experienced a defunct SFP, for the first time. The following observations may give a hint of how to discover that an SFP is starting to malfunction. The setup is an IBM DS4800 storage system where port 2 on controller B is connected to port 0 on an IBM TotalStorage SAN32B FC switch (which is an IBM-branded Brocade 5100 switch).

Friday morning at 07:49, in syslog: A few messages like this from the FC switch:
raslogd: 2011/09/30-07:49:07, [SNMP-1008], 2113, WWN 10:... | FID 128, INFO, IBM_2005_B32_B,  The last device change happened at : Fri Sep 30 07:49:01 2011

At the same time the storage system started complaining about "Drive not on preferred path due to ADT/RDAC failover", meaning that at least one server had started using a non-optimal path, most likely due to I/O timeouts on the preferred path. And a first spike in the bad_os count occurred for the FC switch port:


bad_os is a counter which exists in Brocade switches, and possibly others. Brocade describes it as the number of invalid ordered sets received.

At 10:55, in syslog:
raslogd: 2011/09/30-10:55:02, [FW-1424], 2118, WWN 10:... | FID 128, WARNING, IBM_2005_B32_B, Switch status changed from HEALTHY to MARGINAL
At the same time, there was a slightly larger spike in the bad_os graph.
Coinciding: The storage system sent a mail warning about "Data rate negotiation failed" for the port.

At 17:00: The count for bit-traffic flat-lined (graph not shown). I.e.: All traffic had ceased.

At no point did the graphs for C3 discards, encoding errors or CRC errors show any spikes.

The next morning, the involved optical cable was switched; that didn't help. Inserting another SFP helped, leading to the conclusion that the old SFP had started to malfunction.

Moral: Make sure to not just keep spare cables around. A spare SFP should also be kept in stock.
And monitor your systems: A centralized and regularly inspected syslog is invaluable. Generating graphs for key counters is also mandatory for mature systems operation; one way to collect and display key counts for Brocade switches is to use Munin and a Munin plugin which I wrote.

PS: Brocade documentation states that SFP problems might result in the combination of rises in CRC errors and encoding/disparity errors. This did not happen in this situation.