
Cleaning a broken GnuPG (gpg) key

IT

I've long said that the main tools in the Open Source security space, OpenSSL and GnuPG (gpg), are broken and only a complete re-write will solve this. And that is still pending as nobody came forward with the funding. It's not a sexy topic, so it has to get really bad before it'll get better.

Gpg has a UI that is close to useless. That won't substantially change with more bolted-on improvements.

Now Robert J. Hansen and Daniel Kahn Gillmor had somebody add ~50k signatures (read 1, 2, 3, 4 for the g{l}ory details) to their keys and - oops - they say that breaks gpg.

But does it?

I downloaded Robert J. Hansen's key off the SKS-Keyserver network. It's a nice 45MB file when de-ascii-armored (gpg --dearmor broken_key.asc ; mv broken_key.asc.gpg broken_key.gpg).

Now a friendly:

$ /usr/bin/time -v gpg --no-default-keyring --keyring ./broken_key.gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit

pub  rsa3072/0x1DCBDC01B44427C7
     created: 2015-07-16  expires: never     usage: SC
     trust: unknown     validity: unknown
sub  ed25519/0xA83CAE94D3DC3873
     created: 2017-04-05  expires: never     usage: S
sub  cv25519/0xAA24CC81B8AED08B
     created: 2017-04-05  expires: never     usage: E
sub  rsa3072/0xDC0F82625FA6AADE
     created: 2015-07-16  expires: never     usage: E
[ unknown ] (1). Robert J. Hansen <rjh@sixdemonbag.org>
[ unknown ] (2)  Robert J. Hansen <rob@enigmail.net>
[ unknown ] (3)  Robert J. Hansen <rob@hansen.engineering>

User ID "Robert J. Hansen <rjh@sixdemonbag.org>": 49705 signatures removed
User ID "Robert J. Hansen <rob@enigmail.net>": 49704 signatures removed
User ID "Robert J. Hansen <rob@hansen.engineering>": 49701 signatures removed

pub  rsa3072/0x1DCBDC01B44427C7
     created: 2015-07-16  expires: never     usage: SC
     trust: unknown     validity: unknown
sub  ed25519/0xA83CAE94D3DC3873
     created: 2017-04-05  expires: never     usage: S
sub  cv25519/0xAA24CC81B8AED08B
     created: 2017-04-05  expires: never     usage: E
sub  rsa3072/0xDC0F82625FA6AADE
     created: 2015-07-16  expires: never     usage: E
[ unknown ] (1). Robert J. Hansen <rjh@sixdemonbag.org>
[ unknown ] (2)  Robert J. Hansen <rob@enigmail.net>
[ unknown ] (3)  Robert J. Hansen <rob@hansen.engineering>

        Command being timed: "gpg --no-default-keyring --keyring ./broken_key.gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit"
        User time (seconds): 3911.14
        System time (seconds): 2442.87
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:45:56
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 107660
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 26630
        Voluntary context switches: 43
        Involuntary context switches: 59439
        Swaps: 0
        File system inputs: 112
        File system outputs: 48
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
 

And the result is a nicely usable 3835 byte file containing the clean public key. If you supply a real keyring instead of --no-default-keyring, gpg will also keep the non-self signatures that are useful to you (as you apparently know the signing party).
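To double-check the result, a quick listing should now show only the remaining signatures (a minimal sanity check against the cleaned keyring from above):

$ gpg --no-default-keyring --keyring ./broken_key.gpg --list-sigs 0x1DCBDC01B44427C7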

So it does not break gpg. It does break things that call gpg at runtime and not asynchronously. I heard Enigmail is affected, quelle surprise.

Now the main problem here is the runtime. 1h45min is just ridiculous. As Filippo Valsorda puts it:

Someone added a few thousand entries to a list that lets anyone append to it. GnuPG, software supposed to defeat state actors, suddenly takes minutes to process entries. How big is that list you ask? 17 MiB. Not GiB, 17 MiB. Like a large picture. https://dev.gnupg.org/T4592

If I were a gpg / SKS keyserver developer, I'd

  • speed this up so the edit-key run above completes in less than 10 s (just getting rid of the lseek/read dance and deferring all time-based decisions should get close)
  • (ideally) make the drop-sig import-filter syntax useful (date-ranges, non-reciprocal signatures, ...); a sketch of what the current filter can already do follows below
  • clean affected keys on the SKS keyservers (needs coordination of sysops, drop servers from unreachable people)
  • (ideally) use the opportunity to also clean other oversized or abusive keys from the keyservers' filesystems, too
  • only accept new keys and new signatures on keys extending the strong set (rather small change to the existing codebase)

That way another key can only be added to the keyserver network if it contains at least one signature from a previously known strong-set key. Attacking the keyserver network would become at least non-trivial. And the web-of-trust thing may make sense again.
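For reference, gpg already ships a rudimentary drop-sig import-filter. A minimal sketch of what it can do today - the date-based expression is my assumption, so check gpg(1) for the exact drop-sig properties and operators your version supports:

# import while dropping third-party signatures created after a cutoff date
gpg --import-options import-clean \
    --import-filter 'drop-sig=sig_created_d > 2019-06-01' \
    --import broken_key.asc

What is missing is exactly the date-range and non-reciprocal-signature expressiveness wished for above.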

Updates

09.07.2019

GnuPG 2.2.17 has been released with another set of quickly bolted together fixes:

  * gpg: Ignore all key-signatures received from keyservers.  This
    change is required to mitigate a DoS due to keys flooded with
    faked key-signatures.  The old behaviour can be achieved by adding
    keyserver-options no-self-sigs-only,no-import-clean
    to your gpg.conf.  [#4607]
  * gpg: If an imported keyblocks is too large to be stored in the
    keybox (pubring.kbx) do not error out but fallback to an import
    using the options "self-sigs-only,import-clean".  [#4591]
  * gpg: New command --locate-external-key which can be used to
    refresh keys from the Web Key Directory or via other methods
    configured with --auto-key-locate.
  * gpg: New import option "self-sigs-only".
  * gpg: In --auto-key-retrieve prefer WKD over keyservers.  [#4595]
  * dirmngr: Support the "openpgpkey" subdomain feature from
    draft-koch-openpgp-webkey-service-07. [#4590].
  * dirmngr: Add an exception for the "openpgpkey" subdomain to the
    CSRF protection.  [#4603]
  * dirmngr: Fix endless loop due to http errors 503 and 504.  [#4600]
  * dirmngr: Fix TLS bug during redirection of HKP requests.  [#4566]
  * gpgconf: Fix a race condition when killing components.  [#4577]

Bug T4607 shows that these changes are anything but well thought-out. They introduce artificial limits, like 64kB for WKD-distributed keys or 5MB for local signature imports (Bug T4591), which weaken the web-of-trust further.

I recommend not running gpg 2.2.17 in production environments without extensive testing as these limits and the unverified network traffic may bite you. Do validate your upgrade with valid and broken keys that have segments (packet groups) surpassing the above-mentioned limits. You may be surprised what gpg does. On the upside: you can now refresh keys (sans signatures) via WKD. So if your buddies still believe in limiting their subkey validities, you can more easily update them bypassing the SKS keyserver network. NB: I have not tested that functionality. So test before deploying.
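If you want to exercise the new code paths explicitly during such testing, the new import options quoted above can also be passed by hand (a sketch; untested, as said):

gpg --import-options self-sigs-only,import-clean --import some_key.asc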

10.08.2019

Christopher Wellons (skeeto) has released his pgp-poisoner tool. It is a Go program that can add thousands of malicious signatures to a GnuPG key per second. He comments "[pgp-poisoner is] proof that such attacks are very easy to pull off. It doesn't take a nation-state actor to break the PGP ecosystem, just one person and couple evenings studying RFC 4880. This system is not robust." He also hints at the next likely attack vector: public subkeys can be bound to a primary key of choice.

Wiping harddisks in 2019

Linux

Wiping hard disks is part of my company's policy when returning servers. No exceptions.

Good providers will wipe what they have received back from a customer, but we don't trust that as the hosting / cloud business is under constant budget-pressure and cutting corners (wipefs) is a likely consequence.

With modern SSDs there is "security erase" (man hdparm or see the - as always well maintained - Arch wiki) which is useful if the device is encrypt-by-default. These devices basically "forget" the encryption key, but using this also means trusting the device's implementation security. Which doesn't seem warranted. Still, after wiping and trimming, a secure erase can't be a bad idea :-).

Still there are four things to be aware of when wiping modern hard disks:

  1. Don't forget to add bs=4096 (blocksize) to dd as it will still default to 512 bytes and that makes writing even zeros less than half the maximum possible speed. SSDs may benefit from larger block sizes matched to their flash page structure. These are usually 128kB, 256kB, 512kB, 1MB, 2MB and 4MB these days. [1]
  2. All disks can usually be written to in parallel. screen is your friend (a sketch follows below).
  3. The write speed varies greatly by disk region, so use 2 hours per TB and wipe pass as a conservative estimate. This is better than extrapolating what you see initially in the fastest region of a spinning disk.
  4. The disks have become huge (we run 12TB disks in production now) but the write speed is still somewhere between 100 MB/s and 300 MB/s. So wiping servers on the last day before returning them is not possible anymore with disks larger than 4 TB each (and three passes). Or 12 TB and one pass (where e.g. fully encrypted content allows to just do a final zero-wipe).
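A minimal sketch of the parallel wipe, one detached screen session per disk. The device names are examples only; triple-check them before running, this is destructive:

# run as root; one pass of zeros over each disk, in parallel
for d in /dev/sda /dev/sdb /dev/sdc; do
  screen -dmS "wipe-${d##*/}" dd if=/dev/zero of="$d" bs=4096 status=progress
done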

hard disk size   one pass   three passes
 1 TB             2 h         6 h
 2 TB             4 h        12 h
 3 TB             6 h        18 h
 4 TB             8 h        24 h (one day)
 5 TB            10 h        30 h
 6 TB            12 h        36 h
 8 TB            16 h        48 h (two days)
10 TB            20 h        60 h
12 TB            24 h        72 h (three days)
14 TB            28 h        84 h
16 TB            32 h        96 h (four days)
18 TB            36 h       108 h
20 TB            40 h       120 h (five days)

Hard disk wipe animation


  1. As Douglas pointed out correctly in the comment below, these are IT Kilobytes and Megabytes, so 2^10 bytes and 2^20 bytes. So Kibibytes and Mebibytes for those firmly in SI territory.

Openssh taking minutes to become available, booting takes half an hour ... because your server waits for a few bytes of randomness

Linux

So, your machine now needs minutes to boot before you can ssh in where it used to be seconds before the Debian Buster update?

Problem

Linux 3.17 (2014-10-05) learnt a new syscall getrandom() that, well, gets bytes from the entropy pool. Glibc learnt about this with 2.25 (2017-02-05) and two tries and four years after the kernel, OpenSSL used that functionality from release 1.1.1 (2018-09-11). OpenSSH implemented this natively for the 7.8 release (2018-08-24) as well.

Now the getrandom() syscall will block [1] if the kernel can't provide enough entropy. And that's frequently the case during boot. Especially with VMs that have no input devices or IO jitter to source the pseudo random number generator from.

First seen in the wild January 2017

I vividly remember not seeing my Alpine Linux VMs back on the net after the Alpine 3.5 upgrade. That was basically the same issue.

Systemd. Yeah.

Systemd makes this behaviour worse, see issues #4271, #4513 and #10621.
Basically as of now the entropy file saved as /var/lib/systemd/random-seed will not - drumroll - add entropy to the random pool when played back during boot. Actually it will. It will just not be accounted for. So Linux doesn't know. And continues blocking getrandom(). This is obviously different from SysVinit times [2] when /var/lib/urandom/random-seed (that you still have lying around on updated systems) made sure the system carried enough entropy over reboot to continue working right after enough of the system was booted.

#4167 is a re-opened discussion about systemd eating randomness early at boot (hashmaps in PID 0...). Some Debian folks participate in the recent discussion and it is worth reading if you want to learn about the mess that booting a Linux system has become.

While we're talking systemd ... #10676 also means systems will use RDRAND in the future despite Ted Ts'o's warning on RDRAND [Archive.org mirror and mirrored locally as 130905_Ted_Tso_on_RDRAND.pdf, 205kB as Google+ will be discontinued in April 2019].
Update: RDRAND doesn't return random data on pre-Ryzen AMD CPUs (AMD CPU family <23) as per systemd bug #11810. It will always be 0xFFFFFFFFFFFFFFFF (2^64 - 1). This is a known issue since 2014, see kernel bug #85991.

Debian

Debian is seeing the same issue working up towards the Buster release, e.g. Bug #912087.

The typical issue is:

[    4.428797] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: data=ordered
[ 130.970863] random: crng init done

with delays up to tens of minutes on systems with very few external random sources.

This is what it should look like:

[    1.616819] random: fast init done
[    2.299314] random: crng init done

Check dmesg | grep -E "(rng|random)" to see how your systems are doing.

If this is not fully solved before the Buster release, I hope some of the below can end up in the release notes [3].

Solutions

You need to get entropy into the random pool earlier at boot. There are many ways to achieve this and - currently - all require action by the system administrator.

Kernel boot parameter

From kernel 4.19 (Debian Buster currently runs 4.18 [Update: but will be getting 4.19 before release according to Ben via Mika]) you can set RANDOM_TRUST_CPU at compile time or random.trust_cpu=on on the kernel command line. This will make recent Intel / AMD systems trust RDRAND and fill the entropy pool with it. See the warning from Ted Ts'o linked above.

Update: Since Linux kernel build 4.19.20-1 CONFIG_RANDOM_TRUST_CPU has been enabled by default in Debian.

Using a TPM

The Trusted Platform Module has an embedded random number generator that can be used. Of course you need to have one on your board for this to be useful. It's a hardware device.

Load the tpm-rng module (ideally from initrd) or compile it into the kernel (config HW_RANDOM_TPM). Now, the kernel does not "trust" the TPM RNG by default, so you need to add

rng_core.default_quality=1000

to the kernel command line. 1000 means "trust", 0 means "don't use". So you can choose any value in between that works for you depending on how much you consider your TPM to be unbugged.
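To make the parameter persist across kernel updates, a sketch for Debian-style systems (the /etc/default/grub file and the update-grub helper are Debian conventions):

# run as root; append the parameter to the default kernel command line
sed -i 's/^GRUB_CMDLINE_LINUX="/&rng_core.default_quality=1000 /' /etc/default/grub
update-grub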

VirtIO (KVM, QEMU, ...)

For Virtual Machines (VMs) you can forward entropy from the host (that should be running longer than the VMs and have enough entropy) via virtio_rng.

So on the host, you do:

kvm ... -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,bus=pci.0,addr=0x7

and within the VM newer kernels should automatically load virtio_rng and use that.

You can confirm with dmesg as per above.

Or check:

# cat /sys/devices/virtual/misc/hw_random/rng_available
virtio_rng.0
# cat /sys/devices/virtual/misc/hw_random/rng_current
virtio_rng.0
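You can also check how many bits of entropy the kernel currently credits to its pool; on pre-5.4 kernels values close to the pool size of 4096 bits are healthy:

cat /proc/sys/kernel/random/entropy_avail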

Patching systemd

The Fedora bugtracker has a bash / python script that replaces the systemd random seeding with a (better) working one. The script can also serve as a good starting point if you need to script your own solution, e.g. for reading from an entropy provider available within your (secure) network.

Chaoskey

The wonderful Keith Packard and Bdale Garbee have developed a USB dongle, ChaosKey, that supplies entropy to the kernel. Hard- and software are open source.

Jitterentropy_RNG

Kernel 4.2 introduced jitterentropy_rng which will use the jitter in CPU timings to generate randomness.

modprobe jitterentropy_rng

This apparently needs a userspace daemon though (read: design mistake) so

apt install jitterentropy-rngd (available from Buster/testing).

The current version 1.0.8-3 installs nicely on Stretch. dpkg -i is your friend.

But - drumroll - that daemon doesn't seem to use the kernel module at all.

That's where I stopped looking at that solution. At least for now. There are extensive docs if you want to dig into this yourself.

Update: The Linux kernel 5.3 will have an updated jitterentropy_rng as per Commit 4d2fa8b44. This is based on the upstream version 2.1.2 and should be worth another look.

Haveged

apt install haveged

Haveged is a user-space daemon that gathers entropy though the timing jitter any CPU has. It will only run "late" in boot but may still get your openssh back online within seconds and not minutes.

It is also - to the best of my knowledge - not verified at all regarding the quality of randomness it generates. The haveged design and history page provides an interesting read and I wouldn't recommend haveged if you have alternatives. If you have none, haveged is a wonderful solution though as it works reliably. And unverified entropy is better than no entropy. Just forget this is 2019 :-).

early-rng-init-tools

Thorsten Glaser has posted newly developed early-rng-init-tools in a debian-devel thread. He provides packages at http://fish.mirbsd.org/~tg/Debs/dists/sid/wtf/Pkgs/early-rng-init-tools/ .

First he deserves kudos for naming a tool for what it does. This makes it much more easily discoverable than the trend to name things after girlfriends, pets or anime characters. The implementation hooks into the early boot via initrd integration and carries over a seed generated during the previous shutdown. This and some other implementation details are not ideal and there has been quite extensive scrutiny, but none that discovered serious issues. Early-rng-init-tools looks like a good option for platforms without RDRAND (i.e. where CONFIG_RANDOM_TRUST_CPU cannot help).

Linus to the rescue

Luckily, at the end of September, Linus Torvalds got fed up with the entropy starvation issue and the non-conclusive discussions about (mostly) who's at fault and ... started coding.

With the kernel 5.4 release on 25.11.2019 his patch has made it into mainline. He created a try_to_generate_entropy function that uses CPU jitter to generate seed entropy for the PRNG early in boot.

In the merge commit Linus explains:

This is admittedly partly "for discussion". We need to have a way forward for the boot time deadlocks where user space ends up waiting for more entropy, but no entropy is forthcoming because the system is entirely idle just waiting for something to happen.

While this was triggered by what is arguably a user space bug with GDM/gnome-session asking for secure randomness during early boot, when they didn't even need any such truly secure thing, the issue ends up being that our "getrandom()" interface is prone to that kind of confusion, because people don't think very hard about whether they want to block for sufficient amounts of entropy.

The approach here-in is to decide to not just passively wait for entropy to happen, but to start actively collecting it if it is missing. This is not necessarily always possible, but if the architecture has a CPU cycle counter, there is a fair amount of noise in the exact timings of reasonably complex loads.

We may end up tweaking the load and the entropy estimates, but this should be at least a reasonable starting point.

So once this kernel is available in your distribution, you should be safe from entropy starvation at boot on any platform that has hardware timers (I haven't encountered one that does not in the last decade).

Ted Ts'o reviewed the approach and was fine with it, and Ahmed Dawish did some testing of the quality of the generated randomness, which seems fine, too.

Updates

14.01.2019

Stefan Fritsch, the Apache2 maintainer in Debian, OpenBSD developer and a former Debian security team member, stumbled over the systemd issue preventing Apache's libssl from initializing at boot in Debian bug #916690 - apache2: getrandom call blocks on first startup, systemd kills with timeout.

The bug has been retitled "document getrandom changes causing entropy starvation" hinting at not fixing the underlying issue but documenting it in the Debian Buster release notes.

Unhappy with this "minimal compromise" Stefan wrote a comprehensive summary of the current situation to the Debian-devel mailing list. The discussion spans over December 2018 and January 2019 and mostly reiterated what had been written above already. The discussion has - so far - not reached any consensus. There is still the "systemd stance" (not our problem, fix the daemons) and the "ssh/apache stance" (fix systemd, credit entropy).

The "document in release notes" minimal compromise was brought up again and Stefan warned of the problems this would create for Buster users:

> I'd prefer having this documented in the release notes:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916690
> with possible solutions like installing haveged, configuring virtio-rng,
> etc. depending on the situation.

That would be an extremely user-unfriendly "solution" and would lead to 
countless hours of debugging and useless bug reports.

This is exactly why I wrote this blog entry and keep it updated. We need to either fix this or tell everybody we can reach before upgrading to Buster. Otherwise this will lead to huge amounts of systems dead on the network after what looked like a successful upgrade.

Some interesting tidbits were mentioned within the thread:

Raphael Hertzog fixed the issue for Kali Linux by installing haveged by default. Michael Prokop did the same for the grml distribution within its December 2018 release.

Ben Hutchings pointed to an interesting thread on the debian-release mailing list he kicked off in May 2018. Multiple people summarized the options and the fact that there is no "general solution that is both correct and easy" at the time.

Sam Hartman identified Debian Buster VMs running under VMware as an issue, because that hypervisor does not provide virtio-rng. So Debian VMs wouldn't boot into ssh availability within a reasonable time. This is an issue for real world use cases albeit running a proprietary product as the hypervisor.

16.01.2019

Daniel Kahn Gillmor wrote in to explain a risk for VMs starting right after the boot of the host OS:

If that pool is used by the guest to generate long-term secrets because it appears to be well-initialized, that could be a serious problem.
(e.g. "Mining your P's and Q's" by Heninger et al -- https://factorable.net/weakkeys12.extended.pdf)
I've just opened https://bugs.launchpad.net/qemu/+bug/1811758 to report a way to improve that situation in qemu by default.

So ... make sure that your host OS has access to a hardware random number generator or at least carries over its random seed properly across reboots. You could also delay VM starts until the crng on the host Linux is fully initialized (random: crng init done).
Otherwise your VMs may get insufficiently generated pseudo-random numbers and won't even know.

12.03.2019

Stefan Fritsch revived the thread on debian-devel again and got a few more interesting tidbits out of the developer community:

Ben Hutchings has enabled CONFIG_RANDOM_TRUST_CPU for Debian kernels from 4.19.20-1, so the problem is somewhat contained for recent AMD64 systems with RDRAND-capable CPUs in Buster.
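You can check whether your running kernel was built with that option (the config file location under /boot is a Debian convention):

grep CONFIG_RANDOM_TRUST_CPU /boot/config-"$(uname -r)"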

Thorsten Glaser developed early-rng-init-tools which combine a few options to try and get entropy carried across boot and generated early during boot. He received some scrutiny as can be expected but none that would discourage me from using it. He explains that this is for early boot and thus has initrd integration. It complements safer randomness sources or haveged.

16.04.2019

The Debian installer for Buster is running into the same problem now as indicated in the release notes for RC1. Bug #923675 has details. Essentially choose-mirror waits several minutes for entropy when used with https mirrors.

08.05.2019

The RDRAND use introduced in systemd to bypass the kernel random number generator during boot falls for an AMD pre-Ryzen bug as RDRAND on these systems doesn't return random data after a suspend / resume cycle. Added an update note to the systemd section above.

03.06.2019

Bastian Blank reports the issue is affecting Debian cloud images now as well as cloud-init generates ssh keys during boot.

10.07.2019

Added the update of jitterentropy_rng to a version based on upstream v2.1.2 into the Jitterentropy section above.

16.09.2019

The Linux Kernel Mailing List (LKML) is re-iterating the entropy starvation issue and the unwillingness of systemd to fix its usage of randomness in early boot. Ahmed S. Darwish has reported the issue leading to ext4 reproducibly blocking boot with kernel 5.3-rc8. There are a few patches floated and the whole discussion is worth reading albeit non-conclusive as of now.

Ted Ts'o says "I really very strongly believe that the idea of making getrandom(2) non-blocking and to blindly assume that we can load up the buffer with 'best efforts' randomness to be a terrible, terrible idea that is going to cause major security problems that we will potentially regret very badly. Linus Torvalds believes I am an incompetent systems designer." in this email.

In case you needed a teaser to really start reading the thread! Linus Torvalds also mentions the issue (and a primer on what "never break userspace" means) in the Linux kernel 5.3 release notes.

18.09.2019

... and Martin Steigerwald kindly noticed that I update this blog post with the relevant discussions I come across as this entropy starvation mess continues to haunt us.

25.11.2019

Added the "Linus to the rescue" section after the Linux kernel 5.4 has been released.

02.04.2020

I ran into the same issue on a Gentoo system today. Luckily OpenRC handled this gracefully but it delayed booting: syslog-ng actually hangs the boot for some time ... waiting for entropy. Argh. The Gentoo forums thread on the topic clearly listed the options:

  1. Make syslog-ng depend on haveged by adding rc_syslog_ng_need="haveged" to /etc/rc.conf (and obviously having haveged installed)
  2. Re-compiling the kernel with CONFIG_RANDOM_TRUST_CPU=y where that is an option

  1. it will return with EAGAIN in the GRND_NONBLOCK use case. The blocking behaviour when lacking entropy is a security measure as per Bug #1559 of Google's Project Zero

  2. Update 18.12.2018: "SysVinit times" ::= "The times when most Linux distros used SysVinit over other init systems." So Wheezy and previous for Debian. Some people objected to the statement, so I added this footnote as a clarification. See the discussion in the comments below. 

  3. there is no Buster branch in the release notes repository yet (17.12.2018). Update: I wrote a section for the release notes 06.05.2019 and Paul Gevers amended and committed that. So when users of affected systems read the release notes before upgrading to Buster they will hopefully not be surprised (and worried) by the long boot delays. 

Google GMail continues to own the email market, Microsoft is catching up

Other

Back in 2009 I wrote about Google's GMail emerging as the dominant platform for email. It had 46% of all accounts I sampled from American bloggers for the Ph.D. thesis of a friend. Blogging was big back then :-).

Now I wondered how things have changed over the last decade while I was working on another email related job. Having access to a list of 2.3 million email addresses from a rather similar (US-centric) demographic, let's do some math:

Google's GMail has 39% in that (much larger, but still non-scientific and skewed) sample. This is down from 46% in 2009. Microsoft, with its various email domains from Hotmail to Live.com, has massively caught up from 10% to 35%. This is definitely also due to Microsoft now focusing more on the strong Office brands, e.g. Office 365 and Outlook.com. Yahoo, the #2 player back in 2009, is at 18%, still up from the 12% back then.

So Google plus Microsoft command nearly ¾ of all email addresses in that US-centric sample. Adding Yahoo into the equation leaves the accounts covered at >92%. Wow.

Email has essentially centralized onto three infrastructure providers and with this the neutrality advantage of open standards will probably erode. Interoperability is something two or three players can make or break for 90% of the user base within a single meeting in Sunnyvale.

Google is already trying their luck with "confidential email" which carries expiry dates and revocable reading rights for the recipient. So ... not really email anymore. More like Snapchat. Microsoft has been famous for their winmail.dat attachments and other negligence of email best practices. Yahoo is probably busy trying to develop a sustainable business model and trying to find cash that Marissa didn't spend, so hopefully less risk of misguided "innovations" in the email space from them.

All other players are less than 1% of the email domains in the sample. AOL used to have 3.1% and now they are at 0.6%, which is in the same (tiny) ball park as the combined Apple offerings (mac.com, me.com) at 0.4%.

There is virtually no use of the new TLDs for (real, user) [1] email. Just a few hundreds of .info and .name. And very few that consider themselves .sexy or .guru and want to tell via their email TLD.

Domain owner   2009    2018
GMail          46.1%   38.6%
Yahoo          11.6%   18.3%
Microsoft       9.9%   35.4%
AOL             3.1%    0.6%
Apple           1.0%    0.4%
Comcast         2.3%    0.2%
SBCGlobal       0.9%    0.09%

  1. There is extensive use of cheap TLDs for "throw-away" spam operations

Tales from the Edge. #Security.

Fun

Late 2017, King County, Washington

An overworked team with an impossible mission, to create a secure Internet browser, on Windows, is called to the weekly time-waster product team meeting.

Product Manager:
Team, you know that Edge needs to be the most secure browser on the planet, right?
So how can this thing segfault if some dude from the security consultancy fuzzes the Backup.dat?

You MUST make sure this is protected. It MUST be a violation of Windows Policy to modify the file. Go, make it happen! Report back next week!

The team disperses.

Early next morning, at a set of tables in the middle of a dimly lit cube farm...

Developer:
Hey, team lead, do you know what the PM meant with "Windows Policy"? I never heard about a "Windows Policy". Is this the "Group Policy"? Or did he mean the product license? Like the shrink-wrap contract? Do we need to consult legal?

Team lead:
Oh, ffs, Bob. No time for discussion. The requirement is crystal clear. Implement it. You're the security lead. We have a deadline approaching.

Developer:
O.k., boss. I'll see what I can do.

Windows Edge backup folder "Protected - It is a violation of Windows Policy to modify"

Continue reading "Tales from the Edge. #Security."

Prevent Ubuntu from phoning home

Linux

Ubuntu unfortunately has decided again to implement another "phone home" feature, this time transferring your lsb_release information, CPU model and speed (from /proc/cpuinfo), uptime output, most of uname -a and the curl version to an Ubuntu news web-service.

Here is the Launchpad bug report #1637800 introducing this ... web bug.

This thing runs both systemd-timer based (via /lib/systemd/system/motd-news.service and /lib/systemd/system/motd-news.timer) and on request when you log in (via /etc/update-motd.d/50-motd-news).

Ubuntu news on ssh login

There has even been a bug filed about the motd advertising HBO's Silicon Valley show.

To prevent this from running (it is enabled by default on Ubuntu 17.04 and will probably propagate down to earlier versions as well), edit /etc/default/motd-news to include

ENABLED=0

so

sed -i "s/ENABLED=1/ENABLED=0/" /etc/default/motd-news # run as root

for your automated installs.
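If you want belt and braces on top of the config file change, you can also mask the systemd units listed above so nothing can silently re-enable them:

systemctl mask motd-news.timer motd-news.service # run as root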

Update:

02.07.2017: Dustin Kirkland responded to a YC "hacker news" mention of his motd spam. He mentions:

You're welcome to propose your own messages for merging, if you have a well formatted, informative message for Ubuntu users.
We'll be happy to review and include them in the future.

What could possibly go wrong?

Generate an indexed list of passwords

Other

Generating an indexed list of passwords without complex perl or python:

pwgen -y 20 30 | nl -w 2 -n rz -s -

Explanation:

pwgen: -y = complex passwords (including symbols) ; 20 = length of password; 30 = number of passwords to generate

nl: -w 2 = use a width of two characters for the index; -n rz = right-justify the index, padding with leading zeros; -s - = use a dash as the separator

screenshot of pwgen | nl
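The output will look something like this (the passwords here are invented for illustration):

$ pwgen -y 20 30 | nl -w 2 -n rz -s -
01-Eex:ai=Th5ohwoh`quai
02-ooM1oD{a^thah_th9Ae5
[...]
30-Ahxo4io;curu6Oe(gh2e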

Security is hard, open source security unnecessarily harder

IT

Now it is a commonplace that security is hard. It involves advanced mathematics and a single, tiny mistake or omission in implementation can spoil everything.

And the only sane IT security can be open source security. Because you need to assess the algorithms and their implementation and you need to be able to completely verify the implementation. You simply can't if you don't have the code and can compile it yourself to produce a trusted (ideally reproducible) build. A no-brainer for everybody in the field.

But we make it unbelievably hard for people to use security tools. Because these have grown over decades fostered by highly intelligent people with no interest in UX.
"It was hard to write, so it should be hard to use as well."
And then complain about adoption.

PGP / gpg has received quite some fire this year and the good news is this has resulted in funding for the sole gpg developer. Which will obviously not solve the UX problem.

But the much worse offender is OpenSSL. It is so hard to use that even experienced hackers fail.

IRC wallop on hackint

Now, securely encrypting a mass communication medium like IRC is not possible at all. Read Trust is not transitive: or why IRC over SSL is pointless [1].
Still it makes wiretapping harder and that may be a good thing these days.

LibreSSL has forked the OpenSSL code base "with goals of modernizing the codebase, improving security, and applying best practice development processes". No UX improvement. A cleaner code for the chosen few. Duh.

I predict the re-implementations and gradual improvement scenarios will fail. The nearly-impossible-to-use-right situation with both gpg and (much more importantly) OpenSSL cannot be fixed by gradual improvements and code reviews, however thorough they may be.

Now the "there's an App for this" security movement won't work out on a grand scale either:

  1. Most often not open source. Notable exceptions: ChatSecure, TextSecure.
  2. No reference implementations with excellent test servers and well documented test suites but products. "Use my App.", "No, use MY App!!!".
  3. Only secures chat or email. So the VC-powered ("next WhatsApp") mass-adoption markets but not the really interesting things to improve upon (CA, code signing, FDE, ...).
  4. While everybody is focusing on mobile adoption the heavy lifting is still on servers. We need sane libraries and APIs. No App for that.

So we need a new development, a new code base, a new open source product. Sadly the Core Infrastructure Initiative so far only funds existing open source projects in dire need and people hunting bugs.

It basically makes the bad solutions of today a bit more secure and ensures maintenance of decade old crufty code bases. That way it extends the suffering of everybody using the inadequate solutions of today.

That's inevitable until we have a better stack but we need to look into getting rid of gpg and OpenSSL and replacing it with something new. Something designed well from the ground up, technically and from a user experience perspective.

Now who's in for a five year funding plan? $3m [2] annually. ROCE 0. But a very good chance to get the OBE awarded.

Keep calm and enjoy the silence

Updates:

10.06.22: Carl Tashian made a GUI mockup to show the complexity of the OpenSSL "user interface".

21.07.19: A current essay on "The PGP problem" is making rounds and lists some valid issues with the file format, RFCs and the gpg implementation. The GnuPG-users mailing list has a discussion thread on the issues listed in the essay.

19.01.19: Daniel Kahn Gillmor, a Senior Staff Technologist at the ACLU, tried to get his gpg key transition correct. He put a huge amount of thought and preparation into the transition. To support Autocrypt (another try to get GPG usable for more people than a small technical elite), he specifically created different identities for himself as a person and his two main email addresses. Two days later he had to invalidate his new gpg key and fall back to less "modern" identity layouts because many of the brittle pieces of infrastructure around gpg (from emacs to gpg signature management frontends to mailing list managers) fell over dead.

28.11.18: Changed the Quakenet link on why encrypting IRC is useless to an archive.org one as they have removed the original content.

13.03.17: Chris Wellons writes about why GPG is a failure and created a small portable application Enchive to replace it for asymmetric encryption.

24.02.17: Stefan Marsiske has written a blog article: On PGP. He argues about adversary models and when gpg is "probably" [3] still good enough to use. To me a security tool can never be a sane choice if the UI is so convoluted that only a chosen few stand at least a chance of using it correctly. Doesn't matter who or what your adversary is.
Stefan concludes his blog article:

PGP for encryption as in RFC 4880 should be retired, some sunk-cost-biases to be coped with, but we all should rejoice that the last 3-4 years had so much innovation in this field, that RFC 4880 is being rewritten[Citation needed] with many of the above in mind and that hopefully there'll be more and better tools. [..]

He gives an extensive list of tools he considers worth watching in his article. Go and check whether something in there looks like a possible replacement for gpg to you. Stefan also gave a talk on the OpenPGP conference 2016 with similar content, slides.

14.02.17: James Stanley has written up a nice account of his two hour venture to get encrypted email set up. The process is speckled with bugs and inconsistent nomenclature capable of confusing even a technically inclined person. There has been no progress in the last ~two years since I wrote this piece. We're all still riding dead horses. James summarizes:

Encrypted email is nothing new (PGP was initially released in 1991 - 26 years ago!), but it still has a huge barrier to entry for anyone who isn't already familiar with how to use it.

04.09.16: Greg Kroah-Hartman ends an analysis of the Evil32 PGP keyid collisions with:

gpg really is horrible to use and almost impossible to use correctly.

14.11.15:
Scott Ruoti, Jeff Andersen, Daniel Zappala and Kent Seamons of BYU, Utah, have analysed the usability [local mirror, 173kB] of Mailvelope, a webmail PGP/GPG add-on based on a Javascript PGP implementation. They describe the results as "disheartening":

In our study of 20 participants, grouped into 10 pairs of participants who attempted to exchange encrypted email, only one pair was able to successfully complete the assigned tasks using Mailvelope. All other participants were unable to complete the assigned task in the one hour allotted to the study. Even though a decade has passed since the last formal study of PGP, our results show that Johnny has still not gotten any closer to encrypt his email using PGP.

  1. Quakenet has removed that article citing "near constant misrepresentation of the presented argument" sometime in 2018. The contents (not misrepresented) are still valid so I have added an archive.org Wayback machine link instead.

  2. The estimate was $2m until end of 2018. The longer we wait, the more expensive it'll get. And - obviously - ever harder. E.g. nobody needed to care about sidechannel attacks on big-LITTLE five years ago. But now they start to hit servers and security-sensitive edge devices. 

  3. Stefan says "probably" five times in one paragraph. Probably needs an editor. The person not the application. 

Security by policy does not work

Management

The laptop systems aboard the International Space Station (ISS) have been infected by computer viruses and worms multiple times. The W32.Gammima.AG virus made it to space in July 2008. And it happily spread from laptop to laptop onboard the ISS. The virus has been written to steal credentials for some common games. It is unknown how many of these were run in orbit. The latency would kill the experience for sure.

I am sure there have been policies in place to prevent astronauts carrying personal soft- and hardware up to the ISS. Personal items must be explicitly applied for and will only be approved after severe scrutiny of each item. Even beyond the obvious security considerations, this is necessary as the launch weight needs to be calculated exactly.
NASA and Roscosmos both have very strict policies for their personnel and strict training to make sure they know and follow policy. The group of astronauts primarily affected by the policy is very well known and counts a few dozen heads.

Still at least one infected USB stick made it up to the ISS and could spread its malware. Other infections have happened and we can assume similar infection vectors.

So the policy has proven unenforceable. It is broken. It is still correct per se. There is nothing wrong with prohibiting personal soft- and hardware in a high risk environment. So the policy stays in place. NASA still needed to make sure to rely much less on its effectiveness.

Hence NASA did the only sane thing: Move from an unenforceable policy to a technically feasible solution, significantly reducing the security exposure. In May 2013 NASA announced the ISS laptops are being migrated to Debian 6. Imagine how much pressure Microsoft must have put up to prevent such a technical decision due to the adverse marketing message it provides along the way. And still the engineers at NASA saw this as the best way forward.

The take-away message here is: Security by policy does not work.

Continue reading "Security by policy does not work"

Encrypting files with openssl for synchronization across the Internet

Linux

Well, shortly after I wrote about encrypting files with a keyfile / passphrase with gpg people asked about a solution with openssl.

You should prefer to use the gpg version linked above, but if you can't, below is a script offering the same functionality with openssl.

You basically call crypt_openssl <file> [<files...>] to encrypt file to file.aes using the same keyfile as used in the gpg script (~/.gnupg/mykey001 per default).

A simple crypt_openssl -d <file.aes> [<files.aes...>] will restore the original files from the encrypted AES256 version that you can safely transfer over the Internet even using insecure channels.

Please note that you should feed compressed data to crypt_openssl whenever you can. So preferably use it on .zip or .tar.gz files.
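The core of the script boils down to something like this (a sketch only; the real script linked below adds multi-file handling and error checking):

# encrypt: file -> file.aes, key material read from the keyfile
openssl enc -aes-256-cbc -salt -pass file:"$HOME/.gnupg/mykey001" -in file -out file.aes
# decrypt: file.aes -> file
openssl enc -d -aes-256-cbc -pass file:"$HOME/.gnupg/mykey001" -in file.aes -out file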

Continue reading "Encrypting files with openssl for synchronization across the Internet"

Encrypting files with gpg for synchronization across the Internet

Linux

Automatically transferring (syncing) files between multiple computers is easy these days. Dropbox, owncloud or bitpocket to name a few. You can imagine I use the latter (if you want a recommendation) [1].

In any case you want to encrypt what you send to be stored in "the cloud" even if it is just for a short time. There are many options how to encrypt the "in flight" data. Symmetric ciphers are probably the safest and most widely researched cryptography these days and easier to use than asymmetric key pairs in this context as well.

Encryption is notoriously hard to implement correctly and worthless when the implementation is flawed. So I looked at gpg, a well known reference implementation, and was amazed that it can neither use a proper keyfile for symmetric encryption (you can just supply a passphrase via --passphrase-file) nor does it handle multiple files on the command line consistently. You can use --multifile (wondering...why does a command need that at all?) with --decrypt and --encrypt (asymmetric public/private key pair encryption) but not with --symmetric (symmetric shared key encryption). Duh!

With a bit of scripting around the gpg shortcomings, you end up with crypt_gpg that can nicely encrypt or decrypt multiple files (symmetric cipher) in one go.
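At its heart that means calling gpg once per file, along these lines (a sketch; gpg 2.1+ may additionally need --pinentry-mode loopback for the passphrase file to be honored):

# encrypt: file -> file.gpg, passphrase read from the keyfile
gpg --batch --symmetric --cipher-algo AES256 --passphrase-file "$HOME/.gnupg/mykey001" --output file.gpg file
# decrypt: file.gpg -> file
gpg --batch --decrypt --passphrase-file "$HOME/.gnupg/mykey001" --output file file.gpg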


  1. Dropbox is closed source so it cannot be assessed for its security. Owncloud needs a thorough code review before I would dare to run it on my systems. 

Continue reading "Encrypting files with gpg for synchronization across the Internet"

Securing the grub boot loader

Open Source

Since version 2.0 the behaviour of grub regarding passwords has changed quite substantially. It can be nicely used to secure the boot process so that an X display manager (gdm, kdm, lightdm, ...) or login prompt cannot be circumvented by editing the Linux kernel boot command line parameters. The documentation is concise but many old how-tos may lead you down the wrong GNU grub "legacy" (the pre-2.0 versions) path.

So this assumes you have a grub installed and working. I.e. if you press Shift during boot, you get a grub menu and can edit menu entries via the e key.

First you need to setup grub users and corresponding passwords:

Run grub-mkpasswd-pbkdf2 to hash every password you want to use for grub users (which are technically unrelated to Linux system users at this time).
You'll get a string like 'grub.pbkdf2.sha512.10000...'. It will replace the plain text passwords.

In '/etc/grub.d/40_custom' add lines like:

# These users can change the config at boot time and run any menuentry:
set superusers="root user1"
password_pbkdf2 root grub.pbkdf2.sha512.10000.aaa...
password_pbkdf2 user1 grub.pbkdf2.sha512.10000.bbb...
# This user can only run specifically designated menuentries (not a superuser):
password_pbkdf2 user2 grub.pbkdf2.sha512.10000.ccc...

Now once you did this, grub v. 2.0+ will ask for a superuser password every time you want to boot any menu item. This is a changed behaviour from v. 1.9x which defaulted to allow all entries if no user restriction was specified. So you need to add '--unrestricted' to all 'menuentries' that any user shall be able to boot. You can edit '/boot/grub/grub.cfg' and add --unrestricted to (the default) menuentries. Or you can edit the 'linux_entry ()' function in '/etc/grub.d/10_linux' so that the 'echo "menuentry ..."' lines include --unrestricted by default:

[...]
echo "menuentry '$(echo "$title" | grub_quote)' --unrestricted ${CLASS} \$menuentry_id_option 'gnulinux-$version-$type-$boot_device_id' {" | sed "s/^/$submenu_indentation/"
else
echo "menuentry '$(echo "$os" | grub_quote)' --unrestricted ${CLASS} \$menuentry_id_option 'gnulinux-simple-$boot_device_id' {" | sed "s/^/$submenu_indentation/"
[...]

Make a backup of this file as it will be overwritten by grub updates. This way all Linux kernels detected by the script will be available to all users without identifying to grub via username / password.

Now issue update-grub to re-generate 'grub.cfg' with the amended menuentries.

If everything worked well, your system can now be booted unrestricted but the grub configuration can only be changed from the grub superusers after identifying with their username and password at the grub prompt.

Bonus point:

If you want to create menuentries that user2 (and any superuser) from the above example user list can run, add blocks like these to the end of '40_custom':

menuentry "Only user2 (or superuser) can run this Windows installation" --users user2 {
set root=(hd1,1)
chainloader +1
}

Update

16.12.2015:
Hector Marco and Ismael Ripoll have found a nearly unbelievable exploit in Grub2 that allows you to tap backspace 28 times to get a rescue shell and that way bypass a password prompt. Time to update!
Read the excellent analysis of the bug and the exploit vector in Hector Marco's blog post.

Google GMail dominating the email market

Other

Google's GMail was launched in April 2004 and only in February 2007 did Google drop its invite system to open up to the general public, according to Wikipedia's history of GMail. That's some five years of operations up to now.

It kind of amazed me how many people I know have GMail as their primary mail provider. So I took the chance today to get a bit of statistics to check my gut feelings:

A friend of mine selected some (mostly American) bloggers that have indicated specific interests in a topic related to his doctoral thesis. This sample ended up being 1,375 people. These folks have 295 different email domains. Only.

A whopping 46% of the (rather random) sample use GMail, 12% Yahoo, 8% Hotmail and about 3% AOL. While Yahoo has some foreign domains in the sample (yahoo.co.uk, yahoo.ca, see mostly American bloggers above), these add up to around 0.1% of the sample so it's not really significant.

Distribution of American bloggers' email domains

This data is in no way representative, but still wow. Google basically has a monopoly on search and now seems to have a close-to-majority footprint in personal email.

I guess the dominance is currently larger in the States than in Europe or Asia as GMail has only gradually learned languages beyond English.
Large local providers should also have some foothold in these markets, similar to the Comcast and SBC customers still significant in the sample depicted above. Just the local providers in Europe and Asia will be somewhat stronger (for now). Google is also aggressively targeting corporations with hosted email and apps now so one can expect further and accelerated growth in that area. Quite a number of companies are considering using hosted email instead of the conventional mail system they have operated on site for many years now.

So while Gina Trapani recommends "Break Google's Monopoly on Your Data: Switch to Yahoo Search", may I humbly point out: It's becoming quite impossible to just keep your emails between the recipient and the addressee these days.

Even if you personally do not use GMail, Google can (technically) still profile you because a huge chunk of people you communicate with send from GMail and receive and store your emails there.

Nearly all email that is sent also passes spam filters before delivery. Google bought the Postini spam filter in 2007. That anti-spam service is used by many enterprises and even city governments, see here.

So time to consider (unencrypted) email as what it has always been: The digital equivalent of a postcard.
Just now Google has become the postmen. All of them, every second shift. You should hope they're not nosey. Or send letters.

Update:

11.05.2014: Benjamin Mako Hill has written a blog entry Google Has Most of My Email Because It Has All of Yours doing analysis for his own email box. He found a third of his inbox emails come from Google and - as he doesn't usually reply to newsletters and the like - more than half of his own email replies (57% in 2013) end up at GMail. He published his code in case you want to do the analysis on your own email.

Disabling a group policy'd screensaver on Windows

IT

I guess many people know the issue of having a screen saver forced active after some time through a group policy in a corporate environment. This is usually done to make sure systems are locked during breaks if people forget to press Win+L (or Ctrl+Alt+Del and then Enter). While that may well help IT security, it turns problematic when giving presentations for extended periods of time. Having to move the mouse via the presentation pointer every few minutes or dash back to the PC once the screen saver has kicked in, again, is simply annoying. On your company's systems you may be able to get the system admins to allow configuration of the interval or allow for disabling the screen saver, but on foreign systems you're often lost. But...

Continue reading "Disabling a group policy'd screensaver on Windows"

Remote keyless entry system Keeloq broken by security researchers

Vehicles

The remote keyless entry system KeeLoq is being used by Chrysler, Daewoo, Fiat, General Motors, Honda/Infiniti, Jaguar, Toyota/Lexus, Volvo and Volkswagen. A number of garage door opening systems and the like also use this technology. It is based on a secret cipher that has now been compromised by an international IT security research team. Two intercepted messages are deemed sufficient to clone a KeeLoq RFID tag as there are general keys inserted by the manufacturer and the key structure is partially determined by make and model. A stronger KeeLoq implementation (still) needs physical access to the key but only for a few minutes. It's also possible to permanently lock the legitimate owner out of his car or building and render his KeeLoq RFID useless. Details can be found at the researchers' site and the folks at Wikipedia have also amended their KeeLoq article.