Jim's Depository

this code is not yet written
 

femtoblogger has reached that odd state for software. It works well enough that I am happy using it. There are rough edges, but not rough enough that I will fix them.

The only thing I have changed recently is to add a meta robots tag to suggest the aggregate pages, like the front page and archive months, not be indexed. That should help keep the clicks on target. I already had robot tags to deter indexing of all the non-content pages.
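
For reference, the sort of tag I mean looks like this (a minimal example; the exact attributes femtoblogger emits may differ):

<!-- on aggregate pages such as the front page and archive months -->
<meta name="robots" content="noindex,follow">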

There remain two rough points:

  1. The WYSIWYG editing: This is still a bit awkward. Sometimes I get stuck in bold and have to pop into HTML mode to get out. Pasting in code ends up double-spaced. I could make lots of elaborate workarounds, but I consider these to be browser bugs and hope they will shake themselves out over time. I could also switch to one of the giant WYSIWYG javascript editors, but then I wouldn’t be very femto, would I?
  2. I keep having a nagging desire to have images. I could do it now by attaching the image and making my own IMG tag, but I’m too lazy for that. I’ve been holding off coding proper image support (with resizing for display and full resolution click through) on the grounds that if I’m too lazy to type my own image tag then surely I shouldn’t spend a couple hours making full image support.

I suppose since femtoblogger has become stable it is time to move it into a public subversion repository.

A Debian administrator might want to install…

  • debsums - checks installed files for tampering; not complete, but a good start.
  • rkhunter - looks for rootkits.
  • chkrootkit - also looks for rootkits.

Think about running these regularly to catch your basic root kitter.
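
If you just want to run them by hand, the invocations are short (assuming the Debian packages above; tune the flags to taste):

# debsums -s
# rkhunter --check
# chkrootkit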

You could cron them, but I prefer to run them manually, since I know I’d pull the cron entry if I rooted you.

I suppose you could do a forced reinstall before running for a little extra comfort.

I think a better tool would be one that used a central repository with a copy of each package and asked the observed machine to generate on-the-fly signatures of its files with a random seed, so a compromised machine couldn’t pre-compute the answers.
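
Something along these lines, say (purely a sketch of the shape it might take, not anything I have written; the host name, pristine-copy path, and file are placeholders):

# run on the central machine, which holds a pristine copy of the package contents
SEED=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
FILE=/bin/ls
REMOTE=$(ssh root@target "{ echo $SEED; cat $FILE; } | sha256sum")
LOCAL=$( { echo $SEED; cat /srv/pristine/target$FILE; } | sha256sum )
[ "$REMOTE" = "$LOCAL" ] && echo "$FILE matches" || echo "$FILE DIFFERS"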

A truly nasty rooter could still thwart that by faking things in either the C runtime library or the appropriate system calls.

I write from the end of June, 2008 having just completed a quarterly spam analysis and adjustment. Following is a brief description of the mail community, the incoming mail stream, how I process it, and the results.

The Mail Community

  • 150 people, mostly engineers in a software company
  • old addresses, average age 5 years+
  • many “first name” addresses

The Incoming Mail Stream

  • We are running about 500 to 1000 incoming emails per hour.
  • 95% of the incoming email is spam, 5% is real.

The Process

  • No mail is destroyed or rejected for spamminess; it is marked with a header and the mail clients shuttle it off to a junk folder, just in case.
  • All mail first passes through bogofilter. This can definitively mark a message as real mail or spam or it may be unsure and pass the message on to more expensive filters. 90% of the real mail is discovered at this point, as is 85% of the spam. I have a broad ‘unsure’ area to reduce false positives.
  • Only the 15% or so of the mail that bogofilter was not sure about will proceed to the following filters.
  • The second filter is dcc, the distributed checksum clearinghouse. This sends a fuzzy checksum to a central server and checks how many copies of the message have been seen so far. If it has been seen too many times then I consider it spam. This successfully discovers about 50% of the remaining spam with a quick round trip of a UDP packet.
  • clamav is used to detect viruses and mark them as spam so the mail clients will sequester them. This only marks a couple of messages per thousand incoming, but dcc marks many viruses so I don’t see the total size of my virus stream.
  • If a message is still uncharacterized it goes on to spamassassin. This discovers 90% of the remaining spam. That leaves about 0.3% of the total spam sneaking past my filters to offend the users. Spamassassin is configured to do the network checks, but not to use its bayesian filter, since bogofilter already does something similar. (The whole chain is sketched below.)
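
The glue is MTA specific, but the decision logic amounts to something like the sketch below. It is not my actual filter script: bogofilter’s spam/ham/unsure exit codes and the dccproc, clamdscan, and spamc commands are real, but the thresholds, header handling, and error handling are all simplified.

#!/bin/sh
# stdin: one message.  Prints a verdict; the real glue adds an X-Spam header instead.
MSG=$(mktemp); cat > $MSG; trap "rm -f $MSG" EXIT

bogofilter < $MSG
case $? in
  0) echo spam; exit ;;   # bogofilter is sure it is spam
  1) echo ham;  exit ;;   # bogofilter is sure it is real mail
esac                      # 2 = unsure, fall through to the expensive tests

# dcc: has the world already seen too many copies of this checksum?
dccproc -H < $MSG | grep -q many && { echo spam; exit; }

# clamav: viruses get tagged as spam so the clients sequester them
clamdscan --no-summary $MSG > /dev/null || { echo spam; exit; }

# last resort: spamassassin (network tests on, bayes off in its config)
spamc -c < $MSG > /dev/null && echo ham || echo spam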

The Results

  • 99.7% of the spam is detected and tagged.
  • 0.0% false positives, as far as I can tell. (I haven’t found one.)
  • CPU consumption small enough to be unmeasurable.
  • Mail which gets as far as spamassassin will take a 4 to 10 second delay while it processes. The other tests are fast enough to not be noticed.

Maintenance

The bogofilter works best if it is trained regularly to follow spam trends. I have in the past manually sorted thousands of messages into good and bad piles for training, but that is mind-numbing. For ongoing training I do the following (the commands are sketched after the list):

  • Anything that just barely got tagged as spam by bogofilter (scored above 85% but below 90%) is used as spam to train bogofilter. This tracks spam techniques as they drift out of my target sights without warping my spam stats by reporting 10,000 copies of the same message.
  • Anything that gets past bogofilter, but is subsequently caught by dcc or spamassassin is trained into bogofilter as spam. This catches new trends in spam.
  • Periodically I spot check the real mail, pick out any spam that squeaked through, and train it into the bogofilter to keep up with trends in our real mail.
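
The retraining itself is just feeding the sorted messages back to bogofilter with the right flag. A sketch, with made-up mailbox paths:

# -s registers messages as spam, -n as ham; -B reads the named mailboxes/files
bogofilter -s -B /var/spool/training/borderline-spam
bogofilter -s -B /var/spool/training/caught-downstream
bogofilter -n -B /var/spool/training/spot-checked-good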

Results

The end result is that I spend dozens of man-hours per year to stop 250,000 spam messages. I’d just hire Google to front-end filter our mail for $3/address/year, but the security policy won’t allow that.

An extra note on bogofilter:

Bogofilter is built with a single user in mind. I'm sure it works better when it has a single user's mail to think about and can rely on the human to tag the false positives and negatives.

In a 150 user common filter you can rely on exactly 0 of them to report their miscategorized spam. If you try to force them to comply you will find that 10% of them do it backwards and pollute your statistics so badly you have to erase everything and start again.

That said, it works quite well and is speedy and doesn't rely on external network servers so it makes a good first line of defense.

Going forward:

I will have to drop dcc. Their licensing is no longer free enough to be distributed by Debian. That will slow more messages, but in practice anything dcc catches is also caught by spamassassin.

I'd like to add an adaptive whitelist out front to prevent false positives and give me a stream of known good messages for training the bogofilter. I haven't found one I like yet, but I keep looking. Maybe I'll have to write it.

This morning it was 53°F in the cabin. When I awoke I didn’t need to crawl out into the cold to open the curtains and check the local weather, because I could use my iPhone to reach my server in St. Louis, which displays data from my server in Reston, which de-NATs the server in Wisconsin, so I can download live video from the webcam looking out from the front of the cabin.

A different sort of geek might have built a heater.

I have a server which contains a bunch of virtual machines. These machines are continually harassed by script kiddies. I use Fail2ban to keep the trolling to a minimum. 

  • Each virtual machine sends its syslog activity to the physical server, using something like this in its syslog.conf…  *.* @some.host.com
  • The physical server saves all the syslog activity from the virtual machines, safe from tampering. (/etc/default/syslogd needs a -r so syslogd accepts remote messages.)
  • fail2ban runs on the physical server and drops bans into the FORWARD chain to protect the inner machines.
  • The syslog port needs to be protected to only take traffic from trusted machines. This ought to block anything from the machine’s two physical ethernets but let through the virtual ones…

    /sbin/iptables -I INPUT -p udp --dport 514 -m physdev --physdev-in eth0 -j REJECT
    /sbin/iptables -I INPUT -p udp --dport 514 -m physdev --physdev-in eth1 -j REJECT

Things that needed changing…

/etc/fail2ban/action.d/iptables.conf… the actionstart and actionstop need to also add the jump rules to the FORWARD chain…

# Option:  fwstart
# Notes.:  command executed once at the start of Fail2Ban.
# Values:  CMD
#
actionstart = iptables -N fail2ban-<name>
              iptables -A fail2ban-<name> -j RETURN
              iptables -I INPUT -p <protocol> --dport <port> -j fail2ban-<name>
              iptables -I FORWARD -p <protocol> --dport <port> -j fail2ban-<name>

# Option:  fwend
# Notes.:  command executed once at the end of Fail2Ban
# Values:  CMD
#
actionstop = iptables -D INPUT -p <protocol> --dport <port> -j fail2ban-<name>
             iptables -D FORWARD -p <protocol> --dport <port> -j fail2ban-<name>
             iptables -F fail2ban-<name>
             iptables -X fail2ban-<name>

An interesting observation when using a single fail2ban covering multiple machines: it catches horizontal sweeps much sooner. Today I noticed it catch someone who was making one try at root on each of my machines. The merged auth.log files tripped my 10 hour ban after one attempt on each of three machines.

Things you will want to know if you have to replace your OpenVPN certificates because, say, you got caught in the Debian key entropy problem.

  • Don’t forget to also run build-key-server.
  • Don’t forget to copy keys/server.* and ca.crt up to /etc/openvpn if that is where you keep them.
  • Each windows client with old keys is going to chew up 30 slots in your server until they get new keys. If you have many users, you don’t have enough slots. The windows clients retry every two seconds, but it takes 60 seconds to time out on the server side.

I had to resort to grepping syslog and dropping firewall blocks on people trying old certificates. I used another script watching my http logs to unblock people who had created new certificates. “TLS Error: TLS key negotiation failed to occur within 60 seconds” is a good bit to select IPs for blocking.
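
The blocking side boiled down to something like this (a sketch from memory, not the original script; the log path and the way the client IP is extracted will vary with your syslog and OpenVPN versions):

#!/bin/sh
# drop anyone still presenting an old certificate; unblocking is handled elsewhere
grep "TLS key negotiation failed to occur within 60 seconds" /var/log/syslog \
  | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' \
  | sort -u \
  | while read ip; do
      iptables -I INPUT -s $ip -j DROP   # naive: adds duplicate rules if run repeatedly
    done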

You know you have too many clients connected if you see “MULTI: new incoming connection would exceed maximum number of clients” in the syslog.

I’ve added a few features today.

  • There is now a ‘recent comments’ link under Contribute. It also has its own RSS feed, so I can watch for comments should the comment spammers get through.
  • Articles now have URLs constructed from their titles. I feel I could do better at this. I need to read some papers on automated abbreviators.
  • I fixed a nasty little problem where a sqlite3 “insert or replace” was whacking my password. It turns out that a replace deletes the whole row, so you had better specify all the field values (as sketched below).
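
In other words (a contrived sqlite3 session, not femtoblogger’s actual schema):

sqlite3 /tmp/demo.db <<'EOF'
CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT, password TEXT);
INSERT INTO users VALUES ('jim', 'jim@example.com', 'secret');
-- REPLACE deletes the old row first, so the unmentioned column comes back NULL:
INSERT OR REPLACE INTO users (name, email) VALUES ('jim', 'new@example.com');
SELECT name, email, password FROM users;
EOF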

And now 4 days later.

  • I now log much of the POST traffic into my database. There is a distributed botnet trying to leave a comment on one article in this blog. I’m curious what they wish to say. I’d hate to miss out on a first contact situation.
  • Found a couple of URLs that were missing their “.php”. Apache was OK with this; lighttpd is not.

I have made contact with the robots. We should all be afraid. Thus far the robots have attempted to add these comments:
  • SEX
  • SEX
  • SEX SEX SEX LOVE
  • zubav1na-ps1h1chesk1e-bolezn1  except the digits 1 are supposed to be the letter 'i', I just didn't want to get indexed by it.

I suppose some filtering software will now block my site because it talks about sex.

More robot chatter:
  • fandango
  • tatuazh

So if a tattooed robot offers to dance the fandango with you, you should know it only wants sex.

I noticed that the Opera browser rocketed up to 38.4% of my hits. A quick dig of the logs shows that I am being drilled by bots that look to be trying to create link spam and masquerade as Opera browsers.

I suppose eventually they will have a human help them through the captcha and succeed. I have changed things about so untrusted users will get rel=nofollow tags on all their links. Maybe that will make them lose interest and go away.

I should probably make an RSS feed for comments while I’m at it so I notice when they get through the defenses.

While updating my systems monitoring I discovered Munin today. Munin captures a wide variety of system information and dumps it into RRD files to ultimately graph it at a central location.

The user interface doesn’t communicate problems well, but it provides the underlying data for you to answer those nagging questions that come up, like “When did our email traffic get so high?” or “Has that disk always run that hot?”

And my install notes:

  • Also install sensord, smartmontools, and ethtool when installing the munin-node package.
  • Make sure to punch a firewall rule for port 4949 from the central machine. The central machine does not want to get hung on one of the nodes.
  • Run sensors-detect after installing sensord.
  • Check your syslog after restarting sensord; a bunch of my Dells explode the daemon with a problem in the fan sensor, and I have to take lm85 out of their modules.
  • If you are a linux user with SATA drives, then when you install smartmontools you must edit the /etc/smartd.conf file to comment out the DEVICESCAN line and put in a specific device line with “-d ata” (a sample entry is shown after this list). May as well add a “-m you@email.address” while you are there so it will notify you. Don’t forget to edit /etc/default/smartmontools to enable the daemon.
  • In /etc/munin/plugins/ you might wish to take out the ntp* files, unless you care about that. Look where those links go and you will see other plugins you could link in. I added veth* ones for my virtual ethernets and the smart one to get all my drive failure data.
  • Again, if on linux with SATA drives you must edit the /etc/munin/plugin-conf.d/munin-node file in a couple of places to tell it to use “-d ata”.
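
A sample smartd.conf entry (device name and address are of course placeholders):

# /etc/smartd.conf
#DEVICESCAN
/dev/sda -d ata -H -m you@email.address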

Sample bits of /etc/munin/plugin-conf.d/munin-node:

[hddtemp_smartctl]
user root
env.drives sda
env.type_sda ata

[smart_sd*]
user root
env.smartargs -H -c -l error -l selftest -l selective -d ata

If you run sendmail as your mail server, munin has 3 plugins in the base Debian install. Link all 3 into your /etc/munin/plugins directory. One, sendmail_mailqueue, will work out of the box. The other two depend on sendmail stats files that do not get created in a base Debian install.
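
Linking them in is the usual munin drill (the names of the two stats-based plugins here are from memory; check what actually ships in /usr/share/munin/plugins):

# cd /etc/munin/plugins
# ln -s /usr/share/munin/plugins/sendmail_mailqueue .
# ln -s /usr/share/munin/plugins/sendmail_mailstats .
# ln -s /usr/share/munin/plugins/sendmail_mailtraffic .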

To enable stats logging you must manually create the stats files.

# touch /var/lib/sendmail/sendmail.st
# touch /var/lib/sendmail/sm-client.st

Once these files have been created, with sendmail write permission, sendmail will start logging to them.  Gotta love sendmail, "If you create the log file for me, I will write to it."

You can test your mail statistics file creation manually with the mailstats command.

If you want to collect apache statistics with Munin, you need to enable extended server status in apache:

ExtendedStatus On
<Location /server-status>
   SetHandler server-status
   Order deny,allow
   Deny from all
   Allow from 127.0.0.1
   Allow from munin-server.mydomain.com
</Location>

If your web server does not bind to localhost (127.0.0.1), you need to define the server status URL in your /etc/munin/plugin-conf.d/munin-node config file.

[apache_*]
env.url "http://servername.mydomain.com/server-status?auto"


After decades of backing up with dump I no longer do it. I suppose I got in the habit back in the days of tapes and just stayed through the disk years.

rsync is far better. I should have switched years ago.

  • Efficient incremental backups over networks, even for appending little bits to the ends of long files.
  • The backups are real directories of real files, easy to ferret about and find what you need.
  • Nifty trick to keep N days of backups without using N times the space, but still each tree looks like a snapshot.
  • Easy ssh based security.
  • You can do either a push or a pull depending on your security requirements.

The first thing to do is to read about the rsync --link-dest option. It lets you use hard links to share the contents of files across days of your backups.

The second thing to do is to decide on your backup strategy. For many machines I just keep 7 days of backups; it makes things easy. There is a backup for each day of the week and they overwrite as the week wraps. For other applications where I have to go further back I rename directories, much like logrotate would.

The third thing to think about is what happens if your ssh key or rsync password is compromised. If you are running backups from cron, then there will be a machine readable key on your machine somehow. This may or may not be a danger depending how you have things secured. In my setups if you could get the key you could have gotten the data anyway. (Remember that your backup archive machine needs to be at least as secure as the live machine.)

Enough talking, more sample code:

Scenario #1: Many big machines, lots of bits to push, on the same secure network. We want to go fast. All the machines have different security policies.

I use push in this situation. There is a dedicated backup machine to receive and hold the bits. Only two trusted people have access to this machine. The backup machine runs an rsync daemon with a module for each host that lets the host write backups (only in its host specific area, write only). On each host there is a root cron job with the rsync password embedded to run the backup.

Sample backup script… cron these and offset their run times to keep contention down…

#!/bin/sh
HOST=`hostname`
export RSYNC_PASSWORD=1234567890abcdef12345678890abcdef
DAY=`date +%a | tr '[A-Z]' '[a-z]'`
case $DAY in
  sun ) PDAY=sat ;;
  mon ) PDAY=sun ;;
  tue ) PDAY=mon ;;
  wed ) PDAY=tue ;;
  thu ) PDAY=wed ;;
  fri ) PDAY=thu ;;
  sat ) PDAY=fri ;;
esac
OPTS="-aqH --link-dest=/$PDAY/ --no-devices --no-specials --exclude=/proc/ --exclude=/sys/ --exclude=/dev/ --exclude=/tmp/ --delete"
time rsync $OPTS / $HOST@warehouse.federated.com::$HOST/$DAY

Sample module from rsyncd.conf…

[nexus]
    auth users = nexus
    secrets file = /etc/rsyncd.secrets
    use chroot = yes
    path = /warehouse/nexus
    numeric ids = yes
    list = no
    read only = no
    write only = yes
    uid = 0
    gid = 0
    hosts allow = 111.222.33.44
    hosts deny = *

Scenario #2: Offsite backup of Virtual Private Server

I have a machine that lives in a hosting facility. I have broken my first rule of service providers. They are not close enough for me to pop over and wrap my hands around someone if there is a problem, so I content myself with a full backup and the ability to be up and running at a new provider in 60 minutes if needed. I don’t want any credentials sitting on a machine at the hosting facility, so I do a pull in this situation. I also use ssh to protect my data in transit, but use the rsync daemon and modules on the far side to get better control, for instance to make it read only.

Cron job on my backup server (pardon the dbclient-related noise; that machine has to run dropbear instead of a more common ssh. But do notice that I have a --rsh to force a tunnel and a :: to use daemon mode and modules.)

#!/bin/bash

function saveone () {
    TAG=$1
    SRC=$2

    rm -rf vhosts/$TAG.9
    [ -d vhosts/$TAG.8 ] && mv vhosts/$TAG.8 vhosts/$TAG.9
    [ -d vhosts/$TAG.7 ] && mv vhosts/$TAG.7 vhosts/$TAG.8
    [ -d vhosts/$TAG.6 ] && mv vhosts/$TAG.6 vhosts/$TAG.7
    [ -d vhosts/$TAG.5 ] && mv vhosts/$TAG.5 vhosts/$TAG.6
    [ -d vhosts/$TAG.4 ] && mv vhosts/$TAG.4 vhosts/$TAG.5
    [ -d vhosts/$TAG.3 ] && mv vhosts/$TAG.3 vhosts/$TAG.4
    [ -d vhosts/$TAG.2 ] && mv vhosts/$TAG.2 vhosts/$TAG.3
    [ -d vhosts/$TAG.1 ] && mv vhosts/$TAG.1 vhosts/$TAG.2
    [ -d vhosts/$TAG.0 ] && mv vhosts/$TAG.0 vhosts/$TAG.1
    [ -d vhosts/$TAG ] && mv vhosts/$TAG vhosts/$TAG.0

    RSYNC_PASSWORD=a8e261e7bac90138087f770caa5fea5b
    export RSYNC_PASSWORD

    OPTS="-aqHz --bwlimit=400 --exclude lost+found --exclude /tmp --exclude /var/tmp --exclude /proc --exclude /sys --no-devices --no-specials --delete"

    rsync $OPTS --rsh "dbclient -l root -i .ssh/id_archivist.db" \
        --link-dest=/home/archivist/vhosts/$TAG.0/ \
        $SRC /home/archivist/vhosts/$TAG/ > ~/$TAG.log
}

saveone studt-net rhth.lunarware.com::rhth

~root/.ssh/authorized_keys on the virtual private server (look at the bit in front of the ssh-dss; it restricts what that key can do, in particular it makes it only able to run the rsync daemon):

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/rsync --server --daemon ." ssh-dss adfeadfaefasdfefeI_DELETED_MY_KEY_HEREadfasdfefadfae backups

rsyncd.conf

[machine]
    auth users = archivist
    secrets file = /etc/rsyncd.secrets
    path = /
    numeric ids = yes
    list = no
    read only = yes
    write only = no
    uid = 0
    gid = 0

There you have it. Reasonably safe backups. There is room for improvement: for instance, rather than coming straight into root with the restricted command it could use a different account and “super” to run the command, and it should check the source IP and only work from the backup machine.

 

Eww, nasty double spacing of the code segments. I'll have to think about how to fix that. Safari put each line into its own div for some reason.