Technical Difficulties from on Top of the Mountain: August 2009

2009-08-21

The same old code.

Today I was playing with Cygwin, and decided to cat all the C source code on my machine to the screen. Something like this:

find . -name '*.c' -exec cat {} \;

No particular reason, and it didn't take that long since it was all coming off a solid state drive, though it did pause a couple times ... But the very last lines were:

int main(void) {
        printf("hello world\n");
        return 0;
}

Some pretty old code there.

¶ 11:33 AM

2009-08-13

Program to program communication

Communicating between programs usually means across computers, especially when it's a client program (or web browser) talking to a server somewhere else. For that you almost universally use an INET socket (TCP, or UDP if you're a glutton for punishment).

There used to be a bunch of other protocols, but IP pretty much crushed them all. Ungermann-Bass had its own protocol stack back when the internet was starting to form, Digital Equipment Corp had DECnet, IBM had SNA, and Novell NetWare used IPX. But TCP/IP was good enough, pretty darn simple, and as we went from 1 Mbit to 10,000 Mbit, other things became the bottleneck. Sure, if you run a telco, you may still wax on about ATM, but even the ATM network is carrying TCP traffic.

On a server however, you have a fair amount of traffic staying on the same machine, passing back and forth between different programs; usually a result of dividing a problem down into smaller parts that are hopefully harder to mess up. So you need a mechanism for setting up connections and passing messages.

Even in this case you could use INET sockets, but that's not your only option, nor is it the best choice, for a couple of reasons. First, there's a lot of overhead to INET sockets. Even if your packet isn't going to cross great distances, the operating system still does all the per-packet work as if it would. This puts a limit on the number of packets you can read and write, which is especially a problem if your communication is a bunch of short messages. Second, when you create an INET service, it is visible to the network beyond your computer (unless you bind it only to the loopback address), allowing anyone running a portscan to find it. So speed and security are both good reasons to look elsewhere.

As the workstation market began to grow and AT&T decided to wade back into the unix market, they added new kernel services to System V, called IPC (inter-process communication). There were three parts: semaphores, shared memory and message queues. Semaphores let processes hand off access to or control of a shared resource, shared memory seemed like a good way to avoid having to pass around large data sets when disk access was expensive, and message queues gave you both an orderly mechanism for passing data and atomic handling of each message. Unfortunately, at the time all these data structures existed in kernel memory, and they were fixed in size (originally compiled into the kernel settings), so on a typical machine they were ridiculously small. On one HP machine with 64MB, the limits shown were 64 semaphores, a 4k message queue, and 64k of shared memory. Even today on a machine with 2GB of RAM, ipcs -l shows a queue size limit of 16k.
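For reference, here is roughly what the message queue part of System V IPC looks like from C. This is only a minimal sketch, assuming a Linux-like system with System V IPC enabled; the helper name sysv_queue_roundtrip, the struct demo_msg layout and the 64 byte payload are made up for illustration. It sends one message through a private queue and reads it back within a single process.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* msgsnd()/msgrcv() expect a long message type followed by the payload. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/* Hypothetical helper: push one message through a private System V queue
 * and read it back. Returns 0 on success, -1 on failure. */
int sysv_queue_roundtrip(const char *text, char *out, size_t outlen)
{
    struct demo_msg sent = { .mtype = 1 };   /* mtext starts zero-filled */
    struct demo_msg got;
    int rc = -1;

    assert(out != NULL && outlen > 0);

    /* IPC_PRIVATE asks the kernel for a fresh queue with no shared key. */
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    /* Leave the last byte alone so the payload stays NUL-terminated. */
    strncpy(sent.mtext, text, sizeof sent.mtext - 1);

    /* Each message goes in and comes out as one whole unit. */
    if (msgsnd(qid, &sent, sizeof sent.mtext, 0) == 0 &&
        msgrcv(qid, &got, sizeof got.mtext, 1, 0) != -1) {
        strncpy(out, got.mtext, outlen - 1);
        out[outlen - 1] = '\0';
        rc = 0;
    }

    /* Queues live in the kernel until removed, so clean up explicitly. */
    msgctl(qid, IPC_RMID, NULL);
    return rc;
}
```

In real use the sender and receiver would be separate processes agreeing on a key (for example via ftok()) instead of IPC_PRIVATE, and the small kernel limits described above would apply to whatever you tried to queue up.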

Moving on.

On unix, creating the raw socket() itself is protocol neutral. It's just that most everyone in the universe uses INET. You can also use sockets for RAW packets, ATM, AppleTalk, IPX, X.25; and one more format called AF_UNIX (although it's now referred to as AF_LOCAL for POSIX reasons, the structs are still all _un, as in sockaddr_un). An AF_UNIX socket is for communicating locally on the machine. Originally, like INET sockets, there was a private namespace, using 32-bit numbers which you used for the "port" number, but then they expanded it to also allow you to map the socket into the filesystem, so you would get a file that showed up like this:

% ls -lF
total 0
srw------- 1 woolstar users 0 2009-06-26 23:46 agent.16975=

ssh-agent uses this to allow ssh processes to authenticate against a stored key. Back when this first showed up, on some OSes you could actually just talk to this entry like it was an ordinary device. That was sort of the spirit of unix: everything was a file. You could open up /dev/serial0 the same way you would open up foo.txt. So back in the day, you could open this file-mapped socket, send it some data, and then close it, and the interaction on the server side would look just like you had telnet'd in. Sadly, AF_UNIX sockets don't work like this any more. Even though they're sitting right there in the filesystem, you can't just echo "hi" > mytest.sock. You have to use socket functions to connect to it and read and write to it.
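To make "you have to use socket functions" concrete, here is a minimal sketch of both ends of a filesystem-mapped AF_UNIX socket. The function names (unix_listen, unix_connect, unix_roundtrip) are hypothetical, and both ends are squeezed into one process purely to keep the example short; a real server and client would be separate programs.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill in a sockaddr_un for path; fails if the path will not fit. */
static int make_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof addr->sun_path)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Server side: bind() is what creates the socket entry in the filesystem. */
int unix_listen(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (make_addr(&addr, path) == -1)
        return -1;
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;
    unlink(path);                       /* clear a stale entry, if any */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 5) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: reach the entry with connect() rather than open(). */
int unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (make_addr(&addr, path) == -1)
        return -1;
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical demo: send text from the client end, read it on the server end. */
int unix_roundtrip(const char *path, const char *text, char *out, size_t outlen)
{
    int rc = -1, listener, client = -1, conn = -1;
    ssize_t n = 0;

    assert(out != NULL && outlen > 0);

    if ((listener = unix_listen(path)) == -1)
        return -1;
    /* connect() completes against the listen backlog, before accept(). */
    if ((client = unix_connect(path)) != -1 &&
        (conn = accept(listener, NULL, NULL)) != -1 &&
        write(client, text, strlen(text)) == (ssize_t)strlen(text) &&
        (n = read(conn, out, outlen - 1)) >= 0) {
        out[n] = '\0';
        rc = 0;
    }
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    close(listener);
    unlink(path);                       /* the entry outlives the socket otherwise */
    return rc;
}
```

Something like unix_roundtrip("/tmp/mytest.sock", "hi", buf, sizeof buf) should leave "hi" in buf. Note the unlink() calls: the filesystem entry is not cleaned up automatically when the socket closes.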

Still, if you control both sides, the upside is definitely worth it. In some tests I did around 2001, for smaller packets on a single processor machine, AF_UNIX connections could out-run AF_INET by over ten to one. Also, AF_UNIX allows some funky *magic* data to be passed between processes. For example, one process can pass an open file handle over to another process. You can also authenticate your user id and group across an AF_UNIX connection and the kernel will validate you to the recipient. And as I mentioned before, AF_UNIX connections are only available on the machine, so there's no outside hacking into these services.
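The file handle trick is done with ancillary data of type SCM_RIGHTS on sendmsg() and recvmsg(). Below is a minimal sketch, assuming a Linux-like system; send_fd, recv_fd and fd_passing_demo are hypothetical names, and the demo passes the read end of a pipe across a socketpair() inside one process just to show the mechanics.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_control {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Point msg at a one-byte data buffer plus the control buffer. */
static void prep(struct msghdr *msg, struct iovec *iov, char *byte,
                 union fd_control *ctl)
{
    memset(msg, 0, sizeof *msg);
    memset(ctl, 0, sizeof *ctl);
    iov->iov_base = byte;
    iov->iov_len = 1;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = ctl->buf;
    msg->msg_controllen = sizeof ctl->buf;
}

/* Send descriptor fd across the AF_UNIX socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';            /* one ordinary data byte rides along */
    struct iovec iov;
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    prep(&msg, &iov, &byte, &ctl);
    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS; /* marks the payload as file descriptors */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof fd);
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from sock. Returns the new fd, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov;
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    prep(&msg, &iov, &byte, &ctl);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof fd);
    return fd;                  /* a new descriptor for the same open file */
}

/* Hypothetical demo: pass a pipe's read end, then read through the copy. */
int fd_passing_demo(const char *text, char *out, size_t outlen)
{
    int sv[2], p[2], got, rc = -1;
    ssize_t n = 0;

    assert(out != NULL && outlen > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (pipe(p) == -1) {
        close(sv[0]); close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], p[0]) == 0 && (got = recv_fd(sv[1])) != -1) {
        if (write(p[1], text, strlen(text)) == (ssize_t)strlen(text) &&
            (n = read(got, out, outlen - 1)) >= 0) {
            out[n] = '\0';
            rc = 0;
        }
        close(got);
    }
    close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return rc;
}
```

The receiving side ends up with its own descriptor number referring to the same open file, which is why the demo can read what was written to the pipe through the received copy.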

But for my current project, not being able to use AF_UNIX sockets transparently was a bummer. I have a process that runs and then opens up an external file to write log entries to. I want those entries to go immediately into another process for processing, so I wanted to throw a named socket into the file system and have the first program log to that "file". The first process is this big third-party server that I didn't want to have to mess with, so spoofing a file would have been ideal. Luckily there's something else available now that will do it.

fifo(7) or named pipes

A fifo is like a queue, where you put several things in, and then pull them out. In this case the first thing you put in is the first thing that comes out (First In First Out ... fifo). There's also FILO, but it doesn't make as good an acronym in my opinion. The modern Linux kernel allows you to create a named pipe in the file system, and as an improvement, you don't even have to have any active processes attached to either end. It can just be sitting there. Then when you want, you attach and try to read from it, which doesn't do much because there's nothing in it. When someone else comes along and writes to it, the message shows up at the reader.

There are some caveats of course. If no one is hanging around waiting for the message, you can't write to the named pipe. The kernel isn't going to save things up for you. Also, if several processes are writing to a named pipe, the reader can't tell them apart. It's not like sockets, where each connection has its own file handle and you can tell the lifespan of each client. But for the purposes of log processing it will do just fine. At least I hope it will. I will have to get back to you on that.
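Here is a minimal sketch of the named pipe behaviour described above, assuming a Linux-like system. The function name fifo_demo and its parameters are hypothetical, and reader and writer share one process only to keep it short. It first shows the "nobody is listening" caveat, then attaches a reader and writes to the pipe the way a logging program would write to an ordinary file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical demo. Returns 0 on success, -1 on failure.
 * *no_reader_errno receives the errno from trying to open the fifo
 * for writing while no reader is attached (0 if that open succeeded). */
int fifo_demo(const char *path, const char *text,
              char *out, size_t outlen, int *no_reader_errno)
{
    int r, w, rc = -1;
    ssize_t n = 0;

    assert(out != NULL && outlen > 0 && no_reader_errno != NULL);

    unlink(path);                       /* clear a stale entry, if any */
    if (mkfifo(path, 0600) == -1)       /* the pipe now sits in the filesystem */
        return -1;

    /* Caveat: with no reader attached, a non-blocking open for writing
     * fails. A plain blocking open would just wait for a reader. */
    w = open(path, O_WRONLY | O_NONBLOCK);
    *no_reader_errno = (w == -1) ? errno : 0;
    if (w != -1)
        close(w);

    /* Attach the reader. O_NONBLOCK lets open() return before any
     * writer has shown up. */
    r = open(path, O_RDONLY | O_NONBLOCK);
    if (r == -1) {
        unlink(path);
        return -1;
    }

    /* The writer treats the fifo like any other file: open and write. */
    w = open(path, O_WRONLY);
    if (w != -1) {
        if (write(w, text, strlen(text)) == (ssize_t)strlen(text) &&
            (n = read(r, out, outlen - 1)) >= 0) {
            out[n] = '\0';
            rc = 0;
        }
        close(w);
    }

    close(r);
    unlink(path);
    return rc;
}
```

For the log processing case, the reader would be a long-running process that keeps the fifo open, and the third-party server would simply be pointed at the fifo's path as its log file.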

¶ 9:00 PM

2009-08-10

Disturbing old things.

There was a mouse down in the basement this weekend. The kids got all excited, until the mouse was smushed, and then they were all like, "papa, time to clean up." Note for future adventures: smashing is the most effective approach when hunting mice. Very hard to stab a mouse.

In the process of cornering the critter, I had to disturb a pile of wood pellet sacks from last winter's burning operation. I was saving those to count them, but I've done a terrible job of keeping track of how much I actually burnt per month.

I have some notes in my Twitter account, which turns out to be a terrible place to put anything you want to look up later. The best I can find is numbers for January and March, 1,400 and 1,600; and 1,320 for April. Overall I had six tons, and there are about 18 bags left out of 300.

It seems I burn as many pellets as I have. So the lesson there for conservation: buy fewer pellets. Temperatures are already getting down to the 40s at night, so we'll see how that goes.

¶ 12:00 PM