Any ordinary robot could see from a mile away that this wasn't kitten.

Monday, October 19, 2009

5:45PM - Lisp libraries

I've released a few Common Lisp libraries this month:


Calispel is a Common Lisp library for thread-safe message-passing channels, in the style of the occam programming language.

Calispel channels let one thread communicate with another, facilitating unidirectional communication of any Lisp object. Channels may be unbuffered, where a sender waits for a receiver (or vice versa) before either operation can continue, or channels may be buffered with flexible policy options.

Because sending and receiving on a channel may block, either operation can time out after a specified amount of time.

A syntax for alternation is provided (like ALT in occam, or Unix select()): given a sequence of operations, any or all of which may block, alternation selects the first operation that doesn't block and executes associated code. Alternation can also time out, executing an "otherwise" clause if no operation becomes available within a set amount of time.
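
To make the channel and alternation ideas concrete, here is a rough Python sketch. It is not Calispel's API: the names `send`, `receive`, and `alternate` are hypothetical, a capacity-1 `queue.Queue` only approximates a channel (a true unbuffered rendezvous is not modeled), and alternation is done by polling for simplicity.

```python
import queue
import time


def send(channel, value, timeout=None):
    """Put a value on the channel; return False if the timeout expires first."""
    try:
        channel.put(value, timeout=timeout)
        return True
    except queue.Full:
        return False


def receive(channel, timeout=None):
    """Return (True, value), or (False, None) if the timeout expires first."""
    try:
        return True, channel.get(timeout=timeout)
    except queue.Empty:
        return False, None


def alternate(clauses, timeout, otherwise, poll_interval=0.01):
    """Run the handler of the first channel that has a value ready.

    clauses is a list of (channel, handler) pairs, tried in order.
    If nothing becomes ready within `timeout` seconds, call otherwise().
    """
    deadline = time.monotonic() + timeout
    while True:
        for channel, handler in clauses:
            try:
                value = channel.get_nowait()
            except queue.Empty:
                continue
            return handler(value)
        if time.monotonic() >= deadline:
            return otherwise()
        time.sleep(poll_interval)
```

A real implementation would block on a condition variable rather than poll; the sketch only shows the shape of "first operation that doesn't block wins, otherwise time out."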


jpl-queues is a Common Lisp library implementing a few different kinds of queues.

These kinds of queues are provided:

  • Bounded and unbounded FIFO queues.
  • Lossy bounded FIFO queues that drop elements when full.
  • Unbounded random-order queues that use less memory than unbounded FIFO queues.

Additionally, a synchronization wrapper is provided to make any queue conforming to the jpl-queues API thread-safe for lightweight multithreading applications. (See Calispel for a more sophisticated CL multithreaded message-passing library with timeouts and alternation among several blockable channels.)
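
Two of these queue kinds are easy to illustrate in Python. This is a sketch of the general ideas, not jpl-queues' API or implementation: the class and method names are hypothetical, the lossy queue here is assumed to drop the oldest element (the text above doesn't say which element gets dropped), and the memory saving of the random-order queue is assumed to come from using a plain array with no ordering bookkeeping.

```python
import random
from collections import deque


class LossyBoundedFifo:
    """Bounded FIFO that silently drops the oldest element when full (assumed policy)."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def enqueue(self, item):
        self._items.append(item)  # deque(maxlen=...) discards from the left when full

    def dequeue(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


class RandomOrderQueue:
    """Unbounded queue that returns elements in arbitrary (random) order."""

    def __init__(self, rng=None):
        self._items = []
        self._rng = rng or random.Random()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from empty queue")
        # Swap a random element to the end, then pop it in O(1).
        i = self._rng.randrange(len(self._items))
        self._items[i], self._items[-1] = self._items[-1], self._items[i]
        return self._items.pop()

    def __len__(self):
        return len(self._items)
```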


SimpSamp is a Common Lisp library for simple random sampling without replacement. In English, that means it will return a requested number of random elements from a set, without duplicating any of the elements.

For instance, if you have a list of 1,000,000 elements and want 1,000 unique random elements from the list, SimpSamp will do the trick.

SimpSamp implements two algorithms: selection sampling and reservoir sampling, both described in The Art of Computer Programming.
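
For the curious, both algorithms fit in a few lines. The following is a Python sketch of the textbook versions (Knuth's Algorithm S and Algorithm R), not SimpSamp's code; the function names are made up for illustration.

```python
import random


def selection_sample(items, n, rng=random):
    """Selection sampling: needs the total size up front, keeps input order."""
    total = len(items)
    if not 0 <= n <= total:
        raise ValueError("sample size out of range")
    chosen = []
    for seen, item in enumerate(items):
        if len(chosen) == n:
            break
        # Select with probability (still needed) / (still remaining).
        if rng.random() * (total - seen) < n - len(chosen):
            chosen.append(item)
    return chosen


def reservoir_sample(iterable, n, rng=random):
    """Reservoir sampling: one pass, works when the total size is unknown."""
    reservoir = []
    for seen, item in enumerate(iterable):
        if seen < n:
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / (seen + 1).
            j = rng.randrange(seen + 1)
            if j < n:
                reservoir[j] = item
    return reservoir
```

Selection sampling suits an in-memory list of known length; reservoir sampling suits a stream where you only find out the length when it ends.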

Current music: Super Mario Brothers, world 2-1

Monday, July 27, 2009

3:33PM - Releasing Common Lisp geospatial libraries

cl-wkb implements the OGC Well-Known Binary geographic geometry data model, and provides WKB encoding and decoding functionality.
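
To give a feel for the format, here is a small Python sketch that encodes and decodes a single WKB point. It is not cl-wkb's interface (the function names are hypothetical) and it handles only the 2D Point geometry type: one byte-order byte (0 for big-endian, 1 for little-endian), a 32-bit geometry type (1 for Point), then X and Y as 64-bit floats.

```python
import struct

WKB_POINT = 1  # geometry type code for a 2D Point


def encode_point(x, y, little_endian=True):
    """Encode a 2D point as 21 bytes of WKB."""
    order = "<" if little_endian else ">"
    return struct.pack(order + "BIdd", 1 if little_endian else 0, WKB_POINT, x, y)


def decode_point(data):
    """Decode a WKB 2D point into an (x, y) tuple."""
    if data[0] not in (0, 1):
        raise ValueError("unknown byte-order marker: %r" % data[0])
    order = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack_from(order + "Idd", data, 1)
    if geom_type != WKB_POINT:
        raise ValueError("not a WKB Point (type %d)" % geom_type)
    return x, y
```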

Three people have asked for it—two in one day—so today I've released it under a BSD-like license.

Probably of more use to Lisp GIS hackers (together with cl-wkb, or overall), I'm also throwing in cl-geo, a set of geographic data structures and operations.

I hope some people find it to be useful.

Sunday, July 19, 2009

9:34PM - How to manage a set of servers

This article discusses sane system administration for a group of similarly-configured machines. It's also the basis for a talk that I'd like to give at the Spokane Linux User's Group.

Due credit: this is essentially my digested version of the concepts presented in the paper Bootstrapping an Infrastructure, and the derived material at http://www.infrastructures.org/. Some bits, like unified user accounts and NFS home directories, aren't important to me, so I haven't included them here. And I've been a lot more specific about choices of tools than the infrastructures.org folks have—you'll find a general Debian bias. =)

Due disclosure: I'm only in the initial stages of implementing for myself what I describe here. I'm writing this while the goals and principles are fresh in my mind. Actual mileage may vary, especially around section 5.


Your current situation may look something like this: you manage several machines—could be a few or it could be hundreds. These machines could be servers, routers, or workstations. You have certain preferences and practices for what you like to have in common for all your machines.

  • Security updates should run at 2:10am.
  • All my web servers should have these bits of configuration.
  • All my firewalls should have these particular rules.
  • I really want curl and wget on all my machines, and pv is a neat little utility that I want everywhere, too.

But there are problems:

  • You find that your machines, in aggregate, easily diverge from your ideal vision, gradually decaying into chaos.
  • You find yourself doing the same thing over and over again. Shell loops and SSH can only get you so far, and mistakes lead to even greater divergence among machines.

You want the things that should be in common to machines to actually be consistent. You want to quit repeating yourself.

Here, we'll be discussing what to do about your situation.

1. Figure out how to install a consistent machine image, automatically if you can.

I'm using Debian preseeding because I use Debian, and FAI is too complex and hard to learn. FAI can manage machine configuration, and to some extent it has to, but that overlaps with what I want Puppet to do (see below). Debian preseeding is "as simple as possible, but no simpler."

2. Learn the hierarchy of data.

Your infrastructure is divided into a number of machines. The data on each machine is divided into system and non-system data.

System data means installed programs, initialization scripts, and other stuff that comes from your OS distribution. System stuff is boilerplate. It's easy to replace this.

Non-system data is everything that makes a machine unique. It is further divided into configuration and local state.

Configuration is easy: parameters that define what a machine is supposed to do, and how it's supposed to do it. This includes the selection of installed software packages, and most of everything in /etc.

Local state is the non-configuration, non-distribution-supplied data that your applications need in order to operate. This would include the web root hierarchy of a web server, the mail spool and mailboxes on a mail server, or the zone files of a DNS server. Note that this includes both machine-generated state (mail spool directories) and human-generated state (web roots). The local state includes most of everything in /var.

3. Understand key goals when it comes to machine data.

System data can be thrown away and replaced easily, because you can reimage a new machine consistently.

Configuration data should largely be consistent across machines.

  • Some things will always vary: hostname, IP address, and so forth.
  • Some things you'll want to be standardized: NTP configuration, smarthost configuration for non-mail servers, OS package repository selection, a set of software packages you want to have available on all machines, and so forth.
  • Some things will be standardized, but only for certain classes of machines: web servers will always have Apache with certain configuration bits, mail servers will always run Postfix with other configuration bits—or whatever you prefer, of course.

A machine's configuration is important—of course you don't want to lose it. But we'll be managing configuration from a central location, so if it gets physically lost on a particular machine, we can recreate it with the configuration management system (discussed below).

Local state is the crux of a machine's data. Your web server just won't work correctly unless you have the right HTML pages in place. You cannot lose this data, so you'll back it up regularly. With the methodology described here, it won't be hard to make this automatic.

3. Implement a system for centralized, managed configuration.

You have an ideal vision of what the common configuration of your machines should be. You need the ability to express that configuration, as well as the parts that differ on each machine. The differences are conditional on certain variables: what's the MAC address or the hostname of the machine? What's the role or class (web, mail, etc.) that I've assigned to it? And so forth.

You need to be able to express this information in one place. When you make a configuration change, your running machines will synchronize to reflect that change.
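
As a toy illustration of "common configuration plus conditional differences, expressed in one place," here is a Python sketch. It is not how Puppet or Cfengine express things; the data layout, the `resolve` function, the MAC address, and the IP address are all made up for the example. The package names and the update time come from the wish list earlier in this article.

```python
# Settings every machine gets.
COMMON = {
    "packages": ["curl", "wget", "pv"],
    "security_updates_at": "02:10",
}

# Extra settings per class (role) of machine.
CLASSES = {
    "web": {"packages": ["apache2"]},
    "mail": {"packages": ["postfix"]},
}

# The things that always vary, keyed by MAC address (hypothetical values).
HOSTS = {
    "00:16:3e:00:00:01": {"hostname": "web1", "ip": "192.0.2.10", "classes": ["web"]},
    "00:16:3e:00:00:02": {"hostname": "mail1", "ip": "192.0.2.20", "classes": ["mail"]},
}


def resolve(mac):
    """Build the full configuration for one machine from the central data."""
    host = HOSTS[mac]
    config = {
        "hostname": host["hostname"],
        "ip": host["ip"],
        "security_updates_at": COMMON["security_updates_at"],
        "packages": list(COMMON["packages"]),
    }
    for cls in host["classes"]:
        config["packages"] += CLASSES[cls]["packages"]
    return config
```

Adding a machine then means adding one entry to `HOSTS`; everything it has in common with its peers is inherited rather than retyped.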

A configuration management system should include:

  • A procedure to express configuration in this fashion, and
  • The tools for your machines to apply the configuration.

Puppet and Cfengine are contenders for this position.

Add a client for Puppet or Cfengine to your standard machine installation image. Your new machines will automatically configure themselves.

  • Installing a machine from your installation image results in a plain, stripped-down configuration.
  • The configuration management system updates the new machine's configuration to whatever it needs to be, as you define in the managed configuration.

You'll also want to log, timestamp, and document your changes to the managed configuration. You'll want to see which of your admins made what change, and why. Which files were touched? What were the lines that were altered? What did the file look like before the change?

Applying a revision control system (such as Git, Darcs, or Subversion) to your managed configuration makes this possible. When you do this, you have a complete history of configuration changes, and can roll back changes that turn out to be mistakes.

4. The golden rule.

Now that you have a configuration management system in place, follow our golden rule for infrastructures:

Never, ever change the configuration data or system data of a machine in a way which is outside the control of your configuration management system.

(...also known as "cowboy sysadmining.") It's very tempting to do, because it's the quick and lazy thing. It's also quite common in many (if not most) IT shops. However, this causes long-term damage.

A violation of this rule could be considered a defect or an error in the machine. When you break this rule, a machine diverges from your ideal vision. One of two things can happen:

  1. The configuration management system detects the error and corrects it, undoing your work.
  2. The error goes undetected, and the machine is permanently different from all the other machines in your infrastructure.

Possibility #2 is the more dangerous of the two. In this case:

  • The machine behaves unpredictably. It's different from all the other machines of its type, so it's hard to reason about its behavior.
  • Personnel trained to Do The Right Thing by using the configuration management system will probably not be aware of the variation. There's no automatic documentation or audit trail, like there is with a revision-controlled configuration management system.
  • If the machine needs to be replaced, the new machine will not carry this change, since the change isn't tracked. So, the replacement of a machine is now a dangerous and risky undertaking.

I would go so far as to say it's not worth doing configuration management if you won't stick to this golden rule. After all, you wanted predictability in your infrastructure, right? You wanted to keep the important stuff in common, right?

If you find yourself with a compelling reason to compromise on this, think carefully about how to do it safely. Develop a procedure comparable to how you would update the managed configuration, to make sure that your infrastructure doesn't gradually descend into chaos once again.
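
The two possible outcomes of breaking the rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: files are modeled as in-memory dictionaries mapping paths to contents, and `find_drift` and `converge` are hypothetical names, not functions of any real configuration management tool.

```python
import hashlib


def fingerprint(content):
    """Content hash used to compare a managed file with what's on the machine."""
    return hashlib.sha256(content).hexdigest()


def find_drift(managed, on_machine):
    """Return the managed paths whose on-machine content is missing or different."""
    drifted = []
    for path, content in managed.items():
        actual = on_machine.get(path)
        if actual is None or fingerprint(actual) != fingerprint(content):
            drifted.append(path)
    return sorted(drifted)


def converge(managed, on_machine):
    """Possibility #1: overwrite drifted files, undoing any manual edits."""
    for path in find_drift(managed, on_machine):
        on_machine[path] = managed[path]
```

Note what the sketch cannot see: a hand-edited file that isn't in `managed` at all never shows up in `find_drift`. That is possibility #2.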

5. Reap the rewards.

Need to add a new web server? It's as simple as:

  • Imaging a machine,
  • Updating your managed configuration to say "the machine with MAC address <foo> has IP address <bar>, and it's a web server,"
  • And letting the machine configure itself.

Now just drop in your web content.

Need to replace the failed hard disk of a critical server? This, too, should be pretty easy:

  • Image a machine.
  • Restore the backup you made of the local server data onto the new machine.
  • Update your managed configuration to say "this particular critical server no longer has the MAC address <foo>. Now it's <bar>."
  • Watch the machine configure itself.

If you did everything right, you'll see your replacement machine come up and take off running. Just like that.

Welcome to the future.

A. "But that would be putting all my eggs in one basket. And then someone pwns my basket."

It's true. Implementing a configuration management system on a centralized server is a security risk. That's because each of your managed machines will do whatever your configuration management server tells them to do, automatically. If a bad guy cracks your management server, they could change the configuration to include "please wget r00tsh3ll-3.1.337.tar.gz from my web server, untar it to /tmp, and move this executable file to this system binaries directory, and run it."

But this is a risk that can be managed. And if you manage it well, it's probably less risky than what you're doing now.

You have to log into your servers for maintenance somehow. In non-managed situations, that "somehow" is probably "with SSH from a system administrator's workstation or laptop." If a cracker gets access to the admin's computer, they can spread their compromise to every other machine that it has access to.

With a centralized, formal configuration management server, you'll be able to lock that server down and make sure nobody's visiting Flash-laden porn web sites on the critical server.


See also: http://www.infrastructures.org/

Wednesday, October 29, 2008

10:43AM - Slashdot eternal redux

Every year:

  • "Digital Dark Age" – preservation of information on aging media
  • The Internet is Broken – lack of authentication/trust considerations in major protocols

Every 3 months:

  • Microsoft is doing something – how can we make them look bad and accuse them of monopolization?

Every 4 years:

  • E-voting – it's about more than just killing puppies!

What am I missing?

Current music: Rob N Tug / Fabric 30 / Lifelike and Chris Menace Discopolis: Defected

Monday, September 22, 2008

8:33PM - Yost serial wiring

Warning: only the most pathologically geeky need continue.

Today I got my serial situation sorted out.

I use RS-232 (serial ports) for machine management on an out-of-band medium. If the network is unavailable and I can stick a serial cable into some machine, I can log into it, poke around, fix whatever needs to be fixed, and watch most bootup messages (after POST, beginning with the bootloader), all without needing a video signal or a keyboard. (Expensive motherboards and expensive add-on cards will also capture BIOS-printed text to a serial port; of course Suns and serial-equipped NewWorld Macs do this as a matter of course. I think the most Enterprisey versions of Windows support a serial console, but I don't have any experience with that.)

That's handy, because video only goes so far. If I have a serial cable between machine X and machine Y already, I can log into and monitor machine Y by SSHing into machine X, no matter where I am physically.

It's a little known secret that serial cables you'll find at RadioShack or Best Buy are outrageously overpriced. Actually, it's a well-known fact. Today I received my order of RJ45 to DB-9 and RJ45 to DB-25 adapters, 34 in all. That plus the abundance of free cable I'm sitting on and admittedly more time than I was expecting adds up to hundreds of dollars worth of serial cable. At whatever arbitrary lengths I need. The whole thing came to under $25 shipped from monoprice.com.

Here's how it works:

  • First I wired the adapters I'll be using. These take an RJ45 plug (like Ethernet) and give you some sort of DB-9 or DB-25 connector (like most serial ports use).

    The wiring is fixed per device, and considered a semi-permanent fixture. Therefore, I screw each of the adapters into each port that they'll live on for the foreseeable future. Even if I'm not using that port now; even if I change which other device a cable on that port runs to, the adapter never changes.

  • Then I cut off lengths of 8-conductor cable. It can be just about anything; I have mostly Cat-5 lying around (as do most people (that have read this far)), but I also happen to have some 4-line telephone cable, which is non-twisted and therefore useless for even 10BaseT. Perfect for this.

    The pinouts I use for the cables are always the same for this system.

  • Voilà! I plug any cable in-between any two devices with these adapters hanging off of them. (It turns out the tabs on RJ45 connectors are exponentially more convenient than the thumbscrews of DB-9 and DB-25.)

Did you notice how I didn't mention anything about null-modem cables? Under this scheme, every cable is a null-modem cable, and every adapter is wired for the kind of device it connects to, and expects this kind of cross-over/null-modem cable. So servers and terminals are wired one way; modems are wired another. That's my favorite part about the scheme. As long as you stick to it, you don't have to worry about whether a particular cable is a null-modem cable. It's quite elegant.

What is the scheme? It's the "Yost Serial Device Wiring Standard," 21 years old now. Anyone that's doing anything with serial needs to be using this. It's loads more convenient and cheaper than ready-made serial cables.

So, instead of scrounging for one of the two serial cables I own, I've now got five cables permanently in-place. One of them is 25 feet long, strung over a doorway, instead of 10 feet long, with a bulky DB-25 null-modem in the middle connecting two segments and strewn about the floor, ready to be tripped over. I can SSH into my most trusted system to access the console of any other in my closet.

And, I've now got a UPS cable. Apparently APC considers their cables business (WTF?) to be Serious Business, and protects their trade secrets (pinouts) vigilantly.

Whatever.

I found this diagram from the makers of my UPS software.

Following this pinout, I was pleased to find that I can make Yost DB-9 M adapters for my APC UPSes and use the same serial cables with them. Assuming standard coloring (which is also the coloring used in the Yost standard), cut off the pins for brown, orange, blue, and white. Solder the brown and orange leads together, and (separately) blue and white. Cut off green's pin and solder it onto red's pin (or vice-versa) as you would any other Yost adapter. Then push the black pin into position 1 on the DB-9 connector, the yellow pin into position 2, and the red/green pin into position 9. There's your $30 cable.


Needless to say, the biggest downside of going this route was that it took me about 4 hours to wire just 23 adapters. But that's a one-time cost; making new cables (as I need them) takes just a few minutes.

The next-biggest downside was that the adapters I purchased had the pins pre-crimped on the leads. Which was helpful, considering I don't have the pins or crimper for these adapters, until I realized that for each adapter, I'd have to cut off the pin of one wire, strip that wire, and solder it onto the pin of another wire: signal ground, which is green and red on the RJ-45 side, connects to just one pin on the DB-9/DB-25 side. That's what took most of the time. And mashing those now-oversized pins into the connectors.

But I'm still pleased, and glad to get this years-long todo item crossed off.

Monday, July 28, 2008

5:37AM - Common Lisp exercise #12

Fill in the blank:

(with-simple-restart (________ "We'll do it live!")
  ...)

Thursday, June 19, 2008

1:11PM - Correctness doesn't matter with Python

c.f. Issue 1085283. Python's binascii module wouldn't correctly encode binary data to MIME Quoted-Printable format. Python's quopri module depends on binascii's implementation of Quoted-Printable.

It took two years to fix two of the buglets, but they still refuse to fix the other: correctly encoding octet 0.

Python is supposed to be a high-level language, but their use of C has led to yet another stupid bug and they refuse to fix it. This is frustrating. I'm going to have to copy-and-paste the backup Python implementation of Quoted-Printable out of quopri just to do the right thing.
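
For reference, an encoder that handles every octet, including 0, doesn't take much. The following Python sketch is an independent minimal encoder for binary data, not the code from quopri: the name `qp_encode` is made up, and it takes the conservative route of encoding spaces, tabs, CR, and LF as =XX as well (which Quoted-Printable permits), inserting soft line breaks to keep lines within 76 characters.

```python
import binascii


def qp_encode(data, max_line=76):
    """Encode arbitrary bytes as Quoted-Printable, escaping every unsafe octet."""
    lines = []
    current = ""
    for octet in data:
        if 33 <= octet <= 126 and octet != ord("="):
            token = chr(octet)  # printable ASCII passes through
        else:
            token = "=%02X" % octet  # everything else, octet 0 included
        # Leave room for the "=" that marks a soft line break.
        if len(current) + len(token) > max_line - 1:
            lines.append(current + "=")
            current = ""
        current += token
    lines.append(current)
    return "\r\n".join(lines).encode("ascii")


def qp_roundtrips(data):
    """Check the encoding by decoding it with the standard library."""
    return binascii.a2b_qp(qp_encode(data)) == data
```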

Wednesday, April 30, 2008

11:30AM - En Route now supports GTFS

En Route now supports Google Transit Feed Specification for transit schedule times and vehicle paths, and my STA→En Route Spokane conversion scripts now output data in GTFS format.

This means:

  1. I have an external data model instead of my ad-hoc STA data formats. Maintenance becomes an order of magnitude easier.
  2. I can submit STA transit data to Google to include Spokane Transit with Google Transit's trip planner.
  3. Once I start importing street GIS data from OpenStreetMap, it becomes possible for other transit systems to run on my trip planner.
    • I can import data from agencies that make their Google feeds public, like Portland's TriMet.
    • I can market my trip planner to the small but possibly growing number of agencies that do Google Transit, but don't have their own trip planner. They get a trip planner for only the cost of the software, since they already put together the data (which is the expensive part).
  4. I can use standard GTFS tools to view and manipulate STA transit data. Like schedule_viewer:

Current music: Sasha & John Digweed / Northern Exposure / 2: 0°/South / Underworld: Dark & Long

Wednesday, March 5, 2008

7:00PM - En Route Systems

I'm commercializing my trip planner: say hi to En Route Systems.

Friday, February 8, 2008

3:13AM - More programmer's hubris

I do like me some acrid meta-commentary.

If like me you read too many programming blogs, then the following might ring a few bells:

It turns out there are Two Types of Programmers

Programmers like me.

and

Programmers who aren't like me.

It can only get better. (Via figg on misc_tech.)

Sunday, January 27, 2008

9:08PM

The Spokesman-Review wrote about En Route Spokane (my trip planner) in their 2008-01-21 issue, on page A10, in the technology section. You can read the article on the technology section's weblog.

Saturday, January 19, 2008

1:34AM - On CSS and non-trivial layout

I really hate it when CSS fanatics decry the use of tables for non-tabular data (read: for layout). And I'm having big problems determining why I shouldn't use tables for my widget layout.

I'll say it again: I'll use tables for layout. Tables made of ground-up puppies.

I fail to see how tables made with boiled puppy grounds are any better than a complicated div structure 14 elements deep. I like CSS and all, and will use it to style my puppy-table, but which is more "accessible" or "semantically pure": 14 nested divs or a table? (Hint: neither.) And which will take 10 hours to get right, BEFORE checking it with MSIE? (Hint: the former.)

I'm sure there's a way to do this with CSS, but it's simply too hard. The table abstraction for layout is much simpler to understand. Would table-based layout be so awful if there were a separate element like table called layout?

Will update with final resulting table so you CSS folks can do my homework for me.

Friday, January 4, 2008

6:47AM - Diversions: character Mandelbrot in Common Lisp

Mandelbrot in Common Lisp, rendered with textual characters:

Snoopy swearing

The code is even uglier.

Friday, December 21, 2007

3:31PM - Trip Planner: addresses, pretty graphics, and more

Hi.

(I'm going to be writing a lot about Trip Planner in the time to come, so bear with me.)

It now supports address geocoding, which means you can directly enter addresses, instead of street intersections. For addresses and intersections, it will correct misspellings or variations on spelling in street names.

It now has a professional layout with complete content, such as an introduction, description of how it works, and a description of how to use it effectively.

And then there's Trippy, the adorable anthropomorphic bus that every trip planner needs:

The map pane has also been enhanced, with bus stop icons and different line coloring, depending on whether you're walking or riding the bus:

  1. Gives you step-by-step bus and walking directions.
  2. Points your bus stops out on the map.
  3. Finds the best stop for your destination.
  4. Factors in your walking time and gives you a breakdown of time.

Sunday, December 9, 2007

12:49PM - plug.self.gratuitous: Trip Planner ready for public consumption

Trip Planner is a simple web application to help you figure out how to take the bus between one location (where you're at now) and another (where you'd like to be). It serves Spokane County and the Spokane Transit Authority service area.

  • Quickly and easily tells you how to get there. No searching through schedules or deciding between one route and another.
  • Finds the nearest stop to your destination and points* it out on the map.
  • Tells you when it's faster to walk than to wait for the next bus.
  • Maps out your walking path and factors in the time it takes to walk, to make sure you're there on time.
  • Finds shortcuts. If you can get there faster by transferring to a different route (even if it's a few blocks away), Trip Planner homes in.

Trip Planner is nominally ready. (Nobody told me 75% of the time would be spent in data import and massage.) Most of the glaring flaws have been polished off, it's moderately quick, and now shows you a nice non-polling Web 2.0 Loading page while it computes trips.

At this point, I'm committing to not doing development/data-import changes on the live site (my workstation); this means it's safe to use! If you tried it before and it was flaking out on you, I should have known better. It works now.

If you're in Spokane, WA, you're encouraged to try it out; even if you think you know how to get somewhere by bus, Trip Planner might find a shortcut for all but the simplest trips. Also, for you drivers, reducing your oil consumption has never been more fashionable.

If you've taken a look at Trip Planner and don't think it would do what you want from it, or if it starts giving you flak, please send me feedback.


Trip Planner is written in Common Lisp. The major libraries it uses to get stuff done are Postmodern (PostgreSQL interface, used here for PostGIS), Hunchentoot (web server), and CXML (XML library, for XHTML document generation).

Trip Planner performs searches on a graph modeling Spokane County streets and bus routes.

Edges have both a generalized weight/cost function and a time-to-traverse function; usually, the former is equal to the latter, but the cost of walking is 1.5 times the time it takes to walk, by default. Bus stops are vertices, as are the route-vertices that define the path a bus takes. Edges from bus stops to route-vertices are dynamic: their traversal time depends on what time the search visits that stop.

Dijkstra's algorithm is trivially modified to take the "leaving-at" time as a search parameter, and to take traversal time into account during the search. (This is actually not a correct algorithm for finding the minimum-cost path in a time-dependent network, but does the right thing about 95% of the time; more on that later.)
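
Here is a small Python sketch of that modified search, for illustration only. It is not Trip Planner's code (which is Common Lisp); the graph layout, the helper names `fixed` and `wait_for`, and the toy network are all assumptions. Each edge carries a time-to-traverse function of the current clock time plus a cost factor (1.5 for walking, 1.0 otherwise), and like the approach described above it is a heuristic, not a correct minimum-cost algorithm for time-dependent networks in general.

```python
import heapq
import math


def fixed(minutes):
    """Edge whose traversal time doesn't depend on the clock (walking, riding)."""
    return lambda now: minutes


def wait_for(departures):
    """Dynamic edge from a bus stop: wait until the next departure at or after now."""
    def traverse(now):
        later = [d for d in departures if d >= now]
        return min(later) - now if later else math.inf
    return traverse


def plan(graph, source, target, leave_at):
    """Dijkstra on cost, carrying the clock time along each path.

    graph maps vertex -> list of (neighbor, traverse_fn, cost_factor).
    Returns (cost, arrival_time, path), or None if the target is unreachable.
    """
    best = {source: 0.0}
    heap = [(0.0, leave_at, source, [source])]
    done = set()
    while heap:
        cost, now, vertex, path = heapq.heappop(heap)
        if vertex in done:
            continue
        done.add(vertex)
        if vertex == target:
            return cost, now, path
        for neighbor, traverse, factor in graph.get(vertex, ()):
            elapsed = traverse(now)
            new_cost = cost + factor * elapsed
            if neighbor not in done and new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, now + elapsed, neighbor, path + [neighbor]))
    return None


# Toy network; times are minutes after some arbitrary start.
GRAPH = {
    "home": [("stop_a", fixed(5), 1.5), ("office", fixed(40), 1.5)],
    "stop_a": [("bus_a", wait_for([10, 40]), 1.0)],
    "bus_a": [("bus_b", fixed(12), 1.0)],
    "bus_b": [("office", fixed(3), 1.5)],
}
```

Leaving at minute 0, the search walks to the stop and rides; leaving at minute 38, the last bus is gone by the time you reach the stop, so it walks the whole way.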

* Lie. Now implemented.

Sunday, December 2, 2007

12:08AM - Firefox 3

Firefox 2 was a terrible memory hog. In my daily usage, it would amass 300-400MB of resident RAM over the course of a couple of weeks. This made it unusable to me.

Konqueror was zippier and more memory-efficient, but would crash about once every month or two. This made it unusable to me.

Opera goes back and forth in the browser history on mouse-wheel left-scroll/right-scroll events. I couldn't find a way to disable this. This made it unusable to me.

To me, things weren't looking bright for the top three Linux browsers.

Firefox 3 beta 1, so far, has been more responsive than Konqueror and much leaner than Firefox 2—about 100MB of resident RAM over the course of four days, with about 10-20 tabs open on average.

Only 32-bit Firefox 3 beta 1 builds are made available by Mozilla. I wanted a 64-bit (x86-64) build, which should be faster, at the cost of some memory. An x86-64 build would also correctly integrate with my system, versus my 32-bit chroot environment: timezone, MIME helper applications, fonts. I thought building Mozilla would be a big, involved, treacherous process, but it turned out not to be.

Grab the source. Absorb this documentation. This is the .mozconfig I used:

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-@CONFIG_GUESS@
. $topsrcdir/browser/config/mozconfig
ac_add_options --enable-optimize='-O3 -march=athlon64'
ac_add_options --enable-default-toolkit=cairo-gtk2
ac_add_options --disable-debug
ac_add_options --enable-official-branding
ac_add_options --prefix=/usr/local/stow/firefox-3.0b1

I now have a stable, memory-efficient, 64-bit Firefox build for Linux, and all three integration problems above are solved.

Tuesday, October 23, 2007

12:59AM - On Virgin Mobile

Virgin Mobile USA is a pretty reasonable cell phone company, if you don't use your cell phone often. It's especially nice if you primarily use your landline; for less than a dollar a month, Qwest can forward calls to your landline that you don't answer (after, say, 3 rings) to your cell phone, minimizing your cell minutes while you're at home. (In an ideal world, I'd love for both phones to ring simultaneously until one of them is answered. Some day I'll do that with Asterisk.)

If you use less than 88 minutes a month, the $0.18 TALK plan is the cheapest: that's the flat rate. No monthly fee. If you use more talk time, $6.99 TALK wins: $6.99 is the monthly fee, and you pay $0.10/min on top of that. All the other, more expensive plans where you purchase a set number of minutes each month aren't worth it: if you use just the right amount of airtime, you might come out ahead; use less (by a small margin) and you pay for more than you need; use more (by a small margin) and your overage is $0.25/min.

Correction: Many of the bucket-of-minutes plans include extra or unlimited night/weekend minutes. That's kind of important, but I didn't consider that for my comparisons.
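
The 88-minute figure falls out of the two rates quoted above. A quick Python sketch of the arithmetic (the function names are just for illustration, and only the two pay-per-minute plans are modeled):

```python
FLAT_RATE = 0.18     # $/min on the $0.18 TALK plan, no monthly fee
MONTHLY_FEE = 6.99   # $/month on the $6.99 TALK plan
MONTHLY_RATE = 0.10  # $/min on the $6.99 TALK plan


def flat_cost(minutes):
    return FLAT_RATE * minutes


def monthly_cost(minutes):
    return MONTHLY_FEE + MONTHLY_RATE * minutes


def break_even_minutes():
    """Minutes per month at which both plans cost the same."""
    return MONTHLY_FEE / (FLAT_RATE - MONTHLY_RATE)


def cheapest(minutes):
    if flat_cost(minutes) <= monthly_cost(minutes):
        return "$0.18 TALK"
    return "$6.99 TALK"
```

The break-even point is 6.99 / 0.08, or about 87.4 minutes: at 87 minutes the flat rate is still a few cents cheaper, and at 88 the monthly plan pulls ahead.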

I'm only writing this because I worked out this graph a few months ago and decided to post it. See the margins?

There are two downsides. There's an effective minimum commitment to spend at least $6.66/mo on average, no matter what your plan: you are required to "top-up" (make a payment) of $20 every 90 days, even if you have a cash balance. This is cumulative, so you don't lose your old balance. Also, the marketing/customer service image: "yo yo, this is Simone, thanks for choosing Virgin Mobile, home-skillet."

But I really love VMU because you use it strictly pre-paid. They'd love to have your credit card information, but they don't need it—or any credit checks—you can pay in cash if you are so inclined. They'd also appreciate using your real name with their service, surely, but there's no compelling reason to use it, if you're so inclined. (It worked out well for Jason Bourne!)

Finally: gnuplot can be pretty fun.

Wednesday, September 19, 2007

4:34PM - Blah blah trip planner

Beta is the new hotness, so here it is, in its half-finished glory: Spokane Transit Trip Planner. Also, I need cheerleaders.

It's actually useful for planning real trips right now, but it's slow, and it doesn't track all busses, so results will not necessarily be optimal.

On the bright side, it's much prettier and implements "get me there by X o'clock."

Why use the Trip Planner?

  • You're lazy. Thinking is work. Trip Planner makes the connections for you.
  • Spokane busses are subtly timed to frustrate you and your attempts at an optimal trip. Trip Planner breaks through the fog.
  • Ever wonder if it really is faster to walk? Give in to your OCD.
  • Very soon now, Trip Planner will rule the world with an iron fist. Wouldn't it be better to get to know it now?

Riding the bus was never this cool.

Current mood: Sure

Monday, July 30, 2007

1:14AM - Public transit trip planning

So. I believe an update is in order.

On my own time, I've been writing a trip-planner for the bus system in Spokane. What this means is, you plug in a starting point, an ending point, and either what time you're leaving, or what time you want to be there by. The trip-planner tells you how to get to the nearest stop, which buses to take, which transfers to make.

The trip-planner is also smart enough to tell you if it would be faster to get off bus X, walk a few blocks, and catch bus Y. The trip-planner plans the most optimal trip. The trip-planner is infallible.

I'm writing this in Common Lisp. Represent.

The street data comes from the City of Spokane. The bus scheduling and path information came from a contact at the Spokane Transit Authority. The shortest route is found with Dijkstra's algorithm, modified to keep track of the time of day at which each known vertex is reached. When I enter the proper optimization phase, I'll consider A*.
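To make the time-of-day idea concrete, here is a minimal sketch of a time-dependent Dijkstra search. It is not the trip-planner's actual code: every name in it (walk-edge, bus-edge, earliest-arrivals, example-graph) is made up for illustration, times are assumed to be minutes since midnight, and it assumes waiting at a vertex never gets you anywhere sooner, which is what lets plain Dijkstra stay correct.

```lisp
;; Hypothetical sketch: none of these names come from the real trip planner.
;; A graph is a hash table mapping a vertex to a list of
;; (neighbor . arrival-fn). ARRIVAL-FN takes the time you reach the vertex
;; and returns the time you reach the neighbor, or NIL if the edge can no
;; longer be used.

(defun walk-edge (minutes)
  "An edge that always takes the same number of MINUTES."
  (lambda (now) (+ now minutes)))

(defun bus-edge (departures ride-minutes)
  "An edge usable only at the scheduled DEPARTURES (sorted, ascending)."
  (lambda (now)
    (let ((next (find-if (lambda (d) (>= d now)) departures)))
      (and next (+ next ride-minutes)))))

(defun earliest-arrivals (graph start start-time)
  "Return a hash table mapping each reachable vertex to its earliest arrival."
  (let ((best (make-hash-table :test #'equal))
        (done (make-hash-table :test #'equal)))
    (setf (gethash start best) start-time)
    (loop
      ;; Pick the unfinished vertex with the earliest known arrival.
      ;; A linear scan keeps the sketch short; a heap would be faster.
      (let ((u nil) (u-time nil))
        (maphash (lambda (v time)
                   (when (and (not (gethash v done))
                              (or (null u-time) (< time u-time)))
                     (setf u v u-time time)))
                 best)
        (when (null u-time) (return best))
        (setf (gethash u done) t)
        ;; Relax each outgoing edge using the time we actually reach U.
        (loop for (v . arrival-fn) in (gethash u graph)
              for arrival = (funcall arrival-fn u-time)
              for known = (gethash v best)
              when (and arrival (or (null known) (< arrival known)))
                do (setf (gethash v best) arrival))))))

(defun example-graph ()
  "Home is a 5-minute walk from a stop, or a 60-minute walk from work."
  (let ((g (make-hash-table :test #'equal)))
    (setf (gethash :home g) (list (cons :stop-a (walk-edge 5))
                                  (cons :work (walk-edge 60)))
          (gethash :stop-a g) (list (cons :work (bus-edge '(480 510) 20))))
    g))
```

Leaving :home at 470 (7:50AM), (gethash :work (earliest-arrivals (example-graph) :home 470)) gives 500: walk to the stop, catch the 480 bus, ride 20 minutes. Leaving at 506 misses the last bus, so the answer becomes the 60-minute walk.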

The bus route data is very labor-intensive to import. It'll be at least a few days before I can get all the routes in. Therefore, and also because the trip-planner may go down at any moment, display crazy incorrect data, and slow down my workstation immensely (10-60 seconds per query!), it is not yet public. Here's a screen-shot and a static copy of one particular trip, though:

The STA says they'll have their own trip-planner opened up Real Soon Now. I hope to beat them to it.

Current mood: hopeful
Current music: Orbital / The Altogether

Sunday, July 22, 2007

6:58AM - Programmers and their hubris

Read Programmers Need To Learn Statistics Or I Will Kill Them All. It's worth it for this pile of gold alone:

I’m sure you’ve all thought about it at some point. “Imagine you’re on a planet where everyone was blind, and you’re the only one with sight. How would you describe the sunset?” It’s commonly something done as an exercise in high school and it’s retarded. If this planet were populated with programmers though it would be really interesting.

Zed: Wow, the sunset here is a brilliant blue.
Joe Programmer: No, you’re fucking wrong it’s red asshole.
Zed: Uh, it’s blue. Guy with vision here. Remember?
Frank Programmer: Yeah, it’s red man. You’re an idiot. See, I can hear the way it makes the air move so I know it looks red.
Zed: Look! I’m the one who can see! It’s blue.
Joe Programmer: I have written huge web applications in every language and even programmed the original VAX. I know that sunset is red.
Frank Programmer: It’s red because of the heat it generates on my arm. Yes. I’m sure that’s it.
Zed: Fuck! Fuck! I have eyes! You do not! See!? No?! Exactly! Because you can’t fucking see because you have no fucking eyes! Arrggh! I’m going to get a burrito.
Joe Programmer: That guy is such an asshole.
Frank Programmer: Yep. Still sounds red to me though.


On another note, try my Spokane Transit trip planner—oh, it doesn't yet:

  • Tell you which busses to take,
  • Accept addresses—intersections only,
  • Account that it takes longer to walk uphill than downhill,
  • Account that it takes longer to cross busy streets than other ones,
  • Let you cut across parks,
  • Let you tell it to skip alleyways after dark,
  • Have a Glitz 2.0 user interface,
  • Run reliably,

But it will tell you the mathematically shortest walking path, assuming a naïve model of walking! The rest (2%!) comes soon.

Current music: John Digweed / Renaissance: The Mix, Part 2 / Quivver / Twist And Shout

Sunday, July 1, 2007

4:13AM - Trip-planning

I am under the age of 10 or otherwise find alleyways to be extraordinarily fun.
(Link may break at any moment; under interactive development with SLIME.)

By the power of special variables and edge-weighting... Yes! There we go. Going in a big fucking loop just to stay on alleyways.
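For the curious, the trick looks roughly like the sketch below. This is a guess at the general shape, not the planner's real code: *alley-factor*, edge-cost, and call-preferring-alleys are hypothetical names. The point is that a special (dynamically bound) variable lets one query re-weight edges without touching the graph or passing an extra argument through the search.

```lisp
;; Hypothetical sketch; these names are illustrative, not the planner's own.
(defvar *alley-factor* 1.0
  "Multiplier applied to the length of alley edges when computing cost.")

(defun edge-cost (length alley-p)
  "Cost of walking an edge of LENGTH; alleys are scaled by *ALLEY-FACTOR*."
  (if alley-p
      (* length *alley-factor*)
      length))

(defun call-preferring-alleys (thunk)
  "Call THUNK with alleys costing a tenth of their true length."
  ;; The rebinding is visible to everything THUNK calls, and it is undone
  ;; automatically when THUNK returns.
  (let ((*alley-factor* 0.1))
    (funcall thunk)))
```

Any shortest-path search that calls edge-cost from inside the thunk will happily loop around the block to stay in alleys, and will go back to normal as soon as the call returns.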

I find that to be inexplicably hot.

Tuesday, June 12, 2007

5:32AM - Google Maps API and XHTML

Everyone knows Google Maps. Google also has their stuff opened up to third-party developers looking to visualize maps. You load magic JavaScript from Google's servers, say you'd like to create a GMap (or whatever it's called) object, rendering to some element, and now you've got your own map. You can ask the map component to go to some location, or to overlay a sequence of connected line segments over the map, or do any of several surely much cooler things.

Old news.

Google encourages you to use XHTML, rather than HTML, for your Maps-utilizing web app. But they didn't seem to think that decision through very well. If your "XHTML" page is sent as a "text/html" document, all is well, but then your browser reads this gibberish in as tag-soup HTML. XHTML should be served as "application/xhtml+xml".

Google Maps uses legacy techniques, like document.write, in their JavaScript, which break in Firefox, but only IF the document it came in was marked as pure-XHTML "application/xhtml+xml". And can you really put Firefox in the wrong there? You have to lie about the Content-Type of your page and say it's "text/html" for anything to work.

So, what was the point of encouraging XHTML again?

(Crossbitching.)

Current music: Orbital / Untitled (The Brown Album) / Impact (The Earth is Burning)

Friday, June 8, 2007

12:54PM - More showing off

CL-USER> (let ((monroe (street-by-name "N" "Monroe" "St" "" :error))
               (indiana (street-by-name "W" "Indiana" "Av" "" :error))
               (rowan (street-by-name "W" "Rowan" "Av" "" :error))
               (queen (street-by-name "W" "Queen" "Pl" "" :error)))
           (let ((isect1 (first (intersections monroe indiana)))
                 (isect2 (first (intersections rowan queen))))
             (condense-path (encode-path (shortest-path-by-distance isect1 isect2)))))

((:START
  #<STREET-VERTEX at #<POINT 47.6749deg,-117.4267deg> on Northwest, Monroe, Indiana>)
 (:WALK
  #<STREET-VERTEX at #<POINT 47.6749deg,-117.4267deg> on Northwest, Monroe, Indiana>
  #<STREET "W" "Northwest" "Bl" "">
  #<STREET-VERTEX at #<POINT 47.6847deg,-117.4479deg> on T J Meenach, Northwest, Cochran>
  1.9214540230061514d0)
 ...)

It's like Google Maps, but for pedestrians. Maybe a little less glitzy?

Current mood: productive

7:29AM - On GIS, and how I'd like to hug the world.

CL-USER> (let ((monroe (street-by-name "N" "Monroe" "St" "" :error))
               (indiana (street-by-name "W" "Indiana" "Av" "" :error)))
           (intersections monroe indiana))
(#<STREET-VERTEX at #<POINT 47.6749deg,-117.4267deg> on Northwest, Monroe, Indiana>)
CL-USER> (let ((mission (street-by-name "E" "Mission" "Av" "" :error))
               (hamilton (street-by-name "N" "Hamilton" "St" "" :error)))
           (intersections mission hamilton))
(#<STREET-VERTEX at #<POINT 47.6718deg,-117.3965deg> on Hamilton, Mission>
 #<STREET-VERTEX at #<POINT 47.6717deg,-117.3965deg> on Hamilton, Mission>)

(Mission has a big island going down its middle, up to at least this intersection. Two physical roads means there's two physical intersections.)

This means I've been at least moderately successful. The best part: I didn't even have to screw around with GIS libraries. MapInfo MIF format is super-trivial to read and use, and ogr2ogr took care of de-projecting the map data and converting it to that format.

Current mood: pleased
Current music: Fluke / Progressive History XXX / Electric Guitar

Wednesday, June 6, 2007

10:00PM - On GIS, TIGER/Line®, and how I'd like to stab the world.

The U.S. Census Bureau publishes public-domain data for things like streets and landmarks in the United States. So I downloaded the data for Spokane County and rolled up my sleeves.

Try reading this. Page 30. There's about 24 different record types that express tiny fragments of street data.

This is what I want:

  • A set of streets. Each street has:
    • A name.
    • A set of coordinates that define the path the street takes.
    • Address ranges.

Bonus points for:

  • A set of explicit indications of where streets intersect.
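Written down as Lisp, the wish list is about this small. The sketch below is only an illustration of the desired shape, not TIGER/Line's layout and not the code I ended up with; the structure and slot names are hypothetical, and coordinates are assumed to be (latitude . longitude) pairs.

```lisp
;; Hypothetical sketch of the wished-for data model.
(defstruct address-range
  low    ; lowest house number on this stretch
  high)  ; highest house number on this stretch

(defstruct street
  name            ; e.g. "N Monroe St"
  path            ; list of (latitude . longitude) pairs, in order
  address-ranges) ; list of ADDRESS-RANGE structures

(defun shared-points (a b)
  "Return the points that appear in the paths of both streets A and B."
  ;; This only finds intersections where both streets list the exact same
  ;; coordinate, which is why explicit intersection data earns bonus points.
  (intersection (street-path a) (street-path b) :test #'equal))
```

With that much, (shared-points monroe indiana) is a one-liner, provided the source data is kind enough to repeat the same coordinate in both streets.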

That's. It. Just getting that simple data out of this hairy mess will be a months-long project.


My application is just for Spokane. The city publishes some GIS data for local streets. I should try those files.

Current mood: blank
