
Shell: While Loops and Variables

A common idiom in shell scripting is to tweak the value of IFS (internal field separator) while reading lines of input:

while IFS= read -r line
do
    # ...
done

This leads to questions about IFS itself and the -r flag, and there are plenty of good answers out there. I’d like to focus, however, on the syntax of IFS= and its location in the above line.

Shell variables can be assigned and referenced:

> var=A
> echo $var
A

Sometimes you want to set a variable for the duration of a single command:

> name=Bob bash -c 'echo $name'
Bob
> echo $name

At first glance, you might expect the first line of our while loop to look like:

IFS= while read -r line

but this causes a syntax error. In the name=Bob example, our entire line consisted of a single simple command, defined as

a sequence of optional variable assignments and redirections, in any sequence, optionally followed by words and redirections, terminated by a control operator.
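
For instance (my example, not the spec’s), this entire line is one simple command, with both variable assignments scoped to it:

# A single simple command: two assignments, one command word, one redirection
LC_ALL=C TZ=UTC date > now.txt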

The while loop, however, is a compound command, with the format:

while compound-list-1
do
    compound-list-2
done

with compound-list-1 being a sequence of lists. A list is defined as

a sequence of one or more AND-OR lists separated by the operators ‘;’ and ‘&’.

with an AND-OR list being

a sequence of one or more pipelines separated by the operators “&&” and “||” .

A pipeline, in turn, has the format:

[!] command1 [ | command2 …]

It feels like we’re going in circles, but the long and short of it is that we can view

while IFS= read -r line

as

while simple-command

Note that this means we’re setting IFS to a temporary value only during the read command, not during the body of the loop.

To make this a little more concrete, here’s a script I’ve called while-vars.sh:

var=A
i=1

tester() {
  echo "var in tester: $var"
  (( $i > 0 ))
}

echo -e "var before loop: $var\n"

while var=B tester; do
  let i-=1
  echo "var in loop: $var"
done

echo -e "\nvar after loop: $var"

The result:

> bash while-vars.sh
var before loop: A

var in tester: B
var in loop: A
var in tester: B

var after loop: A
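
As a quick aside on what the temporary IFS= actually buys you during read (a sketch; any POSIX shell should behave this way): with the default IFS, read trims leading and trailing whitespace, while an empty IFS preserves it.

printf '  indented\n' | while read -r line; do
  echo "[$line]"            # prints [indented] -- whitespace stripped
done

printf '  indented\n' | while IFS= read -r line; do
  echo "[$line]"            # prints [  indented] -- whitespace preserved
done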

Debugging Etags

I’ve been using ctags to navigate the codebases I work with in Vim for a couple years, largely thanks to a blog post by Tim Pope where he describes how to use git hooks to keep your tags up-to-date. Omitting a few details, the script I use boils down to this:

git ls-files | ctags -L - -o ".git/tags" --tag-relative=yes --languages=-javascript,sql

On a large Rails app I’ve been working with, it takes 1 second and generates a 7MB tags file.

More recently, I started playing around with Emacs, and I’ve been looking for a way to port my tagging strategy over to etags. There are a few ways you can generate etags. Emacs comes with its own etags executable, but the more featureful implementations of ctags can also generate them.

I’ve been using universal-ctags, which picked up where exuberant-ctags left off a few years ago, so I added -e to my tagging command and gave it a whirl. 90 seconds later it handed me an 8GB tags file.

At first, I thought this must be a problem with the etags format itself, but when I tried Emacs’ own etags executable, it took 13 seconds and produced a 3MB file. Next, I tried exuberant-ctags, which took 2 seconds and produced a 1MB file.

Narrowing in on the problem further was an interesting process that called on several shell-scripting concepts and tools, including I/O redirection, sub-shells, and awk.

Benchmarking

First, I needed to gather some data profiling each file’s contribution to execution time and tags-size. I wanted something like,

some_file.rb <- source-file
0.004        <- processing time (seconds)
8159         <- source file-size (bytes)
13389        <- tags file-size (bytes)

another_file.rb
0.002
345
4859

...

I wrote a shell script to iterate through the files, generating tags for each and recording the time taken and resulting tags size, appending these stats to a log file.

#!/bin/sh

for f in $(git ls-files)
do
  ( echo "$f"
    ( TIMEFORMAT='%R'
      time ctags -e -o tmp.TAGS --tag-relative=yes --languages=-javascript,sql $f )
    ( ls -l $f
      ls -l tmp.TAGS ) | awk '{ print $5 }'
    echo ) >> etagging.log 2>&1
  rm tmp.TAGS
done

I’ll break this down a bit. First, command substitution — $(git ls-files) — runs git ls-files in a sub-shell and hands its output to the for loop as the list of files to work through.

for f in $(git ls-files)
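
One caveat: the output of $(git ls-files) undergoes word splitting, so this loop would mishandle filenames containing whitespace. A safer sketch leans on the very idiom from the while-loop post above:

git ls-files | while IFS= read -r f
do
  # ... same body as the script above ...
  :
done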

For each file, we run some commands (echo, time, ctags, ls) and redirect their output to a log file. This could be done like,

run a command >> etagging.log
run another command >> etagging.log
run one more command >> etagging.log

but using a sub-shell lets us capture it all in one go:

( run a command
  run another command
  run one more command ) >> etagging.log

Time and redirection

Using the time builtin to benchmark tags creation introduces a little more complexity. We only want the real (perceived) time, so we need to set the TIMEFORMAT shell variable accordingly.

Since time broadcasts its results through stderr rather than stdout, we can’t rely on just >>, which redirects stdout, or our time data would print to screen rather than being recorded. So once we’ve redirected stdout to the log-file, we need to redirect stderr there as well.

( run a command
  run another command
  run one more command ) >> etagging.log 2>&1

You could read this as,

run commands in a sub-shell, send the sub-shell’s standard output to the log-file, and send its standard error data to the same location you’re sending the standard output (i.e. the log-file)

The digits in 2>&1 are file descriptors, indicating stderr (2) and stdout (1). A running process has 3 standard I/O streams through which to communicate. As a source of input, it has stdin (0); when it’s ready to broadcast some output, it generally sends that to stdout (1), but some output is semantically different (e.g. error messages), and it’s useful to have a separate stream for such data. This is where stderr (2) comes in.

If you’re familiar with pointers in C, you could think of &1 as the location of stdout, so 2>&1 says to redirect stderr to the same place that stdout is headed. The order of redirection operations is significant. If we’d written,

( run some commands ) 2>&1 >> etagging.log

we’d be directing stderr to the same location as stdout and then directing stdout elsewhere. It would be like saying,

Hey stderr, ask stdout where it’s currently headed. Go there.

Hey stdout, change of plans: I want you to go to this log-file.
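
A throwaway demo makes the ordering concrete (a sketch; demo.log is just a scratch file):

( echo out; echo err >&2 ) >> demo.log 2>&1    # both streams land in the log
( echo out; echo err >&2 ) 2>&1 >> demo.log    # err hits the terminal, out hits the log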

Space and a little awk

We also want to record the size of the source-file and the size of the tags-file. We use awk to extract these sizes (in bytes) from the 5th field of long-format ls file-listings:

( ls -l $f
  ls -l tmp.TAGS ) | awk '{ print $5 }'
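
Parsing ls works here, but if you’d rather not depend on its column layout, stat can report sizes directly. The flags differ by platform (a sketch, BSD/macOS and GNU shown respectively):

stat -f '%z' "$f" tmp.TAGS    # BSD/macOS
stat -c '%s' "$f" tmp.TAGS    # GNU coreutils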

Sorting the results

Once I had the profiling data, I wanted to sort it by time and tag-size to see which files were causing the big slowdown and eating up my diskspace. The sort command expects newline-separated records with whitespace-separated fields. I used awk to translate the results to the horizontal format sort expects.

awk 'BEGIN { RS=""; FS="\n" } { print $1, $2, $3, $4 }' etagging.log

The BEGIN block sets up awk’s RS (record-separator) and FS (field-separator) variables, allowing it to correctly identify each record. The next block defines the actions to take on each record. In this case I just want to print each of its fields on a single line. Piping this into sort generates results sorted by time:

awk 'BEGIN { RS=""; FS="\n" } { print $1, $2, $3, $4 }' etagging.log | sort -nrk2 > etagging-time

Here I’m telling sort to sort numerically, in reverse order, treating the 2nd field as the sort-key. I did the same for tag file size, the 4th field:

awk 'BEGIN { RS=""; FS="\n" } { print $1, $2, $3, $4 }' etagging.log | sort -nrk4 > etagging-size

Identifying the Culprit

Here’s what floated to the top:

$ head -n 3 etagging-time
app/models/something_big.json 108.024 273517 8084569921
vendor/assets/stylesheets/bootstrap/bootstrap.min.css 2.159 118153 288792277
app/models/appointment.rb 0.252 10096 2481

$ head -n 3 etagging-size
app/models/something_big.json 108.024 273517 8084569921
vendor/assets/stylesheets/bootstrap/bootstrap.min.css 2.159 118153 288792277
vendor/assets/stylesheets/intlTelInput.css 0.051 18194 5464144

The two top offenders, by both time and size, were a large JSON file and a minified bootstrap stylesheet, neither of which I had much interest in tagging. The JSON file overshadowed everything else by miles, and that shed some light on the performance disparity between universal-ctags and the other tagging libraries: only universal-ctags had JSON support, so it was the only one tagging JSON at all.

A quick fix was to add JSON to the languages I exclude from tagging, but it raised the question: why didn’t ctags exhibit the same problem as etags?

The hint was hiding in that minified stylesheet. The JSON file and the stylesheet had extremely long lines. Both ctags and etags include source line references, and these references get truncated to a reasonable length when generating ctags, but not when generating etags.

Conclusion

The team at universal-ctags was incredibly helpful in debugging this and helped turn a source of frustration into a learning experience. They were quick to respond and are looking into resolving the underlying issue. In the meantime, I’ve adjusted my command for generating etags.

git ls-files | ctags -L - -e -o ".git/etags" --tag-relative=yes --languages=-javascript,sql,json,css

Karabiner

A while back I stopped using “jk” to exit Vim’s insert-mode, turning instead to the mostly-useless Caps Lock. I set it to be Control, then used Karabiner to turn it into a dual-purpose Control/Escape. Typed by itself, it’s Escape; in concert with another key it’s Control. The boost in comfort and productivity has been huge.

Bringing Escape closer to home feels like a more sensible solution, and I’m no longer typing “jk” all over the place when my fingers forget they’re not in Vim. The productivity gains, however, are largely the result of having a Control key that’s so accessible. It’s opened up my use of control-modified commands like Vim’s autocompletion and the shell’s reverse-incremental-search quite a bit.

To set this up on OS X, first go to the Keyboard pane of System Preferences and change Caps Lock to Control.

[Screenshot: caps-lock]

Then use Karabiner to send Escape when you type Control by itself.

* karabiner preferences -> "Change Key" tab
* scroll down to "Change Control_L Key (Left Control)"
* check "Control_L to Control_L (+ When you type Control_L only, send Escape)"

[Screenshot: escape]

More Control

I recently took this one step further and turned my Return key into a dual-purpose Control/Return, giving me easy access to a Control key on either side of the keyboard.

[Screenshot: return]

Unix Know-how

I was working with MySQL queries that involved timezone conversion when I noticed that my local instance of MySQL didn’t recognize named timezones. Queries with named timezones were returning null, while those with numeric offsets from UTC were returning correct conversions:

> SELECT CONVERT_TZ('2014-01-01 12:00:00', 'America/New_York', 'UTC');
=> null

> SELECT CONVERT_TZ('2014-01-01 12:00:00', '-5:00', '+00:00');
=> 2014-01-01 17:00:00

I hadn’t loaded my system’s zoneinfo files into the mysql database. As per the docs, I used the mysql_tzinfo_to_sql utility to load them from /usr/share/zoneinfo:

$ mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql

The process failed before loading all the tables:

ERROR 1406 (22001) at line 38408: Data too long for column 'Abbreviation' at row 1

Now I could reference America/New_York, but not UTC, since the process had failed before loading that table. A coworker suggested I write the command’s output to a file so I could debug:

$ mysql_tzinfo_to_sql /usr/share/zoneinfo > debuggingfile

The debuggingfile contained many insert statements, and line 38408 revealed the problem:

INSERT INTO time_zone_transition_type (Time_zone_id, Transition_type_id, Offset, Is_DST, Abbreviation) VALUES (@time_zone_id, 0, 0, 0, 'Local time zone must be set--see zic manual page');

The 'Local time zone must be set--see zic manual page' value was too long for the Abbreviation column. I shortened it to 'unset', fed the file into mysql, and all was well.

$ mysql -u root mysql < debuggingfile
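
For the record, the edit itself could also be scripted; something like this (BSD/macOS sed syntax, a sketch) should do it:

sed -i '' 's/Local time zone must be set--see zic manual page/unset/' debuggingfile
mysql -u root mysql < debuggingfile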

I was struck by the simplicity of this solution, and how a little unix know-how can demystify a problem.

Search & Replace

Performing a project-wide search-and-replace is a common task, and yet I still forget how to do it in Vim. While there’s not that much to it (build an argument list of relevant files and run a global substitution across them), I’ve had to look it up enough times to start wondering if there’s a better way. I ended up writing a shell function, as well as a Ruby-specific wrapper for it.

Now if I want to rename a function across my project’s javascript files, I can drop onto the command-line and run:

$ greplace '**.js' uglyFunctionName nicerFunctionName

Or, if I’m renaming a Ruby method:

$ rupl bad_method_name good_method_name

The Sauce

Using find, grep, and sed in concert, we declare which files to search, what to search for, and what to do with those files that contain a match.

greplace() {
  if [ "$#" != 3 ]; then
    echo "Usage: greplace file_pattern search_pattern replacement"
    return 1
  else
    file_pattern=$1
    search_pattern=$2
    replacement=$3

    # This is built for BSD grep and the sed bundled with OS X.
    # GNU grep takes -Z instead of --null, and other versions of sed may not support the -i '' syntax.

    find . -name "$file_pattern" -exec grep -lw --null "$search_pattern" {} + |
    xargs -0 sed -i '' "s/[[:<:]]$search_pattern[[:>:]]/$replacement/g"
  fi
}

rupl() {
  if [ "$#" != 2 ]; then
    echo "Usage: rupl search_pattern replacement"
    return 1
  else
    search_pattern=$1
    replacement=$2

    greplace '**.rb' "$search_pattern" "$replacement"
  fi
}

Ingredients

The first thing greplace does is test whether it received the wrong number of arguments: [ "$#" != 3 ]. If so, we print a usage message and return an error code. Otherwise, we set some local variables with more memorable names than 1, 2, and 3.

Next, we find pathnames in the current directory (and subdirectories) that match file_pattern. Using find ... -exec <command> {} \; lets us run a command on each found path, expanding {} to the pathname. Replacing \; with + will instead expand {} to as many of the found pathnames as possible, which allows us to feed all the found files as arguments to a single grep.
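
The difference in miniature (illustrative commands, not the function’s exact ones):

find . -name '*.rb' -exec grep -lw pattern {} \;   # one grep invocation per found file
find . -name '*.rb' -exec grep -lw pattern {} +    # one grep, many files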

We grep the relevant files for search_pattern, restricting results to the names of files (-l) that contain a whole-word (-w) match. We also print a null-character after each filename in the results (--null), which will be useful as a delimiter in the next step.

The results of grep are piped into xargs -0, which constructs an argument list (recognizing the null-character delimiter) and feeds this list to sed for further processing.

We then use sed -i to edit each file “in place” (rather than writing results to stdout) without creating any backup files (''), which could be risky, but since I’m working with Git this seems reasonable.

The actual search-and-replace is simply a pattern substitution. The [[:<:]] and [[:>:]] delimiters restrict it to whole-word matches.

Caveats

A few things limit this function’s portability. For one, not all versions of grep recognize the --null flag; GNU grep uses -Z instead. Also, the -i '' syntax may not be recognized by all versions of sed (from what I was able to gather, that syntax might be unique to the version bundled with OS X).

That being said, it would only take a few minor tweaks to get this working on a different system.
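
For instance, here’s an untested sketch of the core pipeline against GNU grep and GNU sed, which also spell word boundaries as \b rather than [[:<:]] and [[:>:]]:

find . -name "$file_pattern" -exec grep -lwZ "$search_pattern" {} + |
xargs -0 sed -i "s/\b$search_pattern\b/$replacement/g"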

Faster Specs

Getting the full benefits of TDD requires fast-running specs. The feedback cycle is what makes the difference between a pleasurable “red-green-refactor” flow and an eternity of testing-tedium where the only reason you’re writing tests is so you can be done writing them. While TDD is lauded in the Rails community, many large Rails apps suffer from slow-running test suites.

I’ve been working with a Rails app that has a couple of bloated, callback-ridden models. Much of the test-suite uses FactoryGirl, and generating test objects for those big models and their associations can slow things down to a crawl. So when a new feature came along, I took the opportunity to write some fast unit-tests in a different style.

Couch-Surfer

Imagine an app that logs the journeys of world-travellers (lots of them) as they couch-surf around the globe visiting homebody friends. Each traveller periodically sends a postcard to their next host to let them know how far off they are. We have a few persisted models: Traveller, Homebody, CouchCrash, and Postcard.

The Traveller and Homebody models are rather large, so I’ve abbreviated them here:

class Traveller < ActiveRecord::Base
  has_many :couch_crashes
  has_many :homebodies, through: :couch_crashes
  # and many more associations, validations, callbacks...
end

class Homebody < ActiveRecord::Base
  has_many :couch_crashes
  # and many more associations, validations, callbacks...
end

CouchCrash and Postcard are pretty small, despite their associations with the larger models:

class CouchCrash < ActiveRecord::Base
  belongs_to :traveller
  belongs_to :homebody
  has_many :postcards

  validates_presence_of :traveller, :homebody, :arrival_date
end

class Postcard < ActiveRecord::Base
  belongs_to :traveller
  belongs_to :couch_crash
  has_one :homebody, through: :couch_crash

  validates_presence_of :traveller, :couch_crash, :distance
end

Each visit, or couch_crash, is scheduled with an arrival_date. But these aren’t always accurate, as it’s hard to know exactly when the traveller will reach their destination. We’d like to add a feature that assesses the status of a visit as “far off”, “approaching”, or “in progress” based on the arrival date and available postcards. We won’t bother with a “completed” status since couch-crashers have been known to stick around forever.

For simplicity’s sake, we’ll say any visit whose arrival date is more than a week away is “far off”. Within a week of the arrival date, an “approaching” status requires a postcard from within 100 miles, and “in progress” requires one from within 5 miles (I know, that’s a waste of a stamp). Otherwise, with either no postcards or only those from over 100 miles away, the visit remains “far off”.
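
We won’t look at the production implementation of CouchCrash#status, but a minimal sketch of those rules might read:

class CouchCrash < ActiveRecord::Base
  # A sketch of the rules above, not the app's actual code.
  def status
    return :far_off if arrival_date > 1.week.from_now
    return :in_progress if postcards.any? { |p| p.distance <= 5 }
    return :approaching if postcards.any? { |p| p.distance <= 100 }
    :far_off
  end
end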

Approaching the spec

A spec for the “approaching” status using FactoryGirl might look like this:

describe CouchCrash do
  describe '#status' do

    context 'within 1 week of arrival date' do
      context 'with a postcard from 100 miles away' do

        it 'is "approaching"' do
          visit = FactoryGirl.build(:couch_crash, arrival_date: 1.week.from_now)
          postcard_100 = FactoryGirl.build(:postcard, distance: 100)

          visit.stub(:postcards).and_return([postcard_100])

          expect(visit.status).to eq(:approaching)
        end

      end
    end

  end
end

Using build rather than create should keep us from hitting the database. Stubbing the association between visit and its postcards should do the same. On the surface, this looks like a well-isolated, fast unit-test, but let’s take a closer look at the factories we’re using:

# spec/factories/couch_crashes.rb
FactoryGirl.define do
  factory :couch_crash do
    traveller
    homebody
    arrival_date 2.weeks.from_now
  end
end

# spec/factories/postcards.rb
FactoryGirl.define do
  factory :postcard do
    traveller
    couch_crash
    distance 300
  end
end

It’s best practice to define your factories with the minimum set of attributes necessary for a valid object. You don’t want to set land-mines for the next developer that comes along and calls create. So the couch_crashes factory generates associated traveller and homebody objects. In doing so, it involves two of our most bloated models. Take a look at their factories:

# spec/factories/travellers.rb
FactoryGirl.define do
  factory :traveller do
    first_name "Yngwie"
    last_name  "Malmsteen"
    association :hometown, factory: :city
    luggage
    bicycle

    after(:build) do |traveller|
      pump = FactoryGirl.build(:bicycle_pump)
      traveller.bike_pump = pump
      traveller.inflate_tires
      traveller.pack_luggage
      traveller.buy_stamps
      # etc.
    end
  end
end

# spec/factories/homebodies.rb
FactoryGirl.define do
  factory :homebody do
    first_name "Joe"
    last_name  "Stumps"
    spouse
    credit_score 400
    house
    couch
    car
    dog
    # etc.
  end
end

We’re also unintentionally hitting the database, as FactoryGirl saves both traveller and homebody in order to build the association. You can avoid this by specifying a build-strategy for the association:

factory :couch_crash do
  association :traveller, strategy: :build
  association :homebody,  strategy: :build
  ...

You’d also have to change the syntax in the associated factories:

factory :traveller do
  ...
  association :luggage, strategy: :build
  association :bicycle, strategy: :build
  ...
end

factory :homebody do
  ...
  association :house, strategy: :build
  association :couch, strategy: :build
  association :car,   strategy: :build
  association :dog,   strategy: :build
  ...
end

It would be nice to avoid involving these large models any more than necessary, so let’s rewrite the spec with a different technique. Instead of using factories to generate complex test objects, we’ll use test doubles to stub out the context.

Test-doubles

RSpec’s double method returns a test-double — a dummy object that stands in for a more complex object from your production code. The double can be told how to respond to various method calls:

red_thing = double("thing")
# The argument (i.e. "thing") is optional.
# It provides a name that test output can make use of.

red_thing.stub(:color).and_return("red")
# equivalent form:
red_thing.stub(:color) { "red" }

# Or, more concisely:
red_thing = double("thing", color: "red")

The double only knows what it’s been told explicitly, and will raise an error upon receiving any unexpected method call. If you’re using RSpec 3, you can also use “verifying doubles”, which know what class of object they’re standing in for and will ensure that any methods being stubbed are actually present in the code.
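
For example (assuming RSpec 3’s rspec-mocks is available):

# Raises if CouchCrash doesn't actually define arrival_date or postcards:
visit = instance_double("CouchCrash", arrival_date: 1.week.from_now, postcards: [])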

Rewrite

While our spec should still read from the ground up, beginning with the context and arriving at an expectation, it can be helpful when writing to start with the expectation and work backwards. This is especially true when the context is complex. It also helps clarify what needs to be stubbed out, so let’s give it a shot.

expect(visit.status).to eq(:approaching)

What is visit? Just a test double with the right attributes:

visit = double("visit", arrival_date: 1.week.from_now, postcards: [postcard_100])

What about postcard_100? Just another test double.

postcard_100 = double("postcard", distance: 100)

Putting it all together, we have:

context 'within 1 week of arrival date' do
  context 'with a postcard from 100 miles away' do

    it 'is "approaching"' do
      postcard_100 = double("postcard", distance: 100)
      visit = double("visit", arrival_date: 1.week.from_now, postcards: [postcard_100])

      expect(visit.status).to eq(:approaching)
    end

  end
end

I initially wanted faster specs to enable a better TDD flow. A nice side benefit of writing these stubbed tests is that it illuminates the dependencies and coupling in the production code you’re working with and encourages better composition overall. FactoryGirl is still a wonderful tool, but it shouldn’t be the only one in your belt.

Vim Key-mappings

:map

In the land of Vim, most key sequences can easily be mapped to others. The basic syntax is map a b, which tells Vim that when you type a, it should act like b. Similarly, map abc wxyz would process wxyz when you typed abc, but let’s look at a more useful example.

You can use m to set a mark at the current cursor position, then jump to it later using the backtick (`) key. Take this buffer for example:

def penguify(being)
  Penguin.new(being.mass)
rescue NameError
  puts "Can't penguify massless being."
end

I’ll put my cursor on the N in NameError and type (in normal mode) mx. This sets a mark we can jump to by typing `x. This is nice, but the backtick isn’t the most comfortable key to reach for.

There’s a similar command using the single-quote. Typing 'x jumps to the first non-whitespace character on the marked line. Probably not as useful. Let’s map the more reachable ' to the more useful `.

On Vim’s command-line, enter: map ' `. Now both ` and ' will take us directly to our mark. Instead of ditching the single-quote’s original command entirely, let’s map the backtick to it with map ` '. But this causes a problem. Hit either ` or ' and you’ll get an error (E223: recursive mapping). We’ve mapped ` to ', which triggers `, which triggers ', and on and on.

:noremap

To recover, let’s remove both mappings with unmap ` and unmap ', to start fresh. Now instead of using map we’ll use noremap. Running noremap a b will map a to b but avoid triggering anything b is mapped to. So we can enter noremap ' ` and noremap ` ' to swap our keys without falling into a recursive pit.

map-modes

Depending on how you define them, your key-mappings will only apply in certain modes. The mappings we created with map and noremap apply in Normal, Visual, Select, and Operator-pending modes. Note the absence of Insert mode in that list — we’re not in danger of inserting doesn`t when we wanted doesn't.

The map, noremap, and unmap commands each have mode-specific variations. My .vimrc, for instance, has a mapping for line-completion in Insert mode:

inoremap <C-L> <C-X><C-L>

The <C-L> represents Control-L, and is case-insensitive (same as <c-l>). This makes line-completion less cumbersome without polluting modes other than Insert with the mapping. For more on map-modes, check out :help :map-modes. The map-overview (:help map-overview) is a good place to start.
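
A few illustrative mode-specific mappings (common examples, not necessarily from my vimrc):

nnoremap Y y$          " Normal mode only: yank to the end of the line
vnoremap < <gv         " Visual mode only: keep the selection after shifting
cnoremap <C-A> <Home>  " Command-line mode only: jump to the start of the line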

key-notation

Vim uses a special notation for some keys. We saw <C-L> already. There’s also <Left>, <S-Left> (shift-left), <Space>, <CR> (carriage return / enter), and many more (see :help key-notation). We can use these to expand our key-mapping vocabulary.

editor-envy

I noticed a feature in Sublime Text that I wanted to simulate in Vim: ⌘Enter adds a newline to the end of the current line rather than inserting it at the cursor position. This is handy if you’re in the middle of a line and want to open a new line beneath it without breaking the text the cursor’s on.

To simulate this, I needed to inoremap something to <C-O>o. From Insert mode, <C-O> pops you into Normal mode for a single command. Once there, o opens a new line beneath the current one and drops you onto it in Insert mode. In the interest of portability, I decided against using the ⌘ key, since it’s Mac-specific, and went with Control instead:

inoremap <C-CR> <C-O>o

Now I can hit Control-Enter from Insert mode to drop down to a new line without disrupting the one I’m on. Actually no, I can’t. I can if I’m using MacVim, but terminal Vim doesn’t recognize the <C-CR> key-combo. This is where things get interesting.

terminal keycodes

To get the <C-CR> key-mapping to work in terminal Vim, I needed to first tell iTerm what to send when I hit Control-Enter, then tell Vim what to listen for and how to interpret it. Let’s start with iTerm. The steps for Terminal.app are similar, though the menus and appearance will differ.

In iTerm’s Preferences (⌘,), the Profiles tab has a Keys subtab. From there, you can define custom actions to trigger with any number of key-combinations. Clicking the ‘+’ at the bottom of the list reveals a dialog to add a new combination.

[Screenshot: iTerm keys preferences]

I hit Control-Enter to enter ^↩ in the Keyboard Shortcut field and selected Send Escape Sequence from the Action drop-down, revealing a field labeled “Esc+”. Here I entered [25~, telling iTerm to send Esc + [25~ when Control-Enter is typed.

“Why [25~? Where did that come from?” I was hoping you wouldn’t ask. Figuring out what codes to use, what wouldn’t conflict with anything, and what would be interpreted consistently across xterm, GNU screen, and tmux was not a straightforward process. Lots of googling and trial and error, and recounting it is probably best saved for another post. For now, I’ll stay focused on getting it wired up with Vim.

Next, I needed to tell Vim how to interpret the ^[[25~ escape sequence that iTerm would be sending its way. (Note that the initial ^[ is the Escape character itself.) I set an unused Function key to the escape sequence:

set <F13>=^[[25~

To enter that command correctly, you need to type set <F13>=, hit Control-V, hit Escape, then finish with [25~. Control-V followed by Escape enters the actual terminal code for the Escape key (which appears as the single character ^[). The same is true whether you’re entering it on Vim’s command-line or inserting it in your .vimrc.

With Vim listening for the escape sequence and associating it with a key, I mapped that key to <C-CR>:

map  <F13> <C-CR>
map! <F13> <C-CR>

The call to map applies the mapping in Normal, Visual, Select, and Operator-pending modes, while map! covers Insert and Command-line modes. With all this in place, terminal Vim can recognize Control-Enter and the <C-CR> key-notation.

You can apply this approach to a lot of other keys that would otherwise be off-limits. A section of my vimrc wires up a bunch of them. I’m cutting down on the mappings these days, but it’s nice to know you can do this:

if &term =~ "xterm" || &term =~ "screen" || &term =~ "builtin_gui"
  " Ctrl-Enter
  set  <F13>=[25~
  map  <F13> <C-CR>
  map! <F13> <C-CR>

  " Shift-Enter
  set  <F14>=[27~
  map  <F14> <S-CR>
  map! <F14> <S-CR>

  " Ctrl-Space
  set  <F15>=[29~
  map  <F15> <C-Space>
  map! <F15> <C-Space>

  " Shift-Space
  set  <F16>=[30~
  map  <F16> <S-Space>
  map! <F16> <S-Space>

  " Ctrl-Backspace
  set  <F17>=[1;5P
  map  <F17> <C-BS>
  map! <F17> <C-BS>

  " Alt-Tab
  set  <F18>=[1;5Q
  map  <F18> <M-Tab>
  map! <F18> <M-Tab>

  " Alt-Shift-Tab
  set  <F19>=[1;5R
  map  <F19> <M-S-Tab>
  map! <F19> <M-S-Tab>

  " Ctrl-Up
  set  <F20>=[1;5A
  map  <F20> <C-Up>
  map! <F20> <C-Up>

  " Ctrl-Down
  set  <F21>=[1;5B
  map  <F21> <C-Down>
  map! <F21> <C-Down>

  " Ctrl-Right
  set  <F22>=[1;5C
  map  <F22> <C-Right>
  map! <F22> <C-Right>

  " Ctrl-Left
  set  <F23>=[1;5D
  map  <F23> <C-Left>
  map! <F23> <C-Left>

  " Ctrl-Tab
  set  <F24>=[31~
  map  <F24> <C-Tab>
  map! <F24> <C-Tab>

  " Ctrl-Shift-Tab
  set  <F25>=[32~
  map  <F25> <C-S-Tab>
  map! <F25> <C-S-Tab>

  " Ctrl-Comma
  set  <F26>=[33~
  map  <F26> <C-,>
  map! <F26> <C-,>

  " Ctrl-Shift-Space
  set  <F27>=[34~
  map  <F27> <C-S-Space>
  map! <F27> <C-S-Space>
endif

Rigging Vim’s Netrw

If you’re a Vim user, you’re probably familiar with the NERDTree, a plugin that provides a sidebar for navigating the filesystem, much like you get with a more graphical editor such as Sublime Text. It’s a nice feature, but you don’t necessarily need to install another plugin to get it. Most distributions of Vim come with Netrw already built in. Built by Charles Campbell, Netrw is a plugin for browsing, reading, and writing files both locally and across networks.

Netrw is not NERDTree. It does much more, but the flip side is that NERDTree focuses on doing one thing well. That being said, at some point I got interested in reproducing what I liked about NERDTree using the built-in capabilities of Netrw. It took a bit of configuration and some dirty language (vimscript) but if you’re not averse to any of that, read on.

[Screenshot: vim]

My first goal was to toggle a sidebar navigator open/closed with a keystroke or two. The :Vexplore command opens a Netrw browser in a vertical split. If you pass the command a directory, it will open into that location; otherwise it opens in the current file’s parent directory.

There’s a distinction between the current file’s parent directory and the “current working directory” that Vim keeps track of. Say you start Vim from within ~/Development. You can :edit files anywhere you like (~/Development/resources, ~, /usr/local, etc.), and until you explicitly tell Vim to :cd to a new location, the current working directory will remain where it started, at ~/Development. You can use this as a home-base to work from in the current Vim session.

With this in mind, I composed a small set of functions to toggle the sidebar in either the current file’s directory (to access neighboring files) or the “current working directory” (which I tend to leave at the project root), and mapped them to a couple keystrokes I find convenient.

fun! VexToggle(dir)
  if exists("t:vex_buf_nr")
    call VexClose()
  else
    call VexOpen(a:dir)
  endif
endf

I’m using t:vex_buf_nr to track whether the sidebar is currently open. The t: is scoping the variable to the current tab. That’s so each tab can have its own sidebar. If you’re not familiar with Vim’s tabs, don’t worry about it. It’s a minor detail here. In the else clause, we pass a:dir (the dir argument that was passed into VexToggle()) to VexOpen().

fun! VexOpen(dir)
  let g:netrw_browse_split=4    " open files in previous window
  let vex_width = 25

  execute "Vexplore " . a:dir
  let t:vex_buf_nr = bufnr("%")
  wincmd H

  call VexSize(vex_width)
endf

VexOpen() starts by setting some options. “Open files in previous window” ensures that when we select a file to open, it opens in the window (split) we were in before entering the browser. We’re also setting the desired window width for later use.

Next, we use vimscript’s string concatenation operator (.) to compose the Vexplore call. It’s a little ugly, but sometimes vimscript paints you into a corner like that. Now that we have an explorer open, let’s remember it (the next line). The "%" expands to the current file name, and we store the associated buffer number for later reference.

If you have several splits open, calling :Vexplore will open a Netrw explorer in a vertical split next to the current split, so there’s no guarantee it will sit on the far left of the screen or even occupy the full height of Vim. Calling wincmd H fixes that. Finally, calling VexSize() will set the sidebar’s width.

I made a couple mappings to call VexToggle(). The first passes it Vim’s “current working directory” as an argument, while the second passes an empty string. That way, I can use the first mapping to toggle an explorer sidebar from the project root and the second to toggle an explorer from whichever directory houses the file I’m currently editing.

noremap <Leader><Tab> :call VexToggle(getcwd())<CR>
noremap <Leader>` :call VexToggle("")<CR>

[Screenshot: vim]

When the sidebar is open, either mapping can be used to close it. VexClose() starts by noting which window it was called from, so it can return the cursor to that window after the sidebar has closed. The exception is when the cursor was in the sidebar when VexClose() was called, in which case the cursor will land in the previous window (whichever window holds the alternate file "#").

The middle section switches to the sidebar, closes it, and removes the internal variable that was tracking its presence. Finally, we switch to the appropriate destination window and call NormalizeWidths() to normalize the widths of all open windows. Note that we have to subtract 1 from the original window number that was stored, since closing the sidebar window decremented all the remaining window numbers.

fun! VexClose()
  let cur_win_nr = winnr()
  let target_nr = ( cur_win_nr == 1 ? winnr("#") : cur_win_nr )

  1wincmd w
  close
  unlet t:vex_buf_nr

  execute (target_nr - 1) . "wincmd w"
  call NormalizeWidths()
endf

[Screenshot: vim]

All that’s left are the final touches to window sizing, which occur in VexSize() and NormalizeWidths(). The first function sets and locks the sidebar width, then calls the second to normalize the widths of all other windows. NormalizeWidths() is a little hacky, but as far as I can tell it’s the only native vimscript way to normalize window widths without affecting their heights. 'eadirection' controls which dimensions are affected when 'equalalways' is set. We set it to hor (horizontal), toggle 'equalalways' off and back on (it’s on by default), triggering the width normalization, and finally restore 'eadirection' to its original value.

fun! VexSize(vex_width)
  execute "vertical resize" . a:vex_width
  set winfixwidth
  call NormalizeWidths()
endf

fun! NormalizeWidths()
  let eadir_pref = &eadirection
  set eadirection=hor
  set equalalways! equalalways!
  let &eadirection = eadir_pref
endf

Netrw lets you open a selected file in a vertical split with the v key, and I wanted to normalize window widths when such a split was added so things would remain evenly sized. The following autocommand makes it so.

augroup NetrwGroup
  autocmd! BufEnter * call NormalizeWidths()
augroup END

[Screenshot: vim]

Closing Notes

I ran into a couple minor bugs in Netrw during all of this, and turned to the vim_use mailing list for help. Netrw’s author (Dr. Chip) was quick to respond with a fix and point me toward the newest version. Big thanks Dr. Chip!

I find myself mostly using Netrw’s “thin” liststyle rather than the “tree” style I originally liked, but both work equally well in the sidebar. Finally, my vimrc is available for reference, though the relevant Netrw settings I’m using are pasted below:

let g:netrw_liststyle=0         " thin (change to 3 for tree)
let g:netrw_banner=0            " no banner
let g:netrw_altv=1              " open files on right
let g:netrw_preview=1           " open previews vertically

Polymorphic Mythology

I was recently introduced to polymorphic associations in Active Record. They provide some extra flexibility in how you choose to wire up your models, and can be an elegant solution to some otherwise awkward problems. To demonstrate, I’ll show how you could use them to catalog a collection of mythology.

We’ll start with a tiny collection of tales: The Reign of the Hydra, The Golden Voyage, and The Life of King Adrastus. In addition to needing a myth model, we’ll need models for beasts, voyages, and heroes. Let’s set things up so that a character/event/etc. can be the central figure in any number of myths, with each myth centered around a single such figure. Our beast model, then, could simply be,

class Beast < ActiveRecord::Base
  has_many :myths
end

backed by a straightforward migration,

class CreateBeasts < ActiveRecord::Migration
  def change
    create_table :beasts do |t|
      t.string :name
    end
  end
end

But wiring up the myth model isn’t so simple. We could write three belongs_to statements into myth.rb, create three columns — beast_id, voyage_id, and hero_id — in the myths table, and find a way to enforce that two of the three always hold null values, but that’s pretty cumbersome. Plus, as our catalog expands and we discover new types of central-figures (fools, floods, fires), we’ll have to add more columns to accommodate any new classes we create. That’s a lot of work to store a whole lot of nils.

Polymorphic associations allow you to handle this more elegantly. Let’s describe the role that our central-figure plays in the context of a myth. For lack of a better term, I’ll call it “memorable”. A dragon ravaging the countryside, an epic voyage, a tragic hero: these are all “memorable” things that could take center-stage in a myth. Using this common thread, we’ll build a polymorphic association that can relate a myth to any such “memorable” object.

class Myth < ActiveRecord::Base
  belongs_to :memorable, :polymorphic => true
end

At the other end of the association, we’ll tweak the has_many statements in each of our “memorable” models, declaring the role they can play in relation to a myth. The beast model, for example, becomes

class Beast < ActiveRecord::Base
  has_many :myths, :as => :memorable
end

Now we can back the myth model with a much simpler table. The “memorable” central-figure’s id and its type will be stored in a pair of columns, providing a myth with all it needs (a foreign key and the table that key applies to) to retrieve its central-figure.

class CreateMyths < ActiveRecord::Migration
  def change
    create_table :myths do |t|
      t.string  :name

      t.integer :memorable_id
      t.string  :memorable_type
    end
  end
end

Active Record provides a shorthand for creating such a pair of columns: t.references :memorable, :polymorphic => true, which we could use in place of the memorable_id and memorable_type columns above.
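
Using the shorthand, the migration shrinks to:

class CreateMyths < ActiveRecord::Migration
  def change
    create_table :myths do |t|
      t.string :name
      t.references :memorable, :polymorphic => true
    end
  end
end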

The polymorphic association allows us to create associations between existing objects,

adrastus = Hero.create(:name => "Adrastus")
life = Myth.create(:name => "The Life of King Adrastus")
life.memorable = adrastus

afterlife = Myth.create(:name => "The Afterlife of King Adrastus")
adrastus.myths << afterlife

and to build associated myths off of a given “memorable” object,

adrastus.myths.build(:name => "Adrastus - The Prequel")
adrastus.save

adrastus.myths.create(:name => "Adrastus IV - The Return")

Note, however, that we can’t build a “memorable” object off of a given myth, since the type of object (hero, voyage, etc.) is ambiguous.
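
Reading the association works from either side, though. For example:

life.memorable                          #=> adrastus (a Hero)
life.memorable_type                     #=> "Hero"
Myth.where(:memorable_type => "Beast")  # every myth starring a beast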

Chef Roles

It took me a while to wrap my head around a Chef role. It sounds simple enough at first — a collection of recipes that allows a node to act in a certain capacity, as, say, a Redis server — but Chef also deals in cookbooks, which are also collections of recipes. So then what’s the difference between a cookbook and a role?

A cookbook is a collection of recipes relating to a particular piece of technology. The nginx cookbook, for example, contains several recipes related to building and configuring nginx: nginx::source builds nginx from source, nginx::ohai_plugin provides the Ohai plugin as a template, nginx::passenger builds the passenger gem, etc.

A node is a single server, and Chef can apply a variety of recipes to the node to set it up as needed. Those recipes can be selected from a variety of cookbooks, and we don’t have to use every recipe in a given cookbook. So how do we package the particular mix of recipes we need for a certain type of node? In a role.

For lack of a better analogy, you could think of a role as a multi-course meal made from several recipes pulled from a variety of cookbooks. A couple from a Pasta cookbook, one from a French Cuisine cookbook, one from a Pastries cookbook. The meal won’t be made from every single recipe in those books, just the desired ones.
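
Concretely, a role is a small file naming that mix of recipes. A hypothetical Redis-server role in Chef’s Ruby DSL might look like this (the cookbook and recipe names are illustrative):

# roles/redis_server.rb (hypothetical)
name "redis_server"
description "Sets up a node to act as a Redis server"
run_list(
  "recipe[apt]",
  "recipe[redisio::install]",
  "recipe[redisio::enable]"
)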

Recipes can include each other too, like mixins in Ruby (the nginx::source recipe, for example, includes the nginx::ohai_plugin recipe as part of it), but let’s not add to the confusion just yet.