
A Few Linux / Unix Tips and Tricks

Linux Command-Line Tips and Tricks

Command History Editing in csh and tcsh

To edit your command history with vi-style keystrokes in tcsh, just add a line to your file $HOME/.cshrc saying:

bindkey -v

Go get a copy of tcsh if you don't already have one. It is a vastly improved and expanded version of csh. It is free yet very well supported like most really great Unix stuff. Don't worry, everything you do now in csh will work exactly the same. There will just be a lot more things you can use as you learn how.

Hint:
man tcsh

By the way, the name tcsh is obviously formed by adding a t to csh, but why a t? According to the manual page, it is a tribute to the TENEX and TOPS-20 operating systems, which had some great features that inspired additions during the csh-to-tcsh improvement.

Automatic Filename Completion

You want to accomplish something with the computer beyond just practicing your typing. Let the shell finish your commands and filenames — it's much faster and it makes no errors! The tcsh shell does this by default when you press the Tab key, but you can add this to your file $HOME/.cshrc to tell plain csh to also complete file and command names (csh's filec mechanism completes when you press Escape rather than Tab):

set filec 

If the choice is ambiguous because there are multiple possibilities based on what you have typed so far, the shell will beep at you. That is, unless you also set autolist, in which case the shell shows you the choices. This is like the bash behavior, and it is available in tcsh but not in csh:

set filec
set autolist 

If you also set addsuffix then the autolist distinguishes between files and directories by adding "/" to the names of directories.

set filec
set autolist
set addsuffix 

You could also have the tcsh shell offer suggestions when you mis-type commands. I don't like this — I should have typed it correctly in the first place and filename and command completion should have avoided this problem. But if you want to:

set correct=cmd 

Avoid file removal and overwriting disasters

In tcsh the rmstar variable will cause a command like rm * to ask if you really want to remove everything. This is one of the few times that the DOS-style default of asking for confirmation is genuinely useful. Here's how to turn it on in your $HOME/.cshrc file:

set rmstar 

You will want to do that in interactive shells only: the prompt does not proceed until you press y or n, so a script would hang endlessly.

Overwriting is when you redirect output into a file that already exists:

% some-command > valuable-data 

or when you move or copy a file on top of pre-existing valuable data:

% cp foo valuable-data-1
% mv foo valuable-data-2 

You can prevent the first disaster with the noclobber shell variable, and the second form of disaster with aliases for cp and mv. All this goes in your $HOME/.cshrc file:

# Don't overwrite existing files with stdout or stderr streams
set noclobber
# Don't overwrite existing files with "cp" or "mv"
alias cp   'cp -i'
alias mv   'mv -i' 

You can override these if you really want to overwrite the file:

% some-command >! valuable-data-1
% cp -f foo valuable-data-2
% mv -f foo valuable-data-3 
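The same protections exist in bash, where they can be demonstrated non-interactively. Here is a sketch in bash syntax (bash spells the override operator >| where csh uses >!, and sets the option with set -o rather than a shell variable):

```shell
#!/bin/bash
set -o noclobber                 # bash equivalent of csh's "set noclobber"
cd "$(mktemp -d)"                # work in a scratch directory

echo "original" > data           # creating a new file is fine
if ! echo "new" 2>/dev/null > data; then
    echo "overwrite blocked"     # noclobber refused to clobber the file
fi
echo "forced" >| data            # bash's override operator (csh uses >!)
cat data                         # prints: forced
```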

Show the Current Directory in the Prompt

My $HOME/.cshrc file contains something like this:

set host=`uname -n | sed 's/\..*//'`
set domain=`uname -n | sed 's/^[^\.]*//'`
set os=`uname -s`
set prompt0="${os}:${host}"
set prompt="${prompt0}:`echo $cwd | sed 's@'${HOME}'@~@'` % "
alias cd 'cd \!* ; set prompt="'${prompt0}':`pwd | sed "s@'$HOME'@~@"` % "' 

That makes my prompt contain:

Linux:penguin:~/src % _

OpenBSD:berkeley:/usr/share/doc % _ 
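The trickiest part of that setup is the sed substitution that abbreviates the home directory to ~. Since $HOME contains slashes, the sed expression uses @ as its delimiter instead. A quick illustration, using a made-up home directory:

```shell
HOME=/home/penguin               # made-up home directory for this demo
cwd=/home/penguin/src            # pretend this is the current directory
echo "$cwd" | sed 's@'"$HOME"'@~@'
# Prints: ~/src
```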

Put the Hostname and Directory in the xterm Title Bar

Exactly how you go about this depends on the interactive shell you're using. You'll probably set up a function (for ksh) or an alias (for csh or tcsh) or use the cwdcmd special alias (for tcsh). Whatever. The main trick is, whenever you do a cd your shell should also utter the following arcane incantation:

echo -n "^[]2;${HOST}:$cwd^G^[]1;${HOST}^G"

Yikes! You need to have the variable HOST set to the hostname, and if your shell insists on using CWD or pwd or PWD to indicate the current working directory, change "$cwd" above accordingly. Finally, ^[ really corresponds to control-left-bracket, which happens to be the same as the Escape key, and ^G really corresponds to a control-G. That incantation puts hostname:path in the xterm titlebar, and associates simply hostname with its icon. Tune what you want to put where accordingly.

Yeah, that's a real mess. That's why (a) I just define it in my shell start-up file, (b) I can never remember exactly how to do it without looking there, and (c) I never try to explain this to anyone, but tell them to look at this page. Now you see why.
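In an sh-family shell the incantation is a little less painful with printf, which renders the escape characters (\033 is Escape, \007 is control-G) without needing literal control characters in the file. A sketch — the settitle function name is my own invention:

```shell
# Set the xterm title to host:directory and the icon name to just the host.
# Assumes the HOST variable holds the hostname.
settitle() {
    printf '\033]2;%s:%s\007\033]1;%s\007' "$HOST" "$PWD" "$HOST"
}

HOST=$(uname -n)
cd /tmp && settitle              # retitle the window after changing directory
```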

Getting Rid of the Heinous "Caps Lock" Key

OK, so this is just a pet peeve of mine, but maybe the caps lock key annoys you, too.

If you are the system administrator and want to fix this system-wide, put the following in a boot script like /etc/rc.d/rc.local.

A user could put it in their $HOME/.login file:

xmodmap -e "remove Lock = Caps_Lock"
xmodmap -e "add Control = Caps_Lock"

That stopped working when I hit a new and fancier version of the KDE desktop, but I fixed that by putting the following in my $HOME/.login file. Yes, I realize the following looks strange with the doubled -option parameter. According to the manual page the first -option parameter with no argument replaces all previously selected options, and the second one does the work:

setxkbmap -option -option compose:rwin,ctrl:nocaps

Some awk Tricks

You can do just about anything with awk and it's a standard part of Unix. Yes, Perl does more, but due to the difficulty of reliably determining the purpose of Perl code by reading it, some organizations have administratively decided against its use for mission-critical applications. Here's an awk example I use in class. Imagine that a database contains expense records, and each record is printed in a line like in the following extract:

 5.44  meal       Feb 27 2016  Chicago  Breakfast before meeting
14.07  meal       Feb 27 2016  Chicago  Dinner after meeting
 0.75  transport  Feb 27 2016  Chicago  Toll driving home

Here's an awk script to print detailed records and summaries:

BEGIN	{	# This stuff happens before line 1!
		printf("\nExpense Report\n");
		printf("======================================\n");
	}

	{	# No pattern -- this matches and runs for every line!
		printf("Date: %s %s, %s\n", $3, $4, $5);
		printf("  City: %s\n", $6);
		printf("  Item: %s", $2);
		if (NF > 6) {
			printf("(");
			for (i = 7; i <= NF; i++)
				printf(" %s", $i);
			printf(")");
		}
		printf("\n");
		printf("  Cost: %s\n", $1);
		total_cost += $1;
		num_receipts ++;
	}

END	{	# This stuff happens after the last line!
		printf("======================================\n");
		printf("%d reports handled\n", num_receipts);
		printf("Total amount: $%.2f", total_cost);
		printf("  (average cost = $%.2f)\n", total_cost/num_receipts);
	}

Store the above awk program in a file. Let's say we name the file summary.awk (not that there's anything magic about the ".awk" extension, it's just a reminder). Now, if the database is stored in the file expenses, you can process the file as:

$ awk -f summary.awk expenses
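For the three-line extract above, the END block's arithmetic is easy to check by hand. This one-liner, with the sample data inlined via printf instead of read from a file, reproduces just the totals:

```shell
printf '%s\n' \
    ' 5.44  meal       Feb 27 2016  Chicago  Breakfast before meeting' \
    '14.07  meal       Feb 27 2016  Chicago  Dinner after meeting' \
    ' 0.75  transport  Feb 27 2016  Chicago  Toll driving home' |
awk '{ total += $1; n++ }
     END { printf("%d receipts, total $%.2f, average $%.2f\n",
                  n, total, total/n) }'
# Prints: 3 receipts, total $20.26, average $6.75
```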

How about something useful for processing WWW server logs? Your httpd logs will contain lists of domain names. If you simply sort them, they will be put in lexicographic ASCII order:

betelgeuse.ecn.purdue.edu
cleveland.widget.com
dover.gadget.com
elephant.elvis.com
happy.gadget.com
larry.elvis.com
methuselah.elvis.com
narvik.vikingnet.no
ollie.cs.purdue.edu
pc12.ecn.purdue.edu
pc17.banzai-institute.org
pc22.widget.com
sam.gadget.com
thor.library.purdue.edu
ws17.widget.com
www.gadget.com
www.purdue.edu
www.banzai-institute.org
www.widget.com
zaphod.cs.purdue.edu
zaphod.ecn.purdue.edu
zaphod.gadget.com
zaphod.library.purdue.edu

That's a sorted list, but we would prefer that the domains be sorted right-to-left. We can't easily do that with sort, since the names have varying numbers of fields. We would like the com names first, then the edu names, then the no names (you did notice the Norwegian hostname, right?), then the org names.

Within each top-level domain, we would like to sort them by second-level domain.

Within each second-level domain, we would like to sort them by the third-level domain, and so on. We want to get this list, complete with indentation making it easy to read:

	       elephant.elvis.com
	          larry.elvis.com
	     methuselah.elvis.com
	         dover.gadget.com
	         happy.gadget.com
	           sam.gadget.com
	           www.gadget.com
	        zaphod.gadget.com
	     cleveland.widget.com
	          pc22.widget.com
	          ws17.widget.com
	           www.widget.com
	      ollie.cs.purdue.edu
	     zaphod.cs.purdue.edu
	betelgeuse.ecn.purdue.edu
	      pc12.ecn.purdue.edu
	    zaphod.ecn.purdue.edu
	  thor.library.purdue.edu
	zaphod.library.purdue.edu
	           www.purdue.edu
	      narvik.vikingnet.no
	pc17.banzai-institute.org
	 www.banzai-institute.org

No problem! Just apply this shell script to the file log:

#!/bin/sh

awk -F. '{ for (i = NF; i > 0; i--)	# Reverse fields, remove dots
		printf("%s ", $i);
	   printf("\n"); }' log  |
sort  |					# Sort
awk '{	printf("%s", $NF);		# Reverse fields again, re-insert dots
	for (i = NF-1; i >= 1; i--)
		printf(".%s", $i);
	printf("\n"); }'  |
awk '{ printf("%30s\n", $1); }'		# Right-justify the output

The first awk prints each domain name in reverse order with the dots removed. The sort puts the reversed names into our preferred order. The second awk reverses the domains again, and re-inserts the dots. The third awk formats the output with printf() to make the complete names line up along the right margin.
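You can watch the first stage work in isolation by feeding it a couple of names directly (printf stands in for the log file here):

```shell
printf '%s\n' www.purdue.edu larry.elvis.com |
awk -F. '{ for (i = NF; i > 0; i--)      # print the fields last-to-first
               printf("%s ", $i);
           printf("\n"); }'
# Prints "edu purdue www " and "com elvis larry ",
# which sort then puts into the desired top-level-domain order.
```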

Large-Scale Searching For Files With Multiple Patterns

I had what I consider to be a lot of data (tens of thousands of files, over seven hundred megabytes), and I wanted to find the files that contained all the patterns in a list. In more detail:

The data was in 24 subdirectories with names of the form lists-*. Each subdirectory contained from 90 to 15,267 files. Each file is ASCII text, a list of one number and one word per line. The total is 72,805 files and just over 740 megabytes. No command can be handed all the files at once: a command line with 72,805 tokens is way too long. The Bash or tcsh shell itself might expand the glob, but the commands being run cannot accept that many arguments:

% ls lists-*/*
/bin/ls: Argument list too long 

I'm looking for a list of eight words, account names attacked as part of one specific password-guessing attack:
dominique domino dontknow doogie doors dork doudou doug
See my page discussing these sorts of attacks and how to detect common patterns.

I would like to find the files containing all eight of those words precisely as they appear in my list. Substrings don't count: douglas and doorstop should not count as matches for doug and doors.
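That whole-word requirement — douglas must not match doug — is exactly what grep's -w flag enforces, as a quick demonstration shows:

```shell
printf '%s\n' '17 doug' '23 douglas' '9 doors' '4 doorstop' |
grep -w -E 'doug|doors'
# Matches only "17 doug" and "9 doors"; douglas and doorstop are skipped.
```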

Any word may appear arbitrarily many times in any file, but I want to find the files in which all of them appear at least once. That requires more than simple match counting, or else a file with eight instances of "dominique", or four of "dominique" and four of "domino", and so on, would incorrectly match.

As it turns out, this is pretty easy in Unix. The hard part is being patient during all the file I/O. Create a four-command script to do the work, and invoke that file with find -exec ...

% cat > tester << EOF
#!/bin/sh
egrep -iw 'dominique|domino|dontknow|doogie|doors|dork|doudou|doug' $1 /dev/null |
    sort -u |
    awk -F: '{F=$1} END {print NR, F}' | egrep '^8 '
EOF
% chmod +x tester
% find lists-* -exec ./tester {} \; 

In the first grep, the -iw means "ignore case, and look for patterns as isolated words". Asking it to also scan the trivially empty file /dev/null means that each matching line will be preceded by the filename plus ":". Yes, you may be able to tell grep to do that with just one filename argument, but it depends on your specific version of grep.

The sort -u throws away duplicates.

The awk splits each line at the ":" and sets its internal variable "F" to the first field (which happens to be the name of the file, from two commands back in the pipeline in this case). After it has finished reading the entire stream, it prints the number of records (lines) and that variable.

The final grep just passes the lines starting with "8".
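Here's the tail of that pipeline running on some fabricated matches, showing the duplicate removal and the count (file1 is an imaginary filename for the demo):

```shell
printf '%s\n' 'file1:5 doug' 'file1:9 doors' 'file1:5 doug' |
sort -u |                                   # the duplicate line disappears
awk -F: '{ F = $1 } END { print NR, F }'    # count lines, remember filename
# Prints: 2 file1
```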

After much time passes, I have my answer.

Isn't Unix powerful?

Maybe you think it would be far more elegant to do this on a single line. Using awk and using the Bash shell so I don't need backslashes at the ends of all the lines, that can be done with something like this:

$ find lists-* -exec awk '
        BEGIN   { F1 = 0; F2 = 0; F3 = 0; F4 = 0;
                        F5 = 0; F6 = 0; F7 = 0; F8 = 0; }
        /\<dominique\>/ { F1 = 1 };
        /\<domino\>/    { F2 = 1 };
        /\<dontknow\>/  { F3 = 1 };
        /\<doogie\>/    { F4 = 1 };
        /\<doors\>/     { F5 = 1 };
        /\<dork\>/      { F6 = 1 };
        /\<doudou\>/    { F7 = 1 };
        /\<doug\>/      { F8 = 1 };
        END     { if (F1 && F2 && F3 && F4 && F5 && F6 && F7 && F8) {
                        printf("%s has all eight\n", FILENAME);
                } } ' {} \; 

The Cathedral and the Bazaar

The canonical explanation of the inherent quality of the open-source model is Eric S. Raymond's "The Cathedral and the Bazaar", comparing the traditional product development process to that of the construction of a medieval European cathedral, and the open source method to a Middle Eastern bazaar.

Spices and dried fruit and nuts for sale in the Egyptian Bazaar, or Spice Bazaar, or Mısır Çarşısı in İstanbul, Turkey.

UNIX applications for the Windows NT family (NT, 2000, XP, 2003, Vista, 2008, Windows 7, Windows 8, 10, ...)

If you want powerful tools, just install some form of Unix. Linux, BSD, you have a lot of choices. But if you are stuck with Windows, there is some hope. Remember that "Windows NT" is actually Microsoft's terminology for an entire family of operating systems — NT 3.*, NT 4.*, then Windows 2000 and everything that followed.

GnuWin32 is probably the answer.

Gratuitous Pop-Culture References to Unix

The Matrix Reloaded, where Trinity uses Nmap version 2.54BETA25, finds a vulnerable SSH server, and exploits it using the SSH1 CRC32 exploit from 2002. See the Nmap site for screenshots from The Matrix Reloaded, and also The Bourne Ultimatum and Die Hard 4, where Nmap also appeared.

Jurassic Park, where the kid says, "No problem, I know this, it's Unix!" OK, to be pedantic about it, they were looking at IRIX, SGI's somewhat quirky version of Unix, and specifically at the very SGI-specific graphical user interface.

Cruel and Unusual, by Patricia Cornwell. A murder mystery involving some surprisingly detailed Unix references, in between all the surprisingly detailed forensic pathology references. Also some classic user-induced security holes like obvious or guessable passwords.

Foucault's Pendulum, by Umberto Eco. A character makes an observation in chapter three that's truly a job for sed. If you change every instance of "a" to "akka" and every instance of "o" to "ulla", you can make English text look an awful lot like Finnish:

$ sed -e 's/a/akka/g' \
	-e 's/A/Akka/g' \
	-e 's/o/ulla/g' \
	-e 's/O/Ulla/g' english-story > finnish-story
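To see the effect without a whole story file, run a short phrase through the first two substitutions:

```shell
echo "The man sat on a rock" | sed -e 's/a/akka/g' -e 's/o/ulla/g'
# Prints: The makkan sakkat ullan akka rullack
```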

And finally....

The Foundation for the Mis-application of Computer Languages (FoMCoL) aims to "promote and examine the art of writing applications in languages that are muchly inappropriate for the task in mind."

Raytracing with Prolog! A text editor written in /bin/sh and /bin/dd! And more!


Where next?

My Linux / Unix page
My computer security page
Analyze the Apache referer log with sed