Bob's Blog

Why the Command Line Rules

It's easy to use a "point-and-click" graphical interface to do simple things. That's where people start using computers.

However, when the task becomes larger, more complicated, more realistic, then a graphical user interface (or GUI) really slows you down. The job becomes drudgery, and before long it has grown to where the GUI makes it impractical. But from the command line it's fast and easy.

For example, it doesn't take terribly long to right-click on a file icon, select Properties, click on the name, and change it from introduction.htm to introduction.html. But what if you want to make the corresponding change to every one of the hundreds to thousands of files included on a web site? With a command-line interface (or CLI), I could do that in thirty seconds or less.
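
Here is a rough sketch of how that bulk rename might look. It is one possible approach, not the only one. The scratch directory and the file names in it are invented for the demonstration, so you can try it without touching a real web site.

```shell
# Build a throwaway directory tree so the rename can be tried safely.
# The directory layout and file names are made up for this example.
site=$(mktemp -d)
mkdir -p "$site/docs"
touch "$site/introduction.htm" "$site/index.htm" \
	"$site/docs/setup.htm" "$site/style.css"

# Find every regular file ending in .htm, at any depth, and hand the
# names to a small shell loop. "${f%.htm}" strips the trailing .htm
# so that .html can be appended in its place.
find "$site" -type f -name '*.htm' -exec sh -c '
	for f in "$@"; do
		mv -- "$f" "${f%.htm}.html"
	done' sh {} +

# List the result, with paths shown relative to the scratch directory.
listing=$(cd "$site" && find . -type f | sort)
printf '%s\n' "$listing"
```

On a real site you would point find at the top of the document tree instead of the scratch directory. Files that don't end in .htm, like the style sheet here, are left alone.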

I work with Linux, and have worked with other UNIX-family operating systems since the mid-1980s, so this will unsurprisingly focus on the Bash shell. But if you leave out the syntax details, all of this applies equally to PowerShell and the earlier Command Prompt environments found on Windows. Let's see why the command line is the way to get things done.

Time-and-motion studies going back into the 1980s have shown that a GUI is less efficient for many tasks. A relatively recent example is the 2005 paper "Hidden Costs of Graphical User Interfaces: Failure to Make the Transition from Menus and Icon Toolbars to Keyboard Shortcuts", by David M. Lane, H. Albert Napier, S. Camille Peres, and Anikó Sándor of Rice University, in International Journal of Human-Computer Interaction, 18(2), 133-144.

Of course, there are some tasks for which a GUI is superior. One is interacting with images, maybe selecting regions within an image. Or interacting with a complex system like an audio player which presents a graphical simulation of a multi-control physical system, with sliders for volume and balance, buttons to move between tracks, sliders to move within a track, and sliders as equalizers. Audacious is a good example.

Three-part graphical interface of Audacious: basic player controls at top, equalizer in the center, track list at bottom.

One great reason to use a graphical interface is to have multiple command-line interfaces accessible at the same time. Maybe in one I'm monitoring what is being appended to a log file, in another window I'm developing a log analysis script, and in a third window I'm testing that script. All I do with the mouse is bump it to move the pointer between the programming and program-testing windows. With my environment set for point-to-focus instead of click-to-focus, I don't have to also push a particular button on the mouse, I just scoot it roughly into the other window.

Meanwhile, I am using graphics to multi-task, while doing the active tasks with keyboard-driven command-line interfaces.

The Command Line Can Be A Programming Environment

Don't let that frighten you off. It is a programming environment, but you don't have to use it that way! However, when you advance to more complex tasks some day, you will be glad that it is.

You can use the command line to do anything that you could do within a shell script, which is a particular type of program. On the command line you are interacting with what's called a shell, an environment where you can assign values to variables, do various logical tests and control, and where your typed commands are interpreted one at a time by a program. That shell program is what would run a shell script stored in a file!
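
To make that concrete, here is a small sketch of a variable, a logical test, and a loop, all of which you could type straight at the prompt. The variable names and the list of words are my own choices for illustration.

```shell
# Assign a value to a variable (no spaces around the = sign).
threshold=3

# Loop over a list of words, test each one, and act on the result.
long_words=0
for word in grep sed sort uniq head cat; do
	# ${#word} expands to the number of characters in $word.
	if [ "${#word}" -gt "$threshold" ]; then
		echo "$word is longer than $threshold characters"
		long_words=$((long_words + 1))
	fi
done
echo "Found $long_words such names"
```

Save exactly the same lines in a file and you have a shell script. Nothing about the commands changes.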

Wikipedia has a pretty good introduction to the Bash shell.

If you started using computers after about 1990, the Bash shell was probably available. Bash is the default command-line interface on most Linux distributions, and it was the default on macOS until Apple switched to zsh with macOS Catalina in 2019. So it's not as though a Bash command-line environment is somehow exotic.

Build Your Own Tools

The design philosophy is to have a toolkit of simple tools. Each does just one thing, or a limited number of things in some cases. Almost all can be "plugged together" into a pipeline. That means that any basic environment lets you assemble an arbitrarily complex tool that possibly has never been built before.

Let's consider log analysis, a common troubleshooting task for a system administrator, and a common analysis task for a security specialist. Automated systems are constantly trying to break into your systems, and one way is by guessing passwords used to connect in via SSH.

In an example attack study in my cybersecurity pages, I used some logs collected on a Linux system exposed to the Internet. It collected up to 4 million log events per month. An attempted SSH break-in generated one event, one line in the log, something like this:

Aug  1 13:54:24 ipanema sshd[2623]: Failed password for invalid user rvences from 195.189.140.23 port 56100 ssh2

It might be interesting to explore questions such as:

  1. Which user accounts are they trying to take over?
  2. Where are the attacks coming from?
  3. At which times of day are the attacks more (or less) common?

However, you don't want to read through 4 million lines of log data! And, you don't want to search for, and purchase, some specialized tool that only comes pretty close to providing the answers you want. You want to create your own tools in just a minute or two of thinking and experimenting!

For question #1 above, "Which user accounts are they trying to take over?", you could answer it by this sequence of steps:

1a: Select just those lines containing the string "sshd" followed by "Failed password".

1b: Out of those, select just the user name. That might be something like "invalid user rvences" or, in the case of users who actually exist on the server, just one word like "root". A little thinking and experimenting shows that you could solve this problem by first deleting everything through "Failed password for", and then deleting everything from the word "from" to the end.

1c: Sort all the names into alphabetical order.

1d: Count how many instances there are of each name, so now each line has a number and a name.

1e: Sort that list into reverse numerical order, so the most common name is first.

1f: Display just the top 20...

1g: ...with those lines numbered from #1 down to #20.
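
The "thinking and experimenting" for steps 1a and 1b can be tried on a few lines before you turn it loose on millions. This is only a sketch. The first line is the sample shown earlier, while the other two lines, the user "bob", and the address 192.0.2.10 are invented for contrast.

```shell
# The sample log line from above, plus two made-up lines:
# one failure for an existing account (root), and one line that
# is not a failed password at all and so should be filtered out.
sample='Aug  1 13:54:24 ipanema sshd[2623]: Failed password for invalid user rvences from 195.189.140.23 port 56100 ssh2
Aug  1 13:54:31 ipanema sshd[2627]: Failed password for root from 192.0.2.10 port 40022 ssh2
Aug  1 13:55:02 ipanema sshd[2630]: Accepted publickey for bob from 192.0.2.10 port 40100 ssh2'

# Step 1a: keep only the sshd password failures.
# Step 1b: delete everything through "Failed password for ",
#          then delete everything from " from " to the end of the line.
names=$(printf '%s\n' "$sample" |
	grep 'sshd.*Failed password' |
	sed 's/.*Failed password for //' |
	sed 's/ from .*//')
printf '%s\n' "$names"
```

What should be left is "invalid user rvences" on one line and "root" on the next, which is exactly the raw material that steps 1c through 1g need.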

The Alaska Pipeline running southeast from Fairbanks toward the oil terminal at Valdez. Each pipe segment simply carries crude oil, the same as all the others. We're building a command pipeline, with different processes applied at each segment.

Building The Pipeline

The above English description is the general plan. The resulting command pipeline will roughly be the following. In each step, the wording is the "tip-off" that tells you which tool to use for that stage or segment:

1a: Select just the SSH failure lines with grep.

1b: Do the two modifications with two sed commands.

1c: Sort that with sort.

1d: Count blocks of repeated names with uniq.

1e: Sort that into reverse numerical order with sort. This will use the same tool as the simple sorting of step 1c, but with two options — one specifying numerical sorting, the other specifying reverse order.

1f: Output just the first 20 with head.

1g: Number those lines from #1 down to #20 with cat.
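
It may not be obvious why step 1c has to come before step 1d. The reason is that uniq only merges repeated lines when they are adjacent. Here is a small sketch with a made-up list of names standing in for the output of step 1b.

```shell
# A tiny invented list of names, deliberately not in sorted order.
names='root
admin
root
test
root
admin'

# Without sorting first, uniq -c finds no adjacent duplicates,
# so every line is reported separately with a count of 1.
unsorted=$(printf '%s\n' "$names" | uniq -c)

# Sorting first groups identical names together, so each name is
# counted once. Then sort -nr puts the largest count at the top:
# -n compares the counts as numbers, -r reverses the order.
ranked=$(printf '%s\n' "$names" | sort | uniq -c | sort -nr)

printf 'Without sorting first:\n%s\n\n' "$unsorted"
printf 'Sorted, counted, ranked:\n%s\n' "$ranked"
```

The first listing has six lines, each with a count of 1. The second has three lines, with root at the top with a count of 3. The exact width of the padding in front of the counts varies between systems.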

Later blog entries will come back to explain the details, and also explain how to answer those other common questions. I'll come back to this piece and add links when I get those written. But for now, a preview of the above list of steps could be the following command pipeline:

$ grep 'sshd.*Failed password' /var/log/secure |
	sed 's/.*Failed password for //' |
	sed 's/ from .*//' |
	sort |
	uniq -c |
	sort -nr |
	head -20 |
	cat -n
     1    98561 root
     2     9118 invalid user test
     3     8870 invalid user admin
     4     4582 invalid user oracle
     5     4426 invalid user guest
     6     4092 invalid user user
     7     2592 invalid user a
     8     2479 invalid user web
     9     2306 invalid user student
    10     2240 invalid user www
    11     2047 invalid user mysql
    12     1905 invalid user info
    13     1868 mail
    14     1730 invalid user students
    15     1712 invalid user testing
    16     1683 invalid user tester
    17     1656 invalid user administrator
    18     1627 invalid user ftp
    19     1591 backup
    20     1462 invalid user alex
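
Once the pipeline exists, a small change turns it into a different tool. As a sketch of how question #2, "Where are the attacks coming from?", might be approached, the two sed stages below keep the address instead of the user name. The log lines are invented for the demonstration, and the addresses come from ranges reserved for documentation. This is one way to do it, not necessarily the way a later entry will.

```shell
# A handful of made-up log lines in the same format as the real one.
log='Aug  1 13:54:24 ipanema sshd[2623]: Failed password for invalid user test from 203.0.113.5 port 56100 ssh2
Aug  1 13:54:27 ipanema sshd[2624]: Failed password for root from 198.51.100.7 port 41822 ssh2
Aug  1 13:54:30 ipanema sshd[2625]: Failed password for invalid user admin from 203.0.113.5 port 56214 ssh2
Aug  1 13:54:33 ipanema sshd[2626]: Failed password for root from 203.0.113.5 port 56300 ssh2
Aug  1 13:54:40 ipanema sshd[2627]: Failed password for invalid user guest from 198.51.100.7 port 41900 ssh2
Aug  1 13:54:45 ipanema sshd[2628]: Failed password for root from 192.0.2.44 port 50022 ssh2'

# Same shape as the user-name pipeline, but the two sed stages now
# keep what sits between " from " and " port ", which is the address.
sources=$(printf '%s\n' "$log" |
	grep 'sshd.*Failed password' |
	sed 's/.* from //' |
	sed 's/ port .*//' |
	sort |
	uniq -c |
	sort -nr |
	head -n 20)
printf '%s\n' "$sources"
```

With this sample data the busiest source is 203.0.113.5 with three attempts. To run it against a real log, drop the printf stage and give grep the log file name, as in the pipeline above.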

The Command Line May Be The Only Choice

You don't run graphics on servers. They're mounted in racks in a server room that might be down the hall, or across the corporate campus, or far away in a cloud data center. You connect in with SSH and there's your familiar and powerful command-line interface, the same thing that a server system administrator uses for almost everything else.
