
Bob's Blog
Automating Changes Across Thousands of Files
I wrote about a similar easy solution to a large problem several months ago, but the scale of this situation was even worse. Of the 2,155 web pages across my multiple web sites, 1,564 contained a total of 26,756 lines that needed a change. Doing that by hand would mean 1,564 editing sessions, each making an average of a little over 17 changes, and it would take an enormous amount of time. But with the command-line tools included in Linux, BSD, and all other UNIX-family operating systems, it took only a few minutes of thinking and preparation and then a few seconds to run. Let's see another example of working smarter, not harder.

Early modern European implementation of a multi-tab browser. From Agostino Ramelli's 1588 Le diverse et artificiose machine. In the U.S. Library of Congress collection, and available through the Public Domain Image Archive.
How Did My Problem Arise?
I have Google AdSense advertisements on the pages on my sites. The good news for me is that every month Google makes a direct deposit into my bank account based on the number of page views and, more lucratively, clicks on ads. Not huge income, but certainly welcome.
Every once in a while Google sends me an automatically generated report informing me of any issues. Some of these could lead to worse SERP, the much-desired Search Engine Results Page ranking, and worse SERP leads to fewer ad views and less revenue for me.
Google constantly modifies their many algorithms, and their search algorithm has been giving increasing importance to what they call Core Web Vitals, measures of the page loading performance, visual stability, and overall user experience. Much of that has to do with performance, how quickly all the contents of the page get loaded and the page layout stabilizes.
Here is how I could make the above image appear on my page: an img element specifying the src or source of the data, with an alt or alternative description which screen readers or text-only browsers like Lynx would present to the user instead of the JPEG image.
<img src="/blog/2025/pictures/ramelli-bookwheel.jpg" alt="A mechanical system in the form of a two-faced wheel reaching from floor level almost to the ceiling, carrying a dozen or more books. A system of gears keeps the books properly oriented as the wheel turns. A man is sitting at a chair using the device. A partially visible shelf across the room holds additional books, at least 23 are visible. From Agostino Ramelli's 1588 'Le diverse et artificiose machine', in the U.S. Library of Congress collection and available through the Public Domain Image Archive at pdimagearchive.org.">
The above is functional, and it also provides accessibility. However, some time ago Google informed me that my pages, generally speaking, lacked performance. Google's analysis suggested adding lazy loading, telling the browser to delay loading images until they are about to become visible as the user scrolls down. That meant adding a loading="lazy" directive to every img on every one of my pages, except for the banner image at the very top of a page, because it should be immediately visible. I had, of course, automated all those modifications, resulting in something like this:
<img src="/blog/2025/pictures/ramelli-bookwheel.jpg" alt="A mechanical system in the form of a two-faced wheel reaching from floor level almost to the ceiling, carrying a dozen or more books. A system of gears keeps the books properly oriented as the wheel turns. A man is sitting at a chair using the device. A partially visible shelf across the room holds additional books, at least 23 are visible. From Agostino Ramelli's 1588 'Le diverse et artificiose machine', in the U.S. Library of Congress collection and available through the Public Domain Image Archive at pdimagearchive.org." loading="lazy">
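That earlier round of edits could have been automated the same way, with a loop and one sed substitution per file. What follows is a minimal sketch, not the exact commands I ran back then: it assumes, purely for illustration, that banner images can be recognized by the string "banner" somewhere on their img lines. Attribute order within a tag doesn't matter to the browser, so inserting loading="lazy" right after <img is equivalent to appending it at the end of the tag as shown above.
$ for f in $( find /var/www/html/ -name '*.html' )
> do
>   # skip any line mentioning "banner" (the assumed banner-image marker),
>   # add the attribute right after "<img " everywhere else
>   sed -i '/banner/!s/<img /<img loading="lazy" /' $f
> done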
That warning and my sweeping modifications happened maybe two years ago. Now I had received a new warning about poor performance.
After a few hours of analysis, I reached a conclusion: this is another round of the usual Google nonsense. The long download delays, the long-running Javascript, and the twitchy layout rearrangements were almost entirely due to the Google AdSense content. Google's advertisers provide their own, often faulty, content. For example, images absurdly larger than the area in which they are downsized to fit. The ad content is served from the advertisers' servers, which may attempt to run outdated versions of TLS or not run HTTPS at all, with inappropriately short cache recommendations, missing content, dysfunctional Javascript, and other nonsense.
Using the Lighthouse tool built into Chrome's Developer Tools to experiment with changes on a few representative pages, I determined that loading="lazy" did not significantly improve performance. My server is in the Google Cloud, with excellent latency and bandwidth. What's more, lazy loading was causing layout instability that Lighthouse cited as the worst remaining problem. As you scroll down a page, the browser progressively loads more images and then recalculates how to lay out the page. Google refers to this as CLS or Cumulative Layout Shift, and the most recent message from Google cited that as my sites' greatest search performance problem.
Lazy Versus What?
The two defined values are lazy and eager, which is the default behavior. I thought about it for a while and decided that rather than simply removing every loading="lazy" directive, I should instead change all of them to loading="eager". That way I would keep a placeholder in every img element, while the short loading="eager" directive itself would have no effect, since eager is the default anyway. If I wanted to return to lazy, or if a third alternative came along, it would be an equally easy change to make across all the images.
Planning
First, let me warm up with a few commands to see just how large this problem is. How many web pages, meaning how many HTML files? I do virtual hosting with the Nginx web server, and have each site's document root in a subdirectory of /var/www/html/. 2,155 files:
$ find /var/www/html/ -name '*.html' | wc
   2155    2155   99807
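Since find prints one file name per line, the first of wc's three numbers is the file count; wc -l asks for just that:
$ find /var/www/html/ -name '*.html' | wc -l
2155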
How many of those pages contain a loading="lazy" directive? Notice that $(...) does command substitution. The command within runs first. Its output, the list of all files named *.html, is inserted as a long list of parameters to the grep command. With its -l option, grep lists only the files with matching lines, rather than its default behavior of showing me the file names and matching lines. Then wc tells me how many lines, words, and characters were in that output. 1,564 files contain that string:
$ grep -l 'loading="lazy"' $( find /var/www/html/ -name '*.html' ) | wc
   1564    1564   74018
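If the difference between those two grep behaviors isn't familiar, a pair of throwaway files makes it easy to see. This little demonstration is just an aside, not part of the actual job:
$ printf '<img loading="lazy">\n' > /tmp/demo1.html
$ printf '<img loading="lazy">\n<img loading="lazy">\n' > /tmp/demo2.html
$ grep 'loading="lazy"' /tmp/demo1.html /tmp/demo2.html
/tmp/demo1.html:<img loading="lazy">
/tmp/demo2.html:<img loading="lazy">
/tmp/demo2.html:<img loading="lazy">
$ grep -l 'loading="lazy"' /tmp/demo1.html /tmp/demo2.html
/tmp/demo1.html
/tmp/demo2.html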
How many loading="lazy" lines appear within those pages? Command substitution gets the list of *.html file names. Then cat reads them all, generates one enormous sequence of HTML, and sends that into grep -c to count the number of matching lines. 26,756:
$ cat $( find /var/www/html/ -name '*.html' ) | grep -c 'loading="'
26756
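The cat stage is there because giving grep -c multiple file names would print a separate count for each file rather than one total. An equivalent pipeline without cat uses GNU grep's -h option, which drops the file-name prefixes so wc -l can count the matching lines as one stream:
$ grep -h 'loading="' $( find /var/www/html/ -name '*.html' ) | wc -l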
Now to think through the details of what I want to do, and decide which command-line utilities are appropriate:
- I want the processing to be applied to the list of files that need this change. So, get the list with something like the above nested grep -l pattern $( find location criteria ) and then go through that list in a for loop.
- I will use the sed stream editor program to make the changes, with its -i option telling it to edit the files in place rather than the usual behavior of sending the results to standard output. A dry run without -i, sketched just below this list, is a cheap first check.
- I want to make a web page describing this and showing the great performance, so I will wrap the loop within the time command.
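Before letting sed -i loose on 1,564 files, previewing the substitution on one representative page costs nothing. Without -i the edited text goes to standard output, so diff can show exactly which lines would change; the path below is only a hypothetical example:
$ sed 's/loading="lazy"/loading="eager"/' /var/www/html/example.com/index.html | diff /var/www/html/example.com/index.html -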
Doing the Work
First, of course, I made a backup so I could quickly and easily restore everything if something went wrong. A simple error could make a horrible mess, so this is vital:
$ cd /var/www/html
$ tar cf ~/web-sites-backup.tar *
$ ls -l ~/web-sites-backup.tar
-rw-rw-r-- 1 cromwell cromwell 4032307200 Feb 10 13:22 /home/cromwell/web-sites-backup.tar
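Had anything gone wrong, restoring from that backup would have been equally simple: extract the archive over the document root and everything returns to its previous state.
$ cd /var/www/html
$ tar xf ~/web-sites-backup.tar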
Here's how I solved the problem:
$ time for f in $( grep -l 'loading="lazy"' $( find /var/www/html/ -name '*html' ) )
> do
>   sed -i 's/loading="lazy"/loading="eager"/' $f
> done

real    0m3.952s
user    0m0.882s
sys     0m2.188s
This required just 3.952 seconds of wall-clock time. The CPU spent 0.882 seconds doing the user tasks, most of that the 1,564 executions of the sed command, plus 2.188 seconds doing the kernel tasks of all the file system I/O.
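A couple of quick sanity checks are worth running before uploading anything: the first command below should now find no files at all, and the second should report the same 26,756 lines, now reading loading="eager".
$ grep -l 'loading="lazy"' $( find /var/www/html/ -name '*.html' ) | wc -l
$ cat $( find /var/www/html/ -name '*.html' ) | grep -c 'loading="eager"'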
Then the upload of the modified files involved 4 GB of data and took 50.7 seconds.
Problem solved!