It's been a while, but here's a handy one I found over the weekend.
Back to the trusty xargs, that slightly blunt and brutish chainsaw for processing a long list of files or whatever that somebody gave you.
In this case, I had a list of files that I knew with 99%+ certainty had been created on our old server and thus encoded in iso-8859-1, contained characters that are represented differently in utf-8 (which we had switched to on our new server) and so needed converting, and a handy script wrapper around iconv to do one file at a time.
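I don't have the original wrapper to show, but it would have been something along these lines (a minimal sketch; the function name and the temp-file dance are my assumptions, not the actual script):

```shell
# Hypothetical sketch of a one-file-at-a-time iconv wrapper:
# convert iso-8859-1 to utf-8 via a temp file, then swap it into place.
to_utf8() {
  f="$1"
  tmp="$f.utf8.tmp"
  iconv -f iso-8859-1 -t utf-8 "$f" > "$tmp" && mv "$tmp" "$f"
}
```

The important property for what follows is simply that it takes exactly one filename as its argument.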
All 41,000 of them. The list took four hours to generate, during which time it occurred to me that I really ought to take advantage of the fact that, usefully, the new server has 40 cores of Xeon goodness. So we should be able to process this list in parallel now that we've got it, right? And ideally without bothering with GNU Parallel or Perl's Parallel::ForkManager?
Turns out we can!
xargs -P <n> (if supported on your OS) runs the commands generated by xargs in n-way parallel.
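A toy illustration of the effect, using sleep rather than anything from the real job: four one-second sleeps run four-way parallel finish in roughly one second of wall-clock time rather than four.

```shell
# Four one-second sleeps, run four-way parallel with -P 4:
# total wall-clock time is about one second instead of four.
printf '1\n1\n1\n1\n' | xargs -n 1 -P 4 sleep
```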
So:
cat <list of 41K files> | xargs -n 1 -P 100 <iconv wrapper>
We need the -n 1 because the wrapper only takes one file at a time, and that's how we tell xargs so. Deep breath. Hit RETURN.
Whoosh. The load on the server briefly rockets to 45, then falls just as fast back to its usual 1-and-a-bit. About one minute flat, for all 41,000 files.
Not bad.
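As a footnote, the difference -n 1 makes is easy to see with a toy example:

```shell
# Without -n 1, xargs packs as many arguments as it can into one command;
# with -n 1, it runs the command once per argument.
printf 'a b c' | xargs echo        # one invocation:    a b c
printf 'a b c' | xargs -n 1 echo   # three invocations: a, then b, then c
```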