Nu Shell and Databricks

I’m a big fan of the command line. It can seem daunting to people at first, but with a little time and patience you can speed up many tasks just by knowing some useful commands and how to chain them together.

Most of the time I’m in PowerShell which, thanks to PowerShell Core, is now cross-platform and incredibly powerful. But I’m also finding myself using Nu more and more. In both shells I use the Databricks CLI a lot. Want to check the status of jobs? Use the CLI. Want to upload and download data? Use the CLI. And so on.

Whilst the Databricks CLI is useful, there are times where I want a little more power over it, such as using the CLI to find a Databricks runtime version which is under Long Term Support (LTS) and is Photon enabled. I can do this using, for instance, the Databricks CLI and some jq. But I’m also lazy and wanted something that’s a bit easier to query, displays nicer, and is easier to output to something like CSV afterwards.

Well, I can get all of that from Nushell. The only downside is that it takes quite a few commands to get the data into the right shape to make querying it easy. So, instead, let’s do the tedious bits once and save them as a command alias. Let’s fire up Nushell and give it a go.

First up, let’s find our config file.

> config path
C:\Users\DarrenFuller\AppData\Roaming\nushell\nu\config\config.toml

Yours will look different to this, but this is the file we need to add our command aliases to.

Now, let’s work out what our command looks like. I want to create a command that calls the Databricks CLI for the runtime versions and adds some useful information, such as whether it’s an LTS version. So what does that look like?

> databricks clusters spark-versions
    | from json
    | get versions
    | insert isLTS { get name | str contains "LTS" }
    | insert isML { get name | str contains "ML" }
    | insert photonEnabled { get name | str contains -i "Photon" }
    | insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}" }
    | insert runtime { get details.runtime }
    | insert spark { get details.spark }
    | reject details

I’ve put that over multiple lines to make it easier to read, but if you want to run it you’ll need to have it all on the same line, like this.

> databricks clusters spark-versions | from json | get versions | insert isLTS { get name | str contains "LTS" } | insert isML { get name | str contains "ML" } | insert photonEnabled { get name | str contains -i "Photon" } | insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}" } | insert runtime { get details.runtime } | insert spark { get details.spark } | reject details

So what’s it doing? Let’s break it down a bit.

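Each command is listed below with its description underneath.

databricks clusters spark-versions
    Runs the Databricks CLI to get the available runtime information
from json
    Parses the JSON response into a table
get versions
    Gets the “versions” part of the response object
insert isLTS { get name | str contains "LTS" }
    Adds a new “isLTS” column which is true when the name contains “LTS”
insert isML { get name | str contains "ML" }
    Adds a new “isML” column which is true when the name contains “ML”
insert photonEnabled { get name | str contains -i "Photon" }
    Adds a new “photonEnabled” column which is true when the name contains “Photon”, ignoring case
insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}" }
    Adds a new “details” column by parsing the name into “runtime”, “spark” and “remainder” parts
insert runtime { get details.runtime }
    Adds a new “runtime” column by getting the runtime information from the details column
insert spark { get details.spark }
    Adds a new “spark” column by getting the Spark version information from the details column
reject details
    Removes the “details” column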
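
For anyone less familiar with Nushell, the same shaping steps can be sketched in Python. This is only an illustration of the logic and not part of the Nushell workflow: the sample JSON is hand-written to mimic the shape of the CLI response, and the function name `shape_runtimes` and the regular expression are hypothetical choices made for this sketch.

```python
import json
import re

# Hand-written sample mimicking the JSON returned by
# `databricks clusters spark-versions` (an assumption, not captured CLI output).
SAMPLE = """
{"versions": [
  {"key": "9.1.x-photon-scala2.12", "name": "9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)"},
  {"key": "10.2.x-gpu-ml-scala2.12", "name": "10.2 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)"},
  {"key": "8.3.x-scala2.12", "name": "8.3 (includes Apache Spark 3.1.1, Scala 2.12)"}
]}
"""

# Same idea as the Nushell parse pattern: capture the runtime label, then the
# Spark version up to the first comma. The rest of the name is ignored.
NAME_PATTERN = re.compile(
    r"^(?P<runtime>.+?) \(includes Apache Spark (?P<spark>[^,]+),"
)


def shape_runtimes(raw_json):
    """Add isLTS, isML, photonEnabled, runtime and spark to each version record."""
    rows = []
    for version in json.loads(raw_json)["versions"]:
        name = version["name"]
        match = NAME_PATTERN.match(name)
        rows.append({
            "key": version["key"],
            "name": name,
            "isLTS": "LTS" in name,
            "isML": "ML" in name,
            # Case-insensitive check, like `str contains -i`.
            "photonEnabled": "photon" in name.lower(),
            "runtime": match.group("runtime") if match else None,
            "spark": match.group("spark") if match else None,
        })
    return rows


if __name__ == "__main__":
    for row in shape_runtimes(SAMPLE):
        print(row["key"], row["isLTS"], row["isML"],
              row["photonEnabled"], row["runtime"], row["spark"])
```

The intermediate “details” column has no counterpart here because the regular expression match plays that role and is simply discarded after the two values are read from it.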

That’s a lot of commands to run each time, so let’s instead save this as a command alias in our config file.

startup = [
    "alias dbx-runtimes = ( databricks clusters spark-versions | from json | get versions | insert isLTS { get name | str contains \"LTS\" } | insert isML { get name | str contains \"ML\" } | insert photonEnabled { get name | str contains -i \"Photon\" } | insert details { get name | parse \"{runtime} (includes Apache Spark {spark},{remainder}\" } | insert runtime { get details.runtime } | insert spark { get details.spark } | reject details )"
]

Here I’ve aliased the command with the name dbx-runtimes. I’ve also had to escape the double-quotation marks. But now that we have this we can run all of the above by simply calling the alias.

> dbx-runtimes
────┬──────────────────────────────────┬────────────────────────────────────────────────────────────────────┬───────┬───────┬───────────────┬────────────────────────────┬───────
 #  │               key                │                                name                                │ isLTS │ isML  │ photonEnabled │          runtime           │ spark
────┼──────────────────────────────────┼────────────────────────────────────────────────────────────────────┼───────┼───────┼───────────────┼────────────────────────────┼───────
  0 │ 6.4.x-esr-scala2.11              │ 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)     │ false │ false │ false         │ 6.4 Extended Support       │ 2.4.5
  1 │ 7.3.x-cpu-ml-scala2.12           │ 7.3 LTS ML (includes Apache Spark 3.0.1, Scala 2.12)               │ true  │ true  │ false         │ 7.3 LTS ML                 │ 3.0.1
  2 │ 7.3.x-hls-scala2.12              │ 7.3 LTS Genomics (includes Apache Spark 3.0.1, Scala 2.12)         │ true  │ false │ false         │ 7.3 LTS Genomics           │ 3.0.1
  3 │ 10.2.x-gpu-ml-scala2.12          │ 10.2 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.2 ML                    │ 3.2.0
  4 │ 7.3.x-gpu-ml-scala2.12           │ 7.3 LTS ML (includes Apache Spark 3.0.1, GPU, Scala 2.12)          │ true  │ true  │ false         │ 7.3 LTS ML                 │ 3.0.1
  5 │ 8.4.x-photon-scala2.12           │ 8.4 Photon (includes Apache Spark 3.1.2, Scala 2.12)               │ false │ false │ true          │ 8.4 Photon                 │ 3.1.2
  6 │ 10.1.x-photon-scala2.12          │ 10.1 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.1 Photon                │ 3.2.0
  7 │ 9.1.x-photon-scala2.12           │ 9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)           │ true  │ false │ true          │ 9.1 LTS Photon             │ 3.1.2
  8 │ 10.2.x-photon-scala2.12          │ 10.2 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.2 Photon                │ 3.2.0
  9 │ 8.3.x-scala2.12                  │ 8.3 (includes Apache Spark 3.1.1, Scala 2.12)                      │ false │ false │ false         │ 8.3                        │ 3.1.1
 10 │ 9.0.x-photon-scala2.12           │ 9.0 Photon (includes Apache Spark 3.1.2, Scala 2.12)               │ false │ false │ true          │ 9.0 Photon                 │ 3.1.2
 11 │ 8.4.x-cpu-ml-scala2.12           │ 8.4 ML (includes Apache Spark 3.1.2, Scala 2.12)                   │ false │ true  │ false         │ 8.4 ML                     │ 3.1.2
 12 │ 10.1.x-gpu-ml-scala2.12          │ 10.1 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.1 ML                    │ 3.2.0
 13 │ 9.1.x-scala2.12                  │ 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)                  │ true  │ false │ false         │ 9.1 LTS                    │ 3.1.2
 14 │ 10.0.x-cpu-ml-scala2.12          │ 10.0 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.0 ML                    │ 3.2.0
 15 │ 9.0.x-gpu-ml-scala2.12           │ 9.0 ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)              │ false │ true  │ false         │ 9.0 ML                     │ 3.1.2
 16 │ 9.0.x-scala2.12                  │ 9.0 (includes Apache Spark 3.1.2, Scala 2.12)                      │ false │ false │ false         │ 9.0                        │ 3.1.2
 17 │ 8.3.x-cpu-ml-scala2.12           │ 8.3 ML (includes Apache Spark 3.1.1, Scala 2.12)                   │ false │ true  │ false         │ 8.3 ML                     │ 3.1.1
 18 │ 10.1.x-cpu-ml-scala2.12          │ 10.1 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.1 ML                    │ 3.2.0
 19 │ 10.0.x-scala2.12                 │ 10.0 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.0                       │ 3.2.0
 20 │ apache-spark-2.4.x-esr-scala2.11 │ Light 2.4 Extended Support (includes Apache Spark 2.4, Scala 2.11) │ false │ false │ false         │ Light 2.4 Extended Support │ 2.4
 21 │ 10.1.x-scala2.12                 │ 10.1 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.1                       │ 3.2.0
 22 │ 9.1.x-cpu-ml-scala2.12           │ 9.1 LTS ML (includes Apache Spark 3.1.2, Scala 2.12)               │ true  │ true  │ false         │ 9.1 LTS ML                 │ 3.1.2
 23 │ 10.2.x-scala2.12                 │ 10.2 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.2                       │ 3.2.0
 24 │ 10.2.x-cpu-ml-scala2.12          │ 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.2 ML                    │ 3.2.0
 25 │ 8.3.x-photon-scala2.12           │ 8.3 Photon (includes Apache Spark 3.1.1, Scala 2.12)               │ false │ false │ true          │ 8.3 Photon                 │ 3.1.1
 26 │ 10.0.x-photon-scala2.12          │ 10.0 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.0 Photon                │ 3.2.0
 27 │ 10.0.x-gpu-ml-scala2.12          │ 10.0 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.0 ML                    │ 3.2.0
 28 │ 8.4.x-scala2.12                  │ 8.4 (includes Apache Spark 3.1.2, Scala 2.12)                      │ false │ false │ false         │ 8.4                        │ 3.1.2
 29 │ 9.1.x-gpu-ml-scala2.12           │ 9.1 LTS ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)          │ true  │ true  │ false         │ 9.1 LTS ML                 │ 3.1.2
 30 │ apache-spark-2.4.x-scala2.11     │ Light 2.4 (includes Apache Spark 2.4, Scala 2.11)                  │ false │ false │ false         │ Light 2.4                  │ 2.4
 31 │ 7.3.x-scala2.12                  │ 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)                  │ true  │ false │ false         │ 7.3 LTS                    │ 3.0.1
 32 │ 8.4.x-gpu-ml-scala2.12           │ 8.4 ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)              │ false │ true  │ false         │ 8.4 ML                     │ 3.1.2
 33 │ 9.0.x-cpu-ml-scala2.12           │ 9.0 ML (includes Apache Spark 3.1.2, Scala 2.12)                   │ false │ true  │ false         │ 9.0 ML                     │ 3.1.2
 34 │ 8.3.x-gpu-ml-scala2.12           │ 8.3 ML (includes Apache Spark 3.1.1, GPU, Scala 2.12)              │ false │ true  │ false         │ 8.3 ML                     │ 3.1.1
────┴──────────────────────────────────┴────────────────────────────────────────────────────────────────────┴───────┴───────┴───────────────┴────────────────────────────┴───────

Your output might look different depending on when you run the command.

But from this we can now start adding in some filters to get to the records we want. So if I want to find all of the runtimes which are Long Term Support but aren’t ML instances I can do the following.

> dbx-runtimes | where isLTS | where isML == $false | sort-by key
───┬────────────────────────┬────────────────────────────────────────────────────────────┬───────┬───────┬───────────────┬──────────────────┬───────
 # │          key           │                            name                            │ isLTS │ isML  │ photonEnabled │     runtime      │ spark
───┼────────────────────────┼────────────────────────────────────────────────────────────┼───────┼───────┼───────────────┼──────────────────┼───────
 0 │ 7.3.x-hls-scala2.12    │ 7.3 LTS Genomics (includes Apache Spark 3.0.1, Scala 2.12) │ true  │ false │ false         │ 7.3 LTS Genomics │ 3.0.1
 1 │ 7.3.x-scala2.12        │ 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)          │ true  │ false │ false         │ 7.3 LTS          │ 3.0.1
 2 │ 9.1.x-photon-scala2.12 │ 9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)   │ true  │ false │ true          │ 9.1 LTS Photon   │ 3.1.2
 3 │ 9.1.x-scala2.12        │ 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)          │ true  │ false │ false         │ 9.1 LTS          │ 3.1.2
───┴────────────────────────┴────────────────────────────────────────────────────────────┴───────┴───────┴───────────────┴──────────────────┴───────

A lot simpler to read, and very easy to work with. And if I want to save the results I can just add | save runtimes.csv to the end of the command and I’ll have a CSV file with the same data in it.
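
To make the filter, sort and save steps concrete outside of Nushell, here is a rough Python equivalent. It is a sketch under assumptions: the rows are a handful copied by hand from the dbx-runtimes output above with some columns dropped, and the helper names `lts_non_ml` and `to_csv` are hypothetical.

```python
import csv
import io

# A few rows copied by hand from the dbx-runtimes output (name and runtime
# columns omitted to keep the sample short).
ROWS = [
    {"key": "9.1.x-scala2.12", "isLTS": True, "isML": False, "photonEnabled": False, "spark": "3.1.2"},
    {"key": "7.3.x-cpu-ml-scala2.12", "isLTS": True, "isML": True, "photonEnabled": False, "spark": "3.0.1"},
    {"key": "9.1.x-photon-scala2.12", "isLTS": True, "isML": False, "photonEnabled": True, "spark": "3.1.2"},
    {"key": "10.2.x-photon-scala2.12", "isLTS": False, "isML": False, "photonEnabled": True, "spark": "3.2.0"},
    {"key": "7.3.x-scala2.12", "isLTS": True, "isML": False, "photonEnabled": False, "spark": "3.0.1"},
]


def lts_non_ml(rows):
    """Rough equivalent of: where isLTS | where isML == $false | sort-by key"""
    kept = [row for row in rows if row["isLTS"] and not row["isML"]]
    return sorted(kept, key=lambda row: row["key"])


def to_csv(rows):
    """Render the rows as CSV text, roughly what saving to a .csv file produces."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(lts_non_ml(ROWS)))
```

Note that Python writes booleans as True and False, so the exact text in the file may differ from what Nushell saves.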

I’ve done the same with the Databricks cluster node types as well. That one is a lot less complex than the one above, but it makes querying for the information a lot simpler. And with Nushell providing great features for filtering, displaying, and getting data, it’s a smooth and easy workflow.

