The Swiss Army Knife of Text Operations in Linux

One of the most common tasks performed on the command line is sorting data. This can be done using the sort command, which is part of the GNU Core Utilities package. Sorting is useful when you need to arrange a list of items in a specific order, such as alphabetically or numerically.

Here's an example of how to use the sort command:

LC_ALL=C sort -u -b -i -f -S 80% --parallel=8 file.txt

Let's break down this command:

  • LC_ALL=C: This sets the locale to C for this one command, so sort compares lines byte by byte instead of applying language-specific collation rules. The result is predictable across systems and usually faster.
  • -u: Only output unique lines from the input files. If this option is not specified, duplicate lines will be included in the output.
  • -b: Ignore leading whitespace characters when sorting. This option is useful if your data contains spaces or tabs at the beginning of each line.
  • -i: Ignore nonprinting characters when sorting. Despite the letter, this option has nothing to do with case.
  • -f: Fold lower-case letters into their upper-case counterparts before comparing. This is the option that makes the sort case-insensitive.
  • -S 80%: Set the size of the main memory buffer, here up to 80% of physical memory. Data that does not fit in the buffer is spilled to temporary files.
  • --parallel=8: Run up to eight sorts concurrently. This can speed up the sorting process by allowing multiple cores to work on the task simultaneously.

The file.txt at the end of the command specifies the input file that you want to sort.
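To make the effect of -u and -f concrete, here is a minimal sketch on a throwaway file. The sample words and the variable names are made up for illustration; only the sort options come from the command above.

```shell
# Hypothetical sample data written to a temporary file
sample=$(mktemp)
printf '%s\n' 'banana' 'Apple' 'apple' 'cherry' 'banana' > "$sample"

# -f folds case, so "Apple" and "apple" compare as equal;
# -u then keeps only one line from each group of equal lines
deduped=$(LC_ALL=C sort -u -f "$sample")
printf '%s\n' "$deduped"

rm -f "$sample"
```

The result has three lines (one spelling of apple, then banana, then cherry), because the two spellings of apple count as duplicates once case is folded.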

Another common operation performed on the command line is deleting data from files, directories, or databases. JSON is a popular format for storing structured data, and various tools and libraries can manipulate JSON documents.

Here's an example of how to delete an element from a JSON file using the jq command:

LC_ALL=C jq --raw-output -c 'del(.invite_code | select (..))' file

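This command deletes the .invite_code field from each object in the input file. Inside the jq program, the | character passes the result of one filter to the next (it is not a shell pipe), and select() keeps only the values for which its condition is truthy. As written, the field is therefore removed only when it holds something other than null or false; to delete it unconditionally, del(.invite_code) is enough. The -c option prints each result on a single compact line.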

If you need to perform multiple deletions on a single line, you can use shell pipes (|) to chain together several jq invocations:

LC_ALL=C jq 'del(.updatedAt | select (..))' file | jq 'del(.createdAt | select (..))' | jq 'del(.roles | select (..))' | jq --raw-output -c 'del(.preferences.fcm_token | select (..))' > outfile

This example deletes four different fields from the input JSON document and outputs the result to a new file called outfile.
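Assuming the fields should be removed unconditionally, the same clean-up can be sketched as a single jq call, since del() accepts several comma-separated paths. The sample document below is hypothetical; only the field names are taken from the example above.

```shell
# Hypothetical sample document containing the four fields to remove
input=$(mktemp)
printf '%s\n' '{"id":1,"updatedAt":"x","createdAt":"y","roles":["a"],"preferences":{"fcm_token":"t","lang":"en"}}' > "$input"

# One jq process, one del() call listing every path to delete
result=$(jq -c 'del(.updatedAt, .createdAt, .roles, .preferences.fcm_token)' "$input")
printf '%s\n' "$result"   # → {"id":1,"preferences":{"lang":"en"}}

rm -f "$input"
```

Running one jq process instead of four avoids parsing and re-serializing the document at every stage of the pipeline.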

Directories can be deleted using the rm command, which is also part of the GNU Core Utilities package. The -f option forces deletion without prompting for confirmation and ignores files that do not exist, and the -r option removes a directory together with everything inside it.

Here's an example that deletes all directories that don't match a specific pattern:

find . -type d -not -name "US*" -exec rm -f -r '{}' \;

The find command locates the directories, and -type d restricts the search to directories. The -not -name "US*" test excludes any directory whose name starts with "US" (the * wildcard matches the rest of the name). Finally, the -exec rm -f -r {} \; part of the command deletes each directory that remains. Note that the test is applied at every depth, so a subdirectory inside a US* directory is also deleted unless its own name starts with "US".
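Because the deletion cannot be undone, it may be worth previewing the matches first. The sketch below is a dry run on a made-up directory tree: it swaps -exec rm for -print and, as an assumption about the intent, limits the search to top-level directories with -mindepth 1 -maxdepth 1 so that neither "." nor nested directories are considered.

```shell
# Hypothetical scratch tree for the dry run
root=$(mktemp -d)
mkdir -p "$root/US-east/logs" "$root/USA" "$root/EU-west" "$root/tmp"

# Preview only: list the top-level directories that would be deleted
to_delete=$(cd "$root" && find . -mindepth 1 -maxdepth 1 -type d -not -name "US*" -print | LC_ALL=C sort)
printf '%s\n' "$to_delete"

rm -rf "$root"
```

Here the preview lists ./EU-west and ./tmp, while US-east (including its logs subdirectory) and USA are left alone. Once the list looks right, -print can be replaced with the -exec rm action.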

If you need to perform a batch operation on multiple files or directories, you can use a for loop in combination with other commands. For example, here's how you could decompress all RAR archives in the current directory using a password:

for f in *.rar; do echo [password] | unrar x "$f"; done

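This loop is not meant to prompt interactively. On every iteration it pipes the password (substitute your own for the [password] placeholder) to unrar's standard input and then extracts that archive. Quote the variable as "$f" so that archive names containing spaces are passed to unrar intact.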
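Before running a batch loop for real, a dry run can show which files it would touch. The sketch below uses invented file names and prints a message instead of calling unrar; the [ -e "$f" ] guard is an addition that skips the unexpanded *.rar pattern when the directory holds no archives.

```shell
# Hypothetical scratch directory, including a name with a space in it
work=$(mktemp -d)
touch "$work/first.rar" "$work/second part.rar"

# Dry run: report each archive instead of extracting it
planned=$(
  cd "$work" || exit 1
  for f in *.rar; do
    [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
    printf 'would extract: %s\n' "$f"
  done
)
printf '%s\n' "$planned"

rm -rf "$work"
```

Because "$f" is quoted, the archive with a space in its name shows up as one item rather than being split into two words.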

Another useful command is find, which can locate files or directories based on various criteria. Here's an example that finds every file named somefile.doc in the current directory and its subdirectories, and converts it to TXT format using the catdoc utility (replace the name with a pattern such as "*.doc" to match all DOC files):

find . -iname "somefile.doc" -exec bash -c '/usr/bin/catdoc "$1" >> "$1.txt"' _ '{}' \;

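The -iname option makes the search case-insensitive. The converted text is appended (>>) to a file named after the original with .txt added, for example somefile.doc.txt.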
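A common, safer way to run a small script from -exec is to pass the file name to the inner shell as a positional argument ($1) rather than splicing {} into the script text, so that names with spaces or quotes cannot alter the script. The sketch below illustrates that pattern on a made-up file; tr stands in for catdoc (an assumption, so the example runs without catdoc installed) and simply upper-cases the content.

```shell
# Hypothetical scratch tree with a space in the path and an upper-case extension
docs=$(mktemp -d)
mkdir -p "$docs/sub dir"
printf 'hello\n' > "$docs/sub dir/Report.DOC"

# "_" fills $0 of the inner shell; the file found by find arrives as $1
find "$docs" -iname "*.doc" -exec sh -c 'tr a-z A-Z < "$1" >> "$1.txt"' _ {} \;

converted=$(cat "$docs/sub dir/Report.DOC.txt")
printf '%s\n' "$converted"

rm -rf "$docs"
```

The -iname "*.doc" pattern matches Report.DOC despite the upper-case extension, and the output lands next to it in Report.DOC.txt.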

If you are looking for directories with very specific names, you can use a regular expression (regex) with the find command. Here's an example that finds all directories directly under the current directory whose names consist of exactly two uppercase letters:

find . -maxdepth 1 -type d -regextype egrep -regex ".*/[A-Z]{2}"

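This will output a list of matching directory paths. The -regex test is matched against the whole path (for example ./US), which is why the pattern begins with ".*/".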
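The behaviour of the pattern can be checked on a scratch directory. This sketch assumes GNU find, since -regextype is a GNU extension; it uses the posix-extended regex type in place of egrep, on the assumption that the {2} interval syntax is reliably available there, and sets LC_ALL=C so that [A-Z] covers only ASCII uppercase letters. The directory names are invented.

```shell
# Hypothetical scratch tree: only US and DE are exactly two uppercase letters
base=$(mktemp -d)
mkdir "$base/US" "$base/DE" "$base/usa" "$base/U1" "$base/FRA"

# The regex must match the whole path, e.g. "./US"
matches=$(cd "$base" && LC_ALL=C find . -maxdepth 1 -type d -regextype posix-extended -regex ".*/[A-Z]{2}" | LC_ALL=C sort)
printf '%s\n' "$matches"

rm -rf "$base"
```

With GNU find this lists ./DE and ./US; usa, U1, and FRA are rejected because of their case, the digit, and their length respectively.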