String Manipulation

refer to linux foundation course from edx:

Operator Meaning
[[ string1 > string2 ]] Compares the sorting order of string1 and string2.
[[ string1 == string2 ]] Compares the characters in string1 with the characters in string2.
myLen1=${#string1} Saves the length of string1 in the variable myLen1.

At times, you may not need to compare or use an entire string. To extract the first character of a string we can specify:
${string:0:1} Here 0 is the offset in the string (i.e., which character to begin from) where the extraction needs to start and 1 is the number of characters to be extracted.
To extract all characters in a string after a dot (.), use the following expression: ${string#*.}

to check if a file exists, use the following conditional test:
[ -e <filename> ]
to check if a directory exists, use the following conditional test:
[ -d <filename> ]
to check if a sym-link exists, use the following conditional test:
[ -s <sym-link> ]


Regular Expressions and Search Patterns

some examples:

Command Usage
a.. matches azy
b.|j. matches both br and ju
..$ matches og
l.* matches lazy dog
l.*y matches lazy
the.* matches the whole sentence

search pattern

Search Patterns Usage
.(dot) Match any single character
a|z Match a or z
$ Match end of string
* Match preceding item 0 or more times
Command Usage
grep [pattern] <filename> Search for a pattern in a file and print all matching lines
grep -v [pattern] <filename> Print all lines that do not match the pattern
grep [0-9] <filename> Print the lines that contain the numbers 0 through 9
grep -C 3 [pattern] <filename> Print context of lines (specified number of lines above and below the pattern) for matching the pattern. Here the number of lines is specified as 3.

wc (word count) counts the number of lines, words, and characters in a file or list of files. Options are given in the table below.

By default all three of these options are active.

For example, to print the number of lines contained in a file, at the command prompt type wc -l filename and press the Enter key

wc -l (lines)

wc -w (words)

wc -c (charactors)

text edit tools II


sort is used to rearrange the lines of a text file either in ascending or descending order, according to a sort key. You can also sort by particular fields of a file. The default sort key is the order of the ASCII characters (i.e., essentially alphabetically).

sort can be used as follows:

Syntax Usage
sort <filename> Sort the lines in the specified file
cat file1 file2 | sort Append the two files, then sort the lines and display the output on the terminal
sort -r <filename> Sort the lines in reverse order

uniq is used to remove duplicate lines in a text file and is useful for simplifying text display. uniq requires that the duplicate entries to be removed are consecutive. Therefore one often runs sort first and then pipes the output into uniq; if sort is passed the -u option it can do all this in one step.

sort file1 file2 | uniq > file3

sort -u file1 file2 > file3

paste can be used to create a single file containing all three columns. The different columns are identified based on delimiters (spacing used to separate two fields). For example, delimiters can be a blank space, a tab, or an Enter. In the image provided, a single space is used as the delimiter in all files.

paste accepts the following options:

  • -d delimiters, which specify a list of delimiters to be used instead of tabs for separating consecutive values on a single line. Each delimiter is used in turn; when the list has been exhausted, paste begins again at the first delimiter.
  • -s, which causes paste to append the data in series rather than in parallel; that is, in a horizontal rather than vertical fashion.

To paste contents from two files one can do:
$ paste file1 file2

The syntax to use a different delimiter is as follows:
$ paste -d, file1 file2

Common delimiters are ‘space’, ‘tab’, ‘|’, ‘comma’, etc

Suppose you have two files with some similar columns. You have saved employees’ phone numbers in two files, one with their first name and the other with their last name. You want to combine the files without repeating the data of common columns. How do you achieve this?

The above task can be achieved using join, which is essentially an enhanced version of paste. It first checks whether the files share common fields, such as names or phone numbers, and then joins the lines in two files based on a common field.


To combine two files on a common field, at the command prompt type join file1 file2 and press the Enter key.

$ cat phonebook
555-123-4567 Bob
555-231-3325 Carol
555-340-5678 Ted
555-289-6193 Alice
$ cat directory
555-123-4567 Anytown
555-231-3325 Mytown
555-340-5678 Yourtown
555-289-6193 Youngstown
The result of joining these two file is as shown in the output of the following command:
$ join phonebook directory
555-123-4567 Bob Anytown
555-231-3325 Carol Mytown
555-340-5678 Ted Yourtown
555-289-6193 Alice Youngstown

split is used to break up (or split) a file into equal-sized segments for easier viewing and manipulation, and is generally used only on relatively large files.


Common text edit tools

refer to linux foundation from Edx:

Command Usage
cat file1 file2 Concatenate multiple files and display the output; i.e., the entire content of the first file is followed by that of the second file.
cat file1 file2 > newfile Combine multiple files and save the output into a new file.
cat file >> existingfile Append a file to the end of an existing file.
cat > file Any subsequent lines typed will go into the file until CTRL-D is typed.
cat >> file Any subsequent lines are appended to the file until CTRL-D is typed.

The tac command (cat spelled backwards) prints the lines of a file in reverse order. (Each line remains the same but the order of lines is inverted.) The syntax of tac is exactly the same as for cat as in

Command Usage
echo string > newfile The specified string is placed in a new file.
echo string >> existingfile The specified string is appended to the end of an already existing file.
echo $variable The contents of the specified environment variable are displayed.

$ less <filename>
$ cat <filename> | less

head reads the first few lines of each named file (10 by default) and displays it on standard output. You can give a different number of lines in an option

$ head –n 5 atmtrans.txt

tail prints the last few lines of each named file and displays it on standard output. By default, it displays the last 10 lines.

$ tail -n 15 atmtrans.txt

Command Description
$ zcat compressed-file.txt.gz To view a compressed file
$ zless <filename>.gz
$ zmore <filename>.gz
To page through a compressed file
$ zgrep -i less test-file.txt.gz To search inside a compressed file
$ zdiff filename1.txt.gz
To compare two compressed files
Command Usage
sed -e command <filename> Specify editing commands at the command line, operate on file and put the output on standard out (e.g., the terminal)
sed -f scriptfile <filename> Specify a scriptfile containing sed commands, operate on file and put output on standard out.
Command Usage
sed s/pattern/replace_string/ file Substitute first string occurrence in a line
sed s/pattern/replace_string/g file Substitute all string occurrences in a line
sed 1,3s/pattern/replace_string/g file Substitute all string occurrences in a range of lines
sed -i s/pattern/replace_string/g file Save changes for string substitution in the same file

You must use the -i option with care, because the action is not reversible. It is always safer to use sed without the –i option and then replace the file yourself, as shown in the following example:

$ sed s/pattern/replace_string/g file1 > file2

The above command will replace all occurrences of pattern with replace_string in file1 and move the contents tofile2. The contents of file2 can be viewed with cat file2. If you approve you can then overwrite the original file with mv file2 file1.

Example: To convert 01/02/… to JAN/FEB/…
sed -e ‘s/01/JAN/’ -e ‘s/02/FEB/’ -e ‘s/03/MAR/’ -e ‘s/04/APR/’ -e ‘s/05/MAY/’ \
-e ‘s/06/JUN/’ -e ‘s/07/JUL/’ -e ‘s/08/AUG/’ -e ‘s/09/SEP/’ -e ‘s/10/OCT/’ \
-e ‘s/11/NOV/’ -e ‘s/12/DEC/’


awk is used to extract and then print specific contents of a file and is often used to construct reports.

awk is invoked as shown in the following:

Command Usage
awk ‘command’ var=value file Specify a command directly at the command line
awk -f scriptfile var=value file Specify a file that contains the script to be executed along with f

As with sed, short awk commands can be specified directly at the command line, but a more complex script can be saved in a file that you can specify using the -f option.

The table explains the basic tasks that can be performed using awk. The input file is read one line at a time, and for each line, awk matches the given pattern in the given order and performs the requested action. The -F option allows you to specify a particular field separator character. For example, the /etc/passwd file uses : to separate the fields, so the -F: option is used with the /etc/passwd file.

The command/action in awk needs to be surrounded with apostrophes (or single-quote (‘)). awk can be used as follows:

Command Usage
awk ‘{ print $0 }’ /etc/passwd Print entire file
awk -F: ‘{ print $1 }’ /etc/passwd Print first field (column) of every line, separated by a space
awk -F: ‘{ print $1 $6 }’ /etc/passwd Print first and sixth field of every line