text edit tools II

sort

sort is used to rearrange the lines of a text file either in ascending or descending order, according to a sort key. You can also sort by particular fields of a file. The default sort key is the order of the ASCII characters (i.e., essentially alphabetically).

sort can be used as follows:

Syntax Usage
sort <filename> Sort the lines in the specified file
cat file1 file2 | sort Append the two files, then sort the lines and display the output on the terminal
sort -r <filename> Sort the lines in reverse order

uniq is used to remove duplicate lines in a text file and is useful for simplifying text display. uniq requires that the duplicate entries to be removed are consecutive. Therefore one often runs sort first and then pipes the output into uniq; if sort is passed the -u option it can do all this in one step.

sort file1 file2 | uniq > file3

sort -u file1 file2 > file3

paste can be used to create a single file containing all three columns. The different columns are identified based on delimiters (spacing used to separate two fields). For example, delimiters can be a blank space, a tab, or an Enter. In the image provided, a single space is used as the delimiter in all files.

paste accepts the following options:

  • -d delimiters, which specify a list of delimiters to be used instead of tabs for separating consecutive values on a single line. Each delimiter is used in turn; when the list has been exhausted, paste begins again at the first delimiter.
  • -s, which causes paste to append the data in series rather than in parallel; that is, in a horizontal rather than vertical fashion.

To paste contents from two files one can do:
$ paste file1 file2

The syntax to use a different delimiter is as follows:
$ paste -d, file1 file2

Common delimiters are ‘space’, ‘tab’, ‘|’, ‘comma’, etc

Suppose you have two files with some similar columns. You have saved employees’ phone numbers in two files, one with their first name and the other with their last name. You want to combine the files without repeating the data of common columns. How do you achieve this?

The above task can be achieved using join, which is essentially an enhanced version of paste. It first checks whether the files share common fields, such as names or phone numbers, and then joins the lines in two files based on a common field.

join

To combine two files on a common field, at the command prompt type join file1 file2 and press the Enter key.

$ cat phonebook
555-123-4567 Bob
555-231-3325 Carol
555-340-5678 Ted
555-289-6193 Alice
$ cat directory
555-123-4567 Anytown
555-231-3325 Mytown
555-340-5678 Yourtown
555-289-6193 Youngstown
The result of joining these two file is as shown in the output of the following command:
$ join phonebook directory
555-123-4567 Bob Anytown
555-231-3325 Carol Mytown
555-340-5678 Ted Yourtown
555-289-6193 Alice Youngstown

split is used to break up (or split) a file into equal-sized segments for easier viewing and manipulation, and is generally used only on relatively large files.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s