1. Overview
Sorting lines of text is a common task in Linux. In this tutorial, we will learn the sort command through examples.
2. Introduction to the sort Command
The sort command helps us rearrange lines from standard input (stdin) or from a text file. It writes the sorted result to standard output (stdout). It is available on all Linux distributions because it is part of the GNU Coreutils package.
The syntax of the sort command is simple:
sort [OPTION]... [FILE]...
By default, the sort utility sorts lines alphabetically:
$ cat cities.txt
New York City
Paris
Beijing
Hamburg
Los Angeles
Amsterdam
$ sort cities.txt
Amsterdam
Beijing
Hamburg
Los Angeles
New York City
Paris
However, if we pass options to it, the sort command can do more than that, for example, sort lines by number, in reverse order, by field, and so on.
Let's look at some examples to learn how to sort lines in different ways using the sort command.
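As a small sketch of the syntax above, the following shows sort reading from stdin when no FILE is given, and sorting the lines of several files together when more than one FILE is given. The file names a.txt and b.txt are hypothetical, and LC_ALL=C is set only to make the result independent of the local language settings.

```shell
# Hypothetical sample files, created in a temporary directory for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' Paris Beijing > "$tmpdir/a.txt"
printf '%s\n' Hamburg Amsterdam > "$tmpdir/b.txt"

# With no FILE argument, sort reads standard input.
from_stdin=$(printf '%s\n' Paris Beijing | LC_ALL=C sort)

# With several FILE arguments, the lines of all files are sorted together.
from_files=$(LC_ALL=C sort "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$from_stdin"
printf '%s\n' '---'
printf '%s\n' "$from_files"

# Clean up the temporary directory.
rm -r "$tmpdir"
```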
3. Sort by number
We often need to sort lines numerically. We can pass the -n option to sort to do that.
Let's create a new file, cities2.txt, and add a new column: population in millions. We'll sort the lines in the new file by population:
$ cat cities2.txt
8.18 New York City
2.15 Paris
21.45 Beijing
1.82 Hamburg
3.90 Los Angeles
1.38 Amsterdam
$ sort -n cities2.txt
1.38 Amsterdam
1.82 Hamburg
2.15 Paris
3.90 Los Angeles
8.18 New York City
21.45 Beijing
The lines in the output above are ordered by population.
sort -n sorts lines by decimal numbers, including negative ones. However, it does not recognize a leading "+" sign, exponential notation, or binary and hexadecimal numbers.
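The difference between text order and numeric order can be sketched with a few inline values. The sample numbers are made up for illustration, and the last two commands assume GNU sort, whose -g (general numeric) option accepts a leading "+" and exponential notation, which -n does not.

```shell
# Plain text comparison: "10" sorts before "9" because '1' comes before '9'.
text_order=$(printf '%s\n' 10 9 -3 2.5 | LC_ALL=C sort | tr '\n' ' ')

# Numeric comparison with -n, including a negative value and a decimal.
numeric_order=$(printf '%s\n' 10 9 -3 2.5 | LC_ALL=C sort -n | tr '\n' ' ')

# -n reads "+5" as 0 and "1e3" as 1; -g parses them as 5 and 1000.
n_order=$(printf '%s\n' 1e3 +5 20 7 | LC_ALL=C sort -n | tr '\n' ' ')
g_order=$(printf '%s\n' 1e3 +5 20 7 | LC_ALL=C sort -g | tr '\n' ' ')

printf 'text:    %s\n' "$text_order"
printf 'numeric: %s\n' "$numeric_order"
printf 'with -n: %s\n' "$n_order"
printf 'with -g: %s\n' "$g_order"
```

The -g option is slower than -n, so -n remains the usual choice for plain decimal columns like the population figures above.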
4. Sort in reverse order
To sort in reverse order, we use the -r option.
Now let's sort cities2.txt by population in descending order:
$ sort -nr cities2.txt
21.45 Beijing
8.18 New York City
3.90 Los Angeles
2.15 Paris
1.82 Hamburg
1.38 Amsterdam
5. Sort by month
Sometimes our text contains months, like "November" or "August". The sort command supports the -M option to sort lines by month:
$ cat Monate.txt
October
January
December
November
August
$ sort -M Monate.txt
January
August
October
November
December
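A minimal sketch of the same idea with inline data, comparing -M against the default text order. It assumes the English month abbreviations that sort recognizes in the C locale; in other locales, the recognized names follow the locale.

```shell
# Month order with -M: calendar order rather than alphabetical order.
month_order=$(printf '%s\n' Oct Jan Dec Nov Aug | LC_ALL=C sort -M | tr '\n' ' ')

# Default text order of the same input, for comparison.
text_order=$(printf '%s\n' Oct Jan Dec Nov Aug | LC_ALL=C sort | tr '\n' ' ')

printf 'with -M: %s\n' "$month_order"
printf 'default: %s\n' "$text_order"
```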
6. Sort by ASCII character code
Occasionally, we want to sort lines by ASCII character code rather than alphabetically.
Let's look at a text file:
$ cat ascii.txt
C
B
b
c
A
a
If we sort it with the default options of the sort command, we get:
$ sort ascii.txt
a
A
b
B
c
C
The result is sorted alphabetically.
However, it is not in the order of the ASCII code. For example, an uppercase "A" has an ASCII code of 65, while a lowercase "a" has an ASCII code of 97.
LC_ALL is the environment variable that overrides all other locale settings. To sort lines by ASCII code, we set the environment variable LC_ALL=C, which forces byte-by-byte ordering.
Let's see how this environment variable changes the default behavior of the sort command:
$ LC_ALL=C sort ascii.txt
A
B
C
a
b
c
In the above command, we set LC_ALL=C temporarily, only for the execution of the sort command. It doesn't change the LC_ALL value in the current shell.
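The scoping of the variable can be checked with a short sketch: we record LC_ALL before and after a sort call that is prefixed with LC_ALL=C. The variable names before, after, and sorted are only illustrative.

```shell
# Remember the current value of LC_ALL (or the word "unset" if it is not set).
before=${LC_ALL-unset}

# The assignment prefix applies to this single sort invocation only.
sorted=$(printf '%s\n' C B b c A a | LC_ALL=C sort | tr '\n' ' ')

# The shell's own LC_ALL is the same as before the command.
after=${LC_ALL-unset}

printf 'sorted: %s\n' "$sorted"
printf 'LC_ALL before: %s, after: %s\n' "$before" "$after"
```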
7. Write the sorted output to a file
By default, the sort command writes the result to stdout. Sometimes we want to save the output to a file. We can pass the -o option to the sort command to write the result to a file instead of stdout:
$ sort -o ascii_result.txt ascii.txt
$ cat ascii_result.txt
a
A
b
B
c
C
In addition to using the -o option, we can also redirect stdout to our output file:
$ sort ascii.txt > ascii_result.txt
However, if we use redirection and want to write the sorted result back to the input file, we must go through a temporary file, because the shell empties the redirection target before sort gets a chance to read it:
$ sort ascii.txt > ordenado.tmp && mv ordenado.tmp ascii.txt
$ cat ascii.txt
a
A
b
B
c
C
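The following sketch, assuming GNU sort, contrasts the two ways of writing back to the input file. Redirecting onto the input file loses the data, while the -o option may name the input file itself, because sort reads all of its input before it opens the output file. The file names letters.txt and clobbered.txt are hypothetical.

```shell
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' C B A > "$tmpdir/letters.txt"
cp "$tmpdir/letters.txt" "$tmpdir/clobbered.txt"

# Redirecting onto the input file: the shell truncates it before sort runs,
# so sort sees an empty file and the data is lost.
sort "$tmpdir/clobbered.txt" > "$tmpdir/clobbered.txt"
clobbered_size=$(wc -c < "$tmpdir/clobbered.txt" | tr -d ' ')

# Naming the input file with -o: the file is sorted in place.
LC_ALL=C sort -o "$tmpdir/letters.txt" "$tmpdir/letters.txt"
in_place=$(tr '\n' ' ' < "$tmpdir/letters.txt")

printf 'bytes left after redirecting onto the input: %s\n' "$clobbered_size"
printf 'content after sort -o onto the input: %s\n' "$in_place"

# Clean up the temporary directory.
rm -r "$tmpdir"
```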
8. Sort and remove duplicates
When we pass the -u option to the sort command, it produces a "unique" result: it outputs the sorted lines and removes duplicates:
$ cat dup.txt
New York
Paris
Beijing
Paris
New York
Hamburg
New York
Hamburg
$ sort -u dup.txt
Beijing
Hamburg
New York
Paris
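For whole-line sorting, sort -u gives the same result as piping sort into uniq. A minimal sketch with made-up inline data:

```shell
# Sort and drop duplicate lines in one step.
with_u=$(printf '%s\n' Paris Beijing Paris Hamburg Beijing | LC_ALL=C sort -u | tr '\n' ' ')

# The classic two-step pipeline: uniq removes adjacent duplicates after sorting.
with_uniq=$(printf '%s\n' Paris Beijing Paris Hamburg Beijing | LC_ALL=C sort | uniq | tr '\n' ' ')

printf 'sort -u:     %s\n' "$with_u"
printf 'sort | uniq: %s\n' "$with_uniq"
```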
9. Sort by keys
Until now, we have always sorted by the beginning of the lines. We can also sort lines by key. For this, we pass the -k option to the sort command.
This is very useful when we need to sort data based on fields, as in CSV files.
Let's work with a working-hours report in a CSV file (name, month, working hours, comments):
$ cat working_hours.csv
Dr. Schmidt,Jan,123,some comments...
Ms. Green,Feb,20,some comments...
Dr. Schmidt,Mar,25,some comments...
Mr. Adams,Jan,77,some comments...
Ms. Green,Jan,45,some comments...
Mr. Adams,Feb,150,some comments...
Mr. Adams,Mar,80,some comments...
Ms. Green,Mar,107,some comments...
Dr. Schmidt,Feb,87,some comments...
Next, we'll see how to sort the CSV file by fields and part of a field.
9.1. Sort by a field
Let's say we want to sort the lines by the third field: working hours.
Let's first take a look at the sort command that solves the problem:
$ sort -t, -k 3n,3 working_hours.csv
Ms. Green,Feb,20,some comments...
Dr. Schmidt,Mar,25,some comments...
Ms. Green,Jan,45,some comments...
Mr. Adams,Jan,77,some comments...
Mr. Adams,Mar,80,some comments...
Dr. Schmidt,Feb,87,some comments...
Ms. Green,Mar,107,some comments...
Dr. Schmidt,Jan,123,some comments...
Mr. Adams,Feb,150,some comments...
Now let's understand each part of the command.
By default, the sort command separates fields at the transition from non-blank to blank characters. We can also define a custom field separator with the -t option. Since the fields in our CSV file are separated by commas, we use "-t,".
Then we define a sort key, 3n,3, for the -k option. The definition of a key in the sort command is:
POS1[sort options],POS2
POS1 is the start position of the key, while POS2 is the end position of the key. If we omit POS2, the end of the line is used as POS2.
Our goal is to sort by the third field with the -n (numeric) option, so we have -k 3n,3.
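To see why POS2 matters, here is a small sketch with two made-up records whose second field is identical. With -k2, the key runs from field 2 to the end of the line, so the third field takes part in the comparison. With -k2,2, the key is field 2 only, the keys tie, and sort falls back to comparing the whole lines.

```shell
# Two hypothetical records with the same second field.
data='a,x,2
b,x,1'

# Key from field 2 to end of line: compares "x,2" with "x,1".
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2 | tr '\n' ' ')

# Key limited to field 2: both keys are "x", so the whole line breaks the tie.
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2 | tr '\n' ' ')

printf 'with -k2:   %s\n' "$open_key"
printf 'with -k2,2: %s\n' "$closed_key"
```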
9.2. Sort by part of a field
Sorting by whole fields covers many cases. However, sometimes we want only part of a field to be the sort key.
Now let's extend the requirement from the previous section: we want to sort our working_hours.csv first by person's name and then by working hours.
Sorting by working hours isn't a problem for us, but to sort by people's names, we need to exclude the titles (Ms., Mr., Dr.) from the first field.
Let's first see the solution and then understand how to sort by part of a field:
$ sort -t, -k 1.4,1 -k 3n,3 working_hours.csv
Mr. Adams,Jan,77,some comments...
Mr. Adams,Mar,80,some comments...
Mr. Adams,Feb,150,some comments...
Ms. Green,Feb,20,some comments...
Ms. Green,Jan,45,some comments...
Ms. Green,Mar,107,some comments...
Dr. Schmidt,Mar,25,some comments...
Dr. Schmidt,Feb,87,some comments...
Dr. Schmidt,Jan,123,some comments...
The sort key 1.4,1 did the job. Let's understand its meaning.
We learned that a sort key is defined as POS1,POS2. In addition, each POS is defined as F.C, so a complete sort key definition is:
F1.C1[sort options],F2.C2
F1 and F2 represent the field indices. In this case, both are 1, for the first field.
C1 is the character index within field F1 at which the comparison range starts. If we don't define C1, the comparison starts at the first character of field F1.
C2 is the character index within field F2 at which the comparison range ends. If we omit C2, the comparison ends at the last character of field F2.
To exclude the titles in the name field of our example, the comparison range must start at the fourth character. That's why we have "-k 1.4,1".
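A short sketch of the character offset at work, using three records in the style of working_hours.csv. Sorting on the whole first field orders the lines by title, while starting the key at the fourth character orders them by surname. The cut call only trims the output to the name field for readability.

```shell
# Three sample records (name, month, working hours).
data='Dr. Schmidt,Jan,123
Mr. Adams,Jan,77
Ms. Green,Feb,20'

# Key is the whole first field: "Dr." < "Mr." < "Ms." decides the order.
by_field=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k1,1 | cut -d, -f1)

# Key starts at character 4 of field 1, skipping the three-character title.
by_name=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k1.4,1 | cut -d, -f1)

printf '%s\n' "$by_field"
printf '%s\n' '---'
printf '%s\n' "$by_name"
```

This approach relies on every title being exactly three characters long; titles of other lengths would need a different key or a preprocessing step.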
10. Sort a TSV File
So far, we've learned how to sort a file by fields, and we've used CSV input files as examples.
In practice, TSV (Tab Separated Values) is another commonly used data format. In this section, we'll sort a TSV file and review the method of sorting by field.
Suppose some famous movie actors get together for a weightlifting match, and the result of the match (name, body weight <BW>, result <BW>) is written to a TSV file:
$ cat match_result.tsv
Brad Pitt       78.50   150.00
Michael Caine   77.60   149.50
Tom Hanks       79.00   148.00
Cary Grant      78.80   149.50
Spike Lee       80.00   149.50
Vin Diesel      77.89   150.00
David Tennant   79.50   147.50
Jackie Chan     78.77   151.00
Now it's our job to work out the rankings for the match. According to the weightlifting rules:
- If two players have different results, the player with the higher result wins.
- If two players have the same result, the player with the lower body weight wins.
Therefore, we first need to sort by the third field in descending order and then by the second field in ascending order.
The tricky part of this problem is sorting by two fields, but that's not a challenge for us anymore. We can build the command to sort by fields: sort -k3nr,3 -k2n,2 match_result.tsv.
We still need to specify Tab as the field separator with the -t option.
Let's try:
$ sort -t'\t' -k3nr,3 -k2n,2 match_result.tsv
sort: multi-character tab ‘\\t’
Oops, sort treats '\t' as a multi-character string! Next, we will see how to correctly pass Tab as the field separator.
10.1. Pass tab as field separator
We have two ways to pass Tab as the field separator to the sort command:
- Pass a literal tab
- Escape the tab character
Normally, pressing the Tab key on the command line triggers command completion instead of inserting a literal tab into the command.
However, we can insert a literal tab on the command line by pressing Ctrl-V first and then Tab:
$ sort -t '    ' -k3nr,3 -k2n,2 match_result.tsv
Jackie Chan     78.77   151.00
Vin Diesel      77.89   150.00
Brad Pitt       78.50   150.00
Michael Caine   77.60   149.50
Cary Grant      78.80   149.50
Spike Lee       80.00   149.50
Tom Hanks       79.00   148.00
David Tennant   79.50   147.50
Note that in the above command, the option is -t '<TAB>', with a single literal tab between the quotes.
Another way to pass a tab to the -t option is to escape the tab character with ANSI-C quoting:
$ sort -t $'\t' -k3nr,3 -k2n,2 match_result.tsv
Jackie Chan     78.77   151.00
Vin Diesel      77.89   150.00
Brad Pitt       78.50   150.00
Michael Caine   77.60   149.50
Cary Grant      78.80   149.50
Spike Lee       80.00   149.50
Tom Hanks       79.00   148.00
David Tennant   79.50   147.50
Excellent! We have solved the problem. Jackie Chan won first prize!
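The $'\t' form is a feature of shells such as Bash. As a possible third variant, not taken from the methods above, a script written for a plain POSIX shell could store the tab in a variable with printf. The sketch below uses three of the records from the match result, and the variable name tab is only illustrative.

```shell
# Store a literal tab in a variable; this also works in shells without $'...'.
tab=$(printf '\t')

# Build three tab-separated records, sort by result (descending) and then
# by body weight (ascending), and keep only the name column.
ranking=$(printf '%s\t%s\t%s\n' \
    'Brad Pitt' 78.50 150.00 \
    'Vin Diesel' 77.89 150.00 \
    'Jackie Chan' 78.77 151.00 |
    LC_ALL=C sort -t "$tab" -k3nr,3 -k2n,2 | cut -f1)

printf '%s\n' "$ranking"
```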
11. Conclusion
sort is a handy and simple command-line utility. In this article, we learned some common uses of the sort command through examples.
Sorting by keys with the -k option is not as straightforward as the other sorting options, but it lets us sort data by field with more flexibility.
We should also keep in mind that when sorting TSV files, we must pass Tab as the field separator correctly.
With this handy tool in our command-line arsenal, we can easily solve most sorting problems.