A FEW USEFUL UNIX COMMANDS
==========================

We'd like to present a few commands and tricks that are useful when
running scientific programs. The purpose of this document is to give a
very concise review of these techniques.

Although the Unix commands themselves are to a large extent the same on
different types of machines (e.g. SunOS, Linux, ...), there exist several
flavors of shells, i.e. several types of command line interface, among
which the user can choose. Each has a slightly different programming
syntax, so we restrict ourselves here to the bash shell, which is one of
the more comfortable ones.


1) COMPILING AND RUNNING PROGRAMS

Copy this very simple C++ program into a file called prog.cpp:

////////////////////////////////////////////////////////////////////
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
  float temp;
  cout << "Please enter a number: ";
  cin >> temp;
  cout << endl << "The square root of " << temp << " is "
       << sqrt(temp) << "." << endl;
  return 0;
}
////////////////////////////////////////////////////////////////////

To compile it, type

% g++ prog.cpp

The sign "%" in the previous line should NOT be typed. It only marks
the Unix commands that you have to type. The executable created by g++
will be called a.out (that's the default). To run this program, type

% ./a.out

If you would like to give the executable a more descriptive name than
a.out (e.g. prog.exe), you can type

% g++ -o prog.exe prog.cpp

To tell the compiler to optimize the executable for speed, simply type

% g++ -O -o prog.exe prog.cpp


2) READING INPUT FROM AND SAVING RESULTS TO A FILE

Sometimes we would like to have the program read its input from a file
instead of the keyboard. To see how this can be done, create a file
called prog.input with one line containing the input parameter.
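For instance, the file can be created directly from the command line.
This is only a minimal sketch: the value 2 is an arbitrary example
input, chosen here for illustration.

```shell
# Write the example value 2 (an arbitrary choice) into prog.input,
# overwriting the file if it already exists.
echo 2 > prog.input

# Display the file to check that it contains the single line "2".
cat prog.input
```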
To have the program read its input from this file, type

% ./prog.exe < prog.input

To run the program in the background, type

% ./prog.exe < prog.input &

When running a program in the background it is often inconvenient to
have output on the screen. To save the output in a file named
prog.output instead of displaying it on the screen, type

% ./prog.exe < prog.input > prog.output &

and have a look at the output with

% cat prog.output

If the file prog.output already exists, the shell will overwrite it. If
you would like the output to be appended to prog.output instead, use >>:

% ./prog.exe < prog.input >> prog.output &

With the previous commands, not everything is sent to the output file.
In particular, error messages (which the program writes to standard
error) are still sent to the screen. If you also want the error messages
in the file, use

% ./prog.exe < prog.input &> prog.output &

or, to append,

% ./prog.exe < prog.input &>> prog.output &

Sometimes one would like to see the output AND keep a copy of it in a
file. Again, that's easy:

% ./prog.exe < prog.input | tee prog.output

"tee" reads from standard input and writes to both standard output and
the named files.


3) WRITING SCRIPTS TO AUTOMATE THE EXECUTION OF PROGRAMS

Suppose you have a program that calculates some physical properties
given a few input parameters. You now want to repeat these calculations
for a large number of values of the input parameters, each time saving
the results in a different output file. How can you avoid having to do
this by hand? The solution is to write a small script. For example,

######################################################################
#!/bin/bash
for temp in 10 20 30 40 50
do
  ./prog.exe > prog.output.$temp << MARKER
$temp
MARKER
done
######################################################################

Copy this script (without the delimiter lines of "#" characters) into a
file called run_prog, make it executable (with the command
'chmod u+x run_prog'), and run it (./run_prog).
You will see that the program prog.exe is called for the temperatures
temp=10, 20, ..., 50, with the results saved in prog.output.10, ...,
prog.output.50.

One can also make the list of sampling values a parameter of the script.
Replace the second line of the script with

for temp in $*

and try the command

% ./run_prog 5 15 25 35 45


4) PIPES AND ALL THAT

One of the philosophical principles behind the Unix command line is to
have small utility programs that each do only one type of operation (but
do it well). More complex operations are then realized by "linking"
these building blocks together. For example, the following command
prints a list of the integers from 1 to 10:

% echo `seq 1 10`

We can use this as input for the script run_prog, with

% ./run_prog `seq 1 10`

We can now sort the results stored in all the output files according to
the fifth column (the input value):

% cat prog.output.* | grep "[0-9]" | sort -n -k 5

As a final example, we have seen during the lectures a small C++ program
that takes a text and extracts from it a list of all the words appearing
in it. Here it is.

////////////////////////////////////////////////////////////////////
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

int main()
{
  vector<string> data;
  copy(istream_iterator<string>(cin), istream_iterator<string>(),
       back_inserter(data));
  sort(data.begin(), data.end());
  unique_copy(data.begin(), data.end(),
              ostream_iterator<string>(cout, "\n"));
}
////////////////////////////////////////////////////////////////////

Here is essentially the same thing in one line, using the Unix shell.

% cat shakespeare.txt | tr -cs "A-Za-z" "[\012*]" | tr A-Z a-z | sort \
  | uniq | more

The power and flexibility of this approach are clear. Its weakness is
speed: compare the time needed for texts of increasing size with that
needed by the C++ program. Still, for just one play by Shakespeare, it
is fast enough.

Suppose now that you also want to count how many times each word appears
in the text, and sort the results in order of decreasing frequency.
That's easy:

% cat shakespeare.txt | tr -cs "A-Za-z" "[\012*]" | tr A-Z a-z | sort \
  | uniq -c | sort -rn | head -50

How would you modify the C++ program to do the same job?
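
To see what each stage of the word-counting pipeline above contributes,
it can help to try it on a tiny input first. The following is only a
sketch: the function name word_freq and the sample sentence are made up
for illustration, and the pipeline reads from standard input instead of
shakespeare.txt.

```shell
#!/bin/bash

# word_freq (hypothetical helper): read text on standard input and print
# each distinct word preceded by its count, most frequent first.
word_freq() {
  tr -cs "A-Za-z" "[\012*]" |   # turn every run of non-letters into one newline
    tr A-Z a-z |                # convert everything to lower case
    sort |                      # bring identical words next to each other
    uniq -c |                   # collapse duplicates, prefixing each word with its count
    sort -rn                    # sort numerically on the count, largest first
}

# Sample sentence (made up): "the" occurs twice, every other word once.
echo "The cat and the hat" | word_freq
```

The first line of the output shows the count 2 next to the word "the";
the remaining three words each appear with count 1. Removing stages from
the end of the pipeline one at a time shows what each of them does.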