Source: https://pypl.github.io/PYPL.html
#include: Includes header files.
#define: Defines macros for code replacement.
#ifdef, #ifndef, #else, #endif: Conditional compilation.
#pragma: Compiler-specific directives.

#define SQUARE(x) ((x) * (x))
int result = SQUARE(5); // Expands to: ((5) * (5))
-O: optimization levels (e.g., -O0, -O2, -O3).
-g: include debugging information.
-std: select the C++ standard (e.g., -std=c++17).
g++ main.o helper.o -o my_program
From http://www.linfo.org/shell.html:
A shell is a program that provides the traditional, text-only user interface for Linux and other UNIX-like operating systems. Its primary function is to read commands that are typed into a console [...] and then execute (i.e., run) them. The term shell derives its name from the fact that it is an outer layer of an operating system. A shell is an interface between the user and the internal parts of the OS (at the very core of which is the kernel).
Bash
Bash stands for Bourne Again Shell. It was written by Brian Fox for the GNU Project, and its name is a pun on the Bourne shell (sh), created by Stephen Bourne, which Bash was designed to replace. It is the default shell on most Linux distributions. It is both a command interpreter and a scripting language. You can start a different shell simply by typing its name, and you can also change your default shell for all sessions (e.g., with chsh).
Since macOS 10.15 Catalina, the default shell on macOS is zsh, which is mostly compatible with Bash.
Other shells available: tcsh, ksh, csh, Dash, Fish, Windows PowerShell, ...
Since the shell is a program, it has its own variables. You can assign a value to a variable with the equal sign (no spaces around it!), for instance type A=1. You can then retrieve its value using the dollar sign and curly braces: for instance, to display it, type echo ${A} (in simple cases the braces are optional, so echo $A works too). Variables that are passed on to the processes launched by the shell are called environment variables, and they can affect how those processes behave. Some of them are set by default: for instance, to display the user's home directory type echo ${HOME}. To turn a variable into an environment variable, prepend export: for instance, export PATH="/usr/sbin:$PATH" adds the folder /usr/sbin to the front of the PATH environment variable. PATH specifies the set of directories where the shell looks for executable programs.
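The difference between a plain shell variable and an exported environment variable can be seen by starting a child process. The following is a minimal sketch, assuming Bash; the variable name greeting is hypothetical, and bash -c is used only as a convenient way to launch a child process.

```shell
#!/bin/bash
# Assign a shell variable: no spaces around '='.
greeting="hello"
echo "${greeting}"                        # prints: hello

# Not exported yet: a child process (here a new bash) does not see it.
# $( ... ) captures the output of a command into a variable.
child_before=$(bash -c 'echo "${greeting}"')
echo "child sees: [${child_before}]"      # prints: child sees: []

# After export, greeting is an environment variable inherited by children.
export greeting
child_after=$(bash -c 'echo "${greeting}"')
echo "child sees: [${child_after}]"       # prints: child sees: [hello]

# Prepend /usr/sbin to PATH (only for this shell and its children).
export PATH="/usr/sbin:$PATH"
echo "${PATH%%:*}"                        # first PATH entry, prints: /usr/sbin
```

The change to PATH lasts only for the current session; to make it permanent it is usually placed in one of the initialization files discussed next.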
If you press Ctrl+Alt+F1 to log into a virtual terminal, after a successful login you get a login shell (which is also interactive).

Bash as a command line interpreter
When a terminal is launched, a UNIX system first starts the shell interpreter specified in the SHELL environment variable. If SHELL is unset, the system default is used.
After sourcing the initialization files, the interpreter shows the prompt (defined by the environment variable $PS1).
Initialization files are scripts or configuration files that the shell executes (sources) when it starts. They set up the shell environment, customize its behavior, and define settings that affect how the shell operates. System-wide files live in /etc, while user-specific ones are hidden files (their names start with a dot) in the user's home directory.
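Which of the files listed below are read depends on whether the shell is a login shell and whether it is interactive. A minimal sketch to check both from inside a shell, assuming Bash for the login test (shopt and its login_shell option are Bash-specific; the $- test also works in other POSIX shells):

```shell
#!/bin/bash
# $- lists the option flags of the current shell; it contains "i" when interactive.
case $- in
    *i*) interactive="yes" ;;
    *)   interactive="no" ;;
esac

# Bash only: the read-only shopt option "login_shell" is on in a login shell.
if shopt -q login_shell 2> /dev/null; then
    login="yes"
else
    login="no"
fi

echo "interactive: ${interactive}, login: ${login}"
```

Run as a script (bash check.sh) it reports a non-interactive, non-login shell; typing the same lines at a prompt after logging in on a virtual terminal should report an interactive login shell.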
login:
  /etc/profile, /etc/profile.d/*, ~/.profile for Bourne-compatible shells
  ~/.bash_profile (or ~/.bash_login) for Bash
  /etc/zprofile, ~/.zprofile for zsh
  /etc/csh.login, ~/.login for csh
non-login: /etc/bash.bashrc, ~/.bashrc for Bash
interactive:
  /etc/profile, /etc/profile.d/* and ~/.profile
  /etc/bash.bashrc, ~/.bashrc for Bash
non-interactive:
  /etc/bash.bashrc for Bash (but most of the time this file begins with [ -z "$PS1" ] && return, i.e. it does nothing if the shell is non-interactive)
  the file named by $ENV (or $BASH_ENV) might be read

To get the hang of Bash, let's try a few simple commands:
echo: prints its arguments, i.e. whatever you type after it at the shell prompt.
date: displays the current date and time.
clear: clears the terminal.
pwd stands for Print working directory and prints the current working directory, that is, the directory the shell is currently looking at. It is also the default place where commands look for data files.
ls stands for List and lists the contents of a directory. Without arguments it lists the current working directory (in a new terminal, this is usually your home directory).
cd stands for Change directory and changes the current working directory to the path specified.
cp stands for Copy and copies one or more files or directories from one place to another (directories require cp -r). We need to specify what we want to copy (the source) and where we want to copy it (the destination).
mv stands for Move and moves (or renames) one or more files or directories from one place to another. As with cp, we need to specify the source and the destination.
touch creates new, empty files. It is also used to change the timestamps of existing files and directories.
mkdir stands for Make directory and creates a new directory (folder).
rm stands for Remove and removes files or directories. By default it does not remove directories, unless you provide the -r flag (rm -r, where -r means recursively).
Files deleted with rm are lost forever (there is no trash bin), so please be careful!

When you execute a command like ls, a subprocess is created. The subprocess inherits all the environment variables of the parent process, executes the command, and then returns control to the calling process.
A subprocess cannot change the state of the calling process.
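To try the basic file commands listed above without risking any real files, here is a minimal sketch that works inside a throwaway directory. It assumes mktemp -d is available (it is on Linux and macOS; it creates a fresh temporary directory and prints its path); the file and directory names are hypothetical.

```shell
#!/bin/bash
# Create a throwaway directory and move into it.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
pwd                                   # prints the temporary directory path

mkdir docs                            # make a directory
touch notes.txt                       # create an empty file
cp notes.txt docs/                    # copy: notes.txt is still here too
mv docs/notes.txt docs/renamed.txt    # move (here: rename) the copy
listing=$(ls docs)                    # capture what ls prints
echo "${listing}"                     # prints: renamed.txt

rm notes.txt                          # remove a file
rm -r docs                            # a directory needs -r (recursive)
ls                                    # prints nothing: the directory is empty

# Clean up: leave the directory, then remove it (rmdir only removes empty dirs).
cd / && rmdir "${workdir}"
```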
The command source script_file executes the commands contained in script_file in the current shell, as if they were typed directly on the terminal. It is typically used for scripts that have to change environment variables or define aliases or functions in the current shell. Typing . script_file does the same.
If the environment of the current shell should not be altered, run the script as ./script_file instead: it is then executed in a subprocess.
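The difference between running a script in a subprocess and sourcing it can be checked with a one-line script. A minimal sketch, assuming Bash; the variable MY_VAR and the temporary script are hypothetical, and bash "${script_file}" stands in for ./script_file so that the file does not need to be executable.

```shell
#!/bin/bash
unset MY_VAR                           # start from a clean state
# Write a tiny script into a temporary file.
script_file=$(mktemp)
echo 'MY_VAR="set by script"' > "${script_file}"

# Run it in a subprocess (like ./script_file): the assignment is lost.
bash "${script_file}"
echo "after subprocess: [${MY_VAR}]"   # prints: after subprocess: []

# Source it: the commands run in the current shell, so MY_VAR survives.
. "${script_file}"                     # same as: source "${script_file}"
echo "after sourcing: [${MY_VAR}]"     # prints: after sourcing: [set by script]

rm "${script_file}"
```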
To run your brand new script you may need to change the access permissions of the file. To make a file executable, run:
chmod +x script_file
Finally, remember that the first line of the script (the so-called shebang) tells the operating system which interpreter to use when executing the file. For example, if your script starts with #!/bin/bash it will be run by Bash; if it starts with #!/usr/bin/env python it will be run by Python (the first python found in PATH).
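Putting chmod and the shebang together: the sketch below creates a two-line script, makes it executable, and runs it by its path. It assumes the temporary directory returned by mktemp -d allows executing files (on some systems /tmp is mounted noexec); the file name hello.sh is hypothetical.

```shell
#!/bin/bash
# Create a script whose first line is a shebang.
dir=$(mktemp -d)
printf '#!/bin/bash\necho "Hello from a Bash script"\n' > "${dir}/hello.sh"

chmod +x "${dir}/hello.sh"             # make it executable
output=$("${dir}/hello.sh")            # run it by path, like ./hello.sh
echo "${output}"                       # prints: Hello from a Bash script

rm -r "${dir}"
```

Without the chmod +x step, running the file by its path fails with "Permission denied", although bash hello.sh would still work.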
Some commands, like cd, are executed directly by the shell (they are builtins), without creating a subprocess.
Indeed, it would be impossible to implement cd as a regular external command!
The reason is that a subprocess cannot change the state of the calling process, whereas cd needs to change the current working directory of the shell itself (and the value of the variable PWD, which contains the name of the current working directory).
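The effect can be reproduced with a subshell, i.e. a child process started by wrapping commands in parentheses. A minimal sketch, assuming Bash (type -t is Bash-specific); the function greet is hypothetical and only serves to show a third kind of command.

```shell
#!/bin/bash
start=$(pwd)

# ( ... ) runs its commands in a subshell, i.e. a child process:
# the cd inside it cannot change the directory of this shell.
( cd / )
after_subshell=$(pwd)
echo "after subshell: ${after_subshell}"   # still the starting directory

# The builtin cd runs inside this shell, so its effect persists.
cd /
after_builtin=$(pwd)
echo "after builtin cd: ${after_builtin}"  # prints: after builtin cd: /
cd "${start}"

# type -t reports what kind of command a name refers to.
greet() { echo "hi"; }                     # a shell function
type -t cd                                 # prints: builtin
type -t greet                              # prints: function
type -t ls                                 # usually prints: file (found via PATH)
```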
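In general, a command can refer to an executable program, a shell builtin (like cd), an alias, or a shell function.
The shell looks for executables with a given name within the directories listed in the environment variable PATH, whereas aliases and functions are usually defined in the .bashrc file (or equivalent), which is sourced when the shell starts.
To find out what kind of command command_name is, type: type command_name.
To find out which executable file would be run, type: which command_name.

In order to live happily and without worries, don't use spaces or accented characters in filenames!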
Space characters in file names should be forbidden by law! The shell uses spaces to separate arguments, so having them in a file name makes things a lot more complicated in any script (not just Bash scripts).
Use underscores (snake case): my_wonderful_file_name, uppercase characters (camel case): myWonderfulFileName, hyphens: my-wonderful-file-name, or a mixture: myWonderful_file-name, instead.
But not my wonderful file name. It is not wonderful at all if it has to be parsed in a script.
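To see why spaces hurt, consider how the shell splits an unquoted variable into separate arguments. A minimal sketch, assuming Bash with the default field separator; the file name is hypothetical, and set -- is used only to count how many arguments each form produces.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "${dir}" || exit 1

touch "my wonderful file name"      # quotes: ONE file whose name contains spaces
name="my wonderful file name"

# Unquoted, the value is split on spaces: ls receives four arguments
# ("my", "wonderful", "file", "name") and fails, since no such files exist.
ls ${name} 2> /dev/null || echo "unquoted: ls failed"

# Quoted, the whole name is passed as a single argument.
ls "${name}"                         # prints: my wonderful file name

# Count the arguments that each form produces.
set -- ${name}
unquoted_count=$#                    # 4
set -- "${name}"
quoted_count=$#                      # 1

cd / && rm -r "${dir}"
```

Quoting every variable expansion fixes the problem, but it is easy to forget one, which is exactly why names without spaces are the safer choice.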
cat stands for Concatenate: it reads one or more files and writes their contents to the standard output, one after another (hence the name).
wc is short for Word count. It reads a list of files and prints one or more of the following statistics for each: newline count, word count, and byte count.
grep stands for Global regular expression print. It searches the given files (or the standard input) for lines containing a given string or matching a pattern, and prints them.
head shows the first lines of a file (10 by default).
tail shows the last lines of a file (10 by default).
file runs a series of tests on the files specified in an attempt to classify them by type.

We can add operators between commands in order to chain them together.
| (pipe) forwards the standard output of one command to the standard input of another. E.g., cat /etc/passwd | grep my_username shows the account information of "my_username" stored in /etc/passwd.
> sends the standard output of a command to a file, overwriting it. E.g., ls > files-in-this-folder.txt saves the list of files in the current folder.
>> appends the standard output of a command to a file.
&> sends both the standard output and the standard error to a file.
&& runs the second command only if the exit status of the first one is 0 (success). It is used to chain commands together: e.g., sudo apt update && sudo apt upgrade
|| runs the second command only if the exit status of the first one is different from 0 (failure).
; executes two commands one after the other, regardless of their exit status.
$? is a variable containing the exit status of the last command.

tr stands for Translate. It supports a range of transformations, including converting uppercase to lowercase, squeezing repeated characters, deleting specific characters, and basic find and replace. For instance:
echo "Welcome to Advanced Programming!" | tr "a-z" "A-Z"
converts all lowercase letters to upper case.
echo -e "A;B;c\n1,2;1,4;1,8" | tr "," "." | tr ";" ","
replaces commas with dots and semicolons with commas.
echo "My ID is 73535" | tr -d "[:digit:]"
deletes all the digits from the string.
(The character sets are quoted so that the shell does not treat brackets as filename patterns.)

sed stands for Stream editor and can perform many operations on a file, such as searching, find and replace, insertion, or deletion. Here we give just a hint of its true power:
echo "UNIX is great OS. UNIX is open source." | sed "s/UNIX/Linux/"
replaces the first occurrence of "UNIX" on each line with "Linux".
echo "UNIX is great OS. UNIX is open source." | sed "s/UNIX/Linux/2"
replaces the second occurrence of "UNIX" with "Linux".
echo "UNIX is great OS. UNIX is open source." | sed "s/UNIX/Linux/g"
replaces all occurrences of "UNIX" with "Linux".
echo -e "ABC\nabc" | sed "/abc/d"
deletes the lines matching "abc".
echo -e "1\n2\n3\n4\n5\n6\n7\n8" | sed "3,6d"
deletes lines 3 to 6.

cut is a command for cutting out sections from each line of files and writing the result to standard output.
cut -b 1-3,7- state.txt
selects bytes (-b) 1 to 3 and from 7 to the end of each line.
echo -e "A,B,C\n1.22,1.2,3\n5,6,7\n9.99999,0,0" | cut -d "," -f 1
gets the first column of a CSV (-d specifies the column delimiter, -f n specifies to pick the n-th field).

find is used to find files in specified directories that meet certain conditions. For example: find . -type d -name "*lib*" finds all directories (not files), starting from the current one (.), whose name contains "lib".
locate is less powerful than find but much faster, since it relies on a database that is updated daily or manually with the command updatedb. For example: locate -i foo finds all files or directories whose path contains foo, ignoring case.

Double quotes delimit a string in which variables are interpreted (expanded). Single quotes delimit a string in which variables are not interpreted. Check the output of the following commands:
a=yes
echo "$a"
echo '$a'
The output of a command can be converted into a string and assigned to a variable for later reuse:
list=`ls -l` # Or, equivalently:
list=$(ls -l)
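The text-processing commands, operators, and command substitution seen so far combine naturally. The sketch below works on a tiny, entirely hypothetical semicolon-separated file with decimal commas; it assumes Bash and standard tools (head, tail, wc, tr, cut, sed, grep) with the flags shown, which are available on Linux and macOS.

```shell
#!/bin/bash
# Hypothetical data: a tiny semicolon-separated file with decimal commas.
csv=$(mktemp)
printf 'name;score\nalice;7,5\nbob;9,0\ncarol;6,5\n' > "${csv}"   # > overwrites
printf 'dave;8,0\n' >> "${csv}"                                   # >> appends

head -n 1 "${csv}"                               # prints: name;score
rows=$(tail -n +2 "${csv}" | wc -l)              # count data lines (skip header)
echo "data rows: ${rows}"                        # 4 (wc may pad with spaces)

# Decimal commas -> dots, then semicolons -> commas (a regular CSV).
clean=$(cat "${csv}" | tr ',' '.' | tr ';' ',')

# Second field of every data row, as with: cut -d "," -f 2
scores=$(echo "${clean}" | tail -n +2 | cut -d ',' -f 2)
echo "${scores}"                                 # 7.5, 9.0, 6.5, 8.0 (one per line)

# sed replaces text; grep -c counts the matching lines.
renamed=$(echo "${clean}" | sed 's/bob/robert/g' | grep -c 'robert')
echo "lines mentioning robert: ${renamed}"       # prints: lines mentioning robert: 1

# && runs only after success, || only after failure.
grep -q 'zoe' "${csv}" && echo "zoe found" || echo "zoe not found"   # prints: zoe not found

# $? holds the exit status of the last command (for grep, 1 means "no match").
grep -q 'zoe' "${csv}"
status=$?
echo "grep exit status: ${status}"               # prints: grep exit status: 1

rm "${csv}"
```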
Appending & to a command runs it in the background, so that the terminal stays available:
./my_command &
Ctrl-Z suspends the current foreground subprocess.
jobs lists the subprocesses running in the background (or suspended) in the terminal.
bg %n resumes the n-th job in the background.
fg %n brings the n-th job to the foreground.
Ctrl-C terminates the subprocess in the foreground (unless the process traps the signal).
kill pid sends the termination signal (SIGTERM) to the process with ID pid. You can get a list of the most computationally expensive processes with top and a complete list with ps aux (the output of ps aux is usually filtered through a pipe with grep).
All background subprocesses of a terminal are terminated when the terminal is closed (unless launched with nohup, but that is another story...)
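Interactive job control (Ctrl-Z, fg, bg) is meant for a terminal, but starting a background process, getting its PID, and killing it also work inside a script. A minimal sketch, assuming Bash; sleep 30 simply stands in for a long-running command such as ./my_command.

```shell
#!/bin/bash
sleep 30 &                       # run a long command in the background
pid=$!                           # $! is the PID of the last background process
jobs                             # lists it, e.g. "[1]+  Running  sleep 30 &"

kill "${pid}"                    # send SIGTERM, the default signal of kill
wait "${pid}"                    # wait for it and collect its exit status
status=$?
echo "exit status: ${status}"    # prints: exit status: 143 (128 + 15 = SIGTERM)

# ps -p fails once the process no longer exists.
ps -p "${pid}" > /dev/null || echo "process ${pid} has terminated"
```

By convention, a process killed by signal N reports the exit status 128 + N, which is how the 143 above can be traced back to SIGTERM (signal 15).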
Most commands provide a -h or --help flag that prints a short help message:
find --help
man command displays the manual page for command.
There is also an info facility that sometimes provides more information: info command.
git
Version control, also known as source control, is the practice of tracking and managing changes to software code. Version control systems are software tools that help software teams manage changes to source code over time.
git is a free and open-source version control system, originally created by Linus Torvalds in 2005. Unlike older centralized version control systems such as SVN and CVS, Git is distributed: every developer has the full history of their code repository locally. This makes the initial clone of the repository slower, but subsequent operations dramatically faster.
How does git work?
git clone (download) the repository.
git add a file, i.e. stage its changes so that they are included in the next commit to your local repository.
git commit (save) the changes. This is a local action: the remote repository (the one in the cloud) is still unchanged.
git push your changes. This action synchronizes your version with the one on the hosting platform.

How does git work? (Collaborative)
If you and your teammates work on different files, the workflow is the same as before; you just have to remember to pull the changes that your colleagues made.
If you have to work on the same files, the best practice is to create a new branch, which is a separate line of development that branches off from the main one. After you have finished working on your feature, you merge the branch into main.
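The whole cycle (clone, add, commit, push, branch, merge) can be rehearsed locally. In this minimal sketch a bare repository on disk stands in for the hosting platform; it assumes git is installed, and the names remote.git, alice, feature, and the identity "Alice Example" are all hypothetical.

```shell
#!/bin/bash
# A local bare repository stands in for the hosting platform (e.g. GitHub).
base=$(mktemp -d)
cd "${base}" || exit 1
git init -q --bare remote.git

git clone -q remote.git alice 2> /dev/null   # "download" the (still empty) repository
cd alice || exit 1
git symbolic-ref HEAD refs/heads/main     # call the first branch "main"
git config user.name  "Alice Example"     # hypothetical identity, this repo only
git config user.email "alice@example.com"

echo 'first line' > notes.txt
git add notes.txt                         # stage the file
git commit -q -m "Add notes"              # local commit: remote.git is unchanged
git push -q origin main                   # now the remote has the commit

# Work on a feature in its own branch, then merge it into main.
git checkout -q -b feature
echo 'second line' >> notes.txt
git commit -q -am "Extend notes"          # -a stages modified tracked files
git checkout -q main
git merge -q feature                      # fast-forward merge into main
git push -q origin main

remote_commits=$(git -C ../remote.git rev-list --count main)
echo "commits on the remote: ${remote_commits}"   # prints: commits on the remote: 2
```

Because nobody else touched main in the meantime, the merge is a simple fast-forward; with real teammates you would run git pull origin main first.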
git commands
git diff shows the changes in your working files that have not been staged yet (git diff HEAD compares them with the last commit).
git status lists the status of all the files (e.g. which files have been changed, which are new, which are deleted and which have been staged).
git log shows the history of commits.
git checkout switches to a specific commit or branch.
git stash temporarily shelves all the modifications to tracked files.
An excellent visual cheatsheet can be found here.
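A quick way to get a feel for status, diff, and stash is a throwaway repository with one file. A minimal sketch, assuming git is installed; the file app.txt and the identity "Demo User" are hypothetical.

```shell
#!/bin/bash
# Throwaway repository with one committed file.
repo=$(mktemp -d)
cd "${repo}" || exit 1
git init -q
git config user.name  "Demo User"          # hypothetical identity, this repo only
git config user.email "demo@example.com"
echo 'v1' > app.txt
git add app.txt && git commit -q -m "First version"

echo 'v2' > app.txt                        # modify a tracked file
status_line=$(git status --short)
echo "${status_line}"                      # prints: " M app.txt" (modified, unstaged)
git diff --stat                            # summary of the unstaged changes

git stash > /dev/null                      # shelve the modification
after_stash=$(cat app.txt)                 # v1: back to the committed version
git stash pop > /dev/null                  # bring the modification back
after_pop=$(cat app.txt)                   # v2 again
git log --oneline                          # one commit so far
```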
git config --global user.name "Name Surname"
git config --global user.email "name.surname@email.com"
See here for more details on SSH authentication.
Clone the course repository:
git clone git@github.com:pcafrica/hpc_for_data_science_2023-2024.git
Before every lecture, download the latest updates by running:
git pull origin main
from inside the cloned folder.
Perform the following tasks in your command-line terminal.
… test1.
… test1 and create a new directory test2.
… test2 and go up one directory.
… f1.txt, f2.txt, f3.dat, f4.md, README.md, .hidden.
… .txt files.
… README.md to folder test2.
… .txt files to test2 in one command.
… f3.dat.
… test1 and the folder itself in one command.

You can access an open dataset of logs collected from a high-performance computing cluster at Los Alamos National Laboratory. The dataset is available on this webpage.
To download the dataset using wget
, run the following command:
wget https://raw.githubusercontent.com/logpai/loghub/master/HPC/HPC_2k.log_structured.csv
After downloading the dataset, perform the following analyses using only Bash commands.
nodes.log
In this exercise, you'll create a Bash script that automates the process of creating a backup of a specified directory. The script should accomplish the following tasks:
Note: You can use basic commands like read, mkdir, cp, tar, and echo.
Hint: Generate a timestamp in the format YYYYMMDD_hhmmss with date +%Y%m%d_%H%M%S.
… backup.sh.
… backup_<timestamp>) inside a specified backup directory (you can define this directory at the beginning of your script).
… backup_<timestamp>.tar.gz.

git. Collaborative file management (1/3)
… + button in the top right corner), and ensure everyone clones it.
. Collaborative file management (2/3)Now, let's work on the same file, main.cpp
. Each person should create a hello world main.cpp
that includes a personalized greeting with your name. To prevent conflicts, follow these steps:
git checkout -b [new_branch]
.main
branch using the following commands:git checkout main
git pull origin main
git merge [new_branch]
git push origin main
git. Collaborative file management (3/3)
git conflicts
The first person to complete this process will experience no issues. However, subsequent participants may encounter merge conflicts.
Git will mark the conflicting sections in the file. You'll see these sections surrounded by <<<<<<<, =======, and >>>>>>> markers.
Carefully review the conflicting sections and decide which changes to keep. Remove the conflict markers (<<<<<<<, =======, >>>>>>>) and make the necessary adjustments to the code to integrate both sets of changes correctly.
After resolving the conflict, commit your changes and push your resolution to the repository.
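A conflict can be provoked on purpose to practice the resolution steps before it happens on main.cpp. In this minimal sketch, which assumes git is installed with the default conflict style, two branches change the same line of a hypothetical file greeting.txt; the branch name alice and the identity are also hypothetical.

```shell
#!/bin/bash
# Two branches edit the same line, so merging them conflicts.
repo=$(mktemp -d)
cd "${repo}" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name  "Demo User"          # hypothetical identity, this repo only
git config user.email "demo@example.com"
echo 'Hello, world!' > greeting.txt
git add greeting.txt && git commit -q -m "Initial greeting"

git checkout -q -b alice
echo 'Hello from Alice!' > greeting.txt
git commit -q -am "Alice's greeting"

git checkout -q main
echo 'Hello from Bob!' > greeting.txt
git commit -q -am "Bob's greeting"

git merge alice > /dev/null || echo "merge conflict!"   # non-zero exit on conflict
markers=$(grep -c '^<<<<<<<' greeting.txt)  # 1 conflicting section
cat greeting.txt
# Expected content (default conflict style):
# <<<<<<< HEAD
# Hello from Bob!
# =======
# Hello from Alice!
# >>>>>>> alice

# Resolve: keep both greetings, drop the markers, then commit the merge.
printf 'Hello from Bob!\nHello from Alice!\n' > greeting.txt
git add greeting.txt
git commit -q -m "Merge alice: keep both greetings"
```

In the real exercise the resolution would be followed by git push origin main.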