Symbolic Links
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How can you reuse one file in multiple directories?
How can you ease moving around your file system?
Objectives
Detail symbolic links, and how they are used
Learn when not to use symbolic links, and how to avoid pitfalls in using them.
A symbolic link, also called a soft link or symlink, is a pointer to another file, much like a shortcut in Windows. Symbolic links are useful for creating shortcuts within the file system, for simplifying the file paths used by other programs, and for easing navigation between different work directories in a networked system (important when working on HPC systems).
It is important to remember that a symbolic link does not point directly to the data in its target; it stores the target's path in the file system. This allows you to link to either files or directories using the same command, and also to link to file systems hosted on remote computers. It also means that the link breaks if the target is moved or deleted, which puts any workflow relying on it at risk of failure or data loss. Because of this, it is recommended that you use symbolic links sparingly in your workflows.
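The behaviour described above can be sketched in a throwaway directory. This is a minimal illustration, not part of the lesson dataset: the file names target.txt, shortcut.txt and moved.txt, and the variable names, are made up for the example.

```shell
# Work in a temporary directory so that nothing real is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

echo "important data" > target.txt
ln -s target.txt shortcut.txt

# readlink prints the path stored in the link, not the file contents.
stored_path=$(readlink shortcut.txt)
echo "link stores: $stored_path"

# Moving the target leaves the link pointing at a name that no longer exists.
mv target.txt moved.txt
if [ -L shortcut.txt ] && [ ! -e shortcut.txt ]; then
    link_state="dangling"
else
    link_state="ok"
fi
echo "link is now: $link_state"
```

The data itself still exists in moved.txt; it is only the link that has stopped working, which is enough to break a script that reads through it.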
We will use the dataset from the BASH introduction course to demonstrate the use of links.
A symbolic link can be created using the command:
cd ~/Desktop/
ln -s data-shell/molecules
ls -l molecules
lrwxrwxrwx 1 user group 20 Feb 8 20:00 molecules -> data-shell/molecules
This has created a symlink to the molecules directory, with the name molecules.
Like cp, ln names the new object after the original by default, but unlike cp it does not need to be given a destination: if none is given, the link is created in the current directory.
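As a small sketch of the two forms of the command, the example below creates one link with the default name and one with an explicit name. It runs in a temporary directory with an empty stand-in for data-shell/molecules; the link name mols is an arbitrary choice for illustration.

```shell
# Build a stand-in directory tree in a temporary location.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p data-shell/molecules

# One argument: the link is created in the current directory,
# named after the last component of the target (molecules).
ln -s data-shell/molecules

# Two arguments: the second argument is the name of the link.
ln -s data-shell/molecules mols

# Both links store the same path.
default_target=$(readlink molecules)
named_target=$(readlink mols)
echo "molecules -> $default_target"
echo "mols -> $named_target"
```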
You can identify symlinks by the @ following their name if ls -F is used:
ls -F
data-shell/ data-shell.zip molecules@
You can use cd to enter, and exit, this directory, as you would any other directory:
cd ~/Desktop/molecules
pwd
cd ..
pwd
/home/jon/Desktop/molecules
/home/jon/Desktop
This is because the cd command tracks the symlinks we have followed: it processes the .. path in the second cd command first, and only resolves symlinks afterwards. We can disable this behaviour using the -P flag, which forces cd to resolve the symlinks to the original directory structure before following .. paths:
cd ~/Desktop/molecules
pwd
cd -P ..
pwd
/home/jon/Desktop/molecules
/home/jon/Desktop/data-shell
In this case, even though we seemed to be in ~/Desktop/molecules, using .. with the -P flag takes us to the ~/Desktop/data-shell directory, because that is the true parent directory. This relationship is made explicit if we use -P for the first cd command:
cd -P ~/Desktop/molecules
pwd
cd ..
pwd
/home/jon/Desktop/data-shell/molecules
/home/jon/Desktop/data-shell
In this case we arrive directly in the original directory with the first cd command, meaning that it does not matter whether we use the -P flag for the second command: we will always arrive back in the data-shell directory.
Shell built-in commands, such as cd and pwd, are able to make use of the shell's tracking of the symlink, so that they deal with the directory structure as we would expect. External commands, such as ls and cp, are not able to do this, and so they always resolve symlinks to the original directory structure before following .. paths.
Because of this behaviour, it is advised that you avoid using .. paths which cross a symlink in your scripts. In this situation it would be safer to use the absolute path (or a path relative to a fixed point, such as your home directory ~/).
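The difference between the shell's view and an external command's view of .. can be sketched as follows. The example rebuilds a small stand-in for the Desktop layout in a temporary directory; the file notes.txt is a made-up marker so that the two possible parent directories can be told apart.

```shell
# Create the sandbox and resolve its own path, so that symlinks in the
# temporary directory path (if any) do not confuse the comparison.
sandbox=$(cd -P "$(mktemp -d)" && pwd -P)
cd "$sandbox"
mkdir -p data-shell/molecules
touch data-shell/notes.txt
ln -s data-shell/molecules molecules

cd "$sandbox/molecules"
logical_dir=$(pwd -L)    # the path as the shell tracked it
physical_dir=$(pwd -P)   # the path with symlinks resolved
echo "logical:  $logical_dir"
echo "physical: $physical_dir"

# ls is an external program, so '..' is resolved from the real directory:
# this lists the contents of data-shell, not of the sandbox.
parent_listing=$(ls ..)
echo "$parent_listing"

# cd is a shell built-in, so '..' undoes the step through the link.
cd ..
after_cd=$(pwd -L)
echo "after cd ..: $after_cd"
```

Here ls .. and cd .. disagree about which directory is the parent, which is exactly the situation that absolute paths avoid in scripts.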
Symlinks can be removed without destroying the object they point to:
cd ~/Desktop
ls -l molecules
rm molecules
ls -ld data-shell/molecules
lrwxrwxrwx 1 user group 20 Feb 8 19:58 molecules -> data-shell/molecules
drwxr-xr-x 2 user group 4096 Feb 8 11:36 data-shell/molecules
Although we can leave the name of the symlink the same as the original object, one of the most useful features of symlinks is that the link can have a different name, so a file can be presented under a new name without moving or changing the original file.
For example, in data-shell/data/elements/ we have xml files describing each atom. Each of these is named using the periodic table symbol, e.g. N.xml is the Nitrogen descriptor. However, we have a program which is expecting the files to have the full atom name, e.g. Nitrogen.xml. We can easily create symlinks with these names, e.g.:
cd ~/Desktop/data-shell/data
mkdir elements-fullnames
cd elements-fullnames
ln -s ../elements/N.xml Nitrogen.xml
ls -l
lrwxrwxrwx 1 user group 17 Feb 8 21:32 Nitrogen.xml -> ../elements/N.xml
Doing this will enable us to use the program, without having to create all the input files again. Do keep in mind though that, although you can delete symlinks without deleting the original file, if a program tries to write to a symlink, it will write to the original file. This method is suitable for easily replicating or renaming input files. Extreme caution should be used if you use the same method for output or log files.
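The warning about writing through a link can be illustrated with a short sketch. It uses a one-line stand-in for N.xml in a temporary directory rather than the real dataset, and the appended comment line is invented for the example.

```shell
# Create a stand-in element file and a link to it in a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox"
echo '<element name="Nitrogen"/>' > N.xml
ln -s N.xml Nitrogen.xml

# Appending through the link changes the original file.
echo '<!-- appended via the link -->' >> Nitrogen.xml
original_lines=$(grep -c '' N.xml)   # count the lines in the original
echo "N.xml now has $original_lines lines"

# Removing the link leaves the original, and the change, in place.
rm Nitrogen.xml
ls -l N.xml
```

A program that opens the link for writing has the same effect as the append above, which is why output and log files deserve extra care.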
Scripting the linking of all atom files.
There are over 100 atom files in the elements directory; linking to each of these by hand would be quite painful. Fortunately these are text files, and each of them contains the full name of the element in the first line of the file, e.g.:
head -1 ~/Desktop/data-shell/data/elements/N.xml
<element name="Nitrogen"/>
To strip the atom name out of this string you can use sed:
head -1 ~/Desktop/data-shell/data/elements/N.xml | sed -E -e 's/^.*"([A-Za-z]*)".*$/\1/'
Nitrogen
Please write a bash script which will use a for loop and this string processing pipeline to create links that use the full element names for these files within whatever directory it is run.
Solution
# Create a link named after the full element name for each file given as an argument
for orig_file in "${@}"
do
    element_name=$(grep -i 'name=' "${orig_file}" | sed -E -e 's/^.*"([A-Za-z]*)".*$/\1/')
    ln -s "${orig_file}" "${element_name}.xml"
done
This script should be run using:
bash link_script.sh ~/Desktop/data-shell/data/elements/*.xml
If you have a hammer, every problem is a nail
In the above solution we use sed and regex to extract the string we require from the xml file. There are other bash tools that could do this for us, in ways which are arguably more readable (and being as readable as possible is a good trait for code). Can you identify a tool from either these lessons, or the bash introduction lessons, that you could use for this, and adapt your script to use this tool?
Solution
The cut tool can be used to split out the text we need, by using " as the delimiter:
head -1 ~/Desktop/data-shell/data/elements/N.xml | cut -d '"' -f 2
Nitrogen
Using this command instead would make your code more readable. Both solutions still make similar assumptions about what format the string will take, though (and the head command also makes major assumptions about the formatting of the file). These assumptions can make your scripts fragile. This fragility can be addressed either by clearly documenting what inputs you expect the script to have, or by building extra checks into your code. The choice of which of these solutions to use depends on the script itself - which solution do you think would be the most appropriate for this script?
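As one illustration of the "extra checks" option, the sketch below wraps the cut pipeline in a function that skips inputs it does not recognise. The function name link_elements, the particular checks, and the sandbox files (including bad.xml) are assumptions made for this example; it is one possible approach, not the lesson's own solution.

```shell
# link_elements FILE...: hypothetical helper that links each element file
# under its full element name in the current directory.
link_elements() {
    local orig_file element_name
    for orig_file in "$@"; do
        # Skip anything that is not a readable regular file.
        if [ ! -f "$orig_file" ] || [ ! -r "$orig_file" ]; then
            echo "skipping $orig_file: not a readable file" >&2
            continue
        fi
        element_name=$(head -1 "$orig_file" | cut -d '"' -f 2)
        # Accept only non-empty, purely alphabetic names.
        case "$element_name" in
            ''|*[!A-Za-z]*)
                echo "skipping $orig_file: unexpected first line" >&2
                continue ;;
        esac
        # Do not overwrite an existing file or link.
        if [ -e "${element_name}.xml" ] || [ -L "${element_name}.xml" ]; then
            echo "skipping $orig_file: ${element_name}.xml already exists" >&2
            continue
        fi
        ln -s "$orig_file" "${element_name}.xml"
    done
}

# Try it on stand-in files: one well-formed, one not.
sandbox=$(mktemp -d)
mkdir "$sandbox/elements" "$sandbox/links"
echo '<element name="Nitrogen"/>' > "$sandbox/elements/N.xml"
echo 'not an element file' > "$sandbox/elements/bad.xml"
cd "$sandbox/links"
link_elements "$sandbox"/elements/*.xml
ls -l
```

The files are passed as absolute paths here, as in the lesson's own invocation, so that the stored link targets remain valid from the directory the links are created in.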
Key Points
Symbolic links to objects (files or directories) can be created using ln -s
These are links to the object, not its contents, so the contents can change or be deleted
Symbolic links can cross physical disks, and so are useful in networked filesystems
Caution must be exercised when following .. paths across symbolic links
They are most useful for linking to, and/or renaming, input and configuration files or directories