Dantalian Documentation¶
“I ask of thee, art thou mankind?”“Nay, I am the world, the world inside the gourd.”— The Mystic Archives of Dantalian
This is the documentation for Dantalian 0.6, built on March 17, 2015.
User Guide¶
This guide is for end users. People looking to contribute, or otherwise modify or hack on Dantalian, and packagers should refer to the Developer Guide, although this guide will probably be useful as well.
The guide is split into several sections, listed below.
Installation¶
Using packages¶
The easiest method of installation is via packages. However, Dantalian currently only has packages for Arch Linux.
If there are no packages available for your distribution, you will need to install Dantalian manually. If you are able, please consider making a package yourself (Refer to the Developer Guide for information on packaging Dantalian).
Manual Installation¶
Make sure you have satisfied all of the dependencies (see Dependencies in the Developer Guide). Dantalian is installed just like any Python package:
$ python setup.py install
This will most likely require root, and will install Dantalian globally on the system. Otherwise, you can use virtualenv, or install it for the user:
$ python setup.py install --user
It is recommended to install the man pages as well. The man pages can be built like so:
$ cd doc
$ make man
The man pages can be found in doc/_build/man. How they are installed depends on your system. On Arch Linux, man pages are installed in /usr/share/man as gzipped archives, so you would do the following:
$ cd doc/_build/man
$ gzip ./*
# install ./* /usr/share/man/man1
File tagging with hard links¶
This section has nothing to do with Dantalian, surprisingly. Instead, it is about organizing your files with hard links. Dantalian is merely a tool to assist in doing so, so you need to understand tagging with hard links before using Dantalian. Conversely, this section will probably be useful to you even if you do not use Dantalian.
You must understand what hard links are. Here are some relevant external resources:
- In Unix, what is a hard link?
- Hard link (Wikipedia)
- What is the difference between a hard link and a symbolic link?
Some terminology to avoid ambiguity or confusion:
- pathname
- A string which describes a location in the file system, either relative or absolute.
- filename
- A directory entry, or the parts of a pathname that are separated by slashes. For the pathname /foo/bar/baz, foo, bar, baz are all filenames. When referring to the filename of a link, the filename is the last component in the pathname. For example, the filename of the link pictures/pic1.jpg is pic1.jpg. Each link has exactly one filename.
- link
- A directory entry pointing to a file.
- file
- A file in the file system, comprising its inode and corresponding data blocks. Each file has at least one link pointing to it; when no more links exist, the file is considered deleted, and its space is marked for recycling.
- directory
- Informally known as folders. A special case of files, above, in that they have inodes, and links pointing to them, but creating more than one link to a directory is generally forbidden, so the link to a directory can be thought of as the directory itself.
Organizing files with hard links¶
What does it mean to organize files with hard links? It means you create links to files in directories that they belong in. If you organize your “files” (links to files, strictly speaking) in directories, congratulations, you are already doing it. However, there’s a lot more organizational power in file systems that lies untapped by regular users.
For example, you have a report for project A, so you put the “file” (again, the link to the file using the above definition) in the directory project-A. But the report was also presented in a meeting, and you like to keep all the meeting materials in a specific directory for easy reference. What do you do?
You could make a copy of the report in the meeting notes directory, but this has disadvantages (or potentially advantages). First, there would be two files on disk, resulting in twice the space usage. Second, if you change one of the files, the other file won’t be changed. You have to remember to change the other file as well if you want them to be the same. It may be that you want the copy in the meeting notes directory to stay static, to preserve the file as it was at a certain point in time, in which case making a copy is advantageous. Conversely, you may want the two copies to be the same. If you also want the file in four other directories, suddenly you are using up six times the disk space, and you’ll need to remember to edit six different files when you want to change something.
Alternatively, you can make a new link to the report file in the meeting notes directory. Since there is still only one file, you do not use significantly more disk space (each link requires a piddling amount of extra space), and any changes to the file are, well, reflected in the file. You can access the file using either link, but it’s still the same file.
That’s organizing files with hard links in a nutshell. Put “files” in the directory they belong in, using any organization scheme you like, and if you want to use a different organization scheme at the same time, or you simply want the file to be in more than one place at once, make a new link to it. Simple, yet powerful and flexible.
Creating, removing, and breaking links¶
Links can be created using ln. ln foo bar creates the link bar pointing at the same file as foo. The corresponding system call is link() (see link(2)).
Links can be renamed (or moved, the two actions are synonymous) using mv. mv foo bar moves the link foo to bar. The link will still be pointing at the same file. The corresponding system call is rename() (see rename(2)).
Links can be removed (unlinked) using rm. Note that this removes links, not files. When a file no longer has any links, there is no longer any way to access it, but programs using the file can continue doing so. If there are no programs using it either, the disk space will be open for reuse, and the file can be considered deleted (barring recovery attempts using special software). The corresponding system call is unlink() (see unlink(2)).
Take care not to accidentally break links. Consider two links foo and bar pointing to the same file. If you make a copy of the file (cp foo baz), the new link baz is not pointing at the same file as foo or bar; it is pointing at a new file with the same contents (a copy of the original file). Likewise, if you remove foo and create a new file (not a link) named foo, foo will no longer be pointing at the same file as bar. This last point may seem obvious, but be careful when editing files, since many programs do exactly this when saving (remove the existing link and create a new file) instead of writing to the original file. For example, Emacs will by default move the link for the file you are editing to a backup name and save the buffer as a new file, breaking your links. Most text editors (vim, vi, nano, gedit, etc.) will not break links, but large, graphical editors of all sorts (office suites, photo editors, etc.) behave less reliably (this is an unfortunate consequence of laypeople conflating files and links, and of questionable programming). You should test programs to see if they break links before relying on hard link organization.
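The behavior described in this section can be checked from Python, whose os.link, os.rename, and os.unlink functions wrap the system calls named above. What follows is only a minimal sketch: the function name demo and the link names inside it are illustrative and have nothing to do with Dantalian itself.

```python
import os
import shutil


def demo(workdir):
    """Create, rename, copy, and remove links inside workdir; return observations."""
    foo = os.path.join(workdir, "foo")
    bar = os.path.join(workdir, "bar")
    baz = os.path.join(workdir, "baz")
    qux = os.path.join(workdir, "qux")

    with open(foo, "w") as f:
        f.write("report\n")

    os.link(foo, bar)                # like `ln foo bar`
    results = {
        "linked": os.path.samefile(foo, bar),
        "nlink": os.stat(foo).st_nlink,
    }

    os.rename(bar, qux)              # like `mv bar qux`; still the same file
    results["renamed"] = os.path.samefile(foo, qux)

    shutil.copy(foo, baz)            # like `cp foo baz`; a new, separate file
    results["copy_is_same"] = os.path.samefile(foo, baz)

    os.unlink(foo)                   # like `rm foo`; removes one link only
    results["nlink_after_rm"] = os.stat(qux).st_nlink
    return results
```

Calling demo on an empty scratch directory shows that a renamed link still points at the original file while a copy does not, which is the same check you can apply to any editor you want to test for link breakage.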
Tagging with hard links¶
Tagging with hard links is just a slight perspective shift from organizing with links. All the material in the previous section is sufficient for organization, but instead of thinking of a file as having links in directories A, B, and C, it may be helpful to instead think of the file as being tagged A, B, and C. This way, to find all of the files with a given tag, you just open the corresponding directory.
It’s also helpful to conceptually set a root for organization, so that you aren’t thinking of directories /home/foo/projects/working and /home/foo/projects/completed, but the tags working and completed, with /home/foo/projects as the root.
If you’re feeling adventurous, you can even include the filename in your mental model (think of a file tagged project-foo/specs.doc, project-specs/foo.doc, and document/12345.doc).
Basic Usage¶
Dantalian is essentially a group of scripts to help manage hard link tagging as described previously.
Libraries¶
Libraries are used to designate root directories as an anchor point for tags and file organization. They can be identified by the special .dantalian directory that they contain.
For more information about libraries, see Libraries.
Creating libraries¶
Libraries can be initialized with the command dantalian init. This will create the library in the working directory. Alternatively, pass the path where you want to create the library:
$ dantalian init path/to/directory
Tags¶
In Dantalian, tags are directories. A file has a tag if it has a hard link in the directory corresponding to that tag.
For example, in the following library:
.
├── even
│   └── 1.txt
└── odd
    └── 1.txt
There are two tags, //even and //odd, and one file (assuming both 1.txt are hard links to the same file) which is tagged (perhaps incorrectly) with both //even and //odd tags.
A tag can be referred to in two ways: by the path to its directory, whether relative or absolute, or by its tag qualifier. A tag qualifier is simply the path of the tag’s directory relative to the library root, prefixed with //.
For example, given the following:
library
└── tag1
    └── tag2
if the current working directory is tag1, we can refer to tag2 as tag2 (relative path) or //tag1/tag2 (tag qualifier).
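The mapping between directory paths and tag qualifiers is mechanical, so it can be sketched in a few lines of Python. The function names to_qualifier and to_path are hypothetical and are not part of Dantalian; the sketch only assumes the rule stated above (path relative to the library root, prefixed with //).

```python
import os


def to_qualifier(root, path):
    """Convert a directory path inside the library at root to a tag qualifier."""
    root = os.path.abspath(root)
    path = os.path.abspath(path)
    rel = os.path.relpath(path, root)
    if rel == os.curdir:
        return "//"                  # the library root itself
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError("{!r} is outside the library {!r}".format(path, root))
    return "//" + rel.replace(os.sep, "/")


def to_path(root, qualifier):
    """Convert a tag qualifier back to an absolute directory path under root."""
    if not qualifier.startswith("//"):
        raise ValueError("not a tag qualifier: {!r}".format(qualifier))
    parts = [part for part in qualifier[2:].split("/") if part]
    return os.path.join(os.path.abspath(root), *parts)
```

With the library above rooted at /home/foo/library, to_qualifier would map /home/foo/library/tag1/tag2 to //tag1/tag2, and to_path reverses that mapping.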
See also dantalian-concepts(1).
Basic Commands¶
Check the man pages for the command reference.
Tagging and Untagging¶
Tags can be created and removed using the commands dantalian mktag and dantalian rmtag. This can also be done manually using the standard utilities mkdir and rmdir.
$ dantalian mktag //kitties
$ dantalian rmtag //kitties
$ mkdir kitties
$ rmdir kitties
Note that mktag and rmtag only take tag qualifiers, and mkdir and rmdir only take pathnames.
Tags can be applied to and removed from files using dantalian tag and dantalian untag (see dantalian-tag(1) and dantalian-untag(1)). This can also be done manually by manipulating the links with ln and rm.
$ dantalian tag file1 tag1
$ dantalian tag file1 -t tag1 tag2 tag3
$ dantalian tag tag1 -f file1 file2 file3
$ dantalian tag -f file1 file2 -t tag1 tag2
$ dantalian untag file1 tag1
$ dantalian untag file1 -t tag1 tag2 tag3
$ dantalian untag tag1 -f file1 file2 file3
$ dantalian untag -f file1 file2 -t tag1 tag2
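To make the manual equivalent concrete, here is a rough Python sketch of what tagging and untagging amount to at the file system level. It is not Dantalian's code: the function names are illustrative, it handles regular files only, and it raises on a name conflict instead of picking a new name.

```python
import os


def _key(stat_result):
    """Identify a file by device and inode number."""
    return (stat_result.st_dev, stat_result.st_ino)


def tag(file_path, tag_dir):
    """Make sure file_path has a hard link directly inside tag_dir."""
    target = _key(os.lstat(file_path))
    for entry in os.scandir(tag_dir):
        if _key(entry.stat(follow_symlinks=False)) == target:
            return entry.path        # already tagged, nothing to do
    dest = os.path.join(tag_dir, os.path.basename(file_path))
    os.link(file_path, dest)         # raises FileExistsError on a name conflict
    return dest


def untag(file_path, tag_dir):
    """Remove every hard link to file_path found directly inside tag_dir."""
    target = _key(os.lstat(file_path))
    removed = []
    for entry in os.scandir(tag_dir):
        if _key(entry.stat(follow_symlinks=False)) == target:
            os.unlink(entry.path)
            removed.append(entry.path)
    return removed
```

Both functions compare inodes rather than names, so they keep working when the file has a different name under the tag.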
Basic Queries¶
You can list the tags of a file with dantalian tags:
$ dantalian tags file1
//spam
//eggs
You can perform an AND search on tags with dantalian find:
$ dantalian find //spam //eggs
/home/foo/library/spam/file1
You can list the files of a single tag simply using ls in the respective directory. You can do this with AND tag queries using Dantalian FUSE features.
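An AND query boils down to intersecting the sets of files linked under each tag directory. The following Python sketch illustrates the idea under that assumption; the function names are hypothetical and the code is not taken from Dantalian.

```python
import os


def files_in(tag_dir):
    """Map (device, inode) to a path for every non-directory entry in tag_dir."""
    found = {}
    for entry in os.scandir(tag_dir):
        if entry.is_dir(follow_symlinks=False):
            continue                 # subdirectories are subtags, not files
        st = entry.stat(follow_symlinks=False)
        found.setdefault((st.st_dev, st.st_ino), entry.path)
    return found


def find(tag_dirs):
    """Return one path per file that has a link in every directory in tag_dirs."""
    tag_dirs = list(tag_dirs)
    if not tag_dirs:
        return []
    first = files_in(tag_dirs[0])
    common = set(first)
    for tag_dir in tag_dirs[1:]:
        common &= set(files_in(tag_dir))
    # Report each match by its path under the first tag.
    return sorted(first[key] for key in common)
```

Files are matched by inode, so a file that carries a different name under each tag is still found.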
Advanced Usage¶
If you want to integrate Dantalian into other scripts, frameworks, or programs, you should use Dantalian’s Python library instead of calling the dantalian command line script.
Comprehensive documentation of Dantalian’s Python library can be found in (and is) the source code.
What follows is a quick rundown of basic usage.
Generally, you will use open_library in dantalian.library to load the library as a Python object, then call the methods on the library object that correspond to Dantalian commands. More advanced scripting and/or optimization will require digging deeper into the source code (and, likely, having to write a bit of stuff yourself).
Performance and Scalability¶
Note that these are rough numbers and predictions based on theory. These assume a Linux kernel compiled with common flags and an ext4 file system.
Space cost¶
Each tag for each file costs about 20-200B (completely unverified, but should be about right). This is the cost for each link, or each entry in a directory.
Each directory (or new tag) costs an upfront 2kB or so, but that space is used up by each link in it, covering for the costs mentioned above for each link, so the net cost of each tag is minimal.
This space cost can be thought of as caching the results of single tag file lookup queries.
Time cost¶
File access takes constant time. In fact it is no different from opening a file regularly. Looking up all files with a given tag is linear in the number of files with that tag; this is the minimum theoretically possible. Looking up all of the tags of a file is much uglier, requiring a full traversal of the directory tree. However, in practice this runs fairly quickly due to how file systems are designed. I also find that querying files based on tags is done much more commonly than looking up the tags any given file has.
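The full traversal needed to list a file's tags can be sketched as follows. This is an illustration of the cost model only, not Dantalian's implementation: list_tags is a hypothetical name, and the sketch looks at regular directory entries and does not treat the .dantalian directory or converted directories specially.

```python
import os


def list_tags(root, file_path):
    """Return qualifiers of every directory under root holding a link to file_path."""
    target = os.lstat(file_path)
    key = (target.st_dev, target.st_ino)
    tags = []
    # Every directory in the library has to be visited once.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if (st.st_dev, st.st_ino) == key:
                rel = os.path.relpath(dirpath, root)
                if rel == os.curdir:
                    tags.append("//")
                else:
                    tags.append("//" + rel.replace(os.sep, "/"))
                break                # one link per directory is enough
    return sorted(tags)
```

The work grows with the total number of directory entries in the library, not with the number of tags the file has, which is why this query is the expensive one.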
Hard limits¶
ext4 has a hard 65000 limit on links to inodes. This means that each file can have at most 65000 tags and each tag can have at most 64998 subtags (each directory can have at most 64998 subdirectories, as each subdirectory has a link to the parent (..), and each directory has a link to itself (.)).
Linux has a limit on how many levels of symbolic links there can be in a single lookup. I think this is 40, but can be different depending on how the kernel was compiled. This means that Dantalian only supports 40 (or however many your kernel supports) levels deep of converted directories.
Practical considerations¶
The first obstacle that you will likely encounter when scaling Dantalian up is size constraints, since everything must reside on one file system. This obstacle is encountered once the amount of data you are trying to organize exceeds the amount of space of one storage device (say, a 1 TB hard disk drive). This can be circumvented by using LVM and creating a virtual file system that spans multiple physical storage devices.
Any time constraints can be ameliorated with additional caching if required, but otherwise probably cannot be improved further due to mathematical limits.
The only other tricky problem is the hard limits mentioned above. Unless you are trying to organize truly vast amounts of data (where the metadata exceeds the data [1]), they probably won’t be an issue. If they are, however (either more than 64998 subtags under a single tag, or more than 40 levels of converted directories that you need to access), then the workaround would be to use customized file systems or kernels to bypass these hard limits.
[1] You should probably clarify in your head what exactly your needs are. If you are storing vast amounts of metadata, so much so that the metadata itself can be considered data, you should definitely be using some sort of database instead.
Libraries¶
A library is an abstract tag-based file organization system layered transparently on top of the underlying file system using hard links. A library is created on a directory, which becomes the root of that library. A special .dantalian directory is created in the root directory of a library.
Note
Dantalian uses hard links heavily. Make sure you are familiar with how hard links work! They are very powerful, but can be messy and/or dangerous if you are not familiar with them. Especially take care not to accidentally break hard links, e.g., by copying and removing files. Dantalian leverages the advantages hard links provide, but won’t protect you from yourself!
Tags are directories, and all directories are potential tags (including .dantalian, however you shouldn’t use it as such). Files are “tagged” by creating a hard link in the respective directory. Files can have any number and combination of tags. File names and tag names are restricted only by the underlying file system (on ext4, for example, up to 255 bytes and all characters except / are allowed, so knock yourself out). All files of all types can be tagged, including symlinks. Dantalian provides functionality such that even directories can be tagged, perfect for hardcore file organizers.
Usage¶
While it is possible to manage the library solely using standard utilities such as ln, mv, etc., Dantalian provides useful scripts for performing operations, such as tagging, untagging, and deleting.
Check the man pages for the command reference.
Specific Requirements¶
There are some requirements for libraries:
- The root directory must be located on a POSIX filesystem that supports hard links (e.g., ext4).
- Everything under the root directory must be on one contiguous file system.
- Do funky things with block device mounts at your own risk. This includes mounting another device inside a library, mounting a different library FUSE (more on this in FUSE Usage) in a library, and mounting the same library FUSE in itself.
While the above may seem complicated, for most users, it should not be a problem. If you run into the above situations, chances are, you’re an advanced enough user to figure out why and how to fix them.
Name Conflicts¶
A file is hard linked under each of the tags that it possesses. The file may have a different name in each of the directories, e.g., to avoid name conflicts. Dantalian works fine in this case, although it may be confusing for you, the human user, because Dantalian finds a file by the path to one of its hard links and manages it internally by hard link references and inodes.
Dantalian will resolve name conflicts if it needs to, e.g., to create a hard link to tag a file. See Names and Paths for more information on name conflict resolution.
Tagging Directories¶
Directories generally are not allowed to be hard linked in most file systems, for various reasons. However, symbolic links are regular files and thus can be hard linked, even if they point to a directory. Dantalian uses this to implement tagging of directories.
Dantalian can convert directories. Converting a directory moves it to a special location under .dantalian and replaces it with an absolute symbolic link to its new location. This allows directories to be tagged just like other files. In other words, Dantalian will manage the actual directory, and a symbolic link will be used in place of it for tagging.
This feature imposes an extra requirement on the library root directory. Namely, when the root directory path is changed, the symbolic links of all converted directories must be fixed by running dantalian fix. Also, unlike regular files, which can be freely hard linked to directories outside of the library (and tagged in other libraries), if you hard link the symbolic link of a dantalian-converted directory outside of it, move the library, and run dantalian fix, it will break the external hard links. If this is one of your use cases, place the directories in a fixed location outside of the library, create a symbolic link manually, and then tag it with Dantalian instead of using dantalian convert.
Because converted directories are all kept in one location, no two converted directories may have the same name. However, the name of the directory Dantalian keeps track of and the name of the symbolic link that the user interacts with are independent of each other. Thus, if there’s a naming conflict, the actual directory can be renamed, and the symbolic links follow the naming rules as above.
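The conversion step can be pictured with a short Python sketch. It is an approximation under stated assumptions: the storage subdirectory name "dirs" is made up for the example (the text above only says that converted directories live in a special location under .dantalian), and a name clash in the store is reported as an error instead of being resolved by renaming.

```python
import os


def convert(root, dir_path, store_name="dirs"):
    """Move dir_path into <root>/.dantalian/<store_name>, leaving an absolute symlink."""
    dir_path = os.path.abspath(dir_path)
    if os.path.islink(dir_path) or not os.path.isdir(dir_path):
        raise ValueError("{!r} is not a real directory".format(dir_path))
    # "dirs" is a hypothetical name for the special storage location.
    store = os.path.join(os.path.abspath(root), ".dantalian", store_name)
    os.makedirs(store, exist_ok=True)
    stored = os.path.join(store, os.path.basename(dir_path))
    if os.path.lexists(stored):
        # No two converted directories may share a name in the store.
        raise FileExistsError(stored)
    os.rename(dir_path, stored)      # source and store share one file system
    os.symlink(stored, dir_path)     # absolute link takes the directory's place
    return stored
```

Because the symbolic link stores an absolute path, moving the library root invalidates it, which is why dantalian fix has to be run after a move.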
Moving Libraries¶
Since libraries are simply directories, moving and/or backing up libraries is very simple. There are two things to keep in mind: use rsync -H to preserve hard links, and don’t forget to run dantalian fix to fix absolute symbolic links for converted directories. The latter is important as Dantalian currently will not check if it needs fixing.
Nested Libraries¶
Only one library can exist in any given directory, but libraries can be nested. Behavior is well-defined, but I wouldn’t recommend it unless you have a clear use case and know what you are doing. Dantalian works with a single library for its operations. Usually, it will search up through the directories and use the first library it finds, so take care where you run it. You can also specify a specific library by using the --root option. In fact, if you are nesting libraries, it is recommended to always use --root.
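The upward search can be sketched in Python as below. The function name find_library is hypothetical; the only thing assumed is what the text states, namely that a library root is marked by a .dantalian directory and that the nearest enclosing one wins.

```python
import os


def find_library(start):
    """Return the nearest directory at or above start that contains .dantalian."""
    current = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(current, ".dantalian")):
            return current
        parent = os.path.dirname(current)
        if parent == current:        # reached the file system root
            return None
        current = parent
```

With nested libraries, the result depends entirely on the starting directory, which is the reason for always passing --root in that situation.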
Scalability¶
Dantalian’s scalability ultimately depends on the host file system, but it is generally pretty lenient. On ext4, for example, the main limiting factor is number of files per directory, i.e., the number of files that have a given tag. Dantalian remains usable no matter the number, but if you have, say, more than 10,000 files with a given tag, ls (specifically readdir() on the kernel level) may begin to see performance issues. However, file access will not be affected.
Note that you can use LVM to create virtual partitions that span multiple physical drives, if necessary.
Rough performance numbers¶
- Space
- Depends, ~20-200B per tag per file
- Time
- Constant for file access, linear for enumerating files of a tag. (This is pretty straightforward; the only caveat is that a directory lookup in, e.g., a file manager might lock up while it is listing a large directory.)
Names and Paths¶
Files¶
Everything is done internally via inodes, so all operations take filenames only as a way to indicate a particular file/inode, and Dantalian works with that. Thus, file naming is for the most part a concern for the user only.
File Renaming Algorithm¶
When Dantalian needs to add a file to a directory (e.g., when renaming or tagging), it will attempt to use the name of the file directly. If it runs into a filename/path conflict, it will then attempt to generate a new name using the algorithm described below:
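import itertools

def resolve(dir, name):
    # split_extension() and is_okay() are helpers that are not shown here.
    base, extension = split_extension(name)
    for i in itertools.count(1):
        new_name = '.'.join([base, str(i), extension])
        if is_okay(dir, new_name):
            return new_name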
For example, Dantalian will try, in order:
file.mp3
file.1.mp3
file.2.mp3
file.3.mp3
...
If between generating the new name and using it the name becomes unavailable, Dantalian will try to generate a name again from the beginning.
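A self-contained version of this procedure, including the retry when a name is taken at the last moment, might look like the following. It is a sketch rather than Dantalian's code: the function names are illustrative, os.path.splitext stands in for the extension-splitting helper, and "is free" is taken to mean that nothing exists at the destination path.

```python
import itertools
import os


def candidate_names(name):
    """Yield name, then name with .1, .2, ... inserted before the extension."""
    yield name
    base, extension = os.path.splitext(name)   # extension keeps its leading dot
    for i in itertools.count(1):
        yield "{}.{}{}".format(base, i, extension)


def link_with_free_name(src, dir, name):
    """Hard link src into dir under the first candidate name that is free."""
    while True:
        for candidate in candidate_names(name):
            dest = os.path.join(dir, candidate)
            if not os.path.lexists(dest):
                break
        try:
            os.link(src, dest)
            return dest
        except FileExistsError:
            # The name was taken between the check and the link; start over.
            continue
```

Letting os.link fail and retrying avoids relying on the existence check alone, since another process may create the name in between.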
FUSE Name Collision Resolution¶
When file names are projected in a FUSE mounted library, there is a high chance of name collisions, in which case the virtual names of affected files are changed with the following algorithm:
def fuse_resolve(name, path):
    # split_extension() and get_inode_number() are helpers that are not shown here.
    base, extension = split_extension(name)
    new_name = '.'.join([base, str(get_inode_number(path)), extension])
    return new_name
In practice there will be no further name collisions, but if there are, then name collision resolution will be propagated outward until there are no name collisions. This state is guaranteed as file systems cannot assign the same inode number to two different files.
Basically, file name conflicts are resolved by adding the inode number (which is guaranteed to be unique per file system) at the end of the file name, but before the extension. E.g., if two files are both named file.mp3, the latter will appear as file.12345.mp3, assuming its inode number is 12345.
Node names use node instead of an inode number for resolution.
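For files, the inode-based renaming can be tried out with the runnable sketch below. It assumes os.lstat as the source of the inode number and os.path.splitext for the extension; the project function, which applies the renaming only to later entries whose names collide, is a hypothetical helper added for illustration and does not cover node names.

```python
import os


def fuse_resolve(name, path):
    """Return name with the inode number of path inserted before the extension."""
    base, extension = os.path.splitext(name)   # extension keeps its leading dot
    inode = os.lstat(path).st_ino
    return "{}.{}{}".format(base, inode, extension)


def project(paths):
    """Map virtual names to paths, renaming later entries whose names collide."""
    names = {}
    for path in paths:
        name = os.path.basename(path)
        if name in names:
            name = fuse_resolve(name, path)
        names[name] = path
    return names
```

The first file keeps its plain name and only the colliding one gains the inode suffix, matching the file.mp3 and file.12345.mp3 example above.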
FUSE Usage¶
Dantalian offers an optional FUSE mount feature, which allows much more powerful interaction with libraries.
To use it, run dantalian mount /path/to/mount/location on the command line. You will want to mount it somewhere outside of the library.
Usage¶
For the most part, FUSE-mounted libraries behave exactly like regular libraries, so you can use the regular Dantalian commands as well as regular file system operations to interact with it. However, certain Dantalian commands behave differently or are restricted for sanity’s sake (for example, you cannot mount a FUSE-mounted library, for obvious reasons). Dantalian distinguishes between a mounted library and a regular library by the existence of a virtual directory .dantalian-fuse, which simply points to .dantalian.
To unmount, use fusermount -u path/to/mount.
See Names and Paths for information about name resolution.
Nodes and virtual space¶
Dantalian manages the virtual space using a node tree.
Node types:
- FSNode
- BorderNode
- TagNode
- RootNode
- BorderNode
FSNodes represent virtual directories. BorderNode is an abstract subclass for nodes that lead back into real space (back to the underlying file system). There are two types: TagNodes project the intersection of their tags under themselves, whereas the RootNode (there will only be one, at the root) projects the library root under itself.
It is useful to divide the virtual space into categories when describing Dantalian FUSE behavior. Paths which point to nodes are in nodespace. Paths which point to files directly under TagNodes are in tagspace. Paths which point more than one directory beyond TagNodes or any files under RootNodes are in outsidespace.
Socket Operations¶
You can also interact directly with a FUSE-mounted library using socket operations. FUSE-mounted libraries open a socket at .dantalian/fuse.sock. Dantalian provides scripts that allow you to interact dynamically with a mounted library, but they simply echo standard commands to the socket, which can be done by hand (like all other Dantalian operations) from, e.g. a remote client that doesn’t have Dantalian installed. For example, the socket command:
$ dantalian mknode path/to/node tag1 tag2
can be done by:
$ echo mknode path/to/node tag1 tag2 > library/.dantalian/fuse.sock
The socket processes commands much like a shell, so make sure to quote anything that contains spaces.
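When sending commands from a script, the quoting can be delegated to Python's shlex module. The sketch below makes several assumptions that the text above does not confirm: that the socket is a Unix stream socket, that a command is terminated by a newline, and that POSIX shell quoting is what the socket expects. The function names are illustrative.

```python
import shlex
import socket


def build_command(*args):
    """Join args into one command line, quoting any that contain spaces."""
    return " ".join(shlex.quote(arg) for arg in args)


def send_command(sock_path, *args):
    """Send one newline-terminated command to the Unix socket at sock_path."""
    line = build_command(*args) + "\n"
    # Assumption: a stream socket; adjust if the socket is of another type.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(line.encode("utf-8"))
```

For instance, build_command("mknode", "my node", "tag1") quotes the middle argument so that it is read as a single path.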
A list of socket commands can be found in the man pages.
FUSE Operations¶
FUSE intercepts calls to the kernel to perform file system operations, allowing it to present a file system API in user space. How it behaves depends on how these operations are implemented. As a rule of thumb, interaction with nodespace is extremely limited. Calls to outsidespace will be passed on to the OS/underlying file system. Calls to tagspace will manipulate the tags on the files according to the library rules.
These operations are only documented in the source code currently.
Developer Guide¶
Welcome to dantalian’s Developer Guide! If you’re looking to build an application using dantalian, contribute, or just tinker with the code, take this with you. Also, if you’re a user who is just curious about the design or abstractions behind dantalian, a quick peek won’t hurt.
Note
This section is a little out of date as of 0.6. For packaging, refer to Installation. For API and such, refer to the source code.
Dependencies¶
These are the dependencies and the specific version numbers that I am working on, to aid in debugging and development should version problems arise.
Build Dependencies¶
- Python 3.3.2
For the documentation:
- Sphinx==1.1.3
If you want, you can use a custom ctags extension for Sphinx: ext_ctags
Place ext_ctags.py in sphinx/ext wherever Sphinx is installed for your environment.
Usage Dependencies¶
- Python 3.3.2
- findutils 4.4.2
- fuse 2.9.2
Building¶
Source Package¶
Refer to Dependencies for the build dependencies.
I recommend that you use a Python virtualenv for building dantalian.
Get a copy of the code from the repository of the version or commit you are building:
$ git clone https://github.com/darkfeline/dantalian.git
$ cd dantalian
# stable branch
$ git checkout master
# development branch
$ git checkout develop
Build the documentation:
$ cd doc
$ make html
$ make man
Make the source package:
$ cd ..
$ python setup.py sdist
Packages will be in the dist directory.
Built Package¶
Built packages can also be made for distribution, e.g., for a package repository. Likely, this will entail configuration specific to the distribution, repository, and/or package manager that you are using.
A simple vanilla package can be built by creating a setup.cfg with the following text:
[install]
prefix=/usr
and running:
$ python setup.py bdist
Library Specification¶
dantalian is an implementation of the dantalian library, an abstract interface. Much like how POSIX system calls define an interface providing a standard file system interface abstraction, the dantalian library defines a standard interface for a multidimensionally hierarchical tagging system.
dantalian provides a transparent implementation of the library that lies closely on top of the underlying file system. Details pertaining to dantalian’s library implementation will be listed separately below in the notes.
Library Sublayer¶
Libraries require a single POSIX filesystem underneath them to manage the files. Libraries only manage the tag metadata.
Note
dantalian’s implementation anchors the library on a root directory given by an absolute path on the file system, but the general library specification has no such requirement.
Library objects¶
Libraries interact with files (including directories) and tags. Both are described with strings.
Tags start with //, similar to absolute paths, but doubled to distinguish them. Like POSIX paths, parent and child tags are separated with /. / is not allowed in tag names, but all other characters are legal.
Files are identified by their path, in standard POSIX format. However, paths starting with // are not legal, since that is reserved for tags. Libraries handle files by their inode. Thus, if a file is moved, it maintains its status in the library, but must be referred to with its new path.
Note
Tags are directories in dantalian’s library implementation. Thus, tags and directories (as files) may be referenced interchangeably as a file or a tag, respectively.
Directories are considered tags relative to the library root. Thus, a directory albums in the root directory is synonymous with tag //albums, and a directory artists in albums with tag //albums/artists.
Due to dantalian’s implementation, the special root tag // exists as an implementation detail. The only documented appearance of the root tag is when calling dantalian.library.BaseLibrary.listtags(), which will include the root tag if the file is hard linked under the library root directory. The root tag will work everywhere a tag will, but again, is specific to dantalian’s implementation.
Tagging¶
Libraries allow objects to be associated with tags and track these associations.
Both files and tags may be tagged. Each object can have any number and any combination of tags. Each object can only be tagged with a given tag once; the relationship is binary, either tagged or untagged. Tags can be tagged with themselves.
Note
Directories can only be tagged once by virtue of common file system limitations. Symbolic links act identically to files. In order to tag a directory multiple times in dantalian’s library implementation, the directory must be converted (stored in a designated location and replaced with a corresponding symbolic link). If a file system were to support directory hard links, then the library specification applies normally.
Library class and methods¶
The library interface is defined in the dantalian.library.BaseLibrary class. A library implementation must implement the following methods:
- tag(file, tag)
file should be tagged with tag after call, regardless of whether it was before.
- untag(file, tag)
file should not be tagged with tag after call, regardless of whether it was before.
- mktag(tag)
tag is created. Do nothing if it exists.
- rmtag(tag)
tag is removed. Do nothing if it doesn’t exist.
- listtags(file)
Return a list of all of the tags of file.
- find(tags)
Return a list of files that have all of the given tags in tags.
- mount(path, tree)
Mount a virtual representation of the library representation tree at path.
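The call semantics listed above (each method describes the state after the call, and repeating a call is harmless) can be illustrated with a toy in-memory stand-in. MemoryLibrary is a hypothetical class written for this example; it is not dantalian.library.BaseLibrary, it leaves out mount, and it assumes that tagging with an unknown tag creates that tag.

```python
class MemoryLibrary:
    """Toy in-memory library that mimics the call semantics listed above."""

    def __init__(self):
        self._tags = {}              # tag name -> set of tagged objects

    def mktag(self, tag):
        # Do nothing if the tag already exists.
        self._tags.setdefault(tag, set())

    def rmtag(self, tag):
        # Do nothing if the tag does not exist.
        self._tags.pop(tag, None)

    def tag(self, file, tag):
        # Tagged afterwards, whether or not it was before.
        self._tags.setdefault(tag, set()).add(file)

    def untag(self, file, tag):
        # Untagged afterwards, whether or not it was before.
        self._tags.get(tag, set()).discard(file)

    def listtags(self, file):
        return sorted(t for t, members in self._tags.items() if file in members)

    def find(self, tags):
        sets = [self._tags.get(t, set()) for t in tags]
        return sorted(set.intersection(*sets)) if sets else []
```

Because a tag holds a set of objects, tagging the same object twice leaves it tagged once, which mirrors the binary tagged or untagged relationship described in the specification.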
Implementation specifics¶
This section contains additional information about dantalian’s library implementation.
Directories are tags, and vice versa. Objects tagged with a given tag are hard linked under the respective directory. A file can appear within a directory multiple times; such a file will be considered as tagged once with the corresponding tag.
For practical reasons, there is a directory .dantalian in the library root directory reserved for internal use. It is treated normally, i.e., as a directory and as a tag, but in almost all cases it should not be used as a tag and should be considered an implementation detail.
Everywhere a tag is needed in a library’s method calls, a path to a directory can be substituted.
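As a rough illustration of "objects tagged with a given tag are hard linked under the respective directory", the following sketch checks and creates such links with the standard os module. It is not dantalian's code: the function names are hypothetical, it handles regular files only, and it ignores the name-conflict resolution that the real implementation performs.

```python
import os


def is_tagged(file_path, tag_dir):
    """Return True if tag_dir directly contains a hard link to file_path."""
    target = os.lstat(file_path)
    for entry in os.scandir(tag_dir):
        info = os.lstat(entry.path)
        # Two names are hard links to one file when device and inode match.
        if (info.st_dev, info.st_ino) == (target.st_dev, target.st_ino):
            return True
    return False


def tag(file_path, tag_dir):
    """Ensure file_path has at least one hard link under tag_dir."""
    if not is_tagged(file_path, tag_dir):
        # Sketch only: a real implementation must resolve name conflicts.
        os.link(file_path, os.path.join(tag_dir, os.path.basename(file_path)))
```

Because the check compares inodes rather than names, a file that already appears in the directory under a different name still counts as tagged.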
Library Implementation¶
This section documents dantalian’s library implementation. See Library Specification for a reference to the library specification.
Library¶
Library is the actual implementation that dantalian provides. It implements the following public methods and invariants in addition to those described in Library class and methods (Filename/path conflicts will be resolved according to File Renaming Algorithm.):
Note
dantalian respects symbolic links to directories outside of the library (Symbolic links to directories inside of the library, on the other hand, should always be converted by dantalian. Handmade symbolic links to library-internal paths are subject to breakage and Armageddon.).
For simple operations, dantalian will act as though external symlinked directories are a part of the library. For complex operations, these external directories will be ignored (This is because dantalian is not really descending symbolic links, but only acting on the directories stored internally. This simulates only descending into internal symbolic links.). The latter case will be noted below if applicable.
- tag(file, tag)
If file does not have a hard link under the tag directory, make one. file has at least one hard link under the tag directory after call.
- untag(file, tag)
file should not be tagged with tag after call, regardless of whether it was before.
- mktag(tag)
The directory corresponding to tag is created. Do nothing if it exists.
- rmtag(tag)
The directory corresponding to tag is removed. Do nothing if it doesn’t exist.
- listtags(file)
Return a list of all of the tags of file.
- find(tags)
Return a list of files that have all of the given tags in tags.
- mount(path, tree)
Mount a virtual representation of the library representation tree at path.
The following are methods that are not in the abstract library interface:
- convert(dir)
Store directory dir internally and replace the original with a symbolic link with the same name pointing to the absolute path of the stored directory. Resolve name conflict if necessary (if a file with the same name is made in between moving the directory and creating the symbolic link, for example).
- cleandirs()
Remove all directories stored internally that no longer have any symbolic links referring to them in the library.
- rm(file)
Remove all hard links to file in the library. Any errors will be reported and removal will resume for remaining hard links.
Note
rm() does not descend into symbolic links to external directories.
- rename(file, new)
Rename all hard links to file in the library to new. File name conflicts are resolved and reported. Any errors will be reported and renaming will resume for remaining hard links.
Note
rename() does not descend into symbolic links to external directories.
- fix()
Fix the absolute paths of symbolic links in the library to internally stored directories after the library’s path has been changed. Hard link relationships of the symbolic links are preserved only within the library. (This is because the Linux kernel/POSIX system calls do not allow for editing symbolic links in place. They must be unlinked and remade.) Each symbolic link is unlinked, a new symbolic link is made, and the new link is then relinked in place of the old one. Filename conflicts are resolved and reported (if a file with the same name is made in between deleting and creating the symbolic link, for example).
- maketree()¶
Return a tree generated using the library’s configuration files.
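The convert(dir) step described above can be sketched with the standard library as follows. This is a simplified illustration, not dantalian's implementation: the store_dir parameter is a hypothetical stand-in for the internal storage location, and the sketch raises on a name conflict instead of resolving it.

```python
import os
import shutil


def convert(dir_path, store_dir):
    """Move dir_path into store_dir and leave a symbolic link behind.

    store_dir is a hypothetical storage location chosen by the caller.
    Returns the absolute path of the stored directory.
    """
    dir_path = os.path.abspath(dir_path)
    store_dir = os.path.abspath(store_dir)
    os.makedirs(store_dir, exist_ok=True)
    stored = os.path.join(store_dir, os.path.basename(dir_path))
    if os.path.lexists(stored):
        # Sketch only: the real method resolves the conflict by renaming.
        raise FileExistsError(stored)
    shutil.move(dir_path, stored)
    # The link keeps the original name and points at an absolute path,
    # which is why fix() is needed after the library is moved.
    os.symlink(stored, dir_path)
    return stored
```

Since the resulting symbolic link behaves like a file, it can be hard linked into several tag directories, which a plain directory cannot.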
ProxyLibrary¶
ProxyLibrary is a subclass of Library for virtual FUSE mounted libraries. It overrides the following methods:
- fix()
Log a warning and do nothing. (Action not allowed.)
- mount(path, tree)
Log a warning and do nothing. (Action not allowed.)
FUSE/Mounted Library Specification¶
The dantalian library API requires a mount() method, which uses FUSE to mount a virtual file system representation of the library.
The mounted library provides a standard file-system-like interface to libraries. While the dantalian implementation of the library already provides such an interface, other implementations may not by default. Also, mounted libraries provide additional features even for dantalian’s existing file-system-like interface.
Virtual Spaces¶
In describing mounted library behavior, it is useful to divide the file system space into a number of categories.
Directories corresponding to nodes are considered to be in virtual space. (Nodes are virtual space.)
Directories and files corresponding to real directories and files on the file system are considered to be in real space.
There is also a subcategory for directories and files in real space: real space files and directories pulled in by TagNodes are additionally considered to be in tag space. Note that this is not recursive. Given the following:
TagNode/
    dir1/
        file1
    dir2/
    file2
TagNode is in virtual space as it is a node. Everything under it is in real space, but only dir1, dir2, and file2 are in tag space. file1 is not in tag space.
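The non-recursive rule can be made concrete with a small classifier. This is a hypothetical sketch: the nodes mapping and the classify function are inventions for illustration and do not reflect how dantalian tracks nodes internally. It assumes every path that is not itself a node lies somewhere under a node.

```python
import posixpath


def classify(path, nodes):
    """Return the set of spaces that a mounted path belongs to.

    nodes is a hypothetical mapping from mounted path to node kind,
    where the kind is either "node" or "tagnode".
    """
    if path in nodes:
        return {"virtual"}
    # Only direct children of a TagNode are in tag space; deeper
    # descendants are plain real space.
    if nodes.get(posixpath.dirname(path)) == "tagnode":
        return {"real", "tag"}
    return {"real"}


nodes = {"/TagNode": "tagnode"}
spaces = {
    path: classify(path, nodes)
    for path in (
        "/TagNode",
        "/TagNode/dir1",
        "/TagNode/dir1/file1",
        "/TagNode/dir2",
        "/TagNode/file2",
    )
}
```

Here spaces["/TagNode/dir1/file1"] contains only "real", matching the statement that file1 is not in tag space.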
FUSE Operations¶
FUSE provides syscall-like operation hooks to emulate a file system. Their implementations for mounted libraries are found as methods in the dantalian.operations.TagOperations class.
Note
The behavior of the following operations on tag space is subject to change, due to planned additions to tag nodes.
- chmod(path, mode)
If path is in real or tag space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- chown(path, uid, gid)
If path is in real or tag space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- create(path, mode)
If path is in real space, forward to OS. If path is also in tag space, tag the file accordingly. If path is in virtual space, the operation is invalid and raises EINVAL.
- getattr(path, fh=None)
If path is in real or tag space, forward to OS. If path is in virtual space, get file attributes from the node.
- getxattr()
Not implemented.
- listxattr()
Not implemented.
- link(source, target)
Note
Note that this is different from standard. Usually link(a, b) creates a link at b to a, but this link(source, target) creates a link at source to target. This is a quirk in the FUSE library used in dantalian.
If source is in real space, link it (forward request to OS). If source is also in tag space, tag the newly created link accordingly. If source is in virtual space, raise EINVAL.
- mkdir(path, mode)
If path is in real space, forward to OS. If path is also in tag space, additionally convert the new directory and tag it accordingly. If path is in virtual space, the operation is invalid and raises EINVAL.
- open(path, flags)
If path is in real space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- read(path, size, offset, fh)
If path is in real space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- readdir(path, fh)
If path is in real space, forward to OS. If path is in virtual space, get information from the node.
- readlink(path)
If path is in real space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- removexattr()
Not implemented.
- rename(old, new)
This one is tricky; here’s a handy chart.
From \ To | Virtual | Tag | Real |
Virtual | EINVAL | EINVAL | EINVAL |
Tag | EINVAL | untag, tag | move, untag |
Real | EINVAL | tag, remove | move |
- rmdir(path)
If path is in real space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- setxattr()
Not implemented.
- statfs(path)
Forward the request to the OS (via built-in os module).
- symlink(source, target)
Note
This has the same quirk as link().
If source is in real space, link it (forward request to OS). If source is also in tag space, tag the newly created symlink. If source is in virtual space, raise EINVAL.
- truncate(path, length, fh=None)
If path is in real or tag space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- unlink(path)
If path is in real space, but not tag space, forward to OS. If path is in tag space, untag the file instead. If path is in virtual space, raise EINVAL.
- utimens(path, times=None)
If path is in real space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
- write(path, data, offset, fh)
If path is in real space, forward to OS. If path is in virtual space, the operation is invalid and raises EINVAL.
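The rename(old, new) chart, read with the source space as the row and the destination space as the column, can be expressed as a lookup table. The sketch below is illustrative only; the names RENAME_ACTIONS and rename_actions are hypothetical, and the action strings simply repeat the words used in the chart.

```python
import errno
import os

# (source space, destination space) -> ordered actions, or None for EINVAL.
RENAME_ACTIONS = {
    ("virtual", "virtual"): None,
    ("virtual", "tag"): None,
    ("virtual", "real"): None,
    ("tag", "virtual"): None,
    ("tag", "tag"): ("untag", "tag"),
    ("tag", "real"): ("move", "untag"),
    ("real", "virtual"): None,
    ("real", "tag"): ("tag", "remove"),
    ("real", "real"): ("move",),
}


def rename_actions(src_space, dst_space):
    """Return the actions for a rename, or raise OSError(EINVAL)."""
    actions = RENAME_ACTIONS[(src_space, dst_space)]
    if actions is None:
        raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
    return actions
```

Any rename that touches virtual space on either side is rejected, which is consistent with the other operations refusing to modify nodes.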
Nodes¶
Nodes are used to construct and maintain the virtual library file system. Internally, nodes are implemented as mapping type data objects.
Currently, there are three node types and one virtual node class.
dantalian.tree.BaseNode is the fundamental node class, representing a virtual directory in a mounted library. Its implementation is dantalian.tree.Node.
dantalian.tree.BorderNode is a virtual class/interface for nodes that pull the host file system into the virtual space (i.e., tagged files).
It has two subclasses, dantalian.tree.BaseRootNode and dantalian.tree.BaseTagNode, and their implementations dantalian.tree.RootNode and dantalian.tree.TagNode, respectively.
RootNodes pull all of the tags in the library under themselves as virtual directories. They will usually be the root node for the node trees that describe the mounted library structure, but this is not necessary.
TagNodes pull the intersection set of files of a given set of tags under themselves.
Node File Attributes¶
Nodes implement a basic set of default file attributes.
- atime, ctime, mtime
- Defaults to time of node creation
- uid, gid
- Defaults to process’s uid and gid
- mode
- Set directory bit, and permission bits 0o777 minus umask bits.
- size
- Constant 4096
Currently these are dummy values and do not change, save for nlinks.
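The defaults listed above could be computed roughly as follows. This is a sketch rather than the code in dantalian.tree: the function names are hypothetical, and the st_* key names are an assumption borrowed from the usual stat field naming.

```python
import os
import stat
import time


def current_umask():
    """Read the process umask (os.umask can only be read by setting it)."""
    mask = os.umask(0)
    os.umask(mask)
    return mask


def default_node_attrs():
    """Build the default attribute mapping for a virtual directory node."""
    now = time.time()
    return {
        # All three timestamps default to the node's creation time.
        "st_atime": now,
        "st_ctime": now,
        "st_mtime": now,
        "st_uid": os.getuid(),
        "st_gid": os.getgid(),
        # Directory bit plus 0o777 with the umask bits cleared.
        "st_mode": stat.S_IFDIR | (0o777 & ~current_umask()),
        "st_size": 4096,
    }
```

With a typical umask of 0o022 the permission bits come out as 0o755.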
Socket Commands¶
Socket commands allow interaction with the mounted FUSE process, thereby dynamically modifying parts of the virtual FUSE-mounted library. Socket commands may be invoked by the relevant commands of the dantalian CLI script, or by echoing the commands directly into the FUSE library socket. The dantalian CLI script simply writes the commands to the socket as well.
Currently, there are the following commands:
- mknode path tag1 [tag2 ...]
- Make a TagNode at the given path with the given tags. Make intermediary Nodes if needed.
- rmnode path
- Remove the Node at the given path.
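Writing a command to the socket from Python might look like the sketch below. Several details are assumptions, not documented behavior: that the socket is a Unix stream socket, that a trailing newline (as echo would add) is accepted, and that arguments are separated by single spaces without quoting. The function names are hypothetical, and the socket path must be supplied by the caller because its location is not given here.

```python
import socket


def format_mknode(path, tags):
    """Build a 'mknode path tag1 [tag2 ...]' command line."""
    if not tags:
        raise ValueError("mknode needs at least one tag")
    return " ".join(["mknode", path, *tags])


def format_rmnode(path):
    """Build an 'rmnode path' command line."""
    return "rmnode " + path


def send_command(socket_path, command):
    """Write one command to the FUSE process's command socket.

    Assumes a Unix stream socket and newline-terminated commands.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode("utf-8") + b"\n")
```

In practice the dantalian mknode and dantalian rmnode commands already do this, so the sketch is mainly useful for understanding what travels over the socket.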
Command Reference (Man Pages)¶
Reference information for Dantalian is contained in the manual pages, which are duplicated below.
dantalian(1) – file tagging using hard links¶
SYNOPSIS¶
dantalian [options] command [args]
DESCRIPTION¶
dantalian provides an interface to scripts that automate management of file tagging using hard links.
OPTIONS¶
-h, --help | Print help information. |
COMMANDS¶
There are three types of commands. Library commands require a library. dantalian will search up the directory tree from the working directory and use the first library it finds, or a library can be specified explicitly by path.
Global commands do not require a library. Socket commands require a virtual FUSE library, and simply write commands to the virtual FUSE library’s command socket.
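The upward search for a library can be sketched as a simple loop over parent directories. The criterion used here, the presence of a .dantalian directory, is an assumption based on that directory being reserved in the library root; the function name is hypothetical.

```python
import os


def find_library_root(start):
    """Walk up from start and return the first directory that looks
    like a library root, or None if the file system root is reached."""
    current = os.path.abspath(start)
    while True:
        # Assumption: a library root is marked by a .dantalian directory.
        if os.path.isdir(os.path.join(current, ".dantalian")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            return None
        current = parent
```

Because the first match wins, a library nested inside another library shadows the outer one for commands run from within it.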
LIBRARY COMMANDS¶
- dantalian-tag(1)
- Tag files.
- dantalian-untag(1)
- Untag files.
- dantalian-mktag(1)
- Make tags.
- dantalian-rmtag(1)
- Remove tags.
- dantalian-tags(1)
- List tags of files.
- dantalian-find(1)
- Find files with tags.
- dantalian-rm(1)
- Remove all tags of files.
- dantalian-rename(1)
- Rename tagged file.
- dantalian-convert(1)
- Convert directories into taggable symbolic links.
- dantalian-revert(1)
- Revert converted directories from symbolic links.
- dantalian-fix(1)
- Fix symbolic links of converted directories.
- dantalian-clean(1)
- Clean stored converted directories.
- dantalian-mount(1)
- Mount library as virtual FUSE library.
GLOBAL COMMANDS¶
- dantalian-init(1)
- Initialize a library.
SOCKET COMMANDS¶
- dantalian-mknode(1)
- Make a tag node.
- dantalian-rmnode(1)
- Remove nodes.
SEE ALSO¶
- dantalian-concepts(1)
- Concepts and general information.
- Online documentation
- http://dantalian.readthedocs.org/
- Project website
- http://darkfeline.github.io/dantalian/
dantalian-concepts(1) – Concepts and general information¶
TAGS AND HARD LINKS¶
Directories are tags, and tags are directories. A file is considered tagged with a given tag if it has at least one hard link in the respective directory. The name of the hard link does not matter, and there can be more than one hard link for a file in a given directory.
Tags can be referred to interchangeably using the path to their respective directory, either relative or absolute, or by their tag qualifier (unless otherwise noted). Tag qualifiers are similar to UNIX paths, but are relative to their library’s root directory and are preceded with two slashes.
For example, if the root of the library is /home/user/library, and the library contains a directory /home/user/library/foo/bar, the tag qualifier for that directory would be //foo/bar.
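The mapping between tag qualifiers and paths can be sketched in a few lines. The function names are hypothetical, and treating the library root itself as the qualifier // is an assumption.

```python
import posixpath


def qualifier_to_path(root, qualifier):
    """Translate a tag qualifier such as //foo/bar into an absolute path."""
    if not qualifier.startswith("//"):
        raise ValueError("tag qualifiers start with two slashes")
    relative = qualifier[2:].lstrip("/")
    return posixpath.normpath(posixpath.join(root, relative))


def path_to_qualifier(root, path):
    """Translate an absolute path inside the library into a qualifier."""
    relative = posixpath.relpath(path, root)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("path is outside of the library")
    # Assumption: the library root itself maps to the bare qualifier //.
    return "//" + ("" if relative == "." else relative)
```

For the example above, qualifier_to_path("/home/user/library", "//foo/bar") yields /home/user/library/foo/bar, and path_to_qualifier reverses it.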
dantalian-tag(1) – Tag files¶
SYNOPSIS¶
DESCRIPTION¶
This command tags all of the given files with all of the given tags. After calling this command, all of the files will have at least one hard link in each tag’s corresponding directory.
If the file was already tagged, nothing will happen. If it was not tagged, this command will create the respective hard link using a name as similar as possible to the file’s name as provided to the command.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
EXAMPLES¶
Tagging one file with one tag:
$ dantalian tag file1 tag1
$ dantalian tag file1 -t tag1
$ dantalian tag tag1 -f file1
$ dantalian tag -f file1 -t tag1
$ dantalian tag -t tag1 -f file1
Tagging one file with many tags:
$ dantalian tag file1 -t tag1 tag2 tag3
$ dantalian tag -f file1 -t tag1 tag2 tag3
Tagging many files with many tags:
$ dantalian tag -f file1 file2 file3 -t tag1 tag2 tag3
dantalian-untag(1) – Untag files¶
SYNOPSIS¶
DESCRIPTION¶
This command removes all of the given tags from all of the given files. After calling this command, none of the files will have any hard links in any of the tags’ corresponding directories.
If the file was not tagged, nothing will happen.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
EXAMPLES¶
See the examples in dantalian-tag(1), as untag works similarly.
dantalian-mktag(1) – Make tags¶
SYNOPSIS¶
dantalian mktag [options] tag...
DESCRIPTION¶
This command makes tags (directories).
This command only works with tag qualifiers. If you want to work with paths, use mkdir(1) instead.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
EXAMPLES¶
Make tags:
$ dantalian mktag //tag1 //tag2
Note that you cannot do this:
$ dantalian mktag tag1
Instead do:
$ mkdir tag1
dantalian-rmtag(1) – Remove tags¶
SYNOPSIS¶
dantalian rmtag [options] tag...
DESCRIPTION¶
This command removes tags (directories).
This command only works with tag qualifiers. If you want to work with paths, use rmdir(1) or rm(1) instead.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
EXAMPLES¶
See dantalian-mktag(1).
dantalian-tags(1) – List tags of files¶
SYNOPSIS¶
dantalian tags [-h] [--print0] file
DESCRIPTION¶
This command lists the tags of the given file as tag qualifiers.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
--print0 | Print the tags separated with NULLs instead of newlines. |
dantalian-find(1) – Find files with tags¶
SYNOPSIS¶
dantalian find [options] tag...
DESCRIPTION¶
This command lists the files that have all of the given tags, using the path corresponding to the first tag given.
For example, if foo has tag1 and tag2, then
$ dantalian find tag1 tag2
will print /path/to/tag1/foo, while
$ dantalian find tag2 tag1
will print /path/to/tag2/foo.
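The behavior described above, intersecting the given tags and reporting each file by its path under the first tag, can be sketched by comparing inode numbers across tag directories. This is an illustration under simplifying assumptions, not dantalian's implementation: the function names are hypothetical, only direct directory entries are considered, and when a file appears more than once under the first tag an arbitrary one of its names is reported.

```python
import os


def _file_ids(tag_dir):
    """Map (device, inode) to one path for every entry in tag_dir."""
    ids = {}
    for entry in os.scandir(tag_dir):
        info = os.lstat(entry.path)
        ids.setdefault((info.st_dev, info.st_ino), entry.path)
    return ids


def find(tag_dirs):
    """Return paths, under the first tag, of files carrying every tag."""
    first, *rest = tag_dirs
    candidates = _file_ids(first)
    for tag_dir in rest:
        # Keep only files that also have a hard link in this tag.
        shared = candidates.keys() & _file_ids(tag_dir).keys()
        candidates = {key: candidates[key] for key in shared}
    return sorted(candidates.values())
```

Swapping the order of the tags changes which directory the reported paths live in, but not the set of files found.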
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
--print0 | Print the files separated with NULLs instead of newlines. |
-t DESTINATION | Instead of printing the files, hard link them in the given destination directory, which may be provided as a path or a tag qualifier. It may be outside of the library as well, but must be on the same file system. |
dantalian-rm(1) – Remove all tags of files¶
SYNOPSIS¶
dantalian rm [options] file...
DESCRIPTION¶
This command removes all of the tags of the given files. In most cases, this is the same as deleting the file entirely, unless there are hard links to the files outside of the library. Hard links to the files that reside outside of the library are not affected.
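Removing every tag of a file amounts to unlinking every hard link to it under the library root. A hedged sketch follows; it is not dantalian's code, the function name and return shape are hypothetical, and it only handles entries that os.walk reports as files (so tagged symbolic links to directories are not covered).

```python
import os


def rm(root, file_path):
    """Unlink every hard link to file_path found under root.

    Returns (removed_paths, errors). Errors are collected so that
    removal continues for the remaining hard links.
    """
    target = os.lstat(file_path)
    removed, errors = [], []
    # os.walk does not follow symbolic links to directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
                if (info.st_dev, info.st_ino) == (target.st_dev, target.st_ino):
                    os.unlink(path)
                    removed.append(path)
            except OSError as err:
                errors.append((path, err))
    return removed, errors
```

A hard link outside of root is never visited, so the file's data survives there, as the description states.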
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
dantalian-rename(1) – Rename tagged file¶
SYNOPSIS¶
dantalian rename [options] file new
DESCRIPTION¶
This command attempts to rename all hard links of the given file in the library to the given name. If this is not possible, it will append an incrementing index to the end of the name, before the file extension, until a free name is found, for each hard link.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
EXAMPLES¶
Rename all hard links to foo.txt, to bar.txt:
$ dantalian rename foo.txt bar.txt
If the directory for one of the hard links already has a bar.txt, dantalian will try to rename it bar.1.txt, then bar.2.txt, and so on.
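The naming scheme in this example can be sketched as a generator of candidate names. This illustrates the bar.txt, bar.1.txt, bar.2.txt pattern only; it is not necessarily identical to dantalian's renaming algorithm, and the function names are hypothetical.

```python
import itertools
import os


def candidate_names(name):
    """Yield name, then name with an incrementing index before the extension."""
    yield name
    stem, ext = os.path.splitext(name)
    for index in itertools.count(1):
        yield f"{stem}.{index}{ext}"


def free_name(directory, name):
    """Return the first candidate that does not exist in directory."""
    for candidate in candidate_names(name):
        if not os.path.lexists(os.path.join(directory, candidate)):
            return candidate
```

Note that checking for a free name and then renaming is not atomic, so a robust implementation has to cope with another process taking the name in between.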
dantalian-convert(1) – Convert directories into taggable symbolic links¶
SYNOPSIS¶
dantalian convert [options] directory...
DESCRIPTION¶
This command converts the given directories into symbolic links that can be tagged. Each directory is moved to a special library directory, and a symbolic link is created at its original path.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
dantalian-revert(1) – Revert converted directories from symbolic links¶
SYNOPSIS¶
dantalian revert [options] file...
DESCRIPTION¶
This command reverts converted directories back into directories from symbolic links. The directories must only have one tag (alternatively, one hard link) in the library.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
dantalian-fix(1) – Fix symbolic links of converted directories¶
SYNOPSIS¶
dantalian fix [options]
DESCRIPTION¶
This command fixes the symbolic links of converted directories after the library has been moved or otherwise has its path changed. Hard link relationships of the symbolic links are preserved only within the library. (This is because Linux system calls do not allow for editing symbolic links in place. They must be unlinked and remade.) Each symbolic link is removed, a new symbolic link is made, and the new link is then relinked in place of the old one.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
dantalian-clean(1) – Clean stored converted directories¶
SYNOPSIS¶
dantalian clean [options]
DESCRIPTION¶
This command removes directories that have been converted, but no longer have any symbolic links pointing to them in the library.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
dantalian-mount(1) – Mount library as virtual FUSE library¶
SYNOPSIS¶
dantalian mount [options] path
DESCRIPTION¶
This command mounts the library as a virtual FUSE library at the given path.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |
dantalian-init(1) – Initialize a library¶
SYNOPSIS¶
dantalian init [options] [path]
DESCRIPTION¶
This command initializes a library at the given path, if a path was provided. Otherwise, it initializes a library in the working directory. This command is safe to call on an existing library.
OPTIONS¶
-h, --help | Print help information. |
dantalian-mknode(1) – Make a tag node¶
SYNOPSIS¶
dantalian mknode [options] path tag...
DESCRIPTION¶
This command makes a tag node in a virtual FUSE library using the given tags at the given path.
OPTIONS¶
-h, --help | Print help information. |
--root=PATH | Specify the root directory of the library to use. |