Volution Do
Motto: Volution Do -> VolutionDo -> VoDo -> VooDoo -> voodoo!
Sources
Goals
Should
- script execution harness: prepare environment, find interpreter, run;
- script self-containment: take the script and its data and put them together;
- zero installation and configuration (i.e. "copy and run");
- zero dependencies (at least for the tool itself) (i.e. C (or machine compilable) language, static linking);
- minimum file system cluttering (i.e. the smaller the number of files, the better);
- stripping unneeded features at build time;
- (optional) possibility to also bundle the interpreter and needed tools along with the script (provided that the interpreter itself respects the above requirements) (it could work with bash, busybox, toybox, etc.);
- (optional) possibility to create a self-contained executable containing the script, data, and needed tools (with the above constraints);
Shouldn't
- support for Windows;
- support for UNIX-like OS's that lack proper POSIX support;
- design and implement a new scripting language;
- provide portability for interpreters (i.e. Python, Ruby, etc., but could work with statically linked bash, lua, etc.);
- built-in graphical user interface;
Concepts
- script: any text or binary file that is interpretable by a certain interpreter (e.g. *.sh, *.py, *.awk, etc.);
- interpreter: any program existing on the host which is able to execute a script (e.g. bash, python, awk, sed, pdmenu, even grep);
- attachment: any text or binary file that could be used by a script, but only for reading (for example lookup tables, blobs, etc.);
- bundle: a grouping of scripts and attachments, constituting the main packaging and deployment model, following the SemVer versioning scheme;
- wrapper: the vdo harness itself;
- tool: any executable or library (together with its data) that is used by the script / interpreter to fulfill its purpose; tools used by the wrapper itself are excluded here and, as noted in the goals section, should be contained within the wrapper;
- host: the system where the wrapper, script, interpreter, or tools are used;
Use cases
System administration automation
- the problem:
- you administer more than one server, you've written some custom scripts (nothing fancy), and now you want to use them on all servers;
- the "simple" solution:
- you start scp-ing them from one place to another (you could put them on a shared file system, but this has its own problems...);
- sooner or later you need to update them, and you are scp-ing again...
- the "right" solution:
- you use git / hg / some other versioning system;
- the vdo solution:
- you bundle the scripts in a single file, which you host on your web-server (maybe at a "secret" URL);
- you run vdo --install <url> to install once and then vdo --upgrade to fetch a new version;
- then you run vdo automation-action-1 ..., vdo automation-action-2 ...;
CLI tool development
- the problem:
- you've created some scripts to handle some common / repeating task;
- you want to make them work like git: my-tool command-1 ..., my-tool command-2 ...;
- you want to make them available to other people on UNIX-like systems;
- the vdo solution:
- you bundle the scripts in a single file, which again you host on the web or send by email;
- the user just runs vdo --install <file>|<url>;
- then just runs my-tool command-1 ...;
- the vdo bonus:
- you could use the XDG plugin to support file-system path resolution;
Features
Basic features
- embedded mode (i.e. self-contained executable, the wrapper containing the bundle);
- standalone mode (i.e. only the wrapper executable and the bundle file);
- basic OS resource limits (i.e. ulimit, nice, etc.);
- basic OS environment preparation:
- environment;
- file-descriptors;
- namespaces (i.e. clone);
- process sessions and groups (i.e. setsid, setpgid);
- basic environment enforcements:
- expected executables in PATH (ala autoconf);
- tty attachment;
- mutual exclusion (i.e. flock, etc.);
- OS features detection (i.e. kernel version, certain devices, etc.);
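As a rough illustration of two of the enforcements listed above (expected executables in PATH and mutual exclusion), here is a minimal Python sketch. The function names and the lock file location are hypothetical and not part of vdo; the wrapper itself is meant to be written in C.

```python
import fcntl
import os
import shutil
import tempfile


def missing_executables(names):
    """Return the names that cannot be resolved through PATH."""
    return [name for name in names if shutil.which(name) is None]


def try_lock(path):
    """Take an exclusive, non-blocking flock; return the descriptor, or None if already held."""
    descriptor = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(descriptor, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(descriptor)
        return None
    return descriptor


# Hypothetical lock file; a real wrapper would derive it from the bundle.
lock_path = os.path.join(tempfile.mkdtemp(), "my-tool.lock")
first = try_lock(lock_path)    # succeeds
second = try_lock(lock_path)   # refused while `first` is still open
missing = missing_executables(["sh", "vdo-no-such-tool"])
```

The lock is released when the descriptor is closed, so a script that crashes does not leave a stale lock behind.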
Advanced features
- local / global mode (ala npm);
- enforcement hooks;
- pre- / post-execution hooks;
- extract into a temporary place any tools that are bundled with the scripts; (or export them via FUSE if available;)
- (optional) embedded scripting language support:
- (optional) customized script loader for a selection of interpreters:
- PEP 302 for Python;
Monitoring features
- most of these features are optional;
- basic resource usage (i.e. getrusage);
- transcript infrastructure (non-stderr communication using a simple protocol);
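The basic resource usage item could be approached along these lines. This is only a sketch: `run_and_measure` and the keys of the returned dictionary are made up for illustration, and it relies on getrusage(RUSAGE_CHILDREN) accumulating the usage of children that have already been waited for.

```python
import resource
import subprocess


def run_and_measure(argv):
    """Run a child process and report the CPU time it consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    status = subprocess.run(argv).returncode   # waits for the child
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    usage = {
        "user_seconds": after.ru_utime - before.ru_utime,
        "system_seconds": after.ru_stime - before.ru_stime,
        # High-water mark over all waited-for children; the unit is
        # platform dependent (kilobytes on Linux).
        "max_rss": after.ru_maxrss,
    }
    return status, usage


status, usage = run_and_measure(["sh", "-c", "exit 3"])
```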
Management features
- most of these features are not available in standalone / embedded mode;
- bundle installation management (ala npm, pip, gem, etc.);
Security features
- most of these features are optional;
- digital signatures (and encryption) for bundles (i.e. OpenPGP, CMS) (or certain scripts / attachments / tools);
- basic OS enforcements (i.e. chroot, setuid, capabilities, etc.);
- advanced OS policy enforcements (i.e. cgroups, AppArmor, SELinux, etc.);
Eye-candy features
- most of these features apply only in interactive mode (i.e. attached to a tty), and are optional;
- stdin / stdout wrapping (ala rlfe, rlwrap);
- output coloring (e.g. at least red for stderr);
- stdout / stderr paging (with less);
Transcript features
- logback-inspired message model;
- status support;
- meters support;
Modes
Embedded
- both the wrapper and the bundle are contained in a single executable file;
- there is no need to install anything, just download / make executable / run;
- you invoke it like my-tool command-1 ...;
- the wrapper is invisible;
- support for a single bundle;
Standalone
- just like in embedded mode, but you have two files: the wrapper and your bundle;
Local / global
- allows you to manage (query, install, upgrade, etc.) multiple bundles either:
- at user level (i.e. local mode);
- or system wide (i.e. global mode);
- similar to tools like:
Design
Inspiration
- other people's work:
- my personal work:
Assumptions
- interpreters should be able to "read" the script from a read-only, non-seekable file-descriptor (i.e. stdin, a pipe, a socket, etc.) (for interpreters that can't, we could use something like create / unlink / read / ... / close);
- scripts should also be able to "read" the attachment files from file-descriptors (all of the above applies);
- the scripts and data should be small enough to be held in memory without great overhead;
Guidelines
- any data or meta-data targeted for the wrapper should be either:
- totally opaque for the wrapper (e.g. signatures, etc.) (i.e. the wrapper doesn't interpret them at all);
- JSON-based in UTF-8 encoding (e.g. bundle specifications, script specifications, etc.) (we don't want to invent a thousand little syntaxes which get out of hand);
- only where necessary (for efficiency reasons, etc.), in open formats or in formats for which acceptable open-source libraries are found;
Specifications
- these specifications should contain as little information as possible, but as much as strictly necessary:
- except for the description field there is no other user-targeted information;
- any "meta-data" about the bundle, scripts, or attachments should be placed in an external specification pointed to by the reference field (i.e. author, release date, manuals, etc.);
- bundle:
- versioning should strictly follow the SemVer versioning scheme;
<bundle-specification> ::= {
    "bundle" : {
        "provider" : <uri>
        "identifier" : <single-name>
        "version" : <semver>
        ? "description" : <string>
        ? "reference" : <uri>
    }
    "scripts" : { <single-name> : <script-reference> ... }
    ? "attachments" : { <single-name> : <attachment-reference> ... }
    ? "environments" : { <alias:single-name> : <environment-specification> ... }
    ? "descriptors" : { <alias:single-name> : <descriptor-specification> ... }
    "interpreters" : {
        <alias:single-name> : {
            "type" : "executable"
            "executable" : {
                ? "path" : <format-string> | [ <format-string>, ... ]
                ? "name" : <format-string> | [ <format-string>, ... ]
            }
            ? "argument0" : <format-string>
            ? "arguments" : <arguments-specification>
            ? "environment" : <environment-specification> | <alias:single-name>
            ? "descriptors" : <descriptor-specification> | <alias:single-name>
        }
        ...
    }
    ? "plugins" : {
        <alias:single-name> : {
            "provider" : null | <uri>
            "identifier" : <single-name>
            ? "configuration" : <json>
        }
        ...
    }
}
- script specification:
<script-specification> ::= {
    "identifier" : <single-name>
    ? "description" : <string>
    ? "reference" : <uri>
    ? "commands" : [ <single-name>, ... ]
    ? "shortcuts" : { <shortcut:single-name> : <command:single-name>, ... }
    "interpreter" : <alias:single-name>
    ? "argument0" : <format-string>
    ? "arguments" : <arguments-specification>
    ? "environment" : <environment-specification> | <alias:single-name>
    ? "descriptors" : <descriptor-specification> | <alias:single-name>
    "object" : <uri>
}
<script-reference> ::= <uri>
- arguments specification:
<arguments-specification> ::= [
    <format-string>
    <positive-integer>
    -1
    ...
]
- environment specification:
<environment-specification> ::= {
    ( "+:" | "?:" ) ++ <name:format-string> : <value:format-string>
    "-:" ++ <name:format-string> : null
    ...
} | [
    "-:" ++ <name:format-string>
    "!:" ++ <name:format-string>
    "-|" ++ <pattern:format-string>
    "+|" ++ <name-pattern:format-string>
    "!|" ++ <name-pattern:format-string>
    "^^" ++ <alias:single-name>
    <environment-specification>
    ...
]
- descriptor specification:
<descriptor-specification> ::= { ??? }
- attachment specification:
<attachment-specification> ::= {
    "identifier" : <single-name>
    ? "content-type" : <content-type>
    ? "name" : <file-name>
    ? "description" : <string>
    ? "reference" : <uri>
    "object" : <uri>
}
<attachment-reference> ::= <uri>
- generic definitions:
<single-name> ::= /^\@[a-z0-9]([a-z0-9-]*[a-z0-9])$/
<dot-compound-name> ::= <single-name> ++ ( "." ++ <single-name> ) *
<slash-compound-name> ::= <single-name> ++ ( "/" ++ <single-name> ) *
<uri> ::= ... // conforming to RFC 3986
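The arguments specification does not spell out how its elements are expanded, so the following Python sketch rests on assumptions: a format string has its ${...} placeholders substituted, a positive integer n selects the n-th user argument, and -1 stands for all user arguments after the highest index referenced. The function name `expand_arguments` and the `values` mapping are hypothetical.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_arguments(specification, values, user_arguments):
    """Expand an arguments specification into an argument list (assumed semantics)."""
    positional = [item for item in specification if isinstance(item, int) and item > 0]
    highest = max(positional, default=0)
    expanded = []
    for item in specification:
        if isinstance(item, str):
            # Replace each ${name} with the value registered under that name.
            expanded.append(_PLACEHOLDER.sub(lambda match: values[match.group(1)], item))
        elif item == -1:
            # Assumption: -1 means "every user argument not addressed by index".
            expanded.extend(user_arguments[highest:])
        elif isinstance(item, int) and item > 0:
            expanded.append(user_arguments[item - 1])
        else:
            raise ValueError(f"unsupported element: {item!r}")
    return expanded


arguments = expand_arguments(
    ["${script:proc-fd-path}", -1],
    {"script:proc-fd-path": "/proc/5938/fd/4"},
    ["argument-1", "argument-2"],
)
```

Under these assumptions, `arguments` holds the descriptor path followed by both user arguments, in that order.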
Examples
- bundle:
{
    "bundle" : {
        "provider" : "https://github.com/cipriancraciun",
        "identifier" : "network-tools",
        "version" : "v1.5.2",
        "description" : "Network related tools (interfaces, routing, firewall, etc.)",
        "reference" : "http://cipriancraciun.github.com/network-tools"
    },
    "interpreters" : {
        "bash" : {
            "type" : "executable",
            "executable" : { "name" : [ "bash" ] },
            // called like `bash /proc/5938/fd/4 argument-1 argument-2 ...`
            "arguments" : [ "${script:proc-fd-path}", -1 ]
        },
        "python" : {
            "type" : "executable",
            "executable" : { "name" : [ "python2.7", "python2", "python" ] },
            "arguments" : [ "-e", "-E", "-X", "${script:proc-fd-path}", "--", -1 ]
        }
    },
    "scripts" : {
        // the value will be replaced with something like "sha1:..."
        "list-interfaces" : "./scripts/list-interfaces.bash"
    }
}
- script:
{
    "identifier" : "list-interfaces",
    "commands" : [ "list-interfaces" ],
    "shortcuts" : {
        "interfaces" : "list-interfaces",
        "li" : "list-interfaces"
    },
    "interpreter" : "python",
    "description" : "list all network interfaces",
    "reference" : "http://cipriancraciun.github.com/network-tools/commands/network-interfaces",
    "object" : "sha1:..."
}
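To make the role of the commands and shortcuts fields more concrete, here is a small Python sketch of how a wrapper might build a lookup table from script specifications. The function `build_dispatch` is hypothetical, the specification is trimmed to the fields used, and treating a shortcut as an alias for one of the script's own commands is an assumption.

```python
import json

# Trimmed copy of the script example above; only the fields used here are kept.
SCRIPT_SPECIFICATION = json.loads("""
{
    "identifier" : "list-interfaces",
    "commands" : [ "list-interfaces" ],
    "shortcuts" : { "interfaces" : "list-interfaces", "li" : "list-interfaces" },
    "object" : "sha1:..."
}
""")


def build_dispatch(specifications):
    """Map every command and shortcut to a (script identifier, command) pair."""
    table = {}
    for specification in specifications:
        identifier = specification["identifier"]
        commands = specification.get("commands", [])
        for command in commands:
            table[command] = (identifier, command)
        for shortcut, command in specification.get("shortcuts", {}).items():
            if command not in commands:
                raise ValueError(f"shortcut {shortcut!r} targets unknown command {command!r}")
            table[shortcut] = (identifier, command)
    return table


dispatch = build_dispatch([SCRIPT_SPECIFICATION])
```

With such a table, my-tool li ... and my-tool list-interfaces ... would resolve to the same script.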
Storage
"standalone" / "embedded" mode
- in both cases we define a store with the following interface:
    func initialize (_uri string) (_store Store, _error error);
    func load (_namespace string, _path string) (_data []byte, _error error);
- in "standalone" mode we use the cdb file format (via the tinycdb library);
- in "embedded" mode we use .c files, compiled and linked in the executable itself;
- the namespaces are:
- /specifications, containing:
- /version -> "v0.0";
- /bundles -> [ <identifier>, ... ];
- /bundles/<identifier> -> <bundle-specification>;
- /scripts/<identifier> -> <script-specification>;
- /attachments/<identifier> -> <attachment-specification>;
- /objects, containing the "binary" data of scripts and attachments:
- /objects/<identifier> -> ...;
- in general we follow a RESTful "path" naming, requiring only a root, and then the structure is "self-describing";
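The store interface and its namespaces can be mimicked with a few lines of Python. This is an in-memory stand-in for illustration only: the class name `MemoryStore` is invented, the constructor takes the entries directly instead of a URI, and the keys are (namespace, path) pairs.

```python
import json


class MemoryStore:
    """In-memory stand-in for the cdb-backed or compiled-in store."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def load(self, namespace, path):
        """Return the raw bytes stored under (namespace, path)."""
        try:
            return self._entries[(namespace, path)]
        except KeyError:
            raise KeyError(f"no entry for {namespace}{path}") from None


store = MemoryStore({
    ("/specifications", "/version"): b'"v0.0"',
    ("/specifications", "/bundles"): b'["network-tools"]',
    ("/specifications", "/bundles/network-tools"): b'{"bundle": {"identifier": "network-tools"}}',
})

# Starting from the root, the structure describes itself: list the bundles,
# then load each bundle specification by identifier.
version = json.loads(store.load("/specifications", "/version"))
bundles = json.loads(store.load("/specifications", "/bundles"))
first_bundle = json.loads(store.load("/specifications", "/bundles/" + bundles[0]))
```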
Other modes
- we should note that the "standalone" / "embedded" schema easily maps on other storage engines as well:
- plain file system, where the above namespaces are "top" directories;
- zip files, similar as above;
- HTTP, FTP, actually any protocol supported by curl / libcurl;
- relational databases like SQLite, etc.;
- embedded key-value stores like BerkeleyDB, TokyoCabinet, etc.;
- CouchDB:
- have all the specifications joined into a single JSON forming the "document":
{
    "_id" : sha1(" ".join([bundle.provider, bundle.identifier, bundle.version])),
    "version" : "v0.0",
    "bundle" : <bundle-specification>,
    "scripts" : { <alias> : <script-specification>, ... },
    "attachments" : { <alias> : <attachment-specification>, ... },
    "_attachments" : { ... } // this is provided by CouchDB
}
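The "_id" expression in the CouchDB document can be written out as follows. The text only fixes the sha1 over the space-joined fields; the UTF-8 encoding and the hexadecimal digest are assumptions, and `document_id` is a hypothetical name.

```python
import hashlib


def document_id(provider, identifier, version):
    """Derive the document "_id" from the bundle's provider, identifier, and version."""
    key = " ".join([provider, identifier, version])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


identity = document_id("https://github.com/cipriancraciun", "network-tools", "v1.5.2")
```

Because the identifier depends on the version, every released version of a bundle gets its own document.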
Execution
Embedded mode
- as described in the Modes section, the wrapper is completely hidden from the user inside a single executable;
- you can copy that executable anywhere on the local file system (let's assume it is called my-tool);
- you put the containing directory in the ${PATH} environment variable;
- you execute the tool as my-tool command-1 ...;
Standalone mode
Explanation
- you copy the vdo wrapper executable somewhere on the local file system (usually in a "central" folder):
- let's assume it's called vdo-standalone.elf;
- and it's placed in ~/.lib/vdo-standalone.elf;
- you copy the bundle in the same directory as the wrapper:
- let's assume it's called my-tool.vdo;
- thus it's placed in ~/.lib/my-tool.vdo;
- you create a symbolic link somewhere else on the local file system, pointing to the wrapper but having the name of your bundle without the .vdo extension:
- let's assume you've placed it in ~/.bin/my-tool;
- pointing to ~/.lib/vdo-standalone.elf;
- you put the containing directory of the symbolic link in the ${PATH} environment variable;
- you execute the tool as my-tool command-1 ...;
Details
- when vdo-standalone.elf is executed it looks at the zeroth argument, which usually is the path of the executable (or a symbolic link to it) (obviously, if someone execve's the tool and puts a random value there, this doesn't work);
- it takes that value and considers it the current executable;
- it takes the dirname of the current executable and considers it the current directory;
- it takes the basename of the current executable and considers it the bundle name;
- while a bundle is not found:
- concatenate the current directory, the bundle name, and the .vdo string;
- if such a path exists (being a file or symbolic link) then this is the bundle; break;
- if the current executable is a symbolic link, read its value and consider that as the new current executable (and its dirname as the new current directory, while the bundle name stays unchanged); continue;
- (we didn't find any bundle yet, and the symbolic link chain ended) fail in finding the bundle; exit;
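The lookup loop above can be sketched in Python as follows. It reflects one reading of the steps: the bundle name is fixed from the zeroth argument, while the directory is recomputed after each symbolic link hop. The function name `find_bundle` and the hop limit are hypothetical, and the sketch assumes the zeroth argument is a path rather than a bare name resolved through PATH.

```python
import os
import tempfile


def find_bundle(argument0, max_hops=32):
    """Follow the symbolic link chain from argument0 until <dir>/<name>.vdo exists."""
    bundle_name = os.path.basename(argument0)
    current = argument0
    for _ in range(max_hops):  # guard against symbolic link loops
        candidate = os.path.join(os.path.dirname(current), bundle_name + ".vdo")
        if os.path.isfile(candidate) or os.path.islink(candidate):
            return candidate
        if not os.path.islink(current):
            return None  # the chain ended without finding a bundle
        # Relative link targets are resolved against the link's own directory.
        current = os.path.join(os.path.dirname(current), os.readlink(current))
    return None


# Recreate the layout from the explanation inside a scratch directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "lib"))
os.makedirs(os.path.join(root, "bin"))
wrapper = os.path.join(root, "lib", "vdo-standalone.elf")
bundle = os.path.join(root, "lib", "my-tool.vdo")
for path in (wrapper, bundle):
    open(path, "wb").close()
os.symlink(wrapper, os.path.join(root, "bin", "my-tool"))

found = find_bundle(os.path.join(root, "bin", "my-tool"))
```

Here the first candidate, bin/my-tool.vdo, does not exist, so the link is followed once and the bundle is found next to the wrapper.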
Local / global mode
Techniques
Interpreter script feeding
- unlinked temporary file (a.k.a. "close-behind"):
- notes:
- open(O_RDWR) / unlink / write / ... / fork + execve / read / ... / close;
- tmpfs would be the best option;
- technique used by Bash;
- advantages:
- capable of handling large files;
- disadvantages:
- requires the file system to be writable;
- requires the file system to support "close-behind" semantic;
- involves disk I/O;
- single-headed pipe (see writeless-pipe.go experiment):
- notes:
- pipe / write(w) / ... / close(w) / fork + execve / read(p) / ... / close(p);
- (Linux only) fcntl(F_SETPIPE_SZ);
- see writeless-pipe.go experiment;
- advantages:
- no disk I/O;
- doesn't require a file system;
- disadvantages:
- usable only for small files:
- 4 kiB on any UNIX-like system;
- 1 MiB on Linux;
- referencing descriptors using paths like /proc/self/fd/5:
- disadvantages:
- valid only for the current process, and those inheriting the descriptors;
- requires /proc to be mounted;
- replacing stdin with the stream descriptor, and moving stdin to another descriptor:
- advantages:
- does not require any special file-system to be mounted;
- disadvantages:
- would break libraries that read from stdin;
- would work only for a few interpreters (at least python, ruby, perl, bash, etc.);
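The first two feeding techniques, combined with /proc/self/fd referencing, can be sketched in Python like this. It is a Linux-only illustration that assumes /proc is mounted and sh is available; the function names are hypothetical, and the pipe variant only works while the script fits into the pipe buffer.

```python
import os
import subprocess
import tempfile


def _run(interpreter, descriptor, arguments):
    """Start the interpreter on /proc/self/fd/<n>, with the descriptor inherited."""
    path = f"/proc/self/fd/{descriptor}"
    completed = subprocess.run(
        [interpreter, path, *arguments],
        pass_fds=[descriptor],  # keeps the same descriptor number in the child
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout


def run_from_unlinked_file(interpreter, script, arguments):
    """"Close-behind": the file has no name by the time the interpreter starts."""
    descriptor, path = tempfile.mkstemp()
    try:
        os.unlink(path)
        os.write(descriptor, script)
        return _run(interpreter, descriptor, arguments)
    finally:
        os.close(descriptor)


def run_from_pipe(interpreter, script, arguments):
    """Single-headed pipe: fill it, close the write end, hand over the read end."""
    read_end, write_end = os.pipe()
    try:
        try:
            os.write(write_end, script)  # would block if the script exceeded the buffer
        finally:
            os.close(write_end)
        return _run(interpreter, read_end, arguments)
    finally:
        os.close(read_end)


SCRIPT = b'echo "hello $1"\n'
from_file = run_from_unlinked_file("sh", SCRIPT, ["world"])
from_pipe = run_from_pipe("sh", SCRIPT, ["world"])
```

In both cases the interpreter sees an ordinary path argument, which is why this combination works with interpreters that refuse to read their script from stdin.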
Notes
- the fact that specifications are mandatorily described as JSON doesn't preclude you from writing them in other syntaxes (like my personal favorite Lisp, YAML, even XML if you're sadomasochistic);
- bundles could be developed / shared following the Gist model, and stored / shared via a CouchDB database;