ARCHITECTURE
============

How to build software as a set of small
programs composed by the shell, with Unix
as the integration layer.  This is the
architecture used by seth (the Ethereum
Swiss Army Knife), by git, and by the
classical Unix toolset.  It is the
architecture this machine is built on.


THE PHILOSOPHY
--------------

Bash is a programming language.  Not a
scripting language.  Not glue.  A real
language with real data structures and
real control flow, heir to the Unix shell
that has been running the world for fifty
years.  Most things should be written in
bash.  When bash is
not the right tool -- when you need
cryptographic primitives, or numerical
precision, or a library that only exists
in another language -- you write that
piece in whatever language fits, and you
expose it as an executable that takes
arguments and produces output.

The integration layer is Unix.  Not a
framework.  Not an RPC protocol.  Not
shared memory.  Unix: processes, pipes,
arguments, environment variables, exit
codes, stdin, stdout, stderr.  This is
the most battle-tested integration layer
in the history of computing.  It works
between any two languages.  Over ssh, it
works between any two machines.  It has
worked for fifty years and it will work
for fifty more.

A program that does one thing and does it
well is better than a program that does
many things.  A hundred small programs
composed by the shell are better than one
large program with a hundred flags.  The
composition is the architecture.


THE STRUCTURE
-------------

A project using this architecture has
three top-level entries, two directories
and a Makefile:

    bin/        One file.  The front door.
    libexec/    All the subcommands.
    Makefile    Build and install.

bin/ contains a single executable -- the
command the user types.  For a project
called "foo", this is bin/foo.  It does
almost nothing:

    #!/usr/bin/env bash
    set -e
    PATH=${0%/*/*}/libexec/foo:$PATH foo "$@"

Three lines.  It prepends libexec/foo to
PATH and re-executes itself.  Now "foo"
resolves to libexec/foo/foo -- the real
dispatcher.
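
A minimal sketch of the path arithmetic in
that wrapper, assuming a hypothetical
install location of /usr/local/bin/foo.
The variable names are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical value of $0 for an installed bin/foo.
self=/usr/local/bin/foo

# ${self%/*/*} removes the shortest suffix
# matching "/*/*", here "/bin/foo".
prefix=${self%/*/*}
libexec=$prefix/libexec/foo
echo "$libexec"        # /usr/local/libexec/foo

# Caveat: with fewer than two slashes the
# pattern cannot match and nothing is removed.
short=bin/foo
echo "${short%/*/*}"   # bin/foo
```

So the wrapper appears to rely on being
invoked through a path with at least two
slashes, which is typically the case when
the shell resolves it through an absolute
PATH entry.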

libexec/ contains the dispatcher and all
the subcommands.  For a project called
"foo":

    libexec/foo/foo          the dispatcher
    libexec/foo/foo-bar      "foo bar"
    libexec/foo/foo-baz      "foo baz"
    libexec/foo/foo---quux   "foo --quux"

The dispatcher (libexec/foo/foo) parses
global options, sets up the environment,
and execs the appropriate subcommand.
When the user types "foo bar arg1 arg2",
the dispatcher execs "foo-bar arg1 arg2".
When the user types "foo --quux arg1",
the dispatcher execs "foo---quux arg1".

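The triple-dash convention: "foo --quux"
maps to the file "foo---quux".  The
filename is the program name, a joining
dash, and the argument as typed, so the
double dash in the command becomes a
triple dash in the filename.  The
distinction matters because the
dispatcher needs two paths: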

  1. "foo bar" -- a regular subcommand.
     Goes through the full option parser.

  2. "foo --quux" -- a utility command.
     Dispatched directly, no option parsing.

The direct dispatch of -- commands is not
an optimization.  It is a structural
necessity.  The option parser may invoke
-- commands to process its own arguments.
For example, the dispatcher for seth
contains:

    export ETH_FROM=$(seth --to-address $1)

If --to-address went through the option
parser, and the option parser called
--to-address, the result would be
infinite recursion.  The -- dispatch path
bypasses the parser entirely, which is
what makes the recursion safe.

The dispatch logic in the dispatcher is:

    if [[ $1 = -* ]] && command -v \
        "${0##*/}-$1" &>/dev/null; then
      exec "${0##*/}-$1" "${@:2}"
    fi

If the first argument starts with a dash
and a matching command exists, exec it
directly.  Otherwise fall through to the
option parser.  Two paths.  They can call
each other but never themselves.  The
recursion bottoms out.
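
The two paths can be tried end to end
with a throwaway project.  This is a
sketch under assumptions: the project
"foo", its subcommands, and the temp
directory are hypothetical, and the
fall-through is reduced to a plain exec
where a real dispatcher would parse
options first.

```shell
#!/usr/bin/env bash
# Toy project "foo" in a temp directory (all names hypothetical).
root=$(mktemp -d)
mkdir -p "$root/bin" "$root/libexec/foo"

# bin/foo: the front door.
cat > "$root/bin/foo" <<'EOF'
#!/usr/bin/env bash
set -e
PATH=${0%/*/*}/libexec/foo:$PATH foo "$@"
EOF

# libexec/foo/foo: the dispatcher.
cat > "$root/libexec/foo/foo" <<'EOF'
#!/usr/bin/env bash
set -e
# Path 1: "foo --quux ..." goes straight to foo---quux.
if [[ $1 = -* ]] && command -v "${0##*/}-$1" &>/dev/null; then
  exec "${0##*/}-$1" "${@:2}"
fi
# Path 2: "foo bar ..." (option parsing omitted in this sketch).
exec "${0##*/}-$1" "${@:2}"
EOF

printf '#!/usr/bin/env bash\necho "bar: $*"\n'  > "$root/libexec/foo/foo-bar"
printf '#!/usr/bin/env bash\necho "quux: $*"\n' > "$root/libexec/foo/foo---quux"
chmod +x "$root/bin/foo" "$root"/libexec/foo/*

out_bar=$("$root/bin/foo" bar arg1 arg2)
out_quux=$("$root/bin/foo" --quux arg1)
echo "$out_bar"    # bar: arg1 arg2
echo "$out_quux"   # quux: arg1
rm -rf "$root"
```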


THE ENVIRONMENT
---------------

Configuration is environment variables.
Not config files.  Not YAML.  Not JSON.
Environment variables.

    ETH_RPC_URL=https://mainnet.infura.io/...
    ETH_FROM=0x1234...
    ETH_GAS=200000

Command-line options map directly to
environment variables.  --from becomes
ETH_FROM.  --gas becomes ETH_GAS.
--rpc-url becomes ETH_RPC_URL.  The
option parser simply sets environment
variables and exports them.

This means:

  - You can configure the tool in your
    shell profile or .bashrc.

  - You can configure per-project by
    putting exports in an rc file
    (.sethrc) in the project directory.

  - You can configure per-command by
    prefixing: ETH_GAS=1000000 seth send ...

  - Every subcommand inherits the full
    configuration without being passed
    anything explicitly.

  - The configuration is visible:
    env | grep ETH_ shows everything.
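
The mapping from options to variables can
be sketched as a small parser.  The option
and variable names come from the text
above; the function parse_opts and the
ARGS array are hypothetical, and this is
not seth's actual parser.

```shell
#!/usr/bin/env bash
# parse_opts: hypothetical helper. Each option becomes
# an exported variable; the rest is left in ARGS.
parse_opts() {
  while [[ $1 = --* ]]; do
    case $1 in
      --from)    export ETH_FROM=$2;    shift 2 ;;
      --gas)     export ETH_GAS=$2;     shift 2 ;;
      --rpc-url) export ETH_RPC_URL=$2; shift 2 ;;
      --)        shift; break ;;
      *)         echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
  ARGS=("$@")
}

parse_opts --gas 200000 --from 0x1234 send now

# A child process inherits the settings with no
# arguments passed to it.
child=$(bash -c 'echo "gas=$ETH_GAS from=$ETH_FROM"')
echo "$child"               # gas=200000 from=0x1234
echo "rest: ${ARGS[*]}"     # rest: send now
```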

The dispatcher loads rc files once on
first invocation, using a guard variable:

    if ! [[ $SETH_INIT ]]; then
      export SETH_INIT=1
      [[ -e ~/.sethrc ]] && . ~/.sethrc
    fi

Recursive calls skip the init because
SETH_INIT is already set in the
inherited environment.
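
The effect of the guard can be observed
with a script that calls itself once.  A
sketch only: the script, the TOY_INIT
variable, and the "init" message (which
stands in for sourcing rc files) are all
hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cat > "$tmp/toy" <<'EOF'
#!/usr/bin/env bash
if ! [[ $TOY_INIT ]]; then
  export TOY_INIT=1
  echo "init"          # stands in for loading rc files
fi
# Recurse once; the child inherits TOY_INIT.
[[ $1 = again ]] || exec "$0" again
echo "done"
EOF
chmod +x "$tmp/toy"

# Start from a clean environment for the guard variable.
guard_out=$(env -u TOY_INIT "$tmp/toy")
echo "$guard_out"
rm -rf "$tmp"
```

"init" is printed once even though the
script runs twice.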


THE SUBCOMMANDS
---------------

Each subcommand is a standalone
executable.  It can be written in any
language.  The dispatcher doesn't care.
The shell doesn't care.  Unix doesn't
care.  If it's executable and it's on
PATH, it works.

Most subcommands are bash:

    #!/usr/bin/env bash
    set -e
    seth rpc eth_blockNumber

Some are JavaScript (for cryptographic
operations that need a library):

    #!/usr/bin/env node
    ...

Some could be Rust, Python, Perl, C,
Haskell, or anything else.  The interface
is the same: arguments in, text out, exit
code for success or failure.

This is genuine polyglot architecture.
Not "we have bindings for multiple
languages."  The architecture is
inherently language-agnostic because the
integration happens at the process level,
not the library level.  You never import
anything from another subcommand.  You
exec it.


THE COMPOSABILITY
-----------------

Subcommands call each other through the
shell.  This is the composition mechanism.

    seth-basefee:
      seth block ${1:-latest} baseFeePerGas

    seth-block-number:
      seth rpc eth_blockNumber

    seth-gas-price:
      seth rpc eth_gasPrice

    seth-call:
      DATA=$(seth calldata "${@:2}")
      result=$(seth rpc eth_call ...)
      seth --abi-decode "$2" "$result"

seth-call uses seth-calldata to encode
the arguments, seth-rpc to make the RPC
call, and seth---abi-decode to decode the
result.  Three subcommands composed in
one script.  Each one is independently
useful.  Each one can be tested alone.
Each one can be replaced without touching
the others.

Pipes work naturally:

    seth balance "$address" | seth --from-wei

The user composes commands the same way
the implementation composes them.  The
internal architecture and the external
interface are the same thing.
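
A self-contained sketch of the same idea,
using a hypothetical tool "toy" so that it
runs without a network.  toy-number stands
in for an RPC call, and the one-line
dispatcher is a simplification of the one
described earlier.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
PATH=$dir:$PATH

# Minimal dispatcher: "toy X ..." -> "toy-X ...".
printf '#!/usr/bin/env bash\nexec "${0##*/}-$1" "${@:2}"\n' > "$dir/toy"
# toy-number: prints a fixed hex value.
printf '#!/usr/bin/env bash\necho 0x10\n' > "$dir/toy-number"
# toy---to-dec: hex to decimal, from an argument or stdin.
printf '#!/usr/bin/env bash\necho $(( ${1:-$(cat)} ))\n' > "$dir/toy---to-dec"
# toy-height: composed from the two commands above.
printf '#!/usr/bin/env bash\nset -e\ntoy --to-dec "$(toy number)"\n' > "$dir/toy-height"
chmod +x "$dir"/toy*

# Composition inside a subcommand...
internal=$(toy height)
# ...and the same composition typed by the user.
external=$(toy number | toy --to-dec)
echo "$internal $external"   # 16 16
rm -rf "$dir"
```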


THE PLUGIN ARCHITECTURE
-----------------------

Plugins are free.  They cost nothing to
implement because the architecture
already supports them.

To add a command to foo, create an
executable called foo-mycommand and put
it on PATH.  That's it.  "foo mycommand"
now works.  The dispatcher finds it via
PATH lookup.  No registration.  No plugin
API.  No manifest file.

The plugin inherits the full environment
-- all the configuration, the RPC URL,
the account address, everything.  It can
call other subcommands.  It can be
written in any language.  It composes
with everything else.

This is how git works.  git-lfs,
git-annex, git-subtree -- they're all just
executables on PATH that git dispatches
to.  The plugin architecture is the
architecture.  There's nothing special
about it.  It's just Unix.
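
As an illustration, a plugin can live in a
directory the core project knows nothing
about.  The tool "toy", the plugin
"toy-hello", and the TOY_USER variable
below are hypothetical, and the dispatcher
is reduced to a single exec.

```shell
#!/usr/bin/env bash
core=$(mktemp -d)      # stands in for libexec/toy
plugins=$(mktemp -d)   # any other directory on PATH

# One-line dispatcher: "toy NAME ARGS" -> "toy-NAME ARGS".
printf '#!/usr/bin/env bash\nexec "${0##*/}-$1" "${@:2}"\n' > "$core/toy"
chmod +x "$core/toy"

# The plugin: written separately, never registered.
printf '#!/usr/bin/env bash\necho "hello, TOY_USER=$TOY_USER"\n' > "$plugins/toy-hello"
chmod +x "$plugins/toy-hello"

export TOY_USER=alice
PATH=$core:$plugins:$PATH
plugin_out=$(toy hello)
echo "$plugin_out"     # hello, TOY_USER=alice
rm -rf "$core" "$plugins"
```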


THE HELP SYSTEM
---------------

Each subcommand documents itself in
comments at the top of the file:

    #!/usr/bin/env bash
    ### foo-bar -- do the bar thing
    ### Usage: foo bar [<options>] <arg>
    ###
    ### Longer description here.

The ### convention generalizes: any
non-space character repeated three times
at the start of a line marks a help line.
The help extractor is a perl one-liner:

    perl -ne \
      'print "$2\n" if /^(\S)\1\1(?: (.*))?/'

This matches ###, ///, ---, or any other
triple-character prefix.  The convention
works in bash (###), JavaScript (///),
Python (###), C (///), and anything else.
It's language-agnostic, like the rest of
the architecture.
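
For readers who want to trace the rule
step by step, here is a rendering of the
same match in bash.  It is a sketch of the
rule, not the extractor itself, and the
function name extract_help is
hypothetical.

```shell
#!/usr/bin/env bash
# extract_help: print help lines read from stdin.
extract_help() {
  local line c rest
  while IFS= read -r line; do
    c=${line:0:1}
    # One non-space character, three times in a row.
    [[ $c && $c != [[:space:]] ]] || continue
    [[ ${line:0:3} == "$c$c$c" ]] || continue
    rest=${line:3}
    # Text counts only when a space follows the prefix.
    if [[ $rest == " "* ]]; then
      printf '%s\n' "${rest:1}"
    else
      printf '\n'
    fi
  done
}

sample='#!/usr/bin/env bash
### foo-bar -- do the bar thing
### Usage: foo bar [<options>] <arg>
###
### Longer description here.
set -e'
help_text=$(extract_help <<<"$sample")
printf '%s\n' "$help_text"
```

Note that the rule is purely positional:
any line that happens to start with three
identical characters is treated as a help
line, so it is worth keeping such lines
out of the code.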

The ### comments serve triple duty:

  1. They're human-readable documentation
     when you open the file.

  2. They generate the --help output.

  3. The OPTS string (if present) drives
     the option parser, so the documented
     options are the implemented options.

The help text and the implementation
cannot diverge because they are the same
file.


HELP DISPATCH
~~~~~~~~~~~~~

"foo bar --help" is handled by the
dispatcher before anything else:

    if [[ $2 = --help ]]; then
      exec "${0##*/}" help -- "$1"
    fi

This runs "foo help -- bar", which looks
up the file for foo-bar and extracts its
### header.

The help command (foo-help) handles three
cases:

  1. The subcommand has ### comments with
     multiple lines: print the full help
     text (the Usage, description, options,
     examples, see-also references).

  2. The subcommand has a single ### line:
     print the one-line description.

  3. The subcommand has NO ### comments:
     print the entire source code of the
     file, prefixed with the command name
     on each line.

Case 3 is the critical fallback.  If you
create a new subcommand and don't write
any documentation, "foo mycommand --help"
shows you the source code.  The source
code IS the documentation.  This is not
a punishment for not writing docs -- it's
a genuine design choice.  A five-line
bash script is its own documentation.
Reading the source is faster than reading
a man page.  The fallback is better than
the formal system for small commands.

This means there is no such thing as an
undocumented command.  Every command has
help.  The worst case is that the help
is the source.  For a one-liner like:

    #!/usr/bin/env bash
    set -e
    seth rpc eth_blockNumber

the source IS better documentation than
any prose description could be.
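
The fallback logic might look roughly like
this.  A sketch under assumptions:
show_help is a hypothetical helper, it
recognises only the ### prefix for
brevity, and the "NAME: " prefix format
for case 3 is a guess at the layout.

```shell
#!/usr/bin/env bash
# show_help FILE NAME: hypothetical helper.
show_help() {
  local file=$1 name=$2 header
  # Collect the ### header, if there is one.
  header=$(sed -n 's/^###[ ]\{0,1\}//p' "$file")
  if [[ $header ]]; then
    # Cases 1 and 2: one line or many, print the header.
    printf '%s\n' "$header"
  else
    # Case 3: no header, so the source is the help.
    sed "s/^/$name: /" "$file"
  fi
}

tmp=$(mktemp -d)
printf '#!/usr/bin/env bash\n### toy-bar -- do the bar thing\necho bar\n' > "$tmp/toy-bar"
printf '#!/usr/bin/env bash\necho baz\n' > "$tmp/toy-baz"
documented=$(show_help "$tmp/toy-bar" toy-bar)
undocumented=$(show_help "$tmp/toy-baz" toy-baz)
printf '%s\n' "$documented" "$undocumented"
rm -rf "$tmp"
```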


HELP LISTING
~~~~~~~~~~~~

"foo help" with no arguments scans every
file in libexec, extracts the one-line
### description from each, and prints
them in two groups: special commands
(--- prefix, the utility operations) and
regular commands (the main operations).
Symlinks are skipped to avoid showing
aliases as separate commands.

The listing is auto-generated from the
filesystem.  Adding a command to the
project automatically adds it to the
help listing.  Removing the file removes
it from the listing.  The filesystem is
the registry.


THE INTERFACE
-------------

The interface between commands is text.
Not JSON.  Not protobuf.  Not a binary
format.  Text.

  - seth rpc eth_blockNumber returns a
    hex string.
  - seth --to-dec converts it to decimal.
  - seth balance returns wei.
  - seth --from-wei converts to ether.

Each command takes text and produces text.
The shell connects them.  Pipes compose
them.  This is the Unix way.

When structured output is needed (block
data, transaction receipts), JSON is used
because it's the native format of the
Ethereum JSON-RPC API.  But the default
is a single value on stdout.  Most
commands produce one line.  The common
case is simple.


THE AESTHETICS
--------------

Programs should be small.  The smallest
useful program is one line:

    seth rpc eth_blockNumber

The largest subcommand in seth is about
100 lines.  Most are under 20.  Many are
under 5.

Programs should do one thing.
seth-balance gets a balance.
seth---to-hex converts to hex.
seth---from-wei converts from wei.  If a
program does two things, it should be two
programs.

Programs should compose.  If you can't
pipe the output of one command into
another, something is wrong.

Programs should be honest.  set -e means
fail immediately on error.  set -x (when
appropriate) means show what you're
doing.  The trace output is
documentation.  A program that runs
silently and succeeds is a program you
have to trust.  A program that shows its
work is a program you can verify.
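
A small demonstration of both flags, run
in child shells so the current shell is
left untouched.  The commands are
placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# set -e: the second echo never runs.
status=0
out=$(bash -c 'set -e; echo one; false; echo two') || status=$?
echo "stdout=$out status=$status"

# set -x: each command is traced on stderr.
trace=$(PS4='+ ' bash -c 'set -x; echo hi' 2>&1 >/dev/null)
echo "$trace"
```

Bear in mind that set -e has documented
exceptions, for example commands tested
by if or joined with && and ||, so it is
a safety net and not a guarantee.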

Programs should be portable.  Bash runs
everywhere.  Unix runs everywhere.
Environment variables work everywhere.
PATH works everywhere.  This architecture
doesn't require any runtime, any
framework, any package manager.  It
requires bash and a filesystem.

Filenames are the public API.  The
existence of a file called seth-call
on PATH means "seth call" is a command.
The naming convention is the
documentation.  The filesystem is the
registry.

Symlinks are aliases.  seth---to-word
is a symlink to seth---to-uint256.
Two names for the same thing.  At the
filesystem level.  No code required.
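
A sketch of an alias made this way, using
hypothetical "toy" names and a temp
directory:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '#!/usr/bin/env bash\necho "uint256: $1"\n' > "$dir/toy---to-uint256"
chmod +x "$dir/toy---to-uint256"

# The alias is one filesystem operation.
ln -s toy---to-uint256 "$dir/toy---to-word"

direct=$("$dir/toy---to-uint256" 1)
aliased=$("$dir/toy---to-word" 1)
echo "$direct"
echo "$aliased"
rm -rf "$dir"
```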

Environment variables are configuration.
The environment is the config file.
env | grep ETH_ is the config dump.

The shell is the composition layer.
Not a framework.  Not an API.  The
shell that's already running.  The
shell you're already in.  The oldest
and most powerful composition tool in
computing.

This is how software should be built.