========
Etta 1.1
========

Bob Riemersma
Version 1.1
May 2012


Etta (from "Rosetta Stone") is a program for converting SGML
documents, such as the many U.S. Library of Congress books
available online for download, into usable HTML documents.

There is a dearth of working tools for consumer and low-end
viewing and conversion of SGML-encoded documents.  Etta is an
attempt to fill this gap in SGML support.

The goal here is not to provide extremely generalized or
"complete" SGML conversions, or to support on-the-fly Web
back-end functionality.  Instead, Etta attempts to handle the
more limited problem of turning typical SGML-encoded "books"
into another format with widespread software support.

Etta's conversions are rough, as its results will demonstrate,
but it can be useful for converting such SGML documents into
forms suitable for offline e-reading.  If nothing else, Etta may
get you 98% of the way to a finished result, saving a great deal
of time over a fully manual conversion, even if you do have to
go in and tweak things by hand afterward.


Typical Use Case
----------------

You found a rather interesting, lengthy document at the U.S.
Library of Congress.  However, the library only provides (1)
page-by-page images of the source document, (2) HTML views of
the document's text with one section or chapter per page, or
(3) a fairly useless SGML download of the document's full text
in one file.

But you'd like to view and read this document offline at your
leisure. You really want an e-book format.

With Etta you can download that SGML file and convert it to
HTML.  You can view the HTML output file using Microsoft's
Internet Explorer or pretty much any third-party Web browser.

You can also clean up anything especially objectionable in the
output by hand-editing.  An HTML editor or text editor works
for this if you understand HTML. The HTML that Etta produces
can also be opened using word processing tools such as
Microsoft's Word, which the average user may be more familiar
with.  From here the document can be cleaned up and saved into
another format if desired.

I like to save as RTF from Word myself, then re-open this RTF
file in WordPad and re-save it for a smaller, cleaner file.  You
could also use third-party tools to convert HTML to other
formats such as PDF or proprietary e-book formats.


Altering and Improving Etta's Conversion Process
------------------------------------------------

If you want to try improving the process, you can create custom
specsfiles and stylesfiles, or modify and recompile Etta
itself.  Additional information to help you do this is provided
below.


Etta System Requirements
========================

Windows 95B (or later) with IE 4.0 Desktop Update, or Windows
98 or later.

Tested on:

    o Windows 95 OSR2 with Desktop Update,
    o Windows XP SP3,
    o Windows Vista SP2, and
    o Windows 7 SP1.

No special memory or disk requirements.  Memory use is kept
low to accommodate even very old PCs.  Etta itself has a very
small disk footprint; beyond that, you only need enough disk
space for the input and output files.


Deploying Etta
==============

The program was written and compiled for simple "XCopy"
deployment.  You can just copy the following files into any
folder you wish.  Then if desired you can add a shortcut for
running Etta to your Desktop, Start Menu, etc.

There is a GUI Etta.exe and an EttaBatch.exe for command
line execution.

Files to deploy:

    Etta.exe           The GUI Etta program.
    EttaBatch.exe      The command line Etta program.
    defaultspecs.txt   A default "specsfile" for Etta.
    defaultstyles.txt  A default "stylesfile" for Etta.
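
Because deployment is a plain file copy, it is easy to script.
The following is a minimal Python sketch, not part of Etta; the
helper name deploy_etta and its source/target folder arguments
are hypothetical.

```python
import shutil
from pathlib import Path

# The four files listed above; nothing else is needed at run time.
ETTA_FILES = ["Etta.exe", "EttaBatch.exe", "defaultspecs.txt", "defaultstyles.txt"]


def deploy_etta(source_dir, target_dir):
    """Copy the Etta files from source_dir into target_dir (hypothetical helper)."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ETTA_FILES:
        src = source / name
        if not src.is_file():
            raise FileNotFoundError(f"missing Etta file: {src}")
        # copy2 also preserves file timestamps.
        shutil.copy2(src, target / name)
        copied.append(target / name)
    return copied
```

Creating the Desktop or Start Menu shortcut is left as a manual
step here.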


In-House Deployment
-------------------

For deployment within corporate networks you should be able to
wrap Etta into a Windows Installer package or other such
mechanism fairly easily.

Etta and EttaBatch will run properly from Program Files for
standard users in a UAC or managed security ecosystem.

Though the two default text files should be deployed to the
Etta install folder, Etta only reads them and never writes to
them.  Users can create custom files anywhere on disk if desired
and direct Etta to use them.  There is no need to allow users to
write to the Etta install folder.


Running Etta
============

GUI Etta
--------

You just run Etta.exe by double-clicking its icon or a shortcut
you have created for Etta.  Etta runs as a GUI (Windows
Subsystem) program and opens a window for user interaction.

From there you specify four files:

    o SGML Text File    (to be converted)
    o Etta Specs File   (default prefilled)
    o HTML Styles File  (default prefilled)
    o HTML Output       (converted result)

Then simply click the Convert button.  After completion you
can specify another conversion or close Etta.  It's that
easy.


EttaBatch
---------

The EttaBatch command line help is:

    #Convert SGML Text document to HTML.

    EttaBatch [/? | /I <sgml file> /O <html file>
                   [/P <specs file>] [/Y <styles file>]]

      /?               this help.
      /I <sgml file>   input SGML file.
      /O <html file>   output HTML file.
      /P <specs file>  optional Etta specsfile, default is
                       defaultspecs.txt in EttaBatch's folder.
      /Y <styles file> optional Etta stylesfile, default is
                       defaultstyles.txt in EttaBatch's folder.

The switches are case-insensitive, either "/" or "-" can be used
as the switch indicator, and file names can be quoted using "
characters (and should be, if they contain spaces).

The arguments are not positional, and may be provided in any
order on the command line.
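
These rules can be illustrated with a small Python sketch of a
parser that accepts the same syntax.  It is not EttaBatch's VB6
code: parse_ettabatch_args, UsageError, and the returned keys are
hypothetical, and it assumes the command line has already been
split into arguments with the quotes removed.

```python
class UsageError(Exception):
    """Command line syntax error (EttaBatch returns 1 for these)."""


# Switches that take a file name, mapped to hypothetical result keys.
FILE_SWITCHES = {"i": "sgml", "o": "html", "p": "specs", "y": "styles"}


def parse_ettabatch_args(argv):
    """Parse EttaBatch-style arguments (argv excludes the program name).

    Returns None for a help request, otherwise a dict of file names.
    """
    result = {}
    pos = 0
    while pos < len(argv):
        arg = argv[pos]
        # A switch is "/" or "-" followed by one case-insensitive letter.
        if len(arg) != 2 or arg[0] not in "/-":
            raise UsageError(f"unexpected argument: {arg}")
        letter = arg[1].lower()
        if letter == "?":
            return None
        if letter not in FILE_SWITCHES:
            raise UsageError(f"unknown switch: {arg}")
        if pos + 1 >= len(argv):
            raise UsageError(f"missing file name after {arg}")
        # Order does not matter: each switch just fills in its slot.
        result[FILE_SWITCHES[letter]] = argv[pos + 1]
        pos += 2
    # /I and /O are required; /P and /Y are optional.
    for required in ("sgml", "html"):
        if required not in result:
            raise UsageError(f"missing required {required} file switch")
    return result
```

For example, parse_ettabatch_args(["-o", "out.htm", "/i",
"book.sgm"]) returns {"html": "out.htm", "sgml": "book.sgm"}.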

"Help" requests and command line syntax errors return a result
code of 1.  Program faults return a result code equal to the
error code of the fault.
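
When EttaBatch is driven from a script, the result code is the
only signal of how a run went.  A minimal Python sketch, assuming
the caller supplies the path to EttaBatch.exe and that a result
code of 0 means success (the help text above does not say so
explicitly).  The function names are hypothetical.

```python
import subprocess


def build_ettabatch_command(exe, sgml, html, specs=None, styles=None):
    """Build an EttaBatch argument list; /P and /Y are added only if given."""
    cmd = [exe, "/I", sgml, "/O", html]
    if specs is not None:
        cmd += ["/P", specs]
    if styles is not None:
        cmd += ["/Y", styles]
    return cmd


def describe_result(code):
    """Interpret an EttaBatch result code (0 = success is an assumption)."""
    if code == 0:
        return "ok"
    if code == 1:
        return "help shown or command line syntax error"
    return f"program fault, error code {code}"


def run_ettabatch(exe, sgml, html, specs=None, styles=None):
    """Run EttaBatch and return (result code, description)."""
    cmd = build_ettabatch_command(exe, sgml, html, specs, styles)
    # Passing a list lets subprocess quote file names containing spaces.
    completed = subprocess.run(cmd)
    return completed.returncode, describe_result(completed.returncode)
```

On Windows, subprocess turns the list into a single command line
and adds the " quoting for names with spaces, matching the rule
above.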


Known Issues
============

Tables
------

Tables in particular are a mess.  Most SGML samples don't seem
to contain anything that can be mapped to a <tr> tag, making
HTML table conversion basically useless.  In the default specs
and styles I have mapped what tags there are so that SGML tables
come out as a block of blue text, preserving as much data as
possible.  Without row hints, however, the result is decidedly
non-tabular.


Handwriting
-----------

"Handwriting" markup is rendered in HTML as italic text in
brown, just to make it distinct.  Position info such as "above
text" or "left margin" is lost.


Misc.
-----

There is no attempt to link any images into the generated HTML.

Etta doesn't attempt to guide its conversion using SGML DTDs,
but instead uses a simple "specsfile" along with a
corresponding HTML stylesheet that it embeds in the output HTML
file.  Details on both can be found below.


Recompiling Etta
================

You may want to try improving Etta by altering or by
supplementing its existing logic.

Note that several source modules are shared between the two
VB6 Projects:

    EttaWork.bas
    Spec.cls
    Tag.cls


GUI Etta Project
----------------

You can just open Etta.vbp in Visual Basic 6.0, make your
changes, and recompile.  No extra build steps should be
required.


EttaBatch Project
-----------------

Open EttaBatch.vbp in VB6, make your changes, recompile.  Then
drag EttaBatch.exe and drop it onto LinkConsole.vbs, which
will relink the program for the Console Subsystem.  This
allows StdIO operations to work properly.

Note:  LinkConsole.vbs may need updating on your system.  Open
       (edit) this file and examine the comments at the head of
       the script for information.
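
If LinkConsole.vbs cannot be made to work, a different technique
from relinking is to patch the Subsystem field in the executable's
PE optional header from 2 (Windows GUI) to 3 (Windows console);
Microsoft's editbin /SUBSYSTEM:CONSOLE does the same job.  A
minimal Python sketch under that assumption; it is not what
LinkConsole.vbs does, the function names are hypothetical, and it
leaves the header checksum unchanged.

```python
import struct

PE_SIGNATURE = b"PE\x00\x00"
SUBSYSTEM_GUI = 2      # IMAGE_SUBSYSTEM_WINDOWS_GUI
SUBSYSTEM_CONSOLE = 3  # IMAGE_SUBSYSTEM_WINDOWS_CUI


def subsystem_offset(image):
    """Return the file offset of the optional header's Subsystem field."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != PE_SIGNATURE:
        raise ValueError("not a PE image")
    # 4-byte signature + 20-byte COFF header, then Subsystem sits at
    # offset 68 of the optional header (same for PE32 and PE32+).
    return pe_offset + 4 + 20 + 68


def set_console_subsystem(image):
    """Return a copy of image with the subsystem set to console."""
    data = bytearray(image)
    offset = subsystem_offset(data)
    (current,) = struct.unpack_from("<H", data, offset)
    if current not in (SUBSYSTEM_GUI, SUBSYSTEM_CONSOLE):
        raise ValueError(f"unexpected subsystem value {current}")
    struct.pack_into("<H", data, offset, SUBSYSTEM_CONSOLE)
    return bytes(data)
```

Work on a copy of EttaBatch.exe: read its bytes, pass them to
set_console_subsystem, and write the result back out.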


Etta Specifications Files
=========================

These files are given to Etta to drive the process of replacing
SGML tags with appropriate HTML tags.  In many cases this also
involves applying a style to the HTML tag.

Each conversion requires a specsfile.  A default specsfile is
provided with Etta.


Specs File Contents
-------------------

A specsfile is a text file with comma-separated values on
CRLF-delimited lines.  The first line is a header line that Etta
will skip:

    Tag,Modifier,Self Closing,Action,Action Data

Subsequent lines must contain these 5 field values, which may
not contain commas:

Tag           The SGML tag to replace, without <> brackets.
Modifier      An SGML tag modifier, some string found among the
              attributes of the opening tag.
Self Closing  Either y or n (anything other than y is treated
              as n, no).  y marks a self-closing tag, such as
              the <lb> tag in SGML.
Action        Replacement action for Etta to take.  See details
              listed below.
Action Data   Optional value used in conjunction with some Etta
              Actions (particularly replace and repmod).

None of this information is case-sensitive.  Etta converts
values to lowercase before using them, as it does when parsing
tags from the SGML source.

Do not include any blank or extra lines.  The last line should
end with one CRLF.

When Action Data is not needed you can either enter it as "" or
leave it empty, but the preceding comma is required.  Non-empty
values can also be quoted.  Valid examples:

    p,,n,pass,
    TEXT,,N,REPLACE,BODY
    lb,"",y,replace,br
    hi,smallcaps,n,repmod,span
    "handwritten",,n,replace,span class=handwritten

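The format rules above can be sketched as a small Python reader.
It is an illustration, not Etta's VB6 parser: Spec and read_specs
are hypothetical names, rejecting unknown actions is an assumption,
and the csv module's quoting rules may differ from Etta's in edge
cases.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Spec:
    tag: str
    modifier: str
    self_closing: bool
    action: str
    action_data: str


VALID_ACTIONS = {"pass", "replace", "repmod", "deltag", "deltext", "pagenum"}


def read_specs(text):
    """Parse specsfile text (CRLF lines, header first) into Spec objects."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # skip the header line
    specs = []
    for line_no, row in enumerate(rows, start=2):
        if len(row) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(row)}")
        # Etta lowercases every value before using it.
        tag, modifier, closing, action, data = (field.lower() for field in row)
        if action not in VALID_ACTIONS:  # assumption: unknown actions are errors
            raise ValueError(f"line {line_no}: unknown action {action!r}")
        # Anything other than "y" counts as "n".
        specs.append(Spec(tag, modifier, closing == "y", action, data))
    return specs
```

Fed the five example lines above (after the header line),
read_specs returns five Spec objects; the TEXT line, for instance,
becomes Spec("text", "", False, "replace", "body").
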

Actions
-------

pass          Pass the tag through unchanged.
replace       Replace with the Action Data tag, which may include
              attributes such as class.
repmod        Replace with the Action Data tag, using the
              Modifier as its class.
deltag        Delete this tag and its end-tag.
deltext       Delete this tag and everything through its end-tag.
pagenum       Delete this tag; if any text is present, copy the
              text as a centered paragraph (<p align=center>).


Etta HTML Styles Files
======================

These are text files containing a block of HTML to be output
within the <head> of the generated HTML file.  Normally this is
a <style> block used in conjunction with a specsfile, but it
could also include other <head> elements such as <title>.

Each conversion requires a stylesfile.  A default stylesfile
is provided with Etta.
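
A minimal Python sketch of where the stylesfile text ends up,
assuming it is copied verbatim into <head> ahead of the converted
body.  The function wrap_html and the sample class names and
colors are hypothetical (loosely echoing the brown italic
handwriting and blue table text described under Known Issues);
they are not the contents of defaultstyles.txt.

```python
# Hypothetical stylesfile contents: a <title> plus a <style> block.
SAMPLE_STYLES = """<title>Converted document</title>
<style>
  .handwritten { font-style: italic; color: brown; }
  .table { color: blue; }
</style>"""


def wrap_html(styles_text, converted_body):
    """Assemble the output file: stylesfile block in <head>, then the body.

    converted_body is assumed to already contain <body>...</body>,
    produced by a spec such as TEXT,,N,REPLACE,BODY.
    """
    return "\n".join([
        "<html>",
        "<head>",
        styles_text.strip(),
        "</head>",
        converted_body.strip(),
        "</html>",
    ])
```

Because the styles block is inserted as-is, any class named in a
specsfile's Action Data (such as span class=handwritten) only takes
effect if the stylesfile defines a matching rule.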
