
SiSU - SiSU information Structuring Universe - Structured information, Serialized Units

Ralph Amissah

Rights: Copyright ©  2009 Ralph Amissah


SiSU - from less markup than the most elementary equivalent html, you can have more

1. Description

1.1 Outline
1.2 Short summary of features
1.3 How it works
1.4 Simple markup
1.4.1 Sparse markup requirement, try to get the most out of markup
1.4.2 Single markup file provides multiple output formats
1.4.3 Syntax relatively easy to read and remember
1.4.4 Kept simple by having a limited publishing feature set, and features identified as most important, are available across several document types
1.5 Designed with usability in mind
1.6 Code separate from content
1.7 Object citation numbering, a text or object positioning / citation system - "paragraph" (or text object) numbering, that remains same and usable across all output formats by people and machine
1.8 Handling of Dublin Core meta-tags making use of the Resource Description Framework
1.9 Easy directory management
1.10 Document Version Control Information
1.11 Table of contents
1.12 Auto-numbering of headings
1.13 Numbering and cross-hyperlinking of endnotes
1.14 "Skinnable"
1.15 Multiple Outputs
1.15.1 html - several presentations: full length & segmented; css & table based
1.15.2 EPUB
1.15.3 XML
1.15.4 ODT:ODF, Open Document Format - ISO/IEC 26300:2006
1.15.5 PDF - portrait and landscape, (through the generation of LaTeX output which is then transformed to pdf)
1.15.6 Search - loading/populating of relational database while retaining document structure information, object citation numbering and other features (currently PostgreSQL and/or SQLite)
1.15.7 Search - database frontend sample, utilising database and SiSU features, including object citation numbering (backend currently PostgreSQL)
1.15.8 Other forms
1.16 Concordance / Word Map or rudimentary index
1.17 Managed (document) directory, database, or site structure
1.18 Batch processing
1.19 Integration to superior Gnu/Linux and Unix tools
1.19.1 Backup and version control
1.19.2 Editor support
1.20 Modular design, need something new add a module

2. Markup and Output Examples

2.1 Markup examples
2.2 A few book (and other) examples
2.2.1 "Viral Spiral", David Bollier
"The Wealth of Networks", Yochai Benkler
"Two Bits", Christopher Kelty
"Free Culture", Lawrence Lessig
"CONTENT", Cory Doctorow
"Democratizing Innovation", by Eric von Hippel
"Free as in Freedom: Richard Stallman's Crusade for Free Software", by Sam Williams
"Free For All: How Linux and the Free Software Movement Undercut the High Tech Titans", by Peter Wayner
"The Cathedral and the Bazaar", by Eric S. Raymond
"Down and out in the Magic Kingdom", Cory Doctorow
"Little Brother", Cory Doctorow
"For the Win", Cory Doctorow
"Accelerando", Charles Stross
"Tainaron", Leena Krohn
"Sphinx or Robot", Leena Krohn
"War and Peace", Leo Tolstoy, PG Etext 2600
"Don Quixote", Miguel de Cervantes [Saavedra], translated by John Ormsby, PG Etext 996
"Gulliver's Travels", Jonathan Swift, transcribed from the 1892 George Bell and Sons edition by David Price, PG Etext 829
"Alice's Adventures in Wonderland", Lewis Carroll, PG Etext 11
"Through The Looking-Glass", Lewis Carroll, PG Etext 12
"Alice's Adventures in Wonderland" and "Through The Looking-Glass", Lewis Carroll, PG Etexts 11 and 12
"Gnu Public License 2", (GPL 2) Free Software Foundation
"Gnu Public License v3 - Third discussion draft", (GPLv3) Free Software Foundation
"Debian Social Contract"
"Debian Constitution v1.3", (simple/default markup)
"Debian Constitution v1.3", (markup adjusted for output to more closely match the original)
"Debian Constitution v1.2", (simple/default markup)
"Debian Constitution v1.2", (markup adjusted for output to more closely match the original)
"A Uniform Sales Terminology", Vikki Rogers and Albert Kritzer
"The Autonomous Contract" 1997 - markup sample
"The Autonomous Contract Revisited" - markup sample
"United Nations Convention on Contracts for the International Sale of Goods"
PECL, the "Principles of European Contract Law"
2.3 SQL - PostgreSQL, SQLite
2.4 Lex Mercatoria as an example
2.5 For good measure the markup for a document with lots of (simple) tables
2.6 And a link to the output of a reported case

3. A Checklist of Output Features

4. Introduction to SiSU Markup

4.1 Summary
4.2 Markup Examples
4.2.1 Online
4.2.2 Installed

5. Markup of Headers

5.1 Sample Header
5.2 Available Headers

6. Markup of Substantive Text

6.1 Heading Levels
6.2 Font Attributes
6.3 Indentation and bullets
6.4 Footnotes / Endnotes
6.5 Links
6.5.1 Naked URLs within text, dealing with urls
6.5.2 Linking Text
6.5.3 Linking Images
6.6 Grouped Text
6.6.1 Tables
6.6.2 Poem
6.6.3 Group
6.6.4 Code
6.7 Book index

7. Composite documents markup

Markup Syntax History

8. Notes related to Files-types and Markup Syntax

9. Commands Summary

9.1 Description
9.2 Document Processing Command Flags

10. command line modifiers

11. database commands

12. Shortcuts, Shorthand for multiple flags

12.1 Command Line with Flags - Batch Processing

Technical Information

13. Technical notes

13.1 See abandoned U.S. Provisional Patent Application

14. Diagram / Chart

14.1 The Chart
14.2 I/O
14.3 The Program
14.4 Software utilised
14.4.1 SiSU
14.4.2 SiSU Modules

15. SiSU development environment and technologies of interest, including data formats

15.1 Development environment, Debian
15.2 Programming language, Ruby
15.3 SGML & XML Family
15.3.1 SGML
15.3.2 XML Family
15.4 TeX Family
15.5 Pdf
15.6 Relational Databases, SQL
15.7 Other Databases
15.8 Text Search
15.9 Character Encoding, Unicode
15.10 Information Visualization
15.11 Metadata - semantic
15.12 Syndication, Web feed formats
15.13 Other
15.14 Editors
15.15 Version Control
15.16 Licenses

A Summary of notable events

16. A history of SiSU and its outputs including search

A Chronological history of developments on SiSU

1993

1994

1995

1996

1997

1998

1999

2000

2001

2002

2003

January
February
March
April
June
July
August
September
November
December

2004

January
February
March
April
May
June
July
August
September
October
November
December

2005

January
February
March
April
May
June
July
August
September
October
November
December

2006

January
February
March
April
May
June
July
August
September
October
November
December

2007

January
February
March
April
May
June
July
August
September
November
December

2008

January
February
April
June
September
October
November
December

2009

January
December

2010

March

FAQ, Howto, Installation, etc.

HowTo

17. Getting Help

17.1 SiSU "man" pages
17.2 SiSU built-in help
17.3 Command Line with Flags - Batch Processing

18. Setup, initialisation

18.1 initialise output directory
18.1.1 Use of search functionality, an example using sqlite
18.2 misc
18.2.1 url for output files -u -U
18.2.2 toggle screen color
18.2.3 verbose mode
18.2.4 quiet mode
18.2.5 maintenance mode intermediate files kept -M
18.2.6 start the webrick server
18.3 remote placement of output

19. Configuration Files

20. Markup

20.1 Headers
20.2 Font Face
20.2.1 Bold
20.2.2 Italics
20.2.3 Underscore
20.2.4 Strikethrough
20.3 Endnotes
20.4 Links
20.5 Number Titles
20.6 Line operations
20.7 Tables
20.8 Grouped Text
20.9 Composite Document

21. Change Appearance

21.1 Skins
21.2 CSS

Extracts from the README

22. README

22.1 Online Information, places to look
22.2 Installation
22.2.1 Debian
22.2.2 RPM
22.2.3 Source package .tgz
22.2.4 to use setup.rb
22.2.5 to use install (prepared with "Rake")
22.2.6 to use install (prepared with "Rant")
22.3 Dependencies
22.4 Quick start
22.5 Configuration files
22.6 Use General Overview
22.7 Help
22.8 Directory Structure
22.9 Configuration File
22.10 Markup
22.11 Additional Things
22.12 License
22.13 SiSU Standard

Extracts from man 8 sisu

23. Post Installation Setup

23.1 Post Installation Setup - Quick start
23.2 Document markup directory
23.2.1 Configuration files
23.2.2 Debian INSTALLATION Note
23.2.3 Document Resource Configuration
23.2.4 Skins

24. FAQ - Frequently Asked/Answered Questions

24.1 Why are urls produced with the -v (and -u) flag that point to a web server on port 8081 ?
24.2 I cannot find my output, where is it?
24.3 I do not get any pdf output, why?
24.4 Where is the latex (or some other interim) output?
24.5 Why isn't SiSU markup XML
24.6 LaTeX claims to be a document preparation system for high-quality typesetting. Can the same be said about SiSU?
24.7 Can the SiSU markup be used to prepare for a LaTex automatic building of an index to the work?
24.8 Can the conversion from SiSU to LaTeX be modified if we have special needs for the LaTeX, or do we need to modify the LaTeX manually?
24.9 How do I create GIN or GiST index in Postgresql for use in SiSU
24.10 Are there some examples of using Ferret Search with a SiSU repository?
24.11 Have you had any reports of building SiSU from tar on Mac OS 10.4?
24.12 Where is version 1?
24.13 What is the difference between version 1 and 2?

Installation

25. Installation

25.1 Debian
25.2 Other Unix / Linux
25.2.1 source tarball

26. SiSU Components, Dependencies and Notes

26.1 sisu
26.2 sisu-complete
26.3 sisu-examples
26.4 sisu-pdf
26.5 sisu-postgresql
26.6 sisu-remote
26.7 sisu-sqlite

27. Quickstart - Getting Started Howto

27.1 Installation
27.1.1 Debian Installation
27.1.2 RPM Installation
27.1.3 Installation from source
27.2 Testing SiSU, generating output
27.2.1 basic text, plaintext, html, XML, ODF, EPUB
27.2.2 LaTeX / pdf
27.2.3 relational database - postgresql, sqlite
27.3 Getting Help
27.3.1 The man pages
27.3.2 Built in help
27.3.3 The home page
27.4 Markup Samples

28. SiSU Components, Dependencies and Notes

29. Breakage and Fixes

31st October 2006 - SiSU < 0.48.3 break against Ruby > 1.8.5-3, break on cyclic include; Fixed SiSU: >=0.48.3 (see notes)
21st September 2005 - Avoid ruby-1.8.3 (2005-09-21) and (2005-10-12), Ruby Segfaults; Fixed: later versions of Ruby (see notes)

License, Standard

30. License

31. Things SiSU Standard

Download information

Download information

32. Download SiSU - Linux/Unix

SiSU Current Version - Linux/Unix
Source (tarball tar.gz)
Git (source control management)
Debian
RPM

Changelog - sisu

33. SiSU Version Manifest / changelog

Current version
3.0
Previous versions
2.7
2.6
2.5
2.4
2.3
2.2
2.1
2.0
1.0
0.71
0.70
0.69
0.68
0.67
0.66
0.65
0.64
0.63
0.62
0.61
0.60
0.59
0.58
0.57
0.56
0.55
0.54
0.53
0.52
0.51
0.50
0.49
0.48
0.47
0.46
0.45
0.44
0.43
0.42
0.41
0.40
0.39
0.38
0.37
0.36
0.35
0.34
0.33
0.32
0.31
0.30
0.29
0.28
0.27
0.26
0.25
0.24
0.23
0.22
0.21
0.20
0.18
0.16
0.14
0.12
0.10
0.8
0.6
0.4
0.2
0.1
Release

Changelog - sisu-markup-samples

34. Version Manifest / changelog - SiSU Markup Samples

Current version
2.0
1.1
1.0

Method for providing digital documents including a common citation structure

[SiSU Provisional Patent Application of 2004 based on much older idea and work on SiSU, Abandoned]

The 'Invention' described (and diagrams) by Ralph Amissah.
Provisional patent application text prepared by Stephan Filipek of Winston & Strawn LLP

35. 1. Background

36. 2. Definitions

37. 3. Brief Descriptions of the Drawings

38. 4. Detailed Description of the Preferred Embodiments

39. 5. Document Processing, examples of subsequent steps

40. 6. Advantages of the Invention

41. 7. THE CLAIMS

Post Filing Appendix

42. Post Filing Appendix: Reasons for Abandonment of Patent Process Claim

Endnotes

Endnotes






SiSU is a flexible document preparation, generation, publishing and search system.  1 

SiSU ("SiSU information Structuring Universe" or "Structured information, Serialized Units"),  2  is a Unix command line oriented framework for document structuring, publishing and search, featuring minimalistic markup, multiple standard outputs, a common citation system, and granular search.

Using markup applied to a document, SiSU can produce plain text, HTML, XHTML, XML, OpenDocument, EPUB, LaTeX or PDF files, and populate an SQL database with objects  3  (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria are met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated), for which it provides a fixed means of referencing content.

SiSU is the data/information structuring and transforming tool that has resulted from work on one of the oldest law web projects. It makes possible the one-time, simple, human-readable markup of documents that SiSU can then publish in various forms, suitable for paper,  4  web  5  and relational database  6  presentations, retaining common data-structure and meta-information across the output/presentation formats. Several requirements of legal and scholarly publication on the web have been addressed, including the age-old need to be able to reliably cite/pinpoint text within a document, to easily make footnotes/endnotes, to allow for semantic document meta-tagging, and to keep required markup to a minimum. These and other features of interest are listed and described below. A few points are worth making early (and will be repeated a number of times):

(i) The SiSU document generator was the first to place material on the web with a system that makes citation possible across different document types: paragraph, or rather object citation numbering,  7  a text positioning system available for the pinpointing of text since 1997. It is a simple idea from which much benefit follows, and SiSU remains today, to the best of my knowledge, the only multiple-format e-book/electronic-document system on the web that gives you this possibility (including for relational databases).

(ii) Markup is done once for the multiple formats produced.

(iii) Markup is simple, and human readable (with a little practice), in almost all cases there is less and simpler markup required than basic html. In any event the markup required is very much simpler than the html, EPUB, LaTeX, [lout], structured XML, ODF (OpenDocument), PostgreSQL or SQLite feed etc. that you can have SiSU generate for you.

(iv) SiSU is a batch processor, dealing with as many files as you need to generate at a time.

(v) Scalability is dependent on your file system (in my case Reiserfs), the database (currently Postgresql and/or SQLite) and your hardware.

SiSU Sabaki  8  (or just SiSU) is the provisional name given to the software described here that helps structure documents for web and other publication. The name SiSU is a loose acronym for something along the lines of "SiSU is structuring unit", or "SiSU, information structuring unit", or "simple - information structuring unit", or the more descriptive "Structured information, Serialized Units", or what it may be directed towards, "*semantic* and information structuring universe"  9  (tongue in cheek, only just). Guess I'll get away with "Simple - information Structuring Universe". SiSU is also a Finnish word roughly meaning guts, inner strength and perseverance.  10 

SiSU was born of the need to find a way, with minimal effort, and for as wide a range of document types as possible, to produce high quality publishing output in a variety of document formats. As such it was necessary to find a simple document representation that would work across a large number of document types, and the most convenient way(s) to produce acceptable output formats. The project leading to this program was started in 1993 (together with the trade law project now known as Lex Mercatoria) as an investigation of how to effectively/efficiently place documents on the web. The unified document handling, together with features such as paragraph numbering, endnote handling and tables... appeared in 1996/97. SiSU was originally written in Perl,  11  and converted to Ruby,  12  in 2000, one of the most impressive programming languages in existence! In its current form it has been written to run on the Gnu/Linux platform, and in particular on Debian,  13  taking advantage of many of the wonderful projects that are available there.

SiSU markup is based on requiring the minimum markup needed to determine the structure of a document. (This can be as little as saying in a header to look for the word Book at a specified level and the word Chapter at another level). SiSU then breaks a document into its smallest parts (at a heading, and paragraph level) while retaining all structural information. This break up of the document and information on its structure is taken advantage of in the transformations made in generating the very different output types that can be created, and in providing as much as can be for what each output type is best at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf or Postscript), EPUB, XML (in this case, structural representation), ODF (OpenDocument [experimental]), SQL (e.g. document search; representing constituent parts of documents based on their structure, headings, chapters, paragraphs as required; user control).  14 
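The idea of declaring structure once, by pattern, can be sketched as follows. This is only an illustration and not SiSU's own code or header syntax: the `LEVEL_PATTERNS` table, the `split_into_objects` function and the sample text are all hypothetical, standing in for a header instruction that says "look for the word Book at one level and the word Chapter at another".

```python
import re

# Hypothetical level patterns, standing in for a header instruction such as
# "treat blocks starting with Book as level 1 and Chapter as level 2".
LEVEL_PATTERNS = [
    (1, re.compile(r"^Book\b")),
    (2, re.compile(r"^Chapter\b")),
]

def split_into_objects(text):
    """Split text on blank lines and label each block as heading or paragraph."""
    objects = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())  # normalise internal whitespace
        level = next((lvl for lvl, pat in LEVEL_PATTERNS if pat.match(block)), None)
        kind = "heading" if level is not None else "paragraph"
        objects.append({"kind": kind, "level": level, "text": block})
    return objects

sample = """Book One

Chapter 1

The first paragraph of the first chapter.

Chapter 2

The first paragraph of the second chapter."""

structure = split_into_objects(sample)
for obj in structure:
    print(obj["kind"], obj["level"], obj["text"])
```

No structural markup appears in the sample text itself; the two patterns are enough to recover the heading levels, and everything else is treated as a paragraph.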

From markup that is simpler and more sparse than html you get:

  • far greater output possibilities, including html, EPUB, XML, ODF (OpenDocument), LaTeX (pdf), and SQL;
  • the advantages implicit in the very different output possibilities;
  • a common citation system (for all outputs - including the relational database, search results are relevant for all outputs);
  • For more see the short summary of features provided below.

    SiSU processes files with minimal tagging to produce various document outputs including html, EPUB, ODF, LaTeX (which is converted to pdf) and if required loads the structured information into an SQL database (PostgreSQL and SQLite have been used for this). SiSU produces an intermediate processing format.  15 

    SiSU was originally used in constructing Lex Mercatoria ‹http://lexmercatoria.org/› or ‹http://www.jus.uio.no/lm/› (one of the oldest law web sites), and considerable thought went into producing output that would be suitable for legal and academic writings (that do not have formulae) given the limitations of html, and publication in a wide variety of "formats", in particular in relation to the convenient and accurate citation of text. However, the construction of Lex Mercatoria uses only a fraction of the features available from SiSU today, viz. the generation of flat file structures, without the additional building of ("granular") SQL database content (at an object level with relevant relational tables, and other outputs also available).

    (i) markup syntax: (a) simpler than html, (b) mnemonic, influenced by mail/messaging/wiki markup practices, (c) human readable, and easily writable,

    (ii) (a) minimal markup requirement, (b) single file marked up for multiple outputs,

    notes:

    * documents are prepared in a single UTF-8 file using a minimalistic mnemonic syntax. Typical literature, documents like "War and Peace" require almost no markup, and most of the headers are optional.

    * markup is easily readable/parsed by the human eye, (basic markup is simpler and more sparse than the most basic html), [this may also be converted to XML representations of the same input/source document].

    * markup defines document structure (this may be done once in a header pattern-match description, or for heading levels individually); basic text attributes (bold, italics, underscore, strike-through etc.) as required; and semantic information related to the document (header information, extended beyond the Dublin core and easily further extended as required); the headers may also contain processing instructions.

    (iii) (a) multiple outputs, primarily industry established and institutionally accepted open standard formats, including amongst others: plaintext (UTF-8); html; EPUB; (structured) XML; ODF (OpenDocument text); LaTeX; PDF (via LaTeX); SQL type databases (currently PostgreSQL and SQLite). Also produces: concordance files; document content certificates (md5 or sha256 digests of headings, paragraphs, images etc.) and html manifests (and sitemaps of content). (b) takes advantage of the strengths implicit in these very different output types (e.g. PDFs produced using the typesetting of LaTeX, databases populated with documents at an individual object/paragraph level, making possible granular search (and related possibilities))

    (iv) outputs share a common numbering system (dubbed "object citation numbering" (ocn)) that is meaningful (to man and machine) across various digital outputs, whether paper, screen, or database oriented (PDF, html, EPUB, XML, OpenDocument, SQLite, PostgreSQL); this numbering system can be used to reference content.

    (v) SQL databases are populated at an object level (roughly headings, paragraphs, verse, tables) and become searchable with that degree of granularity; the output information provides the object/paragraph numbers, which are relevant across all generated outputs; it is also possible to look at just the matching paragraphs of the documents in the database; [output indexing also works well with search indexing tools like Hyper Estraier].

    (vi) use of semantic meta-tags in headers permit the addition of semantic information on documents, (the available fields are easily extended)

    (vii) creates organised directory/file structure for (file-system) output, easily mapped with its clearly defined structure, with all text objects numbered, you know in advance where in each document output type, a bit of text will be found (e.g. from an SQL search, you know where to go to find the prepared html output or PDF etc.)... there is more; easy directory management and document associations, the document preparation (sub-)directory may be used to determine output (sub-)directory, the skin used, and the SQL database used,

    (viii) "Concordance file" wordmap, consisting of all the words in a document and their (text/ object) locations within the text, (and the possibility of adding vocabularies),

    (ix) document content certification and comparison considerations: the document and each object within it is stamped with an md5 hash, making it possible to easily check or guarantee that the substantive content of a document is unchanged.

    (x) SiSU's minimalist markup makes for meaningful "diffing" of the substantive content of markup-files,

    (xi) easily skinnable, document appearance on a project/site wide, directory wide, or document instance level easily controlled/changed,

    (xii) in many cases a regular expression may be used (once in the document header) to define all or part of a document's structure, obviating or reducing the need to provide structural markup within the document,

    (xiii) prepared files may be batch processed; documents produced are static files, so this needs to be done only once but may be repeated for various reasons as desired (updated content, addition of new output formats, updated technology document presentations/representations)

    (xiv) possible to pre-process, which permits: the easy creation of standard form documents, and templates/term-sheets, or; building of composite documents (master documents) from other sisu marked up documents, or marked up parts, i.e. import documents or parts of text into a main document should this be desired

    (xv) there is a considerable degree of future-proofing, output representations are "upgradeable", and new document formats may be added: (a) modular, (thanks in no small part to Ruby) another output format required, write another module.... (b) easy to update output formats (eg html, XHTML, EPUB, LaTeX/PDF produced can be updated in program and run against whole document set), (c) easy to add, modify, or have alternative syntax rules for input, should you need to,

    (xvi) scalability, dependent on your file-system (ext3, Reiserfs, XFS, whatever) and on the relational database used (currently Postgresql and SQLite), and your hardware,

    (xvii) only marked up files need be backed up, to secure the larger document set produced,

    (xviii) document management,

    (xix) Syntax highlighting for SiSU markup is available for a number of text editors.

    (xx) remote operations: (a) run SiSU on a remote server (having prepared sisu markup documents locally or on that server, i.e. this solution, where sisu is installed on the remote server, would work whatever type of machine you chose to prepare your markup documents on), (b) generated document outputs may be posted by sisu to remote sites (using rsync/scp), (c) document source (plaintext utf-8), if shared on the net, may be identified by its url and processed locally to produce the different document outputs.

    (xxi) document source may be bundled together (automatically) with associated documents (multiple language versions or master document with inclusions) and images and sent as a zip file called a sisupod, if shared on the net these too may be processed locally to produce the desired document outputs, these may be downloaded, shared as email attachments, or processed by running sisu against them, either using a url or the filename.

    (xxii) for basic document generation, the only software dependency is Ruby, and a few standard Unix tools (this covers plaintext, html, EPUB, XML, ODF, LaTeX). To use a database you of course need that, and to convert the LaTeX generated to PDF, a LaTeX processor like tetex or texlive.
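Item (viii) above describes a concordance as a map from each word to its object locations. A minimal sketch of that idea follows, assuming the document has already been split into numbered objects. The `concordance` function and the sample objects are made up for illustration and are not taken from SiSU.

```python
import re
from collections import defaultdict

def concordance(objects):
    """Map each lower-cased word to the sorted object numbers containing it."""
    index = defaultdict(set)
    for ocn, text in objects:
        # [^\W\d_]+ matches runs of letters, including non-ASCII letters
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            index[word].add(ocn)
    return {word: sorted(ocns) for word, ocns in sorted(index.items())}

# Illustrative (object number, text) pairs
objects = [
    (1, "Sale of Goods"),
    (2, "The seller must deliver the goods."),
    (3, "The buyer must pay the price for the goods."),
]

wordmap = concordance(objects)
print(wordmap["goods"])
print(wordmap["seller"])
```

Because the locations are object numbers rather than page numbers, the same word map would apply to every output format generated from the document.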

    As a developer's tool it is flexible and extensible.

    SiSU was developed in relation to legal documents, and is strong across a wide variety of texts (law, literature...). SiSU handles images but is not suitable for formulae/ statistics, or for technical writing at this time.

    SiSU has been developed and has been in use for several years. Requirements to cover a wide range of documents within its use domain have been explored.

    Some modules are more mature than others, the most mature being html and LaTeX / pdf. PostgreSQL and search functions are usable and, together with ocn, unique (to the best of my knowledge). The XML output document set is "well formed" but largely proof of concept.

    SiSU markup is fairly minimalistic. It consists of: a (largely optional) document header, made up of information about the document (such as when it was published, who authored it, and granting what rights) and any processing instructions; and markup within text which is related to document structure and typeface. SiSU must be able to discern the structure of a document (text headings and their levels in relation to each other), either from information provided in the instruction header or from markup within the text (or from a combination of both). Processing is done against an abstraction of the document comprising information on the document's structure and its objects,  16  which the program serializes (providing the object numbers) and which are assigned hash sum values based on their content. This abstraction of information about document structure, objects (and hash sums) provides considerable flexibility in representing documents in different ways and for different purposes (e.g. search, document layout, publishing, content certification, concordance etc.), and makes it possible to take advantage of some of the strengths of established ways of representing documents (or indeed to create new ones).
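The serialization step described above, numbering objects and assigning each a hash sum of its content, can be sketched roughly as follows. This is an assumption-laden illustration rather than SiSU's internal representation: the `serialize` and `changed_objects` functions and the dictionary layout are hypothetical, and sha256 is chosen here simply because it is one of the digests the text mentions.

```python
import hashlib

def serialize(blocks):
    """Number blocks sequentially and attach a digest of each block's text."""
    serialized = []
    for ocn, text in enumerate(blocks, start=1):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        serialized.append({"ocn": ocn, "text": text, "digest": digest})
    return serialized

def changed_objects(old, new):
    """Return the object numbers whose digests differ between two versions."""
    return [a["ocn"] for a, b in zip(old, new) if a["digest"] != b["digest"]]

# Two versions of the same (made-up) document, differing in one paragraph
v1 = serialize(["Title", "First paragraph.", "Second paragraph."])
v2 = serialize(["Title", "First paragraph, revised.", "Second paragraph."])

print(changed_objects(v1, v2))
```

Comparing digests object by object is one way the content certification mentioned earlier could detect exactly which parts of a document have changed.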

    SiSU markup is based on requiring the minimum markup needed to determine the structure of a document. (This can be as little as saying in a header to look for the word Book at a specified level and the word Chapter at another level). SiSU then breaks a document into its smallest parts (at a heading, and paragraph level) while retaining all structural information. This break up of the document and information on its structure is taken advantage of in the transformations made in generating the very different output types that can be created, and in providing as much as can be for what each output type is best at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf or Postscript), EPUB, XML (in this case, structural representation), ODF (OpenDocument), SQL (e.g. document search; representing constituent parts of documents based on their structure, headings, chapters, paragraphs as required; user control).  17 

    One of its strengths is that very little initial tagging is required for the program to generate its output.

    This is a basic markup example:

    Emphasis has been on simplicity and minimalism in markup requirements. The design philosophy is to try to keep the amount of markup required low, for whatever has been determined to be acceptable output.  19 

    SiSU's markup is more minimalistic and simpler than (the equivalent) html and for it, you get considerably more than just html, as this preparation gives you all available output formats, upon request.

    For each document, there is only one (input, minimalistically marked up) file from which all the available output types are generated.  20 

    Eg. the markup example:

    Produces the following output:

    (and in addition to these: PostgreSQL, SQLite, texinfo and YAML   33  versions if desired)

    Syntax is kept simple and mnemonic.  34 

    To keep SiSU markup sparse and simple, SiSU deliberately provides a limited publishing feature set, including: indent levels; bold; italics; superscript; subscript; simple tables; images; tables of contents; and endnotes. In most cases these are available across the different output formats.

    The publishing feature set may be expanded as required.

    Output is designed to be uniform, easy to read, navigate and cite.

    Code  35  is separated from content. This means that when changes are desired in the output presentation, the code that produces them, and not the marked up text data set (which could be thousands of documents) is modified. Separating code from content makes large scale changes to output appearance trivial, and permits the easy addition of new output modules.
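As a rough sketch of what separating code from content can look like, the example below keeps the content as plain data and puts all presentation decisions in small output functions. The function names, the `RENDERERS` table and the `o<number>` anchor ids are hypothetical and chosen only for illustration; they are not SiSU's modules or its actual html output.

```python
import html

def to_html(objects):
    """Render objects as HTML, exposing each object number as an anchor id."""
    lines = []
    for ocn, kind, text in objects:
        tag = "h1" if kind == "heading" else "p"
        lines.append(f'<{tag} id="o{ocn}">{html.escape(text)}</{tag}>')
    return "\n".join(lines)

def to_plaintext(objects):
    """Render objects as plain text with the object number in the margin."""
    return "\n\n".join(f"[{ocn}] {text}" for ocn, kind, text in objects)

# Output modules are looked up by name; adding a format means adding an entry.
RENDERERS = {"html": to_html, "txt": to_plaintext}

# The content is untouched by any change to the renderers above.
content = [
    (1, "heading", "Terms & Conditions"),
    (2, "paragraph", "Payment is due on delivery."),
]

for name, render in RENDERERS.items():
    print(f"--- {name} ---")
    print(render(content))
```

Changing the appearance of every html file then means editing `to_html` once and re-running it over the document set, which is the kind of large scale change the paragraph above describes.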

    Object citation numbering is a simple object (text) positioning and citation system that is human relevant and machine usable, used by SiSU for all manner of presentations, and available for use in all text mappings. It is based on the automated sequential numbering of objects (roughly paragraphs, headings, tables, verse, or other blocks of text or images etc.). The text positioning system (in which I claim copyright) is invaluable for publishing that requires citing text across multiple output formats, and for the general mapping of text within a document:

  • in html, html not being easily citeable (change font size, or use a different browser and the page on which specific text appears has changed), and
  • across multiple formats being common to all output formats html/xml/pdf/sql output,
  • the results of an sql search can just be "live" citation references to the documents in which the text is found, much like an index (see image examples provided).   36 
  • I claim copyright on the system I use which is the most basic of all, numbering all text in headings and paragraphs sequentially (with tables and images being treated as a single paragraph) and only footnotes/endnotes not following this numbering, as their position in text is not strictly determined, (a change from footnotes to endnotes would change their numbering), footnotes instead "belong" to the paragraph from which they are referenced, and have sequential numbers of their own.

    SiSU has a paragraph numbering system that remains the same regardless of the output format. This provides an effective means of citation, pinpointing text accurately in all output formats, using the same reference. This is particularly useful where text has to be located across different output formats - for example once html is printed the number of pages and the pages on which given text is found will vary depending on the browser, its settings, the font size setting etc. Similarly SiSU produces pdf in different forms, e.g. on the example site Lex Mercatoria as portrait and landscape documents - here too page numbering varies, but paragraph numbering is the same, vis a vis all versions of the text (portrait and landscape pdf and the html versions of the text, and as stored (with "paragraphs" as records) to the PostgreSQL or SQLite database).

    These numbers are placed in the text margins and are intended to be independent of and not to interfere with authors' tagging. [The citation system (object citation numbering system, automated "paragraph numbering") is automatically generated and is common and identical across all document formats.] The paragraph numbering system is more accurately described as a (text) object numbering system, as headings are also numbered... all headings and paragraphs are numbered sequentially. Endnotes are automatically numbered independently and rather "belong" to the paragraph from which they are referenced, as an endnote does not (necessarily) form a part of a document's sequence, (they may be produced as either endnotes or footnotes (or both depending on what output you choose to look at - if you take the segmented html version document provided as an example, you will find that the endnotes are placed both at the end of each section, and in a separate section of their own called endnotes, and these are hyper-linked)). An attractive feature of providing citation numbering in this way is that it is independent of the document structure... it remains the same regardless of what is done about the document structure.

    The rules have been kept very simple: unique incremental object citation numbers are assigned to headings, paragraphs, verse, tables and images. It is possible to manually override this feature on a per heading or comment basis, though this should be used exceptionally. It may be of use where there is a substantive text, and the addition of a minor comment by the publisher that should not be mapped as part of the text.
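
    The numbering rule just described can be sketched in a few lines of Python. This is an illustration only, not SiSU's own (Ruby) code: the function name assign_ocn, the object kind names and the dictionary layout are all assumptions made for the example.

```python
from itertools import count

# Hypothetical object kinds; SiSU's internal names are not given in this text.
NUMBERED_KINDS = {"heading", "paragraph", "verse", "table", "image"}

def assign_ocn(objects):
    """Give each (kind, text) object a sequential citation number.

    Endnotes are outside the main sequence: they get their own counter
    and record the number of the object from which they are referenced.
    """
    ocn = count(1)
    note_no = count(1)
    numbered = []
    last_ocn = None
    for kind, text in objects:
        if kind in NUMBERED_KINDS:
            last_ocn = next(ocn)
            numbered.append({"ocn": last_ocn, "kind": kind, "text": text})
        elif kind == "endnote":
            numbered.append({"note": next(note_no), "belongs_to": last_ocn,
                             "kind": kind, "text": text})
        else:
            # e.g. a publisher's comment that should not be mapped
            numbered.append({"ocn": None, "kind": kind, "text": text})
    return numbered

doc = [
    ("heading", "Introduction"),
    ("paragraph", "First paragraph."),
    ("endnote", "A note on the first paragraph."),
    ("table", "a simple table"),
    ("paragraph", "Second paragraph."),
]
result = assign_ocn(doc)
```

    Because the numbers depend only on the order of objects in the source, every output format generated from that source can carry the same numbers.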

    The object citation number markers contain additional numbering information with regard to the document structure, that can be used for alternative presentations, including such detail as the type of object (heading, paragraph, table, image, etc.), numbered sequentially.

    An advantage is that the numbering remains the same regardless of document structure.

    Text object ("paragraph") numbering is the same for all output versions of the same document, viz. html, epub, pdf, pgsql, etc.

    In the relational database, as individual text objects of a document are stored (and indexed) together with object numbers, and all versions of the document have the same numbering, the results of searches may be tailored just to provide the location of the search result in all available document formats.

    Note: there is a bug in the released behaviour of object citation numbering (it is not certain when it was introduced): tables should be numbered, i.e. each table gets an ocn, which is required amongst other things for the relational database. This will be corrected in a future release. Citation numbering of existing documents that contain tables will change.

    SiSU is able to use meta tags based on the Dublin Core  37  and Resource Description Framework  38 

    This provides a means of supplying semantic information about a document, both as computer processable meta-tags, and as human readable information that may be of value for classification purposes.

    This information is provided both in html metatags, and (where available) under the section titled "Document Information - Metadata", near the end of a document, for example in the segmented html version of this text at: ‹http://www.jus.uio.no/sisu/SiSU/metadata.html›

    1. Directory file association, skins and special image management, made simpler.  39 

    The last part of the name of the work directory in which markup is being done, or rather from where SiSU is run in order to generate document output, is used in determining the sub-directory name for output files, that is created in the document output directory. This provides a rather easy way to associate documents e.g. of a given subject, or by owner.

      /www/docs
          /intellectual_property
          /arbitration
          /contract_law

      /www/docs
          /ralph
          /sisu

    all are placed in their own directories within the directory structure created. Similar rules are used in the creation of sql type databases (though they can be overridden).

    There are a couple of further associations with these directories.

    Directory wide skins.

    Directory specific images.

    2. If there is a "directory skin", that is a skin of the same name as the directory, it is used in the generation of the documents within it, rather than the default skin, unless the document has a specific skin associated with it.

    a. default skin (always available)

    b. directory skin (precedence over default if exists)

    c. document skin (takes precedence wherever document requests a specific skin)

    Skins are defined in the document skin directory and if a directory association is desired a softlink made to the relevant skin. Skins (directory association auto load) auto load skin if a directory skin exists of same name as directory stub, (and there is no specific doc skin)

    3. If the working directory has within it a sub-directory called image_local, the images within that directory are used for references to images, that are not part of the default site build.
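
    The directory association described in item 1 above (the last part of the working directory name becomes the output sub-directory name) can be sketched as follows. This is a minimal Python illustration, not SiSU's implementation: the function name output_subdir and the working directory paths are hypothetical, and /www/docs is simply the output root used in the example listing above.

```python
from pathlib import PurePosixPath

def output_subdir(work_dir, output_root="/www/docs"):
    """Return the output directory for documents processed in work_dir.

    Only the last component of the working directory name is used.
    """
    stub = PurePosixPath(work_dir).name
    return str(PurePosixPath(output_root) / stub)

# Hypothetical working directory, for illustration only
print(output_subdir("/home/ralph/markup/arbitration"))
```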

    The possibility of citing an exact document version.

    Permits the inclusion of document version control information to the document body and metatags.  40  This provides a much more certain method of referring to the exact version of a particular document, (assuming that the document is from a trusted source, that will retain earlier versions of a document).  41 

    This information (where available) is provided under the section of the document titled "Document Information - MetaData", near the end of a document, for example in the segmented html version of this text at: ‹http://www.jus.uio.no/sisu/SiSU/metadata.html›

    SiSU produces a rudimentary table of contents based on document headings.

    Headings can be automatically numbered, (and automatically named for hyper-linking)

    SiSU can automatically number footnotes/endnotes. This is the default operation where no number is provided.

    Footnotes/endnotes may also be manually numbered. Where a number, or numbers are provided for a footnote/endnote, this does not increment the automatic footnote/endnote number counter.

    In the html output footnotes/endnotes are cross-hyper-linked (to their reference point and vice versa). In the pdf output footnotes are linked from their reference point only.
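
    The interaction between automatic and manual note numbering can be illustrated with a short Python sketch. It assumes notes arrive as (manual_number_or_None, text) pairs, which is a representation invented for this example; the function name number_notes is likewise hypothetical.

```python
def number_notes(notes):
    """Number notes in order of appearance.

    notes is a list of (manual_number_or_None, text) pairs. A note with
    a manual number keeps it and does not advance the automatic counter.
    """
    counter = 0
    numbered = []
    for manual, text in notes:
        if manual is None:
            counter += 1
            numbered.append((counter, text))
        else:
            numbered.append((manual, text))
    return numbered

notes = [(None, "first"), (7, "manually numbered"), (None, "third")]
numbered = number_notes(notes)
```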

    SiSU is skinnable, on a site-wide, directory-wide and per document basis, so different looking versions of things may be produced with little difficulty. There is a default skin which may be modified, as the background site skin, and each working directory may have a skin associated with it, as may each individual document. The hierarchy of application is document, directory, then site... ie if a document skin exists it gets precedence.

    Whilst it is skinnable, the default output styles are selected to work across the widest possible range of document types.
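
    The skin hierarchy (document, then directory, then site default) amounts to a simple lookup. The Python sketch below is illustrative only: select_skin, the skin names and the idea of passing the available skins as a set are assumptions, not SiSU's actual interface.

```python
def select_skin(available_skins, directory_stub, document_skin=None,
                default_skin="default"):
    """Pick a skin using the precedence: document, directory, default."""
    if document_skin is not None:
        # a skin requested by the document always wins
        return document_skin
    if directory_stub in available_skins:
        # a skin with the same name as the directory stub
        return directory_stub
    return default_skin

skins = {"arbitration", "sisu_manual"}
print(select_skin(skins, "arbitration"))
print(select_skin(skins, "arbitration", document_skin="sisu_manual"))
print(select_skin(skins, "contract_law"))
```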

    From markup that is simpler and more sparse than html you get:

  • far greater output possibilities, including multiple html types, XML (different structured types), LaTeX (pdf landscape, portrait), and SQL (Postgresql or SQLite or other);
  • the advantages implicit in these very different output possibilities;  42 
  • a common citation system
  • As many output formats/presentations as one cares to write modules for - several types of html (e.g. structure based on css, or structure based on tables); LaTeX/pdf and Lout/pdf; pgsql other databases easily added; yaml...

    Most documents are produced in single and segmented html versions, described below:

    The Scroll (full length text presentations)

    The full length of the text in a single scrollable document.  43  As a rule the files they are saved in are named: /doc/ or more precisely doc.html

    For various reasons texts may only be provided in this form (such as this one which is short), though most are also provided as segmented texts.

    "Scroll" is a reference to the historical scroll, a single long document/ parchment, and also no doubt to what you will have to do to get to the bottom of the text.  44 

    The Segmented Text

    The text divided into segments (such as articles or chapters depending on the text)  45  As a rule the files they are saved in are named: /toc/ and /index/ or more precisely toc.html and index.html

    If you know exactly what you are looking for, loading a segment of text is faster (the segments being smaller). Occasionally longer documents such as the WTA 1994 ‹http://www.jus.uio.no/lm/wta.1994/toc› are only provided in segmented form.

    Cascading Style Sheet, and Table based html

    SiSU outputs html, two current standard forms available are:

    and

    table based [largely discontinued]  46 

    The html is tested across several browsers

    I like to remind you that there are other excellent browsers out there, many of which have long supported practical features like tabbing.

    The html is tested across several browsers, including:

    Also lighter weight graphical browsers:

    And for console/text browsing:

    The html tables output is rendered more accurately across a wider variety and older versions of browsers (than the html css output).

    SiSU generates EPUB documents.

    SiSU generates well formed XML, and multiple versions. An XML SAX version with a flat/shallow structure, and XML DOM version with a deeper (embedded) structure. There is also a released working xhtml module. Examples of SAX and DOM versions are provided within this document.

    SiSU generates Open Document Output format.

    SiSU outputs LaTeX if required which is easily transformed to PDF.  60  PDF documents are generated on the site from the same source files and Ruby program that produce html. Landscape oriented pdf has been introduced, providing easier screen viewing; these documents are also currently formatted to have fewer pages than their portrait equivalents (saving paper).

    SiSU (from the same markup input file) automatically feeds into PostgreSQL  64  and/or SQLite  65  database (could be any other of the better relational databases)  66  - together with all additional information related to document structure, and the alternative ways in which it is generated on the site retained. As regards scaling of the database, it is as scalable as the database (here Postgresql or SQLite) and hardware allow. I will prune the images later.

    This is one of the more interesting output forms, as all the structural data for the documents are retained (though can be ignored by the user of the database should they so choose). All site texts/documents are (currently) streamed to four pgsql database tables:

  • one containing semantic (and other) headers, including, title, author, subject, (the Dublin Core...);
  • another the substantive texts by individual "paragraph" (or object) - along with structural information, each paragraph being identifiable by its paragraph number (if it has one which almost all of them do), and the substantive text of each paragraph quite naturally being searchable (both in formatted and clean text versions for searching); and
  • a third containing endnotes cross-referenced back to the paragraph from which they are referenced (both in formatted and clean text versions for searching).
  • a fourth table with a one to one relation with the headers table contains full text versions of output, eg. pdf, html, xml, and ascii.
  • There is of course the possibility to add further structures.

    At this level SiSU loads a relational database with documents broken in to their smallest logical structurally constituent parts, as text objects, with their object citation number and all other structural information needed to construct the structured document. Text is stored (at this text object level) with and without elementary markup tagging, the stripped version being so as to facilitate ease of searching.
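
    A minimal sketch of such a layout, using Python's built-in sqlite3 module with an in-memory database, is given below. The table names, column names and sample rows are all hypothetical; they follow the four tables listed above but are not SiSU's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (            -- semantic headers (Dublin Core etc.)
    doc_id INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    author TEXT
);
CREATE TABLE doc_objects (         -- one row per text object
    doc_id INTEGER REFERENCES metadata(doc_id),
    ocn    INTEGER,                -- object citation number
    kind   TEXT,                   -- heading, paragraph, table ...
    body   TEXT,                   -- text with elementary markup
    clean  TEXT                    -- stripped text, used for searching
);
CREATE TABLE endnotes (            -- notes, linked back to their object
    doc_id  INTEGER REFERENCES metadata(doc_id),
    ocn     INTEGER,
    note_no INTEGER,
    body    TEXT,
    clean   TEXT
);
CREATE TABLE full_text (           -- one to one with metadata
    doc_id INTEGER PRIMARY KEY REFERENCES metadata(doc_id),
    plain  TEXT
);
""")
conn.execute("INSERT INTO metadata VALUES (1, 'Sample Document', 'Author, An')")
conn.executemany(
    "INSERT INTO doc_objects VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "heading", "!{Scope}!", "Scope"),
        (1, 2, "paragraph", "This /{contract}/ is governed by law.",
         "This contract is governed by law."),
        (1, 3, "paragraph", "Nothing relevant here.", "Nothing relevant here."),
    ],
)

def search(term):
    """Return (title, ocn) pairs for objects whose clean text contains term."""
    rows = conn.execute(
        "SELECT m.title, o.ocn FROM doc_objects o "
        "JOIN metadata m ON m.doc_id = o.doc_id "
        "WHERE instr(o.clean, ?) > 0 ORDER BY m.title, o.ocn",
        (term,),
    )
    return rows.fetchall()

hits = search("contract")
```

    Each hit is a document plus an object number, so the same result can be used to locate the text in any other output of that document.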

    Because the document structure of sites created is clearly defined, and the text object citation system is available for all forms of output, it is possible to search the sql database, and either read results from that database, or just as simply map the results to the html output, which has richer text markup.

    The combination of the SiSU citation system with a relational database is pretty powerful, giving rise to several possibilities. As individual text objects of a document are stored (and indexed) together with object numbers, and all versions of the document have the same numbering, complex searches can be tailored to return just the locations of the search results relevant for all available output formats, with live links to the precise locations in the database or in html/xml documents; or, the structural information provided makes it possible to search the full contents of the database and have headings in which search content appears, or to search only headings etc. (as the Dublin Core is incorporated it is easy to make use of that as well).

    This is a larger scale project (with development of the front end largely ignored), though the "infrastructure" has been in place since 2002.

    Sample search frontend   67  A small database and sample query front-end (search form) that makes use of the citation system, object citation numbering, to demonstrate functionality.  68 

    SiSU can provide information on which documents are matched and at what locations within each document the matches are found. These results are relevant across all outputs using object citation numbering, which includes html, EPUB, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of the other outputs or in the SQL database expand the text within the matched objects (paragraphs) in the documents matched.

    (further work needs to be done on the sample search form, which is rudimentary and only passes simple booleans correctly at present to the SQL engine)

    A few canned searches, showing object numbers. Search for:

    Note that the searches done in this form are case sensitive.

    Expand those same searches, showing the matching text in each document:

    Note you may set results either for documents matched and object number locations within each matched document meeting the search criteria; or display the names of the documents matched along with the objects (paragraphs) that meet the search criteria.  69 

    OCN index mode, (object citation number) the numbers displayed are relevant (and may be used to reference the match) in any sisu generated rendition of the text  70  the links provided are to the locations of matches within the html generated by SiSU.

    Paragraph mode, you may alternatively display the text of each paragraph in which the match was made, again the object/paragraph numbers are relevant to any SiSU generated/published text.

    Several options for output - select database to search, show results in index view (links to locations within text), show results with text, echo search in form, show what was searched, create and show a "canned url" for search, show available search fields. Also shows counters number of documents in which found and number of locations within documents where found. [could consider sorting by document with most occurrences of the search result].

    Simple search, results with files in which search found, and text object (paragraph or endnote) where found within files.

    There are other forms as well, YAML file, Ruby Marshal dumps, document pre-processing (processing of documents prior to the steps described here, to produce input suitable for the program) snap in a new module as required/desired, well formed XML, no problem.

    Concordance /WordMaps:  71  SiSU produces a rudimentary index based on the words within the text, making use of paragraph numbers to identify text locations. This is generated in html and hyper-linked but identifies these words locations in the other document formats. Though it is possible to search using a search engine, this is a means for browsing an alphabetical list of words which may suggest other useful content.
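
    The idea behind such a word index can be sketched in Python as below. This is an assumption-laden illustration rather than SiSU's concordance module: build_concordance is a hypothetical name, input is taken to be (object number, text) pairs, and words are simply lower-cased runs of letters.

```python
import re
from collections import defaultdict

def build_concordance(objects):
    """Map each word to the sorted object numbers in which it occurs.

    objects is an iterable of (ocn, text) pairs.
    """
    index = defaultdict(set)
    for ocn, text in objects:
        # runs of letters only; digits and punctuation are ignored
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            index[word].add(ocn)
    # alphabetical order, for browsing
    return {word: sorted(ocns) for word, ocns in sorted(index.items())}

concordance = build_concordance([
    (1, "Trade law"),
    (2, "Law of trade and contract"),
])
```

    Since the index records object numbers rather than page numbers, each entry also identifies the word's location in the other output formats.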

    SiSU builds the web site (or more generically provides a suitable directory structure) - placing various output texts in the hierarchy of the web-site (or db), which (for directories) is a sub-directory with the name of the text file.

    SiSU is a batch processing tool, handling and transforming multiple (or individual) documents (in many ways) with a single instruction.

    As should have been noted from the above description of SiSU, it makes use of existing programs found on Gnu/Linux and Unix; those already mentioned include the LaTeX to pdf converters and the databases PostgreSQL and SQLite.

    Unix provides many tools for version control. For documents Subversion, CVS and even the old RCS are useful for the per-document histories they provide.

    For writing code, superior (more recent) version control systems exist. These can also be used for documents, though they tend to take stamps of changes across the repository as a whole, rather than for each individual file that is tracked (as CVS and RCS do). My personal preference is for distributed systems such as Git, Mercurial or Darcs, of which I use Git for both code and documents.

    Several backup tools exist. At the base level I tend to use rdiff.

    SiSU documents are prepared / marked up in utf-8 text; you are free to use the text editor of your choice.

    Syntax highlighting for a number of editors is provided, amongst them Vim, Kwrite, Kate, Gedit and diakonos. These may be found with configuration instructions at ‹http://www.sisudoc.org/sisu/sisu_syntax_highlighting/doc.html› Vim   72  as of version 7 has built-in syntax highlighting for SiSU.

    Need a new output format that does not already exist, write a new module.

    Prefer a new input syntax, you could write a new syntax matching the existing design, though my personal preference is some uniformity in entry appearance. Where necessary it has been fairly easy to extend the design parameters. It is intended to incorporate some additional basic semantic tagging (book, article, author etc.). However, keeping the requirements for input minimal and relatively simple has been a design goal.



    Current markup examples and document output samples are provided at ‹http://www.jus.uio.no/sisu/SiSU/examples.html›

    For some documents hardly any markup is required at all, other than a header and an indication of the levels to be taken into account by the program in generating its output.




    Aukio, by Leena Krohn

      73 


    Sphinx or Robot by Leena Krohn

    A sample search form is available at ‹http://search.sisudoc.org›

    A few canned searches, showing object numbers. Search for:

    Note that the searches done in this form are case sensitive.

    Expand those same searches, showing the matching text in each document:

    Note you may set results either for documents matched and object number locations within each matched document meeting the search criteria; or display the names of the documents matched along with the objects (paragraphs) that meet the search criteria.  110 

    There is quite a bit to peruse if you explore the site Lex Mercatoria:

    or perhaps:

    SiSU is not optimised for table making, but does handle simple tables.



    This table gives an indication of the features that are available for various forms of output of SiSU.

    sisu-2.0.0 on 2010-03-06

    feature  txt  ltx/pdf  HTML  EPUB  XML/s  XML/d  ODF  SQLite  pgSQL
    headings*********
    footnotes*********
    bold, underscore, italics.********
    strikethrough.******
    superscript, subscript.******
    extended ascii set (utf-8)********
    indents*******
    bullets.*****.
    groups
    * tables***.....
    * poem****..*..
    * code****..*..
    url*******..
    links*******..
    images-***TT*TT
    image caption-***
    table of contents*****.
    page header/footer?-*****t
    line break*******
    page break**
    segments**
    skins******
    ocn.*****-?**
    auto-heading numbers*********
    minor list numbering*********
    special characters....

    sisu-1.0.0 on 2009-10-28

    feature  txt  ltx/pdf  HTML  XML/s  XML/d  ODF  SQLite  pgSQL
    headings********
    footnotes********
    bold, underscore, italics.*******
    strikethrough.*****
    superscript, subscript.*****
    extended ascii set (utf-8)*******
    indents******
    bullets.****.
    groups
    * tables**.....
    * poem***..*..
    * code***..*..
    url******..
    links******..
    images-**TT*TT
    image caption-**
    table of contents****.
    page header/footer?-****t
    line break******
    page break**
    segments*
    skins*****
    ocn.****-?**
    auto-heading numbers********
    minor list numbering********
    special characters...

    sisu-0.36.6 on 2006-01-23

    feature  txt  ltx/pdf  HTML  XHTML  XML/s  XML/d  ODF  SQLite  pgSQL
    headings*********
    footnotes*********
    bold, underscore, italics.********
    strikethrough.******
    superscript, subscript.******
    extended ascii set (utf-8)********
    indents*******
    bullets.*****.
    groups
    * tables**......
    * poem***...*..
    * code***...*..
    url*******..
    links*******..
    images-**TTT*TT
    image caption-**
    table of contents*****.
    page header/footer?-*****t
    line break*******
    page break**
    segments*
    skins******
    ocn.*****-?**
    auto-heading numbers*********
    minor list numbering*********
    special characters...

      Done
      * yes/done
      . partial
      - not available/appropriate
      Not Done
      T task todo
      t lesser task/todo
        not done



    SiSU source documents are plaintext (UTF-8)  115  files

    All paragraphs are separated by an empty line.

    Markup is comprised of:

  • at the top of a document, the document header made up of semantic meta-data about the document and if desired additional processing instructions (such as an instruction to automatically number headings from a particular level down)
  • followed by the prepared substantive text of which the most important single characteristic is the markup of different heading levels, which define the primary outline of the document structure. Markup of substantive text includes:
  • heading levels, which define document structure
  • text basic attributes, italics, bold etc.
  • grouped text (objects), which are to be treated differently, such as code blocks or poems.
  • footnotes/endnotes
  • linked text and images
  • paragraph actions, such as indent, bulleted, numbered-lists, etc.
  • Some interactive help on markup is available, by typing sisu and selecting markup or sisu --help markup

    To check the markup in a file:

    sisu --identify [filename].sst

    For a brief descriptive summary of markup history:

    sisu --query-history

    or if for a particular version:

    sisu --query-0.38

    Online markup examples are available together with the respective outputs produced from ‹http://www.jus.uio.no/sisu/SiSU/examples.html› or from ‹http://www.jus.uio.no/sisu/sisu_examples/›

    There is of course this document, which provides a cursory overview of sisu markup and the respective output produced: ‹http://www.jus.uio.no/sisu/sisu_markup/›

    an alternative presentation of markup syntax: /usr/share/doc/sisu/on_markup.txt.gz

    With SiSU installed sample skins may be found in: /usr/share/doc/sisu/markup-samples (or equivalent directory) and if sisu-markup-samples is installed also under: /usr/share/doc/sisu/markup-samples-non-free



    Headers contain either: semantic meta-data about a document, which can be used by any output module of the program, or; processing instructions.

    Note: the first line of a document may include information on the markup version used in the form of a comment. Comments are a percentage mark at the start of a paragraph (and as the first character in a line of text) followed by a space and the comment:

      % this would be a comment

    This current document is loaded by a master document that has a header similar to this one:

      % SiSU master 2.0

      @title: SiSU
       :subtitle: Manual

      @creator: :author: Amissah, Ralph

      @rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3

      @classify:
       :type: information
       :topic_register: SiSU:manual;electronic documents:SiSU:manual
       :subject: ebook, epublishing, electronic book, electronic publishing,
          electronic document, electronic citation, data structure,
           citation systems, search

      % used_by: manual

      @date:
       :published: 2008-05-22
       :created: 2002-08-28
       :issued: 2002-08-28
       :available: 2002-08-28
       :modified: 2010-03-03

      @make:
       :num_top: 1
       :breaks: new=C; break=1
       :skin: skin_sisu_manual
       :bold: /Gnu|Debian|Ruby|SiSU/
       :manpage: name=sisu - documents: markup, structuring, publishing in multiple standard formats, and search;
           synopsis=sisu [-abcDdeFhIiMmNnopqRrSsTtUuVvwXxYyZz0-9] [filename/wildcard ]
           . sisu [-Ddcv] [instruction]
           . sisu [-CcFLSVvW]
           . sisu --v2 [operations]
           . sisu --v3 [operations]

      @links:
       { SiSU Homepage }http://www.sisudoc.org/
       { SiSU Manual }http://www.sisudoc.org/sisu/sisu_manual/
       { Book Samples & Markup Examples }http://www.jus.uio.no/sisu/SiSU/examples.html
       { SiSU Download }http://www.jus.uio.no/sisu/SiSU/download.html
       { SiSU Changelog }http://www.jus.uio.no/sisu/SiSU/changelog.html
       { SiSU Git repo }http://git.sisudoc.org/?p=code/sisu.git;a=summary
       { SiSU List Archives }http://lists.sisudoc.org/pipermail/sisu/
       { SiSU @ Debian }http://packages.qa.debian.org/s/sisu.html
       { SiSU Project @ Debian }http://qa.debian.org/developer.php?login=sisu@lists.sisudoc.org
       { SiSU @ Wikipedia }http://en.wikipedia.org/wiki/SiSU

    Header tags appear at the beginning of a document and provide meta information on the document (such as the Dublin Core), or information as to how the document as a whole is to be processed. All header instructions take the form @headername: or, on the next line and indented by one space, :subheadername: All Dublin Core meta tags are available.

    @identifier: information or instructions

    where the "identifier" is a tag recognised by the program, and the "information" or "instructions" belong to the tag/identifier specified

    Note: a header where used should only be used once; all headers apart from @title: are optional; the @structure: header is used to describe document structure, and can be useful to know.
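
    To make the header form concrete, here is a small Python sketch that reads @headername: and indented :subheadername: lines into nested dictionaries. It is a simplified illustration, not SiSU's parser: parse_header is a hypothetical name, the "_" key used for a value given directly on the @headername: line is an assumption, and @links: entries and sub-headers placed on the same line as the header are not interpreted.

```python
import re

HEADER = re.compile(r"^@(\w+):\s*(.*)$")
SUBHEADER = re.compile(r"^\s+:(\w+):\s*(.*)$")

def parse_header(text):
    """Parse @header: / :subheader: lines into {header: {key: value}}."""
    headers = {}
    current = None   # name of the @header being filled
    last = None      # (header, key) that a continuation line extends
    for line in text.splitlines():
        if not line.strip() or line.startswith("% "):
            continue  # skip blank lines and comments
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            headers[current] = {}
            last = None
            if m.group(2):
                headers[current]["_"] = m.group(2)  # "_" key is an assumption
                last = (current, "_")
            continue
        m = SUBHEADER.match(line)
        if m and current:
            headers[current][m.group(1)] = m.group(2)
            last = (current, m.group(1))
        elif last:
            header, key = last
            headers[header][key] += " " + line.strip()
    return headers

sample = """% SiSU 2.0

@title: SiSU
 :subtitle: Manual

@classify:
 :subject: ebook, epublishing,
    electronic document

@make:
 :num_top: 1
"""
parsed = parse_header(sample)
```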

    This is a sample header

      % SiSU 2.0 [declared file-type identifier with markup version]

      @title: [title text] [this header is the only one that is mandatory]
        :subtitle: [subtitle if any]
        :language: English

      @creator:
       :author: [Lastname, First names]
       :illustrator: [Lastname, First names]
       :translator: [Lastname, First names]
       :prepared_by: [Lastname, First names]

      @date:
       :published: [year or yyyy-mm-dd]
       :created: [year or yyyy-mm-dd]
       :issued: [year or yyyy-mm-dd]
       :available: [year or yyyy-mm-dd]
       :modified: [year or yyyy-mm-dd]
       :valid: [year or yyyy-mm-dd]
       :added_to_site: [year or yyyy-mm-dd]
       :translated: [year or yyyy-mm-dd]

      @rights:
       :copyright: Copyright (C) [Year and Holder]
       :license: [Use License granted]
       :text: [Year and Holder]
       :translation: [Name, Year]
       :illustrations: [Name, Year]

      @classify:
       :topic_register: SiSU:markup sample:book;book:novel:fantasy
       :type:
       :subject:
       :description:
       :keywords:
       :abstract:
       :isbn: [ISBN]
       :loc: [Library of Congress classification]
       :dewey: [Dewey classification]
       :pg: [Project Gutenberg text number]

      @links: { SiSU }http://www.sisudoc.org
        { FSF }http://www.fsf.org

      @make:
       :skin: skin_name [skins change default settings related to the appearance of documents generated]
       :num_top: 1
       :headings: [text to match for each level
          (e.g. PART; Chapter; Section; Article; or another: none; BOOK|FIRST|SECOND; none; CHAPTER;)]
       :breaks: new=:C; break=1
       :promo: sisu, ruby, sisu_search_libre, open_society
       :bold: [regular expression of words/phrases to be made bold]
       :italics: [regular expression of words/phrases to italicise]

      @original:
       :language: [language]

      @notes:
       :comment:
       :prefix: [prefix is placed just after table of contents]



    Heading levels are :A~, :B~, :C~, 1~, 2~, 3~ ... with :A - :C being part / section headings, followed by other heading levels, and 1 - 6 being headings followed by substantive text or sub-headings. :A~ is usually the title; :A~? is a conditional level 1 heading (used where a stand-alone document may be imported into another).

    :A~ [heading text] Top level heading [this usually has similar content to the title @title: ] NOTE: the heading levels described here are in 0.38 notation, see heading

    :B~ [heading text] Second level heading [this is a heading level divider]

    :C~ [heading text] Third level heading [this is a heading level divider]

    1~ [heading text] Top level heading preceding substantive text of document or sub-heading 2, the heading level that would normally be marked 1. or 2. or 3. etc. in a document, and the level on which sisu by default would break html output into named segments, names are provided automatically if none are given (a number), otherwise takes the form 1~my_filename_for_this_segment

    2~ [heading text] Second level heading preceding substantive text of document or sub-heading 3 , the heading level that would normally be marked 1.1 or 1.2 or 1.3 or 2.1 etc. in a document.

    3~ [heading text] Third level heading preceding substantive text of document, that would normally be marked 1.1.1 or 1.1.2 or 1.2.1 or 2.1.1 etc. in a document

      1~filename level 1 heading,

      % the primary division such as Chapter that is followed by substantive text, and may be further subdivided (this is the level on which by default html segments are made)
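
    The heading markers described above are regular enough to be recognised with a single pattern. The Python sketch below is only an illustration of that idea; parse_heading and the returned field names are hypothetical, and the pattern covers just the forms shown in this section (:A~ to :C~, 1~ to 6~, the ? conditional marker and an optional segment name).

```python
import re

HEADING = re.compile(r"^(:[A-C]|[1-6])~(\??)(\S*)\s+(.+)$")

def parse_heading(line):
    """Return heading details, or None if the line is not a heading."""
    m = HEADING.match(line)
    if m is None:
        return None
    level, conditional, name, text = m.groups()
    return {
        "level": level.lstrip(":"),        # "A".."C" or "1".."6"
        "conditional": conditional == "?",
        "segment_name": name or None,      # e.g. the name in 1~filename
        "text": text,
    }

print(parse_heading(":A~ SiSU"))
print(parse_heading("1~intro Introduction"))
print(parse_heading("an ordinary paragraph"))
```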

    markup example:

      normal text,  *{emphasis}*, !{bold text}!, /{italics}/, _{underscore}_, "{citation}",
      ^{superscript}^, ,{subscript},, +{inserted text}+, -{strikethrough}-, #{monospace}#

      normal text

      *{emphasis}* [note: can be configured to be represented by bold, italics or underscore]

      !{bold text}!

      /{italics}/

      _{underscore}_

      "{citation}"

      ^{superscript}^

      ,{subscript},

      +{inserted text}+

      -{strikethrough}-

      #{monospace}#

    resulting output:

    normal text, emphasis, bold text, italics, underscore, citation, superscript, subscript, inserted text, strikethrough, monospace

    normal text

    emphasis [note: can be configured to be represented by bold, italics or underscore]

    bold text

    italics

    underscore

    citation

    superscript

    subscript

    inserted text

    strikethrough

    monospace
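
    As a rough illustration of how the paired inline markers above delimit a span, the following Python sketch rewrites each marker{ ... }marker pair as an html element. The tag mapping (INLINE_TAGS) and the function name inline_to_html are assumptions made for this example only; this is not SiSU's own implementation, and its actual html output may differ (emphasis, for instance, is configurable).

```python
import re

# Hypothetical marker-to-element mapping, chosen for illustration only.
INLINE_TAGS = {
    "*": "em",    # emphasis
    "!": "b",     # bold
    "/": "i",     # italics
    "_": "u",     # underscore
    '"': "cite",  # citation
    "^": "sup",   # superscript
    ",": "sub",   # subscript
    "+": "ins",   # inserted text
    "-": "del",   # strikethrough
    "#": "tt",    # monospace
}


def inline_to_html(text: str) -> str:
    """Rewrite each X{ ... }X span as <tag>...</tag> using INLINE_TAGS."""
    for marker, tag in INLINE_TAGS.items():
        m = re.escape(marker)
        pattern = re.compile(m + r"\{(.+?)\}" + m)
        text = pattern.sub(
            lambda match, tag=tag: f"<{tag}>{match.group(1)}</{tag}>", text
        )
    return text


print(inline_to_html("normal text, *{emphasis}*, !{bold text}!"))
```

    The loop handles one marker at a time, so differently marked spans on the same line are converted independently.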

    markup example:

      ordinary paragraph

      _1 indent paragraph one step

      _2 indent paragraph two steps

      _9 indent paragraph nine steps

    resulting output:

    ordinary paragraph

    indent paragraph one step

    indent paragraph two steps

    indent paragraph nine steps

    markup example:

      _* bullet text

      _1* bullet text, first indent

      _2* bullet text, two step indent

    resulting output:

  • bullet text
    • bullet text, first indent
      • bullet text, two step indent

    Numbered List (not to be confused with headings/titles (document structure))

    markup example:

      # numbered list                numbered list 1., 2., 3., etc.

      _# numbered list               numbered list indented a., b., c., d., etc.

    Footnotes and endnotes are marked up at the location where they would be indicated within a text. They are automatically numbered. The output type determines whether footnotes or endnotes will be produced

    markup example:

      ~{ a footnote or endnote }~

    resulting output:

    markup example:

      normal text~{ self contained endnote marker & endnote in one }~ continues

    resulting output:

    normal text  117  continues
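
    The automatic numbering described above can be sketched as follows. This is only an illustration under stated assumptions: the function name number_endnotes, the bracketed [n] marker and the start parameter are made up for the example, and asterisk-type notes are deliberately left alone.

```python
import re

# Matches ~{ note text }~ but skips the unnumbered asterisk form ~{* ... }~.
ENDNOTE = re.compile(r"~\{(?!\*)\s*(.+?)\s*\}~")


def number_endnotes(text, start=1):
    """Replace inline notes with [n] markers and collect the note texts."""
    notes = []

    def repl(match):
        notes.append(match.group(1))
        return f"[{start + len(notes) - 1}]"

    body = ENDNOTE.sub(repl, text)
    return body, notes


body, notes = number_endnotes(
    "normal text~{ self contained endnote marker & endnote in one }~ continues"
)
print(body)
print(notes)
```

    Whether the collected notes are then rendered as footnotes or endnotes would be a decision for each output format.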

    markup example:

      normal text ~{* unnumbered asterisk footnote/endnote, insert multiple asterisks if required }~ continues

      normal text ~{** another unnumbered asterisk footnote/endnote }~ continues

    resulting output:

    normal text   *  continues

    normal text   **  continues

    markup example:

      normal text ~[* editors notes, numbered asterisk footnote/endnote series ]~ continues

      normal text ~[+ editors notes, numbered asterisk footnote/endnote series ]~ continues

    resulting output:

    normal text   *1  continues

    normal text   +1  continues

    Alternative endnote pair notation for footnotes/endnotes:

      % note the endnote marker "~^"

      normal text~^ continues

      ^~ endnote text following the paragraph in which the marker occurs

    the standard and pair notation cannot be mixed in the same document

    urls found within text are marked up automatically. A url within text is automatically hyperlinked to itself and by default decorated with angled braces, unless it is contained within a code block (in which case it is passed as normal text), or escaped by a preceding underscore (in which case the decoration is omitted).

    markup example:

      normal text http://www.sisudoc.org/ continues

    resulting output:

    normal text ‹http://www.sisudoc.org/› continues

    An escaped url without decoration

    markup example:

      normal text _http://www.sisudoc.org/ continues

      deb _http://www.jus.uio.no/sisu/archive unstable main non-free

    resulting output:

    normal text http://www.sisudoc.org/ continues

    deb http://www.jus.uio.no/sisu/archive unstable main non-free
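
    The decoration rule for bare urls can be sketched in Python as below. The regular expression and the function name decorate_urls are assumptions for illustration; the sketch covers only the decoration with angled braces and the underscore escape, not hyperlinking or code blocks.

```python
import re

# An optional escaping underscore, followed by a bare http(s) url.
URL = re.compile(r"(_?)(https?://[^\s<>]+)")


def decorate_urls(line):
    """Wrap bare urls in angled braces; an escaped url loses its underscore
    and is left undecorated."""

    def repl(match):
        escaped, url = match.group(1), match.group(2)
        if escaped:
            return url
        return f"‹{url}›"

    return URL.sub(repl, line)


print(decorate_urls("normal text http://www.sisudoc.org/ continues"))
print(decorate_urls("normal text _http://www.sisudoc.org/ continues"))
```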

    where a code block is used there is neither decoration nor hyperlinking, code blocks are discussed later in this document

    resulting output:

      deb http://www.jus.uio.no/sisu/archive unstable main non-free
      deb-src http://www.jus.uio.no/sisu/archive unstable main non-free

    To link text or an image to a url the markup is as follows

    markup example:

      about { SiSU }http://url.org markup

    resulting output:

    about SiSU markup

    A shortcut notation is available so the url link may also be provided automatically as a footnote

    markup example:

      about {~^ SiSU }http://url.org markup

    resulting output:

    about SiSU   118  markup

    Internal document links to a tagged location, including an ocn

    markup example:

      about { text links }#link_text

    resulting output:

    about text links

    Shared document collection link

    markup example:

      about { SiSU book markup examples }:SiSU/examples.html

    resulting output:

    markup example:

      { tux.png 64x80 }image

      % various url linked images

      {tux.png 64x80 "a better way" }http://www.sisudoc.org/

      {GnuDebianLinuxRubyBetterWay.png 100x101 "Way Better - with Gnu/Linux, Debian and Ruby" }http://www.sisudoc.org/

      {~^ ruby_logo.png "Ruby" }http://www.ruby-lang.org/en/

    resulting output:


    Gnu/Linux - a better way


    Way Better - with Gnu/Linux, Debian and Ruby


    Ruby

      119 

    linked url footnote shortcut

      {~^ [text to link] }http://url.org

      % maps to: { [text to link] }http://url.org ~{ http://url.org }~

      % which produces hyper-linked text within a document/paragraph, with an endnote providing the url for the text location used in the hyperlink
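
    The mapping in the comment above is a plain textual expansion, which the following Python sketch reproduces. The function name expand_link_shortcut and the regular expression are assumptions for illustration, not SiSU's own code.

```python
import re

# {~^ text }url  ->  { text }url ~{ url }~
SHORTCUT = re.compile(r"\{~\^\s*(.+?)\s*\}(\S+)")


def expand_link_shortcut(line):
    """Expand the footnote-link shortcut into a link plus an endnote."""
    return SHORTCUT.sub(r"{ \1 }\2 ~{ \2 }~", line)


print(expand_link_shortcut("about {~^ SiSU }http://url.org markup"))
```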

    512

    note at a heading level the same is automatically achieved by providing names to headings 1, 2 and 3 i.e. 2~[name] and 3~[name] or in the case of auto-heading numbering, without further intervention.

    Tables may be prepared in either of two forms

    markup example:

      table{ c3; 40; 30; 30;

      This is a table
      this would become column two of row one
      column three of row one is here

      And here begins another row
      column two of row two
      column three of row two, and so on

      }table

    resulting output:

    This is a table             | this would become column two of row one | column three of row one is here
    And here begins another row | column two of row two                   | column three of row two, and so on

    a second form may be easier to work with in cases where there is not much information in each column

    markup example:  120 

      !_ Table 3.1: Contributors to Wikipedia, January 2001 - June 2005

      {table~h 24; 12; 12; 12; 12; 12; 12;}
                                      |Jan. 2001|Jan. 2002|Jan. 2003|Jan. 2004|July 2004|June 2006
      Contributors*                   |       10|      472|    2,188|    9,653|   25,011|   48,721
      Active contributors**           |        9|      212|      846|    3,228|    8,442|   16,945
      Very active contributors***     |        0|       31|      190|      692|    1,639|    3,016
      No. of English language articles|       25|   16,000|  101,000|  190,000|  320,000|  630,000
      No. of articles, all languages  |       25|   19,000|  138,000|  490,000|  862,000|1,600,000

      \* Contributed at least ten times; \** at least 5 times in last month; \*\** more than 100 times in last month.

    resulting output:

    Table 3.1: Contributors to Wikipedia, January 2001 - June 2005

                                     | Jan. 2001 | Jan. 2002 | Jan. 2003 | Jan. 2004 | July 2004 | June 2006
    Contributors*                    |        10 |       472 |     2,188 |     9,653 |    25,011 |    48,721
    Active contributors**            |         9 |       212 |       846 |     3,228 |     8,442 |    16,945
    Very active contributors***      |         0 |        31 |       190 |       692 |     1,639 |     3,016
    No. of English language articles |        25 |    16,000 |   101,000 |   190,000 |   320,000 |   630,000
    No. of articles, all languages   |        25 |    19,000 |   138,000 |   490,000 |   862,000 | 1,600,000

    * Contributed at least ten times; ** at least 5 times in last month; *** more than 100 times in last month.
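
    To make the structure of the second table form concrete, here is a small Python sketch that reads the opening line and the pipe-separated rows. The function names are hypothetical, and two points are assumptions rather than statements from this manual: that the numbers in the opening line are column widths, and that ~h marks the first row as a header row.

```python
import re

# Opening line such as: {table~h 24; 12; 12;}
HEADER = re.compile(r"\{table(~h)?\s+([\d;\s]+)\}")


def parse_table_header(line):
    """Read the (assumed) header-row flag and column widths."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a table opening line: {line!r}")
    widths = [int(w) for w in match.group(2).split(";") if w.strip()]
    return {"has_header_row": match.group(1) is not None, "widths": widths}


def parse_table_rows(lines):
    """Split each non-empty row on the pipe symbol and trim the cells."""
    return [
        [cell.strip() for cell in line.split("|")]
        for line in lines
        if line.strip()
    ]


print(parse_table_header("{table~h 24; 12; 12; 12; 12; 12; 12;}"))
print(parse_table_rows(["Contributors*                   |       10|      472"]))
```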

    basic markup:

      poem{

        Your poem here

      }poem

      Each verse in a poem is given an object number.

    markup example:

      poem{

                          `Fury said to a
                         mouse, That he
                       met in the
                     house,
                  "Let us
                    both go to
                      law:  I will
                        prosecute
                          YOU.  --Come,
                             I'll take no
                              denial; We
                           must have a
                       trial:  For
                    really this
                 morning I've
                nothing
               to do."
                 Said the
                   mouse to the
                     cur, "Such
                       a trial,
                         dear Sir,
                               With
                           no jury
                        or judge,
                      would be
                    wasting
                   our
                    breath."
                     "I'll be
                       judge, I'll
                         be jury,"
                               Said
                          cunning
                            old Fury:
                           "I'll
                            try the
                               whole
                                cause,
                                   and
                              condemn
                             you
                            to
                             death."'

      }poem

    resulting output:

                        `Fury said to a
                       mouse, That he
                     met in the
                   house,
                "Let us
                  both go to
                    law:  I will
                      prosecute
                        YOU.  --Come,
                           I'll take no
                            denial; We
                         must have a
                     trial:  For
                  really this
               morning I've
              nothing
             to do."
               Said the
                 mouse to the
                   cur, "Such
                     a trial,
                       dear Sir,
                             With
                         no jury
                      or judge,
                    would be
                  wasting
                 our
                  breath."
                   "I'll be
                     judge, I'll
                       be jury,"
                             Said
                        cunning
                          old Fury:
                         "I'll
                          try the
                             whole
                              cause,
                                 and
                            condemn
                           you
                          to
                           death."'

    basic markup:

      group{

        Your grouped text here

      }group

      A group is treated as an object and given a single object number.

    markup example:

      group{

                          `Fury said to a
                         mouse, That he
                       met in the
                     house,
                  "Let us
                    both go to
                      law:  I will
                        prosecute
                          YOU.  --Come,
                             I'll take no
                              denial; We
                           must have a
                       trial:  For
                    really this
                 morning I've
                nothing
               to do."
                 Said the
                   mouse to the
                     cur, "Such
                       a trial,
                         dear Sir,
                               With
                           no jury
                        or judge,
                      would be
                    wasting
                   our
                    breath."
                     "I'll be
                       judge, I'll
                         be jury,"
                               Said
                          cunning
                            old Fury:
                           "I'll
                            try the
                               whole
                                cause,
                                   and
                              condemn
                             you
                            to
                             death."'

      }group

    resulting output:

                        `Fury said to a
                       mouse, That he
                     met in the
                   house,
                "Let us
                  both go to
                    law:  I will
                      prosecute
                        YOU.  --Come,
                           I'll take no
                            denial; We
                         must have a
                     trial:  For
                  really this
               morning I've
              nothing
             to do."
               Said the
                 mouse to the
                   cur, "Such
                     a trial,
                       dear Sir,
                             With
                         no jury
                      or judge,
                    would be
                  wasting
                 our
                  breath."
                   "I'll be
                     judge, I'll
                       be jury,"
                             Said
                        cunning
                          old Fury:
                         "I'll
                          try the
                             whole
                              cause,
                                 and
                            condemn
                           you
                          to
                           death."'

    Code tags code{ ... }code are used to escape regular sisu markup, and have been used extensively within this document to provide examples of SiSU markup. They are used in the same way as the group and poem tags described above. You cannot, however, use code tags to escape code tags.

    A code-block is treated as an object and given a single object number. [numbering of each line within a code block is available from SiSU 2.7.7, as described below]

    use of code tags instead of poem compared, resulting output:

                          `Fury said to a
                         mouse, That he
                       met in the
                     house,
                  "Let us
                    both go to
                      law:  I will
                        prosecute
                          YOU.  --Come,
                             I'll take no
                              denial; We
                           must have a
                       trial:  For
                    really this
                 morning I've
                nothing
               to do."
                 Said the
                   mouse to the
                     cur, "Such
                       a trial,
                         dear Sir,
                               With
                           no jury
                        or judge,
                      would be
                    wasting
                   our
                    breath."
                     "I'll be
                       judge, I'll
                         be jury,"
                               Said
                          cunning
                            old Fury:
                           "I'll
                            try the
                               whole
                                cause,
                                   and
                              condemn
                             you
                            to
                             death."'

    From SiSU 2.7.7 on you can number codeblocks by placing a hash after the opening code tag code{# as demonstrated here:

    1  ┆                      `Fury said to a
    2  ┆                     mouse, That he
    3  ┆                   met in the
    4  ┆                 house,
    5  ┆              "Let us
    6  ┆                both go to
    7  ┆                  law:  I will
    8  ┆                    prosecute
    9  ┆                      YOU.  --Come,
    10 ┆                         I'll take no
    11 ┆                          denial; We
    12 ┆                       must have a
    13 ┆                   trial:  For
    14 ┆                really this
    15 ┆             morning I've
    16 ┆            nothing
    17 ┆           to do."
    18 ┆             Said the
    19 ┆               mouse to the
    20 ┆                 cur, "Such
    21 ┆                   a trial,
    22 ┆                     dear Sir,
    23 ┆                           With
    24 ┆                       no jury
    25 ┆                    or judge,
    26 ┆                  would be
    27 ┆                wasting
    28 ┆               our
    29 ┆                breath."
    30 ┆                 "I'll be
    31 ┆                   judge, I'll
    32 ┆                     be jury,"
    33 ┆                           Said
    34 ┆                      cunning
    35 ┆                        old Fury:
    36 ┆                       "I'll
    37 ┆                        try the
    38 ┆                           whole
    39 ┆                            cause,
    40 ┆                               and
    41 ┆                          condemn
    42 ┆                         you
    43 ┆                        to
    44 ┆                         death."'

    To make an index, append to the paragraph the book index term(s) that relate to it, using an equal sign and curly braces.

    Currently two levels are provided, a main term and if needed a sub-term. Sub-terms are separated from the main term by a colon.

        Paragraph containing main term and sub-term.
        ={Main term:sub-term}

    The index syntax starts on a new line, but there should not be an empty line between paragraph and index markup.

    The structure of the resulting index would be:

        Main term, 1
          sub-term, 1

    Several terms may relate to a paragraph; they are separated by a semicolon. If a term refers to more than one paragraph, indicate the number of additional paragraphs.

        Paragraph containing main term, second term and sub-term.
        ={first term; second term: sub-term}

    The structure of the resulting index would be:

        First term, 1,
        Second term, 1,
          sub-term, 1

    If multiple sub-terms appear under one paragraph, they are separated under the main term heading from each other by a pipe symbol.

        Paragraph containing main term, second term and sub-term.
        ={Main term:sub-term+1|second sub-term}

        A paragraph that continues discussion of the first sub-term

    The plus one in the example provided indicates the first sub-term spans one additional paragraph. The logical structure of the resulting index would be:

        Main term, 1,
          sub-term, 1-2,
          second sub-term, 1,



    It is possible to build a document by creating a master document that requires other documents. The documents required may be complete documents that could be generated independently, or they could be markup snippets, prepared so as to be easily available to be placed within another text. If the calling document is a master document (built from other documents), it should be named with the suffix .ssm. Within this document you would provide information on the other documents that should be included within the text. These may be other documents that would be processed in a regular way (.sst, regular markup file), or markup bits prepared only for inclusion within a master document (.ssi, insert/information). A secondary file of the composite document is built prior to processing, with the same prefix and the suffix ._sst

    basic markup for importing a document into a master document

      << filename1.sst

      << filename2.ssi
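
    As a minimal sketch of how the include lines of a master document could be picked out, the Python below scans the text for the basic form shown above. The function name list_includes and the regular expression are assumptions for this example; how SiSU itself assembles the composite ._sst file is not shown here.

```python
import re

# A line consisting of "<<" followed by a .sst or .ssi filename.
INCLUDE = re.compile(r"^\s*<<\s*(\S+\.(?:sst|ssi))\s*$")


def list_includes(master_text):
    """Return the filenames requested by a master (.ssm) document, in order."""
    found = []
    for line in master_text.splitlines():
        match = INCLUDE.match(line)
        if match:
            found.append(match.group(1))
    return found


sample = "1~ Chapter\n\n<< filename1.sst\n\n<< filename2.ssi\n"
print(list_includes(sample))
```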

    The form described above should be relied on. Within the Vim editor it results in the text thus linked becoming hyperlinked to the document it is calling in, which is convenient for editing. Alternative markup for importation of documents has been under consideration, and occasionally supported:

      << filename.ssi

      <<{filename.ssi}

      % using textlink alternatives

      << |filename.ssi|@|^|





    2.0 introduced new headers and is therefore incompatible with 1.0, though otherwise the same with the addition of a couple of tags (i.e. a superset)

    0.38 is substantially current for version 1.0

    deprecated 0.16 supported, though file names were changed at 0.37

  • sisu --query=[sisu version [0.38] or 'history']
  • provides a short history of changes to SiSU markup

    SiSU 2.0 (2010-03-06:09/6) same as 1.0, apart from the changing of headers and the addition of a monospace tag. Related headers are now grouped, e.g.

      @title:
       :subtitle:

      @creator:
       :author:
       :translator:
       :illustrator:

      @rights:
       :text:
       :illustrations:

    see document markup samples, and sisu --help headers

    the monospace tag takes the form of a hash '#'

      #{ this enclosed text would be monospaced }#

    1.0 (2009-12-19:50/6) same as 0.69

    0.69 (2008-09-16:37/2) (same as 1.0) and as previous (0.57) with the addition of book index tags

      /^={.+?}$/

    e.g. appended to a paragraph, on a new-line (without a blank line in between), followed by the logical structure produced, assuming this is the first text "object"

       ={GNU/Linux community distribution:Debian+2|Fedora|Gentoo;Free Software Foundation+5}

      Free Software Foundation, 1-6
      GNU/Linux community distribution, 1
          Debian, 1-3
          Fedora, 1
          Gentoo, 1

    0.66 (2008-02-24:07/7) same as previous, adds semantic tags, [experimental and not-used]

      /[:;]{.+?}[:;][a-z+]/

    0.57 (2007w34/4) SiSU 0.57 is the same as 0.42 with the introduction of a shortcut to use the headers @title and @creator in the first heading [expanded using the contents of the headers @title: and @author:]

      :A~ @title by @author

    0.52 (2007w14/6) declared document type identifier at start of text/document:

    SiSU 0.52

    or, backward compatible using the comment marker:

    % SiSU 0.38

    variations include 'SiSU (text|master|insert) [version]' and 'sisu-[version]'
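
    A declaration line of the kinds listed above could be recognised with a pattern along these lines. The regular expression, the function name parse_declaration and the returned dictionary are assumptions for this sketch; the exact forms SiSU accepts may be broader or narrower.

```python
import re

# Optional comment marker, then "SiSU [text|master|insert] version"
# or "sisu-version".
DECLARATION = re.compile(
    r"^(?:%\s*)?(?:SiSU(?:\s+(text|master|insert))?\s+|sisu-)"
    r"(\d+(?:\.\d+)*)\s*$"
)


def parse_declaration(line):
    """Return the declared kind and markup version, or None if no match."""
    match = DECLARATION.match(line)
    if match is None:
        return None
    return {"kind": match.group(1), "version": match.group(2)}


for line in ("SiSU 0.52", "% SiSU 0.38", "SiSU master 0.52", "sisu-0.38"):
    print(line, "->", parse_declaration(line))
```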

    0.51 (2007w13/6) skins changed (simplified), markup unchanged

    0.42 (2006w27/4) * (asterisk) type endnotes, used e.g. in relation to author

    SiSU 0.42 is the same as 0.38 with the introduction of some additional endnote types,

    Introduces some variations on endnotes, in particular the use of the asterisk

      ~{* for example for describing an author }~ and ~{** for describing a second author }~

    * for example for describing an author

    ** for describing a second author

    and

      ~[* my note ]~ or ~[+ another note ]~

    which numerically increments an asterisk and plus respectively

    *1 my note +1 another note

    0.38 (2006w15/7) introduced new/alternative notation for headers, e.g. @title: (instead of 0~title), and accompanying document structure markup, :A,:B,:C,1,2,3 (maps to previous 1,2,3,4,5,6)

    SiSU 0.38 introduced alternative experimental header and heading/structure markers,

      @headername: and headers :A~ :B~ :C~ 1~ 2~ 3~

    as the equivalent of:

      0~headername and headers 1~ 2~ 3~ 4~ 5~ 6~
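
    The equivalence between the two heading notations is a fixed mapping, sketched below in Python. The names LEVEL_MAP and to_016_heading are made up for this example, and the sketch only rewrites heading lines; headers (@headername: versus 0~headername) are not handled.

```python
import re

# 0.38 heading markers and their 0.16 numeric equivalents.
LEVEL_MAP = {":A": 1, ":B": 2, ":C": 3, "1": 4, "2": 5, "3": 6}

# Marker, tilde, optional segment name, then the heading text.
HEADING = re.compile(r"^(:[ABC]|[123])~(\S*)\s+(.*)$")


def to_016_heading(line):
    """Rewrite a 0.38 heading line in 0.16 notation; other lines pass through."""
    match = HEADING.match(line)
    if match is None:
        return line
    marker, name, text = match.groups()
    return f"{LEVEL_MAP[marker]}~{name} {text}"


print(to_016_heading(":A~ My Title"))
print(to_016_heading("1~intro Introduction"))
```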

    The internal document markup of SiSU 0.16 remains valid and standard, though note that SiSU 0.37 introduced a new file naming convention

    SiSU has in effect two sets of levels to be considered. Using 0.38 notation these are the A-C headings/levels, which precede ordinary paragraphs / substantive text, and the 1-3 headings/levels, which are followed by ordinary text. This may be conceptualised as levels A,B,C, 1,2,3, and using such letter number notation, in effect:

      A must exist, optional B and C may follow in sequence (not strict)

      1 must exist, optional 2 and 3 may follow in sequence

    i.e. there are two independent heading level sequences A,B,C and 1,2,3 (using the 0.16 standard notation 1,2,3 and 4,5,6). On the positive side, the 0.38 A,B,C,1,2,3 alternative makes explicit an aspect of structuring documents in SiSU that is not otherwise obvious to the newcomer (though it appears more complicated, it is more in your face and likely to be understood fairly quickly); the substantive text follows levels 1,2,3 and it is 'nice' to do most work in those levels.

    0.37 (2006w09/7) introduced new file naming convention, .sst (text), .ssm (master), .ssi (insert), markup syntax unchanged

    SiSU 0.37 introduced new file naming convention, using the file extensions .sst .ssm and .ssi to replace .s1 .s2 .s3 .r1 .r2 .r3 and .si

    this is captured by the following file 'rename' instruction:

      rename 's/\.s[123]$/\.sst/' *.s{1,2,3}
      rename 's/\.r[123]$/\.ssm/' *.r{1,2,3}
      rename 's/\.si$/\.ssi/' *.si

    The internal document markup remains unchanged, from SiSU 0.16
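
    For readers without the rename utility, the same substitutions can be expressed in Python. This sketch only computes the new name for a given filename; the names RULES and new_name are assumptions for the example, and actually renaming files on disk is left out.

```python
import re

# The three substitutions from the rename instructions above.
RULES = [
    (re.compile(r"\.s[123]$"), ".sst"),
    (re.compile(r"\.r[123]$"), ".ssm"),
    (re.compile(r"\.si$"), ".ssi"),
]


def new_name(filename):
    """Return the 0.37 style filename for an older SiSU markup file."""
    for pattern, suffix in RULES:
        if pattern.search(filename):
            return pattern.sub(suffix, filename)
    return filename  # not an old-style SiSU file: leave untouched


for old in ("book.s1", "master.r2", "insert.si", "notes.txt"):
    print(old, "->", new_name(old))
```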

    0.35 (2005w52/3) sisupod, zipped content file introduced

    0.23 (2005w36/2) utf-8 for markup file

    0.22 (2005w35/3) image dimensions may be omitted if rmagick is available to be relied upon

    0.20.4 (2005w33/4) header 0~links

    0.16 (2005w25/2) substantial changes introduced to make markup cleaner, header 0~title type, and headings [1-6]~ introduced, also percentage sign (%) at start of a text line as comment marker

    SiSU 0.16 (0.15 development branch) introduced the use of

    the header 0~ and headings/structure 1~ 2~ 3~ 4~ 5~ 6~

    in place of the 0.1 header, heading/structure notation

    SiSU 0.1 headers and headings structure represented by header 0{~ and headings/structure 1{ 2{ 3{ 4{~ 5{ 6{



    SiSU is a document publishing system that, from a simple single marked-up document, produces multiple output formats including: plaintext, html, xhtml, XML, epub, odt (odf text), LaTeX, pdf, info, and SQL (PostgreSQL and SQLite), which share numbered text objects ("object citation numbering") and the same document structure information. For more see: ‹http://www.jus.uio.no/sisu›

    -a [filename/wildcard]
    produces plaintext with Unix linefeeds and without markup, (object numbers are omitted), has footnotes at end of each paragraph that contains them [ -A for equivalent dos (linefeed) output file] [see -e for endnotes]. (Options include: --endnotes for endnotes --footnotes for footnotes at the end of each paragraph --unix for unix linefeed (default) --msdos for msdos linefeed)

    -b [filename/wildcard]
    see --xhtml

    --color-toggle [filename/wildcard]
    toggles ansi screen colour on or off depending on the default set (unless the -c flag is used: if the sisurc colour default is set to 'true', output to screen will be with colour; if the sisurc colour default is set to 'false' or is undefined, screen output will be without colour). Alias -c

    --concordance [filename/wildcard]
    produces concordance (wordmap) a rudimentary index of all the words in a document. (Concordance files are not generated for documents of over 260,000 words unless this limit is increased in the file sisurc.yml). Alias -w

    -C [--init-site]
    configure/initialise shared output directory files (config files such as css and dtd files are not updated if they already exist unless the modifier is used). -C --init-site configures/initialises the site more extensively than -C on its own: it forces an update of the shared output directory files, so existing shared output config files such as css and dtd files are updated.

    -CC
    configure/initialise shared output directory files and force their update. The equivalent of -C --init-site: more extensive than -C on its own, existing shared output config files such as css and dtd files are updated if -CC is used.

    -c [filename/wildcard]
    see --color-toggle

    --dal [filename/wildcard/url]
    creates new intermediate files for processing (document abstraction) that are used in all subsequent processing of other output. This step is assumed for most processing flags. To skip it see -n. Alias -m

    --delete [filename/wildcard]
    see --zap

    -D [instruction] [filename]
    see --pg

    -d [--db-[database type (sqlite|pg)]] --[instruction] [filename]
    see --sqlite

    --epub [filename/wildcard]
    produces an epub document, [sisu version 2 only] (filename.epub). Alias -e

    -e [filename/wildcard]
    see --epub

    -F [--webserv=webrick]
    see --sample-search-form

    --git [filename/wildcard]
    produces or updates markup source file structure in a git repo (experimental and subject to change). Alias -g

    -g [filename/wildcard]
    see --git

    --harvest *.ss[tm]
    makes two lists of sisu output based on the sisu markup documents in a directory: list of author and authors works (year and titles), and; list by topic with titles and author. Makes use of header metadata fields (author, title, date, topic_register). Can be used with maintenance (-M) and remote placement (-R) flags.

    --help [topic]
    provides help on the selected topic, where topics (keywords) include: list, (com)mands, short(cuts), (mod)ifiers, (env)ironment, markup, syntax, headers, headings, endnotes, tables, example, customise, skin, (dir)ectories, path, (lang)uage, db, install, setup, (conf)igure, convert, termsheet, search, sql, features, license

    --html [filename/wildcard]
    produces html output, segmented text with table of contents (toc.html and index.html) and the document in a single file (scroll.html). Alias -h

    -h [filename/wildcard]
    see --html

    -I [filename/wildcard]
    see --texinfo

    -i [filename/wildcard]
    see --manpage

    -L
    prints license information.

    --machine [filename/wildcard/url]
    see --dal (document abstraction level/layer)

    --maintenance [filename/wildcard/url]
    maintenance mode: files created for processing are preserved and their locations indicated (also see -V). Alias -M

    --manpage [filename/wildcard]
    produces man page of file, not suitable for all outputs. Alias -i

    -M [filename/wildcard/url]
    see --maintenance

    -m [filename/wildcard/url]
    see --dal (document abstraction level/layer)

    --no-ocn
    [with --html --pdf or --epub] switches off object citation numbering. Produce output without identifying numbers in margins of html or LaTeX/pdf output.

    -N [filename/wildcard/url]
    document digest or document content certificate ( DCC ) as md5 digest tree of the document: the digest for the document, and digests for each object contained within the document (together with information on software versions that produced it) (digest.txt). -NV for verbose digest output to screen.

    -n [filename/wildcard/url]
    skip the creation of intermediate processing files (document abstraction) if they already exist, this skips the equivalent of -m which is otherwise assumed by most processing flags.

    --odf [filename/wildcard/url]
    see --odt

    --odt [filename/wildcard/url]
    output basic document in opendocument file format (opendocument.odt). Alias -o

    -o [filename/wildcard/url]
    see --odt

    --pdf [filename/wildcard]
    produces LaTeX pdf (portrait.pdf & landscape.pdf). Default paper size is set in config file, or document header, or provided with additional command line parameter, e.g. --papersize-a4 preset sizes include: 'A4', U.S. 'letter' and 'legal' and book sizes 'A5' and 'B5' (system defaults to A4). Alias -p

    --pg [instruction] [filename]
    database postgresql ( --pgsql may be used instead) possible instructions, include: --createdb; --create; --dropall; --import [filename]; --update [filename]; --remove [filename]; see database section below. Alias -D

    --po [language_directory/filename language_directory]
    see --po4a

    --po4a [language_directory/filename language_directory]
    produces .pot and po files for the file in the languages specified by the language directory. SiSU markup is placed in subdirectories named with the language code, e.g. en/ fr/ es/. The sisu config file must set the output directory structure to multilingual. v3, experimental

    -P [language_directory/filename language_directory]
    see --po4a

    -p [filename/wildcard]
    see --pdf

    --quiet [filename/wildcard]
    quiet mode, less output to screen.

    -q [filename/wildcard]
    see --quiet

    --rsync [filename/wildcard]
    copies sisu output files to remote host using rsync. This requires that sisurc.yml has been provided with information on hostname and username, and that you have your "keys" and ssh agent in place. Note that the behavior of rsync differs depending on whether -R is used alone or with other flags. Alone, the rsync --delete parameter is sent, useful for cleaning the remote directory (when -R is used together with other flags, it is not). Also see --scp. Alias -R

    -R [filename/wildcard]
    see --rsync

    -r [filename/wildcard]
    see --scp

    --sample-search-form [--webserv=webrick]
    generates examples of a (naive) cgi search form for sqlite and pgsql. This depends on your already having used sisu to populate an sqlite and/or pgsql database (the sqlite version scans the output directories for existing sisu_sqlite databases, so it is first necessary to create them before generating the search form); see -d, -D and the database section below. If the optional parameter --webserv=webrick is passed, the cgi examples created will be set up to use the default port set for use by the webrick server (otherwise the port is left blank and the system setting used, usually 80). The samples are dumped in the present work directory, which must be writable (with screen instructions given that they be copied to the cgi-bin directory). -Fv (in addition to the above) provides some information on setting up hyperestraier for sisu. Alias -F

    --scp [filename/wildcard]
    copies sisu output files to remote host using scp. This requires that sisurc.yml has been provided with information on hostname and username, and that you have your "keys" and ssh agent in place. Also see --rsync. Alias -r

    --sqlite --[instruction] [filename]
    database type, default set to sqlite; to specify another database use --db-[pgsql, sqlite] (however see -D). Possible instructions include: --createdb; --create; --dropall; --import [filename]; --update [filename]; --remove [filename]; see database section below. Alias -d

    --sisupod
    produces a sisupod, a zipped sisu directory of markup files including sisu markup source files and the directory's local configuration file, images and skins. Note: this only includes the configuration files or skins contained in ./_sisu, not those in ~/.sisu. See the -S [filename/wildcard] option. Note: this option is tested only with zsh. Alias -S

    --sisupod [filename/wildcard]
    produces a zipped file of the prepared document specified along with associated images, by default named sisupod.zip; they may alternatively be named with the filename extension .ssp. This provides a quick way of gathering the relevant parts of a sisu document, which can then for example be emailed. A sisupod includes the sisu markup source file (along with associated documents if a master file, or available in multilingual versions), together with related images and skin. SiSU commands can be run directly against a sisupod contained in a local directory, or provided as a url on a remote site. As there is a security issue with skins provided by other users, they are not applied unless the flag --trust or --trusted is added to the command instruction; it is recommended that files that are not your own are treated as untrusted. The directory structure of the unzipped file is understood by sisu, and sisu commands can be run within it. Note: if you wish to send multiple files, it quickly becomes more space efficient to zip the sisu markup directory, rather than the individual files, for sending. See the -S option without [filename/wildcard]. Alias -S

    --source [filename/wildcard]
    copies sisu markup file to output directory. Alias -s

    -S
    see --sisupod

    -S [filename/wildcard]
    see --sisupod

    -s [filename/wildcard]
    see --source

    --texinfo [filename/wildcard]
    produces texinfo and info file, (view with pinfo). Alias -I

    --txt [filename/wildcard]
    produces plaintext with Unix linefeeds and without markup, (object numbers are omitted), has footnotes at end of each paragraph that contains them [ -A for equivalent dos (linefeed) output file] [see -e for endnotes]. (Options include: --endnotes for endnotes --footnotes for footnotes at the end of each paragraph --unix for unix linefeed (default) --msdos for msdos linefeed). Alias -t

    -T [filename/wildcard (*.termsheet.rb)]
    standard form document builder, preprocessing feature

    -t [filename/wildcard]
    see --txt

    --urls [filename/wildcard]
    prints url output list/map for the available processing flags options and resulting files that could be requested, (can be used to get a list of processing options in relation to a file, together with information on the output that would be produced), -u provides url output mapping for those flags requested for processing. The default assumes sisu_webrick is running and provides webrick url mappings where appropriate, but these can be switched to file system paths in sisurc.yml. Alias -U

    -U [filename/wildcard]
    see --urls

    -u [filename/wildcard]
    provides url mapping of output files for the flags requested for processing, also see -U

    --v2 [filename/wildcard]
    invokes the sisu v2 document parser/generator. This is the default and is normally omitted.

    --v3 [filename/wildcard]
    invokes the sisu v3 document parser/generator. Currently under development and incomplete, v3 requires >= ruby1.9.2p180. You may run sisu3 instead.

    --verbose [filename/wildcard]
    provides verbose output of what is being generated and where output is placed (and error messages if any); as with the -u flag, provides a url mapping of files created for each of the processing flag requests. Alias -v

    -V
    on its own, provides SiSU version and environment information (sisu --help env)

    -V [filename/wildcard]
    even more verbose than the -v flag.

    -v
    on its own, provides SiSU version information

    -v [filename/wildcard]
    see --verbose

    --webrick
    starts ruby's webrick webserver, pointed at sisu output directories; the default port is set to 8081 and can be changed in the resource configuration files. [tip: the webrick server requires link suffixes, so html output should be created using the -h option rather than -H ; also, note -F webrick ]. Alias -W

    -W
    see --webrick

    --wordmap [filename/wildcard]
    see --concordance

    -w [filename/wildcard]
    see --concordance

    --xhtml [filename/wildcard]
    produces xhtml/XML output for browser viewing (sax parsing). Alias -b

    --xml-dom [filename/wildcard]
    produces XML output with deep document structure, in the nature of dom. Alias -X

    --xml-sax [filename/wildcard]
    produces XML output shallow structure (sax parsing). Alias -x

    -X [filename/wildcard]
    see --xml-dom

    -x [filename/wildcard]
    see --xml-sax

    -Y [filename/wildcard]
    produces a short sitemap entry for the document, based on html output and the sisu_manifest. --sitemaps generates/updates the sitemap index of existing sitemaps. (Experimental, [g,y,m announcement this week])

    -y [filename/wildcard]
    produces an html summary of output generated (hyperlinked to content) and document specific metadata (sisu_manifest.html). This step is assumed for most processing flags.

    --zap [filename/wildcard]
    Zap: if used with other processing flags, deletes output files of the type about to be processed, prior to processing. If -Z is used as the lone processing related flag (or in conjunction with a combination of -[mMvVq]), it will remove the related document output directory. Alias -Z

    -Z [filename/wildcard]
    see --zap



    --no-ocn
    [with --html --pdf or --epub] switches off object citation numbering. Produces output without identifying numbers in margins of html or LaTeX/pdf output.

    --no-annotate
    strips output text of editor endnotes  *2  denoted by asterisk or dagger/plus sign

    --no-asterisk
    strips output text of editor endnotes  *3  denoted by asterisk sign

    --no-dagger
    strips output text of editor endnotes  +2  denoted by dagger/plus sign



    dbi - database interface

    -D or --pgsql set for postgresql; -d or --sqlite default set for sqlite; -d is modifiable with --db=[database type (pgsql or sqlite)]

    --pg -v --createall
    initial step, creates required relations (tables, indexes) in an existing postgresql database (a database should be created manually and given the same name as the working directory, as requested) (rb.dbi) [ -dv --createall sqlite equivalent]. It may be necessary to run sisu -Dv --createdb initially. NOTE: at the present time for postgresql it may be necessary to manually create the database. The command would be 'createdb [database name]' where database name would be SiSU_[present working directory name (without path)]. Please use only alphanumerics and underscores.

    --pg -v --import
    [filename/wildcard] imports data specified to postgresql db (rb.dbi) [ -dv --import sqlite equivalent]

    --pg -v --update
    [filename/wildcard] updates/imports specified data to postgresql db (rb.dbi) [ -dv --update sqlite equivalent]

    --pg --remove
    [filename/wildcard] removes specified data from postgresql db (rb.dbi) [ -d --remove sqlite equivalent]

    --pg --dropall
    kills data and drops (postgresql or sqlite) db, tables & indexes [ -d --dropall sqlite equivalent]

    The -v is for verbose output.



    --update [filename/wildcard]
    Checks existing file output and runs the flags required to update this output. This means that if only html and pdf output was requested on previous runs, only the -hp flags will be applied, and only these outputs will be generated this time, together with the summary. This can be very convenient if you offer different outputs of different files, and just want to do the same again.

    -0 to -5 [filename or wildcard]
    Default shorthand mappings (note that the defaults can be changed/configured in the sisurc.yml file):

    -0
    -mNhwpAobxXyYv [this is the default action run when no options are given, i.e. on 'sisu [filename]']

    -1
    -mhewpy

    -2
    -mhewpaoy

    -3
    -mhewpAobxXyY

    -4
    -mhewpAobxXDyY --import

    -5
    -mhewpAobxXDyY --update

    add -v for verbose mode and -c for color, e.g. sisu -2vc [filename or wildcard]
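
    The shorthand mappings above can be pictured as a simple lookup. The Ruby sketch below is illustrative only: expand_shorthand is a hypothetical helper, it hard-codes the defaults listed above rather than reading sisurc.yml, and it does not reflect how sisu itself parses its command line.

```ruby
# Default shorthand mappings as listed above; SiSU lets these be
# reconfigured in sisurc.yml, which this sketch does not read.
SHORTHAND = {
  '-0' => '-mNhwpAobxXyYv',
  '-1' => '-mhewpy',
  '-2' => '-mhewpaoy',
  '-3' => '-mhewpAobxXyY',
  '-4' => '-mhewpAobxXDyY --import',
  '-5' => '-mhewpAobxXDyY --update'
}.freeze

# Hypothetical helper: expand a digit shorthand (optionally followed by
# extra single-letter flags such as v or c) into the full flag string.
# Anything that is not a known shorthand is returned unchanged.
def expand_shorthand(arg)
  match = arg.match(/\A-(\d)([a-zA-Z]*)\z/)
  return arg unless match && SHORTHAND.key?("-#{match[1]}")

  flags, instruction = SHORTHAND["-#{match[1]}"].split(' ', 2)
  # Append only the extra letters not already present in the mapping.
  extra = match[2].chars.reject { |c| flags.include?(c) }.join
  [flags + extra, instruction].compact.join(' ')
end

puts expand_shorthand('-2vc')  # prints -mhewpaoyvc
puts expand_shorthand('-4')    # prints -mhewpAobxXDyY --import
```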

    consider -u for appended url info or -v for verbose output

    In the data directory run sisu -mh [filename or wildcard], e.g. "sisu -h cisg.sst", or "sisu -h *.{sst,ssm}" to produce html versions of all documents.

    Running sisu (alone without any flags, filenames or wildcards) brings up the interactive help, as does any sisu command that is not recognised. Enter to escape.





    This section has been much reduced in content since the release of SiSU which it predated. It provides links to some relevant information.

    The description provided in the abandoned U.S. Provisional Patent Application may be of interest as it provides greater detail and by and large supersedes the description given here ‹http://www.jus.uio.no/sisu/sisu_provisional_patent_application_200408› and accompanying diagrams ‹http://www.jus.uio.no/sisu/diagram/sisu_provisional_patent_application_diagram_200408.pdf› and reasons for abandoning ‹http://www.jus.uio.no/sisu/SiSU/2005.html#ppa›

    Of particular interest is the ease of streaming documents to a relational database, at an object (roughly paragraph) level and the potential for increased precision in the presentation of matches that results thereby. The ability to serialise html, latex, xml, sql, (whatever) is also inherent in / incidental to the design.
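
    To make the point about object-level precision concrete, here is a small Ruby sketch that uses an in-memory array as a stand-in for a database table. The helpers to_objects and search, the blank-line splitting and the sample text are all assumptions for illustration; SiSU's actual database schema and search are not shown here.

```ruby
# Hypothetical in-memory stand-in for a database table of text objects.
# Each row keeps the object number (ocn) alongside the text.
def to_objects(text)
  text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
      .each_with_index.map { |body, i| { ocn: i + 1, body: body } }
end

# Return the object numbers whose text matches the query (case-insensitive).
def search(objects, query)
  objects.select { |o| o[:body].downcase.include?(query.downcase) }
         .map { |o| o[:ocn] }
end

doc = <<~TEXT
  Formation of the contract

  An offer becomes effective when it reaches the offeree.

  A reply to an offer which contains additions is a counter-offer.
TEXT

objects = to_objects(doc)
p search(objects, 'offer')  # prints [2, 3]
```

    Because a match is reported as an object number rather than only as a document, the same number can be used to find the passage in any output format that carries object numbers.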



    This is the short form of an old summary based on design decisions of 2002. It predates the release of SiSU by a number of years, and should possibly be removed.

    A rough chart of the SiSU program structure can be found here:

    What follows is a brief description of the chart's components, based on the numbers and letters used in the chart.

    A Input text ascii with minimalistic human markup requirements

    B Machine intermediate processing output, used by all other modules. For the time being there is a selection: human readable; a Ruby marshal dump of the same; and a YAML file.   121  Once the intermediate stage is created, if no changes to input (i.e. A) are made, it is possible to start with B as input for the program (i.e. to skip stage A and the processing required to get to stage B). This might be of interest if document appearance is modified but not content. Abstract document structure is "created" here, with the pre-processing of the likes of tables, numbering (headings, paragraphs etc.) and endnotes, to ensure that all subsequent processing is based on the same integral document structures.
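
    The idea of saving stage B and starting from it later can be sketched with Ruby's standard Marshal and YAML facilities. The structure of the abstraction below (an array of hashes with ocn, type and text keys) is purely an assumption for the example and is not SiSU's internal representation.

```ruby
require 'yaml'

# Hypothetical abstraction: an array of plain hashes, one per text object.
abstraction = [
  { 'ocn' => 1, 'type' => 'heading',   'text' => 'Introduction' },
  { 'ocn' => 2, 'type' => 'paragraph', 'text' => 'First paragraph.' }
]

# Serialise once (stage B) ...
marshalled = Marshal.dump(abstraction)
yaml_text  = abstraction.to_yaml

# ... and later start from B instead of re-parsing the source (stage A).
from_marshal = Marshal.load(marshalled)
from_yaml    = YAML.safe_load(yaml_text)

puts from_marshal == abstraction  # prints true
puts from_yaml == abstraction     # prints true
```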

    C Various final publication outputs that all share a common citation numbering system
    html - there are possibilities for output based on tables or output based on css
    pdf - landscape and portrait currently set to A4 paper size
    XML with a flat structure, sax, and with a deeper (embedded) structure, dom
    sql - data in sql database retaining document structure, this is in some ways similar to B output, as it is likely to be further processed for presentation.
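
    As a minimal sketch of outputs sharing one citation numbering system, the Ruby below renders the same numbered objects in two ways. The OBJECTS data, the html id and class names, and the use of \marginpar in the LaTeX-like output are assumptions made for illustration, not SiSU's actual templates; no escaping of special characters is attempted.

```ruby
# Hypothetical abstraction: numbered text objects shared by every output.
OBJECTS = [
  { ocn: 1, text: 'Scope and purpose' },
  { ocn: 2, text: 'This convention applies to contracts of sale.' }
].freeze

# Illustrative html rendering; the id and class names are assumptions.
def to_html(objects)
  objects.map do |o|
    %(<p id="o#{o[:ocn]}">#{o[:text]} <span class="ocn">#{o[:ocn]}</span></p>)
  end.join("\n")
end

# Illustrative LaTeX-like rendering carrying the same numbers in the margin.
def to_latex(objects)
  objects.map { |o| "#{o[:text]}\\marginpar{#{o[:ocn]}}" }.join("\n\n")
end

puts to_html(OBJECTS)
puts to_latex(OBJECTS)
```

    Both renderings take their numbers from the shared objects, so object 2 is object 2 whichever output a reader happens to have.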

    1 data feed controller for other program components

    2 Creation of intermediate stage B, which contains information related to document structure used by all subsequent data output modules.

    3 Parameter extraction. Program takes data related to the document being processed.

    4 Relates primarily to appearance/ design, how the site or document should look:

    4a Initialised variables used for "typesetting". eg. Margin widths etc. called by program. This can be done in 3 stages, there are i. the default program-wide settings, ii. possibility of setting alternative site-wide settings, iii. possibility of providing settings for an individual document
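
    The three stages of settings can be sketched as layered hashes, assuming that the more specific level takes precedence over the more general one. The setting names and values below are hypothetical and chosen only for the example.

```ruby
# Hypothetical setting names and values, for illustration only.
PROGRAM_DEFAULTS = { paper_size: 'A4', margin_mm: 20, ocn: true }.freeze

# Later layers override earlier ones: program-wide, then site-wide,
# then settings given for the individual document.
def effective_settings(site = {}, document = {})
  PROGRAM_DEFAULTS.merge(site).merge(document)
end

site_settings     = { margin_mm: 25 }
document_settings = { paper_size: 'letter' }

p effective_settings(site_settings, document_settings)
```

    Here the site changes only the margin and the document changes only the paper size; everything else falls through to the program-wide default.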

    4b Template, includes formatting classes e.g. for appearance of html (whether table based or css) or for pdf output. (For examples of templates at work, see examples provided earlier of html output in css and tables versions, and of pdf landscape and portrait outputs that result from templates that provide different LaTeX output for the resulting pdfs.)

    5 Here we have the logic engines that process B, the intermediate machine generated data, and call upon the relevant templates to produce the different presentations of the document.

    5a html module - to construct html documents

    5b LaTeX module - to construct LaTeX, which is then fed to pdflatex to produce pdf files

    5c SQL module - to import data into PostgreSQL database retaining document structure detail and other detail common to the other output formats. This keeps all information regarding document structure in four relational database tables, one containing semantic and other headers, a second substantive texts, a third endnotes, a fourth pre-formatted texts. (the flexibility exists to carry this further)

    SiSU is written in Ruby and assumes Linux OS (development has been on Debian/Gnu/Linux)

    SiSU generates

    html output

    LaTeX output, then uses LaTeX (and /pdflatex/ LaTeX to pdf) for pdf output

    Lout output, then uses Lout to produce postscript (and postscript to pdf conversion), [not currently maintained]

    sql output (database feed) e.g. PostgreSQL, making use of Ruby dbi or pgsql modules to be used by PostgreSQL, or sqlite, making use of Ruby dbi or sqlite modules to be used by sqlite

    Not required but taken advantage of if available:

    tidy (XML, xhtml well formed check)

    trang (relaxng, rnc to dtd conversion)

    there are other modules ... see this document.



    SiSU started as a way to make html manageable, together with the core concept of making text citable through the use of object citation numbering. LaTeX/pdf provided a way of making near print quality output, and demonstrating how conveniently the concept worked across different output formats. Relational database storage using the same concept underscored this, and the concept makes database search results relevant to locating results quickly in all output formats that use object citation numbers.

    There are a number of data formats and technologies that are of particular interest to SiSU, and to keep an eye on more generally. These links are kept here for convenience. Note that whilst all the technologies mentioned are of interest in the context of SiSU, not all of them are supported by SiSU.

  • *Debian*,   122  wikipedia entry,   123  social contract   124  Debian is one of the largest software integration projects with over 15,000 packages available and is probably the most technically sophisticated of the linux distributions. It is built for multiple hardware architectures, and is the base distribution for somewhere around 130 Debian derivative linux distributions.
  • *Unix*
  • Organisations

    Information

  • *SGML* - Standard Generalized Markup Language (of which XML is part of the family), wikipedia entry   136 
  • *LaTeX* a document preparation system for the TeX typesetting program, wikipedia entry   181 
  • *PDF* - Portable Document Format, wikipedia entry   184  pdflatex, pdftex,   185 
  • *SQL* - Structured Query Language, wikipedia entry   189 
  • *CouchDB*   200  looks fascinating, work in progress, book in progress Relax with CouchDB   201  an interview with its author,   202  wikipedia entry   203  makes use of JSON: wikipedia entry   204 
  • *Unicode*, wikipedia entry   213 
  • information visualization toolkit prefuse   216 
  • *DC* - Dublin Core, wikipedia entry   220 
  • *sitemap*   230  - standard web index information protocol backed by Google, Yahoo and Microsoft wikipedia entry   231 
  • RDF - Resource Description Framework wikipedia entry   232 
  • RSS - Really Simple Syndication / Rich Site Summary wikipedia entry   233  rss 2.0 specification   234  media rss   235 
  • Organisations

    Technologies

  • *HDF* - Hierarchical Data Format, wikipedia entry   240 
  • Organisations

    Licenses





    Note: a much more comprehensive history can be gleaned from the Chronology pages (which, however, also contain all sorts of additional random information and opinion of the author) and, since the release of SiSU as Software Libre under the GPL, from the document changelog.

    While working with legal texts in an academic environment, on a site that was first called Ananse, then The International Trade Law Monitor and later still Lex Mercatoria,  263  I was faced with a number of issues, those of interest here being technical. Amongst them was the relatively fast evolution of html (in which text was prepared for the Web), which made it cumbersome to continually update text/document representations to reflect the improvements in what was possible with the latest html markup. There was also the fact that some of the strengths of html were limitations in other document representational contexts, e.g. good document rendition across multiple screens was a different problem from ideal paper rendition. Also, within an academic and law environment, one of the limits of html repeatedly presented as critical with regard to academic writing was the fact that it was not possible to reliably cite the location of content within a document. HTML rendered differently in different browsers; change the font size and it again came out differently. This led to work on figuring out how these limitations could be overcome, which resulted amongst other things in the early development of the object number system, which could be used independently of page numbers to locate text.

    The use case came to be scholarly writings in law and literature, and conventions useful across writings in literature, the humanities and law, and a smaller section of the social sciences.

    SiSU came to be through a series of steps which started from seeking to overcome these problems, starting with the recognition that multiple document format types could be generated (and technically updated as need be) from a single lightly structured prepared source text/document, and that these multiple output formats could share a common numbering system for the referencing of text within a document and further, that to achieve this text could be usefully represented as individual objects identified by these object numbers, and these could be the building blocks from which the alternative document representations and formats could be built, to take advantage of many of the individual and distinct native strengths of various primary standard ways in existence, for the convenient representation or extraction of text, each idealised for a different context, amongst them html, XML, ODF, LaTeX, pdf and (SQL type) relational databases.

    Seeking to achieve the requirement of minimal effort (in the form of preparation and maintenance) relative to payoff as regards the described objectives: the idea was to have a document structure meta-markup that with as little effort as possible initially and over time (it should be possible to develop (change or add) output formats without having to think about the original source document), was able to the greatest extent possible, to take advantage of as many of the most interesting features available in each of the most important standard document representational methods, viz. html, XML, ODF, LaTeX, PDF and SQL type relational databases, from that common prepared document source, and that resulted in a meaningful common way of identifying text content.

    This resulted in: (a) a minimalist/light structured markup from which the primary benefits of multiple document representation types could be generated.  264  Keeping markup/preparation relatively minimalist and easy to remember, and independent of the development/evolution of document output representations, in order to keep document preparation effort to a minimum, both initially and with regard to maintenance over time; (b) having an abstraction layer for the representation of the document, that was generated independently of the prepared source, which represented text as numbered objects that could be utilised in any of the final document output representational forms in a shared/ common/ similar way for the location of content within a document.  265  Separating markup from abstraction and subsequent outputs meant that the markup syntax and underlying output generating modules could be developed/evolved independently of each other. You could arbitrarily change the markup syntax (or have alternative preparation syntaxes) provided you could generate the abstraction layer, from which subsequent outputs would result. Or you could change the abstraction layer and related output generation modules whilst retaining the markup syntax.

    The first technical work that in any way relates to the way SiSU works dates back to earlyish in the history of the site Lex Mercatoria, which was at the time called Ananse (and later the International Trade Law Project and then International Trade Law Monitor). Looking for more convenient ways to manage site content, while at the University of Tromsø, I had a young student, Tommy Johansen, look at it over a summer. I (and Geofrey Armstrong) at the time gathered content for the site. Tommy Johansen wrote some Perl scripts for generating html content, which were used early in the site's history and which were convenient in particular for: (a) producing uniform output, (b) separating code from markup, (c) their ability to produce tables of content, (d) the possibility of matching text in a header to segment text (not yet regular expressions). After Tommy Johansen left, his scripts were used, pretty much unchanged, for a good while, and though this was before text objects, or object numbers, document abstraction, or any document representation other than html, these were features that were retained by what was to become SiSU.

    In 1997/1998 object numbers were introduced to html output, overcoming the problem of the precise location of text within a fixed/published html document. The possibility of using text objects (and object numbers) for other forms of output was conceived around the same time as the introduction of object numbers to html, as it was clear that this system should have wider use across different types of output.  266 

    In 1999 I was switching from Windows to Gnu/Linux... first Red Hat then SuSE  267  as far as SiSU was concerned, the program was written in Perl and relatively easy to port.  268 

    In 2000 I was switching from Perl to Ruby... well that was the end of 2000, November (Dave Thomas' book which I was waiting for from the beginning of the year was published at last, and I finally received my copy).  269 

    By June 2001 SiSU was generating LaTeX output that was converted to both portrait and landscape pdf that shared the same object numbers as the html output.

    In May 2002 tired of waiting for the version dubbed Woody, I was switching to Debian...   270 

    SiSU search was finally actually implemented in 2002,  271  in the form of the database structure that made object search possible and the ability to populate the database with objects with corresponding object numbers from the same document source as other output formats. I did not have much of an immediate incentive to implement search as I did not have an online database. However, having an implementation and showing it around was the reason for the initial opening of these pages and placing a description of what SiSU did on the Net in November 2002, ‹http://www.jus.uio.no/sisu› and updated regularly if haphazardly  272  since, and a pdf chart/diagram that included the relational database aspect as a feature, which should still be available at ‹http://www.jus.uio.no/sisu/diagram/sisu.chart.pdf› (prepared in 2002).  273 

    Concordance files, first called "wordmaps" were introduced the same year 2002. The search front-end has continued to evolve, and screen-shots of that were made in 2004.

    In June 2004 an IBM software innovations evaluator (at first reluctantly) met me, (he was busy at the time, though the contact was arranged through an IBM Manager met at a Linux show, who was curious about what a lawyer was doing with Linux and programming, he asked what is it you are doing and said "we [IBM] should have a look at it"), anyhow, the software innovations evaluator had a look at SiSU and gave it a very positive/ enthusiastic review (so naturally I thought he was great), this was not a code review, mind, it was a "review"/reaction based on what SiSU did and how it did it, and the implications of it all ... what it meant could be done. To paraphrase, he said:

    We have large document management systems. We can search over a hundred thousand documents and tell you that your search criteria is met by say 300 of them, but there is no way we can tell you without going in to each document, where those matches are... once you open a document we can highlight matches.

    He wrote a letter I kept and published as a souvenir.

    "Ralph Good to meet with you today, I was very impressed with your software.

    [colleague's name] - in summary - Ralph has built an application that runs on linux and takes ASCII documents and pulls them apart in to the smallest constituent parts, storing them as XML, PDF and HTML, the HTML are hyperlinked up so the document can be browsed in its full form. the format and text data created is stored in a database.

    This has potential in any place that needs the power of full text search whilst holding the structural concepts of the document i.e. legal, pharma, education, research.. which ones we need to figure out, ..."

    He suggested I get a software patent. I reluctantly agreed to investigate (that story is told elsewhere).

    Subsequent meetings with IBM were odd ;-)  274 

    Well the person who arranged the original meeting with the "software innovations evaluator", did say that IBM was such a large organisation that different groups were working on different projects and had different interests, and frequently it was a question of meeting the right people; and that there usually were multiple entry points which could be quite different in their interests and responses. Interesting encounters, entertaining mail.

    I was an example of a prime beneficiary of Software Libre, and one who had come to understand/know (believe if you prefer) through use that it was technically superior to proprietary software.

    In January 2005 SiSU was first released under GPL.

    May 2005 first Debian packages for SiSU. I had visited Wookey earlier in the year as a shortcut to building my first Debian package.

    In July 2005 at Debconf5, Helsinki,  275  SiSU was first uploaded into Debian, by Gunnar Wolf.

    At Debconf5 after talking to various people, it was clarified to me that generating hash sums was a fast and not particularly memory intensive process, so the decision was made to incorporate md5 or optionally sha256 hash sums into the document abstraction representation, as this makes possible several additional/alternative forms of document representation that rely on the hashes for unique identification of objects (also across document collections). Document Content Certificates were introduced shortly afterwards that make use of the hash sums to identify objects - headings, paragraphs, footnotes, images etc. and make it possible to evidence the existence of a document's contents without actually publishing it... or show a summary proving that the document remains unchanged.
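
    How per-object hash sums can show whether, and where, a document has changed may be sketched as follows. The certificate format here (a hash from object number to sha256 digest) and the helper names are assumptions for illustration and do not reproduce SiSU's Document Content Certificate.

```ruby
require 'digest'

# Hypothetical certificate: a sha256 digest per object, keyed by object number.
def certificate(objects)
  objects.each_with_index.to_h { |obj, i| [i + 1, Digest::SHA256.hexdigest(obj)] }
end

# Object numbers whose digests differ between two certificates
# (including objects present in only one of them).
def changed_objects(old_cert, new_cert)
  (old_cert.keys | new_cert.keys).reject { |ocn| old_cert[ocn] == new_cert[ocn] }
end

original = ['Title', 'First paragraph.', 'Second paragraph.']
revised  = ['Title', 'First paragraph, amended.', 'Second paragraph.']

p changed_objects(certificate(original), certificate(revised))  # prints [2]
```

    Only the digests need to be published for such a comparison; the text of the objects themselves does not.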

    In March 2005 with internationalisation in mind, character representation for source documents was switched over to Unicode UTF-8 ... and as a result output is readily available across most languages in: html, XML and SQL database representation (PostgreSQL and SQLite), tested to be OK even for Chinese... LaTeX / PDF output, and ODF, work across several European languages, but need further implementation work for other languages that are not yet covered.

    Open Document Format output was first introduced to a SiSU release late in 2005 (October).

    Manifests that summarise the generated output made available, were also introduced late in 2005 as were Zipped versions of SiSU markup containing all related documents and images (sisupod.zip). These latter being a bit interesting as they gather the constituent parts of a document, which include the source document and any images, (and in the case of multilingual documents, may contain multiple language versions of the source document), in a single zipped file, which can be emailed, and which outputs can also be generated from.

    In 2006 I got to visit Oaxtepec, Mexico for Debconf6

    Alternative XML representations for SiSU markup were introduced in 2006 shortly after Subtech... they provide 3 forms of XML (SAX, DOM and a Node based tree, that can be converted to and from SiSU markup) these work though are largely proof of concept and require further work, especially as regards what the XML should most conveniently be.

    Since the release of SiSU code and features have continued to evolve gently... Over the years many "requirements" have been requested, and incorporated, too many to make mention of here, including amongst them things like "canned search" in the sample cgi search forms to fairly complex footnote alternatives, and alternative XML representations of the input text. Since 2005 (SiSU becoming Software Libre), most of these have been mentioned in the changelog, and a few others may be evident from the Chronology pages dating back to 1993.

    Wookey has been a Debian mentor (he introduced me to Debian packaging, and did uploads subsequent to the initial upload of SiSU). In recent times the greatest indirect support (i.e. not coding/programming or developing SiSU directly, which has to date been a solo effort for around 10 years) has come from the young Daniel Baumann, who is amazing in providing feedback, especially in relation to packaging and things technical in Debian, and who has been extremely generous with his time and expertise.

    It was not until March 2007 that a sample search database was put online, which can be found at ‹http://search.sisudoc.org›

    A rule of thumb for SiSU remains that what it does - the idea, and what it means can be done is more beautiful than the code, which is again a lot more beautiful than these descriptive pages... for which there has been little time and attention, but which indeed I return to and have plans to work on.





    October 3, 1993 Ananse aka the International Trade Law Monitor and then Lex Mercatoria, is live online from this date.

    The origins of SiSU were intertwined with those of the International Trade Law project, first named Ananse (subsequently named the International Trade Law Monitor and then Lex Mercatoria) which was started at the Law Faculty of the University of Tromsø, and had a web presence from this date. From this date the efforts that resulted in SiSU had begun and progress was visible on the Net.

    The project, which presented legal content (conventions, treaties related to international commercial law) on the web through the site LexMercatoria (aka. Ananse, The International Trade Law Monitor) and which resulted in the exploration of the techniques by which this was best done, started out as a single multi-faceted project which began in 1993 at the University of Tromsø. The activities of providing legal information and of developing content generating technologies were conceptually easily distinguishable, though most of the early history of what became SiSU was shared/common (between the law content, and the programming for the generation of documents) until LexMercatoria (the law content of the site, and domain) was acquired in 2000 by the International Law Publishers, Cameron May.

    Lex Mercatoria is dedicated to the provision of information on international commercial law with subsidiary interests in commerce and (mostly open standard) Net technologies that may be of interest to law academics and professionals worldwide. As such Lex Mercatoria provides information and links related to international commerce and trade law. The LM presents the full texts and where relevant country implementation details of several of the most important conventions and other documents used in international trade and commerce. These materials are presented by subject (e.g. free trade, sale of goods, transport, insurance, payment) and chronologically, and the site has information pages on trade related organisations. LM also maintains extensive links to other sites related to the subject of international commerce.

    The subsidiary interests result in a rather large scope of interest for which we try to keep a manageable set of links. Lex Mercatoria is interested in global commerce, both traditional and electronic, and in following the use made of the Web and Net for its promotion. It is interested in the legal and technological infrastructure that exists and that is being developed to facilitate global commerce (both traditional and electronic). More generally Lex Mercatoria is also interested in the means by which paper is replaced electronically in commerce and publishing. Lex Mercatoria is particularly interested in the use of Open Standards and in the availability of adequate information on matters related to the conduct of global commerce. As such interests include:

  • the infrastructure for global commerce, and more generally that which facilitates global commerce, such as:
  • uniform laws and rules for international commerce
  • technological standards for electronic commerce
  • enabling technologies for electronic commerce
  • information technology useful to commerce and law
  • trends related to publishing on the Net and in particular legal publishing
  • open standard file formats
  • alternative citation systems
  • information management
  • the use of open standards (these being identified as ensuring greater inter-operability; and having the potential for providing much greater security and privacy)

    Another attempt to describe Lex Mercatoria's origins and purpose:

    Lex Mercatoria was begun in 1993 at the Law Faculty of the University of Tromsø, in Northern Norway. It was originally named Ananse and then the International Trade Law Monitor. It was the first legal website devoted to a particular subject area (admittedly a general and broad one) namely, international trade and commercial law. Lex Mercatoria provides the text of some of the more important treaties, conventions, model laws, rules aimed at harmonizing international trade/commerce, and sets of links to sites that are of interest for (the working of) international commerce. Lex Mercatoria has continued in its original spirit to grow its independent and egalitarian set of link collections in response to a continuous exploration of the use and implications of the Net for international commercial law, international commerce and publishing. Recognising the problems for information management resulting from the glut of information available on the web an attempt is made to organise and restrict the links provided to those that are likely to be most useful in the area targeted.

    Lex Mercatoria is particularly interested in uses made of the Net (both in international commercial law and in technology related to electronic commerce) for the provision and development of: open (and harmonizing) standards; and for readily available deep and accurate information.

    Always remembering that we are a small unit and will continue to do what we can, we have defined our objective broadly and generously as being:

    "To investigate the potential of W3 as an information resource, with regard to legal research and education. This we plan to do taking a practical example, - focusing on international trade law as a limited and vitally important area of law that is of global interest". [This we shall pursue as far as we are able.]

    This statement of "our objective" dates back to the project's conception in 1993. It ought now to be moderated, but its spirit remains unaltered. Within this time span the Web has proven its worth, independently of any individual's efforts or investigations - its creators apart.

    We however have multiple objectives, which include:

  • "To explore, utilize and demonstrate the potential of the new IT mediums insofar as they pertain to our chosen subject area." (1993) In this there has been an element of figuring out what can be done most effectively/successfully with limited resources. We have stuck to a few basic tools and rules of thumb, and have gained considerable experience in: getting the most out of the basic text markup language of the Web, html, without frills; efficient site management (with the help of Perl); the selection and effective use of basic tools (an editor, markup languages, scripting languages); and the importance of efficiently maintaining cross platform (server and browser) inter-operability - through the selection and careful use of inter-operable and preferably open standards. An outline of our navigation and text presentations is available; for an overview of our contents our home page is your best bet; and we also have general pages on key technologies which may be of interest.
  • Towards greater: transparency; harmonization and unification; and uniformity of application - in international trade/ commerce (law). (1995)
  • The area of attention of Lex Mercatoria has expanded somewhat with the developments in use of the Net as they pertain to international commerce, a short description is attempted in the next section.

    The history and more general information on LexMercatoria may be found at ‹http://www.lexmercatoria.org/› or ‹http://www.jus.uio.no/lm/› its home pages, or more specifically off information pages on the site ‹http://www.jus.uio.no/lm/lm.information/toc.html›



    July - August, 1994 The first steps towards automation on the Trade Law Monitor, a number of Perl scripts, for presentation of convention texts by Tommy Johansen.



    January 1995 We were visited by the Director Professor Nicholas Triffin and Executive Secretary Albert Kritzer, of the Institute of International Commercial Law (IICL), Pace University School of Law. The IICL, under the direction of Professor Albert Kritzer, are engaged in a Project on the United Nations Convention on Contracts for the International Sale of Goods. Professor Kritzer has significant publications on this Convention.

    This visit was the most important event to happen to the International Trade Law site at the time, and generated a lot of positive press.

    11th April 1995 Volume of PC Magazine   276  our "Trade Law Library Page"   277  was selected as one of PC Magazine's top 100 Web Sites. The one other law site selected was The Legal Information Institute at Cornell University   278 


    PC Magazine

    16th June 1995 New/reorganized, less cluttered: International Trade Law - Home Page. All work from this time is done on a "new" second server, which remains unofficial. The original server is still open to the public. An attempt is made to maintain both servers. The intention is that data transfer and mirroring between the new NT and original UNIX HP server running NCSA Mosaic Server should be seamless.

    August, 1995 Extensively listed by the Yale University United Nations Scholars' Workstation.

    August 12, 1995 (1) Decision made to ensure that the ITL site is portable and not tied physically to any given location. (2) Decision to transfer from UNIX to Windows NT platform. All substantial additions and changes since mid-June have been on this server.

    17th November 1995 "Evaluation of the ITL"   279  positive Project evaluation of the ITL (International Trade Law Project) by Professor Olav Torvund, of the Norwegian Research Center for Computers and Law, for the Information Technology, Oslo. ITL presentation of International Trade Law materials on The Internet using World Wide Web. This also gives a history of the effort.



    Ralph Amissah - SubTech: attended the "Fourth International Conference on Substantive Technology in Law School and Law Practice", hosted by the University of Quebec, Montreal.

    August, 1996 US Library of Congress - complimentary remarks on the work up to this time.

    US Library of Congress   280  "Guide to Law Online Linking Page to:

    INTERNATIONAL TRADE LAW SITES

    INTERNATIONAL TRADE LAW Treaties, etc. (from Tromsø, Norway) This superb site created by Ralph Amissah and hosted by the University of Tromsoe, in Norway, is one of the very finest law Web sites in the world. It provides an extensive list of international trade conventions and related instruments, including rules and model Laws, and often provides hypertext access to the full texts. This basic list is arranged by decades, but the component lists, arranged by topic, may often be more useful, and may be accessed directly ..." [bold text added for emphasis] verified 06/1997 (till 02/2001)


    U.S. Library of Congress



    February 7, 1997 "On the Net and the liberation of information that 'wants' to be free"   281  published on ITL - updated February 17 and submitted for paper publication. Work paper contributed to the publication prepared in commemoration of the 10th Anniversary of the Law Faculty of the University of Tromsø.  282  Now I must confess that I did not know about the FSS or OSS at the time, and it would have been a good thing/useful to incorporate these ideas.

    This article was attached as part of an official submission to the judges of the Washington Supreme Court and Court Commissioner's Office, prepared by Mr. Bradley Hillis, Office of the Administrator for the Courts, State of Washington, U.S., in October 1997.

    March 14, 1997 Rudimentary Electronic Citation System and Electronic ID or Document Verification System complete. Presentation of Article "On the Net..." provided as a practical example, the substantive text remains the same as that of February 17. Electronic Citation is mentioned in passing within the article that is provided as an example, at e§ 75 and e§ 148. These are the numbers found at the end of "paragraphs" marked ecs § # in a ghost or shadow colour and in superscript, or: "ecs § ..." [this was subsequently changed to { 75 } and then just the number displayed in the page margin]. Although provided as hyperlinks here, these numbers are particularly useful in written citation of the text, as different browsers format html texts differently, and most browsers print the same document out in different font sizes, resulting in different page numbering. After some consideration, it appears to me that for the present time the preferred citation system is the simplest, numbering sequentially all elements of the substantive document - including title, author, headings and paragraphs. Anything else requires decisions as to what may be best and why and how to achieve uniformity of adoption. For example should headings be numbered differently from ordinary text? What about the author's numbering of such headings if any? Should sub-headings be numbered differently from headings, why not? If so how? Until such questions are decided, this is my take, our "ghost 'paragraph' numbering" will be incorporated into future texts presented at this site without other suitable means of referencing.
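
    The "simplest" scheme described above, numbering sequentially all elements of the substantive document, can be sketched in a few lines of Python. This is an illustration only, not the scripts used at the time: the function names are invented for the example, and it assumes that text objects (title, author, headings, paragraphs) are separated by blank lines.

```python
import re


def number_objects(text):
    """Split text into blank-line-separated objects and number them.

    Every object (title, author, heading, paragraph) gets the next
    integer, starting at 1, with no separate scheme for headings.
    Blank-line separation is an assumption of this sketch.
    """
    blocks = re.split(r"\n\s*\n", text)
    objects = [block.strip() for block in blocks if block.strip()]
    return list(enumerate(objects, start=1))


def cite(numbered, number):
    """Look up the object that a citation number refers to."""
    return dict(numbered)[number]
```

    Because the number depends only on the order of objects in the source text, the same citation resolves to the same object however a browser or printer lays out the page.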

    March 1997 At this time I am also working on an Electronic ID or Document Verification System, using the hash value of the ASCII content of a text to ascertain that a published version has not been changed. The tools are already available to do this, but it is a new idea, and a challenge, for me.
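
    The idea of using a hash value of the text content to check that a published version has not been changed can be illustrated with a short Python sketch. The details here are assumptions made for the example and are not taken from the original system: the choice of SHA-256, the collapsing of whitespace before hashing, and the function names.

```python
import hashlib


def content_digest(text):
    """Return a hex digest of the text's content.

    Whitespace runs are collapsed first so that re-wrapping lines does
    not change the digest; this normalisation is an assumption.
    """
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def is_unchanged(text, published_digest):
    """True if text still matches the digest recorded at publication."""
    return content_digest(text) == published_digest
```

    The digest would be recorded when a text is published; anyone holding a copy can later recompute it and compare, and any change to the wording produces a different value.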

    Summer 1997 "Missing Specifications in International Sales, Article 65 of the CISG" published in the Pace International Law Review  283 

    September 1997 Paper: The Autonomous Contract: Reflecting the borderless electronic-commercial environment in contracting. Presented at the XIII Nordic Conference on Legal Informatics 17th - 19th September 1997 and published in "Elektronisk handel - rettslige aspekter. Nordisk årsbok i rettsinformatikk 1997" (Electronic Commerce - Legal Aspects. The Nordic yearbook of Legal Informatics 1997) edited by Randi Punsvik. ISBN 82 518 3686 7.

    October 1997 "On the Net"   284  article was attached as part of an official submission to the Judges of the Washington Supreme Court and Court Commissioner's Office that was prepared by Mr. Bradley Hillis; Office of the Administrator for the Courts, State of Washington, USA.

    December, 1997 Ralph Amissah - paper: Missing Specifications in International Sales: Article 65 of the United Nations Convention on Contracts for the International Sale of Goods, 9 Pace International Law Review (1997) 239-255, December 1997.



    1998 All work done off-line. One site update in April. Updating continued after that offline.

    January 1998 Ralph Amissah - Guest Speaker at the Association of American Law Schools Annual Meeting, San Francisco, by invitation and under the sponsorship of the National Center for Automated Information Research. Topic: Thinking and Teaching about Law in A Global Context as an Exercise in Common Enterprise. (presentation of "The Trade Law Monitor: Recognizing, Understanding and Taking Advantage of the Discontinuity in Information Dissemination that the Net Represents"). With respect to the substantive technology related to the project, a few ideas related to and implemented in SiSU at the time were presented (the name SiSU came later). These included citation independent of format, that is, independent of page numbers: everything is numbered sequentially as an object (headings, paragraphs etc.), except footnotes/endnotes, which belong to the object/paragraph that references them (and which were not numbered sequentially as they could be either footnotes or endnotes). The point being that work taking place at the time to set rules for distinguishing headings and other objects and numbering them differently added little value and was more of a hindrance than an aid. Also presented was document authentication (which less has been done with, but is also evident in the work).

    April 1998 SiSU pre-processing of standard form documents against termsheets to produce Banking legal documentation sets.

    somewhere in 1998 Finally understood that OSS and FSS work, and that they have produced and continue to produce some of the very best software in existence today  286 

    somewhere in 1998 Generated/published a version of "Tainaron - Mail from another city" by Leena Krohn   287  using SiSU   288 

    from late 1998 - 1999 Extensive work with "rationalising" the design and maintenance of the site - close to 90% of the site as a result is automatically generated from various Perl scripts that identify what to do with each text. Optimisation for more recent versions of the browsers: Opera; Internet Explorer; and Netscape (in roughly that order). Scripts do large batches. Finally an easy/convenient way to handle tables.



    February 1999 Decision made: Gnu/Linux identified as the most attractive way forward. Perl works as it should on the platform. I have had a good time with NT but it is resource hungry. (More recently I hear MS has plans to do something to address its shortcomings in the Perl department.)  289 


    a better way

    March 1999 Lex Mercatoria site down. Critical hard disk failure. Have been working on a new site - all texts being generated by Perl scripts, which greatly improve the ease of maintenance. A trip to Norway is called for. Question is whether to get the old site back up, or push on to have the new site ready as soon as possible.

    8th March 1999 Ralph Amissah - made a Fellow of the Institute of International Commercial Law, School of Law, Pace University, White Plains, NY, USA

    17th May 1999 New site is ready, planned hosting in Norway and the US as detailed in the credits at the bottom of the pages.

    More efficient techniques used in creating the site.

    May 27-29, 1999 Lex Mercatoria back on the air and grateful to the Law Faculty of the University of Oslo for hosting the site. Somewhat streamlined, possibly slightly smaller than we were and for the time being, but technically superior to anything that we have been (construction of the site is fully automated with only one page being manually constructed) and with the potential to become better yet. At this time the home page is the only manually generated page on the site, which is once again hosted on a UNIX platform (Sun Solaris running Apache) which happens to be what the University of Oslo uses.

    May Scripts (numbering system etc.) have been used at the request of Albert Kritzer and Richard Hainebach to produce a Kluwer text: Uniform Law for International Sales, Sales under the 1980 United Nations Convention, Third Edition by John O. Honnold, Schnader Professor of Commercial Law Emeritus University of Pennsylvania, Secretary, UNCITRAL, and Chief, U.N. International Trade Law Branch, 1969 - 1974, Kluwer Law International. Also kindly made available by Kluwer for testing of scripts: International Project Finance by Hoffman. At some point content from the Trade Law Project (prepared by our scripts) is noticed within the Kluwer Arbitration site; I did not have a problem with this, but the direction of content flow should remain clear.

    2nd June 1999 LexMercatoria regenerated with first set of "bugs" cleared; most documents should now have titles, which are required for meaningful query results from the search engine. (Any fresh bugs will be corrected in the next update.)

    14th July 1999 There has been quite an extensive update of the site though much remains to be done. For a trial period of three weeks we will try to wean you off our old home page and trust you will be able to find your way about our new one. If your browser supports redirection, you will be redirected to the auto-generated page one minute after the old home page has been fully loaded. Unless there is good reason to reconsider we are likely to phase out the old home page, in time.

    Download times for the site would speed up considerably if we dropped the use of tables on long documents, and we are considering this. This is particularly noticeable if you (like myself at present) are not amongst the privileged with broadband Net access. There are bound to be a few bugs. Not all files have yet been transferred from the old site to the new, though the new site contains a more up to date set of documents. Our old file system was insensitive to case, the new file system is case sensitive, some links may not yet be fully compliant. Patience, these and any other issues will be addressed.

    6th December 1999 Another new interface for the site is under test, the result of another generation of improvement in our site building tools (collectively fondly nicknamed SiSU). Information on the text presentations and navigation is available  290  . There is much greater consistency in presentation and viewing should have been enhanced and (for the most part) made faster, across most graphical browsers and platforms. What we unfortunately do not provide examples of, and so you will not see, is that it is particularly well suited to the electronic publication of books, and has been tested on several legal academic and practitioners' texts of over 500 pages. In parts of the site there are likely to be some "bugs"; these, however bad they look, should from a technical standpoint be minor to correct.

    Status as of year end 1999 The document providing information on the text presentations and navigation contains a summary of the year from that perspective which is copied below:

    The site has undergone a facelift for the Millennium, but in most respects our focus with regard to the presentation of documents has remained the same. We hope it results in an improved user experience.

    In 1993 we boldly set out amongst other things:

    "To explore, utilize and demonstrate the potential of the new IT mediums insofar as they pertain to our chosen subject area."

    We have largely achieved this goal in demonstrating how various complicated legal (and other) documents of different content, structures and sizes can be presented on the Net using simple html.

    If we have been limited in the possibilities that we have explored and utilized, our path has been selected by figuring out what could be achieved most effectively/ successfully with limited resources. We have stuck to a few basic tools and rules of thumb, and have gained considerable experience in: getting the most out of the basic text markup language of the Web, html, without frills; efficient site management; the selection and effective use of basic tools (an editor, markup languages, scripting languages); and how to efficiently maintain cross platform (server and browser) compatibility in our product, through the selection and careful use of inter-operable and preferably open standards, and focus of effort on (few of) what we determine to be key complementary technologies. Our approach has been to identify simple, effective and efficient tools and solutions and to get the most out of them. In effect we have been exploring what can be made of technologies that are available to anyone on the Net. We have also kept an eye on other IT technologies that we do not necessarily use but provide for your perusal and benefit through the maintenance of an information technology compendium.

    In the construction of this site our primary focus has, since the outset (1993), been on presenting texts using html in a convenient manner. It has in part represented an experiment in how best this might be done for our purposes. The results remain as good as can be found anywhere for publications using html 4.0.

    Our aim has been to be able to create, provide and efficiently maintain high quality usable presentations of texts (legal, academic, practitioner's, & including conventions, rules, contracts) whilst avoiding unnecessary complexity. Indeed, so far this has been achieved using the most basic of markup languages on the Net, plain html, with the help of Perl scripts  291  for its transformation from ASCII.

    Our 1996 list of design criteria for text presentations has now been met and implemented consistently throughout the site [though a few bugs may still remain]. Whilst most individual requirements set were met as early as 1997, presentations have been continuously improved upon. The rationalisation of how best to achieve consistent presentation across various types of text, and its implementation, is a feature of 1999.  292  An idea of these criteria may be gleaned from the contents of this document.

    The year's changes improve the site and provide greater utility from text presentations, including: greater consistency between different types of presentation; improved navigation of the site and individual texts; faster loading and better rendition of texts across different types of browser, the main ones we support being Opera, Internet Explorer, Netscape Navigator, (and we expect Konqueror).

    The programs that generate the site have been tested on several books (academic and practitioner's texts) of over 500 pages, and the results are particularly well suited for their electronic presentation. The text navigation and presentation features (generated by the site generation program) come into their own on these longer texts, in which it is easy to appreciate the utility of the resulting document presentations.

    So on the technical front we are now, in a sense, free to set new goals, and indeed may look in a number of additional directions. The site has concentrated on making the most of html presentations across most modern browsers, and without making concession to having different presentations for different types of browser. In future we may also present texts in RTF and possibly pdf, but our primary additional focus will be on XML and we will look at xhtml. PHP, being open source and designed for cross-platform functionality, is of interest. We may if requested go back to having (in addition) html presentations without our paragraph numbering. In mentioning these possibilities we perhaps run a bit ahead of ourselves, as far as this text is concerned.

    Introduce a navigation page describing how to use the auto-generated pages on Lex Mercatoria ‹http://www.jus.uio.no/lm/navigation/doc.html›

    "Always remembering that we remain a small unit and will continue to do what we can."



    16th February 2000 First read about Ruby, around this date (appears in diary), with the comment "just read of, apparently combines the best features of Perl and Python".  293  Immediately installed Ruby,  294  and started reading ruby-talk.  295 

    28th April 2000 Ruby Talk item lists my having voted for the Ruby Newsgroup by this date.  296 

    June 2000 Ralph Amissah - paper Revisiting the Autonomous Contract presented at the Schmitthoff Symposium 2000, Law and Trade in the 21st Century, Legal Problems in International Business at the Dawn of the New Millennium, held by The Centre for Commercial Law Studies, Queen Mary and Westfield College, University of London.

    July 2000 Ralph Amissah - at LII, Cornell Law School, (Professor Tom Bruce and Professor Peter Martin) "Summit" for 18 participants on Emerging Public Legal Information Standards, session leader for Site Structuring for an International Audience.

    8th July 2000 Lex Mercatoria ‹http://www.lexmercatoria.org/› is acquired by Cameron May, internationally renowned law publishers and conference organizers. Ralph Amissah, the original site author and owner, remains actively involved with the site. The programs (SiSU) that were and continue to be used to generate Lex Mercatoria remain with Ralph Amissah.
