
SiSU - SiSU information Structuring Universe - Structured information, Serialized Units,
Ralph Amissah

Structured information, Serialized Units

SiSU - from less markup than the most elementary equivalent html, you can have more

1. Description

1.1 Outline
1.2 Short summary of features
1.3 How it works
1.4 Simple markup
1.4.1 Sparse markup requirement, try to get the most out of markup
1.4.2 Single markup file provides multiple output formats
1.4.3 Syntax relatively easy to read and remember
1.4.4 Kept simple by having a limited publishing feature set, and features identified as most important, are available across several document types
1.5 Designed with usability in mind
1.6 Code separate from content
1.7 Object citation numbering, a text or object positioning / citation system - "paragraph" (or text object) numbering, that remains same and usable across all output formats by people and machine
1.8 Handling of Dublin Core meta-tags making use of the Resource Description Framework
1.9 Easy directory management
1.10 Document Version Control Information
1.11 Table of contents
1.12 Auto-numbering of headings
1.13 Numbering and cross-hyperlinking of endnotes
1.14 "Skinnable"
1.15 Multiple Outputs
1.15.1 html - several presentations: full length & segmented; css & table based
1.15.2 EPUB
1.15.3 XML
1.15.4 ODT:ODF, Open Document Format - ISO/IEC 26300:2006
1.15.5 PDF - portrait and landscape, (through the generation of LaTeX output which is then transformed to pdf)
1.15.6 Search - loading/populating of relational database while retaining document structure information, object citation numbering and other features (currently PostgreSQL and/or SQLite)
1.15.7 Search - database frontend sample, utilising database and SiSU features, including object citation numbering (backend currently PostgreSQL)
1.15.8 Other forms
1.16 Concordance / Word Map or rudimentary index
1.17 Managed (document) directory, database, or site structure
1.18 Batch processing
1.19 Integration to superior Gnu/Linux and Unix tools
1.19.1 Backup and version control
1.19.2 Editor support
1.20 Modular design, need something new add a module

2. Markup and Output Examples

2.1 Markup examples
2.2 A few book (and other) examples
2.2.1 "Viral Spiral", David Bollier
"The Wealth of Networks", Yochai Benkler
"Two Bits", Christopher Kelty
"Free Culture", Lawrence Lessig
"CONTENT", Cory Doctorow
"Democratizing Innovation", by Eric von Hippel
"Free as in Freedom: Richard Stallman's Crusade for Free Software", by Sam Williams
"Free For All: How Linux and the Free Software Movement Undercut the High Tech Titans", by Peter Wayner
"The Cathedral and the Bazaar", by Eric S. Raymond
"Down and out in the Magic Kingdom", Cory Doctorow
"Little Brother", Cory Doctorow
"For the Win", Cory Doctorow
"Accelerando", Charles Stross
"Tainaron", Leena Krohn
"Sphinx or Robot", Leena Krohn
"War and Peace", Leo Tolstoy, PG Etext 2600
"Don Quixote", Miguel de Cervantes [Saavedra], translated by John Ormsby, PG Etext 996
"Gulliver's Travels", Jonathan Swift, transcribed from the 1892 George Bell and Sons edition by David Price, PG Etext 829
"Alice's Adventures in Wonderland", Lewis Carroll, PG Etext 11
"Through The Looking-Glass", Lewis Carroll, PG Etext 12
"Alice's Adventures in Wonderland" and "Through The Looking-Glass", Lewis Carroll, PG Etexts 11 and 12
"Gnu Public License 2", (GPL 2) Free Software Foundation
"Gnu Public License v3 - Third discussion draft", (GPLv3) Free Software Foundation
"Debian Social Contract"
"Debian Constitution v1.3", (simple/default markup)
"Debian Constitution v1.3", (markup adjusted for output to more closely match the original)
"Debian Constitution v1.2", (simple/default markup)
"Debian Constitution v1.2", (markup adjusted for output to more closely match the original)
"A Uniform Sales Terminology", Vikki Rogers and Albert Kritzer
"The Autonomous Contract" 1997 - markup sample
"The Autonomous Contract Revisited" - markup sample
"United Nations Convention on Contracts for the International Sale of Goods"
/PECL/ the "Principles of European Contract Law"
2.3 SQL - PostgreSQL, SQLite
2.4 Lex Mercatoria as an example
2.5 For good measure the markup for a document with lots of (simple) tables
2.6 And a link to the output of a reported case

3. A Checklist of Output Features

4. Introduction to SiSU Markup

4.1 Summary
4.2 Markup Examples
4.2.1 Online
4.2.2 Installed

5. Markup of Headers

5.1 Sample Header
5.2 Available Headers

6. Markup of Substantive Text

6.1 Heading Levels
6.2 Font Attributes
6.3 Indentation and bullets
6.4 Footnotes / Endnotes
6.5 Links
6.5.1 Naked URLs within text, dealing with urls
6.5.2 Linking Text
6.5.3 Linking Images
6.6 Grouped Text
6.6.1 Tables
6.6.2 Poem
6.6.3 Group
6.6.4 Code
6.7 Book index

7. Composite documents markup

Markup Syntax History

8. Notes related to Files-types and Markup Syntax

9. Commands Summary

9.1 Description
9.2 Document Processing Command Flags

10. command line modifiers

11. database commands

12. Shortcuts, Shorthand for multiple flags

12.1 Command Line with Flags - Batch Processing

Technical Information

13. Technical notes

13.1 See abandoned U.S. Provisional Patent Application

14. Diagram / Chart

14.1 The Chart
14.2 I/O
14.3 The Program
14.4 Software utilised
14.4.1 SiSU
14.4.2 SiSU Modules

15. SiSU development environment and technologies of interest, including data formats

15.1 Development environment, Debian
15.2 Programming language, Ruby
15.3 SGML & XML Family
15.3.1 SGML
15.3.2 XML Family
15.4 TeX Family
15.5 Pdf
15.6 Relational Databases, SQL
15.7 Other Databases
15.8 Text Search
15.9 Character Encoding, Unicode
15.10 Information Visualization
15.11 Metadata - semantic
15.12 Syndication, Web feed formats
15.13 Other
15.14 Editors
15.15 Version Control
15.16 Licenses

A Summary of notable events

16. A history of SiSU and its outputs including search

A Chronological history of developments on SiSU

1993

1994

1995

1996

1997

1998

1999

2000

2001

2002

2003

January
February
March
April
June
July
August
September
November
December

2004

January
February
March
April
May
June
July
August
September
October
November
December

2005

January
February
March
April
May
June
July
August
September
October
November
December

2006

January
February
March
April
May
June
July
August
September
October
November
December

2007

January
February
March
April
May
June
July
August
September
November
December

2008

January
February
April
June
September
October
November
December

2009

January
December

2010

March

FAQ, Howto, Installation, etc.

HowTo

17. Getting Help

17.1 SiSU "man" pages
17.2 SiSU built-in help
17.3 Command Line with Flags - Batch Processing

18. Setup, initialisation

18.1 initialise output directory
18.1.1 Use of search functionality, an example using sqlite
18.2 misc
18.2.1 url for output files -u -U
18.2.2 toggle screen color
18.2.3 verbose mode
18.2.4 quiet mode
18.2.5 maintenance mode intermediate files kept -M
18.2.6 start the webrick server
18.3 remote placement of output

19. Configuration Files

20. Markup

20.1 Headers
20.2 Font Face
20.2.1 Bold
20.2.2 Italics
20.2.3 Underscore
20.2.4 Strikethrough
20.3 Endnotes
20.4 Links
20.5 Number Titles
20.6 Line operations
20.7 Tables
20.8 Grouped Text
20.9 Composite Document

21. Change Appearance

21.1 Skins
21.2 CSS

Extracts from the README

22. README

22.1 Online Information, places to look
22.2 Installation
22.2.1 Debian
22.2.2 RPM
22.2.3 Source package .tgz
22.2.4 to use setup.rb
22.2.5 to use install (prepared with "Rake")
22.2.6 to use install (prepared with "Rant")
22.3 Dependencies
22.4 Quick start
22.5 Configuration files
22.6 Use General Overview
22.7 Help
22.8 Directory Structure
22.9 Configuration File
22.10 Markup
22.11 Additional Things
22.12 License
22.13 SiSU Standard

Extracts from man 8 sisu

23. Post Installation Setup

23.1 Post Installation Setup - Quick start
23.2 Document markup directory
23.2.1 Configuration files
23.2.2 Debian INSTALLATION Note
23.2.3 Document Resource Configuration
23.2.4 Skins

24. FAQ - Frequently Asked/Answered Questions

24.1 Why are urls produced with the -v (and -u) flag that point to a web server on port 8081 ?
24.2 I cannot find my output, where is it?
24.3 I do not get any pdf output, why?
24.4 Where is the latex (or some other interim) output?
24.5 Why isn't SiSU markup XML
24.6 LaTeX claims to be a document preparation system for high-quality typesetting. Can the same be said about SiSU?
24.7 Can the SiSU markup be used to prepare for a LaTex automatic building of an index to the work?
24.8 Can the conversion from SiSU to LaTeX be modified if we have special needs for the LaTeX, or do we need to modify the LaTeX manually?
24.9 How do I create GIN or GiST index in Postgresql for use in SiSU
24.10 Are there some examples of using Ferret Search with a SiSU repository?
24.11 Have you had any reports of building SiSU from tar on Mac OS 10.4?
24.12 Where is version 1?
24.13 What is the difference between version 1 and 2?

Installation

25. Installation

25.1 Debian
25.2 Other Unix / Linux
25.2.1 source tarball

26. SiSU Components, Dependencies and Notes

26.1 sisu
26.2 sisu-complete
26.3 sisu-examples
26.4 sisu-pdf
26.5 sisu-postgresql
26.6 sisu-remote
26.7 sisu-sqlite

27. Quickstart - Getting Started Howto

27.1 Installation
27.1.1 Debian Installation
27.1.2 RPM Installation
27.1.3 Installation from source
27.2 Testing SiSU, generating output
27.2.1 basic text, plaintext, html, XML, ODF, EPUB
27.2.2 LaTeX / pdf
27.2.3 relational database - postgresql, sqlite
27.3 Getting Help
27.3.1 The man pages
27.3.2 Built in help
27.3.3 The home page
27.4 Markup Samples

28. SiSU Components, Dependencies and Notes

29. Breakage and Fixes

31st October 2006 - SiSU < 0.48.3 break against Ruby > 1.8.5-3, break on cyclic include; Fixed SiSU: >=0.48.3 (see notes)
21st September 2005 - Avoid ruby-1.8.3 (2005-09-21) and (2005-10-12), Ruby Segfaults; Fixed: later versions of Ruby (see notes)

License, Standard

30. License

31. Things SiSU Standard

Download information

32. Download SiSU - Linux/Unix

SiSU Current Version - Linux/Unix
Source (tarball tar.gz)
Git (source control management)
Debian
RPM

Changelog - sisu

33. SiSU Version Manifest / changelog

Current version
3.0
Previous versions
2.7
2.6
2.5
2.4
2.3
2.2
2.1
2.0
1.0
0.71
0.70
0.69
0.68
0.67
0.66
0.65
0.64
0.63
0.62
0.61
0.60
0.59
0.58
0.57
0.56
0.55
0.54
0.53
0.52
0.51
0.50
0.49
0.48
0.47
0.46
0.45
0.44
0.43
0.42
0.41
0.40
0.39
0.38
0.37
0.36
0.35
0.34
0.33
0.32
0.31
0.30
0.29
0.28
0.27
0.26
0.25
0.24
0.23
0.22
0.21
0.20
0.18
0.16
0.14
0.12
0.10
0.8
0.6
0.4
0.2
0.1
Release

Changelog - sisu-markup-samples

34. Version Manifest / changelog - SiSU Markup Samples

Current version
2.0
1.1
1.0

Method for providing digital documents including a common citation structure

[SiSU Provisional Patent Application of 2004 based on much older idea and work on SiSU, Abandoned]

The 'Invention' described (and diagrams) by Ralph Amissah.
Provisional patent application text prepared by Stephan Filipek of Winston & Strawn LLP

35. 1. Background

36. 2. Definitions

37. 3. Brief Descriptions of the Drawings

38. 4. Detailed Description of the Preferred Embodiments

39. 5. Document Processing, examples of subsequent steps

40. 6. Advantages of the Invention

41. 7. THE CLAIMS

Post Filing Appendix

42. Post Filing Appendix: Reasons for Abandonment of Patent Process Claim

Endnotes

Metadata

SiSU Metadata, document information

Manifest

SiSU Manifest, alternative outputs etc.

Method for providing digital documents including a common citation structure

[SiSU Provisional Patent Application of 2004 based on much older idea and work on SiSU, Abandoned]

The 'Invention' described (and diagrams) by Ralph Amissah.
Provisional patent application text prepared by Stephan Filipek of Winston & Strawn LLP

39. 5. Document Processing, examples of subsequent steps

Figure 3 is a flowchart including the process of Figure 1 and several downstream processes that make use of the output provided by the present method. In particular, the output 36 from the processing technique 10 can be utilized to generate an html document 100, an XML document 200, a LaTeX document 300, word maps and/or indexes 400, RSS feeds 500, and to provide data for a relational database 600. It should be understood that the output formats described herein are exemplary only, and that other output formats are possible and could be added. For example, additional types of output modules that utilize the output 36 of the technique 10, such as a Lout or a texinfo module, could be added to obtain other desired types of output document.

For HTML output the program must process each object. If a heading of a particular level exists, it is used to build a table of contents and is given the appropriate html markup, e.g. a bold or heading value. If segmented html is to be produced, the document is split at the default level (level 4), and each segment takes the name associated with its level 4 heading, or is assigned a name if none is given. Endnotes are taken from the text object in which they are embedded, and placed either at the end of the segment, or in a segment of their own at the end of the document, or both. Native markup is converted to html equivalents, e.g. bold, underscore and the like. Object citation numbering (created earlier in the processing technique 10 of Figure 1) is made available (currently the numbers appear in the page margin beside the Object to which they belong). Document appearance modifiers are applied if instructed in the header. Semantic metadata is placed in the HTML header and made available for searching, and is also provided in visible form at the end of the document, as additional document information.
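The heading handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the TextObject class, its field names, and the two helper functions are hypothetical and are not taken from the program itself; only the default split level of 4 comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextObject:
    """Hypothetical stand-in for one processed text object."""
    ocn: int                              # object citation number
    text: str
    heading_level: Optional[int] = None   # None for ordinary paragraphs


SEGMENT_LEVEL = 4  # default split level named in the text


def build_toc(objects: List[TextObject]) -> List[Tuple[int, str, int]]:
    """Collect (level, text, ocn) for every heading, in document order."""
    return [(o.heading_level, o.text, o.ocn)
            for o in objects if o.heading_level is not None]


def split_segments(objects: List[TextObject],
                   split_level: int = SEGMENT_LEVEL) -> List[List[TextObject]]:
    """Start a new segment at each heading of split_level.

    Objects that precede the first such heading form a leading segment.
    """
    segments: List[List[TextObject]] = []
    for o in objects:
        if o.heading_level == split_level or not segments:
            segments.append([])
        segments[-1].append(o)
    return segments
```

Endnote placement and the conversion of native markup are left out of the sketch; they would operate on each segment after the split.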

For LaTeX output, which is used to generate pdf output, appropriate LaTeX headers are created. These are usually the program's default standard, and instruct that there should be a table of contents, how page numbering should look, and so on. Any additional processing instructions are used to mark up that document, for example that there should be a new page or a new column between specified levels. LaTeX markup is applied to the document in place of native markup, including the markup of appropriate headings in LaTeX style, from which LaTeX will make its own table of contents as instructed, and the markup of footnotes/endnotes. Object Citation Numbering is made visible in a way convenient for a common citation system (currently in the text margins, beside the Object to which the numbers belong). It is noted that there is a Lout module that operates in a similar manner to produce Lout output that is used to generate a pdf file output.

For XML the program must process each Object. The document structure is used to define the XML structure. XML structure may be embedded within different levels of elements, with appropriate tagging. It has been convenient to separate structural levels 1 to 3, and subsequent levels 4 to 6 (or 8 as appropriate). Alternatively, a flatter version of XML may be produced, where heading level information is provided as an attribute. Endnotes are either kept embedded where they occur and XML tagged appropriately, or placed after the text of the object in which they occur with their appropriate tags, and with a marker within the text showing where they occur. Object Citation Numbers are tagged and placed in the Object to which they belong. Semantic metadata is placed at the head of the document.
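The flatter XML variant mentioned above, with heading level carried as an attribute, might look something like the sketch below. The element and attribute names (document, head, body, object, ocn, heading_level) are assumptions made for illustration and are not the program's actual schema.

```python
import xml.etree.ElementTree as ET


def to_flat_xml(title, objects):
    """Build a flat XML document.

    objects: iterable of (ocn, heading_level_or_None, text).
    All element and attribute names here are hypothetical.
    """
    root = ET.Element("document")
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = title   # semantic metadata at the head
    body = ET.SubElement(root, "body")
    for ocn, level, text in objects:
        # Every object carries its object citation number.
        obj = ET.SubElement(body, "object", ocn=str(ocn))
        if level is not None:
            # Heading level is an attribute rather than a nesting depth.
            obj.set("heading_level", str(level))
        obj.text = text
    return ET.tostring(root, encoding="unicode")
```

A nested variant would instead open and close container elements as the heading level rises and falls; the flat form avoids that bookkeeping at the cost of leaving the hierarchy implicit.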

For building indexes or Word Maps, the output document is scanned (if instructed to) against a prepared array of relevant subject specific terms. If no such prepared list is provided (and in any event for words not contained in the subject specific list), it is scanned to identify each unique word (that has not been excepted in a list of exceptions). An alphabetic list of all relevant words and/or terms found in a document is prepared and presented, displaying against each word or term the Object Citation Numbers of the locations within the document at which that word or term is to be found. The Object Citation Number may be displayed as a live link to the location within the document to which it refers. Footnotes are indexed as part of the Object from which they are referenced, and indexed words/terms occurring within them are listed under the Object Citation Number of the Object to which they belong.
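A minimal sketch of the unique-word scan described above is given here. It assumes each object arrives as an (ocn, text) pair with any footnote text already appended to the text of its parent object; the function name and the tokenising pattern are illustrative choices, and matching against a prepared list of subject specific terms is not covered.

```python
import re
from collections import defaultdict


def build_word_map(objects, exceptions=frozenset()):
    """Map each word to the object citation numbers where it occurs.

    objects: iterable of (ocn, text) pairs.
    exceptions: lower-case words to leave out of the map.
    Returns an alphabetically sorted list of (word, [ocn, ...]).
    """
    index = defaultdict(set)
    for ocn, text in objects:
        # Lower-case words, allowing one internal apostrophe (e.g. "don't").
        for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            if word not in exceptions:
                index[word].add(ocn)
    return [(word, sorted(ocns)) for word, ocns in sorted(index.items())]
```

Each number in the result could then be rendered as a link to the matching object in whichever output format is being viewed.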

For building RSS feeds, semantic metadata from the input document 12 is used and an RSS feed is created as instructed, based for example on the document's subject and date. Abstract or notes in semantic markup are used to describe the document, and links are created as instructed to the other types of output created.
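As a rough illustration of the feed step, the sketch below builds an RSS 2.0 channel from per-document metadata. The dictionary keys (title, date, subject, abstract, outputs) and the channel description text are assumptions for the example, not names used by the program; only the html output is linked here.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, documents):
    """Return an RSS 2.0 feed as a string, newest document first.

    documents: list of dicts with the hypothetical keys
    'title', 'date' (datetime.date), 'subject', 'abstract', 'outputs'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Documents by subject and date"
    for doc in sorted(documents, key=lambda d: d["date"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "link").text = doc["outputs"]["html"]
        ET.SubElement(item, "description").text = doc.get("abstract", "")
        ET.SubElement(item, "category").text = doc["subject"]
        d: date = doc["date"]
        # RSS 2.0 expects RFC 822 style dates; midnight UTC is assumed.
        stamp = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(stamp)
    return ET.tostring(rss, encoding="unicode")
```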

In the case of populating a Relational Database (for example, a SQL database, such as PostgreSQL), different structures may be used to represent the document in the database, but the following structure has proved useful. If they do not already exist from previous processing, four tables are created:

  • One table with a record/row per document, to contain the document headers. Semantic metadata is stored one column per item, and as a minimum the document title is required. However, for additional search possibilities, the more semantic data there is available about the document the better. Markup instructions may also be stored in their own columns (if the option of recreating the original input document is desired).
  • A second table, with the content of the document, containing one record/row per Object. Here, the columns contain at least all structurally relevant information (e.g. if the Object is a heading, then information about what level heading it is), the object citation number (if any), information on the footnotes associated with that Object, and, cross linked to the first table, an identifier showing that the content belongs to a specific document in table 1. As database input is automatic (a machine process using the output 36 of the Processing method 10 to populate the database), it is convenient to have the content entered in two separate columns: one clean, without markup, in an indexed column for searches; the other with some form of standard markup for viewing (such as basic HTML where font alterations, such as emphasis, have been applied). The two columns are necessary to be able to recreate the source document if display markup has been included in it.
  • A third table is used to contain any endnotes, one record/row per endnote, with columns (at least) for its content, the endnote number, and the Object Citation Number of the Object to which it belongs, referencing it back to table 2. Again, it is convenient to have two columns containing the content, one clean for searching, and the other marked up for display (and to be able to recreate the source document).
  • A fourth table may be used, a record/row per document to contain the other forms of output that have been created for the document, each in their own column, with data included as "blobs" within the database e.g. for XML, html, and/or pdf outputs. These documents may also exist independently of the relational database, as described earlier, stored on the file system. (It is just as easy to have external references to the file system outputs). Documents may be added, deleted, updated, etc. to the relational database, by streaming them in from a source document 12 that has gone through the processing technique 10.
  • Each of the tables would, in an additional column to those described herein, rely on the database to handle unique "id" serialization of the records contained in the database.
  • An advantage of being able to stream in the document at the granular level implied by representing an Object per row in table 2, is that searches may be done, and search results presented, at an Object level, together with the relevant Object Citation Number. The Object Citation Number is conveniently identical and therefore relevant for locating the search result in all document output forms. For example, an elaborate SQL search may be conducted on content and semantic data, and the result could be the title of each document in which the search term is found, with each title followed by a listing of all Object Citation Number locations where the search term was found. These Object Citation Numbers could be presented as direct links to the referenced content. In this case an index for the contents of all documents in the database matching the search term is generated. Alternatively, for the same search a preferred result may be to obtain document titles followed by the Objects in which the search term was found.
  • It is noted that an additional, fifth table may be created to store added comments, each of which may be associated with a given object. However, other aspects may become desirable, as one of the main features of the program is the simplicity of recreating output from the initial input files. Here the input files have added content, and it is no longer possible to just destroy and recreate, or update the database.
  • The relational database contains all the information required to reproduce the document in a form suitable for input to the first process 10 of Figure 1. The input for processing has a simple form, which leads to additional beneficial and interesting possibilities. For example, if comments are added to the relational database, they may be incorporated back into an original document as unnumbered objects, following the object to which they were attached, and the document could be reprocessed to include or ignore these added comments. There are further possibilities, such as being able to instruct via the database that a particular object should be replaced by another object or objects. The complexity would lie in processing the database information to produce a document of the required form for input to the processing technique of Figure 1.

    An example of the utility added to the database by combining the features explained above is the ability to search content or metadata for matching documents. A result may be to get the titles, including the objects containing the search terms, along with their object citation numbers. Alternatively, the result could be just the titles and object citation numbers, which looks remarkably like a book index for the specific search term, giving the locations of all the results. These results can be obtained directly from the database, and can be used to pinpoint the text in one of the more richly marked up documents (viz. html, XML, pdf).
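The four-table layout and the index-like search result described above can be sketched with an in-memory SQLite database. This is a simplified illustration: every table and column name below is an assumption, the text names PostgreSQL as its example backend, and a plain LIKE match stands in for whatever indexed search a real deployment would use.

```python
import sqlite3

# All table and column names are hypothetical.
SCHEMA = """
CREATE TABLE documents (            -- table 1: one row per document
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    creator TEXT,
    subject TEXT
);
CREATE TABLE objects (              -- table 2: one row per text object
    id            INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES documents(id),
    ocn           INTEGER,          -- object citation number, if any
    heading_level INTEGER,          -- NULL for ordinary paragraphs
    clean         TEXT NOT NULL,    -- no markup, used for searching
    marked_up     TEXT NOT NULL     -- display markup, e.g. basic HTML
);
CREATE TABLE endnotes (             -- table 3: one row per endnote
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    ocn         INTEGER NOT NULL,   -- object the endnote belongs to
    number      INTEGER NOT NULL,
    clean       TEXT NOT NULL,
    marked_up   TEXT NOT NULL
);
CREATE TABLE outputs (              -- table 4: other outputs as blobs
    document_id INTEGER PRIMARY KEY REFERENCES documents(id),
    html BLOB, xml BLOB, pdf BLOB
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, title, objects):
    """objects: iterable of (ocn, heading_level, clean, marked_up)."""
    cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO objects (document_id, ocn, heading_level, clean, marked_up) "
        "VALUES (?, ?, ?, ?, ?)",
        [(doc_id, *o) for o in objects],
    )
    return doc_id


def search_index(conn, term):
    """Return {title: [ocn, ...]} for objects whose clean text contains term."""
    rows = conn.execute(
        "SELECT d.title, o.ocn FROM objects o "
        "JOIN documents d ON d.id = o.document_id "
        "WHERE o.clean LIKE ? AND o.ocn IS NOT NULL "
        "ORDER BY d.title, o.ocn",
        (f"%{term}%",),
    )
    result = {}
    for title, ocn in rows:
        result.setdefault(title, []).append(ocn)
    return result
```

Searching the clean column while keeping the marked up column for display reflects the two-column arrangement described for tables 2 and 3. The dictionary returned by search_index has the shape of a book index entry: each title followed by the object citation numbers at which the term occurs.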





