Figure 3 is a flowchart that includes the process of Figure 1 and several downstream processes that use the output of the present method. In particular, the output 36 from the processing technique 10 can be used to generate an HTML document 100, an XML document 200, a LaTeX document 300, word maps and/or indexes 400, and RSS feeds 500, and to provide data for a relational database 600. It should be understood that the output formats described herein are exemplary only, and that other output formats are possible and could be added. For example, additional output modules that use the output 36 of the technique 10, such as a Lout or a Texinfo module, could be added to obtain other desired types of output document.
For HTML output, the program must process each object. If a heading of a particular level exists, it is used to build a table of contents and is given the appropriate HTML markup, e.g. a bold or heading value. If segmented HTML is to be produced, the default level for splitting the document (level 4) is used, and the segment takes the name associated with its level 4 heading or, failing that, is assigned a name. Endnotes are taken from the text object in which they are embedded and placed at the end of the segment, in a segment of their own at the end of the document, or both. Native markup is converted to its HTML equivalents, e.g. bold, underscore and the like. Object citation numbering (created earlier in the processing technique 10 of Figure 1) is made available (currently in the page margin, beside the object to which each number belongs). Document appearance modifiers are applied if instructed in the header. Semantic metadata is placed in the HTML header, where it is available for searching, and is also provided in visible form at the end of the document as additional document information.
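The HTML steps above can be sketched roughly as follows. This is a minimal illustration, not the actual output module: the `(ocn, heading_level, text)` tuple layout, the function names `render_html` and `split_segments`, and the CSS class names are all assumptions made for the example. It shows a table of contents built from headings, object citation numbers exposed beside each object, and segmentation at level 4.

```python
from html import escape

# Hypothetical object records: (ocn, heading_level, text).
# heading_level is None for ordinary text objects.
objects = [
    (1, 1, "Title"),
    (2, 4, "First Segment"),
    (3, None, "Body text of the first segment."),
    (4, 4, "Second Segment"),
    (5, None, "Body text of the second segment."),
]

def render_html(objects):
    """Return (toc_entries, body_lines) for a single-page rendering."""
    toc, body = [], []
    for ocn, level, text in objects:
        safe = escape(text)
        if level is not None:
            # Headings feed the table of contents and get heading markup.
            toc.append(f'<li class="toc{level}"><a href="#o{ocn}">{safe}</a></li>')
            tag = f"h{min(level, 6)}"
        else:
            tag = "p"
        # The object citation number doubles as the anchor id.
        body.append(f'<{tag} id="o{ocn}"><span class="ocn">{ocn}</span> {safe}</{tag}>')
    return toc, body

def split_segments(objects, split_level=4):
    """Start a new segment at every heading of split_level."""
    segments, current = [], []
    for obj in objects:
        if obj[1] == split_level and current:
            segments.append(current)
            current = []
        current.append(obj)
    if current:
        segments.append(current)
    return segments

toc, body = render_html(objects)
segments = split_segments(objects)
```

With the sample data, the objects preceding the first level 4 heading form a lead segment of their own, and each level 4 heading opens a new segment.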
For LaTeX output, which is used to generate PDF output, appropriate LaTeX headers are created. These usually follow the program's default standard, which specifies that there should be a table of contents, how page numbering should look, and so on. Any additional processing instructions are used to mark up the document, for example that there should be a new page or a new column between specified levels. LaTeX markup is applied to the document in place of native markup, including the markup of headings in LaTeX style, from which LaTeX makes its own table of contents as instructed, and the markup of footnotes/endnotes. Object citation numbering is made visible in a way convenient for a common citation system (currently in the text margins, beside the object to which each number belongs). It is noted that there is a Lout module that operates in a similar manner to produce Lout output, which is likewise used to generate a PDF file.
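As a rough sketch of the LaTeX step, the snippet below emits a preamble with a table of contents, maps heading levels to sectioning commands, and places each object citation number in the margin with `\marginpar`. The mapping of levels to `\part`, `\chapter` and so on, the `book` document class, and the function names are assumptions chosen for illustration; the source does not state which commands are actually used.

```python
# Characters that must be escaped in LaTeX body text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Hypothetical mapping from document heading level to sectioning command.
SECTION_COMMANDS = {
    1: "part", 2: "chapter", 3: "section",
    4: "subsection", 5: "subsubsection", 6: "paragraph",
}

def tex_escape(text):
    """Escape one character at a time so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

def to_latex(objects):
    """objects: iterable of (ocn, heading_level or None, text)."""
    lines = [r"\documentclass{book}", r"\begin{document}", r"\tableofcontents"]
    for ocn, level, text in objects:
        if level is not None:
            # LaTeX builds its own table of contents from these commands.
            lines.append("\\" + SECTION_COMMANDS[level] + "{" + tex_escape(text) + "}")
        else:
            # Object citation number in the margin, beside its object.
            lines.append("\\marginpar{\\tiny " + str(ocn) + "}" + tex_escape(text))
            lines.append("")
    lines.append(r"\end{document}")
    return "\n".join(lines)

latex = to_latex([
    (1, 1, "Title"),
    (2, 3, "Costs & Benefits"),
    (3, None, "Roughly 50% of the text."),
])
```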
For XML output, the program must process each object. The document structure is used to define the XML structure. Objects may be embedded within different levels of elements, with appropriate tagging. It has been convenient to separate structural levels 1 to 3 from the subsequent levels 4 to 6 (or 8, as appropriate). Alternatively, a flatter version of XML may be produced, in which heading level information is provided as an attribute. Endnotes are either kept embedded where they occur and tagged appropriately, or placed after the text of the object in which they occur, with their appropriate tags and a marker within the text showing where they occur. Object citation numbers are tagged and placed in the object to which they belong. Semantic metadata is placed at the head of the document.
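The two XML shapes described above, nested and flat, can be illustrated with Python's standard `xml.etree.ElementTree`. The element names (`document`, `section`, `heading`, `para`) and the attribute names (`level`, `ocn`) are hypothetical; the source does not give the actual tag vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical object records: (ocn, heading_level or None, text).
objects = [
    (1, 1, "Title"),
    (2, 4, "First Segment"),
    (3, None, "Body text of the first segment."),
    (4, 4, "Second Segment"),
    (5, None, "Body text of the second segment."),
]

def to_nested_xml(objects):
    """Nest each heading's content inside a <section> for its level."""
    root = ET.Element("document")
    stack = [(0, root)]  # (heading level, element); the root acts as level 0
    for ocn, level, text in objects:
        if level is not None:
            # Close any open sections at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section", level=str(level))
            ET.SubElement(section, "heading", ocn=str(ocn)).text = text
            stack.append((level, section))
        else:
            ET.SubElement(stack[-1][1], "para", ocn=str(ocn)).text = text
    return root

def to_flat_xml(objects):
    """Flat variant: heading level is carried as an attribute only."""
    root = ET.Element("document")
    for ocn, level, text in objects:
        if level is not None:
            el = ET.SubElement(root, "heading", level=str(level), ocn=str(ocn))
        else:
            el = ET.SubElement(root, "para", ocn=str(ocn))
        el.text = text
    return root

nested = to_nested_xml(objects)
flat = to_flat_xml(objects)
flat_text = ET.tostring(flat, encoding="unicode")
```

In the nested form the two level 4 sections sit inside the level 1 section, whereas the flat form keeps all five objects as siblings under the root.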
For building indexes or word maps, the output document is scanned (if so instructed) against a prepared array of relevant subject-specific terms. If no such list is provided (and in any event for words not contained in the subject-specific list), the document is scanned to identify each unique word that has not been excluded by a list of exceptions. An alphabetical list of all relevant words and/or terms found in the document is prepared and presented, displaying against each word or term the object citation numbers of the locations within the document at which it is found. Each object citation number may be displayed as a live link to the location within the document to which it refers. Footnotes are indexed as part of the object from which they are referenced, and indexed words or terms occurring within them are listed under the object citation number of the object to which they belong.
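A word map of this kind can be sketched in a few lines. The record layout `(ocn, text, footnotes)`, the small `STOP_WORDS` exception list, and the function name `build_word_map` are assumptions for illustration only; footnote text is indexed under the object citation number of its parent object, as described above.

```python
import re
from collections import defaultdict

# Hypothetical exception list of words that are not indexed.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}

def build_word_map(objects, terms=(), stop_words=STOP_WORDS):
    """Map each term or word to the sorted OCNs of the objects containing it.

    objects: iterable of (ocn, text, footnotes), where footnotes is a list
    of strings belonging to that object.
    """
    index = defaultdict(set)
    for ocn, text, footnotes in objects:
        # Footnotes are indexed as part of the object they belong to.
        lowered = " ".join([text, *footnotes]).lower()
        for term in terms:  # prepared subject-specific terms
            if term.lower() in lowered:
                index[term.lower()].add(ocn)
        for word in re.findall(r"[a-z']+", lowered):  # every remaining word
            if word not in stop_words:
                index[word].add(ocn)
    # Alphabetical listing, each entry with its object citation numbers.
    return {entry: sorted(ocns) for entry, ocns in sorted(index.items())}

word_map = build_word_map(
    [
        (1, "Free software and the commons", []),
        (2, "The commons is shared", ["See free software licences"]),
    ],
    terms=["free software"],
)
```

Here `word_map["free software"]` lists both objects, the second only because the term occurs in its footnote.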
For building RSS feeds, semantic metadata from the input document 12 is used and an RSS feed is created as instructed, based for example on the document's subject and date. The abstract or notes in the semantic markup are used to describe the document, and links are created as instructed to the other types of output created.
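The following sketch builds a small RSS 2.0 feed from document metadata using only the standard library. The metadata keys (`title`, `subject`, `date`, `abstract`, `outputs`), the `example.org` URLs, and the choice to list the other output formats inside the item description are all assumptions for the example, not details taken from the source.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime, timezone
from email.utils import format_datetime

def build_rss(channel, documents):
    """Return an <rss> element with one <item> per document, newest first."""
    rss = ET.Element("rss", version="2.0")
    chan = ET.SubElement(rss, "channel")
    for tag in ("title", "link", "description"):  # required channel elements
        ET.SubElement(chan, tag).text = channel[tag]
    for doc in sorted(documents, key=lambda d: d["date"], reverse=True):
        item = ET.SubElement(chan, "item")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "link").text = doc["outputs"]["html"]
        ET.SubElement(item, "category").text = doc["subject"]
        # The abstract describes the document; other outputs are linked too.
        others = ", ".join(url for fmt, url in doc["outputs"].items() if fmt != "html")
        ET.SubElement(item, "description").text = f'{doc["abstract"]} Other formats: {others}'
        d = doc["date"]
        stamp = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(stamp)  # RFC 2822 date
    return rss

feed = build_rss(
    {"title": "Sample feed", "link": "https://example.org/", "description": "Hypothetical feed"},
    [
        {"title": "Sample Document A", "subject": "copyright", "date": date(2006, 5, 1),
         "abstract": "First sample.",
         "outputs": {"html": "https://example.org/a.html", "pdf": "https://example.org/a.pdf"}},
        {"title": "Sample Document B", "subject": "software", "date": date(2008, 1, 1),
         "abstract": "Second sample.",
         "outputs": {"html": "https://example.org/b.html", "xml": "https://example.org/b.xml"}},
    ],
)
feed_text = ET.tostring(feed, encoding="unicode")
```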
In the case of populating a relational database (for example, an SQL database such as PostgreSQL), different structures may be used to represent the document in the database, but the following has proved useful in practice. If they do not already exist from previous processing, four tables are created:
It is noted that an additional table may be created:
The relational database contains all the information required to reproduce the document in a form suitable for input to the first process 10 of Figure 1. The input for processing has a simple form, which leads to additional beneficial and interesting possibilities. For example, if comments are added to the relational database, they may be incorporated back into an original document as unnumbered objects, each following the object to which it was attached, and the document could be reprocessed to include or ignore these added comments. There are further possibilities, such as being able to instruct via the database that a particular object should be replaced by another object or objects. The complexity would lie in processing the database information to produce a document of the required form for input to the processing technique of Figure 1.
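The comment round trip can be illustrated as a simple merge. This is only a sketch under assumed data shapes: objects are `(ocn, text)` pairs, comments are `(ocn, comment_text)` pairs keyed by the object they are attached to, and an unnumbered object is represented by `None` in place of an object citation number.

```python
def merge_comments(objects, comments, include_comments=True):
    """Insert each comment as an unnumbered object after its parent object.

    objects:  list of (ocn, text) in document order
    comments: list of (ocn, comment_text), keyed by the parent object's OCN
    """
    by_ocn = {}
    for ocn, text in comments:
        by_ocn.setdefault(ocn, []).append(text)
    merged = []
    for ocn, text in objects:
        merged.append((ocn, text))
        if include_comments:
            # None marks an unnumbered object, so existing OCNs are unchanged.
            merged.extend((None, comment) for comment in by_ocn.get(ocn, []))
    return merged

document = [(1, "First object."), (2, "Second object.")]
notes = [(1, "Reader note on the first object.")]
with_notes = merge_comments(document, notes)
without_notes = merge_comments(document, notes, include_comments=False)
```

Because the added comments carry no number of their own, the citation numbers of the original objects stay the same whether or not the comments are included.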
One example of the utility gained by combining the features explained above is the ability to search content or metadata for matching documents. A search may return the document titles together with the objects containing the search terms and their object citation numbers. Alternatively, the result could be just the titles and object citation numbers, which looks remarkably like a book index for the specific search term, giving the locations of all the results. These results can be obtained directly from the database, and can be used to pinpoint the text in one of the more richly marked-up documents (viz. HTML, XML, PDF).
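An index-like search of this kind might look as follows. The sketch uses Python's built-in `sqlite3` with an in-memory database so that it is self-contained, whereas the text above mentions PostgreSQL; the two-table schema (`documents`, `objects`) and its column names are hypothetical and are not the tables of the described system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per document, one row per numbered object.
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE objects (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    ocn         INTEGER NOT NULL,
    body        TEXT NOT NULL,
    PRIMARY KEY (document_id, ocn)
);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(1, "Sample Document A"), (2, "Sample Document B")])
conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", [
    (1, 1, "An introduction to shared resources."),
    (1, 2, "The commons is managed by its users."),
    (2, 1, "Licences protect the commons."),
    (2, 2, "Unrelated closing remarks."),
])

def search(conn, term):
    """Return (title, ocn) pairs for every object whose text contains term."""
    rows = conn.execute(
        """SELECT d.title, o.ocn
           FROM objects AS o JOIN documents AS d ON d.id = o.document_id
           WHERE o.body LIKE '%' || ? || '%'
           ORDER BY d.title, o.ocn""",
        (term,),
    )
    return rows.fetchall()

hits = search(conn, "commons")
```

Each returned pair identifies a title and an object citation number, which is enough to locate the same object in any of the other outputs.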