Abstracts for the EBTI/CBETA Conference 2008
| When: | Friday, February 15 to Sunday, February 17, 2008 |
| Where: | Dharma Drum Buddhist College 法鼓佛教研修學院, Jinshan 金山, Taipei County, Taiwan, R.O.C. |
| Contact: | Dr. Marcus Bingenheimer (m.bingenheimer@gmail.com) |
Conference Abstracts:
- Keynote Address: Lewis Lancaster "The State of the Art in Digital Humanities" [abstract]
- Closing Remarks: Ching-chun Hsieh
- Huimin Bhikṣu: "From CBETA (Chinese Buddhist Electronic Text Association) to BIP (Buddhist Informatics Program) and IBA (Integrated Buddhist Archives)" [abstract]
- Venerable Bo Kwang: "Introduction to the Digitization of Hanguk Bulgyo Chonso"
- Jens Braarvig: "Thesaurus Literaturae Buddhicae" [abstract]
- David Germano: "Tibetan Canons & Integrated Reference Resources" [abstract]
- Tony K. Lin: "On the Ranjana Script in the Xuzangjing — with Focus on the Ranjana Letters from the CBETA Database" [abstract] [slides]
- John McRae: "The Online Presentation of the BDK Tripiṭaka Translation Series" [abstract]
- Charles Muller: "Translation and Textual Research Through the Combined Usage of Digital Canons and Digital Lexicons: Applications of the Digital Dictionary of Buddhism" [abstract]
- Kiyonori Nagasaki: "Outline of the Activities of the SAT Project" [abstract]
- Toshinori Ochiai: "The Digital Archives of Old Japanese Buddhist Manuscripts ― Current Plans and Their Implementation" [paper]
- Yun-Hee Oh, Eun-su Cho: "Tripitaka Koreana Knowledgebase - Text-Image Database for effective reading of the Text" [abstract]
- Morten Schlütter: "Problematizing the Digital: The Use and Misuse of Electronic Text in the Study of Chinese Buddhist History" [abstract]
- Min Bahadur Shakya: "The Digital Sanskrit Buddhist Canon: Its prospects and future" [abstract]
- Peter Skilling: "The Fragile Palm Leaves Database: Problems and Prospects" [abstract]
- Christian Wittern: "Reading the Text, Weaving the Web: Scholarly Interactions with Digital Text" [abstract]
- Yap Cheah Shen: "An open source, cross-platform and embeddable search engine for the Pali Tipitaka" [abstract]
Keynote address: The State of the Art in Digital Humanities
Keynote Speaker: Lewis Lancaster (University of California, Berkeley)
The history of information technology has shown that new generations of software and hardware are short-lived and dramatically different from one another. Applications of digital formats that were at the very limits of our capacities a few years ago tend to look amusing when we compare them with current equipment and programs. At the same time, sitting back and waiting for the future before entering into and contributing to technological advances cannot be called prudence. The reluctance of many scholars in the humanities to step forward and take risks, in an age when printing has been encapsulated by the digital, has placed our fields at a disadvantage. Caution with regard to the new often implies only fear of failure. These criticisms do not apply to those attending this conference. In the field of Buddhist studies, the hard work of creating data and research strategies has come from the small community meeting here. I believe that you all deserve an enthusiastic expression of approval. You have been the pioneers, with the ability and the capacity for endurance and resolution. From the energy you have directed toward the actual work of digitization, we have seen the productive power of scholars rise. The appearance of Buddhist canons in electronic form has transformed what was once merely theoretical and speculative into a practical and fundamental part of scholarly research. Digital texts, dictionaries, and tools for users are the foundation from which future insights and research will be derived. The codex will not soon be replaced, but it is a less important element of study than it was a decade ago.
We cannot yet be complacent about these developments. The rapidly evolving process of the digital world has not been completed, and a whole range of events still marks the changes that continue to transform it and shift its identity. The challenges we face today are no less daunting than those of the past. Our goal must be to identify the next stage of the digital era and to consider all of its possible applications and variations. One of the themes of this conference will be to discuss ways in which we can join forces for mutual support and for the interoperability of data sets. The information produced by such hard labor must not be dropped into forgotten corners of the internet, or stored in such a fashion that it is concealed or suppressed by our own policies and practices. We must present our material in a manner that allows it to be studied or viewed in relation to other data sets as an intelligible part of a distinctive whole.
Title: From CBETA (Chinese Buddhist Electronic Text Association) to BIP (Buddhist Informatics Program) and IBA (Integrated Buddhist Archives)
Presenter: Huimin Bhikṣu (Dharma Drum Buddhist College)
1. Introduction
The Chinese Buddhist Electronic Text Association (http://www.cbeta.org) was founded on February 15, 1998 to produce the Chinese electronic Buddhist texts (including input, collation, markup, processing of out-of-list characters, a search system, etc.) based on the Taisho Tripitaka (Daizo Shuppansha©) Vol. 1-85 and the Shinsan Zokuzokyo (Kokusho Kankokai ©) Vol. 1-90, under an official grant for input and distribution from these copyright holders. The texts are shared on the Internet and distributed on CD every year. The most recent release, CBETA Chinese Electronic Tripitaka Collection Version 2008 (containing Taisho Tripitaka Vol. 1-55 & 85 and Shinsan Zokuzokyo Vol. 1-88, with modern punctuation completed for 230 scriptures in 1,411 volumes), was published on February 15. It adds search by synonyms, exclusion search, and a custom title catalog.
2. Creation Procedure and Related Techniques for the CBETA Chinese Electronic Tripitaka Collection
2.1 Procedure for manual typing and optical character recognition
2.1.1 OCR kaeriten removal program
2.1.2 Computer program for replacing incorrect strings
2.1.3 Image analysis program
2.2 Process for treating out-of-list characters
2.3 Line head information
2.4 Collation by computer program
2.5 Image-text comparison
2.6 Manual proofreading
2.7 The markup process
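The outline above lists collation by computer program (2.4) without describing it. One common approach, assumed here rather than taken from the abstract, is to compare two independent inputs of the same passage and flag every point of disagreement for manual proofreading. A minimal sketch using Python's standard `difflib` module; the function name `collate` and the sample strings are illustrative only:

```python
import difflib


def collate(input_a: str, input_b: str) -> list[tuple[int, str, str]]:
    """Return (offset in input_a, text in A, text in B) for every disagreement."""
    # autojunk=False so that frequent characters are never ignored
    matcher = difflib.SequenceMatcher(a=input_a, b=input_b, autojunk=False)
    differences = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            differences.append((a0, input_a[a0:a1], input_b[b0:b1]))
    return differences


if __name__ == "__main__":
    typist_a = "如是我聞一時佛在舍衛國"
    typist_b = "如是我聞一時佛住舍衛國"  # deliberately altered for the demo
    for offset, a, b in collate(typist_a, typist_b):
        print(f"offset {offset}: A={a!r} B={b!r}")
```

Each reported offset points a proofreader at a spot where at least one input must be wrong; an error that both inputs share would not be flagged, so collation of this kind cannot replace checking against the page image.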
3. Features of CBETA Chinese Electronic Tripitaka Collection Version 2008
4. Example of an Informatics Program in Buddhist Education
To adapt to the informatics environment of the coming age, the Chung-Hwa Institute of Buddhist Studies worked out the Computers and Buddhist Studies Program (CBS Program for short) in 2001, and renamed it Buddhist Informatics Program (BIP) in 2002.
Students majoring in this program are required to select courses from the three Buddhist Studies programs (Chinese, Indian, and Tibetan Buddhism) in addition to courses in information technology. Dharma Drum Buddhist College (DDBC) also offers a degree course in Buddhist Informatics for students who major in Chinese, Indian, or Tibetan Buddhism.
BIP is based on common technologies in humanities computing (e.g. markup and programming languages), applied Buddhist informatics (e.g. web- and print publishing), and Buddhist teaching technology. The three-year curriculum includes fundamental, basic, and advanced courses and a final research project.
5. IBA - Integrated Buddhist Archives: An "Indra's Net" for Buddhist Studies
5.1 Rationale
5.2 Aims
A. Common Portal Site & Integrated Search Interface
B. Maintenance of Legacy Projects & Archiving
C. Establishing Guidelines for Best Practice and Review
D. Assistance with Realizing Projects
5.3 Organizational Form
Title: Thesaurus Literaturae Buddhicae
Presenter: Jens Braarvig (University of Oslo)
The Thesaurus Literaturae Buddhicae (TLB) is a multilingual, Internet-based search system that gives access to Buddhist literature sentence by sentence in Sanskrit, Chinese, Tibetan and English. The presentation will focus on methodological and theoretical aspects of setting up such multilingual databases, as well as on the linguistic, semantic and philological problems connected with both the construction and the use of the TLB.
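The abstract does not describe how the TLB stores its data. Purely as an illustration, sentence-by-sentence multilingual access can be modelled as records in which each sentence identifier carries parallel versions keyed by language, so that a search in one language returns the aligned sentences in all the others. The record layout, the language codes and the `search` helper below are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class AlignedSentence:
    sentence_id: str                                       # hypothetical ID scheme
    texts: dict[str, str] = field(default_factory=dict)    # language code -> sentence


def search(corpus: list[AlignedSentence], lang: str, term: str) -> list[AlignedSentence]:
    """Find sentences whose `lang` version contains `term`; each hit carries all languages."""
    return [s for s in corpus if term in s.texts.get(lang, "")]


corpus = [
    AlignedSentence("demo-1", {
        "sa": "evaṃ mayā śrutam",
        "zh": "如是我聞",
        "bo": "'di skad bdag gis thos pa",
        "en": "Thus have I heard.",
    }),
]

for hit in search(corpus, "zh", "我聞"):
    print(hit.texts["sa"], "|", hit.texts["en"])
```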
Title: Tibetan Canons & Integrated Reference Resources
Presenter: David Germano (University of Virginia)
The Tibetan Canons project is creating an XML-based thematic research collection of Tibetan canonical compilations as the basis for archiving associated resources and scholarship, as well as providing browsing, searching, and viewing facilities for texts that are integrated with powerful reference resources (historical dictionary, bibliographies, gazetteer, encyclopedia, maps, and multimedia database). The project began with the Nyingma Gyubum (tantric texts from the Nyingma tradition mostly excluded from the mainstream canon), and is now focused on the Kangyur and Tengyur (sutras/tantras and exegetical material respectively). The overall system is being built to accommodate catalogs as the front end for accessing text scans, input text editions, summaries, translations, and scholarship, while the associated text viewing system will allow users to view texts and easily access associated reference resources by highlighting words in the text. The presentation will show the overall system in its present state, and discuss the intellectual, social and technical challenges of creating such associated reference resources in a manner that allows them to be built collaboratively by a broad community of scholars. While such an approach has great promise for integrating reference resources into access to large textual collections, the creation of each reference resource in its own right is a complex process, and their mutual integration and cross-referencing constitute a further challenge. I will trace our past failures and successes in trying to address this fundamental opportunity and problem, show where we are today, and trace a path for the project's future development. Finally, I will try to generalize our experience to offer a provisional model for approaching such issues in other areas.
Title: On the Ranjana Script in the Xuzangjing — with Focus on the Ranjana Letters from the CBETA Database
Presenter: Tony K. Lin (Deputy Director, Institute of Graduate Studies of Buddhism, Foguang University, Taipei)
The identification and deciphering of Sanskrit letters in Buddhist scriptures have always deserved special attention when undertaking digitization projects.
The present paper focuses on the identification of the Ranjana letters in the Xuzangjing (the Supplement to the Manji Canon) from the CBETA database.
CBETA processes Sanskrit letters from the scriptures in two forms, i.e., font and image. The Ranjana script from the Xuzangjing provided by CBETA, which is the subject of this study, is in font form. The Ranjana letters deciphered in this paper total 661. In going through these letters, the author noticed that many of them could hardly be identified because of bad handwriting. Illegibility also results from unclear printing, and miscopying has produced many Ranjana letters that do not match their Chinese transliterations. These examples show that in early times Buddhist texts were sometimes copied by scribes who may not have had sufficient knowledge of Sanskrit.
Sanskrit letters in scanned image form, on the other hand, are never converted into font input and are therefore often not proofread. As a result, errors in them are less likely ever to be located and corrected. The Utterances on Image-making and Iconometry《造像量度經》, for instance, contains a good number of Sanskrit letters in image form, about 300 in all. For these letters, the issues highlighted by this paper deserve more scholarly attention.
The significance of identifying and deciphering Sanskrit letters in Buddhist texts lies not only in documentation projects but also in what they reveal about the development of Buddhism, such as the dissemination of teaching methods or the proclamation of sectarian doctrines; they thus provide important sources for related studies.
Title: The Online Presentation of the BDK Tripiṭaka Translation Series
Presenter: John McRae (BDK)
The BDK Tripiṭaka Translation Series is a project dedicated to the English translation of the Buddhist scriptures contained in the Taishō Shinshū Daizōkyō and other sources. Working from a target list of 139 texts for the first series, we have now published 65 different texts in 32 printed volumes. In spring 2007 it was decided that, while we continue to publish and distribute printed books through the University of Hawai'i Press, all of our translations should also be freely distributed online. Since our book production process is based on Adobe Portable Document Format (PDF) files, we will initially distribute "facsimile" versions of the printed volumes, with only minor changes mandated by the online distribution format or required by other technical issues. Eventually, we also want to create an online database in which our English translations are presented alongside the original Chinese and Japanese source texts, using concordance software so that the user can quickly correlate source and translated passages, explore Buddhist technical terminology, etc. The oral presentation will describe our progress as of spring 2008, as well as our plans for the immediate future.
Title: Translation and Textual Research Through the Combined Usage of Digital Canons and Digital Lexicons: Applications of the Digital Dictionary of Buddhism
Presenter: Charles Muller (Tōyō Gakuen University)
Over twenty years have now passed since the beginning of the lexicographical compilation that has resulted in the Digital Dictionary of Buddhism (DDB), and over thirteen years since its installation on the World Wide Web. Originally uploaded with approximately 3,200 entries, this compilation of terms, text names, person names, school names, etc., now contains almost 44,000 entries, based on the contributions of 57 individuals. The DDB is also subscribed to by 18 university libraries at leading institutions in North America, Europe, and Asia.
Originally viewed by Muller first and foremost as a lexicographical tool for the translation of Buddhist canonical texts, the DDB is now fulfilling that role to a degree greatly enhanced by the concurrent maturation of canonical text digitization projects such as CBETA, SAT, RITK, and the digital Hanguk bulgyo jeonseo. As the usage of these digital canons grows in scope and sophistication, translators around the world can benefit immensely from the combined usage of digital canons and the DDB, both through the DDB's web implementation and through its use as a localized tool. In this presentation, we will demonstrate some of the benefits of combining digital text and digital lexicon.
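One simple way to combine a digital canon with a digital lexicon, assumed here and not taken from the DDB itself, is to scan a passage and look up the longest dictionary headword that starts at each character. The sketch below uses a tiny hypothetical in-memory lexicon in place of the DDB; the `max_len` cap on headword length is an arbitrary assumption:

```python
def lookup_terms(text: str, lexicon: set[str], max_len: int = 8) -> list[str]:
    """Greedy longest-match lookup: at each position take the longest lexicon entry."""
    found = []
    i = 0
    while i < len(text):
        # try the longest candidate first, shrinking one character at a time
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                found.append(candidate)
                i += length
                break
        else:
            i += 1  # no headword starts here; move on one character
    return found


lexicon = {"般若", "波羅蜜多", "般若波羅蜜多", "心經"}  # hypothetical mini-lexicon
print(lookup_terms("般若波羅蜜多心經", lexicon))
```

Greedy longest match prefers the compound 般若波羅蜜多 over its parts; a real reading tool would probably also offer the shorter entries, since translators often need both.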
Title: Outline of the Activities of the SAT Project
Presenters:
Kiyonori NAGASAKI (Yamaguchi Prefectural University)
Masahiro SHIMODA (Tokyo University)
The SAT project, directed at present by Masahiro SHIMODA, was initiated by the late Professor Yasunori Ejima in 1986 with the aim of building a text database of the Taishō Shinshū Daizōkyō, before a similar project was begun by CBETA. The texts in this database, around 6 million lines in 85 volumes comprising Buddhist canons compiled in India, China and Japan, have been checked character by character by Buddhist researchers against the original texts of the first edition of the Taishō Daizōkyō and against the database texts released by CBETA in 2005.
At first, our database was encoded in Shift JIS. Missing characters were handled using the MOJIKYO-Font, and characters not contained in the MOJIKYO-Font were represented by code numbers from a character set of our own. The structure of the database followed the layout of the original Taishō volumes: volume/page/paragraph/line. The first provisional version of the database, with some texts left unchecked, was released via the Internet in 2005.
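As an illustration of the volume/page/paragraph/line structure, a location can be held as a tuple that sorts in reading order. Here "paragraph" is taken to mean the register (a, b or c) of a Taishō page, which is an assumption; the `"08:0748c17"` reference syntax and the names `Location` and `parse_location` are hypothetical, not SAT's actual format:

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    volume: int
    page: int
    register: str   # 'a', 'b' or 'c': the three registers of a Taishō page (assumed)
    line: int


# Hypothetical reference syntax for this sketch: VV:PPPPrLL, e.g. "08:0748c17"
_REF = re.compile(r"^(\d{2}):(\d{4})([abc])(\d{2})$")


def parse_location(ref: str) -> Location:
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"not a valid location: {ref!r}")
    vol, page, reg, line = m.groups()
    return Location(int(vol), int(page), reg, int(line))


# Tuples compare field by field, so sorting yields volume/page/register/line order.
print(sorted(parse_location(r) for r in ["08:0749a01", "08:0748c17"]))
```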
At this point, we began to develop viewer software for our database based on the above format, primarily to facilitate the work of collaborating scholars. This program, which can display the text in vertical writing and perform rapid keyword searches, is implemented in Shockwave Flash, which makes it accessible to users of various operating systems, such as Windows, Mac OS X, and Linux.
Since early 2006, we have used a Web-based collaboration system on GNU/Linux to improve the efficiency of our work. This system has enabled a number of Japanese scholars in different locations to collaborate in real time. With its introduction, we have shifted from the MOJIKYO-Font (which carries too many licensing restrictions) to the "GT-Font", which was developed by academic bodies and is distributed free of charge for academic use. For the approximately ten thousand Chinese characters included in neither Shift JIS nor the GT-Font, we have created glyphs in GT-Font style and distribute them through a Web-based character database that works in complete harmony with the collaboration system.
In July 2007, we completed the correction of wrong characters in the database and released the software to the contributors, with all the Indian characters, around ten thousand in all, installed. We have now started work on releasing our text database on our Web site and will publish it in XML format in October 2007. Our present policy is to publish the database in a format close to the original bound volumes; to follow open standards, we will change the format of the database. In addition, we are preparing a functional Web site on which users can search for comparable keywords or display the text fragments they need via formatted URIs.
In the immediate future, we will use our Web-based collaboration system to revise the database in light of Buddhist studies published since the publication of the Taishō Shinshū Daizōkyō. To support this task, we will consider a form of version control that lets users choose whether to view the latest text or a previous one.
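The abstract says only that a means of version control will be considered. One minimal model, an assumption rather than SAT's design, keeps an append-only list of revisions per line, so the original Taishō reading is never lost and users can choose between the latest and an earlier text. The class name `VersionedText` and the line identifiers are hypothetical:

```python
from typing import Optional


class VersionedText:
    """Keeps every revision of each line so readers can choose which one to view."""

    def __init__(self) -> None:
        self._revisions: dict[str, list[str]] = {}

    def commit(self, line_id: str, text: str) -> int:
        """Append a new revision and return its number (0 = the original reading)."""
        history = self._revisions.setdefault(line_id, [])
        history.append(text)
        return len(history) - 1

    def view(self, line_id: str, revision: Optional[int] = None) -> str:
        """Return the latest revision, or a specific earlier one if requested."""
        history = self._revisions.get(line_id)
        if not history:
            raise KeyError(line_id)
        return history[-1] if revision is None else history[revision]
```

Because nothing is overwritten, the default view can show the latest revision while the original remains available as revision 0.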
Title: The Digital Archives of Old Japanese Buddhist Manuscripts ― Current Plans and Their Implementation
Presenter: Toshinori OCHIAI (International Institute for Postgraduate Buddhist Studies 国際仏教学大学院大学)
[paper]
Title: Tripitaka Koreana Knowledgebase - Text-Image Database for effective reading of the Text
Presenters: Yun H. Oh (Director, The Research Institute of Tripitaka Koreana), Eun-su Cho (Seoul National University)
The Tripitaka Koreana and the Tripitaka Koreana Knowledgebase Project
The Tripitaka Koreana (hereafter the TK) generally refers to the Korean collection of Buddhist scriptures carved onto wooden printing blocks during the Goryeo dynasty of Korea (918-1392 CE); in many cases, the term designates the specific set of woodblocks currently stored at Haeinsa Monastery. Strictly speaking, the set that remains at Haeinsa today is the second edition (known as the Jaejo Daejanggyeong 再雕大藏經, 1236-1251), which was preceded by the first edition (the Chojo Daejanggyeong 初雕大藏經, 1011-1087) and Uichon’s collection of scriptures and commentaries (敎藏, often known as 續藏經, 1090-1099).
One of the main goals of the Tripitaka Koreana KnowledgeBase (hereafter the TKB) project is the digital restoration of the text and images of the complete sets of the TK. A series of previous projects produced several versions of the TK text along with reference materials and data, such as a Unicode text version, the text in variant character forms, the text with punctuation marks, bibliographical introductions to the sutras, collation data for the TK and the Taishō Shinshū Daizōkyō, a dictionary of variant characters, a dictionary of Buddhist terms, etc. Through projects supported by the Korean government, photographic images of rubbings of both the first and the second editions of the TK are now being added to an image database along with detailed bibliographical data.
Feature and Vision
The TKB aims to provide an efficient platform for new types of TK study and Buddhist Studies, one that facilitates collated reading of both different editions of a text and different formats of a text, while also providing easy-to-use reference tools and materials at hand.
To achieve ‘efficiency’ in use, the TKB adopts an image streaming technique with dynamic zoom-in and zoom-out. Images can carry information that cannot be conveyed by text alone, and that information may be useful, or even crucial, for other fields of study. Such is the case with the ‘Edged Strokes’ (角筆: codes or marks made with a sharpened wooden pencil or similar tool on or around a Chinese character to note its Korean pronunciation) found in some photographs of the first-edition TK, which caused a stir among scholars of medieval Korean by providing important clues for their research. Zooming in and out is a useful feature for general image viewing, and especially so in a case like this.
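The abstract does not specify the streaming technique. A common way to implement dynamic zoom, assumed here, is a tile pyramid: the image of a rubbing is stored at successively halved resolutions, each cut into fixed-size tiles, so the viewer fetches only the tiles visible at the current zoom level. The 256-pixel tile size and the function name `tile_pyramid` are illustrative assumptions:

```python
import math


def tile_pyramid(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int, int]]:
    """List (level_width, level_height, columns, rows) from full resolution downward,
    halving the image at each step until it fits in a single tile."""
    levels = []
    while True:
        cols = math.ceil(width / tile)
        rows = math.ceil(height / tile)
        levels.append((width, height, cols, rows))
        if cols == 1 and rows == 1:
            return levels
        width = max(1, math.ceil(width / 2))
        height = max(1, math.ceil(height / 2))


for level in tile_pyramid(1024, 512):
    print(level)
```

For a 1024 x 512 image this gives three levels (4 x 2 tiles, then 2 x 1, then a single tile), so a zoomed-out view needs only one request while a close-up loads only the handful of full-resolution tiles on screen.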
The TKB will also include an image tagging feature that can facilitate new ways of approaching and accessing original material (image data), thus encouraging new types of use and study. The feature will allow users to attach notes to an image or to a specific area of it, search the tagged notes, and add comments to the notes, thus facilitating communication among users.
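The three operations named above (tagging a note to an image or one area of it, searching notes, commenting on them) map onto a small data model. This is a minimal sketch under assumptions: the rectangle convention `(x, y, width, height)`, the image identifiers, and the names `Note` and `NoteStore` are all hypothetical, not the TKB's actual design:

```python
from dataclasses import dataclass, field
from typing import Optional

Region = tuple[int, int, int, int]  # (x, y, width, height) in pixels; assumed convention


@dataclass
class Note:
    image_id: str                       # hypothetical image identifier
    text: str
    region: Optional[Region] = None     # None means the note applies to the whole image
    comments: list[str] = field(default_factory=list)


class NoteStore:
    def __init__(self) -> None:
        self.notes: list[Note] = []

    def tag(self, image_id: str, text: str, region: Optional[Region] = None) -> Note:
        """Attach a note to an image, optionally to one rectangular area of it."""
        note = Note(image_id, text, region)
        self.notes.append(note)
        return note

    def search(self, term: str) -> list[Note]:
        """Return every note whose text contains `term`."""
        return [n for n in self.notes if term in n.text]
```

Comments are kept on the note itself (`note.comments.append(...)`), so a search hit brings its discussion thread along with it.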
We hope that the TKB, with its primary function as a TK knowledgebase and the additional features above, can serve as a stepping stone to broaden the scope of Tripitaka study in the future, from text-based research to the study of Tripitaka culture in general.
Title: Problematizing the Digital: The Use and Misuse of Electronic Text in the Study of Chinese Buddhist History
Presenter: Morten Schlütter (The University of Iowa)
In recent years, many scholars of Chinese Buddhism have been turning to digitized primary sources and other electronic resources to aid and enhance their scholarship. As more such digital resources become available, they are having an ever-increasing impact on the study of Chinese Buddhist history and are clearly set to revolutionize the field in many ways. This paper explores how the availability of digital primary sources in particular, both Buddhist and secular, is transforming the study of Chinese Buddhist history in complex ways that are highly promising but also pose a number of challenges.
The positive effects surely dominate: new, almost unimagined research possibilities have been created by the release of the CBETA version of the Buddhist canon, together with collections of Chinese secular primary sources such as the Siku quanshu electronic database developed by Digital Heritage in Hong Kong. In this paper, I will give several examples from my own research of how combining searches of the CBETA canon with searches of electronic versions of various Chinese historical sources can give us new insights into several areas of Buddhist history that would have been almost impossible to achieve in the pre-digital era.
On the other hand, we tend to use electronic primary sources with great enthusiasm but little reflection. I will argue in this paper that we need to be more sensitive to the question of how the availability of digitized texts may influence the way we do research: what kinds of questions we ask, what sources we choose to use, and indeed how we read primary sources. The programming interfaces of electronic text editions may also steer our searches in certain directions and bracket our research in other ways. Drawing on my own research experience, I will give some examples of how the use (and the very availability) of digitized primary sources can lead to distorted and misleading research results, and discuss strategies for preventing or at least dealing with such problems. The paper will conclude with some reflections from a user’s perspective on what kinds of features are especially desirable in editions of electronic primary sources (and what problems should be avoided), as well as some thoughts on the kinds of resources for the study of Chinese Buddhist history that we may hope will be developed in the future.
Title: The Digital Sanskrit Buddhist Canon: Its prospects and future
Presenter: Min Bahadur Shakya (Director, Nagarjuna Institute of Exact Methods)
The Nagarjuna Institute of Exact Methods (NIEM) is a non-profit educational foundation in Nepal that promotes research into the Sanskrit Buddhist Canon. The scriptures that make up this canon comprise over a hundred thousand printed pages. The Digital Sanskrit Buddhist Canon (DSBC) website, hosted at the University of the West with texts input at NIEM, aims to form a comprehensive collection of digital Buddhist scriptures in Sanskrit. About 200 Sanskrit Buddhist e-texts are already available online, including many of the most common and important texts.
Furthermore, the DSBC represents a starting point for devising a modern and universal Buddhist ‘canon’. Sanskrit texts form the basis of all Mahayana schools, which today are followed by hundreds of millions of people; and some traditions, such as the Newar Buddhists of Nepal, still use Sanskrit materials directly. The DSBC thus fulfils a real demand for online access to, and propagation of, the basic Indian Buddhist texts.
With the rapid progress made by the DSBC, it is now possible to conceive of a Buddhist canon in a standard language, Sanskrit, usable by and accessible to all. However, much effort and support will be needed to reach this distant goal: we must ensure that the canon is properly categorized, textually sound, and sufficiently inclusive. Practically speaking, this means inputting more texts; creating a concordance of the Chinese and Tibetan translations; better search and markup; mechanisms for correcting errors in texts; working with specialists to organize the canon on scientific principles; and providing a venue to publish new editions.
Title: The Fragile Palm Leaves Database: Problems and Prospects
Presenter: Peter Skilling (Bangkok)
Fragile Palm Leaves is an independent non-profit foundation based in Bangkok. The foundation seeks to preserve and study the Buddhist literature of Southeast Asia. It holds a considerable collection of palm-leaf and paper manuscripts in the languages and scripts of the region, including Burmese, Mon, Thai, and Thai Khün. There is a large collection of Pali manuscripts, mostly on palm-leaf in the Burmese script. The Burmese and Burmese manuscripts are being catalogued with some assistance from the Pali Text Society (UK).
The Fragile Palm Leaves database is in FileMaker Pro for Macintosh. It contains data on 10,340 manuscripts containing 21,385 titles. This may well be the largest pool of information on Pali and Southeast Asian Buddhist literature, giving access to titles, authors, donors, and other details.
There are, however, problems. The database uses the Times Norman font for Macintosh and Ava Laser (the latter for one character only), and is therefore not universally accessible. The complex architecture of the database was never quite finished, so some of the fields are imperfect. Fragile Palm Leaves received initial help from the Pali Text Society, which arranged for someone to design the database. Since then improvements have been made by several experts, and most recently assistance was received from the Max Planck Institute, Berlin. Problems, however, persist, and Fragile Palm Leaves lacks the expertise or staff to solve them.
Title: Reading the Text, Weaving the Web: Scholarly Interactions with Digital Text
Presenter: Christian Wittern (Kyoto University)
The increasing amount of digital resources available today for scholarly investigation poses with some urgency the problem of how to make productive use of these materials. By 'productive use', I mean engagement with digital resources that goes beyond simple searches and instead tries to give computational support to more scholarly activities, which for textual scholars include (but are not limited to) reading, editing, annotating and translating.
Every act of reading is, according to Peter Shillingsburg, a script act that adds a new page to the history of the text, and thus in some ways contributes to the establishment of a new text, if a text is understood in a broad way including the context of its perception and influence.
I have been interested in the possibilities of an interactive reading environment for scholars for some time, having given it different names like the "scholar's workbench", "System for Markup And Retrieval of Texts (SMART)" and more recently "KanDoku". In this presentation, I will not only recount some of the ideas behind this, but also try to further develop them and show in a new prototype how they can be put to use with current technology to develop a collaborative workspace for the reading, annotation and translation of (in this case) premodern Chinese, especially Buddhist texts.
Title: An open source, cross-platform and embeddable search engine for the Pali Tipitaka
Presenter: Yap Cheah Shen (Ksana Search Forge 剎那搜尋工坊)
Data format:
TEI conformance: the Tipitaka XML is provided by the Vipassana Research Institute (VRI) and is based on the Sixth Council (Chaṭṭha Saṅgāyana) edition. The internal encoding is Unicode Devanagari. http://www.tipitaka.org
Single database for multiple script display:
Devanagari, Roman, Cyrillic, Gujarati, Kannada, Malayalam, Myanmar, Sinhala, Bengali, Gurmukhi, Khmer, Telugu, Thai, Tibetan.
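Displaying one Devanagari-encoded database in many scripts implies transliteration on the fly. The sketch below shows the idea for Devanagari to Roman only, covering just the letters used in Pali. The mapping follows a common Pali romanization (ṃ for niggahita) and is an assumption, not the program's actual code; `deva_to_roman` is a hypothetical name:

```python
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "ळ": "ḷ", "व": "v",
    "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū", "ए": "e", "ओ": "o"}
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū", "े": "e", "ो": "o"}
VIRAMA = "\u094d"     # suppresses the inherent vowel
NIGGAHITA = "\u0902"  # anusvara, romanized as ṃ


def deva_to_roman(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in VOWEL_SIGNS:      # explicit dependent vowel
                out.append(VOWEL_SIGNS[nxt])
                i += 1
            elif nxt == VIRAMA:         # conjunct: no vowel after this consonant
                i += 1
            else:                       # otherwise the inherent 'a'
                out.append("a")
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch == NIGGAHITA:
            out.append("\u1e43")        # ṃ
        else:
            out.append(ch)              # spaces, punctuation, digits pass through
        i += 1
    return "".join(out)
```

Keeping a single canonical encoding and deriving every display script from it means corrections need to be made only once.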
Technical highlights:
Supported platforms: Windows 2000/XP/Vista/CE, Linux, FreeBSD, Mac OS X.
Embeddable: low memory footprint (less than 8 MB), no configuration needed, and a simple API for easy integration with other front-ends.
Supported media: CD-ROM/DVD (runs directly without installation), USB disk, flash disk.
Search Engine:
The search engine was originally designed for CJK texts but works well for English and Pali texts. We have tested the engine on databases of up to 10 GB (English Wikipedia) on small devices (the OLPC One Laptop per Child and the Asus Eee PC).
Word search: begins with, ends with, containing, regular expression
Advanced search (under development): phrase, AND, OR, NOT, NEAR
Search speed: most queries complete within 0.05 seconds (tested on an ordinary PC costing under $300 in 2007).
Indexing speed: 2 MB per second. The Tipitaka is about 300 MB and can be indexed in under 3 minutes (300 MB ÷ 2 MB/s = 150 s).
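The word-search modes are listed without implementation details. One standard technique, assumed here, keeps a sorted vocabulary so that "begins with" is a binary search, plus a second sorted list of reversed words so that "ends with" also becomes a prefix search; "containing" and regular-expression matching fall back to a scan. `WordIndex` is a hypothetical name and this is not the engine's actual code:

```python
import bisect
import re


class WordIndex:
    def __init__(self, words: list[str]) -> None:
        self.words = sorted(set(words))
        self.reversed = sorted(w[::-1] for w in self.words)

    @staticmethod
    def _prefix_scan(sorted_list: list[str], prefix: str) -> list[str]:
        # binary search to the first candidate, then walk while the prefix holds
        i = bisect.bisect_left(sorted_list, prefix)
        out = []
        while i < len(sorted_list) and sorted_list[i].startswith(prefix):
            out.append(sorted_list[i])
            i += 1
        return out

    def begins_with(self, prefix: str) -> list[str]:
        return self._prefix_scan(self.words, prefix)

    def ends_with(self, suffix: str) -> list[str]:
        hits = self._prefix_scan(self.reversed, suffix[::-1])
        return sorted(w[::-1] for w in hits)

    def containing(self, sub: str) -> list[str]:
        return [w for w in self.words if sub in w]

    def matching(self, pattern: str) -> list[str]:
        rx = re.compile(pattern)
        return [w for w in self.words if rx.search(w)]


index = WordIndex(["dhamma", "dhammapada", "buddha", "saṃgha"])
print(index.begins_with("dhamma"), index.ends_with("ha"))
```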
Development Roadmap
Granularity of tagging: refine the granularity to the paragraph and sentence level in order to create a parallel corpus with Tipitaka editions in Chinese and other languages.
More CPU platforms: our goal is a searchable Tipitaka on PDAs and smartphones.
License
The program is licensed under GPL 3.0; see http://gplv3.fsf.org/ for details. The Pali Tipitaka is maintained and released by the Vipassana Research Institute (VRI).