4.2: A File Structure for the Complex, the Changing, and the Indeterminate
[Source: AUG. 24/11:00-12:30/GOLD ROOM 84 • ACM 20th National Conference/1965 SESSION 4: Complex Information Processing T. H. Nelson Vassar College, Poughkeepsie, N.Y.]
THE KINDS OF FILE structures required if we are to use the computer for personal files and as an adjunct to creativity are wholly different in character from those customary in business and scientific data processing. They need to provide the capacity for intricate and idiosyncratic arrangements, total modifiability, undecided alternatives, and thorough internal documentation.
The original idea was to make a file for writers and scientists, much like the personal side of Bush's Memex, that would do the things such people need with the richness they would want. But there are so many possible specific functions that the mind reels. These uses and considerations become so complex that the only answer is a simple and generalized building-block structure, user-oriented and wholly general-purpose.
The resulting file structure is explained and examples of its use are given. It bears generic similarities to list-processing systems but is slower and bigger. It employs zippered lists plus certain facilities for modification and spin-off of variations. This is technically accomplished by index manipulation and text patching, but to the user it acts like a multifarious, polymorphic, many-dimensional, infinite blackboard.
The ramifications of this approach extend well beyond its original concerns, into such places as information retrieval and library science, motion pictures and the programming craft; for it is almost everywhere necessary to deal with deep structural changes in the arrangements of ideas and things.
I want to explain how some ideas developed and what they are. The original problem was to specify a computer system for personal information retrieval and documentation, able to do some rather complicated things in clear and simple ways.
The investigation gathered generality, however, and has eventuated in a number of ideas. These are an information structure, a file structure, and a file language, each progressively more complicated. The information structure I call zippered lists; the file structure is the ELF, or Evolutionary List File; and the file language (proposed) is called PRIDE.
In this paper I will explain the original problem. Then I will explain why the problem is not simple, and why the solution (a file structure) must yet be very simple. The file structure suggested here is the Evolutionary List File, to be built of zippered lists. A number of uses will be suggested for such a file, to show the breadth of its potential usefulness. Finally, I want to explain the philosophical implications of this approach for information retrieval and data structure in a changing world.
This work was begun in 1960, without any assistance. Its purpose was to create techniques for handling personal file systems and manuscripts in progress. These two purposes are closely related and not sharply distinct. Many writers and research professionals have files or collections of notes which are tied to manuscripts in progress. Indeed, often personal files shade into manuscripts, and the assembly of textual notes becomes the writing of text without a sharp break. I knew from my own experiment what can be done for these purposes with card file, notebook, index tabs, edge-punching, file folders, scissors and paste, graphic boards, index-strip frames, Xerox machine and the roll-top desk. My intent was not merely to computerize these tasks but to think out (and eventually program) the dream file: the file system that would have every feature a novelist or absent-minded professor could want, holding everything he wanted in just the complicated way he wanted it held, and handling notes and manuscripts in as subtle and complex ways as he wanted them handled.
Only a few obstacles impede our using computer-based systems for these purposes. These have been high cost, little sense of need, and uncertainty about system design.
The costs are now down considerably. A small computer with mass memory and video-type display now costs $37,000; amortized over time this would cost less than a secretary, and several people could use it around the clock. A larger installation servicing an editorial office or a newspaper morgue, or a dozen scientists or scholars, could cost proportionately less and give more time to each user.
The second obstacle, sense of need, is a matter of fashion. Despite changing economies, it is fashionably believed that computers are possessed only by huge organizations to be used only for vast corporate tasks or intricate scientific calculations. As long as people think that, machines will be brutes and not friends, bureaucrats and not helpmates. But since (as I will indicate) computers could do the dirty work of personal file and text handling, and do it with richness and subtlety beyond anything we know, there ought to be a sense of need.
Unfortunately, there are no ascertainable statistics on the amount of time we waste fussing among papers and mislaying things. Surely half the time spent in writing is spent physically rearranging words and paper and trying to find things already written; if 95% of this time could be saved, it would only take half as long to write something.
The third obstacle, design, is the only substantive one, the one to which this paper speaks.
Let me speak first of the automatic personal filing system. This idea is by no means new. To go back only as far as 1945, Vannevar Bush, in his famous article "As We May Think" (1), described a system of this type. Bush's paper is better remembered for its predictions in the field of information retrieval, as he foresaw the spread and power of automatic document handling and the many new indexing techniques it would necessitate.
But note his predictions for personal filing:
"Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
"It consists of a desk, and while it can presumably be operated from a distance, it is primarily the piece of furniture at which he works. On the top are slanting translucent screens, on which material can be projected for convenient reading. There is a keyboard, and sets of buttons and levers. Otherwise it looks like an ordinary desk.
"A special button transfers him immediately to the first page of the index. Any given book of his library [and presumably other textual material, such as notes] can thus be called up and consulted with far greater facility than if it were taken from a shelf. As he has several projection positions, he can leave one item in position while he calls up another. He can add marginal notes and comments, ...." (1, 106-7)
Understanding that such a machine required new kinds of filing arrangements, Bush stressed his file's ability to store related materials in associative trails, lists or chains of documents joined together.
"When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined ....
"Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space.
Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. It is more than this, for any item can be joined into numerous trails ....
"Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item ...." (1, 107)
Two decades later, this machine is still unavailable*. The hardware is ready. Standard computers can handle huge bodies of written information, storing them on magnetic recording media and displaying their contents on CRT consoles, which far outshine desktop projectors. But no programs, no file software are standing ready to do the intricate filing job (keeping track of associative trails and other structures) that the active scientist or thinker wants and needs. While Wallace (2) reports that the System Development Corporation has found it worthwhile to give its employees certain limited computer facilities for their own filing systems, this is a bare beginning.
Let us consider the other desideratum, manuscript handling. The remarks that follow are intended to apply to all forms of writing, including fiction, philosophy, sermons, news and technical writing.
The problems of writing are little understood, even by writers. Systems analysis in this area is scanty; as elsewhere, the best doers may not understand what they do. Although there is considerable anecdote and lore about the different physical manuscript and file techniques of different authors, literary tradition demerits any concern with technical systems as detracting from "creativity."
(Conversely, technical people do not always appreciate the difficulty of organizing text, since in technical writing much of the organization and phraseology is given, or appears to be.) But in the computer sciences we are profoundly aware of the importance of systems details, and of the variety of consequences for both quality and quantity of work that result from different systems.
Yet to design and evaluate systems for writing, we need to know what the process of writing is.
There are three false or inadequate theories of how writing is properly done. The first is that writing is a matter of inspiration. While inspiration is useful, it is rarely enough in itself. "Writing is 10% inspiration, 90% perspiration," is a common saying. But this leads us to the second false theory, that "writing consists of applying the seat of the pants to the seat of the chair." Insofar as sitting facilitates work, this view seems reasonable, but it also suggests that what is done while sitting is a matter of comparative indifference; probably not.
The third false theory is that all you really need is a good outline, created on prior consideration, and that if the outline is correctly followed the required text will be produced. For most good writers this theory is quite wrong. Rarely does the original outline predict well what headings and sequence will create the effects desired: the balance of emphasis, sequence of interrelating points, texture of insight, rhythm, etc. We may better call the outlining process inductive: certain interrelations appear to the author in the material itself, some at the outset and some as he works. He can only decide which to emphasize, which to use as unifying ideas and principles, and which to slight or delete, by trying. Outlines in general are spurious, made up after the fact by examining the segmentation of a finished work.
If a finished work clearly follows an outline, that outline probably has been hammered out of many inspirations, comparisons and tests.
Between the inspirations, then, and during the sitting, the task of writing is one of rearrangement and reprocessing, and the real outline develops slowly. The original crude or fragmentary texts created at the outset generally undergo many revision processes before they are finished. Intellectually they are pondered, juxtaposed, compared, adapted, transposed, and judged; mechanically they are copied, overwritten with revision markings, rearranged and copied again. This cycle may be repeated many times. The whole grows by trial and error in the processes of arrangement, comparison and retrenchment. By examining and mentally noting many different versions, some whole but most fragmentary, the intertwining and organizing of the final written work gradually takes place***.
Certain things have been done in the area of computer manuscript handling. IBM recently announced its "Administrative Terminal System" (5, 6, 7, 8), which permits the storage of unfinished sections of text in computer memory, permits various modifications by the user, and types up the final draft with page numbers, right justification and headers. While this is a good thing, its function for manuscripts is cosmetic rather than organizing. Such a system can be used only with textual sections which are already well organized, the visible part of the iceberg. The major and strenuous part of such writing must already have been done.
If a writer is really to be helped by an automated system, it ought to do more than retype and transpose: it should stand by him during the early periods of muddled confusion, when his ideas are scraps, fragments, phrases, and contradictory overall designs. And it must help him through to the final draft with every feasible mechanical aid -- making the fragments easy to find, and making easier the tentative sequencing and juxtaposing and comparing.
It was for these two purposes, taken together -- personal filing and manuscript assembly -- that the following specifications were drawn up.
Here were the preliminary specifications of the system:
It would provide an up-to-date index of its own contents (supplanting the "code book" suggested by Bush). It would accept large and growing bodies of text and commentary, listed in such complex forms as the user might stipulate. No hierarchical file relations were to be built in; the system would hold any shape imposed on it. It would file texts in any form and arrangement desired -- combining, at will, the functions of the card file, loose-leaf notebook, and so on. It would file under an unlimited number of categories. It would provide for filing in Bush trails.
Besides the file entries themselves, it would hold commentaries and explanations connected with them. These annotations would help the writer or scholar keep track of his previous ideas, reactions and plans, often confusingly forgotten.
In addition to these static facilities, the system would have various provisions for change. The user must be able to change both the contents of his file and the way they are arranged. Facilities would be available for the revising and rewording of text. Moreover, changes in the arrangements of the file's component parts should be possible, including changes in sequence, labelling, indexing and comments.
It was also intended that the system would allow index manipulations which we may call dynamic outlining (or dynamic indexing). Dynamic outlining uses the change in one text sequence to guide an automatic change in another text sequence. That is, changing an outline (or an index) changes the sequence of the main text which is linked with it. This would permit a writer to create new drafts with a relatively small amount of effort, not counting rewordings.
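Dynamic outlining as just described can be illustrated with a minimal sketch. The sketch is an assumption for illustration only and is not the zippered-list mechanism of the ELF; the names LinkedDraft, links, main_text and reorder are hypothetical. It assumes that each outline heading is linked to the passages filed under it, so that the sequence of the main text is always derived from the sequence of the outline.

```python
from dataclasses import dataclass


@dataclass
class LinkedDraft:
    """Hypothetical structure: an outline linked to the passages filed under it."""
    outline: list[str]             # headings, in the writer's current order
    links: dict[str, list[str]]    # heading -> passages of main text linked to it

    def main_text(self) -> list[str]:
        # The main-text sequence is derived from the outline, never stored separately.
        return [p for heading in self.outline for p in self.links.get(heading, [])]

    def reorder(self, new_outline: list[str]) -> None:
        # Accept only a rearrangement of the existing headings.
        if sorted(new_outline) != sorted(self.outline):
            raise ValueError("new outline must contain the same headings")
        self.outline = list(new_outline)


# Example data (illustrative only).
draft = LinkedDraft(
    outline=["Obstacles", "Memex", "Specifications"],
    links={
        "Obstacles": ["cost", "sense of need"],
        "Memex": ["associative trails"],
        "Specifications": ["index", "provisions for change"],
    },
)
before = draft.main_text()
draft.reorder(["Memex", "Obstacles", "Specifications"])
after = draft.main_text()   # the passages under "Memex" now come first
```

In this sketch the writer touches only the outline; the linked text follows it, which is the sense in which a new draft costs little effort apart from rewording.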
However, because it is necessary to examine changes and new arrangements before deciding to use or keep them, the system must not commit the user to a new version until he is ready. Indeed, the system would have to provide spin-off facilities, allowing a draft of a work to be preserved while its successor was created. Consequently the system must be able to hold several -- in fact, many -- different versions of the same sets of materials. Moreover, these alternate versions would remain indexed to one another, so that however he might have changed their sequences, the user could compare their equivalent parts.
Three particular features, then, would be specially adapted to useful change. The system would be able to sustain changes in the bulk and block arrangements of its contents. It would permit dynamic outlining. And it would permit the spin-off of many different drafts, either successors or variants, all to remain within the file for comparison or use as long as needed. These features we may call evolutionary.
The last specification, of course, one that emerged from all the others, was that it should not be complicated.
These were the original desiderata. It was not expected at first that a system for this purpose would have wider scope of application; these jobs seemed to be quite enough. As work continued, however, the structure began to look more simple, powerful and general, and a variety of new possible uses appeared. It became apparent that the system might be suited to many unplanned applications involving multiple categories, text summaries or other parallel documents, complex data structures requiring human attention, and files whose relations would be in continuing change.
Note that in the discussion that follows we will pretend we can simply see into the machine, and not worry for the present about how we can actually see, understand and manipulate these files. These are problems of housekeeping, I/O and display, for which many solutions are possible.
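The spin-off specification above, in which a draft is preserved while its successor is rearranged and the versions remain indexed to one another, can be sketched as follows. This is an illustrative assumption, not the ELF design: the class EvolvingFile and its methods are hypothetical names. The sketch assumes each entry is stored once under an identifier and that each draft is merely a sequence of identifiers, so that equivalent parts can be located in any version.

```python
class EvolvingFile:
    """Hypothetical sketch: entries stored once; each draft is a sequence of entry ids."""

    def __init__(self):
        self.entries = {}    # entry id -> text
        self.drafts = {}     # draft name -> list of entry ids, in draft order
        self._next_id = 0

    def add_entry(self, text):
        entry_id = self._next_id
        self.entries[entry_id] = text
        self._next_id += 1
        return entry_id

    def new_draft(self, name, entry_ids):
        self.drafts[name] = list(entry_ids)

    def spin_off(self, source, name):
        # Copy the sequence, not the text, so the source draft is preserved.
        self.drafts[name] = list(self.drafts[source])

    def move(self, draft, entry_id, new_position):
        # Rearrange one draft without touching any other.
        sequence = self.drafts[draft]
        sequence.remove(entry_id)
        sequence.insert(new_position, entry_id)

    def position_in(self, draft, entry_id):
        # Shared ids keep versions indexed to one another.
        sequence = self.drafts[draft]
        return sequence.index(entry_id) if entry_id in sequence else None


# Example data (illustrative only).
f = EvolvingFile()
a, b, c = (f.add_entry(t) for t in ["opening", "argument", "example"])
f.new_draft("draft-1", [a, b, c])
f.spin_off("draft-1", "draft-2")
f.move("draft-2", c, 0)   # the successor is rearranged; draft-1 is untouched
```

Because both drafts refer to the same identifiers, the user can ask where the "example" passage stands in each version. A rewording of an entry would appear in every draft in this sketch; keeping separate wordings per version would need a further facility that is not shown.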