file: savesys.doc

#2THE PROPOSAL
#2to realize the copy-restore
#2procedure for PCs and work stations

A.E. Shevel
shevel@pnpi.spb.ru

1/4-92 2/4-92 3/4-92 20/4-92 12/5-92

In any computer system, even a personal computer, it is necessary to maintain the integrity of the user data. Many program systems, such as data bases, have their own ad hoc utilities to do this work. On many work stations, however, users work with more than data base management systems, and in that case care is needed to maintain the integrity of user data of every type. The standard way to do this is to use appropriate utilities such as BACKUP, various archivers and so on.

However, to maintain a reasonable level of reliability, several different copies of the data are required. It is recommended to have two or more copies. For instance, if yesterday you copied your data onto one set of cartridges, today you should copy your data onto another set. This holds if you work intensively enough that a copy is made every day. Often more than two sets of copies are needed to reduce the risk of loss to a minimum.

In general, to maintain the integrity of your data, a precisely specified procedure to copy and restore the data is required. Clearly, this procedure should keep a detailed protocol of every copy-restore (recover) operation, i.e. appropriate bookkeeping. The main scheme of this procedure is discussed below.

The copy-recover procedure must achieve three general aims. First, it should guarantee the highest possible level of data integrity. Second, the volume of regularly copied data should be reduced. Finally, the user should not have to remember the copy or recover sequence: it must be enough to know the procedure name and where the cartridges (diskettes) are.

THE MAIN FEATURES

The copy-recover system is a menu-driven program that copies and recovers (restores) a working area on a hard disk. A working area may be a logical disk [MS DOS only], a directory, or a list of logical disks [MS DOS only] or directories.

The copy-recover system maintains three levels of data copy:

- stable base copy (perhaps once a month or less often);
- base copy (perhaps once a week or less often);
- current copy (perhaps once a day or more often).

A stability check is done whenever the copy-recover system runs a STABLE BASE COPY or a BASE COPY. The stability check consists of computing file control sums and comparing them with the previous ones. Messages about unstable files or directories are issued and placed in the copy protocol.

Every type of copy process generates a copy protocol, which is written to the secondary storage media.

The copy-recover system guarantees high reliability of the user data on a hard disk while keeping the volume of data to be copied to a minimum.

The copy-recover system permits the use of any type of secondary storage: diskettes, tape cartridges, a robotic mass storage subsystem, network file-servers.

The copy-recover system generates a report on its own status.

It is possible to recover one file, a list of files, a directory or a list of directories.

The copy-recover system runs under Unix (or MS DOS).

The copy-recover system is driven by a control file, which is a part of the copy-restore system. The control file contains all control parameters. Any parameter may be changed by means of a special menu.

The copy-recover system is activated by a user call. The user types the name of the system: #2CRSYS1#0. The system then displays the main menu on the screen:

1. To start the copy process.
2. To start the recover process.
3. To start the check of secondary storage pieces.
4. To display reports on status and control parameters.
5. To change control parameters.
6. To exit from system.

THE DETAILS ON SECONDARY STORAGE

Suppose we have a set of secondary storage pieces (cartridges or diskettes). The copy-recover system has to be used to initialize these cartridges. The initialization process consists of several steps:

- check the quality of the cartridge;
- write a label on the cartridge.

A cartridge label looks like the following:

<#2UserName#0><#2number#0>

where:
#2UserName#0 is a user name;
#2number#0 is the cartridge number within this cartridge set.

The copy-recover system "knows" the number of cartridges in the copy-recover set. Hence every initialization increases this number by one, which may be done automatically. A cartridge may also be excluded from a copy-recover set, for instance because it has become faulty. A possible menu is:

1. To check cartridge.
2. To exclude cartridge.
3. To exit to main menu.

It is useful to add that the system has to inform the user automatically about the cartridges required for a copy or recover operation. The user does not need to know which cartridges are allocated to the CURRENT TIME COPY or the BASE COPY.

THE MAIN COPY-RESTORE SCHEME

On its first execution the procedure should do the #2STABLE BASE COPY (SBC)#0. This means the procedure copies all data from a work area of the Hard Disk (HD). A work area is defined by a parameter or by default. It may be any directory, a list of directories, a logical HD [MS DOS only] or the whole HD [MS DOS only]. The copy process consists of the following steps:

o compress the next file;
o compute the file control sum;
o create the next line in the copy protocol;
o if this copy is not the first SBC, compare with the previous SBC to check stability.

The copying is performed onto block media, for example diskettes, cartridges, a robotic mass memory subsystem or a network file-server.
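
The four copy steps listed above can be illustrated with a minimal Python sketch. The proposal does not fix the compression method, the control sum algorithm or the layout of a protocol line, so everything concrete here is an assumption: zlib compression, a CRC-32 control sum, a dictionary per protocol line, and the hypothetical names copy_step and stable_base_copy. A file is treated as "stable" when its control sum equals the one recorded in the previous SBC, which is one possible reading of the stability check.

```python
import zlib


def copy_step(name, data, previous_sums=None):
    """Process one file of a stable base copy; return (packed, record).

    name          -- file name, used as the key in the copy protocol
    data          -- file contents as bytes
    previous_sums -- {name: control sum} from the previous SBC,
                     or None if this is the first SBC
    """
    packed = zlib.compress(data)        # step 1: compress the file (zlib assumed)
    control_sum = zlib.crc32(data)      # step 2: control sum (CRC-32 assumed)
    record = {                          # step 3: next line of the copy protocol
        "name": name,
        "size": len(data),
        "packed_size": len(packed),
        "control_sum": control_sum,
        "stable": None,                 # None means "not compared"
    }
    if previous_sums is not None and name in previous_sums:
        # step 4: stability check against the previous SBC
        record["stable"] = previous_sums[name] == control_sum
    return packed, record


def stable_base_copy(files, previous_sums=None):
    """Apply copy_step to every entry of {name: bytes}.

    Returns (archive, protocol): the compressed files keyed by name and
    the list of protocol lines in name order.
    """
    archive, protocol = {}, []
    for name in sorted(files):
        packed, record = copy_step(name, files[name], previous_sums)
        archive[name] = packed
        protocol.append(record)
    return archive, protocol
```

In a second run, the control sums from the first protocol would be passed as previous_sums, and every record whose "stable" field is False would produce an instability message in the copy protocol.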
In any case the copy-restore system should know the volume available in secondary storage to perform all kinds of copies. All needed control data are placed in a special #2CONTROL FILE#0. At least two sets of SBC cartridges are required.

From time to time, as defined in the CONTROL FILE, the system should perform a so-called #2BASE COPY (BC)#0 of your data. The BC is performed as follows. If a file has already been copied to the SBC and has not changed since then, only a reference to the SBC is written to the BC instead of the file itself. If this BC is not the first one, neighbouring BCs should be compared to decide whether it is useful to do an SBC. Stability is also checked.

The period between neighbouring BASE COPIES is called the #2BASE COPY INTERVAL (BCI)#0. It is easy to understand that successive BCs should be made on different sets of cartridges. The number of different base copies is called the #2NUMBER OF BASE COPIES (NBC)#0.

As a rule, performing a BC is a complicated and time-consuming process, so the desire not to do it very often is obvious. Indeed the value of BCI may be large, for example one week or even one month. To keep the risk of loss to a minimum one should also have a so-called #2CURRENT TIME COPY (CTC)#0. The CTC copies only those fragments of user data which have changed since the time of the BASE COPY. No integrity check is performed during a CTC.

It is supposed that the volume in bytes of the CTC is less than the volume of the BC. The ratio of the CTC volume to the BC volume, expressed in percent, is taken into account; we shall call this value #2CCR#0. The BC is performed when the BCI has elapsed or when the CCR rises above an appropriate value, for instance 70%.

It is easy to see that if you have several sets of BC and CTC copies it is difficult to remember which set of cartridges should be used today. The same problem arises when restoring the user data from a range of SBC, BC and CTC copies.
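
The rule for choosing today's copy type can be written down as a short Python sketch. It assumes only what is stated above: the first run makes an SBC, a BC is due when the BCI has elapsed or the CCR exceeds a limit (70% in the example), and otherwise a CTC is made. The function and parameter names are hypothetical, and the decision to repeat an SBC after comparing neighbouring BCs is left out because the proposal does not specify it.

```python
from datetime import datetime, timedelta


def ccr_percent(ctc_bytes, bc_bytes):
    """CCR: ratio of CTC volume to BC volume, in percent."""
    if bc_bytes <= 0:
        raise ValueError("BC volume must be positive")
    return 100.0 * ctc_bytes / bc_bytes


def next_copy_type(now, last_base_time, bci, ctc_bytes, bc_bytes,
                   ccr_limit=70.0):
    """Return "SBC", "BC" or "CTC" for the copy to be made now.

    now            -- current time (datetime)
    last_base_time -- time of the last base copy, or None if no copy exists
    bci            -- BASE COPY INTERVAL (timedelta)
    ctc_bytes      -- volume of the latest CTC in bytes
    bc_bytes       -- volume of the latest BC in bytes
    ccr_limit      -- CCR threshold in percent (70 is the example value)
    """
    if last_base_time is None:
        return "SBC"                    # first execution: full stable base copy
    if now - last_base_time >= bci:
        return "BC"                     # the BCI has elapsed
    if ccr_percent(ctc_bytes, bc_bytes) > ccr_limit:
        return "BC"                     # the CTC has grown too large
    return "CTC"
```

With a BCI of one week, a CTC of 10 bytes against a BC of 100 bytes one day after the base copy gives "CTC", while a CTC of 71 bytes on the same day gives "BC".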
With this approach we need to develop a program system which is able to maintain the copy-restore scheme described above. This copy-restore program should support all the bookkeeping needed for that copying. For example, the system has to print the label of the next cartridge to be inserted to copy or restore the user data. The system should check every cartridge and inform the user about a wrong cartridge or a wrong sequence of cartridges.

Clearly this procedure must maintain complete information about all copies and must prepare a report at the user's request. To do that, the copy-restore procedure writes the copy protocol to the HD and to secondary storage every time it makes any copy.

THE RECOVER PROCESS

To recover the data the user calls the system and chooses the appropriate option in the main menu. After that the following menu may be shown:

1) recover the file(s)
2) recover the directory(ies)
3) recover the logical disk(s) [MS DOS only]
4) recover the HD [MS DOS only].

THE ENVIRONMENT

In which OS environment is such a procedure useful?

Firstly, we consider a PC under Unix (Xenix) with 1-4 users or more. Every user should therefore be able to make the needed copy without studying a many-paged manual. The user needs to know only how to call the procedure and where the cartridges are.

Secondly, we consider a work station under Unix or a Unix-like operating system.

If your computer is attached to a network, you may be able to use a centralized file-server or a robotic mass storage subsystem. In this case you do not need personal cartridges, and the centralized media may be used in the same manner as described above.

THE REMARKS ON TECHNICAL REALIZATION

Of course, the realization should be done with appropriate tools such as the WINDOW tools. It is supposed that this can contribute to portability.

----- The end of proposal -------