V.I. Benevelski, A.N. Lodkin, P.V. Neustroev, A.A. Oreshkin, and A.E. Shevel
shevel@pnpi.spb.ru
November 1992

#3The experience with a small transputer farm
#3for off-line computing

#2Abstract

Some time ago we began our transputer farm project with the help of the Institute for High Energy Physics in Zeuthen (Germany). This paper describes the main results.

-------------------------------------------------------

The need for high-speed computer systems is obvious for several simulation techniques and for data analysis in high energy physics (HEP). One approach is based on farming techniques. "Event parallelism" is a straightforward approach for physics experiments that produce data packets of many different coincidence events. Because the events are independent, sets of event data can be processed simultaneously without interprocessor communication. Each processor performs the same data treatment on a complete, private data set. In our case this processor farm approach has been used to build the transputer network.

There is a slightly different kind of program parallelism related to the method described above. Often one needs to calculate some value, or a range of values, as a function of a variable VAR. A copy of the program that computes the required values can be loaded into each of a number of microprocessors, and every copy then uses a different value of VAR. With 3 microprocessors the total computation time can decrease by a factor of three; with n microprocessors the calculation may be "sped up" by a factor of n. It is worth emphasizing that we make no assumptions about the type of the computation program.

It is well known that a computing installation specialized for a specific class of problems is cheaper than other approaches. As a result, many physics research laboratories around the world have built specialized computer installations to decrease the cost of an arithmetic operation.
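
The parameter-sweep parallelism described above (the same program on every processor, each copy with its own value of VAR) can be sketched in a few lines. The sketch below is only an illustration in Python, not the farm software: the names `compute`, `split_round_robin`, `run_farm` and `ideal_speedup` are hypothetical, and threads merely stand in for separate processors. It shows how the VAR values might be dealt out to n workers and why the ideal speed-up reaches n only when the values divide evenly.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(var):
    """Hypothetical stand-in for the physics program: any function of VAR."""
    return var * var + 1.0


def split_round_robin(values, n_workers):
    """Deal the VAR values out to n_workers private work lists."""
    return [values[i::n_workers] for i in range(n_workers)]


def run_farm(values, n_workers):
    """Each worker runs the same program on its own share of VAR values."""
    shares = split_round_robin(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # One job per worker; no communication between the workers.
        partial = list(
            pool.map(lambda share: [(v, compute(v)) for v in share], shares)
        )
    # Merge the per-worker results back into one table keyed by VAR.
    return {v: r for part in partial for v, r in part}


def ideal_speedup(n_values, n_workers):
    """Serial steps divided by the steps of the busiest worker.

    Assumes every value of VAR costs the same time and ignores
    communication and start-up costs.
    """
    busiest = -(-n_values // n_workers)  # ceiling division
    return n_values / busiest
```

With 9 values of VAR on 3 workers the ideal speed-up is 3; with 10 values it drops to 2.5, because one worker has to process 4 values while the others process 3.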
These installations, known as microprocessor farms, were based on various microprocessors [1-5]. Some years ago many laboratories began to use the Inmos microprocessor, the transputer [6-13]. This microprocessor is specially designed to serve both as a computing device and as a communication device. In this paper we assume the transputer T800. The T800 is a single-chip architecture that includes a 32-bit CPU, a floating point arithmetic unit, 4 Kbyte of internal memory and 4 serial communication channels (links).

Since the autumn of 1990 we have used the transputer farm (shown in figure 1), which consists of three transputer modules and the mother board named IDA (the modules and the mother board are produced in Zeuthen). Every transputer has a channel to send data to our mainframe EC-1046 (similar to the IBM/370-165). For this purpose we have developed a special transfer module that joins the transputer link to our telecommunication multiplexor controller [...], which is attached directly to the EC-1046 channel. The controller is a group device, i.e. up to 16 lines can be attached to it. Every line attached to the controller has its own EC-1046 channel address. In other words, every transputer has a separate channel address in the EC-1046. The real data transfer speed is about 80-100 KB/sec. The speed is limited by the controller, because the transputer link speed is about 2 Mbit/sec (roughly 250 KB/sec). In our opinion this relatively low speed did not prevent us from gaining experience in transputer farming.

The software that we have at hand assumes the transputer link channels to be 100 percent reliable. When the transputers are to be connected to a remote computer installation with long cables, we cannot make this assumption. In our opinion it is very important to be able to detect and recover from effects due to malfunctioning equipment, faulty or missing cables, etc.
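
The details of such error handling are not given in this paper. As a generic illustration only, and not the protocol used on the farm, the following Python sketch shows one common way to detect and recover from a damaged or lost transfer: each frame carries a sequence number and a CRC-32 checksum, and the sender retransmits until the frame arrives intact. The frame layout, the function names and the `channel` callable are all assumptions made for this example.

```python
import zlib


def make_frame(seq, payload):
    """Frame = 1-byte sequence number + payload + 4-byte CRC-32 of both."""
    body = bytes([seq & 0xFF]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_frame(frame):
    """Return (seq, payload) if the CRC matches, otherwise None."""
    if len(frame) < 5:
        return None
    body, crc = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return body[0], body[1:]


def send_reliably(payload, seq, channel, max_tries=3):
    """Send one frame, retransmitting until it is received intact.

    `channel` is a hypothetical callable that models the line: it takes
    the frame as sent and returns the frame as received (possibly
    damaged), or None if the frame was lost.
    Returns the number of transmissions that were needed.
    """
    frame = make_frame(seq, payload)
    expected = (seq & 0xFF, payload)
    for attempt in range(1, max_tries + 1):
        received = channel(frame)
        if received is not None and check_frame(received) == expected:
            return attempt
    raise OSError("transfer failed after %d attempts" % max_tries)
```

A channel that damages the first frame and passes the second makes `send_reliably` return 2; a channel that loses every frame makes it raise `OSError` after `max_tries` attempts, which is the point at which an operator or a diagnostic procedure would have to step in.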
In our case the data transfer between a transputer and the EC-1046 is performed by newly developed routines that implement a special protocol (not to be confused with the transputer link protocol). The routines make it possible to detect and recover from accidental equipment errors.

The actual layout of our transputer farm is shown in figure 1. The farm elements ("workers") are arranged in a linear chain, connected at one end to a "farmer" processor (IDA), which interfaces to the host computer, an IBM PC/AT. The host IBM PC has an 80 MB hard disk, a streamer magnetic tape, 4 MB of core memory, and 386/387 processors at 20/25 MHz.

   +------------+
   | IBM PC/AT  |
   | 80 MB HD   |  "host"
   | 4 MB core  |
   | 386/387    |
   | 20/25 MHz  |
   +-----+------+
         |
     +---+--+
     | IDA  |
     | T800 |  "farmer"
     | 2 MB |
     +---+--+
         |
"worker" |        "worker"     "worker"
    +----+---+   +--------+   +--------+
    | Node 1 |   | Node 2 |   | Node 3 |
    |        |   |        |   |        |
    |  T800  +---+  T800  +---+  T800  |
    |  8 MB  |   |  8 MB  |   |  8 MB  |
    +---+----+   +---+----+   +---+----+
        |            |            |
    +---+----+   +---+----+   +---+----+
    |  Link  |   |  Link  |   |  Link  |
    | module |   | module |   | module |
    |  # 1   |   |  # 2   |   |  # 3   |
    +---+----+   +---+----+   +---+----+
        |            |            |
        +------------+------------+
                     |
           +---------+---------+   +----------+
           | Telecommunication +---+ EC-1046  |
           |  multiplexor # 1  |   |          |
           +-------------------+   +-----+----+
                                         |
                                   +-----+----+
                                   | Magnetic |
                                   |   tape   |
                                   |   unit   |
                                   +----------+

Figure 1. The actual layout of the St.PNPI transputer farm.

Every "worker" runs several tasks. The main task is a physics task (Fortran code). The link task transmits data from the left transputer to the right transputer and vice versa (fig. 1). The link task also performs the data transmission between the main task and the transputer link. The data transmission is performed according to the Standard Protocol (SP protocol), which is available in ITOOLS.

On the EC-1046 side we have a special structure of transputer farm software running under VM/CMS.
Every transputer is served by a special transputer Virtual Machine (VM); these are named VM1, VM2 and VM3. A transputer VM contains the program that transfers data between a transputer and the VM, the error recovery procedures, the diagnostic procedures and so on. There is a main transputer VM (MTVM), which collects the data from VM1, VM2 and VM3. The MTVM can route the collected data to a real tape unit or to a pseudo tape unit on disk storage. The data are written to a single tape according to the FIFO rule.

The operation of this software is controlled by a special Transputer Operator VM (TOVM). To start the transputer farm software on the EC-1046, one should first start TOVM. TOVM supports a self-explanatory dialog and a help facility. Hence any operator is able to start a transputer session.

While learning to use the transputers we met some difficulties. The main difficulty is the following. The use of DOS leads to a situation in which any user farm session monopolizes the farm. This monopolization takes place regardless of the user's activity during the session.

#2The main results

The above transputer farm is placed on a table near the PC, in a VME crate. A range of benchmarks showed us that the farm throughput is 6-8 times higher than the EC-1046 throughput. On the other hand, in double precision arithmetic the throughput of the farm (three T-800s at 2.1 MWPS each) is approximately equal to that of the i486/33 MHz (see table 1).

                              Double precision     Linpack
   The computer device        Whetstone (MWPS)     (MFLOPS)

   Intel 80386+80387 20 MHz         1.5              0.2
   Transputer T-800  20 MHz         2.1              0.46
   Intel 80486       33 MHz         7.0              1.6
   Intel 80860       40 MHz        17.7              6.2

Table 1. The comparison of various computer devices in computing power.

In the first quarter of 1992 over 700 hours were used for the calculations in the continuing work [14].

One of the main results for us is the adoption of transputer technology in our institute. On the other hand, there are now many ways to build microprocessor farms [...].
Several of these ways involve the use of transputer technology. The computer industry now has large-scale production of TRAnsputer Modules (TRAMs). A short list of TRAM types is shown below:

   TRAM converter: transputer link - SCSI;
   TRAM converter: transputer link - RS-422(232);
   TRAM converter: transputer link - VME bus;
   TRAM converter: transputer link - Ethernet;
   TRAM computer : transputer - Intel 80860.

This list can be continued. In other words, transputer technology may be used as a common communication aid to concentrate the computing power of various microprocessors. With the above tools it is possible to build a computing farm with an appropriate cost/performance ratio. A possible computing farm structure is shown below in figure 2. The system software is a Unix-like operating system, which includes the languages C, Fortran and others, the X Window System (X11) user interface and so on.

Firstly, this farm (fig. 2) may be used as a reasonable on-line computing installation with good computing power. Secondly, this farm may be used for off-line analysis. Finally, the farm hardware configuration may be changed very quickly to meet new experimental demands.

+----------------------+    +----------------------+
| T-805 + i860 (16 MB) +----+ T-805 + i860 (16 MB) +--....
+-----------+----------+    +----------------------+
            |
            |
  +---------+---------+
  |   Work station    |
  +-------------------+

Figure 2. The scheme of the computing farm on i860s.

#2Conclusions

1. Most physics analysis programs can be accelerated by parallel processing.

2. A microprocessor farm based on transputer networks is a flexible tool for building a laboratory computing installation. In particular, the computing power may be increased step by step.

#2Acknowledgement

The authors would like to emphasize the valuable technical help of Helen Fotieva and Helen Shulyak.
We would also like to thank the first transputer farm user, S. Sherman, for useful discussions, and the other people who were interested in our project.

#2Literature

1. Martin Pohl
   Multiprocessors for high energy physics
   Computer Physics Communications 45 (1987) 47-60

2. Thomas Nash
   Event Parallelism: Distributed memory parallel computing for high energy physics experiments
   FERMILAB-Conf-89/120, May 1989

3. Paul B. Mackenzie
   Machines for Lattice gauge theory
   FERMILAB-Conf-125-T, May 1989

4. Peter S. Cuper
   Computing and Data Handling Recent Experiences at Fermilab and SLAC
   FERMILAB-Conf-90/79, April 9, 1990

5. D. Lord, A. Fucci, P. Sphicas, P. Favre, J.P. Ikonen, M. Koratzinos, H. Masuch, A. Paton, C. Pirotte and Tether
   CERN, Emulators and parallel-processing compute servers
   CERN/ECP 90-9, CERN/CN 90-25, November 1990

6. S. Booth, R.W. Dobinson, D.R.N. Jeffery, W. Lu, K.M. Storr, and A. Thornton
   An evaluation of the Meiko Computing Surface for HEP FORTRAN farming
   CERN-DD/89/13, May 2, 1989

7. Anthony J.G. Hay
   The role of MIMD arrays of transputers in computational physics
   Computer Physics Communications 56 (1989) 1-24

8. Jaap Hoek
   Use of attached transputers hardware to VAX's for offline analysis
   Computer Physics Communications 57 (1989) 503-504

9. J.M. Carter, M.G. Green and T. Medcalf
   Transparent use of transputers for off-line computation
   Computer Physics Communications 57 (1989) 495-498

10. L.W. Wiggers and J.C. Vermeulen
    The use of transputers in the ZEUS online system
    Computer Physics Communications 57 (1989) 316-320

11. C. Bizeau, A. Bogaerts, R.W. Dobinson, D.R.N. Jeffery, W. Lu, C. Parkman and Y. Perrin
    The use and possible abuse of transputers links
    Computer Physics Communications 57 (1989) 301-308

12. Henrik Kristensen
    Evaluation of the CD-TSE transputer development environment for Apollo workstation
    CERN Computing & Networks Division CN/90/1, January 1990, 10 p.

13. D. Fincham, P.J. Mitchell
    Multicomputer molecular dynamics simulation using distributed neighbour lists
    Preprint of Daresbury Laboratory DL/SCI/P730T, November 1990

14. A.A. Boloshov, V.V. Vereschagin, S.G. Sherman
    Pion-pion amplitude from pn->ppn reaction
    Nucl. Phys. A530, pp 660-678, 1991