ICL 2966 restoration progress during 2010

Below you will find updates on some of the projects currently in progress, in planning or completed by the Museum volunteers.

Each project has either a working group or project team assigned to do the work. Working groups are either managed in association with the CCS (Computer Conservation Society) or solely within the Museum.

 

Below you will see updates on the 2966 restoration project during 2010. To see the progress made in 2009 click here.

12/07/2010 update from Delwyn Holroyd

Following the head crash last week, I examined the affected head in more detail with a USB microscope, and concluded that it will need to be replaced. However, since we don't currently have the necessary re-alignment tools and the special disk pack required, this will have to wait. (We are hoping to acquire these tools.)

In the meantime, I transferred the reconditioned spindle from this drive into one of the others. I also replaced the drive motor bearings in the new drive and gave it a thorough clean. I checked the heads with the microscope and found they were very dirty - they will need cleaning before use. The drive was run for several hours without heads loaded to run in the new bearings.

Whilst that was happening, I turned my attention once again to the 7501 terminal. After checking the documentation in more detail it turns out it can't support UART-style communications on the modem port after all, which means it can't be directly interfaced to a standard PC serial port. The buffer chips on the interface boards do support async operation but the board is strapped for synchronous operation only, without start or stop bits. Instead the SYN character (16h) is used to achieve byte alignment at the beginning of each message. The next step will be to wire up a loopback plug to check that the comms are working, and then to construct a suitable interface board.
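As an illustration of what SYN-based byte alignment involves, here is a minimal Python sketch. It assumes bits arrive least-significant-bit first and that a single SYN (16h) marks the start of a message; both are assumptions made for illustration rather than facts taken from the 7501 documentation, and the function names are hypothetical.

```python
SYN = 0x16  # synchronisation character used to find byte boundaries


def byte_to_bits(value):
    """Return the 8 bits of value, least significant bit first (assumed order)."""
    return [(value >> i) & 1 for i in range(8)]


def bits_to_byte(bits):
    """Rebuild a byte from 8 bits given least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def find_alignment(bits, syn=SYN):
    """Return the index of the first bit after the first SYN, or None."""
    pattern = byte_to_bits(syn)
    for offset in range(len(bits) - 7):
        if bits[offset:offset + 8] == pattern:
            return offset + 8
    return None


def decode_message(bits, syn=SYN):
    """Align on SYN, then slice the remaining bits into whole bytes."""
    start = find_alignment(bits, syn)
    if start is None:
        return b""
    count = (len(bits) - start) // 8
    return bytes(
        bits_to_byte(bits[start + 8 * i:start + 8 * i + 8]) for i in range(count)
    )


# Five idle bits, then SYN, then two data bytes.
stream = [1] * 5 + byte_to_bits(SYN) + byte_to_bits(0x41) + byte_to_bits(0x42)
message = decode_message(stream)
```

With no start or stop bits there is nothing else in the bit stream to mark where a byte begins, which is why the receiver has to hunt for the SYN pattern at every bit offset.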


05/07/2010 update from Delwyn Holroyd

It's been a while since the last update because of VCF, but nonetheless some progress has been made, and some steps backward....

On the Saturday of VCF the machine decided not to play nicely: the store cabinet indicated a +5V fault (although there wasn't one, luckily), it refused to boot from the laptop interface, and later in the afternoon the OCP overheat warning came on, although once again I am not convinced - it didn't appear to be any warmer than normal. We switched off anyway to avoid any risk of damage.

Things were better on Sunday, with the machine deciding to boot again after I cleaned the contacts on the off-card connector linking the DCU to the laptop. I suspect the real problem here is marginal signal quality due to the construction technique of the interface board (point-to-point mod wire with no ground plane). There was no OCP overheat today, but the +5V fault warning was still present. This will be a fault in the monitoring board.

Just before VCF the power supply for the 7501 terminal was re-tested by Phil H and found to be working - taking it apart seems to have fixed it so possibly just reseating the PCB connectors was all that was necessary. The week after VCF the power connectors onto the backplane were cleaned and the unit re-assembled. To my surprise it appears to be working! It can't do much without having a control program loaded into it, but the ROM code does some self-tests and has store dump and alter functions and these were used to dump the ROM contents to screen. The next step is to set up the interface board for standard async RS232 comms (in a mainframe application it uses synchronous comms). Once this is done it should be possible to interface the terminal directly to a George 3 emulator running on a PC, and George will download the control program which turns it into a functional terminal.

I have now finished building the EDS80 interface board - this is properly constructed on a PCB and even uses surface mount technology: slightly incongruous but it's much easier to obtain 3V3 logic level differential transceivers in surface mount. On Sunday the board was hooked up to the working drive ready for initial testing. After fixing a problem with one of the ribbon cables I was able to issue a 'select' command to the drive, and the drive responded with status information and its 'selected' signal. The data clocks from the drive were present but free running at around 14MHz since no diskpack was loaded. It was a great relief to find that I hadn't made any errors in the pinout on the cables to the drive.

Here comes the bad news - the next step was to load the scratch pack and see if data could be read. The heads loaded ok and the data clock signals went down to around 9.6MHz, the frequency expected when the PLL is locked to the servo track on the diskpack. Before I could do anything else, I noticed a high-pitched noise from the drive followed immediately by a burning smell - the drive was spun down within seconds but I had just witnessed a head crash, something I've been incredibly paranoid about avoiding at all costs.

As the disk slowed down the cause was immediately apparent - the bottom guard platter was bent, and there was some dust evident on the disk surfaces. At this point I realised it was not the normal scratch diskpack - during VCF they had all been moved around and without thinking I had picked up the wrong one. I expect you can guess how annoyed I am with myself about this!

Examination showed the crash was on the bottom head, closest to the bent guard platter. It's possible this generated enough disturbance in the airflow to cause the problem, or it could have been simply down to the dust on the pack. It was time to follow the procedures in the drive manual for head crash recovery, and the drive and heads are now clean again, but there is a slight mark on the affected head. I will be seeking further advice on this before using this drive again.

The greatest irony was when I noticed the number on the diskpack casing: 666 - truly the devil's diskpack!


31/05/2010 update from Delwyn Holroyd

This week I made a concerted effort to find the missing OCP board that was indicated as possibly faulty in the diagnostic run of several weeks ago. Whilst comparing the contents of the spares box with the actual board numbers that should be in the machine, I discovered one of the board numbers I had written down is not actually part of the machine! Sure enough the bag was mislabelled, and it contained the board we have been looking for. However, there are still only 29 boards in the box out of 30 in the OCP upper platter, so one of the set is missing and nowhere to be found amongst the other spares.

Swapping this board didn't make any difference to the error messages reported on a normal boot, not too surprising as we know there were a number of faults reported in the diagnostic run.

I also spent some time trying to diagnose the store block fault, which is still present - but I was unable to make it fail using the store self-tester. This might indicate an addressing error - the store self-test writes the same data to each location, so would not pick up on this.
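The reasoning about addressing errors can be shown with a small simulation. This is a sketch under stated assumptions: the `FaultyStore` class, the stuck address line and the test patterns are all hypothetical and are not a model of the real store or its self-tester.

```python
class FaultyStore:
    """Toy memory whose address decoding can have address bits stuck low."""

    def __init__(self, size, stuck_low_mask=0):
        self.cells = [0] * size
        self.stuck_low_mask = stuck_low_mask

    def _decode(self, address):
        # A stuck-low address line makes two addresses select the same cell.
        return address & ~self.stuck_low_mask

    def write(self, address, value):
        self.cells[self._decode(address)] = value

    def read(self, address):
        return self.cells[self._decode(address)]


def constant_pattern_test(store, size, pattern=0x55AA55AA):
    """Write the same data everywhere, then read it all back."""
    for address in range(size):
        store.write(address, pattern)
    return all(store.read(address) == pattern for address in range(size))


def address_in_data_test(store, size):
    """Write each location's own address as data, then read it all back."""
    for address in range(size):
        store.write(address, address)
    return all(store.read(address) == address for address in range(size))


good = FaultyStore(16)
bad = FaultyStore(16, stuck_low_mask=0b0100)  # address bit 2 stuck low
```

Because every location holds identical data in the constant-pattern test, reading the wrong cell still returns the right value, so the fault goes unnoticed. A test that stores location-dependent data exposes it.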

Since we acquired the ICL 7501 terminal a month or so back I've been searching in the ICL archive at the museum for schematics, without any success. I had discovered that the related 7502 terminal processor has all its diagrams grouped into a 'machine logic set' under one document number, which I found referenced in a technical description. I suspected the same thing would apply to the 7501, but how to find it? As luck would have it, today I stumbled across the technical description for one of the 7501 boards (document number one higher than one I had already found), and this proved to reference the elusive machine logic set document, which contains the schematics for everything in the unit except for the Farnell-made power supply. This will greatly assist any fault-finding that might be necessary.


22/05/2010 update from Delwyn Holroyd

The ICL 7501 power supply has been returned to Phil H for more detailed examination, but we are somewhat hampered by not having any schematics or other information for it since it's a Farnell made unit. I've made contact with a company that specializes in old Farnell power supplies in the hope they can turn up some information on it.

I re-assembled the EDS80 drive motor I took apart last week with new bearings and tried it in a drive. As this seemed to work ok I then removed the noisy motor from the 'good' drive to replace its bearings. This one was extremely difficult to get apart - after some work with the rubber mallet the brake assembly finally came off the drive shaft (it just lifted free on the first motor...) and after quite a lot more persuasion the other parts were finally separated. Re-assembly was much more straightforward, and the drive now runs like new.

Work is also progressing on the design of a drive interface rig. This will allow the EDS80 drive to be controlled directly to read disk contents at the lowest possible level, in order to secure the data.


16/05/2010 update from Delwyn Holroyd

The working EDS80 drive was run once again for a number of hours without heads loaded to continue the spindle bearing run-in. The bearings in the drive motor are very noisy, and this is the case with most of the other motors too. I've disassembled one and ordered new bearings. Once this job is done the drive should run as quietly as when it was new! This is important because it will enable us to hear any unusual noises coming from the drive which might indicate an impending head crash.

The power supply for the ICL 7501 terminal was refitted to the logic chassis having been checked on the bench, and cables made to power the logic chassis and fans independently from the rest of the monitor. Under normal load the main +5V supply was fine but the +-12V and -5V supplies went out of spec, so it was quickly turned off again. This will now require a more thorough examination with all voltage rails under a representative load.


10/05/2010 update from Delwyn Holroyd

Some major progress to report... a serviceable EDS80 disk drive and the first run of the engineers diagnostic software on the machine.

The bearings have been replaced in one of the most seized up spindles, chosen because there was a risk that trying to dismantle it could have caused damage. This has now been fitted to the drive that the system was booted from just before its spindle started to make unpleasant rattling noises. At the time we didn't know the construction of the spindle (there are no diagrams because it wasn't intended to be a field serviceable part) and it was unclear exactly what these noises might indicate.

As it turns out the spindles contain two sealed ball-bearings of a standard type which are easy to source. The main difficulty lies in the amount of force required to remove the pulley from the shaft. The lower bearing and pulley resist the pressure of a spring that pre-loads both bearings and eliminates any play in the assembly, and therefore have to be a very tight fit. The rattling noises are due to the bearings starting to break up because the 'for life' lubrication has degraded. If ignored this could lead to a catastrophic bearing failure and probably a head crash.

Another issue with replacing the drive spindle is getting the alignment correct. There was a special alignment tool for this, but we don't have one. The ICL engineer who maintained the system at Tarmac told us the alignment wasn't as critical as the maintenance instructions imply, and this has proven to be the case. I aligned the spindle essentially by eye to score marks on the drive casting marking the original position. The track positions are located by pre-recorded servo information on one of the surfaces, so it is only necessary to ensure that the heads move on a path passing through the centre of the spindle, such that the tracks run perpendicular to the heads.

After a run in of the new bearings without heads loaded, the drive was run with a scratch pack and heads loaded for some time without issue.

The next step was to load the engineers disk pack, which had not been done before. It proved to be in good condition and loaded ok. The machine booted from it happily and started to run the diagnostic test suite. The first part of this does detailed tests on the DCU, which all passed. Further tests identified faults in SCU couplers and in the OCP. One of the store blocks is also failing - a new fault. The fault codes can be checked against a listing which identifies the most likely board responsible. Unfortunately the first OCP fault indicates a board that is mysteriously absent from the box which contains an otherwise full set of spare boards - further searching will be required!

It's too risky to repeatedly load the engineers pack for diagnostics until the data on it has been secured, and this is now the most urgent task. The fact that this pack is readable is very good news indeed for the restoration.


26/04/2010 update from Delwyn Holroyd

The museum has recently acquired an ICL 7501 terminal on loan from the Jim Austin Computer Collection. Once restored, we intend to connect it to the 2966 as a user terminal. It's typical of the type of end user equipment used on ICL mainframe systems in the early 1980s.

ICL mainframes required terminals implementing proprietary communications protocols such as ICLC01 and ICLC03. Unlike Unix systems which use relatively dumb character based terminals, on an ICL system a complete message is constructed by the user on the terminal and then sent to the mainframe. This means the terminal needs to directly support cursor movement and message editing. It also has facilities for dividing the screen into protected and unprotected fields, typically used to display a form with areas for the user to fill in. Messages could even be validated by the terminal prior to sending to the mainframe, for instance checking that only numeric characters are entered in a particular field.
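The idea of a terminal validating a form before sending a complete message can be sketched as follows. This is a generic illustration only: the `Field` structure, the `|` separator and the function names are hypothetical and do not represent the ICLC01 or ICLC03 protocols.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    protected: bool = False     # protected fields are display-only labels
    numeric_only: bool = False  # validation rule applied by the terminal
    value: str = ""


def validate_form(fields):
    """Return the names of unprotected fields that fail validation."""
    errors = []
    for field in fields:
        if field.protected:
            continue
        if field.numeric_only and not field.value.isdigit():
            errors.append(field.name)
    return errors


def build_message(fields):
    """Build one complete message from the user-entered fields."""
    errors = validate_form(fields)
    if errors:
        raise ValueError("invalid fields: " + ", ".join(errors))
    # The separator is an arbitrary choice for this sketch.
    return "|".join(field.value for field in fields if not field.protected)


form = [
    Field("label", protected=True, value="ACCOUNT NUMBER:"),
    Field("account", numeric_only=True, value="12345"),
    Field("surname", value="SMITH"),
]
```

Nothing is sent until the whole form passes validation, in contrast to a character-based terminal which passes each keystroke to the host.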

The 7501 is an integrated version of the earlier 7502 comms controller and a 7561 video terminal (the type used on the 2966 SCP operating station): instead of having the separate 7502 cabinet containing the controller logic it's built into the base of the terminal itself, resulting in a somewhat taller unit than the 7561 with a row of switches and LEDs below the screen.

Much of the controller logic is also shared with the SCP, with the familiar Minicom processor also found in the DCU, the modem board and memory boards in common. The main difference is the video display board which supports an 80-column display rather than the 40-column deemed more appropriate for system operators.

The 7500 series terminal controllers required 'teleloading' to obtain their control programs. The built-in ROM code has just enough intelligence to request a teleload from the mainframe, which then downloads the required program. As a consequence these systems do a good impression of being completely non-functional until this has happened. We'll be able to test this procedure under George 3 emulation on a PC: readers of this page will realise the 2966 is not quite up to the job yet! Luckily the required teleload utilities and control programs have survived in a dump of a George 3 filestore.

Very little 7500 series terminal equipment seems to have survived, so we are always on the look out. If you know where there are any of these distinctive orange terminals, or even the older blue and grey 7181 terminals, please get in touch with the museum.


07/03/2010 update from Delwyn Holroyd

The failed DCU power supply gave an opportunity to do some spring cleaning around the 2966 area last week, but this week I was able to resume work on the machine. Many thanks to our resident power supply expert Phil H for examining and testing the spare -5V supply: although it looked bad on the outside thankfully it was clean on the inside and proved to work. This has now been fitted in the machine. Meanwhile armed with some new LM311s Phil was able to repair the other unit and this will now be the spare.

I first of all checked that we hadn't suffered any more regressions: the machine still boots to the same extent it did before, and the store is still working.

The main task of the day was cleaning all the board edge connectors in the OCP (or CPU in today's terminology). It's not clear when this was last done, and the maintenance logs for the system show it was a fairly routine operation which frequently 'cured' faults (although whether this was down to the cleaning or the physical movement of the boards is open to debate). This revealed that the clock distribution board in the scheduler wasn't actually plugged in, which clearly wouldn't have been helping matters! I also confirmed that all the boards were in the correct slots.

Unfortunately, none of this changed the fault condition at all, so no easy short-cut in the diagnostic process!

The OCP is by far the most complex part of the system. Unlike the rest of the system it's built using ECL (emitter coupled logic) technology, and consists of sixty individual boards mounted on two backplanes. ECL is much faster than TTL, but consumes a great deal more power. The OCP doesn't obey 2900 target level instructions directly, instead it has a microcoded instruction set known as MICOS II, aided by the scheduler which breaks down the target level instructions into one or more microcode 'tasks'. This makes it fairly easy to emulate other instruction sets: 1900 and System 4 were supported (our machine has a 1900 decoder board). The basic clock beat is 80ns (12.5MHz) although some steps occur at 40ns. It has a pipelined architecture which allows one microcode instruction to be completed every clock beat. Target level instructions take a variable number of clocks depending on how complex they are. Most data paths are 32-bit, with 36-bit extensions in some places, and also support for efficiently converting to and from the 24-bit 1900 architecture.
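The relationship between the clock beat and the quoted frequency is simple arithmetic, shown here as a short Python sketch (the function name is just for illustration).

```python
def beat_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


main_beat = beat_frequency_mhz(80)  # the basic 80ns clock beat
half_beat = beat_frequency_mhz(40)  # the steps that occur at 40ns

# With one microcode instruction completed per beat, the peak rate in
# millions of microinstructions per second equals the beat frequency in MHz.
peak_micro_mips = main_beat
```

Target level instructions take several beats each, so the rate at which they complete is lower than this peak figure.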

Given the current completely non-functional state of the OCP, and without the aid of the diagnostic software it's difficult to know where to start. Over the last couple of months I've been scanning and studying all the detailed reference documentation from aperture cards in the archive. Armed with this knowledge the diagnostic registers are starting to make sense, but there's still a lot to learn!


21/02/2010 update from Delwyn Holroyd

Early in the day there was some difficulty in booting via the laptop interface, with the system indicating parity errors on the interface. This has happened before when the system is cold, but normally clears after a few attempts. Today after a few dozen attempts it was clear it wasn't going to. Cleaning the contacts on the off-card connector for the interface cable eventually cured it, and afterwards it worked reliably.

There were no problems with the store, and all blocks passed self-test again.

Unfortunately the -5V power supply in the DCU cabinet then chose to die, and the only spare looked in a very sorry state, so no more work can be done on the main cabinets until these have been looked at by our PSU expert Phil H.


15/02/2010 update from Delwyn Holroyd

The objective for this week was to get the store working, and I'm happy to say this was achieved.

Following the board replacements last time the store self-tester worked as expected, and soon showed that two of the four sub-stores had different stuck bits when reading back from any memory location. To narrow down the fault I swapped two of the sub-store control boards to see if the fault followed - however the fault actually disappeared! Reseating the control board for the other faulty sub-store also cured that fault. Presumably these are dry joints and I expect we haven't seen the last of them, but at least the cause should be clear if/when they do re-occur.
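One common way to identify stuck bits is to write all zeros and all ones and compare what comes back. The sketch below illustrates that general technique; it is not a description of how the store self-tester works internally, and the 32-bit width and the `read_back` callback are assumptions.

```python
def find_stuck_bits(read_back, width=32):
    """Return (stuck_high, stuck_low) masks for one memory location.

    read_back(pattern) is assumed to write pattern to the location and
    return the value subsequently read from it.
    """
    mask = (1 << width) - 1
    after_zeros = read_back(0)
    after_ones = read_back(mask)
    stuck_high = after_zeros & mask  # bits that read 1 although 0 was written
    stuck_low = ~after_ones & mask   # bits that read 0 although 1 was written
    return stuck_high, stuck_low


def faulty_location(pattern):
    """Simulated location with bit 4 stuck high and bit 1 stuck low."""
    return (pattern | 0x10) & ~0x02 & 0xFFFFFFFF


stuck_high, stuck_low = find_stuck_bits(faulty_location)
```

If the same masks appear at every location in a sub-store, the fault is more likely to lie in shared logic such as a control board than in an individual memory device.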

With a working store, the next task was to start replacing the boards swapped out last time to identify which actually had faults. It became clear that at least one fault was associated with the cabling between the second and third cabinets - the interface between the store and the coupler in the SCU. It now seems likely that some of the symptoms of the mysterious faults last time were cured by reseating the cabling. The cables are very solidly made woven ribbon with a sealed termination onto a standard header, and look to be in good condition. Hopefully contact cleaner will resolve any lingering reliability issues with these, although it's possible the terminations have degraded.

During the board replacement process, another of the sub-stores started to fail intermittently, and then permanently. This time when the control board was exchanged with another the fault followed, and the faulty control board was replaced with a spare.

Although the day ended with a full 8MB of working store, it's likely some of the intermittent faults will return. Hopefully they will become permanent, which makes them a lot easier to find!


03/02/2010 update from Delwyn Holroyd

After swapping many boards in the SCU, the cause of last week's fault was traced to the SM64 control module in the store cabinet. The behaviour during store initialisation is now different in several respects: not only does it take much longer, but the SCP configures the store into 'non-interleaved' mode.

The store module consists of four sub-stores, each divided into logical blocks. The sub-stores are normally operated in parallel, or 'interleaved' to speed up accesses to store - modern server motherboards use a similar scheme to increase memory bandwidth. If there are faults in one or more sub-stores the system can instead fall back to non-interleaved mode. Individual logical store blocks can also be marked bad and the system will avoid using them. A diagnostic status register indicates which store blocks are good.
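The difference between the two modes can be sketched as an address mapping. This assumes simple low-order interleaving on a word address, which is the textbook scheme; the actual mapping used by the 2966 store is not stated here, and the function names are hypothetical.

```python
SUB_STORES = 4


def interleaved(address):
    """Consecutive addresses rotate around the sub-stores.

    Returns (sub_store, offset_within_sub_store).
    """
    return address % SUB_STORES, address // SUB_STORES


def non_interleaved(address, words_per_sub_store):
    """Each sub-store holds one contiguous range of addresses."""
    return address // words_per_sub_store, address % words_per_sub_store


# Which sub-store serves each of the first eight addresses in each mode.
interleaved_order = [interleaved(a)[0] for a in range(8)]
contiguous_order = [non_interleaved(a, 4)[0] for a in range(8)]
```

In the interleaved mapping a run of sequential accesses is spread across all four sub-stores so they can work in parallel, whereas in the non-interleaved mapping a faulty sub-store only affects one contiguous range of addresses.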

Prior to the most recent fault, the 'good block' register had a consistent but unexpected collection of bits set. The machine only has a half-populated store module (8MB, maximum is 16MB) so all the bits should be set in one half of the register, but this was not the case. This behaviour together with some other oddities had made me suspect the store wasn't previously functioning properly at all.

With the replacement boards, the set bits are now all in one half as expected, and it appears that two of the four sub-stores are not functioning. This is of course entirely believable!
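Interpreting a 'good block' register of this kind amounts to comparing it against the mask expected for the populated half. The sketch below assumes a 16-bit register with one bit per block and the populated half in the low-order bits; the real register width and bit assignment are not given here, so these are illustrative assumptions only.

```python
def expected_good_mask(populated_blocks):
    """Mask with one bit set for each block that is physically fitted."""
    return (1 << populated_blocks) - 1


def failing_blocks(register_value, expected_mask):
    """Blocks that are fitted but not marked good."""
    missing = expected_mask & ~register_value
    return [bit for bit in range(expected_mask.bit_length()) if (missing >> bit) & 1]


def unexpected_blocks(register_value, expected_mask):
    """Blocks marked good although nothing should be fitted there."""
    extra = register_value & ~expected_mask
    return [bit for bit in range(extra.bit_length()) if (extra >> bit) & 1]


half_populated = expected_good_mask(8)  # 8 of a hypothetical 16 blocks fitted
```

Bits set outside the expected half, as in the earlier readings, suggest the register contents cannot be trusted, while missing bits inside the expected half point to genuinely failing blocks.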

We don't have nearly as many spares for the SCU and Store modules as for the DCU, so repairs on failed boards will be necessary. Away from the museum work is continuing on building an expanded board test rig. The new test rig will also be compatible with DCU boards, but will have a larger number of I/O channels to interface with the SCU boards. Work is also continuing on scanning the relevant technical descriptions and logic diagrams from aperture cards.


25/01/2010 update from Delwyn Holroyd

Last week the boot process started to fail with an error message 'Invalid SCU coupler type', which I thought probably referred to an incorrect entry in the configuration file that had just been sent to the SCP at that stage - the file hadn't changed but possibly it was being corrupted on the way. In view of this and other strange behaviour seen last week I swapped the processor board in the SCP (system control processor), but to no avail. I also swapped the serial interface boards at each end of the link between the SCP and the DCU to eliminate that as a source of corruption (somewhat unlikely, given that the SCP's control program had just been loaded successfully via the same route).

A bit more digging using diagnostic commands from the SCP showed that it wasn't possible to access any of the coupler registers in the SCU. This is done via the DCM (Diagnostic Control Module) which is attached to another serial interface on the SCP. In addition to their normal operation, all the registers in the SCU and its couplers are connected together serially to form a number of loops. To read a register, the relevant loop is 'spun' so that the required bits are loaded serially into a buffer in the DCM. To write a register, bits are loaded from the buffer into the loop.
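The loop mechanism can be modelled in a few lines of Python. This is a simplified sketch: the `ScanLoop` class, the register names, their widths and the bit ordering are all hypothetical, and a real DCM would only need to buffer the bits of interest rather than a whole revolution.

```python
from collections import deque


class ScanLoop:
    """Toy model of registers chained serially into one diagnostic loop."""

    def __init__(self, registers):
        # registers maps a name to a (width, initial_value) pair.
        self.layout = {}
        bits = []
        for name, (width, value) in registers.items():
            self.layout[name] = (len(bits), width)
            bits.extend((value >> i) & 1 for i in range(width))
        self.bits = deque(bits)

    def spin(self, new_bits=None):
        """Shift the loop through one full revolution.

        Every bit passes the tap point and is captured, then recirculated.
        new_bits optionally maps loop positions to replacement bit values.
        """
        captured = []
        for position in range(len(self.bits)):
            bit = self.bits.popleft()
            captured.append(bit)
            if new_bits is not None and position in new_bits:
                bit = new_bits[position]
            self.bits.append(bit)
        return captured

    def read(self, name):
        start, width = self.layout[name]
        captured = self.spin()
        return sum(captured[start + i] << i for i in range(width))

    def write(self, name, value):
        start, width = self.layout[name]
        self.spin({start + i: (value >> i) & 1 for i in range(width)})


loop = ScanLoop({"property_code": (8, 0xA5), "status": (4, 0x3)})
```

Reading leaves the loop contents unchanged because every bit is recirculated, while writing substitutes new bits only at the positions belonging to the chosen register.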

After some fruitless attempts to read from coupler registers, it spontaneously started to work again! The boot now progressed beyond the invalid coupler type error, so this was clearly referring to a failed attempt to read the property code from a coupler (which identifies the coupler type). After a power cycle of the SCU cabinet, the registers were no longer accessible and we were back where we started. I suspect the cause of this fault probably lies within the DCM.

Even whilst the coupler registers were accessible, the boot process still failed at the store initialisation stage, and this time the 'SCU Reset' trick from last week didn't help. Unfortunately I have to conclude this was a red herring, and there is probably another intermittent fault that just happened to be taking a break last week.

I refitted a repaired 5V/150A power supply module in the DCU cabinet, testing it first with no load and then with a partial load (only some of the logic boards plugged in). Finally with all boards plugged in I balanced the three 5V supply modules so that they were sharing the load equally. Thanks to Phil H for the repair, which involved replacing a failed IC in the switching circuit.


18/01/2010 update from Delwyn Holroyd

I arrived at the museum fully prepared for a day of debugging the store, this being the area where the boot process has been failing. Before getting into that, I realised that following power-up the 'store running' indicator light on the SCU control panel was not illuminated, and a manual activation of the 'SCU Reset' control was necessary. When I attempted to boot again the store initialized successfully! The following steps loaded the initial OCP microprogram and started it running. At this point it stopped with another error, saying the OCP is faulty... unfortunately activating the OCP Reset control didn't help: not all problems are so easily cured!

Not having prepared for conducting OCP diagnostics, I decided to verify diagnostic access to the store from the SCP console. Even before the system has fully booted, diagnostic commands can be entered on the SCP in engineers mode. These allow access to the internal state of the SCU and its couplers, main store and OCP. Enough of the registers were behaving as expected to convince me that the diagnostic interface was working, but a number of things did not behave as documented in the fault-finding and reference guides. It could be that the documentation doesn't match the hardware, or it could be a side-effect of a more subtle fault.

The store is divided into blocks, and part of the initialization procedure runs a self-test to validate each block and mark it as valid in a diagnostic register. From this register it appears that several store blocks are failing. A lot of further investigation will be required.

Towards the end of the afternoon, the boot process stopped working altogether, with a consistent error occurring at a much earlier stage. I suspect the problem is once again the serial interface between the DCU and the SCP: the original boards at both ends of the link had faults and were swapped out just before Christmas.


10/01/2010 update from Delwyn Holroyd

Over the Christmas break I spent some time examining a CME installation tape which I have in virtual form as a file on a PC. CME (Concurrent Machine Environment) allows the system to host VME (the native 2900 operating system), and a 1900 operating system at the same time. The installation tape contains a package of microcode for the machine, and it proved possible to extract all the necessary IPL elements from this to construct a virtual IPL tape. Most of the effort was spent in analyzing the first and second level bootstrap programs for the DCU to figure out what format they expect to find on the tape, and the commands being sent to the tape deck. I then adapted the software I wrote at the end of last year for the PC interface to make it emulate a tape deck in the required fashion. I tested it this week and successfully booted the system to the same point we had reached when booting from disk.

This is great news as it allows fault finding to proceed on the SCU and OCP without needing a working disk drive, and without any risk to our valuable bootable disk packs.

I also removed the spindle from one of the EDS80 drives for further examination.


Article created on the 12/07/2010

