Timestamp:
07/26/99 21:35:15 (25 years ago)
Author:
Joel Sherrill <joel.sherrill@…>
Branches:
4.10, 4.11, 4.8, 4.9, 5, master
Children:
220ad7d
Parents:
29e68b75
Message:

Patch from Eric Valette <valette@…> based on a tremendous
bug report from David Decotigny <David.Decotigny@…>:

During the last few days, I've been back working on RTEMS. Let me
remind you that RTEMS didn't boot on our (old) Dell P90 machines (ref:
PC 590) : we could only get a reboot out of them.

1/ The symptoms
---------------

Hopefully, the problem was rather deterministic. The stack couldn't be
written correctly : issueing one or more "push" would always push '0'
onto the stack. The way to solve this was to issue a "pop", such as
"pushl eax ; popl eax". After this "pop", the stack would be writeable
again.

BUT, it will be writable for 8 consecutive "push"s. After these 8
"push"s, the other "push"s are wrong again, and a blank push/pop is
needed.

Considering that the L1 cache lines of this pentium are 32 bytes long,
and that 8 long int are 32 bytes long too, it came to us that there
was a problem with the cache.

Actually, the bug of the push could be shown through memory accesses
directly : writing on an not-in-cache mem location would put 0 until
this mem location is accessed through a single "read". Then, the whole
cache line would be right again.

2/ The consequences
-------------------

Of course, that was the first thing that we've been able to observe ;)
RTEMS could not boot. Actually, when a "call" pushed 0 onto the stack,
the ret could only lead to raise an exception a bit later. Since, in
the early stage, the Interrupt vector points to 0, averything couldn't
get worse : triple fault + reboot.

3/ Explanation
--------------

This cache mechanism corruption only appeared after load_segment()
returned (through a jump). Investigating a bit further shows that this
appears /sometimes/ during the PICs initialization.

"Sometimes" proved to be "When writing something with the 4th bit of
%al set". That is "when writing 0x28 or 0xff" for example. Clearing
this bit would just make the things work right.

Actually, this isn't a bug in the proper PIC initialization (which is
quite academic). It came from the "delay" routine, which theoretically
does nothing but writing to an "inexistant" port (0xed), in order to
lose some time.

BUT, in the special case of our Dell P90, it appears that this 0xed
port does something cruel with the cache mechanism when its 4th bit
(aka bit 3 or 0x8) is set.

I didn't investigate this non-standard behaviour of the P90 any
further : I don't know if this is documented, or if it is just another
(known ?) bug of the early Pentiums. Just notice that we have 5 such
machines, and it has the same effect on the cache mechanism.


(No files)

Note: See TracChangeset for help on using the changeset viewer.