Technical notes on my code additions of 12 and 17 Feb 00
- Dumb UPSes were not working in the 3.7.0 release, see below.
  Both Riccardo and I are surprised that we didn't have any
  dumb testers :-(. Fortunately, we now have two volunteers
  (Al Tuttle and Christian Moeller) -- thanks guys.
- The TIMEOUT value on the slave could be shortened (trigger too
  early) by the value of NETTIME. The TIMEOUT timing now begins
  from the time the slave (or master) considers it is really
  on batteries.
Changes in this submission:
- Two users reported that dumb UPSes worked fine in 3.6.2, but not
  at all in 3.7.0. The main problem reported was that apcupsd
  reported loss of serial communications with the UPS.
- As best I can determine, there were four problems:
  1. The code has for a long time called smart_poll() even
     for dumb UPSes. Previously, this just caused a 5 second wait
     for a dumb UPS. However, I recently added code that checks
     for loss of communications, and the unanswered poll now looks
     like a lost link. Solution: prevent dumb UPSes
     from executing smart_poll(). Note, there were a number of
     different tests in the code for distinguishing SmartUPSes and
     dumb UPSes. I standardized on the following:

       if (ups->mode.type > SHAREBASIC)
          it is a smart UPS (character signalling)
       else
          it is a dumb UPS (ioctl signalling)

  2. A number of the serial port status flag tests were bitwise
     tests resulting in a zero or nonzero value, where the nonzero
     value was not 1. The code in apcaction.c then tested for
     zero or 1, possibly overlooking a valid status. I corrected
     the code in apcaction.c to check for zero or nonzero, rather
     than zero or 1. I also corrected the code in apcserial.c that
     picks up the flag bits to return a zero or one. In case you are
     not familiar with the following construct:

       variable = !!(flag & bit);

     this expression sets variable to zero if (flag & bit) is
     zero and sets it to 1 if (flag & bit) is nonzero. It could also
     be written as:

       variable = (flag & bit)?1:0;

     which some people prefer.
  3. A real show stopper was in the dumb UPS code in
     check_serial() in apcserial.c. The code picked up the serial port
     line flags, then did a read_andlock_shmarea(), which promptly
     erased the value of the serial port flags. The fix was to do
     the read_andlock_shmarea() before the ioctl().
  4. Under certain circumstances, which seem to depend on whether
     apcupsd was executed from a command line or from a script,
     restoring the serial port to its original state with a
     tcsetattr() call on termination of apcupsd would reset the
     status bits of the serial port, thus signalling the UPS to drop
     power. This caused the premature killpowers, and is apparently
     also the reason why there were previously a number of sleep()
     calls before the termination -- they simply masked the problem.
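
The status flag problem described in item 2 above can be reproduced in a
few lines of C. This is only a sketch: LINE_BIT, raw_test() and
normalized_test() are names made up for the illustration and are not taken
from apcupsd. Real line-status masks are platform-defined and are generally
not equal to 1, which is the whole point.

```c
/* Hypothetical mask for one serial line status bit (assumption: any
   single-bit value other than 0x01 shows the same effect). */
#define LINE_BIT 0x20

/* Raw bitwise test: yields 0 or LINE_BIT, never 1. */
static int raw_test(int flags)
{
    return flags & LINE_BIT;
}

/* Normalized test: the double negation collapses any nonzero
   result to exactly 1, and leaves zero as zero. */
static int normalized_test(int flags)
{
    return !!(flags & LINE_BIT);
}
```

With flags set to 0x24, raw_test() returns 32, so a caller comparing the
result against 1 misses a status that is actually set, while
normalized_test() returns 1. Either normalizing at the source or testing
for nonzero at the caller avoids the problem; the notes above describe
doing both.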
- I added a new -R option to the command line that puts a SmartUPS
  into dumb mode. I did not document it because it is primarily for
  testing. In my case, I don't have the correct cable, but at least,
  I can exercise the dumb mode code.
- I merged in the changes that Riccardo sent me for 3.7.1.
- I modified the message which is printed if a slave attempts to
  do a power kill, which is not possible, but happens because the
  install inserts the call in the halt script. The message now
  simply says that the killpower was ignored rather than printing
  FATAL ERROR. This is a little less disconcerting to the user.
- The nologin_file flag was passed from the master to the slave. It
  prevented the slave from setting a nologin file, so I removed it.
- I noticed several times that the TIMEOUT for the slave seemed
  to be incorrect -- that is, the slave shut down rather rapidly. It
  turns out that the time variables were reset only when a pass was
  made through do_action(). One pass is made each time the slave is
  contacted by the master, which is every NETTIME seconds. Consequently,
  if you had a NETTIME of 60, the slave TIMEOUT value could trigger up
  to 60 seconds before it should have. I corrected this by resetting the
  timeout values when the slave detects that it is REALLY on batteries,
  i.e. on the second on-battery signal.
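
The arithmetic behind the slave TIMEOUT problem above can be sketched with
a toy model. Nothing here is apcupsd code: struct slave, pass(),
timed_out() and shutdown_delay() are hypothetical names, the outage is
confirmed in a single pass rather than on a second signal, and the timeout
is assumed to be checked once per second.

```c
/* Toy model of the slave's timing state (hypothetical, not apcupsd). */
struct slave {
    long timer_start;   /* time from which TIMEOUT is measured */
    int  on_battery;    /* nonzero once the outage is confirmed */
};

/* One status pass at time `now`. `reset_on_confirm` selects the
   corrected behaviour; zero reproduces the old behaviour. */
static void pass(struct slave *s, long now, int power_lost,
                 int reset_on_confirm)
{
    if (!power_lost) {
        s->timer_start = now;       /* on mains: timer follows each pass */
        s->on_battery = 0;
    } else if (!s->on_battery) {
        s->on_battery = 1;
        if (reset_on_confirm)
            s->timer_start = now;   /* fix: start timing at confirmation */
        /* old behaviour: timer_start still holds the previous pass time */
    }
}

static int timed_out(const struct slave *s, long now, long timeout)
{
    return s->on_battery && (now - s->timer_start) >= timeout;
}

/* Seconds between confirmation of the outage and the TIMEOUT firing,
   when passes arrive every `nettime` seconds. */
static long shutdown_delay(long nettime, long timeout, int reset_on_confirm)
{
    struct slave s = { 0, 0 };
    long confirm = nettime;     /* pass at which the outage is confirmed */
    long now;

    pass(&s, 0, 0, reset_on_confirm);        /* last pass on mains */
    pass(&s, confirm, 1, reset_on_confirm);  /* outage confirmed   */

    for (now = confirm; !timed_out(&s, now, timeout); now++)
        ;
    return now - confirm;
}
```

In this model, with a NETTIME of 60 and a TIMEOUT of 300,
shutdown_delay(60, 300, 0) gives 240 seconds and shutdown_delay(60, 300, 1)
gives the full 300, which matches the "up to 60 seconds early" behaviour
described above.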