# Read training

## Introduction

This chapter explains the read training sequence performed during Sandy Bridge
and Ivy Bridge memory initialization.

Read training compensates for the skew between DQS and SCK and finds the
smallest supported roundtrip delay.

Every board has a vendor-dependent routing topology and can be equipped with
any combination of DDR3 memory modules, which introduces different skew
between the memory lanes. DDR3 introduced the "Fly-By" routing topology,
which accounts for the largest part of the DQS-SCK skew.
The memory code measures the actual skew and activates delay gates
that compensate for it.

During read training, the DRAM and the controller are placed in a special mode.
On every read instruction the DRAM outputs a predefined pattern and the memory
controller samples the DQS after a given delay. As the pattern is known, the
actual delay of every lane can be measured.

The values programmed in read training affect DRAM-to-MC transfers only!

## Definitions
```eval_rst
+---------+-------------------------------------------------------------------+------------+--------------+
| Symbol  | Description                                                       | Units      | Valid region |
+=========+===================================================================+============+==============+
| SCK     | DRAM system clock cycle time                                      | s          |              |
+---------+-------------------------------------------------------------------+------------+--------------+
| tCK     | DRAM system clock cycle time                                      | 1/256th ns |              |
+---------+-------------------------------------------------------------------+------------+--------------+
| DCK     | Data clock cycle time: The time between two SCK clock edges       | s          |              |
+---------+-------------------------------------------------------------------+------------+--------------+
| timA    | IO phase: The phase delay of the IO signals                       | 1/64th DCK | [0-512)      |
+---------+-------------------------------------------------------------------+------------+--------------+
| SPD     | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes      |              |
+---------+-------------------------------------------------------------------+------------+--------------+
| REFCK   | Reference clock, either 100 or 133                                | MHz        | 100, 133     |
+---------+-------------------------------------------------------------------+------------+--------------+
| MULT    | DRAM PLL multiplier                                               |            | [3-12]       |
+---------+-------------------------------------------------------------------+------------+--------------+
| XMP     | Extreme Memory Profiles                                           |            |              |
+---------+-------------------------------------------------------------------+------------+--------------+
| DQS     | Data Strobe signal used to sample all lane's DQ signals           |            |              |
+---------+-------------------------------------------------------------------+------------+--------------+
```
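
The 1/256th ns unit used for tCK can be unfamiliar, so here is a minimal
conversion sketch. The helper names are hypothetical and only illustrate the
unit; they are not taken from the memory initialization code.

```python
# Hypothetical helpers illustrating the tCK unit from the table above.
TCK_UNITS_PER_NS = 256  # tCK is expressed in 1/256th ns


def tck_to_ns(tck):
    """Convert a tCK value in 1/256th ns units to nanoseconds."""
    return tck / TCK_UNITS_PER_NS


def ns_to_tck(ns):
    """Convert a cycle time in nanoseconds to 1/256th ns units (rounded)."""
    return round(ns * TCK_UNITS_PER_NS)


print(ns_to_tck(1.25))  # 320, a 1.25 ns cycle time (800 MHz clock)
print(tck_to_ns(320))   # 1.25
```
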
## Hardware
The hardware has delay logic blocks that can delay the DQ / DQS of a
lane/rank by one or multiple clock cycles, and delay logic blocks
that can delay the signal by a multiple of 1/64th DCK per lane.

All delay values can be controlled by software by writing registers in the
MCHBAR.

## IO phase

The IO phase can be adjusted in [0-512) * 1/64th DCK. Incrementing it by 64 is
the same as incrementing the IO delay by 1.
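
Because 64 phase steps make up one DCK, a timA value can be read as a number
of whole DCKs plus a remaining phase. The following sketch shows that
arithmetic; the function and constant names are hypothetical.

```python
PHASE_STEPS_PER_DCK = 64  # one DCK is divided into 64 phase steps
TIMA_RANGE = 512          # valid timA values are [0-512)


def split_tima(tima):
    """Split timA into (whole DCKs, remaining phase in 1/64th DCK)."""
    if not 0 <= tima < TIMA_RANGE:
        raise ValueError("timA out of range")
    return divmod(tima, PHASE_STEPS_PER_DCK)


print(split_tima(100))  # (1, 36): one full DCK plus 36/64th DCK
```
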

## IO delay
Delays the DQ / DQS signal by one or multiple clock cycles.

### Roundtrip time
The roundtrip time is the time the memory controller waits for data to arrive
after a read has been issued. Due to clock-domain crossings, multiple
delay instances and phase interpolators, the signal runtime to DRAM and back
to the memory controller defaults to 55 DCKs. The real roundtrip time has to be
measured.

After a read command has been issued, a counter counts down; when it reaches
zero, the input buffers are activated.

The following pictures show the relationship between roundtrip time, IO delay
and IO phase (timA).
Each picture was generated from 16 IO delay values times 64 timA values.
The highest IO delay is on the right-hand side, while the last block
on the left-hand side has zero IO delay.

#### roundtrip 55 DCKs
![timA for lane0 - lane3, roundtrip 55][timA_lane0-3_rt55]

[timA_lane0-3_rt55]: timA_lane0-3_rt55.png

#### roundtrip 54 DCKs
![timA for lane0 - lane3, roundtrip 54][timA_lane0-3_rt54]

[timA_lane0-3_rt54]: timA_lane0-3_rt54.png

#### roundtrip 53 DCKs
![timA for lane0 - lane3, roundtrip 53][timA_lane0-3_rt53]

[timA_lane0-3_rt53]: timA_lane0-3_rt53.png

As you can see, the signal has some jitter, as every sample was taken in a
different loop iteration. The result register only contains a single bit per
lane.

## Algorithm
### Steps
The algorithm finds the roundtrip time, IO delay and IO phase. The IO phase
is adjusted to match the falling edge of the preamble of each lane.
The roundtrip time is reduced to the minimal value that still includes the
preamble.

### Synchronize to data phase

The first measurement done in read-leveling samples DQS at every phase value
in [0-64) * 1/64th DCK. It then searches for the middle of the low data
symbol and sets timA to the phase found, so that the following measurements
are aligned to the low data symbol.
The code assumes that the initial roundtrip time causes the measurement to be
in the alternating pattern data phase.
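
The search for the middle of the low data symbol can be sketched as finding
the longest run of low samples in a circular list of 64 samples, one per phase
step. This is a minimal sketch under that assumption; the function name is
hypothetical, and the synthetic sample list stands in for values that would
come from the hardware.

```python
def middle_of_longest_zero_run(samples):
    """Return the index in the middle of the longest circular run of zeros."""
    n = len(samples)
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk the sequence twice so a run wrapping around the end is seen whole.
    for i in range(2 * n):
        if samples[i % n] == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    if best_len == 0 or best_len >= n:
        raise ValueError("no low data symbol with edges found")
    return (best_start + best_len // 2) % n


# Synthetic lane: low data symbol covers phases 16..47.
samples = [1] * 16 + [0] * 32 + [1] * 16
print(middle_of_longest_zero_run(samples))  # 32
```

Treating the list as circular matters because the low symbol may straddle the
wrap from phase 63 back to phase 0.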

### Finding the preamble
After adjusting the IO phase to the middle of one data symbol, the preamble is
located. Unlike the data phase, which is an alternating pattern (010101...),
the preamble consists of two high data cycles.

The code decrements the IO delay / roundtrip time and samples the DQS signal
with timA untouched. As timA has been positioned in the middle of the data
symbol, the sample reads as either "low" or "high".

If it's "low" we are still in the data phase.
If it's "high" we have found the preamble.
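
The low/high decision above can be sketched as a simple loop. This is only an
illustration: `sample_dqs` is a hypothetical callback standing in for a
hardware measurement of one lane, and the interaction with the roundtrip time
is left out.

```python
def find_preamble(sample_dqs, io_delay):
    """Decrement io_delay until DQS samples high; return that IO delay.

    sample_dqs(io_delay) is a hypothetical callback that returns True when
    the lane reads "high" at the given IO delay (timA stays untouched).
    """
    while io_delay >= 0:
        if sample_dqs(io_delay):
            return io_delay  # "high": the preamble has been found
        io_delay -= 1        # "low": still in the data phase
    raise RuntimeError("preamble not found")


# Simulated lane whose preamble becomes visible at an IO delay of 6 DCKs.
print(find_preamble(lambda delay: delay <= 6, 15))  # 6
```
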

The roundtrip time and IO delay are adjusted until all lanes are aligned.
The resulting IO delay is visible in the picture below.

**roundtrip time: 49 DCKs, IO delay (at blue point): 6 DCKs**
![timA for lane0 - lane3, finding minimum roundtrip time][timA_lane0-3_discover_420x]

[timA_lane0-3_discover_420x]: timA_lane0-3_discover_420x.png

**Note: The sampled data has been shifted by timA. The preamble is now
in phase.**

### Fine adjustment

As timA still points to the middle of the data symbol, an offset of 32 is added.
It now points to the falling edge of the preamble.
The fine adjustment reduces errors introduced by jitter. The phase is
swept from `timA - 25` to `timA + 25` and the DQS signal is sampled 100
times. The fine adjustment finds the middle of each rising edge (it is actually
the falling edge of the preamble) to get the final IO phase. You can see the
result in the picture below.

![timA for lane0 - lane3, fine adjustment][timA_lane0-3_adjust_fine]

[timA_lane0-3_adjust_fine]: timA_lane0-3_adjust_fine.png

Lanes 0 - 2 will be adjusted by a phase of -10, while lane 3 is already correct.
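
The fine adjustment can be sketched as follows. The sketch makes two
assumptions that the text does not state explicitly: the 100 samples are
taken at every phase step of the sweep, and the "middle of the rising edge" is
the midpoint between the last phase that always reads low and the first phase
that always reads high. `sample_dqs` and `simulated_lane` are hypothetical
stand-ins for the hardware, with a deterministic jitter model so the result is
reproducible. Wrapping of the phase at the ends of the [0-512) range is
ignored.

```python
def fine_adjust(sample_dqs, tima, span=25, samples=100):
    """Return the phase in the middle of the rising edge near tima."""
    phases = range(tima - span, tima + span + 1)
    # Count how often each phase reads high over the repeated samples.
    highs = [sum(1 for i in range(samples) if sample_dqs(p, i))
             for p in phases]

    # Last phase, counted from the start of the sweep, that always read low.
    last_low = None
    for p, count in zip(phases, highs):
        if count != 0:
            break
        last_low = p
    # First phase that always read high.
    first_high = next((p for p, c in zip(phases, highs) if c == samples),
                      None)
    if last_low is None or first_high is None:
        raise RuntimeError("edge not inside the sweep window")
    return (last_low + first_high) // 2


def simulated_lane(edge):
    """Hypothetical lane with a rising edge at `edge` and some jitter."""
    def sample(phase, i):
        jitter = (i % 8) - 4  # deterministic stand-in for jitter, -4..3
        return phase + jitter >= edge
    return sample


print(fine_adjust(simulated_lane(190), 200) - 200)  # -10
```

Averaging over many samples per phase is what makes the result robust against
the jitter: a single sample near the edge could read either low or high.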