I2C: aborted transfers with fast reacting I2C slaves
Background
There was a time that my college came to me, and ask me to look into the issue related to I2C driver.
The issue is that, the I2C read/write functions defined in SDK sometimes fail to complete the operation. BTW, the chip and SDK are all designed and implemented by our company, Amazon, so we don’t have assistance from third party.
Busy Bus
As we have multiple harts to access I2C master engine, and retry method can solve the problem, we initially suspected that this issue could result from contention. Therefore I initially adopted RISV-V AMOSWAP.W/D (See “A” Standard Extension for Atomic Instructions in “The RISC-V Instruction Set Manual”) to implement a spinlock to protect the resource. Of curse, given we’re working on an ASMP architecture, the implementation needed further shared memory setting, but that’s another story.
void spinlock_aquire(uintptr_t* lock)
{
volatile uintptr_t lockval; lockval = 1; // initialize swap value
do
{
#if __riscv_xlen == 32
__asm__ volatile("amoswap.w.aq %0,%0,(%2)" : "=r"(lockval) : "r"(lockval), "r"(lock));
#else
__asm__ volatile("amoswap.d.aq %0,%0,(%2)" : "=r"(lockval) : "r"(lockval), "r"(lock));
#endif
} while (lockval);
}void spinlock_release(uintptr_t* lock)
{
#if __riscv_xlen == 32
__asm__ volatile("amoswap.w.rl x0,x0,(%0)" : : "r"(lock));
#else
__asm__ volatile("amoswap.d.rl x0,x0,(%0)" : : "r"(lock));
#endif
}
However, even spin lock invisible by all harts cannot preclude the issue, I start to wonder the root cause is not the one.
Arbitration
As we broke it down, we found that the register indicated the reason of abortion is because of a mechanism, arbitration.
See: https://www.nxp.com/docs/en/user-guide/UM10204.pdf
Arbitration, like synchronization, refers to a portion of the protocol required only if more than one controller is used in the system. Targets are not involved in the arbitration procedure. A controller may start a transfer only if the bus is free. Two controllers may generate a START condition within the minimum hold time (tHD;STA) of the START condition which results in a valid START condition on the bus. Arbitration is then required to determine which controller will complete its transmission.
To understand we can see the picture of I2C frame
When SCL is high, there are three possible situations:
- SDA rising edge: STOP
- SDA falling edge: START
- else: data
Consider a situation: in the middle of transmission, I2C slave pulls
SDA down quickly after falling edge of SCL.
This will cause an arbitration.
Patch: Linux kernel
I found a commit regarding the issue of Synopsys DesignWare IP(https://www.synopsys.com/designware-ip.html)
it turns out here is a workaround for this IP, while our SDK does not take care of. One can see `1080ee7e28e1c` for the commit
i2c: designware: Move SDA hold time configuration to common code
SDA hold time configuration is common to both master and slave code. It
is also something that can be done once during probe and do only
register write when HW needs to be reinitialized.
Remove duplication and move SDA hold time configuration to common code.
It will be called from slave probe and for master code from a new
i2c_dw_set_timings_master() to where we will populate more probe time
timing parameter setting.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
See: https://www.spinics.net/lists/linux-i2c/msg26520.html
I2C DesignWare may abort transfer with arbitration lost if I2C slave pulls SDA down quickly after falling edge of SCL.
Long story short, we can extend SDA hold time to alleviate the issue. Note that the unit is the number of circles of IC period
My solution
Then I came up with the patch
+#define BITS_PER_LONG 64
+#define GENMASK_INPUT_CHECK(h, l) 0
+#define __GENMASK(h, l) \
+ (((~UL(0)) - (UL(1) << (l)) + 1) & \
+ (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
+#define GENMASK(h, l) \
+ (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
+#define DW_IC_SDA_HOLD_RX_MASK GENMASK(23, 16)
+
+static void setupSDAHold(...)
+{
+ uint32_t sda_hold_time;
+ READREG(base + SDA_HOLD_ADDR_OFFSET, &sda_hold_time);
+ if (!(sda_hold_time & DW_IC_SDA_HOLD_RX_MASK))
+ sda_hold_time |= 1 << SDA_RX_HOLD_BIT;
+
+ WRITEREG(base + SDA_HOLD_ADDR_OFFSET, sda_hold_time);
+}
+...
The approach success to solve the issue.
Reference
Conclusion
After the wrong speculation of busy bus and contention, the problem is more likely to do with SDA and SCL issue. While the real reason behind the whole issue still not fully disclosed, further trails are needed to make it possible to tell the whole story, the workaround above is proved to be helpful.
Moreover, the example implies that any products using the DesignWare IP are with potential of suffering the problem. Thus, when we develop SDK for the drivers, we will have to pay attention to this workaround.