Diagnostic - TheTaoOfFieldbus

VANISHED FRAMES / CRCS: A PRAGMATIC APPROACH

The purpose of this paper is to discuss some alternative methods for finding and solving problems on EtherCAT® networks.
With a few simple considerations (and a little imagination) it is possible to try more, and simpler, solutions to complex problems. This matters because the market demands an ever-increasing number of functions, and networks are filled with thousands of devices.
Of course, with EtherCAT everything is possible, with amazing cycle times and very interesting combinations that cover a wide range, from I/O to motion control, robotics and so forth.
I tested the performance of different fieldbuses with universities and companies for many years, and the conclusion was always the same: EtherCAT is the best.
Now let's stop the blah blah and begin:
The basic configuration is composed of one EtherCAT Master, some nodes and two drives.
The final results can also be extended to more complex situations without problems or limitations.
I also assume in advance that the reader is comfortable with EtherCAT Slave terminology; for this purpose I suggest reading the manual “EtherCAT_ET1100_Datasheet_all_v1i8.pdf”.
It’s free, and it’s an excellent starting point for developing a Master, a Slave or both.
The topology is the following:


The picture shows the wired slaves and the list of their ports.
As a preamble: every EtherCAT slave has 4 ports (always); beyond that, it depends on how many ports are actually implemented in the hardware design.
The ports can be in different states, depending on how many slaves are wired to them and on how the Master sets the desired behaviour.
Our configuration has only node 1 with more than 2 ports enabled.
The remaining slaves work with only two ports: the drives have two RJ45 connectors, and the nodes are wired through the EBUS and internally have only 2 ports. You can quickly recognize the positions of the drives: nodes number 11 and 12.
Below I recap how the frame is processed inside the ESC (EtherCAT Slave Controller).


Ports   Frame processing order
1       0 → EtherCAT Processing Unit → 0
2       0 → EtherCAT Processing Unit → 1 / 1 → 0
3       0 → EtherCAT Processing Unit → 1 / 1 → 2 / 2 → 0 (log. ports 0, 1 and 2)
        or
        0 → EtherCAT Processing Unit → 3 / 3 → 1 / 1 → 0 (log. ports 0, 1 and 3)
4       0 → EtherCAT Processing Unit → 3 / 3 → 1 / 1 → 2 / 2 → 0
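
The processing order above can be written down as a small sketch. It assumes that the frame always enters at port 0 and that the internal order after the EtherCAT Processing Unit is 3, 1, 2, with closed or unimplemented ports simply skipped. The function name `processing_order` is my own and does not come from the datasheet.

```python
def processing_order(open_ports):
    """Return the path of a frame through one ESC as a list of hops.

    open_ports: set of logical port numbers (0..3) that are open.
    Port 0 is assumed to be the port facing the Master.
    """
    if 0 not in open_ports:
        raise ValueError("this sketch assumes the frame enters at port 0")
    path = [0, "EPU"]            # frame enters at port 0 and is processed
    for port in (3, 1, 2):       # assumed internal forwarding order
        if port in open_ports:
            path.append(port)    # frame leaves here and comes back later
    path.append(0)               # finally the frame returns towards the Master
    return path


for ports in ({0}, {0, 1}, {0, 1, 2}, {0, 1, 3}, {0, 1, 2, 3}):
    print(sorted(ports), processing_order(ports))
```

Each printed path corresponds to one row of the table, for example `[0, 'EPU', 3, 1, 2, 0]` for a slave with all four ports open.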

For our purpose we need three things:
a)    For each single ESC, we can control the behaviour of its ports. There are registers to control and monitor the status. They are the following:
Reg : 0x0100:0x0103 - ESC DL Control
Reg : 0x0110:0x0111 - ESC DL Status
The most important are the following bits, which allow us to force the Open/Close state (like the door of a room):

0x0100-0x0103

              Bits     Auto   Auto Close   Manual Open   Manual Close
Loop Port 0   9:8      00     01           10            11
Loop Port 1   11:10    00     01           10            11
Loop Port 2   13:12    00     01           10            11
Loop Port 3   15:14    00     01           10            11

We can also read the current port status from registers 0x0110-0x0111.

b)    The network topology at every moment. The reason is simple: we use some special commands such as APRD, and this command depends on the position of the slaves on the network. If we modify the configuration, the same command automatically loses its original meaning and is addressed to the wrong node. You can imagine the final result.
c)    This is the nice point: finding some free registers inside the ESC. This makes the difference. We need a place where it is possible to write and store some values at any moment (the reason will become clear later). There are not many areas available. Of course there are 4 KB of registers plus additional RAM; the latter depends on the ESC and may also be dangerous to use, because we risk a conflict between Master and Slave. For this reason I will not use it. There are many reserved registers, but I cannot use them: they are write-protected. But if you read the documentation carefully, there is also an area named USER RAM, mapped from 0x0F80 to 0x0FFF. This sounds good. There is also a single register at address 0x0040, the Reset register. If you write the characters 0x52 (‘R’), 0x45 (‘E’), 0x53 (‘S’) in three consecutive frames, the ESC resets itself. We could use it, because we write a single byte that is incremented every cycle: after 0x52 comes 0x53, never 0x45, so it is not possible to write “RES”. My final choice is the USER RAM, for a different reason: if we have several cyclic frames or service frames, we can apply the same technique and write to different bytes (this last point is important).
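
The loop-control bits listed under point a) can be packed into a register value with a few lines of code. This is only a sketch: the constant and function names are mine, and it assumes the little-endian register layout, where bits 15:8 of the DL Control word sit in the single byte at address 0x0101.

```python
# 2-bit loop-control modes for one port (ESC DL Control, 0x0100:0x0103)
AUTO, AUTO_CLOSE, MANUAL_OPEN, MANUAL_CLOSE = 0b00, 0b01, 0b10, 0b11


def loop_control_bits(modes):
    """Pack {port: mode} into bits 15:8 of the DL Control word.

    Ports that are not listed stay in AUTO (00).
    """
    value = 0
    for port, mode in modes.items():
        if port not in range(4) or mode not in range(4):
            raise ValueError("port and mode must both be in 0..3")
        value |= mode << (8 + 2 * port)   # port 0 -> bits 9:8 ... port 3 -> bits 15:14
    return value


def byte_for_reg_0x0101(modes):
    """Return the single byte to write at address 0x0101 (bits 15:8)."""
    return loop_control_bits(modes) >> 8


# Force port 2 closed and leave the other ports in Auto.
print(hex(byte_for_reg_0x0101({2: MANUAL_CLOSE})))   # 0x30
```

Note that writing this byte sets all four ports at once, so every port you do not list goes back to Auto.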


The most obvious question: why am I compelled to write cyclically to one register?
Suppose that there are CRC errors on the network; the first reaction is to use the CRC registers.
The ESCs have many diagnostic registers, and the area at 0x03xx (the error counters starting at 0x0300) is full of useful registers.
You can monitor the EtherCAT frames that cross the whole network, and if there are CRC errors the registers log how many times the error occurred and which port was involved.
Consider that when a CRC error is detected it is not possible to write inside the ESC (there are exceptions, such as the shadow buffer).
There are several reasons for not relying on these registers alone:

1)    Our goal is to monitor not only CRC errors but also lost frames (vanished frames).
2)    There are many CRC registers. You are compelled to analyze the whole network and to read the whole list of registers (many bytes for each slave).
3)    You are compelled to know the whole network topology, and sometimes that is not so easy. For example, if there are many branches it becomes complicated. Consider that many Masters only offer the chance to read these values. They have no algorithm that indicates where the error is.
If you write to one register such as 0x0F80, you can trace where the error occurred with less effort.
What is the technique?
We use a BWR to register 0x0F80 with one data byte. This value is incremented every cycle. If the WC (working counter) returned by the BWR differs from the expected value, then we stop sending the BWR and replace it with an APRD.
We can either keep working with the same cyclic frame, or replace the BWR with a NOP. In the second scenario we use a different frame that carries several APRD sub-commands. This speeds up the scan many times.
In this way we scan the whole network.
The APRD command reports the last value written to each slave, and we can compare it with the last value we sent. The first slave that reports a different value is the source of the error!

This is easy and fast.
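
The idea can be tried out without any hardware. The following is a toy simulation, not a Master implementation: `SimSlave`, `bwr`, `aprd` and `locate_error` are hypothetical names, the line of 12 slaves is an assumption, and the fault is modelled simply as "the write does not reach the slaves from a given position on".

```python
USER_RAM = 0x0F80   # register used as a per-slave "last cycle seen" marker


class SimSlave:
    """Toy model of an ESC: a register map and nothing else."""

    def __init__(self):
        self.regs = {USER_RAM: 0}


def bwr(slaves, value, broken_before=None):
    """Simulate a one-byte BWR to USER_RAM and return the working counter.

    broken_before: index of the first slave the write does NOT reach
    (None means the frame crosses the whole network intact).
    """
    reached = slaves if broken_before is None else slaves[:broken_before]
    for slave in reached:
        slave.regs[USER_RAM] = value
    return len(reached)          # every slave that wrote increments the WC


def aprd(slaves, position):
    """Simulate an APRD of USER_RAM from the slave at `position` (0-based)."""
    return slaves[position].regs[USER_RAM]


def locate_error(slaves, last_written):
    """Return the index of the first slave with a stale marker, else None."""
    for position in range(len(slaves)):
        if aprd(slaves, position) != last_written:
            return position
    return None


slaves = [SimSlave() for _ in range(12)]
value = 0x40
wc = bwr(slaves, value)                      # healthy cycle: WC == 12
value = (value + 1) & 0xFF                   # next cycle writes 0x41
wc = bwr(slaves, value, broken_before=7)     # fault: slaves 7..11 miss the write
if wc != len(slaves):                        # WC mismatch: stop the BWR, start the scan
    print("first stale slave index:", locate_error(slaves, value))   # 7
```

On a real network each `aprd` call would be a datagram in a cyclic frame, so the scan costs at most one cycle per slave when a single APRD is sent per frame.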

Below is a trace that shows the method.
Step 1: the EtherCAT Master writes the value 0x41 (BWR)




Afterwards an error occurs on the network for some reason (CRC/LINK/LOST FRAME).
Step 2: the EtherCAT Master starts to poll each slave (cyclic frame), starting from Node 1
Note that the command is Auto Increment: each slave increments the address field by 1 as the frame passes, and the scan starts with a position of zero, because the command is executed only by the slave that reads a position of zero. Only that slave increments the WC.
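
Auto-increment addressing is easy to get wrong, so here is a minimal sketch of how I understand it from the ET1100 datasheet: the address field is a 16-bit value, every slave increments it as the frame passes, and the slave that sees zero executes the command. The helper names and the 12-slave network are illustrative assumptions.

```python
def adp_for_position(position):
    """16-bit auto-increment address the Master sends to reach `position` (0-based)."""
    return (-position) & 0xFFFF


def addressed_slave(adp, slave_count):
    """Walk the frame through the slaves and return the index that executes."""
    hit = None
    for index in range(slave_count):
        if adp == 0:
            hit = index               # this slave reads zero, so it executes the command
        adp = (adp + 1) & 0xFFFF      # every slave increments the address field
    return hit


print(hex(adp_for_position(3)))                    # 0xfffd
print(addressed_slave(adp_for_position(3), 12))    # 3
```

To poll the network slave by slave, the Master therefore sends 0x0000, 0xFFFF, 0xFFFE and so on.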


Notes :
1)    Pros. If the network is composed of 2000 slaves (cycle time 1 ms) and you work with the same cyclic frame, then you need only 2000 frames to scan the whole network; the total time is 2 s. But this is the worst case, when the error is caused by the last slave. A fault at an earlier node position is found sooner.
2)    Pros. The total cost of inserting the BWR (APRD) sub-command is 13 bytes inside the frame, split as follows:
   1 byte -> data,
   12 bytes -> the sub-command overhead (10-byte header plus 2-byte working counter)
    You don’t risk disturbing the DC scheduling.
3) Cons. It is possible to write to the register only when the frame crosses the EtherCAT Processing Unit.
If the error is generated on the return path, you are compelled to use the 0x03xx register area.
4)  Pros. When it is possible, with huge configurations, I suggest favouring branches over a linear network. Let this problem be an exercise for you :)
In that case this technique earns full credit once again.
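
The 13 bytes of note 2 and the 2 s of note 1 can be checked with a short sketch. It builds one raw datagram using the standard EtherCAT layout (command, index, 32-bit address, length and flags, IRQ, data, working counter). The function name and the index value are arbitrary choices of mine.

```python
import struct

CMD_NOP, CMD_APRD, CMD_BWR = 0x00, 0x01, 0x08   # EtherCAT command codes


def datagram(cmd, index, adp, ado, data, more=False):
    """Build one EtherCAT datagram (sub-command) as little-endian bytes."""
    length = len(data)
    if length > 0x7FF:
        raise ValueError("the length field is only 11 bits wide")
    len_flags = length | (0x8000 if more else 0)     # bit 15: more datagrams follow
    header = struct.pack("<BBHHHH", cmd, index, adp & 0xFFFF, ado, len_flags, 0)
    working_counter = struct.pack("<H", 0)           # the Master sends WC = 0
    return header + bytes(data) + working_counter


bwr = datagram(CMD_BWR, index=1, adp=0, ado=0x0F80, data=[0x41])
aprd = datagram(CMD_APRD, index=1, adp=0, ado=0x0F80, data=[0x00])
nop = datagram(CMD_NOP, index=1, adp=0, ado=0x0F80, data=[0x00])
print(len(bwr), len(aprd), len(nop))    # 13 13 13

# Worst-case scan with one APRD per cycle: 2000 slaves, 1 ms cycle time.
slaves, cycle_ms = 2000, 1
print(slaves * cycle_ms / 1000, "s")    # 2.0 s
```

Because BWR, APRD and NOP all occupy the same 13 bytes, swapping one for another does not change the frame length.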


LOST FRAMES (VANISHED FRAMES)
In this case I think that the technique described above is the best solution.
This time the CRC registers may play only a minor role (or none) in discovering the problem.
This category of errors also shows up when there is hardware noise (like the previous one).
But for some reason it is not possible to detect them.
One example is control register 0x0100 (bit 0): when it is set to TRUE, the ESC destroys every frame that is not an EtherCAT frame.
Imagine that during transmission the EtherCAT type field is corrupted: from the slave's point of view this is no longer an EtherCAT frame, so it is trashed forever and you can say goodbye to your data :(
The problem is the following: at which position did we lose the frame?
Here it becomes hard to discover the cause.

But if we apply the same strategy as before, with the following steps and variations, we can capture and solve the problem easily:

a) We add the BWR command to the cyclic frame. This command writes to register 0x0F80 a value that is incremented every cycle.
b) If the frame doesn’t return, then in the next cycle we replace the BWR with a NOP command.
In this way we keep the last snapshot of the machine or plant situation.
This part is important because we also have to link the problem to a well-defined state machine (software) related to the hardware.
Many Masters restart at this point: they say OK, there is a problem, we reset everything and start again. This is not a casino where you can test whether you are lucky.
c) But if the frame doesn’t return, how can I find the position of the error?
Simple: we wrote to register 0x0F80. Even if there was an error and the frame didn’t return,
the frame still crossed part of the network and wrote this value to the register in one, two or more slaves.
d) We use the ability to open and close the ports to find where the register was last incremented. Because we can close and open the door, we can narrow the search within the whole network.
Below is an example in which the frame didn’t return:



I wrote a simple tool to open and close the ports.
To run it I don't need a real-time system; your network card is enough (without any particular software stack).

Step 1: Open the connection with the Start Capture button


Step 2: Send the APRD command for register offsets 0x0110-0x0111 to each slave
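
The two bytes returned by this APRD can be decoded as follows. The bit positions are the ones I recall from the ET1100 datasheet (physical link in bits 4 to 7, then one loop bit and one communication bit per port in bits 8 to 15), so treat them as an assumption and check them against the documentation of your ESC. The function name and the sample value 0x5A30 are made up for illustration.

```python
def decode_dl_status(raw):
    """Decode the 16-bit ESC DL Status value read from 0x0110:0x0111."""
    ports = {}
    for port in range(4):
        ports[port] = {
            "link": bool((raw >> (4 + port)) & 1),              # physical link detected
            "loop_closed": bool((raw >> (8 + 2 * port)) & 1),   # 1 = port is closed
            "communication": bool((raw >> (9 + 2 * port)) & 1), # stable communication
        }
    return ports


# The APRD data arrives little-endian, low byte (0x0110) first.
raw = int.from_bytes(bytes([0x30, 0x5A]), "little")
for port, state in decode_dl_status(raw).items():
    print(port, state)
```

With this sample value, ports 0 and 1 have a link and are open, while ports 2 and 3 have no link and are closed.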




After reading the current status you can decide to close a port for test purposes.
In a real situation it is better to think of an algorithm that optimizes the actions and the reaction time, but this is a different story.
Step 3: We close port number 2 (Manual Close) of node 1
We push the button Send Packet Write reg 101



As in the picture below:



The trace confirms that only 10 slaves are present now.



The status register reports that the links are still up; this is normal, because the wiring is still present, and it should always be so. When we recover a network there is no need to unplug any cable. But the port is closed, so every frame is routed to the next port:



With the same mechanism we can check and control every port until we find the point where the error occurred.
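
One possible way to organize the port checks is a binary search. This is my own suggestion and only a sketch: it assumes a linear network, a single persistent fault, and a hypothetical `frame_returns(last_slave)` probe that closes the downstream port of `last_slave`, sends a frame, reports whether it came back, and reopens the port. Here the probe is faked so that the example runs without hardware.

```python
def make_probe(fault_after):
    """Fake probe for a line where frames vanish on the segment after `fault_after`."""
    def frame_returns(last_slave):
        # The frame comes back only if the loop is closed before the bad segment.
        return last_slave <= fault_after
    return frame_returns


def find_bad_segment(slave_count, frame_returns):
    """Return i such that the fault lies right after slave i, or None if no fault."""
    if frame_returns(slave_count - 1):
        return None                      # the whole line answers: nothing to find
    if not frame_returns(0):
        raise RuntimeError("no answer even from the first slave")
    good, bad = 0, slave_count - 1       # frame_returns(good) is True, (bad) is False
    while bad - good > 1:
        mid = (good + bad) // 2
        if frame_returns(mid):
            good = mid                   # fault is further downstream
        else:
            bad = mid                    # fault is at or before this point
    return good


probe = make_probe(fault_after=6)
print(find_bad_segment(12, probe))       # 6
```

With 2000 slaves this needs about 11 probes instead of up to 2000, at the price of assuming that the fault stays in place while you search.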


Conclusion
We can apply the same technique to solve different kinds of problems.
I checked both solutions with excellent results!
I can only say that if some internal counters were available (inside the ESC) that were able to count the "correct frames", everything would become easier to implement. But they are not present now (in the future, I don't know), so we have to find some alternatives.
A number equal to 20 is enough for every situation.
Someone said: "stay hungry, stay foolish"
It's late, I want an apple :)

Copyright notice
The website and its content is copyright of Nicola Urbano - ©Nicola Urbano -2010. All rights reserved.
Any redistribution or reproduction of part or all of the contents in any form is prohibited without contacting the Author.
Wireshark and the "fin" logo are registered trademarks of the Wireshark Foundation. EtherCAT® is a registered trademark of, and licensed by, Beckhoff Automation GmbH.