[etherlab-users] Re: System randomly freezes in multi-thread Qt application with a RT process
Viola Roberto
roberto.viola at systemceramics.com
Fri May 31 09:30:00 CEST 2019
Hi Simone,
First of all, the GuC firmware files are not an issue, so go ahead ☺
To understand the issue, I think you should split your application into small pieces: this way we can try to narrow down where the problem lies.
You could try running a simple application (using the EtherCAT example you can find in the repository) that only reads or writes some objects from your slaves, and then add pieces of code step by step to identify when and where the issue happens. I don't know how big your application is, but from my point of view it's the only way to get some results.
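A minimal sketch of such a stripped-down starting point, assuming a plain POSIX periodic loop. `exchange_process_data()` is a hypothetical placeholder where the EtherCAT calls would go, and the 1 ms period in the usage is only an example. The loop also records the worst wake-up latency, which is one cheap way to notice when a newly added piece starts to disturb the timing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical placeholder for one EtherCAT cycle. In a real test program
 * this is where ecrt_master_receive(), ecrt_domain_process(), the PDO
 * reads/writes, ecrt_domain_queue() and ecrt_master_send() would go. */
static void exchange_process_data(void)
{
}

/* Advance an absolute time by ns nanoseconds, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Run `cycles` iterations with the given period and return the worst
 * wake-up latency observed, in nanoseconds. */
static long run_cyclic_task(int cycles, long period_ns)
{
    struct timespec next, now;
    long worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < cycles; i++) {
        timespec_add_ns(&next, period_ns);
        /* Sleep until an absolute deadline so that the time spent in the
         * loop body does not make the period drift. */
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);

        clock_gettime(CLOCK_MONOTONIC, &now);
        long late = (long)(now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                  + (now.tv_nsec - next.tv_nsec);
        if (late > worst)
            worst = late;

        exchange_process_data();
    }
    return worst;
}
```

For example, `run_cyclic_task(10000, 1000000L)` runs for about 10 s at 1 ms; printing the result before and after each added piece (EtherCAT I/O, shared data, GUI) gives a simple baseline for comparison.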
Another note: in your first log I saw that the crash happened in the EoE context. Are you using EoE? Could you try disabling it and testing again?
Have a nice weekend
Roberto
From: Simone Comari [mailto:simone.comari2 at unibo.it]
Sent: Wednesday, 29 May 2019 15:50
To: Viola Roberto <roberto.viola at systemceramics.com>; etherlab-users at etherlab.org
Cc: Edoardo Ida <edoardo.ida2 at unibo.it>
Subject: Re: System randomly freezes in multi-thread Qt application with a RT process
Hi Roberto,
Thanks for following up, much appreciated.
We tried the same setup on a laptop (Dell Inspiron-5567, Intel® Core™ i7-7500U CPU @ 2.70GHz × 4), dual-booting Windows 10/Ubuntu 16.04.6, but the behavior remains the same.
I noticed, however, that both workstations use the same video card driver (i.e. i915). I also noticed that at the end of the RT kernel build (I think during make modules_install) there was a warning about a couple of missing firmware files for this device:
possible missing firmware /lib/firmware/i915/kbl_guc_ver9_14.bin for module i915
possible missing firmware /lib/firmware/i915/bxt_guc_ver8_7.bin for module i915
So I copied the missing files (taken from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915) into /lib/firmware/i915 and updated the initramfs with the following command:
sudo update-initramfs -u
I then recompiled both the RT kernel and etherlab libs, but except for the disappeared warnings, nothing changed.
The video card I'm mounting is the following:
$ lshw -C video
*-display
description: VGA compatible controller
product: HD Graphics 620
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 02
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:280 memory:de000000-deffffff memory:b0000000-bfffffff ioport:f000(size=64) memory:c0000-dffff
I've tried stressing both the CPU
$ stress --cpu `nproc` --vm `nproc` --vm-bytes 1GB --io `nproc` --hdd `nproc` --hdd-bytes 1GB --timeout 60s
stress: info: [6624] dispatching hogs: 4 cpu, 4 io, 4 vm, 4 hdd
stress: info: [6624] successful run completed in 60s
and the video card
$ glmark2
=======================================================
glmark2 2014.03+git20150611.fa71af2d
=======================================================
OpenGL Information
GL_VENDOR: Intel Open Source Technology Center
GL_RENDERER: Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2)
GL_VERSION: 3.0 Mesa 18.0.5
=======================================================
[build] use-vbo=false: FPS: 1392 FrameTime: 0.718 ms
[build] use-vbo=true: FPS: 1494 FrameTime: 0.669 ms
[texture] texture-filter=nearest: FPS: 1220 FrameTime: 0.820 ms
[texture] texture-filter=linear: FPS: 1370 FrameTime: 0.730 ms
[texture] texture-filter=mipmap: FPS: 1379 FrameTime: 0.725 ms
[shading] shading=gouraud: FPS: 1352 FrameTime: 0.740 ms
[shading] shading=blinn-phong-inf: FPS: 1356 FrameTime: 0.737 ms
[shading] shading=phong: FPS: 1334 FrameTime: 0.750 ms
[shading] shading=cel: FPS: 1365 FrameTime: 0.733 ms
[bump] bump-render=high-poly: FPS: 1004 FrameTime: 0.996 ms
[bump] bump-render=normals: FPS: 1474 FrameTime: 0.678 ms
[bump] bump-render=height: FPS: 1496 FrameTime: 0.668 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1141 FrameTime: 0.876 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 847 FrameTime: 1.181 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1543 FrameTime: 0.648 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 670 FrameTime: 1.493 ms
[desktop] effect=shadow:windows=4: FPS: 891 FrameTime: 1.122 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 638 FrameTime: 1.567 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 493 FrameTime: 2.028 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 690 FrameTime: 1.449 ms
[ideas] speed=duration: FPS: 1265 FrameTime: 0.791 ms
[jellyfish] <default>: FPS: 1258 FrameTime: 0.795 ms
[terrain] <default>: FPS: 189 FrameTime: 5.291 ms
[shadow] <default>: FPS: 982 FrameTime: 1.018 ms
[refract] <default>: FPS: 360 FrameTime: 2.778 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 1339 FrameTime: 0.747 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 1337 FrameTime: 0.748 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 1329 FrameTime: 0.752 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 1343 FrameTime: 0.745 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 1343 FrameTime: 0.745 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 1315 FrameTime: 0.760 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 1221 FrameTime: 0.819 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 1275 FrameTime: 0.784 ms
=======================================================
glmark2 Score: 1142
=======================================================
and no apparent issues arose.
I'm attaching two log sessions from my latest laptop trials. Inside you can find a brief description of the operations carried out for each session.
I noticed that in one of them one source of the freeze seems to be related to rt_mutex. I'm not really familiar with it, but I have the feeling it is the actual source of our problem (probably due to the way we implemented the EtherCAT communication in our code).
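Seeing rt_mutex in a log does not by itself prove that the application's locking is at fault, since on a PREEMPT_RT kernel many in-kernel locks are rt_mutex based. Still, a frequent pitfall with a default pthread_mutex shared between an RT thread and a GUI thread is priority inversion. A minimal sketch, assuming the shared data fits in a small struct: the mutex is created with PTHREAD_PRIO_INHERIT and the RT side uses trylock so it never waits on the GUI. The type and function names (`drive_state_t`, `shared_state_publish()`, ...) are hypothetical and not taken from the linked code.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical snapshot shared between the RT EtherCAT thread and the GUI. */
typedef struct {
    double position;
    double velocity;
    unsigned status_word;
} drive_state_t;

typedef struct {
    pthread_mutex_t lock;
    drive_state_t state;
} shared_state_t;

/* Priority inheritance: while a low-priority thread holds the lock, it is
 * boosted to the priority of the highest-priority waiter, so the RT thread
 * is not stuck behind a preempted GUI thread (priority inversion). */
static int shared_state_init(shared_state_t *s)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&s->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* RT side: never block. If the lock is taken, skip publishing in this
 * cycle and try again in the next one. */
static bool shared_state_publish(shared_state_t *s, const drive_state_t *in)
{
    if (pthread_mutex_trylock(&s->lock) != 0)
        return false;
    s->state = *in;
    pthread_mutex_unlock(&s->lock);
    return true;
}

/* GUI side: hold the lock only for a short copy, never across Qt calls. */
static drive_state_t shared_state_read(shared_state_t *s)
{
    drive_state_t out;
    pthread_mutex_lock(&s->lock);
    out = s->state;
    pthread_mutex_unlock(&s->lock);
    return out;
}
```

The design choice here is that the RT loop only ever copies a small struct under the lock, and the GUI works on its own copy afterwards, so no thread holds the mutex for long.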
Please let me know if you have any suggestions or need any other information.
Thanks again,
Simone
Roberto Viola
Technical Dept
+39 0536836680
SYSTEM CERAMICS S.p.A.
Via Ghiarola Vecchia, 73
41042 Fiorano (Mo) ITALY
+39 0536 836111
info at system-electronics.it
www.system-electronics.it
________________________________
Notice: this e-mail account and the messages sent from it are for company/business use only and never personal.
Replies to this message: the recipient is advised that any replies may be read by the sender's entire company/office/department.
The information contained in this e-mail, including attachments, is confidential and exclusively for the use of the intended recipient. If you received this communication by mistake you are not authorized to copy, send and/or publish this message and its attachments, in whole or in part and therefore please delete this message.
____________________________________________________
SIMONE COMARI
Research Fellow
DIN – Dept. of Industrial Engineering
Alma Mater Studiorum – University of Bologna
Via Umberto Terracini, 24, 40131 Bologna (BO), Italy
E-mail: simone.comari2 at unibo.it
Websites:
https://www.unibo.it/sitoweb/simone.comari2
http://grab.diem.unibo.it
________________________________
From: Viola Roberto <roberto.viola at systemceramics.com>
Sent: Monday, 27 May 2019 10:57
To: Simone Comari; etherlab-users at etherlab.org
Subject: Re: System randomly freezes in multi-thread Qt application with a RT process
Hi Simone,
From the logs, it seems to be an issue related to your i915 (the video card; I guess it's integrated in your CPU).
You should try to find the cause of the issue: I suggest trying without EtherCAT and stressing the CPU and the video card in some other way. I guess it's not related to EtherCAT.
BTW, what video card do you have?
Did you try capturing the logs on other systems (a laptop, for example)?
R.
From: Simone Comari [mailto:simone.comari2 at unibo.it]
Sent: Thursday, 23 May 2019 12:51
To: Viola Roberto <roberto.viola at systemceramics.com>; etherlab-users at etherlab.org
Subject: Re: System randomly freezes in multi-thread Qt application with a RT process
Hi Roberto,
First of all, thank you for your quick response.
Attached you can find the kernel.log and system.log of a single session, that is:
1. Boot up
2. Application launch (successful EtherCAT network setup) through the Qt IDE
3. Successful enabling of a single motor (i.e. one of the ethercat slaves) through our GUI
4. Simple operation (e.g. manual velocity control) until problem occurs (it took a couple of minutes this time)
5. Hard shut-down of the "frozen" system
I hope these are the logs you were talking about, please let me know otherwise.
Maybe it's worth mentioning that we followed these instructions (https://github.com/UNIBO-GRABLab/cable_robot/wiki/Installation) to install both the RT kernel and the ethercat libs, just in case we misused patches or configurations.
Thanks again.
Best regards,
Simone
From: Viola Roberto <roberto.viola at systemceramics.com>
Sent: Thursday, 23 May 2019 08:00
To: Simone Comari; etherlab-users at etherlab.org
Subject: Re: System randomly freezes in multi-thread Qt application with a RT process
Hi Simone, just a quick hint to help understand the freeze: try running the setup inside a VM (KVM or VirtualBox) so you can capture the kernel log over the serial console or, if you have a UART available on your system, directly from it.
This way we should be able to understand the issue better.
R.
From: etherlab-users [mailto:etherlab-users-bounces at etherlab.org] On Behalf Of Simone Comari
Sent: Wednesday, 22 May 2019 18:52
To: etherlab-users at etherlab.org
Subject: [etherlab-users] System randomly freezes in multi-thread Qt application with a RT process
Hi all,
I am a young research fellow at the University of Bologna and I have just started working with EtherCAT technology and RT systems, so please forgive me if I misuse terms or am not precise enough.
First, I'll try to describe my setup:
* Ubuntu 16.04.6 with patched fully preemptible RT kernel 4.13.13-rt5
* Qt 5.12.2
* PCI driver e1000e
* Ethercat master running on this Linux RT
* Elmo GOLD SOLO WHISTLE Drives (ethercat slaves)
Secondly, a brief outline of my software architecture:
* POSIX threads
* Qt-based GUI running on a non-RT thread
* Ethercat network setup (ethercat master and slaves init) done in the same non-RT thread
* If initialization is successful, start a new RT-thread in charge of handling all ethercat-related functionalities (read/write/status-check).
* Resources shared between the RT and non-RT threads are protected with pthread_mutex (although I'm not 100% sure I'm using it correctly)
* The implementation of our generic EtherCAT master can be found here: https://github.com/UNIBO-GRABLab/grab_common/blob/e5278b6fe611654bfa84c951d8b77e56ebbc8fa9/libgrabec/src/ethercatmaster.cpp
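For the "start a new RT thread" step, one detail that is easy to miss is that a thread created from a non-RT thread inherits its scheduling policy unless PTHREAD_EXPLICIT_SCHED is set. A minimal sketch, assuming SCHED_FIFO is wanted; priority 80 in the usage is an arbitrary example, and `start_rt_thread()`, `lock_memory()` and `example_rt_body()` are hypothetical names, not taken from the linked ethercatmaster.cpp.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

/* Lock current and future pages in RAM so the RT thread does not take page
 * faults; usually called once in main() before starting the RT thread
 * (needs CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK). */
static int lock_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Start fn(arg) as a SCHED_FIFO thread with priority prio.
 * Returns 0 or an errno value (EPERM without the needed privileges). */
static int start_rt_thread(pthread_t *tid, void *(*fn)(void *), void *arg,
                           int prio)
{
    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = prio };
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    /* Without EXPLICIT_SCHED the new thread inherits the (non-RT) policy of
     * the creating GUI thread and the two settings below are ignored. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical thread body used for illustration only. */
static void *example_rt_body(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
```

Checking the return value of `start_rt_thread()` matters: if it fails with EPERM the RT loop never runs with RT priority, which is easy to overlook when the thread is started from a GUI.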
Problem description:
* Once the EtherCAT network is set up and the RT thread is started, the system freezes quite randomly, without errors of any sort. Sometimes it happens when the motors are enabled and operational, sometimes when they are enabled and idle, sometimes even when they are disabled. It is not reproducible and I couldn't link it to any particular step in my application. Sometimes it happens right after I start it, but always after successful initialization.
* Even when I manage to close the application, the next time I try to run it, it tells me that the master is busy and ec_e1000e is in use. The only solution is to hard-shut-down the PC manually.
* Another thing I noticed is that even after the main thread (the GUI one, i.e. non-RT) is closed, the child RT thread keeps running in state D (uninterruptible sleep), hogging a great deal of CPU (which is probably why the whole system freezes).
* We tried different computers (both laptops and desktops) to rule out a platform dependency, but the issue remains.
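Related to the RT thread outliving the GUI: a thread stuck in state D is blocked inside the kernel and cannot react to anything in user space, so the following does not address the freeze itself. It is only a minimal sketch of an orderly shutdown, assuming both threads live in the same process: the GUI sets an atomic flag, the RT loop exits at the end of its current cycle, and the GUI joins it before the master is released. `stop_requested`, `rt_loop()` and `stop_rt_thread()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Written by the GUI thread, polled by the RT thread once per cycle. */
static atomic_bool stop_requested = false;
/* Cycle counter, only used here to show that the loop ran. */
static atomic_int cycles_done = 0;

static void *rt_loop(void *arg)
{
    (void)arg;
    /* 1 ms relative sleep for brevity; a real loop would use absolute time. */
    const struct timespec period = { 0, 1000000L };

    while (!atomic_load(&stop_requested)) {
        /* ... one EtherCAT cycle: receive, process, queue, send ... */
        atomic_fetch_add(&cycles_done, 1);
        nanosleep(&period, NULL);
    }
    /* Bring the drives to a safe state (e.g. disabled) before returning. */
    return NULL;
}

/* Called from the GUI (non-RT) thread when the application closes. */
static int stop_rt_thread(pthread_t tid)
{
    atomic_store(&stop_requested, true);
    /* Block until the RT loop has finished its current cycle and returned.
     * Only after this is it safe to release the master (ecrt_release_master()
     * in the IgH API) and let the process exit. */
    return pthread_join(tid, NULL);
}
```

In a Qt application, `stop_rt_thread()` would typically be called from the main window's close handler or before `QApplication::exec()` returns, so the RT thread never outlives the GUI.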
Please let me know if there is any missing important information that can help understanding the problem.
Thank you a lot for the support.
Best regards,
Simone