Discussion:
[Sisuite-devel] si_netbootmond: design issue: best solution to fix it?
Olivier LAHAYE
2014-07-23 14:23:23 UTC
Dear all,

I'm working on si_netbootmond (in fact working on the OSCAR side) and discovered that
si_netbootmond relies on the imaging method being rsync.
It monitors the rsyncd log file for magic words stating that imaging is complete.

There are 2 problems here:
1/ It doesn't work for deployments using Flamethrower/multicast or BitTorrent.
2/ A successfully imaged client is not necessarily able to reboot (bad image, bad
bootloader, ...); in this situation, setting local boot is wrong.

I see 3 solutions:
1/ Replace the current code that scans /var/log/systemimager/rsyncd for the magic
words /scripts\/imaging_complete_?([\.0-9]+)?/ with something like:
use File::Monitor;
my $monitor = File::Monitor->new();
$monitor->watch('somefile.txt');

Regards,

Olivier.
--
Olivier Lahaye
DRT/LIST/DIR
Olivier LAHAYE
2014-07-23 14:37:52 UTC
Dear all,

I'm working on si_netbootmond (in fact working on the OSCAR side) and discovered that
si_netbootmond relies on the imaging method being rsync.
It monitors the rsyncd log file for magic words stating that imaging is complete.

There are 2 problems here:
1/ It doesn't work for deployments using Flamethrower/multicast or BitTorrent.
2/ A successfully imaged client is not necessarily able to reboot (bad image, bad
bootloader, ...). In this situation, setting local boot is wrong, since we would want
the client to boot from the network again so that it can be re-imaged.

I see 3 solutions:
1/ Replace the current code that scans /var/log/systemimager/rsyncd for the magic words
/scripts\/imaging_complete_?([\.0-9]+)?/ with something like:

use File::Monitor;
my $monitor = File::Monitor->new();
$monitor->watch('/var/lib/systemimager/clients.xml');
$monitor->scan;    # the first scan only records the file's initial state

Then, in a loop, call $monitor->scan, look for clients with status 102 (REBOOTED)
and run:
si_mkclientnetboot --localboot --clients "<the client>"

2/ Keep the current code and add an option to monitor clients.xml instead.

3/ Drop si_netbootmond or leave it untouched, and add an option to si_monitor that
enables switching a client's netboot configuration to local boot. Indeed, si_monitor
doesn't need to actively check a file: it blocks on a socket listening for status
messages. If it receives the "rebooted" message, it could then call:
si_mkclientnetboot --localboot --clients "<the client>"
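A minimal Perl sketch of the detection step in solution 1, assuming the contents of
clients.xml have already been parsed into a hash of client name => status (the file's
layout isn't described here, so parsing is left out). newly_rebooted and
localboot_command are hypothetical helper names. Comparing against the previous
snapshot keeps a client from being switched again on every rescan.

```perl
use strict;
use warnings;

# Status code quoted in this thread: 102 = REBOOTED.
use constant STATUS_REBOOTED => 102;

# Compare two snapshots of "client name => status" (assumed to be parsed
# from clients.xml before and after a change reported by the monitor) and
# return, sorted, the clients that have just reached REBOOTED.
sub newly_rebooted {
    my ($prev, $curr) = @_;
    my @names = grep {
        $curr->{$_} == STATUS_REBOOTED
            && (!defined $prev->{$_} || $prev->{$_} != STATUS_REBOOTED)
    } keys %$curr;
    return sort @names;
}

# Build the si_mkclientnetboot command as a list, suitable for the list
# form of system(), which runs the program without going through a shell.
sub localboot_command {
    my ($client) = @_;
    return ('si_mkclientnetboot', '--localboot', '--clients', $client);
}
```

Each name returned by newly_rebooted could then be handled with
system(localboot_command($name)), checking for a zero return value.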

Technically, solution 3 has the advantage of being passive, while methods 1 and 2
actively poll for node status changes. The problem with solution 3 is that I don't know
whether it makes sense to give si_monitor the ability to update a client's netboot
configuration, and if si_monitor is not running, the netboot can't be updated at all.

What is the best solution?

Regards,

Olivier.
Olivier LAHAYE
2014-07-23 17:18:22 UTC
Ok,

After studying the problem a little more, I have a question that I can't
answer.

Why was si_netbootmond created, when a better place for this monitoring
would have been the monitoring source itself: si_monitor?

Indeed, si_monitor is aware of the "imaged" status in real time, whatever the
deployment method is (BitTorrent, rsyncd, Flamethrower, ...).

In my previous post, I thought that looking for status 102 (rebooted) was the
solution. Unfortunately, if the client is still set to netboot, a reinstall loop
will occur before the rebooted status has a chance to be sent to si_monitor.

So I think a rock-solid solution would be to implement the following
algorithm in si_monitor:

Add an si_monitor configuration parameter to enable or disable the netbootmond
feature.
If enabled, when receiving "imaged" (status 100) (or maybe "finalizing" (101) or
even "rebooting" (104)), and if NET_BOOT_DEFAULT is set to local, it would
run:
si_mkclientnetboot --localboot --clients "<the client>"
and optionally start a (configurable) timer.

When the timer expires, it would check /var/lib/systemimager/clients.xml for
that client, and if the "rebooted" (102) status is not there, it would assume
that the reboot failed (bad boot loader, wrong fstab, garbage from a postinstall
script, ...) and revert the client to netboot so another imaging attempt can occur.
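The algorithm above can be sketched in Perl as two hypothetical handlers: on_status,
called for each status message si_monitor receives, and check_timeouts, called when
the timer fires. The configuration keys, the choice of status 100 as the trigger, and
the --netboot flag used to revert a client are assumptions to check against
si_mkclientnetboot. Rather than re-reading clients.xml when the timer fires, this
sketch records REBOOTED (102) as it arrives on the status stream.

```perl
use strict;
use warnings;

# Status codes quoted in this thread.
use constant { IMAGED => 100, REBOOTED => 102 };

# Hypothetical configuration; these are not existing si_monitor options.
my %config = (
    netbootmond_enabled => 1,
    net_boot_default    => 'local',   # mirrors NET_BOOT_DEFAULT
    reboot_timeout      => 600,       # seconds to wait for REBOOTED (0 = no timer)
);

# Pending timers: client name => epoch time by which REBOOTED is expected.
my %deadline;

# Handle one status message. Returns the commands to run, each as an
# array ref suitable for the list form of system().
sub on_status {
    my ($name, $status, $now) = @_;
    return () unless $config{netbootmond_enabled}
        && $config{net_boot_default} eq 'local';
    if ($status == IMAGED) {
        $deadline{$name} = $now + $config{reboot_timeout}
            if $config{reboot_timeout};
        return (['si_mkclientnetboot', '--localboot', '--clients', $name]);
    }
    delete $deadline{$name} if $status == REBOOTED;   # reboot succeeded
    return ();
}

# Revert to netboot every client whose deadline passed without REBOOTED
# (the --netboot flag is an assumption).
sub check_timeouts {
    my ($now) = @_;
    my @cmds;
    for my $name (sort keys %deadline) {
        next if $now < $deadline{$name};
        delete $deadline{$name};
        push @cmds, ['si_mkclientnetboot', '--netboot', '--clients', $name];
    }
    return @cmds;
}
```

Returning commands instead of running them keeps the logic easy to test; in
si_monitor each one could be executed with system(@$cmd), and check_timeouts could
be called, for example, whenever the wait on the socket times out.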

I admit that reverting to netboot could lead to an endless loop of failed
reimaging, but we could consider a netboot entry that waits for a keypress with
no timeout, a netboot entry that powers the machine off, or any other
configurable "on-fail-to-reboot" behavior.

Anyway, the "timer" option aside, the main point is: why not integrate
si_netbootmond into si_monitor? We would benefit from real-time client status
without active waiting, and we would be independent of the deployment method.
(It is also not that difficult to code, mainly around line
si_monitor:390.)

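if ($client->{'status'} == 100 && $netbootmond_option_enabled) {
    # List form of system() runs the command without a shell, so no quoting is needed.
    system('si_mkclientnetboot', '--localboot', '--clients', $client->{'name'}) == 0
        or warn "si_mkclientnetboot failed for $client->{'name'}: $?\n";
}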

What do you think?

Best regards,

Olivier.