October 01, 2004 (technical)

Fixing Obsidian's Network Problem

Came in on Monday the 27th to find that a lot of the cluster seemed to be messed up.

The workstations couldn't ping frontend-0, and frontend-0 itself didn't appear to have its routing or network addresses set up properly.

Fix:

  • halt all the machines
    From frontend-0: "cluster-fork halt -t now"
  • fix the frontend-0 network
    From frontend-0: edited /etc/sysconfig/network-scripts/ifcfg-eth1,
    replacing the DHCP option with a static IP option
    New network settings:
    IP: 172.16.2.240
    Netmask: 255.255.0.0
    This also automatically fixed the routing tables: any 172.16.x.x traffic now goes through eth1
  • reboot the comp-pvfs nodes
  • to aid in Daniel's quest, I then turned off dhcpd, which was giving him dramas. (After frontend-0 is reset, dhcpd will be started again.)
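
For reference, the static setup described above would look roughly like this in ifcfg-eth1. This is a sketch, not a copy of the actual file: the IP and netmask are the ones listed above, but the DEVICE, BOOTPROTO and ONBOOT lines are assumed from the usual Red Hat layout, where the DHCP setting would have been BOOTPROTO=dhcp.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 (sketch)
# DEVICE, BOOTPROTO and ONBOOT are assumptions, not copied from frontend-0
DEVICE=eth1
BOOTPROTO=static      # assumed replacement for BOOTPROTO=dhcp
IPADDR=172.16.2.240   # from the settings above
NETMASK=255.255.0.0   # from the settings above
ONBOOT=yes
```

With a 255.255.0.0 netmask the interface covers the whole of 172.16.0.0, which is why the 172.16.x.x route lands on eth1 once the interface comes up.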

*nb: the dhcpd.conf on frontend-0 only gives out IPs to nodes that were assimilated via "insert-ethers", which meant that any other machine requesting an IP from frontend-0 was left hanging.

Even though I considered modifying frontend-0's dhcpd.conf to allow it to give out IPs, I thought it better not to touch it, because the file is automatically updated via "insert-ethers".
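
A quick way to see the behaviour noted above, without touching the file, is to check whether dhcpd.conf declares any dynamic address range at all. This is a minimal sketch, assuming the symptom comes from the file having no ISC dhcpd "range" statement (so only hosts with fixed entries get an answer). has_dynamic_range is a made-up helper name, and the path to the file is passed in rather than assumed.

```shell
# has_dynamic_range FILE
# Hypothetical helper: succeeds (exit 0) if FILE contains a dhcpd "range"
# statement outside of comments, fails (exit 1) otherwise.
has_dynamic_range() {
    # drop everything after '#', then look for the keyword "range"
    sed 's/#.*//' "$1" | grep -Eq '(^|[[:space:]{;])range[[:space:]]'
}
```

Calling it with the path to frontend-0's dhcpd.conf and getting a failure would match what was seen here. It only reads the file, so it doesn't get in the way of "insert-ethers" rewriting it.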

Posted by xntrik at October 1, 2004 03:50 PM