NGOP -- What is it?
- Scalable Monitoring System
- "Next Generation OPerations"
- Written at Fermilab
- Written in Python/C++
- Past HEPIX talks discussed architecture etc.
HEPIX 2002: NGOP Update, Fermilab
What's happened lately?
- Monitoring still more stuff
- Smarter Agents
- Start/Stop standardization
- Performance Monitoring still on hold
- New User Interface code (more to come)
- Impending Lights Out Operation
More Stuff
- Major Clusters:
Farms, EMail, AFS, Enstore, WWW, USCMS, FNALU
- Totals:
Hosts: ~1,100
Components: ~25,000
Smarter Agents/Servers
- Ping agent:
route discovery, outage groups
- Swatch agent:
combines repeated messages
- Status Engines:
Delayed effect rules
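
The "combines repeated messages" behaviour of the Swatch agent can be illustrated with a minimal sketch. It assumes "repeated" means identical consecutive log lines; the function name `combine_repeats` is hypothetical and is not taken from the NGOP code.

```python
from itertools import groupby


def combine_repeats(messages):
    """Collapse runs of identical consecutive messages.

    Returns a list of (message, count) pairs, so a burst of the same
    log line is reported once with a repeat count (an assumed policy,
    not necessarily what the NGOP Swatch agent does).
    """
    return [(msg, sum(1 for _ in run)) for msg, run in groupby(messages)]


if __name__ == "__main__":
    log = ["disk full", "disk full", "disk full", "link down", "disk full"]
    for msg, count in combine_repeats(log):
        print(f"{msg} (x{count})")
```

Only adjacent duplicates are merged here, so a message that recurs after a different message is reported again; a real agent might also apply a time window.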
New User Interface:
- Split old GUI into:
-- GUI Front End
-- Status Engine(s)
-- Location Server
- Improves GUI responsiveness.
Status Engine
- Determines "status" of entities
-- individual and composite
- Triggers actions on status change
- Multiple GUIs -> One Status Engine
- One Status Engine per "role"
-- operator, admin, etc.
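
The two Status Engine duties listed above (deriving individual and composite status, and triggering actions on a change) can be sketched as follows. This is an illustration under stated assumptions, not NGOP's implementation: the status names, the "worst child wins" rule for composites, and the `Entity` / `StatusEngine` classes are all hypothetical.

```python
# Hypothetical severity ordering; the slides do not list NGOP's status levels.
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


class Entity:
    """A monitored entity; composite when it has children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.own_status = "ok"

    def status(self):
        # Assumed rule: a composite takes the worst status among itself
        # and all of its children.
        candidates = [self.own_status] + [c.status() for c in self.children]
        return max(candidates, key=lambda s: SEVERITY[s])


class StatusEngine:
    """Tracks entities and calls on_change(name, old, new) on status changes."""

    def __init__(self, entities, on_change):
        self.entities = list(entities)
        self.on_change = on_change
        self.last = {e.name: e.status() for e in self.entities}

    def report(self, entity, new_status):
        # Record the new individual status, then re-evaluate every tracked
        # entity and fire the action only where the status actually changed.
        entity.own_status = new_status
        for e in self.entities:
            current = e.status()
            previous = self.last[e.name]
            if current != previous:
                self.last[e.name] = current
                self.on_change(e.name, previous, current)


if __name__ == "__main__":
    node = Entity("node1")
    farm = Entity("farm", [node])
    engine = StatusEngine([node, farm], lambda n, old, new: print(n, old, "->", new))
    engine.report(node, "error")
```

In this sketch, several GUI front ends could share one engine by registering through the same callback, and a separate engine instance with its own rules could be created per role.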
GUI Front Ends:
- Java GUI
-- Responsive, prettier
-- Bigger memory footprint
- Web GUI
-- FastCGI
Demo