Welcome to www.ClusterGate.RU

This is the Monitoring page of ClusterGate.RU

  • Lire -- The Lire log analyzer and report generator
  • Zenoss -- Open Source Enterprise Monitoring. Zenoss Core is an enterprise-grade network and systems monitoring product that delivers the functionality IT operations teams need to effectively manage the health and performance of their entire infrastructure through a single, integrated package.

    For far too long, robust IT infrastructure monitoring was out of reach for most organizations because of the cost and complexity of the proprietary systems that offered the required functionality. Zenoss has changed the game by offering a complete, easy-to-use solution as a free (i.e. no money), downloadable, open source software product.

  • Big Brother -- a well-known service/host monitoring system. Big Brother monitors system and network-delivered services for availability. Your current network status is displayed on a color-coded web page in near-real time. When problems are detected, you're immediately notified by e-mail, pager, or text messaging.
  • Nagios -- Nagios® is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser.
  • Ganglia -- Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
  • Cricket -- a high performance, extremely flexible system for monitoring trends in time-series data. Cricket was expressly developed to help network managers visualize and understand the traffic on their networks, but it can be used for all kinds of other jobs as well.
    Cricket has two components, a collector and a grapher. The collector runs from cron every 5 minutes (or at a different rate, if you want), and stores data in a data structure managed by RRDtool. Later, when you want to check on the data you have collected, you can use a web-based interface to view graphs of the data.
    Cricket reads a set of config files called a config tree. The config tree expresses everything Cricket needs to know about the types of data to be collected, how to get it, and from which targets it should collect data. The config tree is designed to minimize redundant information, making it compact and easy to manage, and preventing silly mistakes from occurring due to copy-and-paste errors. Cricket is written entirely in Perl and is distributed under the GNU General Public License.
  • Zabbix -- ZABBIX is software for monitoring your applications, network and servers. ZABBIX supports both polling and trapping techniques to collect data from monitored hosts. A flexible notification mechanism makes it quick and easy to configure different types of notifications for pre-defined events. ZABBIX offers advanced monitoring, alerting and visualisation features which are missing in other monitoring systems, even some of the best commercial ones. Use of industry standards makes integration of ZABBIX into existing infrastructure trouble-free.
  • BOSS -- the name Batch Object Submission System may give you the wrong idea that the system is a scheduler or something like that. In reality it is a job/task monitoring system (not a service monitoring system). BOSS provides an easy-to-use bookkeeping system for jobs running on a Linux computing farm. Different job types can be registered with the BOSS system, which allows information specific to the task performed by the job to be stored in a local database. This information is used both for job monitoring and for bookkeeping.
  • MonALISA -- MONitoring Agents using a Large Integrated Services Architecture. The MonALISA framework is a fully distributed service system with no single point of failure. It provides:
    • Distributed Registration and Discovery for Services and Applications.
    • Monitoring all aspects of complex systems:
      • System information for computer nodes and clusters.
      • Network information (traffic, flows, connectivity, topology) for WAN and LAN.
      • Monitoring the performance of Applications, Jobs or services.
      • End User Systems, and End To End performance measurements.
    • Can interact with any other services to provide in near real-time customized information based on monitoring information.
    • Secure, remote administration for services and applications.
    • Agents to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected.
    • The Agent system can be used to develop higher level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
    • Graphical User Interfaces to visualize complex information.
    • Global monitoring repositories for distributed Virtual Organizations.
    MonALISA is currently used in several large-scale distributed systems and has proved to be reliable and scalable.
  • R-GMA -- Relational Grid Monitoring Architecture. R-GMA is in wide use in Grid-like distributed systems.
  • Test harness and reporting framework -- Inca is a flexible framework for the automated testing, benchmarking and monitoring of Grid systems. It includes mechanisms to schedule the execution of information gathering scripts and to collect, archive, publish, and display data.
    Originally developed for the TeraGrid project, Inca is a general framework that can be adapted and used by other Grids. Inca offers a diverse set of use cases including:
    • Software Stack Validation & Verification
    • Network Bandwidth Measurements
    • Grid Benchmarking
  • ManageEngine -- a professional monitoring/management tool. Integrated Applications Management, Server Management, and Database Monitoring Software.
  • Lemon RRD framework -- part of the Lemon project at CERN (http://cern.ch/lemon). It retrieves metric information from the MR (Monitoring Repository) and stores it in serialized, aging time-series data structures that are kept as RRD files on disk. These structures are an integral part of the RRDtool project (http://www.rrdtool.org), which the framework uses for this purpose. The data is then passed to the web interface for visualization. The framework is generic enough to allow data sources other than the MR. LRF supports grouping of machines (objects) into groups (clusters, racks, hardware models, ...) and provides a summary or average overview of each group independently, even if certain machines belong to more than one group. All of this is already provided at the time the information is gathered from the Monitoring Repository. An overview of Lemon is available here and an overview of the Lemon RRD framework is here.
  • sysstat package -- news, information, documentation and links for the sysstat utilities. The sysstat utilities are a collection of performance monitoring tools for Linux. These include the sar, sadf, mpstat, iostat and sa tools.
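
The Nagios entry above mentions external "plugins" which return status information to the daemon. A minimal sketch of such a check is given below, assuming the usual plugin convention of one status line on standard output plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the load thresholds are hypothetical and chosen only for illustration; they are not taken from any Nagios distribution.

```python
import os

# Exit codes conventionally understood by Nagios-compatible plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_load(load1, warn=4.0, crit=8.0):
    """Map a 1-minute load average to (exit_code, status_line).

    The warn/crit thresholds are illustrative assumptions, not defaults
    of any real plugin.
    """
    if load1 >= crit:
        return CRITICAL, f"LOAD CRITICAL - load1={load1:.2f}"
    if load1 >= warn:
        return WARNING, f"LOAD WARNING - load1={load1:.2f}"
    return OK, f"LOAD OK - load1={load1:.2f}"


def check_load():
    """Read the local load average, print one status line, return the code."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        # getloadavg() is missing on some platforms or may fail at runtime.
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = classify_load(load1)
    print(line)
    return code
```

A script used as a real check would end with something like `sys.exit(check_load())`, so that the monitoring daemon can read the state from the process exit code.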

Measurement tools

  • Bonnie++ -- a benchmark suite that performs a number of simple tests of hard drive and file system performance. After running it, you can decide which tests are important to you and how to compare different systems. The main program tests database type access to a single file (or a set of files if you wish to test more than 1G of storage), and it tests creation, reading, and deleting of small files which can simulate the usage of programs such as Squid, INN, or Maildir format email.
  • NetLogger -- Anyone who has ever tried to debug or do performance analysis of complex distributed applications knows that it can be a very difficult task. Problems may lie in any of many software components, hardware components, networks, OSes, etc.
    NetLogger is designed to make this easier. NetLogger is both a methodology for analyzing distributed systems, and a set of tools to help implement the methodology. In fact, you can use the NetLogger methodology without using any of the LBNL provided tools.
  • Iperf -- a well-known tool for network measurement. Iperf is a tool to measure maximum TCP bandwidth, allowing the tuning of various parameters and UDP characteristics. Iperf reports bandwidth, delay jitter, and datagram loss.
  • NetPerf is a benchmark that can be used to measure the performance of many different types of networking.
  • Network Monitoring tools -- a large list of available monitoring/measuring tools (SLAC.STANFORD.EDU)
  • IOzone -- a good benchmark tool for file systems. IOzone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations. IOzone has been ported to many machines and runs under many operating systems. IOzone is useful for performing a broad filesystem analysis of a vendor's computer platform. Benchmark features:
    • ANSI C source
    • POSIX async I/O
    • Mmap() file I/O
    • Normal file I/O
    • Single stream measurement
    • Multiple stream measurement
    • Distributed fileserver measurements (Cluster)
    • POSIX pthreads
    • Multi-process measurement
    • Excel importable output for graph generation
    • Latency plots
    • 64bit compatible source
    • Large file compatible
    • Stonewalling in throughput tests to eliminate straggler effects
    • Processor cache size configurable
    • Selectable measurements with fsync, O_SYNC
    • Builds for: AIX, BSDI, HP-UX, IRIX, FreeBSD, Linux, OpenBSD, NetBSD, OSFV3, OSFV4, OSFV5, SCO OpenServer, Solaris, Windows95/98/NT
  • Internet End-to-end Performance Monitoring -- a reasonable introduction to the subject
  • Distributed Systems Department at LBL -- a good source of information on networking
  • Various measurement/taxonomy tools -- the CAIDA Tools site contains CAIDA tools and software as well as a taxonomy of available research and visualization tools.
  • The list of measurement tools (SPEC, bonnie, TPC, a range of kernel tools, etc.)
  • Disk benchmarks -- the list of different tools to do measurement on disk I/O.
  • FIO -- a tool that spawns a number of threads or processes doing a particular type of I/O action as specified by the user. fio takes a number of global parameters, each inherited by every thread unless a parameter given to that thread overrides the setting. The typical use of fio is to write a job file matching the I/O load one wants to simulate.
  • Memtest86 -- A Stand-alone Memory Diagnostic. Memtest86 is a thorough, stand-alone memory test for x86 architecture computers. BIOS based memory tests are a quick, cursory check and often miss many of the failures that are detected by Memtest86.
  • BenchMarkHQ -- a fairly large collection of benchmark utilities (English and Russian)
  • SysBench: a system performance benchmark. SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.
    The idea of this benchmark suite is to quickly get an impression of system performance without setting up complex database benchmarks or even installing a database at all. Current features allow testing of the following system parameters:
    • file I/O performance
    • scheduler performance
    • memory allocation and transfer speed
    • POSIX threads implementation performance
    • database server performance (OLTP benchmark)
  • CPU/Memory/Disk/System Tests -- many testing tools for different parts of the system.
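
Several of the tools above (Bonnie++, IOzone, SysBench) measure file I/O by timing simple file operations. The sketch below illustrates only the general idea of the small-file test described in the Bonnie++ entry: it times creation, reading and deletion of a batch of small files. It is not how any of these tools is implemented; the function name, file names, file count and file size are all hypothetical. Reads issued right after the writes are likely served from the operating system cache, so the numbers are indicative at best.

```python
import os
import tempfile
import time


def small_file_benchmark(directory, count=200, size=1024):
    """Time creating, reading and deleting `count` files of `size` bytes.

    Returns (timings, bytes_read), where timings maps each phase name to
    the elapsed wall-clock seconds for that phase.
    """
    payload = b"x" * size
    paths = [os.path.join(directory, f"bench_{i:05d}.dat") for i in range(count)]
    timings = {}

    # Phase 1: create the files.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
    timings["create"] = time.perf_counter() - start

    # Phase 2: read every file back and count the bytes.
    start = time.perf_counter()
    bytes_read = 0
    for path in paths:
        with open(path, "rb") as fh:
            bytes_read += len(fh.read())
    timings["read"] = time.perf_counter() - start

    # Phase 3: delete the files again.
    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    timings["delete"] = time.perf_counter() - start

    return timings, bytes_read


def run_in_temp_dir(count=200, size=1024):
    """Run the benchmark in a throwaway directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        return small_file_benchmark(tmp, count, size)
```

Calling `run_in_temp_dir()` returns the three phase timings in seconds; to test a particular file system, pass a directory located on it to `small_file_benchmark` instead.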
Please email portalmaster@pnpi.spb.ru with questions or comments.

All rights reserved. Copyright © 2006, 2007, 2008, 2009. Andrey Y Shevel.


Last revised: Monday, 27-Apr-2009 12:41:47 MSD
This document URL http://hepd.pnpi.spb.ru:443/ClusterGate.RU/CG_Monitoring/index.shtml