Cluster Computing @ UCR



Cluster Computing

Cluster computing, or "clustering," is the use of multiple computers working together as though they were a single machine to perform computationally intensive tasks. Complex jobs are sped up through parallel processing, in which a single program is divided into pieces that run on many machines simultaneously.
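As a rough illustration of this kind of work division, the C sketch below uses MPI (an implementation such as MPICH, which appears later in the collaborative cluster's software list) to split a simple numerical integration across many processes, which the cluster may place on different nodes. The file and command names are examples only.

    /*
     * Minimal MPI sketch: estimate pi by integrating 4/(1+x^2) over [0,1],
     * with the integration steps divided evenly among all processes.
     *
     * Example build/run commands (names are illustrative):
     *   mpicc -O2 pi_mpi.c -o pi_mpi
     *   mpirun -np 8 ./pi_mpi
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const long n = 100000000L;   /* total number of integration steps */
        int rank, size;
        long i;
        double h, local_sum = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        h = 1.0 / (double)n;
        /* Each process handles every size-th step: an even share of the work. */
        for (i = rank; i < n; i += size) {
            double x = h * ((double)i + 0.5);
            local_sum += 4.0 / (1.0 + x * x);
        }
        local_sum *= h;

        /* Combine the partial results on process 0. */
        MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.12f\n", pi);

        MPI_Finalize();
        return 0;
    }

Each process computes only its own slice of the sum; a single MPI_Reduce call at the end gathers the partial results.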

Distributed computing of this kind has been used to process large amounts of data in projects such as SETI@home and Folding@home.

UC Riverside offers three cluster models that any interested researcher can use; each model is described in the sections below.

The Benefits of Cluster Computing

Cluster computing allows researchers and departments to get the most processing power out of a limited budget without the need to invest in a supercomputer or mainframe.

Some benefits of participating in Cluster Computing at UCR are:

  • High processing power through parallel processing
  • Hardware reliability and scalability
  • Power, cooling, and FTE support provided by Computing & Communications that would not be available to a cluster housed in a researcher’s laboratory
  • Direct connection to the campus high-speed 10 Gigabit backbone
  • 10 Gigabit link to CENIC/Internet2

For more information regarding Computational Clusters at UCR, please read the document provided below.

Computational Clusters White Paper


Departmentally Maintained Clusters

This type of cluster is meant for researchers who have computing needs that fall outside of the campus cluster standards.

These systems are built to a particular PI's, lab's, or center's specifications and are managed by PI-funded staff, but they are housed within C&C’s data center, with C&C staff providing management, mentoring, or backup support to the departmental administrator.


Centrally Hosted Dedicated Clusters

Centrally hosted dedicated clusters are built to campus standards so that multiple systems servicing the needs of many researchers can be deployed and maintained by a relatively small number of systems administrators. These campus standard computational clusters will be maintained by Computing and Communications and housed in the campus Co-Location Facility.

Technical staff within Computing and Communications will assist all researchers who wish to be part of this program with purchasing the best solution for their needs.

Technical Details

Intended Users

These clusters will be designed to meet the needs of research labs, research centers, and PIs with relatively substantial compute requirements.

The nodes are readily available, high-powered 64-bit systems intended for applications that can easily be distributed, rather than for a tightly coupled parallel cluster that requires a fast (and expensive) interconnect.
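As a hedged sketch of what "easily distributed" means in practice, the C example below (again using MPI, with MPICH assumed) runs a set of completely independent cases, one subset per process, with no communication between nodes at all. The run_case function and the file and command names are placeholders for a researcher's own computation.

    /*
     * Embarrassingly parallel sketch: each process works through its own
     * disjoint set of cases and writes its own output file; no messages
     * are exchanged, so no fast interconnect is needed.
     *
     * Example build/run commands (names are illustrative):
     *   mpicc -O2 sweep.c -o sweep -lm
     *   mpirun -np 16 ./sweep
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <math.h>

    /* Stand-in for an expensive, self-contained simulation or analysis step. */
    static double run_case(int case_id)
    {
        double acc = 0.0;
        long i;
        for (i = 1; i <= 5000000L; i++)
            acc += sin((double)case_id / (double)i);
        return acc;
    }

    int main(int argc, char **argv)
    {
        int rank, size, c;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Processes take disjoint cases by striding on their rank. */
        for (c = rank; c < 64; c += size) {
            double result = run_case(c);
            char name[64];
            FILE *out;

            sprintf(name, "case_%03d.out", c);
            out = fopen(name, "w");
            if (out != NULL) {
                fprintf(out, "case %d result %.6f\n", c, result);
                fclose(out);
            }
        }

        MPI_Finalize();
        return 0;
    }

Because the processes never exchange messages, a workload like this runs well over ordinary Gigabit Ethernet.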

Financial Support Model

PIs fund these Standardized/Dedicated Clusters via their own funding sources, including initial complements, extramural funds, etc. PIs are also responsible for maintaining these clusters.

Equipment Obsolescence

To make efficient use of all resources dedicated to this program, it is important that the hardware housed in the Co-Location Facility remain up to date. Three years after a cluster has been purchased and housed in the Co-Location Facility, it will be evaluated based on the condition of the hardware and the cost of continued maintenance. If it is still cost effective to house the hardware, it will be retained and re-evaluated annually thereafter. If it is no longer feasible to maintain the hardware, Computing and Communications will work with the equipment owners either to turn over administration of the cluster to their own staff or to assist them with disposing of the obsolete hardware.

Technical Details

  • Campus Cluster Standards

    Clusters housed in the Co-Location Facility must adhere to the following hardware standards:

    Compute Nodes

    • 1U Rack Mounted Servers
    • Dual AMD Opteron 2214 Dual Core Processors
    • At least 8 GB of RAM per node
    • At least one 80-160 GB hard drive
    • High quality hardware that is as green as possible
    • Dual Gigabit Ethernet ports
    • DDR InfiniBand interconnect HCA with cable (Optional)
    • IPMI
    • 3 year warranty

    Master Node

    • 1U Rack Mounted Server
    • Dual AMD Opteron 2214 Dual Core Processors
    • At least 8 GB of RAM
    • Dual Gigabit Ethernet ports
    • 3 year warranty

    Interconnect (Network)

    • Gigabit Ethernet switch
    • InfiniBand fast interconnect switch with embedded subnet manager (Optional)

    Storage

    • No specific standard, but the PI must consult with Computing and Communications prior to purchase to ensure that the support staff is prepared.

    Scheduling/Administration Software

    • Perceus for cluster OS provisioning
    • Sun Grid Engine is highly recommended, but other schedulers/resource managers are supported as long as they interoperate with the UC/UCR Grid.

    Operating System

    • CentOS or Scientific Linux

    These standards will be reviewed periodically to ensure that clusters purchased as part of this program are of the highest quality and performance available.

  • Software Installed

    A standard set of software packages will be installed on every centrally hosted dedicated cluster. Researchers may alter this list as they see fit but are responsible for purchasing any non-open-source software package that is not part of the initial install.

    Compilers

    Other Development Tools/Numerical Libraries

    Databases

    Science and Mathematics

    Plotting and Graphing

    Visualization and Modeling


Collaborative Computational Clusters

UCR’s collaborative cluster provides a shared system as a computing resource for campus researchers with limited financial resources.

Collaborative Cluster Components

The 69 compute nodes in the cluster are divided into three logical sub-clusters.

  • Set 1: The first 40 nodes form a general purpose cluster available to researchers and to sponsored graduate and undergraduate students who do not have access to any other cluster on campus; it is intended for computations with short run times.
  • Set 2: The next 24 nodes form a base shared cluster available only to researchers who contribute nodes to expand the cluster.

All 64 nodes in Sets 1 and 2 are connected by an extremely fast InfiniBand interconnect. Jobs run on these nodes may take advantage of the low-latency communication and high bandwidth that this interconnect offers (a brief illustration follows the list).

  • Set 3: The remaining 5 nodes in the cluster form what is known as the Application Cluster and are intended for jobs that cannot take advantage of the InfiniBand network.
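The minimal sketch below (C with MPI, MPICH assumed; file and command names are illustrative) shows the kind of tightly coupled communication that benefits from the InfiniBand fabric: two processes bounce a small message back and forth and report the average round-trip time, which is dominated by interconnect latency.

    /*
     * Ping-pong latency sketch between two MPI processes.
     *
     *   mpicc -O2 pingpong.c -o pingpong
     *   mpirun -np 2 ./pingpong
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int reps = 10000;
        int rank, size, i;
        char byte = 0;
        double t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "run with at least 2 processes\n");
            MPI_Finalize();
            return 1;
        }

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {            /* send, then wait for the reply  */
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {     /* echo the message straight back */
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("average round trip: %.2f microseconds\n",
                   (t1 - t0) / reps * 1e6);

        MPI_Finalize();
        return 0;
    }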

Technical Details

Intended Users

The collaborative cluster is ideal for researchers with higher-end computing needs but without the financial resources to obtain the infrastructure necessary for a fast interconnect (an InfiniBand interconnect switch can add more than $50,000 to the overall cost of a cluster).

Within this group, two types of users are targeted:

  • Researchers who need occasional use of computational resources. This group would be able to use the 40-node sub-cluster to run the occasional short job.
  • Researchers who regularly use high-end computation in their research. This group would have access to the 24 nodes of the base shared cluster, in addition to any nodes they add to the cluster themselves, and could run jobs for longer periods of time.

Financial Support Model

The initial complement of 69 compute nodes, a master node, an InfiniBand interconnect, and a storage server is provided by Computing and Communications and housed in the campus data center. The master node houses user accounts and is the node on which jobs are launched and applications are created and/or compiled.

Faculty researchers who make regular use of the cluster are encouraged to participate in building the cluster by using funding sources at their disposal (such as grants and initial complements) to purchase additional compute nodes.

Gaining Access

The cluster may be accessed either through the UCR Grid Portal or via SSH to the head node.

Governance

An advisory panel, made up of one representative from each research group that has contributed nodes, will be established to recommend policy changes for the collaborative cluster to the Associate Vice Chancellor of Computing and Communications. This group will also review requests for unusual uses of the cluster, such as a request to use all available nodes for a special research project.

Technical Details

  • Hardware

    Special care has been taken to ensure that this cluster is as green as possible. This includes using lower-voltage processors and more efficient power supplies.

    Researchers who add nodes to the cluster must follow the campus cluster standards. To keep the cluster as homogeneous as possible, however, the InfiniBand HCA is required rather than optional.

    The hardware specification for the collaborative cluster is as follows:

    64 Compute Nodes

    • 1U Rack Mounted Servers
    • Dual AMD Opteron 2214 HE Dual Core Processors
    • 8 GB of RAM per node
    • 1 80GB Hard Drive
    • High quality hardware that is extremely green
    • Dual Gigabit Ethernet ports
    • DDR InfiniBand interconnect HCA

    Master Node

    • 1U Rack Mounted Server
    • Dual AMD Opteron 2214 HE Dual Core Processors
    • 8 GB of RAM
    • Dual Gigabit Ethernet ports
    • 1.5 TB internal storage
    • DVD-RW drive

    Interconnect (Network)

    • Gigabit Ethernet switch
    • QLogic InfiniBand fast interconnect switch

    Storage

    • Sun X4500 Storage Server (48TB raw capacity)
    • Backup X4500 Storage Server (offsite, 48TB raw capacity)

  • Software Installed

    The full software suite will be available to researchers on any of the nodes provided by Computing and Communications. Nodes added by researchers will have a software suite similar to that of the dedicated clusters. Researcher requests for additional software will be evaluated on a case-by-case basis. A brief example of building a program against the installed numerical libraries follows the software lists below.

    The collaborative cluster comes with the following software installed.

    Compilers

    • ifort (Fortran 77/95)
    • icc (C/C++)
    • gcc (C)
    • g++ (C++)
    • g77 (Fortran 77)
    • gfortran (Fortran 95)
    • gdb (Debugger)
    • gnat (Ada 95)

    Other Development Tools/Numerical Libraries

    • Sun JDK
    • MPICH
    • ARPACK
    • ATLAS
    • Basic Linear Algebra Subprograms (BLAS)
    • FFTW
    • GNU Scientific Library (GSL)
    • HDF(4)
    • HDF5
    • Intel Math Kernel Library (MKL)
    • LAPACK
    • netCDF
    • OpenMotif
    • ScaLAPACK

    Databases

    • MySQL
    • PostgreSQL

    Chemistry

    • Gaussian
    • Q-Chem

    Science and Mathematics

    • Mathematica
    • Octave

    Plotting and Graphing

    • gnuplot
    • grace
    • Metis
    • ParMetis

    Statistics

    • R
    • SAS*

    Social Sciences

    • SPSS*

    Visualization and Modeling

    • IDL
    • IDL Analyst
    • OpenDX
    • ParaView
    • POVray
    • RasMol
    • TecPlot
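
    As an illustration of building against the installed numerical libraries, the short C program below performs a double-precision matrix multiply through the CBLAS interface provided by ATLAS (Intel MKL offers the same routine). The header name and link flags in the comment depend on the particular BLAS build on the cluster and are given only as an example.

      /*
       * 2x2 matrix multiply, C = A * B, via the CBLAS dgemm routine.
       *
       * Example compile/link line for an ATLAS-provided CBLAS:
       *   gcc -O2 dgemm_demo.c -o dgemm_demo -lcblas -latlas
       */
      #include <stdio.h>
      #include <cblas.h>

      int main(void)
      {
          /* Row-major 2x2 matrices: C = alpha*A*B + beta*C. */
          double A[4] = { 1.0, 2.0,
                          3.0, 4.0 };
          double B[4] = { 5.0, 6.0,
                          7.0, 8.0 };
          double C[4] = { 0.0, 0.0,
                          0.0, 0.0 };

          cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      2, 2, 2,        /* M, N, K       */
                      1.0, A, 2,      /* alpha, A, lda */
                      B, 2,           /* B, ldb        */
                      0.0, C, 2);     /* beta, C, ldc  */

          printf("C = [ %g %g ; %g %g ]\n", C[0], C[1], C[2], C[3]);
          return 0;
      }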
  • Storage

    All users of the cluster, whether of the general purpose component or the shared component, will have access to the shared Sun X4500 storage server.

    Users of the general purpose component will have a 2 GB home directory quota. This storage space is reserved for data specific to use of the cluster and is not intended for long-term storage.

    Researchers who contribute nodes to the cluster will receive 1 TB of storage at no charge, to distribute among the members of their research group, and may purchase additional storage in 1 TB increments for $2,000 each. This cost covers not only the disk itself but also maintenance and regular backups.

  • Queues

    To ensure maximum use of the available cycles on the cluster, several queues will be established.

    General Purpose Component

    On the general purpose 40-node component, a 24-hour queue will be in effect: any job submitted to this queue is limited to a run time of 24 hours. There is no guarantee of when a job will start; start time depends on the number of jobs waiting in the queue and the number of cores requested, and jobs that request more cores may wait longer.

    Shared Components

    Researchers who have contributed nodes will have access to a research group queue with a time limit of 14 days. For example, a researcher who has contributed 8 cores to the cluster may submit a job that uses up to 8 cores and runs for a maximum of 14 days.

    These researchers also have access to a 24-hour surplus-cycle queue that takes advantage of unused cycles on the base set of 24 nodes provided by Computing and Communications, as well as any unused cores on nodes contributed by other researchers.


More Information


iResearch Information

Research Technology Support
Computing & Communications Bldg.

Tel: (951) 827-3174
Fax: (951) 827-4541
E-mail: researchtechnology@ucr.edu
