<?xml version="1.0" encoding="utf-8"?>
<!-- generator="FeedCreator 1.7.2-ppt DokuWiki" -->
<?xml-stylesheet href="https://lear.inrialpes.fr/local/wiki/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="https://lear.inrialpes.fr/local/wiki/feed.php">
        <title>administration</title>
        <description></description>
        <link>https://lear.inrialpes.fr/local/wiki/</link>
        <image rdf:resource="https://lear.inrialpes.fr/local/wiki/lib/tpl/dokuwiki/images/favicon.ico" />
        <dc:date>2023-09-06T10:16:17+02:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:beegfs_client&amp;rev=1611579902&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:emergency_oar_restart&amp;rev=1587120168&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:gpu_cluster_doc&amp;rev=1605283507&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:index&amp;rev=1628522720&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:new_user_setup&amp;rev=1610192744&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:nvidia_driver_reinstall&amp;rev=1605621060&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:raid_setup&amp;rev=1575899005&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:recap&amp;rev=1605608382&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:underclock_nvidia_headless&amp;rev=1559834510&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:using_docker&amp;rev=1532611831&amp;do=diff"/>
                <rdf:li rdf:resource="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:web_servers&amp;rev=1567430342&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="https://lear.inrialpes.fr/local/wiki/lib/tpl/dokuwiki/images/favicon.ico">
        <title></title>
        <link>https://lear.inrialpes.fr/local/wiki/</link>
        <url>https://lear.inrialpes.fr/local/wiki/lib/tpl/dokuwiki/images/favicon.ico</url>
    </image>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:beegfs_client&amp;rev=1611579902&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-01-25T14:05:02+02:00</dc:date>
        <title>administration:beegfs_client</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:beegfs_client&amp;rev=1611579902&amp;do=diff</link>
        <description>====== BeeGFS Client Setup ======

  * ''apt install beegfs-helperd beegfs-client''
  * Change ''/etc/beegfs/beegfs-client.conf'':

sysMgmtdHost                  = bmanage-cp

  * ''mkdir -p /services/scratch''
  * Change ''/etc/beegfs/beegfs-mounts.conf'':

/services/scratch/ /etc/beegfs/beegfs-client.conf

  * ''systemctl restart beegfs-helperd beegfs-client''
  * Finally, you need to make sure that beegfs is pruned in ''/etc/updatedb.conf'':

''sed -i '$ s/\&quot;$/\ beegfs\&quot;/' /etc/updatedb.conf''

…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:emergency_oar_restart&amp;rev=1587120168&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2020-04-17T12:42:48+02:00</dc:date>
        <title>administration:emergency_oar_restart</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:emergency_oar_restart&amp;rev=1587120168&amp;do=diff</link>
        <description>====== Emergency OAR Restart ======

Sometimes OAR locks up completely. No new jobs are scheduled, and existing jobs stay in the &quot;Waiting&quot; state even after being killed.

When this happens, the next thing to try is to run the following on edgar:

  - systemctl stop oar-server
  - systemctl stop postgresql (or systemctl stop pgsql)
  - open htop, F6 Filter &quot;Almighty&quot;, send SIGKILL to running instances
  - systemctl restart postgresql
  - system…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:gpu_cluster_doc&amp;rev=1605283507&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2020-11-13T17:05:07+02:00</dc:date>
        <title>administration:gpu_cluster_doc</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:gpu_cluster_doc&amp;rev=1605283507&amp;do=diff</link>
        <description>====== GPU cluster documentation ======
===== Basics =====


Currently, the GPU cluster is run as a separate set of resources accessible from the **edgar** machine.

**edgar** runs an **OAR scheduler**.

The command **oarnodesetting** is the entry point for most operations, such as:
&lt;code&gt;
oarnodesetting -h gpuhost38 -s Alive
# WARNING: kills existing jobs, does NOT reschedule idempotent jobs
oarnodesetting -h gpuhost38 -s Absent

# no new jobs can be s…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:index&amp;rev=1628522720&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-08-09T17:25:20+02:00</dc:date>
        <title>administration:index</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:index&amp;rev=1628522720&amp;do=diff</link>
        <description>====== Administration ======

This page covers topics related to machine administration.
Some pages are only accessible to [[:system_administrators|System Administrators]].

  * **Introduction**
    * [[administration:recap|Quick recap]]
  * **Setup**
    * [[administration:os_deployment:index|Installing new machines]]
    * [[administration:new_user_setup|New User setup]]
    * [[administration:using_docker|Set up Docker]]
    * [[administration:beegfs_client|BeeGFS Client Setup]]
    * [[admin…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:new_user_setup&amp;rev=1610192744&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-01-09T12:45:44+02:00</dc:date>
        <title>administration:new_user_setup</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:new_user_setup&amp;rev=1610192744&amp;do=diff</link>
        <description>====== New User setup ======

The steps below describe how to set up a new user account.

===== Default folders =====

Use the following commands to create the default folders for a given user when no physical connection to the Thoth subnetwork is possible.

   * ''su &lt;username&gt;''
   * ''cd /home/&lt;username&gt;''
   * ''xdg-user-dirs-update''

===== Remote SSH keys setup =====

When a new member ''&lt;user&gt;'' starts remotely, admins should grant them remote ssh access to ''&lt;user_machine&gt;''. The procedure is d…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:nvidia_driver_reinstall&amp;rev=1605621060&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2020-11-17T14:51:00+02:00</dc:date>
        <title>administration:nvidia_driver_reinstall</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:nvidia_driver_reinstall&amp;rev=1605621060&amp;do=diff</link>
        <description>====== Updating the NVIDIA drivers ======


Before (re)installing a (new) driver, do ''init 3''. Post-installation, do ''init 5''.

You can check the installed packages (''ii'') using ''dpkg --list | grep nvidia''

The [[http://edgar/monika|GPU cluster status page]] shows which nodes may be failing because of an NVIDIA driver.

===== Available options =====

There are 4 options available to fix a failing node. The first option is always the preferred one.

  - NVIDIA driver reinstallation…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:raid_setup&amp;rev=1575899005&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2019-12-09T14:43:25+02:00</dc:date>
        <title>administration:raid_setup</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:raid_setup&amp;rev=1575899005&amp;do=diff</link>
        <description>====== Tutorial on how to set up a RAID array of two disks (scratch) ======

This tutorial is intended as a walk-through for setting up a RAID array of two disks on the team's desktop machines. This RAID array will serve as **scratch** space on your computer.

First, check that the disks (hardware) are correctly connected to the computer. Once they are connected, verify that your machine recognizes them by running ''fdisk -l''. You should be able to locate two add…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:recap&amp;rev=1605608382&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2020-11-17T11:19:42+02:00</dc:date>
        <title>administration:recap</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:recap&amp;rev=1605608382&amp;do=diff</link>
        <description>====== Quick Recap for System Administrators ======

Here is a quick summary of our role as system administrators. We are responsible for the team's fleet of desktop machines, the GPU cluster, and the large amount of data stored in the different scratches.

We have several tasks that can be summarized as follows: setup, monitoring, fixing and ordering.

===== Setup =====

  * Whenever a new member joins the team, we assign them a desktop machine, based on availability (See machines.ods i…</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:underclock_nvidia_headless&amp;rev=1559834510&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2019-06-06T17:21:50+02:00</dc:date>
        <title>administration:underclock_nvidia_headless</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:underclock_nvidia_headless&amp;rev=1559834510&amp;do=diff</link>
        <description>&lt;https://bitcointalk.org/index.php?topic=2432849.0&gt;

Force X server even without monitors:
&lt;pre&gt;
nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=31 --use-display-device=&quot;DFP-0&quot; --connected-monitor=&quot;DFP-0&quot;
&lt;/pre&gt;</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:using_docker&amp;rev=1532611831&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2018-07-26T15:30:31+02:00</dc:date>
        <title>administration:using_docker</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:using_docker&amp;rev=1532611831&amp;do=diff</link>
        <description>DOCKER

Here is some information on how to install and configure Docker so that users can run it on their desktop machines without introducing safety issues.

Installation

You can check this page to see how to install Docker on our Ubuntu machines:</description>
    </item>
    <item rdf:about="https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:web_servers&amp;rev=1567430342&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2019-09-02T15:19:02+02:00</dc:date>
        <title>administration:web_servers</title>
        <link>https://lear.inrialpes.fr/local/wiki/doku.php?id=administration:web_servers&amp;rev=1567430342&amp;do=diff</link>
        <description>Web servers

There are two web-facing servers in Thoth.

The first is the public webpage “&lt;http://thoth.inrialpes.fr&gt;”. It is a virtual machine hosted by the MI. The content is drawn from the following NFS mount:

/home/wwwlear

The second web server is “pascal”. It is physically located in Jakob's office (in the cabinet).</description>
    </item>
</rdf:RDF>
