Postgres

From NYU CCPP Wiki

Contents

Using the Postgres Database at NYU

This is a quick guide to using the postgres database at NYU. Note, currently one can only access the db from within our network.

For an SQL tutorial see this page: http://www.w3schools.com/sql/ See also Postgres Tricks for a list of useful commands and Postgres Administration for some administration tips.

Environment

SDSS database and user

The user should set the following environment variables for convenience. Note these are automatically set if you source the setup script described in the IDL interface section below.

PGUSER=sdss
PGDATABASE=sdss

These are the default user and database on the system. No authentication is required for this user. Then when the user runs the command line tool 'psql' or accesses the database through the IDL, C, python, etc APIs no connection information will be needed.

Authentication

For logins that require passwords, a file ~/.pgpass can be created (unreadable to the outside world) with the following form:

*:*:*:username:password

which will allow login with no typing of a password. If you have more than one host and database with different accounts you specify this fully: hostname:port:database:username:password

For example, in order to run the COPY command you have to run as the postgres user. You will need a .pgpass file with the following definition:

*:*:*:postgres:the_correct_pwd

Ask Erin Sheldon for the password. This priveledge is very useful especially from the IDL interface (e.g. struct2table)

Command Line

The command line tool is a simple way to look around at the data.

 [esheldon@early ~] psql
 sdss=>

To see the existing tables

 sdss=> \d
                  List of relations
  Schema |         Name         | Type  |  Owner
 --------+----------------------+-------+----------
  public | datasweep            | table | postgres
  public | datasweep_meta       | table | postgres
  public | field_rotation       | table | sdss
  public | lowz                 | table | postgres
  public | lssall0              | table | postgres
  public | lssfull0             | table | postgres
  public | maxbcg               | table | postgres
  public | maxbcg_input         | table | postgres
  public | maxbcg_input_meta    | table | postgres
  public | maxbcg_neigh         | table | postgres
  public | recipes              | table | sdss
  public | vagc                 | table | postgres
  public | vagc_match           | table | postgres
  public | zphot                | table | postgres
  public | zphot_princeton_meta | table | postgres

To see what is in each table:

 sdss=> \d zphot
             Table "public.zphot"
      Column      |   Type   | Modifiers
 -----------------+----------+-----------
  photoid         | bigint   | not null
  run             | smallint | not null
  rerun           | smallint | not null
  camcol          | smallint | not null
  field           | smallint | not null
  id              | smallint | not null
  photoz_z        | real     | not null
  photoz_zerr1    | real     | not null
  photoz_zerr2    | real     | not null
  photoz_zerr3    | real     | not null
  photoz_zerr4    | real     | not null
  photoz_zwarning | smallint | not null
 Indexes:
     "zphot_pkey" PRIMARY KEY, btree (photoid)
     "zphot_rrcfi_index" btree (run, rerun, camcol, field, id)

Note each column definition and the list of indexes.

Look at the first 10 items in the datasweeps:

 sdss=> select run,camcol,field,modelflux[2] from datasweep limit 10;
  run | camcol | field | modelflux
 -----+--------+-------+-----------
   94 |      1 |    11 |   39.1747
   94 |      1 |    11 |   248.069
   94 |      1 |    11 |   362.275
   94 |      1 |    11 |  0.794669
   94 |      1 |    11 |   2.42472
   94 |      1 |    11 |  0.866375
   94 |      1 |    11 |   6.57751
   94 |      1 |    11 |   4.65699
   94 |      1 |    11 |  0.781404
   94 |      1 |    11 |   13.1877

IDL interface

In general, you will need to get SDSSIDL http://sdssidl.sourceforge.net/ set up. At NYU all you need to do is source this file:

source ~esheldon/idl_libraries/sdssidl_config/sdssidl_setup.sh

or .csh for cshell.

There is an IDL interface to the postgresql database. The basic routine is called pgsql_query() (http://sdssidl.sourceforge.net/src/pgsql_query.c.html)

IDL> st = pgsql_query('select * from datasweep where run=756 and modelflux[2] > 100')

Will return all data for all objects in run 756 with that flux bound. The data is returned in an IDL structure:

 IDL> help, st, /str
 ** Structure <2168818>, 41 tags, length=704, data length=696, refs=1:
  PHOTOID         LONG64        756137100011000140
  RUN             LONG               756
  RERUN           INT            137
  CAMCOL          INT              1
  FIELD           INT             11
  ID              LONG               140
  OBJC_TYPE       LONG                 3
  ... etc ...

There are a whole sweet of tools in the postgres class for manipulating databases http://sdssidl.sourceforge.net/postgres/postgres__define.html For a list of all methods do

IDL> methods,'postgres'

For example, if you have a structure in memory, you can stuff it into a new (or existing) table.

 pg=obj_new('postgres')
 pg->struct2table, struct, tablename

This will by default be in the sdss database owned by the sdss user if that is how your environment is set (see above). The sdss database is the best place for public tables (although one would often create them as the postgres user to keep them from getting modified). Often we won't want to clog the main table space with user tables, so we can set you up with your own database. Just let Erin Sheldon know. To create the table under a different user and database:

pg->struct2table, ...., connect_info='dbname=jsmith user=jsmith'

You can also do generic queries with error checking.

 struct = pg->query('select * from datasweep where run=756 and modelflux[2] > 100', status=status)
 if status ne pg->status_val('success') then message,'Failed to retrieve data'

To see what tables are available:

 pg->describe
 #  SCHEMANAME             TABLENAME  TABLEOWNER  TABLESPACE  HASINDEXES  HASRULES  HASTRIGGERS
 0      public          maxbcg_input    postgres                       1         0            0
 1      public        datasweep_meta    postgres                       0         0            0
 2      public             datasweep    postgres                       1         0            0
 3      public              lssfull0    postgres                       1         0            0
 .... etc.

And to describe a given table

 pg->describe, 'datasweep'
 ** Structure <a4cd08>, 41 tags, length=704, data length=696, refs=1:
  PHOTOID         LONG64         94137100011000004
  RUN             LONG                94
  RERUN           INT            137
  CAMCOL          INT              1
  FIELD           INT             11
  ID              LONG                 4
  OBJC_TYPE       LONG                 3
  OBJC_FLAGS      LONG         302645268
  OBJC_FLAGS2     LONG             24600
  COLC            FLOAT     Array[5]
  PETROR50        FLOAT     Array[5]
  PETROR90        FLOAT     Array[5]
  R_DEV           FLOAT     Array[5]
  R_EXP           FLOAT     Array[5]
  ... snip ....
  Indexes for 'datasweep'
              index                             description
 ------------------------------------------------------------
   datasweep_pkey            PRIMARY KEY, btree (photoid)
    datasweep_mf0                  btree ((modelflux[0]))
    datasweep_mf1                  btree ((modelflux[1]))
    datasweep_mf2                  btree ((modelflux[2]))
    datasweep_mf3                  btree ((modelflux[3]))
    datasweep_mf4                  btree ((modelflux[4]))
  datasweep_rrcfi   btree (run, rerun, camcol, field, id)