--databrowser

All files available on the MiKlip server are scanned and indexed in a dedicated search server (SOLR). This lets us query that server, which responds almost immediately. Because of the MiKlip configuration, the first call to the tool might take a couple of seconds to start. After that you should normally see results within a second.

help

 freva --databrowser --help

The query is of the form key=value. <value> might use *, ? as wildcards or any regular expression enclosed in forward slashes. Depending on your shell and the symbols used, remember to escape the sequences properly.
The safest would be to enclose those in single quotes.

For Example:
    %s project=baseline1 model=MPI-ESM-LR experiment=/decadal200[0-3]/ time_frequency=*hr variable='/ta|tas|vu/'

Usage: freva --databrowser [options]

Options:
  -d, --debug           turn on debugging info and show stack trace on
                        exceptions.
  -h, --help            show this help message and exit
  --multiversion        select not only the latest version but all of them
  --relevant-only       show only facets that filter results (i.e. >1 possible
                        values)
  --batch-size=N        Number of files to retrieve
  --count-facet-values  Show the number of files for each values in each facet
  --attributes          retrieve all possible attributes for the current
                        search instead of the files
  --all-facets          retrieve all facets (attributes & values) instead of
                        the files
  --facet=FACET         retrieve these facets (attributes & values) instead of
                        the files

Usage

The databrowser expects a list of attribute=value (or key=value) pairs. There are a few differences and many more options (explained next).
Most importantly, you don't need to split the search according to the type of data you are looking for. You can search for files from observations, reanalysis and model data all at the same time.

Also important: all searches are case insensitive (so don't worry about upper or lower case).

You can also search for attributes themselves instead of file paths. For example, you can search for the list of available variables that satisfy a certain constraint (e.g. sampled 6hr, from a certain model, etc.).

Defining the search

freva --databrowser project=baseline1 variable=tas time_frequency=mon

Defining the possible values

There are many more options for defining a value for a given attribute:

Attribute syntax                             Meaning
attribute=value                              Search for files whose attribute is exactly value
attribute=val*                               Search for files whose value for attribute starts with the prefix val
attribute=*lue                               Search for files whose value for attribute ends with the suffix lue
attribute=*alu*                              Search for files whose value for attribute has alu somewhere
attribute=/.*alu.*/                          Search for files whose value for attribute matches the given regular expression (yes! you can use any regular expression to find what you want; check the table after this one)
attribute=value1 attribute=value2            Search for files containing either value1 OR value2 for the given attribute (note that's the same attribute twice!)
attribute1=value1 attribute2=value2          Search for files containing value1 for attribute1 AND value2 for attribute2
attribute_not_=value                         Search for files whose attribute is NOT value
attribute_not_=value1 attribute_not_=value2  Search for files whose attribute is NEITHER value1 NOR value2
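
The AND/OR/NOT rules above can be sketched as a dry run that only assembles and prints a command line without calling freva. The variable list (tas, pr, psl) and the excluded frequency are picked purely for illustration; they are assumptions, not a recommended query.

```shell
# Dry run: collect the query in the positional parameters and print it.
set -- project=baseline1                 # different attributes are combined with AND

for var in tas pr psl; do                # repeating the same attribute means OR
    set -- "$@" "variable=$var"
done

set -- "$@" 'time_frequency_not_=6hr'    # the _not_ suffix excludes a value

query="freva --databrowser $*"
echo "$query"
```

Printing the command first makes it easy to check the quoting before anything is sent to the server.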

NOTE: When using * remember that your shell might give it a different meaning (normally it will try to match files with that name). To turn that off you can escape it with a backslash (\) in most shells, or enclose the value in single quotes.
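
The effect described in the note can be reproduced without freva. The following sketch creates a scratch directory with a file that happens to be named variable=tas (a made-up name, chosen only to trigger the problem) and compares what the shell hands over in each case.

```shell
# Scratch directory containing a file that the unquoted pattern will match.
tmpdir=$(mktemp -d)
touch "$tmpdir/variable=tas"

# Unquoted: the shell replaces the pattern with the matching file name.
unquoted=$(cd "$tmpdir" && echo variable=ta*)
# Single quotes: the pattern reaches the program unchanged.
quoted=$(cd "$tmpdir" && echo 'variable=ta*')
# Backslash: escaping the asterisk has the same effect.
escaped=$(cd "$tmpdir" && echo variable=ta\*)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "escaped:  $escaped"

rm -rf "$tmpdir"
```

In the unquoted case the tool would silently receive variable=tas instead of the wildcard, which is why quoting is the safer habit.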

Regular expressions must be given within forward slashes (/) and are matched against the whole value, not just some part of it. Here's a summary (there might be more... check it!)

Syntax                                    Meaning
==Characters==
<any_non_special_character>               that character
.                                         any one character
[<any_character>]                         any one of the characters between the brackets
[<any_character>-<any_other_character>]   any one character between those characters (e.g. [a-e] is like [abcde])
==Repetitions==
*                                         0 or more times
+                                         1 or more times
{n}                                       exactly n times
{n,}                                      at least n times
{n,m}                                     from n to m times
RegExA|RegExB                             either RegExA or RegExB
==Some examples==
abc                                       exactly "abc"
[abc]                                     either "a", "b" or "c"
[abc]{3}                                  three characters from those given, e.g. "aaa", "bab" or "cab"
[abc]{2,4}                                two to four characters from those given, e.g. "aaa", "ab" or "cccc"
[a-z]+[0-9]*                              one or more letters followed by zero or more digits, e.g. "a", "tas", "cfaddbze94"
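
To get a feeling for the "whole value" rule you can experiment locally with grep. This is only an approximation: grep -E uses POSIX extended regular expressions, which are assumed here to behave like the server's syntax for the simple patterns in the table, and the -x option imitates the whole-value matching. The helper name whole_match is made up for this sketch.

```shell
# whole_match REGEX VALUE: succeeds only if REGEX matches the entire VALUE.
whole_match() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# 'ta|tas' accepts exactly "ta" or "tas", nothing longer.
for value in ta tas tasmax cfaddbze94; do
    if whole_match 'ta|tas' "$value"; then
        echo "$value: match"
    else
        echo "$value: no match"
    fi
done

# A bare 'alu' does not match "value"; it has to be wrapped in .* on both sides.
whole_match 'alu' value     || echo "alu: no match for value"
whole_match '.*alu.*' value && echo ".*alu.*: match for value"
```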

Searching for metadata

You might also want to know which values an attribute can take after a certain search is done. For this you use the --facet flag (facets are the attributes that partition the result set).
For example, to see the time frequencies (time resolutions) in which reanalysis data are available, you might issue the following query:

$ freva --databrowser --facet time_frequency project=reanalysis
time_frequency: 6hr,day,mon

You can also ask for more than one facet by giving the --facet flag multiple times. For example, let's also see a list of variables:

$ freva --databrowser --facet time_frequency --facet variable project=reanalysis
variable: cl,clt,evspsbl,hfls,hfss,hur,hus,pr,prc,prsn,prw,ps,psl,rlds,rldscs,rlus,rlut,rlutcs,rsds,rsdscs,rsdt,rsut,rsutcs,sfcwind,ta,tas,tauu,tauv,tro3,ts,ua,uas,va,vas,wap,zg
time_frequency: 6hr,day,mon

Please note that those are not related, i.e. the values of the time_frequency facet do not correspond to any particular variable. It is like issuing two different queries.

Also note that you can narrow this down as usual with a query. For example, check which variables are available at 6hr frequency:

$ freva --databrowser --facet variable project=reanalysis time_frequency=6hr
variable: psl,sfcwind,tas,zg

If you want to see how many files would be returned if you further selected one of those values (a drill-down query), add the --count-facet-values flag (simply --count will also do):

$ freva --databrowser --count-facet-values --facet variable project=reanalysis time_frequency=6hr
variable: psl (7991),sfcwind (33),tas (33),zg (131)

This means that there are 7991 files containing the variable psl, 33 for sfcwind, and so on.
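
If you need the total rather than the individual counts, the output line can be post-processed with standard tools. The sketch below works on a copy of the line shown above instead of calling freva, and it assumes the "name (count)" layout stays exactly as printed there.

```shell
# Sample line copied from the example output above.
line='variable: psl (7991),sfcwind (33),tas (33),zg (131)'

# Drop the facet name, put one value per line, keep only the number in
# parentheses and add everything up.
total=$(printf '%s\n' "$line" \
    | sed 's/^[^:]*: //' \
    | tr ',' '\n' \
    | sed 's/.*(\([0-9]*\))$/\1/' \
    | awk '{ sum += $1 } END { print sum }')

echo "total files: $total"
```

For this line the counts add up to 8188 files.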

If you want to check all facets at once you may use the --all-facets flag (don't worry, this is still very fast):

$ freva --databrowser --all-facets project=reanalysis time_frequency=6hr
cmor_table: 6hrplev
realm: atmos
data_type: reanalysis
institute: ecmwf,jma-criepi,nasa-gmao,ncep-ncar,noaa-cires
project: 
time_frequency: 6hr
experiment: 20cr,cfsr,eraint,jra-25,merra,merra_testarea,ncep1,ncep2
variable: psl,sfcwind,tas,zg
model: cdas,cfs,geos-5,ifs,jcdas,nomads
data_structure: 
ensemble: r10i1p1,r11i1p1,r12i1p1,r13i1p1,r14i1p1,r15i1p1,r16i1p1,r17i1p1,r18i1p1,r19i1p1,r1i1p1,r20i1p1,r21i1p1,r22i1p1,r23i1p1,r24i1p1,r25i1p1,r26i1p1,r27i1p1,r28i1p1,r29i1p1,r2i1p1,r30i1p1,r31i1p1,r32i1p1,r33i1p1,r34i1p1,r35i1p1,r36i1p1,r37i1p1,r38i1p1,r39i1p1,r3i1p1,r40i1p1,r41i1p1,r42i1p1,r43i1p1,r44i1p1,r45i1p1,r46i1p1,r47i1p1,r48i1p1,r49i1p1,r4i1p1,r50i1p1,r51i1p1,r52i1p1,r53i1p1,r54i1p1,r55i1p1,r56i1p1,r5i1p1,r6i1p1,r7i1p1,r8i1p1,r9i1p1

And again you can also have the --count flag:

$ freva --databrowser --all-facets --count project=reanalysis time_frequency=6hr
cmor_table: 6hrplev (8188)
realm: atmos (8188)
data_type: reanalysis (8188)
institute: ecmwf (132),jma-criepi (66),nasa-gmao (99),ncep-ncar (163),noaa-cires (7728)
project: 
time_frequency: 6hr (8188)
experiment: 20cr (7728),cfsr (64),eraint (132),jra-25 (66),merra (66),merra_testarea (33),ncep1 (65),ncep2 (34)
variable: psl (7991),sfcwind (33),tas (33),zg (131)
model: cdas (99),cfs (64),geos-5 (99),ifs (132),jcdas (66),nomads (7728)
data_structure: 
ensemble: r10i1p1 (138),r11i1p1 (138),r12i1p1 (138),r13i1p1 (138),r14i1p1 (138),r15i1p1 (138),r16i1p1 (138),r17i1p1 (138),r18i1p1 (138),r19i1p1 (138),r1i1p1 (598),r20i1p1 (138),r21i1p1 (138),r22i1p1 (138),r23i1p1 (138),r24i1p1 (138),r25i1p1 (138),r26i1p1 (138),r27i1p1 (138),r28i1p1 (138),r29i1p1 (138),r2i1p1 (138),r30i1p1 (138),r31i1p1 (138),r32i1p1 (138),r33i1p1 (138),r34i1p1 (138),r35i1p1 (138),r36i1p1 (138),r37i1p1 (138),r38i1p1 (138),r39i1p1 (138),r3i1p1 (138),r40i1p1 (138),r41i1p1 (138),r42i1p1 (138),r43i1p1 (138),r44i1p1 (138),r45i1p1 (138),r46i1p1 (138),r47i1p1 (138),r48i1p1 (138),r49i1p1 (138),r4i1p1 (138),r50i1p1 (138),r51i1p1 (138),r52i1p1 (138),r53i1p1 (138),r54i1p1 (138),r55i1p1 (138),r56i1p1 (138),r5i1p1 (138),r6i1p1 (138),r7i1p1 (138),r8i1p1 (138),r9i1p1 (138)

You might have also noticed that some facets are not relevant at all, as they do not partition the resulting data (e.g. see cmor_table or data_type). You can leave them out by adding the --relevant-only flag:

$ freva --databrowser --all-facets --count --relevant-only project=reanalysis time_frequency=6hr
institute: ecmwf (132),jma-criepi (66),nasa-gmao (99),ncep-ncar (163),noaa-cires (7728)
experiment: 20cr (7728),cfsr (64),eraint (132),jra-25 (66),merra (66),merra_testarea (33),ncep1 (65),ncep2 (34)
variable: psl (7991),sfcwind (33),tas (33),zg (131)
model: cdas (99),cfs (64),geos-5 (99),ifs (132),jcdas (66),nomads (7728)
ensemble: r10i1p1 (138),r11i1p1 (138),r12i1p1 (138),r13i1p1 (138),r14i1p1 (138),r15i1p1 (138),r16i1p1 (138),r17i1p1 (138),r18i1p1 (138),r19i1p1 (138),r1i1p1 (598),r20i1p1 (138),r21i1p1 (138),r22i1p1 (138),r23i1p1 (138),r24i1p1 (138),r25i1p1 (138),r26i1p1 (138),r27i1p1 (138),r28i1p1 (138),r29i1p1 (138),r2i1p1 (138),r30i1p1 (138),r31i1p1 (138),r32i1p1 (138),r33i1p1 (138),r34i1p1 (138),r35i1p1 (138),r36i1p1 (138),r37i1p1 (138),r38i1p1 (138),r39i1p1 (138),r3i1p1 (138),r40i1p1 (138),r41i1p1 (138),r42i1p1 (138),r43i1p1 (138),r44i1p1 (138),r45i1p1 (138),r46i1p1 (138),r47i1p1 (138),r48i1p1 (138),r49i1p1 (138),r4i1p1 (138),r50i1p1 (138),r51i1p1 (138),r52i1p1 (138),r53i1p1 (138),r54i1p1 (138),r55i1p1 (138),r56i1p1 (138),r5i1p1 (138),r6i1p1 (138),r7i1p1 (138),r8i1p1 (138),r9i1p1 (138)

If you try to retrieve all variables stored (remember there are over 2,100,000 files!) you'll notice an ellipsis (...) at the end of the list:

$ freva --databrowser --facet variable
variable: abs550aer,ageice,agessc,albisccp,arag,areacella,areacello,bacc,baresoilfrac,basin,bddtalk,bddtdic,bddtdife,bddtdin,bddtdip,bddtdisi,bfe,bmelt,bsi,burntarea,c3pftfrac,c4pftfrac,calc,ccb,cct,ccwd,cdnc,cfad2lidarsr532,cfaddbze94,cfadlidarsr532,cfc11,cfc113global,cfc11global,cfc12global,ch4,ch4global,chl,chlcalc,chldiat,chldiaz,chlmisc,chlpico,ci,cl,clc,clcalipso,clcalipso2,clccalipso,cldnci,cldncl,cldnvi,cleaf,clhcalipso,cli,clic,clis,clisccp,clitter,clitterabove,clitterbelow,clivi,cllcalipso,clmcalipso,clrcalipso,cls,clt,cltc,cltcalipso,cltisccp,cltnobs,cltstddev,clw,clwc,clws,clwvi,cmisc,co2,co2mass,co3,co3satarag,co3satcalc,concaerh2o,concbb,concbc,conccn,concdms,concdust,concnh4,concno3,concoa,concpoa,concso2,concso4,concsoa,concss,cproduct,croot,cropfrac,csoil,csoilfast...

This means there are more results than those shown here. We limit the results to 100 for usability's sake. If you still think this is a bug instead of a terrific feature, you can change the limit with a special search word: facet.limit. That's the number of results that will be retrieved. Setting it to -1 retrieves just everything... be aware that this may cause some problems if you don't know what you are doing (well, sometimes it might also cause problems if you do... so use with discretion).

$ freva --databrowser --facet variable facet.limit=-1
variable: abs550aer,ageice,agessc,albisccp,arag,areacella,areacello,bacc,baresoilfrac,basin,bddtalk,bddtdic,bddtdife,bddtdin,bddtdip,bddtdisi,bfe,bmelt,bsi,burntarea,c3pftfrac,c4pftfrac,calc,ccb,cct,ccwd,cdnc,cfad2lidarsr532,cfaddbze94,cfadlidarsr532,cfc11,cfc113global,cfc11global,cfc12global,ch4,ch4global,chl,chlcalc,chldiat,chldiaz,chlmisc,chlpico,ci,cl,clc,clcalipso,clcalipso2,clccalipso,cldnci,cldncl,cldnvi,cleaf,clhcalipso,cli,clic,clis,clisccp,clitter,clitterabove,clitterbelow,clivi,cllcalipso,clmcalipso,clrcalipso,cls,clt,cltc,cltcalipso,cltisccp,cltnobs,cltstddev,clw,clwc,clws,clwvi,cmisc,co2,co2mass,co3,co3satarag,co3satcalc,concaerh2o,concbb,concbc,conccn,concdms,concdust,concnh4,concno3,concoa,concpoa,concso2,concso4,concsoa,concss,cproduct,croot,cropfrac,csoil,csoilfast,csoilmedium,csoilslow,cveg,cwood,darag,dcalc,demc,dems,deptho,detoc,dfe,difmxybo,difmxylo,diftrblo,diftrelo,difvho,difvmbo,difvmo,difvmto,difvso,difvtrbo,difvtrto,dispkevfo,dispkexyfo,dissic,dissoc,divice,dmc,dms,dpco2,dpo2,dpocdtdiaz,dpocdtpico,drybc,drydms,drydust,drynh3,drynh4,dryoa,drypoa,dryso2,dryso4,drysoa,dryss,dtauc,dtaus,ec550aer,edt,emibb,emibc,emidms,emidust,eminh3,emioa,emipoa,emiso2,emiso4,emiss,eparag100,epc100,epcalc100,epfe100,epsi100,evap,evisct,eviscu,evs,evspsbl,evspsblsoi,evspsblveg,evu,exparag,expc,expcalc,expcfe,expn,expp,expsi,fbddtalk,fbddtdic,fbddtdife,fbddtdin,fbddtdip,fbddtdisi,fco2antt,fco2fos,fco2nat,fddtalk,fddtdic,fddtdife,fddtdin,fddtdip,fddtdisi,fediss,fescav,ffire,fgco2,fgdms,fgo2,fgrazing,fharvest,ficeberg,flittersoil,fluc,frc,frfe,friver,frn,fsc,fsfe,fsitherm,fsn,fveglitter,fvegsoil,gpp,grassfrac,graz,grcongel,grfrazil,gridspec,grlateral,h2o,hcfc22global,hcice,hfbasin,hfcorr,hfds,hfdsn,hfevapds,hfgeou,hfibthermds,hfls,hflssi,hfrainds,hfrunoffds,hfsifrazil,hfsithermds,hfsnthermds,hfss,hfssi,hfx,hfxba,hfxdiff,hfy,hfyba,hfydiff,htovovrt,hur,hurs,hus,husnobs,huss,husstderr,ialb,inc,intdic,intparag,intpbfe,intpbsi,intpcalc,intpcalcite,intpdiat,intpdiaz,intpmisc,intpn2,intpnitrate,intpp,intppico,lai,landcoverfrac,loadbc,loaddust,loadnh4,loadoa,loadpoa,loadso4,loadsoa,loadss,lwsnl,masscello,masso,mc,mcd,mcu,mfo,mlotst,mlotstsq,mrfso,mrlsl,mrro,mrros,mrso,mrsofc,mrsos,msftbarot,msftmrhoz,msftmrhozba,msftmyz,msftmyzba,msftyrhoz,msftyrhozba,msftyyz,msftyyzba,n2o,n2oglobal,nbp,nep,nh4,no3,npp,nppleaf,npproot,nppwood,o2,o2min,od550aer,od550lt1aer,od870aer,omldamax,omlmax,orog,parag,parasolrefl,pasturefrac,pbfe,pbo,pbsi,pcalc,pctisccp,pdi,pfull,ph,phalf,phyc,phycalc,phydiat,phydiaz,phyfe,phymisc,phyn,phyp,phypico,phypmisc,physi,pnitrate,po4,pon,pop,pp,pr,prc,prcprof,prlsns,prlsprof,prsn,prsnc,prstderr,prveg,prw,ps,psl,pso,ra,reffclic,reffclis,reffclwc,reffclws,reffclwtop,reffrainc,reffrains,reffsnowc,reffsnows,residualfrac,rgrowth,rh,rhopoto,rhs,rhsmax,rhsmin,ridgice,rld,rld4co2,rldcs,rldcs4co2,rlds,rldscs,rldssi,rlu,rlu4co2,rlucs,rlucs4co2,rlus,rlussi,rlut,rlut4co2,rlutcs,rlutcs4co2,rmaint,rootd,rsd,rsd4co2,rsdcs,rsdcs4co2,rsds,rsdscs,rsdscsdiff,rsdsdiff,rsdssi,rsdt,rsntds,rsu,rsu4co2,rsucs,rsucs4co2,rsus,rsuscs,rsussi,rsut,rsut4co2,rsutcs,rsutcs4co2,rtmt,sbl,sblsi,sci,sconcbc,sconcdust,sconcnh4,sconcno3,sconcoa,sconcpoa,sconcso4,sconcsoa,sconcss,sfcwind,sfcwindmax,sfcwindnobs,sfcwindstderr,sfdsi,sfriver,sftgif,sftlf,sftof,shrubfrac,si,sic,sim,sit,smc,snc,snd,snm,snomelt,snotoice,snw,so,soga,sootsn,sos,spco2,ssi,strairx,strairy,streng,strocnx,strocny,sza,ta,ta700,talk,tanobs,tas,tasmax,tasmin,tastderr,tauu,tauucorr,tauuo,tauv,tauvcorr,tauvo,thetao,thetaoga,thkcello,tmelt,tnhus,tnhusa,tnhusc,tnhusd,tnhusmp,tnhusscpbl,tnkebto,tnpeo,tnpeot,tnpeotb,tnsccw,tnsccwa,tnsccwacr,tnsccwacs,tnsccwbl,tnsccwce,tnsccwcm,tnsccwif,tnscli,tnsclia,tnscliag,tnsclias,tnsclibfpcl,tnsclibl,tnsclicd,tnsclicm,tnsclids,tnscliemi,tnsclihencl,tnsclihenv,tnsclihon,tnscliif,tnsclimcl,tnsclimr,tnscliricl,tnsclirir,tnsclw,tnsclwa,tnsclwac,tnsclwar,tnsclwas,tnsclwbfpcli,tnsclwcd,tnsclwce,tnsclwcm,tnsclwhen,tnsclwhon,tnsclwmi,tnsclwri,tnt,tnta,tntc,tntmp,tntr,tntscpbl,toffset,tos,tosnobs,tossq,tosstderr,tran,transifs,transix,transiy,treefrac,treefracprimdec,treefracprimever,tro3,tro3nobs,tro3stderr,ts,tsice,tsl,tslsi,tsn,tsnint,tso,ua,uas,uasnobs,uasstderr,umo,uncalipso,uo,usi,va,vas,vasnobs,vasstderr,vmo,vo,volcello,volo,vsf,vsfcorr,vsfevap,vsfpr,vsfriver,vsfsit,vsi,wap,wap500,wetbc,wetdms,wetdust,wetnh4,wetoa,wetpoa,wetso2,wetso4,wetsoa,wetss,wfcorr,wfo,wfonocorr,wmo,wmosq,zfull,zg,zhalf,zmeso,zmicro,zo2min,zooc,zoocmisc,zos,zosga,zosnobs,zossga,zossq,zosstderr,zostoga,zsatarag,zsatcalc

By the way, do you want to count them? Those are 619 variables!

$ freva --databrowser --facet variable facet.limit=-1 | tr ',' '\n' | wc -l
619
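
When you feed facet output into a script it helps to strip the "variable: " prefix and to notice whether the list was cut off. The two helpers below (facet_values and is_truncated are names invented for this sketch) operate on sample lines instead of calling freva, and they assume the "name: a,b,c" layout and the trailing "..." marker shown above.

```shell
# facet_values LINE: print the values of a facet line, one per line,
# without the "name: " prefix and without a trailing "..." marker.
facet_values() {
    printf '%s\n' "$1" | sed -e 's/^[^:]*: //' -e 's/\.\.\.$//' | tr ',' '\n'
}

# is_truncated LINE: succeeds if the line ends with the "..." marker.
is_truncated() {
    case "$1" in
        *...) return 0 ;;
        *)    return 1 ;;
    esac
}

full='variable: psl,sfcwind,tas,zg'
cut='variable: abs550aer,ageice,agessc...'

for line in "$full" "$cut"; do
    count=$(facet_values "$line" | wc -l | tr -d ' ')
    if is_truncated "$line"; then
        echo "$count values shown (list truncated, consider facet.limit)"
    else
        echo "$count values (complete list)"
    fi
done
```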

Bash auto completion

And if that's not awesome enough (I know it never is), then try the bash auto-completion. If you are using bash, everything was already set up when you issued the module load evaluation_system command.
Whenever you hit tab, the word will be completed to the longest unique string that matches your previous input. A second tab will bring up a list of all possible completions after that.

For example (<TAB> denotes pressing the tab key):

freva --databrowser project=base<TAB>

results in
freva --databrowser project=baseline

Now pressing <TAB> again will show all other possibilities:
$ freva --databrowser project=baseline<TAB>
baseline0  baseline1

But flags are not the only thing being completed; it also works on attributes:

$ freva --databrowser <TAB><TAB>
cmor_table=      ensemble=        institute=       project=         time_frequency=
data_type=       experiment=      model=           realm=           variable=

... and of course values:

$ freva --databrowser institute=m<TAB><TAB>
miroc  mohc   mpi-m  mri

And (yes! That wasn't all) the completion is also query-aware, i.e. it only offers values that match the rest of the query:

$ freva --databrowser institute=<TAB><TAB>
bcc           csiro-bom     inpe          miroc         nasa-gsfc     nimr-kma
bnu           csiro-qccce   ipsl          mohc          nasa-jpl      noaa-cires
cccma         ecmwf         jma-criepi    mpi-m         nasa-larc     noaa-gfdl
cmcc          fio           lasg-cess     mri           ncar          nsf-doe-ncar
cnes          ichec         lasg-iap      nasa-giss     ncc           remss
cnrm-cerfacs  inm           loa_ipsl      nasa-gmao     ncep-ncar     
$ freva --databrowser project=reanalysis institute=<TAB><TAB>
ecmwf       jma-criepi  nasa-gmao   ncep-ncar   noaa-cires 

Note that if you mix flags this might not work as intended (or not at all).

Examples

Find any ta* variable in any baseline

freva --databrowser data_type=baseline* variable=ta*

Find out if a file was republished under different versions

freva --databrowser --multiversion --facet version project=baseline0 variable=tas time_frequency=mon ensemble=r1i1p1

Find what a given variable name stands for (e.g. wetso4)

$ ncdump -h $(freva --databrowser variable=wetso4 | head -n 1) | grep wetso4:standard_name
        wetso4:standard_name = "tendency_of_atmosphere_mass_content_of_sulfate_expressed_as_sulfur_dry_aerosol_due_to_wet_deposition" ;