$Header$ -*-text-*-

netCDF Operators NCO version 5.1.1 lunge toward you

http://nco.sf.net (Homepage, Mailing lists, Help)
http://github.com/nco (Source Code, Issues, Releases, Developers)

What's new?
Version 5.1.1 adds features for NCZarr, regridding, and interpolation.
All operators now support NCZarr I/O and input filenames via stdin.
ncremap supports two new vertical extrapolation methods and 1-D
files, and allows flexible masking based on external fields such as
sub-gridscale extent. ncclimo outputs regional averages.
Numerous minor fixes improve codec support and regridding control.
All users are encouraged to upgrade to this feature-rich release.

Work on NCO 5.1.2 has commenced and aims to add support for Zarr S3
stores and to polish support for new codecs.

Enjoy,
Charlie

NEW FEATURES (full details always in ChangeLog):

A. All operators now support specifying input files via stdin.
This capability was implemented with NCZarr in mind, though it can
also be used with traditional POSIX files. The ncap2, ncks, ncrename,
and ncatted operators accept one or two filenames as positional
arguments. If the input file is provided via stdin, then the output
file, if any, must be specified with -o so the operators know whether
to check stdin. Multi-file operators (ncra, ncrcat, ncecat) will
continue to identify the last positional argument as the output
file unless -o is used. The best practice is to use -o fl_out to
specify output filenames when stdin is used for input filenames:
echo in.nc | ncks              
echo in.nc | ncks -o out.nc
echo "in1.nc in2.nc" | ncbo -o out.nc
echo "in1.nc in2.nc" | ncflint -o out.nc
http://nco.sf.net/nco.html#stdin
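
The stdin/positional rule above can be mimicked in a small shell
function, purely for illustration (this is a sketch of the behavior,
not NCO source code): when positional filenames are present they are
used, otherwise filenames are read from stdin, and -o always names the
output file.

```shell
# Sketch (not NCO source): accept input filenames positionally or via
# stdin, with -o naming the output file explicitly
nco_like() {
  fl_out=""
  OPTIND=1
  while getopts o: opt "$@"; do
    [ "${opt}" = "o" ] && fl_out="${OPTARG}"
  done
  shift $((OPTIND - 1))
  if [ $# -gt 0 ]; then
    fl_in="$*"     # Input filenames given as positional arguments
  else
    read -r fl_in  # None given, so read filenames from stdin
  fi
  echo "input: ${fl_in} output: ${fl_out}"
}

echo in.nc | nco_like -o out.nc   # input: in.nc output: out.nc
nco_like -o out.nc in.nc          # Same result via positional input
```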

B. All NCO operators support NCZarr I/O. This support is currently
limited to the "file://" scheme. Support for the S3 scheme is next.
All NCO commands should work as expected independent of the back-end
storage format of the I/O. Operators can ingest and output POSIX,
Zarr, or a mixture of these two file formats.

in_ncz="file://${HOME}/in_zarr4#mode=nczarr,file"
in_psx="${HOME}/in_zarr4.nc"
out_ncz="file://${HOME}/foo#mode=nczarr,file"
out_psx="${HOME}/foo.nc"

ncks ${in_ncz} # Print contents of Zarr file
ncks -O -v var ${in_psx} ${out_psx} # POSIX input to POSIX output
ncks -O -v var ${in_psx} ${out_ncz} # POSIX input to Zarr output
ncks -O -v var ${in_ncz} ${out_psx} # Zarr input to POSIX output
ncks -O -v var ${in_ncz} ${out_ncz} # Zarr input to Zarr output
ncks -O --cmp='gbr|shf|zst' ${in_psx} ${out_ncz} # Quantize/Compress
ncks -O --cmp='gbr|shf|zst' ${in_ncz} ${out_ncz} # Quantize/Compress
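
The NCZarr paths above combine a scheme prefix, a store path, and a
"#mode=" fragment. Shell parameter expansion can pull such a path
apart, which helps when scripting around this syntax (an illustrative
sketch, not part of NCO):

```shell
# Decompose an NCZarr-style path into scheme, store, and fragment
ncz_path="file://${HOME}/in_zarr4#mode=nczarr,file"
frag="${ncz_path#*#}"    # Text after first '#'  -> mode=nczarr,file
store="${ncz_path%%#*}"  # Text before first '#' -> file://${HOME}/in_zarr4
scheme="${store%%://*}"  # Scheme alone          -> file
echo "scheme=${scheme} fragment=${frag}"
```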

Commands with Zarr I/O behave mostly as expected. NCO treats
Zarr and POSIX files identically once they are "opened" via the
netCDF API. Hence the main difference between Zarr and POSIX,
from the viewpoint of NCO, is in handling the filenames. By default
NCO performs operations on temporary files that it moves to a final
destination once the rest of the command succeeds. Supporting Zarr in
NCO means creating, copying, moving/renaming, and deleting files and
directories with the procedures appropriate to the back-end format.
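
The temporary-file pattern can be mimicked in plain shell; the main
wrinkle is that a filesystem Zarr store is a directory rather than a
regular file, so each step must use directory-safe commands. A minimal
sketch with made-up names:

```shell
# Sketch: write output under a temporary name, then move it into place
# only after everything succeeds; mv(1) renames directories (Zarr
# stores) just as it renames regular files (POSIX/netCDF output)
tmp_out="foo_zarr.tmp.$$"
mkdir "${tmp_out}"                  # Stand-in for creating a Zarr store
echo "payload" > "${tmp_out}/chunk" # Stand-in for writing store contents
mv "${tmp_out}" "foo_zarr"          # Move to the final destination
```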

Many NCO users rely on POSIX filename globbing for multi-file
operations, e.g., 'ncra in*.nc out.nc'. POSIX globbing returns
matches in POSIX format (e.g., 'in1.nc in2.nc in3.nc') which lacks the
"scheme://" indicator and the "#mode=..." fragment that the netCDF API
needs to open a Zarr store. There is no perfect solution to this.

A partial solution is to use NCO's new stdin capabilities
judiciously. The procedure uses the 'ls' command (instead of globbing)
to identify the desired Zarr stores, pipes its (POSIX-style) results
through the newly supplied ncz2psx filter script, which prepends the
desired scheme and appends the desired fragment to each matched Zarr
store, and pipes those results to the NCO operator:

ncra in*.nc out.nc      # POSIX input files via globbing
ls in*.nc | ncra out.nc # POSIX input files via stdin
ls in*.nc | ncz2psx | ncra out.nc # Zarr input via stdin
ls in*.nc | ncz2psx --scheme=file --mode=nczarr,file | ncra out.nc
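
The transformation that the filter performs can be approximated with
sed: prepend the scheme plus an absolute path, and append the mode
fragment. This is only a sketch of the effect, not the ncz2psx script
itself:

```shell
# Turn POSIX-style names into NCZarr-style paths (illustrative only)
scheme="file"
mode="nczarr,file"
printf '%s\n' in1.nc in2.nc \
  | sed -e 's|^|'"${scheme}://${PWD}/"'|' -e 's|$|#mode='"${mode}"'|'
```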

Thanks to Dennis Heimbigner of Unidata for implementing NCZarr.
http://nco.sf.net/nco.html#nczarr

C. Since 2019 the --glb_avg switch has caused the splitter to output
global-mean timeseries files. This switch now causes the splitter to
output three horizontal spatial averages: first the global average (as
before), next the northern hemisphere average, followed by the
southern hemisphere average. The three timeseries are saved in a
two-dimensional (time by region) array with a "region dimension" named
rgn. Region names are stored in the variable region_name:
ncclimo --split --rgn_avg # Produce regional and global averages
ncclimo --split --glb_avg # Same (deprecated switch name)
Thanks to Chris Golaz of LLNL for suggesting this feature.
http://nco.sf.net/nco.html#rgn_avg

D. ncremap has long been able to re-normalize and/or mask-out fields
in partially unmapped destination gridcells. The --rnr_thr option
sets the threshold value for valid cell coverage. However, the
implementation considered only the fraction of each gridcell left
unmapped due to explicit missing values (i.e., _FillValue). Now the
implementation can also mask by the value of a specified sub-gridscale
(SGS) variable, e.g., landfrac. The --add_fll switch now sets to
_FillValue any gridcell whose sgs_frc < rnr_thr. The --add_fll switch
is currently opt-in, except for datasets produced by MPAS and
identified as such by the -P option. The new --no_add_fll switch
overrides and turns off any automatic --add_fll behavior:
ncremap ...           # No renormalization/masking
ncremap --rnr=0.1 ... # Mask cells missing > 10% 
ncremap --rnr=0.1 --sgs_frc=sgs ... # Mask missing > 10%
ncremap --rnr=0.1 --sgs_frc=sgs --add_fll ... # Mask missing > 90% or sgs < 10% 
ncremap -P mpas... # --add_fll implicit, mask where sgs=0.0
ncremap -P mpas... --no_add_fll # --add_fll explicitly turned-off, no masking
ncremap -P mpas... --rnr=0.1 # Mask missing > 90% or sgs < 10% 
ncremap -P elm...  # --add_fll not implicit, no masking
Thanks to Jill Zhang of LLNL for suggesting this capability.
http://nco.sf.net/nco.html#add_fll

E. The map checker diagnoses from the global attributes map_method,
no_conserve, or noconserve (if present) whether the mapping weights
are intended to be conservative (as opposed to, e.g., bilinear).
Weights deemed non-conservative by design are no longer flagged
with dire WARNING messages. Thanks to Mark Taylor of SNL for this
suggestion. 
ncks --chk_map map.nc
http://nco.sf.net/nco.html#chk_map

F. ncremap vertical interpolation supports two new extrapolation
methods: linear and zero. Linear extrapolation does exactly what
you think: values outside the input domain are linearly extrapolated
from the nearest two values inside the input domain. Invoke this with
--vrt_xtr=lnr or --vrt_xtr=linear. Zero extrapolation sets values
outside the input domain to 0.0. Invoke this with
--vrt_xtr=zero.
ncremap --vrt_xtr=zero --vrt=vrt.nc in.nc out.nc
ncremap --vrt_xtr=linear --vrt=vrt.nc in.nc out.nc
ncks --rgr xtr_mth=linear --vrt=vrt.nc in.nc out.nc
ncks --rgr xtr_mth=zero --vrt=vrt.nc in.nc out.nc
http://nco.sf.net/nco.html#vrt_xtr
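
Linear extrapolation follows the standard two-point formula
y = y1 + (x - x1)*(y2 - y1)/(x2 - x1), evaluated with the two interior
points nearest the boundary. A quick awk check of the arithmetic on
made-up points:

```shell
# Extrapolate past the input domain from the two nearest interior
# points (x1,y1)=(10,2) and (x2,y2)=(20,4) to x=25:
# y = 2 + (25-10)*(4-2)/(20-10) = 5
awk 'BEGIN { x1=10; y1=2; x2=20; y2=4; x=25;
             print y1 + (x - x1)*(y2 - y1)/(x2 - x1) }'
```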

G. All numerical operators offer robust support for Blosc codecs
when linked to netCDF 4.9.1+. This includes Blosc Zstandard, LZ, LZ4,  
and Zlib. Thanks to Dennis Heimbigner of Unidata for upstream fixes.

BUG FIXES:
   
A. NCO 5.1.0 could spew numerous and redundant diagnostic messages
about missing codecs. 5.1.1 quiets these messages.

B. If linked to netCDF 4.8.1 or earlier, NCO 5.1.0 could fail to find
standard codecs and therefore die when compression was requested.
The workarounds with 5.1.0 are to downgrade to 5.0.9 or to avoid
compression. The fix is to upgrade.

C. NCO 5.0.5 renamed distance-weighted-extrapolation (DWE) as
inverse-distance-weighting (IDW). Unfortunately the switches to invoke
this in ncremap were not updated. This means that maps intended to be
IDW in 5.0.5-5.1.0 are probably the default monotonic conservative NCO
weights. The workaround with 5.0.5-5.1.0 is to downgrade to 5.0.4.
The fix is to upgrade.

D. Previous versions of ncremap and ncks failed to vertically
interpolate 1-D vertical files (!). Doh! The workaround is to
artificially extend the 1-D file to 2-D. The fix is to upgrade.

Full release statement at http://nco.sf.net/ANNOUNCE
    
KNOWN PROBLEMS DUE TO NCO:

This section of ANNOUNCE reports and reminds users of the
existence and severity of known, not yet fixed, problems. 
These problems occur with NCO 5.1.1 built/tested under
macOS 12.4 with netCDF 4.9.0 on HDF5 1.12.2 and with
Linux with netCDF 4.9.0 on HDF5 1.8.19.

A. NOT YET FIXED (NCO problem)
   Correctly read arrays of NC_STRING with embedded delimiters in ncatted arguments

   Demonstration:
   ncatted -D 5 -O -a new_string_att,att_var,c,sng,"list","of","str,ings" ~/nco/data/in_4.nc ~/foo.nc
   ncks -m -C -v att_var ~/foo.nc

   20130724: Verified problem still exists
   TODO nco1102
   Cause: NCO parsing of ncatted arguments is not sophisticated
   enough to handle arrays of NC_STRINGS with embedded delimiters.

B. NOT YET FIXED (NCO problem?)
   ncra/ncrcat (not ncks) hyperslabbing can fail on variables with multiple record dimensions

   Demonstration:
   ncrcat -O -d time,0 ~/nco/data/mrd.nc ~/foo.nc

   20140826: Verified problem still exists
   20140619: Problem reported by rmla
   Cause: Unsure. Maybe ncra.c loop structure not amenable to MRD?
   Workaround: Convert to fixed dimensions then hyperslab

KNOWN PROBLEMS DUE TO BASE LIBRARIES/PROTOCOLS:

A. NOT YET FIXED (netCDF4 or HDF5 problem?)
   Specifying strided hyperslab on large netCDF4 datasets leads
   to slowdown or failure with recent netCDF versions.

   Demonstration with NCO <= 4.4.5:
   time ncks -O -d time,0,,12 ~/ET_2000-01_2001-12.nc ~/foo.nc
   Demonstration with NCL:
   time ncl < ~/nco/data/ncl.ncl   
   20140718: Problem reported by Parker Norton
   20140826: Verified problem still exists
   20140930: Finish NCO workaround for problem
   20190201: Possibly this problem was fixed in netCDF 4.6.2 by https://github.com/Unidata/netcdf-c/pull/1001
   Cause: Slow algorithm in nc_var_gets()?
   Workaround #1: Use NCO 4.4.6 or later (avoids nc_var_gets())
   Workaround #2: Convert file to netCDF3 first, then use stride
   Workaround #3: Compile NCO with netCDF >= 4.6.2

B. NOT YET FIXED (netCDF4 library bug)
   Simultaneously renaming multiple dimensions in netCDF4 file can corrupt output

   Demonstration:
   ncrename -O -d lev,z -d lat,y -d lon,x ~/nco/data/in_grp.nc ~/foo.nc # Completes but produces unreadable file foo.nc
   ncks -v one ~/foo.nc

   20150922: Confirmed problem reported by Isabelle Dast, reported to Unidata
   20150924: Unidata confirmed problem
   20160212: Verified problem still exists in netCDF library
   20160512: Ditto
   20161028: Verified problem still exists with netCDF 4.4.1
   20170323: Verified problem still exists with netCDF 4.4.2-development
   20170323: https://github.com/Unidata/netcdf-c/issues/381
   20171102: Verified problem still exists with netCDF 4.5.1-development
   20171107: https://github.com/Unidata/netcdf-c/issues/597
   20190202: Progress has recently been made in netCDF 4.6.3-development
   More details: http://nco.sf.net/nco.html#ncrename_crd

C. NOT YET FIXED (would require DAP protocol change?)
   Unable to retrieve contents of variables including period '.' in name
   Periods are legal characters in netCDF variable names.
   Metadata are returned successfully, data are not.
   DAP non-transparency: Works locally, fails through DAP server.

   Demonstration:
   ncks -O -C -D 3 -v var_nm.dot -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc # Fails to find variable

   20130724: Verified problem still exists. 
   Stopped testing because inclusion of var_nm.dot broke all test scripts.
   NB: Hard to fix since DAP interprets '.' as structure delimiter in HTTP query string.

   Bug tracking: https://www.unidata.ucar.edu/jira/browse/NCF-47

D. NOT YET FIXED (would require DAP protocol change)
   Correctly read scalar characters over DAP.
   DAP non-transparency: Works locally, fails through DAP server.
   Problem, IMHO, is with DAP definition/protocol

   Demonstration:
   ncks -O -D 1 -H -C -m --md5_dgs -v md5_a -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc

   20120801: Verified problem still exists
   Bug report not filed
   Cause: DAP translates scalar characters into 64-element (this
   dimension is user-configurable, but still...), NUL-terminated
   strings so MD5 agreement fails 

"Sticky" reminders:

A. Reminder that NCO works on most HDF4 and HDF5 datasets, e.g., 
   HDF4: AMSR MERRA MODIS ...
   HDF5: GLAS ICESat Mabel SBUV ...
   HDF-EOS5: AURA HIRDLS OMI ...

B. Pre-built executables for many OS's at:
   http://nco.sf.net#bnr

