NEON Data API with Python
NEON has developed R and Python APIs for downloading data from its data store.
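At the HTTP level, a Python client only needs a few requests. The sketch below is a minimal illustration, not NEON's own Python package. It assumes NEON's public REST endpoint at https://data.neonscience.org/api/v0 and a /data/{product}/{site}/{year-month} route whose JSON lists files under data -> files. The helper names are hypothetical, so check the route and response layout against NEON's API documentation before relying on them.

```python
import json
import urllib.request

# Assumed base URL of NEON's public REST API (verify against NEON's API docs).
NEON_API = "https://data.neonscience.org/api/v0"


def neon_data_url(product_code: str, site: str, year_month: str) -> str:
    """Build the (assumed) URL that lists files for one product, site, and month."""
    return f"{NEON_API}/data/{product_code}/{site}/{year_month}"


def list_neon_files(product_code: str, site: str, year_month: str) -> list[dict]:
    """Fetch the file listing as JSON.

    Assumes the response nests the file entries under data -> files and that
    each entry carries at least a "name" and a download "url".
    """
    url = neon_data_url(product_code, site, year_month)
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["data"]["files"]
```

For example, list_neon_files("DP1.30003.001", "HARV", "2016-08") would request the listing for the Harvard Forest site in August 2016. The product code DP1.30003.001 is used here as an assumed identifier for the discrete-return lidar point cloud.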
Cloning Jupyter Tutorials from GitHub
We provide example Python 3 notebooks and R Markdown notebooks for downloading lidar and hyperspectral data.
In the terminal:
- Clone the notebooks from NEON Data Science or CyVerse GIS to a location on the VM, for example:
  git clone https://github.com/cyverse-gis/neon_data_science
  cd neon_data_science/lessons
- From Jupyter Notebook or JupyterLab, select a data download notebook.
- Follow the instructions in the notebook.
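To see which notebooks the clone contains before opening Jupyter, a small Python helper can list them. This is a sketch that assumes the notebooks use the .ipynb (Jupyter) and .Rmd (R Markdown) extensions. The function name find_notebooks is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_notebooks(lessons_dir: str) -> list[str]:
    """Return Jupyter (.ipynb) and R Markdown (.Rmd) files under lessons_dir, sorted."""
    root = Path(lessons_dir)
    found = [p for p in root.rglob("*") if p.suffix in {".ipynb", ".Rmd"}]
    # Skip Jupyter's autosave copies stored in .ipynb_checkpoints folders.
    found = [p for p in found if ".ipynb_checkpoints" not in p.parts]
    return sorted(str(p.relative_to(root)) for p in found)
```

Calling find_notebooks("neon_data_science/lessons") from the directory where you ran git clone prints the paths relative to the lessons folder.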
Download data from the CyVerse Data Store in Bash
CyVerse uses a system called iRODS to move files onto and off of its Data Store.
iRODS uses multi-threaded file transfers, which make downloads and uploads faster than traditional single-stream transfers.
Prerequisite: iRODS iCommands installed and a connection initialized (for example with iinit).
Use the ils command to view your files on the Data Store:
ils
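When you want to use the listing inside a Python notebook, you can capture and parse the ils output. This sketch assumes the default ils layout: the first line names the collection and ends with a colon, data objects follow on indented lines, and subcollections are prefixed with "C- ". It also assumes ils is on PATH and iinit has already been run. The helper names are hypothetical.

```python
import subprocess


def parse_ils(output: str) -> tuple[list[str], list[str]]:
    """Split ils output into (data object names, subcollection paths)."""
    files, collections = [], []
    for line in output.splitlines()[1:]:  # first line is the collection path itself
        entry = line.strip()
        if not entry:
            continue
        if entry.startswith("C- "):
            collections.append(entry[3:])  # drop the "C- " subcollection marker
        else:
            files.append(entry)
    return files, collections


def list_data_store(path: str) -> tuple[list[str], list[str]]:
    """Run ils on a Data Store path and parse the result (needs an active iinit session)."""
    result = subprocess.run(["ils", path], capture_output=True, text=True, check=True)
    return parse_ils(result.stdout)
```

Keeping the parsing separate from the subprocess call lets you test parse_ils on saved output without a Data Store connection.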
Change the ownership of the directory where you want to download the data:
sudo chown -R $USER:iplant-everyone /scratch
Create a new directory in /scratch for the data:
mkdir -p /scratch/2016_Campaign/HARV/L1/DiscreteLidar/
Use the iget command to download files from the Data Store
iget -KPQbrvf /iplant/home/shared/NEON_data_institute_2018/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz /scratch/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz
In this example, the flags do the following:
- -K verify the checksum
- -P output the progress of the download
- -Q use the RBUDP (datagram) protocol for the data transfer
- -b bulk file transfer
- -r recursive: retrieve subcollections
- -v verbose output
- -f force: overwrite local files if they already exist
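To run the same download from Python, for example inside one of the notebooks, you can wrap the command with subprocess. This is a sketch, not part of iCommands. It assumes iget is on PATH and iinit has already been run, and it passes the same -KPQbrvf flags described above. The helper names iget_command and download are hypothetical. Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.

```python
import shlex
import subprocess
from pathlib import Path

# Same flags as above: checksum, progress, RBUDP, bulk, recursive, verbose, force.
IGET_FLAGS = "-KPQbrvf"


def iget_command(remote: str, local: str, flags: str = IGET_FLAGS) -> list[str]:
    """Build the iget argument list for copying a Data Store path to a local path."""
    return ["iget", flags, remote, local]


def download(remote: str, local: str) -> None:
    """Create the local parent directory (like mkdir -p), then run iget."""
    Path(local).parent.mkdir(parents=True, exist_ok=True)
    cmd = iget_command(remote, local)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if iget fails
```

For example, download("/iplant/home/shared/NEON_data_institute_2018/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz", "/scratch/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz") runs the same iget command as the Bash example.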
Upload data to the CyVerse Data Store in Bash
- Use the iput command to upload files to the Data Store
iput -KPQbrvf /scratch/2016_Campaign/HARV/L1/DiscreteLidar/some_results /iplant/home/$USER/neon/results
Note that we are using the same flags as in the iget command above.
Download data from the CyVerse Data Store with Cyberduck
After you’ve set up Cyberduck to access your CyVerse Data Store, you can drag and drop files to your local machine, or into a second Cyberduck window that is connected to another data source.
Dragging and dropping data between two Cyberduck windows streams the data down to your local machine and then uploads it to the second remote host, which greatly slows the transfer.
We strongly suggest using the Cyberduck CLI (duck) to move files between two remote data stores.
JupyterLab Google Drive Client
Google Drive asks you to authenticate through your browser with a token. After you authenticate, you can view the files in your Google Drive and move them onto the VM.
If you have data on Google Drive, you can drag and drop it onto your VM.
JupyterLab iRODS Client
After you’ve authenticated to CyVerse, you can view your Data Store files.
The Jupyter iRODS Client is not suitable for downloading hundreds of files, but it is useful for finding files and copying their URLs.