Pandas : Answers to exercises

Exercise 1

The file 2276931.csv contains precipitation data for the NOAA weather station HAVANA 4.2 SW, FL US, covering the year 2020 to date.

The dataset URL is:

https://raw.githubusercontent.com/UCL-EO/geog0111/master/notebooks/data/2276931.csv

  • Inspect the file to discover any issues you must account for.
  • Download the file and read it into pandas.
  • Print the first 5 lines of data.
from urlpath import URL
from pathlib import Path
import pandas as pd

# ANSWER
msg = '''
Inspect the file to discover any issues you must account for.

The file is in straightforward CSV format, with the first row
giving the column titles
'''
print(msg)

site = 'https://raw.githubusercontent.com'
site_dir = '/UCL-EO/geog0111/master/notebooks/data'
site_file = '2276931.csv'

# form the URL
url = URL(site,site_dir,site_file)

r = url.get()
if r.status_code == 200:
    # set up Path object for output file, creating the directory if needed
    filename = Path('work',url.name)
    filename.parent.mkdir(parents=True, exist_ok=True)
    # write text data
    filename.write_text(r.text)
    # check size and report
    print(f'file {filename} written: {filename.stat().st_size} bytes')
else:
    print(f'failed to get {url}')

# Read the downloaded file into pandas
df = pd.read_csv(filename)

# print the first 5 lines of data
df.head(5)
Inspect the file to discover any issues you must account for.

The file is in straightforward CSV format, with the first row
giving the column titles

file work/2276931.csv written: 15078 bytes
       STATION                  NAME        DATE  PRCP  SNOW
0  US1FLGD0002  HAVANA 4.2 SW, FL US  2020-01-01  0.00   0.0
1  US1FLGD0002  HAVANA 4.2 SW, FL US  2020-01-02  0.00   0.0
2  US1FLGD0002  HAVANA 4.2 SW, FL US  2020-01-03  0.00   0.0
3  US1FLGD0002  HAVANA 4.2 SW, FL US  2020-01-04  0.98   NaN
4  US1FLGD0002  HAVANA 4.2 SW, FL US  2020-01-05  0.00   0.0
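
The output above already hints at one issue to account for: the SNOW value for 2020-01-04 is missing (NaN). The following is a minimal sketch of how such gaps might be counted and how the DATE column could be parsed as dates. It uses only the five rows shown above, typed in as an inline string rather than read from the real file; the exact quoting of the NAME field and the empty SNOW field are assumptions about the raw file layout, and the names `sample`, `df_sample`, `missing` and `total_prcp` are illustrative.

```python
from io import StringIO
import pandas as pd

# The five rows shown above, re-typed as CSV text.
# Assumption: NAME is quoted in the raw file because it contains a comma,
# and the missing SNOW value appears as an empty field.
sample = '''STATION,NAME,DATE,PRCP,SNOW
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-01,0.00,0.0
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-02,0.00,0.0
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-03,0.00,0.0
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-04,0.98,
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-05,0.00,0.0
'''

# parse_dates converts the DATE column from text to datetime values
df_sample = pd.read_csv(StringIO(sample), parse_dates=['DATE'])

# count the missing values in each column
missing = df_sample.isna().sum()
print(missing)

# sum() skips NaN by default, so totals can still be computed
total_prcp = df_sample['PRCP'].sum()
print(f'total precipitation in sample: {total_prcp}')
```

The same two steps (`parse_dates` and `isna().sum()`) should carry over to the full file, though the counts will of course differ from this five-row sample.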

Exercise 2

Read and print the data in the file 'work/dataset.csv'.

# ANSWER
# read the CSV file into a DataFrame
df1 = pd.read_csv(Path('work/dataset.csv'))
# show the first rows of data
df1.head()
   x data  y data
0       0       0
1       1       1
2       2       4
3       3       9
4       4      16
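
How 'work/dataset.csv' was produced is not shown here. As a rough sketch only, a file with the same two columns could be written and read back as follows. The values are an assumption taken from the output above (y is the square of x), a temporary directory stands in for 'work', and the names `df_out`, `csv_file` and `df_back` are illustrative.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
import pandas as pd

# hypothetical data matching the output above: y is x squared
x = list(range(5))
df_out = pd.DataFrame({'x data': x, 'y data': [v * v for v in x]})

with TemporaryDirectory() as tmp:
    # a temporary directory stands in for 'work' in this sketch
    csv_file = Path(tmp, 'dataset.csv')
    # index=False stops pandas writing the row index as an extra column
    df_out.to_csv(csv_file, index=False)
    # read the file back, as in the answer above
    df_back = pd.read_csv(csv_file)

print(df_back)
```

Writing with `index=False` matters here: without it, the file gains an unnamed index column that shows up as `Unnamed: 0` when the file is read back.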

Last update: September 29, 2021