Pandas: Answers to exercises
Exercise 1
The file 2276931.csv contains precipitation data for an NOAA weather station HAVANA 4.2 SW, FL US
for the year 2020 to date.
The dataset URL is:
https://raw.githubusercontent.com/UCL-EO/geog0111/master/notebooks/data/2276931.csv
- Inspect the file to discover any issues you must account for.
- Download the file and read it into pandas
- print the first 5 lines of data
from urlpath import URL
from pathlib import Path
import pandas as pd
# ANSWER
msg = '''
Inspect the file to discover any issues you must account for.
The file is in straightforward CSV format, with the first row
giving the column titles
'''
print(msg)
site = 'https://raw.githubusercontent.com'
site_dir = '/UCL-EO/geog0111/master/notebooks/data'
site_file = '2276931.csv'
# form the URL
url = URL(site,site_dir,site_file)
r = url.get()
if r.status_code == 200:
    # setup Path object for output file
    filename = Path('work',url.name)
    # make sure the output directory exists
    filename.parent.mkdir(parents=True, exist_ok=True)
    # write text data
    filename.write_text(r.text)
    # check size and report
    print(f'file {filename} written: {filename.stat().st_size} bytes')
else:
    print(f'failed to get {url}')
# Read the downloaded file into pandas
df = pd.read_csv(filename)
# print the first 5 lines of data
df.head(5)
Inspect the file to discover any issues you must account for.
The file is in straightforward CSV format, with the first row
giving the column titles
file work/2276931.csv written: 15078 bytes
|   | STATION | NAME | DATE | PRCP | SNOW |
|---|---|---|---|---|---|
| 0 | US1FLGD0002 | HAVANA 4.2 SW, FL US | 2020-01-01 | 0.00 | 0.0 |
| 1 | US1FLGD0002 | HAVANA 4.2 SW, FL US | 2020-01-02 | 0.00 | 0.0 |
| 2 | US1FLGD0002 | HAVANA 4.2 SW, FL US | 2020-01-03 | 0.00 | 0.0 |
| 3 | US1FLGD0002 | HAVANA 4.2 SW, FL US | 2020-01-04 | 0.98 | NaN |
| 4 | US1FLGD0002 | HAVANA 4.2 SW, FL US | 2020-01-05 | 0.00 | 0.0 |
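The table above hints at two things worth checking when inspecting the file: the station name itself contains a comma, and the SNOW column has missing values. The sketch below illustrates both using a small in-memory sample rather than the real file. The sample text is an assumption modelled on the rows shown above (the real file may quote its fields differently), and `df_sample` and `missing` are names made up for this illustration.

```python
import io

import pandas as pd

# Hypothetical sample modelled on the rows shown above; the real file
# may quote or order its fields differently.
sample = '''STATION,NAME,DATE,PRCP,SNOW
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-03,0.00,0.0
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-04,0.98,
US1FLGD0002,"HAVANA 4.2 SW, FL US",2020-01-05,0.00,0.0
'''

# parse_dates converts the DATE column from strings to datetimes
df_sample = pd.read_csv(io.StringIO(sample), parse_dates=['DATE'])

# the comma inside the quoted NAME field does not split the column
print(df_sample['NAME'].iloc[0])

# count missing values per column; the empty SNOW field is read as NaN
missing = df_sample.isna().sum()
print(missing)
```

If missing values matter for a later calculation, `df_sample.dropna()` or `df_sample.fillna(0)` are two common ways to deal with them; which is appropriate depends on what the gaps mean in the data.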
Exercise 2
Read and print the data in the file 'work/dataset.csv'.
# ANSWER
df1=pd.read_csv(Path('work/dataset.csv'))
df1.head()
|   | x data | y data |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 4 |
| 3 | 3 | 9 |
| 4 | 4 | 16 |
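For reference, a file with this layout could be written and read back as in the sketch below. This is only an illustration: it assumes the y column is the square of the x column, as the output above suggests, and it writes to a temporary directory rather than to `work/`. The names `df_out`, `df_back` and `csv_file` are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: y is the square of x, as in the output shown above
x = list(range(5))
df_out = pd.DataFrame({'x data': x, 'y data': [v * v for v in x]})

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical file name, written to a temporary directory
    csv_file = Path(tmp, 'dataset.csv')
    # index=False keeps the row index out of the file
    df_out.to_csv(csv_file, index=False)
    # read the file back into a new DataFrame
    df_back = pd.read_csv(csv_file)

print(df_back)
```

Writing with `index=False` means the file holds only the two data columns, so reading it back gives the same default integer index as shown in the table.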
Last update:
September 29, 2021