pandas¶
Pandas is a module for Python to manipulate and analyze tabular data. It allows easy reading and writing to different file formats (including MS Excel).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
working with Excel data¶
myexcel = pd.read_excel("pandas_test.xlsx", sheetname="mydata", skiprows=2)
myexcel
| sample | weight | volume | magnetic moment | |
|---|---|---|---|---|
| 0 | A | 20.1 | 10.3 | 2.000000e-08 |
| 1 | B | 20.4 | 10.2 | 2.000000e-08 |
| 2 | C | 19.8 | 9.8 | 2.000000e-08 |
| 3 | D | 17.2 | 11.0 | 2.000000e-08 |
| 4 | E | 24.3 | 10.3 | 2.000000e-08 |
| 5 | F | 18.5 | 10.3 | 2.000000e-08 |
myexcel[['sample','weight']]
| sample | weight | |
|---|---|---|
| 0 | A | 20.1 |
| 1 | B | 20.4 |
| 2 | C | 19.8 |
| 3 | D | 17.2 |
| 4 | E | 24.3 |
| 5 | F | 18.5 |
calculations with rows:
myexcel['density'] = myexcel['weight'] / myexcel['volume']
save pandas dataframe back to an excel file
myexcel.to_excel('new_excel_file.xlsx')
and a text file
myexcel.to_csv('new_excel_file.csv', sep='\t')
intermagnet data¶
Data from magnetic observatories all over the world can be downloaded here:
http://www.intermagnet.org/data-donnee/download-eng.php
we start by reading in the file downloaded from intermagnet in IAGA2002 format
clf = pd.read_csv("clf20160202vsec.sec", skiprows=22, sep='\s+')
clf
| DATE | TIME | DOY | CLFX | CLFY | CLFZ | CLFF | | | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2016-02-02 | 00:00:00.000 | 33 | 21201.36 | 120.82 | 42892.47 | 47846.22 | NaN |
| 1 | 2016-02-02 | 00:00:01.000 | 33 | 21201.35 | 120.80 | 42892.49 | 47846.20 | NaN |
| 2 | 2016-02-02 | 00:00:02.000 | 33 | 21201.35 | 120.80 | 42892.47 | 47846.20 | NaN |
| 3 | 2016-02-02 | 00:00:03.000 | 33 | 21201.38 | 120.79 | 42892.49 | 47846.21 | NaN |
| 4 | 2016-02-02 | 00:00:04.000 | 33 | 21201.37 | 120.80 | 42892.49 | 47846.20 | NaN |
| 5 | 2016-02-02 | 00:00:05.000 | 33 | 21201.36 | 120.83 | 42892.46 | 47846.18 | NaN |
| 6 | 2016-02-02 | 00:00:06.000 | 33 | 21201.34 | 120.86 | 42892.45 | 47846.19 | NaN |
| 7 | 2016-02-02 | 00:00:07.000 | 33 | 21201.36 | 120.83 | 42892.45 | 47846.19 | NaN |
| 8 | 2016-02-02 | 00:00:08.000 | 33 | 21201.35 | 120.83 | 42892.45 | 47846.17 | NaN |
| 9 | 2016-02-02 | 00:00:09.000 | 33 | 21201.35 | 120.81 | 42892.47 | 47846.17 | NaN |
| 10 | 2016-02-02 | 00:00:10.000 | 33 | 21201.33 | 120.80 | 42892.44 | 47846.15 | NaN |
| 11 | 2016-02-02 | 00:00:11.000 | 33 | 21201.33 | 120.82 | 42892.43 | 47846.17 | NaN |
| 12 | 2016-02-02 | 00:00:12.000 | 33 | 21201.33 | 120.85 | 42892.41 | 47846.14 | NaN |
| 13 | 2016-02-02 | 00:00:13.000 | 33 | 21201.35 | 120.84 | 42892.43 | 47846.15 | NaN |
| 14 | 2016-02-02 | 00:00:14.000 | 33 | 21201.34 | 120.83 | 42892.45 | 47846.14 | NaN |
| 15 | 2016-02-02 | 00:00:15.000 | 33 | 21201.31 | 120.84 | 42892.43 | 47846.15 | NaN |
| 16 | 2016-02-02 | 00:00:16.000 | 33 | 21201.34 | 120.84 | 42892.41 | 47846.16 | NaN |
| 17 | 2016-02-02 | 00:00:17.000 | 33 | 21201.35 | 120.85 | 42892.40 | 47846.13 | NaN |
| 18 | 2016-02-02 | 00:00:18.000 | 33 | 21201.36 | 120.82 | 42892.44 | 47846.14 | NaN |
| 19 | 2016-02-02 | 00:00:19.000 | 33 | 21201.36 | 120.81 | 42892.45 | 47846.16 | NaN |
| 20 | 2016-02-02 | 00:00:20.000 | 33 | 21201.36 | 120.82 | 42892.44 | 47846.15 | NaN |
| 21 | 2016-02-02 | 00:00:21.000 | 33 | 21201.36 | 120.81 | 42892.44 | 47846.16 | NaN |
| 22 | 2016-02-02 | 00:00:22.000 | 33 | 21201.35 | 120.82 | 42892.46 | 47846.16 | NaN |
| 23 | 2016-02-02 | 00:00:23.000 | 33 | 21201.35 | 120.81 | 42892.48 | 47846.15 | NaN |
| 24 | 2016-02-02 | 00:00:24.000 | 33 | 21201.33 | 120.82 | 42892.47 | 47846.16 | NaN |
| 25 | 2016-02-02 | 00:00:25.000 | 33 | 21201.37 | 120.77 | 42892.46 | 47846.18 | NaN |
| 26 | 2016-02-02 | 00:00:26.000 | 33 | 21201.41 | 120.76 | 42892.48 | 47846.18 | NaN |
| 27 | 2016-02-02 | 00:00:27.000 | 33 | 21201.42 | 120.74 | 42892.47 | 47846.20 | NaN |
| 28 | 2016-02-02 | 00:00:28.000 | 33 | 21201.41 | 120.75 | 42892.47 | 47846.21 | NaN |
| 29 | 2016-02-02 | 00:00:29.000 | 33 | 21201.41 | 120.76 | 42892.50 | 47846.20 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 86370 | 2016-02-02 | 23:59:30.000 | 33 | 21191.18 | 158.23 | 42897.90 | 47846.59 | NaN |
| 86371 | 2016-02-02 | 23:59:31.000 | 33 | 21191.18 | 158.24 | 42897.90 | 47846.60 | NaN |
| 86372 | 2016-02-02 | 23:59:32.000 | 33 | 21191.18 | 158.23 | 42897.91 | 47846.60 | NaN |
| 86373 | 2016-02-02 | 23:59:33.000 | 33 | 21191.17 | 158.27 | 42897.91 | 47846.58 | NaN |
| 86374 | 2016-02-02 | 23:59:34.000 | 33 | 21191.19 | 158.28 | 42897.90 | 47846.59 | NaN |
| 86375 | 2016-02-02 | 23:59:35.000 | 33 | 21191.21 | 158.31 | 42897.91 | 47846.60 | NaN |
| 86376 | 2016-02-02 | 23:59:36.000 | 33 | 21191.25 | 158.30 | 42897.90 | 47846.60 | NaN |
| 86377 | 2016-02-02 | 23:59:37.000 | 33 | 21191.23 | 158.30 | 42897.93 | 47846.61 | NaN |
| 86378 | 2016-02-02 | 23:59:38.000 | 33 | 21191.21 | 158.34 | 42897.93 | 47846.60 | NaN |
| 86379 | 2016-02-02 | 23:59:39.000 | 33 | 21191.20 | 158.41 | 42897.91 | 47846.59 | NaN |
| 86380 | 2016-02-02 | 23:59:40.000 | 33 | 21191.19 | 158.46 | 42897.90 | 47846.56 | NaN |
| 86381 | 2016-02-02 | 23:59:41.000 | 33 | 21191.18 | 158.48 | 42897.86 | 47846.53 | NaN |
| 86382 | 2016-02-02 | 23:59:42.000 | 33 | 21191.18 | 158.48 | 42897.86 | 47846.50 | NaN |
| 86383 | 2016-02-02 | 23:59:43.000 | 33 | 21191.21 | 158.45 | 42897.85 | 47846.51 | NaN |
| 86384 | 2016-02-02 | 23:59:44.000 | 33 | 21191.21 | 158.45 | 42897.86 | 47846.53 | NaN |
| 86385 | 2016-02-02 | 23:59:45.000 | 33 | 21191.23 | 158.46 | 42897.84 | 47846.53 | NaN |
| 86386 | 2016-02-02 | 23:59:46.000 | 33 | 21191.25 | 158.45 | 42897.83 | 47846.55 | NaN |
| 86387 | 2016-02-02 | 23:59:47.000 | 33 | 21191.26 | 158.44 | 42897.84 | 47846.53 | NaN |
| 86388 | 2016-02-02 | 23:59:48.000 | 33 | 21191.26 | 158.46 | 42897.82 | 47846.56 | NaN |
| 86389 | 2016-02-02 | 23:59:49.000 | 33 | 21191.25 | 158.49 | 42897.83 | 47846.54 | NaN |
| 86390 | 2016-02-02 | 23:59:50.000 | 33 | 21191.21 | 158.49 | 42897.84 | 47846.54 | NaN |
| 86391 | 2016-02-02 | 23:59:51.000 | 33 | 21191.22 | 158.49 | 42897.82 | 47846.52 | NaN |
| 86392 | 2016-02-02 | 23:59:52.000 | 33 | 21191.22 | 158.49 | 42897.85 | 47846.53 | NaN |
| 86393 | 2016-02-02 | 23:59:53.000 | 33 | 21191.21 | 158.51 | 42897.85 | 47846.51 | NaN |
| 86394 | 2016-02-02 | 23:59:54.000 | 33 | 21191.24 | 158.48 | 42897.84 | 47846.52 | NaN |
| 86395 | 2016-02-02 | 23:59:55.000 | 33 | 21191.23 | 158.46 | 42897.83 | 47846.54 | NaN |
| 86396 | 2016-02-02 | 23:59:56.000 | 33 | 21191.23 | 158.47 | 42897.83 | 47846.53 | NaN |
| 86397 | 2016-02-02 | 23:59:57.000 | 33 | 21191.23 | 158.48 | 42897.86 | 47846.55 | NaN |
| 86398 | 2016-02-02 | 23:59:58.000 | 33 | 21191.23 | 158.48 | 42897.86 | 47846.52 | NaN |
| 86399 | 2016-02-02 | 23:59:59.000 | 33 | 21191.24 | 158.50 | 42897.88 | 47846.53 | NaN |
86400 rows × 8 columns
clf.dtypes
DATE object
TIME object
DOY int64
CLFX float64
CLFY float64
CLFZ float64
CLFF float64
| float64
dtype: object
plot single column
clf['CLFF'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x97f59b0>
plot specific columns
clf.plot(x='TIME', y='CLFF', figsize=(12,8))
<matplotlib.axes._subplots.AxesSubplot at 0x93d2710>
add a rolling mean
clf['CLFF_rolling'] = pd.rolling_mean( clf['CLFF'], window=1000, center=True)
clf.loc[490:530]
| DATE | TIME | DOY | CLFX | CLFY | CLFZ | CLFF | | | CLFF_rolling | |
|---|---|---|---|---|---|---|---|---|---|
| 490 | 2016-02-02 | 00:08:10.000 | 33 | 21202.41 | 121.06 | 42892.77 | 47846.95 | NaN | NaN |
| 491 | 2016-02-02 | 00:08:11.000 | 33 | 21202.40 | 121.07 | 42892.73 | 47846.95 | NaN | NaN |
| 492 | 2016-02-02 | 00:08:12.000 | 33 | 21202.41 | 121.10 | 42892.73 | 47846.93 | NaN | NaN |
| 493 | 2016-02-02 | 00:08:13.000 | 33 | 21202.39 | 121.11 | 42892.72 | 47846.91 | NaN | NaN |
| 494 | 2016-02-02 | 00:08:14.000 | 33 | 21202.40 | 121.09 | 42892.70 | 47846.90 | NaN | NaN |
| 495 | 2016-02-02 | 00:08:15.000 | 33 | 21202.40 | 121.07 | 42892.70 | 47846.91 | NaN | NaN |
| 496 | 2016-02-02 | 00:08:16.000 | 33 | 21202.38 | 121.08 | 42892.72 | 47846.86 | NaN | NaN |
| 497 | 2016-02-02 | 00:08:17.000 | 33 | 21202.39 | 121.06 | 42892.70 | 47846.89 | NaN | NaN |
| 498 | 2016-02-02 | 00:08:18.000 | 33 | 21202.40 | 121.06 | 42892.67 | 47846.88 | NaN | NaN |
| 499 | 2016-02-02 | 00:08:19.000 | 33 | 21202.37 | 121.06 | 42892.68 | 47846.86 | NaN | NaN |
| 500 | 2016-02-02 | 00:08:20.000 | 33 | 21202.37 | 121.03 | 42892.70 | 47846.87 | NaN | 47846.59858 |
| 501 | 2016-02-02 | 00:08:21.000 | 33 | 21202.37 | 121.03 | 42892.71 | 47846.86 | NaN | 47846.59846 |
| 502 | 2016-02-02 | 00:08:22.000 | 33 | 21202.39 | 121.06 | 42892.68 | 47846.88 | NaN | 47846.59836 |
| 503 | 2016-02-02 | 00:08:23.000 | 33 | 21202.39 | 121.02 | 42892.67 | 47846.88 | NaN | 47846.59832 |
| 504 | 2016-02-02 | 00:08:24.000 | 33 | 21202.39 | 121.05 | 42892.70 | 47846.86 | NaN | 47846.59829 |
| 505 | 2016-02-02 | 00:08:25.000 | 33 | 21202.39 | 121.04 | 42892.73 | 47846.89 | NaN | 47846.59832 |
| 506 | 2016-02-02 | 00:08:26.000 | 33 | 21202.42 | 121.04 | 42892.72 | 47846.89 | NaN | 47846.59837 |
| 507 | 2016-02-02 | 00:08:27.000 | 33 | 21202.41 | 121.03 | 42892.73 | 47846.89 | NaN | 47846.59846 |
| 508 | 2016-02-02 | 00:08:28.000 | 33 | 21202.38 | 121.06 | 42892.75 | 47846.87 | NaN | 47846.59851 |
| 509 | 2016-02-02 | 00:08:29.000 | 33 | 21202.37 | 121.08 | 42892.70 | 47846.85 | NaN | 47846.59863 |
| 510 | 2016-02-02 | 00:08:30.000 | 33 | 21202.36 | 121.11 | 42892.69 | 47846.85 | NaN | 47846.59873 |
| 511 | 2016-02-02 | 00:08:31.000 | 33 | 21202.36 | 121.12 | 42892.65 | 47846.83 | NaN | 47846.59888 |
| 512 | 2016-02-02 | 00:08:32.000 | 33 | 21202.35 | 121.11 | 42892.65 | 47846.82 | NaN | 47846.59899 |
| 513 | 2016-02-02 | 00:08:33.000 | 33 | 21202.33 | 121.12 | 42892.65 | 47846.83 | NaN | 47846.59914 |
| 514 | 2016-02-02 | 00:08:34.000 | 33 | 21202.36 | 121.16 | 42892.62 | 47846.81 | NaN | 47846.59927 |
| 515 | 2016-02-02 | 00:08:35.000 | 33 | 21202.37 | 121.15 | 42892.59 | 47846.79 | NaN | 47846.59941 |
| 516 | 2016-02-02 | 00:08:36.000 | 33 | 21202.38 | 121.16 | 42892.61 | 47846.79 | NaN | 47846.59953 |
| 517 | 2016-02-02 | 00:08:37.000 | 33 | 21202.36 | 121.17 | 42892.61 | 47846.78 | NaN | 47846.59964 |
| 518 | 2016-02-02 | 00:08:38.000 | 33 | 21202.37 | 121.16 | 42892.59 | 47846.79 | NaN | 47846.59977 |
| 519 | 2016-02-02 | 00:08:39.000 | 33 | 21202.36 | 121.19 | 42892.57 | 47846.79 | NaN | 47846.59990 |
| 520 | 2016-02-02 | 00:08:40.000 | 33 | 21202.35 | 121.22 | 42892.55 | 47846.78 | NaN | 47846.59999 |
| 521 | 2016-02-02 | 00:08:41.000 | 33 | 21202.34 | 121.22 | 42892.57 | 47846.77 | NaN | 47846.60009 |
| 522 | 2016-02-02 | 00:08:42.000 | 33 | 21202.39 | 121.23 | 42892.59 | 47846.80 | NaN | 47846.60017 |
| 523 | 2016-02-02 | 00:08:43.000 | 33 | 21202.38 | 121.25 | 42892.55 | 47846.75 | NaN | 47846.60024 |
| 524 | 2016-02-02 | 00:08:44.000 | 33 | 21202.38 | 121.29 | 42892.54 | 47846.75 | NaN | 47846.60031 |
| 525 | 2016-02-02 | 00:08:45.000 | 33 | 21202.36 | 121.29 | 42892.56 | 47846.71 | NaN | 47846.60036 |
| 526 | 2016-02-02 | 00:08:46.000 | 33 | 21202.33 | 121.30 | 42892.58 | 47846.72 | NaN | 47846.60039 |
| 527 | 2016-02-02 | 00:08:47.000 | 33 | 21202.34 | 121.30 | 42892.57 | 47846.70 | NaN | 47846.60041 |
| 528 | 2016-02-02 | 00:08:48.000 | 33 | 21202.33 | 121.29 | 42892.57 | 47846.70 | NaN | 47846.60039 |
| 529 | 2016-02-02 | 00:08:49.000 | 33 | 21202.32 | 121.28 | 42892.53 | 47846.71 | NaN | 47846.60036 |
| 530 | 2016-02-02 | 00:08:50.000 | 33 | 21202.30 | 121.28 | 42892.51 | 47846.71 | NaN | 47846.60035 |
ax = clf.plot(x='TIME', y='CLFF', figsize=(12,8))
clf.plot(ax=ax, x='TIME', y='CLFF_rolling', color='r', linewidth=3)
plt.show()