pandas¶
Pandas is a module for Python to manipulate and analyze tabular data. It allows easy reading and writing to different file formats (including MS Excel).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
working with Excel data¶
myexcel = pd.read_excel("pandas_test.xlsx", sheetname="mydata", skiprows=2)
myexcel
sample | weight | volume | magnetic moment | |
---|---|---|---|---|
0 | A | 20.1 | 10.3 | 2.000000e-08 |
1 | B | 20.4 | 10.2 | 2.000000e-08 |
2 | C | 19.8 | 9.8 | 2.000000e-08 |
3 | D | 17.2 | 11.0 | 2.000000e-08 |
4 | E | 24.3 | 10.3 | 2.000000e-08 |
5 | F | 18.5 | 10.3 | 2.000000e-08 |
myexcel[['sample','weight']]
sample | weight | |
---|---|---|
0 | A | 20.1 |
1 | B | 20.4 |
2 | C | 19.8 |
3 | D | 17.2 |
4 | E | 24.3 |
5 | F | 18.5 |
calculations with rows:
myexcel['density'] = myexcel['weight'] / myexcel['volume']
save pandas dataframe back to an excel file
myexcel.to_excel('new_excel_file.xlsx')
and a text file
myexcel.to_csv('new_excel_file.csv', sep='\t')
intermagnet data¶
Data from magnetic observatories all over the world can be downloaded here:
http://www.intermagnet.org/data-donnee/download-eng.php
we start by reading in the file downloaded from intermagnet in IAGA2002 format
clf = pd.read_csv("clf20160202vsec.sec", skiprows=22, sep='\s+')
clf
DATE | TIME | DOY | CLFX | CLFY | CLFZ | CLFF | | | |
---|---|---|---|---|---|---|---|---|
0 | 2016-02-02 | 00:00:00.000 | 33 | 21201.36 | 120.82 | 42892.47 | 47846.22 | NaN |
1 | 2016-02-02 | 00:00:01.000 | 33 | 21201.35 | 120.80 | 42892.49 | 47846.20 | NaN |
2 | 2016-02-02 | 00:00:02.000 | 33 | 21201.35 | 120.80 | 42892.47 | 47846.20 | NaN |
3 | 2016-02-02 | 00:00:03.000 | 33 | 21201.38 | 120.79 | 42892.49 | 47846.21 | NaN |
4 | 2016-02-02 | 00:00:04.000 | 33 | 21201.37 | 120.80 | 42892.49 | 47846.20 | NaN |
5 | 2016-02-02 | 00:00:05.000 | 33 | 21201.36 | 120.83 | 42892.46 | 47846.18 | NaN |
6 | 2016-02-02 | 00:00:06.000 | 33 | 21201.34 | 120.86 | 42892.45 | 47846.19 | NaN |
7 | 2016-02-02 | 00:00:07.000 | 33 | 21201.36 | 120.83 | 42892.45 | 47846.19 | NaN |
8 | 2016-02-02 | 00:00:08.000 | 33 | 21201.35 | 120.83 | 42892.45 | 47846.17 | NaN |
9 | 2016-02-02 | 00:00:09.000 | 33 | 21201.35 | 120.81 | 42892.47 | 47846.17 | NaN |
10 | 2016-02-02 | 00:00:10.000 | 33 | 21201.33 | 120.80 | 42892.44 | 47846.15 | NaN |
11 | 2016-02-02 | 00:00:11.000 | 33 | 21201.33 | 120.82 | 42892.43 | 47846.17 | NaN |
12 | 2016-02-02 | 00:00:12.000 | 33 | 21201.33 | 120.85 | 42892.41 | 47846.14 | NaN |
13 | 2016-02-02 | 00:00:13.000 | 33 | 21201.35 | 120.84 | 42892.43 | 47846.15 | NaN |
14 | 2016-02-02 | 00:00:14.000 | 33 | 21201.34 | 120.83 | 42892.45 | 47846.14 | NaN |
15 | 2016-02-02 | 00:00:15.000 | 33 | 21201.31 | 120.84 | 42892.43 | 47846.15 | NaN |
16 | 2016-02-02 | 00:00:16.000 | 33 | 21201.34 | 120.84 | 42892.41 | 47846.16 | NaN |
17 | 2016-02-02 | 00:00:17.000 | 33 | 21201.35 | 120.85 | 42892.40 | 47846.13 | NaN |
18 | 2016-02-02 | 00:00:18.000 | 33 | 21201.36 | 120.82 | 42892.44 | 47846.14 | NaN |
19 | 2016-02-02 | 00:00:19.000 | 33 | 21201.36 | 120.81 | 42892.45 | 47846.16 | NaN |
20 | 2016-02-02 | 00:00:20.000 | 33 | 21201.36 | 120.82 | 42892.44 | 47846.15 | NaN |
21 | 2016-02-02 | 00:00:21.000 | 33 | 21201.36 | 120.81 | 42892.44 | 47846.16 | NaN |
22 | 2016-02-02 | 00:00:22.000 | 33 | 21201.35 | 120.82 | 42892.46 | 47846.16 | NaN |
23 | 2016-02-02 | 00:00:23.000 | 33 | 21201.35 | 120.81 | 42892.48 | 47846.15 | NaN |
24 | 2016-02-02 | 00:00:24.000 | 33 | 21201.33 | 120.82 | 42892.47 | 47846.16 | NaN |
25 | 2016-02-02 | 00:00:25.000 | 33 | 21201.37 | 120.77 | 42892.46 | 47846.18 | NaN |
26 | 2016-02-02 | 00:00:26.000 | 33 | 21201.41 | 120.76 | 42892.48 | 47846.18 | NaN |
27 | 2016-02-02 | 00:00:27.000 | 33 | 21201.42 | 120.74 | 42892.47 | 47846.20 | NaN |
28 | 2016-02-02 | 00:00:28.000 | 33 | 21201.41 | 120.75 | 42892.47 | 47846.21 | NaN |
29 | 2016-02-02 | 00:00:29.000 | 33 | 21201.41 | 120.76 | 42892.50 | 47846.20 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
86370 | 2016-02-02 | 23:59:30.000 | 33 | 21191.18 | 158.23 | 42897.90 | 47846.59 | NaN |
86371 | 2016-02-02 | 23:59:31.000 | 33 | 21191.18 | 158.24 | 42897.90 | 47846.60 | NaN |
86372 | 2016-02-02 | 23:59:32.000 | 33 | 21191.18 | 158.23 | 42897.91 | 47846.60 | NaN |
86373 | 2016-02-02 | 23:59:33.000 | 33 | 21191.17 | 158.27 | 42897.91 | 47846.58 | NaN |
86374 | 2016-02-02 | 23:59:34.000 | 33 | 21191.19 | 158.28 | 42897.90 | 47846.59 | NaN |
86375 | 2016-02-02 | 23:59:35.000 | 33 | 21191.21 | 158.31 | 42897.91 | 47846.60 | NaN |
86376 | 2016-02-02 | 23:59:36.000 | 33 | 21191.25 | 158.30 | 42897.90 | 47846.60 | NaN |
86377 | 2016-02-02 | 23:59:37.000 | 33 | 21191.23 | 158.30 | 42897.93 | 47846.61 | NaN |
86378 | 2016-02-02 | 23:59:38.000 | 33 | 21191.21 | 158.34 | 42897.93 | 47846.60 | NaN |
86379 | 2016-02-02 | 23:59:39.000 | 33 | 21191.20 | 158.41 | 42897.91 | 47846.59 | NaN |
86380 | 2016-02-02 | 23:59:40.000 | 33 | 21191.19 | 158.46 | 42897.90 | 47846.56 | NaN |
86381 | 2016-02-02 | 23:59:41.000 | 33 | 21191.18 | 158.48 | 42897.86 | 47846.53 | NaN |
86382 | 2016-02-02 | 23:59:42.000 | 33 | 21191.18 | 158.48 | 42897.86 | 47846.50 | NaN |
86383 | 2016-02-02 | 23:59:43.000 | 33 | 21191.21 | 158.45 | 42897.85 | 47846.51 | NaN |
86384 | 2016-02-02 | 23:59:44.000 | 33 | 21191.21 | 158.45 | 42897.86 | 47846.53 | NaN |
86385 | 2016-02-02 | 23:59:45.000 | 33 | 21191.23 | 158.46 | 42897.84 | 47846.53 | NaN |
86386 | 2016-02-02 | 23:59:46.000 | 33 | 21191.25 | 158.45 | 42897.83 | 47846.55 | NaN |
86387 | 2016-02-02 | 23:59:47.000 | 33 | 21191.26 | 158.44 | 42897.84 | 47846.53 | NaN |
86388 | 2016-02-02 | 23:59:48.000 | 33 | 21191.26 | 158.46 | 42897.82 | 47846.56 | NaN |
86389 | 2016-02-02 | 23:59:49.000 | 33 | 21191.25 | 158.49 | 42897.83 | 47846.54 | NaN |
86390 | 2016-02-02 | 23:59:50.000 | 33 | 21191.21 | 158.49 | 42897.84 | 47846.54 | NaN |
86391 | 2016-02-02 | 23:59:51.000 | 33 | 21191.22 | 158.49 | 42897.82 | 47846.52 | NaN |
86392 | 2016-02-02 | 23:59:52.000 | 33 | 21191.22 | 158.49 | 42897.85 | 47846.53 | NaN |
86393 | 2016-02-02 | 23:59:53.000 | 33 | 21191.21 | 158.51 | 42897.85 | 47846.51 | NaN |
86394 | 2016-02-02 | 23:59:54.000 | 33 | 21191.24 | 158.48 | 42897.84 | 47846.52 | NaN |
86395 | 2016-02-02 | 23:59:55.000 | 33 | 21191.23 | 158.46 | 42897.83 | 47846.54 | NaN |
86396 | 2016-02-02 | 23:59:56.000 | 33 | 21191.23 | 158.47 | 42897.83 | 47846.53 | NaN |
86397 | 2016-02-02 | 23:59:57.000 | 33 | 21191.23 | 158.48 | 42897.86 | 47846.55 | NaN |
86398 | 2016-02-02 | 23:59:58.000 | 33 | 21191.23 | 158.48 | 42897.86 | 47846.52 | NaN |
86399 | 2016-02-02 | 23:59:59.000 | 33 | 21191.24 | 158.50 | 42897.88 | 47846.53 | NaN |
86400 rows × 8 columns
clf.dtypes
DATE object
TIME object
DOY int64
CLFX float64
CLFY float64
CLFZ float64
CLFF float64
| float64
dtype: object
plot single column
clf['CLFF'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x97f59b0>

plot specific columns
clf.plot(x='TIME', y='CLFF', figsize=(12,8))
<matplotlib.axes._subplots.AxesSubplot at 0x93d2710>

add a rolling mean
clf['CLFF_rolling'] = pd.rolling_mean( clf['CLFF'], window=1000, center=True)
clf.loc[490:530]
DATE | TIME | DOY | CLFX | CLFY | CLFZ | CLFF | | | CLFF_rolling | |
---|---|---|---|---|---|---|---|---|---|
490 | 2016-02-02 | 00:08:10.000 | 33 | 21202.41 | 121.06 | 42892.77 | 47846.95 | NaN | NaN |
491 | 2016-02-02 | 00:08:11.000 | 33 | 21202.40 | 121.07 | 42892.73 | 47846.95 | NaN | NaN |
492 | 2016-02-02 | 00:08:12.000 | 33 | 21202.41 | 121.10 | 42892.73 | 47846.93 | NaN | NaN |
493 | 2016-02-02 | 00:08:13.000 | 33 | 21202.39 | 121.11 | 42892.72 | 47846.91 | NaN | NaN |
494 | 2016-02-02 | 00:08:14.000 | 33 | 21202.40 | 121.09 | 42892.70 | 47846.90 | NaN | NaN |
495 | 2016-02-02 | 00:08:15.000 | 33 | 21202.40 | 121.07 | 42892.70 | 47846.91 | NaN | NaN |
496 | 2016-02-02 | 00:08:16.000 | 33 | 21202.38 | 121.08 | 42892.72 | 47846.86 | NaN | NaN |
497 | 2016-02-02 | 00:08:17.000 | 33 | 21202.39 | 121.06 | 42892.70 | 47846.89 | NaN | NaN |
498 | 2016-02-02 | 00:08:18.000 | 33 | 21202.40 | 121.06 | 42892.67 | 47846.88 | NaN | NaN |
499 | 2016-02-02 | 00:08:19.000 | 33 | 21202.37 | 121.06 | 42892.68 | 47846.86 | NaN | NaN |
500 | 2016-02-02 | 00:08:20.000 | 33 | 21202.37 | 121.03 | 42892.70 | 47846.87 | NaN | 47846.59858 |
501 | 2016-02-02 | 00:08:21.000 | 33 | 21202.37 | 121.03 | 42892.71 | 47846.86 | NaN | 47846.59846 |
502 | 2016-02-02 | 00:08:22.000 | 33 | 21202.39 | 121.06 | 42892.68 | 47846.88 | NaN | 47846.59836 |
503 | 2016-02-02 | 00:08:23.000 | 33 | 21202.39 | 121.02 | 42892.67 | 47846.88 | NaN | 47846.59832 |
504 | 2016-02-02 | 00:08:24.000 | 33 | 21202.39 | 121.05 | 42892.70 | 47846.86 | NaN | 47846.59829 |
505 | 2016-02-02 | 00:08:25.000 | 33 | 21202.39 | 121.04 | 42892.73 | 47846.89 | NaN | 47846.59832 |
506 | 2016-02-02 | 00:08:26.000 | 33 | 21202.42 | 121.04 | 42892.72 | 47846.89 | NaN | 47846.59837 |
507 | 2016-02-02 | 00:08:27.000 | 33 | 21202.41 | 121.03 | 42892.73 | 47846.89 | NaN | 47846.59846 |
508 | 2016-02-02 | 00:08:28.000 | 33 | 21202.38 | 121.06 | 42892.75 | 47846.87 | NaN | 47846.59851 |
509 | 2016-02-02 | 00:08:29.000 | 33 | 21202.37 | 121.08 | 42892.70 | 47846.85 | NaN | 47846.59863 |
510 | 2016-02-02 | 00:08:30.000 | 33 | 21202.36 | 121.11 | 42892.69 | 47846.85 | NaN | 47846.59873 |
511 | 2016-02-02 | 00:08:31.000 | 33 | 21202.36 | 121.12 | 42892.65 | 47846.83 | NaN | 47846.59888 |
512 | 2016-02-02 | 00:08:32.000 | 33 | 21202.35 | 121.11 | 42892.65 | 47846.82 | NaN | 47846.59899 |
513 | 2016-02-02 | 00:08:33.000 | 33 | 21202.33 | 121.12 | 42892.65 | 47846.83 | NaN | 47846.59914 |
514 | 2016-02-02 | 00:08:34.000 | 33 | 21202.36 | 121.16 | 42892.62 | 47846.81 | NaN | 47846.59927 |
515 | 2016-02-02 | 00:08:35.000 | 33 | 21202.37 | 121.15 | 42892.59 | 47846.79 | NaN | 47846.59941 |
516 | 2016-02-02 | 00:08:36.000 | 33 | 21202.38 | 121.16 | 42892.61 | 47846.79 | NaN | 47846.59953 |
517 | 2016-02-02 | 00:08:37.000 | 33 | 21202.36 | 121.17 | 42892.61 | 47846.78 | NaN | 47846.59964 |
518 | 2016-02-02 | 00:08:38.000 | 33 | 21202.37 | 121.16 | 42892.59 | 47846.79 | NaN | 47846.59977 |
519 | 2016-02-02 | 00:08:39.000 | 33 | 21202.36 | 121.19 | 42892.57 | 47846.79 | NaN | 47846.59990 |
520 | 2016-02-02 | 00:08:40.000 | 33 | 21202.35 | 121.22 | 42892.55 | 47846.78 | NaN | 47846.59999 |
521 | 2016-02-02 | 00:08:41.000 | 33 | 21202.34 | 121.22 | 42892.57 | 47846.77 | NaN | 47846.60009 |
522 | 2016-02-02 | 00:08:42.000 | 33 | 21202.39 | 121.23 | 42892.59 | 47846.80 | NaN | 47846.60017 |
523 | 2016-02-02 | 00:08:43.000 | 33 | 21202.38 | 121.25 | 42892.55 | 47846.75 | NaN | 47846.60024 |
524 | 2016-02-02 | 00:08:44.000 | 33 | 21202.38 | 121.29 | 42892.54 | 47846.75 | NaN | 47846.60031 |
525 | 2016-02-02 | 00:08:45.000 | 33 | 21202.36 | 121.29 | 42892.56 | 47846.71 | NaN | 47846.60036 |
526 | 2016-02-02 | 00:08:46.000 | 33 | 21202.33 | 121.30 | 42892.58 | 47846.72 | NaN | 47846.60039 |
527 | 2016-02-02 | 00:08:47.000 | 33 | 21202.34 | 121.30 | 42892.57 | 47846.70 | NaN | 47846.60041 |
528 | 2016-02-02 | 00:08:48.000 | 33 | 21202.33 | 121.29 | 42892.57 | 47846.70 | NaN | 47846.60039 |
529 | 2016-02-02 | 00:08:49.000 | 33 | 21202.32 | 121.28 | 42892.53 | 47846.71 | NaN | 47846.60036 |
530 | 2016-02-02 | 00:08:50.000 | 33 | 21202.30 | 121.28 | 42892.51 | 47846.71 | NaN | 47846.60035 |
ax = clf.plot(x='TIME', y='CLFF', figsize=(12,8))
clf.plot(ax=ax, x='TIME', y='CLFF_rolling', color='r', linewidth=3)
plt.show()
