NumPy, pandas, scikit
Matplotlib
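- A scatter plot shows the relationship between two variables.
- A line chart shows change over time, a scatter plot shows the relationship between x and y, a bar chart counts discrete data, and a histogram shows the distribution of continuous data.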
```python
import random
```
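The four chart types listed above can be sketched side by side. This is a minimal illustration, not the original example: the data is made up, and the variable names (`hours`, `temps`, `durations`, `categories`, `counts`) are assumptions.

```python
import random

from matplotlib import pyplot as plt

random.seed(0)  # fixed seed so the made-up data is reproducible

# Made-up illustrative data (assumption, not from the original notes)
hours = list(range(24))
temps = [random.randint(15, 30) for _ in hours]
durations = [random.gauss(100, 15) for _ in range(200)]
categories = ["a", "b", "c"]
counts = [5, 9, 3]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(hours, temps)         # line chart: change over time
axes[0, 0].set_title("line")
axes[0, 1].scatter(hours, temps)      # scatter plot: x versus y
axes[0, 1].set_title("scatter")
axes[1, 0].bar(categories, counts)    # bar chart: counts of discrete categories
axes[1, 0].set_title("bar")
axes[1, 1].hist(durations, bins=20)   # histogram: distribution of continuous values
axes[1, 1].set_title("hist")

fig.tight_layout()
# plt.show()  # uncomment to display the figure in an interactive session
```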
x-axis, y-axis, and grid
```python
from matplotlib import pyplot as plt
```
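A possible way to set the axis ticks and turn on the grid is sketched here. The data and the label text are made up for illustration.

```python
from matplotlib import pyplot as plt

# Made-up data (assumption): a reading every 10 minutes
x = list(range(0, 120, 10))
y = [20, 21, 23, 22, 25, 27, 26, 28, 27, 25, 24, 22]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)

# x axis: show every second value as a tick, with rotated text labels
ax.set_xticks(x[::2])
ax.set_xticklabels([f"{v} min" for v in x[::2]], rotation=45)

# y axis: one tick every 2 units between the minimum and maximum
ax.set_yticks(range(min(y), max(y) + 1, 2))

ax.set_xlabel("time")
ax.set_ylabel("temperature")
ax.grid(alpha=0.4)  # alpha controls how transparent the grid lines are
```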
NumPy
```python
import random
```
Two-dimensional and three-dimensional arrays
```python
import numpy as np

t2 = np.array([[1, 2, 3], [4, 5, 6]])
```
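As a small sketch, the dimensions of an array can be checked with `ndim` and `shape`. The three-dimensional array `t3` is an illustrative addition, not from the original notes.

```python
import numpy as np

t2 = np.array([[1, 2, 3], [4, 5, 6]])   # 2 rows, 3 columns
t3 = np.arange(24).reshape(2, 3, 4)     # 2 blocks, each with 3 rows and 4 columns

print(t2.ndim, t2.shape)   # 2 (2, 3)
print(t3.ndim, t3.shape)   # 3 (2, 3, 4)
```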
Reshape
```python
import numpy as np
```
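A minimal sketch of reshaping, assuming a small array built with `np.arange`:

```python
import numpy as np

t = np.arange(12)        # shape (12,)

a = t.reshape(3, 4)      # 3 rows, 4 columns
b = t.reshape(2, -1)     # -1 lets NumPy infer the missing dimension: (2, 6)
flat = a.flatten()       # back to one dimension, as a copy

print(a.shape, b.shape, flat.shape)   # (3, 4) (2, 6) (12,)
```

`reshape` needs the total number of elements to stay the same, so `t.reshape(5, 3)` would raise a `ValueError` here.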
Three ways to transpose
```python
import numpy as np
```
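The notes do not say which three ways are meant. One common set for a two-dimensional array, assumed here, is the `T` attribute, `transpose()`, and `swapaxes()`:

```python
import numpy as np

t = np.arange(6).reshape(2, 3)

a = t.T                # attribute form
b = t.transpose()      # method form
c = t.swapaxes(1, 0)   # swap axis 1 with axis 0

print(a.shape)         # (3, 2)
```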
Slicing, indexing, rows and columns
Slicing: selects a group of values
Indexing: selects a single value
```python
import numpy as np
```
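A short sketch of the difference between slicing and indexing on a two-dimensional array, using a made-up 4 by 6 array:

```python
import numpy as np

t = np.arange(24).reshape(4, 6)

row = t[1]             # one row
rows = t[1:3]          # rows 1 and 2 (the end of a slice is exclusive)
col = t[:, 2]          # one column
cols = t[:, [0, 2]]    # columns 0 and 2
block = t[1:3, 2:5]    # rows 1-2 crossed with columns 2-4
single = t[2, 3]       # a single value: row 2, column 3

print(col)      # [ 2  8 14 20]
print(single)   # 15
```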
Replace numbers with NaN
```python
import numpy as np
```
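A minimal sketch of replacing values with NaN, either through a boolean mask or by position. The threshold of 9 is arbitrary and only for illustration.

```python
import numpy as np

# NaN only exists for floating-point dtypes, so convert first
t = np.arange(12).reshape(3, 4).astype(float)

t[t > 9] = np.nan    # boolean mask: 10 and 11 become NaN
t[0, 1] = np.nan     # by position: row 0, column 1

nan_count = np.count_nonzero(np.isnan(t))
print(nan_count)           # 3
print(np.nan == np.nan)    # False: NaN is not equal to itself
```

Because NaN is not equal to itself, `np.isnan` is the reliable way to find missing values.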
Fill missing values with the mean
```python
import numpy as np
```
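One way to do this is sketched below. It assumes the mean is taken per column over the non-missing values, which the notes do not state explicitly. The helper name `fill_nan_with_column_mean` is hypothetical.

```python
import numpy as np


def fill_nan_with_column_mean(t):
    """Return a copy of t with NaN replaced by the mean of its column."""
    t = t.copy()
    for i in range(t.shape[1]):
        col = t[:, i]              # a view, so assigning into it changes t
        mask = np.isnan(col)
        # Skip columns with nothing to fill, or with no valid values at all
        if mask.any() and not mask.all():
            col[mask] = col[~mask].mean()
    return t


data = np.arange(12).reshape(3, 4).astype(float)
data[1, 2:] = np.nan               # knock out two values in row 1
filled = fill_nan_with_column_mean(data)
print(filled[1])                   # [4. 5. 6. 7.]
```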
Exercise: US YouTube video comments
```python
import numpy as np
```
Exercise: The relationship between UK video likes and comments
```python
import numpy as np
```
Concatenate two datasets
```python
import numpy as np
```
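A small sketch of joining two arrays vertically and horizontally, with made-up arrays standing in for the two datasets:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

v = np.vstack((a, b))   # stack rows: the column counts must match
h = np.hstack((a, b))   # stack columns: the row counts must match

print(v.shape)   # (4, 3)
print(h.shape)   # (2, 6)
```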
Pandas
Basic slicing and key:value pairs
```python
import numpy as np
```
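A minimal sketch of a pandas `Series` treated as key:value pairs and sliced. The contents of the dictionary are made up.

```python
import pandas as pd

# A Series built from a dict: the keys become the index (made-up values)
s = pd.Series({"name": "Ann", "age": 30, "tel": "10086"})

print(s["age"])              # look up one value by key
print(s[["name", "tel"]])    # look up several keys at once
print(s.iloc[0:2])           # slice by position: the first two entries

keys = list(s.index)
values = list(s.values)

# Iterate over the pairs, much like dict.items()
for key, value in s.items():
    print(key, value)
```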
loc indexer (select by label)
```python
import numpy as np
```
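A short sketch of label-based selection with `loc`, on a made-up DataFrame with row labels `a` to `c` and column labels `W` to `Z`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=list("abc"),
    columns=list("WXYZ"),
)

value = df.loc["a", "Z"]                # one cell
row = df.loc["a"]                       # one row, returned as a Series
column = df.loc[:, "Y"]                 # one column
block = df.loc["a":"b", ["W", "Z"]]     # label slices include the end label

print(value)         # 3
print(block.shape)   # (2, 2)
```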
iloc indexer (select by position)
```python
import numpy as np
```
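A short sketch of position-based selection with `iloc`, on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=list("abc"),
    columns=list("WXYZ"),
)

value = df.iloc[1, 2]            # row 1, column 2
row = df.iloc[0]                 # first row
block = df.iloc[0:2, [0, 3]]     # position slices exclude the end, as in Python

print(value)         # 6
print(block.shape)   # (2, 2)
```

Unlike `loc`, the slice `0:2` selects only positions 0 and 1.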
Select Data
```python
import numpy as np
```
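Selecting rows by condition is usually done with boolean indexing. The sketch below uses a made-up table; the column names `name` and `count` are assumptions.

```python
import pandas as pd

# Made-up data (assumption)
df = pd.DataFrame(
    {
        "name": ["Bella", "Max", "Charlie", "Lucy"],
        "count": [1195, 700, 856, 710],
    }
)

popular = df[df["count"] > 800]

# Combine conditions with & or |, and wrap each condition in parentheses
mid_range = df[(df["count"] > 800) & (df["count"] < 1000)]

# String methods are available through the .str accessor
long_names = df[df["name"].str.len() > 4]

print(popular["name"].tolist())     # ['Bella', 'Charlie']
print(mid_range["name"].tolist())   # ['Charlie']
```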
Drop NaN
```python
import numpy as np
```
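A minimal sketch of dropping rows that contain NaN, with filling shown as an alternative. The small DataFrame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

dropped_any = df.dropna(axis=0, how="any")   # drop rows with at least one NaN
dropped_all = df.dropna(axis=0, how="all")   # drop rows where every value is NaN
filled = df.fillna(df.mean())                # or fill with each column's mean

print(len(dropped_any))   # 1
print(len(dropped_all))   # 3
```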
Exercise: IMDb movies, rating and runtime distribution
```python
import numpy as np
```
Important exercise: Given a set of movie data, rank the genres
```python
import numpy as np
```
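One possible approach is sketched here, not necessarily the one used in the original exercise. It assumes each movie stores its genres as a comma-separated string in a column named `Genre`. Both the column name and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical data: one comma-separated genre string per movie
df = pd.DataFrame(
    {
        "Title": ["m1", "m2", "m3", "m4"],
        "Genre": ["Action,Adventure", "Action,Comedy", "Comedy", "Action"],
    }
)

# Split each string into a list, give every genre its own row, then count
genre_counts = df["Genre"].str.split(",").explode().value_counts()

print(genre_counts.index[0])   # Action
```

`value_counts` sorts from most to least frequent, so the result is already ranked.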
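Merge, join, and the default merge method
- The default merge method is inner: the intersection of the keys
- merge outer: the union of the keys, with missing values filled with NaN
- merge left: keeps every key from the left table, with missing values filled with NaN
- merge right: keeps every key from the right table, with missing values filled with NaN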
```python
import numpy as np
```
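The four merge methods can be compared on two small made-up tables that share a `key` column. The table contents and column names are assumptions for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = left.merge(right, on="key")                     # default how="inner"
outer = left.merge(right, on="key", how="outer")        # union of the keys
left_join = left.merge(right, on="key", how="left")     # every key from left
right_join = left.merge(right, on="key", how="right")   # every key from right

print(len(inner), len(outer), len(left_join), len(right_join))   # 2 4 3 3
```

Rows without a match on the other side get NaN in the columns that come from that side.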
Exercise: Starbucks stores in CN and the US
```python
import numpy as np
```
**Count the number of Starbucks stores in each US state**
```python
import numpy as np
```
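A rough sketch of the per-state count using `groupby`. The column names `Country`, `State/Province`, and `Brand` and the sample rows are assumptions; the real dataset may be laid out differently.

```python
import pandas as pd

# Hypothetical rows standing in for the store table
df = pd.DataFrame(
    {
        "Brand": ["Starbucks"] * 5,
        "Country": ["US", "US", "US", "CN", "CN"],
        "State/Province": ["CA", "CA", "NY", "31", "11"],
    }
)

us = df[df["Country"] == "US"]

# Count the rows per state, largest first
per_state = (
    us.groupby("State/Province")["Brand"]
    .count()
    .sort_values(ascending=False)
)

print(per_state.index[0])   # CA
```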
Top ten countries by total number of stores
Rank US cities by number of Starbucks stores
```python
import numpy as np
```
Number of books per year and average book rating per year
```python
import numpy as np
```
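Both statistics can be computed in one `groupby` call, as in this sketch. The column names `year`, `title`, and `rating` and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical book table
df = pd.DataFrame(
    {
        "title": ["b1", "b2", "b3"],
        "year": [2001, 2001, 2003],
        "rating": [4.0, 3.0, 5.0],
    }
)

# Named aggregation: one output column per (input column, function) pair
stats = df.groupby("year").agg(
    book_count=("title", "count"),
    mean_rating=("rating", "mean"),
)

print(stats.loc[2001, "book_count"])    # 2
print(stats.loc[2001, "mean_rating"])   # 3.5
```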
Exercise: Number of emergencies by type
```python
import numpy as np
```
Exercise: Number of emergencies by type, part 2
```python
import numpy as np
```