
(Source: https://github.com/ResidentMario/missingno)

 

 

0. Installing the libraries

# !pip install missingno==0.5.1
# !pip install quilt==2.9.15
# !quilt install ResidentMario/missingno_data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import quilt
import missingno as msno

 

1. Loading the data

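# Load the NYC collision sample installed via quilt in step 0.
from quilt.data.ResidentMario import missingno_data
collisions = missingno_data.nyc_collision_factors()
# Convert literal "nan" strings to np.nan so they are counted as missing values.
collisions = collisions.replace("nan", np.nan)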

 

2. Visualizing missing values

1) General data

msno.matrix(collisions.sample(250))

# The sparkline at right summarizes the general shape of the data completeness and 
# points out the rows with the maximum and minimum nullity in the dataset.
# This visualization will comfortably accommodate up to 50 labelled variables. 
# Past that range labels begin to overlap or become unreadable, and by default large displays omit them.
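
The sparkline described above is driven by how many values are present in each row. A minimal sketch of that per-row count using plain pandas follows; the toy DataFrame `df` and the name `row_completeness` are hypothetical stand-ins, and this is not missingno's own plotting code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for collisions.sample(250).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, 3.0, np.nan],
})

# Count the non-null values in each row.
row_completeness = df.notnull().sum(axis=1)

# The sparkline annotates the least and most complete rows.
print("per-row counts:", row_completeness.tolist())
print("min:", row_completeness.min(), "max:", row_completeness.max())
```

Running the same two lines against `collisions` should give the numbers that the sparkline labels at its extremes.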

 

msno.bar(collisions.sample(1000))

 

# The missingno correlation heatmap measures nullity correlation: 
# how strongly the presence or absence of one variable affects the presence of another
msno.heatmap(collisions)
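
To make the idea of nullity correlation concrete, here is a small sketch that correlates the missing/present indicators of each column with plain pandas. The toy DataFrame and its column names are hypothetical; the heatmap itself may apply extra filtering and masking that this sketch leaves out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: x and y are missing in the same rows,
# z is missing exactly where x is present.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan],
    "y": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan],
    "z": [np.nan, 2.0, np.nan, 4.0, np.nan, 6.0],
})

# 1 where a value is missing, 0 where it is present.
nullity = df.isnull().astype(int)

# Pearson correlation between the indicator columns.
nullity_corr = nullity.corr()
print(nullity_corr)
```

A value near 1 means two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.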

 

# The dendrogram allows you to more fully correlate variable completion, 
# revealing trends deeper than the pairwise ones visible in the correlation heatmap:
msno.dendrogram(collisions)
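
The dendrogram groups columns whose missing-value patterns resemble each other. As a rough illustration of that notion of similarity, the sketch below computes the Euclidean distance between the nullity indicators of every pair of columns with NumPy. The toy DataFrame and the names `nullity` and `dist_df` are hypothetical, and this only shows the pairwise distances, not the hierarchical clustering that produces the tree.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a and b share one nullity pattern, c has another.
df = pd.DataFrame({
    "a": [np.nan, np.nan, 1.0, 1.0],
    "b": [np.nan, np.nan, 2.0, 2.0],
    "c": [1.0, 1.0, 1.0, np.nan],
})

# One row per column: 1 where the value is missing, 0 where it is present.
nullity = df.isnull().to_numpy().astype(int).T

# Pairwise Euclidean distance between the nullity vectors.
diff = nullity[:, None, :] - nullity[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
dist_df = pd.DataFrame(dist, index=df.columns, columns=df.columns)
print(dist_df)
```

Columns at distance 0 have identical missingness, so a clustering built on such distances would join them first.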

 

 

 

2) Time series data

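# Build a random 50 x 20 boolean mask; roughly half of the cells are True.
null_pattern = (np.random.random(1000).reshape((50, 20)) > 0.5).astype(bool)
# Replace every False with None so those cells count as missing.
null_pattern = pd.DataFrame(null_pattern).replace({False: None})

# Index the 50 rows by month (2011-01 through 2015-02) and pass freq='BQ' for the axis labels.
msno.matrix(null_pattern.set_index(pd.period_range('1/1/2011', '2/1/2015', freq='M')), freq='BQ')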