[Week 6 Recap] Checking whether a pd.Series contains a given string / counting occurrences (pd.Series.str.contains() / str.count())

1. pd.Series.str.contains('string')
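As a minimal sketch of what this method returns, the snippet below runs `str.contains` on a small made-up Series (the `texts` data is hypothetical, not the restaurant data used in this post). It yields one boolean per row, which can then be used as a mask to pull out the matching index values. By default the pattern is interpreted as a regular expression and missing values propagate as missing, so `regex=False` and `na=False` are often worth setting when the keyword comes from user input.

```python
import pandas as pd

# Hypothetical review texts, including one missing value
texts = pd.Series(["spicy noodle shop", "quiet cafe (2F)", None, "noodle bar"])

# regex=False treats the keyword literally; na=False maps missing values to False
# so the result is a clean boolean mask.
mask = texts.str.contains("noodle", regex=False, na=False)
print(mask.tolist())                # [True, False, False, True]
print(texts[mask].index.tolist())   # [0, 3]

# Literal matching also keeps characters such as parentheses from being
# parsed as regex syntax.
literal = texts.str.contains("(2F)", regex=False, na=False)
print(literal.tolist())             # [False, True, False, False]
```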

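from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Vectorize the text feature with CountVectorizer.
# df['Text_token'] already holds Okt-tokenized lists, so the tokenizer is an identity function.
count_vec = CountVectorizer(tokenizer=lambda x: x, lowercase=False)
count_X = count_vec.fit_transform(df['Text_token'])

# Pairwise cosine similarity between all texts
count_sim = cosine_similarity(count_X, count_X)
print(count_sim)

# For each row, column indices ordered from most to least similar
count_sim_sorted_ind = count_sim.argsort()[:, ::-1]

def find_place(df, sorted_ind, top_n=10):
    data = input("Enter a keyword for the restaurant you want to visit: ")

    # Rows whose 'Text' column contains the keyword (literal match; NaN counts as no match)
    keywords = df[df['Text'].str.contains(data, regex=False, na=False)]
    keywords_index = keywords.index.values  # assumes df has a default 0..n-1 index
    # For each matching row, take the positions of its top_n most similar rows
    similar_indexes = sorted_ind[keywords_index, :top_n]
    similar_indexes = similar_indexes.reshape(-1)
    result_df = df.iloc[similar_indexes][:top_n]
    return result_df['상호지점명']  # store/branch name column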
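To make the ranking step in `find_place` easier to follow, here is a small sketch on a made-up 3x3 similarity matrix (the values in `sim` and the `matched` rows are hypothetical). It shows how `argsort()[:, ::-1]` orders neighbours from most to least similar, and what the fancy indexing plus `reshape(-1)` produces when more than one row matches the keyword.

```python
import numpy as np

# Hypothetical cosine-similarity matrix for three texts
sim = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.4],
    [0.7, 0.4, 1.0],
])

# argsort sorts ascending, so reversing the columns gives most-similar-first
sorted_ind = sim.argsort()[:, ::-1]
print(sorted_ind.tolist())  # [[0, 2, 1], [1, 2, 0], [2, 0, 1]]

# Suppose rows 0 and 1 matched the keyword and top_n is 2
matched = np.array([0, 1])
top_n = 2
flat = sorted_ind[matched, :top_n].reshape(-1)
print(flat.tolist())          # [0, 2, 1, 2]
print(flat[:top_n].tolist())  # [0, 2]
```

Note that after flattening, the final `[:top_n]` slice keeps only the neighbours of the first matching row, and the same row can appear more than once. If results from every matching row are wanted, some form of de-duplication or score merging would be needed; that is a design choice outside what the code above does.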

2. pd.Series.str.count('string')
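A minimal sketch of the counting step, using a made-up `df_demo` frame (hypothetical data and name). `Series.str.count` works on string columns and counts pattern matches inside each string, while the function below applies Python's `list.count` to each token list, which counts exact-token matches only. The two agree for whole words but differ for partial ones.

```python
import pandas as pd

# Hypothetical data: raw text plus a pre-tokenized version of the same text
df_demo = pd.DataFrame({
    "Text": ["pasta and more pasta", "pizza", "pasta pizza"],
    "Text_token": [["pasta", "and", "more", "pasta"], ["pizza"], ["pasta", "pizza"]],
})

# Series.str.count: number of pattern matches in each string
str_counts = df_demo["Text"].str.count("pasta")
# list.count via apply: number of tokens exactly equal to the keyword
list_counts = df_demo["Text_token"].apply(lambda tokens: tokens.count("pasta"))
print(str_counts.tolist())   # [2, 0, 1]
print(list_counts.tolist())  # [2, 0, 1]

# A partial word matches inside strings but is never equal to a whole token
partial_str = df_demo["Text"].str.count("past")
partial_list = df_demo["Text_token"].apply(lambda tokens: tokens.count("past"))
print(partial_str.tolist())   # [2, 0, 1]
print(partial_list.tolist())  # [0, 0, 0]
```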

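import pandas as pd

def recommend_system():
    # Keep asking until the keyword is found in the Word2Vec vocabulary
    while True:
        data = input("Enter a keyword for the restaurant you want to visit: ")
        try:
            # List of (similar word, similarity) pairs from the trained CBOW model
            keywords = model_cbow.wv.most_similar(data)
            print(keywords)
            break
        except KeyError:  # raised when the keyword is not in the vocabulary
            print("Please enter a different keyword")

    # Start every row at 0 so that rows with no matching token are filtered out below
    # (the original started at 1, which made the > 0 filter always pass)
    weighted_series = pd.Series(0.0, index=df.index)
    for keyword, weight in keywords:
        print(keyword)
        # Each x is a token list, so x.count(keyword) is list.count (exact-token matches)
        count = df['Text_token'].apply(lambda x: x.count(keyword))
        weighted_series += count * weight
    weighted_series = weighted_series.sort_values(ascending=False)

    # Keep at most the five highest-scoring rows with a positive score
    index = weighted_series[weighted_series > 0].index[:5]
    return df.loc[index, '상호지점명']  # store/branch name column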

Olaf's [Data Science] Study Diary
