Source: www.kaggle.com/learn/python
6. Strings and Dictionaries, Exercise #3
A researcher has gathered thousands of news articles. But she wants to focus her attention on articles including a specific word. Complete the function below to help her filter her list of articles.
Your function should meet the following criteria:
- Do not include documents where the keyword string shows up only as a part of a larger word. For example, if she were looking for the keyword “closed”, you would not include the string “enclosed.”
- She does not want you to distinguish upper case from lower case letters. So the phrase “Closed the case.” would be included when the keyword is “closed”
- Do not let periods or commas affect what is matched. “It is closed.” would be included when the keyword is “closed”. But you can assume there are no other types of punctuation.
My clumsy first attempt is below. I did not know any helpful functions, so I forced every case into a chain of elif branches.
It is all over the place.
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword.
    Returns list of the index values into the original list for all documents
    containing the keyword.
    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    [0]
    """
    # Build a capitalized variant of the keyword.
    c = list(keyword)
    if c[0].islower():
        c[0] = c[0].upper()
    cKeyword = "".join(c)
    # Variants padded with a space on the left or right.
    lcKeyword = " "+cKeyword
    rcKeyword = cKeyword+" "
    lKeyword = " "+keyword
    rKeyword = keyword+" "
    res = []
    for i in range(len(doc_list)):
        if lcKeyword in doc_list[i] or rcKeyword in doc_list[i]:
            res.append(i)
        elif lcKeyword+',' in doc_list[i] or rcKeyword+',' in doc_list[i]:
            res.append(i)
        elif lcKeyword+'.' in doc_list[i] or rcKeyword+',' in doc_list[i]:
            res.append(i)
        elif lKeyword+',' in doc_list[i] or rKeyword+',' in doc_list[i]:
            res.append(i)
        elif lKeyword+'.' in doc_list[i] or rKeyword+',' in doc_list[i]:
            res.append(i)
        elif lKeyword in doc_list[i] or rKeyword in doc_list[i]:
            res.append(i)
    return res
The clean solution
def word_search(documents, keyword):
    # list to hold the indices of matching documents
    indices = []
    # Iterate through the indices (i) and elements (doc) of documents
    for i, doc in enumerate(documents):
        # Split the string doc into a list of words (according to whitespace)
        tokens = doc.split()
        # Make a transformed list where we 'normalize' each word to facilitate matching.
        # Periods and commas are removed from the end of each word, and it's set to all lowercase.
        normalized = [token.rstrip('.,').lower() for token in tokens]
        # Is there a match? If so, update the list of matching indices.
        if keyword.lower() in normalized:
            indices.append(i)
    return indices
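To see what the normalization step produces, here is a small sketch that applies the same `split`, `rstrip('.,')`, and `lower` calls to the docstring example. The names `normalized_docs` and `matches` are my own, introduced only for this illustration.

```python
# Illustration only: inspect the intermediate word lists for the docstring example.
doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]

# One list of normalized words per document.
normalized_docs = [
    [token.rstrip('.,').lower() for token in doc.split()]
    for doc in doc_list
]
print(normalized_docs)
# [['the', 'learn', 'python', 'challenge', 'casino'], ['they', 'bought', 'a', 'car'], ['casinoville']]

# A document matches only if the keyword equals one of its whole words.
matches = [i for i, words in enumerate(normalized_docs) if 'casino' in words]
print(matches)  # [0]
```

"Casinoville" stays a single word after splitting, so it never equals "casino".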
There are two things I missed here.
1. The enumerate function
Passing a list to enumerate gives the following.
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
list(enumerate(seasons))
# [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
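In a loop, this means the index and the element arrive together, so there is no need for `range(len(...))` plus indexing. A minimal sketch comparing the two styles (the names `by_index` and `by_enumerate` are just for this example):

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Index-based loop, the style used in my first attempt.
by_index = []
for i in range(len(seasons)):
    by_index.append((i, seasons[i]))

# enumerate yields (index, element) pairs directly.
by_enumerate = []
for i, season in enumerate(seasons):
    by_enumerate.append((i, season))

print(by_index == by_enumerate)  # True
```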
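2. What searches and what gets searched
I should have split each sentence into words, normalized them, and checked whether the normalized keyword is one of those words.
Instead I built variants of the keyword and searched for them inside the whole sentence, which is why my code is so messy.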
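The difference shows up on the "enclosed" case from the exercise. Below is a rough sketch with made-up sample sentences (`docs`, `substring_hits`, and `word_hits` are names I chose for illustration): searching the sentence as a string also matches inside larger words, while comparing against whole normalized words does not.

```python
docs = ["It is closed.", "The letter was enclosed", "Closed the case."]
keyword = "closed"

# Substring search over the whole sentence: also hits "enclosed".
substring_hits = [i for i, doc in enumerate(docs) if keyword in doc.lower()]
print(substring_hits)  # [0, 1, 2]

# Whole-word comparison after splitting and normalizing each word.
word_hits = [
    i for i, doc in enumerate(docs)
    if keyword in [w.rstrip('.,').lower() for w in doc.split()]
]
print(word_hits)  # [0, 2]
```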