
source : www.kaggle.com/learn/python

 


6. Strings and Dictionaries, Exercise #3

 

A researcher has gathered thousands of news articles. But she wants to focus her attention on articles including a specific word. Complete the function below to help her filter her list of articles.

Your function should meet the following criteria:

  • Do not include documents where the keyword string shows up only as a part of a larger word. For example, if she were looking for the keyword “closed”, you would not include the string “enclosed.”
  • She does not want you to distinguish upper case from lower case letters. So the phrase “Closed the case.” would be included when the keyword is “closed”
  • Do not let periods or commas affect what is matched. “It is closed.” would be included when the keyword is “closed”. But you can assume there are no other types of punctuation.

My clumsy first attempt is below. I didn't know the relevant functions, so I brute-forced every case into a chain of if/elif branches.

It is all over the place.

def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    c = list(keyword)
    if c[0].islower():
        c[0] = c[0].upper()
    cKeyword = "".join(c)
    lcKeyword = " "+cKeyword
    rcKeyword = cKeyword+" "
    lKeyword = " "+keyword
    rKeyword = keyword+" "
    res = []
    for i in range(len(doc_list)):
        if lcKeyword in doc_list[i] or rcKeyword in doc_list[i]:
            res.append(i)
        elif lcKeyword+',' in doc_list[i] or rcKeyword+',' in doc_list[i]:
            res.append(i)
        elif lcKeyword+'.' in doc_list[i] or rcKeyword+',' in doc_list[i]:
            res.append(i)
        elif lKeyword+',' in doc_list[i] or rKeyword+',' in doc_list[i]:
            res.append(i)
        elif lKeyword+'.' in doc_list[i] or rKeyword+',' in doc_list[i]:
            res.append(i)
        elif lKeyword in doc_list[i] or rKeyword in doc_list[i]:
            res.append(i)
            
    return res

 

The clean solution

def word_search(documents, keyword):
    # list to hold the indices of matching documents
    indices = [] 
    # Iterate through the indices (i) and elements (doc) of documents
    for i, doc in enumerate(documents):
        # Split the string doc into a list of words (according to whitespace)
        tokens = doc.split()
        # Make a transformed list where we 'normalize' each word to facilitate matching.
        # Periods and commas are removed from the end of each word, and it's set to all lowercase.
        normalized = [token.rstrip('.,').lower() for token in tokens]
        # Is there a match? If so, update the list of matching indices.
        if keyword.lower() in normalized:
            indices.append(i)
    return indices
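
To see what the normalization step in the solution does, here is a small sketch that runs split, rstrip and lower on a single sentence. The sample sentence is my own assumption, not part of the exercise.

```python
# Hypothetical sample sentence, not taken from the exercise
doc = "It is Closed, they said."

# Split on whitespace; punctuation stays attached to the words
tokens = doc.split()
print(tokens)       # ['It', 'is', 'Closed,', 'they', 'said.']

# Strip trailing periods/commas and lowercase each word
normalized = [token.rstrip('.,').lower() for token in tokens]
print(normalized)   # ['it', 'is', 'closed', 'they', 'said']

# Membership in a list compares whole words, not substrings
print('closed' in normalized)   # True
print('close' in normalized)    # False
```

Because `in` on a list checks for an equal element, "close" does not match "closed" here, which is the behavior the first criterion asks for.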

I missed two things here.

1. The enumerate function

    Passing a list to enumerate yields (index, element) pairs:

seasons = ['Spring', 'Summer', 'Fall', 'Winter']
list(enumerate(seasons))
# [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
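
As a rough comparison, the sketch below puts the index-based loop from my first attempt next to the enumerate version. The variable names by_index and by_enumerate are only for illustration.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Index-based loop, the style used in the first attempt
by_index = []
for i in range(len(seasons)):
    by_index.append((i, seasons[i]))

# The same pairs with enumerate: index and element arrive together
by_enumerate = []
for i, season in enumerate(seasons):
    by_enumerate.append((i, season))

print(by_index == by_enumerate)  # True
```

Both loops produce the same pairs; enumerate just removes the repeated seasons[i] lookups.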

    

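2. What is searched, and where

   I should have split each document into words, normalized them, and checked whether the keyword is one of those words.

   Instead, I searched the whole sentence for variants of the keyword, which is why my code is so messy.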
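
The difference can be sketched with two made-up documents. The substring search below is a simplified stand-in for my first approach, not the exact code, and the document strings are assumptions chosen to expose the problem.

```python
# Hypothetical documents chosen to expose the difference
docs = ["We visited Casinoville", "The casino, at last."]
keyword = "casino"

# Substring search on the whole sentence: also matches inside larger words
substring_hits = [i for i, doc in enumerate(docs) if keyword in doc.lower()]

# Token membership: split, normalize each word, then compare whole words
token_hits = [
    i for i, doc in enumerate(docs)
    if keyword in [t.rstrip('.,').lower() for t in doc.split()]
]

print(substring_hits)  # [0, 1]
print(token_hits)      # [1]
```

Searching the sentence wrongly picks up "Casinoville", while checking the list of normalized words only matches the standalone word.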

 
