
Changing values using pandas.map

  •  2
  • CAB  · Tech Community  · 6 years ago

    I am trying to use the map function to change string values in my data to numeric values.

    Here is the data:

        label   sms_message
    0   ham     Go until jurong point, crazy.. Available only ...
    1   ham     Ok lar... Joking wif u oni...
    2   spam    Free entry in 2 a wkly comp to win FA Cup fina...
    3   ham     U dun say so early hor... U c already then say...
    4   ham     Nah I don't think he goes to usf, he lives aro...
    

    I tried to change 'spam' to 1 and 'ham' to 0 like this:

    df['label'] = df.label.map({'ham':0, 'spam':1})
    

    But the result is:

        label   sms_message
    0   NaN     Go until jurong point, crazy.. Available only ...
    1   NaN     Ok lar... Joking wif u oni...
    2   NaN     Free entry in 2 a wkly comp to win FA Cup fina...
    3   NaN     U dun say so early hor... U c already then say...
    4   NaN     Nah I don't think he goes to usf, he lives aro...
    

    Can anyone spot the problem?

    2 answers  |  as of 6 years ago
        1
  •  1
  •   hygull    6 years ago

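    Your code is right; I think you executed the same statement twice (one after the other). The statements below, run in an interactive Python terminal, make this clear.

    Note: if you pass a dictionary, map() replaces every value in the Series that does not match a key of the dictionary with NaN (I think this is what happened to you, i.e. the statement was executed twice). See pandas map(), apply().

    Note from the pandas documentation: when arg is a dictionary, values in the Series that are not in the dictionary (as keys) are converted to NaN.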
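
    As a rough sketch of that behaviour, and of one way to spot the offending values before overwriting the column, something like the following could be used. The messy sample values (a label with a leading space and an already-converted 0) are made up for illustration and do not come from the question's data.

```python
import pandas as pd

# Hypothetical input: two clean labels, one with stray whitespace,
# and one that has already been converted to a number.
labels = pd.Series(["ham", "spam", " ham", 0])
mapping = {"ham": 0, "spam": 1}

# Values that are not keys of the mapping are the ones map() turns into NaN.
unmatched = labels[~labels.isin(list(mapping))]
print(unmatched.tolist())

mapped = labels.map(mapping)
print(mapped.isna().sum())  # number of values that had no matching key
```

    If unmatched is non-empty, the column probably contains something other than the expected strings, for example values that were already mapped by an earlier run.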

    >>> import pandas as pd
    >>>
    >>> d = {
    ...     "label": ["ham", "ham", "spam", "ham", "ham"],
    ...     "sms_message": [
    ...     "Go until jurong point, crazy.. Available only ...",
    ...     "Ok lar... Joking wif u oni...",
    ...     "Free entry in 2 a wkly comp to win FA Cup fina...",
    ...     "U dun say so early hor... U c already then say...",
    ...     "Nah I don't think he goes to usf, he lives aro..."
    ...    ]
    ... }
    >>>
    >>> df = pd.DataFrame(d)
    >>> df
      label                                        sms_message
    0   ham  Go until jurong point, crazy.. Available only ...
    1   ham                      Ok lar... Joking wif u oni...
    2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
    3   ham  U dun say so early hor... U c already then say...
    4   ham  Nah I don't think he goes to usf, he lives aro...
    >>>
    >>> df['label'] = df.label.map({'ham':0, 'spam':1})
    >>> df
       label                                        sms_message
    0      0  Go until jurong point, crazy.. Available only ...
    1      0                      Ok lar... Joking wif u oni...
    2      1  Free entry in 2 a wkly comp to win FA Cup fina...
    3      0  U dun say so early hor... U c already then say...
    4      0  Nah I don't think he goes to usf, he lives aro...
    >>>
    >>> df['label'] = df.label.map({'ham':0, 'spam':1})
    >>> df
       label                                        sms_message
    0    NaN  Go until jurong point, crazy.. Available only ...
    1    NaN                      Ok lar... Joking wif u oni...
    2    NaN  Free entry in 2 a wkly comp to win FA Cup fina...
    3    NaN  U dun say so early hor... U c already then say...
    4    NaN  Nah I don't think he goes to usf, he lives aro...
    >>>
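
    One possible way to make the statement safe to run more than once is to leave the original column alone and write the numbers to a new column. This is only a sketch: the column name label_num is hypothetical, and the short messages M1..M5 stand in for the real text.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["ham", "ham", "spam", "ham", "ham"],
    "sms_message": ["M1", "M2", "M3", "M4", "M5"],
})

mapping = {"ham": 0, "spam": 1}

# Write to a new column (hypothetical name) so the raw labels are never overwritten.
df["label_num"] = df["label"].map(mapping)

# Running the same line again still reads the untouched strings,
# so the result does not turn into NaN.
df["label_num"] = df["label"].map(mapping)

print(df["label_num"].tolist())  # [0, 0, 1, 0, 0]
```

    The trade-off is an extra column; the original label column can be dropped once the result has been checked.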
    

    Other ways to get the same result

    >>> import pandas as pd
    >>>
    >>> d = {
    ...     "label": ['spam', 'ham', 'ham', 'ham', 'spam'],
    ...     "sms_message": ["M1", "M2", "M3", "M4", "M5"]
    ... }
    >>>
    >>> df = pd.DataFrame(d)
    >>> df
      label sms_message
    0  spam          M1
    1   ham          M2
    2   ham          M3
    3   ham          M4
    4  spam          M5
    >>>
    

    1st way: using map() with a dictionary argument

    >>> new_values = {'spam': 1, 'ham': 0}
    >>>
    >>> df
      label sms_message
    0  spam          M1
    1   ham          M2
    2   ham          M3
    3   ham          M4
    4  spam          M5
    >>>
    >>> df.label = df.label.map(new_values)
    >>> df
       label sms_message
    0      1          M1
    1      0          M2
    2      0          M3
    3      0          M4
    4      1          M5
    >>>
    

    2nd way: using map() with a function argument (run on the original df with string labels, not on the already-mapped one)

    >>> df.label = df.label.map(lambda v: 0 if v == 'ham' else 1)
    >>> df
       label sms_message
    0      1          M1
    1      0          M2
    2      0          M3
    3      0          M4
    4      1          M5
    >>>
    

    3rd way: using apply() with a function argument (again starting from the original df with string labels)

    >>> df.label = df.label.apply(lambda v: 0 if v == "ham" else 1)
    >>>
    >>> df
       label sms_message
    0      1          M1
    1      0          M2
    2      0          M3
    3      0          M4
    4      1          M5
    >>>
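
    A caveat about the two function-based ways, shown as a small sketch: because the function sends everything that is not 'ham' to 1, applying it to a column that has already been converted fails silently instead of producing NaN. The helper name to_num is made up for this example.

```python
import pandas as pd

labels = pd.Series(["spam", "ham", "ham", "ham", "spam"])


def to_num(value):
    """Return 0 for 'ham' and 1 for anything else."""
    return 0 if value == "ham" else 1


once = labels.map(to_num)
# Second pass: the values are now 0/1, none equals 'ham', so all become 1.
twice = once.map(to_num)

print(once.tolist())   # [1, 0, 0, 0, 1]
print(twice.tolist())  # [1, 1, 1, 1, 1]
```

    With the dictionary version a repeated run is at least visible as NaN, whereas here the column just ends up all ones.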
    

    Thank you.

        2
  •  0
  •   Lucas André da Silva    6 years ago

    Maybe your problem is in the read_table call.

    Try this:

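    import pandas as pd

    # The file is tab-separated and has no header row, so name the columns explicitly.
    df = pd.read_table('smsspamcollection/SMSSpamCollection',
                       sep='\t',
                       header=None,
                       names=['label', 'sms_message'])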
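
    To illustrate why header=None and names matter, here is a minimal sketch that reads a few tab-separated lines from an in-memory string instead of the real file. The sample text and the variable names raw_text, good and wrong are assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: tab-separated, no header row.
raw_text = (
    "ham\tOk lar... Joking wif u oni...\n"
    "spam\tFree entry in 2 a wkly comp to win FA Cup fina...\n"
    "ham\tU dun say so early hor... U c already then say...\n"
)

# With header=None and explicit names, every line is kept as data.
good = pd.read_table(io.StringIO(raw_text), sep="\t", header=None,
                     names=["label", "sms_message"])

# Without them, the first line is consumed as the column names.
wrong = pd.read_table(io.StringIO(raw_text), sep="\t")

print(good["label"].unique())    # the distinct labels actually loaded
print(len(good), len(wrong))     # 3 rows versus 2 rows
print("label" in wrong.columns)  # False: there is no 'label' column at all
```

    Checking the distinct labels right after loading is a quick way to confirm that the keys in the mapping dictionary match what is really in the column.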