
Pandas double-bracket indexing [[]] [duplicate]

  •  1
  • Enryu  · Tech community  · 6 years ago

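    I have been working through Google's Machine Learning Crash Course, which has an exercise section that teaches you how to use pandas and TensorFlow. At the start, they load the DataFrame and then immediately pull out the "total_rooms" and "median_house_value" columns. They select "total_rooms" with double brackets and "median_house_value" with a single pair of brackets. I read the pandas documentation, and it seems the only reason to index with double brackets is to select two columns at once, i.e. california_housing_dataframe[["median_house_value", "total_rooms"]]. Is there a reason to index a single column of a DataFrame with double brackets when another column is later selected with single brackets for what looks like the same operation?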

    Here is the code I am talking about.

    california_housing_dataframe = pd.read_csv("https://dl.google.com/mlcc/mledu-datasets/california_housing_train.csv", sep=",")
    # Define the input feature: total_rooms.
    my_feature = california_housing_dataframe[["total_rooms"]]
    # Configure a numeric feature column for total_rooms.
    feature_columns = [tf.feature_column.numeric_column("total_rooms")]
    
    targets = california_housing_dataframe["median_house_value"]
    

    If you need more context, here is more of the code:

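    # Imports this excerpt relies on (TensorFlow 1.x style, as in the exercise).
    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from tensorflow.python.data import Dataset

    california_housing_dataframe = pd.read_csv("https://dl.google.com/mlcc/mledu-datasets/california_housing_train.csv", sep=",")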
    
    # Define the input feature: total_rooms.
    my_feature = california_housing_dataframe[["total_rooms"]]
    # Configure a numeric feature column for total_rooms.
    feature_columns = [tf.feature_column.numeric_column("total_rooms")]
    
    targets = california_housing_dataframe["median_house_value"]
    
    def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
        """Trains a linear regression model of one feature.
    
        Args:
          features: pandas DataFrame of features
          targets: pandas DataFrame of targets
          batch_size: Size of batches to be passed to the model
          shuffle: True or False. Whether to shuffle the data.
          num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
        Returns:
          Tuple of (features, labels) for next data batch
        """
    
        # Convert pandas data into a dict of np arrays.
        features = {key:np.array(value) for key,value in dict(features).items()}                                           
    
        # Construct a dataset, and configure batching/repeating.
        ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit
        ds = ds.batch(batch_size).repeat(num_epochs)
    
        # Shuffle the data, if specified.
        if shuffle:
          ds = ds.shuffle(buffer_size=10000)
    
        # Return the next batch of data.
        features, labels = ds.make_one_shot_iterator().get_next()
        return features, labels
    
    prediction_input_fn = lambda: my_input_fn(my_feature, targets, num_epochs=1, shuffle=False)
    
    # Call predict() on the linear_regressor to make predictions.
    predictions = linear_regressor.predict(input_fn=prediction_input_fn)
    

    If you need even more context, here is a link to the exercise containing all of the code: https://colab.research.google.com/notebooks/mlcc/first_steps_with_tensor_flow.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=firststeps-colab&hl=en

    2 answers  |  as of 6 years ago
        1
  •  3
  •   user2906838    6 years ago

    Single brackets return a pandas Series, while double brackets return a pandas DataFrame.

    Here is an example:

    import pandas as pd

    d = {'col1': [1, 2], 'col2': [3, 4]}
    df = pd.DataFrame(data=d)
    df
       col1  col2
    0     1     3
    1     2     4
    

    Now let's print the types we get with single and with double brackets.

    Single brackets give:

    type(df["col1"])
    pandas.core.series.Series
    

    Double brackets give:

    type(df[["col1"]])
    pandas.core.frame.DataFrame
    

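    So now you can see the difference: single-bracket and double-bracket indexing serve two different purposes. Use double brackets when you want to build a new DataFrame from existing columns of a DataFrame, even if that is only one column.

    There is also a similar answer with more explanation here: The difference between double brace `[[...]]` and single brace `[..]` indexing in Pandas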
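To make the distinction concrete, here is a minimal sketch built on the same toy frame as above. The variable names (`cols`, `as_frame`, `as_series`, and so on) are illustrative only. It shows that the inner brackets are just an ordinary Python list of column names, so a one-element list still produces a DataFrame, and that the two forms can be converted into each other.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# The inner brackets are a plain Python list of column names.
cols = ["col1"]
as_frame = df[cols]        # same as df[["col1"]]: a one-column DataFrame
as_series = df["col1"]     # a single label: a Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
print(type(as_series).__name__, as_series.shape)  # Series (2,)

# Converting between the two forms.
back_to_series = as_frame["col1"]     # pick the column out of the frame
back_to_frame = as_series.to_frame()  # wrap the Series in a frame again

# Selecting several columns always needs the list form.
both = df[["col2", "col1"]]
print(list(both.columns))             # ['col2', 'col1']
```

The list form also controls column order in the result, which is why `both` comes back with `col2` first.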

        2
  •  1
  •   Mohan Radhakrishnan    6 years ago

    my_feature is a <class 'pandas.core.frame.DataFrame'>

    targets is a <class 'pandas.core.series.Series'>

    But many functions work on both of these data structures. I can even pass either of them to matplotlib functions.
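One place where the two types do behave differently is the `dict(features)` call inside the question's `my_input_fn`. The sketch below reuses that one line on a tiny hand-made frame; the numeric values are made up for illustration, the helper name `to_feature_dict` is hypothetical, and TensorFlow is left out. It suggests a likely reason the exercise selects the feature with double brackets: a DataFrame converts to a dict keyed by column name, which matches the name passed to `numeric_column`, while a Series converts to a dict keyed by row label.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for california_housing_dataframe.
df = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0],
    "median_house_value": [66900.0, 80100.0, 85700.0],
})

def to_feature_dict(features):
    # Same conversion as in my_input_fn from the question.
    return {key: np.array(value) for key, value in dict(features).items()}

from_frame = to_feature_dict(df[["total_rooms"]])
from_series = to_feature_dict(df["total_rooms"])

print(list(from_frame))                  # ['total_rooms']
print(from_frame["total_rooms"].shape)   # (3,)
print(list(from_series))                 # [0, 1, 2]
print(from_series[0].shape)              # ()
```

With the Series, every row becomes its own zero-dimensional "feature", which is not what a feature column named `total_rooms` would look up.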

    While researching the difference, I found that it has already been explained here.

    import pandas as pd
    import tensorflow as tf
    import matplotlib.pyplot as plt
    
    california_housing_dataframe = pd.read_csv("https://dl.google.com/mlcc/mledu-datasets/california_housing_train.csv", sep=",")
    # Define the input feature: total_rooms.
    my_feature = california_housing_dataframe[["total_rooms"]]
    print(type(my_feature))
    # Configure a numeric feature column for total_rooms.
    feature_columns = [tf.feature_column.numeric_column("total_rooms")]
    
    targets = california_housing_dataframe["median_house_value"]
    print(type(targets))
    
    print(my_feature.describe())
    print(targets.describe())

    print(my_feature.head())
    print(targets.head())

    print(my_feature.max())
    print(targets.max())
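The same method names work on both objects in the script above, but the results do not have the same shape. As a rough illustration on a small made-up frame (the values are not taken from the housing data), reducing methods such as `max()` return one value per column for a DataFrame and a single scalar for a Series:

```python
import pandas as pd

# Made-up stand-in for california_housing_dataframe.
df = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0],
    "median_house_value": [66900.0, 80100.0, 85700.0],
})
my_feature = df[["total_rooms"]]      # DataFrame
targets = df["median_house_value"]    # Series

frame_max = my_feature.max()   # Series indexed by column name
series_max = targets.max()     # plain scalar

print(type(frame_max).__name__)   # Series
print(frame_max["total_rooms"])   # 7650.0
print(series_max)                 # 85700.0

# describe() follows the same pattern: DataFrame in, DataFrame out;
# Series in, Series out.
print(type(my_feature.describe()).__name__)  # DataFrame
print(type(targets.describe()).__name__)     # Series
```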