代码之家 › 专栏 › 技术社区 › mikelus

ML.NET LearningPipeline总是有10行

ml.net

1

mikelus · 技术社区 · 6 年前

我注意到,在情感分析示例项目中,无论测试或培训模型中有多少数据,Microsoft.ml.Legacy.LearningPipeline.Row Count始终为10。

https://github.com/dotnet/samples/blob/master/machine learning/tutorials/motionalanalysis.sln

这里有人能解释10的意义吗?

//learningpipeline允许您添加步骤,以便将所有内容保持在一起
//在学习过程中。
//<代码段5>
var pipeline=new learningpipeline();
//</snippet5>

//textLoader加载一个带有注释和相应的postive或negative情绪的数据集。
//创建加载程序时,通过将类传递给包含
//所有列名称及其类型。这用于创建模型并对其进行培训。
//<代码段6>
pipeline.add(new textloader(_datapath).createfrom<motiondata>());
//</snippet6>

//textfeatherizer是用于对输入列进行特征化的转换。
//用于格式化和清理数据。
//<代码段7>
pipeline.add(new textfeautherizer(“features”,“motionText”));
//</snippet7>

//添加FastTreeBinaryClassifier,此项目的决策树学习器,以及
//用于优化决策树性能的三个超参数。
//<代码段8>
pipeline.add(new fasttreebinaryClassifier()numLeaves=50,numTrees=50,mindDocumentsInLeafs=20);
//</snippet8>
< /代码> 

.

https://github.com/dotnet/samples/blob/master/machine-learning/tutorials/SentimentAnalysis.sln

这里有人能解释10的意义吗?

// LearningPipeline allows you to add steps in order to keep everything together 
        // during the learning process.  
        // <Snippet5>
        var pipeline = new LearningPipeline();
        // </Snippet5>

        // The TextLoader loads a dataset with comments and corresponding postive or negative sentiment. 
        // When you create a loader, you specify the schema by passing a class to the loader containing
        // all the column names and their types. This is used to create the model, and train it. 
        // <Snippet6>
        pipeline.Add(new TextLoader(_dataPath).CreateFrom<SentimentData>());
        // </Snippet6>

        // TextFeaturizer is a transform that is used to featurize an input column. 
        // This is used to format and clean the data.
        // <Snippet7>
        pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
        //</Snippet7>

        // Adds a FastTreeBinaryClassifier, the decision tree learner for this project, and 
        // three hyperparameters to be used for tuning decision tree performance.
        // <Snippet8>
        pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 50, NumTrees = 50, MinDocumentsInLeafs = 20 });
        // </Snippet8>

1 回复 | 直到 6 年前

1

2

Gal Oshri 6 年前

调试器只显示数据的预览-前10行。这里的目标是显示一些示例行,以及每个转换如何在这些行上操作,以使调试更容易。

读取整个培训数据并在其上运行所有转换非常昂贵,只有当您 .Train() . 由于转换只在几行上操作,因此在对整个数据集进行操作时,它们的效果可能会有所不同(例如,文本字典可能会更大),但希望在运行整个培训过程之前预览显示的数据有助于调试,并确保将转换应用到正确的列中。NS。

如果你对如何使这个更清楚或更有用有任何想法,如果你能在 GitHub !