Omar Himada
@omarhimada
(removed) answered my own question
Ian Bebbington
@ibebbs

Hi all, just started playing with Microsoft.ML and pretty impressed. Followed this tutorial to build an image classifier model that works reasonably well (limited training data). I now want to put this model to use but have hit an issue:
The images I want to classify will be in-memory (as a Bitmap) but the trained model seems to need the images on disk. Obviously I could save the image to a temporary file but this seems wasteful when the model will need to read it back in again. From what I can see in the source code, the "LoadRawImageBytes" transform shown below doesn't have any kind of overload for loading in-memory data:

var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "Label")
    .Append(context.Transforms.LoadRawImageBytes("ImageSource_featurized", null, "Image"))
    .Append(context.Transforms.CopyColumns("Features", "ImageSource_featurized"));

Any suggestions on how to go about classifying an in-memory image without saving it to disk first?

Ah, wait, just found this hidden away as a link from a GitHub issue.
Omar Himada
@omarhimada
you could probably read the bitmaps in with a stream
I'd put them in blob storage like S3/Azure, download via a StreamReader, and make predictions inside that stream's using block
limits memory use and you also don't need to keep them on disk
Omar Himada
@omarhimada
correction since image data is binary not text - download from blob into a raw stream instead, copy it into a MemoryStream, and from that memory stream get the bitmaps and make predictions
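Omar's corrected flow might be sketched roughly like this (hedged: `DownloadImageBytes` and the blob name are hypothetical stand-ins for whatever storage SDK is in use):

```csharp
using System.Drawing;
using System.IO;

// Hedged sketch: pull the binary image data into a MemoryStream and build
// the Bitmap from it, so nothing touches disk.
// "DownloadImageBytes" is a hypothetical stand-in for a storage SDK call.
byte[] imageBytes = DownloadImageBytes("my-container/image-001.png");

using (var memory = new MemoryStream(imageBytes))
using (var bitmap = new Bitmap(memory))
{
    // hand the in-memory Bitmap to the prediction pipeline here
}
```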
Ian Bebbington
@ibebbs
Hi @omarhimada, thanks for the suggestions. To clarify, I can load the images into memory fine but don't know how to configure a pipeline such that it's able to classify from the in-memory image.
Ian Bebbington
@ibebbs
Yeah, it's exactly what I want but unfortunately they don't show how they created a model which accepts the input schema with the Bitmap. They just load a pre-made model which doesn't really help me.
Ian Bebbington
@ibebbs
ooOOoo... the "_mlContext.Transforms.ExtractPixels" might be what I'm after. Looking into it deeper now. Thanks.
Omar Himada
@omarhimada
np
Ian Bebbington
@ibebbs
Mmm... doesn't look like Transforms.ExtractPixels is compatible with MulticlassClassification.Trainers.ImageClassification; no matter what I try I get an error along the lines of: 'Schema mismatch for feature column 'Features': expected VarVector<Byte>, got Vector<Byte> '. Here's the pipeline:
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "Label")
    .Append(context.Transforms.ResizeImages(outputColumnName: "ScaledImage", imageWidth: 227, imageHeight: 227, inputColumnName: "Image"))
    .Append(context.Transforms.ExtractPixels(outputColumnName: "ImageSource_featurized", inputColumnName: "ScaledImage", outputAsFloatArray: false))
    .Append(context.Transforms.CopyColumns("Features", "ImageSource_featurized"));

var trainer = context.MulticlassClassification.Trainers.ImageClassification(new ImageClassificationTrainer.Options() { LabelColumnName = "Label", FeatureColumnName = "Features" })
    .Append(context.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

var trainingPipeline = pipeline.Append(trainer);
Omar Himada
@omarhimada
seems like the reverse of what @luisquintanilla said here - but i bet you could apply the same approach https://github.com/dotnet/machinelearning/issues/4977#issuecomment-606030427
its expecting a vector of unknown length but you're giving it a vector of known length - could probably just encode one to the other
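One hedged way to try that adaptation is a CustomMapping that copies the fixed-length pixel column into a byte[] property with no [VectorType] attribute, which ML.NET surfaces as a variable-length vector. Class and column names here are illustrative, not from the thread:

```csharp
using Microsoft.ML.Data;

// Hedged sketch: re-expose the fixed-length Vector<Byte> from ExtractPixels
// as a variable-length VarVector<Byte> column for the trainer.
public class FixedPixels
{
    [VectorType(227 * 227 * 3)]                  // known length in
    public byte[] ImageSource_featurized { get; set; }
}

public class VariablePixels
{
    // no [VectorType] attribute => surfaced as VarVector<Byte>
    public byte[] Features { get; set; }
}

// appended in place of the CopyColumns step
var toVarVector = context.Transforms.CustomMapping<FixedPixels, VariablePixels>(
    (src, dst) => dst.Features = src.ImageSource_featurized,
    contractName: null);
```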
Ian Bebbington
@ibebbs

Yeah, nice. I'd rather provide it the "unknown length", but the [ImageType] attribute that you need to decorate the Bitmap property on the input schema with requires Width and Height values, like this:

public class Source
{
    [ImageType(227, 227)]
    public Bitmap Image { get; set; }
}

(You get an exception if the attribute isn't supplied)

Bit of a shame really, as it means jumping through lots of extra hoops that aren't necessary when loading the image from disk.
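For scoring with that Source schema, a minimal sketch (assuming a trained model held in an ITransformer variable, and an illustrative Prediction output class) might look like:

```csharp
using System;
using Microsoft.ML;

// Hedged sketch: scoring an in-memory Bitmap against a trained model.
// "model" is the trained ITransformer; "Prediction" is illustrative.
public class Prediction
{
    public string PredictedLabel { get; set; }
}

var engine = context.Model.CreatePredictionEngine<Source, Prediction>(model);
var result = engine.Predict(new Source { Image = bitmap });
Console.WriteLine(result.PredictedLabel);
```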
Ian Bebbington
@ibebbs
FYI, I've raised an issue about it dotnet/machinelearning#5109.
Luis Quintanilla
@luisquintanilla

@ibebbs take a look at this sample. I think it might help you do what you want.

https://github.com/dotnet/machinelearning/blob/master/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/ImageClassification/ImageClassificationDefault.cs

Notice that to train it uses ImageData, but to score it uses InMemoryImageData. The difference being that ImageData uses a path and InMemoryImageData uses byte[].
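The two input schemas from that sample look roughly like this (class names follow the linked sample; the property shapes are a sketch of the path-vs-bytes difference Luis describes):

```csharp
// Training input: the image is referenced by its path on disk.
public class ImageData
{
    public string ImagePath { get; set; }
    public string Label { get; set; }
}

// Scoring input: the image travels as raw bytes, so it can come from memory.
public class InMemoryImageData
{
    public byte[] Image { get; set; }
    public string Label { get; set; }
}
```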

Haiping
@Oceania2018
(image attachment: image.png)
Model developers will love it.
Jon Wood
@jwood803
Awesome stuff, @Oceania2018!
Ian Bebbington
@ibebbs
@luisquintanilla, thanks for the link. I actually came across exactly that sample (by way of an issue search -> commit -> scanning source code changes) after posting my question. After incorporating the changes I have this approach working. Ta!
Luis Quintanilla
@luisquintanilla
Great! Glad to hear you got that working.
Thiago Cândido
@tgcandido
Hey! I'm new to the repository and would really like to contribute. The issues that are labeled 'good first issue' are already assigned or I don't have the slightest clue on how to start resolving it. Is there someone who can point out maybe some other good first issues? Thanks!
Jon Wood
@jwood803
@tgcandido There may be some issues with the "documentation" or "up-for-grabs" tags that you can take on
Thiago Cândido
@tgcandido
Thanks @jwood803
Will check it out
Daniel
@Zuzuk-null
Who made the generative neural network?
thibaultfalque
@thibaultfalque
Hi everyone.
Praveen Raghuvanshi
@praveenraghuvanshi1512

I am getting 'Unable to load DLL 'MklImports' or one of its dependencies' while calling the Fit method of ForecastBySsa() in a time series prediction.
Same code runs fine in Visual Studio 2019.

OS: Windows 10 x64
.Net core: 3.1.300
Microsoft.ML : 1.5.0
Microsoft.ML.TimeSeries : 1.5.0

Logs

SubmitCode: var model = pipeline.Fit(data);
CodeSubmissionReceived: var model = pipeline.Fit(data);
CompleteCodeSubmissionReceived: var model = pipeline.Fit(data);
System.DllNotFoundException: Unable to load DLL 'MklImports' or one of its dependencies: The specified module could not be found. (0x8007007E)
at Microsoft.ML.Transforms.TimeSeries.EigenUtils.Dsytrd(Layout matrixLayout, Uplo uplo, Int32 n, Double[] a, Int32 lda, Double[] d, Double[] e, Double[] tau)
at Microsoft.ML.Transforms.TimeSeries.EigenUtils.MklSymmetricEigenDecomposition(Single[] input, Int32 size, Single[]& eigenValues, Single[]& eigenVectors)
at Microsoft.ML.Transforms.TimeSeries.TrajectoryMatrix.ComputeSvd(Single[]& singularValues, Single[]& leftSingularvectors)
at Microsoft.ML.Transforms.TimeSeries.AdaptiveSingularSpectrumSequenceModelerInternal.TrainCore(Single[] dataArray, Int32 originalSeriesLength)
at Microsoft.ML.Transforms.TimeSeries.AdaptiveSingularSpectrumSequenceModelerInternal.Train(RoleMappedData data)
at Microsoft.ML.Transforms.TimeSeries.SsaForecastingTransformer..ctor(IHostEnvironment env, Options options, IDataView input)
at Microsoft.ML.Transforms.TimeSeries.SsaForecastingEstimator.Fit(IDataView input)
at Submission#30.<<Initialize>>d__0.MoveNext()

Omar Himada
@omarhimada
Did you see this
Praveen Raghuvanshi
@praveenraghuvanshi1512
yes, I had a look at it... it talks mostly about macOS and not much about Windows. On Windows everything is managed through NuGet. I already have the specified DLL in the NuGet package, but the Jupyter notebook is still unable to reference it. This link https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/install-extra-dependencies also talks about MKL but has no action plan for Windows... Wondering why it's happening in the Jupyter notebook when the same thing works fine in VS.
Praveen Raghuvanshi
@praveenraghuvanshi1512
Logged issue #5178 (dotnet/machinelearning#5178) specifically for Windows 10
Praveen Raghuvanshi
@praveenraghuvanshi1512
There was a bug in dotnet-interactive and a new issue has been opened: dotnet/interactive#492. The workaround is to install dotnet-interactive version v1.0.127908 using dotnet tool install -g --add-source "https://dotnet.myget.org/F/dotnet-try/api/v3/index.json" Microsoft.dotnet-interactive
johnmcge
@johnmcge
Apologies if this is the wrong place to ask. Are there known limits to the number of columns of data for regression? I'm using mlnet auto-train regression to predict a value from gene expression data. Each row of training data has an Id, Survival (to be predicted) and an expression value (int) for 60k genes. Works great up to 9,996 genes. After that, "Unable to infer column types of the file provided". Thanks so much for the great work.
Omar Himada
@omarhimada
Are you encoding any of those 9,996 or are they all distinct?
Like one-hot encoding
johnmcge
@johnmcge
I am not encoding. The column values (after UUID and Survival) are integers representing the relative amount of a particular strand of protein coding RNA. It works with up to 9,996 int values. 9,997 and beyond fail. I double checked the header row label and data for that column, but do not see anything unusual for the 9,997th column or others nearby. I should probably remove the UUID; will report back if results are different after doing that.
johnmcge
@johnmcge
It definitely presents as related to the number of columns. Removing the first column (uuid) results in success for up to 9,997 columns of expressed protein data, and now fails with 9,998 or more.
Omar Himada
@omarhimada
If you encode a bunch of them you can reduce the number of columns. I believe that's the purpose of one-hot encoding.
At least AFAIK that's the purpose of it; not sure if it fits your use case.
johnmcge
@johnmcge
Encoding does not fit this use case. Admittedly, this approach is not statistically valid. I'm following along with a colleague implementing the same solution in Python/scikit-learn. While the regression results are not useful, they had no issues running the ~60k columns of integer data through their pipeline on inferior hardware. Just FYI.
Omar Himada
@omarhimada
Interesting yeah