Here are some of my thoughts about the project progress.
As a reminder, here are my assignments for week 1:
1- Take ECL online classes to create ECL and Roxie jobs
2- Roxie log collecting
3- Install HPCC Machine Learning bundles
4- Run ML jobs on the sample data
I am working through the online classes and collecting Roxie logs as I go.
I have already installed the ML bundle.
I have a question about step 4, however: doesn't it require me to first clean and format the collected Roxie logs?
Now for the next steps, my strategy breaks down into the following points:
Format and clean the collected Roxie logs
Feature extraction: I intend to use program variables as features. By program variable, I mean that I will try to isolate the program variables in each log statement during parsing
Clustering: At this point, I would like to cluster logs by program variable, so all the logs sharing a common program variable would be grouped in the same cluster
Anomaly detection: Next, for each cluster, I would analyze the logs related to each particular value of the program variable. A possible anomaly would be a value of the program variable whose sequence of logs differs from the sequence seen for the majority of values
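To make the points above easier to discuss, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the sample lines are made up and do not follow the real Roxie log layout, program variables are assumed to appear as `name=value` pairs, and the names `parse_line`, `group_by_variable`, and `find_anomalies` are hypothetical. The "majority" is taken to be simply the most common sequence of log templates.

```python
import re
from collections import Counter, defaultdict

# Assumption: a program variable shows up in the message as name=value.
VAR_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Return (template, [(name, value), ...]) for one log line."""
    variables = VAR_PATTERN.findall(line)
    # Mask the variable values so lines from the same log statement match.
    template = VAR_PATTERN.sub(lambda m: f"{m.group(1)}=<*>", line).strip()
    return template, variables


def group_by_variable(lines):
    """Map each variable name to {value: [templates in order of appearance]}."""
    groups = defaultdict(lambda: defaultdict(list))
    for line in lines:
        template, variables = parse_line(line)
        for name, value in variables:
            groups[name][value].append(template)
    return groups


def find_anomalies(sequences):
    """Return the values whose template sequence differs from the most common one."""
    counts = Counter(tuple(seq) for seq in sequences.values())
    majority, _ = counts.most_common(1)[0]
    return sorted(v for v, seq in sequences.items() if tuple(seq) != majority)


# Made-up sample data, not real Roxie output.
SAMPLE_LOGS = [
    "Received request id=1",
    "Received request id=2",
    "Processing request id=1",
    "Processing request id=2",
    "Completed request id=1",
    "Received request id=3",
    "Completed request id=2",
    "Failed request id=3",
]

groups = group_by_variable(SAMPLE_LOGS)
anomalies = find_anomalies(groups["id"])
print(anomalies)
```

In this toy data, the values 1 and 2 of `id` follow the same three-step sequence while value 3 does not, so only 3 is reported. Real logs would need a proper parser for the variable extraction step, and ties between equally common sequences would need a rule of their own.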
Those are just thoughts for discussion.
I like the approach, Vennel! Let us create a catalog of the features and then brainstorm the use cases that would make the most sense.
@vzeufack Nice summary!
Yes, the logs will not be perfect at the beginning, but that should be OK as long as we can parse out the useful parts. Please feel free to let me know if you have any questions about this part.
It should be easier if you test on a small sample set first, such as one of the publicly available datasets.
Please let me know if you cannot find any. I can direct you to the websites.
Thanks for the replies! I will come back to you as soon as I make any progress.
Do you have any suggestions for next week's tasks?
I would really like to create a design document to write down the details of the approach before writing the code. The document will include the use cases.
That said, I am thinking of the following for next week:
1- Parse Roxie logs
2- Create the design document
3- Finish the online classes - I am not sure I will be able to finish all the Roxie classes this week, so I am carrying this over just in case