Let’s imagine that you want to buy a new car, and you fall in love with this new car’s brand. Because you really want that car, the car’s brand comes out everywhere in your daily life, even though the amount of these cars remain the same. Our brain is trained to focus on what it wants to see.
In a previous post, we reviewed the taxonomy of metaheuristic algorithms for optimization within the context of feature selection in machine learning problems. We explained how feature selection can be tackled as a combinatorial optimization problem in a huge search space, and how heuristic algorithms (or simply metaheuristics) are able to find good solutions -although not necessarily optimal- in a reasonable amount of time by exploring such space in an intelligent manner. Recall that metaheuristics are especially well fitted when the function being optimized is non-differentiable or does not have an analytical expression at all (for instance, the magnitude being optimized is the result of a randomized complex simulation under a parameter set that constitutes a candidate solution). Note that maths cannot help us in such cases and metaheuristics can be the only way to go.
This post aims to show how to build an on-premise Mesos architecture to handle a disaster scenario when an entire Data Center is not available, covering also some framework strategies for zero data loss.
When the Father of Statistics Ronald Fisher started to witness mounting evidence in favor of the association between smoking and lung cancer, he was quick to fall back into the maxim he helped coin “Correlation does not imply causation”, discounting the evidence as spurious and continuing his habit as a heavy smoker of cigarettes. While his command of mathematics was well above his contemporaries at the time, this speaks volumes for the fact that everybody is prone to bias, and that we should attempt not to fall in love with our hypotheses too early in our decision making process. While Fisher may have had a point, the fact that he was a smoker himself certainly clouded his thinking and led him to not consider fairly all possible explanations for the available evidence.
This is the second (and last) part of the series dealing with the formal comparison of Machine Learning (ML) algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.
Let us suppose that we start to develop a webserver for our IOT App with a few endpoints, like POST for receive events, GET devicesBySensorType, GET all, and PUT for update device metadata, etc.
Have you ever watched the cooking teaching shows? You have probably noticed that chefs have usually already all the ingredients separated and chopped. A chef probably will be more useful and creative cooking rather than spending time peeling and chopping potatoes, even though it is still important in the recipe. Likewise, a data scientist will be more useful and creative building models rather than spending time with data preprocessing. In this way, where a chef would prepare exquisite delicacies a data scientist prepares succulent models.
This is the first of a two-part series dealing with the application of statistical tests for the formal comparison of several Machine Learning (ML) algorithms in order to determine whether one generally outperforms the rest or not. In this first chapter, we explain the fundamentals of statistical tests, while in the second part, we examine how they are applied to ML algorithm performance data with the aim of comparing them from a statistical point of view.
Stratio has created a new user interface that allows you to work without writing a single line of code, which means that no programming skills are needed nor to use advanced technologies such as Spark, Scala, HDFS, or Elasticsearch. Developers, Architects, BI engineers, data scientists, business users and IT administrators can create data analytics applications in minutes with a powerful Spark Visual Editor. Welcome to Sparta 2.0, a brand-new version of Sparta born with the forthcoming release of the Stratio Data Centric Platform.
Data analysts are often confronted with a seemingly difficult decision, to choose between a simple model, easy to understand, but lacking in predictive power, or a more complex one, that can attain higher accuracy, but simultaneously leaving a funny feeling in the user, who is left wondering about how it works and perhaps more importantly, how it arrived at its result. As with most things in life, everything comes with a cost. And balancing competing goals requires dealing with trade-offs. This is not an easy choice, as if the analyst cannot explain the reasoning behind the model, neither can she explain it to demanding business users. And therein lies the dilemma. Those experienced in the field of analytics have probably faced this dichotomy once or twice throughout their careers.