- Alexander Furnas
- Apr 3, 2012
A guide to what data mining is, how it works, and why it’s important.
Big data is everywhere we look these days. Businesses are falling all over themselves to hire ‘data scientists,’ privacy advocates are concerned about personal data and control, and technologists and entrepreneurs scramble to find new ways to collect, control and monetize data. We know that data is powerful and valuable. But how?
This article is an attempt to explain how data mining works and why you should care about it. Because when we think about how our data is being used, it is crucial to understand the power of this practice. Without data mining, when you give someone access to information about you, all they know is what you have told them. With data mining, they know what you have told them and can guess a great deal more. Put another way, data mining allows companies and governments to use the information you provide to reveal more than you think.
To most of us data mining goes something like this: tons of data is collected, then quant wizards work their arcane magic, and then they know all of this amazing stuff. But how? And what types of things can they know? Here is the truth: despite the fact that the specific technical functioning of data mining algorithms is quite complex (they are a black box unless you are a professional statistician or computer scientist), the uses and capabilities of these approaches are, in fact, quite comprehensible and intuitive.
For the most part, data mining tells us about very large and complex data sets the kinds of information that would be readily apparent about small and simple things. For example, it can tell us that “one of these things is not like the other” à la Sesame Street, or it can show us categories and then sort things into pre-determined categories. But what’s simple with 5 datapoints is not so simple with 5 billion datapoints.
And these days, there’s always more data. We gather far more of it than we can digest. Almost every transaction or interaction leaves a data signature that someone somewhere is capturing and storing. This is, of course, true on the internet, but ubiquitous computing and digitization have made it increasingly true about our lives away from our computers (do we still have those?). The sheer scale of this data has far exceeded human sense-making capabilities. At these scales patterns are often too subtle and relationships too complex or multi-dimensional to observe by simply looking at the data. Data mining is a means of automating part of this process to detect interpretable patterns; it helps us see the forest without getting lost in the trees.
Discovering information from data takes two major forms: description and prediction. At the scale we are talking about, it is hard to know what the data shows. Data mining is used to simplify and summarize the data in a manner that we can understand, and then allow us to infer things about specific cases based on the patterns we have observed. Of course, specific applications of data mining methods are limited by the data and computing power available, and are tailored for specific needs and goals. However, there are several main types of pattern detection that are commonly used. These general forms illustrate what data mining can do.
Anomaly detection: in a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For example, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.
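To make the idea concrete, here is a minimal sketch in Python. It is not how the IRS or anyone else actually does it: the deduction amounts are invented, the function name `find_anomalies` is hypothetical, and the rule used (flag anything more than two standard deviations from the mean) is only one simple way to define "notably different."

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    center = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - center) / spread > threshold]

# Hypothetical deduction amounts claimed on eight tax returns (invented for illustration).
deductions = [4200, 3900, 4100, 4000, 4300, 3800, 4050, 19500]

flagged = find_anomalies(deductions)
print(flagged)  # [(7, 19500)]
```

Seven of the returns cluster around $4,000, so the eighth is the only one flagged. A real system would model many fields at once and would likely use statistics that are less easily distorted by the outliers themselves.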
Association learning: This is the type of data mining that drives the Amazon recommendation system. For example, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
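A rough sketch of how such an association could be measured, using a handful of invented shopping baskets. The function name `rule_stats` is hypothetical, and the three measures it computes (support, confidence and lift) are standard textbook quantities for association rules, not a description of Amazon's actual system.

```python
def rule_stats(baskets, antecedent, consequent):
    """Measure how strongly buying `antecedent` goes with buying `consequent`.

    Assumes `baskets` is a non-empty list of sets of item names.
    """
    n = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent <= b)
    with_consequent = sum(1 for b in baskets if consequent <= b)
    with_both = sum(1 for b in baskets if (antecedent | consequent) <= b)

    support = with_both / n  # share of all baskets containing every item in the rule
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    base_rate = with_consequent / n
    lift = confidence / base_rate if base_rate else 0.0  # above 1 means a positive association
    return support, confidence, lift

# Invented purchase histories, one set per customer.
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses", "ice bucket"},
    {"cocktail shaker", "recipe book"},
    {"recipe book", "garden hose"},
    {"martini glasses", "ice bucket"},
    {"cocktail shaker", "recipe book", "martini glasses"},
]

support, confidence, lift = rule_stats(
    baskets, {"cocktail shaker", "recipe book"}, {"martini glasses"}
)
print(support, confidence, lift)
```

In this toy data, three of the four customers who bought both the shaker and the recipe book also bought martini glasses, a confidence of 0.75. That is somewhat higher than the rate at which customers in general bought the glasses, which is what makes the association worth acting on.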
Cluster detection: one type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe capture the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups. This is one of the black-box types of algorithms that are hard to understand. But in a simple example (again with purchasing behavior) we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.
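For readers who want to peek inside the black box, here is a small sketch of one common clustering method, k-means, written in plain Python. Everything in it is an assumption made for illustration: the spending figures are invented, the function names are hypothetical, and the number of clusters (three) is supplied by hand, which real applications often have to estimate from the data.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Group `points` into `k` clusters; returns (labels, centroids)."""
    # Deterministic start: take the first point, then repeatedly add the
    # point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the average position of its members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments have stopped changing
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids

# Invented monthly spending per customer: (gardening, fishing, model kits).
customers = [
    (90, 5, 0), (85, 10, 5), (95, 0, 5),   # mostly gardening
    (5, 80, 0), (10, 90, 5), (0, 85, 10),  # mostly fishing
    (5, 0, 95), (0, 10, 85), (10, 5, 90),  # mostly model kits
]

labels, centroids = kmeans(customers, k=3)
print(labels)
```

The algorithm is never told which customer is a gardener or a fisherman. It simply groups customers whose spending looks alike, and the three hobbies fall out as three separate clusters.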
Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this: large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.
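The spam-filter idea can be sketched with a tiny naive Bayes classifier, one classic technique for this kind of word-based classification. The training messages below are invented, the function names are hypothetical, and real filters learn from vastly more mail and use many more signals than word counts.

```python
import math
from collections import Counter, defaultdict

def train(messages):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label whose word usage best matches `text`."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())

    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from how common the label is, then add evidence word by word.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented, pre-classified examples ("ham" is the customary term for legitimate mail).
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("please review the agenda", "ham"),
]

word_counts, label_counts = train(training)
print(classify("claim your free prize", word_counts, label_counts))  # spam
print(classify("agenda for lunch", word_counts, label_counts))       # ham
```

Words like "free" and "prize" appear only in the spam examples, so a new message that uses them scores higher as spam, even though that exact message was never seen before.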
Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
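As a deliberately simplified sketch, the example below fits a straight line to a single variable using ordinary least squares. The numbers are invented, the function name `fit_line` is hypothetical, and a realistic engagement model would combine many variables rather than one.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: photos tagged in one week, and visits to the site the following week.
photos_tagged = [1, 2, 3, 4, 5]
visits_next_week = [3, 5, 7, 10, 10]

slope, intercept = fit_line(photos_tagged, visits_next_week)

# Compare each prediction with what was actually observed.
residuals = [y - (slope * x + intercept)
             for x, y in zip(photos_tagged, visits_next_week)]

# Predicted visits for a hypothetical user who tagged six photos.
predicted = slope * 6 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted, 2))  # 1.9 1.3 12.7
```

The residuals are the gaps between predicted and observed behavior. Those gaps are what a model builder would study when deciding which factors to add or reweight.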
The patterns detected and structures revealed by descriptive data mining are then often applied to predict other aspects of the data. Amazon offers a useful example of how descriptive findings are used for prediction. The (hypothetical) association between cocktail shaker and martini glass purchases, for example, could be used, along with many other similar associations, as part of a model predicting the likelihood that a particular user will make a particular purchase. This model could match all such associations with a user’s purchasing history, and predict which products they are most likely to purchase. Amazon can then serve ads based on what that user is most likely to buy.
Data mining, in this way, can grant immense inferential power. If an algorithm can correctly classify a case into a known category based on limited data, it is possible to estimate a wide range of other information about that case based on the properties of all the other cases in that category. This may sound dry, but it is how most successful Internet companies make their money and from where they draw their power.