A too-smart billboard: how to collect data about people, and not only on the Internet

Hello, dear readers of this blog. Not so long ago I published the article "". There we got acquainted with this free analytics system, so to speak, from the inside, i.e. we learned how data is collected, how it is processed and stored, and how the reports we need are built on its basis.

This knowledge will certainly be useful to us in the future. Now, though, I want to move directly to the conversation about analytics as such: what is it for, what methods of site analysis exist, and what performance criteria should be monitored?

We will also look at how site statistics are collected, what methods and tools are most often used, and most importantly, how this very data is collected. In this regard, we will dwell in detail on such concepts as a visitor, session and hit, which are the basis of all web analytics. Without understanding these things, it will be very difficult for you to further comprehend all the intricacies of improving the efficiency of your site, which we will talk about in the articles of this section.

What to track and how to set tasks for analytics?

From the article just above, we learned that the Google Analytics system essentially consists of several blocks, the main ones being:

  1. Data collection tool
  2. Tools for analysis, processing and display of collected statistical data

Why do we need such analytics systems at all? Let's see:

That is putting it in very general terms, but essentially analytics is needed to improve the state of affairs with your site (and business). Thanks to it, you can measure things and track the impact of the changes you make on the characteristics that matter to you (traffic, conversion, etc.). What cannot be measured cannot be meaningfully improved either, which is why SEO specialists have lately been devoting so much attention to collecting, processing and analyzing statistics. It is not an easy task, but it is a very promising one.

What exactly you should track using systems like Google Analytics depends on the type of your site. In principle, there are not so many options, so let's just list them:

  1. Sales - relevant for online commerce
  2. Collecting leads - for example, registrations on the site, subscriptions to a news feed, filling in an order form, etc. Relevant for many types of resources that gather various user actions in order to monetize them later in one way or another.
  3. Audience engagement and resource attendance - relevant for information and news resources
  4. Helping users in finding information is relevant for information resources such as search engines, catalogs, encyclopedias, etc.
  5. Increasing brand awareness, as well as audience loyalty to the brand - relevant for branding, i.e. brand promotion

Accordingly, you will need to understand what type your project belongs to, and on the basis of this you will already choose those performance indicators that should be monitored using the analytics system (Google or Yandex - it doesn’t matter). In theory, the process looks pretty simple:

The most annoying thing is that everything described above ideally needs to be thought through even before you create a site. It is often very difficult to bring an already built and working Internet project to a state where the necessary performance indicators can be measured. Without all this, using even the most powerful analytics systems like Google Analytics becomes no more effective than hammering nails with a microscope.

The main options for collecting statistics for your site

However, let's set that aside and assume that you have more or less meaningful answers to all the points above. The question then arises: how do we collect the data we need for analysis? As I mentioned in the article about , technically data collection can be done in two ways:

  1. collect it directly on the web server where your site is hosted, capturing all requests to it. For this, server logs are used, as well as scripts written specially for the purpose. This method has its pros and cons:

    In terms of technical implementation, this method is a program installed directly on the server where your site actually lives (a minimal log-parsing sketch is shown right after this whole list). The most popular server-side (back-end) analytics systems are:

    1. AWStats - a very popular system, which hosters often install on servers by default.
    2. Piwik - a very powerful tool that is in no way inferior in terms of capabilities to, for example, such a popular client-side statistics system as Yandex Metrika (although, of course, Piwik has no Webvisor).
    3. Loganalyzer - a slightly more advanced analytics tool than AWStats.
    4. WebLog Expert - also similar in essence to AWStats.
  2. Alternatively, you can collect the necessary data directly in the browsers of the users who visit your site. There is a client-side programming language called JavaScript, whose commands can be included in the HTML code of a web page. It is on this principle that most hit counters and analytics systems like Google Analytics or Yandex Metrica work.

    You add the code fragment offered to you to all pages of your site; when executed, it collects all the necessary data from your visitors' browsers (and then transfers it to the servers of the analytics system you use). This method also has its pros and cons:

    1. The collected data will not be as accurate as in the case of server statistics. It is quite difficult to determine the degree of this inaccuracy, and it depends both on the methods used and on random circumstances (JavaScript execution may be forcibly disabled in the browsers of some users, or you may have forgotten to embed the script on some individual pages of your site).
    2. All data will be collected and stored on third-party servers (those of the analytics system you are using). Their storage period will be limited, and in exceptional cases (a lost password, violated usage rules, etc.) your access to the data may be restricted. In fact, this very data is your payment for the fact that most of these services are free: they can use this huge statistical base, covering an enormous number of sites, both for their own purposes and to sell, for example, to interested players in the search market.
    3. On the other hand, the fact that the arrays of collected data do not need to be stored on your server is a plus, because it requires no additional costs, unlike server statistics.
    4. The analytics capabilities of client-side systems (those that capture data in users' browsers, i.e. on the client) are, as a rule, seriously superior to their server-side counterparts.

    Examples of client-side systems for collecting statistics include:

    1. - resources that host this counter are automatically included in its rating directory (quite a trusted one).
    2. - another statistics counter, on the basis of which a rating of the most visited sites in each subject area is built.
    3. - the most popular way to collect statistics on your site in Runet.
    4. - a fairly popular rating of sites in Runet.
    5. - a slightly more advanced system for collecting statistics with a rating of sites that have installed their counter.
    6. Yandex Metrika - already a full-fledged system for collecting and analyzing site statistics, whose rather weighty crown jewel is the Webvisor.
    7. Google Analytics - the most advanced analytics system available for free. For a long time Analytics went by a different name and was a paid system (several hundred dollars a month just to find out attendance and related parameters), but then it was bought by the great and terrible Google, after which it was made available to everyone. A few years ago, however, a paid version of Analytics with advanced functionality appeared for large sites.
    8. Adobe SiteCatalyst - the main competitor of the paid version of Google Analytics. This package is also paid and is quite popular in the Western market.
    9. WebTrends - also a fairly powerful tool, widely used in the West.

    In the continuation of this series of articles, we will consider client systems for collecting statistics, so we will talk about them in more detail.
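To make the server-side approach described above a bit more tangible, here is a minimal Python sketch (not tied to AWStats or any other tool from the list) that parses a web server access log in the common "combined" format and counts successful hits per page. The log file name is hypothetical; point it at your own server's log.

```python
import re
from collections import Counter

# Typical "combined" access log line:
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0300] "GET /blog/post HTTP/1.1" 200 5316 "https://google.com/" "Mozilla/5.0 ..."
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def count_page_hits(log_path):
    """Count successful page views per URL path from a raw access log."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue  # skip malformed lines
            if m["status"].startswith("2"):  # count only successful responses
                hits[m["path"]] += 1
    return hits

if __name__ == "__main__":
    # "access.log" is a hypothetical path; substitute your real server log.
    for path, n in count_page_hits("access.log").most_common(10):
        print(n, path)
```

Real packages like the ones listed above do far more (sessions, referrers, bot filtering), but at their core they walk the log in roughly this way.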

How do analytics systems work when collecting site statistics?

So, in practice, collecting site statistics in a client-side system comes down to embedding a small JavaScript code fragment into all of its pages. Strictly speaking, this is not the tracking code itself but only a way to call it: the statistics collection code proper is rather voluminous and is loaded along with the web page from the Google or Yandex servers (when using Analytics or Metrica, respectively), unless, of course, it was previously cached in the user's browser.

The browser executes this code by running it in its JavaScript interpreter. As a result, various data is collected and sent to the Yandex or Google servers (what kind of page, where the visitor came from, what cookies are stored for him in the browser, what screen resolution he has, what browser, what OS, and much more). And then the collected statistics are stored in the database of the analytics system that you decide to use.

The analytics system then accesses this data when we view reports on our site through its web interface, and based on these reports we can conduct further analysis. That's it, very simple. As for mobile applications, as I already mentioned in the article about , it is not JavaScript code that is used for tracking but a so-called software development kit (SDK). Statistical data collected in mobile applications is sent not continuously but in batches at certain intervals.
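Purely for illustration, here is a rough Python sketch of the kind of hit payload a counter or mobile SDK assembles and then ships in batches. The collector URL and field names are invented for the example and are not the real Google Analytics or Yandex Metrica endpoints.

```python
import json
import time
import urllib.request

COLLECTOR_URL = "https://collector.example.com/hits"  # hypothetical endpoint

def make_hit(page, referrer, client_id, screen, user_agent):
    """Assemble the kind of data a tracking snippet typically reports."""
    return {
        "cid": client_id,        # visitor id, usually read from a cookie
        "page": page,            # which page was viewed
        "ref": referrer,         # where the visitor came from
        "screen": screen,        # e.g. "1920x1080"
        "ua": user_agent,        # browser / OS string
        "ts": int(time.time()),  # when the hit happened
    }

def send_batch(hits):
    """Send accumulated hits in one request, the way mobile SDKs batch events."""
    body = json.dumps(hits).encode("utf-8")
    req = urllib.request.Request(
        COLLECTOR_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example: accumulate a few hits, then flush them as a single batch.
batch = [
    make_hit("/blog/post", "https://google.com/", "abc-123", "1920x1080", "Mozilla/5.0"),
    make_hit("/pricing", "/blog/post", "abc-123", "1920x1080", "Mozilla/5.0"),
]
print(json.dumps(batch, indent=2))
# send_batch(batch)  # uncomment once you have a real collector endpoint
```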

All statistical data collected by the tracking code and processed will be available to you in the form of reports in the web interface of the analytics system. In Google Analytics, all reports are built from combinations of dimensions and metrics.

However, in order for us to speak the same language in the future, it will be necessary to give definitions to the basic concepts (terms) that we will use. In general, I talked about them in an article about Google Analytics (see the link at the beginning of this post), but it doesn’t hurt to repeat it.

When analyzing website statistics, three main concepts are used: hits, sessions and users. All collected statistical data in any analytics system is organized hierarchically in a three-tier system. Hits are at the very bottom, sessions are located a little higher, well, and users are at the very top.

Thus, hits are an integral part of a session (the set of actions performed during a given user's visit to the site), and the set of sessions characterizes the user's behavior on the site (how many times they visit the site and how long their visits last). Let's look at all this in more detail:
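To make this hierarchy concrete, here is a small sketch that groups one user's hits into sessions using a 30-minute inactivity timeout (the default session gap in many analytics systems, Google Analytics included); the timestamps are made up.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # a gap longer than this starts a new session

def split_into_sessions(hit_times):
    """Group a single user's hit timestamps into sessions by inactivity gap."""
    sessions = []
    for t in sorted(hit_times):
        if sessions and t - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(t)   # same session: continue it
        else:
            sessions.append([t])     # too long a pause: a new session begins
    return sessions

# Made-up hits of one visitor over a day.
hits = [
    datetime(2023, 10, 10, 9, 0),
    datetime(2023, 10, 10, 9, 5),
    datetime(2023, 10, 10, 9, 20),
    datetime(2023, 10, 10, 14, 0),   # long pause -> second session
    datetime(2023, 10, 10, 14, 10),
]
sessions = split_into_sessions(hits)
print(len(hits), "hits ->", len(sessions), "sessions for 1 user")
# 5 hits -> 2 sessions for 1 user
```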

Let's take a closer look at cookies. These are small pieces of data in text format that are stored by the browser. They are often used as a mechanism for remembering the visitor and their preferences: storing the settings they made on the site, authorization parameters, and so on. When the visitor returns to the site, the browser reads the cookies recorded for it, and the visitor lands in the familiar interface without having to log in again every time a page is refreshed.

Cookies can be divided into two types - primary (cookies of the site visited by the visitor) and third-party (they do not belong to this site, but are present on the open page). An example of a third-party cookie source would be a banner that is displayed on a page but loaded from a third-party server. In browser settings, acceptance of third-party cookies can be disabled, which, in fact, many do.
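As a small illustration of how a first-party cookie lets a site recognize a returning visitor, here is a sketch built on Python's standard http.cookies module; the cookie name and lifetime are arbitrary choices for the example.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # arbitrary first-party cookie name

def get_or_create_visitor_id(cookie_header: str):
    """Read the visitor id from the Cookie request header, or mint a new one."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None  # returning visitor, nothing to set

    # New visitor: generate an id and prepare a Set-Cookie response header.
    visitor_id = str(uuid.uuid4())
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # remember for a year
    out[COOKIE_NAME]["path"] = "/"
    return visitor_id, out.output(header="Set-Cookie:")

# First visit: no Cookie header, so we issue one.
vid, set_cookie = get_or_create_visitor_id("")
print(set_cookie)
# Second visit: the browser sends the cookie back and we recognize the visitor.
print(get_or_create_visitor_id(f"{COOKIE_NAME}={vid}")[0] == vid)  # True
```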

Nevertheless, the space allotted for one article has been exhausted, so we will continue the conversation about cookies, and about everything else that will allow us to master the science of meaningful collection of site statistics and of working with the reports built on them, in subsequent publications of the "" section.

Good luck to you! See you soon on the pages of this blog.

You can watch more videos by going to
");">


When drawing up a character map, marketers feel like real detectives. In order to "think like the customer," they try to analyze every possible source of data: they interview friends and colleagues, trying to get inside the heads of the target audience, while the more advanced look at statistics in Yandex.Metrica and social networks and study queries in Wordstat. Today we will analyze in detail how and where to dig up this "secret" information about users, give several scenarios for collecting reliable data about them, and show how to transfer this information to a character map.

There are several articles on the Convert Monster blog about researching and preparing a character map, but from the experience of two runs of the "" course we realized that it is precisely the collection of information about potential customers that causes the greatest difficulty. And even when some statistics have been accumulated and there are real reviews, it is not always clear what to look for and what questions to ask users.

5 Sources of Target Audience Data You Must Use

In this article we focus on analyzing existing data, i.e. the article will be useful to those who have accumulated at least some initial audience statistics, have groups on social networks, and have regular "access to the body" of customers in the form of a working sales department.

Below is a detailed analysis of 5 sources of data on the target audience and of how to transfer the information obtained to the character map.

The core of the character map provides answers to the following items:

  1. Gender and demographic data;
  2. Emotional state/Interests;
  3. Purpose of purchase or problem;
  4. Purpose of visiting the site;
  5. The main decision-making factors;
  6. Additional decision-making factors;
  7. Objections.

Fig. 1. Character map template.

How to analyze the existing audience? Where to collect information? How to answer these questions? No need to reinvent the wheel - start with the sources you have at hand. Consider several methods for analyzing customer needs:

Incoming calls

Fig. 2. Is this what identifying needs looks like in your sales department?

What you should pay attention to when analyzing incoming calls:

  1. What are the most common customer calls?
  2. How does he formulate them?
  3. What solutions have you already tried?
  4. Why didn't it help?
  5. Objections

The sales department and recordings of telephone conversations will help you collect this information. Moreover, the questions above are standard questions for identifying needs, so you don't need to do anything criminal: a potential client will not even realize that you are collecting information.

It is worth analyzing the call records, paying attention to the client's wording, objections and his experience before the purchase, in order to trace the logic of the decision.

What information can be gleaned from phone calls: the problem/purpose of the purchase, the key decision-making factors, and a list of objections that you will address on the landing page.

Collect feedback from existing customers

Call 10 real clients and ask them to answer 6 short questions:

  1. What problem was the client trying to solve?
  2. What solutions has he already tried?
  3. Why didn't it help?
  4. Why did you decide to contact you?
  5. What was the decisive factor in the purchase?
  6. What result did you get?

To make customers more willing to leave reviews, you can publish a mutually beneficial case-study post on your blog, where you share the customer's review and link to their project. You get +100 to your reputation, and the client gets additional visits to their site. Here's how we implemented it for the .

Poll in mailing list

Do you collect a database of emails and regularly “feed” it with useful content? A survey in the mailing list will help you kill 2 birds with one stone:

  1. Segment mailings by interests to make them more targeted and increase performance;
  2. and get feedback from the client in terms of the quality of your newsletter and the company as a whole;

How to create a poll? There are several ways:

  • Use Google Forms and put a link to the survey in the letter;
  • Or use the built-in functionality of mailing services: GetResponse, for example, lets you create a survey right inside the service, without involving third-party solutions.

There are general rules to keep in mind when creating a survey: no more than 10 questions (the fewer and the more precisely formulated, the better); answer options should be prepared in advance; the last question can be left open, giving respondents the opportunity to answer in their own words. In exchange for completing the survey, offer a bonus and explain why you need it ("so we can send you only what you are interested in", etc.).

Analysis of groups in social networks

What data can be obtained from social networks? At your service is a complete set of information for compiling a mind-map for the characters, a platform where you ask a question and get an answer to it. In a word, direct "access to the body" of your potential customers.

VKontakte

First of all, it is worth studying the statistics of your group. What obvious data is visible in the statistics:

  • Gender/Age;
  • Geography (countries and cities);
  • Devices (the ratio between views from a computer and a mobile device);
  • Sources of referrals;

It will also be useful to conduct a semantic analysis (pull out the most popular keywords, for example, using the Advego service) and understand the interests of the audience. To do this, load the audiences saved in Cerebro or TargetHunter into "groups where the target audience is present", set the number of participants (1,000-50,000), copy the group names, paste them into Advego, and get a list of interest keywords.
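If you would rather not paste thousands of group names into Advego, roughly the same word-frequency count can be done locally. The sketch below assumes you have exported the group names to a plain text file (one name per line, a hypothetical groups.txt); the stop-word list is only a starting point.

```python
import re
from collections import Counter

# A tiny example stop-word list; extend it for real use (and for your language).
STOP_WORDS = {"and", "the", "for", "of", "in", "on", "club", "group", "official"}

def interest_keywords(path, top=30):
    """Count the most frequent words in exported community names."""
    words = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            for word in re.findall(r"[\w-]+", line.lower()):
                if len(word) > 2 and word not in STOP_WORDS:
                    words[word] += 1
    return words.most_common(top)

# "groups.txt" is a hypothetical export of group names from Cerebro/TargetHunter.
for word, count in interest_keywords("groups.txt"):
    print(count, word)
```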

Fig. 3. An example of the Cerebro Target service interface.

Thus, in addition to socio-demographic data, from VK you can extract the audience's interests, understand the key decision-making factors, and find out whom people look to when choosing (you can even identify opinion leaders).

Facebook

Facebook offers Audience Insights. From there you can even get hidden information, because the service shows interests that were not explicitly indicated in the account (it infers them from likes).

The sequence of steps:

  1. Select the audience you want: all Facebook users (for broad interests and competitor research), users associated with your page (your current audience), or a custom audience (upload an email database). You can specify audience parameters: gender, age, etc.
  2. In the interests field, enter specific pages (popular places, names, etc.) and set the desired region.

For the selected group we can then analyze the detailed demographic composition of the audience and its geography, and see fields of activity, the most popular page categories, user activity, and the devices used.

Fig. 4. Screenshot from the Audience Insights service: we can evaluate the gender and age composition of the audience and see the largest segment.

Yandex.Metrica

Yandex.Metrica allows you to get fairly complete information about the target audience of a site. In Metrica's reports you can find information about users' geography, gender and age characteristics, long-term interests (these let you see typical search queries and user behavior on the web), and so on.

Fig. 5. Yandex.Metrica interface.

Standard report "Geography"

Reports > Standard Reports > Visitors > Geography

Gender and age characteristics: the Age and Gender reports

Reports > Standard Reports > Visitors > Age

Reports > Standard Reports > Visitors > Gender

Long-term interests

Reports > Standard Reports > Visitors > Long-Term Interests

To understand which segment of the audience is most interested in your product/service and create your own customized report, use the "Group" tool in any of the above reports. It allows you to see: audience activity and engagement level, conversions, traffic sources for each audience segment.
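If you prefer to slice the numbers outside the web interface, one option is to export a Metrica report to CSV and group it yourself, for example with pandas. The file name and column names below are assumptions about such an export; adjust them to whatever your file actually contains.

```python
import pandas as pd

# Hypothetical export: one row per visit with demographic and behaviour columns.
df = pd.read_csv("metrica_visits.csv")  # assumed columns: gender, age_bucket, visits, conversions, bounce

segments = (
    df.groupby(["gender", "age_bucket"])
      .agg(visits=("visits", "sum"),
           conversions=("conversions", "sum"),
           bounce_rate=("bounce", "mean"))
      .assign(conversion_rate=lambda d: d["conversions"] / d["visits"])
      .sort_values("conversion_rate", ascending=False)
)
print(segments.head(10))  # which gender/age segments convert best
```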

Conclusion

The most complete information about the user comes from live communication: calls, collecting feedback, and analyzing activity on social networks. That is where we get detailed information about the purpose of the purchase (the problem) and about the key and additional decision-making factors, and where we identify objections.

Socio-demographic data, gender and age characteristics, geography, devices used, interests are best collected through web services (Yandex.Metrica, Google Analytics) and services that collect statistics on social networks (Cerebro, Targethunter, Audience Insights on Facebook).

In the next article, we will look at how to collect information if you do not already have a real audience, and in particular:

  1. How to work with wordstat correctly and squeeze the maximum information about your characters out of it;
  2. How to analyze blogs, forums and social media posts. And how to formulate an offer with the help of a thorough analysis of the blogosphere and social networks.
  3. How to conduct a simple competitive analysis and what to look for first.

If you have any questions about collecting information about the target audience of your landing page - welcome in the comments to the article!
Or order a landing page from us and we will do everything ourselves!)

Suppose a company or a bank needs to understand who their customers are, who uses their products. Where will you get information?

There are actually a lot of sources that can tell you something about a client. Firstly, the texts on their social network pages: about two hundred words written by a client are usually enough to determine their psychotype. Secondly, the photos that people post on Instagram and Facebook, and the captions to them, speak volumes. For example, extroverts like bright, dynamic photos and images of people. Introverts, on the contrary, prefer photographs of objects and use a calmer color scheme.

In addition, any bank or big company analyzes the feedback on its mailings: it carefully observes which messages you reacted to, and how, and which ones you ignored.

Another source is the client's so-called transactional behavior. What do they spend money on, and where? Introverts, for example, buy a lot in home-and-garden stores and bookstores, and they don't skimp on insurance. Extroverts spend more money in bars and restaurants and buy concert tickets.

It is also important whether the client spends all the money to the penny or prefers to make savings. We use any information that can help in any way.

We analyze the income and expenses of the client for about six months - this is enough to create his profile
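Purely as a toy illustration of the idea described above (and in no way DataSine's actual model), this is how one might tally six months of card transactions by category and lean toward an "introvert" or "extrovert" guess; the categories and the comparison rule are invented.

```python
from collections import Counter

# Invented category lists for the sketch; a real model would be far richer.
INTROVERT_CATEGORIES = {"bookstore", "home_garden", "insurance"}
EXTROVERT_CATEGORIES = {"bar", "restaurant", "concert", "travel"}

def guess_psychotype(transactions):
    """transactions: list of (category, amount) tuples over ~6 months."""
    spend = Counter()
    for category, amount in transactions:
        spend[category] += amount
    intro = sum(spend[c] for c in INTROVERT_CATEGORIES)
    extra = sum(spend[c] for c in EXTROVERT_CATEGORIES)
    if intro == extra == 0:
        return "unknown"
    return "introvert" if intro > extra else "extrovert"

print(guess_psychotype([("bookstore", 40), ("bar", 25), ("home_garden", 90)]))
# -> introvert
```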

What if there are no transactions? If a person withdraws all the money immediately after the salary and then pays in cash?

Of course, there are "difficult" clients. But the majority, 75-80%, we can still "figure out". Nowadays almost everyone has a bank card. And not everyone, as you say, withdraws cash immediately: most still prefer to keep money on their cards and pay with them.

It is more convenient to buy a plane ticket online than to look for an airline's office in the city. It is more comfortable to buy a dress in an online store than to spend an hour getting to a shopping center and then another half a day shopping to find an outfit you like. We analyze the client's income and expenses over roughly six months; this is enough to create their profile.

And this information also affects whether the bank will give a loan or not, right?

Yes, including this.

But then how can you explain the fact that one of my acquaintances, who has not officially worked for 4 years and receives royalties in cash, is constantly given small loans by the bank, and another acquaintance with an official income of $1,000 is denied a loan of $5,000 by the bank? What's the catch here?

I don't know which banks you're talking about, so it's hard for me to say why.

Let's formulate the question differently. What psychotype of the client is most beneficial for the bank?

It all depends on the bank and the products it offers. Different people need different books, different food. And various banking products. For example, extroverts need travel insurance because they travel frequently. On the other hand, our company uses technologies that will inspire introverts to consider purchasing insurance.

At the conference, you said that big data does not harm people.

No, it can cause serious damage if used incorrectly. But at DataSine we do everything to ensure that information is used strictly for its intended purpose. If a client company raises any suspicions, we will not cooperate with it, or we will limit the amount of information provided.

My colleagues and I are working to ensure that people receive only those emails with product offers that they really need.

Actually, why did I come to work for this company? Because I was tired of receiving non-personalized e-mails, tired of all the spam that landed in my inbox with no regard for my personality type or my needs.

My colleagues and I are working to reduce spam so that people receive only those emails with product offers that they really need or can use. We use all the information received only for this purpose - in no case to the detriment of the client.

By the way, the European Union already has a regulation on the protection of personal data. I think other governments should follow our example.

What data, in your opinion, is better not to post on the Internet, not to make it public?

Definitely medical information. It should not be disclosed anywhere; it must not be published or monetized. People themselves must decide what information should be public and what should not.

About the expert

Yorgan Callebaut - Member of the British Psychological Society (BPS), Head of Psychology at DataSine, where he researches big data and its impact on personality. He was at the forefront of using big data to personalize banks' marketing campaigns in Europe, the UK, and Russia.

In the previous article, we considered data quality issues (“On Data Quality and Common Mistakes in Data Collection” on Habré).
Today I want to continue talking about data quality and discuss data collection: how to prioritize when choosing a source, how and what data to collect, assessing the value of data for a company, and more.

Collect everything

Have you decided to improve the design and checkout flow on your site?
Great, but how does the buyer actually put a cart together? At what point do they make the final choice of goods: before adding them to the cart or right before paying for the purchase?
Every site is different, but how do your customers behave?
If you have order data, you can analyze it and choose an update vector that will be convenient not only for you but also for your users.


Gather all the data you can get your hands on. You will never know with 100% certainty which data you might need, and the opportunity to collect it may come only once.

The more data you collect, the more information about users you will have, and more importantly, you will be able to understand and predict the context of their actions.
The context helps to better understand your client, his desires and intentions, and the better you know your client, the better you can fulfill his personal needs, which means increasing loyalty and increasing the likelihood of a client returning.

Today, collecting absolutely all data is no longer such a rarity, especially in online projects. In a company that maximizes data collection and knows how to work with it, almost all activities will be based on data: marketing, sales, personnel management, updates and improvements, deliveries.
Each area has internal and external data sources in different formats and of different quality.

This is good for the work of analysts and decision making, but it also raises the problem of storing this array of data and processing it. Each action increases the financial burden and the positive effect of owning data can turn into a “headache”.

To make a decision on the appropriateness of collecting and processing certain data, it is necessary to understand their main characteristics. Let's go through them briefly:

Volume
An indicator that affects the financial costs of storing and changing data and the time costs of processing it. And although the unit cost of storage decreases as the amount of data grows, given the ever-increasing number of sources the total financial burden may become unreasonable.

Diversity
A diverse set of data sources gives a more complete picture and helps to better understand the context of user actions, but the flip side of the coin is the variety of formats and the cost of integrating them into your analytics system. It is not always possible to collect all the data together, and when it is possible, it is not always necessary.

Speed
How much data needs to be processed per unit of time?
Recall the recent US presidential election - thanks to the fast processing of Twitter messages, it was possible to understand the mood of voters during the debate and adjust their course.

It took data giants like Facebook and Google a huge amount of time to achieve today's results, but thanks to this they now have data on every user and can predict their actions.
A common problem for personnel working with data is limited resources, primarily financial and human resources.
In most companies, analysts have to prioritize their data sources, and thereby give up some of them.
In addition, it is necessary to take into account the interests of the business, which means assessing the return on investment in working with data and the possible impact of data on the company.

Priorities and selection of data sources

With limited resources in working with data, specialists have to prioritize and make a choice between sources.
What should guide this choice, and how do you determine the value of data for the company?

The main goal of the work of analysts is to provide information necessary to other departments in a quality and timely manner. This information has a direct impact on the efficiency of the company and the work of departments.

Each department has its own "master" data type.
For the customer service department, the client's contacts and social network data are important; for the marketing department, the purchase history and the map of actions.
So it turns out that each team has its own set of "very important data", which is, of course, more important and necessary than anyone else's.

But the importance and necessity of the data does not make the problem of limited resources disappear, which means you have to set priorities and act in accordance with them. The main factor for prioritizing data is ROI, but do not forget about accessibility, completeness, and quality.
Here is a list of some metrics that can help you prioritize:

List of options for prioritization

High
Cause: The data is needed immediately.
Explanation: If a department has an urgent need for data with tight deadlines, such data is provided first.

High
Cause: Data adds value.
Explanation: Data increases profits or reduces costs, delivering high ROI.

High
Cause: Different teams require the same data.
Explanation: By satisfying the data needs of multiple teams, you increase ROI.

High
Cause: Short term or streaming data.
Explanation: Some interfaces and protocols give only a time-limited "window" for data collection, so you should hurry.

Medium
Cause: An addition to an existing dataset that enhances its quality.
Explanation: The new data complements the existing data and improves understanding of the context of actions.

Medium
Cause: The data processing code can be reused.
Explanation: Reusing familiar code reduces costs (and thus raises ROI) and reduces the number of possible errors.

Medium
Cause: The data is easily accessible.
Explanation: If the data is valuable, and it's easy to get it, go ahead.

Medium
Cause: A convenient API allows you to collect data for past periods.
Explanation: If the data was not needed yesterday and you can always access it, you should not give it too high a priority.

Low
Cause: Analysts have access to data or other means of obtaining it.
Explanation: If analysts already have access to the data, then perhaps there are higher priority tasks.

Low
Cause: Poor data quality.
Explanation: Low-quality data can be useless and sometimes harmful.

Low
Cause: Requires extraction from web pages.
Explanation: Processing such data can be quite complex and require undue effort.

Low
Cause: Low likelihood of data being used.
Explanation: Data that would be nice to have, but if it is not there, that is fine too.
(Then again, with this data you could go rob caravans!)


As we can see, not all data has to be provided "right now", which means that it is necessary to set priorities and follow them.
It is important to maintain a balance between the acquisition of new data and its value to the company.
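As a hedged sketch of how such a priority list could be made operational, here is a small scoring function. The trait names and weights are invented for illustration and only loosely mirror the reasons listed above.

```python
# Made-up weights loosely following the High/Medium/Low reasons listed above.
WEIGHTS = {
    "urgent_request": 3,       # a department needs the data immediately
    "adds_value": 3,           # raises profit or cuts costs (high ROI)
    "shared_by_teams": 2,      # several teams want the same data
    "streaming_window": 2,     # the collection "window" is time-limited
    "enriches_existing": 1,    # complements a dataset we already have
    "code_reusable": 1,        # existing processing code can be reused
    "easily_accessible": 1,    # cheap to obtain
    "poor_quality": -2,        # low-quality data may be useless or harmful
    "needs_scraping": -1,      # extraction from web pages is laborious
    "unlikely_to_be_used": -2, # nice to have, nothing more
}

def priority_score(source_traits):
    """Sum the weights of all traits that apply to a data source."""
    return sum(WEIGHTS[t] for t in source_traits)

sources = {
    "CRM export": {"adds_value", "shared_by_teams", "easily_accessible"},
    "Competitor price scraping": {"needs_scraping", "unlikely_to_be_used"},
    "Clickstream API": {"urgent_request", "streaming_window", "code_reusable"},
}
for name, traits in sorted(sources.items(), key=lambda kv: -priority_score(kv[1])):
    print(priority_score(traits), name)
```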

Data relationship

You get important data from sales, marketing, logistics, and customer feedback, but the biggest value comes from establishing connections between different types of data.

For example, consider Diana and her order. She recently ordered a set of outdoor furniture; matching her order with analytics data, we can see that she spent 30 minutes on the site and looked at 20 different sets. This means that she chose the furniture right on the site, not knowing in advance what she would order.
We look at where she came from: search results.

If we had information about Diana's other purchases, we would know that she has often bought household goods over the past month.
Frequent online purchases and the use of search engines to find online stores indicate low brand loyalty, which means it will be difficult to persuade her to buy again.

So, with each new level of information, an individual portrait of the user is built up, from which you can learn about their life, attachments, and habits, and predict their behavior.
We add information from checkout and see that this is a woman, and from the delivery address we see that she lives in an area of private houses.

By continuing the analysis, you can find information about her house and plot, predict her needs, and make a proactive offer.
With the right analysis of the data the offer can work: we will persuade the client to buy again and increase her loyalty through an individual approach.

Offering a discount for inviting a friend from a social network would give us access to her friend list and account information; it would then be possible to continue the individual marketing approach and create targeted advertising for her, although this is unlikely to be cost-effective.
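Here is a hedged pandas sketch of the kind of join described in this section: matching orders with web-analytics sessions through a shared client id. The file names and columns are invented for the example.

```python
import pandas as pd

# Hypothetical exports: one file with orders, one with analytics sessions.
orders = pd.read_csv("orders.csv")       # assumed columns: client_id, order_id, category, amount
sessions = pd.read_csv("sessions.csv")   # assumed columns: client_id, duration_min, pages_viewed, source

# Connect the two data types through the common client_id.
joined = orders.merge(sessions, on="client_id", how="left")

# Diana-style insight: long sessions with many viewed items before a purchase
# suggest the choice was made on the site, not in advance.
browsers = joined[(joined["duration_min"] >= 30) & (joined["pages_viewed"] >= 20)]
print(browsers.groupby("source")["order_id"].nunique())  # where such buyers come from
```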

Collection and purchase of data

Today there are many ways to collect data; one of the most common is via an API. But besides collecting data, you need to keep it up to date, and here everything depends on the volume.

For small amounts of data (up to 100 thousand rows) it is easier to simply replace everything with fresh values, but for large arrays a partial update makes more sense: adding new values and deleting obsolete ones.
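A minimal sketch of that partial-update idea, assuming the existing array is keyed by a record id: merge the fresh batch in, overwriting changed records and dropping obsolete keys.

```python
def partial_update(store, fresh_batch, obsolete_keys):
    """Upsert fresh records into the existing store and drop obsolete ones.

    store: dict mapping record id -> record (the big existing array)
    fresh_batch: dict of new/changed records from the latest API pull
    obsolete_keys: ids that the source says no longer exist
    """
    store.update(fresh_batch)          # add new values, overwrite changed ones
    for key in obsolete_keys:
        store.pop(key, None)           # delete obsolete values, ignore missing
    return store

store = {1: {"price": 10}, 2: {"price": 20}, 3: {"price": 30}}
store = partial_update(store, {2: {"price": 25}, 4: {"price": 40}}, obsolete_keys=[3])
print(store)  # {1: {'price': 10}, 2: {'price': 25}, 4: {'price': 40}}
```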

Some data arrays are so huge that processing all of them would be too expensive for the company; in such cases a sample is taken and the analytics is carried out on it. "Simple random sampling" is often practiced, but the data collected this way is frequently not representative and can be little better than a coin toss.
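For illustration, here is a sketch contrasting a simple random sample with a stratified one that preserves each segment's share; the population is synthetic and the "segment" field is an assumption about your data.

```python
import random

random.seed(0)

# Synthetic population: 95% "regular" visitors, 5% "buyers".
population = [{"segment": "buyer" if i % 20 == 0 else "regular"} for i in range(100_000)]

def share_of_buyers(rows):
    return sum(r["segment"] == "buyer" for r in rows) / len(rows)

# Simple random sample: on small samples the rare segment's share can be badly off.
simple = random.sample(population, 200)

# Stratified sample: draw from each segment proportionally to its true share.
def stratified_sample(rows, size):
    by_segment = {}
    for r in rows:
        by_segment.setdefault(r["segment"], []).append(r)
    sample = []
    for seg_rows in by_segment.values():
        k = max(1, round(size * len(seg_rows) / len(rows)))
        sample.extend(random.sample(seg_rows, k))
    return sample

stratified = stratified_sample(population, 200)
print("true share:", share_of_buyers(population))
print("simple random:", share_of_buyers(simple))
print("stratified:", share_of_buyers(stratified))
```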

An important question: should you collect raw or aggregated data?
Some data providers offer ready-made, pre-aggregated collections, but these have several disadvantages. For example, they may lack the very values that would add value to analytics built on this data, and you will not be able to collect or supplement them yourself. On the other hand, data collected by third-party aggregators is convenient to archive and store, and it significantly saves time and human resources.

But if it is possible to collect raw data, it is better to choose it: it is more complete, you can aggregate it yourself in accordance with your needs and business requirements, and then work with it however you need.

Many companies collect data on their own and also use data available from open sources. But in some cases they are forced to pay a third party for the data they need. Sometimes the choice of where to acquire data is limited, sometimes it is not, but in any case, when choosing a data source and deciding to acquire data, you should pay attention to several factors:

Price
Everyone loves free data, both management and analysts, but sometimes high-quality information is only available for money. In that case, you should weigh the rationality of the acquisition and compare the cost of the data with its value.

Quality
Is the data clean, can it be trusted?

Exclusivity
Is the data prepared individually for you or available to everyone? Will you gain an advantage over your competitors if you use them?

Sample
Is it possible to get a sample to assess data quality prior to acquisition?

Updates
What is the lifespan of the data, how quickly does it become obsolete, will it be updated and how often?

Reliability
What are the limitations of data receiving interfaces, what other limitations can be imposed on you?

Security
If the data is important, will it be encrypted, and how secure are the protocols? Also, do not forget about the security of its transfer.

Terms of Use
Licensing or other restrictions. What can prevent you from taking full advantage of the data?

Format
How comfortable is it for you to work with the format of the acquired data? Is it possible to integrate them into your system?

Documentation
If you are provided with documentation - good, but if not, then it is worth asking how data is collected to assess its value and reliability.

Volume
If there is a lot of data, can you ensure its storage and processing? Valuable data will not always be voluminous, and vice versa.

Degree of detail
Is this data appropriate for the level of analytics you need?

These are not all the questions, but they are the main and undoubtedly important ones to consider before purchasing data from vendors.