Factor Analysis in Statistics. Factor Analysis in Statistical Packages (Statistica, Statgraphics) and Factor Analysis for Beginners

Factor analysis is a statistical method used to process large amounts of experimental data. Its tasks are to reduce the number of variables (data reduction) and to determine the structure of the relationships between variables, i.e. to classify them. Factor analysis is therefore used either as a data-reduction method or as a method of structural classification.

An important difference between factor analysis and the methods described above is that it cannot be applied to primary, or, as they say, "raw" experimental data, i.e. data obtained directly from examining the subjects. The material for factor analysis is correlations, more precisely Pearson correlation coefficients, computed between the variables (i.e., psychological characteristics) included in the survey. In other words, factor analysis is applied to correlation matrices, also called intercorrelation matrices. The names of the columns and rows of such a matrix are the same, since both list the variables included in the analysis. For this reason intercorrelation matrices are always square (the number of rows equals the number of columns) and symmetric (cells in mirror positions with respect to the main diagonal contain the same correlation coefficients).
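Since the input to factor analysis is an intercorrelation matrix rather than raw data, it can help to see how such a matrix arises from raw scores. A minimal sketch in Python with hypothetical data (NumPy's corrcoef computes Pearson coefficients):

```python
import numpy as np

# Hypothetical scores of 5 subjects on 3 psychological variables
# (rows = subjects, columns = variables).
scores = np.array([
    [12, 7, 30],
    [15, 9, 28],
    [11, 6, 35],
    [18, 11, 22],
    [14, 8, 31],
], dtype=float)

# Pearson intercorrelation matrix: variables correlated with each other.
# np.corrcoef expects variables in rows, so transpose.
R = np.corrcoef(scores.T)

# The matrix is square and symmetric, with ones on the main diagonal.
assert R.shape == (3, 3)
assert np.allclose(R, R.T)
assert np.allclose(np.diag(R), 1.0)
print(np.round(R, 2))
```

The assertions check exactly the two properties mentioned above: the matrix is square and symmetric about the main diagonal.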

The central concept of factor analysis is the factor. A factor is an artificial statistical indicator that results from special transformations of the table of correlation coefficients between the studied psychological characteristics, i.e. the intercorrelation matrix. The procedure for extracting factors from an intercorrelation matrix is called matrix factorization. Factorization can extract from one factor up to a number of factors equal to the number of original variables. However, the factors identified as a result of factorization are, as a rule, unequal in importance. (5)

With the help of the identified factors, the interdependence of psychological phenomena is explained. (7)

Most often, factor analysis identifies not one but several factors that explain the matrix of intercorrelations of the variables to different degrees. Such factors are divided into general, common, and unique. General factors are those whose factor loadings are all significantly different from zero (a zero loading indicates that the variable is not connected with the others and has no effect on them). Common factors are those in which only part of the factor loadings differ from zero. Unique factors are those in which only one of the loadings differs significantly from zero. (7)

Factor analysis may be appropriate if the following criteria are met.

  • 1. Qualitative data obtained on a nominal scale (a scale of names), such as hair color (black / brown / red), cannot be factorized.
  • 2. All variables must be independent, and their distribution must be close to normal.
  • 3. Relationships between variables should be approximately linear, or at least not clearly curvilinear.
  • 4. The original correlation matrix should contain several correlations with absolute value greater than 0.3. Otherwise, it is quite difficult to extract any factors from the matrix.
  • 5. The sample of subjects should be large enough. Expert advice varies. The strictest view recommends not using factor analysis if the number of subjects is less than 100, since the standard errors of the correlations would then be too large.

However, if the factors are well defined (for example, with loadings of 0.7 rather than 0.3), the experimenter needs a smaller sample to isolate them. In addition, if the data are known to be highly reliable (for example, if valid tests are used), then a smaller number of subjects can be analyzed. (5)
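Criterion 4 above (several correlations with absolute value above 0.3) is easy to check programmatically. A small sketch with a hypothetical intercorrelation matrix:

```python
import numpy as np

# A hypothetical 4x4 intercorrelation matrix.
R = np.array([
    [1.00, 0.45, 0.10, 0.52],
    [0.45, 1.00, 0.05, 0.38],
    [0.10, 0.05, 1.00, 0.12],
    [0.52, 0.38, 0.12, 1.00],
])

# Count off-diagonal correlations with |r| > 0.3 (each pair counted once,
# using the upper triangle above the main diagonal).
iu = np.triu_indices_from(R, k=1)
strong = np.sum(np.abs(R[iu]) > 0.3)
print(strong)  # 3 pairs exceed 0.3 here
```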

To analyze the variability of a trait under the influence of controlled variables, analysis of variance (the dispersion method) is used.

To study the relationship between values, the factorial method is used. Let us consider these analytical tools in more detail: the factorial method, one-way analysis of variance, and two-way analysis of variance for assessing variability.

ANOVA in Excel

Conditionally, the goal of analysis of variance can be formulated as follows: to isolate from the total variability of a parameter three particular variabilities:

  • 1 - that determined by the action of each of the studied values;
  • 2 - that dictated by the relationship between the studied values;
  • 3 - the random part, dictated by all unaccounted-for circumstances.

In Microsoft Excel, analysis of variance can be performed using the "Data Analysis" tool (tab "Data" - "Analysis"). This is a spreadsheet add-in. If the add-in is not available, open "Excel Options" and enable the Analysis ToolPak.

Work begins with the design of the table. Rules:

  1. Each column should contain the values ​​of one factor under study.
  2. Arrange the columns in ascending/descending order of the value of the parameter under study.

Consider the analysis of variance in Excel using an example.

The firm's psychologist analyzed, using a special technique, the behavior strategies of employees in conflict situations. It is assumed that behavior is influenced by the level of education (1 - secondary, 2 - specialized secondary, 3 - higher education).

Enter data into an Excel spreadsheet:


The significant parameter is highlighted in yellow. Since the p-value for the between-groups factor is greater than the significance level of 0.05, Fisher's test cannot be considered significant. Consequently, behavior in a conflict situation does not depend on the level of education.
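Outside Excel, the same one-way ANOVA can be computed by hand. The sketch below uses made-up scores for the three education groups (illustrative numbers, not the ones from the screenshot) and builds the F statistic from between-group and within-group sums of squares:

```python
import numpy as np

# Hypothetical conflict-strategy scores for three education groups.
groups = [
    np.array([4.0, 5.0, 3.0, 6.0, 5.0]),   # secondary
    np.array([5.0, 6.0, 4.0, 5.0, 7.0]),   # specialized secondary
    np.array([6.0, 5.0, 7.0, 4.0, 6.0]),   # higher
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)          # number of groups
n = len(all_values)      # total number of observations

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares and the F statistic.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
F = ms_between / ms_within
print(round(F, 3))  # 1.077 for these made-up data: far from significant
```

A small F like this one matches the conclusion above: the between-group variability does not stand out against the within-group variability.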



Factor analysis in Excel: an example

Factor analysis is a multivariate analysis of the relationships between the values of variables. With this method, the most important tasks can be solved:

  • describe the measured object comprehensively yet compactly;
  • identify hidden (latent) variables that account for the linear statistical correlations;
  • classify variables (determine the relationships between them);
  • reduce the number of required variables.

Consider an example of factor analysis. Suppose we know the sales of certain goods for the last 4 months. It is necessary to analyze which items are in demand and which are not.



Now you can clearly see which products' sales provide the main growth.
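The sales example can be mimicked without Excel. The sketch below uses hypothetical product names and monthly figures and simply ranks products by their contribution to total growth over the period:

```python
# Hypothetical monthly sales (4 months) for three products.
sales = {
    "Product A": [120, 130, 150, 180],
    "Product B": [200, 195, 190, 185],
    "Product C": [80, 85, 90, 100],
}

# Growth over the period and each product's share of the total growth.
growth = {name: s[-1] - s[0] for name, s in sales.items()}
total_growth = sum(growth.values())

# Rank products from the biggest contributor to the smallest.
for name, g in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: growth {g:+d}, share of total {g / total_growth:.0%}")
```

With these made-up numbers, Product A drives the growth while Product B declines, which is exactly the kind of picture the Excel example is meant to reveal.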

Two-way analysis of variance in Excel

Shows how two factors affect the change in the value of a random variable. Consider two-way analysis of variance in Excel using an example.

Task. A group of men and women were presented with sounds of different volumes: 1 - 10 dB, 2 - 30 dB, 3 - 50 dB. The response time was recorded in milliseconds. It is necessary to determine whether gender affects the response time, and whether loudness affects it.

Methods for studying the influence of individual factors on a generalizing economic indicator are called factor analysis. The main varieties of factor analysis are deterministic analysis and stochastic analysis.

Deterministic factor analysis is based on a methodology for studying the influence of factors whose relationship with the generalizing economic indicator is functional. The latter means that the generalizing indicator is either a product, a quotient, or an algebraic sum of the individual factors.
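One common technique of deterministic factor analysis for a multiplicative model is chain substitution, in which the factors are replaced with their reporting-period values one at a time. A sketch with hypothetical figures (output = number of workers x output per worker):

```python
# A purely functional model: output = workers * productivity.
workers_0, productivity_0 = 100, 50.0     # base period
workers_1, productivity_1 = 110, 52.0     # reporting period

y0 = workers_0 * productivity_0           # 5000.0
y1 = workers_1 * productivity_1           # 5720.0

# Chain substitution: replace the factors one at a time.
effect_workers = workers_1 * productivity_0 - y0
effect_productivity = workers_1 * productivity_1 - workers_1 * productivity_0

# The factor effects sum exactly to the total change in the indicator.
assert effect_workers + effect_productivity == y1 - y0
print(effect_workers, effect_productivity)  # 500.0 220.0
```

The exact decomposition of the total change into factor effects is precisely what distinguishes the deterministic (functional) case from the stochastic one.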

Stochastic factor analysis is based on a methodology for studying the influence of factors whose relationship with the generalizing economic indicator is probabilistic, in other words, correlational.

With a functional relationship, a change in the argument always produces a corresponding change in the function. With a probabilistic relationship, a change in the argument can be combined with several values of the change in the function.

Factor analysis is also subdivided into direct (deductive) analysis and reverse (inductive) analysis.

The first type studies the influence of factors by the deductive method, that is, in the direction from the general to the particular. In reverse factor analysis, the influence of factors is studied by the inductive method, in the direction from particular factors to generalizing economic indicators.

Classification of factors affecting the effectiveness of the organization

The factors whose influence is studied in the course of the analysis are classified according to various criteria. First of all, they can be divided into two main types: internal factors, which depend on the activity of the organization, and external factors, which are independent of it.

Internal factors, depending on the magnitude of their impact on economic indicators, can be divided into main and secondary. The main ones include factors related to the use of resources and materials, as well as factors arising from the supply and marketing activities and some other aspects of the organization's functioning. The main factors have a fundamental impact on the generalizing economic indicators. External factors, which do not depend on the organization, are determined by natural-climatic (geographical), socio-economic, and foreign-economic conditions.

Depending on the duration of their impact on economic indicators, we can distinguish fixed and variable factors. The first type of factors has an impact on economic performance, which is not limited in time. Variable factors affect economic performance only for a certain period of time.

Factors can be divided into extensive (quantitative) and intensive (qualitative) on the basis of the essence of their influence on economic indicators. So, for example, if the influence of labor factors on the volume of output is studied, then the change in the number of workers will be an extensive factor, and the change in the labor productivity of one worker will be an intensive factor.

Factors affecting economic performance can be divided into objective and subjective according to the degree of their dependence on the will and consciousness of the organization's employees and other persons. Objective factors include weather conditions and natural disasters, which do not depend on human activity. Subjective factors are entirely dependent on people. The vast majority of factors should be classified as subjective.

Factors can also be subdivided, depending on the scope of their action, into factors of unlimited and factors of limited action. The first type of factors operates everywhere, in any branches of the national economy. The second type of factors affects only within an industry or even an individual organization.

According to their structure, factors are divided into simple and complex. The vast majority of factors are complex, consisting of several constituent parts. However, there are also factors that cannot be subdivided. Capital productivity, for example, is a complex factor, while the number of days the equipment has worked in a given period is a simple factor.

By the nature of their impact on generalizing economic indicators, there are direct and indirect factors. Thus, a change in the volume of products sold has a direct effect on the amount of profit and should be considered a direct factor, that is, a factor of the first order. A change in the value of material costs has an indirect effect on profit, i.e. it affects profit not directly but through the cost, which is a factor of the first order. On this basis, the level of material costs should be considered a second-order factor, that is, an indirect factor.

Depending on whether it is possible to quantify the influence of this factor on the general economic indicator, there are measurable and non-measurable factors.

This classification is closely related to the classification of reserves for increasing the efficiency of an organization's economic activity, or, in other words, reserves for improving the analyzed economic indicators.

Factor economic analysis

In economic analysis, the attributes that characterize the cause are called factorial, or independent; the attributes that characterize the consequence are called resultant, or dependent.

The combination of factorial and resultant attributes connected by a single cause-and-effect relationship is called a factor system. There is also the concept of a factor system model: it characterizes the relationship between the resultant attribute, denoted y, and the factorial attributes, denoted x1, x2, ..., xn. In other words, the factor system model expresses the relationship between a generalizing economic indicator and the individual factors that affect it. The factors here are other economic indicators that are the causes of change in the generalizing indicator.

A factor system model can be expressed mathematically by the following formula:

y = f(x1, x2, ..., xn)

Establishing dependencies between generalizing (effective) and influencing factors is called economic and mathematical modeling.

Two types of relationships between generalizing indicators and factors influencing them are studied:

  • functional (otherwise known as a functionally determined, or rigidly determined, connection);
  • stochastic (probabilistic) connection.

A functional connection is a relationship in which each value of the factor (factorial attribute) corresponds to a well-defined, non-random value of the generalizing indicator (resultant attribute).

A stochastic connection is a relationship in which each value of the factor (factorial attribute) corresponds to a set of values of the generalizing indicator (resultant attribute). Under these conditions, for each value of the factor x, the values of the generalizing indicator y form a conditional statistical distribution. As a result, a change in the value of the factor x causes a change in the generalizing indicator y only on average.

In accordance with the two considered types of relationships, there are methods of deterministic factor analysis and methods of stochastic factor analysis. Consider the following diagram:

Methods used in factor analysis. Scheme No. 2

The greatest completeness and depth of analytical research, and the greatest accuracy of its results, are ensured by the use of economic-mathematical methods.

These methods have a number of advantages over traditional and statistical methods of analysis.

Thus, they provide a more accurate and detailed calculation of the influence of individual factors on changes in the values of economic indicators, and they also make it possible to solve a number of analytical problems that cannot be solved without them.

Factor analysis is one of the most powerful statistical tools for data analysis. It is based on the procedure for combining groups of variables that correlate with each other (“correlation pleiades” or “correlation nodes”) into several factors.

In other words, the purpose of factor analysis is to concentrate the initial information, expressing a large number of considered features through a smaller number of more capacious internal characteristics, which, however, cannot be directly measured (and in this sense are latent).

For example, let us hypothetically imagine a regional legislature consisting of 100 deputies. Among the items on the voting agenda are: (a) a bill proposing to restore the monument to V.I. Lenin on the central square of the city that serves as the administrative center of the region; (b) an appeal to the President of the Russian Federation demanding the return of all strategic production to state ownership. The contingency matrix shows the following distribution of deputies' votes:

                                     Monument to Lenin (for)    Monument to Lenin (against)
Appeal to the President (for)                  49                            4
Appeal to the President (against)               6                           41

It is obvious that the votes are statistically related: the overwhelming majority of deputies who support the idea of restoring the monument to Lenin also support the return of strategic enterprises to state ownership. Similarly, most opponents of restoring the monument also oppose returning the enterprises to state ownership. At the same time, the two votes are completely unrelated thematically.

It is logical to assume that the revealed statistical relationship is due to the existence of some hidden (latent) factor. Legislators, when formulating their positions on a wide variety of issues, are guided by a limited, small set of political stances. In this case, we can assume a hidden split among the deputies along the criterion of support for, or rejection of, conservative socialist values. A group of "conservatives" stands out (49 deputies, according to our contingency table), as do their opponents (41 deputies). Having identified such splits, we can describe a large number of individual votes in terms of a small number of factors. These factors are latent in the sense that we cannot detect them directly: in our hypothetical parliament, there has never been a vote in which deputies were asked to state their attitude toward conservative socialist values. We detect the presence of this factor through a meaningful analysis of the quantitative relationships between variables. Note that although our example deliberately uses nominal variables (support for a bill, with the categories "for" (1) and "against" (0)), in practice factor analysis works most effectively with interval data.
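The strength of the association in the 2x2 table above can be quantified with the phi coefficient, which is simply the Pearson correlation computed for two binary variables:

```python
import math

# 2x2 contingency table of the two votes:
#                               monument "for"   monument "against"
a, b = 49, 4    # appeal "for"
c, d = 6, 41    # appeal "against"

# Phi coefficient: the Pearson correlation for two binary variables.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 2))  # 0.8
```

A phi of about 0.8 confirms the strong statistical relationship between the two thematically unrelated votes.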

Factor analysis is very actively used both in political science and in "neighboring" sociology and psychology. One of the important reasons for the great demand for this method is the variety of problems that can be solved with its help. Thus, there are at least three “typical” goals of factor analysis:

Dimensionality reduction (data reduction). Factor analysis, by highlighting the nodes of interrelated features and reducing them to generalized factors, reduces the initial basis of features for description. Solving this problem is important when objects are measured by a large number of variables and the researcher is looking for a way to group them by a semantic criterion. The transition from many variables to a few factors makes the description more compact and eliminates uninformative and duplicated variables;

Revealing the structure of objects or features (classification). This problem is close to the one solved by cluster analysis. But while cluster analysis takes the values of several variables as the "coordinates" of objects, factor analysis determines the position of an object relative to the factors (related groups of variables). In other words, with the help of factor analysis one can evaluate the similarity and difference of objects in the space of their correlations, i.e. in the factor space. The resulting latent variables act as the coordinate axes of the factor space, and the objects under consideration are projected onto these axes, which makes it possible to create a visual geometric representation of the studied data that is convenient for meaningful interpretation;

Indirect measurement. Factors, being latent (not empirically observable), cannot be measured directly. However, factor analysis makes it possible not only to identify latent variables but also to quantify their value for each object.

Let us consider the algorithm and interpretation of factor analysis statistics using the example of data on the results of the 1999 parliamentary elections in the Ryazan region (federal electoral district). To simplify the example, we take electoral statistics only for the parties that overcame the 5% threshold. The data are broken down by territorial election commissions (the cities and districts of the region).

The first step is to standardize the data by converting it to standard scores (so-called z-scores).
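Standardization itself is a one-line transformation: subtract the mean and divide by the standard deviation. The sketch below applies it to the Yabloko column of the table (only the five commissions shown there, so the resulting scores differ from the published ones, which were computed over all 32 cases):

```python
import numpy as np

# Yabloko vote shares (%) for the five commissions listed in the table.
x = np.array([1.49, 2.74, 1.09, 1.30, 3.28])

# z-score: subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std(ddof=0)
print(np.round(z, 2))

# Standardized scores always have mean 0 and unit standard deviation.
assert abs(z.mean()) < 1e-9
assert abs(z.std(ddof=0) - 1) < 1e-9
```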

Vote shares (%) by territorial election commission (TEC). Here BZ denotes the Zhirinovsky Bloc and SPS the Union of Right Forces:

TEC                  Yabloko    Unity      BZ      OVR     CPRF     SPS
Ermishinskaya          1.49     35.19     6.12     5.35    31.41    2.80
Zakharovskaya          2.74     18.33     7.41    11.41    31.59     n/a
Kadomskaya             1.09     29.61     8.36     5.53    35.87    1.94
Kasimovskaya           1.30     39.56     5.92     5.28    29.96    2.37
Kasimovskaya city      3.28     39.41     5.65     6.14    24.66    4.61

The same in standardized scores (z-scores):

Ermishinskaya         -0.83      1.58    -0.25    -0.91    -0.17   -0.74
Zakharovskaya         -0.22     -1.16     0.97     0.44    -0.14    0.43
Kadomskaya            -1.03      0.67     1.88    -0.87     0.59   -1.10
Kasimovskaya          -0.93      2.29    -0.44    -0.92    -0.42   -0.92
Kasimovskaya city      0.04      2.26    -0.70    -0.73    -1.32    0.01

Etc. (32 cases in total)

Matrix of pair correlations:

           Yabloko    Unity      BZ      OVR     CPRF     SPS
Yabloko      1.00
Unity       -0.55      1.00
BZ          -0.47      0.27     1.00
OVR          0.60     -0.72    -0.47     1.00
CPRF        -0.61      0.01     0.10    -0.48    1.00
SPS          0.94     -0.45    -0.39     0.52   -0.67     1.00

Already a visual analysis of the matrix of pair correlations allows us to make assumptions about the composition and nature of the correlation pleiades. For example, positive correlations are found for the "Union of Right Forces", "Yabloko" and the "Fatherland - All Russia" bloc (pairs "Yabloko" - OVR, "Yabloko" - SPS and OVR - SPS). At the same time, these three variables are negatively correlated with the CPRF (support for the CPRF), to a lesser extent with Unity (support for Unity), and even less with the BZ variable (support for the Zhirinovsky Bloc). Thus, we presumably have two pronounced correlation pleiades:

("Yabloko" + OVR + SPS) - the Communist Party of the Russian Federation;

("Yabloko" + OVR + SPS) - "Unity".

These are two different pleiades, not one, since there is no connection between Unity and the CPRF (0.01). As for the BZ variable, it is harder to make an assumption; here the correlations are less pronounced.

To test our assumptions, we need to calculate the eigenvalues of the factors, the factor scores, and the factor loadings for each variable. Such calculations are quite complicated and require serious skill in working with matrices, so we will not consider the computational side here. Let us just note that these calculations can be carried out in two ways: the principal components method and the principal factors method. The principal components method is more common; statistical programs use it by default.
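Although the text skips the computational side, the principal components step itself is compact: the eigenvalues of the correlation matrix are exactly the factor eigenvalues discussed below. A sketch using the pair correlation matrix from the text (NumPy assumed available):

```python
import numpy as np

# Pair correlation matrix of the six party variables, taken from the text
# (lower triangle mirrored into the upper one).
R = np.array([
    [ 1.00, -0.55, -0.47,  0.60, -0.61,  0.94],   # Yabloko
    [-0.55,  1.00,  0.27, -0.72,  0.01, -0.45],   # Unity
    [-0.47,  0.27,  1.00, -0.47,  0.10, -0.39],   # Zhirinovsky Bloc
    [ 0.60, -0.72, -0.47,  1.00, -0.48,  0.52],   # OVR
    [-0.61,  0.01,  0.10, -0.48,  1.00, -0.67],   # CPRF
    [ 0.94, -0.45, -0.39,  0.52, -0.67,  1.00],   # SPS
])

# Principal components: eigendecomposition of the correlation matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so reverse them.
eigenvalues = np.linalg.eigh(R)[0][::-1]

# Each factor's share of the total variation (the trace of R equals 6).
shares = eigenvalues / eigenvalues.sum() * 100
print(np.round(eigenvalues, 2))
print(np.round(shares, 1))
```

Because the published matrix is rounded to two decimals, the eigenvalues come out close to, but not exactly equal to, the ones in the table below.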

Let us dwell on the interpretation of eigenvalues, factor scores, and factor loadings.

The eigenvalues of the factors in our case are as follows:

Factor    Eigenvalue    % total variation
1            3.52            58.75
2            1.14            19.08
3            0.76            12.64
4            0.49             8.22
5            0.05             0.80
6            0.03             0.51
Total        6.00           100%

The greater the eigenvalue of a factor, the greater its explanatory power (the maximum possible value equals the number of variables, in our case 6). One of the key elements of factor analysis statistics is the "% total variation" indicator. It shows what proportion of the variation (variability) of the variables the extracted factor explains. In our case, the weight of the first factor outweighs the weight of all other factors combined: it explains almost 59% of the total variation. The second factor explains 19% of the variation, the third 12.6%, and so on in descending order.

Having the eigenvalues of the factors, we can begin solving the problem of data dimensionality reduction. The reduction is achieved by excluding from the model the factors with the least explanatory power. The key question here is how many factors to keep in the model and what criteria to follow. Factors 5 and 6, which together explain a little more than 1% of the entire variation, are clearly superfluous. But the fate of factors 3 and 4 is not so obvious.

As a rule, the factors kept in the model are those whose eigenvalue exceeds unity (the Kaiser criterion). In our case, these are factors 1 and 2. However, it is useful to check the correctness of removing four factors against other criteria. One of the most widely used is scree plot analysis. For our case, it looks like this:

The chart got its name from its resemblance to the side of a mountain. "Scree" is a geological term for the rock fragments that accumulate at the bottom of a rocky slope. The "rock" consists of the truly influential factors; the "scree" is statistical noise. Figuratively speaking, you need to find the place on the graph where the "rock" ends and the "scree" begins (where the decrease in eigenvalues from left to right slows down sharply). In our case, the choice is between the first and second inflection points, corresponding to two and four factors. Keeping four factors gives very high model accuracy (more than 98% of the total variation) but makes the model quite complex. Keeping two factors leaves a significant unexplained part of the variation (about 22%), but the model becomes concise and easy to analyze (in particular, visually). Thus, in this case it is better to sacrifice some accuracy in favor of compactness, keeping the first and second factors.
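The Kaiser criterion and the explained-variation tally can be scripted directly from the eigenvalue table above:

```python
# Eigenvalues from the table above.
eigenvalues = [3.52, 1.14, 0.76, 0.49, 0.05, 0.03]
n_vars = 6

# Kaiser criterion: retain factors with an eigenvalue greater than 1.
retained = [ev for ev in eigenvalues if ev > 1]
print(len(retained))  # 2

# Share of the total variation explained by the retained factors.
explained = sum(retained) / n_vars * 100
print(round(explained, 1))  # 77.7
```

The roughly 78% explained (and 22% unexplained) matches the trade-off discussed in the text.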

You can check the adequacy of the obtained model using special matrices of reproduced correlations and residual coefficients (residual correlations). The matrix of reproduced correlations contains the coefficients recovered from the two factors left in the model. Of particular importance is its main diagonal, which holds the communalities of the variables; they show how accurately the model reproduces the correlation of each variable with itself, which should equal unity.

The matrix of residual coefficients contains the differences between the original and reproduced coefficients. For example, the reproduced correlation between the SPS and Yabloko variables is 0.88, while the original one is 0.94. Residual = 0.94 - 0.88 = 0.06. The lower the residual values, the higher the quality of the model.
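The mechanics behind reproduced and residual correlations are simple matrix algebra: the reproduced matrix is the product of the loading matrix with its transpose, and the communalities sit on its diagonal. A sketch with hypothetical loadings (not the election data):

```python
import numpy as np

# Hypothetical 2-factor loading matrix for three variables
# (illustrative values only).
L = np.array([
    [-0.93,  0.15],
    [ 0.80, -0.40],
    [ 0.10,  0.90],
])

# Reproduced correlations: R_hat = L @ L.T.
# Its main diagonal holds the communalities of the variables.
R_hat = L @ L.T
communalities = np.diag(R_hat)

# Residuals: original correlations minus reproduced ones.
R = np.array([
    [ 1.00, -0.78, -0.02],
    [-0.78,  1.00,  0.05],
    [-0.02,  0.05,  1.00],
])
residuals = R - R_hat
print(np.round(communalities, 2))
print(np.round(residuals, 2))
```

The closer the communalities are to 1 and the residuals to 0, the better the factor model reproduces the original correlation matrix.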

Reproduced correlations
"An Apple" "Unity" BJ OVR CPRF THX
"An Apple" 0,89
"Unity" -0,53 0,80
BJ -0,47 0,59 0,44
OVR 0,73 -0,72 -0,56 0,76
CPRF -0,70 0,01 0,12 -0,34 0,89
THX 0,88 -0,43 -0,40 0,66 -0,77 0,88
Residual odds
"An Apple" "Unity" BJ OVR CPRF THX
"An Apple" 0,11
"Unity" -0,02 0,20
BJ 0,00 -0,31 0,56
OVR -0,13 -0,01 0,09 0,24
CPRF 0,09 0,00 -0,02 -0,14 0,11
THX 0,06 -0,03 0,01 -0,14 0,10 0,12

As can be seen from the matrices, the two-factor model, while adequate overall, does not explain some individual relationships well. Thus, the communality of the BZ variable is very low (only 0.56), and the residual coefficient for the connection between BZ and Unity is too high (-0.31).

Now it is necessary to decide how important an adequate representation of the BZ variable is for this particular study. If its importance is high (for example, if the study is devoted to analyzing the electorate of this particular party), it is correct to return to the four-factor model. If not, two factors can be left.
Taking into account the educational nature of our tasks, we leave a simpler model.

Factor loadings can be interpreted as the correlation coefficients between each variable and each of the identified factors. Thus, the correlation between the first factor and the Yabloko variable is -0.93. All factor loadings are given in the factor mapping matrix.

The closer the relationship between the variable and the factor under consideration, the higher the factor loading. A positive sign of the loading indicates a direct relationship between the variable and the factor, and a negative sign an inverse one.

Having the values ​​of factor loads, we can construct a geometric representation of the results of factor analysis. On the X axis, we plot the loads of variables on factor 1, on the Y axis, the loads of variables on factor 2, and we get a two-dimensional factor space.

Before proceeding to a meaningful analysis of the results, let us perform one more operation: rotation. This operation is needed because there is not one but many variants of the factor loading matrix that explain the relationships of the variables (the intercorrelation matrix) equally well. We must choose the solution that is easiest to interpret meaningfully; this is considered to be the loading matrix in which each variable's loadings on each factor are either maximized or minimized (close to one or to zero).

Consider a schematic example. There are four objects located in the factor space as follows:

Loads on both factors for all objects are significantly different from zero, and we are forced to use both factors to interpret the position of objects. But if we “rotate” the entire structure clockwise around the intersection of the coordinate axes, we get the following picture:

In this case, the loads on factor 1 will be close to zero, and the loads on factor 2 will be close to unity (simple structure principle). Accordingly, for a meaningful interpretation of the position of objects, we will involve only one factor - factor 2.

There are quite a few factor rotation methods. The group of orthogonal rotation methods always preserves a right angle between the coordinate axes. These include varimax (minimizes the number of variables with high factor loadings), quartimax (minimizes the number of factors needed to explain a variable), and equamax (a combination of the two previous methods). Oblique rotation methods do not necessarily preserve a right angle between the axes (e.g. direct oblimin). The promax method is a combination of orthogonal and oblique rotation. In most cases the varimax method is used; it gives good results for most political research tasks. In addition, as with many other methods, it is advisable to experiment with different rotation techniques.
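For reference, varimax rotation is short enough to sketch directly; the version below follows the widely used SVD-based formulation of Kaiser's varimax criterion and is applied to hypothetical loadings:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a factor loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)          # accumulated rotation matrix
    var_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD of the gradient of the varimax criterion.
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        var_new = s.sum()
        if var_new < var_old * (1 + tol):
            break          # criterion stopped improving
        var_old = var_new
    return loadings @ R

# Hypothetical unrotated loadings for four variables on two factors.
A = np.array([
    [0.7,  0.5],
    [0.6,  0.6],
    [0.5, -0.6],
    [0.6, -0.5],
])
rotated = varimax(A)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the communalities (row sums of squared loadings) are unchanged; only the distribution of loadings across factors becomes simpler.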

In our example, after rotation by the varimax method, we obtain the following matrix of factor loadings:

Accordingly, the geometric representation of the factor space will look like:


Now we can proceed to a meaningful interpretation of the results obtained. The key opposition - the electoral split - according to the first factor is formed by the Communist Party of the Russian Federation on the one hand, and Yabloko and the Union of Right Forces (to a lesser extent OVR) - on the other. In terms of content - based on the specifics of the ideological attitudes of the named subjects of the electoral process - we can interpret this demarcation as a "left-right" split, which is "classical" for political science.

Opposition on factor 2 is formed by OVR and Unity. The Zhirinovsky Bloc adjoins the latter, but we cannot reliably judge its position in the factor space because of the peculiarities of the model, which explains the relationships of this particular variable poorly. To explain this configuration, we need to recall the political realities of the 1999 election campaign. At that time, the struggle within the political elite led to the formation of two echelons of the "party of power": the Unity and Fatherland - All Russia blocs. The difference between them was not ideological: in effect, the population was offered a choice not between two ideological platforms but between two elite groups, each with significant power resources and regional support. Thus, this split can be interpreted as "power-elite" (or, somewhat simplifying, "power-opposition").

In general, we get a geometric representation of a certain electoral space of the Ryazan region for these elections, if we understand the electoral space as a space of electoral choice, the structure of key political alternatives (“splits”). The combination of these two splits was very typical of the 1999 parliamentary elections.

Comparing the results of factor analysis for the same region in different elections, we can judge whether the configuration of the territory's electoral space is stable over time. For example, factor analysis of the federal parliamentary elections held in Tatarstan (1995, 1999 and 2003) showed a stable configuration of the electoral space. For the 1999 elections, only one factor was left in the model, with an explanatory power of 83% of the variation, which made it impossible to build a two-dimensional diagram. The corresponding column shows the factor loadings.

Looking closely at these results, we notice that the same main split is reproduced in the republic from election to election: "the party of power versus all the rest." In 1995, the "party of power" was the bloc "Our Home is Russia" (NDR), in 1999 it was OVR, and in 2003 "United Russia". Over time, only the "details" change, namely the name of the "party of power". The new political "label" fits easily into the static matrix of a one-dimensional political choice.

To conclude the chapter, one piece of practical advice. Real mastery of statistical methods is, by and large, possible only through intensive hands-on work with specialized programs (the already mentioned SPSS, Statistica, or at least Microsoft Excel). It is no coincidence that we present statistical techniques as step-by-step working algorithms: this allows the student to go through all the stages of analysis independently at the computer. Without attempts to analyze real data in practice, one's idea of what statistical methods can do in political analysis will inevitably remain general and abstract. And today the ability to apply statistics to solve both theoretical and applied problems is a fundamentally important component of a political scientist's skill set.

Control questions and tasks

1. Which levels of measurement correspond to the averages (mode, median, arithmetic mean)? What measures of variation are typical for each of them?

2. Why is it necessary to take into account the shape of the distribution of variables?

3. What does the statement “There is a statistical relationship between two variables” mean?

4. What useful information about relationships between variables can be obtained based on the analysis of contingency tables?

5. What can be learned about the relationship between variables from the values of the chi-square and lambda statistics?

6. Define the concept of "error" in statistical research. How can this indicator be used to judge the quality of the constructed statistical model?

7. What is the main purpose of correlation analysis? What characteristics of a statistical relationship does this method reveal?

8. How to interpret the value of the Pearson correlation coefficient?

9. Describe the method of analysis of variance (ANOVA). What other statistical methods use ANOVA statistics, and why?

10. Explain the meaning of the term "null hypothesis".

11. What is a regression line, what method is used to build it?

12. What does the coefficient R show in the final statistics of the regression analysis?

13. Explain the term "multidimensional classification method".

14. Explain the main differences between clustering using hierarchical cluster analysis and K-means.

15. How can cluster analysis be used to study the image of political leaders?

16. What is the main task solved by discriminant analysis? Define a discriminant function.

17. Name three classes of problems solved using factor analysis. Define the term "factor".

18. Describe the three main methods for checking the quality of the model in factor analysis (Kaiser's criterion, the "scree" criterion, the matrix of reproduced correlations).

All phenomena and processes of the economic activity of enterprises are interconnected and interdependent. Some of them are related directly, others indirectly. Hence, an important methodological issue in economic analysis is the study and measurement of the influence of factors on the magnitude of the economic indicators under study.

    Factor analysis is interpreted in the educational literature as a branch of multivariate statistical analysis that combines methods for estimating the dimensionality of a set of observed variables by studying the structure of covariance or correlation matrices.

    Factor analysis originated in psychometrics and is currently widely used not only in psychology but also in neurophysiology, sociology, political science, economics, statistics, and other sciences. The main ideas of factor analysis were laid down by the English psychologist and anthropologist F. Galton. Factor analysis was developed and introduced into psychology by such scientists as C. Spearman, L. Thurstone, and R. Cattell. The mathematical apparatus of factor analysis was developed by Hotelling, Harman, Kaiser, Thurstone, Tucker, and other scientists.

    This type of analysis allows the researcher to solve two main tasks: to describe the subject of measurement both compactly and comprehensively. With the help of factor analysis, one can identify the factors responsible for the linear statistical relationships (correlations) between the observed variables.

    Goals of factor analysis

    For example, when analyzing scores obtained on several scales, a researcher may notice that they are similar to each other and highly correlated. In this case, he can assume that there is some latent variable that explains the observed similarity of the estimates. Such a latent variable is called a factor: it affects numerous indicators of other variables, which makes it possible and necessary to treat it as a more general, higher-order variable.

    Thus, two goals of factor analysis:

    • determination of relationships between variables, their classification, i.e. "objective R-classification";
    • reduction in the number of variables.

    To identify the most significant factors and, as a result, the factor structure, it is most justified to use the principal component method. The essence of this method is to replace the correlated components with uncorrelated factors. Another important feature of the method is that it allows one to restrict the analysis to the most informative principal components and exclude the rest, which simplifies the interpretation of the results. A further advantage of the method is that, according to this view, it is the only mathematically justified method of factor analysis.
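    As an illustration, principal-component extraction can be sketched in a few lines of Python. The correlation matrix below is hypothetical (invented for illustration), and NumPy stands in for a statistical package such as SPSS or Statistica; loadings are computed as the eigenvector multiplied by the square root of the eigenvalue, and components are retained by the Kaiser criterion (eigenvalue greater than 1).

```python
import numpy as np

# Hypothetical correlation matrix for four observed variables
# (in practice it is computed from the standardized raw scores).
R = np.array([
    [1.00, 0.75, 0.10, 0.20],
    [0.75, 1.00, 0.15, 0.25],
    [0.10, 0.15, 1.00, 0.70],
    [0.20, 0.25, 0.70, 1.00],
])

# Eigen-decomposition of the correlation matrix.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]          # sort in descending order
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Kaiser criterion: retain components with eigenvalue > 1.
retained = eigenvalues > 1.0
print("eigenvalues:", np.round(eigenvalues, 3))
print("components retained:", retained.sum())

# Loadings of the retained components: eigenvector * sqrt(eigenvalue).
loadings = eigenvectors[:, retained] * np.sqrt(eigenvalues[retained])
print(np.round(loadings, 2))
```

    For this matrix, two components pass the Kaiser criterion, matching the two correlated blocks of variables (1-2 and 3-4) built into the hypothetical data.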

    Factor analysis (in economic analysis) is a methodology for the comprehensive and systematic study and measurement of the impact of factors on the value of a performance indicator.

    Types of factor analysis

    There are the following types of factor analysis:

    1) Deterministic (functional): the performance indicator is represented as a product, a quotient, or an algebraic sum of factors.

    2) Stochastic (correlational): the relationship between the performance indicator and the factor indicators is incomplete, or probabilistic.

    3) Direct (deductive) - from the general to the particular.

    4) Reverse (inductive) - from the particular to the general.

    5) Single-stage and multi-stage.

    6) Static and dynamic.

    7) Retrospective and prospective.

    Factor analysis can also be exploratory, carried out when investigating a latent factor structure without prior assumptions about the number of factors and their loadings, or confirmatory, designed to test hypotheses about the number of factors and their loadings. The practical implementation of factor analysis begins with checking its conditions.

    Mandatory conditions for factor analysis:

    • All variables must be quantitative;
    • The number of observations should be at least twice the number of variables;
    • The sample must be homogeneous;
    • The source variables must be distributed symmetrically;
    • Factor analysis is carried out on correlated variables.

    In the analysis, variables that are strongly correlated with each other are combined into one factor; as a result, the variance is redistributed between the components and the simplest and clearest factor structure is obtained. After the combination, the correlation of the components within each factor will be higher than their correlation with components of other factors. This procedure also makes it possible to isolate latent variables, which is especially important in the analysis of social perceptions and values.

    Stages of factor analysis

    As a rule, factor analysis is carried out in several stages.

    Stages of factor analysis:

    Stage 1. Selection of factors.

    Stage 2. Classification and systematization of factors.

    Stage 3. Modeling the relationship between performance and factor indicators.

    Stage 4. Calculation of the influence of the factors and an assessment of the role of each of them in changing the value of the performance indicator.

    Stage 5. Practical use of the factor model (calculation of reserves for the growth of the performance indicator).

    According to the nature of the relationship between the indicators, deterministic and stochastic methods of factor analysis are distinguished.

    Deterministic factor analysis is a methodology for studying the influence of factors whose relationship with the performance indicator is functional, i.e. when the performance indicator of the factor model is represented as a product, a quotient, or an algebraic sum of factors.

    Methods of deterministic factor analysis:

    • the chain substitution method;
    • the method of absolute differences;
    • the method of relative differences;
    • the integral method;
    • the logarithm method.
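    As a minimal sketch of the first of these techniques, here is the chain substitution method for a hypothetical two-factor multiplicative model (Output = Workers × Productivity; all figures are invented for illustration): factors are replaced with their reporting-period values one at a time, and each replacement isolates one factor's contribution to the total change.

```python
# Hypothetical two-factor model: Output = Workers * Productivity.
workers_0, productivity_0 = 100, 50.0   # base period
workers_1, productivity_1 = 110, 54.0   # reporting period

output_0 = workers_0 * productivity_0   # base-period output
output_1 = workers_1 * productivity_1   # reporting-period output

# Chain substitution: replace the factors one at a time, in order.
# Influence of the change in the number of workers:
effect_workers = workers_1 * productivity_0 - workers_0 * productivity_0
# Influence of the change in productivity (workers already substituted):
effect_productivity = workers_1 * productivity_1 - workers_1 * productivity_0

# The factor effects must sum exactly to the total change in the indicator.
print(effect_workers, effect_productivity, output_1 - output_0)
```

    Note that the decomposition is exact (the effects sum to the total change) but depends on the order of substitution, which is the usual caveat for this method.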

    This type of factor analysis is the most common because, being quite simple to use (compared with stochastic analysis), it allows one to understand the logic of the main factors of enterprise development, to quantify their influence, and to understand which factors, and in what proportion, can and should be changed in order to improve production efficiency.

    Stochastic analysis is a methodology for studying factors whose relationship with the performance indicator, unlike a functional relationship, is incomplete and probabilistic (correlational). If, with a functional (complete) dependence, a change in the argument always produces a corresponding change in the function, then with a correlational relationship a change in the argument can produce several different increments of the function, depending on the combination of the other factors that determine the indicator.

    Methods of stochastic factor analysis:

    • the pair correlation method;
    • multiple correlation analysis;
    • matrix models;
    • mathematical programming;
    • operations research methods;
    • game theory.
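    The simplest of these, the pair correlation method, can be sketched as follows. The sample data below are invented for illustration, and NumPy's corrcoef stands in for a statistical package: it computes the Pearson coefficient as the covariance of the two indicators divided by the product of their standard deviations.

```python
import numpy as np

# Assumed sample data: a factor indicator x and a performance indicator y.
x = np.array([2.0, 3.0, 5.0, 7.0, 8.0])
y = np.array([4.1, 5.0, 7.2, 9.8, 10.9])

# Pearson pair correlation coefficient: cov(x, y) / (std(x) * std(y)).
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # close to 1: a strong positive linear relationship
```

    A coefficient close to +1 or -1 indicates a nearly functional linear relationship; values near zero indicate that the indicators are linearly unrelated in the sample.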

    It is also necessary to distinguish between static and dynamic factor analysis. The first type is used when studying the influence of factors on performance indicators as of a given date. The second is a methodology for studying cause-and-effect relationships in dynamics.

    Finally, factor analysis can be retrospective, studying the reasons for changes in performance indicators in past periods, or prospective, examining the behavior of factors and performance indicators in the future.