Remarks on the use of Pearson’s and Spearman’s correlation coefficients in assessing relationships in ophthalmic data

Background: A correlation coefficient measures the relationship between two quantitative or categorical variables. It describes the degree to which the two variables are related. Associated variables change in tandem: when one variable changes, the second changes in the same or the opposite direction. Correlation is a commonly used statistical procedure. Medical studies use it widely to explore diagnosis and prognosis and to predict normative parameters for reference measurements. The test is common in the ophthalmic field, and many studies in the literature have used it. However, in some studies the interpretation of the test was incorrect, possibly because the test was partially misunderstood.
Aim: This study aims to review articles that used these statistical tests and to provide an overview of correlation coefficient tests, their indications and interpretations. Correlation analyses and interpretations in ophthalmic data studies are also discussed.
Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were followed, and correlation studies that explored ophthalmic data were searched, investigated and reviewed. The review covered papers published between 1990 and 2020.
Results: This critical review included 64 papers. The papers investigated many variables, for example, visual acuity, contrast sensitivity, dry eye, myopia, retina and low vision. Some of the papers found significant results, whereas others did not. Their reporting and interpretation of the correlation coefficient varied widely.
Conclusion: The studies reviewed suggest that every study should report the normality of the data, the r-value, the p-value and the extent of the shared variance between investigated outcomes. Lastly, the clinical implications of the findings of those studies should be stated clearly.

1,3,7,9 Pearson's test uses a linear model to describe how well a straight-line relationship captures the interaction between variables. 2 Spearman's coefficient uses a monotonic function to assess relationships between ranked variables. 2 Critical factors affecting the choice of a correlation coefficient test include data type, linearity of relationships, presence of outliers and adherence to the parametric assumption. 2 The correlation coefficient (r) is a statistical measure of the strength of the linear relationship between two variables. 1 The correlation coefficient is bounded between −1 and +1, inclusive. 10 The strength of the correlation increases as the absolute value of r rises from 0 to 1. A value of zero indicates no correlation; an absolute value of one means a complete correlation (and 100% of the variance is explained by the relationship). 9 The sign of the r-value indicates the direction of the correlation, either direct (+) or inverse (−). 10 Therefore, explicit reporting of both the strength and the direction of r is an absolute necessity when reporting correlation coefficients in the literature.
Authors reporting relationships usually use terms such as perfect, strong, good and weak. 9 Unfortunately, no standard exists amongst authors in the field. The same value of r is described differently by different researchers in terms of strength. 9 In general, however, a correlation coefficient (r) of < 0.20 is often considered 'very weak' or 'negligible'. 11 Correlation coefficients (r) of 0.30-0.40 are often classified as a low, fair or mild relationship, of 0.40-0.70 as a moderate relationship, of 0.70-0.90 as a strong or high relationship and of > 0.90 as a 'very high' relationship. 4,12,13,14,15 However, even these suggested cut-off points are arbitrary and inconsistent and should be used carefully. For instance, an r-value of 0.60 could be interpreted as either 'good' or 'moderate'. A correlation coefficient (r) of 0.39 represents a 'weak' association, whereas 0.40 represents a 'moderate' relationship, and the transition is difficult to justify. 3 Thus, interpreting the clinical significance of an association is perhaps more important than classifying the strength of a relationship.
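The descriptive bands quoted above can be written out as a short Python sketch, which makes the arbitrariness of the cut-off points easy to see. The function names are hypothetical, and the handling of the exact boundary values (0.40 counted as 'moderate', 0.70 and 0.90 counted as 'strong or high') is an assumption, because the quoted ranges overlap at their end points. The quoted bands do not label values between 0.20 and 0.30, so the sketch flags that interval rather than guessing a label.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the descriptive bands quoted in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1, inclusive")
    magnitude = abs(r)  # strength ignores the sign
    if magnitude < 0.20:
        return "very weak or negligible"
    if magnitude < 0.30:
        # Assumption: the quoted bands leave 0.20-0.30 unlabelled, so say so.
        return "not classified by the quoted bands"
    if magnitude < 0.40:
        return "low, fair or mild"
    if magnitude < 0.70:
        return "moderate"
    if magnitude <= 0.90:
        return "strong or high"
    return "very high"


def describe_direction(r: float) -> str:
    """Report the direction of the relationship from the sign of r."""
    if r > 0:
        return "direct (+)"
    if r < 0:
        return "inverse (-)"
    return "none"


# A change of 0.01 in r moves the label from one band to the next.
for value in (0.39, 0.40, -0.95):
    print(value, describe_strength(value), describe_direction(value))
```

The jump in label between 0.39 and 0.40 is the transition that the text describes as difficult to justify.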
When interpreting a correlation coefficient (r), investigators should consider the coefficient of determination (r²) in addition to the r- and p-values. 16 This coefficient (r²) indicates the proportion of variance shared between two variables. 4,15,17,18,19 For example, if we observe an r-value of 0.40, then 16% of the variation in one variable is explained by variation in the second variable. 10,18
Hauke and Tomasz 1 compared Pearson's and Spearman's coefficients on the same set of data. They concluded that significance in one test might be accompanied by either significance or non-significance in the other, even for large data sets. The two tests have their own specific assumptions, and differences therefore exist between the two coefficients; one test can yield a negative coefficient while the other yields a positive one. 1 It is crucial, therefore, to understand the assumptions about the data that underlie each test and to check the normality of the data before starting statistical analyses, a suggestion supported by others, such as Rebekic et al. 2 They also compared Pearson's and Spearman's coefficients on the same set of variables in winter wheat genotypes. Although they found some similarity between the two tests in terms of the strength and significance of the correlation coefficients, they also found some discrepancies, especially a non-significant outcome in Pearson's test alongside a significant outcome in Spearman's test. They concluded that the most crucial factors affecting the choice of an appropriate test include data type, linearity of relationship, presence of outliers and violation of parametric assumptions.
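The difference between the two coefficients can be illustrated with a minimal Python sketch that uses only the standard library. The data are synthetic and purely illustrative (an exponential curve, which is monotonic but not linear), and the function names are hypothetical. Spearman's coefficient is computed here as Pearson's coefficient applied to average ranks, which is the standard definition when ties are present.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Synthetic, illustrative data: perfectly monotonic but strongly non-linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the ordering of the two variables is identical, Spearman's coefficient reaches its maximum, whereas Pearson's coefficient is lower because a straight line describes the curve poorly.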
One of the main goals of statistical analysis is to evaluate the confidence that can be placed in the size of an effect under investigation. It is common to express such confidence in terms of 'probabilities' of hypotheses. 20 Misinterpretation and misuse of statistical tests may involve statistical significance. 20 For example, a medical journal suggested that rejecting the null hypothesis via a significance testing procedure is invalid and that authors are therefore not required to present it in their articles. 21 Specifically, the statistical analysis classifies results as significant or non-significant based on a p-value. 22 The p-value stands for probability and measures how likely it is that an observed difference between groups is because of chance. 23 A p-value close to 0 means that the observed difference is unlikely to be accounted for by chance, whereas a p-value close to 1 indicates no difference between the groups other than that attributable to chance. 23 Fisher proposed a 0.05 cut-off point, where p < 0.05 (5% significance) is considered a standard level for concluding that there is evidence against the hypothesis tested. 24 A smaller p-value indicates greater statistical incompatibility of the data with the null hypothesis, and a larger p-value indicates the reverse. 22 Specifically, p > 0.05 can indicate that no evidence of a difference exists, although it does not mean that there is no difference between the groups. 22 A value of p > 0.05 can result from several factors, including incorrect study design, imprecise measurement, inaccurate statistical analysis or small sample size. 22 Therefore, p > 0.05 does not guarantee that no difference exists between the groups; it means only that no difference was observed in this specific observation. 22 The American Statistical Association released six principles regarding the proper use, interpretation and reporting of p-values. 25
The purpose of this study, therefore, was to review the two popular correlation coefficients reported in ophthalmic data, summarise the strength of correlation coefficients and discuss issues in the use of Pearson's and Spearman's correlation coefficients and their interpretation.

Methods and materials
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed whilst preparing this review. 26 The search focused on relevant peer-reviewed publications in eye healthcare that used Pearson's and Spearman's tests. The author searched the PubMed and Science Direct databases systematically for publications dated 1990-2020. Other publication databases, such as Google Scholar, and manual searches were also used. Several keywords were used in different combinations, including ocular, eye, vision, visual, ophthalmology, optometry, association, correlation, relationship, Pearson's and Spearman's test. The author screened the identified articles to remove duplicate papers from the review list. Abstracts were screened, and non-relevant manuscripts were excluded. The remaining papers were studied to determine which should be included in the review.
Initially, 1310 papers were identified using the PubMed and Science Direct searches. An additional 250 papers were found using other search methods. After duplicate papers were removed, 1120 papers remained. Abstracts were screened first to ensure that the relationships under investigation were assessed with Pearson's and Spearman's tests. In some papers, statistical methods were not stated in the abstract; therefore, 300 full-text articles were vetted further. Finally, 64 papers satisfied the inclusion criteria and were included and critically reviewed. The criteria required that a paper was published on eye healthcare between 1990 and 2020 and used one of the two targeted statistical tests.

Exploring relationships in ophthalmic data
Researchers have investigated several factors that could influence visual acuity (VA), contrast sensitivity (CS) and astigmatism. Terry et al. 27 investigated the relationship between donor corneal thickness and post-operative vision. They concluded that a significant but weak relationship exists between them but did not control for age (Table 1). Subjects' ages ranged from 31 to 90 years, a wide range that encompasses many age-related visual and physiological differences. Nejabat et al. 28 observed a weak relationship between the keratoconus corneal cylinder and RGP-corrected VA (Table 1). This result suggests that patients with advanced keratoconus showed poorer VA with RGP lenses, an expected outcome. Bilen et al. 29 reported that refraction and several topographic, pachymetric and wavefront indices derived from Galilei's corneal wavefront instrument showed a significant relationship with CS and logMAR VA, although they did not report the strength of the relationship found (Table 1). 29 Furthermore, Kamiya et al. found a correlation between the Objective Scatter Index (OSI) and VA, but no association with corneal high-order aberration after Descemet's stripping automated endothelial keratoplasty (Table 1). 30 Finally, Kawamorita et al. reported a relationship between central and peripheral astigmatism (Table 1). 31
Some researchers were interested in the impact of reduced VA and loss of visual field (VF) on quality of life (QoL). Specifically, Sawada et al. 32 explored the relationship between the scores of a QoL questionnaire and the loss of VF and VA in open-angle glaucoma patients. They reported a good correlation between QoL and VF loss in 10 out of 12 subscales (Table 1). They also reported a weaker but significant relationship with VA (Table 1). This was because VA is mostly maintained until the late stages of glaucoma. However, the strength of relationships and coefficients of determination were not fully discussed in this study.
Other studies were directed towards investigating dry eye disease. For example, Herbaut et al. 33 investigated how the Ocular Surface Disease Index, tear film break-up time, the Oxford score, the van Bijsterveld score and the Schirmer I test related to the OSI recorded over 20 s without blinking. They reported that the OSI correlated significantly with all these parameters (Table 1). However, most correlation coefficients were < 0.3, indicating that the variation shared between these variables was only in the range of 2%-4%, which is consistent with a relatively poor relationship.
Accommodation and accommodative convergence were investigated, using both statistical tests of interest in this review, to explore factors influencing their measurement. Bruce et al. 34 found a relationship between age and both the accommodative convergence to accommodation (AC/A) ratio and the convergence accommodation to convergence (CA/C) ratio (Table 2). They reported the strength of the relationship in accordance with the suggested interpretation mentioned earlier. They suggested that AC/A ratios increase by 0.126 pd/D per year and CA/C ratios decrease by 0.003 D/pd per year. Gwiazda et al. 35 reported a 'strong' correlation between myopia and blur-driven accommodation (Table 1). The authors also suggested that improving accommodative function might prove effective in slowing the progression of myopia; however, randomised controlled trials (RCTs) would be needed to support such routine therapy.
Other researchers explored factors that correlated with visual distortions and myopia progression. Piano et al. 36 reported that visual distortions correlated with motor fusion, log stereoacuity, near angle of heterotropic or heterophoric deviation and amblyopia depth (Table 2). Furthermore, Hyman et al. 37 compared 3-year myopia progression with increases in axial length and reported a significant relationship (Table 2). 37 They also reported age, gender and ethnicity as crucial factors in myopic progression; however, Pearson's correlation was not conducted to explore these relationships. 37
Several studies were directed towards investigating the retinal nerve fibre layer (RNFL), retinal layer thicknesses, foveal thickness, hyper-reflective foci and their correlation with other visual parameters using Pearson's and Spearman's tests. Amanullah et al. 38 explored the correlation between CS and RNFL thickness in different areas of vision in patients with glaucoma. The authors reported a correlation between the RNFL thickness in the inferior quadrant (RNFL clock hours 5:00-7:00) and the CS score in the left upper area of vision (r = 0.20-0.50, p < 0.05, between four visits). The r-values varied across the four visits, and the correlation was observed only in specific areas of vision. Therefore, this result should be interpreted cautiously. However, previous studies suggest reduced CS in glaucomatous patients. 39, 40 Lee et al. 41 investigated retinal layer thicknesses and visual function in patients with traumatic optic neuropathy. The most significant relationships reported in this analysis were with mean deviation in Humphrey field analysis and the Visual Field Index (r = 0.50-0.70, p < 0.05). However, some correlation coefficients were < 0.2 and were not discussed in terms of their strength. Aslan et al. investigated correlations between macular and RNFL thickness parameters and pupillometry measurements in patients with attention deficit hyperactivity disorder (ADHD). 42 They found a significant relationship in the right eye but a weak and insignificant relationship in the left eye (Table 2). They suggested that pupillometry measurements may be used as supportive diagnostic tools for ADHD, although the correlation coefficient was only 0.3. Furthermore, Katsanos et al. found an association between RNFL thickness and perimetry measures. 43 They reported a moderate relationship, even though the correlation coefficient was < 0.50 (Table 2). In addition, Balasubramanian et al. explored the relationship between retinal layer thicknesses and visual function amongst young adults born preterm. The authors reported that the inner retinal layer thickness was 'moderately' correlated with VA (r = 0.30, p < 0.001), 44 although the strength of the reported association was mild or fair. Ye et al. reported that the thickness of one outer retinal sublayer (myoid and ellipsoid zone) is significantly correlated with choroid thickness, indicating that thinner choroids are associated with worse vision (Table 2). 45 However, the authors did not report relationships with the other retinal sublayers: the outer plexiform layer, the Henle fibre layer and outer nuclear layer, the outer segment of photoreceptors, and the interdigitation zone and RPE-Bruch complex. 45 Holm et al. investigated foveal thickness measured by optical coherence tomography (OCT) and foveal function measured by multifocal electroretinography in patients with nonproliferative diabetic retinopathy. 46 They reported an inverse relationship with the central area of the OCT, with prolonged implicit times inversely correlated with VA (Table 2). Finally, Piri et al. investigated the relationship between the number of hyper-reflective foci and the severity of Stargardt's disease in terms of the degree of retinal atrophy, VA and disease duration. 47 The number of hyper-reflective foci in the Bruch membrane or RPE complex, choriocapillaris and Sattler's layer increased with decreasing VA (r = 0.9, p < 0.05). They also reported a correlation between the number of hyper-reflective foci in the choriocapillaris and Sattler's layer and disease duration (r = 0.98, p < 0.05). They did not find any relationship between these variables and best-corrected VA and central macular thickness. However, some interesting significant relationships were observed; the strongest (r = 0.98) indicates that about 96% of the variation is shared. Such a close relationship means that the variables could measure the same parameter, so measuring one variable (i.e. either VA or disease duration) could be sufficient to assess disease severity and could also be more cost effective.
Several studies investigating patients with low vision were reviewed. Messias et al. explored the relationship between the VF and electroretinography indices in patients with retinitis pigmentosa (RP). 48 The authors reported inverse correlations between these measurements (Table 3). Furthermore, Messias et al.'s study was one of the few that described, in its statistical methods, the classification used in reporting the strength of the correlation coefficients. 48 Murakami et al. investigated relationships between aqueous flare and both VA and the mean deviation (MD) of the static perimetry test in RP patients. 49 They observed that aqueous flare values are correlated with VA and MD (Table 3). McMahon et al. 50 explored the relationship between reading rate and saccadic frequency in patients with age-related macular degeneration (ARMD). Log reading rates of patients were highly correlated with the re-fixation rate for five-letter task scores, indicating a strong association between saccadic frequency in a sequencing task and patient reading rates (Table 3). Amore et al. explored the relationship between fixation stability and reading performance in ARMD. 51 They reported that reduced reading performance is significantly correlated with fixation instability (Table 3). 51 Cheong et al. investigated the relationship between visual span and reading performance in ARMD. 52 They stated that reading speed did not correlate with visual span size (Table 3). However, reading speed was correlated with information transfer rate, and visual span size was also related to scotoma size (Table 3). They also suggested that slower information transfer in patients with ARMD is correlated with VA, CS and reading acuity (Table 3). 52 Puell et al. 53 explored the relationship between macular pigment and VA in patients with ARMD and reported a significant relationship for both high- and low-contrast VA, indicating that when macular pigment increases, VA improves (  54 However, low correlation coefficients were also suggested as significant but r was < 0.30, indicating a relatively poor relationship. Chen et al. 55 reported a relationship between post-operative central fovea thickness and post-operative best-corrected VA in patients with idiopathic macular epi-retinal membranes (Table 3). However, no comparison was made with the outcomes of surgery without internal limiting membrane peeling, and therefore, the authors' suggestion could be inconclusive.
Other studies investigated factors related to refractive status, although their interpretations largely did not conform to the strength of the correlation coefficients found. The study by Mutti et al. 56 provides a good example of mixed interpretation of correlation values. They reported a significant correlation between residuals from the orthogonal regression at the age of 3 months and Mohindra retinoscopy (r = 0.22, p = 0.001) and dynamic retinoscopy (r = 0.15, p = 0.036). They concluded, 'more hyperopic levels of defocus at distance and close up were associated with poorer emmetropisation than that predicted by the underlying level of wet spherical equivalent refractive error of the right eye'. The authors also reported that worse VA at 3 months of age was related to more hyperopic wet spherical errors at 18 months (r = 0.18, p = 0.021), although these low r-values do not signify a clinically significant relationship. These results reveal only 2%-4% of shared variation. However, Mutti et al. 56 reported an interesting significant relationship between the change of refractive errors at the age of 3 months and total accommodative response at near and far distances (r = 0.41, p < 0.0001, r = 0.36, p < 0.0001). Furthermore, they reported a correlation between total accommodative response at near and far and wet spherical equivalent refractive error (r = 0.51, r = 0.47, p < 0.0001, for both). Lauriola 57 investigated the relationship between psychological parameters and child refraction but likewise included mixed interpretations of correlation values. 57 They reported a 'large' correlation in refraction between a mother and child, although the value was only 0.3, indicating a low or mild relationship. They also reported a significant correlation between myopia and both introversion and mental closeness (r = −0.15 and −0.12, respectively, p < 0.01); however, these relationships indicate only 1%-2% shared variance, which points to a relatively poor or negligible relationship.
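The shared-variance percentages quoted above follow directly from squaring the r-value. The following minimal Python sketch shows the arithmetic for some of the r-values mentioned in the text; the function name is hypothetical and the sketch is illustrative rather than a method used by any of the reviewed studies.

```python
def shared_variance_percent(r: float) -> float:
    """Return the coefficient of determination (r squared) as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1, inclusive")
    return 100.0 * r * r


# r-values quoted in the text; the sign of r does not affect shared variance.
for r in (0.40, 0.22, 0.18, -0.15, -0.12):
    print(f"r = {r:+.2f} -> shared variance = {shared_variance_percent(r):.2f}%")
```

Reporting this percentage alongside the r- and p-values makes it immediately visible when a statistically significant coefficient accounts for only a small fraction of the variation.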
Similarly, Khojasteh et al. reported a significant relationship between one multifocal electroretinogram parameter and OCT in the eyes of patients with diabetic macular oedema. 58 The r-value was only 0.06 (shared variance of about 0.36%), and such reporting highlights the importance of interpreting r-values in a systematic fashion. This study also reported an r-value of 0.48 in the same manner as the r-value of 0.06 without discussing percentages of shared variance. 58 Similarly, Zheng et al. reported that baseline and best-corrected VA levels correlated with baseline depressive symptoms (r = 0.14, p < 0.001, r = 0.17, p = 0.01, respectively) despite low r-values. 59 Moreover, Kattan et al. investigated the relationship between binocular summation and stereoacuity after strabismus surgery. 60 The authors reported a significant relationship between Sloan low-contrast acuity (LCA, 2.5% and 1.25%) and near and distant stereoacuity based on r-values ranging from −0.18 to 0.24 (p < 0.05), which actually indicates a poor relationship. Relationship strength was not fully discussed in these studies, and the percentage of shared variance was not included. 58,60 Finally, Leray et al. suggested that modifying corneal asphericity to induce spherical aberration (SA) can improve the depth of focus in hyperopic LASIK. The suggestion was based on the relationship between spherical aberration and changes in pseudoaccommodation values for intermediate vision (r = −0.320, p < 0.01 for negative SA; r = 0.270, p < 0.05 for positive SA) and near vision (r = −0.348, p < 0.01, and r = 0.268, p < 0.05, respectively). 61 The strength of the coefficients was mild at best; yet, the authors suggested that aspheric hyperopic LASIK can increase the depth of focus. Experts and reviewers in the field could suggest a cut-off point for the correlation coefficient to limit recommendations for changes to surgical procedures or invasive interventions.
By comparison, one of the few studies that discussed the proper interpretation of r² is Masters et al. 90 The authors noted that variables showing a correlation coefficient of r = 0.37 share only 14.7% of the variance and therefore concluded that, although there was a correlation, it was not the main determinant of the investigated outcome and the result was not conclusive.
Several of the reviewed studies investigated the relationship of ocular measurements with various ocular factors and did not find any evidence of a relationship between them; these studies are presented in detail in Table 4. 62,63,64,65,66,67,68,69,70 They investigated, for example, patients with myopia, dryness symptoms, keratoconus and AMD, as well as patients' QoL.

Ethical considerations
The author confirms that ethical clearance was not needed for this review. This study followed all ethical standards for research without direct contact with human or animal subjects.

Discussion
This study aimed to present and discuss methods for investigating and interpreting Pearson's and Spearman's correlations in ophthalmic data. Most reviewed studies did not discuss strengths of relationships, did not report assessment of normality of data distribution, did not report the basis for choosing Pearson's correlation coefficient or Spearman test and at times used contrasting interpretations of correlation coefficients.
One of the interesting observations whilst reviewing these papers was that some studies did not report or comment on the strength of significant relationships. 71,72,73,74,75,76,77,78,79,80,81 Furthermore, other studies focused more on finding a significant p-value than on relationship strength. 82,83 This misuse may indicate a need for standardised criteria for interpreting correlation coefficients. Additionally, authors need to discuss findings in terms of their clinical significance and implications. Statistical significance mainly indicates the reliability of the study data, whilst clinical significance reflects the impact of the findings on professionals in clinical practice. 85,86 Statistical significance depends heavily on sample size: in studies with large samples, even a small difference between groups can appear statistically significant. 85 Therefore, the clinical practitioner has to interpret cautiously whether this statistical significance has any clinical impact. 85 In some studies, the difference might be relatively minuscule and might not lead to a decision to change current clinical practice. 87,88 It has been suggested that clinical significance could be reflected in the extent of change, whether the change makes a marked difference to patients' lives, consumer acceptability, cost-effectiveness and ease of implementation. 89 However, the cut-off point for clinical significance is usually decided on the basis of the clinician's judgement, patients' preferences, side-effect profiles and economic factors. 85,87 In this review, the studies overwhelmingly did not include clinical significance as a main part of their discussion.
Source: Adapted from Ravikumar A, Sarver EJ, Applegate RA. Change in visual acuity is highly correlated with change in six image quality metrics independent of wavefront error and/or pupil diameter. J Vis. 2012;12(10)
In the peer-review process for medical journals, authors should be asked for a more in-depth discussion of the clinical implications of their findings relative to their reported statistical significance.
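The dependence of statistical significance on sample size can be illustrated with a minimal Python sketch. It approximates the two-sided p-value for the null hypothesis of zero correlation using Fisher's z-transformation with a normal approximation. This approximation, the function name and the sample sizes are assumptions chosen for illustration; the reviewed studies do not state which test they used, and an exact test based on the t-distribution would give slightly different values.

```python
import math
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: no correlation.

    Uses Fisher's z-transformation, under which atanh(r) * sqrt(n - 3)
    is approximately standard normal when the true correlation is zero.
    """
    if n < 4:
        raise ValueError("n must be at least 4 for this approximation")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The same weak coefficient (about 2% shared variance) at three sample sizes.
r = 0.15
for n in (30, 200, 1000):
    p = approx_p_value(r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"r = {r}, n = {n}: p = {p:.4f} ({verdict} at the 0.05 level)")
```

In this sketch the coefficient, and therefore the shared variance, is identical in all three cases; only the sample size changes the verdict. This is why a significant p-value alone says little about clinical relevance.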
One of the more important steps in investigating relationships is plotting the data in graphs, which is the best first step before performing any numerical analysis. 3,15 Such plots may help authors to avoid misusing correlation coefficients for relationships that are not adequately characterised by the analysis. 3 Schober et al. 3 present examples of these relationships (e.g. see Schober et al. Figures 2A, 3A, and 3B-D). 3 Researchers should not depend on correlation coefficient values in isolation but should plot data for a visual inspection of the relationship. 3,15 Scatter plot analysis might show a monotonic trend, a good example of which is presented by Ravikumar et al. 90 (Figure 1, adapted with permission from their paper). Another example can be seen in Murakami et al.'s study, where the correlations of aqueous flare with VA and MD demonstrate how a scatter plot illustrates the true strength of a relationship and the extent to which variance is shared (Figure 2, adapted with permission from their paper). 49 The figure in Murakami et al.'s study shows that where aqueous flare is < 10, most participants are < 50 years old, indicating a confounding factor of age and explaining why only 10% of the variance is explained by the association. A more appropriate analysis might be obtained by recruiting additional participants in the 50+ age group, and a stronger relationship might then be observed.
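The risk of relying on the coefficient alone can be illustrated with a minimal Python sketch. It uses the first two data sets of Anscombe's quartet, a well-known teaching example that is not part of the reviewed studies and is introduced here only for illustration. The first data set is roughly linear with scatter, whereas the second follows a smooth curve, yet both give almost the same Pearson coefficient. The sketch prints a numerical summary only; a scatter plot of each data set would reveal the difference at a glance.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Anscombe's quartet, data sets I and II (same x values for both).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(f"Data set I  (linear with scatter): r = {r_linear:.3f}")
print(f"Data set II (smooth curve):        r = {r_curved:.3f}")
```

Because the two coefficients are nearly identical, only visual inspection shows that a linear correlation coefficient characterises the first relationship adequately and the second poorly.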
In conclusion, this review focuses on the use of Pearson's and Spearman's statistical tests for assessing relationships in ophthalmic data and methods of interpretation and reporting. A peer review of studies that present correlations should require authors to report normality of their data, r-values, p-values and to what degree the association explains the variance between two factors or measures. Furthermore, the clinical implication of their findings should be stated clearly, and an in-depth discussion would be preferable. Finally, association does not imply causation, and more detailed analysis can be obtained by regression.