Finding the Correlation Coefficient (Pearson’s Product Moment Coefficient) and z-Score Normalization of a Data Set in Python

● Problem Statement

Suppose that a hospital tested the age and body fat data for 18 randomly selected adults
with the following results:

Age 23 23 27 27 39 41 47 49 50
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2

Age 52 54 54 56 57 58 58 60 61
%fat 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7

(a) Normalize the two attributes age and %fat based on z-score normalization.
(b) Calculate the correlation coefficient (Pearson’s product moment coefficient). Are
these two attributes positively or negatively correlated? Compute their covariance.
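
For reference, the quantities asked for in (a) and (b) can be written as below. This is a sketch that assumes the sample standard deviation (dividing by n - 1), which is what Python's statistics.stdev computes. A textbook solution may use the population form (dividing by n) instead, which changes the z-scores and the covariance but not the correlation coefficient. The symbols a_i, f_i (the i-th age and %fat values), their means, and s_A, s_F (their standard deviations) are notation introduced here, not part of the original problem.

```latex
\begin{aligned}
z_i &= \frac{a_i - \bar{a}}{s_A}, \qquad
s_A = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2} \\
\operatorname{cov}(A, F) &= \frac{1}{n-1}\sum_{i=1}^{n}\left(a_i - \bar{a}\right)\left(f_i - \bar{f}\right) \\
r_{A,F} &= \frac{\operatorname{cov}(A, F)}{s_A \, s_F}
\end{aligned}
```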

● Algorithm

Input : The data sets dataSetAgeArray and dataSetFatArray,
and dataSetNo, the number of elements in each list
Output : The z-score normalized values and the correlation coefficient
Data Structure : dataSetAgeArray and dataSetFatArray are lists that store
user-entered or predefined data.
Description : Uses functions from the statistics and scipy packages to keep the
numerical calculations simple.
Step 1 : Start
Step 2 : dataSetNo ← 18
Step 3 : dataSetAgeArray ← 23,23,27,27,39,41,47,49,50,52,54,54,56,57,58,58,60,61
Step 4 : dataSetFatArray ← 9.5,26.5,7.8,17.8,31.4,25.9,27.4,
27.2,31.2,34.6,42.5,28.8,33.4,30.2,34.1,32.9,41.2,35.7
Step 5 : Display “Calculating z-score normalization”
Step 6 : meanValueAge ← Call mean function with dataSetAgeArray
from statistics package
Step 7 : stDVAge ← Call stdev function with dataSetAgeArray
from statistics package
Step 8 : meanValueFat ← Call mean function with dataSetFatArray
from statistics package
Step 9 : stDVFat ← Call stdev function with dataSetFatArray
from statistics package
Step 10 : Repeat steps 11 to 14 for i ← 0 to dataSetNo - 1
Step 11 : normalizedData ← call zNor with dataSetAgeArray[i],meanValueAge,stDVAge
Step 12 : add normalizedData to newAgeNor
Step 13 : normalizedData ← call zNor with dataSetFatArray[i],meanValueFat,stDVFat
Step 14 : add normalizedData to newFatNor
Step 15 : Display “After doing z-score normalization on Ages” newAgeNor
Step 16 : Display “After doing z-score normalization on Fat” newFatNor
Step 17 : corr ← call pearsonr with dataSetAgeArray, dataSetFatArray
Step 18 : Display “Pearsons correlation: ” corr[0]
Step 19 : Stop

Algorithm for zNor

Input : The value to normalize as num
The mean value of the list as mean
The standard deviation of the list as stdDv
Output : Returns the z-score of num, rounded to 2 decimal places
Step 1 : Start
Step 2 : Return (num - mean) / stdDv rounded to 2 decimal places
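
The zNor steps above can also be applied to a whole list at once. The following is a minimal sketch, not the code used in this post: the helper name z_scores and its ndigits parameter are hypothetical, and it assumes the sample standard deviation from statistics.stdev, as in the algorithm.

```python
import statistics

def z_scores(values, ndigits=2):
    """Return the z-score of every value, rounded to ndigits decimals.

    Hypothetical helper for illustration; uses the sample standard
    deviation (statistics.stdev divides by n - 1).
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [round((v - mean) / std, ndigits) for v in values]

# Age data from the problem statement.
ages = [23, 23, 27, 27, 39, 41, 47, 49, 50,
        52, 54, 54, 56, 57, 58, 58, 60, 61]

age_z = z_scores(ages)
print(age_z)
```

Computing the mean and standard deviation once inside the helper avoids passing them around for each element, at the cost of hiding them from the caller.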

● Source Code

import statistics
from scipy.stats import pearsonr

def zNor (num,mean,stdDv):
    return round((num-mean)/stdDv,2)

dataSetNo=18
dataSetAgeArray=[23,23,27,27,39,41,47,49,50,52,54,54,56,57,58,58,60,61]
dataSetFatArray=[9.5,26.5,7.8,17.8,31.4,25.9,27.4,27.2,31.2,34.6,42.5,28.8,33.4,30.2,34.1,32.9,41.2,35.7]

'''
Optional: read the data from the user instead of using the predefined lists.

dataSetNo=int(input("Enter the number of data points:\t"))
dataSetAgeArray=[]
dataSetFatArray=[]
for data in range(dataSetNo):
    dataSetAgeArray.append(float(input("Enter Age")))
    dataSetFatArray.append(float(input("Enter fat")))
'''


print("\nCalculating z-score normalization")
newAgeNor=[]
newFatNor=[]
meanValueAge=statistics.mean(dataSetAgeArray)
stDVAge=statistics.stdev(dataSetAgeArray)
meanValueFat=statistics.mean(dataSetFatArray)
stDVFat=statistics.stdev(dataSetFatArray)

for i in range(dataSetNo):
    normalizedData=zNor(dataSetAgeArray[i],meanValueAge,stDVAge)
    newAgeNor.append(normalizedData)
    normalizedData=zNor(dataSetFatArray[i],meanValueFat,stDVFat)
    newFatNor.append(normalizedData)

print("After doing z-score normalization on Ages\n",newAgeNor)
print("After doing z-score normalization on Fat\n",newFatNor)

corr=pearsonr(dataSetAgeArray, dataSetFatArray)

print("Pearsons correlation:\t%.2f"%corr[0])

● Output:

Calculating z-score normalization
After doing z-score normalization on Ages
[-1.77, -1.77, -1.47, -1.47, -0.56, -0.41, 0.04, 0.19, 0.27, 0.42, 0.57, 0.57, 0.72, 0.8, 0.87, 0.87,
1.03, 1.1]
After doing z-score normalization on Fat
[-2.08, -0.25, -2.27, -1.19, 0.28, -0.31, -0.15, -0.17, 0.26, 0.63, 1.48, 0.0, 0.5, 0.15, 0.57, 0.44,
1.34, 0.75]
Pearsons correlation: 0.82

● Discussion:

Here I used the statistics and scipy packages for their built-in functions. Using those
functions made the program shorter and saved a lot of time. The Pearson coefficient of
0.82 is positive, so age and %fat are positively correlated.
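
Part (b) of the problem also asks for the covariance, which the program above does not print. The following is a minimal sketch of one way to compute it by hand, assuming the sample form (dividing by n - 1) so that it is consistent with statistics.stdev. The function name sample_covariance and the variable names are hypothetical. It also rebuilds Pearson's coefficient from the covariance as a cross-check against pearsonr.

```python
import math

# Data from the problem statement.
ages = [23, 23, 27, 27, 39, 41, 47, 49, 50,
        52, 54, 54, 56, 57, 58, 58, 60, 61]
fats = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
        34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

cov = sample_covariance(ages, fats)

# The covariance of a list with itself is its sample variance, so
# Pearson's r is the covariance divided by both standard deviations.
r = cov / math.sqrt(sample_covariance(ages, ages) * sample_covariance(fats, fats))

print("Sample covariance:\t%.2f" % cov)
print("Pearsons correlation:\t%.2f" % r)
```

A positive covariance agrees with the positive correlation coefficient: both say that %fat tends to increase with age in this sample.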
