Overview

Dataset statistics

Number of variables: 15
Number of observations: 891
Missing cells: 869
Missing cells (%): 6.5%
Duplicate rows: 53
Duplicate rows (%): 5.9%
Total size in memory: 80.7 KiB
Average record size in memory: 92.7 B
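
The percentages and the average record size in this block follow from the raw counts. A minimal check, using only figures quoted in this report (the per-column missing counts come from the variable sections further down), might look like this. The variable names are illustrative.

```python
# Sanity-check the overview figures using counts quoted in this report.
n_vars, n_obs = 15, 891

# Missing counts per column, as listed in the individual variable sections.
missing_by_column = {"age": 177, "deck": 688, "embarked": 2, "embark_town": 2}

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_vars * n_obs)

# The report lists 53 duplicate rows, expressed as a share of all observations.
duplicate_pct = 100 * 53 / n_obs

# 80.7 KiB in total, spread over 891 records (1 KiB = 1024 bytes).
avg_record_bytes = 80.7 * 1024 / n_obs

print(missing_cells, round(missing_pct, 1), round(duplicate_pct, 1), round(avg_record_bytes, 1))
```

The missing-cell percentage is taken over all cells (variables times observations), whereas the duplicate percentage is taken over observations only.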

Variable types

Categorical: 8
Numeric: 4
Boolean: 3

Alerts

Dataset has 53 (5.9%) duplicate rows [Duplicates]
adult_male is highly overall correlated with alive and 3 other fields [High correlation]
age is highly overall correlated with who [High correlation]
alive is highly overall correlated with adult_male and 3 other fields [High correlation]
alone is highly overall correlated with parch and 1 other field [High correlation]
class is highly overall correlated with deck and 1 other field [High correlation]
deck is highly overall correlated with class and 1 other field [High correlation]
embark_town is highly overall correlated with embarked [High correlation]
embarked is highly overall correlated with embark_town [High correlation]
parch is highly overall correlated with alone [High correlation]
pclass is highly overall correlated with class and 1 other field [High correlation]
sex is highly overall correlated with adult_male and 3 other fields [High correlation]
sibsp is highly overall correlated with alone [High correlation]
survived is highly overall correlated with adult_male and 3 other fields [High correlation]
who is highly overall correlated with adult_male and 4 other fields [High correlation]
age has 177 (19.9%) missing values [Missing]
deck has 688 (77.2%) missing values [Missing]
sibsp has 608 (68.2%) zeros [Zeros]
parch has 678 (76.1%) zeros [Zeros]
fare has 15 (1.7%) zeros [Zeros]

Reproduction

Analysis started: 2026-04-28 12:32:51.920657
Analysis finished: 2026-04-28 12:33:06.436215
Duration: 14.52 seconds
Software version: ydata-profiling v4.18.4
Download configuration: config.json
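
The duration is the difference between the two timestamps. As a small sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2026-04-28 12:32:51.920657")
finished = datetime.fromisoformat("2026-04-28 12:33:06.436215")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```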

Variables

survived
Categorical

High correlation 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value  Count
0      549
1      342

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring characters

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  891    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  891    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  891    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

pclass
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value  Count
3      491
1      216
2      184

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring characters

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  891    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  891    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  891    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

sex
Categorical

High correlation 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value   Count
male    577
female  314

Length

Max length: 6
Median length: 4
Mean length: 4.704826
Min length: 4

Characters and Unicode

Total characters: 4192
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Most occurring characters

Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  4192   100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4192   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4192   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%
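
The character-level figures for sex can be reproduced from the two value counts alone. The sketch below is one way to do that with the standard library; it is not taken from the profiling tool, and the variable names are illustrative.

```python
from collections import Counter
import unicodedata

# Value counts for `sex`, as reported in the Common Values table.
value_counts = {"male": 577, "female": 314}

# Each character of a value occurs once per row holding that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

# Unicode general category codes; "Ll" is Lowercase Letter.
categories = {unicodedata.category(ch) for ch in char_counts}

print(char_counts.most_common())
print(total_chars, round(mean_length, 6), categories)
```

The letter e is counted twice for every "female" row, which is why it leads the table with 1205 occurrences.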

age
Real number (ℝ)

High correlation  Missing 

Distinct: 88
Distinct (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.699118
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 4
Q1: 20.125
Median: 28
Q3: 38
95-th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.526497
Coefficient of variation (CV): 0.48912219
Kurtosis: 0.17827415
Mean: 29.699118
Median Absolute Deviation (MAD): 9
Skewness: 0.38910778
Sum: 21205.17
Variance: 211.01912
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
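
Several of the age statistics are derived from one another. As a rough cross-check, the sketch below recomputes them from the rounded figures quoted in this section, so the results agree only up to rounding. The variable names are illustrative.

```python
# Figures for `age`, copied from this section of the report.
n_obs, n_missing = 891, 177
mean, std = 29.699118, 14.526497
q1, q3 = 20.125, 38.0
minimum, maximum = 0.42, 80.0

n_valid = n_obs - n_missing      # statistics are computed over non-missing rows
value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
variance = std ** 2              # Variance
cv = std / mean                  # Coefficient of variation
total = mean * n_valid           # Sum

print(n_valid, round(value_range, 2), iqr, round(variance, 3), round(cv, 5), round(total, 2))
```

Note that the mean and sum are taken over the 714 non-missing ages, not over all 891 observations.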

Most frequent values

Value              Count  Frequency (%)
24                 30     3.4%
22                 27     3.0%
18                 26     2.9%
28                 25     2.8%
30                 25     2.8%
19                 25     2.8%
21                 24     2.7%
25                 23     2.6%
36                 22     2.5%
29                 20     2.2%
Other values (78)  467    52.4%
(Missing)          177    19.9%

Smallest values

Value  Count  Frequency (%)
0.42   1      0.1%
0.67   1      0.1%
0.75   2      0.2%
0.83   2      0.2%
0.92   1      0.1%
1      7      0.8%
2      10     1.1%
3      6      0.7%
4      10     1.1%
5      4      0.4%

Largest values

Value  Count  Frequency (%)
80     1      0.1%
74     1      0.1%
71     2      0.2%
70.5   1      0.1%
70     2      0.2%
66     1      0.1%
65     3      0.3%
64     2      0.2%
63     2      0.2%
62     4      0.4%

sibsp
Real number (ℝ)

High correlation  Zeros 

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.52300786
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.1027434
Coefficient of variation (CV): 2.1084644
Kurtosis: 17.88042
Mean: 0.52300786
Median Absolute Deviation (MAD): 0
Skewness: 3.6953517
Sum: 466
Variance: 1.2160431
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Most frequent values

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
4      18     2.0%
3      16     1.8%
8      7      0.8%
5      5      0.6%

Smallest values

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
3      16     1.8%
4      18     2.0%
5      5      0.6%
8      7      0.8%

Largest values

Value  Count  Frequency (%)
8      7      0.8%
5      5      0.6%
4      18     2.0%
3      16     1.8%
2      28     3.1%
1      209    23.5%
0      608    68.2%

parch
Real number (ℝ)

High correlation  Zeros 

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.38159371
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.80605722
Coefficient of variation (CV): 2.1123441
Kurtosis: 9.7781252
Mean: 0.38159371
Median Absolute Deviation (MAD): 0
Skewness: 2.749117
Sum: 340
Variance: 0.64972824
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Most frequent values

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
5      5      0.6%
3      5      0.6%
4      4      0.4%
6      1      0.1%

Smallest values

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
3      5      0.6%
4      4      0.4%
5      5      0.6%
6      1      0.1%

Largest values

Value  Count  Frequency (%)
6      1      0.1%
5      5      0.6%
4      4      0.4%
3      5      0.6%
2      80     9.0%
1      118    13.2%
0      678    76.1%

fare
Real number (ℝ)

Zeros 

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.204208
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95-th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.693429
Coefficient of variation (CV): 1.5430725
Kurtosis: 33.398141
Mean: 32.204208
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.7873165
Sum: 28693.949
Variance: 2469.4368
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value               Count  Frequency (%)
8.05                43     4.8%
13                  42     4.7%
7.8958              38     4.3%
7.75                34     3.8%
26                  31     3.5%
10.5                24     2.7%
7.925               18     2.0%
7.775               16     1.8%
7.2292              15     1.7%
26.55               15     1.7%
Other values (238)  615    69.0%

Smallest values

Value   Count  Frequency (%)
0       15     1.7%
4.0125  1      0.1%
5       1      0.1%
6.2375  1      0.1%
6.4375  1      0.1%
6.45    1      0.1%
6.4958  2      0.2%
6.75    2      0.2%
6.8583  1      0.1%
6.95    1      0.1%

Largest values

Value     Count  Frequency (%)
512.3292  3      0.3%
263       4      0.4%
262.375   2      0.2%
247.5208  2      0.2%
227.525   4      0.4%
221.7792  1      0.1%
211.5     1      0.1%
211.3375  3      0.3%
164.8667  2      0.2%
153.4625  3      0.3%

embarked
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB

Value  Count
S      644
C      168
Q      77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Common Values

Value      Count  Frequency (%)
S          644    72.3%
C          168    18.9%
Q          77     8.6%
(Missing)  2      0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
s      644    72.4%
c      168    18.9%
q      77     8.7%
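
The Common Values table and the plot table for embarked report slightly different percentages (72.3% vs 72.4% for S, 8.6% vs 8.7% for Q). The figures are consistent with two different denominators: all 891 observations in the first case, and the 889 non-missing values in the second. That reading is inferred from the numbers rather than stated by the report; a short check:

```python
# Counts for `embarked`, copied from the Common Values table.
counts = {"S": 644, "C": 168, "Q": 77}
n_obs = 891                        # all observations, including the 2 missing
n_present = sum(counts.values())   # non-missing values only

share_of_all = {k: round(100 * v / n_obs, 1) for k, v in counts.items()}
share_of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}

print(n_present)
print(share_of_all)
print(share_of_present)
```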

Most occurring characters

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  889    100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  889    100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  889    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

class
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value   Count
Third   491
First   216
Second  184

Length

Max length: 6
Median length: 5
Mean length: 5.2065095
Min length: 5

Characters and Unicode

Total characters: 4639
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Third
2nd row: First
3rd row: Third
4th row: First
5th row: Third

Common Values

Value   Count  Frequency (%)
Third   491    55.1%
First   216    24.2%
Second  184    20.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
third   491    55.1%
first   216    24.2%
second  184    20.7%

Most occurring characters

Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3748   80.8%
Uppercase Letter  891    19.2%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
i      707    18.9%
r      707    18.9%
d      675    18.0%
h      491    13.1%
s      216    5.8%
t      216    5.8%
e      184    4.9%
c      184    4.9%
o      184    4.9%
n      184    4.9%

Uppercase Letter
Value  Count  Frequency (%)
T      491    55.1%
F      216    24.2%
S      184    20.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4639   100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4639   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

who
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value  Count
man    537
woman  271
child  83

Length

Max length: 5
Median length: 3
Mean length: 3.7946128
Min length: 3

Characters and Unicode

Total characters: 3381
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: man
2nd row: woman
3rd row: woman
4th row: woman
5th row: man

Common Values

Value  Count  Frequency (%)
man    537    60.3%
woman  271    30.4%
child  83     9.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
man    537    60.3%
woman  271    30.4%
child  83     9.3%

Most occurring characters

Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3381   100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3381   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3381   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

adult_male
Boolean

High correlation 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1023.0 B

Value  Count  Frequency (%)
True   537    60.3%
False  354    39.7%

deck
Categorical

High correlation  Missing 

Distinct: 7
Distinct (%): 3.4%
Missing: 688
Missing (%): 77.2%
Memory size: 1.3 KiB

Value             Count
C                 59
B                 47
D                 33
E                 32
A                 15
Other values (2)  17

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 203
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: C
3rd row: E
4th row: G
5th row: C

Common Values

Value      Count  Frequency (%)
C          59     6.6%
B          47     5.3%
D          33     3.7%
E          32     3.6%
A          15     1.7%
F          13     1.5%
G          4      0.4%
(Missing)  688    77.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
c      59     29.1%
b      47     23.2%
d      33     16.3%
e      32     15.8%
a      15     7.4%
f      13     6.4%
g      4      2.0%

Most occurring characters

Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  203    100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  203    100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  203    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

embark_town
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB

Value        Count
Southampton  644
Cherbourg    168
Queenstown   77

Length

Max length: 11
Median length: 11
Mean length: 10.535433
Min length: 9

Characters and Unicode

Total characters: 9366
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Southampton
2nd row: Cherbourg
3rd row: Southampton
4th row: Southampton
5th row: Southampton

Common Values

Value        Count  Frequency (%)
Southampton  644    72.3%
Cherbourg    168    18.9%
Queenstown   77     8.6%
(Missing)    2      0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value        Count  Frequency (%)
southampton  644    72.4%
cherbourg    168    18.9%
queenstown   77     8.7%

Most occurring characters

Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
S                 644    6.9%
a                 644    6.9%
m                 644    6.9%
p                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  8477   90.5%
Uppercase Letter  889    9.5%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
o                 1533   18.1%
t                 1365   16.1%
u                 889    10.5%
h                 812    9.6%
n                 798    9.4%
a                 644    7.6%
m                 644    7.6%
p                 644    7.6%
r                 336    4.0%
e                 322    3.8%
Other values (4)  490    5.8%

Uppercase Letter
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  9366   100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
S                 644    6.9%
a                 644    6.9%
m                 644    6.9%
p                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9366   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
S                 644    6.9%
a                 644    6.9%
m                 644    6.9%
p                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

alive
Boolean

High correlation 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1023.0 B

Value  Count  Frequency (%)
False  549    61.6%
True   342    38.4%

alone
Boolean

High correlation 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1023.0 B

Value  Count  Frequency (%)
True   537    60.3%
False  354    39.7%

Interactions

[Figures: 16 interaction plots]

Correlations

[Figure: correlation heatmap]

             adult_male  age     alive   alone   class   deck    embark_town  embarked  fare    parch   pclass  sex     sibsp   survived  who
adult_male   1.000       0.392   0.554   0.401   0.096   0.208   0.101        0.101     0.164   0.392   0.096   0.906   0.323   0.554     0.999
age          0.392       1.000   0.155   0.351   0.269   0.197   0.065        0.065     0.135   -0.254  0.269   0.099   -0.182  0.155     0.650
alive        0.554       0.155   1.000   0.198   0.337   0.107   0.166        0.166     0.283   0.157   0.337   0.540   0.187   0.998     0.563
alone        0.401       0.351   0.198   1.000   0.127   0.196   0.110        0.110     0.304   0.686   0.127   0.300   0.837   0.198     0.451
class        0.096       0.269   0.337   0.127   1.000   0.626   0.260        0.260     0.479   0.022   1.000   0.130   0.148   0.337     0.137
deck         0.208       0.197   0.107   0.196   0.626   1.000   0.147        0.147     0.182   0.136   0.626   0.238   0.114   0.107     0.273
embark_town  0.101       0.065   0.166   0.110   0.260   0.147   1.000        1.000     0.196   0.052   0.260   0.113   0.092   0.166     0.079
embarked     0.101       0.065   0.166   0.110   0.260   0.147   1.000        1.000     0.196   0.052   0.260   0.113   0.092   0.166     0.079
fare         0.164       0.135   0.283   0.304   0.479   0.182   0.196        0.196     1.000   0.410   0.479   0.189   0.447   0.283     0.159
parch        0.392       -0.254  0.157   0.686   0.022   0.136   0.052        0.052     0.410   1.000   0.022   0.247   0.450   0.157     0.386
pclass       0.096       0.269   0.337   0.127   1.000   0.626   0.260        0.260     0.479   0.022   1.000   0.130   0.148   0.337     0.137
sex          0.906       0.099   0.540   0.300   0.130   0.238   0.113        0.113     0.189   0.247   0.130   1.000   0.206   0.540     0.947
sibsp        0.323       -0.182  0.187   0.837   0.148   0.114   0.092        0.092     0.447   0.450   0.148   0.206   1.000   0.187     0.373
survived     0.554       0.155   0.998   0.198   0.337   0.107   0.166        0.166     0.283   0.157   0.337   0.540   0.187   1.000     0.563
who          0.999       0.650   0.563   0.451   0.137   0.273   0.079        0.079     0.159   0.386   0.137   0.947   0.373   0.563     1.000
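
The High correlation alerts in the overview can be cross-checked against this matrix. The sketch below assumes a cut-off of 0.5 on the absolute value; the report does not state the threshold, so treat it as an assumption that happens to reproduce the listed alerts. The matrix rows are copied from the table above, and the variable names are illustrative.

```python
names = ["adult_male", "age", "alive", "alone", "class", "deck", "embark_town",
         "embarked", "fare", "parch", "pclass", "sex", "sibsp", "survived", "who"]

# One row per variable, in the same order as `names`.
matrix_text = """
1.000 0.392 0.554 0.401 0.096 0.208 0.101 0.101 0.164 0.392 0.096 0.906 0.323 0.554 0.999
0.392 1.000 0.155 0.351 0.269 0.197 0.065 0.065 0.135 -0.254 0.269 0.099 -0.182 0.155 0.650
0.554 0.155 1.000 0.198 0.337 0.107 0.166 0.166 0.283 0.157 0.337 0.540 0.187 0.998 0.563
0.401 0.351 0.198 1.000 0.127 0.196 0.110 0.110 0.304 0.686 0.127 0.300 0.837 0.198 0.451
0.096 0.269 0.337 0.127 1.000 0.626 0.260 0.260 0.479 0.022 1.000 0.130 0.148 0.337 0.137
0.208 0.197 0.107 0.196 0.626 1.000 0.147 0.147 0.182 0.136 0.626 0.238 0.114 0.107 0.273
0.101 0.065 0.166 0.110 0.260 0.147 1.000 1.000 0.196 0.052 0.260 0.113 0.092 0.166 0.079
0.101 0.065 0.166 0.110 0.260 0.147 1.000 1.000 0.196 0.052 0.260 0.113 0.092 0.166 0.079
0.164 0.135 0.283 0.304 0.479 0.182 0.196 0.196 1.000 0.410 0.479 0.189 0.447 0.283 0.159
0.392 -0.254 0.157 0.686 0.022 0.136 0.052 0.052 0.410 1.000 0.022 0.247 0.450 0.157 0.386
0.096 0.269 0.337 0.127 1.000 0.626 0.260 0.260 0.479 0.022 1.000 0.130 0.148 0.337 0.137
0.906 0.099 0.540 0.300 0.130 0.238 0.113 0.113 0.189 0.247 0.130 1.000 0.206 0.540 0.947
0.323 -0.182 0.187 0.837 0.148 0.114 0.092 0.092 0.447 0.450 0.148 0.206 1.000 0.187 0.373
0.554 0.155 0.998 0.198 0.337 0.107 0.166 0.166 0.283 0.157 0.337 0.540 0.187 1.000 0.563
0.999 0.650 0.563 0.451 0.137 0.273 0.079 0.079 0.159 0.386 0.137 0.947 0.373 0.563 1.000
"""
matrix = [[float(x) for x in line.split()] for line in matrix_text.strip().splitlines()]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report

# For each variable, list the other variables at or above the threshold.
high = {}
for name, row in zip(names, matrix):
    high[name] = [other for other, r in zip(names, row)
                  if other != name and abs(r) >= THRESHOLD]

for name, partners in high.items():
    print(f"{name}: {', '.join(partners) or '(none)'}")
```

Under this assumption fare has no partner above the cut-off, which matches the absence of a High correlation alert for it, and who has five partners, matching "adult_male and 4 other fields".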

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First 10 rows

     survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone
0    0         3       male    22.0  1      0      7.2500   S         Third   man    True        NaN   Southampton  no     False
1    1         1       female  38.0  1      0      71.2833  C         First   woman  False       C     Cherbourg    yes    False
2    1         3       female  26.0  0      0      7.9250   S         Third   woman  False       NaN   Southampton  yes    True
3    1         1       female  35.0  1      0      53.1000  S         First   woman  False       C     Southampton  yes    False
4    0         3       male    35.0  0      0      8.0500   S         Third   man    True        NaN   Southampton  no     True
5    0         3       male    NaN   0      0      8.4583   Q         Third   man    True        NaN   Queenstown   no     True
6    0         1       male    54.0  0      0      51.8625  S         First   man    True        E     Southampton  no     True
7    0         3       male    2.0   3      1      21.0750  S         Third   child  False       NaN   Southampton  no     False
8    1         3       female  27.0  0      2      11.1333  S         Third   woman  False       NaN   Southampton  yes    False
9    1         2       female  14.0  1      0      30.0708  C         Second  child  False       NaN   Cherbourg    yes    False

Last 10 rows

     survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone
881  0         3       male    33.0  0      0      7.8958   S         Third   man    True        NaN   Southampton  no     True
882  0         3       female  22.0  0      0      10.5167  S         Third   woman  False       NaN   Southampton  no     True
883  0         2       male    28.0  0      0      10.5000  S         Second  man    True        NaN   Southampton  no     True
884  0         3       male    25.0  0      0      7.0500   S         Third   man    True        NaN   Southampton  no     True
885  0         3       female  39.0  0      5      29.1250  Q         Third   woman  False       NaN   Queenstown   no     False
886  0         2       male    27.0  0      0      13.0000  S         Second  man    True        NaN   Southampton  no     True
887  1         1       female  19.0  0      0      30.0000  S         First   woman  False       B     Southampton  yes    True
888  0         3       female  NaN   1      2      23.4500  S         Third   woman  False       NaN   Southampton  no     False
889  1         1       male    26.0  0      0      30.0000  C         First   man    True        C     Cherbourg    yes    True
890  0         3       male    32.0  0      0      7.7500   Q         Third   man    True        NaN   Queenstown   no     True

Duplicate rows

Most frequently occurring

     survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone  # duplicates
35   0         3       male    NaN   0      0      7.8958   S         Third   man    True        NaN   Southampton  no     True   13
36   0         3       male    NaN   0      0      8.0500   S         Third   man    True        NaN   Southampton  no     True   12
33   0         3       male    NaN   0      0      7.7500   Q         Third   man    True        NaN   Queenstown   no     True   8
46   1         3       female  NaN   0      0      7.7500   Q         Third   woman  False       NaN   Queenstown   yes    True   7
9    0         2       male    NaN   0      0      0.0000   S         Second  man    True        NaN   Southampton  no     True   6
30   0         3       male    NaN   0      0      7.2250   C         Third   man    True        NaN   Cherbourg    no     True   5
31   0         3       male    NaN   0      0      7.2292   C         Third   man    True        NaN   Cherbourg    no     True   5
40   0         3       male    NaN   8      2      69.5500  S         Third   man    True        NaN   Southampton  no     False  4
2    0         2       male    23.0  0      0      13.0000  S         Second  man    True        NaN   Southampton  no     True   3
3    0         2       male    25.0  0      0      13.0000  S         Second  man    True        NaN   Southampton  no     True   3
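
The ten groups listed here already account for 66 rows (13 + 12 + 8 + 7 + 6 + 5 + 5 + 4 + 3 + 3), which exceeds the 53 duplicate rows quoted in the overview. The headline figure therefore presumably counts distinct repeated row patterns rather than individual rows; that reading is an inference from the numbers, not something the report states. The sketch below illustrates the difference between the two counts on hypothetical toy rows that are not taken from the dataset.

```python
from collections import Counter

# "# duplicates" column of the ten groups shown in the table.
shown_group_sizes = [13, 12, 8, 7, 6, 5, 5, 4, 3, 3]
rows_in_shown_groups = sum(shown_group_sizes)

# Hypothetical toy rows (not from the dataset), stored as hashable tuples.
rows = [
    ("male", 3, "S"), ("male", 3, "S"), ("male", 3, "S"),
    ("female", 3, "Q"), ("female", 3, "Q"),
    ("male", 1, "C"),
]

occurrences = Counter(rows)
repeated = {row: n for row, n in occurrences.items() if n > 1}

n_patterns = len(repeated)                    # distinct rows that occur more than once
n_rows_involved = sum(repeated.values())      # every row belonging to such a pattern
n_extra_copies = n_rows_involved - n_patterns # copies beyond the first of each pattern

print(rows_in_shown_groups, n_patterns, n_rows_involved, n_extra_copies)
```

Which of these three counts is wanted depends on the use: deduplication removes the extra copies, while a data-quality summary may report either patterns or rows involved.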