Linked in: connecting data to enhance research

HIV Australia | Vol. 10 No. 1 | June 2012

Dean Murphy considers the benefits and challenges associated with data linkage of health record information.

Information about health is recorded throughout people’s lives as they come in contact with the health system. Much of this information is kept in databases by health departments, hospitals, and other organisations. Record linkage is the task of identifying records corresponding to the same entity from one or more data sources.1

Linking to records that already exist is relatively quick and provides data on health endpoints, across whole populations, that would be prohibitively expensive to collect in any other way. These endpoints include diagnosis of notifiable diseases, hospitalisations, emergency department presentations, or even death.

Record linkage allows for the possibility of conducting sub-studies looking at predictors of specific clinical outcomes or endpoints within a particular population.

Data linkage can be conducted at study, institutional, regional or state/national level. Some jurisdictions, notably the province of Alberta in Canada, have extensive linkage of health and increasingly also social data.

In Australia, linkage has been used for health and medical research in Western Australia since the 1970s and is now coordinated by a collaboration of academic and health institutions and the state department of health.

The WA Data Linkage System was established in 1995 to connect all available health and related information for the WA population. In New South Wales, the Centre for Health Record Linkage (CHeReL) was established in 2006 and is jointly managed by the Cancer Institute NSW and the NSW Ministry of Health. The CHeReL undertakes linkage between research projects and existing health databases.

Linkage can also be made between data and biological specimens or genetic material. Human tissue collected in individual studies can be stored and used in the future to link back to the demographic and other data collected from the individuals concerned.

Perhaps the most famous example of record linkage was the ill-fated Icelandic Health Sector Database (HSD) in which the genome of the Icelandic population was to be matched with medical records to determine genetic predictors of health outcomes.2

This information could then be leveraged for profit by a private company, deCODE genetics, which would commercialise the results of the research. It was hoped that this would lead, in particular, to the development of pharmaceutical products. The Icelandic biobank was built on an opt-out model in relation to the inclusion of people’s medical records.

An Australian example of data linkage is the 45 and Up Study, a study of ageing in which over 250,000 men and women aged over 45 from across NSW (about 10% of this age group) have been recruited for prospective follow-up.

Information is collected via a baseline questionnaire from participants selected at random from the Medicare Australia enrolment database (with oversampling of residents in rural areas and those aged 80 years and over). A follow-up questionnaire is mailed every five years.

The study involves linking to medical records including data from the NSW Admitted Patient Data Collection, Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) datasets, NSW Central Cancer Registry and Australian Bureau of Statistics (ABS) mortality data.

HIV research and data linkage

In HIV-related research, linkage is also common. Linkage can be conducted between studies of people living with HIV and existing databases to determine associations between data collected in studies and particular health outcomes – for example, whether patterns of antiretroviral use are associated with increased or decreased morbidities.

Additionally, linkage can be conducted to more accurately determine risk of HIV acquisition within particular groups.

For example, Jin et al. (2008)3 conducted linkage to determine HIV incidence among a cohort of gay men, in addition to HIV acquisition determined through testing in the study. Identifiers from participants, including name code (first two letters of the first and last name) and date of birth, were matched against the national HIV register to identify HIV infections that occurred in those who tested outside the study, or who had been lost to active follow-up.

A total of 22 HIV infections were determined in this way compared to 31 through testing in the study.4

This shows the importance of linkage because in its absence the cohort study would have reported a lower incidence.
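As a rough illustration of the size of this effect, the two counts quoted above can be combined. The sketch below assumes the 22 linked infections and the 31 in-study infections do not overlap, and it looks only at case counts; an incidence rate also depends on person-years of follow-up, which are not given here.

```python
# Counts quoted in the article; treating them as non-overlapping is an assumption.
linked_only = 22  # infections identified through matching against the HIV register
in_study = 31     # infections identified through testing within the study

total = linked_only + in_study
share_missed = linked_only / total  # share of cases that study testing alone would miss

print(total)                  # 53
print(f"{share_missed:.0%}")  # 42%
```

On these figures, testing within the study alone would have captured a little under three in five of the infections identified.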

In another study by Pierce et al. (2011)5, men in Victoria who had received HIV post-exposure prophylaxis (PEP) were similarly matched against the Victorian HIV Surveillance Registry to determine incidence among previous users of PEP.
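The matching step used in these studies can be sketched in a few lines. This is a minimal illustration only, not the procedure used by either study: the `Person` class, the function names, the order of letters in the name code, and all of the example data are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    given_name: str
    family_name: str
    date_of_birth: date


def name_code(given_name: str, family_name: str) -> str:
    """First two letters of the given name plus first two of the family name.

    The letter order is an assumption; a real register may order them differently.
    """
    return (given_name.strip()[:2] + family_name.strip()[:2]).upper()


def linkage_key(person: Person) -> tuple[str, date]:
    """Minimal identifier: name code plus date of birth."""
    return (name_code(person.given_name, person.family_name), person.date_of_birth)


def match_against_register(participants, register_keys):
    """Return the participants whose linkage key appears in the register."""
    return [p for p in participants if linkage_key(p) in register_keys]


# Invented example data, for illustration only.
participants = [
    Person("John", "Smith", date(1975, 3, 14)),
    Person("Alan", "Brown", date(1980, 7, 2)),
]
register_keys = {("JOSM", date(1975, 3, 14)), ("ALBR", date(1981, 7, 2))}

matches = match_against_register(participants, register_keys)
print([p.given_name for p in matches])  # ['John']
```

Because the register in this sketch holds only name codes and dates of birth, the study never needs a participant's full name, which is the appeal of a minimal identifier.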

Online and offline consent

Linkage between data collected directly through clinical research or interviewer-administered surveys and health records is one thing, but linkage from data collected online to health records is very uncommon and still viewed with suspicion, largely because linkage generally requires explicit consent based on a signed consent form.

This seems to be based on an assumption that people are not fully aware of what they are consenting to online, and that it is difficult for researchers (and ethics committees) to determine if a participant has understood the informed consent statement.

However, an analysis by Varnhagen et al. (2005)6 found no difference between online and offline studies. They suggest that online studies actually offer opportunities to improve consent. Arguably, online research poses less risk to participants because reduced social pressure makes it easier for participants to withdraw if they feel discomfort, that is, to exercise ‘active continuous consent’.7, 8

In 2009, as part of a study to determine the feasibility of an online cohort study of gay men, a survey was conducted that explored men’s willingness to allow linkage to health-related data and whether willingness was associated with any demographic or behavioural characteristics.9 The survey focused on attitudes to linking with databases of HIV/AIDS, STIs, hepatitis, cancer, Medicare records, and death.

Among 1,135 eligible participants from all states and territories (9% of whom were HIV-positive), only 47% said they would be willing to join a part of the study consisting of record linkage. (However, nearly a quarter of the others were unsure rather than refusing outright, which possibly reflects a general lack of familiarity with, and inadequate understanding of, record linkage.) There were few differences between responses to different registries.

Participants were also asked which identifiers they would be willing to provide to facilitate record linkage.

Most participants indicated they would be willing to provide their postcode (92%) and date of birth (83%). In relation to other identifiers there was a direct relationship between the perceived personal nature of the information and willingness to disclose.

The proportion of respondents who would be willing to provide the first two letters of their given name or the first two letters of their family name was 76% and 70% respectively (although only 48% would be willing to provide both).

A sizeable proportion (26–29%) indicated that they were unsure if they would be willing to provide these minimal identifiers. Less than 10% expressed an outright refusal to have their data linked to any register.

About one-third indicated they would be willing to provide their full name or Medicare number.

The factors influencing willingness to provide a minimal identifier varied between HIV-negative and HIV-positive men. Among HIV-negative men, those who were willing to provide a name code were older and more likely to identify as gay.

Men holding a university degree had a tendency to be more reluctant to provide these key identifiers.

Among HIV-positive men, those who were willing to provide key name code identifiers were older and had a tendency to be less educated.

HIV-positive men who reported any unprotected sex with casual partners were less likely to be willing to provide key identifiers. This may suggest some concerns related to disclosing information about unprotected sex in the context of the increasing tendency in some jurisdictions to criminalise HIV transmission.

HIV and eLinkage: the challenges

The challenges to linkage to HIV-related records are both practical/technical and ethical. Some health databases contain full names, which in conjunction with dates of birth would allow for deterministic rather than probabilistic linkage. However, results from the survey mentioned above demonstrate that only about one-third of men would definitely be willing to provide full names for this purpose.
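The difference between the two approaches can be sketched as follows. Deterministic linkage requires exact agreement on the chosen identifiers, whereas probabilistic linkage scores how strongly a pair of records agrees. The scoring below follows the general Fellegi-Sunter idea of agreement and disagreement weights; the field names, the m and u probabilities, the threshold and the records are all invented for illustration and are not drawn from any real linkage system.

```python
import math


def deterministic_match(a: dict, b: dict, fields) -> bool:
    """Link two records only if every listed field agrees exactly."""
    return all(a[f] == b[f] for f in fields)


# For each field: m = P(field agrees | same person),
#                 u = P(field agrees | different people).
# All values are invented for this sketch.
WEIGHTS = {
    "name_code": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.80, 0.05),
}


def match_score(a: dict, b: dict, weights=WEIGHTS) -> float:
    """Sum log2 agreement or disagreement weights across fields."""
    score = 0.0
    for field, (m, u) in weights.items():
        if a[field] == b[field]:
            score += math.log2(m / u)              # agreement raises the score
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return score


THRESHOLD = 10.0  # hypothetical cut-off for accepting a link

survey = {"name_code": "JOSM", "date_of_birth": "1975-03-14", "postcode": "2010"}
register = {"name_code": "JOSM", "date_of_birth": "1975-03-14", "postcode": "2042"}

print(deterministic_match(survey, register, WEIGHTS))  # False: the postcodes differ
print(match_score(survey, register) > THRESHOLD)       # True: other fields still agree
```

In this invented case a change of postcode defeats the exact match, while the probabilistic score still treats the pair as a likely link. With less discriminating identifiers, such as a name code rather than a full name, the choice of threshold matters more.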

It seems feasible and acceptable to request name codes and dates-of-birth from participants. This does not preclude linkage to other databases such as the National Cancer Database and the National Death Index. It means, however, that these databases would need to be customised for linkage which involves additional costs.

An online study does not mean that consent is less thorough than in other study designs; in fact, the opposite is likely to be true. However, inquiries to data custodians and experts in record linkage (such as CHeReL and the Australian Institute of Health and Welfare) about whether online consent would be sufficient for linkage did not produce any definitive answers.

It is likely that this will only be known by making an application to ethics committees. What seems certain, however, is that linkage between individual studies and Medicare data is logistically and ethically complex as well as prohibitively expensive.

Although the 45 and Up Study, for example, links to Medicare data, it was set up in a way that allowed this, as described above: participants were selected from the Medicare Australia enrolment database, and the study insisted on signed consent (including a physical audit of signatures). Consent to data linkage was also a prerequisite, rather than an optional consent, for participation in the study.


Linkage between health records is increasingly common and contributes to the accuracy of health and epidemiological data. These data are kept largely in electronic databases in health departments, hospitals, and other organisations.

Paradoxically, research data collected electronically in the form of online surveys and cohorts presents a problem for data linkage in that permission to undertake linkage studies is traditionally dependent on informed consent, based on a signature.

There is a suspicion that consent obtained online is less thorough than consent obtained in person, or even by return mail. However, recent research has suggested that online consent is comparable to consent obtained through traditional means.

Australian gay men seem relatively willing to provide information that would allow for linkage to existing health databases. This willingness may be the result of trust built up over three decades of HIV-related research and would be interesting to compare to other populations.

Not surprisingly, willingness to provide information is inversely related to its identifying nature. In addition, among HIV-positive men who reported any unprotected sex with casual partners there was less willingness to provide identifiers for data linkage, which may indicate concerns related to disclosure of these practices.


1 Gu, L., Baxter, R. (2006). Decision Models for Record Linkage. In Williams, G. and Simoff, S. (Eds.) Data Mining, Lecture Notes in Computer Science, 3755, 146–160. DOI: 10.1007/11677437_12

2 Pálsson, G. (2008). The rise and fall of a biobank: the case of Iceland. In Gottweis, H., Petersen, A. (Eds.) Biobanks: Governance in Comparative Perspective. Routledge, Abingdon, UK.

3 Jin, F., Prestage, G., McDonald, A., Ramacciotti, T., Imrie, J., Kippax, S., et al. (2008). Trend in HIV incidence in a cohort of homosexual men in Sydney: data from the Health in Men Study. Sexual Health, 5, 109–112.

4 Jin, F., Jansson, J., Law, M., et al. (2010). Per-contact probability of HIV transmission in homosexual men in Sydney in the era of HAART. AIDS, 24, 907–913.

5 Pierce, A., Yohannes, K., Guy, R., et al. (2011). HIV seroconversions among male non-occupational postexposure prophylaxis service users: a data linkage study. Sexual Health, 8, 179–183.

6 Varnhagen, C., Gushta, M., Daniels, J., et al. (2005). How informed is online informed consent? Ethics & Behavior, 15, 37–48.

7 Pequegnat, W., Rosser, B., Bowen, A., et al. (2007). Conducting internet-based HIV/STD prevention survey research: considerations in design and evaluation. AIDS and Behavior, 11, 505–521.

8 Sproull, L., Kiesler, S. (1991). Connections: New ways of working in the networked organization. MIT Press, Cambridge, MA.

9 Adam, P., Imrie, J., Murphy, D., et al. (2009, September). A national gay men’s Internet based prospective cohort and behavioural surveillance platform in Australia—results of a feasibility and acceptability study. Paper presented at the 21st Australasian HIV/AIDS Conference, Brisbane.

Dean Murphy works at AFAO in the areas of HIV education and biomedical prevention.