Package 'Scalelink'

Title: Create Scale Linkage Scores
Description: Perform a 'probabilistic' linkage of two data files by means of a scaling procedure, using the methods described in Goldstein, H., Harron, K. and Cortina-Borja, M. (2017) <doi:10.1002/sim.7287>.
Authors: Chris Charlton [aut, cre], Harvey Goldstein [aut]
Maintainer: Chris Charlton <[email protected]>
License: GPL (>= 2)
Version: 1.0
Built: 2024-11-06 03:56:01 UTC
Source: https://github.com/cran/Scalelink

Help Index


buildAstar

Description

Builds the A* matrix

Usage

buildAstar(foinew, ldfnew, grainsize, debug)

Arguments

foinew

numeric matrix representing the file of interest

ldfnew

numeric matrix representing the linking data file

grainsize

integer determining the minimum grain size for parallelisation

debug

Boolean indicating whether to output additional debugging information

Details

buildAstar takes a matrix representing the file of interest and a matrix representing the linking data file, and creates a matrix (A*) that can then be used to generate linking scores. Reporting frequency during this process can be specified via the nreport option. The function is implemented in C++ so that it runs faster than an equivalent implementation written directly in R.
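buildAstar works on numeric matrices rather than data frames. Whether it is exported for direct use is not stated here, so the following minimal sketch stops at preparing such a matrix from a data frame laid out like the bundled datasets. The helper name to_linking_matrix is hypothetical and not part of Scalelink, and it assumes the linking variables are already stored as numeric codes.

```r
# Hypothetical helper (not part of Scalelink): drop the identifier column
# and return the remaining linking variables as a double-precision matrix.
to_linking_matrix <- function(df, id_col = "id") {
  vars <- setdiff(names(df), id_col)
  m <- as.matrix(df[, vars, drop = FALSE])
  # Text columns would turn the whole matrix into character, so stop early
  if (!is.numeric(m)) stop("linking variables must be numeric codes")
  storage.mode(m) <- "double"  # integer codes become doubles; NA stays NA
  m
}

# Toy records with the same layout as the bundled FOI/LDF datasets
foi_df <- data.frame(id = 1:3, Day = c(1L, 5L, 7L), Month = c(2L, 11L, 4L),
                     Year = c(1990L, 1985L, 2001L), Sex = c(1L, 2L, 2L))
foinew <- to_linking_matrix(foi_df)
dim(foinew)  # 3 rows, 4 linking variables
```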


calcScores

Calculates linking scores for a file of interest and a linkage data file.

Description

This function calculates linking scores from two files: the file of interest (FOI) and the linkage data file (LDF).

Usage

calcScores(FOI, LDF, missing.value = NA, min.parallelblocksize = 1,
  output.varnames = NULL, debug = FALSE)

Arguments

FOI

A data.frame object, matrix or vector to be used as the file of interest. This must contain only the variables of interest, in the same order as in the LDF.

LDF

A data.frame object, matrix or vector to be used as the linkage data file. This must contain only the variables of interest, in the same order as in the FOI.

missing.value

Value used to represent missing data. Defaults to NA.

min.parallelblocksize

The minimum block size when splitting up the data across processors. You may wish to change this to optimise the allocation of processors; see https://rcppcore.github.io/RcppParallel/#tuning.

output.varnames

Labels to apply to the function output. Defaults to the column names of the FOI data.frame.

debug

Boolean indicating whether to output additional debugging information
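Because FOI and LDF must hold the same variables in the same order, and missing data must be coded to match missing.value, some base-R preparation is often needed before calling calcScores. A minimal sketch on toy data; the variable list and the sentinel code -9 are assumptions for illustration only.

```r
# Linking variables, in the order both files must share (assumed here)
link_vars <- c("Day", "Month", "Year", "Sex")

# Toy files: LDF columns are in a different order, and -9 marks missing data
foi_df <- data.frame(id = 1:2, Day = c(3, 9), Month = c(1, 6),
                     Year = c(1970, 1999), Sex = c(1, 2))
ldf_df <- data.frame(Sex = c(2, 1, 1), Year = c(1999, 1970, -9),
                     id = 11:13, Month = c(6, -9, 2), Day = c(9, 3, 4))

# Selecting by name drops 'id' and puts both files in the same column order
foi_link <- foi_df[, link_vars]
ldf_link <- ldf_df[, link_vars]
stopifnot(identical(names(foi_link), names(ldf_link)))

# Recode the sentinel to NA so the default missing.value = NA applies
ldf_link[ldf_link == -9] <- NA
colSums(is.na(ldf_link))  # missing values per linking variable
```

Alternatively, the sentinel can be left in place and passed to calcScores as missing.value = -9.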

Value

A list containing a numeric vector of scores, one for each of the identifiers of interest, and a matrix containing A*.
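A minimal end-to-end sketch using the bundled example data, assuming Scalelink is installed. The component names of the returned list are not documented here, so the sketch uses str() to inspect it and assumes, from the order given above, that the scores are the first element.

```r
library(Scalelink)

data(FOI, package = "Scalelink")
data(LDF, package = "Scalelink")

# Linking variables only, in the same order in both files ('id' is excluded)
vars <- c("Day", "Month", "Year", "Sex")

res <- calcScores(FOI[, vars], LDF[, vars], missing.value = NA)

str(res, max.level = 1)  # expected: a vector of scores and the A* matrix
scores <- res[[1]]       # assumption: the scores are the first element
summary(scores)
```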

Author(s)

Goldstein, H. and Charlton, C.M.J. (2017), Centre for Multilevel Modelling, University of Bristol.


File of interest

Description

File of interest data with 7742 records and 5 variables.

Format

A data frame with 7742 observations on the following 5 variables:

id

Record Identifier (not used for linking).

Day

Day of Week.

Month

Month of Year.

Year

Year.

Sex

Gender, coded 1 = Male and 2 = Female.

Details

The FOI dataset is one of the sample datasets provided with this package for demonstration purposes.

Source

Synthetic data created by Harvey Goldstein

Examples

data(FOI, package = "Scalelink")
summary(FOI)

Linking data file

Description

Linking data file with 10000 records and 5 variables.

Format

A data frame with 10000 observations on the following 5 variables:

id

Record Identifier (not used for linking).

Day

Day of Week.

Month

Month of Year.

Year

Year.

Sex

Gender, coded 1 = Male and 2 = Female.

Details

The LDF dataset is one of the sample datasets provided with this package for demonstration purposes. This version includes records with missing data.

Source

Synthetic data created by Harvey Goldstein

Examples

data(LDF, package = "Scalelink")
summary(LDF)
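Since this version of LDF includes records with missing data, a quick check of how much is missing can be useful before linking. A sketch, assuming Scalelink is installed and that missing values are stored as NA in the dataset; no particular counts are implied.

```r
data(LDF, package = "Scalelink")

# Linking variables only ('id' is not used for linking)
link <- LDF[, c("Day", "Month", "Year", "Sex")]

# Missing values per linking variable (assumes missing data are coded as NA)
colSums(is.na(link))

# Number of records with at least one missing linking variable
sum(!complete.cases(link))
```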

Linking data file (complete records)

Description

Linking data file with 8142 records and 5 variables.

Format

A data frame with 8142 observations on the following 5 variables:

id

Record Identifier (not used for linking).

Day

Day of Week.

Month

Month of Year.

Year

Year.

Sex

Gender, coded 1 = Male and 2 = Female.

Details

The LDFCOMP dataset is one of the sample datasets provided with this package for demonstration purposes. In this version, records containing missing data have been removed.

Source

Synthetic data created by Harvey Goldstein

Examples

data(LDFCOMP, package = "Scalelink")
summary(LDFCOMP)