TileDBArray 1.15.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.45770363 -0.20880539 0.36228173 . 0.1641972 0.7203913
## [2,] 1.97958628 -0.63187782 0.99566278 . -0.8235002 -0.2409268
## [3,] 1.33011885 1.53163183 -0.96530716 . 0.2634567 1.8981530
## [4,] 1.91496812 0.07047745 -0.22104700 . -0.7731964 0.9707261
## [5,] -2.25690071 0.29001644 0.76035415 . 0.3395853 1.3328695
## ... . . . . . .
## [96,] 2.648129509 2.418325123 0.607748802 . -0.24785796 -1.30970152
## [97,] -1.752048935 0.292559936 -0.349729092 . 0.01340300 1.61970657
## [98,] 0.957073576 0.005443759 0.377724318 . 1.00792317 -1.25184797
## [99,] 0.515575773 -0.869194638 -1.323618067 . -0.51439111 -0.03107525
## [100,] 0.571606767 2.889791137 1.224202943 . 0.33895691 -1.83968748
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.45770363 -0.20880539 0.36228173 . 0.1641972 0.7203913
## [2,] 1.97958628 -0.63187782 0.99566278 . -0.8235002 -0.2409268
## [3,] 1.33011885 1.53163183 -0.96530716 . 0.2634567 1.8981530
## [4,] 1.91496812 0.07047745 -0.22104700 . -0.7731964 0.9707261
## [5,] -2.25690071 0.29001644 0.76035415 . 0.3395853 1.3328695
## ... . . . . . .
## [96,] 2.648129509 2.418325123 0.607748802 . -0.24785796 -1.30970152
## [97,] -1.752048935 0.292559936 -0.349729092 . 0.01340300 1.61970657
## [98,] 0.957073576 0.005443759 0.377724318 . 1.00792317 -1.25184797
## [99,] 0.515575773 -0.869194638 -1.323618067 . -0.51439111 -0.03107525
## [100,] 0.571606767 2.889791137 1.224202943 . 0.33895691 -1.83968748
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.0 0.0 0.0 . 0 0
## [2,] 1.6 0.0 0.0 . 0 0
## [3,] 0.0 0.0 0.0 . 0 0
## [4,] 0.0 0.0 0.0 . 0 0
## [5,] 0.0 0.0 0.0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
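As a quick check that sparsity survives the round trip, the sketch below queries the result with is_sparse() from the DelayedArray framework. The variable name sparse_out is only illustrative, and the example assumes that writeTileDBArray() defaults to a sparse backend when given a sparse input, as the output above suggests.

```r
library(TileDBArray)

# Simulate a sparse matrix and write it to a temporary TileDB backend.
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sparse_out <- writeTileDBArray(Y)

# is_sparse() reports whether the DelayedArray is treated as sparse,
# which lets downstream code choose sparse-aware algorithms.
is_sparse(sparse_out)
```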
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] TRUE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
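The logical case is shown above; an integer matrix can be handled in the same way. The following is a minimal sketch with simulated counts, where Z and int_out are illustrative names rather than anything from the package.

```r
library(TileDBArray)

# rpois() returns integer values, so Z is an integer matrix.
Z <- matrix(rpois(1000, lambda=5), ncol=10)
int_out <- writeTileDBArray(Z)

# type() reports the type of the values held in the array.
type(int_out)
```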
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.45770363 -0.20880539 0.36228173 . 0.1641972 0.7203913
## GENE_2 1.97958628 -0.63187782 0.99566278 . -0.8235002 -0.2409268
## GENE_3 1.33011885 1.53163183 -0.96530716 . 0.2634567 1.8981530
## GENE_4 1.91496812 0.07047745 -0.22104700 . -0.7731964 0.9707261
## GENE_5 -2.25690071 0.29001644 0.76035415 . 0.3395853 1.3328695
## ... . . . . . .
## GENE_96 2.648129509 2.418325123 0.607748802 . -0.24785796 -1.30970152
## GENE_97 -1.752048935 0.292559936 -0.349729092 . 0.01340300 1.61970657
## GENE_98 0.957073576 0.005443759 0.377724318 . 1.00792317 -1.25184797
## GENE_99 0.515575773 -0.869194638 -1.323618067 . -0.51439111 -0.03107525
## GENE_100 0.571606767 2.889791137 1.224202943 . 0.33895691 -1.83968748
TileDBArrays
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.4577036 1.9795863 1.3301188 1.9149681 -2.2569007 0.4289128
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.45770363 -0.20880539 0.36228173 -1.56567380 1.10292397
## GENE_2 1.97958628 -0.63187782 0.99566278 -1.94805521 0.12397755
## GENE_3 1.33011885 1.53163183 -0.96530716 -0.98297608 1.35200543
## GENE_4 1.91496812 0.07047745 -0.22104700 -1.14310962 0.24848150
## GENE_5 -2.25690071 0.29001644 0.76035415 0.28421331 0.94047831
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.9154073 -0.4176108 0.7245635 . 0.3283944 1.4407825
## GENE_2 3.9591726 -1.2637556 1.9913256 . -1.6470004 -0.4818537
## GENE_3 2.6602377 3.0632637 -1.9306143 . 0.5269134 3.7963060
## GENE_4 3.8299362 0.1409549 -0.4420940 . -1.5463929 1.9414522
## GENE_5 -4.5138014 0.5800329 1.5207083 . 0.6791706 2.6657390
## ... . . . . . .
## GENE_96 5.29625902 4.83665025 1.21549760 . -0.49571593 -2.61940304
## GENE_97 -3.50409787 0.58511987 -0.69945818 . 0.02680599 3.23941314
## GENE_98 1.91414715 0.01088752 0.75544864 . 2.01584633 -2.50369594
## GENE_99 1.03115155 -1.73838928 -2.64723613 . -1.02878222 -0.06215049
## GENE_100 1.14321353 5.77958227 2.44840589 . 0.67791383 -3.67937497
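To make the delayed behaviour concrete, the sketch below forces evaluation with as.matrix() and confirms that the stored values are untouched. It builds its own small matrix so that it can be run on its own; the names X0, tdb, doubled and realized are illustrative.

```r
library(TileDBArray)

X0 <- matrix(rnorm(1000), ncol=10)
tdb <- as(X0, "TileDBArray")

# The multiplication is only recorded here, not executed.
doubled <- tdb * 2
class(doubled)

# as.matrix() pulls the values from the backend and applies the delayed operation.
realized <- as.matrix(doubled)
all.equal(realized, X0 * 2, check.attributes=FALSE)

# The data in the TileDB backend still match the original matrix.
all.equal(as.matrix(tdb), X0, check.attributes=FALSE)
```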
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 1.9628681 4.3275567 -0.5725028 1.8605080 6.8182686 -2.4166429 3.9451971
## SAMP_8 SAMP_9 SAMP_10
## 13.9829602 -6.0112044 12.9692186
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.51482942
## GENE_2 0.73141374
## GENE_3 3.11741228
## GENE_4 2.17047426
## GENE_5 -1.02443276
## GENE_6 -0.84764024
## GENE_7 1.44736739
## GENE_8 -1.13585691
## GENE_9 -0.24834285
## GENE_10 -1.82568614
## GENE_11 -0.85709242
## GENE_12 0.22299786
## GENE_13 -0.43798162
## GENE_14 1.98159651
## GENE_15 -3.26284315
## GENE_16 -1.95011002
## GENE_17 1.43799012
## GENE_18 -0.19782970
## GENE_19 0.44158249
## GENE_20 1.97346096
## GENE_21 0.22385716
## GENE_22 0.25162184
## GENE_23 -0.61149878
## GENE_24 2.29988676
## GENE_25 0.85374020
## GENE_26 -0.58958075
## GENE_27 0.06926647
## GENE_28 2.92891586
## GENE_29 0.97670002
## GENE_30 -1.72749634
## GENE_31 0.63007599
## GENE_32 -0.46158450
## GENE_33 0.33861488
## GENE_34 0.23120355
## GENE_35 0.74447766
## GENE_36 2.94652468
## GENE_37 -0.17313582
## GENE_38 -0.79761831
## GENE_39 -0.94495550
## GENE_40 0.86931469
## GENE_41 -1.70425420
## GENE_42 0.28589843
## GENE_43 3.26885618
## GENE_44 0.73686682
## GENE_45 0.08087672
## GENE_46 1.10962946
## GENE_47 -1.11912656
## GENE_48 -1.71242182
## GENE_49 -1.19243686
## GENE_50 0.40159561
## GENE_51 -1.18475291
## GENE_52 -3.07420669
## GENE_53 -0.85364087
## GENE_54 -0.65096191
## GENE_55 0.42598870
## GENE_56 2.49463396
## GENE_57 1.32818420
## GENE_58 0.87132519
## GENE_59 1.28912581
## GENE_60 1.52847713
## GENE_61 -0.46727332
## GENE_62 -1.96305391
## GENE_63 1.45783200
## GENE_64 -2.80454513
## GENE_65 -0.36879483
## GENE_66 0.13933310
## GENE_67 -3.52443822
## GENE_68 -1.14019538
## GENE_69 0.65516064
## GENE_70 -0.05203438
## GENE_71 1.39371171
## GENE_72 1.27146872
## GENE_73 0.53929283
## GENE_74 -0.12357552
## GENE_75 1.66155622
## GENE_76 3.82471366
## GENE_77 -1.04318386
## GENE_78 0.60918058
## GENE_79 -1.36077379
## GENE_80 1.53337906
## GENE_81 -1.88953613
## GENE_82 1.05982557
## GENE_83 -0.08387146
## GENE_84 -1.10982141
## GENE_85 -2.48571212
## GENE_86 -1.08218527
## GENE_87 -0.14371004
## GENE_88 6.17742337
## GENE_89 0.97873771
## GENE_90 -0.84074592
## GENE_91 -0.13097178
## GENE_92 -0.64999127
## GENE_93 -0.71565822
## GENE_94 2.67589007
## GENE_95 -1.46959631
## GENE_96 4.31626007
## GENE_97 -0.76582362
## GENE_98 -2.46188386
## GENE_99 -0.62328262
## GENE_100 2.58179124
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
The example below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.57212378 -0.08807232 -0.40822465 . 0.01395368 0.33451742
## [2,] 0.54967876 0.45826421 1.18480030 . -0.60348765 1.56755723
## [3,] 0.70893747 -0.37627835 -1.76917236 . 0.71316251 -1.64547205
## [4,] -0.55134376 0.19067421 0.18855666 . -1.62339877 -0.78246021
## [5,] -0.75141240 0.06738702 -0.01525364 . 1.04714759 -0.36643025
## ... . . . . . .
## [96,] -0.9673203 -0.8996915 -1.3444686 . 0.94630355 1.23204182
## [97,] 1.5592633 -0.8026372 -0.3513248 . 1.34035799 -0.52578395
## [98,] -1.3185802 0.4113368 0.4595915 . 0.32679458 -1.01706799
## [99,] 1.0760680 -0.9261556 1.4376935 . 0.01846361 0.73636980
## [100,] 1.2117853 0.6684684 -1.5927374 . -1.00306568 -0.16261324
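Knowing the path and attribute name is useful mainly because the array can then be reopened later without rewriting it. The sketch below assumes that the TileDBArray() constructor accepts the path together with an attr argument; saved_path and reopened are illustrative names.

```r
library(TileDBArray)

X1 <- matrix(rnorm(1000), ncol=10)
saved_path <- tempfile()
writeTileDBArray(X1, path=saved_path, attr="WHEE")

# Reopen the existing backend by pointing at the same path and attribute.
reopened <- TileDBArray(saved_path, attr="WHEE")
dim(reopened)
```

In a real workflow, saved_path would be a persistent location rather than a temporary file.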
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.57212378 -0.08807232 -0.40822465 . 0.01395368 0.33451742
## [2,] 0.54967876 0.45826421 1.18480030 . -0.60348765 1.56755723
## [3,] 0.70893747 -0.37627835 -1.76917236 . 0.71316251 -1.64547205
## [4,] -0.55134376 0.19067421 0.18855666 . -1.62339877 -0.78246021
## [5,] -0.75141240 0.06738702 -0.01525364 . 1.04714759 -0.36643025
## ... . . . . . .
## [96,] -0.9673203 -0.8996915 -1.3444686 . 0.94630355 1.23204182
## [97,] 1.5592633 -0.8026372 -0.3513248 . 1.34035799 -0.52578395
## [98,] -1.3185802 0.4113368 0.4595915 . 0.32679458 -1.01706799
## [99,] 1.0760680 -0.9261556 1.4376935 . 0.01846361 0.73636980
## [100,] 1.2117853 0.6684684 -1.5927374 . -1.00306568 -0.16261324
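Because the global path stays in effect until it is changed, it is worth unsetting it once the coercion is done. The sketch below assumes that calling setTileDBPath(NULL) restores the default behaviour and that getTileDBPath() returns the current setting; global_path and coerced are illustrative names.

```r
library(TileDBArray)

X2 <- matrix(rnorm(1000), ncol=10)
global_path <- tempfile()

# Set the global path, coerce, then unset it again.
setTileDBPath(global_path)
coerced <- as(X2, "TileDBArray")
setTileDBPath(NULL)

# The backend was written to the requested location,
# and no global path remains set afterwards.
dir.exists(global_path)
is.null(getTileDBPath())
```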
sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.17 TileDBArray_1.15.0 DelayedArray_0.31.0
## [4] SparseArray_1.5.0 S4Arrays_1.5.0 abind_1.4-5
## [7] IRanges_2.39.0 S4Vectors_0.43.0 MatrixGenerics_1.17.0
## [10] matrixStats_1.3.0 BiocGenerics_0.51.0 Matrix_1.7-0
## [13] BiocStyle_2.33.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.12
## [7] nanoarrow_0.4.0.1 jquerylib_0.1.4 yaml_2.3.8
## [10] fastmap_1.1.1 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.45.0 tiledb_0.26.0
## [16] knitr_1.46 bookdown_0.39 bslib_0.7.0
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.2
## [25] zlibbioc_1.51.0 spdl_0.0.5 digest_0.6.35
## [28] grid_4.4.0 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.23 nanotime_0.3.7 zoo_1.8-12
## [34] rmarkdown_2.26 tools_4.4.0 htmltools_0.5.8.1