How to use?
Once you have installed pyjaspar, import the jaspardb class, which is used to connect to JASPAR:
>>> from pyjaspar import jaspardb
Connect to JASPAR
The next step is to connect to the JASPAR release you are interested in by creating a jaspardb object. For example, here we connect to JASPAR2018:
>>> jdb_obj = jaspardb(release='JASPAR2018')
You can check which JASPAR release you are connected to using the release attribute:
>>> print(jdb_obj.release)
JASPAR2018
If no release is given, the connection defaults to the latest release of the JASPAR database. For example:
>>> jdb_obj = jaspardb()
>>> print(jdb_obj.release)
JASPAR2020
You can also connect to a local copy of the JASPAR SQLite database by passing its absolute path as sqlite_db_path. For example:
>>> jdb_obj = jaspardb(sqlite_db_path='/path/to/jaspar.sqlite')
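The connection options above can be wrapped in a small helper. This is a minimal sketch that assumes only the `release` and `sqlite_db_path` keywords shown above; `connection_kwargs` and `connect` are hypothetical names for illustration, not part of pyjaspar.

```python
def connection_kwargs(release=None, sqlite_db_path=None):
    """Build keyword arguments for jaspardb() (hypothetical helper).

    Only the two keywords shown in this guide are used. Leaving both
    as None yields an empty dict, so jaspardb() falls back to its default.
    """
    kwargs = {}
    if release is not None:
        kwargs["release"] = release
    if sqlite_db_path is not None:
        kwargs["sqlite_db_path"] = sqlite_db_path
    return kwargs


def connect(release=None, sqlite_db_path=None):
    """Return a jaspardb object; pyjaspar is imported only when this is called."""
    from pyjaspar import jaspardb
    return jaspardb(**connection_kwargs(release, sqlite_db_path))
```

With pyjaspar installed, `connect(release='JASPAR2018')` should be equivalent to the direct call shown earlier.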
Get available releases
You can list the available JASPAR releases using the get_releases method:
>>> print(jdb_obj.get_releases())
['JASPAR2022', 'JASPAR2020', 'JASPAR2018', 'JASPAR2016', 'JASPAR2014']
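The list returned by get_releases can be used to check a release name before connecting. The sketch below is one way to do that; `choose_release` and `release_year` are hypothetical helpers, and picking the newest release by its trailing year assumes names follow the `JASPAR<year>` pattern seen in the output above.

```python
import re


def release_year(name):
    """Extract the trailing year from a name such as 'JASPAR2020' (assumed pattern)."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no trailing year in release name {name!r}")
    return int(match.group(1))


def choose_release(available, wanted=None):
    """Return `wanted` if it is available; with no `wanted`, return the newest."""
    if not available:
        raise ValueError("no releases available")
    if wanted is None:
        return max(available, key=release_year)
    if wanted not in available:
        raise ValueError(f"{wanted!r} is not one of {sorted(available)}")
    return wanted


releases = ['JASPAR2022', 'JASPAR2020', 'JASPAR2018', 'JASPAR2016', 'JASPAR2014']
print(choose_release(releases, 'JASPAR2018'))  # JASPAR2018
```

With a live connection, the result could be passed on as `jaspardb(release=choose_release(jdb_obj.get_releases(), 'JASPAR2018'))`.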
Get a motif by JASPAR ID
To get the motif details for a specific TF, use fetch_motif_by_id with its JASPAR matrix ID. If you omit the version part of the ID, the latest version of the motif is returned.
>>> motif = jdb_obj.fetch_motif_by_id('MA0095.2')
Printing the motif shows all the associated meta-information stored in the JASPAR database, including the matrix counts.
>>> print(motif)
TF name YY1
Matrix ID MA0095.2
Collection CORE
TF class ['C2H2 zinc finger factors']
TF family ['More than 3 adjacent zinc finger factors']
Species 9606
Taxonomic group vertebrates
Accession ['P25490']
Data type used ChIP-seq
Medline 18950698
Matrix:
0 1 2 3 4 5 6 7 8 9 10 11
A: 1126.00 6975.00 6741.00 2506.00 7171.00 0.00 11.00 13.00 812.00 867.00 899.00 1332.00
C: 4583.00 0.00 99.00 1117.00 0.00 12.00 0.00 0.00 5637.00 1681.00 875.00 4568.00
G: 801.00 181.00 268.00 3282.00 0.00 0.00 7160.00 7158.00 38.00 2765.00 4655.00 391.00
T: 661.00 15.00 63.00 266.00 0.00 7159.00 0.00 0.00 684.00 1858.00 742.00 880.00
Get the count matrix using the .counts attribute:
>>> print(motif.counts)
0 1 2 3 4 5 6 7 8 9 10 11
A: 1126.00 6975.00 6741.00 2506.00 7171.00 0.00 11.00 13.00 812.00 867.00 899.00 1332.00
C: 4583.00 0.00 99.00 1117.00 0.00 12.00 0.00 0.00 5637.00 1681.00 875.00 4568.00
G: 801.00 181.00 268.00 3282.00 0.00 0.00 7160.00 7158.00 38.00 2765.00 4655.00 391.00
T: 661.00 15.00 63.00 266.00 0.00 7159.00 0.00 0.00 684.00 1858.00 742.00 880.00
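As an illustration of working with the counts, the sketch below derives a simple consensus string by taking the most frequent nucleotide at each position. It assumes the counts can be read as a mapping from nucleotide to a list of per-position counts (so `motif.counts` could be passed in if it supports `counts['A']`-style access); `consensus_from_counts` is a hypothetical helper, and the sample data is copied from the first four columns of the YY1 matrix above.

```python
def consensus_from_counts(counts, alphabet="ACGT"):
    """Return the most frequent letter at each position as a string.

    Ties are resolved in favour of the letter that comes first in `alphabet`.
    """
    length = len(counts[alphabet[0]])
    return "".join(
        max(alphabet, key=lambda letter: counts[letter][pos])
        for pos in range(length)
    )


# First four columns of the YY1 matrix (MA0095.2) printed above.
yy1_prefix = {
    "A": [1126, 6975, 6741, 2506],
    "C": [4583, 0, 99, 1117],
    "G": [801, 181, 268, 3282],
    "T": [661, 15, 63, 266],
}
print(consensus_from_counts(yy1_prefix))  # CAAG
```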
Get motifs by TF name
You can use the fetch_motifs_by_name method to find motifs by TF name. It returns a list of motifs with the same TF name across taxonomic groups. For example, the search below returns two CTCF motifs, one from vertebrates and one from insects.
>>> motifs = jdb_obj.fetch_motifs_by_name("CTCF")
>>> print(len(motifs))
2
>>> print(motifs)
TF name CTCF
Matrix ID MA0139.1
Collection CORE
TF class ['C2H2 zinc finger factors']
TF family ['More than 3 adjacent zinc finger factors']
Species 9606
Taxonomic group vertebrates
Accession ['P49711']
Data type used ChIP-seq
Medline 17512414
Matrix:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
A: 87.00 167.00 281.00 56.00 8.00 744.00 40.00 107.00 851.00 5.00 333.00 54.00 12.00 56.00 104.00 372.00 82.00 117.00 402.00
C: 291.00 145.00 49.00 800.00 903.00 13.00 528.00 433.00 11.00 0.00 3.00 12.00 0.00 8.00 733.00 13.00 482.00 322.00 181.00
G: 76.00 414.00 449.00 21.00 0.00 65.00 334.00 48.00 32.00 903.00 566.00 504.00 890.00 775.00 5.00 507.00 307.00 73.00 266.00
T: 459.00 187.00 134.00 36.00 2.00 91.00 11.00 324.00 18.00 3.00 9.00 341.00 8.00 71.00 67.00 17.00 37.00 396.00 59.00
TF name CTCF
Matrix ID MA0531.1
Collection CORE
TF class ['C2H2 zinc finger factors']
TF family ['More than 3 adjacent zinc finger factors']
Species 7227
Taxonomic group insects
Accession ['Q9VS55']
Data type used ChIP-chip
Medline 17616980
Matrix:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A: 306.00 313.00 457.00 676.00 257.00 1534.00 202.00 987.00 2.00 0.00 2.00 124.00 1.00 79.00 231.00
C: 876.00 1147.00 383.00 784.00 714.00 1.00 0.00 0.00 4.00 0.00 0.00 1645.00 0.00 1514.00 773.00
G: 403.00 219.00 826.00 350.00 87.00 192.00 1700.00 912.00 311.00 1902.00 1652.00 3.00 1807.00 8.00 144.00
T: 317.00 223.00 236.00 92.00 844.00 175.00 0.00 3.00 1585.00 0.00 248.00 130.00 94.00 301.00 754.00
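Because a name search can return motifs from several taxonomic groups, you may want to narrow the list afterwards. A minimal sketch, assuming each motif exposes `tax_group` and `matrix_id` attributes matching the fields printed above; `motifs_in_tax_group` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real motif objects.

```python
from types import SimpleNamespace


def motifs_in_tax_group(motifs, tax_group):
    """Keep only the motifs whose tax_group attribute equals `tax_group`."""
    return [m for m in motifs if getattr(m, "tax_group", None) == tax_group]


# Stand-ins for the two CTCF motifs shown above (not real motif objects).
ctcf_hits = [
    SimpleNamespace(matrix_id="MA0139.1", name="CTCF", tax_group="vertebrates"),
    SimpleNamespace(matrix_id="MA0531.1", name="CTCF", tax_group="insects"),
]
vertebrate_hits = motifs_in_tax_group(ctcf_hits, "vertebrates")
print([m.matrix_id for m in vertebrate_hits])  # ['MA0139.1']
```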
Search motifs based on meta-info
A more commonly used method is fetch_motifs, which returns the motifs that match a specified set of criteria. You can query the database using any of the meta-information it stores.
For example, here we fetch the widely used CORE collection for vertebrates. For the JASPAR2020 release, this returns a list of 746 non-redundant motifs.
>>> motifs = jdb_obj.fetch_motifs(
...     collection='CORE',
...     tax_group=['vertebrates']
... )
>>> print(len(motifs))
746
You can loop through these motifs and perform your analysis.
>>> for motif in motifs:
...     print(motif.matrix_id)
MA0004.1
MA0006.1
-
-
-
MA0528.2
MA0609.2
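The matrix IDs printed by such a loop combine a base ID and a version number. The sketch below splits them apart and records the highest version seen per base ID; it assumes the `BASE.VERSION` format visible in the IDs above, and both function names are hypothetical.

```python
def split_matrix_id(matrix_id):
    """Split 'MA0095.2' into ('MA0095', 2); a bare base ID gives version None."""
    base, sep, version = matrix_id.partition(".")
    return base, (int(version) if sep else None)


def latest_versions(matrix_ids):
    """Map each base ID to the highest version number seen in `matrix_ids`."""
    latest = {}
    for matrix_id in matrix_ids:
        base, version = split_matrix_id(matrix_id)
        if version is None:
            continue  # nothing to compare for IDs without a version
        if version > latest.get(base, 0):
            latest[base] = version
    return latest


print(split_matrix_id("MA0528.2"))  # ('MA0528', 2)
```

With a list of motifs from fetch_motifs, this could be called as `latest_versions(m.matrix_id for m in motifs)`.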
The fetch_motifs method accepts the following arguments to filter the motifs:
Argument | Description |
---|---|
matrix_id | Takes precedence over all other selection criteria except ‘all’. Only motifs with the given JASPAR matrix ID(s) are returned. A matrix ID may be specified as just a base ID or as a full JASPAR ID including the version number. If only a base ID is provided for specific motif(s), then just the latest version of those motif(s) is returned unless ‘all_versions’ is also specified. |
collection | Only motifs from the specified JASPAR collection(s) are returned. NOTE: if not specified, the collection defaults to CORE for all other selection criteria except ‘all’ and ‘matrix_id’. To apply the other selection criteria across all JASPAR collections, explicitly set collection=None. |
tf_name | Only motifs with the given name(s) are returned. |
tf_class | Only motifs of the given TF class(es) are returned. |
tf_family | Only motifs from the given TF families are returned. |
tax_group | Only motifs belonging to the given taxonomic supergroups are returned (e.g. ‘vertebrates’, ‘insects’, ‘nematodes’ etc.). |
species | Only motifs derived from the given species are returned. Species are specified as taxonomy IDs. |
data_type | Only motifs generated with the given data type (e.g. ‘ChIP-seq’, ‘PBM’, ‘SELEX’ etc.) are returned. |
pazar_id | Only motifs with the given PAZAR TF ID are returned. |
medline | Only motifs with the given Medline (PubMed) IDs are returned. |
min_ic | Only motifs whose profile matrices have at least this information content (specificity) are returned. |
min_length | Only motifs whose profiles are of at least this length are returned. |
min_sites | Only motifs compiled from at least this many binding sites are returned. |
all_versions | Unless specified, just the latest version of motifs determined by the other selection criteria is returned. Otherwise all versions of the selected motifs are returned. |
all | Takes precedence over all other selection criteria. Every motif is returned. If ‘all_versions’ is also specified, all versions of every motif are returned, otherwise just the latest version of every motif is returned. |
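When queries are assembled programmatically, it can help to catch misspelled argument names before calling fetch_motifs. The sketch below checks keyword names against the arguments listed in the table; `FETCH_MOTIFS_ARGS` and `build_query` are hypothetical names, and the `min_length` value is only an illustrative choice.

```python
# Argument names copied from the table above.
FETCH_MOTIFS_ARGS = {
    "matrix_id", "collection", "tf_name", "tf_class", "tf_family",
    "tax_group", "species", "data_type", "pazar_id", "medline",
    "min_ic", "min_length", "min_sites", "all_versions", "all",
}


def build_query(**criteria):
    """Return `criteria` unchanged after checking every name against the table.

    None values are kept on purpose, since collection=None is meaningful.
    """
    unknown = sorted(set(criteria) - FETCH_MOTIFS_ARGS)
    if unknown:
        raise TypeError(f"unknown fetch_motifs argument(s): {', '.join(unknown)}")
    return criteria


query = build_query(collection="CORE", tax_group=["vertebrates"], min_length=10)
print(sorted(query))  # ['collection', 'min_length', 'tax_group']
```

With a live connection, the validated criteria could then be passed on as `jdb_obj.fetch_motifs(**query)`.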