scopus

Scopus module.

This module is responsible for communicating with the Scopus API. It provides an efficient client that fetches as much data as possible, as fast as possible.

InvalidStringError

Bases: Exception

The response has a status code of 413 or 400. The search string might be too long.

Source code in src/sesg/scopus/client.py
class InvalidStringError(Exception):
    """The response has a status code of 413 or 400. The search string might be too long."""  # noqa: E501

OutOfAPIKeysError

Bases: Exception

All API keys available are expired.

Source code in src/sesg/scopus/client.py
class OutOfAPIKeysError(Exception):
    """All API keys available are expired."""

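Both exceptions correspond to HTTP status codes described elsewhere on this page: 400 or 413 for an invalid (probably too long) search string, 429 for an expired API key, and 500 for a Scopus internal error. The sketch below classifies a status code along those lines. `classify_status` and `ResponseKind` are hypothetical names used only for illustration. The client itself relies on helpers such as `check_string_is_invalid` and `check_api_key_is_expired`, whose exact logic may differ.

```python
from enum import Enum


class ResponseKind(Enum):
    """Hypothetical categories for a Scopus HTTP response."""

    OK = "ok"  # passed on to parsing
    INVALID_STRING = "invalid_string"  # leads to InvalidStringError
    EXPIRED_KEY = "expired_key"  # key is dropped; OutOfAPIKeysError once none remain
    INTERNAL_ERROR = "internal_error"  # Scopus-side failure, retried


def classify_status(status_code: int) -> ResponseKind:
    """Map an HTTP status code to the handling described in this module's docs."""
    if status_code in (400, 413):
        return ResponseKind.INVALID_STRING
    if status_code == 429:
        return ResponseKind.EXPIRED_KEY
    if status_code == 500:
        return ResponseKind.INTERNAL_ERROR
    # Anything else is treated as a normal response in this sketch.
    return ResponseKind.OK
```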
Page dataclass

A successful Scopus response.

Parameters:

n_results (int, required): Number of results for this query. Even if Scopus reports more than 5000 results, it only allows retrieving the first 5000.

n_pages (int, required): Number of pages that need to be fetched to get all results. Capped at 200 because of the 5000-entry limit of the Scopus API.

current_page (int, required): Index of the page being fetched. Starts at 1 and is at most 200.

entries (list[Entry], required): Studies returned from the API.
Source code in src/sesg/scopus/client.py
@dataclass(frozen=True)
class Page:
    """A successful Scopus response.

    Args:
        n_results (int): Number of results for this query. Even if Scopus reports more than 5000 results, it only allows retrieving the first 5000.
        n_pages (int): Number of pages that need to be fetched to get all results. Capped at 200 because of the 5000-entry limit of the Scopus API.
        current_page (int): Index of the page being fetched. Starts at 1 and is at most 200.
        entries (list[Entry]): Studies returned from the API.
    """  # noqa: E501

    @dataclass
    class Entry:
        """A study entry returned from the API.

        Args:
            scopus_id (str): The ID of the study determined by Scopus.
            title (str): The title of the study.
            cited_by_count (Optional[int]): Number of studies that cite this one.
        """

        scopus_id: str
        title: str
        cited_by_count: int | None
        _rest: Any

    n_results: int
    n_pages: int
    current_page: int
    entries: list[Entry]
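The limits on `n_results`, `n_pages` and `current_page` are linked: 5000 retrievable results spread over at most 200 pages implies 25 entries per page. Assuming that page size (inferred from the two limits, not stated explicitly by the source), the expected number of pages can be sketched as follows. `expected_n_pages` is a hypothetical helper, not part of `sesg`.

```python
import math

MAX_RETRIEVABLE_RESULTS = 5000  # Scopus API limit described above
ENTRIES_PER_PAGE = 25  # assumption: 5000 results / 200 pages


def expected_n_pages(n_results: int) -> int:
    """Hypothetical: pages needed to retrieve every retrievable result."""
    # Results beyond the 5000th can never be retrieved, so ignore them.
    retrievable = min(n_results, MAX_RETRIEVABLE_RESULTS)
    return math.ceil(retrievable / ENTRIES_PER_PAGE)
```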

Entry dataclass

A study entry returned from the API.

Parameters:

scopus_id (str, required): The ID of the study assigned by Scopus.

title (str, required): The title of the study.

cited_by_count (Optional[int], required): Number of studies that cite this one.

ScopusClient

Creates a client that cycles through the available keys to perform efficient searches.

Attributes:

DUMMY_QUERY (str): Used when a dummy query is needed, for example to check whether an API key is expired.

To perform a search, use the search method.

Note

You can purge the expired API keys with the purge_expired_clients method.
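The client keeps one HTTP client per API key in a `MutableCycle`, which the source uses through `next()`, `len()`, an `items` list and `delete_item`. The sketch below is a minimal round-robin with that same surface, written only to illustrate the idea. `RoundRobin` is a hypothetical stand-in; the real `MutableCycle` may differ, for example in where its cursor lands after a deletion.

```python
from collections.abc import Iterable
from typing import Generic, TypeVar

T = TypeVar("T")


class RoundRobin(Generic[T]):
    """Hypothetical: cycles through items forever, allowing removals."""

    def __init__(self, items: Iterable[T]) -> None:
        self.items: list[T] = list(items)
        self._index = 0  # position of the next item to hand out

    def __len__(self) -> int:
        return len(self.items)

    def __iter__(self) -> "RoundRobin[T]":
        return self

    def __next__(self) -> T:
        if not self.items:
            # ScopusClient.fetch turns this StopIteration into OutOfAPIKeysError.
            raise StopIteration
        self._index %= len(self.items)
        item = self.items[self._index]
        self._index += 1
        return item

    def delete_item(self, item: T) -> None:
        position = self.items.index(item)
        del self.items[position]
        # Keep pointing at the same "next" item when an earlier one is removed.
        if position < self._index:
            self._index -= 1


keys = RoundRobin(["key-a", "key-b", "key-c"])
first_four = [next(keys) for _ in range(4)]  # wraps around to key-a
keys.delete_item("key-b")  # e.g. its API key expired
next_three = [next(keys) for _ in range(3)]
```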

Source code in src/sesg/scopus/client.py
class ScopusClient:
    """Creates a client that cycles through the available keys to perform efficient searches.

    Attributes:
        DUMMY_QUERY (str): Used when a dummy query is needed. This value is used, for example, to check if the API key is expired.

    To perform a search, use the [`search`][sesg.scopus.client.ScopusClient.search] method.

    !!!note
        You can purge the expired API keys with the `purge_expired_clients` method.
    """  # noqa: E501

    DUMMY_QUERY = "test"

    def __init__(
        self,
        api_keys_list: list[str],
    ) -> None:
        """Initializes the instance.

        Args:
            api_keys_list (list[str]): List with API keys.
        """  # noqa: E501
        self.clients_list = MutableCycle(create_clients_list(api_keys_list))

    def delete_client(
        self,
        client: httpx.AsyncClient,
    ) -> None:
        """Deletes a client from the list of clients.

        Used when the client's API key is expired.

        Args:
            client (httpx.AsyncClient): Client to be deleted.
        """
        self.clients_list.delete_item(client)

    async def fetch(
        self,
        params: ScopusParams,
    ) -> httpx.Response:
        """Sends a request with the given params using the next available client and returns the response.

        If the response indicates an expired API key (status code 429), removes that client and recursively retries with the next API key.

        Args:
            params (ScopusParams): Dictionary with the fetch parameters.

        Raises:
            OutOfAPIKeysError: If all API keys are expired.
            InvalidStringError: If the search string is too long, meaning the response's status code is either 400 or 413.

        Returns:
            The response obtained.
        """  # noqa: E501
        try:
            client = next(self.clients_list)
        except StopIteration:
            raise OutOfAPIKeysError()

        response = await client.get("", params=params)  # type: ignore

        if check_string_is_invalid(response):
            raise InvalidStringError()

        if check_api_key_is_expired(response):
            self.delete_client(client)

            return await self.fetch(params)

        return response

    async def fetch_first_page(
        self,
        query: str,
    ) -> tuple[Page, list[ScopusParams]]:
        """Requests the first page of a query.

        Args:
            query (str): Query whose first page is requested.

        Returns:
            A tuple with the parsed response and a list of ScopusParams for pagination.
        """
        params: ScopusParams = {
            "query": query,
            "start": 0,
        }

        res = await self.fetch_and_parse(params)

        params_list = create_params_pagination(query, res.n_results)

        return res, params_list

    @retry(
        stop=stop_after_attempt(MAX_ATTEMPTS_ON_KEY_ERROR),
        retry=retry_if_exception_type(KeyError),
        retry_error_callback=lambda _: raise_too_many_key_errors(),
    )
    @retry(
        stop=stop_after_attempt(MAX_ATTEMPTS_ON_JSON_DECODE_ERROR),
        retry=retry_if_exception_type(JSONDecodeError),
        retry_error_callback=lambda _: raise_too_many_json_decode_errors(),
    )
    @retry(
        stop=stop_after_attempt(MAX_ATTEMPTS_ON_SCOPUS_INTERNAL_ERROR),
        retry=retry_if_exception_type(ScopusInternalError),
        retry_error_callback=lambda _: raise_too_many_scopus_internal_errors(),
    )
    @retry(
        stop=stop_after_attempt(MAX_ATTEMPTS_ON_SSL_ERROR),
        retry=retry_if_exception_type(SSLError),
        retry_error_callback=lambda _: raise_too_many_ssl_errors(),
    )
    async def fetch_and_parse(
        self,
        params: ScopusParams,
    ) -> Page:
        """Makes a request using the given parameters and parses the response.

        Args:
            params (ScopusParams): Parameters of the request.

        Raises:
            InvalidStringError: If the response has a status code of 400 or 413.
            TooManyJSONDecodeErrors: If the maximum number of attempts on JSONDecodeError is reached.
            TooManyKeyErrors: If the maximum number of attempts on KeyError is reached.
            TooManyScopusInternalErrors: If the maximum number of attempts on ScopusInternalError is reached.
            TooManySSLSErrors: If the maximum number of attempts on SSLError is reached.
            OutOfAPIKeysError: If all API keys are expired.

        Returns:
            A parsed response, meaning a [`Page`][sesg.scopus.client.Page] instance.
        """  # noqa: E501
        response = await self.fetch(params)

        if response.status_code == 500:
            raise ScopusInternalError()

        return parse_response(response)

    async def search(
        self,
        query: str,
        max_concurrent_tasks: int | None = None,
    ) -> AsyncIterable[Page]:
        """Performs concurrent requests to all of the pages of the given query.

        Args:
            query (str): The query to search for.
            max_concurrent_tasks (Optional[int]): The maximum number of concurrently running tasks. If None, defaults to the number of pagination requests (at least 1).

        Raises:
            InvalidStringError: If the response has a status code of 400 or 413.
            TooManyJSONDecodeErrors: If the maximum number of attempts on JSONDecodeError is reached.
            TooManyKeyErrors: If the maximum number of attempts on KeyError is reached.
            OutOfAPIKeysError: If all API keys are expired.

        Yields:
            A [`Page`][sesg.scopus.client.Page] instance.
        """  # noqa: E501
        first_page, params_list = await self.fetch_first_page(query)

        yield first_page

        max_concurrent_tasks = max_concurrent_tasks or max(len(params_list), 1)

        async with aiometer.amap(
            self.fetch_and_parse,
            params_list,
            max_at_once=max_concurrent_tasks,
            max_per_second=len(self.clients_list) * MAX_REQUESTS_PER_SECOND_PER_API_KEY,
        ) as next_pages:
            async for page in next_pages:
                yield page

    async def get_expired_clients(self) -> list[httpx.AsyncClient]:
        """Verifies which clients have expired API keys.

        Returns:
            List of clients with expired API keys.
        """
        params: ScopusParams = {
            "query": ScopusClient.DUMMY_QUERY,
            "start": 0,
        }

        fns = [
            partial(
                client.get,
                "",
                params=params,  # type: ignore
            )
            for client in self.clients_list.items
        ]

        responses = await aiometer.run_all(
            fns,
            max_at_once=len(self.clients_list),
            max_per_second=len(self.clients_list),
        )

        expired_clients: list[httpx.AsyncClient] = []

        for client, response in zip(
            self.clients_list.items,
            responses,
        ):
            if check_api_key_is_expired(response):
                expired_clients.append(client)

        return expired_clients

    async def purge_expired_clients(self):
        """Removes all clients with expired API keys from the list of clients."""
        expired_clients = await self.get_expired_clients()

        for client in expired_clients:
            self.delete_client(client)

__init__(api_keys_list)

Initializes the instance.

Parameters:

api_keys_list (list[str], required): List of API keys.

delete_client(client)

Deletes a client from the list of clients.

Used when the client's API key has expired.

Parameters:

client (httpx.AsyncClient, required): Client to be deleted.

fetch(params) async

Sends a request with the given params using the next available client and returns the response.

If the response indicates an expired API key (status code 429), the client is removed and the request is recursively retried with the next API key.

Parameters:

params (ScopusParams, required): Dictionary with the fetch parameters.

Raises:

OutOfAPIKeysError: If all API keys are expired.

InvalidStringError: If the search string is too long, meaning the response's status code is either 400 or 413.

Returns:

httpx.Response: The response obtained.
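The recursive key rotation in `fetch` can be modelled without touching the network. In this sketch each "client" is a hypothetical async callable that returns a status code instead of an `httpx.Response`, a `deque` stands in for `MutableCycle`, and the status codes follow the docs above (400/413 for an invalid string, 429 for an expired key). It mirrors the control flow only, not the actual `sesg` helpers.

```python
import asyncio
from collections import deque
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical stand-in for an httpx.AsyncClient bound to one API key:
# an async callable that takes the query params and returns a status code.
FakeClient = Callable[[dict[str, Any]], Awaitable[int]]


class InvalidStringError(Exception):
    """Status 400 or 413: the search string is probably too long."""


class OutOfAPIKeysError(Exception):
    """Every API key has expired."""


async def fetch(clients: deque[FakeClient], params: dict[str, Any]) -> int:
    """Send params with the next client, dropping clients whose key expired."""
    if not clients:
        raise OutOfAPIKeysError()
    client = clients[0]
    clients.rotate(-1)  # the following call starts from the next key
    status = await client(params)
    if status in (400, 413):
        raise InvalidStringError()
    if status == 429:
        clients.remove(client)  # expired key: forget it, retry with another
        return await fetch(clients, params)
    return status


def make_fake_client(status: int) -> FakeClient:
    """Hypothetical client that always answers with the same status code."""

    async def client(params: dict[str, Any]) -> int:
        return status

    return client


expired, healthy = make_fake_client(429), make_fake_client(200)
clients = deque([expired, healthy])
status = asyncio.run(fetch(clients, {"query": "test", "start": 0}))
```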


fetch_and_parse(params) async

Makes a request using the given parameters and parses the response.

Parameters:

params (ScopusParams, required): Parameters of the request.

Raises:

InvalidStringError: If the response has a status code of 400 or 413.

TooManyJSONDecodeErrors: If the maximum number of attempts on JSONDecodeError is reached.

TooManyKeyErrors: If the maximum number of attempts on KeyError is reached.

TooManyScopusInternalErrors: If the maximum number of attempts on ScopusInternalError is reached.

TooManySSLSErrors: If the maximum number of attempts on SSLError is reached.

OutOfAPIKeysError: If all API keys are expired.

Returns:

Page: A parsed response, meaning a Page instance.
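`fetch_and_parse` stacks four tenacity `@retry` decorators, each combining `stop_after_attempt`, `retry_if_exception_type` and a `retry_error_callback` (for example `raise_too_many_scopus_internal_errors`) that runs once the attempts are used up. The stdlib-only sketch below reproduces that pattern for one exception type. `retry_on` is a hypothetical decorator, not tenacity's API, and the limit of 3 attempts is an arbitrary example because the actual `MAX_ATTEMPTS_ON_*` values are not shown here.

```python
import asyncio
import functools
from collections.abc import Awaitable, Callable
from typing import Any, TypeVar

R = TypeVar("R")


def retry_on(exc_type: type[Exception], max_attempts: int, give_up: type[Exception]):
    """Hypothetical: retry an async function on exc_type, then raise give_up."""

    def decorator(fn: Callable[..., Awaitable[R]]) -> Callable[..., Awaitable[R]]:
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> R:
            attempt = 1
            while True:
                try:
                    return await fn(*args, **kwargs)
                except exc_type as error:  # other exception types propagate
                    if attempt >= max_attempts:
                        raise give_up() from error
                    attempt += 1

        return wrapper

    return decorator


class ScopusInternalError(Exception):
    """Stand-in for a 500 response from Scopus."""


class TooManyScopusInternalErrors(Exception):
    """Raised once every attempt has failed with ScopusInternalError."""


calls = 0


@retry_on(ScopusInternalError, max_attempts=3, give_up=TooManyScopusInternalErrors)
async def flaky_fetch() -> str:
    global calls
    calls += 1
    if calls < 3:  # the first two attempts hit a simulated 500
        raise ScopusInternalError()
    return "page"


result = asyncio.run(flaky_fetch())
```

When several such decorators are stacked, as in `fetch_and_parse`, each exception type gets its own attempt budget. Since an outer layer re-runs everything inside it, the worst-case number of requests is the product of the limits rather than their sum.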


fetch_first_page(query) async

Requests the first page of a query.

Parameters:

query (str, required): Query whose first page is requested.

Returns:

tuple[Page, list[ScopusParams]]: A tuple with the parsed response and a list of ScopusParams for pagination.
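`fetch_first_page` requests `start=0` and then passes the total result count to `create_params_pagination` to build the parameters for pagination. A hypothetical version of that helper is sketched below. It assumes 25 entries per page (inferred from the 5000-result and 200-page limits) and that the first page is excluded because it has already been fetched. Neither assumption is confirmed by the source.

```python
from typing import TypedDict


class ScopusParams(TypedDict):
    """Local mirror of the query parameters used throughout the client."""

    query: str
    start: int


MAX_RETRIEVABLE_RESULTS = 5000  # Scopus API limit described for Page
ENTRIES_PER_PAGE = 25  # assumption: 5000 results / 200 pages


def pagination_params(query: str, n_results: int) -> list[ScopusParams]:
    """Hypothetical: params for every page after the first (start=0)."""
    last = min(n_results, MAX_RETRIEVABLE_RESULTS)
    return [
        ScopusParams(query=query, start=start)
        for start in range(ENTRIES_PER_PAGE, last, ENTRIES_PER_PAGE)
    ]
```

With 10 000 reported results this yields 199 extra requests, which together with the first page gives the 200-page maximum.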


get_expired_clients() async

Verifies which clients have expired API keys.

Returns:

list[httpx.AsyncClient]: List of clients with expired API keys.


purge_expired_clients() async

Removes all clients with expired API keys from the list of clients.
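`get_expired_clients` sends `DUMMY_QUERY` through every client at once and keeps the clients whose response indicates an expired key; `purge_expired_clients` then deletes them. The sketch below shows the same idea with `asyncio.gather` instead of `aiometer.run_all`, so it applies no rate limit. The clients are again hypothetical async callables returning status codes, and 429 is assumed to mean an expired key, as in the `fetch` docs.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical stand-in for a per-key HTTP client: params in, status code out.
ProbeClient = Callable[[dict[str, Any]], Awaitable[int]]

DUMMY_QUERY = "test"  # same value as ScopusClient.DUMMY_QUERY


async def get_expired(clients: list[ProbeClient]) -> list[ProbeClient]:
    """Probe every client concurrently and return those with an expired key."""
    params = {"query": DUMMY_QUERY, "start": 0}
    statuses = await asyncio.gather(*(client(params) for client in clients))
    # gather preserves input order, so zip pairs each client with its status.
    return [c for c, s in zip(clients, statuses) if s == 429]


async def purge_expired(clients: list[ProbeClient]) -> None:
    """Remove, in place, every client whose key turned out to be expired."""
    for client in await get_expired(clients):
        clients.remove(client)


def make_probe_client(status: int) -> ProbeClient:
    """Hypothetical client that always answers with the same status code."""

    async def client(params: dict[str, Any]) -> int:
        return status

    return client


expired_a, healthy, expired_b = (make_probe_client(s) for s in (429, 200, 429))
clients = [expired_a, healthy, expired_b]
asyncio.run(purge_expired(clients))
```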


search(query, max_concurrent_tasks=None) async

Performs concurrent requests to all of the pages of the given query.

Parameters:

query (str, required): The query to search for.

max_concurrent_tasks (Optional[int], default None): The maximum number of concurrently running tasks. If None, defaults to the number of pagination requests (at least 1).

Raises:

InvalidStringError: If the response has a status code of 400 or 413.

TooManyJSONDecodeErrors: If the maximum number of attempts on JSONDecodeError is reached.

TooManyKeyErrors: If the maximum number of attempts on KeyError is reached.

OutOfAPIKeysError: If all API keys are expired.

Yields:

Page: A Page instance.


TooManyJSONDecodeErrors

Bases: Exception

Reached the maximum number of attempts on JSONDecodeError.

Source code in src/sesg/scopus/client.py
class TooManyJSONDecodeErrors(Exception):
    """Reached the maximum number of attempts on JSONDecodeError."""

TooManyKeyErrors

Bases: Exception

Reached the maximum number of attempts on KeyError.

Source code in src/sesg/scopus/client.py
class TooManyKeyErrors(Exception):
    """Reached the maximum number of attempts on KeyError."""

TooManyScopusInternalErrors

Bases: Exception

Reached the maximum number of attempts on ScopusInternalError.

Source code in src/sesg/scopus/client.py
class TooManyScopusInternalErrors(Exception):
    """Reached the maximum number of attempts on ScopusInternalError."""