# Near Duplicates Detection

### Execute Test

The following code snippet sets up and runs a Near Duplicate Detection Test, which helps you identify and address near-duplicate images in your dataset.

**Step 1: Define the Near Duplicate Detection Rules**

Start by creating rules that define what counts as a near duplicate in your dataset, then configure the test and run it.

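```python
from raga import *  # RagaAI Python SDK

# Assumes `test_session` has already been created for your RagaAI project.

# Treat image pairs with a similarity score above 0.99 as near duplicates.
rules = LQRules()
rules.add(metric="similarity_score", metric_threshold=0.99)

# Configure the near duplicate detection test.
near_duplicates_detection = nearest_duplicate(test_session=test_session,
                                              dataset_name="Enter-your-dataset-name",
                                              test_name="near_duplicate_detection_1",
                                              type="near_duplicates",
                                              output_type="near_duplicates",
                                              embed_col_name="embedding",
                                              rules=rules)

# Register the test with the session, then run every registered test.
test_session.add(near_duplicates_detection)
test_session.run()
```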

* `LQRules()`: Initialises the rules for the near duplicate detection.
* `rules.add()`: Adds a rule for detecting duplicates:
  * `metric`: The metric used for detection, "similarity\_score" in this instance.
  * `metric_threshold`: The threshold for the similarity score; a value of 0.99 indicates very high similarity, which is typical of near duplicates.
* `nearest_duplicate()`: Configures the near duplicate detection test with the following parameters:
  * `test_session`: The session object linked to your RagaAI project.
  * `dataset_name`: The name of your dataset; replace "Enter-your-dataset-name" with the actual name.
  * `test_name`: A name for this test, "near\_duplicate\_detection\_1" here.
  * `type`: The type of test, "near\_duplicates" in this case.
  * `output_type`: The type of output the test produces, "near\_duplicates" here.
  * `embed_col_name`: The column name in your dataset containing the embeddings used for comparison.
  * `rules`: The ruleset you've defined for measuring similarity and detecting duplicates.
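To make the `metric_threshold` concrete, the sketch below shows how a `similarity_score` rule with a 0.99 threshold could be evaluated for a pair of embeddings. It assumes the score is the cosine similarity between embedding vectors; this page does not say which similarity function RagaAI uses, and `cosine_similarity` and `is_near_duplicate` are hypothetical helpers, not part of the RagaAI SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_near_duplicate(a: Sequence[float], b: Sequence[float],
                      metric_threshold: float = 0.99) -> bool:
    """Hypothetical rule check: a pair is a near duplicate if its score exceeds the threshold."""
    return cosine_similarity(a, b) > metric_threshold


# Two almost-identical embeddings, then two unrelated ones.
print(is_near_duplicate([1.0, 2.0, 3.0], [1.0, 2.0, 3.01]))  # True
print(is_near_duplicate([1.0, 2.0, 3.0], [3.0, -1.0, 0.5]))  # False
```

A threshold this close to 1.0 only flags pairs whose embeddings point in almost exactly the same direction; lowering it would also catch looser variants such as crops or recolourings, at the cost of more false positives.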

`test_session.add(near_duplicates_detection)`: Registers the near duplicate detection test with the session.

`test_session.run()`: Initiates the execution of all tests in the session, including the near duplicate detection test.

With this, you have set up and executed a Near Duplicate Detection Test in RagaAI.

After the test has run, review the results to find duplicates and remove or otherwise handle them as needed.
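If you choose to remove duplicates outside RagaAI, one simple approach is a greedy pass that keeps an image only if it is not a near duplicate of any image already kept. This is a minimal sketch, assuming cosine similarity as the score and embeddings held in a plain dictionary keyed by image ID; `deduplicate` is a hypothetical helper, not a RagaAI function.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(embeddings: Dict[str, Sequence[float]], threshold: float = 0.99) -> List[str]:
    """Return the IDs to keep: each image is kept unless it is too similar to an earlier kept image."""
    kept: List[str] = []
    for image_id, emb in embeddings.items():
        if all(cosine_similarity(emb, embeddings[k]) <= threshold for k in kept):
            kept.append(image_id)
    return kept


embeddings = {
    "img_001.png": [0.9, 0.1, 0.4],
    "img_002.png": [0.9, 0.1, 0.41],  # near duplicate of img_001
    "img_003.png": [0.1, 0.8, -0.3],
}
print(deduplicate(embeddings))  # ['img_001.png', 'img_003.png']
```

Which copy survives depends on iteration order, so sort the dictionary first (for example by image quality or capture date) if you care which duplicate is kept.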

### Analysing Test Results

<figure><img src="/files/ymywXc90P9MSVNye7JWo" alt=""><figcaption></figcaption></figure>

* **Similarity Assessment**: The test compares each image against the others and assigns similarity scores.
* **Classification**: An image whose similarity score to any other image exceeds the threshold is classified as 'failed'.
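The classification rule can be sketched as follows: compare every pair of images once and mark both members of any pair above the threshold as failed. This assumes cosine similarity as the score and marks both images in a pair; the page does not say whether RagaAI fails one or both, and `classify_images` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_images(embeddings: Dict[str, Sequence[float]], threshold: float = 0.99) -> Dict[str, str]:
    """Mark an image 'failed' if its similarity to any other image exceeds the threshold."""
    ids = list(embeddings)
    status = {image_id: "passed" for image_id in ids}
    # Compare each unordered pair once: O(n^2), fine for a sketch but not for large datasets.
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if cosine_similarity(embeddings[ids[i]], embeddings[ids[j]]) > threshold:
                status[ids[i]] = "failed"
                status[ids[j]] = "failed"
    return status


embeddings = {
    "img_001.png": [0.9, 0.1, 0.4],
    "img_002.png": [0.9, 0.1, 0.41],  # almost identical to img_001
    "img_003.png": [0.1, 0.8, -0.3],
}
print(classify_images(embeddings))
# {'img_001.png': 'failed', 'img_002.png': 'failed', 'img_003.png': 'passed'}
```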

#### Exploring Results

* **Embedding View**: View your dataset in an interactive visual format to identify clusters of duplicates.
* **Datagrid View**: Scan through images and their pass/fail status.
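A "cluster of duplicates" can be understood as a connected group of images linked by near-duplicate pairs. The sketch below groups images this way with a small union-find structure; it assumes cosine similarity as the score and is only an illustration of the idea, not how the Embedding View computes its layout. `duplicate_clusters` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def duplicate_clusters(embeddings: Dict[str, Sequence[float]], threshold: float = 0.99) -> List[List[str]]:
    """Group images into connected components of near-duplicate pairs; drop singletons."""
    ids = list(embeddings)
    parent = list(range(len(ids)))  # union-find parent pointers

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if cosine_similarity(embeddings[ids[i]], embeddings[ids[j]]) > threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[str]] = {}
    for i, image_id in enumerate(ids):
        groups.setdefault(find(i), []).append(image_id)
    # Keep only clusters that actually contain duplicates.
    return [members for members in groups.values() if len(members) > 1]


embeddings = {
    "a": [1.0, 0.0, 0.0], "b": [1.0, 0.0, 0.01],  # one duplicate pair
    "c": [0.0, 1.0, 0.0], "d": [0.0, 1.0, 0.01],  # another duplicate pair
    "e": [0.0, 0.0, 1.0],                         # unique image
}
print(duplicate_clusters(embeddings))
```

Connected components are transitive: if A matches B and B matches C, all three land in one cluster even when A and C fall below the threshold.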

#### Detailed Review

* **Image View**: Click on an image to see its near duplicates and their similarity scores in detail.
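The per-image detail amounts to listing every other image whose score to the selected one exceeds the threshold, highest first. A minimal sketch, assuming cosine similarity and a dictionary of embeddings; `near_duplicates_of` is a hypothetical helper, not part of the RagaAI SDK.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicates_of(query_id: str, embeddings: Dict[str, Sequence[float]],
                       threshold: float = 0.99) -> List[Tuple[str, float]]:
    """Return (image_id, score) for images more similar than the threshold, highest score first."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, emb))
        for other_id, emb in embeddings.items()
        if other_id != query_id
    ]
    matches = [pair for pair in scored if pair[1] > threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [1.0, 0.0, 0.01],
    "c": [1.0, 0.0, 0.05],
    "d": [0.0, 1.0, 0.0],
}
for image_id, score in near_duplicates_of("a", embeddings):
    print(f"{image_id}: {score:.5f}")
```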

Together, these views help you keep unwanted duplicates out of your dataset, improving the quality and diversity of your image data.

