Skip to main content

How to remove duplicates in the Tables ( SSIS and SQL )

 1. Using Sort Transformation : 

  We can remove the duplicates in the SSIS server using the Sort Operation. The Sort operation has a Built in Feature. We just have to check the checkbox.



2. Using Fuzzy Grouping Transformation :

  This performs Data cleaning task by identifying rows of data that are likely to be duplicate.



















Popular posts from this blog

How to remove special characters in the Data(using Script component Task and Regex)

Start with the Below screen shot ( Before Script Task) and you will come to know about the senario. Note : The Script Component also works for the special characters within the Data as well . for ex "A$hish" Hover over the below Regex. it will show > Show potential fixes > in that Select System.Text.RegularExpressions. The Below will disappear.  Use the Below Script.  public override void Input0_ProcessInputRow(Input0Buffer Row) { Row.CustomerID = RemoveSpecialCharacters(Row.CustomerID); Row.NameStyle = RemoveSpecialCharacters(Row.NameStyle); Row.Title = RemoveSpecialCharacters(Row.Title); Row.FirstName = RemoveSpecialCharacters(Row.FirstName); Row.LastName = RemoveSpecialCharacters(Row.LastName); Row.CompanyName = RemoveSpecialCharacters(Row.CompanyName); Row.SalesPerson = RemoveSpecialCharacters(Row.SalesPerson); Row.EmailAddress = RemoveSpecialCharacters(Row.EmailAddress); Row.Phone = RemoveSpecialCharacters(Row.Phone); } public static string ...

DBMS

Database : database can be defined as a collection of information stored at a particular place written in a predetermined manner, belongs to a particular topic.  DBMS: (DATA BASE MANAGEMENT SYSTEM):  it is defined as software which is present inside the database which is used o maintain or manage the data within the database .                                 In the above diagram, if the user needs any info from the database. He has to contact with  dbms , then the dbms goes to the database, takes the information from the  database and gives it back to the user. Database models :  these models are broadly divided into the following types: 1.         FMS/FDMS 2.         HMS/HDMS 3.         NDBMS 4.         RDBMS 5.      ...

Different Schemas in DWH

  SCHEMAS:   Star Schema: -       Star Schema means Dimension Tables are directly linked to the Fact Table. Snow Flake Schema: -       Dimension Tables are indirectly linked to the Fact Table OR -       Dimension table is linking to another dimension table .       Star Schema   Snow Flake Schema 1 Has redundant data and hence tough to maintain/change 1 No redundancy and hence more easy to maintain/change 2 Less complex queries and hence easy to understand 2 More complex queries and hence less easy to understand 3 Less foreign keys so faster execution time of cube. 3 More foreign keys so take more execution of cube 4 Has De-Normalized Tables 4 Has Normalized Tabl...