.. ipython:: python
   :suppress:

   from matplotlib import pyplot as plt
   import pandas.util._doctools as doctools

   p = doctools.TablePlotter()

Merge, join, concatenate and compare

pandas provides various methods for combining and comparing :class:`Series` or :class:`DataFrame`.

:func:`~pandas.concat`: Merge multiple :class:`Series` or :class:`DataFrame` objects along a shared index or column
:meth:`DataFrame.join`: Merge multiple :class:`DataFrame` objects along the columns
:meth:`DataFrame.combine_first`: Update missing values with non-missing values in the same location
:func:`~pandas.merge`: Combine two :class:`Series` or :class:`DataFrame` objects with SQL-style joining
:func:`~pandas.merge_ordered`: Combine two :class:`Series` or :class:`DataFrame` objects along an ordered axis
:func:`~pandas.merge_asof`: Combine two :class:`Series` or :class:`DataFrame` objects by near instead of exact matching keys
:meth:`Series.compare` and :meth:`DataFrame.compare`: Show differences in values between two :class:`Series` or :class:`DataFrame` objects

:func:`~pandas.concat`

The :func:`~pandas.concat` function concatenates an arbitrary amount of :class:`Series` or :class:`DataFrame` objects along an axis while performing optional set logic (union or intersection) of the indexes on the other axes. Like numpy.concatenate, :func:`~pandas.concat` takes a list or dict of homogeneously-typed objects and concatenates them.

.. ipython:: python

   df1 = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "C": ["C0", "C1", "C2", "C3"],
           "D": ["D0", "D1", "D2", "D3"],
       },
       index=[0, 1, 2, 3],
   )

   df2 = pd.DataFrame(
       {
           "A": ["A4", "A5", "A6", "A7"],
           "B": ["B4", "B5", "B6", "B7"],
           "C": ["C4", "C5", "C6", "C7"],
           "D": ["D4", "D5", "D6", "D7"],
       },
       index=[4, 5, 6, 7],
   )

   df3 = pd.DataFrame(
       {
           "A": ["A8", "A9", "A10", "A11"],
           "B": ["B8", "B9", "B10", "B11"],
           "C": ["C8", "C9", "C10", "C11"],
           "D": ["D8", "D9", "D10", "D11"],
       },
       index=[8, 9, 10, 11],
   )

   frames = [df1, df2, df3]
   result = pd.concat(frames)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_basic.png
   p.plot(frames, result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

Note

:func:`~pandas.concat` makes a full copy of the data, and iteratively reusing :func:`~pandas.concat` can create unnecessary copies. Collect all :class:`DataFrame` or :class:`Series` objects in a list before using :func:`~pandas.concat`.

frames = [process_your_file(f) for f in files]
result = pd.concat(frames)

Note

When concatenating :class:`DataFrame` with named axes, pandas will attempt to preserve these index/column names whenever possible. In the case where all inputs share a common name, this name will be assigned to the result. When the input names do not all agree, the result will be unnamed. The same is true for :class:`MultiIndex`, but the logic is applied separately on a level-by-level basis.

Joining logic of the resulting axis

The join keyword specifies how to handle axis values that don't exist in the first :class:`DataFrame`.

join='outer' takes the union of all axis values.

.. ipython:: python

   df4 = pd.DataFrame(
       {
           "B": ["B2", "B3", "B6", "B7"],
           "D": ["D2", "D3", "D6", "D7"],
           "F": ["F2", "F3", "F6", "F7"],
       },
       index=[2, 3, 6, 7],
   )
   result = pd.concat([df1, df4], axis=1)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_axis1.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=False);
   plt.close("all");

join='inner' takes the intersection of the axis values.

.. ipython:: python

   result = pd.concat([df1, df4], axis=1, join="inner")
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_axis1_inner.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=False);
   plt.close("all");

To perform an effective "left" join using the exact index from the original :class:`DataFrame`, result can be reindexed.

.. ipython:: python

   result = pd.concat([df1, df4], axis=1).reindex(df1.index)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_axis1_join_axes.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=False);
   plt.close("all");

Ignoring indexes on the concatenation axis

For :class:`DataFrame` objects which don't have a meaningful index, the ignore_index ignores overlapping indexes.

.. ipython:: python

   result = pd.concat([df1, df4], ignore_index=True, sort=False)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_ignore_index.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=True);
   plt.close("all");

Concatenating :class:`Series` and :class:`DataFrame` together

You can concatenate a mix of :class:`Series` and :class:`DataFrame` objects. The :class:`Series` will be transformed to :class:`DataFrame` with the column name as the name of the :class:`Series`.

.. ipython:: python

   s1 = pd.Series(["X0", "X1", "X2", "X3"], name="X")
   result = pd.concat([df1, s1], axis=1)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_mixed_ndim.png
   p.plot([df1, s1], result, labels=["df1", "s1"], vertical=False);
   plt.close("all");

Unnamed :class:`Series` will be numbered consecutively.

.. ipython:: python

   s2 = pd.Series(["_0", "_1", "_2", "_3"])
   result = pd.concat([df1, s2, s2, s2], axis=1)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_unnamed_series.png
   p.plot([df1, s2], result, labels=["df1", "s2"], vertical=False);
   plt.close("all");

ignore_index=True will drop all name references.

.. ipython:: python

   result = pd.concat([df1, s1], axis=1, ignore_index=True)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_series_ignore_index.png
   p.plot([df1, s1], result, labels=["df1", "s1"], vertical=False);
   plt.close("all");

Resulting `keys`

The keys argument adds another axis level to the resulting index or column (creating a :class:`MultiIndex`) associate specific keys with each original :class:`DataFrame`.

.. ipython:: python

   result = pd.concat(frames, keys=["x", "y", "z"])
   result
   result.loc["y"]

.. ipython:: python
   :suppress:

   @savefig merging_concat_keys.png
   p.plot(frames, result, labels=["df1", "df2", "df3"], vertical=True)
   plt.close("all");

The keys argument can override the column names when creating a new :class:`DataFrame` based on existing :class:`Series`.

.. ipython:: python

   s3 = pd.Series([0, 1, 2, 3], name="foo")
   s4 = pd.Series([0, 1, 2, 3])
   s5 = pd.Series([0, 1, 4, 5])

   pd.concat([s3, s4, s5], axis=1)
   pd.concat([s3, s4, s5], axis=1, keys=["red", "blue", "yellow"])

You can also pass a dict to :func:`concat` in which case the dict keys will be used for the keys argument unless other keys argument is specified:

.. ipython:: python

   pieces = {"x": df1, "y": df2, "z": df3}
   result = pd.concat(pieces)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_dict.png
   p.plot([df1, df2, df3], result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

.. ipython:: python

   result = pd.concat(pieces, keys=["z", "y"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_dict_keys.png
   p.plot([df1, df2, df3], result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

The :class:`MultiIndex` created has levels that are constructed from the passed keys and the index of the :class:`DataFrame` pieces:

.. ipython:: python

   result.index.levels

levels argument allows specifying resulting levels associated with the keys.

.. ipython:: python

   result = pd.concat(
       pieces, keys=["x", "y", "z"], levels=[["z", "y", "x", "w"]], names=["group_key"]
   )
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_dict_keys_names.png
   p.plot([df1, df2, df3], result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

.. ipython:: python

   result.index.levels

Appending rows to a :class:`DataFrame`

If you have a :class:`Series` that you want to append as a single row to a :class:`DataFrame`, you can convert the row into a :class:`DataFrame` and use :func:`concat`.

.. ipython:: python

   s2 = pd.Series(["X0", "X1", "X2", "X3"], index=["A", "B", "C", "D"])
   result = pd.concat([df1, s2.to_frame().T], ignore_index=True)
   result

.. ipython:: python
   :suppress:

   @savefig merging_append_series_as_row.png
   p.plot([df1, s2], result, labels=["df1", "s2"], vertical=True);
   plt.close("all");

:func:`~pandas.merge`

:func:`~pandas.merge` performs join operations similar to relational databases like SQL. Users who are familiar with SQL but new to pandas can reference a :ref:`comparison with SQL<compare_with_sql.join>`.

Merge types

:func:`~pandas.merge` implements common SQL style joining operations.

one-to-one: joining two :class:`DataFrame` objects on their indexes which must contain unique values.
many-to-one: joining a unique index to one or more columns in a different :class:`DataFrame`.
many-to-many: joining columns on columns.

Note

When joining columns on columns, potentially a many-to-many join, any indexes on the passed :class:`DataFrame` objects will be discarded.

For a many-to-many join, if a key combination appears more than once in both tables, the :class:`DataFrame` will have the Cartesian product of the associated data.

.. ipython:: python

   left = pd.DataFrame(
       {
           "key": ["K0", "K1", "K2", "K3"],
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
       }
   )

   right = pd.DataFrame(
       {
           "key": ["K0", "K1", "K2", "K3"],
           "C": ["C0", "C1", "C2", "C3"],
           "D": ["D0", "D1", "D2", "D3"],
       }
   )
   result = pd.merge(left, right, on="key")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

The how argument to :func:`~pandas.merge` specifies which keys are included in the resulting table. If a key combination does not appear in either the left or right tables, the values in the joined table will be NA. Here is a summary of the how options and their SQL equivalent names:

Merge method	SQL Join Name	Description
`left`	`LEFT OUTER JOIN`	Use keys from left frame only
`right`	`RIGHT OUTER JOIN`	Use keys from right frame only
`outer`	`FULL OUTER JOIN`	Use union of keys from both frames
`inner`	`INNER JOIN`	Use intersection of keys from both frames
`cross`	`CROSS JOIN`	Create the cartesian product of rows of both frames

.. ipython:: python

   left = pd.DataFrame(
      {
         "key1": ["K0", "K0", "K1", "K2"],
         "key2": ["K0", "K1", "K0", "K1"],
         "A": ["A0", "A1", "A2", "A3"],
         "B": ["B0", "B1", "B2", "B3"],
      }
   )
   right = pd.DataFrame(
      {
         "key1": ["K0", "K1", "K1", "K2"],
         "key2": ["K0", "K0", "K0", "K0"],
         "C": ["C0", "C1", "C2", "C3"],
         "D": ["D0", "D1", "D2", "D3"],
      }
   )
   result = pd.merge(left, right, how="left", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_left.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, how="right", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_right.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);

.. ipython:: python

   result = pd.merge(left, right, how="outer", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_outer.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, how="inner", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, how="cross")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_cross.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

You can merge :class:`Series` and a :class:`DataFrame` with a :class:`MultiIndex` if the names of the :class:`MultiIndex` correspond to the columns from the :class:`DataFrame`. You can also transform the :class:`Series` to a :class:`DataFrame` using :meth:`Series.reset_index` before merging:

.. ipython:: python

   df = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
   df

   ser = pd.Series(
       ["a", "b", "c", "d", "e", "f"],
       index=pd.MultiIndex.from_arrays(
           [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]], names=["Let", "Num"]
       ),
   )
   ser

   pd.merge(df, ser.reset_index(), on=["Let", "Num"])

Performing an outer join with duplicate join keys in :class:`DataFrame`:

.. ipython:: python

   left = pd.DataFrame({"A": [1, 2], "B": [2, 2]})

   right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})

   result = pd.merge(left, right, on="B", how="outer")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_dup.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

Warning

Merging on duplicate keys significantly increase the dimensions of the result and can cause a memory overflow.

Merge key uniqueness

The validate argument checks whether the uniqueness of merge keys. Key uniqueness is checked before merge operations and can protect against memory overflows and unexpected key duplication.

.. ipython:: python
   :okexcept:

   left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
   right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
   result = pd.merge(left, right, on="B", how="outer", validate="one_to_one")

If the user is aware of the duplicates in the right :class:`DataFrame` but wants to ensure there are no duplicates in the left :class:`DataFrame`, one can use the validate='one_to_many' argument instead, which will not raise an exception.

.. ipython:: python

   pd.merge(left, right, on="B", how="outer", validate="one_to_many")

Merge result indicator

:func:`~pandas.merge` accepts the argument indicator. If True, a Categorical-type column called _merge will be added to the output object that takes on values:

Observation Origin _merge value

Merge key only in 'left' frame left_only

Merge key only in 'right' frame right_only

Merge key in both frames both

.. ipython:: python

   df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
   df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})
   pd.merge(df1, df2, on="col1", how="outer", indicator=True)

A string argument to indicator will use the value as the name for the indicator column.

.. ipython:: python

   pd.merge(df1, df2, on="col1", how="outer", indicator="indicator_column")

Overlapping value columns

The merge suffixes argument takes a tuple or list of strings to append to overlapping column names in the input :class:`DataFrame` to disambiguate the result columns:

.. ipython:: python

   left = pd.DataFrame({"k": ["K0", "K1", "K2"], "v": [1, 2, 3]})
   right = pd.DataFrame({"k": ["K0", "K0", "K3"], "v": [4, 5, 6]})

   result = pd.merge(left, right, on="k")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_overlapped.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, on="k", suffixes=("_l", "_r"))
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_overlapped_suffix.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

:meth:`DataFrame.join`

:meth:`DataFrame.join` combines the columns of multiple, potentially differently-indexed :class:`DataFrame` into a single result :class:`DataFrame`.

.. ipython:: python

   left = pd.DataFrame(
       {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=["K0", "K1", "K2"]
   )

   right = pd.DataFrame(
       {"C": ["C0", "C2", "C3"], "D": ["D0", "D2", "D3"]}, index=["K0", "K2", "K3"]
   )

   result = left.join(right)
   result

.. ipython:: python
   :suppress:

   @savefig merging_join.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = left.join(right, how="outer")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_outer.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = left.join(right, how="inner")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

:meth:`DataFrame.join` takes an optional on argument which may be a column or multiple column names that the passed :class:`DataFrame` is to be aligned.

.. ipython:: python

   left = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "key": ["K0", "K1", "K0", "K1"],
       }
   )

   right = pd.DataFrame({"C": ["C0", "C1"], "D": ["D0", "D1"]}, index=["K0", "K1"])

   result = left.join(right, on="key")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_key_columns.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(
       left, right, left_on="key", right_index=True, how="left", sort=False
   )
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_key_columns.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

To join on multiple keys, the passed :class:`DataFrame` must have a :class:`MultiIndex`:

.. ipython:: python

   left = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "key1": ["K0", "K0", "K1", "K2"],
           "key2": ["K0", "K1", "K0", "K1"],
       }
   )

   index = pd.MultiIndex.from_tuples(
       [("K0", "K0"), ("K1", "K0"), ("K2", "K0"), ("K2", "K1")]
   )
   right = pd.DataFrame(
       {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]}, index=index
   )
   result = left.join(right, on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_multikeys.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

The default for :class:`DataFrame.join` is to perform a left join which uses only the keys found in the calling :class:`DataFrame`. Other join types can be specified with how.

.. ipython:: python

   result = left.join(right, on=["key1", "key2"], how="inner")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_multikeys_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

Joining a single Index to a MultiIndex

You can join a :class:`DataFrame` with a :class:`Index` to a :class:`DataFrame` with a :class:`MultiIndex` on a level. The name of the :class:`Index` will match the level name of the :class:`MultiIndex`.

..  ipython:: python

    left = pd.DataFrame(
        {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]},
        index=pd.Index(["K0", "K1", "K2"], name="key"),
    )

    index = pd.MultiIndex.from_tuples(
        [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")],
        names=["key", "Y"],
    )
    right = pd.DataFrame(
        {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]},
        index=index,
    )

    result = left.join(right, how="inner")
    result

.. ipython:: python
   :suppress:

   @savefig merging_join_multiindex_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

Joining with two :class:`MultiIndex`

The :class:`MultiIndex` of the input argument must be completely used in the join and is a subset of the indices in the left argument.

.. ipython:: python

   leftindex = pd.MultiIndex.from_product(
       [list("abc"), list("xy"), [1, 2]], names=["abc", "xy", "num"]
   )
   left = pd.DataFrame({"v1": range(12)}, index=leftindex)
   left

   rightindex = pd.MultiIndex.from_product(
       [list("abc"), list("xy")], names=["abc", "xy"]
   )
   right = pd.DataFrame({"v2": [100 * i for i in range(1, 7)]}, index=rightindex)
   right

   left.join(right, on=["abc", "xy"], how="inner")

.. ipython:: python

   leftindex = pd.MultiIndex.from_tuples(
       [("K0", "X0"), ("K0", "X1"), ("K1", "X2")], names=["key", "X"]
   )
   left = pd.DataFrame(
       {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=leftindex
   )

   rightindex = pd.MultiIndex.from_tuples(
       [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")], names=["key", "Y"]
   )
   right = pd.DataFrame(
       {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]}, index=rightindex
   )

   result = pd.merge(
       left.reset_index(), right.reset_index(), on=["key"], how="inner"
   ).set_index(["key", "X", "Y"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_two_multiindex.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

Merging on a combination of columns and index levels

Strings passed as the on, left_on, and right_on parameters may refer to either column names or index level names. This enables merging :class:`DataFrame` instances on a combination of index levels and columns without resetting indexes.

.. ipython:: python

   left_index = pd.Index(["K0", "K0", "K1", "K2"], name="key1")

   left = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "key2": ["K0", "K1", "K0", "K1"],
       },
       index=left_index,
   )

   right_index = pd.Index(["K0", "K1", "K2", "K2"], name="key1")

   right = pd.DataFrame(
       {
           "C": ["C0", "C1", "C2", "C3"],
           "D": ["D0", "D1", "D2", "D3"],
           "key2": ["K0", "K0", "K0", "K1"],
       },
       index=right_index,
   )

   result = left.merge(right, on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merge_on_index_and_column.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

Note

When :class:`DataFrame` are joined on a string that matches an index level in both arguments, the index level is preserved as an index level in the resulting :class:`DataFrame`.

Note

When :class:`DataFrame` are joined using only some of the levels of a :class:`MultiIndex`, the extra levels will be dropped from the resulting join. To preserve those levels, use :meth:`DataFrame.reset_index` on those level names to move those levels to columns prior to the join.

Joining multiple :class:`DataFrame`

A list or tuple of :class:`DataFrame` can also be passed to :meth:`~DataFrame.join` to join them together on their indexes.

.. ipython:: python

   right2 = pd.DataFrame({"v": [7, 8, 9]}, index=["K1", "K1", "K2"])
   result = left.join([right, right2])

.. ipython:: python
   :suppress:

   @savefig merging_join_multi_df.png
   p.plot(
       [left, right, right2],
       result,
       labels=["left", "right", "right2"],
       vertical=False,
   );
   plt.close("all");

:meth:`DataFrame.combine_first`

:meth:`DataFrame.combine_first` update missing values from one :class:`DataFrame` with the non-missing values in another :class:`DataFrame` in the corresponding location.

.. ipython:: python

   df1 = pd.DataFrame(
       [[np.nan, 3.0, 5.0], [-4.6, np.nan, np.nan], [np.nan, 7.0, np.nan]]
   )
   df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5.0, 1.6, 4]], index=[1, 2])
   result = df1.combine_first(df2)
   result

.. ipython:: python
   :suppress:

   @savefig merging_combine_first.png
   p.plot([df1, df2], result, labels=["df1", "df2"], vertical=False);
   plt.close("all");

:func:`merge_ordered`

:func:`merge_ordered` combines order data such as numeric or time series data with optional filling of missing data with fill_method.

.. ipython:: python

   left = pd.DataFrame(
       {"k": ["K0", "K1", "K1", "K2"], "lv": [1, 2, 3, 4], "s": ["a", "b", "c", "d"]}
   )

   right = pd.DataFrame({"k": ["K1", "K2", "K4"], "rv": [1, 2, 3]})

   pd.merge_ordered(left, right, fill_method="ffill", left_by="s")

:func:`merge_asof`

:func:`merge_asof` is similar to an ordered left-join except that matches are on the nearest key rather than equal keys. For each row in the left :class:`DataFrame`, the last row in the right :class:`DataFrame` are selected where the on key is less than the left's key. Both :class:`DataFrame` must be sorted by the key.

Optionally :func:`merge_asof` can perform a group-wise merge by matching the by key in addition to the nearest match on the on key.

.. ipython:: python

   trades = pd.DataFrame(
       {
           "time": pd.to_datetime(
               [
                   "20160525 13:30:00.023",
                   "20160525 13:30:00.038",
                   "20160525 13:30:00.048",
                   "20160525 13:30:00.048",
                   "20160525 13:30:00.048",
               ]
           ),
           "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
           "price": [51.95, 51.95, 720.77, 720.92, 98.00],
           "quantity": [75, 155, 100, 100, 100],
       },
       columns=["time", "ticker", "price", "quantity"],
   )

   quotes = pd.DataFrame(
       {
           "time": pd.to_datetime(
               [
                   "20160525 13:30:00.023",
                   "20160525 13:30:00.023",
                   "20160525 13:30:00.030",
                   "20160525 13:30:00.041",
                   "20160525 13:30:00.048",
                   "20160525 13:30:00.049",
                   "20160525 13:30:00.072",
                   "20160525 13:30:00.075",
               ]
           ),
           "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG", "AAPL", "GOOG", "MSFT"],
           "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
           "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
       },
       columns=["time", "ticker", "bid", "ask"],
   )
   trades
   quotes
   pd.merge_asof(trades, quotes, on="time", by="ticker")

:func:`merge_asof` within 2ms between the quote time and the trade time.

.. ipython:: python

   pd.merge_asof(trades, quotes, on="time", by="ticker", tolerance=pd.Timedelta("2ms"))

:func:`merge_asof` within 10ms between the quote time and the trade time and exclude exact matches on time. Note that though we exclude the exact matches (of the quotes), prior quotes do propagate to that point in time.

.. ipython:: python

   pd.merge_asof(
       trades,
       quotes,
       on="time",
       by="ticker",
       tolerance=pd.Timedelta("10ms"),
       allow_exact_matches=False,
   )

:meth:`~Series.compare`

The :meth:`Series.compare` and :meth:`DataFrame.compare` methods allow you to compare two :class:`DataFrame` or :class:`Series`, respectively, and summarize their differences.

.. ipython:: python

   df = pd.DataFrame(
       {
           "col1": ["a", "a", "b", "b", "a"],
           "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
           "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
       },
       columns=["col1", "col2", "col3"],
   )
   df
   df2 = df.copy()
   df2.loc[0, "col1"] = "c"
   df2.loc[2, "col3"] = 4.0
   df2
   df.compare(df2)

By default, if two corresponding values are equal, they will be shown as NaN. Furthermore, if all values in an entire row / column are equal, that row / column will be omitted from the result. The remaining differences will be aligned on columns.

Stack the differences on rows.

.. ipython:: python

   df.compare(df2, align_axis=0)

Keep all original rows and columns with keep_shape=True.

.. ipython:: python

   df.compare(df2, keep_shape=True)

Keep all the original values even if they are equal.

.. ipython:: python

   df.compare(df2, keep_shape=True, keep_equal=True)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

merging.rst

merging.rst

Merge, join, concatenate and compare

:func:`~pandas.concat`

Joining logic of the resulting axis

Ignoring indexes on the concatenation axis

Concatenating :class:`Series` and :class:`DataFrame` together

Resulting `keys`

Appending rows to a :class:`DataFrame`

:func:`~pandas.merge`

Merge types

Merge key uniqueness

Merge result indicator

Overlapping value columns

:meth:`DataFrame.join`

Joining a single Index to a MultiIndex

Joining with two :class:`MultiIndex`

Merging on a combination of columns and index levels

Joining multiple :class:`DataFrame`

:meth:`DataFrame.combine_first`

:func:`merge_ordered`

:func:`merge_asof`

:meth:`~Series.compare`

Observation Origin	`_merge` value
Merge key only in `'left'` frame	`left_only`
Merge key only in `'right'` frame	`right_only`
Merge key in both frames	`both`

Files

merging.rst

Latest commit

History

merging.rst

File metadata and controls

Merge, join, concatenate and compare

:func:`~pandas.concat`

Joining logic of the resulting axis

Ignoring indexes on the concatenation axis

Concatenating :class:`Series` and :class:`DataFrame` together

Resulting keys

Appending rows to a :class:`DataFrame`

:func:`~pandas.merge`

Merge types

Merge key uniqueness

Merge result indicator

Overlapping value columns

:meth:`DataFrame.join`

Joining a single Index to a MultiIndex

Joining with two :class:`MultiIndex`

Merging on a combination of columns and index levels

Joining multiple :class:`DataFrame`

:meth:`DataFrame.combine_first`

:func:`merge_ordered`

:func:`merge_asof`

:meth:`~Series.compare`

Resulting `keys`