aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorDorit Nuzman <dorit@il.ibm.com>2006-11-08 07:32:44 +0000
committerDorit Nuzman <dorit@il.ibm.com>2006-11-08 07:32:44 +0000
commit73ded14a31c67f5c40c2b11b109f68ddf7be7e3b (patch)
treef63ce21ba1bb5e2d1d0cb84948e597d8223aaab5
parentb5b83ee7d0e568bfb183b10956d0ea0c40904745 (diff)
2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
* tree-vect-analyze.c (vect_mark_relevant, vect_stmt_relevant_p): Take enum argument instead of bool. (vect_analyze_operations): Call vectorizable_type_promotion. * tree-vectorizer.h (type_promotion_vec_info_type): New enum stmt_vec_info_type value. (supportable_widening_operation, vectorizable_type_promotion): New function declarations. * tree-vect-transform.c (vect_gen_widened_results_half): New function. (vectorizable_type_promotion): New function. (vect_transform_stmt): Call vectorizable_type_promotion. * tree-vect-analyze.c (supportable_widening_operation): New function. * tree-vect-patterns.c (vect_recog_dot_prod_pattern): Add implementation. * tree-vect-generic.c (expand_vector_operations_1): Consider correct mode. * tree.def (VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR): (VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR): New tree-codes. * tree-inline.c (estimate_num_insns_1): Add cases for above new tree-codes. * tree-pretty-print.c (dump_generic_node, op_prio): Likewise. * expr.c (expand_expr_real_1): Likewise. * optabs.c (optab_for_tree_code): Likewise. (init_optabs): Initialize new optabs. * genopinit.c (vec_widen_umult_hi_optab, vec_widen_smult_hi_optab, vec_widen_smult_hi_optab, vec_widen_smult_lo_optab, vec_unpacks_hi_optab, vec_unpacks_lo_optab, vec_unpacku_hi_optab, vec_unpacku_lo_optab): Initialize new optabs. * optabs.h (OTI_vec_widen_umult_hi, OTI_vec_widen_umult_lo): (OTI_vec_widen_smult_h, OTI_vec_widen_smult_lo, OTI_vec_unpacks_hi, OTI_vec_unpacks_lo, OTI_vec_unpacku_hi, OTI_vec_unpacku_lo): New optab indices. (vec_widen_umult_hi_optab, vec_widen_umult_lo_optab): (vec_widen_smult_hi_optab, vec_widen_smult_lo_optab): (vec_unpacks_hi_optab, vec_unpacku_hi_optab, vec_unpacks_lo_optab): (vec_unpacku_lo_optab): New optabs. * doc/md.texi (vec_unpacks_hi, vec_unpacks_lo, vec_unpacku_hi): (vec_unpacku_lo, vec_widen_umult_hi, vec_widen_umult_lo): (vec_widen_smult_hi, vec_widen_smult_lo): New. 
* doc/c-tree.texi (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): (VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR, VEC_UNPACK_HI_EXPR): (VEC_UNPACK_LO_EXPR, VEC_PACK_MOD_EXPR, VEC_PACK_SAT_EXPR): New. * config/rs6000/altivec.md (UNSPEC_VMULWHUB, UNSPEC_VMULWLUB): (UNSPEC_VMULWHSB, UNSPEC_VMULWLSB, UNSPEC_VMULWHUH, UNSPEC_VMULWLUH): (UNSPEC_VMULWHSH, UNSPEC_VMULWLSH): New. (UNSPEC_VPERMSI, UNSPEC_VPERMHI): New. (vec_vperm_v8hiv4si, vec_vperm_v16qiv8hi): New patterns used to implement the unsigned unpacking patterns. (vec_unpacks_hi_v16qi, vec_unpacks_hi_v8hi, vec_unpacks_lo_v16qi): (vec_unpacks_lo_v8hi): New signed unpacking patterns. (vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi): (vec_unpacku_lo_v8hi): New unsigned unpacking patterns. (vec_widen_umult_hi_v16qi, vec_widen_umult_lo_v16qi): (vec_widen_smult_hi_v16qi, vec_widen_smult_lo_v16qi): (vec_widen_umult_hi_v8hi, vec_widen_umult_lo_v8hi): (vec_widen_smult_hi_v8hi, vec_widen_smult_lo_v8hi): New widening multiplication patterns. * target.h (builtin_mul_widen_even, builtin_mul_widen_odd): New. * target-def.h (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): New. * config/rs6000/rs6000.c (rs6000_builtin_mul_widen_even): New. (rs6000_builtin_mul_widen_odd): New. (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Defined. (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Defined. * tree-vectorizer.h (enum vect_relevant): New enum type. (_stmt_vec_info): Field relevant chaned from bool to enum vect_relevant. (STMT_VINFO_RELEVANT_P): Updated. (STMT_VINFO_RELEVANT): New. * tree-vectorizer.c (new_stmt_vec_info): Use STMT_VINFO_RELEVANT instead of STMT_VINFO_RELEVANT_P. * tree-vect-analyze.c (vect_mark_relevant, vect_stmt_relevant_p): Replace calls to STMT_VINFO_RELEVANT_P with STMT_VINFO_RELEVANT, and boolean variable with enum vect_relevant. (vect_mark_stmts_to_be_vectorized): Likewise + update documentation. * doc/tm.texi (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): New. 
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): New. 2006-11-08 Richard Henderson <rth@redhat.com> * config/i386/sse.md (vec_widen_umult_hi_v8hi, vec_widen_umult_lo_v8hi): New. (vec_widen_smult_hi_v4si, vec_widen_smult_lo_v4si, vec_widen_umult_hi_v4si, vec_widen_umult_lo_v4si): New. * config/i386/i386.c (ix86_expand_sse_unpack): New. * config/i386/i386-protos.h (ix86_expand_sse_unpack): New. * config/i386/sse.md (vec_unpacku_hi_v16qi, vec_unpacks_hi_v16qi, vec_unpacku_lo_v16qi, vec_unpacks_lo_v16qi, vec_unpacku_hi_v8hi, vec_unpacks_hi_v8hi, vec_unpacku_lo_v8hi, vec_unpacks_lo_v8hi, vec_unpacku_hi_v4si, vec_unpacks_hi_v4si, vec_unpacku_lo_v4si, vec_unpacks_lo_v4si): New. 2006-11-08 Dorit Nuzman <dorit@il.ibm.com> * tree-vect-transform.c (vectorizable_type_demotion): New function. (vect_transform_stmt): Add case for type_demotion_vec_info_type. (vect_analyze_operations): Call vectorizable_type_demotion. * tree-vectorizer.h (type_demotion_vec_info_type): New enum stmt_vec_info_type value. (vectorizable_type_demotion): New function declaration. * tree-vect-generic.c (expand_vector_operations_1): Consider correct mode. * tree.def (VEC_PACK_MOD_EXPR, VEC_PACK_SAT_EXPR): New tree-codes. * expr.c (expand_expr_real_1): Add case for VEC_PACK_MOD_EXPR and VEC_PACK_SAT_EXPR. * tree-iniline.c (estimate_num_insns_1): Likewise. * tree-pretty-print.c (dump_generic_node, op_prio): Likewise. * optabs.c (optab_for_tree_code): Likewise. * optabs.c (expand_binop): In case of vec_pack_*_optabs the mode compared against the predicate of the result is not 'mode' (the input to the function) but a mode with half the size of 'mode'. (init_optab): Initialize new optabs. * optabs.h (OTI_vec_pack_mod, OTI_vec_pack_ssat, OTI_vec_pack_usat): New optab indices. (vec_pack_mod_optab, vec_pack_ssat_optab, vec_pack_usat_optab): New optabs. * genopinit.c (vec_pack_mod_optab, vec_pack_ssat_optab): (vec_pack_usat_optab): Initialize new optabs. * doc/md.texi (vec_pack_mod, vec_pack_ssat, vec_pack_usat): New. 
* config/rs6000/altivec.md (vec_pack_mod_v8hi, vec_pack_mod_v4si): New. 2006-11-08 Richard Henderson <rth@redehat.com> * config/i386/sse.md (vec_pack_mod_v8hi, vec_pack_mod_v4si): (vec_pack_mod_v2di, vec_interleave_highv16qi, vec_interleave_lowv16qi): (vec_interleave_highv8hi, vec_interleave_lowv8hi): (vec_interleave_highv4si, vec_interleave_lowv4si): (vec_interleave_highv2di, vec_interleave_lowv2di): New. 2006-11-08 Dorit Nuzman <dorit@il.ibm.com> * tree-vect-transform.c (vectorizable_reduction): Support multiple datatypes. (vect_transform_stmt): Removed redundant code. 2006-11-08 Dorit Nuzman <dorit@il.ibm.com> * tree-vect-transform.c (vectorizable_operation): Support multiple datatypes. 2006-11-08 Dorit Nuzman <dorit@il.ibm.com> * tree-vect-transform.c (vect_align_data_ref): Removed. (vect_create_data_ref_ptr): Added additional argument - ptr_incr. Updated function documentation. Return the increment stmt in ptr_incr. (bump_vector_ptr): New function. (vect_get_vec_def_for_stmt_copy): New function. (vect_finish_stmt_generation): Create a stmt_info to newly created vector stmts. (vect_setup_realignment): Call vect_create_data_ref_ptr with additional argument. (vectorizable_reduction, vectorizable_assignment): Not supported yet if VF is greater than the number of elements that can fit in one vector word. (vectorizable_operation, vectorizable_condition): Likewise. (vectorizable_store, vectorizable_load): Support the case that the VF is greater than the number of elements that can fit in one vector word. (vect_transform_loop): Don't fail in case of multiple data-types. * tree-vect-analyze.c (vect_determine_vectorization_factor): Don't fail in case of multiple data-types; the smallest type determines the VF. (vect_analyze_data_ref_dependence): Don't record datarefs as same_align if they are of different sizes. (vect_update_misalignment_for_peel): Compare misalignments in terms of number of elements rather than number of bytes. 
(vect_enhance_data_refs_alignment): Fix/Add dump printouts. (vect_can_advance_ivs_p): Fix a dump printout git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@118577 138bc75d-0d04-0410-961f-82ee72b054a4
-rw-r--r--gcc/ChangeLog174
-rw-r--r--gcc/config/i386/i386-protos.h1
-rw-r--r--gcc/config/i386/i386.c50
-rw-r--r--gcc/config/i386/sse.md475
-rw-r--r--gcc/config/rs6000/altivec.md379
-rw-r--r--gcc/config/rs6000/rs6000.c52
-rw-r--r--gcc/doc/c-tree.texi48
-rw-r--r--gcc/doc/md.texi30
-rw-r--r--gcc/doc/tm.texi22
-rw-r--r--gcc/expr.c31
-rw-r--r--gcc/genopinit.c12
-rw-r--r--gcc/optabs.c51
-rw-r--r--gcc/optabs.h29
-rw-r--r--gcc/target-def.h6
-rw-r--r--gcc/target.h7
-rw-r--r--gcc/testsuite/ChangeLog64
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-1.c13
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-106.c40
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-109.c77
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-9.c4
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-96.c2
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c87
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c67
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c48
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c57
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c91
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c51
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c64
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-7.c51
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c50
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c63
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c1
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c57
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c (renamed from gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8.c)59
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8c.c47
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c52
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c (renamed from gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16.c)27
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8.c101
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c61
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c60
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c45
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c45
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-widen-mult-sum.c45
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c47
-rw-r--r--gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c45
-rw-r--r--gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8.c108
-rw-r--r--gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c62
-rw-r--r--gcc/testsuite/lib/target-supports.exp136
-rw-r--r--gcc/tree-inline.c6
-rw-r--r--gcc/tree-pretty-print.c54
-rw-r--r--gcc/tree-vect-analyze.c101
-rw-r--r--gcc/tree-vect-generic.c12
-rw-r--r--gcc/tree-vect-patterns.c67
-rw-r--r--gcc/tree-vect-transform.c1318
-rw-r--r--gcc/tree-vectorizer.c126
-rw-r--r--gcc/tree-vectorizer.h23
-rw-r--r--gcc/tree.def22
57 files changed, 4210 insertions, 713 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ec098898221..d215ec51982 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,177 @@
+2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * tree-vect-analyze.c (vect_mark_relevant, vect_stmt_relevant_p): Take
+ enum argument instead of bool.
+ (vect_analyze_operations): Call vectorizable_type_promotion.
+ * tree-vectorizer.h (type_promotion_vec_info_type): New enum
+ stmt_vec_info_type value.
+ (supportable_widening_operation, vectorizable_type_promotion): New
+ function declarations.
+ * tree-vect-transform.c (vect_gen_widened_results_half): New function.
+ (vectorizable_type_promotion): New function.
+ (vect_transform_stmt): Call vectorizable_type_promotion.
+ * tree-vect-analyze.c (supportable_widening_operation): New function.
+ * tree-vect-patterns.c (vect_recog_dot_prod_pattern):
+ Add implementation.
+ * tree-vect-generic.c (expand_vector_operations_1): Consider correct
+ mode.
+
+ * tree.def (VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR):
+ (VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR): New tree-codes.
+ * tree-inline.c (estimate_num_insns_1): Add cases for above new
+ tree-codes.
+ * tree-pretty-print.c (dump_generic_node, op_prio): Likewise.
+ * expr.c (expand_expr_real_1): Likewise.
+ * optabs.c (optab_for_tree_code): Likewise.
+ (init_optabs): Initialize new optabs.
+ * genopinit.c (vec_widen_umult_hi_optab, vec_widen_smult_hi_optab,
+ vec_widen_smult_hi_optab, vec_widen_smult_lo_optab,
+ vec_unpacks_hi_optab, vec_unpacks_lo_optab, vec_unpacku_hi_optab,
+ vec_unpacku_lo_optab): Initialize new optabs.
+ * optabs.h (OTI_vec_widen_umult_hi, OTI_vec_widen_umult_lo):
+ (OTI_vec_widen_smult_h, OTI_vec_widen_smult_lo, OTI_vec_unpacks_hi,
+ OTI_vec_unpacks_lo, OTI_vec_unpacku_hi, OTI_vec_unpacku_lo): New
+ optab indices.
+ (vec_widen_umult_hi_optab, vec_widen_umult_lo_optab):
+ (vec_widen_smult_hi_optab, vec_widen_smult_lo_optab):
+ (vec_unpacks_hi_optab, vec_unpacku_hi_optab, vec_unpacks_lo_optab):
+ (vec_unpacku_lo_optab): New optabs.
+ * doc/md.texi (vec_unpacks_hi, vec_unpacks_lo, vec_unpacku_hi):
+ (vec_unpacku_lo, vec_widen_umult_hi, vec_widen_umult_lo):
+ (vec_widen_smult_hi, vec_widen_smult_lo): New.
+ * doc/c-tree.texi (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR):
+ (VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR, VEC_UNPACK_HI_EXPR):
+ (VEC_UNPACK_LO_EXPR, VEC_PACK_MOD_EXPR, VEC_PACK_SAT_EXPR): New.
+
+ * config/rs6000/altivec.md (UNSPEC_VMULWHUB, UNSPEC_VMULWLUB):
+ (UNSPEC_VMULWHSB, UNSPEC_VMULWLSB, UNSPEC_VMULWHUH, UNSPEC_VMULWLUH):
+ (UNSPEC_VMULWHSH, UNSPEC_VMULWLSH): New.
+ (UNSPEC_VPERMSI, UNSPEC_VPERMHI): New.
+ (vec_vperm_v8hiv4si, vec_vperm_v16qiv8hi): New patterns used to
+ implement the unsigned unpacking patterns.
+ (vec_unpacks_hi_v16qi, vec_unpacks_hi_v8hi, vec_unpacks_lo_v16qi):
+ (vec_unpacks_lo_v8hi): New signed unpacking patterns.
+ (vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi):
+ (vec_unpacku_lo_v8hi): New unsigned unpacking patterns.
+ (vec_widen_umult_hi_v16qi, vec_widen_umult_lo_v16qi):
+ (vec_widen_smult_hi_v16qi, vec_widen_smult_lo_v16qi):
+ (vec_widen_umult_hi_v8hi, vec_widen_umult_lo_v8hi):
+ (vec_widen_smult_hi_v8hi, vec_widen_smult_lo_v8hi): New widening
+ multiplication patterns.
+
+ * target.h (builtin_mul_widen_even, builtin_mul_widen_odd): New.
+ * target-def.h (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN):
+ (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): New.
+ * config/rs6000/rs6000.c (rs6000_builtin_mul_widen_even): New.
+ (rs6000_builtin_mul_widen_odd): New.
+ (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Defined.
+ (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Defined.
+ * tree-vectorizer.h (enum vect_relevant): New enum type.
+ (_stmt_vec_info): Field relevant changed from bool to enum
+ vect_relevant.
+ (STMT_VINFO_RELEVANT_P): Updated.
+ (STMT_VINFO_RELEVANT): New.
+ * tree-vectorizer.c (new_stmt_vec_info): Use STMT_VINFO_RELEVANT
+ instead of STMT_VINFO_RELEVANT_P.
+ * tree-vect-analyze.c (vect_mark_relevant, vect_stmt_relevant_p):
+ Replace calls to STMT_VINFO_RELEVANT_P with STMT_VINFO_RELEVANT,
+ and boolean variable with enum vect_relevant.
+ (vect_mark_stmts_to_be_vectorized): Likewise + update documentation.
+ * doc/tm.texi (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): New.
+ (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): New.
+
+ 2006-11-08 Richard Henderson <rth@redhat.com>
+
+ * config/i386/sse.md (vec_widen_umult_hi_v8hi,
+ vec_widen_umult_lo_v8hi): New.
+ (vec_widen_smult_hi_v4si, vec_widen_smult_lo_v4si,
+ vec_widen_umult_hi_v4si, vec_widen_umult_lo_v4si): New.
+
+ * config/i386/i386.c (ix86_expand_sse_unpack): New.
+ * config/i386/i386-protos.h (ix86_expand_sse_unpack): New.
+ * config/i386/sse.md (vec_unpacku_hi_v16qi, vec_unpacks_hi_v16qi,
+ vec_unpacku_lo_v16qi, vec_unpacks_lo_v16qi, vec_unpacku_hi_v8hi,
+ vec_unpacks_hi_v8hi, vec_unpacku_lo_v8hi, vec_unpacks_lo_v8hi,
+ vec_unpacku_hi_v4si, vec_unpacks_hi_v4si, vec_unpacku_lo_v4si,
+ vec_unpacks_lo_v4si): New.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * tree-vect-transform.c (vectorizable_type_demotion): New function.
+ (vect_transform_stmt): Add case for type_demotion_vec_info_type.
+ (vect_analyze_operations): Call vectorizable_type_demotion.
+ * tree-vectorizer.h (type_demotion_vec_info_type): New enum
+ stmt_vec_info_type value.
+ (vectorizable_type_demotion): New function declaration.
+ * tree-vect-generic.c (expand_vector_operations_1): Consider correct
+ mode.
+
+ * tree.def (VEC_PACK_MOD_EXPR, VEC_PACK_SAT_EXPR): New tree-codes.
+ * expr.c (expand_expr_real_1): Add case for VEC_PACK_MOD_EXPR and
+ VEC_PACK_SAT_EXPR.
+ * tree-inline.c (estimate_num_insns_1): Likewise.
+ * tree-pretty-print.c (dump_generic_node, op_prio): Likewise.
+ * optabs.c (optab_for_tree_code): Likewise.
+
+ * optabs.c (expand_binop): In case of vec_pack_*_optabs the mode
+ compared against the predicate of the result is not 'mode' (the input
+ to the function) but a mode with half the size of 'mode'.
+ (init_optab): Initialize new optabs.
+ * optabs.h (OTI_vec_pack_mod, OTI_vec_pack_ssat, OTI_vec_pack_usat):
+ New optab indices.
+ (vec_pack_mod_optab, vec_pack_ssat_optab, vec_pack_usat_optab): New
+ optabs.
+ * genopinit.c (vec_pack_mod_optab, vec_pack_ssat_optab):
+ (vec_pack_usat_optab): Initialize new optabs.
+ * doc/md.texi (vec_pack_mod, vec_pack_ssat, vec_pack_usat): New.
+ * config/rs6000/altivec.md (vec_pack_mod_v8hi, vec_pack_mod_v4si): New.
+
+ 2006-11-08 Richard Henderson <rth@redhat.com>
+
+ * config/i386/sse.md (vec_pack_mod_v8hi, vec_pack_mod_v4si):
+ (vec_pack_mod_v2di, vec_interleave_highv16qi, vec_interleave_lowv16qi):
+ (vec_interleave_highv8hi, vec_interleave_lowv8hi):
+ (vec_interleave_highv4si, vec_interleave_lowv4si):
+ (vec_interleave_highv2di, vec_interleave_lowv2di): New.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * tree-vect-transform.c (vectorizable_reduction): Support multiple
+ datatypes.
+ (vect_transform_stmt): Removed redundant code.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * tree-vect-transform.c (vectorizable_operation): Support multiple
+ datatypes.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * tree-vect-transform.c (vect_align_data_ref): Removed.
+ (vect_create_data_ref_ptr): Added additional argument - ptr_incr.
+ Updated function documentation. Return the increment stmt in ptr_incr.
+ (bump_vector_ptr): New function.
+ (vect_get_vec_def_for_stmt_copy): New function.
+ (vect_finish_stmt_generation): Create a stmt_info to newly created
+ vector stmts.
+ (vect_setup_realignment): Call vect_create_data_ref_ptr with additional
+ argument.
+ (vectorizable_reduction, vectorizable_assignment): Not supported yet if
+ VF is greater than the number of elements that can fit in one vector
+ word.
+ (vectorizable_operation, vectorizable_condition): Likewise.
+ (vectorizable_store, vectorizable_load): Support the case that the VF
+ is greater than the number of elements that can fit in one vector word.
+ (vect_transform_loop): Don't fail in case of multiple data-types.
+ * tree-vect-analyze.c (vect_determine_vectorization_factor): Don't fail
+ in case of multiple data-types; the smallest type determines the VF.
+ (vect_analyze_data_ref_dependence): Don't record datarefs as same_align
+ if they are of different sizes.
+ (vect_update_misalignment_for_peel): Compare misalignments in terms of
+ number of elements rather than number of bytes.
+ (vect_enhance_data_refs_alignment): Fix/Add dump printouts.
+ (vect_can_advance_ivs_p): Fix a dump printout.
+
2006-11-07 Eric Christopher <echristo@apple.com>
* libgcc2.c (__bswapdi2): Rename from bswapDI2.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index e7154700577..b8d20a33742 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -105,6 +105,7 @@ extern int ix86_expand_int_movcc (rtx[]);
extern int ix86_expand_fp_movcc (rtx[]);
extern bool ix86_expand_fp_vcond (rtx[]);
extern bool ix86_expand_int_vcond (rtx[]);
+extern void ix86_expand_sse_unpack (rtx[], bool, bool);
extern int ix86_expand_int_addcc (rtx[]);
extern void ix86_expand_call (rtx, rtx, rtx, rtx, rtx, int);
extern void x86_initialize_trampoline (rtx, rtx, rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cd416286015..89f88ed2e5b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11733,6 +11733,52 @@ ix86_expand_int_vcond (rtx operands[])
return true;
}
+/* Unpack OP[1] into the next wider integer vector type. UNSIGNED_P is
+ true if we should do zero extension, else sign extension. HIGH_P is
+ true if we want the N/2 high elements, else the low elements. */
+
+void
+ix86_expand_sse_unpack (rtx operands[2], bool unsigned_p, bool high_p)
+{
+ enum machine_mode imode = GET_MODE (operands[1]);
+ rtx (*unpack)(rtx, rtx, rtx);
+ rtx se, dest;
+
+ switch (imode)
+ {
+ case V16QImode:
+ if (high_p)
+ unpack = gen_vec_interleave_highv16qi;
+ else
+ unpack = gen_vec_interleave_lowv16qi;
+ break;
+ case V8HImode:
+ if (high_p)
+ unpack = gen_vec_interleave_highv8hi;
+ else
+ unpack = gen_vec_interleave_lowv8hi;
+ break;
+ case V4SImode:
+ if (high_p)
+ unpack = gen_vec_interleave_highv4si;
+ else
+ unpack = gen_vec_interleave_lowv4si;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ dest = gen_lowpart (imode, operands[0]);
+
+ if (unsigned_p)
+ se = force_reg (imode, CONST0_RTX (imode));
+ else
+ se = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
+ operands[1], pc_rtx, pc_rtx);
+
+ emit_insn (unpack (dest, operands[1], se));
+}
+
/* Expand conditional increment or decrement using adb/sbb instructions.
The default case using setcc followed by the conditional move can be
done by generic code. */
@@ -14863,7 +14909,7 @@ static const struct builtin_description bdesc_2arg[] =
{ MASK_MMX, CODE_FOR_sse2_ussubv8hi3, "__builtin_ia32_psubusw128", IX86_BUILTIN_PSUBUSW128, 0, 0 },
{ MASK_SSE2, CODE_FOR_mulv8hi3, "__builtin_ia32_pmullw128", IX86_BUILTIN_PMULLW128, 0, 0 },
- { MASK_SSE2, CODE_FOR_sse2_smulv8hi3_highpart, "__builtin_ia32_pmulhw128", IX86_BUILTIN_PMULHW128, 0, 0 },
+ { MASK_SSE2, CODE_FOR_smulv8hi3_highpart, "__builtin_ia32_pmulhw128", IX86_BUILTIN_PMULHW128, 0, 0 },
{ MASK_SSE2, CODE_FOR_andv2di3, "__builtin_ia32_pand128", IX86_BUILTIN_PAND128, 0, 0 },
{ MASK_SSE2, CODE_FOR_sse2_nandv2di3, "__builtin_ia32_pandn128", IX86_BUILTIN_PANDN128, 0, 0 },
@@ -14898,7 +14944,7 @@ static const struct builtin_description bdesc_2arg[] =
{ MASK_SSE2, CODE_FOR_sse2_packssdw, "__builtin_ia32_packssdw128", IX86_BUILTIN_PACKSSDW128, 0, 0 },
{ MASK_SSE2, CODE_FOR_sse2_packuswb, "__builtin_ia32_packuswb128", IX86_BUILTIN_PACKUSWB128, 0, 0 },
- { MASK_SSE2, CODE_FOR_sse2_umulv8hi3_highpart, "__builtin_ia32_pmulhuw128", IX86_BUILTIN_PMULHUW128, 0, 0 },
+ { MASK_SSE2, CODE_FOR_umulv8hi3_highpart, "__builtin_ia32_pmulhuw128", IX86_BUILTIN_PMULHUW128, 0, 0 },
{ MASK_SSE2, CODE_FOR_sse2_psadbw, 0, IX86_BUILTIN_PSADBW128, 0, 0 },
{ MASK_SSE2, CODE_FOR_sse2_umulsidi3, 0, IX86_BUILTIN_PMULUDQ, 0, 0 },
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 78976ed441f..9985b7d479c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2620,7 +2620,20 @@
[(set_attr "type" "sseimul")
(set_attr "mode" "TI")])
-(define_insn "sse2_smulv8hi3_highpart"
+(define_insn "smulv8hi3_highpart"
+ [(set (match_operand:V8HI 0 "register_operand" "")
+ (truncate:V8HI
+ (lshiftrt:V8SI
+ (mult:V8SI
+ (sign_extend:V8SI
+ (match_operand:V8HI 1 "nonimmediate_operand" ""))
+ (sign_extend:V8SI
+ (match_operand:V8HI 2 "nonimmediate_operand" "")))
+ (const_int 16))))]
+ "TARGET_SSE2"
+ "ix86_fixup_binary_operands_no_copy (MULT, V8HImode, operands);")
+
+(define_insn "*smulv8hi3_highpart"
[(set (match_operand:V8HI 0 "register_operand" "=x")
(truncate:V8HI
(lshiftrt:V8SI
@@ -2635,7 +2648,20 @@
[(set_attr "type" "sseimul")
(set_attr "mode" "TI")])
-(define_insn "sse2_umulv8hi3_highpart"
+(define_insn "umulv8hi3_highpart"
+ [(set (match_operand:V8HI 0 "register_operand" "")
+ (truncate:V8HI
+ (lshiftrt:V8SI
+ (mult:V8SI
+ (zero_extend:V8SI
+ (match_operand:V8HI 1 "nonimmediate_operand" ""))
+ (zero_extend:V8SI
+ (match_operand:V8HI 2 "nonimmediate_operand" "")))
+ (const_int 16))))]
+ "TARGET_SSE2"
+ "ix86_fixup_binary_operands_no_copy (MULT, V8HImode, operands);")
+
+(define_insn "*umulv8hi3_highpart"
[(set (match_operand:V8HI 0 "register_operand" "=x")
(truncate:V8HI
(lshiftrt:V8SI
@@ -2792,6 +2818,122 @@
DONE;
})
+(define_expand "vec_widen_umult_hi_v8hi"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")
+ (match_operand:V8HI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, t1, t2, dest;
+
+ op1 = operands[1];
+ op2 = operands[2];
+ t1 = gen_reg_rtx (V8HImode);
+ t2 = gen_reg_rtx (V8HImode);
+ dest = gen_lowpart (V8HImode, operands[0]);
+
+ emit_insn (gen_mulv8hi3 (t1, op1, op2));
+ emit_insn (gen_umulv8hi3_highpart (t2, op1, op2));
+ emit_insn (gen_vec_interleave_highv8hi (dest, t1, t2));
+ DONE;
+})
+
+(define_expand "vec_widen_umult_lo_v8hi"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")
+ (match_operand:V8HI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, t1, t2, dest;
+
+ op1 = operands[1];
+ op2 = operands[2];
+ t1 = gen_reg_rtx (V8HImode);
+ t2 = gen_reg_rtx (V8HImode);
+ dest = gen_lowpart (V8HImode, operands[0]);
+
+ emit_insn (gen_mulv8hi3 (t1, op1, op2));
+ emit_insn (gen_umulv8hi3_highpart (t2, op1, op2));
+ emit_insn (gen_vec_interleave_lowv8hi (dest, t1, t2));
+ DONE;
+})
+
+(define_expand "vec_widen_smult_hi_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")
+ (match_operand:V4SI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, t1, t2;
+
+ op1 = operands[1];
+ op2 = operands[2];
+ t1 = gen_reg_rtx (V4SImode);
+ t2 = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_vec_interleave_highv4si (t1, op1, op1));
+ emit_insn (gen_vec_interleave_highv4si (t2, op2, op2));
+ emit_insn (gen_sse2_umulv2siv2di3 (operands[0], t1, t2));
+ DONE;
+})
+
+(define_expand "vec_widen_smult_lo_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")
+ (match_operand:V4SI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, t1, t2;
+
+ op1 = operands[1];
+ op2 = operands[2];
+ t1 = gen_reg_rtx (V4SImode);
+ t2 = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_vec_interleave_lowv4si (t1, op1, op1));
+ emit_insn (gen_vec_interleave_lowv4si (t2, op2, op2));
+ emit_insn (gen_sse2_umulv2siv2di3 (operands[0], t1, t2));
+ DONE;
+})
+
+(define_expand "vec_widen_umult_hi_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")
+ (match_operand:V4SI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, t1, t2;
+
+ op1 = operands[1];
+ op2 = operands[2];
+ t1 = gen_reg_rtx (V4SImode);
+ t2 = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_vec_interleave_highv4si (t1, op1, op1));
+ emit_insn (gen_vec_interleave_highv4si (t2, op2, op2));
+ emit_insn (gen_sse2_umulv2siv2di3 (operands[0], t1, t2));
+ DONE;
+})
+
+(define_expand "vec_widen_umult_lo_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")
+ (match_operand:V4SI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, t1, t2;
+
+ op1 = operands[1];
+ op2 = operands[2];
+ t1 = gen_reg_rtx (V4SImode);
+ t2 = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_vec_interleave_lowv4si (t1, op1, op1));
+ emit_insn (gen_vec_interleave_lowv4si (t2, op2, op2));
+ emit_insn (gen_sse2_umulv2siv2di3 (operands[0], t1, t2));
+ DONE;
+})
+
(define_expand "sdot_prodv8hi"
[(match_operand:V4SI 0 "register_operand" "")
(match_operand:V8HI 1 "nonimmediate_operand" "")
@@ -3215,6 +3357,227 @@
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Reduce:
+;; op1 = abcdefghijklmnop
+;; op2 = qrstuvwxyz012345
+;; h1 = aqbrcsdteufvgwhx
+;; l1 = iyjzk0l1m2n3o4p5
+;; h2 = aiqybjrzcks0dlt1
+;; l2 = emu2fnv3gow4hpx5
+;; h3 = aeimquy2bfjnrvz3
+;; l3 = cgkosw04dhlptx15
+;; result = bdfhjlnprtvxz135
+(define_expand "vec_pack_mod_v8hi"
+ [(match_operand:V16QI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")
+ (match_operand:V8HI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, h1, l1, h2, l2, h3, l3;
+
+ op1 = gen_lowpart (V16QImode, operands[1]);
+ op2 = gen_lowpart (V16QImode, operands[2]);
+ h1 = gen_reg_rtx (V16QImode);
+ l1 = gen_reg_rtx (V16QImode);
+ h2 = gen_reg_rtx (V16QImode);
+ l2 = gen_reg_rtx (V16QImode);
+ h3 = gen_reg_rtx (V16QImode);
+ l3 = gen_reg_rtx (V16QImode);
+
+ emit_insn (gen_vec_interleave_highv16qi (h1, op1, op2));
+ emit_insn (gen_vec_interleave_lowv16qi (l1, op1, op2));
+ emit_insn (gen_vec_interleave_highv16qi (h2, l1, h1));
+ emit_insn (gen_vec_interleave_lowv16qi (l2, l1, h1));
+ emit_insn (gen_vec_interleave_highv16qi (h3, l2, h2));
+ emit_insn (gen_vec_interleave_lowv16qi (l3, l2, h2));
+ emit_insn (gen_vec_interleave_lowv16qi (operands[0], l3, h3));
+ DONE;
+})
+
+;; Reduce:
+;; op1 = abcdefgh
+;; op2 = ijklmnop
+;; h1 = aibjckdl
+;; l1 = emfngohp
+;; h2 = aeimbfjn
+;; l2 = cgkodhlp
+;; result = bdfhjlnp
+(define_expand "vec_pack_mod_v4si"
+ [(match_operand:V8HI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")
+ (match_operand:V4SI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, h1, l1, h2, l2;
+
+ op1 = gen_lowpart (V8HImode, operands[1]);
+ op2 = gen_lowpart (V8HImode, operands[2]);
+ h1 = gen_reg_rtx (V8HImode);
+ l1 = gen_reg_rtx (V8HImode);
+ h2 = gen_reg_rtx (V8HImode);
+ l2 = gen_reg_rtx (V8HImode);
+
+ emit_insn (gen_vec_interleave_highv8hi (h1, op1, op2));
+ emit_insn (gen_vec_interleave_lowv8hi (l1, op1, op2));
+ emit_insn (gen_vec_interleave_highv8hi (h2, l1, h1));
+ emit_insn (gen_vec_interleave_lowv8hi (l2, l1, h1));
+ emit_insn (gen_vec_interleave_lowv8hi (operands[0], l2, h2));
+ DONE;
+})
+
+;; Reduce:
+;; op1 = abcd
+;; op2 = efgh
+;; h1 = aebf
+;; l1 = cgdh
+;; result = bdfh
+(define_expand "vec_pack_mod_v2di"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V2DI 1 "register_operand" "")
+ (match_operand:V2DI 2 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ rtx op1, op2, h1, l1;
+
+ op1 = gen_lowpart (V4SImode, operands[1]);
+ op2 = gen_lowpart (V4SImode, operands[2]);
+ h1 = gen_reg_rtx (V4SImode);
+ l1 = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_vec_interleave_highv4si (h1, op1, op2));
+ emit_insn (gen_vec_interleave_lowv4si (l1, op1, op2));
+ emit_insn (gen_vec_interleave_lowv4si (operands[0], l1, h1));
+ DONE;
+})
+
+(define_expand "vec_interleave_highv16qi"
+ [(set (match_operand:V16QI 0 "register_operand" "=x")
+ (vec_select:V16QI
+ (vec_concat:V32QI
+ (match_operand:V16QI 1 "register_operand" "0")
+ (match_operand:V16QI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 8) (const_int 24)
+ (const_int 9) (const_int 25)
+ (const_int 10) (const_int 26)
+ (const_int 11) (const_int 27)
+ (const_int 12) (const_int 28)
+ (const_int 13) (const_int 29)
+ (const_int 14) (const_int 30)
+ (const_int 15) (const_int 31)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpckhbw (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_lowv16qi"
+ [(set (match_operand:V16QI 0 "register_operand" "=x")
+ (vec_select:V16QI
+ (vec_concat:V32QI
+ (match_operand:V16QI 1 "register_operand" "0")
+ (match_operand:V16QI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 0) (const_int 16)
+ (const_int 1) (const_int 17)
+ (const_int 2) (const_int 18)
+ (const_int 3) (const_int 19)
+ (const_int 4) (const_int 20)
+ (const_int 5) (const_int 21)
+ (const_int 6) (const_int 22)
+ (const_int 7) (const_int 23)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpcklbw (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_highv8hi"
+ [(set (match_operand:V8HI 0 "register_operand" "=x")
+ (vec_select:V8HI
+ (vec_concat:V16HI
+ (match_operand:V8HI 1 "register_operand" "0")
+ (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 4) (const_int 12)
+ (const_int 5) (const_int 13)
+ (const_int 6) (const_int 14)
+ (const_int 7) (const_int 15)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpckhwd (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_lowv8hi"
+ [(set (match_operand:V8HI 0 "register_operand" "=x")
+ (vec_select:V8HI
+ (vec_concat:V16HI
+ (match_operand:V8HI 1 "register_operand" "0")
+ (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 0) (const_int 8)
+ (const_int 1) (const_int 9)
+ (const_int 2) (const_int 10)
+ (const_int 3) (const_int 11)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpcklwd (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_highv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "=x")
+ (vec_select:V4SI
+ (vec_concat:V8SI
+ (match_operand:V4SI 1 "register_operand" "0")
+ (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 2) (const_int 6)
+ (const_int 3) (const_int 7)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpckhdq (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_lowv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "=x")
+ (vec_select:V4SI
+ (vec_concat:V8SI
+ (match_operand:V4SI 1 "register_operand" "0")
+ (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 0) (const_int 4)
+ (const_int 1) (const_int 5)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpckldq (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_highv2di"
+ [(set (match_operand:V2DI 0 "register_operand" "=x")
+ (vec_select:V2DI
+ (vec_concat:V4DI
+ (match_operand:V2DI 1 "register_operand" "0")
+ (match_operand:V2DI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 1)
+ (const_int 3)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpckhqdq (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_interleave_lowv2di"
+ [(set (match_operand:V2DI 0 "register_operand" "=x")
+ (vec_select:V2DI
+ (vec_concat:V4DI
+ (match_operand:V2DI 1 "register_operand" "0")
+ (match_operand:V2DI 2 "nonimmediate_operand" "xm"))
+ (parallel [(const_int 0)
+ (const_int 2)])))]
+ "TARGET_SSE2"
+{
+ emit_insn (gen_sse2_punpcklqdq (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
(define_insn "sse2_packsswb"
[(set (match_operand:V16QI 0 "register_operand" "=x")
(vec_concat:V16QI
@@ -3832,6 +4195,114 @@
DONE;
})
+(define_expand "vec_unpacku_hi_v16qi"
+ [(match_operand:V8HI 0 "register_operand" "")
+ (match_operand:V16QI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, true, true);
+ DONE;
+})
+
+(define_expand "vec_unpacks_hi_v16qi"
+ [(match_operand:V8HI 0 "register_operand" "")
+ (match_operand:V16QI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, false, true);
+ DONE;
+})
+
+(define_expand "vec_unpacku_lo_v16qi"
+ [(match_operand:V8HI 0 "register_operand" "")
+ (match_operand:V16QI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, true, false);
+ DONE;
+})
+
+(define_expand "vec_unpacks_lo_v16qi"
+ [(match_operand:V8HI 0 "register_operand" "")
+ (match_operand:V16QI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, false, false);
+ DONE;
+})
+
+(define_expand "vec_unpacku_hi_v8hi"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, true, true);
+ DONE;
+})
+
+(define_expand "vec_unpacks_hi_v8hi"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, false, true);
+ DONE;
+})
+
+(define_expand "vec_unpacku_lo_v8hi"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, true, false);
+ DONE;
+})
+
+(define_expand "vec_unpacks_lo_v8hi"
+ [(match_operand:V4SI 0 "register_operand" "")
+ (match_operand:V8HI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, false, false);
+ DONE;
+})
+
+(define_expand "vec_unpacku_hi_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, true, true);
+ DONE;
+})
+
+(define_expand "vec_unpacks_hi_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, false, true);
+ DONE;
+})
+
+(define_expand "vec_unpacku_lo_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, true, false);
+ DONE;
+})
+
+(define_expand "vec_unpacks_lo_v4si"
+ [(match_operand:V2DI 0 "register_operand" "")
+ (match_operand:V4SI 1 "register_operand" "")]
+ "TARGET_SSE2"
+{
+ ix86_expand_sse_unpack (operands, false, false);
+ DONE;
+})
+
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Miscellaneous
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e0326856f1a..7a78a9405f3 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -122,6 +122,20 @@
(UNSPEC_VCONDU_V4SI 305)
(UNSPEC_VCONDU_V8HI 306)
(UNSPEC_VCONDU_V16QI 307)
+ (UNSPEC_VMULWHUB 308)
+ (UNSPEC_VMULWLUB 309)
+ (UNSPEC_VMULWHSB 310)
+ (UNSPEC_VMULWLSB 311)
+ (UNSPEC_VMULWHUH 312)
+ (UNSPEC_VMULWLUH 313)
+ (UNSPEC_VMULWHSH 314)
+ (UNSPEC_VMULWLSH 315)
+ (UNSPEC_VUPKHUB 316)
+ (UNSPEC_VUPKHUH 317)
+ (UNSPEC_VUPKLUB 318)
+ (UNSPEC_VUPKLUH 319)
+ (UNSPEC_VPERMSI 320)
+ (UNSPEC_VPERMHI 321)
])
(define_constants
@@ -2203,6 +2217,371 @@
DONE;
}")
+(define_expand "vec_unpacks_hi_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
+ UNSPEC_VUPKHSB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ emit_insn (gen_altivec_vupkhsb (operands[0], operands[1]));
+ DONE;
+}")
+
+(define_expand "vec_unpacks_hi_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
+ UNSPEC_VUPKHSH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ emit_insn (gen_altivec_vupkhsh (operands[0], operands[1]));
+ DONE;
+}")
+
+(define_expand "vec_unpacks_lo_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
+ UNSPEC_VUPKLSB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ emit_insn (gen_altivec_vupklsb (operands[0], operands[1]));
+ DONE;
+}")
+
+(define_expand "vec_unpacks_lo_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
+ UNSPEC_VUPKLSH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ emit_insn (gen_altivec_vupklsh (operands[0], operands[1]));
+ DONE;
+}")
+
+(define_insn "vperm_v8hiv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V4SI 2 "register_operand" "v")
+ (match_operand:V16QI 3 "register_operand" "v")]
+ UNSPEC_VPERMSI))]
+ "TARGET_ALTIVEC"
+ "vperm %0,%1,%2,%3"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "vperm_v16qiv8hi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")
+ (match_operand:V16QI 3 "register_operand" "v")]
+ UNSPEC_VPERMHI))]
+ "TARGET_ALTIVEC"
+ "vperm %0,%1,%2,%3"
+ [(set_attr "type" "vecperm")])
+
+
+(define_expand "vec_unpacku_hi_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
+ UNSPEC_VUPKHUB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx vzero = gen_reg_rtx (V8HImode);
+ rtx mask = gen_reg_rtx (V16QImode);
+ rtvec v = rtvec_alloc (16);
+
+ emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
+
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
+
+ emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+ emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
+ DONE;
+}")
+
+(define_expand "vec_unpacku_hi_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
+ UNSPEC_VUPKHUH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx vzero = gen_reg_rtx (V4SImode);
+ rtx mask = gen_reg_rtx (V16QImode);
+ rtvec v = rtvec_alloc (16);
+
+ emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
+
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
+
+ emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+ emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
+ DONE;
+}")
+
+(define_expand "vec_unpacku_lo_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
+ UNSPEC_VUPKLUB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx vzero = gen_reg_rtx (V8HImode);
+ rtx mask = gen_reg_rtx (V16QImode);
+ rtvec v = rtvec_alloc (16);
+
+ emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
+
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
+
+ emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+ emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
+ DONE;
+}")
+
+(define_expand "vec_unpacku_lo_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
+ UNSPEC_VUPKLUH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx vzero = gen_reg_rtx (V4SImode);
+ rtx mask = gen_reg_rtx (V16QImode);
+ rtvec v = rtvec_alloc (16);
+
+ emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
+
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
+
+ emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+ emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
+ DONE;
+}")
+
+(define_expand "vec_widen_umult_hi_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VMULWHUB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V8HImode);
+ rtx vo = gen_reg_rtx (V8HImode);
+
+ emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_umult_lo_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VMULWLUB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V8HImode);
+ rtx vo = gen_reg_rtx (V8HImode);
+
+ emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_smult_hi_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VMULWHSB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V8HImode);
+ rtx vo = gen_reg_rtx (V8HImode);
+
+ emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_smult_lo_v16qi"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VMULWLSB))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V8HImode);
+ rtx vo = gen_reg_rtx (V8HImode);
+
+ emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_umult_hi_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VMULWHUH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V4SImode);
+ rtx vo = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_umult_lo_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VMULWLUH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V4SImode);
+ rtx vo = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_smult_hi_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VMULWHSH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V4SImode);
+ rtx vo = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_widen_smult_lo_v8hi"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VMULWLSH))]
+ "TARGET_ALTIVEC"
+ "
+{
+ rtx ve = gen_reg_rtx (V4SImode);
+ rtx vo = gen_reg_rtx (V4SImode);
+
+ emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
+ emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+ DONE;
+}")
+
+(define_expand "vec_pack_mod_v8hi"
+ [(set (match_operand:V16QI 0 "register_operand" "=v")
+ (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VPKUHUM))]
+ "TARGET_ALTIVEC"
+ "
+{
+ emit_insn (gen_altivec_vpkuhum (operands[0], operands[1], operands[2]));
+ DONE;
+}")
+
+(define_expand "vec_pack_mod_v4si"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
+ (match_operand:V4SI 2 "register_operand" "v")]
+ UNSPEC_VPKUWUM))]
+ "TARGET_ALTIVEC"
+ "
+{
+ emit_insn (gen_altivec_vpkuwum (operands[0], operands[1], operands[2]));
+ DONE;
+}")
+
(define_expand "negv4sf2"
[(use (match_operand:V4SF 0 "register_operand" ""))
(use (match_operand:V4SF 1 "register_operand" ""))]
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9d784aeb50d..eec8a49093a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -693,6 +693,8 @@ static int rs6000_sched_reorder (FILE *, int, rtx *, int *, int);
static int rs6000_sched_reorder2 (FILE *, int, rtx *, int *, int);
static int rs6000_use_sched_lookahead (void);
static tree rs6000_builtin_mask_for_load (void);
+static tree rs6000_builtin_mul_widen_even (tree);
+static tree rs6000_builtin_mul_widen_odd (tree);
static void def_builtin (int, const char *, tree, int);
static void rs6000_init_builtins (void);
@@ -952,6 +954,10 @@ static const char alt_reg_names[][8] =
#undef TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD
#define TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD rs6000_builtin_mask_for_load
+#undef TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN
+#define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN rs6000_builtin_mul_widen_even
+#undef TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD
+#define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD rs6000_builtin_mul_widen_odd
#undef TARGET_INIT_BUILTINS
#define TARGET_INIT_BUILTINS rs6000_init_builtins
@@ -1631,6 +1637,52 @@ rs6000_builtin_mask_for_load (void)
return 0;
}
+/* Implement targetm.vectorize.builtin_mul_widen_even. */
+static tree
+rs6000_builtin_mul_widen_even (tree type)
+{
+ if (!TARGET_ALTIVEC)
+ return NULL_TREE;
+
+ switch (TYPE_MODE (type))
+ {
+ case V8HImode:
+ return TYPE_UNSIGNED (type) ?
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULEUH] :
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULESH];
+
+ case V16QImode:
+ return TYPE_UNSIGNED (type) ?
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULEUB] :
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULESB];
+ default:
+ return NULL_TREE;
+ }
+}
+
+/* Implement targetm.vectorize.builtin_mul_widen_odd. */
+static tree
+rs6000_builtin_mul_widen_odd (tree type)
+{
+ if (!TARGET_ALTIVEC)
+ return NULL_TREE;
+
+ switch (TYPE_MODE (type))
+ {
+ case V8HImode:
+ return TYPE_UNSIGNED (type) ?
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULOUH] :
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULOSH];
+
+ case V16QImode:
+ return TYPE_UNSIGNED (type) ?
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULOUB] :
+ rs6000_builtin_decls[ALTIVEC_BUILTIN_VMULOSB];
+ default:
+ return NULL_TREE;
+ }
+}
+
/* Handle generic options of the form -mfoo=yes/no.
NAME is the option name.
VALUE is the option value.
diff --git a/gcc/doc/c-tree.texi b/gcc/doc/c-tree.texi
index bc4d9b67fa8..486e71d7779 100644
--- a/gcc/doc/c-tree.texi
+++ b/gcc/doc/c-tree.texi
@@ -1928,6 +1928,14 @@ This macro returns the attributes on the type @var{type}.
@tindex OMP_CONTINUE
@tindex OMP_ATOMIC
@tindex OMP_CLAUSE
+@tindex VEC_LSHIFT_EXPR
+@tindex VEC_RSHIFT_EXPR
+@tindex VEC_WIDEN_MULT_HI_EXPR
+@tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex VEC_UNPACK_HI_EXPR
+@tindex VEC_UNPACK_LO_EXPR
+@tindex VEC_PACK_MOD_EXPR
+@tindex VEC_PACK_SAT_EXPR
The internal representation for expressions is for the most part quite
straightforward. However, there are a few facts that one must bear in
@@ -2735,4 +2743,44 @@ same clause @code{C} need to be represented as multiple @code{C} clauses
chained together. This facilitates adding new clauses during
compilation.
+@item VEC_LSHIFT_EXPR
+@item VEC_RSHIFT_EXPR
+These nodes represent whole vector left and right shifts, respectively.
+The first operand is the vector to shift; it will always be of vector type.
+The second operand is an expression for the number of bits by which to
+shift. Note that the result is undefined if the second operand is larger
+than or equal to the first operand's type size.
+
+@item VEC_WIDEN_MULT_HI_EXPR
+@item VEC_WIDEN_MULT_LO_EXPR
+These nodes represent widening vector multiplication of the high and low
+parts of the two input vectors, respectively. Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide. In the case of @code{VEC_WIDEN_MULT_HI_EXPR} the
+high @code{N/2} elements of the two vectors are multiplied to produce the
+vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
+low @code{N/2} elements of the two vectors are multiplied to produce the
+vector of @code{N/2} products.
+
+@item VEC_UNPACK_HI_EXPR
+@item VEC_UNPACK_LO_EXPR
+These nodes represent unpacking of the high and low parts of the input vector,
+respectively. The single operand is a vector that contains @code{N} elements
+of the same integral type. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide. In the case of
+@code{VEC_UNPACK_HI_EXPR} the high @code{N/2} elements of the vector are
+extracted and widened (promoted). In the case of @code{VEC_UNPACK_LO_EXPR} the
+low @code{N/2} elements of the vector are extracted and widened (promoted).
+
+@item VEC_PACK_MOD_EXPR
+@item VEC_PACK_SAT_EXPR
+These nodes represent packing of elements of the two input vectors into the
+output vector, using modulo or saturating arithmetic, respectively.
+Their operands are vectors that contain the same number of elements
+of the same integral type. The result is a vector that contains twice as many
+elements, of an integral type whose size is half as wide. In both cases
+the elements of the two vectors are demoted and merged (concatenated) to form
+the output vector.
+
@end table
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index dd0f379a550..26be25c1f2d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3495,6 +3495,36 @@ Operand 2 is an integer shift amount in bits.
Operand 0 is where the resulting shifted vector is stored.
The output and input vectors should have the same modes.
+@cindex @code{vec_pack_mod_@var{m}} instruction pattern
+@cindex @code{vec_pack_ssat_@var{m}} instruction pattern
+@cindex @code{vec_pack_usat_@var{m}} instruction pattern
+@item @samp{vec_pack_mod_@var{m}}, @samp{vec_pack_ssat_@var{m}}, @samp{vec_pack_usat_@var{m}}
+Narrow (demote) and merge the elements of two vectors.
+Operands 1 and 2 are vectors of the same mode.
+Operand 0 is the resulting vector in which the elements of the two input
+vectors are concatenated after narrowing them down using modulo arithmetic or
+signed/unsigned saturating arithmetic.
+
+@cindex @code{vec_unpacks_hi_@var{m}} instruction pattern
+@cindex @code{vec_unpacks_lo_@var{m}} instruction pattern
+@cindex @code{vec_unpacku_hi_@var{m}} instruction pattern
+@cindex @code{vec_unpacku_lo_@var{m}} instruction pattern
+@item @samp{vec_unpacks_hi_@var{m}}, @samp{vec_unpacks_lo_@var{m}}, @samp{vec_unpacku_hi_@var{m}}, @samp{vec_unpacku_lo_@var{m}}
+Extract and widen (promote) the high/low part of a vector of signed/unsigned
+elements. The input vector (operand 1) has N signed/unsigned elements of size S.
+Using sign/zero extension widen (promote) the high/low elements of the vector,
+and place the resulting N/2 values of size 2*S in the output vector (operand 0).
+
+@cindex @code{vec_widen_umult_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_umult_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_smult_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_smult_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_umult_hi_@var{m}}, @samp{vec_widen_umult_lo_@var{m}}, @samp{vec_widen_smult_hi_@var{m}}, @samp{vec_widen_smult_lo_@var{m}}
+Signed/Unsigned widening multiplication.
+The two inputs (operands 1 and 2) are vectors with N
+signed/unsigned elements of size S. Multiply the high/low elements of the two
+vectors, and put the N/2 products of size 2*S in the output vector (operand 0).
+
@cindex @code{mulhisi3} instruction pattern
@item @samp{mulhisi3}
Multiply operands 1 and 2, which have mode @code{HImode}, and store
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c4dc052ddbe..768c7c7540d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5284,6 +5284,28 @@ the argument @var{OFF} to @code{REALIGN_LOAD}, in which case the low
log2(@var{VS})-1 bits of @var{addr} will be considered.
@end deftypefn
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN (tree @var{x})
+This hook should return the DECL of a function @var{f} that implements
+widening multiplication of the even elements of two input vectors of type @var{x}.
+
+If this hook is defined, the autovectorizer will use it along with the
+@code{TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD} target hook when vectorizing
+widening multiplication in cases in which the order of the results does not have to be
+preserved (e.g. used only by a reduction computation). Otherwise, the
+@code{widen_mult_hi/lo} idioms will be used.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD (tree @var{x})
+This hook should return the DECL of a function @var{f} that implements
+widening multiplication of the odd elements of two input vectors of type @var{x}.
+
+If this hook is defined, the autovectorizer will use it along with the
+@code{TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN} target hook when vectorizing
+widening multiplication in cases in which the order of the results does not have to be
+preserved (e.g. used only by a reduction computation). Otherwise, the
+@code{widen_mult_hi/lo} idioms will be used.
+@end deftypefn
+
@node Anchored Addresses
@section Anchored Addresses
@cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index 50564b65072..f380413760d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8757,6 +8757,37 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode,
return target;
}
+ case VEC_UNPACK_HI_EXPR:
+ case VEC_UNPACK_LO_EXPR:
+ {
+ op0 = expand_expr (TREE_OPERAND (exp, 0), NULL_RTX, VOIDmode, 0);
+ this_optab = optab_for_tree_code (code, type);
+ temp = expand_widen_pattern_expr (exp, op0, NULL_RTX, NULL_RTX,
+ target, unsignedp);
+ gcc_assert (temp);
+ return temp;
+ }
+
+ case VEC_WIDEN_MULT_HI_EXPR:
+ case VEC_WIDEN_MULT_LO_EXPR:
+ {
+ tree oprnd0 = TREE_OPERAND (exp, 0);
+ tree oprnd1 = TREE_OPERAND (exp, 1);
+
+ expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, 0);
+ target = expand_widen_pattern_expr (exp, op0, op1, NULL_RTX,
+ target, unsignedp);
+ gcc_assert (target);
+ return target;
+ }
+
+ case VEC_PACK_MOD_EXPR:
+ case VEC_PACK_SAT_EXPR:
+ {
+ mode = TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 0)));
+ goto binop;
+ }
+
default:
return lang_hooks.expand_expr (exp, original_target, tmode,
modifier, alt_rtl);
diff --git a/gcc/genopinit.c b/gcc/genopinit.c
index a169edd991a..ceac9fe9507 100644
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -214,7 +214,17 @@ static const char * const optabs[] =
"reduc_smin_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_smin_$a$)",
"reduc_umin_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_umin_$a$)",
"reduc_splus_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_splus_$a$)" ,
- "reduc_uplus_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_uplus_$a$)"
+ "reduc_uplus_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_uplus_$a$)",
+ "vec_widen_umult_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_widen_umult_hi_$a$)",
+ "vec_widen_umult_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_widen_umult_lo_$a$)",
+ "vec_widen_smult_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_widen_smult_hi_$a$)",
+ "vec_widen_smult_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_widen_smult_lo_$a$)",
+ "vec_unpacks_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacks_hi_$a$)",
+ "vec_unpacks_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacks_lo_$a$)",
+ "vec_unpacku_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_hi_$a$)",
+ "vec_unpacku_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_lo_$a$)",
+ "vec_pack_mod_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_mod_$a$)",
+ "vec_pack_ssat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_ssat_$a$)", "vec_pack_usat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_usat_$a$)"
};
static void gen_insn (rtx);
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 92887ab6b8d..a638056d0b1 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -315,6 +315,28 @@ optab_for_tree_code (enum tree_code code, tree type)
case VEC_RSHIFT_EXPR:
return vec_shr_optab;
+ case VEC_WIDEN_MULT_HI_EXPR:
+ return TYPE_UNSIGNED (type) ?
+ vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
+
+ case VEC_WIDEN_MULT_LO_EXPR:
+ return TYPE_UNSIGNED (type) ?
+ vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
+
+ case VEC_UNPACK_HI_EXPR:
+ return TYPE_UNSIGNED (type) ?
+ vec_unpacku_hi_optab : vec_unpacks_hi_optab;
+
+ case VEC_UNPACK_LO_EXPR:
+ return TYPE_UNSIGNED (type) ?
+ vec_unpacku_lo_optab : vec_unpacks_lo_optab;
+
+ case VEC_PACK_MOD_EXPR:
+ return vec_pack_mod_optab;
+
+ case VEC_PACK_SAT_EXPR:
+ return TYPE_UNSIGNED (type) ? vec_pack_usat_optab : vec_pack_ssat_optab;
+
default:
break;
}
@@ -1276,6 +1298,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
int icode = (int) binoptab->handlers[(int) mode].insn_code;
enum machine_mode mode0 = insn_data[icode].operand[1].mode;
enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+ enum machine_mode tmp_mode;
rtx pat;
rtx xop0 = op0, xop1 = op1;
@@ -1329,8 +1352,21 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
&& mode1 != VOIDmode)
xop1 = copy_to_mode_reg (mode1, xop1);
- if (!insn_data[icode].operand[0].predicate (temp, mode))
- temp = gen_reg_rtx (mode);
+ if (binoptab == vec_pack_mod_optab
+ || binoptab == vec_pack_usat_optab
+ || binoptab == vec_pack_ssat_optab)
+ {
+ /* The mode of the result is different from the mode of the
+ arguments. */
+ tmp_mode = insn_data[icode].operand[0].mode;
+ if (GET_MODE_NUNITS (tmp_mode) != 2 * GET_MODE_NUNITS (mode))
+ return 0;
+ }
+ else
+ tmp_mode = mode;
+
+ if (!insn_data[icode].operand[0].predicate (temp, tmp_mode))
+ temp = gen_reg_rtx (tmp_mode);
pat = GEN_FCN (icode) (temp, xop0, xop1);
if (pat)
@@ -5354,6 +5390,17 @@ init_optabs (void)
vec_shr_optab = init_optab (UNKNOWN);
vec_realign_load_optab = init_optab (UNKNOWN);
movmisalign_optab = init_optab (UNKNOWN);
+ vec_widen_umult_hi_optab = init_optab (UNKNOWN);
+ vec_widen_umult_lo_optab = init_optab (UNKNOWN);
+ vec_widen_smult_hi_optab = init_optab (UNKNOWN);
+ vec_widen_smult_lo_optab = init_optab (UNKNOWN);
+ vec_unpacks_hi_optab = init_optab (UNKNOWN);
+ vec_unpacks_lo_optab = init_optab (UNKNOWN);
+ vec_unpacku_hi_optab = init_optab (UNKNOWN);
+ vec_unpacku_lo_optab = init_optab (UNKNOWN);
+ vec_pack_mod_optab = init_optab (UNKNOWN);
+ vec_pack_usat_optab = init_optab (UNKNOWN);
+ vec_pack_ssat_optab = init_optab (UNKNOWN);
powi_optab = init_optab (UNKNOWN);
diff --git a/gcc/optabs.h b/gcc/optabs.h
index b47e7623a2e..d197766c772 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -260,6 +260,22 @@ enum optab_index
OTI_vec_shr,
/* Extract specified elements from vectors, for vector load. */
OTI_vec_realign_load,
+ /* Widening multiplication.
+ The high/low part of the resulting vector of products is returned. */
+ OTI_vec_widen_umult_hi,
+ OTI_vec_widen_umult_lo,
+ OTI_vec_widen_smult_hi,
+ OTI_vec_widen_smult_lo,
+ /* Extract and widen the high/low part of a vector of signed/unsigned
+ elements. */
+ OTI_vec_unpacks_hi,
+ OTI_vec_unpacks_lo,
+ OTI_vec_unpacku_hi,
+ OTI_vec_unpacku_lo,
+ /* Narrow (demote) and merge the elements of two vectors. */
+ OTI_vec_pack_mod,
+ OTI_vec_pack_usat,
+ OTI_vec_pack_ssat,
/* Perform a raise to the power of integer. */
OTI_powi,
@@ -385,7 +401,18 @@ extern GTY(()) optab optab_table[OTI_MAX];
#define vec_shl_optab (optab_table[OTI_vec_shl])
#define vec_shr_optab (optab_table[OTI_vec_shr])
#define vec_realign_load_optab (optab_table[OTI_vec_realign_load])
-
+#define vec_widen_umult_hi_optab (optab_table[OTI_vec_widen_umult_hi])
+#define vec_widen_umult_lo_optab (optab_table[OTI_vec_widen_umult_lo])
+#define vec_widen_smult_hi_optab (optab_table[OTI_vec_widen_smult_hi])
+#define vec_widen_smult_lo_optab (optab_table[OTI_vec_widen_smult_lo])
+#define vec_unpacks_hi_optab (optab_table[OTI_vec_unpacks_hi])
+#define vec_unpacku_hi_optab (optab_table[OTI_vec_unpacku_hi])
+#define vec_unpacks_lo_optab (optab_table[OTI_vec_unpacks_lo])
+#define vec_unpacku_lo_optab (optab_table[OTI_vec_unpacku_lo])
+#define vec_pack_mod_optab (optab_table[OTI_vec_pack_mod])
+#define vec_pack_ssat_optab (optab_table[OTI_vec_pack_ssat])
+#define vec_pack_usat_optab (optab_table[OTI_vec_pack_usat])
+
#define powi_optab (optab_table[OTI_powi])
/* Conversion optabs have their own table and indexes. */
diff --git a/gcc/target-def.h b/gcc/target-def.h
index 82e711a79c7..1e158c164f3 100644
--- a/gcc/target-def.h
+++ b/gcc/target-def.h
@@ -332,9 +332,13 @@ Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
TARGET_SCHED_SET_SCHED_FLAGS}
#define TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD 0
+#define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN 0
+#define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD 0
#define TARGET_VECTORIZE \
- {TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD}
+ {TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD, \
+ TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN, \
+ TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD}
#define TARGET_DEFAULT_TARGET_FLAGS 0
diff --git a/gcc/target.h b/gcc/target.h
index b193b624f39..82eaccdd551 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -369,6 +369,13 @@ struct gcc_target
by the vectorizer, and return the decl of the target builtin
function. */
tree (* builtin_mask_for_load) (void);
+
+ /* Target builtin that implements vector widening multiplication.
+ builtin_mul_widen_even computes the element-by-element products
+ for the even elements, and builtin_mul_widen_odd computes the
+ element-by-element products for the odd elements. */
+ tree (* builtin_mul_widen_even) (tree);
+ tree (* builtin_mul_widen_odd) (tree);
} vectorize;
/* The initial value of target_flags. */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4459c08a472..1a6cf27a3fb 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,67 @@
+2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * gcc.dg/vect/vect-1.c: Loop with multiple types removed (appears in
+ vect-9.c).
+ * gcc.dg/vect/vect-106.c: Removed (duplicate of vect-9.c).
+ * gcc.dg/vect/vect-9.c: Now vectorizable.
+ * gcc.dg/vect/vect-reduc-dot-s16a.c: Now vectorizable also on targets
+ that support vect_widen_mult.
+ * gcc.dg/vect/vect-reduc-dot-u16.c: Removed (split into two new tests).
+ * gcc.dg/vect/vect-reduc-dot-u16a.c: New test (split from
+ vect-reduc-dot-u16.c).
+ * gcc.dg/vect/vect-reduc-dot-u16b.c: New test (split from
+ vect-reduc-dot-u16.c).
+ * gcc.dg/vect/vect-reduc-dot-s8.c: Removed (split into three new tests).
+ * gcc.dg/vect/vect-reduc-dot-s8a.c: New test (split from
+ vect-reduc-dot-s8.c).
+ * gcc.dg/vect/vect-reduc-dot-s8b.c: New test (split from
+ vect-reduc-dot-s8.c).
+ * gcc.dg/vect/vect-reduc-dot-s8c.c: New test (split from
+ vect-reduc-dot-s8.c).
+ * gcc.dg/vect/vect-reduc-dot-u8.c: Removed (split into two new tests).
+ * gcc.dg/vect/vect-reduc-dot-u8a.c: New test (split from
+ vect-reduc-dot-u8.c).
+ * gcc.dg/vect/vect-reduc-dot-u8b.c: New test (split from
+ vect-reduc-dot-u8.c).
+ * gcc.dg/vect/vect-widen-mult-sum.c: New test.
+ * gcc.dg/vect/vect-multitypes-9.c: New test.
+ * gcc.dg/vect/vect-multitypes-10.c: New test.
+ * gcc.dg/vect/vect-widen-mult-s16.c: New test.
+ * gcc.dg/vect/vect-widen-mult-u16.c: New test.
+ * gcc.dg/vect/vect-widen-mult-u8.c: New test.
+ * gcc.dg/vect/vect-widen-mult-s8.c: New test.
+ * gcc.dg/vect/wrapv-vect-reduc-dot-s8.c: Removed.
+ * gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: New reduced version of
+ wrapv-vect-reduc-dot-s8.c.
+ * lib/target-supports.exp (check_effective_target_vect_unpack): New.
+ (check_effective_target_vect_widen_sum_hi_to_si): Now also includes
+ targets that support vec_unpack.
+ (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
+ (check_effective_target_vect_widen_mult_qi_to_hi): New.
+ (check_effective_target_vect_widen_mult_hi_to_si): New.
+ (check_effective_target_vect_widen_sum): Removed.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * gcc.dg/vect/vect-multitypes-8.c: New test.
+ * lib/target-supports.exp (check_effective_target_vect_pack_mod): New.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * gcc.dg/vect/vect-multitypes-7.c: New test.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * gcc.dg/vect/vect-multitypes-4.c: New test.
+ * gcc.dg/vect/vect-multitypes-5.c: New test.
+ * gcc.dg/vect/vect-multitypes-6.c: New test.
+
+ 2006-11-08 Dorit Nuzman <dorit@il.ibm.com>
+
+ * gcc.dg/vect/vect-multitypes-1.c: New test.
+ * gcc.dg/vect/vect-multitypes-2.c: New test.
+ * gcc.dg/vect/vect-multitypes-3.c: New test.
+
2006-11-07 Eric Christopher <echristo@apple.com>
* gcc.target/i386/builtin-bswap-1.c: Rewrite for 64-bit.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-1.c b/gcc/testsuite/gcc.dg/vect/vect-1.c
index 938a7b3d181..6df6af078f5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-1.c
@@ -19,9 +19,6 @@ foo (int n)
int ia[N];
int ib[N];
int ic[N];
- short sa[N];
- short sb[N];
- short sc[N];
int i,j;
int diff = 0;
char cb[N];
@@ -80,16 +77,6 @@ foo (int n)
fbar (a);
fbar (d);
-
- /* Not vectorizable yet (two types with different nunits in vector). */
- for (i = 0; i < N; i++){
- ia[i] = ib[i] + ic[i];
- sa[i] = sb[i] + sc[i];
- }
- ibar (ia);
- sbar (sa);
-
-
/* Not vetorizable yet (too conservative dependence test). */
for (i = 0; i < N; i++){
a[i] = b[i] + c[i];
diff --git a/gcc/testsuite/gcc.dg/vect/vect-106.c b/gcc/testsuite/gcc.dg/vect/vect-106.c
deleted file mode 100644
index 43e43a6cd1f..00000000000
--- a/gcc/testsuite/gcc.dg/vect/vect-106.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/* { dg-require-effective-target vect_int } */
-
-#include <stdarg.h>
-#include "tree-vect.h"
-
-#define N 16
-
-int
-main1 (void)
-{
- int i;
- short sb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
- int ia[N];
-
- /* Type cast. */
- for (i = 0; i < N; i++)
- {
- ia[i] = (int) sb[i];
- }
-
-
- /* Check results. */
- for (i = 0; i < N; i++)
- {
- if (ia[i] != (int) sb[i])
- abort();
- }
-
- return 0;
-}
-
-int main (void)
-{
- check_vect ();
- return main1 ();
-}
-
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
-/* { dg-final { cleanup-tree-dump "vect" } } */
-
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
index ba6b2cee3d6..e861a772d5b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -3,42 +3,75 @@
#include <stdarg.h>
#include "tree-vect.h"
-#define N 16
+#define N 32
-int
-main1 ()
+short sa[N];
+short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+ 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+ 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+int ia[N];
+int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main1 (int n)
+{
+ int i;
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. */
+ for (i = 0; i < n; i++)
+ {
+ sa[i+2] = sb[i] + sc[i];
+ ia[i+1] = ib[i] + ic[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (sa[i+2] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main2 (int n)
{
int i;
- short sc[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
- short sb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
- short sa[N];
- int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
- int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
- int ia[N];
-
- /* Two types with different nunits in vector. */
- for (i = 0; i < N; i++)
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. */
+ for (i = 0; i < n; i++)
{
- ia[i] = ib[i] + ic[i];
+ ia[i+1] = ib[i] + ic[i];
sa[i] = sb[i] + sc[i];
}
- /* Check results. */
- for (i = 0; i < N; i++)
+ /* check results: */
+ for (i = 0; i < n; i++)
{
- if (ia[i] != ib[i] + ic[i] || sa[i] != sb[i] + sc[i])
- abort();
+ if (sa[i] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
+ abort ();
}
- return 0;
+ return 0;
}
-
+
+
int main (void)
-{
+{
check_vect ();
- return main1 ();
+
+ main1 (N-2);
+ main2 (N-1);
+
+ return 0;
}
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned store" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-9.c b/gcc/testsuite/gcc.dg/vect/vect-9.c
index 76638d0df76..77ff55e064c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-9.c
@@ -11,7 +11,7 @@ int main1 ()
short sb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
int ia[N];
- /* Not vetorizable yet (type cast). */
+ /* Requires type promotion (vector unpacking) support. */
for (i = 0; i < N; i++)
{
ia[i] = (int) sb[i];
@@ -34,5 +34,5 @@ int main (void)
return main1 ();
}
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-96.c b/gcc/testsuite/gcc.dg/vect/vect-96.c
index ac3194e62c3..ed463fb39dd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-96.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-96.c
@@ -38,7 +38,7 @@ int main (void)
}
/* The store is unaligned, the load is aligned. For targets that support unaligned
- loads, peel to align the store and generated unaligned access for the loads.
+ loads, peel to align the store and generate an unaligned access for the load.
For targets that don't support unaligned loads, version for the store. */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
new file mode 100644
index 00000000000..2b884011952
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
@@ -0,0 +1,87 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+short sa[N];
+short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+ 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+int ia[N];
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
+ access for peeling, and therefore will examine the option of
+ using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
+ which will also align the access to 'ia[i+3]', and the loop could be
+ vectorized on all targets that support unaligned loads.
+ */
+
+int main1 (int n)
+{
+ int i;
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. */
+ for (i = 0; i < n; i++)
+ {
+ sa[i+7] = sb[i];
+ ia[i+3] = ib[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (sa[i+7] != sb[i] || ia[i+3] != ib[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
+ access for peeling, and therefore will examine the option of
+ using a peeling factor = VF-3%VF. This will result in a peeling factor
+ 5 if VF=8, or 1 if VF=4,2. In either case, this will also align the access
+ to 'sa[i+3]', and the loop could be vectorized on targets that support
+ unaligned loads. */
+
+int main2 (int n)
+{
+ int i;
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. */
+ for (i = 0; i < n; i++)
+ {
+ ia[i+3] = ib[i];
+ sa[i+3] = sb[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (sa[i+3] != sb[i] || ia[i+3] != ib[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main (void)
+{
+ check_vect ();
+
+ main1 (N-7);
+ main2 (N-3);
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { xfail vect_no_align } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
new file mode 100644
index 00000000000..89584672589
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
@@ -0,0 +1,67 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+unsigned char uX[N] __attribute__ ((__aligned__(16)));
+unsigned short uY[N] __attribute__ ((__aligned__(16)));
+unsigned int uresult[N];
+signed char X[N] __attribute__ ((__aligned__(16)));
+signed short Y[N] __attribute__ ((__aligned__(16)));
+int result[N];
+
+/* Unsigned type promotion (hi->si) */
+int
+foo1(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ uX[i] = 5;
+ uresult[i] = (unsigned int)uY[i];
+ }
+}
+
+/* Signed type promotion (hi->si) */
+int
+foo2(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ uX[i] = 5;
+ result[i] = (int)Y[i];
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = 16-i;
+ uX[i] = 16-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (uresult[i] != (unsigned short)uY[i])
+ abort ();
+ }
+
+ foo2 (N);
+
+ for (i=0; i<N; i++) {
+ if (result[i] != (short)Y[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_unpack } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
new file mode 100644
index 00000000000..4f4ecbc5a4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int main1 ()
+{
+ int i;
+ int ia[N];
+ int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ short sa[N];
+ short sb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ char ca[N];
+ char cb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. All accesses aligned. */
+ for (i = 0; i < N; i++)
+ {
+ ia[i] = ib[i];
+ sa[i] = sb[i];
+ ca[i] = cb[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < N; i++)
+ {
+ if (ia[i] != ib[i]
+ || sa[i] != sb[i]
+ || ca[i] != cb[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main (void)
+{
+ check_vect ();
+
+ return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
new file mode 100644
index 00000000000..959fa025e39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int ib[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+short sb[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+char cb[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+
+int main1 (int n, int * __restrict__ pib,
+ short * __restrict__ psb,
+ char * __restrict__ pcb)
+{
+ int i;
+ int ia[N];
+ short sa[N];
+ char ca[N];
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. The loads are misaligned. */
+ for (i = 0; i < n; i++)
+ {
+ ia[i] = pib[i];
+ sa[i] = psb[i];
+ ca[i] = pcb[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (ia[i] != pib[i]
+ || sa[i] != psb[i]
+ || ca[i] != pcb[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main (void)
+{
+ check_vect ();
+
+ main1 (N, ib, sb, cb);
+ main1 (N-3, ib, sb, &cb[2]);
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 "vect" {xfail vect_no_align } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
new file mode 100644
index 00000000000..63f244d06b7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
@@ -0,0 +1,91 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+unsigned short sa[N];
+unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+ 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+ 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[N];
+unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
+ access for peeling, and therefore will examine the option of
+ using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
+ which will also align the access to 'ia[i+3]', and the loop could be
+ vectorized on all targets that support unaligned loads.
+ */
+
+int main1 (int n)
+{
+ int i;
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. */
+ for (i = 0; i < n; i++)
+ {
+ sa[i+7] = sb[i] + sc[i];
+ ia[i+3] = ib[i] + ic[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
+ access for peeling, and therefore will examine the option of
+ using a peeling factor = VF-3%VF. This will result in a peeling factor
+ 5 if VF=8, or 1 if VF=4,2. In either case, this will also align the access
+ to 'sa[i+3]', and the loop could be vectorized on targets that support
+ unaligned loads. */
+
+int main2 (int n)
+{
+ int i;
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. */
+ for (i = 0; i < n; i++)
+ {
+ ia[i+3] = ib[i] + ic[i];
+ sa[i+3] = sb[i] + sc[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main (void)
+{
+ check_vect ();
+
+ main1 (N-7);
+ main2 (N-3);
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 8 "vect" { xfail vect_no_align } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
new file mode 100644
index 00000000000..d6ad34d7468
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int main1 ()
+{
+ int i;
+ unsigned int ia[N];
+ unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ unsigned short sa[N];
+ unsigned short sc[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ unsigned short sb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ unsigned char ca[N];
+ unsigned char cc[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+ unsigned char cb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. All accesses aligned. */
+ for (i = 0; i < N; i++)
+ {
+ ia[i] = ib[i] + ic[i];
+ sa[i] = sb[i] + sc[i];
+ ca[i] = cb[i] + cc[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < N; i++)
+ {
+ if (ia[i] != ib[i] + ic[i]
+ || sa[i] != sb[i] + sc[i]
+ || ca[i] != cb[i] + cc[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main (void)
+{
+ check_vect ();
+
+ return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
new file mode 100644
index 00000000000..34c284a2a9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
@@ -0,0 +1,64 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+unsigned int ic[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+unsigned int ib[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+unsigned short sc[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+unsigned short sb[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+unsigned char cc[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+unsigned char cb[N] __attribute__ ((__aligned__(16))) =
+ {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+
+int main1 (int n,
+ unsigned int * __restrict__ pic, unsigned int * __restrict__ pib,
+ unsigned short * __restrict__ psc, unsigned short * __restrict__ psb,
+ unsigned char * __restrict__ pcc, unsigned char * __restrict__ pcb)
+{
+ int i;
+ unsigned int ia[N];
+ unsigned short sa[N];
+ unsigned char ca[N];
+
+ /* Multiple types with different sizes, used in independent
+ computations. Vectorizable. The loads are misaligned. */
+ for (i = 0; i < n; i++)
+ {
+ ia[i] = pib[i] + pic[i];
+ sa[i] = psb[i] + psc[i];
+ ca[i] = pcb[i] + pcc[i];
+ }
+
+ /* check results: */
+ for (i = 0; i < n; i++)
+ {
+ if (ia[i] != pib[i] + pic[i]
+ || sa[i] != psb[i] + psc[i]
+ || ca[i] != pcb[i] + pcc[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+int main (void)
+{
+ check_vect ();
+
+ main1 (N, ic, ib, sc, sb, cc, cb);
+ main1 (N-3, ic, ib, &sc[1], sb, cc, &cb[2]);
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 6 "vect" {xfail vect_no_align } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-7.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-7.c
new file mode 100644
index 00000000000..8cbb502a3f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-7.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdio.h>
+
+#define N 64
+
+#define DOT1 43680
+#define DOT2 -20832
+
+signed short X[N] __attribute__ ((__aligned__(16)));
+signed short Y[N] __attribute__ ((__aligned__(16)));
+unsigned char CX[N] __attribute__ ((__aligned__(16)));
+
+void
+foo1(int len) {
+ int i;
+ int result1 = 0;
+ short prod;
+
+ for (i=0; i<len; i++) {
+ result1 += (X[i] * Y[i]);
+ CX[i] = 5;
+ }
+
+ if (result1 != DOT1)
+ abort ();
+}
+
+
+int main (void)
+{
+ int i, dot1, dot2;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ CX[i] = i;
+ }
+
+ foo1 (N);
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_hi } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
new file mode 100644
index 00000000000..639415cdff2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+unsigned char uX[N] __attribute__ ((__aligned__(16)));
+unsigned char uresultX[N];
+unsigned int uY[N] __attribute__ ((__aligned__(16)));
+unsigned short uresultY[N];
+
+/* Unsigned type demotion (si->hi) */
+
+int
+foo1(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ uresultX[i] = uX[i];
+ uresultY[i] = (unsigned short)uY[i];
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ uX[i] = 16-i;
+ uY[i] = 16-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (uresultX[i] != uX[i])
+ abort ();
+ if (uresultY[i] != (unsigned short)uY[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_mod } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
new file mode 100644
index 00000000000..82c53e461f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+unsigned char uX[N] __attribute__ ((__aligned__(16)));
+unsigned short uresult[N];
+signed char X[N] __attribute__ ((__aligned__(16)));
+short result[N];
+
+/* Unsigned type promotion (qi->hi) */
+int
+foo1(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ uresult[i] = (unsigned short)uX[i];
+ }
+}
+
+/* Signed type promotion (qi->hi) */
+int
+foo2(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ result[i] = (short)X[i];
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = 16-i;
+ uX[i] = 16-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (uresult[i] != (unsigned short)uX[i])
+ abort ();
+ }
+
+ foo2 (N);
+
+ for (i=0; i<N; i++) {
+ if (result[i] != (short)X[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_unpack } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
index d92511fcdfc..4f0c3e999d5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
@@ -50,5 +50,6 @@ main (void)
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_hi } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
new file mode 100644
index 00000000000..31bdb1079fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+#define DOT1 43680
+
+signed char X[N] __attribute__ ((__aligned__(16)));
+signed char Y[N] __attribute__ ((__aligned__(16)));
+
+/* char->short->int dot product.
+ The dot-product pattern should be detected.
+ Vectorizable on vect_sdot_qi targets (targets that support dot-product of
+ signed chars).
+
+ In the future could also be vectorized as widening-mult + widening-summation,
+ or with type-conversion support.
+ */
+int
+foo1(int len) {
+ int i;
+ int result = 0;
+ short prod;
+
+ for (i=0; i<len; i++) {
+ prod = X[i] * Y[i];
+ result += prod;
+ }
+ return result;
+}
+
+int main (void)
+{
+ int i, dot1;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ dot1 = foo1 (N);
+ if (dot1 != DOT1)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi && vect_widen_sum_hi_to_si } } } } */
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
index 8e5d48035b3..9b22d748e1a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
@@ -5,34 +5,11 @@
#define N 64
-#define DOT1 43680
#define DOT2 -21856
-#define DOT3 43680
signed char X[N] __attribute__ ((__aligned__(16)));
signed char Y[N] __attribute__ ((__aligned__(16)));
-/* char->short->int dot product.
- The dot-product pattern should be detected.
- Vectorizable on vect_sdot_qi targets (targets that support dot-product of
- signed chars).
-
- In the future could also be vectorized as widening-mult + widening-summation,
- or with type-conversion support.
- */
-int
-foo1(int len) {
- int i;
- int result = 0;
- short prod;
-
- for (i=0; i<len; i++) {
- prod = X[i] * Y[i];
- result += prod;
- }
- return result;
-}
-
/* char->short->short dot product.
The dot-product pattern should be detected.
The reduction is currently not vectorized becaus of the signed->unsigned->signed
@@ -45,9 +22,8 @@ foo1(int len) {
When the dot-product is detected, the loop should be vectorized on vect_sdot_qi
targets (targets that support dot-product of signed char).
This test would currently fail to vectorize on targets that support
- dot-product of chars when the accumulator is int.
-
- In the future could also be vectorized as widening-mult + summation,
+ dot-product of chars into an int accumulator.
+ Alternatively, the loop could also be vectorized as widening-mult + summation,
or with type-conversion support.
*/
short
@@ -61,23 +37,9 @@ foo2(int len) {
return result;
}
-/* char->int->int dot product.
- Not detected as a dot-product pattern.
- Currently fails to be vectorized due to presence of type conversions. */
-int
-foo3(int len) {
- int i;
- int result = 0;
-
- for (i=0; i<len; i++) {
- result += (X[i] * Y[i]);
- }
- return result;
-}
-
int main (void)
{
- int i, dot1, dot3;
+ int i;
short dot2;
check_vect ();
@@ -87,25 +49,16 @@ int main (void)
Y[i] = 64-i;
}
- dot1 = foo1 (N);
- if (dot1 != DOT1)
- abort ();
-
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
- dot3 = foo3 (N);
- if (dot3 != DOT3)
- abort ();
-
return 0;
}
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 2 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8c.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8c.c
new file mode 100644
index 00000000000..bba41dfd80b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8c.c
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+#define DOT3 43680
+
+signed char X[N] __attribute__ ((__aligned__(16)));
+signed char Y[N] __attribute__ ((__aligned__(16)));
+
+/* char->int->int dot product.
+ Not detected as a dot-product pattern.
+ Currently fails to be vectorized due to presence of type conversions. */
+int
+foo3(int len) {
+ int i;
+ int result = 0;
+
+ for (i=0; i<len; i++) {
+ result += (X[i] * Y[i]);
+ }
+ return result;
+}
+
+int main (void)
+{
+ int i, dot3;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ dot3 = foo3 (N);
+ if (dot3 != DOT3)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c
new file mode 100644
index 00000000000..38753f725d8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+#define DOT1 43680
+#define DOT2 43680
+
+unsigned short X[N] __attribute__ ((__aligned__(16)));
+unsigned short Y[N] __attribute__ ((__aligned__(16)));
+
+/* short->short->int dot product.
+ Not detected as a dot-product pattern.
+   Requires support for non-widening multiplication and widening-summation. */
+unsigned int
+foo1(int len) {
+ int i;
+ unsigned int result = 0;
+ unsigned short prod;
+
+ for (i=0; i<len; i++) {
+ prod = X[i] * Y[i];
+ result += prod;
+ }
+ return result;
+}
+
+int main (void)
+{
+ unsigned int dot1;
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ dot1 = foo1 (N);
+ if (dot1 != DOT1)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_short_mult && vect_widen_sum_hi_to_si } } } } */
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c
index 03db7e0b6a6..7c83108eace 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c
@@ -5,28 +5,11 @@
#define N 64
-#define DOT1 43680
#define DOT2 43680
unsigned short X[N] __attribute__ ((__aligned__(16)));
unsigned short Y[N] __attribute__ ((__aligned__(16)));
-/* short->short->int dot product.
- Not detected as a dot-product pattern.
- Not vectorized due to presence of type-conversions. */
-unsigned int
-foo1(int len) {
- int i;
- unsigned int result = 0;
- unsigned short prod;
-
- for (i=0; i<len; i++) {
- prod = X[i] * Y[i];
- result += prod;
- }
- return result;
-}
-
/* short->int->int dot product.
Currently not detected as a dot-product pattern: the multiplication
promotes the ushorts to int, and then the product is promoted to unsigned
@@ -46,7 +29,7 @@ foo2(int len) {
int main (void)
{
- unsigned int dot1, dot2;
+ unsigned int dot2;
int i;
check_vect ();
@@ -56,10 +39,6 @@ int main (void)
Y[i] = 64-i;
}
- dot1 = foo1 (N);
- if (dot1 != DOT1)
- abort ();
-
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
@@ -69,9 +48,9 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
-/* Once the dot-product pattern is detected in the second loop, we expect
+/* Once the dot-product pattern is detected, we expect
that loop to be vectorized on vect_udot_hi targets (targets that support
- dot-product of unsigned shorts). */
+ dot-product of unsigned shorts) and targets that support widening multiplication. */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8.c
deleted file mode 100644
index ad68bc752c5..00000000000
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8.c
+++ /dev/null
@@ -1,101 +0,0 @@
-/* { dg-require-effective-target vect_int } */
-
-#include <stdarg.h>
-#include "tree-vect.h"
-
-#define N 64
-
-#define DOT1 43680
-#define DOT2 43680
-#define DOT3 43680
-
-unsigned char X[N] __attribute__ ((__aligned__(16)));
-unsigned char Y[N] __attribute__ ((__aligned__(16)));
-
-/* char->short->int dot product.
- Detected as a dot-product pattern.
- Should be vectorized on targets that support dot-product for unsigned chars.
- */
-unsigned int
-foo1(int len) {
- int i;
- unsigned int result = 0;
- unsigned short prod;
-
- for (i=0; i<len; i++) {
- prod = X[i] * Y[i];
- result += prod;
- }
- return result;
-}
-
-/* char->short->short dot product.
- Detected as a dot-product pattern.
- Should be vectorized on targets that support dot-product for unsigned chars.
- This test currently fails to vectorize on targets that support dot-product
- of chars only when the accumulator is int.
- */
-unsigned short
-foo2(int len) {
- int i;
- unsigned short result = 0;
-
- for (i=0; i<len; i++) {
- result += (unsigned short)(X[i] * Y[i]);
- }
- return result;
-}
-
-/* char->int->int dot product.
- Not detected as a dot-product.
- Doesn't get vectorized due to presence of type converisons. */
-unsigned int
-foo3(int len) {
- int i;
- unsigned int result = 0;
-
- for (i=0; i<len; i++) {
- result += (X[i] * Y[i]);
- }
- return result;
-}
-
-int main (void)
-{
- unsigned int dot1, dot3;
- unsigned short dot2;
- int i;
-
- check_vect ();
-
- for (i=0; i<N; i++) {
- X[i] = i;
- Y[i] = 64-i;
- }
-
- dot1 = foo1 (N);
- if (dot1 != DOT1)
- abort ();
-
- dot2 = foo2 (N);
- if (dot2 != DOT2)
- abort ();
-
- dot3 = foo3 (N);
- if (dot3 != DOT3)
- abort ();
-
- return 0;
-}
-
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 2 "vect" } } */
-
-/* When the vectorizer is enhanced to vectorize foo2 (accumulation into short) for
- targets that support accumulation into int (powerpc, ia64) we'd have:
-dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_udot_qi } }
-*/
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_udot_qi } } } */
-
-/* { dg-final { cleanup-tree-dump "vect" } } */
-
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c
new file mode 100644
index 00000000000..0c5cf78f2f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+#define DOT 43680
+
+unsigned char X[N] __attribute__ ((__aligned__(16)));
+unsigned char Y[N] __attribute__ ((__aligned__(16)));
+
+/* char->short->int dot product.
+ Detected as a dot-product pattern.
+ Should be vectorized on targets that support dot-product for unsigned chars
+ (vect_udot_qi),
+ and on targets that support widening-multiplication and widening-summation
+ (vect_widen_mult_qi && vec_widen_sum_qi_to_si).
+ Widening-multiplication can also be supported by type promotion and non-widening
+ multiplication (vect_unpack && vect_short_mult);
+ Widening summation can also be supported by type promotion and non-widening
+ summation (vect_unpack).
+ */
+unsigned int
+foo (int len) {
+ int i;
+ unsigned int result = 0;
+ unsigned short prod;
+
+ for (i=0; i<len; i++) {
+ prod = X[i] * Y[i];
+ result += prod;
+ }
+ return result;
+}
+
+int main (void)
+{
+ unsigned int dot;
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ dot = foo (N);
+ if (dot != DOT)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_udot_qi } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi && vect_widen_sum_qi_to_si } } } } */
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c
new file mode 100644
index 00000000000..e3216a0b319
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c
@@ -0,0 +1,60 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+#define DOT 43680
+
+unsigned char X[N] __attribute__ ((__aligned__(16)));
+unsigned char Y[N] __attribute__ ((__aligned__(16)));
+
+/* char->short->short dot product.
+ Detected as a dot-product pattern.
+ Should be vectorized on targets that support dot-product for unsigned chars,
+ but currently this test cannot be vectorized as a dot-product on targets
+ that support char->short->int dot-product.
+ Alternatively, this test can be vectorized using vect_widen_mult_qi (or
+   vect_unpack and non-widening multiplication: vect_unpack && vect_short_mult).
+ */
+unsigned short
+foo (int len) {
+ int i;
+ unsigned short result = 0;
+
+ for (i=0; i<len; i++) {
+ result += (unsigned short)(X[i] * Y[i]);
+ }
+ return result;
+}
+
+int main (void)
+{
+ unsigned short dot;
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ dot = foo (N);
+ if (dot != DOT)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
+
+/* When the vectorizer is enhanced to vectorize accumulation into short for
+ targets that support accumulation into int (powerpc, ia64) we'd have:
+dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_udot_qi || vect_widen_mult_qi_to_hi } }
+*/
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target vect_widen_mult_qi_to_hi} } } */
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
new file mode 100644
index 00000000000..23baa20bded
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+short X[N] __attribute__ ((__aligned__(16)));
+short Y[N] __attribute__ ((__aligned__(16)));
+int result[N];
+
+/* short->int widening-mult */
+int
+foo1(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ result[i] = X[i] * Y[i];
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (result[i] != X[i] * Y[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
new file mode 100644
index 00000000000..626b22f0565
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+signed char X[N] __attribute__ ((__aligned__(16)));
+signed char Y[N] __attribute__ ((__aligned__(16)));
+short result[N];
+
+/* char->short widening-mult */
+int
+foo1(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ result[i] = X[i] * Y[i];
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (result[i] != X[i] * Y[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_qi_to_hi } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-sum.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-sum.c
new file mode 100644
index 00000000000..668b20c478b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-sum.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define SUM 0
+
+/* Require widening-mult or data-unpacking (for the type promotion). */
+int
+main1 (short *in, int off, short scale, int n)
+{
+ int i;
+ int sum = 0;
+
+ for (i = 0; i < n; i++) {
+ sum += ((int) in[i] * (int) in[i+off]) >> scale;
+ }
+
+ return sum;
+}
+
+int main (void)
+{
+ int i;
+ int sum;
+ short X[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = 16-i;
+ }
+
+ sum = main1 (X, 1, 16, N-1);
+
+ if (sum != SUM)
+ abort ();
+
+ return 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
new file mode 100644
index 00000000000..3531d20f247
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+unsigned short X[N] __attribute__ ((__aligned__(16)));
+unsigned short Y[N] __attribute__ ((__aligned__(16)));
+unsigned int result[N];
+
+/* short->int widening-mult */
+int
+foo1(int len) {
+ int i;
+
+  /* Not vectorized because X[i] and Y[i] are cast to 'int'
+ so the widening multiplication pattern is not recognized. */
+ for (i=0; i<len; i++) {
+ result[i] = (unsigned int)(X[i] * Y[i]);
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (result[i] != X[i] * Y[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
new file mode 100644
index 00000000000..214014196fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+unsigned char X[N] __attribute__ ((__aligned__(16)));
+unsigned char Y[N] __attribute__ ((__aligned__(16)));
+unsigned short result[N];
+
+/* char->short widening-mult */
+int
+foo1(int len) {
+ int i;
+
+ for (i=0; i<len; i++) {
+ result[i] = X[i] * Y[i];
+ }
+}
+
+int main (void)
+{
+ int i;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ foo1 (N);
+
+ for (i=0; i<N; i++) {
+ if (result[i] != X[i] * Y[i])
+ abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_qi_to_hi } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8.c b/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8.c
deleted file mode 100644
index b11b9c70086..00000000000
--- a/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8.c
+++ /dev/null
@@ -1,108 +0,0 @@
-/* { dg-require-effective-target vect_int } */
-
-#include <stdarg.h>
-#include "tree-vect.h"
-
-#define N 64
-
-#define DOT1 43680
-#define DOT2 -21856
-#define DOT3 43680
-
-signed char X[N] __attribute__ ((__aligned__(16)));
-signed char Y[N] __attribute__ ((__aligned__(16)));
-
-/* char->short->int dot product.
- The dot-product pattern should be detected.
- Vectorizable on vect_sdot_qi targets (targets that support dot-product of
- signed chars).
-
- In the future could also be vectorized as widening-mult + widening-summation,
- or with type-conversion support.
- */
-int
-foo1(int len) {
- int i;
- int result = 0;
- short prod;
-
- for (i=0; i<len; i++) {
- prod = X[i] * Y[i];
- result += prod;
- }
- return result;
-}
-
-/* char->short->short dot product.
- The dot-product pattern should be detected.
- Should be vectorized on vect_sdot_qi targets (targets that support
- dot-product of signed char).
- This test currently fails to vectorize on targets that support
- dot-product of chars when the accumulator is int.
-
- In the future could also be vectorized as widening-mult + summation,
- or with type-conversion support.
- */
-short
-foo2(int len) {
- int i;
- short result = 0;
-
- for (i=0; i<len; i++) {
- result += (X[i] * Y[i]);
- }
- return result;
-}
-
-/* char->int->int dot product.
- Not detected as a dot-product pattern.
- Currently fails to be vectorized due to presence of type conversions. */
-int
-foo3(int len) {
- int i;
- int result = 0;
-
- for (i=0; i<len; i++) {
- result += (X[i] * Y[i]);
- }
- return result;
-}
-
-int main (void)
-{
- int i, dot1, dot3;
- short dot2;
-
- check_vect ();
-
- for (i=0; i<N; i++) {
- X[i] = i;
- Y[i] = 64-i;
- }
-
- dot1 = foo1 (N);
- if (dot1 != DOT1)
- abort ();
-
- dot2 = foo2 (N);
- if (dot2 != DOT2)
- abort ();
-
- dot3 = foo3 (N);
- if (dot3 != DOT3)
- abort ();
-
- return 0;
-}
-
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 2 "vect" } } */
-
-/* When vectorizer is enhanced to vectorize foo2 (accumulation into short) for targets
- that support accumulation into int (ia64) we'd have:
-dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_sdot_qi } }
-*/
-/* In the meantime expect: */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
-
-/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c b/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c
new file mode 100644
index 00000000000..724bb58b767
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c
@@ -0,0 +1,62 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+#define DOT -21856
+
+signed char X[N] __attribute__ ((__aligned__(16)));
+signed char Y[N] __attribute__ ((__aligned__(16)));
+
+/* char->short->short dot product.
+ The dot-product pattern should be detected.
+ Should be vectorized on vect_sdot_qi targets (targets that support
+ dot-product of signed char).
+ This test currently fails to vectorize on targets that support
+   dot-product of chars into an int accumulator.
+ Can also be vectorized as widening-mult + summation,
+ or with type-conversion support.
+ */
+short
+foo(int len) {
+ int i;
+ short result = 0;
+
+ for (i=0; i<len; i++) {
+ result += (X[i] * Y[i]);
+ }
+ return result;
+}
+
+int main (void)
+{
+ int i;
+ short dot;
+
+ check_vect ();
+
+ for (i=0; i<N; i++) {
+ X[i] = i;
+ Y[i] = 64-i;
+ }
+
+ dot = foo (N);
+ if (dot != DOT)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+
+/* When vectorizer is enhanced to vectorize accumulation into short for targets
+ that support accumulation into int (e.g. ia64) we'd have:
+dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } }
+*/
+/* In the meantime expect: */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_qi_to_hi } } } */
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 89bb9d1ddca..8e263a352d7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1491,18 +1491,19 @@ proc check_effective_target_vect_no_bitwise { } {
# Return 1 if the target plus current options supports a vector
# widening summation of *short* args into *int* result, 0 otherwise.
+# A target can also support this widening summation if it can support
+# promotion (unpacking) from shorts to ints.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_widen_sum_hi_to_si { } {
global et_vect_widen_sum_hi_to_si
-
+
if [info exists et_vect_widen_sum_hi_to_si_saved] {
verbose "check_effective_target_vect_widen_sum_hi_to_si: using cached result" 2
} else {
- set et_vect_widen_sum_hi_to_si_saved 0
- if { [istarget powerpc*-*-*]
- || [istarget ia64-*-*] } {
+ set et_vect_widen_sum_hi_to_si_saved [check_effective_target_vect_unpack]
+ if { [istarget powerpc*-*-*] } {
set et_vect_widen_sum_hi_to_si_saved 1
}
}
@@ -1512,19 +1513,21 @@ proc check_effective_target_vect_widen_sum_hi_to_si { } {
# Return 1 if the target plus current options supports a vector
# widening summation of *char* args into *short* result, 0 otherwise.
+# A target can also support this widening summation if it can support
+# promotion (unpacking) from chars to shorts.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_widen_sum_qi_to_hi { } {
global et_vect_widen_sum_qi_to_hi
-
+
if [info exists et_vect_widen_sum_qi_to_hi_saved] {
verbose "check_effective_target_vect_widen_sum_qi_to_hi: using cached result" 2
} else {
set et_vect_widen_sum_qi_to_hi_saved 0
- if { [istarget ia64-*-*] } {
+ if { [check_effective_target_vect_unpack] } {
set et_vect_widen_sum_qi_to_hi_saved 1
- }
+ }
}
verbose "check_effective_target_vect_widen_sum_qi_to_hi: returning $et_vect_widen_sum_qi_to_hi_saved" 2
return $et_vect_widen_sum_qi_to_hi_saved
@@ -1537,7 +1540,7 @@ proc check_effective_target_vect_widen_sum_qi_to_hi { } {
proc check_effective_target_vect_widen_sum_qi_to_si { } {
global et_vect_widen_sum_qi_to_si
-
+
if [info exists et_vect_widen_sum_qi_to_si_saved] {
verbose "check_effective_target_vect_widen_sum_qi_to_si: using cached result" 2
} else {
@@ -1551,24 +1554,61 @@ proc check_effective_target_vect_widen_sum_qi_to_si { } {
}
# Return 1 if the target plus current options supports a vector
-# widening summation, 0 otherwise.
+# widening multiplication of *char* args into *short* result, 0 otherwise.
+# A target can also support this widening multiplication if it can support
+# promotion (unpacking) from chars to shorts, and vect_short_mult (non-widening
+# multiplication of shorts).
#
# This won't change for different subtargets so cache the result.
-
-proc check_effective_target_vect_widen_sum { } {
- global et_vect_widen_sum
-
- if [info exists et_vect_widen_sum_saved] {
- verbose "check_effective_target_vect_widen_sum: using cached result" 2
+
+
+proc check_effective_target_vect_widen_mult_qi_to_hi { } {
+ global et_vect_widen_mult_qi_to_hi
+
+ if [info exists et_vect_widen_mult_qi_to_hi_saved] {
+ verbose "check_effective_target_vect_widen_mult_qi_to_hi: using cached result" 2
} else {
- set et_vect_widen_sum_saved 0
- if { [istarget powerpc*-*-*]
- || [istarget ia64-*-*] } {
- set et_vect_widen_sum_saved 1
+ if { [check_effective_target_vect_unpack]
+ && [check_effective_target_vect_short_mult] } {
+ set et_vect_widen_mult_qi_to_hi_saved 1
+ } else {
+ set et_vect_widen_mult_qi_to_hi_saved 0
+ }
+ if { [istarget powerpc*-*-*] } {
+ set et_vect_widen_mult_qi_to_hi_saved 1
}
}
- verbose "check_effective_target_vect_widen_sum: returning $et_vect_widen_sum_saved" 2
- return $et_vect_widen_sum_saved
+ verbose "check_effective_target_vect_widen_mult_qi_to_hi: returning $et_vect_widen_mult_qi_to_hi_saved" 2
+ return $et_vect_widen_mult_qi_to_hi_saved
+}
+
+# Return 1 if the target plus current options supports a vector
+# widening multiplication of *short* args into *int* result, 0 otherwise.
+# A target can also support this widening multiplication if it can support
+# promotion (unpacking) from shorts to ints, and vect_int_mult (non-widening
+# multiplication of ints).
+#
+# This won't change for different subtargets so cache the result.
+
+
+proc check_effective_target_vect_widen_mult_hi_to_si { } {
+ global et_vect_widen_mult_hi_to_si
+
+ if [info exists et_vect_widen_mult_hi_to_si_saved] {
+ verbose "check_effective_target_vect_widen_mult_hi_to_si: using cached result" 2
+ } else {
+ if { [check_effective_target_vect_unpack]
+ && [check_effective_target_vect_int_mult] } {
+ set et_vect_widen_mult_hi_to_si_saved 1
+ } else {
+ set et_vect_widen_mult_hi_to_si_saved 0
+ }
+ if { [istarget powerpc*-*-*] } {
+ set et_vect_widen_mult_hi_to_si_saved 1
+ }
+ }
+ verbose "check_effective_target_vect_widen_mult_hi_to_si: returning $et_vect_widen_mult_hi_to_si_saved" 2
+ return $et_vect_widen_mult_hi_to_si_saved
}
# Return 1 if the target plus current options supports a vector
@@ -1583,9 +1623,6 @@ proc check_effective_target_vect_sdot_qi { } {
verbose "check_effective_target_vect_sdot_qi: using cached result" 2
} else {
set et_vect_sdot_qi_saved 0
- if { [istarget ia64-*-*] } {
- set et_vect_sdot_qi_saved 1
- }
}
verbose "check_effective_target_vect_sdot_qi: returning $et_vect_sdot_qi_saved" 2
return $et_vect_sdot_qi_saved
@@ -1603,8 +1640,7 @@ proc check_effective_target_vect_udot_qi { } {
verbose "check_effective_target_vect_udot_qi: using cached result" 2
} else {
set et_vect_udot_qi_saved 0
- if { [istarget powerpc*-*-*]
- || [istarget ia64-*-*] } {
+ if { [istarget powerpc*-*-*] } {
set et_vect_udot_qi_saved 1
}
}
@@ -1626,8 +1662,7 @@ proc check_effective_target_vect_sdot_hi { } {
set et_vect_sdot_hi_saved 0
if { [istarget powerpc*-*-*]
|| [istarget i?86-*-*]
- || [istarget x86_64-*-*]
- || [istarget ia64-*-*] } {
+ || [istarget x86_64-*-*] } {
set et_vect_sdot_hi_saved 1
}
}
@@ -1656,6 +1691,51 @@ proc check_effective_target_vect_udot_hi { } {
}
+# Return 1 if the target plus current options supports a vector
+# demotion (packing) of shorts (to chars) and ints (to shorts)
+# using modulo arithmetic, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_pack_mod { } {
+ global et_vect_pack_mod
+
+ if [info exists et_vect_pack_mod_saved] {
+ verbose "check_effective_target_vect_pack_mod: using cached result" 2
+ } else {
+ set et_vect_pack_mod_saved 0
+ if { [istarget powerpc*-*-*]
+ || [istarget i?86-*-*]
+ || [istarget x86_64-*-*] } {
+ set et_vect_pack_mod_saved 1
+ }
+ }
+ verbose "check_effective_target_vect_pack_mod: returning $et_vect_pack_mod_saved" 2
+ return $et_vect_pack_mod_saved
+}
+
+# Return 1 if the target plus current options supports a vector
+# promotion (unpacking) of chars (to shorts) and shorts (to ints), 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_unpack { } {
+ global et_vect_unpack
+
+ if [info exists et_vect_unpack_saved] {
+ verbose "check_effective_target_vect_unpack: using cached result" 2
+ } else {
+ set et_vect_unpack_saved 0
+ if { [istarget powerpc*-*-*]
+ || [istarget i?86-*-*]
+ || [istarget x86_64-*-*] } {
+ set et_vect_unpack_saved 1
+ }
+ }
+ verbose "check_effective_target_vect_unpack: returning $et_vect_unpack_saved" 2
+ return $et_vect_unpack_saved
+}
+
# Return 1 if the target plus current options does not support a vector
# alignment mechanism, 0 otherwise.
#
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 61b1dab954d..07fae35c909 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -1765,6 +1765,12 @@ estimate_num_insns_1 (tree *tp, int *walk_subtrees, void *data)
case REDUC_PLUS_EXPR:
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
+ case VEC_WIDEN_MULT_HI_EXPR:
+ case VEC_WIDEN_MULT_LO_EXPR:
+ case VEC_UNPACK_HI_EXPR:
+ case VEC_UNPACK_LO_EXPR:
+ case VEC_PACK_MOD_EXPR:
+ case VEC_PACK_SAT_EXPR:
case WIDEN_MULT_EXPR:
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index dc846c842c1..ddc84f374de 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1702,9 +1702,9 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
case DOT_PROD_EXPR:
pp_string (buffer, " DOT_PROD_EXPR < ");
dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
- pp_string (buffer, " , ");
+ pp_string (buffer, ", ");
dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
- pp_string (buffer, " , ");
+ pp_string (buffer, ", ");
dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
pp_string (buffer, " > ");
break;
@@ -1863,6 +1863,50 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
pp_string (buffer, " > ");
break;
+ case VEC_WIDEN_MULT_HI_EXPR:
+ pp_string (buffer, " VEC_WIDEN_MULT_HI_EXPR < ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (buffer, ", ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+ pp_string (buffer, " > ");
+ break;
+
+ case VEC_WIDEN_MULT_LO_EXPR:
+ pp_string (buffer, " VEC_WIDEN_MULT_LO_EXPR < ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (buffer, ", ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+ pp_string (buffer, " > ");
+ break;
+
+ case VEC_UNPACK_HI_EXPR:
+ pp_string (buffer, " VEC_UNPACK_HI_EXPR < ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (buffer, " > ");
+ break;
+
+ case VEC_UNPACK_LO_EXPR:
+ pp_string (buffer, " VEC_UNPACK_LO_EXPR < ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (buffer, " > ");
+ break;
+
+ case VEC_PACK_MOD_EXPR:
+ pp_string (buffer, " VEC_PACK_MOD_EXPR < ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (buffer, ", ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+ pp_string (buffer, " > ");
+ break;
+
+ case VEC_PACK_SAT_EXPR:
+ pp_string (buffer, " VEC_PACK_SAT_EXPR < ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (buffer, ", ");
+ dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+ pp_string (buffer, " > ");
+ break;
+
case BLOCK:
{
tree t;
@@ -2165,6 +2209,8 @@ op_prio (tree op)
case MINUS_EXPR:
return 12;
+ case VEC_WIDEN_MULT_HI_EXPR:
+ case VEC_WIDEN_MULT_LO_EXPR:
case WIDEN_MULT_EXPR:
case DOT_PROD_EXPR:
case MULT_EXPR:
@@ -2218,6 +2264,10 @@ op_prio (tree op)
case REDUC_PLUS_EXPR:
case VEC_LSHIFT_EXPR:
case VEC_RSHIFT_EXPR:
+ case VEC_UNPACK_HI_EXPR:
+ case VEC_UNPACK_LO_EXPR:
+ case VEC_PACK_MOD_EXPR:
+ case VEC_PACK_SAT_EXPR:
return 16;
case SAVE_EXPR:
diff --git a/gcc/tree-vect-analyze.c b/gcc/tree-vect-analyze.c
index f247cd9ab31..8151c74d92d 100644
--- a/gcc/tree-vect-analyze.c
+++ b/gcc/tree-vect-analyze.c
@@ -54,8 +54,6 @@ static bool vect_determine_vectorization_factor (loop_vec_info);
/* Utility functions for the analyses. */
static bool exist_non_indexing_operands_for_use_p (tree, tree);
-static void vect_mark_relevant (VEC(tree,heap) **, tree, bool, bool);
-static bool vect_stmt_relevant_p (tree, loop_vec_info, bool *, bool *);
static tree vect_get_loop_niters (struct loop *, tree *);
static bool vect_analyze_data_ref_dependence
(struct data_dependence_relation *, loop_vec_info);
@@ -187,22 +185,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "nunits = %d", nunits);
- if (vectorization_factor)
- {
- /* FORNOW: don't allow mixed units.
- This restriction will be relaxed in the future. */
- if (nunits != vectorization_factor)
- {
- if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
- fprintf (vect_dump, "not vectorized: mixed data-types");
- return false;
- }
- }
- else
+ if (!vectorization_factor
+ || (nunits > vectorization_factor))
vectorization_factor = nunits;
-
- gcc_assert (GET_MODE_SIZE (TYPE_MODE (scalar_type))
- * vectorization_factor == UNITS_PER_SIMD_WORD);
}
}
@@ -310,7 +295,9 @@ vect_analyze_operations (loop_vec_info loop_vinfo)
gcc_assert (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (stmt))));
gcc_assert (STMT_VINFO_VECTYPE (stmt_info));
- ok = (vectorizable_operation (stmt, NULL, NULL)
+ ok = (vectorizable_type_promotion (stmt, NULL, NULL)
+ || vectorizable_type_demotion (stmt, NULL, NULL)
+ || vectorizable_operation (stmt, NULL, NULL)
|| vectorizable_assignment (stmt, NULL, NULL)
|| vectorizable_load (stmt, NULL, NULL)
|| vectorizable_store (stmt, NULL, NULL)
@@ -588,6 +575,8 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
struct data_reference *drb = DDR_B (ddr);
stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra));
stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb));
+ int dra_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dra))));
+ int drb_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (drb))));
lambda_vector dist_v;
unsigned int loop_depth;
@@ -628,7 +617,7 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
fprintf (vect_dump, "dependence distance = %d.", dist);
/* Same loop iteration. */
- if (dist % vectorization_factor == 0)
+ if (dist % vectorization_factor == 0 && dra_size == drb_size)
{
/* Two references with distance zero have the same alignment. */
VEC_safe_push (dr_p, heap, STMT_VINFO_SAME_ALIGN_REFS (stmtinfo_a), drb);
@@ -837,12 +826,15 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
struct data_reference *dr_peel, int npeel)
{
unsigned int i;
- int drsize;
VEC(dr_p,heap) *same_align_drs;
struct data_reference *current_dr;
+ int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
+ int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
if (known_alignment_for_access_p (dr)
- && DR_MISALIGNMENT (dr) == DR_MISALIGNMENT (dr_peel))
+ && known_alignment_for_access_p (dr_peel)
+ && (DR_MISALIGNMENT (dr)/dr_size ==
+ DR_MISALIGNMENT (dr_peel)/dr_peel_size))
{
DR_MISALIGNMENT (dr) = 0;
return;
@@ -856,7 +848,8 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
{
if (current_dr != dr)
continue;
- gcc_assert (DR_MISALIGNMENT (dr) == DR_MISALIGNMENT (dr_peel));
+ gcc_assert (DR_MISALIGNMENT (dr)/dr_size ==
+ DR_MISALIGNMENT (dr_peel)/dr_peel_size);
DR_MISALIGNMENT (dr) = 0;
return;
}
@@ -864,12 +857,13 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
if (known_alignment_for_access_p (dr)
&& known_alignment_for_access_p (dr_peel))
{
- drsize = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
- DR_MISALIGNMENT (dr) += npeel * drsize;
+ DR_MISALIGNMENT (dr) += npeel * dr_size;
DR_MISALIGNMENT (dr) %= UNITS_PER_SIMD_WORD;
return;
}
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "Setting misalignment to -1.");
DR_MISALIGNMENT (dr) = -1;
}
@@ -1014,6 +1008,9 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
bool do_versioning = false;
bool stat;
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
+
/* While cost model enhancements are expected in the future, the high level
view of the code at this time is as follows:
@@ -1080,6 +1077,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
mis = DR_MISALIGNMENT (dr0);
mis /= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr0))));
npeel = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - mis;
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "Try peeling by %d", npeel);
}
/* Ensure that all data refs can be vectorized after the peel. */
@@ -1423,14 +1422,14 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
static void
vect_mark_relevant (VEC(tree,heap) **worklist, tree stmt,
- bool relevant_p, bool live_p)
+ enum vect_relevant relevant, bool live_p)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
- bool save_relevant_p = STMT_VINFO_RELEVANT_P (stmt_info);
+ enum vect_relevant save_relevant = STMT_VINFO_RELEVANT (stmt_info);
bool save_live_p = STMT_VINFO_LIVE_P (stmt_info);
if (vect_print_dump_info (REPORT_DETAILS))
- fprintf (vect_dump, "mark relevant %d, live %d.",relevant_p, live_p);
+ fprintf (vect_dump, "mark relevant %d, live %d.", relevant, live_p);
if (STMT_VINFO_IN_PATTERN_P (stmt_info))
{
@@ -1445,20 +1444,21 @@ vect_mark_relevant (VEC(tree,heap) **worklist, tree stmt,
pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
- save_relevant_p = STMT_VINFO_RELEVANT_P (stmt_info);
+ save_relevant = STMT_VINFO_RELEVANT (stmt_info);
save_live_p = STMT_VINFO_LIVE_P (stmt_info);
stmt = pattern_stmt;
}
STMT_VINFO_LIVE_P (stmt_info) |= live_p;
- STMT_VINFO_RELEVANT_P (stmt_info) |= relevant_p;
+ if (relevant > STMT_VINFO_RELEVANT (stmt_info))
+ STMT_VINFO_RELEVANT (stmt_info) = relevant;
if (TREE_CODE (stmt) == PHI_NODE)
/* Don't put phi-nodes in the worklist. Phis that are marked relevant
or live will fail vectorization later on. */
return;
- if (STMT_VINFO_RELEVANT_P (stmt_info) == save_relevant_p
+ if (STMT_VINFO_RELEVANT (stmt_info) == save_relevant
&& STMT_VINFO_LIVE_P (stmt_info) == save_live_p)
{
if (vect_print_dump_info (REPORT_DETAILS))
@@ -1484,7 +1484,7 @@ vect_mark_relevant (VEC(tree,heap) **worklist, tree stmt,
static bool
vect_stmt_relevant_p (tree stmt, loop_vec_info loop_vinfo,
- bool *relevant_p, bool *live_p)
+ enum vect_relevant *relevant, bool *live_p)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
ssa_op_iter op_iter;
@@ -1492,12 +1492,12 @@ vect_stmt_relevant_p (tree stmt, loop_vec_info loop_vinfo,
use_operand_p use_p;
def_operand_p def_p;
- *relevant_p = false;
+ *relevant = vect_unused_in_loop;
*live_p = false;
/* cond stmt other than loop exit cond. */
if (is_ctrl_stmt (stmt) && (stmt != LOOP_VINFO_EXIT_COND (loop_vinfo)))
- *relevant_p = true;
+ *relevant = vect_used_in_loop;
/* changing memory. */
if (TREE_CODE (stmt) != PHI_NODE)
@@ -1505,7 +1505,7 @@ vect_stmt_relevant_p (tree stmt, loop_vec_info loop_vinfo,
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "vec_stmt_relevant_p: stmt has vdefs.");
- *relevant_p = true;
+ *relevant = vect_used_in_loop;
}
/* uses outside the loop. */
@@ -1529,7 +1529,7 @@ vect_stmt_relevant_p (tree stmt, loop_vec_info loop_vinfo,
}
}
- return (*live_p || *relevant_p);
+ return (*live_p || *relevant);
}
@@ -1564,7 +1564,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
stmt_vec_info stmt_vinfo;
basic_block bb;
tree phi;
- bool relevant_p, live_p;
+ bool live_p;
+ enum vect_relevant relevant;
tree def, def_stmt;
enum vect_def_type dt;
@@ -1584,8 +1585,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
print_generic_expr (vect_dump, phi, TDF_SLIM);
}
- if (vect_stmt_relevant_p (phi, loop_vinfo, &relevant_p, &live_p))
- vect_mark_relevant (&worklist, phi, relevant_p, live_p);
+ if (vect_stmt_relevant_p (phi, loop_vinfo, &relevant, &live_p))
+ vect_mark_relevant (&worklist, phi, relevant, live_p);
}
for (i = 0; i < nbbs; i++)
@@ -1601,8 +1602,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
- if (vect_stmt_relevant_p (stmt, loop_vinfo, &relevant_p, &live_p))
- vect_mark_relevant (&worklist, stmt, relevant_p, live_p);
+ if (vect_stmt_relevant_p (stmt, loop_vinfo, &relevant, &live_p))
+ vect_mark_relevant (&worklist, stmt, relevant, live_p);
}
}
@@ -1619,7 +1620,7 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
- /* Examine the USEs of STMT. For each ssa-name USE thta is defined
+ /* Examine the USEs of STMT. For each ssa-name USE that is defined
in the loop, mark the stmt that defines it (DEF_STMT) as
relevant/irrelevant and live/dead according to the liveness and
relevance properties of STMT.
@@ -1630,13 +1631,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
ann = stmt_ann (stmt);
stmt_vinfo = vinfo_for_stmt (stmt);
- relevant_p = STMT_VINFO_RELEVANT_P (stmt_vinfo);
+ relevant = STMT_VINFO_RELEVANT (stmt_vinfo);
live_p = STMT_VINFO_LIVE_P (stmt_vinfo);
/* Generally, the liveness and relevance properties of STMT are
propagated to the DEF_STMTs of its USEs:
STMT_VINFO_LIVE_P (DEF_STMT_info) <-- live_p
- STMT_VINFO_RELEVANT_P (DEF_STMT_info) <-- relevant_p
+ STMT_VINFO_RELEVANT (DEF_STMT_info) <-- relevant
Exceptions:
@@ -1659,18 +1660,22 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
the def_stmt of these uses we want to set liveness/relevance
as follows:
STMT_VINFO_LIVE_P (DEF_STMT_info) <-- false
- STMT_VINFO_RELEVANT_P (DEF_STMT_info) <-- true
+ STMT_VINFO_RELEVANT (DEF_STMT_info) <-- vect_used_by_reduction
because even though STMT is classified as live (since it defines a
value that is used across loop iterations) and irrelevant (since it
is not used inside the loop), it will be vectorized, and therefore
the corresponding DEF_STMTs need to marked as relevant.
+ We distinguish between two kinds of relevant stmts - those that are
+ used by a reduction computation, and those that are (also) used by a regular computation. This allows us later on to identify stmts
+ that are used solely by a reduction, and therefore the order of
+ the results that they produce does not have to be kept.
*/
/* case 2.2: */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
- gcc_assert (!relevant_p && live_p);
- relevant_p = true;
+ gcc_assert (relevant == vect_unused_in_loop && live_p);
+ relevant = vect_used_by_reduction;
live_p = false;
}
@@ -1710,7 +1715,7 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
&& TREE_CODE (def_stmt) == PHI_NODE)
continue;
- vect_mark_relevant (&worklist, def_stmt, relevant_p, live_p);
+ vect_mark_relevant (&worklist, def_stmt, relevant, live_p);
}
} /* while worklist */
@@ -1738,7 +1743,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
/* Analyze phi functions of the loop header. */
if (vect_print_dump_info (REPORT_DETAILS))
- fprintf (vect_dump, "=== vect_can_advance_ivs_p ===");
+ fprintf (vect_dump, "vect_can_advance_ivs_p:");
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index b1e567aa870..d1ac94f898f 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -411,9 +411,15 @@ expand_vector_operations_1 (block_stmt_iterator *bsi)
gcc_assert (code != CONVERT_EXPR);
op = optab_for_tree_code (code, type);
- /* For widening vector operations, the relevant type is of the arguments,
- not the widened result. */
- if (code == WIDEN_SUM_EXPR)
+ /* For widening/narrowing vector operations, the relevant type is of the
+ arguments, not the widened result. */
+ if (code == WIDEN_SUM_EXPR
+ || code == VEC_WIDEN_MULT_HI_EXPR
+ || code == VEC_WIDEN_MULT_LO_EXPR
+ || code == VEC_UNPACK_HI_EXPR
+ || code == VEC_UNPACK_LO_EXPR
+ || code == VEC_PACK_MOD_EXPR
+ || code == VEC_PACK_SAT_EXPR)
type = TREE_TYPE (TREE_OPERAND (rhs, 0));
/* Optabs will try converting a negation into a subtraction, so
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 9ac11b07bd3..67ab90f1eec 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -334,12 +334,69 @@ vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
*/
static tree
-vect_recog_widen_mult_pattern (tree last_stmt ATTRIBUTE_UNUSED,
- tree *type_in ATTRIBUTE_UNUSED,
- tree *type_out ATTRIBUTE_UNUSED)
+vect_recog_widen_mult_pattern (tree last_stmt,
+ tree *type_in,
+ tree *type_out)
{
- /* Yet to be implemented. */
- return NULL;
+ tree expr;
+ tree def_stmt0, def_stmt1;
+ tree oprnd0, oprnd1;
+ tree type, half_type0, half_type1;
+ tree pattern_expr;
+ tree vectype;
+ tree dummy;
+ enum tree_code dummy_code;
+
+ if (TREE_CODE (last_stmt) != MODIFY_EXPR)
+ return NULL;
+
+ expr = TREE_OPERAND (last_stmt, 1);
+ type = TREE_TYPE (expr);
+
+ /* Starting from LAST_STMT, follow the defs of its uses in search
+ of the above pattern. */
+
+ if (TREE_CODE (expr) != MULT_EXPR)
+ return NULL;
+
+ oprnd0 = TREE_OPERAND (expr, 0);
+ oprnd1 = TREE_OPERAND (expr, 1);
+ if (TYPE_MAIN_VARIANT (TREE_TYPE (oprnd0)) != TYPE_MAIN_VARIANT (type)
+ || TYPE_MAIN_VARIANT (TREE_TYPE (oprnd1)) != TYPE_MAIN_VARIANT (type))
+ return NULL;
+
+ /* Check argument 0 */
+ if (!widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0))
+ return NULL;
+ oprnd0 = TREE_OPERAND (TREE_OPERAND (def_stmt0, 1), 0);
+
+ /* Check argument 1 */
+ if (!widened_name_p (oprnd1, last_stmt, &half_type1, &def_stmt1))
+ return NULL;
+ oprnd1 = TREE_OPERAND (TREE_OPERAND (def_stmt1, 1), 0);
+
+ if (TYPE_MAIN_VARIANT (half_type0) != TYPE_MAIN_VARIANT (half_type1))
+ return NULL;
+
+ /* Pattern detected. */
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_widen_mult_pattern: detected: ");
+
+ /* Check target support */
+ vectype = get_vectype_for_scalar_type (half_type0);
+ if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt, vectype,
+ &dummy, &dummy, &dummy_code,
+ &dummy_code))
+ return NULL;
+
+ *type_in = vectype;
+ *type_out = NULL_TREE;
+
+ /* Pattern supported. Create a stmt to be used to replace the pattern: */
+ pattern_expr = build2 (WIDEN_MULT_EXPR, type, oprnd0, oprnd1);
+ if (vect_print_dump_info (REPORT_DETAILS))
+ print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
+ return pattern_expr;
}
diff --git a/gcc/tree-vect-transform.c b/gcc/tree-vect-transform.c
index ee17fa443bf..19097fd8c3e 100644
--- a/gcc/tree-vect-transform.c
+++ b/gcc/tree-vect-transform.c
@@ -47,11 +47,11 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
/* Utility functions for the code transformation. */
static bool vect_transform_stmt (tree, block_stmt_iterator *);
-static void vect_align_data_ref (tree);
static tree vect_create_destination_var (tree, tree);
static tree vect_create_data_ref_ptr
- (tree, block_stmt_iterator *, tree, tree *, bool);
+ (tree, block_stmt_iterator *, tree, tree *, tree *, bool);
static tree vect_create_addr_base_for_vector_ref (tree, tree *, tree);
+static tree vect_setup_realignment (tree, block_stmt_iterator *, tree *);
static tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
static tree vect_get_vec_def_for_operand (tree, tree, tree *);
static tree vect_init_vector (tree, tree);
@@ -191,30 +191,14 @@ vect_create_addr_base_for_vector_ref (tree stmt,
}
-/* Function vect_align_data_ref.
-
- Handle misalignment of a memory accesses.
-
- FORNOW: Can't handle misaligned accesses.
- Make sure that the dataref is aligned. */
-
-static void
-vect_align_data_ref (tree stmt)
-{
- stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
- struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
-
- /* FORNOW: can't handle misaligned accesses;
- all accesses expected to be aligned. */
- gcc_assert (aligned_access_p (dr));
-}
-
-
/* Function vect_create_data_ref_ptr.
- Create a memory reference expression for vector access, to be used in a
- vector load/store stmt. The reference is based on a new pointer to vector
- type (vp).
+ Create a new pointer to vector type (vp), that points to the first location
+ accessed in the loop by STMT, along with the def-use update chain to
+ appropriately advance the pointer through the loop iterations. Also set
+ aliasing information for the pointer. This vector pointer is used by the
+ callers to this function to create a memory reference expression for vector
+ load/store access.
Input:
1. STMT: a stmt that references memory. Expected to be of the form
@@ -240,17 +224,18 @@ vect_align_data_ref (tree stmt)
Return the initial_address in INITIAL_ADDRESS.
- 2. If ONLY_INIT is true, return the initial pointer. Otherwise, create
- a data-reference in the loop based on the new vector pointer vp. This
- new data reference will by some means be updated each iteration of
- the loop. Return the pointer vp'.
+ 2. If ONLY_INIT is true, just return the initial pointer. Otherwise, also
+ update the pointer in each iteration of the loop.
- FORNOW: handle only aligned and consecutive accesses. */
+ Return the increment stmt that updates the pointer in PTR_INCR.
+
+ 3. Return the pointer. */
static tree
vect_create_data_ref_ptr (tree stmt,
block_stmt_iterator *bsi ATTRIBUTE_UNUSED,
- tree offset, tree *initial_address, bool only_init)
+ tree offset, tree *initial_address, tree *ptr_incr,
+ bool only_init)
{
tree base_name;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
@@ -362,12 +347,85 @@ vect_create_data_ref_ptr (tree stmt,
}
merge_alias_info (vect_ptr_init, indx_before_incr);
merge_alias_info (vect_ptr_init, indx_after_incr);
+ if (ptr_incr)
+ *ptr_incr = incr;
return indx_before_incr;
}
}
+/* Function bump_vector_ptr
+
+ Increment a pointer (to a vector type) by vector-size. Connect the new
+ increment stmt to the existing def-use update-chain of the pointer.
+
+ The pointer def-use update-chain before this function:
+ DATAREF_PTR = phi (p_0, p_2)
+ ....
+ PTR_INCR: p_2 = DATAREF_PTR + step
+
+ The pointer def-use update-chain after this function:
+ DATAREF_PTR = phi (p_0, p_2)
+ ....
+ NEW_DATAREF_PTR = DATAREF_PTR + vector_size
+ ....
+ PTR_INCR: p_2 = NEW_DATAREF_PTR + step
+
+ Input:
+ DATAREF_PTR - ssa_name of a pointer (to vector type) that is being updated
+ in the loop.
+ PTR_INCR - the stmt that updates the pointer in each iteration of the loop.
+ The increment amount across iterations is also expected to be
+ vector_size.
+ BSI - location where the new update stmt is to be placed.
+ STMT - the original scalar memory-access stmt that is being vectorized.
+
+ Output: Return NEW_DATAREF_PTR as illustrated above.
+
+*/
+
+static tree
+bump_vector_ptr (tree dataref_ptr, tree ptr_incr, block_stmt_iterator *bsi,
+ tree stmt)
+{
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+ tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ tree vptr_type = TREE_TYPE (dataref_ptr);
+ tree ptr_var = SSA_NAME_VAR (dataref_ptr);
+ tree update = fold_convert (vptr_type, TYPE_SIZE_UNIT (vectype));
+ tree incr_stmt;
+ ssa_op_iter iter;
+ use_operand_p use_p;
+ tree new_dataref_ptr;
+
+ incr_stmt = build2 (MODIFY_EXPR, void_type_node, ptr_var,
+ build2 (PLUS_EXPR, vptr_type, dataref_ptr, update));
+ new_dataref_ptr = make_ssa_name (ptr_var, incr_stmt);
+ TREE_OPERAND (incr_stmt, 0) = new_dataref_ptr;
+ vect_finish_stmt_generation (stmt, incr_stmt, bsi);
+
+ /* Update the vector-pointer's cross-iteration increment. */
+ FOR_EACH_SSA_USE_OPERAND (use_p, ptr_incr, iter, SSA_OP_USE)
+ {
+ tree use = USE_FROM_PTR (use_p);
+
+ if (use == dataref_ptr)
+ SET_USE (use_p, new_dataref_ptr);
+ else
+ gcc_assert (tree_int_cst_compare (use, update) == 0);
+ }
+
+ /* Copy the points-to information if it exists. */
+ if (DR_PTR_INFO (dr))
+ duplicate_ssa_name_ptr_info (new_dataref_ptr, DR_PTR_INFO (dr));
+ merge_alias_info (new_dataref_ptr, dataref_ptr);
+
+ return new_dataref_ptr;
+}
+
+
/* Function vect_create_destination_var.
Create a new temporary of type VECTYPE. */
@@ -568,14 +626,99 @@ vect_get_vec_def_for_operand (tree op, tree stmt, tree *scalar_def)
}
+/* Function vect_get_vec_def_for_stmt_copy
+
+ Return a vector-def for an operand. This function is used when the
+ vectorized stmt to be created (by the caller to this function) is a "copy"
+ created in case the vectorized result cannot fit in one vector, and several
+ copies of the vector-stmt are required. In this case the vector-def is
+ retrieved from the vector stmt recorded in the STMT_VINFO_RELATED_STMT field
+ of the stmt that defines VEC_OPRND.
+ DT is the type of the vector def VEC_OPRND.
+
+ Context:
+ In case the vectorization factor (VF) is bigger than the number
+ of elements that can fit in a vectype (nunits), we have to generate
+ more than one vector stmt to vectorize the scalar stmt. This situation
+ arises when there are multiple data-types operated upon in the loop; the
+ smallest data-type determines the VF, and as a result, when vectorizing
+ stmts operating on wider types we need to create 'VF/nunits' "copies" of the
+ vector stmt (each computing a vector of 'nunits' results, and together
+ computing 'VF' results in each iteration). This function is called when
+ vectorizing such a stmt (e.g. vectorizing S2 in the illustration below, in
+ which VF=16 and nunits=4, so the number of copies required is 4):
+
+ scalar stmt: vectorized into: STMT_VINFO_RELATED_STMT
+
+ S1: x = load VS1.0: vx.0 = memref0 VS1.1
+ VS1.1: vx.1 = memref1 VS1.2
+ VS1.2: vx.2 = memref2 VS1.3
+ VS1.3: vx.3 = memref3
+
+ S2: z = x + ... VSnew.0: vz0 = vx.0 + ... VSnew.1
+ VSnew.1: vz1 = vx.1 + ... VSnew.2
+ VSnew.2: vz2 = vx.2 + ... VSnew.3
+ VSnew.3: vz3 = vx.3 + ...
+
+ The vectorization of S1 is explained in vectorizable_load.
+ The vectorization of S2:
+ To create the first vector-stmt out of the 4 copies - VSnew.0 -
+ the function 'vect_get_vec_def_for_operand' is called to
+ get the relevant vector-def for each operand of S2. For operand x it
+ returns the vector-def 'vx.0'.
+
+ To create the remaining copies of the vector-stmt (VSnew.j), this
+ function is called to get the relevant vector-def for each operand. It is
+ obtained from the respective VS1.j stmt, which is recorded in the
+ STMT_VINFO_RELATED_STMT field of the stmt that defines VEC_OPRND.
+
+ For example, to obtain the vector-def 'vx.1' in order to create the
+ vector stmt 'VSnew.1', this function is called with VEC_OPRND='vx.0'.
+ Given 'vx0' we obtain the stmt that defines it ('VS1.0'); from the
+ STMT_VINFO_RELATED_STMT field of 'VS1.0' we obtain the next copy - 'VS1.1',
+ and return its def ('vx.1').
+ Overall, to create the above sequence this function will be called 3 times:
+ vx.1 = vect_get_vec_def_for_stmt_copy (dt, vx.0);
+ vx.2 = vect_get_vec_def_for_stmt_copy (dt, vx.1);
+ vx.3 = vect_get_vec_def_for_stmt_copy (dt, vx.2); */
+
+static tree
+vect_get_vec_def_for_stmt_copy (enum vect_def_type dt, tree vec_oprnd)
+{
+ tree vec_stmt_for_operand;
+ stmt_vec_info def_stmt_info;
+
+ if (dt == vect_invariant_def || dt == vect_constant_def)
+ {
+ /* Do nothing; can reuse the same def. */
+ return vec_oprnd;
+ }
+
+ vec_stmt_for_operand = SSA_NAME_DEF_STMT (vec_oprnd);
+ def_stmt_info = vinfo_for_stmt (vec_stmt_for_operand);
+ gcc_assert (def_stmt_info);
+ vec_stmt_for_operand = STMT_VINFO_RELATED_STMT (def_stmt_info);
+ gcc_assert (vec_stmt_for_operand);
+ vec_oprnd = TREE_OPERAND (vec_stmt_for_operand, 0);
+
+ return vec_oprnd;
+}
+
+
/* Function vect_finish_stmt_generation.
Insert a new stmt. */
static void
-vect_finish_stmt_generation (tree stmt, tree vec_stmt, block_stmt_iterator *bsi)
+vect_finish_stmt_generation (tree stmt, tree vec_stmt,
+ block_stmt_iterator *bsi)
{
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+
bsi_insert_before (bsi, vec_stmt, BSI_SAME_STMT);
+ set_stmt_info (get_stmt_ann (vec_stmt),
+ new_stmt_vec_info (vec_stmt, loop_vinfo));
if (vect_print_dump_info (REPORT_DETAILS))
{
@@ -1135,7 +1278,7 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
tree vec_dest;
tree scalar_dest;
tree op;
- tree loop_vec_def0, loop_vec_def1;
+ tree loop_vec_def0 = NULL_TREE, loop_vec_def1 = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
@@ -1145,7 +1288,7 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
enum machine_mode vec_mode;
int op_type;
optab optab, reduc_optab;
- tree new_temp;
+ tree new_temp = NULL_TREE;
tree def, def_stmt;
enum vect_def_type dt;
tree new_phi;
@@ -1155,6 +1298,14 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
stmt_vec_info orig_stmt_info;
tree expr = NULL_TREE;
int i;
+ int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+ stmt_vec_info prev_stmt_info;
+ tree reduc_def;
+ tree new_stmt = NULL_TREE;
+ int j;
+
+ gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
@@ -1194,7 +1345,6 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
operation = TREE_OPERAND (stmt, 1);
code = TREE_CODE (operation);
op_type = TREE_CODE_LENGTH (code);
-
if (op_type != binary_op && op_type != ternary_op)
return false;
scalar_dest = TREE_OPERAND (stmt, 0);
@@ -1339,28 +1489,63 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
/* Create the reduction-phi that defines the reduction-operand. */
new_phi = create_phi_node (vec_dest, loop->header);
- /* Prepare the operand that is defined inside the loop body */
- op = TREE_OPERAND (operation, 0);
- loop_vec_def0 = vect_get_vec_def_for_operand (op, stmt, NULL);
- if (op_type == binary_op)
- expr = build2 (code, vectype, loop_vec_def0, PHI_RESULT (new_phi));
- else if (op_type == ternary_op)
+ /* In case the vectorization factor (VF) is bigger than the number
+ of elements that we can fit in a vectype (nunits), we have to generate
+ more than one vector stmt - i.e - we need to "unroll" the
+ vector stmt by a factor VF/nunits. For more details see documentation
+ in vectorizable_operation. */
+
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; j++)
{
- op = TREE_OPERAND (operation, 1);
- loop_vec_def1 = vect_get_vec_def_for_operand (op, stmt, NULL);
- expr = build3 (code, vectype, loop_vec_def0, loop_vec_def1,
- PHI_RESULT (new_phi));
+ /* Handle uses. */
+ if (j == 0)
+ {
+ op = TREE_OPERAND (operation, 0);
+ loop_vec_def0 = vect_get_vec_def_for_operand (op, stmt, NULL);
+ if (op_type == ternary_op)
+ {
+ op = TREE_OPERAND (operation, 1);
+ loop_vec_def1 = vect_get_vec_def_for_operand (op, stmt, NULL);
+ }
+
+ /* Get the vector def for the reduction variable from the phi node */
+ reduc_def = PHI_RESULT (new_phi);
+ }
+ else
+ {
+ enum vect_def_type dt = vect_unknown_def_type; /* Dummy */
+ loop_vec_def0 = vect_get_vec_def_for_stmt_copy (dt, loop_vec_def0);
+ if (op_type == ternary_op)
+ loop_vec_def1 = vect_get_vec_def_for_stmt_copy (dt, loop_vec_def1);
+
+ /* Get the vector def for the reduction variable from the vectorized
+ reduction operation generated in the previous iteration (j-1) */
+ reduc_def = TREE_OPERAND (new_stmt, 0);
+ }
+
+ /* Arguments are ready. create the new vector stmt. */
+
+ if (op_type == binary_op)
+ expr = build2 (code, vectype, loop_vec_def0, reduc_def);
+ else
+ expr = build3 (code, vectype, loop_vec_def0, loop_vec_def1,
+ reduc_def);
+ new_stmt = build2 (MODIFY_EXPR, void_type_node, vec_dest, expr);
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
+
+ if (j == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
}
-
- /* Create the vectorized operation that computes the partial results */
- *vec_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, expr);
- new_temp = make_ssa_name (vec_dest, *vec_stmt);
- TREE_OPERAND (*vec_stmt, 0) = new_temp;
- vect_finish_stmt_generation (stmt, *vec_stmt, bsi);
-
+
/* Finalize the reduction-phi (set it's arguments) and create the
epilog reduction code. */
- vect_create_epilog_for_reduction (new_temp, stmt, epilog_reduc_code, new_phi);
+ vect_create_epilog_for_reduction (new_temp, stmt, epilog_reduc_code, new_phi);
return true;
}
@@ -1385,6 +1570,12 @@ vectorizable_assignment (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
tree new_temp;
tree def, def_stmt;
enum vect_def_type dt;
+ int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+ gcc_assert (ncopies >= 1);
+ if (ncopies > 1)
+ return false; /* FORNOW */
/* Is vectorizable assignment? */
if (!STMT_VINFO_RELEVANT_P (stmt_info))
@@ -1475,21 +1666,28 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
tree scalar_dest;
tree operation;
tree op0, op1 = NULL;
- tree vec_oprnd0, vec_oprnd1=NULL;
+ tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
- int i;
enum tree_code code;
enum machine_mode vec_mode;
tree new_temp;
int op_type;
- tree op;
optab optab;
int icode;
enum machine_mode optab_op2_mode;
tree def, def_stmt;
- enum vect_def_type dt;
+ enum vect_def_type dt0, dt1;
+ tree new_stmt;
+ stmt_vec_info prev_stmt_info;
+ int nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
+ int nunits_out;
+ tree vectype_out;
+ int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
+ int j;
+
+ gcc_assert (ncopies >= 1);
/* Is STMT a vectorizable binary/unary operation? */
if (!STMT_VINFO_RELEVANT_P (stmt_info))
@@ -1511,6 +1709,12 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
if (TREE_CODE (TREE_OPERAND (stmt, 0)) != SSA_NAME)
return false;
+ scalar_dest = TREE_OPERAND (stmt, 0);
+ vectype_out = get_vectype_for_scalar_type (TREE_TYPE (scalar_dest));
+ nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+ if (nunits_out != nunits_in)
+ return false;
+
operation = TREE_OPERAND (stmt, 1);
code = TREE_CODE (operation);
optab = optab_for_tree_code (code, vectype);
@@ -1524,16 +1728,24 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
return false;
}
- for (i = 0; i < op_type; i++)
+ op0 = TREE_OPERAND (operation, 0);
+ if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
{
- op = TREE_OPERAND (operation, i);
- if (!vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt))
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "use not simple.");
+ return false;
+ }
+
+ if (op_type == binary_op)
+ {
+ op1 = TREE_OPERAND (operation, 1);
+ if (!vect_is_simple_use (op1, loop_vinfo, &def_stmt, &def, &dt1))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "use not simple.");
return false;
- }
- }
+ }
+ }
/* Supportable by target? */
if (!optab)
@@ -1576,8 +1788,8 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
by a scalar shift operand. */
optab_op2_mode = insn_data[icode].operand[2].mode;
if (! (VECTOR_MODE_P (optab_op2_mode)
- || dt == vect_constant_def
- || dt == vect_invariant_def))
+ || dt1 == vect_constant_def
+ || dt1 == vect_invariant_def))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "operand mode requires invariant argument.");
@@ -1597,49 +1809,485 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
fprintf (vect_dump, "transform binary/unary operation.");
/* Handle def. */
- scalar_dest = TREE_OPERAND (stmt, 0);
vec_dest = vect_create_destination_var (scalar_dest, vectype);
- /* Handle uses. */
+ /* In case the vectorization factor (VF) is bigger than the number
+ of elements that we can fit in a vectype (nunits), we have to generate
+ more than one vector stmt - i.e - we need to "unroll" the
+ vector stmt by a factor VF/nunits. In doing so, we record a pointer
+ from one copy of the vector stmt to the next, in the field
+ STMT_VINFO_RELATED_STMT. This is necessary in order to allow following
+ stages to find the correct vector defs to be used when vectorizing
+ stmts that use the defs of the current stmt. The example below illustrates
+ the vectorization process when VF=16 and nunits=4 (i.e - we need to create
+ 4 vectorized stmts):
+
+ before vectorization:
+ RELATED_STMT VEC_STMT
+ S1: x = memref - -
+ S2: z = x + 1 - -
+
+ step 1: vectorize stmt S1 (done in vectorizable_load. See more details
+ there):
+ RELATED_STMT VEC_STMT
+ VS1_0: vx0 = memref0 VS1_1 -
+ VS1_1: vx1 = memref1 VS1_2 -
+ VS1_2: vx2 = memref2 VS1_3 -
+ VS1_3: vx3 = memref3 - -
+ S1: x = load - VS1_0
+ S2: z = x + 1 - -
+
+ step2: vectorize stmt S2 (done here):
+ To vectorize stmt S2 we first need to find the relevant vector
+ def for the first operand 'x'. This is, as usual, obtained from
+ the vector stmt recorded in the STMT_VINFO_VEC_STMT of the stmt
+ that defines 'x' (S1). This way we find the stmt VS1_0, and the
+ relevant vector def 'vx0'. Having found 'vx0' we can generate
+ the vector stmt VS2_0, and as usual, record it in the
+ STMT_VINFO_VEC_STMT of stmt S2.
+ When creating the second copy (VS2_1), we obtain the relevant vector
+ def from the vector stmt recorded in the STMT_VINFO_RELATED_STMT of
+ stmt VS1_0. This way we find the stmt VS1_1 and the relevant
+ vector def 'vx1'. Using 'vx1' we create stmt VS2_1 and record a
+ pointer to it in the STMT_VINFO_RELATED_STMT of the vector stmt VS2_0.
+ Similarly when creating stmts VS2_2 and VS2_3. This is the resulting
+ chain of stmts and pointers:
+ RELATED_STMT VEC_STMT
+ VS1_0: vx0 = memref0 VS1_1 -
+ VS1_1: vx1 = memref1 VS1_2 -
+ VS1_2: vx2 = memref2 VS1_3 -
+ VS1_3: vx3 = memref3 - -
+ S1: x = load - VS1_0
+ VS2_0: vz0 = vx0 + v1 VS2_1 -
+ VS2_1: vz1 = vx1 + v1 VS2_2 -
+ VS2_2: vz2 = vx2 + v1 VS2_3 -
+ VS2_3: vz3 = vx3 + v1 - -
+ S2: z = x + 1 - VS2_0 */
+
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; j++)
+ {
+ /* Handle uses. */
+ if (j == 0)
+ {
+ vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+ if (op_type == binary_op)
+ {
+ if (code == LSHIFT_EXPR || code == RSHIFT_EXPR)
+ {
+ /* Vector shl and shr insn patterns can be defined with
+ scalar operand 2 (shift operand). In this case, use
+ constant or loop invariant op1 directly, without
+ extending it to vector mode first. */
+ optab_op2_mode = insn_data[icode].operand[2].mode;
+ if (!VECTOR_MODE_P (optab_op2_mode))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "operand 1 using scalar mode.");
+ vec_oprnd1 = op1;
+ }
+ }
+ if (!vec_oprnd1)
+ vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+ }
+ }
+ else
+ {
+ vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+ if (op_type == binary_op)
+ vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt1, vec_oprnd1);
+ }
+
+ /* Arguments are ready. Create the new vector stmt. */
+
+ if (op_type == binary_op)
+ new_stmt = build2 (MODIFY_EXPR, void_type_node, vec_dest,
+ build2 (code, vectype, vec_oprnd0, vec_oprnd1));
+ else
+ new_stmt = build2 (MODIFY_EXPR, void_type_node, vec_dest,
+ build1 (code, vectype, vec_oprnd0));
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
+
+ if (j == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+ }
+
+ return true;
+}
+
+
+/* Function vectorizable_type_demotion
+
+ Check if STMT performs a binary or unary operation that involves
+ type demotion, and if it can be vectorized.
+ If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+ stmt to replace it, put it in VEC_STMT, and insert it at BSI.
+ Return FALSE if not a vectorizable STMT, TRUE otherwise. */
+
+bool
+vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
+ tree *vec_stmt)
+{
+ tree vec_dest;
+ tree scalar_dest;
+ tree operation;
+ tree op0;
+ tree vec_oprnd0=NULL, vec_oprnd1=NULL;
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ enum tree_code code;
+ tree new_temp;
+ tree def, def_stmt;
+ enum vect_def_type dt0;
+ tree new_stmt;
+ stmt_vec_info prev_stmt_info;
+ int nunits_in;
+ int nunits_out;
+ tree vectype_out;
+ int ncopies;
+ int j;
+ tree expr;
+ tree vectype_in;
+ tree scalar_type;
+ optab optab;
+ enum machine_mode vec_mode;
+
+ /* Is STMT a vectorizable type-demotion operation? */
+
+ if (!STMT_VINFO_RELEVANT_P (stmt_info))
+ return false;
+
+ gcc_assert (STMT_VINFO_DEF_TYPE (stmt_info) == vect_loop_def);
+
+ if (STMT_VINFO_LIVE_P (stmt_info))
+ {
+ /* FORNOW: not yet supported. */
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "value used after loop.");
+ return false;
+ }
+
+ if (TREE_CODE (stmt) != MODIFY_EXPR)
+ return false;
+
+ if (TREE_CODE (TREE_OPERAND (stmt, 0)) != SSA_NAME)
+ return false;
+
+ operation = TREE_OPERAND (stmt, 1);
+ code = TREE_CODE (operation);
+ if (code != NOP_EXPR && code != CONVERT_EXPR)
+ return false;
+
+ op0 = TREE_OPERAND (operation, 0);
+ vectype_in = get_vectype_for_scalar_type (TREE_TYPE (op0));
+ nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
+
+ scalar_dest = TREE_OPERAND (stmt, 0);
+ scalar_type = TREE_TYPE (scalar_dest);
+ vectype_out = get_vectype_for_scalar_type (scalar_type);
+ nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+ if (nunits_in != nunits_out / 2) /* FORNOW */
+ return false;
+
+ ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
+ gcc_assert (ncopies >= 1);
+
+ /* Check the operands of the operation. */
+ if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "use not simple.");
+ return false;
+ }
+
+ /* Supportable by target? */
+ code = VEC_PACK_MOD_EXPR;
+ optab = optab_for_tree_code (VEC_PACK_MOD_EXPR, vectype_in);
+ if (!optab)
+ return false;
+
+ vec_mode = TYPE_MODE (vectype_in);
+ if (optab->handlers[(int) vec_mode].insn_code == CODE_FOR_nothing)
+ return false;
+
+ STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
+
+ if (!vec_stmt) /* transformation not required. */
+ {
+ STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
+ return true;
+ }
+
+ /** Transform. **/
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "transform type demotion operation. ncopies = %d.",
+ ncopies);
+
+ /* Handle def. */
+ vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
+
+ /* In case the vectorization factor (VF) is bigger than the number
+ of elements that we can fit in a vectype (nunits), we have to generate
+ more than one vector stmt - i.e - we need to "unroll" the
+ vector stmt by a factor VF/nunits. */
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; j++)
+ {
+ /* Handle uses. */
+ if (j == 0)
+ {
+ enum vect_def_type dt = vect_unknown_def_type; /* Dummy */
+ vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+ vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt, vec_oprnd0);
+ }
+ else
+ {
+ vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd1);
+ vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+ }
+
+ /* Arguments are ready. Create the new vector stmt. */
+ expr = build2 (code, vectype_out, vec_oprnd0, vec_oprnd1);
+ new_stmt = build2 (MODIFY_EXPR, void_type_node, vec_dest, expr);
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
+
+ if (j == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+ }
+
+ *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
+ return true;
+}
+
+
+/* Function vect_gen_widened_results_half
+
+ Create a vector stmt whose code, type, number of arguments, and result
+ variable are CODE, VECTYPE, OP_TYPE, and VEC_DEST, and its arguments are
+ VEC_OPRND0 and VEC_OPRND1. The new vector stmt is to be inserted at BSI.
+ In the case that CODE is a CALL_EXPR, this means that a call to DECL
+ needs to be created (DECL is a function-decl of a target-builtin).
+ STMT is the original scalar stmt that we are vectorizing. */
+
+static tree
+vect_gen_widened_results_half (enum tree_code code, tree vectype, tree decl,
+ tree vec_oprnd0, tree vec_oprnd1, int op_type,
+ tree vec_dest, block_stmt_iterator *bsi,
+ tree stmt)
+{
+ tree vec_params;
+ tree expr;
+ tree new_stmt;
+ tree new_temp;
+ tree sym;
+ ssa_op_iter iter;
+
+ /* Generate half of the widened result: */
+ if (code == CALL_EXPR)
+ {
+ /* Target specific support */
+ vec_params = build_tree_list (NULL_TREE, vec_oprnd0);
+ if (op_type == binary_op)
+ vec_params = tree_cons (NULL_TREE, vec_oprnd1, vec_params);
+ expr = build_function_call_expr (decl, vec_params);
+ }
+ else
+ {
+ /* Generic support */
+ gcc_assert (op_type == TREE_CODE_LENGTH (code));
+ if (op_type == binary_op)
+ expr = build2 (code, vectype, vec_oprnd0, vec_oprnd1);
+ else
+ expr = build1 (code, vectype, vec_oprnd0);
+ }
+ new_stmt = build2 (MODIFY_EXPR, void_type_node, vec_dest, expr);
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
+
+ if (code == CALL_EXPR)
+ {
+ FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, SSA_OP_ALL_VIRTUALS)
+ {
+ if (TREE_CODE (sym) == SSA_NAME)
+ sym = SSA_NAME_VAR (sym);
+ mark_sym_for_renaming (sym);
+ }
+ }
+
+ return new_stmt;
+}
+
+
+/* Function vectorizable_type_promotion
+
+ Check if STMT performs a binary or unary operation that involves
+ type promotion, and if it can be vectorized.
+ If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+ stmt to replace it, put it in VEC_STMT, and insert it at BSI.
+ Return FALSE if not a vectorizable STMT, TRUE otherwise. */
+
+bool
+vectorizable_type_promotion (tree stmt, block_stmt_iterator *bsi,
+ tree *vec_stmt)
+{
+ tree vec_dest;
+ tree scalar_dest;
+ tree operation;
+ tree op0, op1 = NULL;
+ tree vec_oprnd0=NULL, vec_oprnd1=NULL;
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ enum tree_code code, code1 = CODE_FOR_nothing, code2 = CODE_FOR_nothing;
+ tree decl1 = NULL_TREE, decl2 = NULL_TREE;
+ int op_type;
+ tree def, def_stmt;
+ enum vect_def_type dt0, dt1;
+ tree new_stmt;
+ stmt_vec_info prev_stmt_info;
+ int nunits_in;
+ int nunits_out;
+ tree vectype_out;
+ int ncopies;
+ int j;
+ tree vectype_in;
+
+ /* Is STMT a vectorizable type-promotion operation? */
+
+ if (!STMT_VINFO_RELEVANT_P (stmt_info))
+ return false;
+
+ gcc_assert (STMT_VINFO_DEF_TYPE (stmt_info) == vect_loop_def);
+
+ if (STMT_VINFO_LIVE_P (stmt_info))
+ {
+ /* FORNOW: not yet supported. */
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "value used after loop.");
+ return false;
+ }
+
+ if (TREE_CODE (stmt) != MODIFY_EXPR)
+ return false;
+
+ if (TREE_CODE (TREE_OPERAND (stmt, 0)) != SSA_NAME)
+ return false;
+
+ operation = TREE_OPERAND (stmt, 1);
+ code = TREE_CODE (operation);
+ if (code != NOP_EXPR && code != WIDEN_MULT_EXPR)
+ return false;
+
op0 = TREE_OPERAND (operation, 0);
- vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+ vectype_in = get_vectype_for_scalar_type (TREE_TYPE (op0));
+ nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
+ ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
+ gcc_assert (ncopies >= 1);
+
+ scalar_dest = TREE_OPERAND (stmt, 0);
+ vectype_out = get_vectype_for_scalar_type (TREE_TYPE (scalar_dest));
+ nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+ if (nunits_out != nunits_in / 2) /* FORNOW */
+ return false;
+
+ /* Check the operands of the operation. */
+ if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "use not simple.");
+ return false;
+ }
+ op_type = TREE_CODE_LENGTH (code);
if (op_type == binary_op)
{
op1 = TREE_OPERAND (operation, 1);
+ if (!vect_is_simple_use (op1, loop_vinfo, &def_stmt, &def, &dt1))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "use not simple.");
+ return false;
+ }
+ }
- if (code == LSHIFT_EXPR || code == RSHIFT_EXPR)
- {
- /* Vector shl and shr insn patterns can be defined with
- scalar operand 2 (shift operand). In this case, use
- constant or loop invariant op1 directly, without
- extending it to vector mode first. */
+ /* Supportable by target? */
+ if (!supportable_widening_operation (code, stmt, vectype_in,
+ &decl1, &decl2, &code1, &code2))
+ return false;
- optab_op2_mode = insn_data[icode].operand[2].mode;
- if (!VECTOR_MODE_P (optab_op2_mode))
- {
- if (vect_print_dump_info (REPORT_DETAILS))
- fprintf (vect_dump, "operand 1 using scalar mode.");
- vec_oprnd1 = op1;
- }
- }
+ STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
- if (!vec_oprnd1)
- vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+ if (!vec_stmt) /* transformation not required. */
+ {
+ STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
+ return true;
}
- /* Arguments are ready. create the new vector stmt. */
+ /** Transform. **/
- if (op_type == binary_op)
- *vec_stmt = build2 (MODIFY_EXPR, vectype, vec_dest,
- build2 (code, vectype, vec_oprnd0, vec_oprnd1));
- else
- *vec_stmt = build2 (MODIFY_EXPR, vectype, vec_dest,
- build1 (code, vectype, vec_oprnd0));
- new_temp = make_ssa_name (vec_dest, *vec_stmt);
- TREE_OPERAND (*vec_stmt, 0) = new_temp;
- vect_finish_stmt_generation (stmt, *vec_stmt, bsi);
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "transform type promotion operation. ncopies = %d.",
+ ncopies);
+
+ /* Handle def. */
+ vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
+
+ /* In case the vectorization factor (VF) is bigger than the number
+ of elements that we can fit in a vectype (nunits), we have to generate
+ more than one vector stmt - i.e - we need to "unroll" the
+ vector stmt by a factor VF/nunits. */
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; j++)
+ {
+ /* Handle uses. */
+ if (j == 0)
+ {
+ vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+ if (op_type == binary_op)
+ vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+ }
+ else
+ {
+ vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+ if (op_type == binary_op)
+ vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt1, vec_oprnd1);
+ }
+
+ /* Arguments are ready. Create the new vector stmt. We are creating
+ two vector defs because the widened result does not fit in one vector.
+ The vectorized stmt can be expressed as a call to a target builtin,
+ or using a tree-code. */
+ /* Generate first half of the widened result: */
+ new_stmt = vect_gen_widened_results_half (code1, vectype_out, decl1,
+ vec_oprnd0, vec_oprnd1, op_type, vec_dest, bsi, stmt);
+ if (j == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+
+ /* Generate second half of the widened result: */
+ new_stmt = vect_gen_widened_results_half (code2, vectype_out, decl2,
+ vec_oprnd0, vec_oprnd1, op_type, vec_dest, bsi, stmt);
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+
+ }
+
+ *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
return true;
}
@@ -1658,7 +2306,7 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
tree scalar_dest;
tree data_ref;
tree op;
- tree vec_oprnd1;
+ tree vec_oprnd = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
@@ -1667,8 +2315,16 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
tree dummy;
enum dr_alignment_support alignment_support_cheme;
ssa_op_iter iter;
+ def_operand_p def_p;
tree def, def_stmt;
enum vect_def_type dt;
+ stmt_vec_info prev_stmt_info;
+ tree dataref_ptr = NULL_TREE;
+ int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+ int j;
+
+ gcc_assert (ncopies >= 1);
/* Is vectorizable store? */
@@ -1707,45 +2363,188 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
/** Transform. **/
if (vect_print_dump_info (REPORT_DETAILS))
- fprintf (vect_dump, "transform store");
+ fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
alignment_support_cheme = vect_supportable_dr_alignment (dr);
gcc_assert (alignment_support_cheme);
gcc_assert (alignment_support_cheme == dr_aligned); /* FORNOW */
- /* Handle use - get the vectorized def from the defining stmt. */
- vec_oprnd1 = vect_get_vec_def_for_operand (op, stmt, NULL);
+ /* In case the vectorization factor (VF) is bigger than the number
+ of elements that we can fit in a vectype (nunits), we have to generate
+ more than one vector stmt - i.e - we need to "unroll" the
+ vector stmt by a factor VF/nunits. For more details see documentation in
+ vect_get_vec_def_for_copy_stmt. */
- /* Handle def. */
- /* FORNOW: make sure the data reference is aligned. */
- vect_align_data_ref (stmt);
- data_ref = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE, &dummy, false);
- data_ref = build_fold_indirect_ref (data_ref);
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; j++)
+ {
+ tree new_stmt;
+ tree ptr_incr;
- /* Arguments are ready. create the new vector stmt. */
- *vec_stmt = build2 (MODIFY_EXPR, vectype, data_ref, vec_oprnd1);
- vect_finish_stmt_generation (stmt, *vec_stmt, bsi);
+ if (j == 0)
+ {
+ vec_oprnd = vect_get_vec_def_for_operand (op, stmt, NULL);
+ dataref_ptr = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE, &dummy,
+ &ptr_incr, false);
+ }
+ else
+ {
+ vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, vec_oprnd);
+ dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
+ }
- /* Copy the V_MAY_DEFS representing the aliasing of the original array
- element's definition to the vector's definition then update the
- defining statement. The original is being deleted so the same
- SSA_NAMEs can be used. */
- copy_virtual_operands (*vec_stmt, stmt);
+ /* Arguments are ready. Create the new vector stmt. */
+ data_ref = build_fold_indirect_ref (dataref_ptr);
+ new_stmt = build2 (MODIFY_EXPR, vectype, data_ref, vec_oprnd);
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
- FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_VMAYDEF)
- {
- SSA_NAME_DEF_STMT (def) = *vec_stmt;
+ /* Set the V_MAY_DEFS for the vector pointer. If this virtual def has a
+ use outside the loop and a loop peel is performed then the def may be
+ renamed by the peel. Mark it for renaming so the later use will also
+ be renamed. */
+ copy_virtual_operands (new_stmt, stmt);
+ if (j == 0)
+ {
+ /* The original store is deleted so the same SSA_NAMEs can be used.
+ */
+ FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_VMAYDEF)
+ {
+ SSA_NAME_DEF_STMT (def) = new_stmt;
+ mark_sym_for_renaming (SSA_NAME_VAR (def));
+ }
- /* If this virtual def has a use outside the loop and a loop peel is
- performed then the def may be renamed by the peel. Mark it for
- renaming so the later use will also be renamed. */
- mark_sym_for_renaming (SSA_NAME_VAR (def));
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ }
+ else
+ {
+ /* Create new names for all the definitions created by COPY and
+ add replacement mappings for each new name. */
+ FOR_EACH_SSA_DEF_OPERAND (def_p, new_stmt, iter, SSA_OP_VMAYDEF)
+ {
+ create_new_def_for (DEF_FROM_PTR (def_p), new_stmt, def_p);
+ mark_sym_for_renaming (SSA_NAME_VAR (DEF_FROM_PTR (def_p)));
+ }
+
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ }
+
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
}
return true;
}
+/* Function vect_setup_realignment
+
+ This function is called when vectorizing an unaligned load using
+ the dr_unaligned_software_pipeline scheme.
+ This function generates the following code at the loop prolog:
+
+ p = initial_addr;
+ msq_init = *(floor(p)); # prolog load
+ realignment_token = call target_builtin;
+ loop:
+ msq = phi (msq_init, ---)
+
+ The code above sets up a new (vector) pointer, pointing to the first
+ location accessed by STMT, and a "floor-aligned" load using that pointer.
+ It also generates code to compute the "realignment-token" (if the relevant
+ target hook was defined), and creates a phi-node at the loop-header bb
+ whose arguments are the result of the prolog-load (created by this
+ function) and the result of a load that takes place in the loop (to be
+ created by the caller to this function).
+ The caller to this function uses the phi-result (msq) to create the
+ realignment code inside the loop, and sets up the missing phi argument,
+ as follows:
+
+ loop:
+ msq = phi (msq_init, lsq)
+ lsq = *(floor(p')); # load in loop
+ result = realign_load (msq, lsq, realignment_token);
+
+ Input:
+ STMT - (scalar) load stmt to be vectorized. This load accesses
+ a memory location that may be unaligned.
+ BSI - place where new code is to be inserted.
+
+ Output:
+ REALIGNMENT_TOKEN - the result of a call to the builtin_mask_for_load
+ target hook, if defined.
+ Return value - the result of the loop-header phi node.
+*/
+
+static tree
+vect_setup_realignment (tree stmt, block_stmt_iterator *bsi,
+ tree *realignment_token)
+{
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ edge pe = loop_preheader_edge (loop);
+ tree scalar_dest = TREE_OPERAND (stmt, 0);
+ tree vec_dest;
+ tree init_addr;
+ tree inc;
+ tree ptr;
+ tree data_ref;
+ tree new_stmt;
+ basic_block new_bb;
+ tree msq_init;
+ tree new_temp;
+ tree phi_stmt;
+ tree msq;
+
+ /* 1. Create msq_init = *(floor(p1)) in the loop preheader */
+ vec_dest = vect_create_destination_var (scalar_dest, vectype);
+ ptr = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE, &init_addr, &inc, true);
+ data_ref = build1 (ALIGN_INDIRECT_REF, vectype, ptr);
+ new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, data_ref);
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
+ gcc_assert (!new_bb);
+ msq_init = TREE_OPERAND (new_stmt, 0);
+ copy_virtual_operands (new_stmt, stmt);
+ update_vuses_to_preheader (new_stmt, loop);
+
+ /* 2. Create permutation mask, if required, in loop preheader. */
+ if (targetm.vectorize.builtin_mask_for_load)
+ {
+ tree builtin_decl;
+ tree params = build_tree_list (NULL_TREE, init_addr);
+
+ vec_dest = vect_create_destination_var (scalar_dest,
+ TREE_TYPE (new_stmt));
+ builtin_decl = targetm.vectorize.builtin_mask_for_load ();
+ new_stmt = build_function_call_expr (builtin_decl, params);
+ new_stmt = build2 (MODIFY_EXPR, void_type_node, vec_dest, new_stmt);
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
+ gcc_assert (!new_bb);
+ *realignment_token = TREE_OPERAND (new_stmt, 0);
+
+ /* The result of the CALL_EXPR to this builtin is determined from
+ the value of the parameter and no global variables are touched
+ which makes the builtin a "const" function. Requiring the
+ builtin to have the "const" attribute makes it unnecessary
+ to call mark_call_clobbered. */
+ gcc_assert (TREE_READONLY (builtin_decl));
+ }
+
+ /* 3. Create msq = phi <msq_init, lsq> in loop */
+ vec_dest = vect_create_destination_var (scalar_dest, vectype);
+ msq = make_ssa_name (vec_dest, NULL_TREE);
+ phi_stmt = create_phi_node (msq, loop->header);
+ SSA_NAME_DEF_STMT (msq) = phi_stmt;
+ add_phi_arg (phi_stmt, msq_init, loop_preheader_edge (loop));
+
+ return msq;
+}
+
+
/* vectorizable_load.
Check if STMT reads a non scalar data-ref (array/pointer/structure) that
@@ -1762,18 +2561,25 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
tree data_ref = NULL;
tree op;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ stmt_vec_info prev_stmt_info;
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree new_temp;
int mode;
- tree init_addr;
tree new_stmt;
tree dummy;
- basic_block new_bb;
- loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
- edge pe = loop_preheader_edge (loop);
enum dr_alignment_support alignment_support_cheme;
+ tree dataref_ptr = NULL_TREE;
+ tree ptr_incr;
+ int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+ int j;
+ tree msq = NULL_TREE, lsq;
+ tree offset = NULL_TREE;
+ tree realignment_token = NULL_TREE;
+ tree phi_stmt = NULL_TREE;
/* Is vectorizable load? */
if (!STMT_VINFO_RELEVANT_P (stmt_info))
@@ -1828,142 +2634,148 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
alignment_support_cheme = vect_supportable_dr_alignment (dr);
gcc_assert (alignment_support_cheme);
- if (alignment_support_cheme == dr_aligned
- || alignment_support_cheme == dr_unaligned_supported)
- {
- /* Create:
+ /* In case the vectorization factor (VF) is bigger than the number
+ of elements that we can fit in a vectype (nunits), we have to generate
+ more than one vector stmt - i.e - we need to "unroll" the
+ vector stmt by a factor VF/nunits. In doing so, we record a pointer
+ from one copy of the vector stmt to the next, in the field
+ STMT_VINFO_RELATED_STMT. This is necessary in order to allow following
+ stages to find the correct vector defs to be used when vectorizing
+ stmts that use the defs of the current stmt. The example below illustrates
+ the vectorization process when VF=16 and nunits=4 (i.e - we need to create
+ 4 vectorized stmts):
+
+ before vectorization:
+ RELATED_STMT VEC_STMT
+ S1: x = memref - -
+ S2: z = x + 1 - -
+
+ step 1: vectorize stmt S1:
+ We first create the vector stmt VS1_0, and, as usual, record a
+ pointer to it in the STMT_VINFO_VEC_STMT of the scalar stmt S1.
+ Next, we create the vector stmt VS1_1, and record a pointer to
+ it in the STMT_VINFO_RELATED_STMT of the vector stmt VS1_0.
+ Similarly, for VS1_2 and VS1_3. This is the resulting chain of
+ stmts and pointers:
+ RELATED_STMT VEC_STMT
+ VS1_0: vx0 = memref0 VS1_1 -
+ VS1_1: vx1 = memref1 VS1_2 -
+ VS1_2: vx2 = memref2 VS1_3 -
+ VS1_3: vx3 = memref3 - -
+ S1: x = load - VS1_0
+ S2: z = x + 1 - -
+
+ See in documentation in vect_get_vec_def_for_stmt_copy for how the
+ information we recorded in RELATED_STMT field is used to vectorize
+ stmt S2. */
+
+ /* If the data reference is aligned (dr_aligned) or potentially unaligned
+ on a target that supports unaligned accesses (dr_unaligned_supported)
+ we generate the following code:
p = initial_addr;
indx = 0;
loop {
+ p = p + indx * vectype_size;
vec_dest = *(p);
indx = indx + 1;
}
- */
- vec_dest = vect_create_destination_var (scalar_dest, vectype);
- data_ref = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE, &dummy, false);
- if (aligned_access_p (dr))
- data_ref = build_fold_indirect_ref (data_ref);
- else
- {
- int mis = DR_MISALIGNMENT (dr);
- tree tmis = (mis == -1 ? size_zero_node : size_int (mis));
- tmis = size_binop (MULT_EXPR, tmis, size_int(BITS_PER_UNIT));
- data_ref = build2 (MISALIGNED_INDIRECT_REF, vectype, data_ref, tmis);
- }
- new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, data_ref);
- new_temp = make_ssa_name (vec_dest, new_stmt);
- TREE_OPERAND (new_stmt, 0) = new_temp;
- vect_finish_stmt_generation (stmt, new_stmt, bsi);
- copy_virtual_operands (new_stmt, stmt);
+ Otherwise, the data reference is potentially unaligned on a target that
+ does not support unaligned accesses (dr_unaligned_software_pipeline) -
+ then generate the following code, in which the data in each iteration is
+ obtained by two vector loads, one from the previous iteration, and one
+ from the current iteration:
+ p1 = initial_addr;
+ msq_init = *(floor(p1))
+ p2 = initial_addr + VS - 1;
+ realignment_token = call target_builtin;
+ indx = 0;
+ loop {
+ p2 = p2 + indx * vectype_size
+ lsq = *(floor(p2))
+ vec_dest = realign_load (msq, lsq, realignment_token)
+ indx = indx + 1;
+ msq = lsq;
+ }
+ */
+
+ if (alignment_support_cheme == dr_unaligned_software_pipeline)
+ {
+ msq = vect_setup_realignment (stmt, bsi, &realignment_token);
+ phi_stmt = SSA_NAME_DEF_STMT (msq);
+ offset = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
}
- else if (alignment_support_cheme == dr_unaligned_software_pipeline)
- {
- /* Create:
- p1 = initial_addr;
- msq_init = *(floor(p1))
- p2 = initial_addr + VS - 1;
- magic = have_builtin ? builtin_result : initial_address;
- indx = 0;
- loop {
- p2' = p2 + indx * vectype_size
- lsq = *(floor(p2'))
- vec_dest = realign_load (msq, lsq, magic)
- indx = indx + 1;
- msq = lsq;
- }
- */
-
- tree offset;
- tree magic;
- tree phi_stmt;
- tree msq_init;
- tree msq, lsq;
- tree dataref_ptr;
- tree params;
-
- /* <1> Create msq_init = *(floor(p1)) in the loop preheader */
- vec_dest = vect_create_destination_var (scalar_dest, vectype);
- data_ref = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE,
- &init_addr, true);
- data_ref = build1 (ALIGN_INDIRECT_REF, vectype, data_ref);
- new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, data_ref);
- new_temp = make_ssa_name (vec_dest, new_stmt);
- TREE_OPERAND (new_stmt, 0) = new_temp;
- new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
- gcc_assert (!new_bb);
- msq_init = TREE_OPERAND (new_stmt, 0);
- copy_virtual_operands (new_stmt, stmt);
- update_vuses_to_preheader (new_stmt, loop);
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; j++)
+ {
+ /* 1. Create the vector pointer update chain. */
+ if (j == 0)
+ dataref_ptr = vect_create_data_ref_ptr (stmt, bsi, offset,
+ &dummy, &ptr_incr, false);
+ else
+ dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
- /* <2> Create lsq = *(floor(p2')) in the loop */
- offset = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
+ /* 2. Create the vector-load in the loop. */
+ switch (alignment_support_cheme)
+ {
+ case dr_aligned:
+ gcc_assert (aligned_access_p (dr));
+ data_ref = build_fold_indirect_ref (dataref_ptr);
+ break;
+ case dr_unaligned_supported:
+ {
+ int mis = DR_MISALIGNMENT (dr);
+ tree tmis = (mis == -1 ? size_zero_node : size_int (mis));
+
+ gcc_assert (!aligned_access_p (dr));
+ tmis = size_binop (MULT_EXPR, tmis, size_int(BITS_PER_UNIT));
+ data_ref =
+ build2 (MISALIGNED_INDIRECT_REF, vectype, dataref_ptr, tmis);
+ break;
+ }
+ case dr_unaligned_software_pipeline:
+ gcc_assert (!aligned_access_p (dr));
+ data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr);
+ break;
+ default:
+ gcc_unreachable ();
+ }
vec_dest = vect_create_destination_var (scalar_dest, vectype);
- dataref_ptr = vect_create_data_ref_ptr (stmt, bsi, offset, &dummy, false);
- data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr);
new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, data_ref);
new_temp = make_ssa_name (vec_dest, new_stmt);
TREE_OPERAND (new_stmt, 0) = new_temp;
vect_finish_stmt_generation (stmt, new_stmt, bsi);
- lsq = TREE_OPERAND (new_stmt, 0);
copy_virtual_operands (new_stmt, stmt);
+ mark_new_vars_to_rename (new_stmt);
-
- /* <3> */
- if (targetm.vectorize.builtin_mask_for_load)
- {
- /* Create permutation mask, if required, in loop preheader. */
- tree builtin_decl;
- params = build_tree_list (NULL_TREE, init_addr);
- builtin_decl = targetm.vectorize.builtin_mask_for_load ();
- new_stmt = build_function_call_expr (builtin_decl, params);
- vec_dest = vect_create_destination_var (scalar_dest,
- TREE_TYPE (new_stmt));
- new_stmt = build2 (MODIFY_EXPR, TREE_TYPE (vec_dest), vec_dest,
- new_stmt);
+ /* 3. Handle explicit realignment if necessary/supported. */
+ if (alignment_support_cheme == dr_unaligned_software_pipeline)
+ {
+ /* Create in loop:
+ <vec_dest = realign_load (msq, lsq, realignment_token)> */
+ lsq = TREE_OPERAND (new_stmt, 0);
+ if (!realignment_token)
+ realignment_token = dataref_ptr;
+ vec_dest = vect_create_destination_var (scalar_dest, vectype);
+ new_stmt =
+ build3 (REALIGN_LOAD_EXPR, vectype, msq, lsq, realignment_token);
+ new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, new_stmt);
new_temp = make_ssa_name (vec_dest, new_stmt);
- TREE_OPERAND (new_stmt, 0) = new_temp;
- new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
- gcc_assert (!new_bb);
- magic = TREE_OPERAND (new_stmt, 0);
-
- /* The result of the CALL_EXPR to this builtin is determined from
- the value of the parameter and no global variables are touched
- which makes the builtin a "const" function. Requiring the
- builtin to have the "const" attribute makes it unnecessary
- to call mark_call_clobbered. */
- gcc_assert (TREE_READONLY (builtin_decl));
- }
- else
- {
- /* Use current address instead of init_addr for reduced reg pressure.
- */
- magic = dataref_ptr;
- }
-
-
- /* <4> Create msq = phi <msq_init, lsq> in loop */
- vec_dest = vect_create_destination_var (scalar_dest, vectype);
- msq = make_ssa_name (vec_dest, NULL_TREE);
- phi_stmt = create_phi_node (msq, loop->header); /* CHECKME */
- SSA_NAME_DEF_STMT (msq) = phi_stmt;
- add_phi_arg (phi_stmt, msq_init, loop_preheader_edge (loop));
- add_phi_arg (phi_stmt, lsq, loop_latch_edge (loop));
-
+ TREE_OPERAND (new_stmt, 0) = new_temp;
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
+ if (j == ncopies - 1)
+ add_phi_arg (phi_stmt, lsq, loop_latch_edge (loop));
+ msq = lsq;
+ }
- /* <5> Create <vec_dest = realign_load (msq, lsq, magic)> in loop */
- vec_dest = vect_create_destination_var (scalar_dest, vectype);
- new_stmt = build3 (REALIGN_LOAD_EXPR, vectype, msq, lsq, magic);
- new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, new_stmt);
- new_temp = make_ssa_name (vec_dest, new_stmt);
- TREE_OPERAND (new_stmt, 0) = new_temp;
- vect_finish_stmt_generation (stmt, new_stmt, bsi);
+ if (j == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
}
- else
- gcc_unreachable ();
- *vec_stmt = new_stmt;
return true;
}
@@ -2093,6 +2905,12 @@ vectorizable_condition (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
enum machine_mode vec_mode;
tree def;
enum vect_def_type dt;
+ int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+ gcc_assert (ncopies >= 1);
+ if (ncopies > 1)
+ return false; /* FORNOW */
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
@@ -2203,6 +3021,16 @@ vect_transform_stmt (tree stmt, block_stmt_iterator *bsi)
{
switch (STMT_VINFO_TYPE (stmt_info))
{
+ case type_demotion_vec_info_type:
+ done = vectorizable_type_demotion (stmt, bsi, &vec_stmt);
+ gcc_assert (done);
+ break;
+
+ case type_promotion_vec_info_type:
+ done = vectorizable_type_promotion (stmt, bsi, &vec_stmt);
+ gcc_assert (done);
+ break;
+
case op_vec_info_type:
done = vectorizable_operation (stmt, bsi, &vec_stmt);
gcc_assert (done);
@@ -2269,12 +3097,6 @@ vect_transform_stmt (tree stmt, block_stmt_iterator *bsi)
done = vectorizable_live_operation (stmt, bsi, &vec_stmt);
gcc_assert (done);
}
-
- if (vec_stmt)
- {
- gcc_assert (!STMT_VINFO_VEC_STMT (stmt_info));
- STMT_VINFO_VEC_STMT (stmt_info) = vec_stmt;
- }
}
return is_store;
@@ -2680,7 +3502,7 @@ vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters)
if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
{
int byte_misalign = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
- int element_size = vectype_align/vf;
+ int element_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
int elem_misalign = byte_misalign / element_size;
if (vect_print_dump_info (REPORT_DETAILS))
@@ -3100,11 +3922,11 @@ vect_transform_loop (loop_vec_info loop_vinfo,
bsi_next (&si);
continue;
}
- /* FORNOW: Verify that all stmts operate on the same number of
- units and no inner unrolling is necessary. */
- gcc_assert
- (TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info))
- == (unsigned HOST_WIDE_INT) vectorization_factor);
+
+ if ((TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info))
+ != (unsigned HOST_WIDE_INT) vectorization_factor)
+ && vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple-types.");
/* -------- vectorize statement ------------ */
if (vect_print_dump_info (REPORT_DETAILS))
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2b0064934d7..c35fc302597 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -136,6 +136,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
#include "cfgloop.h"
#include "cfglayout.h"
#include "expr.h"
+#include "recog.h"
#include "optabs.h"
#include "params.h"
#include "toplev.h"
@@ -1359,8 +1360,8 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
STMT_VINFO_TYPE (res) = undef_vec_info_type;
STMT_VINFO_STMT (res) = stmt;
STMT_VINFO_LOOP_VINFO (res) = loop_vinfo;
- STMT_VINFO_RELEVANT_P (res) = 0;
- STMT_VINFO_LIVE_P (res) = 0;
+ STMT_VINFO_RELEVANT (res) = 0;
+ STMT_VINFO_LIVE_P (res) = false;
STMT_VINFO_VECTYPE (res) = NULL;
STMT_VINFO_VEC_STMT (res) = NULL;
STMT_VINFO_IN_PATTERN_P (res) = false;
@@ -1753,6 +1754,127 @@ vect_is_simple_use (tree operand, loop_vec_info loop_vinfo, tree *def_stmt,
}
+/* Function supportable_widening_operation
+
+   Check whether an operation represented by the code CODE is a
+   widening operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE).
+
+   The two kinds of widening operations we currently support are
+   NOP and WIDEN_MULT. This function checks if these operations
+   are supported by the target platform either directly (via vector
+   tree-codes), or via target builtins.
+
+   Output:
+   - CODE1 and CODE2 are codes of vector operations to be used when
+   vectorizing the operation, if available.
+   - DECL1 and DECL2 are decls of target builtin functions to be used
+   when vectorizing the operation, if available. In this case,
+   CODE1 and CODE2 are CALL_EXPR.  */
+
+bool
+supportable_widening_operation (enum tree_code code, tree stmt, tree vectype,
+                                tree *decl1, tree *decl2,
+                                enum tree_code *code1, enum tree_code *code2)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  bool ordered_p;
+  enum machine_mode vec_mode;
+  enum insn_code icode1, icode2;
+  optab optab1, optab2;
+  tree expr = TREE_OPERAND (stmt, 1);
+  tree type = TREE_TYPE (expr);
+  tree wide_vectype = get_vectype_for_scalar_type (type);
+  enum tree_code c1, c2;
+
+  /* The result of a vectorized widening operation usually requires two vectors
+     (because the widened results do not fit in one vector). The generated
+     vector results would normally be expected to be generated in the same
+     order as in the original scalar computation. i.e. if 8 results are
+     generated in each vector iteration, they are to be organized as follows:
+        vect1: [res1,res2,res3,res4], vect2: [res5,res6,res7,res8].
+
+     However, in the special case that the result of the widening operation is
+     used in a reduction computation only, the order doesn't matter (because
+     when vectorizing a reduction we change the order of the computation).
+     Some targets can take advantage of this and generate more efficient code.
+     For example, targets like Altivec, that support widen_mult using a sequence
+     of {mult_even,mult_odd} generate the following vectors:
+        vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].  */
+
+  if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction)
+    ordered_p = false;
+  else
+    ordered_p = true;
+
+  if (!ordered_p
+      && code == WIDEN_MULT_EXPR
+      && targetm.vectorize.builtin_mul_widen_even
+      && targetm.vectorize.builtin_mul_widen_even (vectype)
+      && targetm.vectorize.builtin_mul_widen_odd
+      && targetm.vectorize.builtin_mul_widen_odd (vectype))
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "Unordered widening operation detected.");
+
+      *code1 = *code2 = CALL_EXPR;
+      *decl1 = targetm.vectorize.builtin_mul_widen_even (vectype);
+      *decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype);
+      return true;
+    }
+
+  switch (code)
+    {
+    case WIDEN_MULT_EXPR:
+      if (BYTES_BIG_ENDIAN)
+        {
+          c1 = VEC_WIDEN_MULT_HI_EXPR;
+          c2 = VEC_WIDEN_MULT_LO_EXPR;
+        }
+      else
+        {
+          c2 = VEC_WIDEN_MULT_HI_EXPR;
+          c1 = VEC_WIDEN_MULT_LO_EXPR;
+        }
+      break;
+
+    case NOP_EXPR:
+      if (BYTES_BIG_ENDIAN)
+        {
+          c1 = VEC_UNPACK_HI_EXPR;
+          c2 = VEC_UNPACK_LO_EXPR;
+        }
+      else
+        {
+          c2 = VEC_UNPACK_HI_EXPR;
+          c1 = VEC_UNPACK_LO_EXPR;
+        }
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  *code1 = c1;
+  *code2 = c2;
+  optab1 = optab_for_tree_code (c1, vectype);
+  optab2 = optab_for_tree_code (c2, vectype);
+
+  if (!optab1 || !optab2)
+    return false;
+
+  vec_mode = TYPE_MODE (vectype);
+  if ((icode1 = optab1->handlers[(int) vec_mode].insn_code) == CODE_FOR_nothing
+      || insn_data[icode1].operand[0].mode != TYPE_MODE (wide_vectype)
+      || (icode2 = optab2->handlers[(int) vec_mode].insn_code)
+                                                          == CODE_FOR_nothing
+      || insn_data[icode2].operand[0].mode != TYPE_MODE (wide_vectype))
+    return false;
+
+  return true;
+}
+
+
/* Function reduction_code_for_scalar_code
Input:
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 536aae803ce..b56f7ded95c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -165,7 +165,16 @@ enum stmt_vec_info_type {
op_vec_info_type,
assignment_vec_info_type,
condition_vec_info_type,
- reduc_vec_info_type
+ reduc_vec_info_type,
+ type_promotion_vec_info_type,
+ type_demotion_vec_info_type
+};
+
+/* Indicates whether/how a variable is used in the loop. */
+enum vect_relevant {
+ vect_unused_in_loop = 0,
+ vect_used_by_reduction,
+ vect_used_in_loop
};
typedef struct data_reference *dr_p;
@@ -182,10 +191,10 @@ typedef struct _stmt_vec_info {
/* The loop_vec_info with respect to which STMT is vectorized. */
loop_vec_info loop_vinfo;
- /* Not all stmts in the loop need to be vectorized. e.g, the incrementation
+ /* Not all stmts in the loop need to be vectorized. e.g, the increment
of the loop induction variable and computation of array indexes. relevant
indicates whether the stmt needs to be vectorized. */
- bool relevant;
+ enum vect_relevant relevant;
/* Indicates whether this stmts is part of a computation whose result is
used outside the loop. */
@@ -232,7 +241,7 @@ typedef struct _stmt_vec_info {
#define STMT_VINFO_TYPE(S) (S)->type
#define STMT_VINFO_STMT(S) (S)->stmt
#define STMT_VINFO_LOOP_VINFO(S) (S)->loop_vinfo
-#define STMT_VINFO_RELEVANT_P(S) (S)->relevant
+#define STMT_VINFO_RELEVANT(S) (S)->relevant
#define STMT_VINFO_LIVE_P(S) (S)->live
#define STMT_VINFO_VECTYPE(S) (S)->vectype
#define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
@@ -242,6 +251,8 @@ typedef struct _stmt_vec_info {
#define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs
#define STMT_VINFO_DEF_TYPE(S) (S)->def_type
+#define STMT_VINFO_RELEVANT_P(S) ((S)->relevant != vect_unused_in_loop)
+
static inline void set_stmt_info (stmt_ann_t ann, stmt_vec_info stmt_info);
static inline stmt_vec_info vinfo_for_stmt (tree stmt);
@@ -328,6 +339,8 @@ extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
extern bool reduction_code_for_scalar_code (enum tree_code, enum tree_code *);
+extern bool supportable_widening_operation (enum tree_code, tree, tree,
+ tree *, tree *, enum tree_code *, enum tree_code *);
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
extern void destroy_loop_vec_info (loop_vec_info);
@@ -354,6 +367,8 @@ void vect_pattern_recog (loop_vec_info);
extern bool vectorizable_load (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_store (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_operation (tree, block_stmt_iterator *, tree *);
+extern bool vectorizable_type_promotion (tree, block_stmt_iterator *, tree *);
+extern bool vectorizable_type_demotion (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_assignment (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
diff --git a/gcc/tree.def b/gcc/tree.def
index ffc772fa173..2c16cb91f48 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1073,6 +1073,28 @@ DEFTREECODE (WIDEN_MULT_EXPR, "widen_mult_expr", tcc_binary, 2)
DEFTREECODE (VEC_LSHIFT_EXPR, "vec_lshift_expr", tcc_binary, 2)
DEFTREECODE (VEC_RSHIFT_EXPR, "vec_rshift_expr", tcc_binary, 2)
+/* Widening vector multiplication.
+   The two operands are vectors with N elements of size S. Multiplying the
+   elements of the two vectors will result in N products of size 2*S.
+   VEC_WIDEN_MULT_HI_EXPR computes the N/2 high products.
+   VEC_WIDEN_MULT_LO_EXPR computes the N/2 low products. */
+DEFTREECODE (VEC_WIDEN_MULT_HI_EXPR, "widen_mult_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MULT_LO_EXPR, "widen_mult_lo_expr", tcc_binary, 2)
+
+/* Unpack (extract and promote/widen) the high/low elements of the input vector
+   into the output vector.  The input vector has twice as many elements
+   as the output vector, that are half the size of the elements
+   of the output vector.  This is used to support type promotion. */
+DEFTREECODE (VEC_UNPACK_HI_EXPR, "vec_unpack_hi_expr", tcc_unary, 1)
+DEFTREECODE (VEC_UNPACK_LO_EXPR, "vec_unpack_lo_expr", tcc_unary, 1)
+
+/* Pack (demote/narrow and merge) the elements of the two input vectors
+   into the output vector, using modulo/saturating arithmetic.
+   The elements of the input vectors are twice the size of the elements of the
+   output vector.  This is used to support type demotion. */
+DEFTREECODE (VEC_PACK_MOD_EXPR, "vec_pack_mod_expr", tcc_binary, 2)
+DEFTREECODE (VEC_PACK_SAT_EXPR, "vec_pack_sat_expr", tcc_binary, 2)
+
+
/*
Local variables:
mode:c